ICFP’10
Proceedings of the 2010 ACM SIGPLAN International Conference on Functional Programming
September 27–29, 2010, Baltimore, Maryland, USA

Sponsored by: ACM SIGPLAN
Supported by: Credit Suisse, Erlang Solutions, Galois, Jane Street Capital, Microsoft Research, Standard Chartered
The Association for Computing Machinery
2 Penn Plaza, Suite 701
New York, New York 10121-0701

Copyright © 2010 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of portions of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permission to republish from: Publications Dept., ACM, Inc. Fax +1 (212) 869-0481 or email [email protected]. For other copying of articles that carry a code at the bottom of the first or last page, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

Notice to Past Authors of ACM-Published Articles: ACM intends to create a complete electronic archive of all articles and/or other material previously published by ACM. If you have written a work that has been previously published by ACM in any journal or conference proceedings prior to 1978, or any SIG Newsletter at any time, and you do NOT want this work to appear in the ACM Digital Library, please inform [email protected], stating the title of the work, the author(s), and where and when published.
ISBN: 978-1-60558-794-3 Additional copies may be ordered prepaid from:
ACM Order Department PO Box 11405 New York, NY 10286-1405 Phone: 1-800-342-6626 (USA and Canada) +1-212-626-0500 (all other countries) Fax: +1-212-944-1318 E-mail: [email protected]
ACM Order Number 565100 Printed in the USA
Foreword

It is my great pleasure to welcome you to the 15th ACM SIGPLAN International Conference on Functional Programming – ICFP’10. This conference features original papers on the art and science of functional programming. Submissions are invited on all topics from principles to practice, from foundations to features, from abstraction to application. The scope includes all languages that encourage functional programming, including both purely applicative and imperative languages, as well as languages with objects or concurrency.

This year, the Call for Papers attracted 99 submissions, comprising 92 full submissions and 7 experience reports. From these, the Program Committee selected 30 full papers and 3 experience reports. Full papers were evaluated according to their relevance, correctness, significance, originality, and clarity. Full papers include a special form of research paper, called Functional Pearls, which are not required to report original research, but must be concise, instructive, and entertaining; they are strictly limited to twelve pages. ICFP also includes experience reports, which provide evidence that functional programming really works or describe obstacles that have kept it from working. Experience reports are labeled in their titles and are limited to six pages.

Each submission was reviewed by at least three Program Committee members. PC members were encouraged to solicit external reviews, but were required to read the papers themselves and form their own opinions. Reviews included comments from 125 external reviewers, and were made available to the authors for response before the PC meeting. Papers were selected during a physical meeting on June 3–4, 2010, held at Microsoft Research Cambridge. All PC members attended the meeting. In addition to selecting the program, the PC also chose to invite talks by Mike Gordon, Matthias Felleisen, and Guy Blelloch. There were three submissions by PC members, two of which were accepted. These submissions were discussed after all other decisions had been made and were held to a higher standard.

I have been honored to serve as this year’s program chair and I am grateful to the many people who have contributed to the success of ICFP. I received guidance from ICFP General Chair Paul Hudak and past Program Chairs Norman Ramsey and Andrew Tolmach, as well as many members of the ICFP steering committee. Eddie Kohler created the conference management software (HotCRP) and Robert Williams installed it at Penn. Rachel Billings helped with the PC meeting organization. Dimitrios Vytiniotis greatly assisted during the PC meeting. The external reviewers provided expert opinions, often on short notice. The Program Committee put in a tremendous amount of time and effort, generating fair, insightful, and helpful reviews for every submitted paper. Finally, the authors of all submitted papers deserve the most thanks for making this an exciting and rewarding ICFP.
Stephanie Weirich
ICFP’10 Program Chair
University of Pennsylvania
Table of Contents ICFP 2010 Conference Organization .................................................................................................viii ICFP 2010 External Reviewers ...............................................................................................................ix ICFP 2010 Sponsors & Supporters ........................................................................................................x Keynote 1 Session Chair: Peter Dybjer (Chalmers University of Technology) •
ML: Metalanguage or Object Language? ...................................................................................................1 Michael J. C. Gordon (University of Cambridge)
Session 1 Session Chair: Andres Löh (Utrecht University) •
The Gentle Art of Levitation .........................................................................................................................3 James Chapman (Tallinn University of Technology), Pierre-Évariste Dagand, Conor McBride (University of Strathclyde), Peter Morris (University of Nottingham)
•
Functional Pearl: Every Bit Counts ...........................................................................................................15 Dimitrios Vytiniotis, Andrew J. Kennedy (Microsoft Research, Cambridge, U.K.)
Session 2 Session Chair: Olivier Danvy (University of Aarhus) •
ReCaml: Execution State as the Cornerstone of Reconfigurations ........................................................27 Jérémy Buisson (Université Européenne de Bretagne, Ecoles de St-Cyr Coëtquidan / VALORIA), Fabien Dagnat (Université Européenne de Bretagne, Institut Télécom / Télécom Bretagne)
•
Lolliproc: To Concurrency From Classical Linear Logic via Curry-Howard and Control ...............39 Karl Mazurak, Steve Zdancewic (University of Pennsylvania)
Session 3 Session Chair: Fritz Henglein (University of Copenhagen) •
Abstracting Abstract Machines ..................................................................................................................51 David Van Horn (Northeastern University), Matthew Might (University of Utah)
•
Polyvariant Flow Analysis with Higher-Ranked Polymorphic Types and Higher-Order Effect Operators ...............................................................................................63 Stefan Holdermans (Vector Fabrics), Jurriaan Hage (Utrecht University)
Session 4 Session Chair: Simon Peyton Jones (Microsoft Research) •
The Reduceron Reconfigured......................................................................................................................75 Matthew Naylor, Colin Runciman (University of York)
•
Using Functional Programming Within an Industrial Product Group: Perspectives and Perceptions.......................................................................................................................87 David Scott, Richard Sharp (Citrix Systems UK R&D), Thomas Gazagnaire (INRIA Sophia Antipolis), Anil Madhavapeddy (University of Cambridge)
•
Lazy Tree Splitting .......................................................................................................................................93 Lars Bergstrom, Mike Rainey, John Reppy, Adam Shaw (University of Chicago), Matthew Fluet (Rochester Institute of Technology)
Session 5 Session Chair: Peter Thiemann (University of Freiburg) •
Semantic Subtyping with an SMT Solver ................................................................................................105 Gavin M. Bierman, Andrew D. Gordon (Microsoft Research), Cătălin Hriţcu (Saarland University), David Langworthy (Microsoft Corporation)
•
Logical Types for Untyped Languages.....................................................................................................117 Sam Tobin-Hochstadt, Matthias Felleisen (Northeastern University)
Keynote 2 Session Chair: Stephanie Weirich (University of Pennsylvania) •
TeachScheme! — A Checkpoint................................................................................................................129 Matthias Felleisen (Northeastern University)
Session 6 Session Chair: Amal Ahmed (Indiana University) •
Higher-Order Representation of Substructural Logics .........................................................................131 Karl Crary (Carnegie Mellon University)
•
The Impact of Higher-Order State and Control Effects on Local Relational Reasoning..................143 Derek Dreyer, Georg Neis (MPI-SWS), Lars Birkedal (IT University of Copenhagen)
Session 7 Session Chair: Michael Hicks (University of Maryland, College Park) •
Distance Makes the Types Grow Stronger: A Calculus for Differential Privacy ...............................157 Jason Reed, Benjamin C. Pierce (The University of Pennsylvania)
•
Security-Typed Programming Within Dependently Typed Programming.........................................169 Jamie Morgenstern, Daniel R. Licata (Carnegie Mellon University)
Session 8 Session Chair: James Cheney (University of Edinburgh) •
Combining Syntactic and Semantic Bidirectionalization ......................................................................181 Janis Voigtländer (University of Bonn), Zhenjiang Hu (National Institute of Informatics, Tokyo), Kazutaka Matsuda (Tohoku University), Meng Wang (University of Oxford)
•
Matching Lenses: Alignment and View Update......................................................................................193 Davi M. J. Barbosa, Julien Cretin (École Polytechnique, INRIA), Nate Foster (Princeton University), Michael Greenberg, Benjamin C. Pierce (The University of Pennsylvania)
•
Bidirectionalizing Graph Transformations.............................................................................................205 Soichiro Hidaka, Zhenjiang Hu, Kazuhiro Inaba, Hiroyuki Kato (National Institute of Informatics, Japan), Kazutaka Matsuda (Tohoku University), Keisuke Nakano (The University of Electro-Communications, Japan)
Session 9 Session Chair: Graham Hutton (University of Nottingham) •
A Fresh Look at Programming with Names and Binders .....................................................................217 Nicolas Pouillard, François Pottier (INRIA)
•
Experience Report: Growing Programming Languages for Beginning Students ..............................229 Marcus Crestani (University of Tübingen), Michael Sperber (DeinProgramm)
•
Fortifying Macros .......................................................................................................................................235 Ryan Culpepper, Matthias Felleisen (Northeastern University)
Session 10: Awards and Announcements Session Chair: Robby Findler (Northwestern University)
Keynote 3 Session Chair: Umut Acar (Max Planck Institute for Software Systems) •
Functional Parallel Algorithms.................................................................................................................247 Guy E. Blelloch (Carnegie Mellon University)
Session 11 Session Chair: Zhenjiang Hu (National Institute of Informatics) •
Specifying and Verifying Sparse Matrix Codes ......................................................................................249 Gilad Arnold (University of California, Berkeley), Johannes Hölzl (Technische Universität München), Ali Sinan Köksal (École Polytechnique Fédérale de Lausanne), Rastislav Bodík (University of California, Berkeley), Mooly Sagiv (Tel Aviv University)
•
Regular, Shape-Polymorphic, Parallel Arrays in Haskell .....................................................................261 Gabriele Keller, Manuel M. T. Chakravarty, Roman Leshchinskiy (University of New South Wales), Simon Peyton Jones (Microsoft Research Ltd.), Ben Lippmeier (University of New South Wales)
Session 12 Session Chair: James Hook (Portland State University) •
A Certified Framework for Compiling and Executing Garbage-Collected Languages ....................273 Andrew McCreight, Tim Chevalier, Andrew Tolmach (Portland State University)
•
Total Parser Combinators .........................................................................................................................285 Nils Anders Danielsson (University of Nottingham)
Session 13 Session Chair: Andrew Tolmach (Portland State University) •
Scrapping Your Inefficient Engine: Using Partial Evaluation to Improve Domain-Specific Language Implementation...........................297 Edwin C. Brady, Kevin Hammond (University of St Andrews)
•
Rethinking Supercompilation ...................................................................................................................309 Neil Mitchell
Session 14 Session Chair: Matthieu Sozeau (Harvard University) •
Program Verification Through Characteristic Formulae .....................................................................321 Arthur Charguéraud (INRIA)
•
VeriML: Typed Computation of Logical Terms Inside a Language with Effects..............................333 Antonis Stampoulis, Zhong Shao (Yale University)
•
Parametricity and Dependent Types........................................................................................................345 Jean-Philippe Bernardy, Patrik Jansson (Chalmers University of Technology and University of Gothenburg), Ross Paterson (City University London)
Session 15 Session Chair: Manuel Chakravarty (University of New South Wales) •
A Play on Regular Expressions: Functional Pearl..................................................................................357 Sebastian Fischer, Frank Huch, Thomas Wilke (Christian-Albrechts University of Kiel)
•
Experience Report: Haskell as a Reagent — Results and Observations on the Use of Haskell in a Python Project................................................................................................369 Iustin Pop (Google Switzerland)
•
Instance Chains: Type Class Programming without Overlapping Instances .....................................375 J. Garrett Morris, Mark P. Jones (Portland State University)
Author Index ................................................................................................................................................387
ICFP 2010 Conference Organization

General Chair: Paul Hudak (Yale University, USA)
Program Chair: Stephanie Weirich (University of Pennsylvania, USA)
Local Arrangements Chair: Michael Hicks (University of Maryland, College Park, USA)
Workshop Co-Chairs: Derek Dreyer (Max Planck Institute for Software Systems, Germany), Christopher Stone (Harvey Mudd College, USA)
Programming Contest Chair: Johannes Waldmann (Hochschule für Technik, Wirtschaft und Kultur, Leipzig, Germany)
Publicity Chair: Wouter Swierstra (Vector Fabrics, The Netherlands)
Video Chair: Scott Smith (Johns Hopkins University, USA)
Steering Committee Chair: James Hook (Portland State University, USA)
Steering Committee:
Amal Ahmed (Indiana University, USA)
Manuel Chakravarty (University of New South Wales, Australia)
Olivier Danvy (Aarhus University, Denmark)
Robby Findler (Northwestern University, USA)
Fritz Henglein (University of Copenhagen, Denmark)
Zhenjiang Hu (National Institute of Informatics, Japan)
Paul Hudak (Yale University, USA)
Graham Hutton (University of Nottingham, England)
François Pottier (INRIA, France)
Wouter Swierstra (Vector Fabrics, The Netherlands)
Peter Thiemann (University of Freiburg, Germany)
Andrew Tolmach (Portland State University, USA)
Philip Wadler (University of Edinburgh, Scotland)
Stephanie Weirich (University of Pennsylvania, USA)
Program Committee:
Umut Acar (Max Planck Institute for Software Systems, Germany)
Zena Ariola (University of Oregon, USA)
James Cheney (University of Edinburgh, Scotland)
Peter Dybjer (Chalmers University of Technology, Sweden)
Robby Findler (Northwestern University, USA)
Andy Gill (Kansas University, USA)
Fritz Henglein (University of Copenhagen, Denmark)
Michael Hicks (University of Maryland, College Park, USA)
Patricia Johann (University of Strathclyde, Scotland)
Andres Löh (Utrecht University, The Netherlands)
Simon Peyton Jones (Microsoft Research, England)
Didier Rémy (INRIA Paris-Rocquencourt, France)
John Reppy (University of Chicago, USA)
Manuel Serrano (INRIA Sophia-Antipolis, France)
Matthieu Sozeau (Harvard University, USA)
ICFP 2010 External Reviewers Amal Ahmed Jade Alglave Kenichi Asai Robert Atkey Patrick Bahr Martin Berger Lars Bergstrom Jost Berthold Yves Bertot Pramod Bhatotia Richard Bird Gérard Boudol Frédéric Boussinot Lucas Bordeaux Dan Brown Neil Brown Peter Buneman Jacques Carette Manuel Chakravarty Arthur Charguéraud Avik Chaudhuri Yan Chen Adam Chlipala Patrick Cousot Russ Cox Karl Crary Alcino Cunha Nils Anders Danielsson Olivier Danvy Zaynah Dargaye Atze Dijkstra Christos Dimoulas Derek Dreyer Matthias Felleisen Andrzej Filinski Sebastian Fischer Matthew Flatt Matthew Fluet Steve Freund Nate Foster Ronald Garcia François Garillot
Neil Ghani Torsten Grust Jurriaan Hage Tim Harris Chris Hawblitzel Bastiaan Heeren Anders Starcke Henriksen Ralf Hinze Giang Hoang Brian Huffman Chung-Kil Hur Tom Hvitved Jun Inoue Patrik Jansson Alan Jeffrey Ranjit Jhala Mark Jones Ben Kavanagh Gabriele Keller Nick Kidd Oleg Kiselyov Casey Klein Ilya Klyuchnikov Boris Koepf Dexter Kozen Neel Krishnaswami Sava Krstic John Launchbury Didier Le Botlan Daan Leijen Xavier Leroy Ruy Ley-Wild Sam Lindley Dave MacQueen José Pedro Magalhães Stephen Magill Yitzhak Mandelbaum Vikash Mansinghka Simon Marlow Conor McBride Marino Miculan Shin-Cheng Mu
Anca Muscholl Magnus Myreen Rasmus Møgelberg Aleks Nanevski Lasse Nielsen Morten Ib Nielsen Roland Olsson Roly Perera Frances Perry Brigitte Pientka Andrew M Pitts François Pottier Norman Ramsey Benoit Razet Tamara Rezk Colin Runciman Joseph Russ Alejandro Russo Susmit Sarkar Tom Schrijvers Chung-chieh Shan Satnam Singh Kristian Støvring T. Stephen Strickland Nikhil Swamy S. Doaitse Swierstra Don Syme Nicolas Tabareau Andrew Tolmach Franklyn Turbak Aaron Turon David Van Horn Thomas van Noort Jeff Vaughan Janis Voigtländer Dimitrios Vytiniotis Phil Wadler David Walker Edwin Westbrook Jerome Vouillon Dana Xu
ICFP 2010 Sponsors & Supporters
Sponsor: ACM SIGPLAN

Supporters: Credit Suisse, Erlang Solutions, Galois, Jane Street Capital, Microsoft Research, Standard Chartered
ML: Metalanguage or Object Language?
A talk in honour of Robin Milner

Mike Gordon (University of Cambridge Computer Laboratory)
[email protected]
Abstract

My talk will celebrate Robin Milner’s contribution to functional programming via a combination of reminiscences about the early days of ML and speculations about its future.

Categories and Subject Descriptors D.3.3 [Programming Languages]: Functional Programming
General Terms Languages, Theory
Keywords Functional programming, proof assistants, ML, LCF, metalanguage, object language, types, polymorphism

Background

ML was designed by Robin Milner in the 1970s as the language for scripting interactive proof commands for the Edinburgh LCF theorem prover.¹ He used the term “metalanguage” (abbreviated to “ML”) for this scripting language and “object language” for the formal logic in which theorems were proved.² This first version of ML evolved, over the years, into several general-purpose functional programming languages, though its role as a theorem-prover metalanguage has also continued and grown. A prominent member of the ML family, Standard ML (SML), was designed in the 1980s by a team led by Milner. SML is specified with a formal semantics; this formed the basis for a considerable body of research on the metatheory of programming languages, leading to many insights and advances. The semantics of SML also provides, in principle, a rigorous foundation for reasoning about individual ML programs, though the complexity of the full language semantics makes this very challenging in practice. To make the analysis of functional programs more tractable, some descendants of Milner’s original LCF system have object logics whose terms are inspired by ML programs, but which are simplified so that they are easier to reason about than terms based on full SML. Thus ML is now both a metalanguage of interactive theorem provers, and an inspiration for object languages.
¹ The acronym “LCF” abbreviates “Logic for Computable Functions”. The Edinburgh LCF system was the successor to the Stanford LCF system that was implemented by Milner and Weyhrauch at Stanford University. The object language of Stanford LCF was a monomorphically typed λ-calculus designed for reasoning about recursively defined functions on Scott domains. Stanford LCF had a fixed set of commands for creating proofs interactively, but no metalanguage for programming combinations of commands.
² The object language of Edinburgh LCF was called “PPλ”, which abbreviated “Polymorphic Predicate λ-calculus”. It was an extension of the object language of Stanford LCF, with a polymorphic type system devised by Milner similar to the Hindley–Milner type system of ML. I don’t know whether Milner first conceived his theory of polymorphic types for ML or for PPλ, but I think that the design of PPλ was completed before that of ML.
The Gentle Art of Levitation

James Chapman (Institute of Cybernetics, Tallinn University of Technology)
[email protected]
Pierre-Évariste Dagand and Conor McBride (University of Strathclyde)
{dagand,conor}@cis.strath.ac.uk
Peter Morris (University of Nottingham)
[email protected]

Abstract

We present a closed dependent type theory whose inductive types are given not by a scheme for generative declarations, but by encoding in a universe. Each inductive datatype arises by interpreting its description—a first-class value in a datatype of descriptions. Moreover, the latter itself has a description. Datatype-generic programming thus becomes ordinary programming. We show some of the resulting generic operations and deploy them in particular, useful ways on the datatype of datatype descriptions itself. Simulations in existing systems suggest that this apparently self-supporting setup is achievable without paradox or infinite regress.

Categories and Subject Descriptors D.1.1 [Programming Techniques]: Applicative (Functional) Programming; D.3.3 [Language Constructs and Features]: Data types and structures
General Terms Design, Languages, Theory

1. Introduction

Dependent datatypes, such as the ubiquitous vectors (lists indexed by length), express relative notions of data validity. They allow us to function in a complex world with a higher standard of basic hygiene than is practical with the context-free datatypes of ML-like languages. Dependent type systems, as found in Agda [Norell 2007], Coq [The Coq Development Team], Epigram [McBride and McKinna 2004], and contemporary Haskell [Cheney and Hinze 2003; Xi et al. 2003], are beginning to make themselves useful. As with rope, the engineering benefits of type indexing sometimes outweigh the difficulties you can arrange with enough of it.

The blessing of expressing just the right type for the job can also be a curse. Where once we might have had a small collection of basic datatypes and a large library, we now must cope with a cornucopia of finely confected structures, subtly designed, subtly different. The basic vector equipment is much like that for lists, but we implement it separately, often retyping the same code. The Agda standard library [Danielsson 2010], for example, sports a writhing mass of list-like structures, including vectors, bounded-length lists, difference lists, reflexive-transitive closures—the list is petrifying. Here, we seek equipment to tame this gorgon’s head with reflection.

The business of belonging to a datatype is itself a notion relative to the type’s declaration. Most typed functional languages, including those with dependent types, feature a datatype declaration construct, external to and extending the language for defining values and programs. However, dependent type systems also allow us to reflect types as the image of a function from a set of ‘codes’—a universe construction [Martin-Löf 1984]. Computing with codes, we expose operations on and relationships between the types they reflect. Here, we adopt the universe as our guiding design principle. We abolish the datatype declaration construct, by reflecting it as a datatype of datatype descriptions which, moreover, describes itself. This apparently self-supporting construction is a trick, of course, but we shall show the art of it. We contribute

• a closed type theory, extensible only definitionally, nonetheless equipped with a universe of inductive families of datatypes;
• a self-encoding of the universe codes as a datatype in the universe—datatype generic programming is just programming;
• a bidirectional type propagation mechanism to conceal artefacts of the encoding, restoring a convenient presentation of data;
• examples of generic operations and constructions over our universe, notably the free monad construction;
• datatype generic programming delivered directly, not via some isomorphic model or ‘view’ of declared types.

We study two universes as a means to explore this novel way to equip a programming language with its datatypes. We warm up with a universe of simple datatypes, just sufficient to describe itself. Once we have learned this art, we scale up to indexed datatypes, encompassing the inductive families [Dybjer 1991; Luo 1994] found in Coq and Epigram, and delivering experiments in generic programming with applications to the datatype of codes itself. We aim to deliver proof of concept, showing that a closed theory with a self-encoding universe of datatypes can be made practicable, but we are sure there are bigger and better universes waiting for a similar treatment. Benke, Dybjer and Jansson [Benke et al. 2003] provide a useful survey of the possibilities, including extension to inductive-recursive definition, whose closed-form presentation [Dybjer and Setzer 1999, 2000] is both an inspiration for the present enterprise, and a direction for future study. The work of Morris, Altenkirch and Ghani [Morris 2007; Morris and Altenkirch 2009; Morris et al. 2009] on (indexed) containers has informed our style of encoding and the equipment we choose to develop, but the details here reflect pragmatic concerns about intensional properties which demand care in practice. We have thus been able to implement our work as the basis for datatypes in the Epigram 2 prototype [Brady et al.]. We have also developed a stratified model of our coding scheme in Agda and Coq.¹

¹ This model is available at http://personal.cis.strath.ac.uk/~dagand/levitate.tar.gz
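To make the opening example concrete, here is a minimal Haskell sketch of the length-indexed vectors the introduction alludes to. It is our illustration, not the paper's code; the names (Vec, Plus, vappend) are hypothetical:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

-- Natural numbers, promoted to the type level to index vectors.
data Nat = Z | S Nat

-- Vec n a: lists of a's whose length n is tracked in the type.
data Vec (n :: Nat) a where
  Nil  :: Vec 'Z a
  Cons :: a -> Vec n a -> Vec ('S n) a

-- The type alone rules out taking the head of an empty vector.
vhead :: Vec ('S n) a -> a
vhead (Cons x _) = x

-- Appending must track lengths: the same code as for lists, retyped.
type family Plus (m :: Nat) (n :: Nat) :: Nat where
  Plus 'Z     n = n
  Plus ('S m) n = 'S (Plus m n)

vappend :: Vec m a -> Vec n a -> Vec (Plus m n) a
vappend Nil         ys = ys
vappend (Cons x xs) ys = Cons x (vappend xs ys)
```

Note how vappend is character-for-character the list append, yet must be defined afresh for Vec; this duplication is exactly the cost the paper sets out to eliminate.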
2. The Type Theory

One challenge in writing this paper is to extricate our account of datatypes from what else is new in Epigram 2. In fact, we demand relatively little from the setup, so we shall start with a ‘vanilla’ theory and add just what we need. The reader accustomed to dependent types will recognise the basis of her favourite system; for those less familiar, we try to keep the presentation self-contained.

2.1 Base theory

We adopt a traditional presentation for our type theory, with three mutually defined systems of judgments: context validity, typing, and equality, with the following forms:

  Γ ⊢ VALID      Γ is a valid context, giving types to variables
  Γ ⊢ t : T      term t has type T in context Γ
  Γ ⊢ s ≡ t : T  s and t are equal at type T in context Γ

The rules are formulated to ensure that the following ‘sanity checks’ hold by induction on derivations

  Γ ⊢ t : T      ⇒  Γ ⊢ VALID  ∧  Γ ⊢ T : SET
  Γ ⊢ s ≡ t : T  ⇒  Γ ⊢ s : T  ∧  Γ ⊢ t : T

and that judgments J are preserved by well-typed instantiation.

  Γ; x : S; ∆ ⊢ J  ⇒  Γ ⊢ s : S  ⇒  Γ; ∆[s/x] ⊢ J[s/x]

We specify equality as a judgment, leaving open the details of its implementation, requiring only a congruence including ordinary computation (β-rules), decided, e.g., by testing α-equivalence of β-normal forms [Adams 2006]. Coquand and Abel feature prominently in a literature of richer equalities, involving η-expansion, proof-irrelevance and other attractions [Abel et al.; Coquand 1996]. Agda and Epigram 2 support such features; Coq currently does not, but they are surplus to requirements here.

Context validity ensures that variables inhabit well-formed sets.

  ⊢ VALID  (empty context)
  Γ ⊢ S : SET   x ∉ Γ  ⇒  Γ; x : S ⊢ VALID

The basic typing rules for tuples and functions are also standard, save that we locally adopt SET : SET for presentational purposes. Usual techniques to resolve this typical ambiguity apply [Courant 2002; Harper and Pollack; Luo 1994]. A formal treatment of stratification for our system is a matter of ongoing work.

  Γ ⊢ VALID  ⇒  Γ ⊢ SET : SET
  Γ; x : S; ∆ ⊢ VALID  ⇒  Γ; x : S; ∆ ⊢ x : S
  Γ ⊢ S ≡ T : SET   Γ ⊢ s : S  ⇒  Γ ⊢ s : T
  Γ ⊢ VALID  ⇒  Γ ⊢ 1 : SET
  Γ ⊢ VALID  ⇒  Γ ⊢ [] : 1
  Γ ⊢ S : SET   Γ; x : S ⊢ T : SET  ⇒  Γ ⊢ (x : S) × T : SET
  Γ ⊢ s : S   Γ; x : S ⊢ T : SET   Γ ⊢ t : T[s/x]  ⇒  Γ ⊢ [s, t]x.T : (x : S) × T
  Γ ⊢ p : (x : S) × T  ⇒  Γ ⊢ π0 p : S
  Γ ⊢ p : (x : S) × T  ⇒  Γ ⊢ π1 p : T[π0 p/x]
  Γ ⊢ S : SET   Γ; x : S ⊢ T : SET  ⇒  Γ ⊢ (x : S) → T : SET
  Γ ⊢ S : SET   Γ; x : S ⊢ t : T  ⇒  Γ ⊢ λS x. t : (x : S) → T
  Γ ⊢ f : (x : S) → T   Γ ⊢ s : S  ⇒  Γ ⊢ f s : T[s/x]

Notation. We subscript information needed for type synthesis but not type checking, e.g., the domain of a λ-abstraction, and suppress it informally where clear. Square brackets denote tuples, with a LISP-like right-nesting convention: [a b] abbreviates [a, [b, []]].

The judgmental equality comprises the computational rules below, closed under reflexivity, symmetry, transitivity and structural congruence, even under binders. We omit the mundane rules which ensure these closure properties for reasons of space.

  Γ ⊢ S : SET   Γ; x : S ⊢ t : T   Γ ⊢ s : S  ⇒  Γ ⊢ (λS x. t) s ≡ t[s/x] : T[s/x]
  Γ ⊢ s : S   Γ; x : S ⊢ T : SET   Γ ⊢ t : T[s/x]  ⇒  Γ ⊢ π0 [s, t]x.T ≡ s : S
  Γ ⊢ s : S   Γ; x : S ⊢ T : SET   Γ ⊢ t : T[s/x]  ⇒  Γ ⊢ π1 [s, t]x.T ≡ t : T[s/x]

Given a suitable stratification of SET, the computation rules yield a terminating evaluation procedure, ensuring the decidability of equality and thence type checking.
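To see the computation rules operationally, here is a toy Haskell evaluator for the λ/pair fragment, performing the β- and projection steps by substitution. This is our own sketch, ignoring types and variable capture entirely; all names are ours:

```haskell
data Term = V String | Lam String Term | App Term Term
          | Pair Term Term | P0 Term | P1 Term | TT

-- Naive substitution t[s/x]; we assume distinct bound names, so
-- capture-avoidance is elided.
subst :: String -> Term -> Term -> Term
subst x s (V y)      = if x == y then s else V y
subst x s (Lam y b)  = if x == y then Lam y b else Lam y (subst x s b)
subst x s (App f a)  = App (subst x s f) (subst x s a)
subst x s (Pair a b) = Pair (subst x s a) (subst x s b)
subst x s (P0 p)     = P0 (subst x s p)
subst x s (P1 p)     = P1 (subst x s p)
subst _ _ TT         = TT

-- Weak-head computation, one rule per judgmental equality above:
-- (λx. t) s ≡ t[s/x],  π0 [s, t] ≡ s,  π1 [s, t] ≡ t.
whnf :: Term -> Term
whnf (App f a) = case whnf f of
  Lam x b -> whnf (subst x a b)
  f'      -> App f' a
whnf (P0 p) = case whnf p of
  Pair a _ -> whnf a
  p'       -> P0 p'
whnf (P1 p) = case whnf p of
  Pair _ b -> whnf b
  p'       -> P1 p'
whnf t = t
```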
2.2 Finite enumerations of tags

It is time for our first example of a universe. You might want to offer a choice of named constructors in your datatypes: we shall equip you with sets of tags to choose from. Our plan is to implement (by extending the theory, or by encoding) the signature

  En : SET    #(E : En) : SET

where some value E : En in the ‘enumeration universe’ describes a type of tag choices #E. We shall need some tags—valid identifiers, marked to indicate that they are data, not variables scoped and substitutable—so we hardwire these rules:

  Γ ⊢ VALID  ⇒  Γ ⊢ Tag : SET
  Γ ⊢ VALID   s a valid identifier  ⇒  Γ ⊢ ’s : Tag

Let us describe enumerations as lists of tags, with signature:

  nE : En    cE (t : Tag) (E : En) : En

What are the values in #E? Formally, we represent the choice of a tag as a numerical index into E, via new rules:

  Γ ⊢ VALID  ⇒  Γ ⊢ 0 : #(cE t E)
  Γ ⊢ n : #E  ⇒  Γ ⊢ 1+n : #(cE t E)

However, we expect that in practice, you might rather refer to these values by tag, and we shall ensure that this is possible in due course. Enumerations come with further machinery. Each #E needs an eliminator, allowing us to branch according to a tag choice. Formally, whenever we need such new computational facilities, we add primitive operators to the type theory and extend the judgmental equality with their computational behavior. However, for compactness and readability, we shall write these operators as functional programs (much as we model them in Agda). We first define the ‘small product’ π operator:

  π : (E : En)(P : #E → SET) → SET
  π nE P       ↦ 1
  π (cE t E) P ↦ P 0 × π E (λx. P (1+x))

This builds a right-nested tuple type, packing a P i value for each i in the given domain. The step case exposes our notational convention that binders scope rightwards as far as possible. These tuples are ‘jump tables’, tabulating dependently typed functions. We give this functional interpretation—the eliminator we need—by the switch operator, which, unsurprisingly, iterates projection:

  switch : (E : En)(P : #E → SET) → π E P → (x : #E) → P x
  switch (cE t E) P b 0     ↦ π0 b
  switch (cE t E) P b (1+x) ↦ switch E (λx. P (1+x)) (π1 b) x

The π and switch operators deliver dependent elimination for finite enumerations, but are rather awkward to use directly. We do not write the range for a λ-abstraction, so it is galling to supply P for functions defined by switch. Let us therefore find a way to recover the tedious details of the encoding from types.
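For intuition, here is a deliberately non-dependent Haskell model of the enumeration kit: tags, numerical indices, and switch as lookup in a jump table. The homogeneous list [a] stands in for the dependent small product π E P; all names are our own:

```haskell
-- Enumerations as lists of tags; an inhabitant of #E is an index into E.
type Tag = String
type En  = [Tag]

-- switch walks the enumeration and the jump table together,
-- iterating "projection" just as in the paper's definition.
switch :: En -> [a] -> Int -> a
switch (_ : _) (b : _)  0 = b
switch (_ : e) (_ : bs) n = switch e bs (n - 1)
switch _       _        _ = error "tag index out of range"

-- Checking a tag against an enumeration recovers its numerical index.
tagIndex :: En -> Tag -> Maybe Int
tagIndex e t = lookup t (zip e [0 ..])

-- Example: switch ["red", "blue"] ["stop", "go"] 1  ==  "go"
```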
2.3 Type propagation

Our approach to tidying the coding cruft is deeply rooted in the bidirectional presentation of type checking from Pierce and Turner [Pierce and Turner 1998]. They divide type inference into two communicating components. In type synthesis, types are pulled out of terms. A typical example is a variable in the context:

  Γ; x : S; ∆ ⊢ VALID  ⇒  Γ; x : S; ∆ ⊢ x : S

Because the context stores the type of the variable, we can extract the type whenever the variable is used. On the other hand, in the type checking phase, types are pushed into terms. We are handed a type together with a term; our task consists of checking that the type admits the term. In doing so, we can and should use the information provided by the type. Therefore, we can relax our requirements on the term. Consider λ-abstraction:

  Γ ⊢ S : SET   Γ; x : S ⊢ t : T  ⇒  Γ ⊢ λS x. t : (x : S) → T

The official rules require an annotation specifying the domain. However, in type checking, the Π-type we push in determines the domain, so we can drop the annotation. We adapt this idea, yielding a type propagation system, whose purpose is to elaborate compact expressions into the terms of our underlying type theory, much as in the definition of Epigram 1 [McBride and McKinna 2004]. We divide expressions into two syntactic categories: exprIn, into which types are pushed, and exprEx, from which types are extracted. In the bidirectional spirit, the exprIn are subject to type checking, while the exprEx—variables and elimination forms—admit type synthesis. We embed exprEx into exprIn, demanding that the synthesised type coincides with the type proposed. The other direction—only necessary to apply abstractions or project from pairs—takes a type annotation.

Type synthesis (Fig. 1) is the source of types. It follows the exprEx syntax, delivering both the elaborated term and its type. Terms and expressions never mix: e.g., for application, we instantiate the range with the term delivered by checking the argument expression. Hardwired operators are checked as variables.

Judgment form: Γ ⊢ exprEx ▷ term ∈ type

  Γ ⊢ SET ∋ T ▷ T′   Γ ⊢ T′ ∋ t ▷ t′  ⇒  Γ ⊢ (t : T) ▷ t′ ∈ T′
  Γ; x : S; ∆ ⊢ VALID  ⇒  Γ; x : S; ∆ ⊢ x ▷ x ∈ S
  Γ ⊢ f ▷ f′ ∈ (x : S) → T   Γ ⊢ S ∋ s ▷ s′  ⇒  Γ ⊢ f s ▷ f′ s′ ∈ T[s′/x]
  Γ ⊢ p ▷ p′ ∈ (x : S) × T  ⇒  Γ ⊢ π0 p ▷ π0 p′ ∈ S
  Γ ⊢ p ▷ p′ ∈ (x : S) × T  ⇒  Γ ⊢ π1 p ▷ π1 p′ ∈ T[π0 p′/x]

  Figure 1. Type synthesis

Dually, type checking judgments (Fig. 2) are sinks for types. From an exprIn and a type pushed into it, they elaborate a low-level term, extracting information from the type. Note that we inductively ensure the following ‘sanity checks’:

  Γ ⊢ e ▷ t ∈ T  ⇒  Γ ⊢ t : T
  Γ ⊢ T ∋ e ▷ t  ⇒  Γ ⊢ t : T

Judgment form: Γ ⊢ type ∋ exprIn ▷ term

  Γ ⊢ s ▷ s′ ∈ S   Γ ⊢ S ≡ T : SET  ⇒  Γ ⊢ T ∋ s ▷ s′
  Γ ⊢ VALID  ⇒  Γ ⊢ SET ∋ SET ▷ SET
  Γ ⊢ SET ∋ S ▷ S′   Γ; x : S′ ⊢ SET ∋ T ▷ T′  ⇒  Γ ⊢ SET ∋ (x : S) → T ▷ (x : S′) → T′
  Γ; x : S ⊢ T ∋ t ▷ t′  ⇒  Γ ⊢ (x : S) → T ∋ λx. t ▷ λS x. t′
  Γ ⊢ SET ∋ S ▷ S′   Γ; x : S′ ⊢ SET ∋ T ▷ T′  ⇒  Γ ⊢ SET ∋ (x : S) × T ▷ (x : S′) × T′
  Γ ⊢ S ∋ s ▷ s′   Γ ⊢ T[s′/x] ∋ t ▷ t′  ⇒  Γ ⊢ (x : S) × T ∋ [s, t] ▷ [s′, t′]x.T
  Γ ⊢ (x : S) → (y : T) → U[[x, y]x.T/p] ∋ f ▷ f′  ⇒  Γ ⊢ (p : (x : S) × T) → U ∋ ∧f ▷ λ(x:S)×T p. f′ (π0 p) (π1 p)
  Γ ⊢ VALID  ⇒  Γ ⊢ SET ∋ 1 ▷ 1
  Γ ⊢ VALID  ⇒  Γ ⊢ 1 ∋ [] ▷ []
  Γ ⊢ VALID  ⇒  Γ ⊢ En ∋ [] ▷ nE
  Γ ⊢ En ∋ E ▷ E′  ⇒  Γ ⊢ En ∋ [’t, E] ▷ cE ’t E′
  Γ ⊢ E : En  ⇒  Γ ⊢ #(cE ’t E) ∋ ’t ▷ 0
  Γ ⊢ #E ∋ ’t ▷ n   ’t ≠ ’t′  ⇒  Γ ⊢ #(cE ’t′ E) ∋ ’t ▷ 1+n
  Γ ⊢ E : En  ⇒  Γ ⊢ #(cE ’t E) ∋ 0 ▷ 0
  Γ ⊢ #E ∋ n ▷ n′  ⇒  Γ ⊢ #(cE ’t′ E) ∋ 1+n ▷ 1+n′
  Γ ⊢ π E (λ#E x. T) ∋ ~t ▷ t′  ⇒  Γ ⊢ (x : #E) → T ∋ ~t ▷ switch E (λ#E x. T) t′

  Figure 2. Type checking

Canonical set-formers are checked: we could exploit SET : SET to give them synthesis rules, but this would prejudice our future stratification plans. Note that abstraction and pairing are free of annotation, as promised. Most of the propagation rules are unremarkably structural: we have omitted some mundane rules which just follow the pattern, e.g., for Tag. However, we also add abbreviations. We write ∧f, pronounced ‘uncurry f’, for the function which takes a pair and feeds it to f one component at a time, letting us name them individually.

Now, for the finite enumerations, we go to work. Firstly, we present the codes for enumerations as right-nested tuples which, by our LISP convention, we write as unpunctuated lists of tags [’t0 . . . ’tn]. Secondly, we can denote an element by its name: the type pushed in allows us to recover the numerical index. We retain the numerical forms to facilitate generic operations and ensure that shadowing is punished fittingly, not fatally. Finally, we express functions from enumerations as tuples. Any tuple-form, [] or [_, _], is accepted by the function space—the generalised product—if it is accepted by the small product. Propagation fills in the appeal to switch, copying the range information. Our interactive development tools also perform the reverse transformation for intelligible output. The encoding of any specific enumeration is thus hidden by these translations. Only, and rightly, in enumeration-generic programs is the encoding exposed.

Our type propagation mechanism does no constraint solving, just copying, so it is just the thin end of the elaboration wedge. It can afford us this ‘assembly language’ level of civilisation as the En universe specifies not only the representation of the low-level values in each set as bounded numbers, but also the presentation of these values as high-level tags. To encode only the former, we should merely need the size of enumerations, but we extract more work from these types by making them more informative. We have also, en passant, distinguished enumerations which have the same cardinality but describe distinct notions: #[’red ’blue] is not #[’green ’orange].
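The bidirectional discipline of Figures 1 and 2 can be tasted in miniature. The following Haskell sketch implements check and infer for a tiny simply-typed fragment. It is our simplification, not the paper's elaborator: types are pushed into λ-abstractions, which therefore need no domain annotation, and pulled out of variables and applications, with checking falling back on synthesis exactly as in the embedding rule:

```haskell
data Ty = Unit | Arr Ty Ty deriving Eq

data Tm = Var String     -- synthesis: type pulled from the context
        | App Tm Tm      -- synthesis
        | Lam String Tm  -- checking: domain comes from the pushed type
        | TT             -- checking: the unit value

type Ctx = [(String, Ty)]

-- Type checking: push a type into a term.
check :: Ctx -> Ty -> Tm -> Bool
check _   Unit      TT        = True
check ctx (Arr s t) (Lam x b) = check ((x, s) : ctx) t b
check ctx ty        e         = infer ctx e == Just ty  -- embed synthesis

-- Type synthesis: pull a type out of a term.
infer :: Ctx -> Tm -> Maybe Ty
infer ctx (Var x)   = lookup x ctx
infer ctx (App f a) = case infer ctx f of
  Just (Arr s t) | check ctx s a -> Just t
  _                              -> Nothing
infer _   _         = Nothing
```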
3. A Universe of Inductive Datatypes

In this section, we describe an implementation of inductive types, as we know them from ML-like languages. By working with familiar datatypes, we hope to focus on the delivery mechanism, warming up gently to the indexed datatypes we really want. Our universe echoes Dybjer and Setzer’s closed formulation of induction-recursion [Dybjer and Setzer 1999], but without the ‘-recursion’. An impredicative Church-style encoding of datatypes is not adequate for dependently typed programming: although such encodings present data as non-dependent eliminators, they do not support dependent induction [Geuvers 2001]. Whilst the λ-calculus captures all that data can do, it cannot ultimately delimit all that data can be.

3.1 The power of Σ

In dependently typed languages, Σ-types can be interpreted as two different generalisations. This duality is reflected in the notation we can find in the literature. The notation Σx:A (B x) stresses that Σ-types are ‘dependent sums’, generalising sums over arbitrary arities, where simply typed languages have finite sums. On the other hand, our choice, (x : A) × (B x), emphasises that Σ-types generalise products, with the type of the second component depending on the value of the first. Simply typed languages do not express such relative validity.

In ML-like languages, datatypes are presented as a sum-of-products. A datatype is defined by a finite sum of constructors, each carrying a product of arguments. To embrace these datatypes, we have to capture this grammar. With dependent types, the notion of sum-of-products translates into sigmas-of-sigmas.

3.2 The universe of descriptions

While sigmas-of-sigmas can give a semantics for the sum-of-products structure in each node of the tree-like values in a datatype, we need to account for the recursive structure which ties these nodes together. We do this by constructing a universe [Martin-Löf 1984]. Universes are ubiquitous in dependently typed programming [Benke et al. 2003; Oury and Swierstra 2008], but here we take them as the foundation of our notion of datatypes. To add inductive types to our type theory, we build a universe of datatype descriptions by implementing the signature presented in Figure 3, with codes mimicking the grammar of datatype declarations.

  Descⁿ : SETⁿ⁺¹
  ’1 : Descⁿ
  ’Σ (S : SETⁿ) (D : S → Descⁿ) : Descⁿ
  ’ind× (D : Descⁿ) : Descⁿ

  ⟦_⟧ : Descⁿ → SETⁿ → SETⁿ
  ⟦’1⟧ X      ↦ 1
  ⟦’Σ S D⟧ X  ↦ (s : S) × ⟦D s⟧ X
  ⟦’ind× D⟧ X ↦ X × ⟦D⟧ X

  Figure 3. Universe of Descriptions

We can read a description D : Descⁿ as a ‘pattern functor’ on SETⁿ, with ⟦D⟧ its action on an object, X, soon to be instantiated recursively. The superscripts indicate the SET-levels at which we expect these objects in a stratified system. This is but an informal notation, to give a flavour of the stratified presentation. Note that the functors so described are strictly positive, by construction.

Descriptions are sequential structures ending in ’1, indicating the empty tuple. To build sigmas-of-sigmas, we provide a ’Σ code, interpreted as a Σ-type. To request a recursive component, we have ’ind× D, where D describes the rest of the node. These codes give us sigmas-of-sigmas with recursive places. An equivalent, more algebraic presentation could be given, as illustrated in Section 5.

We admit to being a little coy, writing of ‘implementing a signature’ without clarifying how. A viable approach would simply be to extend the theory with constants for the constructors and an operator for ⟦_⟧. In Section 4, you will see what we do instead. Meanwhile, let us gain some intuition by developing examples.

3.3 Examples

We begin with the natural numbers, now working in the high-level expression language of Section 2.3, exploiting type propagation.

  NatD : Descⁿ
  NatD ↦ ’Σ #[’zero ’suc] [’1 (’ind× ’1)]

Let us explain its construction. First, we use ’Σ to give a choice between the ’zero and ’suc constructors. What follows depends on this choice, so we write the function computing the rest of the description in tuple notation. In the ’zero case, we reach the end of the description. In the ’suc case, we attach one recursive argument and close the description. Translating the Σ to a binary sum, we have effectively described the functor:

  NatD Z ↦ 1 + Z

Correspondingly, we can see the injections to the sum:

  [’zero] : ⟦NatD⟧ Z        [’suc (z : Z)] : ⟦NatD⟧ Z

The pattern functor for lists needs but a small change:

  ListD : SETⁿ → Descⁿ
  ListD X ↦ ’Σ #[’nil ’cons] [’1 (’Σ X λ_. ’ind× ’1)]

The ’suc constructor becomes ’cons, taking an X followed by a recursive argument. This code describes the following functor:

  ListD X Z ↦ 1 + X × Z

Of course, we are not limited to one recursive argument. Here are the node-labelled binary trees:

  TreeD : SETⁿ → Descⁿ
  TreeD X ↦ ’Σ #[’leaf ’node] [’1 (’ind× (’Σ X λ_. ’ind× ’1))]

Again, we are one evolutionary step away from ListD. However, instead of a single call to the induction code, we add another. The interpretation of this code corresponds to the following functor:

  TreeD X Z ↦ 1 + Z × X × Z

From the examples above, we observe that datatypes are defined by a ’Σ whose first argument enumerates the constructors. We call codes fitting this pattern tagged descriptions. Again, this is a clear reminder of the sum-of-products style. Any description can be forced into this style with a singleton constructor set. We characterise tagged descriptions thus:

  TagDescⁿ : SETⁿ⁺¹
  TagDescⁿ ↦ (E : En) × (π E λ_. Descⁿ)

  de : TagDescⁿ → Descⁿ
  de ↦ ∧λE. λD. ’Σ #E (switch E (λ_. Descⁿ) D)

It is not such a stretch to expect that the familiar datatype declaration might desugar to the definitions of a tagged description.
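As a concrete, necessarily simplified rendering, here is a Haskell sketch of such a universe, restricted to finite sums so that ’Σ needs no type dependency. The datatype names and the conforms checker are ours; in the paper the type system itself enforces what conforms merely tests:

```haskell
-- Descriptions: '1, 'Σ over a finite enumeration, and 'ind×.
data Desc = One          -- end of the description
          | Sum [Desc]   -- choice among finitely many constructors
          | Ind Desc     -- one recursive argument, then the rest

-- Generic data: a value of ⟦D⟧ x for some description D.
data Val x = VUnit | VCon Int (Val x) | VRec x (Val x)

-- Tying the knot: µD.
newtype Mu = In (Val Mu)

-- Does a value conform to description d, with recursive positions
-- checked against the top-level description top?
conforms :: Desc -> Desc -> Val Mu -> Bool
conforms _   One      VUnit           = True
conforms top (Sum ds) (VCon i v)      = i < length ds && conforms top (ds !! i) v
conforms top (Ind d)  (VRec (In w) v) = conforms top top w && conforms top d v
conforms _   _        _               = False

-- NatD, 'zero and 'suc in this model:
natD :: Desc
natD = Sum [One, Ind One]

zero :: Mu
zero = In (VCon 0 VUnit)

suc :: Mu -> Mu
suc n = In (VCon 1 (VRec n VUnit))

-- e.g. case suc zero of In v -> conforms natD natD v  ==  True
```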
3.4 The least fixpoint

So far, we have built pattern functors with our Desc universe. Being polynomial functors, they all admit a least fixpoint, which we now construct by tying the knot: the element type abstracted by the functor is now instantiated recursively:

  Γ ⊢ D : Descⁿ  ⇒  Γ ⊢ µD : SETⁿ
  Γ ⊢ D : Descⁿ   Γ ⊢ d : ⟦D⟧ (µD)  ⇒  Γ ⊢ con d : µD

Tagged descriptions are very common, so we abbreviate:

  µ⁺ : TagDescⁿ → SETⁿ
  µ⁺ T ↦ µ (de T)

We can now build datatypes and their elements, e.g.:

  Nat ↦ µ⁺ [[’zero ’suc], [’1 (’ind× ’1)]] : SETⁿ
  con [’zero] : Nat        con [’suc (n : Nat)] : Nat

But how shall we compute with our data? We should expect an elimination principle. Following a categorical intuition, we might provide the ‘fold’, or ‘iterator’, or ‘catamorphism’:

  cata : (D : Descⁿ)(T : SETⁿ) → (⟦D⟧ T → T) → µD → T

However, iteration is inadequate for dependent computation. We need induction to write functions whose type depends on inductive data. Following Benke et al. [2003], we adopt the following:

  ind : (D : Descⁿ)(P : µD → SETᵏ) →
        ((d : ⟦D⟧ (µD)) → All D (µD) P d → P (con d)) →
        (x : µD) → P x
  ind D P m (con d) ↦ m d (all D (µD) P (ind D P m) d)

Here, All D X P d states that P : X → SETᵏ holds for every subobject x : X in D, and all D X P p d is a ‘dependent map’, applying some p : (x : X) → P x to each x contained in d. The definition (including an extra case, introduced soon) is in Figure 4.² So, ind is our first operation generic over descriptions, albeit hardwired. Any datatype we define comes with induction. Note that the very same functors ⟦D⟧ also admit greatest fixpoints: we have indeed implemented coinductive types this way, but that is another story.

  All : (D : Descⁿ)(X : SETⁿ)(P : X → SETᵏ)(xs : ⟦D⟧ X) → SETᵏ
  All ’1 X P []               ↦ 1
  All (’Σ S D) X P [s, d]     ↦ All (D s) X P d
  All (’ind× D) X P [x, d]    ↦ P x × All D X P d
  All (’hind× H D) X P [f, d] ↦ ((h : H) → P (f h)) × All D X P d

  all : (D : Descⁿ)(X : SETⁿ)(P : X → SETᵏ)(p : (x : X) → P x)(xs : ⟦D⟧ X) → All D X P xs
  all ’1 X P p []               ↦ []
  all (’Σ S D) X P p [s, d]     ↦ all (D s) X P p d
  all (’ind× D) X P p [x, d]    ↦ [p x, all D X P p d]
  all (’hind× H D) X P p [f, d] ↦ [λh. p (f h), all D X P p d]

  Figure 4. Defining and collecting inductive hypotheses

² To pass the termination checker, we had to inline the definition of all into ind in our Agda model. A simulation argument shows that the definition presented here terminates if the inlined version does. Hence, although not directly structural, this definition is indeed terminating.

3.5 Extending type propagation

We now have low-level machinery to build and manipulate inductive types. Let us apply cosmetic surgery to reduce the syntactic overhead. We extend type checking of expressions:

  Γ ⊢ #E ∋ ’c ▷ n   Γ ⊢ ⟦D n⟧ (µ(’Σ #E D)) ∋ ~t ▷ t′  ⇒  Γ ⊢ µ(’Σ #E D) ∋ ’c ~t ▷ con [n, t′]

Here ’c ~t denotes a tag ‘applied’ to a sequence of arguments, and ~t that sequence’s repackaging as a right-nested tuple. Now we can just write data directly.

  ’zero : Nat        ’suc (n : Nat) : Nat

Once again, the type explains the legible presentation, as well as the low-level representation. We may also simplify appeals to induction by type propagation, as we have done with functions from pairs and enumerations.

  Γ ⊢ (d : ⟦D⟧ (µD)) → All D (µD) (λµD x. P) d → P[con d/x] ∋ f ▷ f′
    ⇒  Γ ⊢ (x : µD) → P ∋ f ▷ ind D (λµD x. P) f′

This abbreviation is no substitute for the dependent pattern matching to which we are entitled in a high-level language built on top of this theory [Goguen et al. 2006]. It does at least make ‘assembly language’ programming mercifully brief, albeit hieroglyphic.

  plus : Nat → Nat → Nat
  plus ↦ ∧[ (λ_. λ_. λy. y)
            (λ_. ∧λh. λ_. λy. ’suc (h y)) ]

This concludes our introduction to the universe of datatype descriptions. We have encoded sum-of-products datatypes from the simply-typed world as data and equipped them with computation. We have also made sure to hide the details by type propagation.

4. Levitating the Universe of Descriptions

In this section, we will fulfil our promises and show how we implement the signatures, first for the enumerations, and then for the codes of the Descⁿ universe. Persuading these programs to perform was a perilous pedagogical peregrination for the protagonist. Our method was indeed to hardwire constants implementing the signatures specified above, in the first instance, but then attempt to replace them, step by step, with definitions: “Is 2 + 2 still 4?”, “No, it’s a loop!”. But we did find a way, so now we hope to convey to you the dizzy feeling of levitation, without the falling.

4.1 Implementing finite enumerations

In Section 2.2, we specified the finite sets of tags. We are going to implement (at every universe level) the En type former and its constructors. Recall:

  En : SETⁿ    nE : En    cE (t : Tag) (E : En) : En

The nE and cE constructors are just the ‘nil’ and ‘cons’ of ordinary lists, with elements from Tag. Therefore, we implement:

  En ↦ µ (ListD Tag)    nE ↦ ’nil    cE t E ↦ ’cons t E

Let us consider the consequences. We find that the type theory does not need a special type former En, or special constructors nE and cE. Moreover, the π E P operator, computing tuple types of Ps by recursion on E, need not be hardwired: we can just use the generic ind operator, as we would for any ordinary program. Note, however, that the universe decoder #E is hardwired, as are the primitive 0 and 1+ that we use for low-level values, and indeed the switch operator. We cannot dispose of data altogether! We have, however, gained the ordinariness of the enumeration codes, and hence of generic programs which manipulate them. Our next step is similar: we are going to condense the entire naming scheme of datatypes into itself.

4.2 Implementing descriptions

The set of codes, Desc, is already some sort of datatype; as with En, we ought to be able to describe it, coding Descⁿ in Descⁿ⁺¹, spiralling upwards. Hence, this code would be a first-class citizen, born with the generic equipment of datatypes.
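Before the first attempt, it may help to see §4.1's move in the toy Haskell setting of the earlier sketches: the enumeration type becomes ordinary data, the fixpoint of a list pattern functor, and its eliminators become ordinary folds. Again a sketch under our own simplifying assumptions, with names of our choosing:

```haskell
-- ListD Tag as a Haskell pattern functor, and En as its least fixpoint:
-- the "type theory" no longer needs a special En former.
data ListF tag e = NilF | ConsF tag e   -- ⟦ListD Tag⟧ E
newtype Fix f = Fix (f (Fix f))

type Tag = String
type En  = Fix (ListF Tag)

nE :: En
nE = Fix NilF                           -- nE ↦ 'nil

cE :: Tag -> En -> En
cE t e = Fix (ConsF t e)                -- cE t E ↦ 'cons t E

-- π-like tabulation over an enumeration is now an ordinary fold,
-- not a hardwired primitive.
foldEn :: r -> (Tag -> r -> r) -> En -> r
foldEn n _ (Fix NilF)        = n
foldEn n c (Fix (ConsF t e)) = c t (foldEn n c e)
```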
4.2.1 First attempt

Our first attempt gets stuck quite quickly:

  DescDⁿ : Descⁿ⁺¹
  DescDⁿ ↦ de [ [’1 ’Σ ’ind×]
                [ ’1
                  (’Σ SETⁿ λS. {?})
                  (’ind× ’1) ] ]

Let us explain where we stand. Much as we have done so far, we first offer a constructor choice from ’1, ’Σ, and ’ind×. You may notice that the ‘tagged’ notation we have used for the Descⁿ constructors now fits the facts: these were actually the tags we are defining. For ’1, we immediately reach the end of the description. For ’ind×, there is a single recursive argument. Describing ’Σ is problematic. Recall the specification of ’Σ:

  ’Σ (S : SETⁿ) (D : S → Descⁿ) : Descⁿ

So, we first pack a SETⁿ, S, as well we might when working in Descⁿ⁺¹. We should then like a recursive argument indexed by S, but that is an exponential, and our presentation so far delivers only sums-of-products. To code our universe, we must first enlarge it!
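The enlargement that the second attempt is about to make can be previewed in the toy Haskell model: a higher-order code interpreted as a function space into the payload. For simplicity we fix the index set to a single type parameter h; this is our approximation, not the paper's ’hind×:

```haskell
-- Descriptions gain a higher-order code whose recursive positions
-- are indexed by a type h (all names ours).
data Desc h = One | Sum [Desc h] | Ind (Desc h) | HInd (Desc h)

-- Generic values gain a function-valued slot for HInd:
-- ⟦HInd D⟧ X  ≈  (h -> X) × ⟦D⟧ X in the paper's notation.
data Val h x = VUnit
             | VCon Int (Val h x)
             | VRec x (Val h x)
             | VFun (h -> x) (Val h x)

newtype Mu h = In (Val h (Mu h))

-- Ind is the first-order special case of HInd with a unit index:
-- VRec x d behaves like VFun (\() -> x) d, but keeps data finitary.
```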
(D : Descn )(X : S ETn )(P : X → S ETk ) (xs : JDK X) → S ETk All ’1 X P [] 7→ 1 All (’Σ S D) X P [s, d] 7→ All (D s) X P d All (’ind× D) X P [x, d] 7→ P x × All D X P d All (’hind× H D) X P [f , d] 7→ ((h : H) → P (f h)) × All D X P d All :
(D : Descn )(X : S ETn )(P : X → S ETk ) (p : (x : X) → P x)(xs : JDK X) → All D X P xs all ’1 X P p [] 7→ [] all (’Σ S D) X P p [s, d] 7→ all (D s) X P p d all (’ind× D) X P p [x, d] 7→ [p x, all D X P p d] all (’hind× H D) X P p [f , d] 7→ [λh. p (f h), all D X P p d] all :
Figure 4. Defining and collecting inductive hypotheses For ’ind×, there is a single recursive argument. Describing ’Σ is problematic. Recall the specification of ’Σ:
Expanding de − and propagating types as in Figure 2 reveals the awful truth:
’Σ (S : S ETn ) (D : S → Descn ) : Descn
Descn 7→ µ(’Σ #[’1 ’Σ ’ind× ’hind×] switch [’1 ’Σ ’ind× ’hind×] (λ_. Descn+1 )
’1 n ’Σ S ET λS. ’hind× S ’1 ) ’ind× ’1 n ’Σ S ET λ_. ’ind× ’1
So, we first pack a S ETn , S, as well we might when working in Descn+1 . We should then like a recursive argument indexed by S, but that is an exponential, and our presentation so far delivers only sums-of-products. To code our universe, we must first enlarge it! 4.2.2
The recursion shows up only because we must specify the return type of the general-purpose switch, and it is computing a Descn+1 ! Although type propagation allows us to hide this detail when defining a function, we cannot readily suppress this information and check types when switch is fully applied. We are too close to give up now. If only we did not need to supply that return type, especially when we know what it must be! We eliminate the recursion by specialising switch:
Second attempt
In order to capture a notion of higher-order induction, we add a code ’hind× that takes an indexing set H. This amounts to give a recursive subobject for each element of H. ’hind× (H : S ETn ) (D : Descn ) : Descn J’hind× H DK X 7→ (H → X) × JDK X
switchD : (E : En) →(π E λ_. Descm ) → #E → Descm
The magician’s art rests here, in this extension. We conceal it behind a type propagation rule for switchD which we apply with higher priority than for switch in general. Γ π E λ#E x. Descm 3 ~t . t0 Γ #E → Descm 3 ~t . switchD E t0
Note that up to isomorphism, ’ind× is subsumed by ’hind× 1 . However, the apparent duplication has some value. Unlike its counterpart, ’ind× is first-order: we prefer not to demand dummy functions from 1 in ordinary data, e.g. ’suc (λ_. n). It is naïve to imagine that up to isomorphism, any representation of data will do. Firstorder representations are finitary by construction, and thus admit a richer, componentwise decidable equality than functions may in general possess.3 We are now able to describe our universe of datatypes:
As a consequence, our definition above now propagates without introducing recursion. Of course, by pasting together the declaration of Descn and its internal copy, we have made it appear in its own type. Hardwired as a trusted fait accompli, this creates no regress, although one must assume the definition to recheck it. Our Agda model does not formalise the switchD construction. Instead, we exhibit the isomorphism between declared and encoded descriptions. Here, switchD lets us collapse this isomorphism, operationally identifying defined and coded descriptions. There are other ways to achieve a sufficient specialisation to avoid a recursive code, e.g., extending Descn with specialised codes for finite sums and products, pushing the switch into the interpretation of codes, rather than the code itself. Here, we prefer not to add codes to Descn which are otherwise unmotivated. We have levitated Desc at every level. Beyond its pedagogical value, this exercise has several practical outcomes. First, it confirms that each Desc universe is just plain data. As any piece of data, it can therefore be inspected and manipulated. Moreover, it is expressed in a Desc universe. As a consequence, it is equipped, for free, with an induction principle. So, our ability to inspect and program with Desc is not restricted to a meta-language: we have the necessary equipment to program with data, so we can program over datatypes. Generic programming is just programming.
DescDn : Descn+1
’1 ’1 ’Σ ’Σ S ETn λS. ’hind× S ’1 DescDn 7→ de , ’ind× ’ind× ’1 ’hind× ’Σ S ETn λ_. ’ind× ’1 The ’1 and ’ind× cases remain unchanged, as expected. We successfully describe the ’Σ case via the higher-order induction, branching on S. The ’hind× case just packs a S ETn with a recursive argument. At a first glance, we have achieved our goal. We have described the codes of the universe of descriptions. The fixpoint of JDescDn K is a datatype just like Descn , in S ETn+1 . Might we be so bold as to take Descn 7→ µDescDn as the levitating definition? If we do, we shall come down with a bump! To complete our levitation, just as in the magic trick, requires hidden assistance. Let us explain the problem and reveal the ‘invisible cable’ which fixes it. 4.2.3
Final move
The definition Descn 7→ µDescDn is circular, but the offensive recursion is concealed by a prestidigitation.
4.3
The generic catamorphism
In Section 3.4, we hardwired a dependent induction principle, but sometimes iteration suffices. Let us construct the catamorphism. We proceed by induction on the data in µD: the non-dependent return type T is readily propagated. Given a node xs and the induction hypotheses, the method ought to build an element of T. Provided that we know how to make an element of ⟦D⟧ T, this step will be performed by the algebra f. Let us take a look at this jigsaw:
cata : (D : Desc)(T : SET) → (⟦D⟧ T → T) → µD → T
cata D T f ↦ ind D (λ_. T) (λxs. λhs. f {?})
The hole remains: we have xs : JDK µD and hs : All D µD (λ_. T) xs to hand, and we need a JDK T. Now, xs has the right shape, but its components have the wrong type. However, for each such component, hs holds the corresponding value in T. We need a function to replace the former with the latter: this pattern matching sketch yields an induction on D. We fill the hole with replace D (µD) T xs hs.
Object          Role                        Status
En              Build finite sets           Levitated
Desc            Describe pattern functors   Levitated
⟦_⟧             Interpret descriptions      Hardwired
µ, con          Define, inhabit fixpoints   Hardwired
ind, All, all   Induction principle         Hardwired

Table 1. Summary of constructions on Descriptions
replace : (D : Desc)(X, Y : SET)(xs : ⟦D⟧ X) → All D X (λ_. Y) xs → ⟦D⟧ Y
replace ’1 X Y [] []                    ↦ []
replace (’Σ S D) X Y [s, d] d′          ↦ [s, replace (D s) X Y d d′]
replace (’ind× D) X Y [x, d] [y, d′]    ↦ [y, replace D X Y d d′]
replace (’hind× H D) X Y [f, d] [g, d′] ↦ [g, replace D X Y d d′]
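For readers who prefer Haskell, the same construction for ordinary pattern functors is a two-liner; in this sketch of ours, fmap plays precisely the role of replace:

  newtype Mu f = In (f (Mu f))

  -- Replace each recursive position by its computed T-value (fmap),
  -- then apply the algebra: the catamorphism, as derived above.
  cata :: Functor f => (f t -> t) -> Mu f -> t
  cata alg (In xs) = alg (fmap (cata alg) xs)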
We have shown how to derive a generic operation, cata, from a pre-existing generic operation, ind, by manipulating descriptions as data: the catamorphism is just a function taking each Desc value to a datatype-specific operation. This is polytypic programming, as in PolyP [Jansson and Jeuring 1997], made ordinary.

4.4 The generic free monad
In this section, we try a more ambitious generic operation. Given a functor—a signature of operations represented as a tagged description—we build its free monad, extending the signature with variables and substitution. Let us recall this construction in, say, Haskell. Given a functor f, the free monad over f is given thus:

data FreeMonad f x = Var x | Op (f (FreeMonad f x))

Provided f is an instance of Functor, we may take Var for return and use f’s fmap to define >>= as substitution. Being an inductive type, FreeMonad arises from a pattern functor:

FreeMonadD F X Z ↦ X + F Z
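Spelling that remark out as Haskell instances (our sketch, including the Functor/Applicative boilerplate that modern GHC requires):

  data FreeMonad f x = Var x | Op (f (FreeMonad f x))

  instance Functor f => Functor (FreeMonad f) where
    fmap g (Var x) = Var (g x)
    fmap g (Op t)  = Op (fmap (fmap g) t)

  instance Functor f => Applicative (FreeMonad f) where
    pure = Var
    mf <*> mx = mf >>= \f -> fmap f mx

  instance Functor f => Monad (FreeMonad f) where
    Var x >>= s = s x                   -- substitute at a variable
    Op t  >>= s = Op (fmap (>>= s) t)   -- push the substitution under the signature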
Our construction takes the functor as a tagged description and, given a set X of variables, computes the tagged description of the free monad’s pattern functor:

_∗ : TagDesc → SET → TagDesc
[E, D] ∗ X ↦ [ [’var, E] , [’Σ X ’1, D] ]

We simply add a constructor, ’var, making its argument ’Σ X ’1—just an element of X. E and D stay put, leaving the other constructors unchanged. Unfolding the interpretation of this definition, we find an extended sum, corresponding to the X + in FreeMonadD. Taking the fixpoint ties the knot and we have our data. Now we need the operations. As expected, λx. ’var x plays the rôle of return, making variables terms. Meanwhile, bind is indeed substitution, which we now implement generically, making use of cata. Let us write the type, and start filling in the blanks:
subst : (D : TagDesc)(X, Y : SET) → (X → µ⁺(D ∗ Y)) → µ⁺(D ∗ X) → µ⁺(D ∗ Y)
subst D X Y σ ↦ cata (de (D ∗ X)) (µ⁺(D ∗ Y)) {?}

We are left with implementing the algebra of the catamorphism. Its role is to catch appearances of ’var x and replace them with σ x. This corresponds to the following definition:

apply : (D : TagDesc)(X, Y : SET) → (X → µ⁺(D ∗ Y)) → ⟦de (D ∗ X)⟧ (µ⁺(D ∗ Y)) → µ⁺(D ∗ Y)
apply D X Y σ [’var, x] ↦ σ x
apply D X Y σ [c, xs]   ↦ con [c, xs]

We complete the hole with apply D X Y σ. Every tagged description can be seen as a signature of operations: we can uniformly add a notion of variable, building a new type from an old one, and then provide the substitution structure.

4.5 Skyhooks all the way up?

In this section, we have seen how to levitate descriptions. Although our theory, as presented here, takes SET : SET, our annotations indicate how a stratified theory could code each level from above. We do not rely on the paradoxical nature of SET : SET to flatten the hierarchy of descriptions and fit large inside small. We shall now be more precise about what we have done.

Let us first clarify the status of the implementation. The kit for making datatypes is presented in Table 1. For each operation, we describe its role and its status, making clear which components are self-described and which ones are actually implemented. In a stratified system, the ‘self-encoded’ nature of Desc appears only in a set-polymorphic sense: the principal type of the encoded description generalises to the type of Desc itself. We encode this much in our set-polymorphic model in Agda and in our Coq model, crucially relying on typical ambiguity [Harper and Pollack 1989]. We step outside current technology only to replace the declared Desc with its encoding.

Even this last step we can approximate within a standard predicative hierarchy. Fix a top level, perhaps 42. We may start by declaring Desc₄₂ : SET₄₃. We can then construct DescD₄₁ : Desc₄₂ and thus acquire an encoded Desc₄₁. Although Desc₄₁ is encoded, not declared, it includes the relevant descriptions, including DescD₄₀. We can thus build the tower of descriptions down to Desc₀, encoding every level below the top. Description of descriptions forms a ‘spiral’, rather than a circle. We have modelled this process exactly in Agda, without any appeal to dependent pattern matching, induction-recursion, or set polymorphism. All it takes to build such a sawn-off model of encodings is inductive definition and a cumulative predicative hierarchy of set levels.

5. A Universe of Inductive Families

So far, we have explored the realm of inductive types, building on intuition from ML-like datatypes and using type dependency as a descriptive tool in Desc and its interpretation. Let us now make dependent types the object as well as the means of our study. Dependent datatypes provide a way to work at a higher level of precision a priori, reducing the sources of failure we might otherwise need to manage. For the perennial example, consider vectors—lists indexed by length. By making length explicit in the type, we can prevent hazardous operations (the type of ‘head’ demands vectors of length ’suc n) and offer stronger guarantees (pointwise addition of n-vectors yields an n-vector). However, these datatypes are not individually inductive: we have to define the whole family of vectors mutually, in one go. In dependently typed languages, the basic grammar of datatypes is that of inductive families. To capture this grammar, we must account for indexing.
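In Haskell terms, those guarantees can be tasted with a GADT sketch of ours (not the paper’s encoding), using DataKinds to index lists by length:

  {-# LANGUAGE GADTs, DataKinds, KindSignatures #-}
  data Nat = Zero | Suc Nat

  data Vec (x :: *) (n :: Nat) where
    VNil  :: Vec x 'Zero
    VCons :: x -> Vec x n -> Vec x ('Suc n)

  -- 'head' demands a vector of length ’suc n, so it cannot fail:
  vhead :: Vec x ('Suc n) -> x
  vhead (VCons x _) = x

  -- pointwise addition of n-vectors yields an n-vector:
  vadd :: Vec Int n -> Vec Int n -> Vec Int n
  vadd VNil         VNil         = VNil
  vadd (VCons x xs) (VCons y ys) = VCons (x + y) (vadd xs ys)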
5.1 The universe of indexed descriptions
We presented the Desc universe as a grammar of strictly positive endofunctors on SET and developed inductive types by taking a fixpoint. To describe inductive families indexed by some I : SET, we play a similar game with endofunctors on the category SET^I, with families of sets X, Y : I → SET for objects and, for morphisms, families of functions in X →̇ Y, defined pointwise:

X →̇ Y ↦ (i : I) → X i → Y i

An indexed functor in SET^I → SET^J has the flavour of a device driver, characterising ‘responses’ to a given request in J, where we may in turn make ‘subrequests’ at indices chosen from I. When we use indexed functors to define inductive families of datatypes, I and J coincide: we explain how to make a node fit a given index, including subnodes at chosen indices. E.g., if we are asked for a vector of length 3, we choose to ask in turn for a tail of length 2. To code up valid notions of response to a given request, we introduce IDesc and its interpretation:

data IDesc (I : SET) : SET where
  ’var (i : I)                     : IDesc I
  ’k   (A : SET)                   : IDesc I
  _’×_ (D : IDesc I)(D′ : IDesc I) : IDesc I
  ’Σ   (S : SET)(D : S → IDesc I)  : IDesc I
  ’Π   (S : SET)(D : S → IDesc I)  : IDesc I

⟦_⟧ : (I : SET) → IDesc I → (I → SET) → SET
⟦’var i⟧_I X  ↦ X i
⟦’k K⟧_I X    ↦ K
⟦D ’× D′⟧_I X ↦ ⟦D⟧_I X × ⟦D′⟧_I X
⟦’Σ S D⟧_I X  ↦ (s : S) × ⟦D s⟧_I X
⟦’Π S D⟧_I X  ↦ (s : S) → ⟦D s⟧_I X

Figure 6. Universe of indexed descriptions

We define the IDesc grammar in Figure 6, delivering only strictly positive families. As well as indexing our descriptions, we have refactored a little, adopting a more compositional algebra of codes, where Desc was biased towards right-nested tuples. We now have ’var i for recursive ‘subrequests’ at a chosen index i, with tupling by right-associative ’× and higher-order branching by ’Π. Upgrade your old Desc to a trivially indexed IDesc 1 as follows!

upgrade : Desc → IDesc 1
upgrade ’1           ↦ ’k 1
upgrade (’Σ S D)     ↦ ’Σ S (λs. upgrade (D s))
upgrade (’ind× D)    ↦ ’var [] ’× upgrade D
upgrade (’hind× H D) ↦ (’Π H (λ_. ’var [])) ’× upgrade D

An IDesc I specifies just one response, but a request-to-response function, R : I → IDesc I, yields a strictly positive endofunctor

λX. λi. ⟦R i⟧_I X : SET^I → SET^I

whose fixpoint we then take:

  Γ ⊢ I : SET    Γ ⊢ R : I → IDesc I
  ──────────────────────────────────
  Γ ⊢ µ_I R : I → SET

  Γ ⊢ I : SET    Γ ⊢ R : I → IDesc I    Γ ⊢ i : I    Γ ⊢ x : ⟦R i⟧_I (µ_I R)
  ──────────────────────────────────────────────────────────────────────────
  Γ ⊢ con x : µ_I R i

5.2 Examples

Natural numbers: For basic reassurance, we upgrade NatD:

upgrade NatD : IDesc 1
upgrade NatD ↦ ’Σ (#[’zero ’suc]) [(’k 1) (’var [] ’× ’k 1)]

Note that trailing 1’s keep our right-nested, []-terminated tuple structure, and with it our elaboration machinery. We can similarly upgrade any inductive type. Moreover, IDesc I can now code a bunch of mutually inductive types, if I enumerates the bunch [Paulin-Mohring 1996; Yakushev et al. 2009].

Indexed descriptions: Note that IDesc I is a plain inductive type, parametrised by I, but indexed trivially:

IDescD : (I : SET) → IDesc 1
IDescD I ↦ ’Σ (#[’var ’k ’× ’Σ ’Π])
  [ (’k I ’× ’k 1)
    (’k SET ’× ’k 1)
    (’var [] ’× ’var [] ’× ’k 1)
    (’Σ SET (λS. (’Π S (λ_. ’var [])) ’× ’k 1))
    (’Σ SET (λS. (’Π S (λ_. ’var [])) ’× ’k 1)) ]

Therefore, this universe is self-describing and can be levitated. As before, we rely on a special-purpose switchID operator to build the finite function [. . .] without mentioning IDesc.

Vectors: So far, our examples live in IDesc 1, with no interesting indexing. Let us at least have vectors. Recall that the constructors ’vnil and ’vcons are defined only for ’zero and ’suc respectively:

data Vec (X : SET) : (i : Nat) → SET where
  ’vnil  : Vec X ’zero
  ’vcons : (n : Nat) → X → Vec X n → Vec X (’suc n)
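The same family, in Haskell GADT terms, also admits the equality-guarded presentation that the next paragraph calls ‘Henry Ford’ equations (our sketch, with Nat as in the earlier vector sketch):

  {-# LANGUAGE GADTs, DataKinds, KindSignatures, TypeOperators #-}
  data Nat = Zero | Suc Nat   -- as in the earlier sketch

  -- Each constructor targets an arbitrary index n, constrained
  -- retrospectively by a type equality: you may pick any n you like,
  -- as long as it is 'Zero (respectively 'Suc m).
  data VecHF (x :: *) (n :: Nat) where
    VNilHF  :: n ~ 'Zero  => VecHF x n
    VConsHF :: n ~ 'Suc m => x -> VecHF x m -> VecHF x n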
One way to code constrained datatypes is to appeal to a suitable notion of propositional equality == on indices. The constraints are expressed as ‘Henry Ford’ equations in the datatype. For vectors:

VecD : SET → Nat → IDesc Nat
VecD X i ↦ ’Σ (#[’vnil ’vcons])
  [ (’k (’zero == i))
    (’Σ Nat λn. ’k X ’× ’var n ’× ’k (’suc n == i)) ]

You may choose ’vnil for any index you like, as long as it is ’zero; in the ’vcons case, the length of the tail is given explicitly, and the index i must be one more. Our previous 1-terminated tuple types can now be seen as the trivial case of constraint-terminated tuple types, with elaboration supplying the witnesses when trivial. In this paper, we remain anxiously agnostic about propositional equality. Any will do, according to conviction; many variations are popular. The homogeneous identity type used in Coq is ill-suited to dependent types, but its heterogeneous variant (forming equations regardless of type) allows the translation of pattern matching with structural recursion to indI [Goguen et al. 2006]. The extensional equality of Altenkirch et al. [2007] also sustains the translation.

However, sometimes the equations are redundant. Looking back at Vec, we find that the equations constrain the choice of constructor and stored tail index retrospectively. But inductive families need not store their indices [Brady et al. 2003]! If we analyse the incoming index, we can tidy our description of Vec as follows:

VecD (X : SET) : Nat → IDesc Nat
VecD X ’zero    ↦ ’k 1
VecD X (’suc n) ↦ ’k X ’× ’var n

The constructors and equations have simply disappeared. A similar example is Fin (bounded numbers), specified by:

data Fin : (n : Nat) → SET where
  ’fz : (n : Nat) → Fin (’suc n)
  ’fs : (n : Nat) → Fin n → Fin (’suc n)

In this case, we can eliminate equations but not constructors, since ’fz and ’fs both target ’suc:

FinD : Nat → IDesc Nat
FinD ’zero    ↦ ’Σ #[] []
FinD (’suc n) ↦ ’Σ #[’fz ’fs] [(’k 1) (’var n)]

This technique of extracting information by case analysis on indices applies to descriptions exactly where Brady’s ‘forcing’ and ‘detagging’ optimisations apply in compilation. They eliminate just those constructors, indices and constraints which are redundant even in open computation. In closed computation, where proofs can be trusted, all constraints are dropped.

To deliver induction for indexed datatypes, we need the ‘holds everywhere’ machinery. We present AllI and allI in Figure 5, with a twist—where Desc admits the All construction, IDesc is closed under it! The AllI operator for a description indexed on I is strictly positive in turn, and has a description indexed on (i : I) × X i. Induction on indexed descriptions is then hardwired thus:

indI : (I : SET) → (R : I → IDesc I)(P : ((i : I) × µ_I R i) → SET) →
       ((i : I)(xs : ⟦R i⟧_I (µ_I R)) → ⟦AllI (R i) (µ_I R) xs⟧ P → P [i, con xs]) →
       (i : I)(x : µ_I R i) → P [i, x]
indI R P m i (con xs) ↦ m i xs (allI (R i) (µ_I R) P (∧λi. λxs. indI R P m i xs) xs)

AllI : (I : SET) → (D : IDesc I)(X : I → SET) → ⟦D⟧_I X → IDesc ((i : I) × X i)
AllI (’var i)  X x       ↦ ’var [i, x]
AllI (’k K)    X k       ↦ ’k 1
AllI (D ’× D′) X [d, d′] ↦ AllI D X d ’× AllI D′ X d′
AllI (’Σ S D)  X [s, d]  ↦ AllI (D s) X d
AllI (’Π S D)  X f       ↦ ’Π S (λs. AllI (D s) X (f s))

allI : (I : SET) → (D : IDesc I)(X : I → SET)(P : ((i : I) × X i) → SET) →
       ((x : (i : I) × X i) → P x) → (xs : ⟦D⟧_I X) → ⟦AllI D X xs⟧ P
allI (’var i)  X P p x       ↦ p [i, x]
allI (’k K)    X P p k       ↦ []
allI (D ’× D′) X P p [d, d′] ↦ [allI D X P p d, allI D′ X P p d′]
allI (’Σ S D)  X P p [s, d]  ↦ allI (D s) X P p d
allI (’Π S D)  X P p f       ↦ λa. allI (D a) X P p (f a)

Figure 5. Indexed induction predicates

The generic catamorphism, cataI, is constructed from indI as before. Its type becomes more elaborate, to deal with the indexing:

cataI : (I : SET)(R : I → IDesc I)(T : I → SET) →
        ((i : I) → ⟦R i⟧_I T → T i) → µ_I R →̇ T

Tagged indexed descriptions: Let us reflect this index-analysis technique. We can divide a description of tagged indexed data in two: first, the constructors that do not depend on the index; then, the constructors that do. The non-dependent part mirrors the definition for non-indexed descriptions. The index-dependent part simply indexes the choice of constructors by I. Hence, by inspecting the index, it is possible to vary the ‘menu’ of constructors.

TagIDesc I ↦ AlwaysD I × IndexedD I
AlwaysD I  ↦ (E : En) × ((i : I) → π E (λ_. IDesc I))
IndexedD I ↦ (F : I → En) × ((i : I) → π (F i) (λ_. IDesc I))

In the case of a tagged Vec, for instance, for the index ’zero we would only propose the constructor ’nil; similarly, for ’suc n we would only propose the constructor ’cons. We write de D i to denote the IDesc I computed from the tagged indexed description D at index i. Its expansion is similar to the definition of de for tagged descriptions, except that it must also append the two parts. We again write µ⁺_I D for µ_I (de D).

Typed expressions: We are going to define a syntax for a small language with two types, natural numbers and booleans:

Ty ↦ #[’nat ’bool]

This language has values, conditional expressions, addition and comparison. Informally, their types are:

’val  : Val ty → ty
’cond : ’bool → ty → ty → ty
’plus : ’nat → ’nat → ’nat
’le   : ’nat → ’nat → ’bool

The function Val interprets object-language types in the host language, so that arguments to ’val fit their expected type:

Val : Ty → SET
Val ’nat  ↦ Nat
Val ’bool ↦ Bool

We take Nat and Bool to represent natural numbers and Booleans in the host language, equipped with addition +H and comparison ≤H. We express our syntax as a tagged indexed description, indexing over object-language types Ty. We note that some constructors are always available, namely ’val and ’cond. On the other hand, the ’plus and ’le constructors are index-dependent, with ’plus available just when building a ’nat, and ’le just for ’bool. The code below reflects this intuition, with the first component uniformly offering ’val and ’cond, the second selectively offering ’plus or ’le.

ExprD : TagIDesc Ty
ExprD ↦ [ExprAD, ExprID]

ExprAD : AlwaysD Ty
ExprAD ↦ [ #[’val ’cond]
         , λty. [ (’k (Val ty) ’× ’k 1)
                  (’var ’bool ’× ’var ty ’× ’var ty ’× ’k 1) ] ]

ExprID : IndexedD Ty
ExprID ↦ [ [ #[’plus] #[’le] ]
         , λ_. [’var ’nat ’× ’var ’nat ’× ’k 1] ]

Given the syntax, let us supply the semantics. We implement an evaluator as a catamorphism:

eval⇓ : (ty : Ty) → µ⁺_Ty ExprD ty → Val ty
eval⇓ ty term ↦ cataI_Ty (de ExprD) Val eval↓ ty term

To finish the job, we must supply the algebra which implements a single step of evaluation, given subexpressions evaluated already:

eval↓ : (ty : Ty) → ⟦de ExprD ty⟧_Ty Val → Val ty
eval↓ _     (’val x)           ↦ x
eval↓ _     (’cond ’true x _)  ↦ x
eval↓ _     (’cond ’false _ y) ↦ y
eval↓ ’nat  (’plus x y)        ↦ x +H y
eval↓ ’bool (’le x y)          ↦ x ≤H y

Hence we have a type-safe syntax and a tagless interpreter for our language, in the spirit of Augustsson and Carlsson [1999], with help from the generic catamorphism. However, so far we are only able to define and manipulate closed terms. Adding variables, it becomes possible to build and manipulate open terms, that is, terms in a context. We shall get this representation for free, thanks to the free indexed monad construction.
11
5.3 Free indexed monad
In Section 4.4, we built a free monad operation for simple descriptions. The process is similar in the indexed world. Namely, given an indexed functor, we derive the indexed functor coding its free monad:

_∗ : (I : SET) → (R : TagIDesc I)(X : I → SET) → TagIDesc I
[E, F] ∗_I X ↦ [ [’cons ’var (π₀ E) , λi. [’k (X i) , (π₁ E) i]] , F ]

Just as in the universe of descriptions, this construction comes with an obvious return and a substitution operation, the bind. Its definition is the following:

substI : (I : SET) → (X, Y : I → SET) → (R : TagIDesc I) →
         (X →̇ µ⁺_I (R ∗_I Y)) → µ⁺_I (R ∗_I X) →̇ µ⁺_I (R ∗_I Y)
substI X Y R σ i t ↦ cataI_I (de (R ∗_I X)) (µ⁺_I (R ∗_I Y)) (applyI R X Y σ) i t

where applyI is defined as follows:

applyI : (I : SET) → (R : TagIDesc I)(X, Y : I → SET) →
         (X →̇ µ⁺_I (R ∗_I Y)) → ⟦de (R ∗_I X)⟧_I (µ⁺_I (R ∗_I Y)) →̇ µ⁺_I (R ∗_I Y)
applyI R X Y σ i [’var, x] ↦ σ i x
applyI R X Y σ i [c, ys]   ↦ con [c, ys]

The subscripted types correspond to implicit arguments that can be automatically inferred, hence do not have to be typed in. Let us now consider two examples of free indexed monads.

Typed expressions: In the previous section, we presented a language of closed arithmetic expressions. Using the free monad construction, we are going to extend this construction to open terms. An open term is defined with respect to a context, represented by a snoc-list of types:

data Context : SET where
  []   : Context
  snoc : Context → Ty → Context

An environment realises the context, packing a value for each type:

Env : Context → SET
Env []         ↦ 1
Env (snoc G S) ↦ Env G × Val S

In this setting, we define typed variables, Var, by:

Var : Context → Ty → SET
Var [] T         ↦ #[]
Var (snoc G S) T ↦ (Var G T) + (S == T)

While Val maps each type to the corresponding host type, Var indexes a value in the context, packing a proof that the types match. The lookup function follows precisely this semantics:

lookup : (G : Context) → Env G → (T : Ty) → Var G T → Val T
lookup (snoc G .T) [g, t] T (right refl) ↦ t
lookup (snoc G S)  [g, t] T (left x)     ↦ lookup G g T x

Consequently, taking the free monad of ExprD by Var G, we obtain the language of open terms in a context G:

openTm G ↦ ExprD ∗_Ty (Var G)

In this setting, the language of closed terms corresponds to the free monad assigning an empty set of values to variables, where

Empty : Ty → SET
Empty _ ↦ #[]

closeTm ↦ ExprD ∗_Ty Empty

Allowing variables from an empty set is much like forbidding variables, so closeTm and ExprD describe isomorphic datatypes. Correspondingly, you can update an old ExprD to a shiny closeTm:

update : µ⁺_Ty ExprD →̇ µ⁺_Ty closeTm
update ty tm ↦ cataI_Ty (de ExprD) (µ⁺_Ty closeTm) (λ_. λ[tag, tm]. con [1+tag, tm]) ty tm

The other direction of the isomorphism is straightforward, the ’var case being impossible. Therefore, we are entitled to reuse the eval⇓ function to define the semantics of closeTm. Now we would like to give a semantics to the open-term language. We proceed in two steps: first, we substitute variables by their values in the context; then, we evaluate the resulting closed term. Thanks to eval⇓, the second problem is already solved. Let us focus on substituting variables from the context. Again, we can subdivide this problem: first, discharging a single variable from the context; then, applying this discharge function to every variable in the term. The discharge function is relative to the required type and a context of the right type. Its action is to map values to themselves, and variables to their value in the context. This corresponds to the following function:

discharge : (G : Context) → Env G → Var G →̇ µ⁺_Ty closeTm
discharge G g ty v ↦ con [’val, lookup G g ty v]

We are now left with applying discharge over all variables of the term. We simply have to fill in the right arguments to substI, the type guiding us:

substExpr : (G : Context) → (Var G →̇ µ⁺_Ty closeTm) →
            µ⁺_Ty (openTm G) →̇ µ⁺_Ty closeTm
substExpr G σ ty tm ↦ substI_Ty (Var G) Empty ExprD σ ty tm

This completes our implementation of the open-terms interpreter. Without much effort, we have described the syntax of a well-typed language, together with its semantics.

Indexed descriptions: An interesting instance of the free monad is IDesc itself. Indeed, ’var is nothing but the return. The remaining constructors form the carrier functor, trivially indexed by 1. The signature functor is described as follows:

IDescDSig : AlwaysD 1
IDescDSig ↦ [ #[’k ’× ’Σ ’Π]
            , λ_. [ (’k SET)
                    (’var [] ’× ’var [])
                    (’Σ SET (λS. ’Π S (λ_. ’var [])))
                    (’Σ SET (λS. ’Π S (λ_. ’var []))) ] ]

We get IDesc I by extending the signature with variables from I:

IDescD : (I : SET) → TagIDesc 1
IDescD I ↦ [ IDescDSig , [λ_. [] , λ_. []] ] ∗₁ (λ_. I)

The fact that indexed descriptions are closed under substitution is potentially of considerable utility, if we can exploit it:

⟦σ D⟧_J X ↦ ⟦D⟧_I (λi. ⟦σ i⟧_J X)    where σ : I → IDesc J

By observing that a description can be decomposed via substitution, we split its meaning into a superstructure of substructures, e.g. a ‘database containing salaries’, ready for traversal operations preserving the former and targeting the latter.

6. Discussion

In this paper, we have presented a universe of datatypes for a dependent type theory. We started from an unremarkable type theory with dependent functions and tuples, relying on few other assumptions, especially where propositional equality is concerned. We added finite enumerations sufficient to account for constructor
choice, and then we built coding systems: first (as a learning experience) for simple ML-like inductive types, then for the indexed inductive families which dependently typed programmers in Agda, Coq and Epigram take for granted. We adopt a bidirectional type propagation mechanism to conceal artifacts of the encoding, giving a familiar and practicable constructor-based presentation to data. Crucially for our approach, we ensure that the codes describing datatypes themselves inhabit a datatype with a code. In a stratified setting, we avoid paradox by ensuring that this type of codes lives uniformly one level above the types the codes describe. The adoption of ordinary data to describe types admits datatype-generic operations implemented just by ordinary programming. In working this way, we make considerable use of type equality modulo open computation, silently specialising the types of generic operations as far as the datatype code for any given usage is known.

6.1 Related work in Generic Programming

Generic programming is a vast topic. We refer our reader to Garcia et al. [2003] for a broad overview of generic programming in various languages. For Haskell alone, there is a myriad of proposals: Hinze et al. [2007] and Rodriguez et al. [2008] provide useful comparative surveys. Our approach follows the polytypic programming style, as initiated by PolyP [Jansson and Jeuring 1997]. Indeed, we build generic functions by induction on pattern functors, exploiting type-level computation to avoid the preprocessing phase: our datatypes are, natively, nothing but codes. We have the type-indexed datatypes of Generic Haskell [Hinze et al. 2002] for free. From one datatype, we can compute others and equip them with relevant structure: the free monad construction provides one example. Our approach to encoding datatypes as data also sustains generic views [Holdermans et al. 2006], allowing us to rebias the presentation of datatypes conveniently. Tagged descriptions, giving us a sum-of-sigmas view, are a natural example.

Unlike Generic Haskell, we do not support polykinded programming [Hinze 2000]. Our descriptions are limited to endofunctors on SET^I. Whilst indexing is known to be sufficient to encode a large class of higher-kinded datatypes [Altenkirch and McBride 2002], we should rather hope to work in a more compositional style. We are free to write higher-order programs manipulating codes, but it is not yet clear whether that is sufficient to deliver abstraction at higher kinds. Similarly, it will be interesting to see whether arity-generic programming [Weirich and Casinghino 2010] arises just by computing with our codes, or whether a richer abstraction is called for.

The Scrap Your Boilerplate (SYB) approach to generic programming [Lämmel and Peyton Jones 2003] offers a way to construct generic functions based on dynamic type-testing via the Typeable type class. SYB cannot compute types from codes, but its dynamic character does allow a more flexible, ad hoc approach to generic data traversal. By maintaining the correspondence between codes and types whilst supporting arbitrary inspection of codes, we pursue the same flexibility statically. The substitutive character of IDesc may allow us to observe and exploit ad hoc substructural relationships in data, but again, further work is needed before we can make a proper comparison.

6.2 Generic Programming with Dependent Types

Generic programming is not new to dependent types. Altenkirch and McBride [2002] developed a universe of polykinded types in Lego; Norell [2002] gave a formalisation of polytypic programming in Alfa, a precursor to Agda; Verbruggen et al. [2008, 2009] provided a framework for polytypic programming in the Coq theorem prover. However, these works aim at modelling PolyP or Generic Haskell in a dependently-typed setting for the purpose of proving correctness properties of Haskell code. Our approach is different in that we aim at building a foundation for datatypes, in a dependently-typed system, for a dependently-typed system.

Closer to us is the work of Benke et al. [2003]. This seminal work introduced the usage of universes for developing generic programs. Our universes share similarities with theirs: our universe of descriptions is similar to their universe of iterated induction, and our universe of indexed descriptions is equivalent to their universe of finitary indexed induction. This is not surprising, as we share the same source of inspiration, namely induction-recursion. However, we feel ready to offer a more radical prospectus. Their approach is generative: each universe extends the base type theory with both type formers and elimination rules. Thanks to levitation, we rely only on a generic induction and a specialised switchD, closing the type theory. We explore programming with codes, but also how to conceal the encoding when writing ‘ordinary’ programs.

6.3 Metatheoretical Status

The SET : SET approach we have taken in this paper is convenient from an experimental perspective, and it has allowed us to focus primarily on the encoding of universes, leaving the question of stratification (and with it, consistency, totality, and decidability of type checking) to one side. However, we must surely face up to the latter, especially since we have taken up the habit of constructing ‘the set of all sets’. A proper account requires a concrete proposal for a system of stratified universes which allows us to make ‘level-polymorphic’ constructions, and we are actively pursuing such a proposal. We hope soon to have something to prove. In the meantime, we can gain some confidence by systematically embedding predicative fragments of our theory within systems which already offer a universe hierarchy. We can, at the very least, confirm that in UTT-style theories with conventional inductive families of types [Luo 1994], as found in Coq (and in Agda, if one avoids experimental extensions), we can build the tower of universes we propose, cut off at an arbitrary height. It is correspondingly clear that some such system can be made to work, or else that other, longer-standing tools are troubled.

A metatheoretical issue open at the time of writing concerns the size of the index set I in IDesc I. Both Agda and recent versions of Coq allow inductive families with large indices, effectively allowing ‘higher-kind’ fixpoints on SET → SET and more. They retain the safeguard that the types of substructures must be as small as the inductively defined superstructure. This liberalisation allows us large index sets in our models, but whilst it offers no obvious route to paradox by smuggling a large universe inside a small type, it is not yet known to be safe. We can restrict I as necessary to avoid paradox, provided 1, used to index IDesc itself, is ‘small’.

6.4 Further Work

Apart from the need to nail down a stratified version of the system and its metatheory, we face plenty of further problems and opportunities. Although we have certainly covered Luo’s criteria for inductive families [Luo 1994], there are several dimensions in which to consider expanding our universe. Firstly, we seek to encompass inductive-recursive datatype families [Dybjer and Setzer 2001], allowing us to interleave the definition and interpretation of data in intricate and powerful ways. This interleaving seems particularly useful when reflecting the syntax of dependent type systems. Secondly, we should very much like to extend our universe with codes for internal fixpoints, as in Morris et al. [2004]. The external knot-tying approach we have taken here makes types like ‘trees with lists of subtrees’ more trouble than they should be. Moreover, if we allow the alternation of least and greatest fixpoints,
we should expect to gain types which are not readily encoded with one external µ. Thirdly, it would be fascinating to extend our universe with dedicated support for syntax with binding, not least because a universe with internal fixpoints has such a syntax. Harper and Licata have demonstrated the potential for and of such an encoding [Licata and Harper 2009], boldly encoding the invalid definitions along with the valid. A more conservative strategy might be to offer improved support for datatypes indexed by an extensible context of free variables, with the associated free monad structure avoiding capture as shown by Altenkirch and Reus [1999]. Lastly, we must ask how our new presentation of datatypes should affect the tools we use to build software. It is not enough to change the game: we must enable better play. If datatypes are data, what is design?
Acknowledgments

We are grateful to José Pedro Magalhães for his helpful comments on a draft of this paper. We are also grateful to the Agda team, without which levitation would have been a much more perilous exercise. J. Chapman was supported by the Estonian Centre of Excellence in Computer Science, EXCS, financed by the European Regional Development Fund. P.-É. Dagand, C. McBride and P. Morris are supported by the Engineering and Physical Sciences Research Council, Grants EP/G034699/1 and EP/G034109/1.

References

A. Abel, T. Coquand, and M. Pagano. A modular type-checking algorithm for type theory with singleton types and proof irrelevance. In TLCA.
R. Adams. Pure type systems with judgemental equality. JFP, 2006.
T. Altenkirch and C. McBride. Generic programming within dependently typed programming. In Generic Programming, 2002.
T. Altenkirch and B. Reus. Monadic presentations of lambda terms using generalized inductive types. In Computer Science Logic, 1999.
T. Altenkirch, C. McBride, and W. Swierstra. Observational equality, now! In PLPV, 2007.
L. Augustsson and M. Carlsson. An exercise in dependent types: A well-typed interpreter. Available at http://www.cs.chalmers.se/~augustss/cayenne/interp.ps, 1999.
M. Benke, P. Dybjer, and P. Jansson. Universes for generic programs and proofs in dependent type theory. Nordic Journal of Computing, 2003.
E. Brady, J. Chapman, P.-E. Dagand, A. Gundry, C. McBride, P. Morris, and U. Norell. An Epigram implementation.
E. Brady, C. McBride, and J. McKinna. Inductive families need not store their indices. In TYPES, 2003.
J. Cheney and R. Hinze. First-class phantom types. Technical report, Cornell University, 2003.
T. Coquand. An algorithm for type-checking dependent types. SCP, 1996.
J. Courant. Explicit universes for the calculus of constructions. In TPHOLs, 2002.
N. A. Danielsson. The Agda standard library, 2010.
P. Dybjer. Inductive sets and families in Martin-Löf’s type theory. In Logical Frameworks, 1991.
P. Dybjer and A. Setzer. A finite axiomatization of inductive-recursive definitions. In TLCA, 1999.
P. Dybjer and A. Setzer. Induction-recursion and initial algebras. In Annals of Pure and Applied Logic, 2000.
P. Dybjer and A. Setzer. Indexed induction-recursion. In Proof Theory in Computer Science, 2001.
R. Garcia, J. Jarvi, A. Lumsdaine, J. Siek, and J. Willcock. A comparative study of language support for generic programming. In OOPSLA, 2003.
H. Geuvers. Induction is not derivable in second order dependent type theory. In TLCA, 2001.
H. Goguen, C. McBride, and J. McKinna. Eliminating dependent pattern matching. In Algebra, Meaning and Computation, 2006.
R. Harper and R. Pollack. Type checking with universes. In TAPSOFT ’89.
R. Hinze. Polytypic values possess polykinded types. In MPC, 2000.
R. Hinze, J. Jeuring, and A. Löh. Type-indexed data types. In MPC, 2002.
R. Hinze, J. Jeuring, and A. Löh. Comparing approaches to generic programming in Haskell. In Datatype-Generic Programming, 2007.
S. Holdermans, J. Jeuring, A. Löh, and A. Rodriguez. Generic views on data types. In MPC, 2006.
P. Jansson and J. Jeuring. PolyP—a polytypic programming language extension. In POPL, 1997.
R. Lämmel and S. Peyton Jones. Scrap your boilerplate: a practical design pattern for generic programming. In TLDI, 2003.
D. R. Licata and R. Harper. A universe of binding and computation. In ICFP, 2009.
Z. Luo. Computation and Reasoning. Oxford University Press, 1994.
P. Martin-Löf. Intuitionistic Type Theory. Bibliopolis, Napoli, 1984.
C. McBride and J. McKinna. The view from the left. JFP, 2004.
P. Morris. Constructing Universes for Generic Programming. PhD thesis, University of Nottingham, 2007.
P. Morris and T. Altenkirch. Indexed containers. In LICS, 2009.
P. Morris, T. Altenkirch, and C. McBride. Exploring the regular tree types. In TYPES, 2004.
P. Morris, T. Altenkirch, and N. Ghani. A universe of strictly positive families. IJCS, 2009.
U. Norell. Functional generic programming and type theory. Master’s thesis, Chalmers University of Technology, 2002.
U. Norell. Towards a practical programming language based on dependent type theory. PhD thesis, Chalmers University of Technology, 2007.
N. Oury and W. Swierstra. The power of Pi. In ICFP, 2008.
C. Paulin-Mohring. Définitions inductives en théorie des types d’ordre supérieur. Thèse d’habilitation, ENS Lyon, 1996.
B. C. Pierce and D. N. Turner. Local type inference. In POPL, 1998.
A. Rodriguez, J. Jeuring, P. Jansson, A. Gerdes, O. Kiselyov, and B. C. d. S. Oliveira. Comparing libraries for generic programming in Haskell. In Haskell Symposium, 2008.
The Coq Development Team. The Coq Proof Assistant Reference Manual.
W. Verbruggen, E. de Vries, and A. Hughes. Polytypic programming in Coq. In WGP, 2008.
W. Verbruggen, E. de Vries, and A. Hughes. Polytypic properties and proofs in Coq. In WGP, 2009.
S. Weirich and C. Casinghino. Arity-generic datatype-generic programming. In PLPV, 2010.
H. Xi, C. Chen, and G. Chen. Guarded recursive datatype constructors. In POPL, 2003.
A. R. Yakushev, S. Holdermans, A. Löh, and J. Jeuring. Generic programming with fixed points for mutually recursive datatypes. In ICFP, 2009.
Functional Pearl: Every Bit Counts

Dimitrios Vytiniotis                      Andrew J. Kennedy
Microsoft Research, Cambridge, U.K.       Microsoft Research, Cambridge, U.K.
[email protected]                     [email protected]
Abstract

We show how the binary encoding and decoding of typed data and typed programs can be understood, programmed, and verified with the help of question-answer games. The encoding of a value is determined by the yes/no answers to a sequence of questions about that value; conversely, decoding is the interpretation of binary data as answers to the same question scheme.

We introduce a general framework for writing and verifying game-based codecs. We present games for structured, recursive, polymorphic, and indexed types, building up to a representation of well-typed terms in the simply-typed λ-calculus. The framework makes novel use of isomorphisms between types in the definition of games. The definition of isomorphisms together with additional simple properties make it easy to prove that codecs derived from games never encode two distinct values using the same code, never decode two codes to the same value, and interpret any bit sequence as a valid code for a value or as a prefix of a valid code.

Categories and Subject Descriptors D.1.1 [Programming Techniques]: Applicative (Functional) Programming; D.3.3 [Programming Languages]: Language Constructs and Features; E.4 [Coding and Information Theory]: Data compaction and compression

General Terms Design, Languages, Theory

1. Introduction

Let’s play a guessing game: I am a simply-typed program.¹ Can you guess which one?

Are you a function application? No.
You must be a function. Is your argument a Nat? Yes.
Is your body a variable? No.
Is your body a function application? No.
It must be a function. Is its argument a Nat? Yes.
Is its body a variable? Yes.
Is it bound by the nearest λ? No.
You must be λx:Nat.λy:Nat.x. You’re right!

From the answer to the first question, we know that the program is not a function application. Moreover, the program is closed, and so it must be a λ-abstraction; hence we proceed to ask new questions about the argument type and body. We continue asking questions until we have identified the program. In this example, we asked just seven questions. Writing 1 for yes, and 0 for no, our answers were 0100110. This is a code for the program λx:Nat.λy:Nat.x.

By deciding a question scheme for playing our game we’ve thereby built an encoder for programs. By interpreting a bit sequence as answers to that same scheme, we have a decoder. Correct round-tripping of encoding and decoding follows automatically. If, as in this example, we never ask ‘silly questions’ that reveal no new information, then every code represents some value, or is the prefix of a valid code. In other words, every bit counts.

Related ideas have previously appeared in domain-specific work: tamper-proof bytecode [10, 13] and compact proof witnesses in proof-carrying code [18]. In the latter case, an astonishing improvement of a factor of 30 in proof witness size is reported compared to previous syntactic representations! By contrast, standard serialization techniques do not easily guarantee tamper-proof codes, nor take advantage of semantic information to yield more compact encodings.

Our paper identifies and formalizes a key intuition behind those works: question-and-answer games. Moreover, we take a novel typed approach to codes, using types for domains of values, and representing the partitioning of the domain by type isomorphisms. Concretely, our contributions are as follows:

• We introduce question-answer games for encoding and decoding: a novel way to think about and program codecs (Section 2). We build simple codecs for numeric types, and provide combinators that construct complex games from simpler ones, producing coding schemes for structured, recursive, polymorphic, and indexed types that are correct by construction.
• Under easily-stated assumptions concerning the structure of games, we prove round-trip properties of encoding and decoding, and the ‘every bit counts’ property of the title (Section 3).
• We develop more sophisticated codecs for abstract types such as sets and multisets, making crucial use of the invariants associated with such types (Section 4).
• We build games for untyped and simply-typed terms that yield every-bit-counts coding schemes (Section 5). Stated plainly: we can represent programs such that every sufficiently-long bit string represents a well-typed term. To our knowledge, this is the first such coding scheme for a typed language that has been proven correct.
• We discuss filters on games (Section 6). Finally, we discuss future developments and present connections to related work (Sections 7 and 8).

¹ A closed program in the simply-typed λ-calculus with types τ ::= Nat | τ → τ and terms e ::= x | e e | λx:τ.e, identified up to α-equivalence. We have deliberately impoverished the language for simplicity of presentation; in practice there would also be constants, primitive operations, and perhaps other constructs.
[Figure 1: Unary game for naturals — a decision tree asking “= 0?”, “= 1?”, “= 2?”, . . . , with result leaves {0}, {1}, {2}, . . .]

[Figure 2: Binary game for {0..15} — a decision tree that repeatedly asks whether the number exceeds the median of the current range, starting with “> 7?”]
We will be using Haskell (for readability, familiarity, and executability) but the paper is accompanied by a partial Coq formalization (for correctness) downloadable from:

http://research.microsoft.com/people/dimitris/

The correctness and compactness properties of our coding schemes follow by construction in our Coq development, and by very localized reasoning in our Haskell code. We make use of infinite structures, utilizing laziness in Haskell (and co-induction in Coq), but the code should adapt to call-by-value languages through the use of thunks.

2. From games to codecs

We can visualize question-and-answer games graphically as binary decision trees. Figure 1 visualizes a (naïve) game for natural numbers. Each rectangular node contains a question, with branches to the left for yes and right for no. Circular leaf nodes contain the final result that has been determined by the sequence of questions asked on the path from the root. Arcs are labelled with the ‘knowledge’ at that point in the game, characterised as subsets of the original domain.

Let’s dry-run the game. We start at the root knowing that we’re in the set {n | n ≥ 0}. First we ask whether the number is exactly 0 or not. If the answer is yes we continue on the left branch and immediately reach a leaf that tells us that the result is 0. If the answer is no then we continue on the right branch, knowing now that the number in hand is in the set {n | n ≥ 1}. The next question asks whether the number is exactly 1 or not. If yes, we are done, otherwise we continue as before, until the result is reached.

Figure 2 shows a more interesting game for natural numbers in {0..15}. This game proceeds by asking whether the number in hand is greater than the median element in the current range. For example, the first question asks of a number n whether n > 7, splitting the range into disjoint parts {8..15} and {0..7}. If n ∈ {8..15} we play the game given by the left subtree. If n ∈ {0..7} we play the game given by the right subtree.

In both games, the encoding of a value can be determined by labelling all left edges with 1 and all right edges with 0, and returning the path from the root to the value. Conversely, to decode, we interpret the input bitstream as a path down the tree. So in the game of Figure 1, a number n ∈ N is encoded in unary as n zeroes followed by a one, and in the game of Figure 2, a number n ∈ {0..15} is encoded as 4-bit binary, as expected. For example, the encoding of 2 is 0010 and 3 is 0011. There is one more difference between the two games: the game of Figure 1 is infinite whereas the game of Figure 2 is finite.

It’s clear that question-and-answer games give rise to codes that are unambiguous: a bitstring uniquely determines a value. Moreover, the one-question-at-a-time nature of games ensures that codes are prefix-free: no code is the prefix of any other valid code [19].

Notice two properties common to the games of Figures 1 and 2: every value in the domain is represented by some leaf node (we call such games total), and each question strictly partitions the domain (we call such games proper). Games satisfying both properties give rise to codecs with the following property: any bitstring is a prefix of, or has a prefix that is, a code for some value. This is the ‘every bit counts’ property of the title. In Section 3 we pin these ideas down with theorems.

But how can we actually compute with games? We’ve explained the basic principles in terms of set membership and potentially infinite trees, and we need to translate these ideas into code.

• We must represent infinite games without constructing all the leaf nodes ahead-of-time. This is easy: just construct the game tree lazily.
• We need something corresponding to ‘a set of possible values’, which we’ve been writing on the arcs in our diagrams. Types are the answer here, sometimes with additional implicit invariants; for example, in Haskell, ‘Ints between 4 and 7’.
• We must capture the splitting of the domain into two disjoint parts. This is solved by type isomorphisms of the form τ ≅ τ1 + τ2, with τ1 representing the domain of the left subtree (corresponding to answering yes to the question) and τ2 representing the domain of the right subtree (corresponding to no).
• Lastly, we need a means of using this splitting to query the data (when encoding), and to construct the data (when decoding). Type isomorphisms provide a very elegant solution to this task: we simply use the maps associated with the isomorphism.

Let’s get concrete with some code, in Haskell!

2.1 Games in Haskell

We’ll dive straight in, with a data type for games:

data Game :: * → * where
  Single :: ISO t () → Game t
  Split  :: ISO t (Either t1 t2) → Game t1 → Game t2 → Game t

A value of type Game t represents a game (strictly speaking, a strategy for playing a game) for domain t. Its leaves are built with Single and represent singletons, and its nodes are built with Split and represent a splitting of the domain into two parts. The leaves carry a representation of an isomorphism between t and (),
Haskell’s unit type. The nodes carry a representation of an isomorphism between t and Either t1 t2 (Haskell’s sum type), and two subtrees of type Game t1 and Game t2.2
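As a tiny warm-up of our own (not from the paper), here is a one-question game for Bool, using boolIso from Figure 3 below; Single needs an ISO () (), for which Iso id id suffices:

  -- A minimal example game (ours): one question decides a Bool.
  boolGame :: Game Bool
  boolGame = Split boolIso (Single (Iso id id)) (Single (Iso id id))

  -- enc boolGame True  = [I]   (True  maps to Left  (): answer ‘yes’)
  -- enc boolGame False = [O]   (False maps to Right (): answer ‘no’)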
What is ISO? It’s just a pair of maps witnessing an isomorphism:

-- (Iso to from) must satisfy
--   left inverse:  from ◦ to = id
--   right inverse: to ◦ from = id
data ISO t s = Iso { to :: t → s, from :: s → t }

In our Coq formalization, the ISO type also records proofs of the left-inverse and right-inverse properties. Without further ado we write a generic encoder and decoder, once and for all. We use Bit for binary digits rather than Bool so that output is more readable:

data Bit = O | I

Given a Game t, here is an encoder for t:

enc :: Game t → t → [Bit]
enc (Single _) x = []
enc (Split (Iso ask _) g1 g2) x =
  case ask x of
    Left x1  → I : enc g1 x1
    Right x2 → O : enc g2 x2

If the game we are playing is a Single leaf, then t must be a singleton, so we need no bits to encode t, and just return the empty list. If the game is a Split node, we ask how x of type t can become either a value of type t1 or t2, for some t1 and t2 that split type t disjointly in two. Depending on the answer we output I or O and continue playing either the sub-game g1 or g2.

A decoder is also simple to write:

dec :: Game t → [Bit] → (t, [Bit])
dec (Single (Iso _ bld)) str = (bld (), str)
dec (Split _ _ _) [] = error "Input too short"
dec (Split (Iso _ bld) g1 g2) (I : xs) =
  let (x1, rest) = dec g1 xs in (bld (Left x1), rest)
dec (Split (Iso _ bld) g1 g2) (O : xs) =
  let (x2, rest) = dec g2 xs in (bld (Right x2), rest)

The decoder accepts a Game t and a bitstring of type [Bit]. If the input bitstring is too short to decode a value then dec raises an exception.³ Otherwise it returns a decoded value and the suffix of the input list that was not consumed. If the game is Single, then dec returns the unique value in t by applying the inverse map of the isomorphism to (). No bits are consumed, as no questions need answering! If the game is Split and the input list is non-empty then dec decodes the rest of the bitstring using sub-game g1 or g2 as the first bit is I or O respectively, building a value of t using the bld function of the isomorphism gadget.

{x} ≅ 1
singleIso :: a → ISO a ()
-- ∀ x:a, ISO {z | z = x} unit
singleIso x = Iso (const ()) (const x)

X ≅ Y + (X \ Y)
splitIso :: (a → Bool) → ISO a (Either a a)
-- ∀ p:a→bool, ISO a ({x | p x = true} + {x | p x = false})
splitIso p = Iso ask bld
  where ask x = if p x then Left x else Right x
        bld x = case x of Left y → y; Right y → y

B ≅ 1 + 1
boolIso :: ISO Bool (Either () ())
boolIso = Iso ask bld
  where ask True       = Left ()
        ask False      = Right ()
        bld (Left ())  = True
        bld (Right ()) = False

N ≅ 1 + N
succIso :: ISO Nat (Either () Nat)
succIso = Iso ask bld
  where ask 0         = Left ()
        ask (n+1)     = Right n
        bld (Left ()) = 0
        bld (Right n) = n+1

N ≅ N + N
parityIso :: ISO Nat (Either Nat Nat)
parityIso = Iso
  (λn → if even n then Left (n ‘div‘ 2) else Right (n ‘div‘ 2))
  (λx → case x of Left m → m*2; Right m → m*2+1)

X* ≅ 1 + X × X*
listIso :: ISO [t] (Either () (t,[t]))
listIso = Iso ask bld
  where ask []             = Left ()
        ask (x:xs)         = Right (x,xs)
        bld (Left ())      = []
        bld (Right (x,xs)) = x:xs

X* ≅ Σn : N. X^n
depListIso :: ISO [t] (Nat,[t])
-- ISO (list t) { n:nat & t^n }
depListIso = Iso ask bld
  where ask xs     = (length xs, xs)
        bld (n,xs) = xs

Figure 3: Some useful isomorphisms

2.2 Number games
These simple definitions already suffice for a range of numeric encodings. We make the Haskell definition

type Nat = Int

to document that our integers are non-negative. Where the Haskell type system isn’t rich enough to express precise invariants, we will put Coq types in comments, lifted directly from our Coq development.

Unary naturals. The game of Figure 1 can be expressed as follows:

geNatGame :: Nat → Game Nat
-- ∀ k:nat, Game { x | x ≥ k }
geNatGame k = Split (splitIso ((==) k))
                    (Single (singleIso k))
                    (geNatGame (k+1))
The function geNatGame returns a game for natural numbers greater than or equal to its parameter k. It consists of a Split node whose left subtree is a Singleton node for k, and whose right subtree is a game for values greater than or equal to k+1. The isomorphisms singleIso and splitIso are used to express singleton values and a partitioning of the set of values respectively. Their signatures and definitions are presented in Figure 3, along with some other basic isomorphisms that we shall use throughout the paper.
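As a quick sanity check of our own (not printed in the paper), geNatGame 0 plays exactly the unary game of Figure 1:

  > enc (geNatGame 0) 2
  [O,O,I]
  > dec (geNatGame 0) [O,O,I]
  (2,[])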
² The type variables t1 and t2 are existential variables, not part of vanilla Haskell 98, but supported by all modern Haskell compilers.
³ We could alternatively have dec return Maybe (t,[Bit]); this is indeed what our Coq formalization does.
In this game, the isomorphisms just add clutter to the code: one might ask why we didn’t define a Game type with elements at the leaves and simple predicates in the nodes. But isomorphisms show their true colours when they are used to map between different representations or possibly even different types of data.

Unary naturals, revisited. Consider this alternative game for natural numbers:

unitGame :: Game ()
unitGame = Single (Iso id id)

unaryNatGame :: Game Nat
unaryNatGame = Split succIso unitGame unaryNatGame

This time we’re exploiting the isomorphism N ≅ 1 + N, presented in Figure 3. Let’s see how it’s used in the game. When encoding a natural number n, we ask whether it’s zero or not using the forward map of the isomorphism to get answers of the form Left () or Right (n − 1), which capture both the yes/no ‘answer’ to the question and the data with which to continue playing the game. If the answer is Left () then we just play the trivial unitGame on the value (), otherwise we have Right (n − 1) and play the very same unaryNatGame for the value n − 1.

When decoding, we apply the inverse map of the isomorphism to build data with Left () or Right x as determined by the next bit in the input stream.

We can test our game using the generic enc and dec functions:

> enc unaryNatGame 3
[O,O,O,I]
> enc unaryNatGame 2
[O,O,I]
> dec unaryNatGame [O,O,I]
Just (2,[])

Finite ranges. How about the range encoding for natural numbers, sketched in Figure 2? That’s easy:

rangeGame :: Nat → Nat → Game Nat
-- ∀ m n : nat, Game { x | m ≤ x && x ≤ n }
rangeGame m n | m == n = Single (singleIso m)
rangeGame m n = Split (splitIso (λx → x > mid))
                      (rangeGame (mid+1) n)
                      (rangeGame m mid)
  where mid = (m + n) ‘div‘ 2

Let’s try it out:

> enc (rangeGame 0 15) 5
[O,I,O,I]
> dec (rangeGame 0 15) [O,I,O,I]
(5,[])

Binary naturals. The range encoding results in a logarithmic coding scheme, but only works for naturals in a finite range. Can we give a general logarithmic scheme for arbitrary naturals? Yes, and here is the protocol: we first ask if the number n is 0 or not, making use of succIso again. If yes, we are done. If not, we ask whether n − 1 is divisible by 2 or not, making use of parityIso from Figure 3 that captures the isomorphism N ≅ N + N. Here is the code:

binNatGame :: Game Nat
binNatGame = Split succIso unitGame
                   (Split parityIso binNatGame binNatGame)

We can test this game; for example:

> enc binNatGame 8
[O,O,O,I,O,I,I]
> dec binNatGame [O,O,O,I,O,I,I]
Just (8,[])
> enc binNatGame 16
[O,O,O,I,O,I,O,I,I]

After staring at the output for a few moments one observes that the encoding takes double the bits (plus one) that one would expect for a logarithmic code. This is because before every step, an extra bit is consumed to check whether the number is zero or not. The final extra I terminates the code. In the next section we explain how the extra bits result in prefix codes, a property that our methodology is designed to validate by construction. The accompanying Haskell code gives additional examples of games for natural numbers, including Elias codes [8], as well as codes based on prime factorization.

2.3 Game combinators

To build games for structured types we provide combinators that construct complex games from simple ones.

Constant. Our first combinator is trivial, making use of the isomorphism between the unit type and singletons.

constGame :: t → Game t
-- ∀ (k:t), Game { x | x = k }
constGame k = Single (singleIso k)

Cast. The combinator (+>) transforms a game for t into a game for s, given that s is isomorphic to t.

(+>) :: Game t → ISO s t → Game s
(Single j)      +> i = Single (i ‘seqI‘ j)
(Split j g1 g2) +> i = Split (i ‘seqI‘ j) g1 g2

What is seqI? It is a combinator on isomorphisms, which wires two isomorphisms together. In fact, combining isomorphisms in many ways is generally useful, so we define a small library of isomorphism combinators. Their signatures are given in Figure 4 and their implementation (and proof) is entirely straightforward.

Choice. It’s dead easy to construct a game for the sum of two types, if we are given games for each. The sumGame combinator is so simple that it hardly has a reason to exist as a separate definition:

sumGame :: Game t → Game s → Game (Either t s)
sumGame = Split idI

Composition. Suppose we are given a game g1 of type Game t and a g2 of type Game s. How can we build a game for the product (t,s)? A simple strategy is to play g1, the game for t, and at the leaves play g2, the game for s. Graphically, if g1 looks like the tree on the left, below, composing it with g2 produces the tree on the right.

[diagram: g1’s tree, with a copy of g2 grafted onto each of its leaves]

The prodGame combinator achieves this, as follows:

prodGame :: Game t → Game s → Game (t,s)
prodGame (Single iso) g2 =
  g2 +> prodI iso idI ‘seqI‘ prodLUnitI
prodGame (Split iso g1a g1b) g2 =
  Split (prodI iso idI ‘seqI‘ prodLSumI)
        (prodGame g1a g2)
        (prodGame g1b g2)
18
.. . g2
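As an illustration, here is a sketch of a few of these definitions, assuming the two-field representation of ISO suggested by constructor applications such as Iso id id elsewhere in the paper; note that the accessor from, as it is used later in depGame, appears to extract the backward map. singleIso and splitIso, used by constGame and rangeGame above, are genuine isomorphisms only on the restricted domains noted in those games' comments.

data ISO a b = Iso (a → b) (b → a)   -- forward (‘ask’) and backward (‘build’) maps

from :: ISO a b → (b → a)            -- backward map, as used by depGame below
from (Iso _ g) = g

idI :: ISO a a
idI = Iso id id

invI :: ISO a b → ISO b a
invI (Iso f g) = Iso g f

seqI :: ISO a b → ISO b c → ISO a c
seqI (Iso f g) (Iso f’ g’) = Iso (f’ ◦ f) (g ◦ g’)

prodI :: ISO a b → ISO c d → ISO (a,c) (b,d)
prodI (Iso f g) (Iso f’ g’) =
  Iso (λ(a,c) → (f a, f’ c)) (λ(b,d) → (g b, g’ d))

singleIso :: t → ISO t ()       -- isomorphism between {k} and ()
singleIso k = Iso (λ_ → ()) (λ_ → k)

splitIso :: (t → Bool) → ISO t (Either t t)   -- split a domain by a predicate
splitIso p = Iso (λx → if p x then Left x else Right x) (either id id)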
A ≅ A
A ≅ B ⇒ B ≅ A
A ≅ B ∧ B ≅ C ⇒ A ≅ C
A ≅ B ∧ C ≅ D ⇒ A × C ≅ B × D
A ≅ B ∧ C ≅ D ⇒ A + C ≅ B + D
A × B ≅ B × A
A + B ≅ B + A
A × (B × C) ≅ (A × B) × C
A + (B + C) ≅ (A + B) + C
1 × A ≅ A
A × 1 ≅ A
A × (B + C) ≅ (A × B) + (A × C)
(B + C) × A ≅ (B × A) + (C × A)

idI :: ISO a a
invI :: ISO a b → ISO b a
seqI :: ISO a b → ISO b c → ISO a c
prodI :: ISO a b → ISO c d → ISO (a,c) (b,d)
sumI :: ISO a b → ISO c d → ISO (Either a c) (Either b d)
swapProdI :: ISO (a,b) (b,a)
swapSumI :: ISO (Either a b) (Either b a)
assocProdI :: ISO (a,(b,c)) ((a,b),c)
assocSumI :: ISO (Either a (Either b c)) (Either (Either a b) c)
prodLUnitI :: ISO ((),a) a
prodRUnitI :: ISO (a,()) a
prodRSumI :: ISO (a,Either b c) (Either (a,b) (a,c))
prodLSumI :: ISO (Either b c,a) (Either (b,a) (c,a))

Figure 4: Isomorphism combinator signatures

Choice. It's dead easy to construct a game for the sum of two types, if we are given games for each. The sumGame combinator is so simple that it hardly has a reason to exist as a separate definition:

sumGame :: Game t → Game s → Game (Either t s)
sumGame = Split idI

Composition. Suppose we are given a game g1 of type Game t and a g2 of type Game s. How can we build a game for the product (t,s)? A simple strategy is to play g1, the game for t, and at the leaves play g2, the game for s. Graphically, composing g1 with g2 pastes a copy of g2 at every leaf of g1. The prodGame combinator achieves this, as follows:

prodGame :: Game t → Game s → Game (t,s)
prodGame (Single iso) g2 =
  g2 +> prodI iso idI ‘seqI‘ prodLUnitI
prodGame (Split iso g1a g1b) g2 =
  Split (prodI iso idI ‘seqI‘ prodLSumI)
        (prodGame g1a g2)
        (prodGame g1b g2)

If the game for t is a singleton node, then we play g2, which is the game for s. However, that will return a Game s, whereas we'd like a Game (t,s). But from the type of the Single constructor we know that t is the unit type (), and so we coerce g2 to the appropriate type using combinators from Figure 4 to construct an isomorphism between s and ((),s). In the case of a Split node, we are given an isomorphism iso of type ISO t (Either t1 t2) for unknown types t1 and t2, and we create a new Split node whose subtrees are constructed recursively, and whose isomorphism of type ISO (t,s) (Either (t1,s) (t2,s)) is again constructed using the combinators from Figure 4.
Lists. What can we do with prodGame? We can build more complex combinators, such as the following recursive listGame that encodes lists:

listGame :: Game t → Game [t]
listGame g = Split listIso unitGame (prodGame g (listGame g))

It takes a game for t and produces a game for lists of t. The question asked by listIso is whether the list is empty or not. If empty then we play the left sub-game – a singleton node – and if non-empty then we play the right sub-game, consisting of a game for the head of the list followed by the list game for the tail of the list. This is just the product prodGame g (listGame g).
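listIso itself is given in Figure 3 of the paper; a plausible definition, exploiting the isomorphism [t] ≅ 1 + (t × [t]), is the following sketch:

listIso :: ISO [t] (Either () (t,[t]))
listIso = Iso ask bld
  where ask []     = Left ()          -- the ‘is it empty?’ question
        ask (x:xs) = Right (x,xs)
        bld (Left ())      = []
        bld (Right (x,xs)) = x : xs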
Dependent composition. Suppose that, after having decoded a value x of type t, we wish to play a game whose strategy depends on x. For example, given a game for natural numbers, and a game for lists of a particular size, we could create a game for arbitrary lists paired up with their size. We can do this with the help of a dependent composition game combinator.

depGame :: Game t → (t → Game s) → Game (t,s)
-- Game t → (∀ x:t, Game (s x)) → Game {x:t & s x}
depGame (Single iso) f =
  f (from iso ()) +> prodI iso idI ‘seqI‘ prodLUnitI
depGame (Split iso g1a g1b) f =
  Split (prodI iso idI ‘seqI‘ prodLSumI)
        (depGame g1a (f ◦ from iso ◦ Left))
        (depGame g1b (f ◦ from iso ◦ Right))

The definition of depGame resembles the definition of prodGame, but notice how in the Single case we apply the f function to the singleton value to determine the game we must play next.
Lists, revisited. We can use depGame to create an alternative encoding for lists. Suppose we are given a function

vecGame :: Game t → Nat → Game [t]
-- Game t → ∀ n:nat, Game t^n

that builds a game for lists of the given length. Its definition should be straightforward and we leave it as an exercise for the reader; one possible solution is sketched below.
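Here is a sketch of one solution to the exercise, using the combinators above (consIso is our own name; it is an isomorphism only between pairs and lists of the right, non-zero, length):

vecGame :: Game t → Nat → Game [t]
vecGame g 0 = constGame []
vecGame g n = prodGame g (vecGame g (n-1)) +> consIso
  where consIso = Iso (λ(x:xs) → (x,xs))   -- partial on [], fine for length n > 0
                      (λ(x,xs) → x:xs)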
We can then define a game for lists paired with their length, and use the isomorphism depListIso from Figure 3 to derive a new game for lists, as follows:

listGame’ :: Game t → Game [t]
listGame’ g = depGame binNatGame (vecGame g) +> depListIso

Composition by interleaving. Recall that prodGame pastes copies of the second game in the leaves of the first game. An alternative approach is to interleave the bits of the two games. We illustrate this graphically, starting with example games given below:

(Diagram: two example games, with leaves α1, α2, α3 and β1, β2, β3.)

Interleaving the two games, starting with the left-hand game gives:

(Diagram: the interleaved game, whose leaves are the pairs αi, βj.)

The ilGame below does that by playing a bit from the game on the left, but always ‘flipping’ the order of the games in the recursive calls. Its definition is similar to prodGame, with the isomorphism plumbing adjusted appropriately:

ilGame :: Game t → Game s → Game (t,s)
ilGame (Single iso) g2 =
  g2 +> prodI iso idI ‘seqI‘ prodLUnitI
ilGame (Split iso g1a g1b) g2 =
  Split (swapProdI ‘seqI‘ prodI idI iso ‘seqI‘ prodRSumI)
        (ilGame g2 g1a)
        (ilGame g2 g1b)

The resulting encoding of product values of course differs between ilGame and prodGame, although it will use exactly the same number of bits.
3. Properties of games

Pearly code is all very well, but is it correct? In this section we study the formal properties of game-derived codecs, proving basic correctness and termination results, and also the every bit counts property of the title. All theorems have been proved formally using the Coq proof assistant.

3.1 Correctness

The following round-trip property follows directly from the ‘left inverse’ property of the isomorphisms embedded inside the games.

LEMMA 1 (Enc/Dec). Suppose g : Game t and x : t. If enc g x = ℓ then dec g (ℓ ++ ℓs) = (x, ℓs).

The lemma asserts that if x encodes to a bitstring ℓ, then the decoding of any extension of ℓ returns x together with the extension.
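Read operationally, Lemma 1 is an executable property; for a particular game it could be checked with QuickCheck [4], along the lines of the following sketch (prop_encDec is our name, not the paper's, and we assume Eq instances for t and Bit):

prop_encDec :: Eq t ⇒ Game t → t → [Bit] → Bool
prop_encDec g x ls = dec g (enc g x ++ ls) == (x, ls)

-- e.g. quickCheck (prop_encDec unaryNatGame) exercises the round trip
-- for random naturals and random bit suffixes.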
The literature on coding theory [19] emphasizes the essential property of codes being unambiguous: no two values are assigned the same code. This follows directly from Lemma 1.

COROLLARY 1 (Unambiguous codes). Suppose g : Game t and v, w : t. If enc g v = ℓ and enc g w = ℓ then v = w.

A stronger property that implies unambiguity is prefix-freedom: no prefix of a valid code can itself be a valid code. For prefix codes, we can stop decoding at the first successfully decoded value: no ‘lookahead’ is required. This property also follows from Lemma 1, or can be proved directly from the definition of enc.

COROLLARY 2 (Prefix encoding). Suppose g : Game t and v, w : t. If enc g v = ℓ and enc g w = ℓ ++ ℓs then v = w.

It is worth pausing for a moment to return briefly to the game binNatGame from Section 2.2. Observe that the ‘standard’ binary encoding for natural numbers is not a prefix code. For example the encoding of 3 is 11 and the encoding of 7 is 111. The extra bits inserted by binNatGame are necessary to convert the standard encoding to one which is a prefix encoding. The anticipated downside is the inserted ‘terminator’ bits that double the size of the encoding (but keep it Θ(log n)).

3.2 Termination

A close inspection of Lemma 1 reveals that the property is conditional on the termination of the encoder. Although in traditional coding theory termination of encoding for any value is taken for granted, it doesn't follow automatically for our game-based codecs.

Here is a problematic example of a somewhat funny game for the type Maybe Nat, appearing in Figure 5. At step i, the game asks whether the value in hand is Just i, or any other value in the type Maybe Nat. Notice that when asked to encode the value Nothing the encoder will simply play the game for ever, diverging.

Figure 5: Game for optional naturals. (Diagram: the game asks ‘= Just 0?’ of Maybe Nat; on ‘no’ it asks ‘= Just 1?’ of Maybe Nat \ {Just 0}, then ‘= Just 2?’ of Maybe Nat \ {Just 0, Just 1}, and so on.)

That's certainly no good! Fortunately, we can require games to be total, meaning that every element in the domain is represented by some leaf node.

DEFINITION 1 (Totality). A game g of type Game t is total iff for every value x of type t, there exists a finite path g ⇝ x, where ⇝ is inductively defined by the following rules:

Single (Iso a b) ⇝ b ()
if g1 ⇝ x1 then Split (Iso a b) g1 g2 ⇝ b (Left x1)
if g2 ⇝ x2 then Split (Iso a b) g1 g2 ⇝ b (Right x2)

The reader can check that, with the exception of the game in Figure 5, the games presented so far are total; furthermore the combinators on games preserve totality.

LEMMA 2 (Termination). Suppose g : Game t. If g is total then enc g terminates on all inputs.
3.3 Compactness

Lemma 1 guarantees basic correctness of game-based codes.4 But we can go further, and show how to construct codecs for which every bit counts, i.e. there are no ‘wasted’ bits. Consider the following trivial codec for booleans:

boolGame :: Game Bool
boolGame = Split boolIso unitGame unitGame

It encodes False as 0, and True as 1. You can't do better than that!

4 But, to be fair, sometimes lossy coding may be acceptable; for instance in video codecs.
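boolIso itself comes from Figure 3 of the paper; a plausible definition is the following sketch, with the Left/Right orientation chosen so that False encodes to 0 under the bit convention noted earlier (we have guessed this orientation; the paper may differ):

boolIso :: ISO Bool (Either () ())
boolIso = Iso ask bld
  where ask True       = Left ()     -- guessed orientation: True emits I (= 1)
        ask False      = Right ()
        bld (Left ())  = True
        bld (Right ()) = False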
Now consider a codec in which both 00 and 01 code for False, and 10 and 11 code for True. The second bit of this code is wasted, as the first bit uniquely determines the value. Fortunately, correct construction of games guarantees not only that two values will never be assigned the same code, but also that two codes cannot represent the same value. We show this by first proving another round-trip property that follows directly from the ‘right inverse’ property of isomorphisms.

LEMMA 3 (Dec/Enc). Suppose g : Game t. If dec g ℓ = (x, ℓs) then there exists ℓp such that enc g x = ℓp and ℓp ++ ℓs = ℓ.

Injectivity of decoding is a simple corollary.

COROLLARY 3 (Non-redundancy). Suppose dec g ℓ1 = (x, []) and dec g ℓ2 = (x, []). Then ℓ1 = ℓ2.

Unfortunately non-redundancy doesn't tell us that every bit counts. Consider a slight variation on the wasteful encoding of booleans above in which True is encoded as 11 and False as 00, and 01 and 10 are simply invalid. This corresponds to a question-answer game in which the question Are you True? is asked twice. We can write such a game, as follows:

-- precondition: t is uninhabited
voidGame :: Game t
voidGame = Split (splitIso (const True)) voidGame voidGame

badBoolGame :: Game Bool
badBoolGame = Split (splitIso id)
                    (Split (splitIso id) (constGame True) voidGame)
                    (Split (splitIso id) voidGame (constGame False))

It may take a little head-scratching to work out what's going on: the question expressed with splitIso id asks whether a boolean value is True or False and goes Left or Right respectively. But in both branches we ask the same question again, though we're now in a singleton set. Here's a session that illustrates the badBoolGame behaviour:

> enc badBoolGame False
[O,O]
> enc badBoolGame True
[I,I]
> dec badBoolGame [O,I]
(False,*** Exception: Input too short
> dec badBoolGame [I,O]
(True,*** Exception: Input too short

The first question asked by the game effectively partitions the booleans into {False} and {True}. But these are singletons, so any further questions would not reveal further information. If we do ask a question, using Split, then one branch must be dead, i.e. have a domain that is not inhabited – hence the use of voidGame in the code.

For domains more complex than Bool, such non-revealing questions are harder to spot. Suppose, for example, that in the game for programs described in the introduction, the first question had been ‘Are you a variable?’ Because we know that the program under inspection is closed, this question is silly, and we already know that the answer is no.

We call a game proper if every isomorphism in Split nodes is a proper splitting of the domain. Equivalently, we make the following definition.

DEFINITION 2 (Proper games). A game g of type Game t is proper iff for every subgame g’ of type Game s, type s is inhabited.

It is immediate that voidGame is not a proper game and consequently badBoolGame is not proper either. Codecs associated with proper games have a very nice property that justifies the slogan every bit counts: every possible bitstring either decodes to a unique value, or is the prefix of such a bitstring.

LEMMA 4 (Every bit counts). Let g be a proper and total Game t. Then, if dec g ℓ fails then there exists ℓs and a value x of type t such that enc g x = ℓ ++ ℓs.

The careful reader will have observed that this lemma requires that the game be not only proper, but also total. Consider the following variation of binNatGame from Section 2.2.

badNatGame :: Game Nat
badNatGame = Split parityIso badNatGame badNatGame

The question asked splits the input set of all natural numbers into two disjoint and inhabited sets: the even and the odd ones. However, there are no singleton nodes in badNatGame and hence Lemma 4 cannot hold for this game.

As a final observation, notice that even in a total and proper game with infinitely many leaves (such as the natural numbers game in Figure 1) there will be an infinite number of bit strings on which the decoder fails. By König's lemma, in such a game there must exist at least one infinite path, and the decoder will fail on all prefixes of that path.

3.4 Summary

Here is what we have learned in this section.

• Games constructed from valid isomorphisms give rise to codes that are unambiguous, prefix-free, non-redundant, and which satisfy a basic round-trip correctness property.
• The encoder terminates if and only if the game is total.
• If additionally the game is proper then every bitstring encodes some value or is the prefix of such a bitstring.

For the rest of this paper we embark on giving more ambitious and amusing concrete games for sets and λ-terms.
4. Sets and multisets

So far we have considered primitive and structured data types such as natural numbers, lists and trees, for which games can be constructed in a type-directed fashion. Indeed, we could even use generic programming techniques [12, 14] to generate games (and thereby codecs) automatically for such types.

But what about other structures such as sets, multisets or maps, in which implicit invariants or equivalences hold, and which our games could be made aware of? For example, consider representing sets of natural numbers using lists. We know (a) that duplicate elements do not occur, and (b) that the order doesn't matter when considering a list-as-a-set. We could use listGame binNatGame for this type. It would satisfy the basic round-tripping property (Enc/Dec); however, bits would be ‘wasted’ in assigning distinct codes to equivalent values such as [1,2] and [2,1], and in assigning codes to non-values such as [1,1].

In this section we show how to represent sets and multisets efficiently. First, we consider the specific case of sets and multisets of natural numbers, for which we can hand-craft a ‘delta’ encoding in which every bit counts. Next, we show how for arbitrary types we can use an ordering on values induced by the game for the type to construct a game for sets of elements of that type.

4.1 Hand-crafted games

How can we encode the multiset {3, 6, 5, 6}? We might start by ordering the values to obtain the canonical representation [3, 5, 6, 6]. But now imagine encoding this using a vanilla list-of-natural-numbers game listGame binNatGame: when encoding the second element, we would be wasting the codes for the values 0, 1, and 2, as none of these values can possibly follow 3 in the ordering. So instead of encoding the value 5 for the second element of the ordered list, we encode 2, the difference between the first two elements. Doing the same thing for the other elements, we obtain the list [3, 2, 1, 0], which we can encode using listGame binNatGame without wasting any bits. To decode, we reverse the process and add the differences.

We can apply the same ‘delta’ idea for sets, except that the delta is smaller by one, taking account of the fact that the difference between successive elements must be non-zero.

In Haskell, we implement diff and undiff functions that respectively compute and apply difference lists.

diff minus [] = []
diff minus (x:xs) = x : diff’ x xs
  where diff’ base [] = []
        diff’ base (x:xs) = minus x base : diff’ x xs

undiff plus [] = []
undiff plus (x:xs) = x : undiff’ x xs
  where undiff’ base [] = []
        undiff’ base (x:xs) = base’ : undiff’ base’ xs
          where base’ = plus base x

The functions are parameterized on subtraction and addition operations, and are instantiated with appropriate concrete operations to obtain games for finite multisets and sets of natural numbers, as follows:

natMultisetGame :: Game Nat → Game [Nat]
natMultisetGame g = listGame g +>
  Iso (diff (-) ◦ sort) (undiff (+))

natSetGame :: Game Nat → Game [Nat]
natSetGame g = listGame g +>
  Iso (diff (λx y → x-y-1) ◦ sort) (undiff (λx y → x+y+1))
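To see the delta arithmetic concretely, one can unfold the definitions above by hand on the running multiset, and on the set {3, 5, 6}:

-- multiset {3,6,5,6}: sort, then subtract neighbours
-- (the first element is kept verbatim, not a delta)
diff (-) (sort [3,6,5,6])            -- sort gives [3,5,6,6]
  == [3, 5-3, 6-5, 6-6]              -- == [3,2,1,0]
undiff (+) [3,2,1,0]                 -- == [3,5,6,6]

-- set {3,5,6}: deltas are additionally decremented, since they are non-zero
diff (λx y → x-y-1) (sort [3,5,6])  -- == [3, 5-3-1, 6-5-1] == [3,1,0]
undiff (λx y → x+y+1) [3,1,0]       -- == [3,5,6]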
Here is the multiset game in action, using our binary encoding of natural numbers on the example multiset {3, 6, 5, 6}.

> enc (listGame binNatGame) [3,6,5,6]
[O,O,I,O,I,I,O,O,O,O,O,I,O,O,I,O,O,I,O,O,O,O,I,I,I]
> enc (natMultisetGame binNatGame) [3,6,5,6]
[O,O,I,O,I,I,O,O,O,I,O,O,I,I,O,I,I]
> dec (natMultisetGame binNatGame) it
([3,5,6,6],[])

As expected, the encoding is more compact than a vanilla list representation. Observe that here the round-trip property holds up to equivalence of lists when interpreted as multisets: encoding [3,6,5,6] and then decoding it results in an equivalent but not identical value [3,5,6,6].

4.2 Generic games

That's all very well, but what if we want to encode sets of pairs, or sets of sets, or sets of λ-terms? First of all, we need an ordering on elements to derive a canonical list representation for the set. Conveniently, the game for the element type itself gives rise to natural comparison and sorting functions:

compareByGame :: Game a → (a → a → Ordering)
compareByGame (Single _) x y = EQ
compareByGame (Split (Iso ask bld) g1 g2) x y =
  case (ask x, ask y) of
    (Left x1,  Left y1)  → compareByGame g1 x1 y1
    (Right x2, Right y2) → compareByGame g2 x2 y2
    (Left x1,  Right y2) → LT
    (Right x2, Left y1)  → GT

sortByGame :: Game a → [a] → [a]
sortByGame g = sortBy (compareByGame g)

We can then use the list game on a sorted list, but at each successive element adapt the element game so that ‘impossible’ elements are excluded. To do this, we write a function removeLE that removes from a game all elements smaller than or equal to a particular element, with respect to the ordering induced by the game. If the resulting game would be empty, then the function returns Nothing.

removeLE :: Game a → a → Maybe (Game a)
removeLE (Single _) x = Nothing
removeLE (Split (Iso ask bld) g1 g2) x =
  case ask x of
    Left x1 → case removeLE g1 x1 of
      Nothing  → Just (g2 +> rightI)
      Just g1’ → Just (Split (Iso ask bld) g1’ g2)
    Right x2 → case removeLE g2 x2 of
      Nothing  → Nothing
      Just g2’ → Just (g2’ +> rightI)
  where rightI = Iso (λx → case ask x of Right y → y)
                     (bld ◦ Right)

The code for listGame can then be adapted to do sets:

setGame :: Game a → Game [a]
setGame g = setGame’ g +> Iso (sortByGame g) id
  where setGame’ g = Split listIso unitGame $
          depGame g $ λx →
            case removeLE g x of
              Just g’ → setGame’ g’
              Nothing → constGame []

Notice the dependent composition, which, once a value is determined, plays the game having removed all smaller elements from it.5

5 The $ notation is just Haskell syntactic sugar that allows applications to be written with fewer parentheses: f (h g) can be written as f $ h g.

5. Codes for programs

We're now ready to return to the problem posed in the introduction: how to construct games for programs. As with the games for sets described in the previous section, the challenge is to devise games that satisfy the every-bit-counts property, so that any string of bits represents a unique well-typed program, or is the prefix of such a code.

5.1 No types

First let's play a game for the untyped λ-calculus, declared as a Haskell datatype using de Bruijn indexing for variables:

data Exp = Var Nat | Lam Exp | App Exp Exp

For any natural number n the game expGame n asks questions of expressions whose free variables are in the range 0 to n − 1.

expGame :: Nat → Game Exp
expGame 0 = appLamG 0
expGame n = Split (Iso ask bld) (rangeGame 0 (n-1)) (appLamG n)
  where ask (Var i) = Left i
        ask e       = Right e
        bld (Left i)  = Var i
        bld (Right e) = e

If n is zero, then the expression cannot be a variable, so expGame immediately delegates to appLamG, which deals with expressions known to be non-variables. Otherwise, the game is Split between variables (handled by rangeGame from Section 2) and non-variables (handled by appLamG). The auxiliary game appLamG n works by splitting between application and lambda nodes:

appLamG n = Split (Iso ask bld)
                  (prodGame (expGame n) (expGame n))
                  (expGame (n+1))
  where ask (App e1 e2) = Left (e1,e2)
        ask (Lam e)     = Right e
        bld (Left (e1,e2)) = App e1 e2
        bld (Right e)      = Lam e

For application terms we play prodGame for the function and its argument. For the body of a λ-expression the game expGame (n+1) is played, incrementing n by one to account for the bound variable.

Let's run the game on the expression I K where I = λx.x and K = λx.λy.x.

> let tmI = Lam (Var 0)
> let tmK = Lam (Lam (Var 1))
> enc (expGame 0) (App tmI tmK)
[O,I,O,I,I,I,O,I]
> dec (expGame 0) it
(App (Lam (Var 0)) (Lam (Lam (Var 1))),[])

It's easy to validate by inspection the isomorphisms used in expGame. It's also straightforward to prove that the game is total and proper.
5.2 Simple types

We now move to the simply-typed λ-calculus, whose typing rules are shown in conventional form in Figure 6.

x:τ ∈ Γ
───────── VAR
Γ ⊢ x : τ

Γ ⊢ e1 : τ1 → τ2    Γ ⊢ e2 : τ1
──────────────────────────────── APP
Γ ⊢ e1 e2 : τ2

Γ, x:τ1 ⊢ e : τ2
──────────────────────── LAM
Γ ⊢ λx:τ1.e : τ1 → τ2

Figure 6: Simply-typed λ-calculus

In Haskell, we define a data type Ty for types and Exp for expressions, differing from the untyped language only in that λ-abstractions are annotated with the type of the argument:

data Ty = TyNat | TyArr Ty Ty deriving (Eq, Show)
data Exp = Var Nat | Lam Ty Exp | App Exp Exp

Type environments are just lists of types, indexed de Bruijn-style. It's easy to write a function typeOf that determines the type of an open expression under some type environment – assuming that it is well-typed to start with.

type Env = [Ty]

typeOf :: Env → Exp → Ty
typeOf env (Var i)   = env !! i
typeOf env (App e _) = let TyArr _ t = typeOf env e in t
typeOf env (Lam t e) = TyArr t (typeOf (t:env) e)
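For instance, unfolding the equations above:

typeOf [] (Lam TyNat (Var 0))
  == TyArr TyNat (typeOf [TyNat] (Var 0))   -- the Lam case extends the environment
  == TyArr TyNat ([TyNat] !! 0)
  == TyArr TyNat TyNat                      -- i.e. Nat → Nat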
We'd like to construct a game for expressions that have type t under some environment env. If possible, we'd like the game to be proper. But wait: there are combinations of env and t for which no expression even exists, such as the empty environment and the type TyNat. We could perhaps impose an ‘inhabitation’ precondition on the parameters of the game. But this only pushes the problem into the game itself, with sub-games solving inhabitation problems lest they ask superfluous questions and so be non-proper. As it happens, type inhabitation for the simply-typed λ-calculus is decidable but PSPACE-complete [20], which serves to scare us off!

We can make things easier for ourselves by solving a different problem: fix the type environment env (as before), but instead of fixing the type as previously, we will instead fix a pattern of the form τ1 → · · · → τn → ? where ‘?’ is a wildcard standing for any type. It's easy to show that for any environment env and pattern there exists an expression typeable under env whose type matches the pattern.

We can define such patterns using a data type Pat, and write a function that determines whether or not a type matches a pattern.

data Pat = Any | PArr Ty Pat

matches :: Pat → Ty → Bool
matches Any _ = True
matches (PArr t p) (TyArr t1 t2) = t1==t && matches p t2
matches _ _ = False
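For example, unfolding the definition:

matches (PArr TyNat Any) (TyArr TyNat (TyArr TyNat TyNat))
  == (TyNat == TyNat) && matches Any (TyArr TyNat TyNat)
  == True
-- but a non-arrow type does not match an arrow pattern:
matches (PArr TyNat Any) TyNat  ==  False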
Now let's play some games. Types are easy:

tyG :: Game Ty
tyG = Split (Iso ask bld) unitGame (prodGame tyG tyG)
  where ask TyNat           = Left ()
        ask (TyArr t1 t2)   = Right (t1,t2)
        bld (Left ())       = TyNat
        bld (Right (t1,t2)) = TyArr t1 t2

To define a game for typed expressions we start with a game for variables. The function varGame below accepts a predicate Ty → Bool and an environment, and returns a game for all those indices (of type Nat) whose type in the environment matches the predicate.

varGame :: (Ty → Bool) → Env → Maybe (Game Nat)
varGame f [] = Nothing
varGame f (t:env) =
  case varGame f env of
    Nothing → if f t then Just (constGame 0) else Nothing
    Just g  → if f t then Just (Split succIso unitGame g)
                     else Just (g +> Iso pred succ)

Notice that varGame returns Nothing when no variable in the environment satisfies the predicate. In all other cases it traverses the input environment. If the first type in the input environment matches the predicate and there is a possibility for a match in the rest of the input environment, varGame returns a Split that witnesses this possible choice. It is easy to see that when varGame returns some game, that game will be proper.

The function expGame accepts an environment and a pattern and returns a game for all expressions that are well-typed under the environment and whose type matches the pattern.

expGame :: Env → Pat → Game Exp
-- ∀ (env:Env) (p:Pat),
--   Game { e | ∃ t, env ⊢ e : t && matches p t = true }
expGame env p =
  case varGame (matches p) env of
    Nothing   → appLamG
    Just varG → Split varI varG appLamG
  where
    appLamG = Split appLamI appG (lamG p)
    appG = depGame (expGame env Any) $ λe →
             expGame env (PArr (typeOf env e) p)
    lamG (PArr t p) = prodGame (constGame t) $
                        expGame (t:env) p
    lamG Any = depGame tyG $ λt →
                 expGame (t:env) Any

    varI = Iso ask bld
      where ask (Var x) = Left x
            ask e       = Right e
            bld (Left x)  = Var x
            bld (Right e) = e

    appLamI = Iso ask bld
      where ask (App e1 e2) = Left (e2,e1)
            ask (Lam t e)   = Right (t,e)
            bld (Left (e2,e1)) = App e1 e2
            bld (Right (t,e))  = Lam t e

The expGame function first determines whether the expression can possibly be a variable, by calling varGame. If this is not possible (case Nothing) the game proceeds with appLamG, which will determine whether the non-variable expression is an application or a λ-abstraction. If the expression can be a variable (case Just varG) then we may immediately Split with varI by asking if the expression is a variable or not – if not we may play appLamG as in the first case.

The appLamG game uses appLamI to ask whether the expression is an application, and then plays game appG; or a λ-abstraction, and then plays game lamG. The appG game performs a dependent composition. After playing a game for the argument of the application, it binds the argument value to e and plays expGame for the function value, using the type of e to create a pattern for the function value. The lamG game analyses the pattern argument. If it is an arrow pattern we play a composition of the constant game for the type given by the pattern with the expression game for the body of the λ-abstraction in the extended environment. On the other hand, if the pattern is Any we first play game tyG for the argument type, bind the type to t and play expGame for the body of the abstraction using t to extend the environment.

That was it! Let's test expGame on the example expression from Section 1: λx:Nat.λy:Nat.x.

> let ex = Lam TyNat (Lam TyNat (Var 1))
> enc (expGame [] Any) ex
[O,I,O,O,I,I,O]
> dec (expGame [] Any) it
(Lam TyNat (Lam TyNat (Var 1)),[])

Compare the code with that obtained in the introduction. A perfect match – we have been using the same question scheme! Finally we can show properness and totality.6
PROPOSITION 1. For all patterns p and environments Γ, the game expGame Γ p is proper and total for the set of expressions e such that Γ ⊢ e : τ and τ matches the pattern p.

6 Since we do not have expGame in Coq, we've only shown this on paper, hence it's a Proposition and not a Theorem.

5.3 Stronger non-proper games for typed expressions

Let us be brave now and return to the original problem. Given any environment and type we will construct a game for expressions typeable in that environment with that type. As we have noted above, obtaining a proper game (and hence an every bit counts encoding) is difficult, but we can certainly obtain a game easily, without having to implement a type inhabitation solver, if we give up properness. The function expGameCheck below does that.

expGameCheck :: Env → Ty → Game Exp
-- ∀ (env:Env) (t:Ty), Game { e | env ⊢ e : t }
expGameCheck env t =
  case varGame (== t) env of
    Nothing   → appLamG t
    Just varG → Split varI varG (appLamG t)
  where
    appLamG TyNat = appG +> Iso (λ(App e1 e2) → (e2,e1))
                                (λ(e2,e1) → App e1 e2)
    appLamG (TyArr t1 t2) =
      let ask (App e1 e2) = Left (e2,e1)
          ask (Lam t e)   = Right e
          bld (Left (e2,e1)) = App e1 e2
          bld (Right e)      = Lam t1 e
      in Split (Iso ask bld) appG (lamG t1 t2)
    appG = depGame (expGame env Any) $ λe →
             expGameCheck env (TyArr (typeOf env e) t)
    lamG t1 t2 = expGameCheck (t1:env) t2

Similarly to expGame, expGameCheck first determines whether the expression can be a variable or not, and uses the variable game or the appLamG game next. The appLamG game in turn pattern-matches on the input type. If the input type is TyNat then we know that the expression can't possibly be a λ-abstraction, and hence we play the appG game. On the other hand, if the input type is an arrow type TyArr t1 t2 then the expression may be either an application or an abstraction. The application game appG as before plays a game for the argument of an application, binds it to e and recursively calls expGameCheck using the type of e. Interestingly, we use expGame env Any to determine the type of the argument – alternatively we could perform a dependent composition where the first thing would be to play a game for the argument type, and subsequently use that type to play a game for the argument and the function. The lamG game is straightforward.

There are no obvious empty types in this game – why is it non-proper? Consider the case when the environment is empty and the expected type is TyNat. According to expGameCheck the game to be played will be the appG game for applications. But there can't be any closed expressions of type TyNat to start with, and the game can't possibly have any leaves – something that we failed to check. We've asked a silly question (by playing appG) on an uninhabited type!

In other words the expGameCheck game is non-proper and hence violates the every bit counts property. On the other hand it's definitely a useful game and enjoys all the other properties we've been discussing in this paper. Happily, there is a way to convert non-proper games to proper games in many cases and we return to this problem in the next section.

6. Filtering games

Non-proper filtering. Sometimes it's convenient not to be proper. Using voidGame from Section 3.3 we can write filterGame, which accepts a game and a predicate on t and returns a game for those elements of t that satisfy the predicate.

filterGame :: (t → Bool) → Game t → Game t
-- ∀ (p : t → Bool), Game t → Game { x | p x }
filterGame p g@(Single (Iso _ bld)) =
  if p (bld ()) then g else voidGame
filterGame p (Split (Iso ask bld) g1 g2) =
  Split (Iso ask bld)
        (filterGame (p ◦ bld ◦ Left) g1)
        (filterGame (p ◦ bld ◦ Right) g2)

It works by inserting voidGame in place of all singleton nodes that do not satisfy the filter predicate. We may, for instance, filter a game for natural numbers to obtain a game for the even natural numbers.

> enc (filterGame even binNatGame) 2
[I,I,O]
> dec (filterGame even binNatGame) [I,I,O]
(2,[])

Naturally, since the game is no longer proper, decoding can fail:

> dec (filterGame even binNatGame) [I,O,I,O,O,I,I,I,I]
(*** Exception: Input too short

Moreover, for the above bitstring, no suffix is sufficient to convert it to a valid code – we have entered the voidGame non-proper world. What is so convenient about the non-proper filterGame implementation? First, the structure of the original encoding is intact, with only some codes being removed. Second, it avoids hard inhabitation questions that may involve theorem proving or search.

Proper finite filtering. Now let's recover properness, with the following variant on filtering:

filterFinGame :: (t → Bool) → Game t → Maybe (Game t)
-- ∀ (p : t → Bool), Game t → option (Game { x | p x })
filterFinGame p g@(Single (Iso _ bld)) =
  if p (bld ()) then Just g else Nothing
filterFinGame p (Split iso@(Iso ask bld) g1 g2) =
  case (filterFinGame (p ◦ bld ◦ Left) g1,
        filterFinGame (p ◦ bld ◦ Right) g2) of
    (Nothing,  Nothing)  → Nothing
    (Just g1’, Nothing)  → Just $ g1’ +> iso1
    (Nothing,  Just g2’) → Just $ g2’ +> iso2
    (Just g1’, Just g2’) → Just $ Split iso g1’ g2’
  where fromLeft  (Left x)  = x
        fromRight (Right x) = x
        iso1 = Iso (fromLeft ◦ ask)  (bld ◦ Left)
        iso2 = Iso (fromRight ◦ ask) (bld ◦ Right)

The result of applying filterFinGame is of type Maybe (Game t). If no elements in the original game satisfy the predicate, then filterFinGame returns Nothing; otherwise it returns Just a game for those elements of t satisfying the predicate. In contrast to filterGame, though, filterFinGame preserves properness: if the input game is proper, then the result game is too. It does this by eliminating Split nodes whose subgames would be empty.

There is a limitation, though, as its name suggests: filterFinGame works only on finite games. This can be inferred from the observation that filterFinGame explores the game tree in a depth-first manner. Nevertheless, for such finite games we can use it profitably to obtain efficient encodings:
> enc (fromJust (filterFinGame even (rangeGame 0 7))) 4
[I,O]

Compare this to the original encoding before filtering:

> enc (rangeGame 0 7) 4
[I,O,O]

Proper infinite filtering. What about infinite domains, as is typically the case for recursive types? Can we implement a filter on games that produces proper games for such types? The answer is yes, if we are willing to drastically change the original encoding that the game expressed, and if that original game has infinitely many leaves that satisfy the filter predicate. Here is the idea, not given here in detail for reasons of space, but implemented in the accompanying code as function filterInfGame: perform a breadth-first traversal of the original game, and each time you encounter a new singleton node (that satisfies the predicate) insert it into a right-spined tree. (Diagram: a game whose singleton leaves α1, α2, α3, ... are re-inserted, in breadth-first order, into a right-spined tree.)

The ability to become proper in this way can help us recover proper games for simply-typed expressions of a given type in a given environment, from the weaker games that expGameCheck of Section 5.3 produces, if we have a precondition that there exists one expression of the given type in the given environment. If there exists one expression of the given type in the given environment, there exist infinitely many, and hence the expGameCheck game has infinitely many inhabitants. Consequently it is possible to rebalance it in the described way to obtain a proper game for simply-typed expressions!

expGameCheckProper env t =
  filterInfGame (λ_ → True) (expGameCheck env t)
7. Discussion

Practicality. There is no reason to believe that the game-based approach is suitable only for theoretical investigations but not for ‘real’ implementations. To test this hypothesis we intend to apply the technique to a reasonably-sized compiler intermediate language such as Haskell Core [22] or .NET CIL [7]. (We've already created an every-bit-counts codec for ML-style let-polymorphism.)

Determining the space complexity of games is somewhat tricky: as we navigate down the tree, pointers to thunks representing both the left and the right subtrees are kept around, although only one of the two pointers is relevant. An optimization would involve embedding the next game to be played inside the isomorphism, by making the ask functions return not only a split but also, for each alternative (left or right), a next game to play on. Hence only the absolutely relevant parts of the game would be kept around during encoding and decoding. This representation could then be subject to the optimizations described in stream fusion work [5]. For this paper, though, our goal has been to explain the semantics of games and not their optimization, and hence we used the easier-to-grasp definition of a game as just a familiar tree datatype. It's also worth noting that the encoding and decoding functions can be specialized by hand for particular games, eliminating the game construction completely. For a trivial example, consider inlining unaryNatGame into enc, performing a few simplifications, to obtain the following code:
encUnaryNat x = case x of
  0   → I : []
  n+1 → O : encUnaryNat n
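The corresponding specialized decoder, our own sketch obtained by inverting encUnaryNat, would be:

decUnaryNat :: [Bit] → (Nat, [Bit])
decUnaryNat (I:bs) = (0, bs)                 -- terminator: the number is zero
decUnaryNat (O:bs) = let (n, bs’) = decUnaryNat bs
                     in (n+1, bs’)           -- one O per successor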
Compression. For reasons of space, we have compressed away any discussion of classic techniques such as Huffman coding. In the accompanying code, however, the reader can find a function huffGame that accepts a list of frequencies associated with elements of type t and returns a Game t constructed using the Huffman technique. Adaptive (or dynamic) Huffman encoding is achieved using just two more lines of Haskell! Investigation of other compression techniques using games remains future work. In particular, we would like to integrate arithmetic coding, for which slick Haskell code already exists [2]. It would also be interesting to make use of statistical models in our games for typed programs [3], producing codes that are even more compact than is attained purely through the use of type information.

Test generation. Test generation tools such as QuickCheck [4] are a potential application of game-based decoding, since generating random bitstrings amounts to generating programs. As a further direction for research, we would like to examine how the programmer could affect the distribution of the generated programs, by tweaking the questions asked during a game.

Program development and verification in Coq. Our attempts to encode everything in this paper in Coq tripped over Coq's limited support for co-recursion, namely the requirement that recursive calls be guarded by constructors of coinductive data types [1]. In many games for recursive types the recursive call was under a use of a combinator such as prodGame, which was itself guarded. Whereas it is easy to show on paper that the resulting co-fixpoint is well-defined (because it is productive), Coq does not admit such definitions. On the positive side, using the proof obligation generation facilities of Program [21] was a very pleasant experience. Our Coq code in many cases has been a slightly more verbose version of the Haskell code (due to the more limited type inference), but the isomorphism obligations could be proven on the side. Our overall conclusion from the experience is that Coq itself can become a very effective development platform but it would benefit from better support for more general patterns of recursion, co-recursion, and type inference.

8. Related work

Our work has strong connections to Kennedy's pickler combinators [16]. There, a codec was represented by a pair of encoder and decoder functions, with codecs for complex types built from simple ones using combinators. The basic round-trip property (Enc/Dec) was considered informally, but stronger properties were not studied. Before developing the game-based codecs, we implemented by hand encoding and decoding functions for the simply-typed λ-calculus. Compared to the game presented in Section 5, the code was more verbose – partly because out of necessity both encoder and decoder used the same ‘logic’. In our opinion, games are more succinct representations of codecs, and are easier to verify, requiring only local reasoning about isomorphisms. Note that other related work [6] identifies and formally proves similar round-trip properties for encoders and decoders in several encryption schemes.

One can think of games as yet another technique for datatype-generic programming [12], where one of the most prominent applications is generic marshalling and unmarshalling. Many of the approaches to datatype-generic programming [14] are based on structural representations of datatypes, typically as fixpoints of functors consisting of sums and products. It is straightforward to derive automatically a default ‘structural’ game for recursive and polymorphic types. On the other hand, games are convenient for expressing semantic aspects of the values to be encoded and decoded, such as naturals in a given range. Moreover, the state of a game and therefore the codes themselves can be modified as the game progresses, which is harder (but not impossible, perhaps through generic views [15]) in datatype-generic programming techniques.

Another related area of work is data description languages, which associate the semantics of types to their low-level representations [9]. The interpretation of a datatype is a coding scheme for values of that datatype. There, the emphasis is on avoiding having to write encode and decode functions manually. Our goal is slightly different; more related to the properties of the resulting coding schemes and their verification rather than the ability to automatically derive encoders and decoders from data descriptions.

Though we have not seen games used for writing and verifying encoders and decoders, tree-like structures have been proposed as representations of mathematical functions. Ghani et al. [11] represent continuous functions on streams as binary trees. In our case, thanks to the embedded isomorphisms, the tree structures represent at the same time both the encode and the decode functions.

Other researchers have investigated typed program compression, claiming high compression ratios for every-bit-counts (and hence tamper-proof) codes for low-level bytecode [13, 10]. Although that work is not formalized, it is governed by the design principle of only asking questions that ‘make sense’. That is precisely what our properness property expresses, which provably leads to every bit counts codes. Also closely related is the idea behind oracle-based checking [18] in proof-carrying code [17]. The motivation there is to eliminate proof search for untrusted software and reduce the size of proof encodings. In oracle-based checking, the bitstring oracle guides the proof checker in order to eliminate search and unambiguously determine a proof witness. Results report an improvement of a factor of 30 in the size of proof witnesses compared to their naïve syntactic representations. Although not explicitly stated in this way, oracle-based checking really amounts to some game for well-typed terms in a variant of LF.

Acknowledgments

The authors appreciated the lively discussions on this topic at the ‘Type Systems Wrestling’ event held weekly at MSR Cambridge. Special thanks to Johannes Borgström for his helpful feedback, and to the anonymous reviewers for many helpful suggestions.
References

[1] Y. Bertot and P. Casteran. Interactive Theorem Proving and Program Development. Springer-Verlag, 2004.
[2] R. Bird and J. Gibbons. Arithmetic coding with folds and unfolds. In J. Jeuring and S. Peyton Jones, editors, Advanced Functional Programming 4, volume 2638 of Lecture Notes in Computer Science, pages 1–26. Springer-Verlag, 2003. Code available at http://www.comlab.ox.ac.uk/oucl/work/jeremy.gibbons/publications/arith.zip.
[3] J. Cheney. Statistical models for term compression. In DCC ’00: Proceedings of the Conference on Data Compression, page 550, Washington, DC, USA, 2000. IEEE Computer Society.
[4] K. Claessen and J. Hughes. QuickCheck: a lightweight tool for random testing of Haskell programs. In ICFP ’00: Proceedings of the fifth ACM SIGPLAN International Conference on Functional Programming, pages 268–279, New York, NY, USA, 2000. ACM.
[5] D. Coutts, R. Leshchinskiy, and D. Stewart. Stream fusion: from lists to streams to nothing at all. In ICFP ’07: Proceedings of the 12th ACM SIGPLAN International Conference on Functional Programming, pages 315–326, New York, NY, USA, 2007. ACM.
[6] J. Duan, J. Hurd, G. Li, S. Owens, K. Slind, and J. Zhang. Functional correctness proofs of encryption algorithms. In Logic for Programming, Artificial Intelligence and Reasoning (LPAR), volume 3835 of LNCS, pages 519–533. Springer, 2005.
[7] ECMA. Standard ECMA-335: Common language infrastructure (CLI), 2006.
[8] P. Elias. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, 21(2):197–203, 1975.
[9] K. Fisher, Y. Mandelbaum, and D. Walker. The next 700 data description languages. SIGPLAN Not., 41(1):2–15, 2006.
[10] M. Franz, V. Haldar, C. Krintz, and C. H. Stork. Tamper-proof annotations by construction. Technical Report 02-10, Dept of Information and Computer Science, University of California, Irvine, March 2002.
[11] N. Ghani, P. Hancock, and D. Pattinson. Representations of stream processors using nested fixed points. Logical Methods in Computer Science, 5(3), 2009.
[12] J. Gibbons. Datatype-generic programming. In R. Backhouse, J. Gibbons, R. Hinze, and J. Jeuring, editors, Datatype-Generic Programming, volume 4719 of LNCS, chapter 1, pages 1–71. Springer, Berlin, Heidelberg, 2007.
[13] V. Haldar, C. H. Stork, and M. Franz. The source is the proof. In NSPW ’02: Proceedings of the 2002 workshop on New security paradigms, pages 69–73, New York, NY, USA, 2002. ACM.
[14] R. Hinze, J. Jeuring, and A. Löh. Comparing approaches to generic programming in Haskell. In Spring School on Datatype-Generic Programming, 2006.
[15] S. Holdermans, J. Jeuring, A. Löh, and A. Rodriguez. Generic views on data types. In T. Uustalu, editor, Proceedings of the 8th International Conference on Mathematics of Program Construction, MPC ’06, volume 4014 of LNCS, pages 209–234. Springer, 2006.
[16] A. J. Kennedy. Functional Pearl: Pickler Combinators. Journal of Functional Programming, 14(6):727–739, October 2004.
[17] G. C. Necula and P. Lee. The design and implementation of a certifying compiler. In PLDI ’98: Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation, pages 333–344, New York, NY, USA, 1998. ACM.
[18] G. C. Necula and S. P. Rahul. Oracle-based checking of untrusted software. In POPL ’01: Proceedings of the 28th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 142–154, New York, NY, USA, 2001. ACM.
[19] D. Salomon. A Concise Introduction to Data Compression. Undergraduate Topics in Computer Science. Springer, 2008.
[20] M. H. Sørensen and P. Urzyczyn. Lectures on the Curry-Howard Isomorphism, Volume 149 (Studies in Logic and the Foundations of Mathematics). Elsevier Science Inc., New York, NY, USA, 2006.
[21] M. Sozeau. Subset coercions in Coq. In Selected papers from the International Workshop on Types for Proofs and Programs (TYPES ’06), pages 237–252. Springer, 2006.
[22] M. Sulzmann, M. Chakravarty, and S. Peyton Jones. System F with type equality coercions. In ACM Workshop on Types in Language Design and Implementation (TLDI). ACM, 2007.
ReCaml: Execution State as the Cornerstone of Reconfigurations

Jérémy Buisson
Université Européenne de Bretagne
Écoles de St-Cyr Coëtquidan / VALORIA
Guer, France
[email protected]

Fabien Dagnat
Université Européenne de Bretagne
Institut Télécom / Télécom Bretagne
Plouzané, France
[email protected]

Abstract

To fix bugs or to enhance a software system without service disruption, one has to update it dynamically during execution. Most prior dynamic software updating techniques require that the code to be changed is not running at the time of the update. However, this restriction precludes any change to the outermost loops of servers, OS scheduling loops and recursive functions. Permitting a dynamic update to more generally manipulate the program's execution state, including the runtime stack, alleviates this restriction but increases the likelihood of type errors. In this paper we present ReCaml, a language for writing dynamic updates to running programs that views execution state as a delimited continuation. ReCaml includes a novel feature for introspecting continuations called match cont which is sufficiently powerful to implement a variety of updating policies. We have formalized the core of ReCaml and proved it sound (using the Coq proof assistant), thus ensuring that state-manipulating updates preserve type-safe execution of the updated program. We have implemented ReCaml as an extension to the Caml bytecode interpreter and used it for several examples.

Categories and Subject Descriptors D.3.2 [Programming Languages]: Language Classifications—Applicative (functional) languages; D.3.3 [Programming Languages]: Language Constructs and Features—Control structures; D.3.4 [Programming Languages]: Processors—Compilers; F.3.2 [Logics and Meanings of Programs]: Semantics of Programming Languages

General Terms Languages

Keywords dynamic software updating, continuation, functional language, execution state introspection, static typing, Caml
1. Introduction

Stopping a critical and long-running system may not be possible, or more simply not acceptable, as it would incur an excessive financial or human cost. Dynamic software updating technology addresses this challenge by enabling updates to running software, including bugfixes, feature additions, or even temporary instrumentation for diagnosis or performance tuning [3, 28].
when updating a running software is to ensure safety. After an update, the modified software must remain consistent and continue to achieve its goals. Final results must not be compromised even if intermediate results are reused in a different context. The Gmail outage in February 2009 [11] has shown possible consequences of unsafe updates: an update of the data placement service inconsistent with the redundancy strategy has caused a global denial of service. Much prior work on dynamic software updating has observed that forms of safety (such as type safety) can be ensured by restricting updates to active system components [1, 9, 19, 22, 36]. For example, if an update attempts to fix a bug in function foo, then the update may be rejected if foo happens to be on the call stack. Baumann et al. [7] and Arnold and Kaashoek [4] report that for an OS kernel, up to 80% to 90% of the security fixes are supported by this approach. However, it happens that a function that never becomes passive, potentially in critical parts of the software system, needs to be updated. Not being able to update actively running functions prevents for instance updating the outermost loop of a server. Extracting loop bodies into separate functions [28] makes the code briefly inactive between each iteration. However, this technique does not solve any of the following additional cases. The primary Linux scheduler function is never passive as it is on the stack of all threads [4]. Baumann et al. [7] also mention exception handlers in a kernel which may need update at runtime [26]. The use of some compilers that squash software structure makes the situation even worse. For example, synchronous languages, used to program embedded systems, rely on compilers [2] that interleave instructions coming from many components and depending on the same input data into a single block of code. The compiled software structure thus causes what were once independent source-code units to be considered active when any one of them is. In order to support more updates, Hofmeister and Purtilo [20] have proposed to focus on the execution state rather than the program structure. Upon update, the runtime stack is captured, adjusted then restored. Because the stack is appropriately handled, it does not matter if some of the updated functions are actively running. However, this approach has currently no formal semantics and provides no guarantee that update developers will not produce type-incorrect states. This paper places the execution state approach [10, 20, 25] on safer ground by defining ReCaml, a functional language designed for manipulating execution states in a safe manner. We have defined ReCaml formally and proved it sound. Viewing the execution state as a delimited continuation [15], updating a computation consists in capturing, modifying and reinstating a continuation. To support the modification of a continuation, we define a new “match cont” pattern-matching operator. It matches a continuation with call sites
2.1
to decompose it in stack frames peforming specific update actions on each of them. Depending on the execution state, the update programmer specifies the action to apply, e.g., discarding a frame, modifying a frame or keeping it unmodified. Combining such actions, the approach is flexible enough to support many policies, such as completing the computation at the old version, combining old results with subsequent new computation, or discarding old results for recomputing entirely at the new version. Attaching types to call sites allows us to check that the “match cont” operator is well typed, and therefore that stack introspection is correct. The main contributions of our work are:
• Explicit execution state management. Updates are expressed as manipulations of execution states. The work of update developers focuses mainly on this aspect, which we call compensation. In doing so, a developer can obtain deterministic behaviors by explicitly controlling the operations executed by the update depending on its timing.

• Optimistic update. As a consequence of the previous point, updates can occur at any time. A compensation ensures consistency afterwards, according to the execution state at the time of the update. Therefore, no preventive action (such as waiting for elements of the software to become inactive) is required. In addition, even if updates might not be effective immediately, they are executed with no delay.

• DSU as manipulation of delimited continuations. While continuations are common when studying languages and modelling exceptions and coroutines, they had not previously been used for dynamic software updating. Relying on continuations, ReCaml requires neither source code transformation nor a specific compilation scheme. DSU as manipulation of continuations fits nicely within a functional framework.

• Formal semantics and static type system. ReCaml comes with operators for capturing, modifying and reinstating continuations. It is equipped with a formal operational semantics. Although it is aimed at manipulating execution states, which are dynamic structures, ReCaml, and especially its continuation manipulation, is statically typed. The type system is proved sound using the Coq theorem prover.

• Working prototype. We have developed a prototype of ReCaml, which we have used to implement a few examples.

In Section 2, we first present concrete update strategies that motivate our approach. Section 3 outlines the approach itself. Section 4 describes in detail ReCaml, the formal language underlying our approach. Section 5 discusses implementation issues.

2. Update Complexity vs Application Simplicity

2.1 Initial Remarks and Overall Approach

In this section, our aim is to convince the reader that updates can be so complex that the search for sophisticated solutions is justified. We are aware that supporting tools will be required in order to ease the use of the proposed solution; we leave this problem to subsequent work, beyond the scope of this paper. Our argumentation relies on a program computing a Fibonacci number. This very simple toy example is intended as a proof of concept, illustrating the difficulties of updating a program that is still active at the time of the update. If updates are already complex for such a simple program, matters can only be worse for real applications. The initial version of our example is:

    let rec fib n = if n < 2 then n else (fib (n-1)) + (fib (n-2))
    in fib 12345

There is no point in splitting this code into finer structural elements¹. The program is built around a single recursive function, whose outermost invocation completes only when the whole program terminates. Hence trying to passivate the fib function makes no sense. If old and new versions may be mixed, dynamic rebinding [12, 14] obviously solves the problem: active calls complete with the old version while new calls are directed to the new one. Usually, this assumption implies that the type of the rebound function does not change; if the type of the fib function changes, rebinding it breaks consistency.

¹ Except possibly abstracting arithmetic operations in the integer data type. Here, the abstract data type is implicit, as with Haskell's Num type class.

An update therefore has to deal with the current execution state, which corresponds to the stack of calls already started, together with their arguments. Such ongoing calls are called activations in the rest of the paper. Updating a function requires specifying the action that handles each activation. Such specifications are called compensations². For example, updating a function f of type τ1 → τ2 while changing its type to τ1′ → τ2′ may require converting its argument to the new type (τ1′), or converting its result so that it can be used by code expecting values of the old type (τ2). More generally, a compensation can:

² Makris and Bazzi [25] use the name stack/continuation transformer and Gupta et al. [18] use state mapping. Being functional, ReCaml does not allow in-place modification of a continuation but favors the construction of a new future. Hence we prefer a new name, to avoid misunderstanding.

• yield to the activation, hence executing the old version until the completion of the activation. The result may need to be converted to conform to the new type of its calling activation, if that type has changed. Note that this is the semantics of Erlang [14], Java HotSwap [12] and, more generally, of dynamic rebinding, where the result conversion is the identity function.

• cancel the activation, hence starting the call over with the new version. Call parameters shall be converted according to the new version. The result shall also be converted according to how the compensation handles the calling activation.

• extract intermediate results from the activation in order to feed some custom code. Depending on how the calling activation is compensated, this custom code computes the new result in place of the canceled activation.

The relative worth of each strategy depends on the time at which the update occurs. For example, if the considered activation is close to completion, then it may be worthwhile to let it complete its execution. If the activation has started recently, then it may be better to abort it and start over. If the update occurs in the middle of the execution period, then the third option may be the most appropriate. In the third option, the amount of reusable intermediate results varies depending on the old and new versions. The extreme case where no intermediate result can be reused matches the second option, i.e., aborting the activation and starting the call over. The quantity of reusable results thus gives an additional hint for choosing the most advantageous option.
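The three options can be summarised as a small datatype. The following sketch is ours, for illustration only; none of these names belong to ReCaml:

    (* Illustrative sketch of the three compensation actions. 'a is the
       type of the value expected by the calling activation. *)
    type 'a compensation =
      | Yield                    (* let the old version run to completion,
                                    then convert its result *)
      | Cancel                   (* start the call over with the new version *)
      | Extract of (unit -> 'a)  (* custom code computing the new result from
                                    extracted intermediate results *)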
2.2 Replacing the Type of Integers

We first emphasize the problems arising when modifying a type. As the computed Fibonacci number becomes large, using fixed-size integers results in an overflow. It is safer to use arbitrary-precision integers instead. The new version of the program is³:

³ In the Caml libraries, num_of_int is the function that converts an integer to arbitrary precision; +/ is the addition over arbitrary-precision integers.

    let rec fib n = if n < 2 then num_of_int n
                    else (fib (n-1)) +/ (fib (n-2))

Obviously, dynamic rebinding forbids this update, as the type of fib changes and there is at least one active call. Assuming that the integer data type has been well abstracted, one possible strategy consists in updating this data type, as Gilmore et al. [16] and Neamtiu et al. [28] do. This approach has two major drawbacks. First, it updates all the uses of integers, while we want only the result of the fib function to pay the overhead of arbitrary-precision integers. Second, at the time of the update, some of the executions of the fib function might already have produced overflowed integers. A systematic update of all integers has no way to distinguish the overflowed values, which must be recomputed.

One possible update is as follows. Given an activation, if none of the recursive calls has been evaluated, then the activation can start over with the new version of the function. Otherwise, the compensation checks intermediate results in order to detect whether an overflow has occurred. Only non-overflowed results are converted to the new type; overflowed or missing results are computed using the new version. Last, the compensation uses the arbitrary-precision operator to perform the addition. The compensation handles caller activations in a similar way, taking into account the fact that the type of the call result has already been converted. The code of this update is outlined in Section 3 and detailed in Section 6 to illustrate ReCaml.
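As a foretaste of the full code in Figure 7, the overflow test at the heart of this compensation is just a small helper. In the sketch below, which is ours, fib_num names the new arbitrary-precision version, following the naming of Figure 7:

    (* Reuse an intermediate result r = fib n when it is known to be
       correct; otherwise recompute it with the new version. As derived
       in Section 3, rank 44 is the largest rank whose Fibonacci number
       fits in 31-bit signed integers. *)
    let ifnotover n r =
      if n > 44 then fib_num n   (* r has overflowed: recompute *)
      else num_of_int r          (* r is correct: just convert it *)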
2.3 Introducing Memoization

Second, we emphasize the difficulties that occur when changing the algorithmic structure. For our example, there is a well-known algorithm with linear time complexity, while the initial one has exponential time complexity. The new version of the program is⁴:

⁴ To keep the program simple, we extend Caml with 1/ to denote the arbitrary-precision literal 1, similarly to the +/ notation for arbitrary-precision addition.

    let rec fib' n i fi fi1 = if i = n then fi
                              else fib' n (i+1) (fi +/ fi1) fi in
    let fib n = if n < 2 then num_of_int n
                else fib' n 2 1/ 1/
We can safely mix new and old versions and dynamically rebind the name fib, as the type of the function is unchanged. However, in this case, the effective behavior still has polynomial time complexity: in the worst case, there is a stack of n activations of the old function, each of which subsequently performs up to one call to the new version. The effective behavior is worse than aborting and starting the program over, which is not satisfactory. A better way to perform this update is to look for two consecutive Fibonacci numbers among the intermediate results. The new version is then evaluated from the greatest such pair, passed as parameters to the fib' function. If there is no such pair, it is not worth reusing any intermediate result and the program should rather start over.
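The search for a reusable pair can be sketched in a few lines. This sketch is ours, for illustration; it assumes the harvested intermediate results have already been converted to arbitrary precision:

    (* Intermediate results as (rank, value) pairs gathered from the
       activations. Resume fib' from the greatest rank i for which both
       fib i and fib (i-1) are available; otherwise report that the
       computation should start over. *)
    let restart_from results n =
      let rec search i =
        if i < 2 then None       (* no reusable pair: start over *)
        else
          match List.assoc_opt i results, List.assoc_opt (i - 1) results with
          | Some fi, Some fi1 -> Some (fib' n i fi fi1)
          | _ -> search (i - 1)
      in
      search n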
2.4 Discussion

Using these two simple examples, we aim at showing that updating software at runtime, and in the right way, is a difficult task. There is no general scheme that applies well to all cases. In the first case (Section 2.2), each activation is converted to the new version independently of the others. In the second case (Section 2.3), as the algorithm changes radically, all of the activations are cancelled and there is a lookup for specific intermediate results. These update schemes are complex despite the simplicity of the application.

In addition, our examples show that even for a single application, the right scheme depends on the update itself. This is why we argue in favor of a mechanism that allows developers to design specific schemes for each update. This approach does not prevent proposing some update schemes "off-the-shelf", e.g., relying on tools such as code generators, thus avoiding burdening developers when possible. Makris and Bazzi [25], for instance, have already proposed such automatic generation strategies.

3. Overview of the Approach

In the above examples, the key mechanism is the ability to introspect activations when updating. The updates of Section 2 require intermediate results from activations. They also need to identify what has been done and what still has to be evaluated in each activation. For the implementer, this means that we need a mechanism to reify the state of the execution, including the call stack. To achieve this, we use continuations to model activations and we propose a new pattern-matching operator match_cont, abbreviated as mc. Given a continuation, it matches the return address of the top stack frame as an indication of what remains to be done in the activation. It pops this stack frame and picks values from it in order to retrieve intermediate results. To do this, we extend the semantics with low-level details of the dynamics of the runtime stack.

In the following, we give an overview of how this operator helps in the fib example (Section 2.2). Here we give only part of the code, to make it easier to comment on and understand; Section 6 gives more details and the full source code is in Figure 7. The version below of the fib function is annotated for the purpose of the update. Call site labels may be given by the update developer or generated by some assisting tool; the labelling strategy is not discussed here, as it is beyond the scope of this paper.

    let rec fib n = if n < 2 then n
                    else ( let fn1 = fib<L1> (n-1) in
                           let fn2 = fib<L2> (n-2) in
                           fn1 + fn2 )
    in fib<Lroot> 12345

Using these labels, the update developer can write a function that chooses the most appropriate strategy for each activation of fib, depending on the point it has reached. The main function compensating the effect of the update from int to num is given below. At each step, this function, match_fib_callers, proceeds by finding the state of the activation at the top of the current continuation (k) using match_cont. The second parameter (r) is the result value that would have been used to return to the top stack frame.

    let rec match_fib_callers k r = match_cont k with
      | <L1, n> :: k' -> (* (1) complete with new version *)
      | <L2, n, fn1> :: k' -> (* (2) convert fn1 *)
          let nfn1 = if (n-1) > 44 then fib (n-1)   (* fib: new version *)
                     else num_of_int fn1 in
          let r' = nfn1 +/ r in
          match_fib_callers k' r'
      | <Lroot> :: _ -> (* (3) resume normal execution *)

Notice that when filtering a case, the update developer can specify the values he wants to extract from the current activation. For example, in case (1), he may use the rank of the Fibonacci number being calculated (here bound to n) and, in case (2), he may also access the intermediate result of fib (n-1), named here fn1. As described in Section 2.2, when the top stack frame matches L2, the compensation first has to check whether fib (n-1) has overflowed. Assuming that integers are coded by, e.g., 31-bit signed integers, we statically know that the biggest correct (smaller than 2^30 - 1) Fibonacci number has rank 44. So the compensation compares the rank n-1 (where n is picked from the stack frame on top of the continuation k) to 44 in order to decide whether fn1 can be reused. We assume here that r has already been handled appropriately by the compensation, hence its type is num; see Section 6 for details on how it switches from int to num. Then the compensation completes the popped activation, yielding r'. Last, we have to compensate the tail k' of the continuation. Because the next stack frame is also suspended at a call of fib (L2 originates from fib), we have to check once again for the callers of fib: hence the tail k' is compensated by a recursive call of match_fib_callers.
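The rank 44 used above can be checked mechanically. The following throwaway Caml lines are ours; they require native integers wider than 31 bits, e.g., a 64-bit platform:

    (* Compute the last rank whose Fibonacci number still fits in 31-bit
       signed integers, i.e., is at most 2^30 - 1. Prints 44. *)
    let () =
      let max31 = (1 lsl 30) - 1 in
      let rec last_safe i a b =      (* a = fib i, b = fib (i+1) *)
        if b > max31 then i else last_safe (i + 1) b (a + b)
      in
      print_int (last_safe 1 1 1)    (* fib 1 = 1, fib 2 = 1 *)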
4. The ReCaml Language

Building on the λ-calculus, ReCaml adds a model of stack frames, as generated by the compiler. On top of this model and of a continuation framework, it implements the mc operator. In this way, developers programming updates in ReCaml can manipulate runtime states using the language itself. Embedding the operator in the language also allows us to extend the type system so as to statically rule out unsound update programs.

Triggering and executing an update is the responsibility of the execution platform. It is done by some kind of interrupt that can preempt execution at any time. However, updates must deal on their own with their timing with respect to the application execution. The execution platform captures the execution state and passes it as an argument to the update. In return, updates have to determine where execution was preempted in order to select appropriate actions. To mitigate the issue of bootstrapping the compensation, and to align continuation extremities on stack frame boundaries, our first implementation checks for the trigger only when control returns to a caller. This restriction is equivalent to explicit update points. The application developer can introduce additional points by means of dummy calls, each of which incurs a return.

4.1 Syntax

We first describe the syntactic constructs and notations (Figure 1); we then discuss the choices made in the design of the grammar.

4.1.1 Description of the grammar

In the following, v is a value; e denotes a term; x is a variable; k is a continuation, i.e., an evaluation context; p denotes a prompt; l names a call site; and E is an environment.

    v  ::= (λx.e, E) | p | cont(k)
    e  ::= v | x | λx.e | let rec x = λx.e in e | e ⟨l⟩ e
         | frame⟨l,E,p′⟩ e | env_E e
         | mc e with (l, x, x, x, e) e e
         | capture⟨l⟩ up to e with e | cap⟨l,E⟩ up to p with v
         | reinstate⟨l⟩ e e | setprompt⟨l⟩ e e | newprompt
    k  ::= □ | k ⟨l⟩ v | e ⟨l⟩ k | frame⟨l,E,p′⟩ k | env_E k
         | mc k with (l, x, x, x, e) e e
         | capture⟨l⟩ up to k with v | capture⟨l⟩ up to e with k
         | reinstate⟨l⟩ k v | reinstate⟨l⟩ e k | setprompt⟨l⟩ k e
    p′ ::= p | ⊥
    E  ::= [] | (x ↦ v) :: E

    Additional constraint:
    • A continuation cont(k) is either empty (k is □) or its innermost
      operator is frame (k ends with frame⟨l,E,p′⟩ □).

    Figure 1. Grammar of terms and continuations

Because we use an environment-based semantics, we need explicit closures and environment management. While λx.e is the usual abstraction construct, (λx.e, E) denotes a closure such that the captured environment E is used to evaluate the body of the function upon application. The syntax of the application operator (e ⟨l⟩ e) is extended with a label l that names the call site. The env_E e operator evaluates its subterm e in the environment E instead of the current evaluation environment. Recursive functions are defined as usual (let rec x = λx.e in e).

Our continuation framework defines first-class instantiable prompts and first-class delimited continuations. Intuitively, prompts are delimiters that bound the outermost context that shall be captured within a continuation; hence a delimited continuation represents only part of the remainder of the execution. The newprompt operator instantiates a fresh prompt. The setprompt⟨l⟩ e e operator inserts a delimiter in the evaluation context. Given a prompt, the capture⟨l⟩ up to e with e operator captures and replaces the current continuation up to the innermost matching delimiter. The continuation is wrapped in the cont(k) constructor. The reinstate⟨l⟩ e e operator reinstates and evaluates a continuation. We shall explain later, in Section 4.1.2, the cap⟨l,E⟩ up to p with v operator, which is an explicit intermediate step in the capture of a continuation.

In order to model the state structure, we introduce an operator frame⟨l,E,p′⟩ e, which annotates activation boundaries. The operator denotes that e is evaluated in a new stack frame resulting from the call/return site l. At the boundary, a prompt is possibly set, namely when the third annotation p′ is not ⊥ (i.e., when it is the name of a prompt). E recalls the evaluation environment of the enclosing context of the operator, thus keeping track of the values accessible in this frame.

The last operator, mc e with (l, x1, x2, x3, e1) e2 e3, deconstructs a continuation relying on its stack frame structure. It compares l with the return address on top of the continuation. If the labels match, the continuation is split at the second innermost frame operator into a head (the inner subcontinuation), bound to x1, and a tail (the outer subcontinuation), bound to x2. Furthermore, the variables x3 are bound to the values of the topmost stack frame. Then e1 is executed in the evaluation environment extended in this way. There are two other cases: either the return address does not match (e2 is executed) or the continuation is empty (e3 is executed).

The language has three kinds of values: closures, prompts and continuations.

4.1.2 Discussion

Having explicit closures and the env operator is the usual approach for implementing lexical scoping in a small-step environment-based semantics. As a side effect, the env operator also ensures that continuations are independent of any evaluation environment, i.e., any continuation brings its required environment in an env construct. To some extent, this is similar to the destruct-time λ-calculus [8, 33], which delays substitutions until values are consumed. That way, bindings can be marshalled and moved between scopes.

Delimited continuations are a natural choice in our context. Indeed, when the mc operator splits a continuation into smaller ones, it instantiates continuations that represent only parts of execution contexts; this is exactly what delimited continuations are designed for. Our framework is similar to the ones of Gunter et al. [17] and Dybvig et al. [13]. The following table approximates how our operators match existing frameworks; readers can refer to Shan [34], Kiselyov [21] and Dybvig et al. [13] for more complete comparisons.

    ReCaml       Dybvig et al. [13]   Gunter et al. [17]
    newprompt    newPrompt            new_prompt
    setprompt    pushPrompt           set
    capture      withSubCont          cupto
    reinstate    pushSubCont          (function application)

In addition, we adapt the framework:

• We align the delimiters of continuations with the delimiters of stack frames. To do so, we annotate the frame operator with an optional prompt in order to record where prompts are set. Furthermore, the continuation operators must carry a call site label in order to insert frame constructs.

• We have to introduce a dummy cap operator to align a stack frame delimiter with the inner delimiter of the continuation. To do so, a frame operator (which needs the evaluation environment) is inserted at the innermost position of the continuation, in place of the capture operator. The cap operator saves the needed evaluation environment (the one at the position of the capture operator) before the continuation is actually captured.

• Like Dybvig et al. [13], we encode continuations in a specific cont form rather than as a closure [17]. That way, the linear structure of continuations (a stack in the implementation; the nesting of evaluation contexts in the language) is maintained and can be used by the mc operator. Furthermore, encoding a continuation as a closure would introduce a variable, which would break the type preservation lemma due to the typing of call site labels, as we will see later (Section 4.4). Last, thanks to the distinction between continuations and closures, the mc operator does not have to handle regular closures.

Intuitively, a frame operator is inserted when a call is made and disappears when the callee terminates. Thus, when a continuation is captured, all its activations are delimited by frame operators. The mc operator uses them to split continuations into smaller ones. One can note that the environment of a frame is redundant: this environment comes from the enclosing env construct. While our choice imposes a dummy cap operator in the continuation framework, it makes mc simpler, as mc does not need to look for env constructs to collect environments when a continuation is split.
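For readers who prefer code to grammars, the core of the term language can be transcribed as a Caml datatype. The transcription below is ours and, for brevity, elides mc, capture, cap, reinstate, setprompt and the continuation grammar:

    (* Our partial transcription of the grammar of Figure 1. *)
    type label = string
    type prompt = int
    type popt = P of prompt | Bot                 (* p' ::= p | ⊥ *)
    type term =
      | Var of string
      | Lam of string * term                      (* λx.e *)
      | Closure of string * term * env            (* (λx.e, E) *)
      | LetRec of string * string * term * term   (* let rec x = λx.e in e *)
      | App of term * label * term                (* e ⟨l⟩ e *)
      | Frame of label * env * popt * term        (* frame⟨l,E,p′⟩ e *)
      | EnvOp of env * term                       (* env_E e *)
      | NewPrompt
      | PromptV of prompt
    and env = (string * term) list                (* E ::= [] | (x ↦ v) :: E *)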
4.2 Semantics

The small-step operational semantics of Figure 2 formalizes the above description of ReCaml. We adopt an environment-based approach with lexical scoping of variables. The judgment E ⊢ e → e′ asserts that the term e reduces to e′ in the evaluation environment E. Rules SUBST, CLOSE and LETREC are the classical ones for substituting a variable, building a closure and unfolding recursive definitions, respectively. As usual with environment-based semantics, the env operator installs a local environment in order to evaluate the nested term (rule ENV). Because the frame operator bounds activations, the local environment used to evaluate the nested term is empty (rule FRAME); here, it is the role of the inner env operator to provide the actual execution environment.

Figure 2 gives only primitive reduction rules. Except for frame and env, which need special treatment of the environment, the CONTEXT rule generically reduces contexts according to the grammar of k. Because that grammar is constrained with values, it fixes a strict right-to-left call-by-value evaluation order.

The management of the frame operator is one originality of the semantics: it implements the life cycle of activations. This operator is instantiated when a closure is applied (rule APPLY), when a prompt is set (rule SETPROMPT) and when a continuation is reinstated (rule REINSTATE). It collapses when a callee activation returns a value (rule FRAMEVAL). Paired with the frame operator, the env operator provides the local evaluation environment for the instantiated activation. For instance, applying a closure, e.g., the identity function, proceeds as follows:

    E ⊢ (λx.x, E2) ⟨l⟩ v
      --APPLY-->            frame⟨l,E,⊥⟩ env_{(x↦v)::E2} x
      --FRAME,ENV,SUBST-->  frame⟨l,E,⊥⟩ env_{(x↦v)::E2} v
      --FRAME,ENVVAL-->     frame⟨l,E,⊥⟩ v
      --FRAMEVAL-->         v

Capturing a continuation is done in two steps. First, the evaluation environment at the capture operator is saved, mutating the operator into cap (rule CAP1). The second step is the standard continuation capture. A cap operator using prompt p is only reduced within a frame tagged by p. If such a frame exists, the context k between this frame and cap is reified as a continuation cont(k). A frame is inserted in place of cap, consistently with the constraint of our language (see the bottom of Fig. 1). The closure argument of cap is applied to the resulting continuation (rule CAP2); in rule CAP2, the enclosing prompt p is consumed. The system proceeds as follows:

    E1 ⊢ frame⟨l1,E1,p⟩ env_{E2} (capture⟨l2⟩ up to p with (λx.e, E3))
      --FRAME,ENV,CAP1-->  frame⟨l1,E1,p⟩ env_{E2} (cap⟨l2,E2⟩ up to p with (λx.e, E3))
      --CAP2-->            frame⟨l1,E1,⊥⟩ env_{(x↦cont(k))::E3} e
                           with k = env_{E2} (frame⟨l2,E2,⊥⟩ □)

We proceed in two steps in order to handle easily any context in place of env_{E2}. If no frame tagged by p encloses cap, i.e., in terms structured like k[cap⟨l,E⟩ up to p with v] where k does not contain any frame⟨_,_,p⟩, a runtime error occurs.

The mc operator splits a continuation at the second innermost frame, which delimits the top stack frame (rule MCMATCH). The rule MCMATCH′ handles the case where the continuation contains a single stack frame; the tail subcontinuation is then the empty continuation. The rules for mc assume that the continuation is either empty (rule MCEMPTY) or that the innermost operator within the continuation is frame (rules MCNOMATCH, MCMATCH and MCMATCH′). As shown in Fig. 1, this is enforced as a structural constraint on the language; it is trivial to show that the semantics produces only continuations conforming to this constraint.
    SUBST:      E ⊢ x → E(x)
    CLOSE:      E ⊢ λx.e → (λx.e, E)
    LETREC:     E ⊢ let rec x1 = λx2.e1 in e2
                  → env_{(x1↦(λx2.let rec x1=λx2.e1 in e1, E))::E} e2
    APPLY:      E1 ⊢ (λx.e, E2) ⟨l⟩ v → frame⟨l,E1,⊥⟩ env_{(x↦v)::E2} e
    ENVVAL:     E1 ⊢ env_{E2} v → v
    FRAMEVAL:   E1 ⊢ frame⟨l,E2,p′⟩ v → v
    NEWPROMPT:  if p is fresh, then E ⊢ newprompt → p
    MCEMPTY:    E ⊢ mc cont(□) with (l, x1, x2, x3, e1) e2 e3 → e3
    MCNOMATCH:  if l1 ≠ l2, then
                E1 ⊢ mc cont(k[frame⟨l2,E2,p⟩ □]) with (l1, x1, x2, x3, e1) e2 e3 → e2
    MCMATCH:    if k1 does not contain any frame and E1(x3) = v3, then
                E ⊢ mc cont(k2[frame⟨l2,E2,p2⟩ (k1[frame⟨l1,E1,p1⟩ □])])
                       with (l1, x1, x2, x3, e1) e2 e3
                  → env_{(x1↦cont(k1[frame⟨l1,E1,p1⟩ □]))::(x2↦cont(k2[frame⟨l2,E2,p2⟩ □]))::(x3↦v3)::E} e1
    MCMATCH′:   if k1 does not contain any frame and E1(x3) = v3, then
                E ⊢ mc cont(k1[frame⟨l1,E1,p1⟩ □]) with (l1, x1, x2, x3, e1) e2 e3
                  → env_{(x1↦cont(k1[frame⟨l1,E1,p1⟩ □]))::(x2↦cont(□))::(x3↦v3)::E} e1
    CAP1:       E ⊢ capture⟨l⟩ up to v1 with v2 → cap⟨l,E⟩ up to v1 with v2
    CAP2:       if k does not contain any frame⟨_,_,p⟩, then
                E1 ⊢ frame⟨l1,E2,p⟩ k[cap⟨l2,E3⟩ up to p with (λx.e, E4)]
                  → frame⟨l1,E2,⊥⟩ env_{(x↦cont(k[frame⟨l2,E3,⊥⟩ □]))::E4} e
    SETPROMPT:  E ⊢ setprompt⟨l⟩ p e → frame⟨l,E,p⟩ (env_E e)
    REINSTATE:  E ⊢ reinstate⟨l⟩ cont(k) v → frame⟨l,E,⊥⟩ k[v]

    Context rules:
    FRAME:      if [] ⊢ e → e′, then E1 ⊢ frame⟨l,E2,p′⟩ e → frame⟨l,E2,p′⟩ e′
    ENV:        if E2 ⊢ e → e′, then E1 ⊢ env_{E2} e → env_{E2} e′
    CONTEXT (if no other rule matches):
                if E ⊢ e → e′, then E ⊢ k[e] → k[e′]

    k[a] substitutes a for □ in k, where a is either a term (resulting in
    a term) or a continuation (resulting in a continuation).

    Figure 2. Operational semantics
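To convey the operational content of the mc rules without the full calculus, a captured continuation can be modelled as a mere list of stack frames. The following simplified model is ours, for intuition only:

    (* Simplified model (ours): a continuation as a list of frames, each
       recording its call-site label and the values saved in the frame. *)
    type frame = { label : string; saved : (string * int) list }
    type cont = frame list

    (* mc inspects the topmost frame, as in rules MCEMPTY, MCNOMATCH and
       MCMATCH: when the label matches, the frame is popped and handed to
       the matching branch together with the tail. *)
    let mc l k ~matched ~no_match ~empty =
      match k with
      | [] -> empty ()                                  (* MCEMPTY *)
      | f :: tail when f.label = l -> matched f tail    (* MCMATCH *)
      | _ -> no_match ()                                (* MCNOMATCH *)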
4.3 Type System

The type system adheres to the usual design of the simply-typed λ-calculus. Types may be type variables⁵, usual functional types, prompt types or continuation types. The type of a prompt is parameterized by the type of the values that flow through the delimiters tagged by that prompt. The type of a continuation is parameterized by the type of its parameter and the type of its result. The grammar for types is:

    τ ::= α | τ → τ | τ prompt | τ →κ τ

⁵ We use type variables for convenience, to solve the type inference problem. As ReCaml is simply typed, type variables are never generalized as type parameters. Instead, they are unknown types that shall later be instantiated by unification. This is similar to Caml's weak type variables, such as 'a in the type 'a list ref of ref [].

Fig. 3 gives the type system for the term language. The judgement E, P, L, τ ⊢ e : τe asserts that, given the typing environments E, P and L, in an enclosing function whose return type is τ, the term e has type τe. E (resp. P) maps variables (resp. prompts) to types. L maps call site labels to label types, which are triplets {τpar, τres, V} where τpar and τres are types and V is an environment mapping variables to types. The inference algorithm computes τe and L.

The L environment is intended for splitting continuations at activation boundaries. Figure 4 gives an intuition of its interpretation, based on the semantics of the mc operator. A continuation k : τ1 →κ τn is split into khead (of type τ1 →κ τ2) and ktail (of type τ2 →κ τn); composing the two subcontinuations obviously results in the original continuation. τ2 is the return type of the function that encloses l1; this is the reason why the type judgment carries τ (the type of the enclosing function) on its left-hand side. τ1 is the return type of the call l1. In order to type the values that mc retrieves from the popped activation, e.g., the value of x1, the type of l1 contains the type environment at the call l1. Consequently, the type of l1 is:

• τpar_l1 = τ1, the type of the value that flows at the boundary;
• τres_l1 = τ2, the return type of the enclosing function;
• V_l1 = [x1 ↦ τx1], which binds types to the activation variables.

    [Figure 4 depicts this split: a continuation k : τ1 →κ τn, whose two
     topmost frames return to call sites l1 and l2, is split at l1 into
     khead : τ1 →κ τ2, whose frame carries V_l1 = {x1 : τx1}, τpar_l1 = τ1
     and τres_l1 = τ2, and ktail : τ2 →κ τn.]
    Figure 4. Intuition for typing activation boundary annotations.

In the example (Section 3), the types of the labels are:

    L1    ↦ {τpar = int;  τres = int;  V = [fib ↦ int → int; n ↦ int]}
    L2    ↦ {τpar = int;  τres = int;  V = [fib ↦ int → int; n ↦ int; fn1 ↦ int]}
    Lroot ↦ {τpar = int;  τres = unit; V = [fib ↦ int → int]}
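In implementation terms, a label type is a mere triple. Our sketch, with an illustrative type representation:

    (* Our sketch of label types {τpar, τres, V}. *)
    type ty =
      | TUnit | TInt
      | TArrow of ty * ty       (* τ → τ *)
      | TPrompt of ty           (* τ prompt *)
      | TCont of ty * ty        (* τ →κ τ *)
    type label_ty = { tpar : ty; tres : ty; v : (string * ty) list }

    (* The L1 entry of the example above: *)
    let l1 : label_ty =
      { tpar = TInt; tres = TInt;
        v = [ ("fib", TArrow (TInt, TInt)); ("n", TInt) ] }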
As usual, when typing an application (APPLY), the two subexpressions are typed under the same hypotheses. The first subexpression must be a function accepting values of the type of the second subexpression. The originality of our application rule lies in the computation of the type of the label: this type captures the type of the enclosing function τ1, the current environment E and the type τ3 that flows at the label, i.e., the type of the result.

Some constructs introduce frames and therefore modify the type of the enclosing function of a subexpression. For example, the type of the enclosing function of e2 in setprompt⟨l⟩ e1 e2 is τ2, because the setprompt operator encloses e2 in a frame whose prompt is of type τ2 (see SETPROMPT in Figures 2 and 3).

Typing a continuation expression (CONT) requires a specific type system, mutually recursive with the type system for terms. The judgment E, P, L, τ ⊢κ k : τ1 →κ τ2 is similar to the one for terms, and most of its rules derive from the type system for terms. For instance, the following rule is immediate from rule APPLY (Fig. 3):

    APPLYL:  if E, P, L, τ1 ⊢κ k : τ2 →κ (τ3 → τ4),
                E, P, L, τ1 ⊢ v : τ3,
                and L(l) = {τpar = τ4, τres = τ1, V = E},
             then E, P, L, τ1 ⊢κ k ⟨l⟩ v : τ2 →κ τ4

We therefore omit these rules, except the following additional one for empty continuations:

    HOLE:    E, P, L, τ1 ⊢κ □ : τ2 →κ τ2

4.4 Soundness

We consider soundness as the conjunction of type preservation and progress, stated as follows.

LEMMA 1 (Type preservation). Given a term e1 and an evaluation environment E such that T(E), P, L, τ1 ⊢ e1 : τ2. If e1 reduces to e2 in E, then there exists an extension P′ of P (∀p and τp, P(p) = τp ⇒ P′(p) = τp) such that in P′, e2 has the same type as e1, i.e., T(E), P′, L, τ1 ⊢ e2 : τ2.
In Figure 3, ℰ ranges over evaluation environments while E ranges over typing environments; the function T, defined below the figure, converts the former to the latter.

    VAR:       E, P, L, τ ⊢ x : E(x)
    PROMPT:    E, P, L, τ ⊢ p : P(p) prompt
    CLOSURE:   if (x ↦ τ2) :: T(ℰ), P, L, τ3 ⊢ e : τ3,
               then E, P, L, τ1 ⊢ (λx.e, ℰ) : τ2 → τ3
    CONT:      if [], P, L, τ3 ⊢κ k : τ2 →κ τ3,
               then E, P, L, τ1 ⊢ cont(k) : τ2 →κ τ3
    ABS:       if (x ↦ τ2) :: E, P, L, τ3 ⊢ e : τ3,
               then E, P, L, τ1 ⊢ λx.e : τ2 → τ3
    LETREC:    if (x1 ↦ τ3 → τ4) :: (x2 ↦ τ3) :: E, P, L, τ4 ⊢ e1 : τ4
               and (x1 ↦ τ3 → τ4) :: E, P, L, τ1 ⊢ e2 : τ2,
               then E, P, L, τ1 ⊢ let rec x1 = λx2.e1 in e2 : τ2
    APPLY:     if E, P, L, τ1 ⊢ e1 : τ2 → τ3, E, P, L, τ1 ⊢ e2 : τ2,
               and L(l) = {τpar = τ3, τres = τ1, V = E},
               then E, P, L, τ1 ⊢ e1 ⟨l⟩ e2 : τ3
    FRAME:     if [], P, L, P(p) ⊢ e : P(p), E = T(ℰ),
               and L(l) = {τpar = P(p), τres = τ, V = E},
               then E, P, L, τ ⊢ frame⟨l,ℰ,p⟩ e : P(p)
    FRAME′:    if [], P, L, τ2 ⊢ e : τ2, E = T(ℰ),
               and L(l) = {τpar = τ2, τres = τ1, V = E},
               then E, P, L, τ1 ⊢ frame⟨l,ℰ,⊥⟩ e : τ2
    ENV:       if T(ℰ), P, L, τ1 ⊢ e : τ2,
               then E, P, L, τ1 ⊢ env_ℰ e : τ2
    NEWPROMPT: E, P, L, τ1 ⊢ newprompt : τ2 prompt
    MC:        if E, P, L, τ1 ⊢ e1 : τ3 →κ τ5,
               L(l) = {τpar = τ3, τres = τ4, V = V},
               (x1 ↦ τ3 →κ τ4) :: (x2 ↦ τ4 →κ τ5) :: (x3 ↦ V(x3)) :: E, P, L, τ1 ⊢ e2 : τ2,
               E, P, L, τ1 ⊢ e3 : τ2, and E, P, L, τ1 ⊢ e4 : τ2,
               then E, P, L, τ1 ⊢ mc e1 with (l, x1, x2, x3, e2) e3 e4 : τ2
    CAPTURE:   if E, P, L, τ1 ⊢ e1 : τ3 prompt,
               E, P, L, τ1 ⊢ e2 : (τ2 →κ τ3) → τ3,
               and L(l) = {τpar = τ2, τres = τ1, V = E},
               then E, P, L, τ1 ⊢ capture⟨l⟩ up to e1 with e2 : τ2
    CAP:       if T(ℰ), P, L, τ1 ⊢ v : (τ2 →κ P(p)) → P(p), E = T(ℰ),
               and L(l) = {τpar = τ2, τres = τ1, V = E},
               then E, P, L, τ1 ⊢ cap⟨l,ℰ⟩ up to p with v : τ2
    REINSTATE: if E, P, L, τ1 ⊢ e1 : τ3 →κ τ2, E, P, L, τ1 ⊢ e2 : τ3,
               and L(l) = {τpar = τ2, τres = τ1, V = E},
               then E, P, L, τ1 ⊢ reinstate⟨l⟩ e1 e2 : τ2
    SETPROMPT: if E, P, L, τ1 ⊢ e1 : τ2 prompt, E, P, L, τ2 ⊢ e2 : τ2,
               and L(l) = {τpar = τ2, τres = τ1, V = E},
               then E, P, L, τ1 ⊢ setprompt⟨l⟩ e1 e2 : τ2

    where T(ℰ) = [x ↦ τx | [], P, L, τ ⊢ ℰ(x) : τx], i.e., the function T
    computes a type environment from an evaluation environment.

    Figure 3. Type system for terms

The existential quantification of P′ is the technique of Gunter et al. [17]⁶ for handling the newprompt case. Assume T(E), P, L, τ1 ⊢ newprompt : τ2 prompt. newprompt reduces to a fresh prompt p in E; p is not in the domain of P. Hence choosing P′ = (p ↦ τ2) :: P trivially ensures type preservation. In all other cases, we systematically choose P′ = P.

⁶ Gunter et al. [17] write e1/P1 ⊂ e2/P2, where P1 and P2 are sets of prompts. The ⊂ relation denotes that, given a typing environment over P1, there exists an extension over P2 such that e1 and e2 have the same type in their respective prompt environments. Using our P and P′ as typing environments (respectively over P1 and P2), the ⊂ relation is (part of) what our type preservation lemma states.

Unlike usual proofs, we do not use a lemma showing that extending the environment preserves typing. Instead, we use a context invariance approach. While Pierce [30] and Pierce et al. [31] do so for pedagogical reasons, we have to, because the standard weakening lemma is false due to the typing of call sites. Indeed, in L, the V field of the type associated with a label stores the typing environment (rules APPLY, FRAME, FRAME′, CAPTURE, CAP, REINSTATE and SETPROMPT). Hence adding new variables to the environment, even if they do not occur free, may change label types in L. Intuitively, it would change the structure and content of stack frames, hence their types. Nevertheless, we must prove that the type of a value is independent of the context.

LEMMA 2 (Typing values). Given a value v, the type of v is independent of any context: E, P, L, τ ⊢ v : τv implies E′, P, L, τ′ ⊢ v : τv for any E′ and τ′.

This lemma is trivial following the CLOSURE, PROMPT and CONT typing rules. Type preservation for the SUBST reduction rule is therefore immediate. Restricting evaluation environments to values is a pragmatic solution to avoid any variable capture issue upon substitution.

In order to prove each of the other cases, we proceed in two stages. We first show that, in order to type subterms, the rules build exactly the same environment before and after reduction; hence reduction preserves the type of subterms. Then we use these results as premises of the typing rules for the reduced term.
Let us sketch, for instance, the case of the APPLY reduction rule. Before reduction, assuming the parameter v has type τv, the body e of the closure is typed in the environment (x ↦ τv) :: T(E), and the return type of the enclosing function is τe, the type of e (CLOSURE typing rule). After reduction, e is typed in the environment T((x ↦ v) :: E) according to the FRAME′ and ENV typing rules. From the definition of T, and invoking the lemma on typing values, the two environments are equal; hence the type of the subterm e is preserved. Using the ENV and FRAME′ typing rules, we conclude that the type is the same before and after reduction. Last, we check that the APPLY typing rule (before reduction) and the FRAME′ typing rule (after reduction) compute the same label type for l. Hence the APPLY reduction rule preserves types. Traversing the evaluation context to the redex, the evaluation rules CONTEXT, FRAME and ENV compute at each step a new evaluation environment for each subcontext; the typing rules do the same with typing environments. Along the path to the redex, we observe that the rules recursively ensure that the evaluation and typing environments are equal up to T. This completes the proof.

LEMMA 3 (Progress). Given e1 such that [], P, L, τ ⊢ e1 : τ. Then e1 is either a value; or e1 is a runtime error (the redex position is cap⟨_,_⟩ up to p with v but it is not enclosed by any frame⟨_,_,p⟩); or e1 reduces to some term e2 in the empty evaluation environment.

In order to prove progress, we inductively analyze the typing rules; this proof is classical.

The proofs have been mechanized using the Coq theorem prover and the library of Aydemir et al. [6], which together help to carry out machine-verified formal proofs on language semantics and type systems. For convenience, our Coq scripts differ from the system of this paper in the following respects. We explode the MCMATCH, MCMATCH′, CAP1/CAP2 and REINSTATE reduction rules into detailed small steps; for example, we instantiate CAP2 for each operator in the grammar k of evaluation contexts. For this purpose, we introduce additional dummy operators for in-progress mc and reinstate. In addition, the implementation of the mc operator has to look for the innermost frame operator of the continuation operand; instead, it is much more convenient to reverse the nesting of operators in the continuation. At the cost of yet another dummy operator and of additional rules, we therefore represent continuations inside out. We use the technique of Gunter et al. [17] to implement the freshness of instantiated prompts. Last, we move the constraint on the form of continuations (bottom of Fig. 1) from the grammar to the type system.

4.5 Alternatives

One of the constraints that guides our work is to leave the application compiler unchanged. The rationale behind this constraint is that it makes it easier to integrate the ReCaml approach into existing compilers. To fulfill this constraint, we need to accommodate the choices made in legacy compilers. We identify several alternatives that impact dynamic updates; in the following, we present how these points integrate into our formal system. We focus on the specificities of our language: we do not discuss variations, e.g., of the continuation framework, which have already been studied by Dybvig et al. [13].

Usually, the implementation of execution states is of little interest in the design of a language; this issue regards the compiler. But because ReCaml focuses on modelling state manipulations, we have to take the implementation into consideration. For instance, label types depend on the context, and therefore on the environments captured when building closures. Regarding variables, we implement the following rules in the semantics and type system:

• When a closure is built, it captures all the variables in the scope of the λ operator, regardless of whether these variables occur free in the body of the function.

• The parameter of a function is systematically added to the evaluation environment, regardless of whether it occurs free in the body of the function. We do the same for let rec.

This is a coarse behavior. Indeed, many compilers optimize closures in order to capture only the variables that occur free. In order to model this behavior in ReCaml, we can replace the CLOSE reduction rule with the following one:

    RESTRICT-CLOSE:  E ⊢ λx.e → (λx.e, restrict_e(E))

where restrict computes the restriction of the environment, e.g., [x ↦ E(x) | x ∈ fv(e)], so as to capture only the variables that occur free in the body. We have to change the type system accordingly, replacing ABS with:

    RESTRICT-ABS:  if (x ↦ τ2) :: restrict_e(E), P, L, τ3 ⊢ e : τ3,
                   then E, P, L, τ1 ⊢ λx.e : τ2 → τ3
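The restrict function itself is standard. A possible Caml rendering, which is ours, over the partial term datatype sketched in Section 4.1:

    (* A sketch of restrict: keep only the bindings of variables that
       occur free in e. fv accumulates the bound variables. *)
    let rec fv bound = function
      | Var x -> if List.mem x bound then [] else [ x ]
      | Lam (x, e) | Closure (x, e, _) -> fv (x :: bound) e
      | LetRec (f, x, e1, e2) -> fv (f :: x :: bound) e1 @ fv (f :: bound) e2
      | App (e1, _, e2) -> fv bound e1 @ fv bound e2
      | Frame (_, _, _, e) | EnvOp (_, e) -> fv bound e
      | NewPrompt | PromptV _ -> []

    let restrict env e =
      let free = fv [] e in
      List.filter (fun (x, _) -> List.mem x free) env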
Type soundness obviously still holds. This implementation performs the restriction when the closure is built, which is what happens in many compilers. Instead, we could have delayed the restriction until application, hence inserting restrict in the APPLY reduction rule and in the CLOSURE and ABS typing rules. As far as ReCaml is concerned, both implementations have the same behavior. We can also restrict parameters and let rec-bound variables using the same technique. Accurate modelling of the variables is important, as it impacts label types and the amount of values the mc operator is able to retrieve from continuations.

Other aspects, such as tail-call optimization and function inlining, impact when new stack frames are created; consequently, they (indirectly) impact the outcome of the mc operator as well. Tail-call optimization consists in destroying the calling activation at the time of a call when the call occurs in return position. We can implement this optimization thanks to additional rules, e.g., duplicating the APPLY reduction rule for this specific case, such that it does not insert any new frame operator. Possibly, there are also several env constructs that must collapse with the stack frame:

    TAIL-APPLY:  if k contains only (0 or more) env operators, then
                 E ⊢ frame⟨l1,E1,p1′⟩ (k[(λx.e, E2) ⟨l2⟩ v])
                   → frame⟨l1,E1,p1′⟩ env_{(x↦v)::E2} e

Notice that the frame on the right-hand side is an exact copy of the one on the left-hand side: the properties of the enclosing stack frame (return address, local environment) are unaffected. In order to handle inlined calls, the idea is roughly the same, without any constraint on the context of the call. Nevertheless, there are additional difficulties: call sites within the inlined function are replicated, and the environments of the caller and of the callee must be merged. We do not go into deeper detail in this paper, leaving these issues to further contributions.
5. Compiler Implementation

As a proof of concept, we have developed a prototype compiler for ReCaml, which targets a modified ZAM2 [23] virtual machine. The machine has a single stack for locals, function arguments, register backups and return addresses. In addition to the stack pointer, the machine has four registers:

• the program counter points at the next instruction to execute;
• the environment points at the values stored in the closure;
• the argument counter tells how many pending arguments have been pushed, as the machine implements the push/enter uncurrying technique [27];
• the accumulator holds an intermediate result.

As shown in Figure 5, stack frames are delimited by blocks that save the return program counter, the environment and the argument counter registers. Pending arguments, if any (possibly 0), are pushed immediately above this block.

    [Figure 5 shows the stack layout: each stack frame consists of local
     variables and arguments, topped by a block saving the return
     address, the environment and the number of pending arguments above
     the frame, then the pending arguments themselves; the stack grows to
     the right.]
    Figure 5. Structure of the stack in the virtual machine.

The virtual machine provides a specific instruction for tail calls. Like our TAIL-APPLY rule (Section 4.5), this instruction pops the local environment, pushes a new one, and branches to the body of the callee. The push/enter uncurrying technique lets the caller of a function push all the available arguments onto the stack; the callee is responsible for popping only those it can immediately handle (or for building a partial-application closure if there are not enough parameters on the stack). While there are pending arguments on the stack, the return instruction assumes that the return value is a closure, and makes a call. When all the pending arguments are consumed, the instruction returns to the caller.

We extend the virtual machine to support continuations. A continuation is implemented as a slice of the machine stack together with a copy of the argument counter register. The other registers (program counter, closure environment and accumulator) are saved within the slice of the stack by the generated code, as required by the ZAM2. A prompt is a pointer to a position in the stack. The capture operator copies to the heap the slice between the prompt and the top of the stack, saves the argument counter, and makes a call to the body function. The reinstate operator copies the slice from the heap back to the stack, restores the argument counter, and performs a return instruction with the argument. In addition to retrieving the stack pointer, setting a prompt makes a call, so that the lower bound of a continuation is always aligned with a stack frame boundary, consistently with our semantics.
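Schematically, a captured continuation can thus be pictured as the following record. This is our schematic view, not the prototype's actual code:

    (* Our schematic view of a captured continuation in the modified
       ZAM2: a copied slice of the machine stack plus the saved argument
       counter. The other registers live inside the slice itself. *)
    type word = int                (* a machine word, schematically *)
    type captured_cont = {
      slice : word array;          (* stack slice between the prompt and the top *)
      argc  : int;                 (* saved argument-counter register *)
    }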
Based on this implementation of continuations, the mc operator first checks whether the continuation is empty. If not, it uses the recorded number of pending arguments in order to skip data down to the first return address. The retrieved address is compared with the operand of the mc operator. Static knowledge gives the structure and size of the matching stack frame at the top of the continuation; this information allows the continuation to be split at the stack frame boundary and values to be retrieved from the popped stack frame. Tail-call optimization does not need any special treatment: activation annotations of tail calls simply never match, as there is no corresponding location in the code.

In order to handle currying, the generated code uses the recorded number of pending arguments to find the location of the return address. Pending arguments are simply skipped, as if the callee had been η-expanded according to the call. Following the same principle, arguments between the two subcontinuations belong to the tail. Therefore, the number of pending arguments has to be adjusted in the subcontinuations, as in Figure 6: in the head subcontinuation, the number of pending arguments in the stack frame is set to 0, as there is no pending argument below the stack frame; in the tail subcontinuation, the number of pending arguments on top of the stack comes from the popped stack frame (1 in the example).

    [Figure 6 depicts the splitting of a continuation whose top frame,
     returning to l5, has one pending argument x4 above a frame returning
     to l1 with n pending arguments: in the head subcontinuation, the
     pending-argument count of the popped frame is reset to 0, while in
     the tail subcontinuation, the pending argument x4, with count 1,
     remains on top.]
    Figure 6. Splitting a continuation.

As Marlow and Peyton-Jones [27] have previously noticed, the push/enter uncurrying technique is not the most favorable setup for walking the stack, which is what our mc operator does. More precisely, we remark that problems arise only when push/enter is combined with tail-call optimization. Assume the following code:

    let f2 = λa. capture⟨l5⟩ up to p with v in
    let f1 = λa.λb. ((f2 ⟨l3⟩ x3) ⟨l4⟩ x4) in
    let x = ((f1 ⟨l1⟩ x1) ⟨l2⟩ x2) in e

As uncurrying is done, l1 and l2 (resp. l3 and l4) refer to the same code location. They differ in the number of pending arguments above the return address, respectively 0 and 1. Due to tail-call optimization, l1 and l3 (resp. l2 and l4) cannot be distinguished. Given the above description of the compiler, the captured continuation is split as in Figure 6. If the tail subcontinuation is subsequently compared to l1, it matches, as there is 1 pending argument. Our formal system assumes that the type of the produced head subcontinuation is (τx2 → τx) →κ τ; however, its effective (runtime) type is (τx4 → τx) →κ τ. The problem arises because, due to tail-call optimization, there is no means at this point to know where the pending parameter comes from, i.e., to distinguish between l1 and l3.

Since our formal system implements neither uncurrying nor tail-call optimization, it does not raise this problem, consistently with our type soundness result. Indeed, our formal system produces the following continuation, which is different from Figure 6:

    let x = frame⟨l2,r,p⟩ ((frame⟨l4,r,p⟩ □) x4) in e

Notice that this continuation is actually the same as the one an eval/apply compiler would produce: as the arity of the f2 closure is 1, l4 is not applied and l3 is not a tail call. In order to solve this problem in our prototype, we simply prevent the uncurrying of tail calls. Alternatively, we could have implemented the push/enter technique in our formal system, for instance by extending our frame operator with pending arguments. We have identified the following options by which the mc operator can handle pending arguments:
• Pending arguments can go to the tail subcontinuation, as depicted in Figure 6 and described earlier in this section. Adding tail-call optimization then breaks type preservation for call site labels, because each tail call can push new pending parameters with types different from those previously popped from the stack, and the types of pending arguments appear in label types. Hence we confirm what Marlow and Peyton-Jones [27] say, with a stronger argument: push/enter with tail-call optimization and mc-like stack walking is not type-sound.

• Pending arguments can go to the head subcontinuation. In this case, the type of the subcontinuations depends on how many arguments are pending on the stack. In the example, we would have to discriminate between l5 with no pending argument, l5 with 1 pending argument, and so on.

• Pending arguments can be dropped. In this case, part of the calculation captured by the original continuation is lost.

Nevertheless, notice that the updated program actually walks the stack. We feel that one of the weaknesses of our current approach is that our mc operator handles one stack frame independently of any context. We leave this issue for future work.
6. Detailed Example

Figure 7 contains the full source code that updates fib from int to num. The set_update_routine primitive (line 78) registers the function that is called when the virtual machine receives an update signal. In addition, we use the following syntactic sugar:

    line 1:   prompt p → e             ↪  let p = newprompt in (setprompt p e)
    line 79:  capture_upto p as k in e ↪  capture up to p with λk.e

    1   prompt p →
    2   (* *********************************** *)
    3   (* Initial version                     *)
    4   let rec fib n =
    5     if n < 2 then n
    6     else ( let fn1 = fib<L1> (n-1) in
    7            let fn2 = fib<L2> (n-2) in
    8            fn1 + fn2 )
    9   in
    10  (* *********************************** *)
    11  (* New version                         *)
    12  let rec fib_num n =
    13    if n < 2 then num_of_int n
    14    else ( let fn1 = fib_num (n-1) in
    15           let fn2 = fib_num (n-2) in
    16           fn1 +/ fn2 )
    17  in
    18  (* *********************************** *)
    19  (* update from fixed-size to arbitrary  *)
    20  (* precision integer                    *)
    21  (* if n is greater than 44, r has over- *)
    22  (* flowed: return fib_num n, else r is correct *)
    23  let ifnotover n r = if n > 44 then fib_num n else num_of_int r in
    24  (* call graph :                         *)
    25  (*   fib    : L1    → fib               *)
    26  (*   fib    : L2    → fib               *)
    27  (*   [root] : Lroot → fib               *)
    28  (*          : Lupdt → update            *)
    29  (* the update routine is attached       *)
    30  (* to the fib node in the call graph :  *)
    31  let rec match_fib_callers' r k =
    32    match_cont k with
    33    (* - L1 : r is fib (n-1)                    *)
    34    | <L1, n> :: tl →
    35        let fn2 = fib_num (n-2) in
    36        (* back to the caller : fib             *)
    37        match_fib_callers' (r +/ fn2) tl
    38    (* - L2 : r is fib (n-2) & fn1 is fib (n-1) *)
    39    | <L2, n, fn1> :: tl →
    40        (* check whether fn1 has overflowed     *)
    41        let nfn1 = ifnotover (n-1) fn1 in
    42        (* back to the caller : fib             *)
    43        match_fib_callers' (nfn1 +/ r) tl
    44    (* - Lroot : r is the result of the program *)
    45    | <Lroot> :: tl → reinstate tl r
    46    | _ → (* error *) (0/ -/ 1/) in
    47  let match_fib_callers r k =
    48    match_cont k with
    49    (* - L1 : r is fib (n-1)                    *)
    50    | <L1, n> :: tl →
    51        (* check whether r has overflowed       *)
    52        let fn1 = ifnotover (n-1) r in
    53        let fn2 = fib_num (n-2) in
    54        (* back to the caller : fib             *)
    55        match_fib_callers' (fn1 +/ fn2) tl
    56    (* - L2 : r is fib (n-2) & fn1 is fib (n-1) *)
    57    | <L2, n, fn1> :: tl →
    58        (* check whether fn1 has overflowed     *)
    59        let nfn1 = ifnotover (n-1) fn1 in
    60        (* check whether r has overflowed       *)
    61        let nfn2 = ifnotover (n-2) r in
    62        (* back to the caller : fib             *)
    63        match_fib_callers' (nfn1 +/ nfn2) tl
    64    (* - Lroot : r is the result of the program *)
    65    | <Lroot> :: tl →
    66        reinstate tl (ifnotover 12345 r)
    67    | _ → (* error *) (0/ -/ 1/) in
    68  (* *********************************** *)
    69  (* compensation fib → fib_num          *)
    70  let compensate r k = match_cont k with
    71      (* we "know" that we are in fib     *)
    72    | <Lupdt> :: tl → match_fib_callers r tl
    73    | _ → (* error *) (0/ -/ 2/) in
    74  (* *********************************** *)
    75  (* main program                        *)
    76  (* register the compensation           *)
    77  let _ =
    78    set_update_routine ( fun r →
    79      capture_upto p as k in
    80        compensate r k ) in
    81  (* initial call                        *)
    82  num_of_int (fib<Lroot> 12345)

    Figure 7. Real ReCaml code: from fixed-size to arbitrary-precision integers.

    [Figure 8 shows the static call graph of the program of Figure 7: the
     root of execution calls fib at Lroot; fib calls itself at L1 and L2;
     a dashed edge labelled Lupdt links fib to compensate.]
    Figure 8. Static call graph of the program of Figure 7.

The captured continuation corresponds to a path in the static call graph of the program (Figure 8), going from the root of execution to the compensation. The compensation is implemented by the compensate function (line 70). As represented by the dashed Lupdt edge in the call graph, the top stack frame is an activation of the anonymous function (line 79) registered by the set_update_routine primitive; it comes from the update infrastructure. Hence, at line 70, match_cont pops this useless stack frame before entering the effective compensation. In a more realistic application, we would have to find out which function the update was called from; in the example, as it can only be the fib function, the compensation calls the match_fib_callers function (lines 47–67) to handle the calls to fib according to the strategy described in Section 2.2:

L1 The compensation function receives the result of fib (n-1). Using ifnotover, we ensure that it is correct (line 52). Notice that if the result has overflowed, the function ifnotover recomputes the Fibonacci number using the new version (line 23). To complete the fib function, we compute fib (n-2) with the new version (line 53), then we sum the two results (line 55). Last, we recursively compensate the tail of the continuation (line 55), as if the popped stack frame had returned the newly computed value.

L2 The compensation function receives the result of fib (n-2). Furthermore, match_cont gets the value of fib (n-1) from the call stack frame, naming it fn1. Using the ifnotover function, we ensure that those intermediate results are correct (lines 58–61). Last, we complete the fib function and we recursively compensate the tail of the continuation (line 63).

Lroot At this point, r is fib 12345 and the compensation has completed. We use ifnotover to ensure r is correct before reinstating the tail subcontinuation (line 66).

The match_fib_callers' function (lines 31–46) is almost a clone of match_fib_callers, except that it assumes the compensation has already dealt correctly with the received result (parameter r). So the recursive calls in match_fib_callers do in fact switch to match_fib_callers'.

In these functions, we assume that (1) the evaluation order is known, i.e., that fib (n-1) is evaluated before fib (n-2); and (2) intermediate results have names. To make this explicit, we use let. Instead, intermediate results could have had system-generated or a posteriori names, and the evaluation order would then have to be inferred by the compensation.

Because we have not integrated any exception handling in our prototype, a negative number is returned (lines 46, 67 and 73) to notify errors. Runtime errors can occur if the continuation does not match, i.e., when the update developer forgets to handle some call sites.
7. Discussions and Conclusions

In this paper, we have presented two dynamic software updates (Sec. 2; only one of them is detailed in Sec. 6 and Fig. 7) that many current systems are unable to implement. Even though we consider a toy example, we have argued that the technique remains relevant for realistic applications. Despite the apparent simplicity of our use case, the two updates show high complexity, both in design and in implementation. These examples contrast with the usual simple updates of complex applications found in related work. In our work, we accept that updates might be difficult to design and implement: we have first focused on being able to achieve these updates at all. Still, we acknowledge that our current proposal is not very handy yet. In the context of a similar approach, Makris and Bazzi [25] have for instance proposed automatic generators for some of the updates, which could be used as building blocks for a higher-level update language.

The ReCaml language is the cornerstone of our work. It provides an operator (match_cont, or mc) to introspect and walk continuations, and our examples have emphasized how this operation helps in updating. We have formalized its environment-based semantics and defined a type system whose soundness is proved mechanically. Even if we have not discussed it in this paper, we have also developed a sound substitution-based semantics. Our prototype compiler for ReCaml is able to execute all the updates of Section 2. The two examples of this article, the compiler and the proofs (the Coq scripts) can be found at http://perso.telecom-bretagne.eu/fabiendagnat/recaml.

In this paper, we have built ReCaml on top of a simply-typed λ-calculus for simplicity reasons. It is well known that polymorphism with continuations needs restrictions in order to ensure soundness [5, 24, 35, 37]. As the mc operator splits continuations at activation boundaries, any type variable involved in an application might cause problems if it is generalized. One of the future challenges is therefore to reconcile ReCaml with polymorphism and to infer more precise types.

We have adopted a strict functional language and the ZAM2 virtual machine [23]. The ZAM2 machine has allowed quick and easy prototyping, and strict evaluation has made it easier to understand, and therefore to manipulate, the execution state. Unlike similar approaches [20, 25], ReCaml does not require any specific code generation; instead, relying on low-level details of the underlying machine, it is adapted to the form of the code generated by the legacy Caml compiler. Using continuations is not a necessity, yet it provides sound formal foundations for our work. As works that provide production-level JVM and CLR with continuations [29, 32] use specific code generation, targeting such machines might not be in the scope of ReCaml. On the contrary, call site types are actually close to usual debug information; therefore the debugging infrastructures of the JVM and CLR could be used to implement ReCaml for these platforms. While these infrastructures provide mechanisms to manipulate states, ReCaml brings static typing. We therefore plan experiments to ensure that our approach also fits these platforms. To do so, we will have to enhance ReCaml to support imperative features, especially shared data. We will also have to consider multithreading, reusing previous work such as [25].

Acknowledgments

We would like to kindly thank Kristis Makris and Ralph Matthes for their comments. We also thank Michael Hicks for shepherding the revision of the paper. The work presented in this paper has been partly funded by the French ministry of research through the SPaCIFY consortium (ANR 06 TLOG 27).

References

[1] Gautam Altekar, Ilya Bagrak, Paul Burstein, and Andrew Schultz. Opus: online patches and updates for security. In USENIX Security Symposium, pages 287–302, Baltimore, Maryland, USA, August 2005.

[2] Pascalin Amagbégnon, Loïc Besnard, and Paul Le Guernic. Implementation of the dataflow synchronous language SIGNAL. ACM SIGPLAN Notices, 30(6):163–173, June 1995. doi: 10.1145/223428.207134.

[3] Jonathan Appavoo, Kevin Hui, Craig Soules, Robert Wisniewski, Dilma Da Silva, Orran Krieger, Marc Auslander, David Edelsohn, Ben Gamsa, Gregory Ganger, Paul McKenney, Michal Ostrowski, Bryan Rosenburg, Michael Stumm, and Jimi Xenidis. Enabling autonomic behavior in systems software with hot swapping. IBM Systems Journal, 42(1):60–76, 2003.

[4] Jeff Arnold and M. Frans Kaashoek. Ksplice: automatic rebootless kernel updates. In European Conference on Computer Systems, pages 187–198, Nuremberg, Germany, April 2009. doi: 10.1145/1519065.1519085.

[5] Kenichi Asai and Yukiyoshi Kameyama. Polymorphic delimited continuations. In Asian Symposium on Programming Languages and Systems, volume 4807 of LNCS, pages 239–254, Singapore, December 2007. doi: 10.1007/978-3-540-76637-7_16.

[6] Brian Aydemir, Aaron Bohannon, Benjamin Pierce, Jeffrey Vaughan, Dimitrios Vytiniotis, Stephanie Weirich, and Steve Zdancewic. Using proof assistants for programming language research or, how to write your next POPL paper in Coq. http://www.cis.upenn.edu/~plclub/popl08-tutorial/, 2008. POPL 2008 tutorial.
37
Zdancewic. Using proof assistants for programming language research or, how to write your next popl paper in coq. http://www.cis.upenn.edu/~plclub/popl08-tutorial/, 2008. POPL 2008 tutorial.
[22] Jeff Kramer and Jeff Magee. The evolving philosophers problem: dynamic change management. IEEE Transactions on Software Engineering, 16(11):1293–1306, November 1990. doi: 10.1109/32.60317. [23] Xavier Leroy. The ZINC experiment, an economical implementation of the ML language. Technical Report 117, INRIA, 1990.
[7] Andrew Baumann, Jonathan Appavoo, Robert Wisniewski, Dilma Da Silva, Orran Krieger, and Gernot Heiser. Reboots are for hardware: challenges and solutions to updating an operating system on the fly. In USENIX Annual Technical Conference, Santa Clara, California, USA, June 2007.
[24] Xavier Leroy. Polymorphism by name for references and continuations. In Principles of Programming Languages, pages 220– 231, Charleston, South Carolina, USA, January 1993. doi: 10.1145/ 158511.158632.
[8] Gavin Bierman, Michael Hicks, Peter Sewell, Gareth Stoyle, and Keith Wansbrough. Dynamic rebinding for mashalling and update, with destruct-time λ. In International Conference on Functional Programming, pages 99–110, Uppsala, Sweden, August 2003. doi: 10.1145/944705.944715.
[25] Kristis Makris and Rida Bazzi. Multi-threaded dynamic software updates using stack reconstruction. In USENIX Annual Technical Conference, San Diego, California, USA, June 2009. [26] Kristis Makris and Kyung Dong Ryu. Dynamic and adaptive updates of non-quiescent subsystems in commodity operating system kernels. In European Conference on Computer Systems, pages 327–340, Lisboa, Portugal, March 2007. doi: 10.1145/1272996.1273031.
[9] Eric Bruneton, Thierry Coupaye, Matthieu Leclerq, Vivien Qu´ema, and Jean-Bernard Stefani. The Fractal component and its support in java. Software: Practice & Experience, special issue on experiences with auto-adaptive and reconfigurable systems, 36(11-12):1257–1284, September 2006. doi: 10.1002/spe.767.
[27] Simon Marlow and Simon Peyton-Jones. Making a fast curry: push/enter vs eval/apply for higher-order languages. Journal of Functionnal Programming, 16(4-5):415–449, July 2006. doi: 10.1017/ S0956796806005995.
[10] J´er´emy Buisson and Fabien Dagnat. Introspecting continuations in order to update active code. In Workshop on Hot Topics in Software Upgrades, Nashville, Tennessee, USA, October 2008. doi: 10.1145/ 1490283.1490289.
[28] Iulian Neamtiu, Micheal Hicks, Gareth Stoyle, and Manuel Oriol. Practical dynamic software updating for C. In Conference on Programming Language Design and Implementation, pages 72–83, Ottawa, Ontario, Canada, June 2006. doi: 10.1145/1133981.1133991. [29] Greg Pettyjohn, John Clements, Joe Marshall, Shriram Krishnamurthi, and Matthias Felleisen. Continuations from generalized stack inspection. In International Conference on Functional Programming, pages 216–227, Tallinn, Estonia, September 2005. doi: 10.1145/1090189. 1086393. [30] Benjamin Pierce. Lambda, the ultimate TA: Using a proof assistant to teach programming language foundations, September 2009. Keynote address at International Conference on Functional Programming. [31] Benjamin Pierce, Chris Casinghino, and Michael Greenberg. Software foundations. 2010. http://www.cis.upenn.edu/~bcpierce/ sf/. [32] Tiark Rompf, Ingo Maier, and Martin Odersky. Implementing firstclass polymorphic delimited continuations by a type-directed selective CPS transform. In International Conference on Functional Programming, Edinburgh, Scotland, UK, September 2009. doi: 10.1145/ 1596550.1596596.
[11] Acacio Cruz. Official Gmail Blog: Update on today’s Gmail outage. http://gmailblog.blogspot.com/2009/02/ update-on-todays-gmail-outage.html, February 2009. [12] Mikhail Dmitriev. Safe class and data evolution in large and long-lived java applications. Technical Report TR-2001-98, Sun Microsystems, August 2001. [13] Kent Dybvig, Simon Peyton-Jones, and Amr Sabry. A monadic framework for delimited continuations. Journal of Functional Programming, 17(6):687–730, November 2007. doi: 10.1017/ S0956796807006259. [14] Ericsson AB. Erlang 5.6.3 Reference manual, chapter 12. Compilation and code loading. 2008. http://www.erlang.org/doc/ reference_manual/part_frame.html. [15] Matthias Felleisen. The theory and practice of first-class prompts. In Principles of Programming Languages, pages 180–190, San Diego, California, USA, January 1988. doi: 10.1145/73560.73576. [16] Stephen Gilmore, Dilsun Kirli, and Christopher Walton. Dynamic ML without dynamic types. Technical Report ECS-LFCS-97-379, University of Edinburgh, December 1997.
[33] Peter Sewell, Gareth Stoyle, Michael Hicks, Gavin Bierman, and Keith Wansbrough. Dynamic rebinding for marshalling and update, via redex-time and destruct-time reduction. Journal of Functional Programming, 18(4):437–502, July 2008. doi: 10.1017/ S0956796807006600. [34] Chung-Chieh Shan. Shift to control. In ACM SIGPLAN Scheme Workshop, Snowbird, Utah, USA, September 2004. [35] Mads Tofte. Type inference for polymorphic references. Information and computation, 89(1):1–34, November 1990. doi: 10.1016/ 0890-5401(90)90018-D. [36] Yves Vandewoude, Peter Ebraert, Yolande Berbers, and Theo D’Hondt. Tranquility: a low disruptive alternative to quiescence for ensuring safe dynamic updates. IEEE Transactions on Software Engineering, 33(12):856–868, December 2007. doi: 10.1109/TSE.2007. 70733. [37] Andrew Wright. Polymorphism for imperative languages without imperative types. Technical Report TR93-200, Rice University, February 1993.
[17] Carl A. Gunter, Didier R´emy, and Jon G. Riecke. A generalization of exceptions and control in ML-like languages. In International Conference on Functional Programming Languages and Computer Architecture, pages 12–23, La Jolla, California, USA, June 1995. doi: 10.1145/224164.224173. [18] Deepak Gupta, Pankaj Jalote, and Gautam Barua. A formal framework for on-line software version change. IEEE Transactions on Software Engineering, 22(2):120–131, February 1996. doi: 10.1109/32.485222. [19] Jennifer Hamilton, Michael Magruder, James Hogg, William Evans, Vance Morrison, Lawrence Sullivan, Sean Trowbridge, Jason Zander, Ian Carmichael, Patrick Dussud, John Hamby, John Rivard, Li Zhang, Mario Chenier, Douglas Rosen, Steven Steiner, Peter Hallam, Brian Crawford, James Miller, Sam Spencer, and Habib Heydarian. Method and system for program editing and debugging in a common language runtime environment. Patent US7516441, Microsoft Corporation, April 2009. [20] Christine Hofmeister and James Purtilo. Dynamic reconfiguration in distributed systems: adapting software modules for replacement. In International Conference on Distributed Computing Systems, pages 101–110, Pittsburgh, Pennsylvania, USA, May 1993. doi: 10.1109/ ICDCS.1993.287718. [21] Oleg Kiselyov. How to remove a dynamic prompt: static and dynamic delimited continuation operators are equally expressible. Technical Report TR611, Indiana University, March 2005.
Lolliproc: to Concurrency from Classical Linear Logic via Curry-Howard and Control
Karl Mazurak
Steve Zdancewic
University of Pennsylvania {mazurak,stevez}@cis.upenn.edu
Abstract

While many type systems based on the intuitionistic fragment of linear logic have been proposed, applications in programming languages of the full power of linear logic—including double-negation elimination—have remained elusive. Meanwhile, linearity has been used in many type systems for concurrent programs—e.g., session types—which suggests applicability to the problems of concurrent programming, but the ways in which linearity has interacted with concurrency primitives in lambda calculi have remained somewhat ad hoc. In this paper we connect classical linear logic and concurrent functional programming in the language Lolliproc, which provides simple primitives for concurrency that have a direct logical interpretation and that combine to provide the functionality of session types. Lolliproc features a simple process calculus "under the hood" but hides the machinery of processes from programmers. We illustrate Lolliproc by example and prove soundness, strong normalization, and confluence results, which, among other things, guarantee freedom from deadlocks and race conditions.

Categories and Subject Descriptors D.3.3 [Programming Languages]: Language Constructs and Features
General Terms Design, Languages, Theory
Keywords Linear logic, Concurrency, Type systems

1. Introduction: Linearity and Concurrency

Since its introduction by Girard in the 1980s [22], linear logic has suggested applications in type system support for concurrency. Intuitively, the appeal of this connection stems from linear logic's strong notion of resource management: if two program terms use distinct sets of resources, then one should be able to compute them both in parallel without fear of interference, thereby eliminating problems with race conditions or deadlock. Moreover, linear logic's ability to account for stateful computation [42], when combined with the concurrency interpretation above, suggests that it is a good fit for describing stateful communication protocols in which the two endpoints must be synchronized.

Indeed, there have been many successful uses of linearity in type systems for concurrent programming. Ideas from linearity play a crucial role in session types [12, 15, 25, 38, 40], for example, where they are used to ensure that two end-points of a channel agree on which side is to send the next message and what type of data should be sent. Linearity is also useful for constraining the behavior of π-calculus processes [4, 28], and can be strong enough to yield fully-abstract encodings of (stateful) lambda-calculi [45].

Given all this, it is natural to seek out programming-language constructs that correspond directly to linear logic connectives via the Curry-Howard correspondence [26]. In doing so, one would hope to shed light on the computational primitives involved and, eventually, to apply those insights in the contexts of proof theory and programming-language design. Here too, there has been much progress, which falls, roughly, into three lines of work.

First, there has been considerable effort to study various intuitionistic fragments of linear logic [6, 11, 29–31, 39]. This has yielded type systems and programming models that are relatively familiar to functional programmers and have applications in managing state and other resources [2, 13, 16, 24, 41, 47]. However, such intuitionistic calculi do not exploit concurrency (or nonstandard control operators) to express their operational semantics.

A second approach has been to formulate proof terms for the sequent calculus presentation of linear logic. This path leads to proof nets, as in Girard's original work [22] and related calculi [1, 18]. This approach has the benefit of fully exposing the concurrency inherent in linear logic, and it takes full advantage of the symmetries of the logical connectives to provide a parsimonious syntax. Yet the resulting type systems and programming models, with their fully symmetric operations, are far removed from familiar functional programming languages.

A third approach studies natural deduction formulations of linear logic [10, 14], following work on term assignments for classical (though not linear) logic [35–37]. These calculi typically use typing judgments with multiple conclusions, which can be read computationally as assigning types to variables that name first-class continuations. Their operational semantics encode the so-called commuting conversions, which shuffle (delimited) continuations in such a way as to effectively simulate parallel evaluation. This approach offers type systems that are relatively similar to those used in standard functional programming languages, at the expense of obscuring the connections to concurrent programming.

Contributions This paper introduces Lolliproc, a language in the natural deduction tradition that takes a more direct approach to concurrency. Lolliproc is designed first as a core calculus for concurrent functional programming; it gives a Curry-Howard interpretation of classical—as opposed to intuitionistic—linear logic¹ that is nonetheless suggestive of familiar functional languages.

There are two key ideas to our approach. First, in contrast with the work mentioned previously, we move from an intuitionistic to a classical setting by adding a witness for double-negation elimination, which we call yield. Second, to recover the expressiveness of linear logic, we introduce an operation go, which corresponds logically to the coercion from the intuitionistic negation ρ ⊸ ⊥ to ρ̃, ρ's dual as defined analogously to de Morgan's laws in classical logic. Operationally, go spawns a new process that executes in parallel to the main thread, while yield waits for a value sent by another process. These constructs are novel adaptations of Felleisen & Hieb's control operator [17] to our linear setting.

The search for appropriate operational semantics for these constructs leads us to a simple process language—reminiscent of Milner's π-calculus [32]—hidden behind an abstract interface. Programs are written entirely in a standard linear λ-calculus augmented with the go and yield operations and elaborate to processes at run time. As a consequence, our type system isolates the classical multiple-conclusion judgments (captured by our typing rules for processes) so that they are not needed to type check source program expressions. This situation is somewhat analogous to how reference cells are treated in ML—location values and heap typings are needed to describe the operational semantics, but source program type checking doesn't require them.

¹ Girard would say "full linear logic" or simply "linear logic".
Organization The next section introduces Lolliproc informally, covering both what we take from the standard intuitionistic linear λ-calculus and our new constructs. Given our goal of enabling concurrent programming in a traditional functional setting, we demonstrate Lolliproc's functionality by example in Section 3; we show how a system that seems to permit communication in only one direction can in fact be used to mimic bidirectional session types. Section 4 gives the formal typing rules and operational semantics for Lolliproc and presents our main technical contributions: a proof of type soundness, which implies both deadlock-freedom and adherence to session types; a proof of strong normalization, ruling out the possibility of livelocks or other non-terminating computations; and a proof of confluence, showing that there are no race conditions in our calculus. Lolliproc does remain quite restricted, however—we have deliberately included only the bare minimum necessary to demonstrate its concurrent functionality. Section 5 discusses additions to the language that would relax these restrictions, including unrestricted (i.e., non-linear) types, general recursion via recursive types, and intentional nondeterminism. This approach adheres to our philosophy of starting from a core language with support for well-behaved concurrency, then explicitly introducing potentially dangerous constructs (which, for instance, might introduce race conditions) in a controlled way. This section also concludes with a discussion of related work and a comparison of Lolliproc to more conventional classical linear logics.

types             τ ::= τ ⊸ τ | τ & τ | 1 | τ ⊗ τ | τ ⊕ τ | ⊥
protocol types    ρ ::= τ ⊸ ρ | ρ & ρ | ⊥
indices           i ::= 1 | 2
expressions       e ::= x | λx:τ. e | e e | ⟨e, e⟩ | e.i | () | e; e | (e, e)
                        | in_i^{τ⊕τ} e | let (x, y) = e in e
                        | case e of in1 x ↦ e | in2 y ↦ e
                        | go_ρ e | yield e            (new primitives)
                        | a | a̅ | cab                 (channel endpoints)
values            v ::= () | (v, v) | in_i^{τ⊕τ} v | λx:τ. e | ⟨e, e⟩ | a | a̅ | cab
eval. contexts    E ::= [·] | E e | v E | E.i | E; e | (E, e) | (v, E) | in_i^{τ⊕τ} E
                        | let (x, y) = E in e | case E of in1 x ↦ e | in2 y ↦ e
                        | go_ρ E | yield E
processes         P ::= e | (P | P) | νa:ρ. P
channel contexts  Π ::= · | Π, a·ρ | Π, a˜·ρ | Π, a:ρ
typing contexts   ∆ ::= · | ∆, x:τ

Figure 1. Lolliproc syntax
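Transcribed into a datatype, the grammar of Figure 1 reads as follows; this is our own OCaml rendering (constructor names are ours), included only as a reading aid:

    (* Our OCaml rendering of Figure 1's syntax; names are ours. *)
    type ty =
      | Lolli of ty * ty     (* τ1 ⊸ τ2 *)
      | With of ty * ty      (* τ1 & τ2 *)
      | One                  (* 1 *)
      | Tensor of ty * ty    (* τ1 ⊗ τ2 *)
      | Sum of ty * ty       (* τ1 ⊕ τ2 *)
      | Bot                  (* ⊥ *)

    type expr =
      | Var of string
      | Lam of string * ty * expr                  (* λx:τ. e *)
      | App of expr * expr
      | Pair of expr * expr                        (* ⟨e1, e2⟩ *)
      | Proj of expr * int                         (* e.i *)
      | Unit                                       (* () *)
      | Seq of expr * expr                         (* e1; e2 *)
      | Tens of expr * expr                        (* (e1, e2) *)
      | Inj of int * ty * expr                     (* in_i^{τ1⊕τ2} e *)
      | LetPair of string * string * expr * expr   (* let (x, y) = e in e *)
      | Case of expr * (string * expr) * (string * expr)
      | Go of ty * expr                            (* go_ρ e *)
      | Yield of expr                              (* yield e *)
      | Src of string                              (* a *)
      | Snk of string                              (* a̅ *)
      | Closed of string                           (* cab *)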
2. An overview of Lolliproc

As shown in Figure 1, the types τ of Lolliproc include linear functions τ1 ⊸ τ2, additive products τ1 & τ2 (sometimes pronounced "with"), the unit type 1, multiplicative products τ1 ⊗ τ2, and additive sums τ1 ⊕ τ2. These types form an intuitionistic subset of linear logic, and they come equipped with standard introduction and elimination forms and accompanying typing rules. In addition, we have the type ⊥, which is notably not the falsity from which everything follows.² Its purpose will become apparent later.

Our syntax for expressions is given by the grammar e in Figure 1, and their standard evaluation semantics is summarized in Figure 2.³ In Lolliproc, all variables are treated linearly and functions are call-by-value. Additive pairs ⟨e1, e2⟩ use the same resources to construct both of their components and are thus evaluated lazily and eliminated via projection; multiplicative pairs (e1, e2), whose components are independent, are evaluated eagerly and eliminated by let-binding both components. We use the sequencing notation e1; e2 to eliminate units () of type 1. Additive sums, eliminated by case expressions, are completely standard.

[E-APPLAM] (λx:τ. e) v −→ {x ↦ v}e
[E-LOCALCHOICE] ⟨e1, e2⟩.i −→ ei
[E-UNIT] (); e −→ e
[E-LET] let (x1, x2) = (v1, v2) in e −→ {x1 ↦ v1, x2 ↦ v2}e
[E-CASE] case in_i^{τ1⊕τ2} v of in1 x1 ↦ e1 | in2 x2 ↦ e2 −→ {xi ↦ v}ei

Figure 2. Basic evaluation rules

Our new constructs—the go and yield operations, along with channels and processes—are perhaps best understood by looking at what motivated their design. In the rest of this section we will see how the desire to capture classicality led to processes with a simple communication model and how the desire to make that communication more expressive led back to classical linear logic. We will also see Lolliproc's operational semantics; we defer a full account of its typing rules to Section 4.

² Such a type in linear logic is the additive false, while ⊥ is the multiplicative false; we have left additive units out of Lolliproc for simplicity's sake.
³ The typical rule for handling evaluation contexts is missing, as this is done at the process level in Lolliproc.

2.1 Moving to classical linear logic

The differences between intuitionistic and classical logic can be seen in their treatment of negation and disjunction. In standard presentations of classical linear logic, negation is defined via a dualizing operator (−)⊥ that identifies the de Morgan duals as shown below:

⊥⊥ = 1                      1⊥ = ⊥
(t1 & t2)⊥ = t1⊥ ⊕ t2⊥      (t1 ⊕ t2)⊥ = t1⊥ & t2⊥
(t1 ⊸ t2)⊥ = t1 ⊗ t2⊥       (t1 ⊗ t2)⊥ = t1 ⊸ t2⊥

With this definition, dualization is clearly an involution—that is, (τ⊥)⊥ = τ. Moreover, the logic is set up so that duals are logically equivalent to negation: τ⊥ is provable if and only if τ ⊸ ⊥ is provable. In this way, classical linear logic builds double-negation elimination into its very definition—it is trivial to prove the theorem ((τ ⊸ ⊥) ⊸ ⊥) ⊸ τ, which is not intuitionistically valid.

Sequent calculus formulations of classical linear logic take advantage of these dualities by observing that the introduction of τ is equivalent to the elimination of τ⊥; this allows them to be presented with half the typing rules and syntactic forms that would otherwise be required. This symmetric approach is extremely convenient for proof theory but does not allow us to conservatively extend the existing typing rules and operational semantics for the intuitionistic fragment of linear logic already described above. For that, we need a natural-deduction formulation of the type system.

Our solution to this problem is to forget dualization (for now) and instead add double-negation elimination as a primitive. We take inspiration from type systems for Felleisen & Hieb's control and abort operators [17, 34]: in a non-linear setting, control can be given the type ((τ → ⊥) → ⊥) → τ, which corresponds to double-negation elimination, while abort is a functional variant of false elimination that takes ⊥ to any type. The operational behavior of these constructs is as follows:

E[control (λc. e)] −→ (λc. e) (λx. abort E[x])
E[abort e] −→ e

Unfortunately, abort clearly has no place in a linear system, as it discards the evaluation context E and any resources contained therein. What can we do instead? Observe that c has the continuation type τ → ⊥ (or, in a linear setting, τ ⊸ ⊥) and that invoking c within the body e returns an "answer" to the context E. We can reconcile this behavior with a linear system by dropping abort and instead introducing the ability to evaluate two expressions in parallel:

E[control (λc. e)] −→ E[control a] | (λc. e) a̅

Here, evaluating a control expression spawns its argument as a child process. The connection between the original evaluation context E and the child process is now the channel a: we write a for the receiving endpoint or source of a, held by the parent process, while the a̅ passed to the child denotes the sending endpoint or sink. Now evaluation can proceed in the right-hand expression until the sink is applied to a value, at which point this "answer" is passed back to the parent process:

E[control a] | E′[a̅ v] −→ E[v] | E′[cab]

The closed channel token cab indicates that communication over a is finished; it also indicates that the child process may now terminate, but before process termination actually happens all linear resources in E′ must be safely consumed. Linearity is preserved by both of our operations, as neither expressions nor evaluation contexts are duplicated or discarded.

So far, though, these constructs offer a very poor form of concurrency—in the rules above, the parent process immediately blocks waiting for the child process to return. To allow the parent and child to execute in parallel, we can split control into two operations. The first, which we call go, is responsible for generating the channel a and spawning the child process; it immediately returns a source value to the parent, which can keep running:

E[go (λc. e)] −→ E[a] | (λc. e) a̅

The second operation, yield, is used by the parent process to synchronize with the child by blocking on a source:

E[yield a] | E′[a̅ v] −→ E[v] | E′[cab]

2.2 Typing and extending go and yield

How, then, to type check these new operations? Which is to say, what is their logical meaning? The source a has type (τ ⊸ ⊥) ⊸ ⊥, and such doubly-negated types appear so frequently in Lolliproc that we abbreviate them as ⌈τ⌉, pronounced "source of τ". Invoking yield on such a source returns a τ—it eliminates the double negation—so we have:

yield : ⌈τ⌉ ⊸ τ

What about go? At first glance, it appears that go takes an expression of type ⌈τ⌉ and returns a ⌈τ⌉—it is logically an identity function. This would be sound, but we can do better. The type τ ⊸ ⊥ is usually thought of as a continuation that accepts a τ, but here it is better to think of it as expressing a very simple protocol, one in which a τ is sent and there is no further communication. From this point of view, we can instead think of go as taking a function of type ρ ⊸ ⊥, and spawning that function as a child process that must communicate according to the protocol ρ. The parent process receives from go a source whose type describes the other side of the protocol ρ; hence a yield on the source waits for information to be sent across the sink by the child process, after which both sides continue with the protocol.

Which types make sense as protocols? A protocol might be complete (i.e., ⊥), it might specify that a value of type τ be sent before continuing according to the protocol ρ (i.e., τ ⊸ ρ), or it might specify a choice between protocols ρ1 and ρ2 (i.e., ρ1 & ρ2). For each such protocol type ρ we define a dual type ρ̃, as follows:⁴

⊥̃ = 1
(τ ⊸ ρ)~ = ⌈τ ⊗ ρ̃⌉
(ρ1 & ρ2)~ = ⌈ρ̃1 ⊕ ρ̃2⌉

Aside from the extra double-negations—corresponding operationally to points at which we must synchronize with yield and logically to explicitly marking where classical reasoning will take place—this is exactly the left-hand column of the definition of (−)⊥.⁵ Additionally, since ⌈τ⌉ is defined in terms of implication, both ⌈ρ̃1 ⊕ ρ̃2⌉ and ⌈τ ⊗ ρ̃⌉ are themselves protocol types, a fact which will become important as we go on. Thus go witnesses the logical isomorphism between the intuitionistic negation of a type and its dual:

go : (ρ ⊸ ⊥) ⊸ ρ̃

The channel endpoints a̅ and a, then, must have the types ρ and ρ̃. Their types will change over the course of evaluation, as communication proceeds over the channel a; when communication is finished, the a̅ of type ⊥ will be replaced by cab of that same type, while the a of type 1 will simply step to ().

With this plumbing in place, we can define our operational semantics for processes as shown in Figure 3. At the process level we bind channels with νa:ρ. P; these binders are generated by rule EP-GO and require that we annotate go expressions as go_ρ e. Evaluation blocks when yielding on sources or eliminating sinks until a matching pair is in play, at which point the argument or choice bit is relayed across the channel (rules EP-APPSINK and EP-REMOTECHOICE). Note that such communication has the effect of updating the type of the channel at its binding site to reflect the new state of the protocol. The rule EP-CLOSE is similar, but exists only to facilitate typing of completed channels and thus does not require a yield. EP-DONE eliminates completed processes (reminiscent of 0 in the π-calculus) and their binders. EP-EVAL integrates evaluation contexts and expression evaluation with process evaluation, while EP-PAR and EP-NEW allow evaluation within processes. (We also define the standard notion of process equivalence, given in Section 4.)

[EP-GO] E[go_ρ v] −→ νa:ρ. (E[a] | v a̅)   (a not free in E[go_ρ v])
[EP-APPSINK] νa:τ ⊸ ρ. E1[yield a] | E2[a̅ v] −→ νa:ρ. E1[(v, a)] | E2[a̅]
[EP-REMOTECHOICE] νa:ρ1 & ρ2. E1[yield a] | E2[a̅.i] −→ νa:ρi. E1[in_i^{ρ1⊕ρ2} a] | E2[a̅]
[EP-CLOSE] νa:⊥. E1[a] | E2[a̅] −→ E1[()] | νa:⊥. E2[cab]
[EP-EVAL] if e −→ e′ then E[e] −→ E[e′]
[EP-DONE] P | νa:⊥. cab −→ P
[EP-PAR] if P1 −→ P1′ then P1 | P2 −→ P1′ | P2
[EP-NEW] if P −→ P′ then νa:τ. P −→ νa:τ. P′

Figure 3. Process evaluation rules

Two final points must be addressed by the operational semantics: the type ⌈τ⌉ can be inhabited by more than just sources, and thus we need evaluation rules for yielding on other sorts of values; similarly, our sources all technically have function types, so we must be able to apply them. Figure 4 gives the appropriate congruence rules. For the first case, we recall our earlier intuition concerning the simpler (but less useful) language where yield and go are combined into control. Rule E-YIELDOTHER thus synthesizes a go in such cases, although we must also synthesize a let binding, as we have transformed a value of type τ into one of type τ ⊗ 1.

When a source appears in the function position of an application, we appeal to the intuition from other systems for classical logics [22, 35] that the interaction of a term with type τ and another with type τ⊥ should not depend on the order of those terms. Thus, applying a of type (τ ⊸ ⊥) ⊸ ⊥ to v of type τ ⊸ ⊥ should be the equivalent of first yielding on a, then supplying the result to v. Rule E-APPSOURCE makes this so, and it is easy to verify that this property also holds in the case of other applications at those types. Although these congruence rules are a bit unusual, the fact that Lolliproc does not introduce a new family of types for channel endpoints turns out to be a very useful property of the system: for instance, it allows us to bootstrap bidirectional communication from what appears, at first glance, to be a unidirectional language. We will see how this transpires in the next section.

⁴ The choice to define ⊥̃ as 1 rather than ⌈1⌉ is a simple optimization that saves us from unnecessary synchronization at channel shutdown; our link_τ example in the next section shows how this can come in handy.
⁵ In linear logic, the protocol connectives are said to be negative, meaning that their introduction forms are invertible. That is, no additional choice is made in their construction—in contrast to the choice of injection for ⊕ and the choice of resource split for ⊗, which are both positive connectives.

[E-YIELDOTHER] yield v −→ let (z, u) = yield (go_{τ⊸⊥} v) in u; z   (v ≠ a)
[E-APPSOURCE] a v −→ v (yield a)

Figure 4. Expression congruence rules
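The dual operation of Section 2.2 is a simple structural recursion. As a sanity check, it can be transcribed over the ty datatype sketched after Figure 1 (again our own rendering; source encodes the ⌈·⌉ abbreviation):

    (* Sketch of the protocol dual ρ̃ from Section 2.2, over the ty type
       from our earlier sketch.  source t encodes ⌈t⌉ = (t ⊸ ⊥) ⊸ ⊥. *)
    let source (t : ty) : ty = Lolli (Lolli (t, Bot), Bot)

    let rec dual (p : ty) : ty =
      match p with
      | Bot -> One                                        (* ⊥̃ = 1 *)
      | Lolli (t, p') -> source (Tensor (t, dual p'))     (* (τ ⊸ ρ)~ = ⌈τ ⊗ ρ̃⌉ *)
      | With (p1, p2) -> source (Sum (dual p1, dual p2))  (* (ρ1 & ρ2)~ = ⌈ρ̃1 ⊕ ρ̃2⌉ *)
      | _ -> invalid_arg "dual: not a protocol type"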
3. Examples
Here we demonstrate some of what can be done with Lolliproc by introducing several concurrency routines of increasing complexity. For ease of explanation and consistency, we write foo_τ when the function foo is parameterized by the type τ, and we use capitalized type abbreviations, e.g., Bar τ. In a real language we would of course want polymorphism—either ML-style or the full generality of System F with linearity [31].

Futures A future [33] is simply a sub-computation to be calculated in a separate thread; the main computation will wait for this thread to complete when its value is needed. This is one of the simplest forms of concurrency expressible in Lolliproc. We can define

Future τ = ⌈τ ⊗ 1⌉

future_τ : (1 ⊸ τ) ⊸ Future τ
future_τ = λx:1 ⊸ τ. go_{τ⊸⊥} (λk:τ ⊸ ⊥. k (x ()))

wait_τ : Future τ ⊸ τ
wait_τ = λf:Future τ. let (z, u) = yield f in u; z

The main process passes a thunk to its newly spawned child; this child applies the thunk and sends back the result. More pictorially, the run-time behavior of E[future_τ g], where g () −→* v and E[−] −→* E′[−], is

E[future_τ g] −→ E[a] | a̅ (g ())  −→*  E′[a] | a̅ v

The connection between the endpoints of a channel at a given moment in time is drawn as an arrow in these pictures. Similarly, for some such a of type Future τ, we have

E″[wait_τ a] | a̅ v  −→_a  E″[v] | cab

Here the a subscript on evaluation arrows indicates that communication over a has occurred. Since a supports no further communication afterwards—its sink has been replaced by the closed channel token cab—the connection is then removed. Recall that such a lone cab indicates a completed process; the child process in this example is now complete and will disappear.
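Readers may find it helpful to compare this interface with futures in a conventional runtime. Under the (loose) assumption that a domain stands in for a Lolliproc child process, the same two functions look as follows in OCaml 5—an analogy only, with none of Lolliproc's linearity guarantees:

    (* Analogy only: future/wait over OCaml 5 domains. *)
    type 'a future = 'a Domain.t

    let future (thunk : unit -> 'a) : 'a future =
      Domain.spawn thunk            (* spawn the child computation *)

    let wait (f : 'a future) : 'a =
      Domain.join f                 (* block until the child's result arrives *)

    (* e.g., let f = future (fun () -> 6 * 7) in wait f   evaluates to 42 *)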
Linking channel endpoints Given a vsrc of type ⌈τ⌉ and vsnk of type τ ⊸ ⊥—which may or may not be a literal source and sink—we might want to join the two such that vsrc flows to vsnk without making the parent process wait by yielding on vsrc. In doing so, however, we must still somehow produce a value of type ⊥; it can't be the value that applying vsnk would produce, so it must come from some other process. Our solution relies on the ability to pass process completion tokens from one process to another:

link_τ : ⌈τ⌉ ⊸ (τ ⊸ ⊥) ⊸ ⊥
link_τ = λx:⌈τ⌉. λf:τ ⊸ ⊥. yield (λg:⊥ ⊸ ⊥. go_⊥ g; x f)

Note that the final x f will step to f (yield x) via rule E-APPSOURCE; similarly, rule E-YIELDOTHER will insert a go_{⊥⊸⊥} immediately following the yield. A call to link_τ vsrc vsnk thus spawns two processes: the first spawns the second with the trivial protocol, then proceeds to wait and link the original arguments; the second uses the sink created for the first child to immediately return control to the parent process. This is illustrated in Figure 5; we use the abbreviation

YLD(e) ≜ let (z, u) = yield e in u; z

for the now common pattern of yielding to receive a product, immediately unpacking the resulting pair, and eliminating the left component.

Figure 5. Evaluation of link_τ vsrc vsnk (diagram: the first child waits on vsrc and links it to vsnk, while the second child immediately passes its completion token back to the parent)

Reversing directions So far we have seen only child processes that send information back to their parents. While our constructs show bias towards this sort of communication, Lolliproc does allow exchanges in both directions; a few complications arise, however, due to the unidirectional nature of our so-called dualization. For instance, while the dual of τ ⊸ ρ is ⌈τ ⊗ ρ̃⌉, the dual of ⌈τ ⊗ ρ⌉ is the somewhat unwieldy ⌈((τ ⊗ ρ) ⊸ ⊥) ⊗ 1⌉ rather than the τ ⊸ ρ̃ for which we would have hoped. Yet we observe that the former can be transformed into the latter with a yield operation, an uncurrying, a partial application, and a go; we combine these steps into a function send:

send_{τ⊸ρ̃} : ⌈τ ⊗ ρ⌉~ ⊸ τ ⊸ ρ̃
send_{τ⊸ρ̃} = λs:⌈τ ⊗ ρ⌉~. let (f, u) = yield s in u; λx:τ. go_ρ (λp:ρ. f (x, p))

Similarly, the dual of ⌈ρ1 ⊕ ρ2⌉ is ⌈((ρ1 ⊕ ρ2) ⊸ ⊥) ⊗ 1⌉; to coerce this to ρ̃1 & ρ̃2, we define select as

select_{ρ̃1&ρ̃2} : ⌈ρ1 ⊕ ρ2⌉~ ⊸ ρ̃1 & ρ̃2
select_{ρ̃1&ρ̃2} = λs:⌈ρ1 ⊕ ρ2⌉~. let (f, u) = yield s in
                 u; ⟨go_{ρ1} (λp1:ρ1. f (in1^{ρ1⊕ρ2} p1)), go_{ρ2} (λp2:ρ2. f (in2^{ρ1⊕ρ2} p2))⟩

To demonstrate the first of these coercions in action, we look to the identity function echo, which spawns a child process, passes its argument to that child, then receives it back:

reply_τ : ⌈τ ⊗ (τ ⊸ ⊥)⌉ ⊸ ⊥
reply_τ = λh:⌈τ ⊗ (τ ⊸ ⊥)⌉. let (y, g) = yield h in g y

echo_τ : τ ⊸ τ
echo_τ = λx:τ. let (z, u) = yield (send_{τ⊸⌈τ⊗1⌉} (go_{⌈τ⊗(τ⊸⊥)⌉} reply_τ) x) in u; z

Here reply is the body of the child process that will receive the initial argument and send it back. (The type of reply_τ could equally well have been written as the equivalent (τ ⊗ (τ ⊸ ⊥)) ⊸ ⊥—that notation better reflects how it is used with echo, while the notation given above more closely matches its definition.) The execution of echo_τ v for some v of type τ is shown in Figure 6. We can see how, while the initial spawning of the reply_τ process orients the channel a in the usual child-to-parent direction, the machinery of send spawns another process that sets up a channel b in the opposite direction; afterwards, a third channel c is established in the original direction. All this is facilitated again by our congruence rules. It is worth noting that, while the value v cycles among several processes, at no point does a cycle exist in the communication structure—the arrows—of Figure 6. That this fact always holds is crucial to our proof of soundness in Section 4.

Figure 6. Evaluation of echo_τ v (diagram: channel a is spawned child-to-parent, channel b is reversed by send's machinery, and channel c re-establishes the original direction; v cycles through all three without the communication structure ever forming a cycle)
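For comparison, the round trip that echo performs can be written with explicit CML-style channels, as in OCaml's threads library—an analogy in which both directions are primitive, whereas Lolliproc derives the reverse direction from send:

    (* Analogy: an echo child over CML-style channels (OCaml threads
       library); both directions are primitive here, unlike in Lolliproc. *)
    let echo (x : 'a) : 'a =
      let to_child = Event.new_channel () in
      let to_parent = Event.new_channel () in
      let _child =
        Thread.create
          (fun () ->
            let y = Event.sync (Event.receive to_child) in
            Event.sync (Event.send to_parent y))
          ()
      in
      Event.sync (Event.send to_child x);
      Event.sync (Event.receive to_parent)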
A larger example So far we have seen relatively small examples. As a larger demonstration of the protocols expressible in Lolliproc, we consider Diffie-Hellman key exchange, formulated as follows:

1. Alice and Bob select secret integers a and b.
2. Alice and Bob exchange g^a mod p and g^b mod p in the clear.
3. Alice and Bob compute the shared secret (g^b)^a = (g^a)^b mod p and use it to encrypt further communication.

Here g is a publicly known generator with certain properties, often 2 or 5, and p is a similarly known large prime number. The shared secret cannot feasibly be computed from the publicly known values g^a and g^b. For purposes of this example, we declare that further communication consists only of Alice sending an encrypted string to Bob, and we treat Alice's session as a child process spawned by Bob rather than as a process somewhere over the network that initiates contact. We augment Lolliproc with the types Int and String, as well as necessary operations over these types:

bigrandom : 1 ⊸ Int
powmod    : Int ⊸ Int ⊸ Int ⊸ Int
lessthan  : Int ⊸ Int ⊸ (1 ⊕ 1)
encrypt   : Int ⊸ String ⊸ String
decrypt   : Int ⊸ String ⊸ String

For clarity, we also freely use general let expressions rather than only those that eliminate multiplicative products, and we allow the reuse of variables of type Int. To demonstrate the use of additive products and sums—and to add a hint of realism—we allow Alice or Bob to abort the session after receiving a value from the other party. Thus the protocol type that must be enforced in Alice's session and a sample implementation of said session are

Alice = Int ⊸ ⌈⊥ ⊕ ⌈Int ⊗ (⊥ & (String ⊸ ⊥))⌉⌉

alice : Int ⊸ Int ⊸ Int ⊸ Alice ⊸ ⊥
alice = λg:Int. λp:Int. λn:Int. λs:Alice.
  let a = bigrandom () in
  case yield (s (powmod g a p)) of
    in1 s1 ↦ s1
  | in2 s2 ↦ let (b, s′) = yield s2 in
             case lessthan b n of
               in1 u1 ↦ u1; s′.1
             | in2 u2 ↦ u2; let k = powmod b a p in
                            (s′.2) (encrypt k "I know secrets!")

Since Alice's session is the child process, the point at which she must check for an abort signal from Bob appears as ⊥ ⊕ ρ, while the point at which she may abort appears as ⊥ & ρ. In this case, Alice chooses to abort whenever the public key Bob sends her is too small in comparison to some parameter n. An implementation of Bob's side of the communication—i.e., the parent process—looks very similar. While bob relies on the type Alice to specify the whole communication protocol, we do need type annotations B1 and B2 for our uses of send and select:

B1 = 1 & ⌈((Int ⊗ (⊥ & (String ⊸ ⊥))) ⊸ ⊥) ⊗ 1⌉
B2 = Int ⊸ ⌈1 ⊕ ⌈String ⊗ 1⌉⌉

bob : Int ⊸ Int ⊸ Int ⊸ String
bob = λg:Int. λp:Int. λn:Int.
  let (a, s) = yield (go_Alice (alice g p n)) in
  case lessthan a n of
    in1 u1 ↦ u1; (select_B1 s).1; "ERROR1"
  | in2 u2 ↦ u2; let s1 = (select_B1 s).2 in
                 let b = bigrandom () in
                 let s2 = send_B2 s1 (powmod g b p) in
                 case yield s2 of
                   in1 u ↦ u; "ERROR2"
                 | in2 s″ ↦ let k = powmod a b p in
                            let (c, u′) = yield s″ in u′; decrypt k c

For brevity, we do not illustrate an evaluation of bob g p n. We observe, however, that nothing new is going on in this example as compared to echo_τ. We also observe that the definitions of alice and bob are relatively straightforward. They could be improved by standard type inference and by syntactic sugar that gave the repeated generation and consumption of linear variables the appearance of a single variable being mutated [31], but they are generally quite readable.
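As an aside, the powmod operation assumed above is ordinary modular exponentiation; a minimal OCaml sketch by repeated squaring follows (our implementation—a real deployment would use arbitrary-precision integers, since native ints overflow at cryptographic sizes):

    (* Sketch: modular exponentiation by repeated squaring. *)
    let powmod (g : int) (e : int) (p : int) : int =
      let rec go base e acc =
        if e = 0 then acc
        else
          let acc = if e land 1 = 1 then acc * base mod p else acc in
          go (base * base mod p) (e / 2) acc
      in
      go (g mod p) e 1

    (* e.g., powmod 2 10 1000 = 24, i.e., 1024 mod 1000 *)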
4. Metatheory
We now discuss the technical aspects of Lolliproc, including the formal proofs of soundness, strong normalization, and confluence.

4.1 Typing
The expression typing rules for Lolliproc can be seen in Figure 7. As we discussed in the introduction, these typing rules follow the natural-deduction presentation of intuitionistic linear calculi. Our typing judgment Π; ∆ ⊢ e : τ depends both on a channel context Π and a term variable context ∆. Term variables x are bound to types τ in ∆, while Π contains binders a·ρ (representing the ability to send on the channel a), a˜·ρ (representing the ability to receive on a), and a:ρ (combining both capabilities). Both varieties of context are linear, in the sense that they permit neither weakening nor contraction.

Many of our rules are standard for a linear type system, but as linear type systems themselves are not quite standard, they still deserve some explanation. Because linear variables cannot be discarded, rules that serve as the leaves of proof trees require contexts that are either empty (as in T-UNIT) or that contain exactly what is being typed (as in T-VAR). Rules with multiple premises vary depending on how many of their subterms will eventually be evaluated. If only one of several will, then all those subexpressions should share the same contexts, as in T-WITH. When multiple subexpressions will be evaluated, as in T-TENSOR, the contexts must be divided among them. We write Π1 d Π2 and ∆1 d ∆2 to denote contexts that can be split into Π1 and Π2 and into ∆1 and ∆2 respectively; this relation is formally defined in Figure 8.

The typing rules for our new constructs are straightforward. The types for go_ρ e and yield e have already been discussed; the channel endpoints a̅ and a have the types ascribed to them by the channel context Π via a·ρ and a˜·ρ respectively. The closed channel cab accounts for both endpoints but must be given the type ⊥.

We write Π ⊢ P : τ for a well-typed process P with channels typed by Π; our process typing rules are given in Figure 9. No ∆ is needed, as processes never depend on expression variables; rule TP-EXP type checks atomic processes in the empty variable context. Rule TP-NEW extends the channel environment at binders. As the final type of all processes but our original will always be ⊥, rules TP-PARLEFT and TP-PARRIGHT require that one of their components always have type ⊥.

Note that TP-PARLEFT and TP-PARRIGHT split their channel context with d̂ rather than simply d. As seen in Figure 8, this allows exactly one a:ρ binding to be decomposed into an a·ρ binding and an a˜·ρ binding. This means that, in any well-typed process of the form P1 | P2, there can be at most one channel for which one endpoint is in P1 and the other is in P2. This restriction substantially cuts down the set of well-typed processes and, as will be seen shortly, proves crucial for type soundness.
[T-UNIT] ·; · ⊢ () : 1
[T-VAR] ·; x:τ ⊢ x : τ
[T-LAM] if Π; ∆, x:τ1 ⊢ e : τ2 then Π; ∆ ⊢ λx:τ1. e : τ1 ⊸ τ2
[T-APP] if Π1; ∆1 ⊢ e1 : τ1 ⊸ τ2 and Π2; ∆2 ⊢ e2 : τ1 then Π1 d Π2; ∆1 d ∆2 ⊢ e1 e2 : τ2
[T-SEQ] if Π1; ∆1 ⊢ e1 : 1 and Π2; ∆2 ⊢ e2 : τ then Π1 d Π2; ∆1 d ∆2 ⊢ e1; e2 : τ
[T-WITH] if Π; ∆ ⊢ e1 : τ1 and Π; ∆ ⊢ e2 : τ2 then Π; ∆ ⊢ ⟨e1, e2⟩ : τ1 & τ2
[T-SELECT] if Π; ∆ ⊢ e : τ1 & τ2 then Π; ∆ ⊢ e.i : τi
[T-TENSOR] if Π1; ∆1 ⊢ e1 : τ1 and Π2; ∆2 ⊢ e2 : τ2 then Π1 d Π2; ∆1 d ∆2 ⊢ (e1, e2) : τ1 ⊗ τ2
[T-LET] if Π1; ∆1 ⊢ e0 : τ1 ⊗ τ2 and Π2; ∆2, x1:τ1, x2:τ2 ⊢ e : τ then Π1 d Π2; ∆1 d ∆2 ⊢ let (x1, x2) = e0 in e : τ
[T-IN] if Π; ∆ ⊢ e : τi then Π; ∆ ⊢ in_i^{τ1⊕τ2} e : τ1 ⊕ τ2
[T-CASE] if Π1; ∆1 ⊢ e0 : τ1 ⊕ τ2 and Π2; ∆2, x1:τ1 ⊢ e1 : τ and Π2; ∆2, x2:τ2 ⊢ e2 : τ then Π1 d Π2; ∆1 d ∆2 ⊢ case e0 of in1 x1 ↦ e1 | in2 x2 ↦ e2 : τ
[T-GO] if Π; ∆ ⊢ e : ρ ⊸ ⊥ then Π; ∆ ⊢ go_ρ e : ρ̃
[T-YIELD] if Π; ∆ ⊢ e : ⌈τ⌉ then Π; ∆ ⊢ yield e : τ
[T-SINK] a·ρ; · ⊢ a̅ : ρ
[T-SOURCE] a˜·ρ; · ⊢ a : ρ̃
[TR-DONE] a:⊥; · ⊢ cab : ⊥

Figure 7. Expression typing rules

The splitting rules range over binder markers § ::= · | ˜· | : and relations d̈ ::= d | d̂:

[U-EMPTY] · d · = ·
[UT-LEFT] if ∆1 d ∆2 = ∆ and x ∉ dom(∆) then (∆1, x:τ) d ∆2 = ∆, x:τ
[UT-RIGHT] if ∆1 d ∆2 = ∆ and x ∉ dom(∆) then ∆1 d (∆2, x:τ) = ∆, x:τ
[UC-NONE] if Π1 d Π2 = Π then Π1 d̂ Π2 = Π
[UC-LEFT] if Π1 d̈ Π2 = Π and a ∉ dom(Π) then (Π1, a§ρ) d̈ Π2 = Π, a§ρ
[UC-RIGHT] if Π1 d̈ Π2 = Π and a ∉ dom(Π) then Π1 d̈ (Π2, a§ρ) = Π, a§ρ
[UC-SRCSNK] if Π1 d Π2 = Π and a ∉ dom(Π) then (Π1, a˜·ρ) d̂ (Π2, a·ρ) = Π, a:ρ
[UC-SNKSRC] if Π1 d Π2 = Π and a ∉ dom(Π) then (Π1, a·ρ) d̂ (Π2, a˜·ρ) = Π, a:ρ

Figure 8. Context splitting rules

[TP-EXP] if Π; · ⊢ e : τ then Π ⊢ e : τ
[TP-NEW] if Π, a:ρ ⊢ P : τ then Π ⊢ νa:ρ. P : τ
[TP-PARLEFT] if Π1 ⊢ P1 : τ and Π2 ⊢ P2 : ⊥ then Π1 d̂ Π2 ⊢ P1 | P2 : τ
[TP-PARRIGHT] if Π1 ⊢ P1 : ⊥ and Π2 ⊢ P2 : τ then Π1 d̂ Π2 ⊢ P1 | P2 : τ

Figure 9. Process typing rules
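The d and d̂ relations of Figure 8 are deterministic enough to check algorithmically. The following OCaml sketch (our encoding—the bindings a·ρ, a˜·ρ, and a:ρ become Snk, Src, and Both) recombines two channel contexts and enforces that d̂ decomposes at most one a:ρ binding:

    (* Sketch: recombining channel contexts in the style of Figure 8. *)
    type rho = string                            (* protocol types, abstracted *)
    type binding = Snk of rho | Src of rho | Both of rho
    type ctx = (string * binding) list

    let join ~(hat : bool) (c1 : ctx) (c2 : ctx) : ctx =
      let splits = ref 0 in
      let merged =
        List.map
          (fun (a, b1) ->
            match List.assoc_opt a c2 with
            | None -> (a, b1)
            | Some b2 ->
                (match b1, b2 with
                 | Src r, Snk r' | Snk r, Src r' when r = r' ->
                     incr splits; (a, Both r)   (* one a:ρ was decomposed *)
                 | _ -> failwith ("channel " ^ a ^ " used on both sides")))
          c1
        @ List.filter (fun (a, _) -> not (List.mem_assoc a c1)) c2
      in
      (* plain d allows no decompositions; d̂ allows at most one *)
      if !splits > (if hat then 1 else 0) then failwith "too many splits"
      else merged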
4.2 Soundness

Taking the usual approach and defining soundness in terms of preservation—well-typed terms that step always step to well-typed terms—and progress—well-typed non-values can always take a step—we observe that, while preservation makes sense on both expressions and processes, progress is only a property of well-typed processes, as there are certainly well-typed expressions that require the process evaluation rules to take a step. Preservation on expressions is straightforward, requiring the usual substitution lemma:

Lemma 1 (Substitution). If Π; ∆1, x:τ′, ∆2 ⊢ e : τ and Π′; ∆′ ⊢ e′ : τ′, then Π, Π′; ∆1, ∆′, ∆2 ⊢ {x ↦ e′}e : τ.

Lemma 2 (Expression preservation). If Π; ∆ ⊢ e : τ and e −→ e′, then Π; ∆ ⊢ e′ : τ.

We have proved these results in the Coq proof assistant; the proofs are fairly standard, although the linear contexts introduce complexities that can usually be avoided in other systems, e.g., the need to reason about context permutation.

Preservation and progress for processes are more complex. We first define a process equivalence relation ≡ as shown in Figure 10. This equivalence separates unimportant structural differences in process syntax from the evaluation rules of Figure 3, which determine how processes truly evolve. All of these equivalence rules are standard; they state that the precise position of binders, as well as the order and grouping of parallel composition, are irrelevant.

[EQP-REFL] P ≡ P
[EQP-SYM] if P2 ≡ P1 then P1 ≡ P2
[EQP-TRANS] if P1 ≡ P2 and P2 ≡ P3 then P1 ≡ P3
[EQP-PAR] if P1 ≡ P1′ and P2 ≡ P2′ then P1 | P2 ≡ P1′ | P2′
[EQP-NEW] if P ≡ P′ then νa:ρ. P ≡ νa:ρ. P′
[EQP-SWAP] νa1:ρ1. νa2:ρ2. P ≡ νa2:ρ2. νa1:ρ1. P
[EQP-COMM] P1 | P2 ≡ P2 | P1
[EQP-ASSOC] (P1 | P2) | P3 ≡ P1 | (P2 | P3)
[EQP-EXTRUDE] if a is not free in P2 then (νa:ρ. P1) | P2 ≡ νa:ρ. (P1 | P2)

Figure 10. Process equivalence rules

We next introduce a notion of (not necessarily unique) canonical forms for processes: a canonically formed process is of the form νa1:ρ1. ... νam:ρm. e1 | (e2 | (... | en)) for some m ≥ 0 and n ≥ 1. It is easy to see that any process can be put in canonical form by using the process equivalence rules.

Property 3 (Canonization). For any process P, there exists some P′ in canonical form such that P ≡ P′.

We define the communication graph of a process P to be the undirected⁶ multigraph in which the vertices are the atomic processes (that is, expressions) that make up P and an edge exists for each active channel a within the process, connecting the expressions containing a and a̅. (No edge exists for cab.) Since graphs are built out of atomic processes, it is easy to see that this graph structure is invariant under process equivalence.

Property 4 (Graph invariance). For any processes P and P′ where P ≡ P′, the communication graph of P′ is isomorphic to the communication graph of P.

We immediately notice a correspondence between well-typedness of a process and the acyclicity of its communication graph:

Lemma 5 (Acyclicity and typing). If Π ⊢ P : τ, then the communication graph of P is acyclic.

Proof. Recall the definition of d̂, which allows only a single channel to be split over the two halves of a parallel composition. It is not possible to partition the atomic processes in a cycle without going through at least two edges, thus making it impossible to type check a process with a cyclic communication graph.

Finally, we observe that acyclicity of communication graphs is preserved under process evaluation:

Lemma 6 (Acyclicity and evaluation). If the communication graph of P is acyclic and P −→ P′, then the graph of P′ is also acyclic.

Proof. With respect to communication graphs, we observe that all evaluation steps amount to doing some combination of the following:
1. the creation of a new vertex and a new edge connecting it to one existing vertex;
2. the deletion of a single edge;
3. the deletion of a single unconnected vertex;
4. and transferring the endpoint of an edge from one vertex to another by sending it across some other edge.

EP-GO involves one use of (1) along with uses of (4) corresponding to the number of channel endpoints in the argument to go_ρ. EP-APPSINK can similarly be seen as a repetition of (4), while EP-CLOSE and EP-DONE exactly correspond, respectively, to (2) and (3). All other evaluation rules do not impact the communication graph.

Only (4) can conceivably create a cycle. If a cycle is created, the final step in its creation must be the connection of some atomic processes e1 and e2. But this can only be facilitated by some e3 that is already connected to both e1 and e2, in which case a cycle would already exist! Acyclic graphs can thus never become cyclic through application of these graph operations.

We can now tackle preservation and progress; our statements of both lemmas reflect the idea that both process typing and process evaluation are performed modulo the process equivalence relation.

Lemma 7 (Process preservation). If Π ⊢ P1 : τ and there exist some P1′ and P2′ such that P1 ≡ P1′ and P1′ −→ P2′, then there exists some P2 such that P2 ≡ P2′ and Π ⊢ P2 : τ.

Proof. Mostly straightforward, given the obvious extensions of Lemma 2 to evaluation contexts and processes. The difficulty comes from the requirement of the channel context splitting relation d̂ that at most one a:ρ binder be split at each step. We must show that, given the canonical form of P2′, we can always rearrange the parallel compositions such that this is the case. Observe, however, that we can always do this if the communication graph of P2′ (and thus its canonical form) is acyclic: we have our parallel compositions cut at most one edge at a time, and we will eventually reduce down to atomic processes. From Lemma 5 we already know that the communication graph of P1 and hence also P1′ is acyclic, and thus from Lemma 6 we can conclude that the graph of P2′ is acyclic as well. From this we can appropriately rearrange its canonical form to create a well-typed P2.

For progress we must first define what it means for a process to be done evaluating. We use one of the simplest such definitions: a process has finished when it contains an atomic process that is a value and that is not a, a̅, or cab. Our proofs make use of the standard canonical forms properties: all expressions of a given type eventually reduce to certain forms. Some types have more canonical forms than usual, as sources and sinks are both values.

Lemma 8 (Progress). If Π ⊢ P : τ, then either P has finished or there exist some P1 and P2 such that P ≡ P1 and P1 −→ P2.

Proof. We proceed by examining each of the atomic processes within P. If, in doing so, we find an appropriate value or the opportunity to take a step, then we are done, but we may encounter an expression e stuck at the elimination of a sink or a yield on a source. In that case, we consider the atomic process e′ that contains the other endpoint of the channel in question. If e′ itself can take a step, we are done. If e′ is ready to communicate with e we stop searching, as we have found a matched source and sink. Otherwise, e′ itself is stuck at the elimination of a sink or a yield on a source for some different channel, in which case we recursively continue our search using the same procedure. Because P is well typed, it has an acyclic communication graph, so this search will eventually terminate in the identification of some matching source and sink that are ready to communicate. We then consider the canonical form of P and repeatedly push the appropriate channel binding inwards until the process matches the form of one of our communication rules.

From progress and preservation, we can state the standard soundness theorem:⁷

Theorem 9 (Soundness). If · ⊢ P : τ, then there exists no P1 such that P ≡ P1, P1 −→* P2, and P2 has not completed but is not equivalent to any process that can step further.

This soundness property guarantees freedom from deadlocks in Lolliproc, but our type system says nothing about whether an expression will evaluate to a single value or a composition of processes—both are considered acceptable final outcomes, and there is nothing preventing the programmer from, for instance, failing to match each call to future with a corresponding call to wait. These concerns can be addressed in a language that also includes unrestricted types, however, which we will discuss in Section 5.

⁶ One might imagine that the directed nature of communication in Lolliproc would suggest directed graphs, but undirected graphs both entail stronger acyclicity properties and simplify the proof of process preservation.
⁷ We are still working to extend our Coq proofs to preservation and progress on processes; complications arise due to the relatively informal nature, by Coq's standards, of our graph-based reasoning.

4.3 Strong normalization and confluence

Other properties common to simple, typed λ-calculi are strong normalization—the fact that all sequences of evaluations terminate—and confluence—the fact that all possible evaluations for a given term converge to the same final result. Although Lolliproc has a non-deterministic operational semantics, it still enjoys these properties.

Theorem 10 (Strong normalization). If Γ ⊢ P : τ, any reduction sequence P ≡ P1, P1 −→ P1′, P1′ ≡ P2, P2 −→ P2′, ... will eventually terminate in some Pn′ such that there exist no Pn+1 and P′n+1 for which Pn′ ≡ Pn+1 and Pn+1 −→ P′n+1.

Proof. Since everything in our language is linear, subterms are never duplicated; thus we can verify strong normalization by assigning non-negative weights w(P) to processes P and w(D) to derivations D of Γ; ∆ ⊢ e : τ—which we abbreviate as w(e)—and showing that these weights always decrease with evaluation.

We define w(νa:ρ. P) = 1 + w(P) and w(P1 | P2) = w(P1) + w(P2). For channel endpoints, we first define the length of a protocol type ℓ(ρ) as ℓ(⊥) = 1, ℓ(τ ⊸ ρ) = 1 + ℓ(ρ), and ℓ(ρ1 & ρ2) = 1 + max(ℓ(ρ1), ℓ(ρ2)). Whenever a̅ has type ρ, we define w(a̅) = ℓ(ρ); similarly, when a has type ρ̃, we define w(a) = 2 · ℓ(ρ) (as larger terms appear on the source side after communication). Since process communication always decreases the length of the protocol type, it will consequently decrease the weight of the composite process. We define w(go_ρ e) = 2 + 3 · ℓ(ρ) + w(e), ensuring that its evaluation also decreases in weight even as it spawns a new process.

The weights of most other expression forms are fairly straightforward; for instance, w(x) = w(()) = 0, w(λx:τ. e) = 1 + w(e), w((e1, e2)) = 1 + w(e1) + w(e2), and w(⟨e1, e2⟩) = 1 + max(w(e1), w(e2)). The cases for yield and application are tricky, though, since the rules E-YIELDOTHER and E-APPSOURCE appear to increase the size of terms. For yield, we define w(yield e) = 1 + w(e) whenever e is either (go_ρ e′) or any source; otherwise, given that e is assigned the type ⌈τ⌉, we define

w(yield e) = 1 + w(let (y, z) = yield (go_{τ⊸⊥} e) in z; y) = 5 + w(go_{τ⊸⊥} e) = 13 + w(e)

For applications, we must conservatively estimate how many times E-APPSOURCE might be applied. For this we first define the height of a type h(τ) such that h(τ ⊸ ⊥) = 1 + h(τ) and h(τ) = 0 otherwise. Assuming the derivation for e1 e2 gives e1 the type τ1 ⊸ τ2 and e2 the type τ1, we can define w(e1 e2) = 1 + 14 · h(τ1) + w(e1) + w(e2), since the height of τ1 determines the maximum number of yields that could ever be introduced.

With these definitions in place, it is clear by inspection of our evaluation rules that the weight of a process decreases with each evaluation step. Since weights are never negative, this assures us that evaluation always terminates.

With strong normalization, we can obtain confluence directly from local confluence (also known as the diamond property).

Theorem 11 (Local confluence). If Γ ⊢ P : τ, and we have that P ≡ P1, P ≡ P2, P1 −→ P1′, and P2 −→ P2′, then there exist some P3, P3′, P4, and P4′ such that P1′ ≡ P3, P3 −→ P3′, P2′ ≡ P4, P4 −→ P4′, and P3′ ≡ P4′.

Proof. Our expression evaluation rules are deterministic, and there is only one way to decompose an expression e into some E[e′] such that some expression or process evaluation rule applies—and only one such rule will ever apply. Our only source of nondeterminism, then, is the parallel composition of processes. We must thus show that the evaluation P1 −→ P1′ does not rule out subsequently applying the same steps that produced P2 −→ P2′, and vice-versa.

We observe that, in a well-typed process, potential evaluation steps can never interfere with each other. We have only two endpoints for each channel, so multiple acts of communication can never conflict, and since communication always involves values, it cannot conflict with some internal evaluation step on a non-value expression. And of course such internal steps cannot conflict with each other. It is thus easy to see that local confluence holds.

Strong normalization and confluence show that the concurrency available in Lolliproc is particularly well behaved. Strong normalization implies that there are no livelocks, while confluence implies a lack of race conditions, which could otherwise introduce irreconcilable nondeterminism.
5.
Future directions and related work
receive2τ1 ,τ2 ,τ : τ1 ( τ2 ( ((τ1 ( τ2 ( τ ) & (τ1 ( τ2 ( τ )) ( τ
Finally, we examine a few possible future directions of this work and look briefly at related systems. 5.1
A call to a receive function waits until a yield on one of its source arguments can succeed, then selects and applies the appropriate function from its additive product argument to handle that result and the other remaining sources. (We would, of course, want syntactic sugar for these functions.) This closely mimics the non-deterministic operations found in many concurrent languages—e.g., the join calculus [20, 21] and Erlang [3]—while still preserving our linearly typed channels. We would also likely want other constructs to handle cases for which receive is awkward: for instance, we might want non-deterministic analogs of map and fold for several sources of the same type.
Extending Lolliproc
Lolliproc is very far from being a full-fledged programming language. Many of the extensions needed to bridge this gap—compilation and runtime system, support for processes spread over the network, useful libraries, etc.—are beyond the scope of this paper, but several obvious extensions do warrant more discussion here.

Unrestricted types and polymorphism  Although we have defined Lolliproc such that all variables must be used exactly once, this is clearly an unrealistic simplification; unrestricted types must be accounted for somehow. In earlier work [31] we introduced an intuitionistic language System F◦, an extension of the fully polymorphic System F in which the distinction between the linear and the unrestricted is handled at the kind level: a kind ? categorizes unrestricted types, while a kind ◦ categorizes linear types. System F◦ features a subkinding relation in which ? ≤ ◦, implying that unrestricted types may safely be treated as though they were linear. We can extend this approach to encompass Lolliproc by introducing a protocol kind • such that • ≤ ◦. We could then replace our syntactic separation of ρ types with the appropriate kinding rules. For function types—which System F◦ writes as τ1 →κ τ2 rather than the τ1 ⊸ τ2 we use for Lolliproc—this gives us

    [K-Arr]   Γ ⊢ τ1 : κ1    Γ ⊢ τ2 : κ2    κ = • ⟹ κ2 = •
              ―――――――――――――――――――――――――――――――――――――――――
                           Γ ⊢ τ1 →κ τ2 : κ

Here Γ is an unrestricted context, binding both type variables and, although not relevant to this judgment, unrestricted term variables. Since such a system allows quantification over type variables α of kind •, we would also require dualized type variables α̃, instantiated to ρ̃ whenever α is instantiated with ρ. If we also allow ∀α:κ. ρ to be a protocol type—thus permitting types to be sent between processes—we gain even greater flexibility, allowing partially specified protocols dependent on protocol type variables. Adopting the techniques of System F◦ also allows us to address the concerns mentioned at the end of Section 4.2: we would know that, if e is a well-typed expression of type τ that does not contain any channel endpoints, e will eventually step to some isolated value v, regardless of how many processes may be spawned along the way. Here we appeal to an alternate operational semantics for System F◦ that tags values and types as they are substituted into expressions: this semantics guarantees that unrestricted values do not contain tagged linear objects, and, since channel endpoints do not appear in source programs, they would always appear tagged.

Recursion and non-determinism  We have proved in Section 4.3 that Lolliproc is both strongly normalizing and confluent. However, one does not generally want to program in languages that rule out non-terminating programs, and in a concurrent setting it is common to want programs that might evaluate differently depending on which processes are available to communicate at which times, thus breaking confluence. One natural companion to Lolliproc's existing constructs is recursive types µα[:κ].τ, where any αs appearing within τ expand to µα[:κ].τ. Such types allow for full general recursion, can be used to encode many standard datatypes (e.g., lists over a given type), and, in our setting, enable looping protocols, for which there are many obvious applications. For instance, we could write a session-serving server with the type µα[:•].(ρ ⊗ α), which could be used to send out any number of sessions for the protocol ρ. For controlled non-confluence, we can imagine a family of primitive functions like the one below:

    receive2 τ1,τ2,τ : τ1 ⊸ τ2 ⊸ ((τ1 ⊸ τ2 ⊸ τ) & (τ1 ⊸ τ2 ⊸ τ)) ⊸ τ

A call to a receive function waits until a yield on one of its source arguments can succeed, then selects and applies the appropriate function from its additive product argument to handle that result and the other remaining sources. (We would, of course, want syntactic sugar for these functions.) This closely mimics the non-deterministic operations found in many concurrent languages—e.g., the join calculus [20, 21] and Erlang [3]—while still preserving our linearly typed channels. We would also likely want other constructs to handle cases for which receive is awkward: for instance, we might want non-deterministic analogs of map and fold for several sources of the same type.

Proof theory  The expression typing rules in Figure 7, when viewed as a logic, are clearly sound with respect to standard classical linear logic. To see why, note that we may consider only the case where Π is empty, as channels do not occur in source programs. Our only nonstandard rules are then T-Go and T-Yield, but these are both admissible in standard linear logic. We leave establishing completeness—with respect to the non-exponential fragment of standard linear logic—to future work. It would also be interesting to study the relationship between our evaluation rules and proof normalization—there seems to be a strong connection between our definition of channel endpoints and "focused" proofs [46].

5.2 Related work

There is a vast literature on linear logic, its proof theory, and related type systems, ranging from applications to categorical semantics—we cannot possibly cover it all here. Thus we highlight the work most closely related to ours and suggest some connections that might be fruitful for subsequent research.

Intuitionistic linear types  The intuitionistic fragment of linear logic has seen much use in programming languages [6, 9, 11, 29, 30]—particularly its connections to memory management [2, 13, 39, 42]. We recently looked at enforcing user-defined protocols in a linear variant of System F [31]. De Paiva and Ritter [14] study an intuitionistic linear language that, like Lolliproc, is not directly involutive (i.e., τ is not identified with τ⊥⊥); its operational semantics is reminiscent of the classical calculi described below.

Classical natural deduction and control  Natural deduction presentations of classical logics [10, 14, 35–37] typically use multiple-conclusion judgments of the form

    x1:τ1, . . . , xn:τn ⊢ e : τ, yn+1:τn+1, . . . , ym:τm

By duality, such a judgment is logically equivalent to

    x1:τ1, . . . , xn:τn, yn+1:τn+1⊥, . . . , ym:τm⊥ ⊢ e : τ
This approach recovers the usual shape of the typing judgment and so can be reconciled more easily with type systems for functional programming. Moreover, if we recall that τ ⊸ ⊥ is the type of a continuation accepting τ values, it is possible to read the ys above as binding continuations. Operational semantics in this setting implement commuting conversions, which give rise to nondeterminism. The correspondence with concurrency is obscured, however, because these semantics rely on decomposing a single term (often using evaluation contexts).

The connection between classical logic and control operators has been known for some time [17, 23, 34]. As mentioned in Section 2, control has the type of double-negation elimination; the more familiar callcc can similarly be given the type of Peirce's Law. While these operations cannot be directly imported to the linear setting, they are a major part of the inspiration for our approach. Linear continuations in otherwise unrestricted languages have also been studied, as they indicate certain predictable patterns of control flow [8, 19]. Berdine's dissertation [7], for example, shows how such continuations can be used to implement coroutines; Lolliproc goes further by allowing true concurrent evaluation.

Our process typing rules can be seen as an alternative to the multiple-conclusion judgment style described above. While these systems give all auxiliary conclusions a continuation type τ ⊸ ⊥, our helper processes simply have type ⊥. A practical consequence of our design is that, since processes appear only at runtime, a type checker for a language based on Lolliproc would not need to implement these rules at all.
Process calculi  Many type systems exist for the π-calculus [32], some able to guarantee sophisticated properties; Kobayashi [27] gives a good overview of this area. Many of these type systems use linearity in one form or another [4, 28], and, in particular, session types [12, 25, 38, 40] originated in this setting. The Sing# language, which ensures safety for its lightweight processes through its type system, takes many ideas from the world of process calculi [15]. Programming in a process calculus, however, is also rather different from programming in a traditional functional language, and it is not always clear how best to take ideas from that setting while reusing as much standard machinery as possible. Additionally, π-calculus type systems are not as tightly coupled with logics as λ-calculus type systems are, though there has been some work on using π-calculus terms to describe proof reductions [5].

Linear sequent calculi  In order to take advantage of the symmetries discussed in Section 2, languages and proof terms based on linear sequent calculi [1, 22] feature a multiplicative disjunction ⅋ and define τ1 ⊸ τ2 as τ1⊥ ⅋ τ2. It has proved difficult, however, to find intuitions for ⅋ in a standard functional programming setting that fit as naturally as those for ⊗, &, and ⊕ [44]. We can encode ⅋ in Lolliproc by noting the following logical equivalence:

    τ1 ⅋ τ2  ⟺  ((τ1 ⊸ ⊥) ⊸ τ2) & ((τ2 ⊸ ⊥) ⊸ τ1)

We will not be able to construct an object of this type unless we can eliminate some τ ⊸ ⊥ without producing a witness of type ⊥, which requires the existence of another process and a channel over which we can send the closed channel token. Thus ⅋ serves as a way of internalizing—and at least partially suspending—two processes within one, although it cannot exist in isolation. The choice of projections offered by & internalizes the commutativity of the '|' constructor of process terms.

Zeilberger presented an interesting sequent calculus [46] that, while not actually linear, makes use of the connectives of linear logic for their polarity and gives a term assignment in which eager positive connectives and lazy negative connectives coexist harmoniously. The dual calculus [43] and Filinski's language [18] are also tightly tied to sequent calculus while being closer to standard term languages than, e.g., proof nets. All of these languages define programs as interactions between terms and co-terms, departing rather significantly from the norm in functional programming.
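Returning to the ⅋ encoding above, here is a minimal (and deliberately non-linear) Haskell sketch of its continuation-style reading. Haskell's type system does not enforce linearity or the use-exactly-one discipline of &, so the type synonyms below—our names, not Lolliproc's—are only an illustration.

    import Data.Void (Void)

    type Not a  = a -> Void         -- reading τ ⊸ ⊥ as a continuation type
    type With a b = (a, b)          -- additive &: the consumer uses exactly one
                                    -- side (a discipline Haskell does not enforce)
    type Par a b = With (Not a -> b) (Not b -> a)

    -- The symmetry of the encoding internalizes commutativity of ⅋:
    swapPar :: Par a b -> Par b a
    swapPar (f, g) = (g, f)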
5.3 Conclusion

We have presented Lolliproc, a concurrent language whose design separates source programs from the processes they spawn at runtime, while retaining a close correspondence to classical linear logic. Though simple, Lolliproc can express useful protocols whose well-behaved interactions are enforced by session types. It is our hope that Lolliproc will inspire language designers, if not to build their next language on its ideas, then at least to consider what linear types might have to offer in terms of concurrency. Whether or not this comes to pass, however, we feel that our approach offers an appealing point in the design space of concurrent calculi.

Acknowledgments  The authors thank the anonymous reviewers, the Penn PL Club, and the MSR Cambridge types wrestling group for their feedback about this work. Phil Wadler and Guillaume Munch-Maccagnoni also provided excellent suggestions about how to improve this paper. This work was supported in part by NSF Grant CCF-541040, and some of this research was conducted while the second author was a visiting researcher at Microsoft Research, Cambridge.

References

[1] Samson Abramsky. Computational interpretations of linear logic. Theoretical Computer Science, 111:3–57, 1993.
[2] Amal Ahmed, Matthew Fluet, and Greg Morrisett. L3: A linear language with locations. Fundamenta Informaticae, 77(4):397–449, 2007.
[3] Joe Armstrong, Robert Virding, Claes Wikström, and Mike Williams. Concurrent Programming in Erlang. Prentice-Hall, 1996.
[4] Emmanuel Beffara. A concurrent model for linear logic. Electronic Notes in Theoretical Computer Science, 155:147–168, 2006.
[5] G. Bellin and P. J. Scott. On the π-calculus and linear logic. Theoretical Computer Science, 135(1):11–65, 1994.
[6] Nick Benton, G. M. Bierman, J. Martin E. Hyland, and Valeria de Paiva. A term calculus for intuitionistic linear logic. In Proceedings of the International Conference on Typed Lambda Calculi and Applications, pages 75–90. Springer-Verlag LNCS 664, 1993.
[7] Josh Berdine. Linear and Affine Typing of Continuation-Passing Style. PhD thesis, Queen Mary, University of London, 2004.
[8] Josh Berdine, Peter W. O'Hearn, Uday S. Reddy, and Hayo Thielecke. Linearly used continuations. In Proceedings of the Continuations Workshop, 2001.
[9] G. M. Bierman, A. M. Pitts, and C. V. Russo. Operational properties of Lily, a polymorphic linear lambda calculus with recursion. In Fourth International Workshop on Higher Order Operational Techniques in Semantics, Montréal, volume 41 of Electronic Notes in Theoretical Computer Science. Elsevier, 2000.
[10] Gavin Bierman. A classical linear lambda calculus. Theoretical Computer Science, 227(1–2):43–78, 1999.
[11] Gavin M. Bierman. Program equivalence in a linear functional language. Journal of Functional Programming, 10(2), 2000.
[12] Luís Caires and Frank Pfenning. Session types as intuitionistic linear propositions. In Proceedings of the 21st International Conference on Concurrency Theory (CONCUR 2010), Paris, France, August 2010. Springer LNCS.
[13] Arthur Charguéraud and François Pottier. Functional translation of a calculus of capabilities. In ICFP '08: Proceedings of the 13th ACM SIGPLAN International Conference on Functional Programming, pages 213–224, New York, NY, USA, 2008. ACM.
[14] Valeria de Paiva and Eike Ritter. A Parigot-style linear lambda-calculus for full intuitionistic linear logic. Theory and Applications of Categories, 17(3), 2006.
[15] Manuel Fähndrich, Mark Aiken, Chris Hawblitzel, Orion Hodson, Galen Hunt, James R. Larus, and Steven Levi. Language support for fast and reliable message-based communication in Singularity OS. SIGOPS Operating Systems Review, 40(4):177–190, 2006.
[16] Manuel Fähndrich and Robert DeLine. Adoption and focus: Practical linear types for imperative programming. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, pages 13–24, Berlin, Germany, June 2002.
[17] M. Felleisen and R. Hieb. A revised report on the syntactic theories of sequential control and state. Theoretical Computer Science, 103(2):235–271, 1992.
[18] Andrzej Filinski. Declarative continuations and categorical duality. Master's thesis, University of Copenhagen, August 1989.
[19] Andrzej Filinski. Linear continuations. In Proc. 19th ACM Symp. on Principles of Programming Languages (POPL), pages 27–38, 1992.
[20] C. Fournet and G. Gonthier. The reflexive CHAM and the join-calculus. In Proc. ACM Symp. on Principles of Programming Languages (POPL), pages 372–385, 1996.
[21] Cédric Fournet. The Join-Calculus: a Calculus for Distributed Mobile Programming. PhD thesis, École Polytechnique, November 1998.
[22] Jean-Yves Girard. Linear logic. Theoretical Computer Science, 50:1–102, 1987.
[23] Timothy G. Griffin. A formulae-as-types notion of control. In Conference Record of the Seventeenth Annual ACM Symposium on Principles of Programming Languages, pages 47–58. ACM Press, 1990.
[24] Michael Hicks, Greg Morrisett, Dan Grossman, and Trevor Jim. Experience with safe manual memory-management in Cyclone. In ISMM '04: Proceedings of the 4th International Symposium on Memory Management, pages 73–84, New York, NY, USA, 2004. ACM.
[25] Kohei Honda, Vasco T. Vasconcelos, and Makoto Kubo. Language primitives and type discipline for structured communication-based programming. In ESOP '98, volume 1381 of LNCS, pages 122–138. Springer-Verlag, 1998.
[26] W. A. Howard. The formulae-as-types notion of construction. In To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus, and Formalism. Academic Press, 1980.
[27] Naoki Kobayashi. Type systems for concurrent programs. In Proceedings of UNU/IIST 10th Anniversary Colloquium, March 2002.
[28] Naoki Kobayashi, Benjamin C. Pierce, and David N. Turner. Linearity and the pi-calculus. Transactions on Programming Languages and Systems, 21(5):914–947, 1999.
[29] Yves Lafont. The linear abstract machine. Theoretical Computer Science, 59:157–180, 1988. Corrections in vol. 62, pp. 327–328.
[30] John Maraist, Martin Odersky, David N. Turner, and Philip Wadler. Call-by-name, call-by-value, call-by-need, and the linear lambda calculus. In 11th International Conference on the Mathematical Foundations of Programming Semantics, New Orleans, Louisiana, 1995.
[31] Karl Mazurak, Jianzhou Zhao, and Steve Zdancewic. Lightweight linear types in System F◦. In TLDI '10: Proceedings of the 5th ACM SIGPLAN Workshop on Types in Language Design and Implementation, pages 77–88, New York, NY, USA, 2010. ACM.
[32] R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes. Information and Computation, 100(1):1–77, 1992.
[33] J. Niehren, J. Schwinghammer, and G. Smolka. A concurrent lambda calculus with futures. Theoretical Computer Science, 364(3):338–356, 2006.
[34] C.-H. L. Ong and C. A. Stewart. A Curry-Howard foundation for functional computation with control. In Proc. 24th ACM Symp. on Principles of Programming Languages (POPL), pages 215–227, Paris, France, 1997.
[35] Michel Parigot. λµ-calculus: An algorithmic interpretation of classical natural deduction. In Proceedings of the International Conference on Logic Programming and Automated Reasoning, volume 624 of Lecture Notes in Computer Science, pages 190–201. Springer, 1992.
[36] Michel Parigot. Classical proofs as programs. In Proceedings of the 3rd Kurt Gödel Colloquium, volume 713 of Lecture Notes in Computer Science, pages 263–276. Springer-Verlag, 1993.
[37] Eike Ritter, David J. Pym, and Lincoln A. Wallen. Proof-terms for classical and intuitionistic resolution. Journal of Logic and Computation, 10(2):173–207, 2000.
[38] Kaku Takeuchi, Kohei Honda, and Makoto Kubo. An interaction-based language and its typing system. In Proceedings of PARLE '94, volume 817 of Lecture Notes in Computer Science, pages 398–413. Springer-Verlag, 1994.
[39] David N. Turner and Philip Wadler. Operational interpretations of linear logic. Theoretical Computer Science, 227(1–2):231–248, September 1999.
[40] Vasco T. Vasconcelos, Simon J. Gay, and António Ravara. Type checking a multithreaded functional language with session types. Theoretical Computer Science, 368(1–2):64–87, 2006.
[41] Edsko de Vries, Rinus Plasmeijer, and David M. Abrahamson. Uniqueness typing simplified. In Implementation and Application of Functional Languages: 19th International Workshop, IFL 2007, Freiburg, Germany, September 27–29, 2007, Revised Selected Papers, pages 201–218, Berlin, Heidelberg, 2008. Springer-Verlag.
[42] Philip Wadler. Linear types can change the world! In M. Broy and C. Jones, editors, Programming Concepts and Methods, Sea of Galilee, Israel, April 1990. North Holland. IFIP TC 2 Working Conference.
[43] Philip Wadler. Call-by-value is dual to call-by-name. In ICFP '03: Proceedings of the Eighth ACM SIGPLAN International Conference on Functional Programming, pages 189–201, New York, NY, USA, 2003. ACM.
[44] Philip Wadler. Down with the bureaucracy of syntax! Pattern matching for classical linear logic. Unpublished manuscript, 2004.
[45] Nobuko Yoshida, Kohei Honda, and Martin Berger. Linearity and bisimulation. Journal of Logic and Algebraic Programming, 72(2):207–238, 2007.
[46] Noam Zeilberger. On the unity of duality. Annals of Pure and Applied Logic, 153(1–3):66–96, 2006.
[47] Dengping Zhu and Hongwei Xi. Safe programming with pointers through stateful views. In Proceedings of the 7th International Symposium on Practical Aspects of Declarative Languages, pages 83–97, Long Beach, CA, January 2005. Springer-Verlag LNCS vol. 3350.
Abstracting Abstract Machines

David Van Horn∗ (Northeastern University, [email protected])
Matthew Might (University of Utah, [email protected])
Abstract
We describe a derivational approach to abstract interpretation that yields novel and transparently sound static analyses when applied to well-established abstract machines. To demonstrate the technique and support our claim, we transform the CEK machine of Felleisen and Friedman, a lazy variant of Krivine's machine, and the stack-inspecting CM machine of Clements and Felleisen into abstract interpretations of themselves. The resulting analyses bound temporal ordering of program events; predict return-flow and stack-inspection behavior; and approximate the flow and evaluation of by-need parameters. For all of these machines, we find that a series of well-known concrete machine refactorings, plus a technique we call store-allocated continuations, leads to machines that abstract into static analyses simply by bounding their stores. We demonstrate that the technique scales up uniformly to allow static analysis of realistic language features, including tail calls, conditionals, side effects, exceptions, first-class continuations, and even garbage collection.

Categories and Subject Descriptors  F.3.2 [Logics and Meanings of Programs]: Semantics of Programming Languages—Program analysis, Operational semantics; F.4.1 [Mathematical Logic and Formal Languages]: Mathematical Logic—Lambda calculus and related systems

General Terms  Languages, Theory

Keywords  abstract machines, abstract interpretation

1. Introduction

Abstract machines such as the CEK machine and Krivine's machine are first-order state transition systems that represent the core of a real language implementation. Semantics-based program analysis, on the other hand, is concerned with safely approximating intensional properties of such a machine as it runs a program. It seems natural then to want to systematically derive analyses from machines to approximate the core of realistic run-time systems. Our goal is to develop a technique that enables direct abstract interpretations of abstract machines by methods for transforming a given machine description into another that computes its finite approximation.

We demonstrate that the technique of refactoring a machine with store-allocated continuations allows a direct structural abstraction¹ by bounding the machine's store. Thus, we are able to convert semantic techniques used to model language features into static analysis techniques for reasoning about the behavior of those very same features. By abstracting well-known machines, our technique delivers static analyzers that can reason about by-need evaluation, higher-order functions, tail calls, side effects, stack structure, exceptions and first-class continuations.

The basic idea behind store-allocated continuations is not new. SML/NJ has allocated continuations in the heap for well over a decade [28]. At first glance, modeling the program stack in an abstract machine with store-allocated continuations would not seem to provide any real benefit. Indeed, for the purpose of defining the meaning of a program, there is no benefit, because the meaning of the program does not depend on the stack-implementation strategy. Yet, a closer inspection finds that store-allocating continuations eliminates recursion from the definition of the state-space of the machine. With no recursive structure in the state-space, an abstract machine becomes eligible for conversion into an abstract interpreter through a simple structural abstraction. To demonstrate the applicability of the approach, we derive abstract interpreters of:

• a call-by-value λ-calculus with state and control based on the CESK machine of Felleisen and Friedman [13],
• a call-by-need λ-calculus based on a tail-recursive, lazy variant of Krivine's machine derived by Ager, Danvy and Midtgaard [1], and
• a call-by-value λ-calculus with stack inspection based on the CM machine of Clements and Felleisen [3];

and use abstract garbage collection to improve precision [25].

Overview  In Section 2, we begin with the CEK machine and attempt a structural abstract interpretation, but find ourselves blocked by two recursive structures in the machine: environments and continuations. We make three refactorings to:

1. store-allocate bindings,
2. store-allocate continuations, and
3. time-stamp machine states;

resulting in the CESK, CESK⋆, and time-stamped CESK⋆ machines, respectively. The time-stamps encode the history (context) of the machine's execution and facilitate context-sensitive abstractions. We then demonstrate that the time-stamped machine abstracts directly into a parameterized, sound and computable static analysis.

∗ Supported by the National Science Foundation under grant 0937060 to the Computing Research Association for the CIFellow Project.
¹ A structural abstraction distributes component-, point-, and member-wise.
In Section 3, we replay this process (slightly abbreviated) with a lazy variant of Krivine's machine to arrive at a static analysis of by-need programs. In Section 4, we incorporate conditionals, side effects, exceptions, first-class continuations, and garbage collection. In Section 6, we abstract the CM (continuation-marks) machine to produce an abstract interpretation of stack inspection. In Section 7, we widen the abstract interpretations with a single-threaded "global" store to accelerate convergence. For some of our analyzers, this widening results in polynomial-time algorithms and connects them back to known analyses.
2. From CEK to the abstract CESK⋆

In this section, we start with a traditional machine for a programming language based on the call-by-value λ-calculus, and gradually derive an abstract interpretation of this machine. The outline followed in this section covers the basic steps for systematically deriving abstract interpreters that we follow throughout the rest of the paper. To begin, consider the following language of expressions:²

    e ∈ Exp ::= x | (e e) | (λx.e)
    x ∈ Var    a set of identifiers

² Fine print on syntax: As is often the case in program analysis where semantic values are approximated using syntactic phrases of the program under analysis, we would like to be able to distinguish different syntactic occurrences of otherwise identical expressions within a program. Informally, this means we want to track the source location of expressions. Formally, this is achieved by labeling expressions and assuming all labels within a program are distinct: e ∈ Exp ::= xℓ | (e e)ℓ | (λx.e)ℓ, where ℓ ∈ Lab, an infinite set of labels. However, we judiciously omit labels whenever they are irrelevant and doing so improves the clarity of the presentation. Consequently, they appear only in Sections 2.7 and 7, which are concerned with k-CFA.

A standard machine for evaluating this language is the CEK machine of Felleisen and Friedman [12], and it is from this machine we derive the abstract semantics—a computable approximation of the machine's behavior. Most of the steps in this derivation correspond to well-known machine transformations and real-world implementation techniques—and most of these steps are concerned only with the concrete machine; a very simple abstraction is employed only at the very end.

The remainder of this section is outlined as follows: we present the CEK machine, to which we add a store, and use it to allocate variable bindings. This machine is just the CESK machine of Felleisen and Friedman [13]. From here, we further exploit the store to allocate continuations, which corresponds to a well-known implementation technique used in functional language compilers [28]. We then abstract only the store to obtain a framework for the sound, computable analysis of programs.

A standard approach to evaluating programs is to rely on a Curry-Feys-style Standardization Theorem, which says roughly: if an expression e reduces to e′ in, e.g., the call-by-value λ-calculus, then e reduces to e′ in a canonical manner. This canonical manner thus determines a state machine for evaluating programs: a standard reduction machine. To define such a machine for our language, we define a grammar of evaluation contexts and notions of reduction (e.g., βv). An evaluation context is an expression with a "hole" in it. For left-to-right evaluation order, we define evaluation contexts E as:

    E ::= [ ] | (E e) | (v E)

An expression is either a value or uniquely decomposable into an evaluation context and redex. The standard reduction machine is:

    E[e] ⟼βv E[e′], if e βv e′

However, this machine does not shed much light on a realistic implementation. At each step, the machine traverses the entire source of the program looking for a redex. When found, the redex is reduced and the contractum is plugged back in the hole, then the process is repeated. Abstract machines such as the CEK machine, which are derivable from standard reduction machines, offer an extensionally equivalent but more realistic model of evaluation that is amenable to efficient implementation.

2.1 The CEK machine

The CEK is environment-based; it uses environments and closures to model substitution. It represents evaluation contexts as continuations, an inductive data structure that models contexts in an inside-out manner. The key idea of machines such as the CEK is that the whole program need not be traversed to find the next redex; consequently the machine integrates the process of plugging a contractum into a context and finding the next redex.

States of the CEK machine [12] consist of a control string (an expression), an environment that closes the control string, and a continuation:

    ς ∈ Σ    = Exp × Env × Kont
    v ∈ Val  ::= (λx.e)
    ρ ∈ Env  = Var →fin Val × Env
    κ ∈ Kont ::= mt | ar(e, ρ, κ) | fn(v, ρ, κ)

States are identified up to consistent renaming of bound variables. Environments are finite maps from variables to closures. Environment extension is written ρ[x ↦ (v, ρ′)]. Evaluation contexts E are represented (inside-out) by continuations as follows: [ ] is represented by mt; E[([ ] e)] is represented by ar(e′, ρ, κ) where ρ closes e′ to represent e and κ represents E; E[(v [ ])] is represented by fn(v′, ρ, κ) where ρ closes v′ to represent v and κ represents E.

The transition function for the CEK machine is defined in Figure 1 (we follow the textbook treatment of the CEK machine [11, page 102]). The initial machine state for a closed expression e is given by the inj function:

    inj_CEK(e) = ⟨e, ∅, mt⟩

    ς ⟼CEK ς′
    ⟨x, ρ, κ⟩                   ⟼ ⟨v, ρ′, κ⟩ where ρ(x) = (v, ρ′)
    ⟨(e0 e1), ρ, κ⟩             ⟼ ⟨e0, ρ, ar(e1, ρ, κ)⟩
    ⟨v, ρ, ar(e, ρ′, κ)⟩        ⟼ ⟨e, ρ′, fn(v, ρ, κ)⟩
    ⟨v, ρ, fn((λx.e), ρ′, κ)⟩   ⟼ ⟨e, ρ′[x ↦ (v, ρ)], κ⟩

    Figure 1. The CEK machine.

Typically, an evaluation function is defined as a partial function from closed expressions to answers:

    eval′_CEK(e) = (v, ρ) if inj(e) ⟼*CEK ⟨v, ρ, mt⟩

This gives an extensional view of the machine, which is useful, e.g., to prove correctness with respect to a canonical evaluation function such as one defined by standard reduction or compositional valuation. However, for the purposes of program analysis, we are concerned more with the intensional aspects of the machine. As such, we define the meaning of a program as the (possibly infinite) set of reachable machine states:

    eval_CEK(e) = {ς | inj(e) ⟼*CEK ς}
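For readers who prefer executable artifacts, the following is a small Haskell transcription of the CEK machine of Figure 1. The data declarations and function names are ours; stuck states are modeled with Maybe rather than left undefined, and run may diverge on non-terminating programs, exactly mirroring the concrete semantics.

    import qualified Data.Map as Map

    type Var = String
    data Exp = Ref Var | App Exp Exp | Lam Var Exp
      deriving Show
    newtype Env = Env (Map.Map Var (Exp, Env))   -- closures pair terms with envs
    data Kont = Mt | Ar Exp Env Kont | Fn Exp Env Kont
    type State = (Exp, Env, Kont)

    inject :: Exp -> State
    inject e = (e, Env Map.empty, Mt)

    -- One CEK transition; Nothing means the machine is stuck (or finished).
    step :: State -> Maybe State
    step (Ref x, Env r, k) = do
      (v, r') <- Map.lookup x r
      Just (v, r', k)
    step (App e0 e1, r, k)           = Just (e0, r, Ar e1 r k)
    step (v@(Lam _ _), r, Ar e r' k) = Just (e, r', Fn v r k)
    step (v@(Lam _ _), r, Fn (Lam x e) (Env r') k) =
      Just (e, Env (Map.insert x (v, r) r'), k)
    step _ = Nothing

    run :: State -> State
    run s = maybe s run (step s)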
Deciding membership in the set of reachable machine states is not possible due to the halting problem. The goal of abstract interpretation, then, is to construct a function, âval_CEK, that is a sound and computable approximation to the eval_CEK function.

We can do this by constructing a machine that is similar in structure to the CEK machine: it is defined by an abstract state transition relation (⟼ĈEK) ⊆ Σ̂ × Σ̂, which operates over abstract states, Σ̂, which approximate the states of the CEK machine, and an abstraction map α : Σ → Σ̂ that maps concrete machine states into abstract machine states. The abstract evaluation function is then defined as:

    âval_CEK(e) = {ς̂ | α(inj(e)) ⟼*ĈEK ς̂}

1. We achieve decidability by constructing the approximation in such a way that the state-space of the abstracted machine is finite, which guarantees that for any closed expression e, the set âval(e) is finite.
2. We achieve soundness by demonstrating the abstracted machine transitions preserve the abstraction map, so that if ς ⟼ ς′ and α(ς) ⊑ ς̂, then there exists an abstract state ς̂′ such that ς̂ ⟼ ς̂′ and α(ς′) ⊑ ς̂′.

A first attempt at abstract interpretation:  A simple approach to abstracting the machine's state space is to apply a structural abstract interpretation, which lifts abstraction point-wise, element-wise, component-wise and member-wise across the structure of a machine state (i.e., expressions, environments, and continuations). The problem with the structural abstraction approach for the CEK machine is that both environments and continuations are recursive structures. As a result, the map α yields objects in an abstract state-space with recursive structure, implying the space is infinite. It is possible to perform abstract interpretation over an infinite state-space, but it requires a widening operator. A widening operator accelerates the ascent up the lattice of approximation and must guarantee convergence. It is difficult to imagine a widening operator, other than the one that jumps immediately to the top of the lattice, for these semantics.

Focusing on recursive structure as the source of the problem, a reasonable course of action is to add a level of indirection to the recursion—to force recursive structure to pass through explicitly allocated addresses. In doing so, we will unhinge recursion in a program's data structures and its control-flow from recursive structure in the state-space.

We turn our attention next to the CESK machine [10, 13], since the CESK machine eliminates recursion from one of the structures in the CEK machine: environments. In the subsequent section (Section 2.3), we will develop a CESK machine with a pointer refinement (CESK⋆) that eliminates the other source of recursive structure: continuations. At that point, the machine structurally abstracts via a single point of approximation: the store.

2.2 The CESK machine

The states of the CESK machine extend those of the CEK machine to include a store, which provides a level of indirection for variable bindings to pass through. The store is a finite map from addresses to storable values, and environments are changed to map variables to addresses. When a variable's value is looked up by the machine, it is now accomplished by using the environment to look up the variable's address, which is then used to look up the value. To bind a variable to a value, a fresh location in the store is allocated and mapped to the value; the environment is extended to map the variable to that address.

The state space for the CESK machine is defined as follows:

    ς ∈ Σ          = Exp × Env × Store × Kont
    ρ ∈ Env        = Var →fin Addr
    σ ∈ Store      = Addr →fin Storable
    s ∈ Storable   = Val × Env
    a, b, c ∈ Addr   an infinite set

States are identified up to consistent renaming of bound variables and addresses. The transition function for the CESK machine is defined in Figure 2 (we follow the textbook treatment of the CESK machine [11, page 166]).

    ς ⟼CESK ς′
    ⟨x, ρ, σ, κ⟩                   ⟼ ⟨v, ρ′, σ, κ⟩ where σ(ρ(x)) = (v, ρ′)
    ⟨(e0 e1), ρ, σ, κ⟩             ⟼ ⟨e0, ρ, σ, ar(e1, ρ, κ)⟩
    ⟨v, ρ, σ, ar(e, ρ′, κ)⟩        ⟼ ⟨e, ρ′, σ, fn(v, ρ, κ)⟩
    ⟨v, ρ, σ, fn((λx.e), ρ′, κ)⟩   ⟼ ⟨e, ρ′[x ↦ a], σ[a ↦ (v, ρ)], κ⟩ where a ∉ dom(σ)

    Figure 2. The CESK machine.

The initial state for a closed expression is given by the inj function, which combines the expression with the empty environment, store, and continuation:

    inj_CESK(e) = ⟨e, ∅, ∅, mt⟩

The eval_CESK evaluation function is defined following the template of the CEK evaluation given in Section 2.1:

    eval_CESK(e) = {ς | inj(e) ⟼*CESK ς}

Observe that for any closed expression, the CEK and CESK machines operate in lock-step: each machine transitions, by the corresponding rule, if and only if the other machine transitions.

Lemma 1 (Felleisen, [10]). eval_CESK(e) ≃ eval_CEK(e).

A second attempt at abstract interpretation:  With the CESK machine, half the problem with the attempted naïve abstract interpretation is solved: environments and closures are no longer mutually recursive. Unfortunately, continuations still have recursive structure. We could crudely abstract a continuation into a set of frames, losing all sense of order, but this would lead to a static analysis lacking faculties to reason about return-flow: every call would appear to return to every other call. A better solution is to refactor continuations as we did environments, redirecting the recursive structure through the store. In the next section, we explore a CESK machine with a pointer refinement for continuations.

2.3 The CESK⋆ machine

To untie the recursive structure associated with continuations, we shift to store-allocated continuations. The Kont component of the machine is replaced by a pointer to a continuation allocated in the store. We term the resulting machine the CESK⋆ (control, environment, store, continuation pointer) machine. Notice the store now maps to denotable values and continuations:

    ς ∈ Σ        = Exp × Env × Store × Addr
    s ∈ Storable = Val × Env + Kont
    κ ∈ Kont     ::= mt | ar(e, ρ, a) | fn(v, ρ, a)

The revised machine is defined in Figure 3 and the initial machine state is defined as:

    inj_CESK⋆(e) = ⟨e, ∅, [a0 ↦ mt], a0⟩
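A Haskell sketch of the pointer-refined step function shows how continuations come to live in the store alongside bindings (the CESK⋆ machine of Figure 3). The names are ours, and the allocator is one arbitrary way to pick an unused address under this machine's usage pattern (entries are never deleted).

    import qualified Data.Map as Map

    type Var  = String
    type Addr = Int
    data Exp  = Ref Var | App Exp Exp | Lam Var Exp
    type Env  = Map.Map Var Addr
    data Kont = Mt | Ar Exp Env Addr | Fn Exp Env Addr
    data Storable = Clo Exp Env | K Kont   -- values and continuations share the store
    type Store = Map.Map Addr Storable
    type State = (Exp, Env, Store, Addr)   -- the continuation component is a pointer

    allocA :: Store -> Addr
    allocA s = Map.size s                  -- fresh here, since entries are never removed

    inject :: Exp -> State
    inject e = (e, Map.empty, Map.fromList [(0, K Mt)], 0)

    step :: State -> Maybe State
    step (Ref x, r, s, a) = do
      addr     <- Map.lookup x r
      Clo v r' <- Map.lookup addr s        -- a failed match in Maybe means stuck
      Just (v, r', s, a)
    step (App e0 e1, r, s, a) =
      let b = allocA s
      in Just (e0, r, Map.insert b (K (Ar e1 r a)) s, b)
    step (v@(Lam _ _), r, s, a) = do
      K k <- Map.lookup a s
      case k of
        Ar e r' c         -> let b = allocA s
                             in Just (e, r', Map.insert b (K (Fn v r c)) s, b)
        Fn (Lam x e) r' c -> let b = allocA s
                             in Just (e, Map.insert x b r', Map.insert b (Clo v r) s, c)
        _                 -> Nothing
    step _ = Nothing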
    ς ⟼CESK⋆ ς′, where κ = σ(a), b ∉ dom(σ)
    ⟨x, ρ, σ, a⟩                            ⟼ ⟨v, ρ′, σ, a⟩ where (v, ρ′) = σ(ρ(x))
    ⟨(e0 e1), ρ, σ, a⟩                      ⟼ ⟨e0, ρ, σ[b ↦ ar(e1, ρ, a)], b⟩
    ⟨v, ρ, σ, a⟩ if κ = ar(e, ρ′, c)        ⟼ ⟨e, ρ′, σ[b ↦ fn(v, ρ, c)], b⟩
    ⟨v, ρ, σ, a⟩ if κ = fn((λx.e), ρ′, c)   ⟼ ⟨e, ρ′[x ↦ b], σ[b ↦ (v, ρ)], c⟩

    Figure 3. The CESK⋆ machine.

The evaluation function (not shown) is defined along the same lines as those for the CEK (Section 2.1) and CESK (Section 2.2) machines. Like the CESK machine, it is easy to relate the CESK⋆ machine to its predecessor; from corresponding initial configurations, these machines operate in lock-step:

Lemma 2. eval_CESK⋆(e) ≃ eval_CESK(e).

Addresses, abstraction and allocation:  The CESK⋆ machine, as defined in Figure 3, nondeterministically chooses addresses when it allocates a location in the store, but because machines are identified up to consistent renaming of addresses, the transition system remains deterministic. Looking ahead, an easy way to bound the state-space of this machine is to bound the set of addresses.³ But once the store is finite, locations may need to be reused, and when multiple values are to reside in the same location, the store will have to soundly approximate this by joining the values.

³ A finite number of addresses leads to a finite number of environments, which leads to a finite number of closures and continuations, which in turn leads to a finite number of stores, and finally, a finite number of states.

In our concrete machine, all that matters about an allocation strategy is that it picks an unused address. In the abstracted machine, however, the strategy may have to re-use previously allocated addresses. The abstract allocation strategy is therefore crucial to the design of the analysis—it indicates when finite resources should be doled out and decides when information should deliberately be lost in the service of computing within bounded resources. In essence, the allocation strategy is the heart of an analysis (allocation strategies corresponding to well-known analyses are given in Section 2.7). For this reason, concrete allocation deserves a bit more attention in the machine.

An old idea in program analysis is that dynamically allocated storage can be represented by the state of the computation at allocation time [18, 22, Section 1.2.2]. That is, allocation strategies can be based on a (representation of) the machine history. These representations are often called time-stamps. A common choice for a time-stamp, popularized by Shivers [29], is to represent the history of the computation as contours, finite strings encoding the calling context. We present a concrete machine that uses this general time-stamp approach and is parameterized by a choice of tick and alloc functions. We then instantiate tick and alloc to obtain an abstract machine for computing a k-CFA-style analysis using the contour approach.

2.4 The time-stamped CESK⋆ machine

The machine states of the time-stamped CESK⋆ machine include a time component, which is intentionally left unspecified:

    t, u ∈ Time
    ς ∈ Σ = Exp × Env × Store × Addr × Time

The machine is parameterized by the functions:

    tick : Σ → Time        alloc : Σ → Addr

The tick function returns the next time; the alloc function allocates a fresh address for a binding or continuation. We require of tick and alloc that for all ς = ⟨_, _, σ, _, t⟩, t < tick(ς) and alloc(ς) ∉ dom(σ). The time-stamped CESK⋆ machine is defined in Figure 4. Note that occurrences of ς on the right-hand side of this definition are implicitly bound to the state occurring on the left-hand side. The initial machine state is defined as:

    inj_CESK⋆t(e) = ⟨e, ∅, [a0 ↦ mt], a0, t0⟩

    ς ⟼CESK⋆t ς′, where κ = σ(a), b = alloc(ς), u = tick(ς)
    ⟨x, ρ, σ, a, t⟩                            ⟼ ⟨v, ρ′, σ, a, u⟩ where (v, ρ′) = σ(ρ(x))
    ⟨(e0 e1), ρ, σ, a, t⟩                      ⟼ ⟨e0, ρ, σ[b ↦ ar(e1, ρ, a)], b, u⟩
    ⟨v, ρ, σ, a, t⟩ if κ = ar(e, ρ′, c)        ⟼ ⟨e, ρ′, σ[b ↦ fn(v, ρ, c)], b, u⟩
    ⟨v, ρ, σ, a, t⟩ if κ = fn((λx.e), ρ′, c)   ⟼ ⟨e, ρ′[x ↦ b], σ[b ↦ (v, ρ)], c, u⟩

    Figure 4. The time-stamped CESK⋆ machine.

Satisfying definitions for the parameters are:

    Time = Addr = ℤ
    a0 = t0 = 0
    tick⟨_, _, _, _, t⟩ = t + 1
    alloc⟨_, _, _, _, t⟩ = t

Under these definitions, the time-stamped CESK⋆ machine operates in lock-step with the CESK⋆ machine, and therefore with the CESK and CEK machines as well.

Lemma 3. eval_CESK⋆t(e) ≃ eval_CESK⋆(e).

The time-stamped CESK⋆ machine forms the basis of our abstracted machine in the following section.

2.5 The abstract time-stamped CESK⋆ machine

As alluded to earlier, with the time-stamped CESK⋆ machine, we now have a machine ready for direct abstract interpretation via a single point of approximation: the store. Our goal is a machine that resembles the time-stamped CESK⋆ machine, but operates over a finite state-space, and it is allowed to be nondeterministic. Once the state-space is finite, the transitive closure of the transition relation becomes computable, and this transitive closure constitutes a static analysis. Buried in a path through the transitive closure is a (possibly infinite) traversal that corresponds to the concrete execution of the program.

The abstracted variant of the time-stamped CESK⋆ machine comes from bounding the address space of the store and the number of times available. By bounding these sets, the state-space becomes finite,⁴ but for the purposes of soundness, an entry in the store may be forced to hold several values simultaneously:

    σ̂ ∈ Ŝtore = Addr →fin P(Storable)

⁴ Syntactic sets like Exp are infinite, but finite for any given program.

Hence, stores now map an address to a set of storable values rather than a single value. These collections of values model approximation in the analysis. If a location in the store is re-used, the new value is joined with the current set of values. When a location is dereferenced, the analysis must consider any of the values in the set as a result of the dereference.

The abstract time-stamped CESK⋆ machine is defined in Figure 5. The (non-deterministic) abstract transition relation changes little compared with the concrete machine. We only have to modify it to account for the possibility that multiple storable values (which includes continuations) may reside together in the store, which we handle by letting the machine non-deterministically choose a particular value from the set at a given store location.
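The essential change in the abstract store can be captured in a few lines of Haskell (names ours): extending the store joins new values into a set instead of overwriting, and dereferencing returns every candidate, which is what makes the abstract machine non-deterministic.

    import qualified Data.Map as Map
    import qualified Data.Set as Set

    type Addr = Int
    type AbsStore s = Map.Map Addr (Set.Set s)   -- σ̂ : Addr →fin P(Storable)

    -- Join on (re)allocation: never overwrite, always union.
    extend :: Ord s => Addr -> s -> AbsStore s -> AbsStore s
    extend a v = Map.insertWith Set.union a (Set.singleton v)

    -- Dereference is non-deterministic: every stored value is a possible result.
    deref :: Addr -> AbsStore s -> [s]
    deref a sigma = maybe [] Set.toList (Map.lookup a sigma)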
    ς̂ ⟼ĈESK⋆t ς̂′, where κ ∈ σ̂(a), b = âlloc(ς̂, κ), u = t̂ick(ς̂, κ)
    ⟨x, ρ, σ̂, a, t⟩                            ⟼ ⟨v, ρ′, σ̂, a, u⟩ where (v, ρ′) ∈ σ̂(ρ(x))
    ⟨(e0 e1), ρ, σ̂, a, t⟩                      ⟼ ⟨e0, ρ, σ̂ ⊔ [b ↦ ar(e1, ρ, a)], b, u⟩
    ⟨v, ρ, σ̂, a, t⟩ if κ = ar(e, ρ′, c)        ⟼ ⟨e, ρ′, σ̂ ⊔ [b ↦ fn(v, ρ, c)], b, u⟩
    ⟨v, ρ, σ̂, a, t⟩ if κ = fn((λx.e), ρ′, c)   ⟼ ⟨e, ρ′[x ↦ b], σ̂ ⊔ [b ↦ (v, ρ)], c, u⟩

    Figure 5. The abstract time-stamped CESK⋆ machine.

    α(e, ρ, σ, a, t) = (e, α(ρ), α(σ), α(a), α(t))     [states]
    α(ρ) = λx. α(ρ(x))                                 [environments]
    α(σ) = λâ. ⊔_{α(a)=â} {α(σ(a))}                    [stores]
    α((λx.e), ρ) = ((λx.e), α(ρ))                      [closures]
    α(mt) = mt                                         [continuations]
    α(ar(e, ρ, a)) = ar(e, α(ρ), α(a))
    α(fn(v, ρ, a)) = fn(v, α(ρ), α(a))

    Figure 6. The abstraction map, α : Σ_CESK⋆t → Σ̂_ĈESK⋆t.
The analysis is parameterized by abstract variants of the functions that parameterized the concrete version:

    t̂ick : Σ̂ × Kont → Time        âlloc : Σ̂ × Kont → Addr

In the concrete, these parameters determine allocation and stack behavior. In the abstract, they are the arbiters of precision: they determine when an address gets re-allocated, how many addresses get allocated, and which values have to share addresses. Recall that in the concrete semantics, these functions consume states—not states and continuations as they do here. This is because in the concrete, a state alone suffices, since the state determines the continuation. But in the abstract, a continuation pointer within a state may denote a multitude of continuations; however, the transition relation is defined with respect to the choice of a particular one. We thus pair states with continuations to encode the choice.

The abstract semantics computes the set of reachable states:

    âval_ĈESK⋆t(e) = {ς̂ | ⟨e, ∅, [a0 ↦ mt], a0, t0⟩ ⟼*ĈESK⋆t ς̂}

2.6 Soundness and computability

The finiteness of the abstract state-space ensures decidability.

Theorem 1 (Decidability of the Abstract CESK⋆ Machine). ς̂ ∈ âval_ĈESK⋆t(e) is decidable.

Proof. The state-space of the machine is non-recursive with finite sets at the leaves on the assumption that addresses are finite. Hence reachability is decidable since the abstract state-space is finite.

We have endeavored to evolve the abstract machine gradually so that its fidelity in soundly simulating the original CEK machine is both intuitive and obvious. But to formally establish soundness of the abstract time-stamped CESK⋆ machine, we use an abstraction function, defined in Figure 6, from the state-space of the concrete time-stamped machine into the abstracted state-space. The abstraction map over times and addresses is defined so that the parameters âlloc and t̂ick are sound simulations of the parameters alloc and tick, respectively. We also define the partial order (⊑) on the abstract state-space as the natural point-wise, element-wise, component-wise and member-wise lifting, wherein the partial orders on the sets Exp and Addr are flat. Then, we can prove that the abstract machine's transition relation simulates the concrete machine's transition relation.

Theorem 2 (Soundness of the Abstract CESK⋆ Machine). If ς ⟼CEK ς′ and α(ς) ⊑ ς̂, then there exists an abstract state ς̂′ such that ς̂ ⟼ĈESK⋆t ς̂′ and α(ς′) ⊑ ς̂′.

Proof. By Lemmas 1, 2, and 3, it suffices to prove soundness with respect to ⟼CESK⋆t. Assume ς ⟼CESK⋆t ς′ and α(ς) ⊑ ς̂. Because ς transitioned, exactly one of the rules from the definition of (⟼CESK⋆t) applies. We split by cases on these rules. The rule for the second case is deterministic and follows by calculation. For the remaining (nondeterministic) cases, we must show an abstract state exists such that the simulation is preserved. By examining the rules for these cases, we see that all three hinge on the abstract store in ς̂ soundly approximating the concrete store in ς, which follows from the assumption that α(ς) ⊑ ς̂.

2.7 A k-CFA-like abstract CESK⋆ machine

In this section, we instantiate the time-stamped CESK⋆ machine to obtain a contour-based machine; this instantiation forms the basis of a context-sensitive abstract interpreter with polyvariance like that found in k-CFA [29]. In preparation for abstraction, we instantiate the time-stamped machine using labeled call strings. Inside times, we use contours (Contour), which are finite strings of call site labels that describe the current context:

    δ ∈ Contour ::= ε | ℓδ

The labeled CESK machine transition relation must appropriately instantiate the parameters tick and alloc to augment the time-stamp on function call. Next, we switch to abstract stores and bound the address space by truncating call string contours to length at most k (for k-CFA):

    δ ∈ Ĉontour_k iff δ ∈ Contour and |δ| ≤ k

Combining these changes, we arrive at the instantiations for the concrete and abstract machines given in Figure 7, where the value ⌊δ⌋k is the leftmost k labels of contour δ.
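A Haskell sketch of the contour bookkeeping (our names; only the abstract machine's call-site case of tick and the variable case of alloc are shown): on a function call labeled ℓ under contour δ, the new contour is ⌊ℓδ⌋k, and binding addresses pair a variable with the current contour, giving the usual k-CFA polyvariance.

    type Lab = Int
    type Contour = [Lab]            -- δ: most recent call-site label first

    -- Truncate a contour to its leftmost k labels: ⌊δ⌋k.
    trunc :: Int -> Contour -> Contour
    trunc = take

    -- On a call with site label l under contour d, the new contour is ⌊l:d⌋k.
    tickCall :: Int -> Lab -> Contour -> Contour
    tickCall k l d = trunc k (l : d)

    -- One abstract address per variable per contour.
    type BindAddr = (String, Contour)
    allocBind :: String -> Contour -> BindAddr
    allocBind x d = (x, d)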
    Time = (Lab + •) × Contour
    Addr = (Lab + Var) × Contour
    t0 = (•, ε)
    tick⟨x, _, _, _, t⟩ = t
    tick⟨(e0 e1)ℓ, _, _, _, (_, δ)⟩ = (ℓ, δ)
    tick⟨v, _, σ, a, (ℓ, δ)⟩ = (ℓ, δ)    if σ(a) = ar(_, _, _)
                             = (•, ℓδ)   if σ(a) = fn(_, _, _)
    alloc⟨(e0ℓ e1), _, _, _, (_, δ)⟩ = (ℓ, δ)
    alloc⟨v, _, σ, a, (_, δ)⟩ = (ℓ, δ)   if σ(a) = ar(eℓ, _, _)
    alloc⟨v, _, σ, a, (_, δ)⟩ = (x, δ)   if σ(a) = fn((λx.e), _, _)

    t̂ick(⟨x, _, _, _, t⟩, κ) = t
    t̂ick(⟨(e0 e1)ℓ, _, _, _, (_, δ)⟩, κ) = (ℓ, δ)
    t̂ick(⟨v, _, σ̂, a, (ℓ, δ)⟩, κ) = (ℓ, δ)       if κ = ar(_, _, _)
                                  = (•, ⌊ℓδ⌋k)   if κ = fn(_, _, _)
    âlloc(⟨(e0ℓ e1), _, _, _, (_, δ)⟩, κ) = (ℓ, δ)
    âlloc(⟨v, _, σ̂, a, (_, δ)⟩, κ) = (ℓ, δ)   if κ = ar(eℓ, _, _)
    âlloc(⟨v, _, σ̂, a, (_, δ)⟩, κ) = (x, δ)   if κ = fn((λx.e), _, _)

    Figure 7. Instantiation for the k-CFA machine.

Comparison to k-CFA:  We say "k-CFA-like" rather than "k-CFA" because there are distinctions between the machine just described and k-CFA:

1. k-CFA focuses on "what flows where"; the ordering between states in the abstract transition graph produced by our machine produces "what flows where and when."
2. Standard presentations of k-CFA implicitly inline a global approximation of the store into the algorithm [29]; ours uses one store per state to increase precision at the cost of complexity. In terms of our framework, the lattice through which classical k-CFA ascends is P(Exp × Env × Addr) × Ŝtore, whereas our analysis ascends the lattice P(Exp × Env × Ŝtore × Addr). We can explicitly inline the store to achieve the same complexity, as shown in Section 7.
3. On function call, k-CFA merges argument values together with previous instances of those arguments from the same context; our "minimalist" evolution of the abstract machine takes a higher-precision approach: it forks the machine for each argument value, rather than merging them immediately.
4. k-CFA does not recover explicit information about stack structure; our machine contains an explicit model of the stack for every machine state.

3. Analyzing by-need with Krivine's machine

Even though the abstract machines of the prior section have advantages over traditional CFAs, the approach we took (store-allocated continuations) yields more novel results when applied in a different context: a lazy variant of Krivine's machine. That is, we can construct an abstract interpreter that both analyzes and exploits laziness. Specifically, we present an abstract analog to a lazy and properly tail-recursive variant of Krivine's machine [19, 20] derived by Ager, Danvy, and Midtgaard [1]. The derivation from Ager et al.'s machine to the abstract interpreter follows the same outline as that of Section 2: we apply a pointer refinement by store-allocating continuations and carry out approximation by bounding the store.

The by-need variant of Krivine's machine considered here uses the common implementation technique of store-allocating thunks and forced values. When an application is evaluated, a thunk is created that will compute the value of the argument when forced. When a variable occurrence is evaluated, if it is bound to a thunk, the thunk is forced (evaluated) and the store is updated to the result. Otherwise, if a variable occurrence is evaluated and bound to a forced value, that value is returned.

Storable values include delayed computations (thunks) d(e, ρ) and computed values c(v, ρ), which are just tagged closures. There are two continuation constructors: c1(a, κ) is induced by a variable occurrence whose binding has not yet been forced to a value; the address a is where we want to write the given value when this continuation is invoked. The other, c2(a, κ), is induced by an application expression, which forces the operator expression to a value; the address a is the address of the argument. The concrete state-space is defined as follows and the transition relation is defined in Figure 8:

    ς ∈ Σ        = Exp × Env × Store × Kont
    s ∈ Storable ::= d(e, ρ) | c(v, ρ)
    κ ∈ Kont     ::= mt | c1(a, κ) | c2(a, κ)

    ς ⟼LK ς′
    ⟨x, ρ, σ, κ⟩ if σ(ρ(x)) = d(e, ρ′)   ⟼ ⟨e, ρ′, σ, c1(ρ(x), κ)⟩
    ⟨x, ρ, σ, κ⟩ if σ(ρ(x)) = c(v, ρ′)   ⟼ ⟨v, ρ′, σ, κ⟩
    ⟨(e0 e1), ρ, σ, κ⟩                   ⟼ ⟨e0, ρ, σ[a ↦ d(e1, ρ)], c2(a, κ)⟩ where a ∉ dom(σ)
    ⟨v, ρ, σ, c1(a, κ)⟩                  ⟼ ⟨v, ρ, σ[a ↦ c(v, ρ)], κ⟩
    ⟨(λx.e), ρ, σ, c2(a, κ)⟩             ⟼ ⟨e, ρ[x ↦ a], σ, κ⟩

    Figure 8. The LK machine.

When the control component is a variable, the machine looks up its stored value, which is either computed or delayed. If delayed, a c1 continuation is pushed and the frozen expression is put in control. If computed, the value is simply returned. When a value is returned to a c1 continuation, the store is updated to reflect the computed value. When a value is returned to a c2 continuation, its body is put in control and the formal parameter is bound to the address of the argument.

We now refactor the machine to use store-allocated continuations; storable values are extended to include continuations:

    ς ∈ Σ        = Exp × Env × Store × Addr
    s ∈ Storable ::= d(e, ρ) | c(v, ρ) | κ
    κ ∈ Kont     ::= mt | c1(a, a) | c2(a, a)

It is straightforward to perform a pointer-refinement of the LK machine to store-allocate continuations as done for the CESK machine in Section 2.3, and we observe that the lazy variant of Krivine's machine and its pointer-refined counterpart (not shown) operate in lock-step:

Lemma 4. eval_LK(e) ≃ eval_LK⋆(e).
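The following Haskell sketch transcribes the LK machine of Figure 8. The constructor names D and C stand for d(e, ρ) and c(v, ρ), the allocator is a simplistic stand-in for "any unused address" (valid here because entries are only overwritten, never deleted), and stuck states again return Nothing.

    import qualified Data.Map as Map

    type Var  = String
    type Addr = Int
    data Exp  = Ref Var | App Exp Exp | Lam Var Exp
    type Env  = Map.Map Var Addr
    data Storable = D Exp Env   -- d(e, ρ): delayed thunk
                  | C Exp Env   -- c(v, ρ): computed value
    type Store = Map.Map Addr Storable
    data Kont  = Mt | C1 Addr Kont | C2 Addr Kont
    type State = (Exp, Env, Store, Kont)

    alloc :: Store -> Addr
    alloc = Map.size

    step :: State -> Maybe State
    step (Ref x, r, s, k) = do
      a  <- Map.lookup x r
      sv <- Map.lookup a s
      case sv of
        D e r' -> Just (e, r', s, C1 a k)        -- force the thunk
        C v r' -> Just (v, r', s, k)             -- reuse the memoized value
    step (App e0 e1, r, s, k) =
      let a = alloc s
      in Just (e0, r, Map.insert a (D e1 r) s, C2 a k)
    step (v@(Lam _ _), r, s, C1 a k) =
      Just (v, r, Map.insert a (C v r) s, k)     -- memoize the forced result
    step (Lam x e, r, s, C2 a k) = Just (e, Map.insert x a r, s, k)
    step _ = Nothing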
nitions given in Section 2.5, the pointer-refined machine abstracts directly to yield the abstract LK? machine in Figure 9. The abstraction map for this machine is a straightforward structural abstraction similar to that given in Section 2.6 (and hence omitted). The abstracted machine is sound with respect to the LK? machine, and therefore the original LK machine.
Theorem 3 (Soundness of the Abstract LK? Machine). If ς ⟼LK ς′ and α(ς) ⊑ ς̂, then there exists an abstract state ς̂′ such that ς̂ ⟼LK? ς̂′ and α(ς′) ⊑ ς̂′.

Optimizing the machine through specialization: Ager et al. optimize the LK machine by specializing application transitions. When the operand of an application is a variable, no delayed computation needs to be constructed, thus "avoiding the construction of space-leaky chains of thunks." Likewise, when the operand is a λ-abstraction, "we can store the corresponding closure as a computed value rather than as a delayed computation." Both of these optimizations, which conserve valuable abstract resources, can be added with no trouble, as shown in Figure 10.

ς̂ ⟼LK? ς̂′, where κ ∈ σ̂(a), b = âlloc(ς̂, κ), u = t̂ick(t):
⟨(e x), ρ, σ̂, a, t⟩ ⟼ ⟨e, ρ, σ̂ ⊔ [b ↦ c2(ρ(x), a)], b, u⟩
⟨(e v), ρ, σ̂, a, t⟩ ⟼ ⟨e, ρ, σ̂ ⊔ [b ↦ c(v, ρ), c ↦ c2(b, a)], c, u⟩, where c = âlloc(ς̂, κ)

Figure 10. The abstract optimized LK? machine.

Varying the machine through postponed thunk creation: Ager et al. also vary the LK machine by postponing the construction of a delayed computation from the point at which an application is the control string to the point at which the operator has been evaluated and is being applied. The c2 continuation is modified to hold, rather than the address of a delayed computation, the constituents of the computation itself:

κ ∈ Kont ::= mt | c1(a, a) | c2(e, ρ, a).

The transitions for applications and functions are replaced with those in Figure 11. This allocates thunks when a function is applied, rather than when the control string is an application.

ς̂ ⟼LK? ς̂′, where κ ∈ σ̂(a), b = âlloc(ς̂, κ), u = t̂ick(t):
⟨(e0 e1), ρ, σ̂, a, t⟩ ⟼ ⟨e0, ρ, σ̂ ⊔ [b ↦ c2(e1, ρ, a)], b, u⟩
⟨(λx.e), ρ, σ̂, a, t⟩ ⟼ ⟨e, ρ[x ↦ b], σ̂ ⊔ [b ↦ d(e′, ρ′)], c, u⟩, if κ = c2(e′, ρ′, c)

Figure 11. The abstract thunk-postponing LK? machine.

As Ager et al. remark, each of these variants gives rise to an abstract machine. From each of these machines, we are able to systematically derive their abstractions.
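The two specialized application cases are easy to mirror in the concrete sketch given earlier. The fragment below is again our own illustration, in the style of Figure 10, and assumes the Exp, State, step, and ret definitions from the previous sketch: when the operand is a variable we reuse its existing address, and when it is a λ-abstraction we allocate it directly as a computed value.

-- Specialized application transitions, sketched after Figure 10.
stepOpt :: State -> State
stepOpt (App e0 (Ref x), rho, sig, k) =
  -- operand is a variable: share its address, no fresh thunk
  (e0, rho, sig, C2 (rho Map.! x) k)
stepOpt (App e0 (v@(Lam _ _)), rho, sig, k) =
  -- operand is a value: store a computed closure, not a thunk
  let a = Map.size sig
  in (e0, rho, Map.insert a (Computed v rho) sig, C2 a k)
stepOpt s = step s   -- all other cases as before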
4. State and control

We have shown that store-allocated continuations make abstract interpretation of the CESK machine and a lazy variant of Krivine's machine straightforward. In this section, we want to show that the tight correspondence between concrete and abstract persists after the addition of language features such as conditionals, side effects, exceptions, and continuations. We tackle each feature, and present the additional machinery required to handle each one. In most cases, the path from a canonical concrete machine to a pointer-refined abstraction of the machine is so simple we only show the abstracted system. In doing so, we are arguing that this abstract machine-oriented approach to abstract interpretation represents a flexible and viable framework for building abstract interpreters.

4.1 Conditionals, mutation, and control

To handle conditionals, we extend the language with a new syntactic form, (if e e e), and introduce a base value #f, representing false. Conditional expressions induce a new continuation form: if(e′0, e′1, ρ, a), which represents the evaluation context E[(if [ ] e0 e1)], where ρ closes e′0 to represent e0, ρ closes e′1 to represent e1, and a is the address of the representation of E.

Side effects are fully amenable to our approach; we introduce Scheme's set! for mutating variables using the (set! x e) syntax. The set! form evaluates its subexpression e and assigns the value to the variable x. Although set! expressions are evaluated for effect, we follow Felleisen et al. and specify that set! expressions evaluate to the value of x before it was mutated [11, page 166]. The evaluation context E[(set! x [ ])] is represented by set(a0, a1), where a0 is the address of x's value and a1 is the address of the representation of E.

First-class control is introduced by adding a new base value callcc, which reifies the continuation as a new kind of applicable value. Denoted values are extended to include representations of continuations. Since continuations are store-allocated, we choose to represent them by address. When an address is applied, it represents the application of a continuation (reified via callcc) to a value. The continuation at that point is discarded and the applied address is installed as the continuation. The resulting grammar is:

e ∈ Exp ::= . . . | (if e e e) | (set! x e)
κ ∈ Kont ::= . . . | if(e, e, ρ, a) | set(a, a)
v ∈ Val ::= . . . | #f | callcc | a.

We show only the abstract transitions (Figure 12), which result from store-allocating continuations, time-stamping, and abstracting the concrete transitions for conditionals, mutation, and control. The first three machine transitions deal with conditionals; here we follow the Scheme tradition of considering all non-false values as true. The fourth and fifth transitions deal with mutation.

The remaining three transitions deal with first-class control. In the first of these, callcc is being applied to a closure value v. The value v is then "called with the current continuation", i.e., v is applied to a value that represents the continuation at this point. In the second, callcc is being applied to a continuation (address). When this value is applied to the reified continuation, it aborts the current computation, installs itself as the current continuation, and puts the reified continuation "in the hole". Finally, in the third, a continuation is being applied; the current continuation is thrown away, and v gets plugged into the applied continuation c. In all cases, these transitions result from pointer-refinement, time-stamping, and abstraction of the usual machine transitions.

ς̂ ⟼CESK? ς̂′, where κ ∈ σ̂(a), b = âlloc(ς̂, κ), u = t̂ick(t):
⟨(if e0 e1 e2), ρ, σ̂, a, t⟩ ⟼ ⟨e0, ρ, σ̂ ⊔ [b ↦ if(e1, e2, ρ, a)], b, u⟩
⟨#f, ρ, σ̂, a, t⟩ ⟼ ⟨e1, ρ′, σ̂, c, u⟩, if κ = if(e0, e1, ρ′, c)
⟨v, ρ, σ̂, a, t⟩ ⟼ ⟨e0, ρ′, σ̂, c, u⟩, if κ = if(e0, e1, ρ′, c) and v ≠ #f
⟨(set! x e), ρ, σ̂, a, t⟩ ⟼ ⟨e, ρ, σ̂ ⊔ [b ↦ set(ρ(x), a)], b, u⟩
⟨v, ρ, σ̂, a, t⟩ ⟼ ⟨v′, ρ, σ̂ ⊔ [a′ ↦ v], c, u⟩, if κ = set(a′, c), where v′ ∈ σ̂(a′)
⟨(λx.e), ρ, σ̂, a, t⟩ ⟼ ⟨e, ρ[x ↦ b], σ̂ ⊔ [b ↦ c], c, u⟩, if κ = fn(callcc, ρ′, c)
⟨c, ρ, σ̂, a, t⟩ ⟼ ⟨a′, ρ, σ̂, c, u⟩, if κ = fn(callcc, ρ′, a′)
⟨v, ρ, σ̂, a, t⟩ ⟼ ⟨v, ρ, σ̂, c, u⟩, if κ = fn(c, ρ′, a′)

Figure 12. The abstract extended CESK? machine.

4.2 Exceptions and handlers

To analyze exceptional control flow, we extend the CESK machine with a register to hold a stack of exception handlers. This models a reduction semantics in which we have two additional kinds of evaluation contexts:

E ::= [ ] | (E e) | (v E) | (catch E v)
F ::= [ ] | (F e) | (v F)
H ::= [ ] | H[F[(catch H v)]],

and the additional, context-sensitive, notions of reduction:

(catch E[(throw v)] v′) → (v′ v),
(catch v v′) → v.

H contexts represent a stack of exception handlers, while F contexts represent a "local" continuation, i.e., the rest of the computation (with respect to the hole) up to an enclosing handler, if any. E contexts represent the entire rest of the computation, including handlers. The language is extended with expressions for raising and catching exceptions. A new kind of continuation is introduced to represent a stack of handlers. In each frame of the stack, there is a procedure for handling an exception and a (handler-free) continuation:

e ∈ Exp ::= . . . | (throw v) | (catch e (λx.e))
η ∈ Handl ::= mt | hn(v, ρ, κ, η).

An η continuation represents a stack of exception handler contexts, i.e., hn(v′, ρ, κ, η) represents H[F[(catch [ ] v)]], where η represents H, κ represents F, and ρ closes v′ to represent v. The machine includes all of the transitions of the CESK machine extended with an η component; these transitions are omitted for brevity. The additional transitions are given in Figure 13. This presentation is based on a textbook treatment of exceptions and handlers [11, page 135].⁵ The initial configuration is given by:

inj_CESHK(e) = ⟨e, ∅, ∅, mt, mt⟩.

ς ⟼CESHK ς′:
⟨v, ρ, σ, hn(v′, ρ′, κ, η), mt⟩ ⟼ ⟨v, ρ, σ, η, κ⟩
⟨(throw v), ρ, σ, hn((λx.e), ρ′, κ′, η), κ⟩ ⟼ ⟨e, ρ′[x ↦ a], σ[a ↦ (v, ρ)], η, κ′⟩, where a ∉ dom(σ)
⟨(catch e v), ρ, σ, η, κ⟩ ⟼ ⟨e, ρ, σ, hn(v, ρ, κ, η), mt⟩

Figure 13. The CESHK machine.

In the pointer-refined machine, the grammar of handler continuations changes to the following:

η ∈ Handl ::= mt | hn(v, ρ, a, h),

where h is used to range over addresses pointing to handler continuations. The notation a_mt means a such that σ(a) = mt in the concrete case and mt ∈ σ̂(a) in the abstract, where the intended store should be clear from context. The pointer-refined machine is given in Figure 14.

ς ⟼CESHK? ς′, where η = σ(h), κ = σ(a), b ∉ dom(σ):
⟨v, ρ, σ, h, a⟩ ⟼ ⟨v, ρ, σ, h′, a′⟩, if η = hn(v′, ρ′, a′, h′) and κ = mt
⟨(throw v), ρ, σ, h, a⟩ ⟼ ⟨e, ρ′[x ↦ b], σ[b ↦ (v, ρ)], h′, a′⟩, if η = hn((λx.e), ρ′, a′, h′)
⟨(catch e v), ρ, σ, h, a⟩ ⟼ ⟨e, ρ, σ[b ↦ hn(v, ρ, a, h)], b, a_mt⟩

Figure 14. The CESHK? machine.

After threading time-stamps through the machine as done in Section 2.4, the machine abstracts as usual to obtain the machine in Figure 15. The only unusual step in the derivation is to observe that some machine transitions rely on a choice of two continuations from the store: a handler and a local continuation. Analogously to Section 2.5, we extend t̂ick and âlloc to take two continuation arguments to encode the choice:

t̂ick : Σ̂ × Handl × Kont → Time,
âlloc : Σ̂ × Handl × Kont → Addr.

ς̂ ⟼CESHK? ς̂′, where η ∈ σ̂(h), κ ∈ σ̂(a), b = âlloc(ς̂, η, κ), u = t̂ick(t):
⟨v, ρ, σ̂, h, a, t⟩ ⟼ ⟨v, ρ, σ̂, h′, a′, u⟩, if η = hn(v′, ρ′, a′, h′) and κ = mt
⟨(throw v), ρ, σ̂, h, a, t⟩ ⟼ ⟨e, ρ′[x ↦ b], σ̂ ⊔ [b ↦ (v, ρ)], h′, a′, u⟩, if η = hn((λx.e), ρ′, a′, h′)
⟨(catch e v), ρ, σ̂, h, a, t⟩ ⟼ ⟨e, ρ, σ̂ ⊔ [b ↦ hn(v, ρ, a, h)], b, a_mt, u⟩

Figure 15. The abstract CESHK? machine.

⁵To be precise, Felleisen et al. present the CHC machine, a substitution-based machine that uses evaluation contexts in place of continuations. Deriving the CESHK machine from it is an easy exercise.
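As a quick illustration of the handler-stack discipline, the following toy evaluator is our own sketch in Haskell, not the paper's machine: it keeps the handler register as a plain list passed downwards and elides the saved local continuation κ′ that the machine restores. Still, it shows the essential moves: catch pushes a handler frame for the dynamic extent of its body, and throw unwinds to the nearest frame.

-- Values are integers; a handler maps the thrown value to a new expression.
data Exc = Lit Int
         | Thrw Int            -- (throw v)
         | Ctch Exc (Int -> Exc)  -- (catch e handler)

eval :: Exc -> [Int -> Exc] -> Int
eval (Lit n)    _        = n
eval (Thrw n)   (h : hs) = eval (h n) hs   -- unwind to nearest handler
eval (Thrw _)   []       = error "uncaught exception"
eval (Ctch e h) hs       = eval e (h : hs) -- install handler for e's extent

For example, eval (Ctch (Thrw 3) (\n -> Lit (n + 1))) [] yields 4: the thrown value 3 is consumed by the innermost handler.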
5. Abstract garbage collection

Garbage collection determines when a store location has become unreachable and can be re-allocated. This is significant in the abstract semantics because an address may be allocated to multiple values due to the finiteness of the address space. Without garbage collection, the values allocated to this common address must be joined, introducing imprecision in the analysis (and inducing further, perhaps spurious, computation). By incorporating garbage collection in the abstract semantics, the location may be proved to be unreachable and safely overwritten rather than joined, in which case no imprecision is introduced. Like the rest of the features addressed in this paper, we can incorporate abstract garbage collection into our static analyzers by a straightforward pointer-refinement of textbook accounts of concrete garbage collection, followed by a finite store abstraction.

Concrete garbage collection is defined in terms of a GC machine that computes the reachable addresses in a store [11, page 172]:

⟨G, B, σ⟩ ⟼GC ⟨(G ∪ LLσ(σ(a))) \ (B ∪ {a}), B ∪ {a}, σ⟩, if a ∈ G.

This machine iterates over a set of reachable but unvisited "grey" locations G. On each iteration, an element is removed and added to the set of reachable and visited "black" locations B. Any newly reachable and unvisited locations, as determined by the "live locations" function LLσ, are added to the grey set. When there are no grey locations, the black set contains all reachable locations. Everything else is garbage.

The live locations function computes a set of locations which may be used in the store. Its definition will vary based on the particular machine being garbage collected, but the definition appropriate for the CESK? machine of Section 2.3 is:

LLσ(e) = ∅
LLσ(e, ρ) = LLσ(ρ|fv(e))
LLσ(ρ) = rng(ρ)
LLσ(mt) = ∅
LLσ(fn(v, ρ, a)) = {a} ∪ LLσ(v, ρ) ∪ LLσ(σ(a))
LLσ(ar(e, ρ, a)) = {a} ∪ LLσ(e, ρ) ∪ LLσ(σ(a)).

We write ρ|fv(e) to mean ρ restricted to the domain of free variables in e. We assume the least-fixed-point solution in the calculation of the function LL in cases where it recurs on itself. The pointer-refinement of the machine requires parameterizing the LL function with a store used to resolve pointers to continuations. A nice consequence of this parameterization is that we can re-use LL for abstract garbage collection by supplying it an abstract store for the parameter. Doing so only necessitates extending LL to the case of sets of storable values:

LLσ(S) = ⋃_{s ∈ S} LLσ(s).

The CESK? machine incorporates garbage collection by a transition rule that invokes the GC machine as a subroutine to remove garbage from the store (Figure 16):

ς ⟼CESK? ς′:
⟨e, ρ, σ, a⟩ ⟼ ⟨e, ρ, {⟨b, σ(b)⟩ | b ∈ L}, a⟩, if ⟨LLσ(e, ρ) ∪ LLσ(σ(a)), {a}, σ⟩ ↠GC ⟨∅, L, σ⟩

Figure 16. The GC transition for the CESK? machine.

The garbage collection transition introduces non-determinism to the CESK? machine because it applies to any machine state and thus overlaps with the existing transition rules. The non-determinism is interpreted as leaving the choice of when to collect garbage up to the machine. The abstract CESK? machine incorporates garbage collection by the concrete garbage collection transition, i.e., we re-use the definition in Figure 16 with an abstract store, σ̂, in place of the concrete one. Consequently, it is easy to verify that abstract garbage collection approximates its concrete counterpart. The CESK? machine may collect garbage at any point in the computation, thus an abstract interpretation must soundly approximate all possible choices of when to trigger a collection, which the abstract CESK? machine does correctly. This may be a useful analysis of garbage collection; however, it fails to be a useful analysis with garbage collection: for soundness, the abstracted machine must consider the case in which garbage is never collected, implying no storage is reclaimed to improve precision.

However, we can leverage abstract garbage collection to reduce the state-space explored during analysis and to improve precision and analysis time. This is achieved (again) by considering properties of the concrete machine, which abstract directly; in this case, we want the concrete machine to deterministically collect garbage. Determinism of the CESK? machine is restored by defining the transition relation as a non-GC transition (Figure 3) followed by the GC transition (Figure 16). The state-space of this concrete machine is "garbage free" and consequently the state-space of the abstracted machine is "abstract garbage free." In the concrete semantics, a nice consequence of this property is that although continuations are allocated in the store, they are deallocated as soon as they become unreachable, which corresponds to when they would be popped from the stack in a non-pointer-refined machine. Thus the concrete machine really manages continuations like a stack. Similarly, in the abstract semantics, continuations are deallocated as soon as they become unreachable, which often corresponds to when they would be popped. We say often because, due to the finiteness of the store, this correspondence cannot always hold. However, this approach gives a good finite approximation to infinitary stack analyses that can always match calls and returns.
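The grey/black iteration above is just graph reachability driven by a worklist; the following Haskell rendering is our own, with the liveness function passed in as a parameter standing in for LLσ applied to the store's contents.

import qualified Data.Set as Set
import Data.Set (Set)

type Addr = Int

-- One step of the GC machine: pick a grey address, blacken it, and
-- grey any addresses made reachable through the store at that address.
reachable :: (Addr -> Set Addr)  -- live locations of the value at an address
          -> Set Addr            -- grey: reachable but unvisited
          -> Set Addr            -- black: reachable and visited
          -> Set Addr
reachable live grey black =
  case Set.minView grey of
    Nothing -> black             -- no grey left: black holds all reachable addresses
    Just (a, rest) ->
      let black' = Set.insert a black
          grey'  = (rest `Set.union` live a) `Set.difference` black'
      in reachable live grey' black'

-- Collecting a store keeps only the bindings at reachable addresses.
collect :: (Addr -> Set Addr) -> Set Addr -> [(Addr, v)] -> [(Addr, v)]
collect live roots sto =
  let keep = reachable live roots Set.empty
  in [ (a, s) | (a, s) <- sto, a `Set.member` keep ]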
6. Abstract stack inspection

In this section, we derive an abstract interpreter for the static analysis of a higher-order language with stack inspection. Following the outline of Sections 2 and 3, we start from the tail-recursive CM machine of Clements and Felleisen [3], perform a pointer refinement on continuations, then abstract the semantics by a parameterized bounding of the store.

6.1 The λsec-calculus and stack-inspection

The λsec-calculus of Pottier, Skalka, and Smith is a call-by-value λ-calculus model of higher-order stack inspection [26]. We present the language as given by Clements and Felleisen [3]. All code is statically annotated with a given set of permissions R, chosen from a fixed set P. A computation whose source code was statically annotated with a permission may enable that permission for the dynamic extent of a subcomputation. The subcomputation is privileged so long as it is annotated with the same permission, and every intervening procedure call has likewise been annotated with the privilege.

e ∈ Exp ::= . . . | fail | (grant R e) | (test R e e) | (frame R e)
A fail expression signals an exception if evaluated; by convention it is used to signal a stack-inspection failure. A (frame R e) evaluates e as the principal R, representing the permissions conferred on e given its origin. A (grant R e) expression evaluates as e, but with the permissions extended with R enabled. A (test R e0 e1) expression evaluates to e0 if R is enabled and e1 otherwise. A trusted annotator consumes a program and the set of permissions it will operate under, inserts frame expressions around each λ-body, and intersects all grant expressions with this set of permissions. We assume all programs have been properly annotated. Stack inspection can be understood in terms of an OK predicate on an evaluation context and a set of permissions. The predicate determines whether the given permissions are enabled for a subexpression in the hole of the context. The OK predicate holds whenever the context can be traversed from the hole outwards, finding, for each permission, an enabling grant context without first finding a denying frame context.
6.2 The CM machine

The CM (continuation-marks) machine of Clements and Felleisen is a properly tail-recursive extended CESK machine for interpreting higher-order languages with stack-inspection [3]. In the CM machine, continuations are annotated with marks [4], which, for the purposes of stack-inspection, are finite maps from permissions to {deny, grant}:

κ ::= mtᵐ | arᵐ(e, ρ, κ) | fnᵐ(v, ρ, κ).

We write κ[R ↦ c] to mean update the marks on κ to m[R ↦ c]. The CM machine is defined in Figure 17 (transitions that are straightforward adaptations of the corresponding CESK? transitions to incorporate continuation marks are omitted). It relies on the OK predicate to determine whether the permissions in R are enabled. The OK predicate performs the traversal of the context (represented as a continuation) using marks to determine which permissions have been granted or denied. The semantics of a program is given by the set of reachable states from an initial machine configuration:

inj_CM(e) = ⟨e, ∅, [a0 ↦ mt∅], a0⟩.

ς ⟼CM ς′:
⟨fail, ρ, σ, κ⟩ ⟼ ⟨fail, ρ, σ, mt∅⟩
⟨(frame R e), ρ, σ, κ⟩ ⟼ ⟨e, ρ, σ, κ[R ↦ deny]⟩
⟨(grant R e), ρ, σ, κ⟩ ⟼ ⟨e, ρ, σ, κ[R ↦ grant]⟩
⟨(test R e0 e1), ρ, σ, κ⟩ ⟼ ⟨e0, ρ, σ, κ⟩ if OK(R, κ), and ⟨e1, ρ, σ, κ⟩ otherwise

OK(∅, κ)
OK(R, mtᵐ) ⟺ (R ∩ m⁻¹(deny) = ∅)
OK(R, fnᵐ(v, ρ, κ)) ⟺ (R ∩ m⁻¹(deny) = ∅) ∧ OK(R \ m⁻¹(grant), κ)
OK(R, arᵐ(e, ρ, κ)) ⟺ (R ∩ m⁻¹(deny) = ∅) ∧ OK(R \ m⁻¹(grant), κ)

Figure 17. The CM machine and OK predicate.

6.3 The abstract CM? machine

Store-allocating continuations, time-stamping, and bounding the store yields the transition system given in Figure 18. The notation σ̂(a)[R ↦ c] is used to mean that [R ↦ c] should update some continuation in σ̂(a), i.e.,

σ̂(a)[R ↦ c] = σ̂[a ↦ σ̂(a) \ {κ} ∪ {κ[R ↦ c]}], for some κ ∈ σ̂(a).

It is worth noting that continuation marks are updated, not joined, in the abstract transition system. The ÔK? predicate (Figure 18) approximates the pointer refinement of its concrete counterpart OK, which can be understood as tracing a path through the store corresponding to traversing the continuation. The abstract predicate holds whenever there exists such a path in the abstract store that would satisfy the concrete predicate. Consequently, in analyzing (test R e0 e1), e0 is reachable only when the analysis can prove the ÔK? predicate holds on some path through the abstract store.

ς̂ ⟼CM? ς̂′:
⟨fail, ρ, σ̂, a⟩ ⟼ ⟨fail, ρ, σ̂, a_mt⟩
⟨(frame R e), ρ, σ̂, a⟩ ⟼ ⟨e, ρ, σ̂(a)[R ↦ deny], a⟩
⟨(grant R e), ρ, σ̂, a⟩ ⟼ ⟨e, ρ, σ̂(a)[R ↦ grant], a⟩
⟨(test R e0 e1), ρ, σ̂, a⟩ ⟼ ⟨e0, ρ, σ̂, a⟩ if ÔK?(R, σ̂, a), and ⟨e1, ρ, σ̂, a⟩ otherwise

ÔK?(∅, σ̂, a)
ÔK?(R, σ̂, a) ⟺ (R ∩ m⁻¹(deny) = ∅), if σ̂(a) ∋ mtᵐ
ÔK?(R, σ̂, a) ⟺ (R ∩ m⁻¹(deny) = ∅) ∧ ÔK?(R \ m⁻¹(grant), σ̂, b), if σ̂(a) ∋ fnᵐ(v, ρ, b) or σ̂(a) ∋ arᵐ(e, ρ, b)

Figure 18. The abstract CM? machine.

It is straightforward to define a structural abstraction map and verify the abstract CM? machine is a sound approximation of its concrete counterpart:

Theorem 4 (Soundness of the Abstract CM? Machine). If ς ⟼CM ς′ and α(ς) ⊑ ς̂, then there exists an abstract state ς̂′, such that ς̂ ⟼CM? ς̂′ and α(ς′) ⊑ ς̂′.
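A compact way to see the OK traversal is as a recursion over mark frames. The Haskell sketch below is ours, not the paper's: permissions are strings, the continuation is flattened to a list of marks walked from the hole outwards, and the last list element plays the role of the mtᵐ frame.

import qualified Data.Set as Set
import Data.Set (Set)
import qualified Data.Map as Map
import Data.Map (Map)

type Perm = String
data Mark = Deny | Grant deriving Eq
type Marks = Map Perm Mark

denied, granted :: Set Perm -> Marks -> Set Perm
denied  r m = Set.filter (\p -> Map.lookup p m == Just Deny)  r
granted r m = Set.filter (\p -> Map.lookup p m == Just Grant) r

-- ok r frames: are all permissions in r enabled along the continuation?
ok :: Set Perm -> [Marks] -> Bool
ok r _ | Set.null r = True                       -- OK(emptyset, kappa)
ok r [m] = Set.null (denied r m)                 -- OK(R, mt^m)
ok r (m : ks) = Set.null (denied r m)            -- fn/ar frame: no denial,
                && ok (r `Set.difference` granted r m) ks  -- grants discharged
ok _ [] = False                                  -- no mt frame: ill-formed stack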
7. Widening to improve complexity

If implemented naïvely, it takes time exponential in the size of the input program to compute the reachable states of the abstracted machines. Consider the size of the state-space for the abstract time-stamped CESK? machine:

|Exp × Env × Store × Addr × Time|
  = |Exp| × |Addr|^|Var| × |Storable|^|Addr| × |Addr| × |Time|.

Without simplifying any further, we clearly have an exponential number of abstract states. To reduce complexity, we can employ widening in the form of Shivers's single-threaded store [29]. To use a single-threaded store, we have to reconsider the abstract evaluation function itself. Instead of seeing it as a function that returns the set of reachable states, it is a function that returns a set of partial states plus a single globally approximating store, i.e., aval : Exp → System, where:

System = P(Exp × Env × Addr × Time) × Store.

We compute this as a fixed point of a monotonic function, f : System → System:

f(C, σ̂) = (C′, σ̂″), where
  Q′ = {(c′, σ̂′) : c ∈ C and (c, σ̂) ⟼ (c′, σ̂′)},
  (c0, σ̂0) = inj(e),
  C′ = C ∪ {c′ : (c′, _) ∈ Q′} ∪ {c0},
  σ̂″ = σ̂ ⊔ ⨆_{(_, σ̂′) ∈ Q′} σ̂′,

so that aval(e) = lfp(f). The maximum number of iterations of the function f times the cost of each iteration bounds the complexity of the analysis.

Polynomial complexity for monovariance: It is straightforward to compute the cost of a monovariant (in our framework, a "0CFA-like") analysis with this widening. In a monovariant analysis, environments disappear; a monovariant system-space simplifies to:

System′ = P(Exp × Lab × Lab⊥) × ((Var + Lab) → (Exp × Lab) + (Exp × Lab) + Lam),

where the (Var + Lab) component plays the role of addresses, the first (Exp × Lab) summand represents fn continuations, and the second represents ar continuations. If ascended monotonically, one could add one new partial state each time or introduce a new entry into the global store. Thus, the maximum number of monovariant iterations is:

|Exp| × |Lab|² + 1 + |Var + Lab| × (2|Exp × Lab| + |Lam|),

which is cubic in the size of the program.
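Operationally, the widened analysis is a worklist loop threading one global store. The generic Haskell sketch below is ours: the abstract step function and the store join are parameters, and termination is assumed to follow from a finite abstract state-space and a monotone step, as in the complexity argument above.

import qualified Data.Set as Set
import Data.Set (Set)

-- A generic least-fixed-point loop for the widened system: partial
-- states accumulate in a set, and all stores are joined into a single
-- global store, in the style of Shivers's single-threaded store.
aval :: (Ord c, Eq sto)
     => (c -> sto -> [(c, sto)])   -- abstract step under the global store
     -> (sto -> sto -> sto)        -- store join (least upper bound)
     -> (c, sto)                   -- injection of the program
     -> (Set c, sto)
aval step join (c0, sto0) = go (Set.singleton c0) sto0
  where
    go seen sto =
      let succs = [ s | c <- Set.toList seen, s <- step c sto ]
          seen' = seen `Set.union` Set.fromList (map fst succs)
          sto'  = foldl join sto (map snd succs)
      in if seen' == seen && sto' == sto
           then (seen, sto)         -- lfp reached: nothing new was added
           else go seen' sto'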
8. Related work

The study of abstract machines for the λ-calculus began with Landin's SECD machine [21], the theory of abstract interpretation with the POPL papers of the Cousots [6, 7], and static analysis of the λ-calculus with Jones's coupling of abstract machines and abstract interpretation [17]. All three have been active areas of research since their inception, but only recently have well-known abstract machines been connected with abstract interpretation by Midtgaard and Jensen [23, 24]. We strengthen the connection by demonstrating a general technique for abstracting abstract machines.

Abstract interpretation of abstract machines: The approximation of abstract machine states for the analysis of higher-order languages goes back to Jones [17], who argued abstractions of regular tree automata could solve the problem of recursive structure in environments. We re-invoked that wisdom to eliminate the recursive structure of continuations by allocating them in the store. Midtgaard and Jensen present a 0CFA for a CPS λ-calculus language [23]. The approach is based on Cousot-style calculational abstract interpretation [5], applied to a functional language. Like the present work, Midtgaard and Jensen start with an "off-the-shelf" abstract machine for the concrete semantics (in this case, the CE machine of Flanagan et al. [14]) and employ a reachable-states model. They then compose well-known Galois connections to reveal a 0CFA with reachability in the style of Ayers [2].⁶ The CE machine is not sufficient to interpret direct-style programs, so the analysis is specialized to programs in continuation-passing style. Later work by Midtgaard and Jensen went on to present a similar calculational abstract interpretation treatment of a monomorphic CFA for an ANF λ-calculus [24]. The concrete semantics are based on reachable states of the CaEK machine [14]. The abstract semantics approximate the control stack component of the machine by its top element, which is similar to the labeled machine abstraction given in Section 2.7 when k = 0. Although our approach is not calculational like Midtgaard and Jensen's, it continues in their tradition by applying abstract interpretation to off-the-shelf tail-recursive machines. We extend the application to direct-style machines for a k-CFA-like abstraction that handles tail calls, laziness, state, exceptions, first-class continuations, and stack inspection. We have extended return flow analysis to a completely direct style (no ANF or CPS needed) within a framework that accounts for polyvariance.

Harrison gives an abstract interpretation for a higher-order language with control and state for the purposes of automatic parallelization [15]. Harrison maps Scheme programs into an imperative intermediate language, which is interpreted on a novel abstract machine. The machine uses a procedure string approach similar to that given in Section 2.7 in that the store is addressed by procedure strings. Harrison's first machine employs higher-order values to represent functions and continuations, and he notes, "the straightforward abstraction of this semantics leads to abstract domains containing higher-order objects (functions) over reflexive domains, whereas our purpose requires a more concrete compile-time representation of the values assumed by variables. We therefore modify the semantics such that its abstraction results in domains which are both finite and non-reflexive." Because of the reflexivity of denotable values, a direct abstraction is not possible, so he performs closure conversion on the (representation of) the semantic function. Harrison then abstracts the machine by bounding the procedure string space (and hence the store) via an abstraction he calls stack configurations, which is represented by a finite set of members, each of which describes an infinite set of procedure strings. To prove that Harrison's abstract interpreter is correct, he argues that the machine interpreting the translation of a program in the intermediate language corresponds to interpreting the program as written in the standard semantics, in this case the denotational semantics of R3RS. Our approach, on the other hand, relies on well-known machines with well-known relations to calculi, reduction semantics, and other machines [10, 8]. These connections, coupled with the strong similarities between our concrete and abstract machines, result in minimal proof obligations in comparison. Moreover, programs are analyzed in direct-style under our approach.

Abstract interpretation of lazy languages: Jones has analyzed non-strict functional languages [17, 16], but that work has only focused on the by-name aspect of laziness and does not address memoization as done here. Sestoft examines flow analysis for lazy languages and uses abstract machines to prove soundness [27]. In particular, Sestoft presents a lazy variant of Krivine's machine similar to that given in Section 3 and proves the analysis is sound with respect to the machine. Likewise, Sestoft uses Landin's SECD machine as the operational basis for proving globalization optimizations correct. Sestoft's work differs from ours in that the analysis is developed separately from the abstract machines, whereas we derive abstract interpreters directly from machine definitions. Faxén uses a type-based flow analysis approach to analyzing a functional language with explicit thunks and evals, which is intended as the intermediate language for a compiler of a lazy language [9]. In contrast, our approach makes no assumptions about the typing discipline and analyzes source code directly.

Realistic language features and garbage collection: Static analyzers typically hemorrhage precision in the presence of exceptions and first-class continuations: they jump to the top of the lattice of approximation when these features are encountered. Conversion to continuation- and exception-passing style can handle these features without forcing a dramatic ascent of the lattice of approximation [29]. The cost of this conversion, however, is lost knowledge: both approaches obscure static knowledge of stack structure by desugaring it into syntax. Might and Shivers introduced the idea of using abstract garbage collection to improve precision and efficiency in flow analysis [25]. They develop a garbage-collecting abstract machine for a CPS language and prove it correct. We extend abstract garbage collection to direct-style languages interpreted on the CESK machine.

Static stack inspection: Most work on the static verification of stack inspection has focused on type-based approaches. Skalka and Smith present a type system for static enforcement of stack-inspection [30]. Pottier et al. present type systems for enforcing stack-inspection developed via a static correspondence to the dynamic notion of security-passing style [26]. Skalka et al. present type and effect systems that use linear temporal logic to express regular properties of program traces and show how to statically enforce both stack- and history-based security mechanisms [31]. Our approach, in contrast, is not type-based and focuses only on stack-inspection, although it seems plausible the approach of Section 6 extends to the more general history-based mechanisms.

⁶Ayers derived an abstract interpreter by transforming (the representation of) a denotational continuation semantics of Scheme into a state transition system (an abstract machine), which he then approximated using Galois connections [2].
9. Conclusions and perspective

We have demonstrated the utility of store-allocated continuations by deriving novel abstract interpretations of the CEK machine, a lazy variant of Krivine's machine, and the stack-inspecting CM machine. These abstract interpreters are obtained by a straightforward pointer refinement and structural abstraction that bounds the address space, making the abstract semantics safe and computable. Our technique allows concrete implementation technology to be mapped straightforwardly into that of static analysis, which we demonstrated by incorporating abstract garbage collection and optimizations to avoid abstract space leaks, both of which are based on existing accounts of concrete GC and space efficiency. Moreover, the abstract interpreters properly model tail calls by virtue of their concrete counterparts being properly tail-call optimizing. Finally, our technique uniformly scales up to richer language features. We have supported this by extending the abstract CESK machine to analyze conditionals, first-class control, exception handling, and state. We speculate that store-allocating bindings and continuations is sufficient for a straightforward abstraction of most existing machines.

Acknowledgments: We thank Matthias Felleisen, Jan Midtgaard, and Sam Tobin-Hochstadt for discussions and suggestions. We also thank the anonymous reviewers for their close reading and helpful critiques; their comments have improved this paper.

References

[1] Mads S. Ager, Olivier Danvy, and Jan Midtgaard. A functional correspondence between call-by-need evaluators and lazy abstract machines. Information Processing Letters, 90(5):223–232, June 2004.
[2] Andrew E. Ayers. Abstract analysis and optimization of Scheme. PhD thesis, Massachusetts Institute of Technology, 1993.
[3] John Clements and Matthias Felleisen. A tail-recursive machine with stack inspection. ACM Trans. Program. Lang. Syst., 26(6):1029–1052, November 2004.
[4] John Clements, Matthew Flatt, and Matthias Felleisen. Modeling an algebraic stepper. In ESOP '01: Proceedings of the 10th European Symposium on Programming Languages and Systems, pages 320–334, 2001.
[5] Patrick Cousot. The calculational design of a generic abstract interpreter. In M. Broy and R. Steinbrüggen, editors, Calculational System Design. 1999.
[6] Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Conference Record of the Fourth ACM Symposium on Principles of Programming Languages, pages 238–252, 1977.
[7] Patrick Cousot and Radhia Cousot. Systematic design of program analysis frameworks. In POPL '79: Proceedings of the 6th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, pages 269–282, 1979.
[8] Olivier Danvy. An Analytical Approach to Programs as Data Objects. DSc thesis, Department of Computer Science, Aarhus University, October 2006.
[9] Karl Faxén. Optimizing lazy functional programs using flow inference. In Static Analysis, pages 136–153. 1995.
[10] Matthias Felleisen. The Calculi of Lambda-v-CS Conversion: A Syntactic Theory of Control and State in Imperative Higher-Order Programming Languages. PhD thesis, Indiana University, 1987.
[11] Matthias Felleisen, Robert B. Findler, and Matthew Flatt. Semantics Engineering with PLT Redex. August 2009.
[12] Matthias Felleisen and Daniel P. Friedman. Control operators, the SECD-machine, and the lambda-calculus. In 3rd Working Conference on the Formal Description of Programming Concepts, August 1986.
[13] Matthias Felleisen and Daniel P. Friedman. A calculus for assignments in higher-order languages. In POPL '87: Proceedings of the 14th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, pages 314+, 1987.
[14] Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen. The essence of compiling with continuations. In PLDI '93: Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation, pages 237–247, June 1993.
[15] Williams L. Harrison. The interprocedural analysis and automatic parallelization of Scheme programs. LISP and Symbolic Computation, 2(3):179–396, October 1989.
[16] N. Jones and N. Andersen. Flow analysis of lazy higher-order functional programs. Theoretical Computer Science, 375(1–3):120–136, May 2007.
[17] Neil D. Jones. Flow analysis of lambda expressions (preliminary version). In Proceedings of the 8th Colloquium on Automata, Languages and Programming, pages 114–128, 1981.
[18] Neil D. Jones and Steven S. Muchnick. A flexible approach to interprocedural data flow analysis and programs with recursive data structures. In POPL '82: Proceedings of the 9th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 66–74, 1982.
[19] Jean-Louis Krivine. Un interpréteur du lambda-calcul. 1985.
[20] Jean-Louis Krivine. A call-by-name lambda-calculus machine. Higher-Order and Symbolic Computation, 20(3):199–207, September 2007.
[21] Peter J. Landin. The mechanical evaluation of expressions. The Computer Journal, 6(4):308–320, 1964.
[22] Jan Midtgaard. Control-flow analysis of functional programs. Technical Report BRICS RS-07-18, DAIMI, Department of Computer Science, University of Aarhus, December 2007. To appear in revised form in ACM Computing Surveys.
[23] Jan Midtgaard and Thomas Jensen. A calculational approach to control-flow analysis by abstract interpretation. In María Alpuente and Germán Vidal, editors, SAS, volume 5079 of Lecture Notes in Computer Science, pages 347–362, 2008.
[24] Jan Midtgaard and Thomas P. Jensen. Control-flow analysis of function calls and returns by abstract interpretation. In ICFP '09: Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming, pages 287–298, 2009.
[25] Matthew Might and Olin Shivers. Improving flow analyses via ΓCFA: Abstract garbage collection and counting. In ICFP '06: Proceedings of the Eleventh ACM SIGPLAN International Conference on Functional Programming, pages 13–25, 2006.
[26] François Pottier, Christian Skalka, and Scott Smith. A systematic approach to static access control. ACM Trans. Program. Lang. Syst., 27(2):344–382, March 2005.
[27] Peter Sestoft. Analysis and efficient implementation of functional programs. PhD thesis, University of Copenhagen, October 1991.
[28] Zhong Shao and Andrew W. Appel. Space-efficient closure representations. In LFP '94: Proceedings of the 1994 ACM Conference on LISP and Functional Programming, pages 150–161, 1994.
[29] Olin G. Shivers. Control-Flow Analysis of Higher-Order Languages. PhD thesis, Carnegie Mellon University, 1991.
[30] Christian Skalka and Scott Smith. Static enforcement of security with types. In ICFP '00: Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming, pages 34–45, September 2000.
[31] Christian Skalka, Scott Smith, and David Van Horn. Types and trace effects of higher order programs. Journal of Functional Programming, 18(2):179–249, 2008.
Polyvariant Flow Analysis with Higher-ranked Polymorphic Types and Higher-order Effect Operators

Stefan Holdermans
Vector Fabrics, Paradijslaan 28, 5611 KN Eindhoven, The Netherlands
[email protected]

Jurriaan Hage
Dept. of Inf. and Comp. Sciences, Utrecht University, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
[email protected]
Abstract

We present a type and effect system for flow analysis that makes essential use of higher-ranked polymorphism. We show that, for higher-order functions, the expressiveness of higher-ranked types enables us to improve on the precision of conventional let-polymorphic analyses. Modularity and decidability of the analysis are guaranteed by making the analysis of each program parametric in the analyses of its inputs; in particular, we have that higher-order functions give rise to higher-order operations on effects. As flow typing is archetypical to a whole class of type and effect systems, our approach can be used to boost the precision of a wide range of type-based program analyses for higher-order languages.

Categories and Subject Descriptors: D.3.3 [Programming Languages]: Language Constructs and Features—Polymorphism; F.3.2 [Logics and Meanings of Programs]: Semantics of Programming Languages—Program analysis; F.3.3 [Logics and Meanings of Programs]: Studies of Program Constructs—Functional constructs, Type structure

General Terms: Languages, Theory

Keywords: type-based program analysis, higher-ranked polymorphism

1. Introduction

The use of polymorphic types in type and effect systems for static program analysis is usually limited to ML-style let-polymorphism. This restriction precludes the formal parameters of higher-order functions from being analysed polyvariantly rather than monovariantly. In this paper, we consider a type and effect system that allows analyses to be expressed in terms of higher-ranked polymorphic types and argue how the resulting polyvariant analyses are more powerful than the analyses obtained from let-polymorphic systems. Specifically, our contributions are the following:

• We present an annotated type and effect system for flow analysis that makes essential use of higher-ranked polymorphism in both annotations and effects (Section 5). The resulting analysis is polyvariant in its treatment of lambda-bound variables, applicable to all well-typed terms in an explicitly typed lambda-calculus with Booleans and conditionals (Section 6.1), and sound with respect to an instrumented, flow-tracking semantics (Section 6.2).

• The main technical innovations of our system are its use of so-called fully flexible types to maintain the modularity of the analyses (Section 4.1) and its use of annotation and effect operators to have the analyses of higher-order functions explicitly parameterised in the analyses of their arguments (Section 4.2).

• For all terms with fully flexibly typed free variables, our system admits "best analyses" (Section 6.3), which can be obtained by means of a strikingly straightforward inference algorithm (Section 7).

We stress that flow typing is, in a sense, archetypical to a whole class of type and effect systems; as a wide range of other analyses, including binding-time analysis, strictness analysis, and usage analysis, are known to be expressible as variations of type-based control-flow analysis, we expect our approach to also apply to most if not all of these analyses.

2. Motivation

Numerous static program analyses depend on information about the flow of control in the program under analysis. Whereas for first-order languages this information is directly available from the program text, the situation for higher-order languages, in which functions or procedures can be passed as arguments to other functions or procedures, is considerably different; for these languages, one has to deal with the dynamic dispatch problem. Consider, for example, the following program fragment, written in some typed higher-order functional language:

h : (bool → bool) → bool
h f = if f false then f true else false.

As the function parameter f can, at run-time, be bound to any suitably typed function, it is not obvious to what code control is transferred when the condition f false in the body of h is evaluated. To cope with the dynamic dispatch problem, several flow analyses have been proposed. Of particular interest are flow analyses that, in some way or another, take advantage of the structure that is imposed on programs by a static typing discipline for the language under analysis; such type-based analyses can typically be more effective than analyses for dynamically typed languages or analyses that ignore the well-typedness of analysed programs (Palsberg 2001). An important class of type-based analyses is then that of so-called type and effect systems that extend the typing disciplines of languages as to express properties beyond just plain data types (Nielson and Nielson 1999). For instance, to track the flow of Boolean values through a program, we can decorate all occurrences of the Boolean constructors false and true in a program with labels `1, `2, . . . , as in

h f = if f false^{`1} then f true^{`2} else false^{`3},

and adopt an extended type system that annotates the type bool of Boolean values with sets of labels identifying the possible construction sites of these values. The Boolean identity function, id x = x, then, for example, can have the type bool^{`1,`2} → bool^{`1,`2}, indicating that if its argument x is a Boolean constructed at any of the sites labelled with `1 or `2, then so is its result. Assigning the function id this type prepares it for being passed as an argument to the function h above, which can be of type (bool^{`1,`2} → bool^{`1,`2}) → bool^{`1,`2,`3}. However, in general the assigned type is too specific, as id could be used in other contexts as well. This is suggestive of annotating the argument and result types of id with a larger set as to reflect all uses of id in the program, but this is undesirable for at least two reasons. First, it requires the whole program to be available, as information is required about all possible uses of id, and thus precludes the analysis from being modular. Second, it renders the analysis of program fragments that directly or indirectly use id rather imprecise, as the larger set shows up for every value that is obtained by applying id, irrespective of the actual argument supplied. This latter issue is known as the poisoning problem (Wansbrough and Peyton Jones 1999). In general, poisoning can be reduced by making the analysis more polyvariant, that is, allowing different uses of an identifier to be analysed independently. One way to make an analysis based on a type and effect system both more modular and more polyvariant is by making use of annotation polymorphism. For example, id can be assigned the polymorphic type ∀β. bool^β → bool^β with β ranging over sets of constructor labels. Indeed, this type can be derived from just the definition of id and instantiated to a more specific type for each use of id.

The use of polymorphism in type and effect systems is usually limited to ML-style let-polymorphism (Damas and Milner 1982), meaning that polymorphic types can only be assigned to identifiers bound at top level or in local definitions. This seems like a natural restriction, as program analyses are almost always required to be performed fully automatically, and ML-style polymorphic types allow for mechanically and modularly deriving "best analyses", which are then typically defined in terms of principal types, whereas more expressive uses of polymorphism do not necessarily admit such mechanisation. To see why we may still want to consider less restrictive uses of polymorphism, consider once more applying the function h from the example above to the Boolean identity function id. In a let-polymorphic type and effect system, h can be expected to have a type much like ∀β. (bool^{`1,`2} → bool^β) → bool^{β ∪ {`3}}. The aforementioned polymorphic type of id is then instantiated to bool^{`1,`2} → bool^{`1,`2}, and instantiating the variable β in the type of h then yields bool^{`1,`2,`3} as the type obtained for the application h id. Note that this result is imprecise in the sense that the Boolean constructed at the site labelled with `1 never flows to the result of any invocation of h. This imprecision is caused by the restriction that, in an ML-style type and effect system, the formal parameter f of h has to be assigned a monomorphic type. Hence, uses of f in the body of h are analysed monovariantly and subjected to poisoning.

Now, if the type and effect system were to somehow allow the parameter f of h to have a polymorphic type, we could have

h : (∀β. bool^β → bool^β) → bool^{`2,`3},

with different choices for β for different uses of f in the body of h allowing for a more polyvariant analysis. Here, we require h to have a so-called rank-2 polymorphic type. In general, the rank of a polymorphic type describes the maximum depth at which universal quantifiers occur in contravariant positions (Kfoury and Tiuryn 1992). As it is well-known that the higher-ranked fragment of the polymorphic lambda-calculus does not admit principal types and that type inference is undecidable for rank 3 and higher, it is not immediately obvious that higher-ranked polymorphic types can be of any practical use in type and effect systems for fully automatic program analysis. However, here it is crucial that we only need to consider types that are polymorphic in the annotations that decorate types rather than in the types themselves. As it turns out, higher-ranked annotation polymorphism does indeed provide a feasible basis for attaining analyses that are fully polyvariant with respect to the formal parameters of higher-order functions.¹ The main challenge of incorporating higher-ranked polymorphic types in a type and effect system is then to take advantage of their expressive power without compromising the modularity of the analysis. For example, the rank-2 type for h that was proposed above is too specific, as it presumes that the function bound to the parameter f will manifest identity-like behaviour, which in general is obviously unacceptably restrictive. Below, we will rise to the challenge and present a modular type and effect system with higher-ranked polymorphic types that admits analyses for higher-order functions like h that are adaptive enough for all appropriately typed functions to be passed in as arguments, while still allowing for the formal parameters of these higher-order functions to be analysed polyvariantly.

¹This approach is reminiscent of the use of polymorphic recursion in the type-based binding-time analysis of Dussart et al. (1995): while polymorphic recursion in its full, untamed glory renders type inference undecidable, its restriction to binding-time annotations has proven to allow for a very expressive yet workable analysis. See Section 4.3.
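The poisoning effect is easy to reproduce mechanically. The toy Haskell fragment below is entirely ours, not the paper's system: it computes the result label set for h id under a let-polymorphic reading, where f's argument annotation is forced to be the single monomorphic union of everything that flows into f, and under a polyvariant reading in which each call of f is analysed at its own argument annotation.

import qualified Data.Set as Set
import Data.Set (Set)

type Label = String

-- Monovariant reading: f's argument annotation is the union of all
-- labels flowing into f, so every call taints every other (poisoning).
monovariant :: (Set Label -> Set Label) -> Set Label
monovariant fResult =
  let argLabels = Set.fromList ["l1", "l2"]   -- false^l1 and true^l2 both reach f
  in fResult argLabels `Set.union` Set.singleton "l3"

-- Polyvariant reading: only the call contributing to the result
-- (f true^l2) is instantiated, as the rank-2 type of h permits.
polyvariant :: (Set Label -> Set Label) -> Set Label
polyvariant fResult =
  fResult (Set.singleton "l2") `Set.union` Set.singleton "l3"

-- With f = id (result labels equal argument labels):
-- monovariant id == fromList ["l1","l2","l3"], whereas
-- polyvariant id == fromList ["l2","l3"], matching the discussion above.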
3. Preliminaries

Throughout this paper, we use, as the language under analysis, an eagerly evaluated and simply typed Church-style lambda-calculus with Booleans, conditionals, and general recursion. Assuming an abstract set of program labels and a countably infinite set of variable symbols,

` ∈ Lab    labels
x ∈ Var    variables,

terms in our language are constructed from variables, producers, and consumers; that is, we have

t ∈ Tm      terms
p ∈ Prod    producers
c ∈ Cons    consumers

with

t ::= x | p^` | c^`
p ::= false | true | λx : τ. t1
c ::= if t1 then t2 else t3 | t1 t2 | fix t1.

All producers and consumers are labelled. A producer is either one of the Boolean constructors false and true or a lambda-abstraction, while consumers subsume conditionals, function applications, and fixed points. As usual, function application associates to the left and lambda-abstractions extend as far to the right as possible. Each abstraction is annotated with the type of its formal parameter, where types,

τ ∈ Ty    types,

are given by

τ ::= bool | τ1 → τ2.

An instrumented natural semantics is given in Figure 1 as a set of inference rules for deriving judgements of the form t ⇓F p^`, indicating that the term t evaluates in zero or more steps to the value produced by the `-labelled producer p, while the flow of values during evaluation is captured by the flow set F,

F ∈ Flow = P(Lab × Lab)    flow.

Concretely, each pair (`c, `p) in a flow set F witnesses the consumption of a value produced at a program point labelled with `p by a consumer labelled with `c. Note that Boolean values (produced by the constructors false and true) are consumed by conditionals, while functions (produced by lambda-abstractions) are consumed by function applications and occurrences of the fixed-point operator. Evaluation proceeds under a call-by-value strategy; capture-avoiding substitution, in rules [e-app] and [e-fix], is denoted by [· ↦ ·].

The static semantics of the language is presented in Figure 2 in terms of typing rules for deriving judgements Γ ⊢ t : τ, expressing that, in the type environment Γ, the term t has the type τ. Here, type environments are finite maps from variables to types:

Γ ∈ TyEnv = Var →fin Ty    type environments.

In the sequel, we are only concerned with well-typed terms. The static semantics of Figure 2 is referred to as the underlying type system, and the types from the underlying type system play a crucial rôle in our approach as they guide our polyvariant flow analysis.
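The term grammar transcribes directly into Haskell data types. The little sketch below is ours, not the paper's (labels as integers); it is merely the obvious encoding of the syntax above and is used by the evaluator sketch that follows Figure 1.

type Label = Int
type Var   = String

data Ty = TyBool | TyArr Ty Ty deriving (Eq, Show)

-- Producers create labelled values; consumers use them.
data Prod = PFalse | PTrue | PLam Var Ty Tm deriving Show
data Cons = CIf Tm Tm Tm | CApp Tm Tm | CFix Tm deriving Show

data Tm = TmVar Var
        | TmProd Prod Label   -- p^l
        | TmCons Cons Label   -- c^l
        deriving Show

-- The running example: h f = if f false^1 then f true^2 else false^3
example :: Tm
example =
  TmCons (CIf (TmCons (CApp (TmVar "f") (TmProd PFalse 1)) 5)
              (TmCons (CApp (TmVar "f") (TmProd PTrue 2)) 6)
              (TmProd PFalse 3))
         4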
Evaluation: t ⇓F p^`

p^` ⇓{ } p^`   [e-prod]

t1 ⇓F1 true^{`p}    t2 ⇓F2 p^`
---------------------------------------------------  [e-if-true]
(if t1 then t2 else t3)^{`c} ⇓F1 ∪ {(`c,`p)} ∪ F2 p^`

t1 ⇓F1 false^{`p}    t3 ⇓F3 p^`
---------------------------------------------------  [e-if-false]
(if t1 then t2 else t3)^{`c} ⇓F1 ∪ {(`c,`p)} ∪ F3 p^`

t1 ⇓F1 (λx : τ. t0)^{`p}    t2 ⇓F2 p2^{`2}    [x ↦ p2^{`2}]t0 ⇓F0 p^`
---------------------------------------------------  [e-app]
(t1 t2)^{`c} ⇓F1 ∪ F2 ∪ {(`c,`p)} ∪ F0 p^`

t1 ⇓F1 (λx : τ. t0)^{`p}    [x ↦ (fix t1)^{`c}]t0 ⇓F0 p^`
---------------------------------------------------  [e-fix]
(fix t1)^{`c} ⇓F1 ∪ {(`c,`p)} ∪ F0 p^`

Figure 1. Instrumented natural semantics.
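Under the AST sketched earlier, the instrumented semantics becomes a small evaluator that returns the produced value together with the accumulated flow set. This is our own transliteration of Figure 1, not the paper's artifact; it is call-by-value and substitution-based, assumes the Tm, Prod, Cons, Label, and Var definitions from the previous sketch, and uses a naive substitution that suffices because substituted producers are closed values here.

import qualified Data.Set as Set
import Data.Set (Set)

type Flow = Set (Label, Label)   -- (consumer label, producer label)

-- eval t = (p, l, f): t evaluates to producer p labelled l with flow f.
eval :: Tm -> (Prod, Label, Flow)
eval (TmProd p l) = (p, l, Set.empty)                        -- [e-prod]
eval (TmCons (CIf t1 t2 t3) lc) =
  case eval t1 of
    (PTrue,  lp, f1) -> let (p, l, f2) = eval t2             -- [e-if-true]
                        in (p, l, f1 <> Set.singleton (lc, lp) <> f2)
    (PFalse, lp, f1) -> let (p, l, f3) = eval t3             -- [e-if-false]
                        in (p, l, f1 <> Set.singleton (lc, lp) <> f3)
    _ -> error "ill-typed conditional"
eval (TmCons (CApp t1 t2) lc) =                              -- [e-app]
  let (PLam x _ t0, lp, f1) = eval t1
      (p2, l2, f2)          = eval t2
      (p, l, f0)            = eval (subst x (TmProd p2 l2) t0)
  in (p, l, f1 <> f2 <> Set.singleton (lc, lp) <> f0)
eval (TmCons (CFix t1) lc) =                                 -- [e-fix]
  let (PLam x _ t0, lp, f1) = eval t1
      (p, l, f0)            = eval (subst x (TmCons (CFix t1) lc) t0)
  in (p, l, f1 <> Set.singleton (lc, lp) <> f0)
eval (TmVar x) = error ("free variable " ++ x)

-- Naive substitution (no capture issues for closed substituted values).
subst :: Var -> Tm -> Tm -> Tm
subst x s t@(TmVar y) = if x == y then s else t
subst x s (TmProd (PLam y ty b) l)
  | x == y    = TmProd (PLam y ty b) l
  | otherwise = TmProd (PLam y ty (subst x s b)) l
subst _ _ t@(TmProd _ _) = t
subst x s (TmCons c l) = TmCons (substC c) l
  where substC (CIf a b d) = CIf (subst x s a) (subst x s b) (subst x s d)
        substC (CApp a b)  = CApp (subst x s a) (subst x s b)
        substC (CFix a)    = CFix (subst x s a)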
Typing: Γ ⊢ t : τ

Γ(x) = τ
-----------  [t-var]
Γ ⊢ x : τ

Γ ⊢ false^` : bool   [t-false]

Γ ⊢ true^` : bool   [t-true]

Γ[x ↦ τ1] ⊢ t1 : τ2
------------------------------  [t-abs]
Γ ⊢ (λx : τ1. t1)^` : τ1 → τ2

Γ ⊢ t1 : bool    Γ ⊢ t2 : τ    Γ ⊢ t3 : τ
------------------------------------------  [t-if]
Γ ⊢ (if t1 then t2 else t3)^` : τ

Γ ⊢ t1 : τ2 → τ    Γ ⊢ t2 : τ2
--------------------------------  [t-app]
Γ ⊢ (t1 t2)^` : τ

Γ ⊢ t1 : τ → τ
-------------------  [t-fix]
Γ ⊢ (fix t1)^` : τ

Figure 2. The underlying type system.

4. Key Ideas

In this section, we discuss the key ideas behind the type and effect system that will be presented in Section 5. Recall that our main objective is to provide a modular flow analysis that allows lambda-bound variables to be analysed polyvariantly rather than monovariantly. To this end, we associate with each term t in the program a triple τ̂^ψ & ϕ, consisting of an annotated type τ̂, an annotation ψ, and an effect ϕ. The idea is that the annotation ψ describes the possible production sites of the values that t can evaluate to and that the effect ϕ describes the flow that may be incurred from the evaluation of t. Thus, annotations are essentially sets of labels `, while effects are sets of pairs (`, ψ) consisting of a consumer label ` and an annotation ψ. Annotated types are constructed from the type bool of Booleans and annotated function types of the form τ̂1^{ψ1} →^{ϕ0} τ̂2^{ψ2}, where ψ1 and ψ2 denote the production sites of, respectively, the argument and the result of a function, and ϕ0 is the so-called latent effect of a function, i.e., the effect that may be observed from applying the function to an argument. Furthermore, and crucially, we allow universal quantification over both annotations and effects to occur anywhere in an annotated type.

4.1 Fully Flexible Types

As an example, consider the Boolean negation function produced by

(λx : bool. (if x then false^{`1} else true^{`2})^{`3})^{`4}.

Analysing this function may then result in the triple

(∀β. bool^β →^{{(`3,β)}} bool^{`1,`2})^{`4} & { },

expressing that the `4-labelled lambda-abstraction immediately (i.e., flowlessly) produces a function that may have its argument consumed by the conditional labelled with `3 before returning a Boolean that is produced at either `1 or `2. Note that the annotated type for the negation function is polymorphic in the annotation for its argument x and how this is crucial for obtaining an analysis that is modular: whatever Boolean it is applied to, the type of the function can always be instantiated to obtain a suitable analysis for the application. As modularity is a key aspect of our analysis, let us from now on assume that functions are always analysed with maximum applicability in mind and, hence, that all functions have types that are indeed polymorphic in their argument annotations. We shall refer to such types as fully flexible types.

4.2 Annotation and Effect Operators

To demonstrate how the notion of fully flexible types extends to higher-order functions, let us consider the second-order function produced by

(λf : bool → bool. (f true^{`5})^{`6})^{`7},

which applies its argument to the Boolean true produced at `5. How can we, for such a function, obtain an analysis that can be regarded as fully flexible? Clearly, modularity requires us to be polymorphic in the annotation of the argument function f. Moreover, as we assume that all functions have fully flexible types, the type of any function to be bound to f will itself be polymorphic in its argument annotation too, i.e., have a type of the form

∀β. bool^β →^ϕ bool^ψ.

In general, the latent effect ϕ and the result annotation ψ of f depend on the argument annotation β. We
65
We can make this explicit by writing ϕ and ψ as functions of β:

∀β. bool^β --(ϕ0 β)--> bool^(ψ0 β).

If we allow annotation and effect abstraction in annotated types, then the annotated types for all functions of underlying type bool → bool can be written in this form. For instance, for the annotated type of the negation function from Section 4.1, we have ϕ0 = λβ′. {(ℓ3, β′)} and ψ0 = λβ′. {ℓ1, ℓ2}, yielding

∀β. bool^β --((λβ′. {(ℓ3, β′)}) β)--> bool^((λβ′. {ℓ1, ℓ2}) β).

Returning to the analysis of the second-order function as a whole, modularity once more requires us to assume a type for f that can be instantiated for all possible choices for ϕ0 and ψ0 and, hence, we end up with a triple consisting of the rank-2 type

∀βf. ∀δ0. ∀β0. (∀β. bool^β --(δ0 β)--> bool^(β0 β))^βf --({(ℓ6, βf)} ∪ δ0 {ℓ5})--> bool^(β0 {ℓ5}),

the singleton annotation {ℓ7}, and the empty effect { }. Here, the variables δ0 and β0 range over, respectively, effect and annotation operators rather than proper effects and annotations. Note how both the latent effect {(ℓ6, βf)} ∪ δ0 {ℓ5} and the result annotation β0 {ℓ5} express that for any call of the second-order function, the polymorphic type of the function bound to its parameter f is instantiated with the annotation {ℓ5} and that the supplied effect and annotation operators are applied accordingly.

Essentially, what we have done here amounts to parameterising the analysis of a function by the analyses of its arguments. For a first-order function, the analysis of an argument is captured by a single annotation that identifies its possible production sites. For a higher-order function, the analysis of an argument of function type is captured by a proper annotation that identifies the possible production sites of the supplied function, and effect and annotation operators that describe how the analysis of the argument function depends on the analyses for its own arguments.

Now, concretely, if we instantiate the annotated type of the second-order function above so as to prepare it for being applied to the negation function from Section 4.1 and thus supply it with the analysis for the negation function, then, after beta-reducing the effects and annotations, we obtain the instantiated type

(∀β. bool^β --{(ℓ3, β)}--> bool^{ℓ1, ℓ2})^{ℓ4} --{(ℓ6, {ℓ4}), (ℓ3, {ℓ5})}--> bool^{ℓ1, ℓ2}.

As a final example of the use of annotation and effect operators, consider the higher-order abstraction (cf. the running example from Section 2)

(λf : bool → bool. (if (f false^ℓ1)^ℓ2 then (f true^ℓ3)^ℓ4 else false^ℓ5)^ℓ6)^ℓ7

and its fully flexible annotated type

∀βf. ∀δ0. ∀β0. (∀β. bool^β --(δ0 β)--> bool^(β0 β))^βf --({(ℓ2, βf)} ∪ δ0 {ℓ1} ∪ {(ℓ6, β0 {ℓ1})} ∪ {(ℓ4, βf)} ∪ δ0 {ℓ3})--> bool^(β0 {ℓ3} ∪ {ℓ5}),

and how this type can be instantiated with the analysis for the Boolean identity function produced by (λx : bool. x)^ℓ8 to yield the desired polyvariant

(∀β. bool^β --{ }--> bool^β)^{ℓ8} --{(ℓ2, ℓ8), (ℓ6, ℓ1), (ℓ4, ℓ8)}--> bool^{ℓ3, ℓ5}.

4.3 Polymorphic Recursion

Being able to associate polymorphic annotated types with lambda-bound variables naturally induces polymorphic recursion (Mycroft 1984) for fixed points. Indeed, as recursive functions are constructed as fixed points fix t1 of terms t1 with higher-order types (τ1 → τ2) → τ1 → τ2 and higher-ranked polymorphism allows for arguments to such t1 to have polymorphic annotated types of the form ∀β. τ̂1^β --ϕ--> τ̂2^ψ, it follows that recursive calls, i.e., uses of its argument by t1, may be analysed polyvariantly rather than monovariantly. As expected, higher-ranked polymorphism gives you polymorphic recursion for free.

5. Flow Analysis with Higher-ranked Types

In this section, we present the details of our type and effect system for flow analysis with higher-ranked polymorphic types.

5.1 Annotations and Effects

We assume to have at our disposal countably infinite sets of annotation variables (ranged over by β) and effect variables (ranged over by δ):

β ∈ AnnVar (annotation variables)
δ ∈ EffVar (effect variables).

Annotations and effects are then given by

ψ ∈ Ann ::= β | { } | {ℓ} | λβ :: s. ψ1 | ψ1 ψ2 | ψ1 ∪ ψ2 (annotations)
ϕ ∈ Eff ::= δ | { } | {(ℓ, ψ)} | λβ :: s. ϕ1 | ϕ1 ψ | λδ :: s. ϕ1 | ϕ1 ϕ2 | ϕ1 ∪ ϕ2 (effects).

Note that annotations ψ may contain annotation abstractions λβ :: s. ψ1 and annotation applications ψ1 ψ2, while effects may contain annotation abstractions λβ :: s. ϕ1 and annotation applications ϕ1 ψ as well as effect abstractions λδ :: s. ϕ1 and effect applications ϕ1 ϕ2. Furthermore, note that abstractions over annotations and effects make mention of sorts

s ∈ Sort (sorts).

That is, to make sure that abstractions and applications in annotations and effects are used in meaningful ways only, we depend on sorts to act as the "types" of annotations and effects. Sorts are constructed from

s ::= ann | eff | s1 → s2,

where ann denotes the sort of proper annotations, eff the sort of proper effects, and s1 → s2 the sort of operators that take annotations or effects of sort s1 to annotations or effects of sort s2. Storing the sorts of free annotation and effect variables in a sort environment Σ,

Σ ∈ SortEnv = (AnnVar ∪ EffVar) →fin Sort (sort environments),

which maps annotation and effect variables to sorts, rules for assigning sorts to annotations and effects can be given as in Figure 3. In Figure 4, we have a collection of rules for definitional equivalence relations between annotations and effects. These rules allow us, when necessary, to treat the ∪-constructor that appears in annotations and effects as a commutative, associative, and idempotent operation with { } as unit, and to consider annotations and effects as equal up to beta-equivalence and distribution of union over flow construction.

5.2 Type and Effect System

The actual type and effect system is defined in terms of rules for deriving judgements of the form

Σ | Γ̂ ⊢ t : τ̂^ψ & ϕ,

expressing that in the sort environment Σ and the annotated type environment Γ̂, the term t can be assigned the annotated type τ̂ as well as the annotation ψ and the effect ϕ. Annotated types are given by

τ̂ ∈ T̂y ::= bool | τ̂1^ψ1 --ϕ--> τ̂2^ψ2 | ∀β :: s. τ̂1 | ∀δ :: s. τ̂1 (annotated types).

Types are considered equal up to alpha-renaming. We require the argument and result annotations ψ1 and ψ2 and the latent effect ϕ in an annotated function type τ̂1^ψ1 --ϕ--> τ̂2^ψ2 to be proper annotations and effects; this requirement is captured by the rules for type well-formedness, listed in Figure 5. We write ⌊τ̂⌋ for the underlying type that is obtained by removing all annotations and effects from the annotated type τ̂. If ⌊τ̂⌋ = τ, we say that τ̂ is a completion of τ.

Annotated type environments Γ̂ map variables to pairs (τ̂, ψ) consisting of an annotated type τ̂ and an annotation ψ:

Γ̂ ∈ T̂yEnv = Var →fin (T̂y × Ann) (annotated type environments).

We write ⌊Γ̂⌋ for the underlying type environment that is obtained by removing all annotations and effects from the annotated type environment Γ̂. The rules for flow typing are given in Figure 6.
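Before turning to the rules, it may help to see this syntax pinned down concretely. The following Haskell rendering of annotations, effects, sorts, and annotated types is our own illustrative sketch, not part of the paper's formal development; all constructor names are invented.

type Label  = String   -- programme-point labels ℓ
type AnnVar = String   -- annotation variables β
type EffVar = String   -- effect variables δ

data Sort = SAnn | SEff | SArr Sort Sort          -- ann | eff | s1 → s2
  deriving (Eq, Show)

data Ann                                          -- annotations ψ
  = AVar AnnVar | AEmpty | ASing Label
  | AAbs AnnVar Sort Ann | AApp Ann Ann | AUnion Ann Ann
  deriving (Eq, Show)

data Eff                                          -- effects ϕ
  = EVar EffVar | EEmpty | ESing Label Ann        -- {(ℓ, ψ)}
  | EAbsAnn AnnVar Sort Eff | EAppAnn Eff Ann
  | EAbsEff EffVar Sort Eff | EAppEff Eff Eff
  | EUnion Eff Eff
  deriving (Eq, Show)

data Ty                                           -- annotated types τ̂
  = TBool
  | TArr Ty Ann Eff Ty Ann                        -- τ̂1^ψ1 --ϕ--> τ̂2^ψ2
  | TAllAnn AnnVar Sort Ty
  | TAllEff EffVar Sort Ty
  deriving (Eq, Show)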
Annotation sorting  Σ ⊢ ψ :: s

[sa-var]    Σ(β) = s  ⟹  Σ ⊢ β :: s
[sa-nil]    Σ ⊢ { } :: ann
[sa-sing]   Σ ⊢ {ℓ} :: ann
[sa-abs]    Σ[β ↦ s1] ⊢ ψ1 :: s2  ⟹  Σ ⊢ λβ :: s1. ψ1 :: s1 → s2
[sa-app]    Σ ⊢ ψ1 :: s2 → s,  Σ ⊢ ψ2 :: s2  ⟹  Σ ⊢ ψ1 ψ2 :: s
[sa-union]  Σ ⊢ ψ1 :: ann,  Σ ⊢ ψ2 :: ann  ⟹  Σ ⊢ ψ1 ∪ ψ2 :: ann

Effect sorting  Σ ⊢ ϕ :: s

[se-var]      Σ(δ) = s  ⟹  Σ ⊢ δ :: s
[se-nil]      Σ ⊢ { } :: eff
[se-sing]     Σ ⊢ ψ :: ann  ⟹  Σ ⊢ {(ℓ, ψ)} :: eff
[se-abs-ann]  Σ[β ↦ s1] ⊢ ϕ1 :: s2  ⟹  Σ ⊢ λβ :: s1. ϕ1 :: s1 → s2
[se-app-ann]  Σ ⊢ ϕ1 :: s2 → s,  Σ ⊢ ψ :: s2  ⟹  Σ ⊢ ϕ1 ψ :: s
[se-abs-eff]  Σ[δ ↦ s1] ⊢ ϕ1 :: s2  ⟹  Σ ⊢ λδ :: s1. ϕ1 :: s1 → s2
[se-app-eff]  Σ ⊢ ϕ1 :: s2 → s,  Σ ⊢ ϕ2 :: s2  ⟹  Σ ⊢ ϕ1 ϕ2 :: s
[se-union]    Σ ⊢ ϕ1 :: eff,  Σ ⊢ ϕ2 :: eff  ⟹  Σ ⊢ ϕ1 ∪ ϕ2 :: eff

Figure 3. Sorting for annotations and effects.

Annotation equivalence  ψ ≡ ψ′

[qa-refl]   ψ ≡ ψ
[qa-symm]   ψ ≡ ψ′  ⟹  ψ′ ≡ ψ
[qa-trans]  ψ ≡ ψ″,  ψ″ ≡ ψ′  ⟹  ψ ≡ ψ′
[qa-abs]    ψ1 ≡ ψ1′  ⟹  λβ :: s. ψ1 ≡ λβ :: s. ψ1′
[qa-app]    ψ1 ≡ ψ1′,  ψ2 ≡ ψ2′  ⟹  ψ1 ψ2 ≡ ψ1′ ψ2′
[qa-union]  ψ1 ≡ ψ1′,  ψ2 ≡ ψ2′  ⟹  ψ1 ∪ ψ2 ≡ ψ1′ ∪ ψ2′
[qa-beta]   (λβ. ψ11) ψ2 ≡ [β ↦ ψ2]ψ11
[qa-unit]   ψ ≡ ψ ∪ { }
[qa-idem]   ψ ≡ ψ ∪ ψ
[qa-comm]   ψ1 ∪ ψ2 ≡ ψ2 ∪ ψ1
[qa-ass]    ψ1 ∪ (ψ2 ∪ ψ3) ≡ (ψ1 ∪ ψ2) ∪ ψ3

Effect equivalence  ϕ ≡ ϕ′

[qe-refl]      ϕ ≡ ϕ
[qe-symm]      ϕ ≡ ϕ′  ⟹  ϕ′ ≡ ϕ
[qe-trans]     ϕ ≡ ϕ″,  ϕ″ ≡ ϕ′  ⟹  ϕ ≡ ϕ′
[qe-sing]      ψ ≡ ψ′  ⟹  {(ℓ, ψ)} ≡ {(ℓ, ψ′)}
[qe-abs-ann]   ϕ1 ≡ ϕ1′  ⟹  λβ :: s. ϕ1 ≡ λβ :: s. ϕ1′
[qe-app-ann]   ϕ1 ≡ ϕ1′,  ψ ≡ ψ′  ⟹  ϕ1 ψ ≡ ϕ1′ ψ′
[qe-abs-eff]   ϕ1 ≡ ϕ1′  ⟹  λδ :: s. ϕ1 ≡ λδ :: s. ϕ1′
[qe-app-eff]   ϕ1 ≡ ϕ1′,  ϕ2 ≡ ϕ2′  ⟹  ϕ1 ϕ2 ≡ ϕ1′ ϕ2′
[qe-union]     ϕ1 ≡ ϕ1′,  ϕ2 ≡ ϕ2′  ⟹  ϕ1 ∪ ϕ2 ≡ ϕ1′ ∪ ϕ2′
[qe-beta-ann]  (λβ. ϕ11) ψ ≡ [β ↦ ψ]ϕ11
[qe-beta-eff]  (λδ. ϕ11) ϕ2 ≡ [δ ↦ ϕ2]ϕ11
[qe-unit]      ϕ ≡ ϕ ∪ { }
[qe-idem]      ϕ ≡ ϕ ∪ ϕ
[qe-comm]      ϕ1 ∪ ϕ2 ≡ ϕ2 ∪ ϕ1
[qe-ass]       ϕ1 ∪ (ϕ2 ∪ ϕ3) ≡ (ϕ1 ∪ ϕ2) ∪ ϕ3
[qe-dist]      {(ℓ, ψ1)} ∪ {(ℓ, ψ2)} ≡ {(ℓ, ψ1 ∪ ψ2)}

Figure 4. Definitional equivalence for annotations and effects.

Well-formedness  Σ ⊢ τ̂ wft

[w-bool]        Σ ⊢ bool wft
[w-arr]         Σ ⊢ τ̂1 wft,  Σ ⊢ ψ1 :: ann,  Σ ⊢ τ̂2 wft,  Σ ⊢ ψ2 :: ann,  Σ ⊢ ϕ :: eff  ⟹  Σ ⊢ τ̂1^ψ1 --ϕ--> τ̂2^ψ2 wft
[w-forall-ann]  Σ[β ↦ s] ⊢ τ̂1 wft  ⟹  Σ ⊢ ∀β :: s. τ̂1 wft
[w-forall-eff]  Σ[δ ↦ s] ⊢ τ̂1 wft  ⟹  Σ ⊢ ∀δ :: s. τ̂1 wft

Figure 5. Type well-formedness.
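The sorting rules of Figure 3 are syntax-directed, so they transcribe directly into a checker. The following sketch is ours, reusing the Sort and Ann types from the earlier representation sketch; effect sorting is entirely analogous and elided.

import qualified Data.Map as Map

type SortEnv = Map.Map String Sort    -- Σ, keyed by variable name

-- Sorting for annotations (rules [sa-*] of Figure 3).
sortOfAnn :: SortEnv -> Ann -> Maybe Sort
sortOfAnn env (AVar b)       = Map.lookup b env                -- [sa-var]
sortOfAnn _   AEmpty         = Just SAnn                       -- [sa-nil]
sortOfAnn _   (ASing _)      = Just SAnn                       -- [sa-sing]
sortOfAnn env (AAbs b s e)   =                                 -- [sa-abs]
  SArr s <$> sortOfAnn (Map.insert b s env) e
sortOfAnn env (AApp e1 e2)   = do                              -- [sa-app]
  SArr s2 s <- sortOfAnn env e1
  s2'       <- sortOfAnn env e2
  if s2 == s2' then Just s else Nothing
sortOfAnn env (AUnion e1 e2) = do                              -- [sa-union]
  SAnn <- sortOfAnn env e1
  SAnn <- sortOfAnn env e2
  Just SAnn

For instance, sortOfAnn Map.empty (AAbs "b" SAnn (ASing "l1")) yields Just (SArr SAnn SAnn), mirroring that λβ :: ann. {ℓ1} is an annotation operator of sort ann → ann.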
Flow analysis  Σ | Γ̂ ⊢ t : τ̂^ψ & ϕ

[f-var]       Γ̂(x) = (τ̂, ψ)  ⟹  Σ | Γ̂ ⊢ x : τ̂^ψ & { }
[f-false]     Σ | Γ̂ ⊢ false^ℓ : bool^{ℓ} & { }
[f-true]      Σ | Γ̂ ⊢ true^ℓ : bool^{ℓ} & { }
[f-abs]       Σ ⊢ τ̂1 wft,  Σ ⊢ ψ1 :: ann,  Σ | Γ̂[x ↦ (τ̂1, ψ1)] ⊢ t1 : τ̂2^ψ2 & ϕ0  ⟹  Σ | Γ̂ ⊢ (λx : ⌊τ̂1⌋. t1)^ℓ : (τ̂1^ψ1 --ϕ0--> τ̂2^ψ2)^{ℓ} & { }
[f-if]        Σ | Γ̂ ⊢ t1 : bool^ψ1 & ϕ1,  Σ | Γ̂ ⊢ t2 : τ̂^ψ & ϕ2,  Σ | Γ̂ ⊢ t3 : τ̂^ψ & ϕ3  ⟹  Σ | Γ̂ ⊢ (if t1 then t2 else t3)^ℓ : τ̂^ψ & ϕ1 ∪ {(ℓ, ψ1)} ∪ ϕ2 ∪ ϕ3
[f-app]       Σ | Γ̂ ⊢ t1 : (τ̂2^ψ2 --ϕ0--> τ̂^ψ)^ψ1 & ϕ1,  Σ | Γ̂ ⊢ t2 : τ̂2^ψ2 & ϕ2  ⟹  Σ | Γ̂ ⊢ (t1 t2)^ℓ : τ̂^ψ & ϕ1 ∪ ϕ2 ∪ {(ℓ, ψ1)} ∪ ϕ0
[f-fix]       Σ | Γ̂ ⊢ t1 : (τ̂^ψ --ϕ0--> τ̂^ψ)^ψ1 & ϕ1  ⟹  Σ | Γ̂ ⊢ (fix t1)^ℓ : τ̂^ψ & ϕ1 ∪ {(ℓ, ψ1)} ∪ ϕ0
[f-gen-ann]   Σ[β ↦ s] | Γ̂ ⊢ t : τ̂1^ψ & ϕ  ⟹  Σ | Γ̂ ⊢ t : (∀β :: s. τ̂1)^ψ & ϕ
[f-inst-ann]  Σ | Γ̂ ⊢ t : (∀β :: s. τ̂1)^ψ & ϕ,  Σ ⊢ ψ0 :: s  ⟹  Σ | Γ̂ ⊢ t : ([β ↦ ψ0]τ̂1)^ψ & ϕ
[f-gen-eff]   Σ[δ ↦ s] | Γ̂ ⊢ t : τ̂1^ψ & ϕ  ⟹  Σ | Γ̂ ⊢ t : (∀δ :: s. τ̂1)^ψ & ϕ
[f-inst-eff]  Σ | Γ̂ ⊢ t : (∀δ :: s. τ̂1)^ψ & ϕ,  Σ ⊢ ϕ0 :: s  ⟹  Σ | Γ̂ ⊢ t : ([δ ↦ ϕ0]τ̂1)^ψ & ϕ
[f-eq]        Σ | Γ̂ ⊢ t : τ̂^ψ′ & ϕ′,  ψ ≡ ψ′,  Σ ⊢ ψ :: ann,  ϕ ≡ ϕ′,  Σ ⊢ ϕ :: eff  ⟹  Σ | Γ̂ ⊢ t : τ̂^ψ & ϕ
[f-sub]       Σ | Γ̂ ⊢ t : τ̂′^ψ1 & ϕ1,  τ̂′ ≤ τ̂,  Σ ⊢ τ̂ wft,  Σ ⊢ ψ2 :: ann,  Σ ⊢ ϕ2 :: eff  ⟹  Σ | Γ̂ ⊢ t : τ̂^(ψ1 ∪ ψ2) & (ϕ1 ∪ ϕ2)

Subtyping  τ̂ ≤ τ̂′

[s-refl]        τ̂ ≤ τ̂
[s-arr]         τ̂1′ ≤ τ̂1,  τ̂2 ≤ τ̂2′,  ψ1 ≡ ψ1′ ∪ ψ1″,  ψ2′ ≡ ψ2 ∪ ψ2″,  ϕ′ ≡ ϕ ∪ ϕ″  ⟹  τ̂1^ψ1 --ϕ--> τ̂2^ψ2 ≤ τ̂1′^ψ1′ --ϕ′--> τ̂2′^ψ2′
[s-forall-ann]  τ̂1 ≤ τ̂1′  ⟹  ∀β :: s. τ̂1 ≤ ∀β :: s. τ̂1′
[s-forall-eff]  τ̂1 ≤ τ̂1′  ⟹  ∀δ :: s. τ̂1 ≤ ∀δ :: s. τ̂1′

Figure 6. Type and effect system for flow analysis.

The rule [f-var] expresses that the annotated type τ̂ and the annotation ψ for a variable x are to be retrieved from the annotated type environment Γ̂. In the call-by-value semantics of our language, the evaluation of a variable does not result in flow; hence, the effect component in the conclusion of rule [f-var] stays empty.

For the Boolean producers false^ℓ and true^ℓ we have axioms [f-false] and [f-true] that assign the annotated type bool and a singleton annotation {ℓ} that reflects the production site ℓ. Producers are already fully evaluated and so no effect is recorded.

Lambda-abstractions (λx : τ. t1)^ℓ are dealt with by the rule [f-abs]. It states that the body t1 of the abstraction is to be analysed in an extended annotated type environment that maps the formal parameter x to the pair (τ̂1, ψ1), where ψ1 is a proper annotation and τ̂1 a possibly polymorphic completion of τ that is well-formed with respect to the sorting environment Σ. While τ̂1 and ψ1 are then used as the argument type and annotation for the abstraction, the annotated type τ̂2 and the annotation ψ2, obtained from the analysis of the body, both end up in result position; the effect ϕ0 of t1 constitutes the latent effect. The annotation and effect for the abstraction as a whole are taken to be {ℓ} and { }, respectively.

The rule for conditionals (if t1 then t2 else t3)^ℓ, [f-if], requires the condition t1 to be of Boolean type and the branches t2 and t3 to agree on their annotated types and annotations, which will then be used as the annotated type and annotation for the conditional itself. The effect for the conditional is constructed by taking the union over the effects of the three subterms and recording that the Boolean values that may flow to the condition t1 are possibly consumed at the site labelled with ℓ.

In the rule [f-app] for applications (t1 t2)^ℓ, the annotated type τ̂2 and the annotation ψ2 of the argument term t2 are to match the argument type and annotation of the function term t1. The annotated type τ̂ and annotation ψ are then retrieved from the result positions in the type of t1. The effect for the application subsumes the effects of its subterms t1 and t2 as well as the possible flow from the function labels ψ1 to the application site ℓ and the latent effect ϕ0 of t1.

For the fixed point (fix t1)^ℓ of a term t1, the annotated type τ̂^ψ is retrieved from the type of t1, which is required to be of the form τ̂^ψ --ϕ0--> τ̂^ψ. The effect component is then constructed by combining the effect ϕ1 of t1, the singleton effect {(ℓ, ψ1)} with ψ1 the annotation of t1, and the latent effect ϕ0 of t1.

The rules [f-gen-ann] and [f-inst-ann] form a pair of introduction and elimination rules for annotation polymorphism. Quantification over an s-sorted annotation is allowed if the corresponding binding in the sort environment admits a valid analysis. Instantiation requires an annotation of appropriate sort to be supplied. Rules [f-gen-eff] and [f-inst-eff] are the analogous rules for effect polymorphism.

The rule [f-eq] expresses that annotations and effects at top level can always be safely replaced by well-sorted definitional equivalents. The rule [f-sub], finally, is a combined rule for subtyping and subeffecting (Tang and Jouvelot 1995) that allows for overapproximation of annotations and effects. This rule is typically used immediately before the rule [f-if] in order to have the branches of a conditional agree on their types and annotations. The rules for subtyping are given in the lower part of Figure 6.
6. Properties

Let us now briefly review the most important metatheoretical properties of our type and effect system.
6.1 Applicability

Our flow analysis is a conservative extension of the underlying type system from Section 3 in the sense that every program typeable in the underlying system can be successfully subjected to the analysis. Furthermore, both systems agree on the shape of the types assignable.

Theorem 1 (Conservative extension).
1. If Γ ⊢ t : τ, then there exist Γ̂, τ̂, ψ, and ϕ with ⌊Γ̂⌋ = Γ and ⌊τ̂⌋ = τ, such that [ ] | Γ̂ ⊢ t : τ̂^ψ & ϕ.
2. If Σ | Γ̂ ⊢ t : τ̂^ψ & ϕ, then ⌊Γ̂⌋ ⊢ t : ⌊τ̂⌋.

6.2 Semantic Correctness

To establish the correctness of the analysis with respect to the instrumented natural semantics from Section 3, we consider interpretations ⟦·⟧ of annotations ψ as sets of labels,

⟦{ }⟧ = {}
⟦{ℓ}⟧ = {ℓ}
⟦ψ1 ∪ ψ2⟧ = ⟦ψ1⟧ ∪ ⟦ψ2⟧,

and of effects ϕ as flows,

⟦{ }⟧ = {}
⟦{(ℓ, ψ)}⟧ = {(ℓ, ℓ′) | ℓ′ ∈ ⟦ψ⟧}
⟦ϕ1 ∪ ϕ2⟧ = ⟦ϕ1⟧ ∪ ⟦ϕ2⟧.

Both interpretations are partial in the sense that they do not account for abstractions, applications, and free variables in annotations and effects. Hence, we only consider closed environments and observe that the type and effect system guarantees all top-level annotations and effects to be proper annotations and effects.

Lemma 2. If [ ] | [ ] ⊢ t : τ̂^ψ & ϕ, then [ ] ⊢ τ̂ wft, [ ] ⊢ ψ :: ann, and [ ] ⊢ ϕ :: eff.

As proper annotations and effects are always definitionally equivalent to forms without abstractions and applications, we can now formulate the following result.

Theorem 3 (Semantic soundness). If [ ] | [ ] ⊢ t : τ̂^ψ & ϕ and t ⇓F p^ℓ, then there exist ψ′ and ϕ′ with ψ ≡ ψ′ and ϕ ≡ ϕ′, such that ℓ ∈ ⟦ψ′⟧ and F ⊆ ⟦ϕ′⟧.
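For proper, closed annotations and effects the two interpretations amount to a pair of straightforward recursive functions. The following Haskell sketch is ours (reusing the Ann, Eff, and Label types from the earlier representation sketch); it makes the partiality explicit by returning Nothing on variables, abstractions, and applications.

import qualified Data.Set as Set

-- ⟦ψ⟧: proper annotations denote sets of labels.
labels :: Ann -> Maybe (Set.Set Label)
labels AEmpty       = Just Set.empty
labels (ASing l)    = Just (Set.singleton l)
labels (AUnion p q) = Set.union <$> labels p <*> labels q
labels _            = Nothing   -- variables, abstractions, applications

-- ⟦ϕ⟧: proper effects denote flows, i.e. sets of label pairs.
flows :: Eff -> Maybe (Set.Set (Label, Label))
flows EEmpty        = Just Set.empty
flows (ESing l p)   = Set.map ((,) l) <$> labels p
flows (EUnion p q)  = Set.union <$> flows p <*> flows q
flows _             = Nothing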
6.3 Existence of "Best" Analyses

While Theorem 1 establishes that all well-typed programs can be analysed, we now wish to state that each analysable program admits an analysis that is in some sense "better" than all other analyses for that program. As we are interested in analyses that guarantee modularity, we shall restrict ourselves to analyses that provide fully flexible types. To this end, let χ range over both annotation and effect variables, together referred to as flow variables,

χ ∈ AnnVar ∪ EffVar (flow variables),

and let us use overbar notation to denote sequences, where we feel free to "downcast" sequences of flow variables to sets of flow variables. We write ε for the empty sequence, ffv(τ̂) and ffv(Γ̂) for the sets of free, i.e., unbound, flow variables in, respectively, an annotated type τ̂ and an annotated type environment Γ̂, and annvars(χ̄i) for the subsequence of annotation variables contained in χ̄i. Then, fully flexible types are defined as follows.

Definition 1. An annotated type τ̂ is fully parametric if
1. τ̂ = bool, or
2. τ̂ = ∀χ̄i :: s̄i. τ̂1^β --(δ0 χ̄i)--> τ̂2^(β0 β̄i′) for some δ0 and β0 with (a) τ̂1 and τ̂2 fully parametric, (b) χ̄i = {β} ∪ ffv(τ̂1), and (c) β̄i′ = annvars(χ̄i).

Definition 2. An annotated type τ̂ is fully flexible if
1. τ̂ = bool, or
2. τ̂ = ∀χ̄i :: s̄i. τ̂1^β --ϕ--> τ̂2^ψ2 for some ϕ and ψ2 with (a) τ̂1 fully parametric, (b) τ̂2 fully flexible, and (c) χ̄i = {β} ∪ ffv(τ̂1).

Note that full parametricity implies full flexibility and how higher-order function types give rise to higher-ranked polymorphism and higher-order operators over annotations and effects. Full flexibility extends naturally to closed type environments.

Definition 3. An annotated type environment Γ̂ is fully flexible if ffv(Γ̂) = { } and if, for all x, τ̂, and ψ with Γ̂(x) = (τ̂, ψ), we have that τ̂ is fully flexible.

Now, in a fully flexible environment, each analysable term admits a fully flexible type.

Lemma 4. If [ ] | Γ̂ ⊢ t : τ̂′^ψ′ & ϕ′ with Γ̂ fully flexible, then there exist τ̂, ψ, and ϕ such that τ̂ is fully flexible and [ ] | Γ̂ ⊢ t : τ̂^ψ & ϕ.

Amongst all possible analyses for a given term in a given environment, we are interested in a fully flexible analysis that makes the most accurate prediction about production sites and flow, i.e., the analysis that results in the "smallest" types, annotations, and effects. As all fully flexible types for a term agree on their negative positions, the notion of a best analysis can be straightforwardly expressed in terms of subtyping and definitional equivalence.

Definition 4. The triple (τ̂, ψ, ϕ) consisting of a fully flexible annotated type τ̂, an annotation ψ, and an effect ϕ constitutes a best analysis for t in Γ̂, if [ ] | Γ̂ ⊢ t : τ̂^ψ & ϕ and if, for all τ̂′, ψ′, and ϕ′ with [ ] | Γ̂ ⊢ t : τ̂′^ψ′ & ϕ′ and τ̂′ fully flexible, we have that τ̂ ≤ τ̂′, ψ′ ≡ ψ ∪ ψ″, and ϕ′ ≡ ϕ ∪ ϕ″ for some ψ″ and ϕ″.

Theorem 5 (Existence of best analyses). If [ ] | Γ̂ ⊢ t : τ̂′^ψ′ & ϕ′ with τ̂′ fully flexible, then there exist τ̂, ψ, and ϕ, such that (τ̂, ψ, ϕ) is a best analysis for t in Γ̂.

7. Algorithm

In this section, we present an inference algorithm for obtaining best analyses. The algorithm naturally breaks up into two parts: a reconstruction algorithm R that produces annotated types, annotations, and effects for terms as well as constraints between flow variables (Section 7.1), and a procedure S for solving the constraints produced by R (Section 7.2). A crucial aspect of the algorithm is that the constraints that are generated for the body of a lambda-abstraction are solved locally, allowing for annotations and effects to be generalised over at the binding sites of formal parameters.

7.1 Flow Reconstruction

The algorithm R for reconstructing types, annotations, and effects is given in Figure 7. It takes as input a pair (Γ̂, t) consisting of an annotated type environment Γ̂ and a term t and produces a quadruple (τ̂, β, δ, C) consisting of an annotated type τ̂, an annotation variable β, an effect variable δ, and a finite set C of constraints over β and δ as well as any intermediate flow variables. Constraints are given by

q ∈ Constraint ::= ψ ⊆ β | ϕ ⊆ δ (constraints)
C ∈ F(Constraint) (constraint sets).

That is, a constraint expresses either the inclusion of an annotation ψ in the annotation represented by the annotation variable β, or the inclusion of an effect ϕ in the effect represented by the effect variable δ.
We carefully maintain the invariant that all annotated types produced are fully flexible.

Turning to the details of the algorithm, the cases for variables and Boolean constants false^ℓ and true^ℓ are straightforward: we generate fresh annotation and effect variables and propagate the relevant information from either the type environment Γ̂ or the producer label ℓ to the result tuple.

More interesting is the case for lambda-abstractions (λx : τ1. t1)^ℓ. Here, we first make a call to the subsidiary procedure C, given in Figure 8, that produces a pair (τ̂1, χ̄i :: s̄i) consisting of a fully parametric (cf. Definition 1) completion τ̂1 of τ1 and a sequence χ̄i :: s̄i that contains the free flow variables of τ̂1 accompanied by their sorts. Then we create a mapping from the formal parameter x to the pair (τ̂1, β1), where β1 is a fresh annotation variable, and use it in a recursive invocation of R for the body t1 of the abstraction. This recursive invocation results in a tuple (τ̂2, β2, δ0, C1). The constraints in C1 are then solved with respect to a finite set of active flow variables X (see Section 7.2),

X ∈ F(AnnVar ∪ EffVar) (flow-variable sets),

to yield a least solution (ψ2, ϕ0) for the flow variables β2 and δ0. An annotated type for the abstraction is then formed by quantifying over the argument annotation variable β1 and the free flow variables χ̄i of the argument type τ̂1; choosing τ̂1 and β1 as argument type and annotation; choosing τ̂2 and ψ2 as result type and annotation; and choosing ϕ0 as latent effect. For the annotation and effect of the abstraction as a whole, we pick fresh variables β and δ and record that ℓ is to be included in a solution for β.

For conditionals (if t1 then t2 else t3)^ℓ we make recursive calls to R for all three subterms. The constraint sets C1, C2, and C3 thus obtained are then combined with the constraints that account for the flow involved in evaluating a conditional to form the constraint set C for the conditional as a whole. The annotated type τ̂ for the conditional is obtained by taking the least upper bound of the recursively obtained types τ̂2 and τ̂3 with respect to the subtyping relation of Figure 6. This least upper bound is computed by the join algorithm J in Figure 9. Note how J makes essential use of the invariant that all types are fully flexible (and that the types to join thus agree in their argument positions) as well as of the fact that types are to be considered equal up to alpha-renaming (in the cases for quantified types).

In the case for applications (t1 t2)^ℓ, we make recursive calls to R for the function term t1 and the argument term t2. The annotated type τ̂1 thus obtained for t1, which our invariant guarantees to be fully flexible, is then instantiated by means of a call to the auxiliary procedure I (Figure 10), from which we retrieve the fully parametric parameter type τ̂2′ and the parameter annotation β2′. Against these we then match the actual argument type τ̂2 and the actual argument annotation β2, resulting in a substitution θ,

θ ∈ Subst (substitutions).

For the matching of τ̂2 against τ̂2′ we rely on a subsidiary procedure M, given in Figure 11. The substitution θ is used to determine the annotated type of the application as a whole from the result type τ̂′ of t1. For the annotation and the effect of the application, we generate fresh variables β and δ, and in the constraint set C we include the constraints obtained for t1 and t2 as well as the constraints that are obtained by considering the flow incurred by the application.

Finally, the case for fixed points (fix t1)^ℓ is similar to the case for applications, with the most important difference that a substitution is constructed in two steps here. First, a substitution θ1 is constructed by matching the result type of t1 against its fully parametric parameter type. Then, the "recursive knot is tied" by substituting the result annotation for the annotation variable β′ that constitutes the parameter annotation.

R(Γ̂, x) =
  let (τ̂, ψ) = Γ̂(x)
      β, δ be fresh
  in (τ̂, β, δ, {ψ ⊆ β})

R(Γ̂, false^ℓ) = let β, δ be fresh in (bool, β, δ, {{ℓ} ⊆ β})
R(Γ̂, true^ℓ)  = let β, δ be fresh in (bool, β, δ, {{ℓ} ⊆ β})

R(Γ̂, (λx : τ1. t1)^ℓ) =
  let (τ̂1, χ̄i :: s̄i) = C(τ1, ε)
      β1 be fresh
      (τ̂2, β2, δ0, C1) = R(Γ̂[x ↦ (τ̂1, β1)], t1)
      X = {β1} ∪ {χ̄i} ∪ ffv(Γ̂)
      (ψ2, ϕ0) = S(C1, X, β2, δ0)
      τ̂ = ∀β1 :: ann. ∀χ̄i :: s̄i. τ̂1^β1 --ϕ0--> τ̂2^ψ2
      β, δ be fresh
  in (τ̂, β, δ, {{ℓ} ⊆ β})

R(Γ̂, (if t1 then t2 else t3)^ℓ) =
  let (bool, β1, δ1, C1) = R(Γ̂, t1)
      (τ̂2, β2, δ2, C2) = R(Γ̂, t2)
      (τ̂3, β3, δ3, C3) = R(Γ̂, t3)
      τ̂ = J(τ̂2, τ̂3)
      β, δ be fresh
      C = {δ1 ⊆ δ} ∪ {{(ℓ, β1)} ⊆ δ} ∪ {δ2 ⊆ δ} ∪ {δ3 ⊆ δ} ∪ {β2 ⊆ β} ∪ {β3 ⊆ β} ∪ C1 ∪ C2 ∪ C3
  in (τ̂, β, δ, C)

R(Γ̂, (t1 t2)^ℓ) =
  let (τ̂1, β1, δ1, C1) = R(Γ̂, t1)
      (τ̂2, β2, δ2, C2) = R(Γ̂, t2)
      τ̂2′^β2′ --ϕ0′--> τ̂′^ψ′ = I(τ̂1)
      θ = [β2′ ↦ β2] ∘ M([ ], τ̂2, τ̂2′)
      β, δ be fresh
      C = {δ1 ⊆ δ} ∪ {δ2 ⊆ δ} ∪ {{(ℓ, β1)} ⊆ δ} ∪ {θ ϕ0′ ⊆ δ} ∪ {θ ψ′ ⊆ β} ∪ C1 ∪ C2
  in (θ τ̂′, β, δ, C)

R(Γ̂, (fix t1)^ℓ) =
  let (τ̂1, β1, δ1, C1) = R(Γ̂, t1)
      τ̂′^β′ --ϕ0′--> τ̂″^ψ″ = I(τ̂1)
      θ1 = M([ ], τ̂″, τ̂′)
      θ2 = [β′ ↦ θ1 ψ″]
      β, δ be fresh
      C = {δ1 ⊆ δ} ∪ {{(ℓ, β1)} ⊆ δ} ∪ {θ2 (θ1 ϕ0′) ⊆ δ} ∪ {θ2 (θ1 ψ″) ⊆ β} ∪ C1
  in (θ2 (θ1 τ̂′), β, δ, C)

Figure 7. Reconstruction algorithm.

C(bool, χ̄i :: s̄i) = (bool, ε)

C(τ1 → τ2, χ̄i :: s̄i) =
  let (τ̂1, χ̄j :: s̄j) = C(τ1, ε)
      β1 be fresh
      (τ̂2, χ̄k :: s̄k) = C(τ2, (χ̄i :: s̄i, β1 :: ann, χ̄j :: s̄j))
      β̄i′ :: s̄i′ = annvars(χ̄i :: s̄i)
      β̄j′ :: s̄j′ = annvars(χ̄j :: s̄j)
      β0, δ0 be fresh
  in (∀β1 :: ann. ∀χ̄j :: s̄j. τ̂1^β1 --(δ0 χ̄i β1 χ̄j)--> τ̂2^(β0 β̄i′ β1 β̄j′),
      (δ0 :: s̄i → ann → s̄j → eff, β0 :: s̄i′ → ann → s̄j′ → ann, χ̄k :: s̄k))

Figure 8. Completion algorithm.

J(bool, bool) = bool
J(τ̂1^β1 --ϕ1--> τ̂12^ψ12, τ̂1^β1 --ϕ2--> τ̂22^ψ22) = τ̂1^β1 --(ϕ1 ∪ ϕ2)--> J(τ̂12, τ̂22)^(ψ12 ∪ ψ22)
J(∀β :: s. τ̂11, ∀β :: s. τ̂21) = ∀β :: s. J(τ̂11, τ̂21)
J(∀δ :: s. τ̂11, ∀δ :: s. τ̂21) = ∀δ :: s. J(τ̂11, τ̂21)
J(τ̂1, τ̂2) = fail    (in all other cases)

Figure 9. Join algorithm.
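Because fully flexible types for the same underlying type agree on their argument positions, the join of Figure 9 only ever has to combine latent effects and result annotations. The following Haskell transcription is ours, over the Ty representation sketched in Section 5; it approximates "equal up to alpha-renaming" by syntactic equality of binders, and Nothing models fail.

-- Join of two annotated types (Figure 9).
join :: Ty -> Ty -> Maybe Ty
join TBool TBool = Just TBool
join (TArr t1 b1 f1 t12 a12) (TArr t1' b1' f2 t22 a22)
  | t1 == t1' && b1 == b1'           -- fully flexible types agree here
  = do t <- join t12 t22
       Just (TArr t1 b1 (EUnion f1 f2) t (AUnion a12 a22))
join (TAllAnn b s t1) (TAllAnn b' s' t2)
  | b == b' && s == s'               -- binders assumed alpha-renamed apart
  = TAllAnn b s <$> join t1 t2
join (TAllEff d s t1) (TAllEff d' s' t2)
  | d == d' && s == s'
  = TAllEff d s <$> join t1 t2
join _ _ = Nothing                   -- all other cases fail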
I(∀β :: s. τ̂1) = let β′ be fresh in [β ↦ β′](I(τ̂1))
I(∀δ :: s. τ̂1) = let δ′ be fresh in [δ ↦ δ′](I(τ̂1))
I(τ̂) = τ̂    (in all other cases)

Figure 10. Instantiation algorithm.

M(Σ, bool, bool) = id
M(Σ, τ̂1^β1 --ϕ--> τ̂2^ψ2, τ̂1^β1 --(δ0 χ̄i)--> τ̂2′^(β0 β̄j)) =
  [δ0 ↦ (λχ̄i :: Σ(χ̄i). ϕ)] ∘ [β0 ↦ (λβ̄j :: Σ(β̄j). ψ2)] ∘ M(Σ, τ̂2, τ̂2′)
M(Σ, ∀β :: s. τ̂1, ∀β :: s. τ̂1′) = M(Σ[β ↦ s], τ̂1, τ̂1′)
M(Σ, ∀δ :: s. τ̂1, ∀δ :: s. τ̂1′) = M(Σ[δ ↦ s], τ̂1, τ̂1′)
M(Σ, τ̂, τ̂′) = fail    (in all other cases)

Figure 11. Matching algorithm.

7.2 Constraint Solving

For solving the constraints produced by the reconstruction algorithm R, we rely on a standard worklist algorithm. This algorithm, S, is given in Figure 12. As inputs it takes a constraint set C, a set of active flow variables X that are to be considered as constants during solving, an annotation variable β, and an effect variable δ. As outputs it produces least solutions ψ and ϕ for β and δ under C. During solving there is no need to explicitly distinguish between annotation constraints and effect constraints. Therefore we take

ξ ∈ Ann ∪ Eff (flow terms)

and write all constraints as ξ ⊆ χ. The algorithm maintains a finite set worklist for keeping track of constraints that are still to be considered. Furthermore, it uses a finite map analysis from flow variables to flow terms, in which intermediate solutions for β, δ, and the flow variables in X and the right-hand sides of C are kept; and a finite map dependencies that stores, for each flow variable χ, which constraints need to be reconsidered if the solution for χ is updated.

After initialisation of the worklist set and the finite maps, the algorithm proceeds by considering constraints from the worklist as long as these are available. In each iteration a constraint is selected and tested for satisfaction. Here, we use the finite map analysis as a substitution and write analysis ξ for the interpretation of the flow term ξ under the substitution provided by analysis. If a constraint is found unsatisfied, we update the solution for its right-hand-side flow variable χ and add all dependent constraints to the worklist. If the worklist is empty, the algorithm produces a pair consisting of the solutions for the flow variables β and δ. These are then guaranteed to consist of flow terms that, apart from applications and abstractions, are exclusively constructed from concrete labels and the flow variables from X.

S(C, X, β, δ) = do
  (* initialisation *)
  worklist := { }
  analysis := [ ]
  dependencies := [ ]
  for all (ξ ⊆ χ) in C do
    worklist := worklist ∪ {ξ ⊆ χ}
    analysis := analysis[χ ↦ { }]
    for all ξ′ in ffv(ξ) do
      dependencies := dependencies[ξ′ ↦ { }]
  for all (ξ ⊆ χ) in C do
    for all ξ′ in ffv(ξ) do
      dependencies := dependencies[ξ′ ↦ dependencies(ξ′) ∪ {ξ ⊆ χ}]
  for all χ in X do
    analysis := analysis[χ ↦ χ]
  analysis := analysis[β ↦ { }][δ ↦ { }]
  (* iteration *)
  while worklist ≠ { } do
    let C1 ⊎ {ξ ⊆ χ} = worklist in do
      worklist := C1
      if (analysis ξ) ⊈ analysis(χ) then do
        analysis := analysis[χ ↦ analysis(χ) ∪ (analysis ξ)]
        for all q in dependencies[χ] do
          worklist := worklist ∪ {q}
  (* finalisation *)
  return (analysis(β), analysis(δ))

Figure 12. Worklist algorithm for constraint solving.
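The iteration in Figure 12 is the standard monotone worklist scheme. As a runnable illustration — ours, and drastically simplified: it handles only the annotation fragment and represents intermediate solutions as label sets rather than flow terms — consider:

import qualified Data.Map as Map
import qualified Data.Set as Set

-- Simplified flow terms: unions of label sets and flow variables.
data Term = Lit (Set.Set String) | Var String | Union Term Term
type Constraint = (Term, String)            -- ξ ⊆ χ

eval :: Map.Map String (Set.Set String) -> Term -> Set.Set String
eval _ (Lit s)     = s
eval m (Var v)     = Map.findWithDefault Set.empty v m
eval m (Union a b) = eval m a `Set.union` eval m b

vars :: Term -> [String]
vars (Lit _)     = []
vars (Var v)     = [v]
vars (Union a b) = vars a ++ vars b

-- Least solution of a constraint set by worklist iteration.
solve :: [Constraint] -> Map.Map String (Set.Set String)
solve cs = go cs Map.empty
  where
    deps v = [c | c@(t', _) <- cs, v `elem` vars t']  -- dependencies(v)
    go [] m = m
    go ((t, v) : wl) m
      | new `Set.isSubsetOf` old = go wl m            -- already satisfied
      | otherwise = go (deps v ++ wl)                 -- requeue dependants
                       (Map.insert v (old `Set.union` new) m)
      where old = Map.findWithDefault Set.empty v m
            new = eval m t

For instance, solve [(Lit (Set.singleton "l1"), "b"), (Var "b", "d")] maps both b and d to the label set {l1}.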
7.3 Syntactic Correctness

A trivial observation about the completion algorithm from Figure 8 with respect to the definitions from Section 6 is the following:

Lemma 6. For all types τ, there is a fully parametric τ̂, such that C(τ, ε) = τ̂.

Now, the correctness of both the reconstruction algorithm from Figure 7 and the worklist algorithm from Figure 12 with respect to the type and effect system from Section 5 comes in two parts. First, we have that each analysis produced by the algorithm is indeed admitted by the flow-typing rules of Figure 6.

Theorem 7 (Syntactic soundness). If we have that R(Γ̂, t) = (τ̂, β, δ, C) and S(C, { }, β, δ) = (ψ, ϕ) for a fully flexible Γ̂, then [ ] | Γ̂ ⊢ t : τ̂^ψ & ϕ.

Second, we have that the algorithm produces best analyses for all analysable terms. This result depends crucially on the invariant maintained by the reconstruction algorithm, i.e., that R always produces fully flexible types. In particular, we have that the join algorithm from Figure 9 will not fail if it is invoked with two fully flexible completions of a single underlying type.
Lemma 8. If τ̂1 and τ̂2 are fully flexible with ⌊τ̂1⌋ = ⌊τ̂2⌋ = τ for some τ, then J(τ̂1, τ̂2) = τ̂ with ⌊τ̂⌋ = τ.

Similarly, the matching algorithm from Figure 11 is guaranteed to succeed when invoked with one fully flexible and one fully parametric completion of the same underlying type:

Lemma 9. If τ̂ is fully flexible and τ̂′ fully parametric with ⌊τ̂⌋ = ⌊τ̂′⌋, then M([ ], τ̂, τ̂′) = θ with θ τ̂′ ≡ τ̂.

Theorem 10 (Syntactic completeness). If [ ] | Γ̂ ⊢ t : τ̂′^ψ′ & ϕ′ with Γ̂ fully flexible, then there are τ̂, β, δ, C, ψ, and ϕ with R(Γ̂, t) = (τ̂, β, δ, C) and S(C, { }, β, δ) = (ψ, ϕ) and (τ̂, ψ, ϕ) a best analysis for t in Γ̂.

8. Related Work

Early approaches to flow analysis for higher-order languages, e.g., the closure analysis of Sestoft (1991) and the set-based analysis of Heintze (1994), were monovariant, allowing only a single, context-insensitive analysis result to be associated with each of a program's functions. Later work resulted in polyvariant analyses that allow the analysis results associated with at least some identifiers in a program to be context-sensitive; examples include Shivers' k-CFA (1991) and Nielson and Nielson's infinitary analysis (1997). Polymorphic type and effect systems for flow analysis, such as Fähndrich's (2008), typically restrict polyvariant analysis results to be associated with let-bound identifiers only, leaving function parameters to be analysed monovariantly. Exceptions are the approaches of Faxén (1997) and Smith and Wang (2000), who also present polymorphic type and effect systems for flow analysis that allow function parameters to be analysed polyvariantly rather than monovariantly. The most important difference between our approach and both the approach of Faxén and that of Smith and Wang is that, while we propose a single analysis, Faxén and Smith and Wang investigate families of constraint systems parameterised over inference strategies; the choices of strategies that lead to decidable analyses in their systems are rather ad hoc. Furthermore, the look-and-feel of the systems of Faxén and Smith and Wang differs significantly from ours, as we are, to the best of our knowledge, the first to consider the use of first-class operators on effects and annotations. Gustavsson and Svenningsson (2001) propose constrained type schemes that show a superficial similarity to ours, but do not allow quantification over effect operators; moreover, they do not allow type schemes to be associated with lambda-bound identifiers.

An important class of type-based flow analyses makes use of intersection types rather than polymorphic types. In general, intersection types allow for more fine-grained analysis results than polymorphic types (Wells et al. 2002). Kfoury and Wells (1999) show that inference is decidable if analyses are restricted to intersection types of finite rank. Their inference algorithm makes essential use of so-called expansion variables and is arguably much more complicated than the one we give for our analysis in Section 7. Banerjee and Jensen (2003) demonstrate that the restriction to rank-2 intersection types allows for a simpler algorithm, but only at the expense of decreased precision, while Mossin (2003) proceeds in the opposite direction and shows that exact flow analyses can be obtained at the expense of a nonelementary recursive inference problem.

A major advantage of the use of intersection types is that they admit principal typings rather than mere principal types (Jim 1996). As type systems with principal typings allow terms to be typed independently from the types of their free variables, analyses based on intersection typing are even more modular than systems with just principal types. Our type and effect system does not admit principal typings, but, interestingly, in practice, the same level of modularity can be achieved as for systems with intersection types. That is, if, for a given term, the underlying types of its free variables are given, rather than their annotated types, an analysis can be computed for which the best analysis for that term in any given annotated type environment is a substitution instance. More precisely, if for a given term t, we are given an underlying type environment Γ, such that Γ ⊢ t : τ for some type τ, then Σ, Γ̂, τ̂, ψ, and ϕ can be computed, such that Σ | Γ̂ ⊢ t : τ̂^ψ & ϕ with ⌊Γ̂⌋ = Γ and ⌊τ̂⌋ = τ, and, moreover, for each fully flexible Γ̂′ with ⌊Γ̂′⌋ = Γ, there is a computable substitution θ mapping annotation variables to annotations and effect variables to effects, such that (θ τ̂, θ ψ, θ ϕ) is a best analysis for t in Γ̂′. The idea is to first tentatively "guess" a fully parametric completion of the given underlying type environment and then, as flow inference proceeds, to gradually adapt this completion by "growing" a substitution on flow variables. Then, effectively, our type and effect system admits, in a sense, principal typings, but only as far as annotations and effects are concerned. For practical purposes, this suffices, because, as real-world higher-order functional languages are typically based on the Damas-Milner typing discipline, which itself does not admit principal typings, underlying type environments can be expected to be available for all terms under analysis.

The increased precision obtained from the use of polymorphic recursion in type-based analyses, as realised by Dussart et al. (1995), is reported on by several authors, including Henglein and Mossin (1994) and Tofte and Talpin (1994). To the best of our knowledge, we are the first to consider the generalisation to polymorphic types for all function arguments rather than for just those of functions from which fixed points are obtained.

9. Conclusions and Further Work

In this paper, we have presented a type and effect system for flow analysis with higher-ranked polymorphic types and higher-order effect operators. This system allows us to attain precision beyond what is offered by the ML-style let-polymorphic types that are typically used in polymorphic effect systems. The key innovation of our work is the use of fully flexible types, i.e., types that are as polymorphic as possible but impose no restrictions on the arguments that can be passed to functions. Given fully flexible types for all free variables, our analysis, which is a conservative extension of the standard Damas-Milner typing discipline, admits "best analyses" for all programs analysable: such analyses are both precise and modular.

Our analysis distinguishes between producers and consumers. In the present paper we have focused on producers and consumers of Boolean and function values, but our approach applies to other data types as well. In particular, although the details are syntactically rather heavy, our analysis can be extended to user-defined, algebraic data types, as found in modern functional languages such as Haskell and ML. Accounting for the use of let-polymorphism in the underlying type system is largely an orthogonal issue.

The flow analysis presented in this paper is a typical forward analysis: we keep track of the flow from producers to consumers. As future work—and as part of our research agenda to develop a reusable framework that can be used to construct precise and modular type and effect systems, much like monotone frameworks (Kam and Ullman 1977) are used to construct data-flow analyses—we aim at formulating a backward variation of our analysis, in which we keep track, for each production site, of the program points at which constructed values are consumed.

Many static analyses for higher-order languages can, in a type-based formulation, be expressed as variations on flow analysis. We expect our approach to be of value to these analyses as well and, hence, we plan to define higher-ranked polymorphic type and effect systems for analyses such as binding-time analysis, strictness
analysis, and usage analysis, and to compare the results obtained with those from existing let-polymorphic systems. If a polyvariant type-based analysis is used to drive an optimising program transformation, a trade-off arises between the modularity of the analysis and the effectiveness of the transformation. For let-polymorphism, this trade-off may be resolved by differentiating between local and global let-bound identifiers (Holdermans and Hage 2010). For higher-ranked polymorphism, a similar measure may be in order, i.e., to obtain more effective transformations, selected lambda-bound identifiers may have to receive non-fully-parametric types. Investigating how the algorithm of Section 7 can be adapted to such scenarios is a challenging but nevertheless appealing direction for further work. Finally, characterising the difference in expressiveness and the trade-offs in implementation techniques between our analysis and systems based on intersection types of various ranks promises to be an interesting topic for further research.
Acknowledgments

This work was supported by the Netherlands Organisation for Scientific Research through its project on "Scriptable Compilers" (612.063.406) and carried out while the first author was employed at Utrecht University. The authors would like to thank Arie Middelkoop and Jeroen Weijers for their helpful comments on a draft of this paper, and the anonymous reviewers for their insightful feedback on the submitted version.

References

Anindya Banerjee and Thomas P. Jensen. Modular control-flow analysis with rank 2 intersection types. Mathematical Structures in Computer Science, 13(1):87–124, 2003.

Luís Damas and Robin Milner. Principal type-schemes for functional programs. In Conference Record of the Ninth Annual ACM Symposium on Principles of Programming Languages, Albuquerque, New Mexico, January 1982, pages 207–212. ACM Press, 1982.

Dirk Dussart, Fritz Henglein, and Christian Mossin. Polymorphic recursion and subtype qualifications: Polymorphic binding-time analysis in polynomial time. In Alan Mycroft, editor, Static Analysis, Second International Symposium, SAS'95, Glasgow, UK, September 27, 1995, Proceedings, volume 983 of Lecture Notes in Computer Science, pages 118–135. Springer-Verlag, 1995.

Manuel Fähndrich and Jakob Rehof. Type-based flow analysis and context-free language reachability. Mathematical Structures in Computer Science, 18(5):823–894, 2008.

Karl-Filip Faxén. Polyvariance, polymorphism and flow analysis. In Mads Dam, editor, Analysis and Verification of Multiple-Agent Languages, 5th LOMAPS Workshop, Stockholm, Sweden, June 24–26, 1996, Selected Papers, volume 1192 of Lecture Notes in Computer Science, pages 260–278. Springer-Verlag, 1997.

Jörgen Gustavsson and Josef Svenningsson. Constraint abstractions. In Olivier Danvy and Andrzej Filinski, editors, Programs as Data Objects, Second Symposium, PADO 2001, Aarhus, Denmark, May 21–23, 2001, Proceedings, volume 2053 of Lecture Notes in Computer Science, pages 63–83. Springer-Verlag, 2001.

Nevin Heintze. Set-based analysis of ML programs. In Proceedings of the 1994 ACM Conference on LISP and Functional Programming, Orlando, Florida, USA, 27–29 June 1994, pages 306–317. ACM Press, 1994.

Fritz Henglein and Christian Mossin. Polymorphic binding-time analysis. In Donald Sannella, editor, Programming Languages and Systems, ESOP'94, 5th European Symposium on Programming, Edinburgh, U.K., April 11–13, 1994, Proceedings, volume 788 of Lecture Notes in Computer Science, pages 287–301. Springer-Verlag, 1994.

Stefan Holdermans and Jurriaan Hage. On the rôle of minimal typing derivations in type-driven program transformation, 2010. To appear in the proceedings of the 10th Workshop on Language Descriptions, Tools, and Applications (LDTA 2010), Paphos, Cyprus, 27–28 March 2010.

Trevor Jim. What are principal typings and what are they good for? In Conference Record of POPL'96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Papers Presented at the Symposium, St. Petersburg Beach, Florida, 21–24 January 1996, pages 42–53. ACM Press, 1996.

John B. Kam and Jeffrey D. Ullman. Monotone data flow analysis frameworks. Acta Informatica, 7:305–317, 1977.

Assaf J. Kfoury and Jerzy Tiuryn. Type reconstruction in finite rank fragments of the second-order λ-calculus. Information and Computation, 98(2):228–257, 1992.

Assaf J. Kfoury and Joe B. Wells. Principality and decidable type inference for finite-rank intersection types. In POPL '99, Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 20–22, 1999, San Antonio, TX, pages 161–174. ACM Press, 1999.

Christian Mossin. Exact flow analysis. Mathematical Structures in Computer Science, 13(1):125–156, 2003.

Alan Mycroft. Polymorphic type schemes and recursive definitions. In Manfred Paul and Bernard Robinet, editors, International Symposium on Programming, 6th Colloquium, Toulouse, April 17–19, 1984, Proceedings, volume 167 of Lecture Notes in Computer Science, pages 217–228. Springer-Verlag, 1984.

Flemming Nielson and Hanne Riis Nielson. Type and effect systems. In Ernst-Rüdiger Olderog and Bernhard Steffen, editors, Correct System Design, Recent Insight and Advances, volume 1710 of Lecture Notes in Computer Science, pages 114–136. Springer-Verlag, 1999.

Hanne Riis Nielson and Flemming Nielson. Infinitary control flow analysis: A collecting semantics for closure analysis. In Conference Record of POPL'97: The 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Papers Presented at the Symposium, Paris, France, 15–17 January 1997, pages 332–345. ACM Press, 1997.

Jens Palsberg. Type-based analysis and applications. In Proceedings of the 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, PASTE'01, Snowbird, Utah, USA, June 18–19, 2001, pages 20–27. ACM Press, 2001.

Peter Sestoft. Analysis and Efficient Implementation of Functional Languages. PhD thesis, University of Copenhagen, 1991.

Olin Shivers. Control-flow Analysis of Higher-Order Languages. PhD thesis, Carnegie Mellon University, 1991.

Scott F. Smith and Tiejun Wang. Polyvariant flow analysis with constrained types. In Gert Smolka, editor, Programming Languages and Systems, 9th European Symposium on Programming, ESOP 2000, Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS 2000, Berlin, Germany, March 25–April 2, 2000, Proceedings, volume 1782 of Lecture Notes in Computer Science, pages 382–396. Springer-Verlag, 2000.

Yan Mei Tang and Pierre Jouvelot. Effect systems with subtyping. In Proceedings of the ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation, La Jolla, California, USA, June 21–23, 1995, pages 45–53. ACM Press, 1995.

Mads Tofte and Jean-Pierre Talpin. Implementation of the typed call-by-value λ-calculus using a stack of regions. In Conference Record of POPL'94: 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Portland, Oregon, January 17–21, 1994, pages 188–201. ACM Press, 1994.

Keith Wansbrough and Simon Peyton Jones. Once upon a polymorphic type. In POPL '99, Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 20–22, 1999, San Antonio, TX, pages 15–28. ACM Press, 1999.

Joe B. Wells, Allyn Dimock, Robert Muller, and Franklyn A. Turbak. A calculus with polymorphic and polyvariant flow types. Journal of Functional Programming, 12(3):183–227, 2002.
The Reduceron Reconfigured

Matthew Naylor    Colin Runciman
University of York, UK
{mfn,colin}@cs.york.ac.uk

Abstract

The leading implementations of graph reduction all target conventional processors designed for low-level imperative execution. In this paper, we present a processor specially designed to perform graph reduction. Our processor – the Reduceron – is implemented using off-the-shelf reconfigurable hardware. We highlight the low-level parallelism present in sequential graph reduction, and show how parallel memories and dynamic analyses are used in the Reduceron to achieve an average reduction rate of 0.55 function applications per clock-cycle.

Categories and Subject Descriptors C.1.3 [Processor Architectures]: Other Architecture Styles—High-level language architectures; D.3.4 [Programming Languages]: Processors—Run-time environments; I.1.3 [Symbolic and Algebraic Manipulation]: Languages and Systems—Special-Purpose Hardware

General Terms Design, Experimentation, Performance

Keywords Graph Reduction, Reconfigurable Hardware

1. Introduction

Efficient evaluation of high-level functional programs on conventional computers is a big challenge. Sophisticated techniques are needed to exploit architectural features designed for low-level imperative execution. Furthermore, conventional computers have limitations when it comes to running functional programs. For example, memory bandwidth is limited to serial communication in small units. Evaluators based on graph reduction perform intensive construction and deconstruction of expressions in memory. Each such operation requires sequential execution of many machine instructions, not because of any inherent data dependencies, but because of architectural constraints in conventional computers.

All this motivates the idea of computers specially designed to meet the needs of high-level functional languages – much as GPUs are designed to meet needs in graphics. This is not a new idea. In the '80s and '90s there was a 15-year ACM conference series Functional Programming Languages and Computer Architecture. In separate initiatives, there was an entire workshop concerned with graph-reduction machines alone [Fasel and Keller 1987], and a major computer manufacturer built a graph-reduction prototype [Scheevel 1986]. But the process of constructing exotic new hardware was slow and uncertain. With major advances in compilation for ever bigger, faster and cheaper mass-market machines, the idea of specialised hardware for functional languages went out of fashion.

Reconfigurable Hardware   Today, the situation is quite different. Field-programmable gate arrays (FPGAs) have greatly reduced the effort and expertise needed to develop special-purpose hardware. They contain thousands of parallel logic blocks that can be configured at will by software tools. They are widely available and are an advancing technology that continues to offer improved performance and capacity. The downside of FPGA applications is that they typically have much lower maximum clocking frequencies than corresponding directly-fabricated circuits – this is the price to pay for reconfigurability. To obtain good performance using an FPGA, it is therefore necessary to exploit significant parallelism.

The Reduceron   In this paper, we present a special-purpose machine for sequential graph reduction – the Reduceron – implemented on an FPGA. We build upon our previous work on the same topic [Naylor and Runciman 2007] by presenting a new design that exhibits a factor-of-five performance improvement. A notable feature of our new design is that each of its six semantic reduction rules is performed in a single clock-cycle. All the memory transactions required to perform a reduction are done in parallel. The Reduceron performs on average 0.55 hand-reductions per clock-cycle. A hand-reduction is a reduction that a programmer would perform in a by-hand evaluation trace of a program; it includes function application and case analysis, but not machine-level reductions such as updating and unwinding. Another notable development in our new design is the use of two dynamic analyses enabling update avoidance and speculative evaluation of primitive redexes, both of which lead to significant performance improvements. On conventional computers, the run-time overhead of these dynamic analyses would be prohibitive, but on FPGA they are cheap and simple to implement.

Contributions   In summary, we give:

§2 a precise description of the Reduceron compiler, including refinements to the Scott encoding of constructors, used for compiling case expressions, addressing various efficiency concerns;

§3 an operational semantics of the template instantiation machine underpinning the Reduceron implementation;

§4 a detailed description of how each semantic reduction rule is implemented in a single clock-cycle using an FPGA;

§5 extensions to the semantics to support (1) dynamic sharing analysis, used to avoid unnecessary heap updates, and (2) dynamic detection of primitive redexes, enabling speculative reduction of such expressions during function-body instantiation;
§6 a comparative evaluation of the Reduceron implementation against other functional language implementations.
e ::= ~e                  (Application)
   |  case e of { ~a }    (Case Expression)
   |  let { ~b } in e     (Let Expression)
   |  n                   (Integer)
   |  x                   (Variable)
   |  p                   (Primitive)
   |  f                   (Function)
   |  C                   (Constructor)
   |  ⟨~f⟩                (Case Table)

a ::= C ~x -> e           (Case Alternative)
b ::= x = e               (Let Binding)
d ::= f ~x = e            (Function Definition)

Figure 1. Core syntax of F-lite.

2. Compilation

This section defines a series of refinements that take programs written in a lazy functional language called F-lite to a form known as template code which the Reduceron can execute.

2.1 Source Language

F-lite is a core lazy functional language, close to subsets of both Haskell and Clean. The syntax of F-lite is presented in Figure 1.

Case Expressions   Case expressions are in a simplified form that can be produced by a pattern-match compiler such as that defined in [Peyton Jones 1987]. Patterns in case alternatives are constructors applied to zero or more variables. All case expressions contain an alternative for every constructor of the case subject's type.

Primitives   The meta-variable p denotes a primitive function symbol. All applications of primitive functions are fully saturated. The Reduceron implements only a small set of primitive operations, not the full set of a conventional processor – e.g. we have no floating-point operations. Primitives used in this paper include: (+), (-) and (<=).

Main   Every program contains a definition main = e where e is an expression that evaluates to an integer n; the result of the program is the value n.

Case Tables   Notice the unusual case-table construct ⟨~f⟩. Case tables are introduced during compilation – see §2.4.

Examples   Here are two example function definitions. The first concatenates two lists and the second computes triangular numbers.

append xs ys = case xs of {
  Nil -> ys ;
  Cons x xs -> Cons x (append xs ys) }

tri n = case (<=) n 1 of {
  False -> (+) (tri ((-) n 1)) n ;
  True -> 1 }

2.2 Terminology

Application Length   The length of an application e1 · · · en is n. For example, the length of the application append xs ys is three.

Compound and Atomic Expressions   Applications, case expressions and let expressions are compound expressions. All other expressions are atomic.

Flat Expression   A flat expression is an atomic expression or an application e1 · · · en in which each ei for i in 1 · · · n is an atomic expression. For example, append xs ys is a flat expression, but tri ((-) n 1) is not.

Expression Graph   A let expression let { x1 = e1 ; · · · ; xn = en } in e is an expression graph exactly if e is a flat expression and each ei for i in 1 · · · n is a flat expression. Expression graphs are restricted A-normal forms [Flanagan et al. 1993].

Constructor Index and Arity   Each constructor C of a data type with m constructors is associated with a unique index in the range 1 · · · m. More precisely, the index of a constructor is its position in the alphabetically sorted list of all constructors of that data type. For example, the standard list data type has two constructors: Cons has index 1 and Nil has index 2. A constructor with index i is denoted Ci, and the arity of a constructor C is denoted #C.

2.3 Primitive Applications

In a lazy language, an application of a primitive function such as (+), (-) or (<=) requires special treatment: the integer arguments must be fully evaluated before the application can be reduced. One simple approach is to transform binary primitive applications by the rule

p e0 e1 → e1 (e0 p)    (1)

with the run-time reduction rule

n e → e n    (2)

for any fully evaluated integer literal n. To illustrate this approach, consider the expression (+) (tri 1) (tri 2). By compile-time application of rule (1), the expression is transformed to tri 2 ((tri 1) (+)). At run-time, reduction is as follows.

tri 2 ((tri 1) (+))
  = 3 ((tri 1) (+))    { tri 2 evaluates to 3 }
  = (tri 1) (+) 3      { Rule (2) }
  = 1 (+) 3            { tri 1 evaluates to 1 }
  = (+) 1 3            { Rule (2) }

After transformation by rule (1), tri looks as follows.

tri n = case 1 (n (<=)) of {
  False -> n (tri (1 (n (-))) (+)) ;
  True -> 1 }

In §5, we present more efficient techniques for dealing with primitive applications.
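Rule (1) is a purely local rewrite, so a compiler can apply it in one bottom-up pass. The following Haskell sketch is ours: it operates on a deliberately minimal expression type standing in for the F-lite AST, and handles only saturated binary primitive applications, as in the rule.

-- Compile-time application of rule (1): p e0 e1 → e1 (e0 p).
data Expr = App Expr [Expr]    -- spine application e es
          | Prim String        -- p, e.g. "(+)", "(-)", "(<=)"
          | Lit Int
          | Ref String         -- variable or function name
  deriving Show

flipPrims :: Expr -> Expr
flipPrims (App (Prim p) [e0, e1]) =
  -- e1 (e0 p): e1 is forced first; once it is an integer n, the
  -- run-time rule (2) applies it to (e0 p), forcing e0 in turn.
  App (flipPrims e1) [App (flipPrims e0) [Prim p]]
flipPrims (App f es) = App (flipPrims f) (map flipPrims es)
flipPrims e = e

For example, flipPrims on the representation of (+) (tri 1) (tri 2) yields the representation of tri 2 ((tri 1) (+)), matching the trace above.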
2.4 Case Expressions

This section describes how case expressions are compiled. First we recall the Scott encoding recently rediscovered by Jansen [Jansen et al. 2007]. Then we make a number of refinements to this encoding.

The Scott/Jansen Encoding   The first step of the encoding is to generate, for each constructor Ci of a data type with m constructors, a function definition

Ci x1 · · · x#Ci k1 · · · km = ki x1 · · · x#Ci

The idea is that each data constructor Ci is encoded as a function that takes as arguments the #Ci arguments of the constructor and m continuations. The function encoding constructor Ci passes the constructor arguments to the ith continuation. For example, the list constructors are transformed to the following functions.

Cons x xs c n = c x xs
Nil c n = n
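The encoding can be replayed directly in Haskell, using a rank-2 type to give the untyped encoding a typed dress. This demonstration is ours and is not part of the Reduceron compiler.

{-# LANGUAGE RankNTypes #-}

-- Scott-encoded lists: a list is a function taking one continuation
-- per constructor, mirroring the Cons/Nil definitions above.
newtype List a = List (forall r. (a -> List a -> r) -> r -> r)

cons :: a -> List a -> List a
cons x xs = List (\c _ -> c x xs)     -- Cons x xs c n = c x xs

nil :: List a
nil = List (\_ n -> n)                -- Nil c n = n

-- Case analysis becomes application of the scrutinee to one
-- function per alternative, exactly as in the compiled append.
append :: List a -> List a -> List a
append (List xs) ys = xs (\x xs' -> cons x (append xs' ys)) ys

toList :: List a -> [a]
toList (List xs) = xs (\x xs' -> x : toList xs') []

-- ghci> toList (append (cons 1 (cons 2 nil)) (cons 3 nil))
-- [1,2,3]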
Now case expressions of the form

case e of { C1 ~x1 -> e1 ; · · · ; Cm ~xm -> em }
Refinement 2 We now have a large row of contiguous constants in the body of eval. To allow these constants to be represented efficiently (see §2.7) we place them in a case table. Case expressions are transformed to
are transformed to e (alt1 ~v1 ~ x1 ) · · · (altm ~vm ~ xm ) where ~vi are the free variables occurring in the ith case alternative and each alti for i in 1 · · · m has the definition
e ~v
alti ~vi ~ x i = ei
and each constructor Ci is encoded as
For example, the append function is transformed to
Ci x1 · · · x#Ci t = (t ! i) x1 · · · x#Ci
append xs ys = xs (consCase ys) (nilCase ys) consCase ys x xs = Cons x (append xs ys) nilCase ys = ys
where t!i returns the ith element of case table t. Refinement 3 An evaluator can handle constructors more efficiently than general function definitions. We could introduce the following reduction rule for constructors.
Notice that the application of nilCase could be reduced at compile time. This is a consequence of constructor Nil having arity 0.
Larger Example Now let us look at a slightly larger example: an evaluator for basic arithmetic expressions.

eval x y e = case e of {
    Add n m -> (+) (eval x y n) (eval x y m);
    Neg n   -> (-) 0 (eval x y n);
    Sub n m -> (-) (eval x y n) (eval x y m);
    X       -> x;
    Y       -> y;
  }

After transformation, and in-lining the nullary cases, we have:

eval x y e = e (add x y) (neg x y) (sub x y) x y
add x y n m = (+) (eval x y n) (eval x y m)
neg x y n = (-) 0 (eval x y n)
sub x y n m = (-) (eval x y n) (eval x y m)

Look at the large body of eval: it contains three nested function applications and several repeated references to x and y. In typical functional language implementations, large function bodies are more expensive to construct than small ones.

Refinement 1 Rather than partially apply each case-alternative function to the free variables it refers to, we can define every alternative function alike to take all free variables occurring in any alternative. A case alternative can simply ignore variables that it does not need. So, let us instead transform case expressions to

e alt1 · · · altm ~v

where ~v is the union of the free variables in each case alternative, and each alti for i in 1 · · · m has the definition

alti ~xi ~v = ei

Each case-alternative function now takes the constructor arguments followed by the free variables, rather than the other way around. To illustrate, append now looks as follows.

append xs ys = xs consCase nilCase ys
consCase x xs ys = Cons x (append xs ys)
nilCase ys = ys

And eval becomes

eval x y e = e add neg sub xCase yCase x y
add n m x y = (+) (eval x y n) (eval x y m)
neg n x y = (-) 0 (eval x y n)
sub n m x y = (-) (eval x y n) (eval x y m)
xCase x y = x
yCase x y = y

The new bodies of append and eval contain no nested function applications and no repeated references. An apparent disadvantage is that we have had to introduce functions for the 0-arity constructor cases nilCase, xCase, and yCase. But our next refinement prepares the way to recover the cost of applying these functions.

Refinement 2 We now have a large row of contiguous constants in the body of eval. To allow these constants to be represented efficiently (see §2.7) we place them in a case table. Case expressions are transformed to

e ⟨alt1, · · · , altm⟩ ~v

and each constructor Ci is encoded as

Ci x1 · · · x#Ci t = (t ! i) x1 · · · x#Ci

where t ! i returns the ith element of case table t.

Refinement 3 An evaluator can handle constructors more efficiently than general function definitions. We could introduce the following reduction rule for constructors.

Ci e1 · · · e#Ci t → (t ! i) e1 · · · e#Ci

This rule replaces a constructor with a case-alternative function by looking up the case table using the constructor's index. However, the rule also drops the t argument. As a result, an implementation would have to slide the constructor arguments down the stack. A reduction rule that does not require argument sliding is

Ci e1 · · · e#Ci t → (t ! i) e1 · · · e#Ci t    (3)

To account for the fact that t has not been dropped, the case-alternative functions take the form:

alti ~xi t ~v = ei

The t argument is simply ignored by the case alternatives. The final version of tri is

tri n = 1 (n (<=)) ⟨falseCase, trueCase⟩ n
falseCase t n = n (tri (1 (n (-))) (+))
trueCase t n = 1

The final version of append is

append xs ys = xs ⟨consCase, nilCase⟩ ys
consCase x xs t ys = Cons x (append xs ys)
nilCase t ys = ys

In §3.3 and §4.5 we will see how these refinements enable efficient choices to be made at the implementation level.

2.5 In-lining

Our definition of append is no longer directly recursive. This is a consequence of splitting the case alternatives off as new function definitions. However, direct recursion is easily recovered: simply in-line the definition of append in the body of consCase.

consCase x xs t ys = Cons x (xs ⟨consCase, nilCase⟩ ys)

Compound and Atomic Expressions Applications, case expressions and let expressions are compound expressions. All other expressions are atomic.

Flat Expression A flat expression is an atomic expression or an application e1 · · · en in which each ei for i in 1 · · · n is an atomic expression. For example, append xs ys is a flat expression, but tri ((-) n 1) is not.

This transformation motivates the following general in-lining rule: in-line saturated applications of functions that have flat bodies. In-lining a flat expression e is often a big win because it eliminates a reduction and e is often no larger than the application it replaces.

2.6 Expression Graphs

It is convenient for implementation purposes to make the graph structure of function bodies explicit by transforming them to expression graphs (§2.2). This is achieved by three rewrite rules.

(1) Lift nested applications into let bindings.

e1 · · · (ei) · · · en → let { x = ei } in e1 · · · x · · · en

where ei is an application or a let expression, and x is a fresh variable.

(2) Lift let expressions out of let bodies.

let { ~b0 } in (let { ~b1 } in e) → let { ~b0 ; ~b1 } in e

(3) Lift let expressions out of let bindings.

let { · · · ; x = let { ~b } in e0 ; · · · } in e1 → let { · · · ; ~b ; x = e0 ; · · · } in e1

These rules assume no variable shadowing. To illustrate, the definition of falseCase becomes:

falseCase t n = let { x0 = tri x1 (+) ; x1 = 1 x2 ; x2 = n (-) } in n x0

It is easy to see the number and length of applications in an expression graph. For example, falseCase contains four applications and its longest application, tri x1 (+), has length three.
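As a further worked illustration (ours, not in the original text), consider the add alternative of eval from §2.4, written in its Refinement 1 form and leaving the primitive translation of §2.3 aside for clarity. Both arguments of (+) are compound, so rule (1) lifts each into a fresh binding:

add n m x y = (+) (eval x y n) (eval x y m)

becomes

add n m x y = let { x0 = eval x y n ; x1 = eval x y m } in (+) x0 x1

The resulting expression graph has a spine of length three and two nested applications, each of length four.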
2.7 Template Code

We are now very close to the template code that can be executed by the Reduceron. We shall define template code as a Haskell data type, paving the way for an executable semantics to be defined in the next section. To highlight the semantics, each semantic definition is prefixed with a '>' symbol. In template code, a program is defined to be a list of templates.

> type Prog = [Template]

A template represents a function definition. It contains an arity, a spine application and a list of nested applications.

> type Template = (Arity, App, [App])
> type Arity = Int

The spine application holds the let-body of a definition's expression graph and the nested applications hold the let-bindings. Applications are flat and are represented as a list of atoms.

> type App = [Atom]

An atom is a small, tagged piece of non-recursive data, defined in Figure 2. The following paragraphs define how programs are translated to template code.

> data Atom =
>     FUN Arity Int   -- Function with arity and address
>   | ARG Int         -- Reference to a function argument
>   | PTR Int         -- Pointer to an application
>   | CON Arity Int   -- Constructor with arity and index
>   | INT Int         -- Integer literal
>   | PRI String      -- Primitive function name
>   | TAB Int         -- Case table

Figure 2. Syntax of atoms in template code.

Functions Given a list of function definitions

f0 ~x0 = e0 , · · · , fn ~xn = en

each function identifier fi occurring in e0 · · · en is translated to an atom FUN #fi i, where #f is the arity of function f.

Arguments In each definition f x0 · · · xn = e, each variable xi occurring in e is translated to an atom ARG i.

Let-Bound Variables In each expression graph

let { x0 = e0 ; · · · ; xn = en } in e

each xi occurring in e, e0 · · · en is translated to an atom PTR i.

Integers, Primitives, and Constructors An integer literal n, a primitive p, and a constructor Ci are translated to atoms INT n, PRI p, and CON #Ci i respectively.

Case Tables Given a list of function definitions

f0 ~x0 = e0 , · · · , fn ~xn = en

each case table ⟨fi, · · · , fj⟩ occurring in e0 · · · en is translated to an atom TAB i. We assume that the functions in each case table are defined contiguously in the program.

Example The template code for the program

main = tri 5
tri n = let { x = n (<=) } in 1 x ⟨falseCase, trueCase⟩ n
falseCase t n = let { x0 = tri x1 (+) ; x1 = 1 x2 ; x2 = n (-) } in n x0
trueCase t n = 1

is as follows.

> tri5 :: Prog
> tri5 = [ (0, [FUN 1 1, INT 5], [])
>        , (1, [INT 1, PTR 0, TAB 2, ARG 0],
>              [[ARG 0, PRI "(<=)"]])
>        , (2, [ARG 1, PTR 0],
>              [[FUN 1 1, PTR 1, PRI "(+)"],
>               [INT 1, PTR 2],
>               [ARG 1, PRI "(-)"]])
>        , (2, [INT 1], []) ]

3. Operational Semantics

This section defines a small-step operational semantics for the Reduceron. There are two main reasons for presenting a semantics: (1) to define precisely how the Reduceron works; and (2) to highlight the low-level parallelism present in graph reduction that is exploited by the Reduceron. We have found it very useful to encode the semantics directly in Haskell. Before we commit to a low-level implementation, we can assess the complexity and performance of different design decisions and optimisations. At the heart of the semantic definition is the small-step state transition function

> step :: State -> State

where the state is a 4-tuple comprising a program, a heap, a reduction stack, and an update stack.

> type State = (Prog, Heap, Stack, UStack)

The heap is modelled as a list of applications, and can be indexed by a heap-address.

> type Heap = [App]
> type HeapAddr = Int

An element on the heap can be modified using the update function.

> update :: HeapAddr -> App -> Heap -> Heap
> update i a as = take i as ++ [a] ++ drop (i+1) as
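For example (our illustration):

update 1 [INT 42] [[INT 1], [INT 2], [INT 3]]
  == [[INT 1], [INT 42], [INT 3]]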
The reduction stack is also modelled as a list, of atoms, with the top stack element coming first and the bottom element coming last.

> type Stack = [Atom]
> type StackAddr = Int

There is also an update stack.

> type UStack = [Update]
> type Update = (StackAddr, HeapAddr)

The meaning of a program p is defined by run p where

> run :: Prog -> Int
> run p = eval initialState
>   where initialState = (p, [], [FUN 0 0], [])

> eval (p, h, [INT i], u) = i
> eval s = eval (step s)

The initial state of the evaluator comprises a program, an empty heap, a singleton stack containing a call to main, and an empty update stack. The main template has arity 0 and is assumed to be the template at address 0. To illustrate, run tri5 yields 15. In the following sections, the central step function is defined.
3.1 Primitive Reduction

The prim function applies a primitive function to two arguments supplied as fully-evaluated integers.

> prim :: String -> Atom -> Atom -> Atom
> prim "(+)" (INT n) (INT m) = INT (n+m)
> prim "(-)" (INT n) (INT m) = INT (n-m)
> prim "(<=)" (INT n) (INT m) = bool (n<=m)

The comparison primitive returns a boolean value. Both boolean constructors have arity 0; False has index 0 and True has index 1.

> bool :: Bool -> Atom
> bool False = CON 0 0
> bool True  = CON 0 1
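For instance (our illustration):

prim "(+)" (INT 2) (INT 3)   ==  INT 5
prim "(<=)" (INT 5) (INT 1)  ==  bool False  ==  CON 0 0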
3.2 Normal Forms

The number of arguments demanded by an atom on top of the reduction stack is defined by the arity function.

> arity :: Atom -> Arity
> arity (FUN n i) = n
> arity (INT i)   = 1
> arity (CON n i) = n+1
> arity (PRI p)   = 2

To reduce an integer, the evaluator demands one argument as shown in rewrite rule (2). And to reduce a constructor of arity n, the evaluator requires n + 1 arguments (the constructor's arguments and the case table) as shown in rewrite rule (3). The arity of an atom is only used to detect when a normal form is reached. A normal form is an application of length n whose first atom has arity ≥ n. Some functions, such as case-alternative functions, are statically known never to be partially applied, so they cannot occur as the first atom of a normal form. Such a function, say with address n, can be represented by the atom FUN 0 n.
3.3 Step-by-Step Reduction

There is one reduction rule for each possible type of atom that can appear on top of the reduction stack.

Unwinding If the top of the reduction stack is a pointer x to an application on the heap, evaluation proceeds by unwinding: copying the application from the heap to the reduction stack where it can be reduced. We must also ensure that when evaluation of the application is complete, the location x on the heap can be updated with the result. So we push onto the update stack the heap address x and the current size of the reduction stack.

> step (p, h, PTR x:s, u) = (p, h, h!!x ++ s, upd:u)
>   where upd = (1+length s, x)
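For example (a trace of ours, from a state that arises a few steps into run tri5):

step (tri5, [[INT 5, PRI "(<=)"]], [PTR 0, INT 1, TAB 2, INT 5], [])
  = (tri5, [[INT 5, PRI "(<=)"]],
     [INT 5, PRI "(<=)", INT 1, TAB 2, INT 5], [(4, 0)])

The application at heap address 0 is copied onto the stack, and the update (4, 0) records that, once this application reaches a normal form, heap address 0 should be overwritten with the result.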
Updating Evaluation of an application is known to be complete when an argument is demanded whose index is larger than n, the difference between the current size of the reduction stack and the stack address of the top update. If this condition is met, then a normal form of arity n is on top of the reduction stack and must be written to the heap.

> step (p, h, top:s, (sa,ha):u)
>   | arity top > n = (p, h', top:s, u)
>   where
>     n = 1+length s - sa
>     h' = update ha (top:take n s) h

Integers and Primitives Integer literals and primitive functions are reduced as described in §2.3.

> step (p, h, INT n:x:s, u) = (p, h, x:INT n:s, u)
> step (p, h, PRI f:x:y:s, u) = (p, h, prim f x y:s, u)

Constructors Constructors are reduced by indexing a case table, as described in §2.4.

> step (p, h, CON n j:s, u) = (p, h, FUN 0 (i+j):s, u)
>   where TAB i = s!!n

There is insufficient information available to compute the arity of the case-alternative function at address i+j. However, an arity of zero can be used because a case-alternative function is statically known not to be partially applied (§3.2).

Function Application To apply a function f of arity n, n + 1 elements are popped off the reduction stack, the spine application of the body of f is instantiated and pushed onto the reduction stack, and the remaining applications are instantiated and appended to the heap.

> step (p, h, FUN n f:s, u) = (p, h', s', u)
>   where
>     (pop, spine, apps) = p !! f
>     h' = h ++ map (instApp s h) apps
>     s' = instApp s h spine ++ drop pop s

Instantiating a function body involves replacing the formal parameters with arguments from the reduction stack and turning relative pointers into absolute ones.

> instApp :: Stack -> Heap -> App -> App
> instApp s h = map (inst s (length h))

> inst :: Stack -> HeapAddr -> Atom -> Atom
> inst s base (PTR p) = PTR (base + p)
> inst s base (ARG i) = s !! i
> inst s base a       = a
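To make the rule concrete, here is one step of our own trace of run tri5, applying the call to tri (template 1) with argument INT 5:

step (tri5, [], [FUN 1 1, INT 5], [])
  = (tri5, [[INT 5, PRI "(<=)"]], [INT 1, PTR 0, TAB 2, INT 5], [])

The nested application [ARG 0, PRI "(<=)"] is instantiated to [INT 5, PRI "(<=)"] and appended to the empty heap at address 0; in the spine, ARG 0 is replaced by the stack argument INT 5, and the relative pointer PTR 0 becomes absolute (the base, length h, is zero).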
4. Implementation

We now refine the semantic definition to an actual implementation that runs on an FPGA. Specifically, our target is a mid-range Xilinx Virtex-5 released in 2008. Our guiding design principle is to perform as much reduction as possible in each clock-cycle. Our implementation performs each semantic reduction rule in a single clock-cycle, and clocks at a modest but respectable frequency for processor-like FPGA designs.

4.1 Low-Level Parallelism

Below we motivate three main opportunities for parallelism that we exploit in our implementation.

Parallel Memories The state of the reduction machine comprises four independent memory regions: the program, the heap, the reduction stack and the update stack. Most reduction rules refer to and modify more than one memory region. For example, the reduction rule for unwinding writes to both the reduction stack and the update stack. If the four memory regions are implemented as four separate memory units then they can be accessed in parallel, avoiding contentions that would arise if they were all stored in a single memory unit.

Wide Memories Many of the reduction rules involve transferring applications to and from memory. If a memory only allows one atom to be accessed at a time, transferring a single application involves several memory accesses. If memories are wide enough to allow a whole application to be accessed at a time, transferring an application needs only a single memory access.

Parallel Instantiation The reduction rule for function application involves instantiating each application in a function body and appending it to the heap. Each atom in an application can be instantiated in parallel, as indicated by the use of map in the definition of instApp. The wide heap then allows the instantiated application to be written in one memory access. Further, each application in a function body can also be instantiated in parallel, as indicated by the use of map in the semantic rule for function application. If more than one application can be appended to the heap at a time, parallel instantiation of applications is possible.
4.2 Bounded Template Instantiation

Maximum Application Length Ideally, we would have a wide enough data bus to transfer any entire application in one go. However, this is impossible without some upper bound on the length of an application. Therefore, we introduce a bound, MaxAppLen, on the number of atoms that can occur in an application. To deal with an application whose length is larger than MaxAppLen, we split it into two or more smaller ones. For example, if MaxAppLen is 3 the application f a b c d e can be bracketed ((f a b) c d) e, resulting in three applications rather than one. An alternative way to bound application length is to split applications into chunks that are aligned contiguously in memory, with the final chunk specially tagged by an end-marker. This approach [Naylor and Runciman 2007] is more efficient in some cases, but it cannot be expressed as a core-language transformation.

Maximum Spine Length Spine applications are special because, during function application, they are written to the stack, not the heap. So it is fine for spine applications to have a different maximum length: MaxSpineLen.

Maximum Applications per Template Ideally, all applications in a template would be instantiated in parallel. To allow for such an implementation, we introduce a bound, MaxAppsPerBody, on the maximum number of applications that can occur in a template body. To deal with templates containing more applications than MaxAppsPerBody, we employ a technique called template splitting.

Template Splitting We explain template splitting by example. Consider the following template, representing the falseCase function occurring in the tri5 program defined in §2.7.

(2, [ARG 1, PTR 0]                    -- Spine
  , [ [FUN 1 1, PTR 1, PRI "(+)"]     -- Application 1
    , [INT 1, PTR 2]                  -- Application 2
    , [ARG 1, PRI "(-)"] ] )          -- Application 3

It contains one spine application and three nested applications. If MaxAppsPerBody is two then this template is split into two sub-templates. The first sub-template

(0, [FUN 0 4]                         -- Intermediate spine
  , [ [FUN 1 1, PTR 1, PRI "(+)"]     -- Application 1
    , [INT 1, PTR 2] ] )              -- Application 2

replaces the original template in the tri5 program. The second sub-template is appended to the program at the next free program address: address four in the case of the tri5 program.

(2, [ARG 1, PTR (-2)]
  , [ [ARG 1, PRI "(-)"] ])

The spine of the first sub-template is simply a call to the second sub-template. There are three important points to note:

• The first sub-template contains three applications, which is still larger than MaxAppsPerBody. However, at the implementation level, we do not count a spine application of the form [FUN 0 f] as an application: it can be interpreted simply as "jump to template f", and does not entail any heap or stack accesses.

• In the second sub-template, each atom of the form PTR n is replaced by PTR (n-2) to account for the fact that instantiation of the first sub-template will have increased the size of the heap by two.

• The arity of the first sub-template is set to zero: no elements are popped from the stack since they may be required by the second sub-template.

Choosing the Bounds We must choose the values of the bounds MaxAppLen, MaxSpineLen, and MaxAppsPerBody carefully: making them too low prevents useful parallelism; making them too high wastes resources. Our choices are informed by experiment. Table 1 shows the performance effect of varying each parameter in turn – non-varying parameters are effectively defined as infinity. The reduction count and heap usage figures are normalised across the varying parameter and averaged across a range of benchmark programs (see §6.1). The measurements are obtained using a PC implementation of the operational semantics. The reduction count represents the number of times that the step function is applied in the definition of eval.¹ The chosen bounds are: MaxAppLen = 4, MaxSpineLen = 6, and MaxAppsPerBody = 2. The measurements suggest a MaxAppLen of three is preferable to four due to better heap usage; the choice of four is motivated by another implementation parameter – the arity limit – introduced in §4.3. A MaxSpineLen of five would not be much worse than six, but the choice of six does not cost much extra at the implementation level. A MaxAppsPerBody of two is motivated by the fact that three would not be much better and that two fits nicely with the dual-port memories available on the FPGA.

¹ Constructor reductions are not counted, anticipating the optimisation presented in §4.5.
MaxAppLen         2      3      4      5      6
Reductions        1.00   0.84   0.83   0.82   0.82
Heap              1.00   1.00   1.30   1.57   1.89

MaxSpineLen       2      3      4      5      6
Reductions        1.00   0.82   0.76   0.71   0.70
Heap              1.00   0.76   0.67   0.60   0.57

MaxAppsPerBody    1      2      3      4
Reductions        1.00   0.89   0.85   0.85

Table 1. Effect of application-length, spine-length, and applications-per-template bounds on reduction count and heap usage.

4.3 Memory Layout

Our Xilinx Virtex-5 FPGA contains 296 dual-port block RAMs, each with a capacity of 18 kilobits, giving a total on-chip RAM capacity of 5,328 kilobits. Each block RAM is dual-port, allowing two independent accesses per clock-cycle. The data-bus and address-bus widths of each block RAM are configurable. Possible configurations include 1k-by-18bit and 16k-by-1bit, and a range of possibilities in-between. Two 18-kilobit block RAMs can be merged to give further possible configurations ranging from 1k-by-36bit to 32k-by-1bit. For simplicity, our implementation uses FPGA block RAMs only; no off-chip RAMs are used. This represents a tight constraint on the amount of memory available to the implementation. (The possibility of introducing off-chip memories is discussed in §7.)

Memory Structure The parallel memory units, each built out of block RAMs, are listed in Table 2 along with their capacities and the type of element stored at every addressable location. Note that there are uniform sizes for every program template and for every heap application. The two memory units at the bottom of the table are introduced in §4.5 and §4.6 respectively.

Memory Unit        Elements  Element   Bits/Element
Program            1k        Template  234
Heap               32k       App       77
Reduction Stack    8k        Atom      18
Update Stack       4k        Update    28
Case-Table Stack   4k        Atom      18
Copy Space         16k       App       77

Table 2. Size and type of each parallel memory unit.

Wide Memories The wide heap memory is implemented by concatenating the data-busses of 77 32k-by-1bit block RAMs and merging their address-busses. This is done on both ports of each block RAM, making a dual-port heap. Similarly, the wide program memory is implemented using 13 1k-by-18bit block RAMs, but this time the dual-port capability is not needed.

Stack Memories We store the top N stack elements in special-purpose stack registers. In any given clock-cycle, the stack implementation allows: the top N elements to be observed; up to N elements to be popped off; and up to N elements to be pushed on. If pushing and popping occur in the same clock-cycle, the pop is performed before the push.
Simultaneous access to the top N elements of the stack is achieved by a crossbar switch. It requires over 2,000 logic gates, but this is less than 1% of our FPGA's logic-gate capacity. There is a lot of parallelism in a crossbar, so the investment is worth it. Further hardware-level implementation details of the stack are available in [Memo 27].

Arity Limit The stack implementation is parameterised by N, but requires N to be a power of two. For the update stack, N is defined to be 1 since reading and writing multiple values is of no benefit. For the reduction stack, there are three considerations to take into account, bearing in mind the aim of single-cycle reduction: (1) only the top N stack elements are available in any clock-cycle, hence the maximum number of arguments that can be taken by a function is N - 1; (2) the maximum length of a partially-applied function application, or normal form, is therefore N - 1; and (3) the choice of N should allow a normal form of length N - 1 to be written onto the heap in a single clock-cycle. As two applications of length MaxAppLen can be written to the dual-port heap per clock-cycle, and MaxAppLen is four, a sensible choice for N is eight, since a normal form of length seven can be bracketed perfectly into two applications of length four. To deal with functions taking more than N - 1 arguments, an abstraction algorithm can be used [Turner 1979]. We have developed a minor variant [Memo 12] of an abstraction algorithm based on director strings [Dijkstra 1980, Kennaway and Sleep 1988] which uses a more coarse-grained combinator set than Turner's algorithm.

4.4 One Reduction per Clock-Cycle

Heap and program memory units have the following two properties.

• If a memory location x is read in clock-cycle n, the value at address x becomes available on the memory's data bus in clock-cycle n + 1.

• If a value is written to memory location x in clock-cycle n, the new value at address x is not apparent until clock-cycle n + 1.

The top stack elements are always observable without any clock-cycle delay. Now we show how each reduction rule in the semantics can be performed in a single clock-cycle, with reference to the following two invariants.

Invariant 1: If the top of the reduction stack is of the form PTR x then the application at heap address x is currently available on the heap memory's data bus.

Invariant 2: If the top of the reduction stack is of the form FUN n f then the template at program address f is currently available on the program memory's data bus.

Unwinding The top of the reduction stack has the form PTR x. So the application currently on the heap's data bus, say app, is the application at heap address x (Invariant 1). The following memory transactions are performed in parallel in a single clock-cycle:

• the application app is pushed onto the reduction stack;

• an update (n, x) is pushed onto the update stack, where n is the size of the reduction stack before modification; and

• the first atom of app is the new top of the reduction stack and is used to look up heap and program memory in order to maintain Invariants 1 and 2.

Updating The update stack's data bus is used to determine if an update is required, and if so, at what heap address x. If an update is required, then a normal form is available on the reduction stack's data bus. The following memory transactions are performed in parallel in a single clock-cycle:

• if the normal form has length less than or equal to four, it is written to the heap at address x;

• if the normal form has length larger than four, it is bracketed into two applications of maximum length four, one of which is written to the heap at address x, and the other of which is appended to the heap;

• the top element of the update stack is popped; and

• a program lookup is performed to preserve Invariant 2.

A heap lookup to preserve Invariant 1 is not necessary since the top of the reduction stack cannot possibly be of the form PTR x if an update is being performed. So updating requires at most two heap accesses, which can be done in parallel thanks to dual-port memory.

Integers, Primitives, and Constructors Each of these reduction rules involves a pure stack manipulation, and each straightforwardly consumes a single clock-cycle.

Function Application The top of the reduction stack has the form FUN n f. So the template of f, say t, is available on the data bus (Invariant 2). There are two cases to consider.

Case 1: If t contains a spine application of the form [FUN 0 f], then:

• up to two nested applications in t are instantiated and appended to the heap;

• the atom FUN 0 f is written to the top of the reduction stack; and

• function f is looked up in program memory in order to preserve Invariant 2.

Case 2: If t is of some other form, then:

• zero or one nested applications in t are instantiated and appended to the heap;

• the spine application in t is instantiated and written to the reduction stack; and

• the first element of the instantiated spine is used to look up heap and program memory to preserve Invariants 1 and 2.

In Case 1, a heap lookup to preserve Invariant 1 is not required: the top of the stack is known to be a FUN, not a PTR. Thus in each case, at most two heap accesses are required.
4.5 The Case-Table Stack

Constructor reduction modifies only the top element of the reduction stack by adding the index of the constructor to the address of a case table. This addition is almost cheap enough to be implemented in combinatorial logic (i.e. in zero clock-cycles) without affecting the critical path delay of the circuit. The problem is that the case table must be fetched from a variable position on the stack. This requires a multiplexer, making the combinatorial logic more expensive. To solve this problem, we introduce a new stack memory to store case tables. When unwinding an application containing a case table, the case table is pushed onto the case-table stack. When performing constructor reduction, the case table of interest is always in the same position: the top of the case-table stack.

Table 3 shows the impact of various optimisations on clock-cycle count and heap usage across a range of benchmark programs. The in-lining strategy defined in §2.5 and the case-table optimisation both result in significant performance gains on average. The other optimisations in Table 3 are introduced in §5.
4.6 Garbage Collection

Our implementation employs a simple two-space stop-and-copy garbage collector [Jones and Lins 1996]. Although a two-space collector may not make the best use of limited memory resources, it does have the attraction of being easy to implement. In particular, the algorithm is easily defined iteratively, so no recursive call stack is needed.

4.7 Hardware Description

The Reduceron is described entirely in around 2,000 lines of York Lava [Naylor et al. 2009], a hardware description language embedded in Haskell. A large proportion of the description deals with garbage collection and the bit-level encoding of template code; the actual reduction rules account for less than 400 lines. The Reduceron description is quite different to other reported Lava applications. It combines structural and behavioural description styles. Behavioural description brings improved modularity to our description. We associate each reduction rule with the memory transactions it performs, rather than associating each memory unit with all the memory transactions performed on it. So each reduction rule can be expressed in isolation. The behavioural description language, called Recipe and included with York Lava, takes the form of a 300-line Lava library. It provides mutable variables, assignment statements, sequential and parallel composition, conditional and looping constructs, and shared procedure calls. In addition, it uses the results of a simple timing analysis, implemented by abstract interpretation, to enable optimisations.

4.8 Synthesis Results

Synthesising our implementation on a Xilinx Virtex-5 LX110T (speed-grade 1) yields an FPGA design using 14% of available logic slices and 90% of available block RAMs. The maximum clock frequency after place-and-route is 96MHz. By comparison, Xilinx distributes a hand-optimised RISC soft-processor called the MicroBlaze that clocks at 210MHz on the same FPGA. However, as the Reduceron performs a lot of computation per clock-cycle, 96MHz seems respectable. Furthermore, the MicroBlaze supports up to five pipeline stages, whereas the Reduceron is not pipelined.

5. Optimisations

This section presents several optimisations, defined by a series of progressive modifications to the semantics defined in §3. A theme of this section is the use of cheap dynamic analyses to improve performance.

5.1 Update Avoidance

Recall that when evaluation of an application on the heap is complete, the heap is updated with the result to prevent repeated evaluation. There are two cases in which such an update is unnecessary: (1) the application is already evaluated, and (2) the application is not shared, so its result will never be needed again. We identify non-shared applications at run-time, by dynamic analysis. Argument and pointer atoms are extended to contain an extra boolean field.

> data Atom = · · · | ARG Bool Int | PTR Bool Int | · · ·

An argument is tagged with True exactly if it is referenced more than once in the body of a function. A pointer is tagged with False exactly if it is a unique pointer; that is, it points to an application that is not pointed to directly by any other atom on the heap or reduction stack. There may be multiple pointers to an application containing a unique pointer, so the fact that a pointer is unique is, on its own, not enough to infer that it points to a non-shared application. To identify non-shared applications, we maintain the invariant:

Invariant 3: A unique pointer occurring on the reduction stack points to a non-shared application.

A pointer that is not unique is referred to as possibly-shared.

Unwinding The reduction rule for unwinding becomes

> step (p, h, PTR sh x:s, u) = (p, h, app++s, upd++u)
>   where
>     app = map (dashIf sh) (h!!x)
>     upd = [(1+length s, x) | sh && red (h!!x)]

If the pointer on top of the stack is possibly-shared, then the application is dashed before being copied onto the stack, by marking each atom it contains as possibly-shared. This has the effect of propagating sharing information through an application.

> dashIf sh a = if sh then dash a else a
> dash (PTR sh s) = PTR True s
> dash a = a

If the pointer on top of the stack is unique, the application it points to must be non-shared according to Invariant 3. An update is only pushed onto the update stack if the pointer is possibly-shared and the application is reducible. An application is reducible if it is saturated or its first atom is a pointer.

> red :: App -> Bool
> red (PTR sh i:xs) = True
> red (x:xs) = arity x <= length xs
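For instance (our illustration):

red [PTR False 3]     ==  True   -- first atom is a pointer
red [FUN 1 4, INT 5]  ==  True   -- saturated: arity 1 <= length 1
red [CON 2 0, INT 7]  ==  False  -- partial: arity 3 > length 1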
Updating When an update occurs, the normal form on the stack is written to the heap. The normal form may contain a unique pointer, but the process of writing it to the heap will duplicate it. Hence the normal form on the stack is dashed.

> step (p, h, top:s, (sa,ha):u)
>   | arity top > n = (p, h', top:dashN n s, u)
>   where
>     n = 1+length s - sa
>     h' = update ha (top:take n s) h
>     dashN n s = map dash (take n s) ++ drop n s

It is unnecessary to dash the normal form that is written to the heap, but there is no harm in doing so: the application being updated is possibly-shared, and a possibly-shared application will anyway be dashed when it is unwound onto the stack.

Function Application When instantiating a function body, shared arguments must be dashed as they are fetched from the stack.

> inst s base (PTR sh p) = PTR sh (base + p)
> inst s base (ARG sh i) = dashIf sh (s!!i)
> inst s base a = a
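As a small worked case (ours): suppose the top of the reduction stack is the possibly-shared pointer PTR True 0 and h!!0 = [FUN 1 4, PTR False 9]. The application is reducible (arity 1 ≤ 1), so an update is pushed; the copy unwound onto the stack is dashed to [FUN 1 4, PTR True 9], propagating the sharing to the inner pointer. Had the top instead been the unique pointer PTR False 0, Invariant 3 would guarantee the application is non-shared: it would be unwound undashed and no update would be pushed.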
Program      Baseline     +In-lining   +Case Stack  +Update Avoid.  +Infix Prims.  +PRS
             Time  Heap   Time  Heap   Time  Heap   Time  Heap      Time  Heap     Time  Heap
Adjoxo       1.00  1.00   0.85  0.80   0.71  0.80   0.54  0.80      0.43  0.49     0.36  0.41
Braun        1.00  1.00   0.84  0.93   0.63  0.93   0.46  0.93      0.43  0.88     0.42  0.88
Cichelli     1.00  1.00   0.93  0.97   0.77  0.97   0.56  0.97      0.42  0.36     0.41  0.33
Clausify     1.00  1.00   0.79  0.59   0.59  0.59   0.48  0.59      0.41  0.42     0.41  0.42
CountDown    1.00  1.00   0.95  0.97   0.86  0.97   0.70  0.97      0.49  0.53     0.31  0.33
Fib          1.00  1.00   1.28  2.33   1.21  2.33   0.96  2.33      0.75  2.00     0.35  0.33
KnuthBendix  1.00  1.00   0.81  0.66   0.63  0.66   0.48  0.66      0.43  0.58     0.40  0.49
Mate         1.00  1.00   0.83  0.45   0.67  0.45   0.50  0.45      0.43  0.29     0.40  0.25
MSS          1.00  1.00   0.92  1.00   0.84  1.00   0.61  1.00      0.38  0.51     0.24  0.03
OrdList      1.00  1.00   0.73  0.67   0.55  0.67   0.42  0.67      0.42  0.67     0.42  0.67
PermSort     1.00  1.00   0.77  0.77   0.62  0.77   0.48  0.77      0.42  0.69     0.42  0.69
Queens       1.00  1.00   0.75  0.54   0.68  0.54   0.51  0.54      0.40  0.39     0.21  0.11
Queens2      1.00  1.00   0.82  0.92   0.67  0.92   0.55  0.92      0.50  0.77     0.50  0.73
SumPuz       1.00  1.00   0.95  1.06   0.80  1.06   0.60  1.05      0.50  0.74     0.48  0.63
Taut         1.00  1.00   0.90  0.99   0.70  0.99   0.56  0.99      0.51  0.90     0.50  0.87
While        1.00  1.00   0.93  0.95   0.77  0.95   0.58  0.95      0.50  0.81     0.49  0.80
Average      1.00  1.00   0.88  0.92   0.74  0.92   0.57  0.92      0.47  0.69     0.40  0.50

Table 3. Impact of optimisations on clock-cycle count and heap usage across a range of programs.
Performance Table 3 shows that, overall, update avoidance offers a significant run-time improvement. On average, 88% of all updates are avoided across the 16 benchmark programs. Just over half of these are avoided due to non-reducible applications, and just under half of them are avoided due to non-shared reducible applications. The average maximum update-stack usage drops from 406 to 11.
5.2 Infix Primitive Applications

For every binary primitive function p, we introduce a new primitive *p, a version of p that expects its arguments flipped.

> prim ('*':p) n m = prim p m n

Any primitive function p can be flipped.

> flip ('*':p) = p
> flip p = '*':p
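For instance (our illustration):

prim "*(-)" (INT 1) (INT 5)  ==  prim "(-)" (INT 5) (INT 1)  ==  INT 4
flip "(+)"   ==  "*(+)"
flip "*(+)"  ==  "(+)"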
Now we translate binary primitive applications by the rule

p m n → m p n    (4)

In place of the existing reduction rules for primitives and integers, we define:

> step (p, h, INT m:PRI f:INT n:s, u) =
>   (p, h, prim f m n:s, u)
> step (p, h, INT m:PRI f:x:s, u) =
>   (p, h, x:PRI (flip f):INT m:s, u)

If both arguments are already evaluated, the primitive is applied. If only the first argument is evaluated, then the arguments are swapped and the primitive is flipped. Note that compilation rule (4) could just as sensibly be

p m n → n *p m    (5)

In the interest of efficiency, the choice between (4) and (5) is informed for each primitive application by compile-time knowledge of whether m or n is expected to be already-evaluated.

Example Consider the steps needed to evaluate (+) e0 e1. Using the approach to primitive reduction of §3.1, the application is translated to e1 (e0 (+)) at compile time. At run-time, in four successive clock-cycles:

• an integer reduction is required after e1 is evaluated;

• an unwinding is required to fetch the argument e0 (+) from the heap;

• an integer reduction is required after e0 is evaluated; and

• a primitive reduction is required.

With the new approach, the application is translated to e0 (+) e1 at compile-time. At run-time, in two successive clock-cycles:

• an integer reduction is required after e0 is evaluated; and

• a primitive reduction is required after e1 is evaluated.

Also note that e0 (+) e1 comprises one application whereas e1 (e0 (+)) comprises two, so the former is cheaper to instantiate. Table 3 shows the run-time and heap-usage improvements brought by the new approach.

5.3 Speculative Evaluation of Primitive Redexes

Consider evaluation of the expression tri 5. Application of tri yields the expression

case (<=) 5 1 of { False -> (+) (tri ((-) 5 1)) 5 ; True -> 1 }

which contains two primitive redexes: (<=) 5 1 and (-) 5 1. This section introduces a technique called primitive-redex speculation (PRS) in which such redexes are evaluated during function-body instantiation. For example, application of tri instead yields

case False of { False -> (+) (tri 4) 5 ; True -> 1 }

The benefit is that primitive redexes need not be constructed in memory, nor fetched again when needed. Even if the result of a primitive redex is not needed, reducing it is no more costly than constructing it. We identify primitive redexes at run-time, by dynamic analysis.

Register File To support PRS, we introduce a register file to the reduction machine, for storing the results of speculative reductions.

> type RegFile = [Atom]

The body of a function may refer to these results as required.

> data Atom = · · · | REG Bool Int

An atom of the form REG b i contains a reference i to a register, and a boolean field b that is true exactly if there is more than one reference to the register in the body of the function. The instantiation functions inst and instApp are modified to take the register file r as an argument, and the following equation is added to the definition of inst.

> inst s r base (REG sh i) = dashIf sh (r !! i)
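Putting the pieces together, the fully modified instantiation functions would take the following shape (our reconstruction from the modifications stated above, combining the sharing fields of §5.1 with the new REG equation):

instApp :: Stack -> RegFile -> Heap -> App -> App
instApp s r h = map (inst s r (length h))

inst :: Stack -> RegFile -> HeapAddr -> Atom -> Atom
inst s r base (PTR sh p) = PTR sh (base + p)
inst s r base (ARG sh i) = dashIf sh (s !! i)
inst s r base (REG sh i) = dashIf sh (r !! i)
inst s r base a          = a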
Waves The primitive redexes in a function body are evaluated in a series of waves. To illustrate, consider (+) 1 ((+) 2 3). In the first wave of speculative evaluation, (+) 2 3 would be reduced to 5; in the second wave, (+) 1 5 would be reduced to 6. More specifically, a wave is a list of independent primitive-redex candidates. A primitive-redex candidate is an application which may turn out at run-time to be a primitive redex. Specifically, it is an application of the form [a0, PRI p, a1] where a0 and a1 are INT, ARG or REG atoms.

> type Wave = [App]

Templates are extended to contain a list of waves in which no application in a wave depends on the result of an application in the same or a later wave.

> type Template = (Arity, App, [App], [Wave])

Given the reduction stack, the heap, and a series of waves, PRS produces a possibly-modified heap, and one result for each application in each wave.

> prs :: Stack -> Heap -> [Wave] -> (Heap, RegFile)
> prs s h = foldl (wave s) (h, [])

> wave s (h,r) = foldl spec (h,r) . map (instApp s r h)

If a primitive-redex candidate turns out to be a primitive redex at run-time, it is reduced, and its result is appended to the register file. Otherwise, the candidate application is constructed on the heap, and a pointer to this application is appended to the register file.

> spec (h,r) [INT m,PRI p,INT n] = (h, r++[prim p m n])
> spec (h,r) app = (h++[app], r++[PTR False (length h)])
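To illustrate (a trace of ours): with stack [INT 5], an empty heap and register file, and the single-candidate wave [[ARG False 0, PRI "(-)", INT 1]], instantiation yields the application [INT 5, PRI "(-)", INT 1] – a genuine primitive redex – so spec leaves the heap unchanged and appends INT 4 to the register file. Had the stack instead held [PTR False 3], the instantiated candidate would not match the redex pattern: it would be constructed on the heap and PTR False 0, its heap address, appended to the register file.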
Function Application Since applications in a function body may refer to the results in the PRS register file, PRS is performed before instantiation of the body. The new rule is:

> step (p, h, FUN n f:s, u) = (p, h'', s', u)
>   where
>     (pop, spine, apps, waves) = p !! f
>     (h', r) = prs s h waves
>     s' = instApp s r h' spine ++ drop pop s
>     h'' = h' ++ map (instApp s r h') apps

The template-splitting technique outlined in §4.2 is modified to deal with waves of primitive-redex candidates. Each wave is split into a separate template. If a wave contains more than MaxAppsPerBody applications, it is further split in order to satisfy the constraint.

Strictness Analysis PRS works well when recursive call sites sustain unboxed arguments.² For example, if a call to tri is passed an unboxed integer then, thanks to PRS, so too is the recursive call. However, if the initial call is passed a boxed expression, primitive redexes never arise; e.g. the outer call in tri (tri 5) is passed a pointer to an application, inhibiting PRS. A basic strictness analyser in combination with the worker-wrapper transformation [Gill and Hutton 2009] alleviates this problem. Each initial call to a recursive function is replaced with a call to a wrapper function. The wrapper applies a special primitive to force evaluation of any strict integer arguments before passing them on to the recursive worker.

² An unboxed integer is an integer literal INT n, as opposed to a pointer PTR x to an expression of type integer.

Performance Table 3 shows how PRS cuts run-time and heap usage over the range of benchmark programs. On average, the maximum stack usage drops from 811 to 104, and 85% of primitive-redex candidates turn out to be primitive redexes.

6. Comparative Evaluation

This section evaluates the Reduceron in the context of previous and current work on functional language implementation.

6.1 Benchmark Programs

The performance of the Reduceron is measured using a set of 16 benchmark programs named in Table 4. The programs, though small (the largest is 551 lines), are diverse and fairly representative of functional programs in general. For details of the programs, including source code, see [Naylor et al. 2009].

Program      Lines  GHC -O2   Clean     Hand reds.
                    Run-time  Run-time  per cycle
Adjoxo       108    0.18      0.26      0.65
Braun        51     0.35      0.29      0.46
Cichelli     200    0.17      0.12      0.52
Clausify     132    0.33      0.27      0.54
CountDown    120    0.42      0.26      0.62
Fib          10     0.14      0.14      0.90
KnuthBendix  551    0.37      0.21      0.47
Mate         393    0.10      0.16      0.52
MSS          47     0.17      0.11      0.65
OrdList      46     0.64      –         0.43
PermSort     39     0.43      0.37      0.43
Queens       49     0.17      0.38      0.72
Queens2      62     0.34      0.31      0.40
SumPuz       158    0.23      0.21      0.46
Taut         97     0.32      0.16      0.46
While        96     0.27      0.19      0.49
Average      135    0.29      0.23      0.55

Table 4. Normalised run-times of GHC and Clean compiled code running on an Intel Core 2 Duo E8400 PC clocking at 3GHz. Run-times are relative to 1.00, the run-time of the Reduceron running on a Xilinx Virtex-5 FPGA clocking at 96MHz (over 30x slower).

6.2 Previous Work on the Reduceron

Compared to our previous work on the Reduceron presented in [Naylor and Runciman 2007], the implementation described in this paper reduces the number of clock-cycles required to run the benchmark programs by an average factor of 6.4. As the previous implementation clocks at 111MHz, and the new one at 96MHz on the same FPGA, the raw speed-up factor is 5.5. The gains are mainly due to the combined impact of improved case-expression compilation, single-cycle reduction, and the optimisations listed in Table 3. But another factor is that the new implementation performs spineless evaluation [Burn et al. 1988]. During function application, the spine of a function body is only written onto the stack, reducing heap contention and heap usage. The spine is only ever written to the heap during updating, and even then, only if it is a possibly-shared normal form. Spineless evaluation also avoids the problem of indirection chains, and is more modular in the sense that it allows function application to be conceptually separated from updating.

6.3 State of the Art

A run-time performance comparison of the Reduceron against state-of-the-art functional language implementations running on a 3GHz Intel Core 2 Duo PC is shown in Table 4.³

³ The Clean-compiled version of the OrdList program does not terminate, due to a bug in the Clean compiler.

Given the speed-up over our previous implementation of the Reduceron, we had hoped that the performance of our new implementation would approach that of the PC implementations.
However, new GHC optimisations and the use of a 3GHz Core 2 Duo instead of a 2.8GHz Pentium-4 have significantly boosted the PC results. (The Dhrystone MIPS (DMIPS) per MHz of the Core 2 Duo is almost twice that of the Pentium-4 [Longbottom 2009].) It would be interesting to compare the Reduceron against GHC- or Clean-compiled programs running on an FPGA soft-processor such as the Xilinx MicroBlaze. Unfortunately, this experiment would be quite an undertaking, since the run-time system of GHC or Clean would need to be ported to the FPGA environment. We can, however, point out that the Core 2 Duo achieves almost three times as many DMIPS per MHz as the Xilinx MicroBlaze [MicroBlaze], and clocks 14 times faster. So the performance ratio for this conventional benchmarking is around 42. The performance ratio between the Reduceron and the PC is an order of magnitude less.

Table 4 also shows hand-reductions per clock-cycle. A hand-reduction is the application of a function or a primitive function; it includes applications of functions introduced by case compilation, but does not include updating, unwinding, integer reduction, constructor reduction, or applications of functions introduced by template splitting.

6.4 Modern Processors

Modern microprocessors are the product of almost half a century of intensive engineering. Instruction pipelines with tens of stages have helped achieve clock frequencies in the region of 3-4GHz. Techniques such as dynamic branch prediction, out-of-order execution, and caching have enabled high utilisation of such deep pipelines. The Reduceron represents a different kind of processor: a vector processor. Rather than process one word at a time, it processes several in parallel. It is not pipelined, so the sophisticated techniques needed to keep rapidly-clocked pipelines busy are not needed.

6.5 The G-machine

In [Peyton Jones 1987], template instantiation is presented as a "simple" first step towards a more sophisticated approach to graph reduction based on the G-machine. So why is the Reduceron based on template instantiation and not the G-machine? The G-machine approach aims to generate good code for conventional hardware, exploiting its strengths and avoiding its weaknesses. We base the Reduceron on template instantiation precisely because it does not make assumptions about the target hardware. The G-machine executes a sequential stream of fine-grained instructions, many of which could in fact be executed in parallel. The FPGA negates the assumption that such a sequential stream of instructions is necessary to avoid interpretive overhead.

6.6 Manipulating Basic Values

One aspect of reduction that the G-machine approach aims to optimise is the processing of basic values such as integers. In particular, avoiding construction of strictly-needed primitive applications in memory can lead to large performance gains. For example, if a function body has the form f ((+) x 1) and f is strict then construction of (+) x 1 on the heap can be avoided and the application instead reduced immediately. The Reduceron can also avoid construction of primitive applications to good effect (§5.3). However, it discovers suitable primitive applications at run-time and evaluates them speculatively. The Reduceron allows construction of (+) x 1 to be avoided regardless of whether or not f is strict, but only if x, at run-time, takes the form INT i. So the conditions under which construction of primitive applications can be avoided are quite different between the two approaches. As discussed in §5.3, strictness analysis can aid PRS. But strictness analysis alone, without some mechanism for reducing primitive redexes cheaply, is of little use to the Reduceron. PRS provides such a mechanism.

6.7 The SKIM Machine

SKIM is a microcoded processor designed specifically to perform combinator reduction [Stoye 1985]. Stoye writes that "a combinator reducer coded on an 8-MHz 68000 goes at about one thirtieth of the speed of SKIM, and was considerably harder to write than SKIM's microcode". One interesting aspect of SKIM is its use of one-bit reference counts. Stoye observes that such reference counts can be stored in the pointer to an application rather than in the application itself, making useful information about an application available without the expense of dereferencing a pointer. A reference-count bit indicates whether the pointer is a unique application pointer or a multiple application pointer. This information is used to good effect in SKIM by allowing the space pointed to by a unique pointer to be reused during reduction rather than discarded. On average about 70% of discarded cells are immediately reused. SKIM's successful use of reference-count bits partly motivated the development of the dynamic sharing analysis presented in §5.1. We have precisely specified the modifications needed to implement dynamic sharing analysis in a general graph reduction machine. We also discuss two important details not mentioned by Stoye: (1) the subtle case in which an update can cause a unique pointer to become non-unique; and (2) Invariant 3, an important key to understanding why the technique actually works. We use the results of the analysis not for storage reclamation (which would complicate the machinery for template instantiation), but for update avoidance.

6.8 Static versus Dynamic Analysis

Sharing Analysis The idea to avoid updates by identifying non-shared applications is discussed in [Burn et al. 1988], including trade-offs between static and dynamic sharing analysis. The authors write that dynamic sharing analysis has the advantage of greater precision but that "in general we strongly suspect that the cost of dashing greatly outweighs the advantages of precision when compared to [static analysis]". In the Reduceron, dynamic sharing analysis (dashing) has no time cost: it is implemented in combinatorial logic that is not on the Reduceron's critical path. It is precise and simple to implement, requiring only minor modifications to three of the Reduceron's reduction rules.

Primitive Redex Analysis Primitive redexes can also be detected by static or dynamic analysis. In our experience, a dynamic approach is simple and cheap to implement in hardware, and works quite well. As an alternative, we are currently trying a static analysis to determine expressions whose every instance at run-time will be a primitive redex. The analysis can be combined with specialisation to increase the incidence of such expressions. Eliminating the logic and memory capacity needed to handle failed PRS candidates could significantly boost performance.

6.9 The Big Word Machine

A prototype machine similar in spirit to the Reduceron is Augustsson's Big Word Machine (BWM) [Augustsson 1992]. The BWM is a graph reduction machine with a wide word size, four pointers long, allowing wide applications to be quickly built on, and fetched from, the heap. Augustsson likens the BWM to a VLIW (very long instruction word) machine [Hennessy and Patterson 1992], designed for functional languages rather than scientific computing. Like the Reduceron, the BWM has a crossbar switch attached to the stack allowing complex rearrangements to be done in a single clock-cycle. The BWM also uses the Scott encoding to implement case expressions and constructors. Unlike the Reduceron, the BWM works on an explicit instruction stream rather than by template instantiation. The BWM was never actually built. Some simulations were performed, but Augustsson writes "the absolute performance of the machine is hard to determine at this point".
7. Conclusions and Future Work

Considering their relatively low clocking frequencies, FPGA applications must exploit significant parallelism to achieve high performance. In the context of sequential graph-reduction, we have taken this idea to its natural limit: each reduction rule is performed within one clock-cycle. Furthermore, upon synthesis our design achieves a respectable clock frequency compared to similar FPGA designs for the same device. It is therefore quite hard to see how the Reduceron's reduction rules could be performed more quickly. On the other hand, there is a lot of scope to reduce the number of reductions performed in a given program run. To this end, update avoidance and speculative evaluation of primitive redexes are both effective, making use of simple and precise dynamic analyses. These dynamic analyses would have a prohibitive run-time overhead on a PC, but have no such overhead on an FPGA. Compared to state-of-the-art functional language implementations running on a PC, the Reduceron implemented on an FPGA is on average around a factor of four slower. This difference may be disappointing, but it is an order of magnitude smaller than the typical performance gap between PC-based hard-processors and FPGA-based soft-processors.

Future Work The main limitation of the current Reduceron implementation is the small amount of heap memory it provides. Could the heap be implemented using a larger, off-chip memory unit? We believe it could, without loss of performance, and without significant modification to the existing design. Two possible options are: (1) the use of low-latency memory technologies such as RLDRAM, ZBT RAM, and QDR SRAM, commonly used by FPGA applications that require access to large amounts of memory; and (2) the use of buffers or caches, implemented using on-chip block RAM. Functional languages offer much scope for parallel evaluation of expressions. On conventional architectures there is a high cost for operations such as locking and releasing expressions under evaluation, so the benefits of parallel evaluation are offset by significant communication overheads. It would be interesting to see if special-purpose hardware could be used to overcome such overheads. Multiple Reducerons could be synthesised to FPGA, coordinated for parallel graph reduction [Clack 1999]. One of the main features of FPGAs that we are not exploiting is that they can be configured on a per-program basis. One option would be to allow programmers to express, as part of their program, custom FPGA logic that accelerates execution of that program. Such logic would act as a co-processor to the Reduceron, and could itself be suitably described in the functional source language. The future development and competitiveness of special-purpose processors for graph reduction remains questionable. But within a few years, just as plug-in GPU cards are already used for high-performance graphics, we'd like to see FPU cards for high-performance applications of functional languages. We hope our work on the Reduceron makes a small advance in that direction.

Acknowledgments

This work was supported by the Engineering and Physical Sciences Research Council of the UK under grant EP/G011052/1. Thanks to Xilinx for donating the FPGA used in this work, and to Satnam Singh, Gabor Greif, and the anonymous ICFP reviewers for their helpful suggestions.

References

[Augustsson 1992] L. Augustsson. BWM: A Concrete Machine for Graph Reduction. In Proceedings of the 1991 Glasgow Workshop on Functional Programming, pages 36–50. Springer, 1992.
[Burn et al. 1988] G. L. Burn, S. L. Peyton Jones and J. D. Robson. The Spineless G-machine. In Proceedings of the 1988 Conference on Lisp and Functional Programming, pages 244–258. ACM, 1988.
[Clack 1999] C. Clack. Realisations for non-strict languages. In Research Directions in Parallel Functional Programming, pages 149–187. Springer, 1999.
[Dijkstra 1980] E. W. Dijkstra. A mild variant of Combinatory Logic. EWD735, 1980.
[Fasel and Keller 1987] J. H. Fasel and R. M. Keller, editors. Graph Reduction, Proceedings of a Workshop. Springer LNCS 279, 1987.
[Flanagan et al. 1993] C. Flanagan, A. Sabry, B. F. Duba, and M. Felleisen. The essence of compiling with continuations. In PLDI '93: Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation, pages 237–247. ACM, 1993.
[Gill and Hutton 2009] A. Gill and G. Hutton. The Worker/Wrapper Transformation. Journal of Functional Programming, volume 18, part 2, pages 227–251, 2009.
[Hennessy and Patterson 1992] J. Hennessy and D. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 1992.
[Jansen et al. 2007] J. M. Jansen, P. Koopman, and R. Plasmeijer. Efficient Interpretation by Transforming Data Types and Patterns to Functions. In Trends in Functional Programming, volume 7, pages 157–172, 2007.
[Jones and Lins 1996] R. Jones and R. Lins. Garbage Collection: Algorithms for Automatic Dynamic Memory Management. Wiley, 1996.
[Kennaway and Sleep 1988] R. Kennaway and R. Sleep. Director strings as combinators. ACM Transactions on Programming Languages and Systems, volume 10, number 4, pages 602–626, 1988.
[Longbottom 2009] R. Longbottom. Dhrystone Benchmark Results On PCs, November 2009. (http://www.roylongbottom.org.uk/dhrystone%20results.htm)
[Naylor and Runciman 2007] M. Naylor and C. Runciman. The Reduceron: Widening the von Neumann bottleneck for graph reduction using an FPGA. In IFL '07, pages 129–146. Springer LNCS 5083, 2008.
[Naylor et al. 2009] M. Naylor, C. Runciman, and J. Reich. Reduceron home page. (http://www.cs.york.ac.uk/fp/reduceron/)
[Memo 9] M. Naylor. F-lite: a core subset of Haskell, 2008. (http://www.cs.york.ac.uk/fp/reduceron/memos/Memo9.txt)
[Memo 12] M. Naylor. An algorithm for arity-reduction, 2008. (http://www.cs.york.ac.uk/fp/reduceron/memos/Memo12.lhs)
[Memo 27] M. Naylor. Design of the Octostack, 2009. (http://www.cs.york.ac.uk/fp/reduceron/memos/Memo27.lhs)
[MicroBlaze] Xilinx. MicroBlaze Soft Processor v7.20, April 2009. (http://www.xilinx.com/tools/microblaze.htm)
[Peyton Jones 1987] S. L. Peyton Jones. The Implementation of Functional Programming Languages. Prentice Hall, 1987.
[Peyton Jones 1992] S. L. Peyton Jones. Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine. Journal of Functional Programming, volume 2, pages 127–202, 1992.
[Scheevel 1986] M. Scheevel. NORMA: a graph reduction processor. In Proceedings of the 1986 Conference on LISP and Functional Programming, pages 212–219. ACM, 1986.
[Stoye 1985] W. Stoye. The Implementation of Functional Languages using Custom Hardware. PhD Thesis, University of Cambridge, 1985.
[Turner 1979] D. A. Turner. A New Implementation Technique for Applicative Languages. Software – Practice and Experience, volume 9, number 1, pages 31–49, 1979.
[Weicker 1984] R. Weicker. Dhrystone: A Synthetic Systems Programming Benchmark. Communications of the ACM, volume 27, number 10, pages 1013–1030, 1984.
Using Functional Programming within an Industrial Product Group: Perspectives and Perceptions

David Scott and Richard Sharp
Citrix Systems UK R&D, Building 101, Cambridge Science Park, Cambridge CB4 0FY, UK
[email protected]

Thomas Gazagnaire
INRIA Sophia Antipolis, 2004 route des Lucioles, F-06902 Sophia Antipolis Cedex, France
[email protected]

Anil Madhavapeddy
Computer Laboratory, University of Cambridge, William Gates Building, Cambridge CB3 0FD, UK
[email protected]
Abstract

We present a case-study of using OCaml within a large product development project, focussing on both the technical and non-technical issues that arose as a result. We draw comparisons between the OCaml team and the other teams that worked on the project, providing comparative data on hiring patterns and cross-team code contribution.

Categories and Subject Descriptors D.2.m [Software Engineering]: Miscellaneous; D.3.2 [Programming Languages]: Language Classifications—Applicative (functional) languages

General Terms Human Factors, Languages, Management

Keywords Industry, Functional Programming, Perceptions

1. Introduction

We present our experiences of using the programming language OCaml within the Citrix XenServer product group. The case-study is interesting for three reasons:

1. XenServer is deployed in over 40,000 companies worldwide, often in mission-critical infrastructure, with the largest single customer having more than 20,000 machines running XenServer [13]. We are presenting a very "real-world" use of functional programming.

2. It provides insight into the pros and cons of using OCaml for a major systems software project.

3. The team that used OCaml was one of five teams working on XenServer. This enables us to draw comparisons between the OCaml team and other teams within the XenServer group.

We start with a brief background into the XenServer engineering group (§1.1) and the product (§1.2). Next we describe the authors' perspectives of using OCaml for the XenServer project, reflecting on both our technical experiences and the different reactions within the company that we encountered regarding the use of a non-mainstream language for product development (§2). In the remainder of the paper we present data that compares the OCaml team with other XenServer teams, in terms of hiring patterns (§3) and code contribution (§4). Finally, we examine other work in the community (§5) and conclude (§6).

1.1 The XenServer Engineering Group

The XenServer engineering group is organised into five separate engineering teams, each responsible for different software components that comprise the XenServer product. There is a Hypervisor/Kernel team, a Storage team, a Management Tools team (MTT), a Windows Driver team and a User Interface team. Four of these teams use "mainstream" languages (including C, Python and C#), but the MTT use OCaml as their primary development language. There are about 40 engineers in total within XenServer engineering, including 10 full-time OCaml programmers in the MTT who are responsible for extending and maintaining a code-base that consists of approximately 130 KLoC of OCaml. The MTT's components consume and provide interfaces and APIs to those of all other teams; thus there is constant interaction between the OCaml programmers and the rest of the development group.

1.2 The XenServer Product

Citrix XenServer is a managed virtualisation platform built on the open-source Xen hypervisor [1], offering a range of additional management features. Some of the features include:

Resource pools: The ability to create clusters of servers and shared storage that are managed as a unit. Virtual Machines (VMs) can be moved between servers in the pool while continuing to run [3].

High Availability (HA): The ability to restart VMs on other servers automatically, if the server they were executing on fails. Cluster fencing, required to preserve data integrity in the storage layer [5], is provided in software by the XenServer management tools.

XenAPI: An XML-RPC management API that provides the ability to create resource pools and VMs, and configure all aspects of the system.

XenCenter Management Console: A Windows GUI that allows administrators to create and configure VMs and resource pools.
1.2.1 Architectural Overview
XenServer is based on a type-1 hypervisor [1], and is installed straight onto the bare metal and booted directly from a server's BIOS. The hypervisor is the first component to be loaded and takes control of CPUs, memory and interrupt mappings. Next, it spawns a control domain, a small Linux VM that provides system management services and provides physical device drivers for networking and storage.

The main XenServer management process that resides in the control domain is known as X API, because it is the service that provides the XenAPI. The service's primary responsibility is to listen to XenAPI calls (made over the network) and execute these requests. In addition, X API itself implements resource pools (dealing with the distributed systems challenges that this entails), maintains a durable, replicated persistent database of configuration data on behalf of the resource pool, and is responsible for high-availability planning and failover (see http://community.citrix.com/x/O4KZAg). The X API source code, consisting of approx. 130 KLoC of OCaml, is open source and can be freely downloaded under the LGPLv2 license (see http://www.xen.org/products/cloudxen.html).

One of the defining characteristics of X API is that it communicates with all major components of the system. On the one hand it accepts connections from clients (e.g. the XenCenter GUI), performing XenAPI requests on their behalf and providing access to a variety of data-streaming services (e.g. remote access to VM consoles, importing and exporting VM disk images). On the other hand, X API interfaces with other software components within the server, including the Xen hypervisor and the networking and storage subsystems. This requires X API to use a variety of different interfaces, including (i) calling into statically-linked C APIs to communicate with the Xen hypervisor and the Linux kernel; (ii) forking new processes to invoke vendor-specific storage scripts or other shell commands; (iii) utilising a variety of different IPC mechanisms, for example to communicate with subprocesses involved in a live VM migration [3]; and (iv) performing protocol processing functions over both TCP and Unix domain sockets to receive and parse XenAPI requests.

Another property of X API is that it is highly concurrent. As well as managing a number of long-running background housekeeping threads, X API accepts and processes concurrent XenAPI requests across multiple connections from multiple clients, and deals with communication between the multiple servers and shared storage devices that comprise a resource pool.
2. Authors' Perspectives

In this section we describe our perspectives of using OCaml within the context of the XenServer project. We discuss why OCaml was selected, describe the reactions within the company to using a non-mainstream language for product development, and relate some of our technical experiences.

2.1 Selection of OCaml

The XenServer product did not start out within Citrix, but was first conceived within a startup called XenSource. Citrix acquired XenSource (and hence the XenServer team and product) in 2007. There were a number of factors within XenSource that drove the choice of OCaml and enabled the X API project to reach inception:

1. XenSource was staffed by a number of ex-researchers from the University of Cambridge Computer Laboratory. Many of these engineers had used OCaml before in a research environment and believed that, for large projects, the OCaml language offered significant productivity benefits over both traditional systems languages such as C, and dynamically typed languages, such as Python [10].

2. As a startup, XenSource had a culture of innovation and risk-taking. In this environment there were a number of influential people within the company who supported the use of OCaml, feeling that the risks of using a non-mainstream language were worth taking in return for the efficiencies that the engineers claimed it would bring.

3. XenSource had weak project governance within engineering. Thus, even though there were many people within the company who felt that using a non-mainstream language was not the right decision, the OCaml project started anyway and quickly built momentum as a grassroots effort.

These factors are all non-technical; they created the environment in which a product-development initiative based on a non-mainstream language could be seeded. But there were also technical reasons why OCaml was chosen over other languages for the X API project:

1. Performance: XenSource engineers had used OCaml on previous projects and were confident that it could deliver the required performance for the project [11].

2. Integration: OCaml's low-overhead foreign-function interface and existing Unix bindings facilitated the required interactions with the myriad of software components that made up the XenServer system.

3. Robustness: As a long-running service, X API must not crash. This requirement made OCaml's static type-safety and managed heap very appealing, offering the potential to reduce runtime failures due to type errors, memory leaks or heap corruption.

4. Compactness: There were plans for embedded versions of XenServer on flash storage as small as 16MB. The relatively simple OCaml run-time and compact native code output were key to this requirement.

There were other languages that met the above criteria, the most notable being Haskell. The primary reason for choosing OCaml over Haskell was non-technical. The engineers involved in the project had considerably more experience of using OCaml, and using it reduced training costs (this being a luxury in a fast-paced startup). Our previous experiences had also given us confidence that the OCaml tool-chain would meet the project requirements.

2.2 Reactions within the company

Choosing OCaml for a product development project was a contentious decision that created some heated debate within XenSource. While the engineers in the MTT firmly believed that the benefits of using OCaml outweighed the risks, others strongly believed that the risks of using a non-mainstream language for a major product development project were simply too great. Specific risks that were highlighted included:

1. We will not be able to hire OCaml programmers quickly enough to grow the team.

2. A large code base in a non-mainstream language will make XenSource a less attractive acquisition target.

3. Other teams (staffed with programmers who don't know OCaml) will not be able to work with the MTT because of "the language barrier".

4. The OCaml tool-chain may not be mature enough to support the development of a complex system.

The MTT had enough experience of using OCaml to argue convincingly that Risk 4 could be effectively mitigated. However, at the time the X API project was initiated, there was no data available regarding Risks 1–3, so debate (although heated) made little forward progress.
In hindsight, none of the risks above materialised. A year after work on X API started, Citrix paid $500M for XenSource, and the technical due-diligence process performed during the acquisition made it very clear that a large chunk of XenServer was implemented in OCaml. There were also no problems hiring OCaml programmers (§3), and other teams were able to work very effectively with the MTT (§4).

2.3 Technical experiences

We conducted a preliminary user study among the engineering group, with a set of open-ended questions designed to elicit individual opinions. Overall, the MTT report positive experiences of using OCaml on the XenServer project. Without exception, the engineers within the MTT believe that developing X API in OCaml has been a success, with the type system and automatic memory management being the most widely cited benefits of the language. Engineers also report that they "enjoy programming in OCaml", particularly emphasising the fact that they believe OCaml allows them to express complex algorithms concisely. There is also a shared belief within the MTT that, overall, the choice of OCaml has enabled the team to be more productive than they would have been had they chosen a more mainstream language for the project (e.g. C++ or Python). Note that Java and .NET-based languages were not included, due to the size of their runtime environments not being conducive to the 'compactness' requirement (§2).

These positive experiences are backed up by internal test data and component defect levels that demonstrate that the quality and performance of the X API component is good. However, despite the overall positive outcome, there have been some technical challenges that relate to the choice of OCaml. These challenges are not due to the OCaml language per se, but are due to the lack of available library support, the complexity of the Foreign Function Interface (FFI) and the limitations of the OCaml tool-chain. We consider each of these issues in more detail in the remainder of this section.

2.3.1 Lack of Library Support

We found that OCaml's library support for common data structures and algorithms was generally sufficient for our needs. However, the lack of library support for common systems protocols was more problematic. In particular we ended up having to write a pipelined HTTP/1.1 server from scratch and handcraft our own SSL solution, using separate stunnel processes (a universal SSL wrapper: http://www.stunnel.org) to terminate and initiate SSL connections, and communicating with these over IPC. There were some open source HTTP and SSL OCaml libraries available. However, at the time, the libraries that we evaluated were not fully featured or robust enough to meet the requirements of the X API project.
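As a rough sketch of this arrangement (ours, for illustration; X API's actual code differs, and the stunnel flags shown are the classic client-mode invocation), one can fork one stunnel per connection and speak plain HTTP to it over a Unix socketpair while stunnel performs the SSL work:

  (* Illustrative sketch: run f over a file descriptor whose far end is an
     SSL connection handled by a child stunnel process. *)
  let with_stunnel host port f =
    let sock, child_end =
      Unix.socketpair Unix.PF_UNIX Unix.SOCK_STREAM 0 in
    match Unix.fork () with
    | 0 ->
        (* child: wire the socket to stunnel's stdin/stdout, then exec it *)
        Unix.close sock;
        Unix.dup2 child_end Unix.stdin;
        Unix.dup2 child_end Unix.stdout;
        Unix.execvp "stunnel"
          [| "stunnel"; "-c"; "-r"; Printf.sprintf "%s:%d" host port |]
    | pid ->
        (* parent: speak plain HTTP over sock; stunnel does the SSL *)
        Unix.close child_end;
        let result = f sock in
        Unix.close sock;
        ignore (Unix.waitpid [] pid);
        result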
2.3.2 C Bindings

Writing C bindings was difficult and error-prone. Despite careful code-review and a policy of "keeping things simple" (avoiding references into the heap across the FFI, and avoiding use of callbacks whenever possible) some bugs still crept through, creating occasional X API segmentation faults that were hard to reproduce and track down.
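To give the flavour of this policy, here is a hypothetical binding (invented for illustration; not one of X API's actual stubs) in which only an immediate integer crosses the FFI, so the C side never retains references into the OCaml heap:

  (* Hypothetical: the function and stub names are invented. Only an int
     crosses the boundary; the result is boxed by the stub on return. *)
  external domain_get_max_memory : int -> int64 = "stub_domain_get_max_memory"

  (* The matching C stub would follow the usual pattern from <caml/memory.h>:

       CAMLprim value stub_domain_get_max_memory(value domid)
       {
           CAMLparam1(domid);
           CAMLreturn(caml_copy_int64(get_max_memory(Int_val(domid))));
       }
  *)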
2.3.3 Lack of Tool Support

Our heavy use of threads and fork(2) made it impossible for us to effectively use ocamldebug or ocamlprof. Instead we relied on gdb and gprof directly against the compiled binary. This was better than nothing, but the low-level nature of gdb made it hard to relate the debugging output back to the OCaml source. Likewise, the lack of high-level profiling data made performance tuning harder than it should have been, and made it difficult to track down memory leaks. (In a garbage-collected language like OCaml, memory leaks occur when global references to objects are not cleaned up explicitly, e.g. if something is added to a global hash-table and not subsequently removed.)

2.4 Technical Lessons Learnt

Over the last four years of commercial OCaml development, we have learnt several technical lessons regarding its use. Some of these are outlined in this section.

2.4.1 Stability of Tools and Runtime

In the early days of X API development, we had no idea whether the OCaml runtime (e.g. the garbage collector) would be robust enough to support long-running processes like X API that are required to execute continuously for months at a time. We joined the OCaml Consortium to offset this risk, providing us with a support channel in case bugs arose. However, it transpired that the OCaml runtime was remarkably stable. Our automated test system puts X API through 2000 machine-hours of testing per night, and also runs regular stress and soak tests that last for weeks on end. Customers also run their XenServers for several months at a time without restarting X API. Despite all this testing, we have never had a single XenServer defect reported from internal testing or from the field that can be traced back to a bug in the OCaml runtime or compiler. (During development we did once find a minor compiler bug, triggered when compiling auto-generated OCaml code with many function arguments, but this was already fixed in the development branch by the time we reported it, and so no interaction with the maintainers at INRIA was required.)

2.4.2 The Right Style for the Right Job

OCaml allows for many programming techniques to be used in the same codebase. X API takes full advantage of this fact, using different programming styles to solve different problems:

Imperative. Many of the lower-level modules of X API (e.g. those that interface with the hypervisor and control domain kernel) consist of step-wise, imperative code and look like type-safe C. OCaml fully supports this style with language constructs such as for/while loops and references.

Functional. Although a good chunk of X API is unashamedly imperative, some of the higher-level aspects of the system are functional in nature. For example, the high-availability feature requires algorithms for distributed failure planning. These algorithms (e.g. bin packing) are implemented in a functional style. One function of X API is to communicate with Xenstore. The Xenstore service, which runs in the control domain, provides a tuple-space that is used for co-ordination between VMs and the XenServer management tools [7]. Xenstore exposes an asynchronous event interface that is hard to use. X API abstracts much of this complexity behind a straightforward combinator library that handles events via composable functions. For example, consider the following code fragment:

  wait_for (any_of [ `OK,     value_to_appear "/path1";
                     `Failed, value_to_become "/path" v ])

The expression value_to_appear "/path1" represents the act of waiting for any value to become associated with key "/path1". The expression value_to_become "/path" v
represents the act of waiting for a specific value v to become associated with key "/path". The expression any_of represents the act of waiting for any one of a set of labelled options; in this example the label `OK is used to represent a success case and the label `Failed represents a failure case. Finally the function wait_for uses the Xenstore event interface, returning either `OK or `Failed as appropriate.
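A combinator core of this kind can be rendered in very little code. The sketch below is ours rather than the X API implementation: it models a watch as a predicate over a snapshot of the store and drives wait_for by repeated sampling, where the real library is driven by Xenstore's event interface instead:

  module Store = Map.Make (String)

  (* a watch inspects a snapshot of the store and fires with a value, or not *)
  type 'a watch = string Store.t -> 'a option

  let value_to_appear key : unit watch =
    fun store -> if Store.mem key store then Some () else None

  let value_to_become key v : unit watch =
    fun store -> if Store.find_opt key store = Some v then Some () else None

  (* the first labelled watch to fire wins *)
  let any_of choices : 'label watch =
    fun store ->
      List.fold_left
        (fun acc (label, w) ->
           match acc with
           | Some _ -> acc
           | None ->
               (match w store with
                | Some () -> Some label
                | None -> None))
        None choices

  (* poll the store until the watch fires; the real version blocks on events *)
  let rec wait_for read_store w =
    match w (read_store ()) with
    | Some label -> label
    | None -> wait_for read_store w

With these definitions the fragment above becomes wait_for read_store (any_of [ `OK, value_to_appear "/path1"; `Failed, value_to_become "/path" v ]).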
Meta-programming. X API has a distributed database that runs across all the hosts in a resource pool, including failover and replication algorithms. The OCaml code to interface with this database and remote calls is all auto-generated from a succinct specification and compiler. Similarly, all of the XenAPI bindings to other languages (C, C#, Java) are generated from a single data-model.
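The following miniature (with invented class and field names) illustrates the idea: the data-model is ordinary OCaml data, and each binding is a function from that data to source text, so adding a language means adding one more printer:

  (* Hypothetical miniature of the approach; the real data-model is richer. *)
  type ty = TString | TInt | TRef of string

  type field = { fname : string; fty : ty }
  type cls = { cname : string; fields : field list }

  let vm = { cname = "VM";
             fields = [ { fname = "name_label"; fty = TString };
                        { fname = "memory";     fty = TInt } ] }

  (* one tiny "backend": emit a C# property per field *)
  let csharp_of_ty = function
    | TString -> "string"
    | TInt    -> "long"
    | TRef c  -> c

  let emit_csharp c =
    Printf.printf "public partial class %s {\n" c.cname;
    List.iter
      (fun f ->
         Printf.printf "  public %s %s { get; set; }\n"
           (csharp_of_ty f.fty) f.fname)
      c.fields;
    print_endline "}"

  let () = emit_csharp vm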
Object-oriented. OCaml provides a comprehensive object system, but it is not used in X API except in small, local cases. Although we have nothing specific against using it, a compelling case for introducing objects has never emerged. Modules, functors and polymorphic variants have been sufficient to date, and we anticipate that first-class packaged modules (in OCaml 3.12+) will further reduce the need for using objects.
2.4.3 Garbage Collect Everything

The automatic memory management that OCaml provides is a huge improvement over using C, but we still frequently get leaks due to mismatched allocation/deallocation of other limited OS resources, such as file descriptors and shared memory segments. These are usually only detected once automated stress testing provokes the failure, since the code involved works fine during development. Nowadays, we make an effort to abstract as many of the OS resources as possible behind our own extensions to the standard library.
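A typical such extension is a bracket function that ties a descriptor's lifetime to a scope, so the resource is released promptly and deterministically rather than whenever the collector runs. A minimal sketch (ours, not the X API library itself):

  (* Run f over a freshly opened descriptor; close it even if f raises. *)
  let with_fd path flags perms f =
    let fd = Unix.openfile path flags perms in
    match f fd with
    | result -> Unix.close fd; result
    | exception e -> Unix.close fd; raise e

  (* example: read the first kilobyte of a file *)
  let read_1k path =
    with_fd path [ Unix.O_RDONLY ] 0 (fun fd ->
      let buf = Bytes.create 1024 in
      let n = Unix.read fd buf 0 1024 in
      Bytes.sub_string buf 0 n)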
3. Hiring Patterns

Despite concerns raised at the start of the X API project, the MTT has had no difficulty in finding and hiring good OCaml programmers, and has been able to grow at a comparable rate to the other XenServer teams that used mainstream languages. From October 2006 to April 2010, 12 engineers have been hired into OCaml-programming positions (roughly a quarter of all XenServer engineers hired over the period).

There are two interesting observations about the MTT's hiring patterns. Firstly, we found that posting on functional programming mailing lists (including the OCaml List and Haskell Cafe) has consistently generated good inflows of high-quality candidates interested in industrial functional programming positions. And, secondly, we have found that previous OCaml experience is not a prerequisite for hiring into OCaml-programming positions. In fact, of the 12 engineers hired, only 2 had prior experience of OCaml; the other 10 learnt OCaml after they started work at XenSource or Citrix.

Interestingly, having to learn OCaml did not make a big difference to the training time of the new engineers: the 10 engineers that did not know OCaml became productive at about the same speed as the 2 engineers that did have prior OCaml experience. We believe that this is because, for a complex software product like XenServer, getting to know one's way around the various code-bases and getting to grips with the architectural principles of the wider system is a much more time-consuming task than learning a new programming language. The 10 engineers that did not know OCaml were already highly proficient programmers who had a solid grounding in data-structures, algorithms and computer science more generally.

4. Code Contribution

As described earlier (§1.1), the XenServer Engineering Group consists of five teams of full-time software engineers, supplemented by contractors. Each team is responsible for a different software component. The source code for each component is stored in a number of version-controlled repositories using Mercurial [14]. Each repository contains a complete historical record listing every code change, when it was made, who made it and why. In this section we will examine this historical record to identify and analyse which teams contributed to which components. We shall use this data to answer the question:

"Did the use of OCaml within the MTT prevent engineers from other teams making significant contributions to the X API project?"

For our analysis we shall focus on four components:

1. Management Console: a Windows user-interface maintained by the User Interface team;

2. Storage: a set of plugin modules to connect XenServer to backend storage arrays where VM disks are stored, maintained by the Storage team;

3. X API: the component which implements the XenAPI, maintained by the MTT; and

4. Windows drivers: drivers required for high-performance VM I/O, maintained by the Windows Driver team.

The components were chosen for the following reasons:

1. they were all created solely for the XenServer product, unlike, for example, the open-source Xen hypervisor that was created as part of a research project a few years before the XenServer product emerged;

2. they are all maintained by different teams; and

3. they all primarily use different programming languages (even the X API code contains traces of C).

The following table gives approximate sizes and primary-language data for each component. (The X API figure excludes auto-generated OCaml code, the Windows driver figure excludes header files as most are auto-generated, and the Management Console figure excludes auto-generated XenAPI and Windows Forms code.)

  Component             Size       Main Languages
  X API                 130 kLOC   OCaml
  Windows Drivers       80 kLOC    C, C++
  Management Console    200 kLOC   C#
  Storage               40 kLOC    Python, C
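The contribution data itself can be gathered directly from the repositories. The sketch below is our illustration (the paper does not describe the authors' exact tooling): it tallies changesets per author from Mercurial log output, e.g. the output of hg log --template "{author|email}\n"; attributing authors to teams is then a simple lookup:

  (* Tally changesets per author from lines read on ic, e.g. piped from:
       hg log --template "{author|email}\n"                              *)
  let tally_authors ic =
    let counts = Hashtbl.create 64 in
    (try
       while true do
         let author = input_line ic in
         let n = try Hashtbl.find counts author with Not_found -> 0 in
         Hashtbl.replace counts author (n + 1)
       done
     with End_of_file -> ());
    counts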
Figure 1. The total height of each bar shows the total number of unique contributors to each component. The color indicates the proportion of contributors from each team.
Figure 2. Each coloured section indicates the size of contributions to a component by a team, relative to the total contributions.
The diagram in Figure 1 displays four bars, one for each component in the analysis. The height of each bar indicates the total number of individuals who contributed code to each component. The bars are subdivided into sections, each one coloured to indicate the team the contributor belonged to. The diagram in Figure 2 displays four bars, one for each component as before. The bars now represent the relative contribution size from members of each team to each component.

It is clear that, in all cases, the team responsible for maintaining a component makes the majority of contributions. However it is also clear that, in all cases, members of other teams made contributions. The size and colouring of the bar corresponding to X API in Figure 1 clearly shows that the use of OCaml did not prevent engineers from other teams making contributions. Furthermore, the size and colouring of the bar corresponding to X API in Figure 2 clearly shows that these contributions were as significant (in terms of size) as contributions made to other non-OCaml components.
5. Related Work
There are several groups using OCaml in industrial settings. Jane Street Capital is a successful proprietary trading company which uses OCaml for a wide range of tasks. In their experience report [12], they share several of our technical concerns with OCaml: (i) generic pretty-printing facilities have to be addressed via macros; and (ii) the lack of a wide range of community libraries for common tasks. Since their report, some of these aspects have improved somewhat. OCamlForge provides a central place to locate community libraries, and systems such as dyntype [8] and deriving [16] make it easier to operate on generic values and types without modifying the core OCaml tool-chain. Like them, X API also does not use the OCaml object system much. One concern we do not share is the lack of a multi-threaded garbage collector. Since X API is not a CPU-intensive service, and the control domain is limited to a single virtual CPU, the simplicity and stability benefits of the existing collector exceed those of the more complex concurrent alternative. XenServer is not a hosted service, but a product that ships externally to many customers.

MLdonkey [9] was one of the earliest (and for some time, the most popular) peer-to-peer client applications, written entirely in OCaml. We restricted our use of OCaml to the server-side component of XenServer, and wrote the native Windows client using C#. We made some attempts to compile portions of the OCaml code (e.g. the command-line interface) for Windows, but the lack of robust libraries (particularly SSL) made it not worth the effort. Since our decision in 2006, desktop programming using functional languages has advanced considerably, as (i) Microsoft F# provides full access to Windows APIs [15]; and (ii) web browsers can host entire applications in Javascript, and be programmed in a functional style [2, 6]. We have not yet built a client using these technologies, however.

OCaml is traditionally popular as a compiler tool, and Frama-C is an example of an industrial-grade static analysis product [4]. X API also has compilers written in OCaml to generate bindings from an executable specification for more verbose languages like C#, C, Java and Javascript. This helped keep the various XenAPI clients synchronised with the server as it developed rapidly in the early days.
6. Conclusions
The X API project is perceived as a success within XenServer engineering. The MTT works effectively with other teams (i.e. without any 'language barrier' problems), engineers have been hired into OCaml programming positions quickly and effectively, and, technically, the X API component has shown itself to be stable and robust. Although there were some drawbacks to using OCaml, namely a lack of library support for common protocols (e.g. SSL, HTTP) and a lack of tool support, engineers within the MTT believe that overall OCaml has brought significant productivity and efficiency benefits to the project. In particular, MTT engineers believe that OCaml has enabled them to be more productive than they would have been had they adopted one of the mainstream languages that would have met the requirements of the project (e.g. C++ or Python).

Since the X API code-base was open-sourced in mid-2009 it has become possible for engineers beyond Citrix to work on the project. It remains to be seen whether the use of OCaml will act as a barrier to wider contribution, but based on our experiences reported in this paper, we are hopeful that it will not. We are already seeing some code submissions to the X API project from beyond Citrix and are working with development partners and the research community to encourage further contribution. The source code can be obtained from http://xenbits.xen.org/XCP/.
7. Acknowledgments
We thank Eleanor Scott, Richard Mortier, Jonathan Knowles, Yaron Minsky, Tim Deegan, Jonathan Ludlam, Stephen Kell, Euan Harris, our Citrix colleagues and the anonymous reviewers for their feedback.
References
[1] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP), pages 164–177, New York, NY, USA, 2003. ACM Press.
[2] B. Canou, V. Balat, and E. Chailloux. O'Browser: Objective Caml on browsers. In Proceedings of the 2008 ACM SIGPLAN Workshop on ML, pages 69–78, New York, NY, USA, 2008. ACM.
[3] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield. Live migration of virtual machines. In Proceedings of the 2nd Symposium on Networked Systems Design and Implementation, May 2005.
[4] P. Cuoq, J. Signoles, P. Baudin, R. Bonichon, G. Canet, L. Correnson, B. Monate, V. Prevosto, and A. Puccetti. Experience report: OCaml for an industrial-strength static analysis framework. In ICFP '09: Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming, pages 281–286, New York, NY, USA, 2009. ACM.
[5] M. DeBergalis, P. Corbett, S. Kleiman, A. Lent, D. Noveck, T. Talpey, and M. Wittle. The Direct Access File System. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies, pages 175–188, Berkeley, CA, USA, 2003. USENIX Association.
[6] J. Donham. OCamlJS, July 2010. http://jaked.github.com/ocamljs.
[7] T. Gazagnaire and V. Hanquez. Oxenstored: an efficient hierarchical and transactional database using functional programming with reference cell comparisons. In ICFP '09: Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming, pages 203–214, New York, NY, USA, 2009. ACM.
[8] T. Gazagnaire and A. Madhavapeddy. Statically-typed value persistence for ML. In Proceedings of the Workshop on Generative Technologies, March 2010.
[9] F. Le Fessant and S. Patarin. MLdonkey, a Multi-Network Peer-to-Peer File-Sharing Program. Research Report RR-4797, INRIA, 2003.
[10] A. Madhavapeddy. Creating high-performance, statically type-safe network applications. Technical Report UCAM-CL-TR-775, University of Cambridge, Computer Laboratory, Apr. 2006.
[11] A. Madhavapeddy, A. Ho, T. Deegan, D. Scott, and R. Sohan. Melange: creating a "functional" Internet. SIGOPS Oper. Syst. Rev., 41(3):101–114, 2007.
[12] Y. Minsky and S. Weeks. Caml trading – experiences with functional programming on Wall Street. J. Funct. Program., 18(4):553–564, 2008.
[13] T. Morgan. Citrix desktop virt soars in Q4, Jan. 2010. http://bit.ly/ciB74a.
[14] B. O'Sullivan. Mercurial: the definitive guide. O'Reilly Media, first edition, 2009.
[15] D. Syme, A. Granicz, and A. Cisternino. Expert F#. Apress, 2007.
[16] J. Yallop. Practical generic programming in OCaml. In Proceedings of the 2007 Workshop on ML, pages 83–94, New York, NY, USA, 2007. ACM.
Lazy Tree Splitting

Lars Bergstrom, Mike Rainey, John Reppy, Adam Shaw
University of Chicago
{larsberg,mrainey,jhr,ams}@cs.uchicago.edu

Matthew Fluet
Rochester Institute of Technology
[email protected]
Abstract

Nested data-parallelism (NDP) is a declarative style for programming irregular parallel applications. NDP languages provide language features favoring the NDP style, efficient compilation of NDP programs, and various common NDP operations like parallel maps, filters, and sum-like reductions. In this paper, we describe the implementation of NDP in Parallel ML (PML), part of the Manticore project.

Managing the parallel decomposition of work is one of the main challenges of implementing NDP. If the decomposition creates too many small chunks of work, performance will be eroded by too much parallel overhead. If, on the other hand, there are too few large chunks of work, there will be too much sequential processing and processors will sit idle. Recently the technique of Lazy Binary Splitting was proposed for dynamic parallel decomposition of work on flat arrays, with promising results. We adapt Lazy Binary Splitting to parallel processing of binary trees, which we use to represent parallel arrays in PML. We call our technique Lazy Tree Splitting (LTS). One of its main advantages is its performance robustness: per-program tuning is not required to achieve good performance across varying platforms. We describe LTS-based implementations of standard NDP operations, and we present experimental data demonstrating the scalability of LTS across a range of benchmarks.

Categories and Subject Descriptors D.3.0 [Programming Languages]: General; D.3.2 [Programming Languages]: Language Classifications—Concurrent, distributed, and parallel languages; D.3.4 [Programming Languages]: Processors—Run-time environments

General Terms Languages, Performance

Keywords nested-data-parallel languages, scheduling, compilers, and run-time systems

1. Introduction

Nested data-parallelism (NDP) [BCH+94] is a declarative style for programming irregular parallel applications. NDP languages provide language features favoring the NDP style, efficient compilation of NDP programs, and various common NDP operations like parallel maps, filters, and sum-like reductions. Irregular parallelism is achieved by the fact that nested arrays need not have regular, or rectangular, structure; i.e., subarrays may have different lengths. NDP programming is supported by a number of different parallel programming languages [CLP+07, GSF+07], including our own Parallel ML (PML) [FRRS08].

On its face, implementing NDP operations seems straightforward because individual array elements are natural units for creating tasks, which are small, independent threads of control. (We do not address flattening (or vectorizing) [Kel99, Les05] transformations here, since the techniques of this paper apply equally well to flattened or non-flattened programs.) Correspondingly, a simple strategy is to spawn off one task for each array element. This strategy is unacceptable in practice, as there is a scheduling cost associated with each task (e.g., the cost of placing the task on a scheduling queue) and individual tasks often perform only small amounts of work. As such, the scheduling cost of a given task might exceed the amount of computation it performs. If scheduling costs are too large, parallelism is not worthwhile.

One common way to avoid this pitfall is to group array elements into fixed-size chunks of elements and spawn a task for each chunk. Eager Binary Splitting (EBS), a variant of this strategy, is used by Intel's TBB [Int08, RVK08] and Cilk++ [Lei09]. Choosing the right chunk size is inherently difficult, as one must find the middle ground between undesirable positions on either side. If the chunks are too small, performance is degraded by the high costs of the associated scheduling and communicating. By contrast, if the chunks are too big, some processors go unutilized because there are too few tasks to keep them all busy. One approach to picking the right chunk size is to use static analysis to predict task execution times and pick chunk sizes accordingly [TZ93]. But this approach is limited by the fact that tasks can run for arbitrarily different amounts of time, and these times are difficult to predict in specific cases and impossible to predict in general. Dynamic techniques for picking the chunk size have the advantage that they can base chunk sizes on runtime estimates of system load. Lazy Binary Splitting (LBS) is one such chunking strategy for handling parallel do-all loops [TCBV10]. Unlike the two aforementioned strategies, LBS determines chunks automatically and without programmer (or compiler) assistance and imposes only minor scheduling costs.

This paper presents an implementation of NDP that is based on our extension of LBS to binary trees, which we call Lazy Tree Splitting (LTS). LTS supports operations that produce and consume trees where tree nodes are represented as records allocated in the heap. We are interested in operations on trees because Manticore, the system that supports PML, uses ropes [BAP95], a balanced binary-tree representation of sequences, as the underlying representation of parallel arrays. Our implementation is purely functional in that it works with immutable structures, although some imperative techniques are used under the hood for scheduling.
LTS exhibits performance robustness; i.e., it provides scalable parallel performance across a range of different applications and platforms without requiring any per-application tuning. Performance robustness is a highly desirable characteristic for a parallel programming language, for obvious reasons. Prior to our adoption of LTS, we used Eager Tree Splitting (ETS), a variation of EBS. Our experiments demonstrate that ETS lacks performance robustness: the tuning parameters that control the decomposition of work are very sensitive to the given application and platform. Furthermore, we demonstrate that the performance of LTS compares favorably to that of (ideally-tuned) ETS across our benchmark suite.
2. Nested data parallelism

In this section we give a high-level description of PML and discuss the runtime mechanisms we use to support NDP. More detail can be found in our previous papers [FRR+07, FFR+07, FRRS08].

2.1 Programming model

PML is the programming language supported by the Manticore system (Manticore may support other parallel languages in the future). Our programming model is based on a strict, but mutation-free, functional language (a subset of Standard ML [MTHM97]), which is extended with support for multiple forms of parallelism. We provide fine-grain parallelism through several lightweight syntactic constructs that serve as hints to the compiler and runtime that the program will benefit from executing the computation in parallel. For this paper, we are primarily concerned with the NDP constructs, which are based on those found in NESL [Ble90b, Ble96]. PML provides a parallel array type constructor (parray) and operations to map, filter, reduce, and scan these arrays in parallel. Like most languages that support NDP, PML includes comprehension syntax for maps and filters, but for this paper we omit the syntactic sugar and restrict ourselves to the following interface:

  type 'a parray
  val range   : int * int -> int parray
  val mapP    : ('a -> 'b) -> 'a parray -> 'b parray
  val filterP : ('a -> bool) -> 'a parray -> 'a parray
  val reduceP : ('a * 'a -> 'a) -> 'a -> 'a parray -> 'a
  val scanP   : ('a * 'a -> 'a) -> 'a -> 'a parray -> 'a parray

The function range generates an array of the integers between its two arguments; mapP, filterP, and reduceP have their usual meaning, except that they may be evaluated in parallel; and scanP produces a prefix scan of the array. These parallel-array operations have been used to specify both SIMD parallelism that is mapped onto vector hardware (e.g., Intel's SSE instructions) and SPMD parallelism where parallelism is mapped onto multiple cores; this paper focuses on exploiting the latter. As a simple example, the main loop of a ray tracer generating an image of width w and height h can be written:

  fun raytrace (w, h) =
    mapP (fn y =>
            mapP (fn x => trace (x, y))
                 (range (0, w-1)))
         (range (0, h-1))

This parallel map within a parallel map is an example of nested data parallelism. Note that the time to compute one pixel depends on the layout of the scene, because the ray cast from position (x,y) might pass through a subspace that is crowded with reflective objects or it might pass through relatively empty space. Thus, the amount of computation across the trace(x,y) expression (and, therefore, across the inner mapP expression) may differ significantly depending on the layout of the scene. A robust technique for balancing the parallel execution of this unbalanced computation is the primary contribution of this paper.

2.2 Runtime model

The Manticore runtime system consists of a small core written in C, which implements a processor abstraction layer, garbage collection, and a few basic scheduling primitives. The rest of our runtime system is written in BOM, a PML-like language. BOM supports several mechanisms, such as first-class continuations and mutable data structures, that are useful for programming schedulers but are not in PML. Further details on our system may be found elsewhere [FRR08, Rai09, Rai07].

A task-scheduling policy determines the order in which tasks execute and the assignments from tasks to processors. Our LTS is built on top of a particular task-scheduling policy called work stealing [BS81, Hal84]. In work stealing, we employ a group of workers, one per processor, that collaborate on a given computation. The idea is that idle workers, which have no useful work to do, bear most of the scheduling costs, while busy workers, which have useful work to do, focus on finishing that work.

We use the following, well-known implementation of work stealing [BL99, FLR98]. Each worker maintains a deque (double-ended queue) of tasks, represented as thunks. When a worker reaches a point of potential parallelism in the computation, it pushes a task for one independent branch onto the bottom of the deque and continues executing the other independent branch. Upon completion of the executed branch, it pops a task off the bottom of the deque and executes it. If the deque is not empty, then the task is necessarily the most recently pushed task; otherwise all of the local tasks have been stolen by other workers and the worker must steal a task from the top of some other worker's deque. Potential victims are chosen at random from a uniform distribution. This work-stealing scheduler can be encapsulated in the following function, which is part of the runtime system core:

  val par2 : (unit -> 'a) * (unit -> 'b) -> 'a * 'b

When a worker P executes par2 (f, g), it pushes the task g onto the bottom of its deque (strictly speaking, it pushes a continuation that will evaluate g()) and then executes f(). When the computation of f() completes with result rf, P attempts to pop g from its deque. If successful, then P will evaluate g() to a result rg and return the pair (rf, rg). Otherwise, some other worker Q has stolen g, so P writes rf into a shared variable and looks for other work to do. When Q finishes the evaluation of g(), it will pass the pair of results to the return continuation of the par2 call. The scheduler also provides a generalization of par2 to a list of thunks:

  val parN : (unit -> 'a) list -> 'a list

This function can be defined in terms of par2, but we use a more efficient implementation that pushes all of the tasks in its tail onto the deque at once.
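As the text notes, parN can be defined in terms of par2. The following OCaml rendering is ours, not Manticore's BOM implementation; par2 is given a sequential stand-in body here purely so the sketch is self-contained:

  (* stand-in for the work-stealing primitive described above *)
  let par2 (f, g) = (f (), g ())

  (* parN expressed as a right fold of par2 over the list of thunks *)
  let rec parN = function
    | [] -> []
    | [ f ] -> [ f () ]
    | f :: fs ->
        let r, rs = par2 (f, fun () -> parN fs) in
        r :: rs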
2.3 Ropes

In our Manticore system, we use ropes as the underlying representation of parallel arrays. Ropes, originally proposed as an alternative to strings, are persistent balanced binary trees with seqs, contiguous arrays of data, at their leaves [BAP95]. For the purposes of this paper, we view the rope type as having the following definition:

  datatype 'a rope
    = Leaf of 'a seq
    | Cat of 'a rope * 'a rope

although in our actual implementation there is extra information in the Cat nodes to support balancing. Read from left to right, the
data elements at the leaves of a rope constitute the data of the parallel array it represents.

Since ropes are physically dispersed in memory, they are well-suited to being built in parallel, with different processors simultaneously working on different parts of the whole. Furthermore, the rope data structure is persistent, which provides, in addition to the usual advantages of persistence, two special advantages related to memory management. First, we can avoid the cost of store-list operations [App89], which would be necessary for maintaining an ephemeral data structure. Second, a parallel memory manager, such as the one used by Manticore [FRR08], can avoid making memory management a sequential bottleneck by letting processors allocate and reclaim ropes independently.

As a parallel-array representation, ropes have several weaknesses as opposed to contiguous arrays of, say, unboxed doubles. First, rope random access requires logarithmic time. Second, keeping ropes balanced requires extra computation. Third, mapping over multiple ropes is more complicated than mapping over multiple arrays, since the ropes may have different shape. In our performance study in Section 5, we find that these weaknesses are not crippling by themselves, yet we know of no study in which NDP implementations based on ropes are compared side by side with implementations based on alternative representations, such as contiguous arrays.

The maximum length of the linear sequence at each leaf of a rope is controlled by a compile-time constant M. At run-time, a leaf contains a number of elements n such that 0 ≤ n ≤ M. In general, rope operations try to keep the size of each leaf as close to M as possible, although some leaves will necessarily be smaller. We do not demand that a rope maximize the size of its leaves. Relaxing the perfect balance requirement reduces excessive balancing, yet maintains the asymptotic behavior of rope operations.

Our rope-balancing policy is a relaxed, parallel version of the sequential policy used by Boehm, et al. [BAP95]. The policy of Boehm, et al. is as follows. For a given rope r of depth d and length n, the balancing goal is d ≤ ⌈log2 n⌉ + 2. This property is enforced by the function
  val balance : 'a rope -> 'a rope

which takes a rope r and returns a balanced rope equivalent to r (returning r itself if it is already balanced). In our rope-balancing policy, only those ropes that are built serially are balanced by balance, i.e., the serial balancing process only ever takes place within a given chunk. There is no explicit guarantee on the balance of a rope containing subropes that are built by different processors. For such a rope, the amount of rope imbalance is proportional to the distribution of work across processors rather than the size of the rope itself. As we discuss in Section 5, across all our benchmarking results, balancing has minimal impact on performance.

As noted above, rope operations try to keep the size of each leaf as close to M as possible. In building ropes, rather than using the Cat constructor directly, we define a smart constructor:

  val cat2 : 'a rope * 'a rope -> 'a rope

If cat2 is applied to two small leaves, it may coalesce them into a single larger leaf. Note that cat2 does not guarantee balance, although it will maintain balance if applied to two balanced ropes of equal size. We also define a similar function

  val catN : 'a rope list -> 'a rope

which returns the smart concatenation of its argument ropes. We sometimes need a fast, cheap operation for splitting a rope into multiple subropes. For this reason, we provide

  val split2 : 'a rope -> 'a rope * 'a rope

which splits its rope parameter into two subropes such that the size of these ropes differs by at most one. We also define

  val splitN : 'a rope * int -> 'a rope list

which splits its parameter into n subropes, where each subrope has the same size, except for one subrope that might be smaller than the others. We sometimes use

  val length : 'a rope -> int

which takes a rope r and returns the number of elements stored in the leaves of r. (In our actual implementation, this operation takes constant time, as we cache lengths in Cat nodes.) The various parallel-array operations described in Section 2.1 are implemented by analogous operations on ropes. Sections 3 and 4 describe the implementation of these rope-processing operations in detail.
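To make the preceding interface concrete, here is a small OCaml sketch of our own (Manticore's actual ropes are written in PML/BOM): it caches lengths in Cat nodes, so length is constant-time, and its cat2 coalesces two sufficiently small leaves, as described above. The constant max_leaf_size stands in for M, and its value here is arbitrary:

  type 'a rope =
    | Leaf of 'a array
    | Cat of int * 'a rope * 'a rope   (* cached total length *)

  let max_leaf_size = 256  (* the constant M; chosen arbitrarily here *)

  let length = function
    | Leaf s -> Array.length s
    | Cat (n, _, _) -> n

  (* smart constructor: merge two small leaves, otherwise build a Cat *)
  let cat2 l r =
    match l, r with
    | Leaf a, Leaf b
      when Array.length a + Array.length b <= max_leaf_size ->
        Leaf (Array.append a b)
    | _ -> Cat (length l + length r, l, r)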
3. The Goldilocks problem

In NDP programs, computations are divided into chunks, and chunks of work are spawned in parallel. Those chunks might be defined by subsequences (of arrays, for example, or, in our case, ropes) or iteration spaces (say, k to some k + n). The choice of chunk size influences performance crucially. If the chunks are too small, there will be too much overhead in managing them; in extreme cases, the benefits of parallelism will be obliterated. On the other hand, if they are too large, there will not be enough parallelism, and some processors may run out of work. An ideal chunking strategy apportions chunks that are neither too large nor too small, but are, like Goldilocks's third bowl of porridge, "just right." Some different chunking strategies are considered in the sequel.

3.1 Fragile chunking strategies

A fragile chunking strategy is prone either to creating an excessive number of tasks or to missing significant opportunities for parallelism. Let us consider two simple strategies, T-ary decomposition and structural decomposition, and the reasons that they are fragile.

In T-ary decomposition, we split the input rope into T = min(n, J × P) chunks, where n is the size of the input rope, J is a fixed compile-time constant, and P is the number of processors, and spawn a task for each chunk. For example, in Figure 1(a), we show the T-ary decomposition version of the rope-map operation. (In this and subsequent examples, we use the function mapSequential, with type ('a -> 'b) -> 'a rope -> 'b rope, which is the obvious sequential implementation of the rope-map operation.) In computations where all rope elements take the same time to process, such as those performed by regular affine (dense-matrix) scientific codes, the T-ary decomposition will balance the work load evenly across all processors because all chunks will take about the same amount of time. On the other hand, when rope elements correspond to varying amounts of work, performance will be fragile because some processors will get overloaded and others underutilized. Excessive splitting is also a problem. Observe that for i levels of nesting and sufficiently-large ropes, the T-ary decomposition creates (J × P)^i tasks overall, which can be excessive when either i or P get large.

To remedy the imbalance problem, we might try structural decomposition, in which both children of a Cat node are processed in parallel and the elements of a Leaf node are processed sequentially. We show the structural version of the rope-map operation in Figure 1(b). Recall that the maximum size of a leaf is determined by
a fixed, compile-time constant called M and that rope-producing operations tend to keep the size of each leaf close to M. But by choosing an M > 1, some opportunities for parallelism will always be lost, and by choosing M = 1, an excessive number of threads may be created, particularly in the case of nested loops.

  fun mapTary J f rp =
    let fun g chunk = fn () => mapSequential f chunk
        val chunks = splitN (rp, J * numProcs ())
    in
      catN (parN (map g chunks))
    end

  (a) T-ary decomposition

  fun mapStructural f rp =
    (case rp
      of Leaf s => mapSequential f rp
       | Cat (l, r) => Cat (par2 (fn () => mapStructural f l,
                                  fn () => mapStructural f r)))

  (b) structural decomposition

Figure 1. Two fragile implementations of the rope-map operation.
3.2 Eager binary splitting
EBS is a well-known approach that is used by many parallel libraries and languages, including Thread Building Blocks [Int08, RVK08] and Cilk++ [Lei09]. In EBS (and, by extension, eager tree splitting (ETS)), we group elements into fixed-size chunks and spawn a task for each chunk. This grouping is determined by the following recursive process. Initially, we group all elements into a single chunk. If the chunk size is less than the stop-splitting threshold (SST), we evaluate the elements sequentially. (In TBB, if SST is unspecified, the default is SST = 1, whereas Cilk++ only uses SST = 1.) Otherwise, we create two chunks by dividing the elements in half and recursively apply the same process to the two new chunks. For example, in Figure 2, we show the ETS version of the rope-map operation.

  fun mapETS SST f rp =
    if length rp <= SST then mapSequential f rp
    else let val (l, r) = split2 rp
         in
           cat2 (par2 (fn () => mapETS SST f l,
                       fn () => mapETS SST f r))
         end

Figure 2. The ETS implementation of the rope-map operation.

EBS has greater flexibility than the T-ary or structural decompositions because EBS allows chunk sizes to be picked manually. But this flexibility is not much of an improvement, because, as is well known [Int08, RVK08, TCBV10], finding a satisfactory SST can be difficult. This difficulty is due, in part, to the fact that parallel speedup is very sensitive to SST. We ran an experiment that demonstrates some of the extent of this sensitivity. Figure 3 shows, for seven PML benchmarks (see Section 5 for benchmark descriptions), parallel efficiency as a function of SST. The parallel efficiency is the sixteen-processor speedup divided by sixteen, times 100, where the baseline time for the speedup is taken from the sequential evaluation. For example, 100% parallel efficiency represents perfect linear speedup and 6.25% parallel efficiency represents almost no speedup. The results demonstrate that there is no SST that is optimal for every program and furthermore that a poor SST is far from optimal.

Figure 3. Parallel efficiency is sensitive to SST (16 processors). (The plot shows parallel efficiency against log2 SST for the Barnes Hut, DMM, Raytracer, Nested Sums, Quicksort, SMVM, and Tree Rootfix benchmarks.)

The Raytracer benchmark demonstrates, in particular, how fragile ETS can be with respect to nesting and to relatively small ropes. Raytracer loses 80% of its speedup as SST is changed from 32 to 128. The two-dimensional output of the program is a 256 × 256 rope of ropes, representing the pixels of a square image. When SST = 128, Raytracer has just two chunks it can process in parallel: the first 128 rows and the second. We could address this problem by transforming the two-dimensional representation into a single flat rope, but then the clarity of the code would be compromised, as we would have to use index arithmetic to extract any pixel. It is a break with the nested-data-parallel programming style.

Recall that task execution times can vary unpredictably. Chunking strategies that are based solely on fixed thresholds, such as EBS and ETS, are bound to be fragile because they rely on accurately predicting execution times. A superior chunking strategy would be able to adapt dynamically to the current state of load balance across processors.
3.3 Lazy binary splitting
The LBS strategy of Tzannes, et al. [TCBV10] is a promising alternative to the other strategies because it has good adaptivity to dynamic load balance. Tzannes, et al. show that LBS is capable of performing as well or better than each configuration of eager binary splitting, and does so without tuning. LBS is similar to eager binary splitting but with one key difference: in LBS, we base each splitting decision entirely on a dynamic estimation of load balance.

Let us consider the main insight behind LBS. We call a processor hungry if it is idle and ready to take on new work, and busy otherwise. It is better for a given processor to delay splitting a chunk and to continue processing local iterations while remote processors remain busy. Splitting can only be profitable when a remote processor is hungry.

Although this insight is sound, it is still unclear whether it is useful. A naïve hungry-processor check would require interprocessor communication, and the cost of such a check would hardly be an improvement over the cost of spawning a thread. For now, let us assume that we have a good approximate hungry-processor check
which returns true if there is probably a remote hungry processor and false otherwise. Later we explain how to implement such a check. LBS works as follows. The scheduler maintains a current chunk c and a pointer i that points at the next iteration in the chunk to process. Initially, the chunk contains all iterations and i = 0.
6 In TBB, if SST is unspecified, the default is SST = 1, whereas Cilk++ only uses SST = 1.
fun mapLTS f rp =
  if length rp <= 1
    then mapSequential f rp
    else (case mapUntil hungryProcs f rp
           of More (u, p) =>
                let val (u1, u2) = split2 u
                in catN (parN [fn () => balance p,
                               fn () => mapLTS f u1,
                               fn () => mapLTS f u2])
                end
            | Done p => balance p)

Figure 4. The LTS implementation of the rope-map operation.
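For example, mapLTS (fn x => x + 1) rp increments every element of rp: the traversal applies the function elementwise, polling hungryProcs as it goes, and splits the remaining work only when another processor is likely idle.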
To process an iteration i, the scheduler first checks for a remote hungry processor. If the check returns false, then all of the other processors are likely to be busy, and the scheduler greedily executes the body of iteration i. If the check returns true, then some of the other processors are likely to be hungry, and the scheduler splits the chunk in half and spawns a recursive instance to process the second half. Tzannes, et al. [TCBV10] show how to implement an efficient and accurate hungry-processor check. Their idea is to derive such a check from the work stealing policy. Recall that, in work stealing, each processor has a deque, which records the set of tasks created by that processor. The hungry-processor check bases its approximation on the size of the local deque. If the deque of a given processor contains some existing tasks, then these tasks have not yet been stolen, and therefore it is unlikely to be profitable to add to these tasks by splitting the current chunk. On the other hand, if the deque is empty, then it is a strong indication that there is a remote hungry processor, and it is probably worth splitting the current chunk. This heuristic works surprisingly well considering its simplicity. It is cheap because the check itself requires two local memory accesses and a compare instruction, and it provides an accurate estimate of whether splitting is profitable. Let us consider how LBS behaves with respect to loop nesting. Suppose our computation has the form of a doubly-nested loop, one processor is executing an iteration of the inner loop, and all other processors are hungry. Consequently, the remainder of the inner loop will be split (possibly multiple times, as work is stolen from the busy processor and further split), generating relatively small chunks of work for the other processors. Because the parallelism is fork-join, the only way for the computation to proceed to the next iteration of the outer loop is for all of the work from the inner loop to be completed. At this point, all processors are hungry, except for the one processor that completed the last bit of inner-loop work. This processor has an empty deque; hence, when it starts to execute the next iteration of the outer loop, it will split the remainder of the outer loop. Because there is one hungry-processor check per loop iteration, and because loops are nested, most hungry-processor checks occur during the processing of the innermost loops. Thus, the general pattern is clear: splits tend to start during inner loops and then move outward quickly.
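A minimal sketch of this deque-based heuristic follows; WorkStealing.numLocalTasks is a hypothetical accessor for the size of the current processor's deque, not an actual Manticore API:

(* If our deque is empty, our previously pushed tasks have all been
   stolen, so some remote processor is probably hungry. *)
fun hungryProcs () = (WorkStealing.numLocalTasks () = 0)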
4. Lazy tree splitting for ropes

LTS operations are not as easy to implement as ETS operations because, during the execution of a given LTS operation, a split can occur while processing any rope element. This section presents implementations of five important LTS operations. The technique we use is based on Huet's zipper technique [Hue97] and a new technique we call splitting a context. We first look in detail at the LTS version of map (mapLTS) because its implementation offers a simple survey of our techniques. Then we summarize the implementations of the additional operations.

4.1 Implementing mapLTS

Structural recursion, on its own, offers no straightforward way to implement mapLTS. Consider the case in which mapLTS detects that another processor is hungry. How can mapLTS be ready to halve the as-yet-unprocessed part of the rope, keeping in mind that, at the halving moment, the focus might be on a mid-leaf element deeply nested within a number of Cat nodes? In a typical structurally recursive traversal (e.g., Figure 1(b)), the code has no handle on either the processed portion of the rope or the unprocessed remainder of the rope; it can only see the current substructure. We need to be able to "step through" a traversal in such a way that we can, at any moment, pause the traversal, reconstruct the processed results, divide the unprocessed remainder in half, and resume processing at the pause point.

A key piece of our implementation is an internal operation called mapUntil, which is capable of pausing its traversal based on a runtime predicate:

val mapUntil : (unit -> bool) -> ('a -> 'b) -> 'a rope
               -> ('a rope * 'b rope, 'b rope) progress

The first argument to mapUntil is a polling function (e.g., hungryProcs); the second argument is the function to be applied to the individual data elements; and the third argument is the input rope. Instead of returning a fully processed rope, mapUntil returns a value of type ('a rope * 'b rope, 'b rope) progress, where the type constructor progress is defined as

datatype ('a, 'b) progress = More of 'a | Done of 'b

In the result of mapUntil, a value More (u, p) represents a partially processed rope, where u is the unprocessed part and p is the processed part; a value Done p represents a fully processed rope. The evaluation of mapUntil cond f rp proceeds by applying f to the elements of rp from left to right until either cond () returns true or the whole rope is processed.

Before we describe the implementation of mapUntil, we examine how mapUntil is used to implement mapLTS. The mapLTS algorithm, shown in Figure 4, starts by checking the length of the input rope. When the rope length is greater than one (the interesting case), the algorithm calls mapUntil to start processing elements. If this call returns a partial result (More (u, p)), then mapLTS splits the unprocessed subrope u and schedules the parallel evaluation of the balancing (if necessary) of the processed subrope p and the recursive mapping of the two halves of the unprocessed subrope u. At the join of the parallel computation, the three now-processed subropes are concatenated and returned. Note that because this algorithm is recursive, splitting may continue until a single rope element is reached. If the call to mapUntil returns a complete result (Done p), then p is balanced (if necessary) and returned. Balancing p (in either the More or Done case) may be profitable here because the ropes returned by mapUntil may be unbalanced.

It remains to implement the mapUntil operation. The crucial property of mapUntil is that, during the traversal of the input rope, it must maintain sufficient information to, at any moment, pause the traversal and reconstruct both the processed portion of the rope and the unprocessed remainder of the rope. Huet's zipper technique [Hue97] provides the insight necessary to derive a persistent data structure, and functional operations over it, which enable this "pausable" traversal. A zipper is a representation of an
aggregate data structure that factors the data structure into a distinguished substructure under focus and a one-hole context; plugging the substructure into the context yields the original structure. Zippers allow efficient navigation through and modification of a data structure. With a customized zipper representation, some basic navigation operations, and our novel context-splitting technique, we arrive at an elegant implementation of mapUntil. To represent the rope-map traversal, we use a context representation similar to Huet’s single-hole contexts [Hue97], but with different types of elements on either side of the hole, as in McBride’s contexts [McB08]. Thus, our context representation is defined as
datatype ('a, 'b) ctx
  = Top
  | CatL of 'a rope * ('a, 'b) ctx
  | CatR of 'b rope * ('a, 'b) ctx
where Top represents an empty context, CatL(r, c) represents the context surrounding the left branch of a Cat node where r is the right branch and c is the context surrounding the Cat node, and CatR(l, c) represents the context surrounding the right branch of a Cat node where l is the left branch and c is the context surrounding the Cat node. Note that, for a rope-map traversal, all subropes to the left of the context’s hole are processed (’b rope) and all subropes to the right of the context’s hole are unprocessed (’a rope). The implementation of mapUntil will require a number of operations to manipulate a context. The leftmost (rp, c) ⇒ (s’, c’) operation plugs the (unprocessed) rope rp into the context c, then navigates to the leftmost leaf of rp, returning the sequence s’ at that leaf and the context c’ surrounding that leaf:
val leftmost : 'a rope * ('a, 'b) ctx -> 'a seq * ('a, 'b) ctx
fun leftmost (rp, c) =
  (case rp
    of Leaf s => (s, c)
     | Cat (l, r) => leftmost (l, CatL (r, c)))

The start operation simply specializes leftmost to the case of the whole unprocessed rope in the empty context (see Figure 5(a)):

val start : 'a rope -> 'a seq * ('a, 'b) ctx
fun start rp = leftmost (rp, Top)

Figure 5. Operations on contexts: (a) start rp ⇒ (s', c'); (b) next (rp, c) ⇒ Done rp'; (c) next (rp, c) ⇒ More (s', c'); (d) splitCtx (u, p, c) ⇒ (u', p'). A right-facing leaf node denotes a processed node and a left-facing leaf node an unprocessed node.
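As a concrete illustration (our example): applied to the rope Cat (Cat (Leaf s1, Leaf s2), Leaf s3), start descends to the leftmost leaf and returns (s1, CatL (Leaf s2, CatL (Leaf s3, Top))), recording the two right siblings still to be processed.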
It is used to initialize the mapUntil traversal. The next (rp, c) operation plugs the (processed) rope rp into the context c, then attempts to navigate to the next unprocessed leaf:

val next : 'b rope * ('a, 'b) ctx
           -> ('a seq * ('a, 'b) ctx, 'b rope) progress
fun next (rp, c) =
  (case c
    of Top => Done rp
     | CatL (r, c') => More (leftmost (r, CatR (rp, c')))
     | CatR (l, c') => next (cat2 (l, rp), c'))

This navigation can either succeed, in which case next returns More (s', c') (see Figure 5(c)), where s' is the sequence at the next leaf and c' is the context surrounding that leaf, or fail, in which case next returns Done rp' (see Figure 5(b)), where rp' is the whole processed rope.

The final operation on contexts is an operation to split a context into a pair of ropes: the unprocessed subrope that occurs to the right of the hole and the processed subrope that occurs to the left of the hole. It is convenient for the splitCtx operation to additionally take an unprocessed rope and a processed rope meant to fill the hole, which are incorporated into the result ropes (see Figure 5(d)):

val splitCtx : 'a rope * 'b rope * ('a, 'b) ctx -> 'a rope * 'b rope
fun splitCtx (u, p, c) =
  (case c
    of Top => (u, p)
     | CatL (u', c') => splitCtx (cat2 (u, u'), p, c')
     | CatR (p', c') => splitCtx (u, cat2 (p, p'), c'))
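For instance (our example), splitCtx (u, p, CatL (r, Top)) yields (cat2 (u, r), p): the unprocessed right sibling r is joined onto u, while p is the entire processed portion.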
With these context operations, we give the implementation of mapUntil in Figure 6. The traversal of mapUntil is performed by the function lp. The argument s represents the sequence of the leftmost unprocessed leaf of the rope and the argument c represents the context surrounding that leaf. The processing of the sequence is performed by mapSeqUntil, a function with behavior similar to mapUntil but implemented over linear sequences. It is mapSeqUntil that actually calls cond and applies the function f. Note that mapSeqUntil must also maintain a context with processed elements to the left and unprocessed elements to the right, but doing so is trivial for a linear sequence. (Recall the standard accumulate-with-reverse implementation of map for lists.)

fun mapUntil cond f rp = let
      fun lp (s, c) =
        (case mapSeqUntil cond f s
          of More (us, ps) => More (splitCtx (Leaf us, Leaf ps, c))
           | Done ps =>
               (case next (Leaf ps, c)
                 of Done p' => Done p'
                  | More (s', c') => lp (s', c')))
      in
        lp (start rp)
      end

Figure 6. The mapUntil operation.

If mapSeqUntil returns a partial result (More (us, ps)), then the traversal pauses and returns its intermediate results by splitting its context. (This pause and return gives mapLTS the opportunity to split the unprocessed elements and push the parallel mapping of these halves of the unprocessed elements onto the work-stealing deque.) If mapSeqUntil returns a complete result (Done ps), then the traversal plugs the context with this completed leaf sequence and attempts to navigate to the next unprocessed leaf by calling next (Leaf ps, c). If next returns Done p', then the rope traversal is complete and the whole processed rope is returned. Otherwise, next returns More (s', c') and the traversal loops to process the next leaf sequence (s') with the new context (c').

4.2 Implementing other operations

The implementation of filterLTS

val filterLTS : ('a -> bool) -> 'a rope -> 'a rope

is very similar to that of mapLTS. Indeed, filterLTS uses the same context representation and operations as mapLTS, simply instantiated with unprocessed and processed elements having the same type:

type 'a filter_ctx = ('a, 'a) ctx

As with mapLTS, where the mapping operation was applied by the mapSeqUntil operation, the actual filtering of elements is performed by the filterSeqUntil operation.

The reduceLTS operation takes an associative operator and its zero and a rope, and returns the rope's reduction under the operator:

val reduceLTS : ('a * 'a -> 'a) -> 'a -> 'a rope -> 'a

Thus, the reduceLTS operation may be seen as a generalized sum operation. The implementation of reduceLTS is again similar to that of mapLTS, but uses a simpler context:

datatype 'a reduce_ctx
  = Top
  | CatL of 'a rope * 'a reduce_ctx
  | CatR of 'a * 'a reduce_ctx

where CatR (z, c) represents the context surrounding the right branch of a Cat node in which z is the reduction of the left branch and c is the context surrounding the Cat node.

The scanLTS operation, also known as prefix sums, is an important building block of a data-parallel programming language. Like reduceLTS, the scanLTS operation takes an associative operator and its zero and a rope, but it returns a rope of the reductions of the prefixes of the input rope:

val scanLTS : ('a * 'a -> 'a) -> 'a -> 'a rope -> 'a rope

For example,

scanLTS (op +) 0 (Cat (Leaf [1, 2], Leaf [3, 4]))
  ⇒ Cat (Leaf [1, 3], Leaf [6, 10])

In a survey on prefix sums, Blelloch describes classes of important parallel algorithms that use this operation and gives an efficient parallel implementation of prefix sums [Ble90a], on which our implementation of scanLTS is based. The algorithm takes two passes over the rope. The first performs a parallel reduction over the input rope, constructing an intermediate rope in which partial reduction results are recorded at each internal node. The second pass builds the result rope in parallel by processing the intermediate rope; the efficiency of this second pass derives from having constant-time access to the cached sums while it builds the result. The result of the first pass is called a monoid-cached tree [HP06], specialized in the current case to a monoid-cached rope:

datatype 'a crope
  = CLeaf of 'a * 'a seq
  | CCat of 'a * 'a crope * 'a crope

In a monoid-cached rope, each internal node caches the reduction of its children nodes. For example, supposing the scanning operator is integer addition, one such monoid-cached rope is

CCat (10, CLeaf (3, [1, 2]), CLeaf (7, [3, 4]))

Our implementation of Blelloch's algorithm is again similar to that of mapLTS, except that we use a context in which there are ropes to the right of the hole and cached ropes to the left of the hole. Aside from some minor complexity involving the propagation of partial sums, the operations on this context are similar to those on the context used by mapLTS.

The map2LTS operation maps a binary function over a pair of ropes (of the same length):

val map2LTS : ('a * 'b -> 'c) -> 'a rope * 'b rope -> 'c rope

For example, the pointwise addition of the ropes rp1 and rp2 can be implemented as

map2LTS (op +) (rp1, rp2)

Note that rp1 and rp2 may have completely different branching structures, which would complicate any structurally recursive implementation. The zipper technique provides a clean alternative: we maintain a pair of contexts and advance them together in lock step during execution. The result rope is accumulated in one of these contexts. Contexts and partial results nicely handle the processing of leaves of unequal length: when the map2SeqUntil function is applied to two leaves of unequal length, it simply returns a partial result that includes the remaining elements from the longer sequence, and the map2Until function need only step the context of the shorter linear sequence to find the next leaf with which to resume the map2SeqUntil processing. Note that we do need to distinguish map2SeqUntil returning a partial result due to the polling function, in which case map2Until should also return a partial result (signaling that a task should be pushed to the work-stealing deque), from map2SeqUntil returning a partial result due to exhausting one of the leaves, in which case map2Until should not return a partial result. The implementation extends straightforwardly to maps of arbitrary arity.

4.3 Rebalancing

In our implementation, there are two circumstances in which we need to do balancing. The first is in filterLTS, because the filtering predicate may drop elements at arbitrary positions inside the rope. The second is in operations like mapLTS, because such operations may split at an arbitrary rope leaf.
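One simple way to realize a balance operation is to collect the leaf sequences in order and rebuild a balanced rope over them. The following is a minimal sketch, not the actual Manticore implementation; isBalanced (say, a depth-versus-length test) is a hypothetical helper, and a production version would also coalesce adjacent short leaves up to the max leaf size M:

(* Collect the leaf sequences of a rope, left to right. *)
fun leaves rp acc =
  (case rp
    of Leaf s => s :: acc
     | Cat (l, r) => leaves l (leaves r acc))

(* Rebuild a balanced rope from a nonempty list of leaf sequences. *)
fun build [s] = Leaf s
  | build ss = let
      val half = List.length ss div 2
      val front = List.take (ss, half)
      val back = List.drop (ss, half)
      in
        Cat (build front, build back)
      end

fun balance rp = if isBalanced rp then rp else build (leaves rp [])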
5. Evaluation

We have already presented data showing that the performance of ETS is sensitive to the SST parameter. In this section, we present the results of additional experiments demonstrating that LTS performs as well as or better than ETS over a range of benchmarks and, furthermore, that LTS provides scalable performance without any application-specific tuning.

5.1 Benchmarks

For our empirical evaluation, we use six benchmark programs from our benchmark suite and one synthetic benchmark. Each benchmark is written in a pure, functional style; each was originally written by other researchers and ported to PML. All benchmarks use the same max leaf size (M = 256), which provides the best average performance over the programs in our benchmark suite.

The Barnes-Hut benchmark [BH86] is a classic N-body problem solver. Each iteration has two phases. In the first phase, a quadtree is constructed from a sequence of mass points. The second phase then uses this tree to accelerate the computation of the gravitational force on the bodies in the system. Our benchmark runs 20 iterations over 200,000 particles generated in a random Plummer distribution. Our version is a translation of a Haskell program [GHC].

The Raytracer benchmark renders a 256 × 256 image in parallel as a two-dimensional sequence, which is then written to a file. The original program was written in ID [Nik91] and is a simple ray tracer that does not use any acceleration data structures. The sequential version differs from the parallel code in that it outputs each pixel to the image file as it is computed, instead of building an intermediate data structure.

The Quicksort benchmark sorts a sequence of 1,000,000 integers in parallel. This code is based on the NESL version of the algorithm [Sca].

The SMVM benchmark is a sparse-matrix by dense-vector multiplication. The matrix contains 1,091,362 elements and the vector 16,614.

The DMM benchmark is a dense-matrix by dense-matrix multiplication in which each matrix is 100 × 100.

The Tree Rootfix benchmark takes as input a tree structure in which each node is annotated with a value and returns, for each node, the sum of the values on the path from the root of the tree down to that node. This code is based on the NESL version of the algorithm [Sca], and we use it to measure the performance of the scanP operation.

The Nested Sums benchmark is a synthetic benchmark that exhibits irregular parallelism. Its basic form is as follows:

let fun upTo i = range (0, i)
in
  mapP sumP (mapP upTo (range (0, 5999)))
end

5.2 Experimental method

Our test machine has four quad-core AMD Opteron 8380 processors running at 2.5GHz. Each core has a 512KB L2 cache and shares a 6MB L3 cache with the other cores of the processor. The system has 32GB of RAM and runs Debian Linux (kernel version 2.6.31.6-amd64). We ran each experiment 10 times and report the average performance results in our graphs and tables. For most of these experiments the standard deviation was below 2%, so we omit error bars from our plots.

5.3 Lazy vs. eager tree splitting

Our most important experimental results come from comparing LTS and ETS side by side. Figure 7 shows speedup curves for all seven of our benchmarks. For each graph, we plot the speedup curve (over sequential PML performance) of ETS with SST values of 1, 128, and 16384 and of LTS. We have argued that one of the main advantages of LTS over ETS is that LTS does not require tuning for each benchmark. These graphs show that LTS is better than most configurations of ETS, and that the downside of picking a poor SST value for ETS can be quite severe (e.g., Figure 7(b) with an SST of 128). They also show that not only is the best choice of SST for ETS dependent on the particular benchmark, but in some cases it is also dependent on the number of processors (e.g., Figure 7(a) and (f)).

With an optimal pick of SST value, ETS can outperform LTS because of lower overhead. In our experiments, we collected data for every SST ∈ {2^i | 0 ≤ i ≤ 14} and compared the best ETS performance against LTS for each benchmark on 16 processors. We found that, even when always choosing the best SST value for the given benchmark and number of processors, ETS was never more than 20% faster than LTS. In practice, it is impossible to make such precise and specialized tuning decisions a priori, since workloads and compute resources are unpredictable. Therefore, we believe that LTS provides a much better solution to the Goldilocks problem.

To address the question of why optimal ETS is faster than LTS, we collected profiling data for our benchmarks. This data shows that the per-processor utilization for ETS is never more than 3% greater than that of LTS, which is almost within our 2% error bar. Thus, we believe that the performance gap has to do with increased overhead rather than poorer scheduling. We also considered the possibility that rebalancing was the source of the performance gap, but our profiling data showed that the total time spent rebalancing is an insignificant fraction of the total program run time. Thus, we believe that the main source of this performance gap is the overhead of using a zipper to implement LTS (this point is discussed in further detail below).

In Table 1, we present performance measurements for our seven benchmarks run in several different sequential configurations, as well as on 16 processors.

PML Benchmark   MLton    Seq.     LTS      Par. 16   Speedup
Barnes Hut      7.71s    14.63s   20.62s   2.20s     6.64
Raytracer       2.29s    3.58s    3.54s    0.22s     16.15
Quicksort       1.36s    3.93s    5.61s    0.51s     7.77
SMVM            0.07s    0.15s    0.19s    0.02s     8.94
DMM             0.84s    3.49s    4.12s    0.30s     11.65
Tree Rootfix    3.79s    8.43s    10.44s   1.32s     6.38
Nested Sums     0.21s    1.46s    1.80s    0.14s     10.17

Table 1. The performance of LTS for seven benchmarks.

The first column of data presents timing results for MLton. MLton is a sequential whole-program optimizing compiler for Standard ML [MLt, Wee06], which is the "gold standard" for ML performance. The second data column gives the baseline performance of the natural sequential PML versions of the benchmarks (i.e., parallel operations are replaced with their natural sequential equivalents). We are about a factor of two slower than MLton for all of the benchmarks except DMM and Nested Sums. Considering MLton's suite of aggressive optimizations and its maturity, the sequential performance of PML is encouraging. Our slower performance can be attributed to at least two factors. First, the MLton compiler monomorphizes the program and then aggressively flattens the resulting monomorphic data representations. Since our ropes are polymorphic, we use a boxed representation for the array elements instead of an unboxed representation. Second, our profiling shows higher GC overheads in our system. We expect to address these issues as we improve our sequential performance.
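As a quick sanity check on Table 1's columns: Barnes Hut runs in 14.63s in the natural sequential configuration and in 2.20s on 16 processors, a speedup of 14.63/2.20 ≈ 6.6 (reported as 6.64), which corresponds to a parallel efficiency of 6.64/16 × 100 ≈ 42%.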
Figure 7. Comparison of lazy tree splitting (LTS) to eager tree splitting (ETS). Each panel plots speedup against number of processors (1–16) for LTS and for ETS with SST = 2^0, 2^7, and 2^14: (a) Barnes-Hut; (b) Raytracer; (c) Quicksort; (d) SMVM; (e) DMM; (f) Tree Rootfix; (g) Nested Sums.
The third data column reports the execution time of the benchmarks using the LTS runtime mechanisms (e.g., zippers), but without parallelism. By comparing these numbers with the natural sequential measurements, we get a measure of the overhead of the LTS mechanisms: on average, the LTS version is about 24% slower. We have determined through profiling that the main source of this overhead is not calls to hungryProcs or rebalancing. Instead, the primary source of the overhead is maintaining the traversal state via the zipper context. Such a strategy is less efficient than implicitly maintaining the state via the run-time call stack in a natural structural recursion.7

7 Our implementation uses heap-allocated continuations to represent the call stack [App92, FFR+07].

The last two columns report the parallel execution time and speedup on sixteen processors. Overall, the speedups are quite good. The super-linear speedup of the Raytracer is explained by a reduction in GC load per processor. This reduction happens because each processor has its own local heap, so the total size of the available heap increases with the number of processors. Our GC architecture is described in more detail elsewhere [FRR08]. The Barnes-Hut benchmark achieves a modest speedup, which we believe stems from a limit on the amount of parallelism in the program. This hypothesis is supported by the fact that increasing the problem size to 400,000 particles improves the speedup results. The DMM benchmark is 25–27% slower than a perfect speedup, which is also modest considering the large amount of parallelism available in the program. We attribute the slower performance on DMM to the overheads incurred by the LTS zipper traversal; observe that the sequential version of DMM that uses LTS is 20% slower than a similar version that does not.

There is still a question of whether our technique trades one tuning parameter (SST) for another, the max leaf size (M). We address this concern in two ways. First, observe that even if performance is sensitive to M, this sensitivity is a property of ropes, not of ETS or LTS in particular. Second, consider Figure 8, which shows, for each of our benchmark programs, the parallel efficiency as a function of M (parallel efficiency has the same meaning as in Figure 3). The results show all benchmarks performing well for M ∈ {512, 1024, 2048}. One concern is DMM, which is sensitive to M because it does many random-access operations on its two input ropes. One can reduce this sensitivity by using an alternative rope representation that provides more efficient random access.

Figure 8. The effect of varying max leaf size M (16 processors): parallel efficiency (0–100%) as a function of log2 M (0–14) for the Barnes Hut, DMM, Raytracer, Nested Sums, Quicksort, SMVM, and Tree Rootfix benchmarks.

6. Related work

Adaptive parallel loop scheduling   The original work on lazy binary splitting presents a dynamic scheduling approach for parallel do-all loops [TCBV10]. Their work addresses splitting ranges of indices, whereas ours addresses splitting trees, where tree nodes are represented as records allocated on the heap. The original LBS work uses a profitable parallelism threshold (PPT) to reduce the number of hungry-processor checks. The PPT is an integer that determines how many iterations a given loop can process before doing a hungry-processor check. Our performance study uses PPT = 1 (i.e., one hungry-processor check per iteration) because we have not implemented the compiler mechanisms necessary to do otherwise.

Robison et al. propose a variant of EBS called auto partitioning [RVK08], which offers good performance for many programs and does not require tuning.8 Auto partitioning derives some limited adaptivity by employing the heuristic that when a task detects that it has been migrated, it splits its chunk into at least some fixed number of subchunks. The assumption is that if a steal occurs, there are probably other processors that need work, and it is worthwhile to split a chunk further. As discussed by Tzannes et al. [TCBV10], auto partitioning has two limitations. First, for i levels of loop nesting, P processors, and a small, constant parameter K, it creates (K × P)^i chunks, which is excessive if the number of processors is large. Second, although it has some limited adaptivity, auto partitioning lacks performance portability with respect to the context of the loop, which limits its effectiveness for scheduling programs written in the liberal loop-nesting style of an NDP language.

8 Auto partitioning is currently the default chunking strategy of TBB [Int08].

Granularity control   Early work by Loidl and Hammond in the context of Haskell compared three strategies for deciding whether to create a thread for parallel work or to continue in sequence [LH95]. In simulation, they found that using a simple cut-off generated more speedup than more complicated strategies that dynamically determine whether to create a thread and which thread to run based on a priority associated with the function to run. This cut-off is a value based on a granularity-estimation function provided to the parallel primitives. They found, as we did, that speedup was highly dependent upon the cut-off value. Their approach differs from ours in that the cut-off value is statically provided to the runtime; they require a function that can report a granularity metric for the work to perform based on the function being called and the data computed upon. Notably, their work handles any divide-and-conquer algorithm, whereas our solution specifically addresses parallel map operations.

Tick and Zhong presented an approach using compile-time granularity analysis in concurrent logic programs [TZ93]. Their compiler creates a call graph,9 collapses all strongly connected components (mutually recursive functions), and then walks up the collapsed graph creating recurrence equations representing cost estimates. These recurrence equations are solved at compile time and used at run time for cost estimation of functions based on their dynamic inputs. This work does not discuss how these cost metrics are integrated into their scheduler, but it does provide an 85–91% accurate estimator of runtime costs for arbitrary functions across their suite of benchmarks. Their static analysis takes advantage of logic programming language features, but demonstrates a potentially more effective approach to determining a satisfactory PPT.

9 This language is not higher-order, which greatly simplifies the construction of the call graph.

Data parallelism   NESL is a nested data-parallel dialect of ML [BCH+94]. The NESL compiler uses a program transformation
called flattening, which transforms nested parallelism into a form of data parallelism that maps well onto SIMD architectures. Note that SIMD operations typically require array elements to have a contiguous layout in memory. Flattened code maps well onto SIMD architectures because the elements of flattened arrays are readily stored in adjacent memory locations. In contrast, LTS is a dynamic technique whose goal is to schedule nested parallelism effectively on MIMD architectures. A flattened program may still use LBS (or LTS) to schedule the execution of array operations on MIMD architectures, so in that sense, flattening and LTS are orthogonal. There is, as yet, no direct comparison between an NDP implementation based on LTS and an implementation based on flattening. One major difference is that LTS uses a tree representation whereas flattening uses contiguous arrays. As such, the LTS representation has two major disadvantages. First, random access is costlier for a tree: for a rope, it takes O(log n) time, where n is the length of the rope. Second, there is a large constant-factor overhead imposed by maintaining tree nodes. One way to reduce these costs is to use a "bushy" representation that is similar to ropes but where the branching factor is greater than two and child pointers are stored in contiguous arrays.

The NESL backend written by Chatterjee [Cha93] and Data Parallel Haskell [CLPK08] perform fusion of parallel operations in order to increase granularity. We do not currently implement such transformations. Fusion reduces the work per element of a data-parallel operation, but it does not affect the coarsening of the iterations within the operation. Such fusion techniques are orthogonal to LTS.

Narlikar and Blelloch present a parallel depth-first (PDF) scheduler that is designed to minimize space usage [NB99]. Later work by Greiner and Blelloch proposes an implementation of NDP based on the PDF scheduler [BG96]. The PDF schedule is a greedy schedule based on the depth-first traversal of the parallel execution graph. The PDF schedule stays as close to the sequential schedule as possible, in the sense that the scheduler only goes ahead of the sequential schedule when it is limited by data dependencies. In contrast, the work-stealing approach used by LTS has each processor performing an independent depth-first traversal of its own portion of the parallel execution graph. The work on space-efficient scheduling does not address the issue of building an automatic chunking strategy, which is the main contribution of LTS. In their performance study, Narlikar and Blelloch coarsen loops manually in order to obtain scalable parallel performance; LTS finds good chunk sizes automatically, without programmer assistance.

Ct is an NDP extension to C++ [GSF+07]. So et al. describe a fusion technique for Ct that is similar to the fusion technique of DPH [SGW06]. The fusion technique used by Ct is orthogonal to LTS for the same reasons as the fusion technique of DPH. The work on Ct does not directly address the issue of building an automatic chunking strategy, which is the main contribution of LTS.

GpH   GpH introduced the notion of an "evaluation strategy" [THLP98], a part of a program that is dedicated to controlling some aspects of parallel execution. Strategies have been used to implement eager-splitting-like chunking for parallel computations. We believe that a mechanism like an evaluation strategy could be used to build a clean implementation of lazy tree splitting in a lazy functional language.

Cilk   Cilk is a parallel dialect of the C language extended with linguistic constructs for expressing fork-join parallelism [FLR98]. Cilk is designed for parallel function calls but not loops, whereas our approach addresses both.

7. Discussion

The main idea of lazy splitting is to maintain some extra information so that it is always possible to spawn off half of the remaining work. This paper presents an instantiation of this idea for operations that produce and consume ropes. Although the main idea has the potential to be adapted to a larger class of divide-and-conquer programs, we believe at least three substantial challenges must be met before this goal can be achieved. The first challenge is to support other tree representations, such as red-black trees; specifically, one must derive efficient traversal patterns that preserve the invariants of such structures. Second, LTS programs involve zippers, which are an implementation detail. Are there general techniques to derive LTS implementations automatically from more natural specifications? For example, is there a mechanical process for deriving LTS programs (e.g., mapLTS) from structurally recursive programs (e.g., mapStructural)? One possible approach is to use a static analysis to identify divide-and-conquer recursive functions and then apply a program transformation to generate analogous lazy-splitting versions. Third, there is a need for general techniques to aggregate work for small problem sizes (rope leaves effectively provide this mechanism in the system described here). Failure to provide such techniques will result in excessive overhead and limited scalability.

The splitting strategy used by LBS and by our LTS can cause unnecessary splitting. To understand why, observe that splitting is prone to start at the innermost loops and then work its way out to the outer loops, as discussed at the end of Section 3. Having the thief split the outermost loop is more efficient because the outer iterations usually contain the most work. Our current implementation uses innermost splitting for two reasons. First, supporting outermost splitting would require special support from the language implementation, as splitting the outermost loop would involve modifying a part of the whole continuation, not just a part of the continuation of the current loop. Second, in our empirical study, we observed for each benchmark that the total number of splits stayed in the low hundreds; since steals are extremely fast on our test machine, a few extra steals made little difference. We expect that an implementation based on outermost splitting would be superior on larger machines.

8. Conclusion

We have described the implementation of NDP features in the Manticore system. We have also presented a new technique for parallel decomposition, lazy tree splitting, inspired by the lazy binary splitting technique for parallel loops. We presented an efficient implementation of LTS over ropes, making novel use of the zipper technique to enable the necessary traversals. Our techniques can be readily adapted to tree data structures other than ropes and are not limited to functional languages; a work-stealing thread scheduler is the only special requirement of our technique.

LTS compares favorably to ETS, requiring no application-specific or machine-specific tuning. For any given benchmark, LTS outperforms most or all configurations of ETS and is, at worst, only 20% slower than the optimally tuned ETS configuration. Since, in general, optimal tuning of ETS for arbitrary programs and computational resources is not possible, we believe that LTS is a superior implementation technique. The ability of LTS to achieve good parallel performance without application-specific tuning is very promising.

Acknowledgments

We would like to thank the anonymous referees for their helpful suggestions and the National Science Foundation for its support under Grants CCF-0811389, CCF-0811419, and CCF-1010568. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of these organizations or the U.S. Government.
References

[App89] Appel, A. W. Simple generational garbage collection and fast allocation. SP&E, 19(2), 1989, pp. 171–183.
[App92] Appel, A. W. Compiling with Continuations. Cambridge University Press, Cambridge, England, 1992.
[BAP95] Boehm, H.-J., R. Atkinson, and M. Plass. Ropes: an alternative to strings. SP&E, 25(12), December 1995, pp. 1315–1330.
[BCH+94] Blelloch, G. E., S. Chatterjee, J. C. Hardwick, J. Sipelstein, and M. Zagha. Implementation of a portable nested data-parallel language. JPDC, 21(1), 1994, pp. 4–14.
[BG96] Blelloch, G. E. and J. Greiner. A provable time and space efficient implementation of NESL. In ICFP '96. ACM, May 1996, pp. 213–225.
[BH86] Barnes, J. and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324, December 1986, pp. 446–449.
[BL99] Blumofe, R. D. and C. E. Leiserson. Scheduling multithreaded computations by work stealing. JACM, 46(5), 1999, pp. 720–748.
[Ble90a] Blelloch, G. E. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, November 1990.
[Ble90b] Blelloch, G. E. Vector Models for Data-Parallel Computing. MIT Press, Cambridge, MA, USA, 1990.
[Ble96] Blelloch, G. E. Programming parallel algorithms. CACM, 39(3), March 1996, pp. 85–97.
[BS81] Burton, F. W. and M. R. Sleep. Executing functional programs on a virtual tree of processors. In FPCA '81. ACM, October 1981, pp. 187–194.
[Cha93] Chatterjee, S. Compiling nested data-parallel programs for shared-memory multiprocessors. ACM TOPLAS, 15(3), July 1993, pp. 400–462.
[CLP+07] Chakravarty, M. M. T., R. Leshchinskiy, S. Peyton Jones, G. Keller, and S. Marlow. Data Parallel Haskell: A status report. In DAMP '07. ACM, January 2007, pp. 10–18.
[CLPK08] Chakravarty, M. M. T., R. Leshchinskiy, S. Peyton Jones, and G. Keller. Partial vectorisation of Haskell programs. In DAMP '08. ACM, January 2008.
[FFR+07] Fluet, M., N. Ford, M. Rainey, J. Reppy, A. Shaw, and Y. Xiao. Status report: The Manticore project. In ML '07. ACM, October 2007, pp. 15–24.
[FLR98] Frigo, M., C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In PLDI '98, June 1998, pp. 212–223.
[FRR+07] Fluet, M., M. Rainey, J. Reppy, A. Shaw, and Y. Xiao. Manticore: A heterogeneous parallel language. In DAMP '07. ACM, January 2007, pp. 37–44.
[FRR08] Fluet, M., M. Rainey, and J. Reppy. A scheduling framework for general-purpose parallel languages. In ICFP '08, Victoria, BC, Canada, September 2008. ACM, pp. 241–252.
[FRRS08] Fluet, M., M. Rainey, J. Reppy, and A. Shaw. Implicitly-threaded parallelism in Manticore. In ICFP '08, Victoria, BC, Canada, September 2008. ACM, pp. 119–130.
[GHC] GHC. Barnes Hut benchmark written in Haskell. Available from http://darcs.haskell.org/packages/ndp/examples/barnesHut/.
[GSF+07] Ghuloum, A., E. Sprangle, J. Fang, G. Wu, and X. Zhou. Ct: A flexible parallel programming model for tera-scale architectures. Technical report, Intel, October 2007. Available at http://techresearch.intel.com/UserFiles/en-us/File/terascale/Whitepaper-Ct.pdf.
[Hal84] Halstead Jr., R. H. Implementation of Multilisp: Lisp on a multiprocessor. In LFP '84. ACM, August 1984, pp. 9–17.
[HP06] Hinze, R. and R. Paterson. Finger trees: a simple general-purpose data structure. JFP, 16(2), 2006, pp. 197–217.
[Hue97] Huet, G. The zipper. JFP, 7(5), 1997, pp. 549–554.
[Int08] Intel. Intel Threading Building Blocks Reference Manual, 2008.
[Kel99] Keller, G. Transformation-based Implementation of Nested Data Parallelism for Distributed Memory Machines. Ph.D. dissertation, Technische Universität Berlin, Berlin, Germany, 1999.
[Lei09] Leiserson, C. E. The Cilk++ concurrency platform. In DAC '09, San Francisco, California, 2009. ACM, pp. 522–527.
[Les05] Leshchinskiy, R. Higher-Order Nested Data Parallelism: Semantics and Implementation. Ph.D. dissertation, Technische Universität Berlin, Berlin, Germany, 2005.
[LH95] Loidl, H. W. and K. Hammond. On the granularity of divide-and-conquer parallelism. In GWFP '95. Springer-Verlag, 1995, pp. 8–10.
[McB08] McBride, C. Clowns to the left of me, jokers to the right (pearl): dissecting data structures. In POPL '08. ACM, January 2008, pp. 287–295.
[MLt] MLton. The MLton Standard ML compiler. Available at http://mlton.org.
[MTHM97] Milner, R., M. Tofte, R. Harper, and D. MacQueen. The Definition of Standard ML (Revised). The MIT Press, Cambridge, MA, 1997.
[NB99] Narlikar, G. J. and G. E. Blelloch. Space-efficient scheduling of nested parallelism. ACM TOPLAS, 21(1), 1999, pp. 138–173.
[Nik91] Nikhil, R. S. ID Language Reference Manual. Laboratory for Computer Science, MIT, Cambridge, MA, July 1991.
[Rai07] Rainey, M. The Manticore runtime model. Master's dissertation, University of Chicago, January 2007. Available from http://manticore.cs.uchicago.edu.
[Rai09] Rainey, M. Prototyping nested schedulers. In M. Felleisen, R. Findler, and M. Flatt (eds.), Semantics Engineering with PLT Redex. MIT Press, 2009.
[RVK08] Robison, A., M. Voss, and A. Kukanov. Optimization via reflection on work stealing in TBB. In IPDPS '08. IEEE Computer Society Press, 2008.
[Sca] Scandal Project. A library of parallel algorithms written in NESL. Available from http://www.cs.cmu.edu/~scandal/nesl/algorithms.html.
[SGW06] So, B., A. Ghuloum, and Y. Wu. Optimizing data parallel operations on many-core platforms. In STMCS '06, 2006.
[TCBV10] Tzannes, A., G. C. Caragea, R. Barua, and U. Vishkin. Lazy binary-splitting: a run-time adaptive work-stealing scheduler. In PPoPP '10, Bangalore, India, January 2010. ACM, pp. 179–190.
[THLP98] Trinder, P. W., K. Hammond, H.-W. Loidl, and S. L. Peyton Jones. Algorithm + strategy = parallelism. JFP, 8(1), January 1998, pp. 23–60.
[TZ93] Tick, E. and X. Zhong. A compile-time granularity analysis algorithm and its performance evaluation. In FGCS '92, Tokyo, Japan, 1993. Springer-Verlag, pp. 271–295.
[Wee06] Weeks, S. Whole-program compilation in MLton. Invited talk at ML '06 Workshop, September 2006. Slides available at http://mlton.org/pages/References/attachments/060916-mlton.pdf.
Semantic Subtyping with an SMT Solver

Gavin M. Bierman (Microsoft Research)   Andrew D. Gordon (Microsoft Research)   Cătălin Hrițcu (Saarland University)   David Langworthy (Microsoft Corporation)
Abstract

We study a first-order functional language with the novel combination of the ideas of refinement type (the subset of a type satisfying a Boolean expression) and type-test (a Boolean expression testing whether a value belongs to a type). Our core calculus can express a rich variety of typing idioms; for example, intersection, union, negation, singleton, nullable, variant, and algebraic types are all derivable. We formulate a semantics in which expressions denote terms and types are interpreted as first-order logic formulas. Subtyping is defined as valid implication between the semantics of types. The formulas are interpreted in a specific model that we axiomatize using standard first-order theories. On this basis, we present a novel type-checking algorithm able to eliminate many dynamic tests and to detect many errors statically. The key idea is to rely on an SMT solver to compute subtyping efficiently. Moreover, interpreting types as formulas allows us to call the SMT solver at run time to compute instances of types.

Categories and Subject Descriptors F.3.3 [Logics and Meanings of Programs]: Studies of Program Constructs—Type structure; D.3.1 [Programming Languages]: Formal Definitions and Theory—Semantics; F.3.2 [Logics and Meanings of Programs]: Semantics of Programming Languages—Denotational semantics; Operational semantics; Program analysis

General Terms Languages, Theory, Verification

1. Introduction

This paper studies first-order functional programming in the presence of both refinement types (types qualified by Boolean expressions) and type-tests (Boolean expressions testing whether a value belongs to a type). The novel combination of type-test and refinement types appears in a recent commercial functional language, code-named M [1], whose types correspond to relational schemas and whose expressions compile to SQL queries. Refinement types are used to express SQL table constraints within a type system, and type-tests are useful for processing relational data, for example, by discriminating dynamically between different forms of union types. Still, although useful and extremely expressive, the combination of type-test and refinement is hard to type-check using conventional syntax-driven subtyping rules. The preliminary implementation of M uses such subtyping rules and has difficulty with certain sound idioms (such as uses of singleton and union types). Hence, type safety is enforced by dynamic checks, or not at all. This paper studies the problem of type-checking code that uses type-tests and refinements via a core calculus, named Dminor, whose syntax is a small subset of M, and which is expressive enough to encode all the essential features of the full M language. In the remainder of this section, we elaborate on the difficulties of type-checking Dminor (and hence M), and outline our solution, which is to use semantic subtyping rather than syntactic rules.

1.1 Programming with Type-Test and Refinement

The core types of Dminor are structural types for scalars, unordered collections, and records. (Following the database orientation of M, we refer to records as entities.) We write S <: T for the subtype relation, which means that every value of type S is also of type T. Two central primitives of Dminor are the following:

• A refinement type, (x : T where e), consists of the values x of T satisfying the Boolean expression e.
• A type-test expression, e in T, returns true or false depending on whether or not the value of e belongs to type T.

As we shall see, many types are derivable from these primitive constructs and their combination. For example, the singleton type [v], which contains just the value v, is derived as the refinement type (x : Any where x == v), where Any is the type of all values. The union type T | U, which contains the values of T together with the values of U, is derived as (x : Any where (x in T) || (x in U)).

Here is a snippet from a typical Dminor (and M) program for processing a DSL, a language of while-programs. The type is a union of different sorts of statements, each of which is an entity with a kind field of singleton type. (The snippet relies on an omitted, but similar, recursive type of arithmetic expressions.)

type Statement =
    {kind:["assignment"]; var: Text; rhs: Expression;}
  | {kind:["while"]; test:Expression; body:Statement;}
  | {kind:["if"]; test:Expression; tt:Statement; ff:Statement;}
  | {kind:["seq"]; s1:Statement; s2:Statement;}
  | {kind:["skip"];};

In languages influenced by HOPE [10], such as ML and Haskell, we would use the built-in notion of algebraic type to represent such statements. But like many data formats, including relational databases, S-expressions, and JavaScript Object Notation (JSON) [11], the data structures of M and Dminor do not take as primitive the idea of data tagged with data constructors. Instead, we need to follow an idiom such as the one shown above: taking the union of entity types that include kind fields of distinct singleton types. If y has type Statement, we may process such data as follows:

((y.kind == "assignment") ? y.var : "NotAssign")

Intuitively, this code is type-safe because it checks the kind field before accessing the var field, which is only present for assignment
statements. More precisely, to type-check the then-branch y.var against type Text, we have y : Statement (i.e., a union type encoded using refinements and type-tests), know that y.kind == "assignment", and need to decide [y] <: {var : Text;}. Subtyping should succeed, but deciding it clearly requires relatively sophisticated symbolic computation, including case analysis and propagation of equations. This is a typical example where syntax-driven rules for refinements and type-test are inadequate; indeed, this simple example cannot be checked statically by the preliminary release of M. Our proposal is to delegate the hard work to an external prover.
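Concretely, anticipating §1.3: S <: T holds just when the implication ∀x. F[[S]](x) =⇒ F[[T]](x) is valid, and an SMT solver checks such validity by testing the negation, F[[S]](x) ∧ ¬F[[T]](x), for unsatisfiability (the standard validity/unsatisfiability duality).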
(2) Development of the theory, including both a declarative type assignment relation, and algorithmic rules in the bidirectional style. Our correctness results cover the core type assignment relation, the bidirectional rules, the algorithmic purity check, and some logical optimizations.
1.2
(4) Devising a systematic way to use the models produced by the SMT solver as evidence of satisfiability in order to provide precise counterexamples to typing, detect empty types and generate instances of types. The latter enables a new form of declarative constraint programming, where constraints arise from the interpretation of a type as a formula.
(3) An implementation based on checking semantic subtyping by constructing proof obligations for an external SMT solver. The proof obligations are interpreted in a model that is formalized in Coq and axiomatized using standard first-order theories (integers, datatypes and extensional arrays).
An Opportunity: SMT as a Platform
Over the past few years, there has been tremendous progress in the field of Satisfiability Modulo Theories (SMT), that is, for (fragments of) first-order logic plus various standard theories such as equality, real and integer (linear) arithmetic, bit vectors, and (extensional) arrays. Some of the leading systems include CVC3 [5], Yices [17], and Z3 [13]. There are common input formats such as Simplify’s [15] unsorted S-expression syntax and the SMT-LIB standard [36] for sorted logic. Hence, first-order logic with standard theories is emerging as a computing platform. Software written to generate problems in a standard format can rely on a wide range of back-end solvers, which get better over time due in part to healthy competition,1 and which may even be run in parallel when sufficient cores are available. There are limitations, of course, as firstorder validity is undecidable even without any theories, so solvers may fail to terminate within a reasonable time, but recent progress has been remarkable. 1.3
1.3 Semantic Subtyping with an SMT Solver

The central idea in this paper is a type-checking algorithm for Dminor that is based on deciding subtyping by invoking an external SMT solver. To decide whether S is a subtype of T, we construct first-order formulas F[[S]](x) and F[[T]](x), which hold when x belongs to the type S and the type T, respectively, and ask the solver whether the formula F[[S]](x) =⇒ F[[T]](x) is valid, given any additional constraints known from the typing environment. This technique is known as semantic subtyping [2, 22], as opposed to the more common alternative, syntactic subtyping, which is to define syntax-driven rules for checking subtyping [34]. The idea of using an external solver for type-checking with refinement types is not new. Several recent type-checkers for functional languages, such as SAGE [20, 26], F7 [6], and Dsolve [38], rely on various SMT solvers. However, these systems all rely on syntactic subtyping, with the solver being used as a subroutine to check constraints during subtyping. To the best of our knowledge, our proposal to implement semantic subtyping by calling an external SMT solver is new. Semantic subtyping nicely exploits the solver's knowledge of first-order logic and the theory of equality; for example, we represent union and intersection types as logical disjunctions and conjunctions, which are efficiently manipulated by the solver. Hence, we avoid the implementation effort of explicit propagation of known equality constraints, and of syntax-driven rules for union and intersection types [16]. Moreover, we exploit the theories of extensional arrays [14], integer arithmetic, and algebraic datatypes.
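As a small illustration of the idea (ours, not the paper's implementation), consider two refinement types over Integer, S = (x : Integer where 0 <= x && x < 10) and T = (x : Integer where x >= 0). Hypothetical first-order encodings of the two types and the subtyping query S <: T can be phrased directly with Z3's Python bindings:

  from z3 import Int, Solver, And, Not, Implies, unsat

  x = Int('x')
  F_S = And(x >= 0, x < 10)   # stands in for F[[S]](x)
  F_T = x >= 0                # stands in for F[[T]](x)

  s = Solver()
  s.add(Not(Implies(F_S, F_T)))   # S <: T iff the implication is valid
  print("S <: T" if s.check() == unsat else "not a subtype")

Union and intersection types simply become Or and And in the encoded formulas, which is why the solver handles them with no extra implementation effort.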
1.4 Contributions of the Paper

(1) Investigation of semantic subtyping for a core functional language with both refinement types and type-test expressions (a novel combination, as far as we know). We are surprised that so many typing constructs are derivable from this combination.

(3) An implementation based on checking semantic subtyping by constructing proof obligations for an external SMT solver. The proof obligations are interpreted in a model that is formalized in Coq and axiomatized using standard first-order theories (integers, datatypes, and extensional arrays).
1.5 Structure of the Paper

§2 describes the formal syntax of Dminor together with a small-step operational semantics, e → e′, where e and e′ are expressions. We encode a series of type idioms to illustrate the expressiveness of the language and its type system. §3 presents a logical semantics of pure expressions (those without side-effects) and Dminor types; each pure expression e is interpreted as a term R[[e]] and each type T is interpreted as a first-order logic formula F[[T]](t). The formulas are interpreted in a specific model that we have formalized in Coq. Theorem 1 is a full abstraction result: two pure expressions have the same logical semantics just when they are operationally equivalent. We describe how to show purity of expressions using a syntactic termination restriction together with a confluence check that relies on the logical semantics. Theorem 2 shows that our algorithmic purity check is indeed a sufficient condition for purity. §4 presents the declarative type system for Dminor. The type assignment relation has the form E ⊢ e : T, meaning that expression e has type T given typing environment E. Theorem 3 concerns logical soundness of type assignment: if e is assigned type T then formula F[[T]](R[[e]]) holds. Progress and preservation results (Theorems 4 and 5) relate type assignment to the operational semantics, entailing that well-typed expressions cannot go wrong. §5 develops additional theory to justify our implementation techniques. First, we present simpler variations of the translations R[[e]] and F[[T]](t), optimized by the observation that during type-checking we only interpret well-typed expressions, and so we need not track error values. Theorem 6 shows soundness of this optimization. Second, since the declarative rules of §4 are not directly algorithmic, we propose type checking and synthesis algorithms, presented as bidirectional rules. Theorem 7 shows these are sound with respect to type assignment. §6 shows how to use the models produced by the SMT solver to provide very precise counterexamples when type-checking fails, and to find inhabitants of types statically or dynamically. §7 reports some details of our implementation. We survey related work in §8, before concluding in §9. A technical report [8] contains additional details and proofs.

2. Syntax and Operational Semantics

Dminor is a strict first-order functional language whose data includes scalars, entities, and collections; it has no mutable state, and its only side-effects are non-termination and non-determinism. This section describes: (1) the syntax of expressions, types, and global function definitions; (2) the operational semantics; (3) the definition of pure expressions (those without side-effects); and (4) some encodings to justify our expressiveness claims.
The following example introduces the basic syntax of Dminor. An accumulate expression is a fold over an unordered collection; to evaluate from x in e1 let y = e2 accumulate e3, we first evaluate e1 to a collection v, evaluate e2 to an initial value u0, and then compute a series of values ui for i ∈ 1..n, by setting ui to the value of e3{vi/x}{ui−1/y}, and eventually return un, where v1, ..., vn are the items in the collection v, in some arbitrary order.

NullableInt ≜ Integer | [null]
removeNulls(xs : NullableInt∗) : Integer∗ {
  from x in xs
  let a = ({} : Integer∗)
  accumulate (x != null) ? (x :: a) : a
}

The type NullableInt is defined as the union of Integer with the singleton type containing only the value null. We then define a function removeNulls that iterates over its input collection and removes all null elements. As expected, executing removeNulls({1, null, 42, null}) produces {1, 42} (which denotes the same collection as {42, 1}). Given that the collection xs contains elements of type NullableInt (xs : NullableInt∗), that x is an element of xs, and the check that x != null, our type-checking algorithm infers that on the true branch x : Integer, and therefore the result of the comprehension is Integer∗, as declared by the function. If we remove the check that x != null, and copy all elements with x :: a, then type-checking fails, as expected.
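The semantics of accumulate can be pictured as an ordinary fold over the items of the collection taken in an arbitrary order. The following Python sketch (our rendering, modelling multisets as lists) mirrors the removeNulls example:

  import random

  def accumulate(collection, init, body):
      # from x in collection let y = init accumulate body(x, y):
      # fold body over the items in some arbitrary order
      items = list(collection)
      random.shuffle(items)
      acc = init
      for x in items:
          acc = body(x, acc)
      return acc

  xs = [1, None, 42, None]
  ints = accumulate(xs, [], lambda x, a: [x] + a if x is not None else a)
  print(sorted(ints))  # [1, 42]: any processing order yields the same multiset

Because the body only adds elements, the order of the fold is irrelevant, which is exactly the property the purity check of §3.1 enforces.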
2.1 Expressions and Types

We observe the following syntactic conventions. We identify all phrases of syntax (such as types and expressions) up to consistent renaming of bound variables. For any phrase of syntax φ we write φ{v/x} for the outcome of a capture-avoiding substitution of v for each free occurrence of x in φ. We write fv(φ) for the set of variables occurring free in φ. We assume some base types for integers, strings, and logical values, together with constants for each of these types, as well as a null value. We also assume an assortment of primitive operators; they are all binary apart from negation !, which is unary.

Scalar Types, Constants, and Operators:
G ::= Integer | Text | Logical                   scalar type
K(Integer) = {i | integer i}
K(Text) = {s | string s}
K(Logical) = {true, false}
c ∈ K(Integer) ∪ K(Text) ∪ K(Logical) ∪ {null}   scalar constants
⊕ ∈ {+, −, ×, <, >, ==, !, &&, ||}               primitive operators

Syntax of Expressions:
e ::=                                    expression
  x                                      variable
  c                                      scalar constant
  ⊕(e1, ..., en)                         operator application
  e1 ? e2 : e3                           conditional
  let x = e1 in e2                       let-expression (scope of x is e2)
  e in T                                 type-test
  {ℓi ⇒ ei i∈1..n}                       entity (ℓi distinct)
  e.ℓ                                    field selection
  {v1, ..., vn}                          collection (multiset)
  e1 :: e2                               adding element e1 to collection e2
  from x in e1 let y = e2 accumulate e3  iteration over collection (scope of x and y is e3)
  f(e1, ..., en)                         function application

Variables, constants, operators, conditionals, and let-expressions are standard. When ⊕ is binary, we often write e1 ⊕ e2 instead of ⊕(e1, e2). A type-test, e in T, returns a boolean to indicate whether or not the value of e inhabits the type T. The accumulate primitive can encode all the usual operations on collections: counting the number of elements or the occurrences of a certain element, checking membership, removing duplicates and elements, multiset union and difference, as well as LINQ [30] queries and comprehensions in the style of the nested relational calculus [9]. The precise definitions are in the technical report. To complete the syntax of Dminor, we interpret types and expressions in the context of a fixed collection of first-order, dependently-typed, potentially recursive function definitions. We assume for each expression f(e1, ..., en) in a source program that there is a corresponding function definition for f with arity n.

Function Definitions: f(x1 : T1, ..., xn : Tn) : U {e}
We assume a finite, global set of function definitions, each of which associates a function name f with a dependent signature x1 : T1, ..., xn : Tn → U, formal parameters x1, ..., xn, and a body e, such that fv(e) ⊆ {x1, ..., xn} and fv(U) ⊆ {x1, ..., xn}.
A value may be a simple value (an integer, string, boolean, or null), a collection (a finite multiset of values), or an entity (a finite set of fields, each consisting of a value with a distinct label).

Syntax of Values:
v ::=                     value
  c                       scalar (or simple value)
  {v1, ..., vn}           collection (multiset; unordered)
  {ℓi ⇒ vi i∈1..n}        entity (ℓi distinct)

We identify values u and v, and write u = v, when they are identical up to reordering the items within collections or entities. Although collections are unordered, ordered lists can be encoded using nested entities (see §2.4).

Syntax of Types:
S, T, U ::=               type
  Any                     the top type
  G                       scalar type
  T∗                      collection type
  {ℓ : T}                 (single) entity type
  (x : T where e)         refinement type (scope of x is e)

All values have type Any, the top type. The values of a scalar type G are the scalars in the set K(G) defined above. The values of type T∗ are collections of values of type T. The values of type {ℓ : T} are entities with (at least) a field ℓ holding values of type T. (We show in §2.4 how to define multi-field entity types as a form of intersection type.) Finally, the values of a refinement type (x : T where e) are the values v of type T such that the boolean expression e{v/x} returns true.

2.2 Operational Semantics

We define a nondeterministic, potentially divergent, small-step reduction relation e → e′, together with a standard notion of expressions going wrong, to be prevented by typing. Each primitive operator is a partial function represented by a set of equations ⊕(v1, ..., vn) ↦ v0 where each vi is a value. The == operator implements syntactic equality, which for collections and entities is up to reordering of elements. Apart from ==, the other operators only act on scalar values.

Reduction Contexts:
R ::=                                           reduction context
  ⊕(v1, ..., vj−1, •, ej+1, ..., en)
  • ? e2 : e3  |  let x = • in e2  |  • in T
  {ℓi ⇒ vi i∈1..j−1, ℓj ⇒ •, ℓi ⇒ ei i∈j+1..n}
  •.ℓ  |  • :: e  |  v :: •
  from x in • let y = e2 accumulate e3
  f(v1, ..., vj−1, •, ej+1, ..., en)
Reduction Rules for Standard Constructs:
e → e′ =⇒ R[e] → R[e′]
⊕(v1, ..., vn) → v    if ⊕(v1, ..., vn) ↦ v defined
true ? e2 : e3 → e2
false ? e2 : e3 → e3
let x = v in e2 → e2{v/x}
{ℓi ⇒ vi i∈1..n}.ℓj → vj    where j ∈ 1..n
v :: {v1, ..., vn} → {v1, ..., vn, v}
from x in {v1, ..., vn} let y = e2 accumulate e3 →
  let y = e2 in let y = e3{v1/x} in ... let y = e3{vn/x} in y
f(v1, ..., vn) → e{v1/x1} ... {vn/xn}    given function definition f(x1 : T1, ..., xn : Tn) : U {e}
Reduction Rules for Type-Test:
v in Any → true
v in G → true    if v ∈ K(G), false otherwise
v in {ℓj : Tj} → vj in Tj    if v = {ℓi ⇒ vi i∈1..n} ∧ j ∈ 1..n, false otherwise
v in T∗ → v1 in T && ... && vn in T    if v = {v1, ..., vn}, false otherwise
v in (x : T where e) → v in T && e{v/x}
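The type-test rules suggest a direct recursive checker. The sketch below (our illustration; types are modelled as tagged tuples, refinements as Python predicates standing in for pure Dminor expressions) evaluates v in T:

  def in_type(v, T):
      tag = T[0]
      if tag == 'Any':
          return True
      if tag == 'Integer':
          return isinstance(v, int) and not isinstance(v, bool)
      if tag == 'Coll':        # T[1] is the element type
          return isinstance(v, list) and all(in_type(x, T[1]) for x in v)
      if tag == 'Entity':      # T[1] is the label, T[2] the field type
          return isinstance(v, dict) and T[1] in v and in_type(v[T[1]], T[2])
      if tag == 'Refine':      # (x : T[1] where T[2])
          return in_type(v, T[1]) and T[2](v) is True
      raise ValueError(tag)

  nat = ('Refine', ('Integer',), lambda x: x >= 0)
  print(in_type([1, 2, 3], ('Coll', nat)))   # True
  print(in_type([1, -2], ('Coll', nat)))     # False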
The reduction rules for type-test expressions, e in U, first reduce e to a value v and then proceed by case analysis on the structure of the type U. In case U is a refinement type (x : T where e), then v is a value of U if and only if v is a value of type T and e{v/x} reduces to the value true. Nondeterminism arises from the reduction rule for accumulate expressions. Since collections are unordered, the rule applies for any permutation of {v1, ..., vn}. For example, consider the expression pick v1 v2 ≜ from x in {v1, v2} let y = null accumulate x; we have both pick true false →∗ true and pick true false →∗ false. Next, we use reduction to define an evaluation relation, which relates a closed expression to its return values, or to Error, in case reduction gets stuck before reaching a value.

Stuckness, Results, and Evaluation: e ⇓ r for closed e
Let e be stuck if and only if e is not a value and ¬∃e′. e → e′.
r ::= Error | Return(v)    results of evaluation
e ⇓ Return(v) if and only if e →∗ v
e ⇓ Error if and only if there is e′ such that e →∗ e′ and e′ is stuck.

Let closed expression e go wrong if and only if e ⇓ Error. For example, we have that stuck ⇓ Error, where stuck ≜ {}.ℓ for some label ℓ. In the presence of type-test and refinement types, expressions can go wrong in unusual ways. For example, given the refinement type T = (x : Any where stuck), any type-test v in T goes wrong. The main goal of our type system is to ensure that no closed well-typed expression goes wrong.

2.3 Pure Expressions and Refinement Types

A problem in languages with refinement types (x : T where e) is that the refinement expression e, even though well-typed, may have effects, such as non-termination or non-determinism, and so make no sense as a boolean condition. In Dminor, calls to recursive functions can cause divergence, and since collections are unordered, iterating over them with accumulate may be nondeterministic, as above. To address this problem, we define the set of pure expressions, the ones that may be used as refinements. The details, below, are a little technical, but the gist is that pure expressions must be terminating, have a unique result (which may be Error), and must only call functions whose bodies are pure. The typing rule (Type Refine) in §4 requires that for (x : T where e) to be well-formed, the expression e must be pure and of type Logical (which guarantees that e yields true or false without getting stuck). Checking for purity is undecidable, but we present sufficient conditions for checking purity algorithmically, in §3.1. We assume that a subset of the function definitions are labeled-pure; we intend that only these functions may be called from pure expressions. Let an expression e be terminating if and only if there exists no unbounded sequence e → e1 → e2 → .... Let a closed expression e be pure if and only if (1) e is terminating, (2) there exists a unique result r such that e ⇓ r, (3) for every subexpression f(e1, ..., en) of e, the function f is labeled-pure, and (4) all subexpressions of e are pure. Let an arbitrary expression e be pure if and only if eσ is pure for all closing substitutions σ that assign a value to each free variable in e. Finally, we require that the body of every labeled-pure function is a pure expression.
2.4 Derived Types

We end this section by exploring the expressiveness of the primitive types introduced above, and in particular of the combination of refinement types and dynamic type-test. We show that the range of derivable types is rather wide. We begin with some basic examples.

Encoding of Empty and Singleton Types:
Empty ≜ (x : Any where false)
[e] ≜ (x : Any where x == e)    (e pure, x ∉ fv(e))

The type Empty has no elements; it is a subtype of all other types. The singleton type, [e], contains only the value of pure expression e (for example, type [null] consists just of the null value). Our calculus includes the operators of propositional logic on boolean values. We lift these operators to act on types as follows.

Encoding of Union, Intersection, and Negation Types:
T | U ≜ (x : Any where (x in T) || (x in U))    x ∉ fv(T, U)
T & U ≜ (x : Any where (x in T) && (x in U))
!T ≜ (x : Any where !(x in T))

A value of the union type, T | U, is a value of T or of U. A value of the intersection type, T & U, is a value of both T and U. A value of the negation type, !T, is a value that is not a value of T. Next, we define the types of simple values, collections, and entities. We rely on the primitive types Integer, Text, and Logical, the primitive type constructor T∗ for collections, and the fact that every proper value is either a scalar, a collection, or an entity: so the type of entities is the complement of the union type General | Collection.

Encoding of Supertypes:
General ≜ Integer | Text | Logical | [null]
Collection ≜ Any∗
Entity ≜ !(General | Collection)
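Reusing the in_type sketch from §2.2, the derived connectives amount to one-line refinements (again our illustration, with Python predicates in place of pure expressions):

  def union(T, U):   # T | U ≜ (x : Any where (x in T) || (x in U))
      return ('Refine', ('Any',), lambda x: in_type(x, T) or in_type(x, U))

  def inter(T, U):   # T & U
      return ('Refine', ('Any',), lambda x: in_type(x, T) and in_type(x, U))

  def neg(T):        # !T
      return ('Refine', ('Any',), lambda x: not in_type(x, T))

  nullable_int = union(('Integer',), ('Refine', ('Any',), lambda x: x is None))
  print(in_type(None, nullable_int))   # True
  print(in_type(42, nullable_int))     # True
  print(in_type('a', nullable_int))    # False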
The primitive type of entities is unary: the type {ℓ : T} is the set of entities with a field ℓ whose value belongs to T (and possibly other fields). As in Forsythe [37], we derive multiple-field entity types as an intersection type. One advantage of this approach is that it immediately entails width subtyping for entities.

Encoding of Multiple-Field Entity Types:
{ℓi : Ti; i∈1..n} ≜ {ℓ1 : T1} & ... & {ℓn : Tn}    (ℓi distinct, n > 0)

We can also derive closed entity types, which only contain entities with a fixed set of labels and therefore do not allow width subtyping. To do so we constrain the multiple-field entity types above to additionally satisfy an eta law.

Encoding of Closed Entity Types:
closed{ℓi : Ti; i∈1..n} ≜ (x : {ℓi : Ti; i∈1..n} where x == {ℓi ⇒ x.ℓi i∈1..n})
Pair types are just a special case of closed entity types. Given pair types, refinement types, and type-test, we can also encode dependent pair types Σx : T.U, where x is bound in U.

Encoding of Pair Types and Dependent Pair Types:
T ∗ U ≜ closed{fst : T; snd : U;}
(Σx : T.U) ≜ (p : T ∗ Any where let x = p.fst in (p.snd in U))

Sum types are obtained from union types by adding an additional Boolean tag; variant types are a simple generalization.

Encoding of Sum and Variant Types:
T + U ≜ ([true] ∗ T) | ([false] ∗ U)
⟨ℓ1 : T1; ...; ℓn : Tn⟩ ≜ ([ℓ1] ∗ T1) | ... | ([ℓn] ∗ Tn)

Recursive types can be encoded as boolean recursive functions that dynamically test whether a given value has the required type. Using recursive, sum, and pair types we can encode any algebraic datatype. For instance, the type of lists of elements of type T can be encoded as follows.

Encoding of List Types:
ListT ≜ (T ∗ (x : Any where fListT(x))) + [null]
where fListT(x) is a new labeled-pure function defined by
fListT(x : Any) : Logical { x in ((T ∗ (x : Any where fListT(x))) + [null]) }
Lists can be used to encode XML and JSON. Hence, Dminor can be viewed as a richly typed functional notation for manipulating data in XML format. In fact, DTDs can be encoded as Dminor types. XML data can be loaded into Dminor even if there is no prior schema. We map an XML element to an entity, with a field to represent the name of the element, additional fields for any attributes on the element, and a final field holding a list of all the items in the body of the element. Next, we show how to derive entity types for the common situation where the type of one field depends on the value of another. A dependent intersection type (s : T & U) [27] is essentially the intersection of T and U, except that the variable s is bound to the underlying value, with scope U. The type T cannot mention s, but we can rely on s : T when checking well-formedness of U.

Encoding of Dependent Intersection Types:
(s : T & U) ≜ (s : T where s in U)

With this construct, we can define entity types where the type of one field depends on the value of another. For example, (p : {X : Integer} & {Y : (y : Integer where y < p.X)}) is the type of points below the diagonal. To further illustrate the power of collection types combined with refinements, we give types below that express universal and existential quantifications over the items in a collection. Collection {v1, ..., vn} : T∗ has type all(x : T)e if e{vi/x} for all i ∈ 1..n, and, dually, it has type exists(x : T)e if e{vi/x} for some i ∈ 1..n.

Quantifying Over Collections:
all(x : T)e ≜ (x : T where e)∗
exists(x : T)e ≜ T∗ & !(all(x : T)!e)
3. Logical Semantics

In this section we give a set-theoretic semantics for types and pure expressions. Pure expressions are interpreted as first-order terms, while types are interpreted as formulas in many-sorted first-order logic (FOL). These formulas are interpreted in a fixed model, which we formalize in Coq. We represent a Dminor subtyping problem as a logical implication, supply our SMT solver with a set of axioms that are true in our intended model, and ask the solver to prove the validity of the implication. We use Coq to state our model and to derive soundness of the axioms given to the SMT solver, but semantic subtyping calls only the SMT solver, not Coq. To represent the intended logical model formally, sets are encoded as Coq types, and functions are encoded as Coq functions. We start with inductive types Value and Result given as grammars in §2 (for brevity we omit the corresponding Coq definitions; they are given in the technical report [8]). We define a predicate Proper that is true for results that are not Error, and a function out_V that returns the value inside if the result passed as argument is proper, and null otherwise.

Model: Proper Results:
Definition Proper (res : Result) :=
  match res with Return v ⇒ true | Error ⇒ false end.
Definition out_V (res : Result) : Value :=
  match res with Return v ⇒ v | Error ⇒ v_null end.

Our semantics uses many-sorted first-order logic (each sort is interpreted by a Coq type of the same name). We write predicates as functions to sort bool, with truth values true and false. We assume a collection of sorted function symbols whose interpretation in the intended model is given below. Let t range over FOL terms; we write t : σ to mean that term t has sort σ; if we omit the sort of a bound variable, it may be assumed to be Value. Similarly, free variables have sort Value by default. If F is a formula, let |= F mean that F is valid in our intended model. Our semantics consists of three translations:

• For any pure expression e, we have the FOL term R[[e]] : Result.
• For any Dminor type T and FOL term t : Value, we have the FOL formula F[[T]](t), which is valid in the intended model if and only if the value denoted by t is a member of the type T.
• For type T and FOL term t : Value, we have the formula W[[T]](t), which holds if and only if a type-test goes wrong when showing that the value denoted by t is a member of T. For instance, we have |= W[[(x : Any where stuck)]](null) ⇔ true, but |= W[[Any]](null) ⇔ false.

These three (mutually recursive) translations are defined below. We rely on notations for let-binding within terms (let x = t in t′), and terms conditional on formulas (if F then t else t′). These notations are supported directly by most SMT solvers. Given these, we can define the monadic bind for propagating errors as a simple notation. Notice that |= (Bind x ⇐ Return(v) in t) = t{v/x} and |= (Bind x ⇐ Error in t) = Error.

Notation: Monadic Bind for Propagating Errors:
Bind x ⇐ t1 in t2 ≜ (if ¬Proper(t1) then Error else let x = out_V(t1) in t2)

We begin by describing the semantics of some core types and expressions. The semantics of refinement types F[[(x : T where e)]](t) relies on the result of evaluating e with x bound to t. Remember, however, that operationally the type-test v in (x : T where e) evaluates to Error if e{v/x} evaluates to Error or to a value that is not true or false. We use W[[(x : T where e)]](t) to record this fact, and we enforce that R[[e in T]] returns Error if W[[T]](t) holds. Tracking type-tests going wrong is crucial for our full-abstraction result.

Semantics: Core Types and Expressions:
F[[Any]](t) = true
W[[Any]](t) = false
F[[(x : T where e)]](t) = F[[T]](t) ∧ let x = t in (R[[e]] = Return(true))
W[[(x : T where e)]](t) = W[[T]](t) ∨ let x = t in ¬(R[[e]] = Return(false) ∨ R[[e]] = Return(true))
R[[x]] = Return(x)
R[[e1 ? e2 : e3]] = Bind x ⇐ R[[e1]] in
  (if x = true then R[[e2]] else (if x = false then R[[e3]] else Error))
R[[let x = e1 in e2]] = Bind x ⇐ R[[e1]] in R[[e2]]
R[[e in T]] = Bind x ⇐ R[[e]] in
  (if W[[T]](x) then Error else (if F[[T]](x) then Return(true) else Return(false)))
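To give a flavour of how such a model can be axiomatized for the solver, the toy sketch below (ours; the real axiomatization also covers text, collections as extensional arrays, and entities) declares a cut-down Value sort as an algebraic datatype using Z3's Python bindings:

  from z3 import Datatype, IntSort, BoolSort, Const, Solver, sat

  Value = Datatype('Value')
  Value.declare('V_int', ('as_int', IntSort()))
  Value.declare('V_bool', ('as_bool', BoolSort()))
  Value.declare('V_null')
  Value = Value.create()

  v = Const('v', Value)
  s = Solver()
  # the analogue of F[[Integer]](v) for this toy sort is the recognizer:
  s.add(Value.is_V_int(v), Value.as_int(v) > 41)
  assert s.check() == sat
  print(s.model()[v])   # e.g. V_int(42)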
Next, we specify the semantics of scalar types and values.

Model: Testers for Simple Values:
Definition In_Logical v := (is_G v) && is_G_Logical (out_G v).
Definition In_Integer v := (is_G v) && is_G_Integer (out_G v).
Definition In_Text v := (is_G v) && is_G_Text (out_G v).

Semantics: Scalar Types, Simple Values, and Operators:
F[[Integer]](t) = In_Integer(t)
F[[Text]](t) = In_Text(t)
F[[Logical]](t) = In_Logical(t)
W[[G]](t) = false
R[[c]] = Return(c)
R[[⊕(e1, ..., en)]] = Bind x1 ⇐ R[[e1]] in ... Bind xn ⇐ R[[en]] in
  (if F[[T1]](x1) ∧ ··· ∧ F[[Tn]](xn) then Return(O⊕(x1, ..., xn)) else Error)
  where ⊕ : T1, ..., Tn → T

The notation ⊕ : T1, ..., Tn → T defines type signatures for each primitive operator ⊕. We omit the details, as well as the definitions of the functions O⊕ interpreting each primitive operator ⊕.
The semantics of an entity type {ℓ : T} is the set of all values (denoted by t) that are entities (is_E(t)) having the field ℓ (v_has_field(ℓ, t)), which contains a value of type T (F[[T]](v_dot(t, ℓ))).

Model: Functions and Predicates on Entities:
Program Definition v_has_field (s : string) (v : Value) : bool :=
  match TheoryList.assoc eq_str_dec s (out_E v) with
  | Some v ⇒ true | None ⇒ false end.
Program Definition v_dot (s : string) (v : Value) : Value :=
  match TheoryList.assoc eq_str_dec s (out_E v) with
  | Some v ⇒ v | None ⇒ v_null end.

Semantics: Entity Types and Expressions:
F[[{ℓ : T}]](t) = is_E(t) ∧ v_has_field(ℓ, t) ∧ F[[T]](v_dot(t, ℓ))
W[[{ℓ : T}]](t) = is_E(t) ∧ v_has_field(ℓ, t) ∧ W[[T]](v_dot(t, ℓ))
R[[{ℓi ⇒ ei i∈1..n}]] = Bind x1 ⇐ R[[e1]] in ... Bind xn ⇐ R[[en]] in Return({ℓi ⇒ xi i∈1..n})
R[[e.ℓ]] = Bind x ⇐ R[[e]] in
  (if is_E(x) ∧ v_has_field(ℓ, x) then Return(v_dot(x, ℓ)) else Error)

The semantics of the collection type T∗ is the set of all values (denoted by t) that are collections (is_C(t)) containing only elements of type T (∀x. v_mem(x, t) ⇒ F[[T]](x)).

Model: Functions and Predicates on Collections:
Program Definition v_mem (v cv : Value) : bool :=
  mem eq_rval_dec v (out_C cv).
Program Definition v_add (v cv : Value) : Value :=
  (C (insert_in_sorted_vb v (out_C cv))).
Definition ClosureRes2 := Value → Value → Result.
Program Fixpoint res_acc_fold (f : ClosureRes2) (vb : VBag) (a : Result)
    {measure List.length vb} : Result :=
  match vb with
  | nil ⇒ a
  | v :: vb' ⇒ match a with Return va ⇒ res_acc_fold f vb' (f va v) | Error ⇒ Error end
  end.
Definition res_accumulate (f : ClosureRes2) (cv v : Value) : Result :=
  if is_C cv then res_acc_fold f (out_C cv) (Return v) else Error.

Semantics: Collection Types and Expressions:
F[[T∗]](t) = is_C(t) ∧ (∀x. v_mem(x, t) ⇒ F[[T]](x))    x ∉ fv(T, t)
W[[T∗]](t) = is_C(t) ∧ (∃x. v_mem(x, t) ∧ W[[T]](x))    x ∉ fv(T, t)
R[[{v1, ..., vn}]] = Return({v1, ..., vn})
R[[e1 :: e2]] = Bind x1 ⇐ R[[e1]] in Bind x2 ⇐ R[[e2]] in
  (if is_C(x2) then Return(v_add(x1, x2)) else Error)
R[[from x in e1 let y = e2 accumulate e3]] = Bind x1 ⇐ R[[e1]] in Bind x2 ⇐ R[[e2]] in
  res_accumulate((fun x y → R[[e3]]), x1, x2)

The semantics of from x in e1 let y = e2 accumulate e3 relies on a function res_accumulate that folds over a collection by applying a function of sort ClosureRes2; if no error occurs at any step it returns a value, otherwise it returns Error. The model of the sort ClosureRes2 is the set of functions from Value to Value to Result. We write the lambda-abstraction fun x y → R[[e3]] for such a function. There are several standard techniques for representing lambda-abstractions in first-order logic [31]. Since the accumulate expression is pure, it produces the same result no matter what order is used when folding.

In order to give a semantics to function applications we recall that pure expressions may only call labeled-pure functions, and that the body of a labeled-pure function is itself a pure expression. For each labeled-pure function definition f(x1 : T1, ..., xn : Tn) : U {e}, the model of the symbol f is the total function f ∈ Value^n → Result such that f(v1, ..., vn) is the result r such that e{v1/x1} ... {vn/xn} ⇓ r. (We know that there is a unique r such that e{v1/x1} ... {vn/xn} ⇓ r because e is pure.) Hence, the following holds by definition:

LEMMA 1. If f(x1 : T1, ..., xn : Tn) : U {e} and e is pure and e{v1/x1} ... {vn/xn} ⇓ r then |= f(v1, ..., vn) = r.

Semantics: Function Application:
R[[f(e1, ..., en)]] = Bind x1 ⇐ R[[e1]] in ... Bind xn ⇐ R[[en]] in f(x1, ..., xn)
The operational semantics preserves logical meaning:

PROPOSITION 1. For all closed pure expressions e and e′, if e → e′ then |= R[[e]] = R[[e′]].

Moreover, we have a full abstraction result for this first-order language: the equalities induced by the operational and logical semantics of closed pure expressions coincide.

THEOREM 1 (Full Abstraction). For all closed pure expressions e and e′, |= R[[e]] = R[[e′]] if and only if, for all r, e ⇓ r ⇔ e′ ⇓ r.
3.1 Algorithmic Purity Check

The purity property defined in §2.3 is undecidable. We use a syntactic termination condition on the applied functions together with a restriction on the accumulate expressions to make the purity checks tractable. We call an expression e algorithmically pure if and only if the following three conditions hold:

(1) if e is a function application f(e1, ..., en) then f is labeled-pure, and only calls f (directly or indirectly) on structurally smaller arguments;

(2) if e is of the form from x in e1 let y = e2 accumulate e3 then
|= R[[let y = e3{x1/x}{y1/y} in e3{x2/x}]] = R[[let y = e3{x2/x}{y1/y} in e3{x1/x}]]
(where the variables x1, x2, and y1 do not appear free in e3);
(3) all the proper subexpressions of e are algorithmically pure (including the ones inside all refinement types contained by e).

Condition (1) enforces termination of algorithmically pure expressions: only labeled-pure functions can be called, and if these functions are recursive then recursive calls can only be on syntactically smaller arguments. Condition (2) only allows accumulates in an algorithmically pure expression if the order in which the elements are processed is irrelevant for the final result. In general we call a (mathematical) function f : X × Y → Y order-irrelevant if f(x1, f(x2, y)) = f(x2, f(x1, y)) for all x1, x2, and y. Enforcing that the semantics of the body of accumulate expressions is an order-irrelevant function is a sufficient condition for the uniqueness of evaluation results. We phrase this condition in terms of the logical semantics and check it using the SMT solver. Order-irrelevance is less restrictive than conditions found in the literature such as associativity and commutativity [28]. If f is associative and commutative then f is also order-irrelevant, but the converse fails in general. If f is order-irrelevant its two arguments need not even have the same type.
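The order-irrelevance condition is itself a first-order validity, so it can be discharged by the same solver. A small sketch (ours) for accumulate bodies over integers:

  from z3 import Ints, Solver, Not, unsat

  x1, x2, y = Ints('x1 x2 y')

  def order_irrelevant(f):
      # f : X * Y -> Y is order-irrelevant iff
      # f(x1, f(x2, y)) == f(x2, f(x1, y)) is valid
      s = Solver()
      s.add(Not(f(x1, f(x2, y)) == f(x2, f(x1, y))))
      return s.check() == unsat

  print(order_irrelevant(lambda x, a: a + x))       # True: summing the items
  print(order_irrelevant(lambda x, a: 2 * a + x))   # False: order matters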
THEOREM 2. If e is algorithmically pure then e is pure.

The logical semantics is defined only on pure expressions. Given the logical semantics, we obtain algorithmic purity, a sufficient condition for purity. In the remainder of the paper we rely only on algorithmic purity.

4. Declarative Type System

In this section, we give a non-algorithmic type assignment relation, and prove preservation and progress properties relating it to the operational semantics. In the next section, we present algorithmic rules (the basis of our type-checker) for proving type assignment. Each judgment of the type system is with respect to a typing environment E, of the form x1 : T1, ..., xn : Tn, which assigns a type to each variable in scope. We write ∅ for the empty environment, dom(E) for the set of variables defined by a typing environment E, and F[[E]] for the logical interpretation of E.
Environments and their Logical Semantics:
E ::= x1 : T1, ..., xn : Tn    type environments
dom(x1 : T1, ..., xn : Tn) = {x1, ..., xn}
F[[x1 : T1, ..., xn : Tn]] ≜ F[[T1]](x1) ∧ ··· ∧ F[[Tn]](xn)

Judgments of the Declarative Type System:
E ⊢ ⋄          environment E is well-formed
E ⊢ T          in E, type T is well-formed
E ⊢ T <: T′    in E, type T is a subtype of T′
E ⊢ e : T      in E, expression e has type T

Global Assumptions: For each function definition f(x1 : T1, ..., xn : Tn) : U {e_f} we assume that x1 : T1, ..., xn : Tn ⊢ e_f : U.

Rules of Well-Formed Environments and Types: E ⊢ ⋄, E ⊢ T

(Env Empty)
-----------
∅ ⊢ ⋄

(Env Var)
E ⊢ T    x ∉ dom(E)
-------------------
E, x : T ⊢ ⋄

(Type Any)
E ⊢ ⋄
---------
E ⊢ Any

(Type Scalar)
E ⊢ ⋄
-------
E ⊢ G

(Type Entity)
E ⊢ T
------------
E ⊢ {ℓ : T}

(Type Collection)
E ⊢ T
--------
E ⊢ T∗

(Type Refine)
E, x : T ⊢ e : Logical    e alg. pure
-------------------------------------
E ⊢ (x : T where e)
The subtype relation is defined as logical implication between the logical semantics of well-formed types.

Rule of Semantic Subtyping:

(Subtype)
E ⊢ T    E ⊢ T′    x ∉ dom(E)    |= (F[[E]] ∧ F[[T]](x)) =⇒ F[[T′]](x)
----------------------------------------------------------------------
E ⊢ T <: T′

Rules of Type Assignment: E ⊢ e : T

(Exp Singular Subsum)
E ⊢ e : T    E ⊢ [e : T] <: T′
------------------------------
E ⊢ e : T′

(Exp Var)
E ⊢ ⋄    (x : T) ∈ E
--------------------
E ⊢ x : T

(Exp Const)
E ⊢ ⋄
-----------
E ⊢ c : Any

(Exp Eq)
E ⊢ e1 : T1    E ⊢ e2 : T2    T = Logical
-----------------------------------------
E ⊢ e1 == e2 : T

(Exp Operator)
⊕ ≠ (==)    ⊕ : T1, ..., Tn → T    E ⊢ ei : Ti ∀i ∈ 1..n
--------------------------------------------------------
E ⊢ ⊕(e1, ..., en) : T

(Exp Cond)
E ⊢ e1 : Logical    E, _ : Ok(e1) ⊢ e2 : T    E, _ : Ok(!e1) ⊢ e3 : T
---------------------------------------------------------------------
E ⊢ (e1 ? e2 : e3) : T

(Exp Let)
E ⊢ e1 : T    E, x : T ⊢ e2 : U    x ∉ fv(U)
--------------------------------------------
E ⊢ let x = e1 in e2 : U

(Exp Test)
E ⊢ e : Any    E ⊢ T
---------------------
E ⊢ e in T : Logical

(Exp Entity)
E ⊢ ei : Ti ∀i ∈ 1..n    E ⊢ ⋄
----------------------------------------
E ⊢ {ℓi ⇒ ei i∈1..n} : {ℓi : Ti; i∈1..n}

(Exp Dot)
E ⊢ e : {ℓ : T}
---------------
E ⊢ e.ℓ : T

(Exp Coll)
E ⊢ vi : T ∀i ∈ 1..n    E ⊢ T
------------------------------
E ⊢ {v1, ..., vn} : T∗

(Exp Add)
E ⊢ e1 : T    E ⊢ e2 : T∗
-------------------------
E ⊢ (e1 :: e2) : T∗

(Exp Acc)
E ⊢ e1 : T∗    E ⊢ e2 : U    E, x : T, y : U ⊢ e3 : U    x, y ∉ fv(U)
---------------------------------------------------------------------
E ⊢ (from x in e1 let y = e2 accumulate e3) : U

(Exp App)
given f(x1 : T1, ..., xn : Tn) : U {e_f}    {x1, ..., xn} ∩ dom(E) = ∅
σi = {e1/x1} ... {ei/xi} ∀i ∈ 0..n    ei alg. pure    E ⊢ ei : Ti σi−1 ∀i ∈ 1..n
--------------------------------------------------------------------------------
E ⊢ f(e1, ..., en) : U σn
The rule (Exp Cond) records the appropriate test expression in the environment when typing the branches. The actual value of a type Ok(e) is arbitrary; the point is simply to record that condition e holds [23], provided it is pure. When e is not pure, Ok(e) is equivalent to Any.

Typed Singleton Types and Ok Types:
[e : T] ≜ (x : T where x == e)    (x ∉ fv(e))    if e alg. pure;    T otherwise
Ok(e) ≜ (x : Any where e)    (x ∉ fv(e))    if e alg. pure;    Any otherwise

The rule (Exp Singular Subsum) can be seen as a combination of the following conventional rules of subsumption and singleton introduction.

(Exp Subsum)
E ⊢ e : T    E ⊢ T <: T′
------------------------
E ⊢ e : T′

(Exp Singleton)
E ⊢ e : T
---------------
E ⊢ e : [e : T]

Both these rules are derivable from (Exp Singular Subsum). In fact, we can go in the other direction too, so that the type assignment relation would be unchanged were we to replace (Exp Singular Subsum) with (Exp Subsum) and (Exp Singleton). Still, the given presentation is simpler to work with, because (Exp Singular Subsum) is the only rule not determined by the structure of the expression being typed.
The rule (Exp Singular Subsum) depends on the relation E ⊢ [e : T] <: T′, which we refer to as singular subtyping. We illustrate (Exp Singular Subsum) and singular subtyping with regard to (Exp Const). For example, to derive that E ⊢ [42 : Any] <: Integer, note that |= F[[[42 : Any]]](x) ⇔ x = 42 and hence that |= F[[[42 : Any]]](x) =⇒ In_Integer(x).

LEMMA 2 (Singular Subtyping). Suppose E ⊢ e : T and E ⊢ T′ and x ∉ dom(E).
(1) If e is alg. pure then: E ⊢ [e : T] <: T′ iff |= F[[E]] ∧ F[[T]](out_V(R[[e]])) =⇒ F[[T′]](out_V(R[[e]])).
(2) If e is not alg. pure then: E ⊢ [e : T] <: T′ iff |= F[[E]] ∧ F[[T]](x) =⇒ F[[T′]](x).

By the following lemma, singular subtyping is transitive, and hence any derivation of a type assignment can be seen as one instance of a structural rule plus one instance of (Exp Singular Subsum). This observation is useful, for example, in proving type preservation, Theorem 4.

LEMMA 3 (Transitivity of Singular Subtyping). If E ⊢ [e : T] <: T′ and E ⊢ [e : T′] <: T″ then E ⊢ [e : T] <: T″.
In the rule (Exp App), we require that each ei in a dependent function application f(e1, ..., en) is (algorithmically) pure. This allows us to substitute these expressions into U. To form, say, f(e) where e is impure, we can work around this restriction by writing let x = e in f(x) instead. The following soundness property relates type assignment to the logical semantics of types and expressions. Point (1) is that the logical value of a well-typed expression satisfies the interpretation of its type as a predicate. Point (2) is that evaluating a type-test for a well-formed type cannot go wrong.

THEOREM 3 (Logical Soundness).
(1) If e is alg. pure and E ⊢ e : T then:
  • |= F[[E]] =⇒ Proper(R[[e]])
  • |= F[[E]] =⇒ F[[T]](out_V(R[[e]]))
(2) If E ⊢ U then |= F[[E]] =⇒ ∀y. ¬W[[U]](y), for y ∉ fv(U).

We have proved standard derived judgment, weakening, bound weakening, and substitution lemmas for the type system, which are used in the proofs of the progress and preservation theorems.

THEOREM 4 (Preservation). If E ⊢ e : T and e → e′ then E ⊢ e′ : T.

THEOREM 5 (Progress). If ∅ ⊢ e : T and e is not a value then ∃e′. e → e′.
5. Algorithmic Aspects

5.1 Optimizing the Logical Semantics

Our logical semantics propagates error values so as to match the stuck expressions of our operational semantics. Tracking errors is important, but observe that when we use our logical semantics during semantic subtyping, we only ever ask whether well-formed types are related. Every expression occurring in a well-formed type is itself well-typed, and so, by Theorem 3, its logical semantics is a proper value, not Error. This suggests that when checking subtyping we can optimize the logical semantics given the assumption that the expressions occurring within the two types are well-typed. In particular, we can apply the following lemma to transform monadic error-checking binds into ordinary lets.

LEMMA 4. If e is alg. pure and E ⊢ e : T then |= F[[E]] =⇒ (Bind x ⇐ R[[e]] in t) = (let x = out_V(R[[e]]) in t).

Proof: By definition of notation, Bind x ⇐ R[[e]] in t is the term (if ¬Proper(R[[e]]) then Error else let x = out_V(R[[e]]) in t). By Theorem 3, |= F[[E]] =⇒ Proper(R[[e]]). Hence the result. □

The following tables present the optimized definitions used in our type-checker, and the subsequent theorem states their correctness with respect to the error-tracking semantics of §3.

Optimized Semantics of Types: F′[[T]](t)
F′[[Any]](t) = true
F′[[Integer]](t) = In_Integer(t)
F′[[Text]](t) = In_Text(t)
F′[[Logical]](t) = In_Logical(t)
F′[[{ℓ : T}]](t) = is_E(t) ∧ v_has_field(ℓ, t) ∧ F′[[T]](v_dot(t, ℓ))
F′[[T∗]](t) = is_C(t) ∧ (∀x. v_mem(x, t) ⇒ F′[[T]](x))    x ∉ fv(T, t)
F′[[(x : T where e)]](t) = F′[[T]](t) ∧ let x = t in V[[e]] = true
Optimized Semantics of Pure Typed Expressions: V[[e]]
V[[x]] = x
V[[c]] = c
V[[⊕(e1, ..., en)]] = O⊕(V[[e1]], ..., V[[en]])
V[[e1 ? e2 : e3]] = (if V[[e1]] = true then V[[e2]] else V[[e3]])
V[[let x = e1 in e2]] = let x = V[[e1]] in V[[e2]]
V[[e in T]] = (if F′[[T]](V[[e]]) then true else false)
V[[e : T]] = V[[e]]
V[[{ℓi ⇒ ei i∈1..n}]] = {ℓi ⇒ V[[ei]] i∈1..n}
V[[e.ℓ]] = v_dot(V[[e]], ℓ)
V[[{v1, ..., vn}]] = {v1, ..., vn}
V[[e1 :: e2]] = v_add(V[[e1]], V[[e2]])
V[[from x in e1 let y = e2 accumulate e3]] = v_accumulate((fun x y → V[[e3]]), V[[e1]], V[[e2]])

We omit the definition of the function v_accumulate, which is a variant of res_accumulate that works with values rather than results. See the technical report for the full details [8].

THEOREM 6 (Soundness of Optimized Semantics).
(1) If E ⊢ T and x ∉ dom(E) then: |= F[[E]] =⇒ (F[[T]](x) ⇔ F′[[T]](x)).
(2) If E ⊢ e : T then: |= F[[E]] =⇒ (R[[e]] = Return(V[[e]])).

Proof: The proof is by simultaneous induction on the derivations of E ⊢ T and E ⊢ e : T, with appeal to Theorem 3 and Lemma 4. □
5.2 Bidirectional Typing Rules

The Dminor type system is implemented as a bidirectional type system [35]. The key concept of bidirectional type systems is that there are two typing relations: one for type checking, and one for type synthesis. The chief characteristic of these relations is that they are local, in the sense that type information is passed between adjacent nodes in the syntax tree without the use of long-distance constraints such as unification variables, as used in, e.g., ML.

Judgments of the Algorithmic Type System:
E ⊢ e → T     in E, expression e synthesizes type T
E ⊢ e ← T     in E, expression e checks against type T
E ▷ ⋄         environment E is alg. well-formed
E ▷ T         in E, type T is alg. well-formed
E ▷ S <: T    in E, type S is alg. a subtype of type T

Both subtyping and well-formedness rely on type-checking, so we need to distinguish versions of these judgments that use the
declarative typing rules from versions that use the bidirectional typing rules (and, in the case of subtyping, the optimized semantics). For brevity we omit the definitions, which may be found in the technical report [8].

Rules of Type Synthesis: E ⊢ e → T

(Synth Var)
E ▷ ⋄    (x : T) ∈ E
--------------------
E ⊢ x → [x : T]

(Synth Const)
E ▷ ⋄
------------------------
E ⊢ c → [c : typeof(c)]

(Synth Operator)
E ⊢ ei ← Ti ∀i ∈ 1..n    ⊕ : T1, ..., Tn → T
--------------------------------------------
E ⊢ ⊕(e1, ..., en) → [⊕(e1, ..., en) : T]

(Synth Cond)
E ⊢ e1 ← Logical    E, _ : Ok(e1) ⊢ e2 → T2    E, _ : Ok(!e1) ⊢ e3 → T3
-----------------------------------------------------------------------
E ⊢ (e1 ? e2 : e3) → (if e1 then T2 else T3)

(Synth Let)
E ⊢ e1 → T1    E, x : T1 ⊢ e2 → T2    E ⊢ T2{e1/x}
--------------------------------------------------
E ⊢ let x = e1 in e2 → T2{e1/x}

(Synth Test)
E ⊢ e ← Any    E ▷ T
------------------------
E ⊢ e in T → Logical

(Synth Ascribe)
E ⊢ e ← T
-----------------
E ⊢ (e : T) → T

(Synth Entity)
E ⊢ e1 → T1  ···  E ⊢ en → Tn    E ▷ ⋄
---------------------------------------------------
E ⊢ {ℓi ⇒ ei i∈1..n} → {ℓ1 : T1} & ··· & {ℓn : Tn}

(Synth Dot)
E ⊢ e → T    norm(T) = D    D.ℓ ⇝ U
-----------------------------------
E ⊢ e.ℓ → [e.ℓ : U]

(Synth Coll)
E ⊢ vi → Ti ∀i ∈ 1..n    E ▷ ⋄
--------------------------------------
E ⊢ {v1, ..., vn} → (T1 | ... | Tn)∗

(Synth Add)
E ⊢ e1 → T1    E ⊢ e2 → T2    norm(T2) = D2    D2.Items ⇝ U2
------------------------------------------------------------
E ⊢ e1 :: e2 → ([e1 : T1] | U2)∗

(Synth Acc)
E ⊢ e1 → T1    norm(T1) = D1    D1.Items ⇝ U1    E ⊢ e2 → T2    E, x : U1, y : T2 ⊢ e3 ← T2
-------------------------------------------------------------------------------------------
E ⊢ (from x in e1 let y = e2 accumulate e3) → T2

(Synth App)
given f(x1 : T1, ..., xn : Tn) : U {e_f}    σi = {e1/x1} ... {ei/xi} ∀i ∈ 0..n
ei is alg. pure    E ⊢ ei ← (Ti σi−1) ∀i ∈ 1..n
------------------------------------------------------------------------------
E ⊢ f(e1, ..., en) → U σn
The rules (Synth Var) and (Synth Const) yield singleton types for all variables and constants, where the function typeof returns the type of a given constant. Rule (Synth Entity) uses intersection types to encode record types. The (Synth Cond) rule synthesizes a conditional type, which is the union of the two types synthesized for the branches, along with the test expression (if it is pure) to allow more precise typing.

Encoding of Conditional Types:
if e then T else U ≜ (_ : T where e) | (_ : U where !e)    if e alg. pure;    T | U otherwise

The (Synth Ascribe) rule allows the user to provide hints to the type-checker in the form of type annotations (e : T). Such type annotations have no operational significance (in the small-step semantics e : T → e), and are necessary in case the type-checker cannot infer the loop invariants of accumulate expressions. In several of the type synthesis rules we need to inspect components of intermediate types. In simple type systems this is straightforward, as one can rely on the syntactic structure of types, but for rich type systems such as that of Dminor this is not possible. In other dependently-typed languages, either the programmer is required to insert casts to force the type into the appropriate syntactic shape [43], or types are first executed until a normal form is reached [3]. Unfortunately, neither approach is acceptable in Dminor: the former forces too many casts on the programmer, and the latter is not feasible because refinements often refer to potentially very large data sets. One pragmatic possibility is to attempt type normalization but place some ad hoc bound on evaluation [26]. As an alternative, we define a disjunctive normal form (DNF) for types, along with a normalization function, norm, for translating types into DNF, and procedures for extracting type information from DNF types. In practice, this approach works well.

Normal Types (DNF):
D ::= R1 | ... | Rn       normal disjunction (Empty if n = 0)
R ::= x : C where e       normal refined conjunction
C ::= A1 & ... & An       normal conjunction (Any if n = 0)
A ::= G | T∗ | {ℓ : T}    atomic type

We can define two partial functions to extract field and item types from normalized entity and collection types. These are written D.ℓ ⇝ U and D.Items ⇝ U, respectively. For example, ({ℓ : Integer} | {ℓ : Logical}).ℓ ⇝ Integer | Logical and ((Text∗ & Logical) | Integer∗).Items ⇝ Text | Integer. Note that both these functions are partial, e.g. ({ℓ : Integer} | {ℓ′ : Logical}).ℓ does not extract. The simple definitions of these functions are in the technical report [8].

Rules of Type Checking: E ⊢ e ← T

(Check Cond)
E ⊢ e1 ← Logical    E, _ : Ok(e1) ⊢ e2 ← T    E, _ : Ok(!e1) ⊢ e3 ← T
---------------------------------------------------------------------
E ⊢ e1 ? e2 : e3 ← T

(Swap)
E ⊢ e → T    E ▷ [e : T] <: T′
------------------------------
E ⊢ e ← T′

(Check Let)
E ⊢ e1 → T    E, x : T ⊢ e2 ← U    x ∉ fv(U)
--------------------------------------------
E ⊢ let x = e1 in e2 ← U

(Check Dot)
E ⊢ e ← {ℓ : T}
----------------
E ⊢ e.ℓ ← T
The (Swap) rule tests for singular subsumption and applies if the expression to be type-checked is not a conditional, a let-expression, or a field selection. Typically (e.g. SAGE [26]), the type checking relation for a bidirectional type system consists of a single rule of the form:

E ⊢ e → S    E ▷ S <: T
-----------------------
E ⊢ e ← T

However, we have found in practice that in the cases where the expression is a conditional or a let-expression, we get better precision of type checking by passing the type through to the subexpressions, as shown in the (Check Cond) and (Check Let) rules. Similarly, we can pass through an entity type in the (Check Dot) rule.
LEMMA 5 (Synthesis Checkable). If E ⊢ e → T then E ⊢ e ← T.

THEOREM 7 (Soundness of Algorithmic Type System).
(1) If E ▷ ⋄ then E ⊢ ⋄.
(2) If E ▷ T then E ⊢ T.
(3) If E ▷ S <: T and E ⊢ S then E ⊢ S <: T.
(4) If E ⊢ e → T then E ⊢ e : T.
(5) If E ⊢ e ← T then E ⊢ e : T.
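The overall shape of the algorithm can be summarized in a few lines of Python. The sketch below is our simplification, covering only constants, ascriptions, and conditionals, with a stubbed subtype oracle standing in for the SMT-backed one:

  def subtype(env, S, T):
      # stub: a real implementation would build F'[[S]](x) ==> F'[[T]](x)
      # from the optimized semantics and ask the SMT solver
      return S == T or T == ('Any',)

  def synth(env, e):
      if e[0] == 'const':                   # (Synth Const)
          return ('Singleton', e[1])        # stands in for [c : typeof(c)]
      if e[0] == 'ascribe':                 # (Synth Ascribe): (e : T)
          check(env, e[1], e[2])
          return e[2]
      raise TypeError('cannot synthesize: ' + e[0])

  def check(env, e, T):
      if e[0] == 'cond':                    # (Check Cond): push T inward
          check(env, e[1], ('Logical',))
          check(env + [('ok', e[1])], e[2], T)
          check(env + [('ok', ('not', e[1]))], e[3], T)
          return
      S = synth(env, e)                     # (Swap): synthesize, then
      if not subtype(env, S, T):            # discharge [e : S] <: T
          raise TypeError('subtyping failed')

  check([], ('const', 5), ('Singleton', 5))   # succeeds silently

Only the subtype oracle ever talks to the solver; the two relations themselves are plain structural recursion, which is what makes the discipline local.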
6. Exploiting SMT Models

SMT solvers such as Z3 can produce a potential model in case they fail to prove the validity of a proof obligation (that is, when they show the satisfiability of its negation, or when they give up). In our case such models can be automatically converted into assignments mapping program variables to Dminor values. Because of the inherent incompleteness of the SMT solver (other than background theories with a non-recursively-enumerable set of logical consequences, such as integer arithmetic, sources of incompleteness in SMT solvers include quantifiers, which are usually heuristically instantiated, and user-defined time-outs) and of the axiomatization we feed to it, the obtained assignment is not guaranteed to be correct. However, given a way to validate assignments, one can use the correct ones to provide very precise counterexamples when type-checking fails, and to find inhabitants of types statically or dynamically, in a way that amounts to a new style of constraint logic programming.

6.1 Precise Counterexamples to Type-checking

The type-checking algorithm from §5.2 crucially relies on subtyping, as in the rule (Swap), and our semantic subtyping relation E ⊢ T <: T′ produces proof obligations of the form
|= (F[[E]] ∧ F[[T]](x)) =⇒ F[[T′]](x)

for some fresh variable x. If the SMT solver fails to prove such an obligation, it produces a potential model from which we can extract an assignment σ mapping x and all variables in E to Dminor values. To verify that σ is a valid counterexample, we check the following three conditions:

(1) E ⊢ T and E ⊢ T′;
(2) (yσ in Uσ) →∗ true, for all (y : U) ∈ E;
(3) (xσ in (T & !T′)σ) →∗ true.

Condition (1) enforces that we only evaluate pure expressions, therefore ensuring termination and confluence of the reduction. Condition (2) enforces that the values for all variables in E have their corresponding (possibly dependent) types. Condition (3) checks whether the value assigned to x by σ is an element of T but not an element of T′. If these three checks succeed, σ is a valid counterexample to typing that we display to the user.

LEMMA 6. If the three checks above succeed then E ⊬ T <: T′.

Since the type-checker is itself over-approximating, there is no guarantee that an expression e that fails to type-check is going to get stuck when evaluated. The best we might do is to evaluate eσ for a fixed number of steps, a fixed number of times (remember that e can be non-deterministic), searching for a counterexample trace we can additionally display to the user.
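In terms of the solver API, extracting the assignment σ is just reading back a model for the negated obligation. A minimal sketch (ours, for two hypothetical integer refinement types) follows:

  from z3 import Int, Solver, Not, Implies, sat

  x = Int('x')
  F_S = x >= 0   # stands in for F[[T]](x):  (x : Integer where x >= 0)
  F_T = x > 0    # stands in for F[[T']](x): (x : Integer where x > 0)

  s = Solver()
  s.add(Not(Implies(F_S, F_T)))
  if s.check() == sat:
      m = s.model()
      print('potential counterexample: x =', m[x])   # here x = 0
      # the checker then validates this assignment operationally,
      # evaluating x in (T & !T') before reporting it to the user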
6.2 Finding Elements of Types Statically

Type emptiness can be phrased in terms of subtyping as E ⊢ T <: Empty, or equivalently |= ¬(F[[E]] ∧ F[[T]](x)) for some fresh x. We additionally check that F[[E]] is satisfiable (and that the model the SMT solver produces is a correct one) to exclude the case that the environment is inconsistent and therefore any subtyping judgment holds vacuously. Hence, we can detect empty types during type-checking and issue a warning to the user if an empty type is found. This is useful, since one can make mistakes when writing types containing complicated constraints. Moreover, if the SMT solver cannot prove that a type is empty we again obtain an assignment σ, which we can validate as in §6.1. If validation succeeds we know that xσ is an element of Tσ, and we can display this information if the user hovers over a type.

LEMMA 7. If the three checks in §6.1 succeed for T′ = Empty then ∅ ⊢ xσ : Tσ and ∅ ⊢ yσ : Uσ for all (y : U) ∈ E.
6.3 Finding Elements of Types Dynamically

We can use the same technique to find elements of types dynamically. We augment the calculus with a new primitive expression elementof T (not present in the M language) which tries to find an inhabitant of T. If successful, the expression returns such a value; otherwise it returns null. (We can always choose T so that null is not a member, so that returning null unambiguously signals that no member of T was found.)

Operational Semantics for Finding Elements of Types:
elementof T → v    where v in T →∗ true
elementof T → null

Finding elements of types is actually simpler to do dynamically than statically: at run-time all variables inside types have already been substituted by values, so there are fewer checks to perform. The outcome of elementof T is in general non-deterministic, and depends in practice on the computational power and load of the system, as well as on the timeout used when calling the SMT solver. Because of this, elementof T expressions are considered algorithmically impure, and therefore cannot appear inside types.

Typing Rules for elementof:

(Exp elementof)
E ⊢ T
---------------------------------
E ⊢ elementof T : (T | [null])

(Synth elementof)
E ⊢ T
---------------------------------
E ⊢ elementof T → (T | [null])

LEMMA 8. If elementof T → v and ∅ ⊢ T then ∅ ⊢ v : T | [null].

The new elementof T construct enables a form of constraint programming in Dminor, in which we iteratively change the constraints inside types in order to explore a large state space. For instance, the following recursive function computes all correct configurations of a complex system when called with the empty collection as argument. Correctness is specified by some type GoodConfig.

allGoodConfigs(avoid : GoodConfig∗) : GoodConfig∗ {
  let m = elementof (GoodConfig where !(value in avoid)) in
  (m == null) ? {} : (m :: (allGoodConfigs(m :: avoid)))
}

Programming in this purely declarative style can be appealing for rapid prototyping or other tasks where efficiency is not the main concern. One only needs to specify what has to be computed, in the form of a type. It is up to the SMT solver to use the right (semi-)decision procedures and heuristics to perform the computation. If this fails or is too slow, one can instead implement the required functionality manually. There is little productivity loss in this case, since the types one has already written will serve as a specification for the code that needs to be written manually.
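Dynamically, elementof T boils down to a satisfiability query followed by the usual validation. A toy sketch (ours) of finding an inhabitant of the hypothetical type (x : Integer where x*x == 1764 && x > 0):

  from z3 import Int, Solver, And, sat

  x = Int('x')
  s = Solver()
  s.add(And(x * x == 1764, x > 0))   # stands in for F[[T]](x)
  if s.check() == sat:
      print(s.model()[x])   # 42: an inhabitant found by the solver
  else:
      print(None)           # no inhabitant found: elementof returns null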
7. Implementation

Our prototype Dminor implementation is approximately 2700 lines of F# code, excluding the lexer and parser. Our type-checker implements the algorithmic purity check from §3.1, the optimized logical semantics from §5.1, and the bidirectional typing rules from §5.2. We use Z3 [13] to discharge the proof obligations generated by semantic subtyping. Together with the proof obligations, we feed Z3 a 500-line axiomatization of our intended model in SMT-LIB format [36], which uses the theories of integers, datatypes, and extensional arrays. The formal definition of our intended model of Dminor is just over 4000 lines of Coq.
We have tested our type-checker on a test suite consisting of about 130 files, some type-correct and some type-incorrect, some hand-crafted by us and some transliterated from the M preliminary release. Even without serious optimization the type-checker is fast. Checking each of the 130 files in our test suite on a typical laptop takes from under 1 second (for just startup and parsing) to around 3 seconds (for type-checking an interpreter for while-programs from §1.1 that discharges more than 300 proof obligations). Also, our experience with Z3 has been very positive so far: whilst it is possible to craft subtyping tests that cannot be efficiently checked (Z3 gets at most 1 second for each proof obligation by default), Z3 has performed very well on the idioms in our test suite. Still, we cannot draw firm conclusions until we have studied bigger examples. We have also implemented the techniques for exploiting SMT solver models described in §6. We built a plugin for the Microsoft Intellipad text editor [1] that displays precise counterexamples to typing, flags empty types, and otherwise displays one element of each type defined in the code. Moreover, our interpreter for Dminor supports elementof for dynamically generating instances of types (§6.3). This works well for simple constraints involving equalities, datatypes, and simple arithmetic, and types that are not too deeply nested. However, scaling this up to arbitrary Dminor types is a challenge that will require additional work, as well as further progress in SMT solvers.
8. Related Work

Whilst Dminor's combination of refinement types and type-tests is new and highly expressive, it builds upon a large body of related work on advanced type systems. Refinement types have their origins in early work in theorem-proving systems and specification languages, such as subset types in constructive type theory [33], set comprehensions in VDM [25], and predicate subtypes in PVS [39]. In PVS, constraints found when checking predicate subtypes become proof obligations to be proved interactively. More recently, Sozeau [41] extends Coq with subset types; as in PVS, the proofs of subset type membership have to be constructed using tactics. Freeman and Pfenning [21] extended ML with a form of refinement type, and Xi and Pfenning [43] considered applications of dependent types in an extension of ML. In both of these systems, decidability of type checking is maintained by restricting which expressions can appear in types. Lovas and Pfenning [29] presented a bidirectional refinement type system for LF, where a restriction on expressions leads to an expressive yet decidable type system. Other work has combined refinement types with syntactic subtyping [6, 38], but none includes type-test in the refinement language. Closest to our type system is the work of Flanagan et al. on hybrid types and SAGE [26]. SAGE also uses an SMT solver to check the validity of refinements, but not for subtyping (checked by traditional syntactic techniques), and does not allow type-test expressions in refinements. However, SAGE supports a dynamic type and employs a particular form of hybrid type checking [20] that allows particular expressions to have their type-check deferred until run-time. The idea of hybrid types is to strike a balance between run-time checking of contracts, as in Eiffel [32] and DrScheme [18], and static typing. Compared to purely static typing, this can reduce the number of false alarms generated by type-checking. In spite of early work on semantic subtyping by Aiken and Wimmers [2] and Damm [12], most programming and query languages instead use a syntactic notion of subtyping. This syntactic approach is typically formalized by an inductively or co-inductively defined set of rules [34]. Unfortunately, deriving an algorithm from such a set of rules can be difficult, especially for advanced features such as intersection and union types [16].

X10 [40] is an object-oriented language that supports refinement types. A class C can be refined with a constraint c on the immutable state of C, resulting in a type written C(:c). The base language supports only simple equality constraints, but further constraints can be added and multiple constraint solvers can be integrated into the compiler. In comparison with Dminor, X10 uses a mixture of semantic and syntactic subtyping, while its constraint language [40, §2.11] does not support type-test expressions. The introduction of XML and XML query languages led to renewed (practical) interest in semantic subtyping. In the context of XML documents, there is a natural generalization of DTDs where the structures in XML documents can be described using regular expression operations (such as *, ?, and |) and subtyping between two types becomes inclusion between the sets of sequences that are denoted by the regular expression types. Hosoya and Pierce first defined such a type system for XML [24] in their language XDuce. Frisch, Castagna, and Benzaken [22] extended semantic subtyping to function types and propositional types, with type-test, but not refinement types, resulting in the language CDuce [7]. CDuce allows expressions to be pattern-matched against types and statically detects if a pattern-matching expression is non-exhaustive or if a branch is unreachable. If this is the case, a counterexample XML document is generated that exhibits the problem. CDuce also issues warnings if empty types are detected. These tasks are much simpler in CDuce than they are in our setting, since we additionally have to deal with general refinement types. Typed Scheme [42] makes use of type-test expressions, union types, and notions of visible and latent predicates to type-check Scheme programs. It would be interesting to see if these idioms can be internalized in the Dminor type system using refinements. PADS [19] develops a type theory for ad hoc data formats such as system traces, together with a rich range of tools for learning such formats and integrating them into existing programming languages. The PADS type theory has refinement types, dependent pairs, and intersection types, but not type-test. There is a syntactic notion of type equivalence, but not subtyping. Dminor would be a useful language for programming transformations on data parsed using PADS, as our type system would enforce the constraints in PADS specifications, and hence guarantee statically that transformed data remains well-formed. Existing interfaces of PADS to C or to OCaml do not offer this guarantee.
9.
Conclusions
We have described Dminor, a simple, yet flexible, functional language for defining data models and queries over these data models. The main novelty of Dminor is its especially rich type system. The combination of refinement types and type-test appears to be new. On top of familiar arithmetic constraints on types (analogous to the sort checked dynamically by other data modeling languages) we have given examples of how this type system can, in addition, encode singleton, nullable, union, intersection, negation, and algebraic types, although without first-class functions. The other main contribution of this paper is a technique to typecheck Dminor programs statically: we combine the use of a bidirectional type system with the use of an SMT solver to perform semantic subtyping. (Other systems have either devised special purpose algorithms for semantic subtyping, or used theorem provers only for refinement types.) The design of our bidirectional type system to enable precise typing of programs appears novel. We have implemented our type system in F] using the Z3 SMT solver. SMT solvers are now of sufficient maturity that they can realistically be thought of as a platform upon which many applications, including type systems, may be built. Our type-checker, like all static analyzers, has the potential to generate false negatives, that is, rejecting programs as type incor-
gets at most 1 second for each proof obligation by default.
115
rect that are, in fact, type correct. As any SMT solver is incomplete for the first-order theories that we are interested in, it is possible that the solver is unable to determine an answer to a logical statement. S AGE [20] avoids these problems by catching these cases and inserting a cast so that the test is performed again at run-time. This has the pleasant effect of not penalizing the developer for any possible incompletenesses of the SMT solver. The techniques used in S AGE should apply to Dminor without any great difficulty. Finally, the implications of this work go beyond the core calculus Dminor. PADS, JSON, and M, for example, show the significance of programming languages for first-order data. Our work establishes the usefulness of combining refinement types and typetest expressions when programming with first-order data, and the viability of type-checking such programs with an SMT solver.
Acknowledgments

We thank Nikolaj Bjørner for his invaluable help in using Z3. James Margetson helped with F# programming issues. Paul Anderson, Ioannis Baltopoulos, Johannes Borgström, Nate Foster, Tim Harris, and Thorsten Tarrach commented on a draft. Discussions with Martín Abadi, Cliff Jones, and Benjamin Pierce were useful, as were the comments of the anonymous reviewers. Cătălin Hrițcu is supported by a fellowship from Microsoft Research and the IMPRS.

References

[1] The Microsoft code name "M" Modeling Language Specification Version 0.5. Microsoft Corporation, Oct. 2009. Preliminary implementation available as part of the SQL Server Modeling CTP (November 2009).
[2] A. Aiken and E. Wimmers. Type inclusion constraints and type inference. In Proceedings of ICFP, 1993.
[3] D. Aspinall and M. Hofmann. Dependent types. In Advanced Topics in Types and Programming Languages, chapter 2. MIT Press, 2005.
[4] C. Barrett, M. Deters, A. Oliveras, and A. Stump. Design and results of the 3rd Annual SMT Competition. International Journal on Artificial Intelligence Tools, 17(4):569–606, 2008.
[5] C. Barrett and C. Tinelli. CVC3. In Proceedings of CAV, 2007.
[6] J. Bengtson, K. Bhargavan, C. Fournet, A. D. Gordon, and S. Maffeis. Refinement types for secure implementations. In Proceedings of CSF, 2008.
[7] V. Benzaken, G. Castagna, and A. Frisch. CDuce: An XML-friendly general purpose language. In Proceedings of ICFP, 2003.
[8] G. M. Bierman, A. D. Gordon, C. Hrițcu, and D. Langworthy. Semantic subtyping with an SMT solver. Technical Report MSR-TR-2010-99, Microsoft Research, July 2010.
[9] P. Buneman, S. Naqvi, V. Tannen, and L. Wong. Principles of programming with complex objects and collection types. Theoretical Computer Science, 149(1):3–48, 1995.
[10] R. M. Burstall, D. B. MacQueen, and D. Sannella. HOPE: An experimental applicative language. In LISP Conference, pages 136–143, 1980.
[11] D. Crockford. The application/json media type for JavaScript Object Notation (JSON), July 2006. RFC 4627.
[12] F. Damm. Subtyping with union types, intersection types and recursive types. In Proceedings of TACS, 1994.
[13] L. de Moura and N. Bjørner. Z3: An efficient SMT solver. In Proceedings of TACAS, 2008.
[14] L. M. de Moura and N. Bjørner. Generalized, efficient array decision procedures. In FMCAD, 2009.
[15] D. Detlefs, G. Nelson, and J. B. Saxe. Simplify: a theorem prover for program checking. J. ACM, 52(3):365–473, 2005.
[16] J. Dunfield and F. Pfenning. Tridirectional typechecking. In Proceedings of POPL, pages 281–292, 2004.
[17] B. Dutertre and L. de Moura. The YICES SMT solver. Available at http://yices.csl.sri.com/tool-paper.pdf, 2006.
[18] R. Findler and M. Felleisen. Contracts for higher-order functions. In Proceedings of ICFP, 2002.
[19] K. Fisher, Y. Mandelbaum, and D. Walker. The next 700 data description languages. In Proceedings of POPL, 2006.
[20] C. Flanagan. Hybrid type checking. In Proceedings of POPL, 2006.
[21] T. Freeman and F. Pfenning. Refinement types for ML. In Proceedings of PLDI, 1991.
[22] A. Frisch, G. Castagna, and V. Benzaken. Semantic subtyping: Dealing set-theoretically with function, union, intersection, and negation types. J. ACM, 55(4), 2008.
[23] A. D. Gordon and A. Jeffrey. Typing one-to-one and one-to-many correspondences in security protocols. In Proceedings of ISSS, 2002.
[24] H. Hosoya, J. Vouillon, and B. Pierce. Regular expression types for XML. In Proceedings of ICFP, 2000.
[25] C. Jones. Systematic software development using VDM. Prentice-Hall, Englewood Cliffs, NJ, 1986.
[26] K. Knowles, A. Tomb, J. Gronski, S. Freund, and C. Flanagan. SAGE: Unified hybrid checking for first-class types, general refinement types and Dynamic. Technical report, UCSC, 2007.
[27] A. Kopylov. Dependent intersection: A new way of defining records in type theory. In LICS, pages 86–95. IEEE Computer Society, 2003.
[28] K. R. M. Leino and R. Monahan. Reasoning about comprehensions with first-order SMT solvers. In Proceedings of SAC, 2009.
[29] W. Lovas and F. Pfenning. A bidirectional refinement type system for LF. In Proceedings of LFMTP, 2007.
[30] E. Meijer, B. Beckman, and G. Bierman. LINQ: Reconciling objects, relations and XML in the .NET framework. In Proceedings of SIGMOD, 2007.
[31] J. Meng and L. C. Paulson. Translating higher-order problems to first-order clauses. Journal of Automated Reasoning, 40(1):35–60, 2008.
[32] B. Meyer. Eiffel: the language. Prentice Hall, 1992.
[33] B. Nordström and K. Petersson. Types and specifications. In IFIP'83, 1983.
[34] B. C. Pierce. Types and Programming Languages. MIT Press, 2002.
[35] B. C. Pierce and D. N. Turner. Local type inference. In Proceedings of POPL, 1998.
[36] S. Ranise and C. Tinelli. The SMT-LIB Standard: Version 1.2, 2006.
[37] J. C. Reynolds. Design of the programming language Forsythe. In Algol-Like Languages, chapter 8. Birkhäuser, 1996.
[38] P. Rondon, M. Kawaguchi, and R. Jhala. Liquid types. In Proceedings of PLDI, 2008.
[39] J. Rushby, S. Owre, and N. Shankar. Subtypes for specifications: Predicate subtyping in PVS. IEEE Transactions on Software Engineering, 24(9):709–720, 1998.
[40] V. Saraswat, N. Nystrom, J. Palsberg, and C. Grothoff. Constrained types for object-oriented languages. In Proceedings of OOPSLA, 2008.
[41] M. Sozeau. Subset coercions in Coq. In Proceedings of TYPES, 2006.
[42] S. Tobin-Hochstadt and M. Felleisen. Logical types for Scheme. In Proceedings of ICFP, 2010.
[43] H. Xi and F. Pfenning. Dependent types in practical programming. In Proceedings of POPL, 1999.
Logical Types for Untyped Languages∗

Sam Tobin-Hochstadt    Matthias Felleisen
Northeastern University
{samth,matthias}@ccs.neu.edu

∗ This research was partially supported by grants from the US NSF and a donation from the Mozilla Foundation.
Abstract

Programmers reason about their programs using a wide variety of formal and informal methods. Programmers in untyped languages such as Scheme or Erlang are able to use any such method to reason about the type behavior of their programs. Our type system for Scheme accommodates common reasoning methods by assigning variable occurrences a subtype of their declared type based on the predicates prior to the occurrence, a discipline dubbed occurrence typing. It thus enables programmers to enrich existing Scheme code with types, while requiring few changes to the code itself.

Three years of practical experience has revealed serious shortcomings of our type system. In particular, it relied on a system of ad-hoc rules to relate combinations of predicates, it could not reason about subcomponents of data structures, and it could not follow sophisticated reasoning about the relationship among predicate tests, all of which are used in existing code.

In this paper, we reformulate occurrence typing to eliminate these shortcomings. The new formulation derives propositional logic formulas that hold when an expression evaluates to true or false, respectively. A simple proof system is then used to determine types of variable occurrences from these propositions. Our implementation of this revised occurrence type system thus copes with many more untyped programming idioms than the original system.

Categories and Subject Descriptors D.3.3 [Programming Languages]: Language Constructs and Features

General Terms Languages

Keywords Type systems, Untyped languages, Logic

1. Reasoning about Untyped Languages

Developing programs in a typed language helps programmers avoid mistakes. It also forces them to provide some documentation, and it establishes some protective abstraction barriers. As such, the type system imposes a discipline on the programming process. Nevertheless, numerous programmers continue to choose untyped scripting languages for their work, including many who work in a functional style. When someone eventually decides that explicitly stated type information reduces maintenance cost, they face a dilemma. To address this situation, we need to develop typed sister languages for untyped languages. With those, programmers can enrich existing programs with type declarations as needed while maintaining smooth interoperability.

A type system for an existing untyped language must accommodate the existing programming idioms in order to keep the cost of type enrichment low. Otherwise, type enrichment requires changes to code, which may introduce new mistakes. Put positively, the ideal typed sister language requires nothing but the addition of type specifications to function headers, structure definitions, etc.

Our experience shows that programming idioms in untyped functional languages rest on a combination of traditional type-based reasoning with reasoning about control flow. In particular, conditionals and data-type predicates are used to establish the nature of variables' values, and based on this flow-sensitive reasoning, programmers use variables at more specific types than expected. Put differently, the programmer determines the type of each variable occurrence based on the predicates that flow-dominate it.

Multiple researchers over the decades have discovered this insight. In his paper on the static analysis of untyped programs, Reynolds [1968] notes that such reasoning is necessary, stating that in future systems, "some account should be taken of the premises in conditional expressions." In his work on TYPED LISP, Cartwright [1976, §5] describes having to abandon the policy of rejecting type-incorrect programs because the variables in conditionals had overly broad types. Similarly, in their paper on translating Scheme to ML, Henglein and Rehof [1995] state "type testing predicates aggravate the loss of static type information since they are typically used to steer the control flow in a program in such a fashion that execution depends on which type tag an object has at run-time."

We exploited this insight for the development of Typed Scheme, a typed sister language for Racket (formerly PLT Scheme) [Tobin-Hochstadt and Felleisen 2008]. Its type system combines several preexisting elements—"true" recursive union types, subtyping, polymorphism—with the novel idea of occurrence typing,¹ a type discipline for exploiting the use of data-type predicates in the test expression of conditionals. Thus, if a test uses (number? x), the type system uses the type Number for x in the then branch and the declared type of x, minus Number, otherwise.

¹ Komondoor et al. [2005] independently coined the term "occurrence typing" in the context of providing an advanced type system for COBOL.

Three years of extensive use have revealed several shortcomings in our original design. One significant problem concerns control flow governed by logical combinations (e.g., and, or, not) of predicate tests. Another is that the type system cannot track uses of predicates applied to structure selectors such as car. This lack of expressiveness in the type system is due to fundamental limitations. First, our original system does not consider asymmetries between the then and else branches of conditionals. For example, when the expression (and (number? x) (> x 100)) is true, the type system should know that x is a number, but x might or
might not be a number when the expression is false, since it might be 97 or "Hello". Second, the type system does not appropriately distinguish between selector expressions such as (car x) and predicate expressions such as (number? x). Third, the treatment of combinations of tests relies on an ad-hoc collection of rules.

In this paper, we present a new and simple framework for occurrence typing that eliminates all three problems via an increase in expressive power. The key innovation is to turn control flow predicates into formulas in propositional logic for reasoning about the types of variables. The atomic propositions include statements that a particular variable has a particular type, replacing the previous collection of special cases with textbook rules of logical inference. This new design allows the type system to reason about combinations of predicates:

    (cond [(and (number? (node-left x)) (string? (node-right y)))
           ;; known: (node-right y) is a string, (node-left x) is a number
           ...]
          [(number? (node-left x))
           ;; known: (node-right y) is not a string
           ...]
          [(string? (node-right y))
           ;; known: (node-left x) is not a number
           ...])

Now the programmer and the revised type system both determine that since (number? (node-left x)) is true in the second clause, (string? (node-right y)) must be false, and thus, (node-right y) is not a string. Using propositional logic to reason about predicates handles this and many other similar situations. Beyond the new type system, this paper contributes:

• a full-fledged implementation of the calculus, now known as Typed Racket, addressing the full complexities of the functional core of Racket, such as mutable data and multiple arguments;
• an empirical study of the usefulness of our extensions; and
• a novel model-theoretic proof technique for type soundness.

The paper begins with a brief review of the essence of occurrence typing with an emphasis on programming idioms that our original system cannot typecheck. We then describe the new system in a semi-formal manner. In the three following sections, we describe the system formally: first the core system of occurrence typing, then several extensions demonstrating the expressiveness of the system, and third the proof of soundness. These sections are followed by a description of our implementation strategy and empirical measures of its usefulness on existing code. Finally, we discuss related work and conclude.

2. A Brief Introduction to Occurrence Typing

2.1 Existing Capabilities

Here is the simplest example of occurrence typing:

    . . . (if (number? x) (add1 x) 0) . . .                    (Example 1)

Regardless of the value of x, this program fragment always produces a number. Thus, our type system should accept this fragment, regardless of the type assigned to x, even if the type is not legitimate for add1. The key to typing this program is to assign the second occurrence of x a different, more precise type than it has in the outer context. Fortunately, we know that for any value of type Number, number? returns #t; otherwise, it returns #f. Therefore, it is safe to use Number as the type of x in the then branch.

The following function f always produces a number:

    (define: (f [x : (∪ String Number)])
      (if (number? x) (add1 x) (string-length x)))             (Example 2)

If (number? x) produces #t, x is an appropriate input for add1. If it produces #f, x must be a String by process of elimination; it is therefore an acceptable input to string-length. To handle this program, the type system must take into account not only when predicates hold, but also when they fail to hold. Our next fragment represents the essence of a common idiom:

    . . . (let ([x (member v l)])
            (if x — compute with x — (error 'fail))) . . .     (Example 3)

This idiom, seen here in member, uses arbitrary non-#f values as true and uses #f as a marker for missing results, analogous to ML's NONE. The type for member specifies this behavior with an appropriate type signature. The type system can thus infer that in the then branch, x has the type of the desired result and is not #f.

2.2 Challenges

Of course, programmers write tests beyond simple applications of predicates such as (number? x). For example, logical connectives can combine the results of predicates:²

    . . . (if (or (number? x) (string? x)) (f x) 0) . . .      (Example 4)

For this fragment to typecheck, the type system must recognize that (or (number? x) (string? x)) ensures that x has type (∪ String Number) in the then branch, the domain of f from example 2. For and, there is no such neat connection:

    . . . (if (and (number? x) (string? y))
              (+ x (string-length y))
              0) . . .                                         (Example 5)

Example 5 is perfectly safe, regardless of the values of x and y. In contrast, the next example shows how little we know when a conjunction evaluates to false:

    ;; x is either a Number or a String
    . . . (if (and (number? x) (string? y))
              (+ x (string-length y))
              (string-length x)) . . .                         (Example 6)

Here a programmer falsely assumes x to be a String when the test fails. But the test may produce #f because x is actually a String, or because y is not a String while x is a Number. In the latter case, (string-length x) fails. In general, when a conjunction is false, we do not know which conjunct is false. Finally, and is expressible using nested if expressions, a pattern that is often macro-generated:

    . . . (if (if (number? x) (string? y) #f)
              (+ x (string-length y))
              0) . . .                                         (Example 7)

One way for the type system to deal with this pattern is to reason that it is equivalent to the conjunction of the two predicates.

So far, we have seen how programmers can use predefined predicates. It is important, however, that programmers can also abstract over existing predicates:

    (define: (strnum? [x : ⊤])   ;; ⊤ is the top type
      (or (string? x) (number? x)))                            (Example 8)

² The original system could handle only an encoding of or, with different semantics than that provided by Racket.
Take the previous example of a test for (∪ String Number). A programmer can use the test to create the function strnum?, which behaves as a predicate for that type. This means the type system must represent the fact that strnum? is a predicate for this type, so that it can be exploited for conditionals.
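Such a user-defined predicate can then drive occurrence typing exactly like a built-in one. A small illustration of ours in the paper's notation (describe is a hypothetical helper; f is the function from example 2):

    (define: (describe [v : ⊤])
      (if (strnum? v)
          (f v)    ;; v is narrowed to (∪ String Number), the domain of f
          0))

If strnum?'s type merely said ⊤ → Boolean, the then branch would not typecheck.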
In example 4, we saw the use of or to test for disjunctions. Like and, or is directly expressible using if:

    (if (let ([tmp (number? x)])
          (if tmp tmp (string? x)))
        (f x)
        0)                                                     (Example 9)

The expansion is analyzed as follows: if (number? x) is #t, then so is tmp, and thus the result of the inner if is also #t. Otherwise, the result of the inner if is (string? x). This code presents a new challenge for the type system, however. The expression tested in the inner if is the variable reference tmp, but the system must also learn about (number? x) from the test of tmp.

Selectors   All of the tests thus far only involve variables. It is also useful to subject the result of arbitrary expressions to type tests:

    . . . (if (number? (car p)) (add1 (car p)) 7) . . .        (Example 10)

Even if p has just the pair type ⟨⊤, ⊤⟩, example 10 should produce a number.³ Of course, simply accommodating repeated applications of car is insufficient for real programs. Instead, the relevant portions of the type of p must be refined in the then and else branches of the if. In the next example:

    (λ: ([p : ⟨⊤, ⊤⟩])
      (if (and (number? (car p)) (number? (cdr p)))
          (g p)
          'no))                                                (Example 11)

the test expression refines the type of p from the declared ⟨⊤, ⊤⟩ to the required ⟨Number, Number⟩. This is the expected result of the conjunction of tests on the car and cdr fields. Example 12 shows how programmers can simultaneously abstract over the use of both predicates and selectors:

    (define carnum?
      (λ: ([x : ⟨⊤, ⊤⟩]) (number? (car x))))                   (Example 12)

The carnum? predicate tests if the car of its argument is a Number, and its type must capture this fact.

³ Racket pairs are immutable; this reasoning is unsound for mutable pairs.

Reasoning Logically   Of course, we do learn something when conjunctions such as those in examples 5 and 6 are false. When a conjunction is false, we know that one of the conjuncts is false, and thus when all but one are true, the remaining one must be false. This reasoning principle is used in multi-way conditionals, a common idiom extensively illustrated in How to Design Programs [Felleisen et al. 2001]:

    . . . (cond [(and (number? x) (string? y)) — 1 —]
                [(number? x) — 2 —]
                [else — 3 —]) . . .                            (Example 13)

This program represents a common idiom. In clause 1, we obviously know that x is a Number and y is a String. In clause 2, x is again a Number. But we also know that y cannot be a String. To effectively typecheck such programs, the type system must be able to follow this reasoning.

2.3 Putting it all Together

Our type system correctly handles all of the preceding examples. Finally, we combine these features into an example that demonstrates all aspects of our system:

    (λ: ([input : (∪ Number String)]
         [extra : ⟨⊤, ⊤⟩])
      (cond [(and (number? input) (number? (car extra)))
             (+ input (car extra))]
            [(number? (car extra))
             (+ (string-length input) (car extra))]
            [else 0]))                                         (Example 14)

In section 5.3, we return to this example with a type system that checks it correctly.

3. How to Check the Examples

Next we use the preceding examples to explain the basic ideas of our new system for occurrence typing.

3.1 Propositions and Objects

Recall example 1:

    (if (number? x) (add1 x) 0)

In this example, the typechecker must propagate information from the test to the then branch. Therefore, the typechecker really proves the proposition that "if the test evaluates to #t, then x is a number", a proposition abbreviated as N_x, with N short for Number. The typechecker then uses this proposition to check the then branch.

The proposition N_x is computed from (number? x) by combining information from two sources. On one hand, the type of number? includes the information that it is a predicate. On the other, testing x produces information about the variable x.

The addition of a proposition as part of the type of the number? function accomplishes the first goal. Specifically, the added proposition allows the type of a function to describe what propositions are derivable when the function produces a true value. Borrowing terminology from work on effect systems [Lucassen and Gifford 1988], we refer to these propositions as latent. Borrowing notation from dependent types, we name the argument in each function, allowing latent propositions to be well-scoped in function types. If the argument to number? is named y, the latent proposition is N_y. This makes the type of number? (writing the latent proposition on the arrow):

    y:⊤ −[N_y]→ B

To satisfy the second goal, we modify the type system so that it derives an object for each expression. The object describes which part of the environment an expression accesses. In our example, it is simply x. Given these pieces of information, the typechecker obtains the desired proposition about a predicate application from the substitution of the actual object for the formal parameter in the latent proposition. For the first example, the result is N_x.

In example 2, x initially has the type (∪ String Number). To check the else branch, the typechecker needs the information that x is not a Number; i.e., that it is a String. It computes this information via two propositions, one for each of the then and else branches. For the then branch the proposition is N_x, as above. For the else branch, the type checker must propagate the proposition "x is not a Number"—written N̄_x—from the test to the else branch. To this end, function types are actually equipped with two latent propositions: one for when the function produces a true value, and
one for when it produces a false value. Thus, the type of number? is now

    y:⊤ −[N_y | N̄_y]→ B

with the two propositions separated by |. Substituting x for y in the latent propositions produces the desired results. Contrary to appearances, pairs of propositions need not be complementary. Recall (and (number? x) (> x 100)) from section 1. In this case, the then proposition should be N_x, because if the and expression produces #t, x must be a number. But the else proposition cannot be N̄_x, since x might have been 97, which would produce #f but is nonetheless a number.

3.2 Handling Complex Tests

For complex tests, such as those of example 4, the type system combines the propositions of different subexpressions. In the cited example, the propositions for (number? x) and (string? x) are N_x|N̄_x and S_x|S̄_x, respectively. For the or expression, these should be combined to N_x ∨ S_x for the then branch and N̄_x ∧ S̄_x for the else branch.

From these complex propositions, the typechecker derives propositions about the type of x. If x is a number or a string, x is in (∪ N S). By codifying this as a rule of inference, it is possible to derive (∪ N S)_x from N_x ∨ S_x, just what is needed to check the then branch. From N̄_x ∧ S̄_x it is similarly possible to derive (∪ N S)̄_x, as expected for the else branch. To propagate propositions, we use a proposition environment instead of a type environment; the type environment becomes a special case.

Examples 5 and 6 are dealt with in the same manner, but with conjunction instead of disjunction. In example 7, the test expression of the outer if is itself an if expression. The typechecker must derive propositions from it and propagate them to the then and else branches. Thus, it first computes the propositions derived for each of the three subexpressions, giving N_x|N̄_x for the test and S_y|S̄_y for the then branch. Since the else branch—a plain #f—never produces a true value, the relevant propositions are ff and tt, the impossible and trivial propositions.
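As a sketch of the inference for the then branch, using the proof rules presented later (figure 4): each case of the disjunction follows by L-Atom and L-Sub, since ⊢ N <: (∪ N S) and ⊢ S <: (∪ N S), and L-OrE combines them:

    Γ, N_x ⊢ (∪ N S)_x        Γ, S_x ⊢ (∪ N S)_x
    ----------------------------------------------  (L-OrE)
             Γ, N_x ∨ S_x ⊢ (∪ N S)_x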
3.3 Abstracting over Predicates

The next challenge, due to example 8, is to include proposition information in function types for user-defined predicates:

    (λ: ([x : ⊤]) (or (string? x) (number? x)))

As explained above, the typechecker assigns the body of the function the type B and derives (∪ S N)_x | (∪ S N)̄_x as the then and else propositions. To add these to a function type, it merely moves these propositions into the arrow type:

    x:⊤ −[(∪ S N)_x | (∪ S N)̄_x]→ B

The key to the simplicity of this rule is that the bound variable of the λ expression becomes the name of the argument in the function type, keeping the propositions well-scoped.

3.4 Reasoning Logically

Next we revisit conjunctions such as (and (number? x) (string? y)). If this expression evaluates to #f, the typechecker can infer some propositions about x and y. In particular, since the original expression derived the proposition N̄_x ∨ S̄_y for the else branch, the type system can combine this environmental information with the results of subsequent tests. In example 13, the type system derives the proposition N_x when the second cond clause produces true, which means S̄_y holds, too. In short, maintaining propositions in the environment allows the typechecker to simulate the reasoning of the programmer and to track the many facts available for deducing type correctness for an expression.
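To sketch how the proof system (figure 4) derives S̄_y in the second clause: the environment contains N̄_x ∨ S̄_y together with N_x from the clause's own test. In the left case, L-Update combines N_x and N̄_x into update(N, N̄)_x = ⊥_x, so L-Bot yields S̄_y; in the right case, S̄_y holds by L-Atom; L-OrE then discharges the disjunction:

    Γ, N̄_x ⊢ S̄_y        Γ, S̄_y ⊢ S̄_y
    ------------------------------------  (L-OrE, with N_x ∈ Γ)
           Γ, N̄_x ∨ S̄_y ⊢ S̄_y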
3.5 Selectors

The essence of example 10 is the application of predicates to selector expressions, e.g., (car p). Our type system represents such expressions as complex objects. For example, (number? (car p)) involves a predicate with latent propositions N_x|N̄_x applied to an expression whose object indicates that it accesses the car field of p. We write car(p) for this object. Thus, the entire expression has proposition N_car(p) for the then branch and N̄_car(p) for the else branch, obtained by substituting car(p) for x in the latent propositions. Combinations of such tests (example 11) and abstraction over them (example 12) work as before.

To specify the access behavior of selectors, each function type is equipped with a latent object, added below the arrow; we render it after the latent propositions. For car, it is car. But, since selectors can be composed arbitrarily, the function

    (λ: ([x : ⟨⊤, ⟨⊤, ⊤⟩⟩]) (car (cdr x)))

has the type

    x:⟨⊤, ⟨⊤, ⊤⟩⟩ −[#f̄_car(cdr(x)) | #f_car(cdr(x)); car(cdr(x))]→ ⊤

3.6 Variables as Tests

In examples 3 and 9, the test expression is just a variable. For such cases, the typechecker uses the proposition #f_x in the else branch to indicate that variable x has the value #f. Conversely, in the then branch, the variable must be true, giving the proposition #f̄_x. Example 9 demands an additional step. In the then branch of the if, tmp must be true. But this implies that (number? x) must also be true; the proposition representing this implication, #f̄_tmp ⊃ N_x, is added to the environment used to check the body of the let expression.
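In the then branch of the outer if, the environment thus contains both #f̄_tmp and this implication, so the needed fact about x follows by modus ponens (L-ImpE in figure 4); a sketch:

    Γ ⊢ #f̄_tmp        Γ ⊢ #f̄_tmp ⊃ N_x
    --------------------------------------  (L-ImpE)
                  Γ ⊢ N_x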
3.7 The Form of the Type System

The essence of our discussion can be distilled into five ideas:

• Propositions express relationships between variables and types.
• Instead of type environments, we use proposition environments.
• Typechecking an expression computes two propositions, which hold when the expression evaluates to true or false, respectively.
• Typechecking an expression also determines an object of inquiry, describing the particular piece of the environment pointed to by that expression. This piece of the environment may also be a portion of a larger data structure, accessed via a path.
• Latent propositions and objects are attached to function types in order to describe facts about the result of applying the function.

The next sections translate these ideas into a typed calculus, λTR.

4. The Base Calculus

We begin our presentation of λTR with the base system, a typed lambda calculus with booleans, numbers, and conditionals. In Section 5, we extend the system with pairs and local variable binding. The fundamental judgment of the type system is

    Γ ⊢ e : τ ; ψ+|ψ− ; o

It states that in environment Γ, the expression e has type τ, comes with then proposition ψ+ and else proposition ψ−, and references object o. That is, if e evaluates to a true value, then proposition ψ+ holds; conversely, if e evaluates to a false value, ψ− is true. Further, if e evaluates to a value, then looking up o in the runtime environment produces the same value.
4.1 Syntax

The syntax of expressions, types, propositions, and objects is given in figure 1. The expression syntax is standard, with conditionals, numeric and boolean constants, and primitive operators, in addition to the basics of abstraction, application, and variable reference. Abstractions come with type annotations on the bound variable. The presentation of the standard operational semantics is deferred to section 6, in conjunction with the soundness proof.

As for types, ⊤ is the supertype of all types; N is the type of numeric values; #t and #f are the types of the true and false constants, respectively; and (∪ τ ...) is the untagged or "true" union of its components. We abbreviate (∪ #t #f) as B and (∪) as ⊥. Function types name their arguments. This name is in scope in the latent propositions and objects of a function type—written above and below the arrow, respectively, and rendered here as x:τ −[ψ|ψ; o]→ τ—and in the result type. The latent propositions are knowledge about the types of variables when the function produces a true or false value, respectively.

Most propositions come in familiar form, borrowed from propositional logic: implications, disjunctions, and conjunctions, plus always-true (tt) and always-false (ff) propositions. Atomic propositions relate variables to their types: τ_x states that x has type τ; τ̄_x states that x never assumes a value with type τ. An object describes a portion of the runtime environment. In the base system, it is either a variable or the empty object. For example, the expression (add1 7) has object ∅ because it does not access any portion of the environment. Environments are simply collections of arbitrary propositions.

    d, e ::= x | (e e) | λx^τ.e | (if e e e) | c | #t | #f | n    Expressions
    c ::= add1 | zero? | number? | boolean? | procedure?          Primitive Operations
    σ, τ ::= ⊤ | N | #t | #f | (∪ τ ...) | x:τ −[ψ|ψ; o]→ τ       Types
    ψ ::= τ_x | τ̄_x | ψ ⊃ ψ | ψ ∨ ψ | ψ ∧ ψ | ff | tt            Propositions
    o ::= x | ∅                                                    Objects
    Γ ::= ψ ...                                                    Environments

    Figure 1. Syntax of Terms, Types, Propositions, and Objects

Types and Propositions   Unlike many systems that relate type systems and logic, λTR distinguishes types, propositions, and terms. Propositions state claims about the runtime environment and thus relate types and variables. This choice allows a simple and decidable logic to be used to derive types from propositions to achieve the desired expressiveness.

4.2 Typing Rules

Figures 2 and 3 specify the typing and subtyping rules.

Constants   The simplest rule is T-Num, which gives all numbers the type N. Since numbers are treated as true by the evaluation rules for if, numeric constants are assigned the propositions tt|ff, indicating that no new information is acquired when the number evaluates to a true value; if it evaluates to false, a contradiction is obtained. The rule for function constants, T-Const, works in a similar manner, though we use a δτ function to assign types to function constants. The boolean constants are given singleton types by T-True and T-False, along with propositions that reflect that #t is always true, and #f is always false. All of the constants have the object ∅, since none refer to any portion of the environment.

Variables   The rule for typing variables, T-Var, exploits the proof system. If the current environment proves that x has type τ, represented by the proposition τ_x, then the type system assigns x the type τ. The object for a variable is itself. Finally, the propositions for a variable indicate that if x evaluates to a true value, x cannot have type #f. Similarly, if x evaluates to #f, its type is #f.

Abstraction and Application   The rule for checking an abstraction, T-Abs, takes the propositions and object from the body and makes them the latent propositions and object in the function type. By taking the bound variable from the abstraction and turning it into the name of the argument, references to the variable in the types, propositions, and object remain well-scoped. Correspondingly, in T-App, the latent propositions and object are used as the result propositions and object, just as with the result type. In all cases, the actual object o_a is substituted for the name of the formal parameter, x. Consider the abstraction:

    λy^⊤.(number? y)

which typechecks as follows. In the body of the abstraction, Γ = ⊤_y. Thus, Γ ⊢ y : ⊤ ; #f̄_y|#f_y ; y, and number? has the above-mentioned type. To check the application, the typechecker substitutes y for x in the result type, latent propositions, and latent object of number?, which yields Γ ⊢ (number? y) : B ; N_y|N̄_y ; ∅. Finally, the function is assigned the desired type via T-Abs:

    y:⊤ −[N_y|N̄_y; ∅]→ B

In our prior system, this example required multiple special-purpose rules and the use of several metafunctions, whereas here it is a simple matter of scope and substitution. Of course, substitution of an object for a variable must account for the case when the object is ∅. When this happens, any references to the variable are forgotten, and propositions or objects that refer to it become trivial. Figure 8 defines the full substitution function.

Conditionals   As far as types and objects are concerned, the T-If rule is straightforward. The test may have any type and object, and the then and else branches must have identical types and objects, which then become the type and the object of the entire expression. The key difference between T-If and conventional rules for conditionals is due to the differential propagation of knowledge from the test to the branches. Specifically, the rule uses two distinct environments to check the then and else branches, because ψ1+ holds in the then branch and ψ1− holds in the else branch. The resulting propositions follow from a simple principle about the evaluation of if. If a true value is produced, either the then branch or the else branch must have evaluated to a true value, and similarly for a false value. Therefore, in the true case, either the then proposition of the then branch, ψ2+, or the then proposition of the else branch, ψ3+, must be true, which means ψ2+ ∨ ψ3+ is the then proposition of the entire expression and, correspondingly, ψ2− ∨ ψ3− is the else proposition.

Subsumption & Subtyping   Finally, λTR comes with subtyping. Expressions of type τ can be viewed as having a larger type τ′. Objects can also be lifted to larger objects. The ordering on propositions is simply provability in the current environment.
    T-Num:     Γ ⊢ n : N ; tt|ff ; ∅
    T-Const:   Γ ⊢ c : δτ(c) ; tt|ff ; ∅
    T-True:    Γ ⊢ #t : #t ; tt|ff ; ∅
    T-False:   Γ ⊢ #f : #f ; ff|tt ; ∅
    T-Var:     if Γ ⊢ τ_x then Γ ⊢ x : τ ; #f̄_x|#f_x ; x
    T-Abs:     if Γ, σ_x ⊢ e : τ ; ψ+|ψ− ; o
               then Γ ⊢ λx^σ.e : x:σ −[ψ+|ψ−; o]→ τ ; tt|ff ; ∅
    T-App:     if Γ ⊢ e : x:σ −[ψf+|ψf−; o_f]→ τ ; ψ+|ψ− ; o and Γ ⊢ e′ : σ ; ψ+′|ψ−′ ; o′
               then Γ ⊢ (e e′) : τ[o′/x] ; (ψf+|ψf−)[o′/x] ; o_f[o′/x]
    T-If:      if Γ ⊢ e1 : τ1 ; ψ1+|ψ1− ; o1 and Γ, ψ1+ ⊢ e2 : τ ; ψ2+|ψ2− ; o and Γ, ψ1− ⊢ e3 : τ ; ψ3+|ψ3− ; o
               then Γ ⊢ (if e1 e2 e3) : τ ; ψ2+∨ψ3+ | ψ2−∨ψ3− ; o
    T-Subsume: if Γ ⊢ e : τ ; ψ+|ψ− ; o and Γ, ψ+ ⊢ ψ+′ and Γ, ψ− ⊢ ψ−′ and ⊢ τ <: τ′ and ⊢ o <: o′
               then Γ ⊢ e : τ′ ; ψ+′|ψ−′ ; o′

    Figure 2. Typing Rules

    S-Refl:        ⊢ τ <: τ
    S-Top:         ⊢ τ <: ⊤
    S-UnionSuper:  if ⊢ τ <: σi for some i then ⊢ τ <: (∪ σ ...)
    S-UnionSub:    if ⊢ τi <: σ for each i then ⊢ (∪ τ ...) <: σ
    S-Fun:         if ⊢ σ′ <: σ and ⊢ τ <: τ′ and ψ+ ⊢ ψ+′ and ψ− ⊢ ψ−′ and ⊢ o <: o′
                   then ⊢ x:σ −[ψ+|ψ−; o]→ τ <: x:σ′ −[ψ+′|ψ−′; o′]→ τ′
    SO-Refl:       ⊢ o <: o
    SO-Top:        ⊢ o <: ∅

    Figure 3. Subtyping Rules

Given these definitions, the rules for subtyping are straightforward. All types are subtypes of ⊤ and of themselves. Subtypes of elements of a union are subtypes of the union, and any type that is a supertype of every element is a supertype of the union. Finally, function types are ordered in the usual fashion.

4.3 Proof System

Figure 4 specifies the proof rules for our logic. The first nine rules—L-Atom through L-OrE—use the natural deduction style to express the standard rules of propositional logic. The subsequent four rules relate the atomic propositions. In particular, L-Sub says that if x has type τ, then it has any larger type. Similarly, L-SubNot says that if x does not have type τ, then it does not have any smaller type. By L-Bot, if x has an empty type, it is possible to conclude anything, since this is impossible. The L-Update rule refines the type of a variable via a combination of multiple propositions. Roughly speaking, this metafunction satisfies the equations

    update(τ, σ) = τ ∩ σ        update(τ, σ̄) = τ − σ

See figure 9 for the full definition.
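The intersection/difference reading admits a direct executable rendering. The following is a minimal Racket sketch of restrict and remove over a toy representation of union types; the representation and names are ours, not the paper's, and the real definition (figure 9) also threads a path argument through pair types:

    #lang racket
    ;; Toy types: symbols like 'N, 'S, 'B, or a union (list 'U t ...); '(U) is bottom.
    (define (subtype? t s)                 ; a tiny stand-in for <:
      (or (equal? t s)
          (and (pair? s) (eq? (car s) 'U)
               (ormap (λ (si) (subtype? t si)) (cdr s)))))
    (define (restrict t s)                 ; update(t, s): roughly t ∩ s
      (cond [(and (pair? t) (eq? (car t) 'U))   ; map over union members
             (cons 'U (map (λ (ti) (restrict ti s)) (cdr t)))]
            [(subtype? t s) t]
            [else s]))                     ; (figure 9's emptiness clause is omitted)
    (define (remove t s)                   ; update(t, s̄): roughly t − s
      (cond [(subtype? t s) '(U)]          ; bottom
            [(and (pair? t) (eq? (car t) 'U))
             (cons 'U (map (λ (ti) (remove ti s)) (cdr t)))]
            [else t]))
    ;; Example: (remove '(U N S) 'N) evaluates to '(U (U) S), i.e., effectively S.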
    L-Atom:    if ψ ∈ Γ then Γ ⊢ ψ
    L-True:    Γ ⊢ tt
    L-False:   if Γ ⊢ ff then Γ ⊢ ψ
    L-AndI:    if Γ ⊢ ψ1 and Γ ⊢ ψ2 then Γ ⊢ ψ1 ∧ ψ2
    L-AndE:    if Γ, ψ1 ⊢ ψ or Γ, ψ2 ⊢ ψ then Γ, ψ1 ∧ ψ2 ⊢ ψ
    L-ImpI:    if Γ, ψ1 ⊢ ψ2 then Γ ⊢ ψ1 ⊃ ψ2
    L-ImpE:    if Γ ⊢ ψ1 and Γ ⊢ ψ1 ⊃ ψ2 then Γ ⊢ ψ2
    L-OrI:     if Γ ⊢ ψ1 or Γ ⊢ ψ2 then Γ ⊢ ψ1 ∨ ψ2
    L-OrE:     if Γ, ψ1 ⊢ ψ and Γ, ψ2 ⊢ ψ then Γ, ψ1 ∨ ψ2 ⊢ ψ
    L-Sub:     if Γ ⊢ τ_x and ⊢ τ <: σ then Γ ⊢ σ_x
    L-SubNot:  if Γ ⊢ σ̄_x and ⊢ τ <: σ then Γ ⊢ τ̄_x
    L-Bot:     if Γ ⊢ ⊥_x then Γ ⊢ ψ
    L-Update:  if Γ ⊢ τ_x and Γ ⊢ ν_x then Γ ⊢ update(τ, ν)_x
    (The metavariable ν ranges over τ and τ̄, without variables.)

    Figure 4. Proof System

4.4 A Worked Example

At this point, eight of our 14 examples typecheck. To illustrate the workings of the type system, let us work example 7:

    (if (if (number? x) (string? y) #f)
        (+ x (string-length y))
        0)

First, assume that the initial environment is Γ = ⊤_x, ⊤_y. Now consider the inner if expression. The test has then proposition N_x and else proposition N̄_x. The then branch has propositions S_y and S̄_y, or by subsumption N_x ∧ S_y | tt, since T-If adds the then proposition of the test to the environment for checking the then branch. The else branch has propositions ff|tt, and by subsumption N_x ∧ S_y | tt, since ff ⊢ N_x ∧ S_y. Therefore, the entire inner if expression has then proposition

    (N_x ∧ S_y) ∨ (N_x ∧ S_y) = N_x ∧ S_y

and else proposition tt. Second, we typecheck the then branch of the main if expression in the environment Γ1 = ⊤_x, ⊤_y, N_x ∧ S_y. Since Γ1 ⊢ N_x and Γ1 ⊢ S_y, we can give x and y the appropriate types to check the expression (+ x (string-length y)).

5. Extensions

The base system of section 4 lacks several important features, including support for compound data structures and let. This section shows how to extend the base system with these features.

5.1 Pairs

The most significant extension concerns compound data, e.g., pairs. We extend the expression, type, and proposition grammars as shown in figure 5.⁴ Most significantly, in all places where a variable appeared previously in propositions and objects, it is now legal to specify a path—a sequence of selectors—rooted at a variable, written π(x). This allows the system to refer not just to variables in the environment, but to parts of their values.

⁴ In a polymorphic λTR, pair operations could be added as primitives.

    e ::= . . . | (cons e e)               Expressions
    c ::= . . . | cons? | car | cdr        Primitive Operations
    σ, τ ::= . . . | ⟨τ, τ⟩                Types
    ψ ::= . . . | τ_π(x) | τ̄_π(x)         Propositions
    o ::= π(x) | ∅                         Objects
    π ::= pe ...                           Paths
    pe ::= car | cdr                       Path Elements

    Figure 5. Syntax Extensions for Pairs
Typing Rules   Figure 6 shows the extensions to the typing and subtyping rules. Again, the subtyping rule S-Pair and the typing rule for cons are straightforward; all pair values are treated as true. The T-Car and T-Cdr rules are versions of the application rule specialized to the appropriate latent propositions and objects, which here involve non-trivial paths. Substitution of objects for variables is also appropriately extended; the full definition is in figure 8. None of the existing typing rules require changes.

    S-Pair:  if ⊢ τ1 <: τ2 and ⊢ σ1 <: σ2 then ⊢ ⟨τ1, σ1⟩ <: ⟨τ2, σ2⟩
    T-Cons:  if Γ ⊢ e1 : τ1 ; ψ1+|ψ1− ; o1 and Γ ⊢ e2 : τ2 ; ψ2+|ψ2− ; o2
             then Γ ⊢ (cons e1 e2) : ⟨τ1, τ2⟩ ; tt|ff ; ∅
    T-Car:   if Γ ⊢ e : ⟨τ1, τ2⟩ ; ψ0+|ψ0− ; o, with ψ+|ψ− = (#f̄_car(x)|#f_car(x))[o/x] and o_r = car(x)[o/x],
             then Γ ⊢ (car e) : τ1 ; ψ+|ψ− ; o_r
    T-Cdr:   if Γ ⊢ e : ⟨τ1, τ2⟩ ; ψ0+|ψ0− ; o, with ψ+|ψ− = (#f̄_cdr(x)|#f_cdr(x))[o/x] and o_r = cdr(x)[o/x],
             then Γ ⊢ (cdr e) : τ2 ; ψ+|ψ− ; o_r

    Figure 6. Type and Subtype Extensions

Logic Rules   Figure 7 specifies the changes to the logic for dealing with paths. For the first three rules, the only change needed is allowing paths in the appropriate syntactic locations. For the L-Update rule, there is an additional change. When the environment proves both ⟨⊤, ⊤⟩_x and N_car(x), it must be possible to derive ⟨N, ⊤⟩_x. The new version of L-Update allows this inference via a revised version of update. Its third argument specifies a path to follow before refining the type. See figure 9 for details. Of course, none of the rules implementing the standard proof theory of propositional logic change with this extension. With the addition of pairs, the type system can cope with 12 of the 14 examples from section 2.

    L-Sub:     if Γ ⊢ τ_π(x) and ⊢ τ <: σ then Γ ⊢ σ_π(x)
    L-SubNot:  if Γ ⊢ σ̄_π(x) and ⊢ τ <: σ then Γ ⊢ τ̄_π(x)
    L-Bot:     if Γ ⊢ ⊥_π(x) then Γ ⊢ ψ
    L-Update:  if Γ ⊢ τ_π′(x) and Γ ⊢ ν_π(π′(x)) then Γ ⊢ update(τ, ν, π)_π′(x)

    Figure 7. Logic Extensions

5.2 Local Binding

To add a local binding construct, we again extend the grammar:

    d, e ::= . . . | (let (x e) e)

Recall our motivating example 9. The crucial aspect is to relate the propositions about the initialization expression to the variable itself. Logical implication precisely expresses this connection, giving us the following rule:

    T-Let:  if Γ ⊢ e0 : τ ; ψ0+|ψ0− ; o0
            and Γ, τ_x, #f̄_x ⊃ ψ0+, #f_x ⊃ ψ0− ⊢ e1 : σ ; ψ1+|ψ1− ; o1
            then Γ ⊢ (let (x e0) e1) : σ[o0/x] ; (ψ1+|ψ1−)[o0/x] ; o1[o0/x]

This rule has three components. The first antecedent checks the right-hand side. The second checks the body with an environment extended both with the type of the bound variable (τ_x) and with implications stating that if x is not false, e0 must evaluate to true, and similarly if x is false, e0 must evaluate to false. The consequence replaces all references to x with the object of e0.
5.3
ing us the following rule: T-L ET
Γ ` e0 : τ ; ψ0+ |ψ0− ; o0 Γ, τx , #fx ⊃ ψ0+ , #fx ⊃ ψ0− ` e1 : σ ; ψ1+ |ψ1− ; o1 Γ ` (let (x e0 ) e1 ) : σ[o0 /x] ; ψ1+ |ψ1− [o0 /x] ; o1 [o0 /x]
Local Binding
To add a local binding construct, we again extend the grammar:
The Final Example
With this extension, we are now able to check all the examples from section 2. To demonstrate theScomplete system, consider example 14. We begin with Γ0 = ( N S)input , h>, >iextra . The two tests, (number? input) and (number? (car extra)), yield the propositions Ninput |Ninput for the former and Ncar(extra) |Ncar(extra) for the latter. Using T-I F, T-S UBSUME, and the definition of and
d, e ::= . . . | (let (x e) e) Recall our motivating example 9. The crucial aspect is to relate the propositions about the initialization expression to the variable itself. Logical implication precisely expresses this connection, giv-
123
5.4 Metafunctions

Equipped with the full formal system, we can now provide a detailed description of the substitution and type update metafunctions; see figures 8 and 9.

Substitution replaces a variable with an object. When the object is of the form π(x), this is in general a straightforward structural recursion. There are a few tricky cases to consider, however. First, if the object being substituted is ∅, then references to the variable must be erased. In most contexts, such propositions should be erased to tt, the trivial proposition. But, just as with contravariance of function types, such propositions must be erased to ff when to the left of an odd number of implications. Second, if a proposition such as τ_x references a variable z in τ, then if z goes out of scope, the entire proposition must be erased.

In comparison, the update metafunction is simple. It follows a path into its first argument and then appropriately replaces the type there with a type that depends on the second argument. If the second argument is of the form τ, update computes the intersection of the two types; if the second argument is of the form τ̄, update computes the difference. To compute the intersection and difference, update uses the auxiliary metafunctions restrict and remove, respectively.

    (ψ+|ψ−)[o/x] = ψ+[o/x] | ψ−[o/x]
    ν_π(x)[π′(y)/x] = (ν[π′(y)/x])_π(π′(y))
    ν_π(x)[∅/x]+ = tt
    ν_π(x)[∅/x]− = ff
    ν_π(x)[o/z] = ν_π(x)            if x ≠ z and z ∉ fv(ν)
    ν_π(x)[o/z]+ = tt               if x ≠ z and z ∈ fv(ν)
    ν_π(x)[o/z]− = ff               if x ≠ z and z ∈ fv(ν)
    tt[o/x] = tt
    ff[o/x] = ff
    (ψ1 ⊃ ψ2)[o/x]+ = ψ1[o/x]− ⊃ ψ2[o/x]+
    (ψ1 ⊃ ψ2)[o/x]− = ψ1[o/x]+ ⊃ ψ2[o/x]−
    (ψ1 ∨ ψ2)[o/x] = ψ1[o/x] ∨ ψ2[o/x]
    (ψ1 ∧ ψ2)[o/x] = ψ1[o/x] ∧ ψ2[o/x]
    π(x)[π′(y)/x] = π(π′(y))
    π(x)[∅/x] = ∅
    π(x)[o/z] = π(x)                if x ≠ z
    ∅[o/x] = ∅
    Substitution on types is capture-avoiding structural recursion.

    Figure 8. Substitution

    update(⟨τ, σ⟩, ν, π :: car) = ⟨update(τ, ν, π), σ⟩
    update(⟨τ, σ⟩, ν, π :: cdr) = ⟨τ, update(σ, ν, π)⟩
    update(τ, σ, ε) = restrict(τ, σ)
    update(τ, σ̄, ε) = remove(τ, σ)
    restrict(τ, σ) = ⊥              if there is no v with ⊢ v : τ ; ψ1 ; o1 and ⊢ v : σ ; ψ2 ; o2
    restrict((∪ τ ...), σ) = (∪ restrict(τ, σ) ...)
    restrict(τ, σ) = τ              if ⊢ τ <: σ
    restrict(τ, σ) = σ              otherwise
    remove(τ, σ) = ⊥                if ⊢ τ <: σ
    remove((∪ τ ...), σ) = (∪ remove(τ, σ) ...)
    remove(τ, σ) = τ                otherwise

    Figure 9. Type Update

    δτ(add1) = x:N −[∅]→ N
    δτ(zero?) = x:N −[∅]→ B
    δτ(number?) = x:⊤ −[N_x | N̄_x; ∅]→ B
    δτ(boolean?) = x:⊤ −[B_x | B̄_x; ∅]→ B
    δτ(procedure?) = x:⊤ −[(⊥→⊤)_x | (⊥→⊤)̄_x; ∅]→ B
    δτ(cons?) = x:⊤ −[⟨⊤, ⊤⟩_x | ⟨⊤, ⊤⟩̄_x; ∅]→ B

    Figure 10. Constant Typing

6. Semantics, Models, and Soundness

To prove type soundness, we introduce an environment-based operational semantics, use the environments to construct a model for the logic, prove the soundness of the logic with respect to this model, and conclude the type soundness argument from there.

6.1 Operational Semantics

Figure 11 defines a big-step, environment-based operational semantics of λTR. The metavariable ρ ranges over value environments (or just environments), which are finite maps from variables
to closed values. We write ρ(x) for the value of x in ρ, and ρ(π(x)) for the value at path π in ρ(x). The central judgment is

    ρ ⊢ e ⇓ v

which states that in environment ρ, the expression e evaluates to the value v. Values are given by the following grammar:

    v ::= c | #t | #f | n | [ρ, λx^τ.e] | (cons v v)

For the interpretation of primitives, see figure 13.

    B-Val:     ρ ⊢ v ⇓ v
    B-Var:     if ρ(x) = v then ρ ⊢ x ⇓ v
    B-Abs:     ρ ⊢ λx^τ.e ⇓ [ρ, λx^τ.e]
    B-Delta:   if ρ ⊢ e ⇓ c and ρ ⊢ e′ ⇓ v and δ(c, v) = v′ then ρ ⊢ (e e′) ⇓ v′
    B-Beta:    if ρ ⊢ e_f ⇓ [ρ_c, λx^τ.e_b] and ρ ⊢ e_a ⇓ v_a and ρ_c[x ↦ v_a] ⊢ e_b ⇓ v then ρ ⊢ (e_f e_a) ⇓ v
    B-Let:     if ρ ⊢ e_a ⇓ v_a and ρ[x ↦ v_a] ⊢ e_b ⇓ v then ρ ⊢ (let (x e_a) e_b) ⇓ v
    B-Cons:    if ρ ⊢ e1 ⇓ v1 and ρ ⊢ e2 ⇓ v2 then ρ ⊢ (cons e1 e2) ⇓ (cons v1 v2)
    B-IfTrue:  if ρ ⊢ e1 ⇓ v1 and v1 ≠ #f and ρ ⊢ e2 ⇓ v then ρ ⊢ (if e1 e2 e3) ⇓ v
    B-IfFalse: if ρ ⊢ e1 ⇓ #f and ρ ⊢ e3 ⇓ v then ρ ⊢ (if e1 e2 e3) ⇓ v

    Figure 11. Operational Semantics

    δ(add1, n) = n + 1
    δ(zero?, 0) = #t
    δ(zero?, n) = #f                           otherwise
    δ(number?, n) = #t
    δ(number?, v) = #f                         otherwise
    δ(boolean?, #t) = #t
    δ(boolean?, #f) = #t
    δ(boolean?, v) = #f                        otherwise
    δ(procedure?, λx^τ.e) = #t
    δ(procedure?, c) = #t
    δ(procedure?, v) = #f                      otherwise
    δ(cons?, (cons v1 v2)) = #t
    δ(cons?, v) = #f                           otherwise
    δ(car, (cons v1 v2)) = v1
    δ(cdr, (cons v1 v2)) = v2

    Figure 13. Primitives

6.2 Models

A model is any value environment, and an environment ρ satisfies a proposition ψ, written ρ ⊨ ψ, as defined in figure 12, mostly in the usual manner. The satisfaction relation is extended to proposition environments in a pointwise manner. To formulate the satisfaction relation, we need a typing rule for closures:

    T-Clos:  if ∃Γ. ρ ⊨ Γ and Γ ⊢ λx^τ.e : σ ; ψ+|ψ− ; o
             then ⊢ [ρ, λx^τ.e] : σ ; ψ+|ψ− ; o

    M-Top:     ρ ⊨ tt
    M-And:     if ρ ⊨ ψ and ρ ⊨ ψ′ then ρ ⊨ ψ ∧ ψ′
    M-Or:      if ρ ⊨ ψ1 or ρ ⊨ ψ2 then ρ ⊨ ψ1 ∨ ψ2
    M-Imp:     if ρ ⊨ ψ implies ρ ⊨ ψ′ then ρ ⊨ ψ ⊃ ψ′
    M-Type:    if ⊢ ρ(π(x)) : τ ; ψ+|ψ− ; o then ρ ⊨ τ_π(x)
    M-NotType: if ⊢ ρ(π(x)) : σ ; ψ+|ψ− ; o and there is no v such that ⊢ v : τ ; ψ1+|ψ1− ; o1 and ⊢ v : σ ; ψ2+|ψ2− ; o2, then ρ ⊨ τ̄_π(x)

    Figure 12. Satisfaction Relation

Two clauses in figure 12—M-Type and M-NotType—need some explanation. They state that if the value of x in the environment has the type τ, the model satisfies τ_x, and if x has a type that does not overlap with τ, the model satisfies τ̄_x. We can see immediately that not all propositions are consistent: ff, as expected, but also propositions such as N_x ∧ B̄_x have no model.
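For instance (an illustration of ours, not the paper's): for ρ = {x ↦ 3}, M-Type gives ρ ⊨ N_x, and M-NotType gives ρ ⊨ B̄_x, since no value typechecks at both B and N; consequently no ρ satisfies N_x ∧ B̄_x.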
Our first lemma says that the proof theory respects models.

Lemma 1. If ρ ⊨ Γ and Γ ⊢ ψ, then ρ ⊨ ψ.

Proof: Structural induction on Γ ⊢ ψ.

Conversely, we can use this lemma as the guideline that any logical rule that satisfies this lemma is appropriate.

6.3 Soundness for λTR

With the model theory and the operational semantics in place, we can state and prove the second major lemma.

Lemma 2. If Γ ⊢ e : τ ; ψ+|ψ− ; o, ρ ⊨ Γ, and ρ ⊢ e ⇓ v, then all of the following hold:
1. either o = ∅ or ρ(o) = v,
2. either v ≠ #f and ρ ⊨ ψ+, or v = #f and ρ ⊨ ψ−, and
3. ⊢ v : τ ; ψ+′|ψ−′ ; o′ for some ψ+′, ψ−′, and o′.

Proof: By induction on the derivation of the typing judgment. For illustrative purposes, we examine the T-If case with e = (if e1 e2 e3). Proving part 1 is trivial: either o is ∅, or both e2 and e3 have an identical object, and e must evaluate to the result of one of them. To prove part 2, we note that if v = #f, either e2 or e3 must evaluate to #f. If it is e2, we have ρ ⊨ ψ2−, and thus ρ ⊨ ψ2− ∨ ψ3− by M-Or, giving the desired result. The cases for e3 evaluating to false and the whole expression evaluating to true are dealt with in an analogous manner. Finally, part 3 is trivial, since both the then and else branches have the same type.

Given this, we can state the desired soundness theorem.

Theorem 1 (Type Soundness for λTR). If ⊢ e : τ ; ψ+|ψ− ; o and ⊢ e ⇓ v, then ⊢ v : τ ; ψ+′|ψ−′ ; o′ for some ψ+′, ψ−′, and o′.

Proof: Corollary of lemma 2.

This theorem comes with the standard drawbacks of big-step soundness proofs. It says nothing about diverging or stuck terms.⁵
⁵ To deal with errors, we would need the following additional steps: 1. add an additional value, wrong, which has every type; 2. add evaluation rules that propagate wrong; 3. add evaluation rules to generate wrong for each stuck state; 4. add clauses to the δ function to generate wrong for undefined cases; 5. prove that if ⊢ e : τ ; ψ+|ψ− ; o, then ⊬ e ⇓ wrong.

7. From λTR to Typed Racket

As a core calculus, λTR lacks many of the features of a programming language such as Racket, which consists of a rich functional core enriched with support for object-oriented and component-oriented programming. Creating an implementation from the calculus demands additional research and engineering. This section reports on the most important implementation ideas.

Our current implementation—dubbed Typed Racket—deals with the functional core of Racket, which also supports mutable data structures, assignable variables, and Scheme's multiple value returns. In addition, the section discusses user interface issues and the implementation of the logical reasoning system.

7.1 Paths and Mutability

The λTR calculus assumes immutable data structures. In Racket, some forms of data are mutable, however. Predicate tests on paths into mutable data cannot soundly refine the type environment. Consider an example using Racket's equivalent of ref cells:

    (let*: ([b : (Box ⊤) (box 0)]
            [b* : (Box ⊤) b])
      (if (number? (unbox b))
          (begin (set-box! b* 'no) (unbox b))
          0))

A naive implementation might assign this fragment the type Number, but its evaluation produces a symbol, because b and b* are aliased. To avoid this unsoundness, the unbox procedure is assigned trivial latent propositions and object. In general, Racket provides structures that are mutable on a per-field basis and that are accessed with per-field selectors. Typed Racket therefore does not assign the selectors of mutable fields any latent propositions or objects.

Assignable variables are a second concern in the same realm. If a variable is the target of an assignment, the L-Update rule is unsound for this variable. Hence, Typed Racket uses a simple analysis to find all assignment statements and to disable the L-Update rule for those variables. Since variables are assignable only in the module that declares them, the analysis is modular and straightforward.
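A sketch of ours, in the paper's notation, of a fragment that this analysis must not accept at result type Number:

    (define: (g [x : (∪ Number String)])
      (if (number? x)
          (begin (set! x "no")  ;; x is assigned, so L-Update is disabled for x
                 x)             ;; this occurrence keeps type (∪ Number String)
          0))

Because x is the target of a set!, the occurrence of x after the assignment cannot be narrowed to Number.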
7.4
7.2 Implementing the Logical System
The type system presented in section 4 is non-algorithmic. For an implementation, we must both eliminate the subsumption rule and implement the T-VAR rule. The former is accomplished via a distribution of the subtyping obligations among the rules. The latter demands the implementation of a notion of provability. Since propositional satisfiability is decidable, the logical system is straightforward to implement in principle. We employ three strategies to avoid paying the full cost of deciding the logic in almost all cases. First, we split the type environment from the proposition environment. This separation avoids invoking the logic to typecheck each variable reference. Second, Typed Racket eagerly simplifies logical formulas, significantly decreasing their typical size. Third, it also refines the type environment each time a new formula is added to the proposition environment. These optimizations mean that most code does not pay the cost of the proof system. These techniques are well-known from work on SAT solvers. Since deciding the logical inference rules of λTR can be cast as a satisfiability-modulo-theories problem, we plan to investigate applying existing off-the-shelf SMT solvers [Ganzinger et al. 2004].
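To give a feel for the second strategy, here is a minimal sketch (ours; the representation and names are hypothetical, and Typed Racket's actual implementation differs) of eager simplification of logical formulas, which keeps the proposition environment small so that the full prover rarely runs:

#lang racket
;; propositions are s-expressions: (and p q), (or p q), or the
;; trivial propositions 'TT (truth) and 'FF (falsehood)
(define (simplify p)
  (match p
    [`(and ,q ,r)
     (define q* (simplify q))
     (define r* (simplify r))
     (cond [(or (eq? q* 'FF) (eq? r* 'FF)) 'FF]
           [(eq? q* 'TT) r*]
           [(eq? r* 'TT) q*]
           [else `(and ,q* ,r*)])]
    [`(or ,q ,r)
     (define q* (simplify q))
     (define r* (simplify r))
     (cond [(or (eq? q* 'TT) (eq? r* 'TT)) 'TT]
           [(eq? q* 'FF) r*]
           [(eq? r* 'FF) q*]
           [else `(or ,q* ,r*)])]
    [_ p]))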
7.3 Multiple Arguments, Multiple Values
All λTR functions consume one argument and produce one value. In contrast, Racket functions are multi-ary and may produce multiple values [Flatt and PLT 2010]. For example, the function

(define two-val?
  (λ ([x : ⊤] [y : ⊤])
    (values (number? x) (string? y))))

determines both whether its first argument is a number and whether its second argument is a string. Expressing this form of reasoning in our type system demands a different representation of function types. On the domain side, no additional work is needed because propositions and objects refer directly to the names of the parameters. On the range side, each function produces a sequence of values, each of which comes with a type, two latent propositions, and a latent object. In our running example, the latent propositions for the first return value are Nx |¬Nx. Although test positions cannot cope with multiple values, the following idiom exploits multiple values for tests:

(let-values ([(x y) (two-val? e1 e2)])
  (if x — —))

Our new function type representation allows the type system to prove e1 is a number in the then branch of the if expression, and to use y in a later test expression to check whether e2 is a string.

7.4 User Presentation

While the Typed Racket types capture much useful information about the program, they can also be hard for programmers to read and write. Fortunately, in most cases a simplified type presentation suffices, and complex types are reserved for special occasions. First, few type errors involve types with non-trivial propositions directly. In our experience with Typed Scheme, almost all type errors are directly explicable with the underlying, non-occurrence typing portion of the system, meaning that users' primary experience with occurrence typing is that it just works. Second, when users need to specify or read types with propositions or objects, these are primarily latent and symmetric between the two propositions. For example, to specify the type of strnum? in Typed Racket syntax, the user writes

(: strnum? (⊤ → Boolean : (∪ String Number)))

The syntax states that the latent then proposition is (S ∪ N)x, where x is the name of the argument, and the latent else proposition is conversely ¬(S ∪ N)x, with a latent object of ∅. In practice, this syntax suffices for capturing the substantial majority of the types with latent propositions. Third, Typed Racket uses local type inference and bi-directional typechecking [Pierce and Turner 2000]. Since all top-level definitions are annotated in the above fashion, the type system can propagate the latent propositions into non-latent propositions for the bodies of functions such as strnum?. In short, programmers almost never need to write down non-latent propositions.

8. Empirical Measurements

Numerous encounters with difficult-to-type idioms in Racket code triggered the development of our new system. In order to measure its actual effectiveness in comparison with the original system, we inspected the existing Racket code base and measured the frequency of certain idioms in practice. Since precise measurement of programming idioms is impossible, this section begins with a detailed explanation of our empirical approach and its limitations. The following two subsections report on the measurements for the two most important idioms that motivate the Typed Racket work: those that involve predicates applied to selectors, as in example 10, and those that involve combinations of predicates, as in example 4. In both cases, our results suggest that our new approach to occurrence typing greatly improves our capability to enrich existing Racket code with types.
8.1 Methodology
Measuring the usefulness of Typed Racket for typing existing code presents both opportunities and challenges. The major opportunity is that the existing Racket code base provides 650,000 lines of code on which to test both our hypotheses about existing code and our type system. The challenge is assessing the use of type system features on code that does not typecheck. Since the purpose of Typed Racket is to convert existing untyped Racket programs, it is vital to confirm its usefulness on existing code. Our primary strategy for assessing the usefulness of our type system has been the porting of existing code, which is the ultimate test of the ability of Typed Racket to follow the reasoning
of Racket programmers. However, Typed Racket does not operate on untyped code; it requires type annotations on all functions and user-defined structures. Therefore, it is not possible to simply apply our new implementation to existing untyped code. Instead, we have applied less exact techniques to empirically validate the usefulness of our extensions to Typed Racket. Starting from the knowledge of particular type predicates, selectors, and patterns of logical combinations, we searched for occurrences of the relevant idioms in the existing code base. We then randomly sampled these results and analyzed the code fragments in detail; this allowed us to discover whether the techniques of occurrence typing are useful for the fragment under consideration. This approach has two major drawbacks. First, it only allows us to count a known set of predicates, selectors, idioms, and other forms. Whether a function is a selector or a type predicate could only be answered with a semantics-based search, which is currently not available. Second, our approach does not determine whether a program would indeed typecheck under Typed Racket, merely that the features we have outlined are indeed necessary. Further limitations may be discovered in the future, requiring additional extensions. However, despite these drawbacks, we believe that empirical study of the features of Typed Racket is useful. It has already alerted us to uses of occurrence typing that we had not predicted.

8.2 Selectors

Our first measurement focuses on uses of three widely used, built-in selectors: car, cdr, and syntax-e (a selector that extracts expression representations from an AST node). A search for compositions of any predicate-like function with any of these selectors yields:
1. 254 compositions of built-in predicates with uses of car for which λTR would assign a non-trivial object;
2. 567 such applications for cdr; and
3. 285 such applications for syntax-e.

Counting only known predicate names means that (number? (car x)) is counted, but neither (unknown? (car y)) nor (string? (car (f ))) is included, because (f ) is not known to have a non-trivial object. In sum, this measurement produces a total of at least 1106 useful instances for just three selectors composed with known predicates. A manual inspection of 20 uses per selector suggests that the extensions to occurrence typing presented in this paper are needed for just under half of all cases. Specifically, in the case of car, seven of 20 uses require occurrence typing; for cdr, nine of 20 benefit; and the same number applies to syntax-e. Additionally, in four cases the type correctness of the code would rely on flow-sensitivity based on predicate tests, but using exceptional control flow rather than conditionals. In conclusion, our manual inspection suggests that some 40% to 45% of the 1106 cases found can benefit from extending occurrence typing to selector expressions, as described in section 5. This measurement, together with numerous user requests for this feature, justifies the logical extensions for selector-predicate compositions.
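For concreteness, here is the shape of idiom this measurement counts, written as it might appear in Typed Racket (an illustrative example of ours, not a fragment from the measured code base):

#lang typed/racket
(: tag-of ((Pairof Any Any) -> Symbol))
(define (tag-of p)
  (if (symbol? (car p))   ; a predicate applied to a selector
      (car p)             ; occurrence typing refines (car p) to Symbol here
      'unknown))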
8.3 Logical Combinations

Since our original system cannot cope with disjunctive combinations of propositions, typically expressed using or, measuring or expressions in the code base is a natural way to confirm the usefulness of general propositional reasoning for Typed Racket. The source of Racket contains approximately 4860 uses of the or macro; furthermore, or expressions are expanded more than 2000 times during the compilation of the minimal Racket library, demonstrating that this pattern occurs widely in the code base. A survey of all or expressions in the code base reveals that or is used with 37 different primitive type predicates a total of 474 times, as well as with a wide variety of other functions that may be user-defined type predicates. Each of these uses requires the extension for local binding described in section 5.2, as well as the more general logical reasoning framework of this paper to generate the correct filters.
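The following small example (ours, for illustration) shows the kind of disjunctive combination this measurement counts; the type system must reason that in the first cond branch x has type (U String Symbol), and in the else branch type Number:

#lang typed/racket
(: atom->string ((U String Symbol Number) -> String))
(define (atom->string x)
  (cond [(or (string? x) (symbol? x))
         ;; here x : (U String Symbol), by reasoning about the or
         (if (string? x) x (symbol->string x))]
        [else (number->string x)]))  ; here x : Number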
9. Related Work

Types for Untyped Languages  There has long been interest in static typechecking of untyped code. Thatte [1990] and Henglein [1994] both present systems integrating static and dynamic types, and Henglein and Rehof [1995] describe a system for the automatic translation of untyped Scheme code into ML. These systems did not take into account the information provided by predicate tests, as described by Henglein and Rehof in the quote from section 1. In the past few years, this work has been picked up and applied to existing untyped languages. In addition to Typed Scheme, proposals have been made for Ruby [Furr et al. 2009], Thorn [Wrigstad et al. 2010], JavaScript [ECMA 2007], and others, and theoretical studies have been conducted by Siek and Taha [2006] and Wadler and Findler [2009]. To our knowledge, none has yet incorporated occurrence typing or other means of handling predicate tests, although the authors of DRuby have stated that occurrence typing is their most significant missing feature [priv. comm.].

Types and Flow Analysis for Untyped Languages  Shivers [1991] describes a type recovery analysis—exploited by Wright [1997] and Flanagan [1999] for soft typing systems—that includes refining the type of variables in type tests. This analysis applies only to particular predicates, however, and does not support abstraction over predicates or logical reasoning about combinations of predicates. Similarly, Aiken et al. [1994] describe a type inference system using conditional types, which refine the types of variables based on patterns in a case expression. Since this system is built on the use of patterns, abstracting over tests or combining them, as in examples 12 or 5, is impossible. Further, the system does not account for preceding patterns when typing a right-hand side and thus cannot perform logical reasoning as in examples 13 and 14.

Intensional Polymorphism  Languages with intensional polymorphism [Crary et al. 1998] also offer introspective operations, e.g., typecase, allowing programs to dispatch on the type of the data provided to functions. The λTR calculus provides significantly greater flexibility. In particular, it is able to use predicates applied to selectors, reason about combinations of tests, abstract over type tests, use both the positive and negative results of tests, and use logical formulas to enhance the expressiveness of the system. In terms of our examples, the system of Crary et al. could handle only the first.

Generalized Algebraic Data Types  Generalized algebraic data types [Peyton Jones et al. 2006] are an extension of algebraic data types in which "pattern matching causes type refinement." This is sometimes presented as a system of type constraints, in addition to the standard type environment, as in the HMG(X) and LHM(X) systems [Simonet and Pottier 2007, Vytiniotis et al. 2010]. Such systems are similar to λTR in several ways—they typecheck distinct branches of case expressions with enriched static environments and support general constraint environments from which new constraints can be derived. The λTR calculus and constraint-based systems such as HMG(X) differ in two fundamental ways, however. First, HMG(X), like other GADT systems, relies on pattern matching for type refinement, whereas λTR combines conditional expressions and selector applications, allowing forms of abstraction that patterns prohibit. Second, all of these systems work solely on type variables, whereas λTR refines arbitrary types.

Semantic Subtyping  Bierman et al. [2010] present Dminor, a system with a rule for conditionals similar to T-IF. Their system supports extremely expressive refinement types, with subtyping determined by an SMT solver. However, while λTR supports higher-order use of type tests, due to the limitations of the semantic subtyping framework, Dminor is restricted to first-order programs.

Types and Logic  Considering types as logical propositions has a long history, going back to Curry and Howard [Curry and Feys 1958, Howard 1980]. In a dependently typed language such as Agda [Norell 2007], Coq [Bertot and Castéran 2004], or Epigram [McBride and McKinna 2004], the relationships we describe with predicates and objects could be encoded in types, since types can contain arbitrary terms, including terms that reference other variables or the expression itself. The innovation in λTR is to consider propositions that relate types and variables. This allows us to express the relationships needed to typecheck existing Racket code, while keeping the logic decidable and easy to understand.

10. Conclusion

This paper describes a new framework for occurrence typing. The two key ideas are to derive general propositions from expressions and to replace the type environment with a proposition environment. These ideas increase the type system's expressive power via reasoning tools from propositional logic.

Acknowledgements  Discussions with Aaron Turon greatly improved this paper. The development of Typed Racket has been supported by Stevie Strickland, Eli Barzilay, Hari Prashanth K R, Vincent St-Amour, Ryan Culpepper and many others. Jed Davis provided assistance with Coq.

References
A. Aiken, E. L. Wimmers, and T. K. Lakshman. Soft typing with conditional types. In Proc. 21st Symposium on Principles of Programming Languages, pages 163–173. ACM Press, 1994.
Y. Bertot and P. Castéran. Interactive Theorem Proving and Program Development, volume XXV of EATCS Texts in Theoretical Computer Science. Springer-Verlag, 2004.
G. M. Bierman, A. D. Gordon, C. Hriţcu, and D. Langworthy. Semantic subtyping with an SMT solver. In Proc. Fifteenth International Conference on Functional Programming. ACM Press, 2010.
R. Cartwright. User-defined data types as an aid to verifying LISP programs. In International Conference on Automata, Languages and Programming, pages 228–256, 1976.
K. Crary, S. Weirich, and G. Morrisett. Intensional polymorphism in type-erasure semantics. In Proc. Third International Conference on Functional Programming, pages 301–312. ACM Press, 1998.
H. Curry and R. Feys. Combinatory Logic, volume I. North-Holland, 1958.
ECMA. ECMAScript Edition 4 group wiki, 2007. URL http://wiki.ecmascript.org/.
M. Felleisen, R. B. Findler, M. Flatt, and S. Krishnamurthi. How to Design Programs. MIT Press, 2001. URL http://www.htdp.org/.
C. Flanagan and M. Felleisen. Componential set-based analysis. ACM Trans. Progr. Lang. Sys., 21(2):370–416, 1999.
M. Flatt and PLT. Reference: Racket. Reference Manual PLT-TR2010-1, PLT Scheme Inc., June 2010. http://racket-lang.org/techreports/.
M. Furr, J.-h. D. An, J. S. Foster, and M. Hicks. Static type inference for Ruby. In SAC '09: Proc. 2009 ACM Symposium on Applied Computing, pages 1859–1866. ACM Press, 2009.
H. Ganzinger, G. Hagen, R. Nieuwenhuis, A. Oliveras, and C. Tinelli. DPLL(T): Fast Decision Procedures. In 16th International Conference on Computer Aided Verification, CAV'04, volume 3114 of Lecture Notes in Computer Science, pages 175–188. Springer-Verlag, 2004.
F. Henglein. Dynamic typing: Syntax and proof theory. Sci. Comput. Programming, 22(3):197–230, 1994.
F. Henglein and J. Rehof. Safe polymorphic type inference for a dynamically typed language: translating Scheme to ML. In Proc. Seventh International Conference on Functional Programming Languages and Computer Architecture, pages 192–203. ACM Press, 1995.
W. A. Howard. The formulas-as-types notion of construction. In J. P. Seldin and J. Hindley, editors, To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus, and Formalism, pages 479–490. Academic Press, 1980.
R. Komondoor, G. Ramalingam, S. Chandra, and J. Field. Dependent types for program understanding. In Tools and Algorithms for the Construction and Analysis of Systems, volume 3440 of Lecture Notes in Computer Science, pages 157–173. Springer-Verlag, 2005.
J. M. Lucassen and D. K. Gifford. Polymorphic effect systems. In Proc. 15th Symposium on Principles of Programming Languages, pages 47–57. ACM Press, 1988.
C. McBride and J. McKinna. The view from the left. Journal of Functional Programming, 14(1):69–111, 2004.
U. Norell. Towards a practical programming language based on dependent type theory. PhD thesis, Chalmers University of Technology, 2007.
S. Peyton Jones, D. Vytiniotis, S. Weirich, and G. Washburn. Simple unification-based type inference for GADTs. In Proc. Eleventh International Conference on Functional Programming, pages 50–61. ACM Press, 2006.
B. C. Pierce and D. N. Turner. Local type inference. ACM Trans. Progr. Lang. Sys., 22(1):1–44, 2000.
J. C. Reynolds. Automatic computation of data set definitions. In IFIP Congress (1), pages 456–461, 1968.
O. Shivers. Control-Flow Analysis of Higher-Order Languages or Taming Lambda. PhD thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, 1991.
J. G. Siek and W. Taha. Gradual typing for functional languages. In Seventh Workshop on Scheme and Functional Programming, University of Chicago Technical Report TR-2006-06, pages 81–92, September 2006.
V. Simonet and F. Pottier. A constraint-based approach to guarded algebraic data types. ACM Trans. Progr. Lang. Sys., 29(1):1–54, 2007.
S. Thatte. Quasi-static typing. In Proc. 17th Symposium on Principles of Programming Languages, pages 367–381. ACM Press, 1990.
S. Tobin-Hochstadt and M. Felleisen. The design and implementation of Typed Scheme. In Proc. 35th Symposium on Principles of Programming Languages, pages 395–406. ACM Press, 2008.
D. Vytiniotis, S. Peyton Jones, and T. Schrijvers. Let should not be generalized. In TLDI '10: Proc. 5th Workshop on Types in Language Design and Implementation, pages 39–50. ACM Press, 2010.
P. Wadler and R. B. Findler. Well-typed programs can't be blamed. In ESOP '09: Proc. Eighteenth European Symposium on Programming, volume 5502 of Lecture Notes in Computer Science, pages 1–16. Springer-Verlag, 2009.
A. K. Wright and R. Cartwright. A practical soft type system for Scheme. ACM Trans. Progr. Lang. Sys., 19(1):87–152, 1997.
T. Wrigstad, F. Z. Nardelli, S. Lebresne, J. Östlund, and J. Vitek. Integrating typed and untyped code in a scripting language. In Proc. 37th Symposium on Principles of Programming Languages, pages 377–388. ACM Press, 2010.
TeachScheme!—A Checkpoint Matthias Felleisen PLT Northeastern University, Boston, Massachusetts [email protected]
Abstract
In 1995, my team and I decided to create an outreach project that would use our research on functional programming to change the K-12 computer science curriculum. We had two different goals in mind. On the one hand, our curriculum should rely on mathematics to teach programming, and it should exploit programming to teach mathematics. All students—not just those who major in computer science—should benefit. On the other hand, our course should demonstrate that introductory programming can focus on program design, not just a specific syntax. We also wished to create a smooth path from a design-oriented introductory course all the way to courses on large software projects. My talk presents a checkpoint of our project, starting with our major scientific goal, a comprehensive theory of program design. Our work on this theory progresses through the development of program design courses for all age groups. At this point, we offer curricular materials for middle schools, high schools, three college-level freshman courses, and a junior-level course on constructing large components. We regularly use these materials to train K-12 teachers, after-school volunteers, and college faculty; thus far, we have reached hundreds of instructors, who in turn have dealt with thousands of students in their classrooms.
Categories and Subject Descriptors D.1.0 [Programming Techniques]: General; K.3.0 [Computers and Education]: General
General Terms Design, Human Factors
Keywords Curriculum Design, Program Design
Designing programs for the benefit of all
The TeachScheme! project employs functional programming as a vehicle to deliver two intertwined messages about the introductory programming curriculum. First, if the community wishes to enroll all students in a first course on programming and computing, the curriculum must benefit everyone, not just those who continue to program or those who become computer science majors. Second, even the first course on programming should demonstrate that good programming involves a systematic approach, which we call design. Simply put, a design-based programming curriculum can benefit everyone. Our starting point is the insight that programming can easily benefit all students if it aligns itself with K-12 mathematics instruction [2]. Hence functional programming is the most natural fit. Unlike books, functional programming brings mathematics to life for kids—directly and without much ado. In this context, an animation is a mathematical function (from time to scenes); an interactive, graphical program is a mathematical expression; and a family of web pages is the result of some more mathematics. With functional programming, mathematics becomes fun; it is no longer a dry, paper-and-pencil exercise. Best of all, the basic rules of algebraic expression evaluation explain the computational model of functional programming, justifying the idea that it teaches the principles of computing and programming. At this point, our curriculum works with algebraic, geometric, trigonometric, and precalculus knowledge; implicitly, it also touches on mathematical integration in several different ways. Good programming also means planning, organizing, and sticking to a discipline. As such, programming can benefit students by teaching them how to solve problems systematically. We realized from the beginning, however, that the connection between conventional programming courses and systematic problem solving was tenuous at best. More commonly, students and teachers would approach programming with the goal of satisfying the parser and getting decent output for a few program runs. Even books on functional programming didn't offer more. If we wanted to use functional programming to teach systematic problem solving, we had to create a curriculum that was explicitly design-oriented. Fifteen years ago, it didn't exist [5]. How to Design Programs (HtDP), our text for high schools and colleges [3], is the principal result of our effort. It uses the ideas of the functional community to teach programming as a systematic activity, i.e., as a design activity. A functional program deals with values; there is no imperative to parse text from some input medium or to write text to some output medium to see how the program works. Values come in a wide variety of flavors: atomic values; compound values; unions; hierarchically nested values; arbitrarily large values; higher-order values; and so on. In sum, functional program design is easy to explain as a two-dimensional grid: one axis describes the process, and the other axis describes the varieties of data. The content of the grid is program design, and this grid can be turned into courses for various age groups.
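To make the claim that "an animation is a mathematical function from time to scenes" concrete, here is a small sketch of ours using the 2htdp teaching libraries that accompany HtDP; the function scene-at and its constants are our own illustrative choices:

(require 2htdp/image 2htdp/universe)

(define WIDTH 200)
(define HEIGHT 200)

;; scene-at : Natural -> Image
;; the scene t clock ticks after the animation starts
(define (scene-at t)
  (place-image (circle 10 "solid" "red")
               (modulo t WIDTH) (/ HEIGHT 2)
               (empty-scene WIDTH HEIGHT)))

;; (animate scene-at) displays the animation, one scene per tick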
From middle school to college For the first ten years, we focused on outreach to high schools and on the college-level freshman course [4]. We trained teachers on how to use the first two parts of HtDP; the complete book (and more) was used to teach the freshman course at Rice University. By connecting the two levels explicitly, we hoped that this continuity would guarantee a smooth path into a full-fledged CS curriculum. Over time, however, it became clear that the project needed to expand in two directions: upstream and downstream. On the upstream side, students encounter computers in middle school (approx. grades 4 through 8), not just high school (approx. grades 9 through 12).
They listen to their high school friends and siblings when they discuss programming in Python or Java. Conversely, middle school introduces the few algebraic concepts that are needed to understand functional programming. Introducing simple functional programming at this level can help teach essential mathematical ideas, such as function and variable, while simultaneously preparing the ground for full-fledged programming. It may also preempt students from adopting certain prejudices about programming that we see in so many high school students. With Emmanuel Schanzer (Harvard U.), we launched the Bootstrap project, an after-school program staffed by volunteer teachers. The program currently works with students in some ten underserved neighborhoods across the US. Bootstrap provides the strongest evidence yet that teaching functional programming directly affects the mathematics skills and interests of K-12 students. On the downstream side, students must see how the design principles in HtDP apply to class-based, object-oriented languages such as Java. These languages are what students need for their first co-op or internship. At the same time, these languages do not come with algebraic data types; such forms of data must be encoded. These languages do not use pattern matching to evaluate function calls; they rely on method dispatch instead. Last but not least, object-oriented languages support different means for abstracting over repeated patterns; indeed, many abstraction mechanisms are protocols that simulate features built into functional languages. To establish a bridge between the HtDP course and the mainstream languages that students encounter at work and downstream in conventional curricula, we created a course dubbed How to Design Classes. The purpose of the course is to demonstrate how the design principles from the functional world seamlessly apply to the object-oriented world. Next the course moves on to object-oriented means for abstracting code while retaining the design principles for abstraction in functional languages. Finally, the course introduces imperative-style programming with for and while loops but also explains why doing so violates object-oriented design. Beyond object-oriented programming, the typical computing curriculum offers two more chances to re-emphasize the messages of design and functional programming: a course on logic and a course on large-scale program development, often called software construction. At Northeastern, we have recently revamped the second-semester logic course [1]. In the past this course employed a rather conventional syllabus, focusing on logic as an exercise in studying formal systems and their meta-theorems. Now the course continues where the HtDP course leaves off—though with ACL2 as the programming language. Students are introduced to ACL2 as an alternative syntax for the teaching languages of the first-semester course. Then they learn to state and prove theorems about their programs. While they start with small functions and theorems, their final project is typically an interactive, graphical game. Students quickly learn that ACL2's theorem prover easily verifies theorems about functions designed according to HtDP and chokes on "spaghetti" functions.1 At Northeastern and Northwestern, we have developed a course on How to Design Systems. The goal of the course is to remind junior-level students one more time of the design principles of HtDP and to demonstrate how these ideas apply at a large scale.
While students choose their favorite programming language to implement a reasonably large system, we demonstrate how the design process of HtDP applies at that scale and to all languages. The key to the course is that the project specification changes on a weekly basis, growing from a short, one-paragraph statement into the description of a distributed system; we also rotate students from
one code base to another mid-semester. The students' public design presentations routinely illustrate why a systematic design process is critically important when the project is large and when its specifications continuously change. Students often confirm the importance of this course with notes from their first positions in industry.
Side effects The "!" in TeachScheme! is a pun. One interpretation suggests that the goal of the project is to teach Scheme. It isn't, because the alternative explanation says that "!" is postfix notation for "not." While we never had the intention of teaching plain Scheme, lab observations during our first year drove home the important point that no off-the-shelf programming language is suitable for novices. This insight forced us to develop an entire series of teaching programming languages as well as DrScheme, a pedagogical IDE [6] that supports these teaching languages. The decision to develop our own support software ensured our continued presence in the research community. To support the construction of a series of teaching languages, we developed a programming language for creating full-fledged, ready-to-use programming languages. Over the years, Racket [7], formerly known as PLT Scheme, has served as our platform to explore novel linguistic constructs and to contribute ideas to the functional programming community. In short, TeachScheme! created a virtuous cycle—our outreach projects inspire mostly functional research projects, and the results of the research assist our outreach projects.
Acknowledgments Cormac Flanagan asked the right question at the right time; it got us started. Matthew Flatt, Shriram Krishnamurthi, and Bruce Duba immediately agreed to drop everything we were doing and to help launch the TeachScheme! project; without them, it would all have been a short daydream. Robby Findler knew what he was getting into when he joined a year later, and he came to build DrScheme anyway. Kathi Fisler had the courage to take over my workshops; her contributions have been critical to the survival of the TeachScheme! and Bootstrap workshops. To all the other members of PLT, thank you very much for your labor of love. Over the past 15 years, TeachScheme! and Bootstrap have received generous support from the Department of Education, the National Science Foundation, Cord, Exxon, Google, Jane Street, and Microsoft.
References
[1] C. Eastlund, D. Vaillancourt, and M. Felleisen. ACL2 for freshmen—first experiences. In Proc. 7th ACL2 Workshop, pages 200–211, 2007.
[2] M. Felleisen and S. Krishnamurthi. Why computer science doesn't matter. Commun. ACM, 52(7):37–40, 2009.
[3] M. Felleisen, R. B. Findler, M. Flatt, and S. Krishnamurthi. How to Design Programs. MIT Press, 2001.
[4] M. Felleisen, R. B. Findler, M. Flatt, and S. Krishnamurthi. The TeachScheme! project: Computing and programming for every student. Computer Science Education, 14:55–77, 2004.
[5] M. Felleisen, R. B. Findler, M. Flatt, and S. Krishnamurthi. The structure and interpretation of the computer science curriculum. Journal of Functional Programming, 14(4):365–378, 2004.
[6] R. B. Findler, J. Clements, C. Flanagan, M. Flatt, S. Krishnamurthi, P. Steckler, and M. Felleisen. DrScheme: A programming environment for Scheme. Journal of Functional Programming, 12(2):159–182, Mar. 2002.
[7] M. Flatt and PLT. Reference: Racket. Technical report, PLT Inc., June 2010. http://racket-lang.org/tr1/.
[8] R. L. Page. Software is discrete mathematics.
In International Conference on Functional Programming, pages 79–86, 2003. [9] R. L. Page, C. Eastlund, and M. Felleisen. Functional programming and theorem proving for undergraduates. In Functional and Declarative Programming in Education, pages 21–30, 2008.
1 Also see Rex Page's Besseme project on functional programming in discrete mathematics [8] and on theorem provers in software engineering [9].
Higher-order Representation of Substructural Logics Karl Crary Carnegie Mellon University
Abstract
We present a technique for higher-order representation of substructural logics such as linear or modal logic. We show that such logics can be encoded in the (ordinary) Logical Framework, without any linear or modal extensions. Using this encoding, metatheoretic proofs about such logics can easily be developed in the Twelf proof assistant.
Categories and Subject Descriptors I.2.3 [Deduction and Theorem Proving]: Deduction
General Terms Languages.
Keywords Logical frameworks, linear logic, modal logic, mechanized metatheory.
1. Introduction
The Logical Framework (or LF) [8] provides a powerful and flexible framework for encoding deductive systems such as programming languages and logics. LF employs an elegant account of binding structure by identifying object-language variables with LF variables, object-language contexts with (fragments of) the LF context, and object-language binding occurrences with LF lambda abstraction. This account of binding, often called higher-order abstract syntax [15], automatically handles most operations that pertain to binding, including alpha-equivalence, substitution, and variable-freshness conventions [4]. Since the object-language context is maintained implicitly, as part of the built-in LF context, the structural properties of LF contexts (such as weakening and contraction) automatically apply to the object language as well. Ordinarily this is desirable, but it poses a problem for encoding substructural logics that do not possess those properties.1 For example, linear logics [7] (by design) satisfy neither weakening nor contraction, so it would seem that they cannot be encoded in LF.
One solution to this problem is to extend LF with linear features. Linear LF [5] extends LF with linear assumptions and connectives. This provides the ability to encode linear logics. However, linearity has yet to be implemented in Twelf [16], the proof assistant that implements LF, in part due to unresolved complications that linearity creates in its metalogical apparatus. Consequently, Linear LF is not currently an option for those engaged in formalizing metatheory. Moreover, Linear LF does not give us any assistance with other substructural logics, such as affine, strict, or modal logic.
Another option is to break with standard LF practice and model object-language contexts explicitly [6]. Explicit contexts can be reconciled with higher-order abstract syntax, thereby retaining many of the benefits of LF. Once contexts are explicit, it is easy to state inference rules that handle the context in an appropriate way for a substructural logic. However, the explicit context method is clumsy to work with and sacrifices some of the advantages of LF. For example, although substitution is still free (since the syntax of terms is unchanged), the substitution lemma is not. The explicit context method is typically used internally within a proof, rather than in the "official" formalization of a logic.
In this paper we advocate a more general and workable approach in which we look at substructural logic from a slightly different perspective. Rather than viewing a substructural logic from the perspective of its contexts (that is, collections of assumptions), we suggest it is profitable to look at it from the perspective of its individual assumptions. The essence of linear logic is not that type-checking splits the context when it checks a (multiplicative) term with multiple subterms. The essence of linear logic is that an assumption is used exactly once. The latter property can be stated on an assumption-by-assumption basis, without reference to contexts. Thus, wherever an assumption is introduced, as part of the typing rule that introduced it, we can check that that assumption is used linearly.
Pfenning [13] proposed enforcing linearity using a meta-judgement that traced the use of an assumption throughout a typing derivation. Avron et al. [3] later used a similar approach for modal logic. Unfortunately, the meta-judgement approach is very awkward to use in practice. Also, although both were able to prove adequacy for their encodings, neither (so far as we are aware) proved any further results using their encodings.
Fortunately, we need not use a meta-judgement. We observe that the proof terms alone are enough to track the use of restricted assumptions. There is no need to examine typing derivations, and therefore no need for a meta-judgement. The idea of linearity as a judgement over proof terms dates to the early days of LF. Avron et al. [1, 2] suggested that linearity can be expressed by imposing a lattice structure on proof terms and defining linear proof terms as those that are strict and distributive, when viewed as a function of their linear variables. In this paper, we suggest a simpler formulation of linearity, based on tracking variables through the proof terms of linear logic. This allows for a clean, practical definition of linearity. We express linear logic using two judgements, the usual typing judgement:
of : term -> tp -> type.
and a linearity judgement:
linear : (term -> term) -> type.
The judgement linear(λx.Mx) should be read as "the variable x is used linearly (i.e., is used exactly once) in Mx." In this paper, we illustrate the use of a substructural judgement (such as linear) in three settings: linear logic, dependently typed linear logic, and judgemental modal logic [14]. Many other substructural logics, including affine logic and strict logic, can be handled analogously. Some others, such as ordered logic [17, 18], cannot, because the rules of the logic make it impossible to handle assumptions independently. We briefly discuss the latter in Section 5. The full Twelf development can be found on-line at:2
www.cs.cmu.edu/~crary/papers/2009/substruct.tar
In our discussion, we assume familiarity with the Logical Framework, and with linear and modal logic. Some familiarity with Twelf may also be helpful. The sections on adequacy are technical, but the remainder of the paper should be accessible to the casual practitioner. Throughout the paper, we will consider alpha-equivalent expressions to be identical. We will do so in both the object language and the meta-language.
1 Substructural logics may be defined in various different ways. For our purposes, we define substructural logic to mean any logic in which it is not the case that every bound variable can be freely used, or not, throughout its scope.
2. Linear Logic
We begin by representing the syntax of linear logic in the usual fashion. The LF encoding, with the standard on-paper notation written alongside it for reference, is shown in Figure 1. The type atom ranges over a fixed set of atomic propositions. On paper, we represent linear logic with the typing judgment Γ; ∆ ` M : A. In this, the first context, Γ, represents the unrestricted context (i.e., truth), and the second context, ∆, represents the linear context (i.e., resources). To simplify the notation, we adopt the convention that the linear context is unordered. Thus (∆, ∆′) refers to a context that can be split into two pieces ∆ and ∆′ that may possibly be interleaved. We also adopt the convention that all the variables appearing in either context must be distinct. The encoding of the static semantics, as discussed previously, is given by two judgements:
of : term -> tp -> type.
linear : (term -> term) -> type.

tp : type.                                   A ::=
atomic : atom -> tp.                             a
lolli  : tp -> tp -> tp.                       | A ( A
tensor : tp -> tp -> tp.                       | A ⊗ A
with   : tp -> tp -> tp.                       | A & A
plus   : tp -> tp -> tp.                       | A + A
one    : tp.                                   | 1
zero   : tp.                                   | 0
top    : tp.                                   | ⊤
!      : tp -> tp.                             | !A

term : type.                                 M ::=
                                                 x
llam  : (term -> term) -> term.                | λx.M
lapp  : term -> term -> term.                  | M M
tpair : term -> term -> term.                  | M ⊗ M
lett  : term -> (term -> term -> term) -> term.  | let x ⊗ x = M in M
pair  : term -> term -> term.                  | ⟨M, M⟩
pi1   : term -> term.                          | π1 M
pi2   : term -> term.                          | π2 M
in1   : term -> term.                          | in1 M
in2   : term -> term.                          | in2 M
case  : term -> (term -> term) -> (term -> term) -> term.  | case(M, x.M, x.M)
star  : term.                                  | ∗
leto  : term -> term -> term.                  | let ∗ = M in M
any   : term -> term.                          | any M
unit  : term.                                  | ⟨⟩
bang  : term -> term.                          | !M
letb  : term -> (term -> term) -> term.        | let !x = M in M

Figure 1. Linear logic syntax
We read "of M A" as "M is of type A," and we read "linear ([x:term] M x)" as "x is used linearly in (M x)." Note that [x:term] is Twelf's concrete syntax for LF lambda abstraction3 (λx:term.). Twelf can usually infer the domain type, leaving just [x]. We proceed rule-by-rule to show the encoding of the static semantics.
Variables The rule for linear variables states that a linear variable may be used provided there are no other linear variables in scope:
Γ; x:A ` x : A
There is no typing rule for variables in the encoding; that is handled automatically by higher-order representations. However, there is a linearity rule that states that x is linear in x:
linear/var : linear ([x] x).
The rule for unrestricted variables states that an unrestricted variable may be used provided there are no linear variables in scope:
Γ(x) = A
Γ; ` x : A
As with linear variables, there is no typing rule for unrestricted variables in the encoding. There is also no linearity rule for unrestricted variables.
Linear implication The introduction rule for linear implication is:
Γ; (∆, x:A) ` M : B
Γ; ∆ ` λx.M : A ( B
This is encoded using two rules:
of/llam : of (llam ([x] M x)) (lolli A B)
           <- ({x:term} of x A -> of (M x) B)
           <- linear ([x] M x).
linear/llam : linear ([y] llam ([x] M y x))
               <- ({x:term} linear ([y] M y x)).
2 The development checks under the latest Twelf build, available at twelf.plparty.org/wiki/Download. Some earlier versions contain a bug that prevents the development from checking.
3 Keep in mind the distinction between lambda abstraction in LF, which represents binding, and lambda abstraction in the object language (llam).
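To see the two judgements in action, consider the following pair of queries. They are ours, not part of the paper's development; %query checks that Twelf's logic-programming search finds exactly the stated number of solutions. The linear identity function is well typed, while a variable duplicated by tpair is not linear:

%% illustrative queries (ours); A is a free atom variable
%query 1 * of (llam ([x] x)) (lolli (atomic A) (atomic A)).
%query 0 * linear ([x] tpair x x).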
Note that {x:term} is Twelf’s concrete syntax for the dependent function space (Πx:term.). Again, Twelf can usually infer the domain type, leaving just {x}. The typing rule has the usual typing premise, plus a second premise that requires that the argument be used linearly in the body. The linearity rule says that a variable y is linear in a function (llam ([x] M y x)) if it is linear in its body (M y x) for any choice of x. The elimination rule splits the linear context between the function and argument:
of/lett : of (lett M ([x] [y] N x y)) C
           <- of M (tensor A B)
           <- ({x} of x A -> {y} of y B -> of (N x y) C)
           <- ({y} linear ([x] N x y))
           <- ({x} linear ([y] N x y)).
linear/lett1 : linear ([z] lett (M z) ([x] [y] N x y)) <- linear ([z] M z).
Γ; ∆ ` M : A ( B    Γ; ∆′ ` N : A
Γ; (∆, ∆′) ` M N : B
linear/lett2 : linear ([z] lett M ([x] [y] N z x y)) <- ({x} {y} linear ([z] N z x y)).
This is encoded using three rules:
of/lapp : of (lapp M N) B <- of M (lolli A B) <- of N A.
Additive conjunction The introduction rule for "with" does not split the context:
linear/lapp1 : linear ([x] lapp (M x) N) <- linear ([x] M x).
In the encoding, there is one linearity rule, requiring that linear variables be linear in both constituents of the pair:
Γ; ∆ ` M : A    Γ; ∆ ` N : B
Γ; ∆ ` ⟨M, N⟩ : A & B
of/pair : of (pair M N) (with A B) <- of M A <- of N B.
linear/lapp2 : linear ([x] lapp M (N x)) <- linear ([x] N x). The typing rule is standard. There are two linearity rules, one for each way a linear variable might be used. The first linearity rule says that x is linear in (lapp (M x) N) if it is linear in (M x) and does not appear in N. (Since implicitly bound meta-variables such as M and N are quantified on the outside, stating N without a dependency on x means that x cannot appear free in N.) The second linearity rule provides the symmetric case. Multiplicative conjunction
linear/pair : linear ([x] pair (M x) (N x))
               <- linear ([x] M x)
               <- linear ([x] N x).
The elimination rules are straightforward:
Γ; ∆ ` M : A & B
Γ; ∆ ` π1 M : A
Γ; ∆ ` M : A & B
Γ; ∆ ` π2 M : B
The introduction rule for tensor is:
Γ; ∆ ` M : A    Γ; ∆′ ` N : B
Γ; (∆, ∆′) ` M ⊗ N : A ⊗ B
of/pi1 : of (pi1 M) A <- of M (with A B).
This is encoded using three rules, in a similar fashion to function application:
of/pi2 : of (pi2 M) B <- of M (with A B).
of/tpair : of (tpair M N) (tensor A B) <- of M A <- of N B.
linear/pi1 : linear ([x] pi1 (M x)) <- linear ([x] M x). linear/pi2 : linear ([x] pi2 (M x)) <- linear ([x] M x).
linear/tpair1 : linear ([x] tpair (M x) N) <- linear ([x] M x).
linear/tpair2 : linear ([x] tpair M (N x)) <- linear ([x] N x).
The elimination rule is:
Γ; ∆ ` M : A ⊗ B    Γ; (∆′, x:A, y:B) ` N : C
Γ; (∆, ∆′) ` let x ⊗ y = M in N : C
Disjunction The introduction rules for plus are straightforward:
Γ; ∆ ` M : A
Γ; ∆ ` in1 M : A + B
Γ; ∆ ` M : B
Γ; ∆ ` in2 M : A + B
of/in1 : of (in1 M) (plus A B) <- of M A.
of/in2 : of (in2 M) (plus A B) <- of M B.
linear/in1 : linear ([x] in1 (M x)) <- linear ([x] M x).
In the encoding, the typing rule requires that x and y are linear in N. As in previous cases where the linear context is split, there are two linearity rules depending on whether a linear variable is used in the let-bound term or the body:
linear/in2 : linear ([x] in2 (M x)) <- linear ([x] M x).
133
The elimination rule splits the context into two pieces, one for the discriminant and one used by both arms:
of/star : of star one.
Γ; ∆ ` M : A + B    Γ; (∆′, x:A) ` N1 : C    Γ; (∆′, x:B) ` N2 : C
Γ; (∆, ∆′) ` case(M, x.N1, x.N2) : C

of/leto : of (leto M N) C <- of M one <- of N C.
linear/leto1 : linear ([x] leto (M x) N) <- linear ([x] M x).
In the encoding, the typing rule requires that each arm’s bound variable be used linearly. The linearity rules provide the two cases, one when the variable is used linearly in the discriminant, and one in which is it used linearly in both arms:
linear/leto2 : linear ([x] leto M (N x)) <- linear ([x] N x).
of/case : of (case M ([x] N1 x) ([x] N2 x)) C <- of M (plus A B) <- ({x} of x A -> of (N1 x) C) <- ({x} of x B -> of (N2 x) C) <- linear ([x] N1 x) <- linear ([x] N2 x).
The unit for "with", ⊤, is more interesting. It stands for an unknown collection of resources, and consequently has an introduction form but no elimination form:
Γ; ∆ ` ⟨⟩ : ⊤
The encoding provides that any variable is linear in unit: of/unit
linear/case1 : linear ([y] case (M y) ([x] N1 x) ([x] N2 x)) <- linear ([y] M y).
linear/unit : linear ([x] unit).
The unit for plus, 0, represents falsehood. Accordingly, it has an elimination form but no introduction form. The elimination form behaves a little bit like ⟨⟩; any resources not used to prove 0 may be discarded:
Γ; ∆ ` M : 0
Γ; (∆, ∆′) ` any M : C
linear/case2 : linear ([y] case M ([x] N1 y x) ([x] N2 y x)) <- ({x} linear ([y] N1 y x)) <- ({x} linear ([y] N2 y x)). Exponentiation The introduction rule for exponentiation requires that the linear context be empty:
In the encoding there are two linearity rules. A variable is linear in (any M ) if it is linear in M or if it does not appear in M at all:
Γ; ` M : A
Γ; ` !M : !A
of/any : of (any M) T <- of M zero.
In the encoding, this means there is no linearity rule, since variables cannot be linear in exponents:
linear/any1 : linear ([x] any (M x)) <- linear M.
of/bang : of (bang M) (! A) <- of M A.
linear/any2 : linear ([x] any M).
The elimination rule splits the context and adds the newly bound variable to the unrestricted context:
Γ; ∆ ` M : !A    (Γ, x:A); ∆′ ` N : C
Note that it is tempting but incorrect to simplify this to the single rule:
Γ; (∆, ∆′) ` let !x = M in N : C
linear/any-wrong : linear ([x] any (M x)).
In the encoding, the unrestricted nature of x is handled by not checking that x is linear in (N x). The linearity rules work in the usual fashion:
That rule would allow x to be used multiple times in (M x), which is not permitted. It would be tantamount to moving the entire linear context into the unrestricted context, rather than merely discarding any unused resources.
of/letb : of (letb M ([x] N x)) B <- of M (! A) <- ({x} of x A -> of (N x) B).
2.1 Adequacy
It seems intuitively clear that the preceding is a faithful representation of linear logic. We wish to go further and make the correspondence rigorous, following the adequacy argument of Harper et al. [8]. Adequacy establishes a isomorphism between the object language (linear logic in this case) and its encoding in LF. As usual, an isomorphism is a bijection that respects the relevant operations. For syntax, the only primitively meaningful operation is substitution. (Other operations are given by defined semantics.) Thus, an isomorphism for syntax is a bijective translation that respects substitution. Our translation for syntax (written p−q) is standard, so we will omit the obvious details of its definition and simply state its adequacy theorem for reference:
linear/letb1 : linear ([y] letb (M y) N) <- linear M. linear/letb2 : linear ([y] letb M ([x] N y x)) <- ({x} linear ([y] N y x)). Units
: of unit top.
The unit for tensor is 1: Γ; ∆ ` M : 1 Γ; ∆0 ` N : C Γ; ` ∗ : 1 Γ; (∆, ∆0 ) ` let ∗ = M in N : C
D EFINITION 2.1. Translation of variable sets is defined:
The encoding is straightforward, with no linearity rule for introduction since variables cannot be linear in ∗:
p{x1 , . . . , xn }q = x1 :term, . . . , xn :term
THEOREM 2.6 (Semantic adequacy). There exists a bijection p−q between derivations of the judgement Γ; ∆ ` M : A and encoding structures for Γ; ∆ ` M : A.
THEOREM 2.2 (Syntactic adequacy). 1. Let Type be the set of linear logic types. Then there exists a bijection p−q between Type and LF canonical forms P such that `LF P : tp. (Variables cannot appear within types, so there is no substitution to respect.) 2. Let S be a set of variables and let Term S be the set of linear logic terms whose free variables are contained in S. Then there exists a bijection p−q between Term S and LF canonical forms P such that pSq `LF P : term. Moreover, p−q respects substitution: p[M/x]N q = [pM q/x]pN q.
Proving adequacy is typically straightforward but tedious once it is stated correctly. The same is true here, but the tedium is a bit more pronounced because of the need to manipulate encoding structures, rather than just canonical forms. We give a few cases by way of example: Proof Sketch First, by induction on derivations, we construct the translation and show it is type correct.
For semantic adequacy, we wish to establish a bijective translation between typing derivations and LF canonical forms.4 The usual statement of adequacy for typing is something to the effect of:
•
Suppose ∇ is the derivation: Γ; x:A ` x : A
DEFINITION 2.3. Translation of contexts is defined:
Then p∇q = (dx, {x 7→ linear/var}).
px1 :A1 , . . . , xn :An q = x1 :term, dx1 :of x1 pA1 q, . . . , xn :term, dxn :of xn pAn q
•
Suppose ∇ is the derivation:
NON-THEOREM 2.4. There exists a bijection between derivations of the judgement Γ ` M : A and LF canonical forms P such that pΓq `LF P : of pM q pAq.
Γ(x) = A Γ; ` x : A def
Then p∇q = (dx, ∅).
Unfortunately, this simple statement of adequacy does not work in the presence of linearity. Consider the judgement ; x:a ` ⟨⟩ ⊗ ⟨⟩ : ⊤ ⊗ ⊤. It has two derivations, depending on which conjunct is chosen to consume the assumption:
; x:a ` ⟨⟩ : ⊤    ; ` ⟨⟩ : ⊤
; x:a ` ⟨⟩ ⊗ ⟨⟩ : ⊤ ⊗ ⊤
•
Suppose ∇ is the derivation: ∇. 1 .. . Γ; (∆, x:A) ` M : B Γ; ∆ ` λx.M : A ( B
; ` ⟨⟩ : ⊤    ; x:a ` ⟨⟩ : ⊤
; x:a ` ⟨⟩ ⊗ ⟨⟩ : ⊤ ⊗ ⊤
Let p∇1 q = (P1 , H1 ). By induction, (P1 , H1 ) is an encoding structure for Γ; (∆, x:A) ` M : B, so:
However, the LF type corresponding to that judgement, {x:term} of x (atomic a) -> of (tpair unit unit) (tensor top top)
pΓ, ∆q, x:term, dx:of x pAq `LF P1 : of pM q pBq and
contains only one canonical form, namely:
pDomain(Γ, ∆)q `LF H1 (x) : linear ([x] pM q)
[x:term] [dx:of x (atomic a)] of/tpair of/unit of/unit
Therefore: pΓ; ∆q `LF of/llam (H1 (x)) ([x] [dx] P1 ) : of (llam ([x] pM q)) (lolli pAq pBq)
So linear-logic typing derivations are not in bijection with the LF encoding of typing in general. Our isomorphism must take linearity into account, and not only where linearity is a premise of a typing rule. Consequently, we establish a correspondence between each linear-logic typing derivation on the one hand, and an LF proof of typing paired with a collection of LF proofs of linearity on the other. Alas, this is notationally awkward when compared with the usual adequacy theorem.
So let p∇q = (of/llam (H1 (x)) ([x] [dx] P1 ), H) def
where for each y in Domain(∆), H(y) = linear/llam ([x] H1 (y)).
DEFINITION 2.5. An encoding structure for Γ; ∆ ` M : A is a pair (P, H) of an LF canonical form P and a finite mapping H from variables to LF canonical forms, such that:
•
• pΓ, ∆q `LF P : of pM q pAq, and • Domain(H) = Domain(∆), and • For each variable y in Domain(∆),
Suppose ∇ is the derivation: ∇. 1 ∇. 2 .. .. . . Γ; ∆1 ` M : A ( B Γ; ∆2 ` N : A Γ; (∆1 , ∆2 ) ` M N : B
pSy q `LF H(y) : linear ([y:term] pM q), where Sy = Domain(Γ, ∆) \ {y}.
Let p∇1 q = (P1 , H1 ) and let p∇2 q = (P2 , H2 ). By induction (P1 , H1 ) is an encoding structure for Γ; ∆1 ` M : A ( B and (P2 , H2 ) is an encoding structure for Γ; ∆2 ` N : A. Let y ∈ Domain(∆1 , ∆2 ) be arbitrary. Let S = Domain(Γ) and Si = Domain(∆i ). Then either y ∈ S1 and y 6∈ S2 or vice versa. Suppose the former. Then:
4 That is, we view typing derivations as having no operations to respect. Harper et al. suggest that substitution of derivations for assumptions is a meaningful operation on typing derivations, and prove that their translation respects such substitutions. This could be done in our setting as well. However, we take the view that when substituting derivations for assumptions, we care only that the resulting derivation exists (this being the standard substitution lemma), and not about the identity of that resulting derivation.
2.2 Metatheory
To demonstrate the practicality of our encoding, we proved the subject reduction theorem in Twelf. We give the definition of reduction in Figure 2. Reduction is encoded with the judgement:
pS ∪ S1 \ {y}q `LF H1 (y) : linear ([y] pM q) Also, since y 6∈ Domain(∆2 ), y is not free in N or (consequently) in pN q. Therefore:
reduce : term -> term -> type.
pS ∪ S1 ∪ S2 \ {y}q `LF linear/lapp1 (H1 (y)) : linear ([y] lapp pM q pN q)
We will not discuss the encoding of reduction and its adequacy, as they are standard. We prove subject reduction by a series of four metatheorems. To make the development more accessible to readers not familiar with Twelf’s logic programming notation for proofs, we give those metatheorems in English.
The other case is symmetric. So let p∇q = (of/lapp P2 P1 , H), where for each y in Domain(∆1 , ∆2 ), linear/lapp1 (H1 (y)) (if y ∈ S1 ) def H(y) = linear/lapp2 (H2 (y)) (if y ∈ S2 ) •
LEMMA 2.8 (Composition of linearity). Suppose the ambient context is made up of bindings of the form x:term (and other bindings not subordinate6 to linear). If linear ([x] M1 x) and linear ([x] M2 x) are derivable, then linear ([x] M1 (M2 x)) is derivable.
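As a sketch of how such a metatheorem looks in Twelf (ours; the paper deliberately states its metatheorems only in English, and the name linear-compose and this exact formulation are hypothetical), the lemma can be declared as a relation between derivations, with a mode declaration marking inputs and output:

%% hypothetical Twelf statement of Lemma 2.8 (our sketch)
linear-compose : linear ([x] M1 x)
                  -> linear ([x] M2 x)
                  -> linear ([x] M1 (M2 x))
                  -> type.
%mode linear-compose +D1 +D2 -D3.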
Et cetera. It remains to show that p−q is a bijection. To do so, we exhibit an inverse x−y. The interesting cases are those that split the context. We give the application case as an example. Suppose (of/lapp P02 P01 , H 0 ) is an encoding structure for Γ; ∆ ` O : B 0 . Then O has the form M 0 N 0 , and pΓ; ∆q `LF P01 : of pM 0 q pA0 ( B 0 q, and pΓ; ∆q `LF P02 : of pN 0 q pA0 q.
The next lemma is usually glossed over in proofs on paper: LEMMA 2.9 (Reduction of closed terms). Suppose the ambient context is made up of bindings of the form x:term (and other bindings not subordinate to reduce). If ({x:term} reduce M1 (M2 x)) is derivable, then there exists M2’:term such that M2 = ([_] M2’).
We must sort ∆ into two pieces. Define: ∆1 ∆2 H10 H20
Metatheory
= {(y:C) ∈ ∆ | ∃R.H 0 (y) = linear/lapp1 R} = {(y:C) ∈ ∆ | ∃R.H 0 (y) = linear/lapp2 R} = {y 7→ R | H 0 (y) = linear/lapp1 R} = {y 7→ R | H 0 (y) = linear/lapp2 R}
L EMMA 2.10 (Subject reduction for linear). Suppose the ambient context is made up of bindings of the form x:term,dx:of x A (and other bindings not subordinate to reduce or of). If ({x} reduce (M x) (M’ x)) and ({x} of x A -> of (M x) B) and linear ([x] M x) are derivable, then linear ([x] M’ x) is derivable.
Note that ∆ = ∆1 , ∆2 . Also note that, by the definition of ∆1 and ∆2 , no variable in ∆1 appears free in N 0 or vice versa. Therefore it is easy to show that no assumption in p∆1 q appears free in P02 and vice versa. Hence5 pΓ; ∆1 q `LF P01 : of pM 0 q pA0 ( B 0 q and pΓ; ∆2 q `LF P02 : of pN 0 q pA0 q. Also, Domain(Hi0 ) = Domain(∆i ). Therefore (P01 , H10 ) is an encoding structure for Γ; ∆1 ` M 0 : A0 ( B 0 and (P02 , H20 ) is an encoding structure for Γ; ∆2 ` N 0 : A0 . Let ∇i = x(P0i , Hi0 )y. Then ∇1 is a derivation of Γ; ∆1 ` M 0 : A0 ( B 0 and ∇2 is a derivation of Γ; ∆2 ` N 0 : A0 . So let x(of/lapp P02 P01 , H 0 )y be the derivation:
Proof Sketch By induction on the first derivation. Cases involving substitution (most of the beta-reduction cases) use Lemma 2.8. Multiple-subterm compatibility cases use Lemma 2.9 to show that reduction of subterms not mentioning a linear variable will not create such a reference. T HEOREM 2.11 (Subject reduction for of). Suppose the ambient context is made up of bindings of the form x:term,dx:of x A (and other bindings not subordinate to reduce or of). If reduce M M’ and of M T are derivable, then of M’ T is derivable.
∇. 1 ∇. 2 .. .. . . Γ; ∆1 ` M 0 : A0 ( B 0 Γ; ∆2 ` N 0 : A0 Γ; (∆1 , ∆2 ) ` M 0 N 0 : B 0
Proof Sketch By induction on the first derivation. Cases with linearity premises (reduce/llam, reduce/lett, and reduce/case) use Lemma 2.10 to show that the linearity premises are preserved by reduction.
We can show, by induction over LF canonical forms, that x−y is fully defined over encoding structures. It is easy to verify that p−q and x−y are inverses. Therefore p−q is bijective.
C OROLLARY 2.12. If Γ; ∆ ` M : A and M −→ M 0 then Γ; ∆ ` M 0 : A.
When the linear context is empty, the H portion of an encoding structure is empty, and we recover the usual notion of adequacy:
Proof Immediate from Subject Reduction and Adequacy.
C OROLLARY 2.7. There exists a bijection between derivations of the judgement Γ; ` M : A and LF canonical forms P such that pΓq ` P : of pM q pAq.
6 “Subordinate”
is a term of art in Twelf. Informally, s is subordinate to t if s can contribute to t. More precisely, a type family s is subordinate to an type family t if there exist types S and T belonging to s and t such that objects of type S can appear within objects of type T [20]. If s is not subordinate to t, then assumptions whose types belong to s can be ignored while considering t.
5 This
fact, that non-appearing variables may be omitted from the context, requires a strengthening lemma for LF that is proved by Harper and Pfenning [9, Theorem 6.6].
136
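Stated in Twelf notation, Theorem 2.11 amounts to the totality of a type family along the following lines. This is only a sketch: the relation name sr and the block declaration are illustrative, and the real development must also account for the other ambient bindings the theorem mentions.

sr : reduce M M' -> of M A -> of M' A -> type.
%mode sr +DR +DO -DO'.
%block srb : some {A:tp} block {x:term} {dx:of x A}.
%worlds (srb) (sr _ _ _).
%total DR (sr DR _ _).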
(λx.M) N −→ [N/x]M
let x ⊗ y = M ⊗ N in O −→ [M, N/x, y]O
let ∗ = ∗ in M −→ M
let !x = !M in N −→ [M/x]N
π1⟨M, N⟩ −→ M
π2⟨M, N⟩ −→ N
case(in1 M, x.N1, x.N2) −→ [M/x]N1
case(in2 M, x.N1, x.N2) −→ [M/x]N2

M −→ M

If M −→ M′ then λx.M −→ λx.M′, in1 M −→ in1 M′, in2 M −→ in2 M′, !M −→ !M′, π1 M −→ π1 M′, π2 M −→ π2 M′, and any M −→ any M′.

If M −→ M′ and N −→ N′ then M N −→ M′ N′, M ⊗ N −→ M′ ⊗ N′, ⟨M, N⟩ −→ ⟨M′, N′⟩, let x ⊗ y = M in N −→ let x ⊗ y = M′ in N′, let ∗ = M in N −→ let ∗ = M′ in N′, and let !x = M in N −→ let !x = M′ in N′.

If M −→ M′, N1 −→ N1′, and N2 −→ N2′, then case(M, x.N1, x.N2) −→ case(M′, x.N1′, x.N2′).

Figure 2. Linear logic reduction

tp : type.                                A ::= ···
atomic : atom -> tp.                            | a
const : constant -> term -> tp.                 | c(M)
pi : tp -> (term -> tp) -> tp.                  | Πx:A.B

term : type.                              M ::= ···
ulam : (term -> term) -> term.                  | λ!x.M
uapp : term -> term -> term.                    | M @ M

Figure 3. Linear logic syntax (dependently typed)

3. Dependently Typed Linear Logic

Adding dependent types to linear logic is straightforward syntactically. The revised syntax is shown in Figure 3. We delete atomic propositions, and replace them with constants that take a single term parameter. (That parameter may be a unit or tuple, which provides implicit support for zero or multiple parameters.)

In the static semantics, a new wrinkle arises. Now that terms can appear within types, the typing rules must ensure that linear variables are not used within types. However, a variable might appear within a term's type without appearing in the term itself. This is obvious because our lambda abstractions are unlabelled, but it would still be the case even if all bindings were labeled with types. This is because of the equivalence rule:

  Γ; ∆ ⊢ M : A    Γ ⊢ A′ type    A ≡β A′
  ────────────────────────────────────────
             Γ; ∆ ⊢ M : A′

Using the equivalence rule, a term's type can mention any variable in scope. Therefore, we must enforce the rule's requirement that no linear variables appear in A′. A linearity judgement on terms alone will not suffice.

One solution to this problem is to make linearity a judgement over typing derivations, rather than over proof terms. However, that would make linearity a dependently typed meta-judgement, which would be too cumbersome to work with in practice. It is better to maintain linear as a judgement over proof terms. Instead, we change our view of unrestricted variables. In non-dependently typed linear logic, we viewed unrestrictedness as merely the absence of a linearity restriction. Now we will view unrestrictedness as conferring an affirmative capability; specifically, the capability to appear within types.

We add a new judgement unrest that applies to unrestricted variables. We extend that judgement to terms by saying that a term is unrestricted if all its free variables are unrestricted:

unrest : term -> type.

unrest/llam : unrest (llam ([x] M x))
               <- ({x} unrest x -> unrest (M x)).
unrest/lapp : unrest (lapp M N)
               <- unrest M
               <- unrest N.
...

Note that, within the unrest judgement, all bound variables are taken to be unrestricted, even linear ones. Only unrestricted terms are permitted to serve as the parameter to a constant. On paper, this is written:

  c : A → type    Γ; · ⊢ M : A
  ─────────────────────────────
        Γ ⊢ c(M) type

where we assume some pre-specified collection of axioms of the form c : A → type. In our encoding, the well-formedness judgement for types is wf : tp -> type. The constant rule is written:

wf/const : wf (const C M)
            <- cparam C A
            <- of M A
            <- unrest M.

We assume there exists a unique cparam rule for each axiom c : A → type. The remaining wf rules are uninteresting (but note that the rule for pi introduces an unrestricted variable).

Our existing typing rules must be altered in two ways. First, now that types can be ill-formed, several rules must add a wf premise. This is straightforward. Second, the rules for the exponential must be rewritten to use the unrest judgement:

of/bang : of (bang M) (! A)
           <- of M A
           <- unrest M.
of/letb : of (letb M ([x] N x)) B
           <- of M (! A)
           <- ({x} of x A -> unrest x -> of (N x) B)
           <- wf B.
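As a sanity check on the unrest premises just introduced, note that every closed term is unrestricted. For instance, the following query (our own illustration, not part of the development) should succeed via unrest/llam:

%query 1 1 unrest (llam ([x] x)).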
We also have the new rules for unrestricted functions and application:

  Γ ⊢ A type    (Γ, x:A); ∆ ⊢ M : B
  ──────────────────────────────────
     Γ; ∆ ⊢ (λ!x.M) : Πx:A.B

  Γ; ∆ ⊢ M : Πx:A.B    Γ; · ⊢ N : A
  ──────────────────────────────────
      Γ; ∆ ⊢ M @ N : [N/x]B

of/ulam : of (ulam ([x] M x)) (pi A ([x] B x))
           <- wf A
           <- ({x} of x A -> unrest x -> of (M x) (B x)).
of/uapp : of (uapp M N) (B N)
           <- of M (pi A ([x] B x))
           <- of N A
           <- unrest N.

linear/ulam : linear ([y] ulam ([x] M y x))
               <- ({x} linear ([y] M y x)).
linear/uapp : linear ([x] uapp (M x) N)
               <- linear ([x] M x).

And finally equivalence:

of/equiv : of M A'
            <- of M A
            <- wf A'
            <- equiv A A'.

The addition of dependent types complicates the proof of subject reduction in a number of ways, but nearly all are orthogonal to linearity. One issue that does relate to linearity is that we require one additional lemma to show that unrestrictedness is preserved by reduction:

LEMMA 3.1 (Subject reduction for unrest). Suppose the ambient context is made up of bindings of the form x:term, ex:unrest x and bindings of the form x:term (and other bindings not subordinate to reduce or unrest). If reduce M M' and unrest M are derivable, then unrest M' is derivable.
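In Twelf notation, Lemma 3.1 is again a totality statement about a type family; schematically (the name sr-unrest is ours):

sr-unrest : reduce M M' -> unrest M -> unrest M' -> type.
%mode sr-unrest +DR +DU -DU'.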
3.1 Adequacy

Adequacy for dependently typed linear logic proceeds in much the same fashion as before. We must make four changes. First, we revise syntactic adequacy of types, now that types are not closed:

THEOREM 3.2 (Syntactic adequacy).
1. Let S be a set of variables and let Type_S be the set of linear logic types whose free variables are contained in S. Then there exists a bijection ⌜−⌝ between Type_S and LF canonical forms P such that ⌜S⌝ ⊢LF P : tp. Moreover, ⌜−⌝ respects substitution: ⌜[M/x]A⌝ = [⌜M⌝/x]⌜A⌝.
2. Let S be a set of variables and let Term_S be the set of linear logic terms whose free variables are contained in S. Then there exists a bijection ⌜−⌝ between Term_S and LF canonical forms P such that ⌜S⌝ ⊢LF P : term. Moreover, ⌜−⌝ respects substitution: ⌜[M/x]N⌝ = [⌜M⌝/x]⌜N⌝.

Second, we define a translation for unrestricted contexts:

⌜⌜x1:A1, ..., xn:An⌝⌝ = x1:term, dx1:of x1 ⌜A1⌝, ex1:unrest x1, ..., xn:term, dxn:of xn ⌜An⌝, exn:unrest xn

and we alter the first clause of the definition of encoding structures to read:

• ⌜⌜Γ⌝⌝, ⌜∆⌝ ⊢LF P : of ⌜M⌝ ⌜A⌝

Third, we state adequacy for typing and for well-formedness of types simultaneously:

THEOREM 3.3 (Semantic adequacy).
1. There exists a bijection ⌜−⌝ between derivations of the judgement Γ; ∆ ⊢ M : A and encoding structures for Γ; ∆ ⊢ M : A.
2. There exists a bijection ⌜−⌝ between derivations of the judgement Γ ⊢ A type and LF canonical forms P such that ⌜⌜Γ⌝⌝ ⊢LF P : wf ⌜A⌝.

Fourth, we state a new lemma to deal with unrest derivations:

LEMMA 3.4.
1. Suppose Γ; ∆ ⊢ M : A. Then there exists a unique LF canonical form P such that ⌜⌜Γ, ∆⌝⌝ ⊢LF P : unrest ⌜M⌝.
2. Suppose there exists an LF canonical form P such that ⌜⌜Γ⌝⌝, ⌜∆⌝ ⊢LF P : unrest ⌜M⌝. Then no variable in Domain(∆) appears free in M.
3. Suppose there exists an LF canonical form P such that ⌜⌜Γ⌝⌝, ⌜∆⌝ ⊢LF P : wf ⌜A⌝. Then no variable in Domain(∆) appears free in A.

We give the adequacy case for unrestricted application to illustrate how Lemma 3.4 is used.

Proof Sketch of Theorem 3.3: Suppose ∇ is the derivation:

     ∇1                     ∇2
     ⋮                      ⋮
  Γ; ∆ ⊢ M : Πx:A.B    Γ; · ⊢ N : A
  ──────────────────────────────────
     Γ; ∆ ⊢ M @ N : [N/x]B

Let ⌜∇1⌝ = (P1, H1) and let ⌜∇2⌝ = (P2, H2). By induction (P1, H1) is an encoding structure for Γ; ∆ ⊢ M : Πx:A.B and (P2, H2) is an encoding structure for Γ; · ⊢ N : A. By Lemma 3.4, there exists a unique Q such that ⌜⌜Γ⌝⌝ ⊢LF Q : unrest ⌜N⌝. So let ⌜∇⌝ = (of/uapp Q P2 P1, H), where for each y in Domain(∆), H(y) = linear/uapp (H1(y)).

As an example of the definition of the inverse, suppose (of/uapp Q′ P2′ P1′, H′) is an encoding structure for Γ; ∆ ⊢ O : C. Then O has the form M′ @ N′ and C has the form [N′/x]B′. Also, ⌜⌜Γ⌝⌝, ⌜∆⌝ ⊢LF P1′ : of ⌜M′⌝ ⌜Πx:A′.B′⌝, and ⌜⌜Γ⌝⌝, ⌜∆⌝ ⊢LF P2′ : of ⌜N′⌝ ⌜A′⌝, and ⌜⌜Γ⌝⌝, ⌜∆⌝ ⊢LF Q′ : unrest ⌜N′⌝. Let H1′ = {y ↦ R | H′(y) = linear/uapp R}. Then (P1′, H1′) is an encoding structure for Γ; ∆ ⊢ M′ : Πx:A′.B′. Let ∇1′ = ⌞(P1′, H1′)⌟. By Lemma 3.4, no variable in Domain(∆) appears free in N′. (In this case—but not in some others—this fact could also be ascertained by inspection of H′.) Therefore, ⌜⌜Γ⌝⌝ ⊢LF P2′ : of ⌜N′⌝ ⌜A′⌝. Consequently, (P2′, ∅) is an encoding structure for Γ; · ⊢ N′ : A′. Let ∇2′ = ⌞(P2′, ∅)⌟. Then let ⌞(of/uapp Q′ P2′ P1′, H′)⌟ be the derivation:

     ∇1′                       ∇2′
     ⋮                         ⋮
  Γ; ∆ ⊢ M′ : Πx:A′.B′    Γ; · ⊢ N′ : A′
  ───────────────────────────────────────
     Γ; ∆ ⊢ M′ @ N′ : [N′/x]B′
Since Q′ is uniquely determined by Lemma 3.4, it is easy to verify that ⌜−⌝ and ⌞−⌟ are inverses.

4. Modal Logic

There are (at least) two ways to specify modal logic. One is using an explicit notion of Kripke worlds and accessibility [19]. Such a formulation does not behave as a substructural logic (in that all assumptions are available throughout their scope) and can be encoded in LF without difficulty [3, 10]. A second, which we consider here, is judgemental modal logic [14]. Judgemental modal logic distinguishes between two sorts of assumption, truth and validity. Although judgemental modal logic has no explicit notion of Kripke worlds, one can think of truth as applying to only the current world, and validity as applying to all worlds. Consequently, the introduction rule for □A, which internalizes validity, must require that no truth assumptions are used. This is accomplished with the rule:

     Γ; · ⊢ M : A
  ─────────────────────
  Γ; ∆ ⊢ box M : □A

Here, Γ is the validity context and ∆ is the truth context. Whatever truth assumptions exist are discarded while type checking M. Since assumptions in ∆ are unavailable in M despite being in scope, judgemental modal logic behaves as a substructural logic. We express this restriction using a judgement reminiscent of linear, indicating that an assumption is used locally to the current world:

local : (term -> term) -> type.

The judgement local ([x] M x) should be read as "the variable x is used locally (i.e., not within boxes) in M x."

The syntax of modal logic is given in Figure 4. In the interest of brevity, we omit discussion of the possibility modality here. A treatment of possibility appears in the full Twelf development.

tp : type.                                   A ::= a
atomic : atom -> tp.                               | A → A
arrow : tp -> tp -> tp.                            | □A
box : tp -> tp.

term : type.                                 M ::= x
lam : (term -> term) -> term.                      | λx.M
app : term -> term -> term.                        | M M
bx : term -> term.                                 | box M
letbx : term -> (term -> term) -> term.            | let box x = M in M

Figure 4. Modal logic syntax

Variables
The rules for variables allow the use of any variable in the context:

   Γ(x) = A             ∆(x) = A
  ─────────────        ─────────────
  Γ; ∆ ⊢ x : A         Γ; ∆ ⊢ x : A

As usual, there is no typing rule for variables in the encoding, but there are two locality rules. First, x is local in x:

local/var : local ([x] x).

Second, we wish to say that x is local in every variable (truth or validity) other than x. The easiest way to express this is to generalize to all terms M that do not contain x:

local/closed : local ([x] M).

Implication
The introduction rule for implication is:

  Γ; (∆, x:A) ⊢ M : B
  ──────────────────────
  Γ; ∆ ⊢ λx.M : A → B

This is encoded using two rules, reminiscent of the ones for linear implication:

of/lam : of (lam ([x] M x)) (arrow A B)
          <- ({x} of x A -> of (M x) B)
          <- local ([x] M x).

local/lam : local ([y] lam ([x] M y x))
             <- ({x} local ([y] M y x)).

The function's argument is a truth assumption, so it must be used locally in the body. The elimination rule for implication is straightforward:

  Γ; ∆ ⊢ M : A → B    Γ; ∆ ⊢ N : A
  ─────────────────────────────────
          Γ; ∆ ⊢ M N : B

of/app : of (app M N) B
          <- of M (arrow A B)
          <- of N A.

local/app : local ([x] app (M x) (N x))
             <- local ([x] M x)
             <- local ([x] N x).

Necessity
Recall the introduction rule for necessity:

     Γ; · ⊢ M : A
  ─────────────────────
  Γ; ∆ ⊢ box M : □A

This is encoded with the single rule:

of/bx : of (bx M) (box A)
         <- of M A.

The important thing here is the absence of any locality rule for bx. The only way to show that a variable is local in (bx M) is using the local/closed rule, which requires that the variable not appear in M, as desired. The elimination rule for necessity is:

  Γ; ∆ ⊢ M : □A    (Γ, x:A); ∆ ⊢ N : C
  ─────────────────────────────────────
     Γ; ∆ ⊢ let box x = M in N : C

This is encoded using two rules:

of/letbx : of (letbx M ([x] N x)) B
            <- of M (box A)
            <- ({x} of x A -> of (N x) B).

local/letbx : local ([x] letbx (M x) ([y] N x y))
               <- local ([x] M x)
               <- ({y} local ([x] N x y)).

Since the variable introduced by letbx is a validity assumption, we do not check that it is local in the body.
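To see the locality rules in action, note that a truth variable may occur anywhere except under a box. For instance, with M closed, the variable y is local in app y (bx M): the occurrence of y is covered by local/app and local/var, while the box is covered by local/closed. The following query (our own illustration) should therefore succeed:

%query 1 1 local ([y] app y (bx (lam ([x] x)))).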
Metatheory
Subject reduction for modal logic follows the same development as for linear logic in Section 2.2, with local standing in for linear. One lemma must be generalized: since local variables can appear multiple times in modal logic, composition of locality must allow the local variable to appear (locally) in the scope of substitution (M1 below), as well as in the substitutend (M2 below):

LEMMA 4.1 (Composition of locality). Suppose the ambient context is made up of bindings of the form x:term (and other bindings not subordinate to local). If ({y} local ([x] M1 x y)), ({x} local ([y] M1 x y)), and local ([x] M2 x) are derivable, then local ([x] M1 x (M2 x)) is derivable.
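In Twelf notation, Lemma 4.1 might be stated as follows (a sketch; the name local-compose is ours):

local-compose : ({y:term} local ([x] M1 x y))
                 -> ({x:term} local ([y] M1 x y))
                 -> local ([x] M2 x)
                 -> local ([x] M1 x (M2 x))
                 -> type.
%mode local-compose +D1 +D2 +D3 -D4.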
4.1 Adequacy

Syntactic adequacy for modal logic is again standard:

DEFINITION 4.2. Translation of variable sets is defined:

⌜{x1, ..., xn}⌝ = x1:term, ..., xn:term

THEOREM 4.3 (Syntactic adequacy).
1. Let Type be the set of modal logic types. Then there exists a bijection ⌜−⌝ between Type and LF canonical forms P such that ⊢LF P : tp. (Variables cannot appear within types, so there is no substitution to respect.)
2. Let S be a set of variables and let Term_S be the set of modal logic terms whose free variables are contained in S. Then there exists a bijection ⌜−⌝ between Term_S and LF canonical forms P such that ⌜S⌝ ⊢LF P : term. Moreover, ⌜−⌝ respects substitution: ⌜[M/x]N⌝ = [⌜M⌝/x]⌜N⌝.

Semantic adequacy again encounters a challenge; this time the opposite problem from the one we saw with linear logic. In the encoding of linear logic there were too few typing derivations; here there are too many. The problem lies in the local judgement. Unlike linear, which expressed a property that could be satisfied in many ways, local expresses a fact that essentially can be satisfied in only one way, by the variable not appearing in any boxes. In this regard, local is more like unrest than linear. However, unlike unrest, derivations of local are not unique. The problem stems from the fact that the local/closed rule can apply to terms that also have another rule. For example, suppose M and N are closed terms. Then local ([x] app M N) has at least two derivations: local/closed and (local/app local/closed local/closed).

One solution to the problem would be to restrict local/closed to variables (and add another rule for closed boxes). This would ensure that local derivations are unique (like unrest derivations). We could impose the restriction by creating a judgement (say, var) to identify variables, and then rewrite the local/closed rule as:

local/closed-varonly : local ([y] X)
                        <- var X.

However, this solution has a significant shortcoming; the substitution lemma would no longer be a free consequence of higher-order representation. Under such a regime, variable assumptions would take the form ({x:term} of x A -> var x -> ...whatever...). Consequently, we would only obtain substitution for free when the substitutend possesses a var derivation; that is, when the substitutend is another variable. The general substitution lemma would have to be proved and used explicitly.

A better solution is to rephrase adequacy to quotient out the excess derivations:

DEFINITION 4.4. Translation of contexts is defined:

⌜x1:A1, ..., xn:An⌝ = x1:term, dx1:of x1 ⌜A1⌝, ..., xn:term, dxn:of xn ⌜An⌝

DEFINITION 4.5. Let ≅ be the least congruence over LF canonical forms such that P ≅ P′ for any P, P′ : local F (where F : term -> term).

An encoding structure for Γ; ∆ ⊢ M : A is a nonempty equivalence class (under ≅) of LF canonical forms P such that:

• ⌜Γ, ∆⌝ ⊢LF P : of ⌜M⌝ ⌜A⌝, and
• for every y in Domain(∆), there exists an LF canonical form Qy such that ⌜Sy⌝ ⊢LF Qy : local ([y:term] ⌜M⌝), where Sy = Domain(Γ, ∆) \ {y}.

Observe that since the issue in modal logic is too many locality derivations (in contrast to linear logic where it was too few), we have no need to make a mapping from variables to locality derivations an explicit component of the encoding structure. Instead, it is convenient simply to quantify them existentially, as above.

THEOREM 4.6 (Semantic adequacy). There exists a bijection between derivations of the judgement Γ; ∆ ⊢ M : A and encoding structures for Γ; ∆ ⊢ M : A.

Proof Sketch: We give one case in each direction, by way of example. Suppose ∇ is the derivation:

     ∇1
     ⋮
  Γ; (∆, x:A) ⊢ M : B
  ──────────────────────
  Γ; ∆ ⊢ λx.M : A → B

Let ⌜∇1⌝ = P1. By induction, P1 is an encoding structure for Γ; (∆, x:A) ⊢ M : B, so:

  ⌜Γ, ∆⌝, x:term, dx:of x ⌜A⌝ ⊢LF P1 : of ⌜M⌝ ⌜B⌝

and, for every y ∈ Domain(∆, x:A), there exists a Qy such that:

  ⌜Domain(Γ, ∆, x:A) \ {y}⌝ ⊢LF Qy : local ([y:term] ⌜M⌝)

In particular, x ∈ Domain(∆, x:A), so:

  ⌜Domain(Γ, ∆)⌝ ⊢LF Qx : local ([x:term] ⌜M⌝)

Therefore:

  ⌜Γ, ∆⌝ ⊢LF of/lam Qx P1 : of ⌜λx.M⌝ ⌜A → B⌝

Also, for every y ∈ Domain(∆),

  ⌜Domain(Γ, ∆) \ {y}⌝ ⊢LF local/lam ([x:term] Qy) : local ([y:term] ⌜λx.M⌝)

So let ⌜∇⌝ be the equivalence class containing of/lam Qx P1, which is an encoding structure for Γ; ∆ ⊢ λx.M : A → B.

As an example of the definition of the inverse, suppose (of/bx P0) belongs to an encoding structure for Γ; ∆ ⊢ O : C. Then O has the form box M′, and C has the form □A′. Also, ⌜Γ, ∆⌝ ⊢LF P0 : of ⌜M′⌝ ⌜A′⌝. Further, for every y in Domain(∆), there exists Qy such that ⌜Sy⌝ ⊢LF Qy : local ([y:term] bx ⌜M′⌝), where Sy = Domain(Γ, ∆) \ {y}. Each Qy must be local/closed, so no y in Domain(∆) appears in M′. Therefore ⌜Γ⌝ ⊢LF P0 : of ⌜M′⌝ ⌜A′⌝. The second criterion of encoding structures is vacuously satisfied for an empty truth context, so P0 belongs to an encoding structure for Γ; · ⊢ M′ : A′. Let ∇′ = ⌞P0⌟. Then let ⌞of/bx P0⌟ be the derivation:

     ∇′
     ⋮
  Γ; · ⊢ M′ : A′
  ──────────────────────
  Γ; ∆ ⊢ box M′ : □A′

Suppose of/bx P0 ≅ of/bx P0′. Then P0 ≅ P0′. By induction, ⌞P0⌟ = ⌞P0′⌟, so ⌞of/bx P0⌟ = ⌞of/bx P0′⌟. It is easy to verify that, for appropriate ∇ and P, ⌞⌜∇⌝⌟ = ∇ and ⌜⌞P⌟⌝ ≅ P. Therefore ⌜−⌝ and ⌞−⌟ are inverses.
5. Conclusion

The Logical Framework is not only (nor even primarily) a type theory. More importantly, it is a methodology for representing deductive systems using higher-order representation of syntax and semantics, and a rigorous account of adequacy. Where applicable, the LF methodology provides a powerful and elegant tool for formalizing programming languages and logics.

There are two reasons it might not apply. First, limitations of existing tools for LF, such as Twelf, might prevent one from carrying out the desired proofs once a system were encoded in LF. Second, there might be an inherent problem representing the desired deductive system adequately using a higher-order representation. When a language cannot be cleanly represented in a higher-order fashion, it often indicates that something about the language is suspect, such as an incorrect (or at least nonstandard) notion of binding and/or scope.

In some cases, however, languages with unconventional notions of binding or scope are nevertheless sensible. Substructural logics are probably the most important example. In this paper, we show that many substructural logics can be given a clean higher-order representation by isolating their "substructuralness" (e.g., linearity or locality) and expressing it as a judgement over proof terms.

Our strategy applies to other substructural logics as well. For example, affine logic and strict logic can each be encoded along lines very similar to linear logic. We conjecture that contextual modal logic [11] is encodable along lines similar to judgemental modal logic. This is a good avenue for future work. The logic of bunched implications [12] is another.

On the other hand, since our method relies on enforcing "substructuralness" on an assumption-by-assumption basis, there are some substructural logics it does not support, such as ordered logic [17, 18]. In ordered logic, the context is taken to be ordered and assumptions must be processed in order. It appears that we cannot enforce this restriction on assumptions independently, as the very nature of the restriction is that assumptions are not independent. The usability of one assumption can depend on the disposition of every other assumption in scope.

References

[1] Arnon Avron, Furio Honsell, and Ian A. Mason. Using typed lambda calculus to implement formal systems on a machine. Technical Report ECS-LFCS-87-31, Department of Computer Science, University of Edinburgh, July 1987.
[2] Arnon Avron, Furio Honsell, and Ian A. Mason. An overview of the Edinburgh Logical Framework. In Graham Birtwistle and P. A. Subrahmanyam, editors, Current Trends in Hardware Verification and Automated Theorem Proving. Springer, 1989.
[3] Arnon Avron, Furio Honsell, Marino Miculan, and Cristian Paravano. Encoding modal logics in logical frameworks. Studia Logica, 60(1), January 1998.
[4] Brian Aydemir, Arthur Charguéraud, Benjamin C. Pierce, Randy Pollack, and Stephanie Weirich. Engineering formal metatheory. In Thirty-Fifth ACM Symposium on Principles of Programming Languages, San Francisco, California, January 2008.
[5] Iliano Cervesato and Frank Pfenning. A linear logical framework. In Eleventh IEEE Symposium on Logic in Computer Science, pages 264–275, New Brunswick, New Jersey, July 1996.
[6] Karl Crary. Explicit contexts in LF. In Workshop on Logical Frameworks and Meta-Languages: Theory and Practice, Pittsburgh, Pennsylvania, 2008. Revised version at www.cs.cmu.edu/~crary/papers/2009/excon-rev.pdf.
[7] Jean-Yves Girard. Linear logic. Theoretical Computer Science, 50:1–102, 1987.
[8] Robert Harper, Furio Honsell, and Gordon Plotkin. A framework for defining logics. Journal of the ACM, 40(1):143–184, January 1993.
[9] Robert Harper and Frank Pfenning. On equivalence and canonical forms in the LF type theory. ACM Transactions on Computational Logic, 6(1), 2005.
[10] Tom Murphy, VII. Modal Types for Mobile Code. PhD thesis, Carnegie Mellon University, School of Computer Science, Pittsburgh, Pennsylvania, May 2008.
[11] Aleksandar Nanevski, Frank Pfenning, and Brigitte Pientka. A contextual modal type theory. ACM Transactions on Computational Logic, 9(3), 2008.
[12] Peter W. O'Hearn and David J. Pym. The logic of bunched implications. Bulletin of Symbolic Logic, 5(2), 1999.
[13] Frank Pfenning. Structural cut elimination in linear logic. Technical Report CMU-CS-94-222, Carnegie Mellon University, School of Computer Science, December 1994.
[14] Frank Pfenning and Rowan Davies. A judgmental reconstruction of modal logic. Mathematical Structures in Computer Science, 11(4):511–540, 2001.
[15] Frank Pfenning and Conal Elliott. Higher-order abstract syntax. In 1988 SIGPLAN Conference on Programming Language Design and Implementation, pages 199–208, Atlanta, Georgia, June 1988.
[16] Frank Pfenning and Carsten Schürmann. Twelf User's Guide, Version 1.4, 2002. Available electronically at http://www.cs.cmu.edu/~twelf.
[17] Jeff Polakow. Ordered Linear Logic and Applications. PhD thesis, Carnegie Mellon University, School of Computer Science, Pittsburgh, Pennsylvania, August 2001.
[18] Jeff Polakow and Frank Pfenning. Natural deduction for intuitionistic non-commutative linear logic. In 1999 International Conference on Typed Lambda Calculi and Applications, volume 1581 of Lecture Notes in Computer Science, L'Aquila, Italy, April 1999. Springer.
[19] Alex Simpson. The Proof Theory and Semantics of Intuitionistic Modal Logic. PhD thesis, University of Edinburgh, 1994.
[20] Roberto Virga. Higher-Order Rewriting with Dependent Types. PhD thesis, Carnegie Mellon University, School of Computer Science, Pittsburgh, Pennsylvania, 1999.
The Impact of Higher-Order State and Control Effects on Local Relational Reasoning

Derek Dreyer (MPI-SWS, [email protected])
Georg Neis (MPI-SWS, [email protected])
Lars Birkedal (IT University of Copenhagen, [email protected])
Abstract

Reasoning about program equivalence is one of the oldest problems in semantics. In recent years, useful techniques have been developed, based on bisimulations and logical relations, for reasoning about equivalence in the setting of increasingly realistic languages—languages nearly as complex as ML or Haskell. Much of the recent work in this direction has considered the interesting representation independence principles enabled by the use of local state, but it is also important to understand the principles that powerful features like higher-order state and control effects disable. This latter topic has been broached extensively within the framework of game semantics, resulting in what Abramsky dubbed the "semantic cube": fully abstract game-semantic characterizations of various axes in the design space of ML-like languages. But when it comes to reasoning about many actual examples, game semantics does not yet supply a useful technique for proving equivalences.

In this paper, we marry the aspirations of the semantic cube to the powerful proof method of step-indexed Kripke logical relations. Building on recent work of Ahmed, Dreyer, and Rossberg, we define the first fully abstract logical relation for an ML-like language with recursive types, abstract types, general references and call/cc. We then show how, under orthogonal restrictions to the expressive power of our language—namely, the restriction to first-order state and/or the removal of call/cc—we can enhance the proving power of our possible-worlds model in correspondingly orthogonal ways, and we demonstrate this proving power on a range of interesting examples. Central to our story is the use of state transition systems to model the way in which properties of local state evolve over time.

Categories and Subject Descriptors: D.3.1 [Programming Languages]: Formal Definitions and Theory; D.3.3 [Programming Languages]: Language Constructs and Features; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs

General Terms: Languages, Theory, Verification

Keywords: Step-indexed Kripke logical relations, biorthogonality, observational equivalence, higher-order state, local state, first-class continuations, exceptions, state transition systems

1. Introduction

Reasoning about program equivalence is one of the oldest problems in semantics, with applications to program verification ("Is an optimized program equivalent to some reference implementation?"), compiler correctness ("Does a program transformation preserve the semantics of the source program?"), representation independence ("Can we modify the internal representation of an abstract data type without affecting the behavior of clients?"), and more besides.

The canonical notion of program equivalence for many applications is observational (or contextual) equivalence. Two programs are observationally equivalent if no program context can distinguish them by getting them to exhibit observably different input/output behavior. Reasoning about observational equivalence directly is difficult, due to the universal quantification over program contexts. Consequently, there has been a huge amount of work on developing useful models and logics for observational equivalence, and in recent years this line of work has scaled to handle increasingly realistic languages—languages nearly as complex as ML or Haskell, with features like general recursive types, general (higher-order) mutable references, and first-class continuations.

The focus of much of this recent work—e.g., environmental bisimulations [36, 17, 32, 35], normal form bisimulations [34, 16], step-indexed Kripke logical relations [4, 2, 3]—has been on establishing some effective techniques for reasoning about programs that actually use the interesting, semantically complex features (state, continuations, etc.) of the languages being modeled. For instance, most of the work on languages with state concerns the various kinds of representation independence principles that arise due to the use of local state as an abstraction mechanism.

But of course this is only part of the story. When features are added to a language, they also enrich the expressive power of program contexts. Hence, programs that do not use those new features, and that are observationally equivalent in the absence of those features, might not be observationally equivalent in their presence. One well-known example of this is the loss of referential transparency in an impure language like ML. Another shows up in the work of Johann and Voigtländer [15], who study the negative impact that Haskell's strictness operator seq has on the validity of short-cut fusion and other free-theorems-based program transformations.

In our case, we are interested in relational reasoning about stateful programs, so we will be taking a language with some form of mutable state as our baseline. Nonetheless, we feel it is important not only to study the kinds of local reasoning principles that stateful programming can enable, but also to understand the principles that powerful features like higher-order state and control effects disable.

This latter topic has been broached extensively within the framework of game semantics. In the 1990s, Abramsky set forth a research programme (subsequently undertaken by a number of people) concerning what he called the semantic cube [19, 1, 24].
The idea was to develop fully abstract game-semantic characterizations of various axes in the design space of ML-like languages. For instance, the absence of mutable state can be modeled by restricting game strategies to be innocent, and the absence of control operators can be modeled by restricting game strategies to be well-bracketed. These restrictions are orthogonal to one another and can be composed to form fully abstract models of languages with different combinations of effects. Unfortunately, when it comes to reasoning about many actual examples, these game-semantics models do not yet supply a useful technique for proving programs equivalent, except in fairly restricted languages.

One possible reason for the comparative lack of attention paid to this issue in the setting of relational reasoning is that some key techniques that have been developed for reasoning about local state—notably, Pitts and Stark's method of local invariants [28]—turn out to work just as well in a language with higher-order state and call/cc as they do in the simpler setting (first-order state, no control operators) in which they were originally proposed. Before one can observe the negative impact of certain language features on relational reasoning principles, one must first develop a proof technique that actually exploits the absence of those features!

1.1 Overview

In this paper, we marry the aspirations of Abramsky's semantic cube to the powerful proof method of step-indexed Kripke logical relations. Specifically, we show how to define a fully abstract logical relation for an ML-like language with recursive types, abstract types, general references and call/cc. Then, we show how, under orthogonal restrictions to the expressive power of our language—namely, the restriction to first-order state and/or the removal of call/cc—we can enhance the proving power of our model in correspondingly orthogonal ways, and we demonstrate this proving power on a range of interesting examples.

Our work builds closely on that of Ahmed, Dreyer, and Rossberg (hereafter, ADR) [3], who gave the first logical relation for modeling a language with both abstract types and higher-order state. We take ADR as a starting point because the concepts underlying that model provide a rich framework in which to explore the impact of various computational effects on relational reasoning. In particular, one of ADR's main contributions was an extension of Pitts and Stark's aforementioned "local invariants" method with the ability to establish properties about local state that evolve over time in some controlled fashion. ADR exploited this ability in order to reason about generative (or state-dependent) ADTs. The central contribution of our present paper is to observe that the degree of freedom with which local state properties may evolve depends directly on which particular effects are present in the programming language under consideration.

In order to expound this observation, we first recast the ADR model in the more familiar terms of state transition systems (Section 3). The basic idea is that the "possible worlds" of the ADR model are really state transition systems, wherein each state dictates a potentially different property about the heap, and the transitions between states control how the heap properties are allowed to evolve. Aside from being somewhat simpler than ADR's formulation of possible worlds (which relied on various non-standard anthropomorphic notions like "populations" and "laws"), our formulation highlights the essential notion of a state transition, which plays a crucial role in our story.

Next, in Section 4, we explain how to extend the ADR model with support for first-class continuations via the well-studied technique of biorthogonality (aka ⊤⊤-closure) [18, 28]. The technical details of this extension are fairly straightforward, with the use of biorthogonality turning out to be completely orthogonal (no pun intended) to the other advanced aspects of the ADR model. That said, this is to our knowledge the first logical-relations model for a language with call/cc and state. Moreover, a side benefit of biorthogonality is that it renders our model both sound and complete w.r.t. observational equivalence (unlike ADR's, which was only sound).1

Interestingly, nearly all of the example program equivalences proved in the ADR paper continue to hold in the presence of call/cc, and their proofs carry over easily to our present formulation. (There is one odd exception, the "callback with lock" example, for which the ADR proof was very fiddly and ad hoc. We investigate this example in great detail, as we describe below.)

The ADR paper also included several interesting examples that their method was unable to handle. The unifying theme of these examples is that they rely on the well-bracketed nature of computation—i.e., the assumption that control flow follows a stack-like discipline—an assumption that is only valid in the absence of call/cc. In Section 5, we consider two simple but novel enhancements to our state-transition-system model—private transitions and inconsistent states—which are only sound in the absence of call/cc and which correspondingly enable us to prove all of ADR's "well-bracketed examples".

Conversely, in Section 6, we consider the additional reasoning power gained by restricting the language to first-order state. We observe that this restriction enables backtracking within a state transition system, and we demonstrate the utility of this feature on several examples.

The above extensions to our basic state-transition-system model are orthogonal to each other, and can be used independently or in combination. One notable example of this is ADR's "callback with lock" equivalence (mentioned above), an equivalence that holds in the presence of either higher-order state or call/cc but not both. Using private transitions but no backtracking, we can prove this equivalence in the presence of higher-order state but no call/cc; and using backtracking but no private transitions, we can prove it in the presence of call/cc but only first-order state. Yet another well-known example, due originally to O'Hearn [26], is true only in the absence of both higher-order state and call/cc; hence, it should come as no surprise that our novel proof of this example (presented in detail in Section 7.5) involves all three of our model's new features working in tandem.

Most of the paper is presented in an informal, pedagogical style. Indeed, one advantage of our state transition systems is that they lend themselves to clean "visual" proof sketches. In Section 7, we make our proof method formally precise and state some of the key metatheoretic results. Due to space limitations, we only work through the formal proof of one representative example. Detailed proofs of our full abstraction results, as well as all our examples (and more!), appear in the companion technical appendix [8].

In Section 8, we briefly consider how our Kripke logical relations are affected by the addition of exceptions to the language. Unlike call/cc, exceptions do not impose restrictions on our state transition systems, but they do require us to account for exceptional behavior in our proofs. Finally, in Section 9, we compare our methods to related work and suggest some directions for future work.

1 It is important to note that the completeness result has nothing to do with the particular features present in the language, and all to do with the use of biorthogonality. In particular, biorthogonality gives us a uniform way of constructing fully abstract models for all of the different languages considered in this paper, regardless of whether they contain call/cc, general references, etc. See Section 9 for further discussion of this point.

2. The Language(s) Under Consideration

In its unrestricted form, the language that we consider is a standard polymorphic lambda calculus with existential, pair, and iso-recursive types, general references (higher-order state), and first-class continuations (call/cc). We call this language HOSC.
Its syntax and excerpts of its call-by-value semantics are given in Figure 1. Dots (...) in the syntax cover primitive operations on base types b, such as addition and if-then-else. To ensure unique typing, various constructs have explicit type annotations, which we will typically omit if they are implicit from context. Evaluation contexts K, injected into the term language via contτ K, represent first-class continuations. They are a subset of general contexts C ("terms with a hole"), which are not shown here, but are standard. Their typing judgment ⊢ C : (Σ; ∆; Γ; τ) ⇝ (Σ′; ∆′; Γ′; τ′) basically says that for any e with Σ; ∆; Γ ⊢ e : τ we have Σ′; ∆′; Γ′ ⊢ C[e] : τ′. The continuation typing judgment Σ; ∆; Γ ⊢ K ÷ τ says that K is an evaluation context with a hole of type τ. Finally, contextual (or observational) approximation, written Σ; ∆; Γ ⊢ e1 ≼ctx e2 : τ, means that in any well-typed program context C, if C[e1] terminates, then so does C[e2]. Contextual (or observational) equivalence is then defined as approximation in both directions.

τ ::= α | b | τ1 × τ2 | τ1 → τ2 | ∀α. τ | ∃α. τ | µα. τ | ref τ | cont τ

e ::= x | l | ⟨e1, e2⟩ | e.1 | e.2 | λx:τ. e | e1 e2 | Λα.e | e τ | pack ⟨τ1, e⟩ as τ2 | unpack e1 as ⟨α, x⟩ in e2 | rollτ e | unroll e | ref e | e1 := e2 | !e | e1 == e2 | contτ K | call/ccτ (x. e) | throwτ e1 to e2 | ...

K ::= • | ⟨K, e2⟩ | ⟨v1, K⟩ | K.1 | K.2 | K e2 | v1 K | K τ | pack ⟨τ1, K⟩ as τ2 | unpack K as ⟨α, x⟩ in e2 | rollτ K | unroll K | ref K | K := e2 | v1 := K | !K | K == e2 | v1 == K | throwτ K to e2 | throwτ v1 to K | ...

v ::= x | l | ⟨v1, v2⟩ | λx:τ. e | Λα.e | pack ⟨τ1, v⟩ as τ2 | rollτ v | contτ K | ...

⟨h; K[ref v]⟩ ↪ ⟨h ⊎ {l ↦ v}; K[l]⟩   (l ∉ dom(h))
⟨h; K[l := v]⟩ ↪ ⟨h[l ↦ v]; K[⟨⟩]⟩   (l ∈ dom(h))
⟨h; K[!l]⟩ ↪ ⟨h; K[v]⟩   (h(l) = v)
⟨h; K[l1 == l2]⟩ ↪ ⟨h; K[tt]⟩   (l1 = l2)
⟨h; K[l1 == l2]⟩ ↪ ⟨h; K[ff]⟩   (l1 ≠ l2)
⟨h; K[call/ccτ (x. e)]⟩ ↪ ⟨h; K[e[contτ K/x]]⟩
⟨h; K[throwτ v to contτ′ K′]⟩ ↪ ⟨h; K′[v]⟩

Heap typings       Σ ::= · | Σ, l:τ
Type environments  ∆ ::= · | ∆, α
Term environments  Γ ::= · | Γ, x:τ

  ⊢ K : (Σ; ∆; Γ; τ) ⇝ (Σ; ∆; Γ; τ′)
  ───────────────────────────────────   (where fv(τ) = ∅)
         Σ; ∆; Γ ⊢ K ÷ τ

       Σ; ∆; Γ ⊢ K ÷ τ
  ──────────────────────────────
  Σ; ∆; Γ ⊢ contτ K : cont τ

  Σ; ∆; Γ, x:cont τ ⊢ e : τ
  ──────────────────────────────
  Σ; ∆; Γ ⊢ call/ccτ (x. e) : τ

  Σ; ∆; Γ ⊢ e′ : τ′    Σ; ∆; Γ ⊢ e : cont τ′
  ───────────────────────────────────────────
       Σ; ∆; Γ ⊢ throwτ e′ to e : τ

  ∀l:τ ∈ Σ. Σ; ·; · ⊢ h(l) : τ
  ─────────────────────────────
           ⊢ h : Σ

Σ; ∆; Γ ⊢ e1 ≼ctx e2 : τ  ≝  Σ; ∆; Γ ⊢ e1 : τ ∧ Σ; ∆; Γ ⊢ e2 : τ ∧
  ∀C, Σ′, τ′, h. ⊢ C : (Σ; ∆; Γ; τ) ⇝ (Σ′; ·; ·; τ′) ∧ ⊢ h : Σ′ ∧ ⟨h; C[e1]⟩↓ ⟹ ⟨h; C[e2]⟩↓

Figure 1. The Language HOSC

By restricting HOSC in two orthogonal ways, we obtain three fragments of interest:

FOSC: The result of restricting to first-order state. Concretely, this means only permitting reference types ref b, where b represents base types like int, bool, etc.
HOS: The result of removing call/cc, i.e., dropping the type cont τ and the corresponding three term-level constructs.
FOS: The result of making both of the above restrictions.

3. A Model Based on State Transition Systems

The Ahmed-Dreyer-Rossberg (ADR) model [3], on which our model is based, is a step-indexed Kripke logical relation for the language HOS. In this section, we will briefly review what a step-indexed Kripke logical relation is, what is interesting about the ADR model, and how we can recast the essence of the ADR model in terms of state transition systems.

Step-Indexed Kripke Logical Relations
Logical relations are one of the best-known methods for local reasoning about equivalence (or, more generally, approximation) in higher-order, typed languages. The basic idea is to define the equivalence or approximation relation in question inductively over the type structure of the language, with each type constructor being interpreted by the logical connective to which it corresponds. For instance, two functions are logically related if relatedness of their arguments implies relatedness of their results; two existential packages are logically related if there exists a relational interpretation of their hidden type representations that is preserved by their operations; and so forth.

In order to reason about equivalence in the presence of state, it becomes necessary to place constraints on the heaps under which programs are evaluated. This is where Kripke logical relations come in. Kripke logical relations [28] are logical relations indexed by a possible world W, which codifies some set of heap constraints. Roughly speaking, e1 is related to e2 under W only if they behave "the same" when run under any heaps h1 and h2 that satisfy the constraints of W. When reasoning about programs that maintain some local state, possible worlds allow us to impose whatever invariants on the local state we want, so long as we ensure that those invariants are preserved by the code that accesses the state.
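Schematically, suppressing step indices and writing W′ ⊒ W for the future-world relation, the clause for function types in such a possible-worlds model has roughly the following shape (our gloss, not the official definition):

V[τ1 → τ2] = { (W, v1, v2) | ∀W′ ⊒ W. ∀(W′, u1, u2) ∈ V[τ1]. (W′, v1 u1, v2 u2) ∈ E[τ2] }

where E[τ] relates terms that, when run in heaps satisfying the world, either both diverge or both return values related at τ in some future world.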
To make things concrete, consider the following example:

τ  = (unit → unit) → int
e1 = let x = ref 1 in λf. (f ⟨⟩; !x)
e2 = λf. (f ⟨⟩; 1)

We would like to show that e1 and e2 are observationally equivalent at type τ. The reason, intuitively, is obvious: the reference x is kept private (i.e., it is never leaked to the context), and since it is never modified by the function returned by e1, it will always point to 1.

To prove this using Kripke logical relations, we would set out to prove that e1 and e2 are related under an arbitrary initial world W. So suppose we evaluate the two terms under heaps h1 and h2 that satisfy W. Since the evaluation of e1 results in the allocation of some fresh memory location for x (i.e., x ∉ dom(h1)), we know that the initial world W cannot already contain any constraints governing the contents of x. (If it contained such a constraint, h1 would have had to satisfy it, and hence x would have to be in dom(h1).) So we may extend W with a new invariant stating that x ↪ 1 (i.e., x points to 1). It then remains to show that the two λ-abstractions are logically related under this extended world—i.e., under the assumption that x ↪ 1—which is straightforward.

Finally, step-indexed logical relations [4, 2] were proposed (originally by Appel and McAllester) as a way to account for semantically problematic features, such as general recursive types, whose relational interpretations are seemingly "cyclic" and thus difficult to define inductively. The idea is simply to stratify the construction of the logical relation by a natural number (or "step index"), representing roughly the number of steps of computation for which the programs in question behave in a related manner.

One of the key contributions of the ADR model was to combine the machinery of step-indexed logical relations with that of Kripke logical relations in order to model higher-order state. While the details of this construction are quite interesting, they are orthogonal to the novel contributions of the model we present in this paper. Indeed, our present model follows ADR's very closely in its use of step-indexing to resolve circularities in the construction, and so we refer the interested reader to the ADR paper for details.
ADR and State Transition Systems
The other key contribution of the ADR model was to provide an enhanced notion of possible world, which has the potential to express properties of local state that evolve over time. To motivate this feature of ADR, consider a simple variant of the example shown above, in which the first program e1 is replaced by:

e1 = let x = ref 0 in λf. (x := 1; f ⟨⟩; !x)

Here, x starts out pointing to 0, but if the function that e1 evaluates to is ever called, x will be set to 1 and will never change back to 0. In this case, the only invariant one can prove about x is that it points to either 0 or 1, but this invariant is insufficient to establish that after the call to the callback f, the contents of x have not changed back to 0. For this reason, Pitts and Stark, whose possible-worlds model only supported heap invariants, called this example the "awkward" example (because they could not handle it) [28].

While the awkward example is clearly contrived, it is also a minimal representative of a useful class of programs in which changes to local state occur in some monotonic fashion. As ADR showed, this includes well-known generative (or state-dependent) ADTs, in which the interpretation of an abstract type grows over time in correspondence with changes to some local state.

ADR's solution was to generalize possible worlds' notion of "heap constraint" to express heap properties that change in a controlled fashion. We can understand their possible worlds as essentially state transition systems, where each state determines a particular heap property, and where the transitions determine how the heap property may evolve. For instance, in the case of the awkward example, ADR would represent the heap constraint on x via the following state transition system (STS):

x ↪ 0 ──→ x ↪ 1

Initially, x points to 0, and then it is set to 1. Since the call to the callback f occurs when we are in the x ↪ 1 state, we know it must return in the same state since there is no transition out of that state. Correspondingly, it is necessary to also show that the x ↪ 1 state is really final—i.e., if the function to which e1 evaluates is called in that state, it will not change x's contents again—but this is obvious.

In ADR, states are called "populations" and state transition systems are called "laws", but the power of their possible worlds is very similar to that of our STS's (as we have described them thus far), and most of their proofs are straightforwardly presentable in terms of STS's. That said, the two models are not identical. In particular, there is one example we are aware of, the "callback with lock" example, that is provable in ADR's model but not in our basic STS model. As we will see shortly, there are good reasons why this example is not provable in our basic STS model, and in Section 5.1, we will show how to extend our STS's in order to prove this very example in a much simpler, cleaner way than ADR's model does.

4. Biorthogonality, Call/cc, and Full Abstraction

One point on which different formulations of Kripke logical relations differ is the precise formulation of the logical relation for terms. The ADR model employs a "direct-style" term relation, which can be described informally as follows: two terms e1 and e2 are logically related under world W iff whenever they are evaluated in initial heaps h1 and h2 satisfying W, they either both diverge or they both converge to machine configurations ⟨h1′; v1⟩ and ⟨h2′; v2⟩ such that h1′ and h2′ satisfy W′ and v1 and v2 are logically related values under W′, where W′ is some "future world" of W. (By "future world", we mean that W′ extends W with new constraints about freshly allocated pieces of the heap, and/or the heap constraints of W may have evolved to different heap constraints in W′ according to the STS's in W.) We call this a direct-style term relation because it involves evaluating the terms directly to values and then showing relatedness of those values in some future world.

An alternative approach, first employed in the logical relations setting by Pitts and Stark [28] but subsequently adopted by several others (e.g., [13, 7, 5]), is what one might call a "CPS" term relation, although it is more commonly known as a biorthogonal (or ⊤⊤-closed) term relation. The idea is to define two terms to be related under world W if they co-terminate (both converge or both diverge) when evaluated under heaps that satisfy W and under continuations K1 and K2 related under W. The latter (continuation relatedness) is then defined to mean that, for any future world W′ of W, the continuations K1 and K2 co-terminate when applied (under heaps that satisfy W′) to values that are related under W′. In this way, the logical relation for values is lifted to a logical relation for terms by a kind of CPS transform.

The main arguable advantage of the direct-style term relation is that its definition is perhaps more intuitive, corresponding closely to the proof sketches of the sort that we will present informally in the sections that follow. That said, in any language for which a direct-style relation is sound, it is typically possible to start instead with a biorthogonal relation and then prove a direct-style proof principle—e.g., Pitts and Stark's "principle of local invariants" [28]—as a corollary.

The advantages of the biorthogonal approach are clearer. First, it automagically renders the logical relation complete with respect to observational equivalence, largely irrespective of the particular features in the language under consideration. (Actually, it is not so magical: ⊤⊤-closure is essentially a kind of closure under observational equivalence.) Second, and perhaps more importantly, the biorthogonal approach scales to handle languages with first-class continuations, such as our HOSC and FOSC, which the direct style doesn't. The reason for this is simple: the direct-style approach is only sound if the evaluation of terms is independent of the continuation under which they are evaluated. If the terms' behavior is context-dependent, then it does not suffice to consider their co-termination under the empty continuation, which is effectively what the direct-style term relation does. Rather, it becomes necessary to consider co-termination of whole programs (terms together with their continuations), as the biorthogonal relation does. Thus, in this paper we adopt the biorthogonal approach.
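Schematically, the biorthogonal lifting just described has the following shape (again our gloss, with step indices suppressed):

E[τ] = { (W, e1, e2) | ∀(W, K1, K2) ∈ K[τ]. ∀h1, h2 satisfying W. ⟨h1; K1[e1]⟩↓ ⟺ ⟨h2; K2[e2]⟩↓ }
K[τ] = { (W, K1, K2) | ∀W′ ⊒ W. ∀(W′, v1, v2) ∈ V[τ]. ∀h1, h2 satisfying W′. ⟨h1; K1[v1]⟩↓ ⟺ ⟨h2; K2[v2]⟩↓ }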
This enables us to easily adapt all the proofs from the ADR paper (save for one) to also work for a language with call/cc. (The one exception is the "callback with lock" equivalence, which simply doesn't hold in the presence of call/cc.) It is worth noting that, although the kinds of example programs we focus on in this paper do not involve abstract types, a number of the ADR examples do. Additionally, we can prove equivalences involving programs that manipulate both call/cc and higher-order state.
A well-known challenging example of such an equivalence is the correctness of Friedman and Haynes' encoding of call/cc via "one-shot" continuations (continuations that can only be invoked once) [11, 34]. The basic idea of the encoding is to model an unrestricted continuation using a private (local) ref cell that contains a one-shot continuation. Every time the continuation is invoked, the ref cell is updated with a fresh one-shot continuation. With biorthogonal logical relations, the proof of this example is completely straightforward, employing just a simple invariant on the private ref cell. As far as we know, though, this proof is novel. Full details are given in the technical appendix [8].

5. Reasoning in the Absence of Call/cc

In this section, we examine some reasoning principles that are enabled by removing call/cc from our language. Consider this variant of the "awkward" example (from ADR):

    τ  = (unit → unit) → int
    e1 = let x = ref 0 in λf. (x := 0; f ⟨⟩; x := 1; f ⟨⟩; !x)
    e2 = λf. (f ⟨⟩; f ⟨⟩; 1)

What has changed is that now the callback is run twice, and in e1, the first call to f is preceded by the assignment of x to 0, not 1. It is easy to see that e1 and e2 are not equivalent in HOSC (or even FOSC). In particular, here is a distinguishing context C:

    let g = • in
    let b = ref ff in
    let f = (λ_. if !b then call/cc (k. g (λ_. throw ⟨⟩ to k)) else b := tt) in
    g f

Exploiting its ability to capture the continuation K of the second call to f, the context C is able to set x back to 0 and then immediately throw control back to K. It is easy to verify that C[e1] yields 0, while C[e2] yields 1.
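For concreteness, the distinguishing context can be transcribed into runnable form using Haskell's ContT monad in place of the paper's native call/cc (a sketch under that encoding; the names mkE1 and context, and the use of IORef for the local cells, are our choices rather than anything from the paper):

    import Control.Monad.Cont
    import Control.Monad.IO.Class (liftIO)
    import Data.IORef

    type M a = ContT Int IO a        -- computations whose final answer is an int

    mkE1 :: IO ((() -> M ()) -> M Int)   -- e1, closing over its local cell x
    mkE1 = do
      x <- newIORef (0 :: Int)
      return $ \f -> do
        liftIO (writeIORef x 0); f ()
        liftIO (writeIORef x 1); f ()
        liftIO (readIORef x)

    e2 :: (() -> M ()) -> M Int
    e2 f = f () >> f () >> return 1

    -- The context C: on the second call to f, capture the continuation k
    -- and re-enter g with a callback that throws straight back to k.
    context :: ((() -> M ()) -> M Int) -> IO Int
    context g = do
      b <- newIORef False
      let f () = do
            seen <- liftIO (readIORef b)
            if seen
              then callCC $ \k -> do _ <- g (\() -> k ()); return ()
              else liftIO (writeIORef b True)
      runContT (g f) return

    main :: IO ()
    main = do
      v1 <- mkE1
      r1 <- context v1     -- yields 0: the re-entry resets x to 0
      r2 <- context e2     -- yields 1
      print (r1, r2)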
In the absence of call/cc, however, computations are "well-bracketed". Here, this means that whenever x is set to 0, it will eventually be set to 1—no matter what the callback function does. Consequently, it seems intuitively clear that these programs are equivalent in HOS (and FOS), but how do we prove it? The STS model we have developed so far will clearly not do the job, precisely because that model is compatible with call/cc and this example is not. So the question remains: how can we augment the power of our STS's so that they take advantage of well-bracketing?

To see how to answer this question, let's see what goes wrong if we try to give an STS for our well-bracketed equivalence. First, recall the STS (from Section 3) that we used in order to prove the original awkward example. To see why this STS is insufficient for our present purposes, suppose the function value resulting from evaluating e1—call it v1—is applied in the x ֒→ 1 state.² The first thing that happens is that x is set to 0. However, as there is no transition from the x ֒→ 1 state to the x ֒→ 0 state, there is no way we can continue the proof. So how about adding that transition?

² When proving functions logically related, we must consider the possibility that they are invoked in an arbitrary "future" world—i.e., a world where our STS may be in any state that is reachable from its initial state. This ensures monotonicity of the logical relation (Theorem 1, Section 7.1).

    [STS diagram: states x ֒→ 0 and x ֒→ 1, now with transitions in both directions]

While adding the transition from x ֒→ 1 to x ֒→ 0 clears the first hurdle, it also erects a new one: according to the STS, it is now possible that, after the second call to f, we end up in the left state—even though this situation (x pointing to 0 after that call) cannot actually arise in reality. And indeed, if x could point to 0 at that point, our proof would be doomed. In summary, while we would like to add this transition, we also want to keep the context from using it. This is where private transitions come in.

5.1 Private Transitions

Private transitions are a new class of transitions in our state transition systems, separate from the ordinary transitions that we have seen so far (and which we henceforth call public transitions). The basic idea is very simple: when reasoning about the relatedness of terms, we must show that—when viewed extensionally—they appear only to be making public transitions, and correspondingly we may assume that the context only makes public transitions as well. Internally, however, within a computation, we may make use of both public and private transitions. Concretely, we can use the following STS to prove our running example (where the dashed arrow denotes a private transition):

    [STS diagram: a public transition from x ֒→ 0 to x ֒→ 1, and a private (dashed) transition from x ֒→ 1 back to x ֒→ 0]

First, if v1 is called in the starting state x ֒→ 1, the presence of the private transition allows us to "lawfully" transition from x ֒→ 1 to x ֒→ 0. Second, we know that, because we are in the x ֒→ 1 state before the second call to f and there is no public transition from there to any other state, we must still be in that same state when f returns. Hence we know that x points to 1 at that point, as desired. Lastly, although the body of v1 makes a private transition internally (when called in starting state x ֒→ 1), it appears extensionally to make a public transition, since its final state (x ֒→ 1) is obviously publicly accessible from whichever state was the initial one.

Private transitions let us prove not only this example, but also several others from the literature that hold exclusively in the absence of call/cc (including Pitts and Stark's "higher-order profiling" example [28]—see the appendix [8] for details). The intuitive reason why private transitions "don't work" with call/cc is that, in the presence of call/cc, every time we pass control to the context may be the last! Therefore, the requirement that the extensional behavior of a term must appear like a public transition would essentially imply that every internal transition must be public as well.

The "Callback with Lock" Example  Here is another equivalence (from ADR) that holds in HOS but not in HOSC. Interestingly, this example was provable in the original ADR model, but only through some complex step-index hackery. The proof we are about to sketch is much cleaner and easier to understand. Consider the following two encodings of a counter object with two methods: an increment function that also takes a callback argument, which it invokes, and a poll function that returns the current counter value.

    C  = let b = ref tt in let x = ref 0 in
         ⟨λf. if !b then (b := ff; •; b := tt) else ⟨⟩, λ_. !x⟩
    τ  = ((unit → unit) → unit) × (unit → int)
    e1 = C[f ⟨⟩; x := !x + 1]
    e2 = C[let n = !x in f ⟨⟩; x := n + 1]

Note that in the second program the counter x is dereferenced before the callback is executed, and in the first program it is dereferenced after. In both programs, a Boolean lock b guards the increment of the counter, thereby enforcing that running the callback will not result in any change to the counter. It is not hard to construct a context that exploits the combination of call/cc and higher-order state in order to distinguish e1 and e2.
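To make the locking discipline concrete, the two encodings can be transcribed into Haskell as follows (a sketch; mkCounter and the Bool flag selecting between the two variants are our own devices, not the paper's):

    import Data.IORef

    mkCounter :: Bool -> IO (IO () -> IO (), IO Int)   -- (increment, poll)
    mkCounter snapshotFirst = do
      b <- newIORef True              -- the lock; True plays the role of tt
      x <- newIORef (0 :: Int)
      let inc f = do
            unlocked <- readIORef b
            if unlocked
              then do
                writeIORef b False
                if snapshotFirst
                  then do n <- readIORef x; f; writeIORef x (n + 1)  -- e2
                  else do f; n <- readIORef x; writeIORef x (n + 1)  -- e1
                writeIORef b True
              else return ()          -- reentrant calls hit the lock
          poll = readIORef x
      return (inc, poll)

Without call/cc, any reentrant call to inc that the callback f makes simply takes the locked else-branch, so the snapshot n and the value of x after the callback agree; this is precisely the invariant that the "locked" states of the STS below express.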
The basic idea is to pass the increment method a callback that captures its current continuation and stores that in a ref cell so it can be invoked later. The definition of this distinguishing context appears in the appendix [8]. In the absence of call/cc, however, the two programs are equivalent. To prove this, we employ the following infinite STS:

    [Infinite STS diagram: for each n, an "unlocked" state (b ֒→ tt, x ֒→ n) above a "locked" state (b ֒→ ff, x ֒→ n); a public transition leads downwards from each unlocked state to its locked state, and a private transition leads from each locked state (b ֒→ ff, x ֒→ n) to the next unlocked state (b ֒→ tt, x ֒→ n+1)]
For each number n there are two states: one (the "unlocked" state) saying that b points to tt and x points to n in both programs, and another (the "locked" state) saying that b points to ff and x points to n in both programs. It is thus easy to see that the two poll methods are related (they return the same number). To show the increment methods related, suppose they are executed in a state where x points to some m and b points to tt (the other case where b ֒→ ff is trivial). Before invoking the callback, b is set to ff and, in the second program, n is bound to m. Accordingly, we move "downwards" in our STS to the locked state and can then call f. Because that state does not have any other public successors, we will still be there if and when f returns—indeed, this is the essence of what it means to be a "locked" state. In the first program, x is then incremented, i.e., set to m + 1. In the second program, x is set to n + 1 = m + 1. Finally, b is set back to tt and we thus move to the matching private successor (b ֒→ tt, x ֒→ m + 1) in the STS. Since this is a public successor of the initial state (b ֒→ tt, x ֒→ m), our extensional transition appears public and we are done.

5.2 Inconsistent States

While private transitions are clearly a useful extension to our STS model, there is one kind of "well-bracketed example" we are aware of that private transitions alone are insufficient to account for. We are referring to the "deferred divergence" example, presented by ADR as an example they could not handle. The original version of this equivalence, due to O'Hearn [26], was presented in the setting of Idealized Algol, and it does not hold in the presence of higher-order state. (We will consider a variant of O'Hearn's example later on, in Section 6.) Here, we consider a version of the equivalence that does hold in HOS, based on the one in Bohr's thesis [7]:

    τ  = ((unit → unit) → unit) → unit
    e1 = let x = ref ff in let y = ref ff in
         λf. f (λ_. if !x then ⊥ else y := tt);
             if !y then ⊥ else x := tt
    e2 = λf. f (λ_. ⊥)
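For readers who want to run the two programs, here is a direct transcription into Haskell IO (a sketch; bottom for the divergent term ⊥, the name mkDeferred, and the thunk-as-IO-action encoding are our choices):

    import Data.IORef

    bottom :: IO a
    bottom = bottom                  -- the divergent term ⊥

    -- e1: divergence is deferred through the two private flags x and y.
    mkDeferred :: IO ((IO () -> IO ()) -> IO ())
    mkDeferred = do
      x <- newIORef False
      y <- newIORef False
      return $ \f -> do
        f (do xv <- readIORef x
              if xv then bottom else writeIORef y True)
        yv <- readIORef y
        if yv then bottom else writeIORef x True

    -- e2: the thunk diverges immediately if it is ever applied.
    e2 :: (IO () -> IO ()) -> IO ()
    e2 f = f bottom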
Intuitively, the explanation why e1 and e2 are equivalent goes as follows. The functions returned by both programs take a higher-order callback f as an argument and apply it to a thunk. In the case of e2, if that thunk argument (λ_. ⊥, where ⊥ is a divergent term) is ever applied, either during the call to f or at some point in the future (e.g., if the thunk were stored by f in a ref cell and then called later), then the program will clearly diverge. Now, e1 implements the same divergence behavior, but in a rather sneaky way. It maintains two private flags x and y, initially set to ff. If the thunk that it passes to f is applied during the call to f, then the thunk's body will not immediately diverge (as in the case of e2), but rather merely set y to tt. Then, if and when f returns, e1 will check if y points to tt and, if so, diverge. If the thunk was not applied during the call to f, then e1 will set x to tt, thus ensuring that any future attempt to apply the thunk will diverge as well.

As in the previous examples, note that this equivalence does not hold in the presence of call/cc. Here is a distinguishing context:

    call/cc (k. • (λg. throw g ⟨⟩ to k))

To prove the equivalence in HOS, we can split the proof into two directions of approximation. Proving that e2 approximates e1 is actually very easy because (1) it is trivial to show that λ_. ⊥ approximates the thunk that e1 passes to f, and (2) if a program C[e2] terminates (which is the assumption of observational approximation), then C[e1] must in fact maintain the invariant that y ֒→ ff, and using that invariant the proof is totally straightforward.

In contrast, the other direction of approximation seems at first glance impossible to prove using logical relations. The issue is that we have to show that the thunks passed to the callback f are related, i.e., that λ_. if !x then ⊥ else y := tt approximates λ_. ⊥, which obviously is false since, when applied (as they may be) in a state where x points to ff, the first converges while the second diverges.

To solve this conundrum, we do the blindingly obvious thing, which is to introduce falsehood into our model! Specifically, we extend our STS's with inconsistent states, in which we can prove false things, such as that a terminating computation approximates a divergent one. How, one may ask, can this possibly work? The idea is as follows: when we enter an inconsistent state, we effectively shift the proof burden from the logical relation for terms to the logical relation for continuations. That is, while it becomes very easy to prove that two terms are related in an inconsistent state, it becomes very hard to prove that two continuations K1 and K2 are related in such a state—in most cases, we will be forced to prove that K1 diverges. Thus, while inconsistent states do allow a limited kind of falsehood inside an approximation proof, we can only enter into them if we know that the continuation of the term on the left-hand side of the approximation will diverge anyway.

Concretely, to show that e1 approximates e2, we construct the following STS, where the diamond indicates an inconsistent state:

    [STS diagram: four states — a top-left state (x ֒→ ff, y ֒→ ff), a top-right state (x ֒→ ff, y ֒→ ff), a bottom-left state (x ֒→ tt, y ֒→ ff), and an inconsistent (diamond) bottom-right state (x ֒→ ff, y ֒→ tt); a private (dashed) transition leads from the top-left state to the top-right state, and public transitions lead from the top-right state to the bottom-left and bottom-right states, and from the top-left state to the bottom-left state]

For the moment, ignore the top-left state (we explain it below). In the proof, we wish to show that the thunks passed to the callback f are logically related in the top-right state, which requires showing that they are related in any state accessible from it. Fortunately, this is easy. If the thunks are called in the bottom-left state, then they both diverge. If they are called in the top-right or bottom-right state, then the else-branch is executed (in the first program) and we move to (or stay in) the bottom-right state—since this state is inconsistent, the proof is trivially done. Dually, we must show that the continuations of the callback applications are also related in any state (publicly) accessible from the top-right one. If the continuations are invoked in the top-right or the bottom-left state, they will set x to tt, thereby transitioning to the bottom-left. If, on the other hand, they are invoked in the inconsistent bottom-right state, then we are required to show that the first one diverges, which fortunately it will since y points to tt.
Now about the top-left state, whose heap constraint is identical to the one in the top-right state: the reason for including this state has to do with soundness of the logical relation. In order to ensure soundness, we require that when an STS is installed in the possible world, it may not contain any inconsistent states that are publicly accessible from its starting state. We say in this case that the starting state is safe. (Without this safety restriction, it would be easy to show, for instance, that tt approximates ff in any world W by simply adding an STS to W with a single inconsistent state.) To circumvent this restriction, we use the top-left state as our starting state and connect it to the top-right state by a private transition. (In the proof, the first step before invoking the callbacks is to transition into the top-right state.) This is fine so long as the extensional behavior of the functions we are relating makes a public transition, and here it does—if they are invoked in the top-left state, then either they diverge or they return control in the bottom-left state, which is publicly accessible from the top-left.

6. Reasoning With First-Order State

In this section, we consider an orthogonal restriction to the one examined in the previous section. Instead of removing call/cc from the language, what happens if we restrict state to be first-order? What new reasoning principles are enabled by this restriction?

6.1 Backtracking

Recall the "callback with lock" example from Section 5.1, which we proved equivalent in HOS. As it turns out, that equivalence also holds in FOSC. Of course, we won't be able to prove that using the HOSC model since the equivalence doesn't hold in HOSC. But let us see what exactly goes wrong if we try. First of all, recall the use of private transitions in our earlier proof. Due to call/cc, we cannot use any private transitions this time. Clearly, making them public is not an option, so what if we just drop them entirely?

    [STS diagram: as before, unlocked states (b ֒→ tt, x ֒→ n) with public transitions down to the corresponding locked states (b ֒→ ff, x ֒→ n), but with the private transitions removed]

In the resulting STS, we still know that running the callback in a locked state (b ֒→ ff, x ֒→ m) will leave us in the very same state if and when it returns. However, without any outgoing (private) transition from that state, it seems that we are subsequently stuck. Fortunately, we are not. The insight now is that the absence of higher-order state allows us to do backtracking within our STS. Concretely, we can backtrack from the locked state to the unlocked state we were in before (b ֒→ tt, x ֒→ m), and then transition (publicly) to its successor (b ֒→ tt, x ֒→ m + 1). Intuitively, this kind of backtracking would not be sound in the presence of higher-order state because, in that setting, the callback might have stored some higher-order data during its execution (such as functions or continuations) that are only logically related in the locked state and its successors.³ Since (b ֒→ tt, x ֒→ m + 1) is not a successor of the previous locked state, the final heaps would then fail to satisfy the final world in which the increment functions return. Here in the first-order setting, though, there is no way for the callback to store such higher-order data, so backtracking is not a problem. A precise technical explanation of how the model is changed to allow backtracking, and why this is sound, will be given in Section 7.3.

³ Indeed, the context that distinguishes between the two programs in HOSC employs precisely such a callback, namely one that stores its current continuation in a ref cell.

6.2 Putting It Together

The example we just looked at might suggest that backtracking is mainly useful as a replacement for private transitions in the presence of call/cc. But in fact, they are complementary techniques. In particular, for equivalences that hold only in FOS but not in HOS or FOSC, we can profitably employ backtracking, private transitions, and inconsistent states, all working together. Consider this simpler version of the "deferred divergence" example, based closely on an example of O'Hearn [26]:

    τ  = ((unit → unit) → unit) → unit
    e1 = let y = ref ff in λf. f (λ_. y := tt); if !y then ⊥ else ⟨⟩
    e2 = λf. f (λ_. ⊥)
These programs are not only distinguishable in the setting of FOSC (by the same distinguishing context as given in Section 5.2), but also in HOS, as the following context demonstrates:

    C = let r = ref (λ_. ⟨⟩) in • (λg. r := g); !r ⟨⟩
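This context, too, is easy to transcribe and run against sketches of the two programs (our transcription, with thunks encoded as IO actions as before; note how the reference cell r smuggles the thunk past the call):

    import Control.Monad (join)
    import Data.IORef

    -- C = let r = ref (λ_. ⟨⟩) in • (λg. r := g); !r ⟨⟩
    distinguish :: ((IO () -> IO ()) -> IO ()) -> IO ()
    distinguish e = do
      r <- newIORef (return ())      -- r : ref (unit -> unit)
      e (\g -> writeIORef r g)       -- the callback stashes its thunk argument
      join (readIORef r)             -- ...which is only invoked afterwards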
It is easy to verify that C[e1] terminates, while C[e2] diverges. The two programs are, however, equivalent in FOS, which we can prove using the following STS:

    [STS diagram: three states — a left state (y ֒→ ff), a middle state (y ֒→ ff), and an inconsistent (diamond) right state (y ֒→ tt); a private (dashed) transition leads from the left state to the middle state, and a public transition from the middle state to the right state]

The proof is largely similar to (if a bit simpler than) the one sketched for the higher-order version of this example in Section 5.2. We start in the left state and transition immediately along the private transition to the middle state. With the help of the inconsistent right state, it is easy to show that the thunk arguments passed to the callback are related in the middle state. Hence, when the callback returns, we are either in the right state or the middle state. In the former case, we must show that the continuation in the l.h.s. program diverges; in the latter, we backtrack to the initial, left state, which is of course publicly accessible from itself. (We will present this proof in more detail below, in Section 7.5.)

Why, one might ask, is it not possible to avoid the use of backtracking here by adding a private transition back from the middle state to the left state? (Of course, it must not be possible, or else the equivalence would hold true in HOS, which as we have seen it does not.) The answer is that, if we were to add such a transition, then we would not be able to prove that the thunk arguments to the callback f were logically related in the middle state. Specifically, in order to show the latter, we must show that the thunks are related in any state accessible (by any kind of transition) from the middle state. So if there were any transition from the middle to the left state, we would have to show that the thunks were related starting in the left state as well—but they are not, because there is no public transition from the initial left state to the inconsistent right state, and adding one would be unsound.

7. Technical Development

We now present the models for our various languages formally. It is easiest to start with the model for HOS, and then show how small changes to that yield the models for HOSC, FOS, and FOSC.
    HeapAtomn         = {(W, h1, h2) | W ∈ Worldn}
    HeapReln          = {ψ ⊆ HeapAtomn | ∀(W, h1, h2) ∈ ψ. ∀W′ ⊒ W. (W′, h1, h2) ∈ ψ}
    Islandn           = {ι = (s, δ, ϕ, I, H) | s ∈ State ∧ δ ⊆ State² ∧ ϕ ⊆ δ ∧ δ, ϕ reflexive ∧
                         δ, ϕ transitive ∧ I ⊆ State ∧ H ∈ State → HeapReln}
    Worldn            = {W = (k, Σ1, Σ2, ω) | k < n ∧ ∃m. ω ∈ (Islandk)^m}
    ContAtomn[τ1, τ2] = {(W, K1, K2) | W ∈ Worldn ∧ W.Σ1; ·; · ⊢ K1 ÷ τ1 ∧ W.Σ2; ·; · ⊢ K2 ÷ τ2}
    TermAtomn[τ1, τ2] = {(W, e1, e2) | W ∈ Worldn ∧ W.Σ1; ·; · ⊢ e1 : τ1 ∧ W.Σ2; ·; · ⊢ e2 : τ2}
    ValRel[τ1, τ2]    = {r ⊆ TermAtomval[τ1, τ2] | ∀(W, v1, v2) ∈ r. ∀W′ ⊒ W. (W′, v1, v2) ∈ r}
    SomeValRel        = {R = (τ1, τ2, r) | r ∈ ValRel[τ1, τ2]}

    ⌊(ι1, . . . , ιm)⌋k  = (⌊ι1⌋k, . . . , ⌊ιm⌋k)
    ⌊(s, δ, ϕ, I, H)⌋k   = (s, δ, ϕ, I, ⌊H⌋k)
    ⌊H⌋k                 = λs. ⌊H(s)⌋k
    ⌊ψ⌋k                 = {(W, h1, h2) ∈ ψ | W.k < k}
    ⊲(k + 1, Σ1, Σ2, ω)  = (k, Σ1, Σ2, ⌊ω⌋k)
    ⊲r                   = {(W, e1, e2) | W.k > 0 ⟹ (⊲W, e1, e2) ∈ r}

    (k′, Σ′1, Σ′2, ω′) ⊒ (k, Σ1, Σ2, ω)       iff  k′ ≤ k ∧ Σ′1 ⊇ Σ1 ∧ Σ′2 ⊇ Σ2 ∧ ω′ ⊒ ⌊ω⌋k′
    (ι′1, . . . , ι′m′) ⊒ (ι1, . . . , ιm)     iff  m′ ≥ m ∧ ∀j ∈ {1, . . . , m}. ι′j ⊒ ιj
    (s′, δ′, ϕ′, I′, H′) ⊒ (s, δ, ϕ, I, H)    iff  (δ′, ϕ′, I′, H′) = (δ, ϕ, I, H) ∧ (s, s′) ∈ δ

    (k′, Σ′1, Σ′2, ω′) ⊒pub (k, Σ1, Σ2, ω)    iff  k′ ≤ k ∧ Σ′1 ⊇ Σ1 ∧ Σ′2 ⊇ Σ2 ∧ ω′ ⊒pub ⌊ω⌋k′
    (ι′1, . . . , ι′m′) ⊒pub (ι1, . . . , ιm)  iff  m′ ≥ m ∧ ∀j ∈ {1, . . . , m}. ι′j ⊒pub ιj ∧
                                                    ∀j ∈ {m + 1, . . . , m′}. safe(ι′j)
    (s′, δ′, ϕ′, I′, H′) ⊒pub (s, δ, ϕ, I, H) iff  (δ′, ϕ′, I′, H′) = (δ, ϕ, I, H) ∧ (s, s′) ∈ ϕ

    safe(ι)        = ∀s′. (ι.s, s′) ∈ ι.ϕ ⟹ s′ ∉ ι.I
    safe(W)        = ∀ι ∈ W.ω. safe(ι)
    consistent(W)  = ∄ι ∈ W.ω. ι.s ∈ ι.I

    ψ ⊗ ψ′      = {(W, h1 ⊎ h′1, h2 ⊎ h′2) | (W, h1, h2) ∈ ψ ∧ (W, h′1, h′2) ∈ ψ′}
    (h1, h2) : W = ⊢ h1 : W.Σ1 ∧ ⊢ h2 : W.Σ2 ∧ (W.k > 0 ⟹ (⊲W, h1, h2) ∈ ⊗{i=1..|W.ω|} W.ω(i).H(W.ω(i).s))
Figure 2. Worlds and Auxiliary Definitions

7.1 HOS
As described in Section 3, we employ a step-indexed Kripke logical relation, which is a kind of possible-worlds model.

Worlds  Figure 2 displays the construction of worlds, along with various related operations and relations.⁴ Worlds W consist of a step index k, heap typings Σ1 and Σ2 (for the first and second programs, respectively), and an array of islands ω = ι1, . . . , ιn. Islands in turn are (possibly infinite) state transition systems governing disjoint pieces of the heap. Each consists of a current state s, a transition relation δ, a public transition relation ϕ, a set of inconsistent states I, and last but not least, a mapping H from states to heap constraints (in the form of world-indexed heap relations—more on that below). The public transition relation ϕ must be a subset of the "full" transition relation δ (note: the private transitions are obtained by subtracting ϕ from δ), and we require both δ and ϕ to be reflexive and transitive.

⁴ Here and in the following development we use the dot-notation to project components out of a structure. As an example, we write W.Σ1 to extract the first heap typing out of a world W.

What exactly "states" s are—i.e., how we define the state space State—does not really matter. That is, State is essentially a parameter of the model, except that it needs to be at least large enough to encode bijections on memory locations (see our relational interpretation of ref types below). For our purposes, we find it convenient to assume that State contains all terms and all sets of terms. Also, note that while an island's H map is defined on all states in State, we typically only care about how it is defined on a particular set of "states of interest"—whether there is other junk in the State space is irrelevant.

Our use of step-indexing to stratify the construction of worlds and to define the logical relation by a primary induction on natural numbers follows the development in ADR quite closely. For space reasons, we therefore omit explanation of the approximation operation ⌊·⌋k, the "later" operator ⊲, and other step-related technicalities and refer the interested reader to the literature [3, 9]. One point about notation, though: we sometimes write World to mean ⋃n Worldn, and similarly for the other semantic classes.

Based on the two transition relations (full and public), we define two notions of future worlds (aka world extension). First, we say that W′ extends W, written W′ ⊒ W, iff it contains the same islands as W (and possibly more), and for each island in W, the new state s′ of that island in W′—which is the only aspect of the island that is permitted to change in future worlds—is accessible from the old state s in W, according to the island's full transition relation δ. Public extension, written W′ ⊒pub W, is defined analogously, except using the public transition relation ϕ instead of δ, and with the additional requirement that the new islands (those in W′ but not in W) must be safe. An island is safe iff there is no public transition from its current state to any inconsistent state.

The reason why our (and ADR's) heap relations are world-indexed is that, when expressing heap constraints, we want to be able to say, for instance, that a value in the first heap must be logically related to a value in the second heap. In that case, we need to have some way of talking about the "current" world under which that logical relation should be considered, and by world-indexing the heap relations we enable the current world to be passed in as a parameter. These world-indexed heap relations are quite restricted, however. Specifically, they must be monotone with respect to world extension, meaning that heaps related in one world will continue to be related in any future world. This ensures that adding a new island to the world, or making (any kind of) transition within an existing island, does not violate the heap constraints of other islands.

The last two definitions also concern heap relations. Two heaps h1 and h2 satisfy a world W, written (h1, h2) : W, iff they can be split into disjoint subheaps such that for each island in W there is a subheap of h1 and a corresponding subheap of h2 that are related by that island's current heap relation (the relation associated with the island's current state). A heap relation ψ is the tensor of ψ′ and ψ′′, written ψ′ ⊗ ψ′′, if it contains all (W, h1, h2) that can be split into disjoint parts (W, h′1, h′2) ∈ ψ′ and (W, h′′1, h′′2) ∈ ψ′′.
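To fix intuitions, here is a small executable rendering of an island's transition structure (our own toy encoding over an Int state space; the heap-relation component H is elided, and finite edge lists stand in for the reflexive, transitive relations δ and ϕ):

    -- A toy island: current state, full and public transition edges,
    -- and the set I of inconsistent states.
    data Island = Island
      { cur    :: Int
      , delta  :: [(Int, Int)]   -- δ
      , phi    :: [(Int, Int)]   -- ϕ ⊆ δ
      , incons :: [Int]          -- I
      }

    -- safe(ι): no public edge from the current state into I.
    safe :: Island -> Bool
    safe i = and [ s' `notElem` incons i | (s, s') <- phi i, s == cur i ]

    -- consistent: the current state itself is not in I.
    consistent :: Island -> Bool
    consistent i = cur i `notElem` incons i

    -- The three-state island used in the proof of Section 7.5:
    example :: Island
    example = Island { cur = 1, delta = [(1,2),(2,3)], phi = [(2,3)], incons = [3] }

Here safe example holds, since the only public edge leaves state 2, not the current state 1.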
    V⟦α⟧ρ       = ρ(α).r
    V⟦b⟧ρ       = {(W, v, v) ∈ TermAtom[b, b]}
    V⟦τ × τ′⟧ρ  = {(W, ⟨v1, v′1⟩, ⟨v2, v′2⟩) ∈ TermAtom[ρ1(τ × τ′), ρ2(τ × τ′)] |
                   (W, v1, v2) ∈ V⟦τ⟧ρ ∧ (W, v′1, v′2) ∈ V⟦τ′⟧ρ}
    V⟦τ′ → τ⟧ρ  = {(W, λx:τ1. e1, λx:τ2. e2) ∈ TermAtom[ρ1(τ′ → τ), ρ2(τ′ → τ)] |
                   ∀W′, v1, v2. W′ ⊒ W ∧ (W′, v1, v2) ∈ V⟦τ′⟧ρ ⟹ (W′, e1[v1/x], e2[v2/x]) ∈ E⟦τ⟧ρ}
    V⟦∀α. τ⟧ρ   = {(W, Λα. e1, Λα. e2) ∈ TermAtom[ρ1(∀α. τ), ρ2(∀α. τ)] |
                   ∀W′ ⊒ W. ∀(τ1, τ2, r) ∈ SomeValRel. (W′, e1[τ1/α], e2[τ2/α]) ∈ E⟦τ⟧ρ, α ↦ (τ1, τ2, r)}
    V⟦∃α. τ⟧ρ   = {(W, pack ⟨τ1, v1⟩ as τ′1, pack ⟨τ2, v2⟩ as τ′2) ∈ TermAtom[ρ1(∃α. τ), ρ2(∃α. τ)] |
                   ∃r. (τ1, τ2, r) ∈ SomeValRel ∧ (W, v1, v2) ∈ V⟦τ⟧ρ, α ↦ (τ1, τ2, r)}
    V⟦μα. τ⟧ρ   = {(W, rollτ1 v1, rollτ2 v2) ∈ TermAtom[ρ1(μα. τ), ρ2(μα. τ)] |
                   (W, v1, v2) ∈ ⊲V⟦τ[μα. τ/α]⟧ρ}
    V⟦ref τ⟧ρ   = {(W, l1, l2) ∈ TermAtom[ρ1(ref τ), ρ2(ref τ)] | ∃i. ∀W′ ⊒ W. (l1, l2) ∈ bij(W′.ω(i).s) ∧
                   ∃ψ. W′.ω(i).H(W′.ω(i).s) = ψ ⊗ {(W̃, {l1 ↦ v1}, {l2 ↦ v2}) ∈ HeapAtom | (W̃, v1, v2) ∈ V⟦τ⟧ρ}}
    O           = {(W, e1, e2) | ∀h1, h2. (h1, h2) : W ∧ ⟨h1; e1⟩ ↓<W.k ⟹ consistent(W) ∧ ⟨h2; e2⟩ ↓}
    K⟦τ⟧ρ       = {(W, K1, K2) ∈ ContAtom[ρ1(τ), ρ2(τ)] |
                   ∀W′, v1, v2. W′ ⊒pub W ∧ (W′, v1, v2) ∈ V⟦τ⟧ρ ⟹ (W′, K1[v1], K2[v2]) ∈ O}
    E⟦τ⟧ρ       = {(W, e1, e2) ∈ TermAtom[ρ1(τ), ρ2(τ)] |
                   ∀K1, K2. (W, K1, K2) ∈ K⟦τ⟧ρ ⟹ (W, K1[e1], K2[e2]) ∈ O}
    G⟦·⟧ρ       = {(W, ∅) | W ∈ World}
    G⟦Γ, x:τ⟧ρ  = {(W, (γ, x ↦ (v1, v2))) | (W, γ) ∈ G⟦Γ⟧ρ ∧ (W, v1, v2) ∈ V⟦τ⟧ρ}
    D⟦·⟧        = {∅}
    D⟦Δ, α⟧     = {(ρ, α ↦ R) | ρ ∈ D⟦Δ⟧ ∧ R ∈ SomeValRel}
    S⟦·⟧        = World
    S⟦Σ, l:τ⟧   = S⟦Σ⟧ ∩ {W ∈ World | (W, l, l) ∈ V⟦ref τ⟧∅}

    Σ; Δ; Γ ⊢ e1 ≾log e2 : τ  =  Σ; Δ; Γ ⊢ e1 : τ ∧ Σ; Δ; Γ ⊢ e2 : τ ∧
                                 ∀W, ρ, γ. W ∈ S⟦Σ⟧ ∧ ρ ∈ D⟦Δ⟧ ∧ (W, γ) ∈ G⟦Γ⟧ρ ⟹ (W, ρ1γ1e1, ρ2γ2e2) ∈ E⟦τ⟧ρ
Figure 3. A Step-Indexed Biorthogonal Kripke Logical Relation for HOS
Logical Relation  Our logical relation for HOS is defined in Figure 3. The value relation V⟦τ⟧ρ (where fv(τ) ⊆ dom(ρ)) is fairly standard. The only real difference from the ADR model is in V⟦ref τ⟧ρ, our interpretation of reference types. Basically, we say that two references l1 and l2 are logically related at type ref τ in world W if there exists an island ι in W, such that (1) ι's heap constraint (in any reachable state) requires of l1 and l2 precisely that their contents are related at type τ, and (2) the reachable states in ι encode a bijection between locations that includes the pair (l1, l2). The latter condition, which employs an auxiliary "bij" function (defined in the appendix [8]), is needed in order to model the presence of reference equality testing l1 == l2 in the language. Our formulation of V⟦ref τ⟧ρ is slightly different from ADR's and a bit more flexible—e.g., ours can be used to prove Bohr's "local state release" example [7] (see the appendix), whereas ADR's can't—but this added flexibility does not affect any of our "headlining" examples from Sections 3–6. We will report on the advantages of our present formulation in a future, extended version of this paper.

In logical relations proofs, we frequently assume that we are given some related values (e.g., as inputs to functions), and we want them to be still related after we have added an island to the world or made a transition. It is therefore crucial that, like heap relations, value relations are monotone w.r.t. world extension. Since we enforce this property for relational interpretations of abstract types (see the definition of ValRel in Figure 2), it is easy to show that the value relation indeed has this property:

Theorem 1 (Monotonicity of the Value Relation). If W′ ⊒ W and (W, v1, v2) ∈ V⟦τ⟧ρ, then (W′, v1, v2) ∈ V⟦τ⟧ρ.

As explained in Section 4, the value relation is lifted to a term relation via biorthogonality. Concretely, we define the continuation relation K⟦τ⟧ρ based on V⟦τ⟧ρ, and then the term relation E⟦τ⟧ρ based on K⟦τ⟧ρ:
• Two continuations are related iff they yield related observations when applied to related values.
• Two terms are related iff they yield related observations when evaluated under related continuations.
Yielding related observations here means (see the definition of O) that, whenever two heaps satisfy the world W in question and the first program terminates in the first heap (within W.k steps), then the second program terminates in the second heap and the world is consistent (i.e., no island is in an inconsistent state). This corresponds to the intuition given in Section 5.2 that an inconsistent world is one in which the first program diverges.

Notice that the continuation relation quantifies only over public future worlds. This captures the essential idea (explained in Section 5.1) that the context can only make public transitions. In order to see this, it is important to understand how a typical proof in a biorthogonal logical relation goes.
Roughly, showing the relatedness of two programs that involve a call to an unknown function (e.g., a callback) eventually reduces to showing that the continuations of the function call are related; thanks to the definition of K⟦τ⟧ρ, we will only need to consider the possibility that those continuations are invoked in a public future world of the world we were in prior to the function call—in other words, we can assume that the function call made a public transition. We will see how this works in detail in the example proof in Section 7.5.

Finally, the logical relation is lifted to open terms in the usual way, quantifying over related closing substitutions ρ and γ matching Δ and Γ, respectively, as well as an initial world in which every location bound in Σ is related to itself. We write ρ1 (resp. γ1) and ρ2 (resp. γ2) here as shorthand for the first and second type (resp. value) substitutions contained in ρ (resp. γ).

Soundness and Completeness  The proof that our logical relation is sound w.r.t. contextual approximation follows closely that of ADR [3]. It involves proving the usual "compatibility" lemmas and the construction of a canonical safe world for a given heap typing. Details can be found in the technical appendix [8].

Theorem 2 (Fundamental Property). If Σ; Δ; Γ ⊢ e : τ, then Σ; Δ; Γ ⊢ e ≾log e : τ.

Theorem 3 (Soundness). ≾log ⊆ ≾ctx

Following Pitts and Stark [28], we show completeness of our logical relation w.r.t. contextual approximation with the help of Mason and Talcott's ciu-approximation [23] as an intermediate relation.

Theorem 4 (Completeness). ≾ctx ⊆ ≾ciu ⊆ ≾log

Proving the inclusion of ≾ctx in ≾ciu is fairly easy. The inclusion of ≾ciu in ≾log follows as an almost immediate consequence of the Fundamental Property, together with the logical relation's biorthogonal definition. Again, full details can be found in the appendix [8].

7.2 HOSC

The model for HOSC can be obtained from the one for HOS by making two changes. First of all, in HOSC, we have to account for the presence of first-class continuation values contτ K. Fortunately, we already have a continuation relation K⟦τ⟧ρ, so it is easy to define the value relation at type cont τ in terms of it:

    V⟦cont τ⟧ρ = {(W, cont K1, cont K2) | (W, K1, K2) ∈ K⟦τ⟧ρ}

Now, recall that we need our value relation to be monotone w.r.t. ⊒. Given the extension we have just made to the value relation for cont τ, that means we need our continuation relation to be monotone w.r.t. ⊒ as well. However, as explained above, the continuation relation is only monotone w.r.t. ⊒pub (in order to ensure that the context can only make public transitions). Of course, what this means is that in the presence of call/cc, the private and public transition relations must be collapsed into one, and consequently we must disallow inconsistent states, too. This corresponds to the intuition we gave in Section 5.1, namely that private transitions and inconsistent states are only sound to use in the absence of call/cc. Formally, we disallow them by redefining Islandn as follows:

    Island′n = {ι ∈ Islandn | ι.ϕ = ι.δ ∧ ι.I = ∅}

Under this definition, the two notions of world extension coincide and all worlds are consistent. The rest of the model stays the same. In particular, proofs done in the HOS model that do not make use of private transitions or inconsistent states can be transferred without any change. The soundness and completeness proofs carry over as well. The former merely needs to be extended in a straightforward way to deal with call/cc, throw, and cont.

7.3 FOS

In the first-order state setting, observe that, for the types of values that can be stored in the heap—namely, those of base type—our logical relation for values coincides with syntactic equality. Consequently, when expressing that two heap values are logically related, we no longer need to refer to a world. Obtaining the model for FOS from the one for HOS is therefore very simple—all that is needed is to remove the ability of heap relations to be world-dependent:

    HeapRel′n = P(Heap × Heap)

Our heap relations are now more or less the same as in Pitts and Stark [28]—that is, they are simply heap relations! Correspondingly, we must also update the definitions of (h1, h2) : W, ψ′ ⊗ ψ′′, and V⟦ref τ⟧ρ, all in the obvious manner, to reflect the lack of world indices in heap relations. (For details, see the appendix.) Note that while step-indices are no longer needed to stratify our worlds, they are still useful in modeling general recursive types.

This simplification of HeapRel enables backtracking (see Section 6.1) by isolating islands from one another completely. Whereas before, changing the state of an island ι could break the heap constraints in other islands if we did not strictly follow ι's STS, now there is no way for changes to ι's state to affect the satisfaction of other islands' heap constraints, so we are free to backtrack.

7.4 FOSC

The changes to the HOS model discussed in Sections 7.2 and 7.3 are completely orthogonal and may be easily combined in order to obtain a fully abstract model for FOSC.

7.5 Proof of Deferred Divergence Example (FOS Version)

We now present in detail a proof that demonstrates the use of all three of our model's special features (private transitions, inconsistent states, and backtracking). Concretely, we show the difficult direction of approximation in the FOS version of the "deferred divergence" example from Section 6.2.

Formally, our goal is to prove ·; ·; · ⊢ e1 ≾log e2 : τ. Unfolding the definition, this reduces to showing (W, e1, e2) ∈ E⟦τ⟧ for W ∈ World. So assume we are given continuations (W, K1, K2) ∈ K⟦τ⟧ and heaps (h1, h2) : W and ⟨h1; K1[e1]⟩ terminates in less than W.k steps. We must now show that W is consistent and that ⟨h2; K2[e2]⟩ terminates as well.

Observe that since ⟨h1; K1[e1]⟩ terminates in less than W.k steps, so does ⟨h1 ⊎ {ly ↦ ff}; K1[ê1[ly/y]]⟩, where ê1 is the body of the let-expression in e1, and ly is some fresh location. For this new location, we extend the world with an island representing the STS from Section 6.2, with s = 1, 2, and 3 representing the left, middle, and right states of the STS, respectively:

    Ws   = (W.k, (W.Σ1, ly:bool), W.Σ2, (W.ω, ιs))
    ιs   = (s, δ, ϕ, I, H)
    δ    = {(1, 2), (2, 3)}*
    ϕ    = {(2, 3)}*
    I    = {3}
    H(1) = {(h̃1, h̃2) | h̃1(ly) = ff}
    H(2) = {(h̃1, h̃2) | h̃1(ly) = ff}
    H(3) = {(h̃1, h̃2) | h̃1(ly) = tt}

Here the superscript "*" in the definitions of δ and ϕ denotes the reflexive, transitive closure over State. Note that ι1 is safe and therefore W1 ⊒pub W. Given how we defined our island, it is easy to see that (h1 ⊎ {ly ↦ ff}, h2) : W1 follows from (h1, h2) : W. Assuming we are able to show (W1, ê1[ly/y], e2) ∈ V⟦τ⟧, we can instantiate (W, K1, K2) ∈ K⟦τ⟧ and get consistent(W1) and that ⟨h2; K2[e2]⟩ terminates. The latter is one of the two things we needed to show.
The other one is consistent(W). Since the only difference between W and W1 is our island, this follows from consistent(W1).

It remains to show (W1, ê1[ly/y], e2) ∈ V⟦τ⟧. So suppose we are given a future world W′ ⊒ W1 and related callbacks (W′, f1, f2) ∈ V⟦(unit → unit) → unit⟧. We need to show (W′, e′1, f2 (λ_. ⊥)) ∈ E⟦unit⟧, where

    e′1 = f1 (λ_. ly := tt); if !ly then ⊥ else ⟨⟩.

So suppose we are given continuations (W′, K′1, K′2) ∈ K⟦unit⟧ and heaps (h′1, h′2) : W′ and ⟨h′1; K′1[e′1]⟩ terminates in less than W′.k steps. We must now show that W′ is consistent and that ⟨h′2; K′2[f2 (λ_. ⊥)]⟩ terminates as well. As a matter of notation, let W′s denote the world obtained from W′ by setting our island's state to s. We only show the case W′ = W′1 here; the other two are similar (and simpler).

The first step is to "move to the middle state (state 2)". Formally, since the heap constraints of state 1 and 2 are the same, (h′1, h′2) : W′1 implies (h′1, h′2) : W′2. Now, we want to prove the following:

1. (W′2, f1 (λ_. ly := tt), f2 (λ_. ⊥)) ∈ E⟦unit⟧
2. (W′2, K′1[•; if !ly then ⊥ else ⟨⟩], K′2) ∈ K⟦unit⟧

If we can prove these two subgoals, then instantiating (1) with (2) yields consistent(W′2) and that ⟨h′2; K′2[f2 (λ_. ⊥)]⟩ terminates. The latter is one of the two things we needed to show. The other one is consistent(W′1), which obviously follows from consistent(W′2). So it remains to show (1) and (2).

For (1), first note that since f1 and f2 are related in W′1, they are by monotonicity also related in W′2 since W′2 ⊒ W′1. It therefore suffices to show the relatedness of their thunk arguments, i.e., (W′2, (λ_. ly := tt), (λ_. ⊥)) ∈ V⟦unit → unit⟧. To that end, we suppose W′′ ⊒ W′2 and have to show (W′′, ly := tt, ⊥) ∈ E⟦unit⟧. So assume we are given continuations (W′′, K′′1, K′′2) ∈ K⟦unit⟧ and heaps (h′′1, h′′2) : W′′. With the help of the inconsistent state we will now show that ⟨h′′1; K′′1[ly := tt]⟩ certainly does not terminate in less than W′′.k steps (so there is nothing further to do). Assume it does, implying that ⟨h′′1[ly ↦ tt]; K′′1[⟨⟩]⟩ does, too. Since W′′ ⊒ W′2, W′′ is either W′′2 or W′′3 (using the same notational trick as above). Consequently, it is easy to see that W′′3 ⊒pub W′′, as well as (h′′1[ly ↦ tt], h′′2) : W′′3. Instantiating (W′′, K′′1, K′′2) ∈ K⟦unit⟧ with all this plus the trivial fact that (W′′3, ⟨⟩, ⟨⟩) ∈ V⟦unit⟧ yields consistent(W′′3), which is clearly in contradiction to 3 being an inconsistent state.

For (2), suppose we are given W′′ ⊒pub W′2 and heaps (h′′1, h′′2) : W′′ and that ⟨h′′1; K′1[if !ly then ⊥ else ⟨⟩]⟩ terminates in less than W′′.k steps. We have to show consistent(W′′) and that ⟨h′′2; K′2[⟨⟩]⟩ terminates. From the assumptions it is clear that h′′1(ly) must be ff and thus ⟨h′′1; K′1[⟨⟩]⟩ terminates in less than W′′.k steps. This also implies that W′′ must be W′′2. We now want to instantiate (W′1, K′1, K′2) ∈ K⟦unit⟧, but W′′2 does not publicly extend W′1 because there is no public transition from state 1 to state 2. However, we can now backtrack to state 1: because both states express the same heap constraint and because heap relations for FOS are world-independent, (h′′1, h′′2) : W′′2 implies (h′′1, h′′2) : W′′1. Note that W′′2 ⊒pub W′2 implies W′′1 ⊒pub W′1. Finally, we can instantiate (W′1, K′1, K′2) ∈ K⟦unit⟧ with all this plus (W′′1, ⟨⟩, ⟨⟩) ∈ V⟦unit⟧, to obtain consistent(W′′1) and that ⟨h′′2; K′2[⟨⟩]⟩ terminates. Since our state 2 is a consistent state, consistent(W′′1) implies consistent(W′′2), and we are done.

8. Reasoning in the Presence of Exceptions

In this paper, we have focused attention on first-class continuations as our control effect of interest, and demonstrated that their absence enables the extension of our STS-based Kripke model with the mechanisms of private transitions and inconsistent states. It is natural, then, to ask about the impact that other control effects have on our model. At least in the case of exceptions, the answer is quite simple, as we will now briefly explain. (Details appear in the technical appendix [8], and we intend to elaborate on these in an extended version of this paper. We leave consideration of other control effects, such as delimited continuations, to future work.)

First of all, unlike throwing to a continuation, raising an exception causes a "well-bracketed" kind of control effect, in the sense that it passes control to the exception handler that was most recently pushed onto the control stack. Thus, the presence of exceptions does not per se restrict our STS model: we are free to use STS's with private transitions and inconsistent states. However, the possibility of exceptional behavior means that, when proving two continuations to be logically related (by K⟦τ⟧ρ), we must show that they behave in a related manner not only when they are plugged with related values, but also when they are passed related raised exceptions. Concretely, the definition of K⟦τ⟧ρ becomes the following (assuming a new base type exn of exceptions):

    K⟦τ⟧ρ = {(W, K1, K2) ∈ ContAtom[ρ1(τ), ρ2(τ)] |
             ∀W′, v1, v2. W′ ⊒pub W ⟹
               ((W′, v1, v2) ∈ V⟦τ⟧ρ ⟹ (W′, K1[v1], K2[v2]) ∈ O) ∧
               ((W′, v1, v2) ∈ V⟦exn⟧ ⟹ (W′, K1[raise v1], K2[raise v2]) ∈ O)}

In essence, this new definition is equivalent to K⟦M(τ)⟧ρ, where M is the exception monad—i.e., M(τ) ≈ τ + exn. Each of the various examples we have considered in this paper involves proving equivalence of two higher-order functions that, when called, will manipulate some local state and invoke their (unknown) callback arguments. Thus, for each of the examples, the new, more restrictive definition of K⟦τ⟧ρ requires us to consider the possibility that the callback invocation may raise an exception. Since the higher-order function in each example does not install any exception handler around its callback invocation, any exception raised by that callback invocation will remain uncaught, causing the function to return immediately (raising the same exception). We therefore need to show that any state in which the callback may raise an exception—i.e., any state that is publicly accessible from the one in which the callback was invoked—is also publicly accessible from the initial state in which the higher-order function was called. For the callback-with-lock example, this is indeed the case, since the only state publicly accessible from the "locked" state (in which the callback is invoked) is itself, which is publicly accessible from the "unlocked" starting state. For the other examples, on the other hand, this criterion is not met; and indeed, in the presence of exceptions, it is not hard to find program contexts that distinguish the higher-order functions in those examples.
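The observation that this is just K⟦M(τ)⟧ρ for the exception monad M can be made concrete in a few lines (a sketch in Haskell; the choice of Exn = String is an arbitrary stand-in for the base type exn):

    type Exn = String
    type M a = Either Exn a        -- M(τ) ≈ τ + exn

    -- A continuation on M(τ) must handle both a normal value and a raised
    -- exception, mirroring the two clauses of the definition above.
    runK :: (a -> r) -> (Exn -> r) -> M a -> r
    runK kv ke = either ke kv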
9. Related and Future Work

Many techniques have been proposed for reasoning about contextual equivalence of stateful programs. Using a variety of these techniques, most of the examples we discuss in this paper have been proven already (with minor variations) in different language settings, but there has not heretofore been any clear account of how they all fit together. Indeed, our main contribution lies in our unifying framework of STS's, along with the realization that the absence of call/cc and/or higher-order state enables the extension of our STS model in orthogonal ways. That said, some of our examples are also new, such as "callback with lock" in FOSC, and the other ADR examples in HOSC (see the appendix [8] for more).
Game Semantics As explained in the introduction, game semantics has served as an inspiration to us, especially Abramsky’s idea of the “semantic cube”. There are many papers on this topic; perhaps the two most relevant to our present work are Laird’s model
of call-by-name PCF extended with a control operator [19] and Abramsky, Honda, and McCusker's model of call-by-value PCF extended with general references [1]. Unlike our HOSC and its fragments, the language considered by Abramsky et al. does not support pointer equality.⁵ The primary focus of the research on games models has been full abstraction. One of the key motivations for having a fully abstract model is, of course, that it allows one to prove two programs observationally equivalent by proving that their denotations (in games models, "strategies") are the same. However, the games models do not in general directly facilitate such proofs since the strategies are non-trivial to analyze for equality (and since game categories also involve a non-trivial quotienting). Hence, proof methods for proving actual program equivalences based on specific games models have primarily been developed only for simple languages with state, namely call-by-name Idealized Algol. For a finitary version of that language (i.e., a version with only finite ground types and no recursion) there is a full classification of when contextual equivalence is decidable (e.g., see [12, 25]). A finitary version of a call-by-value variant has also been studied by Murawski [24], and with that model he could show some finitary versions of the examples of Pitts and Stark, e.g., the profiling example [24, p. 29].

⁵ We have not emphasized the fact that we model pointer equality in this paper, but some of ADR's examples do make use of it, and it is a feature one generally expects to find in real ML-like languages.

Another focus of game semantics is on understanding how the presence of different features in a language affects the kinds of interactions a program can have with its context. Laird [19] models the presence of control operators by relaxing the "well-bracketing" restriction on strategies. Abramsky et al. [1] model the presence of higher-order state by relaxing the "visibility" restriction. There seems intuitively to be some correspondence between the former and our private transitions, and between the latter and our backtracking, but determining the precise nature of this correspondence is left to future work.

Operational Game Semantics  Another line of related work concerns what some have called "operational game semantics". This work considers labeled transition systems, and either traces or bisimulation relations over those, directly inspired by games models. Such so-called "normal form bisimulation" relations have been developed for an untyped language with state and control [34], for a typed language with recursive types (but no state) [21], and for a language with impredicative polymorphism (but no state) [22]. Laird [20] gave a fully abstract trace semantics for the language of Abramsky et al. [1] extended with pointer equality. His trace-sets may be viewed as deterministic strategies in the sense of game semantics. Normal form bisimulations have been used to prove contextual equivalence of actual examples, e.g., Støvring and Lassen's proof of correctness [34] for the encoding of call/cc via one-shot continuations that we described at the end of Section 4. Koutavas and Lassen have shown, in unpublished work [16], how Laird's trace semantics can be used to prove the HOS version of the deferred divergence example (Section 5.2), by showing that the two programs have the same set of traces. To the best of our knowledge, however, no fully abstract games model (either operational or denotational) has yet been given for the rich language that we consider in this paper (call-by-value, impredicative polymorphism, general references with pointer equality, call/cc, and recursive types).

Logical Relations  Our work is heavily indebted to the pioneering work of Pitts and Stark [28], who gave a fully abstract logical relation for a simply-typed functional language with recursion and first-order state. In particular, we rely on the basic setup of their biorthogonal Kripke model, although (like ADR's) ours is also step-indexed. In the absence of step indices, biorthogonality renders the logical relation admissible (an important property when modeling recursion). In the presence of step indices, admissibility is not as important, since the model essentially only consists of finite approximations, and there is no need to ever talk about their limit. Nevertheless, as we have seen, biorthogonality plays a crucial role in modeling control and ensuring full abstraction. With respect to the latter, it is not clear how useful the full abstraction property is for us per se, since it is achieved in a largely "feature-independent" manner. That is, the proof that biorthogonality makes the logical relation complete is essentially the same for each of the four languages we consider, so full abstraction here is perhaps not the most informative criterion. One could for instance take Pitts and Stark's original model, add step-indexing to it, and get out a different fully abstract model for HOSC. Clearly, that model would not be as practically powerful as our STS-based model, but it would nevertheless be fully abstract.

Aside from ADR, the closest logical relations to ours are the ones developed by Bohr in her thesis [7]. Hers also employ biorthogonality, albeit in a denotational setting. Her possible worlds bear some similarity to ADR's in that they, too, allow one to model heap properties that evolve over time. In addition, they allow one to impose constraints on continuations. Like us, she is also able to handle the HOS version of the deferred divergence example, but the language she considers is not as rich as ours (it does not support full polymorphism), and she does not consider handling call/cc or the restriction to first-order state. We can prove all of the examples from her thesis, and we believe that our proofs are significantly simpler to understand.

Regarding the deferred divergence example: it is originally due to O'Hearn, who formulated it in the context of Idealized Algol [26, 2.3]. Pitts showed how to prove this example using operational Kripke logical relations, by allowing the parameters of the logical relation to relate proper states to undefined states (i.e., by phrasing heap relations over "lifted" heaps) [29]. It is not clear whether this technique generalizes to higher-order state, however.

More recently, Johann, Simpson, and Voigtländer [14] have proposed a generic framework for operational reasoning about algebraic effects. Their work is complementary to ours: they develop effect-independent proof principles, whereas we develop effect-specific proof principles. They do not consider local state, higher-order state, or control.

Lastly, our decision to employ both step-indexing and biorthogonality was influenced directly by the work of Benton, together with Tabareau [6] and Hur [5], on compiler correctness. They argue persuasively for the benefits of combining the two techniques.

Environmental Bisimulations  For reasoning about contextual equivalences (involving either type abstraction or local state), one of the most successful alternatives to logical relations is the coinductive technique of environmental bisimulations. The current state of the art is Sumii's work on type abstraction and general references [35], which builds on work by Sumii-Pierce [36], Koutavas-Wand [17], and Sangiorgi-Kobayashi-Sumii [32]. Sumii is able to handle all the examples we have presented here in the setting of HOS; he does not consider call/cc or first-order state (but does, in the work with Sangiorgi, consider concurrency). In some cases (e.g., for the well-bracketed version of the "awkward" example—see Section 5.1), his approach is somewhat "brute-force" in the sense that it requires explicit reasoning about the intensional structure of program contexts. We believe our state transition systems capture the intuitions about well-bracketing at a more abstract level.
Anti-Frame Rule  Pottier [30] has proposed an alternative way of reasoning about local state using a rich type system with capabilities, regions, and linearity. His anti-frame rule allows one to establish a hidden property about a piece of local state, much in the same way that our islands do. In its original form, however, the anti-frame rule was restricted to reasoning about invariants, which we argued in Section 3 are insufficient for many examples. To address this limitation, Pottier has suggested two extensions of his framework. First, in joint work with Pilkiewicz [27], he proposes the use of fates, which enable reasoning about monotonic state in a manner rather similar to the state transition systems in our Kripke model. Second, in a brief unpublished note [31], he sets forth a generalized version of the anti-frame rule that permits reasoning about well-bracketed state change. While there are clear analogies between these extensions and our public/private state transitions, determining a precise formal correspondence is likely to be difficult because the methods are tailored to different purposes. On one hand, Pottier's type systems are richer than that of ML, and thus his techniques can be used to verify correctness of some interesting programs that exploit the advanced features of his type systems. On the other hand, some equivalences—like our "deferred divergence" example from Section 5.2—do not seem to be easily expressible as "unary" typechecking problems and thus cannot seemingly be handled by Pottier's method. Moreover, like Sumii [35], Pottier restricts attention to languages that support higher-order state but no control effects. Finally, it is important to note that Pottier's anti-frame rule has only been proven sound in a relatively idealized setting [33], and its soundness has yet to be established even in the context of the type-and-capability system in which it was originally proposed [30], let alone the extended systems mentioned above [27, 31].

Other Related Work  Seminal work on operational reasoning about state and control was conducted by Felleisen and Hieb [10] and Mason and Talcott [23], but the proof principles they developed are relatively weak in comparison to the ones afforded by our model. Thielecke [37] demonstrated an interesting equivalence that holds in the presence of exceptions and state, but not in the presence of continuations and state. His proof method is relatively brute-force, however, and we can easily prove his example using an STS with private transitions. More recently, Yoshida et al. [38] proposed a Hoare-style logic for reasoning about higher-order programs with local state, but it does not handle abstract types, nor does it permit the kind of reasoning achieved by our STS's. Dreyer et al. [9] have devised a relational modal logic that accounts for the essential aspects of the ADR model. In the future, we hope to generalize that logic to account for the additional features we have proposed here.

References

[1] S. Abramsky, K. Honda, and G. McCusker. A fully abstract game semantics for general references. In LICS, 1998.
[2] A. Ahmed. Semantics of Types for Mutable State. PhD thesis, Princeton University, 2004.
[3] A. Ahmed, D. Dreyer, and A. Rossberg. State-dependent representation independence. In POPL, 2009.
[4] A. Appel and D. McAllester. An indexed model of recursive types for foundational proof-carrying code. TOPLAS, 23(5):657–683, 2001.
[5] N. Benton and C.-K. Hur. Biorthogonality, step-indexing and compiler correctness. In ICFP, 2009.
[6] N. Benton and N. Tabareau. Compiling functional types to relational specifications for low level imperative code. In TLDI, 2009.
[7] N. Bohr. Advances in Reasoning Principles for Contextual Equivalence and Termination. PhD thesis, IT University of Copenhagen, 2007.
[8] D. Dreyer, G. Neis, and L. Birkedal. The impact of higher-order state and control effects on local relational reasoning (technical appendix), 2010. http://www.mpi-sws.org/~dreyer/papers/stslr/.
[9] D. Dreyer, G. Neis, A. Rossberg, and L. Birkedal. A relational modal logic for higher-order stateful ADTs. In POPL, 2010.
[10] M. Felleisen and R. Hieb. The revised report on the syntactic theories of sequential control and state. TCS, 103(2):235–271, 1992.
[11] D. Friedman and C. Haynes. Constraining control. In POPL, 1985.
[12] D. R. Ghica and G. McCusker. Reasoning about Idealized Algol using regular languages. In ICALP, 2000.
[13] P. Johann. Short cut fusion is correct. JFP, 13(4):797–814, 2003.
[14] P. Johann, A. Simpson, and J. Voigtländer. A generic operational metatheory for algebraic effects. In LICS, 2010.
[15] P. Johann and J. Voigtländer. The impact of seq on free theorems-based program transformations. Fundamenta Informaticae, 69(1–2):63–102, 2006.
[16] V. Koutavas and S. Lassen. Fun with fully abstract operational game semantics for general references. Unpublished, Feb. 2008.
[17] V. Koutavas and M. Wand. Small bisimulations for reasoning about higher-order imperative programs. In POPL, 2006.
[18] J.-L. Krivine. Classical logic, storage operators and second-order lambda-calculus. Annals of Pure and Applied Logic, 68:53–78, 1994.
[19] J. Laird. Full abstraction for functional languages with control. In LICS, 1997.
[20] J. Laird. A fully abstract trace semantics for general references. In ICALP, 2007.
[21] S. B. Lassen and P. B. Levy. Typed normal form bisimulation. In CSL, 2007.
[22] S. B. Lassen and P. B. Levy. Typed normal form bisimulation for parametric polymorphism. In LICS, 2008.
[23] I. Mason and C. Talcott. Equivalence in functional languages with effects. JFP, 1(3):287–327, 1991.
[24] A. S. Murawski. Functions with local state: regularity and undecidability. TCS, 338(1–3):315–349, 2005.
[25] A. S. Murawski and I. Walukiewicz. Third-order Idealized Algol with iteration is decidable. TCS, 390(2–3):214–229, 2008.
[26] P. O'Hearn and U. Reddy. Objects, interference, and the Yoneda embedding. In MFPS, 1995.
[27] A. Pilkiewicz and F. Pottier. The essence of monotonic state. Submitted for publication, 2009.
[28] A. Pitts and I. Stark. Operational reasoning for functions with local state. In HOOTS, 1998.
[29] A. M. Pitts. Reasoning about local variables with operationally-based logical relations. In LICS, 1996.
[30] F. Pottier. Hiding local state in direct style: a higher-order anti-frame rule. In LICS, 2008.
[31] F. Pottier. Generalizing the higher-order frame and anti-frame rules. Unpublished, 2009.
[32] D. Sangiorgi, N. Kobayashi, and E. Sumii. Environmental bisimulations for higher-order languages. In LICS, 2007.
[33] J. Schwinghammer, H. Yang, L. Birkedal, F. Pottier, and B. Reus. A semantic foundation for hidden state. In FOSSACS, 2010.
[34] K. Støvring and S. B. Lassen. A complete, co-inductive syntactic theory of sequential control and state. In POPL, 2007.
[35] E. Sumii. A complete characterization of observational equivalence in polymorphic λ-calculus with general references. In CSL, 2009.
[36] E. Sumii and B. Pierce. A bisimulation for type abstraction and recursion. Journal of the ACM, 54(5):1–43, 2007.
[37] H. Thielecke. On exceptions versus continuations in the presence of state. In ESOP, 2000.
[38] N. Yoshida, K. Honda, and M. Berger. Logical reasoning for higher-order functions with local state. LMCS, 4(4:2), 2008.
Distance Makes the Types Grow Stronger
A Calculus for Differential Privacy

Jason Reed
Benjamin C. Pierce
University of Pennsylvania
Abstract

We want assurances that sensitive information will not be disclosed when aggregate data derived from a database is published. Differential privacy offers a strong statistical guarantee that the effect of the presence of any individual in a database will be negligible, even when an adversary has auxiliary knowledge. Much of the prior work in this area consists of proving algorithms to be differentially private one at a time; we propose to streamline this process with a functional language whose type system automatically guarantees differential privacy, allowing the programmer to write complex privacy-safe query programs in a flexible and compositional way. The key novelty is the way our type system captures function sensitivity, a measure of how much a function can magnify the distance between similar inputs: well-typed programs not only can't go wrong, they can't go too far on nearby inputs. Moreover, by introducing a monad for random computations, we can show that the established definition of differential privacy falls out naturally as a special case of this soundness principle. We develop examples including known differentially private algorithms, privacy-aware variants of standard functional programming idioms, and compositionality principles for differential privacy.

Categories and Subject Descriptors: D.3.2 [Programming Languages]: Language Classifications—specialized application languages

General Terms: Languages

Keywords: Differential Privacy, Type Systems

1. Introduction

It's no secret that privacy is a problem. A wealth of information about individuals is accumulating in various databases — patient records, content and link graphs of social networking sites, book and movie ratings, ... — and there are many potentially good uses to which it could be put. But, as Netflix and others have learned [26] to their detriment, even when data collectors try to release only anonymized or aggregated results, it is easy to publish information that reveals much more than was intended, when cleverly combined with other data sources. An exciting new body of work on differential privacy [6, 7, 12–15, 27] aims to address this problem by, first, replacing the informal goal of 'not violating privacy' with a technically precise and strong statistical guarantee, and then offering various mechanisms for achieving this guarantee. Essentially, a mechanism for publishing data is differentially private if any conclusion made from the published data is almost exactly as likely if any one individual's data is omitted from the database. Methods for achieving this guarantee can be attractively simple, usually involving taking the true answer to a query and adding enough random noise to blur the contributions of individuals.

For example, the query "How many patients at this hospital are over the age of 40?" is intuitively "almost safe"—safe because it aggregates many individuals' contributions together, and "almost" because if an adversary happened to know the ages of every patient except John Doe, then answering this query would give them certain knowledge of a fact about John. The differential privacy methodology rests on the observation that, if we add a small amount of random noise to its result, we can still get a useful idea of the true answer to this query while obscuring the contribution of any single individual. By contrast, the query "How many patients are over the age of 40 and also happen to be named John Doe?" is plainly problematic, since it is focused on an individual rather than an aggregate. Such a query cannot usefully be privatized: if we add enough noise to obscure any individual's contribution to the result, there won't be any signal left.

So far, most of the work in differential privacy concerns specific algorithms rather than general, compositional language features. Although there is already an impressive set of differentially private versions of particular algorithms [6, 18], each new one requires its own separate proof. McSherry's Privacy Integrated Queries (PINQ) [25] are a good step toward more general principles: they allow for some relational algebra operations on database tables, as well as certain forms of composition of queries. But even these are relatively limited. We offer here a higher-order functional programming language whose type system directly embodies reasoning about differential privacy. In this language, we can implement McSherry's principles of sequential and parallel composition of differentially private computations, and many others besides, as higher-order functions. This provides a foundational explanation of why compositions of differentially private mechanisms succeed in the ways that they do.

The central idea in our type system also appears in PINQ and in many of the algorithm-by-algorithm proofs in the differential privacy literature: the sensitivity of query functions to quantitative differences in their input. Sensitivity is a sort of continuity property; a function of low sensitivity maps nearby inputs to nearby outputs. To give precise meaning to 'nearby,' we equip every type with a metric — a notion of distance — on its values. Sensitivity matters for differential privacy because the amount of noise required to make a deterministic query differentially private is proportional to that query's sensitivity. The sensitivity of
both queries discussed above is in fact 1: adding or removing one patient's records from the hospital database can only change the true value of the query by at most 1. This means that we should add the same amount of noise to "How many patients at this hospital are over the age of 40?" as to "How many patients are over the age of 40, who also happen to be named John Doe?" This may appear counter-intuitive, but actually it is just right: the privacy of single individuals is protected to exactly the same degree in both cases. Of course, the usefulness of the results differs: knowing the answer to the first query with, say, a typical error margin of ±100 could still be valuable if there are thousands of patients in the hospital's records, whereas knowing the answer to the second query (which can only be zero or one) ±100 is useless. (We might try making the second query more useful by scaling its answer up numerically: "Is John Doe over 40? If yes, then 1,000, else 0." But this query has a sensitivity of 1,000, not 1, and so 1,000 times as much noise must be added, blocking our sneaky attempt to violate privacy.)

To track function sensitivity, we give a distance-aware type system. This type system embodies two important connections between differential privacy and concepts from logic and type theory. First, reasoning about sensitivity itself strongly resembles linear logic [4, 16], which has been widely applied in programming languages. The essential intuition about linear logic and linear type theories is that they treat assumptions as consumable resources. We will see that in our setting the capability to sensitively depend on an input's value behaves like a resource. This intuition recurs throughout the paper, and we sometimes refer to sensitivity to an input as if it is counting the number of "uses" of that input. The other connection comes from the use of a monad to internalize the operation of adding random noise to query results. We include in the programming language a monad for random computations, similar to previously proposed stochastic calculi [29, 30]. Since every type has a metric in our setting, we are led to ask: what should the metric be for the monad? We find that, with the right choice of metric, the definition of differentially private functions falls out as a special case of the definition of function sensitivity, for functions whose output happens to be monadic. This observation is very useful: while prior work treats differential privacy mechanisms and private queries as separate things, we see here that they can be unified in a single language. Our type system can express the privacy-safety of individual queries, as well as more complex query protocols (see Section 5) that repeatedly interact with a private database, adjusting which queries they perform depending on the responses they receive.

To briefly foreshadow what a query in our language looks like, suppose that we have the following functions available:

over 40 : row → bool
size : db ( R
filter : (row → bool) → db ( db
add noise : R ( #R

The predicate over 40 simply determines whether or not an individual database row indicates that the patient is over the age of 40. The function size takes an entire database, and outputs how many rows it contains. Its type uses a special arrow (, related to the linear logic function type of the same name, which expresses that the function has sensitivity of 1. The higher-order function filter takes a predicate on database rows and a database; it returns the subset of the rows in the database that satisfy the predicate. This filtering operation also has a sensitivity of 1 in its database argument, and again ( is used in its type. Finally, the function add noise is the differential privacy mechanism that takes a real number as input and returns a random computation (indicated by the monad #) that adds in a bit of random noise. This function also has a sensitivity of 1, and this fact is intimately connected to privacy properties, as explained in Section 4. With these in place, the query can be written as the program

λd : db. add noise (size (filter over 40 d)) : db ( #R.

As we explain in Section 4, its type indicates that it is a differentially private computation taking a database and producing a real number. Its runtime behavior is to yield a privacy-preserving noised count of the number of patients in the hospital that are over 40.

We begin in Section 2 by describing a core type system that tracks function sensitivity. We state an informal version of the key metric preservation theorem, which says the execution of every well-typed function reflects the sensitivity that the type system assigns it. Section 3 gives examples of programs that can be implemented in our language. Section 4 shows how to add the probability monad, and Section 5 develops further examples. In Section 6 we state the standard safety properties of the type system, give a formal statement of the metric preservation theorem, and sketch its proof. The remaining sections discuss related work and offer concluding remarks.

2. A Type System for Function Sensitivity

2.1 Sensitivity

Our point of departure for designing a programming language for differential privacy is function sensitivity. A function is said to be c-sensitive (or have sensitivity c) if it can magnify distances between inputs by a factor of at most c. Since this definition depends on the input and output types of the function having a metric (a notion of distance) defined on them, we begin by discussing a special case of the definition for functions from R to R, where we can use the familiar Euclidean metric dR(x, y) = |x − y| on the real line. We can then formally define c-sensitivity for real-valued functions as follows.

Definition. A function f : R → R is said to be c-sensitive iff dR(f(x), f(y)) ≤ c · dR(x, y) for all x, y ∈ R.

A special case of this definition that comes up frequently is the case where c = 1. A 1-sensitive function is also called a nonexpansive function, since it keeps distances between input points the same or else makes them smaller. Some examples of 1-sensitive functions are

f1 (x) = x    f2 (x) = −x    f3 (x) = x/2
f4 (x) = |x|    f5 (x) = (x + |x|)/2

and some non-examples include: f6 (x) = 2x and f7 (x) = x². The function f6, while not 1-sensitive, is 2-sensitive. On the other hand, f7 is not c-sensitive for any c.

PROPOSITION 2.1. Every function that is c-sensitive is also c′-sensitive for every c′ ≥ c.

For example, f3 is both 1/2-sensitive and 1-sensitive.
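To make the definition concrete, here is a small Python spot-check of claimed sensitivity bounds (our illustration, not part of the paper's development); random search can refute a bound, though of course it cannot prove one:

import random

def find_violation(f, c, trials=10000, lo=-100.0, hi=100.0):
    # Search for a pair (x, y) with |f(x) - f(y)| > c * |x - y|.
    for _ in range(trials):
        x, y = random.uniform(lo, hi), random.uniform(lo, hi)
        if abs(f(x) - f(y)) > c * abs(x - y) + 1e-9:
            return (x, y)
    return None

assert find_violation(lambda x: x / 2, 1) is None      # f3 looks 1-sensitive
assert find_violation(lambda x: 2 * x, 1) is not None  # f6 is not 1-sensitive
assert find_violation(lambda x: x * x, 10) is not None # f7: no finite c works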
So far we only have one type, R, with an associated metric. We would like to introduce other base types, and type operators to build new types from old ones. We require that for every type τ that we discuss, there is a metric dτ(x, y) for values x, y ∈ τ. This requirement makes it possible to straightforwardly generalize the definition of c-sensitivity to arbitrary types.

Definition. A function f : τ1 → τ2 is said to be c-sensitive iff dτ2(f(x), f(y)) ≤ c · dτ1(x, y) for all x, y ∈ τ1.

The remainder of this subsection introduces several type operators, one after another, with examples of c-sensitive functions
on the types that they express. We use suggestive programming-language terminology and notation, but emphasize that the discussion for now is essentially about pure mathematical functions — we do not yet worry about computational issues such as the possibility of nontermination. For example, we speak of values of a type in a way that should be understood as more or less synonymous with mere elements of a set — in Section 2.2 below, we will show how to actually speak formally about types and values.

First of all, when τ is a type with associated metric dτ, let !r τ be the type whose values are the same as those of τ, but with the metric 'scaled up' by a factor of r. That is, we define

d!r τ (x, y) = r · dτ (x, y).

One role of this type operator is to allow us to reduce the concept of c-sensitivity to 1-sensitivity. For we have

PROPOSITION 2.2. A function f is a c-sensitive function in τ1 → τ2 if and only if it is a 1-sensitive function in !c τ1 → τ2.

Proof. Let x, y : τ1 be given. Suppose dτ1(x, y) = r. Then d!c τ1(x, y) = cr. For f to be c-sensitive as a function τ1 → τ2 we must have dτ2(f(x), f(y)) ≤ cr, but this is exactly the same condition that must be satisfied for f to be a 1-sensitive function !c τ1 → τ2.

We can see therefore that f6 is a 1-sensitive function !2 R → R, and also in fact a 1-sensitive function R → !1/2 R. The symbol ! is borrowed from linear logic, where it indicates that a resource can be used an unlimited number of times. In our setting an input of type !r τ is analogous to a resource that can be used at most r times. We can also speak of !∞, which scales up all non-zero distances to infinity, which is then like the original linear logic !, which allows unrestricted use.

Another way we can consider building up new metric-carrying types from existing ones is by forming products. If τ1 and τ2 are types with associated metrics dτ1 and dτ2, then let τ1 ⊗ τ2 be the type whose values are pairs (v1, v2) where v1 ∈ τ1 and v2 ∈ τ2. In the metric on this product type, we define the distance between two pairs to be the sum of the distances between each pair of components:

dτ1⊗τ2 ((v1, v2), (v1′, v2′)) = dτ1 (v1, v1′) + dτ2 (v2, v2′)

With this type operator we can describe more arithmetic operations on real numbers. For instance,

f8 (x, y) = x + y    f9 (x, y) = x − y

are 1-sensitive functions in R ⊗ R → R, and

f10 (x, y) = (x, y)    f11 (x, y) = (y, x)    f12 (x, y) = (x + y, 0)

cswp(x, y) = (x, y) if x < y, and (y, x) otherwise

are 1-sensitive functions in R ⊗ R → R ⊗ R. We will see the usefulness of cswp in particular below in Section 3.6. However,

f13 (x, y) = (x · y, 0)    f14 (x, y) = (x, x)

are not 1-sensitive functions in R ⊗ R → R ⊗ R. The function f14 is of particular interest, since at no point do we ever risk multiplying x by a constant greater than 1 (as we do in, say, f6 and f13) and yet the fact that x is used twice means that variation of x in the input is effectively doubled in measurable variation of the output. This intuition about counting uses of variables is reflected in the connection between our type system and linear logic.

This metric is not the only one that we can assign to pairs. Just as linear logic has more than one conjunction, our type theory admits more than one product type. Another one that will prove useful is taking the distance between pairs to be the maximum of the differences between their components instead of the sum. Even though the underlying set of values is essentially the same, we regard choosing a different metric as creating a distinct type: the type τ1 & τ2 consists of pairs ⟨v1, v2⟩ (written differently from pairs of type τ1 ⊗ τ2 to further emphasize the difference), with the metric

dτ1&τ2 (⟨v1, v2⟩, ⟨v1′, v2′⟩) = max(dτ1 (v1, v1′), dτ2 (v2, v2′)).

Now we can say that f15 (x, y) = ⟨x, x⟩ is a 1-sensitive function R ⊗ R → R & R. More generally, & lets us combine outputs of different c-sensitive functions even if they share dependency on common inputs.

PROPOSITION 2.3. If f : τ → τ1 and g : τ → τ2 are c-sensitive, then λx.⟨f x, g x⟩ is a c-sensitive function in τ → τ1 & τ2.
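To keep the three metrics straight, the following Python transcription (again ours, purely illustrative) encodes d!r, d⊗, and d& directly and reproduces the observation that f14 doubles distances under ⊗ but preserves them under &:

# Metrics as Python functions (an illustrative transcription, not the paper's code).
d_R = lambda x, y: abs(x - y)

def d_bang(r, d):            # metric of !r tau: scale d by r
    return lambda x, y: r * d(x, y)

def d_tensor(d1, d2):        # metric of tau1 (x) tau2: sum of components
    return lambda p, q: d1(p[0], q[0]) + d2(p[1], q[1])

def d_amp(d1, d2):           # metric of tau1 & tau2: max of components
    return lambda p, q: max(d1(p[0], q[0]), d2(p[1], q[1]))

f14 = lambda x: (x, x)
assert d_tensor(d_R, d_R)(f14(0.0), f14(1.0)) == 2.0  # doubled under (x)
assert d_amp(d_R, d_R)(f14(0.0), f14(1.0)) == 1.0     # preserved under &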
Next we would like to capture the set of functions itself as a type, so that we can, for instance, talk about higher-order functions. Let us take τ1 ( τ2 to be the type whose values are 1-sensitive functions f : τ1 → τ2. We have already established that the presence of !r means that having 1-sensitive functions suffices to express c-sensitive functions for all c, so we need not specially define an entire family of c-sensitive function type constructors: the type of c-sensitive functions from τ1 to τ2 is just !c τ1 ( τ2. We define the metric for ( as follows:

dτ1(τ2 (f, f′) = maxx∈τ1 dτ2 (f(x), f′(x))

This is chosen to ensure that ( and ⊗ have the expected currying/uncurrying behavior with respect to each other. We find in fact that

curry(f) = λx.λy.f (x, y)
uncurry(g) = λ(x, y).g x y

are 1-sensitive functions in (R ⊗ R ( R) → (R ( R ( R) and (R ( R ( R) → (R ⊗ R ( R), respectively.

We postulate several more type operators that are quite familiar from programming languages. The unit type 1, which has only one inhabitant (), has the metric d1((), ()) = 0. Given two types τ1 and τ2, we can form their disjoint union τ1 + τ2, whose values are either of the form inj1 v where v ∈ τ1, or inj2 v where v ∈ τ2. Its metric is

dτ1+τ2 (inji v, injj v′) = dτi (v, v′) if i = j, and ∞ otherwise.

Note that this definition creates a type that is an extremely disjoint union of two components. Any distances between pairs of points within the same component take the distance that that component specifies, but distances from one component to the other are all infinite. Notice what this means for the type bool in particular, which we define as usual as 1 + 1. It is easy to write c-sensitive functions from bool to other types, for the infinite distance between the values true and false licenses us to map them to any two values we like, no matter how far apart they are. However, it is conversely hard for a nontrivial function to bool to be c-sensitive. The function gtzero : R → bool, which returns true when the input is greater than zero, is not c-sensitive for any finite c. This can be blamed, intuitively, on the discontinuity of gtzero at zero.

Finally, we include the ability to form (iso)recursive types µα.τ whose values are of the form fold v, where v is of the type [µα.τ/α]τ, and whose metric we would like to give as

dµα.τ (fold v, fold v′) = d[µα.τ/α]τ (v, v′).

This definition, however, is not well-founded, since it depends on a metric at possibly a more complex type, due to the substitution
[µα.τ/α]τ. It will suffice as an intuition for our present informal discussion, since we only want to use it to talk about lists (rather than, say, types such as µα.α), but a formally correct treatment of the metric is given in Section 6.1.

With these pieces in place, we can introduce a type of lists of real numbers, listreal = µα.1 + R ⊗ α. (The reader is invited to consider also the alternative where ⊗ is replaced by &; we return to this choice below in Section 3.) The metric between lists that arises from the preceding definitions is as follows. Two lists of different lengths are at distance ∞ from each other; this comes from the definition of the metric on disjoint union types. For two lists [x1, . . . , xn] and [y1, . . . , yn] of the same length, we have

dlistreal ([x1, . . . , xn], [y1, . . . , yn]) = Σi=1..n |xi − yi|.

We now claim that there is a 1-sensitive function sort : listreal ( listreal that takes in a list of reals and outputs the sorted version of that same list. This fact may seem somewhat surprising, since a small variation in the input list can lead to an abrupt change in the permutation of the list that is produced. However, what we output is not the permutation itself, but merely the values of the sorted list; the apparent point of discontinuity where one value overtakes another is exactly where those two values are equal, and their exchange of positions in the output list is unobservable. Of course, we would prefer not to rely on such informal arguments. So let us turn next to designing a rigorous type system to capture sensitivity of programs, so that we can see that the 1-sensitivity of sorting is a consequence of the fact that an implementation of a sorting program is well-typed.

2.2 Typing Judgment

Type safety for a programming language ordinarily guarantees that a well-typed open expression e of type τ is well-behaved during execution. 'Well-behaved' is usually taken to mean that e can accept any (appropriately typed) value for its free variables, and will evaluate to a value of type τ without becoming stuck or causing runtime errors: well-typed programs can't go wrong. We mean to make a strictly stronger guarantee than this, namely a guarantee of c-sensitivity. It should be the case that if an expression is given similar input values for its free variables, the result of evaluation will also be suitably close—i.e., well-typed programs can't go too far. To this end, we take, as usual, a typing judgment Γ ` e : τ (expressing that e is a well-formed expression of type τ in a context Γ) but we add further structure to the contexts. By doing so we are essentially generalizing c-sensitivity to capture what it means for an expression to be sensitive to many inputs simultaneously — that is, to all of the variables in the context — rather than just one. Contexts Γ have the syntax

Γ ::= · | Γ, x :r τ

for r ∈ R>0 ∪ {∞}. To have a hypothesis x :r τ while constructing an expression e is to have permission to be r-sensitive to variation in the input x: the output of e is allowed to vary by rs if the value substituted for x varies by s. We include the special value ∞ as an allowed value of r so that we can express ordinary (unconstrained by sensitivity) functions as well as c-sensitive functions. Algebraic operations involving ∞ are defined by setting ∞ · r = ∞ (except for ∞ · 0 = 0) and ∞ + r = ∞. This means that to be ∞-sensitive is no constraint at all: if we consider the definition of sensitivity, then ∞-sensitivity permits any variation at all in the input to be blown up to arbitrary variation in the output.

A well-typed expression x :c τ1 ` e : τ2 is exactly a program that represents a c-sensitive computation. However, we can also consider more general programs x1 :r1 τ1, . . . , xn :rn τn ` e : τ, in which case the guarantee is that, if each xi varies by si, then the result of evaluating e only varies by Σi ri si. More carefully, we state the following metric preservation theorem for the type system, which is of central importance. The notation [v/x]e indicates substitution of the value v for the variable x in expression e as usual.

THEOREM 2.4 (Metric Preservation). Suppose Γ ` e : τ. Let sequences of values (vi)1≤i≤n and (vi′)1≤i≤n be given. Suppose for all i ∈ 1, . . . , n that we have

1. ` vi : τi
2. dτi (vi, vi′) = si
3. xi :ri τi ∈ Γ.

If the program [v1/x1] · · · [vn/xn]e evaluates to v, then there exists a v′ such that [v1′/x1] · · · [vn′/xn]e evaluates to v′, and

dτ (v, v′) ≤ Σi ri si.

We give a more precise version of this result in Section 6.

2.3 Types

The complete syntax and formation rules for types are given in Figure 1. Essentially all of these types have already been mentioned in Section 2.1. There are type variables α (which appear in type variable contexts Ψ), base types b (drawn from a signature Σ), unit and void and sum types, metric-scaled types !r τ, and recursive types µα.τ. There are the two pair types ⊗ and &, which differ in their metrics. There are two kinds of function space, ( and →, where τ1 ( τ2 contains just 1-sensitive functions, while τ1 → τ2 is the ordinary unrestricted function space, containing the functions that can be programmed without any sensitivity requirements on the argument. As in linear logic, there is an encoding of τ1 → τ2, in our case as !∞ τ1 ( τ2, but it is convenient to have the built-in type constructor → to avoid having to frequently introduce and eliminate !-typed expressions.

2.4 Expressions

The syntax of expressions is straightforward; indeed, our language can be seen as essentially just a refinement type system layered over the static and dynamic semantics of an ordinary typed functional programming language. Almost all of the expression formers should be entirely familiar. One feature worth noting (which is also familiar from linear type systems) is that we distinguish two kinds of pairs: the one that arises from ⊗, which is eliminated by pattern-matching and written with (parentheses), and the one that arises from &, which is eliminated by projection and written with ⟨angle brackets⟩. The other is that for clarity we have explicit introduction and elimination forms for the type constructor !r.

e ::= x | c | () | ⟨e, e⟩ | (e, e) | let (x, y) = e in e | πi e | λx.e | e e | inji e | (case e of x.e | x.e) | !e | let !x = e in e | unfoldτ e | foldτ e

Just as with base types, we allow for primitive constants c to be drawn from a signature Σ.

2.5 Typing Relation

To present the typing relation, we need a few algebraic operations on contexts. The notation sΓ indicates pointwise scalar multiplication of all the sensitivity annotations in Γ by s. We can also define addition of two contexts (which may share some variables) by

· + · = ·
(Γ, x :s τ) + (∆, x :r τ) = (Γ + ∆), x :r+s τ
(Γ, x :r τ) + ∆ = (Γ + ∆), x :r τ    (x ∉ ∆)
Γ + (∆, x :r τ) = (Γ + ∆), x :r τ    (x ∉ Γ)
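These context operations are simple arithmetic, and a quick prototype can make the ∞ conventions concrete; in this Python sketch (ours), contexts are dicts from variable names to annotations, with math.inf standing in for ∞:

import math

def scale(s, gamma):
    # s * Gamma, pointwise, with the paper's convention that inf * 0 = 0.
    return {x: 0 if r == 0 or s == 0 else s * r for x, r in gamma.items()}

def add(gamma, delta):
    # Gamma + Delta: annotations of shared variables are summed.
    out = dict(gamma)
    for x, r in delta.items():
        out[x] = out.get(x, 0) + r
    return out

g = {"x": 1}
d = {"x": 2, "y": math.inf}
assert add(scale(3, g), d) == {"x": 5, "y": math.inf}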
τ ::= α | b | 1 | µα.τ | τ + τ | τ ⊗ τ | τ & τ | τ ( τ | τ → τ | !r τ

Ψ, α : type ` α : type
Ψ ` b : type    (b : type ∈ Σ)
Ψ ` 1 : type
Ψ, α : type ` τ : type  implies  Ψ ` µα.τ : type
Ψ ` τ1 : type and Ψ ` τ2 : type  imply  Ψ ` τ1 ? τ2 : type    (? ∈ {+, &, ⊗, (, →})
Ψ ` τ : type  implies  Ψ ` !r τ : type    (r ∈ R>0 ∪ {∞})

Figure 1. Type Formation

(var)  Γ, x :r τ ` x : τ    (r ≥ 1)
(⊗I)  Γ ` e1 : τ1 and ∆ ` e2 : τ2  imply  Γ + ∆ ` (e1, e2) : τ1 ⊗ τ2
(⊗E)  Γ ` e : τ1 ⊗ τ2 and ∆, x :r τ1, y :r τ2 ` e′ : τ′  imply  ∆ + rΓ ` let (x, y) = e in e′ : τ′
(&I)  Γ ` e1 : τ1 and Γ ` e2 : τ2  imply  Γ ` ⟨e1, e2⟩ : τ1 & τ2
(&E)  Γ ` e : τ1 & τ2  implies  Γ ` πi e : τi
(+I)  Γ ` e : τi  implies  Γ ` inji e : τ1 + τ2
(+E)  Γ ` e : τ1 + τ2 and ∆, x :r τ1 ` e1 : τ′ and ∆, x :r τ2 ` e2 : τ′  imply  ∆ + rΓ ` case e of x.e1 | x.e2 : τ′
((I)  Γ, x :1 τ ` e : τ′  implies  Γ ` λx.e : τ ( τ′
((E)  ∆ ` e1 : τ ( τ′ and Γ ` e2 : τ  imply  ∆ + Γ ` e1 e2 : τ′
(→I)  Γ, x :∞ τ ` e : τ′  implies  Γ ` λx.e : τ → τ′
(→E)  ∆ ` e1 : τ → τ′ and Γ ` e2 : τ  imply  ∆ + ∞Γ ` e1 e2 : τ′
(!I)  Γ ` e : τ  implies  sΓ ` !e : !s τ
(!E)  Γ ` e : !s τ and ∆, x :rs τ ` e′ : τ′  imply  ∆ + rΓ ` let !x = e in e′ : τ′
(µI)  Γ ` e : [µα.τ/α]τ  implies  Γ ` foldµα.τ e : µα.τ
(µE)  Γ ` e : µα.τ  implies  Γ ` unfoldµα.τ e : [µα.τ/α]τ

Figure 2. Typing Rules

() ,→ ()
λx.e ,→ λx.e
e1 ,→ λx.e and e2 ,→ v and [v/x]e ,→ v′  imply  e1 e2 ,→ v′
e1 ,→ v1 and e2 ,→ v2  imply  (e1, e2) ,→ (v1, v2)
e1 ,→ v1 and e2 ,→ v2  imply  ⟨e1, e2⟩ ,→ ⟨v1, v2⟩
e ,→ (v1, v2) and [v1/x][v2/y]e′ ,→ v′  imply  let (x, y) = e in e′ ,→ v′
e ,→ ⟨v1, v2⟩  implies  πi e ,→ vi
e ,→ v  implies  inji e ,→ inji v
e ,→ inji v and [v/x]ei ,→ v′  imply  case e of x.e1 | x.e2 ,→ v′
e ,→ v  implies  foldτ e ,→ foldτ v
e ,→ foldτ v  implies  unfoldτ e ,→ v
e ,→ v  implies  !e ,→ !v
e ,→ !v and [v/x]e′ ,→ v′  imply  let !x = e in e′ ,→ v′

Figure 3. Evaluation Rules

The typing relation is defined by the inference rules in Figure 2. Every occurrence of r and s in the typing rules is assumed to be drawn from R>0 ∪ {∞}. Type-checking is decidable; see Section 6 and the appendix¹ for more details. In short, the only novelty is that lower bounds on the annotations in the context are inferred top-down from the leaves to the root of the derivation tree.

The rule var allows a variable from the context to be used as long as its annotation is at least 1, since the identity function is c-sensitive for any c ≥ 1 (cf. Proposition 2.1). Any other context Γ is allowed to appear in a use of var, because permission to depend on a variable is not an obligation to depend on it. (In this respect our type system is closer to affine logic than linear logic.)

In the rule ⊗I, consider the role of the contexts. Γ represents the variables that e1 depends on, and captures quantitatively how sensitive it is to each one. ∆ does the same for e2. In the conclusion of the rule, we add together the sensitivities found in Γ and ∆, precisely because the distances in the type τ1 ⊗ τ2 are measured by a sum of how much e1 and e2 vary. Compare this to &I, where we merely require that the same context is provided in the conclusion as is used to type the two components of the pair.

We can see the action of the type constructor !r in its introduction rule. If we scale up the metric on the expression being constructed, then we must scale up the sensitivity of every variable in its context to compensate. The closed-scope elimination rules for ⊗, +, and ! share a common pattern. The overall elimination has a choice as to how much it depends on the expression of the type being eliminated: this is written as the number r in all three rules. The cost of this choice is that the context Γ that was used to build that expression must then be multiplied by r. The payoff is that the variable(s) that appear in the scope of the elimination (in the case of ⊗E, the two variables x and y, in +E the xs, one in each branch) come with permission for the body to be r-sensitive to them. In the case of !E, however, the variable appears with an annotation of rs rather than r, reflecting that the !s scaled the metric for that variable by a factor of s.

We note that (I, since ( is meant to capture 1-sensitive functions, appropriately creates a variable in the context with an annotation of 1. Compare this to →I, which adds a hypothesis with annotation ∞, whose use is unrestricted. Conversely, in →E, note that the context Γ used to construct the argument e2 of the function is multiplied by ∞ in the conclusion. Because the function e1 makes no guarantee how sensitive it is to its argument, we can in turn make no guarantee how much e1 e2 depends on the variables in Γ. This plays the same role as requirements familiar in linear logic, that the argument to an unrestricted implication cannot depend on linear resources.
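As a small worked instance of these rules (our illustration), the double use of a variable in a ⊗-pair is charged twice, matching the behavior of f14 from Section 2.1:

x :1 R ` x : R (var)        x :1 R ` x : R (var)
x :2 R ` (x, x) : R ⊗ R (⊗I, since (x :1 R) + (x :1 R) = x :2 R)

So (x, x) can be typed only in a context granting sensitivity at least 2 to x.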
2.6 Evaluation
We give a big-step operational semantics for this language, which is entirely routine. Values, the subset of expressions that are allowed as results of evaluation, are defined as follows.

v ::= () | ⟨v, v⟩ | (v, v) | λx.e | inji v | foldτ v | !v

¹ Available at http://www.cis.upenn.edu/~bcpierce/papers/dp.pdf
The judgment e ,→ v says that e evaluates to v. The complete set of evaluation rules is given in Figure 3.
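Since the semantics is routine, a toy evaluator is short; this Python sketch (ours, covering only variables, λ-abstraction, application, ⊗-pairs, and their pattern-matching let, with environments in place of substitution) is meant only to convey the shape of the rules:

def ev(t, env=None):
    # t is a tagged tuple; values are closures and pairs.
    env = env or {}
    tag = t[0]
    if tag == "var":  return env[t[1]]
    if tag == "lam":  return ("clo", t[1], t[2], env)   # ("lam", x, body)
    if tag == "app":
        _, x, body, cenv = ev(t[1], env)
        return ev(body, {**cenv, x: ev(t[2], env)})
    if tag == "pair": return ("pair", ev(t[1], env), ev(t[2], env))
    if tag == "let2":                                   # let (x, y) = e in e'
        _, v1, v2 = ev(t[1], env)
        return ev(t[4], {**env, t[2]: v1, t[3]: v2})
    raise ValueError(tag)

swap = ("lam", "p", ("let2", ("var", "p"), "x", "y",
                     ("pair", ("var", "y"), ("var", "x"))))
v = ev(("app", swap, ("pair", ("lam", "a", ("var", "a")),
                              ("lam", "b", ("var", "b")))))
assert v[0] == "pair" and v[1][0] == "clo" and v[2][0] == "clo"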
3. Examples

We now present some more sophisticated examples of programs that can be written in this language. We continue to introduce new base types and new constants as they become relevant. For readability, we use syntactic sugar for case analysis and pattern matching à la ML.

3.1 Fixpoint Combinator

Because we have general recursive types, we can simulate a fixpoint combinator in pretty much the usual way: we just need to be a little careful about how sensitivity interacts with fixpoints. Let τ0 = µα.α → (τ ( σ). Then the expression

Y = λf.(λx.λa.f ((unfoldτ0 x) x) a) (foldτ0 (λx.λa.f ((unfoldτ0 x) x) a))

has type ((τ ( σ) → (τ ( σ)) → (τ ( σ). This is the standard call-by-value fixed point operator (differing from the more familiar Y combinator by the two λa · · · a eta-expansions). It is easy to check that the unfolding rule

f (Y f) v ,→ v′  implies  Y f v ,→ v′

is admissible whenever f is a function value λx.e. We could alternatively add a fixpoint operator fix f.e to the language directly, with the following typing rule:

Γ, f :∞ τ ( σ ` e : τ ( σ  implies  ∞Γ ` fix f.e : τ ( σ

This rule reflects the type we assigned to Y above: uses of fix can soundly be compiled away by defining fix f.e = Y (λf.e). The fact that f is added to the context annotated ∞ means that we are allowed to call the recursive function an unrestricted number of times within e. The context Γ must be multiplied by ∞ in the conclusion because we can't (because of the fixpoint) establish any bound on how sensitive the overall function is from just one call to it. In the rest of the examples, we write recursive functions in the usual high-level form, eliding the translation in terms of Y.

3.2 Lists

We can define the type of lists with elements in τ as follows:

τ list = µα.1 + τ ⊗ α

We write [ ] for the nil value foldτ list inj1 () and h :: tl for foldτ list inj2 (h, tl), and we use common list notations such as [a, b, c] for a :: b :: c :: [ ]. Given this, it is straightforward to program map in the usual way.

map : (τ ( σ) → (τ list ( σ list)
map f [ ] = [ ]
map f (h :: tl) = (f h) :: map f tl

The type assigned to map reflects that a nonexpansive function mapped over a list yields a nonexpansive function on lists. Every bound variable is used exactly once, with the exception of f; this is permissible since f appears in the context during the typechecking of map with an ∞ annotation. Similarly, we can write the usual fold combinators over lists:

foldl : (τ ⊗ σ ( σ) → (σ ⊗ τ list) ( σ
foldl f (init, [ ]) = init
foldl f (init, (h :: tl)) = foldl f (f (h, init), tl)

foldr : (τ ⊗ σ ( σ) → (σ ⊗ τ list) ( σ
foldr f (init, [ ]) = init
foldr f (init, (h :: tl)) = f (h, foldr f (init, tl))

Again, every bound variable is used once, except for f, which is provided as an unrestricted argument, making its repeated use acceptable. The fact that the initializer to the fold (of type σ) together with the list to be folded over (of type τ list) occur to the left of a ( is essential, capturing the fact that variation in the initializer and in every list element can jointly affect the result. Binary and iterated concatenation are also straightforwardly implemented:

@ : τ list ⊗ τ list ( τ list
@ ([ ], x) = x
@ (h :: tl, x) = h :: @ (tl, x)

concat : τ list list ( τ list
concat [ ] = [ ]
concat (h :: tl) = @ (h, concat tl)

If we define the natural numbers as usual by

nat = µα.1 + α
z = foldnat inj1 ()
s x = foldnat inj2 x

then we can implement a function that finds the length of a list as follows:

length : τ list ( nat
length [ ] = z
length (h :: tl) = s (length tl)

However, this implementation is less than ideal, for it 'consumes' the entire list in producing its answer, leaving further computations unable to depend on it. We can instead write

length : τ list ( τ list ⊗ nat
length [ ] = ([ ], z)
length (h :: tl) = let (tl′, `) = length tl in (h :: tl′, s `)

which deconstructs the list enough to determine its length, but builds up and returns a fresh copy that can be used for further processing. Consider why this function is well-typed: as it decomposes the input list into h and tl, the value of h is only used once, by including it in the output. Also, tl is only used once, as it is passed to the recursive call, which is able to return a reconstructed copy tl′, which is then included in the output. At no point is any data duplicated, but only consumed and reconstructed.
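The consume-and-reconstruct idiom is easy to mimic in ordinary code; in this Python sketch (ours), length hands back the rebuilt list together with the count, so a caller never needs a second reference to its input:

def length(xs):
    # Rebuild the list while counting, mirroring the linear version above.
    if not xs:
        return [], 0
    h, *tl = xs
    tl2, n = length(tl)
    return [h] + tl2, n + 1

xs2, n = length([3, 1, 2])
assert (xs2, n) == ([3, 1, 2], 3)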
3.3 &-lists

Another definition of lists uses & instead of ⊗: we can say

τ alist = µα.1 + τ & α

(the 'a' in alist is for 'ampersand'). To distinguish these lists visually from the earlier definition, we write Nil for foldτ alist inj1 () and Cons p for foldτ alist inj2 p. Recall that & is eliminated by projection rather than pattern-matching. This forces certain programs over lists to be implemented in different ways. We can still implement map for this kind of list without much trouble.

amap : (τ ( σ) → (τ alist ( σ alist)
amap f Nil = Nil
amap f (Cons p) = Cons ⟨f (π1 p), amap f (π2 p)⟩

This function is well-typed (despite the apparent double use of p in the last line!) because the &I rule allows the two components of an &-pair to use the same context. This makes sense, because the eventual fate of an &-pair is to have one or the other of its components be projected out.

The fold operations are more interesting. Consider a naïve implementation of foldl for alist:

afoldl : (τ & σ ( σ) → (σ & τ alist) ( σ
afoldl f p = case π2 p of x. π1 p | x. afoldl f ⟨f ⟨π1 x, π1 p⟩, π2 x⟩
where we have replaced ⊗ with & everywhere in foldl's type to get the type of afoldl. This program is not well-typed, because π1 p is still used in each branch of the case despite the fact that π2 p is case-analyzed. The +E rule sums together these uses, so the result has sensitivity 2, while afoldl is supposed to be only 1-sensitive to its argument of type σ & τ alist. We would like to case-analyze the structure of the second component of that pair, the τ alist, without effectively consuming the first component. The existing type system does not permit this, but we can soundly add a primitive²

analyze : σ & (τ1 + τ2) ( (σ & τ1) + (σ & τ2)

that gives us the extra bit that we need. The operational behavior of analyze is simple: given a pair value ⟨v, inji v′⟩ with v : σ and v′ : τi, it returns inji ⟨v, v′⟩. With this primitive, a well-typed implementation of afoldl can be given as follows:

unf : (σ & τ alist) ( (σ & (1 + τ & τ alist))
unf p = ⟨π1 p, unfoldτ alist π2 p⟩

afoldl : (τ & σ ( σ) → (σ & τ alist) ( σ
afoldl f p = case analyze (unf p) of
  x : (σ & 1). π1 x
| x : (σ & (τ & τ alist)). afoldl f ⟨f ⟨π1 π2 x, π1 x⟩, π2 π2 x⟩

3.4 Sets

Another useful collection type is finite sets. We posit that τ set is a type for any type τ, with the metric on it being the Hamming metric dτ set (S1, S2) = ‖S1 △ S2‖, where △ indicates symmetric difference of sets, and ‖S‖ the cardinality of the set S; the distance between two sets is the number of elements that are in one set but not the other. Note that there is no obvious way to implement this type of sets in terms of the list types just presented, for the metric is different: two sets of different size are a finite distance from one another, but two lists of different size are infinitely far apart. Primitives that can be added for this type include

size : τ set ( R
setfilter : (τ → bool) → τ set ( τ set
setmap : (σ → τ) → σ set ( τ set
∩, ∪, \ : τ set ⊗ τ set ( τ set
split : (τ → bool) → τ set ( τ set ⊗ τ set

where size returns the cardinality of a set, ∩ returns the intersection of two sets, ∪ their union, and \ the difference. Notably, for these last three primitives, we could not have given them the type τ set & τ set ( τ set. To see why, consider {b} ∪ {c, d} = {b, c, d} and {a} ∪ {c, d, e} = {a, c, d, e}. We have d({b}, {a}) = 2 and d({c, d}, {c, d, e}) = 1 on the two inputs to ∪, but on the output d({b, c, d}, {a, c, d, e}) = 3, and 3 is strictly larger than max(2, 1). The functions setfilter and setmap work mostly as expected, but with a proviso concerning termination below in Section 3.5.

We note that size is a special case of a more basic summation primitive:

sum : (τ → R) → τ set ( R

The expression sum f S returns Σs∈S clip(f(s)), where clip(x) returns x clipped to the interval [−1, 1] if necessary. This clipping is required for sum to be 1-sensitive in its set argument. Otherwise, an individual set element could affect the sum by an unbounded amount. We can then define size S = sum (λx.1) S.

The operation split takes a predicate on τ, and a set; it yields two sets, one containing the elements of the original set that satisfy the predicate and the other containing all the elements that don't. Notice that split is 1-sensitive in its set argument; this is because if an element is added to or removed from that set, it can only affect one of the two output sets, not both. By using split repeatedly, we can write programs that, given a set of points in R, compute a histogram, a list of counts indicating how many points are in each of many intervals. For a simple example, suppose our histogram bins are the intervals (−∞, 0], (0, 10], . . . , (90, 100], (100, ∞).

hist0 : R → R set ( R set list
hist0 c s = if c ≥ 101 then [s] else let (y, n) = split (λz. c ≥ z) s in y :: hist0 (c + 10) n

hist : R set ( R list
hist s = map size (hist0 0 s)

Here we are also assuming the use of ordinary distance-insensitive arithmetic operations such as ≥ : R → R → bool and + : R → R → R. We see in the next section that comparison operators like ≥ cannot be so straightforwardly generalized to be distance sensitive.
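For intuition, the split/hist pipeline can be mimicked with ordinary Python sets (our sketch; the paper's primitives additionally carry the sensitivity information in their types):

def split(p, s):
    # 1-sensitive in s: each element lands in exactly one side.
    return {x for x in s if p(x)}, {x for x in s if not p(x)}

def hist0(c, s):
    if c >= 101:
        return [s]
    y, n = split(lambda z: c >= z, s)
    return [y] + hist0(c + 10, n)

def hist(s):
    return [len(b) for b in hist0(0, s)]

assert sum(hist({-5, 3, 14, 101})) == 4   # every point is in exactly one bin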
3.5 Higher-Order Set Operations and Termination

Some comments must be made on the termination of the higher-order functions setfilter, setmap, and setsplit. Consider the expression setfilter f s for s of type τ set and an arbitrary function f : τ → bool. If f diverges on some particular input v : τ, then the presence or absence of v in the set s can make setfilter f s diverge or terminate. This runs afoul of the claim of Theorem 2.4 that two metrically similar computations should together evaluate to metrically nearby values. A way of avoiding this problem is to adopt primitives for which Theorem 2.4 can still be proved: we can ensure dynamically that the function argument of setfilter, setmap, and setsplit terminates by imposing a time limit on the number of steps it can run over each element of the set. Whenever a function exceeds its time limit while operating on a set element x, it is left out of the filter or of the current split as appropriate, and in the case of setmap, a default element of type τ is used. One alternative is to weaken Theorem 2.4 to state that if two computations over metrically related inputs do both terminate, then their outputs are metrically related. This weakened result is considerably less desirable for our intended application to differential privacy, however. A final option is to statically ensure the termination of the function argument. This seems to combine the best features of both of the other choices, but at the price of greater difficulty of program analysis.

² The reader may note that this primitive is exactly the well-known distributivity property that BI, the logic of bunched implications [28], notably satisfies in contrast to linear logic. We conjecture that a type system based on BI might also be suitable for distance-sensitive computations, but we leave this to future work, because of uncertainties about the decidability of typechecking and BI's lack of exponentials, that is, operators such as !, which are important for interactions between distance-sensitive and -insensitive parts of a program.

3.6 Sorting

What about distance-sensitive sorting? Ordinarily, the basis of sorting functions is a comparison operator such as ≥τ : τ × τ → bool. However, we cannot take ≥R : R ⊗ R ( bool as a primitive, because ≥ is not 1-sensitive in either of its arguments: it has a glaring discontinuity. (Compare the example of gtzero in Section 2.1.) Although (0, ε) and (ε, 0) are nearby values in R ⊗ R if ε is small (they are just 2ε apart), nonetheless ≥R returns false for one and
true for the other, values of bool that are by definition infinitely far apart. Because of this we instead take as a primitive the conditional swap function cswp : R ⊗ R ( R ⊗ R defined in Section 2.1, which takes in a pair, and outputs the same pair, swapped if necessary so that the first component is no larger than the second. We are therefore essentially concerned with sorting networks [5], with cswp being the comparator. With the comparator, we can easily implement a version of insertion sort.

insert : R ( R list ( R list
insert x [ ] = [x]
insert x (h :: tl) = let (a, b) = cswp (x, h) in a :: (insert b tl)

sort : R list ( R list
sort [ ] = [ ]
sort (h :: tl) = insert h (sort tl)

Of course, the execution time of this sort is Θ(n²). It is an open question whether any of the typical Θ(n log n) sorting algorithms (merge sort, quick sort, heap sort) can be implemented in our language, but we can implement bitonic sort [5], which is Θ(n(log n)²), and we conjecture that one can implement the log-depth (and therefore Θ(n log n) time) sorting network due to Ajtai, Komlós, and Szemerédi [2].

3.7 Finite Maps

Related to sets are finite maps from σ to τ, which we write as the type σ * τ. A finite map f from σ to τ is an unordered set of tuples (s, t) where s : σ and t : τ, subject to the constraint that each key s has at most one value t associated with it: if (s, t) ∈ f and (s, t′) ∈ f, then t = t′. One can think of finite maps as SQL databases where one column is distinguished as the primary key. This type has essentially the same metric as the metric for sets, dσ*τ (S1, S2) = ‖S1 △ S2‖. By isolating the primary key, we can support some familiar relational algebra operations:

fmsize : (σ * τ) ( R
fmfilter : (σ → τ → bool) → (σ * τ) ( (σ * τ)
mapval : (τ1 → τ2) → (σ * τ1) ( (σ * τ2)
join : (σ * τ1) ⊗ (σ * τ2) ( (σ * (τ1 ⊗ τ2))

The size and filter functions work similarly to the corresponding operations on sets, and there are now two different map operators, one that operates on keys and one on values. The join operation takes two maps (i, si)i∈I1 and (i, si′)i∈I2, and outputs the map (i, (si, si′))i∈I1∩I2. This operation is 1-sensitive in the pair of input maps, but only because we have identified a unique primary key for both of them! For comparison, the cartesian product × on sets — the operation that join is ordinarily derived from in relational algebra — is not c-sensitive for any finite c, for we can see that ({x} ∪ X) × Y has |Y| many more elements than X × Y. McSherry also noted this issue with unrestricted joins, and deals with it in a similar way in PINQ [25]. Finally, we are also able to support a form of GroupBy aggregation, in the form of a primitive

group : (τ → σ) → !2 τ set ( (σ * (τ set))

which takes a key extraction function f : τ → σ, and a set S of values of type τ, and returns a finite map which maps values y ∈ σ to the set of s ∈ S such that f(s) = y. This function is 2-sensitive (thus the !2) in the set argument, because the addition or removal of a single set element may change one element in the output map: it takes two steps to represent such a change as the removal of the old mapping, and the insertion of the new one.

4. A Calculus for Differential Privacy

We now describe how to apply the above type system to expressing differentially private computations. There are two ways to do this. One is to leverage the fact that our type system captures sensitivity, and use standard results about obtaining differential privacy by adding noise to c-sensitive functions. Since Theorem 2.4 guarantees that every well-typed expression b :c db ` e : R (for a type db of databases) is a c-sensitive function db → R, we can apply Proposition 4.1 below to obtain a differentially private function by adding the appropriate amount of noise to the function's result. But we can do better. In this section, we show how adding a probability monad to the type theory allows us to directly capture differential privacy within our language.

4.1 Background

First, we need a few technical preliminaries from the differential privacy literature [14]. The definition of differential privacy is a property of randomized functions that take as input a database, and return a result, typically a real number. For the sake of the current discussion, we take a database to be a set of 'rows', one for each user whose privacy we mean to protect. The type of one user's data—that is, of one row of the database—is written row. For example, row might be the type of a single patient's complete medical record. The type of databases is then db = row set; we use the letter b for elements of this type.

Differential privacy is parametrized by a number ε, which controls how strong the privacy guarantee is: the smaller ε is, the more privacy is guaranteed. It is perhaps just as well to think about ε rather as a measure of how much privacy can be lost by allowing a query to take place. We assume from now on that we have fixed ε to some particular appropriate value. Informally, a function is differentially private if it behaves statistically similarly on similar databases, so that any individual's presence in the database has a statistically negligible effect. Databases b and b′ are considered similar, written b ∼ b′, if they differ by at most one row—in other words if ddb(b, b′) ≤ 1. The standard definition [15] of differential privacy for functions from databases to real numbers is as follows:

Definition. A random function q : db → R is ε-differentially private if for all S ⊆ R, and for all databases b, b′ with b ∼ b′, we have Pr[q(b) ∈ S] ≤ e^ε · Pr[q(b′) ∈ S].

We see that for a differentially private function, when its input database has one row added or deleted, there can only be a very small multiplicative difference (e^ε) in the probability of any outcome S. For example, suppose an individual is concerned about their data being included in a query to a hospital's database; perhaps the result of that query might cause them to be denied health insurance. If we require that query to be 0.1-differentially private (i.e., if ε is set to 0.1), then they can be reassured that the chance of them being denied health care can only increase by about 10%. (Note that this is a 10% increase relative to what the probability would have been without the patient's participation in the database. If the probability without the patient's data being included was 5%, then including the data raises it at most to 5.5%, not to 15%!)

It is straightforward to generalize this definition to other types, by using the distance between two inputs instead of the database similarity condition. We say:

Definition. A random function q : τ → σ is ε-differentially private if for all S ⊆ σ, and for all v, v′ : τ, we have Pr[q(v) ∈ S] ≤ e^(ε·dτ(v,v′)) · Pr[q(v′) ∈ S].

Although we will use this general definition below in Lemma 4.2, for the time being we continue considering only functions db → R.
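The inequality in the definition is easy to test empirically for a simple mechanism; the following Monte Carlo check (ours, using randomized response on a single bit rather than any mechanism from the paper) estimates both probabilities and checks the ε bound with some slack for sampling error:

import math, random

eps = 1.0
p_keep = math.exp(eps) / (1 + math.exp(eps))  # report the true bit w.p. e^eps/(1+e^eps)

def rr(bit):
    return bit if random.random() < p_keep else 1 - bit

N = 200_000
p1 = sum(rr(1) for _ in range(N)) / N   # estimate of Pr[rr(1) = 1]
p0 = sum(rr(0) for _ in range(N)) / N   # estimate of Pr[rr(0) = 1]
slack = 1.05                            # 5% allowance for sampling error
assert p1 <= math.exp(eps) * p0 * slack # S = {1}, with b and b' swapped both ways
assert p0 <= math.exp(eps) * p1 * slack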
One way to achieve differential privacy is via the Laplace mechanism. We suppose we have a deterministic database query, a function f : db → R of known sensitivity, and we produce a differentially private function by adding Laplace-distributed noise to the result of f. The Laplace distribution Lk is parametrized by k—intuitively, a measure of the spread, or 'amount', of noise to be added. It has the probability density function Pr[x] = (1/2k) e^(−|x|/k). The Laplace distribution is symmetric and centered around zero, and its probabilities fall off exponentially as one moves away from zero. It is a reasonable noise distribution, which is unlikely to yield values extremely far from zero. The intended behavior of the Laplace mechanism is captured by the following result:

PROPOSITION 4.1 ([15]). Suppose f : db → R is c-sensitive. Define the random function q : db → R by q = λb.f(b) + N, where N is a random variable distributed according to Lc/ε. Then q is ε-differentially private.

That is, the amount of noise required to make a c-sensitive function ε-private is c/ε. Stronger privacy requirements (smaller ε) and more sensitive functions (larger c) both require more noise. Note that we must impose a global limit on how many queries can be asked of the same database: if we could ask the same query over and over again, we could eventually learn the true value of f with high probability despite the noise. If we exhaust the "privacy budget" for a given database, the database must be destroyed. This data-consuming aspect of differentially private queries was the initial intuition that guided us to the linear-logic-inspired design of the type system.
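Proposition 4.1 translates directly into code; in this Python sketch (ours — add_noise and count_query here are stand-ins for the primitives introduced in Section 5, not the paper's definitions), Laplace noise is drawn by inverting the CDF:

import math, random

def laplace_noise(scale):
    # Inverse-CDF sampling of the Laplace distribution with the given scale.
    u = random.random() - 0.5            # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def add_noise(eps, true_answer):
    # For a 1-sensitive query, scale 1/eps gives eps-differential privacy.
    return true_answer + laplace_noise(1.0 / eps)

def count_query(eps, db, pred):
    # A counting query is 1-sensitive, so the noised version is eps-DP.
    return add_noise(eps, sum(1 for row in db if pred(row)))

noisy = count_query(0.1, [{"age": 35}, {"age": 52}], lambda r: r["age"] > 40)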
4.2 The Probability Monad

We now show how to extend our language with a monad of random computations. Formally, the required extensions to the syntax are:

Types        τ ::= · · · | #τ
Expressions  e ::= · · · | return e | let #x = e in e′
Values       v ::= · · · | δ

We add #τ, the type of random computations over τ. Expressions now include a monadic return e, which deterministically yields the value of e, as well as monadic sequencing: the expression let #x = e in e′ can be interpreted as drawing a sample x from the random computation e, and then continuing with the computation e′. We postpone discussing the typing rules until after we have established what the metric on #τ is, and for that we need to understand what its values are. For simplicity, we follow Ramsey and Pfeffer [30] in taking a rather denotational approach, and think of values of type #τ as literally being mathematical probability distributions. A more strictly syntactic presentation (in terms of, say, pseudo-random number generators) certainly is also possible, but is needlessly technical for our present discussion. In what follows, a probability distribution δ is written as (pi , vi)i∈I, a multiset of probability–value pairs. We write δ(v) for the probability ((pi , vi)i∈I)(v) = Σ_{i | vi = v} pi of observing v in the distribution δ. The metric on probability distributions is carefully chosen to allow our type system to speak about differential privacy. Recall that we have assumed ε to be fixed, and define:

d#τ (δ1 , δ2) = max_{x∈τ} | ln( δ1(x) / δ2(x) ) | · (1/ε)

The definition measures how multiplicatively far apart two distributions are in the worst case, as is required by differential privacy. We can then easily see by unrolling definitions that

Lemma 4.2. A 1-sensitive function τ → #σ is the same thing as an ε-differentially private random function τ → σ.
The typing rules for the monad are as follows:

  Γ ⊢ e : τ
  ---------------------- #I
  ∞Γ ⊢ return e : #τ

  ∆ ⊢ e : #τ    Γ, x :∞ τ ⊢ e′ : #τ′
  ---------------------------------- #E
  ∆ + Γ ⊢ let #x = e in e′ : #τ′

The introduction rule multiplies the context by infinity, because nearby inputs (perhaps surprisingly!) do not lead to nearby deterministic probability distributions. Even if t and t′ are close, say dτ(t, t′) = ε, still return t has a 100% chance — and return t′ has a 0% chance — of yielding t. The elimination rule adds together the influence ∆ that e may have over the final output distribution to the influence Γ that e′ has, and provides the variable x unrestrictedly (with annotation ∞) to e′, because once a differentially private query is made, the published result can be used in any way at all. We add the following cases to the operational semantics:

  e ,→ v
  ----------------------
  return e ,→ (1, v)

  e1 ,→ (pi , vi)i∈I    ∀i ∈ I. [vi/x]e2 ,→ (qij , wij)j∈Ji
  ---------------------------------------------------------
  let #x = e1 in e2 ,→ (pi·qij , wij)i∈I, j∈Ji

We see that return creates the trivial distribution that always yields v. Monadic sequencing considers all possible values vi that e1 could evaluate to, and then subsequently all the values that e2 could evaluate to, assuming that it received the sample vi. The probabilities of these two steps are multiplied, and appropriately aggregated together. Combining the type system's metric preservation property with Lemma 4.2, we find that typing guarantees differential privacy:

Corollary 4.3. The execution of any closed program e such that ⊢ e : !n τ ⊸ #σ is an (εn)-differentially private function from τ to σ.
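For readers who want to experiment, the weighted-list reading of # can be transcribed directly into Haskell. The following toy model (ours; the names Dist and runDist are hypothetical) mirrors the two evaluation rules above: return yields the point distribution (1, v), and sequencing multiplies and aggregates probabilities.

newtype Dist a = Dist { runDist :: [(Rational, a)] }

instance Functor Dist where
  fmap f (Dist ps) = Dist [ (p, f v) | (p, v) <- ps ]

instance Applicative Dist where
  pure v = Dist [(1, v)]      -- return e ,-> (1, v)
  Dist fs <*> Dist xs = Dist [ (p * q, f x) | (p, f) <- fs, (q, x) <- xs ]

instance Monad Dist where
  -- let #x = e1 in e2: all pairs (p_i * q_ij, w_ij)
  Dist ps >>= k = Dist [ (p * q, w) | (p, v) <- ps, (q, w) <- runDist (k v) ]

Probabilities of equal values are left unaggregated in the list; δ(v) is recovered by summing, exactly as in the multiset presentation above.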
5. Differential Privacy Examples

Easy examples of ε-differentially private computations come from applying the Laplace mechanism at the end of a deterministic computation. We can add a primitive function

add_noise : R ⊸ #R

which adds Laplace noise L(1/ε) to its input. According to Proposition 4.1, this is exactly the right amount of noise to add to a 1-sensitive function to make it ε-differentially private. For a concrete example, suppose that we have a function age : row → int. We can then straightforwardly implement the over-40 count query from the introduction.

over_40 : row → bool
over_40 r = age r > 40

count_query : row set ⊸ #R
count_query b = add_noise (setfilter over_40 b)

Notice that we are able to use convenient higher-order functional programming idioms without any difficulty. The function over_40 is also an example of how 'ordinary programming' can safely be mixed in with distance-sensitive programs. Since the type of over_40 uses → rather than ⊸, it makes no promise about sensitivity, and it is able to use 'discontinuous' operations like numeric comparison >. Other deterministic queries can be turned into differentially private functions in a similar way. For example, consider the histogram function hist : R set ⊸ R list from Section 3.4. We can first of all write the following program.

hist_query0 : row set ⊸ (#R) list
hist_query0 b = map add_noise (hist (setmap age b))
This takes a database, finds the age of every individual, and computes a histogram of the ages. Then we prescribe that each item in the output list — every bucket in the histogram — should be independently noised. This yields a list of random computations, while what we ultimately want is a random computation returning a list. But we can use monadic sequencing to get exactly this:

seq : (#R) list ⊸ #(R list)
seq [] = return []
seq (h :: tl) = let #h′ = h in let #tl′ = seq tl in return (h′ :: tl′)
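Under the toy Dist model sketched in Section 4.2, seq is nothing but the standard monadic sequence combinator, specialized to Dist (again our own aside, not part of the paper's language):

-- Reuses the Dist monad from the earlier sketch.
seqDist :: [Dist a] -> Dist [a]
seqDist = sequence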
hist_query : row set ⊸ #(R list)
hist_query b = seq (hist_query0 b)

In the differential privacy literature, there are explicit definitions of both the meaning of sensitivity and the process of safely adding enough noise to lists of real numbers [15]. By contrast, we have shown how to derive these concepts from the primitive metric type R and the type operators µ, 1, +, ⊗, and #. We can also derive more complex combinators on differentially private computations, merely by programming with the monad. We consider first a simple version³ of McSherry's principle of sequential composition [25].

³ McSherry actually states a stronger principle, where there are k different queries, all of different privacy levels. This can also be implemented in our language.

Lemma 5.1 (Sequential Composition). Let f1 and f2 be two ε-differentially private queries, where f2 is allowed to depend on the output of f1. Then the result of performing both queries is 2ε-differentially private.

In short, the privacy losses of consecutive queries are added together. This principle can be embodied as the following higher-order function:

sc : (τ1 ⊸ #τ2) → (τ1 ⊸ τ2 → #τ3) → (!2 τ1 ⊸ #τ3)
sc f1 f2 t1 = let !t1′ = t1 in let #t2 = f1 t1′ in f2 t1′ t2

Its two arguments are the functions f1 and f2, which are both ε-differentially private in a data source of type τ1 (and f2 additionally has unrestricted access to the τ2 result of f1), and it returns a 2ε-differentially private computation. McSherry also identifies a principle of parallel composition:

Lemma 5.2 (Parallel Composition). Let f1 and f2 be two ε-differentially private queries, which depend on disjoint data. Then the result of performing both queries is ε-differentially private.

This can be coded up by interpreting "disjoint" with ⊗.

pc : (τ1 ⊸ #τ2) → (σ1 ⊸ #σ2) → (τ1 ⊗ σ1) ⊸ #(τ2 ⊗ σ2)
pc f g (t, s) = let #t′ = f t in let #s′ = g s in return (t′, s′)
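Both combinators transcribe directly into the toy Dist model from our earlier sketch. Note that the privacy accounting (the 2ε, and the disjointness of the inputs) lives in the paper's type indices and is not enforced by these plain Haskell types:

sc :: (t -> Dist a) -> (t -> a -> Dist b) -> t -> Dist b
sc f1 f2 t = f1 t >>= f2 t

pc :: (t -> Dist a) -> (s -> Dist b) -> (t, s) -> Dist (a, b)
pc f g (t, s) = do
  a <- f t
  b <- g s
  return (a, b)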
In McSherry's work, what is literally meant by "disjoint" is disjoint subsets of a database construed as a set of records. This is also possible to treat in our setting, since we have already seen that split returns a ⊗-pair of two sets. For a final, slightly more complex example, let us consider the privacy-preserving implementation of k-means by Blum et al. [6]. Recall that k-means is a simple clustering algorithm, which works as follows. We assume we have a large set of data points in some space (say R^n), and we want to find k 'centers' around which they cluster. We initialize k provisional 'centers' to random points in the space, and iteratively try to improve these guesses. One iteration consists of grouping each data point with the center it is closest to, then taking the next round's set of k centers to be the mean of each group.

We sketch how this program can be implemented, taking data points to be of the type pt = R ⊗ R. The following helper functions are used:

assign : pt list → pt set ⊸ (pt ⊗ int) set
partition : (pt ⊗ int) set ⊸ pt set list
totx, toty : pt set ⊸ R
zip : τ list → σ list → (τ ⊗ σ) list

These can be written with the primitives we have described; assign takes a list of centers and the dataset, and returns a version of the dataset where each point is labelled by the index of the center it's closest to. Then partition divides this up into a list of sets, using split. The functions totx and toty compute the sum of the first and second coordinates, respectively, of each point in a set. This can be accomplished with sum. Finally, zip is the usual zipping operation that combines two lists into a list of pairs. With these, we can write a function that performs one iteration of private k-means:

iterate : !3 pt set ⊸ R list → #(R list)
iterate b ms =
  let !b′ = b in
  let b′′ = partition (assign ms b′)
      tx = map (add_noise ◦ totx) b′′
      ty = map (add_noise ◦ toty) b′′
      t = map (add_noise ◦ size) b′′
      stats = zip (zip (tx, ty), t)
  in seq (map avg stats)

It works by asking for noisy sums of the x-coordinate total, y-coordinate total, and total population of each cluster. These data are then combined via the function avg:

avg : ((#R ⊗ #R) ⊗ #R) ⊸ #(R ⊗ R)
avg ((x, y), t) = let #x′ = x in let #y′ = y in let #t′ = t in return (x′/t′, y′/t′)

We can read off from the type that one iteration of k-means is 3ε-differentially private. This type arises from the 3-way replication of the variable b′′. We can use monadic sequencing to do more than one iteration:

two_iters : !6 pt set ⊸ R list → #(R list)
two_iters b ms = let !b′ = b in iterate !b′ (iterate !b′ ms)

This function is 6ε-differentially private. Figure 4 shows the result of three independent runs of this code, with k = 2, 6ε = 0.05, and 12,500 points of synthetic data.

Figure 4. k-Means Output

We see that it usually manages to come reasonably close to the true center of the two clusters. We have also developed appropriate additional primitives and programming techniques to make it possible (as one would certainly hope!) to choose the number of iterations not statically but at runtime, but space reasons prevent us from discussing them here.
6. Metatheory

In this section we address the formal correctness of the programming language described above. First of all, we can prove appropriate versions of the usual basic properties that we expect to hold of a well-formed typed programming language.
Lemma 6.1 (Weakening). If Γ ⊢ e : τ, then Γ + ∆ ⊢ e : τ.
Theorem 6.2 (Substitution). If Γ ⊢ e : τ and ∆, x :r τ ⊢ e′ : τ′, then ∆ + rΓ ⊢ [e/x]e′ : τ′.

Theorem 6.3 (Preservation). If ⊢ e : τ and e ,→ v, then ⊢ v : τ.

Note that the weakening lemma allows both making the context larger, and making the annotations numerically greater. The substitution property says that if we substitute e into a variable that is used r times, then Γ, the dependencies of e, must be multiplied by r in the result. The preservation lemma is routine; if we had presented the operational semantics in a small-step style, a progress theorem would also be easy to show.

6.1 Defining the Metric

Up to now, the metrics on types have been dealt with somewhat informally; in particular, our 'definition' of distance for recursive types was not well founded. We now describe a formal definition. It is convenient to treat the metric not as a function, but rather as a relation on values and expressions. The relation v ∼r v′ : τ (resp. e ∼r e′ : τ) means that values v and v′ (expressions e and e′) of type τ are at a distance of no more than r apart from each other. The metric on expressions is defined by evaluation: if the values that result from evaluation of the two expressions are no farther than r apart, then the two expressions are considered to be no farther than r apart. This relation is defined coinductively by the rules in Figure 5. By this we mean that we define v ∼r v′ : τ to be the greatest relation consistent with the given rules. A relation is said to be consistent with a set of inference rules if for any relational fact that holds, there exists an inference rule whose conclusion is that fact, and all premises of that rule belong to the relation. Intuitively, this means that we allow infinitely deep inference trees. Note that ∼r never appears negatively (i.e., negated or to the left of an implication) in the premise of any rule, so we can see that closure under the rules is a property preserved by arbitrary union of relations, and therefore the definition is well-formed.

  ∀v : τ1. [v/x]e1 ∼r [v/x]e2 : τ2
  --------------------------------
  λx.e1 ∼r λx.e2 : τ1 → τ2

  ∀v : τ1. [v/x]e1 ∼r [v/x]e2 : τ2
  --------------------------------
  λx.e1 ∼r λx.e2 : τ1 ⊸ τ2

  v1 ∼r1 v1′ : τ1    v2 ∼r2 v2′ : τ2
  ----------------------------------
  (v1 , v2) ∼r1+r2 (v1′ , v2′) : τ1 ⊗ τ2

  v ∼r v′ : τ
  ------------------
  !v ∼rs !v′ : !s τ

  ------------
  () ∼r () : 1

  v1 ∼r v1′ : τ1    v2 ∼r v2′ : τ2
  --------------------------------
  ⟨v1 , v2⟩ ∼r ⟨v1′ , v2′⟩ : τ1 & τ2

  v ∼r v′ : [µα.τ/α]τ
  ------------------------
  fold v ∼r fold v′ : µα.τ

  v ∼r v′ : τi
  ----------------------------
  inji v ∼r inji v′ : τ1 + τ2

  ∀v1 : τ. e1 ,→ v1 ⇒ ∃v2. e2 ,→ v2 ∧ v1 ∼r v2 : τ
  ------------------------------------------------
  e1 ∼r e2 : τ

Figure 5. Metric Relation
6.2 Metric Preservation Theorem

Now we can state the central novel property that our type system guarantees. We introduce some notation to make the statement more compact. Suppose Γ = x1 :s1 τ1, . . . , xn :sn τn. A substitution σ for Γ is a list of individual substitutions of values for variables in Γ, written [v1/x1] · · · [vn/xn]. A distance vector γ is a list r1, . . . , rn such that every ri is in R≥0 ∪ {∞}. We say σ ∼γ σ′ : Γ when, for every [vi/xi] ∈ σ and [vi′/xi] ∈ σ′, we have vi ∼ri vi′ : τi. In this case we think of σ and σ′ as being 'γ apart': the distance vector γ tracks the distance between each corresponding pair of values. We define the dot product of a distance vector and a context as follows: if γ is r1, . . . , rn, and Γ is as above, then γ·Γ = Σ_{i=1}^{n} ri·si.

Theorem 6.4 (Metric Preservation). Suppose Γ ⊢ e : τ. Suppose σ, σ′ are two substitutions for Γ such that σ ∼γ σ′ : Γ. Then we have σe ∼γ·Γ σ′e : τ.

A straightforward proof attempt of this theorem fails. If we try to split cases by the typing derivation of e, a problem arises at the case where e = e1 e2. The induction hypothesis will tell us that σe1 is close to σ′e1, and that σe2 is close to σ′e2. But the definition of the metric at function types (whether → or ⊸ — the problem arises for both of them) only quantifies over one value — how then can we reason about both σe2 and σ′e2? This problem is solved by using a step-indexed metric logical relation [1, 3] which represents a stronger induction hypothesis, but which agrees with the metric. We defer further details of this argument to the appendix.
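As a quick sanity check of Theorem 6.4 (our own example, not from the paper): consider e = x + x with Γ = x :2 R, so that e uses x twice and is 2-sensitive in it. Take σ = [3/x] and σ′ = [3.5/x]; then σ ∼γ σ′ : Γ with γ = 0.5, and

γ·Γ = 0.5 · 2 = 1.

The theorem promises σe ∼1 σ′e : R, and indeed |(3 + 3) − (3.5 + 3.5)| = 1 ≤ 1, so the bound is met exactly.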
7. Related Work

The seminal paper on differential privacy is [15]; it introduces the fundamental definition and the Laplace mechanism. More general mechanisms for directly noising types other than R also exist, such as the exponential mechanism [24], and techniques have been developed to reduce the amount of noise required for repeated queries, such as the median mechanism [31]. Dwork [13] gives a useful survey of recent results.

Girard's linear logic [16] was a turning point in a long and fruitful history of investigation of substructural logics, which lack structural properties such as unrestricted weakening and contraction. A key feature of linear logic compared to earlier substructural logics [20] is its ! operator, which bridges linear and ordinary reasoning. Our type system takes its structure from the affine variant of linear logic (also related to Ketonen's Direct Logic [19]), where weakening is permitted. The idea of counting, as we do, multiple uses of the same resource was explored by Wright [32], but only integral numbers of uses were considered.

The study of database privacy and statistical databases more generally has a long history. Recent work includes Dalvi, Ré, and Suciu's study of probabilistic database management systems [11], and Machanavajjhala et al.'s comparison of different notions of privacy with respect to real-world census data [22]. Quantitative Information Flow [21, 23] is, like our work, concerned with how much one piece of a program can affect another, but measures this in terms of how many bits of entropy leak during one execution. Provenance analysis [8] in databases tracks the input data actually used to compute a query's output, and is also capable of detecting that the same piece of data was used multiple times to produce a given answer [17]. Chaudhuri et al. [10] also study automatic program analyses that establish continuity (in the traditional topological sense) of numerical programs. Our approach differs in two important ways. First, we consider the stronger property of c-sensitivity, which is essential for differential privacy applications. Second, we achieve our results with a logically motivated type system, rather than a program analysis.

8. Conclusion

We have presented a typed functional programming language that guarantees differential privacy. It is expressive enough to encode examples both from the differential privacy community and from functional programming practice. Its type system shows how differential privacy arises conceptually from the combination of sensitivity analysis and monadic encapsulation of random computations.
There remains a rich frontier of differentially private mechanisms and algorithms that are known, but which are described and proven correct individually. We expect that the exponential mechanism should be easy to incorporate into our language, as a higher-order primitive which directly converts McSherry and Talwar's notion of quality functions [24] into probability distributions. The median mechanism, whose analysis is considerably more complicated, is likely to be more of a challenge. The private combinatorial optimization algorithms developed by Gupta et al. [18] use different definitions of differential privacy which have an additive error term; we conjecture this could be captured by varying the notion of sensitivity to include additive slack. We believe that the streaming private counter of Chan et al. [9] admits an easy implementation by coding up stream types in the usual way. We hope to show in future work how these and other algorithms can be programmed in a uniform, privacy-safe language.
Acknowledgments

Thanks to Helen Anderson, Jonathan Smith, Andreas Haeberlen, Adam Aviv, Daniel Wagner, Michael Hicks, Katrina Ligett, Aaron Roth, and Michael Tschantz for helpful discussions. This work was supported by ONR Grant N00014-09-1-0770 "Networks Opposing Botnets (NoBot)".
References

[1] A. Ahmed. Step-indexed syntactic logical relations for recursive and quantified types. In ESOP, volume 3924 of Lecture Notes in Computer Science, pages 69–83, 2006.
[2] M. Ajtai, J. Komlós, and E. Szemerédi. Sorting in c log n parallel steps. Combinatorica, 3(1):1–19, March 1983.
[3] A. W. Appel and D. McAllester. An indexed model of recursive types for foundational proof-carrying code. ACM Trans. Program. Lang. Syst., 23(5):657–683, 2001.
[4] A. Barber. Dual intuitionistic linear logic. Technical Report ECS-LFCS-96-347, University of Edinburgh, 1996.
[5] K. E. Batcher. Sorting networks and their applications. In AFIPS '68 (Spring): Proceedings of the April 30–May 2, 1968, spring joint computer conference, pages 307–314, New York, NY, USA, 1968. ACM.
[6] A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the SuLQ framework. In PODS '05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 128–138, New York, NY, USA, 2005. ACM.
[7] A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In STOC '08: Proceedings of the 40th annual ACM symposium on Theory of computing, pages 609–618, New York, NY, USA, 2008. ACM.
[8] P. Buneman, S. Khanna, and T. Wang-Chiew. Why and where: A characterization of data provenance. In J. Van den Bussche and V. Vianu, editors, Database Theory — ICDT 2001, volume 1973 of Lecture Notes in Computer Science, pages 316–330. Springer, Berlin/Heidelberg, October 2001.
[9] T.-H. H. Chan, E. Shi, and D. Song. Private and continual release of statistics. Cryptology ePrint Archive, Report 2010/076, 2010. http://eprint.iacr.org/.
[10] S. Chaudhuri, S. Gulwani, and R. Lublinerman. Continuity analysis of programs. SIGPLAN Not., 45(1):57–70, 2010.
[11] N. Dalvi, C. Ré, and D. Suciu. Probabilistic databases: diamonds in the dirt. Commun. ACM, 52(7):86–94, 2009.
[12] C. Dwork. The differential privacy frontier (extended abstract). In Theory of Cryptography, Lecture Notes in Computer Science, pages 496–502. 2009.
[13] C. Dwork. Differential privacy: A survey of results. In 5th International Conference on Theory and Applications of Models of Computation, pages 1–19, 2008.
[14] C. Dwork. Differential privacy. In Proceedings of ICALP, Part II, pages 1–12, 2006.
[15] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, 2006.
[16] J. Girard. Linear logic. Theoretical Computer Science, 50(1):1–102, 1987.
[17] T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS '07: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 31–40, New York, NY, USA, 2007. ACM.
[18] A. Gupta, K. Ligett, F. McSherry, A. Roth, and K. Talwar. Differentially private combinatorial optimization. November 2009.
[19] J. Ketonen. A decidable fragment of predicate calculus. Theoretical Computer Science, 32(3):297–307, 1984.
[20] J. Lambek. The mathematics of sentence structure. American Mathematical Monthly, 65(3):154–170, 1958.
[21] G. Lowe. Quantifying information flow. In Proc. IEEE Computer Security Foundations Workshop, pages 18–31, 2002.
[22] A. Machanavajjhala, D. Kifer, J. Abowd, J. Gehrke, and L. Vilhuber. Privacy: Theory meets practice on the map. In ICDE '08: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, pages 277–286, Washington, DC, USA, 2008. IEEE Computer Society.
[23] S. McCamant and M. D. Ernst. Quantitative information flow as network flow capacity. In PLDI '08: Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation, pages 193–205, New York, NY, USA, 2008. ACM.
[24] F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS '07: Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, pages 94–103, Washington, DC, USA, 2007. IEEE Computer Society.
[25] F. D. McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In SIGMOD '09: Proceedings of the 35th SIGMOD international conference on Management of data, pages 19–30, New York, NY, USA, 2009. ACM.
[26] A. Narayanan and V. Shmatikov. Robust de-anonymization of large sparse datasets. In SP '08: Proceedings of the 2008 IEEE Symposium on Security and Privacy, pages 111–125, Washington, DC, USA, 2008. IEEE Computer Society.
[27] K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. In STOC '07: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 75–84, New York, NY, USA, 2007. ACM.
[28] P. O'Hearn and D. Pym. The logic of bunched implications. Bulletin of Symbolic Logic, 5(2):215–244, 1999.
[29] S. Park, F. Pfenning, and S. Thrun. A monadic probabilistic language. In Proceedings of the 2003 ACM SIGPLAN international workshop on Types in languages design and implementation, pages 38–49. ACM Press, 2003.
[30] N. Ramsey and A. Pfeffer. Stochastic lambda calculus and monads of probability distributions. In 29th ACM POPL, pages 154–165. ACM Press, 2002.
[31] A. Roth and T. Roughgarden. The median mechanism: Interactive and efficient privacy with multiple queries, 2010. To appear in STOC 2010.
[32] D. Wright and C. Baker-Finch. Usage analysis with natural reduction types. In Proceedings of the Third International Workshop on Static Analysis, pages 254–266. Springer-Verlag, London, UK, 1993.
Security-Typed Programming within Dependently Typed Programming

Jamie Morgenstern∗   Daniel R. Licata∗
Carnegie Mellon University
{jamiemmt,drl}@cs.cmu.edu

∗ This research was sponsored in part by the National Science Foundation under grants CCF-0702381 and CNS-0716469, and by the Pradeep Sindhu Computer Science Fellowship. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.

Abstract
Several recent security-typed programming languages, such as Aura, PCML5, and Fine, allow programmers to express and enforce access control and information flow policies. In this paper, we show that security-typed programming can be embedded as a library within a general-purpose dependently typed programming language, Agda. Our library, Aglet, accounts for the major features of existing security-typed programming languages, such as decentralized access control, typed proof-carrying authorization, ephemeral and dynamic policies, authentication, spatial distribution, and information flow. The implementation of Aglet consists of the following ingredients: First, we represent the syntax and proofs of an authorization logic, Garg and Pfenning's BL0, using dependent types. Second, we implement a proof search procedure, based on a focused sequent calculus, to ease the burden of constructing proofs. Third, we represent computations using a monad indexed by pre- and post-conditions drawn from the authorization logic, which permits ephemeral policies that change during execution. We describe the implementation of our library and illustrate its use on benchmark examples considered in the literature.

Categories and Subject Descriptors F.3.3 [Logics and Meanings of Programs]: Studies of Program Constructs—Type structure; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs

General Terms Languages, Security, Verification

1. Introduction
Security-typed programming languages allow programmers to specify and enforce security policies, which describe both access control—who is permitted to access sensitive resources?—and information flow—what are they permitted to do with these resources once they get them? Aura [24] and PCML5 [9] enforce access control using dependently typed proof-carrying authorization (PCA): the run-time system requires every access to a sensitive resource be accompanied by a proof of authorization [7], while the type system aids programmers in constructing correct proofs. Fable [37] and Jif [14] enforce information flow properties using type systems that restrict the use of values that depend on private information. Fine [38] combines these techniques to enforce both. These languages' type systems employ a number of advanced techniques, such as dependently typed authorization proofs, indexed monads of computations at a place and on behalf of a principal [8], information flow types, and affine types for ephemeral security policies.

Dependently typed programming languages provide a rich language of type-level data and computation. One promising application of dependent types is constructing domain-specific type systems as libraries, rather than new language designs—this allows the language designer to exploit the implementation, metatheory, and tools of the host language. In this paper, we apply this methodology to security-typed programming, and show that security-typed programming can be embedded within a general-purpose dependently typed programming language, Agda [32]. We implement a library, Aglet, which accounts for the major features of existing security-typed programming languages, such as Aura, PCML5, and Fine:

Decentralized Access Control: Access control policies are expressed as propositions in an authorization logic, Garg and Pfenning's BL0 [21]. This permits decentralized access control policies, expressed as the aggregate of statements made by different principals about the resources they control. In our embedding, we represent BL0's propositions and proofs using dependent types, and exploit Agda's type checker to validate the correctness of proofs.

Dependently Typed PCA: Primitives that access resources, such as file system operations, require programmers to provide a proof of authorization, which is guaranteed by the type system to be a well-formed proof of the correct proposition.

Ephemeral and Dynamic Policies: Whether or not one may access a resource is often dependent upon the state of a system. For example, in a conference management server, authors may submit a paper, but only before the submission deadline. Fine accounts for ephemeral policies using a technique called affine types, which requires a substructural notion of variables. Because Agda does not currently provide substructurality, we show that one can instead account for ephemeral policies using an indexed monad. Following Hoare Type Theory [31], we define a type Γ A Γ′, which represents a computation that, given precondition Γ, returns a value of type A, with postcondition Γ′. Here, Γ and Γ′ are propositions from the authorization logic, describing the state of resources in the system. For example, consider the operation in a conference management server that closes submissions and begins reviewing. We represent this by a computation of type

(InPhase Submission) Unit (InPhase Reviewing)

Given the conference is in phase Submission, this computation returns a value of type Unit, and the state of the conference has been changed to Reviewing.
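As a rough Haskell analogue for readers less familiar with Agda, a pre-/postcondition-indexed computation can be sketched with a GADT. This is our own illustration (the names Conf, Ret, Bind, and Progress are hypothetical), and it fixes a single Phase index where Aglet indexes by full authorization-logic contexts.

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Phase = Submission | Reviewing

-- A computation from precondition 'pre' to postcondition 'post'.
data Conf (pre :: Phase) (post :: Phase) a where
  Ret      :: a -> Conf p p a
  Bind     :: Conf p q a -> (a -> Conf q r b) -> Conf p r b
  Progress :: Conf 'Submission 'Reviewing ()

For example, Bind Progress Ret has type Conf 'Submission 'Reviewing (); the type checker rejects any sequencing whose intermediate conditions fail to match.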
For comparison between the approaches, we adapt Fine's conference management example to our indexed monad. Aglet also permits dynamic acquisition and generation of policies—e.g., generating a policy based on reading the state of the conference management server from a database on startup.

Authentication: Following previous work by Avijit and Harper [8], we model authentication with an indexed monad of computation on behalf of a principal, which tracks the currently authenticated user. This monad is equipped with a sudo operation for switching users, given appropriate credentials. We show that computation on behalf of a principal is a special case of our policy-indexed monad Γ A Γ′.

Spatial distribution: We also show that our policy-indexed monad can be used to model spatial distribution as in PCML5.

Information Flow: Information flow policies constrain the use of values based on what went into computing them, e.g., tainting user input to avoid SQL injection attacks. We represent information flow using well-established techniques, such as indexed monads [36] and applicative functors [38].

Compile-time and Run-time Theorem Proving: Dependently typed PCA admits a sliding scale between static and dynamic verification. At the static end, one can verify, at compile-time, that a program complies with a statically-given authorization policy. This verification consists of annotating each access to a resource with an authorization proof, whose correctness is ensured by type checking. However, in many programs, the policy is not known at compile time—e.g., the policy may depend upon a system's state. Such programs may dynamically test whether each operation is permitted before performing it, in which case dependently typed PCA ensures that the correct dynamic checks are made and that failure cases are handled. A program may also mix static and dynamic verification: for example, a program may dynamically check that an expected policy is in effect, and then, in the scope of that check, deduce consequences statically. Security-typed languages use theorem provers to reduce the burden of static proofs (as in Fine) and to implement dynamic checks (as in PCML5). We have implemented a certified theorem prover for BL0, based on a focused sequent calculus. Our theorem prover can be run at compile-time and at run-time, fulfilling both of these roles. The theorem prover also saves programmers from having to understand the details of the authorization logic, as they often do not need to write proofs manually.
The remainder of this paper is organized as follows: In Section 2, we show a variety of examples adapted from the literature, which demonstrate that Aglet accounts for programming in the style of Aura, PCML5, and Fine. In Section 3, we describe the implementation of Aglet, including the representation of the logic and the implementation of the theorem prover. We discuss related work in Section 4 and future work in Section 5. The Agda code for this paper is available from http://www.cs.cmu.edu/~drl.

2. Examples

In this section, we show that Aglet supports security-typed programming in the style of Aura, PCML5, and Fine by implementing a number of the benchmark examples considered in the literature. We briefly review Agda's syntax, referring the reader to the Agda Wiki (wiki.portal.chalmers.se/agda/). Dependent function types are written as (x : A) → B. An implicit dependent function space is written {x : A} → B or ∀ {x} → B, and arguments to implicit functions are inferred. Non-dependent functions are written A → B. Anonymous functions are written λ x → e. Named functions are defined clausally by pattern matching. Lists are constructed by [] and :: (note that : is used for type annotations). Set is the classifier of classifiers in Agda.

2.1 File IO with Access Control

First, we show a dependently typed file system interface, a standard example of security-typed programming [8, 38, 39].

2.1.1 Policy

Admin says (∀r.∀o.∀f. (HR says employee(r) ∧ System says owns(o, f) ∧ o says mayread(r, f)) ⊃ mayread(r, f))
System says owns(Jamie, secret.txt)
HR says employee(Dan)
HR says employee(Jamie)
Jamie says mayread(Dan, secret.txt)
Jamie says mayread(Jamie, secret.txt)

Figure 1. Sample access control policy

To begin, we specify an authorization policy for file system operations in BL0 (Figure 1): First, the principal Admin says that for any reader, owner, and file, if human resources says the reader is an employee, and the system administrator says the owner owns the file, and the owner says the reader may read a file, then the reader may read the file. Admin is a distinguished principal whose statements will be used to govern file system operations. Second, the system administrator says Jamie owns secret.txt. Third, human resources says both Dan and Jamie are employees. Fourth, Jamie says Dan and Jamie may read the file. This policy illustrates decentralized access control using the says modality: the policy is the aggregate of statements by different principals about resources they control.

For the principal Dan to read secret.txt, it will be sufficient to deduce the goal Admin says mayread(Dan, secret.txt). This proposition is provable from the above policy because of three properties of says: First, says is closed under instantiation of universal quantifiers (that is, k says ∀x.A(x) entails ∀x. k says A(x)). Second, says distributes over implications (k says (A ⊃ B) entails (k says A) ⊃ (k says B)). Third, every principal believes that every statement of every other principal has been made (k says A entails k′ says (k says A))—though it is not the case that every principal believes that every statement of every other principal is true. Thus, the goal can be proved by using the first clause of the policy (Admin says . . .), instantiating the quantifiers, and using the other statements in the policy to satisfy the preconditions. In Agda, we represent this first clause as the first element of the following context (list of propositions):

Γpolicy =
  (Prin "Admin" says
    (∀e principal · ∀e principal · ∀e filename ·
      let owner = . (iS (iS i0))
          reader = . (iS i0)
          file = . i0
      in (((Prin "HR" says (a- (Employee · reader)))
           ∧ (Prin "System" says (a- (Owner · (owner , file))))
           ∧ (owner says (a- (Mayread · (reader , file)))))
          ⊃ (a- (Mayread · (reader , file))))))
  :: (Prin "Admin" says
       (∀e principal · ∀e filename ·
         (Prin "System" says (a- (Owner · (. iS i0 , . i0))))
         ⊃ (a- (MayChown · (. iS i0 , . i0)))))
  :: []
The second element of the list expresses an additional policy clause, not discussed above, which states that an owner of a file may change its ownership. Variables are represented as de Bruijn indices (i0, iS), constants are represented as injections of strings (Prin "Admin"), and atomic propositions are tagged with a polarity (a+ or a-), which can be thought of as a hint to the theorem prover. Quantifiers are written ∀e τ · A, where τ is the domain of quantification and A is the body of the quantifier. Atomic propositions are written p · t, where p is a proposition constant such as Mayread and t is a term (see Section 3.1 for details). Next, we define a context representing a particular file system state. This context includes all the employee, ownership, and mayread facts mentioned above, with one additional clause saying that Dan may su as Jamie.
Γstate = (Prin "System" says (a- (Owner · (Prin "Jamie" , File "secret·txt"))))
  :: (Prin "HR" says (a- (Employee · (Prin "Dan"))))
  :: (Prin "HR" says (a- (Employee · (Prin "Jamie"))))
  :: (Prin "Jamie" says (a- (Mayread · (Prin "Dan" , File "secret·txt"))))
  :: (Prin "Jamie" says (a- (Mayread · (Prin "Jamie" , File "secret·txt"))))
  :: (Prin "Admin" says (a- (MaySu · (Prin "Dan" , Prin "Jamie"))))
  :: []
Γall = Γpolicy ++ Γstate

Finally, we let Γall stand for the append of Γpolicy and Γstate.

2.1.2 Compile-time Theorem Proving

We now explain the use of our theorem prover:

goal = a- (Mayread · (Prin "Dan" , File "secret·txt"))

proof? : Maybe (Proof Γall goal)
proof? = prove 15

theProof : Proof Γall goal
theProof = solve proof?

The term proof? sets up a call to the theorem prover, attempting to prove mayread(Dan, secret.txt) using the policy specified by Γall. Sequent calculus proofs are represented by an Agda type family (Ω ; ∆ ; Γ ; k) ⊢ A, where Ω binds individual variables, ∆ is a context of claims assumptions, Γ is a context of truth assumptions, and k, the view, is a principal from whose point of view the judgement is made. Informally, the role of the view is that, in a sequent whose view is k, k says A entails A; see Section 3.1 for details about the logic. In this example, Ω and ∆ will always be empty, Γ will represent a policy, as above, and the view k will be Prin "Admin"—we abbreviate such a sequent by Proof Γ A. The context and proposition arguments to prove can be inferred by Agda, and so are left as implicit arguments. The term theProof checks that the theorem prover succeeds at compile-time in this instance. The function solve has type:

solve : ∀ {A} (s : Maybe A) {p : Check (isSome s)} → A

The argument p, of type Check (isSome s), is a proof that s is equal to Some s′ for some s′. Because this argument is implicit, Agda will attempt to fill it in by unification, which will succeed when s is definitionally equal to a term of the form Some s′. In this example, the call to the theorem prover in the term proof? proves the goal, computing definitionally to Some s′ for a proof s′ of mayread(Dan, secret.txt). Thus, we can use solve to extract this proof s′. In general, a call to the theorem prover on a context and a proposition that have no free Agda variables will always be equal to either Some p or to None.

2.1.3 Computations

We present a monadic interface for file operations in Figure 2. This figure shows both the generic IO operations, as well as three file-specific operations for reading, creating, and changing the owner of a file. The type Γ A Γ’ represents a computation with precondition Γ and postcondition Γ’. The Agda type of a context is TCtx+ [] (a context of positive truth assumptions, with no free individual variables—see Section 3.1). The postcondition is a function from A's to contexts, so the postcondition may depend on the computation's result (see create below).

Generic operations:

: TCtx+ [] → (A : Set) → (A → TCtx+ []) → Set

return : ∀ {Γ A} → A → Γ A (\ _ → Γ)

_>>=_ : ∀ {A B Γ Γ’ Γ’’} → (Γ A Γ’) → ((x : A) → (Γ’ x) B Γ’’) → Γ B Γ’’

weakenPre : ∀ {A Γ Γ’ Γn} → (Good Γn → Good Γ) → Γ A Γ’ → Γ ⊆ Γn → Γn A Γ’

weakenPost : ∀ {A Γ Γ’ Γn} → Γ A Γ’ → ((x : A) → (Γn x ⊆ Γ’ x)) → ((x : A) → (Good (Γ’ x) → Good (Γn x))) → Γ A Γn

getLine : ∀ {Γ} → Γ String (\ _ → Γ)

print : ∀ {Γ} → String → Γ Unit (\ _ → Γ)

error : ∀ {A Γ Γ’} → String → Γ A Γ’

acquire : ∀ {A Γ Γ’} (Γn : TCtx+ []) → (Good Γ → Good (Γn ++ Γ)) → (Γn ++ Γ) A Γ’ → Γ A Γ’ → Γ A Γ’

File-specific operations:

sudo : ∀ {Γ A Γ’ ∆ ∆’} (k1 k2 : _) → Replace (a+ (As · k1)) (a+ (As · k2)) Γ ∆ → ((x : A) → Replace (a+ (As · k2)) (a+ (As · k1)) (∆’ x) (Γ’ x)) → (Proof Γ (a- (MaySu · (k1 , k2)))) → ∆ A ∆’ → Γ A Γ’

read : ∀ {Γ} (k : _) (file : _) → Proof Γ ((a- (Mayread · (k , file))) ∧ (a+ (As · k))) → Γ String (λ _ → Γ)

create : ∀ {Γ} (k : _) → Proof Γ ((a- (User · k)) ∧ (a+ (As · k))) → Γ String (λ new → (Prin "System" says (a- (Owner · (k , File new)))) :: Γ)

chown : ∀ {Γ ∆} (k k1 k2 : _) (f : _) → Replace (Prin "System" says (a- (Owner · (k1 , f)))) (Prin "System" says (a- (Owner · (k2 , f)))) Γ ∆ → (Proof Γ ((a+ (As · k)) ∧ (a- (MayChown · (k , f))))) → Γ Unit (\ _ → ∆)

Figure 2. File IO with Authorization

The generic operations are typed as follows: Because return is not effectful, its postcondition is its precondition. Bind (>>=) chains together two computations, where the postcondition of the first is the precondition of the second. Both pre- and postconditions can be weakened to larger and smaller contexts, respectively; the Good predicate can be ignored until Section 2.1.4 below. Primitives like getLine (reading a line of input) and print do not change the state and do not require proofs. The postcondition of error is arbitrary, as it never terminates successfully. The remaining computations are defined as follows:
Read: The function read takes a principal k, a file f, and a proof argument. The proof ensures that the principal k is authorized to access the file (Mayread(k,f)) and that the principal k is the currently authenticated user (As(k)). We use the proposition As to model computation on behalf of a principal [8]. The proof is checked in the context Γ that is the precondition of the computation, ensuring that it is valid in the current state of the world. read delivers the contents of the file and leaves the state unchanged. An example call to read looks like this:

Γj = Γall as "Jamie"

jread : Γj String (λ _ → Γj)
jread = read (Prin "Jamie") (File "secret·txt") (solve (prove 17))

jreadprint : Γj Unit (λ _ → Γj)
jreadprint = jread >>= λ x → print ("the secret is: " ^ x)

The function call Γall as k is shorthand for adding the proposition As(k) to the context Γall. The computation jread reads the file secret.txt as principal Jamie; the proof argument is supplied by a call to the theorem prover, which statically verifies that the required fact is derivable from the policy given by Γall. The computation jreadprint reads the file and then prints the result.

Create: The type of create is similar to read, in that it takes a principal and a proof that the principal can create a file (in this case, the fact that the principal is a registered user is deemed sufficient). It returns a String, the name of the created file, and illustrates why postconditions must be allowed to depend on the return value of the computation: the postcondition says that the principal is the owner of the newly created file. Thus, after a call to create(k), the postconditions signify System says Owner(k,f), where f is the name of the new file.

Chown: To specify chown, we use a type Replace x y Γ ∆, which means that ∆ is the result of replacing exactly one occurrence of x in Γ with y. Replace (whose definition is not shown) is defined by saying that (1) there is a de Bruijn index i showing that x is in Γ and (2) ∆ is equal to the output of the function replace y i, which recurs on the index i and replaces the indicated element by y. The type of chown should be read as follows: if the principal k as whom the computation is running has the authority to change the owner of a file, and replacing owns(k,f) with owns(k′,f) in Γ produces ∆, then we can produce a computation which changes the owner of f from k to k′, leaving the remaining context unchanged.

Next, we show an example call to chown, using a context Γstate′ that is the result of replacing the fact that Jamie owns secret.txt with Dan owning that file. The computation dchown runs as Dan; it changes the owner of the file from Dan to Jamie, and then runs a computation drdprnt, defined below, that reads the file. proveReplace is a tactic used to prove that Γall′ is Γall with the ownership of secret.txt changed. solve (prove 15) calls the theorem prover to statically verify that Dan has permission to chown secret.txt.

Γstate′ = replace {_} {Γstate}
            (Prin "System" says (a- (Owner · (Prin "Dan" , File "secret·txt")))) i0
Γall′ = Γpolicy ++ Γstate′

dchown : (Γall′ as "Dan") Unit (λ _ → Γall as "Dan")
dchown = chown (Prin "Dan") (Prin "Dan") (Prin "Jamie") (File "secret·txt")
               (solve proveReplace) (solve (prove 15))
         >> drdprnt

Sudo: Following Avijit and Harper [8], we now give a well-typed version of the Unix command sudo, which allows switching principals during execution. A first cut for the type of sudo is as follows:

sudo1 : ∀ {Γ A Γ′} (k1 k2 : _)
        → (Proof Γ (a- (MaySu · (k1 , k2))))
        → ((a+ (As · k2)) :: Γ) A (λ _ → (a+ (As · k2)) :: Γ′)
        → ((a+ (As · k1)) :: Γ) A (λ _ → (a+ (As · k1)) :: Γ′)

If there is a proof that k1 may sudo as k2 (e.g., a password was provided), and As(k1) is in the precondition, then it is permissible to run a subcomputation as k2. This subcomputation has a postcondition saying that it terminates running as k2, and then the overall computation returns to running as k1. Because our contexts are ordered (represented as lists rather than sets), sudo has the type in Figure 2, which allows the As facts to occur anywhere in the context. sudo's type may be read: if replacing As(k1) with As(k2) in Γ equals ∆, and if replacing As(k2) with As(k1) in ∆′ equals Γ′, and k1 has permission to su as k2, then a computation with preconditions ∆ and postconditions ∆′ can produce a computation with preconditions Γ and postconditions Γ′. The following example call to sudo defines a computation as Dan that su's as Jamie to run the computation jreadprint defined above:

drdprnt : (Γall as "Dan") Unit (λ _ → Γall as "Dan")
drdprnt = sudo (Prin "Dan") (Prin "Jamie")
               (solve proveReplace) (λ _ → solve proveReplace)
               (solve (prove 15))
               jreadprint

This requires proving that Γstate as "Jamie" and Γstate as "Dan" are related by replacing As(Prin "Jamie") with As(Prin "Dan") (in both directions). Our tactic proveReplace proves all of these equalities. Additionally, the theorem prover statically verifies that Dan may su as Jamie under the policy Γall as "Dan".

Acquire: The function acquire allows a program to check whether a proposition is true in the state of the world. This construct is inspired by acquire in PCML5, but there are slight differences: in PCML5, acquire does theorem proving to prove an arbitrary proposition from the policy, whereas here acquire only verifies the truth of state-dependent atomic facts (which have no evidence) and statements of principals (whose only evidence is a digital signature [9, 24]). The function acquire takes two continuations: one to run if the check is successful, whose precondition is extended with the proposition, and an error handler, whose precondition is the current context, to run if the check fails. In fact, we allow acquire to test an entire context at once: given a context Γn, a computation with preconditions Γ extended with Γn (the success continuation), and a computation with preconditions Γ (the error continuation), acquire returns a computation with preconditions Γ. We use the notation acquire Γn / _ no⇒ s yes⇒ f to write a call to acquire in a pattern-matching style. The _ elides a Good argument, which is explained below.
main : [] Unit (λ _ → [])
main = acquire (Γall as "Jamie") / _
       no⇒ error "acquiring policy failed"
       yes⇒ weakenPost jreadprint (λ _ → ()) _

This example call begins and ends in the empty context. The call to acquire examines the system state to check the truth of each of the propositions in Γall as "Jamie". If all of these are true, then we run jreadprint and use weakening to forget the postconditions. If some proposition cannot be verified, then main calls error.

2.1.4 Verifying Policy Invariants

When authoring the above monadic signature for file IO, the programmer may have in mind some invariants to which policies Γ must adhere. For example, a call to chown (above) would have unexpected consequences if there were ever more than one copy of System says owns(k,f) in Γ (only one copy would be replaced, leaving a file with two owners in the postcondition). Our interface permits programmers to specify context invariants using a predicate Good Γ. The intended invariant of our interface is that a monadic computation Γ A Γ′ should have the property that Γ′ satisfies Good if Γ does. To achieve this, the weakening operations and acquire require preconditions Γ be accompanied by a proof of Good Γ, and the programmer must verify that operations such as read, chown, and sudo preserve the invariant. Because of this invariant, it is not necessary to make each monadic operation require a proof that the precondition is Good. This means that, when writing a client program, the programmer needs only to verify that the initial policy and those in calls to weakening and acquire satisfy the invariants. In the above examples, we took Good to be the trivially true invariant, so the proofs could be elided with an _. As mentioned above, a useful invariant to enforce is that for every file f there is at most one statement of the form System says Owner(_ , f) in the context. This is defined in Agda as follows:
Good : TCtx+ [] → Set
Good Γ = ∀ {k k′ f} →
         (a : (Prin "System" says (a- (Owner · (k , f)))) ∈ Γ) →
         (b : (Prin "System" says (a- (Owner · (k′ , f)))) ∈ Γ) →
         Equal a b

Then we may prove that the postcondition of each operation is Good if the precondition is; e.g.

ChownPreservesGood : ∀ {Γ ∆ k1 k2 f} →
  Replace (Prin "System" says (a- (Owner · (k1 , f))))
          (Prin "System" says (a- (Owner · (k2 , f)))) Γ ∆ →
  Good Γ → Good ∆

In the companion code, we revise the above examples so that they maintain this invariant, using a tactic to generate the proofs.

2.2 File IO with Access Control and Information Flow

Next, we extend the above file signature with information flow, adapting an example from Fine [38]. First, we define a type Tracked A L which represents a value of type A tracked with security level L, where L is a list of filenames and t appends two lists. Following Fine, we define Tracked as an abstract functor that distributes over functions (though different type structures for information flow, such as an indexed monad [36], can be used in other examples):

Tracked : Set → Label → Set
fmap : ∀ {A B L} → (A → B) → Tracked A L → Tracked B L
_⊛_ : ∀ {A B L1 L2} → Tracked (A → B) L1 → Tracked A L2 → Tracked B (L1 t L2)

An application f ⊛ x joins the security levels of the function and the argument. Next, we give flow-sensitive types to read and write: read tags the value with the file it was read from, and write requires a proof of MayAllFlow provs file, representing the fact that all of the files upon which the written string depends may flow into file.

read : ∀ {Γ} (k : _) (file : _) →
       Proof Γ ((a- (Mayread · (k , file))) ∧ (a+ (As · k))) →
       Γ (Tracked String [ file ]) (λ _ → Γ)

write : ∀ {Γ provs} (k : _) (file : _) → Tracked String provs →
        Proof Γ ((a- (Maywrite · (k , file))) ∧ (a+ (As · k)) ∧ (MayAllFlow provs file)) →
        Γ Unit (λ _ → Γ)

For example, we can read two files and write their concatenation to secret.txt:

go : (Γ as "Jamie") Unit (λ _ → (Γ as "Jamie"))
go = read (Prin "Jamie") (File "file1·txt") (solve (prove 15)) >>= λ s →
     read (Prin "Jamie") (File "file2·txt") (solve (prove 15)) >>= λ s′ →
     write (Prin "Jamie") (File "secret·txt")
           ((fmap String.string-append s) ⊛ s′)
           (solve (prove 15))

Here the theorem prover shows that both file1.txt and file2.txt may flow into secret.txt, according to the policy. This proof obligation results from the fact that (fmap String.string-append s) ⊛ s′ has type Tracked String [ "file1.txt" , "file2.txt" ].

2.3 Spatial Distribution with Information Flow

PCML5 investigates PCA for the spatially distributed programming language ML5 [29]. Here, we show how to embed an ML5-style type system, which can be combined with the above techniques for access control and information flow. PCML5 considers additional aspects of distributed authorization, such as treating the policy itself as a distributed resource, which we leave to future work.

ML5 tracks where resources and computations are located using modal types of the form A @ w. For example, database.read : (key → value) @ server says that a function that reads from the database must be run at the server, while javascript.alert : (string → unit) @ client says that a computation that pops up a browser alert box must be run at the client. Network communication is expressed in ML5 using an operation get : (unit → A) @ w → A @ w′ that (under some conditions which we elide here) goes to w to run the given computation and brings the resulting value back to w′. In other work [27], we have shown how to build an ML5-like type system on top of an indexed monad of computations at a place, w A, with a rule get : w′ A → w A. Here, observe that this monad indexing can be represented using a proposition At(w), where get is given a type analogous to sudo:

get : (w1 w2 : _) → ∀ {Γ A Γ′ ∆ ∆′} →
      Replace (a+ (At · w1)) (a+ (At · w2)) Γ ∆ →
      Replace (a+ (At · w2)) (a+ (At · w1)) ∆′ Γ′ →
      ∆ A (λ _ → ∆′) →
      Γ (Tracked A w2) (λ _ → Γ′)
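For comparison, here is a toy Haskell rendering (ours; the names Place, At, runAt, and get are hypothetical, and the "move" is just a stub rather than real network communication) of a place-indexed monad in the spirit of w A:

{-# LANGUAGE DataKinds, KindSignatures #-}

import Control.Monad (ap, liftM)

data Place = Client | Server

-- The phantom index w records where the computation runs.
newtype At (w :: Place) a = At { runAt :: IO a }

instance Functor (At w) where fmap = liftM
instance Applicative (At w) where
  pure  = At . pure
  (<*>) = ap
instance Monad (At w) where
  At m >>= k = At (m >>= runAt . k)

-- Go to w' to run a computation and bring the result back to w.
get :: At w' a -> At w a
get (At m) = At m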
This rule states that, for any principal author, paper paper, and principal coauthor, if the conference is in notification phase, and author is the author of paper, and author says coauthor may read the scores for paper, then coauthor may read the scores for paper. Similarly, using says, it is straightforward to specify a policy allowing PC members to delegate reviewing assignments to subreviewers.
Additionally, we combine spatial distribution with information flow, tagging the return value of the computation with the world it is from. The postcondition must be independent of the return value, as there is in general no coercion either way between A and Tracked A L. Information flow can be used in this setting to force strings to be escaped before they are sent back to the client—e.g. to prevent SQL injection attacks:
2.4.2
sanitize : Tracked String (client) HTML str : Tracked String (server) HTML
Strings from the client must be escaped before they can be included in an HTML document, whereas strings from the server are assumed to be non-malicious, and can be included directly. In our technical report [28], we extend this example with a simple database interface that enforces both authorization and spatial distribution—database handles are only used at the server. 2.4
doaction : ∀ {Γ} (k : _) (a : _) (e : ExtraArgs Γ a) Proof Γ (a- (May · (k , a))) ∧ (a+ (As · k)) Γ (Result a) (λ r PostCondition a Γ e k r)
ConfRM: A Conference Management System
Swamy et al. [38] present an example of a conference management server, ConfRM, adapted from C ONTINUE [26] and its access control policy [18]. Here, we show an excerpt of an authorization policy for ConfRM, a proof-carrying monadic interface to the computations which perform actions, and the main event loop of the server. This example uses ephemeral policies: authorization to perform actions, such as submitting a paper or a review, depend on the phase of the conference (submission, notification,. . . ). 2.4.1
Actions
Rather than defining a command for each action—doRead, doSubmit, etc.— we use type-level computation to write one command for processing all actions; this simplifies the code for the main loop presented below and allows for straightforward addition of actions. The generic command for processing an action, doaction, has the following type:
doaction takes a principal k, an action a to perform, and some ExtraArgs for a, along with a proof that the computation is running as k, and that k may perform a. In this example, a Proof abbreviates a sequent whose view is PCChair, rather than Admin. It returns a Result, and has a PostCondition, both of which are dependent upon the action being performed. In Agda, ExtraArgs, Result, and PostConditions are functions defined by recursion on actions, which compute a Set, a Set, and a context, respectively. Several actions, such as Submitting a paper, require extra data that is not part of the logical specification (e.g., the contents of the paper should not be part of the proposition which authorizes it to be submitted). ExtraArgs produces the set of additional arguments each action requires.
2.4.2 Actions

Rather than defining a command for each action—doRead, doSubmit, etc.—we use type-level computation to write one command for processing all actions; this simplifies the code for the main loop presented below and allows for straightforward addition of actions. The generic command for processing an action, doaction, has the following type:

  doaction : ∀ {Γ} (k : _) (a : _) (e : ExtraArgs Γ a)
    → Proof Γ ((a- (May · (k , a))) ∧ (a+ (As · k)))
    → ◯ Γ (Result a) (λ r → PostCondition a Γ e k r)

doaction takes a principal k, an action a to perform, and some ExtraArgs for a, along with a proof that the computation is running as k and that k may perform a. In this example, a Proof abbreviates a sequent whose view is PCChair, rather than Admin. It returns a Result and has a PostCondition, both of which are dependent upon the action being performed. In Agda, ExtraArgs, Result, and PostCondition are functions defined by recursion on actions, which compute a Set, a Set, and a context, respectively. Several actions, such as submitting a paper, require extra data that is not part of the logical specification (e.g., the contents of the paper should not be part of the proposition which authorizes it to be submitted). ExtraArgs produces the set of additional arguments each action requires:

  ExtraArgs : TCtx+ [] → Term [] (action) → Set
  ExtraArgs Γ (Review · _)          = Term [] (string)
  ExtraArgs Γ (Submit · _)          = Term [] (string)
  ExtraArgs Γ (Progress · (p1 , p2)) =
    Σ (λ ∆ → Replace (a- (InPhase · p1)) (a- (InPhase · p2)) Γ ∆)
  ExtraArgs Γ _                     = Unit

Reviews and paper submissions require their contents, represented as terms of type string (the Agda type Term [] (string) is an injection of strings into the language of first-order terms that we use to represent propositions, as described in Section 3 below). Progressing the phase of the conference requires a proof that the conference is in the first phase, along with a new context in the resulting phase, which we represent by a pair of a new context ∆ and a proof of Replace. Next, we specify the result type of an action:

  Result : Term [] (action) → Set
  Result (Submit · _)     = Term [] (paper)
  Result (Review · _)     = Unit
  Result (BeAssigned · _) = Unit
  Result (Readscore · _)  = String
  Result (Read · _)       = String
  Result (Progress · _)   = Unit

Readscore and Read return a paper's reviews and contents, while Submit produces a Term [] paper, a unique id for the paper. Finally, we define the PostCondition of each action, which is dependent upon the action itself, the precondition, the extra arguments for the action, the principal performing the action, and the Result of the action. Submitting a paper extends the preconditions with two propositions: one saying the paper has been submitted, and one saying the submitting principal is its author. Reviewing and assigning a paper add that the paper is reviewed by or assigned to the principal, respectively. Readscore and Read leave the conditions unchanged. The postcondition of Progress is the first component of its ExtraArgs, i.e., the context determined by replacing the current phase with the resulting one.

  PostCondition : (a : Term [] (action)) (Γ : TCtx+ [])
    → ExtraArgs Γ a → (k : Term [] (principal)) → Result a → TCtx+ []
  PostCondition (Submit · y) Γ e k r =
    (a- (Submitted · r)) :: (a- (Author · (k , r))) :: Γ
  PostCondition (Review · y) Γ e k r = (a- (Reviewed · (k , y))) :: Γ
  PostCondition (BeAssigned · y) Γ e k r = (a- (Assigned · (k , y))) :: Γ
  PostCondition (Readscore · y) Γ e k r = Γ
  PostCondition (Read · y) Γ e k r = Γ
  PostCondition (Progress · (ph1 , ph2)) Γ e k r = (fst e)
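Readers more familiar with Haskell may find it helpful to see the action-indexed result type approximated with a GADT. This is our sketch, not the paper's code; PaperId and the constructors are invented, and the authorization proofs are elided:

  {-# LANGUAGE GADTs #-}

  newtype PaperId = PaperId Int deriving Show

  -- the index r plays the role of Result: it depends on the action
  data Action r where
    Submit    :: String  -> Action PaperId   -- returns a fresh paper id
    Review    :: PaperId -> String -> Action ()
    Readscore :: PaperId -> Action String    -- returns the reviews

  -- one generic interpreter instead of doSubmit, doReview, ...
  doAction :: Action r -> IO r
  doAction (Submit contents) = do
    putStrLn "storing paper"
    pure (PaperId 42)                        -- hypothetical fresh id
  doAction (Review p txt) = putStrLn ("recording review for " ++ show p)
  doAction (Readscore p)  = pure "scores..."

Pattern matching on the GADT refines the result type per action, which is the same effect the Agda code obtains by recursion on action terms.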
2.4.3 Server Main Loop

In writing the main server loop, we will use the following monadic wrapper around our theorem prover, in order to test at run time whether a given proposition holds in the current state of the server:

  prove/dyn : ∀ {Γ1} → Nat → (Γ : TCtx+ []) → (A : Propo- [])
    → ◯ Γ1 (Maybe (Proof Γ A)) (λ _ → Γ1)

In Figure 3 we show the code for the main loop of the ConfRM server, implemented using the interface described above. The main loop serves requests made by principals who wish to perform actions. Because the requests are not determined until run-time, and authorization depends on the system state (the phase of the conference, the role of a principal), this example uses entirely dynamic verification of security policies: the server dynamically checks that each request is authorized just before performing it, using our theorem prover at run-time. The type system ensures that the appropriate dynamic check is made. Informally, the server loop works by (1) reading in an action and its arguments, (2) reading in a principal, (3) acquiring the credentials to su as that principal, (4) computing the precondition of the su, (5) computing the postconditions of performing the action, (6) su-ing as the principal, (7) proving that the principal may perform the action, (8) performing the action, and (9) recurring. Coalescing all of the actions into one primitive command makes this code much more concise than it would be otherwise, when we would have to repeat essentially this code as many times as there are actions.

This code is rendered in Agda as follows. fix permits an IO computation to be defined by general recursion. Because its type is restricted to the monad, it does not permit non-terminating elements of other types, such as Proof. This fixed-point combinator abstracts over the precondition, so it may vary in recursive calls, but leaves the postcondition fixed throughout the loop; we leave more general loop invariants to future work. First, main is given the type ∀ {Γ} → ◯ Γ Unit (λ _ → []): given any precondition, the computation returns unit and an empty postcondition (we do not expect to run any code following main, so it is not worthwhile to track the postconditions). main is defined by taking the fixed point of the auxiliary function loop, which is abstracted over the recursive call. On line (1), the loop prompts the user to enter an action to perform; parseAction then parses the string to produce a : action and args : InputArgs, raising an error otherwise. (2) The loop prompts for a username and parses it into a Term [] principal. (3) The loop attempts to acquire credentials that "Admin" may su as the principal (e.g., by prompting for a password). (4) The loop calls the function make-replace to produce the preconditions for the su, by replacing a+ (As · (Prin "Admin")) with a+ (As · u). (5) The loop calls inputToE to produce the ExtraArgs for the action from the args; for Progress, this function computes the postcondition of the action from the current context. (6) The loop su-s as the principal. The first replace argument to su is the result of step (4), the proof argument is the assumption acquired in step (3), and the second replace argument is discussed below. (7) The loop calls the theorem prover at run-time to prove that the principal may perform the requested action. (8) The loop calls doaction and (9) recurs. The second replace argument to su is generated using a proof that As is preserved in the PostCondition of an action:

  postPreservesAs : ∀ {a Γ e k r k′}
    → ((a+ (As · k′)) ∈ Γ)
    → ((a+ (As · k′)) ∈ PostCondition a Γ e k r)

This is another example of using Agda to verify invariants of the pre- and post-conditions, as in Section 2.1.4.

  fix : ∀ {A Γ′} → ((∀ {Γ} → ◯ Γ A Γ′) → (∀ {Γ} → ◯ Γ A Γ′))
      → (∀ {Γ} → ◯ Γ A Γ′)

  main : ∀ {Γ} → ◯ Γ Unit (λ _ → [])
  main = fix loop where
    loop : (∀ {Γ} → ◯ Γ Unit (λ _ → [])) → (∀ {Γ} → ◯ Γ Unit (λ _ → []))
    loop rec {Γ} =
      {-1-} prompt "Enter an action:" >>= λ astr →
            case (parseAction astr)
              None⇒ error "Unknown action"
              Some⇒ λ actionArgs →
                let a    = (fst actionArgs)
                    args = (snd actionArgs) in
      {-2-} prompt "Who are you?" >>= λ ustring →
                let u = parsePrin ustring in
      {-3-} acquire [ (a- (MaySu · (Prin "Admin" , u))) ] / _
              no⇒ error "Unable to su"
      {-4-} yes⇒ case make-replace
              None⇒ error "oops, not running as admin"
              Some⇒ λ asadmin →
      {-5-} case (inputToE a _ args)
              None⇒ error "Bad input (e.g. not in phase)"
              Some⇒ λ args →
      {-6-} (sudo (Prin "Admin") u (snd asadmin)
               (\ x → (snd (repAsPost (snd asadmin) {a} x)))
               (lfoc i0 init-)
      {-7-}    (prove/dyn 15 _ _ >>=
                 none⇒ error "Unauthorized action"
                 some⇒ λ canDoAction →
      {-8-}      doaction u a args canDoAction))
      {-9-} >>= λ _ → rec

  Figure 3. ConfRM Main Loop
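The moving parts above are specific to Aglet's indexed monad, but the control flow of the loop is ordinary. As a drastically simplified sketch (ours, with invented names; the real loop also threads preconditions, postconditions, credentials, and proof objects through the monad), the same shape in plain Haskell:

  data Action = Submit String | Readscore Int
    deriving (Eq, Read, Show)

  type Policy = [(String, Action)]   -- (principal, permitted action) pairs

  proveMay :: Policy -> String -> Action -> Maybe ()  -- run-time "theorem proving"
  proveMay pol u a = if (u, a) `elem` pol then Just () else Nothing

  perform :: String -> Action -> IO ()
  perform u a = putStrLn (u ++ " performs " ++ show a)

  loop :: Policy -> IO ()
  loop pol = do
    putStrLn "Enter an action:"
    a <- readLn
    putStrLn "Who are you?"
    u <- getLine
    case proveMay pol u a of
      Nothing -> putStrLn "Unauthorized action"
      Just () -> perform u a
    loop pol

What the Haskell version cannot express is precisely Aglet's point: nothing forces the proveMay check to happen before perform, whereas the indexed monad makes doaction demand the proof.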
2.4.4 Dynamic Policy Acquisition

Finally, we describe an example of dynamic policy acquisition in Figure 4: we read the reviewers' paper assignments from a database, parse the result into a context, acquire the context, and start the main server loop with those preconditions. This is simple in a dependently typed language because contexts themselves are data. The function getReviewerAsgn takes a string, representing a path to the database, and returns the list of reviewers for each paper. The function parseReviewers then turns each of these lists into a list of propositions, each stating that the parsed reviewer is a reviewer of the paper. A more realistic ConfRM implementation would read a variety of other propositions from the database as well (which papers have been submitted, reviewed, etc.). The computation mkPolicy calls getReviewerAsgn and parses the results. The computation start uses mkPolicy to generate an initial policy, acquires these preconditions, and starts the main server loop.

  getReviewerAsgn : ∀ {Γ} → String
    → ◯ Γ (List (List String)) (λ _ → Γ)

  parseReviewers : List String → TCtx+ []

  mkPolicy : ∀ {Γ} → ◯ Γ (TCtx+ []) (λ _ → Γ)
  mkPolicy = getReviewerAsgn "papers.db" >>= λ asgn →
    return (ListM.fold [] (λ x → λ y → parseReviewers x ++ y) asgn)

  start = mkPolicy {[]} >>= λ ctx →
    acquire ctx / _
      no⇒ error "policy not accepted"
      yes⇒ main

  Figure 4. ConfRM Policy Acquisition
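For intuition, the acquisition step amounts to turning database rows into policy atoms. A small hypothetical Haskell rendering (ours; Prop and Reviewer are invented stand-ins for the paper's propositions):

  data Prop = Reviewer String String   -- Reviewer r p: principal r reviews paper p
    deriving (Eq, Show)

  parseReviewers :: [(String, [String])] -> [Prop]
  parseReviewers asgn = [ Reviewer r p | (p, rs) <- asgn, r <- rs ]

  -- parseReviewers [("paper1", ["alice", "bob"])]
  --   == [Reviewer "alice" "paper1", Reviewer "bob" "paper1"]

In Aglet the analogous list is a first-class context, so it can be acquired wholesale as the loop's precondition.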
3. Implementation

Our Agda implementation consists of about 1400 lines of code. We have also written about 1800 lines of example code in the embedded language, including policies, monadic interfaces to primitives, and example programs. In this section, we describe the implementation of the logic, the theorem prover, and the indexed monad.

3.1 Representing BL0

BL0 [21] extends first-order intuitionistic logic with the modality k says A. While a variety of definitions of says have been studied (Abadi [2] overviews some of the approaches), in BL0, says is treated as a necessitation (□) modality, and not as a lax modality (i.e., a monad) [1, 8, 22, 24]. The definition of says in BL0 supports exclusive delegation, where a principal delegates responsibility for a proposition to another principal, without retaining the ability to assert that proposition himself. For example, consider a policy payroll says ∀t.(HR says employee(t)) ⊃ MayBePaid(t). Under what circumstances can we conclude payroll says MayBePaid(Alice)? The fact that HR says employee(Alice) should be sufficient. However, the fact that payroll says employee(Alice) should not, as the intention of the policy is that payroll delegates responsibility for the employee predicate to human resources, without retaining the ability to assert employee instances itself. When says is treated as a lax modality, payroll says employee(Alice) implies payroll says (HR says employee(Alice)), which is enough to conclude the goal. Abstractly, we wish k says A to imply k′ says (k says A), but not k says (k′ says A). The modality satisfies several other axioms: for example, principals say all consequences of the statements they have made (k says (p ⊃ q) entails (k says p ⊃ k says q)), and principals believe that what they say is true (k says ((k says s) ⊃ s)).
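To make the intended reading of exclusive delegation concrete, the derivation the policy is meant to license is (in our informal rendering, not the paper's notation):

\[
\frac{\mathsf{payroll}\ \mathbf{says}\ \big(\forall t.\,(\mathsf{HR}\ \mathbf{says}\ \mathsf{employee}(t)) \supset \mathsf{MayBePaid}(t)\big)
\qquad
\mathsf{HR}\ \mathbf{says}\ \mathsf{employee}(\mathsf{Alice})}
{\mathsf{payroll}\ \mathbf{says}\ \mathsf{MayBePaid}(\mathsf{Alice})}
\]

whereas replacing the second premise by payroll says employee(Alice) must not suffice, since payroll has delegated the employee predicate away.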
3.1.1 Terms, Types, and Atomic Propositions

In the above examples, we used a variety of atomic propositions (Mayread, Owns, etc.), which refer to several datatypes (principals, papers, conference phases, etc.). We have parametrized the representation of BL0 and its theorem prover over such datatypes and atomic propositions by defining a generic datatype of first-order terms, with free variables, over a given signature. This allows us to specify the types, terms, and propositions for an example concisely, while exploiting a datatype-generic definition of weakening, substitution, etc., which are necessary to state the inference rules of the logic. The following excerpt from the signature for ConfRM illustrates what programmers write to define an individual example:

  data BaseType : Set where
    string paper role action phase principal : BaseType

  data Const : BaseType -> Set where
    Prin  : String -> Const principal
    Paper : String -> Const paper
    PCChair Reviewer Author Public : Const role
    Init Presubmission Submission ... : Const phase

  data Func : BaseType -> Type -> Set where
    Review BeAssigned ... : Func action (paper)
    Progress : Func action (phase ⊗ phase)

  data Atom : Type -> Set where
    InPhase : Atom (phase)
    Assigned ... : Atom (principal ⊗ paper)
    May : Atom (principal ⊗ action)
    As : Atom (principal)

The programmer defines a datatype of base types, a datatype giving constants of each type, a datatype of function symbols, and a datatype of atomic propositions over a given type. Additionally, the programmer must define a couple of operations on these types (equality, enumeration of all elements of a finite type) which in a future version of Agda could be generated automatically [5]. Types are BaseTypes, unit, and pair types (τ1 ⊗ τ2). The terms over a signature are given by a datatype Term Ω τ, where Ω, an individual context (ICtx), represents the free variables of the term. An ICtx is a list of BaseTypes, and represents a context of individual variables—e.g., the context x1 : τ1, . . ., xn : τn will be represented by the list τ1 :: ... :: τn :: []. Variables are represented by well-scoped de Bruijn indices, which are pointers into such a list—i0 says x ∈ (x :: l), and iS says that x ∈ (y :: l) if x ∈ l. Terms are either variables (. i), where i : τ ∈ Ω is a de Bruijn index, constants, applications of function symbols (f · t), or [] and (t1 , t2) for unit and product types. Atomic propositions are represented by a datatype Aprop Ω. An atomic proposition p · t consists of an Atom paired with a term of the appropriate type. We have defined weakening and substitution generically on terms and propositions, and proved several properties of them (e.g., functoriality of weakening).
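The well-scoped de Bruijn representation carries over to other dependently typed settings; a rough Haskell approximation (ours, using type-level lists; all names invented) may help readers unfamiliar with the idiom:

  {-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}

  data Ty = StringT | PaperT | PrincipalT

  -- a variable is a pointer into a type-level list of base types
  data Ix (ctx :: [Ty]) (t :: Ty) where
    I0 :: Ix (t ': ctx) t              -- i0: x ∈ (x :: l)
    IS :: Ix ctx t -> Ix (s ': ctx) t  -- iS: x ∈ (y :: l) if x ∈ l

  data Term (ctx :: [Ty]) (t :: Ty) where
    Var  :: Ix ctx t -> Term ctx t
    Prin :: String -> Term ctx 'PrincipalT  -- one example constant

  -- the second variable of a two-variable context
  x1 :: Term '[ 'PrincipalT, 'PaperT ] 'PaperT
  x1 = Var (IS I0)

Ill-scoped or ill-typed variable references are rejected by the type checker, which is exactly what indexing Term by the context buys.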
3.1.2 Propositions

BL0 propositions include conjunction, disjunction, implication, universal and existential quantification, and the says modality:

  A, B, C ::= P | A ∧ B | A ∨ B | A ⊃ B | ⊤ | ⊥ | ∀x : τ. A | ∃x : τ. A | k says A

In Figure 5, we represent this syntax in Agda. Propositions (Propo) are indexed by a context of free variables, and additionally by a polarity (+ or -), which will be helpful in defining a focused sequent calculus below. Because the syntax of propositions is polarized, there are two injections a- and a+ from atomic propositions Aprop to negative and positive propositions, respectively. Additionally, the shifts ↓ and ↑ include negative propositions into positive ones and vice versa. The remaining datatype constructors correspond to the various ways of forming propositions in the above grammar. For example, the _∧_ constructor takes two terms of type Propo+ Ω and returns a term of type Propo+ Ω. The constructor ∃i (existential quantification over individuals) takes a positive proposition, in a context with one new free variable of type τ, and returns a positive proposition in the original context Ω. We have suppressed the shifts up to this point in the paper for readability. We could suppress shifts in our Agda code by implementing a simple translation that, given an unpolarized proposition and an intended polarization of each atom, computes a polarized proposition with minimal shifts.
  data Propo : Polarity → ICtx → Set where
    _⊃_    : ∀ {Ω} → Propo+ Ω → Propo- Ω → Propo- Ω
    ∀i_    : ∀ {Ω τ} → Propo- (τ :: Ω) → Propo- Ω
    a-     : ∀ {Ω} → Aprop Ω → Propo- Ω
    ↑      : ∀ {Ω} → Propo+ Ω → Propo- Ω
    _∨_    : ∀ {Ω} → Propo+ Ω → Propo+ Ω → Propo+ Ω
    _∧_    : ∀ {Ω} → Propo+ Ω → Propo+ Ω → Propo+ Ω
    ⊥      : ∀ {Ω} → Propo+ Ω
    ⊤      : ∀ {Ω} → Propo+ Ω
    ∃i_    : ∀ {Ω τ} → Propo+ (τ :: Ω) → Propo+ Ω
    _says_ : ∀ {Ω} → Term Ω principal → Propo- Ω → Propo+ Ω
    a+     : ∀ {Ω} → Aprop Ω → Propo+ Ω
    ↓      : ∀ {Ω} → Propo- Ω → Propo+ Ω

  Figure 5. Agda Representation of BL0 Propositions
3.1.3 Proofs

Sequent calculus. Sequents in BL0 have the form Ω; ∆; Γ −→k A. The context Ω gives types to individual variables (e.g., it is extended by ∀), and the context Γ contains propositions that are assumed to be true (e.g., it is extended by ⊃)—these are the standard contexts of first-order logic. The context ∆ contains claims, assumptions of the form k′ claims A; claims is the judgement underlying the says connective [21, 33]. Finally, k, the view of the sequent, is the principal on behalf of whom the inference is made. The rules for says are as follows:

            Ω; ∆; [] −→k A
  ──────────────────────────────── saysR
     Ω; ∆; Γ −→k′ (k says A)

  Ω; ∆, (k claims A); Γ, (k says A) −→k′ C
  ─────────────────────────────────────────── saysL
        Ω; ∆; Γ, (k says A) −→k′ C

  Ω; (∆, k claims A); (Γ, A) −→k′ C     k ≥ k′
  ───────────────────────────────────────────── claimsL
        Ω; (∆, k claims A); Γ −→k′ C

In order to show k says A, one empties the context Γ of true assumptions and reasons on behalf of k with the goal A (rule saysR). It is necessary to empty Γ because the facts in it may depend on claims by the principal k′, which are not valid when reasoning as k. The rule saysL says that if one is reasoning from an assumption k says A, one may proceed using a new assumption that k claims A. Claims are used by the rule claimsL, which allows passage from a claim k claims A to an assumption that A is actually true. This rule makes use of a preorder on principals, and asserts that any statements made by a greater principal are accepted as true by lesser principals.

Focused sequent calculus. To help with defining a proof search procedure, we present BL0 as a weakly-focused sequent calculus. Garg [21] describes both an unfocused sequent calculus and a focused proof system for FHH, a fragment of BL0; here we give a focused sequent calculus for all of BL0. Focusing [6] is a proof-theoretic technique for reducing inessential non-determinism in proof search, by exploiting the fact that one can chain together certain proof steps into larger steps. In the Agda code above, we polarized the syntax of propositions, dividing them into positive and negative classes. Positive propositions, such as disjunction, require choices on the right, but are invertible on the left: a goal C is provable under assumption A+ if and only if it is provable under the left rule's premises. Dually, negative propositions involve choices on the left but are invertible on the right. Weak focusing [34] forces focus (choice) steps of like-polarity connectives to be chained together, but does not force inversion (pattern-matching) steps to be chained together. We use weak, rather than full, focusing because it is slightly easier to represent in Agda, and because it can sometimes lead to shorter proofs if one internalizes the identity principles (which say that A entails A)—though we do not exploit this fact in our current prover.

The polarity of k says A is as follows: A is negative, but k says A itself is positive. As a simple check on this, observe that k says A is invertible on the left—one can always immediately make the claims assumption—but not on the right—because saysR clears the true assumptions. For example, a policy is often of the form k1 says A1, . . ., kn says An, with a goal of the form k0 says B. It is necessary to use claimsL to turn all propositions of the form k says A in Γ into claims in ∆ before using saysR on the goal—if one uses saysR first, the policy would be discarded. This polarization is analogous to □ in Pfenning and Davies [33] and to ! in linear logic [6], which is reasonable given that says is a necessitation modality.

Our sequent calculus has three main judgements:

  • Right focus: Ω; ∆; Γ −→k [A+]
  • Left focus: Ω; ∆; Γ −→k [A−] > C
  • Neutral sequent: Ω; ∆; Γ −→k C−

Here ∆ consists of claims k claims A− and Γ consists of positive propositions. For convenience in the Agda implementation, we break out a one-step left-inversion judgement Ω; ∆; Γ −→k A+ >I C, which applies a left rule to the distinguished proposition A+ and then reverts to a neutral sequent. The rules are a fairly simple integration of the idea of weak focusing [34] with the focusing interpretation of says described above. The interested reader can find the inference rules for these judgements in the extended version of this paper [28].

Agda representation. In Figure 6, we show an excerpt of the Agda representation of this sequent calculus. First, we define a record type for a Ctx, which tuples together the Ω, ∆, Γ, and k parts of a sequent—we write Θ for such a tuple. Γ is represented as a list of propositions; ∆ is represented as a list of pairs of a principal and a proposition, written k claims A; k is a term of type principal. Record fields are selected by writing R.x, where the type of the record is R and the desired field is x (e.g., Ctx.rk selects the principal from a Ctx record). Note that Ctx is a dependent record: the true context, the claims context, and the view can mention the variables bound in the individual context rΩ. We write TCtx+ Ω for List (Propo+ Ω). We define several helper functions on Ctxs: sayCtx clears the Ctx of true propositions and changes the view of the context to its second argument. ictx (not shown) is shorthand for Ctx.rΩ. addTrue and addClaim (not shown) add a true proposition onto Γ or a claim onto ∆, respectively. addVar adds a variable to Ω and weakens the rest of the context. When writing down the calculus on paper, it is obvious that extending Ω does not affect Γ or ∆; any variables bound in Ω will be bound in Ω′ ⊇ Ω. However, in Agda, it is necessary to explicitly coerce F Ω to F Ω′ for type families F dependent on Ω. We have defined weakening functions for many of the types indexed by Ω: terms (weakenTerm), propositions, claims, true contexts (weakenT+), claims contexts (weakenC), and so on. There are four judgements in our weakly-focused sequent calculus; analogously, there are four mutually recursive datatype declarations representing these judgements in Agda, with one datatype constructor for each inference rule. We show the constructors ∀L (for the left focus judgement), ∃L and saysL (for the left inversion judgement), saysR (for the right focus judgement), and claimsL (for the neutral sequent judgement). For the most part, the rules are a straightforward transcription of the sequent calculus rules [28]. In ∀L, the function substlast substitutes a term for the last variable in a proposition; we have implemented substitution for individual variables for each of the syntactic categories. In ∃L, it is necessary to weaken the goal with the new variable, which is tacit in on-paper presentations.
  record Ctx : Set where
    field
      rΩ  : ICtx
      rΓ+ : List (Propo+ rΩ)
      r∆  : List (Term rΩ principal × Propo- rΩ)  -- pairs written (k claims A)
      rk  : Term rΩ principal

  addVar : (θ : Ctx) (τ : Type) → Ctx
  addVar θ τ = record { rΩ  = (τ :: Ctx.rΩ θ)
                      ; rΓ+ = (weakenT+ (Ctx.rΓ+ θ) iS)
                      ; r∆  = (weakenC (Ctx.r∆ θ) iS)
                      ; rk  = (weakenTerm (Ctx.rk θ) iS) }

  sayCtx : (θ : Ctx) (k : Term (Ctx.rΩ θ) principal) → Ctx
  sayCtx θ k = record { rΩ = Ctx.rΩ θ ; rΓ+ = [] ; r∆ = Ctx.r∆ θ ; rk = k }

  mutual
    data _⊢L_>_ : (θ : Ctx) → Propo- (ictx θ) → Propo- (ictx θ) → Set where
      ∀L : ∀ {θ τ A C} (t : Term (ictx θ) τ)
         → θ ⊢L (substlast A t) > C
         → θ ⊢L (∀i_ {ictx θ} {τ} A) > C
      ...

    data _⊢I_>_ : (θ : Ctx) → (Propo+ (ictx θ)) → Propo- (ictx θ) → Set where
      ∃L : ∀ {θ τ A C} → (addTrue (addVar θ τ) A) ⊢ (weakenP C iS)
         → θ ⊢I (∃i τ A) > C
      saysL : ∀ {θ k s C} → addClaim θ (k claims s) ⊢ C
            → θ ⊢I (k says s) > C
      ...

    data _⊢R_ : (θ : Ctx) → Propo+ (ictx θ) → Set where
      saysR : ∀ {θ k A} → (sayCtx θ k) ⊢ A
            → θ ⊢R (k says A)
      ...

    data _⊢_ : (θ : Ctx) → Propo- (ictx θ) → Set where
      claimsL : ∀ {θ k A C} → (k claims A) ∈ Ctx.r∆ θ
              → θ ⊢L A > C
              → k ≥ Ctx.rk θ
              → θ ⊢ C
      ...

  Figure 6. Agda representation of proofs (excerpt)
Properties. Because the sequent calculus is cut-free, consistency of closed proofs is immediate:

  Consistency: For all principals k, there is no derivation of []; []; [] −→k ↑⊥.

Proof: no rule concludes ⊥ in right focus, and in the empty context no left focus or left inversion rules apply. Identity and cut can be proved using the usual syntactic methods, adapting Garg's proof [21] for an unfocused sequent calculus to weak focusing, following Pfenning and Simmons [34].

3.2 Proof Search

We have implemented a simple proof-producing theorem prover for BL0:

  prove : Nat → (θ : Ctx) → (A : Propo- (ictx θ)) → Maybe (θ ⊢ A)

prove takes a depth bound, a context, and a proposition, and attempts to find a proof of θ ⊢ A with at most the given depth. The prover is certified: when the prover succeeds, it returns a proof, which is guaranteed by type checking to be well-formed. When the prover fails, it simply returns None. The prover is implemented in around 200 lines of Agda code. Our prover is quite naïve, but it suffices to prove the examples in this paper. For the most part, the prover backchains over the focusing rules. However, whereas the above sequent calculus is only weakly focused, the prover is fully focused, in that it eagerly applies invertible rules, which avoids backtracking over different applications of them. If the goal is right-invertible, the prover applies right rules. Once the goal is not right-invertible (an atom or a shift ↑A+), the prover fully left-inverts all of the assumptions in Γ. Inverting a context Γ breaks up the positive propositions using left rules, generating a list of non-invertible contexts Θ1, ..., Θk such that, if Θi ⊢ C for every i, then Θ ⊢ C. Once the sequent has been fully inverted, the prover tries right-focusing (if the goal is a shift ↑A+) and left-focusing on all assumptions in Γ and claims in ∆, until one of these choices succeeds. The focus phases involve further backtracking over choices (e.g., which branch of a disjunction to take). The focus rules for quantifiers (∀E and ∃I) require guessing an instantiation of the quantifier. Our current implementation is brute-force: it simply computes all terms of a given type in a given context and tries each of them in turn—we have only considered individual types with finitely many inhabitants. The prover achieves tolerable compile times on the small examples we have considered so far (1 to 13 seconds). If it proves too slow for some examples, we have several options. First, we can improve our implementation—e.g., by implementing unification, which will eliminate much of the branching from quantifiers, or by doing a better job of clause selection. Second, we could connect Agda with an external theorem prover, following Kariso [25]. Garg has implemented a theorem prover for BL0 in ML [21], which we could integrate soundly by writing a type checker for the certificates it produces. Third, we could optimize Agda itself, by fixing some known inefficiencies in Agda's compile-time evaluation.
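The flavor of depth-bounded, proof-producing backward search can be conveyed by a toy Haskell prover for a tiny implicational fragment (ours; the paper's prover covers all of BL0 with polarities and focusing, which this sketch omits):

  import Control.Monad (msum)

  data Form  = Atom String | Form :=> Form deriving (Eq, Show)
  data Proof = Init Form | ImpR Form Proof | ImpL Form Proof Proof deriving Show

  -- prove n hyps goal: search for a derivation of depth at most n
  prove :: Int -> [Form] -> Form -> Maybe Proof
  prove 0 _ _ = Nothing
  prove n g (a :=> b) = ImpR a <$> prove (n - 1) (a : g) b  -- invertible rule first
  prove n g c
    | c `elem` g = Just (Init c)                            -- init
    | otherwise  = msum [ ImpL f <$> prove (n - 1) g a      -- focus on an implication
                                 <*> prove (n - 1) (b : g) c
                        | f@(a :=> b) <- g ]

As in the paper's prover, a successful search returns an actual proof object, and failure is an ordinary Nothing rather than an error.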
3.3 Computations

The monadic interfaces presented in Section 2 are currently treated as refinement types on Haskell's IO monad, which is exposed through the Agda foreign function interface. The implementations of proof-carrying file operations simply ignore their proof arguments. fix is compiled using general recursion in Haskell. In this operational model, programs written in Aglet adhere to the security policies, but no guarantees are made about programs that can access, e.g., the raw file system operations. We discuss alternatives in Section 5 below.
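A minimal sketch of this operational model in Haskell (ours; Cmd, Proof, and readFileAs are invented names) is a phantom-indexed wrapper around IO whose proof-carrying primitives discard their proofs at run time:

  -- pre and post are phantom indices; at run time everything is plain IO
  newtype Cmd pre a post = Cmd { runCmd :: IO a }

  data Proof = Proof  -- stand-in for a real derivation

  -- a proof-carrying primitive whose implementation ignores the proof
  readFileAs :: Proof -> FilePath -> Cmd pre String post
  readFileAs _proof path = Cmd (readFile path)

The static indices do all the policy enforcement; erasing them (and the proofs) at run time is what makes the guarantee apply only to well-typed programs.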
4. Related Work

Aglet implements security-typed programming in the style of Aura [24], PCML5 [9], Fine [38], and previous work by Avijit and Harper [8] (henceforth AH), which integrate authorization logics into functional/imperative programming languages. Our main contribution relative to these languages is to show how to support security-typed programming within an existing dependently-typed language. There are also some technical differences between these languages and ours:

First, Aura, PCML5, and AH interpret says as a lax modality, whereas BL0 interprets it as a necessitation modality to support exclusive delegation; Fine uses first-order classical logic and does not directly support the says modality. The context-clearing necessitation modality is more challenging to represent than a lax modality. Second, unlike these four languages, our language treats propositions and proofs as inductively defined data, which has several applications: in Aura, all proof-carrying primitives log the supplied proofs for later audit; the programmer could implement logged operations on top of our existing interface by writing a function toString : Proof Γ A -> String by recursion over proofs. Recursion over propositions is also essential for writing our theorem prover inside of Agda. Third, our indexed monad of computations allows us to encode computation on behalf of a principal, following AH. In Aura, all computation proceeds on behalf of a single distinguished principal self. In PCML5, a program can authenticate as different principals, but the credentials are less precise: in PCML5, the program authenticates as k, whereas in AH the program acquires only the ability to su from a given k′ to k—which may be a useful restriction if the program is subsequently no longer running as k′. Fine does not track authentication as a primitive notion, though it seems likely it could be encoded using an As predicate and affine types. Fourth, in PCML5, acquire uses theorem proving to deduce consequences of the policy, whereas in our language acquire only tests whether a state-dependent atom or a statement by a principal is literally in the policy, and a separate theorem prover deduces consequences from the policy. We separate theorem proving from acquire so that we may also use the same theorem prover at compile-time to statically discharge proof obligations. PCML5 and AH make use of a theorem prover only at run-time, whereas Fine uses theorem proving only at compile-time. Fifth, PCML5 is a language for spatially distributed authorization, where resources and policies are located at different sites on a network. We have shown how to support ML5-style spatial distribution using our indexed monad, but we leave spatial distribution of policies to future work. Sixth, the operational semantics of both PCML5 and AH include a proof-checking reference monitor; we have not yet considered such an implementation.

Several other languages provide support for verifying security properties by type checking. For example, Fournet et al. [19] develop a type system for a process calculus, and Bengtson et al. [11] for F#, both of which can be used to verify authorization policies and cryptographic protocols. This work addresses important issues of concurrency, which we do not consider here. A technical difference is that, in their work, proofs are kept behind the scenes (e.g., in F7, propositions are proved by the Z3 theorem prover). In contrast, our language makes the proof theory directly available to the programmer, so that propositions and proofs can be computed with (for logging or run-time theorem proving) and so that proofs can be constructed manually when a theorem prover fails. Another example of a language that does not give the programmer direct access to the proof theory is PCAL [13], an extension of BASH that constructs the proofs required by a proof-carrying file system [23]; proof construction is entirely automated, but sometimes inserts run-time checks.

Our indexed monad was inspired by HTT [30]. RIF [12] also investigates applications of indexed monads to security-typed programming, but there are some technical differences. First, RIF is a new language where refinement types (using first-order classical logic) and a refined state monad are primitive notions, whereas we embed an authorization logic and an indexed monad in an existing dependently typed language. Second, RIF's monad is indexed by predicates on an explicit representation of the system state, whereas we index by policies Γ that describe an implicit ambient state.

Many security-typed languages address the problem of enforcing information flow policies (see Abadi et al. [4] and Chothia et al. [15] for but a couple of examples). We follow Russo et al. [36] and Swamy et al. [38] in representing information flow using an abstract type constructor (e.g., a monad or an applicative functor). Fable [37] takes a different approach to verifying access-control, information-flow, and integrity properties, by providing a type of labelled data that is treated abstractly outside of certain policy portions of the program. This mechanism facilitates checking security properties (by choosing the labels appropriately and implementing policy functions) and proving bisimulation properties of the programs that adhere to these policies. DeYoung and Pfenning [16] describe a technique for representing access control policies and stateful operations in a linear authorization logic. Our approach to verifying context invariants, as in Section 2.1.4, is inspired by their work.

The literature describes a growing body of authorization logics [1–3, 17, 20, 21]. We chose BL0 [21], a simple logic that supports the expression of decentralized policies and whose says connective permits exclusive delegation. Appel and Felten [7] pioneered the use of proof-carrying authorization (PCA), in which a system checks authorization proofs at run-time. Several systems have been built using PCA [10, 23, 40]. Like many security-typed languages, we use dependently typed PCA to check authorization proofs at compile-time through type checking.
5. Conclusion
In this paper, we have described Aglet, a library embedding security-typed programming in a dependently-typed programming language. There are many interesting avenues for future work: First, we may consider embedding an authorization logic such as full BL [20] that accounts for resources that change over time. Second, we have currently implemented the monadic computation interface on top of unguarded Haskell IO commands, which provides security guarantees for well-typed programs. To maintain security in the presence of ill-typed attackers, we may instead implement our interface using a proof-carrying run-time system such as PCFS [23]. Following PCML5 [9], we may then be able to prove a progress theorem showing that well-typed programs always pass the reference monitor. Another intriguing possibility is to formalize the operational behavior of computations directly within Agda— e.g. using an algebraic axiomatization [35]. Third, in this paper we have shown examples of entirely static and entirely dynamic verification; we would like to consider examples that mix the two. This will require using reflection to represent Agda judgements as data, so that our theorem prover does not get stuck on open Agda terms. Fourth, we have shown a few small examples of using Agda to reason about the class of contexts that is possible given a particular monadic interface. In future work, we would like to explore ways of systematizing this reasoning (e.g., by using linear logic to describe transformations between contexts, as in DeYoung and Pfenning [16]). We would also like to use Agda to analyze global properties of a particular monadic interface (such as proving a principal can never access a resource). Once we have circumscribed the contexts generated by a particular interface, we can prove such properties by induction on BL0 proofs. Fifth, we would like to implement more significant examples, such as a larger portion of ConfRM. Acknowledgements We thank Frank Pfenning, Robert Harper, Kumar Avijit, Deepak Garg, and Rob Simmons for helpful discussions about this work. We thank Frank Pfenning, Robert Harper, and several anonymous referees for feedback on previous drafts of this article.
References
[1] M. Abadi. Access control in a core calculus of dependency. In International Conference on Functional Programming, 2006.
[2] M. Abadi. Variations in access control logic. In International Conference on Deontic Logic in Computer Science, pages 96–109. Springer-Verlag, 2008.
[3] M. Abadi, M. Burrows, B. Lampson, and G. Plotkin. A calculus for access control in distributed systems. ACM Transactions on Programming Languages and Systems, 15(4):706–734, September 1993.
[4] M. Abadi, A. Banerjee, N. Heintze, and J. G. Riecke. A core calculus of dependency. In ACM Symposium on Principles of Programming Languages, pages 147–160. ACM Press, 1999.
[5] T. Altenkirch and C. McBride. Generic programming within dependently typed programming. In IFIP TC2 Working Conference on Generic Programming, Schloss Dagstuhl, 2003.
[6] J.-M. Andreoli. Logic programming with focusing proofs in linear logic. Journal of Logic and Computation, 2(3):297–347, 1992.
[7] A. W. Appel and E. W. Felten. Proof-carrying authentication. In ACM Conference on Computer and Communications Security, pages 52–62, 1999.
[8] K. Avijit and R. Harper. A language for access control. Technical Report CMU-CS-07-140, Carnegie Mellon University, Computer Science Department, 2007.
[9] K. Avijit, A. Datta, and R. Harper. Distributed programming with distributed authorization. In ACM SIGPLAN-SIGACT Symposium on Types in Language Design and Implementation, 2010.
[10] L. Bauer, S. Garriss, J. M. McCune, M. K. Reiter, J. Rouse, and P. Rutenbar. Device-enabled authorization in the Grey System. In Proceedings of the 8th Information Security Conference, pages 431–445. Springer-Verlag LNCS, 2005.
[11] J. Bengtson, K. Bhargavan, C. Fournet, A. Gordon, and S. Maffeis. Refinement types for secure implementations. In Computer Science Logic, 2008.
[12] J. Borgström, A. D. Gordon, and R. Pucella. Roles, Stacks, Histories: A Triple for Hoare. Technical Report MSR-TR-2009-97, Microsoft Research, 2009.
[13] A. Chaudhuri and D. Garg. PCAL: Language support for proof-carrying authorization systems. In Proceedings of the 14th European Symposium on Research in Computer Security, September 2009.
[14] S. Chong, A. C. Myers, K. Vikram, and L. Zheng. Jif reference manual. Available from http://www.cs.cornell.edu/jif/doc/jif-3.3.0/manual.html, February 2009.
[15] T. Chothia, D. Duggan, and J. Vitek. Type-based distributed access control (extended abstract). In Computer Security Foundations Workshop, 2003.
[16] H. DeYoung and F. Pfenning. Reasoning about the consequences of authorization policies in a linear epistemic logic. In Workshop on Foundations of Computer Security, 2009.
[17] H. DeYoung, D. Garg, and F. Pfenning. An authorization logic with explicit time. In IEEE Computer Security Foundations Symposium, 2008.
[18] D. J. Dougherty, K. Fisler, and S. Krishnamurthi. Specifying and reasoning about dynamic access-control policies. In International Joint Conference on Automated Reasoning, pages 632–646. Springer, 2006.
[19] C. Fournet, A. D. Gordon, and S. Maffeis. A type discipline for authorization in distributed systems. In Computer Science Logic, 2007.
[20] D. Garg. Proof Theory for Authorization Logic and its Application to a Practical File System. PhD thesis, Carnegie Mellon University, 2009.
[21] D. Garg. Proof search in an authorization logic. Technical Report CMU-CS-09-121, Computer Science Department, Carnegie Mellon University, April 2009.
[22] D. Garg and F. Pfenning. Non-interference in constructive authorization logic. In Computer Security Foundations Workshop, pages 183–293, 2006.
[23] D. Garg and F. Pfenning. PCFS: A proof-carrying file system. Technical Report CMU-CS-09-123, Carnegie Mellon University, 2009.
[24] L. Jia, J. A. Vaughan, K. Mazurak, J. Zhao, L. Zarko, J. Schorr, and S. Zdancewic. Aura: A programming language for authorization and audit. In ACM SIGPLAN International Conference on Functional Programming, 2008.
[25] K. Kariso. Integrating Agda and automated theorem proving techniques. Talk at Dependently Typed Programming Workshop, 2010.
[26] S. Krishnamurthi. The CONTINUE server (or, How I administered PADL 2002 and 2003). In International Symposium on Practical Aspects of Declarative Languages, pages 2–16. Springer-Verlag, 2003.
[27] D. R. Licata and R. Harper. A monadic formalization of ML5. In Pre-proceedings of the Workshop on Logical Frameworks and Meta-languages: Theory and Practice, July 2010.
[28] J. Morgenstern and D. R. Licata. Security-typed programming within dependently typed programming. Technical Report CMU-CS-10-114, Carnegie Mellon University, 2010.
[29] T. Murphy, VII. Modal Types for Mobile Code. PhD thesis, Carnegie Mellon University, January 2008. Available as technical report CMU-CS-08-126.
[30] A. Nanevski, G. Morrisett, and L. Birkedal. Polymorphism and separation in Hoare Type Theory. In ACM SIGPLAN International Conference on Functional Programming, pages 62–73, Portland, Oregon, 2006.
[31] A. Nanevski, G. Morrisett, A. Shinnar, P. Govereau, and L. Birkedal. Ynot: Reasoning with the awkward squad. In ACM SIGPLAN International Conference on Functional Programming, 2008.
[32] U. Norell. Towards a practical programming language based on dependent type theory. PhD thesis, Chalmers University of Technology, 2007.
[33] F. Pfenning and R. Davies. A judgmental reconstruction of modal logic. Mathematical Structures in Computer Science, 11:511–540, 2001.
[34] F. Pfenning and R. J. Simmons. Substructural operational semantics as ordered logic programming. In IEEE Symposium on Logic in Computer Science, pages 101–110, Los Alamitos, CA, USA, September 2009. IEEE Computer Society.
[35] G. Plotkin and M. Pretnar. Handlers of algebraic effects. In European Symposium on Programming, pages 80–94. Springer-Verlag, 2009.
[36] A. Russo, K. Claessen, and J. Hughes. A library for light-weight information-flow security in Haskell. In ACM SIGPLAN Symposium on Haskell, pages 13–24. ACM, 2008.
[37] N. Swamy, B. J. Corcoran, and M. Hicks. Fable: A language for enforcing user-defined security policies. In IEEE Symposium on Security and Privacy, pages 369–383. IEEE Computer Society, 2008.
[38] N. Swamy, J. Chen, and R. Chugh. Enforcing stateful authorization and information flow policies in Fine. In European Symposium on Programming, 2010.
[39] J. A. Vaughan, L. Jia, K. Mazurak, and S. Zdancewic. Evidence-based audit. In IEEE Computer Security Foundations Symposium, June 2008.
[40] E. Wobber, M. Abadi, M. Burrows, and B. Lampson. Authentication in the Taos operating system. ACM Transactions on Computer Systems, 12(1):3–32, 1994.
Combining Syntactic and Semantic Bidirectionalization

Janis Voigtländer∗
University of Bonn, Römerstraße 164, 53117 Bonn, Germany
[email protected]

Zhenjiang Hu
National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
[email protected]

Kazutaka Matsuda
Tohoku University, 6-3-09 Aramaki aza Aoba, Aoba-ku, Sendai 980-8579, Japan
[email protected]

Meng Wang
University of Oxford, Wolfson Building, Parks Road, Oxford OX1 3QD, United Kingdom
[email protected]
Abstract

Matsuda et al. [2007, ICFP] and Voigtländer [2009, POPL] introduced two techniques that, given a source-to-view function, provide an update propagation function mapping an original source and an updated view back to an updated source, subject to standard consistency conditions. Being fundamentally different in approach, both techniques have their respective strengths and weaknesses. Here we develop a synthesis of the two techniques to good effect. On the intersection of their applicability domains we achieve more than what a simple union of applying the techniques side by side delivers.

Categories and Subject Descriptors D.1.1 [Programming Techniques]: Applicative (Functional) Programming; D.3.3 [Programming Languages]: Language Constructs and Features—Data types and structures, Polymorphism; H.2.3 [Database Management]: Languages—Data manipulation languages, Query languages

General Terms Design, Languages

Keywords program transformation, view-update problem
1. Introduction
Bidirectionalization is the task of, given some function get :: τ1 → τ2, producing a function put :: τ1 → τ2 → τ1 such that if get maps an original source s to an original view v, and v is somehow changed into an updated view v′, then put applied to s and v′ produces an updated source s′ in a meaningful way. Such get/put-pairs, called bidirectional transformations, play an important role in various application areas such as databases, file synchronization, structured editing, and model transformation. A survey of relevant techniques and open problems has recently appeared [Czarnecki et al. 2009], and functional programming approaches have had an important impact, with several ideas and solutions springing from this part of the programming languages field in particular [Bohannon et al. 2006, 2008; Foster et al. 2007, 2008; Hu et al. 2004; Matsuda et al. 2007, 2009; Voigtländer 2009]. Automatic bidirectionalization is one approach to obtaining suitable get/put-pairs; others are domain-specific languages or more ad-hoc programming techniques. Two different flavors of bidirectionalization have been proposed: syntactic and semantic.

∗ The research reported here was performed while this author visited the National Institute of Informatics, Tokyo, under a fellowship by the Japan Society for the Promotion of Science, ID No. PE09076.

Syntactic bidirectionalization [Matsuda et al. 2007] works on a syntactic representation of (somehow restricted) get-functions and synthesizes appropriate definitions for put-functions algorithmically. Semantic bidirectionalization [Voigtländer 2009] does not inspect the syntactic definitions of get-functions at all, but instead provides a single definition of put, parameterized over get as a semantic object, that does the job by invoking get in a kind of "simulation mode". (We will briefly introduce both techniques in Section 2.) Both syntactic and semantic bidirectionalization have their strengths and weaknesses. Syntactic bidirectionalization heavily depends on syntactic restraints exercised when implementing the get-function. Basically, the technique of Matsuda et al. [2007] can only deal with programs in a custom first-order language subject to linearity restrictions and absence of intermediate results between function calls. Semantic bidirectionalization, in contrast, provides very easy access to bidirectionality within a general-purpose language, liberated from the syntactic corset as to how to write the functions of interest. The price to pay for this, in the case of the approach of Voigtländer [2009], is that it works for polymorphic functions only, and at present is unable to deal with view updates that change the shape of a data structure (more on this critical issue below). The syntactic approach, on the other hand, is successful for many such shape-changing updates, and can deal with non-polymorphic functions.

In this paper we develop an approach for combining syntactic and semantic bidirectionalization. The resulting technique inherits the limitations in program coverage from both techniques. That is, except for some extensions we will consider later on, only functions that are written in the first-order language, are linear and treeless in the sense of Wadler [1990], and moreover are polymorphic, can be dealt with. What we gain by the combination is improved updatability. Not only do we bring the possibility of shape-changing updates to semantic bidirectionalization, but the combined technique will also be superior to syntactic bidirectionalization on its own in many cases. To explain what we mean by improved updatability, we have to elaborate on the phrase "in a meaningful way" in the first sentence of this introduction, and on "suitable" at the start of the second paragraph. So, when is a get/put-pair "good"? How should s, v, v′, and s′ in get s ≡ v and put s v′ ≡ s′ be related? One natural requirement is that if v ≡ v′, then s ≡ s′, or, put differently,
  put s (get s) ≡ s .    (1)

Another requirement to expect is that s′ and v′ should be related in the same way as s and v are, or, again expressed as a round-trip property,

  get (put s v′) ≡ v′ .    (2)

These are the standard consistency conditions [Bancilhon and Spyratos 1981], known as GetPut and PutGet [Foster et al. 2007].
But the latter of the two is often too hard to satisfy in practice. For fixed get, it can be impossible to provide a put-function fulfilling equation (2) for every choice of s and v′, simply because v′ may not even be in the range of get. One solution is to make the put-function partial and to expect the PutGet law to hold only in case put s v′ is actually defined. Of course, a trivially consistent put-function we could then always come up with is the one for which put s v′ is only defined if get s ≡ v′, and which simply returns s then. Clearly, this choice would satisfy both equations (1) and (2), but would be utterly useless in terms of updatability. The very idea that v and v′ can be different in the original scenario would be countermanded. So our evaluation criteria for "goodness" are that get/put should satisfy equation (1), that they should satisfy equation (2) whenever put s v′ is defined, and that put s v′ should actually be defined on a big part of its potential domain, indeed preferably for all s and v′ of appropriate type. With this measure in hand, one can compare different bidirectionalization methods. Semantic bidirectionalization as proposed by Voigtländer [2009] has the problem that put s v′ can only be defined when get s and v′ have the same shape (length of a list, structure of a tree, . . . , and in some situations even with constraints on the equivalence and relative ordering of elements in data structures). Syntactic bidirectionalization as proposed by Matsuda et al. [2007] does not suffer from such a central and common (to all invocations) updatability weakness, but in many cases also rejects updates that one would really like to see accepted. The benefit of our combined technique now is that on the intersection of the classes of programs to which the original syntactic and semantic techniques apply, we can do strictly better in terms of updatability than either technique in isolation. We are never worse than the better of the two in a specific case.

The combination strategy we pursue is essentially motivated by combining the specialties of the two approaches. Semantic bidirectionalization's specialty is to employ polymorphism to deal with the content elements of data structures in a very lightweight way. In fact, in the original technique, the shape and content aspects of a data structure are completely separated: updates affecting the shape are completely outlawed, arbitrary updates to content elements can be simply absorbed, and by recombining the original shape with updated content, consistency is guaranteed. Syntactic bidirectionalization's specialty is to have a more refined, case-by-case notion of what updates, including updates on the shape aspect, can be permitted. But it turns out that content elements often get in the way. In fact, by having to deal with both shape and content, at the same time, in the key step of syntactic bidirectionalization (namely "view complement derivation"), updatability is hampered. In our combined approach we divide the labor: semantic bidirectionalization deals with content only, syntactic bidirectionalization deals with shape only. As a result, the reach of semantic bidirectionalization is expanded beyond shape-preserving updates, and syntactic bidirectionalization is invoked on a more specialized kind of programs, on which it can yield better results, benefitting both. Technically, we treat syntactic bidirectionalization as a black box. Or rather, our eventual combined technique does so; for the sake of analyzing examples, we look into the box, but for actually executing the combined technique the syntactic technique could be a completely external component. Semantic bidirectionalization is treated as a glass box: we do look into it, and we refactor it to enable the plugging in of the syntactic technique. Indeed, our dissection of the semantic bidirectionalization technique is an independent contribution of this paper, beyond the specific use case of combining the techniques of Matsuda et al. [2007] and Voigtländer [2009]. In principle, our refactoring also allows other approaches (than that of Matsuda et al.) for obtaining bidirectional transformations on shapes to be plugged into the semantic technique.

Since our purpose here is to focus on the combination of techniques, we concentrate on one specific kind of functions, namely functions from lists to lists. The original techniques we combine apply to algebraic data types more generally. In particular, Voigtländer [2009, Section 6] employs generic programming techniques to deal with trees and the like. Something similar should be possible here, but we have not worked out the details. Our key ideas can all be explained, and hopefully appreciated, in the setting of lists only, and that explanation is what we seek to do. For the same reason, we do not consider type classes as Voigtländer [2009, Sections 4 and 5] does; again, we think our ideas here could be transferred to those settings, but we refrain from doing so for the sake of focus. Our presentation will be partly example-driven, partly program-driven, as we proceed through the refactoring and discovery process regarding generalization opportunities. We do state lemmas and theorems, but do not give formal proofs. These proofs can all be done similarly to those by Voigtländer [2009], employing free theorems [Wadler 1989]. We will comment in a bit more detail where appropriate. As a final preparation before diving right in, we slightly revise the consistency conditions (1) and (2). Since our emphasis is on the updatability inherent in a get/put-pair, we make the partiality of put explicit in the type via optionality of the return value. The following definition formulates the consistency conditions for this setting.
Definition 1. Let τ1 and τ2 be types. Let functions get :: τ1 → τ2 and put :: τ1 → τ2 → Maybe τ1 be given. We say that put is consistent for get if:

• For every s :: τ1,
    put s (get s) ≡ Just s .
• For every s, s′ :: τ1 and v′ :: τ2, if put s v′ ≡ Just s′, then
    get s′ ≡ v′ .
2. The Original Techniques

We briefly introduce the two techniques we want to combine. Readers content with considering syntactic bidirectionalization as a black box can safely skip the next subsection and jump directly to Section 2.2. The combination approach can still be understood then, but it will be more difficult to appreciate some of the analysis of examples later on.

2.1 Syntactic Bidirectionalization
The technique of Matsuda et al. [2007] builds on the constant-complement approach of Bancilhon and Spyratos [1981]. The basic idea is that for a function get :: τ1 → τ2 one finds a function compl :: τ1 → τ3 such that the pairing of the two,

  paired :: τ1 → (τ2, τ3)
  paired s = (get s, compl s)

is an injective function. Given an inverse inv :: (τ2, τ3) → τ1 of paired, one obtains that

  put :: τ1 → τ2 → τ1
  put s v′ = inv (v′, compl s)

makes equations (1) and (2) true.
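A tiny concrete instance (our illustration, not from the paper) is get = fst on pairs: the second component is a complement, and put rebuilds the source by re-pairing the updated view with the kept complement.

  get :: (a, b) -> a
  get = fst

  compl :: (a, b) -> b
  compl = snd

  put :: (a, b) -> a -> (a, b)
  put s v' = (v', compl s)   -- inv just re-pairs view and complement
  -- GetPut: put s (get s) == s;  PutGet: get (put s v') == v'

Here paired is the identity on pairs, which is trivially injective; the interesting work in the general technique is deriving a compl that makes paired injective for an arbitrary get.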
In reality, asking for a full inverse inv of paired is too much. The function paired may not even be surjective. So one relaxes inv to be a partial function, either implicitly as Matsuda et al. [2007] do, or explicitly in the type. With

inv :: (τ2, τ3) → Maybe τ1

and the requirements that

• for every s :: τ1, inv (paired s) ≡ Just s, and
• for every s′ :: τ1, v′ :: τ2, and c :: τ3, if inv (v′, c) ≡ Just s′, then paired s′ ≡ (v′, c),

we obtain that

put :: τ1 → τ2 → Maybe τ1
put s v′ = inv (v′, compl s)

is consistent for get in the sense of Definition 1.

The approach of Matsuda et al. [2007] is to perform all the above by syntactic program transformations. For a certain class of programs, they give an algorithm that automatically derives compl from get in such a way that paired is indeed injective. Then, instead of the definition for paired above, they produce one using a tupling transformation [Pettorossi 1977] that avoids the two independent traversals of s with get and compl. They syntactically invert paired to obtain inv, and subsequently fuse the computations of inv and compl in the definition of put, again using a syntactic transformation [Wadler 1990]. We illustrate the syntactic approach based on two examples. A generalization over the above picture is that instead of Maybe we will use an arbitrary monad. This allows for more flexible use of the resulting put-function, and also enables us to provide informative error messages if desired.

Example 1. Assume our get-function is as follows, sieving a list to keep only every second element:

get1 :: [α] → [α]
get1 [] = []
get1 [x] = []
get1 (x : y : zs) = y : (get1 zs)

This function fulfills the syntactic prerequisites imposed by Matsuda et al. [2007], which are (necessary¹ and sufficient): functions must be first-order, must be linear (no variable occurs more than once in a single right-hand side), and no function call may have anything other than variables in its arguments. Given the above, the following complement function is automatically derived:

data Compl α = C1 | C2 α | C3 α (Compl α)

compl :: [α] → Compl α
compl [] = C1
compl [x] = C2 x
compl (x : y : zs) = C3 x (compl zs)

(Matsuda et al. work in an untyped language, so they have no need to explicitly introduce the data type Compl, but as we formulate our ideas in Haskell, we will be careful to introduce appropriate types as we go along.) The basic ideas for the derivation of compl are that variables dropped when going from left to right in a defining equation of get are collected by compl, and that, where necessary, different data constructors (of same arity/type) are used on the right-hand sides of compl to disambiguate between overlapping ranges of right-hand sides of get. (In this specific example, this is not what causes different data constructors to be used. Instead, the simple fact that different arities are required, due to different numbers of dropped variables and recursive calls, leads to different data constructors.) Tupling gives the following definition for the paired function:

paired :: [α] → ([α], Compl α)
paired [] = ([], C1)
paired [x] = ([], C2 x)
paired (x : y : zs) = (y : v, C3 x c)
  where (v, c) = paired zs

Syntactic inversion, basically just exchanging left- and right-hand sides, plus introduction of monadic error propagation, gives:

inv :: Monad µ ⇒ ([α], Compl α) → µ [α]
inv ([], C1) = return []
inv ([], C2 x) = return [x]
inv (y : v, C3 x c) = do zs ← inv (v, c)
                         return (x : y : zs)
inv _ = fail "Update violates complement."

Finally,

put :: Monad µ ⇒ [α] → [α] → µ [α]
put s v′ = inv (v′, compl s)

can be fused to:

put :: Monad µ ⇒ [α] → [α] → µ [α]
put [] [] = return []
put [x] [] = return [x]
put (x : y : zs) (y′ : v′) = do zs′ ← put zs v′
                                return (x : y′ : zs′)
put _ _ = fail "Update violates complement."

Note that for this function, put s v′ fails if and only if length v′ ≠ length (get1 s). If it succeeds, it mixes the elements of s and v′ as in, e.g., fromJust (put [1 .. 6] [7 .. 9]) = [1, 7, 3, 8, 5, 9].

An implementation of the syntactic bidirectionalization method is available at http://www.kb.ecei.tohoku.ac.jp/~kztk/bidirectionalization/. It automatically performs the steps from get to compl and paired. It also performs the syntactic inversion from paired to inv, though without the explicit monadic error propagation we have used here. It is not always the case, as in the above example, that inv can directly be interpreted as a deterministic program. Instead, it can happen that the non-failing equations have overlapping left-hand sides, leading to a nondeterministic program, in which case a backtracking search becomes necessary. Such a backtracking search is what the implementation then does, though in practice it would of course be preferable to directly obtain a deterministic program.² Also, the implementation does not at present realize the final fusion step, but instead works with the definition of put in terms of inv and compl. Clearly, these "deficiencies" of the implementation only affect the efficiency of the bidirectional transformation, not its correctness/consistency. For the above and the following example, we continue to perform ("by hand") the determinization and fusion steps, because the put-function thus obtained typically gives a better picture of the achieved updatability.

¹ At least for the original method of Matsuda et al. [2007]. Later work [Matsuda et al. 2009, in Japanese] relaxes the restrictions somewhat.
² An alternative would be to run the syntactically inverted program in a functional logic language [Antoy and Hanus 2010].
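For readers who want to experiment, the pieces of Example 1 assemble into a small self-contained program. This is our packaging, not the paper's: the fused put-function is renamed put1 and specialized to the Maybe monad (so fail becomes Nothing), and the module scaffolding is ours. The sample results are the ones stated in the text above.

import Data.Maybe (fromJust)

get1 :: [a] -> [a]
get1 []           = []
get1 [x]          = []
get1 (x : y : zs) = y : get1 zs

-- The fused put-function derived above, at the Maybe monad.
put1 :: [a] -> [a] -> Maybe [a]
put1 []           []        = Just []
put1 [x]          []        = Just [x]
put1 (x : y : zs) (y' : v') = do zs' <- put1 zs v'
                                 Just (x : y' : zs')
put1 _            _         = Nothing

-- Sample calls, matching the behavior described in the text:
--   put1 [1 .. 6] [7 .. 9]  ==  Just [1,7,3,8,5,9]
--   put1 "abcd" "xy"        ==  Just "axcy"
--   put1 "abcd" "x"         ==  Nothing   (view length does not match)
main :: IO ()
main = print (fromJust (put1 [1 .. 6 :: Int] [7 .. 9]))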
Example 2. Assume our get-function is as follows, keeping every element of a list except for the last one:³

get2 :: [α] → [α]
get2 [] = []
get2 [x] = []
get2 (x : y : zs) = x : (get′ y zs)

get′ :: α → [α] → [α]
get′ x [] = []
get′ x (y : zs) = x : (get′ y zs)

Then the syntactic approach produces the following complement function:

data Compl α = C1 | C2 α | C3 α

compl :: [α] → Compl α
compl [] = C1
compl [x] = C2 x
compl (x : y : zs) = compl′ y zs

compl′ :: α → [α] → Compl α
compl′ x [] = C3 x
compl′ x (y : zs) = compl′ y zs

Note that there are no data constructors around recursive calls. This omission is possible because no variables are dropped in the respective equations and because automatic range analysis can tell that the right-hand sides of those equations never overlap, for any instantiation of variables, with any other right-hand sides of the same function. Tupling, inversion, and fusion (not spelled out here in detail) ultimately give:

put :: Monad µ ⇒ [α] → [α] → µ [α]
put [] [] = return []
put [x] [] = return [x]
put (x : y : zs) (x′ : v′) = do (y′, zs′) ← put′ y zs v′
                                return (x′ : y′ : zs′)
put _ _ = fail "Update violates complement."

put′ :: Monad µ ⇒ α → [α] → [α] → µ (α, [α])
put′ y [] [] = return (y, [])
put′ y (z : zs) [] = put′ z zs []
put′ y zs (x′ : v′) = do (y′, zs′) ← put′ y zs v′
                         return (x′, y′ : zs′)

The updatability of these functions is that put s v′ succeeds if and only if length v′ and length (get2 s) are equal or both greater than zero. For the latter case, the behavior of put is best understood by observing that the definition of put′ is semantically equivalent (depending on one of the monad laws) to:

put′ :: Monad µ ⇒ α → [α] → [α] → µ (α, [α])
put′ y zs [] = return (last (y : zs), [])
put′ y zs (x′ : v′) = return (x′, v′ ++ [last (y : zs)])

and thus the third defining equation of put is equivalent (again depending on the same monad law) to the following two:

put (x : y : zs) (x′ : []) = return (x′ : [last (y : zs)])
put (x : y : zs) (x′ : y′ : v′) = return (x′ : y′ : v′ ++ [last (y : zs)])

and thus to:

put (x : y : zs) (x′ : v′) = return (x′ : v′ ++ [last (y : zs)])

³ A helper function get′ is used to prevent a function call with an argument that is not a variable.

2.2 Semantic Bidirectionalization

As already mentioned, we will develop our combined bidirectionalization technique only for lists, and only for fully polymorphic functions to bidirectionalize. So from now on, let

get :: [α] → [α]

be fixed but arbitrary (except when discussing concrete examples, of course). The intuition underlying the method of Voigtländer [2009] is that put can gain information about the get-function by applying it to suitable input. The key is that get is polymorphic over the element type α. This entails that its behavior does not depend on any concrete list elements, but only on positional information. And this positional information can be observed explicitly by applying get to ascending lists over integer values. Say get is tail; then every list [0 .. n] is mapped to [1 .. n], which allows put to see that the head element of the original source is absent from the view, hence cannot be affected by an update on the view, and hence should remain unchanged when propagating an updated view back into the source. And this observation can be transferred to other source lists than [0 .. n] just as well, even to lists over non-integer types, thanks to parametric polymorphism [Reynolds 1983; Strachey 1967].

Let us further consider the tail example as in the previous paragraph. First, put should find out to what element in an original source s each element in an updated view v′ corresponds. Assume s has length n + 1. Then by applying tail to the same-length list [0 .. n], put learns that the original view from which v′ was obtained by updating had length n, and also to what element in s each element in that original view corresponded. Being conservative, the current semantic bidirectionalization method will only accept v′ if it has retained that length n. For then, we also know directly the associations between elements in v′ and positions in the original source. Now, to produce the updated source, we can go over all positions in [0 .. n] and fill them with the associated values from v′. For positions for which there is no corresponding value in v′, because these positions were omitted when applying tail to [0 .. n], we can look up the correct value in s rather than in v′. For the concrete example, this will only concern position 0, for which we naturally take over the head element from s.

The same strategy works also for general get. In short, given s, produce a kind of template t = [0 .. n] of the same length, together with an association g between integer values in that template and the corresponding values in s. Then apply get to t and produce a further association h by matching this template view against the updated proper value view v′. Combine the two associations into a single one h′, giving precedence to h whenever an integer template index is found in both h and g. Thus, it is guaranteed that we will only resort to values from the original source s when the corresponding position did not make it into the view, and thus there is no way it could have been affected by the update. Finally, produce an updated source by filling all positions in [0 .. n] with their associated values according to h′.

The above strategy is exactly what Voigtländer [2009] implements for the special case get :: [α] → [α]. We recall the corresponding Haskell definitions, reformulating just a bit:

• Instead of presenting a higher-order bff-function that turns get into put, we directly give a definition of put that refers to a top-level-defined get.
• We write put in monadic style to provide for more convenient error handling.
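The positional-observation idea described above can be seen in a tiny experiment (our illustration, with a hypothetical helper name): applying a polymorphic get to an ascending integer template reveals exactly which source positions survive into the view.

-- For get = tail and a source of length n + 1 (illustration only):
--   tail [0 .. 3] == [1, 2, 3]
-- so positions 1, 2, 3 of the source appear in the view, while position 0
-- does not, and must be copied from the original source on the way back.
templateView :: Int -> [Int]
templateView n = tail [0 .. n]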
We define put as follows, using some functions from module Data.IntMap. Their type signatures, which should provide sufficient documentation, are given in Figure 1. One detail in behavior to mention additionally is that IntMap.union is left-biased for integers occurring as keys in both input maps. This realizes exactly the "precedence of h over g" alluded to in the informal exposition above.

fromList :: [(Int, α)] → IntMap α
fromDistinctAscList :: [(Int, α)] → IntMap α
empty :: IntMap α
insert :: Int → α → IntMap α → IntMap α
union :: IntMap α → IntMap α → IntMap α
lookup :: Int → IntMap α → Maybe α

Figure 1. Functions from module Data.IntMap.

put :: (Monad µ, Eq α) ⇒ [α] → [α] → µ [α]
put s v′ = do let t = [0 .. length s − 1]
              let g = IntMap.fromDistinctAscList (zip t s)
              h ← assoc (get t) v′
              let h′ = IntMap.union h g
              return (map (fromJust ◦ flip IntMap.lookup h′) t)

assoc :: (Monad µ, Eq α) ⇒ [Int] → [α] → µ (IntMap α)
assoc [] [] = return IntMap.empty
assoc (i : is) (b : bs) = do m ← assoc is bs
                             case IntMap.lookup i m of
                               Nothing → return (IntMap.insert i b m)
                               Just c → if b == c
                                        then return m
                                        else fail "Update violates equality."
assoc _ _ = fail "Update changes the length."

The following theorem is essentially (up to the different way of expressing partiality of put) what is proved by Voigtländer [2009] in Theorems 1 and 2.

Theorem 1. Let τ be a type that is an instance of Eq in such a way that the definition given for == makes it reflexive, symmetric, and transitive.

• For every s :: [τ], put s (get s) :: Maybe [τ] ≡ Just s.
• For every s, v′, s′ :: [τ], if put s v′ :: Maybe [τ] ≡ Just s′, then get s′ == v′.

Corollary 1. Let τ be a type that is an instance of Eq in a way that the definition given for == agrees with semantic equality. Then put :: [τ] → [τ] → Maybe [τ] is consistent for get :: [τ] → [τ].

The somewhat complicated definition of assoc and the references to Eq and == in the function definitions and in Theorem 1 and Corollary 1 are due to the fact that get could duplicate some of its input list elements, which requires special handling. As we are anyway going to outlaw such copying (driven by the utilized syntactic bidirectionalization method's inability to deal with non-linear functions), we do not elaborate on this further here. It is discussed in detail by Voigtländer [2009, end of Section 2 and start of Section 3].

Applying semantic bidirectionalization is very easy. We simply put the function definitions of put and assoc side by side with the get-function we want to bidirectionalize.

Example 1 (continued). Just as was the case for syntactic bidirectionalization here, put s v′ fails if and only if length v′ ≠ length (get1 s). Indeed, the two versions of put are semantically equivalent (at type [τ] → [τ] → Maybe [τ], for τ that is an instance of Eq). Here are a few representative calls and their results:

s        v′      syntactic put s v′   semantic put s v′
"abcd"   "x"     Nothing              Nothing
"abcd"   "xy"    Just "axcy"          Just "axcy"
"abcd"   "xyz"   Nothing              Nothing
"abcde"  "x"     Nothing              Nothing
"abcde"  "xy"    Just "axcye"         Just "axcye"
"abcde"  "xyz"   Nothing              Nothing

Example 2 (continued). While, as we have seen, the put-function obtained via syntactic bidirectionalization succeeds whenever length v′ and length (get2 s) are equal or both greater than zero, for the put-function obtained via the semantic technique put s v′ will only be successful if length v′ = length (get2 s). Again, a few representative calls and their results:

s      v′      syntactic put s v′   semantic put s v′
""     ""      Just ""              Just ""
""     "x"     Nothing              Nothing
"a"    ""      Just "a"             Just "a"
"a"    "x"     Nothing              Nothing
"ab"   ""      Nothing              Nothing
"ab"   "x"     Just "xb"            Just "xb"
"ab"   "xy"    Just "xyb"           Nothing
"abc"  ""      Nothing              Nothing
"abc"  "x"     Just "xc"            Nothing
"abc"  "xy"    Just "xyc"           Just "xyc"
"abc"  "xyz"   Just "xyzc"          Nothing

We see that syntactic and semantic bidirectionalization can agree or disagree in terms of updatability. Our aim is to combine the two into a technique that will represent a significant improvement over both. A reviewer suggested that on the intersection of their applicability domains, the syntactic technique on its own is never worse than the semantic technique on its own. We believe this to be true. So in a sense, we "only" try to improve over the syntactic method. Interestingly, the way forward is to defer that method to the role of a plug-in, with the technique of Voigtländer [2009] in the master role. As preparation, we refactor that latter technique.

3. Refactoring Semantic Bidirectionalization

From now on, assume that for every n :: Int, get [0 .. n] contains no duplicates. We call this property semantic linearity. It will clearly be fulfilled if get's syntactic definition is linear.

3.1 Specialization to Semantically Linear get-Functions

We define

put_linear :: Monad µ ⇒ [α] → [α] → µ [α]

like put (but note the different type), except that the call to assoc is replaced by a call, with the same arguments, to the following function:

assoc′ :: Monad µ ⇒ [Int] → [α] → µ (IntMap α)
assoc′ [] [] = return IntMap.empty
assoc′ (i : is) (b : bs) = do m ← assoc′ is bs
                              return (IntMap.insert i b m)
assoc′ _ _ = fail "Update changes the length."

The proof of the following theorem is very similar to that of Theorem 1, additionally using semantic linearity of get in a straightforward way.
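As a runnable variant of the definitions above (our reformulation, not the paper's exact code: the get-function is passed as a polymorphic argument, the monad is fixed to Maybe, and assoc is renamed assocSem to avoid a clash), the semantic put reads:

{-# LANGUAGE RankNTypes #-}
import qualified Data.IntMap as IntMap
import Data.Maybe (fromJust)

putSem :: Eq a => (forall b. [b] -> [b]) -> [a] -> [a] -> Maybe [a]
putSem get s v' = do
  let t = [0 .. length s - 1]
  let g = IntMap.fromDistinctAscList (zip t s)
  h <- assocSem (get t) v'
  let h' = IntMap.union h g             -- left-biased: h takes precedence over g
  return (map (fromJust . flip IntMap.lookup h') t)

assocSem :: Eq a => [Int] -> [a] -> Maybe (IntMap.IntMap a)
assocSem []       []       = Just IntMap.empty
assocSem (i : is) (b : bs) = do
  m <- assocSem is bs
  case IntMap.lookup i m of
    Nothing -> Just (IntMap.insert i b m)
    Just c  -> if b == c then Just m else Nothing   -- update violates equality
assocSem _        _        = Nothing                -- update changes the length

-- e.g. putSem tail "abc" "xy" == Just "axy",  putSem tail "abc" "x" == Nothing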
Theorem 2. For every type τ, put_linear :: [τ] → [τ] → Maybe [τ] is consistent for get :: [τ] → [τ].

But semantic linearity gives us more. It rules out one important cause for a potential failure of view-update. As a consequence, we can now formulate a sufficient condition for a successful update.

Definition 2. We say that a function put :: [τ] → [τ] → Maybe [τ] (for some type τ) is fixed-shape-friendly for get if for every s, v′ :: [τ], if length v′ = length (get s), then put s v′ ≡ Just s′ for some s′ :: [τ].

Note that the original put :: [τ] → [τ] → Maybe [τ] from Section 2.2 is not in general fixed-shape-friendly for get-functions that are not semantically linear. On the other hand, put_linear :: [τ] → [τ] → Maybe [τ] is not even generally consistent for get-functions that are not semantically linear. But since we have now restricted get-functions to be semantically linear, we have consistency by the above theorem, and can moreover prove the following one.

Theorem 3. For every type τ, put_linear :: [τ] → [τ] → Maybe [τ] is fixed-shape-friendly for get.

For the proof, we basically just observe that the last defining equation of assoc′ will never be reached if the argument lists are of the same length. We can also give a negative statement about updatability (which also holds for the put from Section 2.2, of course).

Theorem 4. For every type τ and s, v′ :: [τ], if length v′ ≠ length (get s), then put_linear s v′ :: Maybe [τ] ≡ Nothing.

For the proof, we observe that the last defining equation of assoc′ (or assoc) is reached if the argument lists are of different lengths.

3.2 Decomposition to Expose the Shape Aspect

We refactor put_linear to make the treatment of shapes (list lengths) explicit. To that end, we first define sput_naive as follows:

sput_naive :: Monad µ ⇒ Int → Int → µ Int
sput_naive ls lv′ = if lv′ == length (get [0 .. ls − 1])
                    then return ls
                    else fail "Update changes the length."

Using that function, we then define put_refac as follows:

put_refac :: Monad µ ⇒ [α] → [α] → µ [α]
put_refac s v′ = do let ls = length s
                    let g = IntMap.fromDistinctAscList (zip [0 .. ls − 1] s)
                    l′ ← sput_naive ls (length v′)
                    let t = [0 .. l′ − 1]
                    let h = fromDistinctList (zip (get t) v′)
                    let h′ = IntMap.union h g
                    return (map (fromJust ◦ flip IntMap.lookup h′) t)

fromDistinctList = IntMap.fromList

The refactoring consists of:

• making the check for equal length of get [0 .. length s − 1] and v′, otherwise performed inside assoc′, explicit, and outsourcing it to sput_naive, and
• realizing that once this check was successful, the role of assoc′ can be taken over by zip and IntMap.fromList.

The following lemma establishes that the refactoring is indeed correct, and thus transports the (good and bad) properties of put_linear, namely Theorems 2–4, to put_refac.

Lemma 1. For every type τ and s, v′ :: [τ], we have put_linear s v′ :: Maybe [τ] ≡ put_refac s v′ :: Maybe [τ].

The motivation for our refactoring above is that we make explicit, in sput_naive, what happens on the shape level, namely that only updated views with the same length as the original view can be accepted, and that the length of the source will never be changed. By "playing" with sput_naive, we can change that behavior. For example, it is tempting to change the last line of the above definition of sput_naive to:

else return (head [ls′ | ls′ ← [0 ..], lv′ == length (get [0 .. ls′ − 1])])

That would correspond to a "brute force" search for an appropriate new source shape. A reviewer pointed out that, thanks to semantic linearity of get, it would be sufficient to start the search for ls′ at lv′, i.e., that one could replace [0 ..] by [lv′ ..] above, and that further optimizations like memoization might be possible to speed up the search. However, our motivation for discarding the "brute force" approach is not primarily efficiency. We are looking for a more effective approach in the sense that updates should be meaningful to the user. The kind of perfect updatability that could be achieved using pure search (possibly with some limited guidance by the user via heuristics, expressed as reorderings of the candidate list [lv′ ..]) could produce quite unintuitive results. As reckoned by the same reviewer, we expect that by replacing sput_naive with a more "intelligent" or "intuition-guided" shape-bidirectionalizer, such as one based on the constant-complement approach, we will get more useful results overall.

4. Combining Syntactic and Semantic Bidirectionalization

Our key idea is abstraction: from lists to list lengths (generally, from data structures to their shapes). Since we prefer to work with a more symbolic representation than built-in integers provide, we first define a new data type and conversion functions as follows:

data Nat = Z | S Nat

toNat :: Int → Nat
toNat 0 = Z
toNat n | n > 0 = S (toNat (n − 1))

fromNat :: Nat → Int
fromNat Z = 0
fromNat (S n) = 1 + fromNat n

and then a function sget as follows:

sget :: Nat → Nat
sget ls = toNat (length (get [0 .. fromNat ls − 1]))

The point, later, will be that one can also directly derive a simplified syntactic definition for sget from a given definition for get. But for the moment, we simply take the above definition.
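For instance (our worked example, reusing get1 from Example 1), the generic sget above behaves like "halve the length, rounding down":

-- What sget computes for get = get1, expressed directly on Int (ours):
sgetLen :: Int -> Int
sgetLen n = length (get1 [0 .. n - 1])
-- sgetLen 6 == 3, sgetLen 5 == 2, sgetLen 0 == 0,
-- matching fromNat (sget (toNat n)) for the respective n.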
Next, we assume that some function sput is given, with the following type:

sput :: Nat → Nat → Maybe Nat

and that sput is consistent for sget. Of course,

sput ls lv′ = case sput_naive (fromNat ls) (fromNat lv′) of
                Nothing → Nothing
                Just l → Just (toNat l)

is always a valid choice, with any of the versions of sput_naive discussed in Section 3.2, but for many get-functions there will be better alternatives! We now define put_comb as below. There are three differences from put_refac: we use Nat instead of Int to call out to sput instead of sput_naive, we generate an error message in case sput fails (previously this was done directly in sput_naive), and we drop the fromJust from the last (return-) line. The latter change introduces an extra Maybe type constructor in the output list type, and is done to deal with list positions for which no data is known, neither from the original source nor from the updated view.

put_comb :: Monad µ ⇒ [α] → [α] → µ [Maybe α]
put_comb s v′ = do let ls = length s
                   let g = IntMap.fromDistinctAscList (zip [0 .. ls − 1] s)
                   l′ ← maybe (fail "Could not handle shape change.") return
                              (sput (toNat ls) (toNat (length v′)))
                   let t = [0 .. fromNat l′ − 1]
                   let h = fromDistinctList (zip (get t) v′)
                   let h′ = IntMap.union h g
                   return (map (flip IntMap.lookup h′) t)

The proof of the following theorem is very similar to that by Voigtländer [2009] for his Theorems 1 and 2, but of course additionally uses the assumption that sput is consistent for sget.

Theorem 5. Let τ be a type.

• For every s :: [τ], put_comb s (get s) :: Maybe [Maybe τ] ≡ Just (map Just s).
• For every s, v′ :: [τ] and s′ :: [Maybe τ], if put_comb s v′ :: Maybe [Maybe τ] ≡ Just s′, then get s′ ≡ map Just v′.

The following theorem can also be shown to hold, basically by observing that if length v′ = length (get s), then sget (toNat (length s)) ≡ toNat (length v′), and thus, by consistency of sput for sget, inside the put_comb definition l′ will be successfully assigned the value toNat ls, and subsequently every index position from t will lead to a successful lookup in h′, because at least g will contain a matching entry.

Theorem 6. For every type τ and s, v′ :: [τ], if length v′ = length (get s), then put_comb s v′ :: Maybe [Maybe τ] ≡ Just (map Just s′) for some s′ :: [τ].

As mentioned above, put_comb uses an extra Maybe type constructor to deal with positions in the output list for which no data is known, neither from the original source nor from the updated view. It is usually more convenient to instead use a default value for such positions, so we define a function dput as follows:⁴

dput :: Monad µ ⇒ α → [α] → [α] → µ [α]
dput d s v′ = do s′ ← put_comb s v′
                 return (map (maybe d id) s′)

⁴ Concrete examples of using default values appear in the next section.

The following two statements are then relatively direct consequences of Theorems 5 and 6.

Corollary 2. For every type τ and d :: τ, dput d :: [τ] → [τ] → Maybe [τ] is consistent for get :: [τ] → [τ].

Corollary 3. For every type τ and d :: τ, dput d :: [τ] → [τ] → Maybe [τ] is fixed-shape-friendly for get. (Moreover, the default value d is not actually used in dput d s v′ if length v′ = length (get s).)

It is important to note that no general negative statement like Theorem 4 holds for dput (or for put_comb). It all depends on the definition of sput! Namely, if from a given get, we make an sget, and find a good sput for it, then dput will also be good for get. This is where we can now plug in the work of Matsuda et al. [2007] as a black box. For functions get that are polymorphic and at the same time satisfy the syntactic restrictions imposed by Matsuda et al.'s technique, we can use that technique for deriving sput from sget. Voilà, done.

5. Analysis of Examples

We detail the execution of the just introduced combination idea on the two examples considered in Section 2. This leads to some general observations about ways in which, and why, the combined approach improves over both its constituent techniques, and also provides motivation for further extensions we will consider in the two subsequent sections.

Example 1 (continued). We have seen in Sections 2.1 and 2.2 that for get1 both syntactic and semantic bidirectionalization on their own lead to quite limited updatability. Namely, put s v′ only succeeds if length v′ = length (get1 s). The same holds for put_linear and put_refac, of course, as they are only refactorings of the put-function obtained by semantic bidirectionalization. On the other hand, for the combination of the two techniques, we can proceed as follows. The sget corresponding to get1, as obtained via a pretty straightforward syntactic transformation, looks as follows:

sget :: Nat → Nat
sget Z = Z
sget (S Z) = Z
sget (S (S zs)) = S (sget zs)

For it, the syntactic bidirectionalization method of Matsuda et al. [2007] produces the following complement function:

data SCompl = SC1 | SC2

scompl :: Nat → SCompl
scompl Z = SC1
scompl (S Z) = SC2
scompl (S (S zs)) = scompl zs

Note that the move from [α] to Nat in get1 ↦ sget has obviated the need to collect any dropped variables in the complement function. As a consequence, with the help of range analysis, no data constructor is necessary around the recursive call. (That is a crucial optimization embedded in Matsuda et al.'s transformation.) For the two non-recursive equations, different data constructors are needed, because the ranges of the original right-hand sides overlap.
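Concretely (our observation, with a hypothetical helper name), this scompl records exactly the parity of the source length, which is the only shape information that get1 discards:

-- The derived shape complement, expressed directly on Int (ours):
parityCompl :: Int -> SCompl
parityCompl n = if even n then SC1 else SC2
-- scompl (toNat n) == parityCompl n for all n >= 0.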
Tupling of sget and scompl leads to:

spaired :: Nat → (Nat, SCompl)
spaired Z = (Z, SC1)
spaired (S Z) = (Z, SC2)
spaired (S (S zs)) = (S v, c)
  where (v, c) = spaired zs

Inversion gives:⁵

sinv :: Monad µ ⇒ (Nat, SCompl) → µ Nat
sinv (Z, SC1) = return Z
sinv (Z, SC2) = return (S Z)
sinv (S v, c) = do zs ← sinv (v, c)
                   return (S (S zs))

and finally,

sput :: Nat → Nat → Maybe Nat
sput s v′ = sinv (v′, scompl s)

can be fused to:

sput :: Nat → Nat → Maybe Nat
sput Z Z = return Z
sput (S Z) Z = return (S Z)
sput (S (S zs)) Z = sput zs Z
sput s (S v′) = do zs ← sput s v′
                   return (S (S zs))

The benefit of the combination of syntactic and semantic bidirectionalization can be observed by comparing dput as obtained from the above sput-function to the function put from Example 1 in Section 2.1 (which we have seen is equivalent to put, put_linear, and put_refac as obtained via semantic bidirectionalization). Here are a few representative calls and their results:

s        v′       put s v′       dput ' ' s v′
"abcd"   "x"      Nothing        Just "ax"
"abcd"   "xy"     Just "axcy"    Just "axcy"
"abcd"   "xyz"    Nothing        Just "axcy z"
"abcd"   "xyzv"   Nothing        Just "axcy z v"
"abcde"  "x"      Nothing        Just "axc"
"abcde"  "xy"     Just "axcye"   Just "axcye"
"abcde"  "xyz"    Nothing        Just "axcyez "
"abcde"  "xyzv"   Nothing        Just "axcyez v "

Note that when length v′ ≠ length (get1 s), dput ' ' s v′ extends (making use of the default value) or shrinks the source list by a number of elements that is a multiple of two (to preserve the remainder modulo two, as fixed via scompl). All updates can be successfully handled, in contrast to all the versions of put we have considered for this example before!

As a "lesson" from the above example, we could formulate:

The move from [α] to Nat can make the get-function considerably simpler. In particular, no data values have to be kept. Here, this has even led (thanks to range analysis) to one constructor in the complement creation becoming superfluous completely, which resulted in perfect updatability.

Example 2 (continued). We have seen in Sections 2.1 and 2.2 that for get2/get′ the updatability achieved by syntactic bidirectionalization is that put s v′ succeeds whenever length v′ and length (get2 s) are equal or both greater than zero, while the semantic technique is only successful if length v′ = length (get2 s). Let us analyze how the combined technique fares. The move from [α] to Nat yields:

sget :: Nat → Nat
sget Z = Z
sget (S Z) = Z
sget (S (S zs)) = S (sget′ zs)

sget′ :: Nat → Nat
sget′ Z = Z
sget′ (S zs) = S (sget′ zs)

Note that regarding the helper function get′ one argument becomes superfluous. Indeed, when moving from [α] to Nat, there is no role to play anymore for content elements of type α. The automatic view complement generation of Matsuda et al. [2007] yields either of two functions scompl1/scompl2 for sget (with data SCompl = SC1 | SC2 | SC3), which differ only in their last defining equation:

scompl? :: Nat → SCompl
scompl? Z = SC1
scompl? (S Z) = SC2
scompl? (S (S zs)) = SC?

while for sget′, one obtains the following complement function:

scompl′ :: Nat → SCompl
scompl′ Z = SC3
scompl′ (S zs) = SC3

Note that injectivity analysis (of sget′) has enabled the omission of recursive calls, and the use of a constant function for scompl′. Due to range analysis, we have a choice between SC1 and SC2 in the equation scompl? (S (S zs)) = .... Tupling, inversion, and fusion (again not spelled out here in detail) ultimately give:

sput1 :: Nat → Nat → Maybe Nat
sput1 Z Z = return Z
sput1 (S Z) Z = return (S Z)
sput1 (S (S zs)) Z = return Z
sput1 Z (S v′) = return (S (S v′))
sput1 (S (S zs)) (S v′) = return (S (S v′))
sput1 _ _ = fail "..."

for scompl1, and a variant in which the third and fourth equation become:

sput2 (S (S zs)) Z = return (S Z)
sput2 (S Z) (S v′) = return (S (S v′))

for scompl2. Let us compare the results of combining syntactic and semantic bidirectionalization, i.e., the now two possible dput-functions, to the results of either only syntactic or only semantic bidirectionalization, i.e., to put from Example 2 in Section 2.1 and to put_linear ≡ put_refac from Section 3. We call the dput-function obtained from sput1 above dput1; the other one, obtained from sput2, we call dput2. Figure 2 shows a few representative calls and their results.

As a lesson from this example, we could formulate:

The move from [α] to Nat can lead to injectivity, and hence to considerably simpler (even constant) complement functions. This clearly benefits updatability.

⁵ Note that there is no need for a fall-back function equation sinv _ = fail "Update violates complement.", because in fact the pattern match is exhaustive. This eventually means that all updates/cases can be dealt with!
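To see the difference between the two sput variants concretely (our sample evaluations, consistent with Figure 2): for a source of length two and an empty updated view, sput1 discards both elements while sput2 keeps one.

demo1, demo2 :: Maybe Nat
demo1 = sput1 (S (S Z)) Z   -- Just Z:     dput1 ' ' "ab" "" == Just ""
demo2 = sput2 (S (S Z)) Z   -- Just (S Z): dput2 ' ' "ab" "" == Just "a"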
s      v′      syntactic put s v′   semantic put_linear s v′   combined dput1 ' ' s v′   combined dput2 ' ' s v′
""     ""      Just ""              Just ""                    Just ""                   Just ""
""     "x"     Nothing              Nothing                    Just "x "                 Nothing
""     "xy"    Nothing              Nothing                    Just "xy "                Nothing
"a"    ""      Just "a"             Just "a"                   Just "a"                  Just "a"
"a"    "x"     Nothing              Nothing                    Nothing                   Just "x "
"ab"   ""      Nothing              Nothing                    Just ""                   Just "a"
"ab"   "x"     Just "xb"            Just "xb"                  Just "xb"                 Just "xb"
"ab"   "xy"    Just "xyb"           Nothing                    Just "xy "                Just "xy "
"abc"  ""      Nothing              Nothing                    Just ""                   Just "a"
"abc"  "x"     Just "xc"            Nothing                    Just "xb"                 Just "xb"
"abc"  "xy"    Just "xyc"           Just "xyc"                 Just "xyc"                Just "xyc"
"abc"  "xyz"   Just "xyzc"          Nothing                    Just "xyz "               Just "xyz "

Figure 2. Comparing different bidirectionalization methods for the get-function from Example 2.
6. Explicit Bias

Through the numbering scheme of our "template sources" via [0 .. l − 1] for a concrete source of length l, there is a certain bias that manifests itself when an update changes the length of the view. For example, while it is nice that for Example 2, as just seen, we have

dput1 ' ' "" "x" ≡ Just "x "
and
dput1 ' ' "" "xy" ≡ Just "xy "

(in contrast to the completely syntactically obtained put and the completely semantically obtained put_linear, which both give Nothing in both cases), it is maybe a bit disappointing that

dput1 ' ' "ab" "xy" ≡ Just "xy "

(instead of Just "xyb"). The reason for this is simple: the use of [0 .. ls − 1] and [0 .. fromNat l′ − 1] in the definition of put_comb means that when the updated source becomes shorter than the original source, it is the elements towards the rear of the original source that are discarded; while if the updated source becomes longer, then again positions towards the rear of the new source will be considered to be "additional" and thus will be filled with the default value. So there is an implicit assumption that shape-changing updates will always happen in such a way that the corresponding insertions or deletions affect the end of the source list, rather than its front or other elements.

There is an easy remedy for the observed phenomenon. If we simply replace the lines

let g = IntMap.fromDistinctAscList (zip [0 .. ls − 1] s)

and

let t = [0 .. fromNat l′ − 1]

in the definition of put_comb by

let g = fromDistinctList (zip (reverse [0 .. ls − 1]) s)

and

let t = reverse [0 .. fromNat l′ − 1]

respectively, then Theorems 5 and 6, and thus Corollaries 2 and 3, continue to hold, but instead of a rear update (insertion/deletion) bias, there is now a front update bias. For example, Figure 2 (the interesting subset thereof; all other entries remain unchanged) now becomes:

s      v′     put s v′      dput1 ' ' s v′   dput2 ' ' s v′
""     "x"    Nothing       Just "x "        Nothing
""     "xy"   Nothing       Just "xy "       Nothing
"a"    "x"    Nothing       Nothing          Just "xa"
"ab"   ""     Nothing       Just ""          Just "b"
"ab"   "xy"   Just "xyb"    Just "xyb"       Just "xyb"
"abc"  ""     Nothing       Just ""          Just "c"
"abc"  "x"    Just "xc"     Just "xc"        Just "xc"
"abc"  "xyz"  Just "xyzc"   Just "xyzc"      Just "xyzc"

The entries that have changed are shaded above. One could argue that in this specific case all the changes are for the better, but in general it is desirable to be able to influence what bias is used. Making the bias explicit, and thus putting it under the potential control of the user, is easily possible by defining a further variation of put_comb:⁶

type Bias = Int → [Int]

put_bias :: Monad µ ⇒ Bias → [α] → [α] → µ [Maybe α]
put_bias bias s v′ = do let ls = length s
                        let g = fromDistinctList (zip (bias ls) s)
                        l′ ← maybe (fail "...") return
                                   (sput (toNat ls) (toNat (length v′)))
                        let t = bias (fromNat l′)
                        let h = fromDistinctList (zip (get t) v′)
                        let h′ = IntMap.union h g
                        return (map (flip IntMap.lookup h′) t)

as well as:

bdput :: Monad µ ⇒ Bias → α → [α] → [α] → µ [α]
bdput bias d s v′ = do s′ ← put_bias bias s v′
                       return (map (maybe d id) s′)

The only formal requirement imposed on a proper bias :: Bias, to ensure that analogues of Theorems 5 and 6 and of Corollaries 2 and 3 continue to hold, is that for every n > 0, bias n should return a list of length exactly n and with no duplicate elements. Then, we in particular obtain the following two corollaries.

⁶ No change whatsoever is necessary to sput!
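The properness requirement just stated can be phrased as an executable check (our sketch, with a hypothetical helper name; nub is from Data.List):

import Data.List (nub)

-- A bias is proper if, for every n > 0, it returns n pairwise distinct
-- positions; here checked for all n up to a given limit.
properUpTo :: Int -> (Int -> [Int]) -> Bool
properUpTo limit bias =
  and [ length xs == n && nub xs == xs | n <- [1 .. limit], let xs = bias n ]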
Corollary 4. Let bias :: Bias be proper (in the way just described). For every type τ and d :: τ, bdput bias d :: [τ] → [τ] → Maybe [τ] is consistent for get :: [τ] → [τ].

Corollary 5. Let bias :: Bias be proper. For every type τ and d :: τ, bdput bias d :: [τ] → [τ] → Maybe [τ] is fixed-shape-friendly for get. (Moreover, the default value d is not actually used in bdput bias d s v′ if length v′ = length (get s).)

For bdput to behave well in practice, it makes sense to (at least) additionally impose that whenever n < m, the elements of the list bias n should form a subset of the elements of bias m. Some good examples are:

rear :: Bias
rear l = [0 .. l − 1]

front :: Bias
front l = reverse [0 .. l − 1]

middle :: Bias
middle l = [1, 3 .. l] ++ reverse [2, 4 .. l]

borders :: Bias
borders l = reverse [1, 3 .. l] ++ [2, 4 .. l]

Some examples for the get-function from Example 1 (with sput as given for this example in Section 5), illustrating the effects of different bias strategies, are given in Figure 3 (on the next page). The beneficial effects, still for the case of the get-function from Example 1, might become even more apparent when also looking at cases where the data values in the source and view lists are not disjoint, as in Figure 4. (When interpreting the results, note that both get1 "abcd" and get1 "abcde" equal "bd".) The simple hints about which bias to apply when reflecting specific updated views back to the source level are quite effective. In practice, which bias to choose could be determined on a case-by-case basis, with decisions being made based on a form of diff between the original view and the updated view, or based on information about performed editing operations, or even something more clever. The possibilities are open, since we have exposed the bias strategy explicitly.

bias     s        v′        bdput bias ' ' s v′
rear     "abcd"   "x"       Just "ax"
rear     "abcde"  "x"       Just "axc"
front    "abcd"   "x"       Just "cx"
front    "abcde"  "x"       Just "cxe"
middle   "abcd"   "x"       Just "ax"
middle   "abcde"  "x"       Just "axe"
borders  "abcd"   "x"       Just "bx"
borders  "abcde"  "x"       Just "bxd"
rear     "abcd"   "bdx"     Just "abcd x"
rear     "abcd"   "bdxy"    Just "abcd x y"
rear     "abcde"  "bdx"     Just "abcdex "
rear     "abcde"  "bdxy"    Just "abcdex y "
front    "abcd"   "xbd"     Just " xabcd"
front    "abcd"   "xybd"    Just " x yabcd"
front    "abcde"  "xbd"     Just " xabcde"
front    "abcde"  "xybd"    Just " x yabcde"
middle   "abcd"   "bxd"     Just "ab xcd"
middle   "abcd"   "bxyd"    Just "ab x ycd"
middle   "abcde"  "bxd"     Just "abcx de"
middle   "abcde"  "bxyd"    Just "abcx y de"
borders  "abcd"   "xbdy"    Just " xabcd y"
borders  "abcde"  "xbdy"    Just " xabcdey "
borders  "abcde"  "xybdzv"  Just " x yabcdez v "

Figure 4. More update bias examples for get1 from Example 1.

7. Extending Applicability

It turns out that the separation of shape and content, through the resultant move from [α] to Nat in the task posed to the syntactic bidirectionalization subsystem, and with the help of some known syntactic program transformations, leads to applicability (and good results) of the combined technique in new situations otherwise out of reach. We illustrate this with two examples.

Example 3. Assume our get-function is as follows, reversing a list:

get3 :: [α] → [α]
get3 [] = []
get3 (x : xs) = get′ xs [x]

get′ :: [α] → [α] → [α]
get′ [] ys = ys
get′ (x : xs) ys = get′ xs (x : ys)

Due to the accumulating parameter of get′, the technique of Matsuda et al. [2007] cannot be applied. The technique of Voigtländer [2009] can be applied, but fails to permit any shape-changing updates:

s      v′      put s v′
"abc"  "x"     Nothing
"abc"  "xy"    Nothing
"abc"  "xyz"   Just "zyx"
"abc"  "xyzv"  Nothing

Let us try the combined technique. The move from [α] to Nat yields:

sget :: Nat → Nat
sget Z = Z
sget (S xs) = sget′ xs (S Z)

sget′ :: Nat → Nat → Nat
sget′ Z ys = ys
sget′ (S xs) ys = sget′ xs (S ys)

Still, an accumulating parameter is used, preventing direct application of the technique of Matsuda et al. to this new subproblem. However, it is now possible to apply a semantics-preserving program transformation of Giesl [2000] to transform sget′ as follows:

sget′ :: Nat → Nat → Nat
sget′ Z ys = ys
sget′ (S xs) ys = S (sget′ xs ys)

and to subsequently propagate the constant element (S Z) from sget to the now never-changed second parameter of sget′, finally yielding:

sget :: Nat → Nat
sget Z = Z
sget (S xs) = sget′ xs

sget′ :: Nat → Nat
sget′ Z = S Z
sget′ (S xs) = S (sget′ xs)
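A quick check (ours, assuming the definitions just derived are in scope) confirms that this final sget is the identity on shapes, as expected for list reversal:

-- e.g. sget (S (S Z)) = sget′ (S Z) = S (sget′ Z) = S (S Z)
identityCheck :: Int -> Bool
identityCheck n = fromNat (sget (toNat n)) == n   -- True for all n >= 0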
s        v′       bdput rear ' ' s v′   bdput front ' ' s v′   bdput middle ' ' s v′   bdput borders ' ' s v′
"abcd"   "x"      Just "ax"             Just "cx"              Just "ax"               Just "bx"
"abcd"   "xyz"    Just "axcy z"         Just " xaycz"          Just "ax ycz"           Just " xbydz"
"abcd"   "xyzv"   Just "axcy z v"       Just " x yazcv"        Just "ax y zcv"         Just " xaycz v"
"abcde"  "x"      Just "axc"            Just "cxe"             Just "axe"              Just "bxd"
"abcde"  "xyz"    Just "axcyez "        Just " xaycze"         Just "axcy ze"          Just " xbydz "
"abcde"  "xyzv"   Just "axcyez v "      Just " x yazcve"       Just "axcy z ve"        Just " xayczev "
"abcde"  "xyzvw"  Just "axcyez v w "    Just " x y zavcwe"     Just "axcy z v we"      Just " x ybzdv w "

Figure 3. Comparing different bias strategies for our combined technique on the get-function from Example 1.

Now not only has the technique of Matsuda et al. [2007] become applicable, but their injectivity analysis even detects both the above functions to be injective, which leads to the use of constant functions for scompl and scompl′. Tupling, inversion, and fusion then give an sput-function that is equivalent to:

sput :: Nat → Nat → Maybe Nat
sput s v′ = return v′

which leads to perfect updatability for the combined technique (no matter what kind of bias from the previous section is used):

s      v′      dput ' ' s v′
"abc"  "x"     Just "x"
"abc"  "xy"    Just "yx"
"abc"  "xyz"   Just "zyx"
"abc"  "xyzv"  Just "vzyx"

While reversing a list may appear a bit toy, in particular as it does not omit any information when going from the source to the view, so that the bidirectionalization task essentially becomes one of "only" inversion, the important point here is that through the move from [α] to Nat the get-function becomes simpler, in general, so that additional benefit can be gained by exploiting readily available syntactic techniques.⁷ We further demonstrate this with another example (and another syntactic phenomenon).

Example 4. Assume our get-function is as follows, returning the first half of a list:

get4 :: [α] → [α]
get4 [] = []
get4 (x : xs) = x : (get′ xs xs)

get′ :: [α] → [α] → [α]
get′ xs [] = []
get′ xs [y] = []
get′ (x : xs) (y : z : zs) = x : (get′ xs zs)

Since the function definition of get4 is not syntactically linear, the technique of Matsuda et al. [2007] is not applicable. The technique of Voigtländer [2009] can be applied, and since get4 is indeed semantically linear, even with the strong guarantees from Section 3.1. Of course, shape-changing updates will fail:

s      v′     put_linear s v′
"abc"  "x"    Nothing
"abc"  "xyz"  Nothing

For the combined technique, we again first move from [α] to Nat:

sget :: Nat → Nat
sget Z = Z
sget (S xs) = S (sget′ xs xs)

sget′ :: Nat → Nat → Nat
sget′ xs Z = Z
sget′ xs (S Z) = Z
sget′ (S xs) (S (S zs)) = S (sget′ xs zs)

Some straightforward syntactic analysis now shows that, in particular when called with two equal arguments, sget′ never really needs its first argument (in contrast to the situation with get′, where the first argument plays a crucial role for supplying the output list elements). So we can simplify to:

sget :: Nat → Nat
sget Z = Z
sget (S xs) = S (sget′ xs)

sget′ :: Nat → Nat
sget′ Z = Z
sget′ (S Z) = Z
sget′ (S (S zs)) = S (sget′ zs)

Now this is a program to which the technique of Matsuda et al. [2007] can be applied. Doing so, and combining the result with the semantic technique of Voigtländer as described at the end of Section 4, gives very good updatability. An update only fails if either the source or the updated view is empty while the other is not. Of the different kinds of update bias available from Section 6, middle and borders are particularly appropriate (not surprisingly, on reflection, given the nature of the get-function under consideration here):

s           v′      bdput middle ...   bdput borders ...
""          ""      Just ""            Just ""
"abc"       "x"     Just "x"           Just "x"
"abc"       "xy"    Just "xyc"         Just "xyc"
"abc"       "xyz"   Just "xyz c"       Just "xyzc "
"abcd"      "xy"    Just "xycd"        Just "xycd"
"abcd"      "xyzv"  Just "xyzv cd"     Just "xyzvcd "
"abcdefgh"  "xy"    Just "xygh"        Just "xyef"

⁷ It is also possible to remove the accumulating parameter from the original, list-based get′-function in Example 3 using techniques of Giesl [2000] and Giesl et al. [2007], but the resulting program will still not be amenable to the method of Matsuda et al. [2007]. The move from [α] to Nat is really essential to be successful here.
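As a sanity check (our worked evaluations, with a hypothetical helper name), the simplified sget for get4 computes, on the shape level, exactly "view length = half the source length, rounded up":

-- What sget computes for get4, expressed directly on Int (ours):
halfLen :: Int -> Int
halfLen n = (n + 1) `div` 2
-- fromNat (sget (toNat n)) == halfLen n, e.g.
--   n = 1 -> 1 (get4 "a" == "a"),  n = 4 -> 2 (get4 "abcd" == "ab").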
8. Conclusion

We have developed an approach for combining the bidirectionalization methods of Matsuda et al. [2007] and Voigtländer [2009]. By separating shape from content, we exploit the respective strengths of the two previous methods maximally. The key insight is that when we simplify the problem of explicit bidirectionalization by posing it only on the shape level (going from get to sget), the existing syntactic technique can give far better results than for the general problem. The existing semantic technique does the rest. The improvements achieved on the syntactic level (all caused by the fact that no data values have to be kept) can be classified as 1) making the complement smaller, 2) introducing injectivity, 3) enabling additional transformations that may bring programs into the required form in the first place, and 4) permitting non-linear programs to be made linear. We have seen representative examples for all four phenomena (Examples 1–4, in this order), all in the case of lists. We expect to observe the same, even amplified, when considering functions on other data types.

The move from [α] to Nat might appear somewhat ad hoc, and very specific to lists. However, actually a very general principle is at work here. We could have equivalently replaced [α] by [()], for the unit type ().⁸ That is indeed a generic way to characterize the shape data type corresponding to a polymorphic data type: replace the polymorphic component α by (). It is also a good way to think about implementing the get ↦ sget step. A prototype of such an implementation (for the special case of lists) exists and has been packaged with the earlier implementation of the syntactic bidirectionalization method as well as with the relevant functions from Sections 4 and 6 of this paper, so that it is really possible to apply our combined bidirectionalization method automatically. The system is available at http://www.kb.ecei.tohoku.ac.jp/~kztk/b18n-combined/.

Using the observation about the general principle above, it should be clear that the abstraction/combination ideas in this paper can be applied similarly to other data types than lists. Dealing with type class polymorphism as Voigtländer [2009] does would be a bit more challenging, because a more refined notion of "shape" is needed then. Also, finding good pragmatic bias strategies as in Section 6 would be more complicated (but also interesting) in the case of non-lists.

Finally, a few more words about formal properties of get/put-pairs are in order. We have taken laws GetPut (1) and PutGet (2), in the form of Definition 1, as consistency conditions. The literature also knows PutPut:

put (put s v′) v″ ≡ put s v″,

which as one interesting consequence together with GetPut implies undoability:

put (put s v′) (get s) ≡ s.

Or, for partial put, the latter is required to hold whenever put s v′ is defined, and the former if additionally put (put s v′) v″ is indeed defined. The technique of Matsuda et al. [2007] satisfies these two laws, by virtue of being based on the constant-complement approach of Bancilhon and Spyratos [1981]. Although not explicitly proved by Voigtländer [2009], his technique also satisfies these two additional laws. In fact, it can be reformulated via the constant-complement approach as well.⁹ So the question is natural whether our combined technique can also be so based, and satisfies PutPut and undoability as well. The answer is No, as invocations like

dput ' ' "abcd" "x" ≡ Just "ax" ≡ dput ' ' "abyd" "x"

for Example 1 show. Clearly, there is no way that dput ' ' "ax" "bd" is both Just "abcd" and Just "abyd" as undoability would demand; instead: dput ' ' "ax" "bd" ≡ Just "ab d". (PutPut fails for a similar reason.) Is that bad news? We would argue that not: any method that successfully deals with insertion and deletion updates for a function like the get1 under consideration here will have to give up PutPut and undoability. Indeed, these two properties are often considered undesirable, precisely because they significantly limit the transformations one can hope to deal with [Foster et al. 2007; Gottlob et al. 1988; Keller 1987].

⁸ Clearly, disregarding partial values like ⊥, Nat and [()] are isomorphic.
⁹ No formal reference is available for this observation, but see the slides of a recent talk at the Workshop on Bidirectional Transformation in Architecture-Based Component Composition: http://www.iai.uni-bonn.de/~jv/bt_in_abc2010-slides.pdf.

Acknowledgments

We thank the anonymous reviewers for their insightful comments and suggestions.

References

S. Antoy and M. Hanus. Functional logic programming. Communications of the ACM, 53(4):74–85, 2010.
F. Bancilhon and N. Spyratos. Update semantics of relational views. ACM Transactions on Database Systems, 6(3):557–575, 1981.
A. Bohannon, B.C. Pierce, and J.A. Vaughan. Relational lenses: A language for updatable views. In Principles of Database Systems, Proceedings, pages 338–347. ACM Press, 2006.
A. Bohannon, J.N. Foster, B.C. Pierce, A. Pilkiewicz, and A. Schmitt. Boomerang: Resourceful lenses for string data. In Principles of Programming Languages, Proceedings, pages 407–419. ACM Press, 2008.
K. Czarnecki, J.N. Foster, Z. Hu, R. Lämmel, A. Schürr, and J.F. Terwilliger. Bidirectional transformations: A cross-discipline perspective. In International Conference on Model Transformation, Proceedings, volume 5563 of LNCS, pages 260–283. Springer-Verlag, 2009.
J.N. Foster, M.B. Greenwald, J.T. Moore, B.C. Pierce, and A. Schmitt. Combinators for bidirectional tree transformations: A linguistic approach to the view-update problem. ACM Transactions on Programming Languages and Systems, 29(3):17, 2007.
J.N. Foster, A. Pilkiewicz, and B.C. Pierce. Quotient lenses. In International Conference on Functional Programming, Proceedings, pages 383–395. ACM Press, 2008.
J. Giesl. Context-moving transformations for function verification. In Logic-Based Program Synthesis and Transformation 1999, Selected Papers, volume 1817 of LNCS, pages 293–312. Springer-Verlag, 2000.
J. Giesl, A. Kühnemann, and J. Voigtländer. Deaccumulation techniques for improving provability. Journal of Logic and Algebraic Programming, 71(2):79–113, 2007.
G. Gottlob, P. Paolini, and R. Zicari. Properties and update semantics of consistent views. ACM Transactions on Database Systems, 13(4):486–524, 1988.
Z. Hu, S.-C. Mu, and M. Takeichi. A programmable editor for developing structured documents based on bidirectional transformations. In Partial Evaluation and Semantics-Based Program Manipulation, Proceedings, pages 178–189. ACM Press, 2004.
A.M. Keller. Comments on Bancilhon and Spyratos' "Update semantics of relational views". ACM Transactions on Database Systems, 12(3):521–523, 1987.
K. Matsuda, Z. Hu, K. Nakano, M. Hamana, and M. Takeichi. Bidirectionalization transformation based on automatic derivation of view complement functions. In International Conference on Functional Programming, Proceedings, pages 47–58. ACM Press, 2007.
K. Matsuda, Z. Hu, K. Nakano, M. Hamana, and M. Takeichi. Bidirectionalizing programs with duplication through complementary function derivation. Computer Software, 26(2):56–75, 2009.
A. Pettorossi. Transformation of programs and use of tupling strategy. In Informatica, Proceedings, pages 1–6, 1977.
J.C. Reynolds. Types, abstraction and parametric polymorphism. In Information Processing, Proceedings, pages 513–523. Elsevier, 1983.
C. Strachey. Fundamental concepts in programming languages. Lecture notes for a course at the International Summer School in Computer Programming, 1967. Reprint appeared in Higher-Order and Symbolic Computation, 13(1–2):11–49, 2000.
J. Voigtländer. Bidirectionalization for free! In Principles of Programming Languages, Proceedings, pages 165–176. ACM Press, 2009.
P. Wadler. Theorems for free! In Functional Programming Languages and Computer Architecture, Proceedings, pages 347–359. ACM Press, 1989.
P. Wadler. Deforestation: Transforming programs to eliminate trees. Theoretical Computer Science, 73(2):231–248, 1990.
Matching Lenses: Alignment and View Update

Davi M. J. Barbosa (École Polytechnique), Julien Cretin (École Polytechnique, INRIA), Nate Foster (Princeton University), Michael Greenberg (University of Pennsylvania), Benjamin C. Pierce (University of Pennsylvania)

Abstract

Bidirectional programming languages are a practical approach to the view update problem. Programs in these languages, called lenses, define both a view and an update policy—i.e., every program can be read as a function mapping sources to views as well as one mapping updated views back to updated sources. One thorny issue that has not received sufficient attention in the design of bidirectional languages is alignment. In general, to correctly propagate an update to a view, a lens needs to match up the pieces of the view with the corresponding pieces of the underlying source, even after data has been inserted, deleted, or reordered. However, existing bidirectional languages either support only simple strategies that fail on many examples of practical interest, or else propose specific strategies that are baked deeply into the underlying theory. We propose a general framework of matching lenses that parameterizes lenses over arbitrary heuristics for calculating alignments. We enrich the types of lenses with "chunks" identifying reorderable pieces of the source and view that should be re-aligned after an update, and we formulate behavioral laws that capture essential constraints on the handling of chunks. We develop a core language of matching lenses for strings, together with a set of "alignment combinators" that implement a variety of alignment strategies.

Categories and Subject Descriptors D.3.2 [Programming Languages]: Language Classifications—Specialized application languages

General Terms Languages, Design, Theory

Keywords Bidirectional languages, lenses, alignment, view update problem, Boomerang

1. Introduction

The view update problem is a classic issue in data management [6]: given a view and an update to the view, how do we find a new source that accurately reflects the update? Recent work in the programming languages community has made progress on this old problem by developing new languages in which programs, called lenses, can be read both as view definitions and as update translators. This approach avoids code duplication and allows once-and-for-all proofs of round-tripping laws as corollaries of type soundness.

Formally, a basic lens l mapping between sets of sources S and views V with respect to "complements" C comprises functions:

l.get ∈ S → V
l.res ∈ S → C
l.put ∈ V → C → S
l.create ∈ V → S

The get function maps a source to a view. The res ("residue") function maps a source to a complement, a structure that records (at least) the information not reflected in the view—i.e., the information that needs to be "remembered" so that it can be mixed together with an updated view to produce an updated source. The other two functions handle updates: put takes a view and a complement and builds a new source, while create handles the special case where we need to map a view to a source but do not have a complement available. It builds a source from a view directly, filling in any missing information with defaults. We write S ⇐C⇒ V for the set of all basic lenses between S and V with respect to C.¹

Basic lenses must obey the following laws for every source s, view v, and complement c:²

l.get (l.put v c) = v   (PUTGET)
l.put (l.get s) (l.res s) = s   (GETPUT)

These laws are closely related to the conditions on update translators that have been proposed in the database literature [1, 6, 14]. PUTGET ensures that updates to the view are translated "exactly"—i.e., that, given a view and a complement, the put function produces a source that get maps back to the very same view. GETPUT ensures a "stability" property for the source—i.e., it requires that the put function return the original source unmodified whenever the update to the view is a no-op. It also guarantees that the complement computed by res records all of the source information not reflected in the view.

Lenses have been studied extensively [3, 4, 10, 13, 18–22, 24] and applied in areas as diverse as user interfaces [20], structure editors [15], configuration management [18], software model transformations [7, 23, 26], pattern matching [25], data synchronization [9], and security [12]. See [5] for a survey. However, one fundamental issue continues to hinder wide application of these ideas: alignment. In general, the get function may discard some of the information in the source, so the put function needs to recombine parts of the view with parts of the complement to produce the updated source. When the source and view include

¹ Readers familiar with lenses will see some small differences from previous formulations: the put function has type V → C → S rather than V → S → S, and we assume that every lens has a res function that extracts a complement from a source. To recover the original presentation, we can take the set C to be S and let res be the identity function. The added precision that we get by breaking out a separate set of complements will be helpful in formulating the concepts we're working with here.
² Lenses also obey a CREATEGET law analogous to PUTGET. To save space, we elide this law and all other laws involving create. Complete definitions can be found in the long version (via the last author's web page).
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICFP’10, September 27–29, 2010, Baltimore, Maryland, USA. c 2010 ACM 978-1-60558-794-3/10/09. . . $10.00 Copyright ⃝
Consider a simple example where the source is a Wiki page

=Tour de France=
The Tour is held in July...
=Vuelta a Spain=
The Vuelta is held in September...

and the view is a string containing just the section headings:

Tour de France
Vuelta a Spain

If we change the view by replacing "Spain" with "Espana" and adding a line for the Giro, we would like the put function to take the new view

Giro d'Italia
Tour de France
Vuelta a Espana

together with the complement of the original source and build a new source reflecting the same updates. But if the lens uses a simple positional strategy—the only one available in most bidirectional languages—then the first line in the view will be matched up with the first section in the source, the second with the second, and so on. The result will be a mangled Wiki page

=Giro d'Italia=
The Tour is held in July...
=Tour de France=
The Vuelta is held in September...
=Vuelta a Espana=

in which the paragraph for the Tour appears underneath the heading for the Giro and the paragraph for the Vuelta appears underneath the heading for the Tour—a recipe for tragedy in the cycling world!

Existing bidirectional languages deal with the challenge of alignment in different ways. At the simple end of the spectrum, many languages ignore the issue entirely and use a straightforward positional strategy to match up pieces of the source and view [10, 19, 22, 24, 25]. This works in some cases—when the structures are unordered to begin with, or when they are sufficiently rigid that updates only need to modify information in place, without changing its position—but fails in many others.

Other languages deal with alignment by adopting an operation-based approach [15, 20, 21, 26]—that is, rather than taking the state of the new view as an argument, the put function is told what update operation was applied to the view. Working with explicit operations gives the put function detailed information about the nature of the update applied to the view, which can help it calculate a good alignment. However, this solution is not fully general. The "update language" recognized by put functions is fixed—and usually simple, to keep the theory manageable—so complicated updates have to be broken down into several smaller ones. For example, many update languages support inserting and deleting items but not moving items from one location to another. To move an item in the view, we have to delete the item and re-insert it at the new location, causing the hidden information associated with the item to be lost.

Finally, a few systems align the pieces of the source and view using keys. For example, in our own earlier proposal for dictionary lenses [3], the programmer identifies the reorderable chunks in the source and specifies how to compute a key for each one. The put function uses keys rather than positions to locate a chunk for each piece of the view. This alignment strategy works well when chunks have stable keys, but it is also not a complete solution. In particular, when the chunks do not have a natural key (e.g., because they are blocks of otherwise unstructured text) or when keys themselves can be edited in the view (as in the Wiki example above), the simple alignment strategy baked into dictionary lenses can lead to mangled or lost data. Similar limitations apply to relational lenses [4], which use functional dependencies to identify keys that can be used to perform database operations like join in an updatable fashion.

Our goal is a completely generic mechanism that overcomes the limitations of all these approaches and addresses the issue of alignment once and for all. To this end, we propose a new framework of matching lenses that separates the process of aligning the original and edited views from the process of weaving together the original source and the updated view to build an updated source. This separation yields a flexible framework that can be instantiated with arbitrary heuristic alignment strategies without complicating the basic theory. Figure 1 depicts some possible choices: (a) simple positional alignment; (b) "best match" alignment, which tries to match chunks without regard to ordering (good for set-like data where ordering is not critical); (c) a variant of best match that only considers "non-crossing" matches, like the longest common subsequence heuristic used by diff (this can lose hidden data if the actual edit is a move, but in return it can use local context to guide matching and will often perform better on document-like data); and (d) using the actual edit operations performed by the user (assuming these are available) to calculate exactly the "intended match." The matching lens framework can accommodate all of these, and many others.

[Figure 1. Alignment strategies: (a) positional; (b) best match; (c) best non-crossing match; (d) actual operations]

At the level of the mathematical semantics, we enrich lens types with annotations specifying what constitutes a reorderable "chunk," and we add behavioral laws that capture the essential constraints on the handling of chunks—e.g., these stipulate that lenses must carry chunks in the source through to chunks in the view and vice versa, and they guarantee that reorderings on the chunks in the view are translated to corresponding reorderings on the source. Operationally, we make the separation of concerns described above explicit by splitting the complement into two pieces: a rigid complement that represents the source information that should be handled positionally and a resource that represents the information extracted from chunks. To supply a lens with a precise alignment directive, we simply rearrange the resource according to the directive and obtain a pre-aligned resource in which each piece of the source matches up with the specified piece of the view.

Finally, we instantiate this abstract framework with primitive matching lenses and combinators for string data.³
³We work concretely with strings, rather than richer structures like trees or graphs, but we use regular expression types to overlay tree structures onto strings. Indeed, our types are already expressive enough to describe arbitrary XML structures with non-recursive schemas.
We give coercions that convert basic lenses to matching lenses and vice versa, and we show how to interpret each of the regular operators (union, concatenation, and Kleene star) as well as the composition operator on matching lenses. Finally, we describe primitives for specifying, combining, and tuning alignment strategies using notions of "species," "tags," "keys," and "thresholds." Our contributions can be summarized as follows:

1. We define a new semantic space of matching lenses that enriches the types of lenses with chunks and adds behavioral laws ensuring that chunks are handled correctly (Section 3).
2. We define a simple syntax for matching lenses over string data (Section 4), and we prove (in the long version of the paper) that each primitive is well behaved at its specified type.
3. We develop several ways of calculating alignments, using notions of "species," "tags," "keys," and "thresholds" (Section 5). Alignments are represented as concrete data structures, making a clean interface between the core lens behaviors described in Sections 3 and 4 and these alignment-generating algorithms.
4. We sketch extensions to the framework (Section 6) and discuss an implementation based on Boomerang [11] (Section 7).

Related and future work are discussed in Sections 8 and 9.

2. Example

Let's push the Wiki example a little further, to preview the essential ingredients of our solution. First, here is a Boomerang program that implements the original Wiki lens with positional alignment:

let NONSPECIAL : regexp = [^=\n]
let HEADING : regexp = [^=\n ] . NONSPECIAL*
let LINES : regexp = (NONSPECIAL+ . "\n")*
let PARAGRAPHS : regexp = LINES . ("\n" . LINES)*

let section : lens =
  "="<->"" . copy HEADING
  . ("=\n" . PARAGRAPHS)<->"\n"
let wiki : lens = section*

The first few lines define regular expressions describing "non-special" characters (everything except '=' and '\n'), section headings, lines of text, and paragraphs. The section lens processes one section of the Wiki source. The copy E lens recognizes a source string matching the regular expression E and copies it (in both directions). The get direction of the "replace by constant" lens E <-> u recognizes a source string matching E but adds the fixed string u to the view; the put direction recognizes u and restores the original source from the complement. The concatenation operator . uses one sublens to process the beginning of its input and the other for the end. The wiki lens, defined using the Kleene star operator *, iterates the section lens to handle a list of sections.

This version of the Wiki lens uses a simple positional alignment strategy for matching up paragraphs in the source with lines in the view, leading to the unfortunate behavior described in the introduction. Here is a better version, written using the features developed in this paper, that uses section headings to locate the corresponding paragraphs from the source:

let section : lens =
  "="<->"" . key (copy HEADING)
  . ("=\n" . PARAGRAPHS)<->"\n"
let wiki : lens = < best : section >*

We've made two changes. First, in the definition of the wiki lens, we have indicated that each section in the source should be treated as a reorderable "chunk" by enclosing the section lens in angle brackets. Second, we have specified the policy that should be used to align chunks using the annotations key and best. The key annotation indicates the portions of each chunk that should be taken into account when computing an alignment. The "alignment species" best selects the overall heuristic to use for computing a correspondence between chunks: one that minimizes the sum of the edit distances between the keys of matched chunks. The point of this example is that we can provide programmers with simple, compositional primitives that allow them to specify rich alignment strategies directly in a lens program. In particular, if we put back the updated view

Giro d'Italia
Tour de France
Vuelta a Espana

with the complement from the original source, we obtain a new source in which paragraphs are restored to the appropriate sections:

=Giro d'Italia=
=Tour de France=
The Tour is held in July...
=Vuelta a Espana=
The Vuelta is held in September...

Readers familiar with dictionary lenses [3] will recall similar constructs for specifying chunks and alignment policies. Indeed, on the surface, matching lenses are designed to look like a straightforward generalization of dictionary lenses. Under the hood, the critical difference that makes the generalization work is that matching lenses make all alignment decisions in a separate phase that happens before the outermost put function is called, whereas dictionary lenses interleave alignment decisions with the operation of put functions. This untangling of mechanisms has several beneficial effects. First, it modularizes the framework, allowing us to use many different alignment strategies instead of just one. Second, it allows us to use global alignment strategies that optimize some distance metric over the whole structure; in particular, we can relax the assumption that keys are never edited, a major practical restriction of dictionary lenses. Third, it permits an elegant treatment of the composition operator (see Section 4), which we have found important in practical bidirectional programming but which doesn't interact nicely with dictionary lenses. Fourth, it clarifies the underlying theory by treating alignment algorithms, which are typically complex and heuristic, separately from the core language, which remains simple and generic. And finally, it avoids some arbitrary choices forced by the locality of alignment decisions in dictionary lenses—e.g., the left bias of concatenation and Kleene star.

3. Semantics

We begin our technical development by defining the semantic space of matching lenses, which are organized around a two-tier architecture: a top-level matching lens processes the information outside of chunks while a subordinate basic lens processes the chunks themselves. To simplify the presentation, we assume in this section that chunks only appear at the top level, that the same basic lens is used to process every chunk, and that lenses themselves do not reorder chunks. We relax each of these assumptions in Section 6. Consider an example that illustrates how matching lenses work:

let k : lens = key (copy [A-Z]) . [a-z]<->""
let l : lens = <k> . (copy "," . <k>)*
The basic lens k copies an upper-case letter from source to view and deletes a lower-case letter, while the matching lens l iterates k over a non-empty list of comma-separated chunks. The behavior of l’s get component is straightforward—e.g., it maps “Xx,Yy,Zz” to “X,Y,Z”. Its put function is more interesting: it restores the lowercase letters from source chunks by matching upper-case letters in the old and new views. For example, if we reorder the view and insert “W” in the middle, then put behaves as follows:
l.put "Z,Y,W,X" into "Xx,Yy,Zz" = "Zz,Yy,Wa,Xx"

The evaluation of l.put proceeds in several steps. First, it uses l.res to extract a complement from the source. In a matching lens, the complement is represented as two structures: a rigid complement c that contains the information outside of chunks and a resource r that contains the information within chunks.

  c = (□, [(",", □), (",", □)])    r = {| 1 ↦ "Xx", 2 ↦ "Yy", 3 ↦ "Zz" |}

The rigid complement records the position of each chunk and the commas separating the chunks; its structure (a pair whose second component is a list of pairs) comes from the structure of the lens l. The resource records a mapping from chunk locations to chunk contents—i.e., r(i) contains the contents of the ith chunk in s, while c has a □ at the corresponding location. Next, the put function invokes an alignment function to compute a correspondence g between the locations of chunks in the new and old views, and composes this correspondence with the resource r to obtain a "pre-aligned" resource (r ◦ g):

  g = {| 1 ↦ 3, 2 ↦ 2, 4 ↦ 1 |}    (r ◦ g) = {| 1 ↦ "Zz", 2 ↦ "Yy", 4 ↦ "Xx" |}

Finally, it runs l.put on the updated view, the rigid complement, and the pre-aligned resource. The overall effect is that each lower-case letter is restored to the chunk containing the appropriate upper-case letter. Notice that the third chunk, W, is created with the default lower-case letter "a" because the pre-aligned resource (r ◦ g) is undefined on location 3.

3.1 Preliminaries

Before we can define matching lenses formally, we need some notation for chunks and resources.

Structures with Chunks The semantic space of matching lenses is generic—the source and view can be arbitrary—but we require that structures come equipped with a notion of what constitutes a chunk. When u is a (source or view) structure with chunks, we write

• chunks(u) for the list of chunks in u,
• |u| for the length of chunks(u),
• locations(u) for the set {1, ..., |u|} of locations of chunks in u,
• u[n] for the nth chunk in u, where n is in locations(u),
• u[n:=v] for the structure obtained by setting the nth chunk in u to v, and
• skel(u) for the skeleton structure obtained by replacing each chunk in u with □, a special element not appearing in normal structures.

Notice that the number of chunks |u| is equal to the number of occurrences of □ in skel(u). Examples of such structures abound, including conventional datatypes such as trees, lists, matrices, etc. (Jay's "shapely types", which require that it be possible to divide structures into a shape and a list of data items with the arity of the shape equal to the number of items in the list, capture the same concept [16].)

Chunk Compatibility To ensure that chunks can be freely reordered, we require that the sets of sources and views must be closed under the operation of replacing chunks by other chunks. Formally, we say that a set of structures with chunks U is chunk compatible with an ordinary set of structures U′ if and only if

• the chunks of every structure in U belong to U′—i.e., for every u in U and n in locations(u) we have u[n] in U′, and
• membership in U is preserved when we replace arbitrary chunks with elements of U′—i.e., for every u in U, n in locations(u), and u′ in U′ we have u[n:=u′] in U.

Resources We represent resources using finite maps from locations to basic lens complements. This makes it easy to re-align a resource—we simply apply a (possibly lossy) reordering to the map. We write

• {||} for the totally undefined map,
• {|n ↦ c|} for the singleton map that associates the location n to the basic lens complement c and is otherwise undefined,
• r(n) for the basic lens complement that the finite map r associates to n,
• dom(r) for the domain of the finite map r,
• |r| for the largest element of {0} ∪ dom(r),
• (r1 ++ r2) for the finite map that behaves like the finite map r1 on locations in dom(r1) and like the finite map r2 with locations shifted up by |r1| otherwise,

  (r1 ++ r2)(n) ≜ r1(n) if n ≤ |r1|; r2(n − |r1|) otherwise,

and
• {|N ↦ C|} for the set of all finite maps from locations to elements of C, where C is a set of basic lens complements.
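As a concrete illustration, here is one way to realize resources and the (++) operation in OCaml, using the standard library's finite maps; this is a sketch of the definitions above, not Boomerang's internal representation.

  module IntMap = Map.Make (Int)

  (* A resource is a finite map from locations (1, 2, ...) to
     basic lens complements. *)
  type 'c resource = 'c IntMap.t

  (* |r|: the largest element of {0} ∪ dom(r). *)
  let size (r : 'c resource) : int =
    match IntMap.max_binding_opt r with
    | Some (n, _) -> n
    | None -> 0

  (* r1 ++ r2: behaves like r1 on dom(r1) and like r2 shifted up
     by |r1| elsewhere. *)
  let ( ++ ) (r1 : 'c resource) (r2 : 'c resource) : 'c resource =
    let n1 = size r1 in
    IntMap.fold (fun n c acc -> IntMap.add (n + n1) c acc) r2 r1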
3.2 Matching Lenses

Let S and V be sets of structures with chunks, C a set of structures ("rigid complements"), and k a basic lens in Sk ⇐Ck⇒ Vk such that S is chunk compatible with Sk and V is chunk compatible with Vk. Also let align be a function that takes the list of chunks for the new and old views and computes a correspondence between them, represented formally as a partial injective function from new locations to old locations. A matching lens l on S, C, k, and V comprises four functions

  l.get ∈ S → V
  l.res ∈ S → C × {|N ↦ Ck|}
  l.put ∈ V → C × {|N ↦ Ck|} → S
  l.create ∈ V → {|N ↦ Ck|} → S

obeying the laws in Figure 2 (to be explained below). We write S ⇐C,k⇒ V for the set of all matching lenses between S and V with respect to C and k. The get function has the same type as in basic lenses. The put function takes a view together with a rigid complement and a resource as arguments, while the res function extracts a rigid complement and a resource from a source. The create function takes just a view and a resource; this makes it possible for matching lenses to restore information to chunks whose rigid complement is newly created—e.g., the last chunk in the example from the
beginning of this section, which contains "X". To create a source from scratch, we invoke create with the empty resource {||}.

  l.get (l.put v (c, r)) = v    (PUTGET)
  l.put (l.get s) (l.res s) = s    (GETPUT)
  locations(s) = locations(l.get s)    (GETCHUNKS)
  if (c, r) = l.res s, then locations(s) = dom(r)    (RESCHUNKS)
  if n ∈ (locations(v) ∩ dom(r)), then (l.put v (c, r))[n] = k.put v[n] (r(n))    (CHUNKPUT)
  if n ∈ (locations(v) \ dom(r)), then (l.put v (c, r))[n] = k.create v[n]    (NOCHUNKPUT)
  if skel(v) = skel(v′), then skel(l.put v (c, r)) = skel(l.put v′ (c, r′))    (SKELPUT)

Figure 2. Matching lens laws

The PUTGET and GETPUT laws in Figure 2 express the same fundamental conditions as the corresponding basic lens laws; the remaining laws capture essential constraints on the handling of chunks. The GETCHUNKS law stipulates that the get function must carry each chunk in the source to a chunk in the view; the RESCHUNKS law imposes an analogous constraint on the resource generated by res. (We do not state PUTCHUNKS as a law because it can be derived from the other laws.) The CHUNKPUT and NOCHUNKPUT laws are the most important. They ensure that the put function uses its resource argument correctly. The CHUNKPUT law stipulates that the nth chunk in the source produced by put must be identical to the structure produced by applying k.put to the nth chunk in the view and the complement associated to n in the resource (if the resource contains a complement for n). For instance, in the example above, it stipulates that the second chunk in the source obtained by putting back the updated view "Z,Y,W,X" using the pre-aligned resource (r ◦ g) must be equal to the result obtained by applying k.put to "Z,Y,W,X"[2] and (r ◦ g)(2)—i.e., to "Y" and "Yy". The NOCHUNKPUT law is similar, but handles the case where the resource does not contain a complement for n. For example, it stipulates that the third chunk in the source must be equal to the result obtained by applying k.create to "Z,Y,W,X"[3]—i.e., "W". The last law, SKELPUT, states that the skeleton of the sources produced by put must not depend on any of the chunks in the view or complements in the resource. Among other things, this law is critical for ensuring that matching lenses translate reorderings on the view to reorderings on the source.

Compared to the basic lens laws, these laws have a somewhat low-level and operational feel, spelling out the handling of chunks and resources in quite a bit of detail. Other axiomatizations of matching lenses are possible (see Section 9). We chose these laws because they express conditions that are readily verified using simple, local checks. In addition, we can use them to derive higher-level properties. For instance, we can show that the put function translates reorderings on the chunks in the view to corresponding reorderings on the chunks in the source. We write Perms(u) for the set of permutations of chunks in u and q u for the structure obtained by reordering the chunks of u according to a permutation q. The next lemma follows directly from the matching lens laws:

3.1 Lemma [ReorderPut]: Let l be a matching lens in S ⇐C,k⇒ V. For every view v in V, rigid complement c in C, resource r in {|N ↦ k.C|}, and permutation q in Perms(v), we have q (l.put v (c, r)) = l.put (q v) (c, r ◦ q⁻¹).

Lowering To complete the discussion of semantics, we define a coercion ⌊·⌋ (pronounced "lower") that takes a matching lens l in S ⇐C,k⇒ V and packages it up with the interface of a basic lens in S ⇐S⇒ V. This coercion performs the steps needed to actually use the put component of a matching lens, as described
in the example at the start of this section. It turns out that we only need a single constraint on the alignment function align to ensure well-behavedness of the lens resulting from ⌊·⌋: when presented with identical lists of chunks, it must yield the identity alignment.

  l ∈ S ⇐C,k⇒ V
  ―――――――――――――
  ⌊l⌋ ∈ S ⇐S⇒ V

  get s = l.get s
  res s = s
  put v s = l.put v (c, r ◦ g)
    where (c, r) = l.res s
    and g = align(chunks(v), chunks(l.get s))
  create v = l.create v {||}
The typing rule in the top box can be read as a lemma asserting that, if l is a matching lens in S ⇐C,k⇒ V, then ⌊l⌋ is a basic lens in S ⇐S⇒ V. A proof appears in the long version of this paper. The bottom box defines the components of ⌊l⌋. The get function is just l.get. The res function uses the whole source as the basic lens complement. The put function takes a (possibly updated) view v and a basic lens complement s as arguments. It first uses l.res to calculate a rigid complement c and a resource r from s, and then uses align to calculate a correspondence g between the locations of chunks in the updated view v and chunks in the original view l.get s. Next, it composes r and g as functions, which has the effect of rearranging the complements in the resource r according to the alignment g. To finish the job, the put function passes v, c, and (r ◦ g) to l.put, which produces the new source. The basic create function invokes l.create with the view and the empty resource.
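Continuing the OCaml sketches above, the matching lens interface and the lower coercion can be transcribed directly. Chunk extraction and the alignment function are passed in as parameters, since they depend on the structures at hand; the names here are our own.

  (* Matching lenses: rigid complement 'c plus a resource of basic
     lens complements 'ck. *)
  type ('s, 'v, 'c, 'ck) mlens = {
    get    : 's -> 'v;
    res    : 's -> 'c * 'ck resource;
    put    : 'v -> 'c * 'ck resource -> 's;
    create : 'v -> 'ck resource -> 's;
  }

  (* An alignment g maps locations of chunks in the new view to
     locations in the old view (a partial injective function). *)
  type alignment = int IntMap.t

  (* r ∘ g: the pre-aligned resource. *)
  let compose (r : 'ck resource) (g : alignment) : 'ck resource =
    IntMap.fold
      (fun new_loc old_loc acc ->
         match IntMap.find_opt old_loc r with
         | Some c -> IntMap.add new_loc c acc
         | None -> acc)
      g IntMap.empty

  (* lower l: package a matching lens as a basic lens whose
     complement is the whole source. *)
  let lower (l : ('s, 'v, 'c, 'ck) mlens)
      ~(chunks : 'v -> 'v list)
      ~(align : 'v list -> 'v list -> alignment)
    : ('s, 'v, 's) basic_lens =
    { get = l.get;
      res = (fun s -> s);
      put =
        (fun v s ->
           let c, r = l.res s in
           let g = align (chunks v) (chunks (l.get s)) in
           l.put v (c, compose r g));
      create = (fun v -> l.create v IntMap.empty) }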
4. Matching Lenses for Strings
Having defined the semantic space of matching lenses, we now turn our attention to syntax. This section defines a collection of matching lens primitives for strings, based on the basic and dictionary lenses for strings that we have studied previously [3, 13].

4.1 Notation
Let Σ be a finite alphabet (e.g., ASCII). The ϵ symbol denotes the empty string and (u·v) denotes the concatenation of strings u and v. A language L is a subset of Σ∗; concatenation is lifted to languages in the usual way: L1·L2 ≜ {u·v | u ∈ L1 and v ∈ L2}. The iteration of a language L is L∗ ≜ ⋃n=0..∞ Lⁿ, where Lⁿ denotes the n-fold concatenation of L with itself. Many of our definitions require that every string in the concatenation of two languages have a unique factorization into smaller strings belonging to the languages being concatenated. Two languages L1 and L2 are unambiguously concatenable, written L1·!L2, provided that, for all strings u1 and v1 in L1 and u2 and v2 in L2, if (u1·u2) = (v1·v2) then u1 = v1 and u2 = v2.
Similarly, a language L is unambiguously iterable, written L!∗, if for all strings u1 to um and v1 to vn in L, if (u1 · · · um) = (v1 · · · vn) then m = n and ui = vi for i from 1 to n.

The set of regular expressions is generated by the grammar

  R ::= ∅ | u | R·R | R|R | R∗

where u ranges over strings. The denotation [[E]] of a regular expression E is a regular language. Regular languages are closed under the boolean operators and have many decidable properties, including emptiness, inclusion, and equivalence. It is also decidable whether two regular languages are unambiguously concatenable and whether a single regular language is unambiguously iterable (see [2, Prop. 4.1.3]).

4.2 Types

The types of our primitives are given by regular languages of strings decorated with annotations that indicate the locations of chunks. Let '⟨' and '⟩' be fresh symbols not in Σ. The set of chunk-annotated regular expressions is generated by the grammar:

  A ::= R | ⟨R⟩ | A·A | A|A | A∗

Note that every ordinary regular expression is also a chunk-annotated regular expression and that chunks are not nested. The denotation [[A]] of a chunk-annotated regular expression A is a language of chunk-annotated strings—i.e., a set of strings over the extended alphabet Σ ∪ {'⟨', '⟩'} in which occurrences of '⟨' and '⟩' are balanced and not nested. We use these annotations to determine the number |u| of chunks in u, the chunk u[n] at n in u, and so on. For example, if u is "⟨A1⟩⟨B2⟩⟨C3⟩", then |u| is 3, u[2] is "B2", and u[2:="Z9"] is "⟨A1⟩⟨Z9⟩⟨C3⟩".

Although our primitives formally manipulate chunk-annotated strings, we can also use them to process ordinary strings—indeed, this is how we most often use them! Let ⌊·⌋ be the function that maps chunk-annotated strings to ordinary strings (by removing '⟨' and '⟩' characters and mapping every other character to itself), and lift ⌊·⌋ to languages in the obvious way. We say that a language of chunk-annotated strings L is chunk unambiguous if and only if L is isomorphic to ⌊L⌋. Not all languages are chunk unambiguous—e.g., in {"⟨a⟩b", "a⟨b⟩"} the ordinary string "ab" corresponds to two different chunk-annotated strings—but for languages that are, we can get back and forth between ordinary strings and chunk-annotated strings unambiguously. Using the isomorphism between a chunk-unambiguous language and its erasure, we can view our matching lens primitives as acting either on chunk-annotated strings or on ordinary strings. To use a component of a lens to process an ordinary string, we first "parse" the input string, then apply the lens function to the resulting chunk-annotated string, and finally erase the annotations in the chunk-annotated output string. Moreover, for languages given by chunk-annotated regular expressions, implementing parse and erase functions is straightforward. In each of the typing rules below, we will be careful to ensure that the source and view types are chunk unambiguous.

4.3 Primitives

The first two primitives convert basic lenses into matching lenses.

Lift It should be clear that matching lenses generalize basic lenses. The lift operator witnesses this fact, and makes it possible to use basic lenses like copy and <-> as matching lenses. As the source and view are ordinary strings, the lifted lens does not have chunks, so it satisfies the new matching lens laws vacuously.

  k ∈ S ⇐C⇒ V    k′ ∈ S′ ⇐C′⇒ V′
  ―――――――――――――――――
  k̂ ∈ S ⇐C,k′⇒ V

  get s = k.get s
  res s = k.res s, {||}
  put v (c, r) = k.put v c
  create v r = k.create v

Note that the basic lens k′ mentioned in the type of k̂ is arbitrary.

Match Another way to lift a basic lens is to place it in a chunk:

  k ∈ S ⇐C⇒ V
  ―――――――――――――――――
  ⟨k⟩ ∈ ⟨S⟩ ⇐{□},k⇒ ⟨V⟩

  get s = k.get s
  res s = □, {|1 ↦ k.res s|}
  put v (□, r) = k.put v (r(1)) if 1 ∈ dom(r); k.create v otherwise
  create v r = k.put v (r(1)) if 1 ∈ dom(r); k.create v otherwise
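In the OCaml sketch, match is a one-chunk matching lens whose rigid complement is just the skeleton (represented here as unit); the following is our transliteration of the definitions above, not Boomerang's implementation.

  (* <k>: treat the entire source as a single reorderable chunk. *)
  let match_lens (k : ('s, 'v, 'c) basic_lens) : ('s, 'v, unit, 'c) mlens =
    let restore v r =
      match IntMap.find_opt 1 r with
      | Some c -> k.put v c   (* a complement for location 1: use put *)
      | None -> k.create v    (* otherwise fill in defaults *)
    in
    { get = k.get;
      res = (fun s -> ((), IntMap.singleton 1 (k.res s)));
      put = (fun v ((), r) -> restore v r);
      create = restore }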
The lens ⟨k⟩ (pronounced "match k") is the essential matching lens. It uses the basic lens k to process strings in both directions, treating the entire source as a reorderable chunk. The get component of ⟨k⟩ simply passes off control to the basic lens k. The res function takes a source s and produces □ as the rigid complement and {|1 ↦ k.res s|} as the resource. The put function accesses the complement through its resource argument: it invokes k.put on the view and r(1) if r is defined on 1 and k.create on the view otherwise. The create function is identical. In examples, we often specify the global align parameter as an argument to match—e.g., we write <best:k> to indicate that chunks should be aligned using the "best" heuristic. The typechecker verifies that all occurrences of match use the same heuristic—see Section 5.

Concatenation These regular operators represent a core language that can be used to express many useful transformations on strings. Concatenation is simplest:

  l1 ∈ S1 ⇐C1,k⇒ V1    l2 ∈ S2 ⇐C2,k⇒ V2    ⌊S1⌋·!⌊S2⌋    ⌊V1⌋·!⌊V2⌋
  ――――――――――――――――――――――――――
  l1·l2 ∈ (S1·S2) ⇐(C1×C2),k⇒ (V1·V2)

  get (s1·s2) = (l1.get s1)·(l2.get s2)
  res (s1·s2) = (c1, c2), (r1 ++ r2)
    where c1, r1 = l1.res s1 and c2, r2 = l2.res s2
  put (v1·v2) (c, r) = (l1.put v1 (c1, r1))·(l2.put v2 (c2, r2))
    where c1, c2 = c and r1, r2 = split(|v1|, r)
  create (v1·v2) r = (l1.create v1 r1)·(l2.create v2 r2)
    where r1, r2 = split(|v1|, r)

The get function splits the source into s1 and s2, applies the get functions of l1 and l2 to these strings, and concatenates the results. We write s1·s2 in patterns to indicate that s1 and s2 are strings in S1 and S2 that concatenate to s1·s2. The typing rule requires that the concatenation of ⌊S1⌋ and ⌊S2⌋ be unambiguous, so s1 and s2 are unique. Also, as S1 and S2 are chunk unambiguous, this condition also ensures that S1·S2 is chunk unambiguous.
The res function applies l1.res to s1 and l2.res to s2, yielding rigid complements c1 and c2 and resources r1 and r2. It merges the rigid complements into a pair (c1, c2) and the resources into a finite map (r1 ++ r2). As the same basic lens k is mentioned in the types of both l1 and l2, the resources r1, r2, and (r1 ++ r2) are all finite maps in {|N ↦ Ck|}.⁴ This ensures that we can freely reorder the resource and pass arbitrary portions of it to l1 and l2. The put function splits each of the view, rigid complement, and resource in two, applies the put functions of l1 and l2 to the corresponding pieces of each, and concatenates the results. It splits the resource using split, which yields a resource that behaves like r on locations less than or equal to |v1| and one that behaves like r shifted down by |v1| on locations greater than |v1|. Formally, split is defined as follows:

  (π1 (split(n, r)))(m) = r(m) if m ≤ n and m ∈ dom(r); undefined otherwise
  (π2 (split(n, r)))(m) = r(m + n) if (m + n) ∈ dom(r); undefined otherwise

⁴Recall that Ck is the set of basic lens complements for k.
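In the running OCaml sketch, split can be implemented directly on the IntMap-based resources; this is a sketch of the formal definition just given.

  (* split n r: the first component behaves like r on locations <= n;
     the second behaves like r shifted down by n. *)
  let split (n : int) (r : 'c resource) : 'c resource * 'c resource =
    let r1 = IntMap.filter (fun m _ -> m <= n) r in
    let r2 =
      IntMap.fold
        (fun m c acc -> if m > n then IntMap.add (m - n) c acc else acc)
        r IntMap.empty
    in
    (r1, r2)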
Splitting resources in this way ensures that a complement aligned with a chunk in the view remains aligned with the same chunk in the corresponding substring of the view. The proof of GETPUT uses the equality split(|r1|, r1 ++ r2) = (r1, r2). The typing rule requires that l1 and l2 be defined over the same basic lens k, which ensures that the resource (r1 ++ r2) has a uniform type. It is tempting to relax this condition and allow l1 and l2 to be defined over different basic lenses, as long as those lenses have compatible complements. Unfortunately, this would require accepting weaker properties. For example, consider ⟨k1⟩·⟨k2⟩, where k1 and k2 are defined as follows:

  k1 ≜ (a ↔ b | b ↔ a | c ↔ c) ∈ {a, b, c} ⇐()⇒ {a, b, c}
  k2 ≜ (a ↔ b | b ↔ c | c ↔ a) ∈ {a, b, c} ⇐()⇒ {a, b, c}

Invoking put on "bc" yields "ab" as a result.⁵ Swapping the chunks of "bc" gives "cb". According to Lemma 3.1, the put function should produce "ba"—i.e., the string obtained by swapping the chunks of "ab". But this is not what happens: invoking put on "cb" yields "ca". Thus, although it is tempting to allow matching lenses to use different lenses to process chunks, we do not allow it.

⁵As k1 and k2 are "bijective", the rigid complement and resource do not affect the evaluation of put.

Kleene Star The Kleene star operator iterates a lens:

  l ∈ S ⇐C,k⇒ V    ⌊S⌋!∗    ⌊V⌋!∗
  ――――――――――――――――――
  l∗ ∈ S∗ ⇐(C list),k⇒ V∗

  get (s1 · · · sn) = (l.get s1) · · · (l.get sn)
  res (s1 · · · sn) = [c1, . . . , cn], (r1 ++ . . . ++ rn)
    where ci, ri = l.res si for i ∈ {1, . . . , n}
  put (v1 · · · vn) (c, r) = s′1 · · · s′n
    where s′i = l.put vi (ci, ri) for i ∈ {1, . . . , min(n, m)}
          s′i = l.create vi ri for i ∈ {m + 1, . . . , n}
    and [c1, · · · , cm] = c
    and r′0 = r and ri, r′i = split(|vi|, r′(i−1)) for i ∈ {1, . . . , n}
  create (v1 · · · vn) r = (l.create v1 r1) · · · (l.create vn rn)
    where r′0 = r and ri, r′i = split(|vi|, r′(i−1)) for i ∈ {1, . . . , n}

The get and res components of the Kleene star lens are straightforward generalizations of the corresponding components of the concatenation lens. The put function, however, is different. Because it must be a total function, it needs to handle situations where the number of substrings of the view is different than the number of items in the list of rigid complements—i.e., chunks have been added to or removed from the view. When there are more rigid complements than substrings of the view, the lens simply discards the extra complements. When there are more substrings than rigid complements, it processes the extra substrings using l.create. This is the reason that create takes a resource as an argument—the resource often has entries for the extra chunks (especially if the Kleene star lens appears embedded in an instance of the lower combinator, which pre-aligns the resource against the updated view before it invokes put).

Union The final regular operator is union:

  l1 ∈ S1 ⇐C1,k⇒ V1    l2 ∈ S2 ⇐C2,k⇒ V2    ⌊S1⌋ ∩ ⌊S2⌋ = ∅    ⌊V1⌋ ∩ ⌊V2⌋ ⊆ ⌊V1 ∩ V2⌋
  ――――――――――――――――――――――――――――
  l1 | l2 ∈ (S1 ∪ S2) ⇐(C1+C2),k⇒ (V1 ∪ V2)

  get s = l1.get s if s ∈ ⌊S1⌋; l2.get s if s ∈ ⌊S2⌋
  res s = Inl(c1), r1 if s ∈ ⌊S1⌋; Inr(c2), r2 if s ∈ ⌊S2⌋
    where c1, r1 = l1.res s and c2, r2 = l2.res s
  put v (c, r) = l1.put v (c1, r) if v ∈ ⌊V1⌋ ∧ c = Inl(c1)
               | l2.put v (c2, r) if v ∈ ⌊V2⌋ ∧ c = Inr(c2)
               | l1.create v r if v ∉ ⌊V2⌋ ∧ c = Inr(c2)
               | l2.create v r if v ∉ ⌊V1⌋ ∧ c = Inl(c1)
  create v r = l1.create v r if v ∈ ⌊V1⌋; l2.create v r if v ∉ ⌊V1⌋

The union lens behaves like a bidirectional conditional operator. The get function selects l1.get or l2.get by testing whether the source string belongs to ⌊S1⌋ or ⌊S2⌋. The typing rule requires that these types be disjoint, so this choice is deterministic. The res function also selects l1.res or l2.res by testing the source string. It places the resulting rigid complement in a tagged sum, producing Inl(c) if the source belongs to ⌊S1⌋ and Inr(c) if it belongs to ⌊S2⌋. It does not tag the resource—because l1 and l2 are defined over the same basic lens k for chunks, we can safely pass a resource computed by l1.res to l2.put and vice versa. The put function is slightly more complicated, because the typing rule allows the view types to overlap. It tries to select one of l1.put or l2.put using the view and uses the rigid complement to disambiguate cases where the view belongs to both ⌊V1⌋ and ⌊V2⌋. The create function is similar. Note that because put is a total function, it needs to handle cases where the view belongs to (⌊V1⌋ \ ⌊V2⌋) but the complement is of the form Inr(c). To satisfy the PUTGET law, it must invoke one of l1's component functions, but it cannot invoke l1.put because the rigid complement c does not necessarily belong to C1. It discards c and uses l1.create instead. The side condition (⌊V1⌋ ∩ ⌊V2⌋) ⊆ ⌊V1 ∩ V2⌋ in the typing rule for union ensures that (V1 ∪ V2) is chunk unambiguous—i.e., that strings in the intersection (V1 ∩ V2) have unique parses. It rules out languages of chunk-annotated strings such as {"⟨a⟩b", "a⟨b⟩"}.

Composition The composition operator puts two matching lenses in sequence:
  l1 ∈ S ⇐C1,k1⇒ U    l2 ∈ U ⇐C2,k2⇒ V
  ――――――――――――――――――――
  l1;l2 ∈ S ⇐(C1⊗C2),(k1;k2)⇒ V

  get s = l2.get (l1.get s)
  res s = ⟨c1, c2⟩, zip r1 r2
    where c1, r1 = l1.res s and c2, r2 = l2.res (l1.get s)
  put v (⟨c1, c2⟩, r) = l1.put (l2.put v (c2, r2)) (c1, r1)
    where r1, r2 = unzip r
  create v r = l1.create (l2.create v r2) r1
    where r1, r2 = unzip r

This operator is especially interesting as a matching lens because it handles alignment in two sequential phases of computation. Composition provides strong evidence that our design for matching lenses is robust. Unlike the composition operator defined in our previous work on dictionary lenses, whose behavior was often unpredictable, the constraints imposed by the matching lens laws lead naturally to a definition of an operator whose behavior is intuitive. The get function applies l1.get and l2.get in sequence. The res function applies l1.res to the source s, yielding a rigid complement c1 and resource r1, and l2.res to l1.get s, yielding c2 and r2. It merges the rigid complements into a pair ⟨c1, c2⟩ and combines the resources by zipping them together, where the zip function takes a C1-resource and a C2-resource to a (C1 ⊗ C2)-resource as follows:⁶

  (zip r1 r2)(m) = ⟨r1(m), r2(m)⟩ if m ∈ dom(r1) ∩ dom(r2); undefined otherwise

⁶The angle brackets and type operator ⊗ distinguish these pairs from the ordinary pairs generated as rigid complements for the concatenation lens.

Note that we have the following equalities

  dom(r1) = locations(s)           by RESCHUNKS for l1
          = locations(l1.get s)    by GETCHUNKS for l1
          = dom(r2)                by RESCHUNKS for l2

so zip r1 r2 is defined on all locations in dom(r1) and dom(r2). The put function unzips the resource and applies l2.put and l1.put in that order. The unzip function is defined by

  (πi (unzip r))(m) = ci if r(m) = ⟨c1, c2⟩; undefined otherwise

where i ∈ {1, 2}.
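These two functions also transcribe directly into the OCaml sketch; zip is defined only where both resources are defined, matching the formal definition.

  (* zip: pair up complements location by location. *)
  let zip (r1 : 'c1 resource) (r2 : 'c2 resource) : ('c1 * 'c2) resource =
    IntMap.merge
      (fun _ c1 c2 ->
         match c1, c2 with
         | Some c1, Some c2 -> Some (c1, c2)
         | _ -> None)
      r1 r2

  (* unzip: project a zipped resource back into its two halves. *)
  let unzip (r : ('c1 * 'c2) resource) : 'c1 resource * 'c2 resource =
    (IntMap.map fst r, IntMap.map snd r)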
Because the zipped resource represents the resources generated by l1 and l2 together, rearranging the resource has the effect of pre-aligning the resources for both phases of computation. To illustrate, consider the following example:

let k1 : lens = copy [A-Z] . copy [a-z] . [0-9]<->""
let k2 : lens = [A-Z]<->"" . key (copy [a-z])
let l : lens = <k1> . (copy "," . <k1>)* ;
               <k2> . (copy "," . <k2>)*

The get function takes a non-empty list of comma-separated chunks containing an upper-case letter, a lower-case letter, and a number, and deletes the number in the first phase and the upper-case letter in the second phase:

l.get "Aa1,Bb2,Cc3" = "a,b,c"

The resource produced by res represents the upper-case letter and number together, so even though the alignment is only calculated against the final view, the effect after applying the alignment to the resource is that the put function restores information from both sequential phases to the appropriate chunk:

l.put "b,a" into "Aa1,Bb2,Cc3" = "Bb2,Aa1"

The typing rule for the composition lens requires that the view type of l1 be identical to the source type of l2. In particular, it requires that the chunks in these types must be identical. Intuitively, this makes sense—the only way that the put function can reasonably translate alignments on the view back through both phases of computation to the source is if the chunks in the types of each lens agree. However, in some situations, it is useful to compose lenses that have identical erased types but different notions of chunks—e.g., one lens does not have any chunks, while the other lens does have chunks. To do this "asymmetric" form of composition, we can convert both lenses to basic lenses using ⌊·⌋, which forgets the chunks in the source and view, and compose them as basic lenses.

5. Alignments

So far, our discussion has focused on the core mechanisms of matching lenses—extending basic lenses with chunks and developing an interface for supplying lenses with explicit alignment directives. But we have not said where these alignment directives come from! In this section, we describe the primitives for specifying alignments implemented in our extension of the Boomerang language. We describe three alignment "species" and show how alignments can be tuned using "keys" and "thresholds". Because alignment is a fundamentally heuristic operation, the choice of an alignment function depends intimately on the details of the application at hand. One of the main strengths of the matching lens framework is its flexibility. Matching lenses can be instantiated with arbitrary alignment functions since well-behavedness does not hinge on any special properties of the function used to align chunks: the only property we require is that it returns the identity alignment when its arguments are identical. Thus, the functions described in this section are not exhaustive; it would be easy to add new primitives as needed.

Species Boomerang currently supports three different alignment "species", depicted graphically in Figure 1 (a–c):

• Positional: The alignment matches chunks by position. If one list contains more chunks, the extras at the end of the longer list are not matched with any chunk in the other list.

• Best match: The alignment minimizes the sum of the total edit distances between matched chunks and the lengths of unmatched chunks.

• Best non-crossing match: The alignment minimizes the same heuristic as in best match, but only considers alignments with "non-crossing" edges. This heuristic can be computed efficiently using a variant of the standard algorithm for computing longest common subsequence. For example:

let l : lens = key [A-Z] . [0-9]<->""
<pos:l>*.put  "BCA" into "A1B2C3" = "B1C2A3"
<best:l>*.put "BCA" into "A1B2C3" = "B2C3A1"
<nonx:l>*.put "BCA" into "A1B2C3" = "B2C3A0"

When we convert a matching lens to a basic lens using the lower coercion, ⌊·⌋, the align function is instantiated using the species indicated in the match combinator. The Boomerang system checks that the same annotation is used on every instance of the match combinator—e.g., it disallows (<pos:l> . <nonx:l>), which uses two different species.
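As an illustration of how little structure a species needs, here is the positional species in the OCaml sketch; note that on identical chunk lists it returns the identity alignment, which is the one property matching lenses require.

  (* Positional alignment: match the ith chunk of the new view with
     the ith chunk of the old view; extras at the end stay unmatched. *)
  let positional (new_chunks : 'v list) (old_chunks : 'v list) : alignment =
    let n = min (List.length new_chunks) (List.length old_chunks) in
    let rec go i acc =
      if i > n then acc else go (i + 1) (IntMap.add i i acc)
    in
    go 1 IntMap.empty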
Keys Typically we only want to consider a part of each chunk when we compute an alignment. Boomerang includes two primitives, key and nokey, that provide a way for programmers to control the portion of each chunk that is used to compute an alignment. These combinators take a basic lens as an argument but they do not change the get/put behavior of the lens they enclose. Instead, they add extra annotations to the view type that we use to "read off" the key for chunks (just as we use annotations on regular expressions to "read off" the locations of chunks). When the align function computes an alignment for two lists of chunks, it first uses the view type to extract the regions of each chunk marked as keys and then computes an alignment. To illustrate the use of keys, consider a simple example:

let k : lens = copy [A-Z] . copy [a-z] . [0-9]<->""
let l : lens = <best:k>*

l.put "CcBbAa" into "Aa1Bb2Cc3" = "Cc1Bb2Aa3"

This program uses the best species, but behaves positionally because the view type does not contain any key annotations—i.e., the key of every chunk is the empty string. By adding a key annotation we obtain a lens whose put function matches up chunks using the upper-case letters in the view:

let k : lens = key (copy [A-Z]) . copy [a-z] . [0-9]<->""
let l : lens = <best:k>*

l.put "CcBbAa" into "Aa1Bb2Cc3" = "Cc3Bb2Aa1"

Note that lower-case letters, which are not marked as a part of the key, do not affect the alignment:

l.put "CaBbAc" into "Aa1Bb2Cc3" = "Ca3Bb2Ac1"

The nokey primitive is dual to key—it removes the key annotation on the view type. We can write an equivalent version of the previous lens using nokey:

let k : lens =
  key (copy [A-Z] . nokey (copy [a-z]) . [0-9]<->"")

These simple mechanisms for indicating keys suffice for many practical examples, but they could be extended in several ways. For example, we could provide programmers with ways to generate unique keys or build keys structured as tuples or records (rather than flattening the portion of each chunk marked as a key into a string). We plan to explore these ideas in future work.

Thresholds The best and nonx species compute alignments by minimizing the sum of the total edit distances between matched chunks and the lengths of unmatched chunks. In some applications, it is important to not match up chunks that are "too different," even if aligning those chunks would produce a minimal cost alignment. For instance, in the following example, where keys are three characters long

let k : lens = key [A-Z]{3} . [0-9]<->""
let l : lens = <best:k> . (copy ";" . <best:k>)*

l.put "DBD;CCC;AAA" into "AAA1;BBB2;CCC3" = "DBD2;CCC3;AAA1"

we might prefer to not align the DBD and BBB2 chunks with each other. The best species does align them because the cost of a two-character edit is less than the six-character edit of deleting BBB from the view and adding DBD. To obtain the desired behavior, we can add a threshold annotation:

let l : lens = <best 50:k> . (copy ";" . <best 50:k>)*

l.put "DBD;CCC;AAA" into "AAA1;BBB2;CCC3" = "DBD0;CCC3;AAA1"

The best species takes an optional integer n as an argument. When supplied with such an integer, it minimizes the total edit distances between aligned chunks, but it only aligns chunks whose longest common subsequence is at least n% of the lengths of their keys. (The strict key-based alignment used in dictionary lenses can be simulated using best 100.) The revised version of the l lens does not align DBD with BBB2 because the longest common subsequence computed from their keys does not meet the threshold. The nonx species also supports thresholds. We often use nonx with a threshold to align chunks containing totally unstructured text.
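The cost that best minimizes is the standard Levenshtein edit distance between keys; here is a textbook dynamic-programming implementation in the OCaml sketch (the threshold variant additionally compares the keys' longest common subsequence against n% of their lengths).

  (* Levenshtein edit distance between two keys. *)
  let edit_distance (a : string) (b : string) : int =
    let la = String.length a and lb = String.length b in
    let d = Array.make_matrix (la + 1) (lb + 1) 0 in
    for i = 0 to la do d.(i).(0) <- i done;
    for j = 0 to lb do d.(0).(j) <- j done;
    for i = 1 to la do
      for j = 1 to lb do
        let cost = if a.[i - 1] = b.[j - 1] then 0 else 1 in
        d.(i).(j) <-
          min (min (d.(i - 1).(j) + 1) (d.(i).(j - 1) + 1))
              (d.(i - 1).(j - 1) + cost)
      done
    done;
    d.(la).(lb)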
6. Extensions

To streamline the discussion, our presentation of matching lenses in the preceding sections has made three important assumptions: (1) chunks only appear at the top level, (2) the same basic lens processes every chunk, and (3) the lens does not reorder chunks in going from source to view. Of course, in many applications, it is important to be able to nest chunks, to use different basic lenses to process chunks, and to reorder chunks. This section describes how we can extend the matching lens framework to accommodate these features. Each of these extensions is implemented in Boomerang.

6.1 Nested Chunks

To handle sources with reorderable information at several different levels, it is often useful to nest chunks inside each other. For example, suppose that we want to extend our Wiki lens to handle several levels of nested structure: sections, subsections, and paragraphs. So the get function will map the source

=Grand Tours=
The grand tours are major cycling races...
==Giro d'Italia==
The Giro is usually held in May and June...
=Classics=
The classics are one-day cycling races...
==Milan-San Remo==
The Spring classic is held in March...

to a view that contains just section and subsection headings:

Grand Tours
  Giro d'Italia
Classics
  Milan-San Remo

If we modify the view by reordering sections and adding new subsections,

Classics
  Milan-San Remo
  Paris-Roubaix
Grand Tours
  Giro d'Italia
  Tour de France

we would like paragraphs to be restored to the appropriate section or subsection. We can build a matching lens that has chunks at multiple levels of structure using the lower combinator, which converts a matching lens to a basic lens:

let subsection : lens =
  "=="<->"  " . key (copy HEADING)
  . ("==\n" . PARAGRAPHS)<->"\n"

let section : lens =
  "="<->"" . key (copy HEADING)
  . ("=\n" . PARAGRAPHS)<->"\n"
  . lower <best:subsection>*

let wiki : lens = <best:section>*
The subsection lens inserts two characters of indentation, copies the heading, and deletes any paragraphs that follow. The section lens copies the heading, deletes the paragraphs that follow, and then uses lower to convert the matching lens that processes the list of subsection chunks into a basic lens. The top-level wiki lens uses the section lens to process a list of section chunks. If we put back the updated view into the original source, we get an updated source where paragraphs are restored appropriately:

=Classics=
The classics are one-day cycling races...
==Milan-San Remo==
The Spring classic is held in March...
==Paris-Roubaix==
=Grand Tours=
The grand tours are major cycling races...
==Giro d'Italia==
The Giro is usually held in May and June...
==Tour de France==

The main thing to notice about this program is that we can use lower to build matching lenses that process nested chunks. Lenses built in this way align chunks in strict nested fashion—e.g., in this example, the top-level wiki lens aligns the section chunks and then aligns the nested chunks for subsections within each section.

6.2 Multiple Lenses

We can also build matching lenses that use different basic lenses to process chunks. Returning to our running example, suppose that we wanted a version of the wiki lens in which subsections and sections are aligned separately. Why would we want this? Observe that the lens described in the previous section never aligns subsections that appear in different sections. This means that if we move a subsection from one section to another

Classics
Grand Tours
  Giro d'Italia
  Milan-San Remo

the paragraph under that subsection will be lost when we put the result back into the original source

=Classics=
The classics are one-day cycling races...
=Grand Tours=
The grand tours are major cycling races...
==Giro d'Italia==
The Giro is usually held in May and June...
==Milan-San Remo==

because the alignment strictly follows the nesting structure of the document. We can build a lens that aligns sections and subsections separately by using two different kinds of chunks, as in the following program, written using "tags":

let section : lens =
  "="<->"" . key (copy HEADING)
  . ("=\n" . PARAGRAPHS)<->"\n"

let wiki : lens =
  ( < tag "section" best : section >
  . < tag "subsection" best : subsection >* )*

This version of the wiki lens has two chunks at the top level—one for sections and another for subsections. The tag primitive assigns a distinct name to each kind of chunk, where each tag is associated with a different basic lens. On the same inputs as above, the put function of this lens produces a new source

=Classics=
The classics are one-day cycling races...
=Grand Tours=
The grand tours are major cycling races...
==Giro d'Italia==
The Giro is usually held in May and June...
==Milan-San Remo==
The Spring classic is held in March...

where the paragraph under the Milan-San Remo subsection is restored from the source. To extend matching lenses with tags we simply generalize each of our structures with an extra level of indirection—e.g., we change the type of resources from finite maps from locations to complements to finite maps from tags to finite maps from locations to complements. When we align chunks, we compute a separate alignment for each tag.

6.3 Reordering Chunks

Some applications require matching lenses that reorder chunks in going from source to view. The swap operator (l1 ∼ l2) is similar to concatenation, but inverts the order of the strings in the view. Adding swap as a primitive breaks the procedure for using a matching lens implemented by the ⌊·⌋ coercion described in Section 3, where we pre-align the resource using a correspondence computed between the old and new views. It also causes problems with the sequential composition operator—in general, the lenses being composed may reorder the source chunks differently, so it does not make sense to simply zip the resources generated by each lens together and align the result against the view. To recover the behavior we want, we need to extend matching lenses with another function that keeps track of the permutation on chunks computed by the lens:

  l.perm ∈ Π s : ⌊S⌋. Perms(locations(s))

It is straightforward to add perm to each of the lenses we have seen so far—e.g., the lift primitive returns the empty permutation, match returns the identity permutation on its single chunk, the concatenation operator merges the permutations returned by its sublenses in the obvious way, and so on. We also need the CHUNKPUT and NOCHUNKPUT laws to use perm—the old versions are no longer valid for lenses that reorder chunks:

  if n ∈ (locations(v) ∩ dom(r)) and (l.perm (l.put v (c, r)))(m) = n,
  then (l.put v (c, r))[m] = k.put v[n] (r(n))    (CHUNKPUT)

  if n ∈ (locations(v) \ dom(r)) and (l.perm (l.put v (c, r)))(m) = n,
  then (l.put v (c, r))[m] = k.create v[n]    (NOCHUNKPUT)

These laws generalize the laws given in Section 3. The CHUNKPUT law stipulates that the mth chunk in the source produced by put must be identical to the structure produced by applying k.put to the nth chunk in the view and the element r(n) in the resource, where the permutation computed by the perm function on the source maps m to n. The other laws generalize similarly.
Composition Using perm, we can refine the sequential composition operator to use the permutation on chunks computed in each phase:
  l1 ∈ S ⇐C1,k1⇒ U    l2 ∈ U ⇐C2,k2⇒ V
  ――――――――――――――――――――
  (l1;l2) ∈ S ⇐(C1⊗C2),(k1;k2)⇒ V

  get s = l2.get (l1.get s)
  res s = ⟨c1, c2⟩, zip (r1 ◦ p2⁻¹) r2
    where c1, r1 = l1.res s and c2, r2 = l2.res (l1.get s)
    and p2 = l2.perm (l1.get s)
  perm s = (l2.perm (l1.get s)) ◦ (l1.perm s)
  put v (⟨c1, c2⟩, r) = l1.put (l2.put v (c2, r2)) (c1, r1 ◦ p2)
    where r1, r2 = unzip r and p2 = l2.perm (l2.put v (c2, r2))
  create v r = l1.create (l2.create v r2) (r1 ◦ p2)
    where r1, r2 = unzip r and p2 = l2.perm (l2.create v r2)

The res function applies the inverse of the permutation computed by l2 on the intermediate view to the resource computed by l1, which puts it into the "view order" of l2. Likewise, the put function puts the r1 resource back into the view order of l1.

Swap The swap lens is defined as follows:

  l1 ∈ S1 ⇐C1,k⇒ V1    l2 ∈ S2 ⇐C2,k⇒ V2    ⌊S1⌋·!⌊S2⌋    ⌊V2⌋·!⌊V1⌋
  ――――――――――――――――――――――――――
  l1 ∼ l2 ∈ (S1·S2) ⇐(C2×C1),k⇒ (V2·V1)

  get (s1·s2) = (l2.get s2)·(l1.get s1)
  res (s1·s2) = (c2, c1), (r2 ++ r1)
    where c1, r1 = l1.res s1 and c2, r2 = l2.res s2
  perm (s1·s2) = (l2.perm s2) ∗∗ (l1.perm s1)
  put (v2·v1) (c, r) = (l1.put v1 (c1, r1))·(l2.put v2 (c2, r2))
    where c2, c1 = c and r2, r1 = split(|v2|, r)
  create (v2·v1) r = (l1.create v1 r1)·(l2.create v2 r2)
    where r2, r1 = split(|v2|, r)

Like the concatenation lens, the get component of swap splits the source string in two and applies l1.get and l2.get to the resulting substrings. However, before it concatenates the results, it swaps their order. The res, put, and create functions are similar. The perm component of swap combines permutations using the (∗∗) operator

  (q2 ∗∗ q1)(m) = q1(m) + |q2| if m < |q1|; q2(m − |q1|) otherwise

which behaves like the (++) operator for resources.
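In the OCaml sketch, a permutation can be represented as a size together with a function on locations, which makes (∗∗) a direct transcription of the definition above; this representation is our own, chosen for brevity.

  (* A permutation on chunk locations. *)
  type perm = { size : int; apply : int -> int }

  (* q2 ** q1: acts like q1 shifted up by |q2| on the first |q1|
     locations, and like q2 on the rest, mirroring (++) on resources. *)
  let ( ** ) (q2 : perm) (q1 : perm) : perm =
    { size = q1.size + q2.size;
      apply =
        (fun m ->
           if m < q1.size then q1.apply m + q2.size
           else q2.apply (m - q1.size)) }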
7.
Related Work
This paper extends our previous work on lenses [3, 4, 10, 12, 13] with new mechanisms for specifying and using alignments. The original paper on lenses [10] includes an extensive survey of relevant threads from the database and programming languages literature. We focus here on the most closely related work. Matching lenses grew out of the dictionary lenses we proposed previously [3], but they differ in several important ways. First, dictionary lenses are based on a single alignment mechanism—”by keys”—whereas matching lenses provide a generic framework for using alignments in lenses that can be instantiated with arbitrary functions. Second, the semantic laws that govern the behavior of dictionary lenses express much weaker constraints than the matching lens laws, which specify the handling of chunks directly and in detail. Specifically, dictionary lenses obey an E QUIV P UT law that forces the put function to be “oblivious” to certain features of sources characterized by an equivalence relation ∼. By picking ∼ to be an equivalence that relates strings differing only in the relative order of chunks with different keys we get some constraints on put—e.g., it forbids lenses that operate positionally—but these constraints are weaker than the conditions stated in the matching lens laws. For example, Lemma 3.1 does not hold for dictionary lenses because the type system does not explicitly keep track of chunks. Much of the previous work on view update assumes that the user will modify the view using special operations in some “update language”, and, often, these update operations can be used to infer an intended alignment. For example, in Meertens’s work on constraint maintainers for user interfaces [20] users manipulate lists using “small updates” for which it is easy to maintain the correspondence between source and view items. Similarly, the bidirectional languages X and Inv [15, 21] assume that edit operations are applied to the data to yield annotated values that indicate whether a value was newly created or deleted. Their languages handle single insertions and deletions but not general reorderings. Diskin, Xiong, and Czarnecki’s u-lenses [8] offer a much more general semantic space for lenses operating on updates. The details are quite different, but there are some intriguing intuitive similarities to the approach in this paper; in particular, their notion of “sameness relations,” which they use to identify corresponding structures in the source and view, seems deeply related to our alignments. Many relational view update translators use schemas to guide the selection of a source update. For example Keller identifies criteria for view update translators requiring that the key of each source item appears in the view [17]. Matching lenses also use a notion of keys for alignment, but they permit the correspondence between chunks to be computed using arbitrary heuristics. Alignment issues also come up in software model transformations. Some systems offer “traceability links” that can be used for alignment [5, 23].
8. Implementation
To test the expressiveness and usability of our framework, we have extended the Boomerang implementation with the string matching lenses discussed in Section 4, all of the alignment strategies described in Section 5, and the extensions described in Section 6. Typechecking is decidable; indeed, it can be made quite efficient using standard regular-expression algorithms over an extended alphabet Σ ∪ {'⟨', '⟩'}, where the extra characters are used for checking side conditions involving chunk annotations. Alignment strategies are implemented using a straightforward algorithm for the positional strategy, a diff-like longest-common-subsequence algorithm for non-crossing best matching, and a version of the Hungarian algorithm for the best-match strategy. We have developed several small
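The non-crossing strategy can be pictured with a textbook longest-common-subsequence alignment. The Haskell sketch below is our own illustration (not Boomerang's actual OCaml code): it aligns old and new chunk lists without crossings, given a caller-supplied similarity predicate.

    import Data.Array

    -- Diff-style non-crossing alignment: an LCS over a similarity
    -- predicate. Returns 0-based index pairs (i, j) meaning "new
    -- chunk j reuses the resource of old chunk i".
    alignNonCrossing :: (a -> a -> Bool) -> [a] -> [a] -> [(Int, Int)]
    alignNonCrossing sim olds news = reverse (walk n m)
      where
        n  = length olds
        m  = length news
        oa = listArray (1, n) olds
        na = listArray (1, m) news
        -- lcs ! (i, j): best number of matched pairs using the first
        -- i old chunks and the first j new chunks (classic diff DP).
        lcs = array ((0, 0), (n, m))
                [ ((i, j), cell i j) | i <- [0 .. n], j <- [0 .. m] ]
        cell 0 _ = 0 :: Int
        cell _ 0 = 0
        cell i j
          | sim (oa ! i) (na ! j) = 1 + lcs ! (i - 1, j - 1)
          | otherwise             = max (lcs ! (i - 1, j)) (lcs ! (i, j - 1))
        -- Trace back one optimal, non-crossing set of matches.
        walk 0 _ = []
        walk _ 0 = []
        walk i j
          | sim (oa ! i) (na ! j)
            && lcs ! (i, j) == 1 + lcs ! (i - 1, j - 1)
              = (i - 1, j - 1) : walk (i - 1) (j - 1)
          | lcs ! (i - 1, j) >= lcs ! (i, j - 1) = walk (i - 1) j
          | otherwise                            = walk i (j - 1)

For instance, alignNonCrossing (==) "abcd" "bd" yields [(1,0),(3,1)]. The best-match strategy drops the non-crossing restriction and instead solves a weighted assignment problem, which is where the Hungarian algorithm comes in.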
9. Conclusions and Future Work
Matching lenses provide a general solution to the problems that come up when updatable views are defined over ordered structures. Decoupling the handling of rigidly ordered and reorderable information yields a flexible framework that can be instantiated with arbitrary heuristics for alignment. Our work can be extended in several directions. We are interested in exploring other axiomatizations of matching lenses. One idea, originally suggested by Alexandre Pilkiewicz, is to replace the current laws with laws stated in terms of a lens on skeletons and a basic lens mapped over the list of chunks. This would provide a more elegant description of the semantics of matching lenses. However, we believe it would make it more complicated to verify operators such as concatenation. We are also interested in instantiating the framework of matching lenses in settings other than strings and in exploring implementation issues, including algebraic optimization and lenses for streaming data.
Acknowledgments We thank Zack Ives, Alexandre Pilkiewicz, Val Tannen, Philip Wadler, Steve Zdancewic, and the anonymous ICFP reviewers for helpful comments. Our work is supported by the National Science Foundation under grants IIS-0534592 Linguistic Foundations for XML View Update and CT-0716469 Manifest Security.

References

[1] François Bancilhon and Nicolas Spyratos. Update semantics of relational views. ACM Transactions on Database Systems, 6(4):557–575, December 1981.

[2] Jean Berstel, Dominique Perrin, and Christophe Reutenauer. Codes and Automata. Cambridge University Press, 2009.

[3] Aaron Bohannon, J. Nathan Foster, Benjamin C. Pierce, Alexandre Pilkiewicz, and Alan Schmitt. Boomerang: Resourceful lenses for string data. In ACM SIGPLAN–SIGACT Symposium on Principles of Programming Languages (POPL), San Francisco, CA, pages 407–419, January 2008.

[4] Aaron Bohannon, Jeffrey A. Vaughan, and Benjamin C. Pierce. Relational lenses: A language for updatable views. In ACM SIGACT–SIGMOD–SIGART Symposium on Principles of Database Systems (PODS), Chicago, IL, 2006. Extended version available as University of Pennsylvania technical report MS-CIS-05-27.

[5] Krzysztof Czarnecki, J. Nathan Foster, Zhenjiang Hu, Ralf Lämmel, Andy Schürr, and James F. Terwilliger. Bidirectional transformations: A cross-discipline perspective. GRACE meeting notes, state of the art, and outlook. In International Conference on Model Transformations (ICMT), Zurich, Switzerland, pages 260–283, June 2009. Invited paper.

[6] Umeshwar Dayal and Philip A. Bernstein. On the correct translation of update operations on relational views. ACM Transactions on Database Systems, 7(3):381–416, September 1982.

[7] Zinovy Diskin. Algebraic models for bidirectional model synchronization. In International Conference on Model Driven Engineering Languages and Systems (MoDELS), Toulouse, France, pages 21–36, September 2008.

[8] Zinovy Diskin, Yingfei Xiong, and Krzysztof Czarnecki. From state- to delta-based bidirectional model transformations. In Laurence Tratt and Martin Gogolla, editors, ICMT, volume 6142 of Lecture Notes in Computer Science, pages 61–76. Springer, 2010.

[9] J. Nathan Foster, Michael B. Greenwald, Christian Kirkegaard, Benjamin C. Pierce, and Alan Schmitt. Exploiting schemas in data synchronization. Journal of Computer and System Sciences, 73(4), June 2007. Short version in DBPL '05.

[10] J. Nathan Foster, Michael B. Greenwald, Jonathan T. Moore, Benjamin C. Pierce, and Alan Schmitt. Combinators for bidirectional tree transformations: A linguistic approach to the view update problem. ACM Transactions on Programming Languages and Systems, 29(3), May 2007.

[11] J. Nathan Foster and Benjamin C. Pierce. Boomerang Programmer's Manual, 2009. Available from http://www.seas.upenn.edu/~harmony/.

[12] J. Nathan Foster, Benjamin C. Pierce, and Steve Zdancewic. Updatable security views. In IEEE Computer Security Foundations Symposium (CSF), Port Jefferson, NY, pages 60–74, July 2009.

[13] J. Nathan Foster, Alexandre Pilkiewicz, and Benjamin C. Pierce. Quotient lenses. In ACM SIGPLAN International Conference on Functional Programming (ICFP), Victoria, BC, pages 383–395, September 2008.

[14] G. Gottlob, P. Paolini, and R. Zicari. Properties and update semantics of consistent views. ACM Transactions on Database Systems (TODS), 13(4):486–524, 1988.

[15] Zhenjiang Hu, Shin-Cheng Mu, and Masato Takeichi. A programmable editor for developing structured documents based on bidirectional transformations. Higher-Order and Symbolic Computation, 21(1–2), June 2008.

[16] C. Barry Jay and J. Robin B. Cockett. Shapely types and shape polymorphism. In Proceedings of the European Symposium on Programming (ESOP), London, UK, pages 302–316, 1994.

[17] Arthur M. Keller. Algorithms for translating view updates to database updates for views involving selections, projections, and joins. In Proceedings of the Fourth Annual ACM Symposium on Principles of Database Systems (PODS), Portland, OR, pages 154–163, March 1985.

[18] David Lutterkort. Augeas: A configuration API. In Linux Symposium, Ottawa, ON, pages 47–56, 2008.

[19] Kazutaka Matsuda, Zhenjiang Hu, Keisuke Nakano, Makoto Hamana, and Masato Takeichi. Bidirectionalization transformation based on automatic derivation of view complement functions. In ACM SIGPLAN International Conference on Functional Programming (ICFP), Freiburg, Germany, pages 47–58, 2007.

[20] Lambert Meertens. Designing constraint maintainers for user interaction, 1998. Manuscript, available from ftp://ftp.kestrel.edu/pub/papers/meertens/dcm.ps.

[21] Shin-Cheng Mu, Zhenjiang Hu, and Masato Takeichi. An algebraic approach to bi-directional updating. In ASIAN Symposium on Programming Languages and Systems (APLAS), pages 2–20, November 2004.

[22] Hugo Pacheco and Alcino Cunha. Generic point-free lenses. In International Conference on Mathematics of Program Construction (MPC), Québec City, QC, 2010. To appear.

[23] Perdita Stevens. Bidirectional model transformations in QVT: Semantic issues and open questions. In International Conference on Model Driven Engineering Languages and Systems (MoDELS), Nashville, TN, volume 4735 of Lecture Notes in Computer Science, pages 1–15. Springer-Verlag, 2007.

[24] Janis Voigtländer. Bidirectionalization for free! In ACM SIGPLAN–SIGACT Symposium on Principles of Programming Languages (POPL), Savannah, GA, pages 165–176, January 2009.

[25] Meng Wang, Jeremy Gibbons, Kazutaka Matsuda, and Zhenjiang Hu. Gradual refinement: Blending pattern matching with data abstraction. In International Conference on Mathematics of Program Construction (MPC), Québec City, QC, 2010. To appear.

[26] Y. Xiong, Z. Hu, H. Zhao, H. Song, M. Takeichi, and H. Mei. Supporting automatic model inconsistency fixing. In ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), Amsterdam, Netherlands, pages 315–324, 2009.
Bidirectionalizing Graph Transformations

Soichiro Hidaka, Zhenjiang Hu, Kazuhiro Inaba, Hiroyuki Kato (National Institute of Informatics, Japan) {hidaka,hu,kinaba,kato}@nii.ac.jp

Kazutaka Matsuda (Tohoku University, Japan) [email protected]

Keisuke Nakano (The University of Electro-Communications, Japan) [email protected]
Abstract
Bidirectional transformations provide a novel mechanism for synchronizing and maintaining the consistency of information between input and output. Despite many promising results on bidirectional transformations, these have been limited to the context of relational or XML (tree-like) databases. We tackle the problem of bidirectional transformations within the context of graphs, by proposing a formal definition of a well-behaved bidirectional semantics for UnCAL, i.e., a graph algebra for the known UnQL graph query language. The key to our successful formalization is full utilization of both the recursive and bulk semantics of structural recursion on graphs. We carefully refine the existing forward evaluation of structural recursion so that it can produce sufficient trace information for later backward evaluation. We use the trace information in backward evaluation to reflect in-place updates and deletions on the view to the source, and adopt the universal resolving algorithm for inverse computation together with the narrowing technique to tackle the difficult problem of insertion. We prove that our bidirectional evaluation is well-behaved. Our current implementation is available online and confirms the usefulness of our approach on nontrivial applications.

Categories and Subject Descriptors D.3.2 [Programming Languages]: Language Classifications—Specialized application languages; E.1 [Data Structures]: Graphs and networks

General Terms Design, Languages

Keywords bidirectional transformation, view updating, graph query and transformation, structural recursion

1. Introduction

Bidirectional transformations (Czarnecki et al. 2009; Foster et al. 2005) provide a novel mechanism for synchronizing and maintaining the consistency of information between input and output. They consist of a pair of well-behaved transformations: the forward transformation is used to produce a target view from a source, while the backward transformation is used to reflect modifications on the view to the source. This pair of forward and backward transformations should satisfy certain bidirectional properties. Bidirectional transformations are indeed pervasive and can be seen in many interesting applications, including the synchronization of replicated data in different formats (Foster et al. 2005), presentation-oriented structured document development (Hu et al. 2008), interactive user interface design (Meertens 1998), coupled software transformation (Lämmel 2004), and the well-known view updating mechanism that has been intensively studied in the database community (Bancilhon and Spyratos 1981; Dayal and Bernstein 1982; Gottlob et al. 1988; Hegner 1990; Lechtenbörger and Vossen 2003).

Despite many promising results on bidirectional transformations, they have been limited to the context of relational or XML (tree-like) databases. It remains unresolved (Czarnecki et al. 2009) whether bidirectional transformations can be addressed within the context of graphs containing node sharing and cycles. It would be remarkably useful in practice if bidirectional transformation could be applied to graph data structures, because graphs play an irreplaceable role in naturally representing complex data structures such as those in biological information, the WWW, UML diagrams in software engineering (Stevens 2007), and the Object Exchange Model (OEM) used for exchanging arbitrary database structures (Papakonstantinou et al. 1995).

There are many challenges in addressing bidirectional transformation on graphs. First, unlike relational or XML databases, there is no unique way of representing, constructing, or decomposing a general graph, which calls for a more precise definition of equivalence between two graphs. Second, graphs may have shared nodes and cycles, which makes both forward and backward computation much more complicated than on trees; naïve computation on graphs would visit the same nodes many times, and possibly infinitely often. It is particularly difficult to handle insertion in backward transformation, because a suitable subgraph must be created and inserted at a proper place in the source.

This paper reports our first solution to the problem of bidirectional graph transformation. We approach this problem by providing a bidirectional semantics for UnCAL, a graph algebra for the known graph query language UnQL (Buneman et al. 2000); the forward semantics (forward evaluation) corresponds to forward transformation, and the backward semantics (backward evaluation) corresponds to backward transformation. We choose UnQL/UnCAL as the basis of our bidirectional graph transformation for two main reasons.
• First, UnQL/UnCAL is a graph query language that has been well studied in the database community, with a solid foundation and an efficient implementation. It has a concise and practical surface syntax based on select-where clauses like SQL, and can easily be used to describe many interesting graph transformations.
• Second, and more importantly, graph transformations in UnQL can be automatically mapped to transformations expressed with structural recursion in UnCAL, which can be evaluated in a bulk manner (Buneman et al. 2000); a structural recursion is evaluated by first processing all edges of the input graph in parallel and then combining the results. This bulk semantics contributes significantly to our bidirectionalization, providing a smart way of treating shared nodes and cycles in graphs and of tracing back from the view to the source.
Our main technical contributions are summarized as follows.
• We are, as far as we are aware, the first to have recognized the importance of structural recursion and its bulk semantics in addressing the challenging problem of bidirectional graph transformation, and we have succeeded in developing a novel two-stage framework of bidirectional graph transformation based on structural recursion. We demonstrate that graph transformations defined in terms of structural recursion (which is also suitable for optimization, as has been intensively studied (Buneman et al. 2000)) make backward evaluation easier.
Figure 1. Graph Equivalence Based on Bisimulation: (a) A Simple Graph; (b) An Equivalent Graph. [Only the caption and panel titles of this figure are recoverable from the source.]
• We give a formal definition of bidirectional semantics for UnCAL by (1) refining the existing forward evaluation so that it produces useful trace information for later backward evaluation (Section 4), and (2) using the trace information to reflect in-place updates and deletions on the view to the source, and adopting the narrowing technique to tackle the difficult problem of insertion (Section 5). We prove that our bidirectional evaluation is well-behaved.

• We have fully implemented the bidirectionalization presented in this paper and confirmed the effectiveness of our approach through many non-trivial examples, including all those presented in this paper and some typical bidirectional graph transformations in database management and software engineering. More examples and demos are available on our BiG project Web site∗.

∗ http://www.biglab.org

We take an operation-based approach, which means that the user explicitly provides editing operations in terms of "rename", "delete", and "insert". Currently, these operations are processed in the order specified by the user. It would be challenging to derive such operation sequences automatically from the states of the view before and after the user's modifications, but this is beyond the scope of this paper.

The forward transformations we consider are based on UnCAL, which is bisimulation generic, meaning that a transformation cannot distinguish between graphs that are bisimilar. For example, it cannot extract "the first child of a node". Extending our model to cope with order is part of our future work. Also note that backward transformation is not bisimulation generic: two updated views that are bisimilar do not always lead to bisimilar sources. However, this is not necessarily a limitation introduced by our bidirectionalization, since the asymmetry comes from the expressiveness of conditional expressions in the original UnCAL graph algebra. A similar argument applies to isomorphic updates.

Outline We start with a brief review of the basic concept of a graph data model and the structural recursion of UnCAL in Section 2. Then, we clarify the bidirectional properties within our context and give an overview of our two-stage framework for bidirectionalizing graph transformations in Section 3. After explaining how to extend the forward evaluation of UnCAL with trace information in Section 4, we give a formal definition of bidirectional semantics for UnCAL and prove that it is well-behaved in Section 5. We discuss implementation issues in Section 6 and related work in Section 7. We conclude the paper in Section 8.

2. UnCAL: A Graph Algebra

We adopted UnCAL (Buneman et al. 2000), a well-studied graph algebra, as the basis of our bidirectional graph transformation. We will briefly review its graph data model and the core of UnCAL.

2.1 Graph Data Model

We deal with rooted, directed, and edge-labeled graphs with no order on outgoing edges. They are edge-labeled in the sense that all information is stored in the labels of edges, while the labels of nodes serve only as unique identifiers without any particular meaning. The UnCAL graph data model has two prominent features, markers and ε-edges. Nodes may be marked with input and output markers, which are used as an interface to connect them to other graphs. An ε-edge represents a shortcut between two nodes, working like the ε-transition in an automaton†. We use Label to denote the set of labels and M to denote the set of markers.

Formally, a graph G, sometimes denoted by G(V,E,I,O), is a quadruple (V, E, I, O), where V is a set of nodes, E ⊆ V × (Label ∪ {ε}) × V is a set of edges, I ⊆ M × V is a set of pairs of an input marker and the corresponding input node, and O ⊆ V × M is a set of pairs of output nodes and associated output markers. For each marker &x ∈ M, there is at most one node v such that (&x, v) ∈ I. This node v is called the input node with marker &x and is denoted by I(&x). Unlike input markers, more than one node can be marked with an identical output marker; such nodes are called output nodes. Intuitively, input nodes are root nodes of the graph (we allow a graph to have multiple root nodes, and for singly rooted graphs we often use the default marker & to indicate the root), while an output node can be seen as a "context hole" of a graph into which an input node with the same marker will be plugged later. We write inMarker(G) to denote the set of input markers and outMarker(G) to denote the set of output markers of a graph G. In addition, we write label(ζ) to denote the label of an edge ζ.

Note that multiple-marker graphs are meant to be an internal data structure for graph composition. In fact, the initial source graphs of our transformation have one input marker (single-rooted) and no output markers (no holes). For instance, the graph in Figure 1(a) is denoted by (V, E, I, O) where V = {1, 2, 3, 4, 5, 6}, E = {(1, a, 2), (1, b, 3), (1, c, 4), (2, a, 5), (3, a, 5), (4, c, 4), (5, d, 6)}, I = {(&, 1)}, and O = {}.

† This analogy would choose an NFA rather than a DFA, since we allow multiple outgoing edges with identical labels from a node.
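For readers who like to see the quadruple as a datatype, here is one possible Haskell transcription (our own encoding, with Int node identifiers; it is not part of the authors' system), instantiated to the graph of Figure 1(a):

    import qualified Data.Set as Set

    type Marker = String                  -- e.g. "&", "&z1"
    data Lab = Eps | Lab String           -- ε-edges vs. labeled edges
      deriving (Eq, Ord, Show)

    -- A rooted, directed, edge-labeled UnCAL graph G = (V, E, I, O).
    data Graph = Graph
      { nodes   :: Set.Set Int
      , edges   :: Set.Set (Int, Lab, Int)
      , inputs  :: [(Marker, Int)]        -- at most one node per marker
      , outputs :: [(Int, Marker)]        -- several nodes may share one
      } deriving Show

    -- The graph of Figure 1(a).
    fig1a :: Graph
    fig1a = Graph
      { nodes   = Set.fromList [1 .. 6]
      , edges   = Set.fromList
          [ (1, Lab "a", 2), (1, Lab "b", 3), (1, Lab "c", 4)
          , (2, Lab "a", 5), (3, Lab "a", 5), (4, Lab "c", 4)
          , (5, Lab "d", 6) ]
      , inputs  = [("&", 1)]
      , outputs = []
      }

    main :: IO ()
    main = print fig1a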
Value Equivalence between Graphs Two graphs are value equivalent if they are bisimilar. Please refer to (Buneman et al. 2000) for the complete definition. Informally, graph G1 is bisimilar to graph G2 if every node x1 in G1 has at least one bisimilar counterpart x2 in G2 and vice versa, and if there is an edge from x1 to y1 in G1, then there is a corresponding edge from x2 to y2 in G2 such that y2 is a bisimilar counterpart of y1, and vice versa. Therefore, unfolding a cycle or duplicating shared nodes does not really change a graph. This notion of bisimulation is extended to cope with ε-edges. For instance, the graph in Figure 1(b) is value equivalent to the graph in Figure 1(a); the new graph has an additional ε-edge (denoted by the dotted line), duplicates the subgraph rooted at node 5, and unfolds and splits the cycle at node 4. Unreachable parts are also disregarded, i.e., two bisimilar graphs are still bisimilar if one adds subgraphs unreachable from the input nodes.

Graph Constructors Figure 2 summarizes the nine graph constructors that are powerful enough to describe arbitrary (directed, edge-labeled, and rooted) graphs (Buneman et al. 2000):

G ::= {}              { single-node graph }
    | {l : G}         { an edge pointing to a graph }
    | G1 ∪ G2         { graph union }
    | &x := G         { label the root node with an input marker }
    | &y              { a node graph with an output marker }
    | ()              { empty graph }
    | G1 ⊕ G2         { disjoint graph union }
    | G1 @ G2         { append of two graphs }
    | cycle(G)        { graph with cycles }

Figure 2. Graph Constructors

Here, {} constructs a root-only graph, {l : G} constructs a graph by adding an edge with label l pointing to the root of graph G, and G1 ∪ G2 adds two ε-edges from a new root to the roots of G1 and G2. Also, &x := G associates the input marker &x with the root node of G, &y constructs a graph with a single node marked with the output marker &y, and () constructs an empty graph that has neither nodes nor edges. Further, G1 ⊕ G2 constructs a graph by a componentwise (V, E, I, and O) union. ∪ differs from ⊕ in that ∪ unifies input nodes while ⊕ does not; ⊕ requires the input markers of its operands to be disjoint, while ∪ requires them to be identical. G1 @ G2 composes two graphs vertically by connecting the output nodes of G1 with the corresponding input nodes of G2 by ε-edges, and cycle(G) connects the output nodes of G with its input nodes to form cycles. Newly created nodes have unique identifiers; we give this creation rule, extended for our bidirectionalization, in Section 4.1. The definitions here are based on graph isomorphism (identical graph construction expressions result in identical graphs up to isomorphism), and the constructors are, together with the other operators, bisimulation generic (Buneman et al. 2000), i.e., bisimilar results are obtained for bisimilar inputs.

Example 1. The graph equivalent to that in Figure 1(a) can be constructed as follows (though not uniquely):

&z @ cycle((&z := {a : {a : &z1}} ∪ {b : {a : &z1}} ∪ {c : &z2})
         ⊕ (&z1 := {d : {}})
         ⊕ (&z2 := {c : &z2}))

For simplicity, we often write {l1 : G1, . . . , ln : Gn} to denote {l1 : G1} ∪ · · · ∪ {ln : Gn}.

2.2 The Core UnCAL

UnCAL (Unstructured Calculus) is an internal graph algebra for the graph query language UnQL, and its core syntax is depicted in Figure 3. It consists of the graph constructors, variables, conditionals, and structural recursion. We have already detailed the graph constructors, and variables and conditionals are self-explanatory, so we will focus on structural recursion, which is a powerful mechanism in UnCAL for describing graph transformations.

e ::= {} | {l : e} | e ∪ e | &x := e | &y | ()
    | e ⊕ e | e @ e | cycle(e)                   { constructor }
    | $g                                         { graph variable }
    | if l = l then e else e                     { conditional }
    | rec(λ($l, $g). e)(e)                       { structural recursion application }

Figure 3. Core UnCAL Language

A function f on graphs is called a structural recursion if it is defined by the following equations‡:

f({})          = {}
f({$l : $g})   = e @ f($g)
f($g1 ∪ $g2)   = f($g1) ∪ f($g2),

where the expression e may contain references to the variables $l and $g (but no recursive calls to f). Since the first and the third equations are common to all structural recursions, we write a structural recursion in UnCAL simply as

f($db) = rec(λ($l, $g). e)($db).

Despite its simplicity, the core UnCAL is powerful enough to describe interesting graph transformations, including all graph queries (in UnQL) (Buneman et al. 2000) and nontrivial model transformations (Hidaka et al. 2009). Some simple examples are given below.

Example 2. The following structural recursion a2b replaces edge label a with b and leaves other labels unchanged:

a2b($db) = rec(λ($l, $g). if $l = a then {b : &1}2 else {$l : &3}4)($db)5

(The superscripts identify code positions, which will be important in Section 4; they can simply be ignored for now.) As an instance of its execution, applying a2b to a two-edge graph with an a-edge followed by a c-edge yields the same graph with the a-edge relabeled to b, where ◦ denotes the root of the graph. [The two small example graphs are garbled in the source.]

‡ Informally, the meaning of this definition can be considered to be a fixed point (though not necessarily a unique one) over the graph, which is again defined by a set of equations using the three constructors {}, :, and ∪. For instance, the graph in Figure 1(a) can be considered to be the fixed point of the following equations:

Groot = {a : {a : G5}, b : {a : G5}, c : G4}
G5    = {d : {}}
G4    = {c : G4}.
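To see the recursive reading of these equations in executable form, here is a minimal Haskell sketch over finite trees only (cycles, sharing, and markers are ignored, so this illustrates the recursive rather than the bulk semantics; all names and the simplified treatment of @ are our own assumptions):

    -- Edge-labeled trees: a node is a list of (label, subtree) pairs.
    newtype Tree = Tree [(String, Tree)] deriving Show

    -- srec e captures: f {} = {}, f {l:g} = e l g @ f g, and
    -- f (g1 ∪ g2) = f g1 ∪ f g2 (union is list concatenation here).
    srec :: (String -> Tree -> Tree) -> Tree -> Tree
    srec e (Tree children) =
      Tree (concat [ plug (e l g) (srec e g) | (l, g) <- children ])
      where
        -- A simplified "@": for the one-edge (or empty) bodies used
        -- below, it plugs the recursive result under each new edge.
        plug (Tree []) _    = []
        plug (Tree es) rest = [ (l', rest) | (l', _) <- es ]

    -- a2b: replace edge label a with b, keep everything else.
    a2b :: Tree -> Tree
    a2b = srec (\l _ -> if l == "a" then Tree [("b", Tree [])]
                        else Tree [(l, Tree [])])

    main :: IO ()
    main = print (a2b (Tree [("a", Tree [("c", Tree [])])]))
    -- prints: Tree [("b",Tree [("c",Tree [])])]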
Example 3. The following structural recursion a2d_xc replaces all labels a with d and removes edges labeled c:

a2d_xc($db) = rec(λ($l, $g). if $l = a then {d : &1}2
                             else if $l = c then {ε : &3}4
                             else {$l : &5}6)($db)7

Applying the function a2d_xc to the graph in Figure 1(a) yields the graph in Figure 4(c).

Example 4. The following structural recursion consecutive extracts the subgraphs that are reachable by traversing two connected edges with the same label:

consecutive($db) = rec(λ($l, $g). rec(λ($l′, $g′).
                     if $l = $l′ then {result : $g′}1 else {}2)($g)3)($db)4

For example, applying consecutive to a graph containing a path with two consecutive a-edges yields a graph with a result-edge pointing to the subgraph below the second a-edge. [The concrete input and output graphs of this example are garbled in the source.] Note that the structural recursive definition of consecutive uses the graph parameter $g′ to achieve the transformation. Also note that structural recursions may be nested, and an inner recursion can refer to outer variables (such as $l in the example). This enables us to express joins of multiple queries.

Example 5. Although the examples given so far are self-recursive, it is possible to simulate mutual recursion by returning graphs with multiple markers. For instance, the following function abab

abab($db) = &z1 @ rec(λ($l, $g). &z1 := {a : &z2} ⊕ &z2 := {b : &z1})($db)

changes all edges at even distances from the root node to a, and all edges at odd distances to b. We may consider the markers &zi as mutually recursive calls, and abab as consisting of two mutually recursive functions. The first is &z1, which, at each edge of the original graph, generates a new a-edge pointing to the result of &z2 at the original destination node. The second is &z2, which generates b-edges pointing to the result of &z1 from its destination. The result of the whole expression is defined to be the result of &z1 at the root node of the argument graph. [An illustrating figure, showing abab applied to a small path graph together with the dashed edges that are unreachable from the output root node, is garbled in the source.]

2.3 Bulk Semantics of Structural Recursion

By allowing ε-edges, we can evaluate a structural recursion in a bulk manner. Consider a structural recursion

rec(λ($l, $g). e)

which is to be applied to an input graph G. In bulk semantics, we apply the body e independently to every edge (l, g) in G, where l is the label of the edge and g is the graph that the edge points to, and then join the results with ε-edges (as in the @ constructor). Recall the structural recursion a2d_xc defined in Example 3. Applying it to the input graph in Figure 1(a) yields the graph in Figure 4(a), where each edge from i to j in the input graph leads to a subgraph containing an edge from Sij to Eij in the output graph (the dotted edges denote ε-edges), and these subgraphs are connected with ε-edges according to the original shape of the graph. If we eliminate all ε-edges as explained in Section 3.2, we obtain the standard graph in Figure 4(c). One distinctive feature of the bulk semantics is that the shape of the input graph is remembered through the additional ε-edges, which will be fully utilized in our later bidirectionalization.

Figure 4. Bulk Semantics of Structural Recursion in UnCAL: (a) Before Removing ε-edges; (b) Removing ε-edges; (c) After Removing ε-edges. [Only the caption and panel titles of this figure are recoverable from the source.]

3. Overview: Bidirectionalizing UnCAL

It is more challenging to bidirectionalize transformations on graphs than on trees, because graphs may contain shared nodes or cycles. We shall demonstrate that structural recursion in UnCAL can serve as the basis for solving this problem. Although structural recursion was proposed within the context of query optimization, we will show that it plays a crucial role in our bidirectionalization.

3.1 Bidirectional Properties

Bidirectionalization is used to derive a backward transformation from a forward transformation. We approach the problem of bidirectionalization in graph transformation by providing a bidirectional semantics for UnCAL; the forward semantics (forward evaluation) corresponds to forward transformation, and the backward semantics (backward evaluation) corresponds to backward transformation. Before giving our bidirectional semantics for UnCAL, let us clarify the bidirectional properties that the forward and backward evaluations should satisfy. Let F[[e]]ρ denote the forward evaluation (get) of expression e under environment ρ, producing a view, and B[[e]](ρ, G′) denote the backward evaluation (put) of expression e under environment ρ, reflecting a possibly modified view G′ to the source by computing an updated environment. Here ρ is a set of mappings of the form $x ↦ G with a graph (or label) G. The following are two important properties:

F[[e]]ρ = G  implies  B[[e]](ρ, G) = ρ    (GETPUT)

B[[e]](ρ, G′) = ρ′ and G′ ∈ Range(F[[e]])  implies  F[[e]]ρ′ = G′    (PUTGET)

The (GETPUT) property states that an unchanged view G should cause no change to the environment ρ in backward evaluation, while the (PUTGET) property states that if a view is modified to G′, where G′ is in the range of the forward evaluation, then this modification can be reflected to the source in such a way that forward evaluation will produce the same view G′. These two properties are essentially the same as those in (Foster et al. 2005).

One problem with the (PUTGET) property is that it needs to check whether a graph is in the range of forward evaluation, which is difficult to do in practice. To avoid this range checking, we allow the modified view and the view obtained by backward evaluation followed by forward evaluation to differ, but require both views to have the same effect on the original source when backward evaluation is applied:

B[[e]](ρ, G′) = ρ′ and F[[e]]ρ′ = G″  implies  B[[e]](ρ, G″) = ρ′    (WPUTGET)

The get in our (WPUTGET) can be considered as an amendment of the modified view G′ to G″. Certainly, if the (PUTGET) property holds, then so does (WPUTGET). We say that a pair of forward and backward evaluations is well-behaved if it satisfies the (GETPUT) and (WPUTGET) properties. In the rest of this paper, we will give a bidirectional evaluation (semantics) for UnCAL and prove the following theorem, which is a direct consequence of Lemmas 2, 3, and 4 discussed later.

Theorem 1 (Well-behavedness). Our forward and backward evaluations are well-behaved, provided their evaluations succeed.

3.2 Two-Stage Bidirectionalization

Recall a2d_xc, which maps the source graph in Figure 1(a) to the view graph in Figure 4(c). The big gap between the source and the view makes it hard to reflect changes on the view back to the source. Our idea for bridging this gap is to divide the forward evaluation into two more easily handled stages:

• Stage 1: forward evaluation (in the bulk semantics) with sufficient ε-edges, so that the output graph has a shape similar to the input graph, making the later backward evaluation easier.

• Stage 2: elimination of ε-edges to produce a usual view.

For a2d_xc, Stage 1 maps the source graph to the intermediate graph in Figure 4(a), and Stage 2 maps the intermediate graph to the view graph (Figure 4(c)). By doing so, each stage becomes easier to bidirectionalize.

First, let us consider Stage 2. The ε-edge elimination procedure is simple: new edges are added to skip the ε-closures (Figure 5). It is easy to define a well-behaved backward evaluation for this procedure. First, all nodes in the result graph Gv exist in the original graph Gs, so each node in Gv can be traced to Gs. Second, although an edge in Gs may be duplicated in Gv ((E25, d, E56) and (E35, d, E56) in Figure 4(b)),§ each edge in Gv has a uniquely corresponding edge in Gs. Therefore, adding a new node to Gv corresponds to adding a new node to Gs, and adding a new edge to Gv corresponds to adding a new edge between the two corresponding nodes in Gs. Similar correspondences hold for deletions of nodes and edges and for in-place updates of edges.

Next, let us consider Stage 1. One fact worth noting is that, after the backward evaluation in Stage 2, the modification to the view in Stage 1 satisfies the ε-marker preserving property: (1) no ε-edges are added or deleted, (2) markers are not added, deleted, or changed, and (3) unreachable parts are not modified. This property is very important in our bidirectionalization, because it not only makes the nine graph constructors invertible, but also makes it easy to bidirectionalize structural recursion, since there is a clear correspondence between the input and output graphs. In the rest of this paper, we focus on the bidirectional graph transformation of Stage 1.

Figure 5. General ε-edge Elimination Procedure. [Only the caption of this figure is recoverable from the source.]

§ Note that Figure 4(c) does not have this duplication because, for this particular graph, it is safe to glue the source and the destination nodes of an ε-edge together. It is unsafe if and only if the source has another outgoing edge and the destination has another incoming edge; in that case, duplication is unavoidable.

4. Traceable Forward Evaluation

An UnCAL expression usually specifies a forward evaluation mapping a graph database (which is just a graph) to a view graph (Section 2). The main purpose of the present paper is to give a backward evaluation (backward semantics), which specifies how to reflect view updates to the graph database. For this purpose, we have to detect how each node of the view is generated, particularly when it is constructed by connecting input/output markers and removing ε-edges, which are no longer in the view. To make the view more informative, viz. traceable, we enrich the original semantics of UnCAL by embedding trace information (like provenance traces (Cheney et al. 2008)) in all nodes of the view, which may still include ε-edges. In this section, we explain what kind of trace information is embedded in the view, and extend the original semantics so that UnCAL expressions evaluate to traceable views.

4.1 Traceable Views

A view is obtained by evaluating an UnCAL expression on a database. Every node of the view originates either in a node of the database or in a construct of the UnCAL expression, except when the node is generated through a structural recursion with a rec construct (in the bulk semantics). Recall that an expression rec(λ($l, $g). e1)(e2) is evaluated by binding the variables $l and $g in e1 to parts of the evaluation result of e2. In this case, a node in the view may originate not only in the whole rec expression but also in a subexpression of e2. A traceable view is a view in which every node carries information for tracing its origin. This information, called a trace ID, is defined by

TraceID ::= SrcID
          | Code Pos Marker
          | RecN Pos TraceID Marker
          | RecE Pos TraceID Edge,

where SrcID ranges over identifiers uniquely assigned to all nodes of the database, Pos ranges over code positions in the UnCAL expression, Marker ranges over input/output markers, and Edge stands for TraceID × Label × TraceID for the set of labels Label.

We now briefly explain the meaning of each trace ID. Let i be the trace ID of a node u in a traceable view. When i is a node identifier in SrcID, node u originates in the node identified by i in the database. When i is Code p &m, with code position p and input marker &m, node u originates in the subexpression at p in the UnCAL expression. The marker &m is only required when the subexpression is a ∪ or cycle construct; this is because these constructs yield as many ε-edges as there are input markers. When i is either RecN p i′ &m or RecE p i′ (i1, a, i2), node u is generated through the rec construct at code position p; RecN and RecE record, respectively, in which node or edge of the argument of the recursion the node originates.

Let us explain these cases through an example, in which the UnCAL expression a2d_xc of Example 3 is applied to the database Gsrc in Figure 1(a). The traceable view we want can be obtained from the graph Gview in Figure 4(a) by assigning trace IDs to all nodes. The trace ID assigned to node 1 in Gview is
(RecN 7 1 &) because the node originates in node 1 of Gsrc in SrcID, which is used as a part of the argument of the rec construct at code position 7 in a2d xc. The trace ID assigned to node S12 in Gview is (RecE 7 (Code 2) (1, a, 2)) because the node originates in the a-labeled edge from node 1 to 2 of Gsrc in Edge through the graph constructor {d : } at code position 2 in the rec construct at 7 in a2d xc. When the argument of the rec construct is also a rec expression, RecN and RecE in the trace ID are nested like (RecN p (RecE p0 . . . ) . . . ) and (RecE p (RecE p0 . . . ) (RecN . . . , a, RecN . . . )). A traceable view is denoted by a quadruple (V, E, I, O) just like an ordinary UnCAL graph. The only difference is that in traceable views, trace IDs are assigned to all nodes.
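The trace ID grammar transliterates directly into an algebraic datatype; the following Haskell sketch (our own naming, not the authors' code; the optional marker models the fact that Code only needs a marker for ∪ and cycle) is one way to write it:

    type SrcID  = Int        -- identifiers of source database nodes
    type Pos    = Int        -- code positions in the UnCAL expression
    type Marker = String     -- input/output markers such as "&"
    type Label  = String

    -- TraceID ::= SrcID | Code Pos Marker
    --           | RecN Pos TraceID Marker | RecE Pos TraceID Edge
    data TraceID
      = Src  SrcID
      | Code Pos (Maybe Marker)
      | RecN Pos TraceID Marker
      | RecE Pos TraceID Edge
      deriving (Eq, Ord, Show)

    -- Edge = TraceID × Label × TraceID
    type Edge = (TraceID, Label, TraceID)

    -- The trace ID of node S12 in the running example:
    -- RecE 7 (Code 2) (1, a, 2).
    s12 :: TraceID
    s12 = RecE 7 (Code 2 Nothing) (Src 1, "a", Src 2)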
4.2 Enriched Forward Semantics

Traceable views can be computed by a simple extension of the original forward semantics of UnCAL so that tracing information is recorded when a node is created. Let e^p denote an UnCAL subexpression e at code position p. We write ρ($x) for G when ($x ↦ G) ∈ ρ; ρ is naturally used as a variable substitution in UnCAL expressions, e.g., eρ for an expression e. We inductively define the enriched forward semantics F[[e^p]]ρ for each UnCAL construct of e.

Graph Constructor Expressions. The semantics of graph constructor expressions is straightforward according to the construction in Figure 2. For instance, we have

F[[{}^p]]ρ = ({Code p}, ∅, {(&, Code p)}, ∅),

which creates a graph having a single node with trace ID Code p (indicating that the node is constructed by the code at position p), no edges, an input node (the single node itself), and no output nodes. As another example, the semantics of the expression e1 ∪ e2 is defined below to unify two graphs by connecting their input nodes with matching markers using ε-edges:

F[[(e1 ∪ e2)^p]]ρ = F[[e1]]ρ ∪^p F[[e2]]ρ,

where ∪^p is a union operator for two graphs concerning position p, defined by

G1 ∪^p G2 = (V ∪ V1 ∪ V2, E ∪ E1 ∪ E2, I, O1 ∪ O2)
    where (V1, E1, I1, O1) = G1
          (V2, E2, I2, O2) = G2
          M = inMarker(G1) = inMarker(G2)
          V = {Code p &m | &m ∈ M}
          E = {(Code p &m, ε, v) | (&m, v) ∈ I1 ∪ I2}
          I = {(&m, Code p &m) | &m ∈ M}.

We omit the definitions of the other constructor expressions.

Variable. A variable looks up its binding in the environment ρ:

F[[($v)^p]]ρ = ρ($v).

Condition. The forward semantics of a condition is defined as

F[[(if l1 = l2 then e1 else e2)^p]]ρ = F[[e1]]ρ    if l1ρ = l2ρ
                                     = F[[e2]]ρ    otherwise.

It first evaluates the conditional expression l1 = l2, and with the result it evaluates either the then branch or the else branch.

Structural Recursion. The semantics of a structural recursion is given by the bulk semantics reviewed in Section 2.3, which can be formally defined by

F[[(rec(λ($l, $g). eb)(ea))^p]]ρ = compose^p_rec(fwd_eachedge(Ga, ρ, eb), Ga, M)
    where M = inMarker(eb) ∪ outMarker(eb)
          Ga = F[[ea]]ρ,

where fwd_eachedge and compose_rec are defined in Figure 6. Intuitively, fwd_eachedge evaluates the body expression eb at each edge ζ of the argument graph Ga obtained by evaluating ea and returns the set of result graphs. Then, compose^p_rec glues all the graphs together along the structure of Ga concerning code position p. Note that subgraph(G, ζ) denotes the subgraph to which the edge ζ is pointing in the graph G.

Example 6. We will now illustrate the semantics of rec through an example: the structural recursion a2d_xc, which is defined with position information in Example 3, is applied to Gsrc in Figure 1(a), and the traceable view is a graph similar to Gview in Figure 4(a). First, Gsrc is bound to the variable $db. Then, fwd_eachedge generates a set of pairs of an edge and a "local result" for each edge in Gsrc. The local result is obtained by evaluating the body of rec under ρ = {$db ↦ Gsrc} ∪ {$l ↦ L, $g ↦ G} with the label L of the edge and the subgraph G reachable from the edge. For example, as the local result for edge (3, a, 5) in Gsrc, the edge (Code 2, d, Code 1) with input node Code 2 and output node Code 1 is generated, because the subexpression {d : &1}2 is used due to $l = a. The function compose^p_rec glues all pairs of an edge and a local result together after adding RecN or RecE to their nodes. For example, for the pair of edge ζ = (3, a, 5) and its local result containing edge (Code 2, d, Code 1), the set ERecE contains the edge (RecE 7 (Code 2) ζ, d, RecE 7 (Code 1) ζ), where 7 is the code position of the rec concerned, while the set ERecN contains the edges (RecN 7 3 &, ε, RecE 7 (Code 2) ζ) and (RecE 7 (Code 1) ζ, ε, RecN 7 5 &) due to (&, Code 2) ∈ I and (Code 1, &) ∈ O. The former corresponds to the edge from S35 to E35 of Gview, and the latter corresponds to the two edges from 3 to S35 and from E35 to 5 of Gview. In this example, Eε is an empty set, since Gsrc has no ε-edges. The sets IRecN and ORecN of input and output nodes are obtained with I = {(&, 1)} and O = ∅, respectively, which are those of Gsrc. Hence, IRecN = {(&.&, RecN 7 1 &)} and ORecN = ∅ because M = inMarker(eb) ∪ outMarker(eb) = {&}. Here, "." denotes a Skolem function (Buneman et al. 2000) that satisfies (&x.&y).&z = &x.(&y.&z) (associativity) and &.&x = &x.& = &x (left and right identity).

More concretely, if the source graph s consists of node 1 with an a-edge and a b-edge leading to node 2, then a2d_xc(s) gives the graph whose nodes carry the trace IDs RecN 7 1 &, RecE 7 (Code 2) (1, a, 2) with a d-edge to RecE 7 (Code 1) (1, a, 2), RecE 7 (Code 6) (1, b, 2) with a b-edge to RecE 7 (Code 5) (1, b, 2), and RecN 7 2 &, which is bisimilar to the graph in which the a-edge is renamed to d and the b-edge is kept. [The node-and-edge pictures of this example are garbled in the source; only the trace IDs are recoverable.]
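The shape of fwd_eachedge itself is small enough to sketch before the formal figure. Assuming an abstract evaluator eval and an abstract graph type (this is our own rendering, not the authors' code), it is essentially one comprehension over the non-ε edges:

    import qualified Data.Map as Map

    -- A binding is either a label or a graph, mirroring how ρ maps
    -- variables to labels ($l) or graphs ($g).
    data Val g = L String | G g

    type Env g = Map.Map String (Val g)

    -- fwd_eachedge(G, ρ, e): evaluate the body e once per non-ε edge
    -- ζ, with $l bound to label(ζ) and $g to subgraph(G, ζ). Edges
    -- are (source, label, target) triples; eval and subgraph are
    -- parameters, so this sketches only Figure 6's first equation.
    fwdEachEdge
      :: (Env g -> expr -> g)            -- eval: F[[e]]ρ
      -> (g -> (t, String, t) -> g)      -- subgraph(G, ζ)
      -> g                               -- argument graph Ga
      -> [(t, String, t)]                -- its edges
      -> Env g -> expr
      -> [((t, String, t), g)]           -- pairs (ζ, local result)
    fwdEachEdge eval subgraph ga es rho e =
      [ (zeta, eval rhoZeta e)
      | zeta@(_, l, _) <- es
      , l /= "ε"                         -- skip ε-edges
      , let rhoZeta = Map.insert "$l" (L l)
                    $ Map.insert "$g" (G (subgraph ga zeta)) rho
      ]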
fwd_eachedge(G(_,E,_,_), ρ, e) = {(ζ, F[[e]]ρζ) | ζ ∈ E, label(ζ) ≠ ε,
                                  ρζ = ρ ∪ {$l ↦ label(ζ), $g ↦ subgraph(G, ζ)}}

compose^p_rec(𝒢, (V, E, I, O), M) = (VRecE ∪ VRecN, ERecE ∪ ERecN ∪ Eε, IRecN, ORecN)
    where VRecE = {RecE p v ζ | (ζ, (Vζ, _, _, _)) ∈ 𝒢, v ∈ Vζ}
          ERecE = {(RecE p u ζ, l, RecE p v ζ) | (ζ, (_, Eζ, _, _)) ∈ 𝒢, (u, l, v) ∈ Eζ}
          VRecN = {RecN p v &m | v ∈ V, &m ∈ M}
          ERecN = {(RecN p v &m, ε, RecE p u ζ) | &m ∈ M, (ζ = (v, _, _), (_, _, Iζ, _)) ∈ 𝒢, (&m, u) ∈ Iζ}
                ∪ {(RecE p u ζ, ε, RecN p v &m) | &m ∈ M, (ζ = (_, _, v), (_, _, _, Oζ)) ∈ 𝒢, (u, &m) ∈ Oζ}
          Eε    = {(RecN p v &m, ε, RecN p u &m) | (v, ε, u) ∈ E, &m ∈ M}
          IRecN = {(&n.&m, RecN p v &m) | (&n, v) ∈ I, &m ∈ M}
          ORecN = {(RecN p v &m, &n.&m) | (v, &n) ∈ O, &m ∈ M}

Figure 6. Core of Forward Semantics of rec at Code Position p

5. Backward Evaluation of UnCAL

With traceable views and the ε-marker preserving property (Section 3) on the modification of such views, backward evaluation (in Stage 1) turns out to be simpler, for two reasons.

• First, the graph constructors become invertible. For instance, suppose G = G1 ∪ G2 and G is modified to G′, where the modification is ε-marker preserving; then we can follow tracing information, ε-edges, and marker information to uniquely decompose G′ into G′1 and G′2 such that G′1 ∪ G′2 ≡ G′ holds.¶ We will write this decomposition as decomp_{G1∪G2},‖ and applying it to G′ will give (G′1, G′2).

• Second, backward evaluation of a structural recursion rec(e) is reduced to that of its body e (followed by result gluing), because of the bulk semantics of structural recursion.

Backward evaluation greatly depends on what updates are allowed on the view. We allow the following three general updates on our edge-labeled graphs: (1) in-place updates, i.e., modification of edge labels; (2) deletion of edges; and (3) insertion of edges or of a subgraph rooted at a node. We accept a sequence of these updates on the view and reflect them to the source. In the rest of this section, we explain the backward evaluation for each of these updates on views.

¶ G1(V1,E1,I1,O1) ≡ G2(V2,E2,I2,O2), the exact equivalence of two graphs, is defined by V1 = V2 ∧ E1 = E2 ∧ I1 = I2 ∧ O1 = O2.

‖ It would be more precise to write it as decomp_{G1,∪,G2}, in that the decomposition depends on three arguments.

5.1 Reflection of In-place Updates

In this section, we formally define the backward semantics of UnCAL where only in-place updates are considered. Recall that the backward semantics B[[e]](ρ, G′) is used to compute a new environment from the original input environment ρ and the modified view G′. Like the forward semantics, the backward semantics can be defined inductively over the structure of the expression.

5.1.1 Backward Evaluation of Simple Expressions

Graph Constructor Expressions. Since each constructor is invertible and is associated with a decomposition function, we can decompose the views of constructor expressions so as to define the backward semantics inductively. For example, we have

B[[(e1 ∪ e2)^p]](ρ, G′) = B[[e1]](ρ, G′1) ⊎ρ B[[e2]](ρ, G′2)
    where G1 = F[[e1]]ρ
          G2 = F[[e2]]ρ
          (G′1, G′2) = decomp_{G1∪G2}(G′)

Unlike Foster et al. (2005), we have variable bindings, and therefore the multiple environments produced by backward evaluation of the operands are merged by ⊎ρ, defined below using an approach similar to that of Liu et al. (2007), which deals with variable bindings:

ρ1 ⊎ρ ρ2 = {$v ↦ mg(G, G1, G2) | ($v ↦ G) ∈ ρ, ($v ↦ G1) ∈ ρ1, ($v ↦ G2) ∈ ρ2}
    where mg(G, G1, G2) = G1      if G2 = G ∨ G1 = G2
          mg(G, G1, G2) = G2      if G1 = G
          mg(G, G1, G2) = FAIL    otherwise

⊎ρ unifies each binding by mg. If only the binding on the left-hand side is modified (G2 = G), or both are consistently updated (G1 = G2), then the binding on the left is adopted, and vice versa. If both are updated to different values, it fails, leading to the failure of the entire backward evaluation. Label variable bindings are treated similarly. We have omitted the definitions for the other constructor expressions, which can be defined similarly.

Variable. A variable simply updates its binding:

B[[$v]](ρ, G′) = ρ[$v ← G′].

Here, ρ[$v ← G′] is an abbreviation for (ρ \ {$v ↦ _}) ∪ {$v ↦ G′}.

Condition. The backward evaluation of a condition is defined by

B[[if l1 = l2 then e1 else e2]](ρ, G′) = ρ′1     if l1ρ = l2ρ ∧ l1ρ′1 = l2ρ′1
                                       = ρ′2     if l1ρ ≠ l2ρ ∧ l1ρ′2 ≠ l2ρ′2
                                       = FAIL    otherwise
    where ρ′1 = B[[e1]](ρ, G′)
          ρ′2 = B[[e2]](ρ, G′),

which is reduced to the backward evaluation of e1 if l1 = l2 holds, and to the backward evaluation of e2 otherwise. To guarantee well-behavedness, we ensure that the outcome of l1 = l2 does not change after backward evaluation.

5.1.2 Backward Evaluation of Structural Recursion

Due to the traceable bulk forward evaluation of structural recursion rec and the ε-marker preserving property, which retains the similarity in shape between input and output graphs, the backward semantics can easily be defined as

B[[rec(λ($l, $g). eb)(ea)]](ρ, G′) = merge(ρ, ea, Ea, bwd_eachedge(Ga, ρ, eb, decomp_rec(G′, Ea)))
    where Ga = (_, Ea, _, _) = F[[ea]]ρ

decomp_rec((V′, E′, I′, O′), Ea) = {(ζ, (V′ζ, E′ζ, I′ζ, O′ζ)) | ζ ∈ Ea, label(ζ) ≠ ε,
    V′ζ = {w | (RecE p w ζ) ∈ V′},
    E′ζ = {(w1, l, w2) | (RecE p w1 ζ, l, RecE p w2 ζ) ∈ E′},
    I′ζ = {(&m, w) | (RecN p v &m, ε, RecE p w ζ) ∈ E′},
    O′ζ = {(w, &m) | (RecE p w ζ, ε, RecN p v &m) ∈ E′}}

bwd_eachedge(G, ρ, e, 𝒢′) = {(ζ, B[[e]](ρζ, G′ζ)) | (ζ, G′ζ) ∈ 𝒢′,
                             ρζ = ρ ∪ {$l ↦ label(ζ), $g ↦ subgraph(G, ζ)}}

merge(ρ, ea, Ea, R) = B[[ea]](ρ, G′a) ⊎ρ ⋃{ρ′ζ \ {$l ↦ _} \ {$g ↦ _} | (ζ, ρ′ζ) ∈ R}
    where G′a = (⋃ V″ζ, Eeps ∪ ⋃ E″ζ, Ia, Oa)
          Eeps = {(u, ε, v) | (u, ε, v) ∈ Ea}
          (V″ζ, E″ζ) = (V′ζ ∪ {u}, E′ζ ∪ {(u, ρ′ζ($l), I′ζ(&))}) for each (ζ, ρ′ζ) ∈ R,
              letting (u, _, _) = ζ and (V′ζ, E′ζ, I′ζ, O′ζ) = ρ′ζ($g)

Figure 7. Core of Backward Semantics of rec at Code Position p

This definition is easy to understand if we note the duality with the definition of the forward semantics. The backward semantics first decomposes, through decomp_rec, the modified result graph G′ into pieces of graphs; this is intuitively an inverse operation of compose_rec. For every non-ε edge ζ ∈ Ea in the source argument graph, the decomposition extracts the (possibly modified) subpart G′ζ of G′ that originates in the result Gζ of the forward computation on that edge. Then, in bwd_eachedge, we carry out backward computation of the body expression eb on each edge and compute the updated environment ρ′ζ. Finally, these environments are merged into the updated environment ρ′ of the whole expression. The merge function does two pieces of work. First, by combining the information ρ′ζ($l) and ρ′ζ($g) from the updated environments (and the ε-edges existing among the edges Ea of the source argument graph), it computes the modified argument graph G′a. Then, we inductively carry out backward
evaluation on the argument expression ea to obtain another updated environment ρ0a . This ρ0a and all ρ0ζ s are merged into ρ0 . Let us explain in more detail the definition of decomprec , which is the key point of the backward evaluation. The function first extracts from result graph G0 nodes Vζ0 and edges Eζ0 that belong to each edge ζ by matching trace ID RecE p ζ. Note that if there are nodes that have been freshly inserted into the view, we also require these nodes to have this structure, so that these nodes are also passed to the backward evaluation of the recursion body. Input and output nodes with marker &m are recovered by selecting those pointed from/to “hub” nodes having structure RecN &m. Top-level constructors of trace ID are erased so that we can inductively compute the backward image from the body expression.
corr((u, l, v)) = (u, l, v) corr((RecE p u ζ 0 , l, RecE p v ζ 0 )) corr((u, l, v)) = corr(ζ 0 ) corr(ζ) = FAIL
'
!"# !"# source is s = '&%$ 1 a 7 '&%$ 2 , and a2d xc(s) gives the graph G. If the graph G is modified to G0 where the edge label b is updated to X, then B[[a2d xc]]({$db 7→ s}, G0 ) returns binding {$db 7→ s0 } X
if corr((u, l, v)) 6= FAIL if corr((u, l, v)) = FAIL otherwise.
Here, FAIL means failure on finding the corresponding edge. The first case means that if the edge ζ is a copy of an edge in the source, then ζ itself is the corresponding edge. The second and the third cases are for when ζ is a result of some structural recursion. According to the forward semantics of rec in Figure 6, the non-ε edge ζ must have the form (RecE p u ζ 0 , l, RecE p v ζ 0 ) for some p, u, v, and another non-ε edge ζ 0 . This means that ζ consists of an edge (u, l, v) originating from an evaluation of a recursion-body at ζ 0 . Hence, for this case, we first recursively trace the corresponding source of (u, l, v), and if this fails, then try that of ζ 0 . In other cases, corr fails to find the corresponding source, because it must be the case that u has a trace ID of the form Code , meaning that the edge is not derived from the source but from an UnCAL expression. Let $db be the source graph, Gview be the view produced by F[[e]]ρ from a forward computation of expression e with environment ρ, and G0view be a graph from Gview with a set of edges Dout = {ζ1 , . . . , ζn } removed. Our backward evaluation B[[e]](ρ, G0view ) consists of the following three steps.
Example 7. Recall the simple example in Example 3 where the b
if u, v ∈ SrcID
'
!"# !"# where s0 = '&%$ 1 a 7 '&%$ 2 . Therefore, the in-place update of the change on the view graph is reflected to the source. Lemma 2 (Well-behavedness for In-place Updates). If output graphs are modified by in-place updates on edges, then for any expression e, the two evaluations F[[e]] and B[[e]]( , ) form a well-behaved bidirectional transformation, if they succeed.
1. Compute the set of source edges Din = {corr(ζi ) | ζi is not an ε-edge}.
Proof. This statement can be proved by induction on the structure of e. For the base case where e is a variable, it clearly holds. Considering the inductive case, (1) if e is a constructor expression, it holds because each constructor is revertible within our context, (2) if e is a condition, its backward evaluation is reduced to that on either its true branch or its false branch, so the statement holds by induction, and (3) if e is a structural recursion, by bulk semantics, its backward computation is reduced to its body expression, so the statement holds by induction. 5.2
2. If FAIL ∈ Din , backward evaluation fails. If it is obtained successfully without failure, compute G0src = ρ($db) − Din , where G − E denotes removal of the edges in the set E from graph G. 3. Return ρ0 = ρ[$db ← G0src ] as the result if F [[e]]ρ0 = G0view , and fail otherwise. Lemma 3 (Well-behavedness for Deletion). If output graphs are modified by edge deletion, then for any expression e, the two evaluations F[[e]] and B[[e]]( , ) form a well-behaved bidirectional transformation, if they succeed.
Reflection of Deletion
Deletion in a view is reflected as deletion of the corresponding part in the source by using trace IDs. Suppose we want to delete the edge labeled d in the view of Example 7. Since both endpoints of the edge have trace IDs of the form RecE 7 (1, a, 2), we can see that the selected edge has been generated due to the existence of the source edge (1, a, 2), which is the “corresponding part” to be deleted in the source.
Proof. The (G ET P UT) property is clear because of the fact that Din = ∅ if Dout = ∅. For the (WP UT G ET) property, it holds because the third step actually does this check.
212
5.3 Reflection of Insertion

Reflection of insertion is much more complicated than that of in-place updating and deletion. This is because there are no corresponding edges in the source for the freshly inserted edges in the view, which requires us not only to create new information but also to add it at a proper location in the source graph. Our idea is to use the Universal Resolving Algorithm (URA) (Abramov and Glück 2002), a powerful method of inverse computation, to derive a right inverse of the forward evaluation, and to use the distributive property of structural recursion

  rec(e)($g1 ∪ $g2) = rec(e)($g1) ∪ rec(e)($g2)

to properly reflect insertion to the source. In this section, we shall give our algorithm for this reflection, before we highlight how URA can be used to derive the right inverse.

5.3.1 Insertion Reflection with Right Inverse

We assume the monotonicity of insertion, in that an insertion on the view is translated to an insertion on the source rather than to other updating operations. The monotonicity comes from the absence of isEmpty (Buneman et al. 2000) in our core UnCAL. We only consider insertion on a view graph produced by forward computation of a variable expression or a structural recursion. For the case of a variable, this reflection is done in the same way as in Section 5.1.1. Insertion for structural recursion, the basic computation unit in UnCAL, needs to be carefully designed. In the following, we will focus on structural recursion, omitting the other cases for simplicity.

Before giving our reflection algorithm, we should clarify the meaning of right inverse. In general, a function h is said to be a right inverse of f if for any x in the range of f, f(h(x)) = x holds. In our context, for an expression e and a graph G, F◦[[e]](G) is said to be a right inverse computation if it returns ρ′ such that F[[e]]ρ′ = G.

Now, we return to our reflection algorithm. Let Gsrc be the source graph, Gview = F[[rec(e)($db)]]ρ, where ρ = {$db ↦ Gsrc}, and let G′view be a graph obtained from Gview by inserting new edges. Notice that it is sufficient to consider $db as the argument of rec, because $db can be bound to any other expression. Our backward evaluation B[[rec(e)($db)]](ρ, G′view) returns ρ as the result if no new edges have been inserted in Gview; otherwise, it does the following:

1. Extract the inserted subgraph G′ from G′view such that G′view = Gview ∪ G′.

2. Compute, with right inverse computation, ρ′1 = F◦[[rec(e)($db)]](G′).

3. Return ρ′2 = {$db ↦ Gsrc ∪ ρ′1($db)} as the result.

The first step of extraction is possible provided that insertion happens at the root node.∗∗ The second step of right inverse computation will be explained in Section 5.3.3. The last step is to update the binding of $db and return this environment as our result. The following lemma shows the correctness of the algorithm.

∗∗ Insertions to non-root positions are possible due to the bulk semantics, which allows similar treatment for every node.

Lemma 4 (Well-behavedness for Insertion). If output graphs are modified by edge insertion, then for a structural recursion of the form rec(e)($db) where e contains no free variables, the two evaluations F[[e]] and B[[e]] form a well-behaved bidirectional transformation, if they succeed.

Proof. First, the (GetPut) property clearly holds because ρ is returned when no insertions occur. Next, we prove the (WPutGet) property by the following calculation.

  F[[rec(e)($db)]]ρ′2
    = { partial application }
  F[[rec(e)(ρ′2($db))]]ρ′2
    = { def. of ρ′2 }
  F[[rec(e)(Gsrc ∪ ρ′1($db))]]ρ′2
    = { structural recursion property }
  F[[rec(e)(Gsrc) ∪ rec(e)(ρ′1($db))]]ρ′2
    = { forward evaluation }
  F[[rec(e)(Gsrc)]]ρ′2 ∪ F[[rec(e)(ρ′1($db))]]ρ′2
    = { e contains no free variables }
  Gview ∪ F[[rec(e)($db)]]ρ′1
    = { right inversion }
  Gview ∪ G′

It is worth noting that we have simplified our discussion in both the above algorithm and the lemma by requiring that e in rec(e)($db) contain no free variables. With this requirement, our forward and backward evaluations satisfy the stronger (PutGet) property. In fact, it is acceptable to relax this condition by allowing e to contain other free variables and the initial ρ to contain bindings for them. Right inversion then produces a ρ′1 that is used to update all variable bindings in addition to $db. In this case, F[[rec(e)(Gsrc)]]ρ′1 may produce a graph that differs from the original view Gview. Even so, this different graph has no additional effect on the source when we apply backward evaluation to it. Therefore, (WPutGet) always holds. Below, we propose an algorithm in which the (PutGet) property is satisfied without any additional requirement. The idea is to utilize the trace ID information, as will be discussed shortly.
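The three steps translate directly into code. The sketch below is again hypothetical Haskell over the same set-of-edges Graph as before; rightInverse stands for the URA-derived computation F◦[[rec(e)($db)]] of Section 5.3.3, and the subgraph extraction of step 1 is naively modeled as a set difference.

  -- Backward evaluation for insertion, following steps 1-3 of Section 5.3.1.
  backwardInsert :: (Graph -> Graph)        -- F[[rec(e)($db)]] as a function of $db
                 -> (Graph -> Maybe Graph)  -- right inverse: db' with forward db' = g
                 -> Graph                   -- original source Gsrc
                 -> Graph                   -- modified view G'view
                 -> Maybe Graph             -- updated binding of $db
  backwardInsert forward rightInverse src view'
    | Set.null inserted = Just src                       -- nothing was inserted
    | otherwise         = do
        db' <- rightInverse inserted                     -- step 2: rho'_1($db)
        return (src `Set.union` db')                     -- step 3: Gsrc ∪ rho'_1($db)
    where inserted = view' `Set.difference` forward src  -- step 1: extract G'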
5.3.2 Improving Insertion Reflection

The method above satisfies the (PutGet) property only if the variables of e are disjoint from the variables bound in the initial environment ρ. In general, however, a transformation may contain multiple variable references, so more effort is required to achieve the (PutGet) property. We tackle the problem by first locating where to insert a graph by using trace IDs, and then applying the URA algorithm (described later) to find what graph should be inserted.

Consider the transformation a2d_xc and the view in Example 6. Suppose we want to insert a graph Gvins rooted at the view node v = RecN 7 2 &. Where should some graph be inserted into the source to reflect this insertion? The answer is that we must insert a graph rooted at the source node 2, because there would be no edge from v in the view unless there were an edge from 2 in the source, according to the bulk semantics of structural recursion. Our next task is then to find what graph should be inserted under the source node 2. That is, we hope to find Gsins such that the following holds.

[Figure: the transformation a2d_xc maps the source graph 1 −a→ 2, with Gsins inserted under node 2, to the view whose nodes RecN 7 1 & and RecN 7 2 & are connected via the trace nodes RecE 7 (Code 1) (1, a, 2) −d→ RecE 7 (Code 2) (1, a, 2) and RecE 7 (Code 3) (1, b, 2) −b→ RecE 7 (Code 4) (1, b, 2), with Gvins inserted under RecN 7 2 &.]
  tr(SrcID)     = SrcID
  tr(RecN tid)  = tr(tid)
  tr(RecE tid)  = tr(tid)
  tr(Code)      = FAIL

Figure 8. Tracing Node ID
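Figure 8's tr function is a straightforward walk over trace IDs. The following Haskell sketch renders it with a hypothetical TraceID type (the constructor payloads are simplified; the actual trace IDs also record code positions and source edges) and models FAIL as Nothing.

  -- Hypothetical rendering of trace IDs and the tr function of Figure 8.
  data TraceID
    = SrcID Int          -- a node of the original source graph
    | RecN Int TraceID   -- a node created by rec, with its parent trace
    | RecE Int TraceID   -- a node created for an edge processed by rec
    | Code Int           -- a node created by a code fragment: no source origin

  tr :: TraceID -> Maybe TraceID   -- Nothing plays the role of FAIL
  tr t@(SrcID _)  = Just t
  tr (RecN _ tid) = tr tid
  tr (RecE _ tid) = tr tid
  tr (Code _)     = Nothing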
URA can help us find such a Gsins for Gvins. For example, if Gvins is {b : {}}, then URA returns Gsins = {b : {}}. If Gvins is {d : {}}, then URA returns one of the possibilities, Gsins = {a : {}} or Gsins = {d : {}}, depending on the search method used in URA. By the soundness and completeness of URA, the reflection by URA is always correct in the sense that (PutGet) holds, and moreover URA always returns a Gsins if such a Gsins exists. Of these, soundness is the key to insertion reflection satisfying (PutGet) for general UnCAL transformations. In summary, our insertion-reflection algorithm is as follows.

1. Let v be a node under which we want to insert a graph Gvins.

2. By using the tr function in Figure 8, we find the source node u = tr(v) under which we insert a graph to reflect the insertion.

3. Let G′view be the graph obtained from the view by adding an ε-edge from v to Gvins.

4. We find a graph Gsins connected from u by an ε-edge, by applying URA to G′view.

5. We return the graph G′src obtained from the source by adding an ε-edge from u to Gsins.

The soundness of the insertion-reflection algorithm is directly derived from the soundness of URA.

Lemma 5 (Soundness of Insertion). Our insertion-reflection algorithm satisfies (PutGet).

Note that we use URA for G′view instead of Gvins. Thus, URA rejects any insertion of a Gsins that violates (PutGet). In addition, our insertion-reflection algorithm is complete in the sense that, if there exist source insertions that reflect the view insertion under some conditions, the algorithm will find one of them.

Lemma 6 (Completeness of Insertion). Let v be a node such that tr(v) ≠ FAIL. For any source graph G, we can insert any graph into its view if there exists a source insertion that reflects the view insertion and v still occurs in the view of the insertion-reflected source.

Recall that we only consider insertion on a view graph produced by forward computation of a variable expression or a structural recursion, which is expressed by tr(v) ≠ FAIL. This lemma can be proved using the property of trace IDs stating that, to insert a graph rooted at view node v, we must insert a graph rooted at source node tr(v). By induction on the trace ID of v, we can show that, if there is an edge from v, then there must be an edge from tr(v), which is implied by the property of trace IDs.

Note that Gvins has no edge back to the original view. However, this is not a restriction, since if there is a crossing edge pointing to a subgraph of the original view, we can duplicate the subgraph and integrate it into Gvins so that the edge can be eliminated.

5.3.3 Right Inverse Computation by URA

Recall that the right inverse computation of an expression e is to take a graph Gview and return a ρ such that F[[e]]ρ = Gview. We adopt the universal resolving algorithm (URA) (Abramov and Glück 2002), a powerful and general inversion mechanism, to compute ρ. The basic idea behind URA is to search a perfect process tree (Glück and Klimov 1993), which represents all possible computations of an expression, and to find a computation path that produces the result. Our right inverse computation consists of three steps.

1. It lazily enumerates possible evaluation paths by a symbolic computation called needed narrowing (Antoy et al. 1994).††

2. From the generated evaluation paths, it constructs a table of input/output pairs of computations.

3. If there is a pair in the table whose output is Gview, it generates a substitution (environment) from the path and returns it as the result.

†† The same notion is called driving (Glück and Klimov 1993; Glück and Sørensen 1994) in (Abramov and Glück 2002).

To use URA effectively for our right inverse computation of UnCAL, we define a small-step semantics for UnCAL such that a perfect process tree can be constructed through these small steps. The only non-standard feature of this semantics is that we use memoization to avoid the infinite loops that may be caused by cycles in the source graph (see the Appendix of (Hidaka et al. 2010) for details). In addition, we provide a Dijkstra-style search strategy to enumerate all possible evaluation paths, so that a solution can always be found if one exists. The two heuristics we use to design the cost function are:

• We use a (weighted) size of the graphs (to be inserted into the source) as the cost function in the Dijkstra search.

• For the weighted size, the depth (the length of a path) carries more weight than the width (the number of paths).

This strategy works nicely for consecutive in Example 4. Moreover, a suitable binding to continue evaluation of conditional expressions can easily be found for our core UnCAL, because the condition of a conditional expression has the simple form l1 = l2.

Example 8. As a simple example, let us see how we find ρ such that F[[a2d_xc($x)]]ρ = Gview, where Gview = {d : {}}. We search for ρ by symbolic evaluation of a2d_xc($x). To evaluate a2d_xc($x), we unfold $x and recursively evaluate a2d_xc, i.e., a structural recursion. There are many ways to instantiate $x, such as

  $x ↦ {},  $x ↦ {$l1 : $x1},  $x ↦ {$l1 : $x1, $l2 : $x2}.

If we choose $x ↦ {}, the computation finishes, yielding a table consisting of the input/output pair ({}, {}). Since this table does not contain a pair whose output is Gview, we continue searching. Assume that we choose $x ↦ {$l1 : $x1}. Then a2d_xc($x) is unfolded to

  (if $l1 = a then {d : &} else (if $l1 = c then {ε : &} else {$l1 : &})) @ a2d_xc($x1).

As evaluation gets stuck here because of the free variable $l1 in the if condition, we find a suitable $l1 to resume the evaluation. If we choose $l1 ↦ a, then the expression is reduced to {d : &} @ a2d_xc($x1), and the input/output pair ({a : {}}, {d : {}}) is obtained by choosing $x1 ↦ {}. Since Gview = {d : {}}, we gather all bindings along this computation path and return the following environment as the result.

  {$x ↦ {a : {}}}

Figure 9 shows part of a perfect process tree in our right-inverse computation: on the left is the tree, and on the right is a table of pairs of input/output graph templates (a pair of templates is more general than a pair of input/output graph instances, as discussed above). Note that this tree is a variant of SLD-resolution trees (Glück and Sørensen 1994).
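At its core, the search of Example 8 is a best-first generate-and-test over the process tree. The following Haskell sketch collapses steps 1-3 into a naive loop: candidates stands for the lazily enumerated bindings (assumed to be ordered by the Dijkstra cost above), and forward for evaluation of the expression under a candidate binding. The real algorithm works on graph templates with constraints rather than ground instances.

  import Data.Maybe (listToMaybe)

  -- Naive URA-style right inverse: return the first (cheapest) binding
  -- whose forward evaluation reproduces the requested view.
  ura :: Eq graph
      => (env -> graph)   -- forward evaluation under a candidate binding
      -> [env]            -- lazily enumerated candidates, cheapest first
      -> graph            -- the view Gview to invert
      -> Maybe env
  ura forward candidates gview =
    listToMaybe [ rho | rho <- candidates, forward rho == gview ]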
[Figure 9 (diagram omitted in this rendering): the perfect process tree for a2d_xc($x), branching on the instantiations $x ↦ {}, $x ↦ {$l1 : $x1}, $x ↦ {$l1 : $x1, $l2 : $x2}, ..., and, within the conditional, on the constraints $l1 = a, $l1 = c, and $l1 ≠ a, c; alongside, the table of enumerated input/output graph-template pairs (Gsrc, Gview) with their constraints φ.]

Figure 9. URA for a2d_xc and Enumerated Input/Output Pairs with Constraints (nodes without branching have been contracted)
6. Implementation and Experiments

The prototype system has been implemented and is available on our BiG project Website. In addition to all the examples in Buneman et al. (2000) and in this paper, we have tested three non-trivial examples, demonstrating its usefulness in software engineering and database management.

• Customer2Order: A case study in the textbook on model-driven software development (Pastor and Molina 2007).

• PIM2PSM: A typical example of transforming a platform-independent object model to a platform-specific object model.

• Class2RDB: A non-trivial benchmark application for testing the power of model transformation languages (Bezivin et al. 2005).

All of these have demonstrated the effectiveness of our approach in practical applications. In our implementation, we carefully treat the ε-edges introduced during operations related to markers, as well as the retrieval of edges and nodes of interest, both of which greatly affect performance. Poor treatment would prevent large-scale UnQL queries from being evaluated in bidirectional mode‡‡ in a reasonable amount of time. Speed-ups of several orders of magnitude have been achieved since our initial implementation, thanks to the above and the following optimizations.

‡‡ Note that we preserve every result of forward computation in the bidirectional mode.

Reduction in the number of ε-edges. As mentioned in the UnQL paper (Buneman et al. 2000), ε-edges are generously generated during evaluation, especially in rec. This slows down the evaluation process due to the increase in input size. Removing ε-edges during evaluation does no harm to the forward semantics, because of bisimulation equivalence. However, since ε-edges play an important role in backward evaluation, they cannot be freely omitted in our bidirectional setting. Moreover, a straightforward implementation of the removal algorithm (Buneman et al. 2000) may introduce additional edges, which may harm backward evaluation. To prudently remove those ε-edges whose removal is compatible with backward evaluation, our ε-removal algorithm glues the source and destination nodes of an ε-edge together as long as bisimulation equivalence is not violated.

Optimization by fusion transformation. The backward evaluation of rec(e1)(rec(e2)(e3)), a composition of structural recursions, requires generating the intermediate result of the backward transformation, which is very expensive. This can be avoided by fusing the two structural recursions into one. We have implemented this based on the fusion rule (Buneman et al. 2000): if e1(l, G) does not depend on G, then rec(e1)(rec(e2)(e3)) = rec(rec(e1) ◦ e2)(e3). With auxiliary rewriting rules such as e1 @ e2 = e1 for an e1 that produces no output nodes, CPU-time reductions of 30% and 50% are achieved for forward and backward execution, respectively, in Customer2Order composed with selection, and reductions of 30% and 65% for simpler examples that appeared in the evaluation of the unidirectional transformation (Hidaka et al. 2009). These experiments are for in-place updates, but similar reductions could be achieved for other updates.
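As an illustration of how such a fusion pass might look, here is a hedged Haskell sketch over a hypothetical miniature of the UnCAL expression syntax. Comp stands for composition of rec bodies, ignoresGraph is an assumed side-condition test for "e1(l, G) does not depend on G", and Rec e1 (Var "$g") stands, for the purposes of this sketch, for the partially applied rec(e1).

  -- Hypothetical miniature AST: Rec b a stands for rec(b)(a),
  -- Comp f g for the composition f ∘ g of rec bodies.
  data Expr = Rec Expr Expr | Comp Expr Expr | Var String
    deriving (Eq, Show)

  -- One application of the fusion rule
  --   rec(e1)(rec(e2)(e3)) = rec(rec(e1) ∘ e2)(e3)
  -- whenever the side condition on e1 holds.
  fuse :: (Expr -> Bool) -> Expr -> Expr
  fuse ignoresGraph (Rec e1 (Rec e2 e3))
    | ignoresGraph e1 = Rec (Comp (Rec e1 (Var "$g")) e2) e3
  fuse _ e = e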
7. Related Work
Bidirectional transformation has been discussed as the view updating problem in the database community. Bancilhon and Spyratos (1981) proposed a general approach to the view updating problem. They introduced an elegant solution based on the concept of a constant complement view, which captures the information that is in the view but not in the original database. Their idea was applied not only to relational databases (Hegner 1990; Lechtenbörger and Vossen 2003) but also to tree structures (Matsuda et al. 2007). Constant complement views satisfy very strong bidirectional properties at the sacrifice of the number of reflectable updates. Although such strong properties are nice for some applications (Hegner 1990), they are too strong for our purpose, i.e., model transformation in software engineering. Recent work by Fegaras (2010) propagates updates on XML views created from relational databases. It supports duplicates but detects view side effects at both compile time and run time. In the area of programming languages, view updating has been studied as bidirectional transformation. Foster et al. (2005) proposed the first linguistic approach to solving this problem. They developed domain-specific languages to support the development of bidirectional transformations on strings and trees. Bohannon et al. (2006) applied these techniques to relational databases, making use of functional dependencies in relations to correctly propagate updates. However, their approach is limited to strings, trees, and relations, and is difficult to apply to graph transformation, due to graph-specific features such as circularity and sharing. Within the context of software engineering, there have been several works on bidirectional model (graph) transformation (Ehrig et al. 2005; Jouault and Kurtev 2005; OMG 2005; Schürr and Klar 2008; Stevens 2007), which can deal with various kinds of graph structures. However, they lack a clear formal bidirectional semantics, and there is not yet a powerful method of bidirectionalization that can automatically derive backward model transformations from forward model transformations so that the two form a consistent bidirectional model transformation. The concept of structural recursion is not new and has been studied in both the database community (Breazu-Tannen et al. 1991) and the functional programming community (Sheard and Fegaras 1993). However, most of this work has focused on structural recursion over lists or trees instead of graphs. Examples include the higher-order function fold (Sheard and Fegaras 1993) in ML and Haskell, and the generic computation pattern called catamorphism in programming algebras (Bird and de Moor 1996). UnCAL (Buneman et al. 2000)
demonstrates that the idea of structural recursion can be extended to graphs, but the original focus was on the optimization of query fusion rather than bidirectionalization. Our work was greatly inspired by interesting work on efficient graph querying (Buneman et al. 2000; Sheng et al. 1999). Unlike trees, graphs involve subtle issues on their representation and equivalence. The use of bisimulation and structural recursion in (Buneman et al. 2000) opens a new way of building a framework for both declarative and efficient graph querying with high modularity and composability. This motivated us to extend the framework from graph querying to graph transformation and apply it to model transformation (Hidaka et al. 2009). This work is a further step in this direction to extend it from unidirectional model transformation to bidirectional model transformation.
8. Concluding Remarks

This paper reports our first attempt toward solving the challenging problem of bidirectional transformation on graphs. We show that structural recursion on graphs and its unique bulk semantics play an important role not only in query optimization, which has long been recognized in the database community, but also in the automatic derivation of backward evaluation, which has not been recognized thus far. As far as we are aware, the bidirectional semantics of UnCAL proposed in this paper is the first complete language-based framework for general graph transformations. Future work includes extending the framework from unordered graphs to ordered graphs, introducing graph schemas to provide structural information for more efficient bidirectional computation, an efficient algorithm for checking updatability, and more practical applications of the system to bidirectional model transformation in software engineering.

Acknowledgments

We thank Mary Fernandez, who kindly provided us with the SML source code of an UnQL system. We thank Fritz Henglein, James Cheney, and the anonymous reviewers for their thorough comments on earlier versions of the paper. The research was supported in part by the Grand-Challenging Project on "Linguistic Foundation for Bidirectional Model Transformation" from the National Institute of Informatics, and Grant-in-Aid for Scientific Research No. 22300012, No. 20500043, and No. 20700035.

References

S. M. Abramov and R. Glück. Principles of inverse computation and the universal resolving algorithm. In The Essence of Computation, pages 269–295, 2002.
S. Antoy, R. Echahed, and M. Hanus. A needed narrowing strategy. In POPL 1994, pages 268–279, 1994.
F. Bancilhon and N. Spyratos. Update semantics of relational views. ACM Trans. Database Syst., 6(4):557–575, 1981.
J. Bezivin, B. Rumpe, A. Schürr, and L. Tratt. Model transformation in practice workshop announcement. In MoDELS Satellite Events 2005, pages 120–127, 2005.
R. Bird and O. de Moor. Algebras of Programming. Prentice Hall, 1996.
A. Bohannon, B. C. Pierce, and J. A. Vaughan. Relational lenses: a language for updatable views. In PODS 2006, pages 338–347, 2006.
V. Breazu-Tannen, P. Buneman, and S. Naqvi. Structural recursion as a query language. In DBPL 1991, pages 9–19, 1991.
P. Buneman, M. F. Fernandez, and D. Suciu. UnQL: a query language and algebra for semistructured data based on structural recursion. VLDB J., 9(1):76–110, 2000.
J. Cheney, U. A. Acar, and A. Ahmed. Provenance traces. CoRR, abs/0812.0564, 2008.
K. Czarnecki, J. N. Foster, Z. Hu, R. Lämmel, A. Schürr, and J. F. Terwilliger. Bidirectional transformations: A cross-discipline perspective. In ICMT 2009, pages 260–283, 2009.
U. Dayal and P. A. Bernstein. On the correct translation of update operations on relational views. ACM Trans. Database Syst., 7(3):381–416, 1982.
K. Ehrig, E. Guerra, J. de Lara, L. Lengyel, T. Levendovszky, U. Prange, G. Taentzer, D. Varró, and S. Varró-Gyapay. Model transformation by graph transformation: A comparative study. Presented at MTiP 2005. http://www.inf.mit.bme.hu/FTSRG/Publications/varro/2005/mtip05.pdf, 2005.
L. Fegaras. Propagating updates through XML views using lineage tracing. In ICDE 2010, pages 309–320, 2010.
J. N. Foster, M. B. Greenwald, J. T. Moore, B. C. Pierce, and A. Schmitt. Combinators for bi-directional tree transformations: a linguistic approach to the view update problem. In POPL 2005, pages 233–246, 2005.
R. Glück and A. V. Klimov. Occam's razor in metacomputation: the notion of a perfect process tree. In WSA 1993, pages 112–123, 1993.
R. Glück and M. H. Sørensen. Partial deduction and driving are equivalent. In PLILP 1994, pages 165–181, 1994.
G. Gottlob, P. Paolini, and R. Zicari. Properties and update semantics of consistent views. ACM Trans. Database Syst., 13(4):486–524, 1988.
S. J. Hegner. Foundations of canonical update support for closed database views. In ICDT 1990, pages 422–436, 1990.
S. Hidaka, Z. Hu, H. Kato, and K. Nakano. Towards a compositional approach to model transformation for software development. In SAC 2009, pages 468–475, 2009.
S. Hidaka, Z. Hu, K. Inaba, H. Kato, K. Matsuda, and K. Nakano. Bidirectionalizing graph transformations. Technical Report GRACE-TR-2010-06, GRACE Center, National Institute of Informatics, July 2010.
Z. Hu, S.-C. Mu, and M. Takeichi. A programmable editor for developing structured documents based on bidirectional transformations. Higher-Order and Symbolic Computation, 21(1-2):89–118, 2008.
F. Jouault and I. Kurtev. Transforming models with ATL. In MoDELS Satellite Events 2005, pages 128–138, 2005.
R. Lämmel. Coupled Software Transformations (Extended Abstract). In SET 2004, Nov. 2004.
J. Lechtenbörger and G. Vossen. On the computation of relational view complements. ACM Trans. Database Syst., 28(2):175–208, 2003.
D. Liu, Z. Hu, and M. Takeichi. Bidirectional interpretation of XQuery. In PEPM 2007, pages 21–30, 2007.
K. Matsuda, Z. Hu, K. Nakano, M. Hamana, and M. Takeichi. Bidirectionalization transformation based on automatic derivation of view complement functions. In ICFP 2007, pages 47–58, 2007.
L. Meertens. Designing constraint maintainers for user interaction. http://www.cwi.nl/~lambert, June 1998.
OMG. MOF QVT final adopted specification. http://www.omg.org/docs/ptc/05-11-01.pdf, 2005.
Y. Papakonstantinou, H. Garcia-Molina, and J. Widom. Object exchange across heterogeneous information sources. In ICDE 1995, pages 251–260, 1995.
O. Pastor and J. C. Molina. Model-Driven Architecture in Practice: A Software Production Environment Based on Conceptual Modeling. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2007.
A. Schürr and F. Klar. 15 years of triple graph grammars. In ICGT 2008, pages 411–425. Springer-Verlag, 2008.
T. Sheard and L. Fegaras. A fold for all seasons. In FPCA 1993, pages 233–242, Copenhagen, June 1993.
L. Sheng, Z. M. Ozsoyoglu, and G. Ozsoyoglu. A graph query language and its query processing. In ICDE 1999, pages 572–581, 1999.
P. Stevens. Bidirectional model transformations in QVT: Semantic issues and open questions. In MoDELS 2007, pages 1–15, 2007.
A Fresh Look at Programming with Names and Binders

Nicolas Pouillard    François Pottier

INRIA
{nicolas.pouillard,francois.pottier}@inria.fr
Abstract
A wide range of computer programs, including compilers and theorem provers, manipulate data structures that involve names and binding. However, the design of programming idioms which allow performing these manipulations in a safe and natural style has, to a large extent, remained elusive. In this paper, we present a novel approach to the problem. Our proposal can be viewed either as a programming language design or as a library: in fact, it is currently implemented within Agda. It provides a safe and expressive means of programming with names and binders. It is abstract enough to support multiple concrete implementations: we present one in nominal style and one in de Bruijn style. We use logical relations to prove that "well-typed programs do not mix names with different scopes". We exhibit an adequate encoding of Pitts-style nominal terms into our system.

Categories and Subject Descriptors D.3.3 [Programming Languages]: Language Constructs and Features—Data types and structures; Polymorphism

General Terms Design, Languages, Theory

Keywords names, binders, meta-programming, name abstraction, higher-order abstract syntax

1. Introduction

A wide range of computer programs, including compilers and theorem provers, manipulate and transform data structures that involve names and bindings. Significant effort has been invested in the design of programming idioms or languages that support these tasks in a safe and natural style. Nevertheless, a definitive solution is yet to be found. One challenge is to abstract away the details of any one particular implementation technique, such as atoms and permutations, or de Bruijn indices and shifting. A still greater challenge is to design a lightweight yet expressive static discipline to ensure that names are handled in a sound way. One must first ask: what does it mean to handle names in a sound way? The question is trickier than it seems. There are several informal slogans that attempt to describe what this means:

1. "name abstractions cannot be violated"; or: "the representations of two α-equivalent terms cannot be distinguished";

2. "names do not escape their scope";

3. "names with different scopes cannot be mixed".

These slogans are not equivalent; we have listed them in increasing order of strength. A traditional representation of names as strings or de Bruijn indices satisfies none of these slogans. A system such as FreshML [22] satisfies only the first slogan. A strongly-typed representation of names as well-scoped de Bruijn indices satisfies the first two slogans, but, we argue (§4.5), not the third one. Finally, several systems in the literature [10, 12, 16–18, 21], as well as the one presented in this paper, satisfy all three.

Our approach This paper describes a new way of addressing these challenges. We present an interface composed of a number of types and operations for declaring and manipulating data structures that involve names and binding. This interface can be viewed either as a library or as a programming language design. In support of the "library" point of view, we provide an implementation as a library within Agda. It could also be implemented within another variant of type theory, such as Coq. Our implementation exploits dependent types to express internal invariants and to guarantee that certain operations cannot fail. In support of the "language" point of view, our proposal could also be viewed as an extension of a standard calculus, such as System Fω, with new primitive types and operations. The types of our primitive operations do not involve dependency. As far as the programmer is concerned, our types and operations remain abstract. In particular, the nature of names is not revealed. As a result, multiple implementations of our interface are possible. We currently have two: one is based on atoms, in the style of FreshML, while the other is based on de Bruijn indices. In summary, we propose a novel approach to programming with names and binders. The semantics of the system is elementary: it rests upon a number of explicit, low-level primitive operations. No renaming, shifting, or substitution is built into the semantics. The programmer is offered an abstract view of names, independent of the chosen implementation scheme. One original feature of our proposal is that name abstraction is not primitive: it is built out of more elementary notions. This helps one understand the essence of name abstraction, and increases the system's expressiveness by allowing programmers to build custom forms of name abstraction.

Overview of the paper In order to control the use of names, we introduce an abstract notion of world. The type system associates a world with each name, and allows two names to be compared for equality only if they inhabit a common world. Names, worlds, as well as a number of other types and operations, are introduced in §3. At the same time, the system is explained via examples of increasing complexity. In §4, we describe our two implementation schemes. In the nominal scheme, worlds are sets of atoms. In the de Bruijn-index-based scheme, worlds are integer bounds. We then justify the soundness of our interface. We do this twice: once for each implementation scheme. In each case, we make novel use of logical relations in order to give richer meaning to worlds: we explain how worlds can be viewed as bijections between names. In this setting, the fundamental theorem of logical relations corresponds to the three desired slogans. The slogans remain informal, though, because our system does not have a notion of "α-equivalence", or "scope", to begin with. We do prove that nominal terms in the style of Pitts are adequately encoded in our system; this yields a formal version of slogan 1 with respect to Pitts' notion of α-equivalence. In our interface, a number of key primitive operations are provided only at names, and must be explicitly lifted (by the programmer) to user-defined data types. In §5, we show how to do this, and suggest that some of this boilerplate code can be automatically produced via generic programming. We conclude with an advanced example (§6) and with discussions of related work (§7) and future work (§8).
2. A brief introduction to Agda notation

Throughout the paper, our definitions are presented in the syntax of Agda. In Agda, Set is the type of small types like Bool, Maybe (List Bool), or ℕ. Set₁ is the type of Set. The function space is written A → B, while the dependent function space is written ∀ (x : A) → B. An implicit parameter, introduced via ∀ {x : A} → B, can be omitted at a call site if its value can be inferred from the context. There are shortcuts for introducing multiple arguments at once or for omitting a type annotation, as in ∀ {A} {i j : A} x → .... Existential quantification is available via the type constructor ∃, which accepts a type function as its argument, as in ∃ λ α → (α → ℕ) × (ℕ → α). A data constructor name can be used in multiple data types; Agda makes use of type annotations to resolve ambiguities. As in Haskell, a definition consists of a type signature and a sequence of defining equations, which may involve pattern matching. The with construct extends a pattern-matching-based definition with new columns. An ellipsis ... is used to elide a redundant equation prefix. Agda is strict about whitespace: x+y is an identifier, whereas x + y is an application. This allows naming a variable after its type (deprived of any whitespace). We use mixfix declarations, in which underscores mark argument positions. We use some definitions from Agda's standard library: operations over functors (_⟨$⟩_), monads (return, _>>=_), and applicative functors (pure, _⊛_). For the sake of conciseness, the code fragments presented in the paper are sometimes not perfectly self-contained. However, a complete Agda development is available online [19].

3. Working with names and binders

We now present the signature (that is, the abstract types and operations) that our system offers to programmers. For the sake of presentation, we intersperse fragments of this signature (declarations) with examples of their use (code fragments).

Worlds We first introduce worlds, which names inhabit. There is an empty world, which no names inhabit. There are no other concrete worlds: most of the time, the programmer uses world variables α, β, γ, ....

  World : Set
  ø     : World

We often use relations Rel, and in particular relations over worlds, Rel World.

  Rel : Set → Set₁
  Rel A = ∀ (α β : A) → Set

Agda does not have a clear phase distinction, that is, a clear distinction between values and types. Nevertheless, one can also view our system as an extension of a calculus that does have this distinction, such as System Fω under a type-erasure semantics. In that view, worlds, like types, can be erased at runtime.

Names The type of names, Name, is indexed with a world. The idea is that two names can safely be compared only if they inhabit a common world. This is apparent in the type of the name equality test.

  Name     : ∀ (α : World) → Set
  _==Name_ : ∀ {α} → Name α → Name α → Bool

To witness the fact that no name inhabits the empty world, we introduce a function which produces a contradiction when applied to a name in the empty world. Its codomain is the empty type ⊥. Put differently, this function allows marking some cases as impossible, and instructs the system to statically check that they are indeed so.

  nameø : Name ø → ⊥

Weak links Let us go on to our next ingredient: a type for weak links between worlds.

  _◅_ : Rel World

If α and β are worlds, then α ◅ β is a type. Roughly speaking, a name x has type α ◅ β under two conditions: first, x inhabits the world β; second, the world that existed before x was introduced is α. Put another way, α ◅ β is a more precise type for names. It keeps track of the worlds just before and just after a name is bound. We usually refer to α as the "outer" world and to β as the "inner" world. Weak links allow keeping track of connections between worlds: intuitively speaking, if x has type α ◅ β, then the worlds α and β assign the same meaning to every name other than x. The name x itself may have no meaning at all in α, or it may have some meaning in α and a different meaning in β. Our weak links do not require x to be fresh for α: they allow a new binding to shadow an earlier binding. Later on, we introduce strong links, which do imply a freshness condition. Since a weak link is just a more precise type for a name, we offer a way of converting the former into the latter.

  nameOf◅ : ∀ {α β} → α ◅ β → Name β

Example: representing λ-terms We now have enough elements to declare algebraic data types that involve names and binders. Let us begin with a prototypical object language: the untyped λ-calculus with local (let) definitions.

  data Tm (α : World) : Set where
    V   : (x : Name α) → Tm α
    _·_ : ∀ (t u : Tm α) → Tm α
    ƛ   : ∀ {β} (x : α ◅ β) (t : Tm β) → Tm α
    Let : ∀ {β} (x : α ◅ β) (t : Tm α) (u : Tm β) → Tm α

The type constructor Tm is indexed with a world. The type Tm α can be thought of as a type of terms whose free names inhabit the world α. Accordingly, the constructor V carries a name that inhabits α, and the constructor for applications carries two subterms that inhabit α. The constructor ƛ shows how we build simple name abstractions. It carries a weak link (the name to be bound) between the outer world α and some inner world β. The body of the abstraction inhabits this inner world: it has type Tm β. The abstraction itself inhabits the outer world: it has type Tm α. Since β does not occur in the latter type, it is really existentially quantified: viewed from the outside, an abstraction contains an unknown inner world. In Let, the sub-term t inhabits the outer world α: thus, it is not in the scope of the bound name x. On the other hand, the sub-term u inhabits the inner world β: it is in the scope of x. It is easy to see how one would define LetRec.
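For readers more at home in Haskell, here is a rough analogue of this interface, with worlds as phantom type parameters. This is only an illustration under obvious simplifications: names are modeled as bare integers, the constructors are not kept abstract, and none of the proofs carried by the Agda version are enforced; exportL previews the export function introduced below.

  {-# LANGUAGE GADTs #-}

  newtype Name w = Name Int deriving Eq   -- a name inhabiting world w
  newtype Link v w = Link Int             -- a weak link between worlds v and w

  nameOfL :: Link v w -> Name w           -- a link is a more precise name
  nameOfL (Link a) = Name a

  exportL :: Link v w -> Name w -> Maybe (Name v)
  exportL (Link a) (Name b)
    | a == b    = Nothing                 -- the bound name cannot cross its own link
    | otherwise = Just (Name b)           -- any other name keeps its meaning

  data Tm v where
    V   :: Name v -> Tm v
    App :: Tm v -> Tm v -> Tm v
    Lam :: Link v w -> Tm w -> Tm v       -- the body lives in the inner world w
    Let :: Link v w -> Tm v -> Tm w -> Tm v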
Using worlds, names, and weak links, it is possible to define a wide range of data structures with binders. The above encoding of λ-terms is but one instance of a general encoding of Pitts' nominal terms and nominal signatures. We describe this general encoding and prove it adequate in §4.3. In fact, our system is more expressive than Pitts': this is illustrated by several examples in this section. Here is a trivial example of a function that traverses a term and measures its size. It is remarkable for its simplicity: name abstractions are traversed without fuss. This unaltered induction also tells a bit about the expressiveness of such functions. It is also efficient: no renaming, substitution, or shifting is involved. Polymorphic recursion is exploited: the call to size t in the ƛ case is at some inner world.

  size : ∀ {α} → Tm α → ℕ
  size (V _)       = 1
  size (t · u)     = 1 + size t + size u
  size (ƛ _ t)     = 1 + size t
  size (Let _ t u) = 1 + size t + size u

Exporting names When two worlds are connected via a weak link, it is desirable to be able to move names from one world into the other along the link. We introduce the function export◅ for this purpose.

  export◅ : ∀ {α β} → α ◅ β → Name β → Maybe (Name α)

The function export◅ expects two names x and y, whose types are α ◅ β and Name β. It compares nameOf◅ x and y for equality. If they are equal, export◅ fails: the name y has meaning in β, but may not have meaning, or may have a different meaning, in α. If they differ, export◅ succeeds and returns y at type Name α: indeed, since y is not x, it has the same meaning in both worlds. export◅ is a partial function: it can fail. It is an injective function: if export◅ x y and export◅ x z are equal, then y and z are equal. Like _==Name_, export◅ is a name equality test. However, it performs type refinement: in the event that the names differ, the input name is returned with a more precise type. The reader may wonder whether it is possible to move a name in the other direction, from the outer world α into the inner world β. The answer is negative: this would be unsound. (For a justification, see the discussion of dubious in §4.2.) Later on, we introduce a means of moving in this direction, namely world inclusion witnesses.

Example: working with free and bound names We now have enough tools to present a more interesting example, namely a function that constructs a list of the free variables of a term. At variables and applications, the code is straightforward. At a name abstraction, one easily collects the free variables of the body via a recursive call. However, this yields a list of names that inhabit the inner world of the abstraction—a value of type List (Name β). This list cannot be returned, and this is fortunate, since doing so would let the bound variable leak out of its scope! We define an auxiliary function, rm, which removes all occurrences of a name in a list of names and at the same time performs type refinement in the style of export◅.

  fv : ∀ {α} → Tm α → List (Name α)
  fv (V x)       = [ x ]
  fv (fct · arg) = fv fct ++ fv arg
  fv (ƛ x t)     = rm x (fv t)
  fv (Let x t u) = fv t ++ rm x (fv u)

  rm : ∀ {α β} → α ◅ β → List (Name β) → List (Name α)
  rm _ []       = []
  rm x (y ∷ ys) with export◅ x y
  ... | just y′  = y′ ∷ rm x ys
  ... | nothing  = rm x ys

The function rm applies export◅ x to every name y in the list and builds a list of only those that successfully cross the link x. It exhibits a typical way of using export◅ to perform a name comparison together with a type refinement. This idiom is recurrent in the programs that we have written. The function fv enjoys a free theorem, that is, a theorem that follows directly from its type: every name in the output list must occur free in the input term. This claim is backed up by the typed models in §4.2 and §4.4.

Example: working with environments Here is another example, where we introduce the use of an environment.

  occurs : ∀ {α} → Name α → Tm α → Bool
  occurs x₀ = occ (λ y → x₀ ==Name y) where
    OccEnv : World → Set
    OccEnv α = Name α → Bool
    extend : ∀ {α β} → α ◅ β → OccEnv α → OccEnv β
    extend x ρ y = maybe ρ false (export◅ x y)
    occ : ∀ {α} → OccEnv α → Tm α → Bool
    occ ρ (V y)       = ρ y
    occ ρ (t · u)     = occ ρ t ∨ occ ρ u
    occ ρ (ƛ x t)     = occ (extend x ρ) t
    occ ρ (Let x t u) = occ ρ t ∨ occ (extend x ρ) u

The function occurs tests whether some name x₀ occurs free in a term. An environment is carried down, augmented when a binder is crossed, and looked up at variables. Here, this environment is represented as a function of type Name α → Bool. Although this is a simple and elegant representation, others exist. For instance, we could represent the environment as a linked list of weak links: the code for this variant is online [19]; see also below. We claim that this code is standard and uncluttered. There is no hidden cost: no renaming is involved. Admittedly, linked lists are not the most efficient representation of environments. It would be nice to be able to implement environments using, say, balanced binary search trees, while preserving well-typedness. We leave this issue to future study. The type system forces us to use names in a sound way. For instance, in the definition of occ, forgetting to extend the environment when crossing a binder would cause a type error. In the definition of extend, attempting to check whether y occurs in ρ without first comparing y and x would cause a type error. In our nominal implementation scheme (§4.1), it is permitted for newer bindings to shadow earlier ones; our type discipline guarantees that the code works also in that case.

As suggested previously, one may wish to represent environments as an explicit data structure (a linked list of weak links) rather than as an opaque object (a lookup function). While there exists an appropriate abstraction in Agda's standard library, called Star, we define a custom data type. An environment is a chain of weak links. At runtime, it is just a list of names.

  data _⋆◅_ : Rel World where
    ε   : ∀ {α} → α ⋆◅ α
    _/_ : ∀ {α β γ} (x : β ◅ γ) (Δ : α ⋆◅ β) → α ⋆◅ γ

The export◅ operation is extended to chains of weak links:

  export⋆◅ : ∀ {α β} → α ⋆◅ β → Name β → Maybe (Name α)
  export⋆◅ ε       y = just y
  export⋆◅ (x / Δ) y = export◅ x y >>= export⋆◅ Δ
    where open MaybeMonad

The type α ⋆◅ β is the type of an environment, or environment fragment, whose outer world is α and whose inner world is β. The expression export⋆◅ Δ y looks up the name y in the environment Δ. The name y must make sense in the scope of Δ, that is, y must inhabit the world β. If y is found among the bindings, then the information associated with y can be returned. (Here, there is no such information, so nothing is returned.) If y is not found among the bindings, then y is returned, with a more precise type: indeed, since y is not among the names introduced by Δ, it makes sense outside Δ, that is, in the world α.

We illustrate the use of chains of weak links with an alternative definition of the function fv. This variant avoids the need to take the bound atoms off the list by not inserting them in the first place. At variables, we use export⋆◅ to check whether the name is free or bound. At every other node, we simply carry out a recursive traversal. Whenever a name abstraction is entered, the current environment is extended with the bound name x.

  fv′ : ∀ {α β} → α ⋆◅ β → Tm β → List (Name α)
  fv′ Δ (V x)       = List.fromMaybe (export⋆◅ Δ x)
  fv′ Δ (t · u)     = fv′ Δ t ++ fv′ Δ u
  fv′ Δ (ƛ x t)     = fv′ (x / Δ) t
  fv′ Δ (Let x t u) = fv′ Δ t ++ fv′ (x / Δ) u
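Continuing the hypothetical Haskell analogue from earlier (in the same module, with GADTs enabled), chains of links and the extended export operation might be rendered as follows.

  data Chain v w where
    Nil  :: Chain v v
    Snoc :: Link u w -> Chain v u -> Chain v w  -- innermost link first

  exportChain :: Chain v w -> Name w -> Maybe (Name v)
  exportChain Nil        y = Just y
  exportChain (Snoc x d) y = exportL x y >>= exportChain d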
Importing names We now introduce world inclusion witnesses _⊆_, whose purpose is to allow moving names from a smaller world into a larger world. In other words, we equip worlds with a system of explicit subtyping. World inclusion is reflexive and transitive; the type constructor Name is covariant; the empty world is the least world.

  _⊆_        : Rel World
  ⊆-refl     : ∀ {α} → α ⊆ α
  ⊆-trans    : ∀ {α β γ} → α ⊆ β → β ⊆ γ → α ⊆ γ
  import⊆    : ∀ {α β} → α ⊆ β → Name α → Name β
  ø-bottom-⊆ : ∀ {α} → ø ⊆ α

Like links, world inclusion witnesses come with an import function, which moves a name from one world into the other. One major difference with weak links is that this function is total. While importing names is nice and simple, we are interested, in general, in importing complex terms or data structures from one world into another. This requires, in particular, being able to import abstractions. Upon close examination, we find that we need a commutative diagram to hold: a weak link α ◅ β and an inclusion α ⊆ α′ can be completed into a weak link α′ ◅ β′ and an inclusion β ⊆ β′, for some world β′. We make this property available to the programmer by introducing the following primitive operation:

  _◅-commute-⊆_ : ∀ {α β α′} → α ◅ β → α ⊆ α′ → ∃ λ β′ → (α′ ◅ β′) × (β ⊆ β′)

At this point, it is probably not clear why this commutative diagram is sound, or why it is useful. Its soundness – as well as that of every primitive operation presented here – is justified in §4. Its usefulness is illustrated in §5.2 and §5.3.

Strong links Next, we introduce strong links. Again, the type α ◅→ β is a precise type for a name: it is more precise than α ◅ β and (therefore) more precise than Name β. If x has type α ◅→ β, then x is guaranteed to be fresh for the world α. That is, a strong link represents the introduction of a binding for a fresh name, and (in contrast with a weak link) cannot possibly shadow an earlier binding. As a result, if x has type α ◅→ β, then α ⊆ β must hold: out of a strong link, one can produce a world inclusion witness.

  _◅→_     : Rel World
  weaken   : ∀ {α β} → α ◅→ β → α ◅ β
  dropName : ∀ {α β} → α ◅→ β → α ⊆ β

Technically, a strong link comes with an even stricter guarantee: the name x must not just be fresh for α; it must dominate every atom in α, in a sense to be made precise later on (§4.1). One might wonder why we need both weak links and strong links. Why not use strong links everywhere, since they offer a stronger guarantee? The answer is: precisely because they are stronger than weak links, strong links are also more difficult to construct. In particular, strong links do not enjoy an analogue of the diagram _◅-commute-⊆_. Such a diagram would be unsound, because a name that is fresh for a smaller world is not necessarily fresh for a larger world. Yet, a commutative diagram in the style of _◅-commute-⊆_ plays a key role in the definition of generalized import operations (§5.3). This explains why we often use weak links in our term representations, such as Tm.

Generating names The alert reader may have noticed that, up to this point, we have not yet introduced a way of producing names or links! To address this issue, we need a mechanism for producing fresh names. We find that it is sufficient to be able to produce strong links, since a strong link can degenerate into a weak link and into a name. We view a fresh name with respect to the world α as a strong link into some unspecified next world β, and define the following abbreviation:

  Fresh : World → Set
  Fresh α = ∃ λ β → α ◅→ β

We introduce two primitive operations for creating fresh names. freshø is an initial strong link—a name that is fresh for the empty world. next◅→ accepts two names: one is a weak link between two worlds α and β; the other is fresh for α. next◅→ produces a name that is fresh for β.

  freshø : Fresh ø
  next◅→ : ∀ {α β γ} → α ◅ β → α ◅→ γ → Fresh β

Together, these two low-level operations allow constructing an infinite stream of fresh names, that is, a name generator.

Packaging up We are done introducing the abstract types and operations that we offer to the users of our library (or programming language). In summary, we have four primitive types (names, weak links, strong links, and world inclusion witnesses), and a number of operations over these types. In the Agda implementation, we find it convenient to package each type together with the operations that it offers. An idiomatic way of doing this involves defining parameterized records, like this:

  module NamePack {α} (x : Name α) where
    nameOf : Name α
    nameOf = x

  module WeakPack {α β} (x : α ◅ β) where
    open NamePack (nameOf◅ x) public
    weakOf : α ◅ β
    weakOf = x
    exportWith : Name β → Maybe (Name α)
    exportWith = export◅ x

  module ⊆Pack {α β} (x : α ⊆ β) where
    ⊆Of : α ⊆ β
    ⊆Of = x
    importWith : Name α → Name β
    importWith = import⊆ x

  module StrongPack {α β} (x : α ◅→ β) where
    open WeakPack (weaken x) public
    open ⊆Pack (dropName x) public
    strongOf : α ◅→ β
    strongOf = x
    nextOf : Fresh β
    nextOf = next◅→ weakOf strongOf

  module FreshPack {α} (x : Fresh α) where
    open StrongPack (proj₂ x) public

The open/public declarations cause one record to be included within another. This permits a limited form of inheritance and overloading. For instance, within the scope of appropriate open declarations, the method nameOf is applicable to names of type Name α, α ◅ β, α ◅→ β, and Fresh α.

Constructing terms Once this boilerplate is set up, we can at last show how to construct a term. For example, let us build a representation of the object-level term λx y → x y.

  app : Tm ø
  app = ƛ (weakOf x) (ƛ (weakOf y)
          (V (importWith y (nameOf x)) · V (nameOf y)))
    where open FreshPack
          x = freshø
          y = nextOf x

We generate two fresh names x and y. Each of these names is viewed as a weak link (via weakOf) when playing the role of a binding occurrence, and is viewed as a name (via nameOf) when playing the role of a regular occurrence. Furthermore, in order to satisfy the type-checker, the regular occurrence of x must be imported into the scope of y. This is admittedly fairly difficult to read. If our system were implemented as a stand-alone programming language, as opposed to a library within Agda, it seems reasonable to think that one would be able to make the invocations of weakOf, nameOf, and importWith implicit. The omitted information would be reconstructed by a local type inference algorithm.
Towards elaborate uses of worlds The type Tm is just one basic example of an algebraic data type that involves names and binders. As a more challenging example, consider a type C of one-hole contexts associated with Tm. The type C is indexed with two worlds, which respectively play the roles of an outer world and an inner world. The idea is that plugging a term of type Tm β into the hole of a context of type C α β produces a term of type Tm α. The definition of the type C is as follows:

  module Context where
    data C : World → World → Set where
      Hole : ∀ {α} → C α α
      _·₁_ : ∀ {α β} → C α β → Tm α → C α β
      _·₂_ : ∀ {α β} → Tm α → C α β → C α β
      ƛ    : ∀ {α β γ} → α ◅ β → C β γ → C α γ
      Let₁ : ∀ {α β γ} → α ◅ β → C α γ → Tm β → C α γ
      Let₂ : ∀ {α β γ} → α ◅ β → Tm α → C β γ → C α γ

Contexts bind names: the hole can appear under one or several binders. This is why, in general, a context has distinct outer and inner worlds. A context contains a chain of weak links that connects the outer and inner worlds: these links are carried by the constructors ƛ and Let₂. A context and a term can then be paired to produce a term in a context:

  CTm : World → Set
  CTm α = ∃ λ β → C α β × Tm β

It is straightforward to define a function plug from CTm α to Tm α, which accepts a pair of a context and a term and plugs the latter into the former. Conversely, one can define a family of focusing functions (∀ {α} → Tm α → CTm α), which split a term into a pair of a context and a term. There are several such functions, according to where one wishes to focus. The role played by C α β in this existential type is identical to that played by α ◅ β in the single-name abstraction ∃ λ β → (α ◅ β) × Tm β. In other words, the type C can be viewed as a new, user-defined type of links between worlds, and can be used to build elaborate forms of name abstractions. As another instance of this idea, if one wished to extend our object language with ML-style patterns, one would index the type Pat of patterns with an outer world and an inner world, and one would use elaborate abstractions of the form ∃ λ β → Pat α β × Tm β.

Finally, the fact that a type can be indexed with several world parameters can be exploited in other ways. For instance, if one wished to extend our object language with polymorphism, one would index the type Tm with two worlds: one for (names of) term variables, one for (names of) type variables. In other words, worlds can also serve as disjoint name spaces.

4. Two sound implementations

We have axiomatized a number of notions, including worlds, names, and links. Now comes the time to give definitions of these types and terms. We have two versions of these definitions, that is, two Agda implementations of our library. One is in nominal style: it is based on atoms. The other is based on well-scoped de Bruijn indices. Both implementations can be found online [19]. Either of these implementations is well-typed in Agda: this guarantees that well-typed client programs of our library cannot go wrong. However, type soundness is not the whole story: we also wish to prove that well-typed client programs must respect name abstraction. For each of the two models, we establish this property via a logical relations argument.

4.1 The nominal model: implementation

We posit a countably infinite set of atoms A, equipped with a notion of equality. In our Agda implementation, atoms are natural numbers, and we make use of the ordering of natural numbers for fresh name generation; this is apparent in the semantics of strong links below. In the nominal model, a world is a set of atoms. In the Agda implementation, such a set is represented as a list without duplicates. A name of type Name α is an atom a together with a proof that a is a member of the world α. A weak link of type α ◅ β is an atom a together with a proof of the equation β ≡ α ∪ {a}. That is, the world β is the union of the world α and of the atom a. It is important to note that a may or may not be a member of α: a weak link permits shadowing. Like a weak link, a strong link of type α ◅→ β includes an atom a, as well as a proof of the equation β ≡ α ∪ {a}. Furthermore, it contains a proof of the fact that the natural number a is a strict upper bound for the set α. This condition reflects the fact that the name a is fresh for the world α. It implies, and is stronger than, a ∉ α. Technically, this extra strength is exploited in the definition of next◅→, where we need to guarantee that, if a is fresh for α, then the successors of a form an infinite stream of names that are fresh for α. A world inclusion witness of type α ⊆ β has no computational content: it is just a proof of the set-theoretic inclusion α ⊆ β. In the nominal model, the operations import⊆, nameOf◅, weaken, dropName, and _◅-commute-⊆_ have no computational content. _==Name_ is an atom equality test. export◅ also involves an atom equality test: it fails if its arguments are equal and returns its second argument otherwise. The function call next◅→ a b produces the maximum of the two integers 1 + a and b, so that, if b is fresh for some world α, then next◅→ a b is fresh for the world α ∪ {a}.
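Stripped of the membership and freshness proofs, the computational content of this model is tiny. The following Haskell sketch shows the underlying atom-level operations described above (all names and links erase to integers; the proofs, of course, are not represented).

  -- Atoms are natural numbers, represented here as Int.
  type Atom = Int

  -- export across the link carrying atom x: an equality test with refinement.
  exportAtom :: Atom -> Atom -> Maybe Atom
  exportAtom x y = if x == y then Nothing else Just y

  -- next on the underlying atoms: max (1 + a) b, so that if b is fresh for
  -- some world, the result is fresh for that world extended with a.
  nextStrong :: Atom -> Atom -> Atom
  nextStrong a b = max (1 + a) b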
4.2 The nominal model: logical relations

Although the implementation described above guarantees type soundness in a traditional sense, this is not sufficient to guarantee that names are handled in a sound way. Indeed, it would be possible to extend this implementation of the library with operations that are well-typed but intuitively do not make sense. For instance, the above interpretation of worlds validates the fact that, out of a weak link of type α ◅ β, one can extract a proof of the inclusion α ⊆ β. Yet, extending the system with an operation dubious : ∀ {α β} → α ◅ β → α ⊆ β, implemented as the identity, would be intuitively unsound. To see this, consider a free atom x and an atom abstraction (ƛ y t), whose respective types are Name α and Tm α. The bound atom y has type α ◅ β, where β is the inner world of the abstraction. In the presence of dubious, it would become possible to use import⊆ to cast the atom x to the type Name β, with undesirable consequences. First, one would then be able to compare the atoms x and y for equality, so that the identity of a bound name would become observable: name abstractions would be violated. Second, one would be able to build a new name abstraction whose bound atom is y and whose body contains a free occurrence of x: this would lead to name capture in the event that x and y happen to be the same atom. In the following, we remedy this problem by providing a richer interpretation of worlds in a nominal setting. We interpret a world no longer as a set of atoms, but as a partial bijection between atoms. On top of this, we carry out a standard logical relations construction. These logical relations validate all of the operations of §3, as implemented in §4.1, while rejecting dubious. The definitions and proofs in this section are informal, in the sense that they have not been machine-checked.
Definition 4.1 A relation between atoms is a subset of A A. We write a1 () a2 when the pair (a1 ; a2 ) is in the relation . A partial bijection between atoms is a relation such that a1 () a2 and b1 () b2 imply (a1 = b1 () a2 = b2 ).
The following notions are used in the interpretation of weak links and strong links, respectively. Definition 4.2 The shadowing extension of a partial bijection with an atom pair (b1 ; b2 ), written (b1 ; b2 ) , is the partial bijection such that a1 ((b1 ; b2 ) ) a2 holds if and only if either a1 b1 ^ a2 b2 or a1 6 b1 ^ a2 6 b2 ^ a1 () a2 .
Theorem 4.5 Every primitive operation itself at type .
The domain dom() of a relation is defined as the set of atoms fa1 j 9a2 ; a1 () a2 g. Its codomain codom() is defined analogously. If A is a set of atoms, we write b > A to indicate that the atom b is a strict upper bound for the set A.
Thus, let us assume that ; ; a1 ; a2 ; b1 ; b2 are as above. Now, a key remark is this: the hypotheses (a1 ; a2 ) and b1 ( ) b2 , together with the fact that is bijective, imply a1 = b1 () a2 = b2 . This remark allows us to distinguish only two cases: Case a1 = b1 ^ a2 = b2 . Then, the terms export( a1 b1 and export( a2 b2 both reduce to nothing. Because the value nothing is related to itself at type Maybe (Name ), the goal holds. Case a1 6= b1 ^ a2 6= b2 . Then, the terms export( a1 b1 and export( a2 b2 respectively reduce to just b1 and just b2 . We must prove that these two values are related at type Maybe (Name ). This boils down to proving that b1 and b2 are related by . It is easy to check that this goal does follow from the hypotheses b1 ( ) b2 , (a1 ; a2 ) , a1 6= b1 and a2 6= b2 .
The intuition behind the above proof case is: the success or failure of an export( operation does not depend on earlier choices of bound names. More precisely, if we run a single program twice, with different but related inputs, it is impossible for an export( operation to succeed in one run and fail in the other. One implication of Theorem 4.5 is that “choices of fresh names do not matter”. Our Agda implementation of the operation next( !,
Definition 4.4 At base types, the logical relation is defined by:
a1 ( ) a2 (a1 ; a2 ) (a1 ; a 2 )
if (a1 ; a2 ) and b1 ( ) b2 hold, then the terms (export( a1 b1 ) and (export( a2 b2 ) are related at type Maybe (Name ).
When the fresh extension exists, it coincides with the shadowing extension. We assume that the host language of our system supports the construction of logical relations in a standard manner. For instance, the host language may be System F or System F ! , where logical relations are well-understood [13]. At every type, two relations are defined: a relation between values and a relation between terms. We write v ( ) w when the values v and w are related at type ; we write t ( ) u when the terms t and u are related at type . We assume that the host language provides the definition of these relations at every standard type-theoretic connective (functions; universal and existential quantifiers; products, sums, unit). We also assume that equivalence of two terms at type is defined, independently of , in terms of the operational behavior of these terms and in terms of equivalence of two values at type . We now extend this construction by defining what it means for two values to be equivalent at our new primitive types: names, weak links, strong links, and world inclusion witnesses.
() () () ()
is related to
Proof. For the sake of brevity, we provide only one representative case, namely the case of export( . The goal is to show that export( is related to itself at type 8 f g ! ( ! Name ! Maybe (Name ). By definition of the logical relation at the standard connectives (8, !, Maybe ) and at our primitive types (Definition 4.4), the goal boils down to:
Definition 4.3 The fresh extension of a partial bijection with an atom pair (b1 ; b2 ), written (b1 ; b2 ), is defined only if b1 > dom() and b2 > codom(). When it is defined, (b1 ; b2 ) is the partial bijection f(b1 ; b2 )g [ .
a1 (Name ) a2 a1 ( ( ) a2 a1 ( ( ! ) a2 () ( ) ()
p of type
222
which we use to produce fresh names, is of course deterministic. However, one could in principle equip next( ! with a nondeterministic semantics, whereby next( ! a b produces an arbitrarily chosen integer that is greater than or equal to the maximum of 1 + a and b. Under this semantics, Theorem 4.5 still holds: related programs produce related results. In other words, non-determinism in the choice of fresh names is not observable by well-typed programs. One could in fact abandon next( ! and introduce an expression fresh, of type 8 f g ! 9 ! ( ! , which reduces to an arbitrary atom that dominates the world . Under this semantics, again, Theorem 4.5 holds. In a type-erasure implementation of our design, where worlds do not exist at runtime, fresh could be efficiently implemented using global state (that is, a gensym). Another implication is that “name abstractions cannot be violated”. It is perhaps not clear, at first, what this means, especially in light of the fact that our name abstractions are not primitive: they are built out of more elementary constructs. One way of formalizing this statement is to prove that our system permits an adequate encoding of nominal terms [14]: this guarantees that our name abstractions behave as intended. We do so below (§4.3). This adequacy result shows that our system is able to encode a standard notion of -equivalence. However, one should keep in mind that our system is more expressive than Pitts’ nominal terms and nominal types: it offers many types of data structures with names and binding that do not lie in the image of the encoding. Logical relations tell us what “-equivalence” means at these types.
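To make shadowing and fresh extensions concrete, consider a small example of our own. Let α be the partial bijection {(1, 2)}. The shadowing extension α · (1, 3) is {(1, 3)}: the old pair (1, 2) is discarded, because its left component collides with the new pair. Consequently α ⊆ α · (1, 3) fails, which is exactly the implication that dubious would need. By contrast, the fresh extension α + (3, 3) is defined, since 3 > dom(α) and 3 > codom(α), and it equals {(1, 2), (3, 3)}; hence α ⊆ α + (3, 3) does hold, which is why dropName, on strong links, is sound.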
4.3 Adequacy of an encoding of nominal terms

Definition 4.6 The nominal types and nominal terms are:

  τ ::= atom | τ × τ | τ + τ | ⟨atom⟩τ
  t ::= a | (t, t) | inji t | ⟨a⟩t

For the sake of simplicity, we do not deal with recursive types, but one could extend our argument to do so. Note that the atom a is not considered bound in the nominal term ⟨a⟩t.

Definition 4.7 The free atoms of a nominal term are defined by:

  fa(a) = {a}
  fa((t1, t2)) = fa(t1) ∪ fa(t2)
  fa(inji t) = fa(t)
  fa(⟨a⟩t) = fa(t) ∖ {a}

Definition 4.8 α-equivalence of two nominal terms at a nominal type is defined as follows:

  a ≈ a : atom
  if t1 ≈ t2 : τ and t1' ≈ t2' : τ', then (t1, t1') ≈ (t2, t2') : τ × τ'
  if t1 ≈ t2 : τi, then inji t1 ≈ inji t2 : τ1 + τ2
  if (a1 c) t1 ≈ (a2 c) t2 : τ, where c # ⟨a1⟩t1, ⟨a2⟩t2, then ⟨a1⟩t1 ≈ ⟨a2⟩t2 : ⟨atom⟩τ

We use a # t as a short-hand for a ∉ fa(t). We write (a b) t for the result of swapping all occurrences of the atoms a and b through t.

Definition 4.9 The encodings of nominal types into our types, and of nominal terms into our extension of System Fω, are as follows:

  ⟦atom⟧ = Name
  ⟦τ1 × τ2⟧ α = ⟦τ1⟧ α × ⟦τ2⟧ α
  ⟦τ1 + τ2⟧ α = ⟦τ1⟧ α + ⟦τ2⟧ α
  ⟦⟨atom⟩τ⟧ α = ∃β. (α ↪ β) × ⟦τ⟧ β

  ⌈a⌉ = a
  ⌈(t1, t2)⌉ = (⌈t1⌉, ⌈t2⌉)
  ⌈inji t⌉ = inji ⌈t⌉
  ⌈⟨a⟩t⌉ = pack(a, ⌈t⌉)

In order to prove that this encoding is adequate, we wish to prove that two nominal terms are α-equivalent if and only if their encodings are in the logical relation. α-equivalence of nominal terms is defined in terms of total atom permutations, while our worlds are partial atom bijections. The following technical definition helps bridge the gap.

Definition 4.10 Let A1 and A2 be sets of atoms. Let π1 and π2 be permutations (that is, total, bijective relations over atoms). Let α be a world, that is, a partial, bijective relation over atoms. We say that the permutations π1 and π2 correspond to the world α, with respect to the domains A1 and A2, if and only if

  (π1 ; π2⁻¹) ∩ (A1 × A2) = α ∩ (A1 × A2)

The following technical lemma shows how our notion of correspondence crosses a name abstraction. The proof of the theorem that follows is then straightforward.

Lemma 4.11 Let π1 and π2 correspond to α with respect to A1 ∖ {a1} and A2 ∖ {a2}. Let c be fresh for π1(A1 ∖ {a1}) and π2(A2 ∖ {a2}). Let β stand for the world α · (a1, a2). Then, ((π1 a1) c) ∘ π1 and ((π2 a2) c) ∘ π2 correspond to β with respect to A1 and A2.

Theorem 4.12 If π1 and π2 correspond to α with respect to fa(t1) and fa(t2), then the following two propositions are equivalent:

  π1 t1 ≈ π2 t2 : τ (α-equivalence of nominal terms)
  ⌈t1⌉ ⟨⟦τ⟧ α⟩ ⌈t2⌉ (logical relation between terms)

The proofs appear in the extended version of this paper [20]. As a corollary, if a nominal term t has nominal type τ, and if α is a set of atoms that includes fa(t), then ⌈t⌉ has type ⟦τ⟧ α in our unary interpretation (§4.1). Furthermore, we have:

Corollary 4.13 Let t1 and t2 be nominal terms of nominal type τ such that fa(t1) = fa(t2) = ∅. Then, t1 and t2 are α-equivalent if and only if their encodings are related with respect to the empty world. That is, t1 ≈ t2 : τ holds if and only if ⌈t1⌉ ⟨⟦τ⟧ ø⟩ ⌈t2⌉ holds.

Corollary 4.14 Let t1 and t2 be nominal terms of nominal type τ such that fa(t1) = fa(t2) = ∅. If t1 ≈ t2 : τ holds, then ⌈t1⌉ and ⌈t2⌉ are observationally equivalent at type ⟦τ⟧ ø.

Corollary 4.14 shows that our programming language respects object-level α-equivalence. In other words, our name abstractions behave as intended: the identity of the bound name cannot be observed. The reverse implication – if t1 and t2 are not α-equivalent, then their encodings can be distinguished by some well-typed observer – can be established by implementing an α-equivalence test within the programming language (see §5.1) and by proving that it is correct. We have not yet carried out this proof.
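As a quick sanity check of Corollary 4.13 — the instance is ours, not the paper's — take the closed, α-equivalent nominal terms ⟨a⟩a and ⟨b⟩b, of type ⟨atom⟩atom. Their encodings are pack(a, a) and pack(b, b). At type ⟦⟨atom⟩atom⟧ ø = ∃β. (ø ↪ β) × Name β, the two packages are related by interpreting the existentially quantified inner world as ø · (a, b): the two links a and b are then related at ø ↪ ø · (a, b) by Definition 4.4, and the two bodies a and b are related at Name (ø · (a, b)) because (a, b) ∈ ø · (a, b). So the encodings are indeed related at the empty world, as the corollary predicts.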
4.4 The de Bruijn model: implementation

In the de Bruijn model, a world is just a natural number n, while a name is just a natural number in the interval [0, n). That is, World is ℕ, and Name is Fin, where Fin n is the type of the natural numbers that are less than n. In this model, weak and strong links are the same. Indeed, contrary to the nominal model, there is no shadowing: it is impossible for a new binding to hide a previous one. A link of type α ↪ β or α ↪! β has no computational content: it is just a proof of the equation β ≡ α + 1. On the other hand, a world inclusion witness of type α ⊆ β does have computational content: it is a natural number k such that the equation β ≡ α + k holds. The integer k represents the amount by which one must shift when importing a name from α into β. Most operations have straightforward semantics: _≟Name_ is an integer equality test (in fact, it is an equality test at type Fin n); export↪ is the predecessor function, which fails if applied to the index 0; import⊆ and ⊆-trans are integer addition; nameOf↪ is the constant function zero; and dropName is the constant function one. The functions weaken, next↪! and _↪-commute-_ have no computational content. In the de Bruijn model, importing or exporting a term has a cost. Importing requires copying the term to increment its free names, while exporting requires copying and decrementing. (In the nominal model, in contrast, importing is the identity, while exporting requires an occurs check and is the identity when successful. The price to pay for this is that explicit renamings can be necessary.) Fortunately, thanks to the expressiveness of the type system, the programmer is in control of the import/export machinery, and has access to many of the classic tricks for dealing efficiently with de Bruijn indices. It is possible, for instance, to delay certain imports, and to cheaply combine multiple imports into one, since ⊆-trans is just integer addition.

4.5 The de Bruijn model: logical relations

Like the nominal model, the de Bruijn model is implemented [19] in Agda. This guarantees that our de Bruijn indices range over the expected intervals, and, more generally, that a well-typed client program cannot go wrong. However, as in the nominal model, this is not sufficient to guarantee that our implementation is correct in an intuitive sense. Again, there are operations that are well-typed in this unary interpretation of the de Bruijn model, yet do not make sense. For instance, imagine import⊆ is implemented as λ x → x, instead of integer addition. This amounts to forgetting to shift, and is clearly a mistake. Yet, this version of import⊆ is well-typed, because if x has type Fin m then, for every n greater than or equal to m, x also has type Fin n. (This argument is slightly over-simplified. In reality, a coercion is needed: see Data.Fin.inject+ in Agda's standard library.) As another instance, imagine export↪ is implemented as the function that fails when applied to the index 0 and returns 0 otherwise. Again, this is meaningless: this function is not even injective! Yet, this version of export↪ is well-typed, because if x has type Fin m then 0 has type Fin m. In light of these examples, we claim that, perhaps contrary to popular belief, well-scopedness of de Bruijn indices is not good enough: it does not guarantee that indices are correctly adjusted where needed. Again, our solution lies in the construction of logical relations that validate our implementation, while rejecting the incorrect implementations mentioned above. At this point, the reader may ask: do logical relations have anything non-trivial to say about the de Bruijn model? In the nominal model, logical relations were used to compare two program runs and to show that their outcome is insensitive to choices in the data representation (in particular, choices of bound names) and in the semantics (choices of fresh names). In the de Bruijn model, however, both bound names and fresh names are chosen in a canonical manner, so one might think that there is no interesting comparison to be made. In fact, there is. De Bruijn's representation does carry an arbitrary component in its choice of free names. The logical relations argument tells us that a well-typed program is insensitive to choices of free names: if one applies some permutation to the free names in its input, one observes the same permutation in the free names of its output. This property is stronger than well-scopedness of de Bruijn indices; in particular, it is not satisfied by the incorrect implementations mentioned above. As in the nominal case (§4.2), we view a world as a partial bijection between names. This time, names are de Bruijn indices, that is, natural numbers. In the de Bruijn model, there is no shadowing, so there is no analogue of shadowing extension (Definition 4.2). The analogue of fresh extension (Definition 4.3) is the following:

Definition 4.15 The shift of a partial bijection α, written ↑α, is a partial bijection, characterized as follows:

  ↑α ≝ {(0, 0)} ∪ {(i1 + 1, i2 + 1) | (i1, i2) ∈ α}

We write ↑k α for the result of shifting the world α k times.

Definition 4.16 At base types, the logical relation is defined by:

  i1 ⟨Name α⟩ i2  ⟺  i1 ⟨α⟩ i2
  () ⟨α ↪ β⟩ ()   ⟺  β = ↑α
  () ⟨α ↪! β⟩ ()  ⟺  β = ↑α
  k ⟨α ⊆ β⟩ k     ⟺  β = ↑k α

Theorem 4.17 Every primitive operation p of type τ is related to itself at type τ.
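To illustrate, here is a minimal, self-contained sketch of the two de Bruijn operations discussed above, written directly against Agda's standard library. The names exportDB and importDB are ours; the released code [19] packages these operations differently.

  open import Data.Nat   using (ℕ; zero; suc; _+_)
  open import Data.Fin   using (Fin; zero; suc)
  open import Data.Maybe using (Maybe; nothing; just)

  -- export: the predecessor function, failing on the bound index 0
  exportDB : ∀ {n} → Fin (suc n) → Maybe (Fin n)
  exportDB zero    = nothing
  exportDB (suc i) = just i

  -- import along an inclusion witness k: shift every index up by k
  importDB : ∀ {m} (k : ℕ) → Fin m → Fin (k + m)
  importDB zero    i = i
  importDB (suc k) i = suc (importDB k i)

The incorrect implementations mentioned above (the unshifted import, or an export that returns 0 instead of failing) would typecheck against similar signatures, yet they are exactly what Definition 4.16 rules out.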
5. Programmable operations

Several of our primitive operations, such as _≟Name_, export↪, and import⊆, operate upon names only. This is a good thing, insofar as it simplifies the meta-theory of our system. However, it is desirable to lift these operations to user-defined data types, such as Tm, so that user-defined terms can be compared for equality (up to α-equivalence) and exported or imported from one world into another. Fortunately, this can be done within the system: for a large class of algebraic data types, these generalized forms of the primitive operations can be programmed up. We now explain how to do so in the particular case of Tm. Where details are omitted, the reader is referred to the code [19].
5.1 Deciding α-equivalence

We sketch how to implement a function that tests whether two terms are α-equivalent. As before (§3), we use environments represented as chains of weak links. We modify the function export?↪ (§3) to obtain a new function, index?↪, which accepts a name and classifies it as either free or bound in the environment. In the former case, like export?↪, index?↪ produces a copy of the name at a more precise type. In the latter case, it converts the name to a de Bruijn index.

  index?↪ : ∀ {α β} → (α ?↪ β) → Name β → Name α ⊎ ℕ
Thus equipped, it is straightforward to write a recursive comparison function of type ∀ {α β γ} → (α ?↪ β) → (α ?↪ γ) → Tm β → Tm γ → Bool. At abstractions, the two environments are extended with the two bound names at hand. At variables, index?↪ is used to classify each of the two names at hand, and the results produced by index?↪ are compared for equality—which is possible because both have type Name α ⊎ ℕ. Once applied to two empty environments, the comparison function has type ∀ {α} → Tm α → Tm α → Bool.
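The comparison function itself might look as follows. This is our own sketch, written within the ambient definitions of the paper's library: _,,_ is an assumed operation that extends a chain of weak links with one more link, and _≟Name_ is assumed to return a Bool.

  open import Data.Bool using (Bool; true; false; _∧_)
  open import Data.Nat  using (ℕ; zero; suc)
  open import Data.Sum  using (_⊎_; inj₁; inj₂)

  eqℕ : ℕ → ℕ → Bool
  eqℕ zero    zero    = true
  eqℕ (suc i) (suc j) = eqℕ i j
  eqℕ _       _       = false

  -- compare the classifications produced by index?↪
  eqName⊎ℕ : ∀ {α} → Name α ⊎ ℕ → Name α ⊎ ℕ → Bool
  eqName⊎ℕ (inj₁ x) (inj₁ y) = x ≟Name y
  eqName⊎ℕ (inj₂ i) (inj₂ j) = eqℕ i j
  eqName⊎ℕ _        _        = false

  cmpTm : ∀ {α β γ} → (α ?↪ β) → (α ?↪ γ) → Tm β → Tm γ → Bool
  cmpTm Γ Δ (V x)       (V y)       = eqName⊎ℕ (index?↪ Γ x) (index?↪ Δ y)
  cmpTm Γ Δ (t · u)     (v · w)     = cmpTm Γ Δ t v ∧ cmpTm Γ Δ u w
  cmpTm Γ Δ (ƛ x t)     (ƛ y u)     = cmpTm (Γ ,, x) (Δ ,, y) t u
  cmpTm Γ Δ (Let x t u) (Let y v w) = cmpTm Γ Δ t v ∧ cmpTm (Γ ,, x) (Δ ,, y) u w
  cmpTm _ _ _ _                     = false

Applying cmpTm to two empty chains yields the advertised type ∀ {α} → Tm α → Tm α → Bool.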
5.2 A generic traversal function
We are interested in several operations that move a term from one world to another. These operations are implemented by traversing the term and building a new term. Much of the traversal code can be shared between these operations, and it is beneficial to do so, as this sheds a more abstract light on the traversal. In general, a traversal function has a type of the following form:

  Traverse : (_⇝_ : Rel World) (M : Set → Set) (F : World → Set) → Set
  Traverse _⇝_ M F = ∀ {α β} → α ⇝ β → F α → M (F β)

The parameter F describes the data structure that is being traversed and copied: for instance, F could be Tm. The parameter M is typically either the identity or the Maybe monad. The former is used when implementing an operation that cannot fail; the latter is used when implementing an operation that can fail. More generally, M can be an arbitrary applicative functor [11]. The parameter _⇝_ indicates what kind of connection is expected between the world α, which describes the input data structure, and the world β, which the output data structure inhabits. For instance, and as a first approximation, when implementing an export operation along a weak link, this parameter could be the type constructor for weak links, reversed: flip _↪_. In reality, the implementation of an export operation needs to maintain an environment that keeps track of the binders that have been entered. In general, α ⇝ β is the type of this environment, so it is more complex than just a single link. Still, it helps to think of it as the type of an abstract link between the input and output worlds. In the implementation of a traversal function, upon entering a name abstraction, we need this abstract link, of type α ⇝ β, to commute with the weak link that represents the binding occurrence of the abstraction. The following two definitions respectively describe a general commutative diagram and the particular diagram that is needed here:

  ComposeCommute : (_⇝₁_ _⇝₂_ : Rel World) → Set
  ComposeCommute _⇝₁_ _⇝₂_ =
    ∀ {α β γ} → α ⇝₁ β → β ⇝₂ γ → ∃ λ δ → (α ⇝₂ δ) × (δ ⇝₁ γ)

  Comm : (_⇝_ : Rel World) → Set
  Comm _⇝_ = ComposeCommute (flip _↪_) _⇝_

We now present a generic traversal function for the type Tm. In addition to the above parameters, it requires a function onName, which describes what to do at non-binding occurrences of names. The traversal is straightforward. The parameter Δ is the abstract link between the worlds α and β. At variables, onName is used. At abstractions, the commutative diagram comm is used. This produces a new abstract link Δ' between α' and β', where α' is the inner world of the abstraction that is being deconstructed, and β' is the inner world of the abstraction that is being constructed. This new link is used in the recursive call. The diagram also produces a new weak link x', of type β ↪ β', which is used in the construction of the new abstraction. The applicative functor machinery is used everywhere, so as to perform effect propagation behind the scenes.

  module TraverseTm {_⇝_ : Rel World} {M : Set → Set}
                    (appli : Cat.RawApplicative M)
                    (comm : Comm _⇝_)
                    (onName : Traverse _⇝_ M Name) where
    open Cat.RawApplicative appli

    traverseTm : Traverse _⇝_ M Tm
    traverseTm Δ (V x)       = V ⟨$⟩ onName Δ x
    traverseTm Δ (t · u)     = _·_ ⟨$⟩ traverseTm Δ t ⊛ traverseTm Δ u
    traverseTm Δ (ƛ x t)     with comm x Δ
    ... | (_ , Δ' , x')      = ƛ x' ⟨$⟩ traverseTm Δ' t
    traverseTm Δ (Let x t u) with comm x Δ
    ... | (_ , Δ' , x')      = Let x' ⟨$⟩ traverseTm Δ t ⊛ traverseTm Δ' u

Thanks to the generic programming facilities of a language with dependent types, it should be possible to implement this generic traversal not just for Tm, but for any algebraic data type that is composed of unit, pairs, sums, names, and name abstractions. We have not yet done so.

5.3 Applications of the generic traversal

Generalized import Whereas the primitive operation import⊆ moves a single name from one world to another, a generalized import function moves a data structure from one world to another. For instance, a generalized import function for Tm has type ∀ {α β} → α ⊆ β → Tm α → Tm β. In other words, this function witnesses the fact that Tm is covariant in its index. To implement such a function, we instantiate the generic traversal function traverseTm. Because importing never fails, an appropriate applicative functor is the identity. The type of abstract links _⇝_ is instantiated with _⊆_, the type of world inclusion witnesses. The parameters comm and onName are instantiated with the primitive operations _↪-commute-_ and import⊆.

Generalized export A generalized export function for Tm has type ∀ {α β} → (α ↪ β) → Tm β → Maybe (Tm α). It fails if the name of its first argument, a weak link, occurs free in its second argument, a term. Otherwise, it returns a copy of its second argument at the outer world α. We found generalized export more difficult to implement than generalized import. The reason is, the commutative diagram that we would like to use, which involves two weak links and has type Comm (flip _↪_), appears to be unsound: it is not validated by our logical relations. We do have a work-around, but it involves freshening, that is, replacing the bound names of the input term with fresh names. Again, we instantiate the generic traversal function traverseTm. Because exporting can fail, an appropriate applicative functor is Maybe. The type of abstract links is instantiated with λ α β → Fresh β × (Name α → Maybe (Name β)). This means that, during the traversal, we maintain: (i) a fresh name generator for the output world β; and (ii) an environment, that is, a partial mapping of names in the input world α to names in the output world β. Upon entering an abstraction that binds some name x, this environment is extended by mapping x to a fresh name. (This is enough to implement the required commutative diagram, of type Comm _⇝_, for this particular definition of _⇝_.) Upon reaching a variable y, the environment is consulted. During this lookup, one of two situations arises. If y is bound in the environment, then the corresponding fresh name is returned. Otherwise, y is a free name of the term that we are attempting to export, so y is submitted to the primitive export↪ operation (which may fail), whose result is returned.

Checking whether a term is closed A closed term inhabits the empty world, and inhabits every world. That is, both of the types Tm ø and ∀ {α} → Tm α accurately describe closed terms. These types are interconvertible. To convert Tm ø into ∀ {α} → Tm α, one uses the subtyping axiom ø-bottom-⊆ as well as the fact that Tm is covariant in its index. To convert ∀ {α} → Tm α into Tm ø, one instantiates α with ø. Terms that admit the above types are particularly easy to use, because, thanks to polymorphism, they can be freely moved to any world. Of course, the flip side of the coin is, it is somewhat difficult to create such terms. To help in this task, a useful tool is a function closeTm that checks at runtime whether a term is closed and, when it succeeds, returns a term that is statically known to be closed. Such a function should have type ∀ {α} → Tm α → Maybe (Tm ø), or equivalently, ∀ {α β} → Tm α → Maybe (Tm β). At the base type Name, such a function is easy to implement. The function const nothing, which always fails, fits the bill: it has type ∀ {α β} → Name α → Maybe (Name β). Whereas an export link fails at one particular name and lets every other name through, this function can be viewed as a link that always fails. With this in mind, implementing closeTm is simple. The construction is identical to that of the generalized export function above, except in the setup phase, where an export link is replaced with a link that always fails.
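Looking back at the instantiations described at the start of this subsection, generalized import might be obtained from the traversal roughly as follows. This is our own sketch: the names identityApplicative and comm-⊆, and the packaging of _↪-commute-_ into a Comm _⊆_ value, are assumptions on our part.

  -- commute a weak link with an inclusion witness, using the
  -- primitive _↪-commute-_ (assumed to produce the existential
  -- package expected by Comm)
  comm-⊆ : Comm _⊆_
  comm-⊆ x w = x ↪-commute- w

  module ImportTm = TraverseTm identityApplicative comm-⊆ import⊆

  importTm : ∀ {α β} → α ⊆ β → Tm α → Tm β
  importTm = ImportTm.traverseTm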
6. Example: normalization by evaluation

As an advanced example, we show how to express a normalization by evaluation algorithm in our system. This algorithm has been previously used as a benchmark by several researchers [10, 14, 22]. The challenge lies in the way the algorithm mixes computational functions, name abstractions, and fresh name generation. The object language of interest is the pure λ-calculus. The algorithm exploits two different representations of object-level terms, which are respectively known as syntactic and semantic representations. Because these representations differ only in their treatment of name abstractions (we do not ensure normal forms, for conciseness reasons), they can be given a common definition, which is parameterized over the representation of abstractions:

  module M (Abs : (World → Set) → World → Set) where
    data T α : Set where
      V   : Name α → T α
      ƛ   : Abs T α → T α
      _·_ : T α → T α → T α

The parameter Abs has kind (World → Set) → (World → Set): it is an indexed-type transformer. In order to obtain the syntactic representation, we instantiate Abs with the abstractions that we have used throughout this paper: an abstraction is an existential package of a weak link and of a term that inhabits the inner world. This yields the type Term of syntactic terms.

  SynAbs : (World → Set) → World → Set
  SynAbs F α = ∃ λ β → (α ↪ β) × F β

  open M SynAbs renaming (T to Term)

In order to obtain the semantic representation, we instantiate Abs with a different notion of abstraction, in the style of higher-order abstract syntax: an abstraction is a computational function, which substitutes a term for the bound name of the abstraction. This yields the type Sem of semantic terms. Sem is not an inductive data type; fortunately, with the --no-positivity-check flag, Agda accepts this type definition, at the cost of breaking strong normalization.

  SemAbs : (World → Set) → World → Set
  SemAbs F α = ∀ {β} → α ⊆ β → F β → F β

  open M SemAbs renaming (T to Sem)

It is important to note that our semantic name abstractions involve bounded polymorphism in a world: we define SemAbs F α as ∀ {β} → α ⊆ β → F β → F β, as opposed to the more naïve F α → F α. This provides a more accurate and more flexible description of the behavior of substitution. Furthermore, this has the important effect of making SemAbs (and Sem) covariant with respect to the parameter α, which would not be the case with the naïve definition. In other words, it is possible to define a generalized import operation for semantic terms:

  impSem : ∀ {α β} → α ⊆ β → Sem α → Sem β
  impSem w (V a)   = V (import⊆ w a)
  impSem w (ƛ f)   = ƛ (λ w' v → f (⊆-trans w w') v)
  impSem w (t · u) = impSem w t · impSem w u

At a semantic abstraction, no recursive call is performed, because the body of the abstraction is opaque: it is a computational function f. Instead, we exploit the transitivity of world inclusion and build a new semantic abstraction that inhabits the desired world. The normalization by evaluation algorithm is parameterized with a representation of environments. The type of environments takes the form Env A α β, where A is the type of the data carried in the environment and α and β are the outer and inner worlds of the environment. Environments must offer the following constants and operations: empty (emptyEnv); lookup (lookupEnv); extension (_,_↦_); map (mapEnv); covariance of Env with respect to its parameter α (importEnv). These requirements are expressed by the type ImportableEnvPack, whose definition is omitted.

  open ImportableEnvPack envPack

The algorithm uses an environment whose type takes the form Env (Sem α) α β. To a name, such an environment associates a semantic term that lies outside the scope of the environment. This type is, again, covariant in α, as witnessed by the following import function:

  impEnv : ∀ {α β γ} → α ⊆ γ → Env (Sem α) α β → Env (Sem γ) γ β
  impEnv w = importEnv w ∘ mapEnv (impSem w)

The first part of the algorithm evaluates a syntactic term within an environment to produce a semantic term. When evaluating a ƛ-abstraction, we build a semantic abstraction, which encapsulates a recursive call to eval. The bounded polymorphism required by the definition of semantic abstractions forces us to import the environment via impEnv.

  app : ∀ {α} → Sem α → Sem α → Sem α
  app (ƛ f) v = f ⊆-refl v
  app n     v = n · v

  eval : ∀ {α β} → Env (Sem α) α β → Term β → Sem α
  eval Γ (V x)   = [ V , id ] (lookupEnv Γ x)
  eval Γ (t · u) = app (eval Γ t) (eval Γ u)
  eval Γ (ƛ (_ , a , t)) = ƛ (λ w v → eval (impEnv w Γ , a ↦ v) t)

The second part of the algorithm reifies a semantic term back into a term. When reifying a semantic abstraction, we build a syntactic abstraction. This requires generating a fresh name, and leads us to parameterizing reify with a fresh name generator.

  reify : ∀ {α} → Fresh α → Sem α → Term α
  reify g (V a)   = V a
  reify g (n · v) = reify g n · reify g v
  reify g (ƛ f)   = ƛ (_ , weakOf g , t)
    where open FreshPack
          t = reify (nextOf g) (f (⊆Of g) (V (nameOf g)))

Evaluation under an empty environment, followed with reification, yields a normalization algorithm. This algorithm works with open terms: its argument, as well as its result, are terms in an arbitrary world α.

  nf : ∀ {α} → Fresh α → Term α → Term α
  nf g = reify g ∘ eval emptyEnv
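As a small usage example — ours, with assumed ambient bindings — normalizing an application of the identity abstraction recovers the argument:

  module Test {α β : World} (g : Fresh α) (x : α ↪ β) (y : Name α) where

    -- the identity abstraction ƛx.x, as a syntactic term
    idTm : Term α
    idTm = ƛ (_ , x , V (nameOf↪ x))

    -- nf evaluates (ƛx.x) · y to a semantic term and reifies it back;
    -- the result is V y, any residual bound names being chosen via g
    test : Term α
    test = nf g (idTm · V y)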
7. Related work

The difficulty of programming with, or reasoning about, names and binders has been known for a long time. It has recently received a lot of attention, due in part to the POPLmark challenge [4]. Despite this attention, the problem is still largely unsolved: according to Guillemette and Monnier, for instance, "none of the existing representations of bindings is suitable" [9]. In the following, we review several programming language designs that are intended to facilitate the manipulation of names and binders. For lack of space, this review cannot be exhaustive: we focus on relatively recent related work.

Distinctions One traditionally distinguishes several broad approaches to the problem, which employ seemingly different tools, namely: atoms and atom abstractions; well-scoped de Bruijn indices; higher-order abstract syntax. We believe that this distinction can be artificial. In fact, our work presents strong connections with all three schools of thought. Perhaps more important are the following questions:

What properties are enforced by the system? FreshML [22] offers an adequate encoding of nominal terms, but does not prevent a newly generated atom from escaping its scope. Systems based on well-scoped de Bruijn indices enforce the invariant that every name refers to some binding site; however, as we have pointed out (§4.5), this alone does not imply that indices are correctly adjusted where needed. In systems based on higher-order abstract syntax, and in Pure FreshML [18], name manipulation is hygienic by design: this is built in the syntax and semantics of the programming language. In the present paper, hygiene is not built in. We have used logical relations to find out (and prove) which properties can be expected of a well-typed program.

Does the system offer "substitution for free", and if so, at which types? Does it have separate data and computation layers? In FreshML, Pure FreshML [18] and the present work, the answer to both questions is negative. In several systems in the tradition of higher-order abstract syntax, including Elphin [21], Delphin [16, 17] and Beluga [12], the answer to both questions is positive. In Licata and Harper's work [10], substitution is available for free at many types, even though data and computation are not separated.

Do names inhabit a fixed type, or do they inhabit every type? Usually, systems that provide some form of substitution for free [10, 12, 16, 17, 21] allow names to inhabit every type, while so-called nominal systems [18, 22], as well as the present work, offer a separate type of names.

How does the system keep track of the context or world in which a name makes sense? In Pure FreshML, there is effectively just one world, within which every name makes sense; the proof obligations guarantee that no confusion can arise. In Elphin, Delphin, or in Licata and Harper's system, the meaning of types is relative to a "current context", and a number of modalities are provided to discard the current context, extend it with one new name, etc. In Beluga, contexts are explicit: a data-layer type, once annotated with a context, becomes a computation-layer type. In the present paper, worlds are explicit, and are built into algebraic data type definitions by the programmer.

Which very high-level operations does the semantics of the programming language involve? The semantics of FreshML and Pure FreshML involve renaming. The semantics of Elphin, Delphin, and Beluga involve higher-order matching. In the present work, as well as in Licata and Harper's work, no costly operations are built into the semantics; high-level operations, such as our import and export operations, are obtained (at many, but not all, types) via generic programming.

FreshML and Pure FreshML FreshML [22] extends ML with primitive types for names (known as atoms) and name abstractions. The semantics of FreshML dictates that pattern matching against a name abstraction silently replaces the bound atom with a fresh atom. This makes it easy to write programs in a style that matches informal mathematical practice. FreshML satisfies a correctness property analogous to our Corollary 4.14 – name abstractions cannot be broken. However, FreshML is unsafe: it is possible for a name to escape its scope. Put another way, FreshML is impure: name generation is an observable side effect. Pure FreshML [18] imposes additional proof obligations, which ensure that freshly created atoms do not escape their scope, and correspond to Pitts' freshness condition for binders [14]. Because these proof obligations are expressed in a specialized logic, they can be discharged automatically. Because it is safe, Pure FreshML can be implemented either using atoms (like the original FreshML) or using de Bruijn indices. This is an implementation choice, which the programmer need not know about. The present paper can be viewed as a different way of constructing a safe variant of FreshML. Whereas Pure FreshML supplements ordinary ML types with logical assertions, we explore the use of richer types and do not rely, for the time being, on a separate logic. In Pure FreshML, name abstraction is a primitive notion, and the fact that deconstructing an abstraction automatically freshens the bound atom is used to guarantee that all terms effectively live in a single world. Here, in contrast, name abstraction is explained in terms of more basic notions; it is possible to deconstruct a name abstraction without substituting a fresh name for the bound name. This leads to a finer-grained understanding of binding, and possibly to greater runtime efficiency: because our nominal compilation scheme permits shadowing, there is, in some cases, no need to pay a price to enforce the property that all names are distinct.

Nominal System T [15] follows the tradition of FreshML, and so guarantees that name abstractions are not violated. However, compared to Pure FreshML and our system, it does not statically enforce that names do not escape their scope. Instead, a new construct is introduced to represent such escaped names dynamically. This is akin to NaN in floating-point computation: NaNs are not numbers, but they still have the float type, and they are the result of mathematically ill-founded operations, such as dividing by zero. So, in some sense, names do escape their scope, but they are dynamically turned into harmless values. Whether such programs should have a semantics or should simply be statically rejected is a matter of design.

Well-scoped de Bruijn indices It is by now well-known that type-theoretic machinery (such as nested algebraic data types, generalized algebraic data types, or dependent types) can be used to ensure that every de Bruijn index remains within range [1, 5]. In fact, dependent types can be used to encode not only the lexical scoping discipline, but also the type discipline of an object language: see, for instance, Chen and Xi [6] and Chlipala [7]. However, de Bruijn indices are, by nature, very low-level: it is desirable to build more abstract representations on top of them. For instance, Donnelly and Xi [8] define an algebraic data type of terms that is based on well-scoped de Bruijn indices, but is indexed with a higher-order abstract syntax representation of terms. Licata and Harper's system [10] is implemented on top of well-scoped de Bruijn indices. The system presented in this paper can be compiled down to de Bruijn indices, and could thus be viewed as an abstraction layer on top of well-scoped de Bruijn indices. However, as we have pointed out (§4.5), our system offers a stronger guarantee than raw well-scoped de Bruijn indices do. It does not just guarantee that every index is within range: it also guarantees that a well-typed program component is insensitive to permutations of the free indices in its input. In a scenario where programs are type-checked but not proved correct, this extra guarantee could be welcome.

Licata and Harper's system [10] differs from ours in several ways. Perhaps most notably, Licata and Harper aim to provide substitution for free when possible, whereas we don't; and they expose the use of well-scoped de Bruijn indices to the programmer, who must sometimes reason in terms of zero and successor, whereas we do not reveal the nature of names, thus permitting multiple compilation schemes. This said, there are numerous similarities between the two systems. Both keep track of the context, or world, within which each name makes sense. Both offer flexible ways of parameterizing or quantifying types over worlds. Both offer ways of moving data from one world to another: Licata and Harper's weakening and strengthening respectively correspond to our import and export operations. Both systems support first-class computational functions. Not all functions can be imported or exported, but some can: for instance, in both systems, the example of normalization by evaluation [20], which requires importing a function into a larger world, is made type-correct by planning ahead and making this function polymorphic with respect to an arbitrary world extension.

Elphin, Delphin, Beluga [12, 16, 17, 21] are closely related to one another in several ways. They separate the data and computation layers, which implies that they do not support first-class functions. At the data level, they provide substitution and higher-order matching as primitive operations. This ambitious approach can eliminate some boilerplate code, at the cost of a complex meta-theory. By contrast, the meta-theory of our proposal is extremely simple, as it only extends an existing logical relations argument with a few new primitive types and operations.

Moving across representations It is arguably desirable to be able to offer several choices of representation within a single system, and to be able to migrate from one representation to another. For instance, our implementation of normalization by evaluation (§6) illustrates how to move back and forth between "syntactic" name abstractions and "semantic" name abstractions in the style of higher-order abstract syntax. Atkey and co-authors [2, 3] investigate how to move back and forth between higher-order abstract syntax and de Bruijn indices. The translation out of higher-order abstract syntax produces well-scoped de Bruijn indices, but the proof of this fact is meta-theoretic. Atkey uses Kripke logical relations to argue that the current world at the time of application of a certain function must be larger than the world at the time of construction of this function. This seems somewhat related to our use of bounded polymorphism in the definition of semantic name abstractions (§6). An exact connection remains to be investigated.

8. Future work

We have presented an abstract programming model, together with two concrete implementations, in nominal style and de Bruijn style. We have argued separately about the correctness of each implementation. In particular, we have proved that the nominal implementation allows an adequate encoding of nominal terms. Ideally, however, the nominal term encoding should be proved adequate directly with respect to the abstract model, not with respect to its implementations. We do not yet know how to do this, because our abstract model does not have a semantics. A related question is: how can one carry out specifications and proofs of programs with respect to our abstract programming model?

Acknowledgements Thanks to Randy Pollack, Andrew Pitts, Benoît Montagu, Jean-Philippe Bernardy and the anonymous reviewers for providing us with very valuable feedback.

References

[1] Thorsten Altenkirch and Bernhard Reus. Monadic presentations of lambda terms using generalized inductive types. In Computer Science Logic, volume 1683 of Lecture Notes in Computer Science, pages 453–468. Springer, 1999.
[2] Robert Atkey. Syntax for free: representing syntax with binding using parametricity. In International Conference on Typed Lambda Calculi and Applications (TLCA), volume 5608 of Lecture Notes in Computer Science, pages 35–49. Springer, July 2009.
[3] Robert Atkey, Sam Lindley, and Jeremy Yallop. Unembedding domain-specific languages. In Haskell Symposium, pages 37–48, September 2009.
[4] Brian E. Aydemir, Aaron Bohannon, Matthew Fairbairn, J. Nathan Foster, Benjamin C. Pierce, Peter Sewell, Dimitrios Vytiniotis, Geoffrey Washburn, Stephanie Weirich, and Steve Zdancewic. Mechanized metatheory for the masses: the POPLmark challenge. In International Conference on Theorem Proving in Higher Order Logics (TPHOLs), Lecture Notes in Computer Science. Springer, August 2005.
[5] Richard Bird and Ross Paterson. de Bruijn notation as a nested datatype. Journal of Functional Programming, 9(1):77–91, January 1999.
[6] Chiyan Chen and Hongwei Xi. Implementing typeful program transformations. In ACM Workshop on Partial Evaluation and Semantics-Based Program Manipulation (PEPM), pages 20–28, June 2003.
[7] Adam Chlipala. A certified type-preserving compiler from lambda calculus to assembly language. In ACM Conference on Programming Language Design and Implementation (PLDI), pages 54–65, June 2007.
[8] Kevin Donnelly and Hongwei Xi. Combining higher-order abstract syntax with first-order abstract syntax in ATS. In ACM Workshop on Mechanized Reasoning about Languages with Variable Binding, pages 58–63, 2005.
[9] Louis-Julien Guillemette and Stefan Monnier. A type-preserving compiler in Haskell. In ACM International Conference on Functional Programming (ICFP), 2008.
[10] Daniel R. Licata and Robert Harper. A universe of binding and computation. In ACM International Conference on Functional Programming (ICFP), pages 123–134, September 2009.
[11] Conor McBride and Ross Paterson. Applicative programming with effects. Journal of Functional Programming, 18(1):1–13, 2008.
[12] Brigitte Pientka. A type-theoretic foundation for programming with higher-order abstract syntax and first-class substitutions. In ACM Symposium on Principles of Programming Languages (POPL), pages 371–382, January 2008.
[13] Andrew M. Pitts. Parametric polymorphism and operational equivalence. Mathematical Structures in Computer Science, 10:321–359, 2000.
[14] Andrew M. Pitts. Alpha-structural recursion and induction. Journal of the ACM, 53:459–506, 2006.
[15] Andrew M. Pitts. Nominal System T. In ACM Symposium on Principles of Programming Languages (POPL), pages 159–170, January 2010.
[16] Adam Poswolsky and Carsten Schürmann. Practical programming with higher-order encodings and dependent types. In European Symposium on Programming (ESOP), volume 4960 of Lecture Notes in Computer Science, pages 93–107. Springer, March 2008.
[17] Adam Poswolsky and Carsten Schürmann. System description: Delphin – a functional programming language for deductive systems. Electronic Notes in Theoretical Computer Science, 228:113–120, 2009.
[18] François Pottier. Static name control for FreshML. In IEEE Symposium on Logic in Computer Science (LICS), pages 356–365, July 2007.
[19] Nicolas Pouillard and François Pottier. A fresh look at programming with names and binders (Agda code), March 2010. http://tiny.nicolaspouillard.fr/FreshLookAgda.
[20] Nicolas Pouillard and François Pottier. A fresh look at programming with names and binders (extended version), March 2010. http://tiny.nicolaspouillard.fr/FreshLookExt.
[21] Carsten Schürmann, Adam Poswolsky, and Jeffrey Sarnat. The ∇-calculus: functional programming with higher-order encodings. In International Conference on Typed Lambda Calculi and Applications (TLCA), volume 3461 of Lecture Notes in Computer Science, pages 339–353. Springer, April 2005.
[22] Mark R. Shinwell, Andrew M. Pitts, and Murdoch J. Gabbay. FreshML: programming with binders made simple. In ACM International Conference on Functional Programming (ICFP), pages 263–274, August 2003.
Experience Report: Growing Programming Languages for Beginning Students

Marcus Crestani, University of Tübingen, [email protected]
Michael Sperber, DeinProgramm, [email protected]
Abstract

A student learning how to program learns best when the programming language and programming environment cater to her specific needs. These needs are different from the requirements of a professional programmer. Consequently, the design of teaching languages poses challenges different from the design of "professional" languages. Using a functional language by itself gives advantages over more popular, professional languages, but fully exploiting these advantages requires careful adaptation to the needs of the students—as-is, these languages do not support the students nearly as well as they could. This paper describes our experience adopting the didactic approach of How to Design Programs, focusing on the design process for our own set of teaching languages. We have observed students as they try to program as part of our introductory course, and used these observations to significantly improve the design of these languages. This paper describes the changes we have made, and the journey we took to get there.

Categories and Subject Descriptors: D.2.10 [Software Engineering]: Design—Methodologies; K.3.2 [Computers and Education]: Computer and Information Science Education—Computer Science Education
General Terms: Design, Languages
Keywords: Introductory Programming

1. Introduction

Functional programmers know that the choice of language affects the thinking of programmers and thus the design of software. The choice of language also matters when it comes to teaching introductory programming: It profoundly affects the students' thinking repertory, as well as their learning experience. An "off the rack" language poses significant challenges for beginners and tends to be an obstacle to learning (Felleisen et al. 2004; Findler et al. 2002). In 1999, the University of Tübingen started revising its introductory course: A functional-programming-based course replaced more traditional previous offerings using Pascal, C++, or Java. The course was, to a large degree, based on the classic Structure and Interpretation of Computer Programs (or SICP) (Abelson et al. 1996). We were aware at the time of Rice PLT's efforts, led by Matthias Felleisen, that would result in How to Design Programs (or HtDP) (Felleisen et al. 2001), which, however, had not been published yet—consequently, we only had a vague idea of its central tenets. We used PLT's DrScheme (now called DrRacket) for the course, seeing it mainly as a graphical IDE for Scheme, and thus easier to use for students than traditional Scheme systems. Specifically, we ignored its hierarchy of language levels and instead ran DrScheme in its R5RS mode. Our underlying assumption was the same as that of SICP, namely that the sheer power of functional programming combined with the syntactic simplicity of Scheme would make both teaching and learning so easy that we would fix all the problems of the previous courses instantly. However, while Scheme fixed many problems, significant issues remained.¹ After having written a textbook on this approach, it took us until 2004 to realize that SICP's example-driven approach to teaching did not work as well as we had expected with a large portion of our students: SICP admirably explains how many concepts of software development—abstraction in particular—work, but this is not enough to enable students to solve problems on their own. By this time, HtDP had appeared, and we started to adopt its central didactic concept, the design recipes, which implement an explicit programming process, driven by a data analysis. Adopting the design recipes meant expressing their concepts in code. However, pure R5RS Scheme is a poor match for the design recipes—it lacks native constructs for compound data, mixed data ("sum types"), and noting violations. Consequently, we started implementing our own "language level," which included the missing features and allowed us to adopt HtDP's design recipes, while staying close to "standard" Scheme. We still shunned HtDP's own language levels, as they deviate significantly from R5RS Scheme. However, even though we did not know it at the time, we had replicated the first step of PLT's own journey towards the language levels, and we would replicate more. For reasons initially unrelated to the language, we started observing our students as they worked on exercises (Bieniusa et al. 2008). Soon, we saw that, despite Scheme's simplicity, students were making syntactic and other trivial mistakes. Experienced programmers "see" these mistakes immediately, but students often do not. This can be immensely frustrating, and a significant number of students gave up on programming on their own as a result. Disturbingly, many of these mistakes could have been detected by the Scheme implementation if only the language and the error messages were restricted to what the students knew. Consequently, we started implementing restrictions in line with the course, and custom error messages—replicating another step of PLT's experience. Moreover, we saw that some students would read ahead and make use of programming-language features not yet covered in the course (most popular: assignments), which destroyed important didactic points. Thus, we implemented a sequence of progressively bigger language levels, replicating and thus confirming the final essential step of PLT's development of the HtDP language levels. In 2006, after adding more add-on features analogous to HtDP's, such as the handling of images and functional animations, we had language levels almost completely analogous to HtDP's. We also published our own follow-up textbook, Die Macht der Abstraktion (or DMdA, German for The Force of Abstraction) (Klaeren and Sperber 2007). In retrospect, we could have gotten there much faster and cheaper. However, at the time, we did not have PLT's experience and the rationale for their design, and thus had proudly assumed a "not-invented-here" stance. Nevertheless, the design process for our teaching languages did not end there: The HtDP language levels still had insufficient support for some aspects of the design recipes—in particular, test cases, mixed data, and contracts. Moreover, new desirable aspects of teaching languages emerged—most recently, the support for formal properties of programs. This paper documents our experience with adopting the HtDP approach and evolving our teaching languages to better meet the students' needs.

¹ We were wrong about the course on other aspects as well (Bieniusa et al. 2008).
2. HtDP's Language Levels

The HtDP (and DMdA) languages have evolved from Scheme, which, at the time, had been the basis for many introductory textbooks and courses, as its small size makes it attractive for classroom use, and beginning students take well to the simple Lisp-style parenthesized prefix syntax. However, standard Scheme does not solve all language problems of the introductory course. Thus, improving the students' experience meant changing and improving the language, as PLT's TeachScheme! project has been doing since 1995 (Felleisen et al. 2004; Findler et al. 2002). In particular, students make mistakes when writing code. If the student is to make independent progress, the programming environment must provide feedback that enables the student to fix the mistakes on her own. These mistakes are often trivial: Syntax errors (which occur even with the "trivial" Scheme syntax) and type errors can be detected by the programming environment. Helping students fix other kinds of mistakes—misunderstanding the syntax or using features not yet covered in class—requires actual changes to the languages beginners program in. In DrScheme, at any given time, the beginner uses one of several language levels. A language level is an operation mode of DrScheme that provides a language subset tailored to the needs of the beginner at that time. As the student progresses, she switches to more advanced language levels, each of which is a superset of the previous level. Each language level has its own implementation of error reporting tailored to the beginner's needs. The error messages only mention terms that the course has introduced up to that point.

3. Popularity ≠ Success

Adopting HtDP's insights for what would become DMdA was a lengthy process: Prior to the 2004 course, we only had a vague idea what the students were doing when they were on their own. That did not keep us from believing we had a fairly good idea of what they were doing, namely solving their homework problems using the techniques we had taught them. Only when we started personally supervising lab exercises did we find out that the students did not always follow the path we had laid out for them, and encountered numerous difficulties. This was easy to address during personal supervision, but would have kept the students from solving homework problems when on their own. In fact, many students resorted to copying somebody else's homework (Bieniusa et al. 2008), and our impressions of what the students were doing turned out to be quite wrong, even though we thought we had good reason to believe they were right: The course was popular with students, and passing rates were higher than with the previous, "traditional" courses, even though we had covered more difficult material. When we realized this, we started observing our students more closely. Specifically, we recorded the mistakes they made, the error messages from DrScheme that reported the mistakes, and the students' reaction to the error messages. The authors did this personally, and additionally trained our student TAs to look for mistakes, and report their observations to us. We also tried to raise the students' awareness of these issues and report them. However, most of the helpful observations we made ourselves, closely followed by the TAs' reports—we received very little unsolicited feedback from the students, and even this was mostly ad-hoc in-class feedback. The following insights from our experience have stayed with us:

• We did not even know we had a problem, even though we have always maintained an open door and open ears for our students. Consequently, it was extremely easy to deceive ourselves that "everything was fine."
• Mistakes made by one student were often repeated by other students.
• What seems easy or natural to us does not necessarily appear that way to the students.
• We could not expect the students to give us, on their own initiative, the specific feedback we need to improve the course and the software for the course.

The design decisions documented in this paper were mostly direct consequences of this action research, which is ongoing. Students' scores in the programming exercises of the final exams have continually risen since we have adopted design recipes and started improving our teaching languages.
4. Simple Differences

The original DMdA languages of 2006 differed from the HtDP languages in several minor ways—partly to reduce the differences with standard Scheme, and partly to cater more specifically to our German audience. The HtDP languages generally appeal to the students' prior training in algebra, sacrificing some of the original Scheme syntax, whereas the DMdA languages stay closer to the original Scheme. The differences illustrate some of the decisions designers of languages for beginners face.

4.1 Procedure/Function Definitions

The difference in the handling of algebra is most visible in procedure definitions: In HtDP, procedures (called "functions" there) are defined with the usual Scheme syntactic sugar:

(define (f x) ...)

This emphasizes the similarity to function definitions in mathematics as well as the visual congruence between function definitions and calls, and makes it easy to "see" the substitution that occurs. Conversely, DMdA's procedure definitions use an explicit lambda:

(define f (lambda (x) ...))

This makes it easier later to introduce higher-order procedures, as it is straightforward to move the lambda somewhere else, as opposed to explaining the concept of syntactic sugar, but it loses the visual congruence. This is no great loss, however, as German students typically cannot identify the mathematical substitution principle anyway—the subject does not play the explicit role in German high school that it enjoys in US curricula.²

² Ironically, Felleisen traces back the algebraic aspect to his training in German high school, where algebra sadly has since been de-emphasized.
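To make the point concrete, here is a small sketch of ours (not an example from the textbook): once every definition is written with an explicit lambda, passing a procedure to a higher-order procedure requires no new syntax at all; the lambda simply moves into argument position.

; A DMdA-style definition ...
(define double
  (lambda (x) (* 2 x)))

; ... and the same lambda moved into the argument position of a
; higher-order procedure. No syntactic sugar needs to be undone.
(map (lambda (x) (* 2 x))
     (list 1 2 3))        ; evaluates to the list with elements 2, 4, 6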
4.2 Record Definitions

An important part of HtDP and DMdA is the treatment of compound data. Instructors teach students to recognize compound data and to use record definitions as implementations of the resulting data definitions. Teaching compound data effectively is surprisingly difficult, as beginning students tend to get confused by the idea of "several things becoming one." Both DMdA and HtDP instructors teach simple heuristics, such as that the number of components in the data definition should match the number of fields. ("How many parts does a calendar date have? Three! How many fields does the record-type definition for calendar dates have? Three!") This means that the programming aspects of compound data ought to be as simple as possible, so as not to add to the students' burden.

Scheme has a long history of "record wars" (Clinger et al. 2005), hence it is no surprise that DMdA and HtDP chose different syntaxes for their record-type-definition forms. HtDP has chosen a so-called "implicit-naming" form. For example, consider the following HtDP "struct definition":

(define-struct ant (weight loc))

This is in fact a definition of four procedures: a record constructor called make-ant, a predicate ant?, and two selectors ant-weight and ant-loc. The names are not explicitly mentioned in the form, hence "implicit naming." The DMdA teaching languages provide an "explicit-naming" form. Here is a definition equivalent to the above:

(define-record-procedures ant
  make-ant ant?
  (ant-weight ant-loc))

This is more verbose than the HtDP form, but it makes it easier for the students to see that the form defines identifiers, and what those identifiers are. Also, define-record-procedures allows choosing arbitrary names for the various procedures, even though we emphasize the value of the conventions used above. Moreover, DrScheme's "Rename" menu entry works with the explicit-naming form, but not with implicit naming. Some instructors in Germany experimenting with the HtDP languages reported that a significant number of students had difficulty understanding the "magic" of implicit naming. This particular problem is not as significant in DMdA courses; signatures (see Section 5.2) further alleviate any problems the students may have with writing record-type definitions.
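A brief usage sketch of ours makes the four generated procedures concrete (posn is the standard position record type that also appears in the examples below):

(define paula (make-ant 1.5 (make-posn 2 3)))

(ant? paula)        ; #t: the predicate recognizes ant records
(ant-weight paula)  ; 1.5: the selectors extract the fields
(ant-loc paula)     ; the position record built by make-posn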
4.3 Minor Language Changes

We made additional minor changes relative to the HtDP languages. One example is the omission of symbols in favor of strings: HtDP (like an ordinary Scheme programmer) uses symbols for enumerations ('solid, 'liquid, 'gaseous) where DMdA uses strings. This avoids the notational difficulties of using symbols, in particular the syntactic restrictions (no spaces etc.) and the notational confluence between symbols and variables. We had observed these problems in earlier incarnations of the course, and switching to strings solved them all. (One might argue that strings are less efficient, but this is the introductory course, after all.) Delaying symbols also enables DMdA to relegate quote (including quoted lists) to the very end of the course; the general notion of quotation was quite confusing to students when introduced earlier. The inconvenience of writing (list "solid" "liquid" "gaseous") instead of '(solid liquid gaseous) is well worth it.

4.4 Print Format

The REPL of a typical Scheme implementation accepts an expression and then prints its value. While the output format of the value is not standardized, most Scheme implementations output the (standard) external representation of the value: 5 prints as 5, "true" prints as #t, and the list with elements 1, 2, and 3 prints as (1 2 3). While the use of the external representation has advantages for dealing with advanced features of Scheme such as representing program source code as data, eval, and quote, it confuses many beginning students about the difference between expressions and values. For example, the expression (list '+ 1 2) evaluates to (+ 1 2), which looks like an expression that evaluates to 3.

HtDP and DMdA avoid this confusion by using output formats different from the external representation. As HtDP emphasizes the relationship between algebra and programming, it prints out each value as a canonical form that evaluates to it. Thus, the list with elements 1, 2, 3 prints as (cons 1 (cons 2 (cons 3 empty))) or (list 1 2 3) (depending on the language level), which, as an expression, again evaluates to a list with elements 1, 2, 3. Record values are printed as constructor calls—for example, an ant will print out as (make-ant w (make-posn x y)).

With DMdA, we instead chose to emphasize the distinction between the expression (make-posn 1 2) and its value. This is particularly relevant in DrScheme's stepper (Findler et al. 2002), which displays intermediate reductions as expressions. In DMdA, the list prints as #<list 1 2 3>, and the ant prints as #<ant w #<posn x y>>. This output has the technical disadvantage of not being usable as an expression, but it also prevents certain abstraction violations: in particular, it keeps students from cutting and pasting the printed result directly into a test case. Both approaches have been successful at avoiding the confusion associated with the standard external representation.

5. Growing the Teaching Languages

In 2006, when the DMdA teaching languages had become roughly analogous to the HtDP languages, we could focus on further improvements. In particular, we adopted and improved upon newer developments in the HtDP languages, such as the support for testing. We have also developed two new additions: support for signatures, and the formulation of general, checked properties of procedures.

5.1 Encouraging Testing

Writing test cases is an early step of the design recipes. In particular, students should write test cases before they write the procedure definition itself. When we originally introduced testing as a mandatory part of the design recipes, we adopted graphical test boxes, which HtDP had implemented previously, and which the students had to insert via a menu and fill out like a form. A test box would contain "Test" and "Should be" fields that would be tested for equality. DrScheme would decorate test boxes of successful tests with green marks, and failed tests with red marks and the actual value. The idea was that the graphical, form-like approach would make testing more attractive to students, but in fact the opposite was the case: the students found the GUI manipulation required to use test boxes too cumbersome. Moreover, the test boxes had to come after the definition of the procedure they were supposed to test, even though the design recipes specify that the students write them before writing the procedure definition. As a result, many students wrote their test cases after completing the procedure body.

To encourage the students to test more, we replaced the mechanism for writing tests with one HtDP had implemented earlier: instead of graphical test boxes, test cases are formulated as plain code using the check-expect form, which accepts a test expression and a should-be expression as operands. The test case for is-5? can be formulated as a check-expect form like this:

(check-expect (is-5? 7) #f)
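Because check-expect forms are ordinary program code, nothing forces them to follow the definition they test; a complete fragment in design-recipe order might look like this (our sketch, not an example from the paper):

; Test cases first, as the design recipes demand ...
(check-expect (is-5? 5) #t)
(check-expect (is-5? 7) #f)

; ... then the definition. The tests run only after the whole
; program has been evaluated, so this ordering is legal.
(define is-5?
  (lambda (n)
    (= n 5)))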
When we replaced graphical test boxes by check-expect, the students wrote significantly more test cases. The check-expect form allows quick creation, keyboard-based manipulation, and easy duplication.³ Also, check-expect-based tests run after the rest of the program and can be placed above the procedure definition. This successfully encourages the students to write test cases before writing the procedure definition. Thus, even though the difference between the graphical test boxes and check-expect is linguistically insignificant, the results differ dramatically: details matter.

³ In hindsight, this seems obvious, but it was far from obvious at the time, considering the prevalence of graphical paradigms in professional development environments.

5.2 Signatures

An important part of the design recipes is the formulation of a contract for every procedure. In HtDP the contracts are comments:

;; is-5? : number -> boolean
(define (is-5? n)
  (= n 5))

The HtDP language of contracts is informal. (HtDP predates PLT's well-known research on contracts as part of the programming language.) Most contracts look like type signatures. (Some represent more complex predicates, but this is not the main point here.) Writing down contracts is important for the students, as it helps answer typical questions, such as how many arguments they should supply in a procedure call, or how they should order them. Thus, contracts further guide decisions students have to make when they write their programs, and, once written, do so without requiring the student to think about the concrete problem at hand. Consequently, they remove the process of constructing the program from "solving the whole problem" by one—often crucial—step. Furthermore, TAs use contracts as anchors for giving helpful instructions. As contracts are not subject to static type checking, type errors do not keep a student from running the program and observing its behavior. Consequently, while writing down a type signature would have the same benefits as writing down the contract, the effects of doing this in a statically typed language would be detrimental for the beginning student when trying to run the program.

The complete lack of checking also creates problems: many students quickly realize that the contract comments have no bearing on the running program, and as a result they are sloppy with more complicated contracts. This led DMdA to add signatures as formal parts of the teaching languages in 2008; they take the place of HtDP's informal contracts. Here is a signature declaration:

(: is-5? (number -> boolean))

Any signature violation is logged like a test-case violation—see Figure 1. The feedback to the student includes the expression in the program whose evaluation violated the signature, the signature that was violated, and the value that violated it. The value is important for the student, as it provides concrete evidence that the program did something wrong (rather than a type system's assertion that the program might do something wrong), and it helps the student figure out the source of the problem.

[Figure 1. Signature violation in DrScheme]

While replacing contracts with signatures does not significantly alter the pedagogy of the course, automatic checking plays the role of the lab supervisor for the students and provides more immediate and precise feedback. The introduction of signatures showed instant results in class: the students were more thorough about writing them, and programming was more in line with the design recipes, as each part of a data definition now results in an actual piece of program code. The code for a definition for mixed data (the terminology used by DMdA and HtDP for "sums"), which previously had no counterpart in the code, looks like this:

(define animal
  (signature (mixed ant armadillo bigfoot)))

This definition can be read as "an animal is an ant, an armadillo, or a bigfoot" or, more precisely, "a value matching the signature animal must match one of the signatures ant, armadillo, bigfoot." The signature keyword marks the expression as written in signature syntax.⁴

⁴ The signature syntax could almost but not quite be expressed as a combinator library, or as individual macros for mixed etc.: the signature syntax delays references to signature variables and invocations of signature abstractions to allow recursive signatures. Moreover, it attaches fresh locations to the various parts of the syntax to enable intuitive error reporting. For example, when the number signature of is-5? above is violated, the visual feedback marks the particular occurrence of number in is-5?'s signature. To enable this, the system must treat number differently from a generic variable reference.

Compound data requires no new special form with signatures: students write regular signatures for the constructors, predicates, selectors, and mutators. For the ants record definition from Section 4.2, students would typically write the following signatures:

(: make-ant (real posn -> ant))
(: ant? (%a -> boolean))
(: ant-weight (ant -> real))
(: ant-loc (ant -> posn))

The first line declares that the constructor for ants accepts a real number and a position and returns an ant record; the next that ant? accepts any value and returns a boolean; and the two following lines that the selectors for the weight and loc fields accept an ant record and return a real number and a position, respectively. The first declaration already says all there is to say about ants: all predicates have the same signature, and the selector signatures simply mirror the constructor signature, so we originally taught our students to write only this first line. To our (pleasant) surprise, the students soon insisted on writing all the signatures, which have since been consistently helpful in getting students to understand the concepts of predicate and selector.

The %a signature is a signature variable, as is every identifier appearing in a signature that starts with a %. This notation allows formulating typical "polymorphic" signatures like this:

(: map ((%a -> %b) (list %a) -> (list %b)))

The implementation views any such signature as meaning "any": the system does not check correct use of parametric polymorphism and thus fails to prevent students from being sloppy with the proper use of signature variables. However, this problem is quite minor compared with the sloppiness we had observed earlier. Note that signatures work as invariants for procedure calls. Conversely, the "real" contracts that are available in Racket monitor the flow of values across module boundaries (Flatt et al. 2010).
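For instance (our sketch of the mechanism, not a transcript from the paper), a call that violates the declared signature of is-5? would be logged together with the offending value:

(is-5? "five")   ; logged like a failed test case: the report names the
                 ; violated number part of is-5?'s signature, the call,
                 ; and the offending value "five"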
5.3 Properties

We noticed in the Tübingen 2008 course that some students, when the course introduced check-expect, would ask whether it might be possible to check for properties rather than examples. This struck a nerve with the DMdA team, as the textbook includes a section on formal specification using equational properties based on ADTs. This section had never worked particularly well, as it requires talking about semantics in terms of universal algebra. This was time-consuming and too obscure for students to grasp in the first semester. Moreover, we found that formulating interesting properties—such as fundamental properties of search trees—was beyond the reach of the framework we had introduced, which was already too complex. Consequently, we decided to instead introduce properties in the concrete context of programming and add support for them to the DMdA languages. Here is an example:

(define +-is-commutative
  (for-all ((a number) (b number))
    (= (+ a b) (+ b a))))

The range of the variables in the new for-all construct is specified using signatures. Thus, adding signatures to the language paid off in an unexpected way. Properties are objects, which can be composed. The new check-property form can be used to check a property:

(check-property +-is-commutative)

This invokes a QuickCheck clone (Claessen and Hughes 2000), and DrScheme displays counterexamples along with the test results. As signatures are run-time objects, the system constructs the value generators needed for QuickCheck using "regular programming" rather than via type-class-based overloading. The fact that signatures are objects enables simple abstractions accessible to beginners, such as this:

(: commutativity ((number number -> number) signature -> property))
(define commutativity
  (lambda (op sig)
    (for-all ((a sig) (b sig))
      (= (op a b) (op b a)))))

This enables concrete practice dealing with abstract properties, which is helpful for our beginning students, who struggle with the general concept of "commutativity" when it is divorced from arithmetic. Properties have now replaced the ADT-based approach to formal specification in the course, and the course segues from QuickCheck-style testing to actual proofs of properties. Initial feedback from the 2009/2010 courses in Tübingen and Freiburg has been positive. In the Tübingen course, which placed more emphasis on properties, the students invented properties—typically simple algebraic properties such as commutativity, associativity, and distributivity—throughout the course. Consequently, we are confident that properties will play a more prominent and supportive role in future courses. However, we will need to gather more systematic feedback and more experience to fully realize this potential.
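A hypothetical use of this abstraction (not shown in the paper) instantiates the property with a concrete operation and signature and hands it to check-property:

(check-property (commutativity + number))  ; random tests succeed
(check-property (commutativity - number))  ; counterexamples are displayed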
6. Assessing Success

Many pedagogic interventions have unexpected effects: often, the best intentions are not sufficient to make a good idea work in practice. We generally assess the success of our own interventions through frequent testing, final exams, and direct observation, always comparing the results to those of previous courses; some of these comparisons have yielded significant empirical effects (Bieniusa et al. 2008). However, it is difficult to isolate the effects of individual changes in the teaching languages in empirical measurements. In particular, it is difficult to measure how many problems students were unable to solve because of language-design issues. Thus, we rely on direct observation in our supervised lab exercises, where our TAs log any problems the students have for which the programming environment or the programming language may help. We were able to observe some specific effects, however: for example, before the introduction of signatures, most contracts written by the students contained errors, whereas afterwards, most signatures did not. For properties we have no comparable empirical measurement, as they enable a new didactic approach; still, we believe the basic approach is already validated, as many students are able to write properties on their own, whereas the previous ADT-based approach to specification was a disaster: students were not able to formulate properties on their own.

7. Growing Teaching Languages

While it has become clear that standard Scheme as-is was not an ideal teaching language, it was still a good starting point for our endeavors: functional programming is a more appropriate beginners' paradigm than imperative or object-oriented programming; Scheme, being a functional language, supports the programming styles needed for implementing the design recipes; and its general abstraction mechanisms make it ideal for practicing abstraction. Its simple syntax makes classroom treatment easy.

Educators and implementors can improve the learning experience with any (functional) language. This requires substantial action research and observation-driven improvement as part of a long-running process, as our experience has demonstrated. Moreover, educators do well to clearly define their teaching goals. Appropriate goals are defined in terms of the actual learning experience rather than the subject coverage in class. The following principles have served us well on our journey:

• Observe your students directly and closely.
• Be willing to abandon your favorite aspects of the course or teaching language—or at least be willing to move them to a different place.
• Keep making changes, evaluate them, and be willing to abandon them if they do not work.
• Cooperate with others who are doing similar work. Learn from their mistakes.
8. Related Work

There are surprisingly few constructive investigations of how particular design elements of a programming language can support or hinder a beginner's effort to learn programming. Wadler's critique of Scheme for teaching (Wadler 1987) is such a constructive investigation; Wadler stresses the importance of a type-based approach to program construction, recognizes the problems of Scheme's external representation, and notes the importance of algebraic techniques in understanding programs. The work on support for testing in ProfessorJ (Gray and Felleisen 2007) shows the importance of a concise and lightweight notation for tests, and thus mirrors the experience we had with test boxes and check-expect. The paper by McIver and Conway (1996) identifies a number of issues in the design of languages for introductory programming. The paper aptly concludes:

    This implies that the most important tool for pedagogical programming language design is usability testing, and that genuinely teachable programming languages must evolve through prototyping rather than springing fully-formed from the mind of the language designer.

The work on Helium (Heeren et al. 2003) demonstrates the Haskell community's insight that beginners have needs different from those of professionals—specifically, that they require better (type) error messages. Also, Helium, lacking type classes, is effectively a beginner's language level for Haskell. The Helium project uses concrete observations of students' interactions with the system to improve it (van Keeken 2006). Generally, producing comprehensible type error messages in Hindley-Milner-typed languages is ongoing research (Rahli et al. 2009). Marceau et al. have recently studied the quality of the error messages in DrScheme more systematically and concluded that there is still significant room for improvement (Marceau et al. 2010). DrJava (Hsia et al. 2005) has picked up the concept of language levels from DrScheme.

9. Conclusions

The programming language used by an introductory course can be either a help to the student or an obstacle. However, even though the typical professional functional language is less complex than the typical professional object-oriented language, problems remain. Improving this situation requires language design specifically geared towards beginning students. The properties of these languages arise from the pedagogic principles of the course—the design recipes—and from continual improvement driven by an ongoing process of observing students.

The HtDP and DMdA languages have come a long way in supporting the beginning student. However, work on them is ongoing, and we believe further refinements are possible. In the near future, we will continue to work on the error messages, again following PLT's lead (Marceau et al. 2010). We have also ported the work on signatures in the DMdA levels to the HtDP levels, which will be available in a future version of DrRacket. As many signatures already look like types, we also plan to experiment with additional levels that treat the signatures as type declarations. Moreover, we expect experience to guide us towards further improvements. In the future, we may benefit from a more systematic approach to evaluating our success than our past action research. We welcome new adopters and their feedback. We call on educators who teach programming using other languages to use similar or improved processes to tailor their tools to the needs of their students.

10. Acknowledgments

Many people were involved in shaping the DMdA and HtDP language levels: Matthias Felleisen and the members of the PLT group—particularly Matthew Flatt, Robby Findler, Shriram Krishnamurthi, and John Clements—are responsible for the ongoing development of DrRacket. Martin Gasbichler helped develop the DMdA language levels. Peter Thiemann and Torsten Grust and their groups provided helpful suggestions on the design of the DMdA languages, based on their own intro courses. Carl Eastlund suggested adding randomized testing to the language levels.

References

Harold Abelson, Gerald Jay Sussman, and Julie Sussman. Structure and Interpretation of Computer Programs. MIT Press, Cambridge, Mass., second edition, 1996.

Annette Bieniusa, Markus Degen, Phillip Heidegger, Peter Thiemann, Stefan Wehr, Martin Gasbichler, Marcus Crestani, Herbert Klaeren, Eric Knauel, and Michael Sperber. HtDP and DMdA in the battlefield. In Frank Huch and Adam Parkin, editors, Functional and Declarative Programming in Education, Victoria, BC, Canada, September 2008.

Koen Claessen and John Hughes. QuickCheck: A lightweight tool for random testing of Haskell programs. In Philip Wadler, editor, Proceedings of the International Conference on Functional Programming 2000, pages 268–279, Montreal, Canada, September 2000. ACM Press, New York.

Will Clinger, R. Kent Dybvig, Michael Sperber, and Anton van Straaten. SRFI 76: R6RS records. http://srfi.schemers.org/srfi-76/, September 2005.

Matthias Felleisen, Robert Bruce Findler, Matthew Flatt, and Shriram Krishnamurthi. How to Design Programs. MIT Press, 2001.

Matthias Felleisen, Robert Bruce Findler, Matthew Flatt, and Shriram Krishnamurthi. The TeachScheme! project: Computing and programming for every student. Computer Science Education, March 2004.

Robert Bruce Findler, John Clements, Cormac Flanagan, Matthew Flatt, Shriram Krishnamurthi, Paul A. Steckler, and Matthias Felleisen. DrScheme: A programming environment for Scheme. Journal of Functional Programming, pages 159–182, March 2002.

Matthew Flatt, Robert Bruce Findler, and PLT. Guide: Racket. PLT, 2010. Available from http://pre.plt-scheme.org/docs/.

Kathryn E. Gray and Matthias Felleisen. Linguistic support for unit tests. Technical Report UUCS-07-013, University of Utah, 2007.

Bastiaan Heeren, Daan Leijen, and Arjan van IJzendoorn. Helium, for learning Haskell. In Johan Jeuring, editor, Proceedings of the 2003 ACM SIGPLAN Haskell Workshop, pages 62–71, Uppsala, Sweden, August 2003.

James I. Hsia, Elspeth Simpson, Daniel Smith, and Robert Cartwright. Taming Java for the classroom. In SIGCSE 2005, February 2005.

Herbert Klaeren and Michael Sperber. Die Macht der Abstraktion. Teubner Verlag, 1st edition, 2007.

Guillaume Marceau, Kathi Fisler, and Shriram Krishnamurthi. Measuring the effectiveness of error messages designed for novice programmers. In 2010 Workshop on Scheme and Functional Programming, Montréal, Québec, Canada, August 2010.

Linda McIver and Damian Conway. Seven deadly sins of introductory programming language design. In Proceedings Software Engineering: Education & Practice, pages 309–316, Los Alamitos, CA, USA, 1996. IEEE Computer Society Press.

Vincent Rahli, J. B. Wells, and Fairouz Kamareddine. Challenges of a type error slicer for the SML language. Technical Report HW-MACS-TR-0071, Heriot-Watt University, School of Mathematics & Computer Science, September 2009.

Peter van Keeken. Analyzing Helium programs obtained through logging: the process of mining novice Haskell programs. Master's thesis, Utrecht University, October 2006. INF/SCR-05-93.

Philip Wadler. A critique of Abelson and Sussman or why calculating is better than scheming. SIGPLAN Notices, 22(3):83–94, March 1987.
Fortifying Macros∗

Ryan Culpepper†    Matthias Felleisen
Northeastern University
{ryanc,matthias}@ccs.neu.edu

∗ The research was partially supported by NSF infrastructure grants.
† New address: School of Computing, 50 Central Campus Drive (Rm 3190), Salt Lake City, UT 84112-9205

Abstract

Existing macro systems force programmers to make a choice between clarity of specification and robustness. If they choose clarity, they must forgo validating significant parts of the specification and thus produce low-quality language extensions. If they choose robustness, they must write in a style that mingles the implementation with the specification and therefore obscures the latter. This paper introduces a new language for writing macros. With the new macro system, programmers naturally write robust language extensions using easy-to-understand specifications. The system translates these specifications into validators that detect misuses—including violations of context-sensitive constraints—and automatically synthesize appropriate feedback, eliminating the need for ad hoc validation code.

Categories and Subject Descriptors: D.3.3 [Programming Languages]: Language Constructs and Features

General Terms: Design, Languages

1. What is a macro?

Every functional programmer knows that a let expression can be expressed as the immediate application of a λ abstraction [Landin 1965]. The let expression's variables become the formal parameters of the λ expression, the initialization expressions become the application's arguments, and the body becomes the body of the λ expression. Here is a quasi-formal expression of the idea:

(let ([var rhs] . . . ) body) = ((λ (var . . . ) body) rhs . . . )

It is understood that each var is an identifier and each rhs and body is an expression; the variables also must be distinct. These constraints might be stated as an aside to the above equation, and some might even be a consequence of metavariable conventions.

New language elements such as let can be implemented via macros, which automate the translation of new language forms into simpler ones. Essentially, macros are an API for extending the front end of the compiler. Unlike many language extension tools, however, a macro is part of the program whose syntax it extends; no separate pre-processor is used.

A macro definition associates a name with a compile-time function, i.e., a syntax transformer. When the compiler encounters a use of the macro name, it calls the associated macro transformer to rewrite the expression. Because macros are defined by translation, they are often called derived syntactic forms. In the example above, the derived form let is expanded into the primitive forms λ and function application. Due to the restricted syntax of macro uses—the macro name must occur in operator position—extensions to the language compose easily. Since extensions are anchored to names, they can be managed by controlling the scope of their names. This allows the construction of a tower of languages in layers.

Introducing new language elements, dubbed macros, has long been a standard element of every Lisper's and Schemer's repertoire. Racket [Flatt and PLT 2010], formerly PLT Scheme, is a descendant of Lisp and Scheme that uses macros pervasively in its standard libraries. Due in part to its pedagogical uses, Racket has high standards for error behavior. Languages built with macros are held to the same standards as Racket itself. In particular, syntactic mistakes should be reported in terms of the programmer's error, not an error discovered after several rounds of rewriting; furthermore, the mistake should be reported in terms documented by the language extension.

Sadly, existing systems make it surprisingly difficult to produce easy-to-understand macros that properly validate their syntax. These systems force the programmer to mingle the declarative specification of syntax and semantics with highly detailed validation code. Without validation, however, macros aren't true abstractions. Instead, erroneous terms flow through the parsing process until they eventually trip over constraint checks at a low level in the language tower. Low-level checking, in turn, yields incoherent error messages and leaves programmers searching for explanations. In short, such macros do not create seamless linguistic abstractions but sources of confusion and distraction.

In this paper, we present a novel macro system for Racket that enables the creation of true syntactic abstractions. Programmers define modular, reusable specifications of syntax and use them to validate uses of macros. The specifications consist of grammars extended with context-sensitive constraints. When a macro is used improperly, the macro system uses the specifications to synthesize an error message at the proper level of abstraction.

2. Expressing macros

To illustrate the problems with existing macro systems, let us examine them in the context of the ubiquitous let example:

(let ([var rhs] . . . ) body) = ((λ (var . . . ) body) rhs . . . )
    where the vars are distinct identifiers, and
    body and the rhss are expressions
A macro’s syntax transformer is essentially a function from syntax to syntax. Many Lisp dialects take that as the entirety of the interface: macros are just distinguished functions, introduced with
define-macro instead of define, that consume and produce S-expressions representing terms. Macros in such systems typically use standard S-expression functions to "parse" syntax, and they use quasiquotation to build up the desugared expression:
(define-macro (let bindings body)
  `((λ ,(map first bindings) ,body)
    ,@(map second bindings)))

A well-organized implementation would extract and name the subterms before assembling the result, separating parsing from code generation:

(define-macro (let bindings body)
  (define vars (map first bindings))
  (define rhss (map second bindings))
  `((λ ,vars ,body) ,@rhss))

These definitions do not resemble the specification, however, and they do not even properly implement it. The parsing code does not validate the basic syntax of let. For example, the macro simply ignores extra terms in a binding pair:

(let ([x 1]
      [y 3 "what about me?"])
  (+ x y))

Macro writers, eager to move on as soon as "it works," will continue to write sloppy macros like these unless their tools make it easy to write robust ones. One such tool is the so-called Macro-By-Example (MBE) notation by Kohlbecker and Wand [1987]. In MBE, macros are specified in a notation close to the initial informal equation, and the parsing and transformation code is produced automatically. The generated parsing code enforces the declared syntax, rejecting malformed uses such as the one above.

MBE replaces the procedural code with a sequence of clauses, each consisting of a pattern and a template. The patterns describe the macro's syntax. A pattern contains syntax pattern variables, and when a pattern matches, the pattern variables are bound to the corresponding sub-terms of the macro occurrence. These sub-terms are substituted into the template where the pattern variables occur to produce the macro's expansion result. Here is let expressed with syntax-rules [Sperber et al. 2009], one of many implementations of MBE:

(define-syntax let
  (syntax-rules ()
    [(let ([var rhs] . . . ) body)
     ((λ (var . . . ) body) rhs . . . )]))

The pattern variables are var, rhs, and body. The crucial innovation of MBE is the use of ellipses (. . . ) to describe sequences of sub-terms with homogeneous structure. Such sequences occur frequently in S-expression syntax. Some sequences have simple elements, such as the parameters of a λ expression, but often the sequences have non-trivial structure, such as binding pairs associating let-bound variables with their values.

Every pattern variable has an associated ellipsis depth. A depth of 0 means the variable contains a single term, a depth of 1 indicates a list of terms, and so on. Syntax templates are statically checked to make sure the ellipsis depths are consistent. We do not address template checking and transcription in this work; see Kohlbecker and Wand [1987] for details. Ellipses do not add expressive power to the macro system, but they do add expressiveness to patterns. Without ellipses, the let macro could still be expressed via explicit recursion, but in a way that obscures the nature of valid let expressions; instead of residing in a single pattern, it would be distributed across multiple clauses of a recursive macro. In short, ellipses help close the gap between specification and implementation.

Yet MBE lacks the power to express all of the information in the informal description of let above. The example macros presented so far neglect to validate two critical aspects of the let syntax: the first term of each binding pair must be an identifier, and those identifiers must be distinct. Consider these two misuses of let:

(let ([x 1] [x 2]) (+ x x))
(let ([(x y) (f 7)]) (g x y))

In neither case does the let macro report that it has been used incorrectly. Both times it inspects the syntax, approves it, and produces an invalid λ expression. Then λ, implemented by a careful compiler writer, signals an error, such as "λ: duplicate identifier in: x" in Racket for the first term and "invalid parameter list in (λ ((x y)) (g x y))" in Chez Scheme [Cadence Research Systems 1994] for the second.

Source location tracking [Dybvig et al. 1993] improves the situation somewhat in macro systems that offer it. For example, the DrRacket [Findler et al. 2002] programming environment highlights the duplicate identifier. But this is not a good solution: macros should report errors on their own terms. Worse, a macro might pass through syntax that has an unintended meaning. In Racket, the second example above produces the surprising error "unbound variable in: y." The pair (x y) is accepted as an optional parameter with a default expression, a feature of Racket's λ syntax, and the error refers to the free variable y in the latter portion. If y were bound in this context, the second example would be silently accepted. A slight variation demonstrates another pitfall:

(let ([(x) (f 7)]) (g x x))

This time, Racket reports the following error: "λ: not an identifier, identifier with default, or keyword at: (x)." The error message not only leaks the implementation of let, it implicitly obscures the legal syntax of let.

The traditional solution to this problem is to include a guard expression, sometimes called a fender, that is run after the pattern matches but before the transformation expression is evaluated. The guard expression produces true or false to indicate whether its constraints are satisfied. If the guard expression fails, the pattern is rejected and the next pattern is tried. If all of the patterns fail, the macro raises a generic syntax error, such as "bad syntax." Figure 1 shows the implementation of let in syntax-case [Dybvig et al. 1993; Sperber et al. 2009], an implementation of MBE that provides guard expressions. A syntax-case clause consists of a pattern, an optional guard, and a transformation expression. Syntax templates within expressions are marked with a #' prefix.

(define-syntax (let stx)
  (syntax-case stx ()
    [(let ([var rhs] . . . ) body)
     ;; Guard expression
     (and (andmap identifier? (syntax→list #'(var . . . )))
          (not (check-duplicate #'(var . . . ))))
     ;; Transformation expression
     #'((λ (var . . . ) body) rhs . . . )]))

Figure 1. let with guards

Guard expressions suffice to prevent macros from accepting invalid syntax, but they suffer from two flaws. First, since guard expressions are separated from transformation expressions, work needed both for validation and transformation must be performed twice, and code is often duplicated. Second and more important, guards do not explain why the syntax was invalid. That is, they only control matching; they do not track causes of failure.
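The figures call a helper named check-duplicate whose definition the paper does not show. Here is a plausible sketch under our assumption about its behavior (it returns a duplicate identifier if one exists, and #f otherwise); the name and interface are taken from the figures, not from any particular library:

;; Compile-time helper: scan a syntax list of identifiers and
;; return one that occurs twice, or #f if all are distinct.
;; (Our sketch; defined for-syntax so macros can call it.)
(define-for-syntax (check-duplicate stx)
  (let loop ([ids (syntax->list stx)])
    (cond [(null? ids) #f]
          [(ormap (λ (id) (bound-identifier=? (car ids) id))
                  (cdr ids))
           (car ids)]
          [else (loop (cdr ids))])))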
To provide precise error explanations, explicit error checking is necessary, as shown in figure 2. Of the ten non-comment lines of the macro's clause, one is the pattern, one is the template, and eight are dedicated to validation. Furthermore, this macro only reports errors that match the shape of the pattern. If it is given a malformed binding pair with extra terms after the right-hand-side expression, the clause fails to match, and syntax-case produces a generic error. Detecting and reporting those sorts of errors would require even more code.

(define-syntax (let stx)
  (syntax-case stx ()
    [(let ([var rhs] . . . ) body)
     (begin
       ;; Error-checking code
       (for-each (λ (var)
                   (unless (identifier? var)
                     (syntax-error "expected identifier" stx var)))
                 (syntax→list #'(var . . . )))
       (let ([dup (check-duplicate #'(var . . . ))])
         (when dup
           (syntax-error "duplicate variable name" stx dup)))
       ;; Result term
       #'((λ (var . . . ) body) rhs . . . ))]))

Figure 2. let with hand-coded error checking

Only the most conscientious macro writers are likely to take the time to enumerate all the ways the syntax could be invalid and to issue appropriate error reports. Certainly, the code for let could be simplified. Macro writers could build libraries of common error-checking routines. Such an approach, however, would still obscure the natural two-line specification of let by mixing the error-checking code with the transformation code. Furthermore, abstractions that focus on raising syntax errors would not address the other purpose of guards, the selection among multiple valid alternatives.

Even ignoring the nuances of error reporting, some syntax is simply difficult to parse with MBE patterns. Macro writers cope in two ways: either they compromise on the user's convenience with simplified syntax, or they hand-code the parser.

Keyword arguments are one kind of syntax that is difficult to parse using MBE patterns. An example of a keyword-enhanced macro is Racket's define-struct form, whose grammar is specified in figure 3. It has several keyword options, which can occur in any order. The #:transparent and #:inspector keywords control when structure values can be inspected via reflection. The #:mutable option makes the fields mutable; the #:property option allows structure types to override behavior such as how they are printed; and so on. Different keywords come with different numbers of arguments, e.g., #:mutable has none and #:property takes two.

(define-struct struct (field . . . ) option . . . )
  where struct, field are identifiers
  option ::= #:mutable
           | #:super super-struct-expr
           | #:inspector inspector-expr
           | #:property property-expr value-expr
           | #:transparent

Figure 3. The syntax of define-struct

Parsing a define-struct form gracefully is simply beyond the capabilities of MBE's pattern language, which focuses on homogeneous sequences. A single optional keyword argument can be supported by simply writing two clauses—one with the argument and one without. At two arguments, calculating out the patterns becomes onerous, and the macro writer is likely to make odd, expedient compromises: arguments must appear in some order, or if one argument is given, both must be. Beyond two arguments, the approach is unworkable.

The alternative is, again, to move part of the parsing into the transformer code. The macro writer sketches the rough structure of the syntax in broad strokes with a pattern, then fills in the details with procedural parsing code:

(define-syntax (define-struct stx)
  (syntax-case stx ()
    [(define-struct name (field . . . ) kw-options . . . )
     —— #'(kw-options . . . ) ——]))

In the actual implementation of define-struct, the parsing of the keyword options alone takes over one hundred lines of code. In comparison, when formulated in our new system this code shrinks by an order of magnitude.

In summary, MBE offers weak syntax patterns, forcing the programmer to move the work of validation and error reporting into guards and transformers. Furthermore, guard expressions accept or reject entire clauses, and rejection comes without information as to why a guard failed. Finally, MBE lacks the vocabulary to describe a broad range of important syntaxes. Our new domain-specific language for macros eliminates these problems.

3. The design of syntax-parse

Our system, dubbed syntax-parse, uses a domain-specific language to support parsing, validation, and error reporting. It features three significant improvements over MBE:

• an expressive language of syntax patterns, including pattern variables annotated with the classes of syntax they can match;
• a facility for defining new syntax classes as abstractions over syntax patterns; and
• a matching algorithm that tracks progress to rank and report failures, and a notion of failure that carries error information.

Furthermore, guard expressions are replaced with side conditions, which provide rejection messages. The syntax classes of our new system serve a role similar to that of non-terminals in traditional grammars. Their addition allows the disciplined interleaving of declarative specifications and hand-coded checks. This section illustrates the design of syntax-parse with a series of examples based on the let example.

(syntax-parse stx-expr [pattern side-clause . . . expr] . . . )
  where side-clause ::= #:fail-when cond-expr msg-expr
                      | #:with pattern stx-expr

Figure 4. Syntax of syntax-parse

3.1 Validating syntax

The syntax of syntax-parse—specified in figure 4—is similar to that of syntax-case. As a starting point, here is the let macro transliterated from the syntax-rules version:

(define-syntax (let stx)
  (syntax-parse stx
    [(let ([var rhs] . . . ) body)
     #'((λ (var . . . ) body) rhs . . . )]))

It does not yet enforce the two side conditions of the original specification. To this skeleton we add the constraint that every term labeled var must be an identifier. Likewise, rhs and body are annotated to
indicate that they are expressions. For our purposes, an expression is any term other than a keyword. The final constraint, that the identifiers are unique, is expressed as a side condition using a #:fail-when clause. Here is the revised macro:

(define-syntax (let stx)
  (syntax-parse stx
    [(let ([var:identifier rhs:expr] . . . ) body:expr)
     #:fail-when (check-duplicate #'(var . . . ))
                 "duplicate variable name"
     #'((λ (var . . . ) body) rhs . . . )]))

Note that a syntax class annotation such as expr is not part of the pattern variable name, and it does not appear in the template. The call to check-duplicate acts as a condition; if it is false, failure is averted and control flows to the template expression. But if it returns any other value, parsing fails with a "duplicate variable name" message; furthermore, if the condition value is a syntax object—that is, the representation of a term—that syntax is included as the specific site of the failure. In short, side conditions differ from guard expressions in that the failures they generate carry information describing the reasons for the failure.

At this point, our let macro properly validates its syntax. It catches the misuses earlier and reports the following errors:

> (let ([x 1] [x 2]) (h x))
let: duplicate variable name in: x
> (let ([(x y) (f 7)]) (g x y))
let: expected identifier in: (x y)

The boxes indicate the specific location of the problem; the DrRacket programming environment highlights these terms in red in addition to printing the error message. For some misuses, let still doesn't provide good error messages. Here is an example that is missing a pair of parentheses:

> (let (x 5) (add1 x))
let: bad syntax

Our let macro rejects this misuse with a generic error message. To get better error messages, the macro writer must supply syntax-parse with additional information.

3.2 Defining syntax classes

Syntax classes form the basis of syntax-parse's error-reporting mechanism. Defining a syntax class for binding pairs gives syntax-parse the vocabulary to explain a new class of errors. The syntax of binding pairs is defined as a syntax class thus:

(define-syntax-class binding
  #:description "binding pair"
  (pattern [var:identifier rhs:expr]))

The syntax class is named binding, but for the purposes of error reporting it is known as "binding pair." Since the pattern variables var and rhs have moved out of the main pattern into the syntax class, they must be exported as attributes of the syntax class so that their bindings are available to the main pattern. The name of the binding-annotated pattern variable, b, is combined with the names of the attributes to form the nested attributes b.var and b.rhs:

(define-syntax (let stx)
  (syntax-parse stx
    [(let (b:binding . . . ) body:expr)
     #:fail-when (check-duplicate #'(b.var . . . ))
                 "duplicate variable name"
     #'((λ (b.var . . . ) body) b.rhs . . . )]))

Macros tend to share common syntactic structure. For example, the binding pair syntax, consisting of an identifier for the variable name and an expression for its value, occurs in other variants of let, such as let* and letrec. In addition to patterns, syntax classes may contain side conditions. For example, both the let and letrec forms require that their variable bindings be distinct. Here is an appropriate syntax class:

(define-syntax-class distinct-bindings
  #:description "sequence of binding pairs"
  (pattern (b:binding . . . )
    #:fail-when (check-duplicate #'(b.var . . . ))
                "duplicate variable name"
    #:with (var . . . ) #'(b.var . . . )
    #:with (rhs . . . ) #'(b.rhs . . . )))

The attributes of distinct-bindings are var and rhs. They are bound by the #:with clauses, each of which consists of a pattern followed by an expression, which may refer to previously bound attributes such as b.var. The expression's result is matched against the pattern, and the pattern's attributes are available for export or for use by subsequent side clauses. Unlike the var and rhs attributes of binding, the var and rhs attributes of distinct-bindings have an ellipsis depth of 1, so bs.var and bs.rhs can be used within ellipses in the macro's template, even if bs does not occur within ellipses in the macro's pattern:

(define-syntax (let stx)
  (syntax-parse stx
    [(let bs:distinct-bindings body:expr)
     #'((λ (bs.var . . . ) body) bs.rhs . . . )]))

Now that we have specified the syntax of binding and distinct-bindings, syntax-parse can use them to generate good error messages for additional misuses of let:

> (let (x 5) (add1 x))
let: expected binding pair in: x
> (let 17)
let: expected sequence of binding pairs in: 17

The next section explains how syntax-parse generates error messages and how defining syntax classes affects error reporting.

4. Reporting errors

The syntax-parse system uses the declarative specification of a macro's syntax to report errors in macro uses. The task of reporting errors is factored into two steps. First, the matching algorithm selects the most appropriate error to report. Second, it reports the error by pinpointing the faulty term and describing the fault or stating the expected class of syntax.

4.1 Error selection

Pattern variable annotations and side conditions serve a dual role in our system. As seen, they allow syntax-parse to validate syntax. When validation fails, syntax-parse reports the specific site and cause of the failure. But annotations and side conditions do not simply behave like the error checks of figure 2. A macro can have multiple clauses, and a syntax class can have multiple variants. If there are multiple choices, all of them must be attempted before an error is raised and explained.

To illustrate this process, we must introduce choice into our running example. Serendipitously, Racket inherits Scheme's let syntax, which has another variant—a so-called "named let"—that specifies a name for the implicit procedure. This notation provides a handy loop-like syntax. For example, the following program determines whether the majority of numbers in a list are positive:
(define (mostly-positive? nums)
  (let loop ([nums nums] [pos 0] [non 0])
    (cond [(empty? nums) (> pos non)]
          [(positive? (first nums))
           (loop (rest nums) (+ 1 pos) non)]
          [else (loop (rest nums) pos (+ 1 non))])))

Implementing the new variant of let is as simple as adding another clause to the macro:

(define-syntax (let stx)
  (syntax-parse stx
    [(let loop:identifier bs:distinct-bindings body:expr)
     #'(letrec ([loop (λ (bs.var . . . ) body)])
         (loop bs.rhs . . . ))]
    [(let bs:distinct-bindings body:expr)
     #'((λ (bs.var . . . ) body) bs.rhs . . . )]))
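For instance (a worked sketch of ours, tracing the first clause), the named-let use

(let sum ([ns (list 1 2 3)] [acc 0])
  (if (empty? ns) acc (sum (rest ns) (+ (first ns) acc))))

expands into the letrec form

(letrec ([sum (λ (ns acc)
                (if (empty? ns) acc (sum (rest ns) (+ (first ns) acc))))])
  (sum (list 1 2 3) 0))

which evaluates to 6.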
(let ([x 1] [x 2]) (+ x x)) Both clauses agree on the faulty subterm. But this example is clearly closer to a use of normal-let rather than named-let. The faulty term matches the structure of distinct-bindings, just not the side condition. Pragmatically, we consider a check for side conditions—in contrast to an annotation check—to occur after traversal of the term. A progress step dubbed L ATE signals the failure of a side condition. Thus, while the named-let clause fails with the progress string R EST · F IRST in the example above, the normal-let clause fails with R EST · F IRST · L ATE, which is greater progress than the first. Sometimes multiple alternatives fail at the same place, e.g.,
The macro uses the annotations to pick the applicable pattern; it chooses named-let if the first argument is an identifier and normallet if it is a binding list. It happens that the two patterns are mutually exclusive, so the order of the clauses is irrelevant. The use of annotations to select the matching clause must be reconciled with the role of annotations in error reporting. An annotation rejection during pattern-matching clearly cannot immediately signal an error. But the annotations must retain their errorreporting capacity; if the whole parsing process fails, the annotations must be used to generate a specific error. The dual role of failure is supported using the following approach. When there are multiple alternatives, such as multiple syntax-parse clauses or multiple variants of a syntax class definition, they are tried in order. When an alternative fails, syntax-parse records the failure and backtracks to the next alternative. As alternatives are tried, syntax-parse accumulates a list of failures, and each failure contains a measure of the matching progress made. If the whole matching process fails, the attempts that made the most progress are chosen to explain the syntax error. Usually, but not always, there is a unique maximum, resulting in a single error explanation. Otherwise, the maximal failures are combined.
> (let 5 ) let: expected identifier or sequence of binding pairs in: 5 Both clauses make the same amount of progress with this term: R EST · F IRST. As a result, both failures are selected, and the error message includes both descriptions. 4.2
let: expected binding pair in: x This message consists of the macro’s expectations (a binding pair) and the specific term where parsing failed (x). A syntax error should identify the faulty term and concisely explain what was expected. It should not recapitulate the macro’s documentation; rather, the error message should make locating the appropriate documentation easy, e.g., via links and references. Consequently, syntax-parse produces messages from a limited set of ingredients. It automatically synthesizes messages for literal and datum patterns; for example, the pattern 5 yields the message “expected the literal 5.” As a special case, it also knows how to report when a compound term has too many sub-terms. The only other ingredients it uses are provided by the macro developer: descriptions and side-condition messages. In particular, syntax-parse does not synthesize messages to describe compound patterns. We call such patterns and the failures they cause “ineffable”; our system cannot generate explanations for them. An example is the following pattern:
Progress       π  ::= ps*
Progress Step  ps ::= FIRST | REST | LATE

FIRST < REST < LATE        ε < ps · π

π1 < π2   implies   ps · π1 < ps · π2
ps1 < ps2   implies   ps1 · π1 < ps2 · π2

Figure 5. Progress

4.2 Error messages

In addition to progress, a failure contains a message that indicates the nature of the error and the term where the failure occurred. A typical error message is

Figure 5 defines our notion of progress as sequences of progress steps. The progress steps FIRST and REST indicate the first and rest of a compound term, respectively. Parsing is performed left to right; if the parser is looking at the rest of a compound term, the first part must have been parsed successfully. Progress is ordered lexicographically. Steps are recorded left to right, so for example the second term in a sequence is written REST · FIRST; that is, take the rest of the full term and then the first part of that. Consider the following erroneous let term:
(var:identifier rhs:expr)

If a term such as 5 is matched against this pattern, it fails to match the compound structure of the pattern. The matching process does not reach the identifier or expression check. One possible error message is “expected a compound term consisting of an identifier and an expression.” Another is “expected (identifier expr).” In practice, macro writers occasionally write error messages of both forms. We have chosen not to generate such messages automatically for two reasons: first, they do not scale well to large or sophisticated patterns; and second, we consider such messages misguided. Generating messages from patterns is feasible when the patterns are simple, such as the example above. For patterns with deeper nesting and patterns using advanced features, however, generating an accurate message is tantamount to simply displaying the pattern
(let ([a 1] [2 b]) (* a b))

The named-let clause fails at the second sub-term with the progress string REST · FIRST:

(let ([a 1] [2 b]) (* a b))

The normal-let clause, however, fails deeper within the second argument, at REST · FIRST · REST · FIRST · FIRST:

(let ([a 1] [2 b]) (* a b))
itself. While showing patterns in failures is a useful debugging aid for macro developers, it is a bad way to construct robust linguistic abstractions. Error reporting should be based on documented concepts, not implementation details. When a compound pattern such as the one above fails, the pattern’s context is searched and the nearest enclosing description is used to report the error. Consider the following misuse of let:
(define-syntax-class distinct-bindings
  #:description "sequence of binding pairs"
  (pattern (˜var bs (bindings-excluding ’()))
           #:with (var ...) #’(bs.var ...)
           #:with (rhs ...) #’(bs.rhs ...)))

;; seen is a list of identifiers
(define-syntax-class (bindings-excluding seen)
  (pattern ()
           #:with (var ...) ’()
           #:with (rhs ...) ’())
  (pattern ([(˜var var0 (id-excluding seen)) rhs0]
            . (˜var rest (bindings-excluding (cons #’var0 seen))))
           #:with (var ...) #’(var0 rest.var ...)
           #:with (rhs ...) #’(rhs0 rest.rhs ...)))
(let (x 1) (add1 x))

The error selection algorithm from section 4.1 determines that the most specific failure arose trying to match x against the pattern (var:identifier rhs:expr). Here is the full context of the failure:

• matching x against (var:identifier rhs:expr) failed
• while matching x against b:binding
;; seen is a list of identifiers
(define-syntax-class (id-excluding seen)
  (pattern x:identifier
           #:fail-when (for/or ([id seen])
                         (bound-identifier=? #’x id))
           "duplicate variable name"))
• while matching (x 1) against bs:distinct-bindings
• while matching (let (x 1) (add1 x)) against the complex pattern
(let bs:distinct-bindings body:expr)

The first and fourth frames contain ineffable patterns. Discarding them and rephrasing the expected syntax gives us the following context:
Figure 7. Parameterized syntax classes
• expected binding pair, given x
• expected sequence of binding pairs, given (x 1)
is the standard notion of identifier equality in hygienic macro systems [Dybvig et al. 1993]. A ˜var pattern constrains a pattern variable to a syntax class. The colon notation is a shorthand for parameterless syntax classes; e.g., x:identifier is short for (˜var x (identifier)). When the syntax class takes parameters, the explicit ˜var notation is required. A syntax class’s parameters may be used in its sub-expressions, including its description and any of its side conditions. For example, here is a syntax class that recognizes literal natural numbers less than some upper bound:
The message and term of the first frame are used to formulate the error message “let: expected binding pair in: x” because it is the closest one.
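As a concrete illustration, here is a minimal sketch, assuming binding is defined with a description as in section 3.2; every failure inside the class is then reported in terms of the documented concept rather than the raw pattern.

(define-syntax-class binding
  #:description "binding pair"
  (pattern [var:identifier rhs:expr]))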
5. Syntax patterns
The power of syntax-parse is due to its expressive pattern language, an extension of the syntax patterns of MBE. Sections 3 and 4 have introduced some features of our pattern language. This section describes additional pattern forms that, in our experience, increase the expressive power of our system to the level necessary for developing real syntax specifications.
;; ex.: the pattern (˜var n (nat< 10)) matches
;; any literal natural number less than 10
(define-syntax-class (nat< bound)
  #:description (format "natural number < ~s" bound)
  (pattern n:nat
           #:fail-when (not (< (syntax->datum #’n) bound))
           (format "got a number ~s or greater" bound)))
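Here is a hedged usage sketch; the macro nth-digit is our invention, not from the paper, and assumes the nat< class above.

;; hypothetical macro: extract the n-th decimal digit of e
(define-syntax (nth-digit stx)
  (syntax-parse stx
    [(nth-digit (~var n (nat< 10)) e:expr)
     #'(remainder (quotient e (expt 10 n)) 10)]))
;; (nth-digit 12 x) would be rejected with the side-condition
;; message "got a number 10 or greater"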
Patterns  S ::= x
              | x:class
              | (S . S)
              | (S ... . S)
              | datum
              | (˜literal x)
              | (˜var x (class e*))
              | (˜and S+)
              | (˜or S+)
              | (˜describe expr S)
Notice how the upper bound is inserted into both the description and the check message using the format procedure. We can use parameterized syntax classes to give an alternative definition of distinct-bindings, via a syntax class parameterized over the identifiers that have already been seen. Figure 7 shows the alternative definition and the auxiliaries bindings-excluding and id-excluding. The bindings-excluding syntax class accepts sequences of distinct bindings but also requires that the bound names not occur in seen. Consider bindings-excluding’s second pattern; var0 must be an identifier not in seen, and the identifier bound to var0 is added to the blacklisted identifiers for the rest of the binding sequence. Note that var0 is in scope in the argument to bindings-excluding. Since patterns are matched left to right, pattern variable binding also runs left to right, following the principle of scope being determined by control dominance [Shivers 2005]. While it accepts the same terms, this alternative definition of distinct-bindings reports errors differently from the one in section 3.2. The first definition verifies the structure of the binding pairs first, then checks for a duplicate name. The second checks the structure and checks duplicates in the same pass. They thus report different errors for the following term:
Figure 6. Single-term patterns

5.1 Single-term patterns
Figure 6 describes the syntax of syntax patterns, specifically single-term patterns, the kind of pattern that specifies sets of single terms. The first four—pattern variables, annotated pattern variables, pair patterns, and ellipsis patterns—appear in section 3. So do datum patterns, in the form of (), which ends compound patterns.¹ In general, data like numbers, booleans, and strings can be used as patterns that match themselves. The ˜literal pattern form² recognizes identifiers that have the same binding as the enclosed identifier; this

¹ The notation (a b) is shorthand for (a . (b . ())).
² All pattern keywords start with a tilde (˜).
(let ([a 1] [a 2] [x y z]) a)

(The first definition fails on the malformed [x y z] before its duplicate check ever runs; the second, checking left to right, reports the duplicate a at [a 2] first.)
The ˜parse form evaluates its sub-expression and matches it against the given pattern. One use for the ˜parse form is to bind default values within an ˜or pattern, avoiding the need for explicit attribute checks later. Recall parse-field-declaration. Here internal is bound in both alternatives, simplifying the result template:
In such cases, the macro writer must decide the most suitable order of validation. The ˜and pattern form provides a way of analyzing a term multiple ways. Matching order and binding go left to right within an ˜and pattern, so later sub-patterns can rely on earlier ones. The ˜or form matches if any of its sub-patterns match. Unlike in many pattern matching systems, where disjunction, if it is supported at all, requires that the disjuncts bind the same pattern variables, ˜or patterns are more flexible. An ˜or pattern binds the union of its disjuncts’ attributes, and those attributes that do not occur in the matching disjunct are marked “absent.” It is illegal to use an absent attribute in a syntax template, so syntax-parse provides the attribute form, which accesses the value of the attribute, returning false for absent attributes. Using attribute, a programmer can check whether it is safe to use an attribute in a template. Here is an auxiliary function for parsing field declarations for a class macro, where a field declaration contains either a single name or distinct internal and external names:
(define (parse-field-declaration stx)
  (syntax-parse stx
    [(˜or (˜and field:identifier (˜parse internal #’field))
          [internal:identifier field:identifier])
     (make-field #’internal #’field)]))

This example also shows the use of ˜and to sequence an action pattern after a single-term pattern. Since ˜and propagates attributes bound in each of its sub-patterns to subsequent sub-patterns, ˜and can be used to parse a term and then perform actions depending on the contents of the term. The ˜fail pattern allows programmers to perform side-condition checks. Additionally, if the condition evaluates to a syntax value, it is added to the failure as the specific term that caused the error. By default, ˜fail performs early checks. For example, the identifier syntax class performs its test as an early check:
(define (parse-field-declaration stx)
  (syntax-parse stx
    [(˜or field:identifier
          [internal:identifier field:identifier])
     (make-field (if (attribute internal) #’internal #’field)
                 #’field)]))
(define-syntax-class identifier
  (pattern (˜and x (˜fail (not (identifier? #’x)) no-msg))))

The ˜late form turns enclosed checks into late checks. In fact, the #:fail-when keyword option used in distinct-bindings is just shorthand for a combination of ˜late and ˜fail:

(define-syntax-class distinct-bindings
  #:description "sequence of distinct bindings"
  (pattern (˜and (b:binding ...)
                 (˜late (˜fail (check-duplicate #’(b.var ...))
                               "duplicate variable name")))))
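The helper check-duplicate is assumed by the paper; here is one hedged way it could be written, returning the first repeated identifier (a syntax value, which ~fail then reports as the faulty term) or #f.

(define (check-duplicate ids)
  (let loop ([ids (syntax->list ids)] [seen '()])
    (cond [(null? ids) #f]
          [(ormap (λ (id) (bound-identifier=? (car ids) id)) seen)
           (car ids)]
          [else (loop (cdr ids) (cons (car ids) seen))])))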
Some uses of ˜or patterns are better expressed as syntax classes, not least because a syntax class can use a #:with clause to bind missing attributes:

(define-syntax-class field-declaration
  (pattern field:id
           #:with internal #’field)
  (pattern [internal:id field:id]))

The final pattern form, ˜describe, pushes a new description d onto the matching context of its sub-pattern. Hence, if a failure occurs and if there is no other description closer to the source of the error, the description d is used to explain the failure. There is no difference between a description attached to a syntax class and one given via ˜describe. Recall the binding and distinct-bindings syntax class definitions from section 3.2; the binding syntax class could be inlined into distinct-bindings as follows:
(define-syntax-class distinct-bindings
  #:description "sequence of distinct binding pairs"
  (pattern ((˜describe "binding pair"
              [var:identifier rhs:expr]) ...)
           #:fail-when ——))
(define-struct point (x y) #:super geometry #:mutable)

No single-term pattern describes the super option. In particular, the pattern (#:super sup:expr) does not, because #:super and its argument do not appear as a separate parenthesized term, such as (#:super geometry).
In fact, distinct-bindings could be inlined into the let macro itself using ˜describe and action patterns.
Action patterns  A ::= (˜parse S expr)
                     | (˜fail condition message)
                     | (˜late A)
Patterns         S ::= ···
                     | (˜and S {S|A}*)
Figure 8. Action patterns
5.3 Head patterns
The patterns of Sections 5.1 and 5.2 do not provide the power needed to parse macros like define-struct from figure 3. There are elements of define-struct’s syntax that comprise multiple consecutive terms, but single-term patterns describe only single terms, and action patterns do not describe terms at all. An occurrence of the super option, for example, consists of two adjacent terms: the keyword #:super followed by an expression, e.g.,
5.2 Action patterns
The action patterns of figure 8 do not describe syntax; instead, they affect the parsing process without consuming input. The ˜parse form allows the programmer to divert matching from the current input to a computed term; ˜fail provides a way of explicitly causing a match failure; and ˜late affects the ordering of failures.
Head patterns  H ::= (˜seq . L)
                   | (˜and H {H|A}*)
                   | (˜or H+)
                   | (˜describe expr H)
                   | S
List patterns  L ::= ()
                   | (S . L)
                   | (H . L)
                   | (H ... . L)
Patterns       S ::= ···
                   | (H . S)
                   | (H ... . S)
Figure 9. Head patterns

Our solution is to introduce the head patterns of figure 9, which describe sequences of terms. The primary head pattern constructor
is ˜seq, which is followed by a proper list pattern (L). For example, (˜seq x:identifier ... y:expr) matches a sequence of any number of identifiers followed by one expression. Contrast that pattern with (x:identifier ... y:expr), which matches a single compound term containing a sequence of identifiers followed by an expression. A head pattern may be combined with a normal single-term pattern to form a single-term pattern. The combined pattern matches a term by attempting to split it into a prefix sequence of terms that matches the head pattern and a suffix term that matches the tail. The term need not be a compound term if the prefix can be empty. For example, the pattern ((˜seq x y z) w:identifier ...) matches the term (1 2 3 a b) because the term can be split into the prefix of three terms 1 2 3 matching (˜seq x y z) and the suffix (a b) matching (w:identifier ...). Of course, ((˜seq x y z) w:identifier ...) is equivalent to (x y z w:identifier ...). The ˜seq pattern is useful primarily when combined with other pattern forms, such as ˜and and ˜or, as in macros with optional keyword arguments:
(define-struct name:identifier (field:identifier ...)
  (˜or (˜optional (˜seq #:mutable)
                  #:name "mutable clause")
       (˜optional (˜seq #:super super-expr)
                  #:name "super clause")
       (˜optional (˜or (˜seq #:inspector inspector-expr)
                       (˜seq #:transparent))
                  #:name "inspector or transparent clause")
       (˜seq #:property pkey:expr pval:expr))
  ...)
Figure 11. syntax-parse pattern for define-struct

sequences consisting of some number of instances of the alternatives joined together. An alternative may be annotated with one of two repetition constraint forms, ˜optional and ˜once, that restrict the number of times that alternative may appear in the sequence. The meaning of an ˜or-pattern changes slightly when it occurs immediately before ellipses. Instead of “absent” values accruing for every alternative that is not chosen, only the chosen alternative accrues attribute values. Consequently, when the term (1 a 2 b c) is matched against the pattern ((˜or x:identifier y:number) ...), x matches (a b c) and y matches (1 2). These extensions to ellipses and head patterns provide enough power to specify define-struct’s syntax. Figure 11 shows the complete pattern. After the fields come the keyword options, in any order. Keywords and their arguments are grouped together with ˜seq patterns. Many of the options can occur at most once, so they are wrapped with ˜optional patterns. The exception is the #:property option, which can occur any number of times. The #:inspector and #:transparent options are mutually exclusive, so they are grouped together under one ˜optional disjunct.
(define-syntax (test-case stx)
  (syntax-parse stx
    [(test-case (˜or (˜seq #:around proc) (˜seq))
                e:expr)
     —— (attribute proc) ——]))

Head patterns are not intrinsically tied to keywords, of course. We could describe the syntax of let, accommodating both normal-let and named-let syntax, with the following pattern:

(let (˜or (˜seq loop:identifier) (˜seq))
     bs:distinct-bindings body:expr)

Splicing syntax classes encapsulate head patterns. Each variant of a splicing syntax class is a head pattern (H), most often a ˜seq pattern, although other kinds of head pattern are possible. The optional #:around keyword argument could be extracted thus:
(define-splicing-syntax-class optional-around
  (pattern (˜seq #:around proc))
  (pattern (˜seq)
           #:with proc #’(λ (p) (p))))
• Errors are selected from all failures based on progress.
• Errors are described using explicitly-provided descriptions.
A pattern variable annotated with a splicing syntax class can represent multiple terms. In this example, ka matches two terms:
This section presents the semantics of pattern matching in syntax-parse and explains how it implements the two principles. The error selection algorithm is represented by a backtracking monad with a notion of failure that incorporates matching progress. The error description principle is implemented by the semantic functions, which propagate error descriptions as an inherited attribute.
(define-syntax (test-case stx)
  (syntax-parse stx
    [(test-case ka:optional-around e)
     —— #’ka.proc ——]))

(test-case #:around call-with-connection ——)

Head patterns can also occur in front of ellipses. In those cases, a few additional variants are available that enable macro writers to support multiple optional arguments occurring in any order.
6.1 Tracking failure
We model backtracking with failure information with a “single-elimination” monad, a variant of well-known backtracking monads [Hughes 1995]. A single-elimination (SE) sequence consists of a finite list of successes (ai) terminated by at most one failure (φ):
Ellipsis-head patterns  EH ::= (˜or EH+)
                             | (˜once H #:name expr)
                             | (˜optional H #:name expr)
                             | H
Patterns                S  ::= ···
                             | (EH ... . S)
List patterns           L  ::= ···
                             | (EH ... . L)
⟨a1, ···, an; φ⟩

The monad is parameterized by the type of success elements; see below. The sequence of successes may be empty. For simplicity we always include the failure and use • to represent “no failure.” The important aspect of this monad is its handling of failures, which models our macro system’s error selection algorithm. A failure (other than •) consists of a progress (π) together with a set of reasons (ℓ). Each reason consists of a term and a message. When sequences are combined, their failures are joined: (1) the failure with the greatest progress (see figure 5) is selected; (2) if they have the same progress, their message sets are combined. The identity element is •; it is considered to have less progress than any other failure. Failure is a bounded join-semilattice with least element •. Figure 12 defines the monad’s operations, including unit, bind (written ⋆), and disjoin (written ⊕). The unit operation creates a
Figure 10. Ellipsis-head patterns
6. Semantics
The syntax-parse matching algorithm is based on two principles:
5.4 Ellipsis-head patterns
Ellipsis-head patterns—specified in figure 10—are the final ingredient necessary to specify syntax like the keyword options of define-struct. An ellipsis-head pattern may have multiple alternatives combined with ˜or; each alternative is a head pattern. It specifies
SE(A)     se  ::= ⟨a1, ···, an; φ⟩   where ai ∈ A
Failure   φ   ::= • | FAIL(π, {ℓ1, ···, ℓn})
Progress  π   ::= ε | π · FIRST | π · REST | π · LATE
Reason    ℓ   ::= (z, msg)
Message   msg

unit(a)               = ⟨a; •⟩
fail(π, ℓ)            = ⟨; FAIL(π, {ℓ})⟩
⟨a1, ···, an; φ⟩ ⋆ f  = f(a1) ⊕ ··· ⊕ f(an) ⊕ ⟨; φ⟩
⟨a1, ···, ak; φ1⟩ ⊕ ⟨ak+1, ···, an; φ2⟩ = ⟨a1, ···, ak, ak+1, ···, an; φ1 ∨ φ2⟩

Figure 12. Single-elimination sequences and operations

Term          z    ::= x | datum | () | (z1 . z2)
Substitution  σ, ρ ::= {x1 ↦ z1, ···, xn ↦ zn}

σ ⊔ ⟨σ1, ···, σn; φ⟩ = ⟨σ ⊔ σ1, ···, σ ⊔ σn; φ⟩

S[[S]]ρ∆ z π ℓ : SE(Substitution)
A[[A]]ρ∆ π ℓ   : SE(Substitution)
H[[H]]ρ∆ z π ℓ : SE(Substitution, Term, Progress)

Figure 13. Domains, operations, signatures for pattern semantics
overload the combination operator notation; when the right-hand side is an SE-sequence, it indicates that the left-hand substitution is combined with every substitution in the sequence. The pattern denotation functions are parameterized over a set of syntax definitions ∆ and a substitution ρ from patterns already matched. In addition to the appropriate patterns, the denotation functions take up to three additional arguments: a term (z) to parse, a progress string (π), and a failure reason (ℓ). The term and progress arguments change as the matching algorithm descends into the term. The term argument is not needed, however, for action patterns. The reason argument represents the closest enclosing description; it changes when matching passes into a ˜describe form. Each of the pattern denotation functions returns an SE-sequence representing successes and failure. The S and A functions return sequences whose success elements are substitutions. The H function additionally includes terms and progress strings, which indicate where to resume matching.
sequence of one success and no failure. Disjoin (⊕) concatenates successes and joins (∨) the failures, and bind (⋆) applies a function to all successes in a sequence and combines the resulting sequences with the original failure. This monad is similar to the standard list monad except for the way it handles failures. One might expect to use the simpler model of a list of successes or a failure. After all, if a pattern succeeds, backtracking typically occurs only when triggered by a failure of greater progress, which would make any failure in the prior pattern irrelevant. This is not always the case, however. Furthermore, our choice has two advantages over the seemingly simpler model. First, ranking failures purely by progress is compelling and easy for programmers to understand. Second, this monad corresponds neatly to a two-continuation implementation [Wand and Vaillancourt 2004].
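To make the operations concrete, here is a minimal Racket sketch of the SE monad under the definitions of figure 12. All names are ours, and progress<? is the lexicographic comparison sketched in section 4.1.

(struct se (successes failure) #:transparent)      ; ⟨a1, ···, an; φ⟩
(struct failure (progress reasons) #:transparent)  ; #f plays the role of •

(define (se-unit a) (se (list a) #f))
(define (se-fail π ℓ) (se '() (failure π (list ℓ))))

;; join two failures: keep the greater progress, merge reasons on a tie
(define (failure-join φ1 φ2)
  (cond [(not φ1) φ2]
        [(not φ2) φ1]
        [(progress<? (failure-progress φ1) (failure-progress φ2)) φ2]
        [(progress<? (failure-progress φ2) (failure-progress φ1)) φ1]
        [else (failure (failure-progress φ1)
                       (append (failure-reasons φ1) (failure-reasons φ2)))]))

;; disjoin: concatenate successes, join failures
(define (se-disjoin s1 s2)
  (se (append (se-successes s1) (se-successes s2))
      (failure-join (se-failure s1) (se-failure s2))))

;; bind: apply f to every success, then append the original failure
(define (se-bind s f)
  (for/fold ([acc (se '() #f)]
             #:result (se-disjoin acc (se '() (se-failure s))))
            ([a (in-list (se-successes s))])
    (se-disjoin acc (f a))))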
S[[(˜var x)]]ρ∆ z π ℓ = unit({x ↦ z})

S[[(˜var x (cS e))]]ρ∆ z π ℓ
  = S[[S]]{y↦eval(e,ρ)}∆ z π ℓ ⋆ λσ. pfx(x, σ) ⊔ unit({x ↦ z})
    where {cS(y) = S} ∈ ∆

S[[(˜datum d)]]ρ∆ z π ℓ
  = unit(∅)                        when z = d
  = fail(π, (z, “expected d”))     otherwise

S[[(S1 . S2)]]ρ∆ z π ℓ
  = S[[S1]]ρ∆ z1 (π · FIRST) ℓ ⋆ λσ. σ ⊔ S[[S2]]ρ⊔σ∆ z2 (π · REST) ℓ
                                   when z = (z1 . z2)
  = fail(π, ℓ)                     otherwise

S[[(˜and S1 S2)]]ρ∆ z π ℓ = S[[S1]]ρ∆ z π ℓ ⋆ λσ. σ ⊔ S[[S2]]ρ⊔σ∆ z π ℓ
S[[(˜and S1 A2)]]ρ∆ z π ℓ = S[[S1]]ρ∆ z π ℓ ⋆ λσ. σ ⊔ A[[A2]]ρ⊔σ∆ π ℓ
S[[(˜or S1 S2)]]ρ∆ z π ℓ  = S[[S1]]ρ∆ z π ℓ ⊕ S[[S2]]ρ∆ z π ℓ
S[[(˜describe e S)]]ρ∆ z π ℓ = S[[S]]ρ∆ z π (z, eval(ρ, e))
S[[(H1 . S2)]]ρ∆ z π ℓ
  = H[[H1]]ρ∆ z π ℓ ⋆ λ(σ, z′, π′). σ ⊔ S[[S2]]ρ⊔σ∆ z′ π′ ℓ
6.2 Domains and signatures
We explain pattern matching on a core version of the pattern language. The colon shorthand for annotated pattern variables is desugared into the ˜var form. Similarly, all datum patterns are given as explicit ˜datum patterns. All ˜and and ˜or patterns are converted to have exactly two sub-patterns; ˜and patterns must be left-associated so that any action patterns in the original ˜and pattern occur as second sub-patterns of the desugared ˜and patterns. The disjuncts of core ˜or patterns all bind the same attributes; additional bindings via ˜and and ˜parse are added as necessary to make “absent” attributes explicit. We generalize the repetition constraint forms ˜optional and ˜once to a ˜between form. An unconstrained ellipsis head pattern is modeled as a ˜between pattern with Nmin = 0 and Nmax = ∞. Each repetition disjunct has a distinct label (R) used to track repetitions and two message expressions, one to report too few repetitions and one for too many. We omit the ellipsis nesting depth of attributes; it is a static property and as such easy to compute separately. Syntax classes take a single parameter and references to syntax classes are updated accordingly. The syntax class’s variants are combined into a single ˜or pattern, which is wrapped with a ˜describe pattern holding the syntax class’s description. Finally, we assume an eval function for evaluating expressions. The environment of evaluation is a substitution with mappings for attributes encountered previously in the pattern matching process. For simplicity, we do not model the environment corresponding to the program context. It would be easy but tedious to add. Figure 13 defines the additional domains and operations used by the semantics as well as the signatures of the denotation functions. Terms consist of atoms and “dotted pairs” of terms. Parsing success is represented by a substitution σ mapping names to terms. Substitutions are combined by the ⊔ operator, which produces a substitution with the union of the two arguments’ attribute bindings. We
Figure 14. Semantics of S-patterns
6.3 Meaning
A syntax-parse expression has the following form:

(syntax-parse stx [S1 rhs1] ··· [Sn rhsn])
The meaning of the syntax-parse expression is defined via the following denotation:
H[[(˜seq . L)]]ρ∆ z π ℓ
  = S[[S]]ρ∆ z π ℓ ⋆ λσ. unit(σ − {pr, term}, σ(pr), σ(term))
    where S = rewrite-L(L)

S[[(˜end-of-head)]]ρ∆ z π ℓ = unit({pr ↦ π, term ↦ z})

H[[(˜and H1 H2)]]ρ∆ z π ℓ
  = H[[H1]]ρ∆ z π ℓ ⋆ λ(σ, z′, π′). σ ⊔ S[[S2]]ρ∆ (take(z, π, π′)) π ℓ
    where S2 = (H2 . ())

H[[(˜or H1 H2)]]ρ∆ z π ℓ = H[[H1]]ρ∆ z π ℓ ⊕ H[[H2]]ρ∆ z π ℓ

H[[(˜var x (cH e))]]ρ∆ z π ℓ
  = H[[H]]{y↦eval(e,ρ)}∆ z π ℓ ⋆ f
    where {cH(y) = H} ∈ ∆
          f(σ, z′, π′) = unit(g(σ, π′), z′, π′)
          g(σ, π′)     = {x ↦ take(z, π, π′)} ⊔ pfx(x, σ)
S[[S]]∅∆ z ε ℓ
  where result is fresh with respect to S, ∆
        S = (˜or (˜and S1 (˜parse result rhs1)) ··· (˜and Sn (˜parse result rhsn)))
        z = eval(stx, ∅)
        ℓ = (z, “bad syntax”)

If the sequence contains at least one substitution, the result of the syntax-parse expression is the result attribute of the first substitution in the sequence. Otherwise, the syntax-parse expression fails with an error message derived from the SE-sequence’s failure. Figure 14 shows the denotations of single-term patterns. A variable pattern always matches, and it produces a substitution mapping the pattern variable to the input term. A class pattern matches according to the pattern recorded in the syntax class environment ∆. The resulting substitutions’ attributes are prefixed (pfx) with the pattern variable, and the pattern variable binding itself is added. When a ˜datum pattern fails, it synthesizes an error message based on the expected datum. The other pattern variants use the inherited error reason (ℓ), which represents the closest enclosing description around the pattern. That is, it represents the nearest “explainable” frame in the matching context. The pair, head, and ˜and patterns propagate the success substitutions from their first sub-patterns to their second sub-patterns. This allows expressions within patterns to refer to attributes bound by previous patterns. Head patterns also produce a term and progress string in addition to each success substitution; the term and progress indicate where to resume matching.
rewrite-L(())             = (˜end-of-head)
rewrite-L((S1 . L2))      = (S1 . rewrite-L(L2))
rewrite-L((H1 . L2))      = (H1 . rewrite-L(L2))
rewrite-L((EH1 ... . L2)) = (EH1 ... . rewrite-L(L2))
    where pr, term do not appear in the pattern
Figure 16. Semantics of H-patterns

yields a repetition environment mapping a ˜between form to the number of times it has occurred in the sequence so far. A ˜between form’s lower bound is checked when matching proceeds to the tail; its upper bound is checked on every iteration of the head pattern.
A[[(˜parse S e)]]ρ∆ π ℓ = S[[S]]ρ∆ (eval(e, ρ)) (π · LATE) ℓ

A[[(˜fail econd emsg)]]ρ∆ π ℓ
  = fail(π, (z, eval(emsg, ρ)))   when eval(econd, ρ) = z, z ≠ #f
  = unit(∅)                       otherwise

A[[(˜late A)]]ρ∆ π ℓ = A[[A]]ρ∆ (π · LATE) ℓ
6.4 Implementation
The implementation of syntax-parse uses a two-continuation representation of the backtracking monad. The success continuation is represented as an expression where possible, so that substitutions are represented in Racket’s environment rather than as a data structure. Thus, the code is similar to the backtracking-automaton method of compiling pattern matching. We have not yet attempted to add known pattern-matching optimizations to our implementation but plan on doing so. Optimizations must be adapted to accommodate progress tracking. For example, exit optimization [Fessant and Maranget 2001] may not skip a clause that cannot succeed if the clause may fail with greater progress than the exiting clause.
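As a rough illustration of the two-continuation idea (our sketch, not the library’s code): a matcher receives a success continuation and a failure continuation, and disjunction threads the joined failure through the fall-back path.

;; m1, m2: matchers taking (term success-k failure-k);
;; join-failures keeps the failure with greater progress
(define ((try-or m1 m2 join-failures) term sk fk)
  (m1 term sk
      (λ (f1) (m2 term sk
                  (λ (f2) (fk (join-failures f1 f2)))))))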
Figure 15. Semantics of A-patterns Action patterns, unlike other kinds of patterns, do not depend on the term being matched. Like single-term patterns, however, they produce records. Figure 15 displays the denotations of action patterns. The ˜parse pattern evaluates its sub-expression to a term and matches that term against the sub-pattern. The ˜fail pattern evaluates its condition expression in the context of the previous attributes. Depending on the result, it either succeeds with an empty record or fails with the associated label. The ˜late form extends the progress string, marking the enclosed pattern as a late check. A ˜seq pattern matches a sequence of terms if the embedded list pattern would match the compound term consisting of those terms. Rather than duplicating and modifying the denotation function for single-term patterns to work with list patterns, we reuse S and add a new variant of single-term pattern, (˜end-of-head), that sneaks the additional information into the substitution. For head ˜and patterns, we perform the opposite transformation; after the first conjunct matches a sequence of terms, we convert that sequence into a term (take). We convert the second conjunct from a head pattern to a single-term pattern and use it to match the new term. We omit the semantics of ellipsis patterns. It is similar to the semantics of head patterns, but an ellipsis-head pattern additionally
7. Case studies
Racket has included syntax-parse for one year. Reformulating existing macros with syntax-parse can cut parsing code by several factors without loss of quality in error reporting. Users confirm that syntax-parse makes it easy to write macros for complex syntax. The primary benefit, however, is increased clarity and robustness. This section presents two case studies illustrating applications of syntax-parse. The case studies are chosen from a large series to span the spectrum of robustness; the first case study initially performed almost no error checking, whereas the second case study checked errors aggressively. Each case study starts with a purpose statement, followed by an analysis of the difference in behavior and a comparison of the two pieces of code.
7.1 Case: loop
The loop macro [Shivers 2005] allows programmers to express a wide range of iteration constructs via loop clauses. The loop macro
is an ideal case study because the existing implementation performs almost no error-checking, and its author makes the following claim:
helper macro, the parsing code shrank, despite much improved error handling, due to simplifications enabled by syntax-parse.
It is frequently the case with robust, industrial-strength software systems for error-handling code to dominate the line counts; the loop package is no different. Adding the code to provide careful syntax checking and clear error messages is tedious but straightforward implementation work.
7.2 Case: parser
The parser macro [Owens et al. 2004] implements a parser generator for LALR(1) grammars. The macro takes a grammar description and a few configuration options, and it generates a table-driven parser or a list of parsers, if multiple start symbols are given. The parser case study represents macros with aggressive, hand-coded error reporting. The macro checks both shallow properties and context-dependent constraints. The parser macro takes a sequence of clauses specifying different aspects of the parser. Some clauses are mandatory, such as the grammar clause, which contains the list of productions, and the tokens clause, which imports terminal descriptions. Others are optional, such as the debug clause, which specifies a file name where the table descriptions should be printed. In all, there are ten clauses, five mandatory and five optional, and they can occur in any order. The original version used a loop and mutable state to recognize clauses; different clauses were parsed at various points later in the macro’s processing. The new version uses our improved ellipses patterns in two well-defined passes to resolve dependencies between clauses. For example, the productions in the grammar clause depend on the terminals imported by the tokens clause. The second pass involves syntax classes parameterized over the results gathered from the first pass. The original version of parser explicitly detects thirty-nine different syntax errors beyond those caught by MBE-style patterns. Repetition constraints (˜once and ˜optional) on the different clause variants cover thirteen of the original errors plus a few that the original macro failed to check. Pattern variable annotations cover eleven of the original errors, including simple checks such as “Debugging filename must be a string” as well as context-dependent errors such as “Start symbol not defined as a non-terminal.” The latter kind of error is handled by a syntax class that is parameterized over the declared non-terminals. Side-condition checks cover eight errors—such as “duplicate non-terminal definition”—with the use of #:fail-when. The remaining seven checks performed by the original macro belong to catch-all clauses that explain what valid syntax looks like for the given clause or sub-form. Five of the catch-all checks cover specific kinds of sub-forms, such as “Grammar must be of the form (grammar (non-terminal productions ...) ...).” In a few cases the message is outdated; programmers who revised the parser macro failed to update the error message. In the syntax-parse version each of these sub-forms is represented as a syntax class, which automatically acts as a local catch-all according to our error message generation algorithm (section 4.2); syntax-parse reports the syntax class’s description rather than reciting the macro’s documentation. (A macro writer could put the same information in the syntax class description, if they wanted to.) The final two checks are catch-alls for parser clauses and the parser form itself. These are implemented using ˜fail and patterns crafted to catch clauses that do not match other clause keywords. In most cases the error messages are rephrased according to syntax-parse conventions.
For example, where the original macro reported “Multiple grammar declarations,” the new macro uses “too many occurrences of grammar clause”; and where the original macro reported “End token must be a symbol,” the new macro produces the terser message “expected declared terminal name.” The original version devoted 570 lines to parsing and processing, counting the macro and its auxiliary functions. The line count leaves out separate modules such as the one that implements the LALR(1) algorithm. In the original code, parsing and processing are tightly intertwined, and it is impossible to directly count the
— Olin Shivers, 2005

In other words, adding error-checking to the loop macro is expected to double the size of the code. Using syntax-parse we can do better. The original loop macro performs little error checking; in thirty-two exported macros there are only three syntax validation checks plus a handful of internal sanity checks. The exported macros consist of the loop macro itself plus thirty-one CPS macros [Hilsdale and Friedman 2000] for loop clauses such as for and do. CPS macros pose challenges for generating good error messages because the macro’s syntax differs from the syntax apparent to the user due to the CPS protocol. When the programmer writes (for x in xs), the loop macro rewrites it as (for (x in xs) k kargs) to accommodate the macro’s continuation. Errors in the programmer’s use of for should be reported in terms of the original syntax, not the rewritten syntax. We accomplish this by parsing the syntax in two passes. We parse the CPS-level syntax and reconstruct the original term, and then we parse that term. Twenty of the CPS macros are expressed using define-simple-syntax, a simplified version of define-syntax. We changed define-simple-syntax to automatically rewrite these macros’ patterns to perform two-stage parsing; we also changed them to use syntax-parse internally so that the simple macros could use annotations and the other features of our system. The other eleven CPS macros were transformed by hand. Another hazard of CPS macros is inadvertent transfer of control to a macro that does not use the CPS protocol, resulting in incoherent errors or unexpected behavior. In Racket, this problem can be prevented by registering CPS macros and checking their applications. We use a syntax class to recognize registered CPS macros. Once the concrete syntax is separated from the CPS-introduced syntax, validating it is fairly simple. Many of the loop forms take only expressions, so validation is trivial. Some of the loop forms require identifier annotations or simple side conditions. The initial and bind loop forms have more structured syntax, so we define syntax classes for their sub-terms, including a shared syntax class var/vars; it represents a single variable or a group of variables. A loop-clause keyword such as for is implemented by a macro named loop-keyword/for; the name is chosen to reduce contention for short names. The loop macro rewrites the loop-clause keywords, except that programmers can write the long form in parentheses, e.g., ((loop-keyword/for) x in xs), to avoid the rewriting. The code to recognize and rewrite both cases is duplicated, since for enforces the same protocol for its auxiliaries: in becomes for-clause/in. In the syntax-parse version, we define a loopkw syntax class that does the rewriting automatically. The syntax class is parameterized so it can handle both loop and for keywords. The original version of the loop macro consists of 1840 lines of code, not counting comments and empty lines. The implementation of the loop keyword macros takes 387 lines; the rest includes the implementation of its various intermediate languages and scope inference for loop-bound variables. The syntax-parse version is 1887 lines, an increase of forty-seven lines. The increase is due to the new version of define-simple-syntax. Overall, the increase is 12% of the size of the main body of the macros and merely 2.6% of the entire code, which falls far short of the 100% increase predicted by the package’s highly experienced author. Aside from the new
of abstraction. Even though syntax-parse has been available for less than a year, it has become clear that it improves on MBE-style macros to the same degree—or perhaps a larger one—that MBE improved over Lisp-style macros.
lines of code dedicated to each. In the new version, parsing and processing took a total of 378 lines of code, consisting of 124 lines for parsing (25 for the main macro pattern and 99 for syntax class definitions) and 254 lines for processing. By reasoning that the lines dedicated to processing should be roughly equivalent in both versions, we estimate 300 lines for processing in the original version, leaving 270 for parsing. Thus the syntax-parse version requires less than half the number of lines of code for parsing, and the new parsing code consists of modular, declarative specifications. The error reporting remains of comparable quality.
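For a flavor of how such constraints read, here is a hedged sketch with invented clause shapes; it is not the actual parser macro, only an indication of how ~once and ~optional might cover the mandatory and optional clauses.

(syntax-parse stx
  [(parser (~or (~once ((~literal grammar) prod ...)
                       #:name "grammar clause")
                (~once ((~literal tokens) group ...)
                       #:name "tokens clause")
                (~optional ((~literal debug) file:str)
                           #:name "debug clause"))
           ...)
   ——])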
Acknowledgments

We are grateful to Matthew Flatt, Guy Steele, Sam Tobin-Hochstadt, and Jon Rafkind for feedback on the design and implementation of syntax-parse.
References

Cadence Research Systems. Chez Scheme Reference Manual, 1994.

R. Culpepper and M. Felleisen. Taming macros. In International Conference on Generative Programming and Component Engineering, pages 225–243, 2004.
8. Related work
Other backtracking parsers, such as packrat parsers [Ford 2002], also employ the technique of tracking and ordering failures. Unlike shift/reduce parsers, which enjoy the viable-prefix property, packrat parsers cannot immediately recognize when an input stream becomes nonviable—that is, where the error occurs. Instead, they maintain a high-water mark, the failure that occurs furthest into the input along all branches explored so far. While these string parsers can represent progress as the number of characters or tokens consumed, syntax-parse uses a notion of progress based on syntax tree traversal. Our ordering of parse failures is also similar to the work of Despeyroux [1995] on partial proofs in logic programming. In that work, a set of inference rules is extended with “recovery” rules that prove any proposition. The partial proofs are ordered so that use of a recovery rule has less progress than any real rule and uses of different original rules are incomparable; only the maximal proofs are returned. In contrast to the order of that system, which is indifferent to the system’s rules and propositions, our system uses the pragmatics of parsing syntax to define the order. Another line of research in macro specifications began with static checking of syntactic structure [Culpepper and Felleisen 2004] and evolved to encompass binding information and hygienic expansion [Herman and Wand 2008]. These systems, however, are incapable of fortifying a broad range of widely used macro programming idioms, and they do not address the issues of error feedback or of modular syntax specification addressed by our system.
T. Despeyroux. Logical programming and error recovery. In Industrial Applications of Prolog, Oct. 1995.

R. K. Dybvig, R. Hieb, and C. Bruggeman. Syntactic abstraction in Scheme. Lisp and Symbolic Computation, 5(4):295–326, Dec. 1993.

F. L. Fessant and L. Maranget. Optimizing pattern matching. In International Conference on Functional Programming, pages 26–37, 2001.

R. B. Findler, J. Clements, C. Flanagan, M. Flatt, S. Krishnamurthi, P. Steckler, and M. Felleisen. DrScheme: A programming environment for Scheme. Journal of Functional Programming, 12(2):159–182, 2002.

M. Flatt and PLT. Reference: Racket. Technical report, PLT Inc., January 2010. http://racket-lang.org/tr1/.

B. Ford. Packrat parsing: a practical linear-time algorithm with backtracking. Master’s thesis, Massachusetts Institute of Technology, Sept. 2002.

D. Herman and M. Wand. A theory of hygienic macros. In European Symposium on Programming, pages 48–62, Mar. 2008.

E. Hilsdale and D. P. Friedman. Writing macros in continuation-passing style. In Workshop on Scheme and Functional Programming, pages 53–59, 2000.

J. Hughes. The design of a pretty-printing library. In Advanced Functional Programming, First International Spring School on Advanced Functional Programming Techniques-Tutorial Text, pages 53–96, London, UK, 1995. Springer-Verlag.

E. E. Kohlbecker and M. Wand. Macro-by-example: Deriving syntactic transformations from their specifications. In Symposium on Principles of Programming Languages, pages 77–84, 1987.

P. J. Landin. Correspondence between ALGOL 60 and Church’s lambda-notation: part I. Commun. ACM, 8(2):89–101, 1965.

S. Owens, M. Flatt, O. Shivers, and B. McMullan. Lexer and parser generators in Scheme. In Workshop on Scheme and Functional Programming, pages 41–52, Sept. 2004.

O. Shivers. The anatomy of a loop: a story of scope and control. In International Conference on Functional Programming, pages 2–14, 2005.

M. Sperber, R. K. Dybvig, M. Flatt, A. van Straaten, R. Findler, and J. Matthews. Revised⁶ report on the algorithmic language Scheme. Journal of Functional Programming, 19(S1):1–301, Aug. 2009.

M. Wand and D. Vaillancourt. Relating models of backtracking. In International Conference on Functional Programming, pages 54–65, 2004.
9. Conclusion
Our case studies, our other experiences, and reports from other programmers confirm that syntax-parse makes it easy to write easy-to-understand, robust macros. Overall syntax-parse macros take less effort to formulate than comparable macros in MBE-based systems such as syntax-case and syntax-rules or even plain Lisp-style macros. Also in contrast to other macro systems, the syntax-parse style is distinctively declarative, closely resembling grammatical specification with side conditions. Best of all, these language extensions are translated into implementations that comprehensively validate all the constraints and that report errors at the proper level
Functional Parallel Algorithms

Guy E. Blelloch
Carnegie Mellon University
Pittsburgh, PA
[email protected]
Specifying and Verifying Sparse Matrix Codes∗

Gilad Arnold
University of California, Berkeley
[email protected]

Johannes Hölzl
Technische Universität München
[email protected]

Ali Sinan Köksal
École Polytechnique Fédérale de Lausanne
[email protected]

Rastislav Bodík
University of California, Berkeley
[email protected]

Mooly Sagiv
Tel Aviv University†
[email protected]
Abstract
hierarchy, expose parallelism that fits the hardware, and tailor the layout to the operations that will be performed on the matrix. The development of a sparse matrix format is nontrivial; formats exploit algebraic properties such as commutativity, associativity, and zero, and have to choose judiciously between linear and random access to array data to improve cache locality, memory bandwidth, and the use of vector instructions. Sparse codes are used heavily in scientific applications, simulations and data mining, as well as other domains. It is expected that more formats will be designed to support future (parallel) platforms. Our goal is to simplify their development. Sparse matrix codes are typically implemented using imperative languages like C and Fortran. This gives programmers control over low-level details of the computation, allowing them to create optimized implementations. However, imperative implementations obfuscate the structure of the format because logically independent steps of sparse matrix construction are fused, resulting in code with loop nests that contain complex array indirections, in-place data mutation and other low-level optimizations. Not only is the code hard to read, it is also challenging to verify. In fact, we failed to verify the functional correctness of even simple formats using several state-of-the-art tools. The key reason was that describing the properties of the format construction expressed using such low-level implementations required complex invariants that were hard to formulate. Consequently, we sought to raise the level of abstraction in programming sparse matrix formats. We describe a new approach to implementing and verifying sparse matrix codes. The main idea is to specify sparse codes as functional programs, where a computation is a sequence of high-level transformations on lists. We then use Isabelle/HOL to verify full functional correctness of programs. We identify a “little language” (LL) for specifying a variety of sparse matrix formats. LL is a strongly typed, variable-free functional programming language in the spirit of FP [1]. It is also influenced by such languages as APL, J, NESL and Python, but favors simplicity and ease of programming over generality and terseness. LL provides several built-in functions and combinators for operations over vectors and matrices common in sparse formats. LL is restricted by design, lacking custom higher-order functions, recursive definitions, and a generic reduction operator. These limitations of LL, as well as its purely functional semantics, facilitate automatic verification of sparse codes. The contributions of this paper can be summarized as follows.
Sparse matrix formats are typically implemented with low-level imperative programs. The optimized nature of these implementations hides the structural organization of the sparse format and complicates its verification. We define a variable-free functional language (LL) in which even advanced formats can be expressed naturally, as a pipeline-style composition of smaller construction steps. We translate LL programs to Isabelle/HOL and describe a proof system based on parametric predicates for tracking relationships between mathematical vectors and their concrete representations. This proof theory automatically verifies full functional correctness of many formats. We show that it is reusable and extensible to hierarchical sparse formats.

Categories and Subject Descriptors D.3.2 [Programming Languages]: Language Classifications—Specialized application languages; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs—Mechanical Verification

General Terms Languages, Verification
1. Introduction
Sparse matrix formats compress large matrices with a small number of nonzero elements into a more compact representation. The goal is to both reduce memory footprint and increase efficiency of operations such as sparse matrix-vector multiplication (SpMV). More than fifty formats have been developed; the reason for this diversity is that a format may improve memory locality in a given memory
∗ Research supported by Microsoft (Award #024263) and Intel (Award #024894) funding and by matching funding by U.C. Discovery (Award #DIG07-10227). Additional support comes from Par Lab affiliates National Instruments, NEC, Nokia, NVIDIA, Samsung, and Sun Microsystems.
† This work was done while visiting Stanford University supported in part by grants NSF CNS-050955 and NSF CCF-0430378 with additional support from DARPA.
• We design a variable-free functional language for sparse matrix
codes. We show how interesting and complex sparse formats can be naturally and concisely expressed in LL (Section 3).
• We describe a powerful proof method for automatic verification
of sparse matrix codes using Isabelle/HOL [11] (Section 4). • We evaluate the reusability of proof rules in our theory and
its extensibility to proving additional formats. We show that our language and verifier can accommodate complex formats including Jagged Diagonals (JAD) and Coordinate (COO), as well as hierarchical formats including Sparse CSR (SCSR), register- and cache-blocking schemes (Section 5). As far as we know, this is the first successful attempt in proving full functional correctness of operations on such formats.
( a 0 0 0 )
( b c 0 0 )
( 0 0 0 0 )
( 0 d 0 e )

(a) Dense matrix.

P: [1 3 0 2]  D: [3 5]          R: [1 3 3 5]
J: [0 1 0 1 3]                  J: [0 0 1 1 3]
V: [b d a c e]                  V: [a b c d e]

(b) JAD sparse format.          (c) CSR sparse format.

Figure 1. Two sparse matrix formats. Shown are “imperative” representations; their LL counterparts are in Figures 2 and 4.
arguments by name; instead, they transform a single, unnamed input parameter. For example, if the input to a function is a pair, then a function extracts the first element using the built-in function fst. LL is strongly typed and datatypes include numbers, Boolean values, pairs and lists. Vectors are represented by lists, and matrices by lists of (row) vectors. Compressed matrix representations use a variety of nested data structures built of lists and pairs. The three steps of JAD construction in LL are visualized in Fig. 2, which shows the dense matrix, the resulting JAD matrix, as well as the intermediate values of JAD construction. Notice that the JAD representation in LL (the result in Fig. 2) is more abstract than the JAD format in C (Fig. 1(b)). Where LL formats rely on lists of lists, the C formats linearize the outer list and create explicit indexing structures to access the inner lists. LL thus frees the programmer from reasoning about these optimized data structure layouts, eliminating dependence on explicit array indirection. The first step compresses rows by invoking the constructor for the sparse format CSR. In the second step, the function lenperm sorts the compressed rows by decreasing length:
2. Overview
This section outlines our solutions for implementing and verifying sparse matrix programs. We demonstrate our language using the JAD sparse format, and the proof system using the CSR sparse format. These formats are introduced properly in Section 3; in this section, we will make do with an informal overview of the formats and the examples shown in Fig. 1.
We are currently writing a compiler which automatically generates efficient low-level code from LL programs.
2.1 Sparse matrix codes in the LL language
Sparse matrix formats are usually constructed with a sequence of transformations. For example, a JAD sparse matrix is constructed in three steps, by (i) compressing each row in the dense matrix; (ii) sorting compressed rows by their length; and (iii) transposing the rows. Efficient imperative implementations usually fuse these distinct steps, which complicates code comprehension and maintenance. We define a small functional language that keeps these steps separate. The fusion, necessary for performance, will be performed by a data-parallel compiler (which is under development and outside the scope of this paper). Let us compare the characteristics of imperative and functional implementations of JAD format construction. Consider first the C code that compresses a dense matrix M into the JAD format, represented by arrays P, D, J, and V. The low-level code reads and writes a single word at a time, relies heavily on array indirections (i.e., array accesses whose index expressions are themselves array accesses), and explicitly spells out loop boundaries. The code does not distinguish the three construction steps and provides little insight into the JAD format:
def lenperm: [(len, (#, id))] -> sort -> rev -> [snd] -> unzip
Here, the syntax [f] denotes a map that applies the function f over the elements of the input list: len, # and id return the length of the current element, the position index of that element in the list, and the element itself (identity), respectively. The third-step function (fst, snd -> trans) takes a pair and produces a pair in which the first element is unchanged and the second element is transposed. In summary, LL lifts an intricate imperative computation into a cleaner functional form, exposes high-level stages and the flow of data from one stage to another, and encourages the programmer to think about invariants over intermediate results. These benefits are not merely due to the use of functional programming. We believe that they are equally attributed to our careful selection of a very simple subset of functional language features, designed with the sparse matrix domain in mind. In particular, LL does not support lambda abstractions, which encourages expressing computations as pipelines of functions. LL also excludes definitions of recursive functions and a general fold operator, both of which are compensated for by a versatile set of built-ins (e.g., zip and sum) and combinators for handling lists (e.g., map and filter). These restrictions contribute to our ability to automatically verify LL programs because they sidestep the need to infer induction invariants, a hard task for automated tools. The LL language is introduced in detail in Section 3. We have recently developed a compiler for LL that relies on optimization techniques pioneered in NESL [3] and later generalized in Data Parallel Haskell [4]. Thanks to LL’s simplicity, we were able to simplify the compilation and identify more opportunities for optimization. Initial results indicate that code generated for real-world formats such as register-blocked CSR (see Section 5.2) runs as fast as hand-optimized code and scales well to multiple cores.
lenperm(M, P);  /* obtain row permutation */
for (d = k = 0; d < n; d++) {
  kk = k;
  for (i = 0; i < n; i++) {
    for (j = nz = 0; j < m; j++)
      if (M[P[i]][j])
        if (++nz > d) break;
    if (j < m) {
      J[k] = j;
      V[k] = M[P[i]][j];
      k++;
    }
  }
  if (k == kk) break;
  D[d] = k;
}
Contrast the C code with this LL program, which is a composition of three functions corresponding to the steps in JAD construction. The function composition operator is ->.

def jad: csr -> lenperm -> (fst, snd -> trans)
LL is a functional language rooted in the variable-free style of FP/FL [1], which means that functions do not refer to their
    ( a 0 0 0 )
    ( b c 0 0 )
    ( 0 0 0 0 )
    ( 0 d 0 e )

      --csr-->                 ( (0,a) | (0,b) (1,c) | · | (1,d) (3,e) )

      --lenperm-->             ( (1 3 0 2), ( (0,b) (1,c) | (1,d) (3,e) | (0,a) | · ) )

      --(fst, snd -> trans)--> ( (1 3 0 2), ( (0,b) (1,d) (0,a) | (1,c) (3,e) ) )

Figure 2. The three steps of JAD format construction. Shown are the dense matrix, the JAD matrix, and the two intermediate values.

2.2 Verifying sparse matrix codes
There are at least two arguments for full functional verification of sparse matrix codes. First, classical static typing is insufficient for static bug detection because these programs contain array indirection, whose memory safety would typically be guaranteed only with run-time safety checks. Dependent type systems may be able to prove memory safety but, in our experience, the necessary dependent-type predicates would need to capture invariants nearly as complex as those that we encountered during full functional verification. For example, to prove full functional correctness, one may need to show that a list is some permutation of a subset of values in another list; to prove memory safety, one may need to show that the values in a list are smaller than the length of another list. It thus seemed to us that, with a little extra effort, we could use theorem proving to extend safety to full functional correctness. The second reason for full functional verification is synthesis of sparse matrix programs, including the discovery of new formats. In inductive synthesis, which is conceptually a search over a space of plausible (i.e., potentially semantically incorrect) implementations, a full functional verifier is a prerequisite because it is the arbiter of correctness of the selected implementation. Synthesis, however, is outside the scope of this paper.

Before settling on the design presented in this paper, we set as our goal the full functional verification of imperative sparse code, in the style presented in Section 2.1. However, even the simple CSR format turned out to be rather overwhelming. We attempted to verify its correctness in multiple ways: (i) manually with Hoare-style logic, both with first-order predicates and inductive predicates; (ii) with ESC/Java [6]; (iii) with TVLA [13]; and (iv) using a SAT-based bounded model checker. The results were unsatisfactory, either because it took weeks to develop the necessary invariants (i, ii), because the abstraction was too complex for us to manage (iii), or because the checker scaled poorly (iv). Eventually, we concluded that we needed to verify sparse codes at a higher level of abstraction (and separately compile the verified code into efficient low-level code). Turning our attention to functional programs allowed us to replace explicit loops over arrays with maps and a few fixed reductions over lists, which in turn simplified the formulation and encapsulation of inductive invariants.

Let us use the simple CSR format to give the rationale for the design of our proof system. Suppose that A and x are concrete language objects that, respectively, contain dense representations of a mathematical matrix B and a vector y. We want to prove that the product of the CSR-compressed A with x produces an object that is a valid (dense) representation of the vector B · y. Note that the product is CSR-specific. Formally, our verification goal is

    csrmv(csr(A), x) ≈^m B · y

The goal expresses the relationship between a mathematical object and its concrete counterpart with the representation relation a ≈^k b, which states that the concrete object a represents the mathematical vector b: for all i < k, a[i] equals b_i, and the lengths of a and b are k. In the course of the proof, we may need to track relationships on various kinds of concrete objects; one of our contributions is to define suitable representation relations for the objects that arise in sparse matrix programs.

We use Isabelle/HOL as our underlying theorem prover. We embed LL functions in Isabelle using typed λ-calculus and Isabelle libraries. Our proofs deploy two techniques: (a) term simplification, which rewrites subterms in functions into simpler, equivalent ones; and (b) introduction, which substitutes a proof goal containing a certain term with alternative goal(s) that do not contain the term, and whose validity implies the validity of the original goal. In our example, term simplification unfolds the definitions of csrmv and csr and applies standard rules for simplifying function application and composition, map and filter operations on lists, and extraction of elements from pairs. This results in the goal

    [enum -> [snd != 0 ? ] -> [snd * x[fst]] -> sum](A) ≈^m B · y    (1)

The LL function on the left enumerates each row of A into a list of (column index, value) pairs, then filters out pairs whose second element is zero ([snd != 0 ? ]). For the remaining pairs, it multiplies the second (nonzero) component with the value of x at the index given by the first component ([snd * x[fst]]). Finally, it sums the resulting products (sum).

So far, simplification has done a good job. To carry out the next step of the proof, we observe that the missing zeros do not affect the result of the computation, so we would like to simplify the left-hand side by rewriting away the filter ([snd != 0 ? ]); this would effectively "desparsify" the left-hand side, moving it closer to the mathematical right-hand side. Unfortunately, standard simplification available in prover libraries cannot perform the rewrite; we would need to add a rule tailored to this format. The hypothetical rule, shown below, would match p with snd != 0 and f with snd * x[fst].

    ∀y . ¬p(y) −→ f(y) = 0
    -------------------------------------
    [p ? ] -> [f ] -> sum  =  [f ] -> sum

The rule would achieve the desired simplification, but we refrain from adding such a rule because it would take considerable effort to prove it. Additionally, the rule would be of little use in cases where the LL operations appear in a slightly syntactically different way. We will instead rely on introduction which, by substituting the current goal with a set of goals, isolates independent pieces of reasoning. Introduction rules tend to be more general than simplification rules because they are concerned with a single construct from the current goal. Also, the validity of introduction rules is easier to establish.

Our first introduction rule substitutes in the goal (1) the whole result vector with a single element of that vector. In effect, this removes the outermost map from the LL function on the left-hand side. Semi-formally, the rule for map can be stated as follows:

    length of A is m        ∀i < m . f(A[i]) = B_i
    ----------------------------------------------    (2)
    [f ](A) ≈^m B

In goal (1), f matches the entire chain of enum -> . . . -> sum and the new subgoals are

    (i)  length of A is m
    (ii) ∀i < m . (enum -> [snd != 0 ?] -> [snd * x[fst]] -> sum)(A[i]) = Σ_j B_{i,j} · y_j

We now need a second introduction step to remove the summation on both sides of the equality: instead of requiring equivalence between sums of sequences of numbers, we will require equivalence between the values in the sequences themselves. In order for such a rule to be general enough, we need to permit arbitrary permutations of the values in a sequence, to prove programs that exploit associativity and commutativity of addition. A hypothetical rule may look as follows, where [x_i | p(x_i)]_{i=a,...,a+δ} denotes the construction of an ordered list of those elements among x_a, . . . , x_{a+δ} that satisfy p.

    ∃n′ ≤ n, permutation P .  f(A[i]) ≈^{n′} [B_{i,j} | B_{i,j} ≠ 0]_{j=P_0,...,P_{n′−1}}
    ------------------------------------------------------------------------------------
    sum(f(A[i])) = Σ_j B_{i,j}

This rule is problematic for two reasons. First, it is more complex than what we may want to prove. For example, the premise constructs a filtered and permuted mathematical vector on the right-hand side (via list comprehension), rather than keeping the mathematical object untouched. This might hinder our ability to link our proof goal to the original input matrix in the assumptions of the theorem. Second, the rule is not as general as we would like because a concrete representation may contain zeros.
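The "desparsify" intuition itself is easy to state executably over plain Haskell lists. The following property is our illustrative sketch, not an LL or Isabelle artifact:

    -- If f vanishes on every element that p rejects, filtering before
    -- summing does not change the result.
    prop_desparsify :: (Int -> Bool) -> (Int -> Int) -> [Int] -> Bool
    prop_desparsify p f xs
      | all (\y -> p y || f y == 0) xs = sum (map f (filter p xs)) == sum (map f xs)
      | otherwise                      = True   -- premise fails; nothing to check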
Our approach is to enrich the representation relation (a ≈ b). This relation uses plain equality to relate single elements from the two vector objects, which limits its applicability when more subtle mappings are needed. To express a relation where, say, each element in a concrete representation equals the corresponding vector element multiplied by some value, we parameterize the representation relation with an inner relation that describes how individual elements represent their mathematical counterparts. Individual elements need not be scalars; they could be, recursively, lists. Therefore, inner relations could be parameterized by further inner relations.

Our domain proof theory for sparse matrices is novel in two ways. First, we define common representation relations that occur in our domain. Our infrastructure is powerful because we (a) insist on relaxing invariants as much as possible (e.g., zeros may still be present in a compressed representation); (b) encapsulate many quantifications and implications in the representation relations (e.g., universal quantification over all indexes of a vector, existence of a permutation); and (c) include necessary integrity constraints in the representation relations (e.g., lengths must match). The representation relations we define include the indexed list (ilist), where the element at position i represents the ith vector element; the value list (vlist), in which all nonzero values are represented; and the associative list (alist), which contains index-value pairs. These representation relations raise the level of abstraction and focus theory development on these prevalent data representations. The use of representation relations also prevents oversimplification of proof terms by concealing their internal conjuncts from Isabelle's simplifier.

The second novelty is parameterizing the inner predicate, which describes how the vector elements represent their mathematical counterparts. In the case of a vector of numbers, we use equality. For matrices, the inner relation relates a single row to its concrete indexed-list representation (ilist); technically, the inner relation predicate is a parameter to the (outer) representation predicate for the whole matrix. In addition to reducing the number of rules, parameterization helps with syntactic matching and substitution of inner comparators during introduction. For example, with a parameterized relation, an introduction rule for map similar to that in Eq. (2) can be written more generally and concisely: the conclusion of the rule contains an indexed-list representation relation where the concrete object is the term [f ](x) (i.e., a map with an arbitrary function f over x) and the inner representation relation is some arbitrary predicate P —our parameter. The premise of the rule is again an indexed-list representation relation where the concrete object is x and the inner representation relation is λi a b. P (i, a, f (b)). Fortunately, Isabelle can match and substitute terms that contain parameters such as P (as well as f and x); these rules can thus be applied automatically. The representation relations are described in Section 4. Section 5 evaluates whether they improve reuse of rules and thus simplify theory development; we argue that the principles used in our approach are crucial for proofs on nested data representations. It may be interesting to apply such parameterized representation relations in other domains as well.

3. High-Level Sparse Matrix Programming

Sparse matrix codes can often be decomposed into sequences of high-level transformations. This section describes LL and its use for expressing such computations naturally and concisely.

3.1 Introduction to LL

The LL language constructs are presented in Fig. 3. The semantics of each construct is shown either by translation to Isabelle/HOL λ-calculus and the standard library for lists [11], or by de-sugaring to simpler LL constructs. The language includes (a) general functions such as identity, equality, constants, conditional branching, and a name binding form used for assigning names to components of an input value; (b) construction of pairs/tuples and extraction of values from pairs; (c) pipeline- and application-style composition, as well as a curried application operator; (d) standard arithmetic operators and comparators; (e) Boolean logic operators; and (f) list handling functions (e.g., distribution of values onto lists, zipping, enumeration, concatenation) and combinators (map, filter, and a unified comprehension syntax).

3.2 Specification of sparse codes using LL
Compressed sparse rows (CSR). This format compresses each row by storing nonzero values together with their column indexes. The resulting sequence of compressed rows is not further compressed, so empty (all zero) rows are retained. This enables random access to the beginning of each row, but requires linear traversal to extract a particular element out of a row. CSR is widely used because it is relatively simple and entails good memory locality for row-wise computations such as SpMV. Implementing CSR in C, shown below,1 is not trivial. Traversal of the dense matrix (construction) or the compressed rows (SpMV) is done with nested loops. Single values are copied (construction) or extracted (SpMV) through array indirection. Compressed row boundaries need to be stored and observed. That said, the resulting SpMV code is rather efficient as the inner product of each row is incrementally accumulated, using very few instructions and avoiding unnecessary memory accesses. Applying CSR construction to the 4-by-4 matrix in Fig. 1(a) yields the data structure in Fig. 1(c).

1 For brevity, we omit memory allocation and initialization and assume that matrix dimensions are known at compile-time.
    id                               λx. x
    eq (=), neq (!=)                 λ(x, y). x = y ;  λ(x, y). x ≠ y
    n, true, false                   λy. n ;  λy. true ;  λy. false
    f ? g | h                        λx. if f x then g x else h x
    l1,...,lk = f : g †              (λ(x1, . . . , xk). g[li / λy. xi]) ◦ f
    (f1,f2,...,fk)                   λx. (f1 x, f2 x, . . . , fk x)
    fst, snd                         λ(x, y). x ;  λ(x, y). y
    f -> g                           g ◦ f
    g (f1,...,fk)                    (f1,...,fk) -> g
    g ‘ f                            (f, id) -> g
    add (+), sub (-), mul (*),       λ(x, y). x + y ;  λ(x, y). x − y ;  . . .
      div (/), mod (%)
    leq (<=), lt (<), geq (>=),      λ(x, y). x ≤ y ;  λ(x, y). x < y ;  . . .
      gt (>)
    sum (/+), prod (/*)              foldl (op +) 0 ;  foldl (op ∗) 1
    and (&&), or (||)                λ(x, y). x ∧ y ;  λ(x, y). x ∨ y
    neg (!)                          λx. ¬x
    conj (/&&), disj (/||)           foldl (op ∧) True ;  foldl (op ∨) False
    len                              length
    rev                              rev
    sub (f [g ])                     λ(v, i). v ! i
    subseq (f [g ])                  λ(v, s). map (λi. v ! i) s
    distl, distr                     λ(x, v). map (λy. (x, y)) v ;  λ(v, x). map (λy. (y, x)) v
    zip, unzip                       unsplit zip ;  λl. (map fst l, map snd l)
    enum                             λv. zip [0 ..< length v] v
    concat                           concat
    infl                             λ(d, n, v). foldr (λ(i, x) v′. v′[i := x]) (replicate n d) v
    gather                           λxs. map (λk. (k, map snd (filter (λ(k′, v). k′ = k) xs)))
                                         (remdups (map fst xs))
    sort                             sort_key fst
    trans                            λv. [map (λv′. v′ ! i) (takeWhile (λv′. i < length v′) v) .
                                          i ← [0 ..< if v = [] then 0 else length (v ! 0)]]
    map f                            map f
    filter f                         filter f
    [l1,...,ln = f : g ? h] ‡        filter (l1,...,ln = f : g) -> map (l1,...,ln : h)

Figure 3. LL constructs and their translation to Isabelle/HOL. Here, f , g and h denote functions, n a number, and l a label. Alternative infix, prefix or mixfix notation is shown in parentheses. † f defaults to id. ‡ Value naming is optional; f and h default to id and g to true.
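To give a feel for these semantics, here is a hedged Haskell transcription of a few of the list built-ins. The helper upd is ours, and trans is the "lossy" transpose described in footnote 2; this is a sketch, not the Isabelle embedding itself:

    import Data.List (nub)

    distl :: (a, [b]) -> [(a, b)]
    distl (x, v) = map (\y -> (x, y)) v

    enum :: [a] -> [(Int, a)]
    enum v = zip [0..] v

    -- inflate an associative list into a dense list of length n,
    -- using d for positions that are not mentioned
    infl :: (a, Int, [(Int, a)]) -> [a]
    infl (d, n, v) = foldr (\(i, x) acc -> upd i x acc) (replicate n d) v
      where upd i x ys = take i ys ++ [x] ++ drop (i + 1) ys

    gather :: Eq k => [(k, v)] -> [(k, [v])]
    gather xs = [ (k, [ v | (k', v) <- xs, k' == k ]) | k <- nub (map fst xs) ]

    -- "lossy" transpose: column i is read only from the prefix of rows
    -- that are long enough
    trans :: [[a]] -> [[a]]
    trans v = [ map (!! i) (takeWhile (\r -> i < length r) v)
              | i <- [0 .. cols - 1] ]
      where cols = if null v then 0 else length (head v)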
    (a) the dense matrix          (b) rows enumerated with column indexes
        ( a 0 0 0 )                   ( (0,a) (1,0) (2,0) (3,0) )
        ( b c 0 0 )                   ( (0,b) (1,c) (2,0) (3,0) )
        ( 0 0 0 0 )                   ( (0,0) (1,0) (2,0) (3,0) )
        ( 0 d 0 e )                   ( (0,0) (1,d) (2,0) (3,e) )

    (c) pairs with zero values filtered out
        ( (0,a) | (0,b) (1,c) | · | (1,d) (3,e) )

Figure 4. Conceptual phases in CSR construction.
    /* CSR construction. */
    for (i = k = 0; i < m; i++) {
      for (j = 0; j < n; j++)
        if (M[i][j]) {
          J[k] = j;
          V[k] = M[i][j];
          k++;
        }
      R[i] = k;
    }

    /* CSR SpMV. */
    for (i = k = 0; i < m; i++)
      for (y[i] = 0; k < R[i]; k++)
        y[i] += V[k] * x[J[k]];

Figure 5. Dataflow view of high-level CSR construction. (The figure traces the row [b c 0 0] through enum, a snd != 0 test, and filter, yielding [(0,b) (1,c)].)
Fig. 4 shows the high-level stages in CSR compression mentioned above. Given (a), each row is enumerated with column indexes, resulting in (b). Pairs containing a zero value are then filtered, yielding (c). A dataflow view of such a computation is shown in Fig. 5. Notice how similar it is to the following LL function.

    def csr: [enum -> [snd != 0 ? ]]

Using name binding in comprehensions may improve clarity.

    [enum -> [j, v: v != 0 ? ]]

Alternatively, one can use an explicit enumeration operator inside comprehensions. The following variant appears more "integrated", but in fact entails the exact same semantics.

    [[v: v != 0 ? (#, v)]]
A more verbose variant uses Python-style comprehension. This variant is de-sugared to the original definition.

    def csr(A): [[(j, v) for j, v in enum(r) if v != 0] for r in A]

(Figure 6 traces CSR SpMV on the compressed matrix A and the vector x = (q r s t): in (b) the row ((0,b) (1,c)) is multiplied separately; in (c) the indexes (0 1) are unzipped from the values (b c); in (d) the indexes select (q r) from x, giving the pairs ((b,q) (c,r)); in (e) the products are summed to bq + cr.)

Figure 6. Conceptual phases in CSR SpMV.
Fig. 6 shows the stages in CSR SpMV. Each compressed row is multiplied separately, as shown in (b). First, column indexes are separated from nonzero values as in (c). They are used to retrieve corresponding values from x, pairing them with their respective row values as in (d). Finally, values in pairs are multiplied and the products are summed, yielding the inner-product in (e). This maps to the following LL function.
    def csrmv(A, x): A -> [J, V = unzip: (V, x[J]) -> zip -> [mul] -> sum]

Here, too, it is possible to write a more integrated variant that bundles multiplication with the extraction of single values. Although semantically equivalent, the resulting code is less amenable to vectorization due to the use of word-level operations.

    A -> [[j, v: v * x[j]] -> sum]

Jagged diagonals (JAD). This format deploys a clever compression scheme that allows handling of sequences of nonzeros from multiple rows, taking advantage of vector instructions. The ith nonzero values from all rows are laid out consecutively in the compressed format, constituting a "jagged diagonal". Since nonzeros are distributed differently in each row, column indexes need to be stored as well. These steps can be thought of as per-row compression (as shown above for CSR), followed by transposition to invert the direction of compressed rows and ith-element columns. However, packing ith elements in a predetermined order—e.g., from the first to the last row—induces a problem: one needs to account for compressed rows that are shorter than other rows that succeed them.2 This is addressed by adding a sorting step between row compression and transposition, in which rows are ordered by decreasing number of nonzeros. The sort permutation is stored with the resulting diagonals, so the correct order of rows can be restored. These conceptual steps in JAD compression are visualized in Fig. 2 and the LL implementation is shown in Section 2.1. Fig. 7 shows the high-level steps in JAD SpMV. (b) is obtained by computing, for each diagonal, the cross-product of its induced vector of values with the elements of x corresponding to their column indexes. These are transposed to obtain the lists of products in each (nonzero) row as in (c). Products in corresponding rows are summed, obtaining (d). In (e), each inner product is paired with its row index, and the pairs are then "inflated" to obtain the dense result vector in (f). The following LL function implements these steps.

    def jadmv((P, D), x): (P, D -> [unzip -> snd * x[fst]] -> trans -> [sum]) -> zip -> infl(0, m, id)

Other formats. Two additional standard formats are Coordinate (COO) and Compressed Sparse Columns (CSC) [10]. COO is a highly-portable compression in which nonzeros are stored together with their row and column indexes in a single, arbitrarily ordered sequence. Construction can be implemented in LL as follows.

    def coo: [i = #: [v: v != 0 ? (i, #, v)]] -> concat

COO SpMV is less straightforward: one needs to account for the fact that nonzeros of a particular row might be scattered along the compressed list. It is necessary to gather those values prior to computing the inner-product. This is expressed as follows.

    def coomv(A, x): A -> gather -> [(fst, snd -> [j, v: v * x[j]] -> sum)] -> infl(0, m, id)

A CSC representation is obtained by compressing the nonzero values in the column direction, instead of the row direction as in CSR. In C, this is done by swapping the order of the loops iterating over the dense matrix, and storing the row index with the nonzero values. In LL, it amounts to prepending a transposition to CSR construction.

    def csc: trans -> csr

Like COO, CSC SpMV calls for a gather operation prior to summing row cross-products.

    def cscmv: zip -> [cj, xj: cj -> [i, v: (i, v * xj)]] -> concat -> gather -> [(fst, snd -> sum)] -> infl(0, m, id)
Here, too, the fact that data layout is not in line with the computation entailed by matrix-vector multiplication calls for additional steps to massage the result into a proper vector form. In addition to the above formats, LL can naturally and succinctly describe hierarchical compression. This includes Sparse CSR (SCSR) and different block variants of all of the above. These will be described and studied in Section 5.
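As a sanity check on these specifications, the SpMV pipelines can be rendered over plain Haskell lists. The sketch below is ours and replaces the gather step with an equivalent per-row selection; it is not the LL compiler's output:

    csrmv :: Num a => [[(Int, a)]] -> [a] -> [a]
    csrmv rows x = [ sum [ v * (x !! j) | (j, v) <- r ] | r <- rows ]

    coo :: (Eq a, Num a) => [[a]] -> [(Int, Int, a)]
    coo m = [ (i, j, v) | (i, r) <- zip [0..] m, (j, v) <- zip [0..] r, v /= 0 ]

    -- m is the number of rows of the original matrix
    coomv :: Num a => Int -> [(Int, Int, a)] -> [a] -> [a]
    coomv m ts x = [ sum [ v * (x !! j) | (i, j, v) <- ts, i == row ]
                   | row <- [0 .. m - 1] ]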
4. Verifying Sparse Codes using Isabelle/HOL
We make use of Isabelle’s rich infrastructure in implementing a proof method for sparse matrix codes. This includes the simplifier and a powerful tactical language, which is used to combine existing proof methods in forming new ones. All parts of our proofs are checked from first principles by a small LCF-style kernel.
2 Transposition inverts columns up to the first missing element, below which all other elements are omitted. In this respect it is "lossy", and the equality A = (A^T)^T only holds for matrices whose rows are sorted by length.
    (a)  P: (1 3 0 2)    D: ( (0,b) (1,d) (0,a) | (1,c) (3,e) )    x: (q r s t)
    (b)  ( bq dr aq | cr et )
    (c)  ( bq cr | dr et | aq )
    (d)  ( bq+cr, dr+et, aq )
    (e)  ( (1, bq+cr) (3, dr+et) (0, aq) )
    (f)  ( aq, bq+cr, 0, dr+et )

Figure 7. Conceptual phases in JAD SpMV.
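These phases can be followed in a small list-level sketch of ours (again illustrative, not the LL semantics; diagonals are non-increasing in length, so Data.List.transpose matches LL's trans):

    import Data.List (transpose)
    import Data.Maybe (fromMaybe)

    -- (p, d): row permutation and jagged diagonals; m: length of the result
    jadmv :: Num a => ([Int], [[(Int, a)]]) -> [a] -> Int -> [a]
    jadmv (p, d) x m =
        [ fromMaybe 0 (lookup i pairs) | i <- [0 .. m - 1] ]       -- stage (f)
      where
        prods = [ [ v * (x !! j) | (j, v) <- diag ] | diag <- d ]  -- stage (b)
        sums  = map sum (transpose prods)                          -- stages (c), (d)
        pairs = zip p sums                                         -- stage (e)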
4.1 Translating LL to Isabelle/HOL

Fig. 3 constitutes a shallow embedding [16] of LL in Isabelle/HOL, a standard technique when the goal is to verify correctness of programs in some language. In this approach, the functions and types of an object language (LL) are written directly in the language of the theorem prover (typed λ-calculus). Subsequent logical formulas relate to these translated programs as ordinary HOL objects, which allows us to leverage existing support for proving properties of them. The CSR implementation in Section 3.2 translates to the following definitions, which will be used in our proofs.

    csr = map (filter (λ(j, v). v ≠ 0) ◦ enum)
    csrmv (A, x) = (map (listsum ◦ map (λ(x, y). x ∗ y) ◦ unsplit zip ◦
                         (λ(J, V). (V, map (λi. x ! i) J))) ◦ map unzip) A    (3)

We now pose the verification theorem: when A′ index-represents the m × n-matrix A and x′ the n-vector x, the result of CSR SpMV applied to the CSR version of A′ and to x′ represents the m-vector that is equal to A · x.

    ilistM m n A A′ ∧ ilistv n x x′
      −→ ilistv m (λi. Σ j < n. A i j ∗ x j) (csrmv (csr A′, x′))    (4)

The remainder of this section presents the formalism and explains the reasoning used in proving this goal.
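Specialised to the list model, the statement of Eq. (4) can be rendered as an executable test. This is our sketch (csrL and csrmvL are list functions, not the HOL definitions):

    csrL :: (Eq a, Num a) => [[a]] -> [[(Int, a)]]
    csrL = map (filter ((/= 0) . snd) . zip [0..])

    csrmvL :: Num a => [[(Int, a)]] -> [a] -> [a]
    csrmvL rows x = [ sum [ v * (x !! j) | (j, v) <- r ] | r <- rows ]

    -- Eq. (4) over lists: CSR SpMV agrees with the dense matrix-vector product
    prop_csrmv :: [[Integer]] -> [Integer] -> Bool
    prop_csrmv a x
      | all ((== length x) . length) a =
          csrmvL (csrL a) x == [ sum (zipWith (*) row x) | row <- a ]
      | otherwise = True   -- dimensions do not match; property is vacuous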
4.2 Formalizing vector and matrix representations

We begin by formalizing vectors and matrices in HOL. Mathematical vectors and matrices are formalized as functions from indexes to values, namely nat → α and nat → nat → α, respectively; note that the → type constructor is right-associative, hence a matrix is a vector of vectors. Dimensions are not encoded in the type itself, and values returned for indexes exceeding the dimensions can be arbitrary, which means that many functions can represent the same mathematical entity. Concrete representations of dense and sparse vectors/matrices are derived from the LL implementation and consist of lists and pairs. Commonly used representations include indexed lists, value lists and associative lists, all of which are explained below. We introduce representation relations (defined as predicates in HOL) to link mathematical vectors and matrices with different concrete representations, for three reasons. First, in proving correctness of functions we map operations on concrete objects to their mathematical counterparts. This is easy to do for indexed list representations but gets unwieldy with others. We hide this complexity inside the definitions of the relations. Second, predicates can be used to enforce integrity constraints of the representation. For example, an associative list representation requires that index values are unique; or the lengths of a list of indexed list representations need to be fixed. Third, for some representations (e.g., value list) there exists no injective mapping from concrete objects to abstract ones, forcing us to use relations rather than representation functions. Using relations across the board yields a more consistent and logically lightweight framework.

An indexed list representation of an n-vector x by a list x′ is captured by the ilist predicate. Note that we refrain from fixing vector elements to a specific type (e.g., integers) and instead use type parameters α and β to denote the types of inner elements of the mathematical and concrete vectors, respectively.

    ilist :: nat → (nat → α → β → bool) → (nat → α) → [β] → bool
    ilist n P x x′ ⇐⇒ (length x′ = n) ∧ (∀i < n. P i (x i) (x′ ! i))

The parameter P is a relation that specifies the representation of each element in the vector. For ordinary vectors, it is equality of elements. However, P becomes useful for matrix representations, as we can use arbitrary relations to determine the representation of inner vectors. We introduce abbreviations for the common cases of indexed list representations.

    ilistv n x x′ ⇐⇒ ilist n (λj. op =) x x′
    ilistM m n A A′ ⇐⇒ ilist m (λi. ilistv n) A A′

An associative list representation is central to sparse matrix codes as it is often used in vector compression. It is captured by the alist predicate.

    alist :: nat → (nat → α → β → bool) → (α set) → (nat → α) → [(nat, β)] → bool
    alist n P D x x′ ⇐⇒ distinct (map fst x′) ∧
                         (∀(i, v) ∈ set x′. P i (x i) v ∧ i < n) ∧
                         (∀i < n. x i ∉ D −→ ∃v. (i, v) ∈ set x′)

Here, distinct is a predicate stating the uniqueness of indexes (i.e., keys) in x′. Each element in an associative list must relate to the respective vector element, also requiring that index values are within the vector length. Finally, each element in the vector that is not a default value (specified by the set of values D) must appear in the representing list. Note that a set of default values accounts for cases where more than one such value exists, as in the case of nested vectors, where each function mapping the valid dimensions to zero is a default value. Also note that alist does not enforce a particular order on elements in the compressed representation, nor does it insist that all default values are omitted.

Sometimes concrete objects contain only the values of the elements in a given vector, without mention of their indexes. This value list representation often occurs prior to computing a cross- or dot-product. It is captured by the vlist predicate, which states that the list of values can be zipped with some list of indexes p to form a proper associative list representation. (The length restriction ensures that no elements are dropped from the tail of x′.)

    vlist :: nat → (nat → α → β → bool) → (α set) → (nat → α) → [β] → bool
    vlist n P D x x′ ⇐⇒ ∃p. length p = length x′ ∧ alist n P D x (zip p x′)

Additional representations can be incorporated into our theory. For example, when a matrix is compressed into an associative list, a dual-index representation relation can be defined similarly to alist.
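A minimal executable rendering of these relations over Haskell lists may help intuition. Abstract vectors are modeled as functions Int -> a; the code below is our sketch, not the Isabelle theory:

    import Data.List (nub)

    type Vec a = Int -> a

    -- xs index-represents the n-vector x under the inner relation p
    ilist :: Int -> (Int -> a -> b -> Bool) -> Vec a -> [b] -> Bool
    ilist n p x xs =
      length xs == n && and [ p i (x i) (xs !! i) | i <- [0 .. n - 1] ]

    -- xs is an associative-list representation of x; d is the set of defaults
    alist :: Eq a => Int -> (Int -> a -> b -> Bool) -> [a] -> Vec a -> [(Int, b)] -> Bool
    alist n p d x xs =
         distinct (map fst xs)                            -- keys are unique
      && and [ i < n && p i (x i) v | (i, v) <- xs ]      -- entries are valid
      && and [ any ((== i) . fst) xs                      -- non-defaults appear
             | i <- [0 .. n - 1], x i `notElem` d ]
      where distinct ks = nub ks == ks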
4.3 Proving correctness of sparse matrix computations

We prove Eq. (4) using term rewriting and introduction rules. Introduction rules are used whenever further rewriting cannot be applied. An introduction rule is applied by resolution: applying the rule G x ∧ H y −→ F x y to the goal F a b yields two new subgoals, G a and H b. The theorem in Eq. (4) makes the following two assumptions,

    ilistM m n A A′    (5)

    ilistv n x x′    (6)

which are added to the set of available introduction rules as true −→ . . . . The conclusion of Eq. (4) is our initial proof goal,

    ilistv m (λi. Σ j < n. A i j ∗ x j) (csrmv (csr A′, x′))    (7)

Simplifying the goal. We begin by applying Isabelle's simplifier using Eq. (3) and standard rules for pairs, lists, arithmetic and Boolean operators. This removes most of the function abstractions, compositions and pair formations due to the translation from LL. Our new goal is analogous to Eq. (1) in Section 2.2.

    ilistv m (λi. Σ j < n. A i j ∗ x j)
      (map (λr. listsum (map (λv. snd v ∗ x′ ! fst v)
                             (filter (λv. snd v ≠ 0) (enum r)))) A′)    (8)

Solving the entire goal using rewriting alone calls for simplification rules that are too algorithm-specific. For example, the rule

    (∀x ∈ set xs. ¬ P x −→ f x = 0)
      −→ listsum (map f (filter P xs)) = listsum (map f xs)    (9)

allows further simplification of Eq. (8), but fails for all formats that introduce more complex operations between map and filter.

Introduction rules on representation relations. Consider the equation in the conclusion of Eq. (9). We know that it holds when the two lists, xs and filter P xs, value-represent the same vector. By introducing rules describing when it is allowed to apply map, filter and enum operations to value list representations, we prove that the result of listsum in Eq. (8) equals the mathematical dot-product. Fig. 8 shows the introduction rules used in proving Eq. (4). Application of introduction rules is syntax directed, choosing rules whose conclusion matches the current goal. Given Eq. (8), the prover applies ILIST-MAP, which moves the map from the representing object into the inner representation relation, followed by ILIST-LISTSUM, which substitutes listsum with an equivalent notion of value-represented rows. This results in

    ilist m (λi r r′. vlist n (λj. op =) {0} r
        (map (λv. snd v ∗ x′ ! fst v) (filter (λv. snd v ≠ 0) (enum r′))))
      (λi j. A i j ∗ x j) A′

Further simplification is not possible at this point, nor can we modify the vlist relation inside ilist. Luckily, ILIST-VLIST matches our goal, lifting the inner vlist to the outermost level and permitting us to operate further on the concrete parameters of vlist. Note that ILIST-VLIST has two assumptions, resulting in the new subgoals

    ilist m ?Q ?B A′    (10)

    ∀i < m. vlist n (λj. op =) {0} (λj. A i j ∗ x j)
      (map (λv. snd v ∗ x′ ! fst v) (filter (λv. snd v ≠ 0) (enum (A′ ! i))))    (11)

In Eq. (10), ?Q and ?B are existentially quantified variables. They do not get instantiated when we apply ILIST-VLIST, and the subgoal in Eq. (10) merely certifies that A′ is a list of length m. Therefore, the prover is allowed to instantiate them arbitrarily, and Eq. (10) is discharged by the assumption in Eq. (5). The rules VLIST-MAP, ALIST-FILTER and ALIST-ENUM can now be applied to Eq. (11). Note that applying them amounts to the effect of simplification using Eq. (9). However, they can be applied regardless of the way in which the three operations—map, filter and enum—are intertwined. Therefore, they are applicable in numerous cases where the context imposed by Eq. (9) is too restrictive. The ALIST-FILTER rule forces us to prove that filter only removes default values, in the form of the following new subgoals,

    ∀i < m. ∀j < n. ∀v v′. ¬ (snd (j, v′) ≠ 0) ∧ v = v′ ∗ x′ ! j −→ v ∈ {0}

    ∀i < m. ilist n (λj v v′. v = v′ ∗ x′ ! j) (λj. A i j ∗ x j) (A′ ! i)    (12)

Fortunately, subgoal Eq. (12) is completely discharged by the simplifier. The remaining goal is solved using the rules ILIST-MULT, ILIST-NTH, and ILISTv → ILISTM, as well as the assumptions Eq. (5) and Eq. (6).

4.4 Automating the proof

The above proof outline already dictates a simple proof method. Isabelle's tactical language [15] provides us with ample methods and combinators that can be used to implement custom proof tactics. Our proof method is implemented as follows.

1. The simplifier attempts to rewrite the goal until no further rewrites are applicable, returning the new goal. If no rewrite rule could be applied, it returns an empty goal.

2. The resolution tactic attempts to apply each of the introduction rules and returns a new goal state for each of the matches. It is possible that more than one rule matches a given goal; e.g., ILIST-MAP and ILIST-NTH both match ilist n (λi v v′. v = y ! i) x (map f x′), resulting in a sequence of alternative goal states to be proved.

Invoking the proof method leads to a depth-first search on the combination of the two sub-methods. It maintains a sequence of goal states, initially containing only the main goal. After each successful application of either sub-method, the result is prepended to the head of the sequence. A failure at any level causes the search to backtrack and continue with the next available goal state. When the top element of the goal state sequence is empty, the main goal has been discharged and the proof is complete.
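This search strategy can be rendered schematically in Haskell. Goal and the two sub-methods below are placeholders of ours, not Isabelle's actual tactic API; we also assume the simplifier eventually reports that no rewrite applies:

    -- A schematic rendering of the depth-first simplify/resolve search.
    data Goal = Empty | Goal String
      deriving (Eq, Show)

    search :: (Goal -> Maybe Goal)  -- simplifier; Nothing when no rewrite applies
           -> (Goal -> [Goal])      -- resolution; one goal state per matching rule
           -> Goal -> Bool
    search simp intro g0 = go [g0]
      where
        go []          = False               -- every alternative failed: no proof
        go (Empty : _) = True                -- top goal state empty: proof complete
        go (g : rest)  = case simp g of
            Just g' -> go (g' : rest)        -- keep rewriting first
            Nothing -> go (intro g ++ rest)  -- else try introduction rules;
                                             -- an empty match list backtracks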
3 The predicate P and the vector z are arbitrary; they just help to state that x′ is a list of length m.
    ilist n (λi a b. P i a (f b)) x x′
    ----------------------------------  ILIST-MAP
    ilist n P x (map f x′)

    ilist m (λi r r′. vlist n (λj. op =) {0} r (f r′)) A A′
    -------------------------------------------------------------  ILIST-LISTSUM
    ilist m (λi r r′. r = listsum (f r′)) (λi. Σ j < n. A i j) A′

    ilist m Q B A′      ∀i < m. vlist n (P i) (D i) (f (A i) i) (g (A′ ! i) i)
    --------------------------------------------------------------------------  ILIST-VLIST
    ilist m (λi r r′. vlist n (P i) (D i) (f r i) (g r′ i)) A A′

    alist m (λi r r′. P i r (f (i, r′))) D x x′
    -------------------------------------------  VLIST-MAP
    vlist m P D x (map f x′)

    ilist m P x x′
    ------------------------  ALIST-ENUM
    alist m P D x (enum x′)

    alist n P D x x′      ∀i < n. ∀v v′. ¬ Q (i, v′) ∧ P i v v′ −→ v ∈ D
    --------------------------------------------------------------------  ALIST-FILTER
    alist n P D x (filter Q x′)

    ilistv m x y      ilist m P z x′
    ---------------------------------  ILIST-NTH 3
    ilist m (λi v v′. v = y ! i) x x′

    ilist n (λi v v′. v = f i v′) x z      ilist n (λi v v′. v = g i v′) y z
    ------------------------------------------------------------------------  ILIST-MULT
    ilist n (λi v v′. v = f i v′ ∗ g i v′) (λi. x i ∗ y i) z

    ilistM m n A A′      i < m
    ---------------------------  ILISTv → ILISTM
    ilistv n (A i) (A′ ! i)

Figure 8. Introduction rules used in the proof of CSR SpMV.

    k ≠ 0      ilistv n (λi. block k 1 (λi′ j′. A (i ∗ k + i′))) A′
    ----------------------------------------------------------------  ILIST-CONCAT_VECTORS
    ilistv (n ∗ k) A (concat_vectors k A′)

    l ≠ 0      ilistv (m ∗ l) x x′
    -------------------------------------------------------------------------  ILIST-BLOCK_VECTOR
    ilistv m (λi. block l 1 (λi′ j′. x (i ∗ l + i′))) (block_vector m l x′)

    k ≠ 0      l ≠ 0      ilistM (m ∗ k) (n ∗ l) A A′
    ---------------------------------------------------------------------------------------------  ILIST-BLOCK_MATRIX
    ilistM m n (λi j. block k l (λi′ j′. A (i ∗ k + i′) (j ∗ l + j′))) (block_matrix m n k l A′)

Figure 9. Introduction rules used for proving blocked format operations.
5. Evaluation

In this section we evaluate programmability of sparse codes in LL and the extensibility of our verification method to new formats.

5.1 Verifying additional sparse formats

We examine to what extent our prover design allows us to verify additional formats without adding excessively many rules. Recall that our initial implementation of the prover for CSR SpMV (Section 4) insisted on minimizing reliance on format-specific rules, avoiding duplication of logic, and keeping representation relations general, for example by keeping the type of the value stored in the matrix parametric. In this section, we extend our prover to verify several formats that are strictly more complex than CSR. Our experience indicates that our prover can overcome variations in both format construction and matrix-vector multiplication. The variations were both syntactic (i.e., due to syntactic sugar) and structural (i.e., inducing a different dataflow structure). This benefit is thanks to Isabelle's simplifier, which successfully canonicalizes these differences, requiring only minor tweaks to the prover's rule base. Therefore, we consider below only a single implementation for each format and argue that the single variant represents a larger class of similar implementations.

Jagged Diagonals (JAD). A prominent feature of JAD's proof goal is the double use of transpose, once during compression (jad) and once during multiplication (jadmv). This form can be simplified to the Isabelle takeWhile list operator on the premise that compressed rows are sorted by length prior to being transposed. The form is matched by a rewrite rule for transpose (transpose xs). Adding introduction rules for infl, takeWhile, rev and sort_key was sufficient for our verifier to complete the proof. The ability to prove full functional correctness of JAD SpMV documents the strength of our prover; no other verification framework that we know of can (i) handle the complex data transformations in JAD compression, and (ii) prove correctness of arithmetic operations on the resulting sparse representation (see Section 6).
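The transpose-transpose simplification can also be checked on the list model. Using the trans sketch given after Figure 3, the following property (ours) holds for jagged lists whose rows are non-empty and sorted by decreasing length (cf. footnote 2):

    -- assumes: trans as sketched after Figure 3
    prop_transTrans :: Eq a => [[a]] -> Bool
    prop_transTrans v
      | sortedDesc && all (not . null) v = trans (trans v) == v
      | otherwise                        = True
      where
        lens       = map length v
        sortedDesc = and (zipWith (>=) lens (drop 1 lens))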
Coordinate (COO). As mentioned in Section 3.2, the COO format is challenging because it associates matrix values with both row and column coordinates, and also because it requires concatenation and gather operations. It turns out that the COO pair coordinates do not call for a new representation relation. In fact, thanks to how the functions coo and coomv are composed, we need to handle the pair coordinates only between concatenation (in coo) and gather (in coomv). The simplifier moves these two functions together; therefore, we introduce a rule to relate the representation of the input and output of gather (concat xs), allowing the prover to automatically complete the proof.

    vlist n (λi. vlist m (λj a b. a = snd b ∧ i = fst b) {0}) {x. ∀j < m. x j = 0} M xs
    -----------------------------------------------------------------------------------  ALIST-GATHER-CONCAT
    alist n (λi. vlist m (λi. op =) {0}) {x. ∀j < m. x j = 0} M (gather (concat xs))

Compressed Sparse Columns (CSC). As CSC exhibits a peculiar use of concatenation and gather operations, it is handled similarly to COO. In contrast to COO, the input list to concat represents a transposed matrix, hence we use a rule similar to ALIST-GATHER-CONCAT, but with a transposed matrix M.

How many introduction rules did we need to prove our sparse formats? In total, 24 rules were needed, including both introduction and simplification rules. Introduction rules were typically used to (i) reason about some language construct, such as map, sum and filter, in the context of a certain representation (e.g., rules ILIST-MAP, ILIST-LISTSUM, ALIST-FILTER in Fig. 8); and (ii) formalize algebraic operations on vector and matrix representations, such as extracting an inner representation relation (ILIST-VLIST) and substituting a vector representation with a matrix representation (ILISTv → ILISTM). Most operators were handled by a single introduction rule; a few (e.g., map) required one rule per representation relation. To quantify rule reuse in our prover, we summarize the reuse of the 24 rules that were needed for proving five sparse formats (see Fig. 10). On average, fewer than 19% of the rules used by a particular format are specific to that format, while over 66% of these rules are used by at least three additional formats, a significant level of reuse. Even of the rules needed for the more complex formats (CSC and JAD), only up to a third are format-specific. On the other hand, format-specific rules tend to be harder to prove, as indicated by the average number of lines of Isar code required to prove the rules. A detailed examination reveals that two rules for handling a gather-concat sequence (used in CSC and COO) account for over a hundred lines each. We believe that these rules can be refactored for better reuse of simpler lemmas and greater automation. Note that most of the effort in proving JAD was invested in stating and proving the simplification of the transpose-transpose composition. Fig. 10 does not account for these rules, as they are quite general and were implemented as an extension to Isabelle's theory of lists.
5.2 Case study: hierarchical compression formats

This subsection evaluates the expressiveness of the LL language. This question is motivated by the absence from LL of some powerful constructs, such as first-class functions and folds. We show that LL can express advanced formats even without these constructs.

Sparse CSR (SCSR). The SCSR format extends CSR with another layer of compression. SCSR compresses the list of (compressed) rows by filtering out empty rows (i.e., those rows that have only zero-valued elements). The remaining rows are associated with their row index. Implementing SCSR in LL amounts to obtaining the CSR format, which compresses individual rows, followed by compression of the resulting list of compressed rows. Again, LL manages to express format construction as a pipeline of stages.

    def scsr: csr -> [len != 0 ? (#, id)]

The corresponding SpMV implementation needs to account for the row indexes. It must also inflate the resulting sparse vector into dense format:

    def scsrmv(A, x): A -> [i, r: r -> unzip -> snd * x[fst] -> sum -> (i, id)] -> infl(0, m, id)

Alternatively, we can reuse SpMV for CSR:

    A -> unzip -> (fst, csrmv(snd, x)) -> zip -> infl(0, m, id)

SCSR demonstrates the ability of our prover to peel off the additional compression layer and prove correctness of the overall result, while requiring only two rules in addition to those needed by CSR (see Fig. 10).

Next, we investigate two optimizations for SpMV—register blocking and cache blocking—designed to improve temporal locality of the vector x at two different levels of the memory hierarchy. The locality is improved by reorganizing the computation to operate on smaller segments of the input matrix, which in turn allows the reuse of a segment of x.

Register blocking. This optimization is useful when the nonzero values in the matrix appear in clusters [14]. The idea is to place each cluster of nonzeros in a small dense matrix. To obtain the register-blocked format, instead of compressing single-cell values (i.e., numbers), a matrix is partitioned into uniformly sized rectangular blocks. These dense blocks form the base elements for compression: a block is filtered away if all its elements are zeros; if the block is nonzero, it is represented as a dense matrix. The size of these blocks is chosen so that the corresponding portion of the vector x can reside in registers during the processing of a block. Register blocking can be applied to all sparse formats described in Section 3.2. The 2 × 2 blocked representation of Fig. 1(a) can be seen in Fig. 11(a). Applying CSR compression to this blocked matrix results in the register-blocked CSR format in Fig. 11(b). To construct register-blocked CSR (RBCSR) in LL, we first "blockify" the dense matrix with the block function, which transforms a dense matrix A of size mr × nc into an m × n matrix of r × c dense blocks. Next, we pair these blocks with their column indices using [enum], and filter out the all-zero blocks.

    def rbcsr(A): block(r, c, A) -> [enum -> [snd -> [[neq ‘ 0] -> disj] -> disj ? ]]

In SpMV of an r × c-RBCSR matrix A and a dense vector x, we first bind the names B and l to each dense block and its index, respectively, and perform dense matrix-vector multiplication (densemv) on each block and the corresponding c-sub-vector of x.
    Reuse degree   # Rules   Avg. LOC   CSR   SCSR   COO   CSC   JAD   Total      %
        1             11       28.3      1            2     5     3      11    18.3
        2              3        6.3                   3     3             6    10.0
        3              1        6.0      1      1                  1      3     5.0
        4              5        6.4      3      5     4     3      5     20    33.3
        5              4        2.3      4      4     4     4      4     20    33.3

Figure 10. Reusability analysis of our sparse-matrix code prover.
    (a)  ( [a 0; b c]   [0 0; 0 0] )
         ( [0 0; 0 d]   [0 0; 0 e] )

    (b)  ( (0, [a 0; b c]) )
         ( (0, [0 0; 0 d])   (1, [0 0; 0 e]) )

Here [p q; r s] denotes the 2 × 2 block with rows (p q) and (r s).
Figure 11. Example dense and sparse 2 × 2 blocked matrix representations.

The latter is obtained by breaking x into a list of c-vectors using block(c, x) and selecting the appropriate sub-vector. The result vectors in a row block are summed, and the final result is obtained by concatenating the result sub-vectors from all row blocks.

    def rbcsrmv(A, x): A -> [[l, B: (B, block(c, x)[l]) -> densemv] -> sum] -> concat

Our prover allowed us to easily extend proofs to blocked formats because our matrices are of parametric type; the prover can work with matrices of numbers as well as with matrices whose elements are matrices. Parameterization of matrices was expressed with Isabelle/HOL type classes, which are used to restrict types in introduction rules. We use a theory of finite matrices [12]. Here, too, the size of a matrix is not encoded in the matrix type (denoted α matrix), but it is required that matrix dimensions are bounded. To represent matrices as abstract values, we introduce the matrix conversion function:

    matrix :: nat → nat → (nat → nat → α) → α matrix

The first two parameters specify the row and column dimensions, respectively. The third parameter is the abstract value encoded into the matrix. Implementing functions on compressed matrices necessitates a few more conversion functions:

    block_vector :: nat → nat → [α] → [α matrix]
    block_matrix :: nat → nat → nat → nat → [[α]] → [[α matrix]]
    concat_vectors :: nat → [α matrix] → [α]

The operation block_matrix m n k l A transforms the object A, representing an mk × nl-matrix, into an object representing an m × n-matrix of k × l-blocks; block_vector m k x transforms the list x of length mk into a list of m k-vectors; concat_vectors k x is the inverse operation, unpacking the k-vectors in x. The following shows the register-blocked CSR code in Isabelle.

    rbcsr m n k l A = map (filter (λ(i, v). v ≠ 0) ◦ enum) (block_matrix m n k l A)

    rbcsrmv m n k l (A, x) = concat_vectors k
        (map (listsum ◦ map (λv. snd v ∗ block_vector n l x ! fst v)) A)

We require that block dimensions are greater than zero and properly divide the respective matrix and vector dimensions. The correctness theorem follows.

    k ≠ 0 ∧ l ≠ 0 ∧ ilistM (m ∗ k) (n ∗ l) A A′ ∧ ilistv (n ∗ l) x x′
      −→ ilistv (m ∗ k) (λi. Σ j < n ∗ l. A i j ∗ x j)
                 (rbcsrmv m n k l (rbcsr m n k l A′, x′))    (13)

After adding the introduction rules in Fig. 9 and a few rewrite rules for matrix, the prover automatically proves Eq. (13).

Cache blocking. The idea in cache blocking is to reduce cache misses for the source vector x when it is too large to fit entirely in the cache during SpMV. We consider static cache blocking [7]. The sparse matrix is partitioned into rectangular sub-matrices of size r × c. While in register blocking these sub-matrices were kept dense, in the cache-blocked format they are compressed. Our cache blocking scheme differs from the one in [7] in that we only allow cache blocks to start at column indices that are multiples of c; this restriction leads to suboptimal compression. We believe that this restriction can be relaxed by augmenting LL with a blocking function that creates optimally placed blocks. Notice that the construction of a cache-blocked matrix is very similar to the construction for register blocking. The only difference is the additional compression applied to each block. The LL code for the CSR-compressed cache-blocked matrix, whose blocks are stored in CSR format, is shown below.

    def cbcsr(A): block(r, c, A) -> [enum -> [snd -> [[neq ‘ 0] -> disj] -> disj ? ] -> [l, B: (l, B -> csr)]]

The corresponding cache-blocked SpMV in LL:

    def cbcsrmv(A, x): A -> [[l, B: (B, block(c, x)[l]) -> csrmv] -> sum] -> concat

In the cache-blocked SpMV, we again notice the similarity to the register-blocked SpMV. The two codes are identical except for the function used for multiplying a block by a vector. It is somewhat desirable to factor out these inner multiplications (densemv and csrmv), but this is not possible in LL. The reason is that LL does not support lambda abstraction, which would allow reuse of code common to the register- and cache-blocked versions. We have refrained from enriching LL with first-class functions for now because this allows for simpler verification and gives us broad verification coverage. We do not consider the absence of lambda abstraction a significant disadvantage because even optimized LL programs are small. In the future, we may decide to extend LL with a template mechanism that will be used to instantiate such hierarchical composition, allowing code reuse. The verification of cache-blocked sparse formats has not yet been implemented. We expect that the amount of work will not be substantial, based on our experience with other hierarchical formats.
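For the blocked formats, the list model extends naturally. The sketch below is our code (with chunksOf written out to avoid extra dependencies); it shows the blockify step and register-blocked CSR construction, and reproduces Fig. 11 on the running example:

    import Data.List (transpose)

    chunksOf :: Int -> [a] -> [[a]]
    chunksOf _ [] = []
    chunksOf n xs = take n xs : chunksOf n (drop n xs)

    -- split an (m*r) x (n*c) matrix into an m x n grid of r x c dense blocks
    block :: Int -> Int -> [[a]] -> [[[[a]]]]
    block r c m = [ transpose (map (chunksOf c) band) | band <- chunksOf r m ]

    -- register-blocked CSR: enumerate blocks per block-row, drop all-zero blocks
    rbcsr :: (Eq a, Num a) => Int -> Int -> [[a]] -> [[(Int, [[a]])]]
    rbcsr r c = map (filter (any (any (/= 0)) . snd) . zip [0..]) . block r c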
6. Related Work

Specifying sparse matrix code. Bernoulli [8, 9] is a system that synthesizes efficient low-level implementations of matrix operations given a description of the sparse format using relational algebra. This is impressive and permits rapid development of fast low-level implementations. However, the functionality of the system was limited and it had little impact. Instead, we express formats using a functional programming language which can be mechanically verified. We believe that function-level programming provides the right level of abstraction for expressing the desired transformations. Moreover, LL can be embedded in existing call-by-value functional programming languages. Compiling LL into low-level code is a work in progress. The synthesizer by Bik et al. [2] produces efficient implementations by replacing A[i, j] in dense-matrix code with a representation function that maps to the corresponding sparse element; powerful compiler optimizations then yield efficient code.

Verifying sparse matrix code. We are not aware of previous work on verifying full functional correctness of sparse matrix codes. We are not even aware of work that verified their memory safety without explicitly provided loop invariants. Our own attempts at verification included ESC/Java, TVLA and SAT-based bounded model checking, none of which was satisfactory. Furthermore, none of these tools was capable of proving higher-order properties like the ones we currently prove. This led us to raising the level of abstraction and deferring to purely functional programs where loops are replaced with comprehensions and specialized reduction operators.

Higher order verification. Duan et al. [5] verified a set of block ciphers using the interactive theorem prover HOL-4. They proved that the decoding of an encoded text results in the original data. Their proofs are mostly done using inversion rules, namely rules of the form f(f⁻¹ x) = x, and algebraic rules on bit-word identities. For the block ciphers used by AES and IDEA, special rules were needed. The domain of block cipher verification does not seem to require more complicated rules than bit-word identities.

7. Conclusion

In this paper we showed how to raise the level of abstraction for sparse matrix programs from imperative code with loops to functional programs with comprehensions and limited reductions. We also developed an automated proof method for verifying a diverse range of sparse matrix formats and their SpMV operations. This was accomplished by introducing relations that map a sparse representation to the abstract (mathematical) one. Through a clever definition of these representation relations we were able to build a reusable set of simplification and introduction rules, which could be applied to a variety of computations. We are currently working on the problem of compiling the functional code into efficient C code. Deploying techniques from predecessor functional and data parallel languages, we already exhibit promising performance results with real-world sparse formats.

References

[1] J. Backus. Can programming be liberated from the von Neumann style? A functional style and its algebra of programs. Communications of the ACM (CACM), 21(8):613–641, 1978.
[2] A. J. C. Bik, P. Brinkhaus, P. M. W. Knijnenburg, and H. A. G. Wijshoff. The automatic generation of sparse primitives. ACM Transactions on Mathematical Software, 24(2):190–225, 1998.
[3] G. E. Blelloch. Programming parallel algorithms. Communications of the ACM (CACM), 39(3):85–97, 1996.
[4] M. M. T. Chakravarty, R. Leshchinskiy, S. P. Jones, G. Keller, and S. Marlow. Data Parallel Haskell: a status report. In Workshop on Declarative Aspects of Multicore Programming (DAMP), pages 10–18, New York, NY, USA, 2007. ACM.
[5] J. Duan, J. Hurd, G. Li, S. Owens, K. Slind, and J. Zhang. Functional correctness proofs of encryption algorithms. In Logic for Programming, Artificial Intelligence and Reasoning (LPAR), pages 519–533, 2005.
[6] C. Flanagan, K. R. M. Leino, M. Lillibridge, G. Nelson, J. B. Saxe, and R. Stata. Extended static checking for Java. In Programming Languages Design and Implementation, pages 234–245, 2002.
[7] E.-J. Im. Optimizing the performance of sparse matrix-vector multiplication. PhD thesis, University of California, Berkeley, 2000.
[8] V. Kotlyar and K. Pingali. Sparse code generation for imperfectly nested loops with dependences. In International Conference on Supercomputing (ICS), pages 188–195, 1997.
[9] V. Kotlyar, K. Pingali, and P. Stodghill. A relational approach to the compilation of sparse matrix programs. In Euro-Par, pages 318–327, 1997.
[10] N. Mateev, K. Pingali, P. Stodghill, and V. Kotlyar. Next-generation generic programming and its application to sparse matrix computations. In International Conference on Supercomputing (ICS), pages 88–99, 2000.
[11] T. Nipkow, L. C. Paulson, and M. Wenzel. Isabelle/HOL: A Proof Assistant for Higher-Order Logic, volume 2283 of Lecture Notes in Computer Science. Springer-Verlag, 2002.
[12] S. Obua. Flyspeck II: The Basic Linear Programs. PhD thesis, Technische Universität München, 2008.
[13] M. Sagiv, T. W. Reps, and R. Wilhelm. Parametric shape analysis via 3-valued logic. ACM Transactions on Programming Languages and Systems (TOPLAS), 24(3):217–298, 2002.
[14] R. W. Vuduc. Automatic performance tuning of sparse matrix kernels. PhD thesis, University of California, Berkeley, 2004.
[15] M. Wenzel. The Isabelle/Isar Implementation. Technische Universität München. http://isabelle.in.tum.de/doc/implementation.pdf.
[16] M. Wildmoser and T. Nipkow. Certifying machine code safety: Shallow versus deep embedding. In International Conference on Theorem Proving in Higher-Order Logics, pages 305–320, 2004.
Regular, Shape-polymorphic, Parallel Arrays in Haskell

Gabriele Keller†   Manuel M. T. Chakravarty†   Roman Leshchinskiy†   Ben Lippmeier†   Simon Peyton Jones‡

† Computer Science and Engineering
  University of New South Wales, Australia
  {keller,chak,rl,benl}@cse.unsw.edu.au
‡ Microsoft Research Ltd
  Cambridge, England
  {simonpj}@microsoft.com

Abstract

We present a novel approach to regular, multi-dimensional arrays in Haskell. The main highlights of our approach are that it (1) is purely functional, (2) supports reuse through shape polymorphism, (3) avoids unnecessary intermediate structures rather than relying on subsequent loop fusion, and (4) supports transparent parallelisation. We show how to embed two forms of shape polymorphism into Haskell's type system using type classes and type families. In particular, we discuss the generalisation of regular array transformations to arrays of higher rank, and introduce a type-safe specification of array slices. We discuss the runtime performance of our approach for three standard array algorithms. We achieve absolute performance comparable to handwritten C code. At the same time, our implementation scales well up to 8 processor cores.

Categories and Subject Descriptors D.3.3 [Programming Languages]: Language Constructs and Features—Concurrent programming structures; Polymorphism; Abstract data types

General Terms Languages, Performance

Keywords Arrays, Data parallelism, Haskell

1. Introduction

In purely functional form, array algorithms are often more elegant and easier to comprehend than their imperative, explicitly loop-based counterparts. The question is, can they also be efficient? Experience with Clean, OCaml, and Haskell has shown that we can write efficient code if we sacrifice purity and use an imperative array interface based on reading and writing individual array elements, possibly wrapped in uniqueness types or monads [10, 11, 13]. However, using impure features not only obscures clarity, but also forfeits the transparent exploitation of the data parallelism that is abundant in array algorithms. In contrast, using a purely-functional array interface based on collective operations —such as maps, folds, and permutations— emphasises an algorithm's high-level structure and often has an obvious parallel implementation. This observation was the basis for previous work on algorithmic skeletons and the use of the Bird-Meertens Formalism (BMF) for parallel algorithm design [17]. Our own work on Data Parallel Haskell (DPH) is based on the same premise, but aims at irregular data parallelism, which comes with its own set of challenges [16]. Other work on byte arrays [7] also aims at high performance, while abstracting over loop-based low-level code using a purely-functional combinator library. We aim higher by supporting multi-dimensional arrays, more functionality, and transparent parallelism.

We present a Haskell library of regular parallel arrays, which we call Repa1 (Regular Parallel Arrays). While Repa makes use of the Glasgow Haskell Compiler's many existing extensions, it is a pure library: it does not require any language extensions that are specific to its implementation. The resulting code is not only as fast as when using an imperative array interface, it approaches the performance of handwritten C code, and exhibits good parallel scalability on the configurations that we benchmarked. In addition to good performance, we achieve a high degree of reuse by supporting shape polymorphism. For example, map works over arrays of arbitrary rank, while sum decreases the rank of an arbitrary array by one – we give more details in Section 4. The value of shape polymorphism has been demonstrated by the language Single Assignment C, or SAC [18]. Like us, SAC aims at purely functional high-performance arrays, but SAC is a specialised array language based on a purpose-built compiler. We show how to embed shape polymorphism into Haskell's type system. The main contributions of the paper are the following:
• An API for purely-functional, collective operations over dense,
rectangular, multi-dimensional arrays supporting shape polymorphism (Section 5). • Support for various forms of constrained shape polymorphism
in a Hindley-Milner type discipline with type classes and type families (Section 4). • An aggressive loop fusion scheme based on a functional repre-
sentation of delayed arrays (Section 6). • A scheme to transparently parallelise array algorithms based on
our API (Section 7) • An evaluation of the sequential and parallel performance of
our approach on the basis of widely used array algorithms (Section 8).
Before diving into the technical details of our contributions, the next section illustrates our approach to array programming by way of an example.

¹ Repa means "turnip" in Russian.
2. Our approach to array programming

A simple operation on two-dimensional matrices is transposition. With our library we express transposition in terms of a permutation operation that swaps the row and column indices of a matrix:

transpose2D :: Elt e => Array DIM2 e -> Array DIM2 e
transpose2D arr
  = backpermute new_extent swap arr
  where
    swap (Z :.i :.j) = Z :.j :.i
    new_extent       = swap (extent arr)

Like Haskell 98 arrays, our array type is parameterised by the array's index type, here DIM2, and by its element type e. The index type gives the rank of the array, which we also call the array's dimensionality, or shape.

extent      :: Array sh e -> sh

sum         :: (Shape sh, Elt e, Num e)
            => Array (sh :. Int) e -> Array sh e

zipWith     :: (Shape sh, Elt e1, Elt e2, Elt e3)
            => (e1 -> e2 -> e3)
            -> Array sh e1 -> Array sh e2 -> Array sh e3

backpermute :: (Shape sh, Shape sh', Elt e)
            => sh' -> (sh' -> sh)
            -> Array sh e -> Array sh' e

Figure 1. Types of library functions

Consider the type of backpermute, given in Figure 1. The first argument is the bounds (or extent) of the result array, which we obtain by swapping the row and column extents of the input array. For example, transposing a 3 × 12 matrix gives a 12 × 3 matrix.² The backpermute function constructs a new array in terms of an existing array solely through an index transformation, supplied as its second argument, swap: given an index into the result matrix, swap produces the corresponding index into the argument matrix.

A more interesting example is matrix-matrix multiplication:

mmMult :: (Num e, Elt e)
       => Array DIM2 e -> Array DIM2 e -> Array DIM2 e
mmMult arr brr
  = sum (zipWith (*) arrRepl brrRepl)
  where
    trr     = transpose2D brr
    arrRepl = replicate (Z :.All :.colsB :.All) arr
    brrRepl = replicate (Z :.rowsA :.All :.All) trr
    (Z :.colsA :.rowsA) = extent arr
    (Z :.colsB :.rowsB) = extent brr

Figure 2. Matrix-matrix multiplication illustrated (the figure shows arr and the transposed brr replicated into two rank-3 arrays of equal extent, forming a cuboid; only the caption survives the extraction)

The idea is to expand both rank-two argument arrays into rank-three arrays by replicating them across a new dimension, or axis, as illustrated in Figure 2. The front face of the cuboid represents the array arr, which we replicate as often as brr has columns (colsB), producing arrRepl. The top face represents trr (the transposed brr), which we replicate as often as arr has rows (rowsA), producing brrRepl. As indicated by the figure, the two replicated arrays have the same extent, which corresponds to the index space of matrix multiplication:

    (AB)_{i,j} = Σ_{k=1}^{n} A_{i,k} B_{k,j}

where i and j correspond to rowsA and colsB in our code. The summation index k corresponds to the innermost axis of the replicated arrays and to the left-to-right axis in Figure 2. Along this axis we perform the summation after an elementwise multiplication of the replicated elements of arr and brr by zipWith (*).

A naive implementation of the operations used in mmMult would result in very bad space and time performance. In particular, it would be very inefficient to compute explicit representations of the replicated matrices arrRepl and brrRepl. Indeed, a key principle of our design is to avoid generating explicit representations of intermediate arrays that can be represented as the original arrays combined with a transformation of the index space (Section 3.2). A rigorous application of this principle results in aggressive loop fusion (Section 6), producing code that is similar to imperative code. As a consequence, this Haskell code has about the same performance as handwritten C code for the same computation. Even more importantly, we measured very good absolute speedup, ×7.2 for 8 cores, on multicore hardware. For the C code, this can only be achieved through considerable additional effort or by employing special-purpose language extensions such as OpenMP [22], which often have difficulties with more complex programs.
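To round off the example, here is a small usage sketch of our own (not from the paper) built from the same Figure 1 API; reverseRows and its helper rev are hypothetical names:

-- Reverse each row of a matrix, in the same style as transpose2D:
-- only the index transformation changes; the extent stays the same.
reverseRows :: Elt e => Array DIM2 e -> Array DIM2 e
reverseRows arr
  = backpermute ext rev arr
  where
    ext@(Z :. _ :. n) = extent arr            -- rows :. columns
    rev (Z :. r :. c) = Z :. r :. (n - 1 - c) -- mirror the column index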
3. Representing arrays

The representation of arrays is a central issue in any array library. Our library uses two devices to achieve good performance:

1. We represent array data as contiguously-allocated ranges of unboxed values.
2. We delay the construction of intermediate arrays to support constant-time index transformations and slices, and to combine these operations with traversals over successive arrays.

We describe these two techniques in the following sections.

3.1 Unboxed arrays

In Haskell 98, arrays are lazy: each element of an array is evaluated only when the array is indexed at that position. Although convenient, laziness is Very Bad Indeed for array-intensive programs:

• A lazy array of (say) Float is represented as an array of pointers to either heap-allocated thunks or boxed Float objects, depending on whether they have been forced. This representation requires at least three times as much storage as a conventional, contiguous array of unboxed floats. Moreover, when iterating through the array, the lazy representation imposes higher memory traffic. This is due to the increased size of the individual elements, as well as their lower spatial locality.

• In a lazy array, evaluating one element does not mean that the other elements will be demanded. However, the overwhelmingly common case is that the programmer intends to demand the entire array, and wants it evaluated in parallel.
² For now, just read the notation (Z :.i :.j) as if it were the familiar pair (i,j). The details are in Section 4, where we discuss shape polymorphism.
We can solve both of these problems simultaneously using a Haskell-folklore trick. We define a new data type of arrays, which we will call UArr, short for "unboxed array". These arrays are one-dimensional, indexed by Int, and are slightly stricter than Haskell 98 arrays: a UArr as a whole is evaluated lazily, but an attempt to evaluate any element of the array (e.g. by indexing) will cause evaluation of all the others, in parallel. For the sake of definiteness we give a bare sketch of how UArr is implemented. However, this representation is not new; it is well established in the Haskell folklore, and we use it in Data Parallel Haskell (DPH) [5, 16], so we do not elaborate the details.

class Elt e where
  data UArr e
  (!) :: UArr e -> Int -> e
  ...more methods...

instance Elt Float where
  data UArr Float = UAF Int ByteArray#
  (UAF max ba) ! i
    | i < max   = F# (indexByteArray ba i)
    | otherwise = error "Index error"
  ...more methods...

instance (Elt a, Elt b) => Elt (a :*: b) where
  data UArr (a :*: b) = UAP (UArr a) (UArr b)
  (UAP a b) ! i = (a!i :*: b!i)
  ...more methods...

Here we make use of Haskell's recently added associated data types [4] to represent an array of Float as a contiguous array of unboxed floats (the ByteArray#), and an array of pairs as a pair of arrays. Because the representation of the array depends on the element type, indexing must vary with the element type too, which explains why the indexing operation (!) is in a type class Elt.

In addition to an efficient underlying array representation, we also need the infrastructure to operate on these arrays in parallel, using multiple processor cores. To that end we reuse part of our own parallel array library of Data Parallel Haskell. This provides us with an optimised implementation of UArr and the Elt class, and with parallel collective operations over UArr. It also requires us to represent pairs using the strict pair constructor (:*:), instead of Haskell's conventional (,).

3.2 Delayed arrays

When using Repa, index transformations such as transpose2D (discussed in Section 2) are ubiquitous. As we expect index transformations to be cheap, it would be wrong to (say) copy a 100MByte array just to transpose it. It is much better to push the index transformation into the consumer, which can then consume the original, unmodified array. We could do this transformation statically, at compile time, but doing so would rely on the consumer being able to "see" the index transformation. This could make it hard for the programmer to predict whether or not the optimisation would take place. In Repa we instead perform this optimisation dynamically, and offer a guarantee that index transformations perform no data movement. The idea is simple and well known: just represent an array by its indexing function, together with the array bounds (this is not our final array representation):

data DArray sh e = DArray sh (sh -> e)

With this representation, functions like backpermute (whose type signature appeared in Figure 1) are quite easy to implement:

backpermute sh' fn (DArray sh ix1) = DArray sh' (ix1 . fn)

We can also wrap a UArr as a DArray:

wrap :: (Shape sh, Elt e)
     => sh -> UArr e -> DArray sh e
wrap sh uarr = DArray sh idx
  where
    idx i = uarr ! toIndex sh i

When wrapping a DArray around a UArr, we also take the opportunity to generalise from one-dimensional to multi-dimensional arrays. The index of these multi-dimensional arrays is of type sh, where the Shape class (to be described in Section 4) includes the method toIndex :: Shape sh => sh -> sh -> Int. This method maps the bounds and index of an array to the corresponding linear Int index in the underlying UArr.

3.3 Combining the two

Unfortunately, there are at least two reasons why it is not always beneficial to delay an array operation. One is sharing, which we discuss later in Section 6. Another is data layout. In our mmMult example from Section 2, we want to delay the two applications of replicate, but not the application of transpose2D. Why? We store multi-dimensional arrays in row-major order (the same layout Haskell 98 uses for standard arrays). Hence, iterating over the second index of an array of rank 2 is more cache friendly than iterating over its first index. It is well known that the order of the loop nest in an imperative implementation of matrix-matrix multiplication has a dramatic effect on performance due to these cache effects. By forcing transpose2D to produce its result as an unboxed array in memory —we call this a manifest array— instead of leaving it as a delayed array, the code will traverse both matrices by iterating over the second index in the inner loop. Overall, we have the following implementation:

mmMult arr brr
  = sum (zipWith (*) arrRepl brrRepl)
  where
    trr     = force (transpose2D brr)   -- New! force!
    arrRepl = replicate (Z :.All :.colsB :.All) arr
    brrRepl = replicate (Z :.rowsA :.All :.All) trr
    (Z :.colsA :.rowsA) = extent arr
    (Z :.colsB :.rowsB) = extent brr

We could implement force by having it produce a value of type UArr and then apply wrap to turn it into a DArray again, providing the appropriate memory layout for a cache-friendly traversal. This would work, but we can do better. The function wrap uses array indexing to access the underlying UArr. In cases where this indexing is performed in a tight loop, GHC can optimise the code more thoroughly when it is able to inline the indexing operator, instead of calling an anonymous function encapsulated in the data type DArray. For recursive functions, this also relies on the constructor specialisation optimisation [15]. However, as explained in Coutts et al. [6, Section 7.2], to allow this we must make the special case of a wrapped UArr explicit in the datatype, so the optimiser can see whether or not it is dealing directly with a manifest array. Hence, we define regular arrays as follows:

data Array sh e
  = Manifest sh (UArr e)
  | Delayed  sh (sh -> e)

We can unpack an arbitrary Array into delayed form thus:

delay :: (Shape sh, Elt e)
      => Array sh e -> (sh, sh -> e)
delay (Delayed sh f)     = (sh, f)
delay (Manifest sh uarr) = (sh, \i -> uarr ! toIndex sh i)
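Although the library's collective operations are built on a richer interface (the traverse function shown next), the payoff of the delayed representation is already visible at this point. As a hedged illustration of our own (mapD is a hypothetical name, not part of the library):

-- Mapping over the bare delayed representation is just function
-- composition; no intermediate array is ever materialised.
mapD :: (a -> b) -> DArray sh a -> DArray sh b
mapD f (DArray sh g) = DArray sh (f . g)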
This is the basis for a general traverse function that produces a delayed array after applying a transformation. The transformation produced with traverse may include index space transformations or other computations:

traverse :: (Shape sh, Shape sh', Elt e)
         => Array sh e
         -> (sh -> sh')
         -> ((sh -> e) -> sh' -> e')
         -> Array sh' e'
traverse arr sh_fn elem_fn
  = Delayed (sh_fn sh) (elem_fn f)
  where
    (sh, f) = delay arr

We use traverse to implement many of the other operations of our library — for example, backpermute is implemented as:

backpermute :: (Shape sh, Shape sh', Elt e)
            => sh' -> (sh' -> sh)
            -> Array sh e -> Array sh' e
backpermute sh perm arr = traverse arr (const sh) (. perm)

We discuss the use of traverse in more detail in Sections 5 & 7.

4. Shapes and shape polymorphism

In Figure 1 we gave this type for sum:

sum :: (Shape sh, Num e, Elt e)
    => Array (sh:.Int) e -> Array sh e

As the type suggests, sum is a shape-polymorphic function: it can sum the rightmost axis of an array of arbitrary rank. In this section we describe how shape polymorphism works in Repa. We will see that the combination of parametric polymorphism, type classes, and type families enables us to track the rank of each array in its type, guaranteeing the absence of rank-related runtime errors. We can do this even in the presence of operations such as slicing and replication, which change the rank of an array. However, bounds checks on indices are still performed at runtime — tracking them requires more sophisticated type system support [20, 24].

4.1 Shapes and indices

Haskell's tuple notation does not allow us the flexibility we need, so we introduce our own notation for indices and shapes. As defined in Figure 3, we use an inductive notation of tuples as heterogeneous snoc lists. On both the type level and the value level, we use the infix operator (:.) to represent snoc. The constructor Z corresponds to a rank-zero shape, and we use it to mark the end of the list. Thus, a three-dimensional index with components x, y and z is written (Z:.x:.y:.z) and has type (Z:.Int:.Int:.Int). This type is the shape of the array. Figure 3 gives type synonyms for common shapes: a singleton array of shape DIM0 represents a scalar value; an array of shape DIM1 is a vector, and so on.

infixl 3 :.
data Z = Z
data tail :. head = tail :. head

type DIM0 = Z
type DIM1 = DIM0 :. Int
type DIM2 = DIM1 :. Int
type DIM3 = DIM2 :. Int

class Shape sh where
  rank      :: sh -> Int
  size      :: sh -> Int        -- Number of elements
  toIndex   :: sh -> sh -> Int  -- Index into row-major representation
  fromIndex :: sh -> Int -> sh  -- Inverse of 'toIndex'
  <..and so on..>

instance Shape Z where ...
instance Shape sh => Shape (sh:.Int) where ...

Figure 3. Definition of shapes

The motivation for using snoc lists, rather than the more conventional cons lists, is this: we store manifest arrays in row-major order, where the rightmost index is the most rapidly-varying when traversing linearly over the array in memory. For example, the value at index (Z:.3:.8) is stored adjacent to that at (Z:.3:.9). This is the same convention adopted by Haskell 98 standard arrays. We draw array indices from Int values only, so the shape of a rank-n array is:

    Z :. Int :. · · · :. Int   (n occurrences of Int)

In principle, we could be more general and allow non-Int indices, like Haskell's index type class Ix. However, this would complicate the library and the presentation, and is orthogonal to the contributions of this paper; so we do not consider it here. Nevertheless, shape types, such as DIM2 etc., explicitly mention the Int type. This is for two reasons: firstly, it simplifies the transition to using the Ix class if that is desired; and secondly, in Section 4.4 we discuss more elaborate shape constraints that require an explicit index type.

The extent of an array is a value of the shape type:

extent :: Array sh e -> sh

The corresponding Haskell 98 function, bounds, returns an upper and lower bound, whereas extent returns only the upper bound. Repa uses zero-indexed arrays only, so the lower bound is always zero. For example, the extent (Z:.4:.5) characterises a 4 × 5 array of rank two containing 20 elements. The extent along each axis must be at least one.

The shape type of an array also types its indices, which range between zero and one less than the extent along the same axis. In other words, given an array with shape (Z:.n1:.· · ·:.nm), its index range is from (Z:.0:.· · ·:.0) to (Z:.n1−1:.· · ·:.nm−1). As indicated in Figure 3, the methods of the Shape type class determine properties of shapes and indices, very like Haskell's Ix class. These methods are used to allocate arrays, to index into their row-major in-memory representations, and to traverse index spaces; they are entirely as expected, so we omit the details.
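As a concrete illustration, here is a sketch of our own (not the paper's code) of how the omitted instance for (sh :. Int) could realise the row-major convention just described:

-- Hypothetical fill-in for the omitted instance: the rightmost index
-- varies fastest, matching the row-major storage order.
instance Shape sh => Shape (sh :. Int) where
  rank      (sh :. _)           = rank sh + 1
  size      (sh :. n)           = size sh * n
  toIndex   (sh :. n) (ix :. i) = toIndex sh ix * n + i
  fromIndex (sh :. n) k         = fromIndex sh (k `div` n) :. (k `mod` n)

For example, with extent (Z:.4:.5) this maps index (Z:.3:.2) to linear index 3*5+2 = 17, so (Z:.3:.2) and (Z:.3:.3) are adjacent in memory.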
4.2 Shape polymorphism

We call functions that operate on a variety of shapes shape polymorphic. Some such functions work on arrays of any shape at all. For example, here is the type of map:

map :: (Shape sh, Elt a, Elt b)
    => (a -> b) -> Array sh a -> Array sh b

The function map applies its functional argument to all elements of an array without any concern for the shape of the array. The type class constraint Shape sh merely asserts that the type variable sh ought to be a shape. It does not constrain the shape of that shape in any way.

4.3 At-least constraints and rank generalisation

With indices as snoc lists, we can impose a lower bound on the rank of an array by fixing a specific number of lower dimensions, but keeping the tail of the resulting snoc list variable. For example, here is the type of sum:

sum :: (Shape sh, Num e, Elt e)
    => Array (sh:.Int) e -> Array sh e

This says that sum takes an array of any rank n ≥ 1 and returns an array of rank n − 1. For a rank-1 array (a vector), sum adds up the vector to return a scalar. But what about a rank-2 array? In this case, sum adds up all the rows of the matrix in parallel, returning a vector of the sums. Similarly, given a three-dimensional array, sum adds up each row of the array in parallel, returning a two-dimensional array of sums.

Functions like sum impose a lower bound on the rank of an array. We call such constraints shape-polymorphic at-least constraints. Every shape-polymorphic function with an at-least constraint is implicitly also a data-parallel map over the unspecified dimensions. This is a major source of parallelism in Repa. We call the process of generalising the code defined for the minimum rank to higher ranks rank generalisation.

The function sum only applies to the rightmost index of an array. What if we want to reduce the array across a different dimension? In that case we simply perform an index permutation, which is guaranteed cheap, to bring the desired dimension to the rightmost position:

sum2 :: (Shape sh, Elt e, Num e)
     => Array (sh:.Int:.Int) e -> Array (sh:.Int) e
sum2 a = sum (backpermute new_extent swap2 a)
  where
    new_extent           = swap2 (extent a)
    swap2 (is :.i2 :.i1) = is :.i1 :.i2

In our examples so far, we have sometimes returned arrays of a different rank than the input, but their extent in any one dimension has always been unchanged. However, shape-polymorphic functions can also change the extent:

selEven :: (Shape sh, Elt e)
        => Array (sh:.Int) e -> Array (sh:.Int) e
selEven arr = backpermute new_extent expand arr
  where
    (ns :.n)        = extent arr
    new_extent      = ns :.(n `div` 2)
    expand (is :.i) = is :.(i * 2)

As we can see from the calculation of new_extent, the array returned by selEven is half as big as the input array in the rightmost dimension. The index calculation goes in the opposite direction, selecting every alternate element from the input array. Note carefully that the extent of the new array is calculated from the extent of the old array, but not from the data in the array. This guarantees we can do rank generalisation and still have a rectangular array. To see the difference, consider:

filter :: Elt e => (e -> Bool) -> Array DIM1 e -> Array DIM1 e

The filter function is not, and cannot be, shape-polymorphic. If we filter each row of a matrix based on the element values, then each new row may have a different length. This gives no guarantee that the resulting matrix is rectangular. In contrast, we have carefully chosen our shape-polymorphic primitives to guarantee the rectangularity of the output.

4.4 Slices and slice constraints

Shape types characterise a single shape. However, some collective array operations require a relationship between pairs of shapes. One such operation is replicate, which we used in mmMult. The function replicate takes an array of arbitrary rank and replicates it along one or more additional dimensions. Note that we cannot uniquely determine the behaviour of replicate from the shape of the original and resulting arrays alone. For example, suppose that we want to expand a rank-2 array into a rank-3 array. There are three ways of doing this, depending on which dimension of the resulting array is to be duplicated. Indeed, the two calls to replicate in mmMult performed replication along two different dimensions, corresponding to different sides of the cuboid in Figure 2.

It should be clear that replicate needs an additional argument, a slice specifier, that expresses exactly how the shape of the result array depends on the shape of the argument array. A slice specifier has the same format as an array index, but some index positions may use the value All instead of a numeric index:

data All = All

In mmMult, we use replicate (Z:.All:.colsB:.All) arr to indicate that we replicate arr across the second innermost axis, colsB times. We use replicate (Z:.rowsA:.All:.All) trr to specify that we replicate trr across the outermost axis, rowsA times.

The type of the slice specifier (Z :.All :.colsB :.All) is (Z :.All :.Int :.All). This type is sufficiently expressive to determine the shape of both the original array, before it gets replicated, and of the replicated array. More precisely, both of these types are a function of the slice specifier type. In fact, we derive these shapes using associated type families, a recent extension to the Haskell type system [3, 19], using the definition for the Slice type class shown in Figure 4.

A function closely related to replicate is slice, which extracts a slice along multiple axes of an array. The full types of replicate and slice appear in Figure 4. We chose their argument order to match that used for lists: replicate is a generalisation of Data.List.replicate, while slice is a generalisation of Data.List.(!!).

Finally, to enable rank generalisation for replicate and slice, we add a last slice specifier, namely Any, which is also defined in Figure 4. It is used in the tail position of a slice, just like Z, but gives a shape variable for rank generalisation. With its aid we can write repN, which replicates an arbitrary array n times, with the replication being on the rightmost dimension of the result array:

repN :: Int -> Array sh e -> Array (sh:.Int) e
repN n a = replicate (Any:.n) a

data All    = All
data Any sh = Any

type family FullShape ss
type instance FullShape Z           = Z
type instance FullShape (Any sh)    = sh
type instance FullShape (sl :. Int) = FullShape sl :. Int
type instance FullShape (sl :. All) = FullShape sl :. Int

type family SliceShape ss
type instance SliceShape Z           = Z
type instance SliceShape (Any sh)    = sh
type instance SliceShape (sl :. Int) = SliceShape sl
type instance SliceShape (sl :. All) = SliceShape sl :. Int

class Slice ss where
  sliceOfFull :: ss -> FullShape ss -> SliceShape ss
  fullOfSlice :: ss -> SliceShape ss -> FullShape ss

instance Slice Z where ...
instance Slice (Any sh) where ...

instance Slice sl => Slice (sl :. Int) where
  sliceOfFull (fsl :. _) (ssl :. _) = sliceOfFull fsl ssl
  fullOfSlice (fsl :. n) ssl        = fullOfSlice fsl ssl :. n

instance Slice sl => Slice (sl :. All) where
  sliceOfFull (fsl :. All) (ssl :. s) = sliceOfFull fsl ssl :. s
  fullOfSlice (fsl :. All) (ssl :. s) = fullOfSlice fsl ssl :. s

replicate :: ( Slice sl, Elt e
             , Shape (FullShape sl)
             , Shape (SliceShape sl))
          => sl -> Array (SliceShape sl) e
          -> Array (FullShape sl) e
replicate sl arr
  = backpermute (fullOfSlice sl (extent arr))
                (sliceOfFull sl) arr

slice :: ( Slice sl, Elt e
         , Shape (FullShape sl)
         , Shape (SliceShape sl))
      => Array (FullShape sl) e -> sl
      -> Array (SliceShape sl) e
slice arr sl
  = backpermute (sliceOfFull sl (extent arr))
                (fullOfSlice sl) arr

Figure 4. Definition of slices
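Slice specifiers also make common sub-array selections one-liners. As a usage sketch of our own (row and column are hypothetical helpers), fixing one position with an Int and keeping the other as All extracts a row or a column of a matrix:

-- With sl = Z :. Int :. All we have FullShape sl = DIM2 and
-- SliceShape sl = DIM1, so the rank drops by one as expected.
row :: Elt e => Array DIM2 e -> Int -> Array DIM1 e
row arr i = slice arr (Z :. i :. All)

column :: Elt e => Array DIM2 e -> Int -> Array DIM1 e
column arr j = slice arr (Z :. All :. j)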
5. Rectangular arrays, purely functional

As mentioned in Section 3, the type class Elt determines the set of types that can be used as array elements. We adopt Elt from the library of unboxed one-dimensional arrays in Data Parallel Haskell. With this library, array elements can be of the basic numeric types, Bool, and pairs formed from the strict pair constructor:

data a :*: b = !a :*: !b

We have also extended this to support index types, formed from Z and (:.), as array elements. Although it would be straightforward to allow other product and enumeration types as well, support for general sum types appears impractical in a framework based on regular arrays. Adding this would require irregular arrays and nested data parallelism [16].

Table 1 summarises the central functions of our library Repa. They are grouped according to the structure of the implemented array operations. We discuss the groups and their members in the following sections.
5.1 Structure-preserving operations

The simplest group of array operations are those that apply a transformation to individual elements without changing the shape, array size, or order of the elements. We have the plain map function, zip for element-wise pairing, and a family of zipWith functions that apply workers of different arity over multiple arrays in lockstep. In the case of zip and zipWith, we determine the shape value of the result by intersecting the shapes of the arguments — that is, we take the minimum extent along every axis. This behaviour is the same as Haskell's zip functions when applied to lists. The function map is implemented as follows:

map :: (a -> b) -> Array sh a -> Array sh b
map f arr = traverse arr id (f .)

The various zip functions are implemented in a similar manner, although they also use a method of the Shape type class to compute the intersection shape of the arguments.
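To make the intersection behaviour concrete, here is a sketch of our own over the internal representation from Section 3.3; intersectDim is a hypothetical name for the Shape method alluded to above:

-- Pair two arrays elementwise; the result extent is the elementwise
-- minimum of the two argument extents.
zipD :: (Shape sh, Elt a, Elt b)
     => Array sh a -> Array sh b -> Array sh (a :*: b)
zipD arr brr = Delayed sh (\ix -> fa ix :*: fb ix)
  where
    (sha, fa) = delay arr
    (shb, fb) = delay brr
    sh        = intersectDim sha shb  -- hypothetical intersection method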
5.2 Reductions

Our library, Repa, provides two kinds of reductions: (1) generic reductions, such as foldl, and (2) specialised reductions, such as sum. In a purely sequential implementation, the latter would be implemented in terms of the former. However, in the parallel case we must be careful. Reductions of an n-element array can be computed with parallel tree reduction, providing log n asymptotic step complexity in the ideal case, but only if the reduction operator is associative. Unfortunately, Haskell's type system does not provide a way to express this side condition on the first argument of foldl. Hence, the generic reduction functions must retain their sequential semantics to remain deterministic. In contrast, for specialised reductions such as sum, where we know that the operators they use meet the associativity requirement, we can use parallel tree reduction.

As outlined in Section 4.3, all reduction functions are defined with a shape-polymorphic at-least constraint and admit rank generalisation. Therefore, even generic reductions, with their sequential semantics, are highly parallel if used with rank generalisation. Rank generalisation also affects specialised reductions, as they can be implemented in one of the following two ways. Firstly, if we want to maximise parallelism, then we can use a segmented tree reduction that conceptually performs multiple parallel tree reductions concurrently. Alternatively, we can simply use the same scheme as for general reductions, and perform all rank-one reductions in parallel. We follow the latter approach and sacrifice some parallelism, as tree reductions come with some sequential overhead.

In summary, when applied to an array of rank one, generic reductions (foldl etc.) execute purely sequentially with an asymptotic step complexity of n, whereas specialised reductions (sum etc.) execute in parallel using a tree reduction with an asymptotic step complexity of log n. In contrast, when applied to an array of rank strictly greater than one, both generic and specialised reductions use rank generalisation to execute many sequential reductions on one-dimensional subarrays concurrently.
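For intuition, the relationship between the two kinds of reduction can be pinned down by a sequential specification (ours, not the library's definition; the real sum uses parallel tree reduction instead):

-- Semantically, sum is the generic left fold from Table 1 instantiated
-- with (+) and 0; only the execution strategy differs.
sumSpec :: (Shape sh, Elt e, Num e)
        => Array (sh :. Int) e -> Array sh e
sumSpec = foldl (+) 0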
5.3 Index space transformations

The structure-preserving operations and the reductions transform array elements, whereas index space transformations only alter the index at which an element is placed — that is, they rearrange and possibly drop elements. A prime example of this group of operations is reshape, which imposes a new shape on the elements of an array. A precondition of reshape is that the size of the extent of the old and new array is the same, meaning that the number of elements stays the same:

reshape :: Shape sh => sh -> Array sh' e -> Array sh e
reshape sh' (Manifest sh ua)
  = assert (size sh == size sh') $
    Manifest sh' ua
reshape sh' (Delayed sh f)
  = assert (size sh == size sh') $
    Delayed sh' (f . fromIndex sh . toIndex sh')
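As a small usage sketch of our own (flatten is a hypothetical helper), reshape together with the size method from Figure 3 gives a rank-collapsing flatten whose precondition holds by construction:

-- Flatten any array to a vector of size (extent arr) elements.
flatten :: (Shape sh, Elt e) => Array sh e -> Array DIM1 e
flatten arr = reshape (Z :. size (extent arr)) arr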
Structure-preserving operations

map     :: (Shape sh, Elt a, Elt b)
        => (a -> b) -> Array sh a -> Array sh b
        -- Apply function to every array element.
zip     :: (Shape sh, Elt a, Elt b)
        => Array sh a -> Array sh b -> Array sh (a :*: b)
        -- Elementwise pairing.
zipWith :: (Shape sh, Elt a, Elt b, Elt c)
        => (a -> b -> c) -> Array sh a -> Array sh b -> Array sh c
        -- Apply a function elementwise to two arrays
        -- (the resulting shape is the intersection).
<Other map-like operations: zipWith3, zipWith4, and so on>

Reductions

foldl :: (Shape sh, Elt a, Elt b)
      => (a -> b -> a) -> a -> Array (sh:.Int) b -> Array sh a
      -- Left fold.
<Other reduction schemes: foldr, foldl1, foldr1, scanl, scanr, scanl1 & scanr1>
sum   :: (Shape sh, Elt e, Num e)
      => Array (sh:.Int) e -> Array sh e
      -- Sum an array along its innermost axis.
<Other specific reductions: product, maximum, minimum, and & or>

Index space transformations

reshape        :: (Shape sh, Shape sh', Elt e)
               => sh -> Array sh' e -> Array sh e
               -- Impose a new shape on the same elements.
replicate      :: (Slice sl, Shape (FullShape sl), Shape (SliceShape sl))
               => sl -> Array (SliceShape sl) e -> Array (FullShape sl) e
               -- Extend an array along new dimensions.
slice          :: (Slice sl, Shape (FullShape sl), Shape (SliceShape sl))
               => Array (FullShape sl) e -> sl -> Array (SliceShape sl) e
               -- Extract a subarray according to a slice specification.
(+:+)          :: Shape sh => Array sh e -> Array sh e -> Array sh e
               -- Append a second array to the first.
backpermute    :: (Shape sh, Shape sh', Elt e)
               => sh' -> (sh' -> sh) -> Array sh e -> Array sh' e
               -- Backwards permutation.
backpermuteDft :: (Shape sh, Shape sh', Elt e)
               => Array sh' e -> (sh' -> Maybe sh) -> Array sh e -> Array sh' e
               -- Default backwards permutation.
unit           :: Elt e => e -> Array Z e
               -- Wrap a scalar into a singleton array.
(!:)           :: (Shape sh, Elt e) => Array sh e -> sh -> e
               -- Extract an element at a given index.

General traversal

traverse :: (Shape sh, Shape sh', Elt e)
         => Array sh e -> (sh -> sh') -> ((sh -> e) -> sh' -> e')
         -> Array sh' e'
         -- Unstructured traversal.
force    :: (Shape sh, Elt e) => Array sh e -> Array sh e
         -- Force a delayed array into manifest form.
extent   :: Array sh e -> sh
         -- Obtain the size in all dimensions of an array.

Table 1. Summary of array operations

The functions toIndex and fromIndex are methods of the class Shape from Figure 3. The functions replicate and slice were already discussed in Section 4.4, and unit and (!:) are defined as follows:
unit :: Elt e => e -> Array Z e
unit = Delayed Z . const

(!:) :: (Shape sh, Elt e) => Array sh e -> sh -> e
arr !: ix = snd (delay arr) ix

A simple operator to rearrange elements is the function (+:+); it appends its second argument to the first and can be implemented with traverse by adjusting shapes and indexing. In contrast, general shuffle operations, such as backwards permutation, require the detailed mapping of target to source indices. We have seen this in the example transpose2D in Section 2. Another example is the following function that extracts the diagonal of a square matrix:

diagonal :: Elt e => Array DIM2 e -> Array DIM1 e
diagonal arr
  = assert (width == height) $
    backpermute (Z :. width) (\(Z :. x) -> Z :. x :. x) arr
  where
    _ :. height :. width = extent arr

Code that uses backpermute appears more like element-based array processing. However, it is still a collective operation with a clear parallel interpretation. Backwards permutation is defined in terms of the general traverse as follows:

backpermute :: sh' -> (sh' -> sh) -> Array sh e -> Array sh' e
backpermute sh perm arr = traverse arr (const sh) (. perm)

The variant backpermuteDft, known as default backwards permutation, operates in a similar manner, except that the target-to-source index mapping is partial. When the target index maps to Nothing, the corresponding element from the default array is used. Overall, backpermuteDft can be interpreted as a means to bulk-update the contents of an array. As we are operating on purely functional, immutable arrays, the original array is still available, and the repeated use of backpermuteDft is only efficient if large parts of the array are updated on each use.
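As a usage sketch of our own (shiftR1 is a hypothetical helper), here is backpermuteDft with the type from Table 1, shifting a vector right by one position and taking the vacated slot from a default array of zeros:

shiftR1 :: (Elt e, Num e) => Array DIM1 e -> Array DIM1 e
shiftR1 arr = backpermuteDft zeros pick arr
  where
    zeros = map (const 0) arr           -- default array with the same extent
    pick (Z :. i)
      | i > 0     = Just (Z :. (i-1))   -- element i comes from position i-1
      | otherwise = Nothing             -- no source: fall back to the default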
5.4 General traversal

The most general form of array traversal is traverse, which supports an arbitrary change of shape and array contents. Nevertheless, it is still represented as a delayed computation, as detailed in Section 3.3. Although for efficiency reasons it is better to use specific functions such as map or backpermute, it is always possible to fall back on traverse if a custom computational structure is required. For example, traverse can be used to implement stencil-based relaxation methods, such as the following update function to solve the Laplace equation in a two-dimensional grid [14]:

    u'(i, j) = (u(i−1, j) + u(i+1, j) + u(i, j−1) + u(i, j+1)) / 4

To implement this stencil, we use traverse as follows:

stencil :: Array DIM2 Double -> Array DIM2 Double
stencil arr = traverse arr id update
  where
    _ :. height :. width = extent arr

    update get d@(sh :. i :. j)
      = if isBoundary i j
          then get d
          else (get (sh :. (i-1) :. j)
              + get (sh :. i     :. (j-1))
              + get (sh :. (i+1) :. j)
              + get (sh :. i     :. (j+1))) / 4

    isBoundary i j
      =  (i == 0) || (i >= width - 1)
      || (j == 0) || (j >= height - 1)

As the shape of the result array is the same as that of the input, the second argument to traverse is id. The third argument is the update function that implements the stencil, while taking the grid boundary into account. The function get, passed as the first argument to update, is the lookup function for the input array.

To solve the Laplace equation we would set boundary conditions along the edges of the grid and then iterate stencil until the inner elements converge to their final values. However, for benchmarking purposes we simply iterate it a fixed number of times:

laplace :: Int -> Array DIM2 Double -> Array DIM2 Double
laplace steps arr = go steps arr
  where
    go s arr
      | s == 0    = arr
      | otherwise = go (s-1) (force $ stencil arr)

The use of force after each recursion is important, as it ensures that all updates are applied and that we produce a manifest array. Without it, we would accumulate a long chain of delayed computations with a rather non-local memory access pattern. In Repa, the function force triggers all computation, and as we will discuss in Section 7, the size of the forced array determines the amount of parallelism in an algorithm.

6. Delayed arrays and loop fusion

We motivated the use of delayed arrays in Section 3.2 by the desire to avoid superfluous copying of array elements during index space transformations, such as in the definition of backpermute. However, another major benefit of delayed arrays is that they give by-default automatic loop fusion. Recall the implementation of map:

map :: (a -> b) -> Array sh a -> Array sh b
map f arr = traverse arr id (f .)

Now, imagine evaluating (map f (map g a)). If you consult the definition of traverse (Section 3.3), it should be clear that the two maps simply build a delayed array whose indexing function first indexes a, then applies g, and then applies f. No intermediate arrays are allocated and, in effect, the two loops have been fused. Moreover, this fusion does not require a sophisticated compiler transformation, nor does it require the two calls of map to be statically juxtaposed; fusion is a property of the data representation.

Guaranteed, automatic fusion sounds too good to be true — and so it is. The trouble is that we cannot always use the delayed representation for arrays. One reason not to delay arrays is data layout, as we discussed in Section 3.3. Another is parallelism: force triggers data-parallel execution (Section 7). But the most immediately pressing problem with the delayed representation is sharing. Consider the following:

let b = map f a in mmMult b b

Every access to an element of b will apply the (arbitrarily expensive) function f to the corresponding element of a. It follows that these arbitrarily expensive computations will be done at least twice, once for each argument of mmMult, quite contrary to the programmer's intent. Indeed, if mmMult itself consumes elements of its arguments in a non-linear way, accessing them more than once, the computation of f will be performed each time. If instead we say:

let b = force (map f a) in mmMult b b

then the now-manifest array b ensures that f is called only once for each element of a. In effect, a manifest array is simply a memo table for a delayed array. Here is how we see the situation:

• In most array libraries, every array is manifest by default, so that sharing is guaranteed. However, loop fusion is difficult, and must often be done manually, doing considerable violence to the structure of the program.

• In Repa every array is delayed by default, so that fusion is guaranteed. However, sharing may be lost; it can be restored manually by adding calls to force. These calls do not affect the structure of the program.

By using force, Repa allows the programmer tight control over some crucial aspects of the program: sharing, data layout, and parallelism. The cost is, of course, that the programmer must exercise that control to get good performance. Ignoring the issue altogether can be disastrous, because it can lead to arbitrary loss of sharing. In further work, beyond the scope of this paper, we are developing a compromise approach that offers guaranteed sharing with aggressive (but not guaranteed) fusion.
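To spell the fusion claim out, the following is an equational trace of our own, using only the definitions of map, traverse, and delay given above:

-- map g a         ==> Delayed sh (g . look)      where (sh, look) = delay a
-- map f (map g a) ==> Delayed sh (f . (g . look))
--
-- The two traversals collapse into a single delayed array whose indexing
-- function is the composition; nothing is allocated until force (or
-- indexing) demands the elements.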
7. Parallelism

As described in Section 3.1, all elements of a Repa array are demanded simultaneously. This is the source of all parallelism in the library. In particular, an application of the function force triggers the parallel evaluation of a delayed array, producing a manifest one. Assuming that the array has n elements and that we have P parallel processing elements (PEs) available to perform the work, each PE is responsible for computing n/P consecutive elements in the row-major layout of the manifest array. In other words, the structure of parallelism is always determined by the layout and partitioning of a forced array. The execution strategy is based on gang parallelism and is described in detail in [5].
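Concretely, a sketch of our own of the index range PE p would own under this scheme (chunkOf is a hypothetical name; the real scheduling lives in the gang-parallelism layer of DPH [5]):

-- Consecutive chunk of the row-major layout assigned to PE p,
-- for 0 <= p < nPEs; returns (first index, one past the last).
chunkOf :: Int -> Int -> Int -> (Int, Int)
chunkOf n nPEs p = (p * n `div` nPEs, (p + 1) * n `div` nPEs)

-- e.g. chunkOf 10 4 <$> [0..3]  ==  [(0,2),(2,5),(5,7),(7,10)]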
Let us reconsider the function mmMult from Section 3.3 and Figure 2 in this light. We assume that arr is a manifest array, and know that trr is manifest because of the explicit use of force. The rank-2 array produced by the rank-generalised application of sum corresponds to the right face of the cuboid from Figure 2. Hence, if we force the result of mmMult, the degree of available parallelism is proportional to the number of elements of the resulting array — 8 in the figure. As long as the hardware provides a sufficient number of PEs, each of these elements may be computed in parallel. Each involves the element-wise multiplication of a row from arr with a row from trr and the summation of these products. If the hardware provides fewer PEs, which is usually the case, the evaluation is evenly distributed over the available PEs.

Let us now turn to a more sophisticated parallel algorithm, the three-dimensional fast Fourier transform (FFT). The three-dimensional FFT works on one axis at a time: we apply the one-dimensional FFT to all vectors along one axis, then the second, and then the third. Instead of writing a separate transform for each dimension, we implement the one-dimensional FFT as a shape-polymorphic function that operates on the innermost axis. We combine it with a three-dimensional rotation, rotate3D, which allows us to cover all three axes one after another:

fft3D :: Array DIM3 Complex   -- roots of unity
      -> Array DIM3 Complex   -- data to transform
      -> Array DIM3 Complex
fft3D rofu
  = fftTrans . fftTrans . fftTrans
  where
    fftTrans = rotate3D . fft1D rofu

The first argument, rofu, is an array of complex roots of unity, which are constants that we wish to avoid recomputing for each call. The second is the three-dimensional array to transform, and we require both arrays to have the same shape. We also require each dimension to have a size which is a power of 2.

If the result of fft3D is forced, evaluation by P PEs is again on P consecutive segments of length n³/P of the row-major layout of the transformed cube, where n is the side length of the cube. However, the work that needs to be performed for each of the elements is harder to characterise than for mmMult, as the computations of the individual elements of the result are not independent and as fft1D uses force internally.

Three-dimensional rotation is easily defined based on the function backpermute, which we discussed previously:

rotate3D :: Array DIM3 Complex -> Array DIM3 Complex
rotate3D arr = backpermute (Z:.m:.k:.l) f arr
  where
    (Z:.k:.l:.m)   = extent arr
    f (Z:.m:.k:.l) = (Z:.k:.l:.m)

The one-dimensional fast Fourier transform is more involved: it requires us to recursively split the input vector in half and apply the transform to the split vectors. To facilitate the splitting, we first define a function halve that drops half the elements of a vector, where the elements to pick from the original are determined by a selector function sel:

halve :: (sh:.Int -> sh:.Int)
      -> Array (sh:.Int) Complex
      -> Array (sh:.Int) Complex
halve sel arr = backpermute (sh :. n `div` 2) sel arr
  where
    sh:.n = extent arr

By virtue of rank generalisation, this shape-polymorphic function will split all rows of a three-dimensional cube at once and in the same manner. The following two convenience functions use halve to extract all elements in even and odd positions, respectively:

evenHalf, oddHalf :: Array (sh:.Int) Complex
                  -> Array (sh:.Int) Complex
evenHalf = halve (\(ix:.i) -> ix :. 2*i)
oddHalf  = halve (\(ix:.i) -> ix :. 2*i+1)

Now, the definition of the one-dimensional transform is a direct encoding of the Cooley-Tukey algorithm:

fft1D :: Array (sh:.Int) Complex
      -> Array (sh:.Int) Complex
      -> Array (sh:.Int) Complex
fft1D rofu v
  | n > 2  = (left +^ right) +:+ (left -^ right)
  | n == 2 = traverse v id swivel
  where
    (_ :. n) = extent v

    swivel f (ix:.0) = f (ix:.0) + f (ix:.1)
    swivel f (ix:.1) = f (ix:.0) - f (ix:.1)

    rofu' = evenHalf rofu
    left  = force . fft1D rofu' . evenHalf $ v
    right = force . (*^ rofu) . fft1D rofu' . oddHalf $ v

(+^) = zipWith (+)
(-^) = zipWith (-)
(*^) = zipWith (*)

All the index space transformations that are implemented in terms of backpermute, as well as the elementwise arithmetic operations based on zipWith, produce delayed arrays. It is only the use of force in the definition of left and right that triggers the parallel evaluation of subcomputations. In particular, as we force the recursive calls in the definition of left and right separately, these calls are performed in sequence. The rank-generalised input vector v is halved with each recursive call, and hence the amount of available parallelism decreases. However, keep in mind that —by virtue of rank generalisation— we perform the one-dimensional transform in parallel on all vectors of a cuboid. That is, if we apply fft3D to a 64 × 64 × 64 cube, then fft1D still operates on 64 ∗ 64 ∗ 2 = 8192 complex numbers in one parallel step at the base case, where n = 2.
8. Benchmarks

In this section, we discuss the performance of three programs presented in this paper: matrix-matrix multiplication from Section 3.3, the Laplace solver from Section 5.4, and the fast Fourier transform from Section 7. We ran the benchmarks on two different machines:

• a 2x Quad-core 3GHz Xeon server and
• a 1.4GHz UltraSPARC T2.

The first machine is a typical x86-based server with good single-core performance but frequent bandwidth problems in memory-intensive applications. The bus architecture directly affects the scalability of some of our benchmarks, namely, the Laplace solver,
which cannot utilise multiple cores well due to bandwidth limitations. The SPARC-based machine is more interesting. The T2 processor has 8 cores and supports up to 8 hardware threads per core. This allows it to effectively hide memory latency in massively multithreaded programs. Thus, despite a significantly worse single-core performance than the Xeon, it exhibits much better scalability, which is clearly visible in our benchmarks.

8.1 Absolute performance

Before discussing the parallel behaviour of our benchmarks, let us investigate how Repa programs compare to hand-written C code when executed with only one thread. The C matrix-matrix multiplication and Laplace solver are straightforwardly written programs, while the FFT uses FFTW 3.2.2 [8] in "estimate" mode. Figure 5 shows the single-threaded results together with the fastest running times obtained through parallel execution. For matrix multiplication and Laplace on the Xeon, Repa is slower than C when executed sequentially, but not by much. FFTW uses a finely tuned in-place algorithm, which is significantly faster but more complicated than our own direct encoding of the Cooley-Tukey algorithm. We include the numbers with FFTW for comparative purposes, but note that parallelism is no substitute for a more efficient algorithm.

Compared with the Xeon, the results on the SPARC (Figure 6) are quite different. The SPARC T2 is a "throughput" machine, designed to execute workloads consisting of many concurrent threads. It has half the clock rate of the Xeon, and does not exploit instruction-level parallelism. This shows in the fact that the single-threaded C programs run about 10x slower than their Xeon counterparts. The SPARC T2 also does not perform instruction reordering or use speculative execution. GHC does not perform compile-time scheduling to account for this, which results in a larger gap between the single-threaded C and Repa programs than on the Xeon.

We have also compared the performance of the Laplace solver to an alternative, purely sequential Haskell implementation based on unboxed, mutable arrays running in the IO monad (IOUArray). This version was about two times slower than the Repa program, probably due to the overhead introduced by bounds checking, which is currently not supported by our library. Note, however, that bounds checking is unnecessary for many collective operations such as map and sum, so even after we introduce it in Repa we still expect to see better performance than a low-level, imperative implementation based on mutable arrays.

                          GCC 4.2.1   Repa 1 thread   Repa fastest parallel
Matrix mult  1024×1024    3.8s        4.6s            0.64s
Laplace      300×300      0.70s       1.7s            0.68s
FFT          128×128×128  0.24s       8.8s            2.0s

Figure 5. Performance on the Xeon

                          GCC 4.1.2   Repa 1 thread   Repa fastest parallel
Matrix mult  1024×1024    53s         92s             2.4s
Laplace      300×300      6.5s        32s             3.8s
FFT          128×128×128  2.4s        98s             7.7s

Figure 6. Performance on the SPARC
Figure 7. Matrix-matrix multiplication, size 1024×1024. (Speedup-versus-threads plots on the Xeon, 1–8 threads, and on the UltraSPARC T2, 1–64 threads on 8 PEs; only the caption, panel titles, and axis labels are recoverable from the extraction.)
8.2 Parallel behaviour

The parallel performance of matrix multiplication is shown in Figure 7.³ Each point shows the lowest, average, and highest speedups for ten consecutive runs. Here, we get excellent scalability on both machines. On the Xeon, we achieve a speedup of 7.2 with 8 threads. On the SPARC, it scales up to 64 threads with a peak speedup of 38.

Figure 8 shows the relative speedups for the Laplace solver. This program achieves good scalability on the SPARC, reaching a speedup of 8.4 with 14 threads, but performs much worse on the Xeon, stagnating at a speedup of 2.5. As Laplace is memory bound, we attribute this behaviour to insufficient bandwidth on the Xeon machine. There is also some variation in the speedups from run to run, which is more pronounced when using specific numbers of threads. We attribute this to scheduling effects, in the hardware, OS, and GHC runtime system.

Finally, the parallel behaviour of the FFT implementation is shown in Figure 9. This program scales well on both machines, achieving a relative speedup of 4.4 with 7 threads on the Xeon and 12.7 with 14 threads on the SPARC. Compared to the Laplace solver, this time the scalability is much better on the Xeon but practically unchanged on the SPARC. Note that FFT is less memory intensive than Laplace. The fact that Laplace with a 300 × 300 matrix does not scale as well on the Xeon as it does on the SPARC supports our conclusion that this benchmark suffers from lack of memory bandwidth. For Laplace with a 400 × 400 matrix on the SPARC, we suspect the sharp drop-off after 8 threads is due to the added threads contending for cache. As written, the implementation from Section 5.4 operates on a row-by-row basis. We expect that changing to a block-wise algorithm would improve cache usage and reduce the bandwidth needed.

³ Yes, those really are the results in the first graph of the figure.
Figure 8. Laplace solver, 1000 iterations. (Speedup-versus-threads plots on the Xeon and on the UltraSPARC T2, for sizes 300×300 and 400×400; only the caption, panel titles, and axis labels are recoverable from the extraction.)

Figure 9. 3D Fast Fourier Transform, size 128×128×128. (Speedup-versus-threads plots on the Xeon and on the UltraSPARC T2; only the caption, panel titles, and axis labels are recoverable from the extraction.)
9. Related Work

Array programming is a highly active research area, so the amount of related work is quite significant. In this section, we have to restrict ourselves to discussing only a few of the most closely related approaches.

9.1 Haskell array libraries

Haskell 98 already defines an array type as part of its prelude which, in fact, even provides a certain degree of shape polymorphism. These arrays can be indexed by arbitrary types as long as they are instances of Ix, a type class which plays a similar role to our Shape. This allows for fully shape-polymorphic functions such as map. However, standard Haskell arrays do not support at-least constraints and rank generalisation, which are crucial for implementing highly expressive operations such as sum from Section 4.3. This inflexibility precludes many advanced uses of shape polymorphism described in this paper and makes even unboxed arrays based on the same interface a bad choice for a parallel implementation.

Partly motivated by the shortcomings of standard arrays, numerous Haskell array libraries have been proposed in recent years. These range from highly specialised ones such as ByteString [7] to full-fledged DSLs for programming GPUs [12]. However, these libraries do not provide the same degree of flexibility and efficiency for manipulating regular arrays, if they support them at all.

Our own work on Data Parallel Haskell is of particular relevance in this context, as the work presented in this paper shares many of its ideas and large parts of its implementation with that project. Indeed, Repa can be seen as complementary to DPH. Both provide a way of writing high-performance parallel programs, but DPH supports irregular, arbitrarily nested parallelism, which requires it to sacrifice performance when it comes to purely regular computations. One of the goals of this paper is to plug that hole. Eventually, we intend to integrate Repa into DPH, providing efficient support for both regular and irregular arrays in one framework.
9.2 C++ Array Libraries

Due to its powerful type system and its wide-spread use in high-performance computing, C++ has a significant number of array libraries that are both fast and generic. In particular, Blitz++ [23] and Boost.MultiArray [1] feature multidimensional arrays with a restricted form of shape polymorphism. However, our library is much more flexible in this regard and also has the advantage of a natural parallel implementation, which neither of the two C++ libraries provide. Moreover, these approaches are inherently imperative, while we provide a purely functional interface which allows programs to be written at a higher level of abstraction.
9.3 Array Languages

In addition to libraries, there exist a number of special-purpose array programming languages. Of these, Single Assignment C (SAC) [18] has exerted the most influence on our work and is the closest in spirit, as it is purely functional and strongly typed. SAC provides many of the same benefits as Repa: high-performance arrays with shape polymorphism, expressive collective operations, and extensive optimisation based on with-loops, a special-purpose language construct for creating, traversing, and reducing arrays. It also comes with a rich library of standard array and matrix operations which Repa has not yet acquired. However, Repa has the advantage of being integrated into a mainstream functional language and not requiring specific compiler support. This allows Repa programs to utilise the entire Haskell infrastructure and to drop down to a very low level of abstraction if required in specific cases. This, along with strong typing and purity, are also the advantages Repa has over other array languages such as APL, J, and Matlab [2, 9, 21].

Acknowledgements. We are grateful to Arvind for explaining the importance of delaying index space transformations and thank Simon Winwood for comments on a draft. We also thank the anonymous ICFP'10 reviewers for their helpful feedback on the paper. This research was funded in part by the Australian Research Council under grant number LP0989507.

References

[1] The Boost Multidimensional Array Library, April 2010. URL http://www.boost.org/doc/libs/1_42_0/libs/multi_array/doc/user.html.
[2] C. Burke. J and APL. Iverson Software Inc., 1996.
[3] M. M. T. Chakravarty, G. Keller, and S. Peyton Jones. Associated type synonyms. In ICFP '05: Proceedings of the Tenth ACM SIGPLAN International Conference on Functional Programming, pages 241–253, New York, NY, USA, 2005. ACM Press. ISBN 1-59593-064-7.
[4] M. M. T. Chakravarty, G. Keller, S. Peyton Jones, and S. Marlow. Associated types with class. In POPL '05: Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 1–13. ACM Press, 2005. ISBN 1-58113-830-X.
[5] M. M. T. Chakravarty, R. Leshchinskiy, S. Peyton Jones, G. Keller, and S. Marlow. Data Parallel Haskell: a status report. In DAMP 2007: Workshop on Declarative Aspects of Multicore Programming. ACM Press, 2007.
[6] D. Coutts, R. Leshchinskiy, and D. Stewart. Stream fusion: From lists to streams to nothing at all. In Proceedings of the ACM SIGPLAN International Conference on Functional Programming (ICFP 2007). ACM Press, 2007.
[7] D. Coutts, D. Stewart, and R. Leshchinskiy. Rewriting Haskell strings. In Practical Aspects of Declarative Languages, 8th International Symposium, PADL 2007, pages 50–64. Springer-Verlag, Jan. 2007.
[8] M. Frigo and S. G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2):216–231, 2005. Special issue on "Program Generation, Optimization, and Platform Adaptation".
[9] A. Gilat. MATLAB: An Introduction with Applications, 2nd Edition. John Wiley & Sons, 2004. ISBN 978-0-471-69420-5.
[10] J. H. van Groningen. The implementation and efficiency of arrays in Clean 1.1. In W. Kluge, editor, Proceedings of Implementation of Functional Languages, 8th International Workshop, IFL '96, Selected Papers, number 1268 in LNCS, pages 105–124. Springer-Verlag, 1997.
[11] J. Launchbury and S. Peyton Jones. Lazy functional state threads. In Proceedings of Programming Language Design and Implementation (PLDI 1994), pages 24–35, New York, NY, USA, 1994. ACM.
[12] S. Lee, M. M. T. Chakravarty, V. Grover, and G. Keller. GPU kernels as data-parallel array computations in Haskell. In EPAHM 2009: Workshop on Exploiting Parallelism using GPUs and other Hardware-Assisted Methods, 2009.
[13] X. Leroy, D. Doligez, J. Garrigue, D. Rémy, and J. Vouillon. The Objective Caml system, release 3.11, documentation and user's manual. Technical report, INRIA, 2008.
[14] J. Mathews and K. Fink. Numerical Methods using MATLAB, 3rd edition. Prentice Hall, 1999.
[15] S. Peyton Jones. Call-pattern specialisation for Haskell programs. In Proceedings of the International Conference on Functional Programming (ICFP 2007), pages 327–337, 2007.
[16] S. Peyton Jones, R. Leshchinskiy, G. Keller, and M. M. T. Chakravarty. Harnessing the multicores: Nested data parallelism in Haskell. In R. Hariharan, M. Mukund, and V. Vinay, editors, IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2008), Dagstuhl, Germany, 2008. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik. URL http://drops.dagstuhl.de/opus/volltexte/2008/1769.
[17] F. A. Rabhi and S. Gorlatch, editors. Patterns and Skeletons for Parallel and Distributed Computing. Springer-Verlag, 2003.
[18] S.-B. Scholz. Single Assignment C – efficient support for high-level array operations in a functional setting. Journal of Functional Programming, 13(6):1005–1059, 2003.
[19] T. Schrijvers, S. Peyton Jones, M. M. T. Chakravarty, and M. Sulzmann. Type checking with open type functions. In Proceedings of ICFP 2008: The 13th ACM SIGPLAN International Conference on Functional Programming, pages 51–62. ACM Press, 2008.
[20] W. Swierstra and T. Altenkirch. Dependent types for distributed arrays. In Trends in Functional Programming, volume 9, 2008.
[21] The International Standards Organisation. Programming Language APL. ISO standard 8485, 1989.
[22] The OpenMP Architecture Review Board. OpenMP Application Program Interface, 2008. URL http://www.openmp.org/specs.
[23] T. L. Veldhuizen. Arrays in Blitz++. In Proceedings of the 2nd International Scientific Computing in Object-Oriented Parallel Environments (ISCOPE'98). Springer-Verlag, 1998. ISBN 978-3-540-65387-5.
[24] H. Xi. Dependent ML: an approach to practical programming with dependent types. Journal of Functional Programming, 17(2):215–286, 2007.
A Certified Framework for Compiling and Executing Garbage-collected Languages Andrew McCreight
Tim Chevalier
Andrew Tolmach
Portland State University {mccreigh,tjc,apt}@cs.pdx.edu
Abstract

We describe the design, implementation, and use of a machine-certified framework for correct compilation and execution of programs in garbage-collected languages. Our framework extends Leroy's Coq-certified Compcert compiler and Cminor intermediate language. We add: (i) a new intermediate language, GCminor, that includes primitives for allocating memory in a garbage-collected heap and for specifying GC roots; (ii) a precise, low-level specification for a Cminor library for garbage collection; and (iii) a proven semantics-preserving translation from GCminor to Cminor plus the GC library. GCminor neatly encapsulates the interface between mutator and collector code, while remaining simple and flexible enough to be used with a wide variety of source languages and collector styles. Front ends targeting GCminor can be implemented using any compiler technology and any desired degree of verification, including full semantics preservation, type preservation, or informal trust. As an example application of our framework, we describe a compiler for Haskell that translates the Glasgow Haskell Compiler's Core intermediate language to GCminor. To support a simple but useful memory safety argument for this compiler, the front end uses a novel combination of type preservation and runtime checks, which is of independent interest.

Categories and Subject Descriptors: D.3.4 [Processors]: Compilers, Memory management (garbage collection); D.2.4 [Software/Program Verification]: Correctness proofs

General Terms: Languages, Verification, Reliability, Security

1. Introduction

Programming in high-level, type-safe languages such as Haskell, ML, or Java eliminates large classes of potential bugs, thus increasing reliability while reducing implementation time and cost in many application domains. Safe languages should be particularly attractive for implementing systems that demand the highest possible levels of assurance, such as safety-critical device control or high-security data processing, which are currently very expensive to produce. But the appeal of these languages for high-assurance applications is undercut by their reliance on large, complex runtime systems, usually written in C or assembler. For example, the runtime system of the Glasgow Haskell Compiler (GHC) [31] consists of roughly 75,000 lines of C code. Such systems are very difficult to verify, even informally.

Garbage collection (GC) is a key runtime service that is often a source of bugs. GC bugs can result from erroneous algorithms or incorrect collector implementations, or because the intended interface between the collector and the mutator—the application code that makes allocation requests and performs reads and writes on the heap—has been violated. Moreover, GC bugs are often difficult to reproduce and diagnose. Garbage collection is therefore a good application area for formal methods, including machine-certified correctness proofs, and several proofs of collector implementations have been developed in recent years [14, 22, 24, 36]. Bugs often occur because the collector-mutator interface has not been explicitly specified, making it easy for implementers on either side to violate intended invariants. Precise garbage collectors must be able to access all roots, i.e., pointers from mutator data structures into the heap, and to ascertain the layout, i.e., size and embedded pointer positions, for all heap records. The collector proofs cited above formalize the interface as seen from the collector's side. But there has been much less work on ensuring that the mutator obeys its side of the interface.

In this work, we show how to encapsulate the key aspects of the collector-mutator interface into the semantics of a generic intermediate language, called GCminor, that can serve as the target for compiling a range of garbage-collected languages, and as the source for a machine-certified semantics-preserving compiler back end. GCminor makes the collector-mutator interface both explicit and precise. A client, i.e., a compiler front end, can use the back end simply by generating code in GCminor and record layout descriptions in a format that GCminor can accept. GCminor supports many styles of uniprocessor memory managers, including mark-sweep, copying, and generational collectors. Any real collector implementation will modify the heap and possibly the values of root variables. However, GCminor's semantics completely hide these effects: from the perspective of mutator code, the heap and reachable pointers do not appear to change at all during a collection. This property makes it much easier to verify that the mutator code obeys the GC interface. We enforce correctness of root declarations using a block-based memory model together with a novel pointer-clearing technique at the semantic level.

Our work extends Leroy's Compcert compiler and Cminor intermediate language [16, 18]. Compcert compiles (most of) C to PowerPC or ARM assembly code, and is proven to preserve the observable behavior of the program: its sequence of system calls and final return value. The proof is certified using the Coq Proof Assistant [2]. Cminor is an untyped, low-level imperative intermediate language with C-like control constructs, which sits between Compcert's C-specific front end and its processor-specific back end. It supports global and local memory, but not a heap. We define GCminor as a small extension to Cminor that adds statements for allocating in a garbage-collected heap and declarations for specifying heap roots. We implement new Compcert pipeline stages to translate GCminor programs into ordinary Cminor code, intended to be linked against a memory management library also written in Cminor. The existing Compcert back end compiles the resulting complete Cminor program to machine code, and the existing back end semantics-preservation proofs guarantee the executable's runtime behavior. The translation from GCminor to Cminor is proven in Coq to be semantics-preserving relative to a low-level specification of the memory management library, which can be implemented by a range of bump-pointer allocators and record-moving collectors. In particular, several existing proofs of Cheney-style copying collectors, developed both by ourselves [22] and others [14, 24], obey similar specifications. (We do not yet have a machine-checked proof that the specification assumed by GCminor and the specification obeyed by our collector match precisely.)

To illustrate the utility of our extended Compcert back end, we exhibit a prototype Haskell compiler. We implement a front end that translates GHC's Core intermediate language to another new intermediate language, dubbed Dminor, and from there to GCminor. Dminor is a purely functional, typed language that guarantees memory safety through a novel combination of runtime checks and a rudimentary type system to distinguish pointers from nonpointers. We have proven semantics preservation in Coq for most of the Dminor-to-GCminor translation (including all the language features that involve allocation). This result can be combined with a type checker for Dminor and soundness proof for the checker to show that any program that compiles without complaint is memory-safe. To obtain a stronger guarantee, we could prove that the Core-to-Dminor translation preserves well-typedness, thereby showing that any well-typed Core program is memory-safe. (We do not have machine-checked versions of these typing proofs.)
[Diagram: Haskell source is translated by GHC to Core; possible future front ends accept ML, Java, etc. Our work translates Core to Dminor, then to GCminor, then to LowGCminor, and finally to Cminor, where the GC library (itself written in Cminor) is linked in; Compcert then compiles to PPC or ARM.]

Figure 1. Overall architecture. Boxes represent languages; double lines are semantics-preserving translations and dotted lines are type-preserving translations. The GC library, written in Cminor, is linked into the program during translation. The dashed boxes and lines show possible future extensions.

  e ::= id                          local variable
      | l                           integer or float literal
      | addrsymbol(id)              address of global symbol
      | stackaddr_i                 address of stack frame entry
      | op(~e)                      arithmetic operations
      | load(ch, e_addr)            memory load
      | e1 ? e2 : e3                conditional expression

  s ::= id = e;                     local variable assignment
      | store(ch, e_addr, e_val);   store to global or stack frame
      | [id =] call(e, ~e);         function call
      | tailcall(e, ~e);            function tail call
      | if e {~s1} else {~s2}       conditional
      | block {~s}                  delimited block
      | loop {~s}                   infinite loop
      | exit n;                     block exit
      | switch e {i : n, ...} n;    switched exit
      | return [e];                 return
      | skip;                       no-op

  f ::= fun(~id) { stack n; vars ~id; ~s }
  p ::= functions ~(id = f); vars ~(id = initializers); main id

Figure 2. Syntax of Cminor expressions (e), statements (s), functions (f) and programs (p). A chunk (ch) specifies type, size, and signedness of a datum.
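As an aid to reading Figure 2, the following Haskell sketch models the Cminor expression and statement grammar as algebraic data types. This is purely illustrative: the constructor names and the Chunk type are our own choices, not part of the Compcert development (which defines this syntax in Coq).

```haskell
-- A toy Haskell model of the Cminor syntax in Figure 2.
-- All names here are illustrative only.
module CminorSyntax where

type Ident = String

-- Memory chunks: type, size and signedness of a datum.
data Chunk = Int8Unsigned | Int32 | Float64
  deriving (Show, Eq)

data Expr
  = Var Ident                      -- local variable
  | LitInt Int                     -- integer literal
  | LitFloat Double                -- float literal
  | AddrSymbol Ident               -- address of global symbol
  | StackAddr Int                  -- address of stack frame entry
  | Op String [Expr]               -- arithmetic operations
  | Load Chunk Expr                -- memory load
  | Cond Expr Expr Expr            -- e1 ? e2 : e3
  deriving Show

data Stmt
  = Assign Ident Expr              -- id = e;
  | Store Chunk Expr Expr          -- store(ch, eaddr, eval);
  | Call (Maybe Ident) Expr [Expr] -- [id =] call(e, es);
  | TailCall Expr [Expr]           -- tailcall(e, es);
  | If Expr [Stmt] [Stmt]          -- conditional
  | Block [Stmt]                   -- delimited block
  | Loop [Stmt]                    -- infinite loop
  | Exit Int                       -- exit n;
  | Switch Expr [(Int, Int)] Int   -- switched exit
  | Return (Maybe Expr)            -- return [e];
  | Skip                           -- no-op
  deriving Show
```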
2. Compcert and Cminor

Our work forms a backwards-compatible extension to Version 1.4 of the Compcert system [16–18], which we review in this section. Figure 1 shows the overall architecture of our framework. Compcert and its correctness proof are structured as a pipeline of small translation steps between intermediate languages, each with its own operational semantics. The compiler itself is a purely functional Coq program that is automatically extracted to executable OCaml code [34].

Cminor. Cminor (Figure 2) is the C-like intermediate language at the heart of the Compcert compiler. A program consists of function definitions and global data definitions. Functions have the usual parameters and local variables. In addition, a function can place data in an explicit stack frame, which is disjoint from storage for the named variables. The size of a function's frame is specified in the function header; the expression stackaddr_i refers to the ith byte of the current function's frame. Other expressions are standard. Statements are also largely standard, but include support for structured control flow using nested blocks; an exit statement, where exit n branches to the end of the (n + 1)st enclosing block; and a switch statement, which matches an integer discriminant against a list of values and exits to the corresponding specified enclosing block.

Cminor semantics. Cminor has a small-step operational semantics in the style originally suggested by Appel and Blazy [3]. A characteristic transition rule for Cminor statements has the form

  G ⊢CM (F, st, E, k, σ, M) ─t→ (F′, st′, E′, k′, σ′, M′)

Here G is the global environment, which maps function names to definitions and global variables to memory locations; F is the current function definition; st is the statement at the current program point; E is the local environment, which maps parameters and locals to values; k is the continuation, which describes both the remainder of the current function and the call stack, including the local environments of suspended activations; σ points to the explicit stack frame; M is the memory; and t is a trace of the observable events (system calls) that occur as a side-effect of evaluation. The meaning of a program is given by the (finite or infinite) trace it produces when started from a suitable initial state, together with its final result value (if it terminates). As usual, possible unchecked errors during program execution are modeled by the absence of a suitable transition rule, in which case evaluation is said to "get stuck."
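A minimal sketch of this configuration shape can be written in Haskell, assuming a tiny statement fragment. The Config record and step function below are our own hypothetical names; the point is only to show how "getting stuck" is modeled by the absence of a transition (here, a Nothing result).

```haskell
-- A toy shape for Cminor-style small-step semantics. The real
-- semantics is a Coq inductive relation; this sketch only fixes the
-- shape of configurations and of a few easy transitions.
module CminorStep where

import qualified Data.Map as Map

type Ident = String
data Expr = LitInt Int | Var Ident
data Stmt = Skip | Assign Ident Expr | Seq Stmt Stmt

type Env  = Map.Map Ident Int     -- local environment E
data Cont = Kstop | Kseq Stmt Cont  -- continuation k (fragment)

data Config = Config
  { stmt :: Stmt   -- statement at the current program point
  , env  :: Env    -- parameters and locals
  , cont :: Cont   -- rest of the function (call stack omitted)
  }

evalE :: Env -> Expr -> Maybe Int
evalE _ (LitInt n) = Just n
evalE e (Var x)    = Map.lookup x e   -- unbound variable: stuck

-- One transition; Nothing models "getting stuck".
step :: Config -> Maybe Config
step (Config (Assign x e) en k) = do
  v <- evalE en e
  pure (Config Skip (Map.insert x v en) k)
step (Config (Seq s1 s2) en k)   = pure (Config s1 en (Kseq s2 k))
step (Config Skip en (Kseq s k)) = pure (Config s en k)
step _                           = Nothing
```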
A distinctive feature of the Compcert architecture is that all intermediate languages in the pipeline, from C through assembly code, share the same memory model [19] and notion of values. The memory M is composed of an unbounded number of disjoint blocks, indexed by (mathematical) integers. Cminor uses one statically-allocated block per global variable and one dynamically-allocated block per stack frame. Each block has fixed upper and lower bounds (signed mathematical integers) set at block-allocation time. The block contains values indexed by byte offsets (signed machine integers) within these bounds. Operations on the memory include Mem.alloc, which (always) returns a fresh block, with initially undefined contents; Mem.high_bound and Mem.low_bound, which return the bounds of a specified block; Mem.load and Mem.store, which operate on a specified block, offset, and chunk; and Mem.free, which renders a block inaccessible (but does not permit the block number to be re-used). Chunks describe the type, size, and signedness of memory being accessed; they include int8unsigned, int32 and float64. Values are described by the grammar

  value ::= Vint n | Vfloat f | Vptr b n | Vundef

where n is a 32-bit machine integer (which can represent both signed and unsigned numbers), f is a double-precision IEEE float, and b ∈ Z is a memory block number. Vptr b n is a pointer to offset n within block b. The null pointer is represented by Vint 0. Vundef represents undefined values, e.g., the contents of uninitialized offsets within a block. Memory loads and stores only succeed within the boundaries of a valid block. Also, it is not possible to cast one kind of value to another (without applying an explicit coercion operator), so in particular, Vptr values cannot be forged.

As a concrete example of a semantics transition rule, here is one for the store statement:

  G, E, M, σ ⊢ e_a → Vptr b n    G, E, M, σ ⊢ e_v → v    Mem.store M ch b n v = Some M′
  ──────────────────────────────────────────────────────────────────────────────
       G ⊢CM (F, store(ch, e_a, e_v), E, k, σ, M) −→ (F, skip, E, k, σ, M′)

Here the hypotheses of the form G, E, M, σ ⊢ e → v invoke a separate set of rules for evaluating expressions e to values v. If the parameters to Mem.store are invalid, the rule will not apply, and the program will get stuck. For statements such as store that do not alter control flow, we define the next statement to be just skip; we can then encapsulate all the details of inspecting the continuation to determine what to do next within the transition rule for skip. The trace annotation represents the empty trace.

Assembly code semantics. Other intermediate languages in the Compcert pipeline use similar small-step semantic formulations. For example, the assembly code semantics transition relation has the form

  G ⊢AS (R, M) ─t→ (R′, M′)

where R maps target machine registers to values, and G, M, and t are as above. As with Cminor, the assembly code semantics uses one statically-allocated block per global variable and one dynamically-allocated block per stack frame. Note that because this semantics uses the same models of values and memory as earlier languages in the pipeline, assembly code programs will get stuck if they attempt to forge pointers or access arbitrary parts of memory. Thus, "non-stuck" assembly programs enjoy a memory safety property. This property is somewhat accidental, in the sense that if Compcert elected to use a lower-level memory model for assembly code—e.g., a flat array of bytes—then progress of assembly programs might not imply anything about memory safety.¹

¹ On the other hand, informally the existence of the observational equivalence proof would still be very comforting, as it is hard to imagine that a (non-malicious) compiler could systematically violate memory safety and still generate correct code for all programs!

Semantics preservation. Semantics preservation proofs in Compcert generally take the form of forward simulations. Let L1 and L2 be adjacent languages in the compiler pipeline, P1 be a program in L1 and P2 be the corresponding program in L2. To show that P1 and P2 have the same observable behavior, we define a simulation relation ∼ between the semantic states of L1 and those of L2, and then show that this relation is preserved as execution of P1 and P2 progresses:

  S1 ──t──> S1′
   ∼          ∼
  S2 ──t*──> S2′

In words, the diagram says that if state S1 is simulated by state S2, and S1 can reach S1′ by taking a single step, generating trace t, then there exists a state S2′ such that S2 can reach S2′ by taking zero or more steps also generating t, and S1′ is simulated by S2′. By inductively applying this diagram over an entire L1 execution sequence, we can prove the existence of an equivalent L2 sequence: that is, any observable behavior of P1 can be mimicked by P2. Moreover, if P2 is deterministic, we can also show that every P2 behavior is equivalent to a P1 behavior, and hence that the translation preserves specifications about the behavior; see Leroy [18] for details.

Memory embeddings. In many cases, a key part of the state simulation relation S1 ∼ S2 describes how the memory components M1 and M2 of the two states are related. Depending on the transformation, memory blocks may be added, removed, or combined. Formally, the memory relation is specified by a memory embedding φ from the blocks of M1 to addresses in M2 [19]. For a semantics preservation proof to succeed, the embedding must guarantee that successful loads and stores in M1 are simulated in M2.

3. GCminor

GCminor is our target language for generated mutator code. A primary design goal for GCminor is that it be as general-purpose as possible. On the mutator side, we support both functional and object-oriented languages. The principal restriction on clients is that heap roots must be identified statically; we do not support collectors that distinguish dynamically between pointer and nonpointer values, e.g., by stealing a bit from each value to flag pointers, a trick that cannot be expressed in Compcert's current memory model. This limitation makes GCminor unsuitable as a target for compilers that generate a single piece of object code for functions that are polymorphic over both boxed and unboxed values. On the collector side we support a range of "stop-and-collect" styles, including both copying and mark-sweep collectors; we also include hooks supporting generational collectors. We leave extension to incremental and concurrent collectors for future work. GCminor provides a well-defined interface for communication between the mutator and collector. There are two main aspects to the interface:

i. Garbage collection roots are mutator variables that hold pointers to heap records. The collector uses these as the starting points of its search for reachable records; if a moving collector is used, it may also update root values. Roots are explicitly declared in GCminor functions.

ii. The layout for a heap record tells the collector how long the record is, and which fields contain pointers. As with roots, pointer fields must be traced to find other reachable records, and may also be updated by a moving collector. In our system, pointer layout is always determined by the record header, but the precise method by which this is done is an auxiliary parameter of the system, specified outside of GCminor itself. (The sketch after this list makes the two-sided interface concrete.)
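The following Haskell sketch is our own framing (not part of GCminor) of these two obligations as a pair of interface records the mutator must supply to the collector; all names in it are hypothetical.

```haskell
-- A sketch of the two-sided collector/mutator interface, in our own
-- Haskell framing. Addr, Header and the field types are hypothetical.
module GCInterface where

newtype Addr   = Addr Int   deriving (Eq, Show)
newtype Header = Header Int deriving (Eq, Show)

-- (i) The mutator must be able to enumerate all roots, and accept
--     updated root values if a moving collector relocates records.
data Roots m = Roots
  { readRoots  :: m [Addr]
  , writeRoots :: [Addr] -> m ()
  }

-- (ii) The mutator must expose the layout of every record: its size
--      in words, and which fields hold heap pointers.
data Layout m = Layout
  { recSize :: Header -> m Int          -- size in words
  , isPtr   :: Header -> Int -> m Bool  -- is field n a heap pointer?
  }
```

A collector written against these two records need not know how roots or layouts are actually represented, which is essentially the flexibility GCminor offers its clients.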
  s ::= ...                                as in Cminor
      | ~id = alloc(~i);                   heap record allocation
      | rstore(ch, e_bl, e_off, e_val);    store to heap record
      | [id =] extcall(e, ~e);             call to external function

  f ::= fun(~id) { stack n; vars ~id; roots ~id; ~s }

Figure 3. Syntax of GCminor statements (s) and functions (f). Expressions (e) and programs (p) are the same as in Cminor.

[Diagram not recoverable from the extracted text: it shows heap records whose one-word headers point to two-word descriptors (total size, atomic-prefix length) held in the global store; pointer fields link records to one another or are NULL.]

Figure 4. Example of standard layout descriptor scheme
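To illustrate the scheme of Figure 4, here is a small Haskell sketch in which a record header is an index into a table of two-word descriptors. The table contents and names are our own; the pointer test mirrors the Cminor macros given under "Standard scheme" below.

```haskell
-- Toy model of the standard layout descriptor scheme (Figure 4):
-- a descriptor is (total words, words in the atomic prefix), and
-- every record header refers to (here: indexes) such a descriptor.
module StandardScheme where

type Descriptor = (Int, Int)  -- (size, atomic prefix length)
type Header     = Int         -- index into the descriptor table

descriptors :: [Descriptor]
descriptors = [(2, 1)]        -- e.g. a cons cell: 2 words, 1 atomic

size :: Header -> Int
size h = fst (descriptors !! h)

-- Field n (numbered from 0) is a pointer iff it lies past the
-- atomic prefix, mirroring: ptrP h n := load(int32,h+4) <= n
ptrP :: Header -> Int -> Bool
ptrP h n = snd (descriptors !! h) <= n

main :: IO ()
main = print [ (n, ptrP 0 n) | n <- [0 .. size 0 - 1] ]
-- prints [(0,False),(1,True)]: field 0 atomic, field 1 a pointer
```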
Language. GCminor (see Figure 3) extends Cminor with three syntactic features that support a garbage-collected heap. First, the language is augmented with an alloc statement that allocates fresh heap records of specified sizes, after performing garbage collection if necessary. Second, each function definition is annotated with a list of the variables (parameters and locals) that hold heap pointers; the garbage collector uses these variables as roots for its traversal of the live data graph within the heap. Finally, rstore is a new variant of the memory store statement specifically for updating heap records, which provides a hook for a write barrier. We also add a syntactic distinction between calls to GCminor code and calls to external functions, which lets the implementation of GCminor use different calling conventions for these two cases (see Section 4).

The statement id1,...,idn = alloc(i1,...,in) allocates n records with data sizes i1,...,in (counted in words) and assigns pointers to them into id1,...,idn respectively. Each record is prefixed by an additional one-word header, which is not included in the size argument. The alloc statement does not initialize the records; it is the responsibility of the mutator to keep the heap well-formed by filling in the header and data fields consistently before the next allocation, as described in more detail below. If there is insufficient heap memory for the requested records even after a collection, the program's behavior is undefined (semantically, it gets stuck); in practice, the allocator issues a runtime error. The alloc statement supports multiple simultaneous allocations; this is necessary to allow the mutator to build mutually recursive records efficiently. If the GCminor implementation allocates by bumping a free pointer (as with a copying collector), it can combine the storage requirements of the multi-allocation before checking the heap limit. Thus, mutators can make natural use of multi-allocation to obtain code that does just one limit check per basic block, an important common optimization. At the same time, keeping the allocations distinct at the GCminor level preserves the possibility of switching transparently to an underlying allocator that uses free lists (as in a mark-sweep collector).

Any allocation may invoke the garbage collector, which calculates the set of live heap records and reclaims the space used by dead (garbage) records for use in subsequent allocations. More precisely, the collector computes the set of records that are reachable via a chain of heap pointer dereferences starting from the roots declared in the current function activation and all suspended activations. Roots are either local variables or function parameters. For simplicity, GCminor does not currently support roots in global data regions or in explicitly-allocated stack frames; these restrictions would be straightforward to remove. Correct specification of roots by the mutator is essential, because only pointers to reachable records are guaranteed to remain valid after a collection. Mutator code must obey the invariant that each declared root variable always holds either a heap record pointer obtained from an alloc or null. To make this task easier, GCminor implicitly initializes all local roots to null; the subsequent translation from GCminor to Cminor can usually eliminate these initializations.

Record layout. Both the semantics of GCminor and the actual implementation of the underlying collector need to know which record fields contain heap pointers. We classify all values as having either GC type Ptr, meaning a heap pointer, or GC type Atomic, meaning an integer, float,² or a pointer into the global static data area. (The value Vint 0, which represents null, has both types.) The collector needs to identify all fields that contain Ptr values. Concretely, there are many possible ways to associate a record's header with its layout description: e.g., the size and pointer information might be stored directly in bit fields within the header, or indirectly in an auxiliary data structure pointed to by the header. When designing GCminor, we considered hiding this choice from clients, and, e.g., simply including a list of pointer field locations as an additional parameter of the alloc statement. But we rejected this approach: in practice, clients need concrete control over headers and descriptors, because they are often used for additional purposes besides garbage collection. For example, our prototype Haskell compiler (Section 6) uses an additional descriptor field to encode the type of closures; similarly, object-oriented languages often use the header to point to a class descriptor record or vtable. Therefore, although GCminor requires that the record layout can always be determined from the record header, it is flexible about exactly how this connection is made. A record returned by alloc always contains a header, but the mutator is responsible for writing the header contents explicitly; if headers point to auxiliary static descriptors, the mutator must provide explicit global data definitions for those descriptors.

² Our proofs do not currently cover floats, because the Compcert v1.4 memory model does not permit us to write a collector that manipulates ints and floats uniformly as raw bytes.

In general, therefore, the client must specify its desired layout description scheme to our system. Abstractly, the necessary information consists of two functions:

  size : memory → value → nat
  ptrP : memory → value → nat → bool

where size M h gives the total length (in words) of the record with header h in memory M, and ptrP M h n returns true if and only if field n of the record (numbered from 0) with header h in memory M contains a heap pointer. Concretely, the system requires

i. Cminor code macros size and ptrP that implement size M and ptrP M within the collector code.

ii. A Coq logical predicate

  layout_desc : memory → value → nat → (nat → bool) → Prop

where layout_desc M h s p holds exactly when s = size M h and p = ptrP M h. The formal semantics of GCminor is parameterized by this predicate.

iii. A Coq lemma showing that size and ptrP are consistent with layout_desc, and that they are invariant under changes to the heap; the correctness proof for the collector is parameterized over this lemma.

Standard scheme. To let clients use our system without doing additional proofs, we provide a standard instantiation of these components for a particular, simple layout descriptor scheme. In this scheme, field order in records is constrained so that all Atomic values come before any Ptr values. (This ordering is convenient for describing closure records and simple class-based objects without inheritance.) Record layouts can thus be described by a simple two-element descriptor: the first element gives the total number of words in the record, and the second gives the number of words in the atomic prefix. We store these descriptors in global static memory; each record header is a pointer to such a descriptor. Figure 4 gives an example. The corresponding Cminor GC macro implementations are

  size h   := load(int32, h)
  ptrP h n := load(int32, h+4) <= n

and the logical predicate is defined by

  Mem.load int32 M b o = Some (Vint s)    Mem.load int32 M b (o+4) = Some (Vint a)
  ────────────────────────────────────────────────────────────────────────────
                      layout_desc M (Vptr b o) s (λn. a ≤ n)

Example. Figure 5 shows an example of GCminor code, such as might be produced for a simple recursive function over integer lists. We assume that lists are represented by two-word cons cells in the usual way, with the empty list represented by the null pointer. The global cons_header gives the layout of a cons cell using the standard layout descriptor scheme.
Source code (Haskell notation, but strict):

  f xs a = case xs of
             []   -> []
             x:zs -> (x+a):(f zs a)

GCminor code:

  functions f = fun (xs,a) {
    stack 0; vars x,y,ys; roots xs,y,ys;
    if xs = null {
      return null;
    } else {
      ys = call(f, [load(int32, xs+4), a]);
      x = load(int32, xs);
      y = alloc(2);
      rstore(int32, y, -4, cons_header);
      rstore(int32, y, 0, x+a);
      rstore(int32, y, 4, ys);
      return y;
    }
  };
  ...
  vars cons_header = {int32 2, int32 1};
  ...

Figure 5. Possible GCminor code for a simple function over lists.

  ⊢ M1 : A        enough_mem R M1 s
  R = env_root_values F.roots E ∪ cont_root_values k
  (M2, b) = Mem.alloc M1 (−4) (4·s)
  E′ = env_clear_non_roots A F.roots E        k′ = cont_clear_non_roots A k
  E″ = E′{x ↦ Vptr b 0}        A′ = A ∪ {b}
  ────────────────────────────────────────────────────────────────────────
  G ⊢GCM (F, x = alloc(s), E, k, σ, M1, A) −→ (F, skip, E″, k′, σ, M2, A′)

Figure 6. GCminor alloc semantics transition rule

Formal semantics. GCminor's small-step transition rules have the form

  G ⊢GCM (F, st, E, k, σ, M, A) ─t→ (F′, st′, E′, k′, σ′, M′, A′)

where most of the components are the same as in the rules for Cminor (see Section 2). In addition to global data and stack frame blocks, the memory M now contains a block for each heap record allocated so far. The set of these blocks is recorded in the new state component A. Continuations k now record the root sets as well as the environments of suspended function activations. Representing each heap record by an entire fresh memory block is essential to abstraction. It makes it impossible to forge a pointer to a record, and prevents order comparisons (e.g., ≤) between pointers into different records. The former ensures that records unreachable from roots are truly inaccessible, while the latter allows GCminor to hide any movement of records that occurs in the implementation. Neither would be possible in a conventional flat memory model.

As noted above, it is the mutator's responsibility to initialize the header and fields of allocated records consistently, so that every field designated in the header as a Ptr contains a valid pointer into the set of heap records A (or is null). This well-formedness property is captured by the predicate heap_record_ok, defined as:

  Mem.load int32 M b (−4) = Some h        layout_desc M h s p        is_atomic A h
  Mem.high_bound M b = 4·s        fields_ok A M b s p
  ──────────────────────────────────────────────────────────────────────────────
                              heap_record_ok A M b

Here the is_atomic clause asserts that the header h is not itself a heap pointer, and fields_ok asserts that each field in record b has the correct GC type according to the pointer map p. The mutator has some flexibility in initializing records, but it must ensure that every record is well-formed at any potential collection point, i.e., at any execution of an alloc (recall that we do not support incremental or concurrent collection). This is reflected in the semantics rule for alloc, shown in Figure 6; for simplicity, we give a version that allocates just one record at a time. This rule relies on a number of auxiliary predicates, which we describe in the remainder of this section. We write ⊢ M : A as an abbreviation for (∀a ∈ A, heap_record_ok A M a), i.e., the entire heap A is well-formed in M. Note that a well-formed heap is necessarily closed: i.e., each Ptr field in each heap record points to some other heap record.
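As a gloss on heap_record_ok, the following toy Haskell checker captures the well-formedness idea, specialized to the standard layout scheme and to a hypothetical heap representation of our own; it is a sketch, not the Coq definition.

```haskell
-- A toy check in the spirit of heap_record_ok: every field that the
-- layout marks as a pointer must be null or point to another record
-- in the heap. The Map-based representation is ours, not GCminor's.
module HeapOk where

import qualified Data.Map as Map

-- Pointers carry a block id, mirroring Compcert's Vptr values.
data Value = VInt Int | VPtr Int deriving (Eq, Show)

data Record = Record
  { atomicPrefix :: Int     -- standard scheme: fields < prefix are atomic
  , fields       :: [Value]
  } deriving Show

type Heap = Map.Map Int Record   -- block id -> record

recordOk :: Heap -> Record -> Bool
recordOk heap (Record pre fs) = and (zipWith fieldOk [0 ..] fs)
  where
    fieldOk n v
      | n < pre = case v of
          VPtr _ -> False            -- atomic fields must not hold pointers
          _      -> True
      | otherwise = case v of
          VInt 0 -> True             -- null has both GC types
          VPtr b -> Map.member b heap  -- pointers must stay inside the heap
          _      -> False

-- The analogue of |- M : A: the entire heap is well-formed (and
-- hence closed, since every Ptr field lands on another heap record).
heapOk :: Heap -> Bool
heapOk h = all (recordOk h) (Map.elems h)
```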
The enough_mem clause asserts that in the current program state there is still enough space to add a record of size s to the heap. The definition of this predicate depends on the style of memory manager being used. For a compacting collector and bump-pointer allocator, we can use the following definition, where the maximum allowed heap size is a symbolic parameter of the overall semantics:

  ⊢ M : A′    ∀b, (Vptr b 0) ∈ R ⇒ b ∈ A′    sizeof_M(A′) + s ≤ maximum_heap_size
  ────────────────────────────────────────────────────────────────────────────
                                enough_mem R M s

This predicate doesn't actually compute the live heap; instead, it just asserts that there exists some well-formed, and hence closed, heap A′ that contains all the root values and is small enough to permit the desired allocation. In fact, A′ will be a subset of the A in the alloc rule, but we don't need to use this fact explicitly. The root value set R is calculated as the union of the root values in the current environment (env_root_values) and for any environments stored within the current continuation (cont_root_values). If enough_mem does not hold, the alloc rule is not enabled, and the program gets stuck. (An alternative approach would be to add a companion rule stating that when there is insufficient memory, the alloc statement issues a runtime error message and enters an infinite loop representing a fatal exception. Unfortunately, it would be difficult to preserve these semantics through the remainder of the compilation pipeline, because subsequent compiler transformations can actually decrease the size of the live heap, so some allocations that fail at the GCminor level would succeed in the generated code!)

Mem.alloc allocates a fresh block in Compcert's underlying memory model, giving it appropriate lower and upper offset bounds (in bytes, and allowing for the header). A pointer to the first data word of the resulting block is assigned to x. As noted above, GCminor hides any heap and environment changes to reachable pointers caused by a relocating collector. However, in the formal semantics, collection does have an observable effect on any non-root variables containing heap pointers: it clears them (i.e., sets them to the value Vundef). To keep the GCminor semantics deterministic, pointers are cleared at each alloc operation, both in the current environment (env_clear_non_roots) and in any environments stored within the current continuation (cont_clear_non_roots). Well-behaved GCminor programs that specify their roots correctly will never observe pointer clearing, but ill-behaved programs that attempt to dereference a pointer fetched from a non-root variable will get stuck. For example, in the code of Figure 5, if we had omitted the declaration of xs as a root, the value of xs would have been cleared by an alloc within the recursive call, and the subsequent load into x would have gotten stuck. Since our semantics preservation proof for the GCminor-to-Cminor translation only needs to hold for non-stuck programs, it can ignore programs that mis-specify roots, which is essential to making the proof work. Of course, GCminor's actual implementation doesn't clear the non-accessible roots; this would be pointless, since correct programs wouldn't be able to tell the difference anyway.

4. GCminor Implementation

GCminor is implemented by translation to Cminor. This translation involves two key steps:

i. GCminor alloc statements are translated into calls to a fixed library function, written in Cminor, that performs the allocation after garbage collecting if necessary. (This call could be inlined for efficiency.)

ii. Code is inserted around each function call (including allocation calls) to save and restore live roots into in-memory stack frames. If the collector is invoked, it examines this data structure to find roots.

The remainder of the GCminor language is essentially identical to Cminor, so its translation is trivial.

Allocation. Translating alloc statements to calls is straightforward. Although the translation does not commit to a specific collection method, it does assume a bump-pointer allocator (such as in a Cheney collector). With this kind of allocator, it is more efficient for the mutator to make a single large allocation request than a series of small ones, so the translation of a multi-record alloc sums the requested sizes, requests a single record, and then updates the target variables with appropriate offsets into the resulting record.

Roots. The translation of root declarations is more complex. The fundamental difficulty is that the collector must be able to find—and, for a moving collector, also update—all roots for all functions suspended on the current call stack. But Cminor provides no direct access to local variables in suspended activations (unsurprisingly, since C doesn't require such a feature). Our solution is to use a "shadow stack" [9, 15] in which live root values are stored in memory across calls. Specifically, the translation generates code to dump the local live roots to the stack before each call and restore them after each return. Roots are stored in a record in the function's explicit stack frame. Each root record is linked to that of its caller, so that the entire chain of root records can be traversed given a pointer to the most recent record, which is passed as an extra argument to every call (but not to extcalls). Figure 7 shows an example of the stack layout at the Cminor level. As an important optimization, we store only live roots in root records. GCminor constrains root variables to contain valid roots at all times, so it would be safe to record all roots, but this could prevent some garbage from being collected. We also have some minor optimizations for cases when no local variables need to be stored on the stack.

[Diagram not recoverable from the extracted text: it shows the root records for f (holding x and y) and g (holding z) linked together on the shadow stack at the point where h is executing.]

Figure 7. Stack layout example. We suppose that f, with two root variables x and y, calls g, with one root variable z, and that g in turn calls h.

LowGCminor. To subdivide the implementation and proof effort, we introduce LowGCminor, a further intermediate language between GCminor and Cminor. LowGCminor is syntactically identical to GCminor, except that there is no per-function list of roots; instead, each alloc statement and non-tail call to an internal function has an additional component listing the set of live roots at this particular site. This set can be thought of as an abstract form of the GC root tables used by many collectors. As in GCminor's semantics, any heap pointers omitted from the root set are (conceptually) cleared by a collection.

Cheney collector. Our actual GCminor implementation currently uses a simple Cheney-style copying collector.³ The collector uses two large fixed semi-spaces declared in Cminor's global data region. The allocation pointer and the limits of the current and reserve spaces are also held in globals.

³ We thank Xavier Leroy for providing collector code on which ours is closely based.

Other collector architectures. Changing to a different collector would have only modest impact on the structure of the GCminor-to-Cminor translation. A collector (e.g., mark-sweep) that doesn't move records still needs to read roots from all suspended functions, but doesn't need to change them. For such a collector, it would be unnecessary to restore values from the root records after returning from calls. For an allocator that uses free lists rather than pointer bumping, we would want to generate separate allocation requests for each record in a multi-allocation. For a generational collector, the translation would need to generate suitable write barrier code at rstore statements.
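The shadow-stack discipline described under "Roots" above can be sketched in Haskell as follows; the frame representation and names are ours, chosen only to show how the collector walks the chain of root records from the most recent one.

```haskell
-- A sketch of the shadow stack of Section 4 (cf. Figure 7): before
-- each call, the live roots are dumped into a frame that links to the
-- caller's frame. The list-based representation is our own.
module ShadowStack where

newtype Ptr = Ptr Int deriving Show

data RootFrame = RootFrame
  { liveRoots :: [Ptr]            -- only live roots are recorded
  , caller    :: Maybe RootFrame  -- link to the caller's root record
  } deriving Show

-- Collect every root visible from the most recent frame, as the
-- collector would at the start of a collection.
allRoots :: Maybe RootFrame -> [Ptr]
allRoots Nothing  = []
allRoots (Just f) = liveRoots f ++ allRoots (caller f)

-- Example mirroring Figure 7: f (roots x,y) has called g (root z);
-- g's frame links back to f's, and h (no roots) is now running.
example :: Maybe RootFrame
example =
  let fFrame = RootFrame [Ptr 1, Ptr 2] Nothing
      gFrame = RootFrame [Ptr 3] (Just fFrame)
  in Just gFrame

main :: IO ()
main = print (allRoots example)  -- [Ptr 3,Ptr 1,Ptr 2]
```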
5. Semantics preservation

Our overall semantics preservation proof for the GCminor-to-Cminor translation is conducted in the same style as the existing Compcert proofs. It assumes correctness of the allocation function with respect to a low-level specification. The proof has two parts, bridged by LowGCminor. We describe each part in turn, and then discuss the allocator specification and how it can be realized.

LowGCminor. The essence of the translation from GCminor to LowGCminor is an analysis that computes the liveness of root variables at each point where a root record must be constructed. The proof that the translation preserves semantics must show that no live pointers are omitted, i.e., that the liveness analysis is correct.⁴ The simulation relation for the proof is very simple: GCminor state S1 and LowGCminor state S2 are related only if they are identical except for their local environment components E1 and E2, which need only agree on the live variables L computed for that state:

  E1 ∼L E2 ::= ∀x ∈ L. ∀v. E1(x) = v → E2(x) = v

⁴ Our current proof makes this easy by omitting support for loops, which our example front end doesn't need.

For technical reasons, LowGCminor requires that values written to root records be valid pointers. The proof of this invariant follows easily from GCminor's invariants and from the insertion of explicit null initializers for variables that are used before they are assigned. The simulation proof is particularly simple because each LowGCminor program contains exactly the same statements as its GCminor original, except for the added initialization code.

Cminor. The semantics preservation proof for LowGCminor to Cminor is much harder. First, the structure of source and target code differs substantially at allocations and call sites, where the Cminor code introduces new statements to invoke the allocation library function and to copy root variables to and from the shadow stack explicitly. The invariant relation between LowGCminor and Cminor states must account for the addition of explicit root records in stack frame memory at the Cminor level, and extra parameters and local variables in the environment. Furthermore, because we reason using a precise model of machine arithmetic, we have the burden of proving that none of the generated code (such as the save and restore code) causes arithmetic overflow. Second, and more fundamentally, there is a major change in the representation of the heap. In (Low)GCminor, each allocated heap record is represented by a pointer to an independent memory block, and these records appear never to move. In Cminor, the actual implementations of the allocator and collector are exposed. For example, if a Cheney copying collector is used, records are represented by pointers into the middle of a single large block that holds the entire current semi-space, and they can move at any collection. The invariant relation between (Low)GCminor memory and Cminor memory is thus a dynamic isomorphism over live blocks, described by a memory embedding φ that maps GCminor addresses to Cminor addresses. Initially, φ is empty. It is extended every time an object is allocated: at the GCminor level the object is stored in a fresh block, while at the Cminor level it is stored at a free location in the heap block. When a collection occurs, the old embedding can be composed with a partial map describing the movement (and possible freeing) of each object to create a new embedding.

To manage this part of the proof, we factor out the behavior of the allocation function into an abstract specification, which is then refined several times until we reach a low-level description specialized to a Cheney collector. We have proved correctness of the translation relative to each refinement level of this specification.

Allocator specification. The allocator specification at each refinement level describes the behavior of the Cminor-level alloc function call corresponding to the GCminor allocation in Figure 6. The alloc function may choose to perform a GC; the assumptions of the specification ensure that the GC will not crash when run, while the conclusion describes the state after the allocation is successful. At the highest level, the alloc specification describes the effect of collection on local variables in the current environment and continuation. For brevity, we avoid describing this specification level and instead concentrate on the next refinement level, where roots are assumed to be stored in the shadow stack. This specification, given in Figure 8, is the most important part of the interface between GCminor and the GC.

  (R, M1, A) ∼φ (P, M1′)    ⊢ M1 : A    enough_memory R M1 s    ∀v, v ∈ R ⇒ is_pointer v
  ────────────────────────────────────────────────────────────────────────────────────
  ∃ M2′, b′, ofs′, φ′.
      G ⊢CM (F, x = call(alloc,[P, s]), E, k′, σ, M1′) −→ (F, skip, E{x ↦ Vptr b′ ofs′}, k′, σ, M2′)
    ∧ (R, M1, A) ∼φ′ (P, M2′)
    ∧ free_block φ′ M1 M2′ b′ ofs′ s
    ∧ stack_preserve_mem k′ M1′ M2′ P
    ∧ nobj_preserve φ φ′

Figure 8. Cminor-level allocator specification, with roots in the shadow stack P.

This style of specification is abstract enough to describe a range of collectors, including those that move or coalesce heap records, and has been used successfully in prior work [14, 22, 24] to verify copying, mark-sweep and incremental copying collectors. The core of the specification is the simulation relation ∼φ. The initial GCminor state (represented by the root values R, GCminor memory M1 and set of objects A) must be related to the initial Cminor state (the shadow stack, viewed as a list of root frame addresses P, and the Cminor memory M1′) via the embedding φ, written as (R, M1, A) ∼φ (P, M1′). This relation states that the root values in R are represented in a linked list of arrays with nodes given by P, which ensures the correctness of root restoration. It also requires that M1 is embedded in M1′ via φ, without overlapping in memory. The precise definition of the memory embedding depends on the collector being used, and will include the private data needed by the GC. The other preconditions come directly from the GCminor semantics (the heap must be well-formed and there must be enough free memory) or from LowGCminor (all roots must be valid). The first part of the postcondition asserts that the call to the function alloc will succeed and return to the state including memory M2′. alloc is called with two arguments, a pointer to the linked list of saved roots (the first element of P) and the number of words to be allocated, s. The allocation function will return a pointer Vptr b′ ofs′ to a fresh record 4s bytes long, by setting the local variable x to the start of the record. After the collection, the embedding φ has changed to φ′. However, the same fragment of GCminor state (R, M1 and A) is represented in the new Cminor memory M2′. From the client's perspective, this means that the roots and memory have not changed.
From the collector's perspective, the use of a new embedding allows records to be moved at the Cminor level. The free_block predicate states that there is enough unallocated space at address Vptr b′ ofs′ to hold s words of memory. The definition of this predicate depends on the collector being used. Taken together with the ∼φ′ injection, free_block φ′ M1 M2′ b′ ofs′ s ensures that when we extend M1 with a fresh object block b to produce a new memory M2 at GCminor level (see Figure 6), we will be able to extend φ′ to a fresh embedding of M2 into M2′ that maps b to the address Vptr b′ ofs′. Finally, the specification ensures that the GC does not damage other parts of memory. The predicate stack_preserve_mem states that the length and content (aside from the saved roots) of Cminor stack frames must be unchanged from M1′ to M2′. This ensures that the collector does not change the portion of the stack frame visible at the GCminor level. The predicate nobj_preserve states that the GC does not move any non-records (i.e., stack frames and global memory).

Cheney collector. In this section we describe the further refinement of the alloc specification for a typical Cheney copying collector, by defining φ and free_block. A Cheney collector stores all objects in a single semispace block objb. A free pointer free points to the next unused location in objb and the limit pointer limit points to the end of objb. The injection φ maps GCminor blocks that contain objects to offsets within objb, and other GCminor blocks directly to Cminor blocks (disjoint from objb). For this injection to be well-formed, no two pieces of GCminor memory can be mapped onto a single piece of Cminor memory. For free_block, a Cheney collector requires that free ≤ Vptr b′ (ofs′ − 4) and that Vptr b′ (ofs′ + 4s) ≤ limit. Cheney collection causes reachable objects to be copied to a new semispace block. Generating the new injection φ′ after a collection is the key part of the preservation proof. At the concrete level, the movement of objects by the collector can be given by an isomorphism ϕ from the initial location of a reachable Cminor object to the final location of that object. The mapping φ′ from GCminor objects in M1 to the new Cminor objects in M2′ can then be defined as the composition of the old mapping φ with ϕ, as shown by the following diagram:

  M1 ──φ──> M1′ ──ϕ──> M2′        (so φ′ = ϕ ∘ φ : M1 → M2′)

In other words, if b is a reachable GCminor object, then φ′(b) = ϕ(φ(b)). φ(b) produces the initial location of the concrete representation of b, and ϕ produces the final concrete location of that object. If b is unreachable (and thus was not copied by the collector), then φ′(b) is undefined.

Low-level collector proof. We have partially verified the safety and completeness of our Cheney copying collector (Section 4), written in Cminor, using separation logic tactics [21], but have not yet formally connected this separation-logic specification to the one given in Figure 8. In order to connect this proof to the rest of our system, we must show that the concrete specification of the collector matches the abstract specification given here, verify termination of the collector, and formally relate the separation-logic proof system to the style of specification shown here (where separation facts must be made explicit).

6. Case Study: Haskell to Dminor to GCminor

To assess the utility of GCminor as a compilation target for an existing programming language, we have built a prototype compiler for a subset of Haskell [26]. Our compiler supports most of the features of Haskell 98, except for floating point, file I/O, arrays, and seq. Compiling Haskell to a low-level, call-by-value language such as GCminor involves a number of major program transformations, including an implementation of lazy evaluation using force and delay constructs [1, p. 261], conversion of higher-order functions and delayed thunks to first-order functions and closure records [4], and conversion from an expression-based, purely functional form to a statement-based, imperative one. Although a number of compilers have compiled Haskell by transformation to a strict language with explicitly lazy constructs [5, 6, 11, 12], currently GHC instead relies on specialized runtime system support for laziness [27]. Demonstrating that our minimalist runtime system is adequate to run Haskell programs is an important step towards increasing the overall assurance of Haskell-based applications.

The starting point of our compilation pipeline is External Core [35], a text-based representation of code in GHC's intermediate language Core [28]. Core is based on System FC, which is an implicitly lazy language that extends System F with algebraic data types, a let construct, and type equality coercions [30]. Our compiler takes as input a Core program that has already been heavily optimized by GHC's front end. It then passes the program through the transformations described above, each of which produces a program in a different call-by-value intermediate language. These transformations expose further opportunities for standard functional language optimizations such as uncurrying, let-floating, identifying functions that do not require closures, and inlining, as well as removal of redundant force and delay operations. Finally, the pipeline produces a GCminor program.

Memory safety. In addition to demonstrating that GCminor is a reasonable compilation target for a sophisticated high-level source language, this prototype illustrates how our framework can provide useful assurance guarantees short of full semantics preservation. Building a fully semantics-preserving compiler accepting Haskell source would be a daunting task (even if there were a generally-accepted formal semantics for Haskell), especially since we would need to prove the correctness of the optimizations that GHC applies to Core. Instead, we lay the groundwork for an assurance argument based on a combination of semantics preservation proofs and weaker, but much easier, type soundness proofs. Specifically, we select one of our intermediate languages, called Dminor, to serve as the boundary between the two kinds of proofs. Dminor has a type system, which is designed to be sound: well-typed programs don't get stuck.⁵ Semantics preservation for the remainder of the pipeline guarantees that a non-stuck Dminor program yields a non-stuck assembly language program. Finally, Compcert's definition of assembly language semantics (Section 2) implies that a non-stuck program is memory-safe, in the sense that it cannot forge pointers or dereference memory outside of properly allocated stack frames or global memory regions. Combining these properties yields a memory safety guarantee for the entire back end. By typechecking the Dminor code generated by our front end, we obtain a memory-safe Haskell compiler with "fail-stop" behavior: any program that compiles successfully will be memory-safe. In fact, we have also implemented type-checkers for Core and our other intermediate languages, which we use to check type-correctness at each compilation stage; this technique is an excellent way to find compiler bugs. Of course, if the front end is bug-free, it should always generate well-typed Dminor code. We lack machine-checked soundness proofs for the type systems of our languages, but we believe that doing these proofs would be straightforward. We also expect that the various transformations and optimizations performed along the pipeline all preserve types, although for the most part we do not have formal proofs. We are confident that these type preservation results could, with sufficient effort, also be proved within Coq, obviating the need for compile-time typechecking of the intermediate forms.

⁵ Strictly speaking, even a well-typed program might get stuck unless the Dminor analogue of enough_mem holds at every allocation point.
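The force/delay constructs mentioned at the start of this section can be sketched in ordinary Haskell using IORef. This toy model is our own, not the compiler's runtime representation; it shows only the key step of overwriting a thunk with its result so that it is evaluated at most once.

```haskell
-- A sketch of the force/delay discipline used to compile laziness
-- (Section 6): a thunk is a mutable cell holding either a suspended
-- computation or the value it was already forced to. IORef stands in
-- for the heap update that overwrites a thunk with its result.
module ForceDelay where

import Data.IORef

data Cell a  = Unevaluated (IO a) | Evaluated a
newtype Thunk a = Thunk (IORef (Cell a))

delay :: IO a -> IO (Thunk a)
delay m = Thunk <$> newIORef (Unevaluated m)

force :: Thunk a -> IO a
force (Thunk r) = do
  c <- readIORef r
  case c of
    Evaluated v   -> pure v
    Unevaluated m -> do
      v <- m
      writeIORef r (Evaluated v)  -- overwrite: evaluated at most once
      pure v

main :: IO ()
main = do
  t <- delay (putStrLn "evaluating" >> pure (6 * 7 :: Int))
  _ <- force t   -- prints "evaluating" and computes 42
  v <- force t   -- cached: no output this time
  print v
```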
Minimalist type system. Unfortunately, while proofs about types are typically much easier than proofs of semantics preservation, they may still be quite hard if the type systems involved are complex. Standard type systems that can typecheck closure-converted code and record initialization require features such as existential types and initialization types [23]. Explicit forces require strong updates, which would add yet more complexity to our type system. Moreover, in our compiler, these features would need to be added to the already very complex System FC. We therefore adopt a different approach, and give the intermediate languages in our pipeline extremely rudimentary type systems that serve only to distinguish which local variables hold heap pointers and to describe the pointer layout corresponding to each heap record tag. These type systems are about as minimalist as they can be while still supporting static identification of roots at the GCminor level. This approach simplifies both engineering of type checkers and proofs involving types. However, our type systems are so weak that they can neither distinguish function closures from ordinary data records nor track which algebraic data type a data record belongs to. If our front end had bugs, it could produce Dminor programs that tried to apply data records as functions or inspect function closures with case expressions; yet, these programs would still be well-typed. Thus, in order to prove the type systems sound, we must add additional runtime checks at closure applications and some record accesses, so that these nonsensical programs yield a checked runtime error. Checks never fail in code generated by a bug-free compiler, but they may increase execution time even so. We think that this idea of trading off verification complexity against runtime performance has some merit, but we stress that the use of dynamic checks is quite independent of the remainder of our framework.

  t ::= Int      integer
      | ☐        pointer to record
      | ⟨c⟩      pointer to record with tag c

  e ::= ...                              as in GCminor
      | rload(ch, e_record, e_offset)    load from record field
      | stackaddr_i                      address of stack frame entry

  m ::= e                                      pure expression
      | app(e, ~t → t, ~e)                     closure application
      | call(f, ~e)                            internal function call
      | let [id :: t =] extcall(f, ~e) in m    external call
      | let id :: t = force e in m             thunk evaluation
      | letrec ~(id = c(~e)) in m              record binding
      | case e of ~(id :: ⟨c⟩ : m)             case analysis
      | if e then m else m                     conditional
      | let id :: t = m in m                   monadic binding

  f ::= fun(~(id :: t)) { vars ~(id :: t); m }

  p ::= functions ~(id = f);
        tags ~(c ↦ ~t);
        closuresigs ~(c ↦ (~t → t));
        vars ~(id = initializers);
        main id

Figure 9. Syntax of Dminor types (t), monadic expressions (m), functions (f), and programs (p). Atomic expressions (e) are the same as for GCminor, except as noted. Record constructor tags (c) are described in the text.
given type Int. Fortunately, GHC does not permit polymorphism over unboxed values, so the boxity of a value is always apparent. We introduce record types while doing closure conversion and through static analysis to make the types of certain -typed values more precise (eliminating the need for some case expressions). All bindings are statically typed, making it possible to compute the type of any expression without an environment. No identifier can be bound twice in the same function; this simplifies formalization of the semantics and eases translation to GCminor, which lacks nested scopes. The semantics of most Dminor expressions are standard or similar to their GCminor equivalents; we describe those that are not. All memory is allocated by letrec expressions. Evaluating
Dminor syntax and semantics. Dminor (Figure 9) is a low-level, first-order, strict, expression-based, pure functional language with a very simple type system. Its design is a compromise between ease of typability and simplicity of translation. Including closure application and thunk forcing as primitive operations in the language facilitates typability, while using almost the same set of underlying pure operations as GCminor simplifies translation. The language divides expressions into two categories: pure and monadic [13]. Pure expressions correspond directly to GCminor expressions, with the addition of a separate rload operator to read from heap objects (which avoids the need to type address arithmetic) and the omission of stackaddri (as there are no explicit stack frames). Monadic expressions may have effects. They can appear only in tail position or in a let; all other subexpressions must be pure. This syntactic structure sequences effectful operations explicitly and simplifies subsequent translation to GCminor. Dminor supports just three type constructors: unboxed integers (Int), pointers to heap records with unknown tag (, pronounced “box”), and pointers to heap records of known tag, written hci where c is a constructor tag (explained below). Any value of type hci can be statically coerced to type . In code translated from Core, most variables have type ; only variables explicitly declared as unboxed integers (type Int# in GHC) or pointers to static global memory such as string literals (type Addr# or Ptr in GHC) are
letrec
x1 = c1 (e11 , . . . , e1p1 ) ... xn = cn (en1 , . . . , enpn )
in m simultaneously allocates n records such that record j has tag cj and fields given by the values of pure expressions ej1 , . . . , ejpj . The bindings are recursive in the sense that the xi (but not loads from them) may be mentioned in the ejk . Closures and thunks (which are simply closures taking zero arguments) are constructed just like ordinary data records; the only difference is that the signature ~t → t of the closed-over function is included in the closuresigs list. The semantics of force reflect Haskell’s call-by-need semantics: forcing a thunk means evaluating it, then overwriting the pointer to the thunk with a pointer to the result. The latter step means changing a record’s tag, an operation which would have complicated Dminor’s type system if expressed explicitly in the language. Every record, whether it is a closure record or a data record, has two header fields (meaning that the size of every record incorporates two extra words in addition to the sizes of its fields). One
One word—the record tag—is, abstractly, an index into the list tags of record layouts. The tag plays two roles for any given record: first, it points to layout information that the collector uses, as described in Section 3; second, if the record is a data record, then a case expression that deconstructs it will check its tag, and if the record is a closure record, then an app expression that applies it will check its tag to obtain the closure's type signature. The second word contains the code pointer for closure records and goes unused for data records (to obtain uniformity, we pay an extra word per data record). The translation from Haskell to Dminor assigns a unique tag to each declared algebraic data type constructor and assigns a unique encoding to each possible closure type signature.

A case expression dispatches on the record's tag value. The type of a case discriminant x is normally □. Within an arm of the form x′ :: ⟨c⟩ : m, identifier x′ is bound to the value of x and given the refined type ⟨c⟩; this allows fields of x to be accessed within the arm by rload(ch, x′, offset), which is well-typed if offset and ch are valid for the record layout corresponding to c. Dminor's type system is not powerful enough to check that a case expression is exhaustive; a case expression in which no listed constructor matches the discriminant denotes a runtime error. As a result, the compiled code may test more alternatives than would be required if cases were known to be exhaustive.

In the expression app(op, ~t → t, args), the operator op should evaluate to a pointer to a closure record, whose first field is a (top-level) function f, which is invoked with the values of arguments args and (implicitly) the closure record itself. The second field ~t → t is the expected type signature of the function being applied. At runtime, the static signature is checked against the signature associated with the operator closure record's tag; if the record has an unexpected signature, or isn't a closure record at all, a runtime error is raised. We chose to keep closure application as a primitive operation in Dminor in order to make the language typable without introducing existential types or more runtime checks. Dminor also includes calls to known (top-level) functions; these do not require a runtime type check. By default, the translation from Haskell must compile function applications as closure applications, but static analysis can transform some of these applications to known-function calls.

Program                  GT      GM    CT/GT   CM/GM
circsim                 1.81     672    3.94    4.98
clausify                1.04     228    1.56    2.11
cryptarithm1            2.04    1029    1.46    1.68
cse                     4.26      <1    5.42    1.71
gcd                    11.89    5209    1.91    2.33
hartel comp lab zift    1.09     405    1.98    1.97
hartel ida              1.19     364    1.62    2.06
hartel sched            3.31    1057    1.33    1.91
hartel transform        1.87     738    1.78    1.86
hartel typecheck        1.48     330    1.54    2.21
knights                 4.62     170    0.88    4.59
lambda                  3.16     634    1.82    2.10
last-piece              1.92     411    2.36    4.95
lcss                    1.59     625    3.64    1.97
multiplier              1.55     464    2.50    1.76
power                   41.1   13388    1.65    2.11
primetest                233   74750    2.82    3.62
rewrite                 1.58     268    2.23    4.86
Mean                                    2.04    2.49

Figure 10. Comparing time and space usage for GHC and Haskell Compcert. For each program, the "GT" column shows its runtime in seconds and the "GM" column shows its allocation (in megabytes) when compiled by GHC. The "CM/GM" column shows the ratio of memory allocated by each Compcert-compiled program, compared to the GHC baseline. The "CT/GT" column shows the same ratio, but for time instead of memory. "Mean" shows the geometric mean ratios over all 18 programs.
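To make the record layout described above concrete, here is a small illustrative model in Haskell. All names are ours and the actual compiler manipulates raw memory words, but the sketch reflects the two header words per record and the three-word descriptor format explained in the surrounding text.

-- Hypothetical model of a Dminor heap record and its descriptor.
data Descriptor = Descriptor
  { numFields       :: Int  -- total number of fields
  , numAtomicFields :: Int  -- atomic fields are laid out first
  , closureSig      :: Int  -- encoded signature s; unused for data records
  }

data Record = Record
  { tagWord  :: Int        -- header word 1: descriptor pointer (dbase + o)
  , codeWord :: Maybe Int  -- header word 2: code pointer for closures only
  , fields   :: [Int]      -- the record's fields
  }

-- Every record pays two extra words beyond its fields.
sizeInWords :: Record -> Int
sizeInWords r = 2 + length (fields r)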
Translation to GCminor. The translation of Dminor to GCminor is largely straightforward. Expressions must be converted to statements; the most complicated translation is from case expressions to nested switch, block, and exit statements, but this is similar to existing Compcert code for compiling C switch statements, so we omit further details here. Each letrec expression is translated to an alloc statement followed by a series of rstore statements to initialize the headers and fields. The scopes of locally-bound identifiers are widened to the entire function; this is safe because no identifier is bound twice in a Dminor function. Identifiers of type □ or ⟨c⟩ are declared as roots of the GCminor function. Tags and closure signatures are made concrete as follows. Each tag c is converted to an offset o into a static global descriptor array.6 We write dbase for the base of this array. At runtime, a record header contains a direct pointer to the descriptor (that is, dbase + o). The descriptor has three words: number of fields, number of atomic fields, and encoded closure signature type s (explained below). This format is compatible with the standard descriptor scheme described in Section 3. (The translation will reorder fields as necessary to keep atomic fields first.)

The GCminor code generated for a case statement dispatches on the record's tag value, which must be retrieved from the header by subtracting dbase; it is impractical to dispatch directly on the descriptor pointer itself, because absolute descriptor addresses are not known at compile time. We must implement the dispatch as a binary comparison tree.7 At the GCminor level, the function signatures that appear in app expressions and in the global descriptor array are represented by distinct integer encodings s, which can be cheaply compared for equality at runtime. These integers can be easily assigned by a global traversal of the program during translation.

We have a machine-checked proof of semantics preservation for most of the Dminor-to-GCminor translation, excluding only case, app, and force expressions, which do not interact with allocation in interesting ways. As usual, the key to a semantics preservation proof for the translation is the simulation between Dminor and GCminor states. Although the relationship between Dminor and GCminor continuations is complicated, the heap memory components are identical. Thus, the crucial proof obligation induced on clients by the GCminor semantics, namely that ⊢ M : A at each allocation point, can be proved as an invariant of Dminor in isolation. In fact, we prove that the heap is well-formed after every possible Dminor evaluation step.
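As an illustration of the signature-encoding step mentioned above — distinct integers assigned by one global traversal — consider this hedged Haskell sketch (the function and its names are ours):

import qualified Data.Map as Map

-- Assign a distinct integer encoding to each closure type signature
-- encountered during a traversal of the program.
encodeSigs :: Ord sig => [sig] -> Map.Map sig Int
encodeSigs = foldl step Map.empty
  where
    step m s
      | s `Map.member` m = m                       -- already encoded
      | otherwise        = Map.insert s (Map.size m) m

Equality of two signatures then reduces to comparing their integer codes, which is cheap at runtime.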
Practical experience. To assess the practicality of our Haskell Compcert pipeline, we ran it on a number of benchmarks from the "spectral" section of the Haskell nofib benchmark suite. The "spectral" benchmarks are "small, key pieces from real programs" [25]. In order to compile real Haskell code, we had to make some changes to GHC's standard libraries to circumvent code that is based on primitive operations we have not implemented.
6 For historical reasons, our current system actually uses a version of Dminor in which the concrete descriptor array is already present; the Dminor type checker is responsible for confirming that the abstract and concrete representations of tag and closure signature information agree.

7 Jump tables, which require choosing the tags for each algebraic type from a dense domain of small integers, may be a more efficient implementation. But our choice of globally unique record tags forbids us from using them, and our version of Compcert does not support them in any case.
In particular, we reimplemented I/O functions as foreign calls to functions implemented in our simple RTS. In addition, we recompiled GHC with a native Haskell version of the multi-precision integer library [32] (substituting for GMP). We started with a baseline of GHC version 6.10.4. Of the 60 spectral benchmarks, we chose 18 to present in this paper. We excluded benchmarks that ran for less than 1 second when compiled by Haskell Compcert, as well as benchmarks that used Haskell or GHC features we do not support. We used our patched version of GHC to generate Core programs as input to Haskell Compcert, as well as to generate baseline executables. We ran GHC with the -O2 and -fvia-C flags, and ran all programs with a 128 MB heap. We did the measurements on an Apple Xserve G5 (2.3 GHz PowerPC dual core, 8 GB RAM, Mac OS X Server 10.4.11).

Figure 10 compares the performance of GHC-compiled and Compcert-compiled programs. (In both cases, time measurements included both mutator time and GC time.) On average, Compcert-compiled programs ran about twice as slowly as GHC-compiled programs, and also allocated 2.5 times as much memory. We can explain some of the memory allocation overhead by reference to our inefficient record layout (as described earlier in this section) and our strategy for compiling laziness. Also, our thunk representation introduces an extra three-word record for every thunk the program allocates. And in order to avoid implementing multiple return values in the back end, we compiled GHC's unboxed tuples by transforming them into boxed tuples. Given these pervasive sources of overhead, a mean factor of 2.5 increase in memory usage is not surprising. Since allocation is expensive in our system (invoking the allocator requires function calls), a corresponding increase in execution time is not surprising either, but in fact time and memory overheads of individual benchmarks are often not well-correlated. Indeed, the sources of time overhead are still somewhat mysterious to us. One obvious possible source is the cost of maintaining the shadow stack of GC roots. To test this possibility, we selected a subset of benchmarks that do not require GC when run in a 1 GB heap (the largest we can configure). Removing the shadow stack management code from these benchmarks improved their mean execution time by less than 3%, with little variance among the programs. Runtime type checks are another potential source of overhead. But compiling our benchmark set without runtime checks had almost no effect on mean execution time, although it did reduce the time of one benchmark (power) by 17%. Of course, the overhead of checks would increase if we eliminated other sources of overhead and thus reduced overall execution time. Another obvious point is that our garbage collector is slow and simplistic compared to GHC's highly tuned generational collector, but again, higher execution time overheads are not well-correlated with amount of GC performed. Investigating other possible sources of the performance gap remains as future work.
7. Related work

Dargaye [9, 10] extends the Compcert framework to compile miniML, a simple call-by-value functional language. The compiler uses a chain of new intermediate languages connected by semantics-preserving translations and ending with Cminor. Dargaye makes no attempt to present a general-purpose variant of Cminor for interfacing to a collector. However, the last of her new languages, Fminor, is quite similar to our Dminor (without support for laziness), though higher-level in some respects (e.g., case expressions bind constructor fields to identifiers) and lower-level in others (e.g., live roots are already explicitly identified, as in our LowGCminor). One significant simplification is that miniML contains no primitive types or operations; all values are boxed and all heap blocks have the same format (a single atomic constructor tag or closure function pointer, followed by value pointers) so there is no need for the front end to pass record layout information. Dargaye's implementation of the memory management library is also quite similar to ours; we share similar collector code and shadow stack format, although she chooses to store roots in the shadow stack permanently rather than to store and reload them around calls. Like us, Dargaye axiomatizes the behavior of the allocation function. She explicitly defines reachability in the heap (made simpler because miniML heaps cannot have cycles of pointers), and specifies that collection should leave reachable memory locations unchanged. We use a similar but simpler specification at the GCminor level (all memory should remain unchanged), but at the Cminor level we refine it to a more precise specification that lets us describe the behavior of a moving collector.

McCreight et al. [20, 22] discuss the treatment of a garbage-collected heap as an abstract data type to hide the implementation details of a collector from the mutator, and verify in Coq that several collectors satisfy this interface. Hawblitzel and Petrank [14] apply this approach to realistic collectors for the Bartok C# compiler, using an automated theorem prover to verify the collectors. The root and record descriptor information needed by the collector-mutator interface is verified using a typed assembly language [7]. The final allocator interface of our work, given in Fig. 8, is also based on this approach. The main difference in our work is that it takes a local specification of parts of memory and wraps it up into a global specification in the form of a complete intermediate language, GCminor. This allows clients to reason about mutator programs at a single level of abstraction that hides the action of the collector.

Myreen [24] verifies a Cheney collector for a simple fixed record format, and uses a memory embedding to hide the movement of records from high-level code. However, the high-level state does not include any non-root components, so he does not have to deal with stale record pointers. Chlipala [8] carries out semantics-preserving compilation of the simply-typed lambda calculus to a low-level machine with garbage collection. He also uses an embedding from a high-level memory to a low-level memory, but does not hide the actions of the collector from the high level. Vanderwaart and Crary [37] use a type system to describe the interface to a precise collector. Their focus is on describing the layout of roots within the stack, and their work only supports reasoning about the type safety of the mutator code.

The C-- generic intermediate language [29] has a small runtime system with activation inspection primitives designed to support an arbitrary garbage collector provided by the compiler front end without the need for a shadow stack. It would be interesting to attempt a semantics preservation proof for a version of Cminor extended with primitives of this kind.
8. Conclusions and Future Work
We have described a general-purpose, machine-verified compilation pipeline for garbage-collected languages. A key feature of this system is the use of language abstraction to hide collection from the mutator. This is embodied in our language GCminor, which makes precise the often-subtle collector-mutator interface. Compiler writers can take advantage of our work simply by generating GCminor code—verifying that code to whatever level they desire—and then applying the existing Compcert back end to generate code for the PowerPC or ARM. The work reported here represents a serious engineering effort stretching over several calendar years. The verified GCminor-to-Cminor compiler is about 13,000 lines of Coq code and proof scripts;
the Core-to-GCminor front end is about 10,000 lines of (heavily commented) Haskell and the Dminor-to-GCminor preservation proof is another 5500 lines of Coq scripts. Our framework depends heavily on the existing Compcert system. Using Compcert has allowed us to build a working compiler quickly and has given us an excellent model for developing semantics preservation proofs for our extensions. We have remained fully backwards-compatible with Compcert, so that existing proofs are unaffected; indeed, we have changed only a very few existing Compcert files at all (in order to extend module signatures). This approach has caused a few problems: the Compcert memory model cannot express some useful GC techniques, and the lack of stack introspection requires using the awkward shadow stack technique. These limitations are not inherent in the Compcert framework, but removing them would be a significant task with possibly extensive ramifications for the existing proofs. We have exercised our framework by building a compiler from GHC's Core intermediate language to GCminor. With simplicity of verification in mind, we designed intermediate languages for this compiler that use a novel combination of static and dynamic type checking. An alternative, which we plan to explore, would be to build a more conventional TAL-like type system [23] for GCminor, which would obviate the need for a Dminor-like higher-level typed language and corresponding semantics preservation proof. Our performance measurements show that our Compcert-based compiler generates code that runs at about half the speed of GHC-generated code on average. We achieved this level of performance by combining an existing front end and back end, without extensive performance tuning. We conclude that it is possible to increase the assurance of high-level language compilers without seriously injuring performance. Choosing a real language and compiler as a testbed has some obvious advantages, but also inevitably introduces a great deal of "accidental" complexity. We initially underestimated how hard it would be to understand the behavior of our back end on code generated by GHC's front end. The verification story for our pipeline is almost complete. Our first priority for future work is to fill the remaining gap, which is between the allocator specification assumed by our GCminor-to-Cminor proof and the one we have proven for our prototype Cheney collector. We also plan to prove that the specification can be met by more realistic collectors, such as a generational copying collector. Finally, our garbage collection framework is just one component of a larger effort to build a complete high-assurance runtime system suitable for supporting safety-critical and security-critical applications [33]. We hope to address other components, including concurrency and foreign function interfacing, in future work.
References

[1] H. Abelson and G. J. Sussman. Structure and Interpretation of Computer Programs. The MIT Press, first edition, 1985.
[2] ADT Coq. The Coq proof assistant. http://coq.inria.fr.
[3] A. W. Appel and S. Blazy. Separation logic for small-step Cminor. In TPHOLs, volume 4732 of LNCS, pp. 5–21. Springer, 2007.
[4] A. W. Appel and T. Jim. Continuation-passing, closure-passing style. In POPL, pp. 293–302. ACM Press, 1989.
[5] A. Bloss, P. Hudak, and J. Young. Code optimizations for lazy evaluation. Lisp and Symbolic Computation, 1(2):147–164, 1988.
[6] U. Boquist and T. Johnsson. The GRIN project: A highly optimising back end for lazy functional languages. In IFL '96, volume 1268 of LNCS, pp. 58–84. Springer, 1996.
[7] J. Chen, C. Hawblitzel, F. Perry, M. Emmi, J. Condit, D. Coetzee, and P. Pratikaki. Type-preserving compilation for large-scale optimizing object-oriented compilers. In PLDI, pp. 183–192, 2008.
[8] A. Chlipala. A certified type-preserving compiler from lambda calculus to assembly language. In PLDI, pp. 54–65. ACM, 2007.
[9] Z. Dargaye. MLCompCert Coq proofs. http://gallium.inria.fr/~dargaye/mlcompcert.html, 2009.
[10] Z. Dargaye. Vérification formelle d'un compilateur pour langages fonctionnels. PhD thesis, Université Paris 7 Denis Diderot, July 2009.
[11] A. Dijkstra, J. Fokker, and S. D. Swierstra. The architecture of the Utrecht Haskell Compiler. In Haskell Symp., pp. 93–104. ACM, 2009.
[12] K.-F. Faxén. Analysing, Transforming and Compiling Lazy Functional Programs. PhD thesis, Royal Institute of Technology, June 1997.
[13] C. Flanagan, A. Sabry, B. F. Duba, and M. Felleisen. The essence of compiling with continuations. In PLDI, pp. 237–247. ACM, 1993.
[14] C. Hawblitzel and E. Petrank. Automated verification of practical garbage collectors. In POPL, pp. 441–453. ACM, 2009.
[15] F. Henderson. Accurate garbage collection in an uncooperative environment. In MSP/ISMM, pp. 256–263, 2002.
[16] X. Leroy. Formal certification of a compiler back-end or: programming a compiler with a proof assistant. In POPL, pp. 42–54, 2006.
[17] X. Leroy. The Compcert verified compiler. http://compcert.inria.fr/doc/index.html, April 2009.
[18] X. Leroy. A formally verified compiler back-end. J. Autom. Reason., 43(4):363–446, 2009.
[19] X. Leroy and S. Blazy. Formal verification of a C-like memory model and its uses for verifying program transformations. J. Autom. Reason., 41(1):1–31, 2008.
[20] A. McCreight. The Mechanized Verification of Garbage Collector Implementations. PhD thesis, Yale University, New Haven, CT, USA, 2008.
[21] A. McCreight. Practical tactics for separation logic. In TPHOLs, volume 5674 of LNCS, pp. 343–358. Springer, 2009.
[22] A. McCreight, Z. Shao, C. Lin, and L. Li. A general framework for certifying GCs and their mutators. In PLDI, pp. 468–479. ACM, 2007.
[23] G. Morrisett, D. Walker, K. Crary, and N. Glew. From System F to typed assembly language. TOPLAS, 21(3):527–568, 1999.
[24] M. O. Myreen. Formal verification of machine-code programs. PhD thesis, University of Cambridge, 2008.
[25] W. Partain. The nofib benchmark suite of Haskell programs. In Proc. 1992 Glasgow Workshop on FP, pp. 195–202. Springer, 1993.
[26] S. Peyton Jones, editor. Haskell 98 Language and Libraries – The Revised Report. Cambridge University Press, 2003.
[27] S. L. Peyton Jones. Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine. JFP, 2(2):127–202, 1992.
[28] S. L. Peyton Jones. Compiling Haskell by program transformation: A report from the trenches. In ESOP, pp. 18–44, 1996.
[29] S. L. Peyton Jones, N. Ramsey, and F. Reig. C--: A portable assembly language that supports garbage collection. In PPDP '99, pp. 1–28, London, UK, 1999. Springer-Verlag.
[30] M. Sulzmann, M. Chakravarty, S. Peyton Jones, and K. Donnelly. System F with type equality coercions. In TLDI, pp. 53–66, 2007.
[31] The GHC Team. GHC. http://haskell.org/ghc, 2009.
[32] The GHC Team. Replacing GMP. http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes, April 2009.
[33] The HASP Project. http://hasp.cs.pdx.edu.
[34] The Ocaml Development Team. The Caml language. http://caml.inria.fr.
[35] A. Tolmach, T. Chevalier, and the GHC Team. An external representation for the GHC Core language. http://www.haskell.org/ghc/docs/6.10.4/html/ext-core/core.pdf, July 2009.
[36] N. Torp-Smith, L. Birkedal, and J. C. Reynolds. Local reasoning about a copying garbage collector. ACM TOPLAS, 30(4):1–58, 2008.
[37] J. C. Vanderwaart and K. Crary. A typed interface for garbage collection. In TLDI, pp. 109–122. ACM Press, 2003.
Total Parser Combinators

Nils Anders Danielsson
School of Computer Science, University of Nottingham, United Kingdom
[email protected]
Abstract
A monadic parser combinator library which guarantees termination of parsing, while still allowing many forms of left recursion, is described. The library’s interface is similar to those of many other parser combinator libraries, with two important differences: one is that the interface clearly specifies which parts of the constructed parsers may be infinite, and which parts have to be finite, using dependent types and a combination of induction and coinduction; and the other is that the parser type is unusually informative. The library comes with a formal semantics, using which it is proved that the parser combinators are as expressive as possible. The implementation is supported by a machine-checked correctness proof.
Categories and Subject Descriptors D.1.1 [Programming Techniques]: Applicative (Functional) Programming; E.1 [Data Structures]; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs; F.4.2 [Mathematical Logic and Formal Languages]: Grammars and Other Rewriting Systems—Grammar types, Parsing

General Terms Languages, theory, verification
Keywords Dependent types, mixed induction and coinduction, parser combinators, productivity, termination
1. Introduction
Parser combinators (Burge 1975; Wadler 1985; Fairbairn 1987; Hutton 1992; Meijer 1992; Fokker 1995; Röjemo 1995; Swierstra and Duponcheel 1996; Koopman and Plasmeijer 1999; Leijen and Meijer 2001; Ljunglöf 2002; Hughes and Swierstra 2003; Claessen 2004; Frost et al. 2008; Wallace 2008, and many others) can provide an elegant and declarative method for implementing parsers. When compared with typical parser generators they have some advantages: it is easy to abstract over recurring grammatical patterns, and there is no need to use a separate tool just to parse something. On the other hand there are also some disadvantages: there is a risk of lack of efficiency, and parser generators can give static guarantees about termination and non-ambiguity which most parser combinator libraries fail to give. This paper addresses one of these points by defining a parser combinator library which ensures statically that parsing will terminate for every finite input string.
The library has an interface which is very similar to those of classical monadic parser combinator libraries. For instance, consider the following simple, left recursive, expression grammar:

term   ::= factor | term '+' factor
factor ::= atom | factor '*' atom
atom   ::= number | '(' term ')'

We can define a parser which accepts strings from this grammar, and also computes the values of the resulting expressions, as follows (the combinators are described in Section 4):

mutual
  term   = factor
         | ] term   >>= λ n1 →
           tok '+'  >>= λ _  →
           factor   >>= λ n2 →
           return (n1 + n2)

  factor = atom
         | ] factor >>= λ n1 →
           tok '*'  >>= λ _  →
           atom     >>= λ n2 →
           return (n1 ∗ n2)

  atom   = number
         | tok '('  >>= λ _ →
           ] term   >>= λ n →
           tok ')'  >>= λ _ →
           return n

The only visible difference to classical parser combinators is the use of ], which indicates that the definitions are corecursive (see Section 2). However, we will see later that the parsers' types contain more information than usual.

When using parser combinators the parsers/grammars are often constructed using cyclic definitions, as above, so it is natural to see the definitions as being partly corecursive. However, a purely coinductive reading of the choice and sequencing combinators would allow definitions like the following ones:

p  = ] p | ] p
p′ = ] p′ >>= λ x → ] return (f x)

For these definitions it is impossible to implement parsing in a total way (in the absence of hidden information): a defining characteristic of parser combinator libraries is that non-terminals are implicit, encoded using the recursion mechanism of the host language, so (in a pure setting) the only way to inspect p and p′ is via their infinite unfoldings. The key idea of this paper is that, even if non-terminals are implicit, totality can be ensured by reading choice inductively, and only reading an argument of the sequencing operator coinductively if the other argument does not accept the empty string (see Section 3). To support this idea the parsers' types will contain information about whether or not they accept the empty string.
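As a point of comparison for readers who know Haskell rather than Agda, the nullability index can be approximated with a promoted-Boolean GADT. This sketch is ours, not part of the library, and — Haskell being a partial language — it cannot enforce the totality or the inductive/coinductive distinction that the Agda types do:

{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

type family Or (a :: Bool) (b :: Bool) :: Bool where
  Or 'True  b = 'True
  Or 'False b = b

type family And (a :: Bool) (b :: Bool) :: Bool where
  And 'True  b = b
  And 'False b = 'False

-- Recognisers indexed by whether they accept the empty string.
data P (n :: Bool) where
  Fail  :: P 'False
  Empty :: P 'True
  Sat   :: (Char -> Bool) -> P 'False
  (:|:) :: P n1 -> P n2 -> P (Or n1 n2)   -- choice
  (:.:) :: P n1 -> P n2 -> P (And n1 n2)  -- sequencing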
The main contributions of the paper are as follows:
• It is shown how parser combinators can be implemented in such a way that termination is guaranteed, using a combination of induction and coinduction to represent parsers, and a variant of Brzozowski derivatives (1964) to run them.

• Unlike many other parser combinator libraries these parser combinators can handle many forms of left recursion.

• The parser combinators come with a formal semantics. The implementation is proved to be correct, and the combinators are shown to satisfy a number of laws.

• It is shown that the parser combinators are as expressive as possible (see Sections 3.5 and 4.5).

The core of the paper is Sections 3 and 4. The former section introduces the ideas by using recognisers (parsers which do not return any result other than "the string matched" or "the string did not match"), and the latter section generalises to full parser combinators. Related work is discussed below. As mentioned above the parser type is defined using mixed induction and coinduction (Park 1980). This technique is explained in Section 2, and discussed further in the conclusions. Those readers who are not particularly interested in parser combinators may still find the paper useful as an example of the use of this technique. The parser combinator library is defined in the dependently typed functional programming language Agda (Norell 2007; Agda Team 2010), which will be introduced as we go along. The library comes with a machine-checked1 proof which shows that the implementation is correct with respect to the semantics. The code which the paper is based on is at the time of writing available from the author's web page.

1.1 Related work

There does not seem to be much prior work on formally verified termination for parser combinators (or other general parsing frameworks). McBride and McKinna (2002) define grammars inductively, and use types to ensure that a token is consumed before a non-terminal can be encountered, thereby ruling out left recursion and non-termination. Danielsson and Norell (2008) and Koprowski and Binsztok (2010) use similar ideas; Koprowski and Binsztok also prove full correctness. Muad'Dib (2009) uses a monad annotated with Hoare-style pre- and post-conditions (Swierstra 2009) to define total parser combinators, including a fixpoint combinator whose type rules out left recursion by requiring the input to be shorter in recursive calls. Note that none of these other approaches can handle left recursion. The library defined in this paper seems to be the first one which both handles (many forms of) left recursion and guarantees termination for every parser which is accepted by the host language.2 It also seems fair to say that, when compared to the other approaches above, this library has an interface which is closer to those of "classical" parser combinator libraries. In the classical approach the ordinary general recursion of the host language is used to implement cyclic grammars; this library uses "ordinary" corecursion (restricted by types, see Section 3).

There are a number of parser combinator libraries which can handle various forms of left recursion, but they all seem to come with some form of restriction. The combinators defined here can handle many left recursive grammars, but not all; for instance, the definition p = p is rejected statically. Lickman (1995) defines a library which can handle left recursion if a tailor-made fixpoint combinator, based on an idea due to Philip Wadler, is used. He proves (informally) that parsers defined using his combinators are terminating, as long as they are used in the right way; the argument to the fixpoint combinator must satisfy a nontrivial semantic criterion, which is not checked statically. Johnson (1995) and Frost et al. (2008) define libraries of recogniser and parser combinators, respectively, including memoisation combinators which can be used to handle left recursion. As presented these libraries can fail to terminate if used with grammars with an infinite number of non-terminals—for instance, consider the grammar { pn ::= p1+n | n ∈ N }, implemented by the definition p n = memoise n (p (1 + n))—and users of the libraries need to ensure manually that the combinators are used in the right way. The same limitations apply to a library described by Ljunglöf (2002). This library uses an impure feature, observable sharing (Claessen and Sands 1999), to detect cycles in the grammar. Claessen (2001) mentions a similar implementation, attributing the idea to Magnus Carlsson. Kiselyov (2009) also presents a combinator library which can handle left recursion. Users of the library are required to annotate left recursive grammars with something resembling a coinductive delay constructor. If this constructor is used incorrectly, then parsing can terminate with the wrong answer. Baars et al. (2009) represent context-free grammars, including semantic actions, in a well-typed way. In order to avoid problems with left recursion when generating top-down parsers from the grammars they implement a left-corner transformation. Neither correctness of the transformation nor termination of the generated parsers is proved formally. Brink et al. (2010) perform a similar exercise, giving a partial proof of correctness, but no proof of termination.

In Section 4.5 it is shown that the parser combinators are as expressive as possible—every parser which can be implemented using the host language can also be implemented using the combinators. In the case of finite token sets this holds even for non-monadic parser combinators using the applicative functor interface (McBride and Paterson 2008); see Section 3.5. The fact that monadic parser combinators can be as expressive as possible has already been pointed out by Ljunglöf (2002), who also mentions that applicative combinators can be used to parse some languages which are not context-free, because one can construct infinite grammars by using parametrised parsers. It has also been known for a long time that an infinite grammar can represent any language, decidable or not (Solomon 1977), and that the languages generated by many infinite grammars can be decided (Mazurkiewicz 1969). However, the result that monadic and applicative combinators have the same expressive strength for finite token sets seems to be largely unknown. For instance, Claessen (2004, page 742) claims that "with the weaker sequencing, it is only possible to describe context-free grammars in these systems".

Bonsangue et al. (2009, Example 2) represent a kind of regular expressions in a way which bears some similarity to the representation of recognisers in Section 3. Unlike the definition in this paper their definition is inductive, with an explicit representation of cycles: µx.ε, where ε can contain x. However, occurrences of x in ε have to be guarded by what amounts to the consumption of a token, just as in this paper. In Sections 3.3 and 4.2 Brzozowski derivative operators (Brzozowski 1964) are implemented for recognisers and parsers, and in Sections 3.4 and 4.3 these operators are used to characterise recogniser and parser equivalence coinductively. Rutten (1998) performs similar tasks for regular expressions.

1 Note that the meta-theory of Agda has not been properly formalised, and Agda's type checker has not been proved to be bug-free, so take words such as "machine-checked" with a grain of salt.

2 Danielsson and Norell (2009) define a parser using a specialised version of the library described in this paper. This version of the library can handle neither left nor right recursion, and is restricted to parsers which do not accept the empty string. A brief description of the parser interface is provided, but the implementation of the backend is not discussed.
2. Induction and coinduction
The parser combinators defined in Sections 3 and 4 use a combination of induction and coinduction which may at first sight seem bewildering, so let us begin by discussing induction and coinduction. This discussion is rather informal. For more theoretical accounts of induction and coinduction see, for instance, the works of Hagino (1987) and Mendler (1988).

Induction can be used to define types where the elements have finite "depth". A simple example is the type of finite lists. In Agda this data type can be defined by giving the types of all the constructors:

data List (A : Set) : Set where
  []   : List A
  _::_ : A → List A → List A

This definition should be read inductively, i.e. all lists have finite length. Functions with underscores in their names are operators; an underscore marks an argument position. For instance, the constructor _::_ is an infix operator. Set is a type of small types.

Coinduction can be used to define types where some elements have infinite depth. Consider the type of potentially infinite lists (colists), for instance:

data Colist (A : Set) : Set where
  []   : Colist A
  _::_ : A → ∞ (Colist A) → Colist A

(Note that constructors can be overloaded.) The type function ∞ : Set → Set marks its argument as being coinductive. It is similar to the suspension type constructors which are used to implement non-strictness in strict languages (Wadler et al. 1998). Just as the suspension type constructors the function ∞ comes with delay and force functions, here called ] (sharp) and [ (flat):

] : {A : Set} → A → ∞ A
[ : {A : Set} → ∞ A → A

Sharp is a tightly binding prefix operator; ordinary function application binds tighter, though. (Flat is an ordinary function.) Note that {A : Set} → T is a dependent function space; the argument A is in scope in T. Arguments in braces, {. . .}, are implicit, and do not need to be given explicitly as long as Agda can infer them from the context.

Agda is a total language. This means that all computations of inductive type must be terminating, and that all computations of coinductive type must be productive. A computation is productive if the computation of the next constructor is always terminating, so even though an infinite colist cannot be computed in finite time we know that the computation of any finite prefix has to be terminating. For types which are partly inductive and partly coinductive the inductive parts must always be computable in finite time, while the coinductive parts must always be productively computable. To ensure termination and productivity Agda employs two basic means for defining functions: inductive values can be destructed using structural recursion, and coinductive values can be constructed using guarded corecursion (Coquand 1994). As an example of the latter, consider the following definition of map for colists:

map : ∀ {A B} → (A → B) → Colist A → Colist B
map f []        = []
map f (x :: xs) = f x :: ] map f ([ xs)

(Note that the code ∀ {A B} → . . . means that the function takes two implicit arguments A and B; it is not an application of A to B.) Agda accepts this definition because the corecursive call to map is guarded: it occurs under the delay constructor ], without any non-constructor function application between the left-hand side and the corecursive call. It is easy to convince oneself that, if the input colist is productively computable, then the (spine of the) output colist must also be.

Let us now consider what happens if a definition uses both induction and coinduction. We can define a language of "stream processors" (Carlsson and Hallgren 1998; Hancock et al. 2009), taking colists of As to colists of Bs, as follows:

data SP (A B : Set) : Set where
  get  : (A → SP A B) → SP A B
  put  : B → ∞ (SP A B) → SP A B
  done : SP A B

The recursive argument of get is inductive, while the recursive argument of put is coinductive. The type should be read as the nested fixpoint νX.µY. (A → Y) + B × X + 1, with an outer greatest fixpoint and an inner least fixpoint.3 This means that a stream processor can only read (get) a finite number of elements from the input before having to produce (put) some output or terminate (done). As a simple example of a stream processor, consider copy, which copies its input to its output:

copy : ∀ {A} → SP A A
copy = get (λ a → put a (] copy))

Note that copy is guarded (lambdas do not affect guardedness). The semantics of stream processors can be defined as follows:

J_K : ∀ {A B} → SP A B → Colist A → Colist B
J get f K    (a :: as) = J f a K ([ as)
J put b sp K as        = b :: ] J [ sp K as
J _ K        _         = []

(J_K is a mixfix operator.) In the case of get one element from the input colist is consumed (if possible), and potentially used to guide the rest of the computation, while in the case of put one output element is produced. The definition of J_K uses a lexicographic combination of guarded corecursion and structural recursion:

• In the second clause the corecursive call is guarded.

• In the first clause the corecursive call is not guarded, but it "preserves guardedness": it takes place under zero occurrences of ] rather than at least one (and there are no destructors involved). Furthermore the stream processor argument is structurally smaller: f x is strictly smaller than get f for any x.

This ensures the productivity of the resulting colist: the next output element can always be computed in finite time, because the number of get constructors between any two put constructors must be finite. Agda accepts definitions which use this kind of lexicographic combination of guarded corecursion and structural recursion. For more information about Agda's criterion for accepting a program as total, and more examples of the use of mixed induction and coinduction in Agda, see Danielsson and Altenkirch (2010).

It may be interesting to observe what would happen if get were made coinductive. In this case we could define more stream processors, for instance the following one:

sink : ∀ {A B} → SP A B
sink = get (λ _ → ] sink)

On the other hand we could no longer define J_K as above (suitably modified), because the output of J sink K as would not be productive for infinite colists as. In other words, if we make more stream processors definable some functions become impossible to define.

3 At the time of writing this interpretation is not correct in Agda (Altenkirch and Danielsson 2010), but the differences are irrelevant for this paper.
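For comparison, here is an approximate Haskell rendering of stream processors and their semantics. This transliteration is ours; Haskell's laziness erases the inductive/coinductive distinction that the Agda types enforce, so totality here is only by convention:

data SP a b = Get (a -> SP a b) | Put b (SP a b) | Done

copy :: SP a a
copy = Get (\a -> Put a copy)

-- Interpreter corresponding to the J_K semantics above.
run :: SP a b -> [a] -> [b]
run (Get f)    (a : as) = run (f a) as
run (Get _)    []       = []
run (Put b sp) as       = b : run sp as
run Done       _        = []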
3. Recognisers
This section defines a small embedded language of parser combinators. To simplify the explanation the parser combinators defined in this section can only handle recognition. Full parser combinators are described in Section 4. The aim is to define a data type with (at least) the following basic combinators as constructors: fail, which always fails; empty, which accepts the empty string; sat, which accepts tokens satisfying a given predicate; _|_, symmetric choice; and _·_, sequencing.

Let us first consider whether the combinator arguments should be read inductively or coinductively. An infinite choice cannot be decided (in the absence of extra information), as this is not possible without inspecting every alternative, so choices will be read inductively. The situation is a bit trickier for sequencing. Consider definitions like p = p · p′ or p = p′ · p. If p′ accepts the empty string, then it seems hard to make any progress with these definitions. However, if p′ is guaranteed not to accept the empty string, then we know that any string accepted by the recursive occurrence of p has to be shorter than the one accepted by p · p′ or p′ · p. To make use of this observation I will indicate whether or not a recogniser is nullable (accepts the empty string) in its type, and the left (right) argument of _·_ will be coinductive iff the right (left) argument is not nullable. Based on the observations above the type P of parsers (recognisers) can now be defined for a given token type Tok:

mutual
  data P : Bool → Set where
    fail  : P false
    empty : P true
    sat   : (Tok → Bool) → P false
    _|_   : ∀ {n1 n2} → P n1 → P n2 → P (n1 ∨ n2)
    _·_   : ∀ {n1 n2} → ∞⟨ n2 ⟩P n1 → ∞⟨ n1 ⟩P n2 → P (n1 ∧ n2)

  ∞⟨_⟩P_ : Bool → Bool → Set
  ∞⟨ false ⟩P n = ∞ (P n)
  ∞⟨ true  ⟩P n = P n

Here P true represents those recognisers which accept the empty string, and P false those which do not: fail and sat do not accept the empty string, while empty does; a choice p1 | p2 is nullable if either p1 or p2 is; and a sequence p1 · p2 is nullable if both p1 and p2 are. The definition of the sequencing operator makes use of the mixfix operator ∞⟨_⟩P_ to express the "conditional coinduction" discussed above: the left argument has type ∞⟨ n2 ⟩P n1, which means that it is coinductive iff n2 is false, i.e. iff the right argument is not nullable. The right argument's type is symmetric. The conditionally coinductive type ∞⟨_⟩P_ comes with corresponding conditional delay and force functions:

]? : ∀ {b n} → P n → ∞⟨ b ⟩P n
]? {b = false} x = ] x
]? {b = true}  x = x

[? : ∀ {b n} → ∞⟨ b ⟩P n → P n
[? {b = false} x = [ x
[? {b = true}  x = x

(Here {b = . . .} is the notation for pattern matching on an implicit argument.) We can also define a function which returns true iff the argument is already forced:

forced? : ∀ {b n} → ∞⟨ b ⟩P n → Bool
forced? {b = b} = b

In addition to the constructors listed above the following constructors are also included in P:

nonempty : ∀ {n} → P n → P false
cast     : ∀ {n1 n2} → n1 ≡ n2 → P n1 → P n2

The nonempty combinator turns a recogniser which potentially accepts the empty string into one which definitely does not (see Section 3.1 for an example and 3.2 for its semantics), and cast can be used to coerce a recogniser indexed by n1 into a recogniser indexed by n2, assuming that n1 is equal to n2 (the type n1 ≡ n2 is a type of proofs showing that n1 and n2 are equal). Both nonempty and cast are definable in terms of the other combinators—in the case of cast the definition is trivial, and nonempty can be defined by recursion over the inductive structure of its input—but due to Agda's reliance on guarded corecursion it is convenient to have them available as constructors.

3.1 Examples

Using the definition above it is easy to define recognisers which are both left and right recursive, for instance the following one:

left-right : P false
left-right = ] left-right · ] left-right

Given the semantics in Section 3.2 it is easy to show that left-right does not accept any string. This means that fail does not necessarily have to be primitive, it could be replaced by left-right. As examples of ill-defined recognisers, consider bad and bad2:

bad : P false
bad = bad

bad2 : P true
bad2 = bad2 · bad2

These definitions are rejected by Agda, because they are neither structurally recursive nor guarded. They are not terminating, either: an attempt to evaluate the inductive parts of bad or bad2 would lead to non-termination, because the definitions do not make use of the delay operator ]. As a more useful example of how the combinators above can be used to define derived recognisers, consider the following definition of the Kleene star:

mutual
  _? : P false → P true
  p ? = empty | p +

  _+ : P false → P false
  p + = p · ] (p ?)

(The combinator _|_ binds weaker than the other combinators.) The recogniser p ? accepts zero or more occurrences of whatever p accepts, and p + accepts one or more occurrences; this is easy to prove using the semantics in Section 3.2. Note that this definition is guarded, and hence productive.4 Note also that p must not accept the empty string, because if it did, then the right hand side of p + would have to be written p · p ?, which would make the definition unguarded and non-terminating—if p ? were unfolded, then no delay operator would ever be encountered. By using the nonempty combinator one can define a variant of _? which accepts arbitrary argument recognisers:

_⋆ : ∀ {n} → P n → P true
p ⋆ = nonempty p ?

For more examples, see Section 4.6.

4 The call to p + is not guarded in the definition of p ?, but all that matters for guardedness is calls from one function to itself. If p + is inlined it is clear that p ? is guarded.
3.2 Semantics

The semantics of the recognisers is defined as an inductive family. The type s ∈ p is inhabited iff the token string s is a member of the language defined by p:

data _∈_ : ∀ {n} → List Tok → P n → Set where
  . . .

The semantics is determined by the constructors of _∈_, which are introduced below. The values of type s ∈ p are proofs of language membership; the constructors can be seen as inference rules. To avoid clutter the declarations of bound variables are omitted in the constructors' type signatures. No string is a member of the language defined by fail, so there is no constructor for it in _∈_. The empty string is recognised by empty:

empty : [ ] ∈ empty

(Recall that constructors can be overloaded.) The singleton [ t ] is recognised by sat f if f t evaluates to true (T b is inhabited iff b is true):

sat : T (f t) → [ t ] ∈ sat f

If s is recognised by p1, then it is also recognised by p1 | p2, and similarly for p2:

|-left  : s ∈ p1 → s ∈ p1 | p2
|-right : s ∈ p2 → s ∈ p1 | p2

If s1 is recognised by p1 (suitably forced), and s2 is recognised by p2 (suitably forced), then the concatenation of s1 and s2 is recognised by p1 · p2:

_·_ : s1 ∈ [? p1 → s2 ∈ [? p2 → s1 ++ s2 ∈ p1 · p2

If a nonempty string is recognised by p, then it is also recognised by nonempty p (and empty strings are never recognised by nonempty p):

nonempty : t :: s ∈ p → t :: s ∈ nonempty p

Finally cast preserves the semantics of its recogniser argument:

cast : s ∈ p → s ∈ cast eq p

It is easy to show that the semantics and the nullability index agree: if p : P n, then [ ] ∈ p iff n is equal to true (one direction can be proved by induction on the structure of the semantics, and the other by induction on the inductive structure of the recogniser; delayed sub-parsers do not need to be forced). Given this result it is easy to decide whether or not [ ] ∈ p; it suffices to inspect the index:

nullable? : ∀ {n} (p : P n) → Dec ([ ] ∈ p)

Note that the correctness of nullable? is stated in its type. An element of Dec P is either a proof of P or a proof showing that P is impossible:

data Dec (P : Set) : Set where
  yes : P → Dec P
  no  : ¬ P → Dec P

Here logical negation is represented as a function into the empty type: ¬ P = P → ⊥.

3.3 Backend

Let us now consider how the relation _∈_ can be decided, or alternatively, how the language of recognisers can be interpreted. No attempt is made to make this recogniser backend efficient, the focus is on correctness. (Efficiency is discussed further in Section 4.2.)

The backend will be implemented using so-called derivatives (Brzozowski 1964). The derivative D t p of p with respect to t is the "remainder" of p after p has matched the token t; it should satisfy the equivalence

s ∈ D t p ⇔ t :: s ∈ p.

By applying the derivative operator D to t1 and p, then to t2 and D t1 p, and so on for every element of the input string s, one can decide if s ∈ p is inhabited. The new recogniser constructed by D may not have the same nullability index as the original one, so D has the following type signature:

D : ∀ {n} (t : Tok) (p : P n) → P (D-nullable t p)

The function D-nullable decides whether the derivative accepts the empty string or not. Its extensional behaviour is uniquely constrained by the definition of D; its definition is included in Figure 1.

The derivative operator is implemented as follows. The combinators fail and empty never accept any token, so they both have the derivative fail:

D t fail  = fail
D t empty = fail

The combinator sat f has a non-zero derivative with respect to t iff f t is true:

D t (sat f) with f t
... | true  = empty
... | false = fail

(Here the with construct is used to pattern match on the result of f t.) The derivative of a choice is the choice of the derivatives of its arguments:

D t (p1 | p2) = D t p1 | D t p2

The derivatives of nonempty p and cast eq p are equal to the derivative of p:

D t (nonempty p) = D t p
D t (cast eq p)  = D t p

The final and most interesting case is sequencing:

D t (p1 · p2) with forced? p1 | forced? p2
... | true  | false = D t p1 · ]? ([ p2)
... | false | false = ] D t ([ p1) · ]? ([ p2)
... | true  | true  = D t p1 · ]? p2 | D t p2
... | false | true  = ] D t ([ p1) · ]? p2 | D t p2

Here we have four cases, depending on the indices of p1 and p2:

• In the first two cases the right argument is not forced, which implies (given the type of _·_) that the left argument is not nullable. This means that the first token accepted by p1 · p2 (if any) has to be accepted by p1, so the remainder after accepting this token is the remainder of p1 followed by p2.

• In the last two cases p1 is nullable, which means that the first token could also be accepted by p2. This is reflected in the presence of an extra choice D t p2 on the right-hand side.

In all four cases the operator ]? is used to conditionally delay p2, depending on the nullability index of the derivative of p1; the implicit argument b to ]? is inferred automatically. The derivative operator D is total: it is implemented using a lexicographic combination of guarded corecursion and structural recursion (as in Section 2). Note that in the first two sequencing cases p2 is delayed, but D is not applied recursively to [ p2 because p1 is known not to accept the empty string.
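The same algorithm can be written for plain, untyped recognisers in Haskell; the following sketch is ours and merely mirrors the roles of D, D-nullable, and ∈?. Note what is lost without the nullability-indexed types: on a cyclic, left recursive definition such as p = p :| p the functions below simply diverge, which is precisely the failure mode the Agda library rules out:

data R = RFail | REmpty | RSat (Char -> Bool)
       | R :| R   -- choice
       | R :. R   -- sequencing

nullable :: R -> Bool
nullable REmpty   = True
nullable (p :| q) = nullable p || nullable q
nullable (p :. q) = nullable p && nullable q
nullable _        = False

-- Brzozowski derivative: the remainder after matching one token.
deriv :: Char -> R -> R
deriv _ RFail    = RFail
deriv _ REmpty   = RFail
deriv t (RSat f) = if f t then REmpty else RFail
deriv t (p :| q) = deriv t p :| deriv t q
deriv t (p :. q)
  | nullable p   = (deriv t p :. q) :| deriv t q
  | otherwise    = deriv t p :. q

-- Analogue of the decision procedure: differentiate repeatedly,
-- then test nullability at the end of the input.
match :: R -> String -> Bool
match p []      = nullable p
match p (t : s) = match (deriv t p) s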
D-nullable : ∀ {n} → Tok → P n → Bool
D-nullable t fail         = false
D-nullable t empty        = false
D-nullable t (sat f)      = f t
D-nullable t (p1 | p2)    = D-nullable t p1 ∨ D-nullable t p2
D-nullable t (nonempty p) = D-nullable t p
D-nullable t (cast _ p)   = D-nullable t p
D-nullable t (p1 · p2) with forced? p1 | forced? p2
... | true  | false = D-nullable t p1
... | false | false = false
... | true  | true  = D-nullable t p1 ∨ D-nullable t p2
... | false | true  = D-nullable t p2

Figure 1. The index function D-nullable.

The index function D-nullable uses recursion on the inductive structure of the recogniser. Note that D-nullable does not force any delayed recogniser (it does not use [). Readers familiar with dependent types may find it interesting that this definition relies on the fact that _∧_ is defined by pattern matching on its right argument. If _∧_ were defined by pattern matching on its left argument, then the type checker would no longer reduce the open term D-nullable t ([ p1) ∧ false to false when checking the definition of D. This problem could be fixed by using an equality proof in the definition of D, though.

It is straightforward to show that the derivative operator D satisfies both directions of its specification:

D-sound    : ∀ {n s t} {p : P n} → s ∈ D t p → t :: s ∈ p
D-complete : ∀ {n s t} {p : P n} → t :: s ∈ p → s ∈ D t p

These statements can be proved by induction on the structure of the semantics. Once the derivative operator is defined and proved correct it is easy to decide if s ∈ p is inhabited:

∈? : ∀ {n} (s : List Tok) (p : P n) → Dec (s ∈ p)
[ ]    ∈? p = nullable? p
t :: s ∈? p with s ∈? D t p
... | yes s∈Dtp = yes (D-sound s∈Dtp)
... | no  s∉Dtp = no (s∉Dtp ◦ D-complete)

It is straightforward to show that ≈ is an equivalence relation, if the definition of "equivalence relation" is generalised to accept indexed sets. (Such generalisations are silently assumed in the remainder of this text.) It is also easy to show that ≈ is a congruence—i.e. that it is preserved by all the primitive recogniser combinators—and that ≤ is a partial order with respect to ≈. The following definition provides an alternative, coinductive characterisation of equality:

data _≈c_ {n1 n2} (p1 : P n1) (p2 : P n2) : Set where
  _::_ : n1 ≡ n2 → (∀ t → ∞ (D t p1 ≈c D t p2)) → p1 ≈c p2

Two recognisers are equal iff they agree on whether the empty string is accepted, and for every token the respective derivatives are equal (coinductively). Note that the values of this data type are infinite proofs5 witnessing the equivalence of the two parsers. Note also that this equality is a form of bisimilarity: the "transitions" are of the form

p −(n,t)→ D t p,  where p : P n.

It is easy to show that ≈ and ≈c are equivalent. When proving properties of recognisers one can choose the equality which is most convenient for the task at hand. For an example of a proof using a coinductively defined equality, see Section 4.4.

The type of the sequencing combinator is not quite right if we want to state properties such as associativity, so let us introduce the following variant of it:

_⊙_ : ∀ {n1 n2} → P n1 → P n2 → P (n1 ∧ n2)
_⊙_ {n1 = n1} p1 p2 = ]? p1 · ]? {b = n1} p2

(Agda does not manage to infer the value of the implicit argument b, but we can still give it manually.) Using the _⊙_ combinator it is easy to prove that the recognisers form an idempotent semiring:
p p p p fail fail
p 1 | p2 p1 | (p2 | p3 ) p1 (p2 p3 ) p1 (p2 | p3 ) (p1 | p2 ) p3
≈ ≈ ≈ ≈ ≈
p2 | p1 (p1 | p2 ) | p3 (p1 p2 ) p3 p1 p2 | p1 p3 p1 p3 | p2 p3
It is also easy to show that the order 6 coincides with the natural order of the join-semilattice formed by | : p1 6 p2
In the case of the empty string the nullability index tells us whether the string should be accepted or not, and otherwise ∈? is recursively applied to the derivative and the tail of the string; the specification of D ensures that this is correct. (Note that s∈Dtp and s/ ∈Dtp are normal variables with descriptive names.) As an aside, note that the proof returned by ∈? when a string matches is actually a parse tree, so it would not be entirely incorrect to call these recognisers parsers. However, in the case of ambiguous grammars at most one parse tree is returned. The implementation of parse in Section 4.2 returns all possible results. 3.4
≈ ≈ ≈ ≈ ≈ ≈
⇔
p1 | p2 ≈ p2
By using the generalised Kleene star F from Section 3.1 one can also show that the recognisers form a F-continuous Kleene algebra (Kozen 1990): p1 (p2 F) p3 is the least upper bound of the set { p1 (p2 ˆ i) p3 | i ∈ N }, where p ˆ i is the i-fold repetition of p: h ˆ i-nullable : Bool → N → Bool h n ˆ zero i-nullable = h n ˆ suc i i-nullable = ˆ : ∀ {n} → P n → (i : N) → P (h n ˆ i i-nullable) p ˆ zero = empty p ˆ suc i = p (p ˆ i)
Laws
Given the semantics above it is easy to prove that the combinators satisfy various laws. Let us first define that two recognisers are equivalent when they accept the same strings:
(Here zero and suc are the two constructors of N. Note that Agda can figure out the right-hand sides of h ˆ i-nullable automatically, given the definition of ˆ ; see Section 4.6.)
≈ : ∀ {n1 n2} → P n1 → P n2 → Set p1 ≈ p2 = p1 6 p2 × p2 6 p1 Here × is conjunction, and 6 encodes language inclusion:
3.5
Expressive strength
Is the language of recognisers defined above useful? It may not be entirely obvious that the restrictions imposed to ensure totality
6 : ∀ {n1 n2} → P n1 → P n2 → Set p1 6 p2 = ∀ {s} → s ∈ p1 → s ∈ p2
5 If Tok is non-empty.
290
Consider the monadic parser combinator bind, >>= : The parser p1 > >= p2 successfully returns a value y for a given string s if p1 parses a prefix of s, returning a value x, and p2 x parses the rest of s, returning y. Note that p1 > >= p2 accepts the empty string iff p1 accepts the empty string, returning a value x, and p2 x also accepts the empty string. This shows that the values which a parser can return without consuming any input can be relevant for determining if another parser is nullable. This suggests that, in analogy with the treatment of recognisers, a parser should be indexed by its “initial set”—the set of values which can be returned when the input is empty. However, sometimes it is useful to distinguish two grammars if the number of parse trees corresponding to a certain string differ. For instance, the parser backend defined in Section 4.2 returns twice as many results for the parser p | p as for the parser p. In order to take account of this distinction parsers are indexed by their return types and their “initial bags” (or multisets), represented as lists:
do not rule out the definition of many useful recognisers. Fortunately this is not the case, at least not if Tok, the set of tokens, is finite, because then it can be proved that every function of type List Tok → Bool which can be implemented in Agda can also be realised as a recogniser. For simplicity this will only be shown in the case when Tok is Bool. The basic idea is to turn a function f : List Bool → Bool into a grammar representing an infinite binary tree, with one node for every possible input string, and to make a given node accepting iff f returns true for the corresponding string. Let us first define a recogniser which only accepts the empty string, and only if its argument is true: accept-if-true : ∀ b → P b accept-if-true true = empty accept-if-true false = fail Using this recogniser we can construct the “infinite binary tree” using guarded corecursion:
mutual data Parser : (R : Set) → List R → Set1 where ...
grammar : (f : List Bool → Bool) → P (f [ ]) grammar f = cast (lemma f ) ( ]? (sat id) · ] grammar (f ◦ :: true ) | ]? (sat not) · ] grammar (f ◦ :: false) | accept-if-true (f [ ]))
(Set1 is a type of large types; Agda is predicative.) The first four combinators have relatively simple types. The return combinator is the parser analogue of empty. When accepting the empty string it returns its argument:
Note that sat id recognises true, and sat not recognises false. The following lemma is also used above:
return : ∀ {R} (x : R) → Parser R [ x ] (Note that [ ] is the return function of the list monad.) The fail parser, which mirrors the fail recogniser, always fails:
lemma : ∀ f → (false ∧ f [ true ] ∨ false ∧ f [ false ]) ∨ f [ ] ≡ f [ ]
fail : ∀ {R} → Parser R [ ]
The final step is to show that, for any string s, f s ≡ true iff s ∈ grammar f . The “only if” part can be proved by induction on the structure of s, and the “if” part by induction on the structure of s ∈ grammar f . Note that the infinite grammar above has a very simple structure: it is LL(1). I suspect that this grammar can be implemented using a number of different parser combinator libraries. As an aside it may be interesting to know that the proof above does not require the use of lemma. The following left recursive grammar can also be used:
(Note that [ ] is the zero of the list monad.) The token parser accepts any single token, and returns this token: token : Parser Tok [ ] This combinator is not as general as sat, but a derived combinator sat is easy to define using token and bind, see Section 4.6. The analogue of the choice recogniser is | : |
grammar : (f : List Bool → Bool) → P (f [ ]) grammar f = ] grammar (λ xs → f (xs + + [ true ])) · ]? (sat id) ] | grammar (λ xs → f (xs + + [ false ])) · ]? (sat not) | accept-if-true (f [ ])
The initial bag of a choice is the union of the initial bags of its two arguments. The bind combinator’s type is more complicated than the types above. Consider p1 > >= p2 again. Here p2 is a function, and we have a function f : R1 → List R2 which computes the initial bag of p2 x, depending on the value of x. When should we allow p1 to be coinductive? One option is to only allow this when f x is empty for every x, but I do not want to require the user of the library to prove such a property just to define a parser. Instead I have chosen to represent the function f with an optional function f : Maybe (R1 → List R2 ),6 where nothing represents λ → [ ], and to make p1 coinductive iff f is nothing. The same approach is used for xs, the initial bag of p1 :
This shows that nonempty and cast are not necessary to achieve full expressive strength, because neither grammar nor the backend rely on these operators. Finally let us consider the case of infinite token sets. If the set of tokens is the natural numbers, then it is quite easy to see that it is impossible to implement a recogniser for the language { nn | n ∈ N }. By generalising the statement to “it is impossible that p accepts infinitely many identical pairs, and only identical pairs and/or the empty string” (where an identical pair is a string of the form nn) one can prove this formally by induction on the structure of p (see the accompanying code). Note that this restriction does not apply to the monadic combinators introduced in the next section, which have maximal expressive strength also for infinite token sets.
4.
: ∀ {R xs1 xs2} → Parser R xs1 → Parser R xs2 → Parser R (xs1 + + xs2 )
> >= : ∀ {R1 R2} {xs : Maybe (List R1 )} {f : Maybe (R1 → List R2 )} → ∞h f iParser R1 (flatten xs) → ((x : R1 ) → ∞h xs iParser R2 (apply f x)) → Parser R2 (bind xs f ) The helper functions flatten, apply and bind, which interpret nothing as the empty list or the constant function returning the
Parsers
6 The type Maybe A has the two constructors nothing : Maybe A and just : A → Maybe A.
This section describes how the recogniser language above can be extended to actual parser combinators, which return results.
291
flatten : {A : Set} → Maybe (List A) → List A flatten nothing = [ ] flatten (just xs) = xs
data ∈ ·
: ∀ {R xs} → R → Parser R xs → List Tok → Set1 where return : x ∈ return x · [ ] token : t ∈ token · [ t ] |-left : x ∈ p1 · s → x ∈ p1 | p2 · s |-right : x ∈ p2 · s → x ∈ p1 | p2 · s > >= : x ∈ [? p1 · s1 → y ∈ [? (p2 x) · s2 → y ∈ p1 > >= p2 · s1 + + s2 nonempty : x ∈ p · t :: s → x ∈ nonempty p · t :: s cast : x ∈ p · s → x ∈ cast eq p · s
apply : {A B : Set} → Maybe (A → List B) → A → List B apply nothing x = [ ] apply (just f ) x = f x bind : {A B : Set} → Maybe (List A) → Maybe (A → List B) → List B bind xs nothing = [ ] bind xs (just f ) = bindL (flatten xs) f Figure 2. Helper functions used in the type signature of > >= . Note that there is a reason for not defining bind using the equation bind xs f = bindL (flatten xs) (apply f ); see Section 4.6.
Figure 3. The semantics of the parser combinators. To avoid clutter the declarations of bound variables are omitted in the constructors’ type signatures.
empty list, are defined in Figure 2; bind is defined in terms of bindL , the standard list monad’s bind operation. The function ∞h iParser is defined as follows, mutually with Parser:
Here A ⇔ B means that A and B are equivalent: there is a function of type A → B and another function of type B → A. We immediately get that language equivalence is an equivalence relation. As mentioned above language equivalence is sometimes too weak. We may want to distinguish between grammars which define the same language, if they do not agree on the number of ways in which a given value can be produced from a given string. To make the example given above more concrete, the parser backend defined in Section 4.2 returns one result when the empty string is parsed using return true (parse tree: return), and two results when return true | return true is used (parse trees: |-left return and |-right return). Based on this observation two parsers are defined to be parser equivalent ( ∼ = ) if, for all values and strings, the respective sets of parse trees have the same cardinality: ∼ = : ∀ {R xs1 xs2} → Parser R xs1 → Parser R xs2 → Set1 p1 ∼ = p2 = ∀ {x s} → x ∈ p1 · s ↔ x ∈ p2 · s
∞h iParser : {A : Set} → Maybe A → (R : Set) → List R → Set1 ∞h nothing iParser R xs = ∞ (Parser R xs) ∞h just iParser R xs = Parser R xs (∞ works also for Set1 .) It is straightforward to define a variant of [? for this type. It is not necessary to define ]? , though: instead of conditionally delaying one can just avoid using nothing. Just as in Section 3 two additional constructors are included in the definition of Parser: nonempty : ∀ {R xs} → Parser R xs → Parser R [ ] cast : ∀ {R xs1 xs2} → xs1 ≈bag xs2 → Parser R xs1 → Parser R xs2 Here ≈bag stands for bag equality between lists, equality up to permutation of elements; the cast combinator ensures that one can replace one representation of a parser’s initial bag with another. Bag equality is defined in two steps. First list membership is encoded inductively as follows:
From its definition we immediately get that parser equivalence is an equivalence relation. Parser equivalence is strictly stronger than language equivalence: the former distinguishes between return true and return true | return true, while the latter is idempotent. Just as in Section 3.2 the initial bag index is correct:
data ∈ {A : Set} : A → List A → Set where here : ∀ {x xs} → x ∈ x :: xs there : ∀ {x y xs} → y ∈ xs → y ∈ x :: xs
index-correct : ∀ {R xs x} {p : Parser R xs} → x ∈ p · [ ] ↔ x ∈ xs Note the use of ↔ : the number of parse trees for x matches the number of occurrences of x in the list xs. One direction of the inverse can be defined by recursion on the structure of the semantics, and the other by recursion on the structure of ∈ . From index-correct we easily get that parsers which are parser equivalent have equal initial bags:
Two lists xs and ys are then deemed “bag equal” if, for every value x, x is a member of xs as often as it is a member of ys: ≈bag : ∀ {R} → List R → List R → Set xs ≈bag ys = ∀ {x} → x ∈ xs ↔ x ∈ ys Here A ↔ B means that there is an invertible function from A to B, so A and B must have the same cardinality. 4.1
same-bag : ∀ {R xs1 xs2} {p1 : Parser R xs1} {p2 : Parser R xs2} → p1 ∼ = p2 → xs1 ≈bag xs2
Semantics
The semantics of the parser combinators is defined as a relation ∈ · , such that x ∈ p · s is inhabited iff x is one of the results of parsing the string s using the parser p. This relation is defined in Figure 3. Note that values of type x ∈ p · s can be seen as parse trees. The parsers come with two kinds of equivalence. The weaker one, language equivalence ( ≈ ), is a direct analogue of the equivalence used for recognisers in Section 3.4:
Similarly, language equivalent parsers have equal initial sets. 4.2
Backend
Following Section 3.3 it is easy to implement a derivative operator for parsers: D : ∀ {R xs} (t : Tok) (p : Parser R xs) → Parser R (D-bag t p)
: ∀ {R xs1 xs2} → Parser R xs1 → Parser R xs2 → Set1 p1 ≈ p2 = ∀ {x s} → x ∈ p1 · s ⇔ x ∈ p2 · s ≈
The implementation of the function D-bag which computes the derivative’s initial bag can be seen in Figure 4. Both D and D-bag use analogues of the forced? function from Section 3:
292
D-correct : ∀ {R xs x s t} {p : Parser R xs} → x ∈ D t p · s ↔ x ∈ p · t :: s
D-bag : ∀ {R xs} → Tok → Parser R xs → List R D-bag t (return x) = [] D-bag t fail = [] D-bag t token = [t] D-bag t (p1 | p2 ) = D-bag t p1 + + D-bag t p2 D-bag t (nonempty p) = D-bag t p D-bag t (cast eq p) = D-bag t p D-bag t (p1 > >= p2 ) with forced? p1 | forced?0 p2 . . . | just f | nothing = bindL (D-bag t p1 ) f . . . | just f | just xs = bindL (D-bag t p1 ) f + + bindL xs (λ x → D-bag t (p2 x)) . . . | nothing | nothing = [ ] . . . | nothing | just xs = bindL xs (λ x → D-bag t (p2 x))
Both directions of the inverse can be defined by recursion on the structure of the semantics, with the help of index-correct. Given the derivative operator it is easy to define the parser backend: parse : ∀ {R xs} → Parser R xs → List Tok → List R parse {xs = xs} p [ ] = xs parse p (t :: s) = parse (D t p) s The correctness of this implementation follows easily from indexcorrect and D-correct: parse-correct : ∀ {R xs x s} {p : Parser R xs} → x ∈ p · s ↔ x ∈ parse p s Both directions of the inverse can be defined by recursion on the structure of the input string. Note that this proof establishes that a parser can only return a finite number of results for a given input string (because the list returned by parse is finite)—infinitely ambiguous grammars cannot be represented in this framework. As mentioned in Section 4.1 we have
Figure 4. The index function D-bag. Note that its implementation falls out almost automatically from the definition of D. forced? : ∀ {A R xs m} → ∞h m iParser R xs → Maybe A forced? {m = m} = m forced?0 : ∀ {A R1 R2 : Set} {m} {f : R1 → List R2} → ((x : R1 ) → ∞h m iParser R2 (f x)) → Maybe A forced?0 {m = m} = m
parse (return true | return true) [ ] ≡ true :: true :: [ ] . It might seem reasonable for parse to remove duplicates from the list of results. However, the result type is not guaranteed to come with decidable equality (consider functions, for instance), so such filtering is left to the user of parse. The code above is not optimised, and mainly serves to illustrate that it is possible to implement a Parser backend which guarantees termination. It is not too hard to see that, in the worst case, parse is at least exponential in the size of the input string. Consider the following parser:
The non-recursive cases of D, along with choice, nonempty and cast, are easy: D t (return x) D t fail D t token D t (p1 | p2 ) D t (nonempty p) D t (cast eq p)
= = = = = =
fail fail return t D t p1 | D t p2 Dtp Dtp
p : Parser Bool [ ] p = fail > >= λ (b : Bool) → fail
The last case, > >= , is more interesting. It makes use of the combinator return?, which can return any element of its argument list:
The derivative D t p is p | p, for any token t. After taking n derivatives we get a parser with 2n − 1 choices, and all these choices have to be traversed to compute the parser’s initial bag. The parser p may seem contrived, but similar parsers can easily arise as the result of taking the derivative of more useful parsers. It may be possible to implement more efficient backends. For instance, one can make use of algebraic laws like fail > >= p ∼ = fail (see Section 4.4) to simplify parsers, and perhaps avoid the kind of behaviour described above, at least for certain classes of parsers. Exploring such optimisations is left for future work, though.
return? : ∀ {R} (xs : List R) → Parser R xs return? [ ] = fail return? (x :: xs) = return x | return? xs The code is very similar to the code for sequencing in Section 3.3: D t (p1 > >= p2 ) with forced? p1 | forced?0 p2 . . . | just f | nothing = D t p1 > >= (λ x → [ (p2 x)) . . . | nothing | nothing = ] D t ([ p1 ) > >= (λ x → [ (p2 x)) . . . | just f | just xs = D t p1 > >= (λ x → p2 x) | return? xs > >= (λ x → D t (p2 x)) . . . | nothing | just xs = ] D t ([ p1 ) > >= (λ x → p2 x) | return? xs > >= (λ x → D t (p2 x))
4.3
Coinductive equivalences
In Section 3.4 a coinductive characterisation of recogniser equivalence is given. This is possible also for parser equivalence: data ∼ =c {R xs1 xs2} (p1 : Parser R xs1 ) (p2 : Parser R xs2 ) : Set where :: : xs1 ≈bag xs2 → (∀ t → ∞ (D t p1 ∼ =c D t p2 )) → p1 ∼ =c p2
There are two main differences. One is the absence of ]? . The other difference can be seen in the last two cases, where p1 is potentially nullable (it is if xs is nonempty). The corresponding right-hand sides are implemented as choices, as before. However, the right choices are a bit more involved than in Section 3.3. They correspond to the cases where p1 succeeds without consuming any input, returning one of the elements of its initial bag xs. In this case the elements of the initial bag index of p1 are returned using return?, and then combined with p2 using bind. The implementation of D-bag is structurally recursive, while the implementation of D uses a lexicographic combination of guarded corecursion and structural recursion, just as in Section 3.3. It is straightforward to prove the following correctness property:
Two parsers are equivalent if their initial bags are equal, and, for every token t, the respective derivatives with respect to t are equivalent (coinductively). Using index-correct and D-correct it is easy to show that the two definitions of parser equivalence, ∼ = and ∼ =c , are equivalent. By replacing the use of ↔ in the definition of bag equality with ⇔ we get set equality instead. If, in turn, the use of bag equality is replaced by set equality in ∼ =c , then we get a coinductive characterisation of language equivalence ( ≈ ).
293
> >=-left-identity : {R1 R2 : Set} {f : R1 → List R2} (x : R1 ) (p : (x : R1 ) → Parser R2 (f x)) → return x > >= p ∼ =c p x > >=-left-identity {f = f} x p = bindL -left-identity x f :: λ t → ] ( D t (return x > >= p) fail > >= p | return? [ x ] > >= (λ x → D t (p x)) fail | return x > >= (λ x → D t (p x)) return x > >= (λ x → D t (p x)) D t (p x)
By using the coinductive characterisations of equivalence I have proved that all primitive parser combinators preserve both language and parser equivalence, i.e. the equivalences are congruences. 4.4
Laws
Let us now discuss the equational theory of the parser combinators. Many of the laws from Section 3.4 can be generalised to the setting of parser combinators. To start with we have a commutative monoid formed by fail and | : p1 | p2 fail | p (p1 | p2 ) | p3
∼ = p2 | p1 ∼ = p ∼ = p1 | (p2 | p3 )
(To avoid clutter the proof above uses the equational reasoning notation . . . ∼ =c . . . , and the sub-proofs for the =c . . . ∼ individual steps have been omitted.) The proof has two parts. First bindL -left-identity is used to show that the initial bags of return x > >= p and p x are equal, and then it is shown, for every token t, that D t (return x > >= p) and D t (p x) are equivalent. The first step of the latter part uses a law relating D and > >= , the second step uses the left zero law (fail > >= p ∼ =c fail) and the right identity law for choice (p | fail ∼ =c p), the third step uses the left identity law for choice (fail | p ∼ =c p), and the last step uses the coinductive hypothesis. The proof as written above would not be accepted by Agda, because the coinductive hypothesis is not guarded by constructors (due to the uses of transitivity implicit in the equational reasoning notation). However, this issue can be addressed (Danielsson 2010). For details of how all the properties above have been proved, see the code accompanying the paper.
If language equivalence is used this monoid is also idempotent: p|p ≈ p We also have a monad, with fail as a left and right zero of bind, and bind distributing from the left and right over choice: return x > >= p p > >= return p1 > >= (λ x → p2 x > >= p3 ) fail > >= p p > >= (λ → fail) p1 > >= (λ x → p2 x | p3 x) (p1 | p2 ) > >= p3
∼ = ∼ = ∼ = ∼ = ∼ = ∼ = ∼ =
∼ =c ∼c = ∼ =c ∼ =c )
px p (p1 > >= p2 ) > >= p3 fail fail p1 > >= p2 | p1 > >= p3 p1 > >= p3 | p2 > >= p3
Unlike in Section 3.4 there is no need to define a special variant of > >= to state the laws above: if the types of the argument parsers are given (as for > >=-left-identity below), then Agda automatically infers that bind’s implicit arguments xs and f should have the form just something. Analogues of most of the laws from Section 3.4 are listed above. However, assuming that the token type is inhabited, it is not possible to find a function
4.5
Expressive strength
This subsection is concerned with the parser combinators’ expressiveness. By using bind one can strengthen the result from Section 3.5 to arbitrary sets of tokens: every function of type List Tok → List R can be realised as a parser (if bag equality is used for the lists of results). The grammar is similar to the construction in Section 3.5:
f : ∀ {R xs} → Parser R xs → List (List R)
grammar : ∀ {R} (f : List Tok → List R) → Parser R (f [ ]) grammar f = token > >= (λ t → ] grammar (f ◦ :: t)) | return? (f [ ])
and a Kleene-star-like combinator F : ∀ {R xs} (p : Parser R xs) → Parser (List R) (f p)
The function grammar satisfies the following correctness property:
such that
grammar-correct : ∀ {R x s} (f : List Tok → List R) → x ∈ grammar f · s ↔ x ∈ f s
return [ ] | (p > >= λ x → p F > >= λ xs → return (x :: xs)) 6 pF
One direction of the inverse can be defined by induction on the structure of the semantics, and the other by induction on the structure of the input string. If we combine this result with parse-correct we get the expressiveness result:
holds for all p. (Here 6 is defined as in Section 3.4.) The reason is that p may be nullable, in which case the inequality above implies that xs ∈ p F · [ ] must be satisfied for infinitely many lists xs, whereas parse-correct shows that a parser can only return a finite number of results. (A combinator F satisfying the inequality above can easily be implemented if it is restricted to non-nullable argument parsers.) Before leaving the subject of equational laws, let me take a moment to explain how one of the laws above—the left identity law for bind—can be proved. Assume that we have already proved some of the other laws, along with the following property of bindL :
maximally-expressive : ∀ {R} (f : List Tok → List R) {s} → parse (grammar f ) s ≈bag f s Assume for a moment that the primitive parser combinators included sat and applicative functor application (McBride and Paterson 2008) instead of token and bind. Then, for finite sets of tokens, we could have defined grammar roughly as in Section 3.5. This means that, for finite sets of tokens, the inclusion of the monadic bind combinator does not provide any expressive advantage; the applicative functor interface is already sufficiently expressive. This comparison does not take efficiency into account, though.
bindL -left-identity : {A B : Set} (x : A) (f : A → List B) → bindL [ x ] f ≈bag f x
4.6
I have found the coinductive characterisations of the equivalences to be convenient to work with, so I have proved the law roughly as follows:
Examples
Finally let us consider some examples, along with some practical remarks.
294
Let us start with the left recursive grammar in the introduction. Note that it does not require any user annotations, except for the three uses of ] . Agda infers all the type signatures and all the implicit arguments, including several functions, automatically. Agda’s inference mechanism is based on unification (a variant of pattern unification (Pfenning 1991)), and an omitted piece of code is only “filled in” if it can be uniquely determined from the constraints provided by the rest of the code. In general there is no guarantee that implicit arguments can be omitted, and it is not uncommon that the exact form of a definition affects how much can be inferred. Consider the definition of bind in Figure 2. It is set up so that bind xs nothing evaluates to the empty list, even if xs is a neutral term. If bind had instead been defined by the equation
good framework for understanding lazy programs. To take one example, Claessen (2004) defines the following parser data type using Haskell: data P0 s a = SymbolBind (s → P0 s a) | Fail | ReturnPlus a (P0 s a) He notes that it is isomorphic to the stream processor type used in Fudgets (Carlsson and Hallgren 1998), and that this isomorphism “inspired the view of the parser combinators being parsing process combinators”. However, in a total setting I would define these two types differently. The stream processors were defined in Section 2, with an inductive get constructor and a coinductive put constructor. I find it natural to define P0 in the opposite way:
bind xs f = bindL (flatten xs) (apply f ), then the example in the introduction would have required manual annotations: the example gives rise to the constraint xs = bind (just xs) nothing, which with the alternative definition of bind reduces to xs = bindL xs (λ → [ ]), and Agda cannot solve this unification problem. As an example of a definition for which the initial bag is not inferred automatically, consider the following definition of sat:
data P0 (S A : symbolBind fail returnPlus
The reason for the difference is that the types are used differently. Stream processors are interpreted using J K, and parsers using parse0 , which works with finite lists:
sat : ∀ {R} → (Tok → Maybe R) → Parser R sat {R = R} p = token > >= λ t → ok (p t) where ok-bag : Maybe R → List R ok-bag nothing = ok-bag (just x) = ok : (x : Maybe R) → Parser R (ok-bag x) ok nothing = fail ok (just x) = return x
parse0 : ∀ {S A} → P0 S A → List S → List (A × List S) parse0 (symbolBind f ) (c :: s) = parse0 ([ (f c)) s parse0 (returnPlus x p) s = (x, s) :: parse0 p s 0 = [] parse The definition of J K in Section 2 would not be total if get were coinductive, because then we could not guarantee that the resulting colist would be productive. On the other hand, if returnPlus were coinductive and symbolBind inductive, then parsers like the one used in the proof of maximal expressiveness in Section 4.5 could not be implemented (consider the case when the argument to grammar is λ → [ ]). The use of lazy data types and general recursion in Haskell is very flexible—for instance, Carlsson and Hallgren (1998) use their stream processors in ways which would not be accepted if the type SP were used in Agda—but I find it easier to understand how and why programs work when induction and coinduction are separated as in this paper. The use of mixed induction and coinduction has been known for a long time (Park 1980), but does not seem to be well-known among functional programmers. It is my hope that this paper provides a compelling example of the use of this technique.
The parser sat p matches a single token t iff p t evaluates to just x, for some x; the value returned is x. The initial bag function ok-bag is not inferred by Agda. However, the right-hand sides of ok-bag, and the initial bag of sat, are inferred. The example in the introduction uses the derived combinators tok and number. The parser tok, which accepts a given token, is easy to define using sat (assuming that equality of tokens can be decided using the function == ): tok : Tok → Parser Tok tok t = sat (λ t0 → if t == t0 then just t0 else nothing) Given a parser for digits (which is easy to define using sat) the parser number, which accepts an arbitrary non-negative number, can also be defined:
Acknowledgements I would like to thank Ulf Norell for previous joint work on total parsing, and for improving Agda’s unification mechanism. Another person who deserves thanks is Thorsten Altenkirch, with whom I have had many discussions about mixed induction and coinduction. Thorsten also suggested that I should allow left recursive parsers, which I might otherwise not have tried, and gave feedback which improved the presentation; such feedback was also given by several anonymous reviewers. Finally I would like to acknowledge financial support from EPSRC and the Royal Swedish Academy of Sciences’ funds (EPSRC grant code: EP/E04350X/1).
number : Parser N number = digit + > >= return ◦ foldl (λ n d → 10 ∗ n + d) 0 Here foldl is a left fold for lists, and p + parses one or more ps (as in Section 3.1). The examples above are quite small; larger examples can also be constructed. For instance, Danielsson and Norell (2009) construct mixfix operator parsers using a parser combinator library which is based on some of the ideas described here.
5.
Set) : Set where : (S → ∞ (P0 S A)) → P0 S A : P0 S A : A → P0 S A → P0 S A
Conclusions
References
A parser combinator library which handles left recursion and guarantees termination of parsing has been presented, and it has been established that the library is sufficiently expressive: every finitely ambiguous parser on finite input strings which can be implemented using the host language can also be realised using the combinators. I believe that the precise treatment of induction and coinduction which underlies the definition of the parser combinators gives a
The Agda Team. The Agda Wiki. Available at http://wiki.portal. chalmers.se/agda/, 2010. Thorsten Altenkirch and Nils Anders Danielsson. Termination checking in the presence of nested inductive and coinductive types. Note supporting presentation given at the Workshop on Partiality and Recursion in Interactive Theorem Provers, Edinburgh, UK, 2010.
295
Arthur Baars, S. Doaitse Swierstra, and Marcos Viera. Typed transformations of typed grammars: The left corner transform. In Preliminary Proceedings of the Ninth Workshop on Language Descriptions Tools and Applications, LDTA 2009, pages 18–33, 2009.
Pieter Koopman and Rinus Plasmeijer. Efficient combinator parsers. In IFL’98: Implementation of Functional Languages, volume 1595 of LNCS, pages 120–136, 1999. Adam Koprowski and Henri Binsztok. TRX: A formally verified parser interpreter. In Programming Languages and Systems, 19th European Symposium on Programming, ESOP 2010, volume 6012 of LNCS, pages 345–365, 2010.
Marcello Bonsangue, Jan Rutten, and Alexandra Silva. A Kleene theorem for polynomial coalgebras. In Foundations of Software Science and Computational Structures, 12th International Conference, FOSSACS 2009, volume 5504 of LNCS, pages 122–136, 2009.
Dexter Kozen. On Kleene algebras and closed semirings. In Mathematical Foundations of Computer Science 1990, volume 452 of LNCS, pages 26–47, 1990.
Kasper Brink, Stefan Holdermans, and Andres L¨oh. Dependently typed grammars. In Mathematics of Program Construction, Tenth International Conference, MPC 2010, volume 6120 of LNCS, pages 58–79, 2010.
Daan Leijen and Erik Meijer. Parsec: Direct style monadic parser combinators for the real world. Technical Report UU-CS-2001-35, Department of Information and Computing Sciences, Utrecht University, 2001.
Janusz A. Brzozowski. Derivatives of regular expressions. Journal of the ACM, 11(4):481–494, 1964.
Paul Lickman. Parsing with fixed points. Master’s thesis, University of Cambridge, 1995.
William H. Burge. Recursive Programming Techniques. Addison-Wesley, 1975.
Peter Ljungl¨of. Pure functional parsing; an advanced tutorial. Licentiate thesis, Department of Computing Science, Chalmers University of Technology and G¨oteborg University, 2002.
Magnus Carlsson and Thomas Hallgren. Fudgets – Purely Functional Processes with applications to Graphical User Interfaces. PhD thesis, Chalmers University of Technology and G¨oteborg University, 1998.
Antoni W. Mazurkiewicz. A note on enumerable grammars. Information and Control, 14(6):555–558, 1969. Conor McBride and James McKinna. Seeing and doing. Presentation (given by McBride) at the Workshop on Termination and Type Theory, Hind˚as, Sweden, 2002. Conor McBride and Ross Paterson. Applicative programming with effects. Journal of Functional Programming, 18:1–13, 2008. Erik Meijer. Calculating Compilers. PhD thesis, Nijmegen University, 1992. Paul Francis Mendler. Inductive Definition in Type Theory. PhD thesis, Cornell University, 1988. Muad`Dib. Strongly specified parser combinators. Post to the Muad`Dib blog, 2009. Ulf Norell. Towards a practical programming language based on dependent type theory. PhD thesis, Chalmers University of Technology and G¨oteborg University, 2007. David Park. On the semantics of fair parallelism. In Abstract Software Specifications, volume 86 of LNCS, pages 504–526, 1980. Frank Pfenning. Unification and anti-unification in the calculus of constructions. In Proceedings of the Sixth Annual IEEE Symposium on Logic in Computer Science, pages 74–85, 1991. J.J.M.M. Rutten. Automata and coinduction (an exercise in coalgebra). In CONCUR’98, Concurrency Theory, 9th International Conference, volume 1466 of LNCS, pages 547–554, 1998. Niklas R¨ojemo. Garbage collection, and memory efficiency, in lazy functional languages. PhD thesis, Chalmers University of Technology and University of G¨oteborg, 1995. Marvin Solomon. Theoretical Issues in the Implementation of Programming Languages. PhD thesis, Cornell University, 1977. S. Doaitse Swierstra and Luc Duponcheel. Deterministic, error-correcting combinator parsers. In Advanced Functional Programming, volume 1129 of LNCS, pages 184–207, 1996.
Koen Claessen. Embedded Languages for Describing and Verifying Hardware. PhD thesis, Chalmers University of Technology, 2001. Koen Claessen. Parallel parsing processes. Journal of Functional Programming, 14:741–757, 2004. Koen Claessen and David Sands. Observable sharing for functional circuit description. In Advances in Computing Science — ASIAN’99, volume 1742 of LNCS, pages 62–73, 1999. Thierry Coquand. Infinite objects in type theory. In Types for Proofs and Programs, International Workshop TYPES ’93, volume 806 of LNCS, pages 62–78, 1994. Nils Anders Danielsson. Beating the productivity checker using embedded languages. In Workshop on Partiality and Recursion in Interactive Theorem Provers, Edinburgh, UK, 2010. Nils Anders Danielsson and Thorsten Altenkirch. Subtyping, declaratively: An exercise in mixed induction and coinduction. In Mathematics of Program Construction, Tenth International Conference, MPC 2010, volume 6120 of LNCS, pages 100–118, 2010. Nils Anders Danielsson and Ulf Norell. Structurally recursive descent parsing. Unpublished note, 2008. Nils Anders Danielsson and Ulf Norell. Parsing mixfix operators. To appear in the proceedings of the 20th International Symposium on the Implementation and Application of Functional Languages (IFL 2008), 2009. Jon Fairbairn. Making form follow function: An exercise in functional programming style. Software: Practice and Experience, 17(6):379–386, 1987. Jeroen Fokker. Functional parsers. In Advanced Functional Programming, volume 925 of LNCS, pages 1–23, 1995. Richard A. Frost, Rahmatullah Hafiz, and Paul Callaghan. Parser combinators for ambiguous left-recursive grammars. In PADL 2008: Practical Aspects of Declarative Languages, volume 4902 of LNCS, pages 167– 181, 2008. Tatsuya Hagino. A Categorical Programming Language. University of Edinburgh, 1987.
Wouter Swierstra. A Hoare logic for the state monad. In Theorem Proving in Higher Order Logics, 22nd International Conference, TPHOLs 2009, volume 5674 of LNCS, pages 440–451, 2009.
PhD thesis,
Peter Hancock, Dirk Pattinson, and Neil Ghani. Representations of stream processors using nested fixed points. Logical Methods in Computer Science, 5(3:9), 2009.
Philip Wadler. How to replace failure by a list of successes; a method for exception handling, backtracking, and pattern matching in lazy functional languages. In Functional Programming Languages and Computer Architecture, volume 201 of LNCS, pages 113–128, 1985. Philip Wadler, Walid Taha, and David MacQueen. How to add laziness to a strict language, without even being odd. In Proceedings of the 1998 ACM SIGPLAN Workshop on ML, 1998. Malcolm Wallace. Partial parsing: Combining choice with commitment. In IFL 2007: Implementation and Application of Functional Languages, volume 5083 of LNCS, pages 93–110, 2008.
R. John M. Hughes and S. Doaitse Swierstra. Polish parsers, step by step. In ICFP ’03: Proceedings of the eighth ACM SIGPLAN international conference on Functional programming, pages 239–248, 2003. Graham Hutton. Higher-order functions for parsing. Journal of Functional Programming, 2:323–343, 1992. Mark Johnson. Memoization in top-down parsing. Computational Linguistics, 21(3):405–417, 1995. Oleg Kiselyov. Parsec-like parser combinator that handles left recursion? Message to the Haskell-Cafe mailing list, December 2009.
296
Scrapping your Inefficient Engine: Using Partial Evaluation to Improve Domain-Specific Language Implementation Edwin C. Brady
Kevin Hammond
School of Computer Science, University of St Andrews, St Andrews, Scotland. Email: eb,[email protected]
Abstract
ness of the interpreter. However, the resulting implementation is unlikely to be efficient. In contrast, code generation gives an efficient implementation, but it is harder to verify its correctness, harder to add new features to the EDSL, and can be harder to exploit features of the host language. In this paper, we consider how partial evaluation can be used to achieve an EDSL implementation that is simple to write, straightforward to verify, and efficient. While the theoretical benefits of partial evaluation have been extensively covered in the literature e.g. [8, 16, 20, 37], there are very few practical examples (the work of Seefried et el. on Pantheon [34] is a notable exception). This is because it can be difficult to use partial evaluation effectively — several issues must be dealt with, including binding-time improvements, function calls, recursion, code duplication, and management of side-effects. Since these issues must usually be dealt with by the applications programmer, they limit the practical benefits of partial evaluation and so limit its widespread adoption. We argue here that partial evaluation can be a highly effective technique for implementing efficient EDSLs, allowing us to specialise the EDSL interpreter with respect to the EDSL source program. We also argue that this approach allows us to reason easily about the correctness of our implementation.
Partial evaluation aims to improve the efficiency of a program by specialising it with respect to some known inputs. In this paper, we show that partial evaluation can be an effective and, unusually, easy to use technique for the efficient implementation of embedded domain-specific languages. We achieve this by exploiting dependent types and by following some simple rules in the definition of the interpreter for the domain-specific language. We present experimental evidence that partial evaluation of programs in domain-specific languages can yield efficient residual programs whose performance is competitive with their Java and C equivalents and which are also, through the use of dependent types, verifiably resource-safe. Using our technique, it follows that a verifiably correct and resource-safe program can also be an efficient program. Categories and Subject Descriptors D.3.2 [Programming Languages]: Language Classifications—Applicative (functional) Languages; D.3.4 [Programming Languages]: Processors—Compilers General Terms
Languages, Verification, Performance
Keywords Dependent Types, Partial Evaluation
1.
Introduction
1.1
This paper reconsiders the use of partial evaluation [20] for implementing embedded domain-specific languages (EDSLs) [19]. Partial evaluation is a well-known technique that aims to improve the efficiency of a program by automatically specialising it with respect to some known inputs. Embedded domain-specific languages embed specialist languages for some problem domain in a generalpurpose host language. By reusing features from the host language, EDSLs can be implemented much more rapidly than their standalone equivalents, and can take advantage of compiler optimisations and other implementation effort in the host language. A common approach to EDSL implementation in functional languages such as Haskell is to design an abstract syntax tree (AST) which captures the required operations and properties of the domain [14, 24], and then either to implement an interpreter on this AST or to use a code generator targeting e.g. C or LLVM [23]. Directly interpreting a syntax tree is a simple, lightweight approach, and it is relatively straightforward to verify the functional correct-
Contributions
Our main contribution is a new study of the practical limits of partial evaluation as a technique for realistic EDSL implementation. It is folklore that we have a choice between writing verifiably correct, but inefficient, code and more efficient, but potentially incorrect, code. Sadly, at this point in time, pragmatic software developers will generally make the latter choice. In this paper, we make the following key claim and support it with experimental evidence: There is no need for correctness to be at the expense of efficiency. We make the following specific contributions: • we give experimental evidence that by partially evaluating an
interpreter for the EDSL, the EDSL implementation compares favourably with Java, and is not significantly worse than C; • we give concrete rules for defining an interpreter for an EDSL
so as to gain the maximum benefit from partial evaluation; • we describe the implementation of realistic, state-aware EDSLs
using dependent types. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICFP’10, September 27–29, 2010, Baltimore, Maryland, USA. c 2010 ACM 978-1-60558-794-3/10/09. . . $10.00. Copyright
Although the techniques we describe apply equally to other dependently typed languages, we will use I DRIS1 as a host language. I DRIS is a pure functional programming language with fullspectrum dependent types. Throughout this paper, we will identify 1 http://www.idris-lang.org/
297
2.1
the features of I DRIS that assist both with EDSL implementation and with partial evaluation, and will also identify how the presence of dependent types affects standard partial evaluation techniques. In particular, in a language with full dependent types, there is a different view of the phase distinction between compile-time and run-time. Traditionally, this distinction is observed syntactically between types (which are easily erasable) and values. With dependent types, the distinction is semantic, between compile-time values (which are erasable by analysing data types [6]) and runtime values. This affects several aspects of the implementation, which we will identify here. Partial evaluation has been known for many years, and Futamura’s paper [16] on interpreter specialisation is now 39 years old. However, the technique is still not widely applied, because several problems arise when putting it into practice. We have found that these problems are either easily handled or simply do not arise when embedding DSLs in I DRIS, largely due to its type system. 1.2
The type theory underlying I DRIS is implemented by the I VOR theorem proving library. I DRIS provides a front-end (syntactic sugar), a back-end (a compiler), some primitive types and an interface to the I VOR theorem proving tools. I DRIS programs consist of functions over inductive data types, defined by dependent patternmatching. Terms, t, are defined as follows: t
Research Motivation: Resource-Aware Programming
Set x \x : t => t (x : t) -> t tt c D
Type of types Variables Abstraction Function space Function application Data constructor Inductive family
f : (x1 : T1 ) -> (x2 : T2 x1 ) -> . . . -> (xn : Tn x1 x2 . . . xn−1 ) -> T x1 x2 . . . xn f p1,1 p2,1 . . . pn,1 = t1 f p1,2 p2,2 . . . pn,2 = t2 ... f p1,m p2,m . . . pn,m = tm The type of each argument may be predicated on the values of previous arguments, and the return type may be predicated on the values of any of the arguments. A pattern, pi,j , may be an arbitrary term, but can only be matched if it is either a variable or is in head normal form. 2.2
Type Checking Dependent Types
Since full dependent types may include values, which may need to be reduced to normal form, type checking exploits the following rules3 , where Set represents the type of types: Γ ` A, B : Set Γ ` A ; β A0 Γ ` B ; β A0 Γ`A'B Γ ` A, B : Set Γ`x : A Γ`A'B Γ`x : B The context Γ records the types and values of all names in scope, as is standard. The first rule defines the conversion relation. If two terms A and B have a common redex A0 , then they are βconvertible. The second rule states that β-convertible terms can be interchanged within types. Two key implications of these rules are: 1. Implementing a type checker for a dependently-typed language requires an evaluator which can compute under binders. 2. In order to ensure termination of type checking (and therefore decidability), we must distinguish terms for which evaluation definitely terminates, and those for which it may not.
Idris and its Type Theory
We take a simple but effective approach to termination checking: any functions that do not satisfy a simple syntactic constraint on recursive calls will not be reduced by the type checker. The constraint we use is that each recursive call must have an argument that is structurally smaller than the input argument in the same position, and that these arguments must belong to a strictly positive data type. We check for totality by additionally ensuring that the patterns cover all possible cases.
I DRIS is an experimental functional programming language with dependent types, similar to Agda [29] or Epigram [26]. It has a Haskell-like syntax, but is evaluated eagerly. It compiles to C via a supercombinator compiler. I DRIS has monadic I/O in the style of Hancock and Setzer [18], and a simple foreign function interface. It is implemented on top of the I VOR theorem proving library [7], giving direct access to an interactive tactic-based theorem prover. This section explores the relationship between type checking and evaluation in dependently-typed languages, such as I DRIS. 2 Joe
::= | | | | | |
The type theory possesses full-spectrum dependent types, where the definition of terms also captures the definition of types. As a convention, we use T to stand for a term which is to be used as a type. A function definition consists of a type declaration, followed by a number of pattern-matching clauses. We use p to stand for a term which is to be used as a pattern:
The underlying motivation for our research is a desire to reason about extra-functional properties of programs, that is how programs behave in terms of their usage of finite resources such as memory and execution time, and how that behaviour affects the end-user, e.g. through proper management of files and exceptions. Ensuring that essential extra-functional properties are met is vitally important to writing software that is useful in practice. In fact, a respected industrial language designer once confided to us that “nobody in industry really cares whether a program does what it’s supposed to – what they are really concerned with is how the program behaves when it is run”2 . This concern is reflected in the major software failures that we see reported in the national and international press. Many of the most significant problems with software behaviour boil down to poor usage of resources: software can fail because of buffer overflows, because of deadlocks, because of memory leaks, because of inadequate checking (e.g. that a file handle has the correct mode), because it fails to meet hard real-time deadlines, and in many other ways that are not direct consequences of the functional properties of the software. The EDSL approach, using a dependently-typed host language, allows us to define notations in which extra-functional resource usage properties are stated explicitly in a program’s type. Since we are primarily concerned with ease of reasoning and verification, we take the simplest possible approach to EDSL implementation, via an interpreter. Our hypothesis is that first defining an interpreter and then specialising it with respect to EDSL programs is efficient enough for practical purposes. The definition of “efficient enough” is, of course, open to interpretation and it may be hard to evaluate whether it has been met. For the purposes of this paper, we will call a program “efficient enough” if it is similar in speed and memory footprint to an equivalent hand-written Java program
2.
The Core Type Theory
3 The
full type-checking rules for core I DRIS may be found elsewhere [8]. The type checker follows standard methods for implementing a dependently typed lambda calculus [11, 25].
Armstrong, Erlang designer, personal communication.
298
EJSetK EJx K EJ\x : T => tK EJ(x : T1 ) -> T2 K EJt1 t2 K EJcK EJDK
= = = = = = =
vadd : (n:Nat) -> Vect Int n -> Vect Int n -> Vect Int n; vadd O (VNil Int) (VNil Int) = VNil Int; vadd (S k) ((::) Int k x xs) ((::) Int k y ys) = (::) Int k (x + y) (vadd k xs ys);
Set x \x : EJT K => EJtK (x : EJT1 K) -> EJT2 K F JEJt1 K t2 K c D
The values for implicit arguments, in types, patterns and function applications, are inferred from explicit values by unification. For each implicit argument I DRIS adds a place holder ( ) to the term in the type theory, which is then filled in by I VOR’s type checker. It is important to consider fully explicit terms when type checking and reasoning about meta-theory. A programmer, however, need not be concerned that these additional arguments affect performance: the back end erases computationally irrelevant information [6, 9]. In the case of Vect and vadd above, all implicit arguments are erased from the definition of Vect, and implicit arguments to vadd are marked as unused, so a dummy value is passed instead. This is an instance of the phase distinction between compiletime and run-time affecting the implementation. Implicit arguments, although they are available at run-time, are normally present primarily for type correctness and unused at run-time. In a function ~ ) -> T 0 we consider an argument xi unused if both f : (x : T of the following conditions hold:
F J(\x : T => t1 ) t2 K = EJt1 [t2 /x ]K F Jf ~tK = EJφ(e)K f ~p = e defined in Γ if ~ (~p , EJ~tK) Yes φ = MATCH F JtK = t MATCH MATCH MATCH
~ (~p , ~t) (c ~p ) (c ~t) = MATCH x t = Yes [t/x ] = No
Figure 1. Sketch of the evaluation function EJ·K
2.3
1. It is implicit (i.e. its value can be determined by the value of another argument at compile-time by unification).
Evaluation
The evaluator used by the type checker implements β-reduction and pattern matching. A sketch of the evaluation function EJ·K is given in Figure 1. To keep the presentation simple, the sketch uses a substitution-based approach. In practice, however, for efficiency reasons we will use an environment and de Bruijn indexed variables. In the evaluator we use ~t to denote a telescope of arguments t1 . . . tn . We define a function F J·K, which evaluates a function application, using β-reduction and pattern-matching definitions, where possible, and an overall evaluation function EJ·K which effectively implements structural closure of function application. MATCH implements pattern matching of an argument against a pat~ tern, returning a substitution if matching succeeds, with MATCH being the obvious lifting across a telescope of arguments. 2.4
2. It is not used on the right hand side of the definition, except in an unused argument position. The first condition ensures that the argument’s value will not affect case distinction — another argument suffices. The second condition ensures that it will not be used for case distinction elsewhere. 2.5
2.5 Type Checking and Partial Evaluation

There is an important implication of the typing rules: if a language has full dependent types, it requires an evaluator. This evaluator could take several forms, but it must provide compile-time β-reduction and pattern matching to implement conversion, from which it is simple to extend to full normalisation. So, if we have a function f with statically-known arguments s and dynamic arguments d, we can create a specialised version (the residual program) by normalising an expression of the form:

\d => f s d

A standard example (e.g. [38]) is the power function:

power : Int -> Nat -> Int;
power x O     = 1;
power x (S k) = x * power x k;

We can specialise the power function for a particular exponent. For example, if the first argument x is dynamic, and the second argument has the statically-known value S (S (S (S O))) — i.e. 4 — the IDRIS built-in evaluator, using β-reduction and pattern matching alone, gives us the residual program:

\ x => power x (S (S (S (S O))))
  ==> \ x : Int => x*(x*(x*(x*1))) : Int -> Int

In the rest of this paper, we will make extensive use of this built-in evaluator, and introduce some simple methods that can be used to control its behaviour in order to make partial evaluation effective.
3. Embedding Languages in Idris

In this section, we demonstrate how we can implement EDSLs efficiently using partial evaluation, by showing the implementation of a simple expression language embedded in IDRIS.
data Ty = TyInt | TyFun Ty Ty;

interpTy : Ty -> Set;
interpTy TyInt       = Int;
interpTy (TyFun A T) = interpTy A -> interpTy T;

data Fin : Nat -> Set where
   fO : Fin (S k)
 | fS : Fin k -> Fin (S k);

using (G:Vect Ty n) {
  data Expr : (Vect Ty n) -> Ty -> Set where
     Var : (i:Fin n) -> Expr G (vlookup i G)
   | Val : (x:Int) -> Expr G TyInt
   | Lam : Expr (A::G) T -> Expr G (TyFun A T)
   | App : Expr G (TyFun A T) -> Expr G A -> Expr G T
   | Op  : (interpTy A -> interpTy B -> interpTy C) ->
           Expr G A -> Expr G B -> Expr G C;
}

Figure 2. The Simple Functional Expression Language, Expr.

interp : Env G -> Expr G T -> interpTy T;
interp env (Var i)     = envLookup i env;
interp env (Val x)     = x;
interp env (Lam sc)    = \x => interp (Extend x env) sc;
interp env (App f a)   = interp env f (interp env a);
interp env (Op op l r) = op (interp env l) (interp env r);

Figure 3. Expression Language Interpreter

data Ty = TyInt | TyBool | TyFun Ty Ty;

interpTy TyBool = Bool;

data Expr : (Vect Ty n) -> Ty -> Set where
 ...
 | If : Expr G TyBool -> Expr G A -> Expr G A -> Expr G A;

interp env (If v t e) = if (interp env v)
                           then (interp env t)
                           else (interp env e);

Figure 4. Booleans and If construct
3.1 A Simple Expression EDSL, Expr

A common introductory example for dependently-typed languages is a well-typed interpreter [2, 8, 31], where the type system ensures that only well-typed source programs can be represented and interpreted. Figure 2 defines a simple functional expression language, Expr, with integer values and operators. The using notation indicates that G is an implicit argument to each constructor, with type Vect Ty n.

Terms of type Expr are indexed by i) a context (of type Vect Ty n), which records types for the variables that are in scope; and ii) the type of the term (of type Ty). The valid types (Ty) are integers (TyInt) or functions (TyFun). We define terms to represent variables (Var), integer values (Val), lambda-abstractions (Lam), function calls (App), and binary operators (Op). Types may either be integers (TyInt) or functions (TyFun), and are translated to IDRIS types using interpTy.

Our definition of Expr also states its typing rules, in some context, by showing how the type of each term is constructed. For example, Val : (x:Int) -> Expr G TyInt indicates that literal values have integer types (TyInt), and Var : (i:Fin n) -> Expr G (vlookup i G) indicates that the type of a variable is obtained by looking up i in context G. For any term x, we can read x : Expr G T as meaning "x has type T in the context G".

Expressions in this representation are well-scoped, as well as well-typed. Variables are represented by de Bruijn indices, which are guaranteed to be bounded by the size of the context, using i:Fin n in the definition of Var. A value of type Fin n is an element of a finite set of n elements, which we use as a reference to one of n variables.

In order to evaluate this language, we will need to keep track of the values of all variables that are in scope. Environments, Env, allow us to link a vector of types with instances of those types. They are indexed by a vector of types Vect Ty n. Each element in the environment corresponds to an element in the vector:

data Env : Vect Ty n -> Set where
   Empty  : Env VNil
 | Extend : (res:interpTy T) -> Env G -> Env (T :: G);

We provide operations to lookup/update an environment, corresponding to lookup and update on vectors:

vlookup   : (i:Fin n) -> Vect A n -> A;
envLookup : (i:Fin n) -> Env G -> interpTy (vlookup i G);

update    : (i:Fin n) -> A -> Vect A n -> Vect A n;
updateEnv : Env G -> (i:Fin n) -> interpTy T -> Env (update i T G);
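The paper gives only the types of these operations; as a minimal sketch, the obvious structural definitions of the two lookup functions would be (our assumption):

vlookup fO     (x :: xs) = x;
vlookup (fS k) (x :: xs) = vlookup k xs;

envLookup fO     (Extend res env) = res;             -- res : interpTy T, as required
envLookup (fS k) (Extend res env) = envLookup k env;

Note that envLookup's return type, interpTy (vlookup i G), is computed by the same vlookup that appears in the type of Var, so the interpreter below returns exactly the host-language type of the variable being looked up.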
The full interpreter for Expr is given in Figure 3. Note that its return type depends on the type of the expression to be interpreted. This is a significant benefit of dependent types for language implementation — there is no need to tag the result of the interpreter with a type, because its type is known from the input program.

3.2 Example Programs

We can now define some simple example functions. We define each function to work in an arbitrary context G, which allows it to be applied in any subexpression in any context. Our first example function adds its integer inputs using the IDRIS + primitive:

add : Expr G (TyFun TyInt (TyFun TyInt TyInt));
add = Lam (Lam (Op (+) (Var (fS fO)) (Var fO)));

We can use add to define the double function:

double : Expr G (TyFun TyInt TyInt);
double = Lam (App (App add (Var fO)) (Var fO));

Now, runDouble applies double to an argument:

runDouble : Expr VNil TyInt;
runDouble = App double (Val 21);

We run this program by interpreting it in an empty environment:

interp Empty runDouble ==> 42 : Int

Running the interpreter yields a host language representation of the EDSL program. In the example above, the program had type TyInt, so the value returned was of type Int. The value to be returned is computed from the representation type of the expression. So, if we were to evaluate something with a function type, we would obtain an IDRIS function. For example, for add:

interp Empty add
  ==> \ x : Int => \ x0 : Int => x+x0 : Int -> Int -> Int

3.3 Control structures and recursion

To make Expr more realistic, we will add boolean values and an If construct, and attempt to write a recursive function. Our extensions are shown in Figure 4. We can now define a factorial function:
fact : Expr G (TyFun TyInt TyInt);
fact = Lam (If (Op (==) (Val 0) (Var fO))
               (Val 1)
               (Op (*) (Var fO)
                       (App fact (Op (-) (Var fO) (Val 1)))));

Unfortunately, we cannot specialise an interpreter with respect to this definition: it is not structurally recursive, and if we try to evaluate it, the recursive call to fact will unfold forever. Evaluation of the interpreter with this definition will only terminate if it is given a concrete argument and the recursive call is evaluated lazily.
4. Partial Evaluation

In general, a partial evaluator produces specialised versions of functions where some arguments are statically known. We are interested in the instance where the function to specialise is an EDSL interpreter, and the statically known argument is an input program. As we have seen, evaluating the interpreter with specific input programs yields specialised versions of those programs in the host language. This is, of course, not surprising: it is the first Futamura projection [16] — specialising an interpreter for given source code yields an executable. However, two problems arise if we simply use the standard evaluator:

1. Recursive programs with dynamically known inputs cannot be specialised, since recursive calls are unfolded arbitrarily deeply;

2. Evaluating completely, unfolding every function definition, can lead to loss of sharing.

The first problem is illustrated by fact, in which the number of times to unfold the recursive definition is dynamically known. The second problem is illustrated by an application of double to a complex expression:

doubleBig : Expr VNil TyInt;
doubleBig = App double complexFn;

Assuming complexFn evaluates to complexExpr, evaluating doubleBig gives:

complexExpr + complexExpr

The large expression complexFn is evaluated twice in the residual code, where it would make more sense to evaluate it once, and then pass its result to the evaluated version of double. This is a similar problem to that which arises when inlining functions [32] — there is needless duplication of work in the evaluated code.
4.1 A Partial Evaluator for Idris

We can significantly improve the results of partial evaluation by taking care with function application. A sketch of the partial evaluator, P⟦·⟧, is given in Figure 5. This mostly follows the standard evaluator rules given previously, but differs in the F'⟦·⟧ function used to evaluate function definitions. The partial evaluator P⟦·⟧ is implemented relative to a context Γ, but unlike the regular evaluator E⟦·⟧ it updates Γ during evaluation.

   P⟦Set⟧            = Set
   P⟦x⟧              = x
   P⟦\x : T => t⟧    = \x : P⟦T⟧ => P⟦t⟧
   P⟦(x : T1) -> T2⟧ = (x : P⟦T1⟧) -> P⟦T2⟧
   P⟦t1 t2⟧          = F'⟦P⟦t1⟧ t2⟧
   P⟦c⟧              = c
   P⟦D⟧              = D

   F'⟦(\x : T => t1) t2⟧ = P⟦t1[t2/x]⟧
   F'⟦f ~t⟧              = ⟨f, ~ts⟩ ~td
        if f : ~T -> T' and f ~p = e defined in Γ
           and MATCH~(~ps, P⟦~ts⟧) = Yes φ,
        adding  ⟨f, ~ts⟩ : φ(~Td) -> φ(T')
                ⟨f, ~ts⟩ φ(~pd) = P⟦φ(e)⟧   to Γ
   F'⟦t⟧                 = t

   MATCH~ (c ~p) (c ~t) = MATCH~(~p, ~t)
   MATCH  x      t      = Yes [t/x]
   MATCH  _      _      = No

Figure 5. Partial Evaluator

When applying a function f, we separate its arguments ~t into those which are statically known (i.e. known at compile-time), ~ts, and those which are dynamic (i.e. which will be known at run-time), ~td. In IDRIS, we make this distinction through programmer annotations. These annotations define which functions should be partially evaluated, and which arguments of those functions should be treated as static. For interp, we declare the function as follows:

interp : Env G -> Expr G T [static] -> interpTy T;

The [static] annotation on the expression argument indicates that the expression may be static in any application of interp. Additionally, any implicit argument which can be uniquely inferred from a static argument may also be static. In this case, T can be uniquely inferred from the expression. G may not be considered static, because it can also be inferred from the (dynamic) environment. We therefore define the static arguments in an application to be those in a position annotated as [static], or uniquely inferrable from an argument in a position annotated as [static], and in head normal form or a global definition.

The evaluator constructs a new version of f, ⟨f, ~ts⟩, that is specialised with the static arguments ~ts, reusing this definition if it already exists. The type of ⟨f, ~ts⟩ is constructed by specialising the type according to the statically known values — these values may appear in the dynamic portion of the type. This new definition is added, permanently, to the global context Γ. Effectively, this caches the intermediate result of a computation with specific static arguments. In this way, we abstract away multiple calls to a given specialised definition. In particular, this means that specialising the interpreter with fact will result in a recursive IDRIS program. The method we use for specialising function applications, namely extending the environment with cached versions of partially evaluated functions, is a standard technique [13]. The new definitions are well-typed and preserve the semantics of the original program. However, some care is required in the presence of fully explicit dependently typed programs, because the values of implicit arguments may make a definition less generic than required. When constructing a new function ⟨f, ~ts⟩ we aim to produce the most generic definition possible. We achieve this by replacing any implicit, unused arguments with a place holder before adding the definition, as demonstrated by the example below.
Example — Factorial

For the remainder of this section, we write applications in fully explicit form. interp has implicit arguments for the expression type and context, and fact has implicit arguments for its initial context:

interp : (n:Nat) -> (T:Ty) -> (G:Vect Ty n) ->
         Env G -> Expr G T -> interpTy T
fact   : (n:Nat) -> (G:Vect Ty n) ->
         Expr G (TyFun TyInt TyInt)

When the interpreter is partially evaluated with fact as a static argument, according to the scheme in Figure 5, partial evaluation happens as follows. The type and expression are in static argument positions, so the evaluator creates a new definition interpfact that has been specialised according to these arguments, and adds it to the context:

interp O (TyFun TyInt TyInt) VNil Empty (fact O VNil)
  ==> interpfact O VNil Empty

The new definition replaces arguments in implicit positions in the original application with place holders:

interpfact : (n:Nat) -> (G:Vect Ty n) -> Env G -> Int -> Int;
interpfact n G e = interp _ _ _ _ (fact _ _);

Type checking this leads to the following definition with the implicit arguments filled in:

interpfact : (n:Nat) -> (G:Vect Ty n) -> Env G -> Int -> Int;
interpfact n G e = interp n (TyFun TyInt TyInt) G e (fact n G);

The next step is to partially evaluate the definition of interpfact. Eventually, this reaches another call to interp:

interpfact : (n:Nat) -> (G:Vect Ty n) -> Env G -> Int -> Int;
interpfact n G e
   = \ x : Int =>
       if (0==x) then 1
          else interp (S n) (TyFun TyInt TyInt) (TyInt::G)
                      (Extend x e) (fact (S n) (TyInt::G)) (x-1);

When creating a specialised version of interp, there is already a suitable definition of interpfact that can be reused:

interpfact : (n:Nat) -> (G:Vect Ty n) -> Env G -> Int -> Int;
interpfact n G e
   = \ x : Int =>
       if (0==x) then 1
          else interpfact (S n) (TyInt::G) (Extend x e) (x-1);

This definition builds a context and an environment, which are unused (as defined in Section 2.4). IDRIS notes this and replaces these arguments with dummy place holder values:

interpfact : (n:Nat) -> (G:Vect Ty n) -> Env G -> Int -> Int;
interpfact _ _ _
   = \ x : Int =>
       if (0==x) then 1
          else interpfact _ _ _ (x-1);
4.2 Why Taglessness Matters

As mentioned above, one important feature of an interpreter that has been defined in this style is that there is no need to tag the return value with its type, because the type can be computed in advance. This is a common feature of interpreters in dependently-typed languages [2, 8, 31]. Effectively, we use the type checker for the host language to check the terms in the object language, so that there is no need to check types dynamically. This leads us to a concrete rule, to be followed by the EDSL author, for maximising the effect of partial evaluation:

Rule 1: Index the EDSL representation by its type, to avoid needing to tag the result.

The tag elimination problem, in which an evaluator aims to move type checking of intermediate results in the interpreter from run-time to compile-time, has been extensively studied in the partial evaluation literature [10, 21, 39, 40]. The presence of dependent types simplifies the implementation of a tagless interpreter greatly, in that we are able to write it in a natural style (avoiding, for example, continuation passing style), with no post-processing required.
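To see what Rule 1 avoids, consider a hypothetical sketch of ours (not from the paper) over an untyped term representation, assuming a standard Maybe type. Without type indexing, every result must be wrapped in a tagged value type, and every use of a result must check the tag at run-time; these checks survive into the residual program:

data Value = VInt Int              -- tagged integer result
           | VFun (Value -> Value); -- tagged function result

apply : Value -> Value -> Maybe Value;
apply (VFun f) a = Just (f a);
apply _        _ = Nothing;   -- tag mismatch detected at run-time

With the type-indexed Expr, interp returns interpTy T directly, so no Value wrapper, tag check or Maybe is needed.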
5. EDSLs with State

The Expr language and its interpreter demonstrate our general approach to EDSL implementation:

1. Define the data type for the EDSL in the host language;
2. Write an interpreter to evaluate this type, in the host language;
3. Specialise the interpreter with respect to concrete programs.

In practice, however, the languages that interest us will not be as simple as Expr. Real programs have state, they may communicate across a network, they may need to read and write files, allocate and free memory, or spawn new threads and processes. To be usable in practice we need to ensure that we can deal with such issues. In this section, we implement a simple EDSL for file manipulation, which demonstrates how our approach can deal with external state, side effects such as I/O, and imperative features in general.

5.1 A Simple Imperative EDSL, Imp

Figure 6 describes a simple imperative EDSL, Imp, which includes a means of embedding arbitrary I/O operations (ACTION) and pure values (RETURN), WHILE and IF statements, and a monadic bind operation (BIND) for sequencing statements. Types in Imp are: the unit type, TyUnit, for operations which do not return any value, such as writing to a file; a boolean type, TyBool, for intermediate results in control structures; and lifted host language types, TyLift, which allow host language functions and arbitrary I/O actions to be embedded in EDSL programs. Figure 7 gives an interpreter.

data Ty = TyUnit | TyBool | TyLift Set;

interpTy : Ty -> Set;
interpTy TyBool     = Bool;
interpTy TyUnit     = ();
interpTy (TyLift A) = A;

data Imp : Ty -> Set where
   ACTION : IO a -> Imp (TyLift a)
 | RETURN : interpTy a -> Imp a
 | WHILE  : Imp TyBool -> Imp TyUnit -> Imp TyUnit
 | IF     : Bool -> Imp a -> Imp a -> Imp a
 | BIND   : Imp a -> (interpTy a -> Imp b) -> Imp b

Figure 6. A Simple Imperative EDSL, Imp.

interp : Imp a [static] -> IO (interpTy a);
interp (ACTION io)  = io;
interp (RETURN val) = return val;
interp (WHILE add body)
   = while (interp add) (interp body);
interp (IF v thenp elsep)
   = if v then (interp thenp) else (interp elsep);
interp (BIND code k)
   = do { v <- interp code; interp (k v); };

Figure 7. Interpreter for Imp.
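As a usage illustration (our own sketch, not an example from the paper; getLine and putStr are assumed host-language I/O primitives), a small Imp program that echoes a line of input, sequenced with BIND:

echo : Imp (TyLift ());
echo = BIND (ACTION getLine)           -- run a host I/O action, binding its String result
            (\s => ACTION (putStr s)); -- use the result in a later action

Applying interp to echo then yields an IO action that performs the two operations in order.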
5.2 A File Management EDSL, File

So far, we have indexed our language representations by the type of the programs they represent. This allows us to exploit the host language's type checker to ensure that EDSL programs are well-typed. We can take this idea much further with a dependently-typed host language such as IDRIS, and index EDSL representations not only by the program's type, but also by other properties, such as the states of the resources that it uses. To demonstrate this, we implement an EDSL for file management, designed so that type-correct programs have the following informally-stated properties:
• Files must be open before they are read or written;
• Files that are open for reading cannot be written, and files that are open for writing cannot be read;
• Only open files can be closed;
• All files must be closed on exit.

Our first attempt extends Imp. The language definition is given in Figure 8, and its interpreter in Figure 9. The interpreter returns a pair of the modified environment and the value resulting from interpretation, where (S & T) is IDRIS notation for a pair of types S and T. Each command in the language has a more or less direct translation into a host language function. We index programs in File over their input and output states, as well as their type. This state is a vector that holds information about whether file handles are open or closed:

data Purpose = Reading | Writing;
data FileState = Open Purpose | Closed;

data File : Vect FileState n -> Vect FileState n' -> Ty -> Set
where
   -- Updated control structures
   WHILE   : (File ts ts TyBool) -> (File ts ts TyUnit) ->
             (File ts ts TyUnit)
 | IF      : (a:Bool) -> (File ts ts b) -> (File ts ts b) ->
             (File ts ts b)

   -- File management operations
 | OPEN    : (p:Purpose) -> (fd:Filepath) ->
             (File ts (snoc ts (Open p)) (TyHandle (S n)))
 | CLOSE   : (i:Fin n) -> (OpenH i (getPurpose i ts) ts) ->
             (File ts (update i Closed ts) TyUnit)
 | GETLINE : (i:Fin n) -> (p:OpenH i Reading ts) ->
             (File ts ts (TyLift String))
 | EOF     : (i:Fin n) -> (p:OpenH i Reading ts) ->
             (File ts ts TyBool)
 | PUTLINE : (i:Fin n) -> (str:String) -> (p:OpenH i Writing ts) ->
             (File ts ts (TyUnit));

Figure 8. The File-Handling Language, File

interp : Env ts -> File ts ts' T [static] ->
         IO (Env ts' & interpTy T);
...
interp env (WHILE test body)
   = do { while (ioSnd (interp env test))
                (ioSnd (interp env body));
          return (env, II); };
interp env (OPEN p fpath)
   = do { fh <- fopen (getPath fpath) (pMode p);
          return (addEnd env (OpenFile fh), bound); };
interp env (CLOSE i p)
   = do { fclose (getFile p env);
          return (updateEnv env i ClosedFile, II); };
interp env (GETLINE i p)
   = do { str <- fread (getFile p env);
          return (env, str); };
interp env (EOF i p)
   = do { e <- feof (getFile p env);
          return (env, e); };
interp env (PUTLINE i str p)
   = do { fwrite (getFile p env) str;
          fwrite (getFile p env) "\n";
          return (env, II); };

Figure 9. Interpreter for File.
Types in File include the types from Imp, extended with a file handle type, TyHandle, to carry the number of available file handles:

data Ty = TyUnit        -- unit type
        | TyBool        -- booleans
        | TyLift Set    -- host language type
        | TyHandle Nat; -- a file handle

These types can be converted into host language types using an interpTy function, as before. In the interpreter, we will need to look up concrete file handles from the environment, so it is convenient to use elements of finite sets as a concrete representation:

interpTy : Ty -> Set;
interpTy TyBool       = Bool;
interpTy TyUnit       = ();
interpTy (TyLift A)   = A;
interpTy (TyHandle n) = Fin n;
Environments carry concrete file handles, if the file state indicates that a file is open:

data FileHandle : FileState -> Set where
   OpenFile : (h:File) ->          -- actual file
              FileHandle (Open p)
 | ClosedFile : FileHandle Closed;

data Env : Vect FileState n -> Set where
   Empty  : Env VNil
 | Extend : (res:FileHandle T) -> Env G -> Env (T :: G);

A program which is correct with respect to its file management operations will begin and end with no open file handles. We use FileSafe T as a notational convenience to represent a safe program which returns a value of type T:

FileSafe T = File VNil VNil T;

We use handles to declare, in advance, the number of file handles we will create, also as a notational convenience. The function allClosed returns a list of closed file handles, so that handles requires a program which closes all of the file handles it opens:

handles : (x:Int) -> File VNil (allClosed x) T -> FileSafe T;
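The paper does not show the definition of allClosed; on our reading, it simply builds a vector in which every declared file state is Closed. A minimal sketch, using a Nat-indexed length for clarity (the paper's version takes an Int):

allClosed : (n:Nat) -> Vect FileState n;
allClosed O     = VNil;
allClosed (S k) = Closed :: allClosed k;  -- every declared handle must end up Closed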
Since interpreting File threads an environment through the program, we also modify the interpreter so that imperative constructs such as while loops manage the environment. Since the type of WHILE indicates that the test and body of the loop cannot modify an environment, we call the interpreter recursively and discard the resulting environment:

ioSnd : IO (a & b) -> IO b;
ioSnd p = do { p' <- p; return (snd p'); };

interp env (WHILE test body)
   = do { while (ioSnd (interp env test))
                (ioSnd (interp env body));
          return (env, II); };
5.3 Example 1: Copying a line

Our first example is a very simple program which reads a line from one file and writes it to another (Figure 10). IDRIS provides rebindable do-notation, which here uses the BIND and RETURN operators provided by File:

do using (BIND, RETURN) {
   ...
}

copyLine : Filepath -> Filepath -> FileSafe TyUnit;
copyLine inf outf
   = handles 2 do { fh1 <- OPEN Reading inf;
                    fh2 <- OPEN Writing outf;
                    str <- GETLINE fh1 ?;
                    PUTLINE fh2 str ?;
                    CLOSE fh2 ?;
                    CLOSE fh1 ?; };

Figure 10. Simple File program.

GETLINE, PUTLINE and CLOSE each take an additional argument as a proof that the file is open. Since the files are known statically, these proofs are all straightforward. IDRIS provides hooks to the IVOR theorem prover [7] to allow these proofs to be completed, as well as a means to implement decision procedures.

As before, we can specialise the interpreter with respect to this program, and obtain a version which calls the I/O operations directly, as in Figure 11. Despite adding an environment for external resources, and using side-effecting I/O operations, this partial evaluation proceeds smoothly. Since the interpreter returns a pair of the final environment and a value, the specialised version returns an empty environment. This is a single constructor, so not expensive, but it can easily be removed with a call to ioSnd, which can itself be specialised.

interpCopyLine : Filepath -> Filepath -> IO ();
interpCopyLine inf outf
   = IODo (fopen (getPath inf) "r") (\ x : Ptr =>
     IODo (fopen (getPath outf) "w") (\ x0 : Ptr =>
     IODo (freadStr x) (\ x1 : String =>
     IODo (fputStr x0 x1) (\ x2 : () =>
     IODo (fputStr x0 "\n") (\ x3 : () =>
     IODo (fclose x0) (\ x4 : () =>
     IODo (fclose x) (\ x5 : () =>
     IOReturn (Empty, II)))))));

Figure 11. Simple File program (interpreted).
Input/Output implementation

The reason we do not have any difficulties with partial evaluation of I/O is because the evaluator does not execute I/O operations itself, but rather constructs an I/O tree explaining which operations will be executed at run-time. Like Haskell, IDRIS provides an IO monad. This is implemented in the style of Hancock and Setzer [18], where an I/O operation consists of a command followed by a continuation that defines how to process the response to that command:

data IO : Set -> Set where
   IOReturn : a -> (IO a)
 | IODo : (c:Command) -> (Response c -> IO a) -> (IO a);

IDRIS defines default Command and Response structures which allow simple interaction with the outside world, plus calls to C functions. We define a bind operation for sequencing I/O operations:

bind : IO a -> (a -> IO b) -> IO b;
bind (IOReturn a) k = k a;
bind (IODo c p) k = IODo c (\x => (bind (p x) k));
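For intuition, the run-time system's execution of such a tree can be pictured as follows. This is a hypothetical sketch of ours, assuming a primitive execute : (c:Command) -> Response c supplied by the run-time system; it is not an API from the paper:

run : IO a -> a;
run (IOReturn a) = a;                   -- pure result: nothing left to execute
run (IODo c k)   = run (k (execute c)); -- perform the command, feed the response
                                        -- to the continuation

Compile-time evaluation only ever builds and rearranges these trees; the run-time analogue of run is where effects actually happen.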
We consider IO to be an EDSL for describing interaction with the operating system, and the run-time system to be its interpreter. This gives a clean separation between pure values and external operations, and means we can treat IDRIS evaluation as pure, even with side-effecting code.

Why Partial Evaluation Worked

As with the functional language example in Section 3.1, partial evaluation of our example program above yielded a residual program without any trace of the interpreter or environment. Firstly, in the host language, I/O operations remain pure, so there is no need to treat them specially. Secondly, we followed a simple rule:

Rule 2: The interpreter must only pattern match on the EDSL program to be translated.

The reason for this rule is that the EDSL program is the only static argument, i.e. the only thing we know at compile-time. Everything else (including the environment and any additional arguments to the EDSL program) is dynamic, i.e. unknown until the program is run. Therefore, we write the interpreter so that it does not need to match dynamic values. Auxiliary functions may match them (indeed, they may need to), as long as we follow another rule:

Rule 3: Auxiliary functions which match on dynamic data must not be passed EDSL code unless it has been interpreted first.

These rules are necessary to eliminate any trace of the interpreter in the residual code, in that they prevent situations which will cause partial evaluation to get stuck, but they are not sufficient to guarantee the best results from partial evaluation, as we will see.

5.4 Example 2: Copying a file

In order to increase expressivity, File also includes while-loops and conditional expressions. For example, Figure 12 shows how our previous example can be extended to copy an entire file.

copy : Filepath -> Filepath -> FileSafe TyUnit;
copy i o
   = handles 2 do { fh1 <- OPEN Reading i;
                    fh2 <- OPEN Writing o;
                    WHILE (do { e <- EOF (handle fh1) ?;
                                return (not e); })
                          (do { str <- GETLINE (handle fh1) ?;
                                PUTLINE (handle fh2) str ?; });
                    CLOSE (handle fh1) ?;
                    CLOSE (handle fh2) ?; };

Figure 12. Copying a file line by line.

Partial Evaluation — First attempt

Unfortunately, there is a problem. Using our initial implementation of the interpreter to evaluate the copy program, the specialised program includes the fragment shown in Figure 13. At first, partial evaluation removes all of the interpreter and environment overhead. After translating the while-loop, however, the residual code still carries an environment. We have faithfully followed our three rules, so why does this happen?
interpCopy : Filepath -> Filepath -> IO ();
interpCopy i o
   = IODo (fopen (getPath i) "r") (\ x : Ptr =>
     IODo (fopen (getPath o) "w") (\ x0 : Ptr =>
     bind (bind (bind (bind (bind (while (IODo (feof x)
     {- ... while loop translation omitted ... -}
     (\ k0 : () =>
        IOReturn (Extend (OpenFile (FHandle x))
                 (Extend (OpenFile (FHandle x0)) Empty), II)))
     {- ... further residual code omitted ... -}

Figure 13. First (unsuccessful) attempt at partial evaluation.

To understand this, we observe that there are several instances of bind in the residual program, including the following call: bind (while (IODo (feof x) ...) ...). However, bind matches on its first argument, so can be reduced only when its first argument is known. Since while will never be reduced by the partial evaluator, because it could loop forever, reduction cannot continue! But we know, both from the type of WHILE and the behaviour of the interpreter, exactly what the result of the loop will be (namely, a unit value and an unchanged environment):

WHILE : File ts ts (TyLift Bool) -> File ts ts TyUnit ->
        File ts ts TyUnit

interp env (WHILE test body)
   = do { while (ioSnd (interp env test))
                (ioSnd (interp env body));
          return (env, II); };
We should be able to persuade the evaluator to continue with what we know will be the result of the loop. The solution is to observe that bind can reduce if the I/O operation is in constructor form. We therefore include a lifting operation in the Command type:

data Command : Set where
 ...
 | IOLift : (IO a) -> Command;

We also introduce an alternative bind operation:

ibind : (IO a) -> (a -> (IO b)) -> (IO b);
ibind c p = IODo (IOLift c) p;

Using ibind allows evaluation to continue as long as either the result of the operation is known statically, or it is unused. If we have an expression bind c k, for some arbitrary c which is not in constructor form, evaluation cannot proceed. On the other hand, using ibind to bind the result of the c which we know will not reduce, expands and evaluates as follows:

bind (ibind c p) k
  ==> bind (IODo (IOLift c) p) k
  ==> IODo (IOLift c) (\x => bind (p x) k)

Evaluation then proceeds with the inner bind, with x standing for the value returned by the action c. As long as x is never used in p, as with WHILE where we already know the environment will be unchanged, the inner bind can be reduced. We change the interpreter for WHILE as below, and note that the variable x is unused in the continuation:

interp env (WHILE test body)
   = ibind (while (ioSnd (interp env test))
                  (ioSnd (interp env body)))
           (\x => return (env, II));
Partial Evaluation — Successfully

After this change, specialising the copy program yields the residual program shown in Figure 14, with the environment eliminated entirely.

interpCopy : Filepath -> Filepath -> IO ();
interpCopy i o
   = IODo (fopen (getPath i) "r") (\ x : Ptr =>
     IODo (fopen (getPath o) "w") (\ x0 : Ptr =>
     IODo (IOLift (while (IODo (feof x)
                         (\ x1 : Int => IOReturn (x1==0)))
                         (IODo (freadStr x) (\ x2 : String =>
                          IODo (fputStr x0 x2) (\ x3 : () =>
                          IODo (fputStr x0 "\n") (\ x4 : () =>
                          IOReturn II)))))))
          (\ x5 : () =>
           IODo (fclose x) (\ x6 : () =>
           IODo (fclose x0) (\ x7 : () =>
           IOReturn (Empty, II))))));

Figure 14. Second (successful) attempt at partial evaluation.

To understand what has happened, observe that IO itself is an EDSL describing execution, interpreted by the run-time system, and bind is a program transformation operation. We can only successfully partially evaluate an EDSL program if the program itself is statically known, and we have broken this by allowing bind to transform a program dynamically, based on the environment. Using ibind, the environment can be discarded statically. This leads to a further rule:

Rule 4: Ensure that EDSL program construction, generation and transformation can be evaluated statically.

Adding IOLift allows more work to be done statically, by giving bind a constructor form to evaluate in static position where it would not otherwise be available. Therefore this rule has a consequence specific to IDRIS programs using the Command/Response I/O system:

Consequence: The result of interpreting a control structure should be bound with ibind rather than the default bind.
5.5 A note on modularity

It is worth observing that the safety of the File EDSL requires that only File, its constructors and interp are exposed as file manipulation operations. If this were not the case, an EDSL programmer would be able to bypass the safety mechanisms given by the EDSL by invoking fopen, fread and other file manipulation functions directly. This can be achieved, as normal, by not exporting these names to the EDSL programmer.

6. Experimental Results

To assess the value of our partial evaluation approach, we implemented several example programs as EDSLs, and measured their execution time and memory footprint before and after partial evaluation. The example programs that we used were:

fact: In Expr, the factorial program, repeatedly calculating the sum of all factorials from 1! to 20!. We implemented this both using direct recursion (as previously described) and using tail recursion, and timed both 20,000 and 2,000,000 iterations.

sumlist: In Expr extended with list processing, calculating the sum of a list of 10,000 elements (iterating 20,000 times).

copy: In File, copying the contents of a large file, line by line.

copy dynamic: In File, copying the contents of several large files, reading the file names from another file.

copy store: In File, copying the contents of a large file by storing the entire contents in memory.

sort file: In File, sorting the contents of a large file using a tree sort, and writing a new file.
                             Idris (gen.)         Idris (spec.)        Java                 C (gcc -O3)
Program                      Time (s)  Space (kb) Time (s)  Space (kb) Time (s)  Space (kb) Time (s)  Space (kb)
fact (20K tail-recursive)       8.598       1892     0.017        816     0.081      11404     0.007        292
fact (2M tail-recursive)      877.2         1900     1.650        816     1.937      11388     0.653        292
fact (2M recursive)           538.7         1888     3.154        816     N/A†        N/A†     N/A†        N/A†
sumlist (10K elements)       1148.0       155616     3.181       1604     4.413      12092     0.346        504
copy                            1.974       1944     0.589       1896     1.770      12764     0.564        296
copy dynamic                    1.763       1940     0.507       1900     1.673      12796     0.512        304
copy store                      7.650      59872     1.705      51488     3.324      46364     1.159      24276
sort file                       7.510      42228     5.205      42180     2.610      32560     1.728      15832

† Java and C versions implemented iteratively

Table 1. Experimental Results
Table 1. Experimental Results We would, of course, prefer to use real programs, as might be found in the nofib suite for Haskell [30], for example. Since I DRIS is a new, experimental, language, however, such a suite does not yet exist, and we have therefore tried to implement a variety of simple benchmarks covering both functional and imperative features, as well as examples which use the host language extensively. In order to compare our approach with mainstream programming methods, we implemented equivalent programs in Java (version 1.5.0) and C (gcc 4.0.1, using -O3), following the same algorithms as far as possible and appropriate. Clearly, these results should be treated with some caution, since comparing different language implementations does not always produce completely fair results: different languages are optimised for different tasks; different algorithms work better in some languages than others; and, to some extent, we are also comparing library implementations. Nevertheless, the results provide an indication of the feasibility of our approach for more realistic tasks. The source code for our examples is available at http://www.cs.st-and.ac.uk/∼eb/icfp10/.
Program fact (tail-rec, 2M) sumlist copy copy dynamic copy store sort file
Idris (gen.) Time Space 992.15 10904 709.6 70824 2.048 10976 1.847 10948 7.576 57944 7.593 49840
Idris (spec.) Time Space 1.642 816 3.161 1604 0.587 10916 0.521 10920 1.708 57936 5.223 40604
Table 2. Space and time results with default heap size 10 Mb due to the overhead of the run-time system, but in every case the results are well within an order of magnitude, and in one case (copy dynamic), the partially evaluated interpreter is actually slightly faster than the C version. It is worth noting that the I DRIS compiler and run-time system are at an early stage of development, and do not yet apply well-known optimisations that have been used in production systems. For example, deforestation [17, 43] might improve the performance of sort file and similar programs that build and destroy intermediate structures. Our results are therefore highly encouraging.
Analysis of our Results Table 1 gives absolute run times for our example programs. These results were obtained on an Apple MacBook Pro with a 2.8GHz Intel Core 2 Duo processor and 4Gb memory, running Mac OS X 10.5.8, using time -l. The times are the reported CPU times (i.e. the actual processing time, rather than wall clock time or system time), and the space is the maximum resident set size (i.e. the maximum portion of the process’s memory held in RAM). For each example, specialising the interpreter provides both a significant speedup and a reduction in space usage. The speedup is particularly dramatic for Expr programs. There are two likely reasons for this (established using Apple’s Shark profiler4 ): firstly, Expr is far more fine-grained than File, in that it has syntactic forms for variables, values and application, whereas File takes advantage of host language constructs; and secondly, recursive calls in non-specialised Expr programs must be evaluated lazily, with some associated overhead, in order to avoid expanding the abstract syntax tree indefinitely. The speedup is less dramatic, but still significant, for programs in File, even for sort file which spends much of its time evaluating host language code. It is worth noting that, for File, the main benefit of partial evaluation is in improved execution time rather than reducing space usage. In most cases, the results of specialising the interpreter produces a program that runs faster than its Java equivalent and also uses significantly less memory. For sort file, this is not the case, however, because the Java version of tree sort allows inplace update (this is safe since we discard the intermediate trees) whereas the I DRIS version, being purely functional, does not. Few of the programs are close to the efficiency of the C equivalents,
6.1 Garbage Collection

IDRIS compiles to C, using the Boehm-Demers-Weiser conservative garbage collector [5] with an initial heap size of 1Mb. As an experiment, we increased the initial heap size to 10Mb, hypothesising that this would lead to faster run times due to fewer calls to the garbage collector, at the expense of a larger memory footprint. The results are shown in Table 2.

                         Idris (gen.)       Idris (spec.)
Program                  Time      Space    Time      Space
fact (tail-rec, 2M)      992.15    10904    1.642       816
sumlist                  709.6     70824    3.161      1604
copy                       2.048   10976    0.587     10916
copy dynamic               1.847   10948    0.521     10920
copy store                 7.576   57944    1.708     57936
sort file                  7.593   49840    5.223     40604

Table 2. Space and time results with an initial heap size of 10Mb

In fact, the results suggest little more than that it is difficult to predict the effect of changing garbage collector parameters. Increasing the heap size has little positive effect on either execution time or heap usage, other than for sumlist, where there is a significant benefit for the generic version. In general, with a larger heap, while the programs collect less frequently, each collection takes longer. Further (informal) experiments with a hand-written allocator for IDRIS suggest that we could significantly improve execution times by implementing a special-purpose collector, with specific knowledge of the IDRIS memory structure.

6.2 A Larger Example — Network Protocols

Given our encouraging results, we have begun research into implementing verified network protocols, using the EDSL approach and partial evaluation. We have implemented a simple transport protocol [4] as a DSL embedded in IDRIS. Space does not permit a full explanation, but the DSL encodes valid transitions of a state machine, where transitions represent actions such as sending a packet to a remote machine and receiving an acknowledgment, and its representation is parameterised over start and end states. Programs are therefore guaranteed to use valid transitions, and terminate in an expected state. The state machine handles error conditions such as timeout and dropped packets.
                           Idris (gen.)      Idris (spec.)
Program                    Time     Space    Time     Space
protocol (20K packets)     0.990    24285    0.751    24011

Table 3. Space and time results for Network Protocol

Table 3 gives CPU time and space usage for a test run sending 20,000 packets to a remote machine. Once again, specialisation improves the run time. Profiling suggests that much of the time spent in this example, at this stage of development, is involved in validating packet data. Nevertheless, a significant interpreter overhead is eliminated.
7. Related Work

Domain-specific languages are a recognised technique for improving programmer productivity by providing appropriate notation and abstractions for a particular problem domain [42]. Recently, the Embedded Domain Specific Language (EDSL) approach, in which a DSL is implemented by embedding in a host language, has been gaining in popularity, with Haskell a popular choice as the host language [3, 14, 24]. A common approach to executing these languages is to use a code generation library such as LLVM [23, 41]. We prefer to use partial evaluation, for two reasons: firstly, if we aim for verification, a general purpose partial evaluator need only be verified once, rather than verifying a specialised code generator for every compiled program; and secondly, we can produce efficient language implementations more rapidly.

Partial evaluation [20] has been studied for many years, along with related methods such as multi-stage programming as implemented in e.g. MetaOCaml [38], Template Haskell [35], or Concoqtion [15]. The idea that an interpreter can be specialised to generate an efficient executable has been known since at least 1971 [16]. It may therefore seem surprising that the technique is not more widely used⁵. However, there are several issues that must be addressed when using partial evaluation in a general setting, e.g. tag elimination [39], termination analysis [1], and code duplication [36], and it is also difficult to use with imperative programming and side effects [12]. In contrast, our approach is specifically targeted towards the efficient execution of EDSLs in a dependently-typed, purely functional host language, where partial evaluation is effectively β-normalisation. We therefore avoid many of the complexities involved with general solutions.

Our approach makes extensive use of tagless interpreters. An alternative approach is to use combinators, rather than data constructors, to build an object language, which can then be partially evaluated [10]. Combined with representation of stronger invariants [22], it is possible that efficient, correct EDSLs could be implemented in a more mainstream functional language. However, the stronger the invariants required, the more likely it is that a stronger type system with full dependent types will be needed.

Finally, supercompilation [27, 28] is closely related to partial evaluation. This technique aims to reduce abstraction overhead through compile-time evaluation. We believe that it may be particularly effective when combined with tagless interpreters, and we hope to explore it in future work.

⁵ One example where partial evaluation has been applied to a realistic problem is Pantheon [34], an implementation of Pan [14] using Template Haskell.

8. Conclusions and Further Work

The underlying motivation for our research is to be able to reason about extra-functional properties of programs, while ensuring good efficiency. We have found that if EDSLs are based on a dependently-typed host language, then they present a promising basis for such reasoning. Moreover, we have established the feasibility of producing "efficient enough" implementations this way. Our methods apply not only to EDSLs embedded in IDRIS, but would also apply to those embedded in any language with a suitably rich type system, if extended with the [static] annotation and caching partial evaluator described in Section 4.1. This includes languages such as Agda, Coq and Concoqtion, or even GHC with recent extensions such as GADTs and open type families [33]. Given the current trend for EDSL development in Haskell, our results suggest that extending GHC's compile-time evaluation machinery to support full partial evaluation would be beneficial.

In particular, through developing several EDSL programs using a variety of language features, we have found that dependently-typed languages such as IDRIS represent a "sweet spot" where partial evaluation is particularly effective. With minimal modification to the evaluator, and minimal annotations to direct the process of evaluation, the efficiency of our programs compares favourably with similar hand-written programs in Java (in fact, they are generally faster, with better memory usage), and is not significantly worse than C (generally around 2–3 times slower, and about 3 times the memory usage). Since we have not yet applied standard optimisation techniques, this is highly promising.

The main reason partial evaluation is so effective in our setting is that we can state precisely what the type of a residual program should be, and can allow that type to vary according to the input program. This removes any need for tagging of intermediate values. I/O and side effects also pose no problem, because we distinguish computation and execution. Code duplication, which might arise as a result of carelessly expanding definitions, is easily avoided by caching the results of function applications.

There are some obvious limitations to our approach which we hope to address in future work. In particular, an EDSL with higher-order functions would still retain an interpreter in the residual program, because higher-order functions accept functions as dynamic arguments, violating our Rule 4. Possible solutions include the use of multi-stage languages, defunctionalisation, or run-time type-safe transformation rules. A more serious limitation is that generated programs can only be as efficient as the underlying host language constructs. For purely functional EDSLs this is not a problem, as there are equivalent constructs in IDRIS, but it would be a problem for a language with, e.g., mutable local variables. For the resource-safe EDSLs that currently interest us, such as those for safe network protocols, we do not anticipate that this will be an issue, but if it is, then it may be possible to remove the overhead, for example by compiling environments specially.

To conclude, partial evaluation in a dependently-typed language enables inefficient EDSL interpretation engines to be scrapped, so achieving both efficiency and verifiability. We hope that our techniques, and the new opportunities afforded by dependently-typed languages, will lead to a renewed interest in partial evaluation.

Acknowledgments

This work was partly funded by the Scottish Informatics and Computer Science Alliance (SICSA), by EU Framework 7 Project No. 248828 (ADVANCE) and by EPSRC grant EP/F030657 (Islay). We thank our colleagues, notably William Cook, James McKinna and Anil Madhavapeddy, for several helpful discussions, and the anonymous reviewers for their constructive suggestions on this paper.
References

[1] P. H. Andersen and C. K. Holst. Termination analysis for offline partial evaluation of a higher order functional language. In Proc. SAS '96: Intl. Symp. on Static Analysis, pages 67–82. Springer, 1996.

[2] L. Augustsson and M. Carlsson. An exercise in dependent types: A well-typed interpreter. In Workshop on Dependent Types in Programming, Gothenburg, 1999. Available from http://www.cs.chalmers.se/~augustss/cayenne/interp.ps.

[3] L. Augustsson, H. Mansell, and G. Sittampalam. Paradise: a two-stage DSL embedded in Haskell. In Proc. ICFP 2008: International Conf. on Functional Programming, pages 225–228. ACM, 2008.

[4] S. Bhatti, E. Brady, K. Hammond, and J. McKinna. Domain specific languages (DSLs) for network protocols. In International Workshop on Next Generation Network Architecture (NGNA 2009), 2009.

[5] H.-J. Boehm, A. J. Demers, Xerox Corporation Silicon Graphics, and Hewlett-Packard Company. A garbage collector for C and C++. http://www.hpl.hp.com/personal/Hans_Boehm/gc/, 2001.

[6] E. Brady. Practical Implementation of a Dependently Typed Functional Programming Language. PhD thesis, University of Durham, 2005.

[7] E. Brady. Ivor, a proof engine. In Implementation and Application of Functional Languages 2006, volume 4449 of LNCS, pages 145–162. Springer, 2007.

[8] E. Brady and K. Hammond. A verified staged interpreter is a verified compiler. In Proc. GPCE '06: Conf. on Generative Programming and Component Engineering, 2006.

[9] E. Brady, C. McBride, and J. McKinna. Inductive families need not store their indices. In S. Berardi, M. Coppo, and F. Damiani, editors, Types for Proofs and Programs 2003, volume 3085, pages 115–129. Springer, 2004.

[10] J. Carette, O. Kiselyov, and C.-c. Shan. Finally tagless, partially evaluated: Tagless staged interpreters for simpler typed languages. J. Funct. Program., 19(5):509–543, 2009.

[11] T. Coquand. An algorithm for type-checking dependent types. Science of Computer Programming, 26(1-3):167–177, 1996.

[12] S. Debois. Imperative program optimization by partial evaluation. In Proc. PEPM '04: ACM Symp. on Partial Evaluation and Semantics-Based Program Manipulation, pages 113–122. ACM, 2004.

[13] B. Delaware and W. R. Cook. Generic operations and partial evaluation using models, 2009. Draft.

[14] C. Elliott, S. Finne, and O. De Moor. Compiling embedded languages. J. Funct. Program., 13(3):455–481, 2003.

[15] S. Fogarty, E. Pasalic, J. Siek, and W. Taha. Concoqtion: indexed types now! In Proc. PEPM '07: ACM Symp. on Partial Evaluation and Semantics-Based Program Manipulation, pages 112–121, 2007.

[16] Y. Futamura. Partial evaluation of computation process – an approach to a compiler-compiler. Systems, Computers, Controls, 2(5):45–50, 1971.

[17] A. Gill. Cheap deforestation for non-strict functional languages. PhD thesis, University of Glasgow, January 1996.

[18] P. Hancock and A. Setzer. Interactive programs in dependent type theory. In P. Clote and H. Schwichtenberg, editors, Proc. CSL 2000: 14th Ann. Conf. of EACSL, Fischbau, Germany, 21–26 Aug 2000, LNCS 1862, pages 317–331. 2000.

[19] P. Hudak. Building domain-specific embedded languages. ACM Computing Surveys, 28A(4), December 1996.

[20] N. Jones, C. Gomard, and P. Sestoft. Partial Evaluation and Automatic Program Generation. Prentice Hall International, 1993.

[21] N. D. Jones. Challenging problems in partial evaluation and mixed computation. New Gen. Comput., 6(2-3):291–302, 1988.

[22] O. Kiselyov and C.-c. Shan. Lightweight monadic regions. In Proc. Haskell '08: ACM SIGPLAN Symp. on Haskell, pages 1–12, 2008.

[23] C. Lattner. LLVM: An infrastructure for multi-stage optimization. Master's thesis, Computer Science Dept., University of Illinois at Urbana-Champaign, December 2002.

[24] S. Lee, M. M. T. Chakravarty, V. Grover, and G. Keller. GPU kernels as data-parallel array computations in Haskell. In Workshop on Exploiting Parallelism using GPUs and other Hardware-Assisted Methods (EPAHM 2009), 2009.

[25] A. Löh, C. McBride, and W. Swierstra. A tutorial implementation of a dependently typed lambda calculus, 2010. To appear in Fundam. Inf.

[26] C. McBride and J. McKinna. The view from the left. Journal of Functional Programming, 14(1):69–111, 2004.

[27] N. Mitchell. Transformation and Analysis of Functional Programs. PhD thesis, University of York, June 2008.

[28] N. Mitchell and C. Runciman. A supercompiler for core Haskell. In Implementation and Application of Functional Languages 2007, volume 5083 of LNCS, pages 147–164. Springer, May 2008.

[29] U. Norell. Towards a practical programming language based on dependent type theory. PhD thesis, Department of Computer Science and Engineering, Chalmers University of Technology, SE-412 96 Göteborg, Sweden, September 2007.

[30] W. Partain. The nofib benchmark suite of Haskell programs. In J. Launchbury and P. Sansom, editors, Functional Programming, Workshops in Computing, pages 195–202. Springer, 1992.

[31] E. Pašalić, W. Taha, and T. Sheard. Tagless staged interpreters for typed languages. In Proc. 2002 International Conf. on Functional Programming (ICFP 2002). ACM, 2002.

[32] S. Peyton Jones and S. Marlow. Secrets of the Glasgow Haskell Compiler inliner. Journal of Functional Programming, 12(4):393–434, September 2002.

[33] T. Schrijvers, S. Peyton Jones, M. Chakravarty, and M. Sulzmann. Type checking with open type functions. In International Conf. on Functional Programming (ICFP 2008), pages 51–62, New York, NY, USA, 2008. ACM.

[34] S. Seefried, M. Chakravarty, and G. Keller. Optimising embedded DSLs using Template Haskell. In Proc. GPCE '04: Conf. Generative Prog. and Component Eng., LNCS. Springer, 2004.

[35] T. Sheard and S. Peyton Jones. Template metaprogramming for Haskell. In ACM Haskell Workshop, pages 1–16, Oct. 2002.

[36] K. Swadi, W. Taha, O. Kiselyov, and E. Pasalic. A monadic approach for avoiding code duplication when staging memoized functions. In Proc. PEPM '06: ACM Symp. on Partial Evaluation and Semantics-based Program Manipulation, pages 160–169, 2006.

[37] W. Taha. Multi-stage Programming: Its Theory and Applications. PhD thesis, Oregon Graduate Inst. of Science and Technology, 1999.

[38] W. Taha. A Gentle Introduction to Multi-stage Programming, 2003. Available from http://www.cs.rice.edu/~taha.

[39] W. Taha and H. Makholm. Tag elimination – or – type specialisation is a type indexed effect. In Subtyping and Dependent Types in Programming, APPSEM Workshop, 2000.

[40] W. Taha, H. Makholm, and J. Hughes. Tag elimination and Jones-optimality. In PADO '01: Proceedings of the Second Symposium on Programs as Data Objects, pages 257–275, London, UK, 2001. Springer-Verlag.

[41] D. Terei. Low level virtual machine for Glasgow Haskell Compiler. Bachelor's Thesis, Computer Science and Engineering Dept., The University of New South Wales, Sydney, Australia, 2009.

[42] A. van Deursen, P. Klint, and J. Visser. Domain-specific languages — an annotated bibliography. http://homepages.cwi.nl/~arie/papers/dslbib/, 2000.

[43] P. Wadler. Deforestation: Transforming programs to eliminate trees. Theoretical Computer Science, 73:231–248, 1990.
Rethinking Supercompilation Neil Mitchell [email protected]
Abstract
1.1 Contributions
Supercompilation is a program optimisation technique that is particularly effective at eliminating unnecessary overheads. We have designed a new supercompiler, making many novel choices, including different termination criteria and handling of let bindings. The result is a supercompiler that focuses on simplicity, compiles programs quickly and optimises programs well. We have benchmarked our supercompiler, with some programs running more than twice as fast than when compiled with GHC.
Our primary contribution is the design of a new supercompiler (§2). Our supercompiler has many differences from previous supercompilers (§5.1), including a new core language, a substantially different treatment of let expressions and entirely new termination criteria. The result is a supercompiler with a number of desirable properties:
Categories and Subject Descriptors ming Languages General Terms Keywords
1.
Simple Our supercompiler is designed to be simple. From the descriptions given in this paper a reader should be able to write their own supercompiler. We have written a supercompiler following our design which is available online1 . Much of the code (for example Figure 2) has been copied verbatim into our implementation. The supercompiler can be implemented in under 300 lines of Haskell.
D.3 [Software]: Program-
Languages
Haskell, optimisation, supercompilation
Fast compilation Previous supercompilers have reported compilation times of up to five minutes for small examples (Mitchell and Runciman 2008). Our compilation times are under four seconds, and there are many further compile time improvements that could be made (§4.2).
Introduction
Consider a program that counts the number of words read from the standard input – in Haskell (Peyton Jones 2003) this can be compactly written as:
Fast runtime We have benchmarked our supercompiler on a range of small examples (§4). Some programs optimised with our supercompiler, and then compiled with GHC, are more than twice as fast than when compiled with GHC alone.
main = print ◦ length ◦ words =< < getContents Reading the program right to left, we first read the standard input as a string (getContents), then split it in to words (words), count the number of words (length), and print the result (print). An equivalent C program is unlikely to use such a high degree of abstraction, and is more likely to get characters and operate on them in a loop while updating some state. Sadly, such a C program is three times faster, even using the advanced optimising compiler GHC (The GHC Team 2009). The abstractions that make the program concise have a significant runtime cost. In a previous paper (Mitchell and Runciman 2008) we showed how supercompilation can remove these abstractions, to the stage where the Haskell version is faster than the C version (by about 6%). In the Haskell program after optimisation, all the intermediate lists have been removed, and the length ◦ words part of the pipeline is translated into a state machine. One informal description of supercompilation is that you simply “run the program at compile time”. This description leads to two questions – what happens if you are blocked on information only available at runtime, and how do you ensure termination? Answering these questions provides the design for a supercompiler.
1.1 Contributions

Our primary contribution is the design of a new supercompiler (§2). Our supercompiler has many differences from previous supercompilers (§5.1), including a new core language, a substantially different treatment of let expressions and entirely new termination criteria. The result is a supercompiler with a number of desirable properties:

Simple Our supercompiler is designed to be simple. From the descriptions given in this paper a reader should be able to write their own supercompiler. We have written a supercompiler following our design which is available online (http://hackage.haskell.org/package/supero). Much of the code (for example Figure 2) has been copied verbatim into our implementation. The supercompiler can be implemented in under 300 lines of Haskell.

Fast compilation Previous supercompilers have reported compilation times of up to five minutes for small examples (Mitchell and Runciman 2008). Our compilation times are under four seconds, and there are many further compile time improvements that could be made (§4.2).

Fast runtime We have benchmarked our supercompiler on a range of small examples (§4). Some programs optimised with our supercompiler, and then compiled with GHC, are more than twice as fast as when compiled with GHC alone.

We give examples of how our supercompiler performs (§2.3.2), including how it subsumes list fusion and specialisation (§3), and what happens when the termination criteria are needed (§2.6.4).
2. Method

This section describes our supercompiler. We first present a Core language (§2.1), along with simplification rules (§2.2). We then present the overall algorithm (§2.3), which combines the answers to the following questions:

• How do you evaluate an open term? (§2.4)
• What happens if you can’t evaluate an open term further? (§2.5)
• How do you know when to stop? (§2.6)
• What happens if you have to stop? (§2.6.3)

Throughout this section we use the following example:

    root g f x = map g (map f x)

    map f [ ] = [ ]
    map f (x : xs) = f x : map f xs
Our supercompiler always optimises the function named root. The root function applies map twice – the expression map f x produces a list that is immediately consumed by map g. A good supercompiler should remove the intermediate list.

2.1 Core Language

Our Core language for expressions is given in Figure 1, and has much in common with Administrative Normal Form (Flanagan et al. 1993).

    type Var = String                     -- variable/function names
    type Con = String                     -- constructor names

    data Exp = App Var [Var ]             -- function application
             | Con Con [Var ]             -- constructor application
             | Let [(Var, Exp)] Var       -- let expression
             | Case Var [(Pat, Exp)]      -- case expression
             | Lam Var Exp                -- lambda expression
    type Pat = Exp                        -- restricted to Con

    Figure 1. Core Language

We make the following observations:

• We require variables in many places that would normally permit expressions, including let bodies and application. A standard Core language (such as from Tolmach (2001)) can be translated to ours by inserting let expressions.

• Our let expression is non-recursive – bindings within a let expression are bound in order. For example, we allow let x = y; y = C in C but not let x = y; y = x in C.

• We don’t have default patterns in case expressions. These can be added without great complexity, but are of little interest when describing a supercompiler.

• We assume programs in our Core language are well-typed using Hindley-Milner, in particular that we never over-apply a constructor or perform case analysis on a function. While most Haskell programs can be translated to our Core language, the typing restriction means some features are not supported (GADTs, existential types).

• Function application takes a list of arguments, rather than just a single argument – the reasons are explained in §2.8.1. We use an application with no arguments to represent just a variable.

• Variables may be either local (bound in an expression), or global (bound in a top-level environment). We require that all global variables occur as the first argument of App.

• When comparing expressions we always normalise local variable names and the order of let bindings.

We define the arity of a variable to be the number of arguments that need to be applied before reduction takes place. For our purposes, it is important that for a variable with arity n, if fewer than n arguments are applied, no evaluation occurs. We approximate the arity of bound variables using the number of lambda arguments at the root of their expression, for primitives we use a known arity (e.g. integer addition has arity 2), and for all other variables we use arity 0. In our example map has arity 2, root has arity 3, and f, g and x have arity 0.
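As an illustration, this approximation can be sketched in a few lines of plain Haskell (assuming the types of Figure 1 are in scope; primArity is an assumed table of primitive arities, not part of the paper’s code):

    import qualified Data.Map as Map

    -- Number of lambdas at the root of an expression.
    lambdas :: Exp -> Int
    lambdas (Lam _ e) = 1 + lambdas e
    lambdas _         = 0

    -- Approximate arity: root lambdas for defined variables, a known
    -- table for primitives, and 0 for everything else.
    arity :: Map.Map Var Exp -> Map.Map Var Int -> Var -> Int
    arity defs primArity v =
      case Map.lookup v defs of
        Just e  -> lambdas e
        Nothing -> Map.findWithDefault 0 v primArity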
We write expressions using standard Haskell syntax (e.g. let for Let, case for Case etc.). Rewriting the map/map example in our Core language gives:

    root = λg f x → let v1 = map f x
                        v2 = map g v1
                    in v2

    map = λf x → case x of
                   [ ] → let v1 = [ ]
                         in v1
                   y : ys → let v1 = f y
                                v2 = map f ys
                                v3 = (:) v1 v2
                            in v3

Our Core language can be rather verbose, so we sometimes use a superset of our Core language, assuming the expressions are translated to our Core language when necessary. For example, we might write map as:

    map = λf x → case x of
                   [ ] → [ ]
                   y : ys → f y : map f ys

2.2 Simplified Core

We now define a simplified form of our Core language. When working with Core expressions we assume they are always simplified, and after constructing new expressions we always simplify them. We require that all expressions bound in the top-level environment consist of a (possibly empty) sequence of lambda expressions, followed by a let expression. We call the first let expression within a top-level definition the root let. Within a let we remove any bindings that are not used and ensure all bound variables are unique. We also require that all expressions bound at a let must not have the following forms:

• App v [ ], where v is bound at this let – we can remove the binding by inlining it.

• App v vs, where v is bound to a Con – the App can be replaced with a Con of higher arity.

• App v vs, where v is bound to App w ws and the arity of w is higher than the length of ws – the App can be replaced with an App with more arguments.

• App v vs, where v is bound to a Lam – the App can be replaced with the body of the lambda, with the first variable substituted.

• Case v w, where v is bound to a Con – the Case can be replaced with the appropriate alternative.

• Let bs v – the bindings can be lifted into the enclosing let, renaming variables if necessary.

As an example, we can simplify the following expression:

    let v1 = f
        v2 = Con x
        v3 = v2 y
        v4 = let w1 = y in v1 w1
        v5 = case v3 of Con a b → v4 a
    in v5

To give:

    let v4 = f y
        v5 = v4 x
    in v5

If the arity of f was known to be 2, this would further simplify to:

    let v5 = f y x
    in v5

2.2.1 Simplifier Non-Termination

Sadly, not all expressions have a simplified form. Take the following example:
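As a small illustration of the flavour of these rules, the unused-binding requirement alone might be implemented as follows – a sketch of a single rule over the types of Figure 1; the real simplifier applies the whole rule set to a fixed point, and relies on all bound variables being unique:

    -- Drop let bindings referenced neither by the body nor by any
    -- surviving binding, repeating until nothing more can be removed.
    dropUnused :: Exp -> Exp
    dropUnused (Let bs v) = Let (prune bs) v
      where
        prune bs'
          | length bs'' == length bs' = bs'
          | otherwise                 = prune bs''
          where
            keep = v : concatMap (vars . snd) bs'
            bs'' = [b | b@(x, _) <- bs', x `elem` keep]
    dropUnused e = e

    -- Variables an expression mentions; an over-approximation (bound
    -- variables are not subtracted), which can only keep more bindings.
    vars :: Exp -> [Var]
    vars (App f xs)    = f : xs
    vars (Con _ xs)    = xs
    vars (Lam _ e)     = vars e
    vars (Case x alts) = x : concatMap (vars . snd) alts
    vars (Let bs x)    = x : concatMap (vars . snd) bs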
    data U = MkU (U → Bool)

    e = let f = λx → case x of
                       MkU y → y x
        in f (MkU f)

This program encodes recursion via a data type, and any attempt to apply all the simplification rules will not terminate. We are aware of two solutions: 1) We could avoid performing case elimination on contravariant data types (Peyton Jones and Marlow 2002); 2) We could avoid simplifying certain expressions, using the size measure from the HOSC supercompiler (Klyuchnikov 2010). We have chosen to leave this problem unsolved. The problem only occurs for contrived programs which encode recursion via a data type, and it is a problem shared with GHC, which also fails to terminate when compiling this example. The later stages of our supercompiler do not rely on the simplifications having been performed, so either solution could be applied in future.

2.3 Manager

Our supercompiler is based around a manager that integrates the answers to the questions of supercompilation. The manager itself has two main purposes: to create recursive functions, and to ensure termination (assuming the simplifier terminates). In our experience the creation of recursive functions is often the most delicate part of a supercompiler, so we deliberately include all the details. The code for our manager is given in Figure 2, making use of some auxiliary functions whose types are given in Figure 3. We first give an intuition for how the manager works, then describe each part.

    manager :: Env → [(Var, Exp)]
    manager env = assign (flatten (optimise env e))
        where Just e = env "root"

    optimise :: Env → Exp → Tree
    optimise env = f [ ]
        where f h e | terminate (⊴) h e = g h e (stop h e)
                    | otherwise = g (e : h) e (reduce env e)
              g h e (gen, cs) = Tree e gen (map (f h) cs)

    reduce :: Env → Exp → ([Var ] → Exp, [Exp])
    reduce env = f [ ]
        where f h e | terminate (◁) h e = stop h e
              f h e = case step env e of
                        Just e′ → f (e : h) e′
                        Nothing → split e

    flatten :: Tree → [Tree]
    flatten = nubBy (λt1 t2 → pre t1 ≡ pre t2 ) ◦ f [ ]
        where f seen t = if pre t ∈ seen then [ ]
                         else t : concatMap (f (pre t : seen)) (children t)

    assign :: [Tree] → [(Var, Exp)]
    assign ts = [(f t, gen t (map f (children t))) | t ← ts]
        where f t = fromJust (lookup (pre t) names)
              names = zip (map pre ts) functionNames

    Figure 2. The manager function.

    type Env = Var → Maybe Exp
    data Tree = Tree {pre :: Exp, gen :: [Var ] → Exp, children :: [Tree]}
    type History = [Exp]

    step :: Env → Exp → Maybe Exp                            -- §2.4
    split :: Exp → ([Var ] → Exp, [Exp])                     -- §2.5
    (◁), (⊴) :: Exp → Exp → Bool                             -- §2.6
    terminate :: (Exp → Exp → Bool) → History → Exp → Bool   -- §2.6
    stop :: History → Exp → ([Var ] → Exp, [Exp])            -- §2.6.3

    Figure 3. Auxiliary definitions for Figure 2.

Our supercompiler takes a source program, and generates a target program. Functions in these programs are distinct – target expressions cannot refer to source functions. The source and target program are equivalent, but hopefully the target program runs faster. We use the type Env to represent a mapping from source function names to expressions, allowing a result of Nothing to indicate a primitive function.

The manager first builds a tree (the type Tree), where each node has a source expression (pre) and an equivalent target expression. The target expression may call target functions, but these functions do not yet have names. Therefore, we store target expressions as a generator that when given the function names produces the target expression (gen), and a list of trees for the functions it calls (children). We then flatten this tree, ensuring identical functions are only included once, and assign names to each node before generating the target program. If a target function is recursive then the initial tree will be infinite, but the flattened tree will always be finite due to the termination scheme defined in §2.6.

manager: This function puts all the parts together. Reading from right to left, we first generate a potentially infinite tree by optimising the expression bound to the function root, we then flatten the tree to a finite number of functions, and finally assign names to each of the result functions.

optimise: This function constructs the tree of result functions. While the tree may be infinite, we demand that any infinite path from the root must encounter the same pre value more than once. We require that for any infinite sequence of distinct expressions h, there must exist an i such that terminate (⊴) (take i h) (h !! i) returns True (where (!!) is the Haskell list indexing operator). If we are forced to terminate we call stop, which splits the expression into several subexpressions. We require that stop h only produces subexpressions which pass the termination test, so that when f is applied to the subexpressions they all call reduce. If the termination criteria do not force us to stop, then we call reduce to evaluate the expression.

reduce: This function optimises an expression by repeatedly evaluating it with calls to step. If we can’t evaluate any further we call split. We use a local termination test to ensure the evaluation terminates. We require that for any infinite sequence of expressions h, there must exist an i such that terminate (◁) (take i h) (h !! i) returns True. Note that this termination criterion is more restrictive than that used by optimise.

flatten: This function takes a tree and extracts a finite number of functions from it, assuming the termination restrictions given in optimise. Our flatten function will only keep one tree associated with each source expression. These trees may have different target expressions if one resulted from a call to stop, while another resulted from a call to reduce – but all are semantically equivalent.

assign: This function assigns names to each target function, and constructs the target expressions by calling gen. We assume the function functionNames returns an infinite list of function names.
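For orientation, a hypothetical driver around manager might look as follows – parse and pretty are assumed placeholders for a front end and a printer, and are not part of the paper’s code:

    import Control.Monad (join)

    -- Build an Env from an association list; Nothing marks a primitive.
    envFromList :: [(Var, Maybe Exp)] -> Env
    envFromList defs v = join (lookup v defs)

    main :: IO ()
    main = do
      src <- readFile "Input.core"              -- assumed input file
      let env = envFromList (parse src)         -- parse: assumed front end
      mapM_ (putStrLn . pretty) (manager env)   -- pretty: assumed printer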
2.3.1 Notation

Values of type ([Var ] → Exp, [Exp]) can be described by taking an expression and removing some subexpressions (indicated by [[•]]). The first component is a function to insert variables where subexpressions were removed, and the second component is a list of the removed subexpressions. Before removing each subexpression, we insert an applied lambda for all variables bound in the expression but free in the subexpression. As an example:

    λg f → map g ([[map f xs]])

We first insert a lambda for the variable f:

    λg f → map g (([[λf → map f xs]]) f)

We then remove the subexpression. The first component of the result is a function that when given ["name"] returns the expression:

    λg f → map g (name f)

And the second component is the singleton list containing the expression:

    λf → map f xs

2.3.2 Example

Revisiting our initial example, manager first calls optimise with:

    λg f x → map g (map f x)

The termination history is empty, so we call reduce, which calls step repeatedly until we reach the expression:

    λg f x → let v = case w of
                       [ ] → [ ]
                       y : ys → g y : map g ys
                 w = case x of
                       [ ] → [ ]
                       z : zs → f z : map f zs
             in v

The step function now returns Nothing, since we cannot evaluate further without the result of x. We therefore call split, which results in (using the notation from §2.3.1):

    λg f x → case x of
               [ ] → [[let v = . . .; w = . . .; x = [ ] in v]]
               z : zs → [[let v = . . .; w = . . .; x = z : zs in v]]

Looking at the first child expression, where x = [ ], the simplification rules from §2.2 immediately produce [ ] as the result. The second child starts as:

    λg f z zs → let x = z : zs
                    w = case x of [ ] → [ ]; z : zs → f z : map f zs
                    v = case w of [ ] → [ ]; y : ys → g y : map g ys
                in v

Which simplifies to:

    λg f z zs → let y = f z
                    ys = map f zs
                    q = g y
                    qs = map g ys
                    v = q : qs
                in v

Calling step produces Nothing, as the root of this expression is a constructor (:) which can’t be evaluated. We therefore call split, which results in:

    λg f z zs → let q = [[g (f z)]]
                    qs = [[map g (map f zs)]]
                    v = q : qs
                in v

When optimising g (f z) we get no optimisation, as there is no available information. To optimise map g (map f zs) we repeat the exact same steps we have already done. However, the flatten function will spot that both Tree nodes have the same pre expression (modulo variable renaming), and reduce them to one node, creating a recursive function. We then assign names using assign. For the purposes of display (not optimisation), we apply a number of simplifications given in §2.7. The end result is:

    root g f x = case x of
                   [ ] → [ ]
                   z : zs → g (f z) : root g f zs

The final version has removed the intermediate list, with no additional knowledge about the map function or its fusion rules.

2.4 Evaluation

Evaluation is based around the step function. Given an expression, step either replaces a variable with its associated value from the environment and returns Just, or if no suitable variable is found returns Nothing. We always replace the variable that would be evaluated next during normal evaluation. To determine which variable would be evaluated next, we define the functions force and next in Figure 4.

    force :: Exp → Maybe Var
    force (Case v _) = Just v
    force (App v _) = Just v
    force _ = Nothing

    next :: Exp → Maybe Var
    next (Lam _ x) = next x
    next (Let bind v) = last (Nothing : f v)
        where f v = case lookup v bind of
                      Nothing → [ ]
                      Just e → Just v : maybe [ ] f (force e)

    Figure 4. Function to determine the next evaluated binding.

The function force determines which variable will be evaluated next given an expression – either a case scrutinee or an applied variable. The function next determines which variable bound at the root let will be evaluated next, by following the forced variables of the let bindings. Looking at the original example:

    λg f x → let v1 = map f x
                 v2 = map g v1
             in v2

The function next returns Just v2 . Calling force on the expression map g v1 returns map, but since map is not bound at the root let we go no further. Therefore, to evaluate this expression we will start by evaluating v2 , and thus map. To perform an evaluation step we insert a fresh variable w1 bound to the body of map, and replace the map variable in v2 with w1 . This transformation results in:

    λg f x → let v1 = map f x
                 w1 = λf x → case x of
                               [ ] → [ ]
                               y : ys → f y : map f ys
                 v2 = w1 g v1
             in v2
Simplification immediately removes the lambda at w1 , replacing v2 with a case expression on v1 . More generally, we match any expression with the following pattern:

    λfree → let s = f w1 . . wn
                v1 = e1
                . .
                vn = en
            in v
    where Just e′ = env f

We use s to represent the next binding to be evaluated, as returned by next. We allow any other variables v1 . . vn to be present, bound to expressions e1 . . en . Given this configuration we can rewrite to:

    λfree → let s′ = e′
                s = s′ w1 . . wn
                v1 = e1
                . .
                vn = en
            in v

As always, after generating a new expression we immediately apply the simplification rules (§2.2).

2.5 Splitting

If evaluation cannot proceed, we split to produce a target expression, and a list of child expressions for further optimisation. When splitting an expression there are three concerns:

Permit further optimisation: When we split, the current expression cannot be evaluated using the rules described in §2.4. We therefore aim to place the construct blocking evaluation in the target expression, allowing the child expressions to be optimised further.

No unbounded loss of sharing: An expensive variable binding cannot be duplicated in a way that causes it to be evaluated multiple times at runtime. The target program cannot remove sharing present in the source program, or it would run slower.

Keep bindings together: If we split variables bound at the same let expression into separate child expressions, we reduce the potential for optimisation. If the expression associated with a variable is not available when evaluating, the evaluation will be forced to stop sooner. We aim to make child expressions as large as possible, but without losing sharing.

We split in one of three different ways, depending on the type of the next expression to be evaluated (as described in §2.4). We now describe each of the three ways to split; in each case we start with an example, then define the general rule. We use the [[•]] notation described in §2.3.1.

2.5.1 Case Expression

If the next expression is a case expression then we make the target a similar case expression, and under each alternative we create a child expression with the case scrutinee bound to the appropriate pattern. For example, given:

    λx → let v = case x of
                   [ ] → [ ]
                   y : ys → add y ys
         in v

We split to produce:

    λx → case x of
           [ ] → [[let x = [ ]
                       v = case x of [ ] → [ ]; y : ys → add y ys
                   in v]]
           y : ys → [[let x = y : ys
                          v = case x of [ ] → [ ]; y : ys → add y ys
                      in v]]

Looking more closely at the second child, we start with the expression:

    λy ys → let x = y : ys
                v = case x of [ ] → [ ]; y : ys → add y ys
            in v

This expression immediately simplifies to:

    λy ys → let v = add y ys
            in v

More generally, if s is the next expression to evaluate:

    λfree → let s = case x of p1 → e′1 ; . . ; pm → e′m
                v1 = e1
                . .
                vn = en
            in v

After split it becomes:

    λfree → case x of
              p1 → [[let x = p1
                         s = e′1
                         v1 = e1
                         . .
                         vn = en
                     in v]]
              . .
              pm → [[let x = pm
                         s = e′m
                         v1 = e1
                         . .
                         vn = en
                     in v]]

2.5.2 Lambda

If the next binding to be evaluated is a lambda, then we place a lambda in the target program. The key point when splitting a lambda is that we do not reduce sharing. Consider the following example:

    λx → let v1 = f x
             v2 = expensive v1
             s = λy → add v2 y
         in s

The add function takes two arguments, but only has one so far. We cannot move the argument y upwards to form λx y → . . ., as this action potentially duplicates the expensive computation of v2 . Instead, we create child expressions for every variable binding, and for the body of the lambda:

    λx → let v1 = [[f x]]
             v2 = [[expensive v1 ]]
             s = λy → [[add v2 y]]
         in s

Unfortunately, we have now split the bindings for v1 and v2 apart, when there is no real need. We therefore move binding v1 under v2 , because it is only referred to by v2 , to give:

    λx → let v2 = [[let v1 = f x in expensive v1 ]]
             s = λy → [[add v2 y]]
         in s

We will now optimise the body of v2 , and the body of the lambda, which will be able to evaluate add. More generally, given:
    λfree → let s = λx → e′
                v1 = e1
                . .
                vn = en
            in v

We rewrite to:

    λfree → let s = λx → [[e′ ]]
                v1 = [[e1 ]]
                . .
                vn = [[en ]]
            in v

We then repeatedly move any binding vi under vj if either: 1) vi is only used within the body of vj ; or 2) the expression bound to vi is cheap. We define an expression to be cheap if it is a constructor, or an application of a variable v to fewer arguments than the arity of v (a partial application). The intention of moving bindings is to increase sharing, which can be done provided we don’t duplicate work (condition 1), or if the work duplicated is bounded (condition 2).
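A minimal sketch of the cheap test, reusing the arity approximation of §2.1:

    -- Constructors and partial applications may be duplicated without
    -- duplicating unbounded work.
    cheap :: (Var -> Int) -> Exp -> Bool
    cheap _     (Con _ _)  = True
    cheap arity (App v vs) = length vs < arity v
    cheap _     _          = False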
2.5.3 Anything Else

The final rule applies if the next expression is not a case expression or a lambda, including a constructor, a variable, and an application of a variable not bound in the environment. We do not deal with variables bound in the environment, as these are handled by step. Given the example:

    λx y → let v1 = expensive x
               v2 = v1 x
               v3 = Con v2 y v2
           in v3

We turn each binding into a child, apart from the next binding to be evaluated:

    λx y → let v1 = [[expensive x]]
               v2 = [[v1 x]]
               v3 = Con v2 y v2
           in v3

We then perform the same sharing transformation as for lambda expressions, noting that v1 is only used within v2 , to give:

    λx y → let v2 = [[let v1 = expensive x in v1 x]]
               v3 = Con v2 y v2
           in v3

More generally, given an expression:

    λfree → let s = e′
                v1 = e1
                . .
                vn = en
            in v

We rewrite to:

    λfree → let s = e′
                v1 = [[e1 ]]
                . .
                vn = [[en ]]
            in v

We then repeatedly move any binding vi under vj according to the criteria given in §2.5.2.

2.6 Termination

The termination rule is responsible for ensuring that whenever we proceed along a list of expressions we eventually stop. The intuition is that each expression has a set of bindings at the root let, and each of these bindings has a name indicating where it came from in the source program. Compared to all earlier expressions in a list, each root let must contain either different names, or fewer names. In this section we first describe the terminate, ◁ and ⊴ functions from a mathematical perspective, then how we apply these functions to expressions. Finally, we show an example of how these rules ensure termination.

2.6.1 Termination Rule

Our termination rules are defined over bags (also known as multisets) of values drawn from a finite alphabet Σ. A bag of values is unordered, but may contain elements more than once. We define our rules as:

    x ◁ y = set(x) ≢ set(y) ∨ #x < #y
    x ⊴ y = x ≡ y ∨ x ◁ y

We use set(x) to transform a bag to a set, and # as the cardinality operator to take the number of elements in a bag. A sequence x1 . . . xn is well-formed under ◁ if for all indices i < j we have xj ◁ xi (and respectively for ⊴). The following sequences are well-formed under both ⊴ and ◁:

    [a, aaaaab, aaabb, b]
    [abc, ab, accc, a]
    [aaaaabbb, aaab, aab]

The following sequences are well-formed under ⊴, but not under ◁:

    [aaa, aaa]
    [aabb, ab, ab]

The following sequences are not well-formed under ⊴ or ◁:

    [abc, abcc]
    [aa, aaa]

We define the terminate function referred to in Figure 3 as:

    terminate :: (Exp → Exp → Bool) → History → Exp → Bool
    terminate (<) h e = not (all (e <) h)

The terminate function returns False if, given a well-formed sequence (h), adding the expression e keeps the sequence well-formed.

Lemma: Any well-formed sequence under ◁ is finite. Given a finite alphabet Σ, any well-formed sequence under ◁ is finite. Consider a well-formed sequence x1 . . .. We can partition this sequence into at most 2^|Σ| subsequences using set equality. Consider any subsequence y1 . . .. For any two elements in this subsequence, set(yi ) ≢ set(yj ) will be false, due to the partitioning. Therefore, for the sequence to be well-formed, i < j must imply #yj < #yi . Therefore there can be at most #y1 + 1 elements in any particular subsequence. Combined with a finite number of subsequences, we conclude that any well-formed sequence is finite.

Lemma: Any well-formed sequence under ⊴ has a finite number of distinct elements. Given a finite alphabet Σ, any well-formed sequence under ⊴ has a finite number of distinct elements. For a sequence to be well-formed under ⊴ but not ◁ it must contain duplicate elements. If we remove all duplicates we obtain a well-formed sequence under ◁, which must be finite. Therefore there must be a finite number of distinct elements.
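These definitions transcribe almost directly into Haskell. The following sketch represents a bag as a map from element to multiplicity, and is stated over an arbitrary element type rather than Exp (§2.6.2 describes how expressions are mapped to bags of names):

    import qualified Data.Map as Map
    import qualified Data.Set as Set

    type Bag a = Map.Map a Int          -- element to multiplicity

    bag :: Ord a => [a] -> Bag a
    bag xs = Map.fromListWith (+) [(x, 1) | x <- xs]

    cardinality :: Bag a -> Int         -- the # operator
    cardinality = sum . Map.elems

    support :: Bag a -> Set.Set a       -- the set(·) operator
    support = Map.keysSet

    -- x ◁ y: different supporting sets, or strictly smaller cardinality.
    precedes :: Ord a => Bag a -> Bag a -> Bool
    precedes x y = support x /= support y || cardinality x < cardinality y

    -- x ⊴ y: equal as bags, or x ◁ y.
    precedesEq :: Ord a => Bag a -> Bag a -> Bool
    precedesEq x y = x == y || precedes x y

    -- terminate from Figure 3, stated over bags.
    terminate :: Ord a => (Bag a -> Bag a -> Bool) -> [Bag a] -> Bag a -> Bool
    terminate rel h e = not (all (e `rel`) h)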
2.6.2 Tracking Names

Every expression in the source program is assigned a name. A name is a triple ⟨f , e, a⟩, where f is a function name, e is an expression index and a is an argument count. We label every expression in the source program with f being the function it comes from and e being a unique index within that function. The argument count a for constructors and applications is the number of arguments, and for all other expressions is 0. When manipulating expressions, we track names:
• When renaming a bound variable, or substituting one variable for another, we do not change any names.

• If we move a subexpression, we keep the name already assigned to that subexpression.

• If we increase the number of arguments to an application or constructor, we increase the argument count of that expression. For example, let v = C x; w = v y in w being transformed to let w = C x y in w would have the new name for w set to the old name of v, but with an argument count of 2 instead of 1.

• When splitting on a case we introduce a new constructor (see §2.5.1); for this constructor we use the name assigned to the pattern from the case alternative.

We map an expression to a bag of names by taking the names of all expressions bound at the root let.

Lemma: For any source program, there are a finite number of names. All subexpressions are assigned expression indices in advance, so there are only a finite number of function name/index values. We only increase the argument count when increasing the number of arguments applied to a constructor or application, which is bounded by the arity of that constructor or the source function. Therefore, there are only a finite number of names.

Lemma: A bag of names represents a finite number of expressions. Given a bag of names, there are only a finite number of expressions that could have generated it. We first assume that when simplifying an expression we always normalise the free variables – naming the let body v1 , and naming all other variables as they are reached from v1 . Each name refers to one particular subexpression, but may have different variable names. A finite number of subexpressions can only be combined to produce a finite number of expressions, if we ignore variable names, which the normalisation handles.

Lemma: The termination properties required by §2.3 are satisfied. The termination properties in §2.3 are satisfied by the lemmas in this section. We have shown that the alphabet of names, Σ, is finite. For terminate (⊴) we have shown that there can only be a finite number of distinct name bags, and that each name bag can only correspond to a finite number of expressions, therefore there are a finite number of distinct expressions. For terminate (◁) we have shown that there can only be a finite number of name bags.
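A name might be represented as below – a sketch only; deriving Ord lets names serve as bag elements in the sketch from §2.6.1:

    -- ⟨f, e, a⟩: originating function, expression index, argument count.
    data Name = Name
      { nameFun  :: String  -- function the expression came from
      , nameExp  :: Int     -- unique index within that function
      , nameArgs :: Int     -- argument count (0 for non-applications)
      } deriving (Eq, Ord, Show)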
2.6.3 Termination Splitting

If we are forced to terminate we call stop, which splits the expression into several subexpressions. We require that stop h e only produces subexpressions which are not forced to terminate by terminate (⊴) h. We trivially satisfy this requirement by using the termination criteria when defining stop. Given an expression:

    λfree → let v1 = e1
                . .
                vn = en
            in v

We first split every variable bound at the let, to give:

    λfree → let v1 = [[e1 ]]
                . .
                vn = [[en ]]
            in v

We now move variable vi under vj using the same conditions as split, described in §2.5.2. In addition, we do not merge vi under vj if the resulting expression bound to vj would violate the termination criteria terminate (⊴) h. Combined with the property that all initial children are singleton name bags, which trivially satisfy ⊴ for any expression, our merge restriction ensures no children violate the termination criteria.

As a heuristic, we attempt to move variable v before w if the name associated with v occurs fewer times in the original expression. In most expressions that are growing, and therefore hit the termination criteria, there will be some name that keeps repeating. By favouring less frequent names we hope to keep together subexpressions that are not growing. This heuristic has no effect on correctness or termination, but sometimes gives better optimisation.

2.6.4 Example

Many simple example programs (such as map/map) never trigger the termination criteria. The standard example of a function that does require termination is reverse, which can be written in a simplified form as:

    root xs = rev [ ] xs

    rev acc xs = case xs of
                   [ ] → acc
                   y : ys → rev (y : acc) ys

The rev function builds up an accumulator argument, whose length will be equal to the length of xs. To specialise on the accumulator argument would require an infinite number of specialisations. To supercompile this program, the optimise function starts with an empty termination history and the expression rev [ ] xs, and calls reduce, resulting in:

    λxs → case xs of
            [ ] → [[[ ]]]
            y : ys → [[rev (y : [ ]) ys]]

Focusing on the second alternative, we now add rev [ ] xs to the termination history, and continue optimising rev (y : [ ]) ys. This leads to the sequence of expressions:

    λx1 → rev [ ] x1
    λx1 x2 → rev (x1 : [ ]) x2
    λx1 x2 x3 → rev (x1 : x2 : [ ]) x3
    ...

We can rewrite these expressions in our core language, with annotations for names:

    λx1 → let v1 = ⟨root, 2, 0⟩ [ ]
              v2 = ⟨root, 1, 0⟩ rev v1 x1
          in v2

    λx1 x2 → let v1 = ⟨root, 2, 0⟩ [ ]
                 v2 = ⟨rev , 2, 0⟩ x1 : v1
                 v3 = ⟨rev , 1, 0⟩ rev v2 x2
             in v3

    λx1 x2 x3 → let v1 = ⟨root, 2, 0⟩ [ ]
                    v2 = ⟨rev , 2, 0⟩ x2 : v1
                    v3 = ⟨rev , 2, 0⟩ x1 : v2
                    v4 = ⟨rev , 1, 0⟩ rev v3 x3
                in v4

Under ⊴ the first two expressions create a well-formed sequence, but the first three expressions do not. The first expression is permitted because the history is empty. The second expression is permitted because it has a different set of names from the first. The third expression has the same set of names as the second, and has a higher cardinality. Therefore, when optimising, we call stop on the third expression. After calling stop we get:

    λx1 x2 x3 → let v2 = [[let v1 = ⟨root, 2, 0⟩ [ ]
                               v2 = ⟨rev , 2, 0⟩ x2 : v1
                           in v2 ]]
                    v4 = [[let v3 = ⟨rev , 2, 0⟩ x1 : v2
                               v4 = ⟨rev , 1, 0⟩ rev v3 x3
                           in v4 ]]
                in v4

Part of the accumulator has been bound to v2 , and separated from the main expression. Continuing to optimise we get the sequence:

    λx1 → rev [ ] x1                        -- reduce
    λx1 x2 → rev (x1 : [ ]) x2              -- reduce
    λx1 x2 x3 → rev (x1 : x2 : [ ]) x3      -- stop
    λx1 x2 x3 → rev (x1 : x2 ) x3           -- reduce
    λx1 x2 x3 x4 → rev (x1 : x2 : x3 ) x4   -- stop
    λx1 x2 x3 → rev (x1 : x2 ) x3           -- reduce
    ...                                      -- repeat the last 2 lines

As required, we have a finite number of distinct expressions, and end up with a recursive function in the target program.

2.7 Post-processing

Our split function is structured to produce only one simple expression per target function – for example a target function will never contain two constructors. While most opportunities to remove intermediate structure have been exploited, the target program will usually contain lots of small functions. We can eliminate many of these functions by inlining all functions which are only called once. For example, given the source program:

    root x = x : x : [ ]

After supercompilation, we get the target program:

    root x = x : f x
    f x = x : nil
    nil = [ ]

We can then inline all functions that are only called once:

    root x = x : x : [ ]

It is important that the only optimisation intended from this post-processing is the reduction of function call overhead. This use of inlining is substantially different from other compilers (Peyton Jones and Marlow 2002), where inlining is used to bring expressions together to trigger other optimisations.
2.8 Alternative Designs

In this section we describe some possible design alternatives for our supercompiler.

2.8.1 Binary Application

The first version of this supercompiler had binary application, rather than vector application. The App Var [Var ] constructor was replaced by a combination of Var Var and App Var Var. The reason for originally choosing binary application is that it is closer to other Core languages, and the simplification does not need to track arity information. There were three main reasons for moving to vector application:

• Vector application simplifies splitting with primitive functions, by providing the arity information directly.

• Vector application makes it easier to identify partial applications when increasing sharing (see §2.5.2).

• Vector application reduces the number of names in an expression, improving the time taken to compile.

2.8.2 Alternative Termination Orderings

Our original termination rule was:

    x ◁ y = x ⊃set y ∨ x ⊂bag y

That is, x must have a strictly larger supporting set than y, or be a strict sub-bag of y. Both this rule and the one described in §2.6.1 can be proved terminating using the same argument. We switched to our new rule because it is simpler, follows more directly from the proof, and can be implemented very efficiently. Choosing a termination rule is a tricky business – no termination rule can be the best for all programs, so there is always scope for experimentation.
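Continuing the bag sketch from §2.6.1, the original rule could be transcribed as follows, using the standard containers operations Set.isProperSubsetOf and Map.isSubmapOfBy:

    -- x ⊃set y ∨ x ⊂bag y, over the Bag representation from §2.6.1.
    precedesOld :: Ord a => Bag a -> Bag a -> Bool
    precedesOld x y =
         support y `Set.isProperSubsetOf` support x   -- x ⊃set y
      || (Map.isSubmapOfBy (<=) x y && x /= y)        -- x ⊂bag y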
2.8.3 Recursive Lets

Our Core language does not include recursive lets. Recursive lets bound to functions can be efficiently removed using lambda-lifting (Johnsson 1985). Recursive lets bound to values can be removed, but doing so may cause the program to run arbitrarily slower (Mitchell 2008). Alternatively, we can take functions with value recursive lets and make them primitives, losing optimisation potential, but preserving complexity. In practice, the most common function with a value recursive let is repeat, and our supercompiler is nearly always able to fuse away the list generated by repeat.

2.8.4 Common Subexpression Elimination

Common Subexpression Elimination (CSE) involves detecting when a program will compute two identical expressions, and reducing them both to a single shared expression. Our Core language is well suited to CSE – two variables can be merged if they have the same bound expression. The advantage of CSE is that performance can be increased, sometimes asymptotically. The disadvantages are that CSE can introduce space leaks (Chitil 1998), and the additional sharing may stop variables from being moved when splitting. We have not investigated the performance impact of CSE on supercompilation, but think it is a worthwhile area for future research.

2.8.5 Inlining Simple Functions

The GHC compiler inlines many non-recursive functions during the simplification phase (Peyton Jones and Marlow 2002). It is certainly possible that our simplification rules could be extended to inline some functions, such as id, provided no new names were introduced (and thus termination was unaffected). Another alternative would be to inline simple functions in the source program before supercompilation started (such as otherwise and (◦)). The primary motivation for inlining simple functions would be to reduce the complexity of the main supercompilation phase, and avoid inopportune termination splits. We have deliberately not performed any inlining other than in the step function, as there is a risk that doing so would hide weaknesses in our supercompiler. However, we think simple inlining would be worth investigating.

3. Comparison to Other Optimisations

Supercompilation naturally subsumes many other optimisations, including constructor specialisation (Peyton Jones 2007) and deforestation (Gill et al. 1993; Wadler 1990). However, there are some optimisations that supercompilation (in the form presented here)
does not address – in particular strictness analysis and unboxing (Peyton Jones and Launchbury 1991), and the generation of native code. In order to benefit from these optimisations we use GHC to compile the resulting Core after supercompilation (The GHC Team 2009).

We now give an example where our supercompiler massively outperforms GHC, and discuss the optimisations being performed. Our example is:

    root n = map square (iterate (+1) 1) !! n
      where square x = x ∗ x

Running this program with n = 400000, GHC takes 0.149 seconds, while our supercompiler combined with GHC takes 0.011 seconds. Running for larger values of n is infeasible as the GHC-only variant overflows the stack. After optimising with our supercompiler, then compiling with GHC, the resulting inner loop is:

    go :: Int# → Int# → Int#
    go x y = case x of
               0 → y ∗ y
               _ → go (x − 1) (y + 1)

All the intermediate lists have been removed, there are no functional values, all the numbers have been unboxed and all arithmetic is performed on unboxed values (GHC uses Int# as the type of unboxed integers). Supercompilation has fused all the intermediate lists and specialised all functional arguments, leaving GHC to perform strictness analysis and unboxing.

The program compiled with GHC alone is much less efficient. GHC uses programmer supplied rewrite rules to eliminate intermediate lists (Peyton Jones et al. 2001), which fuses the map/iterate combination. Unfortunately, GHC does not contain a rule to fuse the input list to the (!!) operator. The GHC rules match specific function names in the source program, meaning that redefining map locally would inhibit the fusion. In contrast, our supercompiler does not rely on rules so is able to fuse the functions regardless of their names, and is able to perform fusion on data types other than lists.

GHC specialises the resulting map/iterate combination with the square function, but fails to specialise with increment – passing (+1) as a higher-order function. GHC can specialise functions to particular data values using constructor specialisation, but does not currently do the same transformation for functional arguments. To allow specialisation, some functions are written in a particular style:

    foldr f z xs = go xs
      where go [ ] = z
            go (y : ys) = f y (go ys)

In this definition, provided lambda-lifting is not performed, the function foldr is considered non-recursive. GHC can inline non-recursive functions, allowing the definition of foldr to be replicated in an expression where f is known, eliminating the functional argument. In contrast, our supercompiler has specialised all the functions to their functional arguments, even when written in a natural style.

GHC fails to eliminate all the lists and higher-order functions, which in turn means the integers are not detected as strict, and are not unboxed. In contrast, our supercompiler has reduced the program sufficiently for everything to be unboxed.

4. Benchmarks

In this section we run our supercompiler over a range of benchmarks drawn from other papers. The results are given in Table 1. The benchmarks we use are:

• sumsquare is the introductory example used in the stream fusion paper (Coutts et al. 2007).

• charcount, linecount and wordcount are taken from our previous supercompiler work (Mitchell and Runciman 2008), and wordcount is used as the example in §1. For the purpose of benchmarking, we have removed the actual IO operations, leaving just the computation.

• append, factorial, treeflip, sumtree and raytracer have been used to benchmark other supercompilers (Jonsson and Nordlander 2009), and originate from papers on deforestation (Kort 1996; Wadler 1990).

• bernouilli, digits-of-e2, exp3_8, primes, rfib, tak and x2n1 are all taken from the Imaginary section of the nofib benchmark suite (Partain et al. 2008).

We have manually translated all the examples from their source language to our Core language. We have taken care to ensure that we have not simplified the programs in translation – in particular we have inserted explicit dictionaries for all examples that require type classes (Wadler and Blott 1989), and have translated list comprehensions to concatMap as described by the Haskell report (Peyton Jones 2003). For comparison purposes we compiled all the benchmarks with GHC 6.12.1 (The GHC Team 2009), using the -O2 optimisation setting. For the supercompiled results we first ran our supercompiler, then compiled the result using GHC. To run the benchmarks we used a 32-bit Windows machine with a 2.5GHz processor and 4Gb of RAM.

    Program        Lines  Compile time  Runtime  Memory  Size
    append             8     0.1 + 0.6     0.85    0.84  1.00
    bernouilli       148     2.4 + 1.3     1.04    0.96  1.02
    charcount         32     0.1 + 0.6     0.14    0.01  0.99
    digits-of-e2     100     2.0 + 0.8     0.40    0.45  0.99
    exp3_8            39     0.5 + 0.8     0.93    1.00  1.08
    factorial         12     0.1 + 0.6     0.98    1.00  1.00
    linecount         43     0.2 + 0.6     0.01    0.01  0.98
    primes            58     0.1 + 0.6     0.58    0.81  0.99
    raytracer         26     0.1 + 0.6     0.56    0.44  1.00
    rfib              16     0.1 + 0.7     0.77    1.01  0.98
    sumsquare         45     1.2 + 0.9     0.38    0.23  1.03
    sumtree           27     0.1 + 0.6     0.14    0.01  1.00
    tak               19     0.1 + 0.6     0.79    1.01  0.98
    treeflip          26     0.1 + 0.6     0.57    0.45  1.01
    wordcount         62     0.3 + 0.7     0.19    0.30  1.00
    x2n1              36     0.1 + 0.8     0.90    0.99  1.00

    Table 1. Benchmark results. Program is the name of the program as given in §4. Lines is the number of lines of code in the original program, including library definitions, but excluding primitives. Compile time is the number of seconds to compile the program (a + b), including both compilation with our supercompiler (a) and the subsequent compilation with GHC (b). The final three columns are relative to ghc -O2 being 1.00, with a lower number being better. Runtime is how long the optimised program takes to run. Memory is the amount of memory allocated on the heap. Size is the size of the optimised program on disk.

4.1 Comparison to GHC

The benchmarks are nearly all faster than GHC, with some programs running substantially faster than GHC alone. The improvement in speed is usually accompanied by either a similar memory usage, or a substantial reduction. The resulting executables are all very close in size to compilation with GHC alone – partly because the run-time system accounts for a substantial proportion of the executable size.
Numerical Computation Some of the benchmarks mainly test numerical performance – for example factorial, x2n1 and tak. In these benchmarks we have been able to inline some of the functions even though they are recursive, which has been equivalent to a small amount of loop unrolling, and has sometimes improved execution speed.

Complete Elimination Some of the benchmarks allow us to completely eliminate most intermediate values – for example charcount, sumtree and raytracer. In these cases the execution time and memory are both substantially reduced. Most of these benchmarks have previously been used to test supercompilers, and our supercompiler performs the same optimisations.

Partial Elimination Some of the benchmarks have a combination of data structures and numerical computation – for example primes, digits-of-e2 and exp3_8. In these benchmarks we perform specialisation, and remove some intermediate values, but due to the nature of the benchmarks not all intermediate values can be eliminated. In digits-of-e2 we are able to fuse long pipelines of list operations. In exp3_8 most of our performance increase comes from eliminating intermediate values of the data type data Nat = Z | S Nat.

Bernouilli The benchmark on which we do worst is bernouilli. The bernouilli program seems reasonably similar in terms of list operations to other benchmarks such as primes, but our supercompiler is unable to outperform GHC – the exact reasons are still unclear. Interestingly, both our previous supercompiler and the stream fusion work also failed to outperform GHC on this benchmark, so the reason may be that GHC does a particularly good job on it.

4.2 Compilation Speed

In the benchmarks presented, our supercompiler always takes under four seconds to compile. We have given the compilation times as two figures – the time taken to run the supercompiler, followed by the time taken to compile the result with GHC. In all cases, the resulting GHC compilation time is dominated by the linker. Compared to our previous supercompiler, where compile times ranged from a few seconds to five minutes, our new supercompiler is substantially faster.

While we have designed our supercompiler with compilation speed in mind, we haven’t focused on optimising the compiler – all functions are implemented as simply as possible. Profiling shows that 80% of the compilation time is spent simplifying expressions, as described in §2.2. Our simplification method is currently written as a transformation that is applied until a fixed point is reached – we believe the simplification can be implemented in one pass, leading to a substantial reduction in compile time.

We have implemented the termination check exactly as described in §2.6.1, traversing and comparing the entire history at each step. For our termination check it is simple to change the history to a mapping from a name bag to an integer (being the highest permitted cardinality) – reducing the algorithmic complexity. We could also optimise the representation of names, using a single integer for both the function name and subexpression index.
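A possible shape for that compressed history, continuing the sketch from §2.6.1 – a single map lookup replaces the scan over the whole history. This covers the ◁ test used by reduce; the ⊴ variant would additionally permit exact repeats:

    -- Supporting set mapped to the smallest cardinality seen so far
    -- (one more than the highest cardinality still permitted).
    type FastHistory a = Map.Map (Set.Set a) Int

    -- Equivalent to: terminate (◁) h e = not (all (e ◁) h).
    terminateFast :: Ord a => FastHistory a -> Bag a -> Bool
    terminateFast m e = case Map.lookup (support e) m of
      Nothing -> False                 -- unseen supporting set: permitted
      Just n  -> cardinality e >= n    -- must shrink strictly each time

    -- Record a bag once it has been admitted to the history.
    extend :: Ord a => Bag a -> FastHistory a -> FastHistory a
    extend e = Map.insertWith min (support e) (cardinality e)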
5. Related Work

We first describe related work in the area of supercompilation, particularly what makes our supercompiler unique. We then describe some work from other areas, particularly work from which we have used ideas.

5.1 Supercompilation

Supercompilation was introduced by Turchin (1986) for the Refal programming language (Turchin 1989). Since this original work, there have been many suggestions of both termination strategies and generalisation/splitting strategies (Leuschel 2002; Sørensen and Glück 1995; Turchin 1988). The original supercompiler maintained both positive and negative knowledge (Secher and Sørensen 2000); our implementation uses only positive knowledge (what a variable is, rather than what it cannot be). More recently, supercompilation has started to converge towards a common design, described in detail by Klyuchnikov (2009), but which has much in common with the underlying design present in other papers (Jonsson and Nordlander 2009; Mitchell and Runciman 2008).

Compared to an increasingly common foundation, our supercompiler is radically different. We have changed many of the ingredients of supercompilation (the treatment of let, termination criteria, how the termination histories are used), but have also changed the way these ingredients are combined (the manager). In particular, many of our choices would not work if applied in isolation to another supercompiler – for example the termination criteria rely on the treatment of let.

5.1.1 Let Expressions

Compared to other supercompilers, our Core language requires many more let expressions. Previous supercompilation work has tended to ignore let expressions – if let is mentioned the usual strategy is to substitute all linear lets and residuate all others. At the same time, movement of lets can have a dramatic impact on performance: carefully designed let-shifting transformations give an average speedup of 15% in GHC (Peyton Jones et al. 1996). Our previous work inlined all let bindings that it could show did not lead to a loss of sharing (Mitchell and Runciman 2008). Unfortunately, where a let could not be removed, there was a substantial performance penalty. By going to the opposite extreme we are forced to deal with let bindings properly, making our new supercompiler both simpler and more robust.

5.1.2 Termination Criteria

The standard termination criterion used by supercompilers is the homeomorphic embedding (Leuschel 2002). The homeomorphic embedding is a well-quasi ordering, from Kruskal’s Tree Theorem (Kruskal 1960). The criterion requires that for every infinite sequence e1 , e2 . . . there exist indices i < j such that ei ⊴ ej . The intuition of the homeomorphic embedding is that x ⊴ y holds if x can be obtained by removing nodes from y. Our termination rule uses similar ideas to a well-quasi ordering, but with a very different comparison relation. We are unaware of any other supercompilers that have assigned names to expressions, or that have used a bag based termination rule (most use tree orderings, or sometimes cost models/budgets).

Without our particular treatment of expressions as a set of let bindings, and our particular simplification rules, it is not possible to use our termination rule. For example, if we ever inline let bindings then subexpressions would be changed internally, and a single name for each subexpression would no longer be sufficient.

In some cases, our rule is certainly less restrictive than the homeomorphic embedding. The example in §2.6.4 would have stopped one step earlier with a homeomorphic embedding. Under a fairly standard interpretation of variable names and let expressions, we can show that our rule is always less restrictive than the homeomorphic embedding – although other differences in our treatment of expressions mean such a comparison is not necessarily meaningful. However, we did not choose our termination criteria to permit more expressions – they were chosen for both simplicity and compilation speed.

We use two separate termination histories, one in reduce and another in optimise – an idea suggested by Mitchell (2008), but not previously implemented. By separating the termination histories we gain better predictability, as reduce is not dependent on which functions have gone before. Additionally, the histories are kept substantially smaller, again improving compile-time performance. By splitting termination checks we also reduce the coupling between the separate aspects of supercompilation, allowing us to present a simpler manager than would otherwise be possible.

As a result of the changes to termination and the Core language, the operation for splitting when the termination check fails is radically different. In particular, we can use almost identical operations when either evaluation fails to continue, or the termination check fails.

5.2 Partial Evaluation

There has been a lot of work on partial evaluation (Jones et al. 1993), where a program is specialised with respect to some static data. Partial evaluation works by marking all variable bindings within a program as either static or dynamic, using binding time analysis, then specialising the program with respect to the static bindings. Partial evaluation is particularly appropriate for optimising an interpreter with respect to the expression tree of a particular program, automatically generating a compiler, and removing interpretation overhead. The translation of an interpreter into a compiler is known as the First Futamura Projection (Futamura 1999), and can often give an order of magnitude speedup.

Supercompilation and partial evaluation both remove abstraction overhead within a program. Partial evaluation is more suited to completely removing static data, such as an expression tree which is interpreted. Supercompilation is able to remove intermediate data structures, which partial evaluation cannot usually do.

5.3 Deforestation

Deforestation (Wadler 1990) removes intermediate trees (most commonly lists) from computations. This technique has been extended in many ways, including to encompass higher-order deforestation (Marlow 1996). In many cases the gains from supercompilation are just particular forms of deforestation. Probably the most practically applied work on deforestation uses GHC’s rewrite rules to optimise programs (Peyton Jones et al. 2001). Shortcut deforestation rewrites many definitions in terms of foldr and build, then combines foldr/build pairs (Gill et al. 1993) to deforest lists. Stream fusion works similarly, but relies on stream/unstream rules (Coutts et al. 2007). All these schemes are only able to optimise functions written in terms of the correct primitives, which have had fusion rules defined. The advantage of supercompilation is that it applies to many types and functions, without any special effort from the program author.

5.4 Lower Level Optimisations

Our optimisation works at the Core level, but even once efficient Core has been generated there is still some work before efficient machine code can be produced. Key optimisations include strictness analysis and unboxing (Peyton Jones and Launchbury 1991). In GHC both of these optimisations are done at the Core level, using a Core language extended with unboxed types. After this lower level Core has been generated, it is then compiled to STG machine instructions (Peyton Jones 1992), from which assembly code is generated. There is still work being done to modify the lowest levels to take advantage of the current generation of microprocessors (Marlow et al. 2007). We rely on GHC to perform all these optimisations after our supercompiler generates a target program.

The GRIN approach (Boquist and Johnsson 1996) uses whole program compilation for Haskell. It is currently being implemented in the jhc compiler (Meacham 2008), with promising initial results. GRIN works by first removing all functional values, turning them into case expressions, allowing subsequent optimisation. The intermediate language for jhc is at a much lower level than our Core language, so jhc is able to perform detailed optimisations that we are unable to express.

6. Conclusions and Future Work

We have described a novel supercompiler, with a focus on simplicity, which can compile our benchmarks in a few seconds, and in some benchmarks offers substantial performance improvements over GHC alone. We see two main avenues for future work: increasing the range of benchmarks, and improving the runtime performance.

6.1 More Benchmarks

In order to run more benchmarks we need to automatically translate Haskell to our Core language. In previous papers we used the Yhc compiler to generate Core (Golubovsky et al. 2007), but sadly Yhc is not maintained and no longer works. Given that our supercompiler relies on GHC to perform strictness analysis and generate native code, it seems sensible to use GHC to generate our Core language – perhaps as a compiler plug-in, or working on external Core, or integrated into the compiler.

Our supercompiler processes the whole program in one go, which naturally leads to questions of scalability. In the tests we have run we have not had a problem with compilation time, but it is something to be aware of as benchmarks increase in size. We believe that our supercompiler could be sped up massively, using some of the techniques mentioned in §4.2. In addition, we could split programs into separate components by defining some functions to be primitive – although this would remove optimisation potential.

6.2 Runtime Performance

Our performance results are good, but there are always opportunities to improve. We currently rely on GHC’s strictness analysis to run after we have optimised the program, but by integrating a strictness analysis we may be able to do better.

The most common uses of GHC’s rules engine, particularly list/stream fusion, are automatically performed by our supercompiler. However, some transformations, such as replacing head ◦ sort with minimum, are too complex to infer automatically. It may be of benefit to integrate a rules engine into our supercompiler.

In some cases the author of a program has a particular idea about some intermediate data structure they expect to be eliminated. If these structures remain in the optimised program the performance penalty is sometimes dramatic. Perhaps a user could mark some values they expect to be removed, and then be warned if they remain.

6.3 Conclusions

Supercompilation is a powerful technique which generalises many of the transformations performed by optimising compilers. We were initially drawn to supercompilation for two reasons. Firstly, all intermediate values have the potential to be eliminated, regardless of their type or the functions which operate on them. Secondly, supercompilation is a single-pass optimisation, avoiding the tricky problem of ordering compiler phases for best optimisation. With these advantages supercompilation has the potential to drastically simplify an optimising compiler, while still achieving great performance. Our supercompiler builds on these advantages, rethinking supercompilation to make it simpler and improve compilation times.
Acknowledgements

I would like to thank Max Bolingbroke, Jason Reich, Simon Peyton Jones, Colin Runciman and Peter Jonsson for helpful discussions. Thanks to Ketil Malde for providing further inspiration to continue researching supercompilation. Thanks to Max Bolingbroke, Mike Dodds and the anonymous referees for helpful comments on earlier drafts.

References

Urban Boquist and Thomas Johnsson. The GRIN project: A highly optimising back end for lazy functional languages. In Proc. IFL '96, volume 1268 of LNCS, pages 58-84. Springer-Verlag, 1996.
Olaf Chitil. Common subexpressions are uncommon in lazy functional languages. LNCS, 1467:53-71, 1998.
Duncan Coutts, Roman Leshchinskiy, and Don Stewart. Stream fusion: From lists to streams to nothing at all. In Proc. ICFP '07, pages 315-326. ACM Press, October 2007.
Cormac Flanagan, Amr Sabry, Bruce Duba, and Matthias Felleisen. The essence of compiling with continuations. In Proc. PLDI '93, volume 28(6), pages 237-247. ACM Press, New York, 1993.
Yoshihiko Futamura. Partial evaluation of computation process - an approach to a compiler-compiler. Higher-Order and Symbolic Computation, 12(4):381-391, 1999.
Andrew Gill, John Launchbury, and Simon Peyton Jones. A short cut to deforestation. In Proc. FPCA '93, pages 223-232. ACM Press, June 1993.
Dimitry Golubovsky, Neil Mitchell, and Matthew Naylor. Yhc.Core - from Haskell to Core. The Monad.Reader, 1(7):45-61, April 2007.
Thomas Johnsson. Lambda lifting: transforming programs to recursive equations. In Proc. FPCA '85, pages 190-203. Springer-Verlag, 1985.
Neil Jones, Carsten Gomard, and Peter Sestoft. Partial Evaluation and Automatic Program Generation. Prentice-Hall International, 1993.
Peter Jonsson and Johan Nordlander. Positive supercompilation for a higher order call-by-value language. In Proc. POPL '09, pages 277-288. ACM, 2009.
Ilya Klyuchnikov. Supercompiler HOSC 1.0: under the hood. Preprint 63, Keldysh Institute of Applied Mathematics, Moscow, 2009.
Ilya Klyuchnikov. Supercompiler HOSC 1.1: proof of termination. Preprint 21, Keldysh Institute of Applied Mathematics, Moscow, 2010.
J. Kort. Deforestation of a raytracer. Master's thesis, University of Amsterdam, 1996.
Joseph Kruskal. Well-quasi-ordering, the tree theorem, and Vazsonyi's conjecture. Transactions of the American Mathematical Society, 95(2):210-255, 1960.
Michael Leuschel. Homeomorphic embedding for online termination of symbolic methods. In The Essence of Computation: Complexity, Analysis, Transformation, pages 379-403. Springer-Verlag, 2002.
Simon Marlow. Deforestation for Higher-Order Functional Programs. PhD thesis, University of Glasgow, 1996.
Simon Marlow, Alexey Rodriguez Yakushev, and Simon Peyton Jones. Faster laziness using dynamic pointer tagging. In Proc. ICFP '07, pages 277-288. ACM Press, October 2007.
John Meacham. jhc: John's Haskell compiler. http://repetae.net/john/computer/jhc/, 2008.
Neil Mitchell. Transformation and Analysis of Functional Programs. PhD thesis, University of York, 2008.
Neil Mitchell and Colin Runciman. A supercompiler for core Haskell. In Selected Papers from IFL 2007, volume 5083 of LNCS, pages 147-164. Springer-Verlag, May 2008.
Will Partain et al. The nofib Benchmark Suite of Haskell Programs. http://darcs.haskell.org/nofib/, 2008.
Simon Peyton Jones. Implementing lazy functional languages on stock hardware: The Spineless Tagless G-machine. JFP, 2(2):127-202, 1992.
Simon Peyton Jones. Haskell 98 Language and Libraries: The Revised Report. Cambridge University Press, 2003.
Simon Peyton Jones. Call-pattern specialisation for Haskell programs. In Proc. ICFP '07, pages 327-337. ACM Press, October 2007.
Simon Peyton Jones and John Launchbury. Unboxed values as first class citizens in a non-strict functional language. In Proc. FPCA '91, volume 523 of LNCS, pages 636-666. Springer-Verlag, August 1991.
Simon Peyton Jones and Simon Marlow. Secrets of the Glasgow Haskell Compiler inliner. JFP, 12:393-434, July 2002.
Simon Peyton Jones, Will Partain, and Andre Santos. Let-floating: Moving bindings to give faster programs. In Proc. ICFP '96, pages 1-12. ACM Press, 1996.
Simon Peyton Jones, Andrew Tolmach, and Tony Hoare. Playing by the rules: Rewriting as a practical optimisation technique in GHC. In Proc. Haskell '01, pages 203-233. ACM Press, 2001.
Jens Peter Secher and Morten Sørensen. On perfect supercompilation. In Proceedings of Perspectives of System Informatics, volume 1755 of LNCS, pages 113-127. Springer-Verlag, 2000.
Morten Sørensen and Robert Glück. An algorithm of generalization in positive supercompilation. In Logic Programming: Proceedings of the 1995 International Symposium, pages 465-479. MIT Press, 1995.
The GHC Team. The GHC compiler, version 6.12.1. http://www.haskell.org/ghc/, December 2009.
Andrew Tolmach. An external representation for the GHC core language. http://www.haskell.org/ghc/docs/papers/core.ps.gz, September 2001.
Valentin Turchin. The concept of a supercompiler. ACM Trans. Program. Lang. Syst., 8(3):292-325, 1986.
Valentin Turchin. The algorithm of generalization in the supercompiler. In Partial Evaluation and Mixed Computation, pages 341-353. North-Holland, 1988.
Valentin Turchin. Refal-5, Programming Guide & Reference Manual. New England Publishing Co., Holyoke, MA, 1989.
Philip Wadler. Deforestation: Transforming programs to eliminate trees. Theoretical Computer Science, 73:231-248, 1990.
Philip Wadler and Stephen Blott. How to make ad-hoc polymorphism less ad hoc. In Proc. POPL '89, pages 60-76. ACM Press, 1989.
Program Verification Through Characteristic Formulae

Arthur Charguéraud
INRIA
[email protected]
Abstract

This paper describes CFML, the first program verification tool based on characteristic formulae. Given the source code of a pure Caml program, this tool generates a logical formula that implies any valid post-condition for that program. One can then prove that the program satisfies a given specification by reasoning interactively about the characteristic formula using a proof assistant such as Coq. Our characteristic formulae improve over Honda et al.'s total characteristic assertion pairs in that they are expressible in standard higher-order logic, allowing them to be exploited in practice to verify programs using existing proof assistants. Our technique has been applied to formally verify more than half of the content of Okasaki's Purely Functional Data Structures reference book.

Categories and Subject Descriptors D.2.4 [Software/Program Verification]: Formal methods

General Terms Verification

1. Overview

1.1 Introduction to characteristic formulae

This paper describes an effective technique to formally specify and verify the source code of an existing purely functional program. The key idea is to generate, in a systematic manner, a logical formula for each top-level definition from the source program. Those formulae, expressed solely with standard higher-order logic connectives, carry a precise account of what the program does. Verification of the program can then be conducted by reasoning on its characteristic formula using an off-the-shelf proof assistant. For the sake of example, consider the following recursive function, which divides by two any non-negative even integer.

let rec half x =
  if x = 0 then 0
  else if x = 1 then fail
  else let y = half (x - 2) in
       y + 1

The corresponding characteristic formula appears next. Given an argument x and a post-condition P, the characteristic formula for half describes what needs to be proved in order to establish that the application of half to x terminates and returns a value satisfying the predicate P, written "AppReturns half x P".

∀x. ∀P. (x = 0 ⇒ P 0)
      ∧ (x ≠ 0 ⇒ (x = 1 ⇒ False)
                ∧ (x ≠ 1 ⇒ ∃P′. (AppReturns half (x − 2) P′)
                          ∧ (∀y. (P′ y) ⇒ P (y + 1))))
  ⇒ AppReturns half x P

When x is equal to zero, the function half returns zero. So, if we want to show that half returns a value satisfying P, we have to prove "P 0". When x is equal to one, the function half crashes, so we cannot prove that it returns any value. The only way to proceed is to show that the instruction fail cannot be reached. Hence the proof obligation False. Otherwise, we want to prove that "let y = half (x − 2) in y + 1" returns a value satisfying P. To that end, we need to exhibit a post-condition P′ such that the recursive call to half on the argument x − 2 returns a value satisfying P′. Then, for any name y that stands for the result of this recursive call, assuming that y satisfies P′, we have to show that the output value y + 1 satisfies the post-condition P. More generally, the characteristic formula ⟦t⟧ associated with a term t can be used to prove that this term returns a value satisfying a particular post-condition. For any post-condition P, the term t terminates and returns a value satisfying P if and only if the proposition "⟦t⟧ P" is true. The application "⟦t⟧ P" is a standard higher-order logic proposition that can be proved using an off-the-shelf proof assistant. Thus, characteristic formulae can be used in practice to verify that a program satisfies its specification. For program verification to be realistic, the proof obligation "⟦t⟧ P" should be easy to read and manipulate. Fortunately, our characteristic formulae can be pretty-printed in a way that closely resembles source code. For example, the characteristic formula associated with half is displayed as follows.

LET half := Fun x ↦
  If x = 0 Then Return 0
  Else If x = 1 Then Fail
  Else Let y := App half (x − 2) In
       Return (y + 1)

At first sight, it might appear that the characteristic formula is merely a rephrasing of the source code in some other syntax. To some extent, this is true. A characteristic formula is a sound and complete description of the behaviour of a program. Thus, it carries no more and no less information than the source code of the program itself. However, characteristic formulae enable us to move away from program syntax and conduct program verification entirely at the logical level. Characteristic formulae thereby avoid all the technical difficulties associated with the manipulation of program syntax and make it possible to work directly in terms of higher-order logic values and formulae.
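For example (an illustration added for exposition, not taken from the paper's development), a client of half can be reasoned about through half's specification alone; its characteristic formula produces one AppReturns obligation per call:

(* Dividing a non-negative multiple of four by four: when reasoning on
   the characteristic formula of this function, each of the two calls
   gives rise to one obligation of the form "AppReturns half _ _". *)
let quarter x =
  let y = half x in
  half y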
1.2 Specification and verification

One of the key ingredients involved in characteristic formulae is the predicate AppReturns, which is used to specify functions. Because of the mismatch between program functions, which may fail or diverge, and logical functions, which must always be total, we cannot represent program functions using logical functions. For this reason, we introduce an abstract type, named Func, which we use to represent program functions. Values of type Func are specified exclusively in terms of the predicate AppReturns. The proposition "AppReturns f x P" states that the application of the function f to an argument x terminates and returns a value satisfying P. Hence the type of AppReturns, shown below.

AppReturns : ∀A B. Func → A → (B → Prop) → Prop

Remark: an OCaml function f of type A → B is described in Coq at the type Func, regardless of what A and B might be. This is not a problem, because propositions of the form "AppReturns f x P" can only be derived when x has type A and P has type B → Prop. The predicate AppReturns is used not only in the definition of characteristic formulae but also in the statement of specifications. One possible specification for half is the following: if x is the double of some non-negative integer n, then the application of half to x returns an integer equal to n. The corresponding higher-order logic statement appears next.

∀x. ∀n. n ≥ 0 ⇒ x = 2 ∗ n ⇒ AppReturns half x (= n)

Remark: the post-condition (= n) is a partial application of equality: it is short for "λa. (a = n)". Here, the value n corresponds to a ghost variable: it appears in the specification of the function but not in its source code. The specification that we have considered for half might not be the simplest one; however, it illustrates our treatment of ghost variables. Our next step is to prove that the function half satisfies its specification using its characteristic formula. We first give the mathematical presentation of the proof and then show the corresponding Coq proof script. The specification is proved by induction on x. Let x and n be such that n ≥ 0 and x = 2∗n. We apply the characteristic formula to prove "AppReturns half x (= n)". If x is equal to 0, we conclude by showing that n is equal to 0. If x is equal to 1, we show that x = 2 ∗ n is absurd. Otherwise, x ≥ 2. We instantiate P′ as "= n − 1", and prove "AppReturns half (x − 2) P′" using the induction hypothesis. Finally, we show that, for any y such that y = n − 1, the proposition y + 1 = n holds. This completes the proof. Note that, through this proof by induction, we have proved that the function half terminates on its domain. Formalizing the above piece of reasoning in a proof assistant is straightforward. In Coq, a proof script takes the form of a sequence of tactics, each tactic being used to make some progress in the proof. The verification of the function half could be done using only built-in Coq tactics. Yet, for the sake of conciseness, we rely on a few specialized tactics to factor out repeated proof patterns. For example, each time we reason on an "if" statement, we want to split the conjunction at the head of the goal and introduce one hypothesis in each subgoal. The tactics specific to our framework are easily recognized: they start with the letter "x". The verification proof script for half appears next.

xinduction (downto 0). xcf. introv IH Pos Eq. xcase.
xret. auto.            (* x = 0 *)
xfail. auto.           (* x = 1 *)
xlet.                  (* otherwise *)
xapp (n-1); auto.      (* half (x-2) *)
xret. auto.            (* return y+1 *)

The interesting steps in that proof are: the setting up of the induction on the set of non-negative integers (xinduction), the application of the characteristic formula (xcf), the case analysis on the value of x (xcase), and the instantiation of the ghost variable n with the value "n − 1" when reasoning on the recursive call to half (xapp). The tactic auto runs a goal-directed proof search and may also rely on a decision procedure for linear arithmetic. The tactic introv is used to assign names to hypotheses. Such explicit naming is not mandatory, but in general it greatly improves the readability of proof obligations and the robustness of proof scripts. When working with characteristic formulae, proof obligations always remain very tidy. The Coq goal obtained when reaching the subterm "let y = half (x − 2) in y + 1" is shown below. In the conclusion (stated below the line), the characteristic formula associated with that subterm is applied to the post-condition to be established (= n). The context contains the two pre-conditions n ≥ 0 and x = 2 ∗ n, the negation of the conditionals that have been tested, x ≠ 0 and x ≠ 1, as well as the induction hypothesis, which asserts that the specification that we are trying to prove for half already holds for any non-negative argument x' smaller than x.

x : int
IH : forall x', 0 <= x' -> x' < x ->
     forall n, n >= 0 -> x' = 2 * n -> AppReturns half x' (= n)
n : int
Pos : n >= 0
Eq : x = 2 * n
C1 : x <> 0
C2 : x <> 1
-----------------------------------------------
(Let y := App half (x-2) in Return (1+y)) (= n)

As illustrated through the example, a verification proof script typically interleaves applications of "x"-tactics with pieces of general Coq reasoning. In order to obtain shorter proof scripts, we set up an additional tactic that automates the invocation of x-tactics. This tactic, named xgo, simply looks at the head of the characteristic formula and applies the appropriate x-tactic. A single call to xgo may analyse an entire characteristic formula and leave a set of proof obligations, in a similar fashion to a Verification Condition Generator (VCG). Of course, there are pieces of information that xgo cannot infer. Typically, the specification of local functions must be provided explicitly. Also, the instantiation of ghost variables cannot always be inferred. In our example, Coq automation is slightly too weak to infer that the ghost variable n should be instantiated as n − 1 in the recursive call to half. In practice, xgo stops running whenever it lacks too much information to go on. The user may also explicitly tell xgo to stop at a given point in the code. Moreover, xgo accepts hints to be exploited when some information cannot be inferred. For example, we can run xgo with the indication that the function application whose result is named y should use the value n − 1 to instantiate a ghost variable. In this case, the verification proof script for the function half is reduced to:

xinduction (downto 0). xcf. intros. xgo~ 'y (Xargs (n-1)).

Note that automation, denoted by the tilde symbol, is able to handle all the subgoals produced by xgo. For simple functions like half, a single call to xgo is usually sufficient. However, for more complex programs, the ability of xgo to be run only on given portions of code is crucial. In particular, it allows one to stop just before a branching point in the code in order to establish facts that are needed in several branches. Indeed, when a piece of reasoning needs to be carried out manually, it is extremely important to avoid duplicating the corresponding proof script across several branches.
To summarize, our approach allows for very concise proof scripts when verifying simple pieces of code, thanks to the automated processing done by xgo and to the good amount of automation available through the proof-search mechanism and the decision procedures that can be called from Coq. At the same time, when verifying more complex code, our approach offers very fine-grained control over the structure of the proofs, and it greatly benefits from the integration in a proof assistant for proving nontrivial facts interactively.

1.3 Implementation

Our implementation is named CFML, an acronym for Characteristic Formulae for ML. It parses OCaml source code and normalizes its syntax, making sure that applications and function definitions are bound to names. Our tool then type-checks the code and produces a set of Coq definitions. For each type definition in the source program, it generates the corresponding definition in the logic. For each top-level value definition, it introduces one abstract variable to represent the result of the evaluation of this definition, plus one axiom stating the characteristic formula associated with the definition. For example, for the program "let x = let y = 2 in y ∗ y", we generate a first axiom, named x, of type int, and a second axiom with a type of the form "∀P. [. . .] ⇒ P x". This characteristic formula for x describes what needs to be proved in order to establish that x satisfies a given predicate P. We have proved on paper that characteristic formulae are sound with respect to the logic of Coq, by showing that those axioms could be realized in Coq, at least in theory. (In practice, generating actual proof terms would require a lot of effort, so we have not implemented it.) Moreover, in order to preserve soundness, each time we introduce an axiom to represent a value we generate a proof that the type of this value is inhabited. For example, our tool rejects the definition "let x = fail" because the type ∀A.A cannot be proved to be inhabited. Rejecting this kind of program is not really a limitation, since it would not be possible anyway to prove that such a program returns a value. For the time being, only purely functional programs are supported. However, we strongly believe that characteristic formulae can be extended with heap descriptions and frame rules, without compromising the possibility of pretty-printing characteristic formulae like source code. We leave the extension to side effects to future work and focus in this paper on demonstrating the benefits of characteristic formulae for reasoning on pure programs. This paper is organized as follows. First, we explain how our approach compares against existing program verification techniques (§2). Second, we describe formalizations of purely functional data structures (§3). Third, we describe the algorithm for generating characteristic formulae (§4), and formally define our specification predicates (§5). Finally, we discuss the soundness and completeness of characteristic formulae (§6), and conclude (§7).

2. Comparison with related work

2.1 Characteristic formulae

The notion of characteristic formula originates in process calculi. Given the syntactic definition of a process, the idea is to generate a temporal logic formula that precisely describes that process [12, 17, 23]. In particular, behavioural equivalence or dis-equivalence of two processes can be established by comparing their characteristic formulae. Such a proof can be conducted in temporal logic rather than through reasoning on the syntactic definition of the processes. In a similar way, the characteristic formula of a program is a logical formula that carries a precise description of this program, without referring to its syntactic definition. For the sake of reasoning on functional correctness, programs can be studied in terms of their most-general specification. The theoretical insight that any program admits a most-general Hoare triple which entails all other correct specifications is nearly as old as Hoare logic. Gorelick [9] proved that every program admits a weakest pre-condition (the minimum requirement to ensure safe termination) and a strongest post-condition (the maximal amount of information that can be gathered about the output of the program). The suggestion that most-general specifications could be exploited to verify programs first appears, as far as we know, in recent work by Honda, Berger and Yoshida [10]. The authors consider a particular Hoare logic and exhibit an algorithm for constructing the total characteristic assertion pair (TCAP) of a program, which corresponds to its most-general Hoare triple. TCAPs offer an alternative way of proving that a program satisfies a given specification: rather than building a derivation using the reasoning rules of the Hoare program logic, one may simply prove that the pre-condition of the specification implies the weakest pre-condition and that the post-condition of the specification is implied by the strongest post-condition. The verification of those two implications can be conducted entirely at the logical level. Our work builds upon a similar idea, relying on characteristic formulae to move away from program syntax and carry out the reasoning in the logic. Our main contribution is to express the characteristic formula of a program in terms of a standard higher-order logic. By contrast, TCAPs are expressed in an ad-hoc logic. In particular, the values from this logic are well-typed PCF values, including first-class functions. It is not immediate to translate this logic into a standard logic, because of the mismatch between program functions, which may fail or diverge, and logical functions, which must always be total. Due to the non-standard logic it relies upon, Honda et al.'s TCAPs cannot be manipulated in an existing theorem prover. In this work, we show how an abstract type Func can be introduced to support the ability to refer to first-class functions from the logic. Our characteristic formulae also improve over TCAPs in that they are human-readable. While Honda et al.'s TCAPs did not fit on a screen for a program of more than a few lines, we show that characteristic formulae can be displayed just like source code. The ability to read characteristic formulae is very important in interactive proofs, since the characteristic formula shows up as part of the proof obligation that the user must discharge.

2.2 Verification Condition Generators

Tools such as Spec# [1] for C# programs, Krakatoa [14] for Java programs, Caduceus [7] for C programs, Pangolin [24] for pure ML programs, and Who [11] for imperative ML programs, are all based on VCGs. They generate a set of proof obligations and rely on automated theorem provers to discharge these obligations. In the latter three systems, proof obligations that are not verified automatically can be discharged using an interactive proof assistant. However, in practice, those proof obligations are often large and clumsy, and their proofs are generally quite brittle, because proof obligations are very sensitive to changes in either the source code or its invariants. In our approach, proof obligations remain tidy and can be easily related to the point of the program they arise from. Moreover, the user has the possibility to invest a little extra effort in naming hypotheses explicitly in order to build very robust proof scripts. The tool Jahob [26], which supports the verification of linked data structures implemented in a subset of Java, tries to avoid as much as possible the need for interactive proofs by annotating programs not only with their invariants but also with proof hints to guide automated theorem provers. As acknowledged by the authors, finding the appropriate hints can be very time-consuming. In particular, one needs to compute and read the new proof obligations after any modification of a hint.
Moreover, guessing hints requires a deep understanding of the VCG process and of the automated theorem provers being used. Nevertheless, there are some particular situations where providing such hints is actually very effective. Our approach naturally supports this proof technique, simply by giving the appropriate hints as arguments to our tactic xgo. We may also set up Coq automation to apply a user-defined sequence of tactics to any proof obligation satisfying a particular pattern. Among the tools cited above, few support higher-order functions: Pangolin [24] does, as does Who [11], which combines ideas from Caduceus [7] and Pangolin [24] to handle effectful higher-order programs. One notable difference with our work lies in the way in which functions are lifted to the logical level. In Pangolin and Who, a function is reflected in the logic as a pair of a pre-condition and a post-condition. Instead, we reflect a function in the logic as a value of the abstract type Func and use AppReturns to specify the behaviour of this value. We believe that our approach is more appropriate when functions are given several specifications, when functions are stored in data structures, and when higher-order functions are applied to functions specified with ghost variables.

2.3 Shallow embedding techniques

A radically different approach consists in programming directly within a theorem prover and verifying properties of the code interactively inside the same framework. Indeed, the logic of a proof assistant such as Coq is so rich that it contains a purely functional programming language. An extraction mechanism can then be used to isolate the actual source code from proof-specific elements. The shallow embedding approach can be applied in two very different styles, depending on how much types are used to enforce invariants. The first possibility is to write programs using only basic ML types. This style is employed for instance in Leroy's formally-verified C compiler [13]. While it can be quite effective for some applications, this approach also suffers from a number of severe restrictions that limit its scope of use. In particular, all functions must be total, and recursive functions must satisfy a syntactic termination criterion. On the contrary, characteristic formulae can accommodate various syntaxes for the source language, allowing for the verification of existing programs. In particular, any (well-typed) function definition can be handled: termination does not need to be established at definition time but can be proved by induction while reasoning on the characteristic formula (the induction may be on a measure, on a well-founded relation or on any Coq predicate). The second possibility is to write programs with more elaborate types, relying on dependent types to carry invariants (e.g. using the type "list n" to describe lists of length n). Programming with dependent types has been investigated in particular in Epigram [15], Agda [5] and Russell [25]. The latter is an extension to Coq, which behaves as a permissive source language that elaborates into Coq terms. In Russell, establishing invariants, justifying termination of recursion and proving the inaccessibility of certain branches of a pattern matching can be done through interactive Coq proofs. While Russell certainly manages to smooth the writing of dependently-typed terms, the manipulation of dependent types remains fairly technical for non-experts. Moreover, the treatment of ghost variables remains problematic in the current implementation of Coq, because extraction is not sufficiently fine-grained to erase all ghost variables. As a consequence, some ghost variables may remain in the extracted code, leading to runtime inefficiencies and possibly to incorrect asymptotic complexity. Because they rely directly on Coq terms, the two shallow embedding approaches described above cannot support impure programming features such as side effects and non-termination. HTT [19], its implementation in Ynot [4] and HTT's new implementation [20] try to overcome this limitation by extending Coq with a monad in order to support effects. As in Russell, specifications appear in types. They typically take the form "STsep P Q", where P and Q describe the pre- and the post-condition in terms of heap descriptions. Verification proofs are constructed by application of Coq lemmas that correspond to the reasoning rules of the program logic. This process is partially automated through a tactic (which is implemented by reflection). In our approach, most of this work is performed during the generation of characteristic formulae, by our external tool. In the end, although the implementation strategies differ, similar kinds of proof obligations are generated. Note that the trusted base of HTT is not much smaller than ours, since HTT also needs to rely on some external tool in order to extract OCaml or Haskell code from Coq scripts. Although we do not yet support side effects, we see one main advantage that characteristic formulae may have compared to HTT-based approaches in the long run. Characteristic formulae can be adapted to existing programming languages. On the contrary, following HTT's approach forces one to rewrite programs in terms of the language of Coq and of the constructors of HTT's monad. Some programming language features cannot be handled easily by HTT. For example, because pattern matching is deeply hard-wired in Coq, supporting handy features such as alias-patterns and when-clauses would be a real challenge for HTT. A slightly different approach to shallow embeddings relies on the definition of a translation from a programming language into higher-order logic. Myreen et al. [18] describe an effective technique for reasoning on machine code, which consists in decompiling machine-code procedures into higher-order logic functions. This translation is possible only because the functional translation of a while loop is a tail-recursive function, and because non-terminating tail-recursive functions are safely accepted as logical definitions in HOL4. Lemmas proved interactively about the higher-order logic functions can then be automatically transformed into lemmas about the behaviour of the machine code. While this approach works for reasoning on machine code, it does not seem possible to apply it to programs featuring arbitrary recursion and higher-order functions.
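The function half of §1.1 already illustrates this contrast. The following stand-alone OCaml variant (an illustration added for exposition, using failwith in place of the paper's fail construct) is not structurally recursive, so it would be rejected as a direct Coq Fixpoint; yet §1.2 verified it by well-founded induction on x through its characteristic formula:

(* The recursive call decreases the argument by 2, which is not a
   structural descent, so Coq's syntactic termination check would
   reject this definition as a Fixpoint. *)
let rec half x =
  if x = 0 then 0
  else if x = 1 then failwith "odd argument"
  else 1 + half (x - 2)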
2.4 Deep embedding techniques

A fourth approach to reasoning formally on programs consists in describing the syntax and the semantics of a programming language in the logic of a proof assistant using inductive definitions. In theory, the deep embedding approach can be applied to any programming language; it does not suffer from any limitation in terms of expressiveness, and it is compatible with the use of interactive proofs. Mehta and Nipkow [16] have set up a proof of concept of a deep embedding, axiomatizing a small procedural language in Isabelle, proving Hoare-style reasoning rules, and verifying a short program using those reasoning rules. More recently, the frameworks XCAP [21] and SCAP [6] rely on deep embeddings for reasoning in Coq about assembly programs. They support reasoning on advanced patterns such as strong updates, embedded code pointers and higher-order calls. They have been used to verify short but complex assembly routines, whose proofs involve hundreds of lines per instruction. Previously, the author of the present paper investigated the use of a deep embedding of the pure fragment of OCaml in Coq [2]. Characteristic formulae arose from that work, bringing major improvements. In a deep embedding, reasoning rules of the program logic take the form of lemmas that are proved correct with respect to the axiomatized semantics of the source language. When verifying a program, those reasoning rules are applied almost in a systematic manner, following the syntax of the program. The idea that the application of those reasoning rules could be anticipated led to characteristic formulae.
To illustrate this idea, consider the rule for reasoning on let-expressions in a deep embedding. The rule reads as follows: to show that "let x = t1 in t2" returns a value satisfying P, the subterm t1 must be shown to return a value satisfying a post-condition P′, and the term t2 must be shown to return a value satisfying P under the assumption that x satisfies P′. The statement of this rule, shown below, relies on a predicate capturing the fact that a term t returns a value satisfying a post-condition P, written "t ⇓ P". (For the sake of presentation, many technical details are omitted.)

      t1 ⇓ P′        ∀x. P′ x ⇒ t2 ⇓ P
    ───────────────────────────────────
          (let x = t1 in t2) ⇓ P

With characteristic formulae, the proposition "⟦let x = t1 in t2⟧ P" captures the fact that "let x = t1 in t2" returns a value satisfying P. This proposition is defined in terms of the characteristic formulae ⟦t1⟧ and ⟦t2⟧ associated with the two subterms t1 and t2. More precisely, "⟦t1⟧ P′" asserts that t1 returns a value satisfying P′ and "⟦t2⟧ P" asserts that t2 returns a value satisfying P. Formally:

    ⟦let x = t1 in t2⟧ P  =  ∃P′. ⟦t1⟧ P′ ∧ ∀x. P′ x ⇒ ⟦t2⟧ P

Although this equation looks very similar to the reasoning rule, there is one important difference. With the program logic reasoning rule, the intermediate specification P′ needs to be provided at the time of applying the rule. On the contrary, characteristic formulae are able to anticipate the application of the reasoning rule even without any knowledge of this intermediate specification, thanks to the existential quantification over P′. While it may appear fairly natural, this form of existential quantification over an intermediate specification, which takes full advantage of the strength of higher-order logic, does not seem to have been exploited in previous work. From our experience of working on the verification of pure OCaml programs both with a deep embedding and with characteristic formulae, we conclude that moving to characteristic formulae brings at least three major improvements. First, characteristic formulae do not need to represent and manipulate program syntax. Thus, they avoid many technical difficulties, in particular those associated with the representation of binders. Also, the repeated computations of substitutions that occur during the verification of a deeply-embedded program typically lead to the generation of a proof term of quadratic size, which can be problematic for scaling up to larger programs. Second, with characteristic formulae there is no need to apply the reasoning rules of the program logic manually. Indeed, the applications of those rules have been anticipated in the characteristic formulae. A practical consequence is that proof scripts are lighter and easier to automate. Third and last, characteristic formulae avoid the need to relate the deep embedding of program values with the corresponding logical values, saving a lot of technical burden. For example, consider a list of integers in an OCaml program. In the deep embedding, the description of this list is encoded using constructors from the grammar of OCaml values. With characteristic formulae, program values are translated into logical values once and for all upon generation of the formula. Thus, the list of integers appears in the characteristic formula directly as a list of integers, significantly simplifying proofs. The fact that characteristic formulae outperform deep embeddings is, after all, not a surprise: characteristic formulae can be seen as an abstract layer built on top of a deep embedding, so as to hide uninteresting details and retain only the essence of the reasoning rules supported by the deep embedding.

3. Formalizing purely functional data structures

Chris Okasaki's book Purely Functional Data Structures [22] contains a collection of efficient data structures, with concise implementations and nontrivial invariants. Its code appeared as an excellent benchmark for testing the usability of our approach to program verification. So far, we have verified more than half of the contents of the book. This paper focuses on the formalization of red-black trees and gives statistics on the other formalizations completed. Red-black trees behave like binary search trees except that each node is tagged with a color, either red or black. Those tags are used to maintain balance in the tree, ensuring a logarithmic asymptotic complexity. Okasaki's implementation appears in Figure 2. It consists of a functor that, given an ordered type, builds a module matching the signature of finite sets. The signatures appear in Figure 1. We specify each OCaml module signature through a Coq module signature. We then verify each OCaml module implementation through a Coq module implementation that contains lemmas establishing that the OCaml code satisfies its specification. We rely on Coq's module system to ensure that the lemmas proved actually correspond to the expected specification. This strategy allows for modular verification of modular programs.

module type Fset = sig
  type elem
  type fset
  val empty: fset
  val insert: elem -> fset -> fset
  val member: elem -> fset -> bool
end

module type Ordered = sig
  type t
  val lt: t -> t -> bool
end

Figure 1. Module signatures for finite sets and ordered types

module RedBlackSet (Elem : Ordered) : Fset = struct
  type elem = Elem.t
  type color = Red | Black
  type fset = Empty | Node of color * fset * elem * fset

  let empty = Empty

  let rec member x = function
    | Empty -> false
    | Node (_, a, y, b) ->
        if Elem.lt x y then member x a
        else if Elem.lt y x then member x b
        else true

  let balance = function
    | (Black, Node (Red, Node (Red, a, x, b), y, c), z, d)
    | (Black, Node (Red, a, x, Node (Red, b, y, c)), z, d)
    | (Black, a, x, Node (Red, Node (Red, b, y, c), z, d))
    | (Black, a, x, Node (Red, b, y, Node (Red, c, z, d))) ->
        Node (Red, Node (Black, a, x, b), y, Node (Black, c, z, d))
    | (col, a, y, b) -> Node (col, a, y, b)

  let rec insert x s =
    let rec ins = function
      | Empty -> Node (Red, Empty, x, Empty)
      | Node (col, a, y, b) as s ->
          if Elem.lt x y then balance (col, ins a, y, b)
          else if Elem.lt y x then balance (col, a, y, ins b)
          else s
    in
    match ins s with
    | Empty -> raise BrokenInvariant
    | Node (_, a, y, b) -> Node (Black, a, y, b)
end

Figure 2. Okasaki's implementation of Red-Black sets
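As a usage sketch (added for exposition, not part of the paper): one may instantiate the functor with integers ordered by (<). For the applications below to type-check, the functor's result signature is assumed to expose the equality of elem with Elem.t, e.g. by writing "Fset with type elem = Elem.t" in Figure 2; with a fully abstract Fset, the type elem would be hidden.

module IntOrdered : Ordered with type t = int = struct
  type t = int
  let lt x y = x < y
end

(* assumes: module RedBlackSet (Elem : Ordered)
              : Fset with type elem = Elem.t *)
module IntSet = RedBlackSet (IntOrdered)

let present =
  IntSet.member 2 (IntSet.insert 2 (IntSet.insert 1 IntSet.empty))
(* present = true *)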
3.1 Specification of the signatures

In order to specify functions manipulating red-black trees, we need to introduce a representation predicate called rep. Intuitively, every data structure admits a mathematical model. For example, the model of a red-black tree is a set of values. Similarly, the model of a priority queue is a multiset, and the model of a queue is a sequence (a list). Sometimes, the mathematical model is simply the value itself. For instance, the model of an integer or of a value of type color is just the value itself.
We formalize models through instances of a typeclass named Rep. If values of a type a are modelled by values of type A, then we write "Rep a A". For example, consider red-black trees that contain items of type t. If those items are modelled by values of type T (i.e., Rep t T), then trees of type fset are modelled by values of type set T (i.e., Rep fset (set T)), where set is the type constructor for mathematical sets in Coq. The typeclass Rep contains two fields, as shown below. For an instance of type "Rep a A", the first field, rep, is a binary relation that relates values of type a with their model, of type A. Note that not all values admit a model. For instance, given a red-black tree e, the proposition "rep e E" can only hold if e is a well-balanced, well-formed binary search tree. The second field of Rep, named rep_unique, is a lemma asserting that every value of type a admits at most one model (we sometimes need to exploit this fact in proofs).

Class Rep (a:Type) (A:Type) := {
  rep : a -> A -> Prop;
  rep_unique : forall x X Y,
    rep x X -> rep x Y -> X = Y }.
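As a concrete illustration (added for exposition): the rep relation is not injective, since several values may share a model. Both trees below, written with the constructors of Figure 2, satisfy the red-black invariants and would be related by rep to the same set {1, 2}; rep_unique only guarantees that each tree has at most one model.

(* Two distinct well-formed trees, both modelling the set {1, 2};
   this assumes the constructors of Figure 2 are in scope. *)
let s1 = Node (Black, Empty, 1, Node (Red, Empty, 2, Empty))
let s2 = Node (Black, Node (Red, Empty, 1, Empty), 2, Empty)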
Remark: while representation predicates have appeared in previous work (e.g. [7, 16, 19]), our work seems to be the first to use them in a systematic manner through a typeclass definition. Figure 3 contains the specification of an abstract finite-set module named F. Elements of the sets, of type elem, are expected to be modelled by some type T and to be related to their models by an instance of type "Rep elem T". Moreover, the values implementing finite sets, of type fset, should be related to their model, of type set T, through an instance of type "Rep fset (set T)". The module signature then contains the specifications of the values from the finite-set module F. The first one asserts that the value empty is a representation of the empty set. The specifications for insert and member rely on a special notation, explained next. So far, we have relied on the predicate AppReturns to specify functions. While this works well for functions of one argument, it becomes impractical for curried functions of higher arity, in particular because we want to specify the behaviour of partial applications. So, we introduce the Spec notation, explaining its meaning informally and postponing its formal definition to §5.2. With the Spec notation, the specification of insert, shown below, reads like a prototype: insert takes two arguments, x of type elem and e of type fset. Then, for any model X of x and for any set E that models e, the function returns a finite set e' which admits a model E' equal to {X} ∪ E. (\{X} is a Coq notation for a singleton set.)

Parameter insert_spec : Spec insert (x:elem) (e:fset) |R>>
  forall X E, rep x X -> rep e E ->
  R (fun e' => exists E', rep e' E' /\ E' = \{X} \u E).

The variable R should be read as "the application of insert returns a value satisfying the following post-condition". R is bound in "|R>>" and it is applied to the post-condition of the function. As it is often the case that arguments and/or results are described through their rep predicate, we introduce the RepSpec notation. With this new layer of syntactic sugar, the specification becomes:

Parameter insert_spec : RepSpec insert (X;elem) (E;fset) |R>>
  R (fun E' => E' = \{X} \u E ; fset).

The specification is now stated entirely in terms of the models, and no longer refers to the names of the OCaml input and output values. Only the types of those program values remain visible. Those type annotations are introduced by semi-colons. The specification of the function insert given in Figure 3 makes two further simplifications. First, it relies on the notation RepTotal, which avoids the introduction of a name R when it is immediately applied. Second, we have employed, for the sake of conciseness, a partial application of equality, of the form "= {X} ∪ E". Overall, the interest of introducing several layers of notation is that the final specifications of Figure 3 are about the simplest formal specifications one could hope for. Let us describe briefly the remaining specifications. The function member takes as arguments a value x and a finite set e, and returns a boolean which is true if and only if the model X of x belongs to the model E of e. Figure 4 contains the specification of an abstract ordered-type module named O. Elements of the ordered type t should be modelled by a type T. Values of type T should be ordered by a total order relation. The order relation and the proof that it is total are described through instances of the typeclasses Le and Le_total_order, respectively. An instance of the strict-order relation (LibOrder.lt) is automatically derived through the typeclass mechanism. This relation is used to specify the boolean comparison function lt, defined in the module O.

Module Type FsetSigSpec.
  Declare Module F : MLFset. Import F.

  Parameter T : Type.
  Instance elem_rep : Rep elem T.
  Instance fset_rep : Rep fset (set T).

  Parameter empty_spec : rep empty \{}.

  Parameter insert_spec : RepTotal insert (X;elem) (E;fset) >>
    = \{X} \u E ; fset.

  Parameter member_spec : RepTotal member (X;elem) (E;fset) >>
    bool_of (X \in E).
End FsetSigSpec.

Figure 3. Specification of finite sets

Module Type OrderedSigSpec.
  Declare Module O : MLOrdered. Import O.

  Parameter T : Type.
  Instance rep_t : Rep t T.
  Instance le_inst : Le T.
  Instance le_order : Le_total_order.

  Parameter lt_spec : RepTotal lt (X;t) (Y;t) >>
    bool_of (LibOrder.lt X Y).
End OrderedSigSpec.

Figure 4. Specification of ordered types
3.2 Verification of the implementation

It remains to verify the implementation of red-black trees. Consider a module O describing an ordered type. Assume the module O has been verified through a Coq module named OS of signature OrderedSigSpec. Our goal is then to prove correct the module obtained by applying the functor RedBlackSet to the module O, through the construction of a Coq module of signature FsetSigSpec. Thus, the verification of the OCaml functor RedBlackSet is carried out through the implementation of a Coq functor named RedBlackSetSpec, which depends both on the module O and on its specification OS. The first few lines of this Coq functor are shown below.

Module RedBlackSetSpec
  (O:MLOrdered) (OS:OrderedSigSpec with Module O:=O)
  <: FsetSigSpec with Definition F.elem := O.t.
Module Import F <: MLFset := MLRedBlackSet O.

The next step in the construction of this functor is the definition of an instance of the representation predicate for red-black trees. To start with, assume that our goal is simply to specify a binary search tree. The rep predicate would then be defined in terms of an inductive invariant called inv, as shown below. First, inv relates the empty tree to the empty set. Second, inv relates a node with root y and subtrees a and b to the set {Y} ∪ A ∪ B, where the uppercase variables are the models associated with their lowercase counterparts. Moreover, we need to ensure that all the elements of the left subtree A are smaller than the root Y and that, symmetrically, the elements of B are greater than Y. Those invariants are stated with the help of the predicate foreach. The proposition "foreach P E" asserts that all the elements of the set E satisfy the predicate P.

Inductive inv : fset -> set T -> Prop :=
  | inv_empty : inv Empty \{}
  | inv_node : forall col a y b A Y B,
      inv a A -> inv b B -> rep y Y ->
      foreach (is_lt Y) A -> foreach (is_gt Y) B ->
      inv (Node col a y b) (\{Y} \u A \u B).

A red-black tree is a binary search tree satisfying three invariants. First, every path from the root to a leaf contains the same number of black nodes. Second, no red node can have a red child. Third, the root of the tree must be black. In order to capture the first invariant, we extend the predicate inv so that it depends on a natural number n representing the number of black nodes to be found on every path. For an empty tree, this number is zero. For a nonempty tree, this number is equal to the number m of black nodes that can be found on every path of each of the two subtrees, augmented by one if the node is black. The second invariant, asserting that a red node must have black children, can be enforced simply by testing colors. Finally, the rep predicate relates a red-black tree e with a set E if there exists a value n such that "inv n e E" holds and such that the root of e is black (the third invariant). The extended definition of inv appears in Figure 5. In practice, we further extend the invariant with an extra boolean (this extended definition does not appear in the present paper). When the boolean is true, the definition of inv is unchanged. However, when the boolean is false, the second invariant might be broken at the root of the tree. This relaxed version of the invariant is useful to specify the behaviour of the function balance. Indeed, this function takes as input a color, an item and two subtrees, and one of those two subtrees might have its root incorrectly colored. Figure 6 shows the lemma corresponding to the verification of insert. Observe that the local recursive function ins is specified in the script. It is then verified with the help of the tactic xgo.

Inductive inv : nat -> fset -> set T -> Prop :=
  | inv_empty : forall,
      inv 0 Empty \{}
  | inv_node : forall n m col a y b A Y B,
      inv m a A -> inv m b B -> rep y Y ->
      foreach (is_lt Y) A -> foreach (is_gt Y) B ->
      (n = match col with Black => m+1 | Red => m end) ->
      (match col with
       | Black => True
       | Red => root_color a = Black /\ root_color b = Black end) ->
      inv n (Node col a y b) (\{Y} \u A \u B).

Global Instance set_rep : Rep fset (set T).
Proof.
  apply (Build_Rep (fun e E =>
    exists n, inv n e E /\ root_color e = Black)).
Defined.

Figure 5. Representation predicate for red-black trees

Lemma insert_spec : RepTotal insert (X;elem) (E;fset) >>
  = \{X} \u E ; fset.
Proof.
  xcf. introv RepX (n&InvE&HeB).
  xfun_induction_nointro_on size (Spec ins e |R>> forall n E,
    inv true n e E -> R (fun e' =>
      inv (is_black (root_color e)) n e' (\{X} \u E))).
  clears s n E. intros e IH n E InvE. inverts InvE as.
  xgo*. simpl. constructors*.
  introv InvA InvB RepY GtY LtY Col Num. xgo~.
  (* case insert left *)
  destruct~ col; destruct (root_color a); tryifalse~.
  ximpl as e. simpl. applys_eq* Hx 1 3.
  (* case insert right *)
  destruct~ col; destruct (root_color b); tryifalse~.
  ximpl as e. simpl. applys_eq* Hx 1 3.
  (* case no insertion *)
  asserts_rewrite~ (X = Y). apply~ nlt_nslt_to_eq.
  subst s. simpl. destruct col; constructors*.
  xlet as r. xapp~. inverts Pr; xgo. fset_inv. exists*.
Qed.

[...]

Figure 6. Invariant and model of red-black trees
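The three invariants can also be read back as executable OCaml (a sketch added for exposition, mirroring the inductive definition above; it is not part of the verified development, and the binary-search-tree ordering invariant is omitted for brevity). It assumes the constructors of Figure 2 are in scope.

(* Returns the number of black nodes on every root-to-leaf path,
   or raises Failure if an invariant is broken. *)
let rec black_height = function
  | Empty -> 0
  | Node (col, a, _, b) ->
      (* second invariant: a red node has black children *)
      (match col, a, b with
       | Red, Node (Red, _, _, _), _
       | Red, _, Node (Red, _, _, _) -> failwith "red node with red child"
       | _ -> ());
      (* first invariant: equal black height in both subtrees *)
      let ha = black_height a and hb = black_height b in
      if ha <> hb then failwith "unequal black heights";
      ha + (match col with Black -> 1 | Red -> 0)

(* third invariant: the root is black *)
let well_formed t =
  (match t with Node (Red, _, _, _) -> false | _ -> true)
  && (try ignore (black_height t); true with Failure _ -> false)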
3.3 Statistics

We have specified and verified various implementations of queues, double-ended queues, priority queues (heaps), sets, as well as sortable lists, catenable lists and random-access lists. The OCaml implementations are directly adapted from Okasaki's SML code [22]. All code and proofs can be found online.¹ Figure 7 contains statistics on the number of non-empty lines in the OCaml source code and in the Coq scripts. The programs considered are generally short, but note that OCaml is a concise language and that Okasaki's code is particularly minimalist. Details are given about the Coq scripts. The column "inv" indicates the number of lines needed to state the invariant of each structure. The column "facts" gives the length of the proof scripts needed to state and prove facts that are used several times in the verification scripts. The column "spec" indicates the number of lines of specification involved, including the specifications of local and auxiliary functions. Finally, the last column describes the size of the actual verification proof scripts, where characteristic formulae are manipulated. Note that the Coq proof scripts also contain several lines to import and instantiate modules, a few lines to set up automation, as well as one line per function to register its specification in a database of lemmas. We evaluate the relative cost of a formal verification by comparing the number of lines specific to formal proofs (the figures from columns "facts" and "verif") against the number of lines required in a properly documented source code (source code plus invariants and specifications). For particularly tricky data structures, such as bootstrapped queues, Hood-Melville queues and binomial heaps, this ratio is close to 2.0. For all other structures, the ratio does not exceed 1.25. For a user as fluent in Coq proofs as in OCaml programming, this means that the formalization effort can be expected to be comparable to the implementation and documentation effort.

¹ http://arthur.chargueraud.org/research/2010/cfml/
Development          Caml    Coq   inv   facts   spec   verif
BatchedQueue           20     73     4       0     16      16
BankersQueue           19     95     6      20     15      16
PhysicistsQueue        28    109     8      10     19      32
RealTimeQueue          26    104     4      12     21      28
ImplicitQueue          35    149    25      21     14      50
BootstrappedQueue      38    212    22      54     29      77
HoodMelvilleQueue      41    363    43      53     33     180
BankersDeque           46    172     7      26     24      58
LeftistHeap            36    132    16      28     15      22
PairingHeap            33    137    13      17     16      35
LazyPairingHeap        34    132    12      24     14      32
SplayHeap              53    176    10      41     20      59
BinomialHeap           48    367    24     118     41     110
UnbalancedSet          21     85     9      11      5      22
RedBlackSet            35    183    20      43     22      53
BottomUpMergeSort      29    151    23      31      9      40
CatenableList          38    153     9      20     23      37
RandomAccessList       63    272    29      37     47      83
Total                 643   3065   284     566    383     950

Figure 7. Non-empty lines of source code and proof scripts
4. Characteristic formula generation

4.1 Source language and normalization

CFML takes as input programs written in the pure fragment of OCaml, which includes algebraic data types, pattern matching, higher-order functions, recursion and mutual recursion. Polymorphic recursion, whose support was recently added to OCaml and which is used extensively in Okasaki's book, is also handled. Modules and functors are supported as long as the corresponding signatures are definable in Coq's module system. Lazy expressions are supported under the condition that the code would terminate without any lazy annotation. While this restriction certainly does not enable reasoning on infinite data structures, it covers the use of laziness for computation scheduling, as described in Okasaki's book. In fact, our tool simply ignores any annotation relative to laziness. The key idea is that if a program satisfies its specification when evaluated without any lazy annotation, then it also satisfies its specification when evaluated with lazy annotations. (Of course, the converse is not true.) Program verification based on characteristic formulae could presumably be applied to other programming languages. Yet, we make the assumption throughout this work that the source language is call-by-value and deterministic. For the sake of simplicity, program integers are modelled as unbounded mathematical integers. Throughout this work, we consider only programs that are well-typed in ML with recursive types. The grammar of types and type schemas is recalled below.

T  :=  A | int | T × T | T + T | T → T | µA.T
S  :=  ∀A.T

Before generating the characteristic formula of a program, the program is automatically transformed into its normal form: the program is arranged so that all intermediate results and all functions become bound by a let-definition (except applications of simple total functions such as addition and subtraction). This transformation, similar to A-normalization [8], is straightforward to implement and greatly simplifies formal reasoning on programs (see [10, 24] for similar transformations in the context of program verification). The grammar of terms in normal form is given below, for a subset of the source language. It will later be extended with curried n-ary functions and curried n-ary applications (§5.3).

x, f  :=  variables
v     :=  x | n | (v, v) | injₖ v
t     :=  v | (v v) | fail | if x then t else t
          | let x = t in t | let f = (µf.λx.t) in t
4.
:= :=
λP. (x = true ⇒ Jt1 K P ) ∧ (x = false ⇒ Jt2 K P )
To show that the term “fail” returns a value satisfying P , the only way to proceed is to show that this point of the program cannot be reached, by proving that the assumptions accumulated at that point are contradictory. Therefore, JfailK is defined as “λP. False”. The treatment of let-bindings is more interesting. To show that a term “let x = t1 in t2 ” returns a value satisfying P , one must prove that there exists a post-condition P 0 such that t1 returns a value satisfying P 0 and that t2 returns a value satisfying P for any x satisfying P 0 . Formally, Jlet x = t1 in t2 K is defined as λP. ∃P 0 . (Jt1 K P 0 ) ∧ ∀x. (P 0 x) ⇒ (Jt2 K P )
Slightly trickier is the treatment of functions and recursive functions. In fact, we generate the same formula regardless of whether a function is recursive or not (except, of course, for the treatment of binding scopes). Indeed, as suggested in the example of the function half (§1.2), specification for recursive functions are proved by induction, using the induction principles provided by Coq. Thus, there is no need to add further support for reasoning by induction inside characteristic formulae. Consider a possibly-recursive function “µf.λx.t”. The statement “∀x. ∀P 0 . JtK P 0 ⇒ AppReturns f x P 0 ”, called the body description for f , captures the fact that, in order to prove that the application of f to x returns a value satisfying a post-condition P 0 , it suffices to prove that the body t, instantiated with that particular value of x, terminates and returns a value satisfying P 0 . The characteristic formula for the function µf.λx.t then states that, in order to prove a property P to hold of µf.λx.t, it suffices to prove that the body description for f implies the proposition “P f ” for any abstract name f . The formula Jµf.λx.tK is thus defined as: λP. ∀f. ∀x. ∀P 0 . JtK P 0 ⇒ AppReturns f x P 0 ⇒ P f
|
The treatment of pattern matching and mutually-recursive functions can be found in the technical appendix [3].
Throughout this work, we consider only programs that are welltyped in ML with recursive types. The grammar of types and type
328
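To see these definitions at work on a tiny example of our own, consider the term let x = 3 in x. Its characteristic formula is

λP. ∃P′. (P′ 3) ∧ ∀x. (P′ x) ⇒ (P x)

and to establish the post-condition (= 3) it suffices to instantiate P′ with (= 3), after which both conjuncts are immediate.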
4.3 Reflection of values in the logic

So far, we have abusively identified values from the programming language with values from the logic. This section clarifies the translation from ML types to Coq types, as well as the translation from ML values to Coq values. We map every ML value to its corresponding Coq value, except for functions. As explained earlier on, due to the mismatch between the programming-language arrow type and the logical arrow type, we represent OCaml functions using values of type Func.

For each ML type T, we define the corresponding Coq type, written ⟨T⟩. This type is simply a copy of T in which all the arrow types are replaced with the type Func. Formally:

⟨A⟩          ≡  A
⟨int⟩        ≡  Int
⟨T1 × T2⟩    ≡  ⟨T1⟩ × ⟨T2⟩
⟨T1 + T2⟩    ≡  ⟨T1⟩ + ⟨T2⟩
⟨µA.T⟩       ≡  µA.⟨T⟩
⟨T1 → T2⟩    ≡  Func

Technical remark: an ML algebraic data type definition can be translated into a Coq inductive definition without any difficulty regarding negative occurrences. Indeed, since all arrow types are mapped to Func, there simply cannot be any negative occurrence.

Now, given a type T, we define the translation from Caml values of type T towards Coq values of type ⟨T⟩. The translation of a value v of type T is written ⌈v⌉Γ_T. The context Γ, which maps Caml variables to Coq variables, is used to translate non-closed values. The definition of the operator ⌈·⌉, called the decoder, appears next.

⌈x⌉Γ_T              ≡  Γ(x)
⌈n⌉Γ_int            ≡  n
⌈(v1, v2)⌉Γ_{T1×T2} ≡  (⌈v1⌉Γ_{T1}, ⌈v2⌉Γ_{T2})
⌈inj_k v⌉Γ_{T1+T2}  ≡  inj_k ⌈v⌉Γ_{Tk}
⌈v⌉Γ_{µA.T}         ≡  ⌈v⌉Γ_{[A→(µA.T)] T}
⌈µf.λx.t⌉Γ_{T1→T2}  ≡  (not needed at this time)

When decoding closed values, the context Γ is typically empty. Henceforth, we write ⌈v⌉_T as a shorthand for ⌈v⌉∅_T. Moreover, when there is no ambiguity on the type T of the value v, we omit the type T and simply write ⌈v⌉Γ or ⌈v⌉.
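To make the translation concrete, here is a minimal OCaml sketch of how a generator in the style of CFML could implement ⟨·⟩. The ASTs and constructor names below are illustrative assumptions of ours, not the identifiers used in CFML's actual implementation.

    (* Illustrative ASTs for ML types and for the Coq types they
       are reflected into; the names are ours. *)
    type mltyp =
      | Tvar of string                 (* type variable A *)
      | Tint                           (* int *)
      | Tprod of mltyp * mltyp         (* T1 × T2 *)
      | Tsum of mltyp * mltyp          (* T1 + T2 *)
      | Tarrow of mltyp * mltyp        (* T1 → T2 *)
      | Tmu of string * mltyp          (* µA.T *)

    type coqtyp =
      | Cvar of string
      | Cint
      | Cprod of coqtyp * coqtyp
      | Csum of coqtyp * coqtyp
      | Cmu of string * coqtyp
      | Cfunc                          (* the abstract type Func *)

    (* ⟨T⟩: a copy of T in which every arrow type becomes Func. *)
    let rec reflect (t : mltyp) : coqtyp =
      match t with
      | Tvar a         -> Cvar a
      | Tint           -> Cint
      | Tprod (t1, t2) -> Cprod (reflect t1, reflect t2)
      | Tsum (t1, t2)  -> Csum (reflect t1, reflect t2)
      | Tarrow (_, _)  -> Cfunc
      | Tmu (a, t1)    -> Cmu (a, reflect t1)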
4.4 Characteristic formula generation: formal presentation

The characteristic formula generator can now be given a formal presentation in which OCaml values are reflected into Coq, through calls to the decoding function ⌈·⌉. If t is a term of type T, then its characteristic formula ⟦t⟧Γ is actually a logical predicate of type (⟨T⟩ → Prop) → Prop. The environment Γ describes the substitution from program variables to Coq variables. In order to justify that characteristic formulae can be displayed like the source code, we proceed in two steps. First, we describe the characteristic formula generator in terms of an intermediate layer of notation (Figure 8). Then, we define the notation layer in terms of higher-order logic connectives as well as in terms of the predicate AppReturns (Figure 9). The contents of those figures simply refine the informal presentation from §4.2.

⟦v⟧Γ                              ≡  Ret ⌈v⌉Γ
⟦f v⟧Γ                            ≡  App ⌈f⌉Γ ⌈v⌉Γ
⟦fail⟧Γ                           ≡  Fail
⟦if x then t1 else t2⟧Γ           ≡  If ⌈x⌉Γ Then ⟦t1⟧Γ Else ⟦t2⟧Γ
⟦let x = t1 in t2⟧Γ               ≡  Let X := ⟦t1⟧Γ in ⟦t2⟧(Γ, x↦X)
⟦let f′ = (µf.λx.t1) in t2⟧Γ      ≡  Let F′ := (Fun F X := ⟦t1⟧(Γ, f↦F, x↦X)) in ⟦t2⟧(Γ, f′↦F′)

Figure 8. Characteristic formula generator

Ret V                   ≡  λP. P V
App F V                 ≡  λP. AppReturns F V P
Fail                    ≡  λP. False
If V Then Q Else Q′     ≡  λP. (V = true ⇒ Q P) ∧ (V = false ⇒ Q′ P)
Let X := Q in Q′        ≡  λP. ∃P′. Q P′ ∧ (∀X. P′ X ⇒ Q′ P)
Fun F X := Q            ≡  λP. ∀F. (∀X. ∀P′. Q P′ ⇒ AppReturns F X P′) ⇒ P F

Figure 9. Syntactic sugar to display characteristic formulae

4.5 Polymorphism

The treatment of polymorphism is certainly one of the most delicate aspects of characteristic formula generation. We need to extend the characteristic formula so as to quantify over the type variables needed to type-check the bodies of polymorphic let-bindings.

The translation of a polymorphic OCaml type ∀B.T is a polymorphic Coq type of the form ∀A.⟨T⟩. The set of type variables A is obtained by removing from the set B all the type variables that do not occur free in ⟨T⟩. Indeed, as all arrow types are mapped directly to the type Func, some variables occurring in T may no longer occur in ⟨T⟩. So, the set A might be strictly smaller than B.

Consider a polymorphic let-binding "let x = t1 in t2". The type-checking of the term t1 involves a set of type variables that are to be generalized at this let-binding on the variable x. Let C denote that set of generalizable type variables, and let T be the type of t1 before generalization. The variable x thus admits a type of the form ∀B.T, where B is a subset of C. Note that, in general, B is a strict subset of C, because not all intermediate type variables are visible in the result type of an expression. Our goal is to define the characteristic formula associated with the term "let x = t1 in t2" in a context Γ. To that end, let ∀A.⟨T⟩ be the Coq translation of the type ∀B.T. Since A is a subset of B and B is a subset of C, we can define a set A′ such that C is equal to the union of A and A′. Then, we define:

⟦let x = t1 in t2⟧Γ ≡ λP. ∃P′ : (∀A. (⟨T⟩ → Prop)).
                           (∀A. ∀A′. ⟦t1⟧Γ (P′ A))
                         ∧ (∀X : (∀A. ⟨T⟩). (∀A. (P′ A) (X A)) ⇒ (⟦t2⟧(Γ, x↦X) P))

The post-condition P′ describing X is a polymorphic predicate of type ∀A.(⟨T⟩ → Prop). Note that it is not a predicate on a polymorphic value, which would have the type (∀A.⟨T⟩) → Prop. (Indeed, we only care about describing the behaviour of monomorphic instances of the polymorphic variable X.) If we write type applications explicitly, then a particular monomorphic instance of X takes the form X A and it satisfies the predicate P′ A. Those type applications appear in the characteristic formula stated above.

Remark: we need to update slightly the translation from OCaml variables to Coq variables, because the context Γ may now associate program variables with polymorphic logical variables. The translation of a monomorphic occurrence of a polymorphic variable x is the application of the Coq variable Γ(x) to appropriate types, which depend on the type of x at its place of occurrence.

Finally, we give the characteristic formula for polymorphic functions, which is simpler than that of other polymorphic values because functions are simply reflected in the logic using the type Func. If A denotes the set of generalizable type variables associated with the body t of a function µf.λx.t, then the characteristic formula is constructed as follows.

⟦µf.λx.t⟧Γ ≡ λP. ∀F. (∀A X P′. ⟦t⟧(Γ, f↦F, x↦X) P′ ⇒ AppReturns F X P′) ⇒ P F

5. Specification predicates

Throughout this section, we formally describe the meaning of the predicates AppReturns and Spec. We then generalize those predicates to n-ary functions. Finally, we investigate how the predicate Spec can be used to specify higher-order functions.

5.1 Definition of the specification predicate

Consider the specification of the function half, written in terms of the predicate AppReturns.

∀x. ∀n ≥ 0. x = 2 ∗ n ⇒ AppReturns half x (= n)

The same specification can be rewritten with the Spec notation as:

Spec half (x : int) | R >> ∀n ≥ 0. x = 2 ∗ n ⇒ R (= n)

The notation based on Spec in fact stands for an application of a higher-order predicate called Spec_1. The proposition "Spec_1 f K" asserts that the function f admits the specification K. The predicate K takes both x and R as arguments, and specifies the result of the application of f to x. The predicate R is to be applied to the post-condition that holds of the result of "f x". For example, the previous specification for half stands for:
Spec_1 half (λx R. ∀n ≥ 0. x = 2 ∗ n ⇒ R (= n))

In first approximation, the predicate Spec_1 is defined as follows:

Spec_1 f K ≡ ∀x. K x (AppReturns f x)

where K has type A → ((B → Prop) → Prop) → Prop, and where A and B correspond to the input and the output type of f, respectively. The reader may check that unfolding the definition of Spec_1 in the specification for half expressed using Spec_1 yields the specification for half expressed in terms of AppReturns.
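Carrying out this unfolding explicitly: with K instantiated to λx R. ∀n ≥ 0. x = 2 ∗ n ⇒ R (= n), the proposition ∀x. K x (AppReturns half x) β-reduces to ∀x. ∀n ≥ 0. x = 2 ∗ n ⇒ AppReturns half x (= n), which is exactly the AppReturns form of the specification given at the start of this section.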
The true definition of "Spec_1" actually includes an extra side-condition, expressing that K is covariant in R. It is needed to ensure that the specification K actually concludes about the behaviour of the application of the function. Formally, covariance is captured by the predicate Weakenable, defined as follows:

Weakenable H ≡ ∀G G′. (∀x. G x → G′ x) → H G → H G′

where H has type "(X → Prop) → Prop" for some X. The formal definition of Spec_1 appears in the middle of Figure 10. Fortunately, thanks to appropriate lemmas and tactics, the predicate Weakenable never needs to be manipulated explicitly by the user.
5.2 Direct treatment of n-ary functions

In order to obtain a realistic tool for program verification, it is crucial to offer direct support for reasoning on the definition and application of n-ary curried functions. Generalizing the definitions of Spec_1 and AppReturns_1 to higher arities is not entirely straightforward, because we want the ability to reason on partial applications and over-applications. Intuitively, the specification of an n-ary curried function should capture the property that its application to fewer than n arguments terminates and returns a function admitting the appropriate specialization of the original specification.

Firstly, we define the predicate AppReturns_n. The proposition "AppReturns_n f v1 . . . vn P" states that the application of f to the n arguments v1 . . . vn returns a value satisfying P. The family of predicates AppReturns_n is defined by recursion on n in terms of the predicate AppReturns, as shown at the top of Figure 10. For instance, "AppReturns_2 f v1 v2 P" states that the application of f to v1 returns a function g such that the application of g to v2 returns a value satisfying P. More generally, if m is smaller than n, then applications at arities n and m are related as follows:

AppReturns_n f v1 . . . vn P ⇐⇒ AppReturns_m f v1 . . . vm (λg. AppReturns_{n−m} g v_{m+1} . . . vn P)

Secondly, we define the predicate Spec_n. Again, we proceed by recursion on n. For example, a curried function f of two arguments is a total function that, when applied to its first argument, returns a unary function g that admits a certain specification which depends on that first argument. Formally:

Spec_2 f K ≡ Spec_1 f (λx R. R (λg. Spec_1 g (K x)))

where K : A1 → A2 → ((B → Prop) → Prop) → Prop. Remark: Spec_2 is polymorphic in the types A1, A2 and B. The actual definition, given in Figure 10, includes a side-condition, written Is_spec_n K, ensuring that K is covariant in R. Note: a curried function specified using Spec_n can always also be viewed as a unary function specified using Spec_1. This property will be useful for reasoning on higher-order functions.

The high-level notation for specifications used in §3 can now be easily explained in terms of the family of predicates Spec_n:

Spec f (x1 : A1) . . . (xn : An) | R >> H ≡ Spec_n f (λ(x1 : A1). . . . λ(xn : An). λR. H)

AppReturns_1 f x P           ≡  AppReturns f x P
AppReturns_n f x1 . . . xn P ≡  AppReturns f x1 (λg. AppReturns_{n−1} g x2 . . . xn P)
Is_spec_1 K                  ≡  ∀x. Weakenable (K x)
Is_spec_n K                  ≡  ∀x. Is_spec_{n−1} (K x)
Spec_1 f K                   ≡  Is_spec_1 K ∧ ∀x. K x (AppReturns f x)
Spec_n f K                   ≡  Is_spec_n K ∧ Spec_1 f (λx R. R (λg. Spec_{n−1} g (K x)))

In the figure, n > 1, and f : Func, xi : Ai, P : B → Prop, and K : A1 → . . . → An → ((B → Prop) → Prop) → Prop.

Figure 10. Formal definitions for AppReturns_n and Spec_n

5.3 Characteristic formulae for curried functions

In this section, we update the generation of characteristic formulae to add direct support for reasoning on n-ary functions using Spec_n and AppReturns_n. Note that the grammar of terms in normal form is now extended with n-ary applications and n-ary abstractions. Intuitively, the characteristic formula associated with an application "f v1 . . . vn" is simply "λP. AppReturns_n f v1 . . . vn P". The formal definition, which takes decoders into account, is:

⟦f v1 . . . vn⟧Γ ≡ λP. AppReturns_n ⌈f⌉Γ ⌈v1⌉Γ . . . ⌈vn⌉Γ P

The characteristic formula for a function "µf.λx1 . . . xn.t" asserts that to prove "Spec_n f K" it suffices to show that the proposition "K x1 . . . xn ⟦t⟧" holds for any arguments xi. Remark: the treatment of unary functions given here is different from, but provably equivalent to, the one given earlier (§4.2).

It may be surprising to see the predicate "K x1 . . . xn" being applied to a characteristic formula ⟦t⟧. It is worth considering an example. Recall the definition of the function half. It takes the form "µhalf.λx.t", where t stands for the body of half. Its specification takes the form "Spec_1 half K", where K is equal to "λx R. ∀n ≥ 0. x = 2 ∗ n ⇒ R (= n)". According to the new characteristic formula for functions, in order to prove that the function half satisfies its specification, we need to prove the proposition "∀x. K x ⟦t⟧". Unfolding K, we obtain: "∀n ≥ 0. x = 2 ∗ n ⇒ ⟦t⟧ (= n)". As expected, we are required to prove that the body of the function half (described by the characteristic formula ⟦t⟧) returns a value equal to n, under the assumption that n is a non-negative integer such that x = 2 ∗ n.

Characteristic formulae for functions are constructed as follows.

⟦µf.λx1 . . . xn.t⟧Γ ≡ λP. ∀F. (∀K. Is_spec_n K ⇒ (∀X1 . . . Xn. K X1 . . . Xn ⟦t⟧(Γ, f↦F, xi↦Xi)) ⇒ Spec_n F K) ⇒ P F
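As a sanity check of our own, instantiating this definition at n = 1 yields λP. ∀F. (∀K. Is_spec_1 K ⇒ (∀X. K X ⟦t⟧(Γ, f↦F, x↦X)) ⇒ Spec_1 F K) ⇒ P F: instead of quantifying over post-conditions P′ as in §4.2, the formula quantifies over admissible specifications K, which is the sense in which the two treatments of unary functions differ yet remain provably equivalent.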
5.4 Specification of higher-order functions

The specification of a function, whether unary or n-ary, can always take the form Spec_1 f K. Thus, given a function f, we can quantify over every possible specification that f might admit simply by quantifying universally over the variable K. Let us illustrate this ability with the functions apply and compose. The function apply, defined as "λx. λf. (f x)", can be specified as follows.

Spec_2 apply (λx f R. ∀K. Spec_1 f K ⇒ K x R)

The conclusion "K x R" states that the behaviour R of the term "apply x f" is described by the predicate "K x". The predicate "K x" indeed specifies the behaviour of the term "f x", since "Spec_1 f K" implies "K x (AppReturns_1 f x)".

Consider now the function compose, which is defined as "λf1 f2 x. f1 (f2 x)". Its specification is expressed in terms of the specifications K1 and K2 of the functions f1 and f2, respectively.

Spec_3 compose (λf1 f2 x R. ∀K1 K2. Spec_1 f1 K1 ⇒ Spec_1 f2 K2 ⇒ K2 x (λP. ∃y. P y ⇒ K1 y R))

The last line can be read as follows. First, we want to unfold the specification "K2 x" associated with the application of f2 to x, since this inner call is the first to be performed. Then, for any post-condition P that holds of the result y of the application "f2 x", the behaviour R of the term "f1 (f2 x)" is the same as the behaviour of "f1 y". Since the behaviour of "f1 y" is described by the predicate "K1 y", the conclusion is "K1 y R".

The specification given above specifies in particular the result obtained by applying compose to two functions. For example, we were able to prove in a few lines of Coq that the term "compose half half" yields a function that divides its argument by four. More precisely, using a weakening lemma for specifications, we have proved that the resulting function admits the specification "λx R. ∀n ≥ 0. x = 4 ∗ n ⇒ R (= n)". (See [3] for details.)

Using similar techniques, we were able to assign a concise specification to the Y fixed-point combinator, and then to verify it. We have also started to investigate the specification of higher-order iterators such as map and fold on lists and sets. However, due to lack of space and because we lack experience in using those specifications, we do not report on that recent work in this paper.

6. Soundness and completeness

Characteristic formulae can be displayed in a way that closely resembles source code. However, proving the soundness and completeness of a characteristic formula with respect to the source code it describes is not entirely straightforward. First, we show how the type Func and the predicate AppReturns can be given concrete implementations in the logic. This construction, which has been verified in Coq for a subset of the source language, relies on a deep embedding of the source language and on the definition of functions called encoders, which are the reciprocal of decoders. Second, we present the statements of the soundness and completeness theorems, which have been proved on paper [3].

6.1 Realization of Func and AppReturns

To realize the type Func, we construct a deep embedding of the source language. More precisely, we use inductive definitions to define the set of runtime values, named Val, and to define the set of program terms, named Trm. Runtime values, written v throughout this section, extend source program values with function closures. We then define Func as the set of function closures, that is, as the set of values of type Val of the form µf.λx.t.

In order to prove interesting facts about characteristic formulae, we need to define a decoder for function closures created at runtime. We define the decoding of a function as the deep embedding of the code of that function. In other words, the decoder for functions is the identity.

⌈µf.λx.t⌉Γ_{T1→T2} ≡ (µf.λx.t) : Func

Note that the context Γ is ignored, as function closures are always closed values.

To realize the predicate AppReturns, we need to define the semantics of the source language and to define encoders. First, we describe the semantics of the deep embedding of the source language through a big-step reduction relation. This inductively-defined judgment, written "t ⇓ v", relates a term t of type Trm with a value v of type Val. Second, we define encoders, which are the reciprocal of decoders. For each program type T, we define an encoder function, written ⌊V⌋_⟨T⟩ or simply ⌊V⌋, that translates a logical value V of type ⟨T⟩ towards the deep embedding of the corresponding program value. Thus, ⌊V⌋_⟨T⟩ is always a logical value of type Val. The definition of encoders, not shown here, is such that ⌊⌈v⌉_T⌋_⟨T⟩ = v and ⌈⌊V⌋_⟨T⟩⌉_T = V.

We can now give the concrete implementation of AppReturns. The judgment "AppReturns F V P" asserts that the application of F to the embedding of V terminates and returns the embedding of a value V′ that satisfies P. Remark: since F is a value of type Func, F is also equal to its encoding ⌊F⌋.

AppReturns F V P ≡ ∃V′. (P V′) ∧ (F ⌊V⌋) ⇓ ⌊V′⌋
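For instance, in a worked instance of our own, taking P to be (= 2), the proposition AppReturns half 4 (= 2) unfolds to ∃V′. (V′ = 2) ∧ (half ⌊4⌋) ⇓ ⌊V′⌋; that is, it holds exactly when the deep-embedded application of half to the encoding of 4 evaluates to the encoding of 2.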
6.2 Soundness and completeness theorems

The soundness theorem states that if a predicate P satisfies the characteristic formula of a term t, then the term t terminates and returns the encoding of a value V satisfying P.

Theorem 6.1 (Soundness) For any closed term t of type T and any predicate P of type "⟨T⟩ → Prop",

⟦t⟧∅ P ⇒ ∃V. (t ⇓ ⌊V⌋) ∧ (P V)

The completeness result states that the characteristic formula of a term implies any true specification satisfied by this term. To avoid complications related to the occurrence of functions in the final result of a program, we present here only the particular case where the program produces an integer value as its final result.

Theorem 6.2 (Completeness for integer results) Let t be a well-typed closed term, n be an integer, and P be a predicate on integers. If "t ⇓ ⌊n⌋" and "P n" are true, then the proposition "⟦t⟧∅ P" is provable, even without knowledge of the concrete definitions of Func and AppReturns.

A more precise theorem can be found in the appendix [3].

6.3 Quantification over type variables

Polymorphism has been treated by quantifying over logical type variables, but we have not yet said what exactly the sort of these variables is in the logic. A tempting solution would be to assign them the sort Type. (In Coq, Type is the sort of all types from the logic, including the sort Prop.) But in fact, type variables used to represent ML polymorphism are only meant to range over reflected types, i.e. types of the form ⟨T⟩. Thus, we ought to assign type variables the sort RType, defined as { X : Type | ∃T. X = ⟨T⟩ }. Since we provide RType as an abstract definition, users do not need to exploit the fact that universally-quantified types correspond to reflected ML types.

A question naturally follows: since RType is an abstract type, would it remain sound and complete to use the sort Type instead of the sort RType as the sort of type variables? We conjecture that the answer is positive. In the implementation, we use the sort Type for the sake of convenience; however, we could switch to RType if it ever turned out to be necessary.

7. Conclusion

We have presented CFML, a tool for the verification of pure OCaml programs. It consists of two parts: a characteristic formula generator (implemented in 3,000 lines of OCaml) and a set of lemmas, notation and tactics for manipulating characteristic formulae (a 4,000-line Coq library). We have reused OCaml's parser and type-checker to achieve maximal compatibility, making it possible to verify existing code, even if it was not originally intended to be verified.

We have employed our tool to specify and verify the total correctness of a number of advanced purely-functional data structures. Complex invariants can be expressed concisely, thanks to the high expressiveness of higher-order logic. Nontrivial proof obligations can be discharged easily, thanks to the use of interactive proofs. When the code or its specification is incorrect, the proof assistant provides immediate feedback, explaining which proof obligation fails and where this obligation comes from. In our experience, the process of verifying a program can be conducted relatively quickly. Most often, the hardest part is to figure out very precisely all the invariants that the program relies upon.

References

[1] Mike Barnett, Rob DeLine, Manuel Fähndrich, K. Rustan M. Leino, and Wolfram Schulte. Verification of object-oriented programs with invariants. JOT, 3(6), 2004.
[2] Arthur Charguéraud. Verification of call-by-value functional programs through a deep embedding. Unpublished. http://arthur.chargueraud.org/research/2009/deep/, March 2009.
[3] Arthur Charguéraud. Technical appendix to the current paper. http://arthur.chargueraud.org/research/2010/cfml/, April 2010.
[4] Adam Chlipala, Gregory Malecha, Greg Morrisett, Avraham Shinnar, and Ryan Wisnesky. Effective interactive proofs for higher-order imperative programs. In ICFP, September 2009.
[5] Thierry Coquand. Alfa/Agda. In Freek Wiedijk, editor, The Seventeen Provers of the World, volume 3600 of LNCS, pages 50–54. Springer, 2006.
[6] Xinyu Feng, Zhong Shao, Alexander Vaynberg, Sen Xiang, and Zhaozhong Ni. Modular verification of assembly code with stack-based control abstractions. In M. Schwartzbach and T. Ball, editors, PLDI. ACM, 2006.
[7] Jean-Christophe Filliâtre and Claude Marché. Multi-prover verification of C programs. In Formal Methods and Software Engineering, 6th ICFEM 2004, volume 3308 of LNCS, pages 15–29. Springer-Verlag, 2004.
[8] Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen. The essence of compiling with continuations. In PLDI, pages 237–247, 1993.
[9] G. A. Gorelick. A complete axiomatic system for proving assertions about recursive and non-recursive programs. Technical Report 75, University of Toronto, 1975.
[10] Kohei Honda, Martin Berger, and Nobuko Yoshida. Descriptive and relative completeness of logics for higher-order functions. In M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, editors, ICALP (2), volume 4052 of LNCS. Springer, 2006.
[11] Johannes Kanig and Jean-Christophe Filliâtre. Who: a verifier for effectful higher-order programs. In ML'09: Proceedings of the 2009 ACM SIGPLAN workshop on ML, pages 39–48, New York, NY, USA, 2009. ACM.
[12] Henri Korver. Computing distinguishing formulas for branching bisimulation. In Kim Guldstrand Larsen and Arne Skou, editors, CAV, volume 575 of LNCS, pages 13–23. Springer, 1991.
[13] Xavier Leroy. Formal certification of a compiler back-end or: programming a compiler with a proof assistant. In POPL, pages 42–54, January 2006.
[14] Claude Marché, Christine Paulin-Mohring, and Xavier Urbain. The Krakatoa tool for certification of Java/JavaCard programs annotated in JML. JLAP, 58(1–2):89–106, 2004.
[15] Conor McBride and James McKinna. The view from the left. JFP, 14(1):69–111, 2004.
[16] Farhad Mehta and Tobias Nipkow. Proving pointer programs in higher-order logic. In Franz Baader, editor, CADE, volume 2741 of LNCS, pages 121–135. Springer, 2003.
[17] R. Milner. Communication and Concurrency. Prentice-Hall, 1989.
[18] Magnus O. Myreen, Michael J. C. Gordon, and Konrad Slind. Machine-code verification for multiple architectures: an application of decompilation into logic. In FMCAD, pages 1–8, Piscataway, NJ, USA, 2008. IEEE Press.
[19] Aleksandar Nanevski, J. Gregory Morrisett, and Lars Birkedal. Hoare type theory, polymorphism and separation. JFP, 18(5-6):865–911, 2008.
[20] Aleksandar Nanevski, Viktor Vafeiadis, and Josh Berdine. Structuring the verification of heap-manipulating programs. In Manuel V. Hermenegildo and Jens Palsberg, editors, POPL, pages 261–274. ACM, 2010.
[21] Zhaozhong Ni and Zhong Shao. Certified assembly programming with embedded code pointers. In POPL, 2006.
[22] Chris Okasaki. Purely Functional Data Structures. Cambridge University Press, 1999.
[23] David Park. Concurrency and automata on infinite sequences. In Peter Deussen, editor, Theoretical Computer Science: 5th GI-Conference, Karlsruhe, volume 104 of LNCS, pages 167–183, Berlin, Heidelberg, and New York, March 1981. Springer-Verlag.
[24] Yann Régis-Gianas and François Pottier. A Hoare logic for call-by-value functional programs. In MPC, July 2008.
[25] Matthieu Sozeau. Program-ing finger trees in Coq. SIGPLAN Not., 42(9):13–24, 2007.
[26] Karen Zee, Viktor Kuncak, and Martin Rinard. An integrated proof language for imperative programs. In PLDI, 2009.
VeriML: Typed Computation of Logical Terms inside a Language with Effects

Antonis Stampoulis    Zhong Shao
Department of Computer Science, Yale University, New Haven, CT 06520-8285
{antonis.stampoulis,zhong.shao}@yale.edu
Abstract

Modern proof assistants such as Coq and Isabelle provide high degrees of expressiveness and assurance because they support formal reasoning in higher-order logic and supply explicit machine-checkable proof objects. Unfortunately, large-scale proof development in these proof assistants is still an extremely difficult and time-consuming task. One major weakness of these proof assistants is the lack of a single language where users can develop complex tactics and decision procedures using a rich programming model and in a typeful manner. This limits the scalability of the proof development process, as users avoid developing domain-specific tactics and decision procedures.

In this paper, we present VeriML—a novel language design that couples a type-safe effectful computational language with first-class support for manipulating logical terms such as propositions and proofs. The main idea behind our design is to integrate a rich logical framework—similar to the one supported by Coq—inside a computational language inspired by ML. The language design is such that the added features are orthogonal to the rest of the computational language, and also do not require significant additions to the logic language, so soundness is guaranteed. We have built a prototype implementation of VeriML including both its type-checker and an interpreter. We demonstrate the effectiveness of our design by showing a number of type-safe tactics and decision procedures written in VeriML.

Categories and Subject Descriptors: D.3.1 [Programming Languages]: Formal Definitions and Theory

General Terms: Languages, Verification

1. Introduction

In recent years, there has been a growing interest in formal verification of substantial software code bases. Two of the most significant examples of this trend are the verification of a full optimizing compiler for a subset of the C language in the CompCert project [Leroy 2009], and the verification of the practical operating system microkernel seL4 [Klein et al. 2009]. Both of these efforts use powerful proof assistants such as Coq [Barras et al. 2010] and Isabelle [Nipkow et al. 2002], which support higher-order logic with explicit proof objects. Other verification projects have opted to use first-order automated theorem provers; one such example is the certified garbage collector by Hawblitzel and Petrank [2009].

Still, the actual process of software verification requires significant effort, as clearly evidenced by the above developments. We believe that a large part of this effort could be reduced if the underlying verification frameworks had better support for extending their automation facilities. During a large proof development, a number of user-defined datatypes are used; being able to define domain-specific decision procedures for these can significantly cut back on the manual proof effort required. In other cases, different program logics might need to be defined to reason about different parts of the software being verified, as is argued by Feng et al. [2008] for the case of operating system kernels. In such cases, developing automated provers tailored to these logics would be very desirable. We thus believe that in order to be truly extensible, a proof development framework should support the following features:

• Use of a well-established logic with well-understood metatheory, that also provides explicit proof objects. This way, the trusted computing base of the verification process is kept at a minimum. The high assurance offered by developments such as CompCert and seL4 owes largely to this characteristic of the proof assistants they are developed on.

• Being able to programmatically reflect on logical terms (e.g., propositions) so that we can write a large number of procedures (e.g., tactics, decision procedures, and automated provers) tailored to solving different proof obligations. Facilities such as LTac [Delahaye 2000], and their use in developments like Chlipala et al. [2009], demonstrate the benefits of this feature.

• An unrestricted programming model for developing these procedures, one that permits the use of features such as non-termination and mutable references. The reason for this is that even simple decision procedures might make essential use of imperative data structures and might have complex termination arguments. One such example is the family of decision procedures for the theory of equality with uninterpreted functions [Bradley and Manna 2007]. By enabling an unrestricted programming model, porting such procedures does not require significant re-engineering.

• At the same time, being able to provide certain static guarantees and rich type information to the programmer. Terms of a formal logic come with rich type information: proof objects have "types" representing the propositions that they prove, and propositions themselves are deemed to be valid according to some typing rules. By retaining this information when programmatically manipulating logical terms, we can specify the behavior of the associated code. For example, we could statically specify that a tactic transforms a propositional goal into an equivalent one, by requiring a proof object witnessing this equivalence. Another guarantee we would like is the correct handling of the binding constructs that logical terms include (e.g. quantification in propositions), so that this common source of errors is statically avoided.

A framework that combines these features is not currently available. As a result, existing proof assistants must rely on a mix of languages (with incompatible type systems) to achieve a certain degree of extensibility. In this paper, we present VeriML—a novel language design that aims to support all these features and provide a truly extensible and modular proof development framework. Our paper makes the following new contributions:

• As far as we know, VeriML is the first proof framework that successfully combines a type-safe effectful computational language with first-class support for manipulating rich logical terms such as propositions, proofs, and inductive definitions.

• An important feature of VeriML is the strong separation of roles played by its underlying logic language and computational language. The logic language, λHOLind, supports higher-order logic with inductive definitions (as in Coq), so it can both serve as a rich meta logic and be used to define new object logics/languages and reason about their meta theory. All proof objects in VeriML can be represented using λHOLind alone. The computational language is used only for general-purpose programming, including typed manipulation of logical terms. This is in sharp contrast to recent work such as Beluga [Pientka and Dunfield 2008] and Delphin [Poswolsky and Schürmann 2008], where meta-logical proofs are represented using their computational languages. Maintaining soundness of such proofs when adding imperative features to these languages would be non-trivial, and would put additional burden (e.g. effect annotations) on general-purpose programming.

• We present the complete development of the type system and operational semantics for VeriML, as well as their associated meta-theory. We also show how to adapt contextual modal type theory to work for a rich meta-logic such as λHOLind.

• We have built a prototype implementation of VeriML and used it to write a number of type-safe tactics and decision procedures. We use these examples to demonstrate the applicability of our approach and show why it is important to support type-safe handling of binders, general recursion, and imperative features such as arrays and hash tables in VeriML.

The rest of the paper is organized as follows. We first give a high-level overview of VeriML (Sec 2) and then use examples to explain the basic design (Sec 3); we then present the logic language, the computational language, and their meta-theory (Sec 4–5); finally, we describe the implementation and discuss related work.

2. Overview of the language design

We will start off by presenting a high-level overview of our framework design. The first choice we have to make is the formal logic that we will use to base our framework on. We opt to use a higher-order logic with inductive definitions and explicit proof objects. This gives us a high degree of expressivity, enough for software verification as evidenced by the aforementioned large-scale proof developments, while at the same time providing a high level of assurance. Furthermore, we allow a notion of computation inside our logic, by adding support for defining and evaluating total recursive functions. In this way, logical arguments that are based solely on such computation need not be explicitly witnessed in proof objects, significantly reducing their sizes. This notion of computation must of course be terminating, in order to maintain soundness. Because of this characteristic, our logic satisfies what we refer to as the Poincaré Principle (abbreviated as P.P.), following the definition in Barendregt and Geuvers [1999]. Last, we choose to omit certain features like dependent types from our logic, in order to keep its metatheory straightforward. We will see more details about this logic in Section 4. We refer to propositions, inhabitants of inductive types, proof objects, and other terms of this logic as logical terms.

Developing proofs directly inside this logic can be very tedious due to the large amount of detail required; because of this, proof development frameworks provide a set of computational functions (tactics and decision procedures) that produce parts of proofs, so that the proof burden is considerably lessened. The problem that we are interested in is the design of a computational language in which such functions can be easily and effectively written by the user for the domains they are interested in, leading to a scalable and modular proof development style. As we have laid out in the introduction, we would like a number of features out of such a computational language: being able to programmatically pattern match on propositions, having a general-purpose programming model available, and providing certain static guarantees to the programmer. Let us briefly consider how computational approaches in existing proof assistants fare on those points. A schematic comparison is given in Figure 1, while Table 1 compares existing approaches based on these points.

Figure 1. Schematic comparison of the structure of related approaches

                    non-termination   mutable references   static guarantees   logic with P.P.
Traditional LCF     yes               yes                  no                  no
LTac                yes               no                   no                  no
Reflection-based    no                no                   yes                 yes
Beluga, Delphin     yes (maybe)       no                   yes                 no
VeriML              yes               yes                  yes                 yes

Table 1. Comparison of different approaches based on features

A standard and widely available approach [Slind and Norrish 2008, Harrison 1996, Nipkow et al. 2002, Barras et al. 2010] is to write user-defined tactics and decision procedures inside the implementation language of the proof assistant, which is in most cases a member of the ML family of languages. This gives the user access to a rich programming model, with non-terminating recursion and imperative data structures. Still, the user has to deal with the implementation details of the framework, and no static guarantees are given whatsoever. All logical terms are essentially identified at the ML type level, leading to an untyped programming style when programming with them. Another approach is the use of LTac [Delahaye 2000] in the Coq proof assistant: a specialized tactic language that allows pattern matching on propositions, backtracking proof search and general recursion. This language too does not provide any static guarantees, and has occasional issues when dealing with binders and variables. Also, the programming model supported is relatively poor, without support for rich data structures or imperativity. An interesting approach is the technique of proof-by-reflection [Boutin 1997], where the computational notion inside the logic itself is used in order to create certified decision procedures. While this approach gives very strong static guarantees (total correctness), it does not support non-termination or imperative data structures, limiting the kind of decision procedures we can write. Also, the use of a mix of languages is required by this technique.

In order to combine the benefits of these approaches, we propose a new language design that couples a general-purpose programming language like ML with first-class support for our logical framework. Furthermore, we integrate the type system of the logic inside the type system of the computational language, leading to a dependently typed system. Logical term literals will thus retain the type information that can be statically determined for them. Moreover, a pattern matching construct for logical terms is explicitly added, which is dependently typed too; the type of each branch depends on the specific pattern being matched. We use dependent types only as a way to provide lightweight static guarantees. For example, we can require that a function receives a proposition and returns a proof object for that proposition, ruling out the possibility of returning an "invalid" proof object because of programmer error. Our approach therefore differs from other dependently-typed frameworks like Agda [Norell 2007] or HTT [Nanevski et al. 2006], as we are not necessarily interested in reasoning about the correctness of code written in our computational language. Also, programming in such systems includes an aspect that amounts to proof development, as evidenced e.g. in the Russell framework [Sozeau 2007]. We are interested in how proof development itself can be automated, so our approach is orthogonal to such systems. Our notion of pattern matching on propositions would amount to typecase-like constructs in these languages, which in general are not provided. Dependently-typed frameworks for computing with LF terms, like Beluga [Pientka and Dunfield 2008] and Delphin [Poswolsky and Schürmann 2008], are not ideal for our purposes, because of the lack of imperative features and the fact that encoding a logic like the one we describe inside LF is difficult and has not been demonstrated yet in practice. The reason for this is exactly our logic's support for the Poincaré principle. Still, we draw inspiration from such frameworks.

In order to type-check logical terms, and make sure that binding is handled correctly, we need information about the free-variables context that they depend on. Not all logical terms manipulated during evaluation of a program written in our language need to refer to the same context; for example, when pattern matching a quantified proposition like ∀x : Nat.P, the variable P might refer to an extra variable compared to the original proposition that was being matched. Therefore, in order to guarantee proper scoping of variables used inside the logical terms, the type system of our computational language tracks the free-variable context of the logical terms that are manipulated. This is done by using the main idea of contextual modal type theory [Nanevski et al. 2008, Pientka 2008]. We introduce a notion of contextual logical terms, that is, logical terms that come packaged together with the free-variables context they depend on. Our computational language manipulates such terms, instead of normal logical terms, which would also need some external information about their context. A notion of context polymorphism will also need to be introduced, in order to write code that is generic with respect to variable contexts.
3. Programming Examples

In this section we will present a number of programming examples in our computational language in order to demonstrate its use as well as motivate some of our design choices, before presenting the full technical details in later sections. Each example demonstrates one particular feature of the computational language. The examples we will present are successive versions of a tactic that attempts to automatically derive intuitionistic proofs of propositional tautologies (similar to Coq's tauto tactic [Barras et al. 2010]) and a decision procedure for the theory of equality, along with the data structures that they use. Note that we use a somewhat informal style here for presentation purposes. Full details for these examples can be found as part of our implementation at http://flint.cs.yale.edu/publications/veriml.html.

3.1 Pattern matching

We will start with the automatic tautology proving tactic, which is structured as follows: given a proposition, it will perform pattern matching in order to deconstruct it, and attempt to recursively prove the included subformulas; when this is possible, it will return a proof object of the given proposition. Some preliminary code for such a function follows, which handles only logical conjunction and disjunction, and the True proposition as a base case. Note that we use a monadic do-notation in the style of Haskell for the failure monad (computation with the ML option type), and we use the syntax e1 || e2, where both expressions are of option type, in order to choose the second expression when the first one fails. We use the "holcase · of · · ·" construct to perform pattern matching on a logical term. Also, we use the notation ⟨·⟩ to denote the lifting of a logical term into a computational-language term. Under the hood, this is an existential package which packages a logical term with the unit value. Here we have used it so that our function might return proof objects of the relevant propositions. We have not written out the details of the proof objects themselves to avoid introducing unnecessary technical details at this point, but they are straightforward to arrive at.

tauto P = holcase P of
    P1 ∧ P2 ↦ do pf1 ← tauto P1; pf2 ← tauto P2;
              ⟨· · · proof of P1 ∧ P2 · · ·⟩
  | P1 ∨ P2 ↦ (do pf1 ← tauto P1; ⟨· · · proof of P1 ∨ P2 · · ·⟩)
           || (do pf2 ← tauto P2; ⟨· · · proof of P1 ∨ P2 · · ·⟩)
  | True    ↦ Some ⟨· · · proof of True · · ·⟩
  | P′      ↦ None

We assign a dependent type to this function, requiring that the proof objects returned by the function prove the proposition that is given as an argument. This is done by having the pattern matching be dependently typed too; we will see the details of how this is achieved after we describe our type system. We use the notation LT(·) to denote lifting a logical term into the level of computational types; thus the function's type will be:

ΠP : Prop. option LT(P)
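For readers unfamiliar with this notation, the do-notation and e1 || e2 can be read in terms of two ordinary combinators over the ML option type. The following OCaml sketch, with names of our own choosing rather than anything from the VeriML implementation, gives one possible desugaring:

    (* bind: run the rest of the do-block only if the first step succeeded *)
    let bind (x : 'a option) (f : 'a -> 'b option) : 'b option =
      match x with
      | Some v -> f v
      | None -> None

    (* orelse: e1 || e2 evaluates e2 only if e1 fails *)
    let orelse (x : 'a option) (y : 'a option Lazy.t) : 'a option =
      match x with
      | Some _ -> x
      | None -> Lazy.force y

Under this reading, do pf1 ← tauto P1; pf2 ← tauto P2; e stands for bind (tauto P1) (fun pf1 -> bind (tauto P2) (fun pf2 -> e)).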
3.2 Handling binders and free variables

Next we want to handle universally quantified propositions. Our procedure needs to go below the quantifier, and attempt to prove the body of the proposition; it will succeed if this can be done parametrically with regards to the new variable. In this case, the procedure we have described so far will need to be run recursively on a proposition with a new free variable (the body of the quantifier). To avoid capture of variables, we need to keep track of the free variables used so far in the proposition; we do this by having the function also take the free-variables context of the proposition as an argument. Also, we annotate logical terms with the free-variables context they refer to. Thus we will be handling contextual terms in our computational language, i.e., logical terms packaged with the free-variables context they refer to. We use φ to range over contexts (their kind is abbreviated as ctx), and the notation [φ] Prop to denote propositions living in a context φ; a similar notation is used for other logical terms. The new type of our function will thus be:

Πφ : ctx. ΠP : [φ] Prop. option LT([φ] P)

The new code for the function will look like the following:

tauto φ P = holcase P of
    P1 ∧ P2   ↦ do pf1 ← tauto φ P1; pf2 ← tauto φ P2;
                ⟨· · · proof of P1 ∧ P2 · · ·⟩
  | ∀x : A.P  ↦ do pf ← tauto (φ, x : A) P;
                ⟨· · · proof of ∀x : A.P · · ·⟩
  | · · ·

Here the proof object pf returned by the recursive call on the body of the quantifier will depend on an extra free variable; this dependence is to be discharged inside the proof object for ∀x : A.P.

Let us now consider the case of handling a propositional implication like P1 → P2. In this case we would like to keep information about P1 being a hypothesis, so as to use it as a fact if it is later encountered inside P2 (e.g. to prove a tautology like P → P). The way we can encode this in our language is to have our tauto procedure carry an extra list of hypotheses. The default case of pattern matching can be changed so that this list is searched instead of returning None as we did above. Each element in the hypothesis list should carry a proof object of the hypothesis too, so that we can return it if the hypothesis matches the goal. Thus each element of the hypothesis list must be an existential package, bundling together the proposition and the proof object of each hypothesis:

hyplist = λφ : ctx. list (ΣH : [φ] Prop. LT([φ] H))

The type list is the ML list type; we assume that the standard map and fold_left functions are available for it. For this data structure, we define the function hyplistWeaken, which lifts a hypotheses list from one context to an extended one; and the function hyplistFind, which, given a proposition P and a hypotheses list, goes through the list trying to find whether the proposition P is included, returning the associated proof object if it is. For hyplistWeaken we only give its desired type; we will see its full definition in Section 5, after we have introduced some details about the logic and the computational language.

hyplistWeaken : Πφ : ctx. ΠA : [φ] Prop. hyplist φ → hyplist (φ, x : A)

hypMatch : Πφ : ctx. ΠP : [φ] Prop. (ΣP′ : [φ] Prop. LT([φ] P′)) → option LT([φ] P)
hypMatch φ P hyp = let ⟨P′, pf′⟩ = hyp in
                   holcase P of
                     P′ ↦ Some pf′
                   | _  ↦ None

hyplistFind : Πφ : ctx. ΠP : [φ] Prop. hyplist φ → option LT([φ] P)
hyplistFind φ P hl = fold_left (λres. λhyp. res || hypMatch φ P hyp) None hl

Note that we use the notation ⟨·, ·⟩ as the introduction form for existential packages, and let ⟨·, ·⟩ = · in · · · as the elimination form.

The tauto tactic should itself be modified as follows. When trying to prove P1 → P2, we want to add P1 as a hypothesis to the current hypotheses list hl. In order to provide a proof object for this hypothesis, we introduce a new free variable pf1 representing the proof of P1 and try to prove P2 recursively in this extended context using the extended hypotheses list. Note that we need to lift all terms in the hypotheses list to the new context before being able to use it; this is what hyplistWeaken is used for. The proof object returned for P2 might mention the new free variable pf1. This extra dependence is discharged using the implication introduction axiom of the underlying logic, yielding a proof of P1 → P2. Details are shown below.

tauto : Πφ : ctx. ΠP : [φ] Prop. hyplist φ → option LT([φ] P)
tauto φ P hl = holcase P of
    · · ·
  | P1 → P2 ↦ let hl′ = hyplistWeaken φ P1 hl in
              let hl″ = cons ⟨P1, ⟨[φ, pf1 : P1] pf1⟩⟩ hl′ in
              do x ← tauto (φ, pf1 : P1) P2 hl″;
              ⟨· · · proof of P1 → P2 · · ·⟩
  | _ ↦ hyplistFind φ P hl

3.3 General recursion

We have extended this procedure in order to deconstruct the hypothesis before entering it in the hypothesis list (e.g. entering two different hypotheses for P1 ∧ P2 instead of just one, etc.), but this extension does not give us any new insights with respect to the use of our language, so we do not show it here. A more interesting modification we have made is to extend the procedure that searches the hypothesis list for the current goal, so that when trying to prove the goal G, a hypothesis like H′ → G can be used, making H′ a new goal. This is easy to achieve: we can have the hyplistFind procedure used above be mutually recursive with tauto, and have it pattern-match on the hypotheses, calling tauto recursively for the newly generated goals. Still, we need to be careful in order to avoid recursive cycles. A naive implementation would be thrown into an endless loop if a proof for a proposition like (A → B) → (B → A) → A was attempted. The way to solve this is to have the two procedures maintain a list of "already visited" goals, so that we avoid entering a cycle. Using the techniques we have seen so far, this is easy to encode in our language. This extension complicates the termination argument for our tactic substantially, but since we are working in a computational language allowing non-termination, we do not need to formalize this argument. This is a point of departure compared to an implementation of this tactic based on proof-by-reflection, e.g. similar to what is described in Chlipala [2008]. In that case, the most essential parts of the code we are describing would be written inside the computational language embedded inside the logic, and as such would need to be provably terminating. The complicated termination argument required would make the effort required for this extension substantial. Compared to programming a similar tactic in a language like ML (following the traditional LCF approach), in our implementation the partial correctness of the tactic is established statically. This is something that would otherwise be only achievable by using a proof-by-reflection based implementation.

3.4 Imperative features

The second example that we will consider is a decision procedure that handles the theory of equality, one of the basic theories that SMT solvers support. This is the theory generated from the axioms:

∀x. x = x
∀x, y. x = y → y = x
∀x, y, z. x = y → y = z → x = z

In a logic like the one we are using, the standard definition of equality includes a single constructor for the reflexivity axiom; the other axioms above can then be proved as theorems using inductive elimination. We will see how this is done in the next section. Usually this theory is extended with the axiom:

∀x, y. x = y → f x = f y

which yields the theory of equality with uninterpreted functions (EUF). We have implemented this extension but we will not consider it here because it does not add any new insights. To simplify our presentation, we will also assume that all terms taking part in equations are of a fixed type A.

We want to create a decision procedure that gets a list of equations as hypotheses, and then proves whether two terms are equal or not, according to the above axioms and based on the given equations. The standard way to write such a decision procedure is to use a union-find data structure to compute the equivalence classes of terms, based on the given equations. We will use a simple algorithm, described in Bradley and Manna [2007], which still requires imperative features in order to be implemented efficiently. Terms are assigned nodes in a tree-like data structure, which usually gets implemented as an array. Each equivalence class has one representative; each node representing a term has a pointer to a parent term, which is another member of its equivalence class; if a term's node points to itself, then it is the representative of its class. We can thus find the representative of the equivalence class where a term belongs by successively following pointers, and we can merge two equivalence classes by making the representative of one class point to the representative of the other.

We want to stay as close as possible to this algorithm, yet have our procedure yield proof objects for the claimed equations. The union operation, shown next, finds the representatives of the classes of its two arguments and then updates the one representative to point to the other.

union : ΠX : [φ] A. ΠX′ : [φ] A. Πpf : [φ] X = X′. eqhash → unit
union X X′ pf h = let ⟨Xrep, pf1 :: X = Xrep⟩ = find X h in
                  let ⟨X′rep, pf2 :: X′ = X′rep⟩ = find X′ h in
                  holcase Xrep of
                  | · · ·

A last function is needed, which will be used to check whether, in the current hash table, two terms are equal or not. Its type will be:

areEqual? : ΠX : ([φ] A). ΠX′ : ([φ] A). eqhash → option LT([φ] X = X′)

Its implementation is very similar to the above function. In the implementation we have seen above, we have used an imperative data structure with a dependent data type that imposes an algorithm-specific invariant. Because of this, rich type information is available while developing the procedure, and the type restrictions impose a principled programming style. At the same time, a multitude of bugs that could occur in an ML-based implementation are avoided: at each point where a proof object is explicitly given in the above implementation, we know that it proves the expected proposition, while in an ML-based implementation, no such guarantee is given. Still, adapting the standard implementation of the algorithm to our language is relatively straightforward, and we do not need to use fundamentally different data structures, as we would need to do if we were developing this inside the computational language of a logic (since only functional data structures could be used).

We choose to encode the union-find data structure as a hash table; this table will map each term into a (mutable) value representing its parent term. Since we also want to yield proofs, we need to also store information on how the two terms are equal. We can encode such a hash table using the following type (assuming that terms inhabit a context φ):

eqhash = array (ΣX : [φ] A. ΣX′ : [φ] A. LT([φ] X = X′))
4. We can read the type of elements in the array as key-value pairs, where the key is the first term of type A, and the value is the existential package of its parent along with an appropriately typed proof object. Implementing such a hash-table structure is straightforward, provided that there exists an appropriate construct in our computational language to compute a hash value for a logical term. We can have dependent types for the get/set functions as follows:
eqhashGet : ΠX : [φ] A.eqhash → option (ΣX′ : [φ] A.LT([φ] X = X′))
eqhashSet : ΠX : [φ] A.ΠX′ : [φ] A.Πpf : [φ] X = X′.eqhash → unit

The find operation for the union-find data structure can now be implemented simply, using the following code. Given a term, we need to return the representative of its equivalence class, along with a proof of equality of the two terms. We look up the given term in the hash table, and keep following links to parents until we end up in a term that links to itself, building up the equality proof as we go; if the term does not exist in the hash table, we simply add it.

find : ΠX : ([φ] A).eqhash → ΣX′ : ([φ] A).LT([φ] X = X′)
find X h = (do x ← eqhashGet X h;
               let ⟨X′, pf :: X = X′⟩ = x in
               holcase X of
                 X′ ↦ ⟨X′, pf⟩
               | _  ↦ let ⟨X″, pf′ :: X′ = X″⟩ = find X′ h in
                      ⟨X″, ⟨· · · proof of X = X″ · · ·⟩⟩)
        || (let self = ⟨X, · · · proof of X = X · · ·⟩ in
            (eqhashSet X self); self)

The union operation is given two terms along with a proof that they are equal, and updates the hash table accordingly: it uses find to get the representatives of the two terms, and if they do not match, it updates the one representative to point to the other.

union : ΠX : [φ] A.ΠX′ : [φ] A.Πpf : [φ] X = X′.eqhash → unit
union X X′ pf h = let ⟨Xrep, pf1 :: X = Xrep⟩ = find X h in
                  let ⟨X′rep, pf2 :: X′ = X′rep⟩ = find X′ h in
                  holcase Xrep of
                    X′rep ↦ ()
                  | _     ↦ let upd = ⟨X′rep, · · · proof of Xrep = X′rep · · ·⟩ in
                            eqhashSet Xrep upd

A last function is needed, which will be used to check whether, in the current hash table, two terms are equal or not. Its type will be:

areEqual? : ΠX : ([φ] A).ΠX′ : ([φ] A).eqhash → option LT([φ] X = X′)

Its implementation is very similar to the above functions. In the implementation we have seen above, we have used an imperative data structure with a dependent data type that imposes an algorithm-specific invariant. Because of this, rich type information is available while developing the procedure, and the type restrictions impose a principled programming style. At the same time, a multitude of bugs that could occur in an ML-based implementation are avoided: at each point where a proof object is explicitly given in the above implementation, we know that it proves the expected proposition, while in an ML-based implementation no such guarantee is given. Still, adapting the standard implementation of the algorithm to our language is relatively straightforward, and we do not need to use fundamentally different data structures, as we would if we were developing this inside the computational language of a logic (where only functional data structures could be used).
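The shape of this procedure can be previewed in plain OCaml. The sketch below is our own and every name in it is hypothetical; since OCaml cannot state the dependent invariant that the stored proof relates the key to its parent, proofs are modelled by an unchecked placeholder datatype, whereas in VeriML the type system enforces the invariant.

  (* A minimal OCaml sketch of the proof-carrying union-find of Section 3.4.
     The proof type is an unchecked stand-in for the LT(...) packages above. *)
  type term = string                      (* stand-in for logical terms of type A *)

  type proof =                            (* placeholder proof objects *)
    | Refl of term                        (* proof of X = X *)
    | Axiom of term * term                (* a given equation X = X' *)
    | Sym of proof                        (* X' = X from X = X' *)
    | Trans of proof * proof              (* X = X'' from X = X' and X' = X'' *)

  (* The table maps a term to (parent, proof that term = parent). *)
  let (table : (term, term * proof) Hashtbl.t) = Hashtbl.create 64

  (* find returns the representative of x, with a proof that x = rep. *)
  let rec find (x : term) : term * proof =
    match Hashtbl.find_opt table x with
    | None ->
        Hashtbl.replace table x (x, Refl x);   (* x is new: points to itself *)
        (x, Refl x)
    | Some (p, pf) when p = x -> (x, pf)       (* x is its own representative *)
    | Some (p, pf) ->
        let rep, pf' = find p in
        (rep, Trans (pf, pf'))                 (* x = p and p = rep, so x = rep *)

  (* union merges the classes of x and x', justified by pf : x = x'. *)
  let union (x : term) (x' : term) (pf : proof) : unit =
    let xrep, pf1 = find x in                  (* x  = xrep  *)
    let xrep', pf2 = find x' in                (* x' = xrep' *)
    if xrep <> xrep' then
      (* chain xrep = x, x = x', x' = xrep' into xrep = xrep' *)
      Hashtbl.replace table xrep (xrep', Trans (Sym pf1, Trans (pf, pf2)))

  (* are_equal checks whether two terms are currently provably equal. *)
  let are_equal (x : term) (x' : term) : proof option =
    let xrep, pf1 = find x in
    let xrep', pf2 = find x' in
    if xrep = xrep' then Some (Trans (pf1, Sym pf2)) else None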
4. The logic language λHOLind

We will now focus on the formal logic that we are using. We use a higher-order logic with support for inductive definitions of datatypes, predicates and logical connectives; such inductive definitions give rise to inductive elimination axioms. Also, total recursive functions can be defined, and terms of the logic are identified up to evaluation of these functions. Our logical framework also includes explicit proof objects, which can be viewed as witnesses of derivations in the logic.

This framework, which we call λHOLind, is based on λHOL as presented in Barendregt and Geuvers [1999], extended with inductive definitions and a reduction relation for total recursive functions, in the style of CIC [Barras et al. 2010]. Alternatively, we can view this framework as a subset of CIC where we have omitted universes other than Prop and Type, as well as polymorphism and dependent types in Type. Logical consistency of CIC [Werner 1994] therefore implies logical consistency of our system. Still, a simpler metatheory based on reducibility candidates is possible. We view this logical framework as a common core between proof assistants like Coq and the HOL family that is still expressive enough for many applications. At the same time, we believe that it captures most of the complexities of their logics (e.g. the notion of computation in CIC), so that the results that we have for this framework can be directly extended to them.

The syntax of our framework is presented in Figure 2. The syntactic category d includes propositions (which we denote as P) and predicates, as well as objects of our domain of discourse: terms of inductively defined datatypes, and total functions between them. Inductive definitions come from a definitions environment ∆; total functions are defined by primitive recursion (using the Elim(·, ·) construct). Terms of this category get assigned kinds of the syntactic category K, with all propositions being assigned kind Prop. Inductive datatypes are defined at this level of kinds. We can view Prop as a distinguished datatype, whose terms can get extended through inductive definitions of predicates and logical connectives. Kinds get assigned the sort Type, which in turn gets assigned the (external) sort Type′, so that contexts can include variables over Type. The last syntactic category of our framework is π, representing proof objects, which get assigned a proposition as their type. We can think of terms at this level as corresponding to the different axioms of our logic; e.g. function application witnesses the implication elimination rule (modus ponens). We include terms for performing proof by induction on inductive datatypes and inductive predicates. Using the syntactic category t we represent terms at any of the levels we have described; at the level of variables we do not distinguish between these different levels.
(sorts)                 s ::= Type | Type′
(kinds)                 K ::= Prop | cK | K1 → K2 | x
(domain obj./props.)    d, P ::= d1 → d2 | ∀x : K.d | λx : K.d | d1 d2 | cd | x | Elim(cK, K′)
(proof objects)         π ::= x | λx : P.π | π1 π2 | λx : K.π | π d | cπ | elim cK | elim cd
(HOL terms)             t ::= s | K | d | π
(logic variables env.)  Φ ::= • | Φ, x : t
(definitions env.)      ∆ ::= • | ∆, Inductive cK : Type := { cd : K; · · · }
                            | ∆, Inductive ct (x : K) : K1 → · · · → Kn → Prop := { cπ : P; · · · }

Figure 2. Syntax of the base logic language λHOLind

To see how inductive definitions are used, let us consider the case of natural numbers. Their definition would be as follows:

Inductive Nat : Type := zero : Nat | succ : Nat → Nat.

This gives rise to the Nat kind, the zero and succ constructors at the domain objects level, and the elimination axiom elim Nat at the proof object level, which witnesses induction over natural numbers, having the following type:

∀P : Nat → Prop. P zero → (∀x : Nat. P x → P (succ x)) → ∀x : Nat. P x

Similarly we can define predicates, like equality of natural numbers, or logical connectives, through inductive definitions at the level of propositions:

Inductive (=Nat) (x : Nat) : Nat → Prop := refl : x =Nat x.
Inductive (∧) (A B : Prop) : Prop := conj : A → B → A ∧ B.

From the definition of =Nat we get Leibniz equality as the elimination principle, from which the axioms mentioned in the previous section are easy to prove.

elim (=Nat) : ∀x : Nat. ∀P : Nat → Prop. P x → ∀y : Nat. x =Nat y → P y

Last, recursive functions over natural numbers can also be defined through the Elim(Nat, K) construct: Elim(Nat, K) n fz fs proceeds by performing primitive recursion on the natural number n, given fz : K and fs : Nat → K → K, returning a term of kind K. For example, we can define the addition function for Nat as:

plus = λx, y : Nat. Elim(Nat, Nat) x y (λx′, rx′ : Nat. succ rx′)

Functions defined through primitive recursion are permitted to return propositions (where K = Prop), something that is crucial in order to prove theorems like ∀x : Nat. zero ≠ succ x.
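To make the recursion scheme concrete, here is how Elim(Nat, K) and plus could be mirrored in ordinary OCaml. This is only an illustrative sketch of ours (general recursion in OCaml is not restricted to primitive recursion, so the totality guarantee of the logic is not captured):

  (* A sketch of the Nat eliminator and addition by primitive recursion.
     elim_nat mirrors Elim(Nat, K): given fz : K and fs : Nat -> K -> K,
     it computes by recursion on its Nat argument. *)
  type nat = Zero | Succ of nat

  let rec elim_nat (n : nat) (fz : 'k) (fs : nat -> 'k -> 'k) : 'k =
    match n with
    | Zero -> fz
    | Succ n' -> fs n' (elim_nat n' fz fs)

  (* plus = λx y. Elim(Nat, Nat) x y (λx' rx'. succ rx') *)
  let plus (x : nat) (y : nat) : nat =
    elim_nat x y (fun _x' rx' -> Succ rx')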
We present the main typing judgements of λHOLind in Figure 3. These judgements use the logic variables environment Φ. To simplify the presentation, we assume that the definitions environment ∆ is fixed, and therefore do not explicitly include it in our judgements. We have not included its well-formedness rules; these should include the standard checks for positivity of inductive definitions, and are defined following CIC (see for example [Paulin-Mohring 1993]). Similarly, we have omitted the typing rules for the elim constructs. We can view this as a standard PTS with sorts S = {Prop, Type, Type′}, axioms A = {(Prop, Type), (Type, Type′)} and rules R = {(Prop, Prop, Prop), (Type, Prop, Prop), (Type, Type, Type)}, extended with inductive definitions and elimination at the levels we described earlier. In later sections we follow this "collapsed" view, using the single typing judgement Φ ⊢ t : t′ for terms of all levels.

Of interest is the PO-CONVERT typing rule for proof objects. We define a limited notion of computation within the logic language, composed of the standard β-reduction for normal β-redexes and an additional ι-reduction (defined as in CIC), which performs case reduction and evaluation of recursive function applications. With this rule, logical terms (propositions, terms of inductive datatypes, etc.) that are βι-equivalent are effectively identified for type-checking purposes. Thus a proof object for the proposition 2 =Nat 2 can also be seen as a proof object for the proposition 1 + 1 =Nat 2, since both propositions are equivalent when evaluated to normal forms. Because of this particular feature of having a notion of computation within the logic language, λHOLind follows the Poincaré principle, which we view as one of the points of departure of CIC compared to HOL. We have included this in our logic to show that a computational language like the one we propose in the next section is still possible for such a framework.

Typing for domain objects, propositions and predicates:

  x : K ∈ Φ
  ─────────── DP-VAR
  Φ ⊢ x : K

  Φ ⊢ P1 : Prop    Φ ⊢ P2 : Prop
  ──────────────────────────────── DP-IMPL
  Φ ⊢ P1 → P2 : Prop

  Φ, x : K ⊢ P : Prop
  ───────────────────── DP-FORALL
  Φ ⊢ ∀x : K.P : Prop

  Φ, x : K ⊢ d : K′
  ─────────────────────── DP-LAM
  Φ ⊢ λx : K.d : K → K′

  Φ ⊢ d1 : K → K′    Φ ⊢ d2 : K
  ─────────────────────────────── DP-APP
  Φ ⊢ d1 d2 : K′

Typing for proof objects:

  x : P ∈ Φ
  ─────────── PO-VAR
  Φ ⊢ x : P

  Φ, x : P ⊢ π : P′    Φ ⊢ P → P′ : Prop
  ───────────────────────────────────────── PO-IMPI
  Φ ⊢ λx : P.π : P → P′

  Φ ⊢ π1 : P → P′    Φ ⊢ π2 : P
  ─────────────────────────────── PO-IMPE
  Φ ⊢ π1 π2 : P′

  Φ, x : K ⊢ π : P′    Φ ⊢ ∀x : K.P′ : Prop
  ─────────────────────────────────────────── PO-FORALLI
  Φ ⊢ λx : K.π : ∀x : K.P′

  Φ ⊢ π : ∀x : K.P′    Φ ⊢ d : K
  ──────────────────────────────── PO-FORALLE
  Φ ⊢ π d : P′[d/x]

  Φ ⊢ π : P    P =βι P′
  ──────────────────────── PO-CONVERT
  Φ ⊢ π : P′

Figure 3. Main typing judgements of λHOLind (selected rules)
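The conversion check behind PO-CONVERT can be pictured concretely: a checker normalizes both propositions and compares the results. The following OCaml sketch is ours, with a toy term language of numerals and plus standing in for λHOLind, and evaluation standing in for ι-reduction:

  (* Two propositions are identified when their βι-normal forms coincide;
     e.g. a proof of 2 = 2 is accepted as a proof of 1 + 1 = 2. *)
  type tm = Lit of int | Plus of tm * tm

  let rec eval (t : tm) : int =
    match t with
    | Lit n -> n
    | Plus (a, b) -> eval a + eval b

  let convertible (a : tm) (b : tm) : bool = eval a = eval b

  let () = assert (convertible (Plus (Lit 1, Lit 1)) (Lit 2))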
4.1 Extension with meta-variables

As we have mentioned in the previous section, our computational language will manipulate logical terms living in different contexts.
In order to be able to type-check these terms properly, we introduce a new class of terms T called contextual terms, which package a logical term along with its free variables environment. We write a contextual term as [Φ]t, where t can mention variables out of Φ. We identify these terms up to alpha equivalence (that is, renaming of variables in Φ and t). Furthermore, we need a notion of contextual variables or meta-variables. Our computational language will provide ways to abstract over such variables, which stand for contextual terms T. We denote meta-variables as X, and use capital letters for them. To use a meta-variable X inside a logical term t, we need to make sure that when it gets substituted with a contextual term T = [Φ′]t′, the resulting term t[T/X] will still be properly typed. Since t′ refers to different free variables compared to t, we need a way to map them into terms that only refer to the same variables as t. This mapping is provided by giving an explicit substitution when using the variable X. The syntax of our logic is extended accordingly, as shown in Figure 4.

(contextual terms)     T ::= [Φ]t
(meta-variables env.)  M ::= • | M, X : T
(substitution)         σ ::= • | σ, t
                       K ::= · · · | X/σ
                       d, P ::= · · · | X/σ
                       π ::= · · · | X/σ

Figure 4. Syntax extension of λHOLind with contextual terms and meta-variables

Since logical terms t now include meta-variables, we need to refer to an additional meta-context M. Thus the main typing judgement of the base logic, Φ ⊢ t : t′, is extended to include this new environment, resulting in a typing judgement of the form M; Φ ⊢ t : t′. Existing typing rules ignore the extra M environment; what is interesting is the rule for the use of a meta-variable. This is as follows:

  X : T ∈ M    T = [Φ′]t    M; Φ ⊢ σ : Φ′
  ─────────────────────────────────────────
  M; Φ ⊢ X/σ : t[σ/Φ′]

We use the typing judgement M; Φ ⊢ σ : Φ′ to check that the explicit substitution provides a term of the appropriate type under the current free variables context, for each one of the free variables in the context associated with the meta-variable X. The judgement M; Φ ⊢ σ : Φ′ is defined below.

  ──────────────
  M; Φ ⊢ • : •

  M; Φ ⊢ σ : Φ′    M; Φ ⊢ t : t′[σ/Φ′]
  ──────────────────────────────────────
  M; Φ ⊢ (σ, t) : (Φ′, x : t′)

A little care is needed, since there might be dependencies between the types of the elements of the context. Thus, when type-checking a substitution against a context, we might need to apply part of the substitution in order to get the type of another element in the context. This is done by the simultaneous substitution [σ/Φ] of variables in Φ by terms in σ. To simplify this procedure, we treat the context Φ and the substitution σ as ordered lists that adhere to the same variable order.

To type a contextual term T = [Φ]t, we use the normal typing judgement for our logic to type the packaged term t under the free variables context Φ. The resulting type will be another contextual term T′ associated with the same free variables context. Thus the only information that is needed in order to type-check a contextual term T is the meta-context M. The judgement M ⊢ T : T′ for typing contextual terms therefore looks as follows:

  T = [Φ]t    M ⊢ Φ    M; Φ ⊢ t : t′
  ────────────────────────────────────
  M ⊢ T : [Φ]t′

We use the judgement M ⊢ Φ to make sure that Φ is a well-formed context, i.e. that all the variables defined in it have a valid type; dependencies between them are allowed.

  ───────
  M ⊢ •

  M ⊢ Φ    M; Φ ⊢ t : t′
  ────────────────────────
  M ⊢ Φ, x : t

Last, let us consider how to apply the substitution [T/X] (where T = [Φ]t) to a logical term t′. In most cases the substitution is simply applied recursively to the subterms of t′. The only special case is when t′ = X/σ. In this case, the meta-variable X should be substituted by the logical term t; its free variables Φ are mapped to terms meaningful in the same context as the original term t′ using the substitution σ. Thus:

(X/σ)[T/X] = t[σ/Φ], when T = [Φ]t

For example, consider the case where X : [a : Nat, b : Nat] Nat, t′ = plus (X/(1, 2)) 0 and T = [a : Nat, b : Nat] plus a b. We have:

t′[T/X] = plus ((plus a b)[1/a, 2/b]) 0 = plus (plus 1 2) 0 =βι 3

The above rule is not complete: the substitution σ is still permitted to use X based on our typing rules, and thus we have to re-apply the substitution of T for X within σ. Note that no circularity is involved, since at some point a substitution σ associated with X must not refer to it; the term would otherwise have infinite depth. Thus the correct rule is:

(X/σ)[T/X] = t[(σ[T/X])/Φ], when T = [Φ]t
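As a concrete illustration of this definition, the following OCaml sketch (ours) implements the substitution on a toy first-order term language. For simplicity it maps the variables of Φ to the entries of σ sequentially rather than simultaneously, which suffices for illustration:

  (* Substituting a contextual term T = [Φ]t for a meta-variable X,
     following (X/σ)[T/X] = t[(σ[T/X])/Φ]. *)
  type tm =
    | Var of string                       (* ordinary logical variable *)
    | App of tm * tm
    | Meta of string * tm list            (* X/σ: a meta-variable under a substitution *)

  type ctm = { ctx : string list; body : tm }   (* contextual term [Φ]t *)

  let rec subst_var (x : string) (u : tm) (t : tm) : tm =
    match t with
    | Var y -> if x = y then u else t
    | App (a, b) -> App (subst_var x u a, subst_var x u b)
    | Meta (m, s) -> Meta (m, List.map (subst_var x u) s)

  let rec msubst (x : string) (c : ctm) (t : tm) : tm =
    match t with
    | Var _ -> t
    | App (a, b) -> App (msubst x c a, msubst x c b)
    | Meta (m, s) ->
        let s' = List.map (msubst x c) s in   (* σ[T/X]: σ may itself mention X *)
        if m = x then
          (* replace X by the body of T, mapping each variable of Φ to σ' *)
          List.fold_left2 (fun acc v u -> subst_var v u acc) c.body c.ctx s'
        else Meta (m, s')

  (* The example of the text: X : [a, b] Nat, t' = plus (X/(1,2)) 0 and
     T = [a, b] plus a b give t'[T/X] = plus (plus 1 2) 0. *)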
4.2 Extension with parametric contexts

As we saw from the programming examples in the previous section, it is also useful in our computational language to be able to specify that a contextual term depends on a parametric context. Towards that effect, we extend the syntax of our logic in Figure 5, introducing a notion of context variables, denoted as φ, which stand for an arbitrary free variables context Φ. These context variables are defined in the environment W. The definition of the logical variables context Φ is extended so that context variables can be part of it; thus Φ contexts become parametric. In essence, a context variable φ inside a context Φ serves as a placeholder where more free variables can be substituted; this is permitted because of weakening.

(context env.)  W ::= • | W, φ : ctx
                Φ ::= · · · | Φ, φ
                σ ::= · · · | σ, idφ

Figure 5. Syntax extension of λHOLind with parametric contexts

We extend the typing judgements we have seen so that the W environment is also included; a summary of this final form of the judgements is given in Figure 6.

W; M; Φ ⊢ t : t′    Typing for logical terms
W; M ⊢ T : T′       Typing for contextual terms
W; M; Φ ⊢ σ : Φ′    Typing for substitutions
W; M ⊢ Φ            Well-formedness for logical variables contexts

Figure 6. Summary of extended λHOLind typing judgements

The typing judgement that checks well-formedness of a context Φ is extended so that context variables defined in the context W are permitted to be part of it:
  W; M ⊢ Φ    φ : ctx ∈ W
  ─────────────────────────
  W; M ⊢ Φ, φ

With this change, meta-variables X and contextual terms T can refer to a parametric context Φ, by including a context variable φ at some point. Explicit substitutions σ associated with uses of meta-variables must also be extended so that they can correspond to such parametric contexts; this is done by introducing the identity substitution idφ for each context variable φ. The typing rule for checking a substitution σ against a context Φ is extended accordingly:

  W; M; Φ ⊢ σ : Φ′    φ ∈ Φ
  ──────────────────────────────
  W; M; Φ ⊢ (σ, idφ) : (Φ′, φ)

When substituting a context Φ for a context variable φ inside a logical term t, the substitution gets propagated inside the subterms of t. Again the interesting case is when t corresponds to a use of a meta-variable (t = X/σ). In that case, we need to replace the identity substitution idφ in the explicit substitution σ by the actual identity substitution for the context Φ. This is done using the idsubst(·) function:

  idφ[Φ/φ] = idsubst(Φ)    where:

  idsubst(•)        = •
  idsubst(Φ, x : t) = idsubst(Φ), x
  idsubst(Φ, φ)     = idsubst(Φ), idφ
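This function is directly implementable; here is a minimal OCaml sketch of ours, with contexts as ordered lists of entries:

  (* idsubst: the identity substitution of a context, used when a concrete
     context Φ replaces a context variable φ. *)
  type entry =
    | EVar of string          (* an ordinary entry x : t (type omitted here) *)
    | ECtxVar of string       (* a context variable φ *)

  type subst_item =
    | SVar of string          (* the variable itself, x *)
    | SId of string           (* id_φ for a context variable φ *)

  (* idsubst(•) = •;  idsubst(Φ, x:t) = idsubst(Φ), x;  idsubst(Φ, φ) = idsubst(Φ), id_φ *)
  let idsubst (phi : entry list) : subst_item list =
    List.map (function EVar x -> SVar x | ECtxVar f -> SId f) phi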
With the above in mind, it is easy to see how a proof object for P1 → P2 living in context φ can be created, when all we have is a proof object X for P2 living in the context φ, pf1 : P1:

  [φ] λy : P1.(X/(idφ, y))

This can be used in the associated case of the tautology prover example given earlier, filling in the term ⟨· · · proof of P1 → P2 · · ·⟩.

4.3 Metatheory

We have proved that substitution of a contextual term T for a meta-variable X, and substitution of a context Φ for a context variable φ, preserve the typing of logical terms t. The statements of these substitution lemmas are:

Lemma 4.1 If M, X₀ : T, M′; Φ ⊢ t : t′ and M ⊢ T₀ : T, then M, M′[T₀/X₀]; Φ[T₀/X₀] ⊢ t[T₀/X₀] : t′[T₀/X₀].

Lemma 4.2 If M, M′; Φ, φ₀, Φ′ ⊢ t : t′ and W; M; Φ ⊢ Φ₀, then M, M′[Φ₀/φ₀]; Φ, Φ₀, Φ′ ⊢ t[Φ₀/φ₀] : t′[Φ₀/φ₀].

These are proved by straightforward mutual structural induction, along with similar lemmas for explicit substitutions σ, contextual terms T and contexts Φ, because of the inter-dependencies between them. The proofs only depend on a few lemmas for the core of the logic that we have described, namely the standard simultaneous substitution lemma, the weakening lemma, and preservation of βι-equality under simultaneous substitution. Details are provided in the extended version of this paper [Stampoulis and Shao 2010].

These extensions are inspired by contextual modal type theory and the Beluga framework; here we show how they can be adapted to a different logical framework, like the one we have described. Compared to Beluga, one of the main differences is that we do not support first-class substitutions, because so far we have not found abstraction over substitutions in our computational language to be necessary. Also, context variables are totally generic, not constrained by a context schema. The reason for this is that we will not use our computational language as a proof meta-language, so coverage and totality of definitions in it are not needed; thus context schemata are not necessary either. Last, our Φ contexts are ordered and therefore permit multiple context variables φ in them; this is mostly presentational.

5. The computational language

Having described the logical framework we are using, we are ready to describe the details of our computational language. The ML fragment that we support is shown in Figure 7 and consists of algebraic datatypes, higher-order function types, the native integer and boolean datatypes, mutable arrays, as well as polymorphism over types. For presentation purposes, we regard mutable references as one-element arrays. We use bold face for variables x of the computational language in order to differentiate them from logical variables. In general, we assume that we are given full typing derivations for well-typed terms; issues of type reconstruction are left as future work. The syntax for the new kinds, types and expressions of this language is given in Figure 8, while the associated typing rules and small-step operational semantics are given in Figures 9 and 10 respectively. Other than the pattern matching construct, the typing and operational semantics of the new constructs are entirely standard. We will describe them briefly, along with examples of their use.

K ::= ∗ | K1 → K2
τ ::= unit | int | bool | τ1 → τ2 | τ1 + τ2 | τ1 × τ2 | µα : K.τ | ∀α : K.τ | α | array τ | λα : K.τ | τ1 τ2 | · · ·
e ::= () | n | e1 + e2 | e1 ≤ e2 | true | false | if e then e1 else e2 | λx : τ.e | e1 e2 | (e1, e2) | proji e | inji e | case(e, x1.e1, x2.e2) | fold e | unfold e | Λα : K.e | e τ | fix x : τ.e | mkarray(e, e′) | e[e′] | e[e′] := e″ | l | error | · · ·
Γ ::= • | Γ, x : τ | Γ, α : K
Σ ::= • | Σ, l : array τ

Figure 7. Syntax for the computational language (ML fragment)

K ::= · · · | Πx : T.K | Πφ : ctx.K
τ ::= · · · | ΠX : T.τ | ΣX : T.τ | Πφ : ctx.τ | Σφ : ctx.τ | λX : T.τ | τ T | λφ : ctx.τ | τ Φ
e ::= · · · | λX : T.e | e T | ⟨T, e⟩ | let ⟨X, x⟩ = e in e′ | λφ : ctx.e | e Φ | ⟨Φ, e⟩ | let ⟨φ, x⟩ = e in e′ | holcase T of (p1 ↦ e1) · · · (pn ↦ en)
p ::= cd | p1 → p2 | ∀x : p1.p2 | λx : p1.p2 | p1 p2 | x | X/σ | Elim(cK, K′) | cK | Prop

Figure 8. Syntax for the computational language (new constructs)

Functions and existentials over contextual terms  Abstraction over a contextual logical term (λX : T.e) results in a dependent function type (ΠX : T.τ), assigning a variable name to this term so that additional arguments or results of the function can be related to it. Still, because of the existence of the pattern matching construct, such logical terms are runtime entities. We should therefore not view this abstraction as similar to abstraction over types or type indexes in other dependently typed programming languages; rather, it is a construct that gets preserved at runtime. Similarly, existential packages over contextual terms are also more akin to normal tuples. To lift an arbitrary HOL term to the computational language, we can use an existential package where the second member is of unit type. This operation is very common, so we introduce the following syntactic sugar at the type level, and for the introduction and elimination operations:

LT(T)              = ΣX : T.unit
⟨T⟩                = ⟨T, ()⟩
let ⟨X⟩ = e in e′  = let ⟨X, _⟩ = e in e′
  W; M ⊢ T : T′    W; M, X : T; Σ; Γ ⊢ e : τ
  ─────────────────────────────────────────────
  W; M; Σ; Γ ⊢ λX : T.e : ΠX : T.τ

  W; M; Σ; Γ ⊢ e : ΠX : T.τ    W; M ⊢ T′ : T
  ─────────────────────────────────────────────
  W; M; Σ; Γ ⊢ e T′ : τ[T′/X]

  W; M ⊢ T′ : T    W; M; Σ; Γ ⊢ e : τ[T′/X]
  ─────────────────────────────────────────────
  W; M; Σ; Γ ⊢ ⟨T′, e⟩ : ΣX : T.τ

  W; M; Σ; Γ ⊢ e : ΣX : T.τ
  W; M, X′ : T; Σ; Γ, x : τ[X′/X] ⊢ e′ : τ′    X′ ∉ fv(τ′)
  ──────────────────────────────────────────────────────────
  W; M; Σ; Γ ⊢ let ⟨X′, x⟩ = e in e′ : τ′

  W, φ : ctx; M; Σ; Γ ⊢ e : τ
  ────────────────────────────────────────
  W; M; Σ; Γ ⊢ λφ : ctx.e : Πφ : ctx.τ

  W; M; Σ; Γ ⊢ e : Πφ : ctx.τ    W; M ⊢ Φ wf
  ─────────────────────────────────────────────
  W; M; Σ; Γ ⊢ e Φ : τ[Φ/φ]

  W; M ⊢ Φ wf    W; M; Σ; Γ ⊢ e : τ[Φ/φ]
  ──────────────────────────────────────────
  W; M; Σ; Γ ⊢ ⟨Φ, e⟩ : Σφ : ctx.τ

  W; M; Σ; Γ ⊢ e : Σφ : ctx.τ
  W, φ′ : ctx; M; Σ; Γ, x : τ[φ′/φ] ⊢ e′ : τ′    φ′ ∉ fv(τ′)
  ────────────────────────────────────────────────────────────
  W; M; Σ; Γ ⊢ let ⟨φ′, x⟩ = e in e′ : τ′

  W; M ⊢ T : T′    T′ = [Φ]t′    W; M; Φ ⊢ t′ : Type
  ∀i, M; Φ ⊢ (pi ⇐ t′) ⇒ Mi    W; M, Mi; Σ; Γ ⊢ ei : τ[[Φ]pi/X]
  ────────────────────────────────────────────────────────────────
  W; M; Σ; Γ ⊢ holcase T of (p1 ↦ e1) · · · (pn ↦ en) : τ[T/X]

Figure 9. Typing judgement of the computational language (selected rules)

v  ::= λX : T.e | ⟨T, v⟩ | λφ : ctx.e | ⟨Φ, v⟩ | · · ·
E  ::= • | E T | ⟨T, E⟩ | let ⟨X, x⟩ = E in e′ | E Φ | ⟨Φ, E⟩ | let ⟨φ, x⟩ = E in e′ | · · ·
σM ::= • | σM, X ↦ T
µ  ::= • | µ, l ↦ [v1, · · · , vn]

  µ, e −→ µ′, e′
  ──────────────────────
  µ, E[e] −→ µ′, E[e′]

  µ, E[error] −→ µ, error

  µ, (λX : T.e) T′ −→ µ, e[T′/X]
  µ, (λφ : ctx.e) Φ −→ µ, e[Φ/φ]
  µ, let ⟨X, x⟩ = ⟨T, v⟩ in e′ −→ µ, e′[T/X][v/x]
  µ, let ⟨φ, x⟩ = ⟨Φ, v⟩ in e′ −→ µ, e′[Φ/φ][v/x]

  T = [Φ]t    Φ ⊢ unify(p1, t) = σM
  ──────────────────────────────────────────────────────────
  µ, holcase T of (p1 ↦ e1) · · · (pn ↦ en) −→ µ, e1[σM]

  T = [Φ]t    Φ ⊢ unify(p1, t) = ⊥
  ──────────────────────────────────────────────────────────────────────────────────────
  µ, holcase T of (p1 ↦ e1) · · · (pn ↦ en) −→ µ, holcase T of (p2 ↦ e2) · · · (pn ↦ en)

  µ, holcase T of • −→ µ, error

Figure 10. Operational semantics for the computational language (selected rules)
An example of the use of existential packages is the hyplistWeaken function of Section 3.2, which lifts a list of hypotheses from one context to an extended one. This works by lifting each package of a hypothesis and its associated proof object in turn to the extended context. We open up each package, getting two contextual terms referring to context φ; we then repackage them, having them refer to the extended context φ, x : A:

hypWeaken : Πφ : ctx.ΠA : [φ] Prop.(ΣH : [φ] Prop.LT([φ] H)) → (ΣH : [φ, x : A] Prop.LT([φ, x : A] H))
hypWeaken φ A hyp = let ⟨X, x1⟩ = hyp in
                    let ⟨X′, _⟩ = x1 in
                    ⟨([φ, x : A] (X/idφ)), ⟨[φ, x : A] (X′/idφ)⟩⟩

hyplistWeaken : Πφ : ctx.ΠA : [φ] Prop.hyplist φ → hyplist (φ, x : A)
hyplistWeaken φ A hl = map hypWeaken hl

Functions and existentials over contexts  Abstraction over contexts works as seen previously: we use it in order to receive the free variables context that further logical terms refer to. Existential packages containing contexts can be used in cases where we cannot statically determine the resulting context of a term. An example would be a procedure for conversion of a propositional formula to CNF, based on Tseitin's encoding [Tseitin 1968]. In this case, a number of new propositional variables might need to be introduced. We could therefore give the following type to such a function:

cnf : Πφ : ctx.ΠP : [φ] Prop.Σφ′ : ctx.LT([φ, φ′] Prop)

Erasure semantics are possible for these constructs, since there is no construct that inspects the structure of a context.

Type constructors  At the type level we allow type constructors abstracting both over contexts and over contextual terms. This is what enables the definition of the hyplist type constructor in Section 3.2. Similarly, we could take advantage of type constructors to define a generic type for hash tables where keys are logical terms of type [φ] A and where the type of values is dependent on the key. The type of values is thus given as another type constructor, and the overall type constructor for the hash table is:

hash : (ΠX : [φ] A.∗) → ∗
hash = λres : (ΠX : [φ] A.∗).array (ΣX : [φ] A.res X)

Implementation of such a data structure is possible because of a built-in hashing function in our computational language that maps any logical term to an integer. Some care is needed with this construct, since we want terms that are βι-equivalent to generate the same hash value (α-equivalence is handled implicitly by using de Bruijn indices). To that effect we need to reduce such terms to full βι-normal forms before computing the hash value.
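A hedged sketch of this hashing discipline in OCaml (ours; normalization is stubbed out on the same toy term language as before, where a real implementation would perform full βι-normalization on de Bruijn representations):

  (* Hash the βι-normal form, so that βι-equivalent terms collide. *)
  type tm = Lit of int | Plus of tm * tm

  let rec normalize (t : tm) : tm =
    match t with
    | Lit _ -> t
    | Plus (a, b) ->
        (match normalize a, normalize b with
         | Lit m, Lit n -> Lit (m + n)          (* ι-step: run the recursion *)
         | a', b' -> Plus (a', b'))

  let hash_term (t : tm) : int = Hashtbl.hash (normalize t)

  let () = assert (hash_term (Plus (Lit 1, Lit 1)) = hash_term (Lit 2))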
Static and dynamic semantics of pattern matching  The last new construct in our computational language is the pattern matching construct for logical terms. Let us first describe its typing rule, as seen in Figure 9. First we make sure that the logical term to be pattern matched upon (the scrutinee) is well-typed. Furthermore, only logical terms that correspond to propositions, predicates or objects of the domain of discourse are allowed to be pattern matched upon; we thus require that the scrutinee's kind is Type. Patterns can be viewed as normal logical terms (of the syntactic levels K and d of the logic) that contain certain unification variables; we will discuss their typing shortly. Unification variables are normal meta-variables that are newly introduced. The result type of the pattern matching construct is dependent on the scrutinee, enabling each branch to have a different type depending on its associated pattern.

At runtime, the patterns are unified against the scrutinee in sequence, and only the first succeeding branch is evaluated; an error occurs when no pattern can be matched. Unification merely checks whether the pattern and the scrutinee match up to βι-equivalence; if they do, it returns a substitution for the unification variables, which gets applied to the body of the branch.

Higher-order unification in a setting like the logic we are describing is undecidable. We therefore restrict the patterns allowed to linear patterns, where unification variables are used at most once. For efficiency reasons, we also impose the restriction that when we use a unification variable in a certain context, it must be applied to the identity substitution of that context. These restrictions are imposed using the pattern typing judgement M; Φ ⊢ (p ⇐ t) ⇒ M′. This judgement checks that p is a valid pattern corresponding to a logical term of type t, and outputs the unification variables environment M′ that p uses. We further check that patterns are terms in normal form; that is, patterns should only be neutral terms. The details of this judgement, as well as the dynamic semantics of the unification procedure, are given in the extended version of this paper [Stampoulis and Shao 2010].

Revisiting the example of the tautology prover from Section 3.1, the pattern matching would more accurately be written as follows. Note that we also use a return clause in order to specify the result type of the construct. Though this is verbose, most of the contexts and substitutions would be easy to infer from the context.

holcase P return option LT(P) with
  P1/idφ ∧ P2/idφ          ↦ · · ·
| ∀x : A/idφ.P′/(idφ, x)   ↦ · · ·
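The first-match discipline can be rendered in OCaml as follows. This is our sketch; unification here is first-order over a toy term language and ignores βι-conversion, which the real semantics performs:

  (* Patterns are tried in order against the scrutinee; the first successful
     (linear) match binds its unification variables. *)
  type tm = Const of string | App of tm * tm
  type pat = PConst of string | PApp of pat * pat | PUVar of string

  let rec unify (p : pat) (t : tm) (acc : (string * tm) list) =
    match p, t with
    | PUVar x, _ -> Some ((x, t) :: acc)            (* linear: x occurs once *)
    | PConst c, Const c' when c = c' -> Some acc
    | PApp (p1, p2), App (t1, t2) ->
        (match unify p1 t1 acc with
         | Some acc' -> unify p2 t2 acc'
         | None -> None)
    | _ -> None

  (* First-match semantics; an exhausted branch list mirrors the error
     transition of Figure 10. *)
  let rec holcase (t : tm) (branches : (pat * ((string * tm) list -> 'a)) list) : 'a =
    match branches with
    | [] -> failwith "holcase: no pattern matched"
    | (p, body) :: rest ->
        (match unify p t [] with
         | Some sigma -> body sigma
         | None -> holcase t rest)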
Metatheory  We have studied the metatheory for this language, and have found it relatively straightforward using standard techniques. Preservation depends primarily on the substitution lemmas:

Lemma 5.1 (Substitution of contextual terms into expressions) If W; M, X : T; Σ; Γ ⊢ e : τ and W; M ⊢ T′ : T, then W; M; Σ[T′/X]; Γ[T′/X] ⊢ e[T′/X] : τ[T′/X].

Lemma 5.2 (Substitution of contexts into expressions) If W, φ : ctx; M; Σ; Γ ⊢ e : τ and W; M ⊢ Φ wf, then W; M[Φ/φ]; Σ[Φ/φ]; Γ[Φ/φ] ⊢ e[Φ/φ] : τ[Φ/φ].

A lemma detailing the correct behavior of pattern unification is also needed, intuitively saying that applying the substitution yielded by the unification procedure to the pattern should give back the term we are matching against:

Lemma 5.3 (Soundness of unification) If Φ ⊢ unify(p, t) = σM, W; M ⊢ [Φ]t : T₀ and W; M ⊢ (p ⇐ T₀) ⇒ M′, then W; M ⊢ σM : M′ and ([Φ]p)[σM/M′] =βι [Φ]t.

Theorem 5.4 (Preservation) If •; •; Σ; • ⊢ e : τ, •; •; Σ; • ⊢ µ and µ, e −→ µ′, e′, then there exists Σ′ ⊇ Σ such that •; •; Σ′; • ⊢ e′ : τ and •; •; Σ′; • ⊢ µ′.

The proof is by structural induction on the step relation µ, e −→ µ′, e′, and is made relatively simple for the new constructs by use of the above lemmas. The proofs for the common constructs do not require special provisions and follow standard practice [Pierce 2002]. Progress depends on the following canonical forms lemma:

Lemma 5.5 (Canonical forms) If •; •; Σ; • ⊢ v : τ then:
• If τ = ΠX : T.τ′, then v = λX : T.e.
• If τ = ΣX : T.τ′, then v = ⟨T′, v′⟩ and •; • ⊢ T′ : T.
• If τ = Πφ : ctx.τ′, then v = λφ : ctx.e.
• If τ = Σφ : ctx.τ′, then v = ⟨Φ′, v′⟩ and •; • ⊢ Φ′ wf.
• ...

Theorem 5.6 (Progress) If •; •; Σ; • ⊢ e : τ, then either e is a value, or, for every µ such that •; •; Σ; • ⊢ µ, either µ, e −→ µ, error or there exist e′ and µ′ such that µ, e −→ µ′, e′.

The proof is a straightforward structural induction on the typing derivation of e. The only cases yielding the error result come from non-exhaustive pattern matching or out-of-bounds accesses in arrays. More details about the proofs can be found in the extended version of this paper [Stampoulis and Shao 2010]. We have found these proofs to be relatively orthogonal to proofs about the type safety of the basic constructs of the computational language.

From type safety we immediately get the property that if an expression evaluates to an existential package containing a proof object π, then π is a valid proof of the proposition that its type reflects. This means that, at least in principle, the type checker of the logic language does not need to be run again, and decision procedures and tactics written in our language always return valid proof objects. In practice, the compiler for a language like this will be much larger than a type checker for the logic language, so we will still prefer to use the latter as our trusted base. Furthermore, if we are only interested in type checking proof objects yielded by our language, our trusted base can be limited to a type checker for the base logic language, and does not need to include the extensions with meta-variables and parametric contexts.

6. Implementation and examples

We have created a prototype implementation of a type checker and interpreter for VeriML, along with an implementation of the higher-order logic we use. Readers are encouraged to download it from http://flint.cs.yale.edu/publications/veriml.html. The implementation is about 4.5kLOC of OCaml code, and gives the user a VeriML toplevel where examples can be tested. Concrete syntax for VeriML is provided through Camlp4 syntax extensions. The implementation of the logic can be used in isolation as a proof checker, and is done in the PTS style; thus generalization to a theory like CIC is relatively straightforward. Binding is represented using the locally nameless approach. Both at the level of the logic and at the level of the computational language, we allow named definitions. In the computational language we allow some further constructs not shown here, like mutually recursive function definitions, as well as a printing function for HOL terms, used primarily for debugging purposes.

We are currently working on a code generator that translates well-typed programs of our computational language into normal OCaml code for efficiency. This translation is essentially a type-erasure operation, where the annotations needed to support our dependent type system are removed, and we are thus left with code similar to what one would write for an LCF-style theorem prover. We are investigating the possibility of emitting code for existing frameworks, like Coq or HOL.

We have used this implementation to test two larger examples that we have developed in VeriML; these are included in the language distribution. The first example is an extension of the decision procedure for the theory of equality given in Section 3, so that uninterpreted functions are also handled. Furthermore, we use this decision procedure as part of a version of the tauto tactic that we showed earlier. Equality hypotheses are used in order to create the equivalence class hash table; terms contained in goals are then viewed up to equivalence based on this hash table, by using the functions provided by the decision procedure. The second example is a function that converts a proposition P into its negation normal form P′, returning a proof object witnessing the fact that P′ implies P. Such proof objects are not built manually. They are produced by a version of the tauto tactic with all the extensions that we described in Section 3, along with handling of the False proposition. This tactic is enough to prove all the propositional tautologies required in NNF conversion.
7. Related work

There is a large body of existing work to which we should compare the work described here. We will try to cover other language and framework designs that are similar in spirit or goals to the language design we have described.

The LTac language [Delahaye 2000, 2002] available in the Coq proof assistant is an obvious point of reference for this work. LTac is an untyped domain-specific language that can be used to define new tactics by combining existing ones, employing pattern matching on propositions and proof contexts. Its untyped nature is sometimes viewed as a shortcoming (e.g. in [Nanevski et al. 2010]), and there are problems with handling variables and matching under binders. Our language does not directly support all of the features of LTac. Still, we believe that our language can serve as a kernel on which missing features can be developed, in order to recover the practicality that current use of LTac demonstrates. Also, our language is strongly typed, statically guarantees correct behavior with regard to binding, and gives access to a richer set of programming constructs, including effectful ones; this, we believe, enables the development of more robust and complex tactics and decision procedures. Last, our language has a formal operational semantics, which LTac lacks to the best of our knowledge, so the behaviour of tactics written in it can be better understood.

The comparison with the LCF approach [Gordon et al. 1979] to building theorem provers is interesting both from a technical and from a historical standpoint, seeing how ML was originally developed toward the same goals as the extension of it that we propose here. The LCF approach to building a theorem prover for the logic we have presented would basically amount to building a library inside ML that contains an implementation of each axiom, yielding a term of the abstract thm datatype. By permitting the user to create terms of this type only through these functions, we would ensure that all terms of this datatype correspond to valid derivations; something that depends, of course, on the type safety of ML. Our approach is different in that the equivalent of the thm datatype is dependent on the proposition that the theorem shows. Coupled with the rest of the type system, we are able to specify tactics, tacticals, and other functions that manipulate logical terms and theorems in much more detail, yielding stronger static guarantees. Essentially, where such manipulation is done in an untyped manner following the usual LCF approach, it is done in a strongly typed way using our approach. We believe that this leads to a more principled and modular programming paradigm, a claim that we aim to further substantiate in future work.

In recent years many languages with rich dependent type systems have been proposed, which bear similarity to the language design we propose here; unfortunately they are too numerous to cover in full, so we refer the reader to three of the most recent and relevant proposals [Norell 2007; Fogarty et al. 2007; Chen and Xi 2005] and the references therein. Our approach contrasts with languages like these in that we are not primarily interested in certifying properties of code written in our language. We rather view our language as a foundation for an "extensible" proof assistant, where proofs about code written in other (however richly typed, or untyped) languages can be developed in a scalable manner. Of the above languages, we believe Concoqtion [Fogarty et al. 2007] is the one closest to ours, as it embeds the full CIC universe as index types inside a version of the ML language. Our language does the same thing, even though only a subset of CIC is covered; the point of departure compared to Concoqtion is that our language also includes a computational-level pattern matching construct on such terms. Thus logical terms are not to be viewed only as index types: they have a runtime representation that is essential for the kind of code we want to write. Pattern matching on logical terms would amount to a typecase-like construct in languages like Concoqtion, a feature that is generally not available in them.

It is interesting to contrast our framework with computational languages that deal with terms of the LF logical framework [Harper et al. 1993], like Beluga [Pientka and Dunfield 2008] and Delphin [Poswolsky and Schürmann 2008]. Beluga especially has been an inspiration for this work, and our use of meta-variables and context polymorphism is closely modeled after it. LF provides good support for encoding typing judgements like the ones defining our logic; in principle our logic could be encoded inside LF, and languages such as the above could be used to write programs manipulating such encodings, with static guarantees similar to the ones provided by our language. In practice, because of the inclusion of a notion of computation inside our logic, for which LF does not provide good support, this encoding would be a rather intricate exercise. The βι-reduction principles would have to be encoded as relations inside LF, and explicit witnesses of βι-equivalence would have to be provided at all places where our logic alludes to it. We are not aware of an existing encoding of a logic similar to the one we describe inside LF, and see it as a rather complicated endeavour. This situation could potentially be remedied by a framework like that of Licata et al. [2008], which considers adding computation to a subset of LF; still, this framework currently lacks dependent types, which are essential for encoding the judgements of our logic in LF. Even if this encoding of our logic were done, the aforementioned computational languages do not provide the imperative constructs that we have considered. Last, writing procedures that could be part of a proof assistant, even for simpler logics, has not yet been demonstrated in these languages, as we do here.

Another framework that is somewhat related is Hoare Type Theory (HTT) and the associated YNot project [Chlipala et al. 2009; Nanevski et al. 2010]. This framework extends the programming model available in the Coq proof assistant with support for effectful features like mutable references. This is done by axiomatically extending Coq's logic with support for a stateful monad; essentially, imperative computational features are integrated inside the logical framework. Our approach, instead, integrates the logical framework inside a computational language, keeping the two as orthogonal as possible. Thus, it does not require any significant metatheoretic additions to the logic. Additional features in our computational language, like concurrency, could easily be added, as long as they are type safe. In principle, one could use HTT in conjunction with the standard proof-by-reflection technique in order to program decision procedures in an imperative style inside the proof assistant. We are not aware of a development based on this idea; for example, even though a decision procedure for the EUF theory is proved correct in Nanevski et al. [2010], it is not evident whether this procedure can be used as a tactic in order to prove further goals. We would be interested in attempting this approach in future work.
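To make the contrast with LCF concrete, the following OCaml signature sketches the LCF discipline (ours; a toy kernel with a toy proposition type, whereas real LCF kernels also track hypothesis sets):

  (* An abstract thm type whose only constructors implement the inference
     rules, so every value of type thm is some valid derivation. *)
  module type KERNEL = sig
    type prop = Atom of string | Imp of prop * prop
    type thm                                 (* abstract: cannot be forged *)
    val assume : prop -> thm                 (* hypothetical toy rules *)
    val imp_intro : prop -> thm -> thm
    val imp_elim : thm -> thm -> thm
    val concl : thm -> prop
  end

  (* In LCF the type system only guarantees "this is some theorem"; in the
     dependently typed setting of this paper, the type LT([φ] P) records
     which proposition the theorem proves. *)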
8. Future work and conclusion

There are many directions for extending the work we have presented here. One is to investigate how to replicate most of the functionality of a language like LTac inside our language, by providing ways to perform pattern matching with backtracking, and pattern matching on contexts. Furthermore, we want to explore issues of type and term inference in our language in order to limit its current verbosity, especially with respect to context quantification and instantiation. Last, the context manipulation currently allowed by our language is relatively limited, e.g. with respect to contraction of variables out of contexts. We are investigating how such limitations can be lifted without complicating the language design or its metatheory.

We have described VeriML, a new language design that introduces first-class support for a logical framework modeled after HOL and CIC inside a computational language with effects. The language allows pattern matching on arbitrary logical terms. A dependent type system is presented, which allows for strong specifications of effectful computation involving such terms. We have shown how tactics and decision procedures can be implemented in our language, providing strong static guarantees while at the same time allowing a rich programming model with non-termination and mutable references.

Acknowledgments

We thank the anonymous referees for their comments on this paper. This work is supported in part by NSF grants CCF-0811665, CNS-0915888, and CNS-0910670.

References

Henk P. Barendregt and Herman Geuvers. Proof-assistants using dependent type systems. In A. Robinson and A. Voronkov, editors, Handbook of Automated Reasoning. Elsevier Sci. Pub. B.V., 1999.

B. Barras, S. Boutin, C. Cornes, J. Courant, Y. Coscoy, D. Delahaye, D. de Rauglaudre, J.C. Filliâtre, E. Giménez, H. Herbelin, et al. The Coq proof assistant reference manual (version 8.3), 2010.

S. Boutin. Using reflection to build efficient and certified decision procedures. Lecture Notes in Computer Science, 1281:515–529, 1997.

A.R. Bradley and Z. Manna. The Calculus of Computation: Decision Procedures with Applications to Verification. Springer-Verlag New York Inc, 2007.

C. Chen and H. Xi. Combining programming with theorem proving. In Proceedings of the tenth ACM SIGPLAN international conference on Functional programming, page 77. ACM, 2005.

A. Chlipala. Certified Programming with Dependent Types, 2008. URL http://adam.chlipala.net/cpdt/.

Adam J. Chlipala, J. Gregory Malecha, Greg Morrisett, Avraham Shinnar, and Ryan Wisnesky. Effective interactive proofs for higher-order imperative programs. In Proceedings of the 14th ACM SIGPLAN international conference on Functional programming, pages 79–90. ACM, 2009.

D. Delahaye. A tactic language for the system Coq. Lecture Notes in Computer Science, pages 85–95, 2000.

D. Delahaye. A proof dedicated meta-language. Electronic Notes in Theoretical Computer Science, 70(2):96–109, 2002.

X. Feng, Z. Shao, Y. Guo, and Y. Dong. Combining domain-specific and foundational logics to verify complete software systems. In Proc. 2nd IFIP Working Conference on Verified Software: Theories, Tools, and Experiments (VSTTE'08), volume 5295 of LNCS, pages 54–69. Springer, October 2008.

S. Fogarty, E. Pasalic, J. Siek, and W. Taha. Concoqtion: indexed types now! In Proceedings of the 2007 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation, pages 112–121. ACM New York, NY, USA, 2007.

M.J. Gordon, R. Milner, and C.P. Wadsworth. Edinburgh LCF: a mechanized logic of computation. Springer-Verlag Berlin, 10:11–25, 1979.

R. Harper, F. Honsell, and G. Plotkin. A framework for defining logics. Journal of the ACM, 40(1):143–184, 1993.

J. Harrison. HOL Light: A tutorial introduction. Lecture Notes in Computer Science, pages 265–269, 1996.

C. Hawblitzel and E. Petrank. Automated verification of practical garbage collectors. ACM SIGPLAN Notices, 44(1):441–453, 2009.

G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, et al. seL4: Formal verification of an OS kernel. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, pages 207–220. ACM, 2009.

X. Leroy. Formal verification of a realistic compiler. Communications of the ACM, 52(7):107–115, 2009.

D.R. Licata, N. Zeilberger, and R. Harper. Focusing on binding and computation. In Logic in Computer Science, 2008. LICS'08, pages 241–252, 2008.

A. Nanevski, G. Morrisett, and L. Birkedal. Polymorphism and separation in Hoare Type Theory. In Proceedings of the eleventh ACM SIGPLAN international conference on Functional programming, pages 62–73. ACM New York, NY, USA, 2006.

Aleksandar Nanevski, Frank Pfenning, and Brigitte Pientka. Contextual modal type theory. ACM Trans. Comput. Log., 9(3), 2008.

Aleksandar Nanevski, Viktor Vafeiadis, and Josh Berdine. Structuring the verification of heap-manipulating programs. In Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 261–274. ACM, 2010.

T. Nipkow, L.C. Paulson, and M. Wenzel. Isabelle/HOL: A Proof Assistant for Higher-Order Logic, volume 2283 of LNCS, 2002.

Ulf Norell. Towards a practical programming language based on dependent type theory. Technical report, Göteborg University, 2007.

C. Paulin-Mohring. Inductive definitions in the system Coq; rules and properties. Lecture Notes in Computer Science, pages 328–345, 1993.

Brigitte Pientka. A type-theoretic foundation for programming with higher-order abstract syntax and first-class substitutions. In Proceedings of the 35th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 371–382. ACM, 2008.

Brigitte Pientka and Joshua Dunfield. Programming with proofs and explicit contexts. In Proceedings of the 10th international ACM SIGPLAN conference on Principles and practice of declarative programming, pages 163–173. ACM New York, NY, USA, 2008.

B.C. Pierce. Types and Programming Languages. The MIT Press, 2002.

A. Poswolsky and C. Schürmann. Practical programming with higher-order encodings and dependent types. Lecture Notes in Computer Science, 4960:93, 2008.

K. Slind and M. Norrish. A brief overview of HOL4. In TPHOLs, pages 28–32. Springer, 2008.

M. Sozeau. Subset coercions in Coq. Types for Proofs and Programs, pages 237–252, 2007.

A. Stampoulis and Z. Shao. VeriML: Typed computation of logical terms inside a language with effects (extended version). Technical Report YALEU/DCS/TR-1430, Dept. of Computer Science, Yale University, New Haven, CT, 2010. URL http://flint.cs.yale.edu/publications/veriml.html.

G.S. Tseitin. On the complexity of derivation in propositional calculus. Studies in Constructive Mathematics and Mathematical Logic, 2(115-125):10–13, 1968.

Benjamin Werner. Une Théorie des Constructions Inductives. PhD thesis, Université Paris 7, Paris, France, 1994.
Parametricity and Dependent Types

Jean-Philippe Bernardy    Patrik Jansson
Chalmers University of Technology and University of Gothenburg
{bernardy,patrikj}@chalmers.se

Ross Paterson
City University London
[email protected]
Abstract

Reynolds' abstraction theorem shows how a typing judgement in System F can be translated into a relational statement (in second order predicate logic) about inhabitants of the type. We obtain a similar result for a single lambda calculus (a pure type system), in which terms, types and their relations are expressed. Working within a single system dispenses with the need for an interpretation layer, allowing for an unusually simple presentation. While the unification puts some constraints on the type system (which we spell out), the result applies to many interesting cases, including dependently-typed ones.

Categories and Subject Descriptors  F.3.3 [Logics and Meanings of Programs]: Studies of Program Constructs—Type Structure

General Terms  Languages, Theory

Keywords  Pure type system, Abstraction theorem, Free theorems

1. Introduction

Reynolds [1983] defined a relational interpretation of System F types, and showed that interpretations of a term of that type in related contexts yield related results. He was thus able to constrain interpretations of polymorphic types. Wadler [1989] observed that if a type has no free variables, the relational interpretation can thus be viewed as a parametricity property satisfied by all terms of that type. Such properties have been used in a variety of situations. A few examples include:

program transformation  The fold/build rule can be used to remove intermediate lists [Gill et al. 1993]. Its correctness can be proved using the parametricity condition derived from the type of the function build [Johann 2002].

testing  The testing of a polymorphic function can often be reduced to testing a single monomorphic instance. Bernardy et al. [2010a] present a scheme for constructing such a monomorphic instance, for which the correctness proof relies on parametricity.

automatic program inversion  It is possible to write a function that inverts a polymorphic function given as input. The inversion process essentially relies on the parametric behaviour of the input function, and therefore its correctness relies on the corresponding parametricity condition [Voigtländer 2009a].

generic programming  In a certain style of generic programming, functions can be type-indexed. However, in some cases it is useful to show that functions behave uniformly for all types. Vytiniotis and Weirich [2009] use parametricity to show that certain casting functions are equivalent to the identity.

encoding of inductive types  Via Church encodings, inductive types can be encoded in pure System F. The proof of isomorphism relies on the parametricity condition. Hinze [2005] gives an illuminating example.

Parametricity in System F is useful enough that there has been much research to transport it to related calculi. Johann and Voigtländer [2005] have applied it to a system with explicit strictness; Vytiniotis and Weirich [2010] to Fω extended with representation types; Takeuti [2004] sketches how it can be applied to the λ-cube; Neis et al. [2009] to a system with dynamic casting. In this paper, we apply Reynolds' idea to dependently-typed systems. In fact, we go one step further and generalize to a large class of pure type systems [Barendregt 1992]. By targeting pure type systems (PTSs), we aim to provide a framework which unifies previous descriptions of parametricity and forms a basis for future studies of parametricity in specific type systems. As a by-product, we get parametricity for dependently-typed languages. Our specific contributions are:

• A concise definition of the translation of types to relations (Definition 4), which yields parametricity propositions for PTSs.

• A formulation (and a proof) of the abstraction theorem for a useful class of PTSs (Theorem 1). A remarkable feature of the theorem is that the translation from types to relations and the translations from terms to proofs are unified.

• An extension of the translation to inductive definitions (Section 4). Our examples use a notation close to that of Agda [Norell 2007], for greater familiarity for users of dependently-typed functional programming languages.

• A demonstration by example of how to derive free theorems for (and as) dependently-typed functions (sections 3.1 and 5). Two examples of functions that we tackle are:

  generic catamorphism  fold : ((F, mapF) : Functor) → (A : ⋆) → (F A → A) → µ F → A, a generic catamorphism function defined within a dependently-typed language (see Section 5.2).

  generic cast  gcast : (F : ⋆ → ⋆) → (u t : U) → Maybe (F (El u) → F (El t)), which comes from a modelling of representation types with universes (see Section 5.3).

In both cases, the derived parametricity condition yields useful properties to reason about the correctness of the function.
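As a concrete taste of such a free theorem, the following OCaml snippet (ours) checks one instance of the naturality property that parametricity guarantees for any function of type 'a list → 'a list:

  (* Any polymorphic r : 'a list -> 'a list commutes with map; this is the
     classic free theorem of that type.  We merely test one instance. *)
  let rev_dup (xs : 'a list) : 'a list = List.rev xs @ List.rev xs   (* some r *)

  let free_theorem_holds (f : int -> int) (xs : int list) : bool =
    List.map f (rev_dup xs) = rev_dup (List.map f xs)

  let () = assert (free_theorem_holds (fun x -> x * x) [1; 2; 3])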
2. Pure type systems

In this section we briefly review the notion of PTS as described by Barendregt [1992, sec. 5.2], and the basic intuitions behind it. We introduce our notation along the way, as well as our running example type systems.

Definition 1 (Syntax of terms). A PTS is a type system over a λ-calculus with the following syntax:

T ::= C              constant
    | V              variable
    | T T            application
    | λV : T. T      abstraction
    | ∀V : T. T      dependent function space

We often write (x : A) → B for ∀x : A. B, and sometimes just A → B when x does not occur free in B. The typing judgement of a PTS is parametrized over a specification S = (S, A, R), where S ⊆ C, A ⊆ C × S and R ⊆ S × S × S. The set S specifies the sorts, A the axioms (an axiom (c, s) ∈ A is often written c : s), and R specifies the typing rules of the function space. A rule (s1, s2, s2) is often written s1 ⇝ s2. The rules for typing judgements in a PTS are given in Figure 1. An attractive feature of PTSs is that the syntax for types and values is unified. It is the type of a term that tells how to interpret it (as a value, type, kind, etc.).

  (c, s) ∈ A
  ──────────── axiom
  ⊢ c : s

  Γ ⊢ A : s
  ────────────────── start
  Γ, x : A ⊢ x : A

  Γ ⊢ A : B    Γ ⊢ C : s
  ──────────────────────── weakening
  Γ, x : C ⊢ A : B

  Γ ⊢ A : s1    Γ, x : A ⊢ B : s2    (s1, s2, s3) ∈ R
  ───────────────────────────────────────────────────── product
  Γ ⊢ (∀x : A. B) : s3

  Γ ⊢ F : (∀x : A. B)    Γ ⊢ a : A
  ────────────────────────────────── application
  Γ ⊢ F a : B[x ↦ a]

  Γ, x : A ⊢ b : B    Γ ⊢ (∀x : A. B) : s
  ────────────────────────────────────────── abstraction
  Γ ⊢ (λx : A. b) : (∀x : A. B)

  Γ ⊢ A : B    Γ ⊢ B′ : s    B =β B′
  ───────────────────────────────────── conversion
  Γ ⊢ A : B′

Figure 1. Typing judgements of the PTS (S, A, R)

the λ-cube  Barendregt [1992] defined a family of calculi, each with S = {⋆, □}, A = {⋆ : □} and R a selection of rules of the form s1 ⇝ s2, for example:

• The (monomorphic) λ-calculus has Rλ = {⋆ ⇝ ⋆}, corresponding to ordinary functions.

• System F has RF = Rλ ∪ {□ ⇝ ⋆}, adding (impredicative) universal quantification over types.

• System Fω has RFω = RF ∪ {□ ⇝ □}, adding type-level functions.

• The Calculus of Constructions (CC) has RCC = RFω ∪ {⋆ ⇝ □}, adding dependent types.

Here ⋆ and □ are conventionally called the sorts of types and kinds respectively. Notice that F is a subsystem of Fω, which is itself a subsystem of CC. (We say that S1 = (S1, A1, R1) is a subsystem of S2 = (S2, A2, R2) when S1 ⊆ S2, A1 ⊆ A2 and R1 ⊆ R2.) In fact, the λ-cube is so named because the lattice of the subsystem relation between all the systems forms a cube, with CC at the top.

sort hierarchies  Difficulties with impredicativity have led to the development of type systems with an infinite hierarchy of sorts. The "pure" part of such a system can be captured in the following PTS, which we name Iω.

Definition 2 (Iω). Iω is a PTS with this specification:

• S = {⋆i | i ∈ N}
• A = {⋆i : ⋆i+1 | i ∈ N}
• R = {(⋆i, ⋆j, ⋆max(i,j)) | i, j ∈ N}

Compared to the monomorphic λ-calculus, ⋆ has been expanded into the infinite hierarchy ⋆0, ⋆1, . . . In Iω, the sort ⋆0 (abbreviated ⋆) is called the sort of types. Type constructors, or type-level functions, have type ⋆ → ⋆. The set of types (⋆), the set of type constructors (⋆ → ⋆) and similar have type ⋆1 (the sort of kinds). Terms like ⋆1 and ⋆ → ⋆1 have type ⋆2, and so on.

Impredicativity can in fact coexist with an infinite hierarchy of sorts, as Coquand [1986] has shown. For example, in the Generalized Calculus of Constructions (CCω) of Miquel [2001], impredicativity exists for the sort ⋆ (conventionally called the sort of propositions), which lies at the bottom of the hierarchy.

Definition 3 (CCω). CCω is a PTS with this specification:

• S = {⋆} ∪ {□i | i ∈ N}
• A = {⋆ : □0} ∪ {□i : □i+1 | i ∈ N}
• R = {⋆ ⇝ ⋆, ⋆ ⇝ □i, □i ⇝ ⋆ | i ∈ N} ∪ {(□i, □j, □max(i,j)) | i, j ∈ N}

Both CC and Iω are subsystems of CCω, with ⋆i in Iω corresponding to □i in CCω. Because □ in CC corresponds to □0 in CCω, we often abbreviate □0 as □. Many dependently-typed programming languages and proof assistants are based on variants of Iω or CCω, often with the addition of inductive definitions [Dybjer 1994; Paulin-Mohring 1993]. Such tools include Agda [Norell 2007], Coq [The Coq development team 2010] and Epigram [McBride and McKinna 2004].

2.1 PTS as logical system

Another use for PTSs is as logical systems: types correspond to propositions and terms to proofs. This correspondence extends to all aspects of the systems and is widely known as the Curry-Howard isomorphism. The judgement ⊢ p : P means that p is a witness, or proof, of the proposition P. In the logical system reading, an inhabited type corresponds to a tautology, and dependent function types correspond to universal quantification. Predicates over a type A have type A → s, for some sort s: a value satisfies the predicate whenever the returned type is inhabited. Similarly, binary relations between values of types A1 and A2 have type A1 → A2 → s. For this approach to be safe, it is important that the system be consistent: some types must be uninhabited or, equivalently, each witness p must reduce to a normal form. This is the case for the systems used here. In fact, in Iω and similarly rich type systems, one may represent both programs and logical formulae about them. In the following sections, we make full use of this property: we encode programs and parametricity statements about them in the same type system.
Γ ` A : s1 Γ, x : A ` B : s2 (s1 , s2 , s3 ) ∈ R Γ ` (∀x : A. B) : s3
application
We often write (x : A) → B for ∀x : A. B, and sometimes just A → B when x does not occur free in B. The typing judgement of a PTS is parametrized over a specification S = (S, A, R), where S ⊆ C, A ⊆ C×S and R ⊆ S ×S ×S. The set S specifies the sorts, A the axioms (an axiom (c, s) ∈ A is often written c : s), and R specifies the typing rules of the function space. A rule (s1 , s2 , s2 ) is often written s1 ; s2 . The rules for typing judgements in a PTS are given in Figure 1. An attractive feature of PTSs is that the syntax for types and values is unified. It is the type of a term that tells how to interpret it (as a value, type, kind, etc.).
`c:s
is inconsistent with strong sums [Coquand 1986].
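Before turning to the translation, note that the specifications (S, A, R) of this section are themselves just data. A hypothetical Haskell encoding (ours, writing ? as "*" and □ as "#"; infinite hierarchies such as Iω would need sorts indexed by naturals instead):

  type Sort = String

  data Spec = Spec
    { sorts  :: [Sort]
    , axioms :: [(Sort, Sort)]        -- (c, s), read as c : s
    , rules  :: [(Sort, Sort, Sort)]  -- product rules (s1, s2, s3)
    }

  lambdaCalc, systemF, systemFw, cc :: Spec
  lambdaCalc = Spec ["*", "#"] [("*", "#")] [("*", "*", "*")]
  systemF    = lambdaCalc { rules = rules lambdaCalc ++ [("#", "*", "*")] }
  systemFw   = systemF    { rules = rules systemF    ++ [("#", "#", "#")] }
  cc         = systemFw   { rules = rules systemFw   ++ [("*", "#", "#")] }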
3. Types to relations

We start by defining the relational interpretation of a term, as a syntactic translation from terms to terms. As we see in Section 3.1, it is a generalization of the classical rules given by Reynolds [1983], extended to application and abstraction. In this section, we assume that the only constants are sorts. We also assume for each sort s another sort s̃ of parametricity propositions about terms of type s. In our examples, we simply choose s̃ = s. We shall return to the general case in Section 6.2.

Definition 4 ([[ ]], translation from types to relations). Given a natural number n (the arity of relations), we assume for each variable x fresh variables x1, ..., xn and xR. We write Ā for the n terms Ai, each obtained by replacing each free variable x in A with xi. Correspondingly, x̄ : Ā stands for the n bindings (xi : Ai). We define a mapping [[ ]] from T to T as follows:

  [[s]] = λx̄ : s. x̄ → s̃
  [[x]] = xR
  [[∀x : A. B]] = λf̄ : (∀x̄ : Ā. B̄). ∀x̄ : Ā. ∀xR : [[A]] x̄. [[B]] (f̄ x̄)
  [[F a]] = [[F]] ā [[a]]
  [[λx : A. b]] = λx̄ : Ā. λxR : [[A]] x̄. [[b]]

Note that for each variable x free in A, the translation [[A]] has free variables x1, ..., xn and xR. There is a corresponding replication of variables bound in contexts, which is made explicit in the following definition.

Definition 5 (translation of contexts).

  [[Γ, x : A]] = [[Γ]], x̄ : Ā, xR : [[A]] x̄

Note that each tuple x̄ : Ā in the translated context must satisfy the relation [[A]], as witnessed by xR. Thus, one may interpret [[Γ]] as n related environments. In order for a PTS to be able to express both programs and parametricity propositions about them, it must satisfy certain closure conditions, for which we coin the term reflective:

Definition 6 (reflective). A PTS (S, A, R) is reflective if
• for each sort s ∈ S: ∃s̃ ∈ S, and ∃s′ ∈ S such that s : s′ ∈ A;
• for each axiom s : s′ ∈ A: s̃ : s̃′ ∈ A and s ; s̃′ ∈ R;
• for each rule (s1, s2, s3) ∈ R: (s̃1, s̃2, s̃3) ∈ R and s1 ; s̃3 ∈ R.

We can then state our main result:

Theorem 1 (abstraction). Given a reflective PTS (S, A, R),

  Γ ⊢ A : B ⟹ [[Γ]] ⊢ [[A]] : [[B]] Ā

Proof. By induction on the derivation. A brief sketch of the proof is given in Appendix A.

The above theorem can be read in two ways. A direct reading is as a typing judgement about translated terms: if A has type B, then [[A]] has type [[B]] Ā. The more fruitful reading is as an abstraction theorem for pure type systems: if A has type B in environment Γ, then the n interpretations Ā in related environments [[Γ]] are related by [[B]]. Further, [[A]] is a witness of this proposition within the type system. In particular, closed terms are related to themselves:

Corollary 2 (parametricity). ⊢ A : B ⟹ ⊢ [[A]] : [[B]] Ā

example systems: Note that both Iω and CCω are reflective, with s̃ = s. Therefore we can write programs in these systems and derive valid statements about them, using [[ ]], within the same PTS. We proceed to do so in the rest of the paper.

3.1 Examples: the λ-cube

In this section, we show that [[ ]] specializes to the rules given by Reynolds [1983] to read a System F type as a relation. Having shown that our framework can explain parametricity theorems for System-F-style types, we move on to progressively higher-order constructs. In these examples, the binary version of parametricity is used (arity n = 2). For examples using the unary version (arity n = 1) see Section 5.3. While the systems of the λ-cube are not reflective, they are embedded in CCω, which is. This means that our translation rules take System F types to terms in CCω (instead of second-order propositional logic). The possibility of using a different PTS for the logic is discussed in Section 6.3.

types to relations: Note that, by definition,

  [[?]] T1 T2 = T1 → T2 → ?

Assuming that types inhabit the sort ?, this means that types are translated to relations (as expected). Here we also use ? on the right side as the sort of propositions (?̃ = ?), but other choices are possible, as we shall discuss in Section 6.2.

function types: Applying our translation to non-dependent function types, we get:

  [[A → B]] : [[?]] (A → B) (A → B)
  [[A → B]] f1 f2 = ∀a1 : A. ∀a2 : A. [[A]] a1 a2 → [[B]] (f1 a1) (f2 a2)

That is, functions are related iff they take related arguments into related outputs.

type schemes: System F includes universal quantification of the form ∀A : ?. B. Applying [[ ]] to this type expression yields:

  [[∀A : ?. B]] : [[?]] (∀A : ?. B) (∀A : ?. B)
  [[∀A : ?. B]] g1 g2 = ∀A1 : ?. ∀A2 : ?. ∀AR : [[?]] A1 A2. [[B]] (g1 A1) (g2 A2)

In words, polymorphic values are related iff instances at related types are related. Note that as A may occur free in B, the variables A1, A2 and AR may occur free in [[B]].

type constructors: With the addition of the rule □ ; □, one can construct terms of type ? → ?, which are sometimes known as type constructors, type formers or type-level functions. As Voigtländer [2009b] remarks, extending Reynolds-style parametricity to support type constructors appears to be folklore. Such folklore can be precisely justified by our framework, by applying [[ ]] to obtain the relational counterpart of type constructors:

  [[? → ?]] : [[□]] (? → ?) (? → ?)
  [[? → ?]] F1 F2 = ∀A1 : ?. ∀A2 : ?. [[?]] A1 A2 → [[?]] (F1 A1) (F2 A2)

That is, a term of type [[? → ?]] F1 F2 is a (polymorphic) function converting a relation between any types A1 and A2 to a relation between F1 A1 and F2 A2: a relational action.

dependent functions: In a system with the rule ? ; □, value variables may occur in dependent function types like ∀x : A. B, which we translate as follows:

  [[∀x : A. B]] : [[?]] (∀x : A. B) (∀x : A. B)
  [[∀x : A. B]] f1 f2 = ∀x1 : A. ∀x2 : A. ∀xR : [[A]] x1 x2. [[B]] (f1 x1) (f2 x2)
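Since Definition 4 is purely syntactic, it can itself be programmed. Below is a rough Haskell sketch of the unary case (n = 1) over the hypothetical Term type from Section 2; it keeps each original variable x and adds only the witness xR, ignores variable capture, and takes s̃ = s. It is an illustration of the translation, not the authors' implementation:

  translate :: Term -> Term
  translate (Con s)     = Lam "z" (Con s) (Pi "_" (Var "z") (Con s))  -- [[s]] = λz : s. z → s̃
  translate (Var x)     = Var (x ++ "R")                              -- [[x]] = xR
  translate (App f a)   = App (App (translate f) a) (translate a)     -- [[F a]] = [[F]] a [[a]]
  translate (Lam x a b) =                                             -- [[λx : A. b]]
    Lam x a (Lam (x ++ "R") (App (translate a) (Var x)) (translate b))
  translate (Pi x a b)  =                                             -- [[∀x : A. B]]
    Lam "f" (Pi x a b)
      (Pi x a (Pi (x ++ "R") (App (translate a) (Var x))
        (App (translate b) (App (Var "f") (Var x)))))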
proof terms: We have used [[ ]] to turn types into relations, but we can also use it to turn terms into proofs of abstraction properties. As a simple example, the relation corresponding to the type T = ∀A : ?. A → A, namely

  [[T]] f1 f2 = ∀A1 : ?. ∀A2 : ?. ∀AR : [[?]] A1 A2.
                ∀x1 : A1. ∀x2 : A2. AR x1 x2 → AR (f1 A1 x1) (f2 A2 x2)

states that functions of this type map related inputs to related outputs. From a term id = λA : ?. λx : A. x of this type, by Corollary 2 we obtain a term [[id]] : [[T]] id id, that is, a proof of the abstraction property:

  [[id]] A1 A2 AR x1 x2 xR = xR
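As a hypothetical Haskell aside (ours, not part of the paper's development): specializing AR to the graph of a function g turns this abstraction property into the familiar free theorem for the polymorphic identity, which is directly executable:

  {-# LANGUAGE RankNTypes #-}

  -- Free theorem for ∀A. A → A with AR taken to be the graph of g:
  -- every f :: forall a. a -> a commutes with every g.
  idFreeTheorem :: (forall a. a -> a) -> (Int -> Bool) -> Int -> Bool
  idFreeTheorem f g x = f (g x) == g (f x)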
4. Constants and data types

While the above development assumes pure type systems with C = S, it is possible to add constants to the system and retain parametricity, as long as each constant is parametric. That is, for each new axiom ⊢ k : A (where k is an arbitrary constant and A an arbitrary term such that ⊢ A : s, not a mere sort) we require a term [[k]] such that the judgement ⊢ [[k]] : [[A]] k holds. (Additionally, β-conversion rules involving those constants must preserve types.) One source of constants in many languages is data type definitions. In the rest of this section we detail how to handle such definitions (in a system extending Iω).

4.1 Inductive families

Many languages permit data type declarations like those in Figure 2. Dependently typed languages typically allow the return types of constructors to have different arguments, yielding inductive families [Dybjer 1994; Paulin-Mohring 1993] such as the family Vec, in which the type is indexed by the number of elements.

  data Bool : ? where
    true : Bool
    false : Bool

  data Nat : ? where
    zero : Nat
    succ : Nat → Nat

  data ⊥ : ? where
    -- no constructors

  data ⊤ : ? where
    tt : ⊤

  data List (A : ?) : ? where
    nil : List A
    cons : A → List A → List A

  data Vec (A : ?) : Nat → ? where
    nilV : Vec A zero
    consV : A → (n : Nat) → Vec A n → Vec A (succ n)

  data Σ (A : ?) (B : A → ?) : ? where
    , : (a : A) → B a → Σ A B

  data ≡ (A : ?) (a : A) : A → ? where
    refl : ≡ A a a

Figure 2. Example inductive families

Data family declarations of sort s (? in the examples) have the typical form:²

  data T (a : A) : ∀n : N. s where
    c : ∀b : B. (∀x : X. T a i) → T a v

Arguments of the type constructor T may be either parameters a, which scope over the constructors and are repeated at each recursive use of T, or indices n, which may vary between uses. Data constructors c have non-recursive arguments b, whose types are otherwise unrestricted, and recursive arguments with types of a constrained form, which cannot be referred to in the other terms. Such a declaration can be interpreted as a simultaneous declaration of formation and introduction constants

  T : ∀a : A. ∀n : N. s
  c : ∀a : A. ∀b : B. (∀x : X. T a i) → T a v

and also an eliminator to analyse values of that type:

  T-elim : ∀a : A. ∀P : (∀n : N. T a n → s). Casec → ∀n : N. ∀t : T a n. P n t

where the type Casec of the case for each constructor c is

  ∀b : B. ∀u : (∀x : X. T a i). (∀x : X. P i (u x)) → P v (c a b u)

with beta-equivalences (one for each constructor c):

  T-elim a P e v (c a b u) = e b u (λx : X. T-elim a P e i (u x))

We shall often use corresponding pattern matching definitions instead of these eliminators [Coquand 1992]. For example, the definition of List in Figure 2 gives rise to the following constants:

  List      : (A : ?) → ?
  nil       : (A : ?) → List A
  cons      : (A : ?) → A → List A → List A
  List-elim : (A : ?) → (P : List A → ?) → P (nil A) →
              ((x : A) → (xs : List A) → P xs → P (cons A x xs)) →
              (l : List A) → P l

In the following sections, we consider two ways to define an abstraction proof [[k]] : [[τ]] k for each constant k : τ introduced by the data definition.

4.2 Deductive-style translation

First, we define each proof as a term (using pattern matching to simplify the presentation). We begin with the translation of the equation for each constructor:

  [[T-elim a P e v]] (c a b u) ([[c]] a aR b bR u uR) = [[e b u (λx : X. T-elim a P e i (u x))]]

To turn this into a pattern matching definition of T-elim, we need a suitable definition of [[c]], and similarly for the constructors in v. The only arguments of [[c]] not already in scope are bR and uR, so we package them as a dependent pair, because the type of uR may depend on that of bR. Writing (x : A) × B for Σ A (λx : A. B), and elements of this type as (a, b), omitting the arguments A and λx : A. B, we define³

  [[T]] : [[∀a : A. ∀n : N. s]] T
  [[T]] a aR v [[v]] (c a b u) = (bR : [[B]] b) × [[∀x : X. T a i]] u
  [[T]] a aR u uR t = ⊥

  [[c]] : [[∀a : A. ∀b : B. (∀x : X. T a i) → T a v]] c
  [[c]] a aR b bR u uR = (bR, uR)

² We show only one of each element here, but the generalization to arbitrary numbers is straightforward.
³ The definition of [[T]] relies on the weak elimination constant to sort s̃′.
and the translation of T-elim becomes

  [[T-elim a P e v]] (c a b u) (bR, uR) = [[e b u (λx : X. T-elim a P e i (u x))]]

Because [[T]] yields ⊥ unless the constructors match, these clauses provide complete coverage. The reader may have noted by now that the argument lists of the translated constants tend to be quite long. The use of the translated constants can be substantially simplified using implicit arguments (arguments which can be inferred from contextual knowledge). We avoid using them in this paper to explicitly show the underlying machinery, but the Agda library implementing the translation makes heavy use of implicit arguments for convenience.

Booleans: To get an intuition of the meaning of the above translation scheme we proceed to apply it to a number of examples, starting with the data type for Booleans. We obtain:

  [[Bool]] : [[?]] Bool Bool
  [[Bool]] true true = ⊤
  [[Bool]] false false = ⊤
  [[Bool]] _ _ = ⊥

  [[true]] : [[Bool]] true true
  [[true]] = tt

  [[false]] : [[Bool]] false false
  [[false]] = tt

(We use ⊤ for nullary constructors as it is the identity of ×.)

parametricity and elimination: Reynolds [1983] and Wadler [1989] assume that each type constant K : ? is translated to the identity relation, as we have done for Bool above. This definition is certainly compatible with the condition required by Theorem 1 for such constants, [[K]] : [[?]] K K, but so are many other relations. Are we missing some restriction for constants? This question might be answered by resorting to a translation to pure terms via Church encodings [Böhm and Berarducci 1985], as Wadler [2007] does. However, in the hope of shedding a different light on the issue, we give another explanation, using our machinery.

Consider a base type, such as Bool : ?, equipped with constructors true : Bool and false : Bool. In order to derive parametricity theorems in a system containing such a constant Bool, we must define [[Bool]], satisfying ⊢ [[Bool]] : [[?]] Bool. What are the restrictions put on the term [[Bool]]? First, we must be able to define [[true]] : [[Bool]] true. Therefore, [[Bool]] true must be inhabited. The same reasoning holds for the false case. Second, to write any useful program using Booleans, a way to test their value is needed. This may be done by adding a constant if : Bool → (A : ?) → A → A → A, such that if true A x y −→β x and if false A x y −→β y. (This special case of Bool-elim is sufficient for the present example.) Now, if a program uses if, we must also define [[if]] of type [[Bool → (A : ?) → A → A → A]] if for parametricity to work. Let us expand the type of [[if]] and attempt to give a definition case by case:

  [[if]] : (b1 b2 : Bool) → (bR : [[Bool]] b1 b2) →
           (A1 A2 : ?) → (AR : [[?]] A1 A2) →
           (x1 : A1) → (x2 : A2) → (xR : AR x1 x2) →
           (y1 : A1) → (y2 : A2) → (yR : AR y1 y2) →
           AR (if b1 A1 x1 y1) (if b2 A2 x2 y2)
  [[if]] true true bR x1 x2 xR y1 y2 yR = xR
  [[if]] true false bR x1 x2 xR y1 y2 yR = ?
  [[if]] false true bR x1 x2 xR y1 y2 yR = ?
  [[if]] false false bR x1 x2 xR y1 y2 yR = yR

(From this example onwards, we use a layout convention to ease the reading of translated types: each triple of arguments, corresponding to one argument in the original function, is written on its own line if space permits.)

In order to complete the above definition, we must provide a type-correct expression for each question mark. In the case of the second equation, this means that we must construct an expression of type AR x1 y2. Neither xR : AR x1 x2 nor yR : AR y1 y2 can help us here. The only liberty left is in bR : [[Bool]] true false. If we let [[Bool]] true false be ⊥, then this case can never be reached and we need not give an equation for it. This reasoning holds symmetrically for the third equation. Therefore, we have the restrictions:

  [[Bool]] x x = some inhabited type
  [[Bool]] x y = ⊥   if x ≠ y

We have some freedom in picking "some inhabited type", so we choose [[Bool]] x x = ⊤, yielding an encoding of the identity relation. In general, for any base type, the identity is the most permissive relation which allows for a definition of the translation of the eliminator. An intuition behind parametricity is that the more programs "know" about a type, the more restricted parametricity theorems are. Through the Bool example, we have seen how our framework captures this intuition in a fine-grained manner. We revisit this idea in Section 5.4.

lists and vectors: From the definition of List in Figure 2, we have the constant List : ? → ?, so List is an example of a type constructor, and thus [[List]] is a relation transformer. The relation transformer we get by applying our scheme is exactly that given by Wadler [1989]: lists are related iff their lengths are equal and their elements are related point-wise.

  [[List]] : [[? → ?]] List List
  [[List]] A1 A2 AR nil nil = ⊤
  [[List]] A1 A2 AR (cons x1 xs1) (cons x2 xs2) = AR x1 x2 × [[List]] A1 A2 AR xs1 xs2
  [[List]] A1 A2 AR _ _ = ⊥

  [[nil]] : [[(A : ?) → List A]] nil nil
  [[nil]] A1 A2 AR = tt

  [[cons]] : [[(A : ?) → A → List A → List A]] cons cons
  [[cons]] A1 A2 AR x1 x2 xR xs1 xs2 xsR = (xR, xsR)

The translations of the constants of Vec are given in Figure 3.

list rearrangements: The first example of a parametric type examined by Wadler [1989] is the type of list rearrangements: R = (A : ?) → List A → List A. Intuitively, functions of type R know nothing about the actual argument type A, and therefore they can only produce the output list by taking elements from the input list. In this section we recover that result as an instance of Theorem 1. Applying the translation to R yields:

  [[R]] : R → R → ?
  [[R]] r1 r2 = (A1 A2 : ?) → (AR : [[?]] A1 A2) →
                (l1 : List A1) → (l2 : List A2) → (lR : [[List]] A1 A2 AR l1 l2) →
                [[List]] A1 A2 AR (r1 A1 l1) (r2 A2 l2)

In words: two list rearrangements r1 and r2 are related iff for all types A1 and A2 with relation AR, and for all lists l1 and l2 point-wise related by AR, the resulting lists r1 A1 l1 and r2 A2 l2 are also point-wise related by AR. By Corollary 2 (parametricity), we have, for any r:

  ⊢ r : R ⟹ ⊢ [[r]] : [[R]] r r
  [[Vec]] : [[(A : ?) → Nat → ?]] Vec Vec
  [[Vec]] A1 A2 AR zero zero nR nilV nilV = ⊤
  [[Vec]] A1 A2 AR (succ n1) (succ n2) nR (consV x1 n1 xs1) (consV x2 n2 xs2) =
    AR x1 x2 × (nR : [[Nat]] n1 n2) × [[Vec]] A1 A2 AR n1 n2 nR xs1 xs2
  [[Vec]] A1 A2 AR n1 n2 nR xs1 xs2 = ⊥

  [[nilV]] : [[(A : ?) → Vec A zero]] nilV nilV
  [[nilV]] A1 A2 AR = tt

  [[consV]] : [[(A : ?) → A → (n : Nat) → Vec A n → Vec A (succ n)]] consV consV
  [[consV]] A1 A2 AR x1 x2 xR n1 n2 nR xs1 xs2 xsR = (xR, (nR, xsR))

  [[Vec-elim]] : [[ (A : ?) → (P : (n : Nat) → Vec A n → ?) →
                    (en : P zero (nilV A)) →
                    (ec : (x : A) → (n : Nat) → (xs : Vec A n) → P n xs → P (succ n) (consV A x n xs)) →
                    (n : Nat) → (v : Vec A n) → P n v ]] Vec-elim
  [[Vec-elim]] A1 A2 AR P1 P2 PR en1 en2 enR ec1 ec2 ecR zero zero nilV nilV = enR
  [[Vec-elim]] A1 A2 AR P1 P2 PR en1 en2 enR ec1 ec2 ecR (succ n1) (succ n2) nR
               (consV x1 n1 xs1) (consV x2 n2 xs2) (xR, (nR, xsR)) =
    ecR x1 x2 xR n1 n2 nR xs1 xs2 xsR
        (Vec-elim A1 P1 en1 ec1 n1 xs1) (Vec-elim A2 P2 en2 ec2 n2 xs2)
        ([[Vec-elim]] A1 A2 AR P1 P2 PR en1 en2 enR ec1 ec2 ecR n1 n2 nR xs1 xs2 xsR)

Figure 3. Deductive translation of Vec constants. ([[Nat]] is the identity relation.)

In words: applying r preserves (point-wise) any relation existing between input lists. By specializing AR to a function (AR a1 a2 = f a1 ≡ a2) we obtain the well-known result:

  ⊢ r : R ⟹ (A1 A2 : ?) → (f : A1 → A2) → (l : List A1) →
             map f (r A1 l) ≡ r A2 (map f l)

(This form relies on the facts that [[List]] preserves identities and composes with map.)
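This well-known result is directly testable. A hypothetical Haskell rendering (ours), using the rearrangement odds that serves as the running example below; with QuickCheck one would generate the function argument via Test.QuickCheck.Function:

  odds :: [a] -> [a]
  odds []           = []
  odds [x]          = [x]
  odds (x : _ : xs) = x : odds xs

  -- The free theorem for rearrangements, checked at the monomorphic
  -- instance Int: mapping commutes with rearranging.
  prop_oddsFree :: (Int -> Int) -> [Int] -> Bool
  prop_oddsFree f l = map f (odds l) == odds (map f l)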
proof terms: We have seen that applying [[ ]] to a type yields a parametricity property for terms of that type. However, by Theorem 1 we can also apply [[ ]] to a term of that type to obtain a proof of the property. Consider a list rearrangement function odds that returns every second element from a list.

  odds : (A : ?) → List A → List A
  odds A nil = nil A
  odds A (cons x nil) = cons A x nil
  odds A (cons x (cons _ xs)) = cons A x (odds A xs)

Any list rearrangement function must satisfy the parametricity condition seen above. We know by Theorem 1 that [[odds]] is a proof that odds satisfies parametricity. Expanding it yields:

  [[odds]] : [[(A : ?) → List A → List A]] odds odds
  [[odds]] A1 A2 AR nil nil _ = tt
  [[odds]] A1 A2 AR (cons x1 nil) (cons x2 nil) (xR, _) = (xR, tt)
  [[odds]] A1 A2 AR (cons x1 (cons _ xs1)) (cons x2 (cons _ xs2)) (xR, (_, xsR)) =
    (xR, [[odds]] A1 A2 AR xs1 xs2 xsR)

We see that [[odds]] performs essentially the same computation as odds, on two lists in parallel. However, instead of building a new list, it keeps track of the relations (in the R-subscripted variables). This behaviour stems from the last two cases in the definition of [[odds]]. Performing such a computation is enough to prove the parametricity condition.

4.3 Inductive-style translation

Inductive definitions offer another way of defining the translations [[c]] of the constants associated with a data type: an inductive definition, in contrast to the deductive definitions of the previous section. Given an inductive family

  data T (a : A) : K where
    c : C

by applying our translation to the components of the data declaration, we obtain an inductive family that defines the relational counterparts of the original type T and its constructors c at the same time:

  data [[T]] ([[a : A]]) : [[K]] (T a) where
    [[c]] : [[C]] (c a)

It remains to supply a proof term for the parametricity of the elimination constant T-elim. If the inductive family has the form

  data T (a : A) : ∀n : N. s where
    c : ∀b : B. (∀x : X. T a i) → T a v

then the proof [[T-elim]] can be defined using [[T]]-elim and T-elim as follows:

  [[T-elim]] : [[∀a : A. ∀P : (∀n : N. T a n → s). ∀e : Casec. ∀n : N. ∀t : T a n. P n t]] T-elim
  [[T-elim a P e]] = [[T]]-elim a aR
    (λ[[n : N, t : T a n]]. [[P n t]] (T-elim a P e n t))
    (λ[[b : B, u : (∀x : X. T a i)]]. [[e b u]] (λx : X. T-elim a P e i (u x)))

Deductive and inductive-style translations define the same relation, but the objects witnessing the instances of the inductively defined relation record additional information, namely which rules are used to prove membership of the relation. However, since the same constructor never appears in more than one case of the inductive definition, that additional content can be recovered from a witness of the deductive style; therefore the two styles are truly isomorphic.
Booleans: Applying the above scheme to the data-declaration of Bool (from Figure 2), we obtain:

  data [[Bool]] : [[?]] Bool where
    [[true]] : [[Bool]] true
    [[false]] : [[Bool]] false

The main difference from the deductive-style definition is that it is possible, by analysis of a value of type [[Bool]], to recover the arguments of the relation (either all true, or all false). The elimination constant for Bool is

  Bool-elim : (P : Bool → ?) → P true → P false → (b : Bool) → P b

Similarly, our new type [[Bool]] (with n = 2) has an elimination constant with the following type:

  [[Bool]]-elim : (C : (a1 a2 : Bool) → [[Bool]] a1 a2 → ?) →
                  C true true [[true]] → C false false [[false]] →
                  (b1 b2 : Bool) → (bR : [[Bool]] b1 b2) → C b1 b2 bR

As an instance of the above scheme, we can define [[Bool-elim]] using the elimination constants Bool-elim and [[Bool]]-elim as follows (where t = true and f = false):

  [[Bool-elim]] : (P1 P2 : Bool → ?) → (PR : [[Bool → ?]] P1 P2) →
                  (x1 : P1 t) → (x2 : P2 t) → (PR t t [[t]] x1 x2) →
                  (y1 : P1 f) → (y2 : P2 f) → (PR f f [[f]] y1 y2) →
                  (b1 b2 : Bool) → (bR : [[Bool]] b1 b2) →
                  PR b1 b2 bR (Bool-elim P1 x1 y1 b1) (Bool-elim P2 x2 y2 b2)
  [[Bool-elim]] P1 P2 PR x1 x2 xR y1 y2 yR =
    [[Bool]]-elim (λ b1 b2 bR → PR b1 b2 bR (Bool-elim P1 x1 y1 b1)
                                            (Bool-elim P2 x2 y2 b2)) xR yR

lists: For List, as introduced in Figure 2, we have the following translation:

  data [[List]] ([[A : ?]]) : [[?]] (List A) where
    [[nil]] : [[List A]] (nil A)
    [[cons]] : [[A → List A → List A]] (cons A)

or after expansion (for n = 2):

  data [[List]] (A1 A2 : ?) (AR : [[?]] A1 A2) : List A1 → List A2 → ? where
    [[nil]] : [[List]] A1 A2 AR (nil A1) (nil A2)
    [[cons]] : (x1 : A1) → (x2 : A2) → (xR : AR x1 x2) →
               (xs1 : List A1) → (xs2 : List A2) → (xsR : [[List]] A1 A2 AR xs1 xs2) →
               [[List]] A1 A2 AR (cons A1 x1 xs1) (cons A2 x2 xs2)

The above definition encodes the same relational action as that given in Section 4.2. Again, the difference is that the derivation of a relation between lists l1 and l2 is available as an object of type [[List]] A1 A2 AR l1 l2.

vectors: We can apply the same translation method to inductive families. For example, Figures 4 and 5 give the translation of the family Vec, corresponding to lists indexed by their length. The relation obtained by applying [[ ]] encodes that vectors are related if their lengths are the same and if their elements are related point-wise. The difference with the List version is that the equality of lengths is encoded in [[consV]] as a Nat (identity) relation.

proof terms: The proof term for the list-rearrangement example can be constructed in a similar way to the deductive one. The main difference is that the target lists are also built and recorded in the [[List]] structure. In short, this version has more of a computational flavour than the deductive version.

  [[odds]] : [[(A : ?) → List A → List A]] odds odds
  [[odds]] A1 A2 AR nil nil [[nil]] = [[nil]] A1 A2 AR
  [[odds]] A1 A2 AR (cons _ nil) (cons _ nil) ([[cons]] x1 x2 xR nil nil [[nil]]) =
    [[cons]] A1 A2 AR x1 x2 xR (nil A1) (nil A2) ([[nil]] A1 A2 AR)
  [[odds]] A1 A2 AR (cons _ (cons _ _)) (cons _ (cons _ _))
           ([[cons]] x1 x2 xR _ _ ([[cons]] _ _ _ xs1 xs2 xsR)) =
    [[cons]] A1 A2 AR x1 x2 xR (odds A1 xs1) (odds A2 xs2) ([[odds]] A1 A2 AR xs1 xs2 xsR)

5. Applications

In this section we shall see how examples going beyond Wadler [1989] can be expressed in our setting. All examples fit within the system Iω augmented with inductive definitions.

5.1 Type classes

What if a function is not parametrized over all types, but only over types equipped with decidable equality? One way to model this difference in a pure type system is to add an extra parameter to capture the extra constraint. For example, a function nub : Nub removing duplicates from a list may be given the following type:

  Nub = (A : ?) → Eq A → List A → List A

The equality requirement itself may be modelled as a mere comparison function: Eq A = A → A → Bool. In that case, the parametricity statement is amended with an extra requirement on the relation between types, which expresses that eq1 and eq2 must respect the AR relation. Formally:

  [[Eq A]] eq1 eq2 = (a1 : A1) → (a2 : A2) → AR a1 a2 →
                     (b1 : A1) → (b2 : A2) → AR b1 b2 →
                     [[Bool]] (eq1 a1 b1) (eq2 a2 b2)

  [[Nub]] n1 n2 = (A1 A2 : ?) → (AR : [[?]] A1 A2) →
                  (eq1 : Eq A1) → (eq2 : Eq A2) → [[Eq A]] eq1 eq2 →
                  (l1 : List A1) → (l2 : List A2) → [[List A]] l1 l2 →
                  [[List A]] (n1 A1 eq1 l1) (n2 A2 eq2 l2)

So far, this is just confirming the informal description in Wadler [1989]. But with access to full dependent types, one might wonder: what if we model equality more precisely, for example by requiring eq to be reflexive?

  Eq′ A = (eq : A → A → Bool) × Refl eq
  Refl eq = (x : A) → eq x x ≡ true

In the case of Eq′, the parametricity condition does not become more exciting. It merely requires the proofs of reflexivity at A1, A2 to be related. This extra condition adds nothing new: since there is at most one element in (and thus proof of) x ≡ y, one already expects proofs to be related. The observations drawn from this simple example can be generalized in two ways. First, proof arguments do not strengthen parametricity conditions in useful ways. One often does not care about the actual proof of a proposition, but merely that it exists, so knowing that two proofs are related adds nothing. Secondly, type classes may be encoded as their dictionary of methods [Wadler and Blott 1989]. Indeed, even if a type class has associated laws, they have little impact on the parametricity results.
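A hypothetical Haskell instance of the free theorem for Nub-like functions (ours, not from the paper): when eq2 (f x) (f y) equals eq1 x y for all x and y, duplicate removal commutes with map f. Here eq1 = eq2 = (==) and f = (*2), which is injective, so the premise holds:

  import Data.List (nubBy)

  -- map f . nubBy eq == nubBy eq . map f, given that eq (f x) (f y) == eq x y.
  prop_nubFree :: [Int] -> Bool
  prop_nubFree l = map f (nubBy eq l) == nubBy eq (map f l)
    where
      f  = (* 2)
      eq = (==)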
  data [[Vec]] ([[A : ?]]) : [[Nat → ?]] (Vec A) where
    [[nilV]] : [[Vec A zero]] (nilV A)
    [[consV]] : [[(x : A) → (n : Nat) → Vec A n → Vec A (succ n)]] (consV A)

  data [[Vec]] (A1 A2 : ?) (AR : A1 → A2 → ?)
       : (n1 n2 : Nat) → (nR : [[Nat]] n1 n2) → Vec A1 n1 → Vec A2 n2 → ? where
    [[nilV]] : [[Vec]] A1 A2 AR zero zero [[zero]] (nilV A1) (nilV A2)
    [[consV]] : (x1 : A1) → (x2 : A2) → (xR : AR x1 x2) →
                (n1 : Nat) → (n2 : Nat) → (nR : [[Nat]] n1 n2) →
                (xs1 : Vec A1 n1) → (xs2 : Vec A2 n2) →
                (xsR : [[Vec]] A1 A2 AR n1 n2 nR xs1 xs2) →
                [[Vec]] A1 A2 AR (succ n1) (succ n2) ([[succ]] n1 n2 nR)
                        (consV A1 x1 n1 xs1) (consV A2 x2 n2 xs2)

Figure 4. Inductive translation of Vec, both before and after expansion.

  [[Vec-elim]] : [[ (A : ?) → (P : (n : Nat) → Vec A n → ?) →
                    (en : P zero (nilV A)) →
                    (ec : (x : A) → (n : Nat) → (xs : Vec A n) → P n xs → P (succ n) (consV A x n xs)) →
                    (n : Nat) → (v : Vec A n) → P n v ]] Vec-elim
  [[Vec-elim A P en ec]] = [[Vec]]-elim A AR
    (λ[[n : Nat, v : Vec A n]] → [[P n v]] (Vec-elim A P en ec v))
    enR
    (λ[[x : A, n : Nat, xs : Vec A n]] → [[ec x n xs]] (Vec-elim A P en ec xs))

Figure 5. Proof term for Vec-elim using the inductive-style definitions.
5.2 Constructor classes

Having seen how to apply our framework both to type constructors and to type classes, we now apply it to types quantified over a type constructor, with constraints. Voigtländer [2009b] provides many such examples, using the Monad constructor class. They fit well in our framework; for the sake of brevity, we do not detail them further here. We can, however, detail the definition of the simpler Functor class, which can be modelled as follows:

  Functor = (F : ? → ?) × ((X Y : ?) → (X → Y) → F X → F Y)

Our translation readily applies to the above definition, and yields the following relation between functors:

  [[Functor]] (F1, map1) (F2, map2) =
    (FR : (A1 A2 : ?) → (AR : A1 → A2 → ?) → (F1 A1 → F2 A2 → ?)) ×
    ((X1 X2 : ?) → (XR : X1 → X2 → ?) →
     (Y1 Y2 : ?) → (YR : Y1 → Y2 → ?) →
     (f1 : X1 → Y1) → (f2 : X2 → Y2) →
     ((x1 : X1) → (x2 : X2) → (xR : XR x1 x2) → YR (f1 x1) (f2 x2)) →
     (y1 : F1 X1) → (y2 : F2 X2) → (yR : FR XR y1 y2) →
     FR YR (map1 f1 y1) (map2 f2 y2))

In words, the translation of a functor is the product of a relation transformer (FR) between the functors F1 and F2, and a witness (mapR) that map1 and map2 preserve relations. Such Functors can be used to define a generic fold operation, which typically takes the following form:

  data µ ((F, map) : Functor) : ? where
    In : F (µ (F, map)) → µ (F, map)

  fold : ((F, map) : Functor) → (A : ?) → (F A → A) → µ (F, map) → A
  fold (F, map) A φ (In d) = φ (map (µ (F, map)) A (fold (F, map) A φ) d)

Note that the µ datatype is not strictly positive, so its use would be prohibited in many dependently-typed languages to avoid inconsistency. However, if one restricts oneself to well-behaved functors (yielding strictly positive types), then consistency is restored both in the source and target systems, and the parametricity condition derived for fold is valid. One can see from the type of fold that it behaves uniformly over (F, map) as well as A. By applying [[ ]] to fold and its type, this observation can be expressed (and justified) formally and used to reason about fold. Further, every function defined using fold, and in general any function parametrized over any functor, enjoys the same kind of property. Gibbons and Paterson [2009] previously made a similar observation in a categorical setting, showing that fold is a natural transformation between higher-order functors. Their argument relies heavily on categorical semantics and the universal property of fold, while our type-theoretical argument uses the type of fold as a starting point and directly obtains a parametricity property. However, some additional work is required to obtain the equivalent property using natural transformations and horizontal compositions from the parametricity property.
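For comparison, a hypothetical Haskell analogue of µ and fold (ours; named foldMu to avoid clashing with the Prelude), likewise passing the functor's map explicitly as in the (F, map) pairs above:

  {-# LANGUAGE RankNTypes #-}

  newtype Mu f = In (f (Mu f))

  -- The first argument plays the role of the map component of the pair.
  foldMu :: (forall x y. (x -> y) -> f x -> f y) -> (f a -> a) -> Mu f -> a
  foldMu fmap' phi (In d) = phi (fmap' (foldMu fmap' phi) d)

The parametricity condition derived from the type of fold is what, in the categorical account of Gibbons and Paterson, appears as naturality of fold.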
5.3 Generic cast

Continuing to apply our framework to terms of increasingly rich types, the next candidate is dependently typed.
An important application of dependent types is that of generic programming with universes, as in the work of Altenkirch and McBride [2003] and Benke et al. [2003]. The basic idea is to represent the "universe" of types as data, and to provide an interpretation function from values of this data type to types (in ?). Generic functions can then be written by pattern matching on the type representation. While universes usually capture large classes of types, we use as an example a very simple universe of types for Booleans and natural numbers, as follows.⁴

  data U : ?1 where
    bool : U
    nat : U

  El : U → ?
  El bool = Bool
  El nat = Nat

An example of a dependently-typed generic function is gcast, which for any type context F and any two (codes for) types u and t returns a casting function between F (El u) and F (El t) if u and t are the same (and nothing otherwise).

  gcast : (F : ? → ?) → (u t : U) → Maybe (F (El u) → F (El t))
  gcast F bool bool = just (λ x → x)
  gcast F nat nat = just (λ x → x)
  gcast F _ _ = nothing

  data Maybe (A : ?) : ? where
    nothing : Maybe A
    just : A → Maybe A

The function gcast is deemed safe if it returns the identity function whenever it returns something. Vytiniotis and Weirich [2009] show that this theorem can be deduced from the type of gcast alone, by parametricity. While the result can be re-derived in a simple way by reasoning directly on the definition of gcast, there is a good reason for using parametricity: as the universe is extended to a realistic definition, the definition of gcast gets more complex, but its type remains the same, and therefore the argument relying on parametricity is unchanged. The rest of this section is devoted to re-deriving the result using our framework.

The first step is to encode the theorem. We can encode that an arbitrary function f : A → B is the identity as the formula (x : A) → f x ≅ x. Note that because the input and output types of the cast are not definitionally equal, we must use a heterogeneous equality (≅), defined as follows:

  data ≅ (A : ?) (a : A) : (B : ?) → B → ? where
    refl′ : ≅ A a A a

Now, gcast is not a direct conversion function: sometimes it returns no result; its result is wrapped in Maybe. Hence we use a helper function to lift the identity predicate to a Maybe type:

  onMaybe : (A : ?) → (A → ?) → Maybe A → ?
  onMaybe A P nothing = ⊤
  onMaybe A P (just a) = P a

The theorem can then be expressed as follows:

Theorem 3 (gcast is safe).

  (F : ? → ?) → (u t : U) → (x : F (El u)) →
    onMaybe (F (El u) → F (El t)) (λ cast → cast x ≅ x) (gcast F u t)

We remark that onMaybe is in fact the deductive version of [[Maybe]], for the unary version of [[ ]]. We take this as a hint to use the unary version of [[ ]], and derive relations of the following types:

  [[U]] : U → ?1
  [[El]] : (u : U) → (uR : [[U]] u) → [[?]] (El u)
  [[gcast]] : (F : ? → ?) → (FR : [[? → ?]] F) →
              (u : U) → (uR : [[U]] u) → (t : U) → (tR : [[U]] t) →
              [[Maybe]] (F (El u) → F (El t))
                        (λ cast → (x : F (El u)) →
                           FR (El u) ([[El]] u uR) x → FR (El t) ([[El]] t tR) (cast x))
                        (gcast F u t)

Additionally, we can define param_U:

  param_U : (u : U) → [[U]] u
  param_U bool = [[bool]]
  param_U nat = [[nat]]

We can then use [[gcast]] to prove the theorem. The idea is to specialize it to the types and relations of interest:

  lemma1 : (F : ? → ?) → (u t : U) → (x : F (El u)) →
           [[Maybe]] (F (El u) → F (El t))
                     (λ cast → (x′ : F (El u)) → x′ ≅ x → cast x′ ≅ x)
                     (gcast F u t)
  lemma1 F u t x = [[gcast]] F (λ tR y → y ≅ x) u (param_U u) t (param_U t)

By fixing x′ to x in the argument to [[Maybe]], the condition x′ ≅ x is fulfilled, and the proof is complete. The remarkable feature of this proof is that it is essentially independent of the definitions of U and El: only their types matter. Adding constructors to U would not change anything in the proof: [[gcast]] isolates Theorem 3 from the actual definitions of U, El and gcast; it can be generated automatically from gcast. In summary, we have proved the correctness of gcast in three steps:

1. model representation types within our dependently-typed language;
2. use [[ ]] to obtain parametricity properties of any function of interest;
3. prove correctness by using the properties.

We think that the above process is an economical way to work with parametricity for extended type systems. Indeed, step one of the above process is becoming an increasingly popular way to develop languages with exotic type systems, as an embedding in a dependently-typed language [Oury and Swierstra 2008]. By providing (an automatic) step two, we hope to spare language designers the effort of adapting Reynolds' abstraction theorem to new type systems in an ad-hoc way.

⁴ For the present section, U : ? would be sufficient, but we define U : ?1 to permit a different definition of [[U]] in the next section.
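In Haskell, the closest analogue replaces the universe U and interpretation El by a GADT, as in this hypothetical sketch of ours (not the paper's code; Nat is modelled as Int for simplicity):

  {-# LANGUAGE GADTs #-}

  data U a where
    UBool :: U Bool
    UNat  :: U Int

  gcast :: U a -> U b -> Maybe (f a -> f b)
  gcast UBool UBool = Just id
  gcast UNat  UNat  = Just id
  gcast _     _     = Nothing

When the match succeeds, both type indices coincide, so the result has type f a → f a for every f; by exactly the parametricity argument above, such a polymorphic function can only be the identity.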
5.4 A partially constrained universe

So far we have only seen universes which are either completely unconstrained (like ?) and translate to arbitrary relations, or universes which are completely constrained (like Bool or U in the previous section) and translate to the identity relation. In this section we show that a middle ground is also possible. Suppose that we want the same universe as in the above section, but with only limited capabilities to dispatch on the type. That is, we allow users to define functions that have special behaviour for
Booleans, but are otherwise oblivious to the actual type at which they are used. This particular functionality may be encoded by providing only an eliminator for U with restricted capabilities:

  typeTest : (u : U) → (F : ? → ?) → F Bool → ((A : ?) → F A) → F (El u)
  typeTest bool F AB AGen = AB
  typeTest t F AB AGen = AGen (El t)

This restriction of elimination allows us to "relax" the definitions of [[U]] and [[El]], by translating the cases that do not involve bool to an arbitrary relation (for n = 2):

  [[U]] : U → U → ?1
  [[U]] bool bool = ⊤
  [[U]] bool _ = ⊥
  [[U]] _ bool = ⊥
  [[U]] u1 u2 = [[?]] (El u1) (El u2)

  [[El]] : (u1 u2 : U) → (uR : [[U]] u1 u2) → [[?]] (El u1) (El u2)
  [[El]] bool bool r = [[Bool]]
  [[El]] u1 u2 r = r

Given the above definitions, free theorems involving U reduce to the constrained case if presented with Booleans, and to the unconstrained case otherwise. While the above is a toy example, it points the way towards more sophisticated representations of universes. An example would be an encoding of fresh abstract type variables, as in Neis et al. [2009].
6. Discussion

6.1 Proof

A detailed sketch of the proof of Theorem 1 is available online [Bernardy et al. 2010b]. Beyond the pen-and-paper version, we also have a machine-checked proof, for the unary case, as an Agda program [Bernardy 2010]. A few improvements are necessary before it can be considered a fully machine-checked proof:

• some substitution lemmas need to be proved;
• the top-level structure needs some superficial restructuring to pass the termination check of the Agda system;
• proofs of some lemmas given by Barendregt [1992] should be formalized.
6.2 Different source and target sorts

Even though the sort-mapping function ˜ used in all our examples has been the identity, there are other possible choices. For example, Iω is reflective with ?̃i = ?i+k, for any natural k. Other examples can be constructed by mapping ˜ to "fresh" sorts. The following system (Iω⁺) is reflective with ?̃i = △i and △̃i = △i.

Definition 7 (Iω⁺). Iω⁺ is a PTS with this specification:
• S = {?i | i ∈ N} ∪ {△i | i ∈ N}
• A = {△i : △i+1 | i ∈ N} ∪ {?i : ?i+1 | i ∈ N}
• R = {(?i, ?j, ?max(i,j)) | i, j ∈ N} ∪ {(△i, △j, △max(i,j)) | i, j ∈ N} ∪ {?i ; △j | i ≤ j, i, j ∈ N}

6.3 Different source and target systems

For simplicity, we have chosen to use the same source and target PTS in Theorem 1. However, the theorem may be generalized to the case where source and target are different. One way to relax the hypothesis is to allow any source PTS which is a subsystem of the target one, keeping the same conditions for the target PTS.

For example, using this generalization, we see that all the parametricity statements about terms in the λ-cube are expressible and provable in the generalized calculus of constructions (CCω). Indeed, we observe that

• CCω is reflective with s̃ = s, and
• all eight systems of the λ-cube are embedded in CCω.

While extending our abstraction to subsystems is useful, further generalization is possible. For example, parametricity theorems (and proofs) generated from terms in the λ-cube will never use the higher sorts of CCω. Specifying necessary and sufficient conditions for the two-system case is left as future work.

6.4 Internalizing the meta-theorem

Theorem 1 and Corollary 2 (⊢ A : B ⟹ ⊢ [[A]] : [[B]] Ā) are meta-theorems. One can instantiate the corollary by choosing specific terms A and B; then [[A]] is a proof of [[B]] Ā in the system, derived from the structure of ⊢ A : B. Our examples consist of many such instantiations. However, one would like to go further and make a general statement about all values of type B within the system. That is, for a type B, to define param_B : (∀x : B. [[B]] x ... x), as we did with param_U in Section 5.3, essentially making the semantics of the type available for reasoning within the system. In particular, for any constant k : B, we could define [[k]] = param_B k.

One way to proceed is to assert parametricity at all types, with a constant param_B for each B. This approach was applied to CC by Takeuti [2004], extending similar axiom schemes for System F by Plotkin and Abadi [1993]. For each α : □ and P : α, Takeuti defined a relational interpretation ⟨P⟩ and a kind (|P : α|) such that ⟨P⟩ : (|P : α|). Then for each type T : ?, he postulated an axiom param_T : (∀x : T. ⟨T⟩ x x), conjecturing that such axioms did not make the system inconsistent. For closed terms P, Takeuti's translations ⟨P⟩ and (|P : α|) resemble our [[P]] and [[α]] P respectively (with n = 2), but the pattern is obscured by an error in the translation rule for the product □ ; ?, and the omission of a witness xR for the relationship between values x1 and x2 in the rules corresponding to the product ? ; □. Another approach would be to provide access to the terms via some form of reflection.

6.5 Related work

Some of the many studies of parametricity have already been mentioned and analysed in the rest of the paper. In this section we compare our work to only a couple of the most relevant pieces of work.

One direction of research is concerned with parametricity in extensions of System F. Our work is directly inspired by Vytiniotis and Weirich [2010], who extend parametricity to (an extension of) Fω: indeed, Fω can be seen as a PTS with one more product rule than System F. Besides supporting more sorts and function spaces, an orthogonal extension of parametricity theory is to support impure features in the system. For example, Johann and Voigtländer [2005] studied how explicit strictness modifies parametricity results. It is not obvious how to support such extensions in our framework.

Another direction of research is concerned with a better understanding of parametricity. Here we shall mention only Wadler [2007], which gives a particularly lucid presentation of the abstraction theorem, as the inverse of Girard's Representation theorem [Girard 1972]. Our version of the abstraction theorem differs in the following aspects compared to that of Wadler (and to our knowledge all others):

1. Instead of targeting a logic, we target its propositions-as-types interpretation, expressed in a PTS.
2. We abstract from the details of the systems, generalizing to a class of PTSs.

3. We add that the translation function used to interpret types as relations can also be used to interpret terms as witnesses of those relations. In short, the [[A]] part of Γ ⊢ A : B ⟹ [[Γ]] ⊢ [[A]] : [[B]] Ā is new. This additional insight depends heavily on using the propositions-as-types interpretation.

It also appears that the function [[ ]] (for the unary case) has been discovered independently by Monnier and Haguenauer [2010], for a very different purpose. They use [[ ]] as a compilation function from CC to a language with singleton types only, in order to enforce a phase distinction. Type preservation of the translation scheme is the main formal property presented by Monnier and Haguenauer. We remark that this property corresponds to the abstraction theorem for CC.

6.6 Future work

Our explanation of parametricity for dependent types has opened a whole range of interesting topics for future work. We should investigate whether our framework can be applied (and extended if need be) to more exotic systems, for example those incorporating strictness annotations (seq) or non-termination. We should extend our translation to support non-informative function spaces, as found for example in Coq. In Coq, the sort ? of CC is split into two separate sorts, one for types (Set) and one for propositions (Prop). Inhabitants of Set can depend on inhabitants of Prop: for example, a program may depend on a certain property to terminate. However, computational content can never "leak" from Prop to Set: programs may only depend on the existence of a proof; it is forbidden to inspect their structure. In such a situation, our translation scheme appears to generate parametricity results that are too weak, as we have briefly alluded to in Section 5.1. The reason is that we always assume that computational content may be transferred from the argument of a function to its result. We could modify the translation to omit the superfluous relation parameter in such cases.

Reynolds' abstraction theorem can be understood as an embedding of polymorphic lambda calculus into second-order propositional logic. Wadler [2007] showed that Girard's representation theorem [Girard 1972] can be understood as the corresponding projection. In this work we have shown that the embedding can be generalized to more complex type systems. The question of how the projection generalizes naturally arises, and should also be addressed.

It is straightforward to derive translated types using our schema, but tedious. Providing [[ ]] as a meta-function would greatly ease experimentation with our technique. Another direction worth exploring is to provide the parametricity axiom (param) as a meta-function in a logical framework. We presented only simple examples; applying the results to more substantial applications should be done as well.

7. Conclusion

We have shown that it is not only possible, but easy, to derive parametricity conditions in a dependently-typed language. Further, it is possible to analyse parametricity properties of custom languages, via their embedding in a dependently-typed host language.

Acknowledgments

Thanks to Andreas Abel, Thierry Coquand, Peter Dybjer, Marc Lasson, Guilhem Moulin, Ulf Norell, Nicolas Pouillard and anonymous reviewers for providing us with very valuable feedback.

References

T. Altenkirch and C. McBride. Generic programming within dependently typed programming. In Proc. of the IFIP TC2/WG2.1 Working Conference on Generic Programming, pages 1–20. Kluwer, B.V., 2003.
H. P. Barendregt. Lambda calculi with types. In Handbook of Logic in Computer Science, volume 2, pages 117–309. 1992.
M. Benke, P. Dybjer, and P. Jansson. Universes for generic programs and proofs in dependent type theory. Nordic J. of Computing, 10(4):265–289, 2003.
J.-P. Bernardy. A proof of the abstraction theorem for pure type systems (unary case). http://www.cse.chalmers.se/~bernardy/ParDep/html/Theorem.html, 2010.
J.-P. Bernardy, P. Jansson, and K. Claessen. Testing polymorphic properties. In Proc. of ESOP 2010, volume 6012 of LNCS. Springer, 2010a.
J.-P. Bernardy, P. Jansson, and R. Paterson. An abstraction theorem for pure type systems. Available from http://www.cse.chalmers.se/~bernardy/ParDep/abstraction-pts.pdf, 2010b.
C. Böhm and A. Berarducci. Automatic synthesis of typed lambda-programs on term algebras. Theor. Comp. Sci., 39(2-3):135–154, 1985.
T. Coquand. An analysis of Girard's paradox. In Proc. of LICS 1986, pages 227–236. IEEE Comp. Society Press, 1986.
T. Coquand. Pattern matching with dependent types. In Proc. of the Workshop on Types for Proofs and Programs, pages 66–79, 1992.
P. Dybjer. Inductive families. Formal Aspects of Computing, 6(4):440–465, 1994.
J. Gibbons and R. Paterson. Parametric datatype-genericity. In Proc. of WGP 2009, pages 85–93, Edinburgh, Scotland, 2009. ACM.
A. Gill, J. Launchbury, and S. L. Peyton Jones. A short cut to deforestation. In Proc. of FPCA, pages 223–232, Copenhagen, Denmark, 1993. ACM.
J. Y. Girard. Interprétation fonctionnelle et élimination des coupures de l'arithmétique d'ordre supérieur. Thèse d'état, Université de Paris 7, 1972.
R. Hinze. Church numerals, twice! J. Funct. Program., 15(1):1–13, 2005.
P. Johann. A generalization of short-cut fusion and its correctness proof. Higher-Order and Symbol. Comput., 15(4):273–300, 2002.
P. Johann and J. Voigtländer. The impact of seq on free theorems-based program transformations. Fundam. Inf., 69(1-2):63–102, 2005.
C. McBride and J. McKinna. The view from the left. J. Funct. Program., 14(01):69–111, 2004.
A. Miquel. Le Calcul des Constructions implicite : syntaxe et sémantique. Thèse de doctorat, Université Paris 7, 2001.
S. Monnier and D. Haguenauer. Singleton types here, singleton types there, singleton types everywhere. In Proc. of PLPV 2010, pages 1–8, Madrid, Spain, 2010. ACM.
G. Neis, D. Dreyer, and A. Rossberg. Non-parametric parametricity. In Proc. of ICFP 2009, pages 135–148, Edinburgh, Scotland, 2009. ACM.
U. Norell. Towards a practical programming language based on dependent type theory. PhD thesis, Chalmers Tekniska Högskola, 2007.
N. Oury and W. Swierstra. The power of Pi. In Proc. of ICFP 2008, pages 39–50, Victoria, BC, Canada, 2008. ACM.
C. Paulin-Mohring. Inductive definitions in the system Coq: rules and properties. In Typed Lambda Calculi and Applications, pages 328–345. Springer, 1993.
G. Plotkin and M. Abadi. A logic for parametric polymorphism. In LNCS, volume 664, pages 361–375. Springer-Verlag, 1993.
J. C. Reynolds. Types, abstraction and parametric polymorphism. Information Processing, 83(1):513–523, 1983.
I. Takeuti. The theory of parametricity in lambda cube. Manuscript, 2004.
The Coq development team. The Coq proof assistant, 2010.
J. Voigtländer. Bidirectionalization for free! (Pearl). In Proc. of POPL 2009, pages 165–176, Savannah, GA, USA, 2009a. ACM.
J. Voigtländer. Free theorems involving type constructor classes: Functional pearl. SIGPLAN Not., 44(9):173–184, 2009b.
D. Vytiniotis and S. Weirich. Type-safe cast does no harm: Syntactic parametricity for Fω and beyond. Preliminary version of "Parametricity, Type Equality and Higher-order Polymorphism", 2009.
D. Vytiniotis and S. Weirich. Parametricity, type equality, and higher-order polymorphism. J. Funct. Program., 20(2):175–210, 2010.
P. Wadler. Theorems for free! In Proc. of FPCA 1989, pages 347–359, Imperial College, London, United Kingdom, 1989. ACM.
P. Wadler. The Girard–Reynolds isomorphism. Theor. Comp. Sci., 375(1–3):201–226, 2007.
P. Wadler and S. Blott. How to make ad-hoc polymorphism less ad hoc. In POPL'89, pages 60–76. ACM, 1989.
Figure 6. Outline of a proof of Theorem 1 by induction over the derivation of Γ ⊢ A : B. Each judgement on the left is mapped to the derived judgement on the right:

axiom:
  ⊢ s : s′  ⟹  ⊢ (λx : s. x → s̃) : s → s̃′

start:
  Γ ⊢ A : s  ⟹  [[Γ]] ⊢ [[A]] : A → s̃
  Γ, x : A ⊢ x : A  ⟹  [[Γ]], x : A, xR : [[A]] x ⊢ xR : [[A]] x

weakening:
  Γ ⊢ A : B  ⟹  [[Γ]] ⊢ [[A]] : [[B]] A
  Γ ⊢ C : s  ⟹  [[Γ]] ⊢ [[C]] : C → s̃
  Γ, x : C ⊢ A : B  ⟹  [[Γ]], x : C, xR : [[C]] x ⊢ [[A]] : [[B]] A

product:
  Γ ⊢ A : s1  ⟹  [[Γ]] ⊢ [[A]] : A → s̃1
  Γ, x : A ⊢ B : s2  ⟹  [[Γ]], x : A, xR : [[A]] x ⊢ [[B]] : B → s̃2
  Γ ⊢ (∀x : A. B) : s3  ⟹  [[Γ]] ⊢ (λf : (∀x : A. B). ∀x : A. ∀xR : [[A]] x. [[B]] (f x)) : (∀x : A. B) → s̃3

application:
  Γ ⊢ F : (∀x : A. B)  ⟹  [[Γ]] ⊢ [[F]] : (∀x : A. ∀xR : [[A]] x. [[B]] (F x))
  Γ ⊢ a : A  ⟹  [[Γ]] ⊢ [[a]] : [[A]] a
  Γ ⊢ F a : B[x ↦ a]  ⟹  [[Γ]] ⊢ [[F]] a [[a]] : [[B[x ↦ a]]] (F a)

abstraction:
  Γ ⊢ A : s1  ⟹  [[Γ]] ⊢ [[A]] : A → s̃1
  Γ, x : A ⊢ B : s2  ⟹  [[Γ]], x : A, xR : [[A]] x ⊢ [[B]] : B → s̃2
  Γ, x : A ⊢ b : B  ⟹  [[Γ]], x : A, xR : [[A]] x ⊢ [[b]] : [[B]] b
  Γ ⊢ (λx : A. b) : (∀x : A. B)  ⟹  [[Γ]] ⊢ (λx : A. λxR : [[A]] x. [[b]]) : (∀x : A. ∀xR : [[A]] x. [[B]] b)

conversion:
  Γ ⊢ A : B  ⟹  [[Γ]] ⊢ [[A]] : [[B]] A
  Γ ⊢ B′ : s; B =β B′  ⟹  [[Γ]] ⊢ [[B′]] : B′ → s̃; [[B]] =β [[B′]]
  Γ ⊢ A : B′  ⟹  [[Γ]] ⊢ [[A]] : [[B′]] A
A. Proof of the abstraction theorem

In this appendix we sketch the proof of our main theorem, using the following lemma:

Lemma 4 (translation preserves β-reduction).

  A −→*β A′ ⟹ [[A]] −→*β [[A′]]

Proof sketch. The proof proceeds by induction on the derivation of A −→*β A′. The interesting case is where the term A is a β-redex (λx : B. b) C. That case relies on the way [[ ]] interacts with substitution:

  [[b[x ↦ C]]] = [[b]][x ↦ C][xR ↦ [[C]]]

The remaining cases are congruences.

Theorem (abstraction). In a reflective PTS, Γ ⊢ A : B ⟹ [[Γ]] ⊢ [[A]] : [[B]] Ā

Proof sketch. A derivation of [[Γ]] ⊢ [[A]] : [[B]] Ā is constructed by induction on the derivation of Γ ⊢ A : B, using the syntactic properties of PTSs. We have one case for each typing rule: each typing rule translates to a portion of a corresponding relational typing judgement, as shown in Figure 6. For convenience, the proof uses a variant form of the abstraction rule; equivalence of the two systems follows from Barendregt [1992, Lemma 5.2.13]. The conversion case uses Lemma 4.
A Play on Regular Expressions
Functional Pearl

Sebastian Fischer    Frank Huch    Thomas Wilke
Christian-Albrechts University of Kiel, Germany
{sebf@,fhu@,wilke@ti.}informatik.uni-kiel.de

Abstract

Cody, Hazel, and Theo, two experienced Haskell programmers and an expert in automata theory, develop an elegant Haskell program for matching regular expressions: (i) the program is purely functional; (ii) it is overloaded over arbitrary semirings, which not only allows it to solve the ordinary matching problem but also supports other applications like computing leftmost longest matchings or the number of matchings, all with a single algorithm; (iii) it is more powerful than other matchers, as it can be used for parsing every context-free language by taking advantage of laziness. The developed program is based on an old technique to turn regular expressions into finite automata, which makes it efficient both in terms of worst-case time and space bounds and actual performance: despite its simplicity, the Haskell implementation can compete with a recently published professional C++ program for the same problem.
C ODY What are you reading? H AZEL Google just announced a new library for regular expression matching which is—in the worst case—faster and uses less memory than commonly used libraries. C ODY How would we go about programming regular expression matching in Haskell? H AZEL Well, let’s see. We’d probably start with the data type. (Opens a new Haskell file in her text editor and enters the following definition.) data Reg = Eps | Sym Char | Alt Reg Reg | Seq Reg Reg | Rep Reg
Categories and Subject Descriptors D.1.1 [Programming Techniques]: Applicative (Functional) Programming; F.1.1 [Computation by Abstract Devices]: Models of Computation (Automata) General Terms
Algorithms, Design
-- ε -- a -- α|β -- αβ -- α∗
T HEO (a computer scientist, living and working three floors up, strolls along the corridor, carrying his coffee mug, thinking about a difficult proof, and searching for distraction.) What are you doing, folks? H AZEL We just started to implement regular expressions in Haskell. Here is the first definition. T HEO (picks up a pen and goes to the whiteboard.) So how would you write
Keywords regular expressions, finite automata, Glushkov construction, purely functional programming
C AST C ODY – proficient Haskell hacker H AZEL – expert for programming abstractions
((a|b)∗ c(a|b)∗ c)∗ (a|b)∗ ,
T HEO – automata theory guru
which specifies that a string contains an even number of c’s? C ODY That’s easy. (Types on the keyboard.)
ACT I
ghci> let nocs = Rep (Alt (Sym ’a’) (Sym ’b’)) ghci> let onec = Seq nocs (Sym ’c’) ghci> let evencs = Seq (Rep (Seq onec onec)) nocs
S CENE I . S PECIFICATION
T HEO Ah. You can use abbreviations, that’s convenient. But why do you have Sym in front of every Char?— That looks redundant to me. H AZEL Haskell is strongly typed, which means every value has exactly one type! The arguments of the Alt constructor must be of type Reg, not Char, so we need to wrap characters in the Sym constructor.— But when I draw a regular expression, I leave out Sym, just for simplicity. For instance, here is how I would draw your expression. (Joins T HEO at the whiteboard and draws Figure 1.) C ODY How can we define the language accepted by an arbitrary regular expression?
To the right: a coffee machine and a whiteboard next to it.
Figure 1. The tree representation of the regular expression ((a|b)∗c(a|b)∗c)∗(a|b)∗, which matches all words in {a, b, c}∗ with an even number of occurrences of c. [tree diagram not reproduced]

THEO As a predicate, inductively on the structure of your data type. (Writes some formal definitions to the whiteboard: semantic brackets, Greek letters, languages as sets, etc.)
HAZEL (goes to the keyboard, sits down next to CODY.) Ok, this can be easily coded in Haskell, as a characteristic function. List comprehensions are fairly useful, as well. (Writes the following definition in her text editor.)
accept :: Reg → String → Bool
accept Eps u       = null u
accept (Sym c) u   = u == [c]
accept (Alt p q) u = accept p u ∨ accept q u
accept (Seq p q) u = or [accept p u1 ∧ accept q u2 | (u1, u2) ← split u]
accept (Rep r) u   = or [and [accept r ui | ui ← ps] | ps ← parts u]

THEO Let me see. split produces all decompositions of a string into two factors, and parts stands for "partitions" and produces all decompositions of a string into an arbitrary number of factors.
CODY Wait! We need to be careful to avoid empty factors when defining parts. Otherwise there is an infinite number of possible decompositions.
HAZEL Right. But split must also produce empty parts and can be defined as follows. (Continues writing the Haskell program.)

split :: [a] → [([a], [a])]
split []       = [([], [])]
split (c : cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) ← split cs]
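For instance, evaluating split on a two-letter word yields every decomposition into two (possibly empty) factors — the outputs below follow from the definition above:

ghci> split "ab"
[("","ab"),("a","b"),("ab","")]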
The function parts is a generalization of split that decomposes words into any number of factors (not just two), none of them empty.
CODY That's tricky. Let's use list comprehensions again. (Sits down on one of the empty chairs, grabs the keyboard and extends the program as follows:)

parts :: [a] → [[[a]]]
parts []       = [[]]
parts [c]      = [[[c]]]
parts (c : cs) = concat [[(c : p) : ps, [c] : p : ps] | p : ps ← parts cs]

We split a word with at least two characters recursively and either add the first character to the first factor or add it as a new factor.
THEO Why do you write [a] and not String?
HAZEL That's because we want to be more general. We can now work with arbitrary list types instead of strings only.
THEO That makes sense to me.
CODY Maybe it's good to have a separate name for these lists. I think Hazel used the term words—that's a good term. Let's stick to it.
THEO I want to check out your code. (Sits down as well. Now all three build a small crowd in front of the monitor.)

ghci> parts "acc"
[["acc"],["a","cc"],["ac","c"],["a","c","c"]]
ghci> accept evencs "acc"
True

THEO Aha. (Pauses to think for a moment.) Wait a second! The number of decompositions of a string of length n + 1 is 2ⁿ. Blindly checking all of them is not efficient. When you convert a regular expression into an equivalent finite-state automaton and use this automaton for matching, then, for a fixed regular expression, the run time of the matching algorithm is linear in the length of the string.
HAZEL Well, the program is not meant to be efficient. It's only a specification, albeit executable. We can write an efficient program later. What I am more interested in is whether we can make the specification a bit more interesting first. Can it be generalized, for instance?
THEO (staring out of the window.) We can add weights.
HAZEL Weights?

SCENE II. WEIGHTS

HAZEL, CODY, and THEO are still sitting around the laptop.
HAZEL What do you mean by weights?
THEO Remember what we did above? Given a regular expression, we assigned to a word a boolean value reflecting whether the word matches the given expression or not. Now, we produce more complex values—semiring elements.
HAZEL What's an example? Is this useful at all?
THEO A very simple example is to determine the length of a word or the number of occurrences of a given symbol in a word. A more complicated example would be to count the number of matchings of a word against a regular expression, or to determine a leftmost longest matching subword.
CODY That sounds interesting, but what was a semiring, again?
HAZEL If I remember correctly from my algebra course, a semiring is an algebraic structure with zero, one, addition, and multiplication that satisfies certain laws. (Adds a Haskell type class for semirings to the Haskell file.)

class Semiring s where
  zero, one :: s
  (⊕), (⊗)  :: s → s → s

Here, zero is an identity for ⊕, one for ⊗, both composition operators are associative, and ⊕ is commutative, in addition.
THEO That's true, but, moreover, the usual distributivity laws hold and zero annihilates a semiring with respect to multiplication, which means that both zero ⊗ s and s ⊗ zero are zero for all s.
HAZEL These laws are not enforced by Haskell, so programmers need to ensure that they hold when defining instances of Semiring.
CODY Ok, fine. I guess what we need to do is to add weights to the symbols in our regular expressions.
THEO (sipping coffee) Right.
CODY So let's make a data type for weighted regular expressions.
THEO (interjects.) Cool, that's exactly the terminology we use in formal language theory.
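The laws Hazel and Theo just listed can be written down as testable predicates. The following is a sketch, not part of the play's code; it assumes an Eq constraint and a property-testing driver such as QuickCheck to exercise the predicates at a concrete instance:

-- zero is an identity for ⊕, one is an identity for ⊗
prop_identities :: (Eq s, Semiring s) ⇒ s → Bool
prop_identities x =
  (zero ⊕ x) == x ∧ (x ⊕ zero) == x ∧
  (one ⊗ x)  == x ∧ (x ⊗ one)  == x

-- zero annihilates with respect to multiplication
prop_annihilates :: (Eq s, Semiring s) ⇒ s → Bool
prop_annihilates x = (zero ⊗ x) == zero ∧ (x ⊗ zero) == zero

-- multiplication distributes over addition on both sides
prop_distributes :: (Eq s, Semiring s) ⇒ s → s → s → Bool
prop_distributes x y z =
  (x ⊗ (y ⊕ z)) == ((x ⊗ y) ⊕ (x ⊗ z)) ∧
  ((y ⊕ z) ⊗ x) == ((y ⊗ x) ⊕ (z ⊗ x))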
CODY The only change to what we had before is in the symbol case; we add the weights. We can also generalize from characters to arbitrary symbol types. (Writes the following code.)

data Regw c s = Epsw
              | Symw (c → s)
              | Altw (Regw c s) (Regw c s)
              | Seqw (Regw c s) (Regw c s)
              | Repw (Regw c s)

HAZEL Aha! A standard implementation for the function attached to some character would compare the character with a given character and yield either zero or one:

sym :: Semiring s ⇒ Char → Regw Char s
sym c = Symw (λx → if x == c then one else zero)

Using sym, we can translate every regular expression into a weighted regular expression in a canonical fashion:

weighted :: Semiring s ⇒ Reg → Regw Char s
weighted Eps       = Epsw
weighted (Sym c)   = sym c
weighted (Alt p q) = Altw (weighted p) (weighted q)
weighted (Seq p q) = Seqw (weighted p) (weighted q)
weighted (Rep p)   = Repw (weighted p)

THEO How would you adjust accept to the weighted setting?
HAZEL I replace the Boolean operations with semiring operations. (Goes on with entering code.)

acceptw :: Semiring s ⇒ Regw c s → [c] → s
acceptw Epsw u       = if null u then one else zero
acceptw (Symw f) u   = case u of [c] → f c; _ → zero
acceptw (Altw p q) u = acceptw p u ⊕ acceptw q u
acceptw (Seqw p q) u = sum [acceptw p u1 ⊗ acceptw q u2 | (u1, u2) ← split u]
acceptw (Repw r) u   = sum [prod [acceptw r ui | ui ← ps] | ps ← parts u]
THEO How do you define the functions sum and prod?
HAZEL They are generalizations of or and and, respectively:

sum, prod :: Semiring s ⇒ [s] → s
sum  = foldr (⊕) zero
prod = foldr (⊗) one

And we can easily define a Semiring instance for Bool:

instance Semiring Bool where
  zero = False
  one  = True
  (⊕)  = (∨)
  (⊗)  = (∧)

THEO I see. We can now claim for all regular expressions r and words u the equation accept r u == acceptw (weighted r) u.
CODY Ok, but we have seen matching before. Theo, can I see the details for the examples you mentioned earlier?
THEO Let me check on your algebra. Do you know any semiring other than the booleans?
CODY Well, I guess the integers form a semiring. (Adds a corresponding instance to the file.)

instance Semiring Int where
  zero = 0
  one  = 1
  (⊕)  = (+)
  (⊗)  = (∗)
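As an aside: Theo's claim can be turned into a machine-checked property. The following sketch is not from the play; it assumes the QuickCheck library and a hypothetical generator for random regular expressions over a small alphabet:

-- Hypothetical generator, only for testing purposes.
instance Arbitrary Reg where
  arbitrary = sized go
    where
      go 0 = oneof [pure Eps, Sym <$> elements "abc"]
      go n = oneof [ pure Eps
                   , Sym <$> elements "abc"
                   , Alt <$> sub <*> sub
                   , Seq <$> sub <*> sub
                   , Rep <$> sub ]
        where sub = go (n `div` 2)

-- The specification and its weighted version agree in the Bool semiring
-- (short words only, since accept is exponential).
prop_agree :: Reg → Property
prop_agree r = forAll (resize 4 (listOf (elements "abc"))) (λu →
  accept r u == acceptw (weighted r) u)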
THEO Right, but you could also restrict yourself to the nonnegative integers. They also form a semiring.
HAZEL Let's try it out.

ghci> let as = Alt (Sym 'a') (Rep (Sym 'a'))
ghci> acceptw (weighted as) "a" :: Int
2
ghci> let bs = Alt (Sym 'b') (Rep (Sym 'b'))
ghci> acceptw (weighted (Seq as bs)) "ab" :: Int
4

It seems we can compute the number of different ways to match a word against a regular expression. Cool! I wonder what else we can compute by using tricky Semiring instances.
THEO I told you what you can do: count occurrences of symbols, determine leftmost matchings, and so on. But let's talk about this in more detail later. There is one thing I should mention now. You are not right when you say that with the above method one can determine the number of different ways a word matches a regular expression. Here is an example. (Uses again the interactive Haskell environment.)

ghci> acceptw (weighted (Rep Eps)) "" :: Int
1

CODY The number of matchings is infinite, but the program gives us only one. Can't we fix that?
THEO Sure, we can, but we would have to talk about closed semirings. Let's work with the simple solution, because working with closed semirings is a bit more complicated, but doesn't buy us much.
HAZEL (smiling) The result may not reflect our intuition, but, due to the way in which we defined parts, our specification does not count empty matchings inside a repetition. It only counts one empty matching for repeating the subexpression zero times.

ACT II

Same arrangement as before. The regular expression tree previously drawn by CODY, see Figure 1, is still on the whiteboard. HAZEL and CODY standing at the coffee machine, not saying anything. THEO enters the scene.

SCENE I. MATCHING

THEO Good morning everybody! How about looking into efficient matching of regular expressions today?
HAZEL Ok. Can't we use backtracking? What I mean is that we read the given word from left to right, check at the same time whether it matches the given expression, revising decisions when we are not successful. I think this is what algorithms for Perl-style regular expressions typically do.
CODY But backtracking is not efficient—at least not always. There are cases where backtracking takes exponential time.
HAZEL Can you give an example?
CODY If you match the word aⁿ against the regular expression (a|ε)ⁿaⁿ, then a backtracking algorithm takes exponential time to find a matching.¹
HAZEL You're right. When trying to match aⁿ against (a|ε)ⁿaⁿ one can choose either a or ε in n positions, so all together there are 2ⁿ different options, but only one of them—picking ε every time—leads to a successful matching.

¹ By xⁿ Cody means a sequence of n copies of x.
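Cody's pathological expression can be generated for any n with a small helper. The helper is hypothetical—the play never defines it—but it is built only from the Reg constructors above:

-- Hypothetical helper: Cody's expression (a|ε)ⁿaⁿ as a Reg value
evil :: Int → Reg
evil n = foldr1 Seq (replicate n (Alt (Sym 'a') Eps) ++ replicate n (Sym 'a'))

-- e.g. accept (evil 3) (replicate 3 'a') evaluates to True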
Figure 2. Marking positions in the regular expression ((a|b)∗c(a|b)∗c)∗(a|b)∗ while matching: (a) result after reading b; (b) result after reading bc. [tree diagrams not reproduced]
A backtracking algorithm may pick this combination only after having tried all the other options.—Can we do better?
THEO An alternative is to turn a regular expression into an equivalent deterministic finite-state automaton. The run time for simulating an automaton on a given word is linear in the length of the word, but the automaton can have exponential size in the size of the given regular expression.
CODY That's not good, because then the algorithm not only has exponential space requirements but additionally preprocessing takes exponential time.
HAZEL Can you give an example where the deterministic automaton has necessarily exponential size?
THEO Suppose we are working with the alphabet that contains only a and b. If you want to check whether a word contains two occurrences of a with exactly n characters in between, then any deterministic automaton for this will have 2ⁿ⁺¹ different states.
CODY Why?
THEO Because at any time—while reading a word from left to right—the automaton needs to know for each of the previous n characters whether it was an a or not in order to tell whether the entire word is accepted.
HAZEL I see. You need such detailed information because if there was an a exactly n + 1 positions before the current position and the next character is not an a, then you have to go to the next state, where you will need to know whether there was an a exactly n positions before the current position, and so on.
THEO Exactly!—And here is a formal proof. Suppose an automaton had less than 2ⁿ⁺¹ states. Then there would be two distinct words of length n + 1, say u and v, which, when read by the automaton, would lead to the same state. Since u and v are distinct, there is some position i such that u and v differ at position i; say u carries symbol a in this position, but v doesn't. Now, consider the word w which starts with i copies of b, followed by one occurrence of a (w = bⁱa). On the one hand, uw has the above property, namely two occurrences of a with n characters in between, but vw does not; on the other hand, the automaton gets to the same state for both words, so either both words are accepted, or none of them is—a contradiction.
CODY Interesting. And, indeed, a regular expression to solve this task has size only linear in n. If we restrict ourselves to the alphabet consisting of a and b, then we can write it as follows. (Grabs a pen from his pocket and a business card from his wallet. Scribbles on the back of the business card. Reads aloud the following term.)
(a|b)∗a(a|b)ⁿa(a|b)∗

HAZEL Can we avoid constructing the automaton in advance?
CODY Instead of generating all states in advance, we can generate the initial state and generate subsequent states on the fly. If we discard previous states, then the space requirements are bounded by the space requirements of a single state.
THEO And the run time for matching a word of length n is in O(mn) if it takes time O(m) to compute a new state from the previous one.
HAZEL That sounds reasonably efficient. How can we implement this idea?
THEO Glushkov proposed a nice idea for constructing non-deterministic automata from regular expressions, which may come in handy here. It avoids ε-transitions and can probably be implemented in a structural fashion.
HAZEL (smiling) I think we would say it could be implemented in a purely functional way.
CODY (getting excited) How are his automata constructed?
THEO A state of a Glushkov automaton is a position of the given regular expression, where a position is defined to be a place where a symbol occurs. What we want to do is determinize this automaton right away, so we should think of a state as a set of positions.
HAZEL What would such a set mean?
THEO The positions contained in a state describe where one would get to by trying to match a word in every possible way.
HAZEL I don't understand. Can you give an example?
THEO Instead of writing sets of positions, I mark the symbols in the regular expression with a box. This is more intuitive and allows me to explain how a new set of positions is computed from an old one.
CODY Let's match the string bc against the regular expression which checks whether it contains an even number of c's.
THEO Well, I need to draw some pictures.
CODY Then let's go back to Hazel's office; we can probably use what we wrote on the whiteboard yesterday.
The three move to the left side of the stage, get in front of the whiteboard, where Figure 1 is still shown.
THEO Initially, no symbol is marked, i.e., the initial state is the empty set. We then mark every occurrence of 'b' in the regular expression that might be responsible for reading the first
character b of the word. (Draws two boxes around the first and last 'b' in the regular expression tree, see Figure 2(a).)
HAZEL There are two possibilities to read the first character b, which correspond to the two positions that you marked. The last 'b' in the regular expression can be marked because it follows a repetition which accepts the empty word. But the 'b' in the middle cannot be responsible for matching the first character because it follows the first 'c', which has not yet been matched.
THEO Exactly! If we now read the next character c, we shift the mark from the first 'b' to the subsequent 'c'. (Does so, which leads to Figure 2(b).)
CODY And the mark of the other 'b' is discarded because there is no possibility to shift it to a subsequent 'c'.
THEO Right, you got the idea.—We have reached a final state if there is a mark on a final character.
CODY When is a character final?
THEO When no other character has to be read in order to match the whole regular expression, i.e., if the remaining regular expression accepts the empty word.
HAZEL I think we can elegantly implement this idea in Haskell! Instead of using sets of positions we can represent states as regular expressions with marks on symbols—just as you did on the whiteboard.
(They move to the desk, CODY sits down right in front of the keyboard, HAZEL and THEO take the two other office chairs.)
CODY Ok, let's change the data type. We first consider the simple version without semirings. (Opens the file from the previous scene, adds a data type for regular expressions with marked symbols. They use this file for the rest of the scene.)

data REG = EPS
         | SYM Bool Char
         | ALT REG REG
         | SEQ REG REG
         | REP REG

HAZEL Let's implement the shift function. We probably want it to take a possibly marked regular expression and a character to be read and to produce a possibly marked regular expression.
THEO That's not enough. In the beginning, no position is marked. So if we just shift, we'll never mark a position. A similar problem occurs for subterms. If, in the left subexpression of a sequence, a final position is marked, we want this to be taken into account in the right subexpression.
HAZEL We need a third parameter, m, which represents an additional mark that can be fed into the expression. So the arguments of shift are the mark, a possibly marked regular expression, and the character to be read:

shift :: Bool → REG → Char → REG

The result of shift is a new regular expression with marks.
CODY The rule for ε is easy, because ε doesn't get a mark:

shift _ EPS _ = EPS

THEO And a symbol in the new expression is marked if, first, some previous symbol was marked, indicated by m == True, and, second, the symbol equals the character to be read.

shift m (SYM _ x) c = SYM (m ∧ x == c) x

HAZEL We treat both arguments of a choice of regular expressions the same.

shift m (ALT p q) c = ALT (shift m p c) (shift m q c)

Sequences are trickier. The given mark is shifted to the first part, but we also have to shift it to the second part if the first part accepts the empty word. Additionally, if the first part contains a final character, we have to shift its mark into the second part, too. Assuming helper functions empty and final, which check whether a regular expression accepts the empty word or contains a final character, respectively, we can handle sequences as follows:

shift m (SEQ p q) c = SEQ (shift m p c)
                          (shift (m ∧ empty p ∨ final p) q c)

We haven't talked about repetitions yet. How do we handle them?
THEO Let's go back to the example first. Assume we have already read the word bc but now want to read two additional c's. (Stands up, grabs a pen, and changes the tree on the whiteboard; the result is shown in Figure 3(a).) After reading the first c, the mark is shifted from the first 'c' in the regular expression to the second 'c' as usual, because the repetition in between accepts the empty word. But when reading the second c, the mark is shifted from the second 'c' in the expression back to the first one! (Modifies the drawing again, see Figure 3(b) for the result.) Repetitions can be read multiple times and thus marks have to be passed through them multiple times, too.

Figure 3. Shifting symbol marks in repetitions of the regular expression ((a|b)∗c(a|b)∗c)∗(a|b)∗: (a) result after reading bcc; (b) result after reading bccc. [tree diagrams not reproduced]

CODY So we can complete the definition of shift as follows:

shift m (REP r) c = REP (shift (m ∨ final r) r c)

We shift a mark into the inner expression if a previous character was marked or a final character in the expression is marked.
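Collected in one place—nothing new here, just the five equations developed above:

shift :: Bool → REG → Char → REG
shift _ EPS       _ = EPS
shift m (SYM _ x) c = SYM (m ∧ x == c) x
shift m (ALT p q) c = ALT (shift m p c) (shift m q c)
shift m (SEQ p q) c = SEQ (shift m p c)
                          (shift (m ∧ empty p ∨ final p) q c)
shift m (REP r)   c = REP (shift (m ∨ final r) r c)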
HAZEL Ok, let's define the helper functions empty and final. I guess this is pretty straightforward. (Types the definition of empty in her text editor.)

empty :: REG → Bool
empty EPS       = True
empty (SYM _ _) = False
empty (ALT p q) = empty p ∨ empty q
empty (SEQ p q) = empty p ∧ empty q
empty (REP r)   = True

No surprises here. How about final? (Goes on typing.)

final :: REG → Bool
final EPS       = False
final (SYM b _) = b
final (ALT p q) = final p ∨ final q
final (SEQ p q) = final p ∧ empty q ∨ final q
final (REP r)   = final r

CODY (pointing to the screen) The case for sequences is wrong. It looks similar to the definition in shift, but you mixed up the variables p and q. (Takes the keyboard and wants to change the definition.)
HAZEL No, stop! This is correct. final analyzes the regular expression in the other direction. A final character of the first part is also a final character of the whole sequence if the second part accepts the empty word. Of course, a final character in the second part is always a final character of the whole sequence, as well.
CODY Got it. Let's wrap all this up into an efficient function match for regular expression matching. (Continues typing.) The type of match is the same as the type of accept—our previously defined specification.

match :: REG → String → Bool

If the given word is empty, we can check whether the expression matches the empty word using empty:

match r [] = empty r

If the given word is a nonempty word c : cs, we mark all symbols of the given expression which may be responsible for matching the first character c by calling shift True r c. Then we subsequently shift the other characters using shift False.

match r (c : cs) = final (foldl (shift False) (shift True r c) cs)

THEO Why does the argument have to be False?
CODY Because, after having processed the first character, we only want to shift existing marks without adding new marks from the left. Finally, we check whether the expression contains a final character after processing the whole input word.
HAZEL That is a pretty concise implementation of regular expression matching! However, I'm not yet happy with the definition of shift and how it repeatedly calls the auxiliary functions empty and final, which traverse their argument in addition to the traversal by shift. Look at the rule for sequences again! (Points at the shift rule for sequences on the screen.)

shift m (SEQ p q) c = SEQ (shift m p c)
                          (shift (m ∧ empty p ∨ final p) q c)

There are three calls which traverse p, and one of them is a recursive call to shift. So, if p contains another sequence whose left part contains another sequence whose left part contains another sequence, and so on, this may lead to quadratic run time in the size of the regular expression. We should come up with implementations of empty and final with constant run time.
CODY We need to cache the results of empty and final in the inner nodes of regular expressions such that we don't need to recompute them over and over again. Then the run time of shift is linear in the size of the regular expression, and the run time of match is in O(mn) if m is the size of the regular expression and n is the length of the given word.
THEO That's interesting. The run time is independent of the number of transitions in the corresponding Glushkov automaton. The reason is that we use the structure of the regular expression to determine the next state and find it without considering all possible transitions.
HAZEL And the memory requirements are in O(m), because we discard old states while processing the input.
The three are excited. THEO sits down again to join CODY and HAZEL for generalizing the implementation previously developed: they use semirings again and implement the idea of caching the results of empty and final. The result is presented in Figure 4.
HAZEL First of all, we have to add fields to our regular expressions in which we can store the cached values for empty and final. This can be done elegantly by two alternating data types REGw and REw.
THEO Okay, but what are the curly brackets good for?
CODY This is Haskell's notation for specifying field labels. You can read this definition as a usual data type definition with a single constructor REGw taking three arguments. Furthermore, Haskell automatically defines functions emptyw, finalw, and regw to select the values of the corresponding fields.
THEO I understand. How can we go on then?
CODY We use smart constructors, which propagate the cached values automatically. When shifting marks, which are now weights, through the regular expression, these smart constructors will come in handy (epsw, symw, altw, seqw, and repw).
THEO And again the curly brackets are Haskell's syntax for constructing records?
HAZEL Right. Finally, we have to modify match and shift such that they use cached values and construct the resulting regular expression by means of the smart constructors. The rule for SYMw introduces a new final value and caches it as well.
THEO And here the curly brackets are used to update an existing record in only one field.—Two more weeks of Haskell programming with you guys, and I will be able to write beautiful Haskell programs.
Disbelief on HAZEL's and CODY's faces.—They all laugh.

data REGw c s = REGw { emptyw :: s, finalw :: s, regw :: REw c s }

data REw c s = EPSw
             | SYMw (c → s)
             | ALTw (REGw c s) (REGw c s)
             | SEQw (REGw c s) (REGw c s)
             | REPw (REGw c s)

epsw :: Semiring s ⇒ REGw c s
epsw = REGw {emptyw = one, finalw = zero, regw = EPSw}

symw :: Semiring s ⇒ (c → s) → REGw c s
symw f = REGw {emptyw = zero, finalw = zero, regw = SYMw f}

altw :: Semiring s ⇒ REGw c s → REGw c s → REGw c s
altw p q = REGw {emptyw = emptyw p ⊕ emptyw q,
                 finalw = finalw p ⊕ finalw q,
                 regw   = ALTw p q}

seqw :: Semiring s ⇒ REGw c s → REGw c s → REGw c s
seqw p q = REGw {emptyw = emptyw p ⊗ emptyw q,
                 finalw = finalw p ⊗ emptyw q ⊕ finalw q,
                 regw   = SEQw p q}

repw :: Semiring s ⇒ REGw c s → REGw c s
repw r = REGw {emptyw = one, finalw = finalw r, regw = REPw r}

matchw :: Semiring s ⇒ REGw c s → [c] → s
matchw r []       = emptyw r
matchw r (c : cs) =
  finalw (foldl (shiftw zero . regw) (shiftw one (regw r) c) cs)

shiftw :: Semiring s ⇒ s → REw c s → c → REGw c s
shiftw _ EPSw       _ = epsw
shiftw m (SYMw f)   c = (symw f) {finalw = m ⊗ f c}
shiftw m (ALTw p q) c = altw (shiftw m (regw p) c) (shiftw m (regw q) c)
shiftw m (SEQw p q) c = seqw (shiftw m (regw p) c)
                             (shiftw (m ⊗ emptyw p ⊕ finalw p) (regw q) c)
shiftw m (REPw r)   c = repw (shiftw (m ⊕ finalw r) (regw r) c)

Figure 4. Efficient matching of weighted regular expressions
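As a quick check of the interface in Figure 4, a session like the following is what one would expect (a hypothetical example, not from the play; the expression ab∗ is built directly from the smart constructors and run in the Bool semiring):

ghci> matchw (symw ('a'==) `seqw` repw (symw ('b'==))) "abb" :: Bool
True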
SCENE II. HEAVY WEIGHTS

HAZEL, CODY, and THEO sitting relaxed on office chairs, facing the audience, holding coffee mugs.
CODY I have a question. Until now we compute a weight for a whole word according to a regular expression. Usually, one is interested in matching a subword of a given word and finding out information about the part that matches.
HAZEL Like the position or the length of the matching part.
CODY Can we use our approach to match only parts of a word and compute information about the matching part?
THEO I think so. How about putting something like (a|b)∗ around the given regular expression? This matches anything before and after a part in the middle that matches the given expression.
HAZEL Yes, that should work. And we can probably zip the input list with a list of numbers in order to compute information about positions and lengths. Let's see. (Scratches her head with one hand and lays her chin into the other.)
After a few seconds, the three move with their office chairs to the desk again, open the file from before in an editor, and continue typing.

submatchw :: Semiring s ⇒ REGw (Int, c) s → [c] → s
submatchw r s = matchw (seqw arb (seqw r arb)) (zip [0 ..] s)
  where arb = repw (symw (λ_ → one))

CODY I see! arb is a regular expression that matches an arbitrary word and always yields the weight one. And r is a regular expression where the symbols can access information about the position of the matched symbol.
HAZEL Exactly!
THEO How can we create symbols that use the positional information?
HAZEL For example, by using a subclass of Semiring with an additional operation to compute an element from an index:

class Semiring s ⇒ Semiringi s where
  index :: Int → s

CODY We can use it to define a variant of the sym function:

symi :: Semiringi s ⇒ Char → REGw (Int, Char) s
symi c = symw weight
  where weight (pos, x) | x == c    = index pos
                        | otherwise = zero

THEO Ok, it yields zero if the character does not match, just like before, but uses the index function to compute a weight from the position of a character. And if we use symi we get regular expressions of type REGw (Int, Char) s, which we can pass to submatchw. Now, we need some instances of Semiringi that use this machinery!
HAZEL How about computing the starting position of a nonempty leftmost subword matching? We can use the following data types:

data Leftmost = NoLeft | Leftmost Start
data Start    = NoStart | Start Int

NoLeft is the zero of the semiring, i.e., it represents a failing match. Leftmost NoStart is the one and, thus, used for ignored characters:

instance Semiring Leftmost where
  zero = NoLeft
  one  = Leftmost NoStart

CODY Let me try to define addition. NoLeft is the identity for ⊕. So the only interesting case is when both arguments are constructed by Leftmost.

  NoLeft     ⊕ x          = x
  x          ⊕ NoLeft     = x
  Leftmost x ⊕ Leftmost y = Leftmost (leftmost x y)
    where leftmost NoStart   NoStart   = NoStart
          leftmost NoStart   (Start i) = Start i
          leftmost (Start i) NoStart   = Start i
          leftmost (Start i) (Start j) = Start (min i j)

The operation ⊕ is called on the results of matching a choice of two alternative regular expressions. Hence, the leftmost function picks the leftmost of two start positions by computing their minimum. NoStart is an identity for leftmost.
HAZEL Multiplication combines different results from matching the parts of a sequence of regular expressions. If one part fails, then the whole sequence does, and if both match, then the start position is the start position of the first part unless the first part is ignored:

  NoLeft     ⊗ _          = NoLeft
  _          ⊗ NoLeft     = NoLeft
  Leftmost x ⊗ Leftmost y = Leftmost (start x y)
    where start NoStart s = s
          start s       _ = s

THEO We need to make Leftmost an instance of Semiringi. I guess we just wrap the given position in the Start constructor.

instance Semiringi Leftmost where
  index = Leftmost . Start

HAZEL Right. Now, executing submatchw in the Leftmost semiring yields the start position of the leftmost match. We don't have to write a new algorithm but can use the one that we defined earlier and from which we know it is efficient. Pretty cool.
THEO Let me see if our program works. (Starts GHCi.) I'll try to find substrings that match against the regular expression a(a|b)∗a and check where they start.

ghci> let a = symi 'a'
ghci> let ab = repw (a `altw` symi 'b')
ghci> let aaba = a `seqw` ab `seqw` a
ghci> submatchw aaba "ab" :: Leftmost
NoLeft
ghci> submatchw aaba "aa" :: Leftmost
Leftmost (Start 0)
ghci> submatchw aaba "bababa" :: Leftmost
Leftmost (Start 1)

Ok. Good. In the first example, there is no matching and we get back NoLeft. In the second example, the whole string matches and we get Leftmost (Start 0). In the last example, there are three matching subwords—"ababa" starting at position 1, "aba" starting at position 1, and "aba" starting at position 3—and we get the leftmost start position.
CODY Can we extend this to compute also the length of leftmost longest matches?
HAZEL Sure, we use a pair of positions for the start and the end of the matched subword.

data LeftLong = NoLeftLong | LeftLong Range
data Range    = NoRange | Range Int Int

The Semiring instance for LeftLong is very similar to the one we defined for Leftmost. We have to change the definition of addition, namely, where we select from two possible matchings. In the new situation, we pick the longer leftmost match rather than only considering the start position. If the start positions are equal, we also compare the end positions:
First, they only sketch how to implement this.

  ...
  LeftLong x ⊕ LeftLong y = LeftLong (leftlong x y)
    where leftlong ...
          leftlong (Range i j) (Range k l)
            | i < k ∨ i == k ∧ j >= l = Range i j
            | otherwise               = Range k l

CODY And when combining two matches sequentially, we pick the start position of the first part and the end position of the second part. Pretty straightforward!

  ...
  LeftLong x ⊗ LeftLong y = LeftLong (range x y)
    where range ...
          range (Range i _) (Range _ j) = Range i j

THEO We also need to define the index function for the LeftLong type:

instance Semiringi LeftLong where
  index i = LeftLong (Range i i)

And again, we can use the same algorithm that we have used before.
The light fades, the three keep typing, the only light emerges from the screen. After a few seconds, the light goes on again, the sketched Semiring instance is on the screen.
HAZEL Let's try the examples from before, but let's now check for leftmost longest matching.

ghci> submatchw aaba "ab" :: LeftLong
NoLeftLong
ghci> submatchw aaba "aa" :: LeftLong
LeftLong (Range 0 1)
ghci> submatchw aaba "bababa" :: LeftLong
LeftLong (Range 1 5)

The three lean back in their office chairs, sip their coffee, and look satisfied.
SCENE III. EXPERIMENTS

CODY and HAZEL sit in front of the computer screen. It's dark by now, no daylight anymore.
CODY Before we call it a day, let's check how fast our algorithm is. We could compare it to the grep command and use the regular expressions we have discussed so far. (Opens a new terminal window and starts typing.)

bash> for i in `seq 1 10`; do echo -n a; done | \
...>   grep -cE "^(a?){10}a{10}$"
1

HAZEL What was that?
CODY That was a for loop printing ten a's in sequence, which were piped to the grep command to print the number of lines matching the regular expression (a|ε)¹⁰a¹⁰.
HAZEL Aha. Can we run some more examples?
CODY Sure. (Types in more commands.)

bash> for i in `seq 1 9`; do echo -n a; done | \
...>   grep -cE "^(a?){10}a{10}$"
0
bash> for i in `seq 1 20`; do echo -n a; done | \
...>   grep -cE "^(a?){10}a{10}$"
1
bash> for i in `seq 1 21`; do echo -n a; done | \
...>   grep -cE "^(a?){10}a{10}$"
0

HAZEL Ah. You were trying whether nine a's are accepted—they are not—and then checked 20 and 21 a's.
CODY Yes, it seems to work correctly. Let's try bigger numbers and use the time command to check how long it takes.

bash> time for i in `seq 1 500`; do echo -n a; done | \
...>   grep -cE "^(a?){500}a{500}$"

CODY and HAZEL stare at the screen, waiting for the call to finish. A couple of seconds later it does.

1

real    0m17.235s
user    0m17.094s
sys     0m0.059s
H AZEL That’s not too fast, is it? Let’s try our implementation. (Switches to GHCi and starts typing.) ghci> ghci> ghci> ghci> ghci> True (5.99
bash> for i in ‘seq 1 5000‘; do echo -n a; done |\ ...> ./re5000 +RTS -s match ... 3 MB total memory in use ... Total time 20.80s (21.19s elapsed) %GC time 83.4% (82.6% elapsed) ...
let a = symw (’a’==) let seqn n = foldr1 seqw . replicate n let re n = seqn n (altw a epsw ) ‘seqw ‘ seqn n a :set +s matchw (re 500) (replicate 500 ’a’) secs, 491976576 bytes)
H AZEL The memory requirements are quite good but in total it’s about five times slower than Google’s library in this example. C ODY Yes, but look at the GC line! More than 80% of the run time is spent during garbage collection. That’s certainly because we rebuild the marked regular expression in each step by shiftw . H AZEL This seems inherent to our algorithm. It’s written as a purely functional program and does not mutate one marked regular expression but computes new ones in each step. Unless we can somehow eliminate the data constructors of the regular expression, I don’t see how we can improve on this. C ODY A new mark of a marked expression is computed in a tricky way from multiple marks of the old expression. I don’t see how to eliminate the expression structure which guides the computation of the marks. H AZEL Ok, how about trying another example? The Google library is based on simulating an automaton just like our algorithm. Our second example, which checks whether there are two a’s with a specific distance, is a tough nut to crack for automata-based approaches, because the automaton is exponentially large.
C ODY Good. We’re faster than grep and we didn’t even compile! But it’s using a lot of memory. Let me see. (Writes a small program to match the standard input stream against the above expression and compiles it using GHC.) I’ll pass the -s option to the run-time system so we can see both run time and memory usage without using the time command. bash> for i in ‘seq 1 500‘; do echo -n a; done | \ ...> ./re500 +RTS -s match ... 1 MB total memory in use ... Total time 0.06s (0.21s elapsed) ...
Seems like we need a more serious competitor! H AZEL I told you about Google’s new library. They implemented an algorithm in C++ with similar worst case performance as our algorithm. Do you know any C++? C ODY Gosh! The light fades, the two keep typing, the only light emerges from the screen. After a few seconds, the light goes on again. H AZEL Now it compiles! C ODY Puuh. This took forever—one hour. H AZEL Let’s see whether it works. C ODY C++ isn’t Haskell. They both smile. H AZEL We wrote the program such that the whole string is matched, so we don’t need to provide the start and end markers ^ and $.
T HEO curiously enters the scene. C ODY Ok, can we generate an input string such that almost all states of this automaton are reached? Then, hopefully, caching strategies will not be successful. T HEO If we just generate a random string of a’s and b’s, then the probability that it matches quite early is fairly high. Note that the probability that it matches after n + 2 positions is one fourth. We need to generate a string that does not match at all and is sufficiently random to generate an exponential number of different states. If we want to avoid that there are two a’s with n characters in between, we can generate a random string and additionally keep track of the n + 1 previous characters. Whenever we are exactly n + 1 steps after an a, we generate a b. Otherwise, we randomly generate either an a or a b. Maybe, we should . . .
bash> time for i in ‘seq 1 500‘;do echo -n a;done |\ ...> ./re2 "(a?){500}a{500}" match real 0m0.092s user 0m0.076s sys 0m0.022s
T HEO’s voice fades out. Apparently, he immerses himself in some problem. C ODY and H AZEL stare at him for a few seconds, then turn to the laptop and write a program genrnd which produces random strings of a’s and b’s. They turn to T HEO.
Ah, that’s pretty fast, too. Let’s push it to the limit: bash> time for i in ‘seq 1 5000‘;do echo -n a;done |\ ...> ./re2 "(a?){5000}a{5000}" Error ... invalid repetition size: {5000}
C ODY Theo, we’re done! T HEO Ohhh, sorry! (Looks at the screen.) C ODY We can call genrnd with two parameters like this:
C ODY Google doesn’t want us to check this example. But wait. (Furrows his brow.) Let’s cheat:
bash> ./genrnd 5 6 bbbaaaaabbbbbbbabaabbbbbbbaabbbbbbabbaabbb
bash> time for i in ‘seq 1 5000‘;do echo -n a;done |\ ...> ./re2 "((a?){50}){100}(a{50}){100}" match real 0m4.919s user 0m4.505s sys 0m0.062s
The result is a string of a’s and b’s such that there are no two a’s with 5 characters in between. The total number of generated characters is the product of the incremented arguments, i.e., in this case (5 + 1) ∗ (6 + 1) = 42. T HEO Ok. So if we want to check our regular expression for n = 20 we need to use a string with length greater than 220 ≈ 106 . Let’s generate around 2 million characters. H AZEL Ok, let’s check out the Google program.
H AZEL Nice trick! Let’s try our program. Unfortunately, we have to recompile for n = 5000, because we cannot parse regular expressions from strings yet. They recompile their program and run it on 5000 a’s.
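The play never shows genrnd's source. A hypothetical reconstruction following Theo's recipe—remember the last n + 1 symbols and force a b exactly n + 1 steps after an a—might look like this (parameter names, the fixed seed, and the (n+1)·(m+1) length convention are our assumptions):

import System.Random (StdGen, mkStdGen, randomR)

-- Sketch, not the authors' code: a pseudo-random string of a's and b's
-- that never contains two a's with exactly n characters in between.
genrnd :: Int → Int → String
genrnd n m = go (mkStdGen 2010) (replicate (n + 1) 'b') ((n + 1) ∗ (m + 1))
  where
    go _ _ 0 = []
    go g window k
      -- exactly n+1 steps after an 'a': we must emit 'b'
      | head window == 'a' = 'b' : go g (tail window ++ "b") (k - 1)
      | otherwise =
          let (isA, g') = randomR (False, True) g
              c         = if isA then 'a' else 'b'
          in c : go g' (tail window ++ [c]) (k - 1)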
bash> time ./genrnd 20 100000 | ./re2 ".*a.{20}a.*"

While the program is running, CODY is looking at a process monitor and sees that Google's program uses around 5 MB of memory.

no match

real    0m4.430s
user    0m4.514s
sys     0m0.025s

Let's see whether we can beat this. First, we need to compile a corresponding program that uses our algorithm.
They write a Haskell program dist20 which matches the standard input stream against the regular expression .*a.{20}a.*. Then they run it.

bash> ./genrnd 20 100000 | ./dist20 +RTS -s
no match
...
2 MB total memory in use
...
Total time 3.10s (3.17s elapsed)
%GC time 5.8% (6.3% elapsed)
...

HAZEL Wow! This time we are faster than Google. And our program uses only little memory.
CODY Yes, and in this example, the time for garbage collection is only about 5%. I guess that's because the regular expression is much smaller now, so fewer constructors become garbage.
THEO This is quite pleasing. We have not invested any thoughts in efficiency—at least w.r.t. constant factors—but, still, our small Haskell program can compete with Google's library.
HAZEL What other libraries are there for regular expression matching? Obviously, we cannot use a library that performs backtracking, because it would run forever on our first benchmark. Also, we cannot use a library that constructs a complete automaton in advance, because it would eat all our memory in the second benchmark. What does the standard C library do?
CODY No idea.
Just as above, the light fades out, the screen being the only source of light. CODY and HAZEL keep working, THEO falls asleep on his chair. After a while, the sun rises. CODY and HAZEL look tired; they wake up THEO.
CODY (addressing THEO) We wrote a program that uses the standard C library regex for regular expression matching and checked it with the previous examples. It's interesting: the performance differs hugely on different computers. It seems that different operating systems come with different implementations of the regex library. On this laptop—an Intel MacBook running OS X 10.5—the regex library outperforms Google's library in the first benchmark and the Haskell program in the second benchmark, both by a factor between two and three, but not more. We tried it on some other systems, but the library was slower there. Also, when not using the option RE2::Latin1 in the re2 program, it runs in UTF-8 mode and is more than three times slower in the second benchmark.
THEO Aha.
ACT III

SCENE I. INFINITE REGULAR EXPRESSIONS

HAZEL sitting at her desk. THEO and CODY at the coffee machine, eating a sandwich.
CODY The benchmarks are quite encouraging and I like how elegant the implementation is.
THEO I like our work as well, although it is always difficult to work with practitioners. (Rolls his eyes.) It is a pity that the approach only works for regular languages.
CODY I think this is not true. Haskell is a lazy language. So I think there is no reason why we should not be able to work with nonregular languages.
THEO How is this possible? (Starts eating much faster.)
CODY Well, I think we could define an infinite regular expression for a given context-free language. There is no reason why our algorithm should evaluate unused parts of regular expressions. Hence, context-free languages should work as well.
THEO That's interesting. (Finishes his sandwich.) Let's go to Hazel and discuss it with her.
THEO jumps up and rushes to HAZEL. CODY is puzzled and follows, eating while walking.
THEO (addressing HAZEL) Cody told me that it would also be possible to match context-free languages with our Haskell program. Is that possible?
HAZEL It might be. Let's check how we could define a regular expression for any number of a's followed by the same number of b's ({aⁿbⁿ | n > 0}).
CODY Instead of using repetitions like in a∗b∗, we have to use recursion to define an infinite regular expression. Let's try.

ghci> let a = symw ('a'==)
ghci> let b = symw ('b'==)
ghci> let anbn = epsw `altw` seqw a (anbn `seqw` b)
ghci> matchw anbn ""

The program doesn't terminate.

^C

THEO It doesn't work. That's what I thought!
HAZEL You shouldn't be so pessimistic. Let's find out why the program evaluates the infinite regular expression.
CODY I think the problem is the computation of finalw. It traverses the whole regular expression while searching for marks it can propagate further on. Is this really necessary?
HAZEL You mean there are parts of the regular expression which do not contain any marks. Traversing these parts is often superfluous because nothing is changed anyway, but our algorithm currently evaluates the whole regular expression even if there are no marks.
CODY We could add a flag at the root of each subexpression indicating that the respective subexpression does not contain any mark at all. This could also improve the performance in the finite case, since subexpressions without marks can be shared instead of copied by the shiftw function.
THEO I'd prefer to use the term weights when talking about the semiring implementation. When you say marks you mean weights that are non-zero.
CODY Right. Let me change the implementation.
CODY leaves.

SCENE II. LAZINESS

THEO and HAZEL still at the desk. CODY returns.
CODY (smiling) Hi guys, it works. I had to make some modifications in the code, but it's still a slick program. You can check out the new version now.
HAZEL What did you do?
CODY First of all, I added a boolean field active to the data type REGw. This field should be False for a regular expression without non-zero weights. If a weight is shifted into a subexpression, the corresponding node is marked as active.

shiftw m (SYMw f) c =
  let fin = m ⊗ f c
  in (symw f) {active = fin /= zero, finalw = fin}

HAZEL So the new field is a flag that tells whether there are any non-zero weights in a marked regular expression. We need an extra flag because we cannot deduce this information from the values of emptyw and finalw alone. An expression might contain non-zero weights even if the value of finalw is zero.
CODY Right. The smart constructors propagate the flag as one would expect. Here is the modified definition of seqw; the other smart constructors need to be modified in a similar fashion:

seqw :: Semiring s ⇒ REGw c s → REGw c s → REGw c s
seqw p q = REGw {active = active p ∨ active q,
                 emptyw = emptyw p ⊗ emptyw q,
                 finalw = finala p ⊗ emptyw q ⊕ finala q,
                 regw   = SEQw p q}

HAZEL What is finala?
CODY It's an alternative to finalw that takes the active flag into account.

finala :: Semiring s ⇒ REGw c s → s
finala r = if active r then finalw r else zero

HAZEL How does this work?
CODY It blocks the recursive computation of finalw for inactive regular expressions. This works because of lazy evaluation: if the given expression is inactive, this means that it does not contain any non-zero weights. Thus, we know that the result of computing the value for finalw is zero. But instead of computing this zero recursively by traversing the descendants, we just set it to zero and ignore the descendants.
HAZEL This only works if the value of the active field can be accessed without traversing the whole expression. This is why we need special constructors for constructing an initial regular expression with all weights set to zero.
CODY Yes, for example, a constructor function for sequences without non-zero weights can be defined as follows:

seq p q = REGw {active = False,
                emptyw = emptyw p ⊗ emptyw q,
                finalw = zero,
                regw   = SEQw p q}

The difference to seqw is in the definition of the active and finalw fields, which are set to False and zero, respectively.
HAZEL Ok, I guess we also need new functions alt and rep for initial regular expressions where all weights are zero.
CODY Right. The last change is to prevent the shiftw function from traversing (and copying) inactive subexpressions. This can be easily implemented by introducing a wrapper function in the definition of shiftw:

shiftw :: (Eq s, Semiring s) ⇒ s → REGw c s → c → REGw c s
shiftw m r c
  | active r ∨ m /= zero = stepw m (regw r) c
  | otherwise            = r

where stepw is the old definition of shiftw with recursive calls to shiftw. The only change is the definition for the SYMw case, as I showed you before.²

² These modifications not only allow infinite regular expressions, they also affect the performance of the benchmarks discussed in Act II. The first benchmark runs in about 60% of the original run time. The run time of the second is roughly 20% worse. Memory usage does not change significantly.

THEO Ok, fine (tired of the practitioners' conversation). How about trying it out now?
CODY Ok, let's try aⁿbⁿ with the new implementation. We only have to use the variants of our smart constructors that create inactive regular expressions.

ghci> let a = symw ('a'==)
ghci> let b = symw ('b'==)
ghci> let anbn = epsw `alt` seq a (anbn `seq` b)
ghci> matchw anbn ""
True
ghci> matchw anbn "ab"
True
ghci> matchw anbn "aabb"
True
ghci> matchw anbn "aabbb"
False

THEO Impressive. So, what is the class of languages that we can match with this kind of infinite regular expressions?
HAZEL I guess it is possible to define an infinite regular expression for every context-free language. We only have to avoid left recursion.
CODY Right. Every recursive call has to be guarded by a symbol, just as with parser combinators.
THEO I see. Then it is enough if the grammar is in Greibach normal form, i.e., every right-hand side of a rule starts with a symbol.
CODY Exactly. But, in addition, regular operators are allowed as well, just as in extended Backus–Naur form. You can use stars and nested alternatives as well.
HAZEL Pretty cool. And I think we can recognize even more languages. Some context-sensitive languages should work as well.
THEO How should this be possible?
HAZEL By using the power of Haskell computations. It should be possible to construct infinite regular expressions in which each alternative is guarded by a symbol and remaining expressions can be computed by arbitrary Haskell functions. Let's try to specify an infinite regular expression for the language aⁿbⁿcⁿ (more precisely, {aⁿbⁿcⁿ | n > 0}), which—as we all know—is not context-free.
THEO A first approach would be something like the following. (Scribbles on the whiteboard.)

ε | abc | aabbcc | aaabbbccc | ...

CODY Unfortunately, there are infinitely many alternatives. If we generate them recursively, the recursive calls are not guarded by a symbol. But we can use distributivity of choice and sequence.
HAZEL Ah! Finally, we are using an interesting semiring law:

ε | a(bc | a(bbcc | a(bbbccc | a(...))))

Now every infinite alternative of a choice is guarded by the symbol a. Hence, our algorithm only traverses the corresponding subexpression if the input contains another a.
CODY So, let's see! We first define functions to generate a given number of b's and c's. (Typing into GHCi again.)

ghci> let bs n = replicate n (symw ('b'==))
ghci> let cs n = replicate n (symw ('c'==))

Then we use them to build our expression. (Continues typing.)
ghci> let bcs n = foldr1 seq (bs n ++ cs n)
ghci> let a = symw ('a'==)
ghci> let abc n = a `seq` alt (bcs n) (abc (n+1))
ghci> let anbncn = epsw `alt` abc 1

THEO Fairly complicated! Can you check it?
CODY enters some examples.

ghci> matchw anbncn ""
True
ghci> matchw anbncn "abc"
True
ghci> matchw anbncn "aabbcc"
True
ghci> matchw anbncn "aabbbcc"
False
THEO Great, it works. Impressive!

SCENE III. REVIEW

The three at the coffee machine.
HAZEL Good that you told us about Glushkov's construction.
THEO We've worked for quite a long time on the regular expression problem now, but did we get somewhere?
HAZEL Well, we have a cute program: elegant, efficient, concise, solving a relevant problem. What else do you want?
CODY What are we gonna do with it? Is it something people might be interested in?
THEO I find it interesting, but that doesn't count. Why don't we ask external reviewers? Isn't there a conference deadline coming up for nice programs (smiling)?
CODY and HAZEL (together) ICFP.
THEO ICFP?
CODY Yes, they collect functional pearls—elegant, instructive, and fun essays on functional programming.
THEO But how do we make our story a fun essay?
The three turn to the audience, bright smiles on their faces!

EPILOGUE

Regular expressions were introduced by Stephen C. Kleene in his 1956 paper [Kleene 1956], where he was interested in characterizing the behavior of McCulloch–Pitts nerve (neural) nets and finite automata; see also the seminal paper [Rabin and Scott 1959] by Michael O. Rabin and Dana Scott. Victor M. Glushkov's paper from 1960, [Glushkov 1960], is another early paper where regular expressions are translated into finite-state automata, but there are many more, such as the paper by Robert McNaughton and H. Yamada, [McNaughton and Yamada 1960]. Ken Thompson's paper from 1968 is the first to describe regular expression matching [Thompson 1968].
The idea of introducing weights into finite automata goes back to a paper by Marcel P. Schützenberger, [Schützenberger 1961]; weighted regular expressions came up later. A good reference for the weighted setting is the Handbook of Weighted Automata [Droste et al. 2009]; one of the papers that is concerned with several weighted automata constructions is [Allauzen and Mohri 2006]. The paper [Caron and Flouret 2003] is one of the papers that focuses on Glushkov's construction in the weighted setting. What we nowadays call Greibach normal form is defined in Sheila A. Greibach's 1965 paper [Greibach 1965].
Haskell is a lazy, purely functional programming language. A historical overview is presented in [Hudak et al. 2007]. There are several implementations of regular expressions in Haskell [Haskell Wiki]. Some of these are bindings to existing C libraries; others are implementations of common algorithms in Haskell. In comparison with these implementations our approach is much more concise and elegant, but can still compete with regard to efficiency.
The experiments were carried out using GHC version 6.10.4 with -O2 optimizations. The Google library can be found at http://code.google.com/p/re2/, the accompanying blog post at http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html.

REFERENCES

C. Allauzen and M. Mohri. A unified construction of the Glushkov, follow, and Antimirov automata. In R. Kralovic and P. Urzyczyn, editors, Mathematical Foundations of Computer Science 2006 (MFCS 2006), Stará Lesná, Slovakia, volume 4162 of Lecture Notes in Computer Science, pages 110–121. Springer, 2006.
P. Caron and M. Flouret. From Glushkov WFAs to rational expressions. In Z. Ésik and Z. Fülöp, editors, Developments in Language Theory, 7th International Conference (DLT 2003), Szeged, Hungary, volume 2710 of Lecture Notes in Computer Science, pages 183–193. Springer, 2003.
M. Droste, W. Kuich, and H. Vogler. Handbook of Weighted Automata. Springer, New York, 2009.
V. M. Glushkov. On a synthesis algorithm for abstract automata. Ukr. Matem. Zhurnal, 12(2):147–156, 1960.
S. A. Greibach. A new normal-form theorem for context-free phrase structure grammars. J. ACM, 12(1):42–52, 1965.
Haskell Wiki. Haskell – regular expressions. http://www.haskell.org/haskellwiki/Regular_expressions.
P. Hudak, J. Hughes, S. L. Peyton-Jones, and P. Wadler. A history of Haskell: being lazy with class. In Third ACM SIGPLAN History of Programming Languages Conference (HOPL-III), San Diego, California, pages 1–55. ACM, 2007.
S. Kleene. Representation of events in nerve nets and finite automata. In C. Shannon and J. McCarthy, editors, Automata Studies, pages 3–42. Princeton University Press, Princeton, N.J., 1956.
R. McNaughton and H. Yamada. Regular expressions and state graphs for automata. IEEE Transactions on Electronic Computers, 9(1):39–47, 1960.
M. O. Rabin and D. Scott. Finite automata and their decision problems. IBM Journal of Research and Development, 3(2):114–125, 1959.
M. P. Schützenberger. On the definition of a family of automata. Information and Control, 4(2–3):245–270, 1961.
K. Thompson. Programming techniques: Regular expression search algorithm. Commun. ACM, 11(6):419–422, 1968.
Experience Report: Haskell as a Reagent
Results and Observations on the Use of Haskell in a Python Project

Iustin Pop
Google Switzerland
[email protected]
Abstract

In system administration, the languages of choice for solving automation tasks are scripting languages, owing to their flexibility, extensive library support and quick development cycle. Functional programming is more likely to be found in software development teams and the academic world. This separation means that system administrators cannot always use the most effective tool for a given problem; in an ideal world, we should be able to mix and match different languages, based on the problem at hand. This experience report details our initial introduction and use of Haskell in a mature, medium-size project implemented in Python. We also analyse the interaction between the two languages, and show how Haskell has excelled at solving a particular type of real-world problem.

Categories and Subject Descriptors D.2.12 [Software engineering]: Interoperability; D.3.2 [Programming languages]: Language Classifications—Applicative (functional) languages

General Terms Experimentation, Languages

Keywords Haskell, Python, Ganeti, System administration

1. Introduction

For the past year, our team has developed¹ and started to use a set of tools implemented in Haskell to solve a specific category of problems that were not best expressed in an interpreted language. This required learning a new style of programming, becoming familiar with the tool-chain (compiler, profiler, documentation tools, etc.), investigating the available libraries and making sure that our new tools inter-operate well with the Python code. At the end, the question was: is the extra effort needed for maintaining code written in two languages justified? Do we get any advantage out of combining two high-level, but quite different, languages? As we try to show in this paper, in our experience the answer is affirmative, sometimes in non-obvious ways. Haskell's strong type system contrasts markedly with Python's "laissez-faire" approach to types, and its cheap-persistence model is the opposite of Python's cheap-modification one, while both are in the same high-level language category where complex data-modification pipelines are readily available. Such diametrically opposite views on some topics were very good at highlighting differences, and neither language was relegated to 'low-level' status beneath the other. We have observed gains from simply having two different languages exercise the same API/RPC endpoints; in our case, the effort spent to standardise the message types (useful for Haskell) led to a sounder framework on the Python side. Prototyping the same algorithm in both languages led to a better understanding of it, and in a few cases optimisations in one language could be carried over to the other. Not all was good, however; a few bumps appeared along the way, in the form of small issues with the availability of libraries, the performance of some operations, compatibility between different versions of the base libraries, and the higher difficulty of advanced programming techniques in a functional language.

¹ As described later in the paper, while the actual Haskell code has a single author, the entire team has participated in the design and testing of these tools, and therefore the paper uses the 'we' pronoun.

1.1 Our contribution

Past ICFP experience reports have focused either on conversion of software from an imperative language ([Newton and Ko 2009]) or on using functional programming for an entire project ([Sampson 2009]). Furthermore, reports on the use of functional programming refer in general either to use in research institutes and universities ([Cuoq et al. 2009] and [Balat et al. 2009]) or to commercial software development teams (e.g. [Sampson 2009]). We believe our use of Haskell in combination with Python, in an already existing, mature project and in the context of system administration, represents a different view on the use of functional programming.

2. The team and the project

Our team is part of Google's corporate IT system administration group, dealing with the administration of virtual machines. While most of us have strong Python development skills, and everyone is familiar with other system administration languages and tools, we are not, per se, a software development team. Rather, the development activities are 'demand-based' and geared towards the automation of system administration. As part of our work, we have developed Ganeti (http://code.google.com/p/ganeti/), a management tool for clusters of virtual machines (e.g. Xen, KVM). Our team has been working on the project since 2006, open-sourced it in September 2007, and it was (before the introduction of the Haskell component) written almost entirely in Python, with just some small bits of shell and other languages, mostly for the build system. The objective of Ganeti is to enable easy management of clusters created from off-the-shelf hardware, without requiring custom or expensive storage or network gear.
As such, we use a number of other open-source components for managing the physical resources. For storage management we use DRBD², a software solution for over-the-network RAID1 storage. Using RAID1 (mirroring) means that each virtual disk resides on two physical machines, called the primary and the secondary machine respectively. A virtual machine can cheaply switch between these two machines (if the other required resources, e.g. memory, are sufficient), an operation called failover. Switching a virtual disk from the machine pair (A, B) to (A, C), an operation called relocation, requires copying the entire disk data from A to C and is thus costly. This dependency of each virtual machine on two physical machines means that the placement algorithm is not as straightforward as in solutions using external storage, where any physical machine can access the entirety of all storage; therefore, we had to develop tools that automate the layout computation in order to make the best use of the resources of each physical machine.

² http://www.drbd.org/, a networked RAID1 driver usable either in active-passive or active-active mode

2.1 Layout policies

We anticipated somewhat early in the development of Ganeti that the actual policies for the layout of virtual machines across (pairs of) physical machines might differ based on site policies, and thus decided that the actual policy should be left to external scripts, while Ganeti itself should just implement the mechanism. Thus, we have a documented API (called the IAllocator API) for questions like: given cluster state X, on what pair of machines should a new virtual machine with specifications Y be placed? Note that this works the same way for non-mirrored storage, where we simply allocate on a single physical machine (in a non-redundant setup). The recursive application of this problem is: how many virtual machines can we allocate on a cluster before we violate site policies or run out of physical resources? The allocation problem is also present in a slightly different version: if a physical machine needs to be removed from the cluster (e.g. due to hardware failure, or for any other reason), the virtual machines which live on this particular machine need to be relocated to other members of the cluster. The question is then expressed as: given cluster state X, we want to move a virtual machine from physical machine pair (A, B) to (A, x); what is the best choice for x? We use the same API as above. A third related problem is computing the 'optimal' layout of the current cluster. This means responding to the question: given current cluster state X, with physical machine list N and virtual machine list I, how should we relocate the virtual machines for a better layout? This question is not encoded in the IAllocator API, but we can both extract the cluster state and instruct changes in the layout via other Ganeti APIs. A final note is that the layout of virtual machines across the physical machine pairs has both optimisation aspects (e.g. even load distribution) and hard constraints. One important such constraint is that if a physical machine fails, all its peers must have enough free memory to fail over and run the now offline instances. In other words: for a given physical machine A and its hosted virtual machines I_i, each living on the machine pair (A, x_i), does each machine x_i have enough free memory to accommodate I_i? We call this N+1 redundancy, and our verification routines flag any physical machine that fails this check.
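To make the N+1 constraint concrete, here is a minimal sketch in Haskell, using our own simplified types rather than the actual ganeti-htools data structures: a machine A passes the check if every peer can absorb the combined memory of all of A's virtual machines that name that peer as their secondary.

import qualified Data.Map as M

type Node = String

-- For each of A's virtual machines: (secondary node, memory needed).
-- Sum the memory per peer, then check each peer's free memory.
nPlusOneOk :: M.Map Node Int -> [(Node, Int)] -> Bool
nPlusOneOk freeMem vms =
  and [ maybe False (>= need) (M.lookup peer freeMem)
      | (peer, need) <- M.toList (M.fromListWith (+) vms) ]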
2.2 Open APIs

Another key decision early in the history of Ganeti was that, as much as possible, our APIs and data formats should be language-agnostic. Thus, we moved (for the inter-node RPC) from the original Twisted³-based RPC and serialisation format to HTTP and JSON, and for data storage from Python's Pickle format to (again) JSON⁴. While this decision was not made with any specific purpose in mind, it did help during the development of Ganeti, as such data formats are more stable and very easy to use and modify, even from the shell. Later, it was one of the key factors that allowed the use of Haskell.

³ http://twistedmatrix.com/trac/, "an event-driven networking engine written in Python"
⁴ The Protocol Buffers data encoding format, which is used extensively inside Google, was not open-sourced at the time we did this conversion; hence the choice of JSON.

2.3 Introduction of Haskell

The introduction of Haskell in the project was somewhat accidental. The capabilities of Ganeti itself were growing and it was able to manage bigger and bigger clusters, but the layout algorithms were still very weak. Thus, at the end of 2008 the author of this paper started working towards solving an independent (at that time) problem: the automated computation of the changes needed in cluster state to solve N+1 check failures. The initial version of the algorithm attempted a brute-force search over a limited subset of the solution space, and thus the Python implementation was very slow. It was also unwieldy, since throw-away copies of the data are expensive in Python and undoing modifications in a generic and safe way is not simple. The next step was intended to be both a learning experience and a language comparison exercise: how would such an algorithm (the simulation of cluster state changes when virtual machines are being relocated) look in a functional language? People familiar with functional programming might recognise such an algorithm as a good fit at once; for the author it took a while until he was convinced that such modelling is indeed easier to achieve in a purely functional way, using persistent data structures.

2.3.1 Time-line

N+1 solver. Initial forays into the Haskell implementation of the above algorithm proved successful, and after a couple of weeks we had a tool to automatically compute the solution list (a set of "move virtual machine I from (A, B) to (C, D)" commands) needed to solve the N+1 failures in most of the usual cases. The algorithm itself was still rough, and due to its brute-force nature it was very limited in capabilities. One of the biggest limitations was that it could not tell whether it would ever manage to find a solution, and so it tried to explore the whole solution space (which is too big to compute); this prevented its use in a fully-automated way.

Cluster balancer. The N+1 solver already provided the infrastructure needed for importing data from Ganeti, so writing a new algorithm was a much smaller effort. Thus, the next goal was another missing piece of the Ganeti infrastructure: a generic cluster balancer that takes the state of the cluster and computes "the next best" state. The new algorithm is no longer brute-force but iterative: at each step it looks at the current cluster state and computes the next cluster state. This is done without look-ahead and without keeping history, so its time and space characteristics were very good, and we soon started testing this new tool in production. At this point, the team made a decision: do we keep developing the Haskell implementation of the algorithm, or do we rewrite the algorithm (which was reasonably simple) in Python? The reasons for staying with Haskell were twofold.
First, all the problems we attack are basically numerical algorithms, and thus they model very nicely in the pure domain; Haskell was here at its best, and the performance of the program was very good. Second, the entire Haskell code-base was trivial at this point (roughly 1500 lines of code, including comments), so the cost of an eventual rewrite into Python (if ever needed, e.g. to standardise on a single language) was deemed low enough not to be an impediment. A third, non-obvious reason for the initial acceptance of these tools was that they came at the right time and filled a very big gap in our project. The value they brought was high enough that it helped overcome the barrier of introducing a new language.

2.3.2 Expansion of the code-base

In the months following the initial acceptance, the project entered a phase of significant expansion; after a few weeks of use, the question changed from "should we use this to automate cluster balancing?" to "can we add a new rule/constraint to the algorithm?". From the team's perspective, once the initial barrier of acceptance was overcome and the tools were stable, there was no reason to hold back their use. From the development point of view, once the initial I/O framework was in place and the core algorithm implemented, it was easy to iterate on the code base and extend it with new features. Furthermore, after we had some experience with the cluster balancing tool, we realised that the algorithm we developed could be used for all our allocation/layout problems: placement of a new virtual machine, finding the most efficient layout, or computing the maximal cluster capacity. At this point, we were comfortable enough with the stability of the code-base to delegate such decisions to it, and continued to iterate on the capabilities of the tools.

2.3.3 Integration with Ganeti APIs

Initially, the Haskell tools interacted with Ganeti via the command line interface, which worked but was suboptimal. As described in section 2.2, the APIs provided by Ganeti use standard protocols and data formats, so in time it was rather trivial to extend the Haskell tools to talk to Ganeti directly. Fortunately, the json, curl and network libraries in Haskell are stable and have all the features needed for our use, so from this point of view we have observed no limitations with regard to library support. The use of the APIs from Haskell led to interesting discoveries about the consistency of our Python RPCs, described in section 3.4.
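The shape of that integration is easy to sketch. The fragment below is illustrative only (not the actual htools code), showing the kind of decoding step involved when using the json package's Text.JSON interface mentioned above:

import Text.JSON (JSValue, Result(..), decode)

-- Parse a JSON response from a Ganeti endpoint into a generic JSON
-- value, turning parse failures into Either String (cf. section 3.1).
parseResponse :: String -> Either String JSValue
parseResponse body =
  case decode body of
    Ok v    -> Right v
    Error e -> Left ("invalid JSON from Ganeti: " ++ e)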
2.4 Results and current status

By the summer of 2009, the tools were stable enough that we also released them⁵ as open source under the name "ganeti-htools". Work on them continued, and as of February 2010 we have the following capabilities:

• Local and remote gathering of data from Ganeti clusters
• Direct job execution for the local transport or, in the case of remote transport, creation of a shell script with the needed commands
• Sequencing of jobs, customised such that we get the maximum parallelism when executing them in Ganeti

The software package consists of the following tools:

hbal – computes the moves needed to improve the cluster layout
hail – used as an IAllocator script for Ganeti, for new virtual machine placement, virtual machine moves and physical machine evacuations
hspace – computes the available cluster capacity

⁵ see the release announcement at http://groups.google.com/group/ganeti/msg/8a9fef84ff138071

All the tools work based on the same core algorithm:

1. The cluster state is analysed and we compute the current numerical score based on both hard constraints and optimisation scores:
   (a) hard constraints represent extremely undesirable cases that are flagged as errors by Ganeti itself; they degrade the cluster score heavily
   (b) optimisation scores are obtained by computing the standard deviation of normalised metrics (e.g. the percentage of free memory on each physical machine is expressed as a value in the range [0, 1], and the standard deviation of this vector is used as the 'free memory score' metric)
   (c) at the end, all metrics are summed, resulting in the final cluster weight
2. We then iterate over all the possible virtual machines and their moves (in balancing), or over all the possible ways to allocate a new virtual machine, and choose the best state (according to the new score).

Initially we had only two metrics (percentage of memory free, percentage of disk free); the current version has many more:

• percentage of free memory, free disk and memory reserved for redundancy
• ratio of virtual-to-physical CPUs
• experimental metrics for load-based balancing (CPU load, memory load, disk bandwidth, network bandwidth)
• offline physical machines still hosting virtual machines
• virtual machine exclusion via tags (for example, preventing two virtual machines used as DNS servers from being hosted on the same physical machine)
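The scoring idea above can be sketched compactly in Haskell (our own illustration with made-up names, not the actual htools code): each metric is a normalised per-machine vector, its score is the vector's standard deviation, and the cluster score sums over all metrics.

-- Standard deviation of a normalised metric vector (one entry per
-- physical machine, each in [0, 1]).
stdDev :: [Double] -> Double
stdDev xs = sqrt (sum [ (x - mean) ^ (2 :: Int) | x <- xs ] / n)
  where n    = fromIntegral (length xs)
        mean = sum xs / n

-- The cluster score sums the per-metric scores; lower is better, and
-- a well-balanced cluster has low deviation on every metric.
clusterScore :: [[Double]] -> Double
clusterScore = sum . map stdDev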
The algorithm is known to be imperfect (e.g. since it does not look ahead, it can get into situations from which it cannot execute any more moves), but in practice it works well enough that all layout decisions can be made with it, with manual intervention being very rare. After the initial implementation and production deployment, development proceeded at a somewhat slower pace, but nonetheless we continued to improve the basic algorithm, implement new features, and keep up to date with Ganeti changes.
3. Haskell/Python interaction

In the following sections we detail our experiences in combining Haskell and Python, and some changes made to the (already mature) Python code base as a result of gaining experience with Haskell. It is important to note that we don't claim that either Haskell or Python absolutely enforces a certain programming paradigm; it's just that each language has certain characteristics that make it easier and more natural to program in a certain way.

3.1 Either String in Python

In languages which have native support for exceptions, one can usually find many libraries that offer over-the-wire transport of exceptions; for Python, both Twisted and Pyro⁶ offer transport of Python exceptions from the server to the client (with just a few restrictions).

⁶ http://pyro.sourceforge.net/, "an advanced and powerful Distributed Object Technology system"
However, there are few, if any, RPC libraries that are both lightweight and language-independent and still offer this. When moving from Twisted to our HTTP-based RPC in Ganeti, we saw this as a regression in functionality, and we planned to solve it at a later time. In the meantime, we started to modify some RPC calls to return a tuple (Boolean, Payload), with the first member representing success or failure and the second being either a string (in case of failure) or the actual payload. Since this seemed to be a temporary 'hack' until we got a real exception propagation framework in place, we only implemented it for a few RPC calls, in an ad-hoc manner. After 'discovering' the Either String data type in Haskell, we realised that this is exactly what we were using in Python. Far from being a hack, it is a simple and elegant way to transport classless exceptions. We proceeded to rework our RPC framework to have this as basic functionality instead of implementing it in the individual RPC calls, and as a result we gained much better error reporting across the entire inter-node communication. It is unfortunate, though, that one has to code algebraic data types by hand in imperative languages; for many types of problems they are the most natural way to express values.
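In Haskell terms the correspondence is direct; the sketch below (our own naming, not Ganeti's actual wire format) shows the wire-level pair mapping onto Either String:

-- The (success, payload) convention from our RPC calls is exactly
-- Either String: Left carries the error message, Right the payload.
decodeRpcResult :: (Bool, String) -> Either String String
decodeRpcResult (True,  payload) = Right payload
decodeRpcResult (False, err)     = Left err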
3.2 Persistent data types

Python has very weak support for persistent data types; it is neither possible to mark a complex data structure read-only (in a generic way) nor to make cheap copies. The only native facility for data copies is in the copy module, but the speed of a so-called deepcopy is slower than a manually-coded attribute-by-attribute copy by a big margin (our tests show factors between three and five). These two reasons tend to drive the architecture towards careful in-place updates, even when this is a suboptimal solution. After our experience with Haskell, and having understood how much safer copy-and-modify is compared to in-place modification-and-undo, we reused our serialisation framework for cheap data-copy functionality. This was facilitated by the fact that said framework is based on a two-stage process: first converting custom objects to 'standard' Python types, which are then serialised via JSON. By performing just the initial custom-object-to-standard-object conversion and then its reverse, we obtained a simple, albeit slow, data copying method. This increased the safety of our code in a number of places, especially as these data types are used in a multi-threaded environment. We envision that careful use of this method can lead to a semi-pure style of programming in Python. Of course, the best solution would be the ability to 'freeze' arbitrary objects in the Python standard library.
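The copy-and-modify style that motivated this change is the default in Haskell, where updating a persistent structure yields a new version while the old one remains valid and shares most of its structure. A small illustration (ours, with made-up data):

import qualified Data.Map as M

-- 'before' is still intact after the update: both versions can be
-- used, e.g. to compare cluster states before and after a move.
example :: (M.Map String Int, M.Map String Int)
example = (before, after)
  where before = M.fromList [("node1", 4096), ("node2", 8192)]
        after  = M.insert "node1" 2048 before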
3.3 Functional programming features in Python

The Python language has adopted a few functional programming features (sometimes directly from Haskell). How do these compare with their native counterparts? Probably the first such feature that Python programmers get accustomed to is the lambda expression. It has, however, a big weakness: Python differentiates between statements and expressions, and since lambdas can only contain expressions, they are not as powerful as regular functions. As such, their usage is very limited in Python, and their future in the language is under question⁷. Fortunately, using functions (defined either normally or as lambda expressions) as ordinary values is possible without any restrictions, and this is useful enough that (for example) we use it extensively in our Python code-base. List comprehensions, too, are similar in both languages and can be used to good effect in Python.

⁷ The original Python 3000 standard proposed to drop them, but this decision was reversed during the development cycle.

Recently, Python has gained partial function application; however, this was implemented as a library, not at the syntax level. Thus, actually using this feature is less direct; compare a very trivial example in Haskell:

let fn = map length

with the Python version:

fn = partial(map, len)

The latter is more cumbersome to use, as it breaks the normal code flow (it is not instantly clear that len will be an argument of map; one has to mentally parse the partial function call first). Due to this, and to the fact that it is a recent addition to the Python libraries, we have not yet started using partial in our Python code. Another pair of functions present in both languages, and which we use, is any and all. While the original Python implementation (as a library) was similar to the Haskell one, the recent conversion to built-in functions dropped the predicate argument. This had two effects: for lists of Booleans, their use is simpler, but for lists where we still need to apply a predicate, the syntax became more complicated:

result = any(pred(i) for i in lst)

This change is not a big impediment to their use, but it results in more verbose code. To summarise, the functional programming tools present in Python are of mixed quality, which makes their consistent use hard. It seems that the process of adopting them from other languages was not perfect: in some cases they are harder to use, and often they look like additions rather than integral elements of the language.

3.4 Ganeti API consistency

Ganeti has roughly four sets of APIs that interest us:

1. the command line interface; while mostly used by people, this was designed to be scriptable, and many tools use this simple method
2. a local UNIX socket with JSON-encoded messages; this is the simplest (and fastest) method, since it relies on local Unix socket permissions for security and thus doesn't need to do any data encryption
3. an HTTP-based, REST-style API, used for remote querying and administration (called RAPI)
4. the IAllocator API, a plugin-based framework for allocation and layout policies; this is an internal API, used between Ganeti and plugin scripts

The first three API sets are used for querying and changing the cluster state, while the last is used for feeding information back into Ganeti, as described in section 2.1. Except for the first API, which is plain text, all use JSON-encoded messages, which translate into native data types in most languages. This should make it straightforward to introduce a generic tool that talks either locally or remotely to Ganeti. However, when trying to integrate our Haskell program with Ganeti, we quickly found out that:

• not all APIs exported the same data; the data available via each API was only a subset of the data available in Ganeti
• even for the same data, the actual naming of various properties differed across the APIs

While any program (independent of the language used) would have discovered these inconsistencies, static vs. dynamic typing makes a difference here.
In Python, it is very easy to adjust data structures on the fly, since we don't have a strong type system; changing a certain variable from Boolean to Integer will work most of the time without any changes. As such, it is easier to simply adapt, with code branches that deal with different versions of a data format, than to adjust an entire ecosystem to a format change. In contrast, our experience with Haskell shows it is best if differences are kept entirely at the representation layer (e.g. it is acceptable to have a different name for a value in the JSON message) but not in the actual data format, since differing formats mean introducing algebraic data types, which complicate (via excessive use of pattern matching) many of the functions manipulating that particular piece of data. The effort needed to unify the APIs was small, since the differences were somewhat minor; this allowed a streamlined Haskell implementation, with multiple backends (each talking to a specific Ganeti API) that return the same data structures; the actual algorithm is then oblivious to whether an HTTP query or a Unix socket connection was used to talk to Ganeti. On the Python side, this resulted in a saner API overall, which should benefit all its consumers.

3.5 A test version of the "master daemon" in Haskell

As described before, our Haskell project is a set of small tools external to the main Python code-base. The Ganeti software architecture is moderately complex: on all physical machines, a so-called "node daemon" runs; one of these machines is designated as the current "master node" and runs two more daemons: the "master daemon", which is actually responsible for job execution and for coordinating the node daemons, and the "remote API daemon", which offers the RAPI endpoint described in the previous section. Commands can be submitted either remotely via RAPI, or locally on the master node via a set of Python scripts that talk directly to the master daemon. In this architecture, the node daemons do purely I/O-related work, talking to LVM, DRBD, and the hypervisor in use. The master daemon manages the cluster configuration (and its replication) and coordinates the interactions between the node daemons. This means a significant part of the master daemon is dedicated to abstract data handling:

1. accepting incoming jobs from clients
2. managing the job queue, replicating the jobs to other nodes and archiving them
3. executing the jobs, including managing the locking and synchronisation between the different worker threads

As an exercise, we tried to see whether it is feasible to implement a small subset of the master daemon in Haskell, how a Haskell program that deals heavily with I/O compares to the original Python code, and whether the advantages seen for the original tools carry over to other parts of the code base. This effort had mixed success. At a basic level, we were able to implement the basic functionality (accepting jobs, local queue management) and the most basic job type (the null job), but the continuous work in the I/O domain changed the style of programming significantly: the result looks rather more like our original Python version than expected (this might also be due to our limited experience with Haskell). We developed this Haskell version of the master daemon until we were able to actually run the Python command line scripts against a fake cluster, execute job management commands, etc., at which point we declared the attempt complete. During the exercise, the use of the GHC profiler revealed some deficiencies in the algorithm we used for job submission and dispatching to the worker threads; since this had been copied verbatim from the Python version, we were able to devise improvements that were then back-ported to the original master daemon implementation⁸. It would have been possible to detect such inefficiencies in the Python version directly, but the profiler support for multi-threaded Python code is of very low quality compared with the GHC profiler; in this way, we used the GHC tool-chain indirectly to improve the Python code. While benchmarking the two implementations, a surprising result was that, due to the deficiencies of the standard String data type in Haskell, serialisation/de-serialisation of JSON messages is actually slower in Haskell than in Python (which uses a C-based module for speed). In retrospect, this exercise proved that it would be feasible to write more parts of our project in Haskell. Even though the heavy I/O emphasis in some areas does not entirely match Haskell's strengths, there are other advantages (like the combination of read-only data structures and very "cheap" sparks) that would make a Haskell implementation attractive.

⁸ an example is commit 009e73d in Ganeti, Optimise multi-job submit

4. Roadblocks

While our experiments with Haskell were successful and resulted in production-level code, we hit a few small roadblocks along the way.

4.1 Debugging facilities

Like many other Haskell programmers, we had to debug the famous error message "Exception: Prelude.head: empty list", followed by an abrupt exit from our program. In standard Python code, this message would have been accompanied by a stack trace, including the code fragment that generated the exception. In Haskell, however, the available options are very primitive: either use Debug.Trace (a low-level facility) or change the pure code flow (via the use of a Writer monad). Combined with the effects of laziness, this makes debugging a complex application much more difficult for the beginner Haskell programmer than in a scripting language. Another problem, not particular to Haskell but common to compiled languages in general, is the inability to quickly debug problems on the deployment systems; instead, the problem must be replicated on the development machines. By itself this wouldn't be an issue, but it exacerbates the debugging difficulties. In due time we have learned ways to compensate for both these problems, but we still find the debugging tools a bit weak in Haskell (compared, for example, to the excellent profiler); the author is looking forward to developments in this area, e.g. [Allwood et al. 2009].
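For readers unfamiliar with it, Debug.Trace is the low-level facility mentioned above; a minimal illustration (our own, unrelated to the Ganeti code):

import Debug.Trace (trace)

-- trace emits its message (via unsafe I/O) when the expression is
-- forced, which under laziness may happen later than expected, or
-- not at all.
bestNode :: [String] -> String
bestNode nodes = trace ("candidates: " ++ show nodes) (head nodes)

Calling bestNode [] reproduces exactly the "Prelude.head: empty list" failure discussed above, with no indication of which caller supplied the empty list.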
4.2 Status of library packaging in Linux distributions

While Hackage provides a plethora of libraries, not all of them are available in Linux distributions, or at least not in their current stable versions. We are mainly interested in Debian, since it is our test/reference platform. As an end user, thanks to static linking, it is very easy to deploy Haskell software: the needed libraries must be available only on a development machine, while the deployment targets do not need them. But packaging software for Debian is different, as it requires that all prerequisites be available in Debian itself, for hermetic/reproducible builds across all supported architectures. We use just three libraries: json, curl and network; at the beginning of 2009, neither json nor curl was available at all in Debian, and this delayed our packaging efforts. Today, all are available in the unstable track, which has allowed us to package ganeti-htools for it, but it is still hard to back-port our software to the current stable version.
The introduction of the Haskell Platform and the advances in the packaging work of the debian-haskell group, however, are making this an obsolete issue; both developers and end-users should be able to consider the state of Haskell integration to be on the same level as Python's in future Debian versions.

4.3 Rate of change and maintenance effort

One surprising finding during development was that some basic functionality, in our case related to error handling (the use of the Control.Exception module), changed so much between GHC 6.8 and 6.10 that it is hard to write code that works with both versions, unless one starts to use conditional compilation (via CPP defines, which has its own problems) or switches to another library for error handling (e.g. the extensible-exceptions package). Fortunately, we were mostly interested in I/O errors, so we were able to revert to the simpler error handling in the Prelude⁹. But such changes were surprising to us, and we did pause to consider how much effort will be needed in the future to keep backwards compatibility with currently deployed systems while still allowing use on unstable/development platforms.

⁹ commit 1cf9747 in htools, Change ExtLoader to only handle I/O errors

4.4 String data type speed issues

As described in section 3.5, even in our limited experience we have observed the slow speed of the standard String data type. The author's initial reaction was simply to look for other libraries to solve this problem; but even though alternatives exist, the String type still holds a privileged position as the default text type, and thus many libraries use it by default. As such, the effort is put on prospective developers to decide "can I use library X, or do I need to find an X-bytestring implementation to get reasonable speed?". This situation is unexpected in a language and an implementation (GHC) that seem in many other ways quite mature, and the author believes that it adds a non-trivial effort on both sides of the community: on library authors, who need to provide bytestring-enabled versions, and on the users of libraries, who need to make sure their mix of libraries works as expected and gives reasonable performance.
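The difference the section alludes to comes from representation: String is a linked list of Char, one cons cell per character, while ByteString is a packed byte array. A tiny illustration of the alternative (ours, not from the htools code):

import qualified Data.ByteString.Char8 as B

-- Traversal and serialisation over a packed array are far cheaper
-- than walking a cons-cell list of the same text.
countLines :: B.ByteString -> Int
countLines = length . B.lines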
4.5 High barrier to entry

Lastly, we believe that the most significant problem is the high barrier to entry. Even after the completion of this project, the author feels that his knowledge of Haskell is very much incomplete, and that he is far from familiar with advanced topics (e.g. applicative programming, generic programming, etc.). Whether this is needed for small projects is debatable (our current code uses only standard Haskell 98, with no language extensions), but it might be that careful use of advanced programming techniques would reduce and simplify the code. The second remark on this topic concerns the difficulty of co-opting other people to contribute: except for a few trivial patches, the Haskell component of our project remains a one-person effort, whereas the Python code has had around three to five active contributors (depending on project phase).

5. Summary

After using Haskell for slightly more than a year, our conclusion is that even though the adoption barrier is quite high, Haskell is a good asset for solving certain types of problems. The combination with Python has shown itself to be a success, and we have managed to write a set of tools that are used daily for solving real-world problems. For the foreseeable future, our project will remain a dual-language one; rewriting the Haskell part in Python is doable, but we would lose the advantages described in the paper for this particular kind of problem (numerical algorithms). As to the opposite option, rewriting the Python part in Haskell, there are a few reasons why this is not feasible. First, the Python code-base is significant (around 30K lines of code), and rewriting a project of this size would be a huge effort that is hard to justify. Second, it is unknown whether our team would be able to successfully reimplement Ganeti in Haskell, given our limited experience with the language. Were we to start the project from scratch today, it would be a different proposition: both Python and Haskell have their advantages, and choosing the right language would be a hard decision. Nevertheless, after using Haskell in real life to solve actual production problems, we believe that, both at the language level and at the implementation level (GHC), it is a viable choice for projects of similar complexity in the domain of system administration.

Acknowledgments

Many thanks to the Ganeti team at Google for their support during my initial experiments with Haskell, especially to Guido Trotter and Michael Hanselmann. Also, the subject of this paper would not have existed without Haskell itself and the many resources created by the community that enabled me to learn, write and deploy Haskell.

References

T. O. Allwood, S. Peyton Jones, and S. Eisenbach. Finding the needle: stack traces for GHC. In Haskell '09: Proceedings of the 2nd ACM SIGPLAN Symposium on Haskell, pages 129-140, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-508-6.
V. Balat, J. Vouillon, and B. Yakobowski. Experience report: Ocsigen, a web programming framework. In ICFP '09: Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming, pages 311-316, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-332-7.
P. Cuoq, J. Signoles, P. Baudin, R. Bonichon, G. Canet, L. Correnson, B. Monate, V. Prevosto, and A. Puccetti. Experience report: OCaml for an industrial-strength static analysis framework. In ICFP '09: Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming, pages 281-286, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-332-7.
R. R. Newton and T. Ko. Experience report: embedded, parallel computer vision with a functional DSL. In ICFP '09: Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming, pages 59-64, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-332-7.
C. J. Sampson. Experience report: Haskell in the 'real world': writing a commercial application in a lazy functional language. In ICFP '09: Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming, pages 185-190, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-332-7.
Instance Chains: Type Class Programming Without Overlapping Instances

J. Garrett Morris    Mark P. Jones
Portland State University
{jgmorris,mpj}@cs.pdx.edu
Abstract

Type classes have found a wide variety of uses in Haskell programs, from simple overloading of operators (such as equality or ordering) to complex invariants used to implement type-safe heterogeneous lists or limited subtyping. Unfortunately, many of the richer uses of type classes require extensions to the class system that have been incompletely described in the research literature and are not universally accepted within the Haskell community. This paper describes a new type class system, implemented in a prototype tool called ilab, that simplifies and enhances Haskell-style type-class programming. In ilab, we replace overlapping instances with a new feature, instance chains, allowing explicit alternation and failure in instance declarations. We describe a technique for ascribing semantics to type class systems, relating classes, instances, and class constraints (such as kind signatures or functional dependencies) directly to a set-theoretic model of relations on types. Finally, we give a semantics for ilab and describe its implementation.

Categories and Subject Descriptors D.3.2 [Programming Languages]: Language Classifications—Applicative (functional) languages; F.3.3 [Logics and Meanings of Programs]: Studies of Program Constructs—Type structure

General Terms Design, Languages

Keywords Qualified types, Type classes, Overlapping instances, Functional dependencies, Haskell

1. Introduction

Type classes are a widely used, studied, and extended feature of the Haskell programming language. Some extensions, such as multi-parameter type classes, functional dependencies, and type functions, have been extensively studied and debated. In contrast, the overlapping instances extension has received relatively little attention, despite its use in several interesting examples of type-level programming. One example of overlapping instances is the "smart constructors" in Wouter Swierstra's solution to the expression problem in Haskell [18]. We discuss this example in detail in Section 2.2.2, but preview that discussion here. His solution uses a coproduct constructor t :+: u and a subtyping relation f :<: g, which is implemented with the following overlapping instance declarations:

instance f :<: f
instance f :<: (f :+: g)
instance f :<: h ⇒ f :<: (g :+: h)

The restrictions on overlapping instances in Haskell constrain the use of this relation in several ways. Most significantly, :<: only recurses on the right-hand side of a :+:, limiting its use to list-like (rather than tree-like) coproducts. There is also an unresolvable overlap between the first and third instances. This overlap causes Hugs, one implementation of Haskell, to reject the instances outright, and will cause GHC, another Haskell implementation, to issue type errors for some otherwise-valid predicates (see Section 2.2.2 for more details). These issues are typical of those encountered by Haskell programmers using overlapping instances. We argue that overlapping instances lack modularity, lack specification, and significantly complicate reasoning about type-level programming. This paper proposes an alternative approach to type-class programming, replacing overlapping instances with a new feature called instance chains. Using instance chains, we could rewrite Swierstra's subtyping example as:

instance f :<: f
else f :<: (g :+: h) if f :<: g
else f :<: (g :+: h) if f :<: h
else f :<: g fails

Our version expresses the alternation between the three instances directly instead of relying on the overlapping instances mechanism. As a result, it recurses on both sides of the :+: operator, and resolves the overlap between the first and third instances. We can also close the definition of :<: in the last line of the declaration. This example highlights the major features of instance chains: explicit alternation within instance declarations, and explicit failure in both predicates and instance declarations. We argue that reasoning about programs with instance chains is simpler than reasoning about programs with overlapping instances, despite the additional syntax. This paper proceeds as follows. Section 2 describes type classes and some frequently used extensions. In the process, we develop an intuitive semantics for type classes (following the lead of Jones and Diatchki [7]). We then examine several interesting examples of type-class programming that use overlapping instances and functional dependencies. We identify places where existing type-class programming techniques are unclear or could be improved. Section 3 describes the type class system implemented by our prototype tool ilab. In designing ilab, we identified some usage patterns that are implemented in Haskell using overlapping instances, and made those patterns expressible directly, simplifying coding and removing the need for overlapping instances.
The key features of ilab are: instance chains, allowing programmers to express alternation directly in instance declarations; and explicit failure, allowing programmers to define and test when predicates do not hold. We explain these features and their consequences. We revisit the earlier examples of type-class programming, showing how to simplify and improve them using instance chains. Of course, this is not the first attempt to simplify type-level programming in Haskell: much entertainment has resulted from the odd or incompletely specified interaction of otherwise-reasonable extensions to the Haskell type-class system. To avoid repeating this experience, Section 4 formalizes a set-theoretic semantics for type classes. We highlight several places where instance chains simplify reasoning about type classes compared to overlapping instances. We also state properties, such as soundness and completeness, that link the implementation of a type-class system to its semantics, and provide a basis for programmers to reason about type-class programs. In the process, we connect the semantics of type classes to Jones' theory of qualified types. Section 5 discusses the implementation of ilab and describes the algorithms that ilab uses to validate sets of instances and to (attempt to) prove predicates. Section 6 discusses related work, while Section 7 discusses future work and concludes.
2. Background

2.1 Type classes
Type classes [21] provide an extensible mechanism for giving principal types to overloaded functions. For instance, we can define a type class for equality:

class Eq t where
  (==) :: t → t → Bool

The type of == is now constrained, or qualified, by a predicate mentioning the Eq class:

(==) :: Eq t ⇒ t → t → Bool

We can explain this qualified polymorphic type using set notation. The == function can assume types from the set

{t → t → Bool | t ∈ Eq}

We conclude that one should view type classes as specifications of sets of types, not just as tools for typing overloaded functions. Type classes are populated by instance declarations. If we had a primitive integer comparison function primIntEq, then we could write an Eq instance for Int as follows:

instance Eq Int where
  x == y = primIntEq x y

Instance declarations themselves may use qualified polymorphism. For example, the definition of Eq for lists reads:

instance Eq t ⇒ Eq [t] where
  [] == [] = True
  (x:xs) == (y:ys) = x == y && xs == ys
  _ == _ = False

As type classes correspond to sets of types, instance declarations correspond to assertions about those sets. The first declaration asserts that Int ∈ Eq. The second asserts that t ∈ Eq =⇒ [t] ∈ Eq. Together, these assertions require that Eq include the subset {Int, [Int], [[Int]], . . . } of the set Type of all types. There have been numerous proposals to extend Haskell's class system. In the next sections, we discuss those relevant to our work.

2.1.1 Multi-parameter type classes

Although Wadler and Blott [21] focus on type classes with a single parameter (which correspond to sets of types with associated operators), they proposed that type classes could also apply to more than one parameter. For example, the multi-parameter type class

class Elems c e where . . .

could describe the relation that the elements of (collection) c have type e. This class might be populated for lists:

instance Eq t ⇒ Elems [t] t where . . .

and for sets:

instance Ord t ⇒ Elems (Set t) t where . . .

For this example, we assume Sets are implemented by balanced binary trees, and so a type t must have an ordering before we can construct a value of type Set t. The type class Ord captures this constraint. Just as single-parameter type classes can be interpreted as sets of types, multi-parameter type classes can be interpreted as relations on types (i.e., sets of tuples of types). Assuming the instances for Eq above, and that we have an instance of Ord for Int, we would expect Elems to include the following subset of Type × Type:

{([Int], Int), (Set Int, Int), ([[Int]], [Int]), . . . }

2.1.2 Functional dependencies

One of the operations of the Elems class might be an insert function with type

insert :: Elems c e ⇒ e → c → c

Using it, we could write the function

insert2 c = insert True (insert 'x' c)

to insert both the Boolean constant True and the character constant 'x' into the collection c. This function has the type:

(Elems c Bool, Elems c Char) ⇒ c → c

While one could imagine a collection type c that satisfied both qualifiers, we may wish to require homogeneity. That constraint can be expressed by adding a functional dependency [6, 10] to the definition of Elems:

class Elems c e | c → e where . . .

This requires that the value of parameter c uniquely determine the value of parameter e, or, equivalently, that for any two predicates Elems c e and Elems c' e', if c = c', then e = e'. This would make the definition of insert2 above a type error, because it would require that Char = Bool. The functional dependency is a property of the relation itself, not of the constraints on it; for example, the Elems class from Section 2.1.1 had this functional dependency, even though we had not yet added the constraint. However, were we to add an instance

instance Elems [Int] Char where . . .

that interpreted characters by their ASCII values, then both the predicates Elems [Int] Int and Elems [Int] Char would hold, and the relation would no longer have the functional dependency. With the functional dependency constraint on the Elems class, a program could not contain both this instance and the instance Elems [t] t from the previous section.

2.1.3 Overlapping instances

Two instances overlap if they could apply to the same predicate. For example, consider a type class C with the following instances:

instance C (a, [b]) where . . .
instance C ([a], b) where . . .
Either of these instances could be used to solve the predicate C ([a], [b]). However, the compiler has no guarantee that the class methods are implemented equivalently for both instances, and so a program with both instances may have multiple distinct interpretations. To avoid this kind of (potential) incoherence, Haskell 98 prohibits any overlap between instances. This restriction is sometimes inconvenient. The Show class includes types whose values have a textual representation:

class Show t where
  show :: t → String

Haskell's syntax for lists surrounds the elements with brackets and separates them with commas—for example, [1,2,3]. We could write a Show instance that used this syntax:

instance Show t ⇒ Show [t] where . . .

Haskell also has special syntax to allow lists of characters to be written as strings, as in "string", for example. We might like to add a special instance of Show to handle this case:

instance Show [Char] where . . .

but that would not be allowed, because this instance overlaps with the more general instance. Peyton Jones et al. [13] describe an extension to the Haskell type-class system that allows instances to overlap as long as one of them is more specific than the other, using substitutions to make the notion of "more specific" precise. Given two instances

instance Q1 ⇒ P1 where . . .
instance Q2 ⇒ P2 where . . .

the instances overlap if P1 ∼ P2 (i.e., P1 unifies with P2). The first instance is more specific than the second if there is a substitution S such that P1 = S P2 but there is no substitution T such that T P1 = P2. This extension would allow the two instances of Show (for example, Show [Char] is a substitution instance of Show [t], via the substitution [t ↦ Char], but not vice versa), but would prohibit the two instances of C at the beginning of this section, because neither is a substitution instance of the other. A full description of how overlapping instances affect the semantics of type classes is beyond the scope of this paper; however, we do mention some of the difficulties in Section 2.3.

2.2 Type-class programming

This section describes two (simplified) examples from the literature that use the extensions of the Haskell type-class system described above. We focus on examples that use overlapping instances because of their relative complexity in both implementation and semantics. We will return to these examples in Section 3 to demonstrate instance chains.

2.2.1 Type-level arithmetic

In this section, we describe the implementation of several mathematical operations at the type level using Peano arithmetic and type classes, based on work by Thomas Hallgren [2]. We begin by representing Peano numbers at the type level using two data types, one for zero and one for successor. We do not provide value-level constructors for these types, because we only intend to use them at the type level.

data Z; data S n

Similarly, we introduce types to represent Boolean values:

data T; data F

Hallgren defines a class Lte to implement the ≤ relation at the type level as follows:

class Lte m n b | m n → b
instance Lte Z (S n) T
instance Lte (S n) Z F
instance Lte m n b ⇒ Lte (S m) (S n) b

As indicated by the functional dependency, Hallgren has actually defined the characteristic function of the ≤ relation, using the additional type constructors T and F to represent the corresponding Boolean values. This gives him more flexibility in using the Lte class, because he can now determine not only when one number is less than or equal to another, but also when that property fails. Hallgren goes on to use the Lte class to define insertion sort at the type level. However, the Haskell implementation that he was using (Hugs 98) could not solve the type constraints in his insertion sort example. His code works in ilab without modification, so we do not reproduce it here. Instead, we try to define another operation on Peano numbers: greatest common divisor. This is not an arbitrary choice: for example, work on typing low-level data structures in Haskell has relied on a type-level GCD operator [1]. We begin by defining a (bounded) subtraction operation:

class Subt m n p | m n → p -- p = m - n
instance Subt Z n Z
instance Subt m Z m
instance Subt m n (S p) ⇒ Subt m (S n) p

We use this to implement Euclid's algorithm for GCD:

class Gcd m n p | m n → p -- p = gcd(m,n)
instance Gcd m m m
instance (Lte n m T, Subt m n m', Gcd m' n p) ⇒ Gcd m n p
instance (Lte n m F, Subt n m n', Gcd m n' p) ⇒ Gcd m n p

However, both GHC and Hugs reject this trio of instances. While it is true that the conclusions of the second and third instances (trivially) unify, there is no actual overlap between those instances. For both instances to apply to the same predicate Gcd m n p, both the predicates Lte n m T and Lte n m F would have to hold; however, the functional dependency in Lte makes it impossible for both predicates to hold.
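As an aside, results computed by classes like Gcd live purely at the type level; a common companion (our own illustration, not taken from the paper) is a small reflection class that reads a Peano type back as an ordinary Int, which is handy when testing such definitions in an interpreter:

class Nat n where
  toInt :: n → Int

instance Nat Z where
  toInt _ = 0

instance Nat n ⇒ Nat (S n) where
  toInt x = 1 + toInt (pred x)
    where pred :: S m → m
          pred _ = undefined

-- e.g. toInt (undefined :: S (S Z)) evaluates to 2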
Type-class programming
2.2.2
This section describes two (simplified) examples from the literature that use the extensions of the Haskell type-class system described earlier. We focus on examples that use overlapping instances because of their relative complexity in both implementation and semantics. We will return to these examples in Section 3 to demonstrate instance chains. 2.2.1
2.2.1 Type-level arithmetic
In this section, we describe the implementation of several mathematical operations at the type level using Peano arithmetic and type classes, based on work by Thomas Hallgren [2]. We begin by representing Peano numbers at the type level using two data types, one for zero and one for successor. We do not provide value-level constructors for these types because we only intend to use them at the type level.

data Z; data S n

Similarly, we will introduce types to represent Boolean values:

data T; data F

Hallgren defines a class Lte to implement the ≤ relation at the type level as follows:
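class Lte m n b | m n → b
instance Lte Z n T
instance Lte (S m) Z F
instance Lte m n b ⇒ Lte (S m) (S n) b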
We can define subtraction similarly:

class Subt m n p | m n → p -- p = m - n
instance Subt Z n Z
instance Subt m Z m
instance Subt m n (S p) ⇒ Subt m (S n) p

We use this to implement Euclid’s algorithm for GCD:

class Gcd m n p | m n → p -- p = gcd(m,n)
instance Gcd m m m
instance (Lte n m T, Subt m n m’, Gcd m’ n p) ⇒ Gcd m n p
instance (Lte n m F, Subt n m n’, Gcd m n’ p) ⇒ Gcd m n p

However, both GHC and Hugs reject this trio of instances. While it is true that the conclusions of the second and third instances (trivially) unify, there is no actual overlap between those instances. For both instances to apply to the same predicate Gcd m n p, both the predicates Lte n m T and Lte n m F would have to hold. However, the functional dependency in Lte makes it impossible for both predicates to hold.
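Although GHC rejects the Gcd trio, it does accept the Lte and Subt declarations above once MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances, and UndecidableInstances are enabled, and the type-level computation can then be observed through a value-level stub whose result type is fixed by the functional dependency. The name subt below is our own device for illustration, not part of Hallgren’s development:

-- A stub whose result type p is computed by improvement from m and n.
subt :: Subt m n p ⇒ m → n → p
subt _ _ = undefined

-- ghci> :type subt (undefined :: S (S (S Z))) (undefined :: S Z)
-- ... :: S (S Z)
-- That is, 3 - 1 = 2, computed entirely during type checking.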
2.2.2 The expression problem

In this section, we return to the example presented in the introduction. We describe its context, a Haskell solution to the expression problem that relies on multi-parameter type classes and overlapping instances, and highlight the difficulties these extensions introduce. The expression problem [20] is a benchmark for comparing language expressiveness and modularity. The starting point is to define, by cases, a data type for arithmetic expressions, as well as an operation over that data type. For example, the data type might contain integer constants and addition, and the operation might be evaluation, consuming an expression and generating an integer value. The challenge is to extend the data type with both a new case (such as multiplication) and a new operation (such as pretty-printing). This extension should be done without changing or recompiling the original code, and without losing static type safety. Though definition of types by cases is standard in both functional and object-oriented languages, the expression problem is usually challenging in either paradigm. In many functional languages, adding a new case to an existing data type requires changing the definition of the data type and all the functions that use it; in many object-oriented languages, adding a new operation requires changing the definition of the base class and its subclasses. Wouter Swierstra proposed a Haskell solution to the expression problem in “Data Types à la Carte” [18]. His solution works by constructing coproducts of functor type constructors (using a type constructor :+:), injecting values into these coproducts (using a :<: type class), and then defining operations over coproducts using one type class per operation. We highlight some details that arise in the construction of coproducts. The type constructor f :+: g represents the coproduct of the functor type constructors f and g, and is defined similarly to the standard Haskell Either type:
data (f :+: g) e = Inl (f e) | Inr (g e)
The type constructors for various possible expressions will also be functors, but the use of functors is irrelevant to the remainder of this presentation. Suppose that we have a type constructor Const for integer constants and a type constructor Add for additions. The type of an expression containing either constants or sums could be built using the coproduct Const :+: Add. Constants would be injected into the expression type using the Inl value constructor, and sums using the Inr constructor. We could extend the expression type by adding a new type to the coproduct. For instance, if the type Multiply represents multiplications, we could construct expressions with the coproduct Const :+: (Add :+: Multiply). We would still inject constants into these new expressions using Inl, but additions would be injected using Inr ◦ Inl and multiplications using Inr ◦ Inr. It is somewhat tiresome to construct different injection functions for each type in each possible coproduct.
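To make the bookkeeping concrete, here is a small sketch of these hand-written injections. The functor definitions and the names Sig, constIn, addIn, and mulIn are our own, chosen to echo the text; the sketch assumes the :+: type above and GHC’s TypeOperators extension:

data Const e = Const Int
data Add e = Add e e
data Multiply e = Multiply e e

type Sig = Const :+: (Add :+: Multiply)

constIn :: Int → Sig e
constIn n = Inl (Const n)

addIn :: e → e → Sig e          -- Inr ◦ Inl, as described above
addIn x y = Inr (Inl (Add x y))

mulIn :: e → e → Sig e          -- Inr ◦ Inr
mulIn x y = Inr (Inr (Multiply x y))

Adding a fourth case to Sig would force all three injections to be rewritten, which is exactly the boilerplate the :<: class removes.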
Swierstra alleviates this by defining a type class f :<: g to indicate that there is an injection from f e to g e for any type e. The class is defined by:

class f :<: g where
  inj :: f e → g e

Swierstra populates this class using three instances. The first says that :<: is reflexive, with the obvious injection:

instance f :<: f where inj = id                          (A)

The second instance checks the left-hand side of the :+: type:

instance f :<: (f :+: g) where inj = Inl                 (B)

The final instance recurses on the right side of the :+: type:

instance (f :<: h) ⇒ f :<: (g :+: h) where inj = Inr ◦ inj   (C)

Swierstra comments on the overlap between instances (B) and (C): because (B) is a substitution instance of (C), predicates will be checked against (C) only if they fail to match (B), and the set of instances will behave as expected. More interestingly, instances (A) and (C) overlap at predicates of the form (f :+: g) :<: (f :+: g) but neither is a substitution instance of the other. As a result, Hugs will reject this set of instances outright for having unresolved overlap. GHC accepts the instances, but attempting to use the inj function with such a predicate will result in a type error. This error occurs at the usage of inj, not at the ambiguous overlap in the definition of :<:. Although the :+: type constructs a coproduct of types, the subtype relation :<: cannot fully exploit it because the recursive case only descends the right-hand side. Thus, while the predicate

Sum :<: (Const :+: (Sum :+: Product))

holds, the predicate

Sum :<: ((Const :+: Sum) :+: Product)

does not, because it requires recursion on the left-hand side of the :+: operator. This forces a list-like use of the :+: constructor; for instance, if t and u are already coproducts, we would not be able to inject components of t into the coproduct t :+: u. To fix this, we could replace instance (B) with

instance (f :<: g) ⇒ f :<: (g :+: h) where inj = Inl ◦ inj   (B-2)

However, (B-2) and (C) are each substitution instances of the other. The Haskell compiler no longer has a way to order these instances, and so a program containing (B-2) and (C) would be rejected.

2.3 Challenges

Haskell programmers face three main challenges when using overlapping instances. We summarize them here.

Lack of modularity. In “Data Types à la Carte”, the three instances of :<: are presented together, and in order from most to least specific. However, Haskell imposes no requirement that the instances be presented together, or in any particular order, or even in the same module. A programmer has no way to know whether a particular instance will be overlapped before it is used. In fact, GHC only attempts to determine which instance is most specific at call sites, and so will accept ambiguously overlapping instances—that is, cases where two instances apply to the same predicate and neither is more specific than the other—and not report an error unless the programmer attempts to use an overloaded function at one of the ambiguous types. Ambiguity could be introduced in a library but not discovered until some client of the library uses one of the types at which the instances are ambiguous.

Logical consistency. The syntax of instance declarations in Haskell suggests logical implication. For example, the Eq instance for lists begins:

instance Eq t ⇒ Eq [t]

and can be read as an implication t ∈ Eq =⇒ [t] ∈ Eq. Even in Haskell 98, this interpretation does not completely cover the meaning of the declaration. Because Haskell 98 instances cannot overlap, the only way that [t] can be in Eq is for t to be in Eq, so a more accurate interpretation would be t ∈ Eq ⇐⇒ [t] ∈ Eq. The meaning of overlapping instances is more obscured. A particular instance only applies if a more specific instance could not be found. Furthermore, if the preconditions of the most specific instance are not met, the compiler does not check to see if less specific instances might be applicable but instead immediately issues an error. As a result, it is impossible to interpret the meaning of an individual instance declaration without referring to the other instances in the program. While the syntax of instances still suggests a logical interpretation, applying that interpretation gives an incomplete and potentially incorrect meaning.
Lack of specification. Peyton Jones et al. [13] describe some issues introduced by overlapping instances. However, they do not consider the interaction of overlapping instances with some other type-class system features, such as improvement. Research on functional dependencies in class systems [6, 7, 17] generally does not mention overlapping instances, and recent work by Schrijvers et al. [14] related to type classes explicitly excludes overlap. Without a specification to define the correct behavior, code must be tailored to a particular implementation. For example, Kiselyov et al. [9] discover significant incompatibilities between GHC and Hugs and end up tailoring their code to GHC. Similarly, Swierstra’s :<: instances are not accepted by Hugs.
3. Features of the ilab type-class system

As part of the High Assurance Systems Programming¹ project at Portland State University, we are designing a dialect of Haskell (called Habit) for use in low-level systems programming tasks.

¹ http://hasp.cs.pdx.edu
One of our goals is to preserve and expand the possibilities of type-class programming while simplifying the underlying model of type classes. As background to this effort, we surveyed type-level programming in Haskell, using both the existing research literature and the Hackage database of Haskell libraries as resources [11]. Based on the results of our survey, we have developed the ilab type-class system and prototype implementation. We use ilab to experiment with features of the Haskell type-class system; it is not a complete implementation of either Haskell or Habit type classes, but implements features central to both. Despite their history, the features of ilab are not tied to other features of Habit; they could just as well be applied to Haskell or other Haskell dialects. The remainder of this section describes the features of ilab and shows how they simplify and improve the examples from Section 2.2.
3.1 Design of the ilab type-class system
The ilab class system is based on the Haskell 98 class system, extended with overlapping instances and functional dependencies. However, rather than support overlapping instances in ilab, we added new features based on the usage patterns implemented using overlapping instances in Haskell. These features support and extend Haskell-style type-level programming while avoiding the complexity that would be introduced by overlapping instances. For the remainder of the paper, we use Habit instance syntax for examples using ilab features, and continue to use Haskell syntax for examples that do not rely on any ilab-specific functionality. The Habit syntax for instance declarations is given by the following BNF-like grammar, where non-terminals have initial caps, optional elements are surrounded by brackets, and optional repeatable elements are surrounded by braces.

Pred    ::= ClassName Type {Type} [fails]
Context ::= Pred {, Pred}
Clause  ::= Pred [if Context] [where Decls]
Chain   ::= instance Clause {else Clause}
There are, of course, additional constraints on instance chain declarations—all the clauses in a chain must refer to the same class, and fails clauses cannot contain method definitions—as well as other restrictions as in Haskell. Habit’s syntax differs from that of Haskell 98 in three ways:

1. Predicates may include the fails keyword, indicating that the given type tuple is not in the named class;
2. Clauses may be chained together using the else keyword, allowing a programmer to indicate explicit alternation; and,
3. The instance being defined appears at the front of the instance declaration, calling attention to it even in the presence of long or complex preconditions.

Habit uses additional features, such as the functional notation proposed by Jones and Diatchki [7]. For example, the Haskell instance declaration:

instance (Lte n m T, Subt m n m’, Gcd m’ n p) ⇒ Gcd m n p
can be expressed in Habit as:
instance Gcd m n = Gcd (Subt m n) n if Lte n m T
Because it is a prototype tool and functional notation is orthogonal to our other goals, ilab does not support rewriting passes such as those used to implement functional notation. In ilab we would have to use the following version instead:
instance Gcd m n p if Lte n m T, Subt m n m’, Gcd m’ n p

This is the form we will use for the remainder of the paper. Next, we explore the new features introduced by ilab, and some of their consequences.

3.1.1 Explicit alternation

Many of the examples we found use overlapping instances to implement alternation between instances. This approach is fragile, obscures the programmer’s intention, and limits the algorithms the programmer can encode. In ilab, a class can be populated by multiple, non-overlapping instance chains, where each chain may contain multiple clauses (separated by the keyword else). While overlap is prohibited between chains, we place no limitation on overlap between the clauses within a single chain. During instance selection, the clauses within an instance chain are checked in order. Using instance chains allows clearer expression of programmer intentions and simplifies the encoding of algorithms that would be complex or impossible to express with overlapping instances. For example, in Section 2.2.2, we presented the class :<: and the three instances used to populate it. These instances implement a simple conditional by making the alternative clause more general than the consequent. ilab allows a more direct expression of the conditional:

instance f :<: f where . . .
else f :<: (f :+: g) where . . .
else f :<: (g :+: h) if f :<: h where . . .

3.1.2 Explicit failure

Some of the examples we found attempted to encode failure of the instance search [9]. However, lacking a mechanism to encode failure directly, the examples used a combination of the class and module system to prevent the user from solving certain constraints. While this approach works, it leads to confusing error messages and cannot be used as a building block for more complex instance schemes. Making failure explicit in both predicates and instance declarations significantly simplifies coding these patterns. By defining the characteristic function of the ≤ relation instead of the relation itself (as discussed in Section 2.2.1), Hallgren could express both properties of the form m ≤ n and ¬(m ≤ n). With explicit failure, we can define the ≤ relation directly:

instance Lte Z n
else Lte (S m) (S n) if Lte m n
else Lte m n fails

Because we implement the relation directly, we no longer need the third parameter to the Lte class or the functional dependency. One use of explicit failure is in the definition of closed classes. For example, at one point [16], the crypto package defined a class AESKey and three instances for types Word128, Word192, and Word256, the only valid key lengths for AES encryption. To prevent users from adding invalid types to the class, AESKey was not exported. As a consequence, users could not write type signatures such as:

AESKey a ⇒ a → ByteString → ByteString

In ilab, we can close the AESKey class with the following instance chain:

instance AESKey Word128
else AESKey Word192
else AESKey Word256
else AESKey a fails

No additional instances of AESKey can be added because they would overlap with the (last clause of the) existing instance, so there is no need to hide the class.
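For contrast, here is a rough Haskell sketch of the export-based encoding of a closed class, along the lines the crypto package used. The module layout, the newtype stand-ins for the key types, and the function encrypt are our own inventions for illustration; only the idea of withholding AESKey from the export list comes from the account above:

-- The class is deliberately missing from the export list, so clients
-- cannot declare further instances -- but they also cannot write
-- signatures that mention AESKey.
module AES (Word128, Word192, Word256, encrypt) where

import Data.ByteString (ByteString)

-- Stand-ins for the fixed-width key types (assumed for this sketch).
newtype Word128 = Word128 ByteString
newtype Word192 = Word192 ByteString
newtype Word256 = Word256 ByteString

class AESKey k
instance AESKey Word128
instance AESKey Word192
instance AESKey Word256

encrypt :: AESKey k ⇒ k → ByteString → ByteString
encrypt _key _plaintext = error "elided: a real library would call its AES core here"

The instance chain above achieves the same closure without hiding the class, so the signature that is rejected here becomes legal.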
3.1.3 Backtracking search
Haskell instance search never backtracks: if no two instance heads unify, then no predicate could be solved by more than one instance. However, combined with overlapping instances, this complicates reasoning about instances. Even if an instance could apply to a predicate, it will not be checked if a more specific instance exists anywhere in the program, and failure to prove the preconditions of the most specific instance causes instance search to fail rather than to attempt to use less specific instances. ilab instance search backtracks when it can disprove the precondition of an instance (either because of a fails clause or because of a functional dependency). When backtracking, ilab checks clauses within an instance chain in order. The order in which ilab checks instance chains is unimportant because clauses in different chains are not allowed to overlap.

3.2 Type-class programming, revisited
In this section, we demonstrate how the examples from Section 2.2 are changed and improved using the features of ilab. Section 2.2.1 includes several examples of implementing type-level arithmetic using type classes. The first example is the characteristic function for the ≤ relation. Section 3.1.2 described how this example can be improved using instance chains. Next, we attempted to define a Gcd class. We define the class as before, but populate it with a single instance chain that uses the Lte relation:

1 instance Gcd m m m
2 else Gcd m n p if Lte n m,
3      Subt m n m’, Gcd m’ n p
4 else Gcd m n p if Lte n m fails,
5      Subt n m n’, Gcd m n’ p
6 else Gcd m n p fails
As the clauses overlap, we have combined them into an instance chain. We also add the clause at line 6, closing the Gcd class. Section 2.2.2 describes a solution to the expression problem. The solution relies on a type constructor :+: to construct coproducts of types, and a type class :<: for subtypes. However, the implementation of :<: is asymmetric—it recurses only on the right-hand side of a :+: type. We can implement it symmetrically:

1 instance f :<: f
2     where inj = id
3 else f :<: (g :+: h) fails if f :<: g, f :<: h
4 else f :<: (g :+: h) if f :<: g
5     where inj = Inl ◦ inj
6 else f :<: (g :+: h) if f :<: h
7     where inj = Inr ◦ inj
8 else f :<: g fails

Lines 1-2 provide a base case, and correspond to instance (A) in the original implementation. Line 8 serves as the other base case. We explicitly close the class to ensure that we have evidence for backtracking in the middle clauses of the instance. Lines 6-7 recurse on the right-hand side of a :+: constructor, and correspond to instance (C). Lines 4-5 replace instance (B). Unlike the original implementation, this clause recurses on the left-hand side of the :+: constructor. If both f :<: g and f :<: h hold, ilab would select the left injection because of the ordering of the respective clauses. This behavior may be surprising to programmers, so we add the additional (but optional) clause at line 3 to rule out this kind of injection completely.
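With these clauses, the predicate Sum :<: ((Const :+: Sum) :+: Product) that was rejected in Section 2.2.2 becomes provable: lines 4-5 apply, with the precondition Sum :<: (Const :+: Sum) discharged by lines 6-7 and, finally, line 1. Injections can now descend both sides of a coproduct.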
4. A semantics for type classes

The history of Haskell type-class research is littered with proposals to extend, enhance, or simplify writing type-class programs. Some of these proposals, while sensible as proposed, have led to unexpected interactions with other features, or have proven difficult for programmers to understand. We hope to avoid a similar fate for instance chains by defining a semantics for type classes and for instance chains, providing a basis for understanding their use and implementation and a foundation for future research in type classes. Previous work has focused on translating programs in a language with type classes into programs in a language without type classes (for example, by introducing dictionaries of type-specific method implementations and transforming qualifiers into extra parameters [21]). This approach conflates the meaning of type classes with their implementation, making it difficult or impossible to define properties of type classes without reference to a particular implementation, or to prove properties of the implementation itself. This conflation is particularly unfortunate when it comes to understanding the interaction of type-class features or when the implementation itself is suspect, such as in the interaction between functional dependencies and overlapping instances. This section elaborates the intuitive understanding of type classes as relations on types to give a full semantics for ilab type classes. We follow a standard approach from mathematical logic: first, we characterize models of type classes. Then, we define a property that holds when a given model describes a particular type-class program; we use this property to capture properties of implementations such as soundness or completeness. This approach does not attempt to capture the details of type-class implementations such as substitutions, improvement, simplification, etc. Rather, it describes the meaning of type classes and provides a basis both for reasoning about programs that use type classes and for evaluating type-class implementations.

4.1 Modeling type classes

Single-parameter type classes, such as Eq or Ord, are naturally modeled by sets of types. Let Type refer to the set of all types. Writing M(Eq) for the model of the Eq class, we can say that M(Eq) ⊆ Type or, equivalently, M(Eq) ∈ P(Type). This approach extends to multi-parameter type classes by using relations on types instead of sets of types. Just as (the models of) Eq and Ord are subsets of Type, (the model of) a class like Elems (see Section 2.1) is a subset of Type × Type, or equivalently, M(Elems) ∈ P(Type²). A three-parameter class would be modeled by an element of P(Type³), and so forth. The number of arguments to a class is called its arity, and we will write arity(C) (where C ranges over the set of class names ClassName) to refer to the arity of class C. For example, we have arity(Eq) = 1 and arity(Elems) = 2. Using the arity function, we can write a general rule that captures the examples so far: for a class C, we have

M(C) ∈ P(Type^arity(C))

A program will typically contain a number of type classes. To model an entire program, we use a function from ClassName to models of the individual classes. We can then describe a model of a program as a dependently typed function

M : (C : ClassName) → P(Type^arity(C))

This is not the only possible structure for M; we will discuss some of the design choices further when describing the handling of constraints. Next, we define a family of relations M |= x that hold if “M models x”. We develop this family of relations “bottom-up”, starting from single predicates and working towards full programs.

Predicates. Predicates are the simplest parts of a type-class system. We define predicates with the following grammar:

f ::= holds | fails        (Flags)
π ::= C ~τ f               (Predicates)
ε@~τ = ε

((∀~x. P ⇒ C ~υ f) ; α)@~τ =
    (S P ⇒ C (S ~υ) f) ; (α@~τ)    if ∃S. dom(S) ⊆ ~x ∧ S ~υ = ~τ
    α@~τ                           otherwise

Figure 1. The restriction of an axiom α to the type tuple ~τ
Here ~τ is an (arbitrary-size) tuple of types. As Haskell predicates cannot express failure, the Haskell predicate C ~τ is equivalent to the ilab predicate C ~τ holds. Predicates correspond directly to the model of type classes:

M |= (C ~τ holds) ⇐⇒ ~τ ∈ M(C)
M |= (C ~τ fails) ⇐⇒ ~τ ∉ M(C)
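For example, in a model M with Int ∈ M(Eq) and Bool ∉ M(Eq), we have both M |= Eq Int holds and M |= Eq Bool fails.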
The presence of flags within predicates makes possible a simple syntactic definition of the negation ¬π of a predicate π:

¬(C ~τ holds) = C ~τ fails        ¬(C ~τ fails) = C ~τ holds

Contexts. Contexts, or lists of predicates, occur frequently:

P ::= ~π        (Contexts)

As above, ~π is an arbitrary-size tuple of predicates. We model contexts as conjunctions. A context is modeled if all of its predicates are modeled:

M |= P ⇐⇒ ∀π ∈ P. M |= π

We write the negation of a context P as ¬P. The negation of a context is modeled if the negation of one of its predicates is modeled:

M |= ¬P ⇐⇒ ∃π ∈ P. M |= ¬π

Axioms. We turn to the axioms of ilab, instance chains. The syntax of instance chains is given by:

α ::= (∀~x. P ⇒ π) ; α | ε        (Axiom schemes)

Because instance chains may contain polymorphic clauses, we refer to them as axiom schemes. Rather than attempting to model axiom schemes directly, we first specialize them to concrete axioms, removing any polymorphism in the process. Intuitively, we specialize an axiom scheme α by enumerating each type tuple that matches the arity of the class mentioned in α, and then attempting to restrict each clause to that tuple. Some examples may clarify specialization. Consider the instance chain

instance C Int
else C Bool
else D t ⇒ C t

which corresponds to the axiom

(() ⇒ C Int holds) ; (() ⇒ C Bool holds) ; (∀t. (D t holds) ⇒ C t holds) ; ε

where we have omitted empty quantifiers. Note that the last clause contains qualified polymorphism. If our set of types were limited to {Bool, Int, Float}, we would generate the following concrete axioms from this instance chain:

(() ⇒ C Int holds) ; ((D Int holds) ⇒ C Int holds) ; ε
(() ⇒ C Bool holds) ; ((D Bool holds) ⇒ C Bool holds) ; ε
((D Float holds) ⇒ C Float holds) ; ε

Note particularly the lack of quantifiers: there is no polymorphism in concrete axioms. Alternatively, consider the instance chain

instance Eq t ⇒ Eq [t]
With the type constructors {Int, []}, where [] constructs the list type, we would generate the following concrete axioms:

((Eq Int holds) ⇒ Eq [Int] holds) ; ε
((Eq [Int] holds) ⇒ Eq [[Int]] holds) ; ε
…

We begin formalizing specialization by defining the syntax of concrete axioms, which follows the syntax of axiom schemes closely, but omits the quantifiers:

γ ::= (P ⇒ π) ; γ | ε        (Concrete axioms)

Whether ε denotes a concrete axiom or an axiom scheme should be obvious from context. Next, we define the restriction of an axiom α to a particular type tuple ~τ, written α@~τ. This operation removes the polymorphism from α by attempting to instantiate the type variables in each clause so that the instance head matches ~τ; when that is not possible, the clause is dropped. The definition of α@~τ is shown in Figure 1. (We refer to the variables mentioned by a substitution S as dom(S), and abuse notation by treating the vector of quantified variables as a set.) We can now give the concrete axioms generated from a given axiom scheme. An empty axiom ε generates exactly one concrete axiom, also ε. If all the clauses in a (non-empty) axiom scheme α are for class C, then the set of concrete axioms generated from α is

{α@~τ | ~τ ∈ Type^arity(C)}

The set of concrete axioms for a given program may be infinite, but because we can determine whether a concrete axiom was specialized from a particular axiom scheme by unification, it is still recursive. We now describe the modeling of concrete axioms. The empty axiom is trivially modeled:

M |= ε

The concrete axiom (P ⇒ π) ; γ is modeled by the two disjuncts that it represents: if P is modeled, then π must be modeled; alternatively, if ¬P is modeled, then γ must be modeled:

M |= ((P ⇒ π) ; γ) ⇐⇒ (M |= P =⇒ M |= π) ∧ (M |= ¬P =⇒ M |= γ)

Axioms correspond to statements about the inclusion or exclusion of particular tuples within the model of a class. Other aspects of type-class systems can be modeled as properties of all the tuples. We describe several such properties next.

Functional dependencies. Our implementation supports the use of functional dependencies, both to constrain instance declarations and to introduce improvement into the deduction algorithm. Functional dependencies were originally proposed for class systems as a mechanism to induce improving substitutions [6]; these improvements, in turn, are only valid because of properties of the underlying relations [10]. Here, we formalize functional dependencies as properties of the models of classes. The Elems class from Section 2.1

class Elems c e | c → e
has a functional dependency stating that the parameter c determines the parameter e. We can phrase this with the same language used to describe functions: given two predicates Elems c e and Elems c’ e’, if c = c’ then e = e’. We generalize the syntax of functional dependency constraints as follows:

X, Y ⊆ N            (Index sets)
δ ::= C : X ⇝ Y     (Functional dependencies)
The Elems class would generate the constraint Elems : {0} ⇝ {1}, indicating that the 0th parameter of the class determines the 1st parameter. The class:

class F t u v | t v → u

would generate the constraint F : {0, 2} ⇝ {1}. Modeling these constraints is a straightforward extension of the single-parameter version given above:

M |= C : X ⇝ Y ⇐⇒ ∀~τ, ~υ ∈ M(C). ~τ|X = ~υ|X =⇒ ~τ|Y = ~υ|Y
If ~z is a tuple and X is a subset of N, then we write ~z|X to refer to the tuple consisting of those elements of ~z indexed by the elements of X; for example, if ~z = (Int, Bool, Char), then ~z|{0,2} = (Int, Char). Appealingly, this is exactly the definition of a functional dependency used in the theory of relational databases [10]. Functional dependencies are not the only possible use of the constraint mechanism; for example, it could also be used to model class arities, kind signatures, or Haskell-style superclasses. We describe two of those applications next.

Arities. We have chosen to bake the arity of classes into the definition of models. Alternatively, we could have chosen models over arbitrary sequences of types, with the following structure:

M : ClassName → P(Type*)

This definition would allow the model of a single class to contain tuples of various lengths. We could then enforce separate arity constraints on classes. An arity constraint of the form arity(C) = x would require that any tuple in the model of C have length x. We could model this constraint by:

M |= arity(C) = x ⇐⇒ ∀~τ ∈ M(C). length(~τ) = x

Note that, unlike the definition of |= heretofore, this relation expresses a property of all tuples in the model of a class.

Kinds. A similar approach could be used to capture the kind signature of a type class. Suppose that the type system were equipped with some set of kinds ranged over by k, and that, for any kind k, the set Type_k ⊆ Type is all the types of kind k. In this setting, classes are assigned kind signatures C : ~k, where the nth element of the kind signature is the kind of the nth argument to the class. Kind signatures are validated by:

M |= C : ~k ⇐⇒ ∀~τ ∈ M(C). ∀i. τᵢ ∈ Type_kᵢ

Programs. We model (the classes and instances of) a program with a pair A|∆, consisting of a set of axioms A and a set of constraints ∆. In ilab, the constraint set will only contain functional dependencies; however, an application to Haskell or Habit would include additional constraints such as kind signatures, superclass constraints, etc. A program is modeled when all of its axioms and all of its dependencies are modeled:

M |= A|∆ ⇐⇒ (∀α ∈ A. M |= α) ∧ (∀δ ∈ ∆. M |= δ)

These rules are not generative. A given program A|∆ may have one, many, or no models. A program with conflicting instance declarations has no models. On the other hand, we do not constrain predicates that are not mentioned in the program. For example, if a program contains neither an assertion that C Bool holds nor an assertion that C Bool fails, then that program could admit (at least) two models: one in which Bool ∈ M(C) and another in which Bool ∉ M(C). We say that A|∆ is consistent if it has at least one model. A predicate π is a theorem of A|∆ if it holds in all models of the program; that is:

π is a theorem of A|∆ ⇐⇒ ∀M. (M |= A|∆ =⇒ M |= π)

Informally, a particular implementation of a type-class system is sound if it proves only theorems and complete if it proves all theorems. To formalize this, we must formalize our notion of the implementation of a type-class system.

4.2 Predicates, evidence and proof

The preceding section focused on the meaning of type classes; this section builds upon Jones’ theory of qualified types to begin describing their implementation. Jones [4] describes the extension of the polymorphic λ-calculus with qualified types. He uses a notion of ‘evidence’ to close the gap between the qualifiers in a type and their implementation in a term. For example, the evidence for a type-class predicate Eq Int is a function that implements the equality check for integers, while the evidence for a subtype predicate t ⊆ t’ is a function embedding values of type t into values of type t’. To capture the use of evidence in computations, Jones extends the term language of the polymorphic λ-calculus with expressions for evidence abstraction, application, and construction. For our purposes, we only need to consider evidence construction; the remainder of Jones’ theory can be applied to ilab intact. Jones represents evidence construction with a three-place relation P ⊢ e : π, indicating that e is evidence for predicate π, given evidence for the predicates in P. He assumes a set of base axioms such as ∅ ⊢ Eq Int and Eq t ⊢ Eq [t]. We will use an alternative relation A|∆ ⊢ e : π that diverges from his in two ways:

• His set of base axioms corresponds to our model of a program, so we augment the evidence relation with the program A|∆; and,

• We will omit the set of assumptions P, as it is trivial to reintroduce and will play no further role in our discussion. It would be valuable for implementing features of Haskell beyond the scope of this paper, such as existential types or GADTs.
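For readers who have not seen the dictionary translation, a minimal Haskell sketch may help; the names EqD, eqInt, and eqList are invented for this example, which follows the translation of Wadler and Blott [21]:

-- A dictionary: runtime evidence for an Eq constraint.
data EqD a = EqD { eq :: a → a → Bool }

-- Evidence for Eq Int.
eqInt :: EqD Int
eqInt = EqD (==)

-- Evidence for the axiom Eq t ⊢ Eq [t]: a function between dictionaries.
eqList :: EqD a → EqD [a]
eqList d = EqD go
  where go (x : xs) (y : ys) = eq d x y && go xs ys
        go []       []       = True
        go _        _        = False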
Evidence construction does not precisely model the process of proving that a predicate exists. There are a number of predicates in ilab that generate no evidence, such as classes without methods or negative predicates. However, we do not wish for all negative predicates to be trivially provable simply because their evidence can always be constructed. Also, evidence construction involves details that are irrelevant for our purposes, such as the implementations of class methods. To avoid these difficulties, we will use proof expressions instead of evidence. Proof expressions capture the reasoning steps made by the deduction algorithm, and there must be a translation from a proof for a predicate π to evidence for π. Proofs may also capture details that are not observable from the generated evidence, such as recursion, naming of common subexpressions, etc. To connect differences in proof expressions to differences in evidence, we introduce a notion of equivalence for proof expressions, written p ≅ p′. We require that, if p ≅ p′, then the evidence generated from p is not observably different from the evidence generated from p′. Note that this relation is one-way; it is not likely that p ≅ p′ for arbitrary proofs p and p′ that generate observably equivalent evidence. Equivalence is a statement only about the evidence
generated from proofs, not about what they prove; that is:

A|∆ ⊢ p : π ∧ p ≅ p′ ⇏ A|∆ ⊢ p′ : π

Section 5 discusses the details of ilab proof expressions and proof equivalence. We will refer to an algorithm for finding p such that A|∆ ⊢ p : π for given A|∆ and π as a deduction algorithm. The rest of this section will discuss deduction algorithms in general; Section 5 discusses the particular deduction algorithm implemented in ilab and the details of its proof expressions. We can now formalize the notions of soundness and completeness. A deduction algorithm is sound if it only proves predicates that are theorems:

∃p. A|∆ ⊢ p : π =⇒ π is a theorem of A|∆        (SOUNDNESS)

A deduction algorithm is complete if it can prove any theorem:

π is a theorem of A|∆ =⇒ ∃p. A|∆ ⊢ p : π        (COMPLETENESS)

Soundness is an essential property of type-class systems because it connects the implementation to the programmer’s model of type classes. We can ensure completeness with sufficient syntactic restrictions on class and instance declarations: for example, Haskell 98’s type-class system is complete. However, these syntactic restrictions make expressing many type-class programs difficult or impossible. Alternatively, some implementations use pragmatic measures to ensure termination, such as a (programmer-adjustable) limit to the total number of deduction steps. In ilab, we make no effort to ensure completeness or termination, ensuring greater expressiveness as a result. We hope to return to this issue in future work, and find a set of restrictions that ensure completeness while allowing more programs than are allowed by other class systems. The evidence generated from a deduction algorithm is used in translating programs with type classes. If the deduction algorithm could generate different evidence to prove the same predicate, then the translated program could have multiple meanings. To avoid this incoherence, any two pieces of evidence generated for the same predicate must be semantically indistinguishable, a property Jones calls Uniqueness of Evidence [5]. The notion of evidence being semantically indistinguishable corresponds to proof equivalence, so we restate this for our purposes as Equivalence of Proofs:

A|∆ ⊢ p : π ∧ A|∆ ⊢ p′ : π =⇒ p ≅ p′        (EOP)
ilab type classes are open: new axioms or constraints may be added to existing programs, adding to or refining the meaning of classes. To formalize this, we call a program A*|∆* an extension of program A|∆ if:

1. A|∆ and A*|∆* are consistent; and,
2. A ⊆ A* and ∆ ⊆ ∆*.

Our definition differs from the standard definition of extension in logic in that we require that the exact axioms from A be included in A*, not just that A*|∆* prove all the theorems of A|∆. As a consequence, we might hope that, if a predicate is a theorem in both programs, then the proofs in each program will be equivalent. We call this property Stability of Proofs:

A|∆ ⊢ p : π ∧ A*|∆* ⊢ p′ : π =⇒ p ≅ p′        (SOP)

Because the Haskell 98 class system permits no overlap between instances, its proofs are stable. Overlapping instances preclude stable proofs: when more specific overlapping instances are added, the proofs of some predicates will change to use the new instances. By restricting overlap to instance chains, ilab restores stability while still allowing many of the programs that could be written using overlapping instances. We will discuss proofs of ilab’s properties following the discussion of the ilab deduction algorithm in Section 5.

4.3 Application to other type-class systems

As ilab’s extensions to the type-class mechanism could be applied to other languages, the techniques used in the previous subsections to model type classes and to reason about type-class implementations could be applied to other languages, other implementations of Habit, or other type-class systems. Among the goals of ilab was to avoid the complexity of overlapping instances; by applying our modeling techniques to overlapping instances, we can see to what extent we achieved our goal. Overlapping instances are not as modular as ilab’s axioms: to determine whether an axiom applies to a predicate, we must determine whether it is the most specific axiom that matches the predicate. Making this determination requires knowing the axioms in the remainder of the program, so it would not be possible to define the meaning of an axiom without the remainder of the program as context. Overlapping instances also preclude the stability of proofs: because a program can be extended with more specific axioms, the proofs of theorems of the original program may change in the extension. While we expected from the beginning of the ilab design process that making implicit aspects of overlapping instances explicit would reduce complexity, comparing the models and properties of ilab with those of systems with overlapping instances gives a solid basis for this intuition.

5. Mechanics

This section describes our prototype implementation of ilab. The presentation is divided into two subsections: Section 5.1 discusses the validation of source axioms and Section 5.2 describes proof expressions and a deduction algorithm for ilab. Functional dependencies will play a larger role in this section than heretofore. Some preliminaries will simplify the remaining discussion. Section 4 used a set ∆ to refer to all the functional dependencies in a program. In this section, we will usually only be interested in the functional dependencies that apply to a particular predicate, and so will use the following (overloaded) function:

fundeps∆(C) = {X ⇝ Y | C : X ⇝ Y ∈ ∆} ∪ {N ⇝ ∅}
fundeps∆(C ~τ f) = fundeps∆(C)

The set ∆ will be omitted when it is obvious from context. To ensure that fundeps∆(C) is never empty, we have added the dependency N ⇝ ∅ to the functional dependencies for any class. This dependency treats all positions as determining, so it will give the behavior expected were there no functional dependencies at all. Later rules will be able to assume that all classes have at least one functional dependency constraint. Of course, any relation satisfies the dependency N ⇝ ∅, so adding it does not affect the modeling of programs. When considering predicates and functional dependencies, it is useful to consider the predicates without including any of the parameters that are determined by the functional dependency. For instance, to know whether the instances

instance Eq t ⇒ Elems [t] t
instance Elems [Int] Char

overlap, it is not enough to unify Elems [t] t with Elems [Int] Char. Rather, we must take the functional dependency for Elems into account, and attempt to unify Elems [t] with Elems [Int], which succeeds, in this particular case, showing that the instances do overlap. We can generalize this idea to any relation on predicates R and index set Y by writing π R π′ mod Y to indicate the result of π R π′ without considering the elements indexed by Y. Formally, we
define

(C ~τ f) R (C ~υ f′) mod Y ⇐⇒ (C (~τ|N∖Y) f) R (C (~υ|N∖Y) f′)

The name of this operation is chosen by analogy with modular arithmetic: as arithmetic modulo x does not consider multiples of x, so operations modulo the index set Y do not consider the elements indexed by Y.

5.1 Validation
There are two tasks in validating ilab axioms: ensuring that there are no overlaps, and checking that the relevant functional dependencies are respected. To determine whether two instances overlap, we apply a variation of the scheme used in Haskell 98. We say that two instance clauses ∀~x. P ⇒ π and ∀~y. P′ ⇒ π′ overlap if

∃(X ⇝ Y) ∈ fundeps(π). ∃U. dom(U) ⊆ ~x ∪ ~y ∧ (π ∼U π′ mod Y ∨ π ∼U ¬π′ mod Y)

where we write π ∼U π′ to indicate that U is the most general unifier of π and π′. Note that if π and π′ mention the same class name C, then fundeps(π) = fundeps(C) = fundeps(π′). Otherwise, π and π′ cannot unify, so the choice to quantify over fundeps(π) is irrelevant. Our definition of overlap differs from the Haskell definition in two ways. First, ilab axioms have explicit quantifiers, whereas all free type variables in Haskell axioms are implicitly quantified. Second, we take account of the various ways in which predicates can contradict each other. Predicates may overlap if their flags disagree (having proofs that both C ~τ holds and C ~τ fails would be difficult to model, even though the two predicates do not unify). Predicates may also overlap even if they differ in the determined parameters of some functional dependency, as in the example of Elems [t] t and Elems [Int] Char. Two axioms α and α′ overlap if some clause from α overlaps some clause from α′. This is as strict as the Haskell 98 restriction on overlap; however, because clauses within a single instance chain are free to overlap, ilab still offers greater expressivity. The overlap check is not enough to ensure that instances do not violate functional dependencies; we must also ensure that any quantified variables in determined positions are actually determined. To do this, we make use of the theory of functional dependencies [7, 10]. Let TV(π) be all the free type variables mentioned in the type tuple of predicate π. The induced functional dependencies, Fπ, of a predicate π are the dependencies

{TV(π|X) ⇝ TV(π|Y) | X ⇝ Y ∈ fundeps(π)}
Note that, unlike the class constraints, these are functional dependencies over sets of type variables, not over index sets. By extension, for a context P, let FP be the union of the induced functional dependencies for each predicate π ∈ P. The closure of a set J with respect to a set of functional dependencies F, written J⁺_F, is intuitively the set of all elements determinable from J using the functional dependencies in F. Formally, we define J⁺_F as the smallest set such that:

1. J ⊆ J⁺_F; and,
2. if X ⇝ Y ∈ F and X ⊆ J⁺_F, then Y ⊆ J⁺_F.

Now, consider an instance clause ∀~x. P ⇒ C ~τ f. We can ensure that all variables in ~τ are properly determined if:

∀X ⇝ Y ∈ fundeps(C). TV(~τ|Y) ⊆ (TV(~τ|X))⁺_FP

For example, the clause instance Eq t ⇒ Elems [t] t passes this check: for the dependency {0} ⇝ {1}, we have TV(~τ|{1}) = {t} ⊆ {t} = (TV(~τ|{0}))⁺_FP.
We require that all clauses in ilab instance chains pass this check. In previous work on type classes and functional dependencies [7, 17], the process of validating instances against functional dependencies is broken into multiple independent checks. The “consistency” check is incorporated into ilab’s expanded overlap check. The “covering” check is implemented as described in the last few paragraphs. A final note: while a functional dependency does not inherently include or exclude any tuples from a class, each tuple in a class with a dependency excludes all other tuples that would violate the dependency. This can create multiple avenues to prove that a tuple is excluded from a class: either via a negative axiom, or via a (non-overlapping) positive axiom combined with a functional dependency. Luckily, as these proofs generate the same evidence, we can allow both without jeopardizing Equivalence of Proofs.

5.2 Solving

This section describes the inference algorithm used to prove predicates in ilab. Intuitively, to prove a predicate π, we try each of the available axioms in sequence. At each step, we compare the current axiom (∀~x. P ⇒ π′) ; α to the target predicate π. There are three cases in which we might be able to prove π: either π and π′ do not match, so the current clause cannot apply to π, but we can prove π from α; or π and π′ match but we can disprove one of the preconditions in P and prove π from α; or π and π′ match and we can prove the preconditions. This intuition is somewhat complicated by the presence of functional dependencies. Recall the Elems class from Section 2.1 and suppose we are trying to prove Elems T U for some types T and U. If we can prove Elems T U’ for some type U’ ≠ U, then the functional dependency assures us that we will not be able to prove Elems T U. Similarly, if we are trying to prove that Elems T U fails and can prove that Elems T U’ holds, then the functional dependency assures us that Elems T U fails. Before formally describing the ilab deduction algorithm, we will describe its proof expressions. The structure of ilab proof expressions matches the possible reasoning steps mentioned above. To avoid noise, our proof expressions omit steps in which the current axiom does not match the target predicate. Let n range over some countably infinite source of names. We assume that each axiom clause has a unique identifying name (because these are an artifact of the proof expressions, they would not need to be provided by the programmer), and so we will use the following syntax for axiom schemes:

α ::= (n : ∀~x. P ⇒ π) ; α | ε        (Axiom schemes)

This differs from the previous syntax only by adding the name n; because names are irrelevant outside construction of proof expressions, this change does not affect the other sections of this paper. A predicate is usually proved because it matches some axiom clause and the preconditions of that axiom are provable. We describe this case with the proof expression n(~p), where n is the name of the axiom clause that matched, and ~p are the proofs of that clause’s preconditions. Alternatively, as discussed above, a negative predicate C ~τ fails may be proven by proving some C ~υ holds such that, for some functional dependency, ~τ and ~υ agree on the determining parameters but disagree on the determined. We capture this case with the proof expression excl p, where p is the proof of the excluding predicate. Finally, axioms that match the target predicate may not apply because their preconditions can be contradicted. We capture that with the proof expression [n, i, p]p′, where n identifies the axiom being skipped, i is the index of the contradicted precondition, p is the proof expression for the contradiction, and p′ is the remainder of the proof. Intuitively, only the positive portion of the proof contributes to the construction of evidence, so we can define equivalence for ilab proofs inductively by ignoring skip steps. Additionally, as mentioned in the last section, it may be possible to prove some
S π′ = π        ∀i. A ⊢ pᵢ : S Pᵢ
──────────────────────────────────────── (MATCH)
((n : ∀~x. P ⇒ π′) ; α) ⊢ n(~p) : π

∃(X ⇝ Y) ∈ fundeps(π). S π′ = ¬π mod Y        S π′ ≠ ¬π        ∀i. A ⊢ pᵢ : S Pᵢ        π is negative
──────────────────────────────────────── (MATCH-EXCL)
((n : ∀~x. P ⇒ π′) ; α) ⊢ excl n(~p) : π

∀(X ⇝ Y) ∈ fundeps(π). (π′ ≁ π mod Y ∧ π′ ≁ ¬π mod Y)        α ⊢ p : π        π′ is positive
──────────────────────────────────────── (STEP-POS)
((n : ∀~x. P ⇒ π′) ; α) ⊢ p : π

∃(X ⇝ Y) ∈ fundeps(π). S π′ = π mod Y        ∃i. A ⊢ pᵢ : ¬(S Pᵢ)        α ⊢ p : π
──────────────────────────────────────── (STEP-CONTRA)
((n : ∀~x. P ⇒ π′) ; α) ⊢ [n, i, pᵢ]p : π

π′ ≁ π        α ⊢ p : π        π′ is negative
──────────────────────────────────────── (STEP-NEG)
((n : ∀~x. P ⇒ π′) ; α) ⊢ p : π

Figure 2. The ilab deduction system
predicates either via a negative axiom or via exclusion by a (non-overlapping) positive axiom. To account for this, we make excl p equivalent to any other proof. Equivalence for ilab proofs is given by the following assertions:

p = q =⇒ p ≅ q
p ≅ q =⇒ [n, i, p′]p ≅ [n, i′, q′]q
excl p ≅ p′
We can prove a predicate from an axiom set if we can prove it from some axiom in the set:

∃α ∈ A. α ⊢ p : π
──────────────────
A|∆ ⊢ p : π

Because ilab prohibits overlap between clauses in separate instance chains, there cannot be more than one axiom in the set that matches a given predicate, let alone more than one that proves it. The deduction rules for α ⊢ p : π are given in Figure 2. We continue to use the fundeps(π) shorthand instead of passing the constraint set ∆ in to all the inference rules. We also omit the usual side condition that substitutions must mention only the quantified variables. Rule MATCH is intuitive: if an axiom matches the target predicate, and we can prove the preconditions of the axiom, then we can prove the predicate. Rule STEP-CONTRA is similarly intuitive. In a regular pattern, we require only that the axiom and the target predicate match modulo (the determining parameters of) some functional dependency. Rule MATCH-EXCL captures the case where we can prove a negative predicate by showing a positive predicate that agrees modulo a functional dependency. We gave an example of this case at the beginning of this section. Finally, there are two rules for skipping an axiom because it does not match the target predicate. The positive version (STEP-POS) makes the usual allowance for functional dependencies. The negative version (STEP-NEG) does not need to make this allowance because a negative predicate cannot be excluded by functional dependencies.

5.3 Properties of the ilab deduction algorithm

Section 4.2 describes several properties of deduction algorithms. We are currently developing formal proofs of those properties for ilab; we sketch some of them in this section. Equivalence and Stability of Proofs are relatively easy to establish because ilab does not allow instances to overlap. To generate two inequivalent proofs of the same predicate would require two different axiom clauses that both unify with the predicate being proved. Were these clauses in separate axioms, ilab would reject the axioms as overlapping. Were they in the same axiom, they would be ordered such that, once the first (whichever it happened to be) applied, ilab would not proceed to the second. A similar argument shows stability of ilab proofs. The proof of soundness is along the same lines. If a set of axioms is valid, then none of the conclusions of the clauses in one axiom overlap the conclusions of the other axioms. As a result, the only sources of unsoundness must originate within a single axiom. However, ilab will prove at most one conclusion from any single axiom. This rules out all sources of unsoundness. In this work, we have focused on the expressiveness of ilab at the cost of formal termination or completeness properties. We imagine that an approach similar to the one taken by Volpano and Smith [19] to show the undecidability of ML typeability with overloading could be applied to the ilab deduction algorithm. We hope to return to issues of completeness and termination in future work.

6. Related work

Although they have been implemented in both Haskell and other languages, such as BitC [15], overlapping instances do not appear to have received much attention in prior research. Peyton Jones et al. [13] consider some of the issues with overlapping instances and other features of Haskell current at the time, such as context reduction. However, as the combination of functional dependencies and type classes had not yet been proposed, they do not anticipate many of the interactions that motivated the work in this paper. The use of overlapping instances is not quite as sparse. We have already discussed Swierstra’s [18] use of overlapping instances. Kiselyov et al. [9] use overlapping instances and functional dependencies to define a library for heterogeneous lists in Haskell, and Kiselyov and Lämmel [8] take a similar approach in defining an object system in Haskell. In both cases, the authors find ways to avoid overlapping instances, but at the cost of additional code complexity. The Hackage collection of Haskell libraries also includes a number of examples that use overlapping instances. Heeren and Hage [3] describe a technique for providing additional information to the type checker in the form of type-class directives, specified separately from the Haskell source code. While specifying type-class directives separately allows them to be applied to existing Haskell code, it also limits their usability. In particular, while they can specify that a particular predicate is excluded from a class, or that a class is closed, they cannot use that information in an instance precondition or qualified type. Their directives do include some of the uses of explicit exclusion, such as closing classes or ensuring that classes are disjoint. Maier [10] summarizes the theory of functional dependencies as used in the database community. Jones [6] originally proposed the use of functional dependencies in type-class systems. Hallgren [2] describes some uses of functional dependencies for type-level computation, which we used for examples in Section 2.2.1. Alternative notation for functional dependencies was discussed by Neubauer et al. [12] and by Jones and Diatchki [7].
Sulzmann et al. [17] describe an alternative approach to implementing functional dependencies. In the course of describing their implementation, they establish properties of classes with functional dependencies to make type inference sound, complete, and terminating, but do not discuss the soundness of the class system directly. They also do not consider the interaction between overlapping instances and functional dependencies. Later work by Schrijvers et al. [14] proposes an alternative to functional dependencies called type functions and describes an implementation. They explicitly exclude any overlap between type functions.
7.
References [1] I. S. Diatchki and M. P. Jones. Strongly typed memory areas: programming systems-level data structures in a functional language. In Haskell ’06, pages 72–83, Portland, Oregon, USA, 2006. ACM. [2] T. Hallgren. Fun with functional dependencies, or (draft) types as values in static computations in Haskell. In Proc. of the Joint CS/CE Winter Meeting, 2001.
Conclusion and future work
[3] B. Heeren and J. Hage. Type class directives. In PADL ’05, pages 253–267. Springer-Verlag, 2005.
This paper has explored a new type-class feature, instance chains. We have motivated its development from existing Haskell typelevel programming, and demonstrated how type-level programming can be simplified and enhanced with instance chains. We have described a semantic framework for reasoning about type classes and their implementations, showed how we can model a type class system with instance chains and functional dependencies, and presented a deduction algorithm for such a type system. There is also significant opportunity for future work in this area; we outline some possibilities next.
[4] M. P. Jones. A theory of qualified types. In B. K. Bruckner, editor, ESOP ’92, volume 582. Springer-Verlag, London, UK, 1992. [5] M. P. Jones. Qualified Types: Theory and Practice. University Press, 1994.
Cambridge
[6] M. P. Jones. Type classes with functional dependencies. In ESOP 2000, pages 230–244, London, UK, 2000. Springer-Verlag. [7] M. P. Jones and I. S. Diatchki. Language and program design for functional dependencies. In Haskell Symp., pages 87–98, Victoria, BC, Canada, 2008. ACM. [8] O. Kiselyov and R. L¨ammel. Haskell’s overlooked object system. Draft; Submitted for publication; online since 10 Sept. 2005.
Overlap check. The overlap check as implemented in ilab is significantly more restrictive than it needs to be. As discussed in Section 2.2.1, the preconditions of instances may prevent them from applying to the same predicate. We would like to improve the ilab overlap check so that it takes account of semantic overlap—that is, when two axioms actually cover the same cases—as opposed to the purely syntactic notions of overlap used in this paper. To do so, we will need to determine not just when the hypotheses of two axioms contradict, but also when the possible conclusions of two hypotheses contradict—a potentially expensive search. We hope to apply existing refutation methods to limit this search.
[9] O. Kiselyov, R. L¨ammel, and K. Schupke. Strongly typed heterogeneous collections. In Haskell ’04, pages 96–107, Snowbird, Utah, USA, 2004. ACM Press. [10] D. Maier. The Theory of Relational Databases. Computer Science Press, 1983. [11] J. G. Morris. Experience report: Using Hackage to inform language design. In Haskell ’10, Baltimore, Maryland, USA, 2010. ACM. [12] M. Neubauer, P. Thiemann, M. Gasbichler, and M. Sperber. A functional notation for functional dependencies. In Haskell ’01, Firenze, Italy, September 2001. [13] S. Peyton Jones, M. P. Jones, and E. Meijer. Type classes: an exploration of the design space. In Haskell ’97, Amsterdam, The Netherlands, June 1997. [14] T. Schrijvers, S. Peyton Jones, M. Chakravarty, and M. Sulzmann. Type checking with open type functions. In IFCP ’08, pages 51–62, Victoria, BC, Canada, 2008. ACM. [15] J. Shapiro, S. Sridhar, and S. Doerrie. BitC (0.11 transitional) language specification. http://www.bitc-lang.org/docs/bitc/spec.html. Last accessed June 15, 2010. [16] D. Steinitz. Exporting a type class for type signatures. http: //www.haskell.org/pipermail/haskell-cafe/2008-November/ 050409.html, November 2008. [17] M. Sulzmann, G. J. Duck, S. Peyton Jones, and P. J. Stuckey. Understanding functional dependencies via constraint handling rules. JFP, 17(1):83–129, 2007. [18] W. Swierstra. Data types a` la carte. JFP, 18(04):423–436, 2008. [19] D. M. Volpano and G. S. Smith. On the complexity of ML typeability with overloading. In FPCA ’91, pages 15–28, Cambridge, Massachusetts, USA, 1991. Springer-Verlag. [20] P. Wadler. The expression problem. http://homepages.inf.ed.ac. uk/wadler/papers/expression/expression.txt, 1998.
Default implementations. We have discussed coding alternatives using overlapping instances at some length; another use of overlapping instances in existing Haskell code, particularly in serialization and generic programming libraries, is to provide default implementations of classes while allowing type-specific implementations to be defined later. We have developed a pattern that encodes default implementations using instance chains instead of overlapping instances (the sketch below recalls the overlapping-instances idiom in question). We anticipate testing this pattern against examples of default instances, and hope to report on the results in the future.

Greatest and least models. Section 4 effectively uses the least model of a set of instances to determine its consequences. Because ilab includes failure and functional dependencies, the greatest model of a set of ilab instances, unlike in Haskell 98, need not include all predicates. We hope that further study of greatest models will inform alternative approaches to recursive instances and to the termination and completeness of deduction algorithms.

Integration into Habit. As discussed in Section 3, the development of ilab was an intermediate step in the development of a dialect of Haskell called Habit. Habit includes many features omitted from ilab, including Haskell-style superclasses, type-level naturals, and explicit representation of binary formats. We hope to extend the techniques used in the modeling and implementation of ilab to the development of the Habit type-class system. We are also interested in seeing how features like type-level naturals affect the ilab type-class system, and how much we can implement using ilab features without baking operations into the compiler. We also believe that instance chains would be a valuable addition to Haskell itself, or to Haskell dialects other than Habit.
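For reference, the following sketch (our own, with an invented Describe class) shows the overlapping-instances encoding of default implementations mentioned above: a catch-all instance supplies the default behaviour, and a more specific instance overrides it for a particular type.

{-# LANGUAGE FlexibleInstances, OverlappingInstances, UndecidableInstances #-}

class Describe a where
  describe :: a -> String

-- Catch-all default: any type with a Show instance is Describable.
instance Show a => Describe a where
  describe = show

-- Type-specific override, chosen over the default because its head
-- is more specific.
instance Describe Bool where
  describe b = if b then "yes" else "no"

Here describe True yields "yes", while describe (3 :: Int) falls through to the default and yields "3". An instance chain would instead list the override and the default as ordered alternatives of a single chain, making the priority explicit rather than relying on overlap resolution.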
[4] M. P. Jones. A theory of qualified types. In B. Krieg-Brückner, editor, ESOP '92, volume 582. Springer-Verlag, London, UK, 1992.
[5] M. P. Jones. Qualified Types: Theory and Practice. Cambridge University Press, 1994.
[6] M. P. Jones. Type classes with functional dependencies. In ESOP 2000, pages 230–244, London, UK, 2000. Springer-Verlag.
[7] M. P. Jones and I. S. Diatchki. Language and program design for functional dependencies. In Haskell '08, pages 87–98, Victoria, BC, Canada, 2008. ACM.
[8] O. Kiselyov and R. Lämmel. Haskell's overlooked object system. Draft; submitted for publication; online since 10 Sept. 2005.
[9] O. Kiselyov, R. Lämmel, and K. Schupke. Strongly typed heterogeneous collections. In Haskell '04, pages 96–107, Snowbird, Utah, USA, 2004. ACM Press.
[10] D. Maier. The Theory of Relational Databases. Computer Science Press, 1983.
[11] J. G. Morris. Experience report: Using Hackage to inform language design. In Haskell '10, Baltimore, Maryland, USA, 2010. ACM.
[12] M. Neubauer, P. Thiemann, M. Gasbichler, and M. Sperber. A functional notation for functional dependencies. In Haskell '01, Firenze, Italy, September 2001.
[13] S. Peyton Jones, M. P. Jones, and E. Meijer. Type classes: an exploration of the design space. In Haskell '97, Amsterdam, The Netherlands, June 1997.
[14] T. Schrijvers, S. Peyton Jones, M. Chakravarty, and M. Sulzmann. Type checking with open type functions. In ICFP '08, pages 51–62, Victoria, BC, Canada, 2008. ACM.
[15] J. Shapiro, S. Sridhar, and S. Doerrie. BitC (0.11 transitional) language specification. http://www.bitc-lang.org/docs/bitc/spec.html. Last accessed June 15, 2010.
[16] D. Steinitz. Exporting a type class for type signatures. http://www.haskell.org/pipermail/haskell-cafe/2008-November/050409.html, November 2008.
[17] M. Sulzmann, G. J. Duck, S. Peyton Jones, and P. J. Stuckey. Understanding functional dependencies via constraint handling rules. JFP, 17(1):83–129, 2007.
[18] W. Swierstra. Data types à la carte. JFP, 18(4):423–436, 2008.
[19] D. M. Volpano and G. S. Smith. On the complexity of ML typeability with overloading. In FPCA '91, pages 15–28, Cambridge, Massachusetts, USA, 1991. Springer-Verlag.
[20] P. Wadler. The expression problem. http://homepages.inf.ed.ac.uk/wadler/papers/expression/expression.txt, 1998.
[21] P. Wadler and S. Blott. How to make ad-hoc polymorphism less ad hoc. In POPL '89, pages 60–76, Austin, Texas, USA, 1989. ACM.
Author Index

Arnold, Gilad .............................. 249
Barbosa, Davi M. J. ....................... 193
Bergstrom, Lars ............................ 93
Bernardy, Jean-Philippe ................... 345
Bierman, Gavin M. ......................... 105
Birkedal, Lars ............................ 143
Blelloch, Guy E. .......................... 247
Bodík, Rastislav .......................... 249
Brady, Edwin C. ........................... 297
Buisson, Jérémy ............................ 27
Chakravarty, Manuel M. T. ................. 261
Chapman, James .............................. 3
Charguéraud, Arthur ....................... 321
Chevalier, Tim ............................ 273
Crary, Karl ............................... 131
Crestani, Marcus .......................... 229
Cretin, Julien ............................ 193
Culpepper, Ryan ........................... 235
Dagand, Pierre-Évariste ..................... 3
Dagnat, Fabien ............................. 27
Danielsson, Nils Anders ................... 285
Dreyer, Derek ............................. 143
Felleisen, Matthias .......... 117, 129, 235
Fischer, Sebastian ........................ 357
Fluet, Matthew ............................. 93
Foster, Nate .............................. 193
Gazagnaire, Thomas ......................... 87
Gordon, Andrew D. ......................... 105
Gordon, Michael J. C. ....................... 1
Greenberg, Michael ........................ 193
Hage, Jurriaan ............................. 63
Hammond, Kevin ............................ 297
Hidaka, Soichiro .......................... 205
Holdermans, Stefan ......................... 63
Hölzl, Johannes ........................... 249
Hriţcu, Cătălin ........................... 105
Hu, Zhenjiang ........................ 181, 205
Huch, Frank ............................... 357
Inaba, Kazuhiro ........................... 205
Jansson, Patrik ........................... 345
Jones, Mark P. ............................ 375
Kato, Hiroyuki ............................ 205
Keller, Gabriele .......................... 261
Kennedy, Andrew J. ......................... 15
Köksal, Ali Sinan ......................... 249
Langworthy, David ......................... 105
Leshchinskiy, Roman ....................... 261
Licata, Daniel R. ......................... 169
Lippmeier, Ben ............................ 261
Madhavapeddy, Anil ......................... 87
Matsuda, Kazutaka .................... 181, 205
Mazurak, Karl .............................. 39
McBride, Conor .............................. 3
McCreight, Andrew ......................... 273
Might, Matthew ............................. 51
Mitchell, Neil ............................ 309
Morgenstern, Jamie ........................ 169
Morris, J. Garrett ........................ 375
Morris, Peter ............................... 3
Nakano, Keisuke ........................... 205
Naylor, Matthew ............................ 75
Neis, Georg ............................... 143
Paterson, Ross ............................ 345
Peyton Jones, Simon ....................... 261
Pierce, Benjamin C. .................. 157, 193
Pop, Iustin ............................... 369
Pottier, François ......................... 217
Pouillard, Nicolas ........................ 217
Rainey, Mike ............................... 93
Reed, Jason ............................... 157
Reppy, John ................................ 93
Runciman, Colin ............................ 75
Sagiv, Mooly .............................. 249
Scott, David ............................... 87
Shao, Zhong ............................... 333
Sharp, Richard ............................. 87
Shaw, Adam ................................. 93
Sperber, Michael .......................... 229
Stampoulis, Antonis ....................... 333
Tobin-Hochstadt, Sam ...................... 117
Tolmach, Andrew ........................... 273
Van Horn, David ............................ 51
Voigtländer, Janis ........................ 181
Vytiniotis, Dimitrios ...................... 15
Wang, Meng ................................ 181
Wilke, Thomas ............................. 357
Zdancewic, Steve ........................... 39