COMPUTING MEANING VOLUME 3
Studies in Linguistics and Philosophy Volume 83
Managing Editors
GENNARO CHIERCHIA, University of Milan
KAI VON FINTEL, M.I.T., Cambridge
F. JEFFREY PELLETIER, Simon Fraser University

Editorial Board
JOHAN VAN BENTHEM, University of Amsterdam
GREGORY N. CARLSON, University of Rochester
DAVID DOWTY, Ohio State University, Columbus
GERALD GAZDAR, University of Sussex, Brighton
IRENE HEIM, M.I.T., Cambridge
EWAN KLEIN, University of Edinburgh
BILL LADUSAW, University of California at Santa Cruz
TERRENCE PARSONS, University of California, Irvine
The titles published in this series are listed at the end of this volume.
COMPUTING MEANING Volume 3 edited by
HARRY BUNT Tilburg University, The Netherlands and
REINHARD MUSKENS Tilburg University, The Netherlands
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN 978-1-4020-5956-8 (HB) ISBN 978-1-4020-5958-2 (e-book)
Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands. www.springer.com
Printed on acid-free paper
All Rights Reserved © 2007 Springer No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
CONTENTS
HARRY BUNT AND REINHARD MUSKENS
Computing the Semantic Information in an Utterance   1

MASSIMO POESIO, UWE REYLE AND ROSEMARY STEVENSON
Justified Sloppiness in Anaphoric Reference   11

AOIFE CAHILL, MAIREAD MCCARTHY, MICHAEL BURKE, JOSEF VAN GENABITH AND ANDY WAY
Deriving Quasi-Logical Forms from F-Structures for the Penn Treebank   33

HARRY BUNT
Semantic Underspecification: Which Technique for What Purpose?   55

ALEX LASCARIDES AND NICHOLAS ASHER
Segmented Discourse Representation Theory: Dynamic Semantics with Discourse Structure   87

RAQUEL FERNÁNDEZ, JONATHAN GINZBURG, HOWARD GREGORY AND SHALOM LAPPIN
SHARDS: Fragment Resolution in Dialogue   125

IVANA KRUIJFF-KORBAYOVÁ AND BONNIE L. WEBBER
Interpreting Concession Statements in Light of Information Structure   145

JAN VAN EIJCK
Context and the Composition of Meaning   173

MARC SWERTS AND EMIEL KRAHMER
Meaning, Intonation and Negation   195

MYROSLAVA DZIKOVSKA, MARY SWIFT AND JAMES ALLEN
Customizing Meaning: Building Domain-Specific Semantic Representations from a Generic Lexicon   213

ARAVIND K. JOSHI, LAURA KALLMEYER AND MARIBEL ROMERO
Flexible Composition in LTAG: Quantifier Scope and Inverse Linking   233

FABRICE NAUZE AND MICHIEL VAN LAMBALGEN
Serious Computing with Tense   257

JAMES PUSTEJOVSKY, ROBERT KNIPPEN, JESSICA LITTMAN AND ROSER SAURÍ
Temporal and Event Information in Natural Language Text   301

TIM FERNANDO
Finite-state Descriptions for Temporal Semantics   347

CLAIRE GARDENT AND KRISTINA STRIEGNITZ
Generating Bridging Definite Descriptions   369

KEES VAN DEEMTER AND EMIEL KRAHMER
Graphs and Booleans: On the Generation of Referring Expressions   397

JAN ALEXANDERSSON AND TILMAN BECKER
Efficient Computation of Overlay for Multiple Inheritance Hierarchies in Discourse Modeling   423

RICHARD CROUCH, ANETTE FRANK AND JOSEF VAN GENABITH
Linear Logic Based Transfer and Structural Misalignment   457

INDEX   473
HARRY BUNT AND REINHARD MUSKENS
COMPUTING THE SEMANTIC INFORMATION IN AN UTTERANCE
1. Introduction
To compute the meaning of any given natural language expression is extremely hard. This is partly due to the structural complexity, variability, and flexibility of natural language, but also, and more importantly, to its pervasive ambiguity. It has been estimated that, due to the fact that individual words as well as their combination in a sentence can usually express a range of semantic concepts and relations, an ordinary sentence of average length can have several million possible meanings. So what do we mean by ‘the meaning of a given natural language expression’? Language users are hardly ever aware of having to resolve an ambiguity, so in practice the understanding of a natural language expression does not mean choosing ‘the right interpretation’ among millions of possibilities. The crucial point is, of course, that as a language user we are never confronted with the task of computing the meaning of a sentence in splendid isolation. That happens only in the linguistic literature. In reality, natural language always occurs in a certain context. In a given context we are not talking about just anything, but we have a certain domain of discourse. This means that those word senses can be ruled out that do not belong to this domain, as well as those interpretations of structural ambiguities which express something that would be impossible or highly implausible in the domain. While the fact that speakers and listeners do not struggle with ambiguity resolution may suggest that context information is sufficient to determine the intended meanings of natural language expressions, it is hard to believe that context information is really sufficient to exclude millions of potential sentence meanings and retain exactly one of them as the meaning. It seems more likely that readers and listeners disambiguate meanings to the extent that is required by the circumstances, and that speakers and writers, expecting this of their listeners and readers, formulate their utterances accordingly. In other words, a context
comes with certain demands on the precision with which meanings should be computed, and so, in a given context what should be regarded as the meaning of a given sentence is something that contains a certain amount of ambiguity or incompleteness. In particular, the context of use determines the appropriate level of granularity for referring expressions, and thus for the permitted vagueness of reference. Frazier and Rayner (1990) report on empirical evidence supporting this intuition for the use of polysemous nouns, whose reference is shown to be often unresolved until information later in a sentence provides disambiguating information, in contrast to the case of homonyms, where an interpreter must make a choice that may turn out to be wrong, leading to garden path sentences. Poesio et al. (2006) provide evidence that under certain conditions a certain amount of ‘sloppiness’ is permitted in anaphoric reference.

A consequence of this view is that a representation of the meaning of a given natural language expression in a given context is not a semantic representation in the classical Montagovian sense, being fully specified, complete, and unambiguous, but is an underspecified semantic representation that leaves room for ambiguity, vagueness, and incompleteness. Underspecified semantic representations (usrs) can be regarded not just as imperfect representations of meaning, waiting to become fully specified, but also as a way to salvage the adage of compositionality: rather than saying that the semantic representation of a sentence represents its meaning, determined by the meanings of its components and its syntactic-semantic composition, we could say that a usr represents the meaning of a sentence insofar as determined by the meanings of its components and its syntactic-semantic composition, or, in other words, a usr captures the semantic information contained in a sentence through its components and its syntactic-semantic composition. In the latter form, the compositionality assumption is no longer a thesis that can be proved right or wrong, but simply describes what one does when computing a usr: one computes the semantic information that a given sentence contains, and gives that a formal representation.

A slightly different way of looking at the interpretation of a sentence in context, i.e. of an utterance, is to observe that a human interpreter does not so much compute the contextually most appropriate meaning of the utterance, but computes the semantic information that the utterance contributes to the contextual knowledge that the interpreter already has. This view corresponds with the idea that understanding an utterance means trying to integrate the information that it conveys with the rest of one’s knowledge, a classical notion in
artificial intelligence that is compatible with the approach to utterance meaning currently popular in dialogue studies, generally known as the information-state update or context-change approach (Smith & van Kuppevelt, 2003; Bunt, 2000; Traum & Larsson, 2003), and to some extent also with the basic ideas of dynamic semantics (Groenendijk & Stokhof, 1985; see also the discussion in Bunt, 1989 and Groenendijk & Stokhof, 1989). If utterance meanings are in general underspecified, as we just argued, then the way in which context models (‘information states’) are updated by utterance meanings is in general ambiguous, and incomplete... a consequence which researchers who follow this approach have so far not taken on board.

Returning to the view on meaning and underspecification which focuses on computing the semantic information in a given utterance, it may be noted that this view comes very close to that of modern approaches to semantic annotation. Traditionally, annotation is the enrichment of text with notes on some of its properties or background. In computational linguistics annotation has usually taken the form of labelling text elements with certain tags, such as part of speech tags. Semantic annotation is taking a somewhat different turn, where the annotations that are added to a text are supposed to be expressions in a formal language with a well-defined semantics (see Bunt & Romary, 2002; 2004). The reason for this is that semantic annotations are intended to support not only the retrieval of certain text elements, but reasoning as well. A clear case is presented by the annotation of the temporal information in a text. If, for instance, a question-answering system is asked

(1) What new products did Microsoft announce in the last quarter?

and the data for providing answers contain a newspaper item, dated 12 May 2006, and stating that:

(2) Microsoft announced its XP follow-up system yesterday at a press meeting in San Francisco.

then the answer to this question should include the new operating system, as a result of reasoning that yesterday in this case is the same as May 11, 2006; that May belongs to the second quarter of a year; and that last quarter refers to the second quarter of 2006, since the question was asked at a date belonging to the third quarter. It is therefore insufficient to simply tag temporal expressions as being temporal expressions, for instance. Questions that are input to the
system must also be time-stamped, and the time stamp should have a well-defined internal structure allowing the recognition of a month and year, which can then be compared to the creation dates of the documents in the database. Moreover, the date referred to by yesterday must be annotated not just as a date, but as the date on which the event occurred that is described in the corresponding sentence. Clearly, in order to support the fairly complex inferencing that is needed in such examples, the annotations have to meet syntactic and semantic requirements that go a long way beyond those of just labelling. The interesting point here is that semantic annotations are developing into formal representations of some of the semantic information contained in the sentences in a text. That makes them formally comparable to underspecified semantic representations, the main difference being that annotations tend to focus on a particular type of semantic information, such as temporal information or semantic roles, whereas usrs are typically intended to capture all the semantic information in a sentence.
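To make the kind of temporal inference sketched above concrete, here is a minimal Python sketch of the reasoning behind examples (1) and (2). It is only an illustration, not part of TimeML or any particular annotation scheme; in particular, the asking date of the question (20 July 2006) is an assumption, since the text only says that the question was asked on a date in the third quarter.

```python
from datetime import date, timedelta

def quarter(d: date) -> tuple[int, int]:
    """Return (year, quarter) for a calendar date."""
    return (d.year, (d.month - 1) // 3 + 1)

def last_quarter(d: date) -> tuple[int, int]:
    """Return the quarter immediately preceding the one containing d."""
    year, q = quarter(d)
    return (year, q - 1) if q > 1 else (year - 1, 4)

# Document creation time of news item (2); 'yesterday' denotes the day before it.
document_date = date(2006, 5, 12)
event_date = document_date - timedelta(days=1)   # 11 May 2006

# Hypothetical asking time for question (1): some date in the third quarter of 2006.
question_date = date(2006, 7, 20)

# 'last quarter' in the question denotes the quarter before the asking time,
# and the announcement event falls inside that quarter, so the XP follow-up
# system is a valid answer to the question.
assert last_quarter(question_date) == (2006, 2)
assert quarter(event_date) == (2006, 2)
```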
2. About this book
Following the present introductory chapter, the book continues with four chapters concerned with aspects of ambiguity, vagueness, and underspecification. The chapter by Massimo Poesio, Uwe Reyle and Rosemary Stevenson, entitled Justified sloppiness in anaphoric reference, takes up the issue of the ambiguity that speakers and listeners allow in the meanings of what they say, focusing on the use of anaphoric expressions. They analyze a corpus of spoken dialogues to identify cases in which the addressee of an utterance containing an anaphoric pronoun does not appear to have enough evidence to resolve that pronoun, yet doesn’t appear to find the pronominal use infelicitous. The two patterns of anaphoric use that were found to fit these conditions suggest three conditions under which justified sloppiness in anaphoric references is not perceived as infelicitous. Preliminary controlled experiments indicate that subjects do find anaphoric pronouns that satisfy the justified sloppiness conditions significantly easier to process than pronouns occurring in minimally different contexts in which these conditions are not satisfied. In the second chapter of this group, by Aoife Cahill, Mairead McCarthy, Michael Burke, Josef van Genabith and Andy Way, Deriving Quasi-Logical Forms from f-structures for the Penn Treebank, the
authors show how the trees in the Penn-II treebank can be associated automatically with simple Quasi-Logical Forms (QLFs). Their approach is based on combining two independent strands of work: the first is the observation that there is a close correspondence between QLFs and LFG’s f-structures (van Genabith and Crouch, 1996); the second is the development of an automatic f-structure annotation algorithm for the Penn-II treebank (Cahill et al., 2002a; Cahill et al., 2002b). The approach is compared with that of Liakata and Pulman (2002). In the chapter Which underspecification technique for what purpose?, Harry Bunt examines a number of techniques for underspecification in semantic representations, notably labels and holes, ambiguous constants, metavariables, dominance constraints, radical reification, stores, lists and disjunctions, and in situ quantified terms. These techniques are considered for their usefulness in dealing with a variety of linguistic phenomena and cases of incomplete input, which have motivated the use of underspecified semantic representations. It is argued that labels and constraints, and the use of ambiguous constants and variables, have nearly disjoint domains of application, and together cover a wide range of phenomena. In the last chapter of this group, Alex Lascarides and Nicholas Asher motivate and describe Segmented Discourse Representation Theory (SDRT) as a dynamic semantic theory of discourse interpretation, using rhetorical relations to model the semantics/pragmatics interface. They describe the syntax and dynamic semantics of the SDRT language in which logical forms are represented, a separate but related language in which semantic underspecification is expressed as partial descriptions of logical forms, and a glue logic which uses commonsense reasoning to construct logical forms, relating the semantically underspecified forms that are generated by the grammar to their pragmatically preferred interpretations. The framework is applied to examples involving anaphora and other kinds of semantic ambiguities. Being concerned with the analysis of discourse, this chapter forms a bridge to the next three chapters, that are also concerned with the semantic interpretation of discourse and dialogue. Raquel Fernández, Jonathan Ginzburg, Howard Gregory and Shalom Lappin present the main features of SHARDS, a semantically-based HPSG approach to the resolution of dialogue fragments. This implemented system interprets short questions (‘sluices’) and short answers. It provides a procedure for computing the content values of clausal fragments from contextual information contained in a discourse record of previously processed sentences.
Ivana Kruijff-Korbayová and Bonnie Webber describe an investigation into the sensitivity of discourse connectives to the Information Structure (IS) of the utterances they relate, in their chapter Interpreting concession statements in light of information structure. They illustrate this with an analysis of connectives signaling concession, distinguishing two senses – denial of expectation and concessive opposition. Their account thus refines earlier accounts that ignore IS. This work is part of a larger enterprise aimed at understanding what role(s) sentence-level IS plays in the interpretation of larger units of discourse. Key ingredients in the description of discourse meaning are reference markers: objects in the formal representation that the discourse is about. It is well-known that reference markers are not like first-order variables; the received view is that reference markers are like the variables in imperative programming languages. However, in a computational semantics of discourse that treats reference markers as ‘dynamically bound’ variables, every noun phrase will get linked to a dynamic variable, so it will give rise to a marker index. In the chapter Context and the composition of meaning, Jan van Eijck addresses the question of where these indices come from, and how they can be handled when combining (or ‘merging’) pieces of discourse. He argues that reference markers are better treated as indices into context, and presents a theory of context and context extension based on this view. In context semantics, noun phrases do not come with fixed indices, so the merge problem does not arise. This solves a vexing issue with coordination that causes trouble for all current versions of compositional discourse representation theory. In the chapter Meaning, intonation and negation, Marc Swerts and Emiel Krahmer outline an approach to the study of meaning and intonation. The approach focusses both on what speakers can do, using production experiments, and on what hearers can do, using perception experiments. They show that such an experimental paradigm may yield interesting results from a semantic point of view, discussing the role intonation can play for the interpretation of negation phrases in natural language. Empirical evidence is presented for the existence of a set of prosodic differences between two kinds of negations, descriptive and metalinguistic ones. This distinction has been the subject of considerable debate in presupposition theory and also plays an important role in discussions about the division of labor between semantics and pragmatics. In general, it is argued that intonation gives rise to ‘soft constraints’, and that an optimality-theoretical framework may be suitable to model the relation between intonation and meaning.
Myroslava Dzikovska, Mary Swift and James Allen in their chapter Customizing meaning: Building domain-specific semantic representations from a generic lexicon, argue that language input to practical dialogue systems must be transformed into a semantic representation that is customized for use by the back-end domain reasoners, while at the same time one wants to keep front-end system components as domain-independent as possible for easy portability across multiple domains. They propose a transparent way to achieve domain-specificity from a broad-coverage domain-independent parser. They define a set of mappings from ontologies into domain-specific knowledge representations, and use these mappings both to customize the semantic representations output by the parser for the reasoners, and to specialize the lexicon to the domain – which improves parsing speed and accuracy. This method facilitates instances of semantic type coercion common in many domains by combining lexical representations with domain-specific constraints on interpretation. The chapter by Aravind Joshi, Laura Kallmeyer and Maribel Romero addresses the problem of formulating constraints for relative quantifier scope, in particular in inverse linking readings where certain scope orders are excluded. They show how to account for such restrictions in the Tree Adjoining Grammar (TAG) framework by adopting a notion of ‘flexible composition’. In the semantics used for TAG they introduce quantifier sets that group quantifiers that are ‘glued’ together in the sense that no other quantifier can scopally intervene between them. The flexible composition approach allows them to obtain the desired quantifier sets and thereby the desired constraints for quantifier scope. The next three chapters are concerned with the expression of time in natural language. In the first of these, Serious computing with tense, Fabrice Nauze and Michiel van Lambalgen describe a comprehensive proposal for dealing with time and events as expressed in natural language. They argue that the simple Davidsonian addition of time and event variables to predicates in the representation language is insufficient for reasoning about time and events, and argue that a theory of time and events should have great expressive power and be presentable in axiomatic form, so that it is entirely clear what it predicts and what it doesn’t. They argue that the event calculus that has been developed in robotics (Shanahan, 1997) has all the desired properties. It allows one to formulate a goal and a causal theory of the domain. Based on the causal theory, a plan for reaching that goal can be inferred. In the version of the event calculus proposed by van Lambalgen and Hamm (2003), the inference mechanism is constraint logic programming with
negation as failure. The authors propose to apply this formalism to tense and aspect, since goals seem to play a prominent part there. For example, a profitable way to formulate the meaning of accomplishments is to specify a goal and a causal theory, which together yield a plan which achieves the goal when no unforeseen circumstances occur. This prevents the so-called ‘imperfective paradox’ from arising in case the goal is for some reason never achieved. The second chapter in this group, by James Pustejovsky, Robert Knippen, Jessica Littman and Roser Saurí, Temporal and event information in natural language text, discusses the role that temporal information plays in natural language text, specifically in the context of question-answering systems. A descriptive framework is defined for examining the temporally sensitive aspects of natural language queries. The properties that a general specification language would need to have in order to mark up temporal and event information in text are investigated. The language TimeML is presented, a rich specification language for event and temporal expressions in natural language text. The chapter shows the expressiveness of TimeML for a broad range of syntactic and semantic contexts, and demonstrates how it can play an important part in the development of more robust question-answering systems. In the third of this group of chapters, Finite-state descriptions for temporal semantics, Tim Fernando outlines finite-state descriptions for temporal semantics through which to distinguish ‘soft’ inferences, reflecting manners of conceptualization, from more robust semantic entailments defined over models. Fernando argues that examining just which descriptions are built (before being interpreted model-theoretically), and how they are grounded in models of reality, explains why some inferences are soft and others are robust. The next two chapters are both concerned with semantic aspects of language generation. Claire Gardent and Kristina Striegnitz in their chapter Generating bridging definite descriptions focus on the role that knowledge-based reasoning plays in the generation of definite descriptions. Specifically, they propose an extension of Dale and Reiter’s incremental algorithm which covers not only directly anaphoric descriptions, but also indirect and associative anaphora. Starting from a formalism-independent algorithm, they further show how this algorithm can be implemented using description logic. Kees van Deemter and Emiel Krahmer explore in their chapter Graphs and booleans: On the generation of referring expressions how a graph-theoretical perspective may be brought to bear on the generation
of complex referring expressions. The motivation for this exploration was that, if each of these types of referring expressions can be addressed using one and the same formalism, then this will make it easier to compare and, ultimately, to combine them into one unified algorithm. They sketch how relations, vague properties, and Boolean operators have been tackled by earlier algorithms, and ask how these algorithms can be formalised using a graph-theoretical approach. It is shown that most of the existing algorithms carry over without difficulty, through the technique of making implicit properties explicit in the knowledge base. However, in the case of one algorithm (which focusses on the generation of Boolean descriptions that also contain relational properties), this strategy turns out to be problematic. For this case, a new algorithm is presented, based on partitioning the target set, which can be implemented in a graph-theoretical formalism without difficulty. The last two chapters of the book return to issues of underspecification and interpretation in context. In the first of these, Efficient computation of overlay for multiple inheritance hierarchies in discourse modeling, Jan Alexandersson and Tilman Becker note that default reasoning has been shown to be a convenient means for interpreting user utterances in context, and introduce ‘overlay’, the combination of default unification and a scoring function where the latter is used for computing the degree of similarity between new and old information. In this work they continue their efforts on default unification of typed feature structures by giving an efficient algorithm for multiple inheritance hierarchies. The main contribution of this chapter is that, contrary to previous suggestions, most of the computation can be done on the type hierarchy. The scoring function is adapted accordingly. The final chapter, by Dick Crouch, Anette Frank and Josef van Genabith, is concerned with an application of underspecified semantic representations, namely ambiguity-preserving machine translation. In earlier work (van Genabith et al., 1998), the authors developed an approach where transfer takes place on the glue language meaning constructors of (Dalrymple et al., 1996); unfortunately, that approach was unable to deal with structural misalignment problems, such as embedded head switching, in a satisfactory way. This chapter proposes the use of a fragment of linear logic as a transfer formalism, and shows how it provides a more general and satisfactory solution to these problems.
References

Bunt, H.: 1989. Dynamic Interpretation in Text and Dialogue. In H. Bouma and B. Elsendoorn, editors, Working Models of Human Perception. New York: Academic Press, pp. 81–150.
Bunt, H.: 2000. Dialogue pragmatics and context specification. In H. Bunt and W. Black, editors, Abduction, Belief and Context in Dialogue. Studies in Computational Pragmatics. Amsterdam: Benjamins, pp. 419–455.
Cahill, A., M. McCarthy, J. van Genabith and A. Way: 2002a. Automatic Annotation of the Penn Treebank with LFG F-Structure Information. In Proceedings of the LREC Workshop on Linguistic Knowledge Acquisition and Representation.
Cahill, A., M. McCarthy, J. van Genabith and A. Way: 2002b. Evaluating Automatic F-Structure Annotation for the Penn-II Treebank. In Proceedings of the Treebanks and Linguistic Theories Workshop, Sozopol, Bulgaria.
Dalrymple, M., Lamping, J., Pereira, F.C.N. and Saraswat, V.: 1996. Quantification, anaphora, and intensionality. Journal of Logic, Language and Information 6(3), 219–273.
Frazier, L. and K. Rayner: 1990. Taking on Semantic Commitments: Processing Multiple Meanings vs. Multiple Senses. Journal of Memory and Language 29, 181–200.
Genabith, J. van, and D. Crouch: 1996. Direct and Underspecified Interpretations of LFG f-Structures. In Proceedings of COLING 96, Copenhagen, Denmark, pp. 262–267.
Genabith, J. van, Frank, A. and Dorna, M.: 1998. Transfer constructors. In Butt, M. and King, T. H., editors, Proceedings of the LFG’98 Conference, Brisbane, Australia. CSLI Publications, pp. 190–205.
Groenendijk, J., and M. Stokhof: 1985. Dynamic Predicate Logic. ITLI Report, Amsterdam: ITLI.
Groenendijk, J., and M. Stokhof: 1989. Context and Information in Dynamic Semantics. In H. Bouma and B. Elsendoorn, editors, Working Models of Human Perception. New York: Academic Press, pp. 457–486.
Lambalgen, M. van, and F. Hamm: 2003. Moschovakis’ notion of meaning as applied to linguistics. In M. Baaz and J. Krajicek, editors, Logic Colloquium ’01. ASL Lecture Notes in Logic. Wellesley, MA: A.K. Peters.
Liakata, M. and S. Pulman: 2002. From trees to predicate-argument structures. In COLING’02, Proceedings of the Conference, Taipei.
Shanahan, M.P.: 1997. Solving the Frame Problem. Cambridge, MA: The M.I.T. Press.
Smith, R. and J. van Kuppevelt, editors: 2003. Current and New Directions in Discourse and Dialogue. Dordrecht: Kluwer.
Traum, D., and S. Larsson: 2003. The Information State Approach to Dialogue Management. In R. Smith and J. van Kuppevelt, editors, Current and New Directions in Discourse and Dialogue. Dordrecht: Kluwer, pp. 325–353.
MASSIMO POESIO, UWE REYLE AND ROSEMARY STEVENSON
JUSTIFIED SLOPPINESS IN ANAPHORIC REFERENCE
1. Motivations

The reason why I hate critics . . . is that they write sentences like this: .... Flaubert does not build up his characters, as did Balzac, by objective cultural description; in fact, so careless is he of their outward appearance that on one occasion he gives Emma brown eyes (14); on another, deep black eyes (15); and on another blue eyes (16). I must confess that in all the times I read Madame Bovary, I never noticed the heroine’s rainbow eyes. Should I have? Would you? Put it another way: IS THERE A PERFECT READER SOMEWHERE, A TOTAL READER?
(From J. Barnes, Flaubert’s Parrot, Picador, 1984, p. 74–76)
The quote above expresses an intuition shared by many computational semanticists: namely, that readers and listeners do not always seem to construct complete interpretations of everything they read or hear. The possibility that utterance meanings may occasionally be ‘incomplete’ or ‘underspecified’ may lead to a fundamental rethink of traditional ideas about how meaning is constructed, and has therefore generated great interest in recent years, as testified, e.g., by the collection (van Deemter and Peters, 1996), by a special issue of the Journal of Semantics on this topic, and by a number of workshops. Computational semanticists, primarily concerned with lexical and scopal underspecification, focused on developing logical characterizations of the type of interpretation that may be assigned to an utterance when its meaning remains underspecified. As a result, it has been shown that it is possible to provide a logical characterization of the complete space of possible interpretations (see, e.g., (Alshawi and Crouch, 1992; van Eijck and Jaspars, 1996; Muskens, 1995; Pinkal, 1995; Poesio, 1991; Poesio, 1996; Reyle, 1993; Reyle, 1996) as well as the papers in van Deemter and Peters, 1996). However, not much empirical evidence has yet been found supporting this intuition, except perhaps for the case of lexical underspecification (Frazier and Rayner, 1990; Copestake and Briscoe, 1995). On the contrary, there is a lot of evidence suggesting that a number of semantic
interpretive processes take place immediately, just like syntactic interpretation does. (Well-known results concerning the incrementality of semantic interpretation are discussed in (Swinney, 1979) for lexical interpretation and (Tanenhaus et al., 1995) for anaphora resolution.) Our long-term goal is to examine the evidence for and against underspecification in anaphora resolution by identifying cases in which anaphoric expressions, and especially pronouns, are not completely interpreted, and determining the interpretation they receive. (We focus on anaphoric interpretation rather than scope disambiguation – the area that has motivated the most work on underspecification in computational semantics (Reyle, 1993; Poesio, 1994; Muskens, 1995; Pinkal, 1995) – because empirical evidence on anaphora resolution is much easier to collect.) Our research combines corpus analysis and more traditional psychological experimentation: using corpus analysis first to identify contexts in which pronouns may remain underspecified, then running controlled psychological experiments to verify whether indeed this is the case. In this chapter we first discuss the results of a study of a corpus of task-oriented spoken conversations that led us to identify contexts in which an apparently ambiguous anaphoric expression does not seem to result in a problem being signalled by the other conversational participant. We then propose a preliminary hypothesis concerning what these cases have in common, and finally discuss psychological experiments supporting our hypothesis.
2. Underspecification in Reference: Psychological Evidence

Whereas work on underspecification in computational syntax such as (Sturt and Crocker, 1996) is supported to some extent by empirical evidence – e.g., on the greater or lesser facility of certain syntactic reanalysis processes – the work on underspecification in computational semantics has been driven much less by psychological results, since such evidence was, until recently, pretty minimal. There is, however, an increasing number of studies pointing out cases in which aspects of semantic interpretation are not immediately resolved. These studies are discussed at some length in (Poesio, 1999), to appear as (Poesio, to appear); a useful summary of the psychological evidence is in (Sanford and Sturt, 2002).
The best-known results concerning semantic underspecification are the studies by Frazier and Rayner (1990). Frazier and Rayner observed that cases of homonymy like pitcher or records could be experimentally separated from cases of polysemy like newspaper: whereas words belonging to the first class would originate garden paths when subsequent context disconfirmed the preferred interpretation at the point the word (e.g., record) was encountered, as in (3d), no garden paths were observed for polysemous words like newspaper in (4d).

(3) a. After they were scratched, the records were carefully guarded.
    b. After the political takeover, the records were carefully guarded.
    c. The records were carefully guarded after they were scratched.
    d. The records were carefully guarded after the political takeover.
(4) a. Lying in the rain, the newspaper was destroyed.
    b. Managing advertising so poorly, the newspaper was destroyed.
    c. Unfortunately the newspaper was destroyed, lying in the rain.
    d. Unfortunately the newspaper was destroyed, managing advertising so poorly.

For the case of anaphoric reference, the evidence is mixed. On the one hand, it is known (see, e.g., Tanenhaus et al., 1995) that definite descriptions referring to objects in the visual situation are interpreted immediately. It is also known, however, that the case with anaphoric definite descriptions and other anaphoric expressions such as pronouns is more complex. Although the interpretation of pronouns begins immediately, anaphoric pronouns in ambiguous contexts (i.e., with multiple same-gender potential antecedents) remain uninterpreted until the end of the sentence (Gernsbacher and Hargreaves, 1988). And some evidence suggests that pronouns not referring to a focal entity are not immediately interpreted (Garrod and Sanford, 1985; Garrod et al., 1994). Garrod et al. (1994), for example, tested the effect of gender, focusing, and verb bias using an eye-tracker and materials like those in (5). A context paragraph was used to establish either a male or a female entity as focus, and to introduce a second entity whose gender either matched or didn’t match that of the focused entity. Then a target
sentence was presented, containing either a masculine or a feminine pronoun, and a verb biased either towards the focused entity (sank) or towards the second entity (jumped).

(5) A dangerous incident in the pool
Elizabeth1 / Alexander1 was an inexperienced swimmer and wouldn’t have gone in if the male lifeguard2 hadn’t been standing by the pool. But as soon as she1/he1 got out of her1/his1 depth she1/he1 started to panic and wave her1/his1 hands about in a frenzy.
    a. Within seconds, she sank into the pool.
    b. Within seconds, he sank into the pool.
    c. Within seconds, she jumped into the pool.
    d. Within seconds, he jumped into the pool.
First-pass reading times for the verb indicated that conflicts between the interpretation of the pronoun and the verb (as in (5c)) were immediately detected, resulting in a slowed reading time for the verb, only when both gender information and focus information converged on the interpretation of the pronoun; otherwise, the conflict was detected only later. According to Garrod et al., this suggested that unless both these conditions were satisfied, the interpretation of the pronoun was delayed.
3. Finding (Anaphoric) Underspecification in Corpora
3.1. Methodology

How is it possible to use a corpus to find cases in which aspects of the meaning of an utterance – in our case, an anaphoric expression – remain underspecified? Our method has been to analyze task-oriented conversations from the trains corpus collected at the University of Rochester (see http://www.cs.rochester.edu/research/speech/trains.html), and to look for cases in which (i) more than one potential antecedent of an anaphoric expression matches it in gender and number, (ii) no focusing principle we are aware of makes one of the interpretations preferred, and yet (iii) the recipient of the utterance is able to accomplish the task without signalling a problem. The assumption underlying this approach is that in task-oriented conversations, unlike so-called ‘cocktail-party’ situations, the participants need to signal when they didn’t understand something, as S does in 24.5 in (6):
(6) 23.7   M: what would be faster
    23.8    : to send
    23.9    : an engine
    23.10   : from Elmira
    23.11   : to
    23.12   : ... one of the boxcars
    23.13   : or from
    23.14   : Avon
    24.1   S: well there’s
    24.2    : there’s a boxcar
    24.3    : already _at_ Elmira [3sec]
    24.4    : and
    24.5    : t / YOU MEAN TO GO TO CORNING
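The search for cases satisfying conditions (i)–(iii) of Section 3.1 can be pictured as a simple filter over an annotated corpus. The Python sketch below is only an illustration of the selection procedure: the data structures (markables with gender, number and a precomputed salience score; dialogue-act labels for repair signals) and all names in it are our own assumptions, not the annotation actually used for the trains corpus.

```python
from dataclasses import dataclass, field

@dataclass
class Markable:
    ident: str
    gender: str          # 'masc', 'fem', 'neut'
    number: str          # 'sg', 'pl'
    salience: float      # higher = more focused (assumed to be precomputed)

@dataclass
class PronounUse:
    pronoun: Markable
    candidates: list[Markable]                                # antecedents available in context
    following_acts: list[str] = field(default_factory=list)   # dialogue acts after the utterance

REPAIR_ACTS = {"signal-non-understanding", "clarification-request"}   # assumed label set

def is_justified_sloppiness_candidate(use: PronounUse, tie_margin: float = 0.1) -> bool:
    """Select pronoun uses satisfying conditions (i)-(iii)."""
    # (i) more than one antecedent matches the pronoun in gender and number
    matching = [m for m in use.candidates
                if m.gender == use.pronoun.gender and m.number == use.pronoun.number]
    if len(matching) < 2:
        return False
    # (ii) no focusing principle clearly prefers one antecedent:
    #      approximated here as a near-tie in the salience ranking
    ranked = sorted(matching, key=lambda m: m.salience, reverse=True)
    if ranked[0].salience - ranked[1].salience > tie_margin:
        return False
    # (iii) the addressee does not signal a problem in the following turns
    return not any(act in REPAIR_ACTS for act in use.following_acts)
```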
We identified several patterns of pronominal use that satisfy these conditions: three annotators (including two of the authors) agree that a use of a pronoun is ambiguous, and the absence of repair signals (such as Sorry, what did you mean?, or I didn’t understand) indicates that the listener didn’t seem to have a problem with the expression being ambiguous. We discuss these patterns in the rest of this section, and the conclusions that we drew from these cases in the next.

3.2. References to Mereologies

A first clear pattern emerging from the corpus is illustrated by the following example (from dialogue d91-3.1):2

(7) 3.1   M: can we .. kindly hook up
    3.2    : uh
    3.3    : engine E2 to the boxcar at .. Elmira
    4.1   S: ok
    5.1   M: +and+ send it to Corning
    5.2    : as soon as possible please
    6.1   S: okay [2sec]
    7.1   M: do let me know when it gets there
    8.1   S: okay it’ll /
    8.2    : it should get there at 2 AM
    9.1   M: great
    9.2    : uh can you give the
    9.3    : manager at Corning instructions that
    9.4    : as soon as it arrives
    9.5    : it should be filled with oranges
    10.1  S: okay
    10.2   : then we can get that filled

2 The same person plays the system’s role in all trains-91 dialogues; 8 different subjects play the manager role, and each of them is involved in two dialogues. The first part of the identification code of the dialogue says which speaker was involved; the second which dialogue it was - for example, d91-1.1 is the first dialogue for speaker 1, d91-1.2 is the second dialogue for speaker 1, d91-2.1 is the first dialogue for speaker 2, etc.
In this example, it’s not clear whether the pronoun it in 5.1 refers to the engine E2 which has been hooked up to the boxcar at Elmira, to the boxcar itself, or indeed whether that matters. It’s only at utterance 9.5 that we get evidence that it probably referred to the boxcar at Elmira, since it is only boxcars that can be filled with oranges; yet, if anything, focusing theories would predict engine E2 to be the antecedent, since engine E2 is the direct object, the THEME, and comes first (Sidner, 1979; Stevenson et al., 1994; Grosz et al., 1995; Poesio and Stevenson, To appear). Note that this type of pronoun use cannot be viewed as an example of vagueness, at least not according to the standard vagueness tests (Lakoff, 1970; Zwicky and Sadock, 1975). Whereas the ellipsis test, for example, would suggest that a glove in (8a) is indeterminate, because it’s possible for John to have lost his left glove and for Bill to have lost his right one, the same test applied to it in (8b) suggests that this expression is ambiguous, in that it’s not possible to interpret Then, John should check if IT gets to Bath in time, and Bill should too as meaning that John should check that the engine gets to Bath in time, whereas Bill should check that the boxcar gets there in time:

(8) a. John lost a glove, and Bill did too
    b. Let’s hook the engine to the boxcar. Then, John should check if IT gets to Bath in time, and Bill should too
Another context also apparently leading to unproblematic use of pronouns with multiple matching antecedents is illustrated by (9). In this example, the pronoun that in 27.4 may refer either to the orange juice or to the tanker car in which the orange juice has been loaded:

(9) 26.1  S: okay
    27.1  M: so then we’ll ... we’ll be in a position
    27.2   : to load the orange juice
    27.3   : into the tanker car
    27.4   : ... and send that off
In order to characterize in a more systematic fashion the possible interpretations of that in 27.4 in this last example (and of it in (7), 5.1) we will borrow some notation from Link (1983). We will write oj ⊕ tc to indicate the object that has oj and tc as subparts, and a ⊑ b to say that a is a mereological part of b.3 With this notation, we can formalize the first and most obvious property of examples (7) and (9): namely, that actions like hooking up and loading are performed that create a new object a ⊕ b out of the potential antecedents a and b (e.g., oj ⊕ tc in (9)). The second property of these examples is that four interpretations for the pronominal expression are possible. The complete list of the possible interpretations of that in (9), 27.4 is:

that = oj, tc, oj ⊕ tc, or an indeterminate x ⊑ (oj ⊕ tc)

This latter interpretation (x ⊑ (oj ⊕ tc)) is what has been called a p-underspecified interpretation in (Poesio, 1999) – i.e., a ‘disjunctive’ interpretation that ‘covers’ all of the alternative interpretations, similar to those proposed for certain cases of lexical polysemy in (Copestake and Briscoe, 1995). We will hypothesize below that the existence of such an underspecified interpretation may be a further important property of these contexts. The third property that these examples have in common is that both in situations involving attaching two objects together and in situations involving loading objects into other objects, all of the alternative interpretations of the anaphoric expression are equivalent as far as the plan (moving these objects to a new location) is concerned: after the two explicitly mentioned potential antecedents are joined, if one of them gets moved, the other one must be moved as well. E.g., in (9), 27.4, all interpretations of the instruction send that off will achieve the same result irrespective of how the pronoun is interpreted. We will write X ∼ Y to indicate that interpretation X is equivalent to interpretation Y for the purpose of the plan: i.e., we write oj ∼ tc to indicate that from the point of view of the plan, interpreting the pronoun that as referring to the orange juice or the tanker car are equivalent. Similarly, in the case of (7), we will write e ∼ b to say that from the point of view of the plan, the interpretation of it in which it refers to engine E2 and that in which it refers to the boxcar are equivalent.

3 Link uses ≤ to indicate the ordering relation on parts, for which we use the notation ⊑.

3.3. Reference to plans

A second class of pronominal uses in the trains dialogues satisfies our three conditions (ambiguity, no preferences for one interpretation, and no complaints). These are the uses of demonstratives such as that to refer to parts of a plan, as in the following example (from trains dialogue d91-1.1):

(10) 11.6   : aha
     11.7   : I see an engine and a boxcar both at Elmira
     12.1  S: right
     13.1  M: this looks like the best thing to do
     13.2   : so we should get
     13.3   : ... the eng / engine to picks up the boxcar
     13.4   : and head for Corning
     13.5   : ’s that sound reasonable
     14.1  S: sure
     14.2   : that sounds good
     15.1  M: and from Corning we’ll pick up the oranges
     15.2   : and um
     15.3   : take them to Bath
     15.4   : will it / that get m / me
     15.5   : do you think that I can get .. this all over to Bath by 8 o’clock
     16.1  S: yeah
     16.2   : that gets us to Bath at f / 5 AM
     16.3   : so it’s plenty of time
The demonstratives in question are that in 15.4 and that in 16.2. Roughly speaking, the structure of the plan at this point in the conversation can be represented as follows:

(11)  P: get o to Bath
        Q: ee, be get in Corning
        R: get o from C. to Bath using ee, be
             pick up o
             take o to Bath
but from the transcript it is unclear whether these two demonstratives refer just to the portion of the plan consisting of the action of picking up the oranges and taking them to Bath (or perhaps just to one of the two actions, say the second), or to the larger plan which also includes the previous action of getting the engine to pick up the boxcar and heading for Corning. Our two other conditions are met as well: no focusing principle we are aware of suggests one interpretation over the other, and the other participant does not complain. These examples do not look like cases of vagueness, either, as shown by the fact that (12) does not have the interpretation that John could agree to a plan, whereas Bill would agree to a subpart of it:

(12) John agreed to THAT PLAN/THAT, and Bill did too

The diagrammatic representation of the plan in (11) already hints at a further semantic property that plans share with the antecedents of the previously discussed class of pronominal use: plans, at least as classically viewed in the Artificial Intelligence literature, have a mereological structure as well. According to this view of plans, plans denote event or action types, and these in turn can be seen as sets of actions (of the appropriate type). So, for example, plan R in (11) denotes the set of events of getting oranges from Corning to Bath using the engine and the boxcar,

R = {e | e:get(o,c,ba,ee,be)}
where o refers to the oranges, c refers to Corning, ba refers to Bath, ee refers to the engine at Elmira, and be to the boxcar at Elmira. Plan P refers to the set of events of getting oranges to Bath:

P = {e′ | ∃x, y, z e′:get(o,x,ba,y,z)}

Intuitively, set R is a subset of set P. This allows us to define a ‘part-of’ relation between plans, to be interpreted as plan decomposition, as follows. Let ⊆ be the relation of inclusion between events; then R is a part of P iff for every event e in R there is an event e′ in P such that e ⊆ e′, as follows:

R ⊑ P ≡ [∀e ∈ R, ∃e′ ∈ P: e ⊆ e′]

With this interpretation of plans, we can see that the potential antecedents of the ambiguous demonstratives are part of a mereological structure similar to that observed in the previous examples, and that there is a similar range of possibilities concerning the interpretation derived by the listener. In particular, a p-underspecified interpretation is available, as in the previous cases. One crucial difference is that in the case of reference to plans one interpretation does not seem available: this is the interpretation in which the pronoun refers to Q. This appears to be yet another instance of the so-called ‘right frontier constraint’ often discussed in the literature on references to abstract objects (Webber, 1991). As a result, the p-underspecified interpretation now appears to be further constrained, as well: the antecedent z of the pronoun is not merely dominated by the supremum of the current plan, P; but it’s also part of its right frontier:

z ⊑* P, z ∈ RF(P)

What’s more, these last examples have a further, and crucial, similarity to the mereology cases discussed earlier: in these cases, as well, we can say that the two possible interpretations of the relevant utterance (e.g., 15.4, that gets us to Bath at 5 AM) are equivalent for the purpose of the plan. This is because once the part of the plan being proposed by M in 13.1–13.5, Q, has been accepted by S (utterances 14.1–14.2), whether or not the plan as a whole (P) is going to work depends entirely on whether subplan R is going to work; so accepting R is essentially equivalent to accepting P as a whole:

P ∼ R
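The set-based definition of plan decomposition can be spelled out in a few lines. The Python sketch below is only an illustration under simplified assumptions: events are modelled as frozensets of atomic actions, event inclusion (e ⊆ e′) as the subset relation, and the particular event tuples (including the alternative engine e3 and boxcar b7) are invented for the example.

```python
# Events are modelled as frozensets of atomic actions; an event e is included
# in an event e' (e ⊆ e') iff every atomic action of e is part of e'.
# A plan denotes a set of events of the appropriate type (a set of frozensets).

def part_of(R: set, P: set) -> bool:
    """R ⊑ P  iff  for every event e in R there is an event e' in P with e ⊆ e'."""
    return all(any(e <= e_prime for e_prime in P) for e in R)

# Toy universe: one way of getting the oranges (o) from Corning (c) to Bath (ba)
# using the Elmira engine (ee) and boxcar (be).
pick_up  = frozenset({("pick_up", "o", "c", "ee", "be")})
take     = frozenset({("take", "o", "ba", "ee", "be")})
get_o_cb = pick_up | take                          # the composite getting event

R = {get_o_cb}                                     # plan R: get o from C. to Bath using ee, be
P = {get_o_cb,                                     # plan P: get o to Bath by any means --
     frozenset({("take", "o", "ba", "e3", "b7")})} # here just two of its realizations

assert part_of(R, P)        # R is a part (refinement) of P
assert not part_of(P, R)    # but not vice versa
```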
4. The Justified Sloppiness Hypothesis
The discussion in the previous section makes it clear that the two patterns of anaphoric reference we have observed in the trains dialogues have at least three aspects in common:

1. Both explicitly mentioned potential antecedents x and y are elements of an underlying mereological structure with summum σ = x ⊕ y which has been explicitly constructed (and made salient) in the dialogue (σ = oj ⊕ tc in (9), σ = P in the case of (10));

2. the existence of this structure makes it possible to construct a p-underspecified interpretation in which the anaphoric expression is interpreted as denoting an element z included in the mereological structure - i.e., part-of its summum σ:

   x y σ z
   ...
   σ = x ⊕ y
   z ⊑* σ
   ...

3. All possible interpretations (x, y, z, x ⊕ y) are equivalent for the purposes of the plan.

This suggests the following preliminary hypothesis:

Ambiguous anaphoric expressions are not perceived as infelicitous provided that Conditions 1–3 hold.
This may be because if these three conditions hold, the speaker’s sloppiness in using an anaphoric expression in an ambiguous context is not problematic; we will therefore use the term justified sloppiness to indicate cases such as those discussed in the previous section, and refer to the hypothesis above as Justified Sloppiness Hypothesis, or jsh. Of course, the fact that a p-underspecified interpretation exists does not mean that the listener will adopt it as its final interpretation; however, this possibility is what makes these examples interesting from an underspecification perspective.
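For concreteness, the three conditions of the jsh can also be paraphrased as a small Python check. This is a schematic rendering only, not a processing model: the encoding of objects as frozensets of atoms, the sum and part-of operations, and the plan_equivalent predicate are all our own simplifying assumptions.

```python
from itertools import combinations

# Objects are frozensets of atoms; the mereological sum x ⊕ y is union,
# and u ⊑ v (u is part of v) is the subset relation.
def mereological_sum(x: frozenset, y: frozenset) -> frozenset:
    return x | y

def part_of(u: frozenset, v: frozenset) -> bool:
    return u <= v

def justified_sloppiness(x: frozenset, y: frozenset, salient_sums, plan_equivalent) -> bool:
    """Check Conditions 1-3 of the jsh for two explicitly mentioned antecedents x and y."""
    sigma = mereological_sum(x, y)
    # Condition 1: the summum x ⊕ y has been explicitly constructed and made salient.
    if sigma not in salient_sums:
        return False
    # Condition 2: a p-underspecified interpretation is available, i.e. the pronoun
    # can be resolved to some z with z ⊑ σ (x, y and σ itself are all such candidates).
    candidates = {z for z in (x, y, sigma) if part_of(z, sigma)}
    # Condition 3: all possible interpretations are equivalent for the purposes of the plan.
    return all(plan_equivalent(a, b) for a, b in combinations(candidates, 2))

# Toy instance for example (9): the orange juice, the tanker car, and their sum.
oj, tc = frozenset({"oj"}), frozenset({"tc"})
loaded = mereological_sum(oj, tc)        # the object created by the loading action

def same_fate(a: frozenset, b: frozenset) -> bool:
    # After loading, sending one object off sends the other off as well.
    return True

assert justified_sloppiness(oj, tc, {loaded}, same_fate)
```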
5. Additional Cases of Justified Sloppiness
5.1. Events related by generation

After identifying the cases discussed above, we discovered that similar examples of potentially ambiguous anaphoric expressions which, however, did not appear to be problematic for the reader had already been discussed by Schuster (1988), albeit not in connection with the question of whether anaphoric reference is underspecified or not.4 Schuster analyzes two types of data: dialogues between an expert and a novice attempting to learn how to use the Emacs editor, and questionnaires asking subjects to indicate the preferred referents of anaphoric expressions referring to events. The transcripts of Emacs dialogues include several examples of reference to events, the most interesting among which for our purposes are examples like the following (our own indices):

(13) a. E: Do this: [1 set a “mark” at some point (any old point) by [2 typing <esc>-M]]. It will say “mark set”. Try it.
     b. E: <esc>-M will give set-mark. Try it.
Schuster also observes that these references appear to be unproblematic; her explanation is: This relationship [generate] allows us to establish a connection between “typing <esc>-M” and “setting the mark”and it can be understood as one relationship . . . When the pronoun it is used as is the case in both examples, neither of the two referents need to be specified because the generation relationship indicates that they are both related to each other . . . 4
These examples were pointed out to us by Bonnie Webber.
JUSTIFIED SLOPPINESS IN ANAPHORIC REFERENCE
23
She also argues that if the generation relation is not properly established, such references may turn out to be ambiguous, as in the following example: (14) Set the mark at the beginning of the region. Type <esc>-M and once you’ve done that1 , move to the end of the region. These cases bear a considerable resemblance to those discussed in the previous section: two actions that are closely ‘tied together’ and as a result reference to the one becomes equivalent to a reference to the other. Using tem to indicate “typing <esc>-M”and stm for setting the mark, we can say again that the two interpretations for the discourse entity z for that are equivalent: tem ∼ stm The difference in this case is that instead of a ‘part-of’ relationships between the actions, as in the case of reference to plans discussed earlier, we have a much tighter relationship: generation makes the two actions almost into a single action. As a result, in this case we cannot even talk of ‘sloppy’ references: these references are perfectly accurate. We can still assume that the pronoun gets assigned an underspecified interpretation as discussed above, but this interpretation is almost not ‘underspecified’: z ∗ (stm ⊕ tem), generates(stm,tem) 5.2. An unclear case: References to spatial areas We are also aware of a few cases that cannot clearly be reconduced under our generalization, either because it isn’t clear whether the reference was truly ambiguous or merely vague, or because of lack of evidence concerning whether readers truly find these references unproblematic. Poesio and Vieira (1998) report that human subjects do not agree on the interpretation of definite descriptions such as the area in the following example: (15) About 160 workers at a factory that made paper for the Kent filters were exposed to asbestos in the 1950s. Areas of the factory were particularly dusty where the crocidolite was used. Workers dumped large burlap sacks of the imported material into a huge bin, poured
24
POESIO ET AL.
in cotton and acetate fibers and mechanically mixed the dry fibers in a process used to make filters. Workers described “clouds of blue dust” that hung over parts of the factory, even though exhaust fans ventilated the area. Three subjects were asked to indicate the antecedent of this description in the text. One subject indicated parts of the factory as the antecedent; another indicated the factory; and the third indicated areas of the factory. In this example, again, we have an underlying mereological structure: both parts of the factory and areas of the factory are obviously included in the total area of the factory. There are, however, a few problematic issues in these examples. First of all, in this case there is no obvious equivalence between the three different interpretations. Furthermore, in this case one could argue that the object referred to – the area – is not in focus; to the extent that one can say that there is a focus in this text, it is most likely the factory. So, given the results of the experiments of Garrod et al., this is perhaps the example in which it is most likely that the reader did not even attempt to construct an interpretation for the anaphoric expression.
6. The Limits of Corpus Analysis
It is a good idea to stop at this point and think again about which questions we would like to see answered, and what we can expect from an analysis of corpus data like the one we have just presented. With corpus analysis we can identify contexts in which 'sloppy references' take place, and commonalities between the semantic interpretations produced in each case. But it is important to realize that most of the important questions cannot be answered this way. In all of the examples we discussed the speaker is taking a risk, yet the listener does not signal a problem in understanding. An interesting question raised by this observation is whether the speaker is simply being sloppy, or whether he/she has done what Hobbs (1985) would call 'collapsing a complex theory in one of coarser granularity' – i.e., he/she is aware that there are two possible interpretations, but is also aware that the two interpretations are equivalent. This is one example of a question that cannot be answered by corpus analysis – or indeed by any other technique we are aware of, because doing so would require reading the mind of the speaker.
More amenable questions are whether it is really the case that listeners do not find these cases problematic, and if so, what kind of interpretations they are constructing. The space of possible answers to the second question is as follows:

1. The listener doesn't even attempt to interpret the pronoun, and keeps what is called an h-underspecified interpretation in (Poesio, 1999): i.e., an interpretation in which the conventional meaning of some sub-utterances has not been determined and, as a consequence, the conventional meaning of the utterance as a whole is not determined. This hypothesis would perhaps be more plausible in the case of non-task-oriented dialogues such as those in the Switchboard corpus; less so in the case of task-oriented dialogues. Furthermore, the results of Garrod et al. (1994) suggest that listeners do interpret pronouns when the entity they refer to is in focus (as is the case in all examples here).5

2. The listener does attempt to interpret the pronoun. Again, there are two possibilities:

   a) The listener realizes that there are two possible interpretations. In this case, there are four further possibilities:

      i) The listener realizes that the two objects are part of the same mereological structure t, and builds a (p-underspecified) interpretation (Poesio, 1999) in which the pronoun is assigned a discourse entity z as interpretation, with the constraints z ⊑ (e ⊕ b), ATOM(z).

      ii) The listener performs a shift in granularity, building a new interpretation in which e and b are treated as the same object.

      iii) The listener interprets the pronoun as referring to the mereological structure itself, (e ⊕ b).

      iv) The listener chooses one of the two interpretations, whether or not he/she realizes that they are equivalent (he/she may also be taking a risky strategy and 'hoping for the best').
5 Perhaps one could argue that at this point in the conversation the listener (usually S) is simply constructing a very rough plan, without really trying to interpret everything that the speaker says; this is left for later. This hypothesis is, however, hard to distinguish from Hypothesis 2.a.
   b) The listener only finds one possible interpretation for the pronoun, either e or b; no communication problem ensues, since the two interpretations are equivalent.

Corpus analysis can't answer these questions, either; but in this case we are talking about questions that may be answerable using controlled psychological experiments. In the next section, we discuss our preliminary experimental results.
7. Testing the Justified Sloppiness Hypothesis
The Justified Sloppiness Hypothesis is a fairly weak claim, in that it does not say anything concerning the actual interpretation of the pronominal uses we identified: it merely asserts that our 'lack of problem signals' heuristic is correct, and that cases of pronominal reference that satisfy the conditions we identified are indeed felicitous. As such, it is fairly easy to check: we simply have to test whether sentences that contain a potentially ambiguous anaphoric reference are easier to process when the two potential antecedents are part of a mereological structure than when they are separate. A number of techniques can be used to test hypotheses of this kind; we used the Magnitude Estimation technique proposed in (Bard et al., 1996).

Methods

To test the jsh, we asked subjects to judge whether sentences such as (16a) are 'more acceptable' (in the sense of being less ambiguous) than the minimally different (16b), in which the engine and the boxcar are not attached together:

(16) a. The engineer hooked up the engine to the boxcar and sent it to London.
     b. The engineer separated the engine from the boxcar and sent it to London.

In Magnitude Estimation experiments, the subjects are asked to assign a magnitude (an arbitrary number) to a reference sentence, and then have to judge the acceptability of other sentences relative to the reference magnitude. This experiment involves two conditions: 'MEREOLOGY' (M) and 'NON-MEREOLOGY' (NM).
Figure 1. By-subject means for Experiment 1

                  Mereology   Non-mereology   Total
  Full Material    0.0409       -0.0656       -0.0123
  First Part       0.1712        0.1403        0.1557
  Total            0.1061        0.0373
To compare these two conditions we used minimal pairs of the form shown in (16), with an identical second part containing the anaphoric expression, and first parts that differ only in the verb: in the MEREOLOGY condition, a verb is used which suggests that the two objects are part of a larger block (e.g., hooked up ... to); in the NON-MEREOLOGY condition, one which suggests that the two objects are disjoint (e.g., separated ... from). We adopted a Latin Square design, whereby each subject sees only one element of the minimal pair. (In fact, we also asked our subjects to estimate the acceptability of the first parts only, to make sure that the differences were not already present there. As a result, we got four groups of subjects.) The experiment was run using WebExp, a software package for running experiments on the Web developed at the Universities of Edinburgh and Saarbruecken (http://www.hcrc.ed.ac.uk/web_exp/). The subjects connect to a web page (viz. http://www.cogsci.ed.ac.uk/~poesio/web_exp/undersp1.instr.html) and follow the instructions at their own pace. We had 28 subjects in total.

Results

We found a significant effect of mereology both on a by-subject and a by-item analysis.6 The means for the by-subject analysis are shown in Figure 1. A two-way ANOVA over these means indicates a Length effect (First Parts are more acceptable than Full Materials) that is of no concern to us, but also that Mereology items like (16a) are significantly more acceptable than Non-Mereology items like (16b): Fs(1,27) = 36.78 (p < 0.000).
6 A by-subjects analysis indicates whether the results generalize across subjects – i.e., whether new subjects are likely to behave like the ones we tested. A by-items analysis indicates whether the results generalize across materials.
Crucially, we only find this effect when comparing Full Materials, not when comparing First Parts: Fs = 7.45 (p < 0.011) for the interaction M × P. Similar results are obtained when analyzing the means by items: Fa = 9.43 (p < 0.005) for Mereology, Fa = 5.196 (p < 0.032) for the interaction M × P.

Discussion

These results support the jsh; preliminary analyses of these results also suggest that a few refinements may be necessary. First of all, we observed that the availability of the 'underspecified' interpretation discussed above is affected by the salience of the antecedents: when the two antecedents are highly salient, the interpretation becomes difficult, and the sentences less acceptable. For example, entities introduced by proper names are 'too salient':

(17) Sue tied John's bike to Bill's bike. It wouldn't move anymore.

Also, there seems to be a difference between instructions and assertions: with instructions, there is a stronger feeling that a plan is being developed, which seems to make Condition 2 easier to achieve.

The next step is to address the second, and central, question: what kind of interpretation is being produced by the listener in these cases? This is the goal of a follow-up experiment currently under way, also using WebExp, which involves the same materials, but in which we ask our subjects to indicate the antecedents by means of a multiple-choice questionnaire.
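To make the by-subject/by-item distinction behind the Fs and Fa statistics concrete, here is a small Python sketch (ours, not part of the original study; the record layout and the toy data are assumptions) that aggregates normalized magnitude-estimation scores into by-subject and by-item cell means of the kind reported in Figure 1 and fed into the two ANOVAs.

from collections import defaultdict
from statistics import mean

# Each judgment: (subject_id, item_id, condition, part, score)
# condition in {"M", "NM"}; part in {"Full", "First"};
# score is a normalized magnitude estimate. Toy data only.
judgments = [
    ("s1", "i1", "M", "Full", 0.12), ("s1", "i2", "NM", "Full", -0.05),
    ("s2", "i1", "NM", "Full", -0.09), ("s2", "i2", "M", "Full", 0.03),
    ("s1", "i3", "M", "First", 0.20), ("s2", "i4", "NM", "First", 0.11),
]

def cell_means(judgments, unit_index):
    """Average scores per analysis unit (subject or item) and per
    condition x part cell; the by-unit means are what enter the ANOVA."""
    cells = defaultdict(list)
    for row in judgments:
        unit = row[unit_index]          # 0 = subject (Fs), 1 = item (Fa)
        cond, part, score = row[2], row[3], row[4]
        cells[(unit, cond, part)].append(score)
    return {key: mean(vals) for key, vals in cells.items()}

by_subject = cell_means(judgments, unit_index=0)   # feeds Fs
by_item = cell_means(judgments, unit_index=1)      # feeds Fa

# Condition means over the by-subject cell means for Full Materials,
# i.e. the kind of figures summarised in Figure 1.
for cond in ("M", "NM"):
    scores = [v for (u, c, p), v in by_subject.items() if c == cond and p == "Full"]
    print(cond, "Full Material mean:", round(mean(scores), 4))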
8. Conclusions
In summary, we analyzed a corpus of task-oriented dialogues, finding two patterns of anaphoric pronoun usage in which apparently ambiguous pronouns do not seem to result in communication problems: references to objects which are part of a larger mereological structure, and references to events which are part of a plan. Our analysis of these cases led us to formulate a Justified Sloppiness Hypothesis, stating that such apparently ambiguous uses are felicitous provided that the following three conditions hold:

1. The alternative interpretations are part of an underlying mereological structure, which has been made salient in the discourse;
2. this structure allows a p-underspecified interpretation;
3. the alternative interpretations are equivalent for the purposes of the plan.

We also argued that examples previously noted by Schuster can be subsumed under the proposed generalization. Controlled psychological experiments did support the jsh, showing that pronominal uses satisfying these three conditions are significantly more acceptable than analogous ambiguous cases in which no mereological structure is made salient.

This is, of course, only a first step. Our future research plans include testing whether the p-underspecified interpretations allowed by the discourse model are actually chosen as the interpretation of the pronouns, a difficult question to answer without forcing our subjects to choose one of these interpretations. It also turns out that speakers are not always so careful: there also seem to be cases of unjustified sloppiness, in which only the first condition of the jsh is satisfied, yet speakers still use pronouns. We plan to study these cases as well.
Acknowledgments Thanks to Ellen Bard, Antje Roßdeutscher, Hannes Rieser, Patrick Sturt, and Bonnie Webber for comments and suggestions; special thanks to Frank Keller and Patrick Sturt for help with the design of the experiments. This work was supported in part by Advanced Research Fellowship B/96/AF/2266 from the UK Engineering and Physical Sciences Research Council (Poesio), in part by a European Science Exchange Programme grant from the Royal Society, Cases of Unresolved Underspecification (Poesio and Reyle).
References

Alshawi, H. and R. Crouch: 1992, 'Monotonic Semantic Interpretation'. In: Proc. 30th ACL. University of Delaware, pp. 32–39.
Bard, E. G., D. Robertson, and A. Sorace: 1996, 'Magnitude Estimation of Linguistic Acceptability'. Language 72(1), 32–68.
Copestake, A. and T. Briscoe: 1995, 'Semi-Productive Polysemy and Sense Extension'. Journal of Semantics 12(1), 15–68. Special Issue on Lexical Semantics.
Frazier, L. and K. Rayner: 1990, 'Taking on Semantic Commitments: Processing Multiple Meanings vs. Multiple Senses'. Journal of Memory and Language 29, 181–200.
Garrod, S. C., D. Freudenthal, and E. Boyle: 1994, 'The role of different types of anaphor in the on-line resolution of sentences in a discourse'. Journal of Memory and Language 32, 1–30.
Garrod, S. C. and A. J. Sanford: 1985, 'On the real-time character of interpretation during reading'. Language and Cognitive Processes 1, 43–61.
Gernsbacher, M. A. and D. Hargreaves: 1988, 'Accessing Sentence Participants: The Advantage of First Mention'. Journal of Memory and Language 27, 699–717.
Goldman, A.: 1970, A Theory of Human Action. Princeton, NJ: Princeton University Press.
Grosz, B. J., A. K. Joshi, and S. Weinstein: 1995, 'Centering: A Framework for Modeling the Local Coherence of Discourse'. Computational Linguistics 21(2), 202–225. (The paper originally appeared as an unpublished manuscript in 1986.)
Hobbs, J. R.: 1985, 'Granularity'. In: Proceedings of the Ninth International Joint Conference on Artificial Intelligence. Los Angeles, California, pp. 432–435.
Lakoff, G. P.: 1970, 'A note on vagueness and ambiguity'. Linguistic Inquiry 1(3), 357–359.
Link, G.: 1983, 'The Logical Analysis of Plurals and Mass Terms: A Lattice-Theoretical Approach'. In: R. Bäuerle, C. Schwarze, and A. von Stechow (eds.): Meaning, Use and Interpretation of Language. Walter de Gruyter, pp. 302–323.
Muskens, R.: 1995, 'Order-independence and underspecification'. In: DYANA-2 Deliverable R2.2.C, Ellipsis, Underspecification, and Events in Dynamic Semantics.
Pinkal, M.: 1995, 'Radical Underspecification'. In: P. Dekker, J. Groenendijk, and M. Stokhof (eds.): Proceedings of the Tenth Amsterdam Colloquium.
Poesio, M.: 1991, 'Relational Semantics and Scope Ambiguity'. In: J. Barwise, J. M. Gawron, G. Plotkin, and S. Tutiya (eds.): Situation Semantics and its Applications, vol. 2. Stanford, CA: CSLI, Chap. 20, pp. 469–497.
Poesio, M.: 1994, 'Discourse Interpretation and the Scope of Operators'. Ph.D. thesis, University of Rochester, Department of Computer Science, Rochester, NY.
Poesio, M.: 1996, 'Semantic Ambiguity and Perceived Ambiguity'. In: K. van Deemter and S. Peters (eds.): Semantic Ambiguity and Underspecification. Stanford, CA: CSLI, Chap. 8, pp. 159–201.
Poesio, M.: 1999, 'Utterance Processing and Semantic Underspecification'. HCRC/RP 103, University of Edinburgh, HCRC.
Poesio, M.: to appear, Incrementality and Underspecification in Semantic Interpretation, Lecture Notes. Stanford, CA: CSLI.
Poesio, M. and R. Stevenson: to appear, Salience: Theoretical Models and Empirical Evidence. Cambridge and New York: Cambridge University Press.
Poesio, M. and R. Vieira: 1998, 'A Corpus-Based Investigation of Definite Description Use'. Computational Linguistics 24(2), 183–216. Also available as Research Paper CCS-RP-71, Centre for Cognitive Science, University of Edinburgh.
Pollack, M. E.: 1986, 'Inferring Domain Plans in Question-Answering'. Ph.D. thesis, Department of Computer and Information Science, University of Pennsylvania.
Reyle, U.: 1993, 'Dealing with ambiguities by underspecification: Construction, Representation and Deduction'. Journal of Semantics 10, 123–179.
Reyle, U.: 1996, 'Co-indexing Labeled DRSs to Represent and Reason with Ambiguities'. In: K. van Deemter and S. Peters (eds.): Semantic Ambiguity and Underspecification. Stanford: CSLI, Chap. 10, pp. 239–268.
Sanford, A. J. and P. Sturt: 2002, 'Depth of processing in language comprehension: not noticing the evidence'. Trends in Cognitive Science 6, 382–386.
Schuster, E.: 1988, 'Pronominal reference to events and actions: Evidence from naturally-occurring data'. LINC LAB 100, University of Pennsylvania, Dept. of Computer and Information Science, Philadelphia.
Sidner, C. L.: 1979, 'Towards a computational theory of definite anaphora comprehension in English discourse'. Ph.D. thesis, MIT.
Stevenson, R. J., R. A. Crawley, and D. Kleinman: 1994, 'Thematic Roles, Focus, and the Representation of Events'. Language and Cognitive Processes 9, 519–548.
Sturt, P. and M. Crocker: 1996, 'Monotonic Syntactic Processing: A cross-linguistic study of attachment and reanalysis'. Language and Cognitive Processes 11(5), 449–494.
Swinney, D. A.: 1979, 'Lexical Access During Sentence Comprehension: (Re)consideration of Context Effects'. Journal of Verbal Learning and Verbal Behavior 18, 545–567.
Tanenhaus, M. K., M. Spivey-Knowlton, K. M. Eberhard, and J. C. Sedivy: 1995, 'Integration of Visual and Linguistic Information in Spoken Language Comprehension'. Science 268, 1632–1634.
van Deemter, K. and S. Peters (eds.): 1996, Semantic Ambiguity and Underspecification. Stanford: CSLI Publications.
van Eijck, J. and J. Jaspars: 1996, 'Underspecification and Reasoning'. In: Building the Framework, Deliverable D15 of the FRACAS project. Available at http://www.cogsci.ed.ac.uk/~fracas/.
Webber, B. L.: 1991, 'Structure and Ostension in the Interpretation of Discourse Deixis'. Language and Cognitive Processes 6(2), 107–135.
Zwicky, A. and J. Sadock: 1975, 'Ambiguity Tests and How to Fail Them'. In: J. Kimball (ed.): Syntax and Semantics 4. New York: Academic Press, pp. 1–36.
AOIFE CAHILL, MAIREAD MCCARTHY, MICHAEL BURKE, JOSEF VAN GENABITH AND ANDY WAY
DERIVING QUASI-LOGICAL FORMS FROM F-STRUCTURES FOR THE PENN TREEBANK
1. Introduction
Probabilistic parsers and grammars extracted from treebanks (cf. Charniak (1996)) provide an attractive way of inducing large-coverage syntactic resources. However, the automatic construction of logical forms for such large-coverage grammars is a non-trivial task. In this chapter we present the first steps towards this goal: we show how the trees in the Penn-II treebank (Marcus et al., 1994) can be associated automatically with simple Quasi-Logical Forms inspired by Alshawi and Crouch (1992). Our approach is based on combining two independent strands of work: the first is the observation that there is a close correspondence between Quasi-Logical Forms and Lexical-Functional Grammar (LFG) f-structures (van Genabith and Crouch, 1996); the second is the development of an automatic proto-f-structure annotation algorithm for the Penn-II treebank (Cahill et al., 2002a; Cahill et al., 2002b). We automatically annotate the trees in the Penn-II treebank with LFG f(unctional)-structures (Kaplan and Bresnan, 1982; Bresnan, 2001; Dalrymple, 2001) and then translate the resulting f-structures into simple Quasi-Logical Forms. Currently, using this method we can associate 95.76% of the trees in the Penn-II treebank with a Quasi-Logical Form.

This chapter is structured as follows: first, we briefly describe the basics of LFG f-structures, Quasi-Logical Forms (QLFs) and how to translate between them (van Genabith and Crouch, 1996). Second, we outline the automatic proto-f-structure annotation method developed in (Cahill et al., 2002a; Cahill et al., 2002b). Proto-f-structures do not resolve long-distance dependencies. Third, we extend this method towards proper f-structures to represent non-local dependencies and passive. Fourth, we extend the theoretical work described in (van Genabith and Crouch, 1996) to cover the data provided by the Penn-II treebank. Fifth, we compare our approach with related work by Liakata and Pulman (2002) and outline ongoing work on combining our approach
with probabilistic, treebank-based LFG parsers (Cahill et al., 2002c) to parse new text into f-structures and QLFs. Finally, we conclude and outline further work.
2. LFG F-Structures and Quasi-Logical Forms
2.1. Lexical-Functional Grammar

Lexical-Functional Grammar (LFG) (Kaplan and Bresnan, 1982; Bresnan, 2001; Dalrymple, 2001) is an early member of the family of unification- (more correctly: constraint-) based grammar formalisms (FUG, PATR-II, GPSG, HPSG etc.). At its most basic, an LFG involves two levels of representation: c-structure (constituent structure) and f-structure (functional structure). C-structure represents surface grammatical configurations such as word order and the grouping of linguistic units into larger phrases. The c-structure component of an LFG is represented by a CF-PSG (context-free phrase structure grammar). F-structure represents abstract syntactic functions such as subj(ect), obj(ect), obl(ique), "closed" comp(lement), "open" xcomp(lement), pred(icate), adjn(unct) etc. in terms of recursive attribute-value structure representations. These functional representations abstract away from particulars of surface configuration. While languages differ with respect to surface configuration (word order etc.), they may still encode the same (or very similar) abstract syntactic functions (approximating to predicate-argument or deep dependency structure).

To give a simple example, typologically English is an SVO (subject-verb-object) language, while Irish is a verb-initial VSO language. However, a sentence like John saw Mary and its Irish translation Chonaic Seán Máire, while associated with very different c-structure trees, have structurally isomorphic f-structure representations, as illustrated in Figure 1. C-structure trees and f-structures are related in terms of projections (indicated by arrows in Figure 1). These projections are defined in terms of f-structure annotations in c-structure trees (equations describing f-structures) originating from annotated grammar rules and lexical entries. A sample set of LFG grammar rules and lexical entries with functional annotations (f-descriptions) is provided in Figure 2. Optional constituents are indicated in brackets.
Figure 1. C- and f-structures for an English and corresponding Irish sentence

English, John saw Mary:
  c-structure:  (S (NP:(↑ SUBJ)=↓ John) (VP:↑=↓ (V:↑=↓ saw) (NP:(↑ OBJ)=↓ Mary)))
  f-structure:  f1: [ PRED 'see⟨(↑ SUBJ)(↑ OBJ)⟩'
                      SUBJ f2: [ PRED 'John', NUM sg, PERS 3 ]
                      OBJ  f3: [ PRED 'Mary', NUM sg, PERS 3 ]
                      TENSE past ]

Irish, Chonaic Seán Máire:
  c-structure:  (S (V:↑=↓ Chonaic) (NP:(↑ SUBJ)=↓ Seán) (NP:(↑ OBJ)=↓ Máire))
  f-structure:  f1: [ PRED 'feic⟨(↑ SUBJ)(↑ OBJ)⟩'
                      SUBJ f2: [ PRED 'Seán', NUM sg, PERS 3 ]
                      OBJ  f3: [ PRED 'Máire', NUM sg, PERS 3 ]
                      TENSE past ]

Figure 2. Sample LFG grammar rules and lexical entries with functional annotations (f-descriptions) for a simple fragment of English, e.g.
  S  →  NP ((↑ SUBJ)=↓)   VP (↑=↓)
  NP →  DET (↑=↓)   N (↑=↓)
  VP →  V (↑=↓)   ( NP ((↑ OBJ)=↓) )   ( VP ((↑ XCOMP)=↓) )   ( S ((↑ COMP)=↓) )   ( ADV (↓ ∈ (↑ ADJN)) )
  Mary:  (↑ PRED)='Mary', (↑ NUM)=sg, (↑ PERS)=3
2.2. Quasi-Logical Forms

Quasi-Logical Forms (QLFs) (Alshawi & Crouch, 1992) provide the semantic level of representation employed in the Core Language Engine (CLE) (Alshawi, 1992). The two main characteristics of the formalism are underspecification and monotonic contextual resolution. QLFs give (partial) descriptions of intended semantic compositions. Contextual resolution monotonically adds to this description, e.g. by placing further constraints on the meanings of certain expressions like pronouns, or quantifier scope. QLFs are interpreted by a truth-conditional semantics via a supervaluation construction over the compositions meeting the description.
Unresolved QLFs give the basic predicate-argument structure of a sentence, mixed with some syntactic information encoded in the category attributes of QLF terms and forms. As an example (ignoring temporal and aspectual information), the sentence Every representative supported a candidate would give rise to a QLF of the form:

  ?Scope:support(term(+r,,representative,?Q,?S), term(+g,,candidate,?P,?R))
The motivation for including syntactic information in QLFs is that resolution of anaphora, ellipsis or quantifier scope may be constrained by syntactic factors (Alshawi, 1992).

2.3. From F-Structures to QLFs: I

F-structures encode predominantly abstract syntactic information with some semantic information approximating predicate-argument structure in the form of "semantic form" PRED(icate) values and quantificational information in the form of SPEC(ifier) values:
  [ SUBJ [ PRED 'Representative', NUM sg, SPEC Every ]
    PRED 'Support⟨↑ SUBJ, ↑ OBJ⟩'
    OBJ  [ PRED 'Candidate', NUM sg, SPEC A ] ]
While there is a clear difference in approach and emphasis, unresolved QLFs and f-structures bear a striking similarity and, for simple cases at least, it is easy to see how to get from one to the other in terms of a translation function (·)°, cf. (van Genabith and Crouch, 1996):
  [ Γ1 γ1
    ...
    Γn γn
    PRED Π⟨↑ Γ1, ..., ↑ Γn⟩ ]°   =   ?Scope:Π(γ1°, ..., γn°)
The core of the (·)◦ mapping taking us from f-structures to QLFs places the values of subcategorisable grammatical functions into their argument positions in the governing semantic form Π and recurses on those arguments. From this rather general perspective, the difference between f-structures and QLF is one of information packaging and presentation rather than anything else.
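To make the core of the (·)° mapping concrete, the following Python sketch (ours, not the authors' implementation; the nested-dict encoding of f-structures, the list of governable functions and the variable naming are simplifying assumptions) plugs the translations of the subcategorised grammatical functions into the argument slots of the governing PRED and recurses, producing terms in the scaled-down q(Quant,Ty:Var,Restr) notation used later in the chapter (Section 5).

import itertools

GOVERNABLE = ("subj", "obj", "obj2", "obl", "comp", "xcomp")
_fresh = itertools.count(1)

def fstr_to_qlf(f):
    """Translate a toy f-structure (nested dicts) into a QLF-like string:
    governable functions become arguments of the governing PRED."""
    pred = f["pred"].lower()
    args = [fstr_to_qlf(f[gf]) for gf in GOVERNABLE if gf in f]
    if not args:                                   # nominal leaf: build a term
        var = f"x{next(_fresh)}"
        ty = f.get("num", "ud")
        if "spec" in f:                            # quantified term
            return f"q({f['spec'].lower()},{ty}:{var},{pred}:{var})"
        return f"q({ty}:{var},{pred}:{var})"       # non-quantified term
    return f"?Scope:{pred}({', '.join(args)})"

# "Every representative supported a candidate"
fs = {
    "pred": "Support",
    "subj": {"pred": "Representative", "num": "sg", "spec": "Every"},
    "obj":  {"pred": "Candidate", "num": "sg", "spec": "A"},
}
print(fstr_to_qlf(fs))
# ?Scope:support(q(every,sg:x1,representative:x1), q(a,sg:x2,candidate:x2))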
3. Automatic Proto-F-Structure Annotation
Given an f-structure annotated version of the Penn-II treebank and a mapping from f-structures to QLFs, we can associate the trees in the Penn treebank with QLFs. The question is: how do we get a version of the Penn-II treebank annotated with f-structures?

Given a parse-annotated string (c-structure), f-structures are computed from the functional annotations on the RHSs of PSG rules and lexical entries involved in the tree. Clearly, one way of associating the Penn-II treebank with f-structure information is to first automatically extract the CFG from the treebank following the method of (Charniak, 1996); second, manually annotate the extracted CFG rule types (and lexical entries) with f-structure information; third, automatically match the annotated CFG against the parse-annotated strings in the treebank; and fourth, collect and resolve the f-structure annotations in the matching rules to generate an f-structure. Unfortunately, the large number of CFG rule types in treebanks (>19,000 for Penn-II) makes manual f-structure annotation of grammar rules extracted from a complete treebank prohibitively time consuming and expensive.

Can the process of annotating treebank trees with f-structure information be automated? As far as we are aware, to date we can distinguish three different types of automatic f-structure annotation architectures:1

• annotation algorithms
• regular expression based annotation
• flat, set-based tree description rewriting
1 These have all been developed within an LFG framework and although we refer to them as automatic f-structure annotation architectures they could equally well be used to annotate treebanks with e.g. HPSG typed feature structure or indeed Quasi-Logical Form (QLF) (Liakata and Pulman, 2002) annotations.
All approaches are based on exploiting categorial and configurational information encoded in trees. Some also exploit the Penn-II functional annotation tags.2 With annotation algorithms, two variants are possible. An annotation algorithm may

• directly (recursively and destructively) transform a treebank tree into an f-structure;
• indirectly (recursively) annotate CFG treebank trees with f-structure annotations from which an f-structure can be computed by a constraint solver.

The earliest approach to automatically identify SUBJ, OBJ etc. nodes in CFG tree structures is probably (Lappin et al., 1989).3 Their algorithm identifies grammatical function nodes to facilitate the statement of transfer rules in a machine translation project. The first direct automatic f-structure transformation algorithm we are aware of is unpublished work by Ron Kaplan (p.c.) from 1996. Kaplan worked on automatically generating f-structures from the ATIS corpus to generate data for LFG-DOP applications. The algorithm walks through the tree looking for different configurations (e.g. np under s, 2nd np under vp, etc.) and "folds" or "bends" the tree into the corresponding f-structure.

A regular expression-based, indirect, automatic f-structure annotation methodology is described in (Sadler et al., 2000). The idea is simple: first, the CFG rule set is extracted from the treebank (fragment); second, regular expression-based annotation principles are defined; third, the principles are automatically applied to the extracted rule set to generate an annotated rule set; fourth, the annotated rules are automatically matched against the original treebank trees and thereby f-structures are generated for these trees. Since the annotation principles factor out linguistic generalisations, their number is much smaller than the number of CFG treebank rules. In fact, the regular expression-based f-structure annotation principles constitute a principle-based LFG c-structure/f-structure interface.

(Frank, 2000) develops an automatic annotation method that is a generalisation of the regular expression-based annotation method.
2 Note that apart from -SBJ and -LGS, functional annotation tags in the Penn-II treebank do not indicate LFG type predicate-argument structure but, e.g., serve to classify modifying PP constituents semantically as -TMP (temporal), -LOC (locative) etc. modifiers.
3 This was recently pointed out to us by Shalom Lappin (p.c.).
The idea is again simple: first, trees are translated into a flat set representation format in a tree description language; second, annotation principles are defined in terms of rules employing a rewriting system originally developed for transfer-based machine translation architectures. In contrast to (Sadler et al., 2000), which applies only to "local" CFG rule contexts, (Frank, 2000) can consider arbitrary tree fragments. Secondly, it can be used to define both order-dependent cascaded and order-independent annotation systems. (Liakata and Pulman, 2002) have developed a similar approach to map Penn-II trees to logical forms. The approaches detailed in (Sadler et al., 2000; Frank, 2000) and compared in (Frank et al., 2003) are proof-of-concept and operate on small subsets of the AP and Susanne corpora.4

In our more recent research (Cahill et al., 2002a; Cahill et al., 2002b) we have developed an algorithmic indirect annotation method for the >49,000 parse-annotated strings in the Wall Street Journal section of the Penn-II treebank. The algorithm is implemented as a recursive procedure (in Java) which annotates Penn-II treebank tree nodes with f-structure information. The annotations describe what we call "proto-f-structures", which

• encode basic predicate-argument-modifier structures;
• interpret constituents locally (i.e. do not resolve long-distance dependencies or "movement" phenomena encoded as traces in the Penn-II trees);
• may be partial or unconnected (the method is robust: in case of missing annotations a sentence may be associated with two or more unconnected f-structure fragments rather than a single complete f-structure).

Even though the method is encoded in the form of an annotation algorithm (i.e. a procedure), we did not want to completely hardwire the linguistic basis for the annotation into the procedure. In order to support maintainability and reusability of the annotation algorithm and the linguistic information encoded in it, the algorithm is designed in terms of three main components that, to a first approximation,5 work in sequence:

  L/R Context APs  ⇒  Coordination APs  ⇒  Catch-All APs
4 This is not to claim that these approaches cannot be scaled to a complete treebank!
5 The Coordination component can invoke the L/R Context component again after coordinate daughters are identified, in order to annotate remaining shared complements and adjuncts occurring at the same level as the coordinate daughters in the CFG rule.
Figure 3. Simplified, partial annotation matrix for NP rules

            left context                             head               right context
  subcat    DT,CD: ↑spec=↓  ...                      NN,NNS,NP: ↑=↓     SBAR,VP: ↑relmod=↓;  NN,NNS,NP: ↑app=↓
  non-sub   ADJP: ↓∈↑adjn;  NN,NNS,NP: ↓∈↑adjn  ...                     PP: ↓∈↑adjn
L/R Context Annotation Principles are based on a tripartition of the daughters of each local tree (of depth one, i.e. CFG rules) into a prefix, head and suffix sequence. In order to establish local heads, we automatically transform the Penn-II trees into head-lexicalised trees by adapting the rules of (Magerman, 1994) and (Collins, 1999). For each LHS type (np, vp, s, etc.) in the Penn-II CFG rule types, we construct an annotation matrix. The matrix encodes information on how to annotate CFG node types (i.e. occurrences of categories) in the left (prefix) and right (suffix) context of rule RHSs with LFG f-structure equations. The matrices distinguish between subcategorisable and non-subcategorisable grammatical functions. Heads are always annotated ↑=↓. Figure 3 gives a partial and much simplified matrix for NP rules.

For each LHS category, the annotation matrices are populated by analysing the most frequent rule types, such that the token occurrences of these rule types cover at least 85% of the corpus. To give an example, this means that instead of looking at >6,000 different NP rule types in the Penn-II corpus, we only look at the 102 most frequent NP rule types to populate the NP annotation matrix. During application of the annotation algorithm (i.e. while traversing a treebank tree), however, this annotation matrix is applied to "unseen" NP rule instances (i.e. less frequently occurring NP rules that did not inform the construction of the NP matrix). Such rules, however, will also have e.g. DT constituents to the left of the head daughter, PPs to the right etc., and these constituents will get annotated accordingly. It is in this sense that the annotation matrices generalise to unseen rules (and it is in this sense that they capture linguistic generalisations). To keep L/R context annotation principles simple and perspicuous, they only apply if the local tree does not contain coordination.
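As an illustration of how such an annotation matrix is applied, here is a small Python sketch (our own simplification, not the actual Java implementation; the matrix content mirrors the abridged NP matrix in Figure 3, the head position is assumed to be supplied by the head-lexicalisation step, and the ASCII equation notation follows the NP:up-subj=down style used later in the chapter).

# Simplified sketch of L/R-context annotation for NP rules.
NP_MATRIX = {
    "left":  {"DT": "up-spec=down", "CD": "up-spec=down",
              "ADJP": "down-elem-up-adjn", "NN": "down-elem-up-adjn",
              "NNS": "down-elem-up-adjn", "NP": "down-elem-up-adjn"},
    "right": {"SBAR": "up-relmod=down", "VP": "up-relmod=down",
              "PP": "down-elem-up-adjn", "NN": "up-app=down",
              "NNS": "up-app=down", "NP": "up-app=down"},
}

def annotate_np_rhs(rhs, head_index):
    """Annotate each daughter of an NP rule with an f-structure equation.
    rhs: list of category labels; head_index: position of the head daughter."""
    annotated = []
    for i, cat in enumerate(rhs):
        if i == head_index:
            eq = "up=down"                         # heads are always up=down
        elif i < head_index:
            eq = NP_MATRIX["left"].get(cat, "")    # prefix (left context)
        else:
            eq = NP_MATRIX["right"].get(cat, "")   # suffix (right context)
        annotated.append((cat, eq))
    return annotated

# NP -> DT ADJP NN PP   (head = NN)
print(annotate_np_rhs(["DT", "ADJP", "NN", "PP"], head_index=2))
# [('DT', 'up-spec=down'), ('ADJP', 'down-elem-up-adjn'),
#  ('NN', 'up=down'), ('PP', 'down-elem-up-adjn')]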
Figure 4. Proto-f-structure annotation: coverage, precision and recall

  # f-str. fragments   # sentences   percent
  0                        221         0.46
  1                      48142        99.41
  2                         62         0.13

              All annotations   Preds-only
  Precision        0.93            0.90
  Recall           0.83            0.81
  F-Score          0.88            0.85
(Like and unlike) coordinate structures are treated by the second component of our annotation algorithm. Finally, the algorithm has a "catch-all and clean-up" component. For a more detailed exposition see (McCarthy, 2003).

Annotation coverage is measured in terms of f-structure fragmentation (the method is robust, and in case of missing annotations it may deliver unconnected f-structure fragments for a tree). Annotation accuracy is measured against a manually constructed gold standard with f-structures for 105 trees randomly selected from Section 23 of the Penn-II treebank.6 Preds-only f-structures (and corresponding annotations) only consider f-structure paths that end in pred:Lemma pairs; i.e. they disregard e.g. num:sg and pers:3rd attribute-value pairs.
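The precision and recall figures in Figures 4 and 6 are computed over flattened descriptions of f-structures; the sketch below is our own toy illustration of that general idea (it is not the (Crouch et al., 2002) software, and the triple encoding is an assumption): an f-structure is flattened into triples and a test f-structure is scored against a gold one.

def triples(f, path="f1"):
    """Flatten a nested-dict f-structure into a set of (node, attr, value) triples."""
    out = set()
    for attr, val in f.items():
        if isinstance(val, dict):
            sub = f"{path}.{attr}"
            out.add((path, attr, sub))
            out |= triples(val, sub)
        else:
            out.add((path, attr, str(val)))
    return out

def prf(test, gold):
    """Precision, recall and f-score of test against gold, over triples."""
    hits = len(triples(test) & triples(gold))
    p = hits / len(triples(test))
    r = hits / len(triples(gold))
    return p, r, 2 * p * r / (p + r)

gold = {"pred": "sign", "subj": {"pred": "U.N.", "num": "sg"},
        "obj": {"pred": "treaty", "num": "sg"}}
test = {"pred": "sign", "subj": {"pred": "U.N.", "num": "sg"},
        "obj": {"pred": "treaty"}}
print(prf(test, gold))   # (1.0, 0.857..., 0.923...)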
4. From “Proto”- towards “Proper”-F-Structures
Figure 4 shows that our proto-f-structure annotation is complete7 and of considerably high quality. Unfortunately, proto-f-structures are not yet sufficient for the compilation of high-quality logical forms representing predicate-argument-modifier structure in the form of, say, QLFs.
6 Our gold-standard f-structures are available for inspection at http://www.computing.dcu.ie/~away/Treebank/treebank.html. Readers will notice that the figures reported here and in Figure 6 are lower than those reported in (Cahill et al., 2003a). This is due to the fact that (i) unlike in (Cahill et al., 2003a), the current gold-standard f-structures used here involve full copies of reentrant, coindexed f-structure components (rather than simple coindexation index features as before); (ii) we have updated the f-structure annotation algorithm; and (iii) we now use the (Crouch et al., 2002) precision and recall software and f-structure to triple translation.
7 The remaining 221 sentences receiving no f-structure are due to inconsistent annotations causing attribute-value structure clashes in the constraint solver.
Figure 5. Treebank Tree with Traces and Proto- vs. Proper F-Structure

Treebank tree (with trace coindexation) for U.N. signs treaty, Anan said:

  (S (S-TPC-*1* (NP U.N.) (VP (V signs) (NP treaty)))
     (NP Anan)
     (VP (V said) (S T-*1*)))

proto-f-structure:

  [ TOPIC [ SUBJ [ PRED 'U.N.' ]
            PRED 'sign'
            OBJ  [ PRED 'treaty' ] ]
    SUBJ  [ PRED 'Anan' ]
    PRED  'say' ]

proper f-structure:

  [ TOPIC [1][ SUBJ [ PRED 'U.N.' ]
               PRED 'sign'
               OBJ  [ PRED 'treaty' ] ]
    SUBJ  [ PRED 'Anan' ]
    PRED  'say'
    COMP  [1] ]
The reason is that proto-f-structures do not encode passive, nor do they attempt to resolve any non-local (long-distance) dependencies (LDDs). The automatic proto-f-structure annotation algorithm presented in Section 3 interprets linguistic material purely locally, where it occurs in the tree. In the Penn-II treebank trees, "displacement" phenomena and passives are encoded in terms of traces and coindexed empty nodes. Such traces and empty nodes, however, are ignored by the automatic proto-f-structure annotation algorithm outlined in Section 3. This is illustrated in Figure 5: unlike the proper f-structure, the proto-f-structure does not establish the link between the value of the TOPIC attribute and the COMP argument of the matrix PRED(icate) say.

The Penn-II trees mark LDDs (and partly passive constructions) in terms of a typology of traces relating different kinds of "moved" material and movement types to positions, marked by a coindexed empty node, where the moved material should be interpreted semantically. In our research to date we have concentrated on traces for A and A' movement (movement to argument and non-argument positions), including traces for wh-questions, relative clauses, fronted elements (topicalisation) and
subjects of participial clauses, gerunds and infinitival clauses (including both controlled and arbitrary PRO). The treatment of traces (as well as passive annotation) is implemented in a new, fourth component of our annotation algorithm that translates traces in Penn-II trees into corresponding coindexation (and re-entrancies, as in Figure 5 above) in the f-structure representations:

  L/R APs  ⇒  Coord APs  ⇒  Catch-All APs  ⇒  Trace APs
Null constituents are treated as full nodes in the annotation (except for passive empty objects), and traces ∗i∗ in treebank trees are recorded in terms of INDEX = i f-structure annotations. Traces without indices are translated into arbitrary PRO.

The representation of passive is particularly important as LFG "surface-syntactic" grammatical functions such as SUBJ and OBJ differ from "logical" grammatical functions: surface-syntactic grammatical functions are identified in terms of e.g. agreement phenomena, while logical grammatical functions are more akin to thematic roles. The surface-syntactic subject of a passive sentence is usually a logical object, while the surface grammatical object of an optional by-prepositional phrase in a passive construction is usually the logical subject. This is exemplified in the "argument switch" between the proper f-structure and the QLF for the string An agreement was brokered by the U.N.:

  [ SUBJ    [ PRED 'Agreement', NUM sg, SPEC A ]
    PRED    'Broker⟨↑ SUBJ, ↑ OBJby⟩'
    PASSIVE +
    OBLby   [ PRED 'By⟨↑ OBJ⟩', OBJ [ PRED 'U.N.', NUM sg, SPEC THE ] ] ]

  ?Scope:broker(term(+r,,U.N.,?Q,?S), term(+g,,agreement,?P,?R))
In order to capture these "argument switches", we extend our automatic f-structure annotation procedure with a (PASSIVE = +) annotation in relative, matrix and subordinate clauses, triggered by a variety of contexts encoded in the Penn-II treebank.
Figure 6. Proper-f-structure annotation: fragmentation, precision and recall

  # f-str. fragments   # sentences
  0                        226
  1                      48141
  2                         58

              All annotations   Preds-only
  Precision        0.93            0.90
  Recall           0.95            0.91
  F-Score          0.94            0.91
These contexts include sequences of forms of to be or to get followed by past participles and by empty NPs in object position coindexed with the surface subject NP position, or the presence of an -LGS (logical subject) Penn-II functional tag in by-prepositional phrases. A treebank example involving the interaction of passive, relative clause traces and arbitrary PRO is provided in the Appendix.

Proper f-structure annotation precision and recall results, given in Figure 6, improve on the best proto-f-structure annotation results in Figure 4. For preds-only, recall goes up from 0.81 to 0.91 and f-score from 0.85 to 0.91; if all annotations are considered, f-score goes up from 0.88 to 0.94. This indicates that the extended annotation algorithm can reliably determine traces for wh-questions, relative clauses, fronted elements and subjects of participial clauses, gerunds and infinitival clauses, as well as passives, and reflect them accurately in terms of indices (and corresponding re-entrancies) in the resulting f-structure representations. Fragmentation, however, goes up slightly: currently 226 sentences do not get an annotation due to inconsistent annotations (against 221 for the proto-f-structure annotation).
5. From F-Structures to QLFs: II
In our work we employ a modified and much scaled-down subset of the full QLF formalism presented in (Alshawi & Crouch, 1992). We distinguish between terms and form(ula)s. Every predicate is associated with a distinguished referential argument: an eventuality8 variable for verbal predicates and entity variables for other predicates. For nominal structures, entity variables follow the predicate separated by a colon, while they precede all other predicates. Where possible, we distinguish between quantified and non-quantified nominal structures.
8 Our translation does not distinguish between event and state variables.
Quantified terms are of the form q(Quant,Ty:Var,Restr), where Quant is the surface quantifier/determiner, Var is the referential argument, Ty is sg, pl or ud (undefined) and Restr is of the form Pred:Var (in simple cases). Non-quantified terms (e.g. proper names, bare plurals etc.) are of the form q(Ty:Var,Pred:Var). Verbal predications are of the form Var:Pred(Arg1,...,Argn), where the Argi are terms or forms. Formulas are unscoped, i.e. no attempt is made to instantiate the quantifier scope prefix constraints ?Scope provided by the full QLF formalism. Modification (adjectival, adverbial, prepositional phrase, relative clauses etc.) is treated via relational md(x,y) (modifier) or eq(x,y) (equality) predicates linking modifiers with referential arguments. We do not currently use the higher-order lambda abstraction nor any of the contextual resolution facilities provided by the full QLF formalism.

The f-structure to QLF translation algorithm is based on (van Genabith and Crouch, 1996) and extended to include passive constructions, wh-questions, relative clauses, fronted material and subjects of participial clauses, gerunds and infinitival clauses, modification (adjectival, adverbial, prepositional, appositional, sentential and non-sentential adjuncts as well as relative clauses) and coordinate/subordinate constructions. For reasons of space, we can only discuss a single, simple aspect of the recursive translation algorithm here: in the case of a passive f-structure component, the algorithm first tries to determine whether it can find a logical subject, i.e. a sub-f-structure marked LGS = +. If this is the case, the translation of this sub-f-structure is used as the logical subject in the QLF translation of the passive construction, while the QLF translation of the surface SUBJ(ect) f-structure is used as the logical object. This, and the interaction with relative clause traces, is illustrated in the example provided in the Appendix. If no logical subject is marked in the f-structure, the algorithm inserts a non-specific q(a,ud:x,'UNDEF':x) logical subject term.

As a simple example, consider the (subject) NP The plant, which is owned by Hollingsworth & Vose Co., (the full sentence, tree, f-structure and QLF are given in the Appendix). This string is associated with the following f-structure by the automatic f-structure annotation algorithm:

relmod :
  topicrel :
    index : 1
    pred : which
  subj :
    index : 1
    pred : which
  passive : +
  xcomp :
    subj :
      index : 1
      pred : which
    passive : +
    tense : past
    pred : own
    adjunct :
      1 :
        obj :
          conj :
            2 :
              num : sing
              pers : 3
              pred : hollingsworth
            3 :
              pred : vose
              num : sing
              pers : 3
            4 :
              pred : 'co.'
              num : sing
              pers : 3
          pred : &
        lgs : +
        pform : by
        pred : by
  pred : be
  tense : pres
spec :
  det :
    pred : the
pers : 3
pred : plant
num : sing
and the f-structure is translated into the following QLF term:

q(the,sing:1,(plant:1) & 2:be(5:own(8:and(q(sing:10,hollingsworth:10), q(sing:11,vose:11), q(sing:12,co.:12)), q(ud:4,1:4))))
where the optional by-phrase in the f-structure translates into the logical subject of own in the QLF, and where the f-structure re-entrancy between the TOPIC-REL and the surface grammatical SUBJ(ect)s of be and own translates into the logical object term q(ud:4,1:4), which carries the same index (1) as the NP head noun plant.

Currently, 46371 of the 47783 f-structures (97.04%) generated are associated with a QLF by the f-structure–QLF translation algorithm. This means that 95.76% (46371) of the 48424 trees in the Penn treebank receive a QLF.
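The passive step just described can be pictured with the following Python sketch (our own toy rendering, not the actual translation code; fstr_to_term is a placeholder for the full recursive term translation and the example f-structure is abridged).

# Sketch of the passive "argument switch" in the f-structure -> QLF
# translation: if a sub-f-structure marked lgs : + is present, it supplies
# the logical subject and the surface SUBJ supplies the logical object.

def fstr_to_term(f):
    """Toy term translation: q(Ty:Var,Pred:Var) style, index as variable."""
    var = f.get("index", "x")
    return f"q({f.get('num', 'ud')}:{var},{f['pred']}:{var})"

def find_logical_subject(f):
    """Return the object of an adjunct/oblique marked lgs : +, if any."""
    for adj in f.get("adjunct", {}).values():
        if adj.get("lgs") == "+":
            return adj["obj"]
    return None

def translate_passive(f, event_var="e"):
    lgs = find_logical_subject(f)
    log_subj = fstr_to_term(lgs) if lgs else "q(a,ud:x,'UNDEF':x)"
    log_obj = fstr_to_term(f["subj"])          # surface subject -> logical object
    return f"{event_var}:{f['pred']}({log_subj}, {log_obj})"

# "An agreement was brokered by the U.N." (abridged f-structure)
broker_fs = {
    "pred": "broker", "passive": "+",
    "subj": {"pred": "agreement", "num": "sing", "index": 4},
    "adjunct": {"1": {"lgs": "+", "pform": "by",
                      "obj": {"pred": "u.n.", "num": "sing", "index": 10}}},
}
print(translate_passive(broker_fs))
# e:broker(q(sing:10,u.n.:10), q(sing:4,agreement:4))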
6. A Tree De-Transformation Approach
It is interesting to compare our work with that of Liakata and Pulman (2002). Liakata and Pulman translate Penn-II treebank trees into flat, set-based descriptions much like (Frank, 2000) and then match these descriptions with templates to extract logical forms. In order to cope with passive and non-local ("moved") material, they pre-process the (flat representations of the) treebank trees to undo the effects of movement: fronted (topicalised) material is moved back to the location of its coindexed empty node in the tree, passives are transformed into essentially their active counterparts, etc. These "de-transformed" trees, in which, crucially, all material is now located where it should be interpreted semantically, are then matched by a small number of logical form extraction templates. The considerable advantage of this approach is that the logical form extraction templates are much simpler than they would be for the original trees. Similarly, in our case the translation of trees into logical forms is simplified by the intermediate level of f-structure representation, where non-local dependencies are reflected in terms of f-structure re-entrancies.
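To illustrate the de-transformation idea, the sketch below is our own reading of the approach under simplifying assumptions (a nested-list tree encoding and a single trace type; the real pre-processing operates on flat tree descriptions and covers many more constructions): a fronted constituent is moved back to the position of its coindexed trace.

# Toy sketch of tree "de-transformation": a topicalised constituent
# S-TPC-*1* is moved back to the position of its coindexed trace T-*1*,
# so that all material sits where it is interpreted semantically.
# Trees are nested lists: [label, child1, child2, ...].

def detransform(tree):
    """Return a copy of tree with fronted constituents lowered into
    the positions of their coindexed traces."""
    fronted = {}                                   # index -> fronted subtree

    def collect(node):
        if not isinstance(node, list):
            return node
        kept = [node[0]]
        for child in node[1:]:
            if isinstance(child, list) and "-TPC-" in child[0]:
                idx = child[0].split("-TPC-")[1]   # e.g. "*1*"
                fronted[idx] = ["S"] + [collect(c) for c in child[1:]]
            else:
                kept.append(collect(child))
        return kept

    def substitute(node):
        if not isinstance(node, list):
            return node
        if node[0].startswith("T-") and node[0][2:] in fronted:
            return fronted[node[0][2:]]            # trace site gets the filler
        return [node[0]] + [substitute(c) for c in node[1:]]

    return substitute(collect(tree))

# "[U.N. signs treaty]_1, Anan said t_1"
tree = ["S",
        ["S-TPC-*1*", ["NP", "U.N."], ["VP", ["V", "signs"], ["NP", "treaty"]]],
        ["NP", "Anan"],
        ["VP", ["V", "said"], ["T-*1*"]]]
print(detransform(tree))
# ['S', ['NP', 'Anan'], ['VP', ['V', 'said'],
#        ['S', ['NP', 'U.N.'], ['VP', ['V', 'signs'], ['NP', 'treaty']]]]]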
7. Parsing into Proper F-Structures and QLFs
Unfortunately, the tree de-transformation approach developed by (Liakata and Pulman, 2002) is not available to us. The reason is the following: empty productions and coindexed null elements, which are required to de-transform trees, are not standard fare in probabilistic parsing (exceptions are Collins' (1999) Model 3 and Johnson's (2002) empty-node and trace reconstruction post-processing approach). Indeed, treebanks are usually preprocessed to eliminate empty productions before a PCFG is extracted (cf. (Charniak, 1996)). Similarly, standard LFG (Kaplan and Bresnan, 1982; Bresnan, 2001; Dalrymple, 2001) assumes surface-oriented c-structures, traditionally without empty productions, with c-structure nodes required to dominate lexical material (as in: what you see is what you get). What is more, we cannot use a pre-processed, de-transformed treebank as in (Liakata and Pulman, 2002) as a basis for extracting a PCFG, since the CFG rules extracted from such a de-transformed treebank do not reflect surface strings. Given this, it is not immediately clear how one can automatically obtain full Penn-II-style trees, complete with empty nodes and coindexation, for new text, which could then be de-transformed to generate good QLFs.
In addition to annotating treebank trees with (proto- and proper-) f-structures and QLFs, we have extracted PCFGs from proto-f-structure annotated and unannotated versions of the Penn-II treebank (Cahill et al., 2002c) to parse new text into proto-f-structures. In one approach we extract a simple (i.e. unannotated) PCFG (following (Charniak, 1996)) and annotate the resulting parse trees for new text with our automatic f-structure annotation algorithm to generate f-structures. In the other approach we extract an annotated PCFG (A-PCFG) in which (complex) categories consist of CFG categories paired with f-structure equations as generated by the automatic annotation algorithm (e.g. NP:up-subj=down, NP:up-obj=down, etc.). Note that the A-PCFG has more rules than the simple PCFG and that, compared to the rules in the simple PCFG, the rules in the A-PCFG are associated with different probabilities. Parsing new text with an A-PCFG yields trees with f-structure annotations from which f-structures can be generated directly.

These probabilistic LFG parsers are wide-coverage (they parse the Penn-II treebank) and robust. However, they only generate proto-f-structures for new text; that is, they do not resolve long-distance dependencies and, as such, the output of these parsers (the proto-f-structures) is not yet sufficient to construct high-quality QLFs for new text (cf. Sections 3 and 4 above).

In LFG, non-local dependencies and traces are resolved at f-structure in terms of functional uncertainty expressions (regular expressions over paths in f-structures), located where extraposed or dislocated material is actually found, without any need for corresponding traces and null elements in c-structure (Bresnan, 2001; Dalrymple, 2001). What we have shown in the present chapter is how to transfer traces from Penn-II treebank trees into corresponding reentrancies in automatically generated f-structures (Section 4). The question is: can we use this resource to automatically compute functional uncertainty equations which can be used with conventional PCFG parsing technology to resolve long-distance dependencies at f-structure, without the need for traces and coindexed empty nodes in trees?

The answer is yes: the coindexation (i.e. the reentrancies) in f-structures generated from the traces in the Penn-II treebank trees indicates source and target sites for dislocated material in f-structures. Given these f-structures for the Penn treebank trees, we can automatically compute shortest paths through f-structures linking source and target sites for long-distance dependencies. We collect these paths and compute associated probabilities.
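The path collection step can be pictured as follows (a toy sketch under our own encoding assumptions: reentrancies are represented as shared index values in nested dicts, and only a trivial breadth-first search over attribute paths is shown, whereas the actual work uses the full treebank and richer path representations).

from collections import Counter, deque

def shortest_ldd_path(fstr, source_attr="topic"):
    """Breadth-first search for the shortest attribute path from the top of
    fstr to a sub-f-structure sharing the source's index."""
    src_index = fstr[source_attr]["index"]
    queue = deque([(fstr, [])])
    while queue:
        node, path = queue.popleft()
        for attr, val in node.items():
            if attr == source_attr and not path:
                continue                            # don't re-find the source
            if isinstance(val, dict):
                if val.get("index") == src_index:
                    return path + [attr]
                queue.append((val, path + [attr]))
    return None

# "U.N. signs treaty, Anan said": TOPIC is coindexed with COMP
said_fs = {
    "pred": "say",
    "topic": {"index": 1, "pred": "sign"},
    "subj": {"pred": "Anan"},
    "comp": {"index": 1, "pred": "sign"},
}

paths = Counter()
paths[":".join(shortest_ldd_path(said_fs))] += 1   # repeated over a treebank
total = sum(paths.values())
print({p: n / total for p, n in paths.items()})    # {'comp': 1.0}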
As in standard LFG, these f-structure paths are then associated with extraposed material (e.g. in the values of the TOPIC, TOPIC-REL and FOCUS attributes for fronted-, relative clause- and wh-constructions respectively), and we can parse with standard PCFGs (without empty productions and without coindexation across c-structure tree nodes) and resolve non-local dependencies at f-structure. In order to do this, we also require LFG semantic forms (subcategorisation frames): extraposed material can be linked via an f-structure path to an argument position of a verb only if (i) the verb subcategorises for the argument, and (ii) the argument is not already present at the relevant level of the f-structure. LFG semantic forms (i.e. subcat frames and associated probabilities) can be extracted automatically from the f-structure annotated Penn-II treebank resource following the method first described in (van Genabith et al., 1999). Given an f-structure, multiple resolutions of long-distance dependencies are then ranked by multiplying path and semantic form probabilities. The extraction of semantic forms from the f-structure annotated Penn-II resource is detailed in (Cahill et al., 2003c); the resolution of long-distance dependencies in probabilistic LFG parsing is presented in (Cahill et al., 2003b).

In our current work we have started to integrate the f-structure to QLF translation with our probabilistic, proper f-structure LFG parsers to parse new text into proper f-structures and QLFs. We hope to report on this work in detail elsewhere.
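The ranking step can then be sketched as follows (our own illustration; the probabilities and frame inventories are invented for the example and would in practice be estimated from the f-structure-annotated treebank).

# Sketch of ranking candidate LDD resolutions: a fronted constituent may be
# linked to an argument position via an f-structure path only if the verb's
# semantic form subcategorises for that argument and the slot is still
# empty; candidates are scored by P(path) * P(semantic form).

PATH_PROBS = {("topic", "comp"): 0.6, ("topic", "obj"): 0.3,
              ("topic", "xcomp:obj"): 0.1}
FRAME_PROBS = {"say": {("subj", "comp"): 0.7, ("subj", "obj"): 0.3}}

def rank_resolutions(verb, filled_functions, source="topic"):
    candidates = []
    for frame, p_frame in FRAME_PROBS.get(verb, {}).items():
        for gf in frame:
            if gf in filled_functions:
                continue                       # argument already realised
            p_path = PATH_PROBS.get((source, gf), 0.0)
            if p_path > 0.0:
                candidates.append((gf, frame, p_path * p_frame))
    return sorted(candidates, key=lambda c: -c[2])

# "U.N. signs treaty, Anan said": SUBJ is filled, TOPIC must land somewhere.
print(rank_resolutions("say", filled_functions={"subj"}))
# [('comp', ('subj', 'comp'), 0.42), ('obj', ('subj', 'obj'), 0.09)]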
8. Conclusion
We have presented a methodology for associating Penn-II treebank trees with simple QLFs by combining and extending the work of (van Genabith and Crouch, 1996) and (Cahill et al., 2002a; Cahill et al., 2002b). Currently this method associates 95.76% of the 48424 sentences in the treebank with a QLF. We are currently annotating our 105 test sentences from Section 23 with gold-standard QLF information to evaluate the results of our automatic f-structure to QLF translation. We are refining and extending the coverage of the translation. We are working on integrating the QLF translation into our probabilistic, proper f-structure LFG parsers (Cahill et al., 2003b) to generate QLFs for new text. Finally, it would be interesting to compare our QLFs with the ones generated by (Liakata and Pulman, 2002).
The f-structures and QLFs for the first 1000 sentences of the Penn-II treebank are available for inspection at http://www.computing.dcu.ie/~josef/ {1000 sent f-str.html, 1000 sent qlf.html}.
References

H. Alshawi (ed.) (1992) The Core Language Engine, MIT Press, Cambridge Mass.
H. Alshawi and R. Crouch. (1992) Monotonic Semantic Interpretation, In Proceedings 30th Annual Meeting of the Association for Computational Linguistics, pages 32–38.
J. Bresnan. (2001) Lexical-Functional Syntax. Blackwell, Oxford.
A. Cahill, M. McCarthy, J. van Genabith and A. Way. (2002) Automatic Annotation of the Penn Treebank with LFG F-Structure Information. In Proceedings of the LREC Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Language Data, Las Palmas, Canary Islands, Spain, pp. 8–15.
A. Cahill, M. McCarthy, J. van Genabith and A. Way. (2002) Evaluating Automatic F-Structure Annotation for the Penn-II Treebank. In Proceedings of the Treebanks and Linguistic Theories (TLT'02) Workshop, Sozopol, Bulgaria, Sept. 19th–20th, 2002.
A. Cahill, M. McCarthy, J. van Genabith and A. Way. (2002) Parsing with PCFGs and Automatic F-Structure Annotation. In: The Sixth International Conference on Lexical-Functional Grammar, Athens, Greece, 3 July–5 July 2002, http://cslipublications.stanford.edu/.
A. Cahill, M. McCarthy, J. van Genabith and A. Way. (2003) Quasi-Logical Forms for the Penn Treebank. In: (eds.) Harry Bunt, Ielka van der Sluis and Roser Morante, Proceedings of the Fifth International Workshop on Computational Semantics, IWCS-05, January 15–17, 2003, Tilburg, The Netherlands, pp. 55–71.
A. Cahill, M. McCarthy, R. O'Donovan, J. van Genabith and A. Way. (2003) Lexicalisation of Long Distance Dependencies in a Treebank-Based Statistical LFG Grammar. In: The Seventh International Conference on Lexical-Functional Grammar, Saratoga Springs, New York, U.S.A., 16 July–18 July, 2003, to appear.
A. Cahill, M. McCarthy, R. O'Donovan, J. van Genabith and A. Way. (2003) Extracting Large-Scale Lexical Resources for LFG from the Penn-II Treebank. In: The Seventh International Conference on Lexical-Functional Grammar, Saratoga Springs, New York, U.S.A., 16 July–18 July, 2003, to appear.
E. Charniak. (1996) Tree-bank Grammars, Technical Report CS-96-02, Brown University, Providence, Rhode Island.
M. Collins. (1999) Head-driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia.
R. Crouch, R. Kaplan, T. King and S. Riezler. (2002) A comparison of evaluation metrics for a broad coverage parser. Beyond PARSEVAL workshop at 3rd Int. Conference on Language Resources and Evaluation (LREC'02), Las Palmas.
M. Dalrymple. (2001) Lexical-Functional Grammar. San Diego, Calif.; London: Academic Press.
A. Frank. (2000) Automatic F-Structure Annotation of Treebank Trees. In: (eds.) M. Butt and T. H. King, The Fifth International Conference on Lexical-Functional Grammar, The University of California at Berkeley, 19 July–20 July 2000, CSLI Publications, Stanford, CA. http://cslipublications.stanford.edu/.
A. Frank, L. Sadler, J. van Genabith and A. Way. (2003) From Treebank Resources to LFG F-Structures. In: (ed.) Anne Abeille, Treebanks: Building and Using Syntactically Annotated Corpora, Kluwer Academic Publishers, Dordrecht/Boston/London, in press.
M. Johnson. (2002) A simple pattern-matching algorithm for recovering empty nodes and their antecedents. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, University of Pennsylvania, July 8–10.
R. Kaplan and J. Bresnan. (1982) Lexical-functional grammar: a formal system for grammatical representation. In Bresnan, J., editor 1982, The Mental Representation of Grammatical Relations. MIT Press, Cambridge Mass. 173–281.
S. Lappin, I. Golan and M. Rimon. (1989) Computing Grammatical Functions from Configurational Parse Trees. Technical Report 88.268, IBM Israel, Haifa, Israel.
M. Liakata and S. Pulman. (2002) From trees to predicate-argument structures. COLING'02, Proceedings of the Conference, Taipei, 24 August – 1 September 2002.
D. Magerman. (1994) Natural Language Parsing as Statistical Pattern Recognition. Ph.D. Thesis, Stanford University, CA.
M. Marcus, G. Kim, M. A. Marcinkiewicz, R. MacIntyre, M. Ferguson, K. Katz and B. Schasberger. (1994) The Penn Treebank: Annotating Predicate Argument Structure. In: Proceedings of the ARPA Human Language Technology Workshop.
M. McCarthy. (2003) Linguistic Basis for Automatic F-Structure Annotation of the Penn-II Treebank. M.Sc. dissertation, School of Computer Applications, Dublin City University, Dublin 9, Ireland.
L. Sadler, J. van Genabith and A. Way. (2000) Automatic F-Structure Annotation from the AP Treebank. In: (eds.) M. Butt and T. H. King, The Fifth International Conference on Lexical-Functional Grammar, The University
52
CAHILL ET AL.
of California at Berkeley, 19 July–20 July 2000, CSLI Publications, Stanford, CA. http://cslipublications.stanford.edu/. J. van Genabith and D. Crouch. (1996) Direct and Underspecified Interpretations of LFG f-Structures. In: COLING 96, Copenhagen, Denmark, Proceedings of the Conference. 262–267. J. van Genabith, L. Sadler, and A. Way. (1999) Data-Driven Compilation of LFG Semantic Forms. In: EACL-99 Workshop on Linguistically Interpreted Corpora, Bergen, Norway, 69–76.
Appendix

The plant, which is owned by Hollingsworth & Vose Co., was under contract with Lorillard to make the cigarette filters.

F-structure:
  subj : relmod : topicrel : index : 1  pred : which
  subj : index : 1  pred : which  passive : +
  xcomp : subj : index : 1  pred : which  passive : +  tense : past  pred : own
  adjunct : 1 : obj : conj : 2 : num : sing  pers : 3  pred : hollingsworth
  3 : pred : vose  num : sing  pers : 3
  4 : pred : ’co.’  num : sing  pers : 3
  pred : &  lgs : +  pform : by  pred : by
  pred : be  tense : pres
  spec : det : pred : the
  pers : 3  pred : plant  num : sing
  pred : be  tense : past
  adjunct : 5 : obj : xcomp : subj : pred : ’PRO1’  inf : +  to : +
  obj : spec : det : pred : the
  adjunct : 7 : pred : cigarette  num : sing
  pers : 3  pred : filter  num : pl
  pers : 3  pred : make
  subj : pred : ’PRO1’
  pers : 3  num : sing  pred : contract
  adjunct : 6 : obj : pers : 3  num : sing  pred : lorillard  pform : with  pred : with
  pform : under  pred : under

QLF:
  0:be(q(the,sing:1,(plant:1)
        & 2:be(5:own(8:and(q(sing:10,hollingsworth:10),
                           q(sing:11,vose:11),
                           q(sing:12,co.:12)),
                     q(ud:4,1:4)))))
  & (md(0,16)
     & 16:under(q(sing:17,(contract(q(ud:19,PRO1:19),
                                    18:make(q(ud:19,PRO1:19),
                                            q(the,pl:20,(filter:20)
                                              & md(20,24)
                                              & q(sing:24,cigarette:24)))):17)
                           & (md(17,26)
                              & 26:with(q(sing:27,lorillard:27))))))
HARRY BUNT
SEMANTIC UNDERSPECIFICATION: WHICH TECHNIQUE FOR WHAT PURPOSE?
1. Introduction
In recent years a variety of representation formalisms have been proposed that support the construction of underspecified semantic representations, such as Quasi-Logical Form, Underspecified Logical Form, Underspecified Discourse Representation Theory, Minimal Recursion Semantics, Ontological Promiscuity, Hole Semantics, the Constraint Language for Lambda Structures, and Normal Dominance Constraints. These formalisms support methods of underspecification which sometimes seem very different but in fact have similar underlying concepts, and in other cases appear deceptively similar, using the same terminology but with different interpretations. UDRT and Normal Dominance Constraints, for example, at first blush seem quite different but upon closer inspection have much in common; on the other hand, the term ‘metavariable’ is used by different authors to refer to different concepts in different underspecification formalisms.

Recent studies have produced interesting results about the relative expressive capabilities of some of these formalisms. Koller (2004) has for instance shown that under certain conditions the underspecified representations of Hole Semantics can be translated into normal dominance constraints, and vice versa. Ebert (2005) has shown that the most prominent underspecification formalisms for the representation of scopal phenomena all suffer from lack of expressive power.

This paper aims at contributing to the understanding of the merits of the various underspecification formalisms by considering what different underspecification techniques have to offer for dealing with a range of phenomena that motivate the use of underspecified semantic representations.
2. Why underspecified semantic representation
The use of underspecified semantic representations is motivated primarily by the massive ambiguity that is found in natural language, in particular as revealed through attempts to build effective language understanding systems. Computer implementation of a Montague-style semantics, using a set of construction rules for building formal meaning representations compositionally from lexical meanings, has turned out not to be feasible due to the astronomical number of alternative representations that would have to be built for an ordinary sentence. Hobbs and Shieber (1987) have shown that the ambiguity of relative quantifier scopes means that a sentence with n noun phrases can have up to n! readings, although syntactic constraints tend to reduce this number (giving a sentence with 5 NPs typically between 30 and 40 readings). The pervasive ambiguity of words is another major cause of the ambiguity explosion in natural language analysis. Bunt and Muskens (1999) estimate that, due to lexical ambiguity and quantifier scope ambiguity alone, an average-length Dutch sentence (the average length of a written Dutch sentence is approximately 12 words) has some 2,000,000 possible readings. Add to this (or rather, multiply this by) the ambiguities due to collective/individual distinctions, specific/nonspecific readings, count/mass ambiguities, PP-attachment possibilities, extensional versus intensional readings,... and it is obvious that the construction of representations for all the possible readings of a given sentence is a computationally extremely expensive task. Moreover, in a given context nearly all of the possible readings have to be discarded. Constructing representations for all possible readings and subsequently discarding nearly all of them is arguably the most inefficient way of organising the interpretation process. Clearly, the construction and use of compact underspecified semantic representations, which correspond to sets of fully disambiguated readings, may allow a much more efficient way of processing.

While ambiguity makes the construction of representations of all the possible interpretations of a sentence exceedingly expensive, the phenomenon of vagueness presents an even greater, more fundamental problem. In the case of an ambiguity, the number of alternative readings can be large, but is finite. Vagueness is worse, in particular that form of vagueness or ‘imprecision’ that is caused by the infinite range of
possibilities that a speaker has when choosing a certain granularity for his referring expressions. Take for instance the adjective green. In itself, green is not ambiguous between, for example, light green and dark green, but when your house is being painted using a light and a dark shade of green, then an instruction like That window frame should be painted green is ambiguous, and when you enter a paint shop and ask for green paint, the shopkeeper will want you to be even more precise; such a context requires a finer granularity. Since the required precision of a referring expression depends entirely on the context, there is no a priori way to know how many interpretations of a given sentence should be distinguished; there is in general not even a finite number of interpretations. This means that we do not have the choice of constructing the semantic representations of ‘all possible interpretations’; the only alternative seems to be to construct representations down to a certain level of granularity, and to allow more specific interpretations to be inferred in a given context. Another form of vagueness is the relational vagueness that is found in nominal complexes, where the semantic relation between the constituents is left implicit, as in Apple computers, Apple employees, Apple logo, university computers, university offices, office teachers, and so on. This form of vagueness is not due to a certain coarseness in granularity (although the implicit internominal relations may be inferred with varying precision), but it is a case of apparently infinite ambiguity, since there appears to be no limit to the relations that can be intended to connect the constituents. For handling this phenomenon as well as for that of granularity-related vagueness, there seems to be no alternative to the employment of underspecified semantic representations.

Apart from the efficiency and feasibility considerations of dealing with ambiguity and vagueness, there are other processing considerations that favour the construction of underspecified semantic representations. One consideration, for a language understanding system, is the occurrence of an unknown word. Often, when an unknown word occurs, the context makes it possible to guess what the word means approximately, as in Yesterday I ran into a great green lizard when I crossed the ??tarket?? street; it kept running ahead of me on the pavement until it disappeared under a car, and it will depend on the context whether it is necessary to know precisely what the word means. The same goes for unknown proper names. A similar situation arises when a word cannot be understood (or read) well for some reason. Clearly, in such situations it is not the case that we have an utterance that cannot be interpreted at all; rather, we construct an interpretation
with a ‘hole’ for the unknown word; in other words, we construct an underspecified interpretation for the utterance as a whole. For computer systems with spoken input, this situation is quite common. An independent motivation for assigning underspecified semantic representations to sentences comes from machine translation, since ambiguities in a sentence in the source language are often retained in the target language. Finally, psycholinguistic considerations also provide arguments in favour of underspecified semantic representations. When listening to a sentence, people clearly do not wait to construct an interpretation until the sentence is complete; instead, they interpret incrementally. This means that underspecified semantic representations are constructed from incomplete input. With a lot of luck, at the end of the sentence the interpreter may be able to construct a fully-specified representation from this, but it is more probable that human listeners use context information to construct a representation which is not fully specified, but underspecified to a degree that is acceptable in the given context.

The use of underspecified semantic representations is thus motivated by a range of rather different phenomena, including the following, and summarized in Table 1:

Lexical ambiguity: the referential ambiguity of content words; the count-mass ambiguity of nouns; the possible resolutions of anaphoric and deictic expressions; in languages that form compounds of nouns and adjectives by concatenation, also the internal relational ambiguity of compound words.

Syntactic ambiguity: the ambiguities resulting from alternative possible parsings, such as the possible attachments of PPs and relative clauses;

Structural semantic ambiguity: ambiguities that do not have a lexical or syntactic basis, such as the scoping of quantifiers and modifiers; also the collective/distributive ambiguity of quantifiers; for English also the ambiguity of noun-noun complexes;

Semantic imprecision: vagueness, due to relatively coarse granularity in reference; also, the apparently infinite ambiguity of implicit semantic relations;

Missing information: the absence of information due to speech recognition problems, unknown words, or interrupted speech; the requirements of incremental processing; also the use of constructions such as ellipsis and short answers.
Table 1. A taxonomy of motivations of semantic underspecification

General phenomenon              Instance
Lexical ambiguity               homonymy; polysemy
                                anaphora and deixis
                                count/mass use of nominals
                                metonymy
                                compound nouns
Syntactic ambiguity             PP-attachment
                                relative clause attachment
                                scope of adjectives and adverbs
                                thematic/semantic role assignment
Structural semantic ambiguity   quantifier scope
                                quantifier distributivity
                                modifier distributivity
                                noun-noun complexes
Semantic imprecision            varying granularity of reference
                                relational vagueness
Missing information             unknown words
                                incomplete input
                                ellipsis, short answers
                                incremental processing
In the following sections we will discuss the applicability of a variety of underspecification techniques to various important forms of ambiguity, vagueness, and missing information. These findings are summarized at the end of the chapter in Table 2.
3. Semantic Underspecification
3.1. Underspecified Semantic Representations

As expressions in a formal language, such as the language of first-order logic or that of constructive type theory, semantic representations can be described syntactically as formed by the recursive combination of subexpressions by means of logical constructions such as function application, conjunction, negation, and universal quantification. The semantic definitions of these constructions determine the logically correct patterns of reasoning in which these representations may be used, and usually take the form of specifying how the denotation of an expression, given the way it is constructed, can be computed from the denotations of its subexpressions. Since the atomic subexpressions, such as predicate terms and individual constants, have precisely specified denotations, it follows that every semantic representation also has a precise denotation.

Being the result of applying constructions to subexpressions, a semantic representation can be underspecified in two ways:
1. atomic subexpressions (constants and variables) may be ambiguous, i.e. do not have a single value specified as their denotation, but a range of possible values;
2. the way in which subexpressions are combined by means of constructions may not be fully specified.
(In theory, a third form of underspecification could be to allow ambiguous constructions. For example, one might allow a two-place predicate to combine with a set rather than an ordered pair of arguments, leaving the semantic role of each of the arguments underspecified. We are not aware of any proposal in this direction.)

A representation which is underspecified in one or both of these ways may be viewed as a representation of constraints on fully specified meaning representations, i.e. as a meta-representation describing the set of representations that satisfy the constraints. Such a representation can therefore be a compact representation of a set of readings of a natural language expression. (More on compactness below.)

Since the combination of subexpressions by means of constructions is in general not fully specified, an underspecified semantic representation (usr) is not a single expression, but a set of (sub-)expressions representing the meanings of parts of the sentence, possibly containing
ambiguous constants and variables, plus a possibly incomplete specification of how these subexpressions may be combined to form a complete semantic representation. So a usr for a given utterance U is a pair:

(18) RU = <EU, CU>

where EU is a set of expressions and CU is a set of constraints on the admissible ways of combining the subexpressions in EU. (One could also imagine constraints on the possible interpretations of ambiguous constants and variables, but such constraints are in practice determined by the interpretation framework in which the usr is used (see below, section 3.3), rather than expressed as part of the usr.)

It follows from definition (18) that a framework for expressing usrs requires not only a language LE for (subexpressions of a) semantic representation, but also a constraint language LC for specifying constraints on combining LE-expressions. Since the constraints expressed in LC refer to LE-expressions, or more precisely to occurrences of such expressions, the expressions in EU must carry identifiers that can be used in LC. Therefore LE is not just a language as we know it for fully specified semantic representation, but LE should additionally have such identifiers and (‘meta’)-variables ranging over these identifiers.

Note that the expressions in the EU-component of a usr may themselves be either single LE-expressions or underspecified representations. As formulated, definition (18) suggests that EU would consist of single LE-expressions; this is not really the case, but is in fact immaterial from the point of view of the representational structures that the definition allows. Consider for example a natural language expression S, consisting of two subexpressions S1 and S2, and suppose one would want to represent S as consisting of two usrs for these two parts, plus a set of constraints on how to combine them. This would mean that the underspecified representation of S is structured as:

(19) RS = <ES, CS> = <{usr1, usr2}, CS>
        = <{<E1, C1>, <E2, C2>}, CS>.

This last representational structure is equivalent to

(20) RS = <E1 ∪ E2, C1 ∪ C2 ∪ CS> = <ES, CS>

since (19) and (20) contain the same sets of subexpressions and the same sets of combination constraints; the only difference is that in (19)
the set of subexpressions has been structured into subsets, and the set of constraints has likewise been structured into subsets that apply to the subsets of subexpressions and to their combination.

3.2. Underspecification techniques

The techniques that have been proposed for underspecified semantic representation can be classified in five groups: (1) in situ representations; (2) use of ambiguous terms; (3) labels, holes, and dominance constraints; (4) flat, conjunctive expressions; (5) use of stores and lists. We briefly characterize each of these groups.

In situ representations
One of the oldest approaches to the representation of quantifier structures in an underspecified way is the use of operators in structurally similar positions to the corresponding determiners or quantifiers in the natural language expressions. So a sentence such as Every student read a book is represented as something like read(every student, a book). Various proposals to this effect have been put forward, such as Schubert and Pelletier's ‘conventional translations’ (Schubert & Pelletier, 1982), but also ‘situation schemata’ (Fenstad et al., 1987), Quasi-Logical Forms (Alshawi & van Eijck, 1987; Alshawi, 1992), and Underspecified Logical Forms (Geurts & Rentier, 1991; Kievit, 1998). In the most influential of these proposals, that of Quasi-Logical Form (QLF), predicates have arguments in the form of terms (‘quasi-determiners’) which include a list of features that capture quantificational information relating to the determiners that are represented. For example, Every student read a book would be represented as:

(21) read(qterm(<every, ...>, X, [student,X]), qterm(<a, ...>, Y, [book,Y]))

QLFs and other in situ representations were intended to be used in intermediate stages of semantic interpretation, and to be disambiguated (or ‘resolved’) in a later stage, which involves extracting the in situ terms from the usr in a certain order which determines the relative scoping of quantifiers in a fully resolved logical form.
The oldest proposal in this direction dates back to Woods (1978), who used underspecified representations in the LUNAR question-answering system.
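As a rough illustration of the in situ technique and of resolution by extraction, the following minimal Python sketch (an assumption made here for exposition, not the CLE implementation; the tuple encoding and the function name resolve are invented) represents Every student read a book with quasi-determiner terms left in argument position, and derives one fully scoped reading per extraction order.

```python
from itertools import permutations

# A QLF-like in situ representation of "Every student read a book":
# the quantified terms sit directly in the argument positions of the predicate.
qlf = ("read",
       ("qterm", "every", "X", ("student", "X")),
       ("qterm", "a",     "Y", ("book",    "Y")))

def resolve(term):
    """Extract the in situ qterms in every possible order; each extraction
    order yields one fully scoped logical form (the qterm extracted last
    ends up with the widest scope)."""
    pred, *args = term
    qterms = [a for a in args if a[0] == "qterm"]
    body = tuple([pred] + [a[2] if a[0] == "qterm" else a for a in args])
    readings = []
    for order in permutations(qterms):
        formula = body
        for _, det, var, restriction in order:
            formula = (det, var, restriction, formula)
        readings.append(formula)
    return readings

for reading in resolve(qlf):
    print(reading)
# e.g. ('a', 'Y', ('book', 'Y'),
#       ('every', 'X', ('student', 'X'), ('read', 'X', 'Y')))
```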
Ambiguous terms in a formal language
The pervasive ambiguity and vagueness of words have inspired the idea of using equally ambiguous and vague predicate constants and other terms in formal representations. This approach was pioneered by the designers of the Phliqa question answering system (see Bronnenberg et al., 1979), where a formal representation language with ambiguous constants was defined, using a model-theoretic semantics with supervaluations (Van Fraassen, 1966). In fact, all the nonlogical constants in this language are ambiguous, at least in principle, just as the content words of natural language. Once ambiguous terms have been allowed in a formal language, they can be useful also for other purposes, such as for the underspecified representation of the collective/distributive interpretations of quantifiers (Bunt, 1985) and for the compact representation of attachment ambiguities (Bunt, 1995). Such constants do not correspond to any natural language words, and have an ambiguity that is determined by semantic theory; for instance, the possible values of formal metaconstants for quantifier scoping and distributivity are provided by the theory of quantification that is used. Yet another use of ambiguous terms has been introduced in some approaches to underspecification, such as CLLS (see below), where so-called metavariables are used that are intended to be replaced by semantic expressions that are generated from the context. We therefore distinguish the following three cases in the use of ambiguous terms in usrs:

1. Predicate constants, function constants or individual constants that represent ambiguous or vague content words, such as homonymous, polysemous, or ‘vague’ nouns, verbs and adjectives. They may be ‘disambiguated’ by being replaced by more specific terms or expressions of the representation language. We will refer to these as referential metaconstants.

2. Constants that represent a formal semantic property or relation which is not expressed explicitly in natural language, such as the scoping relation between noun phrases, or the distributivity of a quantification. Their ‘disambiguation’ typically consists of structuring a semantic representation in a certain way. We will refer to such terms as formal metaconstants.

3. Terms which are intended to be replaced by a constant, a bound variable, or another subexpression that either occurs elsewhere in
the representation, or that is generated from the context. We will refer to such terms as metavariables.

Labels, holes, and dominance constraints
For underspecifying the way in which the EU-elements in a usr U = <EU, CU> are to be combined, a label may be associated with each element in EU, and the fact that a certain subexpression labelled L1 consists of two subexpressions joined by means of the construction κ, of which the first one is a subexpression labelled L2 and the second is unknown, may be represented as L1: κ(L2, h1), where h1 is a ‘hole’, i.e. a variable that ranges over the labels of the subexpressions in EU. The precise ways in which holes may be used have been described in terms of possible ‘pluggings’, operations for replacing hole variables by subexpressions. The approach of labelling subexpressions and using variables to refer to subexpressions in the specification of constraints in a metalanguage is clearly applicable to any given object language. Bos (1995) formalizes the use of labels and holes in propositional and (dynamic) predicate logic in a way that is easily extended to other object languages. In (Bos, 2002) he applies Hole Semantics to DRT. The use of labels to mark subexpressions and constraints on their possible combinations was originally invented in DRT, leading to UDRT (Reyle, 1993; 1996), specifically for the underspecified representation of quantifier scopes. Minimal Recursion Semantics (MRS) applies the idea of labelling subexpressions and expressing constraints on their possible combinations to a language of typed feature structures (attribute-value matrices), fitting in with the object language of HPSG (Copestake et al., 1995). Labels are called ‘handles’ (or ‘handels’) in MRS, and a peculiarity of MRS is that structure sharing between features is interpreted as conjunction, which gives MRS representations a relatively flat structure. An MRS representation is essentially a pair consisting of a list of labelled feature matrices, where the labels may occur as feature values, and a set of constraints on the labels. Although MRS representations look rather different from the first-order logic usrs of Hole Semantics (HS) and from UDRSs, the underlying ideas of all three approaches are clearly very similar. The studies by Koller et al. (see Koller, 2004) and Ebert (2005) make explicit the similarities and differences between the various approaches based on labels and constraints.
Bos (2002) also calls hole variables metavariables; see below.
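The following sketch (hypothetical Python code, not the formalism of Bos or of any other particular system) spells out the plugging idea for Every student read a book: labelled pieces with holes, dominance constraints relating holes to labels, and an enumeration of the admissible pluggings.

```python
from itertools import permutations

# Labelled pieces with holes for "Every student read a book".
# h0 is the top hole; h1 and h2 are the scope holes of the two quantifiers.
# (In this toy example, holes only occur at the top level of a piece.)
pieces = {
    "l1": ("every", "x", ("student", "x"), "h1"),
    "l2": ("a",     "y", ("book",    "y"), "h2"),
    "l3": ("read",  "x", "y"),
}
holes = ["h0", "h1", "h2"]
constraints = [("h0", "l1"), ("h0", "l2"), ("h1", "l3"), ("h2", "l3")]

def fill(term, plugging):
    """Recursively replace each hole by the piece it is plugged with."""
    if isinstance(term, tuple):
        return tuple(fill(t, plugging) for t in term)
    if term in plugging:                    # the term is a hole
        return fill(pieces[plugging[term]], plugging)
    return term                             # a constant or object variable

def dominates(hole, label, plugging):
    """Under this plugging, does the material below `hole` contain `label`?"""
    current = plugging[hole]
    if current == label:
        return True
    return any(dominates(h, label, plugging)
               for h in pieces[current] if isinstance(h, str) and h in plugging)

admissible = []
for assignment in permutations(pieces):     # one distinct label per hole
    plugging = dict(zip(holes, assignment))
    if all(dominates(h, l, plugging) for h, l in constraints):
        admissible.append(fill("h0", plugging))

for reading in admissible:
    print(reading)                          # two readings: every > a, and a > every
```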
‘Dominance Constraints’ refers to a general framework for the partial description of trees, which has been used in various parts of computational linguistics (see e.g. Rogers & Vijay-Shanker, 1994; Gardent & Webber, 1998; Muskens, 2001). For underspecified semantic representation, the use of dominance constraints relies on the fact that logical formulas can be represented as trees, labeled with construction names. A usr can therefore take the form of a partial description of a tree. For instance, the usr (22a) can be represented as the partially specified tree (22b), where a dotted line connecting two nodes indicates a dominance relation but leaves open that some material may come in between.

(22) a. < {X0: applic(f, X1), X2: applic(g, a)}, {X0 > X2, X1 > X2} >

     b.       X0: applic
              /         \
             f           X1
                          .
                          .
                     X2: applic
                      /       \
                     g         a
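To make the tree-description reading of (22) concrete, here is a small sketch (assumed code written only for this chapter's example, not any system's implementation) that checks one fully specified tree against the labelling and dominance constraints of (22a); since dominance is interpreted reflexively, the smallest solution simply identifies the node of X1 with that of X2.

```python
# A concrete tree that satisfies the partial description in (22):
# the dotted dominance edge is realised with no intervening material,
# i.e. X1 and X2 denote the same node.
tree = ("applic", "f", ("applic", "g", "a"))

# Node variables are mapped to tree addresses (paths of child positions).
assignment = {"X0": (), "X1": (2,), "X2": (2,)}
dominance = [("X0", "X2"), ("X1", "X2")]        # X0 > X2 and X1 > X2

def subtree(t, path):
    for i in path:
        t = t[i]
    return t

def dominates(p, q):
    """Reflexive dominance: p dominates q iff p's address is a prefix of q's."""
    return q[:len(p)] == p

# Labelling constraints of (22a).
assert subtree(tree, assignment["X0"])[:2] == ("applic", "f")   # X0: applic(f, X1)
assert subtree(tree, assignment["X2"]) == ("applic", "g", "a")  # X2: applic(g, a)
# Dominance constraints of (22a).
assert all(dominates(assignment[p], assignment[q]) for p, q in dominance)
print("the tree satisfies the constraints in (22a)")
```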
A dominance constraint representation is formally a usr as defined in (18) above, where the set of constraints is restricted to constraints of two kinds: those of the form X1 > X2, interpreted as indicating that the node X1 dominates the node X2, and those of the form X1 ≠ X2, expressing inequality of nodes. (See Koller et al., 2003 for more detailed formal definitions.) General dominance constraints have bad computational properties; therefore Koller et al. (2003) devised the restricted form of dominance constraints called Normal Dominance Constraints (NDC), which are computationally more tractable. Koller (2004) has shown that NDC has the same expressive power as Hole Semantics with certain plausible normality constraints, and claims that these constraints
as well as the normality restrictions of NDC are linguistically adequate in the sense that all underspecified semantic representations of natural language expressions satisfy these restrictions. Ebert (2005) provides evidence that this is not the case, however, and argues that both NDC and Hole Semantics are therefore expressively incomplete. The NDC framework has been constructed as a restriction of the more powerful Constraint Language for Lambda Structures (CLLS, Egg et al., 1998). CLLS is an expressive language of tree descriptions which combines dominance constraints with parallelism constraints for dealing with VP ellipsis and anaphoric binding constraints to represent intrasentential anaphora.

Glue Semantics was originally developed not as a formalism for semantic underspecification, but for defining the syntax-semantics interface in LFG (Dalrymple, 2001). Glue Semantics uses Linear Logic in order to deductively piece together the meanings of individual words and constituents in a sentence. Premises for such deductions are ‘meaning constructors’ obtained from the lexical entries of the words, showing how meanings assigned to various constituents can be combined to build meaning assignments for other constituents. This naturally leads to intermediate underspecified representations in the glue language. The most important insight of Glue Semantics is perhaps not so much its particular representation forms, but its strategy of using logical inference to construct semantic representations. Crouch et al. (2001) show that the glue language can be used to construct UDRSs in a computationally attractive way, and Pulman (2000) shows how the Glue Semantics strategy can be applied to (a cleaned up version of) the QLF formalism in order to infer fully resolved logical forms from QLFs using context information.

Flat conjunctive representations
The Davidsonian approach of reifying the states and events associated with verbs (Davidson, 1967) naturally leads to conjunctive semantic representations like (23) for a sentence like John saw Mary yesterday:

(23) ∃e: see(e) ∧ agent(e, john) ∧ theme(e, mary) ∧ time(e, yesterday)

Hobbs (1983) has proposed to apply reification not only to verbs but also to nouns, adjectives, adverbs, and prepositions, constructing ‘flat’ representations in the form of conjunctions in first-order logic. An interesting property of such representations is that underspecification takes the form of leaving out certain conjuncts. This means that an
‘underspecified’ representation is in no way different in form from a fully specified representation. For instance, a representation of Somebody read every article which leaves the relative scoping of the two NPs underspecified could be as follows:

(24) ∃e: read(e) ∧ somebody(x) ∧ agent(e,x) ∧ every(y) ∧ article(y) ∧ theme(e,y)

This approach entails a rather bewildering ontological picture, which has made Hobbs refer to it as ‘ontological promiscuity’. We will call this approach radical reification (RR). Another form of representation that is in a sense flat, and shares with RR-representations the property that usrs have the same form as fully specified semantic representations, is in terms of typed feature structures. Bunt (2005) has shown that representations of quantification which leave scope or distributivity (or both) underspecified can be cast in the form of feature structures. In this case, underspecification takes the form of leaving out those attributes in an attribute-value matrix that have no value specified. These representations are also conjunctive in nature, since the interpretation of an attribute-value structure is in terms of the logical and of its attribute-value pairs.

Stores and lists
A relatively old idea is to accompany the construction of semantic representations by a symbolic memory in which those components of a representation are temporarily stored whose position in the representation is not yet fully determined. Cooper storage was developed for doing this for quantifier scopes (Cooper, 1983). Keller (1998) has developed an improved version of the Cooper store techniques with nested stores, known as Keller stores. Some experimental language understanding systems use a similar technique, placing NP representations on a list from which they can be retrieved in an order that corresponds to their relative scopes. In ULF, a representation language that was used in the DenK system (Kievit, 1998), lists of variables are used as an alternative to metavariables constrained to be instantiated as one of the elements in the list. Also, lists of predicates are used to indicate that the predicates are to be combined somehow in order to form a complex predicate.
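The storage idea can be sketched as follows (an illustrative toy in Python under assumptions of our own, not Cooper's or Keller's actual definitions): NP meanings are kept in a store, and a nested entry, such as a company embedded in every representative of a company, only becomes retrievable once the embedding NP has been retrieved, so it is retrieved later and thereby receives wider scope.

```python
# Nested store for "Every representative of a company saw some samples".
# A store entry pairs a binding operator with the store of NPs embedded in it.
core = ("saw", "x", "z")
store = [
    (("every", "x", ("rep-of", "x", "y")),
     [(("a", "y", ("company", "y")), [])]),        # nested entry: "a company"
    (("some", "z", ("samples", "z")), []),
]

def retrievals(formula, store):
    """Enumerate all complete retrieval sequences; the operator retrieved
    last is wrapped outermost and therefore takes widest scope."""
    if not store:
        yield formula
        return
    for i, (operator, nested) in enumerate(store):
        det, var, restriction = operator
        remaining = store[:i] + store[i + 1:] + nested   # release nested NPs
        yield from retrievals((det, var, restriction, formula), remaining)

for reading in retrievals(core, store):
    print(reading)
# Three scopings are produced, and in each of them "a company" takes scope
# over "every representative", reflecting the retrieval-order constraint.
```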
3.3. Requirements on Representations, and Interpretation Frameworks The various techniques for underspecified semantic representations have often been developed for being used within a certain theoretical or computational framework. As already noted, MRS was developed for use within the framework of HPSG, and therefore employs feature structures that integrate well with the ’signs’ that are used in HPSG for representing linguistic information of all kinds; Glue Semantics was designed for developing the syntax-semantics interface in LFG. UDRT was a further development of DRT (Kamp & Reyle, 1993). Other underspecification techniques were developed for use within a certain processing architecture. This is for instance true of the QLFs developed in the CLE system, the ambiguous metaconstants of the Phliqa system, and the ULFs of the PLUS and DenK systems. Radical reification assumes an abduction-driven interpretation process; CLLS and Pulman’s renewed version of QLF (Pulman, 2000) assume an interpretation process with higher-order unification. We will use the generic term Interpretation Framework for indicating the theoretical or computational framework which a certain underspecification approach assumes. The distinguishing features of many underspecification techniques are due to their interpretation framework, which brings certain theoretical or computational requirements. There are also certain requirements that any underspecification technique should meet. Two particularly important requirements are those of compactness and expressive completeness. The requirement of compactness is, informally, that the use of a usr should have a real advantage over the use of the set of all fully specified representations. A usr which simply lists all the fully specified representations, for example, does not satisfy this requirement. Ebert (2005) has formalized the notion of compactness of usrs. The requirement of expressive completeness means that an underspecification approach should allow the representation not only of interpretations which are entirely unspecified w.r.t. a particular aspect, such as quantifier scoping, but also those interpretations which are partly specified, or, in other words, which are partly disambiguated. The need to represent these stems from the desire to represent in a usr what the syntactic and lexical information in a given sentence tell us semantically, no more and no less. This is obviously motivated from the wish to have a satisfactory syntax-semantics interface, but also from the observation that people sometimes express themselves
deliberately vaguely or ambiguously, and an adequate semantic representation framework should be able to deal with that. Koller (2004) has shown that several underspecification formalisms are equally expressive, if certain normality conditions are imposed on the representation structures. NDC already has such conditions in its definition, and Koller argues that HS and MRS could very well also have some such conditions imposed on their definitions, since from a linguistic point of view only ‘normalized’ MRS and HS representations are needed. On the other hand, Ebert (2005) provides evidence to the effect that HS as well as NDC and MRS are unable to represent certain linguistically relevant partial disambiguations, and are therefore expressively incomplete. Hole Semantics and Normal Dominance Constraints were designed in a framework-independent way. The same goes for Radical Reification and the use of flat typed feature structures. A rather obvious but far from trivial requirement on usrs is that they should be semantically well-defined. Van Deemter (1996) analyses usr proposals from this point of view, in particular in relation to their role in inference patterns; König & Reyle (1996) provide a logical basis for a broad range of underspecification formalisms. We will now consider the various underspecification techniques that have been introduced here for their suitability to handle the various phenomena that motivate the use of underspecified semantic representations.
4. Applicability of underspecification techniques
4.1. Lexical ambiguity

The efficient processing of sentences with homonymous words has motivated the oldest known use of underspecified semantic representation, which makes use of the technique of ambiguous constants in a formal representation language. In the Phliqa question answering system (see Medema et al., 1975; Bronnenberg et al., 1979) a representation language was used (a typed higher-order lambda calculus) with a referential metaconstant for each English content word, the idea being that there is hardly any content word that does not display some degree of homonymy or polysemy. So for instance, the adjective American corresponds to the metaconstant american, representing the various senses of the word American as illustrated by American car, American city, American flag, and American airplane. A domain-specific lexicon
lists the possible instantiations of each metaconstant, such as american having as three of its interpretations:

(25) a. american   λx. Nationality-of(Manufacturer-of(x)) = USA
     b. american   λx. Country-location-of(x) = USA
     c. american   λx. Nationality-of(Carrier-of(x)) = USA

Referential metaconstants seem the perfect instrument for underspecifying these forms of semantic indeterminacy, especially within an interpretation framework like that of the Phliqa system, where a domain-specific lexicon determines the contextually relevant senses of ambiguous lexical items.

A rather different kind of lexical ambiguity is the one between anaphoric and deictic use of pronouns and definite NPs (Did you see that?), and between their possible referents in the linguistic or situational context. The simplest way to underspecify the intended interpretation of an anaphoric or deictic expression would seem to be the use of a metavariable, with the syntactic/semantic constraints that the expression provides. For instance, for the English pronoun her as occurring in Did you see her the constraints include that the referent plays the role of the theme in a see event and is a female person or animal. This underspecification technique has been used in the DenK system (see Kievit et al., 2001).

Sentence pairs such as

(26) a. There’s no chicken in the yard
     b. There’s no chicken in the salad

illustrate the count/mass ambiguity that is found in many languages. In English, virtually every noun can be used both as a count noun and as a mass noun. (In a pamphlet in a hospital ward the following text was found: Nurses spend a great deal of their time washing patients, and since the population has become more than 10% taller in the last ten years, they have correspondingly more patient to wash. See Bunt (2006) for more about the count/mass distinction.) Treating every noun in the lexicon as ambiguous between a count and a mass reading is computationally unattractive and conceptually unsatisfactory, since there are systematic semantic relations between the count and mass uses of a word. Bunt (1985) has introduced the use of formal metaconstants for converting a count noun to its mass use and the other way round. For instance, the noun
bread is represented in the lexicon as bread, and the grammar rules for constructing noun phrases may turn this into µ(bread), adding a function constant µ that can be instantiated in two ways, representing the count and the mass reading. This approach has been implemented in the Tendum dialogue system (Bunt et al., 1984). Other underspecification techniques besides the use of referential or formal metaconstants are hardly available for dealing with lexical ambiguity. For instance, the use of labels, holes and dominance constraints for lexical sense underspecification runs up against the problem that there is no definite, finite list of possible pluggings or substitutions for holes or metavariables. One could use metavariables with an interpretation framework where they are instantiated by means of a context-specific word sense lexicon, but that is in fact precisely the use of referential metaconstants.

4.2. Structural semantic ambiguity: quantifier scoping

Underspecified semantic representations of quantifier scopes have been proposed using the following of the above-mentioned techniques.

In situ representation. Alshawi's original QLF proposal was implemented in the Core Language Engine (CLE). This interpretation framework includes a disambiguation process that pulls the quantifiers out of the predicate arguments and assigns them a scope (Alshawi, 1990). Alshawi and Crouch (1992) have provided a semantics for QLF representations in terms of their disambiguations (resolved quasi-logical forms), an approach that is generally available for usrs – at least, if they have a well-defined set of possible disambiguations. Willis & Manandhar (2001) have argued that QLFs cannot represent partial scope information adequately (for which the indices of the qterms can be used), and that QLF for instance does not get the scope constraints for the sentence Every representative of a company saw some samples right. The QLF formalism thus suffers from expressive incompleteness (Ebert, 2005). Pulman (2000) also notes several shortcomings of the CLE QLF and proposes an improved version. The in situ representations used in Allen's textbook (Allen, 1995) and those used in ULF (Geurts & Rentier, 1991; Kievit, 1994) suffer from the lack of a constraint specification language LC, and therefore do not meet the requirement of expressive completeness.
Stores and lists. The mechanisms of Cooper storage and Keller storage do not quite form an underspecified representation technique, but a procedure for postponing decisions on the relative scopes of noun phrases during much of the interpretation of a sentence. Some language understanding systems of the seventies and eighties, such as Phliqa, Spicos and Tendum, have incorporated list-based representations with a similar effect (van Deemter et al., 1984; Scha, 1981). Bunt and Muskens (1999) have described a formal calculus assigning logical forms to syntactic trees using an NP store, which (like a Keller store) has the nice property that when an NP can raise out of an NP, as in Every representative of a company saw some samples, this is allowed only if the embedded NP is retrieved from the store later than the embedding one.

Ambiguous terms. A treatment with formal metaconstants has been implemented in the Tendum dialogue system. All the noun phrases in a clause are collected in a ‘noun phrase sequence’ constituent, collecting the NP representations as the argument of a metafunction. Alternative instantiations of this metafunction correspond to alternative relative scopings. Clause formation occurs through combination of the NP sequence with the verbal constituent. A shortcoming of this technique is that sentence-specific constraints on the possible instantiations of the metaconstant cannot be expressed as constraints on metaconstant instantiation, hence there is no adequate representation of partial scoping, and so the requirement of expressive completeness is not met.

Labels, holes, and dominance constraints. The use of subexpression labelling with scope constraints was originally invented in UDRT (Reyle, 1993) for the underspecified representation of quantifier scopes. NDC and HS were likewise designed specifically for the underspecification of quantifier scopes. Ebert (2005) has shown that UDRT, MRS and NDC are all expressively incomplete, as they are not always able to represent all the linguistically relevant sets of readings of a given sentence. Still, these formalisms offer the best known possibilities for representing scopal ambiguities in a compact manner. The ideas of UDRT and MRS have been implemented successfully in the Verbmobil on-line translation system (see Schiehlen, Bos & Dorna, 2000); those of NDC were implemented in the CHORUS project (see Koller, 2004).
Flat, conjunctive representations. Hobbs (1983) has proposed flat, conjunctive semantic representations which contain less information than is usually considered adequate, but where the abduction-driven pragmatic component in the interpretation framework supplies additional information. The flatness of the representations is attractive, but the price is a rather cumbersome treatment of quantification, involving the notion of a ‘typical element’ of every set, and other ontologically strange creatures. Moreover, as Hobbs (1996) shows, in an attempt to fix certain shortcomings of his original proposal, the representations of simple sentences in fact become very complicated. For instance, the sentence Most men work would be represented in RR as:

(27) (∃s2, s1, x, e, y)[most(s2, s1) ∧ dset(s1, x, e) ∧ man(e, x) ∧ typelt(y, s2) ∧ work(y)]

which is to be read as: There is a set s1 defined by the property e of its typical element x being a man, and there is a set s2 which is most of s1 and has y as its typical element, and y works. RR does not come with a constraint language, and is therefore unable to express partial scope disambiguations. This makes RR expressively incomplete.

Another kind of flat representation that has been suggested exploits the expressive power of typed feature structures. These representations are for the most part (though not entirely) flat in the sense that the feature structures that have been proposed contain very little nesting, and are by and large of a flat, conjunctive character (see Bunt, 2005), thus allowing efficient processing. This use of feature structures has grown out of work on the development of expressive but efficient systems for semantic annotation.

4.3. Other structural semantic ambiguities

The ambiguities involved in the distributivity of quantifiers have received much less attention than those in quantifier scoping, although every NP introduces distributivity ambiguities. A single NP can be argued to be multiply ambiguous between individual, collective, cumulative, and group readings, and the combination of two or more NPs gives rise to additional distributivity ambiguities. Some examples are:

(28) a. These machines will lift the platform. [together]
     b. These machines lift 5 crates. [in one go]
     c. These machines have lifted 2000 crates. [in total]
The only approach to underspecifying quantifier distributivity that has been proposed, to our knowledge, is the use of formal metaconstants (Bunt, 1985). In this approach, predicates like Lift are applied to arguments from a domain represented as δ(num, machines), where δ is a formal metaconstant representing distributivity, and num stands for the (absolute or relative) numerical information in the (generalized) quantifier. The metaconstant can be instantiated in alternative ways (where num is a group size), as in (29a), (29b) and (29c):

(29) a. δ   (λM. λX. X)
     b. δ   (λM. λX. {X})
     c. δ   (λM. λX. {Y | Y ⊆ X ∧ num(Y)})

Distributivity ambiguities arise not only in quantification but also in modification, as in The crates that this machine lifted, which can be taken both individually and collectively. The kind of representation of quantifier distributivity that we have just seen, by means of a formal metaconstant, can also be used to represent modifier distributivity in an underspecified way (see Bunt, 2005).

Another form of structural ambiguity occurs in English for nominal compounds. Hobbs et al. (1993) have proposed a treatment with formal metavariables, to be instantiated through abductive reasoning with context information. For example, the compound Boston office in sentence (30a) is represented schematically as in (30b):

(30) a. The Boston office called
     b. boston(y) ∧ office(x) ∧ NN(y, x)

where NN is a metavariable (in the sense defined above), representing the unknown semantic relation between the two nouns. In languages where nouns (and adjectives) are concatenated to form compound words, rather than multi-word expressions, this form of ambiguity arises at the lexical level and can be treated in essentially the same way, decomposing the compound word into its constituent parts. The interpretation problem for nominal compounds is very similar to that of metonymy (in fact, The Boston office called is also metonymous). Metonymy is one of the types of ambiguity for which Pinkal (1999) suggests an underspecified treatment with dominance constraints using the CLLS formalism. For the example sentence John began the book he provides the schematic representation (31a).
(31) a. < {X0: begin(john, X1), X2: the-book}, {X1 > X2, X0 > X1} >
     b. begin(john, writing-of(the-book))
The subexpressions labelled X0 and X2 in this representation are not connected, but the book is constrained to be outscoped by the unspecified subexpression labelled X1. The idea is of course that X1 identifies a subexpression that should be plugged in and connect the two subexpressions, such as X1: writing-of(X2), giving the result (31b), after replacing labels by the subexpressions that they label and suppressing the top label X0. The use of labels and pluggings that we see here differs importantly from that of Hole Semantics, where labels serve only to formulate constraints on the possible instantiation of holes, which range over the labels in the EU part of a usr <EU, CU>. By contrast, a hole like X1 in the above example does not relate to any element in the set EU = {X0: begin(john, X1), X2: the-book}; instead, X1 stands for any object-level expression that can be generated through reasoning with contextual information. The dominance constraints approach is therefore in general more powerful than that of Hole Semantics, and its power is in fact determined by the interpretation framework associated with it, which determines which expressions can be generated from the context to instantiate hole variables. The use of a term that stands for an expression which is not given in the underspecified representation, but that has to be generated from the context, is in fact the use of a metavariable, in the sense defined above, rather than the use of a hole variable.

4.4. Syntactically-based ambiguity

Of the many forms of syntactic ambiguity, we consider syntactic scope ambiguities and attachment ambiguities. Syntactic scope ambiguities can be handled elegantly by means of labels and holes. For instance, Bos (1995) shows how the sentence (32a) can be represented schematically without resolving the relative scopes of do not and and by the usr (32b).

(32) a. Do not sleep and pay attention, please.
b. < {L1 : ¬h1 , L3 : sleep, L4 : pay-attention, L2 : h2 ∧ h3 }, {h1 ≥ L3 , h2 ≥ L3 , h3 ≥ L4 } > The constraints in the second part of the usr express that the argument of the negation outscopes sleep and that the two arguments of and outscope sleep and pay attention, respectively. Similar treatments are obviously possible in other label-based approaches (MRS, UDRT, CLLS, NDC). Formal metaconstants have also been proposed for treating this kind of ambiguity (Bunt, 1995), but in the absence of a constraint specification language (in which the possible instantiations of the metaconstants would be constrained), this proposal is expressively incomplete. Molla (2001) has proposed a variant of radical reification in which all logical relations are reified as well, resulting in flat list representations. For the most plausible reading of (32a) the RR representation would be: (33) not(e1 , e3 ), sleep(e3 , x), and(e4 , e1 , e6 ), pay-attention(e6 , x) To underspecify the scoping, the arguments of the scope-bearing elements are simply not tied together, and we get the usr (34): (34) <{not(e1 , e2 ), sleep(e3 , x), and(e4 , e5 , e6 ), pay-attention(e6 , x)}, {(e2 = e3 ∧ e5 = e1 ) ∨ (e2 = e4 ∧ e5 = e3 )} > While interesting as a way of dealing with such scopal ambiguities, this variant of RR is deficient in its treatment of quantification. Note that the ei variables in (33) act as a kind of subexpression labels, but do not have the same expressive power since one cannot have something like dominance constraints for them. An attachment ambiguity occurs when a sentence contains several candidates for being modified by a certain modifier. Two important cases of this are PP attachment and relative clause attachment, as illustrated by (35): (35) a. John saw the man with binoculars. b. The crates on the platform that Hercules lifted. Syntactically, the different ways of attaching the PP and the relative clause come down to different ways of connecting the subtree, describing the modifier, to the rest of the tree, and it might seem that this corresponds semantically to different ways of inserting the representation of
the modifier in the rest of the semantic representation. Underspecifying the attachment would then take the form of keeping the modifier representation separate and indicating its possible insertion points. Labels and holes would seem to be the obvious instruments for achieving this. Schematically, we can represent the two readings of (35a) as (36a) and (36b), and in labelled form as (37a) and (37b).

(36) a. saw(e1,j,x) ∧ theman(x) ∧ withbinocs(e1)
     b. saw(e1,j,x) ∧ theman(x) ∧ withbinocs(x)

(37) a. {L1: saw(e1,j,x), L2: theman(x), L3: withbinocs(e1)}
     b. {L1: saw(e1,j,x), L2: theman(x), L3: withbinocs(x)}

The only difference between the two readings is the argument of the modifier. Indeed, from a semantic point of view an attachment ambiguity is a choice of modifier argument. Therefore, holes can be used for underspecifying the attachment only if the alternative arguments of the modifier are labelled subexpressions, as in (38):

(38) < {L1: john, L2: e1, L3: saw(L2, L1, L5), L4: theman(L5), L5: x, L6: withbinocs(h1)}, {h1 = L2 ∨ h1 = L5} >

Technically this seems possible, but note that it does not make sense to label a variable or a constant, as in L1: john, L2: e1 and L5: x, since the constant and variable themselves can be inserted directly in the semantic representation, resulting in the simpler usr (39).

(39) < {saw(e1,j,x) ∧ theman(x) ∧ withbinocs(h1)}, {h1 = e1 ∨ h1 = x} >

In this representation, h1 is clearly a variable that can be instantiated as a constant or an object-language variable, hence h1 is in fact not so much a hole but a metavariable. A general point to note about ambiguities that have their origin in a syntactic ambiguity, is that, even if it is possible to represent the various possible readings by a single underspecified semantic representation, this is only useful if the corresponding syntactic analyses are equally representable by a single ambiguous ‘packed’ syntactic representation; otherwise the interpretation process would generate a number of syntactic analyses, each associated with the same usr. That would not
only miss the efficiency gain that motivates the use of usrs, but would even be wrong, since it suggests that each of the syntactic analyses has all the possible semantic readings. Attachment ambiguities are especially difficult in this respect. Take for instance the following sentence:

(40) John saw the man on the hill with the telescope.

The PP on the hill has two possible attachments, and the PP with the telescope three. However, of the six possible combinations, one does not correspond to a possible reading of the sentence, namely the one where the man has the telescope and the see-event occurred on the hill. The impossibility of this reading must be due to syntactic reasons, for semantically that reading makes perfect sense. It seems very difficult to model this phenomenon with syntactically and semantically underspecified representations. Muskens (2001) has argued strongly in favour of a unified approach where both syntactic and semantic representations take the form of (partial) descriptions of trees, using Tree-Adjoining Grammar for syntactic analysis, in order to have a better handle on the desired parallelism of syntactic and semantic underspecification.

4.5. Granularity and vagueness

The treatment of lexical ambiguity with the help of ambiguous referential metaconstants in the representation language, outlined above in section 4.1, can be applied equally well to effectively deal with the vagueness that is inherent to virtually all nouns, verbs and adjectives because they refer with a certain granularity that may be too coarse in a given context. The virtually infinite ambiguity of implicit semantic relations, of which we saw examples in section 2, can be treated effectively by introducing a metavariable, similar to the NN predicate of Hobbs' treatment of metonymy. This is especially useful in an interpretation framework where the representation language is typed, so that the types of the arguments of this predicate can be used to infer a contextually suitable interpretation of it.

4.6. Missing information

In the case of intrasentential ellipsis, some linguistic material is missing locally, which can be supplied from elsewhere in the sentence. This seems an ideal application of labels and holes. As Pinkal (1999) points
out, however, parallelism has to be taken into account, for instance to make sure that the relative scope assignments in the first and the second part of a sentence like Two European languages are spoken by every linguist, and two Asian languages are, too are the same. So the constraints in the usr should take such parallelism into account. The CLLS formalism was developed specifically with this aim. The other labels-and-constraints based formalisms are unable to represent parallelism constraints in their constraint language.

The occurrence of unknown words in the input to a language understanding system may be considered as causing ambiguity in the extreme: unknown words can mean anything that could make sense in the context of utterance. Therefore the treatment of ambiguous and vague words by means of metavariables can also be applied in this case. Another plausible approach to the occurrence of unknown words or of parts of an utterance that cannot be recognized (as in the case of imperfect speech recognition) consists of constructing labelled semantic representations for those parts of the input that can be processed, and adding labels for any material that cannot be interpreted, possibly with certain constraints on the interpretation and on how the various pieces of the input may connect. For instance, Pinkal (1999) has suggested a treatment using the CLLS representation language, where the example sentence We meet XX next week, where XX marks an unrecognized part of the input, would have the following (schematic) underspecified representation (41):

(41) < {X1: meet(we), X2: next-week, X3}, {X0 ≥ X1, X0 ≥ X2} >

where X1 and X2 label the two semantic chunks corresponding to the recognized parts of the input, and X3 the unrecognized part (and X0 is the top label of the representation). These pieces might get connected, as Pinkal suggests, by adding the constraints X3 = Trel(X2), X0 = X3(X1), and Trel ∈ {...}, where {...} is a set of temporal relations, so Trel is a referential metaconstant (which Pinkal calls a ‘metavariable’) ranging over temporal relations. These constraints express the assumption that the unrecognized part expresses a temporal relation which takes the part X2 as its second argument and the part X1 as its first. The result of taking these constraints into account could be that we obtain a fully specified representation like (42).

(42) (In(next-week))(meet(we))
Notice that in the process we have used a constraint of the form X3 = Trel(X2), which contains a semantic construction of the object language, as well as a referential metaconstant Trel ranging over object-language predicates; and the additional constraint X0 = X3(X1), which does not specify a dominance constraint, but a semantic relation connecting the first of the two recognized parts to the rest of the input. This consequently is not just an application of the technique of labels, holes and constraints; it also makes essential use of metavariables (X3) and of the powerful interpretation framework of higher-order lambda calculus that comes with the CLLS representation language. Incomplete input and incremental processing both have the effect that the utterance interpretation process has the task of assigning a semantic representation to an incomplete fragment of the input. Incremental processing is equivalent to processing an input with an unrecognized part at the end, like We will meet some time during XX. This suggests that the treatment of unrecognized input with metavariables, dominance constraints and referential metaconstants can be applied also in this case.
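To make this treatment concrete, here is a minimal sketch in Python of an underspecified representation in the spirit of (41). It is not an implementation of Pinkal's CLLS; the variable names and the example set of temporal relations are illustrative assumptions only.

```python
# A minimal sketch (our own, not Pinkal's CLLS) of the underspecified
# representation (41) for "We meet XX next week": labelled fragments,
# dominance constraints, and a metavariable X3 for the unrecognized part.
fragments = {
    "X1": "meet(we)",      # recognized chunk 1
    "X2": "next-week",     # recognized chunk 2
    "X3": None,            # metavariable for the unrecognized material XX
}
dominance = [("X0", "X1"), ("X0", "X2")]       # X0 >= X1, X0 >= X2

# Trel is a referential metaconstant; its range here is an invented example set.
TREL_VALUES = ["in", "during", "around"]

def resolutions():
    """Readings licensed by the constraints X3 = Trel(X2) and X0 = X3(X1)."""
    for trel in TREL_VALUES:
        x3 = f"{trel}({fragments['X2']})"      # X3 = Trel(X2)
        yield f"({x3})({fragments['X1']})"     # X0 = X3(X1)

for reading in resolutions():
    print(reading)                             # e.g. (in(next-week))(meet(we))
```

Resolving the metavariable then amounts to choosing a value for the referential metaconstant Trel and applying the two connecting constraints, exactly as in the derivation of (42).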
5. Summary and conclusion
We summarize the applicability of the various underspecification techniques in Table 2. The techniques based on the use of labels and dominance constraints (UDRT, HS, MRS, NDC) have been grouped together in one column in view of their comparable expressiveness, as shown by Koller (2004). A '+' sign in this table means that the technique under consideration is suitable for dealing with a particular phenomenon; it does not mean that the technique is perfect for that purpose – probably no technique is perfect, given the results on expressive adequacy from Ebert (2005). A '±' sign is used to indicate that a technique is suitable only if supplemented with an adequate constraint language. Three rows near the bottom of the table contain the pattern '(+) + (+)'; this indicates that the phenomena under consideration can be handled by a combination of these three techniques, with the one in the column that has a '+' without parentheses playing center stage. Three columns occupying the center of the table represent the applicability of various forms of ambiguous terms in a formal representation language. Recall that metavariables are understood here as terms which are intended to be replaced by a subexpression that either occurs elsewhere in the representation or that is generated from the context. Referential metaconstants, by contrast, are ambiguous predicates and other nonlogical constants that represent ambiguous or vague content words, and formal metaconstants are terms in the representation language that do not correspond to anything that is expressed explicitly in natural language, but to a formal semantic property or relation such as the distributivity of a quantifier, or the relation between the count and mass senses of a noun.

Table 2. Applicability of underspecification techniques

Phenomenon                    Labels,     Meta-      Ref.       Formal     Radical      Stores, lists,
                              domin.      variables  metacons.  metacons.  reification  in situ
                              consts.
Lexical ambiguity
  homonymy                       –            –          +          –           –            +
  anaphora; deixis               –            +          –          –           –            ±
  count/mass use                 –            –          –          +           –            –
  metonymy                       –            –          –          +           –            –
  compound nouns                 –            +          –          –           –            –
Syntactic ambiguity
  modifier attachment            –            +          –          ±           ±            –
  syntactic scope                +            –          –          –           ±            –
  thematic/semantic role         –            +          –          –           ±            –
Struct. sem. ambiguity
  quantifier scope               +            –          –          –           –            ±
  quantifier distributivity      –            –          –          +           –            –
  modifier distributivity        –            –          –          +           –            –
  nominal complexes              –            +          –          –           –            +
Semantic imprecision
  polysemy                       –            –          +          –           –            –
  granular vagueness             –            –          +          –           –            –
  relational vagueness           –            –          +          –           –            –
Missing information
  unknown words                 (+)           +         (+)         –           –            –
  incomplete input              (+)           +         (+)         –           –            –
  ellipsis; short answers        ±            –          –          –           –            –
  incremental processing        (+)           +         (+)         –           –            –

Two observations from Table 2 are that:
1. each underspecification technique is applicable only to a limited subset of the phenomena that call for the use of underspecified semantic representations;
2. the various kinds of underspecification techniques have some overlapping applicability, but by and large they each apply to different phenomena.
In fact, we see quite clearly that the techniques based on subexpression labelling and dominance constraints are useful for dealing with scope ambiguities, both syntactic and purely semantic ones, and in combination with metavariables also for dealing with cases of missing information. Ambiguous terms of the various kinds are particularly useful for dealing with lexical ambiguity and vagueness, and metavariables have interesting applications in combination with labels cum dominance constraints and referential metaconstants.
One general conclusion seems inescapable: a single, unified framework for dealing with all kinds of ambiguity, vagueness, and incomplete information will not be based on just one of the underspecification techniques that we currently know. Instead, the wide range of phenomena for which underspecified semantic representations are useful or even a necessity calls for the use of a combination of underspecification tools and techniques.
Acknowledgements
I would like to thank Johan Bos, Manfred Pinkal and Steve Pulman for their comments on an earlier version of this chapter.
References Allen, J.: 1995, ‘Natural Language Understanding’. Redwood City, California: Benjamin/Cummings. Alshawi, H.:1990, ‘Resolving Quasi Logical Form.’ Computational Linguistics 16:133–144. Alshawi, H.: 1992, ‘The Core Language Engine’. MIT Press, Cambridge, MA. Alshawi, H. and J. van Eijck: 1992, ‘Logical Forms in the Core Language Engine’. In Proceedings ACL’87. Alshawi, H. and D. Crouch: 1992, ‘Monotonic semantic interpretations’. In Proceedings ACL’92, pp. 33–39. Bos, J.: 1995, ‘Predicate Logic Unplugged’. In Proceedings of the 10th Amsterdam Colloquium, Amsterdam: ILLC, pp. 133–142. Bos, J.: 2002, ‘Underspecification and resolution in discourse semantics’. Ph.D. Thesis, Saarland University, Saarbr¨ ucken. Bronnenberg, W., H. Bunt, J. Landsbergen, R. Scha, W. Schoenmakers & E. van Utteren: 1979, The question answering system Phliqa1. In L. Bolc (ed.) Natural language question answering systems. London: McMillan, pp. 217–305. Bunt, H.: 1984, ‘The resolution of quantificational ambiguity in the Tendum system.’ Proceedings COLING 1984 , Stanford University, pp. 130–133. Bunt, H.: 1985, ‘Mass terms and model-theoretic semantics.’ Cambridge University Press. Bunt, H.: 1995, ‘Semantics and Pragmatics in the ∆elta Project.’ In: L. Dybkjaer, editor, Proceedings of the Second Spoken Dialogue and Discoure Workshop, Topics in Cognitive Science and HCI, Vol.8. Roskilde: Centre for Cognitive Science, pp. 1–27. Bunt, H.: 2005, ‘Quantification and modification as Feature Structures’. In Proceedings of the Sixth Internationanl Workshop on Computational Semantics IWCS-6, Tilburg, pp. 54–65. Bunt, H.: 2006: ‘Mass expressions’. In K.Brown, editor, Encyclopedia of Language and Liguistics, Second Edition. Amsterdam: Elsevier, pp. 5757–5760. Bunt, H., R.J. Beun, F. Dols, J.v.d. Linden & G. Schwartzenberg: 1984, ‘The Tendum dialogue system and its theoretical foundations.’ IPO Annual Progress Report 19, 105–113, Eindhoven: IPO. Bunt, H. & R. Muskens: 1999, ‘Computational Semantics’. In H. Bunt & R. Muskens (eds.) Computing Meaning, Vol. 1. Dordrecht: Kluwer, 1–32. Cooper, R.: 1983, ‘Quantification and Syntactic Theory.’ Dordrecht: Reidel. Copestake, A., D. Flickinger, R. Malouf, I. Sag & S. Riehemann: 1995, ‘Minimal Recursion Semantics’. Unpublished ms., CSLI, Stanford University. Copestake, A. and D. Flickinger: 2000, ‘An Open Source Grammar Development Environment and Broad-coverage English Grammar Using HPSG’. In: Proceedings of the 2nd International Conference on Language Resources and Evaluation. Athens, Greece.
Dalrymple, M.: 2001. ‘Lexical-Functional Grammar’. San Diego, Calif.; London : Academic Press. Davidson, D.: 1967, ‘The Logical Form of Action Sentences’. In: N. Rescher (ed.): The Logic of Decision and Action. Pittsburgh: University of Pittsburgh Press, pp. 81–95. Deemter, K. van: 1996, ‘Towards a logic of Ambiguous Expressions’. In S. Peters and K. van Deemter, editors, Semantic Ambiguity and Underspecification. Stanford: CSLI, pp. 203–237. Deemter, K. van, G. Brockhoff, H. Bunt, M. Meya and J. de Vet: 1985 ‘From Tendum to spicos, or: How flexible is the Tendum approach to question answering?’ IPO Annual Progress Report 20, 83–90. Ebert, C.: 2005, Formal Investigations of Underspecified Representations. Ph.D. Thesis, King’s College, University of London. Egg, M., A. Koller and J. Niehren: 2001, ‘The constraint language for lambda structures’. Journal for Logic, Language, and Information 10, 457–485. Fenstad, J.E., P. Halvorsen, T. Langholm, and van Benthem: 1987, ‘Situations, Language and Logic’. Studies in Linguistics and Philosophy 34, Reidel, Dordrecht. Fraassen, B. van: 1966, ‘Singular terms, truth-values gaps, and free logic’. Journal of Philosophy 63.17: 481–495. Gardent, C. and B. Webber: 1998, ‘Describing discourse semantics.’ In Proceedings of the 4th TAG+ Workshop, Philadelphia. Geurts, B. & G. Rentier: 1991, ‘Quasi logical form in PLUS’. Esprit project P5254 (PLUS) internal report, Tilburg: Institute for Language Technology and Artificial Intelligence ITK. Hobbs, J.: 1985, ‘Ontological Promiscuity.’ In Proc. 23rd Annual meeting of the ACL, Chicago, 61–69. Hobbs, J. and S. Shieber: 1987, ‘An algorithm for generating quantifier scopings’. Computational Linguistics 13 (1–2):47–63. Hobbs, J., M. Stickel, D. Appelt & P. Martin: 1993, ‘Interpretation as Abduction’. Artificial Intelligence 63: 69–142. Keller, W.: 1998, ‘Nested Cooper Storage: The Proper Treatment of Quantification in Ordinary Noun Phrases’. In: U. Reyle & C. Rohrer (eds.) Natural Language Parsing and Linguistic theories. Dordrecht: Reidel, pp. 1–32. Kievit, L.: 1998, ‘Context-driven natural language interpretation’. Ph.D. Thesis, Tilburg University. Kievit, L., P. Piwek, R.J. Beun & H. Bunt: 2001, ‘Multimodal Cooperative Resolution of Referential Expressions in the DenK system.’ In H. Bunt & R.J. Beun (eds) Cooperative Multimodal Communication. Heidelberg: Springer, pp. 197–214. Koller, A., J. Niehren, and S. Thater: 2003, ‘Bridging the gap between underspecification formalisms: Hole semantics as dominance constraints.’ Proceedings of the 11th EACL, Budapest, pp. 195–202.
Koller, A.: 2004, ‘Constraint-Based and Graph-Based Resolution of Ambiguities in Natural Language’. Ph.D. Thesis, Saarland University, Saarbr¨ ucken. K¨ onig, E. & Reyle, U.: 1996, ‘A General Reasoning Scheme for Underspecified Representations.’ In: H.J. Ohlbach & U. Reyle (eds.) Logic and its Applications. Festschrift for Dov Gabbay. Dordrecht, Kluwer, pp. 1–28. Medema, P., W. Bronnenberg, H. Bunt, J. Landsbergen, W, Schoenmakers and E. van Utteren: 1975, ‘Phliqa1: ‘Multilevel semantics in question answering.’ American Journal of Computational Linguistics microfiche 32. Molla, D.: 1999, ‘Ontologically Promiscuous Flat Logical Forms for NLP’. In Proceedings 4th International Workshop on Computational Semantics IWCS-4 , Tilburg University, pp. 249–265. Muskens, R.: 2001, ‘Talking about Trees and Truth-Conditions.’ Journal of Logic, Language and Information 10: 417–455. Pinkal, M.: 1999, ‘On semantic underspecification.’ In H. Bunt & R. Muskens (eds.) Computing Meaning, Vol. 1. Dordrecht: Kluwer, pp. 33–55. Pulman, S.: 2000, ‘Bidirectional Contextual resolution.’ Computational Linguistics 26:4. 497–538. Reyle, U.: 1993, ‘Dealing with Ambiguities by Underspecification’. Journal of Semantics 10 (2), 123–179. Reyle, U.: 1996, ‘Co-indexing labelled DRSs to represent and reason with ambiguities’. In S. Peters and K. van Deemter, editors, Semantic Ambiguity and Underspecification. Stanford: CSLI, pp. 239–268. Scha, R.: 1981, ‘Distributive, collective and cumulative quantification’. In J. Groenendijk and M. Stokhof, editors, Formal methods in the study of language. Amsterdam: Mathematical Centre. Schiehlen, M., Bos, J. & Dorna, M.:2000, ‘Verbmobil Interface Terms (VITs).’ In W. Wahlster (ed.) Verbmobil: Foundations of Speech-to-Speech Translation. Berlin: Springer, pp. 183–199. Schubert, L.K. and F.J. Pelletier: 1982, ‘From English to Logic: Contextfree computation of conventional logical translation’. American Journal of Computational Linguistics 8 (1): 165 –176. Willis, A. & S. Manandhar: 2001, ‘The Availability of Partial Scopings in an Underspecified Semantic Representation’. In H. Bunt, R. Muskens & E. Thijsse (eds.) Computing Meaning, Vol. 2 , Dordrecht: Kluwer, pp. 129–145. Woods, W.: 1978, ‘Semantics and quantification in question answering/. In M. Yovits (ed.) Advances in Computers. New York: Academic Press, pp. 2–64.
ALEX LASCARIDES AND NICHOLAS ASHER
SEGMENTED DISCOURSE REPRESENTATION THEORY: DYNAMIC SEMANTICS WITH DISCOURSE STRUCTURE
1. Introduction
At least two important ideas emerged from research on discourse interpretation in the 1980s. First, dynamic semantics changed the way linguists think about meaning: instead of viewing the content of a discourse as the set of models that it satisfies (e.g., Montague, 1974; Davidson, 1980), dynamic semantics views it as a relation between contexts known as the context change potential or ccp (e.g., Kamp, 1981; Groenendijk and Stokhof, 1991). Secondly, ai-based research demonstrated that discourse structure is a necessary component of discourse interpretation (e.g., Hobbs, 1985; Mann and Thompson, 1987; Grosz and Sidner, 1986). Both these insights address the need to model how the interpretation of the current sentence is dependent on the interpretation of the sentences that precede it, but they differ in their aims and execution. Dynamic semantics typically explores a relatively restricted set of pragmatic phenomena, focusing on the effects of logical structure on anaphora of various kinds. For example, it predicts the difference in acceptability of the pronouns in (43a) vs. (43b): (43) a. A man walked in. He ordered a beer. b. Every man walked in. ??He ordered a beer. Discourse structure in dynamic semantics is thus determined entirely by the presence of certain linguistic expressions such as if, not, every and might. The process of constructing logical form is equally simple, either using only syntax and the form of the logical forms of clauses but not their interpretations (e.g., Kamp and Reyle, 1993), or using in addition notions such as consistency and informativeness (e.g, van der Sandt, 1992). In contrast, many ai approaches to discourse interpretation aim to model implicatures generally, including the interpretation of pronouns (e.g., Hobbs et al., 1993). These theories emphasise the role of commonsense reasoning with non-linguistic information such as domain 87 H. Bunt and R. Muskens, (eds.), Computing Meaning, Volume 3, 87–124. c 2007 Springer.
knowledge and cognitive states. For example, Hobbs (1985) argues that such reasoning is necessary for inferring that he in (1b) binds to Bill rather than John: (1) a. John can open Bill’s safe. b. He’s going to have to get the combination changed soon. He argues persuasively that this interpretation occurs as a byproduct of working out how (and why) the discourse is coherent, where a discourse is defined to be coherent only if the contents of its utterances are rhetorically connected in a discourse structure. In this case, the rhetorical relation is Result, and this is inferred via commonsense reasoning with domain knowledge since no cue phrases such as therefore are present. Dynamic semantics predicts that John and Bill are possible antecedents to he in (1) but doesn’t rank these alternatives. Discourse (1) is also a counterexample to theories of anaphora which utilise only grammatical information; e.g., Centering Theory (Grosz et al., 1995) predicts that he binds to John since antecedents in subject position are preferred to those in object position. By eschewing insights from dynamic semantics on how logical form is constructed and interpreted, Hobbs inter alia (Hobbs, 1979); 1985; Hobbs et al., 1993 tend to exploit commonsense reasoning in cases where the simpler mechanisms from dynamic semantics would do. Indeed, it’s not even clear that one can explain (43a) vs. (43b) by relying on commonsense reasoning and ignoring dynamic semantics, especially since every man in (43b) implicates that at least one man exists (who walked in). Further, Hobbs et al. (1993) assume a highly unmodular architecture: any piece of information from any knowledge source can be accessed at any time. But we believe this has drawbacks. First, an unmodular approach misses certain generalisations. For example, one cannot express within weighted abduction that the preferences for interpreting pronouns which are predicted by Centering Theory are overridden if the semantics of the rhetorical relation that’s predicted by other information sources conflicts with them (see Stone and Thomason (2002) for motivation of such a rule). Indeed, information from grammar and from domain knowledge aren’t distinguished at all, and weighted abduction is unable to express laws about how weights are assigned anyway. Secondly, allowing the process for constructing logical form to have full access to their interpretations, as Hobbs et al. do, confuses constructing what is said with evaluating what is said. To see the difference consider (2), where (2b) plainly elaborates (2a):
(2) a. There are some unsolvable problems in number theory.
    b. Every even number greater than two is expressible as the sum of two primes is undecidable, for instance.
Suppose that we were to infer this in a system of defeasible reasoning that has full access to the interpretations of the two clauses. This default reasoning demands a consistency test, and given that the semantics of Elaboration is such that it’s true only if the propositions it connects are also true, testing the Elaboration connection for consistency will entail a test as to whether (2a) and (2b) are satisfiable. That is, we would need to test whether Goldbach’s Conjecture is in fact undecidable or not, something which we have no idea how to do! But even the most mathematically inept interpreter can easily understand the discourse structure of (2) and construct its logical form; one has a clear picture of what is being said without necessarily being able to evaluate what is said. Unlike Hobbs et al.’s framework, this distinction between constructing logical form and interpreting it is clearly marked in dynamic semantics. We will describe here a theory of discourse interpretation that integrates dynamic semantics and ai-approaches, in an attempt to ameliorate the disadvantages of one framework with the advantages of the other. The theory is called Segmented Discourse Representation Theory or sdrt, and it is something that we have working on for over a decade. sdrt provides both a logic for representing (and interpreting) the logical forms of discourse, and a logic for constructing logical forms. The former logic is known as the logic of information content; the latter is the glue logic. sdrt is wedded to dynamic semantics in that the logic of information content is assigned a dynamic semantic interpretation, using ccp in the familiar way. It extends prior work on discourse structure by assigning rhetorical relations a precise dynamic semantics, which explains how the content of the discourse augments the compositional semantics of its clauses. We will see several examples of this in Sections 4 and 5. The glue logic extends dynamic semantics’ mechanisms for constructing logical form by encoding an interaction between semantics and pragmatics: it involves commonsense reasoning with both linguistic and non-linguistic information, which extends the partial information about content that’s generated by the grammar to a more complete semantic representation of the discourse. For example, it will compute the value of a rhetorical relation (and its arguments) that was absent from compositional semantics; and/or it identifies the antecedent
to a pronoun, resolves the semantic scope of a presupposition, disambiguates a word sense, yields a bridging inference etc. The glue logic therefore contributes an important additional element to current research in semantic underspecification: instead of relating an underspecified semantic representation to all its possible interpretations, as current formalisms do (e.g., Reyle, 1993; Koller et al., 2000), sdrt relates an underspecified semantic representation to its pragmatically preferred interpretations. The glue logic is also distinct from commonsense reasoning as it’s used in other work. For example, unlike Hobbs et al.’s abductive approach, it works over partial descriptions of logical forms, so that constructing logical form proceeds in a constraint-based fashion. Secondly, for the reasons given earlier, sdrt has a highly modular architecture, which leads to a more constrained approach. Each knowledge source that contributes to discourse interpretation – compositional and lexical semantics, domain knowledge, cognitive states etc. – is represented in a distinct language with its own distinct logic. The glue logic has only restricted access to the information within these logics; for example, it has access only to descriptions of formulae in the logic of information content, but not to what those formulae entail (in the dynamic logic where logical forms for discourse are interpreted). This separation ensures we don’t confuse constructing what is said with evaluating what is said. It also ensures that we can represent the interaction between the information from different knowledge resources within the glue logic’s consequence relation. While sdrt can be used to extend any dynamic semantic theory, for the sake concreteness we will use Discourse Representation Theory (drt, Kamp and Reyle, 1993) as the starting point in this paper. We will briefly describe drt and use simple texts to motivate the need for rhetorical relations (e.g., see (1)). Accordingly, we will extend the logic of information content in drt to include rhetorical relations, to which we assign a compositional and dynamic semantic interpretation. We will then introduce sdrt’s mechanisms for constructing logical form. As we mentioned, this takes place at the description level, following current practice in composing logical forms for clauses within the grammar; e.g., Copestake et al., 2001; Asudeh and Crouch, 2001. We will show how the resulting theory provides a unifying explanatory mechanism for a number of different discourse phenomena, overcoming some problematic predictions in both dynamic semantics and ai-based accounts of discourse.
2. Dynamic Semantics
Montague semantics (Montague, 1974) wasn't designed to construct logical forms for multi-sentence discourse, but extending it in obvious ways falls short of handling the phenomena we want to analyse. Consider (43a) again. Appending the logical forms of the clauses to yield (43a′) incorrectly predicts that the man that walked can be different from the one that ordered the beer:
(43) a′. ∃x(man(x) ∧ walk-in(x)) ∧ ∃y(beer(y) ∧ order(z, y))
     a″. ∃x(man(x) ∧ walk-in(x) ∧ ∃y(beer(y) ∧ order(x, y)))
The formula (43a ) is an improvement in this respect, but assigning (43a) this logical form would make constructing logical form overly complex, since the scope of quantifiers would extend beyond sentence boundaries. Moreover, since anaphoric binding must be blocked in (43b), one would need to block constructing a logical form for (43b) that’s similar to (43a ). In any event, this misses the point: (43a ) fails to represent the fact that uttering the first sentence changes the context in which the second sentence is interpreted. Dynamic semantics redefines meaning to address these problems: a sentence S is interpreted as a relation between an input context and an output one. Assuming that the model M is fixed, these contexts consist of variable assignment functions. Roughly put, the input context is the set of variable assignment functions which make the content of the discourse prior to S true in M ; the output context is the subset of variable assignment functions from the input context which (also) make S true. Thus the set of functions always gets smaller, capturing the (simplifying) idea of monotonically accumulating information as discourse is interpreted. Viewing meaning this way provides an elegant account of the anaphoric dependency in (43a). The input context for the first sentence is the set of all variable assignment functions; the output one consists of just those variable assignment functions which are defined for the individual or discourse referent x that’s introduced in the grammar by a man and that make man(x) and walk(x) true. Like all NPs, pronouns introduce a discourse referent but they also introduce a condition that this discourse referent be identified with an accessible prior discourse referent of appropriate number and gender. As we will see, accessibility is defined in terms of the form of logical form, but semantically, the accessible discourse referents amount to those for which each variable
assignment function in the output context is defined. This captures the anaphoric binding in (43a), and the output contexts consist of variable assignment functions which are defined for x, and which satisfy the conditions that x is a man, walked in and ordered a beer. In (43b), there is no accessible discourse referent which can be identified with that introduced by the pronoun, making the discourse uninterpretable.
Now to formal details, focusing on Discourse Representation Theory (or drt, Kamp and Reyle, 1993) although our evaluation of it in Section 3 applies to dynamic semantics generally. The logical forms of discourse in drt are discourse representation structures or drss. A drs is a pair: a set of discourse referents, and a set of drs-conditions. Their syntax (for a very simple fragment) is defined as follows:

DEFINITION 1. The Syntax of DRSs
The set of drss is defined by:
    K := ⟨U, ∅⟩ | K ⊕ ⟨∅, γ⟩
Where:
1. U is a set of discourse referents;
2. γ is a drs-condition; i.e., if x1, . . . , xn are discourse referents and R is an n-place predicate, then γ := R(x1, . . . , xn) | ¬K | K1 ⇒ K2; and
3. ⊕ is an 'append' operation on drss: that is, if K1 is the drs ⟨U1, C1⟩ and K2 is the drs ⟨U2, C2⟩, then K1 ⊕ K2 = ⟨U1 ∪ U2, C1 ∪ C2⟩.
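As a rough illustration of Definition 1, the following Python sketch encodes a drs as a pair of a universe and a set of conditions, with ⊕ as pairwise union. The class and the string representation of conditions are our own simplification, not part of drt.

```python
# A minimal sketch of Definition 1: a drs as <universe, conditions>,
# with merge() playing the role of the append operation ⊕.
from dataclasses import dataclass

@dataclass(frozen=True)
class DRS:
    universe: frozenset = frozenset()      # discourse referents, e.g. {"x"}
    conditions: frozenset = frozenset()    # drs-conditions, kept as strings here

    def merge(self, other: "DRS") -> "DRS":
        """K1 ⊕ K2 = <U1 ∪ U2, C1 ∪ C2>."""
        return DRS(self.universe | other.universe,
                   self.conditions | other.conditions)

# (3a) "a man walks" as <{x}, {man(x), walk(x)}>:
k_context = DRS(frozenset({"x"}), frozenset({"man(x)", "walk(x)"}))
# Appending the drs of a follow-up sentence, as in (43a):
k_updated = k_context.merge(DRS(frozenset({"y", "z"}),
                                frozenset({"beer(z)", "order(y, z)"})))
print(sorted(k_updated.universe))     # ['x', 'y', 'z']
print(sorted(k_updated.conditions))   # ['beer(z)', 'man(x)', 'order(y, z)', 'walk(x)']
```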
drss are sometimes written in a 'box-style' notation; in the linear rendering used here, (3a) (for a man walks) and (3b) (for every man walks) are:
(3) a. [x | man(x), walk(x)]
    b. [ | [x | man(x)] ⇒ [ | walk(x)]]
Thus one drs can subordinate another, and this is used to define the accessibility constraint on anaphora mentioned earlier:

DEFINITION 2. Subordination
A drs K1 is immediately subordinate to K2 iff:
1. K2 contains the drs-condition ¬K1; or
2. K2 contains the drs-condition K1 ⇒ K3 or K3 ⇒ K1 for some drs K3.
Transitive Closure: A drs K1 is subordinate to K2 iff there is a drs K3 such that K1 is immediately subordinate to K3 and K3 is subordinate to K2.
DEFINITION 3. Accessibility
A discourse referent x is accessible to an anaphoric drs-condition in K1 (e.g., the condition introduced by a pronoun) iff x is introduced in UK2 (the universe of K2), where:
1. K1 is subordinate to K2; or
2. K2 ⇒ K3 is a drs-condition in a drs K4, such that K1 is subordinate to K3.
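The following Python sketch gives one much simplified reading of Definitions 2 and 3: it descends into a nested drs along a path of complex conditions and collects the referents visible from the box it reaches. The dictionary encoding is an assumption of ours, not Kamp and Reyle's.

```python
# A rough sketch of accessibility: a drs is a dict with a "universe" and a list
# of "conditions"; complex conditions are ("not", K) or ("=>", K1, K2).
def accessible(drs, path):
    """Referents accessible from the sub-drs reached by following `path`.
    Each path step is either an index (descend into a ¬K condition) or a pair
    (index, side) for K1 => K2, with side 0 = antecedent, 1 = consequent."""
    visible = set(drs["universe"])
    if not path:
        return visible
    step, *rest = path
    if isinstance(step, tuple):                     # descend into K1 => K2
        i, side = step
        _, k1, k2 = drs["conditions"][i]
        if side == 1:                               # from the consequent, the
            visible |= set(k1["universe"])          # antecedent's referents are visible
            return visible | accessible(k2, rest)
        return visible | accessible(k1, rest)
    _, k = drs["conditions"][step]                  # descend into ¬K
    return visible | accessible(k, rest)

# (3b) "every man walks": [ | [x | man(x)] => [ | walk(x)] ]
every = {"universe": set(), "conditions": [("=>",
          {"universe": {"x"}, "conditions": ["man(x)"]},
          {"universe": set(), "conditions": ["walk(x)"]})]}
print(accessible(every, [(0, 1)]))   # {'x'}: accessible inside the consequent
print(accessible(every, []))         # set(): not accessible from the top box, cf. (43b)
```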
Thus the discourse referent y that’s introduced by a pronoun in a drs K 1 can be bound to any discourse referent (of appropriate number and gender) that is introduced in a drs on the following path: starting at K 1 , if there is a drs K 2 immediately to your left (i.e., K 2 ⇒ K 1 is drs-condition) then move to that; if not, but there is a drs K 2 that you’re immediately subordinate to, move to that; otherwise stop. Let’s focus first on how logical form is constructed, rather than how it’s interpreted. Logical-form construction is encapsulated in the process of discourse update. In its simplest form, it consists of the following steps: 1. Construct the logical form of the current sentence, leaving the anaphoric conditions unresolved. We will examine how unresolved conditions are represented shortly; for now we gloss the condition introduced by a pronoun as x =? (the “?” showing that the antecedent to x is unknown). Constructing such drss can be done compositionally within the grammar (Muskens, 1996; Asher, 1993). 2. Use ⊕ to append this logical form to the drs of the discourse context; and 3. Resolve any conditions of the form x =? to conditions of the form x = y, where y is accessible to x =?. Observe that this construction procedure uses only the form of the drss – i.e., subordination – and not their interpretation. Nevertheless, it makes the right predictions about (43a) and (43b), as shown in Figures 8 and 9 (ignoring tense for now). The dynamic interpretation of drss makes semantic sense of the accessibility constraint on anaphora. The introduction of new discourse referents into a drs K causes a transition from an input context to
Figure 8. Constructing the drs for (43a):
    Context drs:    [x | man(x), walk(x)]
    Current drs:    [y, z | beer(z), order(y, z), y =?]
    Append with ⊕:  [x, y, z | man(x), walk(x), beer(z), order(y, z), y =?]
    Resolve y =?:   [x, y, z | man(x), walk(x), beer(z), order(y, z), y = x]
an output one, while drs-conditions impose tests on the input context (observe the conditions f = g in clauses 3–6):

DEFINITION 4. The Interpretation of DRSs
Assuming a first order model M consisting of a set of individuals DM and an interpretation function IM:
1. f [[⟨U, ∅⟩]]M g iff dom(g) = dom(f) ∪ U.
2. f [[K ⊕ ⟨∅, γ⟩]]M g iff f [[K]]M ◦ [[γ]]M g.
3. f [[R(x1, . . . , xn)]]M g iff f = g and ⟨f(x1), . . . , f(xn)⟩ ∈ IM(R).
4. f [[¬K]]M g iff f = g and there is no function h such that f [[K]]M h.
5. f [[K1 ⇒ K2]]M g iff f = g and for every function h such that f [[K1]]M h there is a function k such that h [[K2]]M k.
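A toy model-checker along the lines of Definition 4 can make this relational reading concrete. The finite model, the tuple encoding of conditions and the function names below are illustrative assumptions; equality conditions, intensionality and many other details are omitted.

```python
# A small sketch of Definition 4: drs-conditions as relations between
# variable assignments over a fixed finite model (our own toy model).
DOMAIN = {"m1", "m2"}
INTERP = {"man": {("m1",)}, "walk": {("m1",), ("m2",)}}

def eval_drs(drs, f):
    """All output assignments g with f [[drs]] g (clauses 1-2)."""
    universe, conditions = drs
    outs = [dict(f)]
    for x in universe:                      # clause 1: extend f to new referents
        outs = [dict(g, **{x: d}) for g in outs for d in DOMAIN]
    for cond in conditions:                 # clause 2: thread through conditions
        outs = [h for g in outs for h in eval_cond(cond, g)]
    return outs

def eval_cond(cond, g):
    """Conditions are tests: return [g] if g passes, [] otherwise."""
    op = cond[0]
    if op == "atom":                        # clause 3
        _, pred, args = cond
        return [g] if tuple(g[a] for a in args) in INTERP[pred] else []
    if op == "not":                         # clause 4
        return [g] if not eval_drs(cond[1], g) else []
    if op == "=>":                          # clause 5
        ok = all(eval_drs(cond[2], h) for h in eval_drs(cond[1], g))
        return [g] if ok else []
    raise ValueError(op)

a_man_walks = ({"x"}, [("atom", "man", ["x"]), ("atom", "walk", ["x"])])
print(eval_drs(a_man_walks, {}))            # [{'x': 'm1'}]
every = (set(), [("=>", ({"x"}, [("atom", "man", ["x"])]),
                        (set(), [("atom", "walk", ["x"])]))])
print(bool(eval_drs(every, {})))            # True in this toy model
```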
This looks like a small change to the Tarskian notion of satisfaction: instead of defining semantics in terms of one variable assignment function, we use two functions. And indeed, there is a close correspondence between first-order logic and basic fragments of drt (Fernando, 1994). However, the change is more dramatic than it seems: ¬¬K is not dynamically equivalent to K – indeed, the former simply imposes a
Figure 9. Constructing the drs for (43b):
    Context drs:    [ | [x | man(x)] ⇒ [ | walk(x)]]
    Current drs:    [y, z | beer(z), order(y, z), y =?]
    Append with ⊕:  [y, z | [x | man(x)] ⇒ [ | walk(x)], beer(z), order(y, z), y =?]
    Resolve y =?:   fails – x is inaccessible to y =?
test on the input context while the latter transforms it. This semantic difference surfaces in the treatment of anaphora: a discourse referent that’s introduced inside a double negation isn’t an accessible antecedent to subsequent anaphora while a discourse referent that’s not inside a double negation is.
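Before turning to rhetorical structure, the update procedure sketched above (cf. Figures 8 and 9) can be caricatured in a few lines of Python. The string-based condition format and the pre-computed set of accessible referents are simplifications of ours; real DRT implementations also track number, gender and nested boxes.

```python
# A toy sketch of discourse update: append the new drs with ⊕ and resolve
# each unresolved pronoun condition "y =?" to some accessible referent.
def update(ctx_universe, ctx_conditions, new_universe, new_conditions, accessible):
    universe = ctx_universe | new_universe
    conditions = list(ctx_conditions)
    for cond in new_conditions:
        if cond.endswith("=?"):                         # e.g. "y =?"
            pronoun = cond.split()[0]
            if not accessible:
                raise ValueError(f"no accessible antecedent for {pronoun}")
            conditions.append(f"{pronoun} = {sorted(accessible)[0]}")
        else:
            conditions.append(cond)
    return universe, conditions

# (43a): x (a man) is accessible, so "y =?" resolves to "y = x"
print(update({"x"}, ["man(x)", "walk(x)"],
             {"y", "z"}, ["beer(z)", "order(y, z)", "y =?"], accessible={"x"}))
# (43b): the referent of "every man" is not accessible, so resolution fails
try:
    update(set(), ["[x|man(x)] => [|walk(x)]"],
           {"y", "z"}, ["beer(z)", "order(y, z)", "y =?"], accessible=set())
except ValueError as e:
    print(e)
```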
3. Why Dynamic Semantics needs Rhetorical Structure
Dynamic semantics is used to account for various anaphora: pronouns (e.g., Kamp and Reyle, 1993), tense (e.g., Kamp and Rohrer, 1983) and presupposition (e.g., van der Sandt, 1992), among others. We review this work here to motivate the introduction of rhetorical relations.
Figure 10. The discourse structure of (4): 'John had a lovely evening' is Elaborated by the Narration of 'He had a great meal' and 'He won a dancing competition'; 'He had a great meal' is in turn Elaborated by the Narration of 'He ate salmon' and 'He devoured cheese'.
3.1. Pronouns
Consider text (4) from Lascarides and Asher, 1993:
(4)
π1. John had a great evening last night.
π2. He had a great meal.
π3. He ate salmon.
π4. He devoured lots of cheese.
π5. He won a dancing competition.
π6. ??It was a beautiful pink.
This discourse contains no expressions such as every or not that block discourse referents from being antecedents. drt therefore over-generates the possible interpretations of it in π 6 , allowing it to bind to the salmon in π 3 . Rhetorical relations can help overcome this problem, however. They allow one to reflect the capacity of a discourse to describe things at different levels of detail: for example, one can introduce a relation Elaboration(π 1 , π 2 ) whose semantics entails that the events described in π 2 describe in more detail those described in π 1 ; in contrast, Narration(π 3 , π 4 ) reflects temporal progression between the events, rather than a change in the granularity of description. These relations therefore provide a way of thinking about the content of (4) that’s shown in Figure 10. This figure follows Hobbs (1985) and Asher (1993) in assuming that Elaboration induces subordination (to reflect its semantic function of changing granularity of description) whereas Narration induces coordination. The resulting structure affects anaphora. Most research on
discourse structure assumes what’s known as a right-frontier constraint (e.g., Grosz and Sidner, 1986; Webber, 1991 and others): anaphora in the current clause must be bound to an antecedent which is on the right frontier of the structure. This blocks it in π 6 from binding to the salmon in π 3 , since π 3 isn’t on the right frontier. drt doesn’t introduce discourse referents which denote abstract objects such as propositions, and it therefore under-generates the possible interpretations of this in (5): (5)
π1. One plaintiff was passed over for promotion three times.
π2. Another didn't get a raise for five years.
π3. A third plaintiff was given a lower wage compared to males who were doing the same work.
π4. But the jury didn't believe this.
However, simply extending drt to include such referents would replace the under-generation problem with an over-generating one. Since there are no linguistic expressions such as every, not and if that block discourse referents from being antecedents to anaphora, drt's accessibility constraint would incorrectly predict that this can refer to the second claim alone. But in fact, this can only refer to the last claim or to the sum of the claims (differences in intonation would facilitate these differences in interpretation). Rhetorical relations and the right-frontier constraint help here too: π2 forms a Continuation with π1, the continuation segment elaborating some linguistically implicit topic (such as three plaintiffs made three claims that they are ill-treated), and π3 continues this continuation as shown in (5′).
(5′)  [Topic: Three plaintiffs made three claims that they are ill-treated]
          π1 --Continuation--> π2 --Continuation--> π3
Thus according to the right-frontier constraint π 4 can either be rhetorically connected to the topic (in which case this resolves to the three claims) or to π 3 (in which case this resolves to the third claim). This right-frontier constraint also explains why inserting a sentence expressing the topic between π 3 and π 4 changes the interpretation of this: now the only proposition on the right frontier is the topic, and so this must bind to the three claims.
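A toy rendering of this right-frontier reasoning is sketched below. The division into subordinating and coordinating relations and the edge encoding are simplifying assumptions of ours; sdrt's official formulation appears later as Definition 8.

```python
# An illustrative sketch of the right-frontier constraint: only the last
# utterance and the nodes dominating it via subordinating links are open.
SUBORDINATING = {"Elaboration", "Explanation"}

def right_frontier(edges, last):
    """edges: (relation, parent, child) triples; returns the open labels."""
    parents = {}
    for rel, parent, child in edges:
        if rel in SUBORDINATING:
            parents[child] = parent
    frontier, node = [last], last
    while node in parents:
        node = parents[node]
        frontier.append(node)
    return frontier

# Discourse (5'): the implicit topic is elaborated by the continuation pi1-pi2-pi3
edges = [("Elaboration", "topic", "pi1"),
         ("Elaboration", "topic", "pi2"),
         ("Elaboration", "topic", "pi3"),
         ("Continuation", "pi1", "pi2"),
         ("Continuation", "pi2", "pi3")]
print(right_frontier(edges, "pi3"))   # ['pi3', 'topic']: pi1 and pi2 are closed off
```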
3.2. Temporal Anaphora
Intuitively, (6) is true only if the events all occur in the past and in a definite sequence: i.e., the sequence in which they are mentioned.
(6)
Max fell. John helped him up.
Kamp and Reyle (1993) and others use drt’s mechanisms for anaphoric binding to account for this. Thus only syntax and the form of the drss affect how temporal anaphora are interpreted. As Kamp and Reyle (1993) themselves observe, however, factors other than syntax and form influence temporal interpretation. Lascarides and Asher (1993) use (7) as a counterexample to drt’s analysis: (7)
Max fell. John pushed him.
The natural interpretation of (7) is one where the temporal order of the events mismatches their textual order, and the rules for constructing logical form in (Kamp and Reyle, 1993) yield a drs with the wrong truth conditions. Default world knowledge – about pushing causing fallings, for example – might seem a plausible basis for distinguishing (6) from (7). But in general it’s not sufficient. On its own, default world knowledge would predict the wrong truth conditions in (8) (from Lascarides et al., 1992): (8)
π1. John gave Max a blow to the back of his neck.
π2. Max fell.
π3. John pushed him.
π4. Max rolled over the edge of the cliff.
Reasoning about discourse structure – and in particular, the right frontier constraint mentioned earlier – can explain these textual ordering effects: the proposition π 3 cannot be interpreted as a cause of π 2 , because that would require it to be rhetorically connected to the other cause π 1 , and the right-frontier constraint blocks this. 3.3. Presuppositions Many recent accounts of presupposition have exploited the dynamics in dynamic semantics (Beaver, 1996; Geurts, 1996; Heim, 1982; van der Sandt, 1992). Presuppositions impose tests on the input contexts: either the context must satisfy the presuppositions of the clause (e.g.,
Beaver, 1996; Heim, 1982), or the presuppositions are anaphoric (e.g., van der Sandt, 1992) and so must be bound to elements in the context. When presuppositions fail this test, as they do in (9) (where Jack’s son generates the presupposition that Jack has a son), the presupposition is accommodated or added to the context, provided various constraints are met (e.g., the result must be satisfiable). (9)
If baldness is hereditary, then Jack’s son is bald.
(10)
If Jack has a son, then Jack’s son is bald.
One of the success stories of dynamic semantics has been to use the structure of the contexts to place constraints on accommodation . Van der Sandt (1992) stipulates that presuppositions are accommodated into the part of the context that gives them the widest scope possible, subject to the result being consistent and informative (which means that no part of the asserted content becomes logically redundant once the presupposition is added). In contrast, Beaver (1996) stipulates that they are accommodated in the accessible site that produces the most plausible pragmatic interpretation, though he doesn’t formalise this constraint. Nevertheless, both these theories predict that the presupposition projects from the embedding inside the conditional in (9). Since the context in (10) satisfies the test on the input context (both the satisfaction test and the anaphoric binding one), the presupposition isn’t added to the context and so it doesn’t project out from the embedding inside the conditional. This doesn’t explain some of the data, however. Pragmatics and rhetorical structure also affect presuppositions. To illustrate this, consider texts (11a) and (11b): (11)
a. b.
If David scuba dives, he’ll bring his regulator. If David scuba dives, he’ll bring his dog.
In both cases, the context fails the tests imposed by the presupposed content that’s generated by the possessive np in the consequent (that David has a regulator and that David has a dog respectively). So this content has to be accommodated. According to van der Sandt’s constraints on accommodation, the presupposed content projects out from the conditional in both cases. But although this is correct for (11b), it’s clearly wrong for (11a), which is interpreted as: If David scuba dives then he has a regulator and he’ll bring it. In other words, the constraints on accommodation are too weak, yielding wide scope readings in cases where it should be narrow scope.
100
LASCARIDES AND ASHER
Beaver’s (1996) plausibility constraint seems to do a better job. According to domain knowledge, there is no logical dependency between scuba diving and owning a dog. But one implies there is if one assigns the presupposed content narrow scope relative to the conditional, for then (11b) would mean: If David scuba dives, then he has a dog and he’ll bring it. In contrast, domain knowledge suggests there is a dependency between being a scuba diver and owning a regulator (i.e., you’re much more likely to own a regulator if you scuba dive than if you don’t). And the narrow scope reading of the presupposition for (11a) reflects this dependency. However, Beaver doesn’t formalise this story, and further inspection reveals that it’s not so simple to do so. In particular, measuring the plausibility of the content as a whole that results from some particular interpretation of a presupposition can’t be right: this would always make the narrow scope reading win over the wide scope one, because the former are entailed by the latter and are therefore necessarily more plausible/probable to be true. In particular, measuring plausibility this way (rather than in the way we described above) would predict the wrong reading of (11b): one where the presupposition has narrow scope because this reading doesn’t require John owns a dog to be true while the wide-scope reading does require this (and is therefore necessarily less plausible). But if we don’t measure the plausibility of the whole content, then what are we measuring the plausibility of? One might try to fix this by only blocking interpretations which entail something highly implausible. Since owning a dog is not highly implausible, the wide scope reading of (11b) would not (necessarily) be ruled out. This contrasts with the plausibility of owning a regulator, providing the basis for preferring the narrow scope reading of (11a). But avoiding interpretations that are highly unlikely to be true isn’t right either, because this strategy predicts the wrong interpretation of (12): (12)
I doubt that the knowledge that this seminal logic paper was written by a computer program running on a pc will confound the editors.
The factive noun knowledge generates the presupposition that this seminal logic paper was written by a computer program running on a pc. Given world knowledge about the current state of nlp technology, this is very unlikely to be true! But interpreting the presupposition so that the result reflects default world knowledge results in the wrong prediction – one where the presupposition takes narrow scope and is
DYNAMIC SEMANTICS WITH DISCOURSE STRUCTURE
101
embedded within the referentially opaque context generated by doubt. Thus unless one suitably constrains what one measures the plausibility of, Beaver’s constraints on accommodation are in danger of predicting a narrow scope reading where intuitively we get a wide scope one. Rhetorical relations can offer the means to constrain accommodation appropriately. In essence, they determine what we should be measuring the plausibility of, for they allow us to cache out plausibility in terms of the ‘naturalness’ or overall quality of the different rhetorical links in different candidate interpretations. Let’s assume that, just as asserted content is coherent only if it’s rhetorically connected to something in the context, presupposed content is also coherent only if it’s rhetorically connected to the context. So the semantic scope of a presupposition depends on which part of the context the presupposition binds to with a rhetorical relation. Now, we can assume that on on the one hand, presuppositions have a default tendency to project from embeddings and rhetorically connect to propositions which have widest scope. But on the other hand, to capture the effects of plausibility on interpretation, let’s assume a monotonic principle that we prefer discourse interpretations that maximise the naturalness of the rhetorical links between its constituents. This in fact follows from a more general principle of interpretation that we motivate in (Asher and Lascarides, 2003): the principle of Maximising Discourse Coherence (or mdc). This principle rests on the observation that coherence is not a yes/no matter, but it can vary in quality. And mdc states that one (monotonically) prefers discourse interpretations that are consistent with the compositional semantics of the clauses and maximise coherence. But how does one measure the degree of coherence of a discourse interpretation? sdrt takes a very conservative view on how interpretations are ranked, which is as roughly follows (for a full definition of mdc, see (Asher and Lascarides, 2003): DEFINITION 5. Maximise Discourse Coherence (or mdc) Discourse is interpreted so as to maximise discourse coherence, where the ranking among interpretations are encapsulated in the following principles: 1. All else being equal, the more rhetorical connections there are between two items in a discourse, the more coherent the interpretation. 2. All else being equal, the more anaphoric expressions whose antecedents are resolved, the higher the quality of coherence of the interpretation.
102
LASCARIDES AND ASHER
3. Some rhetorical relations are inherently scalar. For example, the quality of a Narration is dependent on the specificity of the common topic that summarises what went on in the story; the quality of a Contrast is dependent on the extent to which the semantics of the connected propositions are dissimilar (to see this, consider John loves to collect classic cars. But his favourite car is a 1999 Ford Mondeo, which is a ‘better’ contrast than John loves to collect classic cars. hates football). All else being equal, an interpretation which maximises the quality of its rhetorical relations is more coherent than one that doesn’t.
Now consider (11a) again. It seems intuitively plausible that the quality or ‘naturalness’ of the rhetorical relation Consequence in (11a) – as triggered by the word if – is improved if it connects a proposition that John scuba dives to one that includes the content that he has a regulator compared to the alternative where no such logical dependency between scuba diving and owning a regulator is recorded. So by clause 3. of mdc we predict that the presupposition has narrow scope relative to the conditional, the default that presuppositions take wide scope being overridden because mdc is monotonic. In (11b), on the other hand, world knowledge doesn’t support a logical dependency between scuba diving and owning a dog, and so the naturalness of the Consequence link wouldn’t be enhanced by assigning the presupposition narrow scope, and so the default for it to take wide-scope wins. Overall, the strategy we advocate here, contra Hobbs et al. (1993), is to separate the task of constructing the interpretation of an utterance from the likelihood that this interpretation is true. Instead, we aim to construct the interpretation of a discourse by reasoning about the demands that are imposed on it by discourse coherence. This strategy allows us to apply preferences based on likelihood more selectively. 3.4. Some other Phenomena Interpreting definite descriptions typically involves computing a bridging relation to some antecedent, and rhetorical relations affect this. Observe that contrary to world knowledge, the waitress in (13c) is the waitress in the hotel rather than the waitress in the Thai restaurant: (13)
a. b. c.
We had dinner at a Thai Restaurant and then drinks at a fancy hotel on 5th Avenue. The waitress was from Bangkok.
The right-frontier constraint mentioned earlier would predict this, since (13a) is connected to (13b) with the coordinating relation Narration
DYNAMIC SEMANTICS WITH DISCOURSE STRUCTURE
103
and so only (13b) is on the right frontier when (13c) is interpreted. Rhetorical relations also explain the minimal pair (14ab) vs. (14ab ): (14)
a.
John moved engine E1 from Avon to Dansville.
b.
He picked up the boxcar and took it to Broxburn.
b .
He also took the boxcar.
The boxcar in (14ab) is interpreted as a boxcar in Dansville, but changing the rhetorical relation (from Narration to Parallel) through the introduction of also in (14ab ) changes that interpretation: now the boxcar is in Avon when it’s picked up. Word sense disambiguation is similarly effected by discourse structure: (15)
(16)
a.
John bought an apartment.
b.
But he rented it.
a.
The judge asked where the defendant was.
b.
The barrister said he was in the pub drinking.
c.
The bailiff found him slumped beneath the bar.
c .
But the bailiff found him slumped beneath the bar.
The intuitive interpretation of (15ab) is one where rent is interpreted as rent out (i.e., John is the landlord). This is predicted by mdc, because this interpretation supports not only a Contrast relation between the constituents (as required by but) but also Narration (which entails that the buying happened before the renting). The rent-from sense of rent doesn’t support Narration and moreover the quality of the Contrast is worse (though coherent, because one isn’t usually a tenant in an apartment before one buys it). In (16a–c), the noun bar is ambiguous; it has (at least) a pub/placeto-drink sense and a courtroom sense. But it is not ambiguous in this discourse, where the preferred interpretation of bar is its pub sense. Arguably, the most detailed models of word sense disambiguation are stochastic models which are trained on corpora and other on-line resources (e.g., Guthrie et al., 1991; Dagan et al., 1997). But these models cannot fully explain the data in (16): if it predicts the right sense disambiguation of bar in (16c), then it will get the wrong results when (16c) is replaced with (16c ) (and where bar now means the courtroom bar), for the word but is statistically speaking independent of the word bar.
104
LASCARIDES AND ASHER
Using rhetorical relations as a clue to word sense disambiguation yields a different story. Roughly, (16abc) is a narrative: the proposition expressed by each sentence is related by the discourse relation Narration to the proposition expressed by the previous sentence. Narration imposes strong spatio-temporal constraints on the actors and events involved (see Section 4.1): the narrative links are better if the locations of objects in the interpretation of the current clause comply with expectations from the context vs. when they don’t so comply (e.g., if bar in (16c) is replaced with courtroom bar). So mdc predicts that we interpret (16c) so that the defendant is found in the pub and not in court, making bar disambiguate to its pub sense. But (16abc ) has a different interpretation, because the rhetorical role of (16c ) is different. This is related with Contrast (plus perhaps other discourse relations too) to (16b) thanks to the cue phrase but;1 and this has different spatio-temporal effects to the cases where the clauses are related only with Narration (e.g., (16abc)). Roughly, because of mdc, the spatialtrajectory of the objects must be such as to maximise the ‘semantic differences’ of the propositions connected (e.g., the expectations that arise from the content of one proposition should be asserted as false in the other). In this case, this means that interpreting bar as the courtroom bar is preferred. For then the expectation arising from (16b) – that the defendant is not in the courtroom – is asserted as false in (16c ). Thus interpreting bar as the courtroom bar in (16abc ) is predicted by the principle mdc. Using rhetorical relations to analyse (16) is complementary to using world knowledge. The presence of but is crucial to the distinct interpretations of (16abc) and (16abc ), and the fact that but favours interpretations where expectations in the context get violated is a matter of linguistic convention rather than world knowledge.
4. The Logic of Information Content
Having motivated the need for rhetorical relations, we will now extend the language of drss accordingly. We introduce two new expressions: speech act discourse referents label content (either of a clause or of text segments), and keep track of the token utterances in the discourse; and 1
Asher and Lascarides (2003) argue that explicit cues, such as the presence of a cue phrase but, must be present when the Contrast relation conveys a denial of expectation. This is why the clauses (16bc) cannot be interpreted so that they are connected with Contrast.
DYNAMIC SEMANTICS WITH DISCOURSE STRUCTURE
105
rhetorical relations relate speech act discourse referents. The resulting structures are known as segmented drss or sdrss. We start with a definition of sdrs-formulae, which express the content tagged by speech act discourse referents (or “labels”) in sdrss: DEFINITION 6. SDRS-Formulae The well-formed sdrs-formulae are constructed from the following vocabulary: 1. The vocabulary of drss. 2. Labels or speech act discourse referents π, π 1 , π 2 , . . . 3. A set of relation symbols for discourse relations: e.g., Explanation, Contrast, Narration etc. The set Φ of well-formed sdrs-formulae consists of: 1. The set of drss 2. If R is a (2-place) rhetorical relation symbol and π 1 and π 2 are labels, then R(π 1 , π 2 ) is an sdrs-formula. 3. If φ and φ are sdrs-formulae, then so are: (φ ∧ φ ) and ¬φ (where ∧ and ¬ are interpreted dynamically, as in Definition 4).
DEFINITION 7. SDRS or Discourse Structure An sdrs or a discourse structure is a triple A, F, LAST , where: − − −
A is a set of speech act discourse referents; LAST is a member of A (intuitively, this is the label of the content of the last clause that was added to the logical form); and F is a function which assigns each member of A an sdrs-formula.
In addition, the following constraint is imposed on A: let Succ(π, π ) hold just in case F(π) contains the literal R(π , π ) or R(π , π ). Then the transitive closure of Succ (which we also call outscopes) from a partial order over A and there is a unique Succ-supremum π 0 ∈ A. When there is no confusion, we may write A, F instead of A, F, LAST .
Note how F assigns to labels sdrs-formulae that contain labels (indeed, this yields the partial order outscopes on A). This captures the intuition that the contents of clauses ‘group together’ to form coherent text segments. Having the unique supremum π 0 corresponds to assuming that the content of the discourse overall receives a single label.
106
LASCARIDES AND ASHER
Let’s illustrate the definition with a couple of examples. (7 ) is the sdrs for the two-sentence discourse (7): (7)
Max fell. John pushed him.
(7 )
A, F, LAST , where: − A = {π 0 , π 1 , π 2 } x, eπ1 − F(π 1 ) = max(x), fall(eπ1 , x), eπ1 ≺ n y, eπ2 F(π 2 ) = john(x), push(eπ2 , y, x) eπ2 ≺ n F(π 0 ) = Explanation(π 1 , π 2 ) − LAST = π 5
The temporal relation between eπ1 and eπ2 is not explicitly encoded, but instead follows from the semantics of Explanation(π 1 , π 2 ), as we’ll see in Section 4.1. The more complex sdrs (4 ) is the logical form for the first five sentences of (4), so long as K π1 –K π5 are the drss representing the contents of those sentences respectively: (4 )
A, F, LAST , where: − A = {π 0 , π 1 , π 2 , π 3 , π 4 , π 5 , π 6 , π 7 } − F(π 1 ) = K π1 , F(π 2 ) = K π2 , F(π 3 ) = K π3 , F(π 4 ) = K π4 , F(π 5 ) = K π5 F(π 0 ) = Elaboration(π 1 , π 6 ) F(π 6 ) = Narration(π 2 , π 5 ) ∧ Elaboration(π 2 , π 7 ) F(π 7 ) = Narration(π 3 , π 4 ) − LAST = π 5
In words, the overall content of the text, which is labelled π 0 , consists of π 1 (Max having a lovely evening) being elaborated by the narrative (and hence complex proposition) π 6 , which consists of the content of π 2 (having a lovely meal) and π 5 (winning a dancing competition), where π 2 is elaborated by the content of the narrative of π 3 (eating salmon) and π 4 (devouring cheese). We’ll show how to construct this sdrs for (4) in Section 5. The sdrs (4 ) makes use of a convention that for any label π, we call the formula F(π) that it labels K π . We will make use of this
DYNAMIC SEMANTICS WITH DISCOURSE STRUCTURE
107
π0 π 1 ,π 6 π 1 : K π1 π2 , π5 , π7 π 2 : K π2 , π 5 : K π5 Narration(π 2 , π 5 ) π0 : π6 :
π 3 ,π 4 π 7 : π 3 : K π3 , π 4 : K π4 , Narration(π 3 , π 4 ) Elaboration(π 2 , π 7 )
Elaboration(π 1 , π 6 )
Figure 11.
The sdrs (4 ) in drt-style notation
convention from now on. We also adopt other conventions for writing sdrss. For example, one can use the ‘box-style’ notation familiar from drt and our earlier work (e.g., Asher, 1993; Asher and Lascarides, 1998). First, we convey F(π) = φ by writing π : φ. We then convey the “immediate outscopes” relation Succ as follows: If Succ(π, π ), then π appears in the top strip of K π and π : K π in the main part of K π . Thus Figure 11 is just another way of coding up the sdrs (4 ). Another way of coding up sdrss explicitly shows which rhetorical relations are subordinating and which are coordinating, as shown in Figure 12. sdrt’s constraints on anaphora (see Definition 8) ensures that antecedents must be drs-accessible on the right frontier of this structure (unless Contrast or Parallel are present, where this principle breaks down). The definition of sdrss allows two labels π 1 and π 2 to be related by more than one rhetorical relation. This plurality allows an utterance to make more than one illocutionary contribution to the discourse. For example, π 1 and π 2 in (15) are related by both Contrast (as indicated by but) and Narration (because as we will see shortly, this ensures the right temporal effects, that the buying precedes the renting):
π1 [John had a lovely evening] is elaborated by π6; within π6, π2 [He had a great meal] is connected by Narration to π5 [he won a dance competition] and elaborated by π7; within π7, π3 [he ate salmon] is connected by Narration to π4 [he devoured cheese].

Figure 12. A graphical representation of the sdrs (4′)

(15)
π1. John bought an apartment
π2. but he rented it.
It also allows a given utterance to be rhetorically connected to more than one proposition in the context. This also allows an utterance to make more than one illocutionary contribution to the discourse, this time to more than one part of the context. For example, an adequate interpretation of dialogue (17) requires the relations Correction(π 2 , π 3 ) and Elaboration(π 1 , π 3 ) to both be identified, yielding the sdrs depicted in Figure 13, for logically inferring one of these relations is co-dependent on inferring the other. (17)
π1. A: Max owns several classic cars.
π2. B: No he doesn't.
π3. A: He owns two 1967 Alfa spiders.
Before examining the semantics of sdrss, let's define the constraints on anaphora, for, like the accessibility constraints of drt, these are defined in terms of the form of an sdrs but not its interpretation. We first define which labels in the sdrs new information can attach to with a rhetorical relation, and then define constraints on antecedents in terms of that.
π1 [Max owns several classic cars] is connected by Correction to π2 [No he doesn't]; π3 [He owns two 1967 spiders] is connected by Correction to π2 and by Elaboration to π1.

Figure 13. The sdrs for (17)
The following definition doesn't apply when the rhetorical relations Contrast or Parallel are present; for details of the special effects of these relations, see (Asher and Lascarides, 2003).

DEFINITION 8. Availability
Let ⟨A, F, LAST⟩ be an sdrs, and let Kβ (which we label β) be new information. Then β can attach with a rhetorical relation to:
1. the label α = LAST;
2. any label γ such that:
   a) Succ(γ, α); or
   b) F(λ) = R(γ, α) for some label λ, where R is a subordinating discourse relation (Elaboration, Explanation etc.).
   We gloss this as α < γ;
3. Transitive Closure: any label γ that dominates α through a sequence of labels γ1, ..., γn such that α < γ1 < ... < γn < γ.
Let Kβ contain an anaphoric condition ϕ. Then the available antecedents are:
1. in Kβ and drs-accessible to ϕ; or
2. in Kα, drs-accessible to any condition in Kα, and there is a condition R(α, γ) in the sdrs such that γ = β or γ outscopes β (i.e., γ is related to β by a sequence of Succ relations).
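The attachment clauses of Definition 8 lend themselves to a direct computation. The following rough sketch (ours, building on the SDRS class sketched after (7′)) collects the available attachment sites; the inventory of subordinating relations is an assumption made purely for illustration.

SUBORDINATING = {"Elaboration", "Explanation"}   # assumed inventory, for illustration

def below(sdrs, alpha, gamma):
    """alpha < gamma: Succ(gamma, alpha), or some F(lambda) contains R(gamma, alpha)
    with R subordinating (clause 2 of Definition 8)."""
    if sdrs.succ(gamma, alpha):
        return True
    return any(rel in SUBORDINATING and a == gamma and b == alpha
               for lits in sdrs.F.values() for (rel, a, b) in lits)

def available_attachment_sites(sdrs):
    alpha = sdrs.LAST
    sites, frontier = {alpha}, {alpha}
    while frontier:                          # transitive-closure clause of Definition 8
        new = {g for g in sdrs.A for a in frontier if below(sdrs, a, g)} - sites
        sites |= new
        frontier = new
    return sites

# For the sdrs (4'), pi3 is correctly excluded from the result.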
This definition ensures that antecedents to anaphora must be drs-accessible on the right frontier of the discourse structure (unless Contrast and Parallel are present, where in certain cases drs-inaccessible
discourse referents become available; see Asher and Lascarides (2003) for details). This means that It was a beautiful pink in (4) isn't acceptable: the discourse referent introduced by salmon is in Kπ3, and π3 is not on the right frontier, and so It was a beautiful pink cannot attach to it. Thus sdrt's constraints on anaphora refine those of drt. They also refine those of ai-based theories of discourse structure which adopt the right-frontier constraint (e.g., Grosz and Sidner, 1986; Webber, 1991). sdrt correctly predicts that (43b) is odd, because although π2 can attach to π1, the discourse referent introduced by every man in π1 isn't drs-accessible in Kπ1, and hence it's not available to he in π2:
(43)
b. π1. Every man walked in.
   π2. ??He ordered a beer.
4.1. The Dynamic Semantics of SDRSs

By supplying truth definitions for all sdrs-formulae, one can assign an interpretation to the formula F(π0), where the label π0 outscopes all other labels in the sdrs (a unique such label must exist by the definition of sdrss). Typically, F(π0) is some boolean combination of formulae of the form R(πi, πj) (e.g., see (4′)). We also define the interpretation of R(πi, πj) in terms of Kπi and Kπj (i.e., F(πi) and F(πj)). Thus, we can recursively unpack the semantics of an sdrs, culminating in interpreting the contents of the clauses. These contents are typically drss, with the semantics given in Definition 4. sdrt's semantics in fact extends drt's semantics. We need to assign a semantics to rhetorical relations. For the sake of simplicity, we will restrict attention here to an extensional approximation of the intensional semantics for rhetorical relations (to handle intensional relations one needs to make contexts into pairs of the form (w, f), where w is a possible world and f is an assignment function, and this induces more complications than we want to go into here). Unlike predicate symbols such as love within the drt vocabulary, rhetorical relations are not interpreted as imposing tests on the input information state. Rather, they define a real transition across information states. For example, veridical rhetorical relations satisfy the schema given below:

− Satisfaction Schema for Veridical Rhetorical Relations
  f [[R(π1, π2)]]M g  iff  f [[Kπ1]]M ∘ [[Kπ2]]M ∘ [[φR(π1,π2)]]M g

where φR(π1,π2) expresses the semantic constraints pertinent to the particular rhetorical connection R(π1, π2).
Veridical relations include Narration, Explanation, Elaboration, Background, Contrast and Parallel. This schema ensures that these relations entail the two propositions they connect. It contrasts with non-veridical relations such as Alternation (which is the sdrt way of expressing or), and relations such as Correction, where f [[Correction(π, π′)]]g entails f [[¬Kπ]]f (for a full definition of Correction see Asher and Lascarides, in press). More generally, rhetorical relations act semantically like complex update operators, and their interpretation reflects the special semantic influence that they have on the propositions they connect. It also reflects their status as speech acts: like other kinds of actions, they change the context. We'll focus attention on veridical relations from now on. Interpreting sdrss involves defining the values of φR(π1,π2) for various relations R. For most relations, φR(π1,π2) is defined in terms of Kπ1 and Kπ2 or the discourse referents introduced in them. For example, Asher and Lascarides (in press) define φNarration(π1,π2) to mean that the end of the main eventuality (or semantic index in hpsg terms) eπ1 in Kπ1 overlaps, both in space and time, with the beginning of the main eventuality eπ2 in Kπ2. This ensures that so long as the logical form of (6) contains Narration(π1, π2), the interpretation of this logical form entails the temporal progression of the events (i.e., Max falling precedes John helping him up). It also places the boxcar in the narrative (14ab) in Dansville (for the event of picking up the boxcar starts in Dansville, since this is where the event in (14a) ended). The location of the boxcar is different in (14ab′) because Parallel(π1, π2) imposes different spatio-temporal constraints from Narration. In contrast, φExplanation(π1,π2) entails cause(eπ2, eπ1). Thus the logical form (7′) of (7) entails that the pushing caused the falling, even though the compositional semantics of the clauses don't entail this (note that we have assumed in (7′) that him binds to Max; by availability this is the only choice).
(7)
Max fell. John pushed him.
We'll examine how one constructs this logical form for (7) in Section 5. Here we show how it's interpreted. This sdrs relates the input variable assignment function f to g iff the content Kπ0 that F assigns to the highest label π0 does this, i.e., f [[Kπ0]]M g. Kπ0 is Explanation(π1, π2). So f [[Kπ0]]M g iff f [[Explanation(π1, π2)]]M g. According to the semantics of Explanation this holds iff:
1. there is an h such that f [[Kπ1]]M h, and
2. there is an i such that h [[Kπ2]]M i, and
3. i [[cause(eπ2, eπ1)]]M g.
Clause 1 holds iff h is defined for x and eπ1 and ⟨h(eπ1), h(x)⟩ ∈ IM(fall) etc. Clause 2 holds iff i extends h such that ⟨i(eπ2), i(y), i(x)⟩ ∈ IM(push) etc. Finally, clause 3 holds iff i = g and ⟨i(eπ2), i(eπ1)⟩ ∈ IM(cause). In other words, Max fell, John pushed him, and the latter event caused the former. We have now introduced the language of sdrss and their dynamic semantic interpretation, which in turn makes sense of the availability constraint on anaphora which we defined in terms of rhetorical structure. The question now arises as to how one constructs these logical forms for discourse.
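As a toy illustration of this extensional semantics (ours, not part of the chapter's formalism), the following sketch treats drs-updates as relations between variable assignments and renders a veridical relation as the composition demanded by the satisfaction schema; the tiny model and all the names in it are assumptions.

def compose(*updates):
    def composed(f):
        outs = [f]
        for upd in updates:
            outs = [h for g in outs for h in upd(g)]
        return outs
    return composed

# Toy drs-updates for K_pi1 (Max fell) and K_pi2 (John pushed him) over a tiny model.
FALL, PUSH, CAUSE = {("e1", "max")}, {("e2", "john", "max")}, {("e2", "e1")}

def K_pi1(f):   # introduces x and e_pi1
    return [{**f, "x": "max", "e_pi1": "e1"}] if ("e1", "max") in FALL else []

def K_pi2(f):   # introduces y and e_pi2, with x anaphorically bound to Max
    return [{**f, "y": "john", "e_pi2": "e2"}] if ("e2", "john", f["x"]) in PUSH else []

def phi_explanation(f):   # test: cause(e_pi2, e_pi1)
    return [f] if (f["e_pi2"], f["e_pi1"]) in CAUSE else []

explanation = compose(K_pi1, K_pi2, phi_explanation)
print(explanation({}))    # -> [{'x': 'max', 'e_pi1': 'e1', 'y': 'john', 'e_pi2': 'e2'}]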
5. Constructing Logical Form
To construct a conceptually clean account of how to reason about discourse structure and construct logical forms, sdrt distinguishes between the sdrss themselves and a language in which we describe them. As interpreters attempt to reconstruct the intended logical form of a discourse, they must confront many ambiguities: the grammar and lexical semantics typically underdetermine the intended logical form, thanks to semantic scope ambiguities, anaphora of various kinds such as pronouns and presuppositions, and lexical ambiguities. sdrt contains a description-language Lulf which allows us to analyse and to reason about such underdetermination (ulf stands for underspecified logical form). sdrt's glue logic then defines the pragmatically preferred ways of resolving semantic underspecification.

5.1. The Language Lulf of Semantic Underspecification

The language Lulf partially describes the form of sdrss. It allows us to express how a given knowledge source, such as the grammar, yields only partial information about content. Let's clarify the idea with an example. Ignoring rhetorical relations for now, sentence (18) contains a two-way semantic scope ambiguity between the indefinite np and might, and an anaphoric ambiguity, as given by him.
(18)
A man might push him.
Let’s assume that the discourse context is such that there are two available antecedents for him: z 1 and z 2 . Then there are four fully determinate logical forms for (18). Two of these are shown in Figure 14
Figure 14. Two logical forms for (18), shown as trees: the first tree is ∃(x, man(x), might(push(x, y) ∧ y = z1)); the second is might(∃(x, man(x), push(x, y) ∧ y = z2)).
(the first one corresponds to ∃ outscoping might and him resolving to z1, and the second corresponds to might outscoping ∃ and him resolving to z2); note that these trees show the form of the determinate logical forms. We want this form to be all that the description-language Lulf 'knows' about. In fact, the two trees in Figure 14 will each correspond to a model of Lulf, so that M |=Lulf φ corresponds to: the ulf φ (partially) describes the unique determinate logical form that corresponds to M. But how do we express partial descriptions of such trees? In this example, what's the formula or ulf φ in Lulf that describes just the four trees or determinate logical forms for (18) and no others? Well, following the usual strategy (Bos, 1995; Asher and Fernando, 1997; Copestake et al., 1999), Lulf's vocabulary consists of labels which pick out nodes in the trees of Figure 14. These labels allow one to talk independently about, on the one hand, the logical connectives, predicate symbols and variables that are present in the determinate logical form and, on the other hand, the way they are combined. Thus labels tag bits of content (as expressed in the sdrs-language); in fact, all constructors (∧, =, man, x etc.) in the sdrs-vocabulary become predicate symbols over labels in Lulf. We can then express partial information about semantic scope by underdetermining the outscopes constraints on labels (in this case, the ulf will underdetermine the relative semantic scopes of the label that tags ∃ and the label that tags might). Information about anaphoric conditions amounts to not knowing the value of an sdrs-discourse referent (at least, for pronouns referring to individuals). Discourse referents become one-place predicates in Lulf, the argument of the predicate being reserved for the label that tags its position in the 'trees' of the kind shown in Figure 14. So the compositional semantics of a pronoun involves not knowing the value of a one-place predicate in Lulf, and is thus represented with a higher-order variable.
Figure 15. A graphical representation of the ulf (18′): the labelled constructors of (18′) (∃, might, ∧, push and the anaphoric condition), with curved arrows conveying the outscopes conditions.
For simplicity, we gloss the anaphoric condition as x =? as given earlier, although in fact one should think of x and ? as one-place predicate symbols in Lulf, and one should also bear in mind that this gloss ignores the labels indicating their position in the trees of Figure 14. So the ulf for (18) (in simplified notation, where labels that don't contribute to the semantic ambiguities are ignored) is (18′); we've shown this graphically in Figure 15 (where curved arrows convey the outscopes conditions).
(18′)
l1 : ∃(x, man(x), l2) ∧ l3 : might(l4) ∧ l5 : ∧(l6, l7) ∧ l6 : push(x, y) ∧ l7 : x =? ∧ outscopes(l4, l5) ∧ outscopes(l2, l5)
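To see what such a ulf amounts to computationally, here is a small illustration (our own; the label and antecedent names are assumptions) of (18′) as a partial description: labelled constructors plus outscopes constraints, which leave open only the relative scope of ∃ and might and the value of the anaphoric condition. Enumerating the admissible choices recovers exactly the four determinate logical forms for (18).

from itertools import product

labelled = {                     # label -> the constructor it tags (schematically)
    "l1": "exists(x, man(x), l2)",
    "l3": "might(l4)",
    "l5": "and(l6, l7)",
    "l6": "push(x, y)",
    "l7": "anaphor = ?",         # the pronoun's unresolved value
}
outscopes = {("l4", "l5"), ("l2", "l5")}     # the constraints stated in (18')

scope_choices = [("l1", "l3"), ("l3", "l1")]  # exists over might, or vice versa
antecedents = ["z1", "z2"]                    # assumed discourse antecedents

resolutions = [(hi, lo, ant)
               for (hi, lo), ant in product(scope_choices, antecedents)]
assert len(resolutions) == 4    # the four fully determinate logical forms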
Lulf can also express underspecified information about rhetorical connections. For example, ?(π 1 , π 2 , π 0 ) expresses the information that π 1 and π 2 are rhetorically connected and the resulting connection is labelled π 0 , but the value of the rhetorical relation is unknown (as shown by the higher-order variable ‘?’). The compositional semantics of sentence-initial but includes the underspecified condition Contrast(?1 , π 2 , ?2 ) (again, this is a notational gloss for the formula involving higher-order variables), where π 2 is the top label of the clause that’s syntactically outscoped by But in the grammar. This indicates that sentence-initial but generates a Contrast relation between the label of a proposition that’s not determined by the grammar (although it may be by some other knowledge source) and the label of the proposition
denoted by the clause that's syntactically outscoped by but, and the label that's assigned to this Contrast connection in the sdrs is also unknown. The satisfaction relation |=Lulf is defined relative to finite first-order models (i.e., trees like those in Figure 14), and the higher-order variables which are used to express unknown values of predicate symbols etc. in the sdrs are interpreted substitutionally. In fact, |=Lulf is monotonic, extensional, static and decidable. This contrasts with the logic of sdrss themselves, which is dynamic and undecidable. The difference comes from the fact that |=Lulf consists of reasoning only about the form of sdrss but not their (dynamic) interpretation. In essence, |=Lulf relates a ulf to all possible ways of resolving the underspecification, making it unnecessary to define separately a notion of supervaluation, contra Reyle (1993). However, the sdrt framework not only defines the possible ways of completing an underspecified logical form, it defines the pragmatically preferred ways of doing it. This is part of the definition of discourse update, which is defined in terms of the glue logic.

5.2. The Glue Logic

The glue logic of sdrt defines a nonmonotonic consequence relation |∼g over Lulf. Together with the principle mdc described earlier, this defines the pragmatically preferred interpretations of ulfs. In general, pragmatically preferred interpretations are more informative: they are a subset of the possible interpretations. Or to put it another way, |∼g generates more consequences than |=Lulf. The glue logic has only limited access to the logic of sdrss: since it accesses only ulfs, it knows about the forms of sdrss but not their (dynamic) interpretations. It also has only restricted access to information in domain knowledge, the lexicon and cognitive states. The relationship between these richer knowledge sources and their shallow form in the glue language is very similar to the relationship between sdrss proper and their corresponding ulfs. Building these 'porous' fences between the information sources that contribute to discourse interpretation and the logic in which logical form is constructed is the only way of ensuring that constructing logical form – or equivalently, computing what is said – is computable. The glue logic in combination with mdc then determines the following logically dependent information:
1. the (pragmatically preferred) values of certain underspecified conditions that are generated by the grammar;
2. which labels are rhetorically connected to which other labels (this is equivalent to the task of text segmentation);
3. the values of the rhetorical relations.
This information is computed on the basis of inferences over default axioms within the glue logic, written A > B (which is read as If A then normally B). These express information about pragmatically preferred values of underspecified conditions in a given ulf. sdrt thus enriches dynamic semantics with contributions from pragmatics in a constrained way. It's a contribution from pragmatics in that the default axioms are justified on the basis of pragmatic information such as domain knowledge and cognitive states; it's constrained because of its limited access to these information sources. Many glue-logic axioms are schemata of the form (19) (where α, β and λ are metavariables over sdrs-labels π1, π2 etc.):
(?(α, β, λ) ∧ Info(α, β, λ)) > R(α, β, λ)
In words, if β is to be attached to α with a rhetorical relation and the result is labelled λ, and the information Info(α, β, λ) about α, β and λ (transferred into the glue logic from more expressive languages such as that of sdrss, the lexicon, domain knowledge and cognitive states) holds, then normally the rhetorical connection is R. Observe that Info(α, β, λ) expresses information from rich knowledge sources that contribute to discourse interpretation in a shallow form: for example, the discourse content present in sdrss is transferred into the glue logic in a shallow form, as expressed in Lulf. For example, Narration stipulates that if β is to be connected to α and α occasions β, then normally the relation is Narration:
− Narration: (?(α, β, λ) ∧ occasion(α, β)) > Narration(α, β, λ).
Scriptal knowledge can be used to infer occasion predicates (by default), and such knowledge takes the following form: if two event types of a certain kind (φ and ψ) are to be related, then occasion can normally be inferred:
− Scripts for Occasion: (?(α, β, λ) ∧ φ(α) ∧ ψ(β)) > occasion(α, β)
Of course this isn't a general schema in the sense that any φ and ψ will do. Instances of Scripts for Occasion will depend on the particular semantic contents of the clauses involved. For example, we assume there's scriptal information or 'domain knowledge' that if x's falling and y's helping x up are connected somehow, then the former occasioned
the latter (we forego giving the formal axiom here, but see Asher and Lascarides (in press)). By contrast, Explanation(α, β) can be inferred when there's evidence in the discourse that β causes α. Evidence of a causal relation is distinct from a causal relation actually holding; the glue logic expresses evidence in the discourse of a causal relation with causeD(β, α) and the actual causal relation between events with cause(eβ, eα); note that the former does not entail the latter. However, given the default rule Explanation and the semantics of Explanation given earlier, evidence in the discourse of a causal relation non-monotonically yields an sdrs which does entail an actual causal relation (e.g., the sdrs in (7′)):
− Explanation: (?(α, β, λ) ∧ causeD(β, α)) > Explanation(α, β, λ).
Glue-logic axioms for inferring causeD(β, α) are monotonic, for either the discourse contains evidence of a causal connection or it doesn't. For example, Causation and Change stipulates that if eα describes a change (of some kind, e.g., a change in location) in y, and eβ describes a force that would cause a change (of that same kind, in this case a change in location), then causeD(β, α) holds:
− Causation and Change: (change(eα, y) ∧ cause-change-force(eβ, x, y)) → causeD(β, α)
This rule applies in the analysis of (7):
(7)
π1. Max fell.
π2. John pushed him.
Lexical semantics stipulates that fall is a verb describing a change in location and push is a verb describing a force that causes a change in location (see Asher and Lascarides (in press) for detailed motivation for this position). Moreover, if the discourse is coherent, then π1 must be connected to π2 with a rhetorical relation. Hence ?(π1, π2, π0) holds. By Definition 8, this means that the only available antecedent for the pronoun is Max. Thus the information about content that's transferred into the glue logic from Lulf (and the lexicon) verifies the antecedent of Causation and Change, and so causeD(π2, π1) is inferred. Thus the antecedent to Explanation is verified, and so Explanation(π1, π2, π0) is non-monotonically inferred. The definition of sdrt update given in the next section uses this output of the glue logic to ensure that the (pragmatically preferred) logical form for (7) is the sdrs in (7′). Given the semantics of this sdrs – and in particular the dynamic interpretation of Explanation(π1, π2) – discourse (7) is correctly predicted to mean: Max fell, John pushed Max, and the latter caused the former.
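The chain of inferences just described can be mimicked with a very crude sketch (ours; it uses string 'facts' purely for illustration and ignores genuine conflict resolution among competing defaults, which the real glue logic must handle):

facts = {
    "?(pi1, pi2, pi0)",                  # pi2 attaches to pi1, result labelled pi0
    "change(e_pi1, x)",                  # from the lexical semantics of 'fall'
    "cause-change-force(e_pi2, y, x)",   # from the lexical semantics of 'push'
}

def monotonic_closure(facts):
    # Causation and Change (monotonic)
    if {"change(e_pi1, x)", "cause-change-force(e_pi2, y, x)"} <= facts:
        facts = facts | {"causeD(pi2, pi1)"}
    return facts

def apply_defaults(facts):
    # '>' rules: fire when the antecedent holds and nothing blocks the consequent.
    if "?(pi1, pi2, pi0)" in facts and "causeD(pi2, pi1)" in facts:
        facts = facts | {"Explanation(pi1, pi2, pi0)"}   # Explanation default
    elif "?(pi1, pi2, pi0)" in facts and "occasion(pi1, pi2)" in facts:
        facts = facts | {"Narration(pi1, pi2, pi0)"}     # Narration default
    return facts

print(apply_defaults(monotonic_closure(facts)))
# includes Explanation(pi1, pi2, pi0), as required for the sdrs (7')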
The relation Elaboration is inferred via axioms that are analogous to the ones for Explanation, save that now the discourse evidence is for subtype information (written subtypeD):
− Elaboration: (?(α, β, λ) ∧ subtypeD(β, α)) > Elaboration(α, β, λ)

5.3. Discourse Update

The consequences of the glue logic are used to build the sdrs for discourse (7). Discourse update in sdrt is entirely declarative, and in the absence of divergent relations such as Correction (which we won't consider here), it's also a monotone decreasing function, thereby reflecting the idea that one monotonically accumulates more information as the discourse is interpreted. Discourse update is defined as a sequence of simple (and monotone decreasing) update operations +, where + is defined in terms of the glue-logic consequence relation |∼g. This simple update operation works over the set σ of sdrss which represents the content of the discourse context. The ulf which describes this set of sdrss is the theory of σ in Lulf, written Th(σ). + also takes as input some new information: this is either a ulf Kβ (e.g., this could be the ulf of a clause as generated by the grammar), or it is an assumption ?(α, β, λ) about attachment, where Th(σ) |=Lulf Kβ (in other words, the ulf Kβ is part of the description of logical form already). The result of + over these arguments is a set σ′ of discourse structures which (a) is a subset of the old information σ in that it satisfies this old information and also the new information; and (b) also ensures that any |∼g-consequences of the old information and the new are satisfied too. Note that all monotonic consequences are nonmonotonic consequences, and so ensuring that the updated context satisfies (b) will make it satisfy (a) as well (because the old and the new information follows monotonically from itself). More formally:

DEFINITION 9. The Simple Update +
Let σ be a set of (fully-specified) discourse structures, and let ψ be either (a) a ulf Kβ, or (b) a formula ?(α, β, λ) about attachment, where Th(σ) |=Lulf Kβ. Then σ + ψ is a set of sdrss defined as follows:
1. σ + ψ = {τ : if Th(σ), ψ |∼g φ then τ |=Lulf φ}, provided the result is not ∅;
2. σ + ψ = σ otherwise.
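A schematic rendering of Definition 9 (ours; the glue-logic consequence relation and the satisfaction test are passed in as assumed functions rather than implemented) makes the filtering character of + explicit:

def simple_update(sigma, psi, glue_consequences, satisfies):
    """sigma: set of candidate sdrss; psi: a new ulf or an attachment assumption.
    glue_consequences(sigma, psi) stands in for all phi with Th(sigma), psi |~g phi;
    satisfies(tau, phi) stands in for tau |=Lulf phi."""
    constraints = glue_consequences(sigma, psi)
    updated = {tau for tau in sigma
               if all(satisfies(tau, phi) for phi in constraints)}
    return updated if updated else sigma        # clause 2: fall back to sigma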
Recall that sdrss are in effect models of the ulf-logic Lulf . τ |= Lulf φ means that φ (partially) describes τ . Simple update is thus defining
the set of sdrss which comply with a partial description of the logical form of the discourse, this partial description being the conjunction of the |∼g-consequences as shown (making it a formula of Lulf). In essence, + defines a constraint-based approach to constructing logical form: it uses the old information and the new to accumulate constraints on the form of the sdrs which ultimately represents the interpretation of the updated discourse. Discourse update updatesdrt itself is very conservative: it remains neutral about what's attached to what. In other words, suppose that A is the set of available attachment sites in the old information σ for the new information β. Then the power set P(A) represents all possible choices for what labels αi in σ the new label β is actually attached to. updatesdrt is neutral about which member of P(A) is the 'right' choice, for updatesdrt(σ, Kβ) is the union of sdrss that result from a sequence of +-operations for each member of P(A) (we forego giving the formal definition here, but see Asher and Lascarides (in press) for details). Since updatesdrt is defined in terms of +, it is also a monotone decreasing function, reflecting the idea that interpreting discourse amounts to a (monotonic) accumulation of information. Any satisfiable set of statements in Lulf describes a countably infinite set of equivalence classes of sdrss (where equivalence is alphabetic variance). To see why, simply observe that a discourse can continue in an indefinite number of ways. So the output of + can be a countably infinite set. This has no adverse computational effects on sdrt update, however. Performing updates is simply a matter of accumulating more and more constraints in the description language Lulf as |∼g-consequences, as shown above. If at any point during discourse processing one wants to actually interpret the discourse (so far), then one needs to construct all pragmatically preferred sdrss which satisfy the description (that's accumulated so far). Note that while the glue logic uses pragmatic information to compute rhetorical relations, thereby ensuring that + eliminates some pragmatically inadmissible logical forms, ranking the models in the update is done via the principle mdc given in Definition 5 together with the following additional factor that determines ranking: we prefer models (or sdrss) with a minimum number of labels. The content of a discourse at any given point will be those things that follow from the highest ranked sdrss in the update. In essence, only a subset of sdrss in the update are the ones that 'matter', and because of the minimality constraint there are a finite number of these (up to alphabetic variance).
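Before turning to (4), here is a toy ranking key (our own reading of mdc plus the minimality constraint, reusing the SDRS sketch from earlier; how unresolved conditions are recorded is an assumption):

def rank_key(sdrs):
    """Lower is better: fewer labels, more rhetorical connections,
    fewer unresolved (underspecified) conditions."""
    return (len(sdrs.A),
            -sum(len(lits) for lits in sdrs.F.values()),
            len(getattr(sdrs, "unresolved", ())))

# preferred = min(updated_candidates, key=rank_key)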
This ranking of models by mdc (plus minimality) is in fact what influences our inferences about what’s rhetorically connected to what in more complex discourses such as (4). (4)
π1. John had a great evening last night.
π2. He had a great meal.
π3. He ate salmon.
π4. He devoured lots of cheese.
π5. He won a dancing competition.
The output of updatesdrt for (4) contains many different sdrss, each with different assumptions about which labels are rhetorically connected. However, the highest ranked sdrss in the update according to mdc are those with the minimum number of labels, the maximum number of rhetorical connections, the fewest unresolved semantic ambiguities (including anaphoric conditions) and no inconsistencies. These principles determine that the sdrs in Figure 11 is the preferred model in the ranking. A full analysis is given in (Asher and Lascarides, in press), but to illustrate the point we focus on one particular decision: what π5 attaches to. Given the interpretation of the prior context π1–π4, Definition 8 means that there are five available labels for π5 to attach to: π4 (because this is the LAST label), π7 (because it immediately outscopes π4), π2 (because it's attached to π7 with the subordinating relation Elaboration), π1 (because it's attached to π2 with Elaboration) and π0 (because it immediately outscopes π1 and π2). Note that π3 isn't available. Thus updatesdrt will output the sdrss that follow from any combination of these attachment assumptions, and mdc must then rank these choices. There are no glue-logic axioms which allow us to infer occasion, subtypeD or causeD for linking π5 and π4, and so if π4 is one of the actual attachment sites, then the update would include the underspecified condition ?(π4, π5, π′). The same holds for the link between π6 and π5. However, attempting to attach π5 to just π1 and π2 yields something more coherent according to our assumptions, in that the update won't include these (rhetorical) underspecifications. There is subtype information we can exploit in attaching π5 to π1, yielding Elaboration(π1, π5). So eating the meal and winning the dance competition are both part of the evening. This additional information verifies an occasion-axiom for π5 and π2, yielding Narration(π2, π5). So mdc determines that the sdrss with highest ranking are those where π5 attaches to π1 and to π2, but not to π0 (such an attachment would
not allow us to resolve the pronoun he in π 5 ), π 4 or π 6 . This is exactly the sdrs shown in Figure 11. As we explained earlier, similar reasoning involving mdc predicts the correct interpretations of the presuppositions in (11a) vs. (11b), the lexical sense disambiguations in (15) and (16), and the bridging inferences in (13) and (14) though we forego spelling out the formal details here.
6. Conclusion
We have presented brief highlights of Segmented Discourse Representation Theory or sdrt. sdrt is distinct from other dynamic semantic theories in that it enriches logical forms to include rhetorical relations, to which sdrt assigns a semantics, making them complex update operators. Indeed, all logical forms are interpreted compositionally and dynamically. sdrt refines the accessibility constraint on anaphora, replacing it with the notion of availability which takes both logical structure and rhetorical structure into account. Because logical structure is a factor in blocking off antecedents, it also refines the right-frontier constraint from ai-based work on discourse structure which ignores this information source. sdrt includes a language Lulf in which logical forms of discourse are described. This language essentially knows about the form of an sdrs, but not its (dynamic) interpretation. It’s a language in which one can express information about semantic underspecification, and its consequence relation captures the relation between an underspecified logical form and all its possible interpretations. Discourse update in sdrt takes this further, defining a relation between an underspecified logical form and its pragmatically preferred interpretations. This is achieved via (a) the glue logic, which consists of axioms describing default values for certain underspecified semantic conditions; and (b) the principle mdc, which imposes a ranking on the set of sdrss that are output by updatesdrt , as determined by the glue logic consequence relation |∼g . The process of computing these pragmatically preferred logical forms is decidable, unlike the interpretation mechanisms described in much of the ai-research (e.g., Hobbs et al., 1993; Lochbaum, 1998; Grosz and Sidner, 1990; Traum and Allen, 1994). We believe that this is crucial for an adequate model of semantic competence, since it’s essential to
explaining why competent language users by and large agree on what was said, if not its consequences.
References

Asher, N.: 1993, Reference to Abstract Objects in Discourse. Kluwer Academic Publishers.
Asher, N. and T. Fernando: 1997, 'Labelling Representation for Effective Disambiguation'. In: Proceedings of the 2nd International Workshop on Computational Semantics (IWCS-2). Tilburg, pp. 1–4.
Asher, N. and A. Lascarides: 1998, 'The Semantics and Pragmatics of Presupposition'. Journal of Semantics 15(2), 239–299.
Asher, N. and A. Lascarides: in press, Logics of Conversation. Cambridge University Press.
Asudeh, A. and R. Crouch: 2001, 'Glue Semantics for hpsg'. In: Proceedings of HPSG-2001. Trondheim, Norway.
Beaver, D.: 1996, 'Local Satisfaction Preferred'. In: P. Dekker and M. Stokhof (eds.): Proceedings of the 10th Amsterdam Colloquium. ILLC, Amsterdam.
Bos, J.: 1995, 'Predicate Logic Unplugged'. In: Proceedings of the 10th Amsterdam Colloquium. Amsterdam, pp. 133–143.
Copestake, A., D. Flickinger, I. A. Sag, and C. Pollard: 1999, 'Minimal Recursion Semantics: An Introduction'. Available from http://www-csli.stanford.edu/~aac.
Copestake, A., A. Lascarides, and D. Flickinger: 2001, 'An Algebra for Semantic Construction in Constraint-based Grammars'. In: Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL/EACL 2001). Toulouse, pp. 132–139.
Dagan, I., L. Lee, and F. C. N. Pereira: 1997, 'Similarity-based Methods for Word Sense Disambiguation'. In: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and the 8th Meeting of the European Chapter of the Association for Computational Linguistics (ACL97/EACL97). Madrid, pp. 56–63.
Davidson, D.: 1980, Essays on Actions and Events. Clarendon Press.
Fernando, T.: 1994, 'Bisimulations and Predicate Logic'. Journal of Symbolic Logic 59(3), 924–944.
Geurts, B.: 1996, 'Local Satisfaction Guaranteed'. Linguistics and Philosophy 19, 259–294.
Groenendijk, J. and M. Stokhof: 1991, 'Dynamic Predicate Logic'. Linguistics and Philosophy 14, 39–100.
Grosz, B., A. Joshi, and S. Weinstein: 1995, 'Centering: A Framework for Modelling the Local Coherence of Discourse'. Computational Linguistics 21(2), 203–226.
Grosz, B. and C. Sidner: 1986, 'Attention, Intentions and the Structure of Discourse'. Computational Linguistics 12, 175–204.
Grosz, B. and C. Sidner: 1990, 'Plans for Discourse'. In: P. R. Cohen, J. Morgan and M. E. Pollack (eds.): Intentions in Communication. pp. 365–388.
Guthrie, J., L. Guthrie, Y. Wilks, and H. Aldinejad: 1991, 'Subject-Dependent Co-Occurrence and Word Sense Disambiguation'. In: Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics. Berkeley, pp. 146–152.
Heim, I.: 1982, 'The Semantics of Definite and Indefinite Noun Phrases'. Ph.D. thesis, University of Massachusetts.
Hobbs, J. R.: 1979, 'Coherence and Coreference'. Cognitive Science 3(1), 67–90.
Hobbs, J. R.: 1985, 'On the Coherence and Structure of Discourse'. Technical Report csli-85-37, Center for the Study of Language and Information, Stanford University.
Hobbs, J. R., M. Stickel, D. Appelt, and P. Martin: 1993, 'Interpretation as Abduction'. Artificial Intelligence 63(1–2), 69–142.
Kamp, H.: 1981, 'A Theory of Truth and Semantic Representation'. In: J. A. G. Groenendijk, T. M. V. Janssen and M. B. J. Stokhof (eds.): Formal Methods in the Study of Language. Mathematisch Centrum, Amsterdam, pp. 277–322.
Kamp, H. and U. Reyle: 1993, From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer Academic Publishers.
Kamp, H. and C. Rohrer: 1983, 'Tense in Texts'. In: R. Bäuerle, C. Schwarze and A. von Stechow (eds.): Meaning, Use and Interpretation of Language. Berlin: de Gruyter, pp. 250–269.
Koller, A., K. Mehlhorn, and J. Niehren: 2000, 'A Polynomial-time Fragment of Dominance Constraints'. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL2000). Hong Kong.
Lascarides, A. and N. Asher: 1993, 'Temporal Interpretation, Discourse Relations and Commonsense Entailment'. Linguistics and Philosophy 16(5), 437–493.
Lascarides, A., N. Asher, and J. Oberlander: 1992, 'Inferring Discourse Relations in Context'. In: Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics (ACL92). Delaware, pp. 1–8.
Lochbaum, K. E.: 1998, 'A Collaborative Planning Model of Intentional Structure'. Computational Linguistics 24(4), 525–572.
Mann, W. C. and S. A. Thompson: 1987, 'Rhetorical Structure Theory: A Framework for the Analysis of Texts'. International Pragmatics Association Papers in Pragmatics 1, 79–105.
Montague, R. M.: 1974, 'The Proper Treatment of Quantification in Ordinary English'. In: Formal Philosophy: Selected Papers of Richard Montague. Yale University Press, pp. 247–270.
Muskens, R.: 1996, 'Combining Montague Semantics and Discourse Representation'. Linguistics and Philosophy 19, 143–186.
Reyle, U.: 1993, 'Dealing with Ambiguities by Underspecification: Construction, Interpretation and Deduction'. Journal of Semantics 10, 123–179.
Stone, M. and R. Thomason: 2002, 'Context in Abductive Interpretation'. In: Proceedings of the 6th International Workshop on the Semantics and Pragmatics of Dialogue (Edilog). Edinburgh.
Traum, D. and J. Allen: 1994, 'Discourse Obligations in Dialogue Processing'. In: Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL94). Las Cruces, New Mexico, pp. 1–8.
van der Sandt, R.: 1992, 'Presupposition Projection as Anaphora Resolution'. Journal of Semantics 9(4), 333–377.
Webber, B. L.: 1991, 'Structure and Ostension in the Interpretation of Discourse Deixis'. Language and Cognitive Processes 6(2), 107–135.
RAQUEL FERNÁNDEZ, JONATHAN GINZBURG, HOWARD GREGORY AND SHALOM LAPPIN
SHARDS: FRAGMENT RESOLUTION IN DIALOGUE
1. Introduction
A major challenge for any grammar-driven text understanding system is the resolution of fragments. Basic examples include bare NP answers (44a), where the bare NP John is resolved as the assertion John saw Mary, and sluicing (44b), where the wh-phrase who is interpreted as the question Which student saw John.
(44) a. A: Who saw Mary? B: John
     b. A: A student saw John. B: Who?
Either the antecedent or the fragment (or both) may be embedded:
(45) a. A: Bill wonders who saw Mary. B: John. (John saw Mary)
     b. A: Bill thinks a student saw John. B: Who? (Which student does Bill think saw John?)
     c. A: Who saw Mary? B: John thinks Bill. (John thinks Bill saw Mary)
     d. A: A student saw John. B: Bill wonders who. (Bill wonders which student saw John)
2. Theoretical Background
2.1. Ellipsis Resolution: A Theory of Context and Parallelism

The task of accounting for many ellipsis phenomena can be viewed as involving: (a) locating an element in the context (the source) parallel
to the ellipsis element (the target); and (b) computing from contextual information a property which, applied to the target, yields the resolved content. This view underlies work on Higher Order Unification (HOU) (Dalrymple et al., 1991; Pulman, 1997), and also the Dynamic Syntax approach of (Kempson et al., 1999). We adopt a similar approach in this paper. We extend our account to adjuncts, and we briefly consider an alternative approach to adjunct fragments. We also provide an explicit account of the relation between this parallelism and dialogue context (see Section 5 below). We adapt the situation semantics-based theory of dialogue context developed in the kos framework (Ginzburg, 1996; Ginzburg, 1999; Cooper et al., 1999). This combines a structuring of the propositional common ground of conversation (Webber, 1991; Asher, 1993) with a modelling of discourse topic based on (Carlson, 1983). In (Ginzburg and Sag, 2001) this framework is integrated into recent work in Head Driven Phrase Structure Grammar (HPSG) (Pollard and Sag, 1994; Sag, 1997). Following (Ginzburg and Sag, 2001) we define two new attributes within the context (ctxt) feature structure: Maximal Question Under Discussion (max-qud), whose value is of sort question, and Salient Utterance (sal-utt), whose value is a set of elements of type sign. In this framework, questions are represented as semantic objects comprising a set of parameters – that is, restricted indices – and a proposition prop as in (46). This is the feature structure counterpart of the λ-abstract λπ(. . . π . . .). In a wh-question the params set represents the abstracted index values associated with the wh-phrase(s). For a polar question the params set is empty.
(46)  question
      [ params { π, ... }
        prop proposition[ sit ..., soa soa(... π ...) ] ]
In general a number of such questions may be available in a given discourse context, of which one is selected as the value of max-qud. An algorithm is given below for the simple cases discussed in the present system, but it will be apparent that the system is flexible enough to allow for extension to more complicated dialogues.
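For concreteness, the following minimal Python rendering (ours; SHARDS itself is encoded in ProFIT, and the field values here are simplified assumptions) shows the shape of the question objects in (46) and of the two contextual attributes:

from dataclasses import dataclass, field

@dataclass
class Question:
    params: set       # restricted indices abstracted over; empty for polar questions
    prop: dict        # the open proposition, e.g. {"soa": ("see-rel", "x", "mary")}

@dataclass
class Context:
    max_qud: Question = None                      # Maximal Question Under Discussion
    sal_utt: set = field(default_factory=set)     # Salient (sub)utterance(s), as signs

# 'Who saw Mary?' as a question object:
who_saw_mary = Question(params={"x"}, prop={"soa": ("see-rel", "x", "mary")})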
The feature sal-utt represents a distinguished constituent of the utterance whose content is the current value of max-qud. In information structure terms, sal-utt can be thought of as a means of underspecifying the subsequent focal (sub)utterance or as a potential parallel element. max-qud corresponds to the ground of the dialogue at a given point.1 Since sal-utt is a sign, it enables us to encode syntactic categorial parallelism, including case assignment for the fragment.2 sal-utt is computed as the (sub)utterance associated with the role bearing widest scope within max-qud: − For wh-questions, sal-utt is the wh-phrase associated with the params set of the question.3 − If max-qud is a question with an empty params set, the context will be underspecified for sal-utt. Its possible values are either the empty set or the utterance associated with the widest scoping quantifier in max-qud. This will be invoked to resolve sluicing.4 Our grammar also includes the non-local feature c(ontextual)param(eter)s, introduced by (Ginzburg and Cooper, 2004). This feature encodes the entire inventory of contextual parameters of an 1
For related notions concerning information structure see (Vallduv´ı, 1992; Krifka, 1992; Rooth, 1993; Grosz et al., 1995). In contrast to these works, the notions we utilise here are applied directly within a computational theory of dialogue. 2 We invoke syntactic parallelism only for matching conditions between an elliptical form and a prior utterance. Thus, our approach is compatible with psycholinguistic work demonstrating the rapid decay of purely structural information (Fletcher, 1994). Indeed, by relating max-qud and sal-utt in the way we do, our approach enables us to make strong predictions about the range of possible relations of categorial parallelism. 3 More generally, the utterance associated with the params set when this is non-empty. An empty params set can arise when the antecedent is not an interrogative clause, for example in reprise or echo questions (mentioned in Section 6; see Ginzburg and Sag, 2001 for detailed discussion). In such cases, sal-utt will be the utterance of the constituent to be clarified. 4 sal-utt can also be a set containing more than one member in contexts where max-qud is a multiple question, as in (i) below. We leave the analysis of such phenomena to future research. (i) A: Who arrived when? B: Jo at 5, Mustafa at 7.
utterance (proper names, indexicals and so on).5 The values of c-params get amalgamated via lexical heads and are propagated in the same way as other non-local features, such as slash and wh. The set of parameters is inherited from head daughter to mother within headed structures by a Generalised Head Feature Principle (see Ginzburg and Sag, 2001).

2.2. A Grammatical Framework for Fragments

We adopt a version of HPSG, following (Sag, 1997; Ginzburg and Sag, 2001), which encodes information about phrases by cross-classifying them in a multi-dimensional type hierarchy. Phrases are classified not only in terms of their phrase structure schema or X-bar type, but also with respect to the informational dimension of clausality. Clauses are divided into inter alia declarative clauses (decl-cl), which denote propositions, and interrogative clauses (inter-cl), denoting questions. Each maximal phrasal type inherits feature values from both these dimensions. This classification allows us to specify systematic correlations between clausal construction types and semantic content types. In line with much recent work in HPSG and Categorial Grammar, we do not treat ellipsis by positing a phonologically null head. Rather, we assign fragments to a subtype of the phrasal type head-only-phrase.6 We first deal with fragments that constitute arguments, and then turn to adjuncts in Section 4. Bare argument phrases are analysed by means of the phrasal type headed-fragment-phrase (hd-frag-ph). The top-most constraint associated with this type is shown in (47). This constraint has two significant effects. First, it ensures that the category of the head daughter (the fragment) is identical to that specified by the contextually provided sal-utt. Second, the constraint coindexes the head daughter with the sal-utt. This will have the effect of 'unifying in' the content of the former into a contextually provided content. Thus, the (sub)utterance in the antecedent picked up by sal-utt links the bare phrase to the appropriate argument-role, and enforces categorial identity.
The presence of this feature allows signs to play a role similar to the role traditionally associated with ‘meanings’, i.e. to function as abstracts with roles that need to be instantiated. See (Ginzburg and Cooper, 2004) for more discussion. 6 In former versions of the system (see Ginzburg et al., 2001), fragments were analysed as non-headed bare phrases, the fragment being the non-head daughter.
(47) hd-frag-ph:
     [ head          v[vform fin]
       ctxt|sal-utt  [ cat 1, cont|index 2 ]
       hd-dtr        [ cat 1, cont|index 2 ] ]
We define two types by means of which we analyse argument fragments. These are both subtypes of hd-frag-ph: declarative-fragment-clause (decl-frag-cl) for "short answers" and sluice-interrogative-clause (slu-int-cl) for sluices. These subtypes are also subtypes of decl-cl and inter-cl, respectively. As a result, their content is in the first case a proposition and in the second a question. We start by considering decl-frag-cl. The only information, beyond that inherited from hd-frag-ph and decl-cl, which remains to be specified concerns the scoping of quantifiers and the semantic content. Whereas in most headed clauses the content is entirely (or primarily) derived from the head daughter, here it is constructed for the most part from the contextually salient question (max-qud). This provides the values for the situation and nucleus features of the phrase's content. With respect to quantifier scoping, we assume the following: − Quantifier scoping. If the bare phrase is (or contains) a quantifier Q, then Q is scoped in wider than the existing quantifiers, if any, in the contextually salient question (max-qud).7 The constraint particular to decl-frag-cl is, hence, as represented in (48). This constraint identifies the sit and nucleus values of the phrase's content with those of the max-qud. It also ensures that if the head daughter contributes a parameter to the store, due to the presence of a wh-phrase, that parameter remains stored, i.e. is included in the mother's store value. Turning to sluices (slu-int-cl), most existing linguistic work on sluicing has assumed (on the basis of embedded uses in monologue) that
For motivation for this view see (Ginzburg, 1999; Ginzburg and Sag, 2001). In the latter work apparent exceptions to this are analysed on the basis of Skolem function interpretation of wh-phrases.
only existentially quantified propositions can serve as antecedents (see e.g. Chung et al., 1995; Reinhardt, 1997). However, the examples in (49), taken from (Ginzburg and Sag, 2001), show that the context for sluicing crucially involves the QUD-maximality of a question of the form whether p, where p is quantified and its widest-scoping quantifier is non-negative.8
(48) decl-frag-cl:
     [ head     [ ic + ]
       cont     proposition[ sit 2, soa[ quants order(Σ3) ⊕ A, nucl 5 ] ]
       store    Σ3 ∪ Σ1 : set(param)
       max-qud  question[ params non-empty-set,
                          prop proposition[ sit 2, soa[ quants A, nucl 5 ] ] ]
       hd-dtr   [ store Σ1 ] ]
(49) a. A: Many dissidents have been released. B: Do you know who?
     b. A: Did anyone show up for class today? B: Yes. A: Who?
(Singular) definites are an exception to this, allowing as they do only reprise/echo sluices: (i) A: The murderer was obviously a vicious guy. B: #who?/WHO?/#You don’t know who?/ (ii) # which murderer was obviously a vicious guy? Our account will associate the non-reprise sluice (i) with a content essentially synonymous with the non-elliptical (ii), which is indeed infelicitous in this context (i.e. as a non-reprise sluice).
     c. A: Can anyone solve the problem? B: Gee, I wish I knew who.
     d. A: No student supported the proposal. B: hmm, # I wonder who (cf. I wonder who did.)
As with decl-frag-cl, the type slu-int-cl inherits a significant part of its specification through being a subtype of hd-frag-ph and inter-cl. The conditions that are specific to slu-int-cl pertain to content, which like decl-frag-cl is partially determined by the context, and quantifiers. We assume that:
− Quantifier replacement. The widest scoping quantifier Q in max-qud's quants list is removed from the quants list of the content of a slu-int-cl. Thus, the widest scoping quantifier, if any, in the open proposition of the question after resolution will be whichever quantifier, if any, was previously scoped just narrower than Q.9
The constraint particular to slu-int-cl is shown in (51). The wh-phrase contributes a parameter to the store value of the head daughter, which is constrained to be a non-empty set of parameters. The parameter is then retrieved by identifying the head daughter's store value with the clause's params set. This analysis can be extended to account for reprise sluices (50a) and elliptical literal reprises (50b).
(50) a. A: Mary sang. B: WHO?
     b. A: Did Jo kowtow? B: JO?
(Lappin, 2002) points out that there are exceptions to this condition, as in (i). (i) A: Each student will consult a supervisor. B: Which one? The most natural interpretation of B’s utterance in (i) is Which supervisor will each student consult?. This reading takes the narrower scoped quantified NP ‘a supervisor’ as the sal-utt. We will be refining the scope conditions associated with decl-frag-cl and slu-int-cl, and the procedure for identifying the sal-utt in future work. (Ginzburg and Sag, 2001) discuss cases of this kind in terms of an analysis that uses Skolem functions.
(51) slu-int-cl:
     [ cont          question[ params Σ1,
                               prop proposition[ sit 1, soa[ quants A, nucl 3 ] ] ]
       store         {}
       ctxt|max-qud  question[ params {},
                               prop proposition[ sit 1,
                                                 soa[ quants ⟨n-neg-qf-rel⟩ ⊕ A, nucl 3 ] ] ]
       hd-dtr        [ store Σ1 : non-empty-set(param) ] ]
The resolution of reprise sluices such as B's utterance in (50a) is achieved by allowing a conversational participant to coerce a clarification question onto max-qud.10 The reprise sluice is then analysed as a direct-in-situ-interrogative-clause (dir-is-int-cl), a phrasal type introduced by (Ginzburg and Sag, 2001) to analyse direct in-situ constructions like 'You gave the book to who?' and intonation questions such as 'You're hungry?'. The head daughter of a construction like (50a) is a decl-frag-cl in which the parameter which constitutes the content of the wh-phrase WHO remains in storage. The retrieval of this parameter is effected by the type dir-is-int-cl, which allows a question to be constructed by retrieving from storage zero or more parameters from a proposition-denoting head daughter. This is achieved by identifying the prop value of the mother with the content value of the head daughter. The parameter introduced by the wh-phrase is included in the clause's params set. Elliptical literal reprises like (50b) are also analysed by means of dir-is-int-cl. Again, a clarification question is coerced onto max-qud, which allows us to analyse the fragment using the type decl-frag-cl as
For a detailed account of these coercion mechanisms see (Ginzburg and Cooper, 2004).
a head daughter of dir-is-int-cl. The crucial difference is that in (50b) there is no parameter to retrieve from storage, leading to a question with an empty params set, in other words a polar question (‘Are you asking if JO kowtows?’).
3. An Implemented System for Short Answers and Bare Sluiced Questions
Our fragment interpretation system consists of four main components:
I. An HPSG grammar. This is a substantially modified version of the grammar employed by (Lappin and Gregory, 1997; Gregory and Lappin, 1999; Ginzburg et al., 2001), extended as described in (Fernández, 2002) to cover an important part of the wide coverage grammar proposed in (Ginzburg and Sag, 2001). The grammar uses the types and features specified in Section 2 and is encoded in (Erbach, 1995)'s ProFIT system.
II. A dialogue record. When a clause has been parsed (and any ellipsis resolved as described below), its attribute value matrix (AVM) is first converted into a transitive network of Mother-Daughter-Relations (MDR list) and then stored in a dialogue record paired with an index (counter).11 A list of max-qud candidates is computed from the value of the cont feature of each subclause and stored as a further component of the discourse record (the qud-list or candidate list).
III. A CONTEXT resolution procedure. This assigns values from the dialogue record to the max-qud and sal-utt features of the current clause C, according to the procedures specified in Section 2. The most recent element of the qud-list which is compatible with the type constraints imposed by the bare argument phrase is selected as the value of max-qud.12 On the basis of the conditions
In the simple dialogue sequences implemented so far, this indexing corresponds to a linear sequence of utterances, but the format can be enriched to capture more complicated dialogue structures. 12 (Lappin, 2002) motivates the need for a more refined procedure to select the antecedent of a fragment phrase. He presents cases in which recency is overridden and more distant antecedents are preferred to type compatible candidates.
indicated in Section 2.1, the sal-utt is obtained from the sign whose content provides max-qud.
IV. A bare clause resolution procedure. This computes the cont of C as already described. The nucleus N is identified with that of the max-qud, and the index of the head daughter identified with that of the sal-utt, with the specified operations on the params set and on the quants list. If the head daughter is an argument, its cat is identified with that of sal-utt, enabling it to be assigned case.
The system as described produces AVMs for bare answers and reprise and non-reprise sluices corresponding to the structures argued for in the previous section. An instructive case is the dialogue sequence in (52), where "cascaded" bare answers and sluices interact to give the specified interpretation of the final fragment Mary.
(52) A: Who saw John?
     B: A girl. (= A girl saw John.)
     A: Who? (= Which girl saw John?)
     B: Mary. (= A girl called Mary saw John)
The simplified AVM for this final bare answer (various contextual restrictions on indices are suppressed) is shown in (54). Note that the procedure for assigning the cat value to the fragment rules out cases of category mismatch such as those in (53):13
(53) a. A: Who saw Mo? B: #to Jo.
     b. A: Whose book did you read? B: His/#He/#Him
Similarly, the coindexation of the sal-utt and the head daughter in the cont of the bare clause identifies cases of semantic mismatch between the source and target.
This paper focuses on English. The existence of syntactic parallelism across utterances is far easier to demonstrate in case rich languages such as German, Russian, or Greek. For such data see e.g. (Ginzburg, 1999; Ginzburg and Sag, 2001).
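Components III and IV can be pictured with the following sketch (ours, reusing the Question rendering above; 'compatible' stands in for the categorial checks, and the index substitution is a crude stand-in for the coindexation imposed by (47)–(48)):

def resolve_context(qud_list, fragment_cat, compatible):
    """Component III: the most recent type-compatible question wins (cf. footnote 12)."""
    for question, wh_utt in reversed(qud_list):   # qud_list ordered oldest -> newest
        if compatible(fragment_cat, wh_utt):
            return question, wh_utt               # -> max-qud, sal-utt
    return None, None

def resolve_bare_answer(fragment_index, max_qud, wh_index):
    """Component IV (arguments only): reuse max-qud's nucleus, identifying the
    abstracted index with the fragment's index."""
    soa = max_qud.prop["soa"]
    return tuple(fragment_index if arg == wh_index else arg for arg in soa)

# 'Who saw Mary?' -- 'John.'
q = Question(params={"x"}, prop={"soa": ("see-rel", "x", "mary")})
print(resolve_bare_answer("john", q, "x"))   # ('see-rel', 'john', 'mary')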
(54)
[ phon     mary
  cat      S[fin]
  c-params { 6, 7 [ index 1, rest {girl(1), person(1)} ] }
  cont|prop|soa 3
  ctxt [ max-qud  question[ params { 7 },
                            prop|soa 3 [ see-rel, see-er 1, see-ee 2 ] ]
         sal-utt  { [ cat 5 NP[nom], index 1 ] } ]
  hd-dtr [ phon mary, cat 5, cont 6 [ index 1, rest named(mary(1)) ] ] ]
4. Adjunct Sluices
Until now we have dealt with cases of bare answers and sluices corresponding to arguments. Our system also covers adjunct sluices. We assume the following general account of the semantics of adjuncts. An adjunct has a cont of type soa, i.e. a relation (adjunct-relation) with a parameter corresponding to a property of a soa, and a role whose value is identified with the soa of the modified head. (55)
     cont [ adj-rel
            param    param
            soa-role soa ]
On this account, interrogative adjuncts are interpreted as abstraction over properties of soas. As shown in (56), if the adjunct is interrogative, then its index represents abstraction over the corresponding type of relation (e.g. temporal or causal).
(56) a. when:
cont: at-rel [param: [index 1, restr time(1)], soa-role: soa]
b. When did John see Mary?
params: { 1 [index 2, restr time(2)] }; soa: at-rel [param 1, soa-role: see-rel [see-er 3, see-ee 4]]
The framework sketched above already accommodates short answers to adjunct questions such as (57):14
(57) a. A: When did Jo leave? B: At 2.
     b. A: Why did Bo leave? B: Because he was unhappy.
A bare answer to a question with an interrogative adjunct would take that adjunct as its sal-utt value and substitute its own relation for the latter's index. In similar fashion we can also accommodate sluicing where an antecedent exists:
(58) a. A: Jo left at some point yesterday. B: When?
14 An issue whose discussion we defer to future work concerns the categorial parallelism requirements associated with adjuncts, which appear to be somewhat freer than with arguments: (i) A: When did Jo leave? B: At 2/Yesterday/Recently.
b. A: Bo shot herself for a reason. B: Hmm. Why/What?
There are, however, cases where there is no such antecedent, as in the following examples:
(59) a. A: John saw Mary. B: When? b. A: John likes Mary. B: Why?
One approach we could adopt to deal with such cases is to subsume them under the analysis we give for (58). To do this, one needs to posit a mechanism of ‘existential adjunct accommodation’, which provides for the requisite existentially quantified antecedent. Such an approach has the advantage that it requires no extra grammatical apparatus as such, but it does involve an ‘adjustment’ of the content of the antecedent, which is computationally problematic in that it frequently relies on non-deterministic (specifically, abductive) principles of inference.15 An alternative approach, which we adopt in our current system,16 is to take the sal-utt in such cases as empty (i.e. there is no parallel constituent to be picked up). Consequently, we posit an additional phrasal type bare-soa-modifier-phrase (bare-soa-mod-ph) to support the constraints needed for the interpretation of bare adjuncts. The specification for this type is as follows: 15
From the perspective of dialogue processing this 'adjustment' can be viewed as an inference initiated by the addressee of the original utterance. Hence, it need not be viewed as post hoc reanalysis. This requires an approach to context, such as that developed in KoS and implemented in the GODIS system at Gothenburg (see e.g. (Bohlin et al., 1999)), in which distinct dialogue participants can diverge in their view of what constitutes the contextual common ground. Our system as it stands does not accommodate such mismatches.
16 See (Fernández and Ginzburg, 2002a) for a similar approach to bare adjuncts.
(60) bare-soa-mod-ph:
     store: {}
     max-qud | prop | soa | nucl: 1
     hd-dtr: [cat: adv, cont: adj-rel [soa-role: 1]]
We posit a type sluice-bare-adjunct-clause (slu-bare-adj-cl), which is a subtype of inter-cl and bare-soa-mod-ph. This entails that it denotes a question and that the information specified for bare-soa-mod-ph is inherited. The sole additional information required by slu-bare-adj-cl concerns the semantic content of the clausal fragment and the retrieval of the index associated with the wh-phrase:
(61) slu-bare-adj-cl:
     cont: question [params: { 2 }, prop | soa | nucl: 1 ]
     hd-dtr: [cont: 1 , store: { 2 }]
The clause’s nucleus is identified with the content of the head daughter and the parameter contributed by the wh-phrase is included in the clause’s params set. The (truncated) AVM which our system generates for the adjunct sluice when in (59a) is as shown in (63). Given this treatment of adjuncts, we can also accommodate adverbial answers to polar questions such as the following: (62) A: Was Bo sent home? B: Probably. Such cases are analysed as instances of a type bare-adjunct-clause (bare-adj-cl). This is a subtype of both bare-soa-mod-ph and decl-cl, which inherits its specification entirely from these two types.
(63) [Truncated AVM generated for the adjunct sluice When in (59a): the cont is a question with params { 0 [index 1, restr time(1)] } and prop | soa | nucl 2 = at-rel [param 0, soa-role 3 see-rel [see-er j, see-ee m]]; ctxt contains max-qud [params {}, prop | soa | nucl 3] and an empty sal-utt; the hd-dtr has cont 2 and store { 0 }; c-params includes 0.]
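The effect of the constraints in (60) and (61) on an adjunct sluice such as (59a) can also be pictured procedurally. The following sketch is only an illustration under our own simplified encoding: the function name, the dictionary representation and the parameter names are invented here, whereas in the implemented grammar these identities are stated declaratively over typed feature structures.

```python
# Illustrative sketch (not the actual grammar): resolving the adjunct sluice
# "When?" in (59a) against the antecedent soa of "John saw Mary".

def resolve_adjunct_sluice(antecedent_soa, adjunct_relation, param_restr):
    """slu-bare-adj-cl, informally:
       - the clause's nucleus is the head daughter's content, an adjunct
         relation whose soa-role is the antecedent soa (bare-soa-mod-ph);
       - the wh-phrase's parameter is retrieved from store into params;
       - sal-utt is empty: there is no parallel constituent to pick up."""
    param = "t0"                                     # index contributed by the wh-phrase
    nucleus = {"rel": adjunct_relation,              # e.g. at-rel for "when"
               "param": param,
               "soa_role": antecedent_soa}
    return {"type": "question",
            "params": {param: param_restr(param)},   # e.g. time(t0)
            "nucleus": nucleus,
            "sal_utt": None}                         # empty, unlike argument sluices

antecedent = {"rel": "see", "see_er": "john", "see_ee": "mary"}
when_sluice = resolve_adjunct_sluice(antecedent, "at-rel", lambda p: f"time({p})")
print(when_sluice)
# question with params {'t0': 'time(t0)'} and nucleus at-rel(t0, see(john, mary)),
# i.e. roughly "at what time t0 did John see Mary?"
```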
5. Comparison with Other Approaches
The system described here shares many features with the Higher-Order Unification (HOU) account of ellipsis resolution (Dalrymple et al., 1991; Pulman, 1997). However, it differs from HOU in three important respects. First, while HOU does not indicate how the relation of parallelism between the fragment and a counterpart term in the antecedent clause is specified, we provide an explicit definition of the counterpart term (as the phrase that supplies the value of sal-utt for the bare clause).17 Second, unlike HOU, we impose a syntactic matching condition on the category of the fragment and its counterpart term. This permits us to rule out cases of categorial mismatch, as discussed above. Because the entire AVM of an antecedent is recoverable from the discourse 17
(Gardent and Kohlhase, 1997) combine a version of HOU with an abductive calculus for computing parallel elements. This invokes relations defined over a hierarchical classification of the lexical semantics of the subconstituents, following the work of (Hobbs, 1991) on parallelism in discourse. To the best of our knowledge this system has not yet been implemented.
record, we can invoke additional syntactic constraints on fragment interpretation if these are required (cf. (Lappin and Gregory, 1997) for other cases of ellipsis resolution where additional syntactic matching conditions are invoked). Third, HOU is not possible when the semantic type of the target is distinct from that of the source, as when the target denotes a question and the source is a proposition or vice versa. (Pulman, 1997) attempts to bypass this problem by associating propositional contents with interrogatives. However, as the formal semantic literature on interrogatives suggests (Groenendijk and Stokhof, 1997), such a move is not semantically viable. Our system can handle such mismatches in semantic type by using the soa and quants features of different clause types present in the discourse record to specify the max-qud and sal-utt values of a bare clause. (Lappin and Shih, 1996) propose a generalised algorithm for ellipsis resolution, which was implemented in (Lappin and Gregory, 1997). Bare NP fragments are treated as the non-head daughters of clauses with empty heads, which are replaced in ellipsis resolution by an antecedent head. Counterpart arguments are replaced by the overt fragment, while non-counterpart arguments (and adjuncts) are copied into the ellipsis site. The present model avoids the need for empty heads and full syntactic reconstruction.
6. Conclusions and Future Work
We have presented the main features of SHARDS, a system for resolving fragments in dialogue within a typed feature structure grammar. The system provides a procedure for computing the content values of clausal fragments from contextual information contained in a discourse record of previously processed sentences. The system has served as the basis for a variety of work on elliptical constructions. (Purver, 2004) describes an implementation of the different readings and forms of clarification requests within a TrindiKit-based dialogue system which incorporates the ellipsis resolution capability of SHARDS, together with the dialogue move engine GODIS (Cooper et al., 2001; Larsson, 2002). SHARDS has also been used as the basis of a generation module. (Ebert et al., 2004) developed an algorithm which generates full paraphrases for interpreted fragments in a dialogue management system.
The theoretical work on ellipsis resolution described in Section 2 has supported two large scale corpus studies. One focuses on the available means for posing clarification requests, of which 40% turn out to be elliptical forms (Purver et al., 2001). The other attempts to characterise the entire class of fragmentary utterances in a conversational corpus (Fernández and Ginzburg, 2002a; Fernández and Ginzburg, 2002b; Fernández and Ginzburg, 2002c). This latter study proposes a taxonomic scheme consisting of 16 classes of fragmentary utterances, a substantial part of which can be analysed within the framework described in Section 2. Our current aim is to extend the system to cover a wider range of elliptical utterances. These include e.g. verb phrase ellipsis like (64) (see (Lappin and Gregory, 1997; Gregory and Lappin, 1999) for an analysis; see also (Nielsen, 2003) for a corpus-based study of VP ellipsis).
(64) A: Mary wants vodka. B: Rosa does too.
Our future research is concerned with the decision procedures required for choosing the antecedent of a fragment in dialogue. In this respect, the heuristics currently employed by SHARDS select the most recent clause which is compatible with the syntactic and semantic constraints imposed by the elliptical utterance. However, to account for dialogue sequences like (65) (where the fragment phrase to surprise you is a reply to A's first question rather than the second), this recency measure needs to be refined.
(65) A: Why did Mary arrive early? B: I can't tell you.
     A: Why can't you tell me? B: Okay, if you must know, to surprise you.
One of our main concerns is to develop a robust computational procedure for identifying the antecedents of fragmentary utterances, with the aim of implementing a wide coverage system for fragment interpretation in dialogue.
Acknowledgements
Earlier versions of this paper were presented at the NLP colloquium of the Cambridge Computer Laboratory (March 2000), the University
College London Workshop on Syntax and Pragmatics (April 2000), the IBM Research NLP Colloquium in Hawthorne, NY (June, 2000), and the Workshop on Linguistic Theory and Grammar Implementation at ESSLLI 2000. We are grateful to the participants of these forums for useful discussion of some of the ideas in this paper. In particular we would like to thank Steve Pulman and Richard Brehny for helpful comments. This research project was funded by grant number R00022269 from the Economic and Social Research Council of the United Kingdom, by grant number AN2687/APN 9387 from the Arts and Humanities Research Board of the United Kingdom, by grant number GR/R04942/01 from the Engineering and Physical Sciences Research Council, and by grant number RES-000-23-0065 from the Economic and Social Research Council of the United Kingdom.
References
Asher, N.: 1993, Reference to Abstract Objects in English: a Philosophical Semantics for Natural Language Metaphysics, Studies in Linguistics and Philosophy. Dordrecht: Kluwer.
Bohlin, P., R. Cooper, E. Engdahl, and S. Larsson: 1999, 'Information states and dialogue move engines'. Gothenburg Papers in Computational Linguistics.
Carlson, L.: 1983, Dialogue Games, Synthese Language Library. Dordrecht: Reidel.
Chung, S., B. Ladusaw, and J. McCloskey: 1995, 'Sluicing and Logical Form'. Natural Language Semantics 3, 239–282.
Cooper, R., S. Larsson, J. Hieronymus, S. Ericsson, E. Engdahl, and P. Ljunglöf: 2001, 'GODIS and Questions Under Discussion'. In: The TRINDI Book. Available from http://www.ling.gu.se/research/projects/trindi.
Cooper, R., S. Larsson, M. Poesio, D. Traum, and C. Matheson: 1999, 'Coding Instructional Dialogue for Information States'. In: The TRINDI Book. Available from http://www.ling.gu.se/research/projects/trindi.
Dalrymple, M., F. Pereira, and S. Shieber: 1991, 'Ellipsis and Higher Order Unification'. Linguistics and Philosophy 14, 399–452.
Ebert, C., S. Lappin, H. Gregory, and N. Nicolov: 2004, 'Full Paraphrase Generation for Fragments in Dialogue'. In: R. Smith and J. van Kuppevelt (eds.): Current and New Directions in Discourse and Dialogue. Kluwer Academic Publishers.
Erbach, G.: 1995, 'ProFIT: Prolog with Features, Inheritance and Templates'. In: Proceedings of the Seventh European Conference of the ACL. pp. 180–187.
Fernández, R.: 2002, 'An Implemented HPSG Grammar for SHARDS'. Technical Report TR-02-04, Department of Computer Science, King's College London.
Fernández, R. and J. Ginzburg: 2002a, 'Non-Sentential Utterances: A Corpus Study'. Traitement Automatique des Langues. Dialogue 43(2), 13–42.
Fernández, R. and J. Ginzburg: 2002b, 'Non-Sentential Utterances: Grammar and Dialogue Dynamics in Corpus Annotation'. In: Proceedings of the 19th International Conference on Computational Linguistics, COLING 2002. Taipei, Taiwan, pp. 253–259.
Fernández, R. and J. Ginzburg: 2002c, 'Non-Sentential Utterances in Dialogue: A Corpus Study'. In: Proceedings of the Third SIGdial Workshop on Discourse and Dialogue, ACL'02. Philadelphia, PA, USA, pp. 15–26.
Fletcher, C.: 1994, 'Levels of Representation in Memory for Discourse'. In: M.A. Gernsbacher (ed.) Handbook of Psycholinguistics. New York: Academic Press, pp. 589–607.
Gardent, C. and M. Kohlhase: 1997, 'Computing Parallelism in Discourse'. In: Proceedings IJCAI'97. pp. 1016–1021.
Ginzburg, J.: 1996, 'Interrogatives: Questions, Facts, and Dialogue'. In: S. Lappin (ed.): Handbook of Contemporary Semantic Theory. Oxford: Blackwell.
Ginzburg, J.: 1999, 'Ellipsis Resolution with Syntactic Presuppositions'. In: H. Bunt and R. Muskens (eds.): Computing Meaning, Volume 1. Kluwer, pp. 255–279.
Ginzburg, J. and R. Cooper: 2004, 'Clarification, Ellipsis, and the Nature of Contextual Updates'. Linguistics and Philosophy 27(3), 297–366.
Ginzburg, J., H. Gregory, and S. Lappin: 2001, 'SHARDS: Fragment Resolution in Dialogue'. In: H. Bunt, I. van der Sluis, and E. Thijsse (eds.): Proceedings of the Fourth International Conference on Computational Semantics. Tilburg, pp. 156–172.
Ginzburg, J. and I. Sag: 2001, Interrogative Investigations. CSLI Publications.
Gregory, H. and S. Lappin: 1999, 'Antecedent Contained Ellipsis in HPSG'. In: Lexical and Constructional Aspects of Linguistic Explanation. Stanford: CSLI Publications, pp. 331–356.
Groenendijk, J. and M. Stokhof: 1997, 'Questions'. In: J. van Benthem and A. ter Meulen (eds.) Handbook of Logic and Language. Amsterdam: North Holland, pp. 1055–1124.
Grosz, B., A. Joshi, and S. Weinstein: 1995, 'Centering: a framework for modelling the local coherence of discourse'. Computational Linguistics 21, 203–225.
Hobbs, J.: 1991, Literature and Cognition, Vol. 21. Stanford: CSLI Lecture Notes.
Kempson, R., W. Meyer-Viol, and D. Gabbay: 1999, 'VP Ellipsis: Towards a Dynamic, Structural Account'. In: S. Lappin and E. Benmamoun (eds.): Fragments. New York: Oxford University Press, pp. 175–223.
Krifka, M.: 1992, 'A Framework for Focus-sensitive Quantification'. In: Proceedings of Semantics and Linguistic Theory 2. Ithaca, NY, pp. 213–236, CLC Publications.
Lappin, S.: 2002, 'Salience and Inference in Anaphora Resolution'. In: Fourth Discourse and Anaphora Resolution Colloquium, Lisbon. (Invited talk.)
Lappin, S. and H. Gregory: 1997, 'A Computational Model of Ellipsis Resolution'. In: Proceedings of the Conference on Formal Grammar. Aix en Provence.
Lappin, S. and H.-H. Shih: 1996, 'A Generalized Reconstruction Algorithm for Ellipsis Resolution'. In: Proceedings of COLING-96. pp. 687–692.
Larsson, S.: 2002, 'Issue based Dialogue Management'. Ph.D. thesis, Gothenburg University.
Nielsen, L. A.: 2003, 'A Corpus-based Study of Verb Phrase Ellipsis'. In: Proceedings of the 6th CLUK Colloquium. Edinburgh, UK, pp. 109–115.
Pollard, C. and I. Sag: 1994, Head-Driven Phrase Structure Grammar. University of Chicago Press and CSLI Publications.
Pulman, S.: 1997, 'Higher Order Unification and the Interpretation of Focus'. Linguistics and Philosophy 20, 73–115.
Purver, M.: 2004, The Theory and Use of Clarification Requests in Dialogue. PhD thesis. King's College, University of London.
Purver, M., J. Ginzburg, and P. Healey: 2001, 'On the Means for Clarification in Dialogue'. In: Proceedings of the 2nd ACL SIGdial Workshop on Discourse and Dialogue, pp. 116–125.
Reinhart, T.: 1997, 'Quantifier Scope: How Labor is divided between QR and Choice Functions'. Linguistics and Philosophy 20, 335–397.
Rooth, M.: 1993, 'A theory of focus interpretation'. Natural Language Semantics 1, 75–116.
Sag, I.: 1997, 'English Relative Clause Constructions'. Journal of Linguistics 33, 431–484.
Vallduví, E.: 1992, The Informational Component. New York: Garland.
Webber, B.: 1991, 'Structure and Ostension in the Interpretation of Discourse Deixis'. Language and Cognitive Processes 14, 107–135.
IVANA KRUIJFF-KORBAYOVÁ AND BONNIE L. WEBBER
INTERPRETING CONCESSION STATEMENTS IN LIGHT OF INFORMATION STRUCTURE
1. Introduction
Information structure (IS) concerns the utterance-internal structural and semantic properties reflecting the relation of an utterance to the discourse context, in terms of the discourse status of its content, the actual and attributed attentional states of the discourse participants, and the participants' prior and changing attitudes (knowledge, beliefs, intentions, expectations, etc.) (Kruijff-Korbayová and Steedman, 2003). Since these extend beyond the utterance boundaries, it is relevant to ask how IS is taken up and used in the interpretation of larger units of discourse. It is well-known that IS influences the interpretation of individual sentences. For example, (Halliday, 1970) notes a sign in the London Underground with the text "Dogs must be carried", and observes that this text can be pronounced with different intonation patterns, e.g., (1) vs. (2), reflecting different IS. Thereby, different instructions (here, paraphrased in italics) are conveyed to passengers. One supposes that (2) was not the intention of the London Transport Authority. (1)
Dogs must be CARRIED. (H* LL% on "carried")
    If there is a dog, carry it.
(2) DOGS must be carried. (H* on "Dogs", LL% at the end)
    Carry a dog.
In English, IS is most often conveyed by intonation. In languages with freer word order, differences in IS are most often conveyed by different word ordering. Over the past decade, the understanding of IS within the sentence has been enriched by intensive research in formal semantics, addressing association-with-focus phenomena involved in the interpretation of focus particles ("only", "even", etc.), quantifiers and negation, interpretation of intonation and word order, and reference. It is now widely accepted that IS affects both interpretation and realization, even though there is no uniform account.
However, much less is known about what, if any, use is made of IS beyond clause and sentence boundaries and how IS interacts with other aspects of discourse structure and semantics. This was identified as an important task already in (Dretske, 1972), who argues that determining the meaning of larger expressions needs to take into account contrastive differences between expressions embedded in them. Our work extends the repertoire of IS-sensitive accounts in this direction. In (Kruijff-Korbayová and Webber, 2001) we described how the IS of a previous sentence or clause can affect the meaning projected through the adverbial discourse connective "otherwise". We showed that an IS-based account of its meaning provides access to contextually appropriate interpretations that are unavailable to accounts that ignore IS. In this chapter, we concentrate on clauses and sentences related by discourse connectives signaling concession, such as although, however, nevertheless, and we discuss what influence the IS of the relata has on the overall interpretation. Concession is a relation between two arguments, one of which is conceded in favor (or: in face) of the other. The literature on concession, reviewed in Section 2, distinguishes two types of concession, which we call denial of expectation and concessive opposition. While research to date has ignored IS, it appears very relevant. This can be seen through the pair of examples below, modeled on examples introduced in (Dretske, 1972) (cf. Section 4) and interpreted first as denial of expectation.1 (3)
Clyde married BERTHA. However, he did not inherit a penny.
(4) Clyde MARRIED Bertha. However, he did not inherit a penny. In both cases, the expectation is the same – namely, that Clyde inherited some money. But the reasons for this expectation differ in (3) and (4), as paraphrased in (5) and (6) respectively. (We are considering the 'narrow-focus' readings of the first utterance, which provides the conceded argument.) (5) If it is Bertha (and not someone else) who Clyde marries, then what is expected to happen to Clyde is that he inherits some money. (Expectation in (3))
Small capitals indicate words carrying pitch accents, i.e., phonologically prominent elements in each sentence.
(6)
If what Clyde does with Bertha is to marry her (and not something else), then what is expected to happen to Clyde is that he inherits some money. (Expectation in (4))
Similar observations hold when (3) and (4) are interpreted as concessive opposition, as responses to the question in (7): (7) Is Clyde happy? The question provides a common issue, a so-called tercium comparationis, with respect to which the relata in concessive opposition support opposite conclusions. Let us say Clyde married Bertha supports the conclusion that Clyde is happy, and Clyde did not inherit a penny supports that Clyde is not happy. Again, (3) and (4) differ in the exact reasons for expecting Clyde’s happiness: in (3) it is marrying Bertha as opposed to someone else, whereas in (4) it is the marriage. The corresponding interpretations of (3) and (4) are paraphrased in (8) and (9), respectively (again considering their ‘narrow-focus’ readings). (8) If it is Bertha (and not someone else) whom Clyde marries, then what is expected to happen to Clyde is that he is happy. (Expectation based on (3)) (9)
If what Clyde does with Bertha is to marry her (and not something else), then what is expected to happen to Clyde is that he is happy. (Expectation based on (4))
Our claim is that the observed differences follow from differences in the information structure (IS) of the concession relata. We presented a preliminary IS-sensitive analysis of denial of expectation in (KruijffKorbayov´ a and Webber, 2000), where our approach was to distinguish ‘applicability conditions’ of the implied defeasible rules in accordance with IS. However, it remained unclear how they would be handled by a discourse update function. In (Kruijff-Korbayov´ a and Webber, 2001) we argued that all the differences due to IS can be captured formally in terms of Rooth’s notion of an alternative set (Rooth, 1985; Rooth, 1992) and the alternative-set semantics of IS worked out in (Steedman, 2000b; Steedman, 2000a), and we also included concessive opposition in our analysis. Here we present a more comprehensive account.2 2
We continue to omit contributions of tense, aspect and modality, under the assumption that our analysis will not be invalidated by their eventual incorporation.
The chapter is organized as follows. Section 2 summarizes some existing approaches to concession. Section 3 introduces the formal machinery employed in the IS-sensitive account of denial of expectation in Section 4.2, and concessive opposition in Section 4.3. Section 5 concludes with a summary and an indication of future research directions.
2. Two Types of Concession
2.1. Denial of Expectation and Concessive Opposition
Two types of concession are distinguished in the literature: (i) one type, which we call denial of expectation following (Lakoff, 1971b) and (Lagerwerf, 1998), involves a conceded expectation (Example (10)), and (ii) the other type, which we call concessive opposition following (Spooren, 1989), involves a conceded adversative relation (Example (11)). Using the cognitive primitives introduced in (Sanders et al., 1992), concessive opposition is characterized as an additive, negative, semantic or pragmatic relation, whereas denial of expectation is characterized as a causal, negative, semantic or pragmatic relation. (10)
Although Greta Garbo was considered the yardstick of beauty, she never married. (Lagerwerf, 1998)
(11)
Although he does not have a car, he has a bike. (Grote et al., 1995)
The denial of expectation in (10) involves an underlying expectation that Greta Garbo would get married, based on an assumed general rule paraphrased as (12). This general rule is violated (does not hold) in the specific case of Greta Garbo. (12)
If a woman is beautiful then it is expected that she gets married.
In concessive opposition, the two relata lead to opposite contrary conclusions with respect to a contextually pertinent open issue, the tercium comparationis (TC) (Lagerwerf, 1998). The conceded relatum yields one conclusion (also here one can talk about an expectation), whereas the other relatum yields an opposite conclusion. The conclusion expected on the basis of the conceded relatum is considered weaker (by the speaker), and thereby this expectation fails. The opposite conclusion on the basis of the other relatum takes precedence.
In (11), a possible TC is the mobility of the person under discussion. The conceded relatum in the subordinate clause suggests he is not, while the relatum in the main clause suggests that he is. These conclusions can be derived from rule-like observations such as the following, of which the latter is the stronger one. (13)
If a person does not have a car, then he is expected not to be mobile.
(14)
If a person has a bike, then he is expected to be mobile.
The distinction between concessive opposition and denial of expectation can be traced back at least to (Lakoff, 1971b), who explains that the connective but3 can be used (i) in semantic opposition, requiring a ‘common topic’ and semantic similarity, or (ii) in denial of expectation, without any notion of semantic opposition, but presupposing a general tendency or expectation. Lakoff’s notion of ‘common topic’ can be seen as the aforementioned tercium comparationis. But concessive opposition does not appear to require semantic similarity or parallelism of the relata, as shown by the lack of either in the conceded response in (15B). (15)
A. Shall we go running? I do need some exercise.
B. However, it's raining cats and dogs outside.
(16)
If I need exercise, then it is expected that I will go running.
(17)
If it’s raining hard, then it is expected that I will not go running.
To see how the two types of concession are related, note that in concessive opposition, the TC is distinct from both relata. If, however, the validity of one relatum is taken as the TC, one of the implications turns into an equivalence, and the two types collapse into one. (18)
Although Greta Garbo was considered the yardstick of beauty, she never married.
(19)
If a woman is beautiful then it is expected that she gets married.
(20)
If a woman does not get married then it is not expected that she gets married.
3
An account of the French mais (‘but’) in terms of the opposition of expectation is present in Oswald Ducrot’s work, and an extensive account can also be found in Jacques Jayez’s work (in French).
This of course only works when a causal or implication relation can hold between the two relata. Thus, interpreting the example (11) (repeated below as (21)) as a denial of expectation by taking as the TC whether the person has a bike, involves the implication in (22) – presumably not a sensible one. (21)
Although he does not have a car, he has a bike.
(22)
If a person does not have a car then it is expected that he does not have a bike.
(23)
If a person has a bike then it is expected that he has a bike.
The TC does not need to be a polar issue (i.e., one corresponding to a yes/no question), as shown by the following example (discussed in (Lagerwerf, 1998)), where the TC corresponds to an open proposition: (24)
A. Which restaurant shall we go to?
B. Although King Tsin has great mu shu pork, China First has great dim sum.
(25)
If a restaurant has good mu shu pork, then it is expected that we will go there.
(26)
If a restaurant has good dim sum, then it is expected that we will go there.
One can take as TC the open proposition explicitly introduced by A's question (i.e., λx. restaurant(x) & we go to(x)). The conceded relatum suggests one value of the restaurant variable, while the other relatum provides another (presumably preferred) value. How can the TC be formalized? According to (Hamblin, 1973), a question determines a set of potential answers. A further connection between the set of potential answers and Rooth's notion of contextual alternative set has been made in (Rooth, 1992). In the presence of an explicit question, it is the contextual alternative set of the question, i.e., its Rheme-alternative set (ρ-AS, cf. Section 3), that we take to be the TC. When specified in this way, the notion of the TC also appears to correspond to the notion of a question under discussion used in dialog modeling (Ginzburg, 1996; Larsson, 2003).
2.2. Assertion vs. Presupposition
Discourse connectives assert a relation holding between two units of discourse. But they can also convey more than what they assert. For
example, (Halliday, 1985) defines the concession relation as conveying “if P then, contrary to expectation, Q”, involving an expectation the construction appeals to but does not explicitly assert. Early on, Robin Lakoff treated such expectations as presuppositions of the relevant discourse connectives, distinct from their assertions (Lakoff, 1971b). Her account of but in the denial of expectation use was formalized by George Lakoff (Lakoff, 1971a, p. 17) as follows: (27)
S1 but S2 asserts the truth of both S1 and S2, and presupposes an expectation Exp(S1 ⊃ ∼S2).
The claim that causal discourse connectives have presuppositions was elaborated in (Lagerwerf, 1998). He associated the expectations induced by causal discourse connectives with defeasible rules that might be expected but nevertheless fail to hold in the current case: (28)
although a, b
− asserts the truth of the arguments: α and β
− presupposes a defeasible rule: gen(α) > gen(¬β)4
where α and β stand for the propositions expressed by a and b, > stands for ‘defeasibly implies’ (Asher and Morreau, 1991), and gen(X) stands for a generalization, i.e., an abstraction reachable from proposition X. A similar treatment appears in (Knott, 1996) and (Knott and Mellish, 1996). While (Karttunen, 1973) has distinguished two types of presupposition – semantic and pragmatic – and equated the latter with the Gricean notion of conventional implicature (Grice, 1975), it is clear that in speaking of discourse connectives as carrying presuppositions of the sort involved in denial of expectation, authors have in mind the pragmatic notion of conventional implicature. Here we adopt the prevalent current terminology of pragmatic presupposition. 2.3. A Unified Interpretation Scheme for Concession (Grote et al., 1995) have observed that despite the diversity of concessions in their corpus of German text and in the literature, an underlying general principle can be extracted, relating propositions and implications: 4
Throughout the chapter, we paraphrase the presupposed defeasible rule(s) formally denoted by “>” as If. . . , then it is expected that. . . .
On the one hand, A holds, implying the expectation of C. On the other hand, B holds, which implies not-C, contrary to the expectation induced by A.
which they capture in what they call the ABC-scheme, written as: (29)
i. A > C
ii. B → not-C
(29i) is a defeasible implication, which typically encodes general world knowledge, either a rule of cause and effect or a customary expectation. (29ii) is an induced strict rule. While A and B both hold, (29ii) is considered stronger in the given situation, hence not-C follows. One can paraphrase this as Although A, nevertheless not-C, because B. Grote et al. suggested that A and not-C (and possibly B) are verbalized in denial of expectation, whereas A and B are verbalized in concessive opposition (and C is either implicit in the context or is explicitly mentioned). Another way to describe how the ABC-scheme is instantiated in the denial of expectation case is, as we noted in Section 2.1, to take B as equivalent to not-C. This results in a 'degenerate' version of the scheme, where denial of expectation is a 'simplified' version of concessive opposition, taking the validity of the non-conceded relatum as the TC. The generalization made in the presupposed defeasible rule in (28) (Lagerwerf, 1998) and the corresponding rule (29i) in the ABC-scheme (Grote et al., 1995) reflect general world knowledge or a suitable abstraction from the current context. Our discussion of expectation perfection in Section 4.2 suggests that rather than presupposing a general defeasible rule (gDR), a concessive connective should be taken to presuppose a specific defeasible rule (sDR) corresponding directly to the relata. In turn, this presupposed sDR would be satisfied with respect to a given context by binding, bridging (Bos et al., 1995) or accommodation (van der Sandt, 1992) (cf. also (Beaver, 1997)). Binding and accommodation concern the sDR, whereas bridging involves determining a compatible gDR. On the basis of the remarks presented above, the following modified version of the ABC-scheme results: (30)
concession(α, β), e.g., "although a b", "a however b" or "a nevertheless b"
− asserts the truth of α and β
− presupposes:
  i. α > γ
  ii. β → not-γ
  iii. Γ, a contextual alternative set representing the TC such that γ and not-γ are members of Γ
where, as before, > stands for defeasible implication (Asher and Morreau, 1991), while → stands for a strict rule. α and β stand for the propositions expressed by a and b, respectively, γ and not-γ stand for the two contrary conclusions with respect to a TC. In denial of expectation, the TC is the issue of whether γ or not-γ holds, where not-γ is equivalent to β. The TC can be explicitly stated as c, whose interpretation is a contextual alternative set Γ; otherwise the TC can be implicit and Γ is inferred in the context (cf. Section 2.1).
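A rough procedural reading of scheme (30) can be sketched as follows. This is only an illustration under our own simplified encoding: the rule representation, the satisfaction test and all the names (interpret_concession, tc_candidates, etc.) are invented here and are not part of the account's formalization.

```python
# Rough procedural reading of scheme (30); all names are illustrative.

def interpret_concession(alpha, beta, context):
    """concession(alpha, beta) asserts alpha and beta, and presupposes
       (i) a defeasible rule alpha > gamma, (ii) a strict rule beta -> not-gamma,
       (iii) a TC alternative set containing gamma and not-gamma."""
    # (iii) find a tercium comparationis: a pair {gamma, not-gamma} in the context
    for gamma, not_gamma in context["tc_candidates"]:
        # (i) alpha defeasibly implies gamma
        defeasible_ok = (alpha, gamma) in context["defeasible_rules"]
        # (ii) beta strictly implies not-gamma
        strict_ok = (beta, not_gamma) in context["strict_rules"]
        if defeasible_ok and strict_ok:
            # assertions: both relata are added; not-gamma wins over gamma
            context["facts"] |= {alpha, beta, not_gamma}
            return gamma, not_gamma
    raise ValueError("presuppositions of the concession cannot be satisfied")

# Example (10): "Although Greta Garbo was considered the yardstick of beauty,
# she never married."  Here beta is itself equivalent to not-gamma, i.e. denial
# of expectation as the degenerate case of the scheme.
ctx = {
    "tc_candidates": [("gg_marries", "not gg_marries")],
    "defeasible_rules": {("gg_beautiful", "gg_marries")},        # rule (12)
    "strict_rules": {("gg_never_married", "not gg_marries")},
    "facts": set(),
}
print(interpret_concession("gg_beautiful", "gg_never_married", ctx))
```

In a concessive opposition such as (11), the TC pair would instead be distinct from both relata (e.g. the mobility issue), with the conceded relatum linked to one member by a defeasible rule and the other relatum linked to the other member by the stronger rule.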
3. Information Structure
The notion of IS we are employing originates in the Prague School (Mathesius, 1975), elaborated in (Sgall et al., 1986), (Firbas, 1992), (Halliday, 1985), (Steedman, 2000b) and others. We adopt the formal account presented in (Steedman, 1996; Steedman, 2000a; Steedman, 2000b) which (1) provides a well worked out compositional semantics of English intonation in IS terms; (2) interprets the elements of IS in terms of alternative sets, and (3) assumes a general IS-sensitive notion of discourse context update. Leaving terminological differences and more subtle issues aside, Steedman’s account is by and large straightforwardly compatible with the Prague School approach, and thus when analyzing Czech examples, we can combine Steedman’s account with Sgall and Hajiˇcov´a’s ideas relating IS and word order (Hajiˇcov´a and Sgall, 1987; Sgall et al., 1986). Building on the findings originating in the Prague School, Steedman recognizes two dimensions of IS: The first defines a partitioning at the sentence-level into Themeis and Rhemeis ; the second is a further partitioning of each into Backgroundis and Focusis .5 The latter partitioning is related to Halliday’s Given-New dichotomy (Halliday, 1970; Halliday, 5
Alternative terms used for similar (but not identical) IS partitions in other works are, e.g., Topic-Focus (Sgall et al., 1986), Ground (=Link+Tail)Focus (Vallduv´ı, 1992). We adopt Steedman’s terms, but add the subscripts in Themeis , Rhemeis and Backgris , Focusis in order to avoid confusion with some other uses of the same terms.
1985) and concerns distinguishing the Themeis and the Rhemeis from other alternatives that the context makes available. In English, Czech and many other languages, IS is established as a result of an interplay of intonation, word order and grammatical structure. Below we give three possible IS partitions that Steedman’s approach provides for the utterance of Clyde married Bertha. Each example is presented as a reply to a question which indicates the intended context and thus helps to fix its IS.6 (31) Q: I know Clyde has stopped dating Aretha, because of a relationship with someone else, and David is getting a divorce. Is there any new gossip? A: Clyde married Bertha. i. (θ λP. P ( c)) (ρ λx. marry (x, b)) ii. θ-AS: {∃P. P (c), ∃P. P (d)} ρ-AS: {marry (c, a), date (c, b), marry (c, b)} (32) Q: I know Clyde has stopped dating Aretha, because he either married or started dating someone else. Who is it? A: Clyde married Bertha. i. (θ λx. marry (c, x)) (ρ λQ. Q( b)) ii. θ-AS: {∃x. marry (c, x), ∃x. date (c, x)} ρ-AS: {marry (c, a), marry (c, b)} (33)
Q: I noticed a change in Clyde’s relationship with Aretha and Bertha. What’s going on between him and his girlfriends? A: Clyde married Bertha. i. (θ λQ. Q(c, b)) (ρ λx.λy. marry (x, y)) ii. θ-AS: {∃Q. Q(c, a), ∃Q. Q(c, b)} ρ-AS: {date (c, b), marry (c, b)}
For each sentence, (i) provides a simplified IS-partitioned logical form, where θ and ρ are operators which ‘wrap’ Themeis and Rhemeis , respectively. Within Themeis and Rhemeis , asterisks on terms (e.g., carry) indicate elements that belong to the respective Focusis . These IS-partitioned logical forms represent the linguistic meaning of the sentences, and serve as input for a discourse (context) update function described 6
Small capitals again indicate words carrying a pitch accent, underlining is used to mark the Rhemeis -part of the utterance. What is not underlined in the utterance belongs to the Themeis -part.
Figure 16. IS-sensitive update of context c1 with ψ: c1[θ(ψ)]c2[ρ(ψ)]c3
in Section 3.2. (ii) indicates the Themeis -alternative set (θ-AS) and Rhemeis -alternative set (ρ-AS), which are explained in Section 3.1. Because each example contains Focusis within Themeis (indicated by a -term), which entails contrast with a previous Themeis (and hence alternatives to contrast with), each θ-AS contains more than one element. Without pitch accents in Themeis , and thus without contrast, the θ-AS is a singleton set. 3.1. Alternative Set Semantics for IS Elaborating on the alternative semantics of focus in (Rooth, 1992) and contrastive topics in (B¨ uring, 1999), Steedman assigns the following semantics to IS (cf. (Steedman, 2000a)): − Themeis presupposes a Rhemeis -alternative set (ρ-AS ). − Focusis within Rhemeis restricts the ρ-AS to the singleton set corresponding to the asserted proposition. − Themeis also presupposes a Themeis -alternative set (θ-AS ). − Focusis within Themeis restricts the θ-AS to the singleton set corresponding to Themeis . ρ-AS corresponds to what Rooth calls the contextual alternative set (Rooth, 1985; Rooth, 1992). θ-AS is a set of alternative themes with respect to the context, corresponding to what Rooth calls the question alternative set. The notion of alternative set is also closely related to Karttunen’s notion of secondary denotation (Karttunen and Peters, 1979). Following (Steedman, 2000a), we take ρ-AS to be a subset of the propositions supported by the context, whose characteristic function is obtained systematically from the IS-partitioned logical form. As noted in (Steedman, 2000a, p. 10), alternative sets may not be exhaustively known to hearers, and in practice one would want to compute with a more abstract form.
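The way θ-AS and ρ-AS are read off an IS-partitioned logical form can be approximated with the following sketch over a small finite context. The propositional encoding, the pattern representation with None for the rheme position, and the function names are our own simplifications, not Steedman's formalization.

```python
# Toy approximation of Themeis/Rhemeis alternative sets over a finite context.
# Propositions are (relation, subject, object) triples; names are illustrative.

CONTEXT = {("marry", "clyde", "aretha"), ("marry", "clyde", "bertha"),
           ("date", "clyde", "aretha"), ("date", "clyde", "bertha")}

def rheme_alternatives(theme_pattern):
    """rho-AS: propositions supported by the context that match the theme,
       abstracting over the rheme position (marked None)."""
    return {p for p in CONTEXT
            if all(t is None or t == v for t, v in zip(theme_pattern, p))}

def theme_alternatives(theme_pattern, focus_positions):
    """theta-AS: alternative themes obtained by varying the focused (contrastive)
       parts of the theme; without Focus in the theme this is a singleton."""
    if not focus_positions:
        return {theme_pattern}
    alts = set()
    for p in CONTEXT:
        alt = tuple(p[i] if i in focus_positions else theme_pattern[i]
                    for i in range(len(p)))
        alts.add(alt)
    return alts

# Reading (32): Theme = "Clyde married x" (contrastive focus on 'married'),
# Rheme = "Bertha".
theme = ("marry", "clyde", None)
print(rheme_alternatives(theme))       # marry(c,a) and marry(c,b), as in (32)
print(theme_alternatives(theme, {0}))  # the marry/date theme alternatives of (32)
```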
3.2. IS-sensitive Context Updating
We follow (Krifka, 1993; Kruijff-Korbayová, 1998; Steedman, 2000a) in defining the updating of an input context c1 with an IS-partitioned logical form ψ as comprising two phases: a Themeis update phase, c1[θ(ψ)]c2, and a Rhemeis update phase, c2[ρ(ψ)]c3, where c2 and c3 are resulting contexts (see Figure 16). Following (Karttunen, 1974) and current work in dynamic semantics, we can then define recursively when an input context admits an IS-partitioned logical form (see Figure 16): (34)
c1 admits ψ iff:
− c1 admits θ(ψ): c1[θ(ψ)]c2 & c2 ≠ ⊥
− c2 admits ρ(ψ): c2[ρ(ψ)]c3 & c3 ≠ ⊥
In the Themeis update phase, the input context c1 is checked as to whether it supports or can accommodate the presuppositions of the theme θ(ψ) – namely, the Themeis -alternative set θ-AS and the Rhemeis -alternative set ρ-AS. This yields a restricted context c2 where θ(ψ) holds. In the Rhemeis update phase, one alternative according to the ρ-AS is selected, which yields the final context c3 . Updating fails if either update phase does.
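Definition (34) can be given a procedural gloss along the following lines. This is a sketch under our own simplified model of contexts as sets of propositions; the absurd context ⊥ is modelled by returning None, and all function names are invented for illustration.

```python
# Sketch of the two-phase, IS-sensitive update in (34); contexts are modelled
# naively as sets of propositions, and failure (the absurd context) as None.

def theme_update(context, theme_presuppositions):
    """c1[theta(psi)]c2: check (or accommodate) the theme's presuppositions,
       i.e. that the theta-AS and rho-AS are available in the context."""
    if all(alt <= context["supported"] for alt in theme_presuppositions):
        return context
    return None                       # theme phase fails: c2 = bottom

def rheme_update(context, rho_as, asserted):
    """c2[rho(psi)]c3: select one alternative from rho-AS (the asserted one)."""
    if asserted not in rho_as:
        return None                   # rheme phase fails: c3 = bottom
    c3 = dict(context)
    c3["facts"] = context["facts"] | {asserted}
    return c3

def admits(context, theme_presups, rho_as, asserted):
    c2 = theme_update(context, theme_presups)
    if c2 is None:
        return None
    return rheme_update(c2, rho_as, asserted)

c1 = {"supported": {"marry(c,a)", "marry(c,b)", "date(c,b)"}, "facts": set()}
rho_as = {"marry(c,a)", "marry(c,b)"}
c3 = admits(c1, [rho_as], rho_as, "marry(c,b)")
print(c3 is not None and c3["facts"])   # {'marry(c,b)'}
```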
4. IS-Sensitivity of Concession
4.1. Dretske on Contrastive Statements The relevance of IS in computing the meaning of larger expressions in discourse started to attract the attention of formal semanticists in the early seventies. (Lakoff, 1971a) discusses the relation between presuppositions and reciprocal contrastive stress with respect to the connective and, together with other discourse markers such as then, too and either. (Dretske, 1972) analyzes what he calls contrastive statements, arguing for the importance of understanding how contrastive differences can make a difference in meaning. As part of this, he shows that utterances U1 and U2 (which differ “only” in IS) may differ radically in what counts as their correct explanation or justification, and similarly, arguments involving U1 and U2 as premises may differ in validity. As we model our examples on those presented in (Dretske, 1972), we reproduce the original context and some of the examples below:
(35)
Clyde, who finds intolerable any sustained involvement with a woman, and thus leads the life of a dedicated bachelor, learns that he stands to inherit a great deal of money at the age of thirty if he is married. He shops around and finds Bertha, an equally dedicated archaeologist who spends eleven months out of every year directing excavations in southeastern Turkey. Justifiably expecting that marriage to this woman will leave his life as little disturbed (in the relevant aspects) as any marriage could, he proposes. Bertha accepts and they are married.
Hence, (36) is true, but (37) is false, whereas (38) is again true. (36) The reason Clyde married Bertha was to qualify for the inheritance. (37)
The reason Clyde married Bertha was that she was the woman least likely to disturb his style of living.
(38)
The reason Clyde married Bertha was to qualify for the inheritance.
Dretske observes that the reason/justification/explanation stated, is the reason/justification/explanation not for everything in the explanandum, but only for that which constitutes the contrastive focus. Dretske also discusses the effect of focus in conditionals, both subjunctive and indicative. Again, it is easy to imagine circumstances in which (39) is true, but (40) is false, whereas (41) is again true. (39)
If Clyde hadn’t married Bertha, he would not have been eligible for the inheritance.
(40)
If Clyde hadn’t married Bertha, he would not have been eligible for the inheritance.
(41)
If Clyde hadn’t married Bertha, the marriage would likely disturb his style of living.
Dretske shows that these differences can have an impact on which arguments count as valid and sound and which do not. In the next two sections, we extend Dretske’s insights to the interpretation of concession, showing that each IS-variant of a concessive statement presupposes different sDR(s). As a result, not every ISvariant is licensed with respect to the given discourse context or world knowledge. In Section 2 we presented an account of interpreting the two types of concession, elaborating on the proposals in (Lagerwerf, 1998), (Webber
et al., 2003) and (Grote et al., 1995). Denial of expectation presupposes a specific defeasible rule (sDR) directly reflecting the relata. Concessive opposition presupposes a tercium comparationis (TC) and two sDRs where the rules predict different values with respect to the TC. In the next two sections, we refine this to a claim that each IS variant of a concessive statement presupposes different sDR(s). As a result, not every IS variant is licensed with respect to the given discourse context or world knowledge. An issue we leave for further work is that what counts as a compatible gDR may also differ, because the generalization/abstraction also needs to be sensitive to IS. 4.2. IS-Sensitive Denial of Expectation Building on what we have said about the interpretation of concession and of information structure, we obtain the following IS-sensitive scheme for interpreting a concessive statement as denial of expectation (with a further addition to be given in Section 4.2.2): (42)
concession(α, β) where α and β have the following IS-partitioning: α = θ(α)ρ(α), β = θ(β)ρ(β)
assertions:
  i. α and β are true
presuppositions due to IS:
  ii. θ-AS(α) and ρ-AS(α)
  iii. θ-AS(β) and ρ-AS(β)
presuppositions due to denial of expectation:
  iv. α > not-β
  v. Γ = the issue whether β or not-β holds
In terms of IS-sensitive context updating, this means: (43)
An initial context c1 admits concession(α, β) where α and β have the following IS-partitioning: α = θ(α)ρ(α), β = θ(β)ρ(β) iff − (assertions) c1 admits α&β, i.e., c1 admits α: c1 [θ(α)]c2 [ρ(α)]c3 c3 admits β: c3 [θ(β)]c4 [ρ(β)]c5
−
(presupposition due to denial of expectation (sDR)) c1 admits α >not-β, i.e., c1 [θ(α)]c2 [ρ(α)]c3 [θ(β)]c4 [ρ(β)]c6 ∧ c6 =⊥
Here, not-β stands for a negation of β respecting IS, that is, in which θ(β) is preserved and ρ(β) is replaced by its complement, written as ρ(β). What is new here is that context c4 must be updatable not only by ρ(β) but also by its complement. Consider the following adaptations of Dretske’s examples and the expectations that are involved in each case. (44)
Clyde married Bertha. However, he could not keep his lifestyle.
(45)
If it is Bertha (and not someone else) who Clyde marries, then it is expected that what happens to Clyde is that he can keep his lifestyle.
(46) Clyde married Bertha. However, he did not get the inheritance. (47)
If what Clyde does with Bertha is to marry her (and not something else), then it is expected that what happens to Clyde is that he gets the inheritance.
Negation as ‘suitable alternative’ Using the notion of alternative sets, the presupposed sDR in (43) can be formulated more generally as There exists β which is a suitable alternative to β in the given context, such that if α then it is expected that β : (48)
In denial of expectation, concession(α, β) presupposes: ∃β . β ∈ ρ-AS(β) ∧ β = β ∧ (c1 admits α > β ), i.e., c1 [θ(α)]c2 [ρ(α)]c3 [θ(β)]c4 [β ]c6 ∧ c6 =⊥
where a suitable alternative to β depends on the discourse context. In case of concession relation between separate utterances, we treat β as coming from the Rhemeis -alternative set of β, (i.e., ρ-AS(β)). (We return to this issue later in Section 4.4.) A trivial alternative to any proposition corresponding to a sentence with the main verb included in the Rhemeis is a proposition with the opposite polarity. However, a contextually pertinent alternative set may include other, more specific alternatives. In our examples, marrying is an alternative to dating,
Aretha is an alternative to Bertha, etc. Determining the appropriate alternatives in a given context is part of the process of interpreting IS. (48) effectively means that we model the denied expectation as a possibility: it must be possible to update c3 with both β and the alternative of β, β . The transition from c3 via c4 to c6 is a hypothetical one: in dynamic semantics, it corresponds to a test of c3 , which succeeds unless there is contrary information in the context. Only if c6 =⊥, the ‘real’ update from c1 to c5 is successful. Intuitively however, an expectation is more than just a possibility, it is a ‘likely possibility’. In order to be able to model this, we would need a model which distinguishes among possibilities with different likelihood, perhaps along the lines of (Kratzer, 1991). For now we leave this issue for further research. Our semantics so far recognizes the role of IS in identifying which denied expectation is contextually appropriate. This we illustrate below with respect to example (44). We consider only the ‘broad-focus’ reading of the second relatum, paired with the two possibilities for ISpartitioning the conceded relatum: a ‘narrow-focus’ and a ‘broad-focus’ reading. We first consider the former pair and then the latter. (49)
Clyde married Bertha. However, he could not keep his lifestyle.
(50)
sDR: If it is Bertha (and not someone else) who Clyde marries, then it is expected that what happens to Clyde is that he can keep his lifestyle.
(51)
i. IS-partitioned logical forms: α : (θ λx. marry (c, x)) (ρ λP. P ( b)) β : (θ λ Q. Q(c)) (ρ λy. ¬keep lif estyle (y)) ii. Themeis - and Rhemeis -alternative sets : θ-AS(α) : {∃x. marry (c, x)} ρ-AS(α) : {marry (c, a), marry (c, b)} θ-AS(β) : {∃Q. Q(c)} ρ-AS(β) : {keep lif estyle (c), ¬keep lif estyle (c)}7 iii. assertions: c1 [λx. marry (c, x)]c2 [λP. P (b)]c3 [λQ. Q(c)]c4 [λy. ¬keep lif estyle (c)]c5
7
For simplicity, we only consider the extremes, i.e., changing no aspects of lifestyle vs. changing all, although there are further possibilities in between, i.e., changing only some relevant aspects, such as work habits.
iv. presupposition: c1 [λx. marry (c, x)]c2 [λP. P (b)]c3 [λQ. Q(c)]c4 [λy. keep lif estyle (y)]c6 ∧ c6 =⊥ That is, the update succeeds iff If Clyde marries Bertha, then it is possible that Clyde can keep his lifestyle; Clyde marries Bertha; Clyde cannot keep his lifestyle. (52)
Clyde married Bertha. However, he could not keep his lifestyle.
(53)
sDR: If what Clyde does is marry Bertha, then it is expected that what happens to Clyde is that he can keep his lifestyle. i. IS-partitioned logical forms: α : (θ λP. P (c)) (ρ λx. marry (x, b)) β : (ρ λy. ¬keep lif estyle (y)) (θ λ Q. Q(c)) ii. Themeis - and Rhemeis -alternative sets : θ-AS(α) : {∃P. P (c)} ρ-AS(α) : {marry (c, a), marry (c, b), date (c, b)} θ-AS(β) : {∃Q. Q(c)} ρ-AS(β) : {keep lif estyle (c ), ¬keep lif estyle (c)} iii. assertions: c1 [λP. P (c)]c2 [λx. marry (x, b)]c3 [λ Q. Q(c)]c4 [λy. ¬keep lif estyle (y)]c5 iv. presupposition due to denial of expectation: c1 [λP. P (c)]c2 [λx. marry (x, b)]c3 [λ Q. Q(c)]c4 [λy. keep lif estyle (y)]c6 ∧ c6 =⊥
That is, the update again succeeds iff If Clyde marries Bertha, it is possible that Clyde can keep his lifestyle; Clyde marries Bertha; Clyde cannot keep his lifestyle. So far both the assertions and the presupposed sDRs are the same for (49) and (52): they differ only in the presuppositions due to IS. In the next section, we show another difference in the semantics. Expectation Perfection The important intuition which we have not captured so far is that the element(s) in the Rhemeis , e.g., Bertha in (49), marrying Bertha in (52) and marrying in (46), are somehow special, differentiated from their alternatives (in the given context). For example, in (49), it is not only the case that we expect that if Clyde marries Bertha he would be
Figure 17. IS-sensitive context update with concession(α, β) for denial of expectation
able to keep his lifestyle unchanged; (49) also suggests that Bertha is a special case: for anyone else than Bertha, we would normally expect that Clyde’s lifestyle would have to change. Similarly in (52): not only do we expect that if Clyde marries Bertha, he can keep his lifestyle, but also, if he would do something other than marrying Bertha (e.g., not marrying anyone or doing something else with Bertha will both do in (52)), we would expect him to have to change his lifestyle. We capture this intuition by adding another presupposed sDR: (54)
Additional presupposed sDR for concession(α, β) due to denial of expectation (42): vi.
not-α > β
The corresponding addition to the definition of when a context admits a concessive statement proposed in (43) is given in (55b). The addition we made in (48) is repeated here as (55a). (55a) says that there is a suitable alternative β to β such that if α then it is expected that β . (55b) says that all suitable alternatives α to α are such that if α then it is expected that β. (For a schematic presentation, see Figure 17.) (55) presuppositions for concession(α, β) (denial of expectation): a. ∃β . β ∈ ρ-AS(β) ∧ β = β ∧ (c1 admits α > β ), i.e., c1 [θ(α)]c2 [ρ(α)]c3 [θ(β)]c4 [β ]c6 ∧ c6 =⊥ b. ∀α . α ∈ ρ-AS(α) ∧ α = α ∧ (c1 admits α > β), i.e., c1 [θ(α)]c2 [α ]c3 [θ(β)]c4 [ρ(β)]c5 ∧ c5 =⊥ We call the additional defeasible rule in (55b) expectation perfection, because it is parallel to the notion of conditional perfection proposed
in (Geis and Zwicky, 1971) to capture the fact that humans tend to perfect conditionals to bi-conditionals. Expectation perfection leads to the following additional implicatures in the analyses of (49) and (52), respectively: (56)
Expectation perfection for (49): c1 [λx. marry (c, x)]c2 [λP. P (a)]c3 [λ Q. Q(c)]c4 [λy. ¬keep lif estyle (y)]c5 ∧ c5 =⊥
The additional condition in (56) is that in a context in which Aretha is an alternative to Bertha with respect to marrying, the update succeeds iff If Clyde marries Aretha, it is possible that what happens to Clyde is that he cannot keep his lifestyle. (57)
Expectation perfection for (52): c1 [λP. P (c)]c2 [λx. date (x, b)]c3 [λ Q. Q(c)]c4 [λy. ¬keep lif estyle (y)]c5 ∧ c5 =⊥ c1 [λP. P (c)]c2 [λx. marry (x, a)]c3 [λ Q. Q(c)]c4 [λy. ¬keep lif estyle (y)]c5 ∧ c5 =⊥
The additional condition in (57) is that in a context in which dating Bertha and marrying Aretha are alternative things that Clyde might do, the update succeeds iff If Clyde dates Bertha or marries Aretha, then it is possible that what happens to Clyde is that he cannot keep his lifestyle. The analysis of (46) repeated in (58) completes the demonstration. The conceded relatum has only the ‘narrow focus’ reading. We combine this with the ‘broad focus’ reading of the second relatum. (58) Clyde married Bertha. However, he did not get the inheritance. (59)
sDR: If what Clyde does with Bertha is to marry her (and not something else), then it is expected that what happens to Clyde is that he gets the inheritance. i. IS-partitioned logical forms: α : (θ λP. P (c, b)) (ρ λx.λy. marry (x, y)) β : (θ λ Q. Q(c)) (ρ λz. ¬get inheritance (z )) ii. Themeis - and Rhemeis -alternative sets : θ-AS(α) : {∃P. P (c, b)} ρ-AS(α) : {marry (c, b), date (c, b)} θ-AS(β) : {∃Q. Q(c)} ρ-AS(β) : {get inheritance (c ), ¬get inheritance (c)}
iii. assertions: c1 [λP. P (c, b)]c2 [λx.λy. marry (x, y)]c3 [λ Q. Q(c)]c4 [λz. ¬get inheritance (z )]c5 iv. presupposition due to denial of expectation): c1 [λP. P (c, b)]c2 [λx.λy. marry (x, y)]c3 [λ Q. Q(c)]c4 [λy. get inheritance (y )]c6 ∧ c6 =⊥ v. expectation perfection: c1 [λP. P (c, b)]c2 [λx.λy. date (x, y)]c3 [λ Q. Q(c)]c4 [λy. ¬get inheritance (y)]c5 ∧ c5 =⊥ Thus, the IS-sensitive update with (58) succeeds iff If Clyde marries Bertha, then it is possible that Clyde gets the inheritance; If Clyde dates Bertha, then it is possible that Clyde does not get the inheritance; Clyde marries Bertha; Clyde does not get the inheritance. 4.3. IS-Sensitive Concessive Opposition Now consider the examples below as concessive opposition responses with respect to the TC whether Clyde is happy as established by the question in (60).8 The presuppositions that the responses give rise to are paraphrased in (a) through (d) for each example: (a) gives the sDR of the conceded relatum and (b) its perfection, (c) gives the sDR of the other relatum and (d) its perfection. The context restricts the available alternatives, so not any arbitrary relation between Clyde and Bertha or between Clyde and the inheritance will do. In our example context (35), an alternative to marrying is dating and an alternative to not getting the inheritance is getting it. (60)
Q. Is Clyde happy?
(61) Clyde married Bertha. However, he did not get the inheritance. a. If what Clyde does with Bertha is to marry her, then it is expected that what happens to Clyde is that he is happy. 8
However artificial, it is convenient to continue to use the same sentence(s) because we have already presented their detailed IS analysis. Naturally occurring examples, however, are not rare; for example, the following concessive opposition from a FAQ sheet on PERL: Q: Can I use PERL regular expressions to match balanced text? A: Although PERL regular expressions are more powerful than ‘mathematical’ regular expressions, they still aren’t powerful enough. . .
b. If what Clyde does with Bertha is something other than marrying her, then it is expected that what happens to Clyde is something other than that he is happy. c. If what happens to Clyde is that he does not get the inheritance, then it is expected that what happens to Clyde is something other than that he is happy. d. If what happens to Clyde is something other than not getting the inheritance, then it is expected that what happens to Clyde is that he is happy. (62)
Clyde married Bertha. However, he did not get the inheritance. a. If Clyde marries Bertha, then it is expected that what happens to Clyde is that he is happy. b. If Clyde marries someone other than Bertha, then it is expected that what happens to Clyde is something other than that he is happy. c. same as (61c) d. same as (61d)
(63)
Clyde married Bertha. However, he did not get the inheritance. a. If what Clyde does is marry Bertha, then it is expected that what happens to Clyde is that he is happy. b. If what Clyde does is something other than marry Bertha, then it is expected that what happens to Clyde is something other than that he is happy. c. same as (61c) d. same as (61d)
Given that the conceded relatum has a weaker argumentative force, the polarity of the answer in all cases depends on whether we take Clyde’s not inheriting any money to entail that he is happy or unhappy. (Here we have chosen the latter.) Since we have kept the second relata the same, their analyses and thus the answers are the same in all cases. The examples differ in what is responsible for the expectation derived from the conceded relatum. What is responsible constitutes the Rhemeis in each case. For the corresponding Rhemeis-alternatives, there would be the opposite expectation.
Figure 18. IS-sensitive context update with concession(α, β) for concessive opposition.
Using the notion of IS-sensitivity described earlier and employing alternatives and expectation perfection as we did for denial of expectation, we can define when a context admits the concessive opposition interpretation (see also Figure 18): (64)
An initial context c1 admits concession(α, β), where α and β have the following IS-partitioning: α = θ(α)ρ(α), β = θ(β)ρ(β), iff
− (assertions) c1 admits α & β (as with denial of expectation in (43))
− (presuppositions due to concessive opposition)
a. ∃Γ. Γ = {γ1, γ2, . . . , γn} is a contextual alternative set
b. (c1 admits α > γ1) ∧ (c3 admits β > γ2), i.e., c1[θ(α)]c2[ρ(α)]c3[γ1]c4 ∧ c4 ≠ ⊥ ∧ c3[θ(β)]c5[ρ(β)]c6[γ2]c7 ∧ c7 ≠ ⊥
c. ∃γ′1, γ′2 ∈ Γ. γ′1 ≠ γ1 ∧ γ′2 ≠ γ2 ∧ γ′1 ≠ γ′2 ∧ ∀α′.∀β′. α′ ∈ ρ-AS(α) ∧ α′ ≠ α ∧ β′ ∈ ρ-AS(β) ∧ β′ ≠ β ∧ (c1 admits α′ > γ′1) ∧ (c3 admits β′ > γ′2), i.e., c1[θ(α)]c2[α′]c3[γ′1]c4 ∧ c4 ≠ ⊥ ∧ c3[θ(β)]c5[β′]c6[γ′2]c7 ∧ c7 ≠ ⊥
That is, a contextual alternative set Γ is presupposed; α defeasibly implies some member(s) of Γ; every alternative α defeasibly implies the other member(s) of Γ; similarly for β, which defeasibly implies a member of Γ distinct from the member implied by α. A minimal set
Γ contains just two alternatives, γ1 and γ2, and then γ′1 = γ2 and γ′2 = γ1. Γ corresponds to the TC, which can be established by the preceding context fully or partially. (If it is implicit, the interpreter must figure it out; the relata in the concessive opposition relation can provide important clues.) We illustrate (64) in full for (62), repeated below: (65)
Clyde married Bertha. However, he did not get the inheritance.
i. IS-partitioned logical forms:
α : (θ λx. marry(c, x)) (ρ λP. P(b))
β : (θ λQ. Q(c)) (ρ λy. ¬get_inheritance(y))
ii. Themeis- and Rhemeis-alternative sets:
θ-AS(α) : {∃x. marry(c, x)}
ρ-AS(α) : {marry(c, a), marry(c, b)}
θ-AS(β) : {∃Q. Q(c)}
ρ-AS(β) : {get_inheritance(c), ¬get_inheritance(c)}9
iii. assertions:
c1[λx. marry(c, x)]c2[λP. P(b)]c3[λQ. Q(c)]c4[λy. ¬get_inheritance(y)]c5
iv. presuppositions for concessive opposition:
a. ∃Γ. Γ = {γ1, γ2}
b. c1[λx. marry(c, x)]c2[λP. P(b)]c3[γ1]c4 ∧ c4 ≠ ⊥
c. c3[λQ. Q(c)]c5[λy. ¬get_inheritance(y)]c6[γ2]c7 ∧ c7 ≠ ⊥
d. c1[λx. marry(c, x)]c2[λP. P(a)]c3[γ′1]c4 ∧ c4 ≠ ⊥
e. c3[λQ. Q(c)]c5[λy. get_inheritance(y)]c6[γ′2]c7 ∧ c7 ≠ ⊥
The update succeeds iff If Clyde marries Bertha, then it is possible that γ1; If Clyde marries Aretha, then it is possible that γ′1; If Clyde does not inherit money, then it is possible that γ2; If Clyde inherits money, then it is possible that γ′2; Clyde marries Bertha and Clyde does not inherit any money. In the presence of an explicit question as in (60), we take the minimal contextual alternative set Γ containing just the alternatives γ1 (e.g., happy(c)) and γ2 (e.g., unhappy(c)), 9
For simplicity, we again only consider the extremes, i.e., getting all of the inheritance vs. getting none, although there are further possibilities in between, i.e., getting a bit, a lot, most, etc. of it.
while γ′1 = γ2 and γ′2 = γ1. Even without an explicit question, the preceding context may establish an alternative set which can serve as the TC. Otherwise, a suitable alternative set has to be accommodated.
4.4. The Concession Interpretation Scheme Revisited
To conclude this section, we return to the ABC-scheme (Section 2.3). When we replace negation with the notion of a suitable alternative (Section 4.2.1) and add expectation perfection (Section 4.2.2), we get the following extended scheme: (66)
concession(α, β), e.g., “although a b”, “a however b” or “a nevertheless b” (where a and b express the propositions α and β, respectively)
− asserts the truth of α and β
− presupposes:
i. α > γ
ii. β > γ′
iii. Γ, a contextual alternative set representing the TC, such that γ and γ′ are members of Γ
iv. α′ > γ′
v. β′ > γ
where X′ is a contextually suitable IS-alternative to X. α and α′ belong to the Rhemeis-alternative set of the conceded relatum, ρ-AS(α). β and β′ belong to the Rhemeis-alternative set of the other relatum, ρ-AS(β). Denial of expectation verbalizes α and β, whereby β is considered equivalent to γ′, and β′ to γ. Concessive opposition also verbalizes α and β. The tertium comparationis corresponds to the contextual alternative set Γ containing (at least) γ and γ′. The last two rules are the ones that we added to capture expectation perfection. As noted in Section 2, we appeal to the mechanisms of binding, bridging and accommodation in resolving the presuppositions. Bridging involves finding a compatible gDR for a presupposed sDR.
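To make the bookkeeping behind this scheme concrete, the following Haskell fragment is a minimal sketch (ours, not part of the original proposal) of the two presupposition checks for denial of expectation, under the simplifying assumptions that propositions are plain atoms, that the Rhemeis-alternative sets are given as lists, and that the defeasible expectation relation > is supplied as a finite list of pairs; all names are illustrative.

```haskell
type Prop = String

data ConcessionCtx = ConcessionCtx
  { altsA   :: [Prop]          -- rho-AS(alpha), including alpha itself
  , altsB   :: [Prop]          -- rho-AS(beta), including beta itself
  , expects :: [(Prop, Prop)]  -- the defeasible expectation relation: p > q
  }

-- p > q holds in the modelled context
expected :: ConcessionCtx -> Prop -> Prop -> Bool
expected ctx p q = (p, q) `elem` expects ctx

-- Presupposition of denial of expectation (55a): some suitable alternative
-- beta' to beta is expected given alpha.
denialOfExpectation :: ConcessionCtx -> Prop -> Prop -> Bool
denialOfExpectation ctx alpha beta =
  any (\beta' -> beta' /= beta && expected ctx alpha beta') (altsB ctx)

-- Expectation perfection (55b): every other alternative alpha' to alpha
-- makes beta itself expected.
expectationPerfection :: ConcessionCtx -> Prop -> Prop -> Bool
expectationPerfection ctx alpha beta =
  all (\alpha' -> alpha' == alpha || expected ctx alpha' beta) (altsA ctx)
```

For the lifestyle example, altsA would contain marry(c,b) and date(c,b), altsB would contain keep_lifestyle(c) and ¬keep_lifestyle(c), and expects would record the contextually given expectations.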
5. Summary and Further Research
In this chapter, we analyzed two types of concession distinguished in the literature and clarified the similarities and differences between them,
proposing a revision of a uniform concession interpretation scheme. We also revised and improved our IS-sensitive analysis of the denial of expectation concession, fully casting it in terms of contextual alternative sets, and we proposed an IS-sensitive analysis of concessive opposition, using the same formal machinery as the analysis of the denial of expectation relying on alternative sets. Our analysis in some cases predicts readings which are too strong: the expectation perfection leads to an opposite expectation, instead of the absence of an(y) expectation. Modeling this difference is left for future research as it requires recasting our analysis in epistemic logic. Other issues for further research include: (1) analyzing other possibilities of IS-partitioning, including Rhemeis-Focusis as well as Themeis-Focusis; (2) producing an analysis of complex utterances that takes IS into account; (3) defining which generalizations/abstractions properly license an sDR and which do not, taking IS into account; (4) analyzing how previous discourse context and speaker’s attitudes interact in IS-partitioning; and (5) determining the influence of these interactions on the computation of the presupposed sDRs (and gDRs).
Acknowledgements
This work was supported by the British Academy, the Institute of Advanced Studies in the Humanities of the University of Edinburgh, the Research Support Scheme of the Open Society Support Foundation (grant No. 271/1999), and the Royal Society/NATO Postdoctoral Fellowship Programme. We would also like to thank Eva Hajičová, Alistair Knott, Nobo Komagata, Geert-Jan Kruijff, Alex Lascarides, Jaroslav Peregrin, Natalia Nygren Modjeska, Petr Sgall, Mark Steedman, Matthew Stone, Dan Hardt, the participants of the ICOS-2 conference in Dagstuhl and the IWCS-4 workshop in Tilburg, and several anonymous reviewers for useful comments.
References Asher, N. and M. Morreau: 1991, ‘Commonsense Entailment’. In: IJCAI’91, Proceedings of the Ninth International Joint Conference on Artificial Intelligence. Sydney, Australia, 387–392. Beaver, D.: 1997, ‘Presupposition’. In: J. van Benthem and A. ter Meulen (eds.): Handbook of Logic and Language. Elsevier and The MIT Press, pp. 939–1008.
Bos, J., A.-M. Mineur, and P. Buitelaar: 1995, ‘Bridging as coercive accommodation’. Technical report, CLAUS 52, Department of Computational Linguistics, Universit¨ at des Saarlandes. B¨ uring, D.: 1999, ‘Topic’. In: P. Bosch and R. van der Sandt (eds.): Focus: Linguistic, Cognitive and Computational Principles, Natural Language Processing. Cambridge: Cambridge University Press, pp. 142–165. Dretske, F.: 1972, ‘Contrastive Statements’. Philosophical Review, 411–437. Fillmore, C. J. and D. T. Langedoen (eds.): 1971, Studies in Linguistic Semantics. Holt, Rinehart and Winston, Inc. Firbas, J.: 1992, Functional Sentence Perspective in Written and Spoken Communication, Studies in English Language. Cambridge: Cambridge University Press. Geis, M. L. and A. M. Zwicky: 1971, ‘On Invited Inferences’. Linguistic Inquiry II(4), 561–566. Ginzburg, J.: 1996, ‘Interrogatives: Questions, Facts and Dialogue’. In: S. Lappin (ed.): The Handbook of Contemporary Semantic Theory. Oxford: Blackwell Publishers, Chap. Chapter 15, pp. 385–422. Grice, H. P.: 1975, ‘Logic and conversation’. In: P. Cole and J. Morgan (eds.): Syntax and Semantics, No. 3. New York: Academic Press, pp. 41–58. Grote, B., N. Lenke, and M. Stede: 1995, ‘Ma(r)king Concessions in English and German’. Discourse Processes 24(1), 87–118. Hajiˇcov´a, E. and P. Sgall: 1987, ‘The Ordering Principle’. Journal of Pragmatics 11(4), 435–454. Halliday, M. A.: 1970, A Course in Spoken English: Intonation. Oxford: Oxford Uniersity Press. Halliday, M. A.: 1985, Introduction to Functional Grammar. London, U.K.: Edward Arnold. Hamblin, C.: 1973, ‘Questions in Montague English’. Foundations of Language pp. 41–53. Karttunen, L.: 1973, ‘Presuppositions of Compound Sentences’. Linguistic Inquiry IV(2), 169–193. Karttunen, L.: 1974, ‘Presupposition and Linguistic Context’. Theoretical Linguistics 1(1/2), 181–194. Karttunen, L. and S. Peters: 1979, ‘Conventional Implicature’. In: C.-K. Oh and D. A. Dinneen (eds.): Syntax and Semantics: Presupposition, Vol. 11. Academic Press, pp. 1–56. Knott, A.: 1996, ‘A Data-driven Methodology for Motivating a Set of Coherence Relations’. Ph.D. thesis, Department of Artificial Intelligence, University of Edinburgh. Knott, A. and C. Mellish: 1996, ‘A Feature-based Account of the Relations Signalled by Sentence and Clause Connectives’. Language and Speech 39(23), 143–183.
Komagata, N.: 2003, ‘Information Structure in Subordinate and Subordinatelike Clauses’. Journal of Logic, Language and Information: Special Issue on Discourse and Information Structure 12(3), 301–318. Kratzer, A.: 1991, ‘Modality’. In: A. von Stechow and D. Wunderlich (eds.): Semantik: ein internationales Handbuch der zeitgen˘sssischen Forschung. Berlin: Walter de Gruyter, pp. 639–650. Krifka, M.: 1993, ‘Focus and Presupposition in Dynamic Semantics’. Journal of Semantics 19, 269–300. Kruijff-Korbayov´ a: 2001, ‘Information Structure and the Semantics of “otherwise”’. In: I. Kruijff-Korbayov´ a and M. Steedman (eds.): Information Structure, Discourse Structure and Discourse Semantics, ESSLLI 2001 Workshop Proceedings. Helsinki: The University of Helsinki, pp. 61–78. Kruijff-Korbayov´ a, I.: 1998, ‘The Dynamic Potential of Topic and Focus: A Praguian Discourse Representation Theory’. unpublished Ph.D. thesis, Charles University, Prague, Czech Republic. Kruijff-Korbayov´ a, I. and M. Steedman: 2003, ‘Discourse and Information Structure’. Journal of Logic, Language and Information: Special Issue on Discourse and Information Structure 12(3), 249–259. Kruijff-Korbayov´ a, I. and B. Webber: 2000, ‘Discourse Connectives, Inference and Information Structure’. In: J. Bos and M. Kohlhase (eds.): Proceedings of ICoS-2, Schloß Dagstuhl, Germany, pp. 105–120. Kruijff-Korbayov´ a, I. and B. Webber: 2001, ‘Concession, Implicature, and Alternative Sets’. In: H. Bunt (ed.): Proceedings of the International Workshop on Computational Semantics IWCS-4. Tilburg, the Netherlands, pp. 227–248. Lagerwerf, L.: 1998, Causal Connectives have Presuppositions. The Hague, The Netherlands: Holland Academic Graphics. Ph.D. Thesis, Tilburg University. Lakoff, G.: 1971a, ‘If’s, and’s and but’s about Conjunction’. In C. J. Fillmore and D. T. Langedoen (eds.), Studies in Linguistic Semantics, Holt, Rinehart and Winston, Inc., pp. 114–149. Lakoff, R.: 1971b, ‘The Role of Deduction in Grammar’. C. J. Fillmore and D. T. Langedoen (eds.), Studies in Linguistic Semantics, Holt, Rinehart and Winston, Inc., pp. 62–70. Larsson, S.: 2003, ‘Interactive communication management in an issue-based dialogue system’. In: I. Kruijff-Korbayov´ a and C. Kosny (eds.): Proceedings of the 7th workshop on the semantics and pragmatics of dialogue (DiaBruck). pp. 75–82. Mathesius, V.: 1975, ‘On Information Bearing Structure of the Sentence’. In: S. Kuno (ed.): Harvard studies in syntax and semantics. Cambridge, MA.: Harvard University Press. Partee, B.: 1995, ‘Allegation and local accommodation’. In: B. H. Partee and P. Sgall (eds.): Discourse and Meaning. Amssterdam: John Benjamins, pp. 65–86.
Partee, B. H., E. Hajiˇcov´a, and P. Sgall: 1998, Topic-Focus Articulation, Tripartite Structures, and Semantic Content. Dordrecht: Kluwer Academic Publishers. Rooth, M.: 1985, ‘A Theory of Focus Interpretation’. Ph.D. thesis, Graduate School of the University of Massachusetts, Amherst, Massachusetts. Rooth, M.: 1992, ‘A Theory of Focus Interpretation’. Natural Language Semantics 1, 75–116. Sanders, T., W. Spooren, and L. Noordman: 1992, ‘Toward a taxonomy of coherence relations’. Discourse Processes 15(1), 1–35. Sgall, P., E. Hajiˇcov´a, and J. Panevov´ a: 1986, The meaning of the sentence in its semantic and pragmatic aspects. Dordrecht: Reidel. Spooren, W.: 1989, ‘Some Aspects of the Form and Interpretation on Global Contrastive relations’. Ph.D. thesis, University of Nijmegen. Steedman, M.: 1996, Surface Structure and Interpretation. Cambridge, MA: MIT Press. Steedman, M.: 2000a, ‘Information Structure and the Syntax-Phonology Interface’. Linguistic Inquiry 31(4), 649–689. Steedman, M.: 2000b, The Syntactic Process. Cambridge, MA: M.I.T. Press. Vallduv´ı, E.: 1992, The Informational Component. New York: Garland. van der Sandt, R. A.: 1992, ‘Presupposition projection as anaphora resolution’. Journal of semantics 9, 333–377. Webber, B., A. Knott, and A. Joshi: 1999a, ‘Multiple Discourse Connectives in a Lexicalized Grammar for Discourse’. In: Proceedings IWCS-3, Third International Workshop on Computational Semantics. Tilburg, The Netherlands, pp. 309–325. Webber, B., A. Knott, M. Stone, and A. Joshi: 1999b, ‘Discourse Relations: A Structural and Presuppositional Account using Lexicalised TAG’. In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics. College Park MD, pp. 41–48. Webber, B., A. Knott, M. Stone, and A. Joshi: 1999c, ‘What are Little Trees Made of: A Structural and Presuppositional Account using Lexicalised TAG’. In: A. Knott, J. Oberlander, M. Johanna, and T. Sanders (eds.): Proceedings of International Workshop on Levels of Representation in Discourse (LORID’99). Edinburgh, pp. 151–156. Webber, B., M. Stone, A. Joshi, and A. Knott: 2003, ‘Anaphora and Discourse Structure’. Computational Linguistics 29(4), 545–587.
JAN VAN EIJCK
CONTEXT AND THE COMPOSITION OF MEANING
1. Introduction
We will start by briefly reviewing the discussion of DRT and compositionality, by presenting the standard view on how DRSs should be merged. We show that this view leads to a puzzle with coordination. This unsolved problem motivates the switch to a more sophisticated theory of context and context extension than is present in current rational reconstructions of DRT. We show that under this new reconstruction the need for merge has disappeared, and the coordination puzzle can be solved. Next we briefly turn to issues of salience and salience update and the use of salience in pronoun reference resolution, and list our conclusions.
2. Linking Pronouns: the Standard Dynamic Account
The by now standard dynamic account of the way pronouns get linked to their antecedents has the following two kinds of basic ingredients:
− contexts,
− constraints on contexts.
A DRT-style representation (Kamp, 1981; Kamp and Reyle, 1993) for a piece of text uses these ingredients as follows:
[ context | constraints on context ]
Information conveyed by a piece of text grows, and this growth of information is reflected in a representation update: 173 H. Bunt and R. Muskens, (eds.), Computing Meaning, Volume 3, 173–193. c 2007 Springer.
[ context | constraints on context ]  −→ update −→  [ new context | new constraints on context ]
In DRT, the details of this scheme are filled out as follows. An initial DRS represents context and constraints on context for ‘a man entered’:
[ x | Mx, Ex ]
This initial representation changes through successive updates, as follows:
[ x | Mx, Ex ]  → ‘A woman entered’ →  [ x y | Mx, Ex, Wy, Ey ]  → ‘He smiled at her’ →  [ x y | Mx, Ex, Wy, Ey, Sxy ]
Assume, now, that sentences to be added to an existing representation have a representation of their own. Then we are faced with the problem of how an initial piece of representation has to be merged with a new piece of representation to effect an information update.
[ x | Mx, Ex ]  +  [ y | Wy, Ey ]  =  [ x y | Mx, Ex, Wy, Ey ]
The example illustrates how we get in trouble in cases where the representation of ‘a man’ and of ‘a woman’ employ the same variable. This merge problem in the presence of clashing variables does not occur in (Kamp, 1981) and in the DRT textbook (Kamp and Reyle, 1993), for
these presentations of DRT work with a top-down DRS construction algorithm, where merging of DRSs is avoided because a new DRS is always constructed in the context of an already existing DRS. The classic DRT construction algorithm thus always parses new sentences in the context of an existing representation structure.
3. Merging DRSs
When dynamic semantics for NL first was proposed in (Kamp, 1981) and (Heim, 1982), the approach invoked strong opposition from the followers of Montague (Montague, 1973). Rational reconstructions to restore compositionality were announced in (Groenendijk and Stokhof, 1991) and carried out in (Groenendijk and Stokhof, 1990; Chierchia, 1992; Jansche, 1998; Muskens, 1995; Muskens, 1996; Muskens, 1994; van Eijck, 1997; van Eijck and Kamp, 1997; Kohlhase et al., 1996; Kuschert, 2000; Bekki, 2000), among others. All of these reconstructions are based in some way or other on DPL (Groenendijk and Stokhof, 1991), and they all inherit the main flaw of this approach: the destructive assignment problem, i.e., the fact that assigning a value to a variable x destroys the old value of x. Interestingly, DRT itself did not suffer from this problem: the discourse representation construction algorithms of (Kamp, 1981) and (Kamp and Reyle, 1993) are stated in terms of functions with finite domains, and carefully talk about ‘taking a fresh discourse referent’ to extend the domain of a verifying function, for each new NP to be processed.
The merge problem arises in carrying out a Montagovian or Fregean programme of natural language analysis in a setting that takes context and context change into account. Compositional versions of DRT presuppose a definition of ‘merge’. In compositional DRT, a lexical entry for the determiner ‘a’ might look like this:
λP Q. [ x | ] • P x • Qx
And a lexical entry for the determiner ‘every’ might look like this:
λP Q. ([ x | ] • P x) ⇒ Qx
Two obvious questions: (i) How should the reference marker x be picked? (ii) How should • be defined? The classical DRT view on these questions can be found in (Zeevat, 1989), where the following compositional DRS definition is proposed.
− Basic DRSs: (∅, ∅), (∅, {P r0 · · · rn−1}), (∅, {⊥}), ({x}, ∅).
− Merger of DRSs: δ • δ′ := (Vδ ∪ Vδ′, Cδ ∪ Cδ′).
− Implication of DRSs: δ → δ′ := (∅, {δ ⇒ δ′}).
The semantics that goes with this, with respect to some first order model M = (M, I), is given by:
− [[(∅, ∅)]] = (∅, M^U),
− [[(∅, {P r0 · · · rn−1})]] = (∅, {f ∈ M^U | M |=f P r0 · · · rn−1}),
− [[(∅, {⊥})]] = (∅, ∅),
− [[({x}, ∅)]] = ({x}, M^U),
− [[δ • δ′]] := [[δ]] ⊕ [[δ′]], where (X, F) ⊕ (Y, G) := (X ∪ Y, F ∩ G),
− [[δ ⇒ δ′]] := [[δ]] → [[δ′]], where (X, F) → (Y, G) := (∅, {h ∈ M^U | ∀f ∈ F (if h[X]f then ∃g ∈ G with f[Y]g)}).
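The definition can be transcribed almost literally into a functional language. The following Haskell sketch (ours, not Zeevat’s) codes DRSs as pairs of a universe and a list of conditions, with conditions kept as plain strings, and shows how the union-based merge collapses two discourse referents that happen to carry the same name.

```haskell
import Data.List (nub, union)

-- DRSs as (universe, conditions) pairs; conditions are plain strings here.
type Var  = String
type Cond = String

data DRS = DRS { universe :: [Var], conditions :: [Cond] } deriving Show

-- The bullet operation: componentwise union.
merge :: DRS -> DRS -> DRS
merge (DRS u1 c1) (DRS u2 c2) = DRS (nub (u1 `union` u2)) (c1 `union` c2)

-- [x | Mx, Ex] merged with [x | Wx, Ex]: the two referents collapse into one.
clash :: DRS
clash = merge (DRS ["x"] ["Mx", "Ex"]) (DRS ["x"] ["Wx", "Ex"])
-- DRS {universe = ["x"], conditions = ["Mx","Ex","Wx"]}
```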
These definitions suggest that merging proceeds as follows in our example case:
[ x | Mx, Ex ]  •  [ x | Wx, Ex ]  =  [ x | Mx, Wx, Ex ]
This is certainly not the outcome one would like. So can such variable clashes be avoided? Or can they get repaired (e.g., by means of an alternative merge operation)? To see that these are vexing problems, consider the following puzzle with coordination (cf. also (Blackburn and Bos, 2005)): (66) A man entered and a man left.
A treatment of this example in compositional DRT will be based on the following ingredients:
‘a man’: λQ. [ x | Mx ] • Qx
‘entered’: λy. [ | Ey ]
‘left’: λy. [ | Ly ]
‘and’: λpq. p • q
Composing these ingredients with functional applications, using the above definition of •, gives the following (wrong) result:
‘a man entered and a man left’: [ x | Mx, Ex, Lx ]
4. Attempts at a Solution
One way to solve the merge problem is by a change in the representation of variables. In (Vermeulen, 1995), variables get replaced by so-called referent systems. The basic idea of this modification is to distinguish between the following two aspects of a variable:
− variable name
− variable address or memory slot
Referent systems are like pointer structures in imperative programming, with the slight twist that input under a name gets distinguished from output under a name. An example referent system with three slots:
x → [·] → x
y → [·] → y
z → [·] → u
Merging referent systems is done by tracing variables from input to output. (The diagram showing the merge of two referent systems is not reproduced here.)
This way of merging can be applied to merging ‘referent system’ DRSs, as follows. In the following example, x is first linked to the referent m, next to the referent n.
[ [m] → x | Mx, Ex ]  •  [ [n] → x | Wx, Ex ]  =  [ [m] [n] → x | Mx′, Ex′, Wx, Ex ]
The reference to m gets destroyed by the merge. To indicate that two instances of the variable name x point to different data, a renaming of the instances that refer to m is mandatory. The example shows that replacing variables by referent systems does not in itself solve the merge problem. In the example the referent m becomes inaccessible: there is no variable name attached to it anymore.
A genuine solution to the merge problem in dynamic semantics is provided by sequence semantics, also proposed by Vermeulen, in (Vermeulen, 1993). In sequence semantics, a variable x gets interpreted as a stack [d0, . . . , dn−1]. Each new introduction for x extends the stack by means of an operation [d0, . . . , dn−1] → [d0, . . . , dn−1, dn]. The key idea of sequence semantics can be restated as follows: assume that each variable x comes with a sequence x′, x″, x‴, . . . of extensions.
x    x′   x″   x‴   x⁗   · · ·
d0   d1   d2   d3   d4   · · ·
New introductions are never destructive, for they refer to an extension:
x,x’ x
x
Mx Ex
•
=
Wx Ex
Mx Ex Wx’ Ex’
Note that both x and x remain accessible.
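As an aside, the stack idea is easy to prototype. The following Haskell fragment is a sketch under our own simplifying assumptions (values coded as integers, names as strings): it records, for every variable name, the sequence of values it has been given, so that a new introduction extends the stack instead of overwriting the old value.

```haskell
import qualified Data.Map as Map

type Val = Int
type Env = Map.Map String [Val]   -- each name maps to its stack d0, d1, ...

-- A new introduction for x appends a value to the stack for x.
introduce :: String -> Val -> Env -> Env
introduce x d = Map.insertWith (flip (++)) x [d]

-- valueOf n x: the value of the n-th introduction for x (x, x', x'', ...).
valueOf :: Int -> String -> Env -> Maybe Val
valueOf n x env = do
  stack <- Map.lookup x env
  if n >= 0 && n < length stack then Just (stack !! n) else Nothing
```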
5. Abstraction over Context
In this section we will propose an account of contexts, context extension and abstraction over context based on what can be viewed as a combination and simplification of referent systems and sequence semantics. The main ingredients of our account are (i) passing contexts as parameters and (ii) using types for constraining the indices used for pointing into contexts. Our account can also be viewed as a simplification of (Dekker, 1996), where a rational reconstruction of DRT is given in terms of a system of predicate logic extended with stack pointers, together with an encoding in polymorphic relational type theory. The basic difference is that while Dekker starts out from a version of predicate logic containing both variables and stack pointers (Dekker, 1994), in our set-up all the stack manipulation tools that we need get introduced by the type theory.
λ context · [ context + context extension | constraints ]
We will take contexts to be essentially lists of reference markers that encode discourse information and allow us to keep track of topics of discourse. In discourse processing, the order in which discourse topics are introduced is crucial. Topics mentioned most recently are (other things being equal) more readily accessible for reference resolution. Context also provides additional information: − Gender and number information.
− actor focus: agent of the sentence.
− discourse focus: ‘what is talked about’ in the sentence.
In this chapter, we will take this additional information as secondary. So how does one abstract over context? By representing a context as a stack of items, and by handling these stacks as suggested in Incremental Dynamics (van Eijck, 2001):
c0   c1   c2   c3   c4   · · ·
Now existential quantification can be modelled as context extension:
c0 c1 c2 c3  +  d  =  c0 c1 c2 c3 d
Indices are used to refer to context elements:
0    1    2    3    4    · · ·   n−1    n
c0   c1   c2   c3   c4   · · ·   cn−1   d
Given this representation of context, we can replace merge by context composition. It is convenient to introduce some type abbreviations for that. We use [e] for the type of contexts, [e] → [e] → t for the type of context transitions (characteristic functions of binary relations on the type of contexts). Use a :: α for “a is of type α”. Assume c, c :: [e] and x :: e. Let cˆx be the result of extending context c with element x. This assumes (ˆ) :: [e] → e → [e]. Note that we assume that type arrows associate to the right, so that [e] → e → [e] gets interpreted as [e] → (e → [e]). Now define context extension as follows: ∃ := λcc .∃x(cˆx = c ) This definition of context extension essentially uses polymorphic type theory (Hindley, 1997; Milner, 1978). The type polymorphism is crucial, for it is implicit in cˆx = c that the lengths of the contexts c and c match. In other words, the type of contexts is polymorphic, for a context may have any finite length. The type of ∃ is given by ∃ :: [e] → [e] → t. This is a polymorphic type, for ∃ relates contexts of length n to contexts of length n + 1, for any n. The type polymorphism can be made explicit by writing the type of a context c as [e]i , with i a type variable indicating the context length, but for convenience we omit these indices. What we do need, however,
is a means of referring to the length of a context c in a generic way. We will use |c| for the length of context c. Let T be an abbreviation of [e] → [e] → t, i.e., let T be the type of context transitions. Then ∃ :: T . Composition of context transitions is also easily defined in polymorphic type theory. Assume φ, ψ :: T , let c, c :: [e] and define ( ; ) as follows: φ ; ψ := λcc .∃c (φcc ∧ ψc c ) Now the type of ; is given by: ( ; ) :: T → T → T. What this says is that ( ; ) takes two context transitions and produces another context transition. If a context c has length n, i.e., if the context elements run from c0 to cn−1 , indices referring to context elements should run from 0 to n − 1: 0
1
2
3
4
···
n−1
c0
c1
c2
c3
c4
···
cn−1
This can be achieved in a natural way by using natural numbers as index types. Recall that under the Von Neumann encoding of natural numbers it holds that n = {0, . . . , n−1}. Thus, an index of type n is an index ranging over {0, . . . , n − 1}, and this is precisely what is needed for indexing into a context of length n. If c is a context of length n, i.e., c :: [e]n , then i :: n indicates that i is of the appropriate type for indexing into c. If we assume that λci.c[i] :: [e]n → n → e, where n is the type variable indicating context length, then we get (λi.c[i]) :: n → e. This illustrates how polymorphic type assigment can enforce the index to be of the type that fits the size of the context. It is convenient to gloss over these details by using ι as a type for context indices, and assuming that indices fit the sizes of contexts throughout. More specifically, in types of the form ι → [e] → α we will tacitly assume that the index fits the size of the initial context [e]. Thus, ι → [e] → [e] → t is really a type scheme rather than a type, although the type polymorphism remains hidden from view. Since ι → [e] → [e] → t generalizes over the size of the context, it is shorthand for the types 0 → [e]0 → [e] → t, 1 → [e]1 → [e] → t, 2 → [e]2 → [e] → t, and so on.
In what follows, we will employ variables P, Q of type ι → [e] → [e] → t, variables i, j, j of type ι, and variables c, c of type [e]. We call a function of type ι → T an indexed context transition. In the treatment of the indefinite determiner, we assume that the determiner combines with two indexed context transitions and produces a context transition. Assume that P and Q are indexed context transitions, i.e., assume P, Q :: ι → T . Then the new version of the lexical entry for determiner ‘a’ runs like this: λP Qc.(∃ ∃ ; P i ; Qi)c where i = |c|. The “φ(i) where i = |c|” is a piece of syntactic sugar that can be removed, at the penalty of unreadability, by generic substitution of |c| for i in φ. The type of this determiner entry is (ι → T ) → (ι → T ) → T . In a phrase [S [NP [DET a ][CN A ]][VP B ]], the common noun A and the verb phrase B get interpreted as indexed context transitions, to be combined by the interpretation of the determiner into a new context transition that interprets the sentence. This illustrates the type lift ∗ involved in incremental dynamics. = T t∗ e∗ = ι ∗ (α → β) = α∗ → β ∗ In fact, the shift from e to ι will always take place “in context”. To make this explicit, one might wish to define the type lift for a type language built from the primitive t with the operations e → α and α → β. To define universal quantification, we first need a definition of context negation: ¬ ¬φ := λcc .(c = c ∧ ¬∃c φcc ) If φ is a context transition, then ¬ ¬φ is a new context transition, and ¬ ¬φ expresses the familiar negation as test from dynamic semantics: input context is equal to output context, and the test succeeds for a given input c iff there are no φ-transitions from c. Note that ¬ ¬ :: T → T . Dynamic implication ⇒ can now be defined in terms of ¬ ¬ and ; , as follows: φ ⇒ ψ := ¬ ¬(φ ; ¬ ¬ψ)
The type of ⇒ is given by (⇒) :: T → T → T . In terms of this, we phrase the lexical entry for the determiner ‘every’ as follows: λP Qc.((∃ ∃ ; P i) ⇒ Qi))c where i = |c|. This has the correct type (ι → T ) → (ι → T ) → T , and it assigns the dynamic meaning familiar from DRT and DPL. Predicate Lifting We assume that the lexical meanings of CNs, VPs are given to us as one place predicates (type e → t) and those of TVs as two place predicates (type e → e → t). We therefore define blow-up operations for lifting one-placed and two-placed predicates to the dynamic level. Assume A to be an expression of type e → t, and B an expression of type e → e → t; we use c, c as variables of type [e], and j, j as variables of type ι, and we employ postfix notation for the lifting operations: A◦ := λjcc .(c = c ∧ Ac[j]) B • := λjj cc .(c = c ∧ Bc[j]c[j ]) Note that (◦ ) :: (e → t) → ι → T and (• ) :: (e → e → t) → ι → ι → T . The operation ◦ lifts a one-place predicate to a function of type ι → T , the operation • lits a two-place predicate to a function of type ι → ι → T . These type lifts are in accordance with the type lifting operation ∗, for A◦ :: (e → t)∗ and B • :: (e → e → t)∗ . It is instructive to compare our typing ι → T for a common noun like man with the typing in (Dekker, 1996), where the translation of man has the following type: e → e1 , . . . , en → e1 , . . . , en , where e1 , . . . , en is the type of n-ary relations in the relational type theory of (Orey, 1959). This makes a common noun into a function for mapping individuals to relation transformers. This detour via polymorphic relational type theory turns out to be unnecessary. Passing contexts as parameters, together with the use of typed indices, allows us to remain within simple polymorphic type theory. Implementation as a functional program in a functional programming language based on polymorphic type theory is therefore straightforward.
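Since an implementation in a functional language is indeed straightforward, the following Haskell fragment is a minimal executable sketch of the context machinery just described (ours, not the author’s code). A context transition of type T is coded as a function from an input context to the list of output contexts, a convenient stand-in for the relational type [e] → [e] → t; the entity representation, the finite model domain and all function names are illustrative assumptions.

```haskell
type Entity  = String
type Context = [Entity]
type Trans   = Context -> [Context]   -- executable stand-in for [e] -> [e] -> t
type Idx     = Int

domain :: [Entity]                    -- a toy model domain (illustrative)
domain = ["d1", "d2", "d3"]

exists :: Trans                       -- context extension: c ^ x
exists c = [ c ++ [x] | x <- domain ]

compose :: Trans -> Trans -> Trans    -- the ; of the text
compose phi psi c = concat [ psi c' | c' <- phi c ]

neg :: Trans -> Trans                 -- context negation: a test on the input
neg phi c = [ c | null (phi c) ]

implies :: Trans -> Trans -> Trans    -- phi => psi := neg (phi ; neg psi)
implies phi psi = neg (phi `compose` neg psi)

lift1 :: (Entity -> Bool) -> Idx -> Trans   -- the circle lift for one-place predicates
lift1 p i c = [ c | p (c !! i) ]

-- determiner 'a': extend the context, then test noun and VP at index |c|
a :: (Idx -> Trans) -> (Idx -> Trans) -> Trans
a p q c = (exists `compose` p i `compose` q i) c  where i = length c

-- determiner 'every', following the dynamic implication pattern
every :: (Idx -> Trans) -> (Idx -> Trans) -> Trans
every p q c = ((exists `compose` p i) `implies` q i) c  where i = length c
```

With one-place predicates of type Entity -> Bool lifted by lift1, an expression such as a (lift1 noun) (lift1 vp) applied to the empty context returns the output contexts that are one element longer than the input, which is how the indefinite models context extension here.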
6. Multidimensional Grammar
Following (Cresswell, 1973), (Oehrle, 1994) and (Muskens, 2003), we use a multi-dimensional set-up for our grammar formalism. We will use signs consisting of three components: a syntactic term, a semantic term, and a sign type.
Syntactic Terms
For syntactic terms, we take closed linear lambda terms over word lists, with the conventions of list notation and list concatenation adopted from functional programming languages like Haskell (Haskell Team). A lambda term is linear if each lambda operator binds exactly one variable. We take S as the type of a word list. Word list concatenation is given by ++ :: S → S → S. We abbreviate λxλy to λxy, and so on. We use x, y as variables over word lists, i.e., x, y :: S, and X as a variable over functions from word lists to word lists, i.e., X :: S → S. Here are some example syntactic terms, with their syntactic types:
[john] :: S
[john, loves, mary] :: S
λx.[loves]++x :: S → S
λxy.y++[loves]++x :: S → S → S
λx.x++[loves, mary] :: S → S
λX.X[loves, mary] :: (S → S) → S
Semantic Terms
Semantic terms are expressions from polymorphic typed logic, over basic types e and t, with the conventions for the use of [e] and ι that were explained above.
Sign Types and Sign Type Reduction
The language of sign types is given by:
SignType := Ref | Noun | Sent | SignType → SignType
The functions ♦, ♥ reduce a sign type Φ to a pair consisting of a syntactic type Φ♦ and a semantic type Φ♥.
Ref♦ := S               Ref♥ := ι
Noun♦ := S              Noun♥ := ι → T
Sent♦ := S              Sent♥ := T
(A → B)♦ := A♦ → B♦     (A → B)♥ := A♥ → B♥
Note that the links Ref♥ = ι, Noun♥ = ι → T and Sent♥ = T constitute the basic type assignment of context logic. Signs Signs are triples consisting of a syntactic term S, a semantic term T and a sign type Φ, under the constraint that S :: Φ♦ and T :: Φ♥ . Assume that Woman is a constant of type e → t. Then the following triple is a sign: ([woman], Woman◦ , Noun). To see that this is a sign, note that the typing constraints are satisfied, for Noun♦ = S, which matches [woman] :: S, and Noun♥ = ι → T , which matches Woman◦ :: ι → T . Assume that Love is a constant of type e → e → t. Then the following triple is a sign: (λxy.y++[loves]++x, Love• , Ref → Ref → Sent) The typing constraints are satisfied, for λxy.y++[loves]++x :: S → S → S, (Ref → Ref → Sent)♦ = S → S → S, Love• :: ι → ι → T , (Ref → Ref → Sent)♥ = ι → ι → T . Here are some further examples of signs. The check that these are signs is left to the reader. (Assume Smile is a constant of type e → t.) − λxX.X([every]++x), λP Qc.((∃ ∃ ; P i) ⇒ Qi))c where i = |c|, Noun → (Ref → Sent) → Sent. − λX.X[every, woman], λQc.((∃ ∃ ; Woman◦ i) ⇒ Qi)c where i = |c|, (Ref → Sent) → Sent. − λx.x++[smiled], Smile◦ , Ref → Sent. In the treatment of proper names, we will assume that appropriate indices for proper names can get extracted from the input context. Indeed, it will be assumed that all proper names are linked to anchored elements in context. The incrementality of the context update mechanism ensures that no anchored elements can ever be overwritten. Suppose (†) :: (e, [e]) → ι is the function that gives the first index i in c with c[i] = x. Then if n :: e and c :: [e] is a context in which n occurs, †(n, c) gives the lowest index of n in c. In case there is no such an index, this is not well-defined, but we will assume contexts that have
referents for all proper names. Let name :: e. We define: name := λP cc .P icc where i = †(name, c). Note that ( ) :: e → (ι → T ) → T , and name :: (ι → T ) → T . Let John be a constant of type e. Then the following triple is a sign: (λX.X[john], John , (Ref → Sent) → Sent). Combining Signs The simplest way of combining signs is by application in each of the first two dimensions. (λX.X[john], John )(λx.x++[smiled], Smile◦ ) β
→ → ([john, smiled], λcc .c = c ∧ Smile(c[i]) where i = †(John, c)) In this example we considered the first sign as a function that takes the second sign as its argument. This boils down to using the following application combinator: C 1 :: (α → β) → α → β C 1 = λFX · FX . The first line gives the type specification, the second line the definition. The type specification indicates that the first argument of C 1 can be any function and the second argument any argument to that function. We get: C 1 (λX.X[john], John )(λx.x++[smiled], Smile◦ ) β
→ → ([john, smiled], λcc .c = c ∧ Smile(c[i]) where i = †(John, c)) Other ways of combining signs are possible. ‘Consider the first sign as an argument of the second sign’ would correspond to the following combinator: C 2 :: α → (α → β) → β C 2 = λX F · FX . An operator that we will need is C 3 , for combining a transitive verb with its direct object. C 3 :: (α → α → β) → ((α → β) → α) → α → β C 3 = λRX y.X (λx.Rxy).
We illustrate this with an example in the syntax dimension: C 3 (λxy.y++[loved]++x)(λX.X[every, woman]) β
→ → λy.y++[loved, every, woman] In the semantics dimension, this works out in the same way: C 3 (Love• ) (λQc.((∃ ∃ ; Woman◦ i) ⇒ Qi)c where i = |c|) β
→ → λjc.((∃ ∃ ; Woman◦ i) ⇒ (λc c (c = c ∧ Love• c [j]c [i])))c where i = |c|) This can be simplified further by expansion of Woman◦ , Love• and the definitions of ∃ , ; and ⇒: β
→→β λjcc′.c = c′ ∧ ∀x(Woman(x) → Love(c[j], x)).
(The tree diagrams summarizing what the combinators C1, C2 and C3 do are not reproduced here.) It is also possible to define combinators that achieve the effects of quantifier raising: see (Muskens, 2003) for details.
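Written out in Haskell, the combinators are one-liners; combining two signs with C1 then just applies in both dimensions at once. This is a sketch of our own: the type given for c3 follows its definition λRXy.X(λx.Rxy), and the pair-lifting shown for sign combination is illustrative.

```haskell
-- Apply a function sign to an argument sign.
c1 :: (a -> b) -> a -> b
c1 f x = f x

-- Feed an argument sign to a function sign.
c2 :: a -> (a -> b) -> b
c2 x f = f x

-- Combine a transitive relation r with a quantified object x, yielding a VP-type result.
c3 :: (a -> a -> b) -> ((a -> b) -> b) -> a -> b
c3 r x = \y -> x (\z -> r z y)

-- Combining two signs (pairs of a syntactic and a semantic term) with c1
-- amounts to application in each dimension:
applySigns :: (s -> s', m -> m') -> (s, m) -> (s', m')
applySigns (fs, fm) (xs, xm) = (fs xs, fm xm)
```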
7. A Fragment
For the fragment, all we have to do is give specifications for the signs.
Sentence and Text Formation
(λxy.[if]++x++[then]++y, λpq.p ⇒ q, Sent → Sent → Sent)
(λxy.x++[.]++y, λpq.p ; q, Sent → Sent → Sent)
If NP is an NP sign and VP a VP sign, then C 1 NP VP is a sentence sign. Names (λX.X[mary], Mary , (Ref → Sent) → Sent)
Names combine with VPs to form sentences by means of combinator C1 . Pronouns Anaphoric reference resolution is the process of fixing the references of pronouns in a given context by linking the pronouns to appropriate context indices. After reference resolution, the interpretation of a pronoun is an index pointing to an appropriate context element. ([PROj ], j, Ref) Pronouns combine with VPs to form sentences by means of combinator C2 . Determiners Let ⇓: ([e] → t) → t be the function that gives true for a context set P :: [e] → t just in case P is not empty. Then ⇓ is an operation for success. The sign type for all determiner signs is Noun → (Ref → Sent) → Sent. (λxX.X[every++x], λP Qc.(¬ ¬(∃ ∃ ; P |c| ; ¬ ¬Q|c|))c) (λxX.X[some++x], λP Qc.(∃ ∃ ; P |c| ; Q|c|)c) (λxX.X[no++x],
λP Qc.(¬ ¬(∃ ∃ ; P |c| ; Q|c|))c)
(λxX.X[the++x],
λP Qc.((λc .c = c ∧ ∃x∀y(⇓ (∃ ∃ ; P i (i|y)c) ↔ x = y)) ; ∃ ; P i ; Qi)c where i = |c|)
Relative Clauses Let be the empty list. The relative clause formator that takes a sentence with a gap for a referent in it, (syntactic type S → S, semantic type ι → T , sign type Ref → Sent), fills the gap with the empty list, and produces a function from nouns to nouns. Thus, the sign type of the relative clause formator is (Ref → Sent) → Noun → Noun. The syntactic and semantic parts look like this: (λXx.x++[that]++X(), λQP j.(P j ; Qj))
Common Nouns
([woman], Woman◦, Noun).
([man], Man◦, Noun).
If CLAUSE is a clause sign (sign type Ref → Sent), THAT is the relative clause operator sign, and CN is a noun sign, then a complex common noun is construed by the following rule: C2 CN (C1 THAT CLAUSE).
VPs
(λx.x++[laughed], Laugh◦, Ref → Sent)
(λx.x++[smiled], Smile◦, Ref → Sent)
For VPs consisting of a TV and a direct object, the rule is given by C3 TV NP, where TV is the TV sign and NP the sign for the direct object.
TVs
(λxy.y++[loved]++x, Love•, Ref → Ref → Sent)
(λxy.y++[respected]++x, Respect•, Ref → Ref → Sent)
8. Solution of the Coordination Puzzle
Sign for ‘a man’:
λX.X[a, man], λQc.((∃ ; Man◦ i) ; Qi)c where i = |c|, (Ref → Sent) → Sent
Sign for ‘entered’:
λx.x++[entered], Enter◦, Ref → Sent.
Sign for ‘a man entered’, after reduction:
[a, man, entered], λcc′.∃x(c′ = cˆx ∧ Man x ∧ Enter x), Sent
Sign for ‘a man left’, after reduction:
[a, man, left], λcc′.∃x(c′ = cˆx ∧ Man x ∧ Leave x), Sent
Sign for ‘a man entered and a man left’, after some reduction:
[a, man, entered, and, a, man, left], λcc′.∃x(Man x ∧ Enter x ∧ cˆx = c′) ; λcc′.∃x(Man x ∧ Leave x ∧ cˆx = c′), Sent.
Sign for ‘a man entered and a man left’, after final reduction:
[a, man, entered, and, a, man, left], λcc′.∃x(Man x ∧ Enter x ∧ ∃y(Man y ∧ Leave y ∧ cˆxˆy = c′)), Sent.
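Under the assumptions of the Haskell sketch given earlier for the context machinery (Section 5), and with purely illustrative extensions for man, enter and leave, the coordination example can be run directly; because each indefinite extends the context at a fresh position, the two introductions never interfere.

```haskell
-- Toy extensions, purely illustrative; Entity, Trans, a, lift1 and compose
-- are the definitions from the earlier sketch.
man, enter, leave' :: Entity -> Bool
man    x = x `elem` ["d1", "d2"]
enter  x = x == "d1"
leave' x = x == "d2"

aManEnteredAndAManLeft :: Trans
aManEnteredAndAManLeft =
  a (lift1 man) (lift1 enter) `compose` a (lift1 man) (lift1 leave')

-- aManEnteredAndAManLeft [] == [["d1","d2"]]
```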
9. Conclusion
Anaphoric reference resolution in context logic is determined on the fly, on the basis of:
− Syntactic properties of the sentence that contains the pronoun.
− Information conveyed in the previous discourse.
− Background information shared by speaker and hearer (the common ground).
An account of how the simple contexts discussed above can be enriched to allow for salience updates is provided in (van Eijck, 2002). The basic change is the replacement of contexts with contexts under permutation, type p[e]. Let x : c be the operation that extends context c with object x, while at the same time pushing x to the most salient position in the new context. Then the new definition of context extension runs like this:
∃ := λcc′.∃x((x : c) = c′)
Here it is assumed that c, c′ :: p[e], P, Q :: ι → p[e] → p[e] → t. The new translation of ‘a man’ effects a salience reshuffle:
λQcc′.∃x(Man(x) ∧ Q i (x : c) c′) where i = |c|.
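In the executable sketch used above, the salience variant amounts to pushing the new entity to the front of the context list instead of appending it at the end; the following lines are again only illustrative.

```haskell
-- Salience-sensitive context extension: new entities become most salient.
-- Entity, Context, Trans and domain are the definitions from the earlier sketch.
push :: Entity -> Context -> Context     -- the x : c of the text
push x c = x : c

existsSalient :: Trans
existsSalient c = [ push x c | x <- domain ]
```

Indices into such a context then pick out entities in order of salience rather than in order of introduction.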
Context semantics is flexible enough to take syntactic effects on salience ordering into account, for lambda abstraction allows us to make flexible use of the salience updating mechanism. To see why this is so, note that in systems of typed logic, predicate argument structure is a feature of the ‘surface syntax’ of the logic. Consider the difference between the following formulas: (λxy.Kxy)(b)(j) (λxy.Kyx)(j)(b) (λx.Kbx)(j) All of these reduce to Kbj, but the predicate argument structure is different. Surface predicate argument structure of lambda expressions can be used to encode the relevant salience features of surface syntax, and we can get the right salience effects from the surface word order of examples like Bill kicked John versus John got kicked by Bill versus John, Bill kicked. Anaphoric reference resolution can now be implemented by a mechanism for picking the indices of the entities satifying the appropriate gender constraint from the current context, in order of salience. The result of reference resolution is a list of indices, in an order of preference determined by the salience ordering of the context. Thus, the meaning of a pronoun, given a context, is an invitation to pick indices from the context. This can be further refined in a set-up that also stores syntactic information (about gender, case, and so on) as part of the contexts. The proposed reference resolution mechanism provides an ordering of resolution options determined by syntactic structure, semantic structure, and discourse structure. This shows that pronoun reference resolution can be brought within the compass of dynamic semantics in a relatively straightforward way, and that the mechanism can be viewed as an extension of pronoun reference resolution mechanisms proposed for DRT (Wada and Asher, 1986; Blackburn and Bos, 2005). With minimal modification, the proposal also takes the so-called ‘actor focus’ from the centering theory of local coherence in discourse (Grosz and Sidner, 1986; Grosz et al., 1995) into account. Contexts ordered by salience are a suitable datastructure for further refinement of the reference resolution mechanism by means of modules for discourse focus and world knowledge (Walker et al., 1998).
References
P. Blackburn and J. Bos: 2005. Representation and Inference for Natural Language; A First Course in Computational Semantics — Two Volumes. Stanford: CSLI. D. Bekki: 2000. Typed dynamic logic for E-type link. In Proceedings for Third International Conference on Discourse Anaphora and Anaphor Resolution (DAARC2000), pp. 39–48. Lancaster University, U.K. G. Chierchia: 1992. Anaphora and dynamic binding. Linguistics and Philosophy, 15(2):111–183. M.J. Cresswell: 1973. Logics and Languages. London: Methuen. P. Dekker: 1994. Predicate logic with anaphora. In L. Santelmann and M. Harvey, editors, Proceedings of the Fourth Semantics and Linguistic Theory Conference, page 17 vv, Cornell University. DMML Publications. P. Dekker: 1996. Representation and information in dynamic semantics. In Jerry Seligman and Dag Westerst˚ ahl, editors, Language, Logic and Computation, pp. 183–197. CSLI, Stanford. J. van Eijck: 1997. Typed logics with states. Logic Journal of the IGPL, 5(5):623–645, 1997. J. van Eijck. Incremental dynamics. Journal of Logic, Language and Information, 10:319–351. J. van Eijck: 2002. Reference resolution in context. In M. Theune, A. Nijholt, and H. Hondorp, editors, Computational Linguistics in the Netherlands 2001 Selected Papers from the 12th CLIN Meeting, pp. 89–103. Amsertdam: Rodopi. J. van Eijck and H. Kamp: 1997. Representing discourse in context. In J. van Benthem and A. ter Meulen, editors, Handbook of Logic and Language, pp. 179–237. Amsterdam: Elsevier. B. Grosz, A. Joshi, and S. Weinstein: 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics, 21:203–226. B.J. Grosz and C.L. Sidner: 1986. Attention, intentions, and the structure of discourse. Computational Linguistics, 12:175–204. J. Groenendijk and M. Stokhof: 1990. Dynamic Montague Grammar. In L. Kalman and L. Polos, editors, Papers from the Second Symposium on Logic and Language, pp. 3–48. Budapest: Akademiai Kiadoo. J. Groenendijk and M. Stokhof: 1991. Dynamic predicate logic. Linguistics and Philosophy, 14:39–100. I. Heim: 1991. The Semantics of Definite and Indefinite Noun Phrases. PhD thesis, University of Massachusetts, Amherst, 1981. J. Roger Hindley: 1997. Basic Simple Type Theory. Cambridge University Press. The Haskell Team. The Haskell homepage. http://www.haskell.org.
Martin Jansche: 1998. Dynamic Montague Grammar lite. Dept of Linguistics, Ohio State University. H. Kamp: 1981. A theory of truth and semantic representation. In J. Groenendijk et al., editors, Formal Methods in the Study of Language. Mathematisch Centrum, Amsterdam. M. Kohlhase, S. Kuschert, and M. Pinkal: 1996. A type-theoretic semantics for λ-DRT. In P. Dekker and M. Stokhof, editors, Proceedings of the Tenth Amsterdam Colloquium, Amsterdam: ILLC. H. Kamp and U. Reyle: 1983. From Discourse to Logic. Kluwer, Dordrecht. S. Kuschert: 2000. Dynamic Meaning and Accommodation. PhD thesis, Universit¨ at des Saarlandes, 2000. R. Milner: 1978. A theory of type polymorphism in programming. Journal of Computer and System Sciences, 17. R. Montague: 1973. The proper treatment of quantification in ordinary English. In J. Hintikka e.a., editor, Approaches to Natural Language, pp. 221–242. Reidel. R. Muskens: 1984. A compositional discourse representation theory. In P. Dekker and M. Stokhof, editors, Proceedings 9th Amsterdam Colloquium, pp. 467–486. Amsterdam: ILLC. R. Muskens: 1995. Tense and the logic of change. In U. Egli et al., editor, Lexical Knowledge in the Organization of Language, pp. 147–183. W. Benjamins. R. Muskens: 1996. Combining Montague Semantics and Discourse Representation. Linguistics and Philosophy, 19:143–186. R. Muskens: 2002. Language, lambdas and logic. Manuscript, Tilburg University. R. Muskens: 2003. Language, lambdas and logic. In G.J. Kruijff and R. Oehrle, editors, Resource Sensitivity in Binding and Anaphora, Studies in Linguistics and Philosophy, pp. 23--54. Kluwer. R. Oehrle: 1994. Term-labeled categorial type systems. Linguistics and Philosophy, 17:633–678. S. Orey: 1959. Model theory for the higher order predicate calculus. Transactions of the American Mathematical Society, 92:72–84. C.F.M. Vermeulen: 1993. Sequence semantics for dynamic predicate logic. Journal of Logic, Language, and Information, 2:217–254. C.F.M. Vermeulen: 1995. Merging without mystery. Journal of Philosophical Logic, 24:405–450. H. Wada and N. Asher: 1986. BUILDRS: An implementation of DR theory and LFG. In 11th International Conference on Computational Linguistics. Proceedings of Coling ’86, University of Bonn. M. Walker, A. Joshi, and E. Prince, editors: 1998. Centering Theory in Discourse. Clarendon Press,. H. Zeevat: 1989. A compositional approach to discourse representation theory. Linguistics and Philosophy, 12:95–131.
MARC SWERTS AND EMIEL KRAHMER
MEANING, INTONATION AND NEGATION
1. Introduction
1.1. Meaning, Intonation . . . In describing the sound shape of a language, it is common practice to distinguish between a segmental and a suprasegmental (or prosodic) level. The former refers to the individual speech sounds, seen as the basic units into which a continuous stream of speech can be subdivided. The latter comprises vocal features such as speech melody, tempo, loudness, pause, that are not typical attributes of the single segments, but are characteristic of longer stretches of speech. There has been a lot of research on how these two levels of sound structure may affect the meaning of an utterance. At the segmental level, one can view the individual speech sounds as the basic building blocks out of which meaningful units are constructed. Though they have no intrinsic meaning of their own, they may change meaning in a discrete way, as the replacement of one phoneme by another forms a different word (the nevertheless principle). Consider the following pair of utterances: (67) a. John likes dogs. b. John likes hogs. The difference in meaning between these utterances is obvious. There is a clear-cut segmental contrast between the phonemes /d/ and /h/, which implies a categorical difference between the words dogs and hogs, and thus accounts for the difference in (truth-conditional) meaning between (67.a) and (67.b). A linguistic description of such phonological contrasts is helped by the existence of a lexicon which provides a yardstick to decide whether or not a difference in form leads to a difference in meaning. Similar attempts to relate form to meaning at the suprasegmental level have often been less successful, because prosodic variation is usually not distinctive in this structural linguistic sense. It is generally more difficult to paraphrase how the meaning of an utterance is affected 195 H. Bunt and R. Muskens, (eds.), Computing Meaning, Volume 3, 195–212. c 2007 Springer.
by replacing its intonation contour by another. For instance, consider the following variants of (67.a). In (68.a), the word dogs is pronounced with a sharp rise in pitch (an H∗ pitch accent in the terminology of Pierrehumbert 1980), while in (68.b) it is pronounced with a lower-rising pitch accent (notated as L+H∗ ). (68) a. John likes dogs H∗ b. John likes dogs L+H∗ What is the difference in ‘meaning’ between (68.a) and (68.b)? For instance, what is the function of the L+H∗ accent in (68.b)? The literature contains the following, partially overlapping suggestions. According to Pierrehumbert & Hirschberg (1990) it marks a contrastive relation between dogs and something else. Vallduv´ı (1992) claims that it indicates that the NP dogs is a link (an instruction to update a file card, in the sense of Heim 1982). According to the theory of Steedman (2000) it is an indication that dogs is part of the theme (provided that the entire contour is of the form L+H∗ L− H%), while Hendriks (2002) would claim that dogs is a non-monotone anaphor. To complicate the picture even further, it is still a matter of considerable debate whether a separately identifiable L+H∗ -form accent exists.1 This is not an isolated problem. When studying the relation between meaning and intonation, we have to face basic questions such as: what are the descriptive intonational units, does the assumed meaning of a contour generalize to all tokens of that intonation pattern, how should one account for the variability between speakers in how they supplement their utterances with intonation patterns and for the variability between listeners in how they interpret particular contours, and how should one deal with the fact that the linguistic and situational context of an utterance may overrule the meaning of a given intonational contour.2 Problems such as these made intonologists sceptical about 1
Recent work by Herman and McGory (2002) shows that of all the ToBI tones (Silverman et al., 1992) the H∗ and L+H∗ are conceptually the most similar ones, and the main cause of disagreements between professional labellers. 2 Moreover, it is worth stressing that prosody may also be ‘meaningful’ in quite different ways, to signal communicatively relevant phenomena like the cocktail party phenomenon, turn-taking, emotional and attitudinal aspects of utterances, etc.
the prospects of assigning ‘meanings’ (in the broadest sense) to intonation contours. Cutler (1977:106): “(. . . ) the attempt to extract from [intonation contours] an element of commonality valid in all contexts must be reckoned a futile endeavour”. One of the propositions (no. 9) of the theory of intonation put forward by ’t Hart et al. (1990:110) is: “Intonation features have no intrinsic meaning.” One of the key problems seems to be that prosody often involves gradient rather than categorical differences, which is a severe complication when one nevertheless wants to assign semantic properties to these prosodic features.

Despite these methodological difficulties, semanticists are increasingly interested in incorporating intonation in semantic theories of language (e.g., Schwarzschild 1999, Steedman 2002 and Hendriks 2002). The motivation is that utterances do not occur in an intonational vacuum.³ Rather, speakers may use intonation to cue certain aspects of meaning, and listeners may use these cues during the interpretation of the speaker’s utterance. The case of negation phrases offers a good illustration of this.

³ This is obvious for spoken language, but recent psycholinguistic evidence suggests that people even use intonation when interpreting written language (see e.g., Fodor 2002).

1.2. . . . and Negation

Negation phrases in natural language are usually represented semantically by a logical negation. But consider the following examples from Horn (1985:132), who uses small caps to indicate pitch accents:

(69) a. Some men aren’t chauvinists — all men are chauvinists.
     b. I didn’t manage to trap two mongeese — I managed to trap two mongooses.

The negation phrases in the first part of these utterances do not negate part of the proposition expressed, but respectively a conversational implicature and an instance of inflectional morphology. It is not obvious how negation should be expressed logically for examples such as these. According to Horn (1985:125), the classical examples of presupposition denial are manifestations of the same problem as that exemplified by (69). Consider the standard example, originally due to Russell (1905).

(70) The present king of France is not bald.
In (70) the negation phrase can either deny the proposition that the present king of France is bald or the presupposition that a king of France exists. The difference becomes especially clear when we take a question-answer perspective.

(71) Q:  Is the present king of France bald?
     A1: No, he isn’t.
     A2: No, the king of France isn’t bald — there isn’t any king of France.
Horn would call the negation in A1 a descriptive negation and the one in A2 (like those in (69)) a metalinguistic negation. The problem is that we have two different uses of negation, which cannot both be treated in the same logical way. Various solutions have been proposed. One is to assume that natural language negation is semantically ambiguous. Russell, for instance, maintains that (70) is ambiguous between a narrow scope and a wide scope reading for the negation. Others, following the seminal work of Frege and Strawson, have argued that presupposition failure leads to truth-value gaps (propositions being neither true nor false) and that there should really be two different logical negations (e.g., ¬ for the traditional negation, and ∼ for the presupposition-cancelling negation):⁴

     ϕ    ¬ϕ   ∼ϕ
     T    F    F
     F    T    T
     N    N    T

⁴ See Beaver and Krahmer (2001) for an overview. They also present an alternative to postulating ambiguities for logical connectives, which uses Bochvar’s (1939) assertion operator A as a presupposition wipe-out device; whatever is presupposed by a logical formula ϕ, Aϕ presupposes nothing.
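To make the behaviour of the two negations concrete, here is a minimal sketch (not from the chapter) of the three-valued table above in Python, with the strings "T", "F" and "N" standing for true, false and undefined:

T, F, N = "T", "F", "N"

def neg(phi):
    """Ordinary (descriptive) negation: preserves undefinedness (N stays N)."""
    return {T: F, F: T, N: N}[phi]

def pneg(phi):
    """Presupposition-cancelling negation: maps the undefined value N to T."""
    return {T: F, F: T, N: T}[phi]

# 'The present king of France is bald' comes out undefined (N) because the
# presupposition that a king of France exists fails:
phi = N
print(neg(phi), pneg(phi))   # prints: N T

Only the second negation turns a presupposition failure into a truth, which corresponds to the A2 reading of (71).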
The main problem for accounts which assume the existence of a semantic ambiguity for negation is that it is difficult to show the actual existence of such an ambiguity. For instance, while it is true that there are many languages in which negation may be ambiguous, this ambiguity apparently does not involve two negation phrases which can be represented in the logical form as ¬ and ∼ respectively. This has led other researchers (such as Atlas 1977 and Gazdar 1979) to deny the existence of an ambiguity for negation. However, this is “wielding
Occam’s razor like a samurai sword” (Horn 1985:126), in that it denies the existence of the two distinct uses of negation. According to Horn, negation is pragmatically ambiguous. It has a built-in duality of use; negation may be used in either a descriptive or a metalinguistic way. The difference in usage can be illustrated best when the two types of negation are studied in larger interactions: metalinguistic negation naturally occurs in response to utterances by other dialogue partners earlier in the same discourse context, announcing a speaker’s unwillingness to accept another’s assertion of a particular state of affairs. Thus, following Horn (1985:136), a metalinguistic negation expresses something like “I object to u”, where u is crucially a linguistic utterance rather than an abstract proposition.

The problem, then, is how to distinguish the different uses of negation. Horn mentions two tests. The first is based on the ability of descriptive (but not metalinguistic) negation to incorporate prefixally. These examples are from Horn (1985:140):

(72) a. The king of France is {not happy / *unhappy} — there isn’t any king of France.
     b. {It isn’t possible / *It’s impossible} for you to leave now — it’s necessary.

The second test (mentioned in an appendix of Horn 1985) is based on the observation that metalinguistic (but not descriptive) negation can occur in “not X but Y” contexts. Consider the following example (due to Fillmore, cited by Horn 1985:170):

(73) John wasn’t born in Boston, but in Philadelphia.

Neither test is foolproof, and both have a somewhat limited applicability, so a more general criterion would be useful. It has been argued that intonation could be used to distinguish the two uses of negation. Some claim that the negative sentence in a metalinguistic negation involves “contrastive intonation with a final rise” (this is what Liberman and Sag 1974 dubbed the ‘contradiction contour’⁵), while the continuation contains a ‘rectification’ which is prosodically marked.

⁵ Note that Cutler (1977) has argued that this particular contour can also have very different ‘meanings’.

The goal of the current study is to find empirical support for the difference between descriptive and metalinguistic negations. In particular, we present evidence for the existence of a set of prosodic correlates for
these two different usages. We focus both on what speakers do (using production data) and on what hearers do (using perception data). The perception experiment explicitly trades on the assumption that meaning distinctions are only communicatively relevant if they can reliably and consistently be ‘interpreted’. In the following, we will first give more information on the data we used and on the way we operationalized descriptive and metalinguistic negations. Then, we present results of a speaker-oriented and listener-oriented analysis of the negations. We end with a general discussion and conclusion.
2. Data and definitions
How do speakers produce metalinguistic negations, and is there a difference with ‘ordinary’, descriptive negations? To address this question, we have conducted a corpus study of a set of human-machine interactions that contains utterances that can be operationalized as instances of a descriptive or a metalinguistic usage of negation. Our starting assumption is that the discussion about the two types of negation is very much in line with claims put forward in current models about dialogue behaviour. One central claim in many of these models is that dialogue partners are continuously monitoring the flow of the interaction, and notify each other whenever something is wrong. This is reflected in the following rule for dialogue behaviour from Groenendijk et al. (1996):

Rule H2 If a sentence is uttered which is incompatible with a participant’s information state, then she does not update with it, but signals the incompatibility by uttering a sentence that contradicts the sentence uttered.
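As a toy illustration of the behaviour Rule H2 describes (not the authors’ model; a hedged Python sketch with invented names), an agent either silently updates its information state or produces a contradicting ‘go back’ signal:

def react(information_state, utterance, compatible):
    """compatible(state, utterance) decides whether the utterance fits the state."""
    if compatible(information_state, utterance):
        information_state.add(utterance)       # unproblematic: update and go on
        return "okay"
    return "No, not " + utterance + "."        # incompatible: contradict (go back)

state = {"destination: Reuver"}
print(react(state, "destination: Swalmen", lambda s, u: u in s))
# prints: No, not destination: Swalmen.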
We take it that metalinguistic negations are examples of ‘sentences contradicting the previously uttered sentence’. They function as a negative, ‘go back’ signal, indicating that there is an apparent communication problem: a discrepancy between the last utterance of the addressee and the information state of the current speaker. If there are no communication problems, the speaker sends a positive, ‘go on’ signal. Our hypothesis is that speakers use more prosodically marked features in the case of ‘go back’ signals (indicating a communication problem) than in the case of ‘go on’ signals. The intuition is that it is more important for an addressee to pick up a ‘go back’ signal than it is to pick up a ‘go on’ signal (see also Clark and Schaefer 1989). If a ‘go on’ signal is missed, this does not hamper the communication; it can continue anyway. The expectation that go-back signals are provided with prominent
prosodic features is in line with Horn’s claim that metalinguistic forms of negation are ‘marked’. To test the hypothesis a corpus of human-machine dialogues was used.⁶ This corpus consists of 120 dialogues with two speaker-independent Dutch spoken dialogue systems which provide train time table information (see Weegels 2002). In a series of questions, both systems prompt the user for unknown slots, such as departure station, arrival station, date, etc. Twenty subjects were asked to query both systems via telephone on a number of train journeys. They had to perform three simple travel queries on each system (in total six tasks).

⁶ The current discussion of negation is part of a wider research programme to study communication problems in human-machine conversation (see e.g., Krahmer et al., 2002). Given the current state of the art in speech technology, spoken communication with computers is still error-prone. Moreover, computers find it difficult to monitor the ongoing dialogue. If they were able to distinguish descriptive negations (which do not signal communication problems) from metalinguistic negations (which do), this would be helpful from an error-handling point of view.

In the corpus used in this study, subjects may use disconfirmations in response to two kinds of questions, of which (74.a) and (74.b) are representative examples.

(74) a. So you want to go from Eindhoven to Swalmen?
     b. Do you want me to repeat the connection?

Both (74.a) and (74.b) are yes/no questions and to both “no” is a perfectly natural answer. However, the two questions serve a rather different goal, and consequently the corresponding negations have a rather different function. Question (74.a) is an (explicit) attempt of the system to verify whether its current assumptions (about the departure and arrival station) are compatible with the intentions of the subject.⁷

⁷ Due to the imperfections of automatic speech recognition technology, current dialogue systems are in constant need of verification.

If this is not the case, the subject will signal this (in line with rule H2 above) using a metalinguistic negation, thereby indicating that at least one of the system’s assumptions is incorrect:

(75) No, not to Swalmen but to Reuver.

(Compare example (73) above.) Question (74.b), on the other hand, is not an attempt of the system to verify its assumptions, and hence it cannot represent incorrect system assumptions. A subsequent negative
answer from a subject thus serves as an ‘ordinary’, descriptive negation. A typical example would be:⁸

(76) No, that is not necessary.

So, the two kinds of system yes/no questions allow for an unambiguous distinction between descriptive and metalinguistic negation. The respective disconfirmations, being lexically similar but functionally different, constitute minimal pairs, allowing us to check whether the various occurrences of this kind of utterance vary prosodically as a function of their context. In this way, they form ideal, naturally occurring speech materials for investigating the role of prosody, which can be analysed both from a speaker and listener perspective, as will be illustrated in the following sections.
3. Experimental analyses
3.1. Speaker’s Perspective: Production Experiment

Method

To study the speaker’s perspective we randomly selected 109 negative answers to yes/no questions from the 120 dialogues. If a negative answer follows a verification question (such as (74.a)), the subject’s utterance indicates that there are communication problems. This is the case for 68 of the 109 negative answers (62%). If a negative answer follows a standard yes/no question (like (74.b)), there are no communication problems (notated as no problems). These are the remaining 41 cases (38%). Regarding their structure, the subjects’ negations were divided into three categories: (1) responses only consisting of a single explicit disconfirmation marker “no” (“nee”); (2) responses consisting of an explicit disconfirmation marker followed by other words (‘no+stuff’ in the terminology of Hockey et al., 1997); (3) responses containing no explicit disconfirmation marker (‘stuff’).⁹

⁸ The original Dutch utterance is Nee, dat is niet nodig, and — significantly — the negation phrase could also have been incorporated prefixally (Nee, dat is onnodig). Compare example (72) above.
⁹ As we shall see, metalinguistic negations may occur which do not contain an explicit negation. An example would be the second turn in the following exchange: A: Thomas ate some cookies. B: He ate all cookies!
Table I. Numbers of negative answer types following an unproblematic system utterance (no problems) and following those containing one or more problems (problems).

Type        no problems   problems   Total
no               18           11       29
stuff             0           24       24
no+stuff         23           33       56
Total            41           68      109
The subjects’ responses to the yes/no questions were analysed in terms of the following features: (1) presence or absence of a high boundary tone following “no”; (2) duration (in ms) of “no”; (3) duration (in ms) of pause after “no” before stuff; (4) duration (in ms) of pause between the system’s prompt and the user’s response; (5) F0 max¹⁰ (in Hz) at the energy peak of the major pitch accent in stuff; (6) number of words in stuff.

¹⁰ F0 stands for fundamental frequency; changes in the fundamental frequency are the most commonly used approximation of perceived pitch variations.

Results

Table I gives the distribution of different types of negation following either an unproblematic system utterance or one which contains one or more problems. A χ² test reveals that these numbers significantly differ from chance level (p < 0.001). First, this table shows that the minimal response, a single no, is in the majority of the cases used when there are no communication problems. Second, single stuff responses are exclusively reserved for responses following a system utterance with one or more problems. The majority of the responses to yes/no questions in our data, however, is of the no+stuff type, which may serve either as a descriptive or as a metalinguistic negation. The lexical material in the stuff is quite different for the two signals: for the positive cases, the subsequent words are mostly some polite phrases (“thank you”, “that’s right”); for the metalinguistic cases, the stuff usually is an attempt to correct the information which caused the problems (i.e., what Horn called the ‘rectification’).

Table II displays the presence or absence of high boundary tones (H% in the terminology of Pierrehumbert 1980) on the word “no” (for the single no and no+stuff cases). A χ² test reveals that this distribution again differs significantly from chance (p < 0.001).
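For readers who want to reproduce the test on Table I, here is a hedged sketch of the χ² computation in Python (an illustrative re-computation from the published counts using scipy, not the authors’ analysis script):

from scipy.stats import chi2_contingency

# rows: no, stuff, no+stuff; columns: no problems, problems (counts from Table I)
table_I = [[18, 11],
           [ 0, 24],
           [23, 33]]

chi2, p, dof, expected = chi2_contingency(table_I)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.2g}")   # p comes out well below 0.001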
Table II. Presence or absence of high boundary tones following occurrences of “no” (single no and no+stuff, 29 and 56 cases respectively) for positive and negative cues.

High boundary tone   no problems   problems   Total
Absent                    32            7        39
Present                    9           37        46
Total                     41           44        85
Table III. Average values for various features: duration of “no” (for all occurrences of “no”: single no and no+stuff), delay between end of system utterance and beginning of user’s disconfirmation (all cases), pause between “no” and stuff (for no+stuff cases), F0 max in stuff and number of words in stuff (both for no+stuff and stuff).

Feature                       no problems   problems
Duration of “no” (ms)**            226          343
Preceding delay (ms)**             516          953
Following pause (ms)*               94          311
F0 max in stuff (Hz)*              175          216
Words in stuff**                  2.61         5.42

** p < 0.001, * p < 0.05
In the case of problems, the “no” is generally provided with a question-like H% boundary tone, which is absent when “no” follows an unproblematic system utterance. The results for the continuous prosodic features of interest are given in Table III. Taking the utterances of all subjects together, a t-test reveals a significant difference for each of these features. The trend is the same in all cases: corrective, metalinguistic negations are comparatively marked. First, the word “no” —when it occurs— is longer. Second, there is a longer delay after a problematic system prompt before subjects respond. Third, in the no+stuff utterances, the interval between “no” and the remainder of the utterance is longer. Fourth, the stuff part of the answer usually contains a high-pitched accent to mark corrected information, whereas in the unproblematic case the stuff is usually prosodically unmarked. Finally, the stuff part tends to be longer in number of words. In sum: there are clear prosodic differences between metalinguistic and descriptive negations.
3.2. Hearer’s Perspective: Perception Experiment

Method

It seems a reasonable hypothesis that when speakers systematically dress up their utterances with certain features, hearers will be able to attach communicative relevance to the presence or absence of these features. To test if this is indeed the case for the acoustic properties of utterances of “no” described in Section 3.1, a perception experiment was carried out. For this experiment we used 40 “no”s, all taken from no+stuff disconfirmations. We opted for no+stuff disconfirmations since these are the most frequent. In addition, they are equally likely to be used following problematic and unproblematic system utterances from a distributional perspective (see Table I), and are thus least biased in terms of their function as positive or negative cues. For the perception study, we only used the “no”-part of these utterances, given that the stuff-part would be too informative about their function (compare answers A1 and A2 in (71)). Of the 40 “no”s, 20 functioned as a descriptive negation and 20 as a metalinguistic negation.

Subjects of the perception experiment were 25 native speakers of Dutch. They were presented with the 40 stimuli, each time in a different random order to compensate for any potential learning effects. They heard each stimulus only once. The experiment was self-paced and no feedback was given on previous choices. In an individual forced-choice task, the subjects were instructed to judge for each “no” they heard whether the speaker signalled a communication problem or not. They were not given any hints as to what cues they should focus on. The subjects were first presented with four “exercise” stimuli to make them aware of the experimental platform and the type of stimuli.

It is worth stressing that the choice to use only “no”s extracted from no+stuff answers implies that not all the acoustic features which speakers employ (see above) survive in the current perceptual analysis. In particular, we lose the features delay (time between end of prompt and start of subject’s answer) and pause (time between end of “no” and beginning of stuff), as well as any potential cues in the stuff part (e.g., number of words, narrow-focused pitch accents).

Results

Table IV summarizes the results of the perception experiment. For each stimulus, a χ² test was used to check whether there was a significant preference for perceiving an utterance as signalling no problem or as signalling a problem.
Table IV. Perceived classification of positive and negative signals.

              Perceived as no problems   No significant difference   Perceived as problems   Total
no problems              17                          3                         0               20
problems                  1                          4                        15               20
Total                    18                          7                        15               40
Of the descriptive negations, 17 out of 20 were classified by a significant number of subjects as cases in which the speaker did not signal a problem. The remaining three cases were in the expected direction, though not significant. Of the metalinguistic negations, 15 out of 20 cases were classified correctly as instances of “no” signalling problems. Interestingly, one metalinguistic negation was significantly misclassified as a descriptive negation. A post-hoc acoustic analysis of this “no” revealed that it shared its primary characteristics with ordinary descriptive negations. In particular: the “no” was relatively short, and lacked a high boundary tone. Table IV clearly shows that subjects are good at correctly classifying instances of “no”, extracted from no+stuff utterances, as descriptive or metalinguistic negations.
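The per-stimulus test can be made concrete with a small sketch; the exact procedure is not spelled out in the chapter, so the assumption here is a one-sample χ² test of the 25 listener judgments against a 50/50 split (using scipy):

from scipy.stats import chisquare

def significant_preference(n_problem, n_total=25, alpha=0.05):
    """Do n_problem 'problem' judgments out of n_total deviate from chance?"""
    observed = [n_problem, n_total - n_problem]
    chi2, p = chisquare(observed, [n_total / 2, n_total / 2])
    return p < alpha

print(significant_preference(20))   # True: a 20/5 split is a reliable preference
print(significant_preference(14))   # False: a 14/11 split is not

Under this reading, a stimulus ends up in the middle column of Table IV when neither interpretation reaches significance.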
4. Discussion: Meaning, Intonation and Optimality
We have studied the differences between two kinds of negation, descriptive and metalinguistic, and did so from two perspectives. The production perspective showed that there are a number of significant prosodic differences between the two. Metalinguistic negations tend to have high boundary tones (in line with Liberman and Sag 1974), the negation phrase itself is relatively long, is preceded and followed by longer pauses and the continuation has a relatively high pitch peak. This pitch peak is placed on the corrected item (the rectification) and has a narrow focus. A typical example is the following, in which the speaker indicates she doesn’t want to go to Amsterdam: (77) No, to Opdam. It is interesting to observe that the pitch accent occurs on the syllable ‘op-’, while normally it would occur on ‘-dam’.
Descriptive negations, on the other hand, are usually not realized with a high boundary tone, are preceded and followed by shorter pauses, and have a relatively flat continuation (in the no+stuff cases). In addition, it is interesting to observe that a single negation ‘no’ is most likely to be descriptive, while single stuff is exclusively associated with the metalinguistic case. Thus, it appears that speakers produce metalinguistic and descriptive negations in prosodically different ways. The perception experiment confirmed this, in the sense that hearers were quite capable of predicting whether the word ‘no’ signalled a problem or not (i.e., whether it was used metalinguistically or descriptively). They could do this with utterances which display only a small subset of the relevant prosodic features, and without contextual or lexical information. These findings are interesting from a semantic point of view: they give people who assume that negation is ambiguous (be it semantic or pragmatic) an empirical argument for postulating such an ambiguity. It has been noted that other constructions (such as conditionals, questions, etc.) can also be used metalinguistically. Consider the following examples (from Beaver, 1997 and Horn, 1985, respectively):

(78) a. If Mary knows that Bill is happy, then I’m a Dutchman – she merely believes it.
     b. You did what with Sally and Billy?

We conjecture that some of the intonative properties which distinguish metalinguistic from descriptive negation can also be found in other metalinguistic phenomena.

The relation between meaning and intonation is a highly complex one. We have argued that to investigate this relation an experimental approach is called for, in particular one in which both the speaker’s perspective and the hearer’s perspective are taken into account. One obvious methodological advantage of doing experiments with different speakers and listeners is that one gains insight into inter- and intrasubject agreement, both in terms of production and perception, and that it provides a handle on how to deal with the intrinsic variability between subjects regarding intonational matters. In addition, it is instrumental in determining what is essential (that which many subjects agree on) and what is peripheral (those features regarding which there is little consensus). It is difficult to imagine how this distinction can be made on the basis of researchers’ intuitions alone. It should be pointed out that this methodological approach combining production and perception studies is very general. For instance, we have also applied it to
the study of differences in accent types (Krahmer and Swerts, 2001) and in a cross-linguistic study of focus (Swerts et al., 2002). The motivation for looking both at speakers and listeners is that it does justice to our belief that a feature can only be communicatively relevant if it is not only encoded in the speech signal by a speaker, but can also be interpreted by a listener. There is an interesting parallel with Optimality Theory (ot) here; ot syntacticians tend to focus on the speaker perspective, while ot semanticists (such as Hendriks and de Hoop, 2001) tend to focus on the hearer. Recently, there has been an increased interest in combining the two perspectives (see e.g., Beaver, 2004 for such a plea). In fact, we believe that an ot-like framework is eminently suitable to model the intricate relationship between intonation and meaning.

First of all, it is clear that whatever meaning intonational contours may have, they can easily be ‘overruled’ by features from other linguistic levels or by the situational context. This has, for instance, been illustrated by Geluykens (1987), who showed that the classification of intonation contours as statements or questions is influenced to a large extent by the lexical content of the utterances on which they occur.

(79) a. You feel ill.
     b. I feel ill.

He tested this perceptually using sentences with a declarative syntax, finding that high-ending contours are more likely to trigger an interrogative interpretation if they occur on question-prone utterances like (79.a) than on statement-prone utterances like (79.b). This difference can be explained by the observation that it is easier to make statements about one’s own internal state than about those of other people.¹¹

¹¹ See e.g., Beun (1990) for an alternative analysis in terms of shared knowledge and Safarova and Swerts (2004) for further discussion.

Or consider an utterance like

(80) You fucking idiot.

spoken to the driver of a car that just hit the speaker’s car. Whatever contour the speaker would put on that utterance, it will be difficult to seriously affect its intended basic meaning. In this paper, we have seen that user responses to communication problems contain prosodic but also non-prosodic cues, like the lexical material in the stuff part of the no+stuff utterances. These cues may even conflict, as shown by the
particular example of the single metalinguistic negation consistently classified as a descriptive negation (see Table IV); even though the prosodic features of “no” suggested that there were no communication problems, this is overruled by the lexical material in the stuff (“not to Amsterdam, to Opdam!”).12 Note that for an ot approach which has something interesting to say about the relation between intonation and meaning, it is essential to integrate different levels of linguistic analysis into a single tableau. A plea for such an integrated approach can also be found, albeit for different reasons, in Beaver (2004), who notes that one of the main advantages of ot is that it provides us with a new way of looking at the syntax-semantics-pragmatics interface and enables us to make the interconnections between these components explicit using relational constraints. A final interesting aspect of ot is that it offers a handle on intonational variation across languages. The idea is that many of the ot constraints are universal, although the ranking may differ across languages. We conjecture that the usage of marked prosodic features to signal communication problems is a universal phenomenon. Similar effects have been found in quite different types of human-human and human-machine interactions, collected for Japanese (Shimojima et al., 2002) and American English (Swerts et al., 2000).
¹² An open research question related to this would be how non-categorical features can be integrated in an ot approach. In principle ot constraints are universal restrictions (e.g., “all feet are right-aligned with the right edge of the word”, McCarthy and Prince 1993). However, as shown here, some aspects are of a scalar nature, such as gradient differences in duration and pauses. Moreover, the more features apply, the stronger the semantic effect. Our view is that it may be possible to connect scalar values to differences in constraints (see also Boersma 1998 and Boersma and Hayes 2001). For instance, prominence judgments seem to be gradient in that some differences in pitch range and loudness represent differences in cue strength, which influences the perception of accents, though it is unclear whether the gradient information is truly continuous or whether those types of continua can be divided into reliable categories which are invariant across speakers.

References
Atlas, J.: 1977, Negation, Ambiguity and Presupposition, Linguistics and Philosophy 1: 321–336.
Beaver, D.: 1997, Presupposition, in: Handbook of Logic and Language, J. van Benthem and A. ter Meulen (eds.), Amsterdam: Elsevier Science Publishers, pp. 939–1008.
Beaver, D. and E. Krahmer: 2001, A Partial Account of Presupposition Projection, Journal of Logic, Language and Information 10: 147–182.
Beaver, D.: 2004, The Optimization of Discourse Anaphora, Linguistics and Philosophy 27(1): 3–56.
Beun, R.J.: 1990, The Recognition of Dutch Declarative Questions, Journal of Pragmatics 14: 39–56.
Bochvar, D.: 1939, Ob odnom trehznachom iscislenii i ego primeneii k analizu paradoksov klassicskogo rassirennogo funkcional'nogo iscislenija, Matematiciskij sbornik 4. English translation (1981): On a Three-valued Logical Calculus and Its Applications to the Analysis of the Paradoxes of the Classical Extended Functional Calculus, History and Philosophy of Logic 2: 87–112.
Boersma, P.: 1998, Functional Phonology, doctoral dissertation, University of Amsterdam.
Boersma, P. and B. Hayes: 2001, Empirical Tests of the Gradual Learning Algorithm, Linguistic Inquiry 32: 45–86.
Bolinger, D.: 1986, Intonation and its Parts, London: Edward Arnold.
Bos, J.: 2002, Underspecification and Resolution in Discourse Semantics, Ph.D. thesis, Saarland University, Saarbrücken.
Clark, H. and E. Schaefer: 1989, Contributing to Discourse, Cognitive Science 13: 259–294.
Cutler, A.: 1977, The Context-dependence of “Intonational Meanings”, Papers from the 13th Regional Meeting of the Chicago Linguistic Society, pp. 104–115.
Fodor, J.D.: 2002, Prosodic Disambiguation in Silent Reading, in: Proceedings of NELS 32, M. Hirotani (ed.), Amherst, MA, pp. 113–132.
Gazdar, G.: 1979, Pragmatics, New York: Academic Press.
Geluykens, R.: 1987, Intonation and Speech Act Type, Journal of Pragmatics 11: 483–494.
Groenendijk, J., M. Stokhof and F. Veltman: 1996, Coreference and Modality in the Context of Multi-speaker Discourse, in: Context Dependence in the Analysis of Linguistic Meaning, H. Kamp and B. Partee (eds.), Stuttgart: IMS, pp. 195–216.
’t Hart, J., R. Collier and A. Cohen: 1990, A Perceptual Study of Intonation: An Experimental-phonetic Approach to Speech Melody, Cambridge: Cambridge University Press.
Heim, I.: 1982, The Semantics of Definite and Indefinite Noun Phrases, doctoral dissertation, University of Massachusetts, Amherst.
Hendriks, H.: 2002, Information Packaging: From Cards to Boxes, in: Information Sharing: Reference and Presupposition in Language Generation and Interpretation, K. van Deemter and R. Kibble (eds.), Stanford: CSLI Publications, pp. 1–34.
Hendriks, P. and H. de Hoop: 2001, Optimality Theoretic Semantics, Linguistics and Philosophy 24: 1–32.
Herman, R. and J. McGory: 2002, The Conceptual Similarity of Intonational Tones and its Effects on Intertranscriber Reliability, Language and Speech 45(1): 1–36.
Hockey, B., D. Rossen-Knill, B. Spejewski, M. Stone and S. Isard: 1997, Can You Predict Answers to Y/N Questions? Yes, No and Stuff, in: Proceedings of Eurospeech, Rhodes, Greece, pp. 2267–2270.
Horn, L.: 1985, Metalinguistic Negation and Pragmatic Ambiguity, Language 61: 121–174.
Krahmer, E. and M. Swerts: 2001, On the Alleged Existence of Contrastive Accents, Speech Communication 34: 391–405.
Krahmer, E., M. Swerts, M. Theune and M. Weegels: 2002, The Dual of Denial: Two Uses of Disconfirmations in Dialogue and Their Prosodic Correlates, Speech Communication 36(1–2): 133–145.
Ladd, D.: 1983, Phonological Features of Intonational Meaning, Language 59: 721–759.
Liberman, M. and I. Sag: 1974, Prosodic Form and Discourse Function, Papers from the 10th Regional Meeting of the Chicago Linguistic Society, pp. 402–415.
McCarthy, J. and A. Prince: 1993, Generalized Alignment, in: Yearbook of Morphology, G. Booij and J. van Marle (eds.), Dordrecht: Kluwer Academic Publishers, pp. 79–153.
Pierrehumbert, J.: 1980, The Phonology and Phonetics of English Intonation, doctoral dissertation, MIT.
Pierrehumbert, J. and J. Hirschberg: 1990, The Meaning of Intonational Contours in the Interpretation of Discourse, in: Intentions in Communication, P. Cohen, J. Morgan and M. Pollack (eds.), Cambridge, MA: MIT Press, pp. 342–365.
Russell, B.: 1905, On Denoting, Mind 14: 479–493.
Safarova, M. and M. Swerts: 2004, On Recognition of Declarative Questions in English, in: Proceedings of Speech Prosody 2004, Nara, Japan, pp. 313–316.
Schwarzschild, R.: 1999, GIVENness, Avoid F and other Constraints on the Placement of Focus, Natural Language Semantics 7(2): 141–177.
Shimojima, A., Y. Katagiri, H. Koiso and M. Swerts: 2002, The Informational and Dialogue-coordinating Functions of Prosodic Features of Japanese Echoic Responses, Speech Communication 36(1–2): 113–132.
Silverman, K., M. Beckman, J. Pitrelli, M. Ostendorf, C. Wightman, P. Price, J. Pierrehumbert and J. Hirschberg: 1992, ToBI: A Standard for Labelling English Prosody, in: Proceedings of the Second International Conference on Spoken Language Processing (ICSLP), Banff, Canada, vol. II, pp. 867–870.
Steedman, M.: 2000, Information Structure and the Syntax-Phonology Interface, Linguistic Inquiry 31(4): 649–689.
Swerts, M., D. Litman and J. Hirschberg: 2000, Corrections in Spoken Dialogue Systems, in: Proceedings of the Sixth International Conference on Spoken Language Processing, Beijing, China, vol. II, pp. 615–619.
Swerts, M., E. Krahmer and C. Avesani: 2002, Prosodic Marking of Information Status in Italian and Dutch: A Comparative Analysis, Journal of Phonetics 30(4): 629–654.
Vallduví, E.: 1992, The Informational Component, New York: Garland.
Weegels, M.: 2002, Users’ Conceptions of Voice-Operated Information Services, Journal of Speech Technology 3(2): 75–82.
MYROSLAVA DZIKOVSKA, MARY SWIFT AND JAMES ALLEN
CUSTOMIZING MEANING: BUILDING DOMAIN-SPECIFIC SEMANTIC REPRESENTATIONS FROM A GENERIC LEXICON
1. Introduction
Most practical dialogue systems are developed for specific domains to maximize performance efficiency. Back-end system components use a knowledge representation tailored to application needs, and the language input must be converted into that representation. This is traditionally achieved by linking the lexical definitions directly to the concepts in the domain-specific ontology. This linking is also commonly used to bring in domain-specific selectional restrictions to increase parsing efficiency. In this approach, adapting the system to a new domain requires relinking the lexicon to the new ontology. We propose an alternative method that is an easy, transparent way to achieve domain specificity from a broad-coverage deep parser. In our approach, we maintain two ontologies: domain-independent for the parser and domain-specific for the knowledge representation, and we define a set of mappings between domain-specific knowledge sources and the semantic representations generated by the parser. Our method allows us to easily obtain domain-specific semantic representations without modifying the lexicon or grammar. We also use the mappings to specialize the lexicon to the domain, resulting in substantial improvement in parsing speed and accuracy. In this chapter, we describe our customization method and illustrate how it facilitates our approach to semantic type coercion by combining lexical representations with domain-specific constraints on interpretation. The customization method described here was developed in the process of adapting the TRIPS dialogue system (Allen et al., 2001) to several different domains, including a transportation routing system (Allen et al., 1996) and a medication scheduling system (Ferguson et al., 2002). We assume a dialogue system architecture (Allen et al., 2000) that includes a speech module, a parser, an interpretation manager (responsible for contextual processing and dialogue management), and
a back-end application responsible for the general problem-solving behavior of the system. The system architecture is shown in Figure 19. Our philosophy is to keep all components as domain-independent as possible for easy portability, and to develop tools to facilitate component customization to different domains. In particular, our goal is to develop a parser and grammar that can handle language input from different application domains, but still retain speed and efficiency, and produce output suitable for domain-specific reasoning done by the behavioral agent and domain reasoners.
Figure 19. The architecture of the TRIPS dialogue system. [Block diagram omitted; its components include Speech Recognition, GUI Events, Parser, Interpretation Manager, Reference Manager / Discourse Context, Task Manager, Behavioral Agent, Planner, Scheduler, Monitors, Reporters, Generation Manager, Response Planner, Speech Synthesis, and Graphical Displays.]
2. Background
The most common approach for building robust and customizable parsers for different domains is the use of semantic grammars. In a semantic grammar, the lexicon entries are linked to frames in a domain-specific representation. During parsing, lexical items are matched with frame slots they can fill, and unfilled slots can be queried from context. Probabilities of matching the words with frame slots can be trained on a corpus to achieve the best accuracy. For example, the TINA
parser (Seneff, 1992) in the multi-domain dialogue system GALAXY (Goddeau et al., 1994) provides a mechanism for semi-automatically learning a probabilistic model from a training corpus. Similarly, parsers for information extraction use probabilistic models trained on text corpora to learn subcategorization frames and selectional restrictions in a given domain (see e.g. Utsuro and Matsumoto, 1997). Semantic grammars offer a nice abstraction across a range of information retrieval applications, and robust language interpretation for small domains. They work best for tasks that can be encoded by a small set of variables, with dialogue state encoded as a set of contexts. In a conversational interface where intention recognition is critical, and which requires plan- and agent-based dialogue models, more detailed semantic representations are required, such as those that can be obtained with a broad-coverage deep grammar (Allen et al., 2001). Linguistically motivated deep parsers have been used in dialogue systems, most notably the LINGO grammar (Copestake and Flickinger, 2000) used for the Verbmobil project (Wahlster, 2000). However, such parsers face efficiency and accuracy problems caused by ambiguity. Syntactic constraints alone are not sufficient to disambiguate word meanings or structure, and when methods for handling speech fragments, such as bottom-up chart parsing, error correction rules and lattice-based inputs are added, they seriously aggravate already existing efficiency and accuracy problems of the current parsers. When a large enough training corpus is available, speed and accuracy can be improved by adjusting probabilities of grammar rules to best reflect the corpus. However, collecting training corpora is considerably more difficult and expensive for speech applications than for text applications, since it requires recording and transcribing live speech. When suitable training corpora are not available, a common solution is to use selectional restrictions to limit the search space or to filter out results that could not be disambiguated by the parser. This approach is employed, for example, in the GEMINI system (Dowding et al., 1993). Selectional restrictions encoded in the system lexicon considerably speed up parsing and improve disambiguation accuracy in parsing of in-domain sentences. However, another issue that needs to be addressed in porting dialogue systems between domains is linking the lexical entries to the domain semantic representation. TINA lexical entries specify the frames to which the words are linked, and GALAXY requires that all system components use a shared ontology. Therefore, for new domains the system lexicon needs to be re-linked to the new ontology. Similarly,
GEMINI encodes the selectional restrictions in the system lexicon, which needs to be changed for each new domain. The AUTOSEM system (Rosé, 2000) makes re-linking easier by separating the re-usable syntactic information from the links to the domain ontology. It uses COMLEX (Macleod et al., 1994) as a source of reusable syntactic information. The subcategorization frames in the lexicon are manually linked to the domain-specific knowledge representation. The linking is performed directly from syntactic arguments (e.g. subject, object) to the slots in a frame-like domain representation output by the parser. Rosé's approach speeds up the process of developing tutoring systems in multiple domains. Similarly, McDonald (1996) maps the output of a partial parser to the semantic representation for information extraction to improve parsing speed and accuracy. While AUTOSEM re-uses syntactic information across domains, it does not provide a way to re-use common semantic properties of words. In our approach, we introduce an intermediate layer of abstraction: a generic ontology for the parser (the LF Ontology) that is linked to the lexicon and preserved across domains. In this way, we preserve basic semantic features associated with lexical entries (e.g. whether a word represents an event or an object) as well as some general selectional restrictions that do not change across our domains (e.g. the verb cancel takes an action or event as an object argument). The parser uses this ontology to supply meaning representations of the input speech to the interpretation manager, which handles contextual processing and dialogue management and interfaces with the back-end application, as shown in Figure 19. The domain-specific ontology used for reasoning (the KR ontology) is localized in the back-end application. We then customize the communication between the parser/interpretation manager and the back-end application via a set of mappings between the LF and KR ontologies, as described in Section 4. Our method of separating domain-specific and domain-independent ontologies has a number of advantages. First, it allows developers to write mappings in semantic terms at a higher level of abstraction, so there is no need to address the details of the grammar and subcategorization frames such as those used in COMLEX. Developers can instead use descriptive labels for semantic arguments, such as AGENT, THEME, etc. Second, it allows developers to take advantage of the hierarchical structure of the domain-independent ontology and write mappings that cover large classes of words (see example in Section 4). Third, the mappings are used to convert the generic representation into the particular form utilized by the back-end application, either a
frame-like structure or a predicate logic representation, without changing the grammar rules, as described in (Dzikovska et al., 2002). Finally, the lexicon is specialized to the domain via the mappings, which both improves parsing speed and accuracy, and provides a transparent way to map lexical forms to domain-specific meanings.
3. Domain-independent representation
3.1. The LF Ontology

Entries in the generic lexicon are linked to the LF ontology, a domain-independent ontology for the parser. The LF ontology is kept as general as possible so it can be used across multiple domains. The LF ontology consists of a set of representations (LF types) that classify entities corresponding to (classes of) lexical items in terms of argument structure and selectional restrictions, with a hierarchical structure inspired by FrameNet (Johnson and Fillmore, 2000). Every LF type declares a set of thematic arguments with selectional restrictions. The LF ontology is used in conjunction with a unification-based grammar that covers a wide range of syntactic structures. The LF types are organized in a single-inheritance hierarchy. We implement multiple inheritance via semantic feature vectors associated with each LF type. The features correspond to basic meaning components and are based on the EuroWordNet (Vossen, 1997) feature system with some additional features we have found useful across domains. While the same distinctions can be represented in a multiple inheritance hierarchy, a feature-based representation makes it easy to implement an efficient type-matching algorithm based on Miller and Schubert (1988). More importantly, using semantic feature vectors allows us to easily augment semantic information associated with a lexical entry during the customization process and the semantic coercion operations described below. Our choice of the LF types and semantic features included in the LF ontology is linguistically motivated, using an approach similar to that of Lascarides and Copestake (1998), who encode the knowledge "appropriate as a locus for linguistically relevant generalizations" in their lexical semantic hierarchy. The semantic features are organized into five basic clusters: ft physobj, which are the objects that can be seen and felt in the real world; ft situation, which includes states and events; ft time, which are references to times; ft abstr-obj, which are other abstract objects and properties, e.g. weight; and ft proposition, which denote entities which
can be true or false, denied or confirmed, e.g. ideas, plans. Some of the features associated with physical objects, situations and times are shown in Figure 20.

FT Phys-obj
  Form: Substance (solid, liquid, gas); Object (solid-object, hole, geo-object, enclosure)
  Origin: Natural (living: human, plant, animal, creature; non-living); artifact
  Object-function: covering, comestible, etc.
  Mobility: fixed; movable (self-moving, non-self-moving)
  Information (+/-)
  Intentional (+/-)
  Container (+/-)
  Spatial abstraction: line, strip, point, region

FT Situation
  Aspect: Static (indiv-level, stage-level); Dynamic (bounded, unbounded)
  Time-span: atomic, extended
  Cause: Force (agentive, phenomenal); Stimulating; Mental
  Trajectory (+/-)

FT time
  Function: Time-of-day (clock-time, day-part); Time-of-year; frequency; time-interval; time-unit

Figure 20. Some feature set dimensions in the TRIPS domain-independent lexicon
For physical objects, we mostly kept the features defined in EuroWordNet. These are form, which differentiates solid objects from substances, origin, which differentiates natural (living and non-living)
things from artifacts, and object-function, which classifies the objects by the main function they perform in the real world. We additionally defined the feature mobility to handle the distinctions between objects that are inherently fixed, e.g. cities and oceans, and objects that can be easily moved, e.g. people and trucks. The feature spatial-abstraction is intended for handling spatial properties of objects (Dzikovska and Byron, 2000), to capture the distinctions between objects such as roads, which are best visualized as lines or strips, and cities, which can be visualized as either points on the map or larger regions with defined area and borders. We use the features aspect and time-span to represent event structure based on Moens and Steedman (1987). The cause feature can be Force, either an intentional agent or a natural phenomenon, Stimulating, set for verbs which tend to have an experiencer role, such as see and smell, and Mental, which is some internal non-agentive process, such as love or hate. We also added the trajectory feature to differentiate the situation types that can be easily modified by trajectory adverbials such as FROM-LOC and TO-LOC. For times, the function feature is designed to describe the most common functions of temporal expressions we observed in our corpora.

Our feature dimensions are generally orthogonal, but there are dependencies between some feature values. For example, if something is marked as a human being, it is also a solid object. We express these dependencies in feature inference rules, such as the one shown in Figure 21. When a feature vector is processed in the system, if the origin feature is set to (origin human), the form feature, if not already specified, will be automatically filled with (form solid-object). This mechanism is also used for specializing feature vectors to the domain as described in Section 4.2.

(origin human) → (form solid-object) (intentional +)

Figure 21. An example feature inference rule in the TRIPS lexicon
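As an illustration of how such inference rules could be applied to a feature vector, here is a hedged Python sketch; the data structures and rule set are invented for the example and are not the TRIPS implementation:

# Each rule pairs an antecedent feature-value with the features it licenses;
# cf. Figure 21 and the kr-type rules of Figure 27.
RULES = [
    (("origin", "human"), {"form": "solid-object", "intentional": "+"}),
    (("kr-type", "Person"), {"origin": "human"}),
    (("kr-type", "Medicinal-substance"), {"form": "substance"}),
]

def apply_inference(feature_vector):
    """Extend a feature vector (a dict) until no rule adds anything new."""
    changed = True
    while changed:
        changed = False
        for (feat, val), consequents in RULES:
            if feature_vector.get(feat) == val:
                for cfeat, cval in consequents.items():
                    if cfeat not in feature_vector:   # never overwrite explicit values
                        feature_vector[cfeat] = cval
                        changed = True
    return feature_vector

print(apply_inference({"kr-type": "Person"}))
# {'kr-type': 'Person', 'origin': 'human', 'form': 'solid-object', 'intentional': '+'}

Note how the rules chain: marking an entry as kr-type Person fills in (origin human), which in turn licenses (form solid-object) and (intentional +), the pattern exploited during lexicon specialization in Section 4.2.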
3.2. The Domain-Independent Lexicon

Word senses are treated as leaves of the semantic hierarchy. The following information is specified for every word sense in the lexicon:
− Syntactic features such as agreement, morphology, etc.;
− LF type;
− The semantic feature vector (mostly inherited from the LF type definition);
− The subcategorization frame and syntax-semantics mappings.

To illustrate, consider the definition for the verb take in the sense to consume substances, as in take aspirin. The LF type definition for the consume sense of take is shown in Figure 22.

(define-type LF CONSUME
  :semfeatures (Situation (aspect dynamic) (cause force))
  :arguments (AGENT (Phys-obj (form object)))
             (THEME (Phys-obj (form substance))))

(define-type LF DRUG
  :semfeatures (Phys-obj (form substance)))

Figure 22. LF type definitions for LF CONSUME and LF DRUG

It specifies a generic semantic feature vector associated with the type and selectional restrictions on its arguments. Intuitively, LF Consume defines a dynamic event in which some physical object (AGENT) consumes some substance. The lexicon entry for take (shown in Figure 23) is linked to the LF type definition by mapping the syntactic roles to the semantic argument labels in the LF type. The selectional restrictions specified in the LF type arguments are propagated into the lexicon. When trying to create a verb phrase, the parser checks that the semantic feature vector specified in the argument restriction matches the semantic feature vector associated with the noun phrase. Thus, only noun phrases marked as substances in their feature vectors (form substance) are accepted as direct objects of the verb take in the consume sense.

The parser uses the LF types to construct a general semantic representation (the base logical form) of the input language, which is a flattened and unscoped logical form using reified events (Davidson, 1967). A base logical form for take aspirin is shown in Figure 24. The representation is a conjunction of terms in the form (<specifier> <argument>*), where the specifier can be "F" for predicates produced by verbs and adjectives, a quantifier for noun phrases, and IMPRO for implicit arguments, such as subjects of imperatives. Note that in this chapter we omit tense, aspect and speech act information from our examples for simplicity.
(take
  :lf LF CONSUME*take
  :semfeatures (Situation (aspect dynamic) (cause force))
  :subject (NP (Role AGENT) (Restriction (Phys-obj (form object))))
  :object (NP (Role THEME) (Restriction (Phys-obj (form substance)))))

(aspirin
  :lf LF DRUG*aspirin
  :semfeatures (Phys-obj (form substance) (origin artifact)))

Figure 23. Lexicon entries for the consume sense of take and for aspirin
(F V51016 LF CONSUME*take :THEME V51380 :AGENT V51537)
(IMPRO V51537 :CONTEXT-REL +YOU+)
(A V51380 LF DRUG*aspirin)

Figure 24. LF representation of take aspirin
4. Constructing domain-specific representations
4.1. Customizing parser output

To produce domain-specific KR representations from the base logical form, we developed a method to customize parser output. The current system supports two knowledge representation formalisms often used by reasoners: a frame-like formalism where types have named slots, and a representation that has predicates with positional arguments. For this chapter, we assume a frame representation used by the reasoners. We use LF-to-frame transforms to convert from base logical form into a frame representation. These transforms specify the KR frame that the LF type maps to, the mappings between LF arguments and KR slots, and additional functions that can be applied to arguments during the transform process. These transforms can be simple and name the slot into which the value is placed, as in Figure 25, or more elaborate and specify an operator expression that is applied to the value, which we use, for example, for semantic type coercion, described in Section 5.
(a) (LF-to-frame-transform takemed-transform
      :pattern (LF CONSUME TAKEMED)
      :arguments (AGENT :ACTOR)
                 (THEME :MEDICATION))

    (LF-to-frame-transform drug-transform
      :pattern (LF DRUG :LF-FORM (lf-form-default MEDICINAL-SUBSTANCE)))

(b) (define-class TAKEMED
      :isa ACTION
      :slots (:ACTOR PERSON)
             (:MEDICATION MEDICINAL-SUBSTANCE))

    (define-class ASPIRIN
      :isa MEDICINAL-SUBSTANCE)

(c) (TAKEMED v123)
    (:ACTOR v123 +USER+)
    (:MEDICATION v123 v456)
    (ASPIRIN v456)

Figure 25. LF-to-frame transforms. (a) Transforms for LF CONSUME and LF DRUG types; (b) definition of the KR class TAKEMED that the transform maps into, and the class ASPIRIN, into which LF DRUG*aspirin will be mapped; (c) the KR frame that results from applying the transform in (a) to the consume event representation in Figure 24.
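To make the mapping concrete, here is a hedged Python sketch of how a transform like takemed-transform could rewrite one term of the base logical form in Figure 24 into KR frame assertions; the data structures and function names are invented for illustration and are not the TRIPS code:

# One term of the Figure 24 base logical form: specifier, variable, LF type, role arguments.
consume_term = ("F", "V51016", "LF CONSUME*take",
                {"AGENT": "V51537", "THEME": "V51380"})

takemed_transform = {
    "pattern": "LF CONSUME",                       # LF type the transform applies to
    "kr_class": "TAKEMED",
    "arguments": {"AGENT": ":ACTOR", "THEME": ":MEDICATION"},
}

def apply_transform(term, transform):
    _, var, lf_type, roles = term
    if not lf_type.startswith(transform["pattern"]):
        return []                                  # transform does not apply
    frame = [(transform["kr_class"], var)]
    for lf_role, kr_slot in transform["arguments"].items():
        if lf_role in roles:
            frame.append((kr_slot, var, roles[lf_role]))
    return frame

print(apply_transform(consume_term, takemed_transform))
# [('TAKEMED', 'V51016'), (':ACTOR', 'V51016', 'V51537'), (':MEDICATION', 'V51016', 'V51380')]

Up to variable renaming, this is the frame shown in Figure 25(c).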
The Interpretation Manager takes the parser’s base logical form, determines the most specific transform consistent with it, and uses the transform to convert the base representation into the domain knowledge representation. To illustrate, consider the base logical form for take aspirin in Figure 24. The applicable transform is takemed-transform (Figure 25a), and when the Interpretation Manager applies it, the frame shown in Figure 25c is the result. The details of the process and the way transforms are used to adapt the base logical form to different possible formalisms used by reasoners are described by Dzikovska et al. (2002). Note that this single transform covers a class of words. For example, the same transform also covers have used in the consume sense, as in have an aspirin every day. Transforms can also utilize the lexical form of a word. For example, medication names are all grouped as leaves under the LF Drug type in our ontology, but the transform shown in Figure 25(a) for medicines uses the lexical form of the item transformed to determine the correct
KR class name for the mapping. In the figure, the :LF-FORM keyword indicates that when an entity of type LF DRUG is transformed, the lexical form of the item will be taken as the class name. For example, when LF DRUG*aspirin undergoes this transform, it will be converted to an instance of KR class ASPIRIN. If a class with the corresponding name does not exist, then the default class supplied in the :lf-form-default argument will be used.

4.2. Lexicon Specialization

We use the transforms described above in a post-processing stage to customize the generic parser output for the reasoners. We also use them in a pre-processing stage to specialize the lexicon, which speeds up parsing and improves semantic disambiguation accuracy by integrating the domain-specific semantic information into the lexicon and grammar. We pre-process every entry in the lexicon by determining all possible transforms that apply to its LF type. For each transform, we create a new sense definition identical to the existing generic definition plus a new feature kr-type in its semantic vector. The value of kr-type is the KR ontology class that results from applying this transform to the entry. Thus, we obtain a (possibly larger) set of entries which specify the KR class to which they belong. We then propagate type information into the syntactic arguments, making tighter selectional restrictions in the lexicon. This allows us to control the parser search space better and obtain greater parsing speed and accuracy.

When specializing the lexicon to the medical domain, given the definition of the verb take and LF Consume in Figure 22, and the definitions in Figure 25, the system determines the applicable LF-to-frame-transform, takemed-transform, and adds (kr-type takemed) to the feature vector of take. Next, it applies the argument mappings from the transform. For example, the mappings specify that the LF argument THEME maps to KR slot :medication, and therefore should be restricted to medicinal substances. Since THEME is realized as a direct object of take, (kr-type medicinal-substance) is added to the semantic vector in the object restriction. Similar transforms are applied to the rest of the arguments. As a result, a new definition of take with stricter selectional restrictions is added to the lexicon, and suitable objects of take must not only be substances, but also identified as medicines. Similarly, the definition of aspirin is specialized using the drug-transform rule in Figure 25a, and will be mapped to ASPIRIN, which is a subclass of
KR type MEDICINAL-SUBSTANCE. The new definition is shown in Figure 26, with added features shown in bold.

(take
  :lf LF CONSUME*take
  :semfeatures (Situation (aspect dynamic) (cause agentive))
  :subject (NP (Role Agent)
               (Restriction (Phys-obj (intentional +) (origin human)
                                      (form solid-object) (kr-type Person))))
  :object (NP (Role Theme)
              (Restriction (Phys-obj (form substance) (kr-type Medicinal-Substance)))))

Figure 26. The specialized entry for take in the CONSUME sense. The features changed or added during specialization are shown in bold
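The specialization step itself can be sketched as follows; this is an illustrative Python rendering of the procedure just described, where the entry layout, the names and the kr_classes table are assumptions made for the example rather than the actual TRIPS data structures:

# Generic entry: LF type plus per-role restrictions (cf. Figure 23).
generic_take = {
    "lf": "LF CONSUME*take",
    "args": {"AGENT": {"form": "object"}, "THEME": {"form": "substance"}},
}

# Slot typings taken from the KR class definitions in Figure 25(b).
kr_classes = {"TAKEMED": {":ACTOR": "PERSON", ":MEDICATION": "MEDICINAL-SUBSTANCE"}}

takemed_transform = {"kr_class": "TAKEMED",
                     "arguments": {"AGENT": ":ACTOR", "THEME": ":MEDICATION"}}

def specialize(entry, transform, kr_classes):
    """Copy the entry, add kr-type, and tighten each mapped argument restriction."""
    new_entry = {"lf": entry["lf"],
                 "kr-type": transform["kr_class"],
                 "args": {role: dict(restr) for role, restr in entry["args"].items()}}
    slot_types = kr_classes[transform["kr_class"]]
    for lf_role, kr_slot in transform["arguments"].items():
        new_entry["args"][lf_role]["kr-type"] = slot_types[kr_slot]
    return new_entry

print(specialize(generic_take, takemed_transform, kr_classes))
# The THEME restriction now carries (kr-type MEDICINAL-SUBSTANCE), as in Figure 26.

In the real system the feature inference rules (Figure 27) would then expand these kr-type values into full feature vectors, e.g. turning (kr-type PERSON) into (origin human) and (form solid-object).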
In the process of specialization, the parser uses a feature inference mechanism described in Section 3.1 as follows. For the KR specialization process we add rules that declare dependencies between the values of kr-type features and the values of domain-independent features. Two sample rules for the PERSON and MEDICINAL-SUBSTANCE values are shown in Figure 27. These feature rules, together with the domain-independent rule from 21, are used to further specialize the feature vectors in the lexical entry. In this case, this results in changing the generic (origin living) restriction on the agent of take to the more specific (origin human) value. The domain-specific feature inference is particularly useful in the way we handle semantic type coercion, described in Section 5.

4.3. Evaluation

Lexicon specialization considerably speeds up the parsing process. We conducted an evaluation comparing parsing speed and accuracy on two sets of 50-best speech lattices produced by our speech recognizer: 34 sentences in the medical domain and 200 sentences in the transportation domain.

    (kr-type Person) ⇒ (phys-obj (origin human))
    (kr-type Medicinal-substance) ⇒ (phys-obj (form substance))

Figure 27. Feature inference rules used to derive feature vectors for kr-type PERSON and kr-type SUBSTANCE in the medadvisor domain
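The feature inference rules of Figure 27 can be read as a simple table lookup that tightens a semantic vector once a kr-type value is known. The sketch below is hypothetical; the rule encoding and the function name are ours, not the system's own syntax.

    # Feature inference as table lookup: a kr-type value licenses additional
    # domain-independent features (cf. Figure 27).
    FEATURE_RULES = {
        "PERSON": {"origin": "human"},
        "MEDICINAL-SUBSTANCE": {"form": "substance"},
    }

    def infer_features(restriction):
        extra = FEATURE_RULES.get(str(restriction.get("kr-type", "")).upper(), {})
        return {**restriction, **extra}

    agent = {"kr-type": "Person", "origin": "living", "intentional": "+"}
    print(infer_features(agent))   # (origin living) is tightened to (origin human)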
Table V. Some lexicon statistics in our system

                          Generic    Transportation    Medical
    # of senses           1947       2028              1954
    # of KR classes       -          228               182
    # of mappings         -          113               95
Table VI. Average parsing time per lattice in seconds and sentence error rate for our specialized grammar compared to our generic grammar. Numbers in parentheses denote total time and error counts

                              Transportation    Medical
    # of sentences            200               34
    Time with KR (sec)        4.35 (870)        2.5 (84)
    Time with no KR (sec)     9.7 (1944)        4.3 (146)
    Errors with KR            24% (47)          24% (8)
    Errors with no KR         32% (65)          47% (16)
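The totals in parentheses are consistent with the per-lattice averages: 870 s over 200 lattices gives 4.35 s, and 146 s over 34 lattices gives about 4.3 s. In relative terms, specialization roughly halves parsing time in both domains (9.7/4.35 ≈ 2.2, 4.3/2.5 ≈ 1.7) and halves the sentence error rate in the medical domain (from 47% to 24%).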
Table V describes the lexicon and ontologies used in these domains. The results presented in Table VI show that lexicon specialization considerably increases parsing speed and improves disambiguation accuracy. The times represent the average parsing time per lattice, and the errors are the number of cases in which the parser selected the incorrect word sequence out of the alternatives in the lattice.1 A part of a lattice for the sentence that looks good is shown in Figure 28. The highest scoring sequence is that looks gave it, but it is not syntactically correct. Using the syntactic information, the parser should be able to determine that that looks good is a better interpretation. To do that, it needs to be able to select between a large number of possible word sequences, which seriously complicates the parsing task.

The improvement comes from two sources. The tighter selectional restrictions limit the search space and help to correctly disambiguate according to our domain knowledge. In addition, our parser has preference values associated with different senses, and correspondingly with constituents that use those senses. We increase the preference values for specialized entries, so they are tried first during parsing, which helps the parser find an interpretation for in-domain utterances faster.2

1 Choices in which a different pronoun, article or tense form was substituted, e.g. can/could I tell my doctor, were considered equivalent, but grammatical substitutions of a different word sense, e.g. drive/get the people, were counted as errors.

2 Unspecialized entries have a lower preference, so parses for out-of-domain utterances can be found if no domain-specific interpretation exists.
Figure 28. A sample lattice for that looks good, misrecognized as that looks gave it. Numbers at nodes denote confidence scores for lexical items, numbers on arcs denote the confidence scores on transitions
The amount of work involved in domain customization is relatively small. The lexicon and grammar stay essentially the same across domains, and a KR ontology must be defined for the use of back-end reasoners anyway. We need to write the transforms to connect the LF and KR ontologies, but as their number is small compared to the total number of sense entries in the lexicon and the number of words needed in every domain (see Table V), this represents an improvement over hand-crafting custom lexicons for every domain.
5. Domain-specific coercion
Certain semantic type coercions are frequent in our domains. For example, in our medical adviser domain, the word prescription frequently appears in contexts that require a word for (a type of) medication, as in

(1) Which prescriptions do I need to take?

Intuitively, this is understood to mean

(2) Which medications specified by my prescriptions do I need to take?

We have adopted a practical approach to such coercions by applying a domain-specific operator to the mismatched argument to
produce an entity of the coerced type. While this is a restricted approach compared to, for instance, Pustejovsky (1995) and Lascarides and Copestake (1998), we adopt it as a transparent method of handling our domain-specific coercions in the most efficient way for our system.

The first problem is for the parser to recognize (1) as a valid utterance in spite of the semantic type mismatch, since prescription is not semantically typed as a consumable substance, which is required for an argument of take in its consume sense. An approach frequently taken by robust parsers (see e.g. Rosé, 2000) is to relax the constraints in case of a type mismatch. However, if we need to construct a semantic representation for this utterance that is suitable for use by the back-end reasoners, the problem is deeper than just finding a parse tree. If the literal meaning of prescriptions is used during the interpretation process, the query asking for prescription objects consumed by the user (Figure 29) would be sent to the medication knowledge base, eliciting an empty result, since prescriptions are not recognized in the knowledge base as consumable objects.3

    (a) ASK ?y (SET-OF ?y ?x (PRESCRIPTION ?x))
            (TAKEMED v123) (:ACTOR v123 +USER+) (:MEDICATION v123 ?x)

    (b) (SET-OF ?y ?x (PRESCRIBED-MEDICATION (PRESCRIPTION ?x)))

Figure 29. (a) Query for Which prescriptions do I need to take with no type coercion; (b) representation for prescriptions coerced to medication

A correct interpretation needs a semantic representation for (1) that resembles the semantic representation for (2). For this, additional information must be interpolated, in particular the relation that holds between prescriptions and medications. We accomplish this in our framework with a library of coercion rules. To handle the semantic type coercion of prescription to medication, we define the coercion rule shown in Figure 30a, which uses a prescribed-medication operator (known to the knowledge base) to declare that prescriptions can be coerced into medications.

    (a) declare-coercion-operator prescribed-medication
          :arguments prescription
          :return medication

    (b) (prescription
          :semfeatures (Phys-obj (information +) (kr-type PRESCRIPTION))
          :lf LF Information-object*prescription
          :coercion ((operator prescribed-medication)
                     (semfeatures (phys-obj (kr-type medicinal-substance)
                                            (form substance)))))

Figure 30. (a) Operator to coerce prescriptions to medications. (b) Lexical entry for prescription with coercion feature generated from operator in (a)

3 Arguably, reasoning about prescriptions as consumables could be implemented in the knowledge base. However, in our system the context information needed to resolve some instances of coercion is localized in the intention recognition module, and using operators handled by the intention recognition is a way to take it into account. Such implementation differences between components with different reasoning capabilities provide another justification for the need to specialize parser output for different application back-ends.
During the lexicon specialization process, information from coercion rules is propagated into the lexical entries that have domain-specific mappings. When the lexicon is specialized for the medical domain, the lexical entry for prescription has a nonempty coercion feature (Figure 30b). The coercion feature specifies the coercion operator that will be applied during the interpretation process, together with a set of semantic features derived from the operator's return type medicinal-substance (with the help of a feature inference rule in Figure 27), allowing the coerced NP prescriptions to be accepted in parsing contexts that require substances. For every noun phrase with a nonempty coercion feature, the grammar produces an additional NP representation for that item with a feature that indicates that a coercion has occurred, and with the original semantic features replaced by the features specified by the coercion.

During the transform process, the Interpretation Manager applies the specified coercion rule, resulting in the (simplified) representation for prescriptions in Figure 29b. This coerced representation is then plugged into the query in Figure 29a in place of (PRESCRIPTION ?x). The new query is sent to the medication database and results in the correct answer: a set of medications.

Adding the information about possible coercion rules to the lexical entries gives us an edge in efficiency, because only the coercions relevant to our domain are attempted, and the Interpretation Manager can apply
the correct rule immediately. At the same time, the declarative coercion rules can also be used in reference resolution. For example, consider the sequence of utterances: I have a new prescription. When do I need to take it? The obvious referent for it is prescription, but there is a type mismatch, because take in this context prefers an argument that is a medicinal substance. The PHORA system (Byron, 2002) can use the declarative coercion rules in the resolution process. When a type mismatch is encountered, it searches the library of coercion rules for an operator with suitable argument types to perform the coercion and attempts to resolve the pronoun.
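Abstracting away from the actual system components, the use of a declarative coercion library in both interpretation and reference resolution can be pictured as a lookup over (operator, argument type, return type) triples. The following sketch is illustrative only; the rule list and function names are assumptions, not the system's API.

    # A library of declarative coercion rules and a lookup that wraps a mismatched
    # argument in the declared operator (cf. Figures 29b and 30a).
    COERCION_RULES = [
        ("PRESCRIBED-MEDICATION", "PRESCRIPTION", "MEDICINAL-SUBSTANCE"),
    ]

    def find_coercion(term_type, required_type):
        for operator, arg_type, return_type in COERCION_RULES:
            if arg_type == term_type and return_type == required_type:
                return operator
        return None

    def coerce_argument(term, term_type, required_type):
        if term_type == required_type:
            return term                               # no mismatch, nothing to do
        operator = find_coercion(term_type, required_type)
        if operator is None:
            raise TypeError(f"no coercion from {term_type} to {required_type}")
        return (operator, term)                       # coerced representation

    print(coerce_argument(("PRESCRIPTION", "?x"), "PRESCRIPTION", "MEDICINAL-SUBSTANCE"))
    # -> ('PRESCRIBED-MEDICATION', ('PRESCRIPTION', '?x'))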
6. Conclusion
Our method of parser customization allows us to maintain a domain-independent lexicon and grammar for improved domain coverage and portability, and at the same time provides a straightforward mechanism for constructing custom semantic representations that are optimally suited for specific domain reasoners. Our lexicon specialization process improves parsing speed and accuracy by using custom domain knowledge to boost domain-specific word senses, tighten selectional restrictions on arguments and reduce the search space during parsing. We also use the domain-specific information to handle semantic type coercions common in our domains in a computationally efficient manner.
Acknowledgments This material is based upon work supported by the Office of Naval Research under grant number N00014-01-1-1015 and the Defense Advanced Research Projects Agency under grant number F30602-98-20133. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of ONR or DARPA.
References

Allen, J., D. Byron, M. Dzikovska, G. Ferguson, L. Galescu, and A. Stent: 2000, 'An Architecture for a Generic Dialogue Shell'. NLENG: Natural Language Engineering, Cambridge University Press 6(3), 1–16.
Allen, J., D. Byron, M. Dzikovska, G. Ferguson, L. Galescu, and A. Stent: 2001, ‘Towards Conversational Human-Computer Interaction’. AI Magazine 22(4), 27–38. Allen, J. F., B. W. Miller, E. K. Ringger, and T. Sikorski: 1996, ‘A Robust System for Natural Spoken Dialogue’. In: Proceedings of the 1996 Annual Meeting of the Association for Computational Linguistics (ACL’96). Byron, D. K.: 2002, ‘Resolving Pronominal Reference to Abstract Entities’. Ph.D. thesis, University of Rochester. Copestake, A. and D. Flickinger: 2000, ‘An Open Source Grammar Development Environment and Broad-coverage English Grammar Using HPSG’. In: Proceedings of the 2nd International Conference on Language Resources and Evaluation. Athens, Greece. Davidson, D.: 1967, ‘The Logical Form of Action Sentences’. In: N. Rescher (ed.): The Logic of Decision and Action. Pittsburgh: University of Pittsburgh Press, pp. 81–95. Dowding, J., J. M. Gawron, D. Appelt, J. Bear, L. Cherny, R. Moore, and D. Moran: 1993, ‘GEMINI: A natural language system for spoken-language understanding’. In: Proceedings of the 1993 Annual Meeting of the Association for Computational Linguistics (ACL’93). pp. 54–61. Dzikovska, M., J. F. Allen, and M. D. Swift: 2002, ‘Finding the balance between generic and domain-specific knowledge: a parser customization strategy’. In: Proceedings of LREC 2002 Workshop on Customizing Knowledge for NLP applications. Dzikovska, M. O. and D. K. Byron: 2000, ‘When is a union really an intersection? Problems resolving reference to locations in a dialogue system’. In: Proceedings of the GOTALOG’2000. Gothenburg. Ferguson, G., J. Allen, N. Blaylock, D. Byron, N. Chambers, M. Dzikovska, L. Galescu, X. Shen, R. Swier, and M. Swift: 2002, ‘The Medication Advisor Project: Preliminary Report’. Technical Report 766, Computer Science Dept., University of Rochester. Goddeau, D., E. Brill, J. Glass, C. Pao, M. Phillips, J. Polifroni, S. Seneff, and V. Zue: 1994, ‘Galaxy: A Human-Language Interface to On-line Travel Information’. In: Proc. ICSLP ’94. Yokohama, Japan, pp. 707–710. Johnson, C. and C. J. Fillmore: 2000, ‘The FrameNet tagset for framesemantic and syntactic coding of predicate-argument structure’. In: Proceedings ANLP-NAACL 2000. Seattle, WA. Lascarides, A. and A. Copestake: 1998, ‘Pragmatics and Word Meaning’. Journal of Linguistics 34(2), 387–414. Macleod, C., R. Grishman, and A. Meyers: 1994, ‘Creating a Common Syntactic Dictionary of English’. In: SNLR: International Workshop on Sharable Natural Language Resources. McDonald, D. D.: 1996, ‘The interplay of syntactic and semantic node labels in partial parsing’. In: H. Bunt and M. Tomita (eds.): Recent Advances in Parsing Technology. Kluwer Academic Publishers, pp. 295–323.
Miller, S. A. and L. K. Schubert: 1988, 'Using Specialists to Accelerate General Reasoning'. In: T. M. Smith and R. G. Mitchell (eds.): Proceedings of the 7th National Conference on Artificial Intelligence. pp. 161–165.
Moens, M. and M. Steedman: 1987, 'Temporal Ontology in Natural Language'. In: Proceedings of the 25th Annual Conference of the Association for Computational Linguistics. pp. 1–7, Association for Computational Linguistics.
Pustejovsky, J.: 1995, The Generative Lexicon. Cambridge, Massachusetts: The MIT Press.
Rosé, C.: 2000, 'A Framework for Robust Semantic Interpretation'. In: Proceedings 1st Meeting of the North American Chapter of the Association for Computational Linguistics.
Seneff, S.: 1992, 'TINA: A Natural Language System for Spoken Language Applications'. Computational Linguistics 18(1), 61–86.
Utsuro, T. and Y. Matsumoto: 1997, 'Learning Probabilistic Subcategorization Preference by Identifying Case Dependencies and Optimal Noun Class Generalization Level'. In: Proceedings of 5th ANLP Conference.
Vossen, P.: 1997, 'EuroWordNet: a multilingual database for information retrieval'. In: Proceedings of the Delos workshop on Cross-language Information Retrieval.
Wahlster, W. (ed.): 2000, Verbmobil: Foundations of Speech-to-Speech Translation. Berlin: Springer.
ARAVIND K. JOSHI, LAURA KALLMEYER AND MARIBEL ROMERO
FLEXIBLE COMPOSITION IN LTAG: QUANTIFIER SCOPE AND INVERSE LINKING
1. Introduction
The scope of quantifiers within a sentence can in principle be arranged in different orderings. For example, a sentence with two quantifiers [Qu1 ... Qu2] has two logically possible scope orderings: the surface order Qu1 > Qu2, and the inverse order Qu2 > Qu1. Both orderings need to be generated by the grammar to account for the surface reading of (83a) and the inverse reading of (84a):1

(83) a. Every FBI agent hates a professor.
     b. Surface ∀∃: ∀y[agent(y) → ∃x[professor(x) ∧ hate(y, x)]]

(84) a. An FBI agent is spying on every professor.
     b. Inverse ∀∃: ∀y[professor(y) → ∃x[agent(x) ∧ spy(x, y)]]

1 Here and throughout this chapter, we will avoid using (singular) indefinites as wide scope quantifiers. This is because the wide scope effect of indefinites can be obtained through a special pseudo-scoping mechanism, namely choice functions (Kratzer, 1998, Reinhart, 1997), and not through the general truly scoping mechanisms that we are concerned with in this chapter. This pseudo-scoping mechanism allows indefinites to yield the truth-conditional effect of wide scope in configurations where regular quantifiers cannot, e.g., out of if-islands, as the contrast (81)-(82) illustrates. In sum, even though (83a) in the text has an ∃∀ reading, we need example (84a) to ensure that true inverse scope is available in the grammar.

(81) If a relative of Mary dies, she'll inherit a large fortune. ∃ > if
(82) If every relative of Mary dies, she'll inherit a large fortune. *∀ > if
Both scope orderings must also be made available by the grammar when the two quantifiers appear nested within each other, that is, when one of the quantifiers appears within the Noun Phrase headed by the other quantifier. This is illustrated in (85) and (86). In (85a), the surface ordering ∀∃ corresponds to the existing reading spelled out in (85b). In (86a), the inverse ordering ∀∃ gives us the existing reading in (86b). The surface and inverse readings in nested quantifier constructions will be called "surface linking reading" and "inverse linking reading" respectively. Note that, in the inverse linking reading, the nested Qu2 does not only take scope over its host NP, but over the clause in general, as it can bind the variable it in (87) (May, 1985, p. 68).

(85) a. Every representative from an African country came to the meeting.
     b. Surface linking reading ∀∃: ∀x[∃y[representative(x, y) ∧ Afrcountry(y)] → came(x)]

(86) a. A representative from every African country came to the meeting.
     b. Inverse linking reading ∀∃: ∀y[Afrcountry(y) → ∃x[representative(x, y) ∧ came(x)]]

(87) Somebody from every city despises it.

When we turn to sentences with a sequence of three quantifiers [Qu1 ... Qu2 ... Qu3], we have six logically possible scope combinations. Out of these six combinations, the ordering Qu3 > Qu1 > Qu2 yields an actual reading for sentences like (88), as spelled out in (88b).2

(88) a. (At least) two social workers gave a doll to each/every child.

2 In (88b), x ranges over singular and plural individuals (see Link, 1983), the formula |x| ≥ 2 says that x consists of at least 2 atoms, and the universal quantification ∀x′ corresponds to the distributive interpretation optionally available for plural Noun Phrases (Link, 1983, among many others). For convenience, we will use the notation 2x[p1 ∧ p2] for ∃x[p1 ∧ |x| ≥ 2 ∧ ∀x′[x′ ⊂i x → p2]] in subsequent examples.
b. ∀3 21 ∃2: ∀y[child(y) → ∃x[social-workers(x) ∧ |x| ≥ 2 ∧ ∀x′[x′ ⊂i x → ∃z[doll(z) ∧ give(x′, z, y)]]]]

In the case of nested quantifier configurations [Qu1 ... [Qu2 ... Qu3]], however, where Qu3 appears within Qu2, this same ordering Qu3 Qu1 Qu2 does not yield an actual reading (Larson, 1987, Heim and Kratzer, 1998, Sauerland, 2000, Buering, 2001, Barker, 2002). Neither does the configuration Qu2 Qu1 Qu3 yield an actual reading (Hobbs and Shieber, 1987). This can be seen in (90)-(91). Take the sequence [21 ... [∀2 ... ∃3]] in (89a) and concentrate on the three logically possible surface linking orderings (where ∀2 has wider scope than ∃3). The claim is that the nested quantifiers can in principle take scope together under 2 (order 21 ∀2 ∃3) or they can take scope together over 2 (order ∀2 ∃3 21), but they cannot take scope separately with the quantifier 2 intervening between them (ordering *∀2 21 ∃3). Take now the inverse linking scope between the nested quantifiers in the sequence [21 ... [∃2 ... ∀3]] in (90a). Again, the (inverse linked) nested quantifiers ∀3 ∃2 can take scope together under 2 or over 2 (readings 21 ∀3 ∃2 and ∀3 ∃2 21, respectively), but their scope cannot be split by inserting the scope of another quantifier in between (reading *∀3 21 ∃2). Examples (91) and (92) illustrate the same point as (90).

(89) a. Two politicians visited everybody from a foreign country.
     b. 21 ∀2 ∃3, ∀2 ∃3 21, *∀2 21 ∃3
     c. *∀2 21 ∃3: ∀y[person(y) → 2x[politicians(x) ∧ ∃z[f-country(z) ∧ from(y, z) ∧ visit(x, y)]]]

(90) a. Two politicians spy on someone from every city. (Larson, 1987)
     b. 21 ∀3 ∃2, ∀3 ∃2 21, *∀3 21 ∃2
     c. *∀3 21 ∃2: ∀y[city(y) → 2x[politicians(x) ∧ ∃z[person(z) ∧ from(z, y) ∧ spy(x, z)]]]

(91) Two engineers repaired some exits from every freeway in California. (Larson, 1987)

(92) Two boys are dancing with a girl from every city. (Sauerland, 2000)
The missing surface linking reading Qu2 Qu1 Qu3 may be banned for general architectural reasons, since *∀2 21 ∃3 in (89c) involves misplacing the restrictor from(z, y) of the quantifier ∀z into its nuclear scope. But note that such a problematic configuration does not arise in the inverse linked *∀3 21 ∃2 in (90c). The unavailability of this last reading is, hence, puzzling and needs an explanation.

The main goal of this chapter is to provide an LTAG account of why no quantificational NP can intervene between an inverse linked quantifier and its host NP in examples like (90a).3 This chapter is part of a larger project concerned with the development of a compositional semantics for LTAG.

The material is organized as follows. We first provide some background on LTAG and compositional semantics in section 2. Section 3 develops a flexible composition approach to quantification. Section 4 spells out the semantics for it, generating only the correct scopal combinations for nested quantifier constructions. Section 5 concludes.

3 Hobbs and Shieber (1987) correctly rule out reading *Qu2 Qu1 Qu3, but their algorithm incorrectly generates *Qu3 Qu1 Qu2, since their inverse linking examples have an indefinite as the nested Qu3 (see footnote 1). Barker, 2001 and Barker, 2002 develop the notion of 'integrity', which successfully rules out both linking readings; however, his account also rules out the grammatical reading Qu3 Qu1 Qu2 in (88). Finally, Buering, 2001 achieves the desired results in inverse linking configurations by prohibiting the nested Qu to escape their host NP and by deriving pronoun binding in Somebody from every city despises its subway system through indirect quantification over situations. We will, instead, allow scoping to the S clause node, but we will postulate a scope identification requirement that, given the way the syntax-semantics of LTAG representations is composed, only limits the scope possibilities in nested Qu configurations.
2. LTAG and compositional semantics
2.1. Lexicalized Tree Adjoining Grammars (LTAG)

An LTAG (Joshi and Schabes, 1997) consists of a finite set of trees (elementary trees) associated with lexical items and of composition operations of substitution (replacing a leaf with a new tree) and adjunction (replacing an internal node with a new tree). The elementary trees represent extended projections of lexical items and encapsulate all syntactic/semantic arguments of the lexical anchor.
Figure 31. TAG derivation for John always laughs
They are minimal in the sense that only the arguments of the anchor are encapsulated; all recursion is factored away. Starting from the elementary trees, larger trees are derived using substitution and adjunction. As an example see Fig. 31, where a derivation starts with the elementary tree of laughs. Then, the elementary tree of John is substituted for the NP node and the elementary tree of always is adjoined at the VP node. The result is the derived tree on the right. The elementary tree of always is a special tree, a so-called auxiliary tree. In these trees, one of the leaves is marked as foot node (marked with an asterisk). When adjoining an auxiliary tree to a node µ, in the resulting tree, the subtree with root node µ from the old tree is put below the foot node of the new auxiliary tree. For adjunctions auxiliary trees are used, while for substitutions initial trees (the non-auxiliary elementary trees) are used.

LTAG derivations are represented by derivation trees that record the history of how the elementary trees are put together. The derived tree is the result of carrying out the substitutions and adjoinings. Each edge in the derivation tree stands for an adjunction or a substitution. The edges are equipped with Gorn addresses of the nodes where substitution/adjunction takes place: the root has the address (0), its children have addresses (1), (2), ..., and the jth child of the node with address (p) has address (p · j). In Fig. 31, John is substituted for the node at position (1) and always is adjoined at position (2).
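For concreteness, Gorn addresses can be computed by a straightforward traversal. The toy tree encoding below (a node as a label plus a list of children) is our own and only illustrates the addressing scheme, using the elementary tree of laughs from Fig. 31.

    # Gorn addresses by traversal: the root is 0, its children are 1, 2, ...,
    # and the j-th child of a node with address p gets address p·j.
    def gorn_addresses(tree, address="0"):
        label, children = tree
        yield address, label
        for j, child in enumerate(children, start=1):
            prefix = "" if address == "0" else address
            yield from gorn_addresses(child, prefix + str(j))

    # the elementary tree of laughs: (S (NP) (VP (V laughs))), NP left open for substitution
    laughs_tree = ("S", [("NP", []), ("VP", [("V", [("laughs", [])])])])
    for addr, label in gorn_addresses(laughs_tree):
        print(addr, label)    # NP is at address 1 and VP at address 2, as used in Fig. 31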
2.2. Compositional semantics with LTAG

Because of the localization of the arguments of a lexical item within elementary trees, TAG derivation trees express predicate argument dependencies. Therefore it is generally assumed that the proper way to define compositional semantics for LTAG is with respect to the derivation tree, rather than the derived tree (see, e.g., Candito and Kahane, 1998, Joshi and Vijay-Shanker, 1999, Kallmeyer and Joshi, 1999, Kallmeyer and Joshi, 2003). The overall idea is as follows. Each elementary tree is linked to a semantic representation. The way the semantic representations combine with each other depends on the derivation tree. Following (Kallmeyer and Joshi, 1999; Kallmeyer and Joshi, 2003), in this chapter we will adopt 'flat' semantic representations as in, for example, Minimal Recursion Semantics (MRS, Copestake et al., 1999). (93) shows the elementary semantic representations for John always laughs.4

(93)  l1: laugh(x1), h1 ≥ l1               arg: x1, (1)
      john(x)                              arg: –
      l2: always(h2), g1 ≥ l2, h2 ≥ s1     arg: g1, s1
Roughly, a semantic representation consists of a conjunctively interpreted set of formulas (typed lambda-expressions), scope constraints and a set of argument variables. The formulas may contain labels and holes (metavariables for propositional labels). In the following, l1, l2, ... are propositional labels, h1, h2, ... are propositional holes, s1, s2, ... are propositional argument variables and x1, x2, ... individual argument variables (whose values must be propositional labels/free individual variables), and g1, g2, ... are hole variables (special argument variables whose values must be holes). Argument variables may be linked to positions in the elementary tree, as is the case for x1 in (93). The use of holes is motivated by the desire to generate underspecified representations (as in, e.g., Bos, 1995) for scope ambiguities. After having constructed a (possibly underspecified) semantic representation with holes and labels, disambiguation is done, which consists of finding bijections from holes to labels that respect the scope constraints. E.g., in the semantic representation for laugh, there is a hole h1 above l1 (constraint h1 ≥ l1). Between h1 and l1, other labels and holes might intervene (introduced for example by quantifiers or adverbs) or, if this is not the case, l1 will be assigned to h1 in the disambiguation(s).5

When combining semantic representations, values are assigned to argument variables and the union of the semantic representations is built. The values for the argument variables of a certain (elementary) semantic representation must come from semantic representations that are linked to it in the derivation tree. The linking of argument variables and syntactic positions restricts the possible values as follows: in a substitution derivation step at a position p, only argument variables linked to p get values; in an adjunction step, only argument variables that are not linked to any positions can get values. In the case of a substitution, a new argument is inserted and therefore a value is assigned to an argument variable in the old semantic representation. However, in the case of an adjunction, a new modifier is applied and therefore a value is assigned to a variable in the semantic representation that is added. In other words, in case of a substitution, semantic composition is downwards, while in case of an adjunction, semantic composition is upwards. For a formal definition of the semantic composition operation see (Kallmeyer and Joshi, 2003).6

Figure 32 shows the semantic composition for John always laughs. The directed edges signify semantic application while the dotted links signify variable assignments. John is substituted into laugh, therefore the corresponding semantic composition is downwards, while the composition of always and laugh is upwards. Furthermore, the value of x1 needs to come from John since x1 is linked to the node address where John is substituted, and the values of g1 and s1 need to come from laugh since they are not linked to any node addresses. Consequently, x1 → x, g1 → h1 and s1 → l1. The result is (94).

Figure 32. Semantic composition for John always laughs

(94)  l1: laugh(x), john(x), l2: always(h2), h1 ≥ l1, h1 ≥ l2, h2 ≥ l1     arg: –

4 john(x) is not a standard unary predicate but it is supposed to signify "there is a unique individual called John and x refers to that individual".

5 The constraints k1 ≥ k2 we use differ from the qeq conditions k1 =q k2 in MRS (see Copestake et al., 1999, p. 10) in that they allow any element having a propositional argument (quantifiers, scope-taking adverbs, ...) to intervene between k1 and k2, while in MRS just quantifiers can intervene between k1 and k2.

6 The algebra introduced in (Kallmeyer and Joshi, 2003) is close to what (Copestake et al., 2001) introduce for MRS, except for details of the formalization and for the fact that in (Copestake et al., 2001) each semantic representation contains just one "hook", i.e. just one element that can be assigned as possible value to an argument (if equations are viewed as variable assignments). This is different in our approach, e.g. in (93) h1 and l1 are contributed by the same elementary representation and they are both used as values when combining laugh and always.
A disambiguation δ is a bijection from holes to labels such that, after having applied δ (i.e., replaced the holes by their corresponding labels), the reflexive transitive closure ≥∗ of the order ≥ with (i) l1 ≥ l2 if l1 ≥ l2 is a constraint and (ii) l1 ≥ l2 and l1 ≠ l2 if l2 labels a subformula of the formula labelled l1, must be such that:

(a) ≥∗ is a partial order, and
(b) neither l1 ≥∗ l2 nor l2 ≥∗ l1 holds if l1 and l2 are different arguments of the same predicate (e.g., the restrictive and the nuclear scope of a quantifier).

In (94), h1 ≥ l2, l2 > h2 (because h2 appears inside a formula labelled l2) and h2 ≥ l1. Consequently h1 ≠ l1 and the only possible disambiguation is h1 → l2, h2 → l1. This leads to the semantics john(x) ∧ always(laugh(x)).
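The disambiguation step just illustrated for (94) can be checked mechanically: enumerate the bijections from holes to labels, close the resulting ≥ relation under transitivity, and keep only those assignments for which the closure is a partial order. The Python fragment below is a small sketch of this procedure for (94) only (condition (b) is vacuous here and therefore omitted); it is not the authors' implementation.

    # Enumerating disambiguations for (94): h1, h2 are mapped bijectively onto
    # l1, l2; the closure of the scope constraints plus the subformula relation
    # must be a partial order.
    from itertools import permutations

    LABELS = ["l1", "l2"]                       # l1: laugh(x),  l2: always(h2)
    HOLES = ["h1", "h2"]
    CONSTRAINTS = [("h1", "l1"), ("h1", "l2"), ("h2", "l1")]
    SUBFORMULA = [("l2", "h2")]                 # h2 occurs inside the formula labelled l2

    def disambiguations():
        valid = []
        for assignment in permutations(LABELS, len(HOLES)):
            delta = dict(zip(HOLES, assignment))
            resolve = lambda k: delta.get(k, k)                 # replace holes by labels
            order = {(resolve(a), resolve(b)) for a, b in CONSTRAINTS + SUBFORMULA}
            order |= {(l, l) for l in LABELS}                   # reflexive closure
            changed = True                                      # transitive closure
            while changed:
                changed = False
                for a, b in list(order):
                    for c, d in list(order):
                        if b == c and (a, d) not in order:
                            order.add((a, d))
                            changed = True
            antisymmetric = all(a == b or (b, a) not in order for a, b in order)
            if antisymmetric:                                   # ">=*" is a partial order
                valid.append(delta)
        return valid

    print(disambiguations())    # only [{'h1': 'l2', 'h2': 'l1'}] survives, as in the text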
The way we use holes and labels corresponds largely to the Hole Semantics in (Bos, 1995). There are mainly two differences: first, the underspecified representations we are using are not required to be proper underspecified representations in the sense of (Bos, 1995). This means that the set of propositional labels with the partial order ≥∗ need not form a meet semi-lattice. In other words, for two labels l1, l2, there need not be a label l3 with l3 ≥∗ l1, l3 ≥∗ l2. The second difference concerns the definition of ≥∗ on the basis of a given underspecified representation. The condition (b) in our definition of ≥∗, which makes sure that nothing can be at the same time below two different arguments of the same predicate, is not present in (Bos, 1995). Due to this condition, for example, the surface linked scope order ∀2 21 ∃3 in (89) is excluded, because this scope order would require from(x, y) to be in the restriction of ∀ (since it is part of the restriction of everybody) while being in the nuclear scope of ∃ (since it is in the scope of a foreign country), which is in the nuclear scope of 2, which is in the nuclear scope of ∀. With a partial order defined as in (Bos, 1995), i.e., without condition (b), this scope order would be possible.

2.3. Separating scope and predicate argument information

A central aspect of (Kallmeyer and Joshi, 1999; Kallmeyer and Joshi, 2003) is the separation of the contribution of a quantifier into a scope and a predicate argument part: quantifiers have a set of two elementary trees, and tree-local multicomponent TAGs are used. (This means that if a new elementary tree set is added, all trees of the set are added simultaneously and they are added to nodes all belonging to the same elementary tree.) An auxiliary tree consisting of a single node is linked to the scope part of the semantics, while an initial tree is linked to the predicate argument part.7 E.g., consider the syntactic analysis of every dog barks in Fig. 33. The corresponding elementary semantic representations are shown in (95).
Figure 33. Syntactic analysis of every dog barks
7 This two-component representation of a quantifier phrase can be viewed as a kind of type raising as used in categorial grammars, especially in the logic-based categorial grammars (Morill, 1994; Steedman, 1996).
(95)  l1: bark(x1), h1 ≥ l1        arg: x1
      l2: ∀x(h2, h3), h3 ≥ s1      arg: s1
      l3: p1(x), h2 ≥ l3           arg: p1
      q1: dog                      arg: –
The scope part of the quantifier (second representation in (95)) introduces a proposition containing the quantifier, its variable and two holes for its restrictive and nuclear scope. The proposition this semantic representation is applied to (variable s1 ) is in the nuclear scope of the quantifier (h3 ≥ s1 ). The predicate argument part (third representation in (95)) introduces a proposition p1 (x) where p1 will be the noun predicate dog. This proposition is in the restrictive scope of the quantifier (h2 ≥ l3 ). The values for the argument variables are x1 → x, s1 → l1 , p1 → q 1 which gives (96). The only disambiguation is h1 → l2 , h2 → l3 , h3 → l1 which leads to the semantics ∀x(dog(x), bark(x)). (96)
l1 : bark(x), l2 : ∀x(h2 , h3 ), l3 : dog(x), h1 ≥ l1 , h3 ≥ l1 , h2 ≥ l3 arg: −
To account for cases with more than one quantifier, a restricted use of multiple adjunctions (for the scope parts) is necessary. As already mentioned above, the use of holes and labels allows us to generate underspecified representations for quantifier scope ambiguities as in (97).

(97) some student loves every course

The elementary trees, the elementary semantic representations and the derivation tree are shown in Fig. 34. The assignments are x1 → x, x2 → y, s1 → l1, p1 → q1, s2 → l1, p2 → q2. The result is (98).

(98)
l2 : ∃x(h2 , h3 ), l4 : ∀y(h4 , h5 ), l1 : loves(x, y), l3 : student(x), l5 : course(y), h2 ≥ l3 , h3 ≥ l1 , h4 ≥ l5 , h5 ≥ l1 , h1 ≥ l1 arg: –
According to (98), student(x) is in the restriction of ∃, course(y) in the restriction of ∀, and loves(x, y) is in the body of ∃ and the body of ∀.
Figure 34. Scope ambiguity and underspecification: analysis of (97)
This leaves open whether ∃ is in the body of ∀ or ∀ in the body of ∃. The corresponding two disambiguations are h1 → l2 , h2 → l3 , h3 → l4 , h4 → l5 , h5 → l1 (wide scope of ∃) and h1 → l4 , h2 → l3 , h3 → l1 , h4 → l5 , h5 → l2 (wide scope of ∀). There are many related works on computational models for scope representation, e.g., (Reyle, 1993) that introduces scope constraints and underspecification into DRT. One that has a specific connection to our work is (Alshawi, 1992). In this work there is an intermediate level of scope representation (Quasi Logical Form (QLF)). At this level underspecified representation of scope is allowed (among other things). This form is computed from a prior phase of syntactic analysis and is produced by an initial semantic analysis phase.
The fact that we provide in our representation a level of underspecification is not the novel part of our system. One of the novel aspects of the compositional semantics developed in (Kallmeyer and Joshi, 2003) is that the derivation tree (which is the syntactic derivational history in the LTAG system) already represents the underspecified scope relations. Computation of this representation is not a separate level. This is a crucial point of departure from the traditional compositional systems. The other distinguishing aspect is the factoring of the composition of the predicate-argument semantics from the scope composition semantics.
3. LTAG and flexible composition
In a context-free grammar, CFG, a rule such as A → BC can be interpreted in two ways. We can regard B as a function and C as its argument, producing the result A. Alternatively, C can be regarded as a function and B as its argument, producing the same result. Thus we have flexible composition here, in the sense that the direction of composition is flexible. In the case of CFGs it is easily seen that providing such flexibility does not affect the weak generative capacity of the grammar (i.e., the set of strings generated by the grammar) as well as the strong generative capacity (i.e., the set of derivation trees generated by the grammar). In a categorial grammar, CG, flexible composition can be achieved by type raising. For example, the sequence, NP(S\NP)/NP NP can be reduced to NP S\NP and then to S. Alternatively, by type raising the subject NP to S/(S\NP) we can first compose the subject with the transitive verb and then compose the result with the object NP. We can therefore get two different derivations (structural descriptions) for the given string, thus increasing the strong generative capacity of the grammar. It appears that for a CG, flexible composition may increase the strong generative capacity but it does not increase the weak generative capacity. It should be noted that both CFGs and CGs are ‘string’ rewriting systems, in the sense that the function and argument categories have to be ‘string adjacent’ to each other. For a TAG and, in particular, for the multi-component TAG it can be shown that flexible composition allows the possibility of increasing both the strong and weak generative capacities. This is due to the fact that when TAG trees are composed (interpreting them either as functions or arguments) the function and argument trees are ‘tree-adjacent’. The notion of ‘string adjacency’ is not relevant here. The fact that complex
structured objects are composed (instead of string concatenation of two strings as in a CFG or a CG) allows the possibility of increasing strong and weak generative capacities using flexible composition. In this chapter, we only use tree-local MC-TAGs, i.e., when a multicomponent tree composes with another tree, that tree has to be an elementary tree. Tree-local MC-TAGs can be shown to be weakly equivalent to TAG; however, they give more structural descriptions (i.e., more strong generative capacity). Further, tree-local MC-TAGs when used with flexible composition give even more strong generative capacity, still being weakly equivalent to TAG. Set-local MC-TAGs allow the possibility of composing two multicomponent trees, say α with β (two components in each tree), such that the two components of α compose with the two components of β individually; the two components of β have to be elementary trees to preserve locality. The result is a two-component tree where each one of the components is a derived tree. Set-local MC-TAGs can generate more strings than TAGs, i.e., they have more weak generative power as well as, of course, more strong generative power. However, in our case, one component of a multicomponent set is always a degenerate tree (empty tree). It can be shown that in this case set-local composition does not increase weak generative capacity beyond TAGs. Thus, in summary, in this chapter we use MC-TAGs with flexible composition, which are weakly equivalent to TAGs but strongly more powerful than TAGs in ways that are just adequate for describing scope ambiguities.

We will now use some simple examples to illustrate what we mean by flexible composition in a TAG or MC-TAG. Instead of the two operations, substitution and adjoining, we will use the cover term 'attachment'. In Fig. 35, β 1 can be attached to α1 at the interior S node of α1, resulting in the tree corresponding to whoi NP thinks NP likes i. In this case β 1 composes with α1. Alternatively, we can regard α1 as a multicomponent tree (with two components) as shown in α2, with the two components α21 and α22. Now we can compose α2 with β 1 such that α21 attaches to the root node of β 1 and α22 attaches to the foot node S of β 1, resulting in the same string and the same tree as before, but with a different derivation (different structural description).

In flexible composition, if a tree t composes with a tree u then we require that u is an elementary tree. This assures 'tree locality' in the composition. Given two trees t and u, composition can go in either direction if both t and u are elementary. If both t and u are derived trees then they cannot compose with each other.
Figure 35. Example of flexible composition
If only one of the trees is elementary, then the other tree can compose into it but not vice versa. Given this constraint on locality, the composition can proceed in a flexible manner. Of course, several derived trees can be added simultaneously to an elementary tree. This is necessary in order not to exclude standard TAG derivations.

Let us consider a second example where flexible composition allows us to obtain a richer set of derivation structures. Using the elementary trees in Fig. 36, the two derivations shown in Fig. 37 are possible (among others). They generate the same string and even the same tree, but the two derivation trees are different. In the first derivation, β 2 is first attached to β 12. This results in a derived two-component β 1 tree. This derived two-component tree can now compose with α1, which is an elementary tree: β 11 attaches to the root node of α1 and the derived tree attaches to the lower S node. (This derivation would also be possible in a tree-local multicomponent TAG.) In the second derivation, β 12 is first attached to β 2, leading to a two-component tree. Then, this derived two-component tree (i.e., β 11 and the derived tree) composes with α1. This second derivation is not possible in a tree-local multicomponent TAG.

There is another way to describe this second derivation. Take the first derivation tree in Fig. 37. In this tree β 2 composes with β 12 (going bottom up on the derivation tree).
Figure 36. Sample TAG with flexible composition

Figure 37. Flexible composition derivations for the string aebcd
If we view this composition going the other way, i.e., β 12 composing with β 2, then we have the second derivation tree in Fig. 37. So flexible composition, in this sense, means walking along a derivation graph (i.e., a derivation tree without directions on the edges) in any possible way, preserving the locality of composition. Thus in our example in Fig. 37 we get the second derivation tree from the first derivation tree by starting the walk at the node labelled β 12 and going downwards to the node β 2.

The first step in the second derivation tree in Fig. 37, on the face of it, looks as if we have violated the definition of how a multicomponent set composes with an elementary tree: we are supposed to use both the components, but we are using only one of them. This is fine, however, since, as we have just seen, the effect of this composition can be achieved by just reversing the composition of β 2 with β 12 in the first derivation in Fig. 37. It is in this sense that flexible composition allows us to build the second derivation tree in Fig. 37.

In a way, the idea of splitting the contribution of a quantifier into two parts (see Section 2.3) arises naturally from the idea of flexible composition: a substitution of an NP-tree for an NP substitution node
in a verb tree can also be considered as an adjunction of the verb tree at the root of the NP-tree (with the substitution node being the foot node). Now, adopting flexible composition in the way shown in Fig. 35, the NP-tree can be split into two components, and these components can compose with the verb tree in the way shown in Section 2.3.

To account for scope restrictions in inverse linking readings, we will use the flexible composition approach. However, we do not need its full power. We will just use standard TAG derivations but adopt the flexible composition perspective, which roughly corresponds to a bottom-up traversal of the derivation tree.8
4. The quantifier set approach
In this section we propose a way to obtain the desired scope restrictions for inverse linking constructions making use of the flexible composition approach. Consider again the inverse linking example (90), repeated as (99). The reading we want to exclude is the inverse linked reading with intervention ∀3 21 ∃2.

(99) Two politicians spy on someone from every city

In the flexible composition approach, at some point the QPs someone and every city are composed. In this step, the two scope parts (the S auxiliary trees) of these quantifiers are identified (one adjoins to the other). The result is the complex QP someone from every city. Later, this QP and two politicians are both added to spy, i.e., their scope parts adjoin to the S node of spy. In other words, in this latter step the scope parts of the complex QP and of two politicians are identified. It seems that whenever an identification of scope parts takes place (i.e., either one adjoins to the other or all adjoin to the same node),

• all scope orders are possible between the quantifiers involved in that identification, and
• no other quantifier can intervene (i.e., have scope over one of the quantifiers while being in the scope of another of the quantifiers involved in this identification).
8 The idea of flexible composition is not only useful for quantifier scope restrictions but also for dealing with various word order phenomena in LTAG. Therefore we presented it in a general way, even though only a part of it is used in this chapter.
Figure 38. Derivation tree of (99)
To formalize this, we introduce quantifier sets in our semantic representations. The idea is the following: whenever several quantifiers are identified, a new set is built containing the scope parts of these quantifiers. These scope parts may themselves already be sets (as in the case of the complex QP in (99)). E.g., the representation for (99) contains a quantifier set {l1: 2..., {l2: ∃..., l3: ∀...}}. The elements of one quantifier set (e.g., ∃ and ∀ in (99)) are considered as being 'glued together' in the sense that no other quantifier can intervene. This is obtained by putting a condition on the scope order that makes sure that if one part of a quantifier set Q1 is subordinated by one part of another quantifier set Q2, then all quantifiers in Q1 must be subordinated by all quantifiers in Q2. More formally, to the conditions on the relation "≥∗" one obtains after having applied a disambiguation (see (a), (b), p. 240), we add the following:

(c) for each quantifier set Q, for all Q1, Q2 ∈ Q: if there are labels l1 in Q1 and l2 in Q2 such that l1 >∗ l2, then for all l1′ in Q1 and l2′ in Q2, l1′ >∗ l2′ holds.

For (99), this will exclude the disambiguation l3 >∗ l1 >∗ l2, i.e., the inverse linking reading *∀3 21 ∃2 that we want to rule out.9

Let us go through the derivation of (99). Figure 38 shows its derivation tree. For the scope parts of quantifiers we now allow non-local multicomponent attachments. This does not affect the generative capacity of the grammar.

9 The disambiguation l2 >∗ l1 >∗ l3 corresponding to the ungrammatical surface linking reading *∀2 21 ∃3 of (89) is also ruled out, both by (c), due to intervention, and by (b), because of the misplacement of the restrictor of the quantifier ∀.
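Condition (c) can be verified mechanically against a candidate scope order. The sketch below encodes the quantifier set given above for (99) as a nested tuple and checks the three orderings under discussion; the encoding and function names are ours, intended only as an illustration of the condition, not as the authors' algorithm.

    # Checking condition (c) against a total scope order: an order violates (c)
    # as soon as one label of a member outscopes some but not all labels of a
    # sister member of the same quantifier set.
    QSET = ("two", ("every", "some"))       # {l1: 2..., {l3: forall..., l2: exists...}}

    def labels(member):
        return [member] if isinstance(member, str) else [l for m in member for l in labels(m)]

    def satisfies_c(qset, order):
        pos = {q: order.index(q) for q in labels(qset)}
        for q1 in qset:
            for q2 in qset:
                if q1 is q2:
                    continue
                pairs = [(a, b) for a in labels(q1) for b in labels(q2)]
                wider = [pos[a] < pos[b] for a, b in pairs]     # a outscopes b
                if any(wider) and not all(wider):
                    return False
        # recurse into embedded quantifier sets
        return all(satisfies_c(m, order) for m in qset if isinstance(m, tuple))

    for order in (["two", "every", "some"],      # 2 > forall > exists
                  ["every", "some", "two"],      # forall > exists > 2
                  ["every", "two", "some"]):     # * forall > 2 > exists
        print(order, satisfies_c(QSET, order))
    # the first two orders satisfy (c); the intervention order is correctly excluded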
The flexible composition view corresponds roughly to a bottom-up derivation where derived trees are added to elementary trees, i.e., the derivation steps are the following:
1. politicians attaches to the lower part of the multicomponent (MC) set of two, building a larger MC set.
2. Similarly, city attaches to every, building an MC set.
3. The lower part of the MC set of every city is substituted into from. The result is a new MC set.
4. The MC set of from every city is added to the MC set of someone, with adjunction of the upper component at the scope part and an adjunction of the lower component at the NP. At this point, the first identification of two scope parts takes place. The result is a new MC set.
5. The two MC sets of two politicians and someone from every city are added to spy on, where the two scope parts are adjoined to the root node and the two lower components are substituted for the corresponding leaves. At this point, the second identification of scope parts takes place.
Compared to section 2, we slightly modify the semantic representation of quantifiers: the scope part contains only the quantifier with the holes for restriction and body. The scope constraint linking the quantifier to its argument propositions is part of the lower part of the quantifier. In particular, the variable for the proposition in the nuclear scope of the quantifier (s1 and s2 in (100)) is now part of the lower part. This is necessary, since we allow non-local multicomponent adjunction for the scope auxiliary trees. Consequently, the scope part and the predicateargument part of a quantifier are not necessarily added to the same elementary tree. But the tree the predicate argument part is added to is the tree that contributes the proposition that must be in the nuclear scope of the quantifier. E.g., in (99), the scope part of every is identified with the scope part of someone and finally added to spy. But the proposition that must be part of the nuclear scope of every comes from the from tree, the tree the predicate argument part of every is added to. (100) shows the multicomponent sets derived for two politicians and every city. (101) shows the elementary tree for from.
(100) The MC set for two politicians:
      scope part (S auxiliary tree):  l1: 2x(h1, h2)                    arg: –
      lower part (NP tree):           l11: pol.(x), h1 ≥ l11, h2 ≥ s1   arg: s1

      The MC set for every city:
      scope part (S auxiliary tree):  l3: ∀y(h3, h4)                    arg: –
      lower part (NP tree):           l31: city(y), h3 ≥ l31, h4 ≥ s2   arg: s2

(101) The elementary representation for from:
      l5: from(x1, x2), h5 ≥ l5, g1 ≥ h5, h5 ≥ s3                       arg: x1, x2, (22), g1, s3
Adding every city to from by substitution of the lower component at the NP leaf inside the PP leads to x2 → y and s2 → l5 . The result is (102).
(102) The MC set for from every city:
      scope part:  l3: ∀y(h3, h4)                                                              arg: –
      lower part:  l31: city(y), l5: from(x1, y), h3 ≥ l31, h4 ≥ l5, h5 ≥ l5, g1 ≥ h5, h5 ≥ s3  arg: x1, g1, s3

(103) The MC set for someone:
      scope part:  l2: ∃z(h6, h7)                                      arg: –
      lower part:  l21: person(z), h6 ≥ l21, h7 ≥ s4                   arg: s4
(103) is the MC set for someone. When adding from every city to (103), the two scope parts are put into one quantifier set. The assignments are x1 → z, g 1 → h6 , s3 → l21 . (s3 → l21 is the only possibility, and g 1 → h7 would lead to h7 ≥ h5 ≥ l21 and h6 ≥ l21 , i.e., to l21 being in the restrictive and the nuclear scope of someone.) One obtains (104).
(104) The MC set for someone from every city:
      scope part:  {l3: ∀y(h3, h4), l2: ∃z(h6, h7)}                    arg: –
      lower part:  l31: city(y), l5: from(z, y), l21: person(z),
                   h3 ≥ l31, h4 ≥ l5, h5 ≥ l5, h6 ≥ h5, h5 ≥ l21, h6 ≥ l21, h7 ≥ s4    arg: s4
When adding the two QPs, two politicians and someone from every city, to spy, the two scope parts are adjoined to the same node and thereby identified. Therefore a large quantifier set is built. The result is (105).

(105)
{l1 : 2x(h1 , h2 ), {l3 : ∀y(h3 , h4 ), l2 : ∃z(h6 , h7 )}} l11 : politicians(x), l31 : city(y), l5 : from(z, y), l21 : person(z), l8 : spy(x, z) h1 ≥ l11 , h2 ≥ l8 , h3 ≥ l31 , h4 ≥ l5 , h5 ≥ l5 , h6 ≥ h5 , h5 ≥ l21 , h6 ≥ l21 , h7 ≥ l8 , h8 ≥ l8 arg: –
The inverse linking reading ∀3 21 ∃2 with the quantifier 21 intervening is correctly excluded: this reading would mean l3 > l1 > l2. Let Q1 := {l3: ∀..., l2: ∃...} and Q2 := l1: 2.... Then the new scope order condition (c) on quantifier sets is not satisfied, because l3 > l1 while l2 > l1 does not hold.

Note that, since every city is embedded under someone, the derivation shown above is the only possible derivation for (99). In other words, the multicomponent set of someone and the multicomponent set of from every city can combine first, but the multicomponent set of two politicians and the multicomponent set of from every city cannot combine first to yield a multicomponent set, since neither of them is an argument or adjunct of
the other. Consequently, the scope parts of someone and every can combine first, and the scope parts of two and every cannot. This means in general that, in a nested quantifier configuration [Qu1 . . . [Qu2 . . . Qu3 ]], we must glue together Qu2 and Qu3 and there is no way to glue together Qu1 and Qu3 such that nothing can intervene. Furthermore, in a sentence like (88) with three non-nested quantifiers [Qu1 Qu2 Qu3 ], no group of two quantifiers can be combined first into a multicomponent set, since none of them is argumentally related to the other. Hence, our analysis correctly predicts no ban against the reading ∀3 21 ∃2 in this case. Finally, consider ‘verbal’ quantifiers such as adverbs (e.g., sometimes), modals (e.g., can) and attitude verbs (e.g., think, want). What is interesting about them is that they have fixed scope, determined by their adjunction site (Cinque, 1999, Joshi and Vijay-Shanker, 1999, Percus, 2000, Romero, 2002). For example, in (106), think necessarily scopes over can. This probably means that the scope of verbal quantifiers does not depend on the rules that govern scope of quantificational Noun Phrases, including the set formation rule in multicomponent sets proposed in this chapter. However, no matter what mechanism partially delimits underspecified Noun Phrase scope and what algorithm gives rise to the fixed scope order of verbal quantifiers, the scope ordering of Noun Phrases and verbal quantifiers can be interleaved at a later stage, allowing e.g. for everybody to scope over or under can in (106). If this double picture of quantification is roughly correct, we would in principle expect intervention of a verbal quantifier to be acceptable in inverse linking configurations. This prediction is borne out: (107) allows for the reading ∀ want ∃, as noted in Sauerland, 2000. (106) Mary thinks everybody can win. (107) John wants to meet someone from every city you do (want to meet someone from).
5. Conclusion
In this chapter we provided an LTAG account for certain restrictions on quantifier scope. The approach is part of a larger project on compositional semantics in LTAG. The constructions considered are inverse linking readings for nested quantifiers, i.e., sentences with one quantifying phrase Qu3 embedded in another quantifying phrase Qu2 where
254
JOSHI ET AL.
Qu3 takes scope over Qu2 . In this case no other quantifier Qu1 that is on the same level as Qu2 can scopally intervene between Qu3 and Qu2 . In order to explain the fact that some quantifiers seem to be more closely connected than others, we adopted another perspective on TAG derivation, namely a perspective of flexible composition. This allowed to combine first those quantifiers that are closer with respect to scope and that do not allow intervening quantifiers and then to combine larger sets of quantifiers. In our semantics we built corresponding smaller and larger sets of quantifiers that express the constraints on relative quantifier scope that can be observed in inverse linking readings. The flexible composition approach as used in this chapter does not increase the generative capacity of the TAG formalism, it is just a specific way of ordering the derivations in a TAG.
Acknowledgements We would like to thank Chung-Hye Han and Mark Steedman for fruitful discussions of the subject of this chapter. Furthermore, we are grateful to two anonymous reviewers and to the participants of IWCS-5 for many valuable and helpful comments.
References Alshawi, H. (ed.): 1992, The Core Language Engine. MIT Press. Barker, C.: 2001, ‘Integrity: A Syntactic Constraint of Quantificational Scoping’. In: M. K. and B. el L.A. (eds.): Proceedings of WCCFL 20. Somerville, MA, Cascadilla Press. Barker, C.: 2002, ‘Continuations and the Nature of Quantification’. Natural Language Semantics. Bos, J.: 1995, ‘Predicate Logic Unplugged’. In: P. Dekker and M. Stokhof (eds.): Proceedings of the 10th Amsterdam Colloquium. pp. 133–142. Buering, D.: 2001, ‘A Situation Semantics for Binding out of DP’. In: R. H. et al. (ed.): Proceedings from SALT XI. Ithaca, NY, CLC Publications. Candito, M.-H. and S. Kahane: 1998, ‘Can the TAG Derivation Tree represent a Semantic Graph? An Answer in the Light of Meaning-Text Theory’. In: Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks, IRCS Report 98–12. University of Pennsylvania, Philadelphia, pp. 25–28. Cinque, G.: 1999, Adverbs and functional heads : a cross-linguistic perspective. NY: Oxford University Press.
FLEXIBLE COMPOSITION IN LTAG
255
Copestake, A., D. Flickinger, I. A. Sag, and C. Pollard: 1999, ‘Minimal Recursion Semantics. An Introduction’. Manuscript, Stanford University. Copestake, A., A. Lascarides, and D. Flickinger: 2001, ‘An Algebra for Semantic Construction in Constraint-based Grammars’. In: Proceedings of ACL. Heim, I. and A. Kratzer: 1998, Semantics in Generative Grammar. Blackwell. Hobbs, J. and S. Shieber: 1987, ‘An Algorithm for Generating Quantifier Scopings’. Computational Linguistics 13, 47–63. Joshi, A. K. and Y. Schabes: 1997, ‘Tree-Adjoning Grammars’. In: G. Rozenberg and A. Salomaa (eds.): Handbook of Formal Languages. Berlin: Springer, pp. 69–123. Joshi, A. K. and K. Vijay-Shanker: 1999, ‘Compositional Semantics with Lexicalized Tree-Adjoining Grammar (LTAG): How Much Underspecification is Necessary?’. In: H. Bunt and E. Thijsse (eds.): Proceedings ot the Third International Workshop on Computational Semantics (IWCS-3). Tilburg, pp. 131–145. Revised version published in: H. Bunt, R. Muskens, and E. Thijsse (eds.) Computing Meaning, Vol.2. Dordrecht: Kluwer Academic Publishers 2001, pp. 147–163. Kallmeyer, L. and A. K. Joshi: 1999, ‘Factoring Predicate Argument and Scope Semantics: Underspecified Semantics with LTAG’. In: P. Dekker (ed.): 12th Amsterdam Colloquium. Proceedings. Amsterdam, pp. 169–174. Kallmeyer, L. and A. K. Joshi: 2003, ‘Factoring Predicate Argument and Scope Semantics: Underspecified Semantics with LTAG’. Research on Language and Computation 1(1–2), 3–58. Kratzer, A.: 1998, ‘Scope or Pseudoscope? Are There Wide-Scope Indefinites?’. In: S. Rothstein (ed.): Events and Grammar. Great Britain: Kluwer Academic Publishers, pp. 163–196. Larson, R.: 1987, ‘Quantifying into NP’. Ms. MIT. Link, G.: 1983, ‘The Logical Analysis of Plurals and Mass Terms’. In: R. B. et al. (ed.): Meaning, Use and Interpretation of Language. Berlin: De Gruyter, pp. 302–323. May, R.: 1985, Logical Form. Its Structure and Derivation. Cambridge, Mass.: MIT Press. Morill, G. V.: 1994, Type Logical Grammar. Categorial Logic of Signs. Dordrecht: Kluwer. Percus, O.: 2000, ‘Constraints on some other variables in syntax’. Natural Language Semantics. Reinhart, T.: 1997, ‘How Labor is Divided between QR and Choice Functions’. Linguistics and Philosophy pp. 335–397. Reyle, U.: 1993, ‘Dealing with Ambiguities by Underspecification: Construction, Representation and Deduction’. Journal of Semantics 10, 123–179. Romero, M.: 2002, ‘Quantification over situations variables in LTAG: some constraints’. In: Proceedings of LTAG+6. Venice, Italy.
256
JOSHI ET AL.
Sauerland, U.: 2000, ‘Syntactic Economy and Quantifier Raising’. Ms. University of T¨ ubingen. Steedman, M.: 1996, Surface Structure and Interpretation, No. 30 in Linguistic Inquiry. Cambridge, MA: MIT Press.
FABRICE NAUZE AND MICHIEL VAN LAMBALGEN
SERIOUS COMPUTING WITH TENSE
1. Introduction
In this chapter we present a novel approach to the formal semantics of the French tense system. More precisely, we give a synopsis of the computational theory of meaning developed in (van Lambalgen and Hamm, 2003b), (van Lambalgen and Hamm, 2003a) and the forthcoming book (van Lambalgen and Hamm, 2004), and apply it to two French tenses, Pass´e Simple and Imparfait. Much work has been done on French tenses within the framework of DRT or extensions thereof such as SDRT. The latter uses so-called rhetorical relations such as elaboration, to explain the peculiar ways in which events described by sentences in Pass´e Simple form can be ordered in time. It is claimed here that a much more insightful description can be obtained by taking a computational point of view, in which the meaning of an expression corresponds to an algorithm which computes its denotation in a given context. ‘Algorithm’ is taken very seriously here – it is the precise form of the algorithm (constraint logic programming) that is important. A cognitive justification for this particular choice is provided in (van Lambalgen and Hamm, 2004); here we can only hope to convince the reader by examples of the algorithm in action. 2. Data
In this section we provide some data pertinent to Pass´e Simple and the Imparfait. We begin with a discussion of the Pass´e Simple, and continue with examples of the interplay between Imparfait and Pass´e Simple. 2.1. Examples involving the Pass´e Simple We will start our discussion with a typical example of a narrative discourse with the PS where the events described are in temporal succession: 257 H. Bunt and R. Muskens, (eds.), Computing Meaning, Volume 3, 257–300. c 2007 Springer.
258
NAUZE AND VAN LAMBALGEN
(108) Pierre se leva, monta dans sa chambre, ferma la porte et alluma la radio. (4 × PS) What can be said about the role of the PS in this example? Obviously, the PS conveys the information that all events are located in the past. More interestingly, it is often claimed that these events are to be viewed as punctual in the sense that there are no other events which could partition them. The internal constitution of the events is not important; this means that the PS views events as perfective. The PS imposes a view of the events ‘from the outside’ and from a distance. This is then claimed to explain why multiple uses of the PS implies a succession of the events described. As the events are seen as punctual, irreducible and viewed from the outside, it is then natural to expect that two events in the PS are not simultaneous, and so that one is happening before the other. Then the obvious choice is to place first things first (unless explicitly stated otherwise). Hence in 108, the getting up of Pierre precedes his going up in his room, etc... This is why the PS is often considered to imply narrative succession. Let us try to describe the above in a more formal manner. The most evident effect of the PS is to place the eventuality in the past of the speech time (this is what is known as “pure” tense information). We have now two options to account for the succession effect. We may assume, as in early versions of DRT, that the PS introduces a new reference point placed after an old one (this would amount to a direct representation of the “succession effect” of the PS). Alternatively, we may posit that the PS represents the eventuality as perfective and located in the past, and derive the succession effect from this, whenever it is appropriate. We will choose the latter option, as it seems to be a better representation of the core meaning of the PS, succession being in our view only a (albeit quite frequent) side-effect. In fact, a good counter-example to the unconditional validity of the succession effect of the PS was given by Kamp and Rohrer, here slightly changed to (109) L’´et´e de cette ann´ee-l`a vit plusieurs changements dans la vie de nos h´eros. Francostcois ´epousa Ad`ele, Jean partit pour le Br´esil et Paul s’acheta une maison `a la campagne. (4×PS) The first sentence introduces an event which gets divided in the following sentence (this phenomenon is known as the rhetorical relation of elaboration; it can also be viewed as a change of granularity in the description of events). How this first event is divided cannot be
SERIOUS COMPUTING WITH TENSE
259
determined from those PS sentences alone. In a way the first sentence ’asks’ for an enumeration afterwards, and so the next verb phrases enumerate the list of changes in the life of the ‘heroes’, but in the absence of adverbs or ordering conjunctions (like puis) we cannot give the precise temporal relationship between those events. Hence we have here two phenomena: the first sentence gets divided by others (in a way this could be seen as contradicting the perfectivity of the PS), and furthermore the following PS sentences do not impose a natural ordering on the events described by them. One of the causes of this lack of ordering is that the VPs have different subjects: Maurice, Jean and Paul. We can reformulate example 109 by removing one of the subjects as in (110) L’´et´e de cette ann´ee-l`a vit plusieurs changements dans la vie de nos h´eros. Maurice ´epousa Ad`ele et partit pour le Br´esil, Paul s’acheta une maison `a la campagne. (4 × PS) In sentence 110 we have now a succession of two events Maurice marrying Ad`ele and then leaving to Brazil. However we still cannot derive any ordering of those two VPs with the third. We should also note that the inverse temporal order seems to be called for in the following example of Gosselin (Gosselin, 1996, p.117) (111) Pierre brisa le vase. Il le laissa tomber. (PS × 2) Even without the use of an explanative conjunction like car, it seems we can derive the explanation reading, and this for two reasons: first, the achievement of the first sentence is irreversible in the way that the object of the sentence is changed for good after the achievement (briser ), second, the anaphoric pronoun le in the second sentence refers to the the vase, not to the broken vase which is the result of the first sentence, hence we expect that the second sentence applies to the not-yet-broken vase. We can further notice that the first sentence presupposes an action on the part of the subject Pierre on the vase (directly or indirectly), which causes the breaking. Furthermore, the subjects of the two sentences agree, and the pronoun of the second sentence refers to the object of the first sentence; and obviously to drop something is a plausible cause of breaking this same thing. It then seems natural that the second sentence actually describes the action that leads to the breaking of the vase.1 It should also be noticed that 1
We have provided such an extensive discussion of example 111 because there appears to be a general agreement in the literature on the impossibility of the PS to give an inverse temporal reading.
260
NAUZE AND VAN LAMBALGEN
the fact that the two sentences in 111 are separated by a period is of major importance. If it is not the case, as in (112) Pierre brisa le vase et le laissa tomber. (2 × PS) there is no ambiguity about the ordering of the two events described by the sentences: the breaking happens before the falling. Furthermore, even when the sentences are separated by a period we may get the ordering expressed by 112. If we add a further sentence, as in (113) a. Pierre brisa le vase avec un marteau. Il le laissa tomber et s’en alla. b. Pierre brisa le vase avec un marteau. Il le laissa tomber. Il s’en alla sans le regarder. the narrative seems to force the events to be ordered corresponding to the sentences. Let us now change the examples 108, 109 and 111 somewhat, to determine when and why narration occurs or on the contrary breaks down. In example 108 we have a simple succession of events affecting one subject, in 109 we have several events affecting different subjects and occurring in a certain period of time but not explicitly ordered with respect to each other, and finally in 111 we have two events affecting one subject and one object in inverse temporal order. Now consider the following variations. (114) a. Pierre monta dans sa chambre et ferma la porte. (2 × PS) b. Pierre ferma la porte et monta dans sa chambre. (2 × PS) c. Pierre ferma la porte et Jean monta dans sa chambre. (2 × PS) d. # Pierre monta dans sa chambre, ferma la porte, alluma la radio et se leva. (4 × PS) e. Cet ´et´e-l` a, Maurice ´epousa Ad`ele, Jean partit pour le Br´esil et Paul s’acheta une maison a` la campagne. (3 × PS) f. Cet ´et´e-l` a, Maurice ´epousa Ad`ele et partit pour le Br´esil et Paul s’acheta une maison a` la campagne. (3 × PS)
SERIOUS COMPUTING WITH TENSE
261
Examples 114a and 114b describe a succession of two events accomplished by a single subject: monter dans sa chambre (go upstairs in his room) and fermer la porte (close the door). In example 114a Pierre goes first in his room and then closes the door whereas in 114b he first closes the door and then goes in his room. As those eventualities are seen as perfective (this is the aspectual effect of the PS), are ascribed to one subject (this is a syntactic property of the sentence) and can hardly be done simultaneously (this is part of the semantics of those eventualities), the only possibility is that those two events are consecutive. However, the claim that the PS implies succession must be revised. All we get is that in a discourse in which the PS describes eventualities which have few semantic connections (note that going upstairs doesn’t presuppose closing the door and vice-versa) and in which there is a unique subject, the order of the events is isomorphic to the utterance order. What is heard (or read) first, happens first. Here are some more examples to show that the two factors identified, semantic connections and uniqueness of subject, indeed influence the reading of a piece of discourse. The importance of uniqueness of subject can be seen in examples 114c, 114e and 114f. The only difference between 114b and 114c is that in the latter the second VP has a different subject than the first. The correct reading of this sentence is probably that of a succession but the possibility of simultaneity is not excluded, as in 114b. This sentence can describe the simultaneous actions of two subjects but would be inadequate to described the inverse order. Examples 114e (a simplified version of 109) and 114f differ in that Maurice is now the subject of two events. Furthermore those two events are successive but still in no particular relation to the third event. In 114e all subjects differ and we have no special ordering between the events. Sentence 114d isn’t correct because Pierre going into his room and closing the door presupposes (semantically) that he remains standing.2 Hence to determine the temporal relation of a new PS VP with respect to a given sequence of PS VPs, all having the same subject, the meaning of the new VP must be compared with the possible lexical (semantic) information conveyed by the preceding VPs. The last example we will give involves aspectual information. The reader may have noticed that the VPS in the preceding examples are 2
This is not the case for the VP ‘switch the radio on’. Therefore the following sentence is correct: Pierre alluma la radio et se leva. (2 × PS)
262
NAUZE AND VAN LAMBALGEN
either accomplishments or achievements. The PS can also be used with states or activities, however. (115) Il fut pr´esident. (PS) In this example we obtain an inchoative reading. This is made clear by giving the proper English translation: He became president (and not He was president). The stative VP is coerced by the PS into its initiating event.3 2.2. Examples involving the Imparfait It is illustrative to begin this section by citing several comments on this tense from the literature. De Swart says in (de Swart and Corblin, 2002, p. 57), sentences in the Imparfait are traditionally taken to describe background information that does not move the story forward.
It follows Kamp’s view which is motivated by the study of the tenses in narrative context and where the fact that the Imp doesn’t move the narration forward is directly opposed to the fact that the PS does. Gosselin, in (Gosselin, 1996, p. 199), doesn’t put the emphasis on moving the story line forward, but notices that the Imp refers to a moment in the past during which the process is going on, without precision about the situation of the beginning and the end of the process.4
Sten in (Sten, 1952) focusses on its use as “present in the past”: L’imparfait sert a` indiquer une action qui serait du pr´esent pour un observateur du pass´e,...”, (the Imp serves to indicate an action which would be present for an observator in the past). 3
Notice that the combination PS + stative VP does not logically imply an inchoative reading. (116) Il fut pr´esident de 1981 `a 1995. (PS) Here, we do not obtain an inchoative reading but just a perfective eventuality. 4 p. 199, (Gosselin, 1996): “L’imparfait renvoie donc typiquement a` un moment du pass´e pendant lequel le proc`es se d´eroule, sans pr´eciser la situation temporelle du d´ebut et de la fin du proc`es. Ce temps apparaˆıt non autonome (anaphorique) et situe le proc`es comme simultan´e par rapport a` d’autres proc`es du contexte, et comme se d´eroulant en un mˆeme lieu.
SERIOUS COMPUTING WITH TENSE
263
Finally, all authors stress the anaphoric nature of this tense, in the sense that it cannot be used by itself but only with reference to another sentence or with temporal adverbials.5 We may summarize these positions by saying that the Imparfait is an anaphoric, imperfective past tense. We will now introduce some examples of the use of the Imparfait, however the reader should notice that those examples partially represent the possibilities of the Imparfait. In particular we won’t give examples of the so-called narrative Imparfait or habitual and iterative readings. The anaphoric and imperfective nature of the Imp can be seen in the following example: (117) a. # Il faisait chaud. (Imp) b. Il faisait chaud. Jean oˆta sa veste. (Imp, PS) That sentence 117a is not felicitous is explained in Kamp’s theory by the fact that there is no previous “reference point” to anchor the sentence and that an Imp sentence such as 117a does not introduce its own reference point.6 In sentence 117b, the Imp sentence is “attached” to the reference point introduced by the PS sentence and the imperfective aspect of the Imp is due to the fact that the PS event happens while the Imp eventuality holds. It is however not a general rule for the Imp, that the Imp eventuality contains its reference point, as is shown by the following examples. (119) Jean appuya sur l’interrupteur. La lumi`ere l’´eblouissait. (PS, Imp) (120) Jean attrapa une contravention. Il roulait trop vite. (PS, Imp) The Imp sentence in 119 is viewed as a consequence of the PS sentence; clearly the light cannot blind Jean before he switched it on. De Swart, in (de Swart and Corblin, 2002, p. 59–61), maintains that 5
See for instance (Kamp, 1983(?), p. 35). Notice that the reference point does not have to be introduced by a PS sentence; it can also be a temporal adverbial, or even the subject of the sentence, as in the following examples 6
(118) a. Mercredi, il pleuvait. Jeudi, il faisait soleil. (Imp, Imp) b. Le grand-p`ere de Marie ´etait noir. (Imp)
264
NAUZE AND VAN LAMBALGEN
the reference point for the Imp sentence is not the PS sentence, but rather its consequent state (the light is switched on). Then we would have simultaneity between the Imp sentence and its reference point. On de Swart’s approach, the decision whether the Imp overlaps with the PS reference point or with its consequent state is made on the basis of rhetorical relations between the sentences; this theory is what is known as SDRT. De Swart calls this rhetorical relation temporal implication and she provides an analogous explanation for 120, introducing the rhetorical relation of temporal presupposition. In example 120 the Imp sentence is understood as being the cause of getting a ticket hence even though the Imp sentence is placed after the PS sentence the activity driving too fast takes place before getting a ticket. We believe that in this area explanations of much greater generality are possible than those provided by SDRT. Below we present a fully computational semantics for French tenses, built upon the computational mechanism of (constraint) logic programming. In this setup, rhetorical relations will turn out to be derived constructs, abstracting certain features of the computations.
3. Computational semantics for tense
Tense is concerned with the grammatical localization of events in time. However, this does not mean that the ontology required for tense is restricted to on the one hand a time line, such as the reals, and on the other hand a set of discrete, structureless entities called events. The ontology must be very much richer, as has been put forcefully by Mark Steedman (Steedman, 1997, p. 932) The semantics of tense and aspect is profoundly shaped by concerns with goals, actions and consequences . . . temporality in the narrow sense of the term is merely one facet of this system among many.
The system of English future tenses provides a good illustration for this point of view: present tense in its future use, the futurate progressive, and the auxiliaries will and be going to all express different relations between goals, plans, and the actions comprising those plans. The case of the English future tense will be explained in detail in the forthcoming book (van Lambalgen and Hamm, 2004); here we concentrate on the French past tenses. We claim that also in this case a rich ontology involving goals, plans and actions is necessary to capture the data. As our basic representational format we choose a formalism, the event
SERIOUS COMPUTING WITH TENSE
265
calculus, that was developed in robotics for the purpose of autonomous planning in robots. The book (van Lambalgen and Hamm, 2004) contains a lengthy defense of the idea that human understanding of time is conditioned by human planning procedures. Indirectly one may then also expect that the grammatical correlate of that understanding is conditioned by planning. The next step is then, to try to model the semantics of tense by means of a planning formalism. 3.1. A calculus of events By definition, planning means setting a goal and computing a sequence of actions which provably suffice to attain that goal. It involves reasoning about events, both actions of the agent and events in the environment, and about properties of the agent and the environment, which may undergo change as a consequence of those events. A simple example is that of an agent who wants a light L to burn from time t0 until t1 , and knows that there is a switch S serving L. The obvious plan is then to turn S at t0 , and to leave S alone until t1 . Even this simple plan hides a number of problems. We required that a plan should be provably correct. On a classical reading, that would mean that the plan is sure to achieve the goal in every model of the premisses, here the description of the situation and its causal relationships. Among these models, there will be some containing non-intended events, such as light turning off spontaneously (i.e. without an accompanying turn of the switch), or a gremlin turning off the switch between t0 and t1 . In fact it is impossible to enumerate all the things that may go wrong. No planning formalism is therefore likely to give ‘provably correct plans’ in the sense of classical logic. The most one can hope for is a plan that works to the best of one’s knowledge. The event calculus7 is a formalism for planning that addresses some of these concerns. It axiomatizes the idea that all change must be due to a cause–spontaneous changes do not occur. It thus embodies one sense of the common sense principle of inertia: a property persists unless it is caused to change by an event. That is, if an action a does not affect a property F , then if F is true before doing a, it will be true after. Of course, the crucial issue in this intuitive idea concerns the notion of ‘affect’. This refers to a kind of causal web which specifies the influences of actions on properties.The other difficulty with planning identified 7
The version used in this chapter is modelled on that developed by Murray Shanahan, which in turn was based on work by Kowalski and Sergot. A good reference is the book (Shanahan, 1997).
266
NAUZE AND VAN LAMBALGEN
above, the possibility of unexpected events, can be treated either in the axiomatic system, or equivalently in the logic underlying the system. The solution of this difficulty is essentially to restrict the class of models of the axiomatic system to those models which are in a sense minimal: only those events happen which are required to happen by the axioms, and similarly only those causal influences obtain which are forced by the axioms. Our treatment of the event calculus will correspond to the division just outlined: we first discuss its formalization of causality, and then move on to introduce the class of its minimal models. Formally, the event calculus requires a many-sorted first order logic with sorts for the following: 1. 2. 3. 4.
individual objects, such as humans, chairs, tables, . . . real numbers, to represent time and variable quantities time-dependent properties, such as states and activities variable quantities, such as position, degree of sadness, state of completion of a painting, . . . 5. event types, whose instantiations (i.e. tokens) mark the beginning and end of time-dependent properties. A few comments on this list are in order. The predicates of the event calculus will be seen to have an explicit parameter for time. We have chosen to represent time by the real numbers, actually by the structure (R, <; +, ×, 0, 1). It is explained at length in (van Lambalgen and Hamm, 2004) why this choice is justified as well as harmless, so we do not dwell on this topic here. It may furthermore strike the reader that properties are reckoned to belong to the ontology of the event calculus, on a par with individual objects and time points. Usually properties correspond to predicates, hence objects of a different type than that of entities. But in the event calculus a property is an object which may itself fill an argument slot in a predicate. There are several reasons for this, one having to do with the notion of ‘cause’. Consider one of the most complex classes of verbs, the accomplishments, of which examples are ‘draw a circle’, ‘write a letter’, ‘cross the street’. Eventualities representing such verbs have an elaborate internal structure. On the one hand there is an activity taking place (draw, write, cross), on the other hand an ‘object’ is being ‘constructed’: the circle, the letter, or the path across the street. Dowty (in (Dowty, 1979)) analyzes the progressivized accomplishment (121) Mary is drawing a circle as
SERIOUS COMPUTING WITH TENSE
267
(122) cause Mary draws something, a circle comes into existence That is, the sentence is decomposed into an activity (‘Mary draws something’) and a partial, changing, object (‘circle’); it is furthermore asserted that the activity is the cause of the change. For Dowty, causality is a relation between propositions, and accordingly he tries, not entirely successfully, to give an account of causality in terms of possible world semantics. By contrast, the event calculus gives an analysis of causality which has its roots in physics, as a relation between events. The event calculus actually formalizes two notions of cause, and their relation. The first notion of cause is concerned with instantaneous change, as when two balls collide. We are thus concerned with an event (type) collision, which for simplicity is assumed to occur instantaneously. An event type together with a time at which it occurs (or happens) will be referred to as an event token. We furthermore need time-dependent properties such as, for example, ‘ball b has momentum m’. In the case at hand, the property ‘ball 1 has momentum m and ball 2 has momentum 0’ will be true until the time of collision t, after which ‘ball 2 has momentum m and ball 1 has momentum 0’ is true. Such time-dependent properties are called fluents8 . A fluent is a function which may contain variables for individuals and reals, and which is interpreted in a model as a set of time points. We now want to be able to say that fluents are initiated and terminated by events, and that a fluent was true at the beginning of time. If f is a variable over fluents, e a variable over events, and t a variable over time points, we may write the required predicates as 1. 2. 3. 4.
Initially(f) Happens(e,t) Initiates(e,f,t) Terminates(e,f,t)
If events happen instantaneously, these predicates are to be interpreted in such a way, that if Happens(e, t) ∧ Initiates(e, f, t), then f will begin to hold after (but not at) t; if Happens(e, t) ∧ Terminates(e, f, t), then f will still hold at t. The second notion of causality is more like change due to a force which exerts its influence continuously. The paradigmatic example here is the acceleration of an object due to the gravitational field, but other examples abound: pushing a cart, filling a bucket, drinking a glass of 8
The name is appropriated from Newton’s treatise on the calculus, where all variables are assumed to depend implicitly on time.
268
NAUZE AND VAN LAMBALGEN
wine, writing a letter, . . . . As the reader can see from this list, continuous change is important in providing a semantics for accomplishments. Continuous change requires its own special predicates, namely 5. Trajectory(f1 ,t,f2 ,d) 6. Releases(e,f,t) In the Trajectory predicate, one should think of f 1 as a force, and of f 2 as a variable quantity which my change under the influence of the force. The predicate then expresses that if f 1 holds from t until t + d, then at t + d, f 2 holds. In applications, f 2 will have a real number as argument, and will be of the form f (g(t + d)) for some continuous function g. The predicate Releases is necessary to reconcile the two notions of cause with each other. Cause as instantaneous change leads to one form of inertia: after the occurrence of the event marking the change, properties will not change value until the occurrence of the next event. This however conflicts with the intended notion of continuous change, where variable quantities may change their values without concomitant occurrences of events. The solution is to exempt, by means of the special predicate Releases, those properties which we want to vary continuously, from the inertia of the first form of causation. The axioms will be seen to have the form: if there are no ‘f -relevant’ events between t1 and t2 , then the truth value of f at t1 is the same as that at t2 . We introduce two special predicates to formalize the notion of ‘f -relevant’ events. The first predicate expresses that there is a terminating or releasing event between t1 and t2 ; the second predicate expresses that there is an initiating or releasing event between t1 and t2 . 7. Clipped(t1 ,f,t2 ) Lastly, we need the ‘truth predicate’ 8. HoldsAt(f,t). The intuitive meaning of HoldsAt(f, t) is that the fluent f is true at time t. In the usual setup of the event calculus, such defining axioms for this truth predicate are lacking, and this can easily lead to contradictions. However, due to lack of space we cannot furnish the relevant truth theory and we ask the reader to simply assume that HoldsAt(f, t) can indeed be forced to have the meaning indexmeaning ‘the fluent f is true at time t’. For language processing we need a lexicon, which can be thought of as associating a theory to each lexical item. We will show below that
SERIOUS COMPUTING WITH TENSE
269
for the purposes of discussing tense it is very useful to formulate these theories in the language of the event calculus. In fact we claim that there is a profitable analogy between linguistics and robotics here. In order to derive predictions, e.g. on when a robot will reach its destination, one needs a theory describing the robot’s situation, conveniently divided in axioms, holding for every situation, and a scenario, laying down properties of a particular situation; this latter theory corresponds to the lexicon. We first study the axioms. 3.2. The axiom system EC The axioms of the event calculus given below are modified from (Shanahan, 1997), the difference being due to the fact that we prefer a logic programming approach, whereas Shanahan uses a technique for obtaining minimal models called circumscription. In the following, all variables are assumed to be universally quantified. AXIOM 1. Initially(f ) ∧ ¬Clipped(0, f, t) → HoldsAt(f, t) AXIOM 2. Happens(e, t) ∧ Initiates(e, f, t) ∧ t 0, and that no ‘f -relevant’ event occurs between t1 and t2 . Here, ‘f -relevant’ is rendered formally by the predicate Clipped, whose meaning is given by axiom 4. Axiom 2 then says that f also holds at time t2 . The role of the Releases predicate is important here, because it provides the bridge between the two notions of causality. Axiom 2 really embodies the principle of inertia as it relates to the first notion of causality, instantaneous change: in the absence of relevant events, no changes occur. However, continuous change occurs due to a force, not an event, and hence absence of relevant events does not always entail absence of change. The Releases predicate then provides the required loophole.
270
NAUZE AND VAN LAMBALGEN
(
]
(
]
i
(
I
*
]
-
R
f Figure 39.
Structure of fluents
The first axiom says that if a fluent holds at time 0 and no event has terminated or released it before time t > 0, it still holds at t. Axiom 3 is best explained by means of the example of filling a bucket with water. So let f 1 be instantiated by filling, and f 2 by height(x). If filling has been going on uninterruptedly from t until t , then for a certain x, height(x) will be true at t , the particular x being determined by the law of the process as exemplified by the Trajectory–predicate. 3.3. A model for EC In the absence of further statements constraining the interpretation of the primitive predicates, a simple model for EC is obtained by taking the extensions of Happens and Initially to be empty. If we then set ¬HoldsAt(f, t) for all f, t we obtain a model. However, this model is not very informative, and to facilitate the reader’s comprehension of the axioms, we will sketch an intuitively appealing class of models of EC. It can in fact be proven that all models of EC of interest in this context are of the form to be presented. We have to specify the sorts of fluents and event types in such a way that EC holds automatically. We interpret fluents as sets of intervals of the form [0, b] or (a, b], where a is the instant at which an initiating event occurs, and b is the instant where ‘the next’ terminating event occurs9 . Talk about ‘the next’ seems justified due to the inertia inherent in fluents. A typical fluent therefore looks as in figure 39. For the purpose of constructing models, we think of event (types) as derivative of fluents, in the sense that each event either initiates or terminates a fluent, and that fluents are initiated or terminated by events only. The instants are taken to be nonnegative reals. Each fluent f is a finite set of disjoint halfopen intervals (a, b], with the possible addition of an interval [0, c]. Event types e are of the form e = e+ f 9
Note that a fluent does not hold at the instant it is initiated, but does hold at the moment it is terminated. We need intervals [0, b] to account for Initially statements. We allow b to be ∞.
SERIOUS COMPUTING WITH TENSE
271
or e = e− f where e+ f := {(f, r) | ∃s((r, s] ∈ f )} and e− f := {(f, s) | ∃r((r, s] ∈ f )}. This then yields the following interpretations for the distinguished predicates. 1. 2. 3. 4. 5. 6. 7.
HoldsAt := {(f, t) | ∃I ∈ f (t ∈ I)} Initially := {f | ∃s > 0[0, s] ∈ f )} Happens := {(e, t) | ∃f ((e = e+ f ∨ e = e− f ) ∧ (f, t) ∈ e)} Initiates := {(e, f, t) | e = e+ f ∧ (f, t) ∈ e} Terminates := {(e, f, t) | e = e− f ∧ (f, t) ∈ e} Releases := ∅ Clipped := {(t1 , f, t2 ) | ∃t(t1 < t < t2 ∧ (f, t) ∈ e− f )}
PROPOSITION 1. EC is true under the above interpretation. The model captures an important intuition, namely that fluents can be represented by intervals, that is, very simple sets, as a consequence of ‘the common sense law of inertia’. The reader is advised to have this interpretation in mind when interpreting statements in the formal language. 3.4. Scenarios The above axioms provide a general theory of causality. We also need ‘micro-theories’ which state the specific causal relationships holding in a given situation, and which list the events that have occurred in that situation. We claim that an important part of the lexicon can also be represented in this causal format. For example, in the case of ‘draw a circle’ the situation contains (at least) an activity (‘draw’) and a changing partial object (the circle in its various stages of completion); the micro-theory should specify how the activity ‘draw’ is causally related to the amount of circle constructed. This is done by means of two definitions, of state and scenario. DEFINITION 10. A state S(t) at time t is a conjunction of 1. literals of the form (¬)HoldsAt(f, t), for t fixed and possibly different f . 2. (in)equalities between fluent terms, between event terms and between constants for individuals. 3. formulas in the language of the structure (R, <; +, ×, 0, 1)
272
NAUZE AND VAN LAMBALGEN
DEFINITION 11. A scenario is a conjunction of statements of the form 1. 2. 3. 4. 5. 6.
Initially(f ), ∀t(S(t) → Initiates(e, f, t)), ∀t(S(t) → Terminates(e, f, t)), ∀t(S(t) → Releases(e, f, t)), ∀t(S(t) → Happens(e, t)), S(f 1 , f 2 , t, d) → Trajectory(f 1 , t, f 2 , d).
where S(t) (more generally S(f 1 , f 2 , t, d)) is a state in the sense of definition 10. These formulas may contain additional constants for objects, reals or time points, and can be prefixed by universal quantifiers over reals (including time points) and objects. Formulas of type 6 are said to define a dynamics. One final remark before we consider an example. The definition of ‘state’ refers only to fluents being true or false; it is not allowed to include conjuncts using Happens. This may seem strange for a formalism concerned with causality; after all, the archetypical form of causality is one where Happens(e1 , t) implies Happens(e2 , s) for some s slightly later than t. There is a formal reason for our choice: allowing Happens in states increases the danger of nonterminating computations. In practice the restriction can be liberalized: one must only take care that no loops are introduced. In one example below we shall make use of the more liberal form; the reader can easily check that this causes no harm. 3.5. Minimal models We now turn to an important feature in which the proposed computational semantics differs from, say, DRT. DRSs are always taken to be substructures of the ‘real’ world (or a world of fiction, as the case may be). By contrast, the models that we consider are ‘closed worlds’ in the sense that events which are not forced to occur by the scenario, are assumed not to occur. Later additions to the scenario may overturn this assumption, so that incremental processing of a discourse does not lead to a ‘nice’ chain of DRSs ordered by the substructure relation. Instead, we obtain a nonmonotonic progression. In (van Lambalgen and Hamm, 2003a) it is shown how the peculiar meaning of the progressive form in English can be explained in this way, and, also, that the ubiquitous phenomenon of coercion is a natural consequence of this computational
SERIOUS COMPUTING WITH TENSE
273
model. This form of nonmonotonicity is easiest explained in terms of planning; we will latter return to its linguistic relevance. Consider a problem with planning referred to above: it is impossible to construct a plan which is provably correct in the sense that it works whatever is true in the real world. We can only hope for plans which are correct with respect to the eventualities that are envisaged now, barring unforeseen circumstances. Formally, this means that we must restrict the class of models of event calculus and scenario to models which are minimal in the sense that the occurrences of events and their causal influences are restricted to what is required by the scenario and EC. Thus if a scenario contains only the following statements involving Happens − Happens(switch-on, 5) − Happens(switch-off, 10) a non-minimal model of this scenario would be one in which the following events are interpolated between times 5 and 10: − Happens(switch-off, 8) − Happens(switch-on, 9) Similarly, if the scenario contains only the following statements involving Initially, Initiates and Terminates − ¬HoldsAt(light-on, t) → Initiates(switch-on, light-on, t) − Terminates(switch-off, light-off, t) a non-minimal model of this scenario could contain the additional statements concerning causal influences − Initially(light-on) − HoldsAt(light-on, t) → Terminates(switch-on, light-on, t) While this intuition about minimality is fairly straightforward, its implementation is less so. Somewhat surprisingly, there exist essentially different ways of defining ‘minimal model’. We favour the definition implicit in logic programming, because of its computational nature. It will be shown that in the cases of interest to us, there exists in fact a unique minimal model, which defines the denotations of all expressions occurring in the scenario. Moreover, there exists a computable procedure for obtaining the minimal model, so that the denotations are in fact computable.
274
NAUZE AND VAN LAMBALGEN
3.6. Logic programming with constraints and negation as failure This section introduces the computational machinery that we will use. Examples of actual computations will be given below. We have not worried about details of implementation, such as providing selection rules, but apart from this the computational mechanism is fully explicit. Our favoured formalism, constraint logic programming, is in general concerned with the interplay of two languages. In our case these will be the languages L = {0, 1, +, ·, <}, and the language K consisting of the primitive predicates of the event calculus. The latter will also be called programmed predicate symbols, because we will write logic programs defining the primitive predicates. Not considering negation for the moment10 , clauses in a constraint logic program based on L and K are generally of the following form B 1 , . . . , B n , c → A, where the B 1 , . . . , B n , A are primitive predicates and c is a constraint. Constraints may occur only in the bodies of clauses. Likewise, a query has the logical form B 1 ∧ . . . ∧ B m ∧ c → ⊥. We shall use the notation ?c, B 1 , . . . , B m for queries, always with the convention that c denotes the constraint, and that the remaining formulas come from K. The words ‘query’ and ‘goal’ will be used interchangeably. The aim of a constraint computation is to express a programmed predicate symbol entirely in terms of constraints, or at least to find an assignment to the variables in the programmed predicate which satisfies a given constraint. Thus, unlike the case of ordinary logic programming, the last node of a successful branch in a derivation tree contains a constraint instead of the empty clause. To make this precise, we have to spell out the notion of derivation step and derivation tree. One difference between standard logic programming and constraint logic programming is its treatment of substitution. In the former, the unification algorithm, applied to two atoms, determines which terms have to be substituted for the variables occurring in the atoms, in order 10
I.e. restricting attention to so called definite constraint logic programs.
SERIOUS COMPUTING WITH TENSE
275
for the atoms to become identical. In constraint logic programming the treatment is different: when the unification algorithm has determined that a term t should be substituted for a given variable x, one adds a constraint t = x but no substitution is effected. If A, B are atoms, we let {A = B} denote the set of equations between terms which unify A and B if A and B are unifiable; otherwise {A = B} is set to ⊥. The constraints are then simply accumulated in the course of the derivation. There are some clear notational advantages to this approach, which avoids nested, possibly unreadable terms11 . The main advantage is conceptual, however, since it allows a more symmetric treatment of positive and negative information. The main derivation rule is resolution, which can be formalized as follows. Suppose ?c, B 1 , . . . , B i , . . . , B m is a goal, and D1 , . . . , Dk , c → A a program clause. A new goal ?c , B 1 , . . . , D1 , . . . , Dk , . . . , B m can be derived from these two clauses if the constraint c , defined as c = (c ∧ {B i = A} ∧ c ) is satisfiable in A. That is, if A can be unified with B i , one can replace B i by D1 , . . . , Dk if in addition the given constraint c is narrowed down to contain also the unifying substitution and the constraint c . Given this inference rule, the concepts of derivation tree and branch in a derivation tree have straightforward definitions12 . DEFINITION 12. A branch in a derivation tree is successful if it is finite and ends in a query of the form ?c, where c is a satisfiable constraint; note that the query is not allowed to contain an atom. A branch in a derivation tree is finitely failed if it ends in a query ?c, B 1 . . . B m such that either c is not satisfiable, or no program clause is applicable to the B i . Otherwise the branch is called infinite. Intuitively, this definition applied to the situation of interest means the following. Suppose we start from a query ?HoldsAt(f, t) and find a successful branch ending in ?c. This should mean that for all t, if A consequence of this approach may seem that the constraint language L has to be extended, since constraints in the wider sense may now also involve objects, (parametrized) events and (parametrized) fluents. However, all such syntactic objects can be coded into L. 12 As in the case of standard logic programming, one also needs the concept of a selection rule, which determines which atom should be chosen at a particular stage in a derivation. The interested reader may consult (Stuckey, 1995); we need not dwell on this topic here. 11
276
NAUZE AND VAN LAMBALGEN
c(t) is true, then so is HoldsAt(f, t). Likewise, if a branch finitely fails and ends in ?c, B 1 . . . B m , we should have, for all t satisfying c(t), ¬HoldsAt(f, t). For our purposes, definite constraint logic programs are not yet expressive enough, due to the occurrence of ¬ in the bodies of the axioms of the event calculus13 . DEFINITION 13. A complex body is a conjunction of literals, i.e. atoms and negated atoms, and constraints. A normal program is a formula ψ → A of CLP (T ) such that ψ is a complex body and A is an atom. The form of negation most congenial to constraint logic programming is constructive negation ((Stuckey, 1995)). In the customary negation as failure paradigm, negative queries differ from positive queries: the latter yield computed answer substitutions, the former only the answers ‘true’ or ‘false’. Constructive negation tries to make the situation more symmetrical by also providing computed answer substitutions for negative queries. Applied to constraint logic programming, this means that both positive and negative queries can start successful computations ending in constraints. The full operational definition of constructive negation is somewhat involved (see (Stuckey, 1995)), but we will provide a simplified version (disregarding the possibility of infinite derivations) modeled on negation as failure, which suffices for our purposes. The operational meaning of constructive negation may be given as follows. Suppose we are given the goal ?L1 , . . . Li , . . . , Ln , c.Here, the ψ i are literals and c is a set of constraints Consider a Li of the form Li = ¬B, which has been selected for processing. Start a subderivation with goal ?B. Assume this derivation tree is finite. Collect the constraints c1 , . . . , cl occurring on the successful branches of the tree (the finitely failing branches can be disregarded). The children of the goal ?L1 , . . . , Li , . . . , Ln,c are now of the form ?c ∧ ¬ci , L1 , . . . , Li−1 , . . . , Li+1 , . . . Ln for all i such that c ∧ ¬ci is satisfiable. There may be no such i, in which case the goal has no children. The subderivation may itself feature negative goals, so that an abstract definition of a derivation tree allowing constructive negation involves a recursion. We will not provide 13
The following definition is deliberately simplified from the one given in (Stuckey, 1995).
SERIOUS COMPUTING WITH TENSE
277
a definition, but the reader may check that the derivations used in the linguistic applications below all conform to the above characterization. A global concept of success for a derivation tree is given by the following definition. DEFINITION 14. A query ?c, G is totally successful if its derivation tree includes successful branches ending in constraints c ∧ c1 , . . . , c ∧ cn such that A |= ∀x(c → c1 ∨ . . . ∨ cn . Intuitively, a query ?c, G is totally successful if all instances of the query also succeed; this is much stronger than saying that the query is satisfiable, as one would in standard logic programming. 3.7. Semantics As in the case of negation as failure, the fundamental technical tool in describing the semantics of the above procedure is the completion of a program: DEFINITION 15. Let P be a normal program, consisting of clauses B 1 ∧ c1 → p1 (t1 ), . . . , B n ∧ cn → pn (tn ), where the pi are atoms and the B i are complex bodies. The completion of P, denoted by comp(P), is computed by the following recipe: 1. choose a predicate p that occurs in the head of a clause of P 2. choose a sequence of new variables x of length the arity of p 3. replace in the i-th clause of P all occurrences of a term in ti by a corresponding variable in x and add the conjunct x = ti to the body; we thus obtain B i ∧ ci ∧ x = ti → pi (x) 4. for each i, let z i be the set of free variables in B i ∧ ci ∧ x = ti not in x 5. given p, let n1 , . . . , nk enumerate the clauses in which p occurs as head 6. define Def(p) to be the formula ∀x(p(x) ↔ ∃z n 1 (B n 1 ∧cn 1 ∧x = tn 1 )∨. . .∨∃z n k (B n k ∧cn k ∧x = tn k ). "
7. comp(P) is then obtained as the formula p Def (p), where the conjunction ranges over predicates p occurring in the head of a clause of P. The soundness of the operational definition of constructive negation is expressed by
278
NAUZE AND VAN LAMBALGEN
THEOREM 1. Let P be a normal program on the constraint structure A, and let T be the axiomatization of A. 1. If the query ?c, G is totally successful, then T +P |= ∀x(c → G. 2. If the query ?c, G is finitely failed, then T +P |= ¬∃x(c ∧ G). There is also a corresponding completeness result (for which see (van Lambalgen and Hamm, 2004)), but this need not detain us here. It is however of some importance to note the following consequence of the computational procedure just outlined. The formulation of the result is somewhat sloppy (see (van Lambalgen and Hamm, 2004) for a rigorous version), but it suffices to capture the main idea. THEOREM 2. Let P be a normal program on the constraint structure A, and let T be the axiomatization of A. Then T +comp(P) has a unique model which is of the form given in section 3.3.
4. Where do the fluents come from?
We now have to connect the preceding material with natural language. The basic idea of the approach is that meaning is computational, and that computations are performed in the language of the event calculus. Here is an example of this kind of computation involving the English progressive, taken from (Hamm and van Lambalgen, 2000) and (van Lambalgen and Hamm, 2004). Consider the sentences (123) a. John was crossing the street. b. John crossed the street. The second sentence implies that John arrived at the other side, but the first one does not. Nevertheless, the VP ‘cross the street’ is telic in that it comes with a canonical culminating event. The so-called ‘imperfective paradox’ generated by sentences 123a and 123b is: on the one hand the canonical culminating event essentially belongs to the meaning of ‘cross the street’, on the other hand the actual occurrence of that event can be denied with impunity. The solution given in (van Lambalgen and Hamm, 2004) is that an accomplishment such as ‘cross the street’ actually corresponds semantically to a plan for reaching the other side of the street. This plan will be brought to successful completion in a minimal model of the plan, but not necessarily in extensions of the
SERIOUS COMPUTING WITH TENSE
279
minimal model. The plan corresponds to a scenario in the language of the event calculus, and the minimal model of the plan can be computed using the procedure outlined in the previous section. What we have not yet explained is how to associate a plan to a lexical expression. The first step in setting up a corresponding plan consists of the transformation of lexical material into fluents and events. Here is a brief sketch of how this can be done – a more detailed exposition can be found in (Hamm and van Lambalgen, 2000) and (van Lambalgen and Hamm, 2004). We assume that verbs correspond to predicates which have a time parameter. Thus, in the intransitive case, ‘walk’ corresponds to the predicate walk (x, t). Given this predicate, one can form two kinds of abstraction over it, corresponding to perfect and imperfect aspect, respectively: 1. ∃twalk (x, t) 2. {t | walk (x, t)} Since time is interpreted on the reals, which contain the integers, such expressions may be assigned G¨odel numbers as ‘codes’, with the consequence that these codes may themselves figure as arguments in predicates14 . Codes of expressions of the first kind will play the role of event types in the event calculus, whereas those of the second kind will function as fluents. Indeed, for fixed x, {t | walk (x, t)} can be viewed as a function from times to truth values, i.e. as a time-dependent property, and this is precisely how fluents are characterized. We shall employ the following notation in using events and fluents derived from natural language expressions: event types will be denoted by e(x), and fluents by f [x], where in both cases e and f are replaced by suitable natural language expressions. For example, we will meet the expressions leave-to(Jean,Brasil) (an event type), and is-president(x) (a fluent). The second step in setting up a plan corresponding to a lexical expression consists in writing down a scenario in the language of the event calculus, which captures the (causal aspects of) the meaning of that expression. This second step is best explained by means of examples, of which many will be given below. We have now completed our introduction to the computational machinery. In the next section we exploit a feature of this machinery to give an account of reference time, together with utterance time and event time the main pillar of the semantics of tense. 14
In AI, this process is known as reification.
280
NAUZE AND VAN LAMBALGEN 5. Reference time as integrity constraint
Reichenbach’s great insight into tense was his identification of the importance of the reference time, on a par with event time and utterance time. The reference time is a marker for the time, context or situation that we are talking about. R must be known by the participants in order for the temporal discourse to make sense. Reichenbach noticed that the reference time can be different from the event time, as for instance in the present perfect (124) I have caught a flu. Here the infection-event lies in the past, but the reference time is identical to the utterance time: the sentence is meant to have present relevance, e.g. as an explanation for my being bad–tempered. We now have to investigate how the reference time is to be formulated in our framework. This is not at all easy, as the following example will make clear. Suppose we try to model the English present perfect, in a very simple situation, where there is an event type e (say a viral infection) which initiates a consequent state f e.g. having a flu); there are no further events or fluents. The scenario therefore contains only the statement (125) Initiates(e, f, t). Suppose the utterance time is denoted by a constant now, to be interpreted on the reals; this constant belongs to the constraint language. The present relevance of the present perfect then suggests that the contribution of this tense to the scenario is the addition of formula 126b, so that the scenario becomes (126) a. Initiates(e, f, t) b. HoldsAt(f, now). We would like to derive from 126, using the axioms of the event calculus, that for some t < now, Happens(e, t). Naively one might reason as follows: completing the axioms of the event calculus plus the scenario gives us Happens(e, t) ∧ Initiates(e, f, t) ∧ t < t ∧ ¬Clipped(t, f, t ) ↔ HoldsAt(f, t ) ,
so that the desired result follows after applying the given 126b. Although this argument embodies an important intuition, it cannot be pushed through as stated, since the completion at issue is actually [[Happens(e,t) ∧ Initiates(e,f,t) ∧ t
in the first case the integrity constraint imposes an obligation on the ordinary sentences in the database to establish that the consequent should hold. This entails in general that the database has to be updated with a true statement about the world; which statement that is has to be found out by abduction.

To return to our example, there will be an action take umbrella, whose meaning is given by the database clause Initiates(take umbrella, carry umbrella, t). Suppose the database is updated with HoldsAt(rain, now), i.e. the antecedent of the integrity constraint 127. The integrity constraint then requires us to set up a derivation starting from the query

?HoldsAt(carry umbrella, now + ε).

Applying the event calculus we can reduce this query to

?Happens(take umbrella, now), ¬Clipped(now, carry umbrella, now + ε).

We now have to update the database in such a way that the query succeeds. This can be achieved if we only add the clause Happens(take umbrella, now), and no other occurrences of events. For in this case the query

?Happens(take umbrella, now), ¬Clipped(now, carry umbrella, now + ε)

reduces to

?¬Clipped(now, carry umbrella, now + ε),

and this query can be shown to succeed by applying negation as failure to ?Clipped(now, carry umbrella, now + ε). Indeed, the latter query fails because of the way we updated the database. Of course, it is assumed that a statement such as Happens(take umbrella, now) only gets added when in fact the action has been performed.
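To make the update mechanism concrete, here is a minimal sketch in Python; it is not the constraint logic programming machinery the chapter presupposes, just an illustration under simplifying assumptions: times are integers, ‘now + ε’ is rendered as now + 1, identifiers use underscores, and the scenario is the single Initiates clause above.

```python
# A minimal sketch, not the chapter's machinery: times are plain integers,
# and 'now + epsilon' is rendered as NOW + 1.

NOW, EPS = 0, 1

INITIATES = {("take_umbrella", "carry_umbrella")}   # Initiates(take umbrella, carry umbrella, t)
TERMINATES = set()                                  # no Terminates clauses in this scenario

happens = set()                                     # Happens facts; filled in by abduction
holds = {("rain", NOW)}                             # the update HoldsAt(rain, now)

def clipped(t1, fluent, t2):
    """Clipped succeeds only if a recorded event terminating the fluent
    lies strictly between t1 and t2; otherwise it fails (negation as failure)."""
    return any((e, fluent) in TERMINATES and t1 < t < t2 for (e, t) in happens)

def make_holds_succeed(fluent, t):
    """Reduce ?HoldsAt(fluent, t) against the Initiates clause; abduce the
    missing Happens fact (the minimal update), then check the Clipped subgoal."""
    for (e, f) in INITIATES:
        if f == fluent:
            happens.add((e, NOW))                   # update: Happens(take umbrella, now)
            return not clipped(NOW, fluent, t)
    return False

if ("rain", NOW) in holds:                          # antecedent of the integrity constraint
    print(make_holds_succeed("carry_umbrella", NOW + EPS))   # True
    print(happens)                                  # {('take_umbrella', 0)}
```

The point of the sketch is only the shape of the procedure: the consequent is reduced against the scenario, the missing Happens fact is added as a minimal update, and the residual Clipped subgoal fails by negation as failure because no terminating event has been recorded.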
As this example makes clear, an integrity constraint requires us to update a database in a particular way. A derivation is started with the consequent of the integrity constraint as the top query. Then resolution with clauses from the database is applied for as long as possible. The derivation will in general end with a query that cannot be further resolved. If this were an ordinary derivation we would then apply negation as failure to the top query. In the case of an integrity constraint we use the unresolved bottom query instead to suggest an addition to the database which will make the top query succeed after all. The procedure chosen has the effect of making a minimal update of the database to ensure success of the top query: the computation exploits as much of the database as possible, and only plugs in facts when absolutely necessary.

These considerations lead us to the following definition of integrity constraint as it applies in our context.

DEFINITION 16. Let R, R′, R′′, . . . be a finite set of constants, each denoting a reference time; these constants belong to the constraint language. An integrity constraint is a formula of the form

(†) IF ϕ THEN ψ(R, R′, R′′, . . .),

where ϕ and ψ are formulas of the event calculus. The operational meaning of (†) is that if the scenario satisfies ϕ, the goal ?ψ(R, R′, R′′, . . .) must succeed, or fail finitely. To determine whether the scenario satisfies ϕ, one has to investigate whether the goal ?ϕ succeeds. Hence if the integrity constraint expresses an obligation it may be represented by the demand that ?ϕ, ψ(R, R′, R′′, . . .) succeeds. The case where the goal must succeed expresses an obligation; the case where it must fail finitely expresses a prohibition.

A typical application is where ψ(t, t′, t′′, . . .) = HoldsAt(f, t) ∧ t ≤ now. In this case we require that the goal ?HoldsAt(f, R) ∧ R ≤ now succeeds or fails finitely by either of the following strategies.

1. One may update the scenario with true Happens, Initiates and ¬Clipped formulas, using the axioms of the event calculus; this is the strategy of choice if f is an activity fluent. It has the effect of making the temporal denotation of f extended in time.
2. If the first strategy fails, the scenario may also be updated with true HoldsAt or Initially formulas, or (in)equalities in R in the language of the reals. This is a possible strategy if f represents a state, since states do not have to be caused by events. For example, they may be a particular instance of a parametrized state, which evolves continuously via a dynamics.

Another typical application of integrity constraints in our context is where the goal which must succeed or fail finitely is of the form
?Happens(e, R) ∧ R ≤ now.

Again there are two possible strategies for handling the goal:

1. if Happens(e, t) occurs in the head of a clause with nontrivial body θ(e, t), proceed by resolving the query ?θ(e, R);
2. otherwise, replace Happens(e, R) by a set of true (in)equalities in R.

It is also possible to have an integrity constraint without a condition, i.e. where ϕ is a tautology. An entry in my diary like ‘appointment in Utrecht, Friday at 9.00’ expresses an unconditional obligation to satisfy HoldsAt(be-in-Utrecht, Friday at 9.00), and presented with this integrity constraint, my internal database comes up with a plan to satisfy the constraint. We will usually refer to an unconditional integrity constraint by means of the query that must succeed, or fail finitely.

We now illustrate the linguistic relevance of the preceding definition by the example of the English perfect: for the present perfect the integrity constraint is that the query

(128) ?HoldsAt(f, R), R = now

must succeed, whereas for the pluperfect success is required for the query

(129) ?HoldsAt(f, R), R < now.

In both cases the logic programming mechanism starts a computation from the given query by applying the axioms of the event calculus. For instance, applying axiom 2 means that the database searches for an event e such that Initiates(e, f, t), Happens(e, t0) and ¬Clipped(t0, f, t). If this query does not succeed, the database may ask the outside world for input. For instance, in the above the scenario consists of the formula Initiates(e, f, t) only. The database then asks for input of a true formula Happens(e, t0) ∧ t0 ≤ R. This is the computational meaning of the perfect. It is also the computational meaning of the progressive, where the fluent f represents an activity and e an initiating event.

If the database were unable to find a formula Initiates(e, f, t), it could also ask the world for input of a true formula of this type. Alternatively, it could forego the search for an f-triggering event and ask for a set of (in)equalities in R or (using axiom 1) a true Initially formula instead. These strategies may occur if f represents a stative verb, for in that case f need not be triggered by an action or event.
The upshot of the preceding discussion is that a reference time is characterized by a set of fluents which must hold at that time. This stipulation captures the idea that the role of a reference time is to fix the situation or context that we are talking about. In general such situations are only partially determined. Suppose the reference time is characterized by fluents f1, . . . , fn, i.e. by the integrity constraint

?HoldsAt(f1, R), . . . , HoldsAt(fn, R).

If we want to stipulate that another fluent f, say the result fluent involved in the perfect, holds at R, the only way to do this is to enlarge the integrity constraint to

?HoldsAt(f1, R), . . . , HoldsAt(fn, R), HoldsAt(f, R),

i.e. f must occur in a subgoal of the integrity constraint. In general we shall mention only the immediately relevant part of the integrity constraint, in this case ?HoldsAt(f, R), and leave out the contextually given part.15

Before we move to a discussion of the Passé Simple and Imparfait, let us take stock. The principal function of the scenario is to contribute lexical information, which is general and does not talk about specific times. The addition of temporal information is required to construct a sentence out of lexical material. Contrary to first impressions, the reference time cannot be added as a fact to a scenario, as we have seen in the case of the perfect. The reference time is an integrity constraint formulated in terms of fluents, which typically puts constraints upon possible temporal locations of event types. We have to exercise some care here: the traditional phrase ‘event time’ obscures the fact that there are at least two different kinds of events, what we have termed here fluents and event types. Localizing stative verbs, or VPs in the English progressive, involves anchoring a fluent, whereas the English simple tenses locate event types. This is related to an important issue: the role of Aktionsart in defining tense. Some authors, such as Comrie (Comrie, 1985), prefer to define tense abstracting from aspectual features, the idea being that tense talks only about localization in time; an example will be given below. We try to go a different route, and allow for the possibility that tense works out
differently for different Aktionsarten. Some indications of what this means in practice are given below.

15 In the presence of integrity constraints, the model whose existence is posited in theorem 2 depends of course on the constraints used to make the goal in the integrity constraint succeed or fail.
6. Formalizing the Passé Simple and Imparfait

We will now formalize the examples of section 2 using the event calculus formalism.

6.1. Passé Simple: scenarios and integrity constraints

Guided by the previous analysis we propose that the effect of the Passé Simple is to introduce an integrity constraint of the form

?HoldsAt(f, R), Happens(e, R), R < now,

where e is the event type derived from the VP which occurs in the PS, and the fluent f represents the context in which the PS is interpreted. If the context is empty, for example if the sentence considered is the first sentence of the discourse, we leave out the HoldsAt clause. One may observe immediately that this stipulation accounts for two features of the PS: it presents the eventuality as perfective, and it places the eventuality in the past of the speech time.

We also need a meaning postulate for the conjunction et, which is not a simple Boolean conjunction. The following stipulation seems to capture what we need: if the PS occurs in the form ‘S et PS-VP’, then the fluent f occurring in the integrity constraint for the PS refers to the state which results from the event described by S (and not from material that was processed earlier). We view the construction ‘S1, S2 et S3’ as an iterated form of et, that is, as ‘(S1 et S2) et S3’. Sentences conjoined by et are thus bound together more tightly than sentences conjoined by a period.

Succession (non)effects

Recall that we have argued in section 2.1 for the succession effect in PS narratives as a side-effect of the semantics of the PS. Consider the sentences

(130) a. Pierre monta dans sa chambre et ferma la porte. (2 × PS)
      b. Pierre ferma la porte et monta dans sa chambre. (2 × PS)

If our approach is correct, the implied succession in sentences 130a and 130b should derive from the ordering of the sentences, the identity
of the subject in the conjuncts related by et, and the perfectivity of the PS. We propose the following derivation of this effect. The scenario for sentence 130a looks as follows:

1. a) Initiates(go-upstairs(x), upstairs[x], t)
   b) ?Happens(go-upstairs(Pierre), R), R < now succeeds
2. a) Initiates(close(x, y), closed[y], t)
   b) ?HoldsAt(upstairs[Pierre], R′), Happens(close(Pierre,door), R′), R′ < now succeeds
The formulas collected in 1. represent the information in the scenario induced by the first VP.16 The first formula states lexical information, and the integrity constraint gives the contribution of the PS. For completeness we should have added a HoldsAt clause as well, whose fluent has information about the context; but this clause would be irrelevant to the computation. The minimal model for this scenario looks as follows. Until reference time R, the fluent upstairs[Pierre] does not hold (by negation as failure); at R the event go-upstairs(Pierre) happens and initiates the fluent upstairs[Pierre].

The next pair of formulas introduces the semantic contribution of the second conjunct of 130a. The lexical information introduced in 2a is straightforward. The choice of the integrity constraint 2b requires some explanation. The Happens clause of the integrity constraint represents the effect of the PS (a perfective event in the past of the speech time), and the HoldsAt clause represents the context in which the PS is interpreted. Since the two clauses in 130a are linked by et, the fluent in this HoldsAt clause must refer to the state resulting from 1. We show that this choice accounts for the default succession effect of a sequence of PS sentences ascribed to a single subject.

In the minimal model we have R < R′, as is shown by the following derivation; the first few steps in the argument are displayed in figure 40. The top node of this derivation contains the integrity constraint, i.e. a goal which is assumed to succeed. The derivation shows that the top goal can only succeed if the goal ?¬Clipped(R, upstairs[Pierre], R′), Happens(close(Pierre,door), R′), R < R′ < now also succeeds. A simple negation as failure argument shows that the subgoal ?¬Clipped(R, upstairs[Pierre], R′) succeeds. This leaves us with the goal ?Happens(close(Pierre,door), R′), R < R′ < now, which means that any R′ satisfying Happens(close(Pierre,door), R′) must also satisfy R < R′ < now.
16 Recall that go-upstairs(x) denotes an event type and upstairs[x] a fluent.
Figure 40. Effect of the second integrity constraint in 130a.
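As a complement to the resolution proof, the same point can be checked model-theoretically. The following fragment is a minimal sketch (not the authors' implementation): times are integers, only axiom 2 with negation as failure is modelled, there are no Terminates clauses, and it simply verifies that the two integrity constraints of 130a can only both succeed when R precedes R′.

```python
# Toy check of the two integrity constraints for 130a; not the chapter's resolution proof.
NOW = 10

INITIATES = [("go-upstairs(Pierre)", "upstairs[Pierre]"),
             ("close(Pierre,door)", "closed[door]")]

def holds_at(fluent, t, happens):
    # Axiom 2 with negation as failure and no Terminates clauses: a fluent holds
    # at t iff some recorded event that initiates it happened strictly before t.
    return any(f == fluent and s < t
               for (e, s) in happens
               for (e2, f) in INITIATES if e2 == e)

def constraints_succeed(R, R1):
    # R is the time of go-upstairs(Pierre); R1 plays the role of R' (time of close(Pierre,door)).
    happens = [("go-upstairs(Pierre)", R), ("close(Pierre,door)", R1)]
    ic1 = R < NOW                                         # ?Happens(go-upstairs(Pierre),R), R < now
    ic2 = holds_at("upstairs[Pierre]", R1, happens) and R1 < NOW
    return ic1 and ic2                                    # ?HoldsAt(upstairs[Pierre],R'), Happens(close,R'), R' < now

print(constraints_succeed(3, 5))   # True : R < R'   (the succession reading)
print(constraints_succeed(5, 3))   # False: R' < R   (upstairs[Pierre] does not yet hold)
print(constraints_succeed(4, 4))   # False: the fluent only holds strictly after R
```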
Having explained the main idea, we shall usually leave the last few steps, including the proof that the ?¬Clipped subgoal succeeds, to the reader.

Analogously, the scenario for sentence 130b looks as follows:

1. a) Initiates(close(x, y), closed[y], t)
   b) ?Happens(close(Pierre,door), R), R < now succeeds
2. a) Initiates(go-upstairs(x), upstairs[x], t)
   b) ?HoldsAt(closed[door], R′), Happens(go-upstairs(Pierre), R′), R′ < now succeeds
A derivation analogous to figure 40 shows that in the minimal model we must have R < R′. In both this case and the previous one, the derivation does not branch, corresponding to the fact that sentences 130a and 130b have a single reading.

So far so good, but we also have to check whether the proposed integrity constraint does not overgenerate, that is, we have to look at examples where the succession does not hold. Let us first look at example 114d, here adapted to

(131) # Pierre monta dans sa chambre et se leva. (2 × PS)

As we remarked above, sentence 131 is not felicitous because the information conveyed by the second VP contradicts the lexical presupposition of the first.
Figure 41. Conditions on the fluent upright[Pierre] in 131.
That is, you go upstairs walking, hence you need to be standing up; and if you are already standing up you cannot perform the action of getting up. The scenario for this case has the following form:

1. a) Initiates(go-upstairs(x), upstairs[x], t)
   b) ?Happens(go-upstairs(Pierre), R), R < now succeeds
2. a) HoldsAt(sitting-down[x], t) → Initiates(get-up(x), upright[x], t)
   b) ?HoldsAt(upstairs[Pierre], R′), Happens(get-up(Pierre), R′), R′ < now succeeds
Using only the material in 1. we obtain a minimal model where the fluent upstairs[Pierre] is initiated at time R (hence does not hold before R). Viewed superficially, the material in 2. enforces that the event get-up(Pierre) happens at time R′ with R′ < now. However, the integrity constraint in 2(b) is actually inconsistent with 2(a), as we can see when we try to compute the query ?HoldsAt(upright[Pierre], t), R′ < t. The problem is that, in the course of the derivation, the query ?HoldsAt(upright[Pierre], t) is transformed into ?HoldsAt(sitting-down[Pierre], R′), which cannot lead to successful termination, as we do not have any information in the scenario pertaining to an event initiating this fluent (see figure 41). This means that also for t later than R′, ?HoldsAt(upright[Pierre], t) is false. As a consequence, the goal 2(b) cannot succeed.

The following example presents a case where the order of the events can actually be the inverse of the order of the sentences describing them.
(132) Pierre brisa le vase. Il le laissa tomber. (2 × PS)

When introducing this example in section 2.1 we noted that one may get the standard ordering back upon enlarging the discourse:

(133) a. Pierre brisa le vase avec un marteau. Il le laissa tomber et s’en alla.
      b. Pierre brisa le vase avec un marteau. Il le laissa tomber. Il s’en alla sans le regarder.

Thus we must be able to explain the inversion of example 132 by a construction which is flexible enough to also accommodate examples 133a and 133b. It is important to mention at this stage that lexical expressions do not come with unique scenarios. A clause in a scenario can be seen as an activated part of semantic memory;17 which part is activated depends on all kinds of circumstantial factors. For this example we assume that the scenario for ‘break’ contains an open-ended set of clauses specifying possible causes of the breaking. We choose a simplified formulation here; e.g. 1b below could be derived in a more elaborate scenario detailing the relationship between ‘drop’, ‘fall’ and impact on the ground. The simplified formulation is better suited, however, to illustrate the main points of the argument. Accordingly, we will take the scenario to be of the following form, where we omit the HoldsAt components of the integrity constraints because they play no role in the derivation.

1. a) Initiates(break(x, y), broken[y], t)
   b) Happens(drop(x, y), t − ε) → Happens(break(x, y), t)
   c) Happens(smash(x, y), t) → Happens(break(x, y), t)
      ...
   d) ?Happens(break(Pierre,vase), R), R < now succeeds
2. a) ?Happens(drop(il,le), R′), R′ < now succeeds18

A successful computation starting from the query ?Happens(break(Pierre,vase), R), R < now is given in figure 42.
17 There is actually a close connection between logic programming with negation as failure and the spreading activation networks beloved of psycholinguists.
18 The anaphors ‘il’ and ‘le’ are really variables to be unified with concrete objects; we keep the words as handy mnemonics.
Figure 42. The effect of the two integrity constraints in example 132.
This computation explains the reversed order. Notice however that if we were to bind the two sentences with an et, as in 112, the integrity constraint for the second sentence would be:

?Happens(drop(il, le), R′), HoldsAt(broken[vase], R′), R′ < now.

By negation as failure, Initially(broken[vase]) is false, hence there must have been an event initiating the fluent broken[vase]. If ‘il’ is unified with Pierre and ‘le’ with the vase, this is impossible, because dropping the vase would have to take place before R′. If the fluent broken[vase] goes proxy for the broken vase (as an object),19 it is possible to unify ‘le’ with broken[vase], and get a coherent interpretation again. The examples 133a and 133b can be treated in the same manner.

19 This trick is explained in (van Lambalgen and Hamm, 2004).

Finally, we come to an example where the events described have no natural order.

(134) Cet été-là, Maurice épousa Adèle, Jean partit pour le Brésil et Paul s’acheta une maison à la campagne. (3 × PS)

The fact that we cannot order the enumerated events in sentence 134 is mainly due to the different subjects of the VPs. The temporal adverbial (Cet été-là) only places the events in a certain period of time, without implying anything about their order. A scenario might look as follows:

1. a) Initiates(begin, this-summer, t)
   b) Terminates(end, this-summer, t)
2. a) Initiates(marry(x, y), married[x, y], t)
   b) ?HoldsAt(this-summer, R1), Happens(marry(Maurice, Adèle), R1), R1 < now succeeds
3. a) Initiates(leave-for(x, y), be-in[x, y], t)
   b) ?HoldsAt(this-summer, R2), Happens(leave-for(Jean,Brasil), R2), R2 < now succeeds
4. a) Initiates(buy(x, y), have[x, y], t)
   b) ?HoldsAt(this-summer, R3), Happens(buy(Paul,countryhouse), R3), R3 < now succeeds

What we obtain from the integrity constraints, by means of a derivation like the ones given above, is that there are times R0 and R4 such that Happens(begin, R0), Happens(end, R4), R0 < {R1, R2, R3} and {R1, R2, R3} ≤ R4. However, the order of R1, R2 and R3 cannot be determined.

Inchoative use of the PS

Consider again the example

(135) Mitterrand fut président. (PS)

We have to derive formally that the PS, applied to the stative expression ‘be president’, picks out the initiating event. Interestingly, when we are only given the fluent ‘be president’, there is no explicitly given event which warrants the application of the PS. Applying the PS means that a form of coercion is going on, in which the fluent is somehow transformed into an event. The proper way of doing this involves so-called hierarchical planning, for which we refer to (van Lambalgen and Hamm, 2004), since it is too involved to explain here; we give a simplified treatment, based on the idea that the PS ‘searches’ the scenario to find the event which saturates the event argument in the integrity constraint. Since presidents are usually elected, the scenario for ‘be president’ will contain a statement such as 1a. This statement contains a reference to the event ‘elect’, which may thus figure in an integrity constraint. We thus get

1. a) Initiates(elect(x), president[x], t)
   b) ?Happens(elect(M.), R), R < now succeeds

As can be seen from figure 43, the fluent president[M.] does not hold before R. A similar derivation shows that it must hold after R.
Figure 43. Fluent president[M.] before R.
6.2. Imparfait: scenarios and integrity constraints

The integrity constraint associated with the Imparfait must be very different from that associated with the Passé Simple, for example because an Imp sentence is not felicitous in isolation, unlike a PS sentence. An Imp sentence must be anchored by means of a PS in the discourse. We therefore propose the following. An Imp VP with an adjacent PS VP introduces an integrity constraint of the form

?Happens(e, R), HoldsAt(f1, R′), . . . , HoldsAt(fn, R′), R < now, R′ < now,
where e is some PS event of the discourse context (this PS sentence may precede or follow the Imp sentence), and f1, . . . , fn are the relevant fluents describing the Imp verb phrase. The most relevant part of the integrity constraint for the Imp is the HoldsAt(f, R′) part. This part distinguishes the PS and the Imp: the PS introduces an integrity constraint of the form Happens(e, R), possibly together with some other fluents that hold at R, while the integrity constraint associated with the Imp introduces a number of HoldsAt(f, R′) statements that are combined with the Happens(e, R) statement of a PS VP in the discourse.
Figure 44. Integrity constraint in example 136.
Imparfait as background

Consider the discourse

(136) Il faisait chaud. Jean ôta sa veste. (Imp, PS)

The scenario for these sentences must contain a fluent warm, and an event and a fluent for the achievement ‘take off one’s sweater’. For the latter we choose the event take-off, which terminates the fluent wearing; equivalently, we could have take-off initiate not-wearing. The integrity constraint anchors the fluent warm; note again that anchoring is only possible given a PS VP.

1. a) Terminates(take-off(x, y), wearing[x, y], t)
   b) ?HoldsAt(warm, R), HoldsAt(wearing[Jean,vest], R), Happens(take-off(Jean,vest), R), R < now succeeds

The derivation in figure 44 shows that ‘Il faisait chaud’ really functions as a background: the final query can succeed only if warm is true from the start. The next derivation (figure 45) shows the fate of the fluent wearing[Jean,vest]. Hence the fluent warm is true at all times, while the fluent wearing[Jean,vest] holds until R and is terminated at this time.
Figure 45. Fluent wearing[Jean,vest] in example 136 for t > R.
Imparfait for a resultant state

(137) Jean appuya sur l’interrupteur. La lumière l’éblouissait. (PS, Imp)

This is an example where there is no overlap between the two eventualities, pushing a button and being blinded. The desired effect is obtained only when the scenario gives some information about the causal relation between the light being on and being blinded; this is the purpose of part 2 of the scenario.

1. a) Initiates(push(x,on), light-on, t)
   b) Terminates(push(x,off), light-on, t)
   c) ?Happens(push(Jean,y), R), R < now succeeds
2. a) Releases(push(x,on), blinded[x], t)
   b) Trajectory(light-on, t, blinded[x], d)
   c) ?Happens(push(Jean,y), R), HoldsAt(light-on, R′), HoldsAt(blinded[Jean], R′), R < now, R′ < now succeeds
Figure 46 shows the derivation starting from the integrity constraint 2c. The substitution leading to success is indicated. The last query in the derivation can be made to succeed because the scenario makes no mention of a push-off event, and we therefore obtain the conclusion R < R′ < now.
The reader should notice that, for the sake of readability, in figures 48 and 49 we process the Clipped formulas in the same tree and delete them from the goal.
Figure 46. Integrity constraint in example 137.
The proper treatment of the integrity constraint would be, as described in section 5, to first update the database with the Happens statement and then begin a new tree for the Clipped statement and check it for failure.

Imparfait in an explanatory context

In the following discourse, the second sentence has the function of explaining the event described in the first sentence.
Figure 47. Integrity constraint in example 138.
The eventuality described in the second sentence should therefore be placed in its entirety before the event described in the first sentence.

(138) Jean attrapa une contravention. Il roulait trop vite. (PS, Imp)

The scenario for this situation may look as follows. The first two statements have been included for convenience only; it would make no difference if we pushed the beginning of the scene further in the past and introduced an event initiating driving.

1. a) Initially(driving[Jean])
   b) Initially(speed[s])
   c) Initiates(get(x,ticket), have[x,ticket], t)
   d) ?Happens(get(Jean,ticket), R), R < now succeeds
2. a) Terminates(get(x,ticket), driving[x], t)
   b) Terminates(get(x,ticket), speed[s], t)
   c) Initiates(get(x,ticket), speed[0], t)
   d) ?Happens(get(Jean,ticket), R), HoldsAt(speed[s], R′), HoldsAt(driving[Jean], R′), s > limit, R < now, R′ < now succeeds
In the first step we start from the query in 2d, and we expand the derivation tree according to the different possibilities for the relation between R and R′, and then we recombine to get the possibilities R′ ≤ R and R < R′. These possibilities are considered in figures 48 and 49, respectively. Derivation 48 terminates successfully with the constraint R′ ≤ R < now, because of part 1 of the scenario.
Figure 48. Integrity constraint in example 138 with R′ ≤ R.
Now consider derivation 49 for the other possibility, R < R′. This derivation ends in failure, because the subderivation for Clipped(0, driving[Jean], R′) will end in success, given that getting a ticket at R < R′ ends in terminating the driving at that point.

Figure 49. Integrity constraint in example 138 with R < R′.
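For concreteness, the same style of toy model check as before can be used to see why one ordering succeeds and the other fails. The following sketch is again an illustration under simplifying assumptions, not the authors' program: it evaluates the constraint in 2d for both orderings, using the Initially facts and Terminates clauses of the scenario together with the Clipped test under negation as failure (the s > limit conjunct is left out of the check).

```python
# Toy check of the integrity constraint in 2d of example 138; not the chapter's machinery.
NOW = 10
INITIALLY = {"driving[Jean]", "speed[s]"}
TERMINATES = {("get(Jean,ticket)", "driving[Jean]"), ("get(Jean,ticket)", "speed[s]")}

def clipped(happens, fluent, t1, t2):
    # Clipped(t1, fluent, t2): some recorded event strictly between t1 and t2 terminates the fluent.
    return any((e, fluent) in TERMINATES and t1 < t < t2 for (e, t) in happens)

def holds_at(happens, fluent, t):
    # Axiom 1 only: an Initially fluent persists until it is clipped
    # (the Imparfait fluents here are set up by Initially, not by Initiates).
    return fluent in INITIALLY and not clipped(happens, fluent, 0, t)

def constraint_succeeds(R, R1):    # R1 plays the role of R'
    happens = [("get(Jean,ticket)", R)]
    return (R < NOW and R1 < NOW
            and holds_at(happens, "speed[s]", R1)
            and holds_at(happens, "driving[Jean]", R1))

print(constraint_succeeds(5, 3))   # True : R' <= R  (derivation 48), the explanation reading
print(constraint_succeeds(5, 5))   # True : R' <= R
print(constraint_succeeds(3, 5))   # False: R < R'   (derivation 49), the ticket clips the driving
```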
7. Coda
In conclusion we can do no better than quote from the eloquent ‘Apology and guide to the reader’ of Kamp and Reyle (Kamp and Reyle, 1993): . . . the mechanisms which natural language employ to refer to time cannot be properly understood by analyzing the properties of single sentences. Thus the methodology of modern generative grammar, which takes the single sentence as the basic unit of study is not, we believe, suited to this particular domain.
Rather, a proper analysis of temporal reference must (a) make explicit its anaphoric aspects – the systematic ways in which such devices of temporal reference as tenses and temporal adverbs rely for their interpretation on temporal elements contained in the antecedent discourse – and (b) discover the temporal organization of those conceptual structures which extended discourses produce in the human recipients who are able to interpret them.
This is precisely what we have attempted to do here.
Acknowledgements

A condensed form of this chapter appeared as Chapter 10, ‘Tense in French: Passé Simple and Imparfait’, in van Lambalgen and Hamm (van Lambalgen and Hamm, 2004).
References

Comrie, B.: 1985, Tense. Cambridge, UK: Cambridge University Press.
De Swart, H. and F. Corblin (eds.): 2002, Handbook of French Semantics. Stanford: CSLI Publications.
Dowty, D.: 1979, Word Meaning and Montague Grammar. Dordrecht: Reidel.
Gosselin, L.: 1996, Sémantique de la temporalité en français. Champs Linguistiques. Editions Duculot.
Hamm, F. and M. van Lambalgen: 2000, Event calculus, nominalisation and the progressive. Research report, ILLC, Amsterdam. 76 pp. To appear in Linguistics and Philosophy. Available at http://www.semanticsarchive.net.
Kamp, H.: 1983(?), unpublished progress report for research on tenses and temporal adverbs of French.
Kamp, H. and U. Reyle: 1993, From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer Academic Publishers.
Kowalski, R.: 1995, Using meta-logic to reconcile reactive with rational agents. In Meta-logics and Logic Programming, pp. 227–242. Cambridge, MA: MIT Press.
Shanahan, M.: 1997, Solving the Frame Problem. Cambridge, MA: MIT Press.
Steedman, M.: 1997, Temporality. In J. van Benthem and A. ter Meulen, editors, Handbook of Logic and Language, chapter 16, Amsterdam: Elsevier, pp. 895–938.
Sten, H.: 1952, Les temps du verbe fini (indicatif) en français moderne. Historisk-filologiske Meddelelser. Det Kongelige Danske Videnskabernes Selskab.
Stuckey, P.: 1995, Negation and constraint logic programming. Information and Computation, 118:12–33.
Van Lambalgen, M. and F. Hamm: 2003, Intensionality and coercion. In R. Kahle, editor, Intensionality. ASL Lecture Notes in Logic, Wellesley, MA: A.K. Peters.
Van Lambalgen, M. and F. Hamm: 2003, Moschovakis’ notion of meaning as applied to linguistics. In M. Baaz and J. Krajicek, editors, Logic Colloquium ’01. ASL Lecture Notes in Logic, Wellesley, MA: A.K. Peters.
Van Lambalgen, M. and F. Hamm: 2004, The Proper Treatment of Events. To appear with Blackwell, Oxford and Boston.
JAMES PUSTEJOVSKY, ROBERT KNIPPEN, JESSICA LITTMAN, ROSER SAURÍ
TEMPORAL AND EVENT INFORMATION IN NATURAL LANGUAGE TEXT
1. Introduction
The automatic recognition of temporal and event expressions in natural language text has recently become an area of intense research in computational linguistics and Artificial Intelligence. The importance of temporal awareness to question answering systems has become more obvious as current systems strive to move beyond keyword and simple named entity extraction. Named entity recognition (Chinchor et al., 1999) has moved the fields of information retrieval and information exploitation closer to access by content, by allowing some identification of names, locations, and products in texts. Beyond such metadata tags, however, there is only a limited ability to mark up text for real content. One major problem that has not been solved is the recognition of events and their temporal anchorings.

Newspaper articles describe the world around us by talking about people and the events and states of affairs they participate in. As it happens, however, much of the temporal information in a report or narrative is left implicit in the text. The exact temporal locations of events are rarely explicit and many temporal expressions are vague at best. A crucial first step in the automatic extraction of information from such texts, for use in applications such as automatic question answering or summarization, is the capacity to identify what events are being described and to make explicit when these events occurred. While questions such as the following can be easily answered by human beings after reading the appropriate newspaper article, such capabilities go beyond any current automatic system:

(139) a. Is Schröder currently German chancellor?
      b. What happened in French politics last week?
      c. When was the merger between Deutsche Bank and Dresdner Bank?
The recognition of temporal “keywords” (e.g., currently, last week) is clearly a prerequisite for understanding and answering these questions. In addition, further temporal knowledge needs to be represented and further temporal inferences need to be drawn. First, temporal aspects of the properties of entities (i.e., the property of being German chancellor) must be adequately represented. Second, the extraction of event descriptions with their time stamps has to be carried out. The knowledge of certain temporal features of events (i.e., the typical duration of an event) also seems to be crucial for the correct understanding of text. Finally, the veridicality of events has to be checked as well (i.e., actual vs. intended events). As can be seen from these three example questions, building an automatic system that can extract and reason with temporal and event information brings up new, multifaceted research issues. First, we require an expressive language in which the kind of event and time information we are concerned with can be made explicit.
2. Temporal Information in Questions
Natural language questions express possible queries that a QA system must answer. In many of them, temporal information is a basic component of knowledge, and needs to be handled for an acceptable degree of performance to be attained. Take as examples the following interrogative sentences, which are extracted from (or based on other questions in) the Excite question log.

(140) a. When did Yale first admit women?
      b. How long does it take to climb Everest?

Both examples in (140) are looking for a temporal value associated with the expressed event: a date in the case of (140a), a duration in (140b).1

Answering (140a) does not require a very powerful reasoning engine. Assuming that the needed information is contained in our knowledge base, the answer will be the specific date associated with the event denoted by the query. For unstructured textual knowledge bases, it will suffice to have a system capable of identifying events and temporal expressions in a text and anchoring the events in the timeline.
1 Here and throughout the article we use the term ‘event’ to refer to the ontologically broader notion of eventuality. Our events will therefore include states as well as dynamic events.
The first task can be performed reasonably well by a chunker informed with lexical information and constrained to a limited set of structures (Mani and Wilson, 2000; Schilder and Habel, 2001; Pustejovsky et al., 2002). Similarly, some time-stamping can be extracted from the parse tree, as is done in previous work (Filatova and Hovy, 2001; Mani et al., 2003). However, queries like (140b) demand a higher temporal reasoning capability. The eventive predicate (climbing Everest) does not refer to a unique event in the knowledge base, and therefore the felicitous answer will need to be calculated over a set of temporal expressions presumably associated with that expression. A similar issue is at play in the following pair of examples:

(141) a. Who was the American ambassador to Japan before Walter Mondale?
      b. What is the name of the teacher that went to jail after getting pregnant by a student?

Answering the examples in (141) involves dealing with temporal relations of the sort before, after, and during. The reasoning system does not need to be very sophisticated for questions like (141a) which, similar to (140a), focus on very unique events. The first time Yale admitted women happened presumably only once in our timeline. Similarly, there is only a limited number of states of somebody being the American ambassador to Japan, and all of them are reasonably well anchored in time. From a structured knowledge base perspective, the answer to (141a) can be provided on the basis of ordered lists of domain-prominent states of affairs in the world. From less structured sources, the processing tools mentioned above for question (140a) would help in time-stamping and ordering the set of states of being an American ambassador to Japan. However, answering queries of the sort of (141b) demands the temporal ordering of events that may not be as well temporally delimited as those in (140a) or (141a), and for which no precise time-stamping may exist.

Consider now the examples below:

(142) a. When is the Monsoon season in Southeast India?
      b. Between October and December.

The question in (142a) queries for the temporal value related to a temporal expression (Monsoon season, which is of the same nature as Ramadan, Christmas, or Passover). Assuming that our knowledge base contains the necessary information to answer it, we can expect Monsoon
season to be temporally related to a bounded period of time (such as between(Monsoon, October, December)), so that answer (142b) is returned.

(143) a. When does the Monsoon season begin in Southeast India?
      b. In October.

(144) a. How long does the Monsoon season last in Southeast India?
      b. For three months.

Additional mechanisms will be needed, however, in examples (143–144), both for interpreting the queries (so that the references to only part of the Monsoon period in (143a) and to its duration in (144a) are identified), and in the answering process in order to compute the appropriate information from the statement in our knowledge base. Whatever strategy a QA system applies, it is clear that answering those questions requires controlling information along the temporal axis.

Other examples illustrating the fundamental character of temporal information in queries are the following:

(145) a. Who won the Nobel prize this year?
      b. Who is the President of Argentina?

As in (141), neither of the two questions above inquires about a temporal expression or relation, but about individuals. Yet, answering them involves locating the events they refer to on the time axis. The structure of (145a) resembles that of (140a) in that they both relate an event (winning the Nobel prize and admitting women to Yale, respectively) to a temporal expression (in (145a), this year, and in (140a) the value that will be returned as the answer to the question). Furthermore, temporal reasoning is also essential in queries with no overt or queried temporal values. Question (145b), for instance, may not receive an adequate answer if no reference to the temporal axis is made.

Examples (140–145) illustrate the extent to which temporal information is pervasive throughout questions, thereby demanding systems capable of representing and reasoning with temporal knowledge. This is the case for many of the different types of queries in the typologies used by current QA systems (e.g., Abney et al., 2000; Hovy et al., 2002). We will now look into different kinds of questions and analyze the relevant components that must be identified in order to answer the
question felicitously. The first task to address here is the identification of temporally relevant queries since, as shown in (141) and (145), it is not necessary for them to be introduced by a wh-word referring to a temporal index, nor to contain an overt temporal expression. Still, a generalization applies to all sentences in (140–145). They all involve at least one temporal relation of any of the following sorts:

− Between an event E and a temporal reference T. The temporal reference T can be explicit (146a), implied by the wh-word (140), or contextually implied as in (145b);
− Between an event E and another event E′ (146b);
− Between two temporal values, T and T′ (146c).

(146) a. How many servings of Coca-Cola were consumed in 1994?
      b. How many Iraqi civilians were killed during the attack on Falluja?
      c. When is Chanukah?

Taking these three relation types as defining the nature of temporally relevant queries, we now have a better view on the kinds of potential questions that are involved here. A first, unequivocal subclass of temporal queries is constituted by queries that look for a temporal value as their felicitous answer, be it an index to a calendar date (147a), an index to a time of the day (147b), a duration (147c), or a set (147d). For expository purposes, we identify this inquired temporal value as qT,2 and the queries featuring them as qT-queries.

(147) a. When is the next full moon?
      b. What time is The Daily Show?
      c. How long does it take to climb Everest?
      d. How many days is the temperature below 32°F in Barrow, Alaska?

Linguistically, queries of this sort are distinguished by a specific set of wh-phrases. Queries aiming at durations are generally introduced by (148a), whereas those looking for a set, by expressions like (148b).
2 qT and the other q-terms we will introduce in the remainder of the article (qE, qI, qR) correspond, broadly speaking, to Qtargets in the QA literature (cf. Hovy et al., 2002).
On the other hand, queries pointing to temporal indices (calendar dates and times of the day) are introduced by phrases like (148c), where Nt is any temporally denoting noun (of the sort hour, day, month, century, Wednesday, January, etc.), and NPt is an NP headed by an Nt.

(148) a. how long
      b. how often
         how many times
      c. what + Nt
         what is + NPt
         on what + Nt

Of course, the class of queries described here is also, and very commonly, introduced by the very distinctive temporally-selecting wh-word when, which is not restricted to any of the temporal value types distinguished in (147).

qT-queries can be nicely classified according to the kind of temporal relations they convey. Some of them involve a relation qT-T, between the queried temporal value qT and another temporal value T (149a). Some others hold a qT-E relation between qT and an event E (149b). Others present two relations: a first one between qT and an event E, which in turn is related to a time value T (as in 149c). Using the same notation as for the previous two relations, this case can be represented as qT-(E-T), but for simplicity’s sake, such queries will be represented as qT-ET. Queries of the form qT-TT are also possible (149d).3
(149) a. qT-T: [qT When] is [T the first day of winter 1999]?
      b. qT-E: [qT What year] was the toilet [E invented]?
      c. qT-ET: [qT What is the last day] [E to contribute to a Roth IRA] for [T 1999]?
      d. qT-TT: [qT When] was [T Ramadan] [T this year]?
3 Note, however, that they are restricted to two very specific cases:
(i) [qT When] was [T Ramadan] [T this year]?
(ii) a. [qT What day] is [T December 18th] [T this year]?
     b. [qT What week] is [T Feb 2nd] [T this year]?
qT-TT queries like the one in (i) can only be constructed with event-based temporal expressions as the T to which qT is related (Ramadan in the current case). Temporal expressions of this sort are: religious festivities (Chanukah or Easter), other local-based festivities (Carnaval, Les Santes), weather-based seasons (Monsoon, hurricane season), etc. On the other hand, qT-TT queries conforming to the model in (ii) are restricted to having a calendar date as the T to which the qT is related, and accept only days of the week, or week numbers, as possible values for the qT, respectively.
Note that because of the polymorphic nature of the wh-word when, qT-queries introduced by this particle can actually receive as answer a reference to the temporal relation that E or T holds with a second event, E′ or T′ (150c):

(150) a. When did the embargo on Iraq begin?
      b. In mid September 1988.
      c. Before the Kuwait crisis.

In addition to qT-queries, other temporally relevant queries are qE-queries. They look for events as their appropriate answer type (qEs, in our terminology). All the cases identified as qE-queries are introduced by the expression what happened and, interestingly, the set of temporal relations they can convey is equivalent to the one shown for qT-queries. In other words, qE-queries also hold relations between the qE element and a temporal reference T (151a), another event E (151b), an event E that is anchored to a temporal value T (151c), or a time reference T that is anchored to another temporal value T′ (151d).

(151) a. qE-T: [qE What happened] in Czechoslovakia [T in 1968]?
      b. qE-E: [qE What happened] in Vietnam after [E the war]?
      c. qE-ET: [qE What happened] during [T yesterday]’s [E strike]?
      d. qE-TT: [qE What happened] during [T Ramadan] [T this year]?
Other temporally relevant queries are those aiming at answers typed as individuals or values (qI and qV, respectively). qI and qV are in fact arguments of an eventive relation expressed or implicated in the query. qI-queries are exemplified in (152), qV-queries in (153).

(152) a. Who was the ruler of Egypt when Jesus Christ was born?
      b. What president had two vice-presidents die while in office?

(153) a. How many Iraqi civilians have been killed in the last year?
      b. How old was Che Guevara when he was killed?
qI-queries are introduced by wh-phrases like those in (154), where Ni is any individual-denoting noun, and NPi is an NP headed by an Ni. Similarly, qV-queries are distinguishable by wh-phrases of any of the patterns in (155):

(154) who
      what + Ni
      what is/was + NPi

(155) how many/much + Nnon-temporal
      how + Adjscalar
      what + Ndimension
      what is/was + NPdimension

As with qT- and qE-queries, qI- and qV-queries reproduce again the same correlations between the q-term and the element it is temporally related to. This is actually predictable from the fact that q-terms in qI- and qV-queries are arguments of an event-denoting predicate, which is, strictly speaking, the element participating in the temporal relation. The query types for qI- and qV-queries (corresponding to those in (149) and (151) for qT- and qE-queries) are here characterized as expressions of the form qχ E-Υ, where χ is a variable over the types I (for individuals) and V (for values), qχ E is a q-term of type I or V which refers to an argument of event E, and Υ is a temporal entity (either a time value T or an event E′) to which E is temporally related.

(156) a. qI E-T: [qI E [qI Who] was born] on [T December 18th]?
      b. qI E-E: [qI E [qI Who] was the ruler of Egypt] when Jesus Christ [E was born]?
      c. qI E-ET: [qI E [qI Who] was killed] during [T yesterday]’s [E strike]?
      d. qI E-TT: [qI E [qI Where] did G.W. Bush travel] for [T Thanksgiving] [T last year]?

(157) a. qV E-T: [qV E [qV What] was the lowest temperature] [T last winter]?
      b. qV E-E: [qV E [qV How old] was Che Guevara] when he [E was killed]?
      c. qV E-ET: [qV E [qV How many students] were killed] during [T yesterday]’s [E strike]?
      d. qV E-TT: [qV E [qV How many turkeys] were eaten] for [T Thanksgiving] [T last year]?
Contrary to the other temporal query classes seen so far (qT- and qE-queries), qI- and qV-queries can also be temporally relevant and yet not express a temporal relation overtly, as illustrated in examples (145b) above and (158):

(158) a. What company is ranked number 1 on the Fortune 500 list of companies?
      b. What is the population of Iraq?

The examples above are queries looking for an individual I or a value V that is the argument of an event En, computed from a set of temporally ordered events Σ. All elements in Σ have equivalent intension, but receive a different extension depending on the temporal index they are related to. Thus, the answer in (158a) will refer to a different company depending on the year the state of being ranked number 1 on the Fortune 500 list of companies is anchored to. Similarly, the value indicating the population in Iraq will vary from day to day. We will represent these queries as qχ En-(T), where χ is again a variable over the types I and V, En refers to an event E ∈ Σ (the set of temporally ordered events), qχ En denotes a q-term of type I or V which is an argument of En, and T is an implicit temporal index.4

Two distinct strategies for locating event En within Σ can be distinguished. On the one hand, there are cases like those in (159) in which En is calculated on the basis of an ordinal term (first, third) pre-modifying the event-denoting expression. Knowing the temporal anchoring of En to T is therefore not strictly necessary in examples of this kind; T will correspond to the temporal index of En, T = TEn.

(159) qχ En-(T), where T = TEn:
      a. qI E-(T): Who won the first Rose Bowl game?
      b. qV E-(T): What was the score of the third Rose Bowl game?

This is however not the case with the queries in (160–162), where the anchoring to an absolute temporal index (by default a present reference, Tnow) is crucial. Differences in the linguistic encoding of the information allow for grouping those queries into several subclasses, which, correspondingly, require different reasoning schemes.
4 As the notation suggests, these queries correspond to subtypes of qI E-T and qV E-T, exemplified in (156a) and (157a), respectively.
(160) qχ En-(T), where TEn ≺ T, or TEn ≻ T, or TEn = T:
      a. qI E-(T): Who was the previous President in Catalonia?
      b. qV E-(T): What was the previous lowest temperature registered?
(161) qχ En-(Tnow), where TEn = Tnow:
      a. qI E-(T): Who is doing the body count in Iraq?
      b. qV E-(T): How old is Michael Jackson?

(162) qχ En-(Tnow), where TEn = Tnow:
      a. qI E-(T): Who is the President of Venezuela?
      b. qV E-(T): What is the temperature in Ellicotville, NY?

In some cases (such as 160), a sequencing modifier (previous, next, current) is employed to signal the temporal relation between En and the implicit temporal index T. Depending on the sequencing term, the relation between the two entities will be ≺, ≻, or =. A reference to a temporal index is also needed in the queries grouped under (161), which are different from the previous cases in that En is not explicitly temporally ordered. In this case, the temporal relation assumed by default is TEn = Tnow. Queries exemplified by (162) are very similar to those in (161), the main difference being that in these the predicative force is carried by either an agentive nominal (president, landlord, passenger) or a measure-denoting noun (population, height, temperature).

At a higher order of complexity, there is the class of queries looking for the value of the temporal relation itself, identified here as qR-queries. Again, they can be subclassified depending on the types of the entities involved in that relation:5

5 Note that the reasoning capability needed for answering queries like the following is very similar to that required for obtaining the alternative answers of qT-Υ queries in (150), where Υ denotes an entity of temporal nature (T or E).

(163) a. qR(T-T): Is [T Ramadan] [qR before or after] [T Christmas]?
      b. qR(T-E): Was [T Thanksgiving] [qR before or after] [E the 9/11 Commission report]?
      c. qR(T-ET): Was [T Thanksgiving] [qR before or after] [E the 9/11 Commission meeting] in [T November]?
      d. qR(T-TT): Is [T Easter] [qR before, during or after] [T Passover], [T this year]?

(164) a. qR(E-T): Was [E the attack on Falluja] [qR after or during] [T Ramadan]?
      b. qR(E-E): Did [E J. Kerry concede] [qR before or after] [E finishing the ballot counting in Ohio]?
      c. qR(E-ET): Was [E the graduation ceremony] [qR before or after] [E the soccer finals], [T last year]?
      d. qR(E-TT): Was [E the graduation ceremony] [qR before or during] [T May] [T last year]?
Finally, there are also Yes/No queries inquiring about the truth value of a temporal relation made explicit in the text:

(165) a. q(T-T): Is [T Lent] before [T Carnaval]?
      b. q(T-E): Was it [T Ramadan] during [E the attack on Falluja]?
      c. q(T-ET): Was it [T night time] when [E the suspect arrived in Boston] on [T January 8th, 2001]?
      d. q(T-TT): Was [T Thanksgiving] during [T Ramadan] in [T 2003]?

(166) a. q(E-T): Are poinsettias [E popular in Australia] during [T Christmas]?
      b. q(E-E): Did Putin [E lift the embargo] on Iraqi arms sales before [E the end of the war]?
      c. q(E-ET): Were WMD [E found] before [E the attack on Iraq] in [T 2003]?
      d. q(E-TT): Was there any [E combat] during [T Ramadan] [T last year]?
Temporal information is therefore an important component of different kinds of questions in natural language. In some cases, it is present even if there is no explicit temporal reference, be it a wh-expression or an overt temporal expression. QA systems must be sensitive to the various ways temporal relations are conveyed in natural language queries, as well as capable of managing information along the temporal axis.
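To make the above classification concrete at the retrieval front end, the following sketch (ours, not part of this chapter's proposal; the cue lists, labels and function names are illustrative assumptions) shows how a QA system might use shallow surface cues to flag a question as temporally relevant and assign it a coarse class along the lines of the taxonomy above.

    import re

    # Hypothetical, highly simplified cues; the taxonomy in the text is far richer.
    TEMPORAL_WH = re.compile(r"^\s*(when|how long|how often)\b", re.I)
    RELATION_SIGNALS = re.compile(r"\b(before|after|during|while|since|until)\b", re.I)
    SEQUENCING_TERMS = re.compile(r"\b(first|second|third|last|previous|next|current)\b", re.I)

    def coarse_temporal_class(question: str) -> str:
        """Assign a rough temporal query class to a question string."""
        if TEMPORAL_WH.search(question):
            return "qT"               # asks for a time or duration directly
        if RELATION_SIGNALS.search(question):
            return "qR-or-YesNo"      # overt temporal relation between two entities
        if SEQUENCING_TERMS.search(question):
            return "q[I/V]_En"        # individual/value of an event located in a sequence
        return "q[I/V]_En-(Tnow)"     # possibly temporally relevant via an implicit index

    for q in ["When was John F. Kennedy president?",
              "Is Ramadan before or after Christmas?",
              "Who won the first Rose Bowl game?",
              "What is the population of Iraq?"]:
        print(q, "->", coarse_temporal_class(q))

Keyword spotting of this kind is only a first pass; as the rest of the chapter argues, deciding what a temporally relevant question actually asks for requires the richer event and time representation introduced in the following sections.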
3. Representation of Temporal Information for QA Systems
The previous section made clear how important it is that QA systems are sensitive to temporal information of various sorts. In this section, we consider more specifically the kinds of temporal information that might be needed for answering questions and how this information might be represented for use by a QA system. We assume that the construction of a knowledge base for QA will involve marking up a document collection with some annotation language, so the question we address here is what such a markup language has to be like in order to use it to annotate documents for temporally sensitive question answering.

There are clearly types of information that such a language must be able to encode if it is going to be at all useful for retrieving and inferring temporal information from texts. For example, it should be clear from Section 2 that to be useful for retrieving answers to temporally-sensitive questions, a knowledge representation language must have some way to annotate the way natural languages refer to events, times, and temporal relations. We can see that any temporally-aware QA system must have the ability to anchor events in time and order them. By event anchoring we mean placing a given event on a timeline. By event ordering, we mean establishing the relative position of two events in time. To make either of these tasks possible, a language must have some way of uniquely identifying events and times, as well as a way to express relations between these two. The details of how these three primitives are expressed in English, as well as their conceptual background, have been discussed elsewhere (e.g., Setzer, 2001). Here we will be concerned with those features of events, times, and relations which must be encoded in order to answer questions effectively.

The information extraction component of QA is also significant, so we will also have to consider the sorts of time-related information needed by tools and algorithms which automatically extract temporal information. It is important to us that any annotation scheme is equally useful as a language for marking up corpora which can later be used to train and evaluate temporal information extraction algorithms. Information about the tense and aspect of finite verbs, for instance, may not be directly usable for answering questions, but as an important way that natural languages express time, it may be useful to algorithms which determine the anchoring and ordering of events in texts. Further, given the range of linguistic mechanisms involved in expressing
temporal information, a language which allows an incremental, layered approach to information extraction will be valued highly.

3.1. Retrieving information from texts

Events
Events, as well as the kinds of states which change and thus might need to be located in time (we will refer to these as events here), are referred to by finite clauses, nonfinite clauses, nominalizations, event-referring nouns, adjectives, and even some kinds of adverbial clauses, as seen in the following examples.

(167) a. When President Leonid Kravchuk was elected by the Ukrainian Parliament in 1990, he vowed to seek Ukrainian sovereignty.
      b. In July 1994, Ukraine again held free and fair elections.
      c. Vowing to seek Ukrainian sovereignty, Kravchuk . . .
      d. While in office, Kravchuk was always an advocate for . . .

As mentioned above, a language for representing temporal information in texts must have a way of identifying events so they can be anchored and ordered. Just as important for retrieval in QA is some way to indicate whether an event-referring clause includes a negation of the relevant event, as seen in the following.

(168) a. When it became clear that controllers could not contact the plane, . . .
      b. No one reached the site in time.

English has a wide range of mechanisms for expressing what might be referred to as a 'negative' event. It might not be possible or necessary to design an annotation system which indicates how the negation is expressed in every case. However, in order to determine whether one of the statements in (168) constitutes an answer to some question, it must have some way to record the fact that the relevant event is negated in each case. Just as important, though perhaps more difficult for a retrieval system to deal with, is the fact that events are often expressed with various types of modality, as seen below.

(169) a. The United States may extend its naval quarantine to Jordan.
      b. Some assets might be sold to service the debt.
      c. The deal must give inspectors unrestricted access.
      d. Sununu has plenty of support and should be appointed . . .

Epistemic modality, seen in examples (169a) and (169b), has to do with aspects of events such as necessity and possibility. Deontic modality, seen in examples (169c) and (169d), has to do with aspects of events such as obligations agents might have with respect to them, or the permissibility of events. In general, such modality is expressed in English with a modal verb. The modality expressed in each case clearly has implications for the suitability of statements as answers to questions. With no more information than (169a), for example, a question answering system should not treat the event referred to by extend as if it really occurred. One can even imagine domains in which questions refer directly to modalities such as permissibility. (Can a US citizen visit Cuba before 2005?) Thus, a retrieval system should at the very least record the modality of the events in statements like the above. In Section 4, we discuss a way to annotate events that allows information about negation and modality to be represented.

One factor complicating the markup of events in texts is the fact that not every unique event referred to can be associated with a text extent. That is, some text extents refer to multiple events, by quantifying over events as in (170a), or by the use of various kinds of syntactic ellipsis, as in (170b).

(170) a. James taught 3 times on Tuesday. The first time was at 8am.
      b. Marty taught on Friday, but James didn't.

Because the different events referred to by a single text extent may have different negation and modality properties, as well as different sorts of relations to times and other events, it is important for any annotation language to have a way to reify the multiple events referred to in a single text extent. In (170b), two teaching events are referred to, but one is negated. In (170a), there are 3 events of teaching referred to, and they must be represented separately to capture the different times expressed for each one. In Section 4, we discuss a method for associating multiple events with a single text extent.

Times
The main reason for the knowledge base of a QA system to concern itself with time expressions is in order to be able to anchor events to times. It may sometimes be useful for efficient retrieval to order times
with respect to each other, but the main concern will always be trying to place events on a timeline. Times are usually expressed in English by adverbial or prepositional phrases, as shown in the following:

(171) a. on Thursday
      b. November 15, 2004
      c. Thursday evening
      d. in the late 80's
      e. Later this afternoon
      f. yesterday

In order to anchor events to times on a timeline so questions can be answered, it is necessary to normalize time expressions to a representation that can be mapped to a timeline. Such a normalization simplifies by conflating the different ways English has for referring to the same time (e.g., 11/15/04, November 15, 2004, the 15th of November in 2004, etc.). It also resolves any indexical component there might be to a time. Many time expressions refer to a point in time via some indexical anchor, as seen in the following.

(172) a. today
      b. next Friday
      c. last week
      d. in October

All of these expressions refer to a time, but they do not by themselves fully specify that time. They refer via reference to the moment of utterance–in the case of texts, the document creation time. One has to know the time of utterance in order to retrieve the time referred to and normalize it to some machine-readable form. ISO 8601 provides a useful standard for the purpose of normalizing times. However, English has numerous ways to express what might be called 'indeterminate' times, which cannot be determinately linked to a timeline. Some examples follow.

(173) a. in the Fall of this year
      b. recently
      c. yesterday morning
Such times cannot be interpreted directly as parts of a timeline, because their begin and end points are more or less vague. Nevertheless, they can be ordered with respect to most points on a timeline, and so a knowledge representation system for QA must have some way of normalizing them. In Section 4 we discuss a set of indicators that are useful for normalizing many such expressions. The time expressions mentioned so far refer, with greater or lesser granularity and with greater or lesser precision, to coherent ‘chunks’ of the timeline. They provide a means for directly associating particular events with particular parts of the timeline, and are thus of primary importance in QA applications. English contains two more kinds of time expressions which involve slightly more complex means of anchoring events to times. The first of these two types is the duration. Durations refer not to parts of the timeline, but to quantities of time. (174) a. after three weeks b. a day c. for three hours d. a two-hour flight The time expressions in these examples simply indicate the duration of events. They thus might be significant to making inferences about how events are ordered with respect to each other, or making inferences about the location of a particular event on the timeline. These topics will be discussed below. One also cannot rule out the possibility that a QA system might be presented with a query about the duration of a particular event. The representation of such expressions must normalize these periods of time so inferences can be drawn. Other durations are part of a more complex system for indicating the time of an event. (175) two weeks from today We call this latter type of duration anchored durations, because they express the time of an event by making explicit the duration of time between the event and a time. In fact, they can be said to be part of a compositional time expression. For example, in (175), the duration two weeks is anchored to the time expression today. Thus, in combination with the temporal preposition from and the time expressed by today, it refers to a time two weeks after the document creation time. In order to effectively answer questions about the event referred to, a QA retrieval system should have some access to the time referred to. It
could be retrieved via a calculation from the anchor, the duration, and the nature of the relation indicated by from, or this information could be calculated by an information extraction system and stored as part of the annotation of this statement. The language for representing time in QA systems should allow for either possibility. Note that durations can also be used to anchor events to other events. (176) Three weeks before the invasion, most stockpiles were destroyed. The time expression here does not refer to parts of the timeline, but indicates distance along it–the amount of time that separates the italicized events. As such, it does not directly anchor events to times, but may allow the time for an event to be inferred. Like durations anchored to times, the amount of time they indicate should be represented in any temporal language for question answering. The final type of time expression to consider is exemplified below. (177) a. every Thursday b. two weekends per month These time expressions indicate what are referred to as sets of times. They refer neither to coherent chunks of the timeline, nor to distances along it, but to, roughly speaking, groups of distinct pieces of the timeline. They are used to place recurring events on the timeline, particularly when the recurrence is regular. While corpus study reveals that such time expressions are not common in English texts, they are one way English allows events to be associated with the timeline, and a language for representing the temporal aspects of English texts should have some way of normalizing them. A way to do so will be discussed in Section 4. Relations In order to perform the fundamental tasks of anchoring and ordering events, the last major building block required of a temporal annotation language is the ability to represent temporal relations. The language must have some way to characterize the relationship between events and times. English and other natural languages do not usually express the interval which a given event takes on the timeline directly, in terms of its specific endpoints. Instead, they use a range of strategies to indicate a relation between a given event and other times and events in the text. In most cases, the result is that the interval which a given event takes on the timeline is expressed only partially.
(178) A Brooklyn woman who was watching her clothes dry in a laundromat was killed on Thursday evening . . .

In this example, a temporal relation which we might express as is included is predicated between a time and the event referred to. It expresses that the interval in which the event killed occurs is included in the time Thursday evening. This is signalled by the presence of the preposition on. This information is partial in that the precise begin and end points are not specified for the event. The reader only knows that it occurred somewhere within a particular range on the timeline. If specific end and begin points for events were always expressed, it would be possible for a knowledge base to directly represent them as part of the event. However, a relation like 'is included' anchors an event to a time by specifying a pair of ordering constraints on the endpoints of the event; its begin point is after the time referred to by Thursday evening begins and its end point is before Thursday evening ends. Such a complex pattern is best expressed by a relation which can easily be interpreted by machine. The need for relations is seen even more clearly in examples like the following.

(179) a. Not that long ago, before the Chinese takeover, real estate prices in Hong Kong hit a record high.
      b. We were eating dinner when the wall fell on us.
      c. After John left, I realized he still had my pen.

In these examples, the only information that is expressed about the temporal extent of the events is a set of ordering constraints on the begin and end points of the events. (179a) is an example of a before relation, which orders the two events with respect to each other. The ordering constraint implied by this relation might be useful if, for example, the following statement was part of the knowledge base.

(180) The Chinese takeover of Hong Kong took place on July 1, 1997.

Another way that partial information about the temporal extent of events is expressed in English is the tense/aspect system. This grammatical marking system expresses the temporal extent of an event expressed by a finite clause with respect to the time of utterance (in the case of texts, the document creation time), and to a reference time (for details, see Reichenbach, 1947). This can be seen in examples such as the following:
(181) a. Kidnappers kept their promise to kill a store owner they took hostage . . .
      b. The killers had demanded a ransom when they contacted police on Tuesday.

In (181a), the use of the past tense indicates that the event kept is located before the document creation time. In (181b), the use of perfective aspect indicates that the event demanded was completed before the time of the contacting event. We need to consider what relations our language needs to express in order to retrieve answers to questions. Allen (1984) laid out the space of possibilities for relating intervals to one another based on the possible ordering of the endpoints of intervals, as shown in Figure 50.
[Figure 50. The interval relations as defined by Allen (1984): A EQUALS B; A is BEFORE B / B is AFTER A; A MEETS B / B is MET BY A; A OVERLAPS B / B is OVERLAPPED BY A; A STARTS B / B is STARTED BY A; A FINISHES B / B is FINISHED BY A; A is DURING B / B CONTAINS A.]
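As a rough illustration of how the relations in Figure 50 support the kind of inference discussed below, the following sketch (ours; the composition table is only a small, unambiguous fragment of the full Allen calculus) chains two interval relations to derive what is known about a third pair of intervals.

    # A few Allen relations and a partial composition table for chaining them.
    BEFORE, AFTER, DURING, CONTAINS, EQUALS = "BEFORE", "AFTER", "DURING", "CONTAINS", "EQUALS"

    # COMPOSE[(r1, r2)] = relations possible between A and C when A r1 B and B r2 C.
    # Only entries with a single, certain outcome are listed here.
    COMPOSE = {
        (BEFORE, BEFORE): {BEFORE},
        (AFTER, AFTER): {AFTER},
        (DURING, BEFORE): {BEFORE},   # A inside B, and B wholly precedes C
        (DURING, AFTER): {AFTER},
        (EQUALS, BEFORE): {BEFORE},
        (BEFORE, EQUALS): {BEFORE},
    }

    def chain(r1, r2):
        """Return the possible A-C relations; fall back to full uncertainty."""
        return COMPOSE.get((r1, r2), {BEFORE, AFTER, DURING, CONTAINS, EQUALS})

    # E.g. if the record high is BEFORE the Chinese takeover and the takeover is
    # BEFORE the document creation time, the record high precedes that time too.
    print(chain(BEFORE, BEFORE))   # {'BEFORE'}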
English does not succinctly express all of the possible relations between intervals, so it might not be necessary for a temporal annotation language to use all of them. Overlaps, for example, is difficult to find instantiated in natural language text. The most important requirement of the set of relations used by the language under consideration is that it allow inferences to be drawn from pairs of relations. This is
because in natural language texts, it is uncommon for each event to be associated with a time. As we have seen, much more significant is the placement of events with respect to each other and with respect to certain key times. That is, information about the placement of a given event on the timeline is almost always partial and distributed across several clauses. Thus, a QA system needs to have reference to a system of relations which allows it to easily combine the different kinds of temporal information expressed in a set of English statements and make judgements about the temporal location of events. Because reasoning over the Allen relations is well-understood, they provide a good basis for the set of relations needed by a QA system. Further, the more temporally-oriented questions seen in Section 2 are phrased directly in terms of temporal relations.

Subordination relations
Examination of the following examples makes clear that there is, in fact, another kind of event-event relation that is important to be able to represent in a QA knowledge base.

(182) a. Five other U.N. inspection teams visited a total of nine other sites, the agency reported.
      b. . . . said he regretted the civilian casualties . . .
      c. U.S. officials claim they already see signs Saddam Hussein is getting nervous.
      d. German law requires parties to publish the name and address of anyone who donates . . .
      ...

The veracity of the event referred to by the italicized word in each example—whether the event can be treated as real—is affected by the fact that it is embedded under the underlined verb. The sentence does not simply represent the event as being part of the actual past or present, or projected future. Instead, it expresses the event in qualified terms. This is very similar to modality, discussed above. In (182a), for instance, the underlined event is qualified by being the argument of report. Its veracity depends on the reliability of the reporting agent. In (182b), the underlined event is presupposed to be true, because regret is a factive predicate in English (Kiparsky and Kiparsky, 1970). The relations expressed between subordinated events and the events that subordinate them are not temporal relations, per se (though they may have temporal implications); nevertheless, it is crucial that they are
represented in a QA knowledge base. In order to effectively answer a question about an event it is very important to know whether the writer has presupposed its veracity, deferred responsibility for its veracity to another party, or presupposed its falsity, as in (182c). The use of such relations and even their significance will vary from application to application, but a QA retrieval application will have to know that the relation exists. Thus, a language for modelling temporal information in texts should have some way to represent the different sorts of subordination relations that can be expressed. In Section 4, we present a complete set of relations for this purpose. 3.2. Extracting information for use by QA systems As seen above, English (as well as any other natural language) has many mechanisms for expressing temporal properties of events. So far, we have primarily discussed the nature of those temporal properties themselves; we have considered what can be expressed from the perspective of how that information could be represented in such a way that a QA retrieval system could have access to it. Now we turn to considerations of how temporal information might be extracted from natural language texts. A temporal annotation language should also capture the kind of information that might be required by tools and algorithms for automatically annotating temporal information. Such tools and algorithms include machine learning techniques which might use human-annotated documents as training data, as well as more rule-based techniques, which might exploit linguistic regularities to derive temporal relations. We envision a multi-step, layered, information extraction process in which distinct modules may be responsible for extracting different pieces of information and incrementally marking up a knowledge base so information can be retrieved from it. Because the temporal relations representing the anchoring and ordering of events are the ultimate goal of an extraction process, such a process requires a practical way to represent all the building blocks which might be used to determine the temporal relations. Morphosyntax of events As discussed above, the tense/aspect system of English is an important method of locating events on the timeline. Thus, a language used to record information relevant to temporal information extraction systems would have to record the tense and aspect of finite verbs so it could infer a relation between the document creation time and the event based on
the event’s tense. While we would not expect a QA retrieval system to use information about the tense of a verb used to express a given event, tense/aspect information is absolutely necessary for any information extraction system which attempts to anchor events to times. It also seems likely that information about tense and aspect features may be an important component in attempts to extract temporal relations between events. Some positive results have been reported by Gover et al. (1995), Song and Cohen (1991), and Mani et al. (2003) using machine learning techniques for predicting temporal relations between events based on tense and aspect (among other morphosyntactic factors). As mentioned above, many events are not expressed by finite verbs. Nonfinite verbs are quite common ways to refer to events, and nouns and adjectives are not insignificant. Lapata and Pascarides (2004) found that the grammatical category of nonfinite verbs that express events is significant in predicting temporal relations between events which occur in the same sentence. We expect that the part-of-speech of events not expressed with verbs may be significant in drawing inferences about the temporal anchoring of events. For example, adjectives normally encode states, which, when they can be located temporally, are considered events for our purposes. States are fairly unique in that they are generally persistent. This property of persistence may well be usable to make temporal inferences; an introduced state can be assumed to continue unless its termination is explicitly mentioned. So the part-of-speech of an event-expressing term may give clues as to the type of event, which might in turn have implications about the temporal properties of the event. Next, we turn to the importance of a such a typology of events for temporal information extraction. Typology of events There are several different ways to categorize events. The just-mentioned distinction between states and events is based on temporal properties of events, and is part of the Vendler (1957) classification, which focusses on the internal temporal structure of events. It is certainly plausible that a Vendlerian classification of events might provide useful input to algorithms which attempt to anchor and order events in text, and a temporal annotation language could certainly adopt this classification. However, other aspects of Vendler’s scheme involve quite a bit more sublety than the event/state distinction, and the implications of the distinctions for inferring temporal relations are not at all straightforward, so as we discuss in Section 4, the event/state distinction is the only aspect of Vendler’s classification that we adopt. We are, however,
concerned with having the ability to automatically infer or extract the sort of embedding relations mentioned above in 3.1.4. It seems that the sort of embedding relation involved is easily predictable from semantic features of the verb. Consider the following examples of events which might be intuitively classed as 'reporting' events of some sort.

(183) a. In the air, U.S. Air Force fliers say they have engaged in . . .
      b. In Kuwait, the Iraqis have rimmed the capital city with an air-defense system, according to a U.S. official.
      c. A senior law enforcement source tells CNN, the evidence is mostly circumstantial.
      d. At least 51 people were reported killed in clashes between Serb police and ethnic Albanians.
      e. The spokesman added that the deal has not been signed yet.

It is easy to see that these underlined reporting events bear roughly the same sort of relation to their embedded events as described in 3.1.4. They seem to express that the agent of the reporting event is responsible for establishing the veracity of the embedded event. It would seem that a classification which recognized these verbs as belonging to the same class would allow us to easily infer the presence of this particular type of subordination relation. Thus, it would be very useful for a temporal information extraction system to have access to a classification of events which would allow it to predict the sort of subordination relations they introduce, particularly in case there is some level of ambiguity in the way such relations are introduced; if a given verb can have senses which introduce subordination relations and senses which do not, a classification algorithm will be needed in order to automatically infer the presence of these relations. For example, the verb add, which is used in (183e) as a reporting verb, obviously has senses in which it has no reporting meaning and thus does not introduce a subordination relation. This suggests that extracting subordination relations will be a multi-step process which involves the classification of events by some sort of disambiguation. In Section 4, we propose a set of event types for this purpose.

One particular class of events which subordinate other events deserves special mention here, because it introduces subordination relations of a unique sort.
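As a toy illustration of the disambiguation step just described, the sketch below (ours; the verb list and the clausal-complement test are simplifying assumptions, not part of the annotation scheme) decides whether a particular occurrence of a verb should be treated as introducing a reporting-style subordination relation.

    # Hypothetical closed list of reporting verbs, cf. the examples in (183).
    REPORTING_VERBS = {"say", "report", "tell", "claim", "state", "add"}

    def introduces_subordination(lemma: str, has_clausal_complement: bool) -> bool:
        # 'add' only counts as REPORTING when it takes a clausal complement
        # ("The spokesman added that ..."), not in its arithmetic sense.
        return lemma in REPORTING_VERBS and has_clausal_complement

    print(introduces_subordination("add", True))    # True  -> reporting sense
    print(introduces_subordination("add", False))   # False -> e.g. "add two numbers"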
Aspectual verbs and relations
Consider the examples below.

(184) a. The tank began leaking oil on Friday morning.
      b. The phony war has finished and the real referendum campaign has clearly begun.
      c. An intense manhunt continues for Rudolph in the wilderness of western North Carolina.

In each, the underlined verb, rather than expressing an event as such, expresses an important temporal property of the event referred to by the embedded verb. We might say that it 'chooses' an event or a time and expresses a temporal relation between that event or time and the underlined event. For example, in (184a), the event of leaking is said to begin on Friday. If we compare the sentence in (185), we see that different relations between the time Friday morning and the event leaking are expressed in these cases. (184a) gives more specific information about the temporal extent of the leaking event.

(185) The tank was leaking oil on Friday morning.

In the former, a begins relation is expressed between leaking and Friday morning. Without the verb began, only an includes relation is expressed between the two, meaning the temporal extent of leaking includes the time referred to by Friday morning. Thus, the fact that leaking is subordinated to began in this example is a very important fact that a temporal annotation language must record if it is going to be useful for inferring anchorings and orderings of events. The behavior of verbs like begin, stop, continue, etc., closely parallels the grammatical category of aspect, and thus, the underlined verbs are often referred to as 'aspectual' verbs. In Section 4, we introduce a classification of events which includes aspectual events. While the temporal extent of aspectual events is not in itself of interest, they seem to form a class in that they introduce a special sort of subordination relation. Paralleling the event classification, we refer to the sort of subordination relation they introduce as an 'aspectual' relation. Note that aspectual verbs can be subcategorized in terms of the sort of aspectual relation they introduce. For example, verbs like begin, start, and commence all express an aspectual relation which allows one to infer a begins relation between the embedded verb and some other time or event. A temporal annotation language should both recognize the category of aspectual
verbs, and provide a way to characterize the subordination relations they introduce. Signals of temporal relations Above, we saw examples of temporal relations being expressed by temporal prepositions and conjunctions like before, after, while, on, etc. In order to automatically extract temporal relations, it is important to be able to first identify such signals. Because these expressions have multiple uses (for example, on can be used as a locative preposition), it becomes necessary to identify when they are signalling a temporal relation and when they are not. Again, we envision a multi-step information extraction process, in which temporal signals are likely to be identified early so later algorithms can exploit them. Because we also expect that human-annotated corpora will be used for machine learning, it will be necessary for the markup language to have some way to associate temporal relations with the signals that express them. The functional content of time expressions As seen above, time expressions in natural language do not usually fully specify a time. Instead, they often function indexically, picking out a time via reference to some anchoring time in the context, as seen below: (186) a. The White House press secretary reports that the president will leave for Istanbul tomorrow. b. The prime minister’s last visit was in October. c. He didn’t make it to Istanbul until the following Saturday. Tomorrow, for example, does not refer to any particular time until its indexical anchor (usually the document creation time ) is recovered. It refers to a day one day after that anchor. As a time expression on its own, a phrase like October has similar behavior. The October it picks out is picked out with respect to the document creation time. Expressions like the following Saturday parallel this behavior, except that their anchors are times other than the document creation time. It is possible to see such expressions as functional in the sense that they return determinate values based on their anchoring. That is, their meaning returns a value when given an anchor time. While we assume that a QA retrieval system will have more use for the fully-specified, normalized value of a time expression, the possibility of representing the functional content which is the meaning of these expressions would
be extremely useful in an incremental process for extracting temporal information; it allows the process of recovering the functional content of these expressions to be separated from the process of normalizing and fully specifying their value. In Section 4, we present a proposal for representing the functional content of time expressions. 4. TimeML
The questions presented in the question corpus revealed that some understanding of time was necessary in order to both model and answer the questions. Moreover, many of the questions and the data that could answer them involved temporal relationships in an implicit way. That is, while some questions such as When was John F. Kennedy president? require the use of time directly, others are far less explicit. For example, the question Who was president in 1958? is not so directly about time (i.e. it is not a when question), but it surely requires a temporal understanding to answer it.

In Section 3, the features of a representation capable of working in a QA system were discussed. Such a system must be able to represent temporal expressions, events, and relationships. TimeML is a modeling language that has been designed with these features in mind. In this section, we discuss how this is accomplished and point out some of the expressive power of TimeML. The tags employed in TimeML are all intended to assist in the understanding of time so that questions and corpora can be modeled, leading to eventual question answering. To that end, TimeML uses four different tag types. The TIMEX3 tag is used to capture all temporal expressions. The EVENT tag captures all temporal events. Functional words such as at and from are annotated with the SIGNAL tag. Finally, all relationships between the other tags are represented with the LINK tags: TLINK, SLINK, and ALINK. For a complete description of TimeML, the reader can refer to www.timeml.org.

4.1. Representing Temporal Expressions
At the core of any scheme designed to provide temporal understanding is a method for representing specific temporal expressions such as 1961 or today. TimeML models this type of expression with the TIMEX3 tag. There are four types of temporal expressions captured in TIMEX3: time, date, duration, and set, each corresponding to the types described in 3.1.2.
An expression that receives the TIME type is one that refers to a time of the day, even if in a very indefinite way. The easiest way to distinguish a time from a date is to look at the granularity of the expression. If the granularity of the expression is smaller than a day, then the expression is a time. For example, the following expressions fit into this category:
Mr. Smith left
      ten minutes to three
      at five to eight
      at twenty after twelve
      at half past noon
      at eleven in the morning
      at 9 a.m. Friday, October 1, 1999
      the morning of January 31
      late last night
Notice that most of these examples are not fully specified temporal expressions. That is, they appear to be within a context that provides their complete specification, including the date on which they take place. With the exception of the expression 9 a.m. Friday, October 1, 1999, each of these expressions requires more information to fully represent what they entail. This is a recurring phenomenon with temporal expressions that TimeML addresses with temporal functions. This technique will be discussed shortly. The DATE type can be thought of as any expression that refers to a calendar time. Again, there may be some confusion as to when an expression is a time and when it is a date. The granularity test continues to help with this as dates are generally of a day or larger temporal unit. As with times, dates are often underspecified. Here are a few examples:
Mr. Smith left
      Friday, October 1, 1999
      the second of December
      yesterday
      in October of 1963
      in the summer of 1964
      on Tuesday 18th
      in November 1943
      this year's summer
      last week
An expression is a DURATION if it explicitly describes some extent of time. Examples of this are:

Mr. Smith stayed
      2 months in Boston
      48 hours
      three weeks
      all last night
      20 days in July
      3 hours last Monday.
Finally, the SET type is used for expressions that describe a set of regularly recurring times. These are expressions such as:

John swims
      twice a week.
      every 2 days.
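Having seen the four types, a rough version of the granularity test described above can be sketched as follows (ours, not part of TimeML; note that SET cannot be recognized from the value alone, since it also depends on the quant and freq attributes discussed later in this section).

    def timex3_type_guess(value: str) -> str:
        """Crude heuristic over already-normalized values, for illustration only."""
        if value.startswith("P"):
            return "DURATION"          # e.g. P3D, P2Y
        if "T" in value:
            return "TIME"              # e.g. 2004-11-22T15:00, finer than a day
        return "DATE"                  # e.g. 2004-11-22, 1999-10, XXXX-WXX-2

    for v in ["P3D", "2004-11-22T15:00", "2004-11-22", "XXXX-WXX-2"]:
        print(v, "->", timex3_type_guess(v))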
The type of a temporal expression is represented in the tag along with a specific value for the time expression. A temporal expression's value is annotated with an extension of the ISO 8601 standard. For example, a fully specified temporal expression such as the one in (187a) has a value of "2004-11-22". A TimeML annotation produces XML as in example (187b).

(187) a. November 22, 2004
      b. <TIMEX3 tid="t1" type="DATE" value="2004-11-22"> November 22, 2004

The values of temporal expressions that are not fully specified are not as obvious as those whose extent contains all of the necessary information. For these kinds of expression, the value must be normalized. But, before this is discussed, it is useful to examine one more aspect of the simple example in (187). The tid attribute is an automatically assigned ID number that allows the expression to be mentioned elsewhere in the annotation. For instance, "t1" above might participate in a temporal link with some event. The method for doing this is found in 4.4.1, but it is enough to say, for now, that all objects in TimeML receive an ID number similar to the tid given in the TIMEX3 tag. When a temporal expression is not fully specified, placeholders can be used in the value attribute. For example, an expression such as January 12 provides no year information. It can be given a value of
XXXX-1-12. In the case of times and dates, these placeholders are generally removed in favor of a more complete annotation provided by temporal functions. Durations and sets are rarely, if ever, underspecified, but they do receive some special attention in both the value attribute and the TIMEX3 tag as a whole. In the following subsection, temporal functions for times and dates will be described, but, first, we will briefly turn to these special aspects.

The first attribute value of note for durations is contained in value. Durations are required to have a particular format in this attribute because they represent a period of time. A sample annotation for a simple duration is given in (188).

(188) <TIMEX3 tid="t1" type="DURATION" value="P3D"> three days

Durations are also eligible to use two additional TIMEX3 attributes: beginPoint and endPoint. These are used to capture what were called in Section 3 anchored durations. For example, the expression a week from Monday has a begin point, namely, the tid for Monday. With this information, the actual date that the full phrase refers to can be calculated. TimeML allows for an additional TIMEX3 to be created to annotate the missing point. This is a useful and necessary part of TimeML. The following example reveals why.

(189) John will leave a week from Monday.

Although we have not yet introduced the TimeML methods for capturing events and temporal relationships, it should be clear that leave is linked in some way to the expression a week from Monday. Yet, it is not directly related to either a week or Monday. Using the method described above, a tid can be created that can participate in a link such that leave is truly anchored to the correct time.

In the case of the set type, the value attribute must work together with at least one of two additional TIMEX3 attributes: quant and freq. The former represents any quantifier that is used in the expression. For instance, every Tuesday would receive a quant of EVERY and a value of "XXXX-WXX-2", the ISO 8601 representation of Tuesday. The frequency of the expression is represented in the freq attribute as in 3 days each week. The annotation of this expression is given in (190).
(190) <TIMEX3 tid="t1" type="SET" value="P1W" quant="EACH" freq="3D"> 3 days each week

Functional Content of Temporal Expressions
TimeML strives to capture all temporal expressions with the TIMEX3 tag, but, as is apparent in the above examples, many of these expressions seem to be missing information critical to their full specification. In fact, analysis of the corpus reveals that there are generally very few fully specified temporal expressions. The reader uses these to fully appreciate the rest of the temporal expressions. Temporal functions are TimeML's way of doing the same thing. When a TIMEX3 is underspecified, it is anchored to a fully specified temporal expression. This is often the expression that includes the functionInDocument attribute in its TIMEX3. For example, a news report often includes a specific document creation time. If the article refers to today, that expression is anchored to the document creation time to complete its specification. In the same manner, an expression such as July 9 is underspecified until the appropriate year is supplied. Since that information can be extracted from the document creation time, it is anchored to that TIMEX3 and the correct year is added to the value of the July 9 TIMEX3.

When an expression requires an anchoring to be completely specified, an attribute called temporalFunction receives a "true" value. When an annotation is done manually, this attribute is just an indication that the value of the TIMEX3 was calculated by way of a temporal anchor, which the annotator must also supply. An automatic annotation will use functions to do the same thing. In the next sections, descriptions of these functions are provided along with examples of the functions in action. Notice that the underspecified TIMEX3s still have three core attributes: tid, type, and value. When a temporal function is also used, three more attributes are added:

− temporalFunction – a boolean attribute that indicates that a function is necessary
− anchorTimeID – the tid of another TIMEX3 that provides information to the temporal function
− valueFromFunction – the tfid, or temporal function ID, of the function that completes the TIMEX3 specification

The reader may wonder why the value and valueFromFunction attributes are both used since expressions that require functions, by
definition, do not contain enough information to provide a value. However, it is not always the case that the expression doesn't contain any specific temporal information at all. In cases such as today, the extent of the tag cannot lend any information to the value attribute and, truly, the temporal function must do all the work. Still, cases such as Wednesday do contain specific information that should be captured by the TIMEX3 tag. In the former case, the value must be something like "XXXX-XX-XX", where the X-placeholder is used to show that the format of this value should be that of a DATE, but that no other information has been provided. In the latter case, though, it is useful to capture that the expression makes use of specific temporal information by giving a value of "XXXX-WXX-3".

Specification of Selected Temporal Functions

1. Usage: Indicate a future reference
   argumentID: ID of last anchor in the chain of functions
   Example: in the future
   <TIMEX3 tid="t1" type="DATE" value="FUTURE_REF" temporalFunction="true"
           valueFromFunction="tf1" anchorTimeID="t0"> future

2. Usage: Indicate a past reference
   argumentID: ID of last anchor in the chain of functions
   Example: in the past
   <TIMEX3 tid="t1" type="DATE" value="PAST_REF" temporalFunction="true"
           valueFromFunction="tf1" anchorTimeID="t0"> past

3. Usage: Indicate a present reference
   argumentID: ID of last anchor in the chain of functions (i.e. whatever time "now" refers to)
   Example: now
332
PUSTEJOVSKY ET AL.
   <TIMEX3 tid="t1" type="DATE" value="PRESENT_REF" temporalFunction="true"
           valueFromFunction="tf1" anchorTimeID="t0"> now

4. Usage: Returns the enclosing time period of the specified type given in scale
   argumentID: ID of last anchor in the chain of functions; generally, the DCT ID for simple temporal expressions
   scale: Name of a type of time period (granularity); "hour, minute, day, year", etc.
   Example: this week
   <TIMEX3 tid="t1" type="DURATION" value="P1W" temporalFunction="true"
           valueFromFunction="tf1" anchorTimeID="t0"> this week

5. Usage: Given a time period of a standard granularity, returns a new time period of the same type that precedes or succeeds the original by the number given in count
   argumentID: ID of last anchor in the chain of functions
   count: Numeric attribute that specifies how much to move on the timeline
   signalID: ID of the signal that prompted the use of the function
   Example: 4 weeks ago
   <TIMEX3 tid="t1" type="DURATION" value="P4W" temporalFunction="true"
           valueFromFunction="tf1" anchorTimeID="t0"> 4 weeks
   <SIGNAL sid="s1"> ago
   signalID="s1"/>

6. Usage: Indicates a modification of the argument time or time period; an approximation function
   argumentID: ID of last anchor in the chain of functions
   signalID: ID of SIGNAL that prompted the use of the function
   direction: later|earlier (for times), larger|smaller (for time periods), unspecified (for adjustments in either direction)
   quantity: a numeral, unspecified, or small that indicates the amount of adjustment
   value: the value of the argument time or time period – this information is captured in the TIMEX3 tag, so this attribute should likely be dropped.
   Example: for just over two years
   <TIMEX3 tid="t1" type="DURATION" value="P2Y" temporalFunction="true"
           valueFromFunction="tf1"> for just over two years

4.2. Representing Events
The goal of TimeML is to provide a language for the representation of temporal relations. Temporal expressions, captured with TIMEX3, are the first ingredient in many of these relationships. Events are the next ingredient and are primarily represented with the EVENT tag, followed by the MAKEINSTANCE tag.

The EVENT Tag
Much like the TIMEX3 tag, TimeML captures several different types of event. The type of event is stored in the class attribute. A TimeML event will fit into one of these categories:

1. REPORTING: When a person or organization declares something, narrates an event, or informs about an event, the event that describes that action is of the REPORTING class. These are generally verbs such as: say, report, tell, explain, state.
2. PERCEPTION: This class includes events that involve the physical perception of another event. Such events are typically expressed by verbs like: see, watch, glimpse, behold, view, hear, listen, overhear.

3. ASPECTUAL: In languages such as English and French, there is a grammatical device of aspectual predication, which focuses on different facets of event history:
   a) Initiation: begin, start
   b) Reinitiation: restart, reinitiate, reignite
   c) Termination: stop, cancel
   d) Culmination: finish, complete
   e) Continuation: continue
   Events that are of this class also participate in a particular kind of TimeML link called an ALINK (for "Aspectual Link") so that the relationship between the ASPECTUAL event and the one it predicates over can be shown.

4. I ACTION: An I ACTION is an Intensional Action. An I ACTION introduces an event argument, which must be in the text explicitly. The event argument describes an action or situation from which we can infer something given its relation with the I ACTION. For instance, the events introduced as arguments of some I ACTIONS may not necessarily have occurred when the I ACTION takes place. Explicit performative predicates are also included here. Note that the I ACTION class does not cover states as they have their own associated classes. For the most part, events that are tagged as I ACTIONs are in a closed class. The following list provides a sampling of this class:
   a) attempt, try, scramble
   b) investigate, investigation, look at, delve
   c) delay, postpone, defer, hinder, set back
   d) avoid, prevent, cancel
   e) ask, order, persuade, request, beg, command, urge, authorize
   f) promise, offer, assure, propose, agree, decide
   g) swear, vow
   h) name, nominate, appoint, declare, proclaim
   i) claim, allege, suggest
5. I STATE: I STATE events are similar to the previous class. This class includes states that refer to alternative or possible worlds (delimited by square brackets in the examples below), which can be introduced by subordinated clauses (a), nominalizations (b), or untensed VPs (c):
   a) Russia now feels [the US must hold off at least until UN secretary general Kofi Annan visits Baghdad].
   b) "There is no reason why we would not be prepared for [an attack]".
   c) The agencies fear they will be unable [to crack those codes to eavesdrop on spies and crooks].
   Here again is a list of events that fall into this category:
   a) believe, think, suspect, imagine, doubt, feel, be conceivable, be sure
   b) want, love, like, desire, crave, lust
   c) hope, expect, aspire, plan
   d) fear, hate, dread, worry, be afraid
   e) need, require, demand
   f) be ready, be eager, be prepared
   g) be able, be unable

6. STATE: STATEs describe circumstances in which something obtains or holds true. However, only certain events in this category are annotated in TimeML:
   a) States that are identifiably changed over the course of the document being marked up. Remember that TimeML's chief concern is to annotate temporal events. If a STATE is deemed persistent throughout the event line of the document, it is factored out and not annotated. Conversely, if a property is known to change during the course of events represented or reported in the article, that property is marked as a STATE.
   b) States that are directly related to a temporal expression. If a STATE directly participates in a temporal relationship, it must be annotated to do so. Again, this is an example of limiting TimeML STATEs to ones that involve time.
   c) States that are introduced by: an I ACTION, an I STATE, or a REPORTING event.
   d) Predicative states the validity of which is dependent on the document creation time.

7. OCCURRENCE: This class includes all the many other kinds of events describing something that happens or occurs in the world. Essentially, this is a catch-all category for events that participate in the temporal annotation, but do not fit into any of the above categories.

The annotation of an EVENT is quite simple as it only includes the class attribute and a tag that identifies it. The following tag holds much more information about the event, or rather an instance of that event. As such, examples of annotated EVENTs are provided below.

The MAKEINSTANCE Tag
Once an event is tagged in TimeML, an instance of that event is created with the MAKEINSTANCE tag. It is this event instance that participates in temporal relationships. The need for this extra tag arises from corpus analysis. MAKEINSTANCE is also the first example of an offline TimeML tag. That is, both the TIMEX3 and EVENT tags are inserted directly into a document so they surround the text they capture. Again, the data calls for instances of an event to be annotated out of line because these instances do not always capture text directly from the document. The first incarnation of this extra tag was developed to capture multiple instances of an event. The following simple sentence reveals why MAKEINSTANCE is necessary in this case:

(191) "John teaches on Monday and Wednesday."

One might believe the EVENT and TIMEX3 tags along with the soon to be discussed temporal relationship tags could successfully capture the information this sentence contains. However, without multiple instances of the teaches event, such a relationship would suggest that the same event occurs on both Monday and Wednesday. The MAKEINSTANCE tag allows a more accurate representation of this sentence such that the occurrences of teaches on Monday and Wednesday are unique. In the above sentence, the teaches event is annotated first with an EVENT tag and then with two MAKEINSTANCE tags:

John <EVENT eid="e1" class="OCCURRENCE"> teaches
on Monday and Wednesday.
<MAKEINSTANCE eiid="ei1" eventID="e1" tense="PRESENT" aspect="NONE"/>
<MAKEINSTANCE eiid="ei2" eventID="e1" tense="PRESENT" aspect="NONE"/>

Along with this ability to create multiple instances of an event, the MAKEINSTANCE tag captures other information about the event instance. The values for these attributes are generally lexically motivated. The tense and aspect of the event are represented in the appropriate MAKEINSTANCE for that event along with the instance's modality and polarity. Again, this information must appear in an offline tag because it can change for multiple instances of the event. A simple sentence can demonstrate this:

(192) "John teaches on Monday but might not on Tuesday."

Here, one instance of teaches contains both a modal and negation operator while the other does not:

John <EVENT eid="e2" class="OCCURRENCE"> teaches on Monday but might not on Tuesday
<MAKEINSTANCE eiid="ei1" eventID="e2" tense="PRESENT" aspect="NONE"/>
<MAKEINSTANCE eiid="ei2" eventID="e2" tense="PRESENT" aspect="NONE" modality="MIGHT" polarity="NEG"/>
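The modality and polarity attributes on the second instance are exactly what a retrieval component needs in order not to treat might not ... teach as an asserted fact. The following sketch (ours; closing tags and a wrapping element are added so that the fragment parses as XML) filters event instances on those two attributes.

    import xml.etree.ElementTree as ET

    # A well-formed rendering of the example above (closing tags added by us).
    doc = """<text>John <EVENT eid="e2" class="OCCURRENCE">teaches</EVENT>
    on Monday but might not on Tuesday
    <MAKEINSTANCE eiid="ei1" eventID="e2" tense="PRESENT" aspect="NONE"/>
    <MAKEINSTANCE eiid="ei2" eventID="e2" tense="PRESENT" aspect="NONE"
                  modality="MIGHT" polarity="NEG"/></text>"""

    root = ET.fromstring(doc)
    for inst in root.iter("MAKEINSTANCE"):
        asserted = inst.get("polarity", "POS") != "NEG" and inst.get("modality") is None
        print(inst.get("eiid"), "asserted as factual:", asserted)
    # ei1 asserted as factual: True
    # ei2 asserted as factual: False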
4.3. Representing Signals The final TimeML tag that is used before relationships are represented is the SIGNAL tag. Functional words such as before and during are captured with this tag to make explicit the part they play in determining relationships between times and events. Once a text has been given an annotation for times, events, and signals, TimeML can begin to relate them to each other. In the next section, these relationships will be described. A complete annotation for a simple sentence, prior to any links being added, looks like this:
John <EVENT eid="e2" class="OCCURRENCE"> teaches
<MAKEINSTANCE eiid="ei1" eventID="e2" tense="PRESENT" aspect="NONE"/>
<SIGNAL sid="s1"> at
<TIMEX3 tid="t1" type="TIME" value="2004-11-22T15:00" temporalFunction="TRUE" anchorTimeID="t2"> 3:00
<SIGNAL sid="s2"> on
<TIMEX3 tid="t2" type="DATE" value="2004-11-22"> November 22, 2004 .
4.4. Representing Relationships Many events are explicitly anchored to a specific time within a document. An article might include a sentence such as: (193) “John taught at 3:00 p.m.” In this case, the taught event can be stamped with the 3:00 p.m. time so that this anchoring relationship is clear. Time stamping is an effective way to represent some temporal relationships, but it cannot capture relationships that involve the ordering of events and times, or any other relationships between two events. For example, the subordinating relationship an I ACTION has with another event is key to the understanding of the text. That relationship may be a modal one that calls into question whether the latter event actually takes place, or it could negate that latter event altogether. With instances of events available along with the annotated temporal expressions, TimeML can effectively do time stamping with a LINK tag, presented in the following section. TimeML is not limited to this kind of temporal relationship, though. The LINK tags capture
both anchoring and ordering relationships as well as subordinating and aspectual ones between event instances. TIMEX3 and EVENT tags only begin to reveal the representational power of TimeML. In order to adequately represent text and queries for question answering, an annotation requires a method for capturing all sorts of temporal relationships as well as other relationships that have already been touched upon. To that end, there are three LINK tags in TimeML:

1. TLINK: Temporal Link, captures anchoring and ordering relationships
2. SLINK: Subordinating Link, captures subordinating relationships between event instances
3. ALINK: Aspectual Link, captures aspectual relationships between an ASPECTUAL event (instance) and the event instance over which it predicates

As with the MAKEINSTANCE tag, these linking tags appear offline since they don't specifically capture any text. Each tag has particular attributes associated with it. The most crucial of these is the relType attribute, which has different possible values depending on the type of the link. Since the relType is the primary indicator for what relationship the participating temporal entities share, this attribute will be the focus of the following discussion of each tag.

Temporal Links
A TLINK or Temporal Link represents the temporal relationship holding between events, times, or between an event and a time. Note that EVENTs participate in a TLINK by means of their corresponding event instance IDs. In the present explanation, however, the words "events" and "event instances" are used interchangeably. This same observation applies also for SLINKs and ALINKs, below. As a rule, EVENTs never participate in a LINK. Only their associated event instances are eligible. The following enumeration describes the possible values for the attribute relType in a TLINK tag:

1. Simultaneous: Two event instances are judged simultaneous if they happen at the same time, or are temporally indistinguishable in context, i.e. occur close enough to the same time that further distinguishing their times makes no difference to the temporal interpretation of the text.

2. One before the other:
As in the following example between the events slayings and arrested: The police looked into the slayings of 14 women. In six of the cases suspects have already been arrested.
3. One after the other: This is just the inverse of the preceding relation. So the two events of the previous example can alternatively be annotated as expressing an after relation, if the directionality is changed. 4. One immediately before the other: As in the following sentence between crash and died: All passengers died when the plane crashed into the mountain.
5. One immediately after the other: This is the inverse of the preceding relation. 6. One including the other: As is the case between the temporal expression and the event in the following example: John arrived in Boston last Thursday.
7. One being included in the other: The inverse relation to the preceding one. 8. One holds during the other: Specifically applicable to states or events that persist throughout a duration, for example: James was CTO for two years. John taught for 20 minutes on Monday.
9. One being the beginning of the other: As holds between the first of the temporal expressions and the event in the following example: John was in the gym between 6:00 p.m. and 7:00 p.m.
10. One being begun by the other: The inverse relation to the one just introduced. 11. One being the ending of the other: John was in the gym between 6:00 p.m. and 7:00 p.m.
12. One being ended by the other: The inverse relation to the one just introduced.
13. Event identity: Event identity is also annotated via the TLINK. The relationship is used when two events are deemed to be the same event within the document. E.g.: John drove to Boston. During his drive he ate a donut.
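Concretely, these thirteen relations surface in annotated text as values of the relType attribute. The sketch below simply lists them, using the attribute spellings standardly employed for TimeML TLINKs; the Python wrapper itself is our own illustration and not part of the specification.

from enum import Enum

# The thirteen TLINK relations above, under their usual TimeML relType spellings
# (the Python class itself is an illustrative wrapper, not part of TimeML).
class TLinkRelType(Enum):
    SIMULTANEOUS = "SIMULTANEOUS"
    BEFORE = "BEFORE"
    AFTER = "AFTER"
    IBEFORE = "IBEFORE"          # immediately before
    IAFTER = "IAFTER"            # immediately after
    INCLUDES = "INCLUDES"
    IS_INCLUDED = "IS_INCLUDED"
    DURING = "DURING"            # holds throughout a duration
    BEGINS = "BEGINS"
    BEGUN_BY = "BEGUN_BY"
    ENDS = "ENDS"
    ENDED_BY = "ENDED_BY"
    IDENTITY = "IDENTITY"        # event identity

print(TLinkRelType("IS_INCLUDED").name)   # IS_INCLUDED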
With this rich library of possible temporal relationships, the TLINK can both anchor an event instance to a particular time and order event instances with respect to one another. In addition, some of these relationships work specifically with events of the duration type. TLINK is arguably the most important tag in all of TimeML. It greatly increases the power of the annotation by providing the tools for temporal ordering, a feature lacking in traditional time stamping procedures. Whether a question itself requires ordering in its representation or the text that can answer that question necessitates it, the anchoring and ordering capabilities of the TLINK tag greatly increase the likelihood that question answering can be achieved. To see these TLINKs in action, we complete the example from the end of the last section, adding two temporal links:
John <EVENT eid="e2" class="OCCURRENCE">teaches</EVENT>
<MAKEINSTANCE eiid="ei1" eventID="e2" tense="PRESENT" aspect="NONE"/>
<SIGNAL sid="s1">at</SIGNAL>
<TIMEX3 tid="t1" type="TIME" value="2004-11-22T15:00" temporalFunction="TRUE" anchorTimeID="t2">3:00</TIMEX3>
<SIGNAL sid="s2">on</SIGNAL>
<TIMEX3 tid="t2" type="DATE" value="2004-11-22">November 22, 2004</TIMEX3>.
"IS_INCLUDED" signalID="s1"/>
Subordinating Links An SLINK or Subordination Link is used for contexts introducing relations between two events. SLINKs are of one of the following sorts: − Modal: This relation is brought up by events introducing a reference to a possible world – mainly I ACTIONs and I STATEs: John promised Mary to buy some beer. Mary wanted John to buy some wine.
− Factive: Certain verbs introduce an entailment (or presupposition) of their argument's veracity. They include forget (with a tensed complement), regret, or manage: John forgot that he was in Boston last year. Mary regrets that she didn't marry John. John managed to leave the party.
− Counter-factive: Contrary to the previous relation, in this case the event introduces a presupposition about the non-veracity of its argument: forget (to), unable to (in past tense), prevent, cancel, avoid, decline, etc. John forgot to buy some wine. Mary was unable to marry John. John prevented the divorce.
− Evidential: Evidential relations are typically introduced by REPORTING or PERCEPTION events: John said he bought some wine. Mary saw John carrying only beer.
− Negative evidential: Introduced by REPORTING and PERCEPTION events conveying negative polarity: John denied he bought only beer.
− Conditional: Introduced by the presence of an ‘if’ construction: If John buys only beer, Mary will get the wine.
SLINKs can be either lexically or structurally-based. Those that are lexically-based are SLINKs introduced by i action, i state, perception, and reporting events. These events generally take a clausal complement or a noun phrase headed by an event-denoting nominal. The SLINK is established between those events and the one denoted by the complement. An SLINK is always introduced when an event is tagged as being in one of these classes. Structurally-based SLINKs are motivated by purpose clauses and conditional constructions. In the first case, an SLINK relates the event in the main clause (bold face) and the one in the purpose clause modifying it (underlined): The environmental commission must adopt regulations to ensure people are not exposed to radioactive waste.
In this example, adopt puts ensure in a modal context, motivating an SLINK with a modal reltype. In a conditional construction, an SLINK relates the event in the antecedent clause and the one in the consequent clause: On Dec. 2 Marcos promised to return to the negotiating table if the conflict zone was demilitarized.
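By way of illustration, the sketch below shows one possible in-memory representation of an SLINK; the class and field names are ours, while the relType values follow the spellings used in TimeML annotation.

from dataclasses import dataclass
from enum import Enum

# Hypothetical helper types for working with SLINK annotations; the class and
# field names are ours, the relType value spellings are the TimeML ones.
class SlinkRelType(Enum):
    MODAL = "MODAL"
    FACTIVE = "FACTIVE"
    COUNTER_FACTIVE = "COUNTER_FACTIVE"
    EVIDENTIAL = "EVIDENTIAL"
    NEG_EVIDENTIAL = "NEG_EVIDENTIAL"
    CONDITIONAL = "CONDITIONAL"

@dataclass
class SLink:
    event_instance_id: str            # the subordinating event instance (e.g. a REPORTING event)
    subordinated_event_instance: str  # the event instance in its clausal complement
    rel_type: SlinkRelType

# "John said he bought some wine": the saying event evidentially subordinates the buying.
link = SLink("ei1", "ei2", SlinkRelType.EVIDENTIAL)
print(link)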
Aspectual Links An ALINK or Aspectual Link represents the relationship between an aspectual event and its argument event. Examples of the aspectual relations to be encoded are: 1. Initiation: John started to read.
2. Culmination: John finished assembling the table.
3. Termination: John stopped talking.
4. Continuation: John kept talking.
An ALINK is required whenever an event is classified as ASPECTUAL. They are, as such, completely motivated by the text. When a document is fully annotated with each of these links, an accurate picture of the text takes shape with respect to time. It is hoped and believed that this picture and the similar annotation of questions will lead to good temporal understanding and question answering abilities.
5. Conclusion
In this paper, we have presented a descriptive framework with which to examine the temporal aspects of natural language queries. We then demonstrated generally how tense and temporal information is encoded in language, motivating a particular specification description. The rest of the paper reported on work done towards establishing a broad and open standard metadata markup language for natural language texts, examining events, temporal expressions, and their orderings. What is novel in this language, TimeML, we believe, is the integration of three efforts in the semantic annotation of text: TimeML systematically anchors event predicates to a broad range of temporally denotating expressions; it provides a language for ordering event expressions in text relative to one another, both intrasententially and in discourse; and it provides a semantics for underspecified temporal expressions, thereby allowing for a delayed interpretation. Significant efforts have been launched to annotate the temporal information in large textual corpora, according to the specification of TimeML described above. The result is a gold standard corpus, known as TimeBank, which has been released for general use. We have also worked towards integrating TimeML with the DAML-Time language (Hobbs, 2002; Hobbs and Pustejovsky, 2003), for providing an explicit interpretation of the markup described in this paper. It is hoped that this effort will provide a platform on which to build a multi-lingual, multi-domain standard for the representation of events and temporal expressions. We are currently building on this work to develop temporal awareness algorithms in the context of question answering systems within the AQUAINT program.
Acknowledgements The authors would like to thank the other members of the TERQAS and TANGO Working Groups on TimeML for their contribution to the specification language presented here. In particular, we would like to thank Jerry Hobbs, Inderjeet Mani, Rob Gaizauskas, and Graham Katz. This work was performed in support of the Northeast Regional Research Center (NRRC) which is sponsored by the Advanced Research and Development Activity in Information Technology (ARDA), a U.S. Government entity which sponsors and promotes research of import to the Intelligence Community which includes but is not limited to the CIA, DIA, NSA, NIMA, and NRO. It was also funded in part by the Defense Advanced Research Projects Agency as part of the DAML program under Air Force Research Laboratory contract F30602-00-C0168.
TIM FERNANDO
FINITE-STATE DESCRIPTIONS FOR TEMPORAL SEMANTICS
1. Introduction
The usual approach to inference in model-theoretic semantics is to restrict attention to models that satisfy certain formulas, among which are meaning postulates stating, for instance, that bachelors are unmarried. Computational costs aside, one might seek to deduce a proposition A from such formulas before simply stipulating A or, for that matter, devising some special-purpose mechanism that spits out A. The question for any given A, however, is whether such a deduction can be carried out with some generality whilst avoiding unwanted entailments. Consider the pairs (1a,b) and (2a,b) from the critique in Ramsay (1994) of coercion in Moens and Steedman (1988) and co-specification in Pustejovsky (1991). (|= is understood here to denote entailment; ⊭ its denial.) (194) a. Harry was hiccuping |= Harry hiccupped twice b. Henriette was crossing the road ⊭ Henriette crossed the road [once, let alone twice] (195) a. John baked a cake |= that baking produced that cake b. John baked a potato ⊭ that baking produced that potato Ramsay claims that suitable meaning postulates account for the a/b contrasts. Very briefly, road crossing is temporally extended “whereas hiccuping is (conceptually) instantaneous” (Ramsay, 1999); and cakes are produced by baking, potatos are not. No rocket science here; only a modicum of common sense that underlies natural language understanding. But will an encoding by meaning postulates do? Reflecting on (1a), it is not clear that the implication is unproblematic. Just as a bus may run over Henriette before she reaches the other 347 H. Bunt and R. Muskens, (eds.), Computing Meaning, Volume 3, 347–368. c 2007 Springer.
side of the road, so too might some irritable room-mate put Harry out of his misery before his second hiccup culminates.1 (196) Harry was hiccuping when Sally shot him dead. Far-fetched or not, the possibility of killing off Harry in the middle of his second hiccup would seem to threaten (1a). We shall return to the progressive in §2.3 below. For now, suffice it to ask: do we want to infer (1a) by ordinary reasoning from meaning postulates? I think we do not. (1a) holds only with a pesky qualification that is far from incontrovertible. Ramsay’s statement “hiccuping is (conceptually) instantaneous” is drawn out in Ramsay (1994) as hiccuping events don’t [take time] (or rather . . . we don’t think about the time they take).
Conceptualizations such as events can be very fragile. Example (4) is from Verkuyl (2000). (197) a. Piet warned him three times. b. What Piet did three times was warn him. c. What Piet did was to warn him three times. Asking how many things Piet did (i.e. how many events Piet acted in) is as tricky as asking how many time units there are in a minute. What time units? If by time units, we mean seconds, then the answer is 60; if we mean hours, then the answer is 0, or rather 1/60. Talk of events in the abstract is slipperier still than talk of stretches of time, with units unspecified. The fragility of aspect is well-known; in Steedman (2000), we read aspectual categories like activity and accomplishment are ways of viewing a happening, rather than intrinsic properties of verbs and associated propositions, or of objective reality and the external world.
Although the extent to which language and reality can be separated is debatable, I think it is useful to say cakes and potatos inhabit the world in a way that progressives do not. Surely, the change in (5) from cakes to potatos has consequences that can be put down to worldly facts about cakes versus potatos. (198) a. John baked a cake. b. John baked a potato. 1
I am indebted to Martin Emms for this application of the imperfective paradox (Dowty, 1979).
The question about (5) brought out in (2) is how can a cake be produced in (5a), but a potato not in (5b)? Ramsay (1994) writes It could be, of course, that bake is an ambiguous item and that the properties of cake and potato select between its possible interpretations. This is such an unattractive solution that I will not consider it further.
Ramsay’s solution is to add meaning postulates stating that cakes are produced by baking, and that (†)
no cake can get baked twice.
Notice that if we reject (†), then we deny the general principle from which to infer the creation of a cake in (5a). In this case, the contrast between (5a) and (5b) to explain is the inference that (5b) produces no potato. The focus then shifts to a meaning postulate ruling out the creation of a potato by baking. So why should it be acceptable to apply meaning postulates in (5), but not in (1)? Because the issue of truth in (5) concerns stuff in the world (viz. cakes and potatos) whereas that in (1) has to do with linguistic conventions reflecting conceptualization. Potatos and cakes are not, in the main, the fragile conceptualizations that instantaneous events are. To say that potatos are “ways of viewing a happening” is to invite a discussion of vagueness (borderline potatos) and/or figurative uses (“John is a potato”) that (although extremely interesting) I trust we can, for the present, put aside. The distinction between the world and “ways of viewing a happening” is at the root of the tension between formal (model-theoretic) semantics, oriented around the external world, and conceptual (cognitive) semantics, centered around mental representations (e.g. Jackendoff, 1991). The former emphasizes the structures that model the external world, distinct from the descriptions/formulas that the mental representations of the latter come to. Cramming progressives alongside potatos in one model-theoretic universe risks conflating structure with description. But so what? Supposing we care only to draw inferences, and have no interest in classifying them, should we bother distinguishing description from structure? Important work in computational linguistics depends on such a distinction. Labelling some of that work the “Description Theory of linguistic representation,” Muskens (2001) argues that The move from structures and truth conditions to descriptions of structures and their truth conditions offers a uniform and natural way to underspecify syntax and semantics . . .
Applying underspecification to the topic at hand, consider (6). (199) a. It rained for an hour. b. It rained in an hour.
[began to rain]
If treating “bake” in (5a) and (5b) differently is (as Ramsay claims) so “unattractive” then equally would not distinct analyses of “rain” in (6a) and (6b) be as well? Underspecification allows us to associate a single representation with “rain” that can then be fleshed out in opposing directions by the “for” and “in” temporal modifications in (6a) and (6b). This is shown in section 2 below, where the parallel with (5) is developed using regular languages as representations to account for well-known entailment patterns illustrated by (7) and (8). (200) a. Carl drank beer for an hour. b. Carl was drinking beer |= Carl drank beer (201) a. Carl drank six pints in/*for an hour. b. Carl was drinking six pints ⊭ Carl drank six pints The account in section 2 is an ontologically innocent alternative to mereological approaches such as Krifka (1989). As for the entailment (1a) from “Harry was hiccuping” to “Harry hiccuped twice,” even if we grant that the entailment there is soft (in view of counter-examples such as (3)), the tendency to accept it cries out for some explanation. An attempt at this is made in section 3, where issues of granularity and iteration are taken up. Temporal instants are turned into intervals, and iterations are analyzed in what Westerståhl (1989) calls “logic with free quantifier variables” (which, as far as axiomatization goes, is essentially no harder than first-order logic). The application here of generalized quantifiers (albeit underspecified) is consonant with the objection in Verkuyl (2000) to Davidson’s “wish to restrict his formalism to a first-order language.” Verkuyl questions the usefulness of Davidsonian events in compositionally piecing together a verb and its arguments (what he calls “inner aspectuality”), a problem he attacks instead through certain successor, path and participancy functions (having set aside any qualms about higher-order objects). Jointly, these functions may yield not just one event (that might be existentially quantified à la Davidson, 1967) but any number of events
(taking us into the realm of pluralities).2 Verkuyl’s claim that “it is only after the construal of the V[erb ]P[hrase] information that one may begin to speak about events” suggests that during VP assembly, events are underspecified. Ideas of Verkuyl’s have played an important role in work by Naumann (e.g. 2001), a finite-state form of which is developed in Fernando (2002), drawing heavily on Steedman (2000). This chapter elaborates features of Fernando (2002) that pertain to underspecification and to more general computational concerns. The label “finite-state” reflects an interest in computational methods short of predicate logic. (This explains, in part, my reluctance above to embrace meaning postulates at every turn.) The claim is that useful finite-state descriptions can be constructed without arbitrarily complicated semantic inferencing. Section 2 below can be read as exploring some form of a Grammatically Relevant Subsystem hypothesis (Pinker, 1989). To appreciate the full semantic content of these finite-state descriptions (irrespective of grammatical relevance), section 3 grounds these descriptions in models of reality, uncovering inferential properties outside the scope of section 2. Underlying the division between sections 2 and 3 is the contention that so long as we distinguish the construction of logical forms from open-ended inference on them, a computational linguist need not worry if the logical forms s/he constructs go beyond first-order logic. Davidson’s extensional first-order bias is (I claim) not as computationally compelling as it may, at first blush, seem.
2. Building finite-state descriptions
The present section develops finite-state descriptions given by finite strings of observations in chronological order. Viewed as motion pictures, they are accepted by cameras that can be formulated as finite automata or finite Kripke models. Related mechanistic conceptions have been described in Tojo (1999), where models more general than finite automata are used, and in Chang et al. (1998), where a cognitive processing picture with “simulative inference” is presented. As interesting as these directions are, we shall focus here on the languages 2
The same move to sets of events is advocated in Ramsay (1994).
(ie string sets) at stake3 – a more abstract perspective insofar as any number of computational devices may capture the same language. The regularity of these languages can be made evident by presenting them through regular expressions. The symbols in our alphabet are the still-pictures that, strung together, form our motion pictures. More formally, given some finite set Φ of formulas such as dawn, rain, and dusk , a symbol is a subset of Φ such as ∅, {rain} and {rain, dawn} that enumerates propositions the associated still depicts. To distinguish sets-as-symbols from sets-as-languages (or, for that matter, from setsas-strings), let us draw boxes instead of braces for the former, writing for ∅-as-symbol. We may then translate English to regular languages via a mapping L satisfying (9). (202) L(rain from dawn to dusk) = rain, dawn rain ∗ rain, dusk For every integer k ≥ 0, the string rain, dawn rain k rain, dusk from (9) is a movie that starts with rain and dawn, ends with rain and dusk , and in between shows k instances of rain. In a similar vein, we have the associations (10)-(11), illustrating general translations of statives and of the prepositions “from” and “to” (used temporally), where r+ is the concatenation rr∗ of r with the Kleene star r∗ of r. (203) L(rain) = rain + (204) L(from dawn) = dawn + L(to dusk) = + dusk Can we derive (9) from (10) and (11)? The next subsection describes an associative binary operation & on languages such that (10) and (11) yield (9), provided L(rain from dawn to dusk) = L(rain) & L(from dawn) & L(to dusk). 2.1. Superposition and subsumption (unconstrained) Given languages L, L ⊆ Power (Φ)∗ , let us define their superposition L&L to be L&L = 3
⋃_{k ≥ 1} { (σ1 ∪ σ′1) · · · (σk ∪ σ′k) | σ1 · · · σk ∈ L, σ′1 · · · σ′k ∈ L′ } ,
These languages encode different information than those considered in ter Meulen (1990)’s application of van Benthem’s semantic automata to aspectual verbs.
the intuition being that the union σ i ∪σ i is the superposition of the two stills σ i and σ i . Like intersection, & is not so readily expressible as a function on regular expressions, but nonetheless maps regular languages to regular languages. To see this, suppose M and M are finite automata that accept the languages L and L . Then we can form a finite automata that accepts L&L by taking (i) as its set of states the (Cartesian) product Q × Q of the state sets of M and M (respectively) (ii) as its initial state the pair (q 0 , q 0 ) of initial states of M and M (iii) as its set of final states the product F × F of those of M and M (iv) as its transition set the transitions σ∪σ
(q, q′) —σ∪σ′→ (r, r′)
for all M -transitions q → r and M -transitions q → r . (i)-(iv) above are identical to the construction for intersection L ∩ L except that in (iv), L ∩ L would require σ = σ . Like ∩, & is associative and commutative. Superposition induces a useful pre-order on languages L, L defined as follows. L subsumes L , written L L , if L ⊆ L&L L L iff L ⊆ L&L . It is not difficult to see that is reflexive and transitive, and that L L &L iff L L and L L . That is, L L can be understood intuitively as saying: L has at least as much information content as L .4 To make this intuition precise, we shall impose constraints on superposition that we shall then apply to capture certain restrictions implicit in temporal modifications. More specifically, consider the verb phrase “rain for an hour,” which we might handle according to (12), where (the underspecified constructs) x and y are variables (subject to instantiation described in section 3). (205) L(rain for an hour) = rain, time(x) rain ∗ rain, time(y), hour (x, y) An obvious alternative to L L would be to strengthen the requirement L ⊆ L&L to L = L&L . But then we would lose reflexivity (consider L = p + q ) as well as the property that L subsumes all L and L such that L = L &L . 4
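For readers who find it helpful to experiment, the following Python sketch implements superposition and the subsumption test for finite languages, representing each symbol as a frozenset of proposition names. The function names and the finite (length-bounded) stand-ins for the regular languages are our own simplifications; full regular languages would call for the automaton construction just described.

from itertools import product

# A toy rendering (names ours) of superposition & and subsumption over *finite*
# languages, with each symbol a frozenset of proposition names and each string a
# tuple of such symbols. Regular (infinite) languages would need automata instead.
def superpose(L1, L2):
    """L1 & L2: componentwise unions of equal-length strings from L1 and L2."""
    out = set()
    for s1, s2 in product(L1, L2):
        if len(s1) == len(s2) and len(s1) >= 1:
            out.add(tuple(a | b for a, b in zip(s1, s2)))
    return out

def subsumes(L1, L2):
    """L1 subsumes L2 iff L1 is a subset of L1 & L2."""
    return L1 <= superpose(L1, L2)

RAIN = {(frozenset({"rain"}),) * n for n in range(1, 4)}                        # rain-strings up to length 3
FROM_DAWN = {(frozenset({"dawn"}),) + (frozenset(),) * n for n in range(0, 3)}  # a finite slice of "from dawn"
print(subsumes(superpose(RAIN, FROM_DAWN), RAIN))   # True: the combination carries the rain information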
An obvious derivation of (12) would appeal to (13)-(14) and (10). (206) L(an hour) = time(x) + time(y), hour(x, y) (207) L(rain for an hour) = L(rain) & L(an hour) The question arises: what about L(for)? The semantic content of “for” is often illustrated by contrasting it with “in,” as in (8a), “Carl drank six pints in/*for an hour.” The oddness of “for” in (8a) is based on a so-called telic reading of “Carl drank six pints” under which a proposition such as six-pints-drunk holds only at the end of the hour in question (and not before). That is, once the six pints Carl drank are drunk, the event described is over. Now, generalizing from (8a), a first stab at defining the language of a sentence S modified by a temporal interval I using “in” and “for” is (15).
(208) L(S in I) ≈ L(S) & L(I) if L(S) is ‘telic’, and ∅ otherwise
L(S for I) ≈ L(S) & L(I) if L(S) is not ‘telic’, and ∅ otherwise
Implicit in (15) is the idea that S is odd if its language L(S) is empty, L(S) = ∅. Leaving aside exactly what “telic” means, notice that (15) cannot account for cases where both “in” and “for” modification are acceptable, albeit with different meanings, as in (6ab), “rain for/in an hour.” 2.2. Constraints and “in”/“for” modification A variant of (15) that we shall develop in this subsection is (16), which depends on the following ingredients: (i1) a refinement &Σ of &, relative to some subfamily Σ ⊆ Power (Φ) (i2) certain functions telic and iter (associated with “in” and “for” respectively) on languages. (209) L(S in I) = L(S) &Σ telic(L(S)) &Σ L(I) L(S for I) = L(S) &Σ iter(L(S)) &Σ L(I) Let us describe (i1) and (i2) in turn. To understand (i1), recall that a symbol from which we form our strings is a set of propositions – i.e. a subset of Φ – that describes
a still-picture. But can any σ ⊆ Φ describe a still? To get sensible motion pictures, we had better require that all of the formulas in σ can be realized simultaneously. This would rule out sets such as dawn, dusk (unless dawn is allowed to coincide with dusk). Now, the idea is to let Σ be some subcollection of subsets of Φ from which we can string together motion pictures; that is, Σ constitutes a (static) notion of consistency. Beyond requiring that dawn, dusk ∉ Σ, let us assume that Φ is closed under negations ∼ · such that (17) holds, and that Σ is closed under subsets, as stated in (18). (210) (∀ϕ ∈ Φ) ϕ, ∼ϕ ∉ Σ (211) (∀σ ∈ Σ)(∀σ′ ⊆ σ) σ′ ∈ Σ Notice that if Σ ≠ Power (Φ), then the superposition L&L′ may fail to be a subset of Σ∗ . To be assured of a subset of Σ∗ , let us intersect L&L′ with Σ+ , defining L &Σ L′ = (L&L′) ∩ Σ+ . Since regular languages are closed under intersection, it follows that &Σ maps regular languages to regular languages. A more economical construction of an automaton for L &Σ L′ would draw on the idea that two stills σ and σ′ can be superposed precisely if σ ∪ σ′ ∈ Σ. The equation L &Σ L′ =
⋃_{k ≥ 1} { (σ1 ∪ σ′1) · · · (σk ∪ σ′k) | σ1 · · · σk ∈ L, σ′1 · · · σ′k ∈ L′ and for 1 ≤ i ≤ k, σi ∪ σ′i ∈ Σ }
suggests forming an automaton for L &Σ L′ from automata M and M′ for L and L′ (respectively) according to clauses (i)-(iii) above, replacing (iv) by a transition relation ⇒ ⊆ (Q × Q′) × Σ × (Q × Q′) given by
(q, q′) ⇒σ (r, r′) iff (∃τ, τ′) σ = τ ∪ τ′ and q →τ r and q′ →τ′ r′
for all σ ∈ Σ, q, r ∈ Q and q , r ∈ Q . Observe that &Σ is associative, commutative, and that if Σ ⊆ Σ then (L&Σ L )&Σ L = (L&L &L ) ∩ Σ∗ for all L, L , L ⊆ Power (Φ)+ . Turning to (i2), for every language L, let ω(L) be the set of propositions that must hold at the end of L; that is, ω(L) = {ϕ ∈ Φ | (∀s ∈ L − {}) ϕ ∈ end(s)}
where ε is the empty string, and for every non-empty string s, end(s) is s’s last symbol. For example, ω( rain + ) = rain = ω(L rain )
for any L
and we may expect something like six-pints-drunk ∈ ω(L(Carl drank six pints)). Moreover, the telicity of “Carl drank six pints” discussed above suggests that for any symbol σ that occurs before the end of L(Carl drank six pints), six-pints-drunk ∉ σ. To move the negation on ∈ over to propositions and symbols (i.e. sets of propositions), let us define the negation σ̄ of a subset σ ⊆ Φ by: the negation of the empty symbol is Φ (∉ Σ), and the negation of ϕ1 , . . . , ϕn is ∼ ϕ1 + · · · + ∼ ϕn for n ≥ 1
(writing + for non-deterministic choice; i.e. union). Now, let telic(L) = ω(L)+ and let us agree to call L telic if L subsumes telic(L). It follows that if a language L subsumes ∼ ϕ + ϕ for some proposition ϕ ∈ Φ, then L is telic. For the case L = L(Carl drank six pints), the obvious candidate for ϕ is six-pints-drunk . In general, for any S such that ω(L(S)) = ϕ , (16) gives L(S in I) = L(S) &Σ ∼ ϕ + &Σ L(I) = L(S) &Σ L(I) if L(S) is telic. If we drop the assumption ω(L(S)) = ϕ then (16) yields L(S) &Σ L(I) ⊆ L(S in I)
if L(S) is telic
where ⊆ above is as good as =, given (19). (212) If L ⊆ L&Σ L then L and L&Σ L are equivalent up to Σ-entailment. Without getting into the details of Σ-entailment mentioned in (19),5 let us make (19) plausible. That L&Σ L should Σ-entail L is a consequence of the conjunctive character of &Σ . The converse (L Σ-entails 5
Beyond the hints from section 3 below, this notion is fleshed out in www.cs.tcd.ie/Tim.Fernando/jlc.pdf.
L&Σ L′) depends on the subset relation (L ⊆ L&Σ L′) assumed, and on a construal of languages as sets of possibilities that become more informative as possibilities get ruled out. Next, “for” modification under (16) presupposes a function iter, which we can now identify iter(L) = ω(L)+ . It is immediate from (17) and (18) that telic(L) &Σ iter(L) = ∅. Thus, if we call a language L that subsumes iter(L) iterative, and write L = L′ mod Σ for “L is equivalent to L′ up to Σ-entailment” then (16) sharpens (15) to (20).
(213) L(S in I) = L(S) &Σ L(I) mod Σ if L(S) is telic, and ∅ if L(S) is iterative
L(S for I) = L(S) &Σ L(I) mod Σ if L(S) is iterative, and ∅ if L(S) is telic
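As a concrete illustration of the ω(L) test that underlies the telic/iterative split just stated, here is a small finite-language sketch; the helper names and the simplified telicity check (a stand-in for the subsumption condition in the text) are ours.

# A finite-language sketch (function names ours) of the end-of-string test behind
# telicity: omega(L) collects the propositions true at the end of every string in L.
def omega(L):
    ends = [s[-1] for s in L if s]                    # last symbol of each non-empty string
    return frozenset.intersection(*ends) if ends else frozenset()

def is_telic(L):
    """Rough check: no proposition in omega(L) already holds before the final symbol."""
    target = omega(L)
    return bool(target) and all(target.isdisjoint(sym) for s in L for sym in s[:-1])

drank_six = {(frozenset({"drinking"}),) * n + (frozenset({"six-pints-drunk"}),) for n in range(1, 3)}
raining = {(frozenset({"rain"}),) * n for n in range(1, 4)}
print(is_telic(drank_six), is_telic(raining))   # True False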
Notice that there are languages that are neither telic nor iterative. Take, for example, ∗ rain . Clearly, for every language L, rain ∈ ω(L) iff L ∗ rain . Moreover, if we re-set L(rain) to ∗ rain (revising (10) to (10) ), then not only does (12) still follow from (16) but (16) also equates L(rain in an hour) with ∼rain, time(x) ∼rain ∗ rain, time(y), hour(x, y) which can be read “begin to rain in an hour.” (10)
L(rain) = ∗ rain
Attacking (6) through (10) is analogous to analyzing “bake OBJ” in (2) by asserting the existence of an OBJ at the end of the baking, but leaving to OBJ the matter of whether or not one exists beforehand. (Yes, if OBJ is potato; no, if OBJ is cake.) (10) leaves the iterativity/telicity of “rain PREP an hour” to PREP. (Iterative, if PREP is for; telic, if PREP is in.) Revisiting “rain from dawn to dusk,” we can preserve our original analysis (9) under the revision (10) of (10) if we re-analyze L(rain from dawn to dusk) according to L(S from B to E) = L(S) &Σ iter(L(S)) &Σ L(from B) &Σ L(to E)
where iter is triggered by the interval modification “from B to E.” I confess that it is not clear to me if this re-analysis is an improvement. Or, for that matter, if (10) ought to be revised to (10) . Be that as it may, let us step back a bit in the next subsection, and explore more generally how L(S) might be defined. But first, we look more closely at drinking six pints/beer, spelling out what the proposition six-pintsdrunk comes to, with tense abstracted away. 2.3. Finishing, starting and a bit on the progressive The tense-less phrase “Carl drink six pints” describes an accomplishment (in the sense of Vendler, 1967) that, as is well-known in the literature (e.g. Tenny and Pustejovsky, 2000), can be tracked by the action of the verb on parts of a particular argument a. If we write v(u) u"a u≺a
v(u) for a proposition saying “Carl drink u”, u ⊑ a for a proposition saying u is a non-null part of a, and u ≺ a for the proposition u ⊑ a ∧ u ≠ a,
then we may expect that L(Carl drink six pints) subsumes ∼ (∃u " a) v(u) (∃u ≺ a) v(u) ∗ v(a) and 6pints(a) + . By contrast, the language for the activity “Carl drink beer” would subsume ∼ ∃u(beer(u) ∧ v(u)) ∃u(beer(u) ∧ v(u)) + with no well-defined termination condition. In other cases of (Vendlerian) achievements (e.g. “Carl arrive”), it is an end e ∈ Φ that is supplied, and the stages along the way that are abstracted out, leading to ∼ e + e . In general, given a language L ⊆ Power (Φ)+ and a symbol σ ⊆ Φ, let us say that σ finishes L if L subsumes σ + σ. Thus, L is telic iff some σ finishes L. Dually, we say σ starts L if L subsumes σσ + . It is reasonable to assume that v(a) finishes L(Carl drink six pints)6 while ∼ (∃u " a) v(u) starts it. More generally, we might approximate a language L by a precondition b ∈ Φ and a postcondition e ∈ Φ as in figure 51 (on page 359), forming an accomplishment by &-conjoining an achievement and an activity b ∼b+ & ∼e+ e
=
b, ∼ e ∼ b, ∼ e ∗ ∼ b, e
Or, at least, some language equivalent to L(Carl drink six pints) up to Σ-entailment. Note that ∼ (∃u " a) v(u) implies ∼ v(a), and v(a) implies ∼∼ (∃u " a) v(u). 6
                                        L subsumes · · ·            Vendler class of L
no change within L (b is e)             b +                         state (e.g. hate)
b starts L (e is ∼b)                    b ∼b +                      activity (e.g. swim)
e finishes L (b is ∼e)                  ∼e + e                      achievement (e.g. arrive)
b starts and e finishes L               b, ∼e ∼b, ∼e ∗ ∼b, e        accomplishment (e.g. swim a mile)
(with ∼b, ∼e ∈ Σ)

Figure 51. Vendler classes induced by beginnings b and endings e (b, e ∈ Φ)
as well as a state (lest we forget that L(Carl drink six pints) also subsumes 6pints(a) + ). The analysis of “Carl drink six pints” above is widely applicable. For instance, we may assume that L(Pat swim a mile) subsumes mile(m) + and ∼ (∃u " m) swim(p, u) (∃u ≺ m) swim(p, u) ∗ swim(p, m) . The present set-up applies also to accomplishments measured out not by a partial order " on the direct object but by some other gradable condition that may involve other arguments (e.g. “push the cart to the park”). It is instructive to contrast the present proposal with Krifka (1989), where a unary predicate P is defined to be quantized if ∀x∀y (P (x) ∧ y ≺ x ⊃ ∼ P (y)). The variables x and y above are understood to range over objects/ tokens. What is quantized is not a token, but a type P , which is comparable to our notion of a telic language L. In place of a mereology on event-tokens x, y over predicate logic, our analysis employs descriptions L, L , subject to a notion of subsumption that, when beefed up to Σentailment, is essentially propositional (as opposed to predicate) logic. As indicated in the next section, the languages L can be interpreted over predicate logic models, but these models need not include eventtokens as objects.7 We may, if we wish, introduce event-tokens into 7
Exactly what an event-object is, I am not prepared to say. But it certainly is not just a string in a language L, and so I hesitate to refer to a language as an event-type (given the common practice of identifying tokens with objects,
360
FERNANDO
our ontology; but our account (so far) does not depend on any such commitment. In this sense, our approach is ontologically innocent, compared to Krifka (1989). Rather than appealing to some partial order " on event-tokens, we build on a pre-order that supports a measure of underspecification, illustrated above by the analysis of “rain for/in an hour” based on (10) .8 And speaking of that analysis, Table 51 suggests some minor amendments. Besides a beginning and an end, an interval may have a middle (which will prove useful below for accommodating the progressive). Accordingly, let us revise telic ever so slightly to telic(L) = ω(L)+ & + = ω(L) ω(L)+ . Next, for activities to admit “for”-modification as easily as states, let us redefine ite so as to leave the first symbol (i.e. observation) unconstrained before iterating ω(L) ite(L) = ω(L)+ ω(L) . The boundary conditions on interval modification are, it seems to me, underspecified, which is why we drop the requirement that the very first symbol of a string in an iterative language contain ω(L). The functions telic and ite are, as before, complementary: telic(L) &Σ ite(L) = ∅ for Σ satisfying (17) and (18). Turning next to the progressive, let us move the spotlight from the last to the initial symbol of a string, defining a function prog on languages L by prog(L) = α(L)+ α(L) where α(L) = {ϕ ∈ Φ | (∀s ∈ L − {}) ϕ ∈ begin(s)} (with begin(s) equal to the initial symbol of s). Note that L subsumes prog(L) iff α(L) starts L and types with the set of their tokens). Languages are descriptions, as are the strings in languages. Just what they describe is the subject of section 3 below. 8 The underspecification at stake here comes down to the incompleteness of Σ-entailment. See www.cs.tcd.ie/Tim.Fernando/jlc.pdf, where the Σcomplement ¬L of a language L is defined such that neither L nor ¬L may be Σ-entailed by a language L .
FINITE-STATE TEMPORAL SEMANTICS
Vendler class
for
Prog[ress]
state activity achievement accomplishment
+ + − −
− + − +
Figure 52.
361
From Naumann (2001)
and that α(L), prog(L) and the notion of starting are duals to ω(L), telic(L) and finishing (respectively), obtained by reversing languages. The dual of the function ite for iterative languages is the function reten(L) = α(L) α(L)+ for languages that are retentive insofar as the initial condition α(L) is preserved by the middle. The characterization of Vendler classes by binary features [±for] and [±Prog] in Naumann (2001), reproduced here as figure 52, can be formulated in terms of subsumption L is [+for] L is [−for] L is [+Prog] L is [−Prog]
iff iff iff iff
L L L L
subsumes subsumes subsumes subsumes
iter(L) telic(L) prog(L) reten(L) .
Now, suppose the progressive PROG(L) of a language L were defined, and that every string in PROG(L) is a prefix of a non-empty string in L going beyond the first symbol but falling short of the last, as specified in (21). (214) For every string s ∈ PROG(L), length(s) ≥ 2 and ss ∈ L for some non-empty string s ∈ Power (Φ)+ . While (21) provides only a very sketchy picture of the progressive, it is sufficient to yield PROG(L) subsumes ∗ ω(L)
if L is iterative
but not so if L is telic. Inasmuch as ω(L) signals the culmination of L, we have the basis here of an explanation for the entailment in (7b) ing(Carl-drink-beer) |= Carl-drink-beer
362
FERNANDO
and the non-entailment in (8b) ing(Carl-drink-six-pints) |= Carl-drink-six-pints . But what about (1a) ing(Harry-hiccup) |= twice(Harry-hiccup) ? The argument for (1a) in Ramsay (1994, 1999) is based on the assumption that the progressive ing(Harry-hiccup) describes a temporal interval that has at least two points, enough to implicate two instantaneous Harry-hiccup events. Now, while (21) says that every string s in PROG(L) has length ≥ 2, it also says that s must be a proper prefix of some string in L. That is, if all strings in a language L for instantaneous events have length 1, then PROG(L) = ∅ by (21). But we agreed in §2.1 that an equality L(S) = ∅ is a symptom that S is odd; and presumably “Harry was hiccuping” is not. So we had better not apply (21) to a language L for instantaneous events, construed as a set of strings of length 1. Let us restrict (21) to languages L that subsume + . Fortunately, this includes the Vendler classes. That still leaves (1a) unresolved. And it also raises the larger question of how to reconcile talk of temporal intervals, points, and instantaneous events with the languages L we have been working with. The next section fashions a partial answer.
3. Grounding the descriptions
Can we assume (following Ramsay 1994,1999) that “Harry was hiccuping” describes a temporal interval with at least two points, and thus conclude that “Harry hiccuped twice”? Line (3) above (“Harry was hiccuping when Sally shot him dead”) invokes the imperfective paradox (Dowty, 1979), suggesting that a hiccup might be interrupted. Parsons (1990) side-steps the imperfective paradox by calling on partial events, or rather, in-progress states. But can we (truthfully) say “Harry hiccuped twice” when the second hiccup, in fact, failed to culminate? It is not altogether clear (to me) that the iteration Ramsay is after depends on the progressive. Consider the progressive-free sentences (22) and (23), essentially from Jackendoff (1991). (215) Harry hiccupped all night. (216) The light flashed all night.
FINITE-STATE TEMPORAL SEMANTICS
363
If (22) and (23) are true, then we can be sure that there were more than two complete hiccups/flashes that night. A night is simply too long, and hiccups/flashes short. Similarly, the reason, I think, we would normally infer “Harry hiccuped twice” from “Harry was hiccuping” is that the former is normally used to describe a sufficiently long time span. Surely, however, we may utter (3) to report an incident where Harry had no time to hiccup twice. The possibility of counting Harry’s hiccups, any two of which are separated by a temporal interval, points to a certain objection that might be raised against the application of Kleene iteration ·+ to characterize activities and states in §2.3. From Tenny and Pustejovsky (2000), for example, we have An activity or a state can be considered a homogeneous event because it may be divided into any number of temporal slices, and one will still have an event of the same kind (i.e. if Boris walked along the road is true for ten seconds, then a one-second slice of that walking is still an event of walking along the road). There are obvious problems relating to the granularity analysis of homogeneity that we will ignore for discussion’s sake.
Obvious or not, the present section is all about granularity. For granularity is crucial to the basic claim of this chapter: that events (whatever they are) are not so much about temporal intervals but rather about samplings of such intervals – and very selective samplings at that, given by strings of observations made with bounded granularity. That said, let us turn to L(an hour), defined in (13) as time(x) + time(y), hour (x, y) . The variables x and y above are evidently intended to range over times, stamping positions in strings. But what times? For the sake of concreteness, let us identify temporal instants/points with real numbers, writing # for the set of reals. Real numbers embody a certain infinite precision that our observations can only finitely approximate with non-empty open intervals (a, b) = {r ∈ # | a < r < b} for a, b ∈ # with a < b. Identifying a level of granularity with a choice of a real number δ > 0, our best δ-approximation of a real number r is the open interval % δ
δr = ( r − δ/2 , r + δ/2 )
of size δ with center r. (The intuition is that δ is a bound on the precision of observations of that granularity.) Accordingly, we define the set Oδ of δ-points to be Oδ = {δ r | r ∈ #} . Now, while we may take the variables x and y in L(an hour) above to range over real numbers, we should be careful to fix a level δ > 0 of granularity, treating a real number r as the δ-point δ r. The slogan is real observations are δ-points (for some δ). The remainder of this chapter uses sequences of δ-points to ground strings from the languages L considered in section 2. 3.1. δ-successors and δ-instantiations Fix a real number δ > 0. Let < δ be the binary relation on the set Oδ of δ-points o, o given by o < δ o iff (∀r ∈ o)(∀r ∈ o ) r < r [iff the right end-point of o is ≤ the left end-point of o ] and let succδ ⊆ Oδ × Oδ be the successor relation induced by < δ o succδ o iff o < δ o and there is no o such that o < δ o < δ o . Notice that succδ is not functional (i.e. δ-successors are not unique), and that while o succδ o precludes a gap between o and o large enough for another δ-point, we might be able to squeeze a δ -point in between o and o , for some positive δ < δ. Returning now to our finite-state descriptions L ⊆ Σ+ , let us define a δ-instantiation of L to be a function e from some finite subset {o1 , o2 , . . . , ok } of Oδ to Σ such that e(o1 ) e(o2 ) · · · e(ok ) ∈ L and oi succδ oi+1 for 1 ≤ i < k. Given such a δ-instantiation e, let us define first(e) to be the < δ -least element o1 in domain(e), and last(e) to be the < δ -greatest element ok in domain(e). Now, the variables x and y in L(an hour) can be assigned δ-points9 by a δ-instantiation of L(an hour). More precisely, a δ-instantiation e of L(an hour) assigns x the δ-point first(e), and y the δ-point last(e). 9
Or real numbers, via the inverse of the map from r ∈ # to δ r ∈ Oδ .
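The following sketch renders δ-points as open intervals of width δ and implements the precedence and successor relations just defined; the class and function names are our own, and both points are assumed to share the same δ.

from dataclasses import dataclass

# A small illustration (names ours) of delta-points as open intervals of width delta
# centred at a real, with the precedence and successor relations defined in the text.
@dataclass(frozen=True)
class DPoint:
    center: float
    delta: float
    @property
    def left(self):  return self.center - self.delta / 2
    @property
    def right(self): return self.center + self.delta / 2

def precedes(o, o2):
    return o.right <= o2.left          # every real in o is below every real in o2

def is_successor(o, o2):
    # o2 directly follows o: the gap leaves no room for another delta-point in between
    return precedes(o, o2) and (o2.left - o.right) < o.delta

a, b, c = DPoint(0.0, 1.0), DPoint(1.2, 1.0), DPoint(3.0, 1.0)
print(precedes(a, b), is_successor(a, b), is_successor(a, c))   # True True False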
3.2. Models and intervals What must we require of a model M in order to interpret a language L relative to M ? For interpretations of the variables x and y above, M must include among its objects some δ-points. And in order to make sense of propositions ϕ ∈ Φ that occur within L, we must translate these to formulas ϕz , where z is a (temporal) variable and the truth of ϕz can be tested against M , coupled with an assignment of z to a δ-point in M . For instance, rain z = rain(z) and so the vocabulary/signature of M must include a unary relation (symbol) rain. The unary relation time in L(an hour) is special in that time(u)z will, for our purposes, be set to the equality u = z (to force the assignment to x and y mentioned at the end of §3.1). Also special is the binary relation hour, which is atemporal in the sense that hour(x, y)z is just hour(x, y). (That is, hour(x, y)z ignores the variable z.) Let v(Φ) be a vocabulary such that for every ϕ ∈ Φ, the translation ϕz is a v(Φ)-formula. To interpret a language L ⊆ Power (Φ)+ relative to a v(Φ)-model M , it will be useful to fix a 1–1 function from δ-points to variables that do not occur in Φ, writing z o for the variable to which the δ-point o is mapped. Now, suppose we are given a v(Φ)-model M , a variable assignment f into objects in M ,10 and a function e (such as a δ-instantiation of a language) from a finite subset of Oδ to Σ ⊆ Power (Φ). Let us define M, f supports e to mean that for every o ∈ domain(e), f (z o ) = o and
(∀ϕ ∈ e(o)) M, f |= ϕz o .
For instance, if eˆ has domain {o1 , o2 , o3 } and
eˆ(o1 ) = time(x), rain eˆ(o2 ) = rain eˆ(o3 ) = time(y), rain, hour (x, y)
In addition to the variables x and y in L(an hour), section 2 uses terms that we may construe as variables, such as a for the six pints in L(Carl drink six pints) and m for the mile in L(Pat swim a mile). 10
then M, f supports ê precisely if
f(x) = f(z_o1)
M |= rain[o1]
M |= rain[o2]
f(y) = f(z_o3)
M |= rain[o3]
M |= hour[o1, o3]
and, for good measure, f (z o 1 ) = o1 , f (z o 2 ) = o2 and f (z o 3 ) = o3 . Next, passing from functions e to languages L and open intervals (r, r ) over #, we define M, f |= L @δ (r, r ) iff M, f supports a δ-instantiation e of L such that first(e) = δ r and last(e) = δ r . Note that (r, r ) describes, in effect, the open interval (r − 2δ , r + 2δ ). For every open interval o = (a, b) over #, let us write oδ for (a − 2δ , b + 2δ ). We can then interpret a form of iteration ITER(L) of L against an open set o by checking if there are “enough” δ-points o ⊆ oδ such that L@δ o . That is, M, f |= ITER(L) @δ o iff {o ⊆ oδ | M, f |= L@δ o } ∈ Qδ,o for some family Qδ,o of sets of δ-points contained in oδ . The family Qδ,o may vary not only with the choice of the model M (just as the interpretation of rain does in first-order logic) but also with the particular language L. Hanging the superscript L on Qδ,o , it is natural to ask: how does Qδ,o L change with L? It may well turn out that this quantificational interpretation has very little inferential bite, spurring us on to re-analyze ITER(L) in finite-state terms along the lines of §2.3. Much remains to be said.
References Chang, N., Gildea, D. and Narayanan, S.: 1998, ‘A dynamic model of aspectual composition.’ In Proc. CogSci 98. Davidson, D.: 1967, ‘The logical form of action sentences.’ In N. Rescher, ed., The Logic of Decision and Action, pages 81–95. University of Pittsburgh Press.
Dowty, D.: 1979, ‘Word Meaning and Montague Grammar’. Reidel. Fernando, T.: 2002, ‘A finite-state approach to event semantics.’ In Proc. of the 9th International Symposium on Temporal Representation and Reasoning (TIME-2002), pages 124–131. IEEE CS Press. Section 2 of this chapter is elaborated in http://www.cs.tcd.ie/Tim.Fernando/jlc.pdf. Jackendoff, R.: 1991, ‘Parts and boundaries.’ In B. Levin and S. Pinker, eds., Lexical and Conceptual Semantics, pages 9–45. Blackwell. Krifka, M.: 1989. ‘Nominal reference, temporal constitution and quantification in event semantics.’ In R. Bartsch, J. van Benthem, and P. van Emde Boas, eds., Semantics and Contextual Expressions, pages 75–115. Foris. Moens, M. and Steedman, M.: 1988, ‘Temporal ontology and temporal reference.’ Computational Linguistics, 14(2):15–28. Muskens, R.: 2001, ‘Talking about trees and truth-conditions.’ Journal of Logic, Language and Information, 10(4):417–455. Naumann, R.: 2001, ‘Aspects of changes: a dynamic event semantics.’ J. Semantics, 18:27–81. Parsons, T.: 1990, ‘Events in the Semantics of English: A Study in Subatomic Semantics. MIT Press. Pinker, S.: 1989, ‘Learnability and Cognition: The Acquisition of Argument Structure’. MIT Press. Pustejovsky, J.: 1991, ‘The generative lexicon.’ Computational Linguistics, 17(4): 409–441. Ramsay, A.: 1994, ‘The co-operative lexicon.’ In H. Bunt, R. Muskens, and G. Rentier, eds., Proc. International Workshop on Computational Semantics, pages 171–180. ITK, Tilburg. Ramsay, A.: 1999, ‘Dynamic and underspecified interpretation without dynamic or underspecified logic.’ In H. Bunt and R. Muskens, eds., Computing Meaning, pages 57–72. Kluwer. Steedman, M.: 2000, ‘The Productions of Time’. Draft, ftp://ftp.cogsci. ed.ac.uk/pub/steedman/temporality/temporality.ps.gz. Tenny, C. and Pustejovsky, J.: 2000, ‘A history of events in linguistic theory.’ In C. Tenny and J. Pustejovsky, eds., Events as Grammatical Objects, pages 3–37. CSLI, Stanford. Ter Meulen A.: 1990, ‘English aspectual verbs as generalized quantifiers.’ In J. Carter et al., eds., Proc. NELS 20 . GLSA, Department of Linguistics, University of Massachusetts, 378–390. Tojo, S.: 1999, ‘Event, state and process in arrow logic.’ Minds and Machines, 9:81–103. Vendler, Z.: 1967, ‘Linguistics in Philosophy’. Cornell University Press. Verkuyl, H.: 2000, ‘Events as dividuals.’ In J. Higginbotham, F. Pianesi, and A.C. Varzi, eds., Speaking of Events, pages 169–205. Oxford University Press.
Westerståhl, D.: 1989, ‘Quantifiers in formal and natural languages.’ In D. Gabbay and F. Guenthner, eds., Handbook of Philosophical Logic, volume IV, pages 173–209. Reidel.
CLAIRE GARDENT AND KRISTINA STRIEGNITZ
GENERATING BRIDGING DEFINITE DESCRIPTIONS
1. Introduction
It has long been known that knowledge based reasoning is a crucial component of natural language processing (NLP). Yet the complexity involved in representing and using knowledge efficiently has led most NLP work to focus on more tractable aspects of language such as syntax, prosody or semantic construction. The generation of definite descriptions (i.e., noun phrases with a definite article such as “the rabbit”) is a case in point. The goal of this sub-task of natural language generation (which is the production of a text satisfying a given communicative goal) is to construct a noun phrase that allows the hearer to uniquely identify its referent in the context of utterance. The standard algorithm for this task (on which most other proposals are based) is presented in (Dale and Reiter, 1995). But neither this algorithm nor the extensions proposed in (Horacek, 1997) and (Krahmer and Theune, 2001) take world knowledge into account when considering the context of utterance. For all these algorithms, the context is a fixed set of entities and relations intended to represent the current (linguistic and situational) context of utterance. Yet many definite descriptions either refer to inferable entities (entities not explicitly mentioned or present in the context of utterance but inferable from it) or refer to contextually salient entities using inferred rather than explicitly mentioned relations (cf. Poesio and Vieira, 1998). For instance, the use of the definite article to refer to the patrons, the waitress, the busboys, etc. in (217) can only be explained by taking into account that world knowledge supports the assumption that these are somehow related to the restaurant. (217) The young woman scans the restaurant with this new information. She sees all the patrons eating, lost in conversations. The tired waitress, taking orders. The busboys going through the motions, collecting dishes. The manager complaining to the cook about something. 369 H. Bunt and R. Muskens, (eds.), Computing Meaning, Volume 3, 369–396. c 2007 Springer.
One proposal which does integrate knowledge based reasoning in generation of definite descriptions is that presented in Stone, 1998. The philosophy underlying Stone’s proposal is that knowledge based reasoning should be integrated with sentence planning to reason (i) about what the already generated text implies and (ii) about what the context entails. In this paper, we follow up on Stone’s proposal and show how to integrate surface realization and inference into Dale and Reiter’s algorithm to support the generation of such definite descriptions as illustrated in (217). We start (Section 2) by presenting Dale and Reiter’s base algorithm. Section 3 then summarizes the range of definite descriptions found in corpora while Section 4 focuses on cases involving knowledge based reasoning. In Section 5, we consider the two defining characteristics of definite descriptions, uniqueness and familiarity, and show how these can be defined to encompass not only directly coreferential, but also “inference-based definite descriptions”. Section 6 presents the extended algorithm and an implementation of it based on description logic. Section 7 concludes and points to further research.
2. The Standard Algorithm for Generating Definite Descriptions
Most algorithms for generating definite descriptions that are described in the current literature have a common core based on Dale’s greedy algorithm (Dale, 1989; Dale, 1992) and on the Incremental Algorithm proposed in (Dale and Reiter, 1995). We now describe this standard base algorithm. 2.1. The Task The task is to find a description of an object (the target entity) that allows the hearer to uniquely identify that object in a given situation. This situation is known to both speaker and hearer and can be fully described by a set of facts, such as {rabbit(r1 ), rabbit(r2 ), rabbit(r3 ), hat(h1 ), hat(h2 ), hat(h3 ), white(r1 ), black(r2 ), white(r3 ), in(r1 , h1 ), in(r2 , h2 )}. Let’s assume that the goal is to describe the object r1 . The set {rabbit(r1 )} would not be a uniquely identifying description in this case,
Input:
    R: a set of ground atomic formulas
    t: target entity, t ∈ terms(R)
Initialize:
    1. goals ← stack with only element t
    2. L ← ∅
Check success:
    3. if goals is empty then return L
    4. current goal ← top(goals)
    5. if D(current goal, L) is a singleton then pop top(goals); goto 3
Extend description:
    6. try to select an applicable p ∈ R
    7. if there is no such p then return fail
    8. for each o ∈ terms(p) − terms(L): push(o, goals)
    9. L ← L ∪ {p}
    10. goto 4

Figure 53. Base algorithm for producing definite descriptions
as r2 and r3 also fit this description and are therefore distractors of r1. In contrast, the description {rabbit(r1), white(r1), in(r1, h1), hat(h1)} would rule out all distractors and therefore achieve the given goal.

2.2. The Base Algorithm

The standard algorithm takes as input a target entity t and a set of facts. It either returns a set of facts that uniquely identifies t (the description) as output or fails in case such a description cannot be built. Further, to build a uniquely identifying description for a given target, the algorithm starts from an empty description and incrementally adds properties to it until all distractors are ruled out. Finally, if a property is chosen which is not unary but involves other objects, then the description must be extended until it uniquely identifies these other objects as well. The pseudo-code of the algorithm is given in Figure 53. It uses a stack to keep track of the entities that have to be described. In the beginning the stack only contains the target entity (line 1) and the description is empty (line 2).
After the initialization, we enter the loop that adds properties to the description. This loop is controlled by the goal stack: as long as there are goals left on the stack, the algorithm tries to add information to the description. New goals are pushed onto the stack whenever adding a property introduces new objects into the description (line 8). Conversely, a goal is popped from the stack whenever the description built so far uniquely identifies the topmost object in the stack (line 5). This is checked by testing whether the distractor set of that object is a singleton, i.e., the set containing only that object. The distractor set of an object o is the set of all objects that fit the description given of o by L. Formally:

D(o, L) = {o′ | ∃ substitution s such that s(o) = o′ and s(L) ⊆ R}

So, if we can replace o by o′ and replace all other objects mentioned in L in such a way that the resulting set is a subset of R, then o′ is a distractor of o. If the topmost goal cannot be removed from the stack, the algorithm tries to make the description more informative (lines 6–9) by selecting an applicable property from R and adding it to the description. Usually, all properties p ∈ R that fulfill the following requirements are taken to be applicable:
− p mentions an object that is already part of the description, i.e.: terms(p) ∩ terms(L) ≠ ∅
− p is new to the description, i.e.: p ∉ L
− adding p reduces the distractor set of at least one object mentioned in the description, i.e.: D(o, L ∪ {p}) ⊂ D(o, L) for some o ∈ terms(L)
Another restriction that has been used is to exclude all properties that, according to a given concept hierarchy, are more general than a concept that has already been included in the description. In case there are several applicable properties the algorithm needs a strategy for deciding which one to select. This is where the different versions of the base algorithm differ most. Dale (1989), for instance, prefers those properties that rule out the most distractors. The “incremental” algorithm (Dale and Reiter, 1995), on the other hand, assumes a domain-dependent ordering of properties according to their type (e.g. shape > color > size) which determines the order in which properties are added.
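To make the procedure concrete, the following is a small brute-force sketch of the base algorithm in Python. It is not the authors' implementation: facts are plain tuples, the distractor set is computed by enumerating substitutions (hopelessly inefficient for anything but toy domains), the concept-hierarchy restriction is omitted, and the applicability test is relaxed so that the first property may mention the target even though the description is still empty. The selection strategy is the incremental one: properties are tried in a fixed predicate order.

```python
from itertools import product

def terms(atoms):
    """All objects mentioned in a set of ground atoms (tuples: predicate, arg, ...)."""
    return {x for a in atoms for x in a[1:]}

def distractors(o, L, R):
    """D(o, L): objects o2 for which some substitution s with s(o) = o2 maps L into R."""
    L_terms = sorted(terms(L))
    if o not in L_terms:                  # o is not constrained by L: every object fits
        return set(terms(R))
    result = set()
    for values in product(sorted(terms(R)), repeat=len(L_terms)):
        s = dict(zip(L_terms, values))
        if all((a[0], *(s[x] for x in a[1:])) in R for a in L):
            result.add(s[o])
    return result

def describe(t, R, order=None):
    """Greedy/incremental base algorithm: returns atoms uniquely identifying t, or None."""
    order = order or []
    atoms = sorted(R, key=lambda a: (order.index(a[0]) if a[0] in order else len(order), a))
    goals, L = [t], set()
    while goals:
        g = goals[-1]
        if distractors(g, L, R) == {g}:   # g is uniquely identified: pop it
            goals.pop()
            continue
        for p in atoms:                   # find an applicable property
            if p in L:
                continue
            if not terms({p}) & (terms(L) | {t}):   # must touch the description (or the target)
                continue
            if any(distractors(o, L | {p}, R) < distractors(o, L, R) for o in terms(L) | {t}):
                break                     # p removes at least one distractor
        else:
            return None                   # no applicable property left: fail
        new = [o for o in p[1:] if o not in terms(L) and o not in goals]
        goals.extend(new)                 # newly mentioned objects become goals
        L.add(p)
    return L
```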
2.3. Interleaving Content Determination and Surface Realization

The output of the base algorithm is a list of properties that uniquely identify the target entity. So, this base algorithm determines the semantic content a referring expression has to express in order to be successful; it does not, however, produce the surface form of such a referring expression. It is therefore possible that the base algorithm produces an output that cannot be verbalized. The general idea for avoiding this problem is to interleave property selection with surface realization (see, e.g., Horacek, 1997). This allows us to immediately check that every selected property can be incorporated in the syntactic tree, and to make sure that the final description has no “holes” which have to be filled for syntactic reasons. For example, “the red” is not a good noun phrase in English (or at least it requires a special context). To turn it into one we would have to include some property that can be realized as a head noun, even if it doesn’t rule out any distractors.

2.4. An Example

To illustrate how the base algorithm works, we will now go through the example of Section 2.1. The input to the algorithm is

R = {rabbit(r1), rabbit(r2), rabbit(r3), hat(h1), hat(h2), hat(h3), white(r1), black(r2), white(r3), in(r1, h1), in(r2, h2)}
and the target t is r1. The following table shows how the goal stack and the description L change with every pass through the main loop (lines 3–10 in Figure 53). The table furthermore shows the distractor set of the topmost entity on the goal stack and the action of the algorithm.

Goals     | L                                    | Distractors   | Action
[r1]      | ∅                                    | terms(R)      | extend L
[r1]      | {rabbit(r1)}                         | {r1, r2, r3}  | extend L
[r1]      | {rabbit(r1), white(r1)}              | {r1, r3}      | extend L
[h1, r1]  | {rabbit(r1), white(r1), in(r1, h1)}  | {h1}          | pop goals
[r1]      | {rabbit(r1), white(r1), in(r1, h1)}  | {r1}          | pop goals
[ ]       | {rabbit(r1), white(r1), in(r1, h1)}  |               | return L
In the beginning, only the target entity r1 is on the goal stack, the description is empty, and all objects mentioned in R are distracting
entities of r1. Therefore, the condition in line 5 fails and the algorithm will try to eliminate distractors by extending the description. Let’s assume that we are using a selection strategy which follows the Incremental Algorithm and picks properties in the following order: type > color > location. Adding rabbit(r1) to the description reduces the set of distractors. h1, h2, and h3 all get ruled out, because substituting any of these objects for r1 in rabbit(r1) won’t give a property that is contained in R. However, there are still distractors, and even adding white(r1) doesn’t eliminate all of them. Therefore, the algorithm adds in(r1, h1) to the current description. This introduces a new entity h1 into the description which is pushed onto the goal stack and becomes the new current goal. Now, the distractor set only contains h1 itself as there is no other entity that contains a white rabbit. The goal is therefore popped off the stack. Similarly, r1 can now be popped off the stack as there is no other white rabbit that is in something. Now, the goal stack is empty and the description is returned. Unfortunately, this description cannot be realized by a full noun phrase (it only contains information to yield “the rabbit in the”). Building the surface form while assembling the semantic content in the way described above would solve this problem.
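Continuing the sketch given after Section 2.2, this hand trace can be reproduced as follows (the fact encoding and the predicate order type > colour > location are ours; the snippet assumes the describe function defined above).

```python
# The rabbit scene as a set of ground facts, each a tuple (predicate, arguments...).
R = {('rabbit', 'r1'), ('rabbit', 'r2'), ('rabbit', 'r3'),
     ('hat', 'h1'), ('hat', 'h2'), ('hat', 'h3'),
     ('white', 'r1'), ('black', 'r2'), ('white', 'r3'),
     ('in', 'r1', 'h1'), ('in', 'r2', 'h2')}

print(describe('r1', R, order=['rabbit', 'hat', 'white', 'black', 'in']))
# -> {('rabbit', 'r1'), ('white', 'r1'), ('in', 'r1', 'h1')}
# Exactly the content of the trace: without surface realization, hat(h1) is never added.
```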
3. Definite Descriptions in Real Texts
We now survey the types of definite descriptions that can be found in corpora, thereby giving a list of the different cases that an algorithm for generating definite descriptions should be able to deal with. Two properties are generally taken to characterize definite descriptions, namely uniqueness and familiarity. Roughly, uniqueness says that the referent denoted by the definite description must be the only referent satisfying the given description – this property is most prominently defended in (Russell, 1905). Familiarity, on the other hand, requires that this referent be known to the hearer – this is perhaps most strongly demonstrated in (Heim, 1982). Indeed these two properties are the properties taken into account by the base algorithm: the set of relations it outputs must allow the hearer to uniquely identify the intended referent (uniqueness) and it must do so on the basis of shared knowledge about that referent (familiarity). The familiarity/uniqueness explanation is a fairly high-level one, however, and as shown by, e.g., Hawkins (1978) or Prince (1981), a finer grained examination of the phenomenon reveals a much more complex
typology of possible uses. For a start, uniqueness is always relative to some restricted context. Here are some examples.

(218) a. If a rabbit sees a carrot, the rabbit eats the carrot.
      b. There once was a doctor in London. The doctor was Welsh.

In (218a), uniqueness is relative to the quantification domain: for each rabbit and for each carrot that this rabbit sees, the rabbit eats the carrot that it sees. Similarly in (218b), uniqueness is relative to the domain of discourse: (218b) does not imply that there is a unique Welsh doctor in London but that there is a unique Welsh doctor in London that the speaker is talking about. Although the base algorithm simply assumes an already restricted context, Krahmer and Theune (2001) show how it can be extended to deal with discourse domain restrictions. Interaction with quantification remains an open question and will probably remain so for a while, as quantifiers have received little attention in the generation literature. Moreover – and this is the main point of this paper – familiarity can be of different types, and only some of them are covered by the base algorithm. Following Poesio and Vieira (1998), for instance, we can identify four main familiarity classes: coreferential (direct or indirect), bridging, larger situation, and unfamiliar uses. In coreferential uses, the referent of the definite description is familiar in virtue of having been mentioned in the previous discourse (the referent is discourse old in Prince’s (1981) terminology). In such cases, the hearer will know the intended referent either because the speaker uses the same description as was used in the previous discourse (direct coreference, cf. (219a)) or because she uses a description which, on the basis of world or lexical knowledge, the hearer can infer to be true of the previously mentioned entity (indirect coreference, cf. (219b)).

(219) a. A woman came in. The woman was wearing a beautiful hat.
      b. An actress entered the stage. The woman was wearing a big hat.

In a bridging use, the referent of the definite description is discourse new but related by world knowledge to some previously mentioned, i.e., discourse old, entity. In (220) for instance, the referent of “the ceiling” is related by a part-of relation to the discourse old entity denoted by the NP “the room”. The ceiling is not just any ceiling but the ceiling of the room that was mentioned in the previous sentence.
(220) A woman came into the room. The ceiling was very high.

Larger situation uses (e.g., (221)) are cases where the definite description denotes a discourse new but hearer old object: the described entity has not been mentioned in the previous discourse but is assumed by the speaker to be part of the hearer’s general knowledge about the world (221a) or of the situation of utterance (221b).

(221) a. The sun is rising.
      b. Pass the salt, please!

The unfamiliar class covers all remaining uses of definite descriptions; i.e., uses where the referent of the description is neither discourse/hearer old nor related by lexical knowledge to some discourse old entity. It encompasses definite descriptions with sentential complements (222a) and with modifiers relating the referent of the definite description to some either discourse or hearer old object (222b-c).

(222) a. Bill is amazed by the fact that his father is black.
      b. The man John met yesterday is interesting.
      c. The Iran/Iraq war is over.

In sum, a definite description can be familiar either because it refers to some known (i.e., either discourse or hearer old) entity (coreferential and situational use); or because it is related, either explicitly by the description (unfamiliar use) or implicitly by lexical knowledge (bridging), to some known entity. The base algorithm, because it only resorts to information that is explicit in the context of utterance (either through the previous discourse or through the situation), can only account for directly coreferring or larger situation uses. Indirect coreferences and bridging uses cannot be dealt with as they require an interaction between generation and inference. In the next section, we look at these inference-based definite descriptions in more detail to see what is needed to extend the base algorithm so that it can deal with them.
4. Definite Descriptions and Inference
Inference-based definite descriptions, i.e., bridging and indirect coreferential uses, represent a non-negligible proportion of uses in real text. An empirical study of the Wall Street Journal by Poesio and Vieira
(1998) shows that out of the 1412 definite descriptions studied, 24% were “inference based” uses, with 9% bridging cases and 15% indirect coreference. In both cases, processing requires reasoning based on world knowledge and the discourse context. With bridging uses, the hearer must be able to infer the implicit relation holding between the referent of the definite description and another discourse or hearer old entity. And in cases of indirect coreferential uses, the hearer must be able to infer that the properties used in the speaker’s definite description, although not part of the common ground between speaker and hearer, hold of some discourse or hearer old entity. We now consider these two cases in more detail using (here and in what follows) the following terminology. We call the referent of the definite description the target, and the (discourse or hearer old) entity with which it is either coreferential or related the anchor.

4.1. Bridging

In a bridging use, a definite description relates the target entity to its anchor via some inferable relation, which we will call the bridging relation. The term bridging was first introduced by Clark (1977), who identified several different types of bridging relations, such as the part-of relation, semantic roles of verbs, reasons, and consequences. In this paper, we will concentrate on the part-of relation. Clark distinguishes three subcases of bridges involving the part-of relation, which are illustrated by the following examples.

(223) a. John entered the room. The ceiling was very high.
      b. John entered the room. The windows looked out to the bay.
      c. John entered the room. The chandelier was sparkling brightly.

In (223a), “the ceiling” is a part of the room mentioned in the previous sentence and it is a necessary part: to be a room, a room must have a ceiling. In contrast, “the windows” and “the chandelier” in (223b-c) are what Clark called probable and inducible parts. Rooms don’t necessarily have windows, nor do they necessarily have a chandelier. But while most rooms have windows, rooms with chandeliers are actually rare. Nevertheless, it is plausible to link the chandelier in (223c) to the room as rooms typically have lamps and a chandelier is a type of lamp.
4.2. Indirect coreferential uses

Indirect coreferential uses are cases where the definite description refers to a discourse old entity using a hearer new description. So, in this case target and anchor are the same entity. Although the description used is not part of the common ground, the hearer can nevertheless identify the intended object because the description used by the speaker is compatible with the common ground and with world knowledge and allows the hearer to establish the link to the anchor. Here are some examples.

(224) a. An actress entered the stage. The woman was wearing a big hat.
      b. I met a man yesterday. The bastard stole my money.
      c. John bought a new car. The Volvo delights him.

In the first case, the property used in the definite description is entailed by world knowledge, as woman is a hypernym of actress. But as the other two examples show, this is not necessarily the case. Thus a man can, but need not, be a bastard – x is a bastard is a proposition that is compatible with but not entailed by the proposition x is a man. Similarly a car can but need not be a Volvo – in that case, however, the two properties are not merely compatible, but stand in a hyponymic relation (Volvos are cars). So, for both kinds of inference-based definite descriptions, bridging uses and indirect coreferential uses, there are cases where the existence of an entity fitting the description is not logically entailed (Examples (223b,c) and (224b,c)). The approach presented here focuses on a subset of these cases, namely: bridging definite descriptions involving necessary, probable, and certain inducible parts; and indirect coreferential uses where the description of the target is a hypernym of some hearer known property of the anchor.
5. Familiarity and Uniqueness of Bridging Anaphora
As we saw in section 3, familiarity and uniqueness are two defining characteristics of definite descriptions. In this section, we show how to formulate these properties so as to encompass not only direct coreferential definite descriptions (as is done in the base algorithm) but also indirect coreferential and bridging uses. We start by presenting
the structured context the extended algorithm is working with. We then go on to give an intuitive explanation of how uniqueness and familiarity differ in our algorithm from the way these are defined in the base algorithm. Finally, we present our definitions.

5.1. The Discourse Context

As seen in Section 2, a context in the base algorithm is an unstructured set of facts assumed to be shared by hearer and speaker. To deal with bridging anaphora, we need a slightly more sophisticated notion of context. First, knowledge based reasoning must be possible – hence we assume that world and lexical knowledge (in the form of axioms) are part of the context. Second, a distinction must be made between hearer’s and speaker’s knowledge to support the generation of examples such as (223c) where the description used by the speaker is hearer new but speaker old (in this example, the speaker knows that the room has a chandelier while the hearer doesn’t). To account for these observations, we assume that the context is structured into three sub-parts as follows:

Discourse Model (DM): a list of ground atomic formulas modeling the previous discourse
    Ex: DM = {restaurant(r1), italian(r1), . . .}
World Knowledge (WKL): shared rule-based background knowledge
    Ex: WKL = {∀x.restaurant(x) → ∃y.cook(y) ∧ has(x, y), . . .} (“restaurants have cooks”)
Speaker Model (SM): a list of ground atomic formulas representing the additional knowledge of the speaker
    Ex: SM = {has(r1, c1), cook(c1), excellent(c1), . . .}

Discourse Model and World Knowledge are shared by speaker and hearer. As the discourse evolves, the Speaker Model (that is, that part of the speaker knowledge that is not shared with the hearer) shrinks while the Discourse Model grows. Figure 54 gives a graphical representation of how these three parts relate to each other.
Figure 54. Modeling the discourse context (WKL and DM constitute the shared knowledge; SM is the speaker's additional knowledge)
5.2. Familiarity and uniqueness in the extended algorithm

In the base algorithm, the target is taken to be hearer-old, that is, present in the (linguistic or situational) context of utterance. Familiarity is thus taken as given and is restricted to coreferential and situational uses. When extending the algorithm to bridging uses of definite descriptions, familiarity must be modified to cover not only targets that are themselves hearer old, but also targets that are related to some hearer old entity by a bridging relation. To encode this extended notion of familiarity, we introduce the notions of intended and potential anchors. In what follows, we give an intuitive description of these notions. These will be made precise in the next section. Given a target t, the intended anchors of t (written IA(t)) are the entities which the speaker uses to “anchor” the target in the shared knowledge. That is, intended anchors are hearer-old entities which the speaker can relate either by identity or by some bridging relation to the target. By contrast, the potential anchors of a target t given a description L (written PA(t, L)) are the entities which, given the description L, the hearer can relate to the target on the basis of the shared knowledge. Given this, a description L for a target t is said to satisfy familiarity iff all intended anchors for t are potential anchors of t given L. Since potential anchors are hearer old entities and since the set of intended anchors is by definition non empty, this definition ensures that a description that satisfies familiarity actually relates the target to some hearer old entity. Further, it ensures that the set of anchors that are possible from the hearer’s perspective is not smaller than the set of anchors that are possible from the speaker’s perspective. Or, in other words, it ensures that no speaker-intended anchor is ruled out (from the hearer’s perspective) by the description. The impact of this restriction is illustrated in Section 5.5 below.
Now consider uniqueness. In the base algorithm, uniqueness is satisfied by a given description L iff the set of distractors for L is the singleton set containing the target, that is, if there is no other object than the target satisfying the given description. Similarly, the extended algorithm requires that given a description L, the set of potential anchors equals the set of intended anchors. In essence, this relativizes the uniqueness condition of the base algorithm to anchors rather than targets. The role of the singleton set containing the target is now played by the set of intended anchors, and the description is used to eliminate elements not from the set of distractors (the set of elements in the context that fit the description) but from the set of potential anchors (i.e., from the set of elements in the context to which the target could be related either by identity or by some bridging relation). This is not quite sufficient, however, to account for uniqueness in bridging uses and, as we shall soon see in more detail, the extended algorithm further requires that it is plausible to assume that there is exactly one object that, given the description, can be related to the set of potential anchors.

5.3. Defining Intended and Potential Anchors

In the preceding section, we saw that for a given description, familiarity is satisfied iff all of the speaker’s intended anchors are potential anchors of the description. To make this notion precise, we now define intended and potential anchors. Recall that intended anchors are hearer old entities which the speaker knows to be related to the target either by identity or by some bridging relation. More specifically, intended anchors are all those discourse old entities for which the speaker can infer from her knowledge (WKL + DM + SM) that they are either identical to the target or related to the target via some bridging relation. (Here and in what follows, we write bridge to stand for some bridging relation.) The intended anchors (IA) of a target t are thus defined as follows:

IA(t) = {o ∈ terms(DM) | WKL + DM + SM |= t = o ∨ bridge(t, o)}

To define the notion of potential anchors we start by defining what it means for a target t to be familiar through an entity a given a description L. Intuitively, t is familiar through an entity a given a description L if the hearer can infer on the basis of shared knowledge that a either fits the description given for t or that a is related to an entity that fits the description.
This can be formalized as follows. Let t be the target and let a be a discourse old entity. Furthermore, let L(t, o1, . . . , on) be the conjunction of ground atomic formulas representing the description used by the speaker; let o1, . . . , on be the terms other than t occurring in L; and let L(x, x1, . . . , xn) be the formula obtained by substituting x, x1, . . . , xn for t, o1, . . . , on in L(t, o1, . . . , on). Finally, let R be the identity relation or a bridging relation (e.g., part-of), and let L′ be either L or the generalization LN of L as defined below. Then a target t is familiar through a, given a description L of t, provided the following holds:

WKL + DM |= ∃x, x1, . . . , xn. R(a, x) ∧ L′(x, x1, . . . , xn)

This general condition specializes to different bridging uses of definite descriptions as follows.

Dir. Coref.:             DM |= ∃x, x1, . . . , xn. a = x ∧ L(x, x1, . . . , xn)
Ind. Coref.:             WKL + DM |= ∃x, x1, . . . , xn. a = x ∧ L(x, x1, . . . , xn)
Bridging (Nec. Parts):   WKL + DM |= ∃x, x1, . . . , xn. bridge(x, a) ∧ L(x, x1, . . . , xn)
Bridging (Prob. Parts):  WKL + DM |= ∃x, x1, . . . , xn. bridge(x, a) ∧ LN(x, x1, . . . , xn)
In a coreferential use, the target is familiar through an entity that is identical to the target, but while indirect coreference implies a resort to world knowledge, direct coreference only involves the discourse model. In a bridging use involving necessary parts, the relation between the target and the entity through which it is familiar is a bridging relation and both discourse model and world knowledge are necessary for the inference to go through. Bridging cases involving probable parts such as Example (223c) are more complex. In such cases it does not follow from the shared knowledge that a bridging relation holds between the target (the chandelier in Example (223c)) and the discourse old entity through which it is familiar (the room in Example (223c)). However, it does follow that the discourse old entity is related via a bridging relation to an instance of a superconcept of the given description. A room does not necessarily include a chandelier, but it does include furniture, and furniture is a superconcept of chandelier:
WKL ∧ room(a) ⊭ ∃x.(part_of(x, a) ∧ chandelier(x))
WKL ∧ room(a) |= ∃x.(part_of(x, a) ∧ furniture(x))

To handle such cases, the condition is modified to involve not just the speaker’s description (L) but also an adequately generalized version of the speaker’s description (LN). More precisely, the relation N denoted by the head noun used in the speaker’s description should be replaced by a relation N′, such that
− N′ subsumes N, and
− the discourse old a is in a bridging relation bridge to elements of N′ and there is no relation N′′ such that N′ subsumes N′′ and a is in the same bridging relation bridge to elements of N′′.
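As a toy illustration of this generalization step (not of the authors' actual machinery), the sketch below walks up a hand-made concept hierarchy from the head noun until it finds the most specific superconcept whose instances the anchor's type is known to have as parts. Both the hierarchy and the part-of knowledge are invented for the chandelier example.

```python
# Invented illustrative data: chandelier < lamp < furniture, and "rooms have furniture".
parent = {'chandelier': 'lamp', 'lamp': 'furniture', 'furniture': None}
has_part = {('room', 'furniture')}

def generalise(noun, anchor_type):
    """Return the most specific superconcept of noun (along the parent chain)
    to whose instances anchor_type is known to be bridged, or None if there is none."""
    n = noun
    while n is not None:
        if (anchor_type, n) in has_part:
            return n
        n = parent.get(n)
    return None

print(generalise('chandelier', 'room'))   # -> 'furniture'
```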
LN is then the conjunction of atomic formulas representing the speaker’s description where N′ has been substituted for N. Having defined what it means for a target to be familiar through an entity given a description, we can now define the potential anchors of a target t given a description L (written PA(t, L)) to be all those entities a through which t is familiar:

PA(t, L) = {a ∈ terms(DM) | t is familiar through a given L}

5.4. Defining Familiarity and Uniqueness

Given the definition of intended and potential anchors introduced in the previous section, we can now state a Familiarity Condition and a Uniqueness Condition capturing the intuitions of Section 5.2. The Familiarity Condition is stated as follows. A description L for a target t satisfies familiarity iff:

IA(t) ⊆ PA(t, L)

Uniqueness requires (i) that the sets of intended and potential anchors be equal and (ii) that the target is unique wrt. the potential anchors. That is, it must be plausible to assume that there is exactly one object for each potential anchor that fits the speaker’s description and can be related to the anchor. Or in other words, it should not follow from world knowledge and the domain model that there exist two different objects which are related to the anchor and satisfy the speaker’s description. So, given a target t and a description L, the following must hold for each potential anchor a:

WKL + DM + SM ⊭ ∃y, x, x1, . . . , xn (R(a, y) ∧ L(y, x1, . . . , xn) ∧ R(a, x) ∧ L(x, x1, . . . , xn) ∧ y ≠ x)
with R being the identity or a bridging relation and L(x, x1, . . . , xn) as defined above. The Uniqueness Condition therefore has the following two parts:
1. PA(t, L) = IA(t)
2. ∀a ∈ PA(t, L): t is unique wrt. a given L.

5.5. Examples

We now go through some examples to illustrate the effect of our uniqueness and familiarity conditions.

An example satisfying both uniqueness and familiarity
First, let us consider an example in which a restaurant has been mentioned and the speaker uses the description “the cook” to refer to the cook of that restaurant.

(225) a. DM = {restaurant(r)}
         WKL = {∀x(restaurant(x) → ∃y(cook(y) ∧ part_of(y, x)))}
         SM = {cook(c), part_of(c, r)}
         Target = c
         Description = cook(c)
      b. John took Jim to the restaurant. The cook was wearing a white hat.

Given this context, the set of intended anchors for c is IA(c) = {r} and the set of potential anchors for c given the description cook(c) is PA(c, cook(c)) = {r}. Hence IA(c) = PA(c, cook(c)) and both familiarity and the first part of the uniqueness condition are satisfied. Since furthermore nothing in the given context entails that a restaurant must have more than one cook, it is consistent to assume that there is exactly one cook related to r; hence the second part of the uniqueness condition is satisfied and the example satisfies both familiarity and uniqueness.

An example violating familiarity
Let us now consider a slightly modified version of the preceding example, one where the restaurant has been replaced by a zoo:

(226) a. DM = {zoo(z)}
         WKL = {}
         SM = {cook(c), part_of(c, z)}
         Target = c
         Description = cook(c)
      b. John took Jim to the zoo. ??? The cook was wearing a white hat.

Now the set of potential anchors for the target c given the description cook(c) is the empty set because world knowledge does not imply that zoos have cooks. In this case, the set of intended anchors is not a subset of the set of potential anchors. Hence familiarity is not satisfied and (226b) is odd.

An example violating the first uniqueness clause
Here is a situation with two restaurants in the discourse model (see (227a)).

(227) a. DM = {restaurant(r1), restaurant(r2), italian(r1), chinese(r2)}
         WKL = {∀x(restaurant(x) → ∃y(cook(y) ∧ part_of(y, x)))}
         SM = {cook(c), part_of(c, r1)}
         Target = c
         Description = cook(c)
      b. There is an Italian restaurant around the corner and a Chinese one at the end of the street. ??? The cook is very good.

In this case, given the description cook(c) and the target c, the set of potential anchors PA(c, cook(c)) = {r1, r2} whereas the set of intended anchors for c is {r1}. Hence the first clause (PA(t, L) = IA(t)) of the uniqueness condition is not satisfied – which explains the oddity of (227b).

An example violating the second clause of the uniqueness condition
Finally, consider the following situation:

(228) a. DM = {book(b)}
         WKL = {∀x(book(x) → ∃z∃y(page(y) ∧ page(z) ∧ part_of(y, x) ∧ part_of(z, x) ∧ z ≠ y))}
         SM = {page(p), part_of(p, b)}
         Target = p
         Description = page(p)
      b. John took the book back to the shop. ??? The page was missing.

In this case, the set of potential anchors for p given the description page(p) is {b}, which is the same as the set of intended anchors. Hence
both familiarity and the first clause of the uniqueness condition are satisfied. However, the second clause of the uniqueness condition is violated as it is inconsistent with world knowledge to assume that a book has exactly one page. Hence example (228b) is correctly ruled out by our definition of uniqueness.
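The following much-simplified sketch illustrates how the two conditions separate the examples above. It replaces logical entailment by direct lookup: world knowledge is reduced to pairs (whole concept, part concept), bridging is restricted to explicit part_of facts, and only one-noun descriptions such as "the cook" are handled, so the second uniqueness clause (Example (228)) is not covered. All function and variable names are ours.

```python
def entities(facts):
    """Entities mentioned by the unary facts in a fact set."""
    return {f[1] for f in facts if len(f) == 2}

def intended_anchors(t, DM, SM):
    """IA(t): discourse-old entities the speaker knows to be the target itself
    or to be bridged to it (here: via an explicit part_of fact in SM)."""
    old = entities(DM)
    ia = {t} & old
    for f in SM:
        if len(f) == 3 and f[0] == 'part_of' and f[1] == t and f[2] in old:
            ia.add(f[2])
    return ia

def potential_anchors(noun, DM, WKL):
    """PA(t, {noun(t)}): discourse-old entities that, on shared knowledge alone,
    either satisfy the noun or are known to have a part satisfying it."""
    pa = set()
    for f in DM:
        if len(f) == 2 and (f[0] == noun or (f[0], noun) in WKL):
            pa.add(f[1])
    return pa

# (225) the restaurant / the cook: familiar, and PA equals IA
DM, WKL, SM = {('restaurant', 'r')}, {('restaurant', 'cook')}, {('cook', 'c'), ('part_of', 'c', 'r')}
ia, pa = intended_anchors('c', DM, SM), potential_anchors('cook', DM, WKL)
print(ia <= pa, pa == ia)                                                   # True True

# (226) the zoo / the cook: familiarity fails (IA not a subset of PA)
DM, WKL, SM = {('zoo', 'z')}, set(), {('cook', 'c'), ('part_of', 'c', 'z')}
print(intended_anchors('c', DM, SM) <= potential_anchors('cook', DM, WKL))  # False

# (227) two restaurants: familiarity holds but PA differs from IA
DM = {('restaurant', 'r1'), ('restaurant', 'r2'), ('italian', 'r1'), ('chinese', 'r2')}
WKL, SM = {('restaurant', 'cook')}, {('cook', 'c'), ('part_of', 'c', 'r1')}
print(potential_anchors('cook', DM, WKL) == intended_anchors('c', DM, SM))  # False
```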
6. An Algorithm for Generating Bridging Anaphora
In this section, we show how to extend the base algorithm introduced in Section 2 to bridging and indirect coreferential uses of definites. We illustrate the workings of the algorithm by means of some examples and describe an implementation of it using a description logic reasoning system for carrying out the necessary inferences on the background knowledge.

6.1. Extending the Standard Algorithm

We will now present an extension of the base algorithm that generates bridging definites in addition to coreferring definites. The main idea behind this extension is to use the relation between potential and intended anchors to control the algorithm. Thus the proposed algorithm starts with an empty description and extends it until the Uniqueness Condition is satisfied, while making sure that the Familiarity Condition remains satisfied. In terms of anchors this means that the algorithm proceeds until the sets of potential and intended anchors are identical and while the set of intended anchors is a subset of the set of potential anchors. The algorithm is given in Figure 55. It takes a target entity and a representation of the discourse context (in the form described in the previous section) as input. It then simultaneously constructs the surface form and the semantic content of the referring expression (cf. Section 2.3). In Figure 55, we use N to refer to the (partial) syntactic tree. We assume that we have a way of accessing the open syntactic slots of N and that the function L gives us the set of properties that N is realizing. The output of the algorithm is a syntactic tree and a classification of this description as uniquely identifying, non-uniquely identifying, or unfamiliar. The overall structure of the extended algorithm is similar to that of the base algorithm (cf. Figure 53). Again, we use a goal stack to keep track of the entities that have to be described. After some initializations (lines 1–2), we enter the main loop, which terminates successfully when
Input:
    WKL: a set of rules relating relations to each other
    DM: a set of ground atomic formulas
    SM: a set of ground atomic formulas
    t: target entity, t ∈ terms(DM ∪ SM)
Initialize:
    1. goals ← stack with only element t
    2. N ← initial syntactic structure with an open slot for a noun
Check success:
    3. if goals is empty then return ⟨uniquely identifying, N⟩
    4. current goal ← top(goals)
    5. if IA(current goal) ⊈ PA(current goal, L(N)) then return ⟨unfamiliar, N⟩
    6. if PA(current goal, L(N)) = IA(current goal) and ∀a ∈ IA(t): t is unique wrt. a given L(N) then pop top(goals); goto 3
Extend description:
    7. if current goal ∈ terms(DM) then R ← DM else R ← DM + SM
    8. try to select an applicable atomic formula p s.t. R + WKL |= p
    9. if there is no such p then return ⟨non uniquely identifying, N⟩
    10. for each o ∈ terms(p) − terms(L(N)): push(o, goals)
    11. N ← N′ s.t. L(N′) = L(N) ∪ {p}
    12. goto 4

Figure 55. Extension of the base algorithm to bridging cases
the goal stack is empty (line 3). Otherwise, the algorithm examines the top entry of the goal stack (lines 4–6) and, if necessary, extends the description (lines 7–12). The main strategy of the base algorithm is to add information to the description until all distractors are ruled out. If it is not possible to rule out all distractors, no definite description can be constructed. The main strategy of the extended algorithm is to add information to the description until the Uniqueness Condition is satisfied. The description is extended until a) all potential anchors that are not intended anchors are ruled out, and b) it is consistent with the speaker’s knowledge to assume that the target is unique wrt. the anchor (line 6).
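The control structure just described can be sketched as follows. The sketch is an illustrative reading of Figure 55, not the authors' code: surface realization is left out, and the anchor computations, the second uniqueness clause, and the applicability heuristics (including the DM versus DM + SM choice of line 7) are treated as black boxes supplied by the caller, which in the implementation of Section 6.3 would be backed by a DL reasoner.

```python
def terms(atoms):
    return {x for a in atoms for x in a[1:]}

def extended_describe(t, ia, pa, unique_wrt, choose):
    """ia(g) and pa(g, L) return anchor sets, unique_wrt(g, a, L) checks the second
    uniqueness clause, and choose(g, L) picks an applicable atom or None."""
    goals, L = [t], set()
    while goals:
        g = goals[-1]
        if not ia(g) <= pa(g, L):              # Familiarity Condition violated (line 5)
            return 'unfamiliar', L
        if pa(g, L) == ia(g) and all(unique_wrt(g, a, L) for a in ia(g)):
            goals.pop()                        # Uniqueness holds for g (line 6)
            continue
        p = choose(g, L)                       # applicability heuristics live here
        if p is None:
            return 'non uniquely identifying', L
        new = [o for o in p[1:] if o not in terms(L) and o not in goals]
        goals.extend(new)                      # newly mentioned objects become goals
        L.add(p)
    return 'uniquely identifying', L
```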
The algorithm fails, returning the description and the classification non uniquely identifying, if the description, although it does not satisfy the Uniqueness Condition, cannot be specified any further by adding information (lines 8–9). This is similar to the failure case of the base algorithm. But there is another way in which the extended algorithm can fail, namely if the description does not satisfy the Familiarity Condition. In this case, the set of potential anchors doesn’t include the intended anchors anymore (line 5). As with the base algorithm, there is some freedom in defining which properties are applicable and the selection procedure for choosing among them. Sensible minimal requirements for a property p to be applicable are:
− terms(p) ∩ terms(L) ≠ ∅
− WKL + DM + L(N) ⊭ p
These requirements are analogous to the first two of the three requirements that we used in the base algorithm. We furthermore suggest the following conditional restrictions on applicability to replace the third one.
− If there are syntactic holes, fill one of them (i.e., only those properties that can fill one of them are applicable).
− Only use a property that doesn’t ensure that the Familiarity Condition holds if you have to fill a syntactic hole and there is no other way.
− Among the properties that ensure the satisfaction of the Familiarity Condition, prefer those that work toward the Uniqueness Condition by ruling out distracting entities. If there are no such properties, then choose a bridging relation that links the target to an entity o ∈ terms(DM).
As in the base algorithm, we can now imagine various different strategies for selecting between applicable properties. We could, e.g., follow Dale and Reiter (1995) and assume a predefined order in which properties are considered, or we could always take the property that reduces the set of potential anchors most effectively (similar to Dale, 1992).

6.2. Examples

We will now illustrate how the extended algorithm deals with some of the examples that we saw in the previous sections.
Example 1
First, we will look at the case of a “normal” bridging anaphor. Let’s assume that the speaker just uttered “There is a new restaurant next to the church”, and that he now wants to say something about the cook of this restaurant. Let’s furthermore assume that no other cooks or restaurants have been mentioned before. The relevant parts of the discourse context then look like this:

WKL: ∀x(restaurant(x) → ∃y(cook(y) ∧ part_of(y, x)))
DM: restaurant(r), church(u)
SM: cook(c), part_of(c, r)

Together with the target (c) this will be the input to the algorithm. The following table shows what the goal stack, the description, and the sets of potential and intended anchors of the current goal (i.e., the topmost entity on the goal stack) look like in each pass through the main loop. We furthermore show the status of the Familiarity Condition and the Uniqueness Condition.

goals | description             | PA         | IA   |
[c]   | “...”      ∅            | terms(DM)  | {r}  | Fam.: ✓  Uniq.: ✗
[c]   | “the cook” {cook(c)}    | {r}        | {r}  | Fam.: ✓  Uniq.: ✓
[ ]   | “the cook” {cook(c)}    |            |      | ⇒ uniquely identifying
The algorithm starts with an empty description, which means that all objects mentioned in the discourse model are potential anchors. It then adds cook(c), which fills the open syntactic slot while preserving the satisfaction of the Familiarity Condition, and cuts down the set of potential anchors. In fact, the set of potential anchors now contains only r, which is the intended anchor of c, so that the Uniqueness Condition is satisfied as well and the goal can be popped off the stack. The goal stack is then empty and the algorithm terminates successfully and returns the description.
Example 2
Now, we will look at the case of Example (227). Here bridging is possible, but the bridge has to be made explicit to pick out the correct anchor.
The input to the algorithm is as follows:

WKL: ∀x(restaurant(x) → ∃y(cook(y) ∧ part_of(y, x)))
DM: restaurant(r1), italian(r1), restaurant(r2), chinese(r2), church(u)
SM: cook(c), part_of(c, r1)
target: c

goals    | description                                                                                   | PA         | IA    |
[c]      | “...”  ∅                                                                                      | terms(DM)  | {r1}  | Fam.: ✓  Uniq.: ✗
[c]      | “the cook”  {cook(c)}                                                                         | {r1, r2}   | {r1}  | Fam.: ✓  Uniq.: ✗
[r1, c]  | “the cook of ...”  {cook(c), part_of(c, r1)}                                                  | {r1, r2}   | {r1}  | Fam.: ✓  Uniq.: ✗
[r1, c]  | “the cook of the restaurant”  {cook(c), part_of(c, r1), restaurant(r1)}                       | {r1, r2}   | {r1}  | Fam.: ✓  Uniq.: ✗
[r1, c]  | “the cook of the Italian restaurant”  {cook(c), part_of(c, r1), restaurant(r1), italian(r1)}  | {r1}       | {r1}  | Fam.: ✓  Uniq.: ✓
[c]      | “the cook of the Italian restaurant”  {cook(c), part_of(c, r1), restaurant(r1), italian(r1)}  | {r1}       | {r1}  | Fam.: ✓  Uniq.: ✓
[ ]      | “the cook of the Italian restaurant”  {cook(c), part_of(c, r1), restaurant(r1), italian(r1)}  |            |       | ⇒ uniquely identifying
In this example, adding cook(c) doesn’t rule out enough distractors to satisfy the Uniqueness Condition. So, the algorithm next adds part_of(c, r1), which doesn’t rule out any distractors but links c to a discourse old entity. This adds a new goal to the goal stack. Then, restaurant(r1) is added, which fills a syntactic hole. Adding italian(r1) in the next round then rules out all distracting potential anchors and r1 is removed from the goal stack. The potential anchors of c now don’t contain any distracting anchors anymore either, as the shared knowledge doesn’t imply that r2 (the Chinese restaurant) has a cook which is part of an Italian restaurant.
Example 3
Finally, let’s see what happens in cases where bridging is not possible, because the link between target and anchor is not implied by world knowledge. The input to the algorithm describes a situation where a zoo has been mentioned in the discourse and the speaker now wants to talk about its cook, which has not been mentioned before:

WKL: (no relevant world knowledge)
DM: zoo(z)
SM: cook(c), part_of(c, z)
target: c

The algorithm starts in the same way as in the other examples, but when the first property is added to the description, viz. cook(c), the set of potential anchors becomes empty, so that the Familiarity Condition is violated (cf. line 5 of Figure 55).

goals | description             | PA         | IA   |
[c]   | “...”      ∅            | terms(DM)  | {z}  | Fam.: ✓  Uniq.: ✗
[c]   | “the cook” {cook(c)}    | ∅          | {z}  | Fam.: ✗  Uniq.: ✗
                                                      ⇒ unfamiliar
6.3. Implementation

There is a proof-of-concept implementation of the extended algorithm in the functional language Mozart Oz. It uses RACER, an automated reasoning system for description logics (DL), to carry out the necessary inferences on the discourse context. In this section, we will look at how these inferences are formulated as queries to the DL reasoning system. We will first give a brief introduction to description logics and describe how the different components of the discourse context can be represented as a DL knowledge base. Then, we will show how the functionality provided by typical DL reasoning systems nicely supports the reasoning tasks required by the extended algorithm.
Description Logics
Description logic (DL) is a family of logics in the tradition of knowledge representation formalisms such as KL-ONE (Woods and Schmolze, 1992). DL is a fragment of first-order logic which only allows unary and binary predicates (concepts and roles) and only very restricted quantification. A knowledge base consists of a T-Box, which contains axioms relating the concepts and roles, and one or more A-Boxes, which state that individuals belong to certain concepts, or are related by certain roles. T-Box statements have the form C1 ⊑ C2, where C1 and C2 are concept expressions. Concepts denote sets of individuals and the statement means that the denotation of C1 is a subset of the denotation of C2, i.e., C2 subsumes C1. So, we can, for example, write cook ⊑ human to express that cooks are human. Concepts can be combined by the boolean connectives ⊓ (and), ⊔ (or), and ¬ (not). E.g., animal ⊑ ¬human. Finally, we can use roles (binary predicates) and their inverses in combination with the quantifiers ∀ and ∃ to relate two concepts: restaurant ⊑ ∃part_of⁻¹.cook expresses that every restaurant is related to a cook via an (inverse) part_of relation. More expressive DLs furthermore allow number restrictions on roles, such as horse ⊑ (= 4 part_of⁻¹).leg, which means that horses have exactly four legs. That was the T-Box. The A-Box contains statements such as rabbit(a) or love(b, c) to express that the individual a is an instance of the concept rabbit, and that the individuals b and c are related through the role love. We will represent the world knowledge as a T-Box and the discourse model and the speaker’s model as A-Boxes. Theorem provers for description logics support a range of different reasoning tasks. Among the basic reasoning tasks are, e.g., subsumption checking (Does one concept subsume another?), and instance and relation checking (Does a given individual belong to a certain concept? / Are two individuals related through a certain relation?). In addition, description logic systems usually provide some retrieval functionality which, e.g., allows one to compute all atomic concepts that a given individual belongs to or all individuals that belong to a given concept. This will prove to be very useful for our purposes as it allows easy access to all properties of an object and provides an elegant way of computing the potential anchors. There is a wide range of different description logics, which add different extensions to a common core. Of course, the more expressive these extensions become, the more complex the reasoning problems are. In the last few years, new systems such as FaCT (Horrocks) and RACER
(Haarslev and Möller) have shown that it is possible to achieve surprisingly good average-case performance for very expressive (but still decidable) logics. In this paper, we employ the RACER system because it allows for A-Box inferences.

DL Reasoning for the Extended Algorithm
The core reasoning task in the extended algorithm is to compute the set of potential anchors. In a DL setting, this is straightforwardly implemented by using the instance retrieval mechanism. Recall that this mechanism returns all instances of a given concept. We first create a DL concept which approximates the definition (cf. Section 5) of the potential anchors of a target t given a description L. That is, we create a concept which approximates the following formula

λx.∃x1, . . . , xn. L(x, x1, . . . , xn)

Using instance retrieval, we can then gather all objects belonging to this concept or which are related to an instance of this concept via a bridging relation. To construct the approximating DL concept from the set L of ground atomic formulas representing the semantic content of the definite description, we use the following strategy. Assuming that we want to compute the potential anchors of object o, we first collect all unary properties of o in L and conjoin them to form a concept expression. These properties are deleted from the set L. Then, we take one by one the binary properties relating o to some other object o′ via a relation R, we (recursively) build a concept expression C_o′ for o′ and conjoin ∃R.C_o′ with the previously constructed part. For example, for L = {cook(c), part_of(c, r), restaurant(r), italian(r)} and target c we will build the following DL concept:

cook ⊓ ∃part_of.(restaurant ⊓ italian)

The resulting concept is an approximation of the corresponding first order formula. Due to the restricted expressive power of DL, the concept cannot capture reflexivity: λx(R(x, x)) would be rendered as ∃R.⊤. Similarly, if the same two objects are related in two different ways, this information is lost in the DL concept: λx.∃y(R1(x, y) ∧ R2(x, y)) becomes (∃R1.⊤) ⊓ (∃R2.⊤). As we only generate descriptions expressing a set of positive facts about an entity, however, these two are the only cases in which the DL concept is not equivalent to the first order formula corresponding to the description.
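The recursive construction just described can be written down compactly. The sketch below builds the concept as a string and assumes, as in the example, that the description is tree-shaped (no cycles, no reflexive edges); the fact encoding and function name are ours. Note that conjunct order is immaterial, so the output differs from the displayed concept only in the order of the inner conjuncts.

```python
def concept_for(o, L, seen=None):
    """Build a DL concept expression (as a string) approximating the description L
    viewed from object o. L is a set of tuples: unary facts like ('cook', 'c') and
    binary facts like ('part_of', 'c', 'r'). Illustrative sketch only."""
    seen = (seen or set()) | {o}
    unary = [p for (p, x) in sorted(a for a in L if len(a) == 2) if x == o]
    exist = [f"∃{r}.({concept_for(y, L, seen)})"
             for (r, x, y) in sorted(a for a in L if len(a) == 3)
             if x == o and y not in seen]
    return " ⊓ ".join(unary + exist) if unary or exist else "⊤"

L = {('cook', 'c'), ('part_of', 'c', 'r'), ('restaurant', 'r'), ('italian', 'r')}
print(concept_for('c', L))   # -> cook ⊓ ∃part_of.(italian ⊓ restaurant)
```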
To check the second part of the Uniqueness Condition (cf. Section 5), we employ number restrictions and test that it is consistent with the shared knowledge to assume that the anchor is related to exactly one instance of the DL concept corresponding to the description. For example, to test whether it is consistent with world knowledge that an entity a has exactly one page, we test whether the negation follows from world knowledge. If so, it is not consistent; otherwise it is. To test the entailment, we send the query “is a an instance of the concept ¬(= 1 part_of⁻¹).page” to the DL prover. Finally, we use RACER’s functionality for retrieving properties (concepts, most specific concepts, roles) of a given instance to collect all potentially applicable properties. The inferences that are necessary in our approach to generating definite descriptions could also be carried out by automated theorem provers for first order logic. As first order logic provers do not provide the kind of knowledge base management and retrieval functionality that comes with DL systems, however, one would need additional mechanisms for selecting applicable properties and for retrieving and maintaining the set of potential anchors.
7. Conclusion
In this paper, we have shown how the basic incremental algorithm for generating definite descriptions proposed by Dale and Reiter can be extended to handle definite descriptions whose processing involves knowledge based reasoning. Specifically, we have shown how it can be integrated with reasoning to generate bridging and, to a certain extent, indirect coreferential uses of definite descriptions. But as seen in Section 3, bridging and coreferential uses do not exhaust the usage spectrum of definite descriptions. Larger situation and unfamiliar uses are also very frequent. Provided the context is extended to encode world and situational knowledge, the proposed algorithm naturally extends to larger situation uses – these are just uses where the entity is familiar because it is hearer old. The unfamiliar class is more problematic. Recall that it includes definite descriptions with sentential complements (the fact that John’s father is black) and containing inferables, i.e., entities that are familiar by virtue of being related to some discourse old entity (the man John met yesterday, the Iran/Iraq war). The first subclass (descriptions with sentential complements) can be viewed as a kind of event anaphora (the
speaker is referring to John’s father’s blackness) and should probably be treated as such. The second case (containing inferables) raises the question of how familiarity should be defined. Conceivably, the familiarity notion used in our algorithm should be extended to encompass such cases and the algorithm modified to insist that all generated descriptions be both familiar and unique. This would not be difficult. There is, however, a danger of overgeneration as other pragmatic factors seem to interact with the surface realization of containing inferables. It would therefore be important to first have a better understanding of the distribution and form of containing inferables, i.e., of when explicit bridging is acceptable. Another open empirical question we are currently investigating is that of the lexical knowledge involved in bridging uses of definite descriptions. In this paper, we have assumed a very simple and direct “part-of” relation between target and anchor. However, it has been known since at least (Clark, 1977) that bridging relations can be of various natures. We have therefore started a trilingual (English-French-German) corpus study which aims at assessing the relative prominence of the various bridging relations and at identifying the precise lexical semantics involved in processing definite descriptions. This will hopefully also help to refine our treatment of bridging cases involving probable and inducible parts.
Acknowledgments
This work was partially supported by the Project InDiGen in SPP–Sprachproduktion, a grant by the Deutsche Forschungsgemeinschaft to the University of Saarbrücken, and by the Lorraine Region within the project “Ingénierie des langues, du document et de l’Information scientifique Technique et Culturelle” of the Plan Etat-Région 2000–2004.
References

Clark, H. H.: 1977, ‘Bridging’. In: P. N. Johnson-Laird and P. C. Wason (eds.): Thinking: Readings in Cognitive Science. Cambridge University Press, Cambridge.
Dale, R.: 1989, ‘Cooking up referring expressions’. In: Proc. of the 27th ACL. pp. 68–75.
Dale, R.: 1992, Generating Referring Expressions: Building Descriptions in a Domain of Objects and Processes. MIT Press.
Dale, R. and E. Reiter: 1995, ‘Computational Interpretations of the Gricean Maxims in the Generation of Referring Expressions’. Cognitive Science 19(2), 233–263.
Haarslev, V. and R. Möller, ‘RACER: Renamed ABox and Concept Expression Reasoner’. http://www.fh-wedel.de/~mo/racer/index.html.
Hawkins, J. A.: 1978, Definiteness and Indefiniteness. London: Croom Helm.
Heim, I.: 1982, ‘The Semantics of Definite and Indefinite Noun Phrases’. Ph.D. thesis, University of Massachusetts.
Horacek, H.: 1997, ‘An Algorithm for Generating Referential Descriptions With Flexible Interfaces’. In: Proc. of the 35th ACL. pp. 206–213.
Horrocks, I., ‘The FaCT System’. http://www.cs.man.ac.uk/~horrocks/FaCT/.
Krahmer, E. and M. Theune: 2001, ‘Efficient context-sensitive generation of referring expressions’. In: K. van Deemter and R. Kibble (eds.): Information Sharing: Givenness and Newness in Language Processing. CSLI Publications.
Poesio, M. and R. Vieira: 1998, ‘A Corpus-based Investigation of Definite Description Use’. Computational Linguistics 24(2), 183–216.
Prince, E.: 1981, ‘Towards a taxonomy of given-new information’. In: P. Cole (ed.): Radical Pragmatics. New York: Academic Press, pp. 223–256.
Russell, B.: 1905, ‘On denoting’. Mind 14, 479–493.
Stone, M.: 1998, ‘Modality in Dialogue: Planning, Pragmatics and Computation’. Ph.D. thesis, Department of Computer & Information Science, University of Pennsylvania.
Woods, W. and J. Schmolze: 1992, ‘The KL-ONE Family’. Computers and Mathematics with Applications 23(2–5).
KEES VAN DEEMTER AND EMIEL KRAHMER
GRAPHS AND BOOLEANS: ON THE GENERATION OF REFERRING EXPRESSIONS
1. Introduction
Generation of Referring Expressions (gre) is a key task of Natural Language Generation (nlg) systems (e.g., Reiter and Dale, 2000, section 5.4). The task of a gre algorithm is to find combinations of properties that allow the generator to refer uniquely to an object or set of objects, called the target of the algorithm. Older gre algorithms tend to be based on a number of strongly simplifying assumptions. For example, they assume that the target is always one object (rather than a set), and they assume that properties can always only be conjoined, never negated or disjoined. Thus, for example, they could refer to a target object as “the small violinist”, but not as “the musicians not holding an instrument”. As a result of such simplifications, many current gre algorithms are logically incomplete. That is, they sometimes fail to find an appropriate description where one does exist (we use the term ‘description’ to denote either a combination of properties or its linguistic realization). To remedy such limitations, various new algorithms have been proposed in recent years, each of which removes one or more simplifying assumptions. They extend existing gre algorithms by allowing targets that are sets (Stone, 2000; van Deemter, 2000), gradable properties (van Deemter, 2000, 2006), salience (Krahmer and Theune, 2002), relations between objects (Dale & Haddock, 1991; Horacek, 1997), and Boolean properties (van Deemter, 2001, 2002). Recently a new formalism, based on labelled directed graphs, was proposed as a vehicle for expressing and implementing different gre algorithms (Krahmer et al., 2001, 2003). Although the formalism was primarily argued to support relatively simple descriptions (not involving negations or disjunctions, for example), we will show that it can be used beyond these confines. Far from claiming that this will solve all the problems in this area, we do believe that a common formalism
would be extremely useful, as a basis for comparing and combining existing algorithms. An additional advantage is that the computational properties of graphs are well understood and efficient algorithms for manipulating graphs are available ‘off the shelf’ (e.g., Mehlhorn, 1984). In this paper, we will explore to what extent the graph-based approach to gre can be extended to express a variety of algorithms in this area. Our discussion will be limited to semantic aspects of gre and, more specifically, to the problem of constructing combinations of properties that identify a referent uniquely (i.e., constructing a distinguishing description). Our main finding will be that most existing gre algorithms carry over without difficulty, but one algorithm, which focusses on the generation of Boolean descriptions that also contain relational properties, does not. For this reason, we propose an alternative algorithm that produces different types of Boolean descriptions from the original algorithm, using graphs in a very natural way. The paper is structured as follows. In section 2 we briefly describe the basic graph-based gre approach. Then, in section 3, we describe how various earlier gre algorithms aimed at the generation of sets, gradable properties, salience and negated properties can be reformulated in terms of the graph approach. In section 4 we describe two graph-based algorithms for the generation of full Boolean expressions, one based directly on van Deemter (2001, 2002) and one new alternative. In the concluding section, we list some of the new questions that come up when the different types of algorithms discussed in this paper are combined.

2. Graph-based gre
A number of gre algorithms were proposed in the 1990s, of which the Incremental Algorithm from Dale and Reiter (1995) is probably the best known. These 'basic' gre algorithms generate distinguishing descriptions of individual objects. The descriptions generated consist of conjunctions of atomic properties that are represented in a shared Knowledge Base (kb) that is formalized as an attribute/value structure. Using the attributes Type, Size, and Holds, for example, a very simple kb may look as follows:

Domain: {s1, s2, s3, s4}
Type: Musician = {s1, s2}, Technician = {s3}, Trumpet = {s4}
Size: Big = {s1, s3}, Small = {s2, s4}
Holds: s4 = {s2}
Note that the first argument of a relation like Holds is, for now, treated as another attribute, expressing that the object that holds s4 (a trumpet) is s2 (a musician). In the abbreviated notation used here, only those attributes are listed that have a nonempty set of values. Thus, for example, it follows that Holds: s3 = {} (i.e., nobody holds the technician). Given this kb, the Incremental Algorithm can describe s1 by conjoining the properties ⟨Size, Big⟩ and ⟨Type, Musician⟩, for example, because the intersection of their extensions equals {s1, s3} ∩ {s1, s2} = {s1}. Simplifying considerably, the algorithm proceeds by incrementally conjoining more and more properties, removing more and more 'confusables' (i.e., objects with which the target object may be confused). This process continues until only the target itself is left. The Incremental Algorithm does not allow backtracking, which is what makes it fast (Dale and Reiter, 1995). In Krahmer et al. (2001), it was shown that algorithms such as the Incremental Algorithm can be mirrored in a graph-based formalism, by expressing the description as well as the kb as a labelled directed graph. Let D be the domain of discourse, P a set of names for properties, and R a set of names for relations; then L = P ∪ R is the set of labels. Formally, G = ⟨V_G, E_G⟩ is a labelled directed graph, where V_G ⊆ D is the set of nodes (the potential referents) and E_G ⊆ V_G × L × V_G is the set of labelled directed edges. The kb above can now be reformulated as the graph in Figure 56. Properties are modelled as loops, i.e., edges which start and end in the same node, whereas relations are modelled as edges between nodes.2 Note that the object of the relation is no longer hidden within the attribute Holds, allowing, for example, relations to be used iteratively, as in 'The man who holds a trumpet owned by a woman'. We call the graph S that represents the kb the scene graph; if s ∈ V_S is the target object then s can be singled out as the designated element of S and we call Σ = ⟨s, S⟩ the scene pair. Crucially, a description is represented using a similar pair, consisting of a connected description graph D and a designated element d ∈ V_D; the pair ∆ = ⟨d, D⟩ is called a description pair. Representing both the description and the scene using graphs allows one to view gre as a graph construction
2 In fact, nothing forbids relations which start and end in the same node (which is correct, in view of potentially reflexive relations such as 'shaving' and 'washing'). However, for simplicity, we shall assume throughout this paper that all relations are non-reflexive.
Figure 56. A scene graph involving four objects
problem. More particularly, the task is to construct a description pair that 'refers uniquely' to a given scene pair.3 The notion 'refers uniquely' is defined via the notion of a subgraph isomorphism. A graph S′ is a subgraph of S if V_S′ ⊆ V_S and E_S′ ⊆ E_S. π is a subgraph isomorphism between D and S (notation: D ⊑_π S) iff there exists a subgraph S′ of S such that π is a bijection from V_D to V_S′ and for all nodes v, w ∈ V_D and all labels l ∈ L, (v, l, w) ∈ E_D ⇔ (π(v), l, π(w)) ∈ E_S′.

A description pair ∆ = ⟨d, D⟩ refers to a scene pair Σ = ⟨s, S⟩ iff D is connected and ∃π : (D ⊑_π S and π(d) = s).
Thus, a description pair ∆ = ⟨d, D⟩ refers to a scene pair Σ = ⟨s, S⟩ iff there exists a subgraph isomorphism between D and S that maps d to s. Note that, using this terminology, a description pair can 'refer' to more than one scene pair. Consider the description graphs depicted in Figure 57, each of which has s as its designated element. Let Σ = ⟨s2, S⟩, that is, we want to generate an expression referring to s2 in S, where S is the scene graph depicted in Figure 56. Then the first of the three corresponding description pairs refers to Σ but not uniquely (it may
3 Equivalently, one could say that the description pair refers to the designated element given the scene graph.
Figure 57. Three possible description graphs
also refer to the 'confusable' ⟨s1, S⟩), while both of the other pairs refer to Σ uniquely. We define:

Given a graph S and a description pair ∆, the set of confusables, Conf(∆, S), is the set of those nodes s′ ∈ V_S such that ∆ refers to ⟨s′, S⟩.

A description pair ∆ = ⟨d, D⟩ refers uniquely to a scene pair Σ = ⟨s, S⟩ iff ⟨d, D⟩ refers to ⟨s, S⟩ and ∀π : (D ⊑_π S ⇒ π(d) = s).
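To make these definitions concrete, the following Python sketch (our own illustration, not part of the original chapter) encodes the scene of Figure 56 as a set of labelled edges and tests 'refers' by brute-force enumeration of injective mappings; connectivity of the description graph is simply assumed, and all function and variable names are ours.

    from itertools import permutations

    # Scene graph of Figure 56: properties are loops, the relation Holds is an edge.
    SCENE_NODES = {'s1', 's2', 's3', 's4'}
    SCENE_EDGES = {('s1', 'Musician', 's1'), ('s2', 'Musician', 's2'),
                   ('s3', 'Technician', 's3'), ('s4', 'Trumpet', 's4'),
                   ('s1', 'Big', 's1'), ('s3', 'Big', 's3'),
                   ('s2', 'Small', 's2'), ('s4', 'Small', 's4'),
                   ('s2', 'Holds', 's4')}

    def refers(d, D_nodes, D_edges, s, S_nodes, S_edges):
        """True iff some subgraph isomorphism maps the description graph into the
        scene graph and maps the designated node d to s (brute-force search)."""
        D = sorted(D_nodes)
        for image in permutations(sorted(S_nodes), len(D)):   # injective mappings
            pi = dict(zip(D, image))
            if pi[d] != s:
                continue
            if all((pi[v], l, pi[w]) in S_edges for (v, l, w) in D_edges):
                return True
        return False

    def confusables(d, D_nodes, D_edges, S_nodes, S_edges):
        """Conf(Delta, S): all scene nodes the description pair may refer to."""
        return {n for n in S_nodes
                if refers(d, D_nodes, D_edges, n, S_nodes, S_edges)}

    # 'The musician' leaves two confusables; 'the small musician' only the target s2.
    print(confusables('d', {'d'}, {('d', 'Musician', 'd')},
                      SCENE_NODES, SCENE_EDGES))                       # {'s1', 's2'}
    print(confusables('d', {'d'}, {('d', 'Musician', 'd'), ('d', 'Small', 'd')},
                      SCENE_NODES, SCENE_EDGES))                       # {'s2'}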
Note that if a description pair ∆ = ⟨d, D⟩ refers uniquely to a scene pair Σ = ⟨s, S⟩, then Conf(∆, S) = {s}, i.e., the set of confusables is a singleton. We have seen that there are multiple unique (distinguishing) descriptions for our target s2 in S. As usual in gre, certain solutions may be given preference over others. There are various ways to do this. One way is by considering properties in some fixed order and letting the algorithm proceed incrementally by adding suitable properties one by one, stopping once a uniquely referring description is found (Dale and Reiter, 1995). A more general way would be to use cost functions (Krahmer et al., 2001, 2003). Costs are associated with subgraphs D of the scene graph S (notated cost(D)). We require the cost function to be monotonic. This implies that extending a graph D with an edge e can never result in a graph which is cheaper than D. Formally,

∀D ⊆ S ∀e ∈ E_S : cost(D) ≤ cost(D + e)

Here we assume that if D = ⟨V_D, E_D⟩ is a subgraph of S, the costs of D can be determined by summing over the costs associated with the edges of D. For the time being we assume that each edge costs 1 point. Naturally, this is a simplification, which does not do justice to the potential benefits of cost functions (but see Krahmer et al. (2003) for discussion). Thus, the first distinguishing graph in Figure 57 costs 2 points and is cheaper than the other one (which costs 3 points). Figure 58 contains the sketch of a basic graph-based gre algorithm, called makeReferringExpression. It takes as input a scene pair Σ
makeReferringExpression(s, S) {
    bestGraph := ⊥;
    D := ⟨{s}, ∅⟩;
    return findGraph(s, bestGraph, D, S);
}

findGraph(s, bestGraph, D, S) {
    if [bestGraph ≠ ⊥ and cost(bestGraph) ≤ cost(D)] then return bestGraph;
    Conf := {n : n ∈ V_S ∧ ⟨s, D⟩ refers to ⟨n, S⟩};
    if Conf = {s} then return D;
    for each adjacent edge e do
        I := findGraph(s, bestGraph, D + e, S);
        if [bestGraph = ⊥ or cost(I) ≤ cost(bestGraph)] then bestGraph := I;
    rof;
    return bestGraph;
}

Figure 58. Sketch of the main function (makeReferringExpression) and the subgraph construction function (findGraph)
consisting of the target s in a scene graph S. The description pair ∆ is initialized with the target s and the initial description graph D whose only node is s. In addition, a variable bestGraph is introduced, for the best solution found so far. Since no solutions have been found at this stage, bestGraph is initialized as the empty graph ⊥. In the findGraph function the algorithm systematically tries expanding D by adding adjacent edges (i.e., edges from s, or possibly from any of the other vertices added to the D under construction). For each D it is checked what the set of confusables is. A successful description is found iff Conf = {s}. The first distinguishing description that is found is stored in bestGraph. At that point the algorithm only looks for description graphs that are cheaper than the best (i.e., cheapest) solution found so far, performing a complete, depth-first search. (Naturally, graph-based generation is compatible with different search strategies as well.) It follows from the above-mentioned monotonicity requirement that the algorithm outputs the cheapest distinguishing description graph, if one exists. Otherwise it returns the empty graph.4
4 This basic algorithm has been implemented in Java 2 (J2SE, version 1.4). For implementation and performance details we refer to Krahmer et al. (2003).
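A rough Python counterpart to Figure 58, reusing the refers/confusables helpers from the earlier sketch, might look as follows. This is our own reformulation (not the Java implementation mentioned in the footnote); it assumes unit edge costs and expands only edges that start in a node already present in the description graph.

    def cost(graph):
        return len(graph[1])                     # simplification: every edge costs 1 point

    def adjacent_edges(D_nodes, S_edges):
        return [e for e in S_edges if e[0] in D_nodes]

    def make_referring_expression(s, S_nodes, S_edges):
        best = [None]                            # bestGraph, shared across recursive calls
        return find_graph(s, best, {s}, set(), S_nodes, S_edges)

    def find_graph(s, best, D_nodes, D_edges, S_nodes, S_edges):
        if best[0] is not None and cost(best[0]) <= cost((D_nodes, D_edges)):
            return best[0]                       # prune: this branch cannot get cheaper
        conf = confusables(s, D_nodes, D_edges, S_nodes, S_edges)
        if conf == {s}:
            return (D_nodes, D_edges)            # distinguishing description graph found
        for (v, l, w) in adjacent_edges(D_nodes, S_edges):
            if (v, l, w) in D_edges:
                continue
            I = find_graph(s, best, D_nodes | {w}, D_edges | {(v, l, w)},
                           S_nodes, S_edges)
            if I is not None and (best[0] is None or cost(I) <= cost(best[0])):
                best[0] = I
        return best[0]

    # Returns a cheapest distinguishing graph for s2 (with unit costs, a single edge
    # may already suffice here, since s2 is the only object that holds anything).
    print(make_referring_expression('s2', SCENE_NODES, SCENE_EDGES))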
Discussion

The graph-based gre approach has a number of attractive properties. First, there are many efficient algorithms for dealing with graph structures (see for instance Mehlhorn, 1984, Gibbons, 1985, and Chartrand and Oellermann, 1993). Second, the treatment of relations between objects is not plagued by some of the problems facing earlier approaches; there is, for instance, no need for making any ad hoc stipulations (e.g., that a property can only be attributed to a given object once per description, Dale and Haddock, 1991). This is because relational properties are handled in the same way as other properties, namely as edges in a graph. Relational properties cause testing for a subgraph isomorphism to have exponential complexity (see Garey and Johnson, 1979, Appendix A 1.4, GT48, on subgraph isomorphisms), but special cases are known in which the problem has lower complexity (e.g., when both graphs are planar, that is, drawable without crossing edges). The availability of results of this kind is an important advantage of using graphs in gre. Many existing gre algorithms can be reformulated using graphs. In the following section, we will show how some of these, each of which extends 'basic' gre, can be recast in the graph-based approach. Our exposition of the original algorithms is necessarily sketchy; for details we refer to the original papers. In the section thereafter we show in more detail how the graph-based approach enables two different algorithms for the generation of Boolean expressions.
3. Some simple extensions of graph-based GRE
3.1. Referring to sets

Firstly, we consider extensions of gre algorithms that generate references to sets. Suppose, for example, we want to refer to the set {s1, s2} in Figure 56 (to say that its elements are famous, for instance). This type of reference can be achieved by a simple extension of existing gre algorithms: properties are conjoined as normal, removing from Conf(∆, S) any objects that lie outside the target set, and the algorithm stops if and when the remainder equals the target set (i.e., all other confusables are removed). The target {s1, s2}, for example, may be described by the single property Musician (and realized as 'the musicians'). We will show that a similar procedure can be followed using a graph-based approach.
In fact, the algorithm described in the previous section is almost ready to generate simple descriptions referring to non-singular objects. The input of the algorithm is then no longer a single node s ∈ V_S, but a set of nodes W ⊆ V_S (in this case W = {s1, s2}). The algorithm now tries to generate a description pair ∆ uniquely referring to the scene pair Σ = ⟨W, S⟩. This requires a slight update of the definition of what it means to refer (uniquely): the constructed subgraph should refer (in the sense defined in section 2) to each of the nodes in the set W, but not to any of the nodes in the scene graph outside this set. Formally,

∆ = ⟨d, D⟩ refers to Σ = ⟨W, S⟩ iff D is connected and for each w ∈ W, ∃π (D ⊑_π S and π(d) = w).

∆ = ⟨d, D⟩ uniquely refers to Σ = ⟨W, S⟩ iff ⟨d, D⟩ refers to ⟨W, S⟩ and there exists no w′ ∈ V_S − W such that ⟨d, D⟩ refers to ⟨{w′}, S⟩.
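In the running Python sketch, only the success test changes; the names below are again our own.

    def refers_to_set(d, D_nodes, D_edges, W, S_nodes, S_edges):
        """<d, D> refers to <W, S>: some isomorphism maps d onto each member of W."""
        return all(refers(d, D_nodes, D_edges, w, S_nodes, S_edges) for w in W)

    def uniquely_refers_to_set(d, D_nodes, D_edges, W, S_nodes, S_edges):
        """... and d is mapped onto nothing outside W."""
        return (refers_to_set(d, D_nodes, D_edges, W, S_nodes, S_edges) and
                not any(refers(d, D_nodes, D_edges, w, S_nodes, S_edges)
                        for w in S_nodes - W))

    # 'The musicians' uniquely refers to {s1, s2} in the scene of Figure 56.
    print(uniquely_refers_to_set('d', {'d'}, {('d', 'Musician', 'd')},
                                 {'s1', 's2'}, SCENE_NODES, SCENE_EDGES))   # True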
Note that this redefines reference as always involving a set as its target; the singular case is obtained by restricting W to singleton sets. This does not affect the theoretical complexity of the algorithm, which depends on calculating sets of confusables (i.e., calculating Conf(∆, S), for different ∆). As observed in Stone (2000), gre should also allow reference to sets based on properties that they have as collectives (such as 'being parallel to each other'). Solutions to this problem are proposed in Stone (1999) and van Deemter (2002), the latter of which can be mirrored directly in terms of graphs if nodes are allowed to represent sets of objects and edges are allowed to represent properties of collectives.

3.2. Gradable properties

The analysis of vague or gradable properties such as Small that we have used so far (consistent with Dale and Reiter, 1995) is not really satisfactory, for example because being small means something else for a person than for a musical instrument (Figure 56). The analysis is even more clearly inapplicable to superlatives. 'The smallest musician', for example, does not necessarily denote the object that is both the smallest object in the domain and a musician. It is better to let the kb list absolute values, such as a person's size in centimeters, and let gre decide what counts as small in a given context (van Deemter, 2006). This approach allows the generator to describe someone as the small(est) musician, for example, even if the kb contains smaller objects, as long as these others are not musicians.
Figure 59. Graph with gradable properties made explicit
Looking at this proposal in a bit more detail, it is worth noting that it works by adding 'derived' properties to the database: properties of the form Size(x) > value, Size(x) < value, which can be inferred from absolute ones (Size(x) = value) listed in the database. Note that only a limited number of inequalities needs to be added, since absolute values not occurring in the database will not be relevant for derived properties. These derived properties are then used for removing confusables in the usual way. Luckily, this procedure can be mirrored using graphs. The way to do this is by extending the scene graph by adding the derived inequalities to it as additional edges. Suppose that Size(s1) = Size(s3) = 185cm, whereas Size(s2) = 157cm, and Size(s4) = 30cm. Then, after transformation, the graph looks as in Figure 59. Once this extended graph has been constructed, gre proceeds in the usual way. We now find that the target object s2 can be referred to uniquely by the description graph containing only the edges Musician and Size < 185cm. This graph can then be realized as 'the small musician' or 'the smallest musician' (for the details of this realization procedure we refer to van Deemter, 2006).
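Continuing the running sketch, the transformation can be mimicked by adding derived inequality loops for every absolute value that occurs in the kb (the label format 'Size<185' is our own ad hoc encoding):

    def add_size_edges(S_edges, sizes):
        """sizes maps nodes to absolute values; adds loops such as ('s2', 'Size<185', 's2')."""
        derived = set()
        values = set(sizes.values())
        for node, size in sizes.items():
            for v in values:
                if size < v:
                    derived.add((node, 'Size<%d' % v, node))
                elif size > v:
                    derived.add((node, 'Size>%d' % v, node))
        return S_edges | derived

    SIZES = {'s1': 185, 's3': 185, 's2': 157, 's4': 30}
    EXTENDED_EDGES = add_size_edges(SCENE_EDGES, SIZES)
    # s2 is now the unique musician with a 'Size<185' loop, i.e. the small(est) musician.
    print(confusables('d', {'d'},
                      {('d', 'Musician', 'd'), ('d', 'Size<185', 'd')},
                      SCENE_NODES, EXTENDED_EDGES))                      # {'s2'}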
The theoretical complexity of the construction algorithm, as a function of the numbers of nodes and edges in the revised scene graph, does not change. In the worst case, however, the number of edges in the scene graph grows quadratically: if the original graph contains c nodes, each with a different absolute value, the operation illustrated in Figure 59 produces a graph with c(c − 1) additional edges. (Each of the c nodes has an absolute value and now acquires c − 1 comparative values.)

3.3. Salience

Another recent innovation in gre concerns the treatment of salience. Earlier algorithms simplified by assuming that all objects in the kb are equally salient (Reiter and Dale, 2000, section 5.4). Krahmer and Theune (2002) refined this account by one which allows degrees of salience in the style of Praguean topic/focus theory or centering. Formally, this is done using a salience weight function, which assigns a number between 0 (non-salient) and 10 (maximally salient) to each object in the domain. All objects in the domain can be referred to, but the more salient an object is, the further its description can be reduced. The original algorithm works by adding properties until the set of confusables contains no object that is at least as salient as the target object. A faster, and probably slightly more natural, version of the algorithm is obtained if the algorithm starts out by restricting the domain to the set of those objects that are at least as salient as the target set, causing only those properties to be added that remove salient distractors. This idea is easily implemented in the graph-based approach if we redefine the set of confusables as follows:

Given a scene graph S containing a target object s, and a description pair ∆, Conf(∆, S) is the set of those nodes s′ ∈ V_S that are at least as salient as s and such that ∆ refers to ⟨s′, S⟩.
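In the running sketch this is a one-line restriction of the confusables set (salience weights supplied as a plain dictionary; again our own naming):

    def salient_confusables(d, D_nodes, D_edges, s, S_nodes, S_edges, salience):
        """Conf(Delta, S) restricted to nodes at least as salient as the target s."""
        return {n for n in S_nodes
                if salience[n] >= salience[s]
                and refers(d, D_nodes, D_edges, n, S_nodes, S_edges)}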
But, in fact, this amounts to treating salience as an in-built gradable property, which allows us to describe the effect of salience on gre by a variant of the algorithm for gradable properties. Limiting ourselves to singular references and assuming that salience is the only gradable property in the kb, this can be done as follows. First, relevant comparative values of Salience are added to the scene graph; this involves properties of the form Salience(x) > value only, since values of the form Salience(x) < value are irrelevant in this connection. These salience-related properties are given preference over all others (in terms of costs this can be achieved by offering salience properties for free), so that
the gre algorithm will cause every description to take salience into account. If the algorithm terminates successfully, the result is the unique description of an object s by means of a graph that attributes a number of properties to s, for example Musician and British and Salience > 8. This is a unique description, therefore it follows that s is the most salient British musician in the domain. The graph may subsequently be realized by the expression 'the British musician', leaving the property of maximal salience implicit. In this way, salience is treated as just another gradable attribute, with the only difference that it is always selected by the gre algorithm and never linguistically realized.

3.4. Negations

Suppose we wanted to refer to the set T = {s3, s4} in our basic scene graph. A simple trick (based on the notion of satellite sets) suffices to see that this is not possible if only atomic properties are taken into account. For any set X ⊆ D, let Satellites(X), the 'satellite set' of X, be defined as the intersection of all the extensions of properties of which X is a subset (cf. van Deemter and Halldórsson, 2001):5

S_X = {A : A ∈ IP ∧ X ⊆ [[A]]}
Satellites(X) = ⋂_{A ∈ S_X} [[A]]
Clearly, if Satellites(X) ≠ X, then the properties in the kb do not allow X to be characterized uniquely; even if all the properties of X are intersected, some confusables are not ruled out. Applying this to our target T = {s3, s4}, we observe that S_T = ∅ (s3 and s4 share no properties in the kb). This implies that Satellites(T) = ⋂∅ = D (and hence not T).6 What this shows is that our target T cannot be characterized by algorithms like the ones discussed in Dale and Reiter (1995), which rely on using intersections alone. As was argued in van Deemter (2001, 2002), this is a serious limitation because a simple characterization is possible if negations of atomic properties are allowed. For example, the set of elements in the domain that are not Musicians will do the trick. It is worth mentioning that negations are not only useful from a purely logical point of view. Even where they do not add to the expressive power of the generator (i.e., where they do not make more targets uniquely distinguishable), describing an object in terms of properties it
5 IP is the set of properties; [[A]] is the extension of A.
6 We assume that every object in the domain of discourse is in ⋂∅, as is usual when the domain D is given (but compare e.g., Suppes, 1972).
Figure 60. The scene graph enriched with negated edges
lacks can be the most efficient way to single it out from the rest: if all persons except one in a scene hold an instrument, then the odd one out may be best described as ‘the person not holding an instrument’, even if she could also have been described in purely positive (but possibly more complex) ways, for example ‘the tall woman in the front row, with the pearl necklace’. Adding negations to graph-based gre is easy and follows a familiar pattern: we can extend the scene graph with additional edges, making explicit what was implicit in the original scene graph. For this we use the standard Closed World Assumption (according to which all atoms
not listed as true are false). For instance, according to our original scene in Figure 56, s1 is not a technician and does not hold anything. This can be made explicit by adding negated edges. Let P_neg = {¬Technician, . . .} be the set of names of negated properties and R_neg = {¬Holds, . . .} the set of names of negated relations; then L_neg = P_neg ∪ R_neg is the set of negated labels. If S = ⟨V_S, E_S⟩ is the original scene graph, then the new scene graph with negated edges is S′ = ⟨V_S, E_S′⟩, where

E_S′ = E_S ∪ {(v, ¬p, v) : (v, p, v) ∉ E_S} ∪ {(v, ¬r, w) : v ≠ w ∧ (v, r, w) ∉ E_S}
with p ∈ P and r ∈ R. Now graph-based gre proceeds as before: negated edges may be selected if they rule out confusables. Again, the theoretical complexity of the algorithm is not altered, but the scene graph may grow drastically. If our initial scene S contains c nodes, and our initial label set L contains n properties and m relations, then the scene graph with negated edges S′ will contain c·n + c·(c − 1)·m edges. That is: we get a dense graph in which every possible edge is present with either a positive or a negative label. In the above, we have presented a treatment of negation based on extending the scene graph (i.e., on making a set of implicit properties explicit). Other treatments are possible, where the scene graph is left intact, and where the algorithm 'infers' a negative property where no positive property is found. Even though this is more efficient from a representational point of view, the conceptual difference is small. A difficult question, regardless of which of these strategies is chosen, is under which circumstances negations are to be chosen: in terms of the cost functions of section 2, what should be the cost of adding a negated property? Instead of discussing this issue, we will be content having established that negations can be treated in the graph-theoretical approach and explore the consequences of adding a further complication to gre, which arises when disjunctions are taken into account as well. In this way, we will be giving the gre algorithm full Boolean as well as relational coverage.
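A sketch of this closure, continuing the running Python example (with 'not:' as our ad hoc prefix for negated labels and an explicit list of the positive property and relation names):

    def add_negated_edges(S_nodes, S_edges, properties, relations):
        """Make the Closed World Assumption explicit: add a negated loop or edge
        wherever the corresponding positive edge is absent from the scene graph."""
        negated = set()
        for v in S_nodes:
            for p in properties:
                if (v, p, v) not in S_edges:
                    negated.add((v, 'not:' + p, v))
            for w in S_nodes:
                if w == v:
                    continue                     # relations are assumed non-reflexive
                for r in relations:
                    if (v, r, w) not in S_edges:
                        negated.add((v, 'not:' + r, w))
        return S_edges | negated

    PROPS = {'Musician', 'Technician', 'Trumpet', 'Big', 'Small'}
    RELS = {'Holds'}
    NEG_EDGES = add_negated_edges(SCENE_NODES, SCENE_EDGES, PROPS, RELS)
    # With c = 4, n = 5 and m = 1 this yields the dense graph of c*n + c*(c-1)*m = 32 edges.
    print(len(NEG_EDGES))                        # 32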
Figure 61. An extended scene graph, involving five objects
4. Boolean Descriptions
To put the graph-based approach to the test, let us now move on to a more challenging task: references using an arbitrary number of Boolean operators. To keep the problem manageable, we will limit the presentation to the case where gradable properties and salience do not play a role. (See the final section for brief discussion, however.) For the discussion in this section, it will be convenient to use a slight extension of our example domain, as depicted in Figure 61. Suppose that the target for gre is the set T = {s1, s2, s3}. Basic gre fails to find a unique reference to T, since there is no set of properties shared by all elements in T. If negative properties are taken into account
(see section 3.4), it turns out that we can characterize this set by ¬Trumpet ∩ ¬Synthesiser (i.e., the set of objects that are neither trumpets nor synthesisers), which would be a strange description of this target set. This result is due to the fact that basic gre does nothing else than conjoin atomic properties (i.e., intersect their extensions).7 Recently, proposals have been made for allowing gre algorithms to use any combination of Boolean operators (van Deemter, 2001; 2002; Gardent, 2002). Using these, we find a non-atomic (positive) property shared by all elements of {s1, s2, s3} in our basic example scene: they are all either musicians or technicians ({s1, s2, s3} = Technician ∪ Musician). In section 4.1 we will explore ways in which existing algorithms for Boolean gre may be reformulated using graphs. This discussion will lead on to a wholly new algorithm which is more easily cast in a graph-theoretical mold (section 4.2).

4.1. Applying an Incremental Algorithm to Boolean expansions of graphs

Van Deemter (2001, 2002) describes an extension of the Incremental Algorithm, covering equivalents of all Boolean combinations. The basic idea is to incrementally apply the Incremental Algorithm to Boolean properties of growing complexity. Thus, we first apply the Incremental Algorithm to a version of the kb to which negations of properties have been added. If no unique description is found, the kb is enriched with binary disjunctions (i.e., disjunctions of two positive or negative properties), and the Incremental Algorithm is applied to the extended kb. The process is repeated, with each phase adding longer disjunctions: ternary and so on.8 In this way, logical equivalents of all Boolean combinations are covered, by constructing Conjunctive Normal Forms, that is, conjunctions of disjunctions of literals. Thus, for example, phase 1 may conjoin the atomic property A with the negation ¬D, after which phase 2 may add the disjunction G ∪ E (resulting in the description A ∩ ¬D ∩ (G ∪ E)). Since this process is incremental, no property is ever removed, so even if G ∪ E = A ∩ ¬D, the properties accumulated during the first two phases are kept, leading to a description that is far longer than necessary, thus exaggerating a property of Dale and Reiter's Incremental Algorithm. (See van Deemter, 2002 for discussion.)
7 Other targets exist, which cannot be described uniquely at all using conjunction and negation alone, e.g., T = {s2, s3}.
8 Note that we freely mix logical with set-theoretic terminology, the relation between which is well understood.
Algorithms of this kind may be mirrored using graphs if we are able to let these express disjunctive information. Recall that, in section 3.4, negations of atomic properties were added to the scene graph to allow the inclusion of negative properties into descriptions. We would now have to do something analogous for disjunctions, and to extend the scene graph with edges making implicit information explicit. For example, we would make explicit that s1, s2 and s3 are all musicians or technicians, by adding an edge labelled 'Musician or Technician' (notation: [Musician | Technician]) to each of them. We could apply the graph-based gre algorithm in phases, and for each new phase extend the scene graph. A given phase may either result in a uniquely referring description (i.e., the algorithm terminates successfully), or not, in which case the next phase is entered. If the last phase terminates unsuccessfully then no uniquely referring description is found. Thus:

phase 1: Apply the graph-based gre algorithm makeReferringExpression to the scene graph, after addition of negative edges.
phase 2: Add edges labelled with binary disjunctions to the scene graph, then apply the algorithm to the resulting graph.
phase 3: Add edges labelled with ternary disjunctions. Etcetera.

In each phase, the basic graph-based makeReferringExpression from Figure 58 is used. Note that if our target is the set {s1, s2, s3}, a simple solution, involving only one property, is found in phase 2, selecting the edge labelled [Musician | Technician], which may be realized as 'the musicians and the technician'. (Note the reversal, by which 'and' expresses disjunction.) The algorithm can be flavoured in different ways; for example, one might decide to do a complete search within each phase (in the style of section 2), but never to undo the results of previous phases. Or alternatively, one could use heuristic search within each phase, and try to find the 'cheapest' solution given these limitations. But is our premise correct? That is, can disjunctive information always be expressed by labelled directed graphs of the kind that we have been discussing? Let us see how this could work, focussing on the case of binary disjunctions, and focussing on first principles rather than representational economy, for clarity. As was the case for negations, first we have to define a new class of labels, L_dis2. Naturally, the new labels will be composites of existing labels; for example, [l | l′] (also written as l | l′) will be the label denoting the disjunction of the literals denoted by the labels l and l′:
L_dis2 = L_neg ∪ {[l | l′] : l, l′ ∈ L_neg}, where l and l′ are of the same arity (i.e., both are properties or both are relations).

This would make [Musician | Technician] a label, and also [Hold | Operate], for example. With the newly extended set of labels in place, let us see how the scene graph may be extended. If S = ⟨V_S, E_S⟩ is a scene graph (possibly containing negative as well as positive edges), then the new scene graph with disjunctive edges is S′ = ⟨V_S, E_S′⟩, where (for [l | l′] ∈ L_dis2) the following holds:

E_S′ = E_S ∪ {(v, [l | l′], v) : (v, l, v) ∈ E_S ∨ (v, l′, v) ∈ E_S} ∪ {(v, [l | l′], w) : (v, l, w) ∈ E_S ∨ (v, l′, w) ∈ E_S}
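For the property (loop) case, the extension can be sketched as follows, continuing the running example and restricting attention to positive labels for brevity (the '[l1|l2]' string is our own rendering of the composite label):

    from itertools import combinations

    def add_binary_disjunctions(S_nodes, S_edges, labels):
        """Add a loop labelled '[l1|l2]' to every node that carries an l1- or an l2-loop."""
        extended = set(S_edges)
        for l1, l2 in combinations(sorted(labels), 2):
            disj = '[%s|%s]' % (l1, l2)
            for v in S_nodes:
                if (v, l1, v) in S_edges or (v, l2, v) in S_edges:
                    extended.add((v, disj, v))
        return extended

    DISJ_EDGES = add_binary_disjunctions(SCENE_NODES, SCENE_EDGES, PROPS)
    # s1, s2 and s3 all carry a '[Musician|Technician]' loop, so the target {s1, s2, s3}
    # can be described by a single disjunctive edge ('the musicians and the technician').
    print(sorted(v for (v, l, w) in DISJ_EDGES if l == '[Musician|Technician]'))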
So far, everything appears to be as it should. Unfortunately, however, this treatment leaves some types of disjunctions uncovered. The simplest problem is posed by mixtures between properties and relations, such as 'is a technician or holds a trumpet'. Cases like this might be accommodated by creating 'mixed' labels, which disjoin a property and a relation (that is, by dropping the requirement, in the definition of L_dis2, that disjoined labels must have the same arity). An example would be the label [Technician | Hold], which could now label both looping and non-looping edges. Though this pairing of labels of different arity is slightly counterintuitive, there is no technical obstacle against it. There is a more difficult problem, however, which resists this type of fix. Consider our running example, depicted in Figure 61. It ought to be possible to describe the target {s2, s3} by means of the disjunctive relation 'holds a trumpet or operates a synthesiser'. Disjunctions of such a complex kind, where the things that are disjoined are essentially structured rather than atomic, are not covered by disjunctive labels. (Note that the structure of the disjuncts can be arbitrarily complex, e.g., 'hold a violin which is old', 'hold a trumpet owned by a woman who ...'.) Extensions of the framework are possible; for example, one might create a new set of relational labels including, for example, Hold-Violin ('holding a violin'), Hold-Violin-Old ('holding a violin that is old'), and so on. This extension would allow us to form the disjunctive label [Hold-Trumpet | Operate-Synthesiser], giving rise to a distinguishing description of the target {s2, s3}. It is doubtful, however, that all possible cases could be tackled in this fashion, and the approach would seem to be misguided for various reasons.
Firstly, it would be fiendishly complicated to add all the right extensions to the scene graph without getting into an infinite loop. Secondly, by condensing all information into a single label, this approach would make superfluous the idea of letting the subgraph isomorphism algorithm find matches between complex graphs, going against the grain of Krahmer et al. (2003). In addition, it would tend to destroy everything that is simple and intuitive about graphs as representations of meaning, calling to mind efforts to make Venn diagrams more expressive by letting lines indicate that a given object can live in either of a number of different regions of the diagram; such a strategy allows succinct expression of some disjunctions, but becomes extremely cumbersome in other cases (Peirce, 1896; Shin, 1994). At least two types of responses are possible to this problem: one is to represent only some disjunctions explicitly (or even none at all), and to let the algorithm infer the others, analogous to what was suggested concerning negations at the end of section 3.4. The other is to explore a different type of algorithm which does not hinge on conjoining disjunctive properties, but on disjoining conjunctive properties.

4.2. Generating partitions: an alternative algorithm for the generation of Boolean descriptions

In the present section we offer an alternative algorithm for the generation of Boolean descriptions. Unlike the previous algorithm, this algorithm will not be based on an extension of the scene graph which, as we have seen, leads to problems. In fact, the algorithm leaves the original graphs intact, embedding the basic algorithm makeReferringExpression (from Figure 58) in a larger algorithm. Also unlike the previous algorithm, which generates conjunctive normal forms (CNF, i.e., conjunctions of disjunctions of literals), the new algorithm generates disjunctive normal forms (DNF). More specifically, the algorithm generates disjunctions of conjunctions of literals under the added constraint that all conjunctions are mutually disjoint. In other words, the new algorithm uses partitionings. Whether DNFs (including partitionings) or CNFs are more useful as an output of gre is a question that we will not resolve here, but which will be briefly taken up in section 4.3, where optimisation strategies are discussed. The logical point to observe, in connection with the new algorithm, is that every Boolean combination is logically equivalent to a partitioning. This can be seen as follows. Firstly, every Boolean combination is equivalent to a formula in DNF, that is, a formula of the form

X_1 ∪ ... ∪ X_n
describePartition(W, S) {
    n := |W|; k := 1; D := ⊥;
    for k = 1 to n do
        k-part := {ω : ω is a k-partitioning of W};
        for each ω ∈ k-part do
            for each part w ∈ ω do
                D′ := makeReferringExpression(w, S);
                if D′ ≠ ⊥ then D := D ∪ D′
                else failure; /* try next ω */
            rof;
            return D; /* one k-partitioning could be described */
        rof;
    rof;
    return failure;
}

Figure 62. Sketch of an algorithm describing partitions
where each X_i (with 1 ≤ i ≤ n) is of the form Y_1 ∩ ... ∩ Y_m, and where each Y_j (with 1 ≤ j ≤ m) is a positive or negative literal. Secondly, any DNF formula can be rewritten as a partition, that is, a DNF whose disjuncts are all disjoint. The rewriting process is most easily demonstrated using an example. Consider the DNF A ∪ B ∪ C (a disjunction of length three), and suppose this is not a partition, for example because A ∩ B, A ∩ C and B ∩ C are all nonempty. This DNF can be rewritten as another disjunction of length 3, namely A ∪ (B − A) ∪ (C − (A ∪ B)): each disjunct is adapted to make sure that all elements of its predecessors are removed. This procedure generalises without difficulty to disjunctions of length n. We take these logical considerations to imply that, as a first step, it is sufficient to build an algorithm that generates partition-type descriptions wherever a distinguishing description is possible. The algorithm works as follows. Let W = {w_1, . . . , w_n} be the target, with W ⊆ V_S. We call a partitioning of W into k subsets (henceforth called parts) a k-partitioning. Figure 62 contains a description of the partition-based generation algorithm, using the function makeReferringExpression from Figure 58. (In Figure 62, the notation D ∪ D′ designates the
unconnected graph that forms the union of the two connected graphs D and D′.) In a first iteration, the algorithm tries to describe W itself (as a 1-partitioning). If this fails, it attempts to describe one of the 2-partitionings of W. That is: for each 2-partitioning, we call the usual makeReferringExpression function from Figure 58 and apply it to each of its parts. As soon as one part cannot be described in the usual way, we move on to the next 2-partitioning. For our example target set, there are three partitionings into 2 parts: {{s1}, {s2, s3}}, {{s2}, {s1, s3}} and {{s3}, {s1, s2}}. Both parts of the last of these can be described in the usual way, as 'the technician' and 'the musicians' respectively. So, here the algorithm would terminate. In general, the partition algorithm will continue looking at k-partitionings for ever larger values of k, until the target set is split up into singleton parts (i.e., until k = n, the number of parts equals the number of elements of the target set). Obviously, there is only one way to partition a target set W into singleton parts. Note that this new algorithm, which covers Boolean combinations of properties and relations ("the bold technicians and the musicians who hold a trumpet"), stays extremely close to the original graph-based algorithm outlined in section 2 and, unlike the approach outlined in the previous section, it does not require iterative extensions of the scene graph adding edges for ever more complex disjunctive labels. In addition, the approach is compatible with the treatment of gradable properties and salience. The algorithm, however, is computationally expensive in the worst case. The reason for this is that the number of partitionings grows exponentially as a function of the size of the target set. In general, the number of k-partitionings of a target set with c elements can be determined using the second-order Stirling number S(c, k) (Stirling, 1730; see also e.g., Knuth, 1997:65). This number equals
S(c, k) = (1/k!) · Σ_{j=0}^{k} (−1)^j · C(k, j) · (k − j)^c

where C(k, j) denotes the binomial coefficient 'k choose j'.
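For illustration, the formula can be evaluated directly, and the partition-based strategy of Figure 62 can be sketched on top of the running Python example. The part-describer below deliberately searches conjunctions of positive properties only, so it is a simplified stand-in for makeReferringExpression; all helper names are ours.

    from itertools import combinations
    from math import comb, factorial

    def stirling2(c, k):
        """Number of ways to partition a set of c elements into k non-empty parts."""
        return sum((-1) ** j * comb(k, j) * (k - j) ** c
                   for j in range(k + 1)) // factorial(k)

    def partitions_into_k(elements, k):
        """Enumerate all partitionings of a list of elements into k non-empty parts."""
        if len(elements) < k or k < 1:
            return
        if k == 1:
            yield [set(elements)]
            return
        first, rest = elements[0], elements[1:]
        for p in partitions_into_k(rest, k - 1):      # 'first' forms a part of its own ...
            yield [{first}] + p
        for p in partitions_into_k(rest, k):          # ... or joins one of the k parts
            for i in range(len(p)):
                yield p[:i] + [p[i] | {first}] + p[i + 1:]

    def describe_part(part, S_nodes, S_edges, properties):
        """Smallest conjunction of positive properties that uniquely picks out 'part'."""
        for size in range(1, len(properties) + 1):
            for props in combinations(sorted(properties), size):
                D_edges = {('d', p, 'd') for p in props}
                if uniquely_refers_to_set('d', {'d'}, D_edges, set(part),
                                          S_nodes, S_edges):
                    return set(props)
        return None

    def describe_partition(W, S_nodes, S_edges, properties):
        """Figure 62 in miniature: try 1-partitionings, then 2-partitionings, and so on."""
        target = sorted(W)
        for k in range(1, len(target) + 1):
            for omega in partitions_into_k(target, k):
                descriptions = [describe_part(part, S_nodes, S_edges, properties)
                                for part in omega]
                if all(d is not None for d in descriptions):
                    return list(zip(omega, descriptions))
        return None

    print(stirling2(3, 2))     # 3, as in the example above
    # In the scene of Figure 56, {s1, s2, s3} has no conjunctive description, but the
    # 2-partitioning {{s1, s2}, {s3}} does: 'the musicians and the technician'.
    print(describe_partition({'s1', 's2', 's3'}, SCENE_NODES, SCENE_EDGES, PROPS))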
Fortunately, there are various ways to speed up the algorithm. For example, one can use the notion of satellite sets, described in section 3.4 (which can be computed in linear time), to determine whether a purely conjunctive description for a given part exists; if not, the algorithm moves on to the next partitioning and tests whether a description for each of its parts exists. Alternatively, we could limit the set of partitionings by relying on linguistic regularities, for example by requiring that the properties corresponding with the different parts are of the same 'type'. Thus, for example, one might disallow "the musicians and the
small things", while allowing "the musicians and the technician" or "the trumpet and the synthesiser". Such a move, however, can sometimes result in a loss of descriptive power, because some sets may no longer be describable by the algorithm. In other words, the algorithm is no longer logically complete.

4.3. Generate and Optimise

We have simplified our discussion considerably by focussing on logical completeness only, that is, on the ability of a gre algorithm to find a distinguishing description whenever there exists one. This means that we have largely disregarded the fact (noted in van Deemter, 2002 and more extensively in Gardent, 2002) that some algorithms deliver descriptions that are very lengthy and unnatural. In fact, it has been proven to be possible to construct a logically complete Boolean gre algorithm in linear time, as long as the linguistic quality of descriptions is disregarded (van Deemter and Halldórsson, 2001): only when linguistic restrictions are placed on the output do things get complicated. It is well known, for instance, that finding minimal descriptions (e.g., Dale, 1992; Gardent, 2002) is computationally intractable, even when only conjunction is taken into account (Dale and Reiter, 1995). The Boolean algorithms presented above do not guarantee minimal output, but the results are generally much shorter and easier to realize than those in van Deemter and Halldórsson (2001). The quality of the generated descriptions is ultimately an empirical issue which we cannot hope to address adequately within the confines of this paper. One point that we would like to stress here, however, is that Boolean descriptions can often be optimised automatically. Consider the second, partition-based algorithm. Suppose the domain D is the set {d1, d2, d3, d4}, while the target T is {d1, d2, d3}. Suppose Musician and Technician are the only properties, with Musician = {d1, d2} and Technician = {d2, d3}. Then the algorithm based on partitionings behaves as follows: during the first phase there is only one partitioning (namely T itself), and it cannot be described. During the second phase, where 2-partitionings are considered, the partitioning {{d1, d2}, {d3}} may be chosen, whose two elements may be characterized as follows: the set {d1, d2} equals the extension of the property Musician; the set {d3} equals the intersection of the extensions of Technician and ¬Musician: "the musicians and the technician that is not a musician". This would be overly verbose, since Musician ∪ (Technician
∩ ¬Musician) is logically equivalent with the simpler expression Musician ∪ Technician: "the musicians and the technician". This illustrates that algorithms for Boolean gre can be viewed as combining two different tasks: The first is to find a Boolean description that characterises the target, the second to determine whether there exists a logically equivalent characterisation that is more natural (e.g., briefer). Note that this makes the Boolean algorithms different from all the other ones discussed in this paper, where the second task was not relevant, but only another, similar task: determining whether there exist non-equivalent descriptions that nevertheless (i.e., given the domain and the extensions of properties) characterise the same target. In our original example domain, for instance, the property of being a small musician happens to be co-extensive with holding a trumpet, but this is something that can only be found out through an inspection of the domain. By contrast, inspection of the domain is not the natural way to find out that, for example, Musician ∪ (Technician ∩ ¬Musician) is equivalent with Musician ∪ Technician. This separation into two different aspects of Boolean gre suggests a 'generate then optimise' strategy reminiscent of the idea in Reiter (1990) in the context of simple gre, which amounted to checking whether any set of properties, in a given description, may be replaced by another property without affecting the extension. In the current setting, where logical equivalence (not coextensionality) is the issue, an obvious way to optimise is to use the type of algorithms that are used in chip design to simplify switching networks (van Deemter, 2002). The best known of these algorithms is the Quine-McCluskey algorithm, which performs the types of simplifications discussed here without difficulty (McCluskey, 1965).9 A full discussion of the limitations of logical optimisation will not be offered here, since it is a more general issue of no particular relevance to the graph-theoretic approach.
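As a toy illustration of the optimisation step (not Quine-McCluskey itself, and independent of the graph machinery), the following sketch greedily drops disjuncts and literals from a DNF as long as propositional equivalence, checked over all truth assignments, is preserved; literal names such as 'M' and '~M' are ours.

    from itertools import product

    def dnf_value(dnf, valuation):
        """dnf: list of conjunctions, each a set of literals like 'M' or '~M'."""
        def lit(l):
            return not valuation[l[1:]] if l.startswith('~') else valuation[l]
        return any(all(lit(l) for l in conj) for conj in dnf)

    def equivalent(dnf1, dnf2, atoms):
        return all(dnf_value(dnf1, dict(zip(atoms, vals))) ==
                   dnf_value(dnf2, dict(zip(atoms, vals)))
                   for vals in product([True, False], repeat=len(atoms)))

    def simplify(dnf, atoms):
        """Greedy logical simplification: drop whole disjuncts or single literals
        whenever the result remains logically equivalent to the input."""
        current = [set(c) for c in dnf]
        changed = True
        while changed:
            changed = False
            for i, conj in enumerate(current):
                if len(current) > 1 and equivalent(current[:i] + current[i + 1:],
                                                   current, atoms):
                    del current[i]
                    changed = True
                    break
                for l in sorted(conj):
                    if len(conj) > 1:
                        smaller = current[:i] + [conj - {l}] + current[i + 1:]
                        if equivalent(smaller, current, atoms):
                            current = smaller
                            changed = True
                            break
                if changed:
                    break
        return current

    # Musician OR (Technician AND NOT Musician)  simplifies to  Musician OR Technician.
    print(simplify([{'M'}, {'T', '~M'}], ['M', 'T']))     # [{'M'}, {'T'}]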
5. Discussion
We have shown how labelled directed graphs can be used for generating many types of referring expressions, but we have also run into the limits of their usefulness. Starting from the basic graph-based algorithm of
9 Although Boolean simplification is hard in general, algorithms like Quine-McCluskey are highly optimised and take little time in most cases that are likely to occur. Check http://logik.phl.univie.ac.at/chris/qmo-uk.html.
Krahmer et al. (2003), we have shown (1) how this algorithm can be used to generate descriptions of targets that are sets, (2) how it can accommodate gradable properties, including the property of being salient, and (3) how it can be extended to deal with negative literals. Our strategy has been the same in each of these cases: making information that is implicit in the original scene graph explicit by adding additional edges. After these relatively modest extensions, we have focussed on the generation of Boolean descriptions, arguably one of the most difficult gre tasks. We have explored how the incremental Boolean algorithm of van Deemter (2002) might be recast in graphs, making implicit (disjunctive) information explicit by adding edges to the scene graph. Having seen that it is difficult to use this method for representing all the different types of disjunctions when relations are also taken into account (as in "the men who are either holding a trumpet or playing a synthesiser"), we were forced to consider alternative algorithms, and this has led to a simple alternative based on partitionings of the target set. This approach, which generates a different type of description from the incremental Boolean algorithm, outputs description graphs that appear to be natural and easy to realise in most cases. Having noted that, like its predecessors, the algorithm can sometimes generate unnecessarily lengthy descriptions, we have briefly explored the use of existing algorithms for the automatic simplification of Boolean expressions. We foresee that other problems with complex referring expressions, not dissimilar to the one arising when disjunctions and relations are combined (see section 4.1), may arise in other types of referring expressions (for example when Booleans and quantifiers are combined), but an assessment of the challenges posed by these other expressions will have to await another occasion. With the prospect of integrating the different gre algorithms, plenty of new problems appear on the horizon. For example:

− Relational and Boolean properties. It is unclear how the generator should choose between different kinds of syntactic/semantic complexity. Consider, for example, the addition of a negation, a disjunction, or a relation with another object. It is unknown, for example, which of the following descriptions should be preferred by the algorithm: "the musicians that are not technicians", "the violinists and the cellists", or "the musicians in the string section". New empirical research is needed to settle such questions. It is likely that the choice between different sets of properties can partly depend on ease of realization: "the string section", for example,
may be preferable to "the violinists and the cellists" because it uses fewer words (rather than fewer concepts).

− Salience and sets. Suppose a quintet of musicians is performing on stage, thereby achieving a higher salience than all other people. Then two of them start to solo, thereby becoming even more salient. If existing accounts of salience are applied to sets, the generator can use the expression "the musicians" to describe the set of five or the set of two, which introduces an element of ambiguity into gre that had always been kept out carefully.

− Salience and vagueness. We have shown that salience can be treated as (almost) 'just another' gradable property (section 3.3). But this is not only good news. Should, for example, "the big piano player" mean "the biggest of the piano players that are sufficiently salient"? Or "the most salient of the piano players that are sufficiently big"? Or is some sophisticated trade-off between size and salience implied? Expressions that combine gradable properties tend to be highly indeterminate in meaning. Determining under what circumstances such combinations are nevertheless acceptable is one of the many new challenges facing gre.

Issues of this kind are to be addressed in our future research.
6. Acknowledgements

The work on this paper was done as part of the TUNA project, which is funded by the UK's Engineering and Physical Sciences Research Council (EPSRC) under grant number GR/S13330/01.10
References

Bateman, J., 1999, Using Aggregation for Selecting Content when Generating Referring Expressions, Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (acl 1999), University of Maryland.
Chartrand, G. and O. Oellermann, 1993, Applied and Algorithmic Graph Theory, McGraw-Hill, New York.
Dale, R., 1992, Generating Referring Expressions: Constructing Descriptions in a Domain of Objects and Processes, The MIT Press, Cambridge, Mass.
10 See http://www.csd.abdn.ac.uk/research/tuna for more information concerning TUNA.
Dale, R. and N. Haddock, 1991, Generating Referring Expressions involving Relations, Proceedings of the European Meeting of the Association for Computational Linguistics (eacl 1991), Berlin, 161–166.
Dale, R. and E. Reiter, 1995, Computational Interpretations of the Gricean Maxims in the Generation of Referring Expressions, Cognitive Science 18: 233–263.
van Deemter, K., 2000, Generating Vague Descriptions, Proceedings of the First International Conference on Natural Language Generation (inlg 2000), Mitzpe Ramon, 179–185.
van Deemter, K., 2001, Generating Referring Expressions: Beyond the Incremental Algorithm, Proceedings of the 4th International Conference on Computational Semantics (iwcs-4), Tilburg, 50–66.
van Deemter, K. and M. Halldórsson, 2001, Logical Form Equivalence: the Case of Referring Expressions Generation, Proceedings of the 8th European Workshop on Natural Language Generation (ewnlg-2001), Toulouse.
van Deemter, K., 2002, Generating Referring Expressions: Boolean Extensions of the Incremental Algorithm, Computational Linguistics 28 (1): 37–52.
van Deemter, K., 2006, Generating Referring Expressions that Involve Gradable Properties, Computational Linguistics 32 (2).
Gardent, C., 2002, Generating Minimal Definite Descriptions, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (acl 2002), Philadelphia, USA, 96–103.
Garey, M. and D. Johnson, 1979, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman.
Gibbons, A., 1985, Algorithmic Graph Theory, Cambridge University Press, Cambridge.
Horacek, H., 1997, An Algorithm for Generating Referential Descriptions with Flexible Interfaces, Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (acl 1997), Madrid, 206–213.
Knuth, D., 1997, The Art of Computer Programming, Vol. 1: Fundamental Algorithms, 3rd ed., Addison-Wesley, Reading, MA.
Krahmer, E., S. van Erk and A. Verleg, 2001, A Meta-Algorithm for the Generation of Referring Expressions, Proceedings of the 8th European Workshop on Natural Language Generation (ewnlg-2001), Toulouse.
Krahmer, E., S. van Erk and A. Verleg, 2003, Graph-based Generation of Referring Expressions, Computational Linguistics 29 (1): 53–72.
Krahmer, E. and M. Theune, 2002, Efficient Context-Sensitive Generation of Referring Expressions, in: Information Sharing, K. van Deemter and R. Kibble (eds.), CSLI Publications, Stanford, 223–264.
McCluskey, E.J., 1965, Introduction to the Theory of Switching, McGraw-Hill, New York.
Mehlhorn, K., 1984, Data Structures and Algorithms 2: Graph Algorithms and NP-Completeness, EATCS Monographs on Theoretical Computer Science, Springer, Berlin and New York.
Peirce, C.S., 1896, Collected Papers, Vol. 4, edited by Charles Hartshorne and Paul Weiss, Harvard University Press, Cambridge, Mass.
Reiter, E., 1990, The Computational Complexity of Avoiding Conversational Implicatures, Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics (acl 1990), 97–104.
Reiter, E. and R. Dale, 2000, Building Natural Language Generation Systems, Cambridge University Press, Cambridge, UK.
Shin, S.-J., 1994, The Logical Status of Diagrams, Cambridge University Press, Cambridge, UK.
Stirling, J., 1730, Methodus differentialis, sive tractatus de summatione et interpolatione serierum infinitarum, London.
Stone, M., 2000, On Identifying Sets, Proceedings of the First International Conference on Natural Language Generation (inlg 2000), Mitzpe Ramon, 116–123.
Suppes, P., 1972, Axiomatic Set Theory, Dover Publications, New York.
JAN ALEXANDERSSON AND TILMAN BECKER
EFFICIENT COMPUTATION OF OVERLAY FOR MULTIPLE INHERITANCE HIERARCHIES IN DISCOURSE MODELING
1. Introduction
For about four decades now, much investigation has been devoted to commonsense reasoning. Such reasoning is sometimes called nonmonotonic or reasoning with defaults. The main motivation for this has been the observation that humans apparently manage to draw conclusions from incomplete information, based either on previous events or even on previous experience. In most situations, the information at hand is indeed incomplete and humans therefore have to assume that certain additional information is relevant and true. Another motivation is the observation that even though new information might be inconsistent with the previous, humans manage to cope with this too. We want to transfer such behavior to artificial dialog partners, i.e., to dialog systems. It seems to be a general opinion that commonsense reasoning is nonmonotonic, e.g., (Moore, 1995; Wahllöf, 1996). Commonsense reasoning can be partitioned into two kinds: the first, called auto-epistemic, assumes that a rational agent is able to reason about his own beliefs, or in other words: to draw plausible conclusions based on an incomplete representation of complete information. A conclusion is drawn because it is assumed that the information at hand is complete and we would thus know whether the conclusion were false. In the second, called default reasoning, (plausible) conclusions are drawn based on the absence of information that would make it impossible to draw them. We are concerned with a kind of nonmonotonic reasoning which shares some of the properties of auto-epistemic reasoning. We are modeling the intentions of a user in a dialog system where the conclusions we draw are based on previous knowledge collected during a dialog. Our conclusions are defeasible in the sense that any conclusion we draw might be overwritten by new information. Different from the definitions above, we do not consider the case that a premise is made
invalid and that our conclusions are therefore false. Instead, invalidity must explicitly be stated by the user. In discourse processing, a new communicative action by some locutor introduces information that can be added to the context either with or without conflicts. When adding information without conflicts, new information can be related to some focused topic, such as the time of some event. New information can be more precise, like the exact hour in the previously mentioned afternoon. Often, however, new information conflicts with previous information, e.g., the topic is changed, or some previous information is revised, possibly rendering parts of the current context invalid. In such cases, parts of the focused information can be inherited such that the interpretation of the new communicative action is completed with information from the discourse context as far as it is consistent. Several researchers argue that this process can be seen as overlaying the new information on top of the old, see (Grover et al., 1994; Zeevat, 2005), or as reasoning with defaults, e.g., (Doherty, 1991), where the defaults in our setting are provided by the local discourse context. Similar reasoning is found in other applications, such as lexicon maintenance (Lascarides and Copestake, 1999) or parsing of ill-formed texts (Fouvry, 2003). In this work, we give an enhanced formalization for the process of combining the information from the current utterance with the previous context. The combination is performed on a semantic/pragmatic level. Our approach presupposes knowledge encoded in typed feature structures (henceforth TFS) and we call this operation overlay, see also (Alexandersson and Becker, 2001; Pfleger et al., 2002; Loeckelt et al., 2002). The reason for selecting TFS for the implementation of default reasoning is its appealing computational behavior. However, since our knowledge is organized in an ontology, the long-term goal of our research is to use more powerful formalisms, such as description logics. Even though much of the intensity within this research branch has declined, we believe there is hope for efficient implementations for such formalisms too, e.g., (Wahllöf, 1996). On a formal and technical level, overlay is the combination of default unification and a scoring function. Default unification is used as a general-purpose mechanism for enriching the interpretation of some communicative action with consistent information from some defeasible structure. The scoring function is indispensable, in particular for its usage in real dialog systems. There, analysis components have to face multiple hypotheses and since default unification always succeeds,
EFFICIENT COMPUTATION OF OVERLAY
425
the one hypothesis best fitting the knowledge at hand – in our case, the discourse history – has to be selected. It is well established, that at least a na¨ıve computation of default unification in general and credulous default unification in particular is exponential, see (Grover et al., 1994; Copestake, 1993). However, it is shown (Ninomiya et al., 2002) that by dropping the requirement that the result should be well-typed, it is possible to achieve algorithms linear to the size of the structures involved. The algorithm presented here is a snapshot where our long term goal is to come up with an efficient algorithm for computing the set of well-typed credulous default unifiers for two TFSs including structure sharing. The new contribution beyond previous work, where we have used tree-like inheritance type systems, is a precise formalization of the extension of overlay to type hierarchies with multiple inheritance together with an efficient algorithm and an extended definition of the scoring function. Throughout this paper, we motivate and illustrate overlay in terms of single utterances in the SmartKom system (Wahlster, 2003), a multi-modal dialog system with a frame-based meaning representation. However, we believe that this work applies to other types of discourse, such as documents and even intra-sentential phenomena like gapping and one-anaphora. We have already applied overlay to anaphora resolution, see (Pfleger et al., 2002). The paper is organized as follows: Section 2 motivates our work and presents the SmartKom system in which we verify our approach. An extended discussion of related work is found in section 3 and the main contributions follow in section 4, with the definition of overlay in section 4.2 and the new scoring function in section 4.4. Finally, a possible generalization of overlay is discussed in section 5.
2. The Setting
2.1. Discourse and Domain Modeling The core of the user intention in the SmartKom system is represented by instances of our domain model. However, we have driven this approach to the extreme since everything, including dialogue acts, processes, etc., is represented in the domain model (see (Gurevych et al., 2003) for more information). The discourse modeling and processing aspects are described in (Loeckelt et al., 2002; Pfleger et al., 2003;
426
ALEXANDERSSON AND BECKER
Reithinger et al., 2003). A shallow description of its tasks in the SmartKom system is described below in section 2.2. A crucial element of this work is the assumption of a typed framebased formalism, where we use typed feature structure as a well-understood formalization (Carpenter, 1992; Krieger, 1995). We would like to stress that although the aim of what we discuss in this paper should be modeling-neutral, we have some basic assumptions on the modeling. Objects which share properties should be modeled together. Consider the example see below where an “entertainment” frame is more general than, for instance a “go-to-see-a-movie” frame. Moreover a “watchsomething-on-TV” frame is related to “go-to-see-a-movie” in a sense that they both have, e. g., a begin-time property or feature. This common property should be defined in the entertainment frame. One the other hand, a location feature in the “go-to-see-a-movie” frame should only be defined for precisely this frame, since it is obsolete for watching TV.1 In the same way, the channel only makes sense for the “watchsomething-on-TV” frame. In our implementation within the SmartKom system with which we verify our theory, an ontology based on the ideas of (Baker et al., 1998; Russel and Norvig, 1995) has been constructed (see (Porzel and Gurevych, 2002; Gurevych et al., 2003) for another usage of the ontology). A simplified excerpt of the ontology describing just those frames discussed above is shown in figure 63. Consider the following example TOP
A named entertainment ENTERTAINMENT beginTime ...
A named broadcast on some channel
A named performance at some location
BROADCAST
PERFORMANCE
channel ...
location ...
Figure 63. An excerpt from the SmartKom ontology showing the more general entertainment frame and the two specialized frames performance and broadcast 1
One can, of course, one the one hand argue that watching TV also requires a location which is typically at home or in the hotel room. But on the other hand, this kind of location is a different one from the movie theater, so the types of the locations differ.
EFFICIENT COMPUTATION OF OVERLAY
427
of a dialog2 between a user who is seeking information about the movie and TV program and the SmartKom system: (229) User: I’d like to go to the movies tonight. (230) SmartKom: Here (*) are the films showing in Heidelberg. (231) User: No, there is nothing interesting there, (232) User: what is showing on TV? After the system’s reply (230), the context contains information about the topic going to the movies and some details like in Heidelberg and tonight. Note, that in the SmartKom system, certain defaults, e. g., the location Heidelberg in this example, are incorporated into the representation of the user utterances. The next user utterance (231) could be seen as meta-talk, signaling a change of topic which is then made explicit in the final utterance (232). Since the two topics movies and TV are clearly related, some of the information from the old context should be retained, i. e., in Heidelberg and tonight.3 The deep reason for this is that both topics serve the common goal of finding some entertainment which is still valid. Which parts of the old context should be retained in this case depends on the type of the topic. Time is relevant for both movie theaters and the TV program while any information (other than the city) about the movie theaters is irrelevant for the TV program. The overlay operation is based on typed feature structures as the representation formalism for the meaning representations and automatically performs this combination of old context and new utterance. A unique contribution of overlay, to our knowledge, is a measure we call score reflecting the degree of compatibility between background and cover through a numerical scoring function, see section 4.4. While comparing the representation of the new utterance with different parts of the context, it is essential – just because overlay always succeeds – to get the information about how well the cover fits the background. From this viewpoint, the score expresses a degree of topic cohesion, i.e., whether the new utterance is more or less the same topic as some 2
The only multi-modal aspect of this dialog is the presentation of the actual film titles on the display and a pointing gesture by the presentation agent, marked by *. 3 We are not taking into account inferred information from external knowledge sources like, e.g., a user model.
428
Analyzer
ALEXANDERSSON AND BECKER
Modality Fusion
Intention Selection
Action Planning
Discourse Modeling
Generator
Presentation Manager
Function Modeling
External Databases External Devices
Figure 64. Architecture of SmartKom from the point of view of the discourse modeling module - DiM
part of the discourse context. However, it is not a measure for general discourse cohesion (see also (Pfleger, 2002)). 2.2. Architecture of the SmartKom system Figure 64 shows the relevant part of the system architecture. The input by the user is analyzed separately in the modalities speech, gesture and facial expression and a modality fusion module integrates them into a single representation. Since the analysis modules generate multiple hypotheses of the user’s intention, all combinations are computed by the modality fusion module and are presented as a hypothesis lattice. In the following process of selecting the most likely new state of the dialog, defaults, situative knowledge, and finally context information are added to the hypotheses. There, the task of the discourse modeling module is to enrich the hypotheses with information from the current context and also to validate the resulting combination. Since these operations are carried out after modality fusion, they are completely independent of the multi-modal setting. The representations of context and user intention are based on an ontology modeling the domain of the possible tasks that SmartKom can be used for and all objects and processes that can be part of these tasks, see (Gurevych et al., 2003). The classes in the ontology are connected by unary inheritance only. However, in this paper we extend the definition of overlay to domain models with multiple inheritance structures, see section 4. In the implementation, these structures are encoded in M3L, an XML based language with a very strict XML schema description which directly reflects the ontology. For the purposes of this paper, we will continue to use typed feature structures which has a number of advantages. It is a well understood formalism which encompasses the
429
EFFICIENT COMPUTATION OF OVERLAY
Performance . . . entertainment : beginTime tonight cinema Cinema Figure 65.
TFS for I’d like to go to the movies tonight
same type of inheritance structures as the XML-encoded ontology4 and the unification operation is closely related to the overlay operation, in fact it is incorporated in the definition of overlay. Finally, TFS allow for a very compact presentation. Figure 65 shows a possible corresponding TFS for the sample sentence I’d like to go to the movies tonight. For a more detailed description of the discourse modeling module and how TFS and overlay are embedded into a complex representation of the discourse history in SmartKom, see (Pfleger, 2002; Pfleger et al., 2002).
3. Related Work
Since frames or frame like structures have been used for knowledge representation, non-monotonic operations like overlay have been suggested and used. Below, we provide an overview of such approaches and additionally give a characterization of by comparing our approach to others. Operations similar to overlay are often called default unification between a strict structure (Carpenter, 1993; Grover et al., 1994; Ninomiya et al., 2002) - our covering- and a default structure (Bouma, 1990; Carpenter, 1993; Ninomiya et al., 2002) (sometimes defeasible structure (Grover et al., 1994)) corresponding to our background. Default unification has been used since the eighties and is also found under different names. To our knowledge, the first mentioning and implementation of an algorithm performing default unification was part of the DPATR workbench and was called clobber (Karttunen, 1986; Karttunen, 1998). Clobber is similar to what is later called priority union (Kaplan, 1987). 4
We will not give a formal proof of this relation here, but rather indicate that the transformation is more or less direct. Beyond inheritance, an ontology also allows for restrictions on role-fillers, e.g., a table must have 3 or 4 legs. Such constraints have to be encoded elsewhere.
430
ALEXANDERSSON AND BECKER
q q r q m s A = s t B = s t A/B = u u v p l p Figure 66.
r t v l
Kaplans priority union example
In his effort of formalizing optimality theory, Karttunen names a similar operation lenient composition. The idea is the same: information in a default structure is added to a strict structure as long as the information does not conflict. The example in figure 66 is taken from (Kaplan, 1987, P. 180). He suggests to use this operator for either morphological operations where one value overwrites a default value or the resolution of elliptical constructions. The latter is picked up in (Grover et al., 1994) (see below). However, Kaplan does not tell us how to process coreferences nor typed feature structures. The former is tackled by the following approaches. Bouma, e.g., in (Bouma, 1990), gives a recursive definition of default unification. His idea is to “remove all default information which might lead to a unification conflict.” Another definition, similar but different in style and function is given in (Carpenter, 1993) and (Calder, 1993). We concentrate on (Carpenter, 1993): There, two default unification operations are defined called credulous and skeptical default unification. The idea behind the credulous operation is to generalize the default structure until it unifies with the strict structure (see definition 1 below). Generalization can be performed by walking upwards in the subsumption lattice of feature structures until a structure is found that meets this requirement. As we will exemplify below, there might be several distinct structures that unifies with the strict one. Consider the example where the strict structure is set to [A:b] and the default structure to [A:1 a, B:1 ]. To explain the result of credulous < default unification (which we denote with ' c ) we draw the subsumption lattice imposed by the information ordering relation " as described in (Carpenter, 1993) in figure 67. Clearly, the set of the most special generalizations unifying with the strict structure is {[B:a] , [A:1 , B:1 ]} (1) which, unified with the strict structure, produces the following set of feature structures: {[A:b, B:a] , [A:1 b, B:1 ]} (2)
EFFICIENT COMPUTATION OF OVERLAY
431
Figure 67. The lattice < [A : 1 a, B : 1 ] , "> where " is the information ordering relation
Carpenter does not give an algorithm but a formal definition of his credulous default unification: DEFINITION 1. Credulous Default Unification (Carpenter, 1993) <
F ' c G = {F ' G |G + G is maximal such that F ' G is defined } Related to this, (Copestake, 1993, p. 44) points out “. . . the complexity of the algorithm as described is exponential since checking for all possible F would involve creating the unification of each member of the power-set of At(F2).” At this point it is interesting to note that the implementation reported on in (Grover et al., 1994) pursues a slightly different strategy: the default structure is decomposed into atomic feature structures (Moshier, 1988) which are then in a breadth-first-search unified in all possible orders. This algorithm, however, is admittedly potentially slow for the same reason. It is worthwhile pointing out that the examples in (Grover et al., 1994) are all based on a unary inheritance type hierarchy. Carpenter’s second definition – skeptical default unification – is based on the desire to obtain a unique result. The idea can be summarized as “. . . [maintaining] only default information which is not in any way conflicted.” (Carpenter, 1993). This is achieved by computing the most special generalization of the result of the credulous default unification: < < F ' s G = &(F ' c G)
432
ALEXANDERSSON AND BECKER
In our example above the value of feature B is conflicting between the result structures. Thus, the unique result of skeptical default unification contains no value for feature B: [A:b, B:(]
(3)
Besides the more general approaches of default unification as described above, there are approaches intended for special applications, e.g., parsing of ill-formed input, e.g., (Imaichi and Matsumoto, 1995; Ninomiya et al., 2002; Fouvry, 2003). One of the earlier ones is (Imaichi and Matsumoto, 1995) where an extension to standard unification, a → variant of forced unification called cost-based unification ( ' ) is introduced. Their idea is to continue when classical unification fails, but punish the result by adding a cost for the inconsistency. In the following example the symbol ( represents inconsistency, and ({sing, plur} an inconsistent set: *
NUM: sing PERS: third
+
→
'
*
NUM: plur PERS: third
+
*
=
NUM: ({sing,plur} PERS: third
+
There are several ways of defining costs, e.g., the number of inconsistent sets. Extending the work of (Imaichi and Matsumoto, 1995), (Ninomiya et al., 2002) introduce the theoretically elegant ideal lenient default unification (ILDU). A pragmatic and efficient algorithm called lenient default unification whose time complexity is linear in the size of the strict and the default feature structure is also provided. In contrast to Carpenter’s credulous default unification, the goal of their algorithm is to maximize the information content of the resulting structure. Carpenter’s credulous default unification tries to maximize the amount of information from the default feature structure. Formally, the ideal lenient default unification is defined as , , G % f (F ' f G) is maximal , F ' G = & F ' G ,, such that F ' G is defined , without the bottom type >
where 'f is a subsumption relation where the bottom type is de>
fined. The optimal answer for ILDU is computed by F ' s (F ' f G) which has exponential time complexity (Copestake, 1993). As a realistic and fast alternative (Ninomiya et al., 2002) introduces lenient default unification which is almost like ILDU but is based on two basic ideas: 1. inconsistencies caused by path valid specifications are replaced by generalizing the types at the fail points.
433
EFFICIENT COMPUTATION OF OVERLAY
2. inconsistencies caused by path equivalence specifications can be removed by unfolding the structure-sharing of the failing path nodes. Even though (Ninomiya et al., 2002) is using HPSG, their algorithm is not mentioning anything about types (see below). Instead they compare lenient default unification directly with credulous default unification. This example shows how credulous and lenient default unification produce significantly different results. Whereas credulous default unification returns the strict structure5 *
*
DTRS:
H:HEAD:CASE: object NH:SPR:HEAD:CASE: nom
++
<
'c MOTHER:HEAD: *1 DTRS: *
*
DTR:
H: 2 NH:SPR: 2 =
HEAD:1 head
H:HEAD:CASE: object NH:SPR:HEAD:CASE: nom
+
++
,
the result of lenient default unification (in figure 3) is more informative. The additional information is due to the unfolding operation in lenient default unification: in case of clashes, the coreferences of the default structure are “pushed” as far as possible towards the leafs. Interestingly, (Ninomiya et al., 2002) do not mention anything about the effect of introducing types into the feature structures. Instead they admit that the result of “default unification” does not necessarily produce a totally well typed structure indicating that types are of secondary interest. It is worth noting that the approach to robust parsing taken in (Fouvry, 2003) produces one resulting structure from which different distinct solutions can be retrieved. During the processing of re-entrant structures “Feature links are kept, and all re-entrancies are maintained [in] the result.” (Fouvry, 2003, page 238). Thus, the approach resembles skeptical default unification rather than credulous.
5
In the example we assume that the type head has the features PHON, CASE, INV, and TENSE and the type sign has HEAD and VAL.
434
ALEXANDERSSON AND BECKER
head PHON: 3 MOTHER:HEAD: CASE: ⊥ INV: 4 TENSE: 5 sign PHON: 3 H: CASE: obj INV: 4 TENSE: 5 sign DTRS: head PHON: NH:SPR: HEAD: CASE: INV: TENSE: VAL:6
Figure 68.
3 nom 4 5
A result of lenient default unification
3.1. Discussion Finishing the discussion of related work, we focus on two topics. First, default unification and type systems and second, the complexity of default unification. Untyped, Typed, Open and Closed World Most research prior to this paper has been concerned with default unification on untyped feature structures, e.g., (Bouma, 1990; Carpenter, 1993). To our knowledge, there are two mentionings of typed default unification: (Copestake, 1993) A precise algorithm – typed default unification – is described taking the type system into account. The basic idea is to decompose the default feature structure, F , in a fashion similar to Atomic(F ) (Moshier, 1988). However, Copestake utilizes the observation that “Only parts of the feature structure which are fully type compatible with the non-default structure are split.” This resembles our approach since we heavily utilize the type hierarchy to remove unnecessary computation. In (Lascarides and Copestake, 1999), an order independent per<>
sistent default unification algorithm ( ' ) used in the LKB system
EFFICIENT COMPUTATION OF OVERLAY
435
(Copestake, 2002) is presented. The lodestar for this work is order independence, i.e., the result of their default unification algorithm does not rely on the order of the arguments. They postulate six desiderata for default unification, some of which do not apply in our case. (Lascarides and Copestake, 1999, s. 6.5) admit that their algorithm is not suitable for tasks as described in (Grover et al., 1994). Number 2 states that “default unification never fails unless there is a conflict in non-default information”. This is not acceptable in our case since we regard old information as defeasible and it can be overwritten. Number 4 states that default unification should return “a single result, deterministically”. As shown in, e.g.(Grover et al., 1994) it is sometimes not possible to determine what the unique answer is unless one throws information away. Also this is not desirable in our case. The price paid for order independence is the maintenance of a partial history of default information (Lascarides and Copestake, 1999, p. 11). W.r.t. com<>
plexity, ' has a worst-case complexity which is “proportional of n! in the number of individual unification operations.” Such a time-complexity would not be acceptable under the real-time constraints in dialog processing. (Ninomiya et al., 2002) Their algorithm does not mention types at all. Instead, in their footnote 1, they state that lenient default unification does not necessarily produce a totally well-typed feature structure. This caveat can “easily be enforced by the total type inference function.” Finally, even though this distinction is not mentioned explicitly, some approaches, e.g., (Grover et al., 1994), appear to be defined for a closed world type system, where only explicitly defined types exist. We make the same assumption for our definitions in section 4.2.
Complexity in Related Work The complexity of default unification algorithms, e.g., credulous default unification, is regarded to be exponential (Ninomiya et al., 2002; Grover et al., 1994). While our definitions for overlay in this chapter do not cover co-indexation, our algorithm which is based on efficient computations on the type hierarchy can be used as the basis for an extended version that would avoid the power-set computations as described in (Grover et al., 1994).
436
ALEXANDERSSON AND BECKER 4. Overlay = Default Unification + Score
We devote this section to the formal definition of overlay. As basis for our discussion we will use credulous and skeptical unification as defined in (Carpenter, 1993). There are some alternative suggestions in the literature (see also the discussion in section 3), but we prefer Carpenter’s definitions for of their clarity and power. Basically, we believe that overlay can be characterized as credulous default unification as described in (Carpenter, 1993) but not in the world of feature structures but typed feature structures (TFS). In the machinery we will define below, it is possible to create a bounded complete partial order using the subsumption relation in (Carpenter, 1992, p. 41). Given this relation, the characterization of credulous default unification in (Carpenter, 1993) holds. The reason for choosing credulous default unification instead of skeptical default unification is manifold and we give the two most prominent ones here: − In skeptical default unification information is thrown away which seems to be inadequate for discourse processing. − Skeptical default unification was defined due to the desire for obtaining one single unique result. While this might make sense in the lexicon, it is not always possible in discourse processing, as indicated in (Grover et al., 1994). In our work on overlay, we have extended credulous default unification in two important ways: − Carpenter does not tell us how to efficiently compute the result(s) of credulous default unification. Previous work (Grover et al., 1994; Copestake, 1993) suggests that credulous default unification is very expensive. While (Grover et al., 1994) argues that “In general we expect priority union to be a computationally expensive operation, since we cannot exclude pathological cases in which the system has to search an exponential number of subsets in the search for the maximal consistent elements which are required.” we provide a potentially efficient algorithm which utilizes the type hierarchy. The main insight is that it is possible to skip a lot of the computationally expensive search described in (Grover et al., 1994) by direct manipulation based on type information. − In discourse processing there are a lot of potential referents. Since overlay always succeeds, we need a way of discriminating the most probable result of overlay. Our scoring mechanism in (Pfleger et al., 2002) has proven to be a powerful and precise tool for distinguishing
EFFICIENT COMPUTATION OF OVERLAY
437
the most likely referent. In this section we will recapitulate its definition and adapt it to type hierarchies with multiple inheritance. First however, we start by recapitulating the formalities behind typed feature structures. 4.1. The formal semantics The formal characterization of overlay, i.e., credulous default unification operation for typed feature structures is found in (Carpenter, 1993). For complete definitions of TFS we refer to, e.g., (Carpenter, 1992; Krieger, 1995). Here we will recapitulate the most prominent definitions needed for the presentation below. For ease of presentation we concentrate on inheritance hierarchies where every set of TFSs has a unique unifier which might be the inconsistent TFS. In (Alexandersson and Becker, 2003) we gave two definitions on TFS: one with unary inheritance only and one with multiple inheritance. In this work we will reduce the definitions to the latter case since the former is a (trivial) version of the latter. Most of the definitions below are more or less the same as given in (Carpenter, 1992) but we choose to view the lattice of TFSs with the most general structure - indicated with ( - at the top. First we characterize our inheritance hierarchies. For our presentation here, our general interests are on hierarchies where, in case one exists at all, we have a unique least upper bound (lub) for every subset of types. The same goes for greatest lower bounds (glb): if there is one for a subset of the types in our inheritance hierarchy, it is unique. To characterize such hierarchies we use a partial order: DEFINITION 2. Partial Order A relation – % – on a set of types – Type – is a partial order in case it is: − reflexive, i.e., ∀x ∈ Type we have x % x − antisymmetric, i.e., ∀x, y ∈ Type if x % y ∧ y % x then x = y − transitive, i.e., ∀x, y, z ∈ Type if x % y ∧ y % z then x % z The following definition associates features with types. DEFINITION 3. Appropriateness Let Type, % be an inheritance hierarchy and f ∈ Feat. Then we define a partial function Approp : Feat × Type → Type where
438
ALEXANDERSSON AND BECKER
Feature introduction For every feature f ∈ Feat there is a most general type Intro(f ) ∈ Type such that Approp(f, Intro(f )) is defined Downward closure/Right monotonicity If Approp(f, τ ) is defined and σ % τ , then Approp(f, σ) is also defined and Approp(f, σ) % Approp(f, τ ) Thus, for each type t ∈ Type, there is a possibly empty set of features introduced by this type, denoted Introf(t) := {f | Intro(f ) = t}. Also, we will make us of Appropf(t) = {f | Approp(f, t)}, i.e., the set of features appropriate for a type t. We define typed feature structures where we assume a finite set of features Feat and an inheritance hierarchy Type, % that is a lattice. DEFINITION 4. Typed Feature Structures A typed feature structure (TFS) is a tuple F = Q, q, θ, δ where: − − − −
Q is a finite set of nodes with the root q q is the unique root node θ : Q → Type is a total node typing function δ : Feat × Q → Q is a partial feature value function
Below we will write θ(F ) to denote the type of a TFS, i.e., θ(q). Next, we provide a formal definition of an operation similar to δ but which, given a feature structure and a feature, returns the complete TFS. The operation also gives us MGSat, which computes the most general satisfier for a given type, but the starting point for Feature Value is a concrete TFS. In what follows we will denote a sequence of features π ∈ Feat∗ . We also assume δ to work with such sequences too, i.e., for some q we have δ(, q) = q and δ(f π, q) = δ(π, δ(f, q)). Furthermore, given some node, q, we assume a function, Closure(q), that computes the subset of nodes reachable from q, including q. We can now define a function that, given a TFS and a feature f , returns the value of f , i.e., a TFS: DEFINITION 5. Feature Value Let F = Q, q, θ, δ be a TFS and f ∈ Feat, such that f is defined for δ(q). Then the Feature Value of F with respect to f , F(F, f ) is a TFS F = Q , q , θ , δ , where − q = δ(f, q) − Q = Closure(q ) − θ = {θ(q i ) = ti |q i ∈ Q }
EFFICIENT COMPUTATION OF OVERLAY *
t1 F : bool
t2 F : bool G : bool
t3 F : bool G : true I : bool
439
+
t4 F : bool H : bool
t5 F : true G : bool H : true
Figure 69. A sample type hierarchy inspired by (Cop93). The values (bool, true, false) are types, i.e., TFSs without features
− δ = {δ(f k , q l ) = q m |q l ∈ Q ∧ Approp(f k , θ(q l ))} An important definition is the following: DEFINITION 6. Subsumption F = Q, q, θ, δ is said to subsume F = Q , q , θ , δ , F % F , iff there is a total function h : Q → Q such that: − h(q) = q − θ(q) + θ (h(q)) for every q ∈ Q − h(δ(f, q)) = δ (f, h(q)) for every q ∈ Q and feature f such that δ(f, q) is defined The subsumption definition is crucial since it makes it possible to arrange the set of TFSs as a partial order which can be used not only to characterize unification (' and the join operation &) but also default unification. As an example, see the sample lattice for a small type hierarchy in figure 69 (inspired by (Copestake, 1993)). Below we write lub(x, y) or & and glb(x, y) or ' to denote the type of the least upper bound and greatest lower bound of two TFSs x and y. We will sometimes use the same denotation for the least upper bound and greatest lower bound of the types of two TFSs. Finally, we say that F 1 properly subsumes F 2 – written F 1 F 2 – if F 1 + F 2 but not F 2 + F 1 . We will use for types as well.
440
ALEXANDERSSON AND BECKER
Next, we precisely define what is meant by the generalization of a TFS. Intuitively this means that we coerce the TFS to a more general type meaning that we eventually remove some feature. DEFINITION 7. Generalization Let F = Q, q, θ, δ be a TFS of type ts . Then G(F, tt ), the generalization of F to a type tt where ts % tt is a TFS, Q , q , θ , δ , where − − − −
q = q Q = Closure(q ) θ = {θ(q) = ti |q ∈ Q − {q }} ∪ {θ (q ) = tt } δ = {δ(f k , q l ) = q m |q l ∈ Q ∧ Approp(f k , θ (q l ))}
We will also make use of a recursive function Maximal Specialization, written S(F, tt ) which specializes a TFS (F ) by repeatedly unifying F with the most general satisfiers of the types on a path between θ(F ) and tt : (((F ' MGsat(ti )) ' MGsat(ti+1 )) ' . . . ' MGsat(tt )). Note that S might not succeed in specializing F to the target type since the unification on the way might fail. In this case, the specialization is interrupted. The credulous default unification operation is characterized by the following elegant and powerful definition taken from (Carpenter, 1993): DEFINITION 8. Credulous Default Unification < F ' c G = {F ' G |G + G is maximal such that F ' G is defined } Note that even though the definition was coined with non-typed feature structures in mind, since the definition relies on the information ordering on the feature structures, it is valid also for the typed case. A possible algorithm for the computation of credulous default unification is described in (Grover et al., 1994). There, the defeasible structure G (the background) is decomposed into a set of atomic feature structures At(G) (Moshier, 1988). The members of the set are then unified with the strict structure F (the covering) one by one until the unification fails. To obtain all possible solutions, their algorithm is using all possible orderings, that is, n! orderings where n is the number of atomic structures. Even though the decomposition of the defeasible structure is technically possible, the status of the intermediate structures is questionable. Moreover the result of each unification is an appropriate structure only because (in this case) in ALE (Carpenter and Penn, ) “. . . if adding a feature to a term results in a term of
EFFICIENT COMPUTATION OF OVERLAY
441
a new type, then the representation of the structure is specialized to reflect this.” (Grover et al., 1994, s. 3.1, p. 23). We are not convinced that this way of computing priority union (another name for credulous default unification) is completely clean and it certainly is unnecessarily inefficient. 4.2. A Definition and an Algorithm for Overlay In our implementation in the SmartKom system, we have experimented with a type system with unary inheritance only and without reentrant structures. This is a simplification which makes the algorithm for the computation of credulous default unification simple. However, a domain model (ontology) will in theory and practice come with multiple inheritance; concepts inherit from multiple other concepts or types via isa-links. This is also common in grammar development, e.g., HPSG (Pollard and Sag, 1994) and lexical semantic repositories, e.g., WordNet (Fellbaum, 1998). Therefore, we have developed here an efficient extension to our algorithm to function for such scenarios. The main algorithmic contribution of this work is the insight that, contrary to, e.g., (Grover et al., 1994) we do not need to disassemble the complete defeasible structure into atomic feature structures and unify the strict structure with all orderings of them. Rather, in a preprocessing step – referred to as assimilation – we compute the target types for the generalizations of the defeasible structure and of the final result directly.6 Given the types, we can then easily compute the desired generalization of the defeasible structure and the specialization of the strict structure. Most of the computations are on the type hierarchy and can, in a finite type hierarchy, even be precomputed. Assimilation is trivial for unary inheritance type systems but for multiple inheritance type systems it is a bit more complicated. The main reason is that the result type is no longer unique if there are interactions of multiple inheritance. A further stick in the wheel is the fact that for some formalisms, cases arise where a TFS cannot be specialized. The main reason is that the constraints imposed by subtypes can make the specialization impossible. As a simple example (see figure 69), it is impossible to specialize [t4 , F : f alse, H : bool] to the type t5 since the type of feature F is true for t5 . Therefore, assimilation is divided into several steps. 6
Note that the subsumption hierarchy of the structures is directly related to the subsumption hierarchy of the types.
442
ALEXANDERSSON AND BECKER
t lub
t bg’’
step 4
step 3
t co step 2
tbg’
t co’
step 1 t bg Figure 70.
t glb
The four steps in computing the assimilated type
To provide more intuition, our explanation is based on an example from the type hierarchy depicted in figure 69 and we set the defeasible structure, bg, to [t3 , F : true, G : true, I : bool] and the strict, co, to [t4 , F : bool, H : true]. In what follows, it might be helpful to draw the complete lattice, ordered under subsumption, of all possible TFSs. The first three steps will generalize bg and specialize co in a well-defined manner until the types are sufficiently compatible such that in the final step, the actual combination of the feature structures can be performed. 1. We compute the type of lub(bg, co) which in our example is the type t1 . 2. Then, we generalize bg to the least feature structure such that its type has a non-trivial glb with the type of co. In our example, the type of the thus generalized bg is t2 and bg has lost one of its features: [t2 , F : true, G : true]. 3. Next, we specialize the co to the type of the glb. In our example this means that the strict structure now is [t5 , F : true, G : bool, H : true]. 4. Finally, the two assimilated feature structures can be combined. This needs to be done recursively by taking the assimilated strict structure and for each feature (according to its type), we overlay its value over the corresponding value in the assimilated defeasible structure. In our example, there is no recursion but the assimilated defeasible structure contributes the value true for feature G and the resulting feature structure is [t5 , F : true, G : true, H : true] At this point we have some comments:
EFFICIENT COMPUTATION OF OVERLAY
443
− in general, the specialization yields multiple results. It can be implemented by repeatedly unifying the strict structure with the most general satisfier of its subtypes until we have reached the glb or the unification fails. However, these maximal specializations may be specialized too far: to stick to the definition of credulous default unification, the iteration must stop at the maximal specialization that is itself the unification of cover and generalized background. − the values of the features in our example do not conflict. However, this is an important effect of default unification and thus overlay – new information overwrites old. For the presentation below, we view atomic values, i.e., integers, as typed feature structures without features. That is, integers form a small lattice (it is really a tree) with the top type [int] with an unlimited set of siblings – the TFSs [1], [2], [3], . . . . Slightly simplified, overlaying two ints, say co =[42] and bg =[4711] is done by – during assimilation – generalizing [4711] to the lub, i.e., [int] and then combining the two structures by unifying [int] with [42] obtaining the latter TFS. The final result will be a set FS 6 of feature structures that are generalizations of bg suitable for overlay with the corresponding specialized instances of co. Figure 70 shows the steps performed for each path. Figure 71 (page 445) is a more general example with two paths from bg to lub where co is specializable to different degrees. Below we give a precise definition of overlay. The steps 1–3 are done during assimilation and step 4 during core overlay. During the presentation of the algorithm below we will refer to the latter figure. ALGORITHM 1. Assimilation Let co and bg be two TFS (covering and background) such that co = Qco , q co , θco , δ co and bg = Qbg , q bg , θbg , δ bg . The assimilation of co and bg– assim(co, bg) – is computed by the following four steps: 1. Find the unique tlub = lub(tbg , tco ). In figure 71, this is t1 . Find all paths pi from bg to tlub . For each pi , find the maximally specific type tbg (or rather tbg p i ) that (i) is on pi and (ii) such that tbg and tco have a nontrivial (i. e., not ⊥) glb, tglb . In figure 71 there are two tbg : t2 and t2 . Next, for each type tbg , generalize bg to this type (or more precisely, to the maximally specific generalization of that type). tbg := θ(q bg ) (the type of bg)
(4)
tco := θ(q co ) (the type of co)
(5)
444
ALEXANDERSSON AND BECKER
tlub := tco & tbg
(6)
T bg := {tbg | tlub + tbg + tbg , tbg maximal with ⊥ = tbg ' tco } (7) (8) FS bg := {fs bg | fs bg = G(bg, tbg ), tbg ∈ T bg } 2. For each type tbg ∈ T bg , the goal is to maximally specialize co towards the corresponding unique glb of the types. Specialization can be done by repeatedly unifying co with the MGSat of the types on the paths between co and the corresponding glb. However, it might not be possible to successfully unify the glb with the MGSats, e.g., in our example we assume that we cannot specialize t3 to t4 , but only to t7 . A further restriction is imposed by the definition of credulous default unification: the specialization must itself be the unifier of co and a generalization of bg. In general, co is specialized to all maximally specific subtypes tco that (i) subsume the glb and (ii) are unifiers of tco and a generalization (see also step 3) tbg of tbg . In figure 71, these are t7 and t4 . (9) T co := {tco | tco = θ(fs co ), co + fs co + tglb , (10) fs co maximal , tbg ∈ T bg , tco = tco ' tbg , (11) tbg = θ(G(fs bg , tlub ))} (12) FS co := {fs co | fs co = S(co, tco ), tco ∈ T co } 3. The next step is a further (maximally specific) generalization of bg from the tbg , now to the unique lubs tbg it has with the specialized cos tco . In figure 71, these generalizations are to t6 and t2 and the corresponding glbs are t7 and t4 . T bg := {tbg | tlub + tbg + tbg , tbg ∈ T bg , tbg maximal } FS bg := {fs bg | G(bg, tbg ), tbg ∈ T bg
(13) (14)
Finally, the generalizations of bg should be “forced” to tco . Note that a strict specialization might not be possible, just as with co.7 However, we are now in a situation where tbg – the type of the generalized 7
There is an alternative definition that avoids specialization of co if it is not supported by a correspondingly specializable fs bg : tco can be defined as the maximal specialization such that co and fs bg can be specialized to it. Such a specialization might not exist at all, in which case tbg equals tlub and bg is generalized accordingly.
445
EFFICIENT COMPUTATION OF OVERLAY
t1 t6 3
t3
4 2
t2
t7
1
t 6’ 2
t7’
t 2’ 4
1 t5
t4
t 4’
Figure 71. The four steps of overlay of a co of type t3 over a bg of type t5 with multiple paths to the lub t1
bg– subsumes the (specialized) type tco of co and the further steps are defined below as the actual overlay operation of the fs co over fs bg . In figure 71 this is overlaying an fs of type t7 over one of type t6 on the left and an fs of type t4 over one of type t2 on the right. We can now proceed to define overlay (step 4 in figure 71) as an unification-like operation over the two assimilated feature structures. ALGORITHM 2. Overlay Let co and bg be two TFS (covering and background) such that the assimilated TFS are co = Qco , q co , θco , δ co and bg = Qbg , q bg , θbg , δ bg then overlay(co, bg) is defined as: overlay(co, bg) := overlay (co , bg ) := Qo , q o , θo , δ o
(15)
q o := q co and θo (q o ) := θco (q co ) ' θbg (q bg ) and f ∈ Feat : δ o (f, q o ) := (16) overlay(F(co , f ), F(bg , f )) if f ∈ co and f ∈ bg , (17) δ co (f, q co ) if f exists only in co , The first case (16) is the recursive step used when the values exist in both assimilated feature structures. The next case (17) applies when the feature is absent in the background and we use the value from the covering. Note that overlay is, unlike unification, not a commutative operation. Hence we have overlay(a, b) = overlay(b, a), where a and b are two
446
ALEXANDERSSON AND BECKER
TFS such that a = b, and unify(a, b) would fail. However, if a and b are unifiable, then overlay(a, b) = overlay(b, a) = unify(a, b) = unify(b, a). 4.3. Complexity issues There are two different complexity issues that we will cover briefly in this section: First, we need to characterize the number of results from the overlay operation. Second, we have reached our goal to generate all solutions efficently as we argue below. Although we restrict our definitions to latticies without structure sharing, which gives unique results for most steps, step 1 follows all paths pi from bg to tlub , potentially finding a unique result for each path. The number of such paths depends on the concrete type hierarchy. In general, it is linear in the number of types, although it requires a pathological type hierarchy to come close to this theoretical bound. For a given type hierarchy better bounds for the number of paths (or the length of paths and the maximum number of direct supertypes) might exist. A note on lattice: We believe that our definitions and algorithm could be extended to type hierachies meeting, for instance, the BCPO criteria and even more powerful hierarchies where two structures might not even have a unique unifier. Among other effects, the result of the unification operation may not be unique in such type hierarchies. Thus it should not be surprising that assimilation will result in a large number of results, since at each step, the number of solutions is multiplied by at most the number of minimal upper bounds (mubs) or maximal lower bounds (mlbs) respectively. Note that both, the number of steps in the algorithm and the number of mubs/mlbs are finite, though: there are at most five steps, and at most |T | (the number of types) mubs/mlbs, so that the number of solutions for one recursive step of overlay would be bound by |T |5 , still a constant w.r.t. the size of the argument feature structures. For each result of overlay (on the “top-level” features), recursive calls on the values of the features will follow. Since multiple results of overlay will have some overlap (some features will differ, but for some results there will be identical features and values), a hash-table like solution is necessary to avoid redundant computations. This, however, is easy to implement. In general, the algorithm presented above is linear in the number of solutions, and thus optimal in a certain sense. Each computation that is started will (as overlay does) always generate a result and the
EFFICIENT COMPUTATION OF OVERLAY
447
complexity for each step is (as argued above) independent from the size of the arguments. 4.4. The Scoring Function In order to clarify the notion of scoring, we briefly summarize the work of (Pfleger et al., 2002; Pfleger, 2002; Alexandersson et al., 2004) and present the necessary extensions of this work for type systems with multiple inheritance. Recall that overlay is a non-monotonic operation that always succeeds. If our task is to validate a hypothesis against the discourse memory, we will thus succeed for all referents, independent from how good or bad the hypothesis fits the referent at hand. Our solution to this is a function that computes a score that can either be used for ranking several possible hypotheses, or given a threshold, even disqualify the result of the operation. During previous usage of overlay one single result for each operation was computed – it was never necessary to compare multiple results from a single application of overlay, the score did help distinguishing one interpretation fitting the current discourse from an unexpected one. Interpretations of the latter case differed in most cases in their designated position in the ontology, i.e., the “movie Terminator on TV” vs. the “movie Terminator in a cinema”. In any case, if the application at hand needs a unique result, multiple results could also be disambiguated by a system-initiated clarification dialog. For the scenario in this paper, however, we have multiple results even for each usage of overlay. So, additionally to the comparison between the results of different referents, these multiple results should be scored so that they can be compared. We will below first recapitulate previous scoring functions but then present a possible way of comparing multiple results. The new scoring function is based on the assumption that results containing more information should be given a higher score and that the score should be lowered if information from the background is discarded by overlay. In (Alexandersson and Becker, 2001) the scoring function was based on simple heuristics consisting of a combination of the amount of information contained in the (combined) output and the distance in discourse history. In (Pfleger et al., 2002; Pfleger, 2002) the scoring function has received a more intuitive and formal design. Finally, in
448
ALEXANDERSSON AND BECKER
(Alexandersson et al., 2004), we showed how the “distance” between the types in the type hierarchy can be utilized. Since in this paper we have chosen a slightly different formalization – values are feature-less TFSs – our formula from (Alexandersson et al., 2004) has to be modified accordingly. During (the recursive application of) overlay, the following numbers are collected: |q| the number of typed nodes in the resulting feature structure, used for normalizing the values of the scoring function wtc the weighted type clash (see below) In previous definitions, the weighted type clash was the loss of information as we move in the type hierarchy from the type of the background to the type of the least upper bound. Since our version of assimilation is more complex, it is possible and reasonable to also take the possible win of information in steps 3 and 4 into account. The weighted type clash is based on the notion of informational distance, defined for each pair of types below. Since we formalize values as feature-less TFSs, the notion of “conflicting values” is replaced by a type clash, and thus we add a special definition of “informational distance of values” below. DEFINITION 9. Informational Distance Let − co, bg, tbg , tbg , tbg , tco , tco be defined as in algorithm 1 − |t| be the number of features that are Approp(t) for a type t where there exists some subtype of tlub that has at least one feature. Then, loss is defined as: loss(co, bg) := |Appropf(tbg )| − |Appropf(tbg” )| and the relative loss is defined as: -
rloss(co, bg) :=
0 loss(co,bg) |Appropf(tbg )|
if |Appropf(tbg )| = 0 otherwise
Similarly, win is defined as: win(co, bg) := |Appropf(tco )| − |Appropf(tbg” ) ∪ Appropf(tco )| and the relative win is defined as: -
rwin(co, bg) :=
0 win(co,bg) |Appropf(tco )|
if |Appropf(tco )| = 0 otherwise
EFFICIENT COMPUTATION OF OVERLAY
449
Finally, the informational distance – idist – is defined as: idist(co, bg) := max (0, rloss(co, bg) − rwin(co, bg)) Note that we explicitly avoid negative values of distance which would otherwise be possible if the relative win is larger than the relative loss. Note further, that the previous definitions of informational distance used to express the distance between background and result while the definition above also takes the distance between cover and result into account. Since informational distance is based on the number of features lost or won, it will always be zero for values (i.e., feature-less TFS). Thus we refine the definition for informational distance of values: DEFINITION 10. Informational Distance of Values Let − co, bg, tbg , tbg , tbg , tco , tco be defined as in algorithm 1 where there exists no subtype of tlub that has at least one feature. Then, loss and relative loss are defined as: $
loss(co, bg) := rloss(co, bg) :=
1 if tbg” tbg 0 otherwise
Also, win and relative win are defined as: $
win(co, bg) := rwin(co, bg) :=
1 if tbg” or tco tco 0 otherwise
Finally, the informational distance – idist – is defined as: idist(co, bg) := max (0, rloss(co, bg) − rwin(co, bg)) Next, we compute the weighted type clash which is the sum of all idist collected during overlay: DEFINITION 11. Weighted Type Clash For each node q in the result of overlay(co, bg) let the corresponding pair in QP air be defined by − (co, bg) ∈ QP air be the set of pairs of feature values from co and bg associated to each node if they exist − or else – if q is introduced in step 3 of the algorithm, see also equation (17) – (fv , fv ) ∈ QP air where fv is the feature value associated with q, with idist(f v, f v) := 0
450
ALEXANDERSSON AND BECKER
Then, the the weighted type clash – wtc – is defined as wtc =
.
(co,bg)∈QPair idist(co, bg)
The informational score then is the weighted type clash (the sum of all informational distances), normalized by the size of the result: DEFINITION 12.
Informational Score score(co, bg) =
|q| − wtc(co, bg) |q|
Unlike in previous definitions, this score ranges from 0 to 1. The upper extremal indicates two unifiable arguments whereas the lower extremal indicates that all information in the result stems from the covering. All scores between these two extremals indicate that the cover more or less fits the background, the higher the score the better the fit. Note that this score will compute possibly different scores for each solution. This has to do with, e.g., the potentially different numbers of features stemming from the background. For the overall task of discourse processing we have to take into account additional factors beyond the scoring of overlay. One simple but well functioning heuristic which is used in SmartKom is recency, i. e., how “far back” the background is found in the discourse history or how “focused” it is. In SmartKom, backgrounds being further away are penalized whereas backgrounds in focus are rewarded.
5. Generalizing Overlay
In our definitions so far, excluding information from the background in the result of overlay is mainly achieved by generalizing the type of bg in a minimal way. But consider the following example: (233) User: What’s on TV tonight? (234) System: Here (*) is an overview of tonight’s TV program. (235) User: OK, what’s on channel three? (236) System: This (*) is tonight’s program of channel three. (237) User: Oh well, what’s on TV tomorrow night?
EFFICIENT COMPUTATION OF OVERLAY
451
The intention of the user in this situation is ambiguous. She might be asking for the program of channel three or a general overview. The type-based assimilation mechanism of overlay does not help in this example, since the final utterance clearly is of type Broadcast, but nevertheless the channel slot of the background might or might not be relevant. In the strict interpretation of overlay as defined in section 4.2 it would be added to the intention. To extend overlay to also generate the general interpretation, the definition must be extended as follows: For every feature that is not set in the cover, generalized overlay has two valid results, with or without the information from the background.8 Another case in which a user utterance might have to be interpreted in a more general way affects the definition of overlay in a different sense. If, in the above example, the final utterance by the user is instead What movies are there tomorrow night? then the strict interpretation is that it refers to watching TV (on channel three). Since the user has not mentioned watching TV explicitly, the utterance is ambiguous and a generalized interpretation would be “What movies are playing in movie theaters or on TV tomorrow night?” Provided that such an interpretation of type Entertainment is generated by the analysis modules, overlay as defined in step 2 of definition 1 of assimilation would always specialize the type to Broadcast. Thus, in the same manner as discussed above for using values from the background or not, the assimilation (i. e., specialization) of a cover must be extended to return a set of all possibilities. Step 2 could be reformulated by dropping the maximality constraint on the specialization: T co := {tco | tco = θ(fs co ), co % fs co % tglb , tbg ∈ T bg }
(18)
Note that the other steps are not affected since the type of the covering is not changed there. While these extensions enlarge the set of results from the overlay operation, they still give a precise definition of the search space, based on the properties of the type hierarchy.
8
One might argue that the last utterance is really a reference back to the initial utterance What’s on TV tonight? and thus overlay should not take the current context as the background but rather “close” that node in the discourse history and go back to the initial utterance.
452
ALEXANDERSSON AND BECKER 6. Conclusion
We have provided an extended formalization of overlay as an operation that combines new and old information that is encoded as typed feature structures in such a way that preference is given to new information but as much as possible of the old information is added to the result. In particular, we use this information in dialog systems where a new user utterance must be combined with old information from the discourse history. Overlay hinges on a typed, frame-based representation framework and a scoring function, expressing the degree of similarity between two TFS. We have given a strict definition of overlay that keeps all possible old information even in a multiple-inheritance type hierarchy. In order to cover cases where not all compatible old information is desired, the generalized version of overlay as discussed in the last section applies, as it defines the search space of possible interpretations. We have given an algorithm that is based on the type hierarchy and could serve as the basis for an efficient algorithm even for the general case of credulous default unification with co-indexation. Finally, we have given an adapted scoring function in section 4.4. We believe that overlay is a very general operation that can be applied in to many phenomena. In (Pfleger et al., 2002), we have already laid out how overlay can be used to resolve anaphora and we have recently taken the first step towards extending overlay to, in restricted scenarios, work with sets (Romanelli et al., 2005). Current work includes extending the algorithm for bounded complete partial orders and the inclusion of re-entrant structures into the algorithm.
Acknowledgements
The research within SmartKom presented here is funded by the German Ministry of Research and Technology under grant 01 IL 905. The responsibility for the content is with the authors. We would like to thank Frederik Fouvry, Walter Kasper, Bernd Kiefer, Thomas Kleinbauer, Hans-Ulrich Krieger, Stephan Lesch, Takashi Ninomiya, Norbert Pfleger and Christian Schulz for invaluable discussions and comments and the anonymous reviewers for comments on earlier drafts.
References

Alexandersson, J. and T. Becker: 2001, 'Overlay as the Basic Operation for Discourse Processing in a Multimodal Dialogue System'. In: Workshop Notes of the IJCAI-01 Workshop on "Knowledge and Reasoning in Practical Dialogue Systems". Seattle, Washington.
Alexandersson, J. and T. Becker: 2003, 'The Formal Foundations Underlying Overlay'. In: Proceedings of the 5th International Workshop on Computational Semantics (IWCS-5). Tilburg, The Netherlands.
Alexandersson, J., T. Becker, and N. Pfleger: 2004, 'Scoring for Overlay based on Informational Distance'. In: KONVENS-04. Vienna, Austria, pp. 1–4.
Baker, C. F., C. J. Fillmore, and J. Lowe: 1998, 'The Berkeley FrameNet Project'. In: Proceedings of COLING-ACL. Montreal, Canada.
Bouma, G.: 1990, 'Defaults in Unification Grammar'. In: Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics. University of Pittsburgh, Pennsylvania, USA, pp. 165–172.
Calder, J.: 1993, 'Typed unification for natural language processing'. In: T. Briscoe, V. de Paiva, and A. Copestake (eds.): Inheritance, Defaults, and the Lexicon. Cambridge, England: Cambridge University Press, pp. 13–37.
Carpenter, B.: 1992, The Logic of Typed Feature Structures. Cambridge, England: Cambridge University Press.
Carpenter, B.: 1993, 'Skeptical and Credulous Default Unification with Applications to Templates and Inheritance'. In: T. Briscoe, V. de Paiva, and A. Copestake (eds.): Inheritance, Defaults, and the Lexicon. Cambridge, England: Cambridge University Press, pp. 13–37.
Carpenter, B. and G. Penn: 'ALE – The Attribute-Logic Engine'. http://www.cs.toronto.edu/~gpenn/ale.html.
Copestake, A.: 1993, 'The Representation of Lexical Semantic Information'. Ph.D. thesis, University of Sussex.
Copestake, A.: 2002, Implementing Typed Feature Structure Grammars, No. 110 in CSLI Lecture Notes. CSLI Publications.
Doherty, P.: 1991, 'NML3 – A Non-Monotonic Formalism with Explicit Defaults'. Ph.D. thesis, Linköping University.
Fellbaum, C. (ed.): 1998, WordNet: An Electronic Lexical Database. MIT Press. ISBN 026206197X.
Fouvry, F.: 2003, 'Robust Processing for Constraint-based Grammar Formalisms'. Ph.D. thesis, University of Essex.
Grover, C., C. Brew, S. Manandhar, and M. Moens: 1994, 'Priority Union and Generalization in Discourse Grammars'. In: Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics. Las Cruces, New Mexico, pp. 17–24.
Gurevych, I., R. Porzel, H.-P. Zorn, and R. Malaka: 2003, 'Semantic Coherence Scoring Using an Ontology'. In: Proceedings of the Human Language Technology Conference (HLT-NAACL 2003). Edmonton, Canada.
Imaichi, O. and Y. Matsumoto: 1995, 'Integration of Syntactic, Semantic and Contextual Information in Processing Grammatically Ill-Formed Inputs'. In: Proceedings of the 14th IJCAI. Montreal, Canada, pp. 1435–1440.
Kaplan, R. M.: 1987, 'Three seductions of computational psycholinguistics'. In: P. Whitelock, H. Somers, P. Bennett, R. Johnson, and M. M. Wood (eds.): Linguistic Theory and Computer Applications. London: Academic Press, pp. 149–188.
Karttunen, L.: 1986, 'D-PATR: A Development Environment for Unification-Based Grammars'. In: Proceedings of COLING '86. Bonn, Germany, pp. 25–29, Institut für angewandte Kommunikations- und Sprachforschung e.V. (IKS).
Karttunen, L.: 1998, 'The Proper Treatment of Optimality in Computational Phonology'. In: K. Oflazer and L. Karttunen (eds.): Finite State Methods in Natural Language Processing. Bilkent University, Ankara, Turkey, pp. 1–12.
Krieger, H.-U.: 1995, 'TDL – A Type Description Language for Constraint-Based Grammars. Foundations, Implementation, and Applications'. Ph.D. thesis, Universität des Saarlandes, Department of Computer Science.
Lascarides, A. and A. A. Copestake: 1999, 'Default Representation in Constraint-based Frameworks'. Computational Linguistics 25(1), 55–105.
Loeckelt, M., T. Becker, N. Pfleger, and J. Alexandersson: 2002, 'Making Sense of Partial'. In: Bos, Foster, and Matheson (eds.): Proceedings of the 6th Workshop on the Semantics and Pragmatics of Dialogue (EDILOG 2002). Edinburgh, pp. 101–107.
Moore, R. C.: 1995, Logic and Representation, No. 39 in CSLI Lecture Notes. Stanford, CA: CSLI Publications.
Moshier, M. A.: 1988, 'Extensions to Unification Grammars for the Description of Programming Languages'. Ph.D. thesis, University of Michigan, Ann Arbor.
Ninomiya, T., Y. Miyao, and J. Tsujii: 2002, 'Lenient Default Unification for Robust Processing within Unification Based Grammar Formalisms'. In: Proceedings of the 19th International Conference on Computational Linguistics, COLING 2002. Taipei, Taiwan, pp. 744–750.
Pfleger, N.: 2002, 'Discourse Processing for Multimodal Dialogues and its Application in SmartKom'. Diplomarbeit, Universität des Saarlandes.
Pfleger, N., J. Alexandersson, and T. Becker: 2002, 'Scoring Functions for Overlay and their Application in Discourse Processing'. In: KONVENS-02. Saarbrücken, Germany.
Pfleger, N., J. Alexandersson, and T. Becker: 2003, 'A Robust and Generic Discourse Model for Multimodal Dialogue'. In: Workshop Notes of the IJCAI-03 Workshop on "Knowledge and Reasoning in Practical Dialogue Systems". Acapulco, Mexico.
Pollard, C. and I. A. Sag: 1994, Head-Driven Phrase Structure Grammar. Chicago, Illinois: University of Chicago Press and CSLI Publications.
Porzel, R. and I. Gurevych: 2002, 'Towards Context-adaptive Utterance Interpretation'. In: Proceedings of the 3rd SIGdial Workshop on Discourse and Dialogue. Philadelphia, PA, pp. 154–161, Association for Computational Linguistics.
Reithinger, N., J. Alexandersson, T. Becker, A. Blocher, R. Engel, M. Löckelt, J. Müller, N. Pfleger, P. Poller, M. Streit, and V. Tschernomas: 2003, 'SmartKom – Adaptive and Flexible Multimodal Access to Multiple Applications'. In: Proceedings of ICMI 2003. Vancouver, B.C.
Romanelli, M., T. Becker, and J. Alexandersson: 2005, 'On Plurals and Overlay'. In: C. Gardent and B. Gaiffe (eds.): Proceedings of the 9th Workshop on the Semantics and Pragmatics of Dialogue (DIALOR). Nancy, France, pp. 101–108.
Russell, S. and P. Norvig: 1995, Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice Hall.
Wahllöf, N.: 1996, 'A Default Extension to Description Logics and its Applications'. Licentiate thesis, Linköping University.
Wahlster, W.: 2003, 'Towards Symmetric Multimodality: Fusion and Fission of Speech, Gesture, and Facial Expression'. In: A. Günther, R. Kruse, and B. Neumann (eds.): KI 2003: Advances in Artificial Intelligence. Proceedings of the 26th German Conference on Artificial Intelligence. Berlin, Heidelberg: Springer, pp. 1–18.
Zeevat, H.: 2005, 'Conditional Anaphor'. In: C. Gardent and B. Gaiffe (eds.): Proceedings of the 9th Workshop on the Semantics and Pragmatics of Dialogue (DIALOR). Nancy, France, pp. 109–114.
RICHARD CROUCH, ANETTE FRANK AND JOSEF VAN GENABITH
LINEAR LOGIC BASED TRANSFER AND STRUCTURAL MISALIGNMENT
1. Introduction
In machine translation, ambiguities in the source language often carry across to the target language. These include syntactic ambiguities, such as some prepositional phrase attachments (John saw the man with a telescope / Jean a vu l'homme avec un télescope), or semantic ambiguities, such as quantifier scope (Every student answered a question / Jeder Student beantwortete eine Frage). Rather than mechanically trying to pick a single intended interpretation of the source utterance, more accurate translation is likely if the full range of ambiguity can be preserved, leaving it to the human interpreter to resolve the ambiguity in the target. In cases like the above, a single sentence preserves all the ambiguities; in others, ambiguity preservation may necessitate generating a (hopefully small) range of alternatives.

Proposals for ambiguity preserving translation typically involve transferring an underspecified semantic representation of the source sentence to an underspecified representation of the target, and from it generating target sentences, see e.g. Alshawi et al. (1991) and Emele and Dorna (1998). A variant of this approach was proposed by van Genabith et al. (1998) (henceforth GFD), where transfer takes place on lexical meaning constructors of the kind used in glue semantics (Dalrymple et al., 1996). As GFD point out, these lexical meaning constructors provide a form of underspecified semantic representation, allowing one to determine when transfer preserves semantic ambiguity.

Transfer at the level of glue constructors also has other advantages. It allows for a highly lexicalized, reversible, and semi-automatable definition of transfer rules by comparing lexical entries from two mono-lingual lexicons. Since meaning constructors actually provide an encoding of the syntax-semantics interface, generation of target sentences is more direct than it would be from a purely semantic representation. Precisely because glue meaning constructors encode the syntax-semantics interface, transfer at this level faces problems of structural
misalignment, familiar from purely syntax-based approaches to transfer (Kaplan et al., 1989). One of the most notorious cases of this is (embedded) head switching, two treatments of which are discussed by GFD, neither of them fully satisfactory. This chapter provides a more satisfactory account of structural misalignment. As with GFD the source sentence is parsed, and a set of instantiated lexical meaning constructors obtained, to which transfer rules are applied. However, the result of application is not a set of target meaning constructors. Instead it is a set of transfer constructors; a linear logic derivation consumes these to produce a set of target meaning constructors, from which the target sentence is generated. The resource-sensitive nature of the transfer derivation allows problematic cases of structural misalignment to be dealt with locally and lexically. Since transfer derivations are structurally similar to glue derivations, techniques for efficient glue derivation, e.g. (Gupta and Lamping, 1998), can be exported directly to transfer derivations. The similarity of glue and transfer derivations also means that in certain cases transfer can be sensitive to scope distinctions.
2. Glue Semantics and Transfer
2.1. Glue Semantics

Glue semantics embodies a notion of 'interpretation as deduction' closely related to the 'parsing as deduction' paradigm of categorial grammar. A glue logic is used to deductively piece together the meanings of words and phrases in a (syntactically analysed) sentence, to assemble the meaning of the sentence as a whole. The meaning logic, used to represent the meanings of words and phrases, is quite distinct from the glue logic used to assemble those meanings. Following Dalrymple et al. (1999a), we use a minor extension of the implication-only fragment of propositional linear logic as the glue logic, and a 'vanilla' logic of generalised quantifiers as the meaning language. We also adopt their 'Curry-Howard' formulation of glue semantics, where meaning language expressions are treated as terms labelling glue logic formulas. This replaces the older notation of Dalrymple et al. (1996), with its uninterpreted meaning assignment predicate. The Curry-Howard formulation has the distinct advantages of (i) completely separating the glue and meaning logics, and (ii) removing the need to use higher-order unification in glue derivations.
Although glue semantics can be applied to a variety of grammatical formalisms (Asudeh and Crouch, 2001; Frank and van Genabith, 2001), we will employ Lexical Functional Grammar (Kaplan and Bresnan, 1982) as our syntactic base. We illustrate glue semantics by means of the simple example sentence (238).

(238) Hans cooks.

Assume the following two lexical entries:

(239) a. cooks  V    ↑ PRED = 'cook⟨↑ SUBJ⟩'
                     cook : (↑ SUBJ)σ −◦ ↑σ
      b. Hans   NP   ↑ PRED = 'Hans'
                     hans : ↑σ
The ↑ meta-variables refer to the nodes in f(unctional)-structure onto which the lexical items project in a given parse. The glue constructors, shown on the second line of each entry, refer to semantic (σ) projections of these f-structure nodes: these correspond to resources that consume and produce meanings. The constructor for "Hans" pairs the meaning term hans with the resource ↑σ. The constructor for the intransitive verb "cooks" pairs the one-place meaning predicate cook with the implication (↑ SUBJ)σ −◦ ↑σ. The implication says that the meaning of the verb's subject, (↑ SUBJ)σ, must be consumed in order to produce the meaning of the clause headed by the verb, ↑σ. Assume a grammar that, with this lexicon, derives the following f-structure for the example sentence, where f and g are arbitrary labels used to name the f-structure nodes. In doing so, the parse instantiates the ↑ meta-variables in the glue constructors to give the instantiated constructors shown alongside:
(240)  f: [ PRED  'cook⟨↑ SUBJ⟩'
            SUBJ  g: [ PRED  'Hans' ] ]

       cook : gσ −◦ fσ
       hans : gσ
Here, f σ and g σ correspond to f-structure nodes, but denote semantic resources. The instantiated meaning constructors form the premises to a glue derivation. The goal of a glue derivation is to consume all the lexically obtained premises to prove that there is a single semantic resource corresponding to the outermost f-structure node producing a meaning.
Ignoring the meaning terms for a moment, in (240) there are two lexical premises, gσ and gσ −◦ fσ, and we need to prove fσ. A simple derivation suffices:

(241)   gσ −◦ fσ     gσ
        ----------------  −◦ E
               fσ

The Curry-Howard isomorphism links the natural deduction rule of implication elimination (−◦ E, or modus ponens) with the functional application of the proof/meaning terms of the two premises. (Implication introduction gives rise to λ-abstraction.) The derivation above consequently automatically constructs the meaning term cook(hans) for the sentence, as follows:

(242)   cook : gσ −◦ fσ     hans : gσ
        -----------------------------  −◦ E
               cook(hans) : fσ
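To make the resource-consuming flavour of such derivations concrete, here is a small, self-contained Python sketch; it is our own illustration, not the glue machinery of Dalrymple et al., and it only handles implication elimination. Premises pair a meaning term with a glue formula; each elimination step removes the two premises it uses, adds the conclusion, and applies the meaning terms as it goes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Impl:                      # a linear implication  A -o B
    antecedent: object
    consequent: object

def derive(premises):
    """Exhaustively apply -o elimination to (meaning, formula) premises,
    consuming the two premises used and adding the conclusion."""
    premises = list(premises)
    done = False
    while not done:
        done = True
        for i, (m1, f1) in enumerate(premises):
            for j, (m2, f2) in enumerate(premises):
                if i != j and isinstance(f1, Impl) and f1.antecedent == f2:
                    conclusion = (f"{m1}({m2})", f1.consequent)
                    premises = [p for k, p in enumerate(premises) if k not in (i, j)]
                    premises.append(conclusion)
                    done = False
                    break
            if not done:
                break
    return premises

# The instantiated constructors of (240): cook : g_s -o f_s and hans : g_s.
print(derive([("cook", Impl("g_s", "f_s")), ("hans", "g_s")]))
# -> [('cook(hans)', 'f_s')]
```

Run on the instantiated constructors of (240), the only possible combination yields cook(hans) : fσ, mirroring (242).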
The derivation in (242) is, of course, a very simple illustrative example. However, in all more complex cases a propositional linear logic derivation builds the scaffolding on which meaning terms are combined by means of functional application or λ-abstraction, as dictated by the proof rules used. In many cases, though not in the example above, distinct glue derivations, constructing distinct meaning terms, can be obtained from a single set of glue premises. These multiple derivations account for non-syntactic ambiguities like quantifier scope, as we will see later.

2.2. Generation from Instantiated Constructors

Starting just with the instantiated meaning constructors and the lexicon, it is possible to reconstruct the f-structure of our example sentence. Using the meaning terms as indices into the lexicon, we can retrieve the entries for "Hans" and "cooks". Comparing the instantiated constructor (243a) with the uninstantiated constructor (243b)

(243) a. cook : gσ −◦ fσ
      b. cook : (↑ SUBJ)σ −◦ ↑σ

we can see that node g is the SUBJ of node f. Moreover, by looking at the feature equations in the entry for "cooks", namely

(244) ↑ PRED = 'cook⟨↑ SUBJ⟩'
we can determine what the PRED of f is. Likewise, by matching the instantiated constructor hans : gσ against the uninstantiated entry for "Hans", we can determine the PRED of f's subject (i.e. g). This gives us enough information to reconstruct the original f-structure. And from this, we generate the original sentence.

2.3. Direct Transfer of Glue Constructors

Direct transfer of source glue constructors to target glue constructors is proposed by van Genabith et al. (1998) (GFD). Suppose we have a German lexicon including the following two entries

(245) a. kocht  V    ↑ PRED = 'kochen⟨↑ SUBJ⟩'
                     kochen : (↑ SUBJ)σ −◦ ↑σ
      b. Hans   NP   ↑ PRED = 'Hans'
                     hans : ↑σ
and a grammar that derives f-structure (246a) for the sentence "Hans kocht" (Hans cooks), with the instantiated meaning constructors shown in (246b):

(246) a. f: [ PRED  'kochen⟨↑ SUBJ⟩'
              SUBJ  g: [ PRED  'Hans' ] ]

      b. kochen : gσ −◦ fσ
         hans : gσ
By the previous section, given the instantiated constructors and the German lexicon, we could generate the German f-structure and hence the German sentence. Starting from the previously mentioned instantiated source (English) constructors, cook : gσ −◦ fσ and hans : gσ, the following transfer rules (247)

(247) a. ∀G, F.  cook : G −◦ F  ⇔  kochen : G −◦ F
      b. ∀G.     hans : G       ⇔  hans : G
yield the required instantiated target (German) constructors, from which generation of the target sentence can proceed. GFD point out several advantages of this transfer scheme. First, the transfer rules are in many cases derivable from a simple comparison of paired lexical entries, and much of this can be done automatically. Second, neither the instantiated constructors nor the transfer rules make reference to f-structure attributes such as SUBJ or OBJ. Information about these attributes is only obtained by matching instantiated
constructors against mono-lingual lexical entries. GFD exploit this to deal with argument switching, as in Das Photo ist Hans mißlungen – Hans a raté la photo (Hans messed up/ruined the photo), where grammatical roles get switched. Third, in cases where the source and target constructors are isomorphic, the range of possible glue derivations is preserved, thus preserving semantic ambiguity. They illustrate this by showing how scope ambiguities can be preserved in transfer.

However, GFD face difficulties in dealing with the well-known problem of head switching (section 3). Their two suggested solutions are critiqued in Crouch et al. (2001), where a fallback to UDRS-based transfer is proposed. Section 4 shows how an extra layer of indirection can be used to rescue GFD's original vision of glue-based transfer.
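As a rough illustration of how little machinery the direct scheme needs, the following Python sketch (with invented names and data types; not GFD's implementation) treats an instantiated constructor as a meaning/glue pair and a transfer rule as a rewriting of the meaning term that leaves the instantiated glue formula untouched, as in (247):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constructor:
    meaning: str   # e.g. "cook"
    glue: str      # e.g. "g_s -o f_s", already instantiated by the source parse

# Transfer rules in the spirit of (247): rewrite the meaning term, keep the
# instantiated glue resources (the variables G, F) exactly as they are.
TRANSFER_RULES = {"cook": "kochen", "hans": "hans"}

def transfer(source):
    """Rewrite each source constructor into the corresponding target constructor."""
    return [Constructor(TRANSFER_RULES[c.meaning], c.glue) for c in source]

english = [Constructor("cook", "g_s -o f_s"), Constructor("hans", "g_s")]
print(transfer(english))
# -> [Constructor(meaning='kochen', glue='g_s -o f_s'), Constructor(meaning='hans', glue='g_s')]
```

Because the glue side is copied verbatim, any glue derivation available for the source constructors is also available for the target ones, which is exactly the ambiguity-preservation property discussed above.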
3. Head Switching
Head switching is exemplified by the English–German translation pair:

(248) Hans kocht gerne  ↔  Hans likes cooking
The German attitudinal adjunct gerne is translated in English as a control construction involving the verb like. Syntactically like is the head of the English sentence (the sentence is the maximal projection of like) whereas gerne is an adverbial subconstituent of the German sentence. These differences are manifest in the corresponding f-structures:
(249) a. (German)
         f1: [ pred  'kochen⟨f2⟩'
               subj  f2: [ pred 'hans' ]
               adjn  { f3: [ pred 'gerne' ] } ]

      b. (English)
         f3: [ pred  'like⟨f2, f1⟩'
               subj  f2: [ pred 'hans' ]
               xcomp f1: [ subj  f2
                           pred  'cook⟨f2⟩' ] ]
In transferring from the German to the English f-structure, the German adjunct f-structure f 3 , embedded within f 1 , becomes the outer English f-structure f 3 , within which f 1 is embedded. Transfer on f-structure representations thus involves a complex inside-out folding operation.
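The folding can be visualised by writing the two f-structures of (249) as nested Python dictionaries; this is only an informal illustration (plain dicts, invented keys, no shared structure), not an LFG implementation:

```python
# German "Hans kocht gerne": f1 is outermost, gerne is a mere adjunct (f3).
german_f1 = {
    "pred": "kochen<f2>",
    "subj": {"pred": "hans"},       # f2
    "adjn": [{"pred": "gerne"}],    # { f3 }
}

# English "Hans likes cooking": f3 is outermost, the old outer node f1 is embedded.
english_f3 = {
    "pred": "like<f2,f1>",
    "subj": {"pred": "hans"},       # f2 (token-identical with the xcomp's subj in a real f-structure)
    "xcomp": {                      # f1
        "subj": {"pred": "hans"},
        "pred": "cook<f2>",
    },
}

# Transfer on these representations has to turn the adjunct inside out: the node
# that carried "gerne" becomes the outermost node carrying "like", and everything
# else is re-embedded underneath it.
```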
Worse still is where a head switching case is embedded inside another structure, as in

(250) Ede vermutet daß Hans gerne kocht  ↔  Ede assumes that Hans likes cooking

(251) a. (German)
         f1: [ pred  'vermuten⟨f2, f3⟩'
               subj  f2: [ pred 'ede' ]
               comp  f3: [ pred  'kochen⟨f4⟩'
                           subj  f4: [ pred 'hans' ]
                           adjn  { f5: [ pred 'gerne' ] } ] ]

      b. (English)
         f1: [ pred  'assume⟨f2, f5⟩'
               subj  f2: [ pred 'ede' ]
               comp  f5: [ pred  'like⟨f4, f3⟩'
                           subj  f4: [ pred 'hans' ]
                           xcomp f3: [ subj  f4
                                       pred  'cook⟨f4⟩' ] ] ]
Here, it is not just the fact that the embedding order between f3 and f5 is reversed. The head switch has to be communicated to the outermost, unswitched, f-structure f1. The German complement argument of f1 (vermuten) is f3. But the English complement argument of f1 (assume) is f5. The embedded head switch thus changes the comp arguments in the embedding structure. In other words, head switching can have non-local effects on structures, such as f1, within which the switch takes place. This poses a challenge for transfer systems operating on the basis of local, purely lexical rules.

3.1. Head-Switched Meaning Constructors

The following are the instantiated meaning constructors for the German sentence in (250) (σ subscripts omitted to avoid clutter):

(252)  ede :                      f2
       vermuten :                 f2 −◦ (f3 −◦ f1)
       hans :                     f4
       kochen :                   f4 −◦ f3
       λP, x. gerne(x, P(x)) :    (f4 −◦ f3) −◦ (f4 −◦ f3)
These lead to the following glue derivation, in which only the final meaning term is shown:

(253)  1. f4 −◦ f3    from gerne and kochen, by −◦ E
       2. f3          from 1 and hans (f4), by −◦ E
       3. f3 −◦ f1    from vermuten and ede (f2), by −◦ E
       4. vermuten(ede, gerne(hans, kochen(hans))) : f1    from 3 and 2, by −◦ E
The instantiated English constructors for "Ede", "Hans" and "cooking" differ from the German constructors only in their meaning terms. But note the differences between the constructors for likes–gerne and assumes–vermutet:

(254) a. λP, x. like(x, P(x)) : (f4 −◦ f3) −◦ (f4 −◦ f5)
      b. λP, x. gerne(x, P(x)) : (f4 −◦ f3) −◦ (f4 −◦ f3)

(255) a. assume : f2 −◦ (f5 −◦ f1)
      b. vermuten : f2 −◦ (f3 −◦ f1)
The node f5 in the English constructors replaces the corresponding occurrences of f3 in the German constructors (the final f3 in (254b) and the f3 in (255b)). Since the gerne–likes translation clearly needs to introduce an extra level of structure, we might envisage a purely lexical transfer rule (256)
∀G, F. λP, x.gerne(x, P (x)) : (G −◦ F ) −◦ (G −◦ F ) ⇔ λP, x.like(x, P (x)) : (G −◦ F ) −◦ (G −◦ new)
where G and F range over matched structures, and new is a constant denoting the additional node introduced by the English control construction. A similar, purely lexical transfer rule for vermuten–assume would most naturally be (257)
∀G, H, F. vermuten : G −◦ (H −◦ F ) ⇔ assume : G −◦ (H −◦ F )
In the absence of embedded head switching, this transfer rule works well. But in the case where the complement of vermuten induces head switching in transfer, we need to replace the occurrence of H on the target (assume) side by the newly introduced head-switched node. How to do this solely on the basis of local, purely lexical transfer is described in the next section.
4. Linear Logic Transfer Constructors
To summarize the embedded head switching problem from the last section: translating "gerne" to "likes" involves wrapping an extra layer of structure, f5, around f3. The constructor that was originally expecting to consume f3, obtained by translating "vermutet" to "assumes", has to be told to consume f5 instead of f3. We would like this change to be communicated while only using local, purely lexical transfer rules.

Another way of describing what happens is that the gerne–likes transfer associates a new 'topmost' structure with f3. In the German sentence, f3 is its own topmost structure, which we represent by the assertion Top(f3, f3). The gerne–likes transfer updates this assertion with Top(f3, f5). The meaning constructor for "assumes" needs to consume the topmost structure associated with f3, whatever that structure happens to be. The association of a topmost structure with a node does not take place within glue meaning constructors; the association simply does not make any sense there. Instead, we will make these associations within linear logic based transfer constructors. In order to keep the transfer logic distinct from the glue logic, we will use −◦τ and ⊗τ to refer to the connectives of the transfer logic.

The basic transfer architecture is this. A set of lexically defined transfer rules map instantiated source meaning constructors onto transfer constructors. The transfer constructors are premises to a transfer derivation. By analogy to glue derivations, the goal of a transfer derivation is to prove a single assertion about the topmost structure associated with the outermost source f-structure node. A consequence of deriving this will be to produce a set of instantiated target meaning constructors, from which generation of the target sentence can proceed.

4.1. A Transfer Derivation

Section 4.3 describes the transfer rules that map source meaning constructors onto transfer constructors. In this section, we merely state what the resulting transfer constructors are for our German–English embedded head switching example, and show how the transfer derivation proceeds. Recall the German source meaning constructors (meaning terms slightly simplified, and constructors numbered for ease of reference):
1. ede :        f2
2. hans :       f4
3. kochen :     f4 −◦ f3
4. vermuten :   f2 −◦ (f3 −◦ f1)
5. gerne :      (f4 −◦ f3) −◦ (f4 −◦ f3)
From the source meaning constructors and the transfer mapping rules (described in the next section) we obtain the transfer constructors shown below. Each transfer constructor is a conjunction of two formulas: a transfer formula that consumes and produces topmost node assertions, and a glue formula giving a target meaning constructor.

1.         Top(f2, f2)  ⊗τ  ede : f2
2.         Top(f4, f4)  ⊗τ  hans : f4
3. ∀X.     [Top(f4, X) −◦τ Top(f3, f3)]  ⊗τ  cook : X −◦ f3
4. ∀X, Y.  [Top(f2, X) −◦τ (Top(f3, Y) −◦τ Top(f1, f1))]  ⊗τ  assume : X −◦ (Y −◦ f1)
5. ∀X, Y.  [(Top(f4, X) −◦τ Top(f3, Y)) −◦τ (Top(f4, X) −◦τ Top(f3, new))]  ⊗τ  like : (X −◦ Y) −◦ (X −◦ new)
Transfer constructor (1) says that f2 is its own topmost node, and produces the meaning constructor ede : f2. Transfer constructor (3) consumes an assertion about the topmost node of f4 to produce an assertion that f3 is its own topmost node. It also produces the meaning constructor cook : X −◦ f3, where X is whatever topmost node was associated with f4. Constructor (5) is the crucial one, but is best understood after looking at the transfer derivation. It is important to note how, in all cases, the transfer formula replicates exactly the structure of the target glue formula. This ensures that the transfer derivation will parallel the target glue derivation.

The transfer derivation from premises 1–5 proceeds as follows (meaning terms in glue constructors omitted). First combine premises (3) and (5):

(3)  Top(f4, X) −◦τ Top(f3, f3)  ⊗τ  X −◦ f3
(5)  (Top(f4, X) −◦τ Top(f3, Y)) −◦τ (Top(f4, X) −◦τ Top(f3, new))  ⊗τ  (X −◦ Y) −◦ (X −◦ new)
------------------------------------------------------------------------------------------------
     Top(f4, X) −◦τ Top(f3, new)  ⊗τ  X −◦ f3  ⊗τ  (X −◦ f3) −◦ (X −◦ new)
This associates with f 3 a new topmost node, new, provided that we can find the topmost node of f 4 . The value new is instantiated in one of
the meaning constructors. Premise (2) produces f4 as its own topmost node, allowing us to conclude:

     Top(f4, X) −◦τ Top(f3, new)  ⊗τ  X −◦ f3  ⊗τ  (X −◦ f3) −◦ (X −◦ new)
(2)  Top(f4, f4)  ⊗τ  f4
------------------------------------------------------------------------------------------------
     Top(f3, new)  ⊗τ  f4 −◦ f3  ⊗τ  (f4 −◦ f3) −◦ (f4 −◦ new)  ⊗τ  f4
That is, new is now asserted to be the topmost node of f3. This assertion combines with premise (4), corresponding to the word assumes (and premise 1). Assumes consumes whatever the topmost node of f3 is: in this case new rather than f3. Hence

     Top(f3, new)  ⊗τ  f4 −◦ f3  ⊗τ  (f4 −◦ f3) −◦ (f4 −◦ new)  ⊗τ  f4
(1)  Top(f2, f2)  ⊗τ  f2
(4)  Top(f2, X) −◦τ (Top(f3, Y) −◦τ Top(f1, f1))  ⊗τ  X −◦ (Y −◦ f1)
------------------------------------------------------------------------------------------------
     Top(f1, f1)  ⊗τ  f4 −◦ f3  ⊗τ  (f4 −◦ f3) −◦ (f4 −◦ new)  ⊗τ  f4  ⊗τ  f2  ⊗τ  f2 −◦ (new −◦ f1)
This consumes all the transfer constructors, results in a single assertion that f1 is its own topmost structure, and produces the desired set of target meaning constructors. No other derivation consuming all the premises and producing a single Top(f1, ?) assertion is possible. Note how the last step of the derivation instantiates the variable Y to the value new in the meaning constructor for assumes, communicating the changes brought about by the head switch in the first step of the derivation.

4.2. Multiple Head Switching

This approach generalizes straightforwardly to cases of multiple head switching, e.g.

(258) Hans kocht schließlich gerne.

where the adverb schließlich is analogous to gerne, and translates into the English control verb ends up. The sentence can translate either as Hans ends up liking cooking or as Hans likes ending up cooking. This ambiguity corresponds to an adverb scope ambiguity in German, and is reflected in transfer by the availability of two transfer derivations. From the German source constructors (meaning terms simplified)

1. hans :          f2
2. kochen :        f2 −◦ f1
3. gerne :         (f2 −◦ f1) −◦ (f2 −◦ f1)
4. schliesslich :  (f2 −◦ f1) −◦ (f2 −◦ f1)
it is evident that the two adverbials (3) and (4) are of the same type, and can permute in either order around the kochen constructor (2). Assuming similar transfer rules for schliesslich and gerne, the transfer constructors will be (meaning constructors omitted)

1  Top(f2, f2)
2  Top(f2, X) −◦τ Top(f1, f1)
3  (Top(f2, X) −◦τ Top(f1, Y)) −◦τ (Top(f2, X) −◦τ Top(f1, new1))
4  (Top(f2, X) −◦τ Top(f1, Y)) −◦τ (Top(f2, X) −◦τ Top(f1, new2))
It is likewise evident that the transfer constructors (3) and (4) can permute in either order around (2). If (3) and (2) are combined first (the ends up liking translation), the top of f1 is first updated to new1, and then by (4) to new2. If (4) and (2) are combined first (the likes ending up translation), the top of f1 is first updated to new2 and then to new1. This is a case where ambiguity preservation necessitates the generation of two target sentences. Because transfer derivations mirror ambiguities in the glue derivations, we succeed in detecting the two sentences required. In other words, where necessary, transfer can be made sensitive to scope ambiguity.

4.3. Deriving Transfer Rules

We now turn to the question of how to obtain the transfer rules that map source meaning constructors onto transfer constructors. These are obtained from aligned monolingual lexicons following the same lines as GFD. The hard part is to recognise the parallel semantic resources in the source and target constructors. In many cases this can be done automatically, either through recognition of parallel f-structure attributes in source and target, or by balancing up occurrences of distinct resources on either side. Hard cases, or cases where it is clear that there is no complete parallelism (as in head switching), can be passed to human rule writers. As an example, in comparing the entries (259)
a. vermuten : (↑ SUBJ) −◦ (↑ COMP) −◦ ↑
b. assume :   (↑ SUBJ) −◦ (↑ COMP) −◦ ↑
it is easy to identify ↑, (↑ SUBJ) and (↑ COMP) as parallel resources in source and target. The source side of the transfer rule is given by the source meaning constructor with variables in place of the parallel resources. The resulting transfer constructor is obtained by making two
copies of the target constructor, again with variables in place of parallel resources. We strip the meaning term off the first copy to form the basis of the transfer formula, giving as an intermediate stage (260)
∀F, G, H.  vermuten : G −◦ (H −◦ F)
           ⇒  [G −◦ (H −◦ F)]  ⊗τ  assume : G −◦ (H −◦ F)
We now identify the rightmost consequent variable in the transfer formula, in this case F. We replace this by the predication Top(F, F). All other variables are associated with a unique topmost variable, e.g. Top(G, X), and the variables in the transfer formula are replaced by these predications. Variables in the meaning constructors are replaced by their associated topmost variables. The associated topmost variables are universally quantified with scope over the whole transfer constructor. Thus we finally obtain the transfer rule:

(261) ∀F, G, H.  vermuten : G −◦ (H −◦ F)
      ⇒  ∀X, Y. [Top(G, X) −◦ (Top(H, Y) −◦ Top(F, F))]  ⊗τ  assume : X −◦ (Y −◦ F)

This way of constructing transfer rules ensures that transfer formulas exactly mirror target glue formulas. As a result, transfer derivations mirror glue derivations.

4.4. Quantifiers

One exception to this exact correspondence between transfer and glue formulas occurs in the case of quantifier meanings. A quantified pronoun like "everyone" illustrates the standard glue treatment of quantifiers, and is given a meaning constructor (262)
everyone : (↑ σ −◦ S) −◦ S
where S is a variable that can range over atomic semantic resources (the scope of the quantifier). The formula (↑ σ −◦ S) −◦ S is just a type raised version of the atomic formula ↑ σ . The transfer formula in the constructed rule is taken from the lower-type formula. Thus, for example (263)
∀G.  jeder : (G −◦ S) −◦ S  ⇒  Top(G, G)  ⊗τ  everyone : (G −◦ S) −◦ S
Assuming a similar transfer rule for “etwas” (something), the transfer constructors obtained from the sentence “Jeder sah etwas” (everyone saw something) would be
1  Top(g, g)                              ⊗τ  everyone : (g −◦ S) −◦ S
2  Top(h, h)                              ⊗τ  something : (h −◦ S) −◦ S
3  Top(g, X) −◦ (Top(h, Y) −◦ Top(f, f))  ⊗τ  see : X −◦ (Y −◦ f)
Here, there is just one transfer derivation, instantiating X to g and Y to h, despite the possibility of two distinct target glue derivations.

4.5. The Nature of Transfer Derivations

As previously noted, transfer constructors parallel target glue constructors, so that transfer derivations parallel target glue derivations. This has a number of consequences. First, the existence of a transfer derivation guarantees the existence of a target glue derivation; we can be sure that we translate only into semantically interpretable sentences. Second, techniques developed for efficient glue derivation (such as the skeleton-modifier approach of Gupta and Lamping (1998)) can be applied directly to transfer derivations; there is sharing of technology. Third, as observed in connection with multiple head-switching, different transfer derivations can lead to distinct sets of target constructors. This arises in cases where there is no one target sentence that captures the full range of meanings open to the source sentence; ambiguity preservation necessitates the generation of multiple target sentences.

Given the close connection between glue and transfer derivations, we can have some confidence that the correct ambiguities are being preserved. However, in some cases it is formally possible to have multiple transfer derivations all leading to the same set of target constructors. This parallels what often happens in glue derivations where, e.g., distinct ways of scoping existentially quantified NPs all lead to logically equivalent meanings. (Note, though, that the type-lowered transfer constructors for quantified NPs actually eliminate spurious transfer derivations arising from quantifier scope ambiguities.) Techniques for efficiently detecting and removing such equivalent glue derivations can fortunately also be applied to transfer derivations.
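As a rough illustration of the bookkeeping idea behind these derivations, the following Python sketch replays the example of section 4.1 in a fixed order: every node starts as its own topmost structure, the gerne–likes rule installs a fresh node as the new top of f3, and the vermuten–assumes rule consumes whatever the top of f3 is at that point. The staged order and all names are our own simplification; in the actual proposal the linear logic derivation itself determines how the constructors may combine.

```python
import itertools

new_nodes = (f"new{i}" for i in itertools.count(1))
top = {n: n for n in ("f1", "f2", "f3", "f4")}   # Top(n, n) for every source node

target = []                                      # target meaning constructors

# ede, hans: trivial transfer, no change to the Top table.
target += ["ede : f2", "hans : f4"]

# kochen -> cook: consumes Top(f4, X) and reasserts Top(f3, f3).
target.append(f"cook : {top['f4']} -o f3")

# gerne -> like: wraps a fresh node around f3 and records it as f3's new top.
new = next(new_nodes)
target.append(f"like : ({top['f4']} -o f3) -o ({top['f4']} -o {new})")
top["f3"] = new

# vermuten -> assume: consumes the *current* top of f3, i.e. the fresh node.
target.append(f"assume : {top['f2']} -o ({top['f3']} -o f1)")

for c in target:
    print(c)
# ede : f2
# hans : f4
# cook : f4 -o f3
# like : (f4 -o f3) -o (f4 -o new1)
# assume : f2 -o (new1 -o f1)
```

The printed constructors correspond to the target set produced by the derivation of section 4.1, with the head switch communicated to assume purely through the lookup of f3's current top.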
5. Conclusions
This chapter presented a resource-sensitive approach to transfer. A source sentence is parsed, and a set of instantiated lexical meaning constructors is obtained. Transfer rules rewrite the source meaning constructors to a set of transfer constructors. A linear logic derivation
consumes the transfer constructors to produce a set of instantiated target meaning constructors, from which a target sentence can be generated. The resource-sensitive nature of the transfer derivation allows problematic cases of structural misalignment to be dealt with smoothly and locally. In most cases, the transfer rules can be derived semi-automatically from aligned mono-lingual source and target lexicons. Cases where ambiguity preservation can only be achieved by multiple target translations are readily accommodated. Techniques developed for efficient linear logic derivations in the context of glue semantics apply directly to efficient transfer derivations.

Using linear logic for transfer has also been suggested by Fujinami (1999), but not applied to structural mismatch. The treatment of head-switching bears some relation to unpublished work of Martin Emele's, though it is not clear that his use of 'internal' and 'external' variables extends to cases of multiple head-switching. Although applied to transfer at the level of glue language meaning constructors, we would hope that our linear logic based transfer scheme could be extended to deal with structural mismatches at other levels of representation.

Finally, the resource-sensitive nature of the transfer derivations allows for the possibility that some target lexical glue constructors get consumed in transfer. This might apply, for example, in translating the two-word English expression "commit suicide" into the French verb "se suicider": the transfer constructor for commit – se suicider can be set up so as to consume the results of transferring the noun "suicide". Examples such as this also often lead to a specificity ordering over transfer rules. It is an interesting question whether this kind of specificity ordering can receive a direct and explicit encoding in a linear logic based transfer scheme.
References

Alshawi, H., Carter, D., Gambäck, B. and Rayner, M.: 1991. Translation by quasi logical form transfer. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL'91), Berkeley, California. pp. 161–168.
Asudeh, A. and Crouch, R.: 2001. Glue semantics for HPSG. In Proceedings of the 8th International Conference on Head-Driven Phrase Structure Grammar, Trondheim. pp. 1–19.
Crouch, R., Frank, A. and Genabith, J. van: 2001. Glue, Underspecification and Translation. In Bunt, H., Muskens, R. and Thijsse, E., editors, Computing Meaning, Volume 2, Kluwer, Dordrecht. pp. 165–184.
Dalrymple, M., Lamping, J., Pereira, F. C. N. and Saraswat, V.: 1996. Quantification, anaphora, and intensionality. Journal of Logic, Language and Information 6(3), 219–273. Reprinted in Dalrymple, M., editor, 1999. Semantics and Syntax in Lexical Functional Grammar. Cambridge, MA: MIT Press, pp. 39–90.
Dalrymple, M., Gupta, V., Lamping, J. and Saraswat, V.: 1999a. Relating resource-based semantics to categorial semantics. In Dalrymple, M., editor, 1999. Semantics and Syntax in Lexical Functional Grammar. Cambridge, MA: MIT Press, pp. 261–280.
Emele, M. and Dorna, M.: 1998. Ambiguity preserving machine translation using packed representations. In Proceedings of COLING-ACL'98, Montréal, Canada. pp. 365–371.
Frank, A. and Genabith, J. van: 2001. Linear logic based semantic construction for LTAG – and what it teaches us about LFG and LTAG. In Butt, M. and King, T. H., editors, Proceedings of the LFG'01 Conference, Stanford. Stanford: CSLI Publications, http://www-csli.stanford.edu/publications/. pp. 104–126.
Fujinami, T.: 1999. A Decidable Logic for Speech Translation. Dagstuhl Workshop on Linear Logic and its Applications.
Genabith, J. van, Frank, A. and Dorna, M.: 1998. Transfer constructors. In Butt, M. and King, T. H., editors, Proceedings of the LFG'98 Conference, Brisbane, Australia. Stanford: CSLI Publications, http://www-csli.stanford.edu/publications/. pp. 190–205.
Gupta, V. and Lamping, J.: 1998. Efficient Linear Logic Meaning Assembly. In Proceedings of COLING-ACL'98, Montréal, Québec, Canada. pp. 464–470.
Kaplan, R. and Bresnan, J.: 1982. Lexical functional grammar. In Bresnan, J., editor, The Mental Representation of Grammatical Relations. MIT Press, Cambridge, Mass., pp. 173–281.
Kaplan, R., Netter, K., Wedekind, J. and Zaenen, A.: 1989. Translation by Structural Correspondences. In Proceedings of the 4th Conference of the European Chapter of the Association for Computational Linguistics (EACL 1989), Manchester, UK. pp. 272–281.
INDEX
accommodation, 99, 101 alternative set semantics, 155–156 although, 146, 151, 152 ambiguity, 1, 2, 56 count/mass, 70 lexical, 56, 58 structural semantic, 58, 71–75 syntactic, 58, 75–78 anaphora, 88–121 bridging, 378–395 temporal, 98 anaphoric reference sloppiness in, 12–29 anchor, 377–395 intended, 380–389 potential, 380–394 annotation semantic, 3, 4 temporal, 301–344 annotation algorithm, 37–50 appropriateness, 437 assertion operation, 198 assimilation, 441, 443 auxiliary tree, 237, 241, 248, 250 background, 153 Boolean operators, 411 bridging anaphora, 378–395 definite description, 395 inference, 90, 121 relation, 102, 377–395 by-item analysis, 27 by-subject analysis, 27 causality, 267, 269, 271, 272 centering, 406 clobber, 429 coercion, 226 coherence
discourse, 88, 101–103, 105, 117 collective properties, 404 common sense principle of inertia, 265 communication problem and metalinguistic negation, 200 compactness, 68 compatability, degree of, 427 complexity issues, 446 complexity, exponential, 431 compositionality, 2, 236–254 computational efficiency, 425 concession, 146–149, 151, 152, 157– 159, 162, 166, 168, 169 concessive opposition, 146–149, 152, 164, 166, 167, 169 Constraint Language for Lambda Structures, 66, 68, 74, 78– 80 constraint logic programming, 274 context, 1–4, 6, 9, 173–191 change approach to meaning, 3 potential (CCP), 87, 89 model, 3 semantics, 6 update, 153, 154, 156, 160 contradiction contour, 199 conventional implicature, 151 Core Language Engine system, 68, 71 coreference direct, 375–382 indirect, 375–382 corpus analysis limits of, 24–26 cost-based unification, 432 credulous default unification, 440
Dale and Reiter algorithm, 398, 401, 404, 407, 411, 417 Dale and Reiter’s algorithm, 370 DAML-Time, 344 default reasoning, 424 default unification, 424, 429 default unification, lenient, 432 default unification, credulous, 430 default unification, ideal lenient, 432 default unification, lenient, 432 default unification, order independent persistent, 434 default unification, skeptical, 430 definite descriptions bridging, 369–395 denial of expectation, 146–153, 158, 159, 161, 162, 164, 166, 169 DenK system, 67, 68, 70 derivation tree, 237–239, 242, 244, 246, 248, 249 Description Logic, 391–394, 424 description theory of linguistic representation, 349 descriptive negation, 198 dialogue context, 126, 137, 140 directed graphs, 399 disconfirmations, 201 discourse context, 377–391 discourse connective, 146, 151 discourse context, 173, 424 discourse processing, 424 Discourse Representation Theory, 173– 179, 243, 257, 258, 264, 272 Segmented, 87–122 Underspecified, 64 discourse structure, 87–121 discourse update, 93, 115, 118–121 disjunction, 412 disjunctive description, 412 distributivity of modifiers, 74 of quantifiers, 73
document creation time, 315, 318, 319, 321, 325, 330, 336 domain-independent lexicon, 219 domain-independent ontology, 216, 217 domain-specific ontology, 216 Dominance Constraints, 55, 65, 71 Normal, 55, 64–69 dominance constraints, 64 duration, 316–317, 329 dynamic semantics, 87–99 Dynamic Syntax, 126 elementary tree, 236, 238, 241, 242, 245, 247, 250 ellipsis, 78, 80 ellipsis resolution, 125–142 event calculus, 265–299 event expressions, 301–344 event typology, 322–323 expressive adequacy, 66, 68 expressive completeness, 66, 68, 71– 73, 76 f-structure, 33–50 familiarity of antecedents, 370–395 feature value, 438 finite-state descriptions, 351–366 flexible composition, 233–254 fluent, 267–272, 275, 278–298 focus, 145–169 focus (Praguean), 406 formal metaconstant, 70 functional uncertainty expressions, 48 fundamental frequency, 203 generalization, 440 Generation of Referring Expressions, 397 glue logic, 89, 90, 112, 115–119, 121 Glue Semantics, 66, 457–471 GODIS system, 137, 140 gradable properties, 404 granularity, 57, 78
INDEX temporal, 316, 327, 332 graphs, 399 gre, 397 greatest lower bounds, 437 H∗ pitch accent, 196 Head Driven Phrase Structure Grammar, 126–140 head switching, 458, 462–465, 467– 468 Higher-Order Unification, 126, 139, 140 hole, 64 Hole Semantics, 55, 64–66, 69, 72, 75, 240 human-machine interaction, 200 imparfait, 257–299 imperfective paradox, 278, 347 incomplete input, 58, 80 Incremental Algorithm, 370, 394, 398, 399, 401, 404, 407, 411 Incremental Boolean algorithm, 419 incremental processing, 58, 80 incrementality, 411 information structure, 127, 145–158 informational distance, of values, 449 informational distance, 448 inheritance, 424–452 multiple, 425–452 integrity constraint, 280, 281, 283, 285, 286, 293 interpretation framework, 61, 68–69, 80 intonation contour, 196 inverse linking, 233–254 IS-sensitive context, 158 IS-sensitive context update, 156, 162, 166 ISO 8601 standard, 315, 328, 329 isomorphism subgraph, 400, 403 justified sloppiness, 21–24 testing of, 26–28
KOS framework, 126 L+H∗ pitch accent, 196 labelled directed graphs, 399 least upper bound, 437, 442 lenient composition, 430 lenient default unification, 432 Lexical-Functional Grammar, 33–50 Lexicalized Tree Adjoining Grammars, 233–254 lexicon specialization, 223, 224 logic with free quantifier variables, 350 logical optimisation, 418 logical simplification, 418 long-distance dependencies, 33, 39, 42, 48, 49 meaning, 1–3, 6, 7, 146, 257, 258, 276, 278, 457–459 and intonation, 6 and underspecification, 2, 3 computational, 284 constructor, 66, 457–471 context-change approach to, 3 customization of, 7, 213–229 discourse, 6, 156 linguistic, 154 postulates, 286, 347, 349 mereology reference to, 15–18 merging of discourse representations, 175– 179 meta-linguistic negation, 198 metaconstant formal, 63, 72, 74, 76 referential, 63, 79, 80 metalinguistic negation, 199 metavariable, 55, 64, 70, 74, 75, 77, 79, 80 minimal models, 272 Minimal Recursion Semantics, 64, 68, 69, 238
476
INDEX
modality fusion, 428 most general satisfier, 438 multiple inheritance, 425–452
question answering, 301–344 questions under discussion, 126–140 Quine-McCluskey algorithm, 418
Natural Language Generation, 397 negation, 407 negation as failure, 274 nevertheless principle, 195 Normal Dominance Constraints, 55, 64–66, 69, 72
radical reification, 67, 69, 73, 76 reference time, 318 reference to sets, 403 referential metaconstant, 69–71 relational descriptions, 413 relational type theory, 183 rheme, 153–169 rheme-alternative set, 150, 155, 156, 158–161, 163, 166–168 rhetorical relation, 88–121 robust parsing, 433
ontological promiscuity, 67 ontology, 213, 215–219, 226, 424 domain-specific, 213, 216 generic, 216, 217, 223 optimality theory, 208 overlay, 424–425, 427–452 generalization of, 451 parallelism, 78, 125–139 parser customization, 221, 229 partial order, 437 partitionings, 414 partitions, 414 pass´e simple, 257–299 Penn treebank, 33–50 Phliqa system, 63 plans reference to, 18–21 PLUS system, 68 predicate lifting, 183 presupposition, 90–121, 151, 156, 159, 161, 162, 164, 166–168 presupposition denial, 197 presupposition failure, 198 priority union, 429 ProFIT system, 133 pronoun resolution, 11–29 prosodic variation, 195 proto-f-structure, 33, 37, 39, 41, 48 quantifier scope, 12, 35, 36, 45, 233– 254 Quasi-Logical Form, 33–50, 62, 66, 68, 71
salience, 406 satellite sets, 407 score, 447 score, cost, 432 scoring, 447 scoring function, 424 Segmented Discourse Representation Theory, 87–122 selectional restrictions, 215, 216 semantic arguments, 217 features, 217 grammars, 214 semantic annotation, 73 short answers, 125, 142 sign, 184 sloppiness justified, 21 testing of, 26–28 sluicing, 125–142 spatial areas reference to, 23 SPICOS system, 72 subgraph isomorphism, 400 subsumption, 439 supervaluation, 63 syntactic parsers, 215 temporal
INDEX annotation, 301–344 expressions, 301–344 granularity, 316, 327, 332 index, 305, 309, 310 information, 301–344 ordering, 303, 309, 310, 341 temporal semantics, 257–299, 347– 366 Tendum system, 71, 72 theme, 153–169 theme-alternative set, 155, 156, 158, 160, 161, 163, 167 time stamping, 303, 338, 341 TimeBANK, 344 TimeML, 326 topic (Praguean), 406 transfer glue-based , 458–471 transfer constructor, 458, 465–471 treebank, 33–50 TRIPS system, 213, 214, 218, 219 TUNA project, 420 type polymorphism, 180 typed feature structure, 426, 438 defeasable, 429 strict, 429 underspecification, 2–4, 11–29, 35, 55–82, 90, 112, 115, 120, 121, 233–254, 349, 350, 353, 360
and meaning, 2, 3 lexical, 11 technique, 62 in situ representation, 62 holes and pluggings, 64 labels and constraints, 64 metaconstants, 63, 64 metavariables, 63, 64 radical reification, 67, 69, 73, 76 Underspecified Discourse Representation Theory, 64 Underspecified Logical Form, 67, 68, 71 underspecified semantic representation, 55–62, 90 unification, cost-based, 432 uniqueness of antecedent, 370–394 unknown words, 57, 58, 79 utterance time, 315, 318 vagueness, 56–58, 78, 404 referential, 57, 78 relational, 57, 78 Vendler class, 322, 358, 359, 361, 362 Verbmobil system, 72 yes/no question, 201