JOURNAL OF SEMANTICS AN INTERNATIONAL JOURNAL FOR THE INTERDISCIPLINARY STUDY OF THE SEMANTICS OF NATURAL LANGUAGE
JOURNAL OF SEMANTICS Volume 17 Number 1
Special Issue on Dialogue (Part I) Guest Editors: Henk Zeevat and Robert van Rooy
CONTENTS
HENK ZEEVAT AND ROBERT VAN ROOY
Introduction
DAVID R. TRAUM
Twenty Questions on Dialogue Act Taxonomies
NICHOLAS ASHER
Truth Conditional Discourse Semantics for Parentheticals
MIRIAM ECKERT AND MICHAEL STRUBE
Dialogue Acts, Synchronizing Units, and Anaphora Resolution
(Part II to follow in vol. 17.2)
Please visit the journal's world wide web site at http://jos.oupjournals.org and the editorial web site at http://journal-of-semantics.org
Journal of Semantics 17: 7-30
© Oxford University Press 2000
20 Questions on Dialogue Act Taxonomies
DAVID R. TRAUM
University of Southern California
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
Abstract
There is currently a broad interest in dialogue acts and dialogue act taxonomies, and new uses, taxonomies, and standardization efforts continue to be proposed. This paper presents a discussion of issues that must be addressed in order to facilitate the shared understanding and use of taxonomies. The discussion is framed in terms of 20 questions, the answers to which will help make the meanings of taxonomy elements more clear to different communities of users.
1 INTRODUCTION
When engaging in a study related to dialogue pragmatics, a researcher is confronted with a bewildering range of theories and taxonomies of dialogue acts¹ to choose from. Moreover, specific deficits in any given theory often lead researchers to continue to develop new taxonomies to suit their particular purposes. To some degree, this is to be expected; dialogue act taxonomies can be seen as a kind of language for describing communicative events, and new formal languages (e.g. programming languages like Java) and (at a slower pace) natural languages continue to be created. On the other hand, in both natural and artificial languages, the use of similar signs for different concepts can cause confusion and misunderstanding, often with serious undesirable consequences (e.g. in programming languages, the use of = as an assignment rather than equality operator in a boolean context; or the firing of an American city official for using the word niggardly (of independent Scandinavian origin) because it sounded too similar to an offensive racial epithet euphemistically referred to as 'the N word').² Similar confusions often occur when one researcher tries to interpret the dialogue act taxonomy of another. For example, various conditions are used to characterize a dialogue act labeled as inform, including those listed in (1).
(1) a. declarative mood was used
b. propositional information was expressed
c. new information was expressed
d. the addressee came to believe what was expressed
e. what was expressed is actually believed by the speaker
f. what was expressed is actually true
When one encounters such a label, it is often not clear which subset of the constraints in (1) (or perhaps none of them, when an entirely different formulation is used to define an inform) is meant by the labeler to characterize the labeled utterance. This kind of confusion has led some (e.g. External Interfaces Working Group 1993; Discourse Resource Initiative 1997; FIPA 1997) to propose standard theories that could be well defined and understood and used across groups, while others (e.g. Allwood 1977; Cohen & Levesque 1990) prefer to treat dialogue act (i.e. illocutionary force) identification as of only secondary importance, as a derived concept within a more general theory of rational interaction, using other concepts as primitives.
It is hard to dispute the claim that dialogue acts are a useful concept, given the wide variety of uses to which they are put. Some of these uses include:³ representations of the pragmatic meaning of utterances in dialogue theories (Vanderveken 1991; Bunt 1996; Poesio & Traum 1997, 1998), building blocks for grammars of dialogue (Winograd & Flores 1986; Bilange 1991), labels for corpus annotation (Carletta et al. 1997; Alexandersson et al. 1998), agent communication languages (External Interfaces Working Group 1993; Sidner 1994; FIPA 1997; Singh 1998), object of analysis in dialogue systems (Allen et al. 1996; Bretier & Sadek 1996), and element of a logical theory of rational interaction (Sadek 1991). Despite this popularity of the concept, there are still a number of issues that present significant challenges for creating a taxonomy of dialogue acts that can be understood and used by researchers other than the taxonomy designers. Here I will briefly raise some of the issues that have often caused confusion when interpreting one taxonomy of dialogue acts from within the viewpoint of another. These issues must be addressed in order to have a clearer idea of what one means by saying that a dialogue act occurred, whether the dialogue act taxonomy is meant for labeling a naturally occurring corpus, as part of a formal theory of action, or as a system-internal representation of the dialogue. Although there are many such issues, I focus here on 20, formulated as questions, in homage to the 'dialogue game' named in the title. For convenience, these questions are grouped into sections of related questions.
¹ By the term dialogue acts, I don't mean to limit discussion to those theories and taxonomies that explicitly use this term. Other terms used for the same general concept include locutionary, illocutionary, and perlocutionary acts (Austin 1962), speech acts (Searle 1969), communicative acts (Allwood 1976; Sadek 1991; Airenti et al. 1993), conversation acts (Traum & Hinkelman 1992), conversational moves (Carletta et al. 1997), and dialogue moves (Cooper et al. 1999). My remarks here are intended to apply to the general phenomenon described by this range of terms. Dialogue acts can perhaps be seen as most generic, at least in the context of a forum on dialogue.
² Washington DC Public Advocate David Howard, in February 1999.
³ Here and elsewhere in the paper, examples are meant to be representative rather than exhaustive; there is a large amount of work in some of these areas.
2 DEFINING DIALOGUE ACTS
Question 1: Which is most important: fit to intuitions or formal rigor?
This question has implications beyond just dialogue act definitions, and applies to any attempt to provide a formal theory of commonsense notions. Very often it is difficult to precisely formulate complex intuitions using available formal techniques. The question then arises as to which goal to sacrifice for the time being. Should one formalize a simpler notion that does not have all of the properties of the intuitive concept (e.g. normal modal logics are very popular as models of belief (cf. Hintikka 1962), yet they have the property of logical omniscience, which is certainly undesirable as a model of human belief)? Or should one sacrifice some desirable formal properties, such as a model-theoretic semantics, necessary and sufficient conditions for categories, or soundness and completeness of an inference system? The answer will depend on the purposes to which the concept is to be put: if the primary goal is to discover and prove properties of the system, formal properties are not easily sacrificed. On the other hand, if the goals are more empirically motivated, a well-defined concept within a formal system may not be close enough to the underlying concept to be useful, but an underspecified concept without some of these formal properties may suffice for the task at hand (corpus labeling or use in a computer program). There should also be a place for intermediate points that make some sacrifices at each side, while striving for maximum utility for a given purpose. In particular, with respect to dialogue acts, it can be relatively easy to state precise definitional conditions of occurrence within a formal logic of action, but a problem may arise when these conditions diverge from a more intuitive (and intuitively useful) notion of action that empirical analysts and dialogue system designers would actually like to use.
Question 2: Is the definition of a dialogue act an issue of Lexical Semantics or Ontology of Action?
There are different tasks one might be attempting when defining the meaning of a dialogue act. Is it to provide an account of when someone might be justified in describing an occurrence using a sentence headed with a particular verb (e.g. inform, request), or to provide a technical vocabulary to compactly describe various types of occurrences in convenient ways for use in analyzing aspects of interaction? As Allwood (1977) warns, these endeavors should be clearly separated, even if one might want to use similar categories to describe each (as is done in Allwood 1980), or maintain a position of identity of semantic and conceptual structure (Jackendoff 1983). Intuitions, or annotation by naive coders without instructions to the contrary, may tend to focus on the former enterprise, which may have undesirable consequences depending on how the taxonomy is to be used. The key question is how much weight, if any, should be given to linguistic intuitions about when it is true or appropriate to use a particular sentence containing a speech act verb to describe an occurrence. For Lexical Semantics, these linguistic intuitions (or similar examinations of actual usage) are paramount (barring issues of polysemy). On the other hand, the intuitions might not be useful when devising an ontology of action; such an ontology might, for independently motivated reasons, diverge from the classifications made in natural languages.
Question 3: Under what conditions may an action be said to have occurred?
There are a number of different criteria that are being used to decide whether or not an action occurs in a given situation. Allwood (1980) uses four criteria, shown in (2), each of which can be a sufficient condition for ascribing that an action has occurred. On the other hand, none of these conditions is necessary for action ascription.
(2) a. intention of performer
b. form of the behavior (e.g. linguistic form)
c. achieved result
d. context in which the behavior occurs
While it is certainly coherent to define actions in terms of meeting minimal conditions along any of these dimensions, it is less clear that this is the most useful way of capturing the generalizations over acts that consumers of a dialogue act taxonomy would like to express. For example, one may be interested strictly in the result, intention, or context, or perhaps in the relationship between form and result. In the most central case, all four kinds of conditions will hold; however, one must know what to do when only some but not others hold. One should especially take care to avoid defining dialogue acts according to, say, a certain set of results
holding, and then identify instances of these acts occurring using only one of the other criteria, as this would lead to an unjustified claim of the results holding. Using different criteria (e.g. results only vs. intention only) can also lead to misunderstandings between theorists (or coders) as to whether a particular act has been performed, and whether the performance of an act implies a particular result holding. As an example, consider a characterization of an inform act, given in (3).
(3) a. intention of performer: that receiver comes to believe proposition p.
b. form of the behavior: speaker utters a declarative sentence with propositional content p.
c. achieved result: speaker and hearer mutually believe p.
d. context in which the behavior occurs: speaker and hearer in contact, speaker believes p, hearer does not believe p.
One could, of course, quibble with any of these characterizations in terms of being too strong or too weak to capture the meaning of inform, or perhaps decide that they are more appropriate for some other act (e.g. statement, assertion). For example, one might produce an utterance of the same form, when not all of the context conditions hold, or in which the speaker has a different intention. Which kinds of conditions are used, and whether they are necessary, will also depend on the task being attempted. Compare, for example, the task of discovering lexical semantics with the task of constructing an action ontology, as discussed in the previous question. Also, it makes a difference whether this ascription is made from the point of view of an online dialogue participant (such as a dialogue system) or an external observer, e.g. an offline annotator of a pre-collected dialogue corpus (see also question 6).
Question 4: What is the role of speaker intention?
Intention is usually given a somewhat privileged position with respect to determining what dialogue acts (or actions, in general) have been performed, viz. the first criterion in (2). Some would define dialogue acts on the basis of the intention behind them, while others would equate illocutionary acts with recognition of this intention (based on the notion of meaning in Grice (1957)). A problem with this approach is that definitively interpreting the intention of the speaker requires mind-reading on the part of the hearer. Another problem is that some dialogue acts (like other acts) can, at times, be performed unintentionally or with an only ex post facto commitment. Finally, as with other acts, one may perform them with
various goals in mind—it may be unnecessary to discover the actual intention in order to recognize an act or its effects in context. For example, a declarative utterance might be performed with the intention to cause the hearer to adopt a belief in the stated proposition, p, as in (3a). However, the same utterance might very well be performed if the speaker intends instead to cause the hearer to believe that the speaker believes p. Or intends to cause the hearer to believe that the speaker wants the hearer to believe p. Or the conjunction of some set of these (or other similar conditions). For these reasons, some prefer to keep distinct the issues of intention recognition and dialogue act attribution, even though they are related.
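The divergence between ascription criteria can be made concrete with a small sketch. The Python below is only an illustration, not part of any taxonomy discussed here: it encodes the four clauses of the inform characterization in (3) as boolean tests and lets each hypothetical coder pick which of them to require. The class `Utterance`, its field names, and the string labels for intended effects are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """One utterance event, described along the four dimensions of (2).

    Every field name here is hypothetical and chosen for illustration.
    """
    mood: str                        # linguistic form, e.g. "declarative"
    intended_effects: frozenset      # what the speaker intends to bring about
    mutually_believed_after: bool    # achieved result: do both now believe p?
    in_contact: bool                 # context
    speaker_believes_p: bool         # context
    hearer_believed_p_before: bool   # context


def inform_criteria(u: Utterance) -> dict:
    """Evaluate each clause of the inform characterization in (3)."""
    return {
        "intention": "hearer believes p" in u.intended_effects,      # (3a)
        "form": u.mood == "declarative",                             # (3b)
        "result": u.mutually_believed_after,                         # (3c)
        "context": (u.in_contact and u.speaker_believes_p
                    and not u.hearer_believed_p_before),             # (3d)
    }


def is_inform(u: Utterance, required: set) -> bool:
    """Ascribe an inform iff every criterion this coder requires is met."""
    met = inform_criteria(u)
    return all(met[name] for name in required)


# A declarative utterance whose speaker only intends the hearer to believe
# that the speaker believes p, and which the hearer does not come to accept.
u = Utterance(
    mood="declarative",
    intended_effects=frozenset({"hearer believes speaker believes p"}),
    mutually_believed_after=False,
    in_contact=True,
    speaker_believes_p=True,
    hearer_believed_p_before=False,
)

for coder, required in [("form only", {"form"}),
                        ("intention only", {"intention"}),
                        ("result only", {"result"}),
                        ("form and context", {"form", "context"})]:
    print(f"{coder}: inform ascribed = {is_inform(u, required)}")
```

In this sketch a coder who looks only at form, or at form and context, labels the utterance an inform, while coders who require the intention in (3a) or the result in (3c) do not. This is the kind of disagreement that arises when the criteria behind a label are left implicit.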
Question 5: What is the role of addressee uptake?
Regardless of speaker intention, many dialogue act definitions require, for even the most limited notion of success, some changes to the addressee based on understanding of the utterance in a particular way. Noticing whether the addressee has actually understood in a particular way can often require just as much mind-reading on the part of the speaker as intention recognition requires on the part of the hearer. Later utterances in a dialogue often provide more clues, and thus some (e.g. Clark & Schaefer 1989; Traum & Hinkelman 1992) require a grounding process (in the latter case by performing other kinds of dialogue acts) before considering some dialogue acts, such as inform or request, to have been successfully performed. This involves the giving of positive and negative feedback (Allwood et al. 1992) about how utterances were perceived and understood. A negotiation of meaning can also occur (McRoy & Hirst 1995), severing completely the link between the dialogue effects and original speaker intentions or addressee uptake.
Question 6: What point of view should be taken regarding performance of acts?
There are several points of view that may be taken when regarding the performance of dialogue acts. Relating to the previous two questions are the speaker's and hearer's point of view, respectively. Also, there is a negotiated collaborative point of view of the speaker-addressee team, which may differ from the private views of each of the participants. There is also a normative-conventional point of view, which can make reference to social institutions beyond just the speaker-hearer pair, in order to determine what acts have been performed (e.g. whether a speaker has committed herself). There is also the issue of time with respect to coding or ascription of acts: is it an on-line decision made at the time of performance (or using only information available at the time of performance), or is one allowed to consider subsequent utterances/action, as well, before deciding what happened? Point of view is relatively straightforward from the internal perspective of a dialogue system (although a system might still need to reason about the interlocutor's point of view, including information discovered at subsequent time points, in diagnosing misunderstanding (McRoy & Hirst 1995) or constructing a negotiated view). It is, however, far from clear what point of view should be taken by coders (and how they should estimate the speaker's or addressee's point of view without mind-reading). Likewise, in defining the acts or giving them a logical semantics, it may be necessary to take point of view into account. As an example, consider the case of a feedback reply of a word or phrase following a declarative utterance by the other speaker. There are several different grounding functions that could be performed by this second utterance, such as acknowledgment, repair, request for repair, or request for confirmation (Traum & Allen 1992). The latter two could perhaps be distinguished from the former by prosody: questioning intonation could indicate lack of certainty as well as desire for further feedback, while declarative intonation could indicate one of the former functions. One could distinguish acknowledgment from repair by deciding whether the second utterance repeats (or paraphrases) the information in the former (acknowledgment), or changes some part of it (repair). However, this decision requires a point of view, indicating who believes it to be the same or different. Especially with current technology speech recognition systems, there is a significant likelihood that a system may 'repeat' what it thought it heard, while producing something different from what was actually said. It is also possible (though perhaps less likely) that a system intends to correct but ends up repeating what was really said. The same issues come up (though with less frequency) in human-human conversation.
3 DIALOGUE ACT COMPONENTS
Question 7: How are actions used in a logic?
In formal theories, actions are usually seen as transitions from states to states (or worlds to worlds), while dialogue acts are seen as special cases of actions (though see question 11). Theories of action proposed by Artificial Intelligence (AI) researchers generally associate several sets with actions: a
set of effects (constraints on the resulting state), a set of pre-conditions (constraints on the initial state), and decompositions (subactions that, performed together, constitute the action).⁴ In terms of the categories given in (2): the effects correspond to the achieved result, aspects of context and intention may be related to the preconditions, and the form of the behavior is characterized by the decompositions. The AI theories of action generally include requirements on each of these aspects, so that the axioms in (4) hold (where X is an action type, Pre and Effects are the preconditions and effects of this action type, and prev, now and next are 'consecutive' time points).⁵ (4a) involves reasoning from felicitous performance to effects, (4b) involves reasoning from performance to preconditions having held, and (4c) involves reasoning from performance of subactions to performance of the main action. In addition, something like the schema in (5) is used (although usually only in an abductive or circumscriptive sense, rather than as a sound axiom describing all circumstances), for reasoning from subaction to intention ascription (plan recognition). These axioms can also be used to help determine inconsistency of a (default) interpretation, which may then be a cue of an indirect speech act, or a misunderstanding.
(4) a. $\mathit{Pre}(X, \mathit{now}) \land \mathit{Try}(X, \mathit{now}) \rightarrow \mathit{Effects}(X, \mathit{next})$
b. $\mathit{Done}(X, \mathit{now}) \rightarrow \mathit{Pre}(X, \mathit{prev})$
c. $\mathit{Pre}(Y, \mathit{prev}) \land \mathit{decomp}(Y, \{X_1, \ldots, X_n\}) \land \forall X_i : 1 \le i \le n : \mathit{Done}(X_i, \mathit{now}) \rightarrow \mathit{Done}(Y, \mathit{now})$
(5) $\mathit{Do}(A, X) \land \mathit{decomp}(Y, \{\ldots, X, \ldots\}) \rightarrow \mathit{Intend}(A, Y)$
⁴ Pollack (1990) focuses on enabling conditions rather than pre-conditions, and generation conditions rather than decomposition (following Goldman 1970).
⁵ Details of axioms of this sort obviously vary quite a bit depending on the syntax and semantics of the logic used, e.g. whether Done means 'happened in the immediately prior state transition' or some looser sense of happened recently.
Question 8: What is context?
Given the general framework for actions in the discussion of the previous question, a large question remains as to which aspects of the situation are relevant as potential conditions for defining types of dialogue act performance, and which aspects are (directly) affected. Some logical models might allow the truth value of any representable proposition to be a possible condition or effect. This must, of course, be filtered through the lens of 'point of view' (see question 6). Generally there are three more special sorts of information used for conditions and effects of dialogue acts. First, there is a notion of dialogue state, as encoded as state in a dialogue grammar
(Winograd & Flores 1986; Traum & Allen 1992; Lewin 1998), or some other structural representation of context (e.g. Ginzburg 1998). Using the dialogue grammar approach, certain acts may be defined with a precondition that the dialogue be in a particular state in the transition network, while the effects will include a transition to a new state in the network. Or, using a different notion of information state, one could stipulate a pre-condition that it is only possible to perform an answer if there is a relevant question under discussion (see also question 12). The second kind of information, the most popular in the planning approach, is in terms of mental states (e.g. belief, intention) of the speaker and addressee(s) (Cohen & Perrault 1979; Allen & Perrault 1980). For instance, pre-conditions of an inform act may include the latter two conditions in (3d). Effects will include newly adopted beliefs and intentions. A third kind of information is in terms of the social obligations and commitments undertaken by the dialogue participants (Allwood 1994; Poesio & Traum 1998; Traum 1999; Singh 1998). Example effects include commitments to stated propositions, and commitments to do promised actions. Pre-conditions of this sort are more rare, though they could be formulated for dialogue acts such as excuses, which presuppose a sort of obligation to act (which has not been or will not be performed). Most approaches will actually combine two or three of these kinds of conditions and effects. There may also be other types of effects, not easily classifiable into these categories.
Question 9: What kind of conditions are most appropriate?
The notion of pre-condition is often criticized as meaning too many different things in relation to planning and reasoning about action (e.g. Pollack 1990). First of all, there is the general issue of enabling conditions vs. applicability constraints: the former are those that can be planned to achieve, while the latter describe conditions in which this kind of action should be considered. If the enabling conditions do not hold, a more complex plan is formed to achieve the conditions so that the action under consideration may be attempted; if the applicability constraints do not hold, the action will be dropped from consideration, and some other action (or set of actions) will be considered instead. There is also the issue of whether these conditions are necessary or sufficient for (successful) performance of the action. Many convenient dialogue acts actually have few if any actual preconditions, in the sense that the action cannot occur if the conditions are not met. Conditions are often formed in terms of either normal conditions or in terms of what is required for felicitous performance of the action (Searle 1969). Formulating conditions in this way does give greater flexibility, but this flexibility comes at the price of having to determine whether an action is felicitous and also needing to characterize non-felicitous performance. The kinds of conditions to represent in a theory will also depend on the type of cognitive tasks to be performed using the acts: dialogue act planning and performance or dialogue act recognition. For the former (e.g. using axiom
Question 10: How should an unsuccessful act be distinguished from a failed attempt to perform an act? This question is related to the difference between success and satisfaction of a speech act (Vanderveken 1990). The former has to do with whether the act was actually fully performed, the latter with whether the propositional content is (or becomes) true. If one uses a social commitment approach, then one may say the act has been performed if the commitments are established, and (fully) successful if its intended perlocutionary effects (Sadek 1991) or evocative intentions (Allwood 1995) are achieved. As an example, consider a request by agent A to agent B, for B to do some action x, schematically: Request(A,B,Do(B, x)). One must now determine which of the conditions in (6) to associate with an attempt vs. success vs. satisfaction. Condition (6f) seems sufficient to describe an attempted request, while (6a) is necessary for a fully satisfied request. Success criteria are more difficult to agree upon, however (see also question (8)). According to the mental states approach, successful performance of the request might be (6b), or (6e) (in the latter case, requiring an additional assumption of cooperativity to lead to (6b) and then (6a) (Cohen & Perrault 1979)). The social commitments approach would favor (6c) (Allwood 1994), or (6d) (Traum 1994) (in the latter theory, (6c) would come about only as a result of acceptance of the request). (6) a. Do(B,x) b. Intend(B,Do(B,x)) c. Obliged(B,Do(B,x)) d. Obliged(B,Address{B.Acti : Request(A,B,Do(B,x)))) e. Believe(B,Want(A,Do(B,x))) f. Try(A,Request(A,B,Do(B,x)))
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
(4a)), one might care more about sufficient rather than necessary conditions. For the latter (e.g. using axiom (4b)), however, one might be more interested in necessary conditions (to use this as an axiom rather than as a default rule).
Another issue concerns the kinds of actions involved in leading to success of the action (and the associated effects). Is a single utterance (in the appropriate circumstances) enough, or is a grounding process (Clark & Schaefer 1989; Traum 1994) needed? It is certainly most likely that an addressee will not perform a requested act (6a) (or intend to perform it (6b)) if she does not hear or understand the request. Likewise, it is debatable whether one would even have the obligation to perform the act (6c) under such conditions. Should one then say that the act was not performed (equating success with achieved result, as in (3c)), or that the results do not necessarily hold when an act has been performed?
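The dependence of 'success' on the chosen theory, discussed under question 10, can be made concrete with a small sketch. It is only an illustration under stated assumptions: the class `RequestOutcome`, the table `SUCCESS_CRITERIA` and the function `classify` are hypothetical names, each field simply records whether one of the conditions (6a)-(6f) holds, and treating 'success plus (6a)' as satisfaction is an assumption (the text only says that (6a) is necessary for satisfaction).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestOutcome:
    """Which of the conditions (6a)-(6f) hold after A's utterance (hypothetical encoding)."""
    tried: bool = False               # (6f) Try(A,Request(A,B,Do(B,x)))
    believes_want: bool = False       # (6e) Believe(B,Want(A,Do(B,x)))
    obliged_to_address: bool = False  # (6d) B is obliged to address the request
    obliged_to_do: bool = False       # (6c) Obliged(B,Do(B,x))
    intends: bool = False             # (6b) Intend(B,Do(B,x))
    done: bool = False                # (6a) Do(B,x)


# Candidate success criteria named in the text; which one applies is theory-dependent.
SUCCESS_CRITERIA = {
    "6b": lambda o: o.intends,             # mental states approach
    "6e": lambda o: o.believes_want,       # mental states approach
    "6c": lambda o: o.obliged_to_do,       # social commitments approach
    "6d": lambda o: o.obliged_to_address,  # social commitments approach
}


def classify(outcome: RequestOutcome, criterion: str) -> str:
    """Return the strongest status reached under the chosen success criterion."""
    if not outcome.tried:
        return "not attempted"
    if not SUCCESS_CRITERIA[criterion](outcome):
        return "attempted"
    # Assumption: a successful request whose content became true counts as satisfied.
    return "satisfied" if outcome.done else "successful"


# B heard the request and must address it, but has not (yet) formed an intention.
heard_only = RequestOutcome(tried=True, believes_want=True, obliged_to_address=True)
print(classify(heard_only, "6d"))  # successful
print(classify(heard_only, "6b"))  # attempted
```

The same observed situation counts as a successful request under criterion (6d) and as a mere attempt under (6b), which is why the choice of criterion has to be stated explicitly.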
4 RELATIONSHIPS AND COMPLEX ACTS
Question 11: What is the relationship between dialogue acts and other (e.g. physical) acts? One of the main intuitions behind speech act theory (Austin 1962) was to connect speech acts with other actions. However, different theories may maintain a crisp or more blurred distinction between dialogue acts and non-communicative acts. Some want a clear distinction, while others would want to use the same logic of action to account for both. Litman & Allen (1987) distinguished dialogue acts as being meta-acts, defining discourse plans as having other plans (domain or discourse) as parameters. Lambert & Carberry (1991) also distinguish discourse, domain and problem solving plans and actions. Depending on the answer to question 8, some may want to describe dialogue acts as having a different sort of effect on the dialogue context, mental states, or social context than can be achieved with other kinds of action. Another difference between dialogue acts and many sorts of physical action is that dialogue acts involve multiple agents, since there is at least a speaker and addressee involved. See also question 13.
Question 12: What is the relationship between dialogue acts and dialogue structure? Dialogue structure is used for a variety of purposes, e.g. for calculating referential accessibility, topic and focus, and global coherence of utterances. There are several options as to how to view the relationship between dialogue structure and the dialogue acts that have been performed. Some conceive dialogue structure as being wholly dependent on the structure of performance of dialogue acts (e.g. grammar-based approaches such as Sinclair & Coulthard 1975). Others use a different sort of structure, not directly composed of the performance of dialogue acts, which is sensitive to other aspects of the utterances, or is primarily constructed from the activity that the participants are engaged in (Allwood 1995; Grosz & Sidner 1986). In this latter case, it remains to be explicated what effect (if any) performance of different kinds of dialogue acts has on this dialogue structure. Dialogue structure is also often used as one of the aspects of context for dialogue act performance, serving as the source of preconditions for act definitions, and as input for a process of action recognition. For example, one might want to say that an answer act is only possible given some configuration of dialogue structure. One might frame this either in terms of previous acts (e.g. an information request act had just been performed), or in terms of other sorts of structure (e.g. there is a Question Under Discussion for which this act provides the answer, regardless of whether any particular act happened to bring the question under discussion).
Question 13: Are there multi-agent dialogue acts? As mentioned in relation to question 5, some researchers view the performance of most illocutionary acts as a collective performance of multiple agents, in virtue of the grounding process. Other candidates for multi-agent action include notions of higher-level activity such as games (Severinson Eklundh 1983) or exchanges (Sinclair & Coulthard 1975), or collaborative completions where one speaker finishes another's sentence. There are several difficulties with these kinds of acts, however. The first is related to reliable tagging and deciding what aspects of a dialogue are relevant parts of the collaborative action. Finding the right 'units' at which to apply the tags can be a difficult process (see e.g. discussions in Discourse Resource Initiative 1997; Nakatani & Traum 1999). This difficulty is compounded when there are multiple acts with different boundaries (e.g. the multi-agent act and the single-agent component of a multi-agent act performed by a speaker within an utterance). Another issue is that one will need a more complex logic to represent multi-agent action than is needed for representing single agent action. For example, if one needs to reason about the single-agent components as well as the multi-agent act, then one needs a logic allowing simultaneous action and a method for relating the two actions (e.g. using something like the proposal in Goldman 1970).
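The answer-act example under question 12, framed in terms of a Question Under Discussion rather than in terms of a preceding act, can be sketched as follows. This is a minimal sketch under stated assumptions: `InfoState`, `can_answer` and `answer` are hypothetical names, questions and propositions are plain strings, and the effect shown (adding a commitment) is just one of the kinds of effect discussed under question 8.

```python
from dataclasses import dataclass, field


@dataclass
class InfoState:
    """A toy information state (hypothetical): open questions plus commitments."""
    qud: list = field(default_factory=list)        # questions under discussion
    commitments: set = field(default_factory=set)  # propositions committed to


def can_answer(state: InfoState, question: str) -> bool:
    """Precondition of an answer act: the question is currently under discussion."""
    return question in state.qud


def answer(state: InfoState, question: str, proposition: str) -> None:
    """Perform an answer act; refuse if its precondition does not hold."""
    if not can_answer(state, question):
        raise ValueError("no matching question under discussion")
    state.qud.remove(question)          # effect: the question is resolved
    state.commitments.add(proposition)  # effect: speaker is committed to the content


# The question need not have been raised by an explicit information request:
# here it is placed under discussion directly, e.g. by the activity itself.
state = InfoState()
state.qud.append("which-route?")
answer(state, "which-route?", "route-via-avon")
print(state.qud, state.commitments)  # [] {'route-via-avon'}
```

Because the precondition inspects only the state, the definition stays neutral about which act, if any, brought the question under discussion; a grammar-based alternative would instead check the type of the immediately preceding act.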
Question 14: Can dialogue acts be 'composed' of more primitive acts? If a dialogue act taxonomy has multiple strata of acts, then the question becomes whether these strata are conceived of as levels or ranks, according to the terminology of Halliday (1961). That is, whether there could be some grammar or recipe for performance of an act of one stratum using acts of a lower stratum, in the way that sentences can be composed of words and phrases (rank), or whether these are different kinds of phenomena, like the distinction between phonology and syntax (level). For example, the four tiers in the system of Sinclair & Coulthard (1975) are conceived of as ranks within a general 'discourse' level. Also, the check game in the Maptask coding scheme (Carletta et al. 1997) is composed of an initiating check move, along with other moves that accomplish the purpose of the check. On the other hand, the multi-tiered system in Traum & Hinkelman (1992) is organized in levels (at least for the lower three strata), and, although core speech acts like inform are only successfully realized at the point of a completed structure of grounding acts, there is no relationship between the type or sequence of grounding acts performed and the type of core speech acts that are realized. Within the plan ontology described in the discussion of question 7, this amounts to a question of whether the decomposition of a dialogue act contains only other dialogue acts, or involves some other sort of realization.
Question 15: Can multiple dialogue acts occur at the same time (performed through the same utterance)? Since most utterances have multiple functions, the answer will be 'yes', given most definitions formulated in terms of conditions and effects. There are, however, a number of complications, depending on the use to which the taxonomy is put. For logical theories, one important question is whether the logic can accommodate simultaneous action or level-generation (Goldman 1970). Simple versions of e.g. the situation calculus (McCarthy & Hayes 1969) or dynamic logic (Harel 1979) do not, which makes it difficult to formalize this kind of phenomenon. Likewise, within dialogue systems, reasoning about act occurrence is often made not on the basis of necessary and sufficient conditions, but on closeness of fit, using abductive (McRoy & Hirst 1995) or statistical methods (Reithinger & Klesen 1997). Such methods generally are used to decide on a particular label while excluding others, e.g. deciding that an interrogative utterance is an indirect request but not a question. Finally, in tagging a corpus, it is often tedious and unreliable to try to code all possible occurrences of all functions, and so designers of coding manuals often instruct annotators to label only the most significant function (in the opinion of the coding task designer), e.g. the code high principle in Condon & Cech (1992). It is important to be explicit about such assumptions, and whether multiple dialogue acts are assumed to be allowed to happen at the same time, and what the meaning of something not being coded is: assumed occurrence (perhaps on the basis of some other tag), assumed non-occurrence, or no statement about occurrence or non-occurrence. In the Condon-Cech scheme, one could deduce that a 'higher' act had not occurred, but no such deduction is warranted about the occurrence of a 'lower' act.
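What a single label does and does not license under a 'code high' convention can be spelled out in a short sketch. The names `Occurrence` and `interpret_code_high` and the three-item ranking are hypothetical illustrations, not the actual categories or ordering of Condon & Cech (1992); the only thing taken from the text is the inference pattern, namely that higher-ranked functions are excluded while nothing follows for lower-ranked ones.

```python
from enum import Enum


class Occurrence(Enum):
    OCCURRED = "occurred"
    DID_NOT_OCCUR = "assumed non-occurrence"
    UNKNOWN = "no statement"


def interpret_code_high(ranking, coded):
    """Interpret one label under a 'code high' convention.

    ranking: all functions, ordered from highest to lowest priority.
    coded:   the single function the annotator recorded for the utterance.
    """
    position = ranking.index(coded)  # raises ValueError for an unknown label
    status = {}
    for i, function in enumerate(ranking):
        if i < position:
            # A higher function would have been coded instead, so it did not occur.
            status[function] = Occurrence.DID_NOT_OCCUR
        elif i == position:
            status[function] = Occurrence.OCCURRED
        else:
            # Lower functions may or may not have occurred: the coding is silent.
            status[function] = Occurrence.UNKNOWN
    return status


# Hypothetical ranking, for illustration only.
ranking = ["request", "question", "statement"]
for function, occ in interpret_code_high(ranking, "question").items():
    print(function, "->", occ.value)
```

Keeping the third value (`UNKNOWN`) distinct from non-occurrence is the point of the sketch: a corpus user who collapses the two will undercount the lower-ranked functions.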
5 TAXONOMIC CONSIDERATIONS
Question 16: Can the same taxonomy be used for different kinds of activities? There are two relevant notions of activity here. First is the meta-activity of recognizing or coding dialogue acts, which is the concern of question 20. Relevant types of meta-activities include logical reasoning, system participation in a dialogue, and corpus analysis. For the meta-activity of attributing dialogue acts to utterances, there is also the issue of whether this is an on-line or off-line attribution, and the amount of lookahead allowed (see question 6). Here I will concentrate on the activities that the dialogue participants are engaged in. There are a number of different dialogue activities that people have been designing taxonomies of dialogue acts for. Some examples include casual conversation (Jurafsky et al. 1997), classroom discourse (Sinclair & Coulthard 1975), and various flavors of task-oriented dialogue, such as information seeking (van Vark et al. 1996), collaborative scheduling (Alexandersson et al. 1998), and direction following (Carletta et al. 1997). Taxonomies designed for different tasks or genres of dialogue tend to be quite different (e.g. even within the general realm of task-oriented cooperative dialogue, meeting-scheduling vs. direction following). To some extent, this is to be expected, since different genres will have different frequencies of acts. This can be seen in Table 1, which compares eight coding efforts, showing for each the percentage of utterances that were labeled with various tags. Most cells actually show the percentage of utterances labeled with one of a set of tags rather than an individual tag, using the inter-scheme equivalences proposed in (Klein et al. 1999).[6] The first two columns use the DRI coding scheme (Discourse Resource Initiative 1997) and the manual in Allen & Core (1997). Two different tasks were coded in these efforts, however: in the first, dialogues were about planning the movements of trains and commodities (Heeman & Allen 1994),[7] while in the second, the dialogues involved more complex disaster management planning (Stent 2000). The third column shows coding of the Switchboard Corpus (Jurafsky et al. 1998) using a variant of the DAMSL tags (Jurafsky et al. 1997). The next two columns depict coding efforts using the HCRC coding scheme (Carletta et al. 1996), on the same task, Maptask. The fourth column shows results for the HCRC corpus (Carletta et al. 1997),[8] involving Scottish students, while the fifth column shows the DCIEM corpus, involving Canadian military personnel (Taylor et al. 1998). The last three columns involve variants of the Verbmobil Tasks. The last one uses the first Verbmobil coding scheme (Jekat et al. 1995), on a German corpus of scheduling dialogues (Kipp 1998). The sixth and seventh use the revised scheme (Alexandersson et al. 1998) on a wider variety of tasks, the sixth column showing English-speaking subjects, the seventh showing German-speaking subjects.[9] As can be seen, there are some striking differences in distributions of act types across the various domains, schemes, and corpora. For example, roughly 50% of utterances are statements in the Switchboard corpus, which is concerned with casual conversation, while the Maptask efforts, concerned with instruction giving/following, have only 8% of utterances labeled with the equivalent tag, explain. Conversely, the Maptask dialogues have over 15% of utterances marked as instruct, while Switchboard has less than 1% of utterances labeled as action-directive.
Different tasks and coding purposes may also place different demands on specificity of a taxonomy (see question 18), e.g. to have an appropriate reliability and perplexity for a given coding purpose. While it is hard to see from Table 1, since individual tags are clustered for comparison purposes, there are also large differences between the number of tags in these coding schemes, e.g. 12 tags in the HCRC scheme vs. 34 tags in Verbmobil II. Some researchers hope that these different task specific 'sub-taxonomies' might be fit together within a coherent general taxonomy of acts in dialogue. A general theory might also better allow one to use act distributions to identify activities or genres of activities as well as episodes within an activity. The DRI group has been working toward the goal of a general purpose scheme that might have more general applicability (at least within the general category of task-oriented dialogue) (Discourse Resource Initiative 1997; Core et al. 1999). The SLSA project at Gothenburg University is investigating more generally the issue of corpus collection and dialogue coding of spoken language activities (Allwood 1999).
[Table 1: Percentage distributions of dialogue acts in corpus coding. Columns, as coding scheme / corpus: DAMSL / TRAINS; DAMSL / Monroe; SWBD-DAMSL / Switchboard; HCRC / HCRC Maptask; HCRC / DCIEM Maptask; Verbmobil II / Verbmobil English; Verbmobil II / Verbmobil German; Verbmobil I / Verbmobil I German. Tag labels in the rows: statement; info-request; action-dir, OO; commit, offer; conventional; answer; accept; reject; other agree; understanding; acknowledge; non-understand; questions; backchannel; explain; query, check, align; instruct; reply, clarify; accept, confirm; inform, ...; request, suggest; commit; feedback; reject, explained; clarify.]
[6] The comparisons here are very rough, since the proposed equivalences might not hold in all cases. For one thing, the first scheme (in the first two columns) allows an utterance to be labeled with multiple tags, while the others do not. The numbers for these columns thus do not equal 100%, since the percentages are based on utterances rather than tags. Other columns fail to reach 100% due to some codes in incomparable categories. Also, while there is no corresponding category for questions in the Verbmobil scheme, it is likely that the subjects did ask questions, though these probably were coded as requests. Likewise, many of the feedback codes are probably also acknowledgments.
[7] TRAINS statistics from Mark Core, personal communication. See Core (1998) for details of the annotation.
[8] HCRC Maptask statistics, personal communication from Amy Isard.
[9] Verbmobil II statistics, personal communication from Michael Kipp.
Question 17: Can the same taxonomy be used for different kinds of agents? As well as considering different communicative activities, we may also consider whether the same taxonomy could cover situations of humans communicating with humans, humans with machines, and machines with machines. Other possibilities could also include humans with animals or animals with animals (or possibly even animals with machines). Again, the hope of many researchers is that the same taxonomies (at a suitably abstract level, given the relative lack of subtlety of machine communication) could be used for any of these sets of agents. Some (e.g. Jönsson 1995), however, have pointed to the differences in communication styles between human-human and human-machine communication as a motivation for different taxonomies, and for not carrying over too many insights from one setting to the other. Even when only humans are communicating, there is still an important issue of the medium, e.g. face to face, spoken language only, or multi-modal computer mediated communication of various flavors. These issues will certainly have a bearing on the distribution of act types. For example, there is much more explicit grounding in spoken dialogue (> 95% (Traum & Heeman 1997)) than computer chat (~ 40% (Dillenbourg et al. 1997)),[10] and more explicit verifications from computer systems with relatively poor speech recognition than between fluent humans. Table 1 shows that even within the same task group and using the same medium of spoken language, there are significant differences in some of the act distributions, e.g. the different amounts of acknowledgments performed during Maptask by Canadian military personnel and Scottish students, or for the Verbmobil II participants, contrasting English and German speakers. While using the same coding scheme for different corpora involving different participant groups may allow investigation into social and/or stylistic differences between speaker groups and between individuals, it may not be ideal for e.g. purposes of statistical training of computer systems within the style of a single group (in which case, one would want a scheme with maximal discrimination of the coding decisions).
[10] Note, however, that these studies concerned different tasks.
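The utterance-based percentages compared in Table 1, and the kind of between-corpus contrast drawn from them, can be sketched as follows. The function names and the two tiny corpora are invented for illustration and are not data from any of the cited coding efforts. The sketch only shows the arithmetic: when an utterance may carry several tags, per-tag percentages of utterances need not sum to 100%.

```python
from collections import Counter


def tag_percentages(utterances):
    """Percentage of utterances carrying each tag.

    utterances: a list with one collection of tags per utterance.
    """
    if not utterances:
        raise ValueError("empty corpus")
    counts = Counter(tag for tags in utterances for tag in set(tags))
    total = len(utterances)
    return {tag: 100.0 * n / total for tag, n in counts.items()}


def distribution_gaps(p, q):
    """Per-tag absolute difference in percentage points, largest gap first."""
    gaps = {t: abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in set(p) | set(q)}
    return sorted(gaps.items(), key=lambda item: (-item[1], item[0]))


# Invented toy corpora (NOT real annotation data); corpus_a allows multiple tags.
corpus_a = [{"statement"}, {"statement", "acknowledge"}, {"instruct"}, {"statement"}]
corpus_b = [{"instruct"}, {"instruct"}, {"acknowledge"}, {"statement"}]

pa, pb = tag_percentages(corpus_a), tag_percentages(corpus_b)
print(sum(pa.values()))           # 125.0: exceeds 100 because of multi-tagging
print(distribution_gaps(pa, pb))  # statement shows the largest gap
```

Sorting tags by gap gives a quick way to see which act types separate two corpora most sharply, subject to the caveat that the tags being compared must really be equivalent across schemes.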
Question 18: How detailed should a dialogue act taxonomy be? There are many subtle gradations in speech act verbs, often relating to different facets of the participants or normative attitudes towards the content of the act (e.g. state, assert, inform, confess, concede, maintain, ...). The question arises as to how many of these distinctions should be captured within a dialogue act taxonomy. One key issue is whether one wants to capture generalizations or distinctions. Also, there is often a trade-off between proposing many acts, to precisely capture subtle differences in conditions and effects, and the reliability that can be attached to a coding effort using these tags, given inevitable ambiguity in particular situations (not to mention the potential for coders not sharing an understanding of the intended distinctions). If possible it may be best to arrange these fine distinctions within a hierarchical or lattice structure (as is done by e.g. Allen & Core 1997; Alexandersson et al. 1998), so that a degree of specificity may be chosen that is appropriate to the particular task. One issue is whether theorists and coders can agree on the hierarchical structure of related acts, which, in some cases, may be more controversial than the base labels themselves.
Question 19: Where should complexity be realized in a [taxonomy]? Given that utterances in dialogue are generally multi-functional, the question arises as to how best to capture this multiplicity of functions in a taxonomy. There are two extremes: one is to separate out each function and code it separately, which requires multiple labels for each utterance, one for each function. The advantage is an ability to use fairly simple act definitions, each with fairly clear semantics and ascription conditions. The disadvantage is that there are a large number of tagging decisions (one for each functional dimension), which, if coded by human annotators, leads to a fairly onerous tagging task and lower reliability on some dimensions depending on annotator attention and attunement to each phenomenon. This approach is taken by Discourse Resource Initiative (1997) and Allen & Core (1997). The other extreme is to combine sets of coherent bundles of dialogue functions into complex labels and use these labels for coding dialogues. The advantage is a potentially easier and more reliable coding task, especially if the same bundles appear repeatedly within a given coding effort. The disadvantage is that there might be many possible acts if many different collections of functions co-occur in the corpus. If only some of these function-bundles are assigned labels, then it may be difficult to decide how to code an utterance that shares some (but not all) of the features of one label, while having some features from another. This approach can also lead to missing connections between different acts that share some of the features, making it hard to analyze existence of these features from the coded data. This approach is taken by the first Verbmobil coding scheme (Jekat et al. 1995). It is also possible to find taxonomies that take a more intermediate position than either extreme, attempting to capture some of the advantages of each. For example, the Switchboard DAMSL scheme uses many ideas from Discourse Resource Initiative (1997) and Allen & Core (1997), while moving toward the other extreme of coding in discrete, mutually exclusive bundles rather than multiple dimensions. There are also proposals to do this for the main DRI scheme as well (Core et al. 1999). These schemes still retain the theoretical connection to the multi-layer DRI scheme, and so it should still be relatively straightforward to determine individual functions. Likewise, it should be possible to define optional rather than mandatory macros which combine convenient bundles of features together, simplifying the coding tasks while still maintaining the full flexibility of coding multiple functions. This is the method advocated in Poesio et al. (1999) and Cooper et al. (1999).
Question 20: Can a taxonomy used for tagging dialogue corpora be given a formal semantics and/or be used in a dialogue system? The hope of many researchers is definitely a 'yes' answer to this question: the purpose of tagging or formal semantics is often for use within a dialogue system. Moreover, a clear semantics may help one to formulate sharper principles for a tagging exercise (see Poesio & Traum 1998 for an attempt to formalize the acts in Discourse Resource Initiative 1997; Allen & Core 1997). There are some difficulties, however. One is the issue of different resources: one may require not just the act category, but also details of the content of an act in order to use it in a dialogue system or provide an appropriate semantic interpretation, yet providing this information may be too onerous for a tagging exercise. Likewise, formal representations of context built from incorporation of previous acts may not be available during a coding task. On the other hand, human coders may be able to use complex intuitions in their coding, which are difficult to incorporate in a formal description or implementation (however, these intuitions may perhaps be learned from a corpus, using machine learning techniques (Reithinger & Klesen 1997; Samuel 1998; Poesio & Mikheev 1998; Wright et al. 1999)). These different skill sets may tend to make taxonomies designed for different purposes diverge.
6 DISCUSSION
Given that the above questions are not exhaustive or binary, and have remained mostly at the meta-level, we can certainly see that formulating the ultimate dialogue act taxonomy is a much harder problem than the game of 20-questions. The discussion above is also far from the last word on any of these topics. The hope is that further research may yield some more definitive answers or at least better understanding of the issues involved. Meanwhile, the above discussion may help dialogue act theorists be clearer about some of the meanings of their taxonomy, hopefully leading to wider understanding and applicability of the taxonomies that are used.
Acknowledgements
The author was supported during the writing of this paper by the TRINDI (Task Oriented Instructional Dialogue) project, EU TELEMATICS APPLICATIONS Programme, Language Engineering Project LE4-8314. I would also like to (anonymously, because there are too many to mention without fear of forgetting someone important) thank the many colleagues who helped me formulate my ideas on the topics discussed in this paper. Thanks also to Jan Alexandersson, Mark Core, Amy Isard, Michael Kipp, and Amanda Stent for providing information on the various coding efforts reported in Table 1. Finally, I would like to thank Jens Allwood, William Mann, Henk Zeevat, and an anonymous reviewer for helpful comments on previous versions of this paper.
DAVID R TRAUM
USC Institute for Creative Technologies
13274 Fiji Way
Marina del Rey, CA 90292
USA
[email protected]
Received: 01.09.1999
Final version received: 21.07.2000
REFERENCES
Airenti, G., Bara, B. G., & Colombetti, M. (1993), 'Conversation and behavior games in the pragmatics of dialogue', Cognitive Science, 17, 197-256.
Alexandersson, J., Buschbeck-Wolf, B., Fujinami, T., Kipp, M., Koch, S., Maier, E., Reithinger, N., Schmitz, B., & Siegel, M. (1998), 'Dialogue Acts in VERBMOBIL-2', second edition, Verbmobil-Report 226, DFKI Saarbrücken, Universität Stuttgart, Technische Universität Berlin, Universität des Saarlandes.
Allen, J. & Core, M. (1997), 'Draft of DAMSL, dialog act markup in several layers', available through the WWW at http://www.cs.rochester.edu/research/trains/annotation.
Allen, J. F., Miller, B. W., Ringger, E. K., & Sikorski, T. (1996), 'A robust system for natural spoken dialogue', in Proceedings of the 1996 Annual Meeting of the Association for Computational Linguistics (ACL-96), 62-70.
Allen, J. F. & Perrault, C. R. (1980), 'Analyzing intention in utterances', Artificial Intelligence, 15, 3, 143-78.
Allwood, J. (1976), 'Linguistic communication as action and cooperation', Ph.D. thesis, Göteborg University, Department of Linguistics.
Allwood, J. (1977), 'A critical look at speech act theory', in Ö. Dahl (ed.), Logic, Pragmatics and Grammar, Studentlitteratur, Lund.
Allwood, J. (1980), 'On the analysis of communicative action', in M. Brenner (ed.), The Structure of Action, Basil Blackwell, Oxford. Also appears as Gothenburg Papers in Theoretical Linguistics 38, Dept. of Linguistics, Göteborg University.
Allwood, J. (1994), 'Obligations and options in dialogue', Think Quarterly, 3, 9-18.
Allwood, J. (1995), 'An activity based approach to pragmatics', Technical Report (GPTL) 75, Gothenburg Papers in Theoretical Linguistics, University of Göteborg.
Allwood, J. (1999), 'The Swedish Spoken Language Corpus at Göteborg University', in R. Andersson, A. Abelin, J. Allwood & P. Lindblad (eds), Fonetik 99: Proceedings from the Twelfth Swedish Phonetics Conference, Gothenburg Papers in Theoretical Linguistics 81, 5-9. Department of Linguistics, Göteborg University.
Allwood, J., Nivre, J., & Ahlsén, E. (1992), 'On the semantics and pragmatics of linguistic feedback', Journal of Semantics, 9.
Austin, J. L. (1962), How to Do Things with Words, Harvard University Press, Cambridge, MA.
Bilange, E. (1991), 'A task independent oral dialogue model', in Proceedings of the Fifth Conference of the European Chapter of the Association for Computational Linguistics, 83-8.
Bretier, P. & Sadek, M. D. (1996), 'A rational agent as the kernel of a cooperative spoken dialogue system, implementing a logical theory of interaction', in J. P. Müller, M. J. Wooldridge & N. R. Jennings (eds), Intelligent Agents III: Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages (ATAL-96), Lecture Notes in Artificial Intelligence. Springer-Verlag, Heidelberg.
Bunt, H. (1996), 'Interaction management functions and context representation requirements', in Proceedings of the Twente Workshop on Language Technology: Dialogue Management in Natural Language Systems (TWLT 11), 187-98.
Carletta, J., Isard, A., Isard, S., Kowtko, J., Doherty-Sneddon, G., & Anderson, A. (1996), 'HCRC dialogue structure coding manual', Technical Report 82, HCRC.
Carletta, J., Isard, A., Isard, S., Kowtko, J. C., Doherty-Sneddon, G., & Anderson, A. H. (1997), 'The reliability of a dialogue structure coding scheme', Computational Linguistics, 23, 1, 13-31.
Clark, H. H. (1992), Arenas of Language Use, University of Chicago Press, Chicago.
Clark, H. H. & Schaefer, E. F. (1989), 'Contributing to discourse', Cognitive Science, 13, 259-94. Also appears as Chapter 5 in Clark (1992).
Cohen, P. R. & Levesque, H. J. (1990), 'Rational interaction as the basis for communication', in P. R. Cohen, J. Morgan & M. E. Pollack (eds), Intentions in Communication, MIT Press, Cambridge, MA.
Cohen, P. R. & Perrault, C. R. (1979), 'Elements of a plan-based theory of speech acts', Cognitive Science, 3, 3, 177-212.
Condon, S. & Cech, C. (1992), 'Manual for coding decision-making interactions', unpublished manuscript, updated May 1995, available at: ftp://sls-ftp.lcs.mit.edu/pub/multiparty/coding_schemes/condon.
Cooper, R., Larsson, S., Matheson, C., Poesio, M., & Traum, D. (1999), 'Coding instructional dialogue for information states', Deliverable D1.1, Trindi Project.
Core, M. (1998), 'Analyzing and predicting patterns of DAMSL utterance tags', in Working Notes AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, 18-24.
Core, M., Ishizaki, M., Moore, J., Nakatani, C., Reithinger, N., Traum, D., & Tutiya, S. (1999), 'The report of the third workshop of the Discourse Resource Initiative', Chiba University and Kazusa Academia Hall. Technical Report No. 3 CC-TR-99-1, Chiba Corpus Project.
Dillenbourg, P., Jermann, P., Schneider, D., Traum, D., & Buiu, C. (1997), 'The design of MOO agents: implications from a study on multi-modal collaborative problem solving', in Proceedings of the 8th World Conference on Artificial Intelligence in Education (AI-ED 97), 15-22.
Discourse Resource Initiative (1997), 'Standards for dialogue coding in natural language processing', Report no. 167, Dagstuhl-Seminar.
... the Twente Workshop on the Formal Semantics and Pragmatics of Dialogues, 11-30, Enschede, Universiteit Twente, Faculteit Informatica.
Goldman, A. I. (1970), A Theory of Human Action, Prentice Hall Inc., New York.
Grice, H. P. (1957), 'Meaning', Philosophical Review, 66, 377-88.
Grosz, B. J. & Sidner, C. L. (1986), 'Attention, intention, and the structure of discourse', Computational Linguistics, 12, 3, 175-204.
Halliday, M. A. K. (1961), 'Categories of the theory of grammar', Word, 17, 241-92.
Harel, D. (1979), First Order Dynamic Logic, Springer-Verlag, Heidelberg.
Heeman, P. A. & Allen, J. (1994), 'The TRAINS 93 dialogues', TRAINS Technical Note 94-2, Department of Computer Science, University of Rochester.
Hintikka, J. (1962), Knowledge and Belief: An Introduction to the Logic of the Two Notions, Cornell University Press, Cornell, NY.
Jackendoff, R. (1983), Semantics and Cognition, MIT Press, Cambridge, MA.
Jekat, S., Klein, A., Maier, E., Maleck, I., Mast, M., & Quantz, J. (1995), 'Dialogue Acts in VERBMOBIL', Technical Report 65, BMBF Verbmobil Report.
Jönsson, A. (1995), 'Dialogue actions for natural language interfaces', in Proc. of the 14th IJCAI, 1405-11, Montreal, Canada.
Jurafsky, D., Bates, R., Coccaro, N., Martin, R., Meteer, M., Ries, K., Shriberg, E., Stolcke, A., Taylor, P., & Ess-Dykema, C. V. (1998), 'Switchboard discourse language modeling project final report',
External Interfaces Working Group (1993), 'Draft specification of the KQML agent-communication language, available through the W W W at http:// www.cs.umbc.edu/kqml/papers/. FIPA (1997), 'Fipa 97 specification part 2, Agent communication language, Working paper available at http://drogo.cseh. stet.it/fipa/spec/fipa97/f8a21.zip. Ginzburg, J. (1998), 'Clarifying utterances', in J. Hulstijn & A. Niholt (eds), Proc. of
David R. Traum 29
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
Research Note 30, Center for Speech Poesio, M., Cooper, R., Larsson, S., Matheson, C, & Traum., D. (1999), and Language Processing, Johns Hopkins 'Annotating conversations for informaUniversity. tion state update', in Proceedings ofAmsteJurafsky, D., Shriberg, E., & Biasca, D. logue 'gg Workshop on the Semantics and (1997), 'Switchboard SWBD-DAMSL Pragmatics of Dialogue. shallow-discourse-function annotation coders manual', Technical Report Poesio, M. & Mikheev, A. (1998), The 97-02, University of Colorado Institute predictive power of game structure in of Cognitive Science. Draft 13. dialogue act recognition, experimental results using maximum entropy estimaKipp, M. (1998), The neural path to diation', in Proceedings oflCSLP-gS, Sydney, logue acts', in Proceedings of ECAI g8, 1998. 175-9Poesio, M. & Traum, D. R. (1997), Klein, M., Bernsen, N. O., Davies, S., 'Conversational actions and discourse Dybkjaer, L., Garrido, J., Kasch, H., situations', Computational Intelligence, 13, Mengel, A., Pirrelli, V., Poesio, M., Quazza, S., & Soria, C. (1999), 'Supported 3coding schemes', deliverable D1.1, Poesio, M. & Traum, D. R. (1998), Towards MATE Project, available at http:// an axiomatization of dialogue acts', in www.dfki.de/mate/d 11/. Proceedings of Twendial 'gS, ljth Twente Lambert, L. & Carberry, S. (1991), 'A Workshop on Language Technology: Formal tripartite plan-based model of discourse', Semantics and Pragmatics of Dialogue. in Proceedings of the 2gth Annual Meeting Pollack, M. E. (1990), 'Plans as complex of the Association for Computational mental attitudes', in P. R. Cohen, J. Linguistics, 47-544. Morgan & M. E. Pollack (eds), Intentions Lewin, I. (1998), The autoroute dialogue in Communication, MIT Press, Cambridge, demonstrator', Technical Report CRCMA. 073, SRI Cambridge Computer Science Reithinger, N. & Klesen, M. (1997), 'DiaResearch Centre. logue act classification using language models', in Proc. Eurospeech 'gy, 2235-8, Litman, D. J. & Allen, J. F. 
(1987), 'A plan Rhodes, Greece. recognition model for subdialogues in conversation', Cognitive Science, 11, Sadek, M. D. (1991), 'Dialogue acts are 163-200. rational plans', in Proceedings of the ESCA/ETR Workshop on Multi-modal McCarthy, J. & Hayes, P. (1969), 'Some Dialogue. philosophical problems from the standpoint of artificial intelligence', in B. Samuel, K. (1998), 'Discourse learning: Meltzer & D. Michie (eds), Machine dialogue act tagging with transformaIntelligence 4, Edinburgh University tion-based learning', in Proceedings of the Press, Edinburgh, 463-502. Also appears 15th National Conference on Artificial in N. Nilsson & B. Webber (eds), Intelligence (AAAI-g8) and of the 10th Readings in Artificial Intelligence, Morgan- Conference on Innovative Applications of Kaufmann. Los Altos, California. Artificial Intelligence (IAAI-g8), 1199, AAAI Press, Menlo Park. McRoy, S. W. & Hirst, G. (1995), The repair of speech act misunderstandings Searle, J. R (1969), Speech Acts, Cambridge by abductive inference', Computational University Press, New York, NY. Linguistics, 21, 4, 5-478. Severinson Eklundh, K. (1983), The notion of language game: a natural unit of Nakatani, C. H. & Traum, D. R. (1999), dialogue and discourse', Technical 'Coding discourse structure in dialogue Report SIC 5, University of Linkoping, (version 1.0)', Technical Report UMIACSStudies in Communication. TR-99-03, University of Maryland.
30 20 Questions on Dialogue Act Taxonomies Sidner, C. L. (1994), 'An artificial discourse language for collaborative negotiation', in Proceedings of the forteenth National Conference of the American Association for Artificial Intelligence (AAAI-94), 814-19.
Sinclair, J. M. & Coulthard, R M. (1975), Towards an Analysis of Discourse: The English Used by Teachers and Pupils.
Language and Speech, 41, 493-512.
Traum, D. R. (1994), 'A computational theory of grounding in natural language conversation', PLD. thesis, Department of Computer Science, University of Rochester. Also available as TR 545, Department of Computer Science, University of Rochester. Traum, D. R. (1999), 'Speech acts for dialogue agents', in A Rao & M. Wooldridge (eds), Foundations of Rational
Agency, Kluwer, Dordrecht. Traum, D. R & Allen, J. F. (1992), 'A speech acts approach to grounding in conversation', in Proceedings 2nd International Conference on Spoken Language Processing (ICSLP-92), 137-40.
Dialogue Processing in Spoken Language Systems: ECAI-96 Workshop, Lecture
Notes in Artificial Intelligence, 125-40. Springer-Verlag, Heidelberg. Traum, D. R & Hinkelman, E. A (1992), 'Conversation acts in task-oriented spoken dialogue', Computational Intelligence, 8, 3, 575-99. Special Issue on non-literal language. van Vark, R, de Vreught, J., & Rothkrantz, L. (1996), 'Analysing OVR dialogue coding scheme 1.0', Technical Report 96-137, TU Delft Faculty of Technical Mathematics and Informatics. Vanderveken, D. (1990), 'On the unification of speech act theory and formal semantics', in P. R. Cohen, J. Morgan & M. E. Pollack (eds), Intentions in Communication, MIT Press, Cambridge, MA Vanderveken, D. (1990/1991), Meaning and Speech Acts, Cambridge University Press, Cambridge. Winograd, T. & Flores, F. (1986), Understanding Computers and Cognition, Addison-
Wesley. Redding, MA Wright, H., Poesio, M., & Isard, S. (1999), 'Automatic extraction of game structure for dialogue act recognition using prosodic features', in Proc. of the ESCA Workshop
Eindhoven
on
Dialogue
and
Prosody,
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
Oxford University Press, Oxford. Singh, M. P. (1998), 'Agent communication languages: rethinking the principles', IEEE Computer, 31, 12, 40-7. Stent, A J. (2000). The Monroe corpus', Technical Report 728, Computer Science Dept., University of Rochester. Taylor, P. A, King, S., Isard, S. D., & Wright, H. (1998), 'Intonation and dialogue context as constraints for speech recognition',
Traum, D. R. & Heeman, P. (1997), 'Utterance units in spoken dialogue', in E. Maier, M. Mast & S. Luperfoy (eds),
Journal of Semantics 17: 31-50
© Oxford University Press 2000
Truth Conditional Discourse Semantics for Parentheticals
NICHOLAS ASHER
The University of Texas at Austin
Abstract
It has often been argued that parentheticals, discourse adverbials and certain parts of speech like interjections do not contribute to the truth conditional content of the assertions of which they are part. In this paper I argue that many of these constructions do contribute a truth conditional content, and I propose a semantics for parentheticals and discourse adverbials that treats these constructions similarly to SDRT's treatment of presuppositions. I also point out differences between standard presupposition triggers on the one hand and parentheticals or discourse adverbials on the other.

1 INTRODUCTION
There is a tradition in pragmatics going back at least to Grice according to which certain constructions and parts of speech do not contribute to the truth conditional content of the assertions of which they are part. Rather they implicate or indicate either a particular speech act or an attitude of the speaker. Examples of such items are:
• mood indicators: questions, commands.
• interjections: Oh, Gee, Too bad, Damn, etc.
• so-called discourse adverbials: allegedly, unfortunately, etc. This category also includes adverbial clauses, e.g. as Mary assures us.
• so-called pragmatic conditionals: if you know what I mean, if you see what I'm getting at.
• discourse particles: tte in Japanese or re in Sissala for hearsay.
• discourse connectors: but, too, hence, so, therefore, etc.
• parenthetical constructions, in which full clauses missing a verbal complement occur.
Wilson (1975) and others have argued that all of these phenomena exhibit a similar behavior relative to a test for non-truth conditional meaning, the 'embedding test'. Sperber & Wilson (1995) also claim that they can give a unified analysis of these phenomena. However, I will argue here that the test does not really separate out parts of speech with a non-truth conditional meaning (whatever that might be is not my concern here). Certainly, there is reason to doubt that all of these constructions fail to be amenable to truth conditional or, more generally, model theoretic analysis.
Some mood indicators have received detailed and rigorous model theoretic analysis in e.g. Hintikka (1974) and Groenendijk & Stokhof (1984). Others have argued that discourse connectors have an important though sometimes subtle effect on the truth conditional interpretation of discourse (Asher 1993; Lascarides & Asher 1993) and that hence traditional pragmatics gives a misleading picture of discourse interpretation by separating out the contribution of discourse connectors from an account of truth conditional content. Similar remarks apply to the Japanese discourse particle tte (Hasegawa 1996). In this paper I will examine parentheticals and discourse adverbials. Some examples of parentheticals are given in (1) below (the parentheticals are underlined). I will argue that these parts of speech also have a straightforward truth conditional semantics in a theory of discourse interpretation that takes account of discourse structure. The theory of discourse interpretation that I will use is SDRT (Asher 1993, Lascarides & Asher 1993), an extension of DRT that incorporates an account of discourse structure and rhetorical function.

(1) a. The party is over, I hear.
    b. Please leave, I beg you.
    c. The party, Mary assures us, is over.

2 A TEST FOR NON TRUTH CONDITIONAL MEANING?
According to Wilson (1975), there is a test for non truth conditional meaning: embed the questionable item into the antecedent of a conditional and see if the purported truth conditional contributor's meaning falls within the scope of 'if'. If it does, it is truth conditional; and if not, not. Here are some examples of the test at work:

(2) a. If the party, unfortunately, is over, then we should find somewhere else to get a drink.
       If it is unfortunate that the party is over, then we should find somewhere else to get a drink.
    b. If the sun is shining but it's midnight, then we must be in Norway.
       If the sun is shining and it's midnight and that's not expected, then we must be in Norway.
    c. If, I'm warning you, you cross that line, I'll hit you.
       If I'm warning you that you cross that line, I'll hit you.

According to the test, these examples appear to indicate that neither
parentheticals, discourse adverbials, nor discourse particles have a truth conditional import, because their supposed content cannot embed inside a conditional. But this conclusion is too hasty. According to this test, nonrestrictive relative clauses and appositive NPs would fail to have a truth conditional import, when they obviously do:

(3) If the party, which Jane attended, is over, then we should find somewhere else to get a drink.
    If the party is over and Jane attended that party, then we should find somewhere else to get a drink.
(4) If the party, that one that Jane is hosting, is over, then we should find somewhere else to get a drink.
    If the party is over and Jane is hosting that party, then we should find somewhere else to get a drink.

Crucially, what is wrong with the test for non-truth conditional meaning is that it overlooks the obvious possibility that the content of the apparently non-truth conditional item may simply fall outside the scope of the conditional but nevertheless contribute to the truth conditions of the discourse.

Before dismissing this test, it is nevertheless important to note that it does render very dubious an account of parentheticals as syntactically displaced constituents. Such a simple account of parentheticals would see the examples in (1) as equivalent to sentences in which the main clause is a complement to the expression in the parenthetical. Thus, (1c) would be equivalent to

(5) Mary assures us that the party is over.

Although this sounds initially plausible, it fails to explain a difference in the discourse behavior of (1c) and (5). The latter, but not the former, can be questioned or undercut by Does she?:

(6) a. # A: The party, Mary assures, is over. B: Does she?
    b. A: Mary assures us that the party is over. B: Does she?

Further, once the parenthetical occurs within a clause of a complex sentence like those used in the embedding test, this simple syntactic account makes the wrong predictions, as we will see below. Finally, convincing syntactic evidence that parentheticals, discourse adverbials and interjections remain unattached 'orphans' at syntactic structure has been given by Haegeman (1991) (see also Haegeman 1984). Thus, if we are to provide a unified interpretation for sentences containing parentheticals or discourse adverbials, we will have to move to a semantic or even pragmatic-semantic account of logical form, which is what I turn to now.
3 A POSITIVE ACCOUNT
3.1 The basics
Parentheticals and discourse adverbials share several features with presuppositions. First, both typically project out of the context in which they are introduced. Projection means that presuppositions also fail the 'test' for truth conditional meaning propounded by Wilson:

(7) If the King of Buganda is bald, then he wears a wig in public.
    ? $\rightarrow$ If there is a King of Buganda and the King of Buganda is bald, then he wears a wig in public.
Second, parentheticals, discourse adverbials and presuppositions all typically convey propositions, once certain anaphorically underspecified elements are resolved. Asher and Lascarides (1998b) present an SDRT account of presuppositions according to which presuppositions must be attached to some antecedent, available part of the discourse context via a restricted range of discourse relations that represent their discourse function. The two relations are Background, in which the presupposition gives some stage setting information about one or more elements in the main narrative line of the text, and Defeasible Consequence, in which the presupposition is a defeasible consequence of the constituent to which it is related. One of these two relations always attaches a presupposition to the discourse context, unless the presupposition trigger itself specifies a discourse relation; for instance, the presupposition trigger too introduces the discourse relation Parallel, as argued in Asher (1993). Following van der Sandt's account, the SDRT account of presupposition supposes that presuppositions that cannot be derived from or 'bound to' the context (and so attached with Defeasible Consequence) prefer an attachment to as superordinate a position as possible in the discourse context (the counterpart to van der Sandt's rule of wide scope accommodation).

Let us now see how parentheticals are anaphoric and express propositions. In SDRT parentheticals, like presuppositions and ordinary assertions, must attach to some part of the discourse context via a discourse relation. Parentheticals prefer a different attachment from presuppositions: they typically attach to a discourse constituent formed from the asserted clause or sentence in which they are embedded, whereas presuppositions can attach at any available position and even prefer high attachment with Background. Some parentheticals (viz. the epithets, expressions like he commented, and many discourse adverbials) determine a particular discourse relation like Commentary. Others encode the rhetorical functions used to attach their propositional content in their main verbs, as in Mary assures us. On the other hand, parentheticals containing main verbs that express
relations between speech acts such as Mary explained, Mary elaborated, Mary replied function like the presupposition trigger too; they simply express a proposition containing a discourse relation that is to be used in attaching the clause surrounding them to antecedent material. Consider, for example, the parenthetical in (8):

(8) As I entered the class I saw a student rush out in tears. The boy, the teacher explained, had just failed his exam.

With this in mind let us now consider an example of a parenthetical and how to treat it.

(9) John, Mary assures us, can be trusted.

I will assume that syntax can isolate out the parenthetical and that lexical analysis will reveal that verbs used in our parenthetical examples like assure, hope, fear as well as sentence adverbials like unfortunately have an argument that needs to be specified. SDRT takes a sentence like (9), and indeed any discourse, to produce an SDRS, which we can think of as a pair consisting of a set $A$ of labels and a function $\mathcal{F}$ that maps elements of $A$ into formulas (e.g. DRSs and SDRSs) representing the content of the labeled constituents (see e.g. Asher 1997). The parenthetical and main clause for (9) will then each generate a DRS with a label; in addition the parenthetical, like a presupposition, will introduce a discourse relation $R$ that relates the parenthetical element to some other label standing for some other discourse constituent. However, the compositional semantics of the parenthetical does not by itself specify either what $R$ or the other discourse constituent is. I will use the following representation for the SDRS for (9).

(10)
    $\pi$: $[\, m, X, p \mid \mathit{assures}(m, X, p) \,]$
    $R(v, \pi)$, $R = ?$, $v = ?$
    $\pi'$: $[\, j \mid j \text{ can be trusted} \,]$
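The view of an SDRS as a set of labels paired with a function from labels to contents can be sketched in code. The following is only an illustrative toy, not part of SDRT's formal definition: the class names `DRS`, `Attachment` and `SDRS`, the `bindings` field, and the helper `resolve` are all hypothetical, and DRS conditions are stored as plain strings. It builds the representation for (9) with the relation, the attachment point and the argument $p$ initially unspecified, and then fills them in, assuming the resolution is to the main clause with the relation Evidence.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class DRS:
    """Toy DRS: a universe of referents and a list of conditions (as strings)."""
    universe: List[str]
    conditions: List[str]


@dataclass
class Attachment:
    """Underspecified link R(v, source): R and v start out unknown (None)."""
    source: str                      # label of the parenthetical constituent
    relation: Optional[str] = None   # corresponds to R = ?
    target: Optional[str] = None     # corresponds to v = ?


@dataclass
class SDRS:
    """Labels are the keys of `content`; `content` plays the role of the
    function from labels to formulas. `bindings` records resolved anaphoric
    arguments such as p (a hypothetical bookkeeping device)."""
    content: Dict[str, DRS] = field(default_factory=dict)
    attachments: List[Attachment] = field(default_factory=list)
    bindings: Dict[str, str] = field(default_factory=dict)

    def labels(self) -> Set[str]:
        return set(self.content)


def resolve(sdrs: SDRS, att: Attachment, host: str, relation: str, argument: str) -> None:
    """Resolve v to the host label, R to the given relation, and the
    anaphoric argument to the host's content."""
    if host not in sdrs.content:
        raise KeyError(f"unknown label: {host}")
    att.target = host
    att.relation = relation
    sdrs.bindings[argument] = host


def build_example_9() -> Tuple[SDRS, Attachment]:
    """Representation of 'John, Mary assures us, can be trusted.'"""
    sdrs = SDRS()
    sdrs.content["pi"] = DRS(["m", "X", "p"], ["assures(m, X, p)"])
    sdrs.content["pi_prime"] = DRS(["j"], ["can_be_trusted(j)"])
    att = Attachment(source="pi")          # R = ?, v = ?
    sdrs.attachments.append(att)
    resolve(sdrs, att, host="pi_prime", relation="Evidence", argument="p")
    return sdrs, att
```

Keeping the attachment as a separate object with optional fields makes the underspecification explicit: before `resolve` is called, both the relation and the attachment point are simply absent.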
In (10) $\pi$ labels the constituent derived from the parenthetical material, here represented as a DRS; $p$ stands for the object of assure that has yet to be specified; $R$ stands for a yet to be specified discourse relation; and $v$ is the label of a yet to be specified attachment point for the parenthetical constituent. Compositional semantics gets us a logical form for the assertion minus the parenthetical, which I will label with $\pi'$. The rule for processing parentheticals resolves the underspecified condition $v = ?$ to $v = \pi'$, while the resolution of the underspecified conditions $p = ?$ and $R = ?$ is left to independent processes of anaphora resolution and SDRT's computation of discourse relations respectively. In this case, there is only one proposition in the context to identify with $p$: the proposition that John can be trusted. And the relation $R$ can be specified, given the content of the parenthetical, to Evidence. On this analysis, we have a pretty straightforward truth conditional analysis of parentheticals.

My account is slightly more complicated than one might think necessary. Why must we, one might argue, attach the parenthetical with a discourse relation to some constituent? Could we not just assume that $p$ is always identified with the entire surrounding assertion, thus reviving the syntactic analysis at the level of logical form? On this simpler proposal, we would predict (9) to be equivalent to:

(11) Mary assures us that John can be trusted.

This account could be used to get wide scope readings of the parenthetical material for the embedding test sentences. Thus, something like (2c) would be equivalent to:

(12) I'm warning you that if you cross the line, I'll hit you.

This proposal is very similar to the syntactic proposal of the previous section, but it escapes the syntactic criticisms of the earlier view. On the other hand, the equivalences this account predicts do not explain the differences in discourse behavior exemplified in (6a,b). This view also predicts incorrect truth conditional equivalences. Consider:

(13) a. Mary assures us that John can be trusted, but I don't trust him.
     b. John, Mary assures us, can be trusted, but I don't trust him.

Informants find (13b) odd. And the reason, I think, is that certain parentheticals that use evidential verbs like assure, swear, testify and affirm prefer attachment with the relation Evidence, which affects the speaker's commitment to the constituent to which the parenthetical attaches. To be more precise, the Evidence relation is what one might call a veridical relation; if a speaker is committed to $\mathit{Evidence}(\pi_1, \pi_2)$, then he is also committed to the truth of the contents associated with the labels $\pi_1$ and $\pi_2$.
More formally, where ; is interpreted as dynamic conjunction and where $K_\pi$ represents the DRS associated with the label $\pi$, we have:

(14) $\mathit{Evidence}(\pi, \pi_1) \rightarrow (K_\pi ; K_{\pi_1})$

With this constraint it is evident that (13b) becomes inconsistent. The simple non-discourse based story cannot make such a difference between (13a) and (13b). Further, other parentheticals that induce other discourse relations do not give rise to the sort of difference observed in (13a) and (13b). Consider, for instance, the parenthetical in (15a) and its non-parenthetical counterpart in (15b):

(15) a. Mary, we supposed, could not be trusted. But we were wrong. She is completely trustworthy.
     b. We supposed that Mary could not be trusted. But we were wrong. She is completely trustworthy.

There is a non-veridical relation between the parenthetical in (15a) and the main clause; the fact that we supposed Mary could not be trusted gives as a reasonable but defeasible consequence that she was not trustworthy, something that SDRT models with the relation Defeasible Consequence, according to which the fact that Mary is not trustworthy cannot be inferred if this is inconsistent with the information given (see Asher & Lascarides 1998b for details). In fact the defeasible consequence that Mary is not trustworthy is blocked in (15a). So Defeasible Consequence is non-veridical, because it does not entail the truth of the formulas associated with the labels that are its terms. In fact, it is the only non-veridical discourse relation for monologue. Thus, provided we infer a non-veridical rhetorical relation between the parenthetical and its attachment point, (15a) is predicted to be acceptable and there is no difference of acceptability between (15a) and (15b).

A more adequate treatment of evidential parentheticals has to add an additional parameter to the evidential relation. Evidence is always evidence for someone, and the cases we have considered so far only cover those where Evidence is Evidence for the speaker. Here is an example where the Evidence relation holds for the agent Paul in the example but not, presumably, for the speaker.

(16) John, Mary assured Paul, could be trusted. So Paul gave him his apartment while he went on vacation. When he came back, he found that the apartment had been ransacked.

• Constraint on Evidence for $S$: $\mathit{Evidence\text{-}for}_S(\pi, \pi_1) \rightarrow$ $S$ believes that $(K_\pi ; K_{\pi_1})$
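The contrast between (13a), (13b) and (15a) can be illustrated with a small consistency check. This is a rough sketch under strong simplifying assumptions that are mine rather than the paper's: propositions are atomic strings with a polarity, dynamic conjunction is approximated by set union, "I don't trust him" is encoded as the negation of "John can be trusted", and the names `commitments` and `consistent` are hypothetical. The only facts taken from the text are that Evidence and Commentary are veridical and Defeasible Consequence is not.

```python
from typing import Iterable, Set, Tuple

Literal = Tuple[str, bool]  # (atomic proposition, polarity)

# Classification taken from the text; everything else is an assumption.
VERIDICAL = {"Evidence", "Commentary"}
NON_VERIDICAL = {"Defeasible Consequence"}


def commitments(relation: str, left: Iterable[Literal], right: Iterable[Literal]) -> Set[Literal]:
    """Literals the speaker is committed to by asserting relation(left, right).

    A veridical relation commits the speaker to both constituents;
    a non-veridical one commits the speaker to neither.
    """
    if relation in VERIDICAL:
        return set(left) | set(right)
    if relation in NON_VERIDICAL:
        return set()
    raise ValueError(f"unclassified relation: {relation}")


def consistent(literals: Set[Literal]) -> bool:
    """True unless some proposition occurs with both polarities."""
    return not any((atom, not polarity) in literals for atom, polarity in literals)


assures: Literal = ("assures(m, trusted(j))", True)
trusted_j: Literal = ("trusted(j)", True)
not_trusted_j: Literal = ("trusted(j)", False)

# (13a): complement clause, so the speaker commits only to the assuring.
c13a = {assures, not_trusted_j}

# (13b): parenthetical attached with Evidence, which is veridical.
c13b = commitments("Evidence", [assures], [trusted_j]) | {not_trusted_j}

# (15a): parenthetical attached with Defeasible Consequence, non-veridical.
supposed: Literal = ("supposed(we, not trusted(m))", True)
c15a = commitments("Defeasible Consequence", [supposed], [("trusted(m)", False)]) | {("trusted(m)", True)}

print(consistent(c13a), consistent(c13b), consistent(c15a))
```

Under these assumptions only the commitments for (13b) contain a clash, which matches the judgment that (13b) is odd while (13a) and (15a) are acceptable.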
I will use this parametrized version of the Evidence relation below.

Discourse factors also explain the unacceptability of (6a). Let us suppose that we have an SDRS for the first sentence such that the parenthetical ($\pi$) is evidence for the main clause ($\pi_1$). Thus, we have $\mathit{Evidence}(\pi, \pi_1)$. SDRT's principles of discourse attachment say that we can only attach new information to $\pi_1$ or to the label for the entire SDRS; $\pi$ is not on the right frontier of the graph of this SDRS and so is not accessible (see Asher 1993 for details). Yet the question Does she? in (6a) should attach to $\pi$, as it is intuitively only Mary's assuring that is being questioned. The clash between the attachment constraint and the evidence discourse function of the question thus accounts for the oddity of (6a).

A final difference between the simpler account and my discourse based account is that mine predicts better when some parentheticals do not work. Consider:

(17) a. Please leave, I'm begging you.
     b. You must leave, I'm begging you.
     c. ?You must be tired, I'm begging you.
     d. ?The party must be over, I'm begging you.

A purely semantic (or syntactic) account must either accept (17b,c,d) or reject them all. The verbs of these parentheticals have the same sort of semantic object, which, I argued in Asher (1993), is not what is given by the indicative main clause in these examples. But because on this account parentheticals have an anaphorically specified argument, we can speculate that some coercion or bridging inferences are allowed. Bridging inferences are typically subject to certain rhetorical constraints (Asher & Lascarides 1998a), and we see this in evidence here. A clear rhetorical connection for the parenthetical helps us find the appropriate anaphoric object. For instance, (17b) is perfectly fine; there is a clear rhetorical relation between begging someone to leave and the person's having or being obligated to leave. Being obliged to leave has as a natural result the action of leaving; there is a natural Narrative link between the two. For (17c), however, the rhetorical relation is not at all clear, nor again for (17d). Or at least you would need a particular context where someone's begging you to be tired obliges you to be tired and where the concept of being obliged to be tired makes sense. A story that does not take rhetorical relations seriously cannot account for the differences between (17b) and (17c) in any way, as far as I can see.

3.2 Attachments and parentheticals in complex assertions
The discourse based account of parentheticals just sketched leaves open exactly what an appropriate attachment point for the parenthetical is. In the
simple example (9) above, the entire assertion served as attachment point. What about more complex sentences? On the one hand, we have wide scope readings of parentheticals.

(18) a. Only if, I fear, we work like dogs, will we be able to save this company.
     b. Even if, I hear, the reception's over, we'll still be able to get something to eat.
     c. If, I'm warning you, you cross that line, I'll hit you.

These examples all seem to invite at least the reading on which the parenthetical's anaphoric argument $p$ is identified with the entire assertion surrounding the parenthetical. On the other hand, we have examples in which a parenthetical or discourse adverbial has scope over the consequent, at least according to one informant.

(19) If the party, unfortunately, is over, then we have to go somewhere else to get a drink.

For this informant, the most salient interpretation of (19) is that if the party is over, then we should find somewhere else to get a drink and it is unfortunate that we should find somewhere else to get a drink. This attachment is one that is quite different from a presupposition, since the parenthetical is attached and resolved to some non-accessible element in the discourse structure (following van der Sandt). But it is a matter of cataphoric resolution of an underspecification, which should be admissible in so far as other cataphoric links are admissible.

On the other hand, many people find it easy to attach parenthetical material or a discourse adverbial surrounded by the antecedent of a conditional to that antecedent. The following three examples show a diversity of interpretations, however.

(20) a. If the party, as Mary assures us, is over, then we should find somewhere else to get a drink.
     b. If the party, unfortunately, is over, then we should find somewhere else to get a drink.
     c. If the party, unfortunately, is over, then we should go home.

In (20a) the discourse adverbial takes the antecedent of the main assertion in its scope and attaches with Background to the conditional. The most salient reading of (20a) is: Mary assures us that the party is over, and if the party is over, then we should find somewhere else to get a drink. (20b) gives rise to two interpretations: (i) it is unfortunate that if the party is over, then we should find somewhere else to get a drink; (ii) if the party is over, then that
40 Truth Conditional Discourse Semantics for Parentheticals
is unfortunate and if the party is over we should find somewhere else to get a drink The first reading is a straightforward application of the account given so far, but what about the second? In fact the second seems more plausible. To account for these various readings, we need to appeal in greater detail to the nature of discourse relations and attachment points in SDRT. The wide scope reading of (18a) can be understood as a Commentary on the assertion itself On this reading, (18a) is equivalent to:
This reading follows from our decision to treat the parenthetical as a real discourse constituent. There is perhaps also a narrower scope reading for (18a) where the Commentary extends only over the antecedent of the conditional; more on this below. The other two examples of wide scope readings generate relations other than Commentary in the attachment reasoning process. The parenthetical I hear generates an Evidence relation in the wide scope reading for (18b) and is analyzed in a manner similar to (18a). For (18c), we need to ask: how does the warning that if you cross that line I will hit you relate to the assertion that if you cross that line I will hit you? The warning, it seems, has the assertion as a Result. Warnings are factive! The attachment possibilities for the parenthetical in these examples reflect the resolution of the anaphoric element. In fact in all of the examples (18), (19) and (20), the attachment point and the constituent identified as the complement of the parenthetical or discourse adverbial coincide. Why should this be? In general it is because of our relations Commentary and Evidence. In SDRT attachments are decided so as to maximize discourse coherence. This principle, Maximize Discourse Coherence, can be stated informally:
o Maximize Discourse Coherence: In updating a discourse context τ with new information φ, resolve all those underspecifications not resolved by the choice of a discourse relation or by constraints on discourse relations, lexical choice and logical form so as to produce an update that is τ, φ maximal. A τ, φ maximal update is one in which a maximal number of underspecified elements have been resolved and in which each discourse relation in the structure is as coherent as it can be.
Underlying this principle is the
1 Commentary is a discourse relation in SDRT. For details, see Asher (1993).
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
(21) Only if we work like dogs, will we be able to save this company and, as a Commentary, I fear that only if we work like dogs, will we be able to save this company.
Nicholas Asher 41
2 There may also be other elements at work, like the position of the parenthetical and the intonational contour used with it. Further, negative Commentaries like fear or unfortunately will be less coherent as a rule when they are attached to what are seen as positive outcomes (e.g. turning the company around), and so such attachments will be in general dispreferred. An analysis of these lexical and intonational factors I leave for another time.
idea that many discourse relations are scalar, e.g. Parallel, Contrast, Explanation, and, especially for our purposes, Commentary and Evidence. These relations are maximally strong when we give evidence for or commentary on the whole of the constituent that is the attachment point. And this comes about precisely when we resolve the anaphoric element p in the representation of the parenthetical to the constituent that is the attachment site. This leaves open the question as to why the attachments and anaphoric resolutions are chosen the way they are. Maximize Discourse Coherence and other discourse factors influence these choices as well.2 In analyzing various attachment possibilities, let us turn first to some examples of narrow scope or local attachments in (20), specifically (20b-c), and their more salient second readings. There is a conflict between the status the conditional confers on the proposition that the party is over and the way the adverbial together with its resolved argument gets attached to the discourse context. On my account, the adverbial gives rise to a labeled DRS K_π containing the conditions Unfortunate(p) and p = ?; and on the reading we are interested in here, p is identified with the proposition K_π₁ expressed by the antecedent of the conditional, in which the parenthetical is embedded and which I will label π₁. The natural, albeit defeasible, resolution of the underspecified discourse relation between π and its attachment point π₁ is Commentary, as is suggested by the adverbial itself. But Commentary, like many discourse relations in SDRT, is veridical; i.e. Commentary(π₁, π₂) → (K_π₁ ; K_π₂), where K_π is the constituent DRS or SDRS associated with the label π and ';' represents again dynamic conjunction. Further, Unfortunate(p) generates the presupposition that the proposition identified with p is true.
So on this way of interpreting the adverbial, we imply that the party in fact is over, but this conflicts with the dependence of this constituent on the antecedent of the conditional. In SDRT the presence of a conditional operator signals a discourse relation between labels for the conditional's antecedent and consequent. This is again the relation Defeasible Consequence that I introduced earlier, and it is non-veridical. Now there is a pragmatic conflict generated by two modes of attachment for the discourse constituent π in (20b-c). The attachment of π₁ via Commentary signals a speaker's commitment to the truth of the related constituents and hence to K_π₁, whereas the attachment of π₁ to the
42 Truth Conditional Discourse Semantics for Parentheticals
consequent of the conditional via Defeasible Consequence signals that the speaker is not committed to the truth of the constituent. The same analysis holds if we try to resolve p to the consequent of the conditional in these examples, which does not seem to be a very salient reading here but is for
(22) When a preferred veridical relation cannot be used because of conflict with the non-veridical, conditional status of the chosen attachment point π, attach with Defeasible Consequence to π.
This generalization accounts for the conditional reading of the parenthetical in (20b-c). Now why does (20a) lack a conditional reading? The main verb in the parenthetical and its subject in this example indicate that the relation to be used for attachment is Evidence with the agent parameter filled in by a group that includes the speaker. Our parametrized and more sophisticated version of the Evidence relation, while not veridical, can entail a doxastic commitment by the speaker that would conflict with the context as in (13b). But the constituent for which the parenthetical provides evidence in (20a) as opposed to (13b) is irrealis, so the doxastic commitment by the speaker is rendered moot and there is no conflict between the implications of the Evidence relation and the non-veridicality of the attachment point. Further, although the parenthetical attaches to the antecedent, the adverbial does not generate a presupposition that its propositional object is true, as happens in (20b-c). Hence, there is not any conflict in (20a) between the nonfactual status of the proposition that the party is over that comes from the conditional and the factual status of the same proposition that comes from the adverbial. So there is no need for the SDRT construction procedure to override the default attachment of the constituent constructed from the adverbial to the outside context. So our analysis can also predict that (20a) lacks a conditional reading.
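The generalization in (22) can be read as a small decision rule. The following Python sketch is only an illustration under stated assumptions: the names VERIDICAL and choose_relation and the boolean flag are hypothetical, introduced here for exposition, and the veridicality values simply transcribe what the text says about Commentary, the parametrized Evidence relation, and Defeasible Consequence. It is not part of the SDRT construction procedure itself.

```python
# Illustrative sketch of generalization (22); all names are hypothetical.
# Veridicality as stated in the text: Commentary is veridical, while the
# parametrized Evidence relation and Defeasible Consequence are not.
VERIDICAL = {
    "Commentary": True,
    "Evidence": False,
    "DefeasibleConsequence": False,
}


def choose_relation(preferred, attachment_point_is_conditional):
    """Pick the relation used to attach a parenthetical.

    preferred: the default relation suggested by the parenthetical itself.
    attachment_point_is_conditional: True when the attachment point has
        non-veridical, conditional status (e.g. a conditional antecedent).
    """
    if VERIDICAL[preferred] and attachment_point_is_conditional:
        # (22): the veridical default clashes with the conditional status
        # of the attachment point, so fall back to Defeasible Consequence.
        return "DefeasibleConsequence"
    return preferred


# (20b-c): 'unfortunately' prefers Commentary but sits in an antecedent.
relation_20bc = choose_relation("Commentary", True)
# (20a): Evidence is not veridical, so nothing needs to be overridden.
relation_20a = choose_relation("Evidence", True)
```

On this sketch, (20b-c) receive the conditional attachment while (20a) keeps its default Evidence attachment, which is the contrast the generalization is meant to capture.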
How then can we attach the expression of the author's opinion to the antecedent in (20b-c)? SDRT resolves this conflict by overriding the defeasible preference, triggered by this discourse adverbial, for attaching K_π to the context with Commentary, in favor of attaching it with the one non-veridical rhetorical relation for monologue, Defeasible Consequence, in which the suppositional, non-established character of the antecedent of (19) is preserved. On this way of attaching K_π, the presupposition generated by the adverbial can also be bound. In fact, we can bring forward the generalization stated in (22).
3.3 Scopes of Discourse Relations and Continuing Discourse Patterns
My analysis of the parentheticals in (20) is part of a more general strategy for resolving conflicts between veridical and non-veridical discourse relations. Consider examples like the following that are related to the phenomenon of modal subordination noticed by Roberts (1987).
(23) a. If a shepherd goes to the mountains (π₁), he normally brings his dog (π₂). He brings a good walking stick too (π₃).
b. If the children got a chess set from that store (π₁), it probably came with a spare pawn (π₂). Then it rolled off the table (π₃).
Both (23a) and (23b) exhibit in the first sentence some sort of conditional which generates the non-veridical relation Defeasible Consequence between constituents. In both examples (π₃) must be attached to the consequent of the conditionals if the anaphoric pronoun is to get an antecedent. But there is a potential conflict between the entailment of the veridical relation (Parallel(π₂, π₃) in (23a), forced by the particle too, and Narration(π₂, π₃) in (23b), inferred from the presence of then) and the rhetorical point of the non-veridical conditional. That is, if the truth of K_π₂ and K_π₃ is entailed by the veridical relation, then why is the truth of K_π₂ asserted only relative to some supposition from which it follows only defeasibly in the first sentence? This is not an outright contradiction, but it makes the point of asserting the conditional very unclear; indeed we might say that the discourse is pragmatically incoherent. In (23a), although both veridical and non-veridical relations are involved, the attachment of the parenthetical is unproblematic. This is because the scopes of the respective discourse relations remove the clash between veridical and non-veridical relations. The scope of the non-veridical conditional relation is over both constituents (π₂) and (π₃) and the Parallel relation. Maximize Discourse Coherence will force us in effect to link (π₂) and (π₃) together to form a new constituent that becomes the consequent of the conditional. This is a coherent discourse structure and there is no clash between veridical and non-veridical relations. But there must be constraints on when such new constituents can be formed, because Maximize Discourse Coherence does not allow us to form them in all cases, viz. (23b). Given a situation where we have R(α, β) and R is non-veridical, we will be able to attach γ to β via some veridical relation R′ only if γ also bears R to α. This leaves open the possibility of attaching to some parent of α and β, e.g. some constituent that contains both. In earlier work on larger-scale patterns of discourse structure, colleagues and I argued that two constituents could be attached together in a subordinate structure to a third
one only if the first two bore the same relations to the third; we called this constraint Continuing Discourse Patterns. Here we can motivate a similar principle about veridical and non-veridical relations from Maximize Discourse Coherence. The constraint below, mostly informal, nevertheless exploits the SDRT notation (τ, α, β), which represents the attachment of a constituent β to α in the discourse context τ.3
This constraint makes conceptual sense: if you want a non-veridical relation to have scope over a veridical one, you had better make sure that both terms of the veridical relation are also within the scope of the non-veridical relation. A discourse structure in which this constraint does not hold will be far from maximally coherent. So one can see how Maximize Discourse Coherence would lead naturally to a constraint like Continuing Discourse Patterns. I have referred to Continuing Discourse Patterns as a constraint only on attachments involving β. But Maximize Discourse Coherence dictates a similar constraint for any attachments to α where a non-veridical relation R holds of α and β. And it is just such a constraint from which we can derive (22) about parentheticals in conditional contexts, namely that in attaching a parenthetical to the antecedent of a conditional, one must use the Conditional relation. For suppose that one wishes to attach new information (labeled by π₂) to an antecedent (labeled say by π₁) of a conditional. In order to satisfy Continuing Discourse Patterns, we must attach π₂ to π₁ with a non-veridical relation or we must be able to infer that π₂ also can bear the Defeasible Consequence relation to whatever π₁'s consequent is. As Defeasible Consequence is the only non-veridical relation in monologue unless we consider repairs, it is the only non-veridical relation with which to attach π₂. One question that we have not yet answered is, why do parentheticals in
3 In Asher (1993), this constraint was built into the much more complicated SDRS update definition given there. In earlier work also Continuing Discourse Patterns looked like the contrapositive of the constraint here and was a 'hard' constraint. The restricted version of Continuing Discourse Patterns here is defeasible and for technical reasons we have to use the slightly more complex contrapositive form, ¬A > ¬B, which is not equivalent in nonmonotonic logic to B > A.
o Continuing Discourse Patterns for non-veridical relations:
o Suppose from the discourse context one can defeasibly infer that a non-veridical relation R holds between α and β, i.e. R(α, β). Then if
1. one cannot defeasibly infer from the context (τ, α, γ) that R(α, γ), and
2. one cannot defeasibly infer from (τ, β, γ) and the context that R′(β, γ), where R′ is non-veridical,
o then normally ¬(τ, β, γ).
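The two clauses of the constraint can be checked mechanically once defeasible inference is replaced by something simpler. The Python sketch below is a toy rendering under strong assumptions: defeasible inference is stood in for by lookup in a hand-written set of relation triples, the "normally" of the conclusion is not modeled, and every name (NON_VERIDICAL, can_infer, attachment_blocked, the labels pi1 to pi3) is hypothetical. The two fact sets encode what the text says about (23a) and (23b).

```python
# Illustrative sketch of Continuing Discourse Patterns; names hypothetical.
# Defeasible Consequence is the only non-veridical relation considered,
# following the text's remark about monologue.
NON_VERIDICAL = {"DefeasibleConsequence"}


def can_infer(facts, relation, left, right):
    """Stand-in for defeasible inference: a lookup in a set of triples."""
    return (relation, left, right) in facts


def attachment_blocked(facts, alpha, beta, gamma):
    """True when the default constraint blocks attaching gamma to beta."""
    for r in NON_VERIDICAL:
        if not can_infer(facts, r, alpha, beta):
            continue  # the constraint only applies when R(alpha, beta) holds
        # Clause 1: R(alpha, gamma) cannot be inferred.
        clause_1 = not can_infer(facts, r, alpha, gamma)
        # Clause 2: no non-veridical R'(beta, gamma) can be inferred.
        clause_2 = not any(
            can_infer(facts, r2, beta, gamma) for r2 in NON_VERIDICAL
        )
        if clause_1 and clause_2:
            return True
    return False


# (23a): the conditional relation extends to pi3, and 'too' gives Parallel.
facts_23a = {
    ("DefeasibleConsequence", "pi1", "pi2"),
    ("DefeasibleConsequence", "pi1", "pi3"),
    ("Parallel", "pi2", "pi3"),
}
# (23b): 'then' forces Narration and blocks the conditional inference.
facts_23b = {
    ("DefeasibleConsequence", "pi1", "pi2"),
    ("Narration", "pi2", "pi3"),
}
blocked_23a = attachment_blocked(facts_23a, "pi1", "pi2", "pi3")
blocked_23b = attachment_blocked(facts_23b, "pi1", "pi2", "pi3")
```

Under these assumptions the constraint does not fire for (23a) and does fire for (23b), so only the latter attachment is blocked.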
conditionals apparently never have the scopes predicted by the simple account, as the embedding test shows? Could we not in fact have a situation of the following schematic sort: we have a conditional if π₁, then π₃, and a parenthetical π₂ which is attached to π₁ via some veridical relation R but within the scope of the conditional? Thus, we would have something like the following SDRS:
Note that this situation could only occur if we can derive Defeasible Consequence(π₂, π₃). But more is at stake here. This sort of attachment will not, for many sorts of conditionals like counterfactuals and even normal indicative conditionals, allow us to recover from any constituent what was asserted, namely that K_π₁ > K_π₃. A fundamental principle seems to be that while the addition of parenthetical information can change the discourse context and even the veridical status of the attachment point, it cannot make the information in the attachment point unrecoverable. To make this more precise, we need to recall the general definition of a discourse context as a pair consisting of a set A of labels and a function ℱ that maps elements of A into formulas representing the content of the labeled constituents. More precisely,
o Asserted content must be recoverable.
o Suppose that prior to attachment of parenthetical information α, we have asserted content φ. Then after integrating α into the discourse context τ to get a context τ′, it must be the case that for some label α ∈ τ′, K_α → φ.
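The recoverability requirement can be illustrated with a toy check. The sketch below makes several simplifying assumptions that are not in the text: a discourse context is a plain dictionary from labels to formulas, formulas are Python functions over truth-value assignments, entailment is classical and computed by truth tables, and material implication stands in for the defeasible conditional. The names entails, recoverable, and the atoms p (the party is over), q (we go elsewhere), and u (this is unfortunate) are all hypothetical.

```python
from itertools import product


def entails(premise, conclusion, atoms):
    """Classical propositional entailment by truth-table enumeration."""
    for values in product([False, True], repeat=len(atoms)):
        valuation = dict(zip(atoms, values))
        if premise(valuation) and not conclusion(valuation):
            return False
    return True


def recoverable(context, phi, atoms):
    """True if some labeled constituent in the context entails phi."""
    return any(entails(k, phi, atoms) for k in context.values())


ATOMS = ["p", "q", "u"]

# Asserted content before the parenthetical is integrated: p -> q.
# (Material implication is a simplification of the defeasible conditional.)
phi = lambda v: (not v["p"]) or v["q"]

# Update 1: the parenthetical is folded into the antecedent by a veridical
# relation, leaving only (p and u) -> q; p -> q can no longer be recovered.
context_folded = {
    "pi0": lambda v: (not (v["p"] and v["u"])) or v["q"],
}

# Update 2: the original conditional survives as its own constituent and
# the parenthetical contributes a separate conditional p -> u.
context_separate = {
    "pi0": lambda v: (not v["p"]) or v["q"],
    "pi2": lambda v: (not v["p"]) or v["u"],
}

folded_ok = recoverable(context_folded, phi, ATOMS)
separate_ok = recoverable(context_separate, phi, ATOMS)
```

In this toy setting only the second update keeps the asserted conditional recoverable, which is the kind of contrast the principle is meant to enforce.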
The attachment of π₂ to π₁ would make the original asserted content unrecoverable in the updated discourse structure. So if we adopt this principle about parenthetically used information, we predict that we cannot attach a parenthetical to the antecedent of a conditional by a relation other than Defeasible Consequence. Continuing Discourse Patterns also makes sense of those examples in which we have attachment to the second term of a non-veridical relation, e.g. to the consequents of conditionals. In (23a) Continuing
[SDRS diagram; surviving condition: Defeasible Consequence(π, π₃)]
party's over, and it is unfortunate that we have to go somewhere else to get a drink.
But in effect the deduction is straightforward. The context already gives us the conditional that if the party is over, we'll have to go somewhere else to get a drink (A > B), while world knowledge should yield that if we have to go somewhere else that's unfortunate (B > C). Since indicative conditionals are at least defeasibly closed under transitivity (i.e., from A > B and B > C one may defeasibly infer A > C), we can deduce Defeasible Consequence between the antecedent and the content of the parenthetical in (19). So we can conclude that the relevant relation, Defeasible Consequence, holds between the antecedent and the parenthetical. Thus, our constraint of Continuing Discourse Patterns will not fire, and we can felicitously attach the parenthetical to the consequent in a maximally coherent discourse. There are some apparent counterexamples to the application of Maximize Discourse Coherence that relies on Continuing Discourse Patterns. Consider the following (brought to my attention by Frank Veltman): (25) a. If, as we have just learned, Kim has made an offer, we don't stand a chance. b. If, as we now know, Kim has made an offer, we don't stand a chance. Here the author invites his audience to do a simple modus ponens. But note this is not really a conflict involving attachment. We might readily attach the parenthetical to the constituent formed by the entire previous sentence
Discourse Patterns does not fire; in addition to the veridical relation Parallel between π₂ and π₃, we can establish given (τ, π₁, π₂) a non-veridical conditional relation between π₁ and π₃. In (23b), however, we cannot make the inference to establish that π₃ can be related via the conditional relation to π₁. The presence of the adverbial then breaks this inference. But it also forces us to infer a veridical discourse relation, Narration, with which we must attach π₃. This means then that both conjuncts of Continuing Discourse Patterns are satisfied and so we cannot attach π₃ to π₂. The reason the discourse is odd is that if we do not attach π₃ to π₂, then we cannot find, according to the DRT and SDRT constraints of accessibility and availability of anaphoric antecedents, an antecedent for the pronoun in π₃. An example similar to (23a) that involves parentheticals is (19). There is at least one reading, speakers report, on which the parenthetical can apparently attach to the consequent. In this example, the underspecified element p resolves to the consequent of the conditional and we can then specify R to Commentary and attach the parenthetical to the consequent itself. How can this be according to our constraints? Well, this is allowed only if we can deduce Defeasible Consequence between the antecedent, the
4 FURTHER ISSUES
There seems to be a division in English between parentheticals that happily undergo subject-verb inversion and those that do not.
(26) The economy is no longer growing, reports the chief economist for Citycorp.
(27) ? Please leave, beg you I.
while nevertheless identifying the underspecified object argument of the propositional attitude verbs with the antecedent. There is a conflict here between the presupposed status of the complement in the parenthetical when it is attached with a veridical relation and its suppositional status as the antecedent to a conditional in the two examples. But the conflict is eliminated by making the inference to a suppressed conclusion. This type of structure is a compressed way of getting from accepted information to a new and perhaps unwelcome conclusion. One should note that the examples in (25) rely on the particular parentheticals used. The verbs in those parentheticals are both factive and epistemic and this seems essential to our intuitions that that sort of attachment pattern is possible. Other parentheticals as in (20) do not allow for that sort of attachment. Because Continuing Discourse Patterns is a default constraint on attachment, we can admit such specialized rhetorical patterns without inconsistency in this analysis. Default constraints sometimes are overridden in more specialized contexts, like those in which factive and epistemic parentheticals are present. Continuing Discourse Patterns is a constraint on attachment. But there are others, as I intimated earlier. Interestingly, when we compare (19) to (20b), we see the attachment preference is determined by the choice of modal: 'unfortunately have to' and 'unfortunately must' sound fine, whereas 'unfortunately should' sounds less good. Perhaps it is that the deontic should is not something that can be regretted, though vaguer doxastic or epistemic modalities like must can easily be regretted. More to the point, if something is to be regretted within a conditional, then normally one might think it is the consequences, unless those consequences are a matter of conditional obligation. Then one should regret the triggering cause or occasion of those obligations.
For parentheticals then these observations set up a preference ordering for attaching parentheticals within conditionals not so unfamiliar from presuppositions: attach as low as is consistent with τ, φ maximality. In particular the choice of modal affects where a Commentary can be attached.
(28) a. (i) A: Does anyone know any of the applicants?
(ii) B: I know one, Piell.
(iii) A: How do you spell him?
(iv) B: P-I-E-L-L.
(v) A: OK.
(vi) B: If it's the same man, I haven't read his file yet.
b. being good means doing nothing you wouldn't want me to do as well as doing nothing we would want to do, if you see what I mean.
c. The fault, if it's a fault, is to be found in the system.
d. The story, if it may so be termed, is weak and loose.
e. Piggies, if you remember Lord of the Flies, . . .
f. If you're hungry there's food in the fridge.
g. If you don't mind the word, he's a bully.
h. (i) A: they thought there was something structurally wrong with it, the rear wall if you remember,
(ii) B: which you had taken down?
(iii) A: Yeah.
The analysis of these examples broadly follows the account already laid out for parentheticals. The antecedents of these conditionals typically do not give propositions upon which the contents of the main clauses are truth conditionally dependent. Rather, like parentheticals, they take a wider scope. Many of these provide assumptions upon which the rhetorical functions of the main assertions depend. For this reason, the conditional antecedent takes scope over a particular rhetorical relation with which the main clause material is attached. Take, for instance, the first example (28a).
Informants report that in some languages like German and French the inversion of saying verbs is obligatory. In Spanish and Portuguese inversion is the default. In English it seems as though the inversion is largely stylistic. An informal survey of Reuters news articles reveals many parentheticals both in normal and in inverted order. On the other hand, there do seem to be syntactic constraints in German and French. This may even suggest that some elements really are extraposed and thus are distinct from the parentheticals analyzed here; that is, they simply are a syntactic rearrangement of one complex constituent rather than two distinct constituents. The syntactic extraposition cases seem, however, relegated largely to verbs of saying, for it is with these verbs that we see the inversions. Another issue concerns extensions of this work. There are some other examples of scope-escaping elements that, though syntactically distinct from parentheticals and discourse adverbials, have similar semantic and pragmatic properties. These are the so-called pragmatic conditionals, studied, for instance, by Haegeman (1984). I give some examples below.
5 CONCLUSION
I have argued for the following. Parentheticals and discourse adverbials attach with discourse relations to the assertions or components thereof in which they are introduced. But their scopes are determined by a variety of intricate factors like Maximize Discourse Coherence and the resolution of conflicts between discourse relations. Parentheticals do not have the simple analysis presupposed in the embedding test, because they are distinct discourse constituents and must be attached via some discourse relation that interacts with the conditional. In this way, my account can explain
B's response to A's question that he knows one of the candidates is an answer to that question, provided that the claim in the conditional is correct, i.e. provided that the Piell he knows is the one who applied. Thus, the attachment of the conditional is much higher up in the discourse structure than the material surrounding the conditional. The conditional has scope over the discourse constituent containing the two question-answer pairs (28a.i-iv) given earlier in the example. Similarly, we might take the conditional if you see what I mean in (28b) as a conditional taking wide scope over a question-answer pair relation; the conditional says that if you understand what I'm getting at, then this will be an answer to the background question about what it is to remain faithful (see Haegeman 1984 for discussion). Some of the other examples are less clear, for instance (28d). In that example, there's food in the fridge is a background premise to the unstated conclusion that the addressee can eat some of the food if he's hungry. Once again the attachment of the antecedent depends not on sentential syntax, however, but on discourse coherence considerations. Haegeman analyzes these examples, along with parentheticals, using relevance theory. The antecedents of these conditionals are understood as facilitating access or otherwise enhancing the relevance of the material in the consequents. Haegeman's relevance theoretic analysis of pragmatic conditionals and parentheticals appears to be at a different level from the one proposed here; it tells us the speaker's goals behind the utterance of these conditionals. But it subscribes largely to the non-truth conditional view of these items, which it has been my principal aim to rebut. The SDRT analysis given here shows how parentheticals and pragmatic conditionals are just normal conditionals that make a truth conditional contribution to the content of the discourse.
The purported difference between these examples and other examples of conditional sentences lies in the way the nuclear scope of these pragmatic conditionals is determined.
away the phenomena noted by Wilson with the embedding test without endorsing the claim that parentheticals, discourse adverbials, or pragmatic conditionals have a 'non-truth conditional' semantics.
Acknowledgements
I would like to thank Robyn Carston, Lilliane Haegeman, Alex Lascarides, Frank Veltman, Deirdre Wilson, and an anonymous reviewer for Amstelogue for helpful comments and discussion.
REFERENCES
Asher, Nicholas (1993), Reference to Abstract Objects in Discourse, Kluwer Academic Publishers, Dordrecht.
Asher, Nicholas (1997), 'The logical foundations of discourse structure and interpretation', in Jesus Larrazabal, Daniel Lascar, & Grigori Mints (eds), Logic Colloquium 1996, Springer Verlag, Heidelberg, 1-45.
Asher, Nicholas & Lascarides, Alex (1998a), 'Bridging', Journal of Semantics, 15, 83-113.
Asher, Nicholas & Lascarides, Alex (1998b), 'The semantics and pragmatics of presupposition', Journal of Semantics, 15, 239-99.
Grice, Paul (1989), Studies in the Ways of Words, Harvard University Press, Cambridge.
Groenendijk, Jeroen & Stokhof, Martin (1984), 'Studies on the semantics of questions and the pragmatics of answers', unpublished Ph.D. thesis, Department of Philosophy, University of Amsterdam, Amsterdam.
Haegeman, Lilliane (1984), 'Interjections and Phrase Structure', Linguistics, 22, 41-9.
Haegeman, Lilliane (1991), 'Parenthetical adverbials: the radical orphanage approach', in Chiba et al. (eds), Aspects of Modern English Linguistics, Kotakushi, Tokyo, 232-54.
Hasegawa, Yoshi (1996), 'The (nonvacuous) semantics of TE-linkage in Japanese', Journal of Pragmatics, 25, 763-90.
Hintikka, Jaako (1974), Models for Modalities, North Holland Press, Dordrecht.
Lascarides, Alex & Asher, Nicholas (1993), 'Temporal interpretation: discourse relations and commonsense entailment', Linguistics and Philosophy, 16, 437-93.
Roberts, Craige (1987), 'Modal subordination, anaphora, and distributivity', unpublished Ph.D. thesis, University of Massachusetts at Amherst.
Sperber, Dan & Wilson, Deirdre (1995), Relevance, Blackwell, London.
van der Sandt, Rob (1992), 'Presupposition projection as anaphora resolution', Journal of Semantics, 9, 333-77.
Wilson, Deirdre (1975), Presuppositions and Non Truth Conditional Semantics, Academic Press, New York.
NICHOLAS ASHER Department of Philosophy 316 Waggener Hall University of Texas Austin, TX 78712, USA e-mail: [email protected]
Journal of Semantics 17: 51-89
© Oxford University Press 2000
Dialogue Acts, Synchronizing Units, and Anaphora Resolution MIRIAM ECKERT University of Pennsylvania MICHAEL STRUBE European Media Lab
Abstract
In this paper, we present the results of a corpus analysis, and a model of anaphora resolution in spontaneous spoken dialogues. The main finding of our corpus analysis is that less than half the pronouns and demonstratives have NP antecedents in the preceding text; 22% have sentential antecedents and the remainder have no identifiable linguistic antecedents. As part of the corpus analysis we present the results of inter-annotator agreement tests. These were carried out for the annotation of anaphor types and their antecedents, and for the segmentation of the dialogues into dialogue acts. The results of the inter-annotator agreement tests indicate that our classification method is reliable and that the annotated dialogues can be used as a standard against which to measure the performance of the anaphor resolution algorithm. The algorithm, based on Strube (1998), is capable of classifying pronouns and demonstratives, and co-indexing anaphors with NP and sentential antecedents. The domain from which potential antecedents for both individual and discourse-deictic anaphors can be elicited is defined in terms of dialogue acts. The dialogue segmentation method uses dialogue acts to form Synchronizing Units, which reflect the achievement of common ground (Stalnaker 1974, 1979). We show that predicate information, NP form, and dialogue structure can be successfully used in the resolution process.
1 INTRODUCTION
In this paper, we present a model for the resolution of pronouns and demonstratives in spontaneous spoken dialogue. In the semantic, syntactic, and psycholinguistic literature, work on anaphora has concentrated primarily on the analysis of pronouns and definite NPs with NP-antecedents. This is considered to be the 'normal' type of anaphoric reference. Our corpus study reveals that in actual language use this type of anaphoric reference accounts for less than half of the occurrences of pronouns and demonstratives (45%). An additional 22% are anaphors with sentential and VP-antecedents. Although this type has been studied previously (Webber 1991 and, particularly, Asher 1993 provide extensive theoretical accounts), it seems that its frequency and therefore importance has been largely underestimated. Rather surprisingly also, the remaining
52 Dialogue Acts, Synchronizing Units, and Anaphora Resolution
third of all pronouns do not have identifiable linguistic antecedents of any kind. These are pronouns that are used to refer to inferrable entities and those that are used to refer to a vaguely defined general discourse topic. These findings indicate that an important function of pronouns, aside from anaphoric reference, is that they allow the speaker to leave certain referents underspecified. In spontaneous spoken language it is simply not necessary for the participants to be able to unambiguously identify a specific referent at all times. If they fail to understand an utterance and consider avoidance of misunderstanding to be important, they can immediately request clarification, an option not available in the written medium. Furthermore, the optional use of vague pronouns greatly facilitates the task of the speaker in on-line language production. We present a model that shows how pronouns and demonstratives can be classified and, if appropriate, co-indexed with the correct antecedents. The model makes use of the surface form of the anaphor, its predicative context, and the structure of the discourse. It also presents a basis for further empirical evaluations of theoretical issues in anaphora resolution. Furthermore, we believe that it provides an important starting point for spoken-language resolution algorithms in the field of computational linguistics, which have so far almost exclusively dealt with anaphora in written texts. In computational linguistics, most anaphora resolution algorithms are designed to deal with the predominant type of anaphoric reference found in written texts, which involves the co-indexing relations between anaphors and NP-antecedents. Aside from the different types of anaphors found in spoken language, the structure of dialogues is less clear than the structure of written texts: the lack of punctuation and paragraphs and the many syntactically incomplete clauses make it difficult to formally define the domain for potential antecedents.
For these reasons, applying existing anaphora resolution algorithms to dialogues would result in a poor performance. Our model is presented in the form of a major extension of the anaphora resolution algorithm described in Strube (1998). The Strube (1998) algorithm consists of an ordered list of salient discourse entities (S-List), which provides preferences for the antecedents of pronouns. The main characteristic of the algorithm is that preferences for intra- and intersentential pronouns are dealt with in a unified manner as the update of the S-List and the anaphora resolution are performed incrementally. Essential to the success of the algorithm presented in this paper is the interaction between the identification and resolution of different types of anaphors and the determination of the domain of possible antecedents. We use dialogue act units (derived from speech acts) to provide the structure necessary for the determination of the antecedent domain and also to function as antecedents for anaphors with sentential antecedents.
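The incremental S-List idea can be illustrated with a minimal sketch. It is not the Strube (1998) implementation: the class names, the agreement features, and the numeric `rank` key are all assumptions made for illustration, since the ranking criteria are not specified at this point in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A discourse entity evoked by an NP (hypothetical minimal representation)."""
    name: str
    number: str  # "sg" or "pl"
    gender: str  # "m", "f", or "n"
    rank: int    # assumed salience key: lower value = more salient


@dataclass
class SList:
    """Ordered list of salient discourse entities, updated incrementally."""
    entities: list = field(default_factory=list)

    def add(self, entity):
        # Insert the entity as soon as its NP is processed and keep the list
        # ordered by salience. list.sort is stable, so equally ranked
        # entities keep their order of mention (an assumption of this sketch).
        self.entities.append(entity)
        self.entities.sort(key=lambda e: e.rank)

    def resolve(self, number, gender):
        # Return the highest-ranked entity that agrees with the pronoun,
        # or None if no compatible entity is on the list.
        for entity in self.entities:
            if entity.number == number and entity.gender == gender:
                return entity
        return None


s_list = SList()
s_list.add(Entity("Jane", "sg", "f", rank=0))
s_list.add(Entity("a new bike", "sg", "n", rank=1))
print(s_list.resolve("sg", "n").name)  # a new bike
```

Because an entity is added the moment its NP is processed, the same lookup serves pronouns in the same utterance and in later ones, which is the sense in which intra- and intersentential cases are handled uniformly.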
The paper is structured as follows: section 2 describes the theoretical observations which are important for our analysis and which have partly been incorporated into the algorithm. Section 3 describes the spoken-language corpus used for our empirical analysis of anaphor types and for testing the algorithm. Section 4 gives an overview of our classification system for the different types of pronouns and demonstratives we identified in the spoken dialogues. Section 5 describes how we use dialogue acts to model the establishment of common ground and to define the domain of possible antecedents for the anaphors. Our resolution algorithm is presented in section 6. Section 7 gives the results of the empirical analysis. This consists of two parts: first, we evaluated the classification system in terms of inter-annotator agreement. We deemed this step necessary in order to verify the consistency of our classification. Second, we evaluated the algorithm by applying it to the hand-annotated dialogues. Finally, sections 8 and 9 provide comparisons to related work, suggest future additions and applications of our model, and present the conclusions.
2 THEORETICAL ISSUES
In this section, we present some of the issues in theoretical linguistics which we consider to be important for the process of anaphora resolution in spoken dialogue. The value of these issues has so far been expressed in theoretical terms. We consider one of the contributions of our resolution algorithm to be that it opens the possibility of testing their value empirically.
2.1 Reference and the discourse model
We assume that a conversation has a model of the discourse associated with it, which is distinct from both the real world and from the syntactic representation of the discourse. Such models have frequently been described in the literature, e.g. common ground (Stalnaker 1974, 1979), discourse model (Webber 1979), files (Heim 1982), attentional state (Grosz & Sidner 1986), DRSs (Kamp & Reyle 1993). These proposed models differ in a number of important ways, such as whether they are said to exist at the semantic level (files, DRSs), the pragmatic level (Stalnaker's common ground), or the discourse level (Webber's discourse model, Grosz & Sidner's attentional state). Also, some models are proposed to represent properties of the conversational participants (Stalnaker's pragmatic presuppositions constituting the common ground), whilst others represent properties of the discourse itself (DRSs, attentional state).
These versions of the discourse model have in common that they contain representations of the objects that have been referred to in the discourse, known as the discourse referents (Karttunen 1976), file cards (Heim 1982) or discourse entities (Webber 1979; Kamp & Reyle 1993). The discourse model also contains the attributes of the discourse entities and the relations holding between them but for the moment we will focus only on the entities introduced by NPs in the discourse. The discourse model contains representations of the entities that are salient to both participants at a given point in the discourse because they have been referred to in the previous discourse. Using terminology from Stalnaker (1979) and Clark & Schaefer (1989), we will call the part of the model containing representations of these entities the common ground.
The update of the discourse model has been the subject of considerable debate. One issue is the question of when and how entities enter into the common ground. Because conversations involve more than one participant, merely uttering a sentence does not mean that the entities referred to have entered into the common ground. It is possible, for example, for one speaker to ignore the utterance of another. Conversational participants have a number of ways in which to signal understanding of an utterance, including nods of the head, relevant further contributions to the discourse, and simple backchannels (e.g. uh-huh, yeah, mmhm). In our model, if an utterance is not acknowledged by the other participant, its discourse entities are not retained in the common ground. This issue is explained in more detail in section 5. There has also been disagreement concerning the influence of NP form on update, that is, whether indefinite NPs, definite NPs and pronouns serve to update the discourse model in the same way or whether different mechanisms need to be postulated. In Russell's view (Russell 1905), indefinite NPs are not referring expressions, but rather function much like existential operators, by declaring that the set of entities described by the NP is not null. This view was subsequently challenged because it does not explain the capacity of indefinite NPs to function as antecedents of anaphoric pronouns (Grice 1975; Kripke 1979; Lewis 1979). In Heim's file change semantics (Heim 1982), the approach is taken that indefinite NPs are used to introduce new entities (file cards) to the discourse model, whereas definite NPs make use of familiar ones. A concern with making a categorical distinction between definites as NPs specifying given entities, and indefinites as NPs specifying new entities, is that there are many counterexamples in which definites are used to refer to discourse-new entities (Prince 1981). In fact, recent empirical research has
indicated that the numbers are by no means negligible. Poesio & Vieira (1998) show that in their corpus 50% of definites are discourse-new. The reason is that, as noted in Prince (1981, 1982), the status of entities is far more complex than can be determined by the distinction given-new. The following are examples of the categories of discourse entities defined in Prince (1981: 233, ex. 22; 237, exx. 25-27):
(1) Brand-new: I bought a beautiful dress.
(2) Brand-new anchored: A guy I work with says he knows your sister.
(3) Unused: Noam Chomsky went to Penn.
(4) Inferrable: I got on a bus yesterday and the driver was drunk.
(5) Containing inferrable: Hey, one of these eggs is broken!
(6) Evoked: Susie went to visit her grandmother and the sweet lady was making Peking Duck.
The categories are described by adding the distinction hearer-old/hearer-new to the discourse-old/discourse-new factor. Discourse-old/new describes the information status of an entity with respect to the discourse. Hearer-old/new describes the status with respect to the hearer. A definite NP such as Noam Chomsky in (3), for example, can be discourse-new if its referent has not been mentioned before, but hearer-old because it is familiar to the addressee. Prince describes this category as unused. A discourse-new entity can be anchored by a hearer-old or discourse-old entity, as in (2), where the indefinite NP is anchored by the first person pronoun I. Inferrable entities are hearer-new, discourse-new, but 'depend upon beliefs assumed to be hearer-old, where these beliefs crucially involve some trigger entity' (Prince 1992: 309). A trigger entity can be the referent of a previously mentioned NP, as in (4), where the NP a bus, once established in the discourse, allows one to refer to expected or related entities such as the driver with a definite NP. This phenomenon is also described in Lewis (1979) as accommodation. With containing inferrables, as in (5), an NP is inferred from another NP inside it (e.g. one of these eggs from these eggs). Finally, textually and situationally evoked entities are entities that are already in the discourse model. An example of this is the referent of the sweet lady in (6), which is textually evoked by the NP her grandmother.
The discourse model is not intended to reflect which entities are familiar to the hearer but rather which entities are salient at that point in the discourse. We therefore assume that indefinite and definite NPs can add entities to the discourse model because they can both cause a referent to become salient in the discourse. The category inferrable is only accounted for in certain restricted cases (discussed below). We are interested here in pronouns and demonstratives. With a few exceptions, inferrables cannot be referred to with pronouns or demonstratives unless they have previously been referred to with a full NP. For the purposes of our model, an NP such as the driver should be used to introduce an entity into the discourse model in the same way as the NP a bus.
In the algorithm presented here, we will make use of the notion of discourse model in order to simulate pronoun and demonstrative resolution. We do not intend to present a comprehensive model of the discourse. Our simplified model consists only of a list containing representations of the objects that have been referred to in the discourse with NPs. It is similar to Grosz & Sidner's attentional state as it is intended to contain representations of entities which are salient to the participants. We will use Webber's terminology and call these representations discourse entities (Webber 1979). The list is called S(alience)-list as the entities are ordered according to how salient they are in the discourse. The algorithm resolves pronouns by co-indexing them with the highest-ranked compatible entity in the S-list. The list in our model spans more than one utterance and is incrementally updated as the discourse progresses. This means that an entity is available for subsequent anaphoric reference as soon as the NP is uttered. The model therefore does not require different mechanisms for inter- and intra-sentential anaphora. The details of the S-list and the resolution process are described in section 6. We first turn to other linguistic issues.
2.2 Predicate information
If we say that the referent of an NP is introduced into the discourse model at the point when the NP is uttered, we can assume that from that point on the entity in the discourse model is available for subsequent anaphoric reference. We will call anaphoric reference involving NP antecedents individual anaphora. However, anaphoric reference also occurs with sentential and VP-antecedents (Webber 1991; Asher 1993). Following Webber (1991), we will call this type discourse-deictic reference. In these cases, the determination of the referent seems more complex. As can be seen from the following examples taken from Asher (1993) (his numbering in parentheses), anaphors can pick up different kinds of abstract objects such as events, states, concepts, propositions or facts specified by previous clausal constituents:
(7) Event: John kicked_i Sam on Monday, and it_i hurt. (35 (55))
(8) Concept: Somebody [had to take out the garbage]_i, and Bill did it_i. (246 (29))
(9) State: John didn't know_i the answer to the problem. This_i lasted until the teacher did the solution on the board. (53 (85-b))
(10) Fact: Mary proved [that the defendant was lying about the President's ignorance of the cover-up]_i. This_i shows that the cover-up is much larger than previously thought. (245 (28.a))
(11) Proposition: The 'liberation' of the village had been bloody. [Some of the Marines had gone crazy and killed some innocent villagers. To cover up the 'mistake', the rest of the squad had torched the village, and the lieutenant called in an air strike.]_i At first the battalion commander hadn't believed it_i. (49 (82))
Asher states that the type of referent is determined by the predicative context of the anaphor. For example, a discourse-deictic anaphor in the subject position of the intransitive verb hurt must specify an event (example 7 above), whereas an anaphor in the object position of the verb believe specifies a proposition (example 11 above). In our model, we make use of the predicative context of the anaphor to determine the type of its referent and to help distinguish between individual and discourse-deictic anaphors. For example, it is generally the case that the constituent in the object position of verbs such as assume or believe specifies an abstract entity and should therefore be co-indexed with a clause. Conversely, the constituent in the object position of the verb eat specifies a concrete entity and should therefore be co-indexed with an NP. It is clear that such a distinction is very simplistic. For example, although the constituent in the object position of believe must specify a proposition, and propositions are generally specified by whole clauses, this is not always the case. Certain NPs can specify abstract objects in the same way that clauses do (e.g. Jane told me [a story]_i. I didn't believe it_i.) Future work should therefore make use of semantic tagging of NPs to supply information such as whether their referents are abstract or concrete. However, this is a difficult task for numerous reasons. One issue, for example, is that an NP may in certain cases be used to indirectly refer to an abstract object even though it generally specifies a concrete entity. In the sentence I don't believe Jane, the NP Jane stands for some/all proposition(s) expressed by Jane.
Another issue that requires a more complex solution concerns reference to events that are inferrable but not explicitly mentioned, e.g.:
(12) We just got back from France. It was great fun.
The pronoun it specifies the event of being in France. However, the VP in the preceding context specifies the event of getting back from France. Getting back implies having been in a place, so the appropriate referent of the pronoun it is available to the listener as a result of world knowledge. In the work presented here, we put these complex issues aside for the time being and use the predicate of the anaphor only as one of the features guiding the simplified anaphor classification and resolution.
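The use of predicative context described in this section can be sketched as a simple lexicon lookup. The sketch below is illustrative only: the table `PREDICATE_CONSTRAINTS`, the function name, and the labels are assumptions, and the three verbs are just the ones mentioned in the text. As the text notes, the distinction is simplistic, since NPs such as a story can also specify abstract objects.

```python
# Hypothetical toy lexicon: the kind of referent required by a given
# argument position of a verb. Only verbs mentioned in the text are listed.
PREDICATE_CONSTRAINTS = {
    ("believe", "object"): "abstract",
    ("assume", "object"): "abstract",
    ("eat", "object"): "concrete",
}


def antecedent_type(verb, position):
    """Guess whether an anaphor in this position should be co-indexed
    with a clause or with an NP, based only on its predicate."""
    constraint = PREDICATE_CONSTRAINTS.get((verb, position))
    if constraint == "abstract":
        return "clause"
    if constraint == "concrete":
        return "NP"
    # The predicate does not decide; other features must be consulted.
    return "undetermined"


print(antecedent_type("believe", "object"))  # clause
print(antecedent_type("eat", "object"))      # NP
```

A lookup that returns "undetermined" for unlisted predicates keeps the predicate as one feature among several, in line with its limited role in the classification.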
2.3 Referent coercion
The predicative context of the anaphor is important even when the antecedent constituent has been determined because the precise referent must still be identified. Webber (1983: 332) points out that the same text string can give rise to a variety of entities available for subsequent anaphora:
(13) The Rhodesian ridgeback down the block bit me yesterday.
(a) It's really a vicious beast.
(b) They're really vicious beasts.
In continuation (a) the singular pronoun is used to refer to the individual dog, whereas in (b) the plural pronoun references the set of dogs of that particular breed. In both instances, the text string the Rhodesian ridgeback (modified by the PP down the block in version (a)) is used to provide the referent of the pronoun. The same variety of potential referents can be found with clausal antecedents. For example, the clause in (14) can make available an event, concept, fact, or proposition as a referent for subsequent anaphors:
(14) [John [crashed the car]_j]_i.
(a) This_i annoyed his parents. (event)
(b) Jane did that_j, too. (concept)
(c) This_i shows how careless he is. (fact)
(d) His girlfriend couldn't believe it_i. (proposition)
Furthermore, Moens & Steedman (1988) provide an analysis of events that divides the event-complex into a preparatory process, culmination and consequent state. Their analysis of adverbials shows that reference can be made to any one of these subparts of the event, as can be seen in the following example taken from Ritchie (1979), cited in Moens & Steedman (1988, ex. 1):
(15) When they built the 39th Street bridge . . .
(a) . . . a local architect drew up the plans.
(b) . . . they used the best materials.
(c) . . . they solved most of their traffic problems.
The event building the bridge consists of a preparatory process of building the bridge, which includes the architect drawing up the plans, a culmination, which involves using the best materials, and a consequent state, which involves the solution of the traffic problems. The adverbial clause supplies the necessary subparts of the event for the alternative continuations. Instead of assuming that all levels of abstract objects and all their subparts are introduced to the discourse model by the clause that makes them available, it has been suggested that discourse-deictic reference involves referent coercion (Dahl & Hellman 1995) or ostension (Webber 1991). That is, in a process similar to accommodation (Lewis 1979), the anaphor itself is used to create a new referent in the discourse model. This means that the referents of discourse-deictic anaphors do not exist in the discourse model unless anaphorically referred to. Webber suggests that for each context there are discourse entities that stand proxy for its propositional content. Discourse-deictic anaphora involves a referring function that yields a discourse entity (proposition, event, event type or state) from the proxy entity. Passonneau (1991: 69) uses the following example to show that referents of discourse-deictic anaphors are lost from the discourse model immediately unless referred to again:
(16) (a) [I noticed that [Carol insisted on sewing her dresses_k from non-synthetic fabric]_j.]_i
(b) That_i's an example of how observant I am.
(c) And they_k always turn out beautifully.
(d) # That_j's because she's allergic to synthetics.
The discourse-deictic demonstrative in utterance (b) picks out a referent described in the main clause of the first utterance (I noticed . . .). The discourse-deictic demonstrative in the final utterance (d), however, is not capable of doing the same thing. It cannot be used to refer to the intended referent in utterance (a) (Carol insisted . . .) because of the intervening utterance (c). At the time of the final utterance the referent of the first utterance is no longer available. Intervening utterances pose no such problem for individual anaphoric reference. The pronoun they in utterance (c) is used felicitously to refer to the referent of the NP her dresses in the first utterance, in spite of intervening utterances and anaphoric references. Note, however, that in spite of the transitory qualities of discourse-deictic entities, chains of discourse-deictic references are possible, as seen in this altered version of Passonneau's example:
(17) (a) [Carol insisted on sewing her dresses from non-synthetic fabric]_i.
(b) That_i's because she's allergic to synthetics.
(c) It_i's also because she hates cheap materials.
In (17), the referent of the first clause is available for anaphoric reference both in clauses (b) and (c). The continued reference ensures that it is not lost.
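The transitory status of coerced referents can be made concrete with a small sketch. It assumes a deliberately crude retention policy, namely that a coerced referent survives into the next utterance only and is dropped unless that utterance refers to it again. The class and method names are hypothetical and are not taken from the algorithm presented later in the paper.

```python
class AbstractReferents:
    """Store for coerced abstract referents (assumed one-utterance lifetime)."""

    def __init__(self):
        self.current = {}  # referents available to the utterance being processed
        self.pending = {}  # referents created or re-used in that utterance

    def coerce(self, clause, kind):
        # The anaphor itself creates the referent: nothing is stored for a
        # clause until some anaphor refers to it.
        key = (clause, kind)
        self.pending[key] = f"{kind} of '{clause}'"
        return self.pending[key]

    def refer_again(self, clause, kind):
        # Succeeds only if the referent survived from the previous utterance.
        key = (clause, kind)
        if key not in self.current:
            return None
        self.pending[key] = self.current[key]
        return self.current[key]

    def next_utterance(self):
        # Referents not mentioned in the utterance just finished are dropped.
        self.current = self.pending
        self.pending = {}


store = AbstractReferents()
clause = "Carol insisted on sewing her dresses from non-synthetic fabric"
store.coerce(clause, "fact")                  # (17b) That's because ...
store.next_utterance()
chained = store.refer_again(clause, "fact")   # (17c) It's also because ...
store.next_utterance()
store.next_utterance()                        # an utterance with no such reference
lost = store.refer_again(clause, "fact")
print(chained is not None, lost is None)      # True True
```

The chained reference in (17c) succeeds because (17b) refreshed the referent, whereas a reference after an intervening utterance fails, which mirrors the contrast between (17) and (16d) under the stated assumption.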
2.4 Choice of NP-form
We now turn to the differences between pronouns and demonstratives as we are interested in building a resolution algorithm for both of these NP forms. Gundel et al. (1993), amongst others, note that there is a correlation between different NP forms and the accessibility of their referents. Pronouns and demonstratives provide little information concerning the identity of their referents (in English, number and gender only) and are therefore reserved for the most salient entities in the discourse model. The difference between demonstratives and pronouns, according to Gundel et al., is that demonstratives indicate that their referent is salient (activated), but that it is not the current most salient entity (in focus). Pronouns, on the other hand, can only be used for the most salient entities. In the literature, it is generally claimed that discourse-deictic reference, as opposed to individual anaphoric reference, is preferably established with demonstratives rather than pronouns (Webber 1991; Asher 1993; Dahl & Hellman 1995). The contrast in (18) reflects these preferences:
(18) [Jane bought [a new bike]_i]_j.
(a) It_i's great.
(b) That_j's great.
In contexts like this, where the predicate is great can conceivably be associated with either the referent of a full clause or an NP, the pronoun preferentially picks out an NP antecedent (a new bike), whereas the demonstrative picks out the whole clause (Jane bought a new bike). However, contexts that force either an individual or a discourse-deictic interpretation make it clear that both demonstratives and pronouns can be used for each type of reference:
(19) A: I'm going to eat [the last piece of cake]_i.
B: But John wanted to eat it_i/that_i.
(20) A: I wonder whether I should [call him]_i.
B: I wouldn't do that_i/it_i if I were you.
In example 19, the anaphors occur in the object position of the verb eat, and must be interpreted as specifying a concrete entity. In example 20, the anaphors occur in the object position of the verb do and must thus specify an event concept.1 In spite of the preferences associated with the different NP forms, in each example both NP forms are capable of making the necessary specification. The observation that demonstratives are preferred for discourse-deictic reference is in line with the referent coercion assumption, i.e. the assumption that discourse-deictic anaphoric reference leads to the introduction of a new entity into the discourse model. If one assumes, following Gundel et al., that demonstratives are used for entities that are less salient than those specified by pronouns, then it is to be expected that demonstratives should be preferred for entities newly created in the discourse model.
1 Although some NPs can function as antecedents to pronouns in the object position of do (e.g. do it/the foxtrot, do drugs), there is no number and gender compatible antecedent in the preceding clause in example 20.
2.5 Right Frontier Rule
We now move on to examining the structural constraints to which discourse deixis is subject. Webber (1991) notes that only text sections which are on the right frontier of the discourse structure tree are available for discourse-deictic reference, as can be seen by the following discourse (Webber 1991: ex. 14):
(21) There's two houses you might be interested in.
(a) House A is in Palo Alto. It's got 3 bedrooms and 2 baths, and was built in 1950. It's on a quarter acre, with a lovely garden, and the owner is asking $425K. But that's all I know about it.
(b) House B is in Portola Valley. It's got 3 bedrooms, 4 baths and a kidney-shaped pool, and was also built in 1950. It's on 4 acres of steep wooded slope, with a view of the mountains. The owner is asking $600K. I heard all this from a real-estate friend of mine.
(c) Is that enough information for you to decide which to look at?
(c') *But that's all I know about House A.
The central part of the text is clearly divided into two sections (a and b), each containing the description of a house consisting of more than one clause. At the end of each section a demonstrative is used to refer to what is described by the preceding utterances (that for House A; this for House B). Finally, in the continuation (c) the demonstrative that picks out the referents of the whole preceding discourse, i.e. what is referred to by (21a) and (b) together. The unacceptability of the utterance in the alternative continuation (c') shows that once section (a) is closed off and the description in section (b) has started, (a) is no longer accessible for reference. Webber represents this discourse with the tree structure shown in Figure 1. The only nodes that a new constituent could attach to are nodes on the right frontier of the tree, which are indicated in the figure by the crossed circles.
Figure 1 Discourse tree structure (Webber 1991). Node labels: {info on both houses}, {info on House A}, {info on House B}; candidate new constituent: 'But that's all I know . . .'
Asher's Principle of Availability (Asher 1993: 313) has a similar function to the Right Frontier Rule. It states in part that only the current constituent itself and its discourse referents and subconstituents (subDRSs) are available as antecedents for abstract object anaphora.2 Both Webber's and Asher's findings can be interpreted as reflecting the notion of adjacency. The constituents which act as antecedents to discourse-deictic anaphors must be linearly or hierarchically adjacent to their anaphors. We will make use of this rule in our algorithm, by formulating a concept of adjacency in terms of dialogue acts.
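The right frontier of a discourse tree can be computed by following the rightmost daughter from the root downwards. The following minimal sketch assumes a plain labelled tree; the `Node` class and the node labels (taken from the house example) are illustrative and do not reproduce Webber's formal definition.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A discourse segment with its sub-segments in textual order."""
    label: str
    children: list = field(default_factory=list)


def right_frontier(root):
    """Return the labels on the path from the root to its rightmost leaf."""
    frontier = []
    node = root
    while node is not None:
        frontier.append(node.label)
        # Descend into the most recent (rightmost) sub-segment, if any.
        node = node.children[-1] if node.children else None
    return frontier


house_a = Node("info on House A")
house_b = Node("info on House B")
root = Node("info on both houses", [house_a, house_b])
print(right_frontier(root))  # ['info on both houses', 'info on House B']
```

Once the segment for House B has been attached, the segment for House A is no longer on the frontier, which corresponds to the unacceptability of continuation (c').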
2.6 Summary
We have so far determined four fundamental differences between cospecification of an anaphor with an NP and discourse-deictic reference:
• The precise referent of discourse-deictic anaphors is determined by the predicate of the anaphor;3
• Referent coercion: abstract objects such as events, states, propositions, and facts are not introduced to the discourse-model by virtue of the constituent that describes them, but rather by virtue of anaphoric reference. The referents of discourse-deictic anaphors are immediately lost again from the discourse model if not referred to again;
• Demonstratives are preferentially used for discourse-deictic reference, pronouns are preferentially used for cospecification with NPs;
• Right Frontier Rule: the antecedent constituents of discourse-deictic anaphora should be linearly or hierarchically adjacent to the anaphor.
2 This principle also states that a constituent which stands in a discourse relation to the current constituent is available as an antecedent. However, in the simple algorithm we present here we do not deal with discourse relations and so do not make use of this part of the principle.
3 We are not claiming that the predicate of an anaphor cospecifying with an NP cannot be crucial for disambiguation. However, with NP-anaphoric reference, the predicate does not add entities to the discourse model, but rather it may serve to select one of an already existing group.
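The interaction between predicate constraints and the form preference in the third point can be sketched as a two-step decision. This is a simplification under stated assumptions: the function name and its string labels are hypothetical, and the predicate constraint is taken as given rather than computed.

```python
def preferred_antecedent(form, predicate_constraint=None):
    """Choose between an NP and a clausal antecedent.

    form: "pronoun" or "demonstrative".
    predicate_constraint: "NP", "clause", or None when the predicate
    is compatible with both readings.
    """
    if predicate_constraint in ("NP", "clause"):
        # The predicate forces the reading regardless of form,
        # as with eat in (19) and do in (20).
        return predicate_constraint
    # Otherwise fall back on the form preference illustrated in (18).
    return "NP" if form == "pronoun" else "clause"


print(preferred_antecedent("pronoun"))              # NP
print(preferred_antecedent("demonstrative"))        # clause
print(preferred_antecedent("demonstrative", "NP"))  # NP
```

Letting the predicate override the form reflects the observation that both forms can make either kind of specification when the context demands it, while the form only decides otherwise ambiguous cases.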
Before providing a more detailed description of the algorithm in section 6, we first describe a preliminary corpus analysis, which was used to test our anaphor classification and resolution method and the classification of dialogue acts, and to provide a standard against which the algorithm can be tested.
3 CHOICE OF CORPUS
The choice of corpus is a difficult one. All corpora have corpus-specific characteristics which may influence the range of vocabulary and syntactic constructions. The choice should therefore be determined by the specific analysis one wishes to carry out. Our choice to concentrate on spoken rather than written language is guided by previous observations (Eckert 1998) that spoken language contains more pronominal anaphors and a more diverse range of anaphor types (described below, section 4). Furthermore, the purpose of the study is to analyse and develop a formal representation of the effect of grounding on anaphora, and this is a phenomenon restricted to spoken language.
Spoken-language corpora can roughly be divided into two categories: task-oriented and non-task-oriented. In task-oriented corpora (e.g. TRAINS (Allen et al. 1995), Maptask (Anderson et al. 1991)), the conversational participants are required to perform a particular task, such as constructing an object, or describing a route on a map, and are recorded while carrying out the task. The advantage of such corpora is that the common ground between the participants, that is the set of entities familiar to both, is fairly easy to model. The observer can reconstruct whether a particular entity (e.g. the small screw) has been previously mentioned, is accessible in the immediate surroundings, or new to the discourse. This feature is particularly valuable when analysing, for example, the appropriate use of the definite article and pronouns. However, such corpora contain a large number of imperative-like constructions, and contain fewer references to non-concrete entities, thus making them unsuitable for our purposes.
Non-task-oriented dialogue corpora are intended to be representative of 'natural' and 'unconstrained' speech. The Callfriend (LDC 1996) and Callhome (LDC 1997) corpora consist of recorded telephone conversations between relatives and friends. These corpora are particularly difficult to analyse as there is a large amount of common ground and shared assumptions between the participants that the observer does not have access to.
For our analysis we chose the Switchboard corpus (LDC 1993), which is a collection of recorded and transcribed telephone conversations between two people who are not acquainted with each other. The participants were asked to talk about a given topic, such as childcare, exercise, or foreign politics. This corpus has some of the advantages of the task-oriented corpora, in that the amount of shared knowledge that is inaccessible to the observer is kept to a minimum. As the dialogues are between strangers, they are easier to follow than those from the Callhome corpus. In addition, the dialogues are not goal-driven and there are many references to both concrete and abstract entities.
4 ANAPHORA IN DIALOGUES
We now turn to the analysis of anaphora in the corpus. As mentioned in the introductory section, there are anaphors that cospecify with NPs, and anaphors that cospecify with VPs or clauses. In addition to these two types we identified three other types of pronouns and demonstratives, which do not appear to be cospecifying with any other linguistic constituent. The correct identification method for anaphors is important because for the purposes of the algorithm it is necessary to determine which pronouns and demonstratives are anaphoric and therefore resolvable, and which are not. Also, in the case of resolvable anaphors, it is necessary to determine the type of antecedent (NP vs. VP/clause). This section presents the results of a frequency analysis of the different types of pronouns and demonstratives and gives examples of each type from the Switchboard corpus. An empirical analysis of the inter-coder agreement for this classification is presented later in section 7.
4.1 Individual anaphors
In the Switchboard corpus dialogues we examined, individual anaphors, i.e. anaphors with NP antecedents, constitute only 45.1% of all anaphoric
Miriam Eckert and Michael Strube 65
references. This number includes all demonstratives and all instances of he, she, it, and they with NP antecedents, e.g.: (22) A: myparents, didn't really have music in the house . Put it that way. B: Oh , rea- , Were they, religious ? (SW4.168)
4.2 Reference to abstract objects
We classified 22.6% of all anaphors in the corpus as discourse-deictic, i.e. anaphors whose referents are abstract objects, such as events, states, event concepts, facts, and propositions, and that have VPs, clauses, or sequences of clauses as antecedents, e.g.:
(23) Now why didn't she [take him over there with her]_i? No, she didn't do that_i. (SW4877)
(24) A: . . . [we never know what they're thinking]_i.
     B: That_i's right. [I don't trust them]_j, maybe I guess it_j's because of what happened over there with their own people, how they threw them out of power . . . (SW3241)
In Example (23) the demonstrative specifies the event concept referent of the preceding VP. In (24), the demonstrative specifies the proposition expressed by the preceding main clause, and the pronoun it specifies the state expressed by the clause I don't trust them. Whilst there have been attempts to classify abstract objects and describe the rules governing anaphoric reference to them (Webber 1991; Asher 1993; Dahl & Hellman 1995), there have been no empirical studies using actual resolution algorithms. However, as described in section 2, there are some important characteristics of discourse-deictic reference that research in theoretical linguistics has mapped out and that we make use of in our algorithm: referent coercion, preference for demonstratives, the right frontier rule, and the occurrence with particular predicates (see also Eckert & Strube 1999).
4.3 Vague anaphors
We classified a further 13.2% of the anaphors as vague, in the sense that the pronoun does not have a clearly defined linguistic antecedent. The entities specified by vague pronouns are similar in nature to the discourse-deictic entities because they are also abstract. However, these pronouns do not specify the referent of a sentence or VP but rather the general discourse topic, as shown in example 25:
66 Dialogue Acts, Synchronizing Units, and Anaphora Resolution
(25) B.29 I mean, the baby is like seventeen months and she just screams.
     A.30 Uh-huh.
     B.31 Well even if she knows that they're fixing to get ready to go over there. They're not even there yet
     A.32 Uh-huh.
     B.33 you know.
     A.34 Yeah. It's hard. (SW4877)
The pronoun in A.34 does not specify the specific incident described by Speaker B, but rather the topic of childcare in general. With these pronouns it is impossible to identify a linguistic string in the context with which the pronoun is cospecifying. An algorithm that relies on linguistic surface form therefore cannot resolve them, and it is important that they be identified. In our analysis of the Switchboard dialogues, we observed an interesting contrast. Pronouns appear to be preferred for vague reference, where the referent is not easily identifiable, whereas demonstratives appear to be preferred for clearly defined reference. Note, for example, that in (25) above, if a demonstrative is substituted for the pronoun in A.34, yielding That's hard, then it would be interpreted as specifying not the general topic of childcare, but rather the specific incident described by Speaker B.
4.4 Inferrable-Evoked Pronouns
The remaining 19.1% of anaphors constitute a particular usage of the third person plural pronoun they, in which it has no explicit antecedent but is often associated with a singular NP denoting an institution, e.g.:
(26) A.20 . . . in the Soviet Union, they spent more money on, um, what do you call, um, military power than anything. (SW3241)
In this example, the singular NP the Soviet Union has the inferrable inhabitants/population associated with it. The highlighted pronoun specifies the inferrable despite the inferrable itself not having been mentioned explicitly. We call these Inferrable-Evoked Pronouns (IEP). It is usually the case that the NP in question specifies a country, a school, a hospital, or some other kind of institution. The pronoun then specifies the authority or the population/members of the institution. Subsets of this type of pronoun have elsewhere been termed corporate pronouns (Jaeggli 1986; Belletti & Rizzi 1988). Our group of IEP's also includes cases where there is no explicitly mentioned institution, e.g.:
(27) A.19 They had an interview with ... The general. Stormin Norman . . .
     A.21 Anyway, at the end of it, they rolled all of the US names of the US casualties-- (SW2403)
The plural pronouns in A.19 and A.20 specify the television authorities without the institution itself having been mentioned. It seems that certain institutions are salient enough that they require no explicit mention. IEP's and vague pronouns are the default classes in our algorithm for third person plural pronouns and third person singular neuter pronouns, respectively. They are classified as such by default when the algorithm fails to find a compatible antecedent within a predetermined domain. This is described in detail in section 5.
4.5 Unmarked anaphors
We do not mark non-specifying pronouns and demonstratives such as expletives, subjects of weather verbs (quasi-arguments (Chomsky 1981: 37)) and subjects of raising verbs. Also, we ignore first and second person pronouns as the correct resolution of these would require an analysis of deictic shift, which the algorithm is not capable of modelling at this point. The pronouns specified by Postal & Pullum (1988) as subcategorized expletives, which they define as being non-specifying pronouns in argument positions, are more difficult to categorize, e.g.:
(28) I resent it greatly that you didn't call me. (Postal & Pullum 1988: ex. 21h)
Idiomatic uses of it are also unmarked, as in the following:
(29) When it comes to trucks, though, I would probably think to go American. (SW2326)
(30) I haven't prepared any of my lectures, so I'm going to have to wing it/*them. ('improvise') (Postal & Pullum 1988: ex. 47c/d)
The unacceptability of a pronoun agreeing in number or gender with the potential antecedent, like the plural pronoun them in example 30, is used as evidence that the neuter pronoun in that position is non-specifying. To identify non-specifying pronouns reliably, we use the criterion of possible question formation. In general, wh-questions cannot be formed on non-specifying pronouns, e.g. *When what comes to trucks? *What's raining? *What seems that John snores?
5 BUILDING SYNCHRONIZING UNITS FROM DIALOGUE ACTS
As mentioned in section 2, we are assuming that uttering an NP can result in its referent becoming part of the common ground. A question we had left open is determining when this happens. As Byron & Stent (1998) point out, it is difficult to determine the center of attention in multi-party discourse because the participants may not be focussing on the same entity at a given point. Our hypothesis is that the attentional state of the discourse participants can be determined by making reference to dialogue acts. The term dialogue act is derived from speech act and is intended to bring to mind the communicative function of an utterance in a conversation. We assume that acknowledgments are used by speakers to indicate that common ground is achieved and can therefore indicate which entities have been entered into the joint discourse model. Dialogue acts are also important for a second reason, namely they can be used as units for determining the domain in which the algorithm can look for potential antecedents.
5.1 Dialogue act theories
There are many theories of dialogue acts and we discuss here only those relevant to our own model. Our common ground assumptions are based on Clark & Schaefer's (1989) theory of contributions (see also Traum's 1994 Discourse Units and Nakatani & Traum's 1999 Common Ground Units). In Clark & Schaefer's model, each dialogue act is labelled as a Presentation or an Acceptance. A Presentation and an Acceptance jointly form a Contribution. However, Clark & Schaefer's dialogue act labels are also used for larger units. Their rules are recursive and an Acceptance itself can consist of Contributions. This means that a dialogue can contain various subdialogues. The dialogue shown in Figure 2 (Clark & Schaefer 1989: 279, Fig. 4), for example, contains a two-turn subdialogue in which the speakers clarify the precise identification of the boy (B: Duveen? A: m). The recursion allows discourse structure to be represented. A further important feature of their model is that a single dialogue act may fulfil multiple functions: it can be both an Acceptance of a preceding Presentation and a Presentation itself, such as A's second utterance.
A. well wo uh what shall we do about uh this boy then
B. Duveen?
A. m
B. well I propose to write, uh saying. I'm very sorry [etc]
Figure 2 Clark & Schaefer's (1989) dialogue structure
Carletta et al. (1997) present a more fine-grained approach to dialogue acts in their model, which consists of three tiers describing Moves (dialogue acts), Games (dialogue act sequences), and Transactions (subdialogues). Moves are divided into three subtypes (Initiations, Responses, and Preparations) and, again, there are numerous subtypes within each of these to capture a variety of different functions.
We wanted our model to fulfil two criteria: (1) it should reflect the achievement of common ground, and (2) it should be simple enough to allow a high degree of inter-coder reliability. To achieve the first goal, we use pairs of dialogue acts to form Synchronizing Units, similar but not identical to Common Ground Units and Contributions. To achieve the second, we simplify Carletta et al.'s model, ignoring the subtypes and using only an Initiation/Response-type of distinction. Furthermore, we do not allow for recursive discourse structure, as given in Clark & Schaefer's model.
5.2 Dialogue acts: units and categories in our model
We assume that the establishment of common ground is indicated by dialogue acts and affects the operations for adding and removing discourse entities from the representation of the attentional state, which in our model is the list of salient discourse entities (S-List). We divide each dialogue into short, clearly defined dialogue acts. As pointed out in Byron & Stent (1998), determining utterance boundaries is difficult in spoken language, as annotators must use criteria that do not depend on punctuation. For this reason we define a unit syntactically as:
• each main clause plus any subordinated clauses, or a smaller utterance.
The inclusion of or a smaller utterance means that elliptical utterances, which occur frequently in spoken language, can be counted as units. The syntactic constituents serve as an upper boundary for unit definition, but a unit does not need to be syntactically complete. The labels given to these units are Initiation (I) and Acknowledgment (A), based on the top of the hierarchy given in Carletta et al. (1997). I's are dialogue acts that convey semantic content. A's, on the other hand, do not convey semantic content but have the pragmatic function of signalling that the other participant's utterance has been heard or understood. The unit type A has an important function and allows us to make use of utterances with no discourse entities, e.g. Uh-huh; yeah; right. Whilst Byron & Stent (1998) and Walker (1998) assign no importance to such utterances in their models, in our model these constitute a specific type of dialogue act that is used to indicate the inclusion of entities into the common ground. In example 31 below, we see that Speaker B's turn has been divided into five dialogue acts. The third utterance do you get the constitutes a separate unit even though it is less than a full main clause. At the end of B's turn, Speaker A responds with Uh-huh. This last dialogue act does not contain any semantic information and is labelled A.
(31) I B.18 and it's just like, everybody likes to blame everything on drugs now,
     I      but I wonder, you know,
     I      do you get the,
     I      oh, that's kind of side tracked,
     I      but, uh, I just remember seeing on the news the other night, they had the thing about how Catholic schools are doing so much better
     A A.17 Uh-huh. (SW3083)
Often it is not possible to tease apart I and A. There are utterances that function as an A but also have semantic content, for example answers to wh-questions. This type is labelled as A/I. The double label is reminiscent of Clark & Schaefer's model described above, in which a single utterance can fulfil two functions. Expressed in the terms of the dialogue act markup model DAMSL (Allen & Core 1997; Zollo & Core 1999), I's are forward-looking in the discourse, A's are backward-looking, and A/I's are both forward- and backward-looking. Only forward-looking dialogue acts require a further response or acknowledgment. Table 1 gives a summary of the labelling guidelines from our manual.
Table 1 Guidelines for labelling dialogue acts
Label                             Unit description                         Further acknowledgment required?
Initiation (I)                    Statement                                Yes (if at turn transition)
                                  Question
Acknowledgment/Initiation (A/I)   Statement following an I                 Yes (if at turn transition)
                                  Question following an I
                                  Answer to a wh-question
                                  Answer to a yes/no-question
Acknowledgment (A)                Vocal signal indicating understanding    No
                                  Word/Phrase indicating understanding
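The three labels and the acknowledgment column of Table 1 can be written down as a small data model. The following is only an illustrative sketch in Python, not part of the annotation manual: the names `Label`, `DialogueAct` and `needs_acknowledgment` are hypothetical, and the rule encoded is the one stated above, namely that only forward-looking acts (I and A/I) call for a further acknowledgment, and only at a turn transition.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """Dialogue act labels (hypothetical encoding of Table 1)."""
    I = "I"      # Initiation: conveys semantic content
    A = "A"      # Acknowledgment: signals hearing/understanding only
    AI = "A/I"   # acknowledges and conveys semantic content


@dataclass(frozen=True)
class DialogueAct:
    speaker: str
    label: Label
    text: str


def needs_acknowledgment(act: DialogueAct, at_turn_transition: bool) -> bool:
    """Forward-looking acts (I, A/I) need a response only at a turn transition."""
    forward_looking = act.label in (Label.I, Label.AI)
    return forward_looking and at_turn_transition
```

For instance, the final I of Speaker B's turn in example 31 would need an acknowledgment, whereas Speaker A's Uh-huh would not.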
5.3 Achieving common ground
In order adequately to represent the joint discourse model, we require a further unit that indicates when common ground is achieved. In our model, a single I and an A jointly form a Synchronizing Unit (SU). Examples of this can be seen in Figure 3. Single I's in longer turns (A.81) constitute SU's by themselves and do not require explicit acknowledgment. The assumption is that by letting the speaker continue, the hearer implicitly acknowledges the utterance. In this sense, SU's differ from Nakatani & Traum's Common Ground Units or Traum's Discourse Units, which require a response from the other participant to be completed. In our model, it is only in the context of turn-taking that I's and A's are paired up. This is in agreement with Clark & Schaefer's point that 'initiation of the relevant next contribution', 'acknowledgment' as well as 'continued attention' count as evidence of understanding (Clark & Schaefer, 1989, p. 267).
I     A.79   But we actually had some street people picked up last week in Dallas for picking up tin cans.
A     B.80   My gracious.
I     A.81   For picking up tin cans.
I            They were going to turn them in,
I            they were going to cash them in.
A     B.82   Uh-huh.
I            And they picked them up, what for?
A/I   A.83   Disturbing the trash, or something like that.
A     B.84   My gosh. Oh, ho, ho, ho. Oh, dear.
I            Well in our area right now,
I     A.85   It just blew my mind.
A     B.86   Yes.
Figure 3 Synchronizing units and dialogue acts
The SU's have two functions in our model. Firstly, they are used to indicate at which point the S-List is cleaned up: after each SU, discourse entities not referred to again are removed from the list. Again, this is a crude simplification but we leave the precise determination of the manner of decay of discourse entities for future empirical research. What we wish to supply here is a unit for measuring their duration in the model. The second point is crucial to our hypothesis that common ground has an influence on attentional state: we assume that at turn transitions only acknowledged I's become part of an SU. If at a turn transition one speaker's I is not acknowledged by the other participant, it cannot be included in an SU and its discourse entities are deleted from the S-List. An example of this latter point can be seen in Figure 3. In turn B.84, the entity our area is added to the S-List. However, Speaker B is then interrupted by Speaker A. B's I is therefore at a turn transition but is not acknowledged. The discourse entity our area is then immediately deleted again from the S-List when the subsequent I shows that it is not part of the common ground. This means that it is not available as an antecedent for subsequent pronouns. The algorithm correctly predicts that the pronoun it in A.85 does not cospecify with our area.
5.4 A note on incremental processing
A positive feature of our model (and those such as Traum's) is that, unlike Clark & Schaefer's, it allows the level of dialogue acts to be labelled incrementally. Clark & Schaefer's Presentations and Acceptances appear not only at the level of dialogue acts but at embedded levels as well, meaning that these labels can only be fully applied to the discourse as a whole. In our model, labels at the dialogue act level (I, A, and A/I) are assigned locally and incrementally, a feature that is compatible with a processing model. At the level of Synchronizing Units, labels are also assigned incrementally but retrospective changes can be made. As shown in the examples above, if the content of a particular utterance indicates that the preceding utterance has been ignored, the S-List of the preceding one is deleted and the utterance not included in an SU. The difference between the two levels is due to the fact that the first level represents features of the utterances themselves, whilst the second is an attempt to represent the presuppositions of both speakers. It is unlikely that the presuppositions of all participants are ever identical, so a representation of common ground can only be an approximation. Furthermore, common ground update is generally a feature of more than one utterance, meaning that immediate representation as soon as an utterance is encountered is not feasible.
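The pairing rules of this section can be sketched as a single pass over labelled dialogue acts. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_sus` and the `(speaker, label)` tuple encoding are hypothetical. It assumes that an I followed by the same speaker is implicitly acknowledged, that an I at a turn transition needs an A or A/I from the other speaker, and that an A/I both closes the preceding SU and opens one of its own; that last treatment of A/I is an assumption of the sketch, since the bracketing of A/I acts is not spelled out in the text.

```python
from typing import List, Optional, Tuple

Act = Tuple[str, str]  # (speaker, label) with label in {"I", "A", "A/I"}


def build_sus(acts: List[Act]) -> Tuple[List[List[int]], List[int], Optional[int]]:
    """Group dialogue acts into Synchronizing Units (lists of act indices).

    Returns (sus, unacknowledged, pending); `pending` is the index of a final
    content-bearing act that cannot be decided until more input arrives.
    """
    sus: List[List[int]] = []
    unacknowledged: List[int] = []
    pending: Optional[int] = None
    for i, (speaker, label) in enumerate(acts):
        if label == "A":
            continue  # a pure acknowledgment is attached by the preceding I, if any
        if i + 1 == len(acts):
            pending = i  # nothing follows yet
            continue
        next_speaker, next_label = acts[i + 1]
        if next_speaker == speaker:
            sus.append([i])  # mid-turn: implicitly acknowledged
        elif next_label in ("A", "A/I"):
            sus.append([i, i + 1])  # acknowledged at the turn transition
        else:
            unacknowledged.append(i)  # the other speaker moved on
    return sus, unacknowledged, pending
```

Entities introduced by acts listed in `unacknowledged` would be the ones deleted from the S-List, as with our area in turn B.84.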
6 THE ALGORITHM
Our algorithm makes use of the distinction between demonstratives and pronouns, in particular the preference for demonstratives to be discourse-deictic and pronouns to have NP-antecedents. It consists of two branches, one for pronouns and the other for demonstratives. Both of them call the functions resolveInd and resolveDD, which resolve individual and discourse-deictic anaphora, respectively.
6.1 Resolving individual anaphora
Our method for resolving individual anaphors in spoken dialogue is based on the incremental algorithm described in Strube (1998). That model consists of a list of salient discourse entities (the S-List) and an insertion operation. The S-List describes the attentional state of the hearer at any given point in processing the discourse and it contains the discourse entities which are realised in the current and previous utterance. Within the S-List, the entities are ranked according to their information status, which is defined in terms of Prince's familiarity scale (Prince 1981) (cf. section 2.1): the set of hearer-old entities (OLD) contains evoked and unused elements, the set of mediated entities (MED) contains inferrables, containing inferrables, and anchored brand-new discourse entities, and the set of hearer-new entities (NEW) contains brand-new discourse entities. OLD is ranked before MED and NEW, and MED is ranked before NEW. If the two entities in question carry the same information status, an entity in the preceding utterance is ranked higher than an entity in the current utterance. If both are in the same utterance, the ranking is determined by linear order, with the first entity ranked higher than the subsequent one. A formalisation of the complete ranking is shown in Table 2.
Table 2 Ranking constraints on the S-List (Strube 1998)
(1) If $x \in \mathrm{OLD}$ and $y \in \mathrm{MED}$, then $x \prec y$.
    If $x \in \mathrm{OLD}$ and $y \in \mathrm{NEW}$, then $x \prec y$.
    If $x \in \mathrm{MED}$ and $y \in \mathrm{NEW}$, then $x \prec y$.
(2) If $x, y \in \mathrm{OLD}$, or $x, y \in \mathrm{MED}$, or $x, y \in \mathrm{NEW}$,
    then if $utt_x \succ utt_y$, then $x \prec y$,
    if $utt_x = utt_y$ and $pos_x < pos_y$, then $x \prec y$.
The algorithm processes the text incrementally. It is stated as follows:
1. If a referring expression is encountered,
   (a) if it is a pronoun, test the elements of the S-List in the given order until the test succeeds;
   (b) update S-List; the position of the referring expression under consideration is determined by the S-List-ranking criteria which are used as an insertion algorithm.
2. If the analysis of utterance U is finished, remove all discourse entities from the S-List that are not realized in U.
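The ranked insertion and the pronoun test can be sketched as follows. This is a minimal sketch under stated assumptions rather than the Strube (1998) implementation: the names `Entity`, `rank_key`, `insert`, `resolve_pronoun` and `finish_utterance` are hypothetical, agreement is reduced to person and number, and the tie-break between utterances follows the prose of this section (an entity from the preceding utterance outranks one of equal status from the current utterance).

```python
from dataclasses import dataclass
from typing import List, Optional

STATUS_RANK = {"OLD": 0, "MED": 1, "NEW": 2}  # OLD before MED before NEW


@dataclass(frozen=True)
class Entity:
    name: str
    status: str          # "OLD", "MED" or "NEW"
    utt: int             # index of the utterance realising the entity
    pos: int             # linear position within that utterance
    person: int = 3
    number: str = "sg"


def rank_key(e: Entity):
    # Information status first, then utterance, then linear position.
    return (STATUS_RANK[e.status], e.utt, e.pos)


def insert(slist: List[Entity], entity: Entity) -> List[Entity]:
    """Return a new S-List with `entity` at its ranked position.

    An earlier entry with the same name is replaced, which models a
    renewed mention of the same discourse entity.
    """
    others = [e for e in slist if e.name != entity.name]
    return sorted(others + [entity], key=rank_key)


def resolve_pronoun(slist: List[Entity], person: int, number: str) -> Optional[Entity]:
    """Step 1(a): the first entity in ranked order that agrees with the pronoun."""
    for e in slist:
        if e.person == person and e.number == number:
            return e
    return None


def finish_utterance(slist: List[Entity], utt: int) -> List[Entity]:
    """Step 2: keep only the entities realised in utterance `utt`."""
    return [e for e in slist if e.utt == utt]
```

In the dialogue setting described below, `finish_utterance` would be called at the end of each Synchronizing Unit rather than at the end of each utterance.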
The test described in point (1) succeeds when an entity is found which is specified by an NP with the same person and number as the anaphor. In our method, discourse entities are also added to the S-List immediately after they are encountered, and we adopt the same ranking as Strube (1998). resolveInd consists of a search through the S-List for an antecedent matching with respect to gender and number. As was pointed out in section 5, the term utterance requires a different interpretation in spoken dialogues and we wish the algorithm to take common ground into account. We therefore replace the utterance unit of the Strube 1998 algorithm with the Synchronizing Unit (SU), which, as defined in section 5, consists of an Initiation and an Acknowledgment at turn transitions, or just an Initiation in mid-turn. At the end of each SU all discourse entities which are not referred to again are removed from the S-List. This means that the size and classification of the dialogue acts determine the set of potential antecedents of an anaphor.
6.2 Resolving discourse-deictic anaphora
The method for the classification of the different types of pronouns and demonstratives described in section 4 is a major extension to the Strube (1998) algorithm. In addition to the S-List for individual anaphora, our algorithm also makes use of an A-List, which contains the referents of discourse-deictic anaphors. The function resolveDD begins with a search through the A-List. It was noted in section 2 that individual anaphors behave differently from discourse-deictic anaphors, in that the former specify entities already present in the discourse model, whereas the latter can be used to create new referents through referent coercion. For this reason we keep the two referent types separate. Unlike the S-List, which contains the discourse entities specified by each NP, the A-List only contains discourse entities previously referred to anaphorically with discourse-deictic pronouns and demonstratives. It does not contain the abstract objects specified by each sentence and VP. The A-List is not necessary for first-time anaphoric reference but comes into play with multiple references to the same abstract object, as in example 17 above, and in the following, taken from the corpus:
(32) I B.66 . . . and we make it so easy for them [to stay there with welfare that they can get by just signing some papers.]_i
     I A.75 granted, they can do that_i very easily.
     I      It_i's easy to do,
     I      but look where it_i puts them. (SW2403)
In this example, we do not want to indicate that the neuter pronouns in the second and third utterance of A.75 each cospecify with their preceding I. Instead the algorithm should co-index them both with the discourse-deictic demonstrative in the first utterance of A.75. The demonstrative adds the event concept entity associated with the preceding VP to the discourse model. The algorithm adds the entity to the A-List. The subsequent discourse-deictic pronouns look in the A-List for referents. Only when there is no discourse entity in the A-List does a discourse-deictic anaphor create a new one. Like the S-List, the A-List is cleaned up at the end of each SU, meaning that referents which were not referred to again are removed. This reflects Passonneau's (1991) idea that the referents of discourse-deictic anaphors are lost immediately after intervening utterances (cf. section 2.3).
6.2.1 Context Ranking: dialogue acts and the Right Frontier Rule
If the A-List is empty (which is usually the case), the algorithm looks through the linguistic context for an appropriate antecedent constituent, i.e. a non-NP constituent, which can function as an antecedent for a discourse-deictic anaphor. The order in which the possibilities are tried out is determined by the Context Ranking (examples are given below):
Context Ranking:
(i) A-List.
(ii) Within same I: Clause to the left of the clause containing the anaphor.
(iii) Within previous I: Rightmost main clause (and subordinated clauses to the right of that main clause).
(iv) Within previous I's: Rightmost complete sentence (if previous I is incomplete sentence).
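The A-List lookup and the four steps of the Context Ranking can be sketched as a small function. This is only an illustration under simplifying assumptions, not the authors' code: the names `Initiation` and `resolve_dd` are hypothetical, clauses are plain strings supplied by some upstream syntactic analysis, `previous` is assumed to hold only acknowledged I's (most recent first), and when the A-List holds several referents the most recent one is returned, which the text does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Initiation:
    text: str                              # full text of the I
    rightmost_main_clause: Optional[str]   # None if the I has no complete main clause


def resolve_dd(a_list: List[str],
               left_clause: Optional[str],
               previous: List[Initiation]) -> Optional[str]:
    """Return an antecedent for a discourse-deictic anaphor, or None.

    `left_clause` is the clause to the left of the anaphor within the same I.
    """
    if a_list:                                   # (i) A-List
        return a_list[-1]
    referent: Optional[str] = None
    if left_clause is not None:                  # (ii) same I
        referent = left_clause
    else:
        passed: List[str] = []                   # elliptical I's skipped so far
        for init in previous:
            if init.rightmost_main_clause is not None:
                # (iii) if this is the immediately preceding I, (iv) otherwise:
                # the complete clause plus the elliptical material after it
                referent = " ".join([init.rightmost_main_clause] + passed[::-1])
                break
            passed.append(init.text)
    if referent is not None:
        a_list.append(referent)                  # the coerced referent joins the A-List
    return referent
```

A second discourse-deictic anaphor within the same SU then finds the referent under step (i), as with the neuter pronouns in A.75 of example (32).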
Point (ii) in the ranking indicates that if the A-List is empty, the algorithm looks first within the I containing the anaphor for the first clause to the left of the anaphor. This is successful in cases such as example (33) below:
(33) I B.104 I hope that, uh [they will start picking up on some of these things and, and getting involved]_i, because that_i's the only way that we're going to get out of it. (SW2403)
If there is no clause to the left, as in example (34), the algorithm looks to the previous I and takes the rightmost main clause and the subordinate clause to the right of that main clause (point (iii) in the ranking). Main and subordinated clauses preceding the first main clause are ignored.
(34) I A.50 because if you tell everybody everything, [everybody in the world would know because they'd put it on TV]_i,
     A B.51 Right.
     I A.52 and that_i wouldn't do us any good. (SW3241)
In some cases, there is no complete main clause in the preceding I alone. Point (iv) in the ranking indicates that the algorithm then looks to all preceding I's until a completed main clause is found. In example (35) (an extract from Figure 3 in the previous section), Speaker A's utterance in A.83 is elliptical but the preceding question in B.82 can be used to form a syntactically complete clause.
(35) I   B.82 And [they picked them up, what for?
     A/I A.83 Disturbing the trash or something like that.]_i
     A   B.84 My gosh, Oh, ho, ho, ho. Oh dear. Well in our area right now,
     I   A.85 It_i just blew my mind. (SW2403)
Webber's Right Frontier Rule (see section 2) is not violated because the Context Ranking is expressed in terms of dialogue acts. This means that although the text referring to the antecedent is often not literally adjacent to the anaphor, it is still within the adjacent SU. Intervening A's (B.51 in (34) and B.84 in (35)) are invisible for the purpose of adjacency. Unacknowledged I's, i.e. those not belonging to an SU (B.84 in (35)), are also invisible for discourse-deictic reference.
6.3 Anaphor classification and resolution
As noted in section 2, the predicative context of discourse-deictic anaphors determines what type of abstract object they refer to, i.e. whether they refer to states, events, event concepts, propositions, or facts. Our algorithm at present does not have access to the formalized semantic information that would be necessary to make these distinctions explicit but we assume that the predicate of the anaphor creates a referent of the correct type. We also use the predicative context of the anaphor to distinguish between some individual and abstract anaphors. We define an anaphor to be I-incompatible (cannot refer to an individual object) or A-incompatible⁴ (cannot refer to an abstract object) if it occurs in one of the corresponding contexts described in Table 3. An anaphor in the object position of the verb assume, for example, is unlikely to have a concrete NP antecedent. This context is therefore described as being I-incompatible in the table. Conversely, the object position of the verb eat is unlikely to have an abstract entity such as an event or a proposition as its referent, and the context is listed as A-incompatible.
⁴ The A and I in this terminology should not be confused with the A and I used to refer to Acknowledgments and Initiations; this similarity is a coincidence.
Table 3 I-incompatibility and A-incompatibility
I-Incompatible (*I): anaphors in the x-position cannot refer to individual, concrete entities.
• Equating constructions where a pronominal referent is equated with an abstract object, e.g. x is making it easy, x is a suggestion.
• Copula constructions whose adjectives can only be applied to abstract entities, e.g. x is true, x is correct, x is right.
• Arguments of propositional attitude verbs, arguments of verbs which mainly take S'-complements, e.g. assume x; say x.
• Object of do (do x).
• Anaphoric referent is equated with a 'reason', e.g. x is because I like her.
• Anaphor occurs in cleft construction with how, why, e.g. x is why he's late.
A-Incompatible (*A): anaphors in the x-position cannot refer to abstract entities.
• Equating constructions where a pronominal referent is equated with a concrete individual referent, e.g. x is a car, x is a nice place to visit.
• Copula constructions whose adjectives can only be applied to concrete entities, e.g. x is expensive, x is tasty, x is loud.
• Arguments of verbs describing physical contact/stimulation, which are generally not used metaphorically, e.g. break x, smash x, eat x, drink x, smell x, swallow x.
It is clear that there are problems associated with such tables. One point is that the predicates are in most cases preferentially associated with either abstract or individual referents rather than categorically (see Section 9 for a discussion of this point). This means that although a predicate may be listed as I-incompatible, an individual referent may still be acceptable in some instances, and although a predicate may be listed as A-incompatible, an abstract object referent may be acceptable in some instances. While the lists do not reflect language competence precisely, they do describe the predominating language use and therefore greatly enhance the performance of the algorithm because they help avoid a large number of errors. The majority of predicates are not contained in the table. Most predicative contexts, e.g. know x or x is good, allow both concrete and abstract referents in their argument positions. I- or A-incompatibility is determined before the application of the actual algorithm. If an anaphor occurs in a context not specified on the lists, that is, it is neither I- nor A-incompatible, the classification is determined by the resolution algorithm.
The anaphora resolution algorithm is shown in Tables 4 and 5. If a pronoun (third person singular neuter) is encountered (Table 4), the function resolveDD is evaluated if the pronoun is I-incompatible (case 1), and the function resolveInd is evaluated if the pronoun is A-incompatible (case 2). In the case of success the pronoun is classified as DDPro (discourse-deictic) or IPro (individual), respectively. In the case of failure, the pronouns are classified as VagPro (vague). If the pronoun is neither I- nor A-incompatible (i.e. the predicative context of the pronoun is ambiguous in this respect), the classification is only dependent on the success of the resolution, i.e. on the availability of referents in the S/A-Lists. The function resolveInd is evaluated first (case 3) because of the observed preference for pronouns to have individual antecedents. If successful, the pronoun is simultaneously resolved and classified as IPro; if unsuccessful, the function resolveDD attempts to resolve the pronoun (case 4). If this, in turn, is successful, the pronoun is resolved and classified as DDPro; if it is unsuccessful it is classified as VagPro, indicating that the pronoun cannot be resolved using the linguistic context. The procedure is similar in the case of demonstratives (Table 5).
The only difference is that case 3 and case 4 are reversed to capture the preference for demonstratives to be discourse-deictic. Third person masculine or feminine pronouns are resolved directly by a look-up in the S-List as these cannot be discourse-deictic and are almost never vague. Third person plural pronouns for which antecedents can be found in the S-List are classified as IPro; if they cannot be resolved, they are marked as IEPro (inferrable-evoked).
6.4 An example
The extract from the corpus shown in Table 6 is used to exemplify the algorithm. The leftmost column lists the SU's (28- indicates the beginning, -28 the end of the first SU in the example), the second column gives the dialogue act labels and the third the speakers and turns. For ease of representation, the S- and A-Lists are only given below each SU in the state they are at that point in the discourse, and not each time they are updated.
Miriam Eckert and Michael Strube 79 Table 5 Demonstrative resolution algorithm
1. case PRO is I-incompatible if resolveDD(mO) then classify as DDPro else classify as VagPro 2. case PRO is A-incompatible if resolveInd(PRO) then classify as IPro else classify as VagPro 3. case PRO is ambiguous if resolveInd{PKO) then classify as IPro 4. else if r«o/i>eDD(PRO) then classify as DDPro else classify as VagPro
1. case DEM is I-incompatible if resolveDD(DEM) then classify as DDDem else classify as VagDem 2. case DEM is A-incompatible if resolvelndpEU) then classify as LDem else classify as VagDem 3. case DEM is ambiguous if resolveDD(DEM) then classify as DDDem 4. else if resolveInd(DEbA) then classify as IDem else classify as VagDem
Table 6 Example analysis 28-28
I A
B.i8 A. 19
And [she, ended up going to the [University of Oklahoma]2]3. Uh-huh. S: [DE,: she, DE 2 : Univ. of Oklahoma]
29-29
I
B.20
I can say thatj because it 2 was a big well known school, S: [DE 2 :it] A; [DE3: that]
30-30
I
it 2 had a well known eduction4— S: [DE2: it, DE 4 : education]
At the end of SU 28, the S-list contains the referents of the NPs she and University of Oklahoma. The demonstrative that in turn B.20 is in the object position of the verb say and therefore classified as I-incompatible. The Context Ranking must then determine its referent. There has been no previous discourse-deictic reference so the A-list is empty (or non-existent). There is no clause in the same I as the anaphor so it looks to the preceding I and gets the referent of the main clause she ended up going to the University of Oklahoma. This referent is added to the A-list as Discourse Entity^ (DEJ. The first pronoun it in B.20 is in an A-incompatible position as the copula construction equates it with a concrete referent (a big well-known school). The algorithm searches through the previous S-List for the highestranked referent, which in this case is the only referent DE2. In SU 30 there is another pronoun which again is in an A-incompatible context and the S-List must be looked at for an antecedent (DE2). Through repeated mention this referent is thus kept in the S-List for the entire
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
Table 4 Pronoun resolution algorithm
80 Dialogue Acts, Synchronizing Units, and Anaphora Resolution
length of the extract. At the end of SU 30 no reference has been made to the entity in the A-List (DE3) so this list is once again empty.
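The case distinctions of Tables 4 and 5 can be written down as one small function. The following is a minimal sketch, not the authors' implementation: the name classify, the string labels for the predicative context, and the two resolver callbacks (standing in for resolveInd and resolveDD, and returning True when an antecedent is found) are assumptions made for illustration.

```python
from typing import Callable

# A resolver returns True when it finds an antecedent for the anaphor.
# Both resolvers are placeholders for resolveInd / resolveDD (assumed interface).
Resolver = Callable[[str], bool]


def classify(anaphor: str, form: str, context: str,
             resolve_ind: Resolver, resolve_dd: Resolver) -> str:
    """Classify a third person singular neuter pronoun or a demonstrative.

    form:    "Pro" (Table 4) or "Dem" (Table 5)
    context: "I-incompatible", "A-incompatible" or "ambiguous"
    """
    ind, dd, vague = "I" + form, "DD" + form, "Vag" + form

    if context == "I-incompatible":      # case 1: only a discourse-deictic reading
        return dd if resolve_dd(anaphor) else vague
    if context == "A-incompatible":      # case 2: only an individual reading
        return ind if resolve_ind(anaphor) else vague

    # cases 3 and 4: ambiguous context, order of attempts depends on the form
    attempts = [(resolve_ind, ind), (resolve_dd, dd)]
    if form == "Dem":                    # demonstratives prefer discourse deixis
        attempts.reverse()
    for resolver, label in attempts:
        if resolver(anaphor):
            return label
    return vague


# Hypothetical list contents, loosely modelled on the example extract.
s_list = ["University of Oklahoma"]
a_list = []

def has_ind(anaphor: str) -> bool:
    return bool(s_list)

def has_dd(anaphor: str) -> bool:
    return bool(a_list)

print(classify("it", "Pro", "ambiguous", has_ind, has_dd))         # IPro
print(classify("that", "Dem", "I-incompatible", has_ind, has_dd))  # VagDem
```

Passing the resolvers in as callbacks keeps the classification logic separate from the S-List and A-List bookkeeping, which is not modelled here.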
7 EMPIRICAL EVALUATION
Our data consisted of five randomly selected dialogues from the Switchboard corpus of spoken telephone conversations (LDC 1993). We empirically evaluated

o the hand annotation of three dialogues for dialogue act units, dialogue act labels, classification of pronouns, classification of demonstratives and the co-indexation of anaphors;
o the classification and co-indexation of anaphors in the same three dialogues by the algorithm.

Two dialogues were used to train the two annotators (SW2041, SW4877), and three further dialogues for testing hand annotation and algorithm performance (SW2403, SW3117, SW3241).

7.1 Reliability of hand annotation
As a measure of inter-coder reliability we used the Kappa statistic, which was first suggested for linguistic classification tasks by Carletta (1996), and has since been used by others (e.g. Carletta et al. 1997; Passonneau & Litman 1997; Poesio & Vieira 1998). This statistic measures the percent agreement between annotators but adjusts it by the percent chance agreement for a particular classification task, taking into account the relative frequency of each class. The formula is stated as follows, where PA is the actual agreement between annotators, and PE is the agreement between annotators one would expect by chance:

(36) $K = \frac{PA - PE}{1 - PE}$

A K of more than .80 is generally assumed to indicate high reliability of the classifications, a K between .68 and .80 allows tentative conclusions, while a K lower than .68 shows that the classification is not reliable.

Dialogue acts. In the first classification task, turns were segmented into dialogue act units. For the purpose of applying the K statistic we turned the segmentation task into a classification task by using boundaries between dialogue acts as one class, and non-boundaries as the other (see Passonneau & Litman 1997 for a similar practice). Table 7 shows the results. N is the total number of units (boundaries plus non-boundaries), and Z is the total agreement score, where each unit gets 1 if both annotators agree on its classification and 0 if they do not. The percent agreement (PA) between the annotators was 98.35%, and K = 0.92, indicating high reliability of the annotations. These dialogue act units were then classified as Initiations (I), Acknowledgments (A), Acknowledgment/Initiations (A/I), and no dialogue act (No). For this test we used only those dialogue act units which the annotators agreed about. The PA over labels given to the dialogue act units was 92.6%, K = 0.87, again indicating that it is possible to annotate these classes reliably (Table 8).

Table 7 Dialogue Act Units
            SW2403   SW3117   SW3241   Σ
Non-Bound   3372     3332     1717     8421
Bound       454      452      241      1147
N           1913     1892     979      4784
Z           1877     1866     962      4705
PA          0.9812   0.9863   0.9826   0.9835
PE          0.7908   0.7896   0.7841   0.7890
K           0.9100   0.9347   0.9200   0.9217

Table 8 Dialogue act labels
            SW2403   SW3117   SW3241   Σ
I           230      211      108      549
A           98       120      68       286
A/I         38       41       16       95
No          0        8        8        16
N           183      190      100      473
Z           167      181      90       438
PA          0.9126   0.9526   0.9000   0.9260
PE          0.4774   0.4201   0.4152   0.4273
K           0.8327   0.9183   0.8290   0.8708

Individual and abstract object anaphora. For the classification of pronouns (IPro, DDPro, VagPro, IEPro) a PA of 87.5% was measured, K = 0.81 (Table 9). For the classification of demonstratives (IDem, DDDem, VagDem) PA was 90.78%, K = 0.80 (Table 10).
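The reliability figures can be recomputed from the raw counts. The sketch below is an illustration rather than the authors' procedure: the function name kappa and its parameters are assumptions, and chance agreement PE is taken to be the sum of squared category proportions pooled over both annotators, an assumption that reproduces the PE values reported for the dialogue act units in Table 7.

```python
def kappa(category_counts, agreements, n_units):
    """Return (PA, PE, K) for two annotators.

    category_counts: labels assigned per category, summed over both annotators
    agreements:      number of units on which the annotators agree (Z)
    n_units:         number of classified units (N)
    """
    total_labels = sum(category_counts)  # 2 * N if both annotators label every unit
    pa = agreements / n_units
    # Assumed chance model: squared pooled proportion of each category.
    pe = sum((count / total_labels) ** 2 for count in category_counts)
    return pa, pe, (pa - pe) / (1 - pe)


# Totals over the three test dialogues for dialogue act units (Table 7):
# 8421 non-boundary and 1147 boundary labels, Z = 4705, N = 4784.
pa, pe, k = kappa([8421, 1147], agreements=4705, n_units=4784)
print(f"PA = {pa:.4f}, PE = {pe:.4f}, K = {k:.4f}")
```

With these inputs the result agrees with the reported PA of 98.35% and K of 0.92 to the precision given in the text.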
Table 9 Classification of pronouns
            SW2403   SW3117   SW3241   Σ
IPro        120      148      5        273
DDPro       33       5        9        47
VagPro      31       20       26       77
IEPro       24       20       86       130
N           104      97       63       264
Z           83       90       58       231
PA          0.7980   0.9278   0.9206   0.8750
PE          0.3935   0.6039   0.5151   0.3571
K           0.6670   0.8170   0.8363   0.8055

Table 10 Classification of demonstratives
            SW2403   SW3117   SW3241   Σ
IDem        9        19       2        30
DDDem       45       34       28       107
VagDem      5        3        6        14
N           30       28       18       76
Z           27       26       16       69
PA          0.9000   0.9286   0.8888   0.9078
PE          0.5919   0.4866   0.6358   0.5430
K           0.7550   0.8609   0.6949   0.7985

Co-indexation of anaphora. We used only those anaphors whose classification both annotators agreed upon. The annotators then marked the antecedents and co-indexed them with the anaphors. The results were compared and the annotators agreed upon a reconciled version of the data. Annotator accuracy was then measured against the reconciled version. Table 11 shows that accuracy ranged from 98.4% (Annotator A) to 96.1% (Annotator B) for individual anaphors and from 85.7% to 94.3% for abstract anaphors.

Table 11 Annotators' agreement about antecedents of anaphora against key
                                      SW2403   SW3117   SW3241   Σ
Individual         A  Agreement       55       69       3        127
                      No Agreement    2        0        0        2
                   B  Agreement       56       65       3        124
                      No Agreement    1        4        0        5
Discourse-deictic  A  Agreement       31       15       14       60
                      No Agreement    7        2        1        10
                   B  Agreement       35       16       15       66
                      No Agreement    3        1        0        4

7.2 Performance of the algorithm
We then used the reconciled version of the annotation as the key for the individual and abstract anaphora resolution algorithms. Our measure of the algorithm's success considered both precision and recall. Precision and recall are measured by comparing the algorithm's results to the key, with the key being considered 'correct' at all times. Precision indicates how many of the anaphors resolved by the algorithm were correct. Recall indicates how many of the anaphors resolved in the key were resolved correctly by the algorithm. This distinction is important for the following reason: an algorithm with high precision but low recall makes few mistakes but leaves out many of the anaphors resolved in the key. Conversely, an algorithm with high recall but low precision gets most of the anaphors resolved in the key but in addition resolves many more anaphors that were deemed unresolvable in the key.

For individual anaphors, Precision was 66.2% and Recall 68.2% (Table 12); for discourse-deictic anaphors Precision was 63.6% and Recall 70% (Table 13). The low value for precision indicates that the classification did not perform very well. Only few of the anaphors resolved incorrectly were classified correctly. One of the most common errors was that a discourse-deictic or vague anaphor was classified as individual because an individual antecedent was available. A source of errors with respect to the resolution was that we did not allow the domain of the antecedent to exceed one SU. However, exactly this restriction allowed us to resolve many of the discourse-deictic anaphors and also classify a high percentage of VagPros and IEPros correctly.

Table 12 Results of the individual anaphora resolution algorithm
                          SW2403   SW3117   SW3241   Σ
No. Resolved Correctly    35       52       1        88
No. Resolved Overall      50       77       6        133
No. Resolved in Key       57       69       3        129
Precision                 0.7      0.675    0.167    0.662
Recall                    0.614    0.754    0.333    0.682

Table 13 Results of the discourse-deictic anaphora algorithm
                          SW2403   SW3117   SW3241   Σ
No. Resolved Correctly    25       11       13       49
No. Resolved Overall      38       19       20       77
No. Resolved in Key       38       17       15       70
Precision                 0.658    0.579    0.65     0.636
Recall                    0.658    0.647    0.867    0.7
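Precision and recall as defined in section 7.2 follow directly from the three counts given for each algorithm. The following is a minimal sketch; the function name precision_recall and its argument names are chosen here for illustration, and the counts are the totals over the three test dialogues reported in Tables 12 and 13.

```python
def precision_recall(resolved_correctly, resolved_overall, resolved_in_key):
    """Precision: share of the algorithm's resolutions that match the key.
    Recall: share of the key's resolutions that the algorithm reproduces."""
    precision = resolved_correctly / resolved_overall
    recall = resolved_correctly / resolved_in_key
    return precision, recall


# (resolved correctly, resolved overall, resolved in key)
totals = {
    "individual": (88, 133, 129),
    "discourse-deictic": (49, 77, 70),
}

for kind, counts in totals.items():
    p, r = precision_recall(*counts)
    print(f"{kind}: precision {p:.3f}, recall {r:.3f}")
```

The two measures share a numerator and differ only in the denominator, which is why an algorithm that resolves many anaphors deemed unresolvable in the key loses precision without losing recall.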
8 COMPARISON TO RELATED WORK
Both Webber (1991) and Asher (1993) describe the phenomenon of abstract object anaphora and describe restrictions on the set of potential antecedents. They do not, however, concern themselves with the problem of how to classify a particular pronoun or demonstrative as individual or abstract. Also, as they do not give preferences on the set of potential candidates, their approaches are not intended as attempts to resolve abstract object anaphora.

To our knowledge, little research has been carried out in the area of anaphora resolution in dialogues. LuperFoy (1992) does not present a corpus study, meaning that statistics about the distribution of individual and abstract object anaphora or about the success rate of her approach are not available. Byron & Stent (1998) present extensions of the centering model (Grosz et al. 1995) for spoken dialogue and identify several problems with the model. However, they also do not present data on the resolution of pronouns in dialogues and do not mention abstract object anaphora.

More recently, Zollo & Core (1999) presented their work on the extraction of grounding tags (which correspond to Nakatani & Traum's (1999) Common Ground Units) from dialogue tags. Their work is based on the same idea as ours, that Common Ground Units/Synchronizing Units can be derived from dialogue acts.

9 CONCLUSIONS AND FUTURE WORK
We consider the work presented here to make important contributions to the study of anaphora in two respects. First, we have presented a model of anaphora resolution in spontaneous spoken dialogues. In particular, we have provided a method of structuring dialogues using dialogue acts to define the domain for potential antecedents, thus avoiding the problems that incomplete utterances, repetitions, false starts and utterances with no content words present for methods relying purely on syntactic units.

Secondly, we have provided a classification system for the different types of pronouns and demonstratives found in spoken language. This makes it possible to state from the outset which ones are in principle resolvable and which ones do not have linguistic antecedents. Furthermore, the empirical analysis has drawn attention to the large number of pronouns with non-NP antecedents and with no linguistic antecedents.

For the field of computational linguistics, we hope to have provided a basis for the application of resolution algorithms to spoken language. An important contribution in this respect is the observation that only two of the pronoun and demonstrative types identified by us are resolvable. Individual anaphors, i.e. those with NP antecedents, have been dealt with by most existing algorithms. We have identified some important criteria that can be used to resolve the second type, i.e. those involving discourse deixis. Our algorithm uses information supplied by the anaphor's predicate as well as the form of the anaphor itself (pronoun vs. demonstrative) to distinguish discourse-deictic from individual reference. For the resolution process of discourse-deictic references, dialogue acts are again used to function as antecedents. We have shown that a model based on these criteria is viable.

We have also identified weak points in the model which could be addressed by future research. As mentioned in section 6, our use of predicative information does not adequately reflect language use, as it generalises over preferences by making a binary distinction between verbal argument positions requiring individual and abstract object reference. While this allows the algorithm to distinguish many instances of individual and abstract anaphora, the overgeneralization also results in some mistakes. The errors arise primarily for two reasons. The first is that some verbs can be used metaphorically so that physical contact verbs such as swallow, which we list as A-incompatible, can have abstract object anaphors in their argument positions, e.g. I told him that [he'd been fired]_i and he swallowed it_i. Secondly, in our anaphor classification, individual anaphors are those co-indexed with NPs, and discourse-deictic anaphors are those co-indexed with VPs and clauses. This is a syntactic distinction. Our distinction between A- and I-incompatible contexts, on the other hand, is semantic, separating abstract from concrete referents. While there is a correlation between NPs and concrete referents on the one hand and between clauses and abstract referents on the other, there are exceptions. Most notably, there are many NPs that specify abstract entities, and that can therefore function as antecedents for anaphors in so-called I-incompatible verbal contexts, such as the event-specifying subject position of happen, e.g. The accident_i ... It_i happened yesterday.

To improve this situation, we are currently looking at the possibility of linking the algorithm to a lexical database such as WordNet (see Fellbaum 1998) to provide semantic information. In WordNet, the NP accident (Sense 1), for example, is listed as a hyponym of event, thus explaining why it can act as an antecedent for an anaphor we predict to require an event referent:

(37) accident -- (a mishap; especially one causing injury or death)
       => mishap, misadventure, mischance -- (an instance of misfortune)
         => misfortune, bad luck -- (unnecessary and unforeseen trouble)
           => trouble -- (an event causing distress or pain; 'what is the trouble?')
             => happening, occurrence, natural event -- (an event that happens)
               => event -- (something that happens at a given place and time)
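The check that such a link to a lexical database would provide can be illustrated with the hypernym chain of example (37). This is only a sketch under stated assumptions: the dictionary HYPERNYM is copied by hand from (37), keeping the first word of each synonym set, and the function is_kind_of is a hypothetical helper; a real system would query WordNet itself instead of a hand-built table.

```python
# Hypernym links copied from example (37), first lemma of each synonym set only.
# A hand-built stand-in for a lexical database lookup (assumption).
HYPERNYM = {
    "accident": "mishap",
    "mishap": "misfortune",
    "misfortune": "trouble",
    "trouble": "happening",
    "happening": "event",
}


def is_kind_of(noun: str, target: str) -> bool:
    """Follow hypernym links upwards and report whether target is reached."""
    seen = set()
    while noun in HYPERNYM and noun not in seen:
        seen.add(noun)              # guards against cycles in the table
        noun = HYPERNYM[noun]
        if noun == target:
            return True
    return False


# An NP such as "the accident" could then be admitted as an antecedent for an
# anaphor in a context that requires an event referent.
print(is_kind_of("accident", "event"))  # True
print(is_kind_of("event", "accident"))  # False
```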
An additional problem is that, as was pointed out in section 4, there are different types of abstract objects that discourse-deictic anaphors can specify. Currently our algorithm does not distinguish between events, states, propositions and facts in the A-List. We assume, following Asher (1993), that the anaphor and its predicate select a referent of the correct type. It is clear, though, that not any clause can function as antecedent for a discourse-deictic anaphor. A clause describing a state, for example, cannot function as an antecedent for an event anaphor, e.g. *[Mary knows French.]_i That_i happens frequently. We have noted in our corpus that some discourse-deictic anaphors are not immediately adjacent to their antecedents but that such anaphor-antecedent compatibility eliminates potential ambiguity. Providing the algorithm with this kind of information could be useful for selecting the correct antecedent. However, the distinction between events and states involves a complex interaction between lexical information, tense and aspect (cf. Moens & Steedman 1988), making it difficult to determine simple rules usable in an automated process.

To our knowledge, pronoun resolution algorithms have so far not been applied to the domain of spoken language. Issues such as the number of dialogue acts functioning as the antecedent domain and the characteristics of the entities in the A-List are problems that must be solved empirically. We hope to have provided a solid basis for further work in this area by identifying the specific problems and pointing towards possible solutions.

Acknowledgements
We would like to thank Donna Byron and Amanda Stent for discussing the central issues in this paper and three anonymous reviewers for helpful comments. We are also grateful for feedback from the participants of Ellen Prince's Discourse Analysis Seminar and the audiences at the Amstelogue '99 workshop and at the Linguistics Research Department, Bell Labs, Lucent Technologies. This work was funded by post-doctoral fellowship awards from the Institute for Research in Cognitive Science, University of Pennsylvania (NSF SBR 8920230).
MIRIAM ECKERT
Institute for Research in Cognitive Science
University of Pennsylvania
3401 Walnut Street, Suite 400A
Philadelphia, PA 19104, USA
[email protected]

MICHAEL STRUBE
European Media Laboratory GmbH
Villa Bosch
Schloss-Wolfsbrunnenweg 33
69118 Heidelberg, Germany
[email protected]

Received: 01.09.1999
Final version received: 14.07.2000
REFERENCES

Allen, James F. & Core, Mark (1997), DAMSL: Dialog Act Markup in Several Layers, draft of manual, March 1997.
Allen, James F., Schubert, Lenhart K., Ferguson, George, Heeman, Peter, Hee Hwang, Chung, Kato, Tsuneaki, Light, Marc, Martin, Nathaniel, Miller, Bradford, Poesio, Massimo & Traum, David (1995), 'The TRAINS project: a case study in building a conversational agent', Journal of Experimental and Theoretical AI, 7, 7-48.
Anderson, Anne H., Bader, Miles, Gurman Bard, Ellen, Boyle, Elizabeth, Doherty, Gwyneth, Garrod, Simon, Isard, Stephen, Kowtko, Jacqueline, McAllister, Jan, Miller, Jim, Sotillo, Catherine, Thompson, Henry & Weinert, Regina (1991), 'The HCRC Map Task corpus', Language and Speech, 34, 4, 351-66.
Asher, Nicholas (1993), Reference to Abstract Objects in Discourse, Kluwer, Dordrecht.
Belletti, Adriana & Rizzi, Luigi (1988), 'Psych verbs and theta theory', Natural Language and Linguistic Theory, 6, 291-352.
Byron, Donna & Stent, Amanda (1998), 'A preliminary model of centering in dialog', in Proceedings of the 17th International Conference on Computational Linguistics and 36th Annual Meeting of the Association for Computational Linguistics, Montreal, Quebec, Canada, 10-14 August 1998, 1475-7.
Carletta, Jean (1996), 'Assessing agreement on classification tasks: the kappa statistic', Computational Linguistics, 22, 2, 249-54.
Carletta, Jean, Isard, Amy, Isard, Stephen, Kowtko, Jacqueline, Doherty-Sneddon, Gwyneth & Anderson, Anne (1997), 'The reliability of a dialogue structure coding scheme', Computational Linguistics, 23, 1, 13-31.
Chomsky, Noam (1981), Lectures on Government and Binding, Foris, Dordrecht.
Clark, Herbert H. & Schaefer, Edward F. (1989), 'Contributing to discourse', Cognitive Science, 13, 259-94.
Dahl, Osten & Hellman, Christina (1995), 'What happens when we use an anaphor', in Presentation at the XVth Scandinavian Conference of Linguistics, Oslo, Norway.
Eckert, Miriam (1998), 'Discourse deixis and null anaphora in German', Ph.D. thesis, Department of Linguistics, University of Edinburgh, Edinburgh, Scotland.
Eckert, Miriam & Strube, Michael (1999), 'Resolving discourse deictic anaphora in dialogues', in Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics, Bergen, Norway, 8-12 June 1999, 37-44.
Fellbaum, Christiane (ed.) (1998), WordNet: An Electronic Lexical Database, MIT Press, Cambridge, Mass.
Grice, H. Paul (1975), 'William James lectures on logic and conversation', in The Logic of Grammar, Dickenson, Encino, CA, 64-75.
Grosz, Barbara J., Joshi, Aravind K. & Weinstein, Scott (1995), 'Centering: a framework for modeling the local coherence of discourse', Computational Linguistics, 21, 2, 203-25.
Grosz, Barbara J. & Sidner, Candace L. (1986), 'Attention, intentions, and the structure of discourse', Computational Linguistics, 12, 3, 175-204.
Gundel, Jeanette K., Hedberg, Nancy & Zacharski, Ron (1993), 'Cognitive status and the form of referring expressions in discourse', Language, 69, 274-307.
Heim, Irene (1982), 'The Semantics of definite and indefinite noun phrases', Ph.D. thesis, University of Massachusetts, published by Graduate Linguistics Student Organization.
Jaeggli, Osvaldo (1986), 'Arbitrary plural pronominals', Natural Language and Linguistic Theory, 4, 43-76.
Kamp, Hans & Reyle, Uwe (1993), From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory, Kluwer, Dordrecht.
Karttunen, Lauri (1976), 'Assertion', in James McCawley (ed.), Syntax and Semantics 7, Academic Press, New York, 363-385.
Kripke, Saul (1979), 'Speaker's reference and semantic reference', in P. French, T. Uehling & H. Wettstein (eds), Contemporary Perspectives in the Philosophy of Language, University of Minnesota Press, Minneapolis, MN, 6-27.
LDC (1993), Switchboard, Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA.
LDC (1996), CALLFRIEND American English, Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA.
LDC (1997), CALLHOME American English Speech, Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA.
Lewis, David (1979), 'Scorekeeping in a language game', in R. Baeuerle et al. (eds), Semantics from a Different Point of View, Springer Verlag, Berlin, Germany.
LuperFoy, Susann (1992), 'The representation of multimodal user interface dialogues using discourse pegs', in Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, Newark, DE, 28 June-2 July 1992, 22-31.
Moens, Marc & Steedman, Mark (1988), 'Temporal ontology and temporal reference', Computational Linguistics, 14, 2, 15-28.
Nakatani, Christine H. & Traum, David (1999), 'A two-level approach to coding dialogue for discourse structure: activities of the 1998 DRI working group on higher-level structures', in Proceedings of the ACL '99 Workshop Towards Standards and Tools for Discourse Tagging, College Park, MD, 21 June 1999, 101-108.
Passonneau, Rebecca J. (1991), 'Some facts about centers, indexicals, and demonstratives', in Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, CA, 18-21 June 1991, 3-70.
Passonneau, Rebecca & Litman, Diane J. (1997), 'Discourse segmentation by human and automated means', Computational Linguistics, 23, 1, 103-39.
Poesio, Massimo & Vieira, Renata (1998), 'A corpus-based investigation of definite description use', Computational Linguistics, 24, 2, 183-216.
Postal, Paul & Pullum, Geoffrey (1988), 'Expletive noun phrases in subcategorized positions', Linguistic Inquiry, 19, 635-70.
Prince, Ellen F. (1981), 'Towards a taxonomy of given-new information', in P. Cole (ed.), Radical Pragmatics, Academic Press, New York, NY, 223-55.
Prince, Ellen F. (1992), 'The ZPG letter: subjects, definiteness, and information-status', in W. C. Mann & S. A. Thompson (eds), Discourse Description: Diverse Linguistic Analyses of a Fund-Raising Text, John Benjamins, Amsterdam, 295-325.
Ritchie, Graeme D. (1979), 'Temporal clauses in English', Theoretical Linguistics, 6, 87-115.
Russell, Bertrand (1905), 'On denoting', Mind, 14, 479-93.
Stalnaker, Robert C. (1974), 'Pragmatic presuppositions', in M. Munitz & P. Unger (eds), Semantics and Philosophy, New York University Press, New York, NY, 197-213.
Stalnaker, Robert C. (1979), 'Assertion', in P. Cole (ed.), Syntax and Semantics 9: Pragmatics, Academic Press, New York, NY, 315-332.
Strube, Michael (1998), 'Never look back: an alternative to centering', in Proceedings of the 17th International Conference on Computational Linguistics and 36th Annual Meeting of the Association for Computational Linguistics, Montreal, Quebec, Canada, 10-14 August 1998, Vol. 2, 1251-7.
Traum, David R. (1994), 'A computational theory of grounding in natural language conversation', Ph.D. thesis, Department of Computer Science, University of Rochester, Rochester, NY.
Walker, Marilyn A. (1998), 'Centering, anaphora resolution, and discourse structure', in M. A. Walker, A. K. Joshi & E. F. Prince (eds), Centering Theory in Discourse, Oxford University Press, Oxford, 401-35.
Webber, Bonnie L. (1979), A Formal Approach to Discourse Anaphora, Garland, New York, NY.
Webber, Bonnie L. (1983), 'So what can we talk about now?', in M. Brady & R. C. Berwick (eds), Computational Models of Discourse, MIT Press, Cambridge, MA, 333-71.
Webber, Bonnie L. (1991), 'Structure and ostension in the interpretation of discourse deixis', Language and Cognitive Processes, 6, 2, 107-35.
Zollo, Teresa & Core, Mark (1999), 'Automatically extracting grounding tags from BF tags', in Proceedings of the ACL '99 Workshop Towards Standards and Tools for Discourse Tagging, College Park, MD, 21 June 1999, 109-14.
Journal of Semantics 17: 1-6
© Oxford University Press 2000
Guest Editors' Introduction
you want an apple? and imperatives like You must eat an apple! These kinds of
utterances also seem to resist a successful truth-conditional analysis, and should therefore also be analysed in terms of felicity conditions. But Austin realized that even the use of a typical constative sentence, like The apple is green, is the performance of a speech act; we assert that a certain state of affairs obtains, and by performing this speech act we want to change the world; i.e. we intend to influence the (common) beliefs of the participants of the conversation. Once one admits that typical constative sentences can be used to perform certain speech acts, the distinction between constatives and performatives becomes useless. Austin indeed came to argue that all utterences contain both constative and performative elements; an utterance must be analysed in terms of both its propositional content, i.e. description of state of affairs that is or will be the case, and its illocutionary force, i.e. the speech act performed. This final view on speech acts by Austin was taken up and systematically worked out by John Searle (1969). The goal of his work was to determine the types of (illocutionary) actions that one can perform in speaking, and to
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
Formal theories of natural language semantics grew mostly out of truthconditional theories of meaning, where the meaning of a sentence is equated with a criterion that says in which circumstances a sentence is true and in which circumstances it is false. Ordinary language philosophers, however, soon stressed the limitations of a purely truth-conditional analysis of meaning. In particular, Austin, in his posthumously published How to do Things with Words (1962), emphasized that not all sentences are used to make statements that are true or false; a declarative sentence like / hereby declare war on Zanzibar is not used to describe a state of affairs, but rather to perform a certain kind of action, i.e. it constitutes a speech act. John Austin called these utterances performatives, utterances whose purpose is to change the world, and distinguished them from constatives. Whereas constative utterances can be evaluated on the dimension of truth and falsity, performatives cannot, according to Austin, and should be evaluated on the dimension of felicity. Just as a constative utterance can be 'wrong' because the speaker misdescribes the relevant situation, a performative utterance can be 'wrong' because the speaker performs the speech act in an infelicitous way; the speaker might have used I hereby declare war on Zanzibar, for instance, without her having the appropriate authority to do so. Although the starting point of Austin's How to do Things with Words are explicit performatives, the attention soon shifted to interrogatives like Do
2 Introduction
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
define these types in terms of the necessary and sufficient conditions for their succesful performance. In traditional speech act theory the emphasis is put on the illocutionary act made by the utterance. It is assumed that with every use of a sentence we can associate a particular illocutionary act, and that this act is related to the sentence by convention. It is assumed that the illocutionary force of the utterance is expressed in the form of the sentence, and is thereby part of the meaning of the sentence. However, this part of the meaning of sentences is claimed to be irreducible to truth-conditions. The hypothesis that the illocutionary force of an utterance is conventionally associated with the sentence used is obviously true for explicit performatives where the illocutionary force is the one named by the performative verb in the matrix clause. But it is also hypothesized that the assumption holds for implicit performatives. In particular, it is assumed that the three major sentence-types of English, the imperative, interrogative and declarative, are associated with the forces of ordering, questioning, and stating respectively. A major problem for the hypothesis that illocutionary force is associated with sentences by convention is the phenomenon of indirect speech acts. On the assumption that there exists a direct relation between syntactic mood and illocutionary force, it has to be explained how we can interpret the utterance of Can you pass me the salt? as a request, rather than as a question. Searle (1975) argues that in such cases the sentence still retains its conventional illocutionary force, but that indirectly the speech act of a request is made, too. This additional meaning is then inferred via Grice's (1975) general theory of conversational implicatures. 
But once we are able to determine the indirect speech act made by an utterance in terms of a pragmatic theory, the question arises why we should adopt the conventional 'literal force' hypothesis in the first place. Indeed, in the last decades much effort has been devoted (especially in artificial intelligence) to the explanation of illocutionary force exclusively in terms of communicative intentions of language users (e.g. Cohen & Perrault 1979), thereby linking Austin's theory of speech acts with Grice's (1957) theory of meaning. The paper of Herzig & Longin in this issue belongs to this tradition. By concentrating mainly on the illocutionary force of utterances, the work done on speech acts in artificial intelligence gradually moved towards the recovery of the beliefs and intentions of the speakers. Although the (perlocutionary) effect of utterances using particular linguistic expressions was not completely neglected, most attention was given to the conditions under which speech acts can be successfully performed. The effects which particular uses of sentences have in certain contexts have been more fully studied in the context change theories of meaning, as
pioneered by Hamblin (1971) and Stalnaker (1978). According to these theories, contexts represent the (common) beliefs and preferences of the participants of the conversation, and speech acts are treated as functions from old to new contexts. In these dynamic theories of meaning, which are capable of incorporating a good deal of traditional pragmatics, speech acts can be studied for their informational consequences in information states ranging from those of the conversational partners individually to the common ground belonging to the dialogue. The context change theories of Kamp (1981), Heim (1982), and Veltman (1996) are sophisticated enough to be able to give a linguistic analysis of anaphora, presuppositions and modal expressions used in declarative sentences. Ginzburg (1996) and Groenendijk (1998) have recently extended these context change theories in order to account also for interrogative sentences, and in the contribution of van Rooy to this volume a context change analysis of imperative sentences is developed, too. Traditional speech act theory claims that the illocutionary force is conventionally associated with sentence form, but that its meaning cannot be captured in truth-conditional terms. Not only mood indicators, but also so-called discourse adverbials, discourse connectors, and parenthetical constructions are said to indicate either a particular speech act or an attitude of the speaker, and are sometimes analysed as tacit performatives (cf. Rieber 1997). Recently, dynamic semantics has been extended so that it can also account for discourse relations, and in Asher's contribution to this volume it is argued that with their help at least parenthetical constructions can be given a truth-conditional analysis after all. The study of speech acts has in recent years received a strong impulse from the development of experimental human-machine dialogue systems based on natural language. 
A central problem in the engineering of these systems is the determination of the system behaviour as a function of the interactions of the user. The system behaviour is naturally expressed as a certain speech act, which is rational given the machine's goals in the system. Likewise, the contributions of the user must be analysed in terms of their content and the user's intention: determining the speech act the user is making, as well as its content, is crucial. Two specific contributions to speech act theory have come out of this new impulse. One is an abundance of empirical study of actual dialogues, mostly performed in order to determine the requirements of the dialogue system but also in order to improve the system's recognition of the user's intentions. This effort has made it necessary to come up with classifications of speech acts that are able to describe the phenomena encountered in the dialogues, captured in so-called Wizard of Oz experiments or in similar experimental settings. In consequence, a shift has occurred from the traditional speech acts to the ones typically needed in real-life systems: the grounding of what the dialogue partner has said, the correction of the dialogue partner, cue words like uhm and uhhh, etc. Reliance on these empirical studies also underlies the work on anaphora resolution of Miriam Eckert and Michael Strube in this volume, which otherwise fits in with the context change tradition. The second has been the need for formal theories of speech acts, or formal theories in which speech acts can be defined in a way that is sufficiently explicit to meet the requirements of the dialogue management component of the system. This has led to the development of analyses of speech acts and related notions within action logics or within the standard BDI (belief-desire-intention) model (cf. Cohen & Levesque 1990). The relation between these systems from artificial intelligence and the more linguistically motivated context change theories has so far remained unexplored, though it is of course clear that there must be a significant overlap. David Traum's paper gives a thorough introduction to the issues raised by the newer classifications of speech acts that arise both in the empirical studies of dialogues and in their formalizations.
David Traum
This paper gives an overview of the different issues that arise in speech act classification, in the face of the different tasks that these classifications have to fulfil: tools for dialogue management, a common language for a team of corpus annotators, interface with various kinds of proof engines, etc. The paper attempts to reduce the misunderstandings that naturally arise when different scientific communities start dealing with a common theme, such as speech acts.
Nicholas Asher
This paper develops a truth-conditional account of parentheticals like Mary assures us in (1), within the larger theory of discourse structure developed by the author in collaboration with Lascarides.
(1) The party has started, Mary assures us.
The theory assumes that the parenthetical is connected to the matrix clause by the general mechanism that constructs discourse relations (in this case by the relation Evidence), leading to a combination of the statements in (2a)-(2c):
(2) a. The party has started.
b. Mary assured us the party has started.
c. Mary's assurance that it did is evidence for the party having started.
The main body of the paper explains variation in the interpretation of parentheticals in complex clauses, due to the discourse relations involved and the connectives that build the complex sentences.
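As a rough illustration of the combination of statements (2a)-(2c) in the summary of Asher's paper, the following Python sketch builds the three statements from a matrix clause and a parenthetical. It is a toy data structure, not Asher's formal theory: the class names Prop, Report and Relation and the function attach_parenthetical are hypothetical, and the discourse relation, which the theory derives by a general mechanism, is simply stipulated here through a default argument.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Prop:
    """A plain proposition, e.g. the content of the matrix clause."""
    text: str


@dataclass(frozen=True)
class Report:
    """An attitude or speech report: agent + verb + reported content."""
    agent: str
    verb: str
    content: Prop


@dataclass(frozen=True)
class Relation:
    """A discourse relation holding between two statements."""
    name: str
    source: Union[Prop, Report]   # the statement that lends support
    target: Union[Prop, Report]   # the statement that is supported


def attach_parenthetical(matrix, agent, verb, relation="Evidence"):
    """Return the three statements corresponding to (2a), (2b) and (2c).

    Assumption of this sketch: the relation name is passed in rather than
    inferred from the discourse context.
    """
    report = Report(agent, verb, matrix)                      # (2b)
    link = Relation(relation, source=report, target=matrix)   # (2c)
    return [matrix, report, link]                             # (2a) comes first


party = Prop("The party has started")
statement_a, statement_b, statement_c = attach_parenthetical(party, "Mary", "assure")
```

Passing a different relation name to attach_parenthetical is one way to picture the variation in interpretation that the main body of the paper attributes to the discourse relations involved.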
Miriam Eckert and Michael Strube
This paper is concerned with the resolution of pronominals in spoken dialogues and is based on an extensive corpus study. A surprising result is that many pronouns and demonstratives do not have a linguistic antecedent. The paper further extends a pronoun resolution algorithm due to the second author with a new dialogue segmentation method, based on the achievement of a common ground between speaker and hearer as in the dynamic framework, and with the inclusion of sentential antecedents. The approach is validated by an extensive comparison with annotated corpora.
Andreas Herzig and Dominique Longin
This paper gives an introduction to a version of the BDI-model developed for a dialogue system and extends the model with a restricted and feasible theory of belief change based on a notion of discourse topic. The topic of a formula is a set of contextualized themes associated with the formula by a function based on the atoms of the formula. In addition, it is assumed that the user is competent on certain themes, depending on the application. This allows the adoption of a user belief as part of the common ground when its theme belongs to the user competence. Likewise, it can be used as a guard on the preservation of beliefs under the occurrence of speech acts: a common ground belief is no longer inferable if the speech act questions a theme that is in the topic of the belief.
Robert van Rooy
This paper argues for a performative analysis of most occurrences of imperative sentences in terms of a context change theory. It concentrates mainly on permission sentences, and shows how intuitions concerning coordinate connectives and quantificational determiners can be explained when context change is governed by contraction. A major issue in the paper is how to account for conjunctive permissions, and several alternative proposals are discussed.
Acknowledgements
This special issue grew out of Amstelogue '99, the Third Workshop on the Semantics and Pragmatics of Dialogue. We wish to thank all who have helped to make our workshop a success: the contributors, the invited speakers, the members of the programme committee, the sponsors and in particular our two co-organizers, Noor van Leusen and Jan van Kuppevelt.
ROBERT VAN ROOY ILLC/University of Amsterdam Department of Philosophy Nieuwe Doelenstraat 15 1012 CP Amsterdam The Netherlands [email protected]
REFERENCES
Austin, John L. (1962), How to do Things with Words: The William James Lectures Delivered at Harvard University in 1955, Oxford University Press, Oxford.
Cohen, Philip R. & Perrault, Raymond (1979), 'Plan-based theory of speech acts', Cognitive Science, 3, 177-212.
Cohen, Philip R. & Levesque, Hector J. (1990), 'Rational interaction as the basis for communication', in P. Cohen et al. (eds), Intentions in Communication, MIT Press, Cambridge, MA.
Ginzburg, Jonathan (1996), 'Dynamics and the semantics of dialogue', in J. Seligman & D. Westerståhl (eds), Logic, Language and Computation, Vol. 1, CSLI Publications, Stanford.
Grice, Paul (1957), 'Meaning', Philosophical Review, 67, 377-88.
Grice, Paul (1975), 'Logic and conversation', in P. Cole & J. Morgan (eds), Syntax and Semantics 3: Speech Acts, Academic Press, New York, 41-58.
Groenendijk, Jeroen (1998), 'Questions in update semantics', in J. Hulstijn & A. Nijholt (eds), Twendial '98: Formal Semantics and Pragmatics of Dialogue, University Twente.
Hamblin, C. L. (1971), 'Mathematical models of dialogue', Theoria, 37, 130-55.
Heim, Irene (1982), 'The semantics of definite and indefinite noun phrases', Ph.D. dissertation, University of Massachusetts, Amherst.
Kamp, Hans (1981), 'A theory of truth and semantic representation', in Groenendijk et al. (eds), Formal Methods in the Study of Language, Mathematical Centre, Amsterdam, 277-322.
Levinson, Stephen C. (1983), Pragmatics, Cambridge University Press, Cambridge.
Lewis, David (1979), 'Scorekeeping in a language game', Journal of Philosophical Logic, 8, 339-59.
Rieber, Steven (1997), 'Conventional implicatures as tacit performatives', Linguistics and Philosophy, 20, 51-72.
Searle, John (1969), Speech Acts, Cambridge University Press, Cambridge.
Searle, John (1975), 'Indirect speech acts', in P. Cole & J. Morgan (eds), Syntax and Semantics 3: Speech Acts, Academic Press, New York, 59-82.
Stalnaker, Robert C. (1978), 'Assertion', in P. Cole (ed.), Syntax and Semantics, Vol. 9: Pragmatics, Academic Press, New York, 315-32.
Veltman, Frank (1996), 'Defaults in update semantics', Journal of Philosophical Logic, 25, 221-61.
HENK ZEEVAT ILLC/University of Amsterdam Computational Linguistics Spuistraat 134 1012 VB Amsterdam The Netherlands [email protected]