JOURNAL OF SEMANTICS
AN INTERNATIONAL JOURNAL FOR THE INTERDISCIPLINARY STUDY OF THE SEMANTICS OF NATURAL LANGUAGE
JOURNAL OF SEMANTICS Volume 12 Number 1
SPECIAL ISSUE: LEXICAL SEMANTICS PART I
Guest Editors: Branimir Boguraev and James Pustejovsky
CONTENTS
JAMES PUSTEJOVSKY AND BRANIMIR BOGURAEV
Introduction: Lexical Semantics in Context
ANN COPESTAKE AND TED BRISCOE
Semi-productive Polysemy and Sense Extension
NICHOLAS ASHER AND ALEX LASCARIDES
Lexical Disambiguation in a Discourse Context
Journal of Semantics 12: 1-14
©Oxford University Press 1995
Introduction: Lexical Semantics in Context
JAMES PUSTEJOVSKY
Computer Science Department, Brandeis University
BRANIMIR BOGURAEV
Apple Computer Inc., Cupertino, CA
The papers in this double issue on lexical semantics constitute not just a set of diverse yet related articles in a core area of lexical research, but rather make up a unique collection of work on the relationship between logical polysemy, sense
extension, and discourse structure. The papers included in these two issues are:
Part I: 'Semi-Productive Polysemy and Sense Extension' by Ann Copestake and
Ted Briscoe; and 'Lexical Disambiguation in a Discourse Context' by Nicholas Asher and Alex Lascarides; Part 2: 'Transfers of Meaning' by Geoffrey Nunberg; 'Aspectual Coercion and Logical Polysemy' by James Pustejovsky and Pierrette
Bouillon; and 'A Typology and Discourse Semantics for Motion Verbs and Spatial PPs in French' by Nicholas Asher and Pierre Sablayrolles.
What these papers have in common is that each addresses the following
question: what is the representation of a lexical item such that it may assume different senses in diverse contexts in composition in the semantics? That is, what is it about the representation of a lexical item that gives rise to sense extensions and to the phenomenon of logical polysemy? Although the authors in this issue approach the problem quite differently, there are at least three major subthemes running through the papers:
• the role of pragmatics and discourse structure in lexical disambiguation;
• the analysis of logical polysemy as a compositional process; and
• the treatment of sense extension and referential transfer phenomena.
Addressing the first theme are the papers by Asher and Lascarides, and Asher and Sablayrolles. These authors examine the role that lexical semantics plays in discourse-level reasoning, as well as the effects discourse coherence has on the lexical disambiguation process. The next issue, that of logical polysemy, is taken up in the contributions by Copestake and Briscoe and by Pustejovsky and Bouillon. Both these papers argue that it is the logical make-up and semantics of the lexical items which, in composition, give rise to logical polysemy and lead to novel senses. Finally, the question of sense extension is addressed by both Nunberg's contribution and the one by Copestake and Briscoe. It is important to note that assigning the contributions to different themes, by highlighting a particular aspect of the phenomenon of lexical ambiguity, does not necessarily make their positions contradictory nor place them in competition. Rather, one can comfortably attribute certain polysemous behavior to the semantics of lexical items in composition while still acknowledging the role of pragmatically inspired sense extensions and contextually determined disambiguation.

Downloaded from jos.oxfordjournals.org by guest on January 1, 2011

1 THE PROBLEM OF LEXICAL AMBIGUITY

Lexical ambiguity is one of the most difficult problems in language processing studies and, not surprisingly, is at the core of lexical semantics research. It is certainly true that most words in a language have more than one meaning, but the ways in which words carry multiple meanings can vary. Weinreich's (1964) distinction between contrastive ambiguity and complementary ambiguity is illustrative on this point. Contrastive ambiguity, traditionally known as homonymy, is the situation where a lexical item is associated with at least two distinct and unrelated meanings. This is the kind of ambiguity whose treatment is addressed by the first major subtheme in this issue. This relation is illustrated in (1)-(3) below.

(1) a. the bank of the river
    b. the richest bank in the city
(2) a. Drop me a line when you are in Boston.
    b. We built a fence along the property line.
(3) a. The judge asked the defendant to approach the bar.
    b. The defendant was in the pub at the bar.

For processing considerations, it has been conventionally assumed that homonyms such as the (a)-(b) pairs above are distributed in different contexts and would therefore not present a real challenge to disambiguation in text. This is the position taken by priming-based disambiguation strategies, such as Boguraev (1979), Waltz & Pollack (1985), and Hirst (1987), where the heuristic interpretation of lexical context helps narrow the sense selection task for ambiguous words. Although the strategy can be usefully applied to terminology in specialized domains, for general frequency words with common use senses and, possibly, specialized ones too, the problem is more difficult. For example, the word bar in (3) has at least twenty-five distinct senses in most unabridged English dictionaries (cf. Random House Unabridged Dictionary, 1993). The two contrastive senses used in (3a) and (3b) would appear to be easily distinguishable, but as Asher and Lascarides' paper demonstrates, domain-priming is not sufficient to disambiguate such lexical senses in very natural and everyday discourses. What is needed, they argue, is not a priming-based strategy, but a semantics-based approach to sense selection, where it is the unfolding logic of a
discourse and the rhetorical relations that exist between sentences that help determine the disambiguation of a lexical item in the discourse. This is the approach taken by Hobbs (1985), and more recently by Asher (1993) and Lascarides and Asher (1993). The paper by Asher and Lascarides in this issue builds on these analyses by making an explicit link between the lexical semantics of the words and the effect this information has on discourse decisions and processing, such as disambiguation.

Continuing with this theme, Asher and Sablayrolles in their contribution to this issue study the semantics of verbs of motion. In particular, they examine the contribution made by discourse rules and rhetorical relations to disambiguation of French spatial prepositions such as dans (in/into), as well as the contribution that lexical semantics of motion verb complexes makes toward determining discourse relations.

Although contrastive ambiguity is a major problem in semantic interpretation, an equally difficult problem is that of complementary ambiguity, the second theme of the papers in the special issue. Unlike homonymy, the senses in (4)-(6) exhibit a complementary polysemy, where the alternative readings are manifestations of the same core sense as it occurs in different contexts.

(4) a. The bank raised its interest rates yesterday. (i.e. the institution)
    b. The store is next to the new bank. (i.e. the building)
(5) a. John crawled through the window.
    b. The window is closed.
(6) a. Mary painted the door.
    b. Mary walked through the door.

The semantics must somehow account for how a bank can be both an institution and a building, and how a window or door can be both an aperture and a physical object. This logical connection between lexical senses is what motivated a richer semantic representation for nouns and adjectives, known as qualia structure (cf. Pustejovsky 1991). The qualia refer to modes of explanation for the object. In this approach to lexical representation, a noun such as door is inherently relational, being the reification of a physical object which contains an aperture. With a richer representation language for lexical items, one can view count-mass alternations as a type of logical polysemy as well: a particularly apt example here is the operation of 'animal grinding' (cf. Pelletier & Schubert 1986):

(7) a. Sam enjoyed the lamb.
    b. The lamb is running out in the field.
(8) a. I ordered haddock last night.
    b. The haddock are plentiful this year.
Of course, such polysemous behavior is not acceptable with all candidate nouns, as the examples in (9) illustrate.

(9) a. *We ordered cow for dinner.
    b. *The frog here is excellent.

As Copestake and Briscoe argue in their contribution to this issue, the apparent idiosyncrasies of animal grinding polysemy are due to the lexical nature of the relation between the senses. That is, the polysemy is the result of lexical rules rather than of alternations within the qualia of a single lexical item, such as with door in (6) above.
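
As a rough illustration of what it means for a sense alternation to be rule-governed yet only semi-productive, the following Python sketch treats animal grinding as a lexical rule that derives a mass 'meat' sense from a count 'animal' sense and fails to apply when a competing word already exists. Everything in it is hypothetical and invented for this illustration: the names `LexEntry`, `grind`, and `BLOCKERS`, the two-valued sense inventory, and the assumption that cow is pre-empted by beef. It does not reproduce Copestake and Briscoe's formalism, and it models only blocking by another lexical item, not the pragmatic factors they also discuss.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LexEntry:
    form: str        # orthographic form, e.g. "lamb"
    sense: str       # coarse semantic type: "animal" or "meat"
    countable: bool  # True for a count noun, False for a mass noun


# Hypothetical blocking table: animal nouns whose meat sense is assumed
# to be pre-empted by a separate, already lexicalized word.
BLOCKERS = {"cow": "beef", "pig": "pork"}


def grind(entry: LexEntry) -> Optional[LexEntry]:
    """Toy lexical rule: count 'animal' noun -> mass 'meat' noun.

    Returns None when the rule is undefined for the input or is blocked.
    """
    if entry.sense != "animal" or not entry.countable:
        return None  # the rule only takes animal count nouns as input
    if entry.form in BLOCKERS:
        return None  # pre-empted by a competing lexical item
    return LexEntry(entry.form, "meat", countable=False)


lamb = LexEntry("lamb", "animal", True)
cow = LexEntry("cow", "animal", True)

print(grind(lamb))  # derived mass-noun entry for "lamb"
print(grind(cow))   # rule is blocked, so no derived entry
```

The point of the sketch is the division of labor: the derived sense is computed by a rule over entries, not stored inside a single entry, which is what distinguishes this case from the door alternation in (6).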
There are cases of verbal polysemy which pose some difficult problems for lexical semantics as well. These involve verbs which appear to behave polymorphically, taking several different complement types:

(10) a. Mary began to read the novel.
     b. Mary began reading the novel.
     c. Mary began the novel.
(11) a. Mary enjoyed drinking the beer.
     b. Mary enjoyed the beer.

Verbs such as begin and enjoy are polysemous in that they must be able to select for a number of syntactic and semantic contexts, such as verb phrase, gerundive phrase, or noun phrase. Accomplishing this without proliferating word senses is difficult and requires restructuring the manner in which a verb's arguments are selected. For some initial discussion of this topic, see Pustejovsky (1991), Dixon (1991), and Briscoe et al. (1990). This problem is addressed in greater detail in the contribution by Pustejovsky and Bouillon for this issue. They examine the linguistic constraints on the mechanism of coercion in aspectual predicate complements. In order to understand coercive behavior, they argue, it is necessary to look at the complete paradigm associated with these verbs, both the control forms (such as (10) above) and the raising forms, illustrated below:

(12) a. The rain began to fall.
     b. Mary began to feel ill.

Rather than positing separate senses of the verb, Pustejovsky and Bouillon argue that aspectual verbs such as begin and finish are logically polysemous between their control and raising senses, and that the underlying lexical representation for the verb is the same in each form. They show how an underspecified lexical representation is able to map to either the control or raising form in the syntax, giving rise to the alternation seen in (10) and (12) above. With this distinction, the authors are able to show that complement coercion is
present only with the control sense of the predicate, subject to specific typing constraints on the complement.

Another example of contextually determined polysemy comes from the semantic behavior of adjectival modification. Here we consider only evaluative predicates such as fast in (13) below.

(13) a. a fast typist: (i.e. one who types quickly)
     b. a fast car: (i.e. one which can move quickly)
     c. a fast waltz: (i.e. one with a fast tempo)

The interpretation being selected for depends on the noun that the adjective is modifying. Pustejovsky & Boguraev (1993) argue that the adjective in these cases functions as an event predicate, modifying the TELIC quale role of the head. As Copestake and Briscoe point out in their contribution in this issue, however, this interpretation is defeasible, and can be ruled out by contextual readings.

The complementary questions of default-driven instantiation of lexical sense and defeasibility of lexical inference are central to the concerns of lexical semantics research since they are crucial both for providing a global organization of lexical knowledge and for linking lexical information into discourse contexts. Much work to date has studied possible uses of default inheritance mechanisms for globally structuring the lexicon (Daelemans et al. 1992; Briscoe et al. 1993). However, the exact manner in which lexical information is inferred by default mechanisms at the level of the lexicon is a complex problem. In particular, the lexicon-discourse interface which coordinates the overriding of lexical defaults and information inferred at discourse level must be sensitive to the varying aspects of defeasible knowledge associated with lexical items. Still, the mechanisms of default reasoning over lexically specified information are essentially the same as those used for discourse reasoning. This is a significant position in lexical research, since it suggests that a unified linguistic reasoning system is not only possible but, in fact, appropriate for all of lexical, compositional, and discourse knowledge. In this respect, the work reported here is both important and groundbreaking; and even though much more needs to be done defining the interface between lexical semantics and discourse representation, the two papers by Asher and Lascarides and by Copestake and Briscoe are the first serious attempts to bridge this gap.

Furthermore, the move within the generative lexicon theory permitting semantically underspecified lexical representations (as argued in the paper by Pustejovsky and Bouillon) is also a move towards allowing contextual information to contribute towards the full determination of a semantic representation in the discourse.

Finally, the third major theme characterizing the research reported in this issue is the question of sense extension and 'displaced reference'. Sense extension includes some cases that we have already considered, such as the
nominal polysemy of animal grinding, discussed in the article by Copestake and Briscoe. What distinguishes this from logical polysemy is the lexically idiosyncratic nature of the ambiguity, as well as the semi-productive status such extensions have in language. There is another type of sense extension, mentioned in Nunberg (1979), Fauconnier (1985), and Jackendoff (1992), which also seems difficult to characterize as logical polysemy. It is illustrated in the sentences below:

(14) a. I am parked out back.
     b. Ringo squeezed himself into the parking space.

These sentences illustrate two types of referential transfer: a type-mismatch between subject and predicate in (14a), where it is the car that is parked, not the individual; and a mismatch between verb and object, together with a non-identity between antecedent and anaphor in the binding relation in (14b). In Nunberg's contribution in this issue, such extensions of meaning are referred to as predicate transfers. In particular, he argues against a metonymic analysis, where the subject I in (14a) and the object himself in (14b) are interpreted as my car and his car respectively. Rather, his position is that there are pragmatically licensed conditions which allow the predicate to extend its sense, where it is retyped to select for the subjects that are present in the syntax.

This brief characterization of the major themes running through the special issue aims to bring forth its overall message: that lexical ambiguity is a heterogeneous phenomenon, with at least three distinct factors contributing to the contextual emergence of word senses for a particular lexical item:

• contrastive ambiguity, which is normally resolved by contextual and discourse knowledge;
• complementary ambiguity (or logical polysemy), as resolved by co-composition in the syntactic context of the sentence; and
• sense extensions, as mediated by lexical rules and specific conditions relating to the speaker and context.

In the next section we summarize the positions of each article in the issue, and demonstrate how they relate to the phenomena mentioned here.

2 POLYSEMY AND SEMANTIC TYPING

Copestake and Briscoe. In 'Semi-Productive Polysemy and Sense Extension', the authors explore the thesis that there are two types of systematic polysemies for nominals. Constructional polysemy arises in situations where there is really one lexical sense, and apparent ambiguities arise from a process of co-composition in the syntax (cf. Pustejovsky 1991), generatively giving rise to productive new
senses. Sense extension, on the other hand, requires lexical rules for deriving new senses. The latter is only semi-productive and can be blocked or pre-empted by other lexical items, or overridden, as Asher and Lascarides argue, by mechanisms introducing discourse information into the interpretation.

The lexical representation language Copestake and Briscoe employ is that developed for the ACQUILEX Lexical Knowledge Base System (LKB). It is a typed feature structure language which has been augmented with defaults and lexical rules. They combine an HPSG-like approach to syntax with Pustejovsky's notion of qualia structure and coercion in the generative lexicon framework. Following the definition of typing in Carpenter (1992), the types are organized as a lattice, and constraints are themselves seen as typed feature structures.

Copestake and Briscoe first examine how constructional polysemy is treated in their framework. They analyze subselecting adjectives such as fast and good in terms of which qualia of the head they modify. This analysis is similar to that in Pustejovsky (1993), but they also provide a formal mechanism for treating the compositional interpretation from the qualia as defeasible knowledge. For example, a fast typist is normally interpreted as one who types rapidly, but specific contexts can suggest diverse interpretations that are not inherent to the qualia of the lexical items, as when the phrase might refer to a typist who is running quickly. The authors then address the polymorphic behavior of verbs such as enjoy, as illustrated in (15).

(15) a. John enjoyed reading the book.
     b. John enjoyed the book.

They show how the operation of type coercion (Pustejovsky 1993) can be implemented in an HPSG-style syntax using the type system of the LKB. One interesting aspect of their analysis is that the coercion is performed internally to the semantics of enjoy. The treatment preserves compositionality, however, since generative mechanisms are used to specialize the complement selection to the appropriate type for the different subcategorizations.

Finally, Copestake and Briscoe turn to the analysis of sense extension, paying particular attention to mechanisms of 'grinding' and 'animal grinding'. Because of the defeasible nature of the interpretation of 'ground' nouns, they posit a general abstract lexical rule of grinding and allow for conventionalized subcases, licensed by pragmatic effects from the discourse. The effects of blocking in lexical choice are due to just such pragmatic and contextual considerations.

Asher and Lascarides. 'Lexical Disambiguation in a Discourse Context' investigates how discourse structure can affect the selection of lexical senses.
They also focus on describing the mechanisms whereby lexical semantics affects and contributes to discourse interpretation. To this aim, they integrate three components:

• a theory of discourse structure called SDRT (Kamp & Reyle 1993), which represents discourse in terms of rhetorical relations that connect together the propositions introduced by the text segments;
• an accompanying theory of discourse attachment called DICE (Lascarides & Asher 1993), which computes which rhetorical relations hold between the constituents, on the basis of the reader's background information; and
• a formal language for specifying the lexical knowledge, both syntactic and semantic, called the LRL, Lexical Representation Language (Copestake & Briscoe 1993), which, among other things, incorporates certain generative lexicon mechanisms into a typed feature structure logic.

By integrating these separate components, they are able to model the information flow in both directions: from words to discourse; and from discourse to words. For the mapping from words to discourse, Asher and Lascarides show how the LRL permits the rules for computing rhetorical relations in DICE to be generalized and simplified, so that a single rule applies to several semantically related lexical items. From discourse to words, they encode two heuristics for lexical disambiguation:

• disambiguate words so that discourse incoherence is avoided; and
• disambiguate words so that rhetorical connections are reinforced.

With these heuristics, the authors are able to handle several cases of lexical disambiguation that have until now been outside the scope of theories of lexical processing. Asher and Lascarides show how lexical processing can work in service to a theory of discourse attachment. The knowledge resources encoded in a theory of discourse attachment, however, are also useful to lexical processing. Consider the following examples and the ambiguities in them concerning the words plant, bar, and dock.

(16) a. They ruined the view.
     b. They improved the view.
     c. They put a plant there.
(17) a. The judge demanded to know where the defendant Ross was.
     b. The barrister mumbled apologetically, and said that Ross had last been seen drinking heavily.
     c. The judge told the bailiff to escort Ross from the bar to the dock.

They argue that bar in the second example is disambiguated to its 'drinking establishment' sense on the basis of constraints on coherent discourse. In contrast, plant in the first example is disambiguated on the basis of strengthening the rhetorical link between the sentences. They argue that the inference in the discourse which leads to this disambiguation is driven by the lexical semantic information associated with the qualia structure for the words.
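
The interaction of the two disambiguation heuristics can be pictured with a small Python sketch. It is a deliberate simplification and not the authors' method: DICE derives rhetorical relations by defeasible inference, whereas the sketch swaps in a hand-written table of numeric scores. The function name `disambiguate`, the sense labels, and every score are assumptions made up for this illustration; `None` stands for a reading that would leave the discourse incoherent, and a number stands for how strongly a reading supports the rhetorical link to the preceding text.

```python
from typing import Dict, List, Optional

# Hypothetical scores. None marks a reading that leaves the discourse
# incoherent; a number is the assumed strength of the rhetorical link.
Scores = Dict[str, Optional[float]]


def disambiguate(senses: List[str], scores: Scores) -> Optional[str]:
    # Heuristic 1: discard senses that would make the discourse incoherent.
    coherent = [s for s in senses if scores.get(s) is not None]
    if not coherent:
        return None  # no reading yields a coherent discourse
    # Heuristic 2: among the remaining senses, prefer the one that most
    # reinforces the rhetorical connection to the preceding text.
    return max(coherent, key=lambda s: scores[s])


# Invented values for 'bar' in (17c): one reading is assumed incoherent.
bar_scores: Scores = {"courtroom-bar": None, "drinking-establishment": 1.0}
# Invented values for 'plant' in (16c) following (16a): both readings are
# assumed coherent, and one is assumed to give the stronger link.
plant_scores: Scores = {"factory": 0.9, "vegetation": 0.4}

print(disambiguate(["courtroom-bar", "drinking-establishment"], bar_scores))
print(disambiguate(["factory", "vegetation"], plant_scores))
```

The ordering matters in this toy version: coherence acts as a filter, and reinforcement only ranks the readings that survive it, which mirrors the contrast drawn above between the bar and plant cases.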
Nunberg. In 'Transfers of Meaning', Nunberg explores the operation of predicate transfer, whereby a name of a property is mapped into a new name denoting a property to which it functionally corresponds. In particular, Nunberg discusses the meaning of predicates such as parked out back in sentences such as (18):

(18) I'm parked out back.

Predicate transfer is responsible for the ability of this predicate to describe the car while taking, as its syntactic subject, the driver of the car. Similarly, according to Nunberg, it is this property that is responsible for the lexical alternations in systematic polysemy. According to Nunberg's formulation of the phenomena, predicate transfer is subject to two general conditions:

• the basic and derived property must stand in a functional correspondence to one another;
• the derived property should be a 'noteworthy' feature of its bearer.

Nunberg argues that reference to predicate transfer allows us to maintain a very strict definition of syntactic identity, thereby ruling out all cases of 'sortal crossing', where a term appears to refer to things of two sorts at the same time, as in examples like Ringo squeezed himself into a tight space; in such a case, the reflexive is strictly coreferential with its antecedent. Nunberg claims that these observations enhance the reliability of 'zeugma' tests for ambiguity, while also highlighting a theoretical difficulty in distinguishing polysemy and generality. The results, therefore, according to Nunberg, appear to pose a difficulty for Pustejovsky's view of the distinction between logical polysemy and more general operations of sense transfer such as metaphor, etc., which generative lexicon theory claims are extralinguistic transfer phenomena.

Nunberg then turns to a discussion of nominal polysemy, an area already touched on in this issue by Copestake and Briscoe with their discussion of grinding rules. One of the problems in current lexical treatments of systematic polysemy, according to Nunberg, is that they emphasize the lexical nature of the ambiguity without looking at the compositional nature of the sense relations. As he points out, transfer is essentially a phrasal process and cannot be characterized as a purely lexical phenomenon without a loss of explanatory power (consider, for example, the lexical subregularities of Wilensky 1991, and the lexical networks of Norvig & Lakoff 1987). Nunberg looks at the phrase a widely-studied Peruvian virus, which can have four possible interpretations but,
surprisingly, not one denoting a 'disease endemic to Peru that is caused by a widely studied virus'. He admits that such effects of compositionality can be captured by coercion mechanisms such as those introduced by Pustejovsky and others working within generative lexicon theory. But as he argues, some restrictions must be placed on coercion operations in order to allow a compositional treatment of tiny incurable wart but not the ill-formed tiny incurable virus. The problem, as Nunberg sees it, is in formulating the constraints on coercion so that they do not simply recapitulate the process of phrasal composition.

After discussing the conditions of predicate transfer, Nunberg examines the syntactic consequences of this operation, and in particular, the effect of predicate transfer on reflexivization. Consider the sentences first discussed in Jackendoff (1992):

(19) a. Ringo_i squeezed himself_i into a narrow space.
     b. Yeats_i did not like to hear himself_i read in an English accent.

Because of the restricted conditions on when such co-predications are allowed, Nunberg suggests that these need not be cases of sortal crossing, but rather instances of predicate transfer, where the individuals denoted by the subject expressions are fixed, and it is the predicate which changes its sense. The reliability of the zeugma as a test for determining the polysemy of a word is preserved with this interpretation.

Pustejovsky and Bouillon. 'Aspectual Coercion and Logical Polysemy' examines the behavior of aspectual predicates in French and English in order to explain the constraints on the operation of type coercion in complement position. Working within the framework of generative lexicon theory, the authors explore the general applicability of type-changing operations such as coercion, and consider the power of generative mechanisms operating in the lexicon and the semantics. They argue that without a proper notion of constraints on generative mechanisms, there will certainly be overgeneration of interpretations in the semantics. To illustrate the nature of the constraints on type coercion, they study the behavior of complementation with aspectual predicates in English and French. For example, they point out that although type coercion is normally acceptable in both languages with the verbs begin and commencer, there are cases where coercion is unacceptable (cf. (22) and (23)).

(20) a. John began to read the book.
     b. John began reading the book.
     c. John began the book.
(21) a. Jean a commencé à lire le livre.
     b. Jean a commencé le livre.
James Pustejovsky and Branimir Boguraev
(22) a. b. c. (23) a. b. c.
II
0John began a symphony. (listening to) 0Mary began the highway. (driving on) 0John began the dictionary. (referencing) 0Jean a commence une symphonie. 0Marie a commence l'autoroute. 0Jean a commence Ia dictionnaire.
Asher and Sablayrolles. The general theme of
'A Typology and Discourse Semantics for Motion Verbs and Spatial PPs in French' is to provide a semantics of motion verbs and verb complexes chat tries to construct the spatia-temporal semantic properties of the verbs compositionally in terms of the verbs and their arguments and adjuncts. What is novel about this treatment is chat they attempt to integrate this lexical information into discourse contexts in order to determine the spatial and temporal structure oftexts. An interesting side-effect emerging here is that they see this as a way of contributing to one kind oflexical disambiguation. The analysis focuses on motion verbs both in isolation, such as the verbs leave and sortir, as well as motion verb complexes, where the verb takes a spatial or path prepositional phrase; for example, sortir du Jardin (to go out ofthegarden) ,
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
They argue that the apparently idiosyncratic behavior of coercion with aspectual verbs is due to different types of event selection on the complement position. An important part of their analysis is an attempt to explain the polysemous nature of aspectual verbs, while accounting for the semantic relatedness between the control and raising senses of verbs such as begin and finish . They demonstrate chat complemept coercion is possible only with the subject control senses of these predicates and explain why chis is so. The authors build on the analysis of unaccusatives presented in Pustejovsky & Busa ( 1 994), where verbs such as break and the Italian ajfondare (sink) are logically polysemous in predictable ways, and do not need to be assigned to multiple lexical entries, corresponding to their respective causative and unaccusative senses. Under this view, such verbal alternations are the result of an underspecified lexical representation and a focusing mechanism over the event structure for the verb. In fact, Pustejovsky and Bouillon argue that it is this semantic underspecification which gives rise to the polysemy exhibited by these predicates. By heading (or focusing) the initial event associated with the lexical representation, a subject control structure arises. By heading the final event, however, a raising structure emerges. As the authors point out, their contribution concerns not only conditions on type coercion, but it also provides an analysis of verbal polysemy which extends the generative treatment already developed for norninals in the generative lexicon approach.
Introduction: Lexical Semantics in Context
3 CONCLUSION
The work reported in these two issues presents new directions in the treatment of polysemy and sense extension. In addition to determined analyses of data and carefully crafted formal frameworks for accounting for lexical transfer and logical polysemy, what emerges very clearly from these articles is that polysemy is not a single, monolithic phenomenon. Rather, it is the result of both compositional operations in the semantics, such as co-composition, and of contextual effects, such as the structure of rhetorical relations in discourse and pragmatic constraints on co-reference. Rather than viewing the lexicon as a separate and fixed repository of lexical information, as is traditionally assumed, the articles in this issue tackle the
sortir dans le jardin (to go into the garden), etc. In the examination of motion verbs, Asher and Sablayrolles provide a typology for semantic behavior for the different verb classes. The authors are quick to point out, however, that the motion verbs and their complexes cannot be lumped together into the same typology. Yet these are clearly not unrelated forms, since the complex is derived from the simple verbs. Thus, the basic aim of the paper is to provide the compositional rules which will compute the semantics of a motion verb complex from its constituent parts. In addition to locations and positions, the authors define a term called posture, which has to do largely with the manner of the individual situated in a position or location. For example, the participial forms sitting and standing refer to manner of positional locations, and these are considered as postures. After classifying motion verbs into four categories (change of location, change of position, inertial change of position, and change of posture) the authors then focus on one class, the change of location verbs, and analyze what spatial relations are necessary to characterize how language organizes space in and around locations, with respect to displacement. They first show that a coarse-grained analysis of space into interior and exterior is insufficient to characterize the natural language data, and employ a richer classification of spatial regions. In the final section, Asher and Sablayrolles apply their semantics to the problem of lexical disambiguation for prepositional phrases in context. They introduce a mechanism which links lexical semantic information to the current discourse structure, namely, the notion of constituent salient location (CSL). They show how discourse relations such as elaboration, backgrounding, narration, and explanation are directly affected by the value of the CSL for the different participants in the discourse.
For example, it will contribute towards disambiguation of the French locative preposition dans (in/into) as having either a 'goal' or a 'situational location' interpretation.
James Pustejovsky and Branimir Boguraev
difficult question of how other components in the natural language interpretation process interact with the lexicon to disambiguate and fully determine the semantics of words in context. Although there are many areas left unexplored here, the papers illustrate how lexical semantics can be made sensitive to sentence level compositional processes as well as discourse level inference mechanisms, reacting to the diverse and multiple causes of lexical ambiguity.
JAMES PUSTEJOVSKY
Computer Science Department, 258 Volen Center for Complex Systems, Brandeis University, Waltham, MA 02254, USA
e-mail: jamesp@cs.brandeis.edu
BRANIMIR BOGURAEV
Apple Computer Inc., One Infinite Loop, MS: 301-35, Cupertino, CA 95014, USA
e-mail: [email protected]
REFERENCES
Asher, N. (1993), Reference to Abstract Objects in Discourse, Kluwer Academic Publishers, Dordrecht.
Asher, N. & M. Morreau (1991), 'Common sense entailment: a modal theory of nonmonotonic reasoning', in Proceedings of the 12th International Joint Conference on Artificial Intelligence, Sydney, Australia, August 1991.
Boguraev, B. (1979), 'Automatic resolution of linguistic ambiguities', Ph.D. thesis, Computer Laboratory, University of Cambridge.
Briscoe, T., A. Copestake, & B. Boguraev (1990), 'Enjoy the paper: lexical semantics via lexicology', Proceedings of 13th International Conference on Computational Linguistics, Helsinki, Finland, 42-47.
Briscoe, T., A. Copestake & A. Lascarides (1994), 'Blocking', in P. St. Dizier (ed.), Computational Lexical Semantics, Cambridge University Press, Cambridge.
Briscoe, T., V. de Paiva, & A. Copestake (1993), Inheritance, Defaults, and the Lexicon, Cambridge University Press, Cambridge.
Carpenter, R. (1992), The Logic of Typed Feature Structures, Tracts in Theoretical Computer Science, Cambridge University Press, Cambridge.
'Connection', Notre Dame Journal.
Copestake, A. & E. Briscoe (1992), 'Lexical Operations in a Unification-based Framework', in J. Pustejovsky & S. Bergler (eds), Lexical Semantics and Knowledge Representation, Springer.
Daelemans, W., K. de Smedt, & G. Gazdar (eds) (1992), Inheritance in Natural Language Processing, special issue of Computational Linguistics, 18, 2.
Dixon, R. M. W. (1991), A New Approach to English Grammar: On Semantic Principles, Oxford University Press, Oxford.
Fauconnier, G. (1985), Mental Spaces, MIT Press, Cambridge, MA.
Hirst, G. (1987), 'Semantic interpretation and the resolution of ambiguity', Studies in Natural Language Processing, Cambridge University Press, Cambridge.
Hobbs, J. R. (1985), On the Coherence and Structure of Discourse, Report No. CSLI-85-37, Center for the Study of Language and Information, October 1985.
Hobbs, J. R., M. Stickel, D. Appelt, & P. Martin (1990), Interpretation as Abduction, Technical Note No. 449, Artificial Intelligence Center, SRI International, Menlo Park, CA.
Jackendoff, R. (1992), 'Madame Tussaud meets the binding theory', Natural Language and Linguistic Theory, 10, 1, 1-32.
Kamp, H. & U. Reyle (1993), From Discourse to Logic: Introduction to Model-theoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory, Kluwer Academic Publishers, Dordrecht, Holland.
Lascarides, A. & N. Asher (1993), 'Temporal interpretation, discourse relations and commonsense entailment', Linguistics and Philosophy, 16, 437-93.
Norvig, P. & G. Lakoff (1987), 'Taking: a study in lexical network theory', BLS, 13, 195-206.
Nunberg, G. (1979), 'The non-uniqueness of semantic solutions: polysemy', Linguistics and Philosophy, 3, 1.
Pelletier, F. J. & L. K. Schubert (1986), 'Mass expressions', in D. Gabbay & F. Guenthner (eds), Handbook of Philosophical Logic, vol. 4, Reidel, Dordrecht.
Pustejovsky, J. (1991), 'The generative lexicon', Computational Linguistics, 17, 4, 409-41.
Pustejovsky, J. (1993), 'Type coercion and lexical selection', in J. Pustejovsky (ed.), Semantics and the Lexicon, Kluwer Academic Publishers, Dordrecht, Holland.
Pustejovsky, J. & B. Boguraev (1993), 'Lexical knowledge representation and natural language processing', Artificial Intelligence, 63, 193-223.
Pustejovsky, J. & F. Busa (1994), 'Unaccusativity and event composition', in P-M. Bertinetto (ed.), Approaches to Tense and Aspect, Elsevier.
Random House Unabridged Dictionary (1993), Random House, New York.
Waltz, D. & J. Pollack (1985), 'Massively Parallel Parsing: A Strongly Interactive Model of Natural Language Interpretation', Cognitive Science, 9, 51-74.
Weinreich, U. (1964), 'Webster's third: a critique of its semantics', International Journal of American Linguistics, 30, 405-9.
Wilensky, R. (1991), 'Extending the lexicon by exploiting subregularities', EECS Report No. UCB/CSD 91/618, Computer Science Division, University of California, Berkeley.
Journal of Semantics 12: 15-67
© Oxford University Press 1995
Semi-productive Polysemy and Sense Extension
ANN COPESTAKE
University of Cambridge Computer Laboratory, University of Stuttgart, and CSLI
TED BRISCOE
University of Cambridge Computer Laboratory, and Rank Xerox Research Laboratory, Grenoble
Abstract
In this paper we discuss various aspects of systematic or conventional polysemy and their formal treatment within an implemented constraint-based approach to linguistic representation. We distinguish between two classes of systematic polysemy: constructional polysemy, where a single sense assigned to a lexical entry is contextually specialized, and sense extension, which predictably relates two or more senses. Formally the first case is treated as instantiation of an underspecified lexical entry and the second by use of lexical rules. The problems of distinguishing between these two classes are discussed in detail. We illustrate how lexical rules can be used both to relate fully conventionalized senses and also applied productively to recognize novel usages, and how this process can be controlled to account for semi-productivity by utilizing probabilities.
1 INTRODUCTION
Discussion of polysemy has been central to much recent work on lexical semantics. Most of the arguments for (or against) attempting a fine-grained classification of semantic structure in the lexicon rest on the treatment of polysemic behaviour and attendant syntactic effects. In this paper we argue for a distinction between two classes of systematic polysemy: constructional polysemy, where a single sense assigned to a lexical entry is contextually specialized, and sense extension, which predictably relates two or more senses. We present a unification-based formalization and implementation in which the former is treated as instantiation of an underspecified lexical entry and the latter as a rule-governed relation between signs. It is important to distinguish putatively systematic or conventional polysemy from homonymy or unsystematic and idiosyncratic polysemy;2 the two familiar senses of bank as 'financial institution' and 'raised earth' are homonyms, whilst the verbal sense meaning to 'put money in a bank' is polysemous
with the nominal financial institution sense. It seems plausible that this case of polysemy is an example of a systematic sense extension by which nouns denoting artifacts become verbs denoting a purpose to which those artifacts can be put (e.g. button, hammer, butter, waltz, and so forth); though, of course, such claims need to be carefully argued for each such case.3 In what follows we will be concerned only with cases of putatively systematic polysemy and sense extension which extend to semantically defined classes of lexical items.
Some work on systematic polysemy has emphasized the conceptual or cognitive nature of the transfers or mappings which underlie such processes (e.g. Nunberg 1978, 1979; Lakoff & Johnson 1980; Fauconnier 1985; Martin 1990). This work is important in mapping out the range of possible conceptual transfers available and also in motivating their existence. However, alone it cannot account for all aspects of the linguistic phenomena. Other work has emphasized more the conventional nature of certain transfer processes (e.g. Apresjan 1973; Ostler & Atkins 1992), their similarity to derivational morphological rules (e.g. Copestake & Briscoe 1992), and cross-linguistic differences in their patterns of realization and conventionalization (e.g. Nunberg & Zaenen 1992). Still further work has emphasized the intricate connection between polysemy (or paradigmatic change) and associated syntagmatic effects, for example on argument structure (e.g. Levin 1993), and the possibility of characterizing some apparent polysemy as a product of syntagmatic combination (e.g. Pustejovsky 1991, 1993).
Sense change or extension accompanies many if not most operations in the lexicon, including those familiar from derivational morphology, many grammatical function changing operations, and so forth. Some have been extensively studied, though usually more from the perspective of the morphological or syntactic consequences of such operations.
In what follows we will focus on processes of conversion or zero-derivation and particularly on processes which do not affect the major category status of the modified word. One reason for this restriction is that there is a consensus that morphological processes involving explicit affixation are rule-governed, and increasingly the focus of discussion of such examples is on their semantic effects (e.g. Riehemann 1993); on the other hand, processes of conversion with minor or no grammatical corollaries have a more controversial status, and the need to treat these as rule-governed requires more careful argumentation. Furthermore, even if we can show that such processes can be systematic, it remains to demonstrate that systematic polysemy is achieved via operations analogous to morphological rules. In this paper we argue that processes of both sense modulation and sense change (see e.g. Cruse 1986: 50 ff.) play a role in accounting for systematic polysemies. We attempt to distinguish modulation from change using tests traditionally associated with the distinction between vagueness and ambiguity and relate this to the formal representation.4
Many types of conversion process are recognized as paralleling analogous processes of derivation or compounding, and thus treated as rule-governed cases of 'zero-derivation'; for example, it is uncontroversial to suggest that a noun such as purchase is deverbal and ambiguous between eventive and resultative readings in the same manner as the morphologically complex replacement, and to propose that the lexical rule which forms deverbal nouns should cover both cases. Similarly, Hale and Keyser (1993) propose that the process of noun incorporation which forms denominal verbs in examples such as babysit (e.g. Baker 1988) be generalized to account for 'total incorporation', that is conversions, of the form shelve (from shelf), calve (from calf), and so forth. Likewise, Levin (1993) lists many verbal diathesis alternations which are usually treated as rule-governed conversions because of their clear effects on argument structure (e.g. causative-inchoative: He broke the glass / The glass broke).
By contrast, apparently systematic polysemy or sense extension which at most involves subtle grammatical changes, such as various types of nominal metonymy, is often explicated in terms of processes of conceptual transfer or mapping (e.g. Lakoff 1987), and is usually treated as an essentially pragmatic phenomenon (e.g. Nunberg 1979). However, some nominal metonymies have closely related derivational counterparts; for example, the conventional metonymy which allows a container to stand for its contents (He drank a whole bottle (of whiskey)) is paralleled by suffixation with -ful (He drank a (?whole) bottleful (of whiskey)).5 Cross-linguistically, metonymies which involve no syntactic change in English can involve systematic changes in other languages; for example, the conventional nominal metonymy by which a fruit or nut denotes the tree of the fruit or nut (e.g. apple, chestnut) is normally accompanied by a change of gender (masculine tree) in Spanish (e.g. aceituna/aceituno (olive) or pomela/pomelo (grapefruit)) and Italian (Soler & Marti 1993). Whilst the underlying explanation for the possibility of such processes may rest on a cognitive account of conceptual transfer (Lakoff & Johnson 1980; Lakoff 1987) and/or a general pragmatic account of the 'cue-validity' of different metonymic functions (Nunberg 1979), these cross-linguistic differences and the similarities to other rule-governed lexical processes suggest that the pragmatic account must be overlaid with an account of lexical licenses (Nunberg & Zaenen 1992) or lexical rules (Copestake & Briscoe 1992), in which conventionalized and language-specific aspects of these general processes of conceptual transfer are expressed, and which serve as language-specific 'filters' on the general process.
Polysemy as sense modulation through specialization or broadening of meaning in context is intuitively a common process. Many examples that lexicographers tend to treat as alternative senses are, in principle, amenable to this approach; for instance, Atkins & Levin (1992) identify two senses of reel
appropriate to the interpretation of film reel and fishing reel and demonstrate that some but not all extant conventional dictionaries list these two senses. Often the precise relationship between the premodifier and the noun is treated as a question of pragmatics (e.g. Hobbs et al. 1990; Alshawi 1992: 211). However, if reel is defined as a container artifact with the purpose of (un)winding, where the material to be wound is left largely unspecified in the basic entry, then this definition can be specialized with the appropriate material by instantiation of the object of the (un)winding. This approach would be adequate to characterize the contribution of the premodifier to the semantics of the phrase for the two examples above. However, physical differences between types of reel would be treated as outside the domain of lexical semantics. Pustejovsky (1991) develops a theory of lexical semantics in which this approach to sense modulation can be couched. Under this account the representation of nouns includes a specification of their qualia structure, which encodes the form, content, agentive, and telic (purpose) roles. Thus the telic role of the basic sense of reel would be partially instantiated. In general, Pustejovsky suggests that the notion of semantic composition be enriched to one of 'co-composition' in which aspects of the nominal semantic representation are integrated with aspects of the premodifier's semantics, using a combination of type shifting of the predicate and type coercion of the nominal complement (Pustejovsky 1993). A related phenomenon is the broadening of a sense in context; for example, cloud seems to have a 'mass of water vapour' basic sense, but an extended usage as a mass of anything floating: dust cloud, cloud of smoke, or cloud of mosquitoes.
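The specialization of reel described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the class names, the `specialize` function, and the `WOUND_MATERIAL` table (which maps a premodifier to the material wound) are hypothetical and are not part of the LRL or of Pustejovsky's formalism.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Telic:
    pred: str                       # the purpose predicate
    material: Optional[str] = None  # None: left unspecified in the basic entry


@dataclass(frozen=True)
class Noun:
    orth: str
    telic: Telic


# Basic entry: a container artifact whose purpose is (un)winding something.
REEL = Noun("reel", Telic("(un)wind"))

# Hypothetical lookup from premodifier to the material that gets wound.
WOUND_MATERIAL = {"film": "film", "fishing": "fishing line"}


def specialize(noun: Noun, premodifier: str) -> Noun:
    """Instantiate the unspecified object of the telic predicate."""
    material = WOUND_MATERIAL[premodifier]
    return Noun(f"{premodifier} {noun.orth}", replace(noun.telic, material=material))


film_reel = specialize(REEL, "film")
fishing_reel = specialize(REEL, "fishing")
```

Both phrases are built from the single entry REEL, so no second lexical sense is needed; only the telic role differs between them.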
One thing that normally characterizes such usages is the explicit contextual specification of the way in which the sense has been broadened: thus we might treat the basic sense as taking a default content qualia value which can be overridden by a modifying phrase. In what follows, we explore the hypothesis that systematic nominal polysemies of the kind outlined above can be divided into two types of process which we term constructional polysemy (sense modulation) and semi-productive sense extension (sense change). In constructional polysemy, the polysemy is more apparent than real, because lexically there is only one sense and it is the process of syntagmatic co-composition (Pustejovsky 1991) which causes sense modulation. Nevertheless, we argue that the range of possible modification in co-composition is lexically specified, though pragmatically defeasible. Many cases of pre- or post-nominal modification, such as the examples of specialization and broadening above, as well as verbal logical metonymies can be analysed in this fashion. Sense extension, on the other hand, requires lexical rules which create derived senses from basic senses, often correlating with morphological or syntactic changes. Sense extension rules are semi-productive and susceptible to processes such as blocking or pre-emption by synonymy and are, we argue, formally identical to other rules of conversion
and derivational morphology. Many cases of conventional nominal metonymy, such as those introduced above, can be analysed in these terms. In section 2 we describe the lexical representation language that we have developed to represent basic lexical entries and characterize systematic lexical processes. In section 3 we return to constructional polysemy and motivate a more detailed analysis of specialization as well as discussing broadening in this framework. In section 4 we discuss sense extension proper with respect to grinding, portioning and other types of nominal metonymy; we address the issues of the directionality of sense extensions, their apparent ability to apply to phrases in some cases, and their productive yet highly conventionalized nature. In section 5 we consider cases of 'co-predication' (Pustejovsky 1994), where distinct senses are accessible for coordination and modification, and present an analysis of some cases of co-predication compatible with our accounts of constructional polysemy and sense extension. In common with other lexical processes, sense extension is semi-productive in that it is susceptible to blocking and sensitive to frequency effects; in section 6 we argue that these properties can be captured by adopting a probabilistic interpretation of lexical rules and utilizing probabilities in a natural fashion in language production and interpretation.
2 THE LEXICAL REPRESENTATION LANGUAGE
The language we will use to represent these classes of polysemous behaviour is the lexical representation language (LRL) developed for the ACQUILEX lexical knowledge base system (LKB). The LRL is a typed feature structure language (Carpenter 1992), augmented with defaults and lexical rules. Types are used to structure lexical entries, which are represented as feature structures (FS), and specify how they combine by means of grammar rules, or alternatively by constraints on phrasal types.6 The LRL could be used to implement a range of unification- and constraint-based approaches. The approach taken in this paper can be regarded (roughly) as combining an HPSG-like approach to syntax with Pustejovsky's notion of qualia structure. Earlier versions of the LRL have been described in Copestake (1992, 1993a, b) and we will only provide a brief sketch of the formalism here. In this paper, however, we will make use of an improved notion of default unification, which is order-independent and allows for persistent defaults (Lascarides et al. (forthcoming), see section 2.2 below). Most previous definitions of default unification have assumed that it involves incorporating into a non-default FS all the consistent information from a default FS, making no distinction in the result between information which arose from the default and non-default structures. In our treatment, by contrast, information in FSs may be marked as default (or
non-default), and this distinction persists throughout subsequent default unification operations. Another difference is an improved treatment of 'lexical' rules, which can now operate on both lexical and phrasal signs (see section 2.3). Partially specified phrasal signs can also be represented within the LRL. In general terms, we are aiming at a formalism which is adequate to represent the conventionalized, non-fully productive aspects of the language, including words, idioms, and sense extension processes (which may be applicable to phrases as well as words; see section 4.3). We will use lexical broadly to include any such specification.7
2.1 Types
The LRL uses a definition of typing that largely follows Carpenter (1992). The types are organized as a lattice, with top (⊤) being the most general type and bottom (⊥) indicating inconsistency. This lattice, in effect, specifies compatibility between types (any two types must have a unique greatest lower bound in the lattice; they are compatible/unifiable if this is not ⊥) and also allows for inheritance of constraints from types to subtypes (see Figure 1). Constraints on types are themselves FSs, which will subsume all well-formed FSs of that type; the only features that may be present on the node of a well-formed FS are those appropriate to the type labelling it (see Figures 2 and 3). Furthermore, the type hierarchy itself is interpreted as constraining the class of totally specified or 'ground' FSs, since it is assumed to be complete, with subtypes fully covering their supertypes. That is, given t and t' are subtypes of t", anything of type t" must be resolved by either t or t'. The process of type resolution can be used to drive parsing and generation.
Figure 1 A fragment of a type hierarchy (nodes shown: ⊤, sign, lex-sign, lex-noun-sign, string, nomqualia, physical, artifact, animal, plant, art_phys).
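The compatibility check on the type lattice can be illustrated with a minimal sketch. The type names follow the paper's illustrative fragment; the dictionary encoding, the function names, and the string used for the bottom type are assumptions made for the example, not the LKB implementation.

```python
# Immediate supertypes of each type in the illustrative fragment.
PARENTS = {
    "top": [],
    "string": ["top"],
    "sign": ["top"],
    "lex-sign": ["sign"],
    "lex-noun-sign": ["lex-sign"],
    "nomqualia": ["top"],
    "physical": ["nomqualia"],
    "artifact": ["nomqualia"],
    "art_phys": ["physical", "artifact"],
    "animal": ["physical"],
    "plant": ["physical"],
}
BOTTOM = "bottom"  # stands for the inconsistent type


def ancestors(t):
    """Return t together with all of its supertypes."""
    seen, stack = {t}, [t]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumes(general, specific):
    """True if `general` is `specific` or one of its supertypes."""
    return general in ancestors(specific)


def glb(a, b):
    """Greatest lower bound of two types, or BOTTOM if they are incompatible."""
    lower = [t for t in PARENTS if subsumes(a, t) and subsumes(b, t)]
    # The greatest lower bound is the common subtype that subsumes all others.
    greatest = [t for t in lower if all(subsumes(t, u) for u in lower)]
    return greatest[0] if greatest else BOTTOM


def compatible(a, b):
    return glb(a, b) != BOTTOM
```

In this encoding physical and artifact are compatible because art_phys inherits from both, whereas animal and plant share no subtype and therefore meet only at bottom.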
top ().
string (top).
sign (top) < ORTH > = string.
lex-sign (sign).
lex-noun-sign (lex-sign) < QUALIA > = nomqualia.
nomqualia (top).
physical (nomqualia) < FORM > = form.
form (top) (OR mass indiv plural).
artifact (nomqualia) < TELIC > = verb-sem.
art_phys (physical artifact).
animal (physical) < FORM > = indiv < SEX > = gender.
plant (physical).
gender (top) (OR male female).
verb-sem (top) < IND > = eve < PRED > = string < ARG1 > = < IND > < ARG2 > = obj < ARG3 > = obj.
sem (top).
eve (sem).
obj (sem).
Figure 2 Description of illustrative type system.
[art_phys
 FORM = form
 TELIC = [verb-sem
          IND = [0] eve
          PRED = string
          ARG1 = [0]
          ARG2 = obj
          ARG3 = obj]]
Figure 3 Expanded constraint on art_phys.
2.2 Lexical descriptions
In the LKB, the type language is augmented with a lexical description that incorporates lexical rules and default inheritance. Lexical entries are defined in terms of types, for example:
book 1
< > = lex-noun-sign
< QUALIA > = art_phys
< QUALIA TELIC PRED > = read
< QUALIA FORM > = indiv.
(Here we continue to use the simple type system defined in Figure 2.) The FS is defined to have overall type lex-noun-sign and to have the QUALIA appropriate
for an individuated physical artifact with a telic role instantiated to read. The ORTH feature is instantiated with a string constructed from the entry's label, "book" (string types do not have to be explicitly listed in the system) (Figure 4).
[lex-noun-sign
 ORTH = book
 QUALIA = [art_phys
           FORM = indiv
           TELIC = [verb-sem
                    IND = [0] eve
                    PRED = /read
                    ARG1 = [0]
                    ARG2 = obj
                    ARG3 = obj]]]
Figure 4 FS for book.
Lexical descriptions are evaluated to produce psorts, which are simply named FSs. We make use of psorts rather than define distinct types for each lexical entry mainly because we have found the restrictions on the type system to be inappropriate for lexical entries; we discuss this in more detail below. Various inheritance relationships are defined to operate on psorts. In theory, arbitrary parts of FSs can be related by inheritance. In practice, we make use of two classes of inheritance specification much more extensively than others. One of these is inheritance of qualia structure; the other is used in describing a lexical entry as being derived via a productive rule, but having some exceptional value for orthography, syntax, or semantics. We will concentrate on qualia inheritance here since it is more relevant to the subsequent discussion, but see Copestake (1992) for a treatment of lexical exceptions in the LKB. We assume that the possible qualia structures can be regarded as a conceptual hierarchy (actually a lattice), certain regions of which will be associated with particular lexical entries. It is convenient to be able to describe some lexical entries as inheriting their qualia structure from others (see Copestake 1992, 1993a). For example:
novel 1 < QUALIA > < book_1 < QUALIA >.
states that the lexical entry for a particular sense of novel inherits its qualia from (a particular sense of) book. (The symbol < indicates inheritance.) Given this specification, novel would inherit its telic role from book. One effect of this is that it would predict that the normal interpretations of (1a) and (1b) below would both involve a reading event (see Pustejovsky 1991 and section 3 below).
(1) a. John enjoyed the book.
    b. John enjoyed the novel.
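Inheritance of qualia structure between psorts, as in the novel and book entries, can be sketched as copying the value found at a path in one named FS into another. This is a minimal illustration that treats FSs as nested dictionaries; the `PSORTS` table and the helper names are hypothetical, and defaults are ignored here.

```python
import copy

# Named feature structures (psorts), keyed by sense label.
PSORTS = {
    "book_1": {
        "TYPE": "lex-noun-sign",
        "ORTH": "book",
        "QUALIA": {
            "TYPE": "art_phys",
            "FORM": "indiv",
            "TELIC": {"TYPE": "verb-sem", "PRED": "read"},
        },
    },
}


def define_by_inheritance(name, orth, source, path=("QUALIA",)):
    """Create psort `name` whose value at `path` is inherited from `source`."""
    value = PSORTS[source]
    for feature in path:
        value = value[feature]
    entry = {"TYPE": "lex-noun-sign", "ORTH": orth}
    target = entry
    for feature in path[:-1]:
        target = target.setdefault(feature, {})
    # Copy rather than share, so later edits to one entry do not leak into the other.
    target[path[-1]] = copy.deepcopy(value)
    PSORTS[name] = entry
    return entry


def enjoyed_event(noun_psort):
    """Event supplied by the telic role when interpreting 'John enjoyed the N'."""
    return noun_psort["QUALIA"]["TELIC"]["PRED"]


novel = define_by_inheritance("novel_1", "novel", "book_1")
```

Because novel_1 takes its whole QUALIA value from book_1, both entries supply the same telic predicate, which is what predicts a reading event for both sentences in (1).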
However, inheritance of individual qualia must be defeasible. For example, dictionary should also be defined to inherit its qualia structure from book but has a telic role of refer_to rather than read. Default inheritance in the LKB is now formalized in terms of persistent default unification (PDU). We will give only a brief description of this here: it is fully defined in Lascarides et al. (forthcoming). This treatment of typed default unification is an improvement over that used previously in the LKB (Copestake 1992, 1993a) in that it is order-independent and allows for persistent defaults. This allows us to define multiple orthogonal default inheritance in the lexicon in a manner which is fully declarative. Furthermore, the earlier definition of default inheritance in terms of a default unification operation applying to normal FSs was restricted in applicability to lexical descriptions, but defaults may now persist outside the lexicon. Thus defaults may be combined during the interpretation/generation of a sentence and defaults which originate from lexical specifications can interact with pragmatic processing. In our new definition, parts of FSs may be defeasible; this is a necessary condition for default unification to be associative. In this respect, PDU is similar to the notion of defaults in Young & Rounds (1993), but their approach is limited in that their definition is restricted to non-re-entrant values and in that they assume an untyped framework. In contrast, PDU uses the type hierarchy to prioritize defaults. We use a slashed notation for partially defeasible FSs where values to the left of the slash are indefeasible and those to the right defeasible (indefeasible/defeasible). We abbreviate this to /defeasible where the indefeasible value is uninteresting (e.g. where it is ⊤) and omit the slash when there is no (interesting) defeasible value.
So, for example, the FS for book, shown in Figure 4, specifies that the value for the telic predicate is defeasible. The description given below for dictionary specifies that it inherits its qualia structure from book but the specific default value refer_to overrides the inherited value of the telic predicate.
dictionary 1 < QUALIA > < book_1 < QUALIA >
< QUALIA TELIC PRED > = /refer_to.
We specify the value of the telic predicate to be defeasible here as well, because for some dictionaries this might not be appropriate (e.g. Bierce's Devil's Dictionary) and also because the contribution of the telic role to interpretation of a particular sentence is potentially defeasible. The corresponding FS is shown in Figure 5.8 One effect of the difference in telic role between book and dictionary is due to the different aspectual properties of the predicate: read can describe a process but refer_to is point-like. Since enjoy selects for a process, (2) is odd.
(2) ? John enjoyed the dictionary.
[lex-noun-sign
 ORTH = dictionary
 QUALIA = [art_phys
           FORM = indiv
           TELIC = [verb-sem
                    IND = [0] eve
                    PRED = /refer_to
                    ARG1 = [0]
                    ARG2 = obj
                    ARG3 = obj]]]
Figure 5 FS for dictionary.
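The slashed notation and the way a more specific default overrides an inherited one can be illustrated on a single atomic value. The sketch below is deliberately much simpler than persistent default unification: it handles neither re-entrancy nor type-based prioritization, and the `Value` class, the `inherit` function, and the `PROCESS_PREDICATES` table are assumptions introduced only for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Value:
    hard: Optional[str] = None     # indefeasible part, left of the slash
    default: Optional[str] = None  # defeasible part, right of the slash

    def __str__(self):
        left = self.hard or ""
        return left if self.default is None else f"{left}/{self.default}"


def inherit(parent: Value, own: Value) -> Value:
    """Combine an inherited value with the more specific entry's own value."""
    if parent.hard and own.hard and parent.hard != own.hard:
        raise ValueError("indefeasible values conflict")
    hard = own.hard or parent.hard
    # The more specific default wins over the inherited default.
    default = own.default or parent.default
    # With atomic values, indefeasible information leaves no room for a default.
    if hard is not None:
        default = None
    return Value(hard, default)


BOOK_TELIC = Value(default="read")
NOVEL_TELIC = inherit(BOOK_TELIC, Value())
DICTIONARY_TELIC = inherit(BOOK_TELIC, Value(default="refer_to"))

# Hypothetical aspectual classification: which predicates can describe a process.
PROCESS_PREDICATES = {"read"}


def enjoy_is_odd(telic: Value) -> bool:
    """'enjoy' selects for a process; a point-like telic predicate makes it odd."""
    pred = telic.hard or telic.default
    return pred not in PROCESS_PREDICATES
```

Under these assumptions novel keeps the inherited /read, dictionary ends up with /refer_to, and only the latter is flagged as odd with enjoy, matching the contrast between (1) and (2).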
The importance of the defeasibility of parts of the qualia structure is discussed briefly in section 3 and at more length in Lascarides et al. (forthcoming). The persistence of the defaults 'outside' the lexicon is irrelevant for much of this paper, so for the most part we can continue to assume the formal account of the LKB provided by Copestake (1992, 1993b) and we therefore omit further discussion of PDU. There are a number of reasons for not defining lexical entries to be types themselves. We want to maintain a distinction between the types, which are used for description or classification, and the data which they are being used to classify, i.e. the lexical entries. The type system is assumed to be complete, but we do not want to make this assumption about hierarchically arranged lexical entries. It should not be necessary or even possible to introduce features which are specific to particular lexical entries. The hierarchical organization of the psorts is used for inheritance of information, but not for classification of words. Furthermore the condition imposed on the type hierarchy, that a unique greatest lower bound must be explicitly specified for all compatible types, is too restrictive to apply to the lexical entries, or parts of lexical entries, that we refer to as psorts. The FSs, of course, do form a lattice, but the points that are being specifically identified as psorts do not. Psorts are a way of identifying particular points in the lattice, but which points are so identified is not constrained in any way. Furthermore, making lexical entries types obviously leads to a proliferation of types. This is particularly acute if we wish to make some lexical entries underspecified with respect to the lexical types. For example, suppose we wish to make truth underspecified with respect to the two types lex-count-noun and lex-uncount-noun which were both defined as subtypes of lex-noun.
Simply specifying truth as an additional subtype of lex-noun would not achieve the correct results, since it would then not unify with an FS of type lex-count-noun or lex-uncount-noun. We would have to define explicitly truth-count as a subtype of truth and lex-count-noun and similarly for truth-mass (which means there would be no advantage of economy of
Ann Copestake and Ted Briscoe 2 5
rabbit
<> = lex-noun-sign
< QUALIA > = animal.

bull
<> = lex-noun-sign
< SEX > = male.

Query FS: lex-noun-sign [ORTH = string, QUALIA = nomqualia [SEX = male]]
Resolved FSs:
lex-noun-sign [ORTH = rabbit, QUALIA = animal [FORM = indiv, SEX = male]]
lex-noun-sign [ORTH = bull, QUALIA = [FORM = indiv, SEX = male]]

Figure 6 Constraint resolution with lexical constraints.
2.3 Lexical rules
Lexical rules are formalized in the LKB as feature structures of type lexical-rule, which has the constraint:
lexical-rule
[ 0 = lex_sign
  1 = lex_sign ]
Application of a particular lexical rule simply involves unification of the psort with the input part of the lexical rule, indicated by the path <1>, and returns the instantiated output of the rule, given by the path <0>.
representation in the underspecification).9 Instead we define lexical entries as FSs, but give them a special status in that they are identifiable and constrain the results of evaluating FSs which have lexical types. In the current version of the LRL, we define psorts as constraints on certain types. If a type is defined as being lexical it is assumed to be constrained such that any FS to which it is resolved must be subsumed by one or more psorts of the appropriate type. For example, Figure 6 shows an FS and the possible resolutions, given the psorts shown and the types in Figure 1. If a type is defined as being phrasal it will normally be resolved as being constructed from lexical types, which will be constrained by lexical psorts. However, it is also possible for phrasal psorts to be defined which allow an alternative analysis of the phrase. These will not be fully resolved FSs, but partially specified ones which will themselves be subject to constraint resolution. (This mechanism might also be used in the treatment of idioms and other (partially) fixed phrases.)
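The resolution of a lexically typed FS against psorts can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the LKB implementation: FSs are flattened to dictionaries from feature paths to type names, the type hierarchy is a hand-written tree rather than the one in Figure 1, re-entrancy is ignored, SEX is placed under QUALIA, and the psort cow is a hypothetical entry added to show a failed resolution.

```python
# Assumed toy hierarchy (child -> parent); not the type system of Figure 1.
PARENT = {
    "animal": "nomqualia",
    "male": "gender", "female": "gender",
    "rabbit": "string", "bull": "string", "cow": "string",
}

def subsumes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False

def glb(t1, t2):
    """Greatest lower bound of two types in a tree-shaped hierarchy, or None."""
    if subsumes(t1, t2):
        return t2
    if subsumes(t2, t1):
        return t1
    return None

def unify(fs1, fs2):
    """Unify two FSs given as {feature path (tuple): type}; None on failure."""
    out = dict(fs1)
    for path, t in fs2.items():
        merged = glb(out.get(path, t), t)
        if merged is None:
            return None
        out[path] = merged
    return out

PSORTS = {
    "rabbit": {(): "lex-noun-sign", ("ORTH",): "rabbit", ("QUALIA",): "animal"},
    "bull": {(): "lex-noun-sign", ("ORTH",): "bull", ("QUALIA", "SEX"): "male"},
    # Hypothetical extra entry, included only to show a failed resolution.
    "cow": {(): "lex-noun-sign", ("ORTH",): "cow", ("QUALIA", "SEX"): "female"},
}

QUERY = {(): "lex-noun-sign", ("ORTH",): "string",
         ("QUALIA",): "nomqualia", ("QUALIA", "SEX"): "male"}

def resolve(query, psorts):
    """Return every specialization of `query` that is licensed by some psort."""
    resolved = {}
    for name, psort in psorts.items():
        fs = unify(query, psort)
        if fs is not None:
            resolved[name] = fs
    return resolved

RESOLVED = resolve(QUERY, PSORTS)
print(sorted(RESOLVED))  # -> ['bull', 'rabbit']
```

Each resolved FS is the unification of the query with one psort, so it is subsumed by that psort, which is the condition stated in the text; the hypothetical cow entry is rejected because its SEX value clashes with the query.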
26 Semi-productive Polysemy and Sense Extension
An example of a lexical rule in this system is portioning, which covers the sense extension involved in usages such as three beers, where a mass noun which denotes some food or drink is converted to a count noun denoting some (conventionally served) portion of that substance. The FS in Figure 7 describes this rule using the type system from Copestake (1992) (the justification for the particular details of the representation adopted can be found there). The qualia types c_obj and c_subst indicate an edible object and substance respectively. The rule would apply to a lexical entry such as that shown for beer in Figure 8. Morphological rules are formally identical to sense extension rules, except in specifying a change of phonology/orthography.
One immediate question is how the notion of lexical rules fits into a constraint-based framework. In Copestake & Briscoe (1992), lexical rules were essentially indistinguishable from grammar rules, and could in fact apply to phrases. This allowed us to deal with some examples of phrasal sense extension. For example, the place group sense extension applies both to place-denoting words such as village and to some phrases, as in (3) (see section 4.3 below for further details).
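Applying a rule such as portioning to a psort can be sketched as follows. This is a rough illustration under stated assumptions, not the rule of Figure 7: FSs are flattened to dictionaries from feature paths to type names, the hierarchy is a hand-written tree, re-entrancy between input and output is approximated by copying values along a list of shared path pairs, and the entry for table is hypothetical.

```python
# Assumed toy hierarchy (child -> parent).
PARENT = {
    "lex-count-noun": "lex-noun", "lex-uncount-noun": "lex-noun",
    "c_obj": "nomqualia", "c_subst": "nomqualia",
}

def subsumes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False

def glb(t1, t2):
    """Greatest lower bound of two types in a tree-shaped hierarchy, or None."""
    if subsumes(t1, t2):
        return t2
    if subsumes(t2, t1):
        return t1
    return None

def unify(fs1, fs2):
    """Unify two FSs given as {feature path (tuple): type}; None on failure."""
    out = dict(fs1)
    for path, t in fs2.items():
        merged = glb(out.get(path, t), t)
        if merged is None:
            return None
        out[path] = merged
    return out

# Much simplified portioning rule: "input" plays the part of path <1>,
# "output" the part of path <0>.
PORTIONING = {
    "input": {(): "lex-uncount-noun", ("QUALIA",): "c_subst"},
    "output": {(): "lex-count-noun", ("QUALIA",): "c_obj",
               ("SEM", "PRED"): "modified-pred",
               ("SEM", "PRED", "MODIFIER"): "portion"},
    # (input path, output path) pairs standing in for re-entrancy.
    "shared": [(("ORTH",), ("ORTH",)),
               (("SEM", "PRED"), ("SEM", "PRED", "MODIFIED"))],
}

BEER = {(): "lex-uncount-noun", ("ORTH",): "beer",
        ("SEM", "PRED"): "beer_1", ("QUALIA",): "c_subst"}
# Hypothetical count noun to which the rule should not apply.
TABLE = {(): "lex-count-noun", ("ORTH",): "table", ("QUALIA",): "artifact"}

def apply_rule(rule, psort):
    """Unify the psort with the rule input and return the instantiated output."""
    instantiated = unify(psort, rule["input"])
    if instantiated is None:
        return None
    copied = {dst: instantiated[src]
              for src, dst in rule["shared"] if src in instantiated}
    return unify(rule["output"], copied)

PORTIONED = apply_rule(PORTIONING, BEER)
print(PORTIONED[()], PORTIONED[("ORTH",)])  # -> lex-count-noun beer
```

The output keeps the orthography of the input and embeds the input predicate under a portion modifier, while the rule simply fails to apply to the count noun.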
Figure 7 Lexical rule for portioning. In this figure, and subsequent examples, boxes round type labels for a node (e.g. noun-cat) indicate that the FS which that node heads is not shown and some features are omitted.
[AVM of type lexical-rule: the output (path <0>) is a lex-count-noun whose SEM PRED is a modified-pred with MODIFIER = portion and MODIFIED coindexed with the PRED of the input; the input (path <1>) is a lex-uncount-noun.]
lex-uncount-noun
[ ORTH = beer
  CAT = noun-cat
  SEM = obj-noun-formula [IND = [1] obj, PRED = beer_1, ARG1 = [1], PLMOD = false, QUANT = false]
  QUALIA = c_subst [TELIC = verb-sem, FORM = nomform [RELATIVE = mass], OBJECT-INDEX = [1]] ]
Figure 8 FS corresponding to the lexical entry beer.
But treating lexical rules as operating as unary grammar rules is unattractive: it obscures the distinction between the syntagmatic component of the system and the semi-productive paradigmatic component. Furthermore this treatment does not carry over in a simple way to a constraint-based approach. Within a strictly constraint-based framework there have been essentially three proposals for lexical rules:
1. Lexical rules expand the lexicon in a preliminary processing phase. This is the standard approach (e.g. Pollard & Sag 1987) but is unattractive because it does not extend to analogous phrasal processes and because the lexicon is not finite.
2. Treat lexical rules as being similar to grammar rules, with affixes having their own lexical entries. Such an approach is suggested in Krieger & Nerbonne (1993) for derivational morphology. But for sense extension and conversion we would need to postulate zero-morphemes.
3. The place of lexical rules is taken by complex types (Riehemann 1993). For example, Figure 9 sketches a complex type which could replace the portioning rule shown before. This avoids the use of zero-morphemes for sense extension. However, it still has disadvantages: there is a proliferation of types in the hierarchy as it becomes necessary to allow lexical signs of all classes which might be formed by sense extension to be either simple or of a type that depends on their derivation. For example, lex-count-noun would have subtypes simple-lex-count-noun and portioned. Signs would be distinguished in this way solely because of their construction from lexical rules, which is particularly unintuitive for sense extensions since the directionality of an extension may be non-obvious (see section 4.5 below). Extending the approach to phrasal signs would be possible, but would further increase the number of types. Thus this approach would work for
(3) The south side of Cambridge voted Conservative.
our purposes, but the mechanics of constraint resolution are driving the representation, forcing us to postulate unnecessarily complex structures.
We treat lexical rules as generating psorts. Clearly, if we simply applied all the lexical rules to the defined psorts in a precompilation phase, this would be equivalent to the first option above. Instead of doing this, we use the lexical rules to dynamically generate alternatives during constraint resolution of nodes with lexical types. To see how this works, consider the example type system in Figure 1, but assume that instead of the type animal we have a type animate, with subtypes animal and human. Figure 10 shows a very simple lexicon and a lexical rule that converts animal-denoting nouns to human-denoting ones.10 The query structure shown in Figure 10 might be resolved by the lexical psort given for grandmother. However, an alternative resolution is available via application of the lexical rule. Figure 10 shows how this is applied, in effect, by 'wrapping' it round the query FS which instantiates the output sign of the rule and constraint resolving the result. Further resolution of the input sign, because it is matched up with a psort in the lexicon, results in specialization of values on the output sign (the ORTH value in this case). The index lEx l is shown here to emphasize that under normal circumstances this resolution step would be part of the resolution of a sentence sign and thus the query FS shown will be part of a larger structure. Further constraints imposed on the output sign by the resolution of the surrounding structure would affect the input sign and thus limit the way in which it might be resolved. Note that this treatment implies that the output sign be resolvable with respect to the type system: it must be a potential lexical psort even though it is not actually defined as such.
This strategy involves a slight modification to the constraint resolution algorithm since it entails an external mechanism adding a node to be resolved.
Figure 9 Portioning expressed as a complex type.
[AVM for the complex type portioned, with a DTR of type lex-uncount-noun; its SEM PRED is a modified-pred with MODIFIER = portion and MODIFIED coindexed with the PRED of the daughter.]
rabbit
<> = lex-noun-sign
< QUALIA > = animal.

grandmother
<> = lex-noun-sign
< QUALIA > = human
< SEX > = female.

Figure 10 Constraint resolution with lexical rules.
[The figure shows the query FS, a lex-noun-sign with QUALIA = human and SEX = female; the animal-metaphor lexical rule wrapped round it, with the query instantiating the output sign <0> and an animal-denoting lex-noun-sign as the input sign <1>; and the resolved FSs.]
Resolution of this node could itself involve lexical rule application, of course, and in general this algorithm may not terminate. This, however, also applies to the alternative formalizations. Compared with Riehemann's approach, we are trading off greater simplicity in the type system with a complication of the constraint resolution mechanism. From our viewpoint, one advantage is that we are maintaining a distinction between the straightforwardly syntagmatic aspects of the grammar, which are implemented by means of phrasal types, and the semi-productive processes we implement by lexical rules.
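The dynamic use of lexical rules during resolution can be sketched as follows. This is an illustration under several assumptions rather than the LKB algorithm: FSs are flattened to dictionaries from feature paths to type names, re-entrancy is approximated by copying values along shared paths, SEX is placed under QUALIA, and termination is forced by an explicit depth bound, which is a device of the sketch and not something the text proposes.

```python
# Assumed toy hierarchy (child -> parent), with animate above animal and human.
PARENT = {
    "animal": "animate", "human": "animate",
    "male": "gender", "female": "gender",
}

def subsumes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False

def glb(t1, t2):
    """Greatest lower bound of two types in a tree-shaped hierarchy, or None."""
    if subsumes(t1, t2):
        return t2
    if subsumes(t2, t1):
        return t1
    return None

def unify(fs1, fs2):
    """Unify two FSs given as {feature path (tuple): type}; None on failure."""
    out = dict(fs1)
    for path, t in fs2.items():
        merged = glb(out.get(path, t), t)
        if merged is None:
            return None
        out[path] = merged
    return out

PSORTS = [
    {(): "lex-noun-sign", ("ORTH",): "rabbit", ("QUALIA",): "animal"},
    {(): "lex-noun-sign", ("ORTH",): "grandmother",
     ("QUALIA",): "human", ("QUALIA", "SEX"): "female"},
]

ANIMAL_METAPHOR = {
    "input": {(): "lex-noun-sign", ("QUALIA",): "animal"},    # path <1>
    "output": {(): "lex-noun-sign", ("QUALIA",): "human"},    # path <0>
    # (input path, output path) pairs standing in for re-entrancy.
    "shared": [(("ORTH",), ("ORTH",)),
               (("QUALIA", "SEX"), ("QUALIA", "SEX"))],
}
RULES = [ANIMAL_METAPHOR]

def resolve(query, depth=1):
    """Resolve `query` against psorts, then via rules wrapped round it."""
    results = []
    for psort in PSORTS:
        fs = unify(query, psort)
        if fs is not None:
            results.append(fs)
    if depth == 0:          # assumed bound on nested rule applications
        return results
    for rule in RULES:
        out = unify(query, rule["output"])
        if out is None:
            continue
        # Constraints on the output sign are pushed down to the input sign.
        pushed = {src: out[dst] for src, dst in rule["shared"] if dst in out}
        inp = unify(rule["input"], pushed)
        if inp is None:
            continue
        for resolved_in in resolve(inp, depth - 1):
            # Resolving the input specializes the output (here, its ORTH).
            pulled = {dst: resolved_in[src]
                      for src, dst in rule["shared"] if src in resolved_in}
            final = unify(out, pulled)
            if final is not None:
                results.append(final)
    return results

QUERY = {(): "lex-noun-sign", ("QUALIA",): "human", ("QUALIA", "SEX"): "female"}
ORTHS = [fs[("ORTH",)] for fs in resolve(QUERY)]
print(ORTHS)  # -> ['grandmother', 'rabbit']
```

With the bound set to zero only the listed psort for grandmother is found; allowing one rule application adds the human-denoting use of rabbit, whose ORTH value is filled in by resolving the input sign.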
animal-metaphor
<> = lexical-rule
< 0 ORTH > = < 1 ORTH >
< 0 QUALIA > = human
< 1 QUALIA > = animal
< 1 QUALIA SEX > = < 0 QUALIA SEX >.
3 CONSTRUCTIONAL POLYSEMY
There are many cases of apparent polysemy which we would argue are better treated as 'constructional' polysemy, in that the lexical item is assigned one (often more abstract) sense and processes of syntagmatic combination or 'co-composition' (Pustejovsky 1991) are utilized to specialize this sense appropriately. We treat this as a process of sense modulation, represented by specialization in the LKB, in contrast to the process of sense extension to be discussed in the next section, which we represent using lexical rules. A simple example of specialization is the representation of reel in its container sense. It is reasonable to define a type container shown in Figure 11 that has both syntactic and semantic effects, since container nouns as a class can be subcategorized for postmodification with an of phrase denoting their
Figure 11 Outline of the description of container nouns.
[Type hierarchy fragment: lex-count-noun has the subtype container [QUALIA = art_phys], which has further subtypes including container-of; the subtypes differ in their CAT SUBCAT values, and a PP complement is linked to the QUALIA structure of the container noun.]
Our approach straightforwardly applies to phrases, such as example (3) above where south side of Cambridge denotes the group of people living there. In these cases the input form (e.g. south side of Cambridge denoting the place) will be a phrase with daughters (DTRS) which will themselves be further resolved in the usual way. The output structure must also be resolvable with respect to the type system. The phrasal sense extensions we have encountered so far all apply to signs which could be either lexical or phrasal as far as the context of the rest of the sentence is concerned (i.e. lexical items could be substituted for them without affecting grammaticality). Since multiword orthography does not necessitate the possession of a DTRS attribute, the lexical rule can be defined so that the output form is treated as a lexical type and will not have daughters to be resolved.
contents (e.g. reel of tape) which then can be regarded as instantiating their constitutive role. Thus, the polysemy involved in the distinction between e.g. film reel and fishing reel is not regarded as lexical, and the entry for reel is simply:
reel 1
<> = container.
The constitutive role may be instantiated by syntagmatic combination (e.g. reel of film) but in some cases it may only be implicit in the context. There is, however, another source of polysemy, since container nouns as a class can also refer to their contents. Thus in (4) reel can be used to refer to the film it contains. Furthermore, some types of polysemy will apply only to some subpart of the sense described by the lexical entry. In this particular case, reel used of cinema films can have an abstract sense denoting part of the film:
(5) The mystery is only resolved in the final reel.
Here we have a sense extension from a physical object used for representation (in this case the contained object) to the abstract entity represented. Other examples of this extension will be discussed in more detail in section 5.2. The point here is that it is the instantiated form of the basic entry which determines what senses are available, emphasizing the need for flexible interaction between syntagmatic combination and lexical rules.
A more complex example of specialization by constructional polysemy is adjectival premodification: it is well known that in examples such as (6) the adjectives take on different meanings depending on the nature of the modified head.
(6) a. a sad poem/poet/day
b. a fast motorway/car/driver
Such examples have been used to argue that adjectives should be treated as higher-order predicates or should introduce an unspecified predicate representing the relation between the property denoted by the adjective and that denoted by the head noun (e.g. Hobbs et al. 1990). Pustejovsky (1991, 1993) argues that some such adjectives can be analysed as predicates which coerce the type of the head and operate on its qualia structure. Thus he analyses fast as a predicate which selects the eventive qualia accessible through the entries for the head nouns in (6b).11 The claim is that nouns denoting artifacts make available as part of their lexical specification an agentive and telic role representing their (typical) process of creation and of use, respectively. Similarly deverbal nouns make their underlying verbal predicate accessible in the same manner. Thus, an
(4) I just accidentally exposed three reels [of Ektachrome].
adjective selecting an eventive argument 'coerces' the type of the noun into one of the eventive qualia or the predicate underlying the deverbal noun. Pustejovsky (1991, 1993) also discusses other examples of 'logical metonymy', in which the semantics of a verbal predicate and the type of its complement exhibit mismatches, such as (7).
(7) a. Sam enjoyed (drinking) the beer.
b. Sam enjoyed (watching) the film.
c. Sam enjoyed (reading) the book.
d. Sam enjoyed (eating) the caviar.
(8) a. Sam picked up and finished his beer.
b. Sam ate and enjoyed the caviar.
c. Sam wrote but later regretted the article.
Therefore, we treat this type of polysemy as a question of selecting the appropriate aspect of the meaning of the complement, rather than a change in the meaning of the NP itself. Traditionally, this is closest to saying that nouns denoting artifacts are vague, rather than ambiguous, between eventive and objective readings, in these contexts. Consider first the example of fast typist. The effect we want is for fast to apply to events of the typist typing, i.e. the paraphrase of fast typist is (by default) typist who types fast. We will assume that we do this by reifying the event, giving a logical form equivalent to:12
[x][typist(x) ∧ fast(e) ∧ type(e, x)]
enjoy subcategorizes for an NP or progressive VP complement syntactically, but semantically requires a complement with an eventive interpretation in which the experiencer subject of enjoy participates as understood subject. Each of the examples in (7) is grammatical with or without the bracketed progressive participle. However, in the case where it is not present the interpretation remains (by default) identical. Analogously to the case of adjectival modification, Pustejovsky (1993) captures the similarity between the two subcategorization possibilities for enjoy by means of a type-shifting operator applied to the predicate, and uses a type-coercion operator which selects from the eventive qualia of the NP artifact-denoting complement to express the 'co-compositional' aspect of the resultant interpretation. Briscoe et al. (1990) present an analysis of logical metonymies with enjoy which is based on treating type coercion as a (unary) grammatical rule which alters the type and interpretation of the NP. However, Copestake & Briscoe (1992) and Godard & Jayez (1993) point out problems with this analysis stemming from possibilities of 'co-predication'; for example, it seems quite possible to coordinate predicates which require physical objects and events as complements, as in (8).
We achieve this result by assuming that the qualia structure for typist has its telic role instantiated to:
[x][type(e, x)]
where x is coindexed with the 'normal' variable.13 Thus the lexical entry for typist contains structures equivalent to the following:
<SEM> [x][typist(x)]
<QUALIA TELIC> [x][type(e, x)]
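As a rough illustration of how the telic role can supply the event predicate that fast modifies, the following Python sketch assembles the logical form given above from a noun entry. The class Noun, the function telic_modify, the context_pred parameter (a crude stand-in for a contextually supplied predicate) and the use of & for conjunction are all assumptions of the sketch, not part of the LKB.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Noun:
    pred: str                    # predicate of the noun's own semantics
    telic: Optional[str] = None  # predicate of the telic role, if any

def telic_modify(adj, noun, context_pred=None):
    """Build a logical form in which `adj` is predicated of an event.

    The event predicate is taken from context if one is given and otherwise
    from the noun's telic role; None is returned if neither is available.
    """
    event_pred = context_pred or noun.telic
    if event_pred is None:
        return None
    return f"[x][{noun.pred}(x) & {adj}(e) & {event_pred}(e, x)]"

TYPIST = Noun(pred="typist", telic="type")
print(telic_modify("fast", TYPIST))
# -> [x][typist(x) & fast(e) & type(e, x)]
```

The noun's own semantics is left untouched: only the event variable introduced through the telic role is modified by the adjective.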
[x][adj-pred(w) ∧ P(w, x)]
The treatment is similar to that proposed by Hobbs et al. (1990); for example, rather than directly equating the entities denoted by the noun and the adjective, the relationship between the two, denoted above by P, is underspecified. However, in our approach information from the qualia structure provides the instantiation. In the case of telic-adjectives, P will be instantiated by the telic predicate. The lexical entry for fast can be specified as adjective with the semantics instantiated so that it can only be true of an event. Any particular instance of fast in an utterance will have to become resolved to one particular subtype of adjective. In the case of fast typist, the normal form of the adjective is ruled out since typist is object denoting and only the telic role specifies a possible predicate. The choice of predicate may be determined by selectional restrictions, which can be encoded in the LKB as constraints on the types governing the predicate argument structure, but we will not discuss the details here. The qualia structure of the modified phrase is equal to that of the noun (see Figure 12). In this formulation the qualia structure of the noun is not itself directly modified by the adjective. This differs from the treatment we gave in Briscoe et al. (1990) where, because we unified the entire telic role into the representation of the modified nominal, all telic events were, in effect, modified by the adjective. This meant, for example, that the interpretation of enjoy the long book entailed that the reading event assumed was also long, which is not necessarily correct. In our current treatment, the variable is specified by the adjective alone and this problem does not arise. The interpretation of fast typist as someone who types fast is defeasible. In the context of a race between typists and accountants, for example, a fast typist might be one who can run, ski, or ride a motorbike quickly; in this case the predicate is given contextually. Briscoe et al. (1990) argue for the notion of a
The type adjective has subtypes for adjectives that select the telic role, the agentive role, and so on. The basic type adjective is subcategorized for nouns, and has the following semantics:
Figure 12 FS for fast typist (letters are used here to indicate re-entrancy rather than the usual numbers to make the figures easier to follow).
[AVM of type phr-sign with ORTH = (fast typist), built from the telic-adjective fast and the noun typist; the event argument of fast is coindexed with the event of the predicate type in the TELIC role of typist.]
default lexical interpretation, which can be overridden in informationally rich contexts. Lascarides et al. (forthcoming) describe how persistent default feature structures can be used to formalize this, by specifying the portion of the semantic representation derived from the qualia structure as default.
Our current treatment of enjoy is similar to that of fast, in that the 'coercion' is internal to the verb semantics. (Godard & Jayez 1993 also adopt such an approach.) We treat enjoy as having a type which can either be specialized to take an event-denoting complement in the usual way, or to introduce an indirect relationship between an object and the event, which will be instantiated via the telic role (see Figures 13 and 14).14
Figure 13 Outline of type hierarchy for enjoy and similar verbs.
[Type hierarchy rooted in lex-enjoy-verb, with subtypes including coercing [CAT SUBCAT = (NP)], non-coercing-NP [CAT SUBCAT = (NP)] and a non-coercing subtype with [CAT SUBCAT = (VPing)].]
Figure 14 Coercing form of enjoy.
[AVM for the coercing subtype: the SUBCAT value is an NP whose QUALIA TELIC predicate supplies the event argument in the semantics of enjoy.]
One further example of an operation which can be involved in constructional polysemy could be called broadening since usages are available in context which appear to subsume the basic sense semantically. Usually it appears that a
quale which is specified in the basic sense becomes overridden in context. For example, the normal usages of bank and cloud could be specified as stating both form and composition (earth/water vapour). However, both have usages where alternative compositions are stated: bank of rhododendrons, bank of clouds/cloud bank, cloud of mosquitoes, dust cloud. In some comparable cases the broadened sense may appear more metaphorical, for example forest of hands. In many cases there is evidence that broadening of meaning has taken place diachronically and that the original senses tended to be specific and concrete (see Sweetser 1990). It seems appropriate to regard these examples as being comparable to those given above in that there is a modulation of sense rather than a complete shift, but, unlike the cases discussed above, this modulation is most naturally expressed as being non-monotonic. For example, in contrast with the case of reel given earlier, there is a very strong preference for one particular sense and the alternative interpretations are not conventionalized, but given by context (there is no conventional interpretation of cloud as cloud of mosquitoes). This implies that non-default interpretations will only be usual in contexts which explicitly give the exceptional component (normally by compounding or postmodification). This, then, is rather similar to the situation with respect to the stereotypical readings of enjoy the book and similar examples (Briscoe et al. 1990). To represent broadening we make use of lexically specified persistent default components of the qualia structure and allow these to be overridden. In the FS for the lexical entry for cloud shown in Figure 15 the qualia structure is stated to refer necessarily to an individuated physical object of amorphous form, with a composition that is also physical and refers cumulatively (i.e. the composition is either a mass or a plural object).
By default, cloud is a natural object (as opposed to an artifact) and is composed of water vapour.15 Referring to the process of overriding the lexically specified defaults as broadening is perhaps somewhat misleading, since a more general FS never actually exists in isolation according
Figure 15 Lexical entry for cloud.
[AVM of type lex-count-noun with ORTH = cloud, CAT = noun-cat, SEM = obj-noun-formula, and a QUALIA structure specifying an individuated, amorphous FORM and a physical, cumulative CONSTITUENCY, with water-vapour as the default composition.]
4 SENSE EXTENSIONS
By contrast with constructional polysemy, we argue that there are systematic polysemies which are best represented as lexical rules, which we refer to as sense extensions, that is, predictable creation of different but related senses. As described in section 2, the formalism that we utilize to express these rules is equally applicable to derivational processes, as well as those of conversion, in that we treat all such lexical processes as mappings between lexical (and occasionally phrasal) signs.16 From our perspective, it is accidental that some rules specify phonological modifications whilst others do not.17 However, we concentrate on cases which involve little if any grammatical change, since these constitute the major challenge to a uniform theory of lexical processes. The examples of sense extension discussed below could be broadly characterized as metonymic. In Briscoe & Copestake (1991) we suggested that similar mechanisms could be used to account for metaphoric processes as well.
to this treatment. The intuition that the sense is broadened is reflected in the non-defeasible components of the modified structure, however: for example, the semantic contribution of cloud to cloud of mosquitoes could be represented as an FS with unspecified composition. Broadening could alternatively be represented using a lexical rule which removes part of the qualia structure. But this is a less attractive account since it would be difficult to avoid spurious ambiguity which would occur if the broadened sense were specialized to have a structure equivalent to the usual sense. Furthermore, the default account gives a natural explanation for the fact that explicit contextual specification of the alternative compositions is necessary for the usage to be interpreted in its broadened sense, which the lexical rule account would fail to capture without some additional mechanism. In general, we see the use of lexical rules as appropriate when there is a shift in syntactic or semantic type, as will be illustrated in more detail in the next section.
4.1 Grinding and portioning
One process of sense extension is that which creates mass nouns denoting an unindividuated substance from count nouns denoting an individual physical object of some kind. Given the right context, this process can apply quite generally. The context normally suggested is to imagine a large grinding machine, the Universal Grinder (see e.g. Pelletier & Schubert 1989), which would, for example, turn a table into some substance that could be referred to by the mass term table. Conventional subcases of grinding exist, for example, food-denoting mass nouns can be formed from animal-denoting count nouns (e.g. lamb, rabbit, haddock, chicken). This extension appears to be productive, at least in a sufficiently marked context; for example, in the LOB corpus we find the use of mole as a mass term (10).
(10) Badger hams are a delicacy in China while mole is eaten in many parts of Africa.
We therefore cannot assume that the extended senses are listed explicitly in the lexicon. As in this example, where the animal sense is a count noun and the meat sense is mass, sense extensions may affect syntactic behaviour. However, the syntactic difference is not criterial since in examples such as (11) it is the predicate rather than the complement which indicates that grinding has occurred.
(11) Sam enjoyed the lamb.18
Furthermore, unlike the case of co-predication with constructional polysemy, it seems much harder to coordinate predicates selecting for the ground and unground senses of a complement, especially if this is combined with co-composition, as (12) illustrates.
For example, the sense extension from animals into metaphorical senses denoting humans with some particular characteristic is apparently productive (e.g. John is a lamb/pig/wombat), although the actual characteristics involved cannot be predicted from knowledge of the animal sense. We would argue, for example, that the properties ascribed to a person by pig are stereotypical associations with the animal, which would not be encoded in the qualia structure. Despite the more associative or analogical nature of metaphorical sense extension, there is a core component to such processes which should be expressed in terms of a sense extension rule. In general, we assume that the possible mappings defined by sense extension rules define the limits to the possible shifts in meaning, but more general reasoning may be involved in determining the meaning more exactly in a particular context. However, in this paper, we will concentrate on metonymic examples.
(12) a. ?Sam fed and carved the lamb.
b. ??Sam fed and enjoyed the lamb.
(13) a. curious/curiosity/curiousness
b. glorious/*gloriosity/gloriousness
c. His curiosity was attracted to the curiousness of the phenomenon.
d. ??His curiousness was attracted to the curiosity of the phenomenon.
Thus (13c) and (13d) are not equally acceptable because curiousness is typically predicated of things, unlike curiosity which seems more appropriate to people. Similarly, we find the examples in (14) with the conventionalized subcase of meat grinding are odd.
(14) a. ?Sam ate pig (pork).
b. ?Sam likes cow (beef).
c. 'Hot sausages, two for a dollar, made of genuine pig, why not buy one for the lady?' 'Don't you mean pork, sir?' said Carrot warily, eyeing the glistening tubes. 'Manner of speaking, manner of speaking,' said Throat quickly. 'Certainly your actual pig products. Genuine pig.' (Terry Pratchett, 1989, Guards, Guards! Gollancz, London, p. 155, Corgi edition, 1990)
d. There were five thousand extremely loud people on the floor eager to tear into roast cow with both hands and wash it down with bourbon whiskey. (Tom Wolfe, 1979, The Right Stuff, Farrar, Straus & Giroux, New York, p. 298, Picador edition, 1991)
Nevertheless, such examples do occur and when they do, as in (14c, d), the intuition is that they are not synonymous with the underived senses of pork and beef; they either convey a negative attitude to the consumption of the meat on the part of the speaker or an entailment of extended denotation, where more of
In section 5 we return to similar more acceptable cases and argue that in restricted cases such examples are comprehensible as instances of co-composition with the ORIGIN specification of the ground predicate. However, for the moment we assume that such examples suggest that we have a genuine ambiguity, as opposed to vagueness: in this case between animal and 'animal stuff' denoting senses. One striking similarity between conventionalized cases of grinding and derivational processes is that both can be blocked (e.g. Aronoff 1976), that is, undergo pre-emption by synonymy of lexical form. For example, Aronoff notes the pattern in (13) and argues that gloriosity is blocked by glory, whilst curiosity and curiousness co-exist because they are not synonymous (possibly as a result of semantic specialization).
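Blocking of the meat-grinding extension can be pictured with a very small Python sketch. It is only an illustration under assumed names (BLOCKERS, grind_to_meat, preferred_form): the derived sense is always generated, and a listed synonym marks it as pre-empted rather than removing it, in keeping with the observation that forms such as pig for the meat do occur.

```python
# Lexicalized synonyms that pre-empt the derived meat sense; the two pairs
# are taken from the examples with pig/pork and cow/beef.
BLOCKERS = {"pig": "pork", "cow": "beef"}

def grind_to_meat(animal_noun):
    """Return (derived form, blocking synonym or None) for the meat sense."""
    return animal_noun, BLOCKERS.get(animal_noun)

def preferred_form(animal_noun):
    """Prefer the lexicalized synonym when one exists, else the derived form."""
    form, blocker = grind_to_meat(animal_noun)
    return blocker if blocker is not None else form

print(preferred_form("lamb"), preferred_form("pig"))  # -> lamb pork
```

Treating blocking as a preference rather than a filter leaves room for the marked uses of the derived form, which then carry additional implications.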
Ann Copestake and Ted Briscoe 39 the cow or pig than is normally considered 'meat' is being treated as food.
Blocking appears to be explicable on the basis of Gricean principles, in particular the Maxim of Manner. Given a choice between ways of expressing the same meaning, the most easily interpretable ones should be preferred. In general, this implies that common terms should be used rather than obscure ones, briefer/simpler forms rather than more complex ones, and unambiguous expressions instead of ambiguous ones.19 Apparent violation of this maxim carries the (discourse) implication that the terms are not strictly synonymous; thus terms which are normally blocked will be interpreted as carrying additional entailments (see Briscoe et al. 1994 for additional discussion).

Nunberg & Zaenen (1992) point out that conventionalized subcases of grinding vary cross-linguistically and that there are no clear pragmatic explanations either for this variation or the absence of some conventionalized cases in English. For example, they report that in Eskimo (at least conventionalized) grinding of animals is ungrammatical; and in English it seems that grinding of fruits or nuts to produce liquids is not conventionalized: thus, the examples in (15) are awkward, though (15b) is imaginable, for example, in the context of a conversation between professional cooks.

(15) a. ?I drink pear rather than peach. (cf. I drink orange for breakfast)
b. ?I fry courgettes with olive rather than safflower.

For these reasons, they argue that a language-specific system of 'lexical licenses' must be provided in order to specify which subcases of the more general conceptual grinding transfer occur conventionally in a particular language. In addition, different languages choose different grammatical means to encode grinding and its subcases; for instance, in Dutch meat grinding of animals is usually realized by explicit compounding of vlees, so lamb meat is lamsvlees and so forth. The conversion process appears to be restricted to the more stereotypical animals which are farmed for meat, such as chicken. In this way, Dutch appears to somewhat mirror the situation in English with liquid grinding, where certain stereotypical 'juicy' fruit denoting nouns, such as orange, can acquire a juice sense through grinding, but the majority require explicit compounding (e.g. apricot juice).

Nunberg & Zaenen (1992) also argue that the meaning of ground nouns is defeasible and therefore pragmatically specified. Thus, in the case of grinding of animals, they would provide a lexical license specifying that this is conventional in English, but argue that the interpretation of ground animal denoting nouns as meat is contextually specified. Thus, in (14a, b) the Maxim of Manner requires that we choose pork or beef because these terms have a more restricted denotation than 'animal stuff'. On the other hand in examples such as (16a, b) the context tells us that a more restricted 'meat' denotation is appropriate,
whilst in (16c) the context tells us that a 'fur' reading is more appropriate, and in (16d) that nothing more specific than 'stuff' is entailed.

(16) a. Sam eats rabbit regularly.
b. Sam enjoyed the rabbit.
c. Sam wears rabbit regularly.
d. Sam both wears and eats rabbit.
Figure 16 Grinding.
Our approach is similar in that we posit a general abstract lexical rule of grinding and conventionalized subcases, including animal meat grinding and animal fur grinding. However, we also suggest that whilst the more specific conventionalized 'meat' and 'fur' senses are defeasible in appropriate contexts (because the more general ground sense is also available), they are specified lexically as a component of the conventionalized subcases of the grinding lexical rule. The general rule of grinding is shown in Figure 16 (using the type system described in Copestake 1992, which also discusses the formal semantic properties of the grinding function in the context of the general treatment of mass terms proposed by Krifka 1987). The effect of the lexical rule is to create, from a count noun with the qualia properties appropriate to an individuated physical object, a mass noun with properties appropriate for an unindividuated substance.
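The effect of the general grinding rule can be illustrated with a minimal sketch. It is not the LKB implementation: the `Sign` record, its field names, and the predicate naming scheme are assumptions made for illustration, and the typed feature structures of the paper are flattened into a handful of attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sign:
    """Hypothetical, heavily simplified lexical sign."""
    orth: str                     # orthography
    cat: str                      # "count" or "mass"
    pred: str                     # semantic predicate
    form: str                     # qualia FORM: "individual" or "mass"
    origin: Optional[str] = None  # qualia ORIGIN: predicate of the source sense


def grinding(sign: Sign) -> Optional[Sign]:
    """General grinding: individuated count noun -> unindividuated mass noun."""
    if sign.cat != "count" or sign.form != "individual":
        return None  # the rule does not apply to this sign
    return Sign(
        orth=sign.orth,                 # conversion: the form is unchanged
        cat="mass",
        pred=f"grinding({sign.pred})",  # assumed naming for the derived predicate
        form="mass",
        origin=sign.pred,               # the source sense stays accessible
    )


rabbit = Sign(orth="rabbit", cat="count", pred="rabbit_1", form="individual")
rabbit_stuff = grinding(rabbit)
print(rabbit_stuff)
```

Recording the source predicate under `origin` is what later allows modifiers such as corn-fed to target the animal rather than the substance.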
We specialize the grinding rule to allow for cases such as the animal/meat extension explicitly. The typed framework provides us with a natural method of characterizing the subparts of the lexicon to which such rules should apply. The lexical rules can, in effect, be parameterized by inheritance in the type system. For example, we can give rules which inherit information from grinding such as meat-grinding:

meat-grinding
< > < grinding < >
< 1 QUALIA > = animal
< 0 QUALIA > = c_subst.
Figure 17 Meat/flesh sense of rabbit.
As in section 2.3, c_subst is a type which stands for normally comestible naturally derived substances. The lexical rule can be applied to the lexical entry for rabbit to generate a sense corresponding to 'edible stuff derived from rabbits', partially represented as shown in Figure 17. Here the specification of the value for the telic role arises from the constraint on the type c_subst. Using the notion of persistent defaults described in Lascarides et al. (forthcoming), we can treat this as defeasible. The meat-grinding rule creates a second extended sense for the mass noun rabbit (and other animal denoting count nouns) but does not result in the full specification of what might usually be taken as the meaning of the meat/flesh sense. The substance is stated to be edible (to be precise, to have the normal purpose of being eaten) and to be derived from the animal, but there is no attempt at defining the meaning to exclude, say, stuff derived from bones; particular cultural assumptions will affect exactly what is taken to be edible, so rabbit will usually exclude the bones but whitebait will not, for example. Thus not all the characteristics are captured by the lexical rule and we assume that pragmatic effects will ensure further contextual specialization.
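The way a specialized rule inherits from the general one while restricting its input type and adding a defeasible telic value can be sketched as follows. The type hierarchy fragment, the dictionary encoding of signs, and the `resolve` function are all assumptions for illustration; in particular, `resolve` is only a crude stand-in for persistent defaults, not the mechanism of Lascarides et al.

```python
# Hypothetical fragment of a type hierarchy, child -> parent.
PARENT = {"animal": "physical", "physical": None,
          "c_subst": "substance", "substance": None}


def subsumes(general, specific):
    """True if `specific` is `general` or one of its descendants."""
    t = specific
    while t is not None:
        if t == general:
            return True
        t = PARENT.get(t)
    return False


def make_rule(name, input_type, output_type, defaults):
    """Build a count -> mass rule restricted to inputs of `input_type`."""
    def rule(entry):
        if entry["cat"] != "count" or not subsumes(input_type, entry["qualia"]):
            return None
        return {"orth": entry["orth"], "cat": "mass", "qualia": output_type,
                "pred": f"{name}({entry['pred']})", "origin": entry["pred"],
                "defaults": dict(defaults)}
    return rule


# The general rule, and a subcase that narrows input and output types.
grinding = make_rule("grinding", "physical", "substance", {})
meat_grinding = make_rule("meat_grinding", "animal", "c_subst", {"telic": "eat"})


def resolve(entry, context=None):
    """Lexical defaults survive unless the context supplies another value."""
    resolved = dict(entry["defaults"])
    resolved.update(context or {})
    return resolved


rabbit = {"orth": "rabbit", "cat": "count", "qualia": "animal", "pred": "rabbit"}
table = {"orth": "table", "cat": "count", "qualia": "physical", "pred": "table"}

meat = meat_grinding(rabbit)
print(resolve(meat))
print(resolve(meat, {"telic": "examine"}))
```

Under these assumptions the lexicon proposes the eating interpretation and an informative context can still dispose of it, while `meat_grinding` simply fails on non-animal nouns such as table.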
The more specific rules which inherit from the general grinding rule express the conventionalized processes that apply to semantically specified parts of the lexicon. In addition to meat-grinding we could also define a lexical rule which gives the fur/skin sense, available for rabbit, mink, beaver, calf, lizard, crocodile, and so forth. In this way we account for the availability of multiple distinct mass senses. In context, a general mass sense corresponding to the application of the underspecified grinding rule is available, as in (17).

(17) After several lorries had run over the body, there was rabbit splattered all over the road.

Thus, under this account, the defeasibility of the more specific sense is predicted in terms of ambiguity. The alternative of relying on pragmatic specification of a single underspecified sense seems to us less satisfactory because of the specificity of readings found in uninformative contexts; for example, in examples such as (16b) or (18), the natural interpretation is that the rabbit was eaten.

(18) Sam enjoyed but later regretted the rabbit.

Under the co-compositional account of such constructional polysemy (see section 3) this is straightforward since the meat-grinding sense of rabbit provides a telic role which allows the eating interpretation to be constructed.20 However, if the lexicon does not propose such a sense, it is unclear what it is about the context which allows pragmatic specialization of the interpretation. Briscoe et al. (1990) provide empirical support for the hypothesis that the lexicon proposes and pragmatics disposes of such initial interpretations: on the assumption that logical metonymy will be utilized when a reading based on qualia is appropriate, or when the context is rich enough to provide determinate information to override this 'default'; and that an explicit event will be specified where a non-default reading is appropriate, but the general context is not rich enough to override the default. Thus, a verb like enjoy occurs mostly with metonymic NP complements, but when it does occur with progressive VPs the interpretation is never that which would be predicted by co-composition with eventive qualia; whilst with metonymic NP complements, where the default reading is inappropriate the context is always informationally rich and determinate.

Multiple sense extensions/lexical rules may be applied in sequence. For example, we mentioned in section 2.3 the lexical rule portioning which converts food or drink denoting mass nouns into count nouns denoting a portion of that substance (e.g. three beers). This is clearly productive; it can be used with names of particular types of beer, for instance, such as three Heinekens/IPAs/Anchor Steams. It can also apply to extended senses such as three lambs, at least in the
context of a restaurant. This 'feeding' of lexical rules raises the issue of why ground portioned nouns are not, for instance, reground, creating an infinite sequence of more and more derived senses. There are several potential solutions to this problem; one might be to set up the rules so that grinding feeds portioning but not vice versa. However, we do not think that this is necessary, and in fact there is no reason to believe that portioned count nouns are of a type inaccessible to grinding. Rather we think that the non-existence of ground portioned nouns follows from the semi-productivity of lexical rules; the ground portioned sense is synonymous with the original mass sense and is thus blocked. We return to the issue of semi-productivity and blocking in section 6.
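The interaction of rule feeding and blocking by synonymy can be made concrete with a small sketch. The tuple encoding of senses and the assumption that grinding a portion of X denotes the same stuff as X are illustrative simplifications, not part of the account itself; they serve only to show why the derivation terminates without stipulating an ordering between the rules.

```python
def grind(sense):
    """count -> mass; hypothetical encoding of a sense as (kind, core)."""
    kind, core = sense
    if kind != "count":
        return None
    # Assumption: grinding a portion of X yields the same stuff as X itself.
    if isinstance(core, tuple) and core[0] == "portion":
        return ("mass", core[1])
    return ("mass", core)


def portion(sense):
    """mass -> count denoting a portion of the substance."""
    kind, core = sense
    return ("count", ("portion", core)) if kind == "mass" else None


def derive(base, rules):
    """Apply rules freely; a result synonymous with an existing sense is blocked."""
    senses, blocked, agenda = [base], [], [base]
    while agenda:
        current = agenda.pop(0)
        for name, rule in rules:
            new = rule(current)
            if new is None:
                continue
            if new in senses:
                blocked.append((name, current))  # pre-empted by synonymy
            else:
                senses.append(new)
                agenda.append(new)
    return senses, blocked


senses, blocked = derive(("count", "lamb"), [("grind", grind), ("portion", portion)])
print(senses)
print(blocked)
```

In this sketch grinding remains applicable to the portioned count noun; its output is simply rejected because an equivalent sense already exists.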
4.2 Nominal metonymies

Grinding can be characterized as a set of metonymic sense extensions in which the animal comes to stand for something derived from the animal. However, it appears to have a different flavour to many of the nominal metonymies identified by Nunberg (1979), for example. Many of these involve objects standing for people, as in (19).

(19) a. The third violin is playing badly.
b. The Armani suit lounging gracefully at the bar looks bored.
c. London said that a new passport could not be issued.
d. The village voted conservative at the last election.

Although these putative sense extensions seem to have no grammatical effects, sometimes they can affect agreement. Nunberg (1979) and Pollard & Sag (in press) discuss the use of food to denote people, which is a less conventionalized example of a similar metonymy, as in (20).

(20) a. The ham sandwich wants a coke.
b. The french fries is getting impatient.

It is clear that agreement in (20b) is determined by the referent rather than the syntax of the NP french fries, which would induce plural agreement given a non-metonymic reading. Similarly, co-predication of such examples seems awkward, as in (21).

(21) a. ??The ham sandwich wants a coke and has gone stale.
b. ??The french fries is getting impatient and are getting cold.
c. ??The third violin is scratched and playing badly.
d. ??The Armani suit is at the bar and crumpled.

Similarly, it is clear that pronominal agreement and reflexivization are also affected by transfer of reference (Fauconnier 1985; Nunberg 1993; Pollard & Sag, in press). These observations suggest to us that these nominal metonymies must have a non-pragmatic component and must be treated as distinct senses/signs. Within our framework, we propose to treat them as sense extensions and provide lexical rules for them, analogous to those developed for grinding and portioning. Another such sense extension is that from a word denoting a fruit (or nut) to a plant bearing that type of fruit (e.g. apple, gooseberry, walnut) which is found in Italian and Spanish as well as English.21 However, in the Romance languages the fruit is usually (but not always) feminine while the tree is masculine (there are one or two exceptions). For example, in Spanish we have aceituna/aceituno (olive), pomelo/pomelo (grapefruit) (see Soler & Marti 1993). In a few cases, the suffix -ero applies: albaricoque, albaricoquero (again illustrating the similarity of sense extension, conversion, and derivation). The basic type for the lexical rule can be stated as:

fruit-to-tree (lexical-rule)
< 1 > = lex-count-noun
< 0 > = lex-count-noun
< 1 QUALIA > = c_nat_obj
< 0 QUALIA > = plant.

The normal lexical rule for Spanish can then be stated as:

fruit-to-tree-ESP
< > = fruit-to-tree
< 0 SEM IND AGR GENDER > = masc
< 1 QUALIA AGENTIVE ORIGIN > = < 0 SEM PRED >.

The exceptional cases can be stated using explicit lexical entries which override the usual results of lexical rule application:

higuera
< > < (higo + fruit-to-tree-ESP) < >
< SEM IND AGR GENDER > = fem.

This example illustrates that some nominal metonymies, just like grinding, can have different grammatical encodings in different languages and this supports our contention that such processes should be treated as language-specific lexical rules, creating lexical entries (signs) with extended senses and different grammatical and/or phonological specifications, as required. We return to the issue of how to distinguish such cases from those of sense modulation or constructional polysemy in section 5.
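The Spanish fruit-to-tree rule and its lexically specified exception can be sketched as a default result that an explicit entry overrides. The dictionary encoding, the `OVERRIDES` table, and the predicate names are assumptions for illustration; the orthographic realization of regular derived forms is deliberately left out, since the rule as stated constrains only gender, qualia type, and the identification of the fruit's ORIGIN with the tree's predicate.

```python
def fruit_to_tree_esp(fruit):
    """Hypothetical rendering of fruit-to-tree-ESP over flat dictionaries."""
    if fruit["qualia"] != "c_nat_obj":
        return None
    return {
        "qualia": "plant",
        "gender": "masc",          # the normal case in Spanish
        "pred": fruit["origin"],   # input ORIGIN is identified with output PRED
    }


# Explicit entries override the usual results of rule application.
OVERRIDES = {"higo": {"orth": "higuera", "gender": "fem"}}


def tree_entry(fruit):
    derived = fruit_to_tree_esp(fruit)
    if derived is None:
        return None
    derived.update(OVERRIDES.get(fruit["orth"], {}))
    return derived


aceituna = {"orth": "aceituna", "gender": "fem", "qualia": "c_nat_obj",
            "pred": "olive", "origin": "olive_tree"}
higo = {"orth": "higo", "gender": "masc", "qualia": "c_nat_obj",
        "pred": "fig", "origin": "fig_tree"}

print(tree_entry(aceituna))
print(tree_entry(higo))
```

Keeping the exception in a separate table mirrors the division of labour in the text: the rule states the regularity, and the lexical entry for higuera states only what differs.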
4.3 Phrasal sense extension

There are some examples where sense extensions apparently apply to phrases. Thus the place → group sense extension applies both to words such as village and to place denoting phrases, as in (22).

(22) a. The south side of Cambridge voted Conservative.
b. Three villages/three villages south of the river/?three villages built of stone voted for the proposed ban on timber production.

These seem quite restricted; in this particular sense extension it appears that only modifiers which might apply to the group of people, or which are locational (as in the south side of Cambridge), are fully acceptable. With grinding, too, there are cases of phrases, or at least compounds, undergoing the sense extension, as in (23).

(23) Here you can eat alligator tail, elk, rattlesnake and that snicker-inspiring delicacy, Rocky Mountain oysters. (CSAA Magazine)

The treatment of such phrasal sense extensions in the LKB is a straightforward generalization of the lexical case since, as we described in section 2.3, 'lexical' rules can apply to any feature structure representing a lexical or phrasal sign with the appropriate properties. Some examples where a sense extension apparently applies to a phrase are misleading though, since the availability of qualia structure does allow for modifiers which apply to the unextended sense. For example, in the meat grinding cases, we get corn-fed chicken and young lamb, where the adjectival phrase, on semantic grounds, has to apply to the animal, not the meat, but we also get, for example, young veal, corn-fed beef, so such examples do not demonstrate that grinding is applying to a phrase. We would analyse all these cases as ones in which the modifier is applying to the ORIGIN feature of the qualia structure (see Figure 17 and the example of fast typist shown in Figure 12, and also section 5).22

4.4 Novel sense extensions

Pragmatic factors clearly affect the acceptability of the underspecified, unconventionalized uses of sense extension typified by the 'ham sandwich' example in (20a). Something like Nunberg's (1979) conditions on transfer of reference are needed for the intended referent to be identifiable. But these in themselves do not sufficiently delimit the possible uses of even the novel sense extensions. Nunberg postulates a set of basic transfer functions; we would identify these with our most general sense extension rules. The existence of a (unidirectional) object → human basic transfer function allows for the ham sandwich sentences,
in appropriate contexts, but the converse case does not seem to be possible. Thus, for example, (24) is an unacceptable way of referring to the food that has been ordered by an identified customer.

(24) *The man with the brown suit is in the microwave.

Nunberg discusses the cue-validity of such putative transfer functions and argues that those which occur are motivated by the value of the function as a determinant of the referent. However, a priori, there is no apparent reason why the function from human → object cannot apply in contexts in which (24) might be uttered. For the ham sandwich examples the basic sense extension rule that applies could be characterized as physical object → human. It seems reasonable to assume that such a rule is analogous to the basic grinding rule (see section 4.1) in that it is generally possible only in marked contexts, but that there are conventional subcases. For example, Atkins (1990) lists:

characteristic dress → person who wears it (e.g. blackshirt, red beret)
musical instrument → person who plays it (e.g. cello, sax).

(Some dictionaries also list, for example, spear, bow, gun meaning people who use these weapons, but these seem somewhat archaic.) Thus we would treat the interpretation of all such novel examples in much the same way as the conventional cases. Novel extended usages are not rare, at least in some styles of writing: (25) is taken from a newspaper travel article.

(25) [Chester] serves not just country folk, but farming, suburban and city folk too. You'll see Armani drifting into the Grosvenor Hotel's exclusive (but exquisite) Arkle Restaurant and C+A giggling out of its streetfront brasserie next door. (Guardian Weekly, 13 November 1993)

Here Armani and C+A are presumably intended to be interpreted along the lines of people wearing clothes from Armani/C+A (and could be analysed as a combination of two conventionalized processes, brand name → object, plus characteristic dress → person who wears it).23 Our account predicts that all such novel metonymic sense extensions should be analysable as falling into a range of basic patterns which might themselves be language-dependent. These basic rules, whether conventionalized or not, should interact with other grammatical rules appropriately; for example, grammatically induced type coercion occurs when NPs appear as predicative complements, as in (26) (see e.g. Partee 1992).

(26) a. Sam considers Bill a fool.
b. Sam is a fool.

In (26) a fool is coerced from a generalized quantifier to a property (from ((e, t), t) to (e, t) in extensional terms). Ham sandwich examples can participate in this coercion quite easily, as in (27) said to a waiter delivering a variety of dishes.
(27) I am the ham sandwich.

This is compatible with our account, given that the extension will produce a meaning which can be glossed as 'the x who ordered a ham sandwich' which can in turn be coerced to a property of ordering a ham sandwich by the standard type shifting operator.

4.5 Directionality

Although in the case of derivation there is clear evidence of directionality, this is not the case with conversion. In the cases with which we are most concerned, where the process is still clearly productive, novel uses, such as the example of mole given earlier, at least demonstrate that a particular directionality is possible. In some cases, the basic sense is evident from the morphology; thus we assume that the fruit/nut sense rather than the bush/tree sense is primary in gooseberry, strawberry, walnut, chestnut, and so forth. This does not preclude the possibility that the direction might change over time nor that there might be cases analogous to morphological back formation. In other cases, there are closely related rules of derivation or compounding which suggest that there should be the same directionality in the conversion case; for example, compounding with juice and meat closely mirrors the grinding conversion, whilst -ful suffixation mirrors the container/contents nominal metonymy. In addition, the tests for cue-validity of transfer functions which Nunberg (1979) proposes can also be used to distinguish basic from metonymic senses, as he suggests, and there appear to be general constraints on transfer functions which suggest that they extend from the concrete to the abstract and the simple to the complex (e.g. Sweetser 1990). Cruse (1986:69) describes a test for distinguishing senses according to whether or not they are fully established (i.e. conventionalized in our terminology). This involves the possibility of simultaneously negating the non fully established sense whilst asserting the fully established sense, while the converse is much less acceptable. Thus, his example of novel meaning the text or the physical object is given in (28).

(28) a. I'm not interested in the binding, cover, typeface etc. - I'm interested in the novel.
b. ?I'm not interested in the plot, characterization, etc. - I'm interested in the novel.

It is reasonable to assume that the perceived directionality of sense extension processes would be from fully conventionalized to less conventionalized senses. The examples in (29) seem to confirm the intuition that the animal sense is primary in cases of meat grinding, and the fruit sense in the fruit tree examples.
(29) a. I don't want the meat, I want the lamb.
b. ?I don't want the animal, I want the lamb.
c. I don't want trees, I want peaches.
d. ?I don't want fruits, I want peaches.

The behaviour in this test is explicable on the assumption that basic conventional senses are assumed by default, and that the extended senses have to be forced by context. There are some cases where neither Cruse's test nor any of the other criteria mentioned give clear results. Nunberg (1978) discusses at length the difficulty of making such a choice in the case of the instance/type distinction, for example. The directionality of sense extension rules does not affect the representation of the signs involved, so these preferences in interpretation must follow from the manner of rule application. In section 6 we argue that the semi-productivity of such rules can also be used to predict these preferences.

5 COORDINATION AND CO-PREDICATION

Given that we have suggested two different methods for dealing with systematic polysemy, it is clearly necessary to establish that we can, in fact, distinguish between constructional polysemy and sense extension. It is not always straightforward to distinguish between cases where the relational approach of encoding the different aspects of one entity will work and the examples where it seems necessary to postulate the construction of a new structure via the lexical rule mechanism. Pustejovsky (1994) suggests that the distinction can be made on the basis of co-predication: that door can be treated as having a relational structure encoding both the aperture and physical object usages, because of the acceptability of (30).

(30) John painted and walked through the door.

However, he argues newspaper must be coerced between the physical object and organization usages because of the unacceptability of (31a), despite the acceptability of examples such as (31b) which might be the result of a coercion process applying phrasally to the NP.24

(31) a. *The newspaper fired its editor and fell off the table.
b. John used to work for the newspaper that you are reading.

This is an area where opinions (and judgements) differ. For example, Cruse (1986:65) treats door as having distinct panel and aperture senses on the basis of the semantic abnormality of (32):

(32) ?We took the door off its hinges and then walked through it.
but assumes that a 'global door' sense is involved in (33) (which was cited by Nunberg (1979) as evidence that door is not ambiguous).

(33) The door was smashed in so often that it had to be bricked up.

Care also has to be taken to use cases where the predicates could be true of the same entity; thus (34) does not demonstrate that teacher must be coerced.

(34) ?The teacher was pregnant and had a beard.

We can assume that acceptable examples of co-predication are evidence that a single structure is available, and thus that constructional polysemy is at work. However, we will argue that zeugma25 is not in general explicable on the basis of the existence of multiple distinct lexical structures. So we cannot necessarily take negative examples as evidence for distinct senses and thus as supporting an account involving sense extension as opposed to constructional polysemy. In the cases where we have posited rules of sense extension, it is incumbent on us to account for apparent counter-examples involving co-predication. The clearest such examples are coordinations, where there is no possibility of arguing that the sense extension applies phrasally and where the standard rule of coordination requires type compatibility (e.g. Partee 1992). Furthermore, coordinations involving sortal mismatches are often zeugmatic, as (36) illustrates.

(36) a. He arrived in a Rolls Royce and a temper.
b. Our office typist is fast and bearded.

Although we argued in section 4.1 that similar co-predications of ground and unground senses seem to be ruled out, some appear to be possible. For example, (37a) involves a coordination of predicates which select the animal sense of chicken, whilst in (37b) we appear to have one which selects both animal and meat senses.

(37) a. This chicken is corn-fed and healthy.
b. Corn-fed and inexpensive chicken is difficult to find.

We can account for both these examples on the assumption that the ORIGIN of the qualia structure of the ground sense is available for modification (as mentioned in section 4.3). Nunberg (forthcoming, this Journal vol. 12.2) argues that this treatment is insufficiently restrictive since the property described has to have some applicability to the meat for the predication to be fully acceptable. However, we would argue that examples such as (38a) are no different from those in which a contextually unexpected adjective is applied straightforwardly to the noun, for example, (38b):

(38) a. ??We serve corn-fed and happy chicken.
b. ?We serve dense potatoes.
(38b) is odd, despite the fact that potato tubers can differ in physical density, since it is not generally realized that this affects eating quality. Thus, on our account, both these examples are problematic simply because the context is not providing/supporting a clear interpretation. Making the context explicit improves the acceptability, since it restricts and guides the possible interpretations. Such effects are, admittedly, more likely to arise with adjectives that modify different qualia on our account, but this would be expected, since properties true of aspects of an entity are less likely to relate to a common property, and thus be part of a coherent discourse.
5.1 Coordination in constructional polysemy

In section 3 we described a representation of adjectives which relied on selection of predicates from the qualia structure according to the type of the resolved adjectival structure. Adjectives of the same or differing types can be coordinated, although there seem to be some restrictions on the productivity of this process when the adjectives select different qualia (?fast and well-dressed typist). But some examples are more acceptable, such as fast and intelligent typist where intelligent is assumed to be true of the unmodified variable, and the oddness of the others is perhaps better explained as a pragmatic effect. We will assume that the SUBCAT value of the conjoined phrase is the unification of the values on the adjective daughters and that the semantics is simply specified as the conjunction.26 Thus we have:

$[x][\mathit{fast}(e) \land P(e, x) \land \mathit{intelligent}(x)]$

where P is coindexed to the telic predicate of the subcategorized noun. This can be applied to typist to give

$[x][\mathit{fast}(e) \land \mathit{type}(e, x) \land \mathit{intelligent}(x) \land \mathit{typist}(x)]$

Cases such as corn-fed and expensive chicken are similar, on the assumption that corn-fed selects the ORIGIN in this instance. Coordination of the noun raises some more complex issues. The first point to notice is that the treatment of adjectives given above precludes the possibility of selecting one role from one conjunct and a different one from another. This appears to be basically correct for adjectival modification. In the example below, lap is event denoting and the normal form of fast would be expected to apply, whereas it selects for the telic role of cars.

(39) ??Prost only gets enthusiastic about fast cars and laps.
This is odd at least on the reading in which fast applies to the conjunction cars and laps. However, cases where the adjective selects for the same role appear to be generally acceptable, even if the predicate selected differs. For example:

(40) The company's fast typists and computers have raised productivity by 20%.

In such examples, the conjoined entities should be regarded as being combined to produce a single (complex) entity, in order to get the collective readings. The conjoined form typists and computers can be constructed from the individual representation using, for example, the formalism described by Link (1983) to structure the domain such that complex entities can be described. Thus, the semantics of the conjoined phrase would be written as:

$[x \oplus y][\mathit{typist}(x) \land \mathit{computer}(y)]$

Given the approach that we have adopted previously, of treating the qualia as quite distinct from the rest of the sign, the most straightforward option for the qualia of the conjunction is to identify it with the disjunction of the qualia of the conjuncts.27 In this case, fast would select the predicates from the disjunct, giving:

$[x \oplus y][\mathit{fast}(e) \land (\mathit{compute} \lor \mathit{type})(e, x \oplus y) \land \mathit{typist}(x) \land \mathit{computer}(y)]$

But we may want to be able to deduce from this a distributive reading which associates the correct predicate with the particular type of individual (typists who type fast and computers which compute fast). To do this, we would have to complicate the representation somewhat, so that the disjunction was not simply of atomic predicates, but restricted the arguments with respect to the qualia. Although we do not want to equate the fast event with the variables in the qualia structure, we could restrict the fast event to be a subevent of those specified there. In the case of the disjunctive qualia, this would have the effect of restricting fast typing events to the typists and fast computing events to the computers. We will leave this open, since the precise formulation depends on the semantics adopted for events and there are other options, involving alternative treatments of the relationship of the qualia structure to the rest of the sign.

Our current proposal for the representation of verbs like enjoy, begin, and so on, discussed in section 3, involves treating them in a manner analogous to fast. Conjunctions such as those in (41) are thus possible in much the same way as the conjunction of fast and intelligent.

(41) a. Sam picked up and finished his beer.
b. Sam ate and enjoyed the caviar.
c. Sam wrote but later regretted that article.
52 Semi-productive Polysemy and Sense Extension
However, unlike modification by fast, there are some cases where the complement to enjoy is a conjunction, such that one conjunct is object denoting and another event denoting.
(42) a. I enjoy films and mending antique clocks.
b. We found Sam swimming the channel, which he enjoys more than golf. (due to Geoff Nunberg)
c. Gordon Parry (Gary Mavers) has come into the world and enjoys a small car, many women possibly including Julia and embezzling the premiums he collects. (Guardian 16 Jan. 1990, Features)
In any approach where the 'coercion' is internal to enjoy, problems arise in treating such examples: no straightforwardly unification-based approach can account for both (41) and (42) by postulating one operation applying either to the verb or its complement. If coercion applied to the noun phrase then the noun would need to have a dual coerced/uncoerced nature in (41); if it were internal to the verb then this would have to be both coercing and non-coercing in (42). This remains true even if the work of specifying the coerced meaning is shared between the components, or if the coercion affects part of the sign rather than the whole of it. Since the examples of conjunction of unlike types in the complement seem more restricted and marked than the conjunction of the verbs, we prefer our current account (which makes (42) problematic rather than (41)) over the one we gave in Briscoe et al. (1990) (where the converse applied). The difficulty seems comparable to the problem of cross-categorial coordination from a syntactic viewpoint, for which a number of solutions have been proposed (see e.g. Sag et al. 1985; Shieber 1992; Cooper 1991). Conjunction is licensed in examples such as (43) if the syntactic description of each of the conjuncts independently unifies with the subcategorization requirement of the verb, despite the fact that these descriptions will not unify with each other:
(43) Tigger became famous and a complete snob.
Similar remarks must apply to the syntax of examples such as (42) and the semantic effects parallel the syntactic ones: the conjuncts individually have types which are accepted by enjoy and the conjunction is only licensed in contexts where enjoy (or a similar predicate) is involved. So a promising direction for future research would be to provide an account where this parallelism is explicit. However, any such account will have to move beyond a strictly unification-based formalism, to allow for the multiple distinct coercions involved in examples such as (42c).
Ann Copestake and Ted Briscoe 53
5.2 Co-predication tests
There are cases where the co-predication test gives less clear indications as to whether constructional polysemy or sense extensions are involved. Take the example of book: it seems clear enough that it has two senses (or usages), as a physical entity which represents some text and as the abstract text itself. But the distinction between these is not really straightforward. Consider the set of examples in (44).
(44) That book is full of metaphorical language.
That book is full of long sentences.
That book is full of spelling mistakes.
That book is full of typographic errors.
That book has an unreadable font.
That book has lots of smudged type.
That book is covered with coffee.
There seems to be a cline here from properties which are clearly true of the content, through those which may be true only of a particular edition or printing, through to those which are true only of a copy (cf. Cruse 1986: 71). Co-predication of the first and last properties seems odd, as in (45).
(45) ?That book is full of metaphorical language and is covered with coffee, so it's very hard to read.
But co-predication of adjacent pairs seems natural in all cases, for example (46).
(46) That book is full of typographical errors and has an unreadable font.
If we treat these senses as cases of constructional polysemy, co-predication is predicted. Thus book can have a formal role and a content role in its qualia structure. On this basis, there is no necessary conflict between properties such as is full of long sentences and has coffee spilt on it. This treatment will not, therefore, account for the apparent oddity of some co-predications. However, although it is standardly assumed that cases of zeugma provide evidence for lexical ambiguity, it is not clear that this is justifiable. Although we must assume, within a unification-based account, that acceptable co-predications imply the existence of a single structure, it does not follow that the converse is true. As we suggested above, oddness of co-predication can be simply due to incompatibility of the predicates. Furthermore, there is clear evidence that some sort of pragmatic principle of cohesion must be postulated to account for the unacceptability of some readings where lexical ambiguity cannot be involved. For example, (47) has readings where the gardener bought either fruit or trees, but does not have the crossed interpretations where apple trees and pear fruits were purchased or vice versa.
(47) The gardener bought three apples and two pears.
Coherence also means that repeated uses of the same homonymous form will tend to have the same interpretation, as in (48), where the crossed interpretation, although possible, is dispreferred (see e.g. van Deemter 1990).
(48) John gave four files to Mary and three files to Sue.
Assuming that some such principle is involved, it would also account for the oddness of cases such as fast and bearded typist, tasty and skinny chicken, where we are predicating properties of distinct aspects of the entity, without there being any apparent connection between these aspects. The acceptable examples, such as fast and intelligent typist, tasty and corn-fed chicken, are those where the distinct aspects are nevertheless related: good typists might be expected to be both fast and intelligent, the food a chicken is given is known to affect the flavour of its meat, and so on (see above).
Given this, it is tempting to assume a single structure for book. However, other examples show even more complex polysemy: newspaper can also refer to the physical copy or the abstract text (of a particular issue); equivalent sentences to those above can be constructed and the same remarks apply to these as to book. But newspaper can also refer to an abstract entity other than the text. This is somewhat hard to categorize: it is not necessarily a company, as ownership and editors can change without there being a different newspaper and so on. It seems plausible to suggest that newspapers are regarded as (named) institutions in themselves. Whatever their ontological status, it is clear that in some sentences there is a notion of a 'newspaper-as-institution', but it is not clear that we can make a sharp distinction between this, the content of the newspaper over a number of issues, and the abstract text reading (49).
(49) That newspaper is owned by a trust.
That newspaper is left of centre.
That newspaper supported the Labour Party at the last election.
That newspaper carries long articles about the internal struggles of the Labour Party.
That newspaper has obscure editorials.
That newspaper is full of metaphorical language.
That newspaper is full of long sentences.
That newspaper is full of spelling mistakes.
That newspaper is full of typographic errors.
That newspaper has an unreadable font.
That newspaper has lots of smudged type.
That newspaper is covered with coffee.
Now again, the properties seem compatible with their neighbours, but co-predication of the first and last is odd, as in (50).
(50) *That newspaper is owned by a trust and is covered with coffee.
But in some cases co-predication of the copy sense and the organization sense does seem possible, as in (51) (suggested to us by Geoff Nunberg):
(51) The newspaper has been attacked by the opposition and publicly burned by demonstrators.
Despite this, assuming that a single structure can cover all the senses of newspaper is highly problematic. Constructing a qualia structure to cover all the senses of newspaper in such a way that different predicates can apply appropriately is difficult, since it seems that the copy and the organizational sense (at least) should have their own distinct qualia. It is also not clear that one sense can be regarded as primary. Perhaps the most important point is that we can quantify newspaper in either the copy or the organizational sense and vagueness of interpretation with respect to the quantification is not possible in such contexts. Thus (52) has the interpretation that three newspapers-as-organizations have been attacked, and some arbitrary number of copies pertaining to each other have been burned.
(52) Three newspapers have been attacked by the opposition and publicly burned by demonstrators.
However, there is no reason within our account why both ambiguity/sense extension and vagueness/constructional polysemy should not be involved, and this would account for the data. Thus for newspaper we assume two structures, one corresponding primarily to the copy and one to the institution. Both of these may be involved in constructional polysemy: the text and parent organization of the newspaper copy is accessible via its qualia, and conversely the copies are accessible from the structure representing the parent organization. Note that no intermediate primary structure corresponding to one edition of a newspaper seems to be justified: three newspapers cannot mean three editions of the same paper, considered as abstract texts, for example. Thus, in this case, the abstract contents of the physical object can only be accessed indirectly.
Thus the account we have developed here is able to capture facts of co-predication in coordinate structures with constructional polysemy and sense extension in so far as the latter is acceptable. In addition, our account makes further predictions regarding the grammaticality of non-constituent coordination in cases of constructional polysemy. We have not considered the interaction of lexical rules of sense extension with indexical and anaphoric pronouns (see Nunberg 1993). It is clear that there are many challenges to be faced here, and the consequent complication of the theory of anaphora must be weighed against the advantages gained here in the succinct characterization of the behaviour of verbs, such as enjoy, which subcategorize for multiple complementation within the
same or highly related senses, and in the capturing of similarities between sense extension and other lexical processes.
6 THE SEMI-PRODUCTIVITY OF LEXICAL RULES
There are several empirical problems with the account of lexical rules we have developed. Some of these problems are shared with other generative accounts of morphological operations (see e.g. Bauer 1983 for extensive discussion), others are more specific to our proposal to account for sense extensions in the same fashion. It is well known that morphological processes tend to be semi-productive and are rarely (if ever) exceptionless (e.g. Bolinger 1975; Aronoff 1976); for instance, the rule of -er nominalization in English creates deverbal nouns which denote the subject of the underlying predicate: typically an agent, as in teacher or thinker, sometimes an instrument, as in (dish)washer or (bottle) opener where the instrumental argument can occur as subject, and occasionally the patient, as in sticker or (best)seller. However, the rule is not fully productive because items such as banker and stationer do not have the predicted meaning, whilst a form like stealer is blocked by thief, though it is more acceptable when its meaning is specialized (and made non-synonymous) with a postmodifier, as in stealer of fast sports cars/hearts. Rappaport & Levin (1990) argue that the agent, instrument, and patient versions of -er suffixation are all rule-governed and that the verbs which undergo the latter are at least partly predictable on the basis that they allow middle formation and thus the promotion to subject of the patient argument (The book sold well). If we assume that subregularities block all regularities and exceptions block regularities, we can account for this pattern of data without problem. The mechanism required to achieve this looks very similar to that which is required to block pig having a meat reading in normal circumstances (Briscoe et al. 1994).
Lexical rules of sense extension, as we have described them, clearly lead to overgeneration. For example, given the sense extension rules for grinding, portioning, and animal-metaphor discussed above, (53a) has the interpretations (53b), (53c), (53d), and (53e).
(53) a. John saw some lambs.
b. John saw some animals.
c. John saw some humans with some lamb-like properties.
d. John saw some portions of lamb meat.
e. John saw some portions of substance derived from humans with some lamb-like properties.
This problem of rules of sense extension feeding further rules is exacerbated by the lack of morphological marking of the change; that is, the fact that these are
rules of conversion rather than derivation. Similar problems arise with uncontroversially 'morphological' conversion and derivation; for example, a generative rule-governed approach would have problems explaining why forms such as unreuntie are not attested. In the literature on lexical rules, this has led to vacillation between interpretations of lexical rules as 'redundancy' statements relating pre-existent entries (e.g. Jackendoff 1975) and as fully productive generative devices creating new entries from existing ones which match their structural description (e.g. Pollard & Sag 1987). Neither approach is fully satisfactory since the former fails to capture the semi-productive nature of these rules and the latter leads to overgeneration. Finally, it is clear that in the case of a sense extension such as grinding there is distinct variability in the application of the rule to lexical items even within a conventionalized subcase, such as meat grinding; thus, lamb, chicken, and haddock are common and established, whilst mole and alligator tail are not. It is also clear that language users are sensitive to such frequency-based judgements concerning the relative novelty of usages. The same issue arises with derivational morphology in that many forms which are predicted by productive derivational rules are not attested; for example, hammerer and nailer can be formed by applying -er nominalization to the 'incorporated' verbs hammer and nail, respectively. However, English speakers are liable to react to these forms in much the same way they would react to mole in the meat sense: with a degree of resistance, but without serious difficulty in interpretation.
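The way sense extension rules feed one another, as in (53), can be illustrated with a small sketch. It is not the LKB formalization: the coarse semantic types ("animal", "human", "substance", "portion"), the rule typing, and the function name derived_chains are assumptions introduced purely for illustration. Treating the rules as fully productive, the sketch enumerates every chain of rule applications up to a chosen depth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    inputs: frozenset  # semantic types the rule accepts (hypothetical typing)
    output: str        # semantic type of the derived sense


# Assumption: grinding applies to any count-denoting type.
COUNT_TYPES = frozenset({"animal", "human", "portion"})

RULES = (
    Rule("animal-metaphor", frozenset({"animal"}), "human"),
    Rule("grinding", COUNT_TYPES, "substance"),
    Rule("portioning", frozenset({"substance"}), "portion"),
)


def derived_chains(base_type, max_depth):
    """Map each chain of rule names (length <= max_depth) to its result type."""
    chains = {(): base_type}
    frontier = [((), base_type)]
    for _ in range(max_depth):
        next_frontier = []
        for chain, sem_type in frontier:
            for rule in RULES:
                if sem_type in rule.inputs:
                    new_chain = chain + (rule.name,)
                    chains[new_chain] = rule.output
                    next_frontier.append((new_chain, rule.output))
        frontier = next_frontier
    return chains


if __name__ == "__main__":
    for chain, sem_type in derived_chains("animal", 3).items():
        print(" > ".join(chain) or "(basic sense)", "->", sem_type)
```

Under these assumptions the empty chain corresponds to (53b), animal-metaphor to (53c), grinding followed by portioning to (53d), and all three rules in sequence to (53e). Because grinding and portioning can feed each other, the set of chains keeps growing as the depth bound is raised, which is the overgeneration at issue.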
Bauer (1983: 71f.), in supporting the view that lexical rules should be treated as fully productive generative rules analogous to those employed in syntactic description, argues that it is this greater 'item-familiarity' of lexical items which allows judgements of relative novelty/conventionality to be built up. He points out that there are simply too many combinatoric possibilities at the sentential level for the frequency of particular combinations to be assessed with any confidence by a language user. However, in the case of words and, we might add, idioms, the range of possibilities, though large, is not so great that judgements of novelty based on frequency of use cannot be acquired. Bauer argues, therefore, that accounting for semi-productivity is an issue of performance, not competence. The frequency with which a given word form is associated with a particular sense (or lexical entry) is often highly skewed; Church (1988) points out that a model of part-of-speech assignment in context will be 90% accurate (for English) if it simply chooses the lexically most frequent part-of-speech for a given word. The incidence of senses of words may well turn out to be similarly skewed. In the absence of other factors, it seems very likely that language users utilise frequency information to resolve indeterminacies in both generation and interpretation. Such a strategy is compatible with and may well underlie the Gricean Maxim of Manner, in that ambiguities in language will be more easily interpretable if there is a tacit agreement not to utilise abnormal or rare
means of conveying particular messages. We can model this aspect of language use as a conditional probability that a word form will be used in a specific sense, that is, is associated with a specific entry (Pr(lexical-entry | word-form)). We assume that such probabilities are acquired for both basic and derived senses (lexical entries) independently of the lexical rules used to create derived senses. Thus we make no claim that a derived sense will necessarily be less frequent than a basic one; in the case of a word such as turkey in English our intuition is that the ground or animal-metaphor senses are more frequent than the basic sense. It might seem that this assumption commits us to a 'full entry' theory of the lexicon (e.g. Aronoff 1976) in which all possible words are present; that is, the consequences of lexical rules are precomputed. In the limit, the full entry theory cannot be correct because of the presence of recursive derivational rules such as re-, anti- or great- prefixation in words such as rereprogram, anti-anti-missile or great-great-grandfather, and in our theory of 'cyclic' rules of sense extension such as portioning and grinding. Instead we adopt an intermediate position in which we claim that basic entries are augmented with a representation of the attested lexical rules which have applied to them and any such derived chains, where both the basic entry and these 'abbreviated' derived entries are associated with a probability.28 For example, a word form such as rabbit might be associated with a basic entry like that illustrated in Figure 18, in which meat grinding is shown to be (hypothetically) more probable than grinding, meat grinding and portioning, or fur/skin grinding. Following Cruse (1986) we might refer to this as the lexeme for rabbit, in the sense that this basic entry encapsulates our knowledge of the (predictable) behaviour of this word-form (though not of its morphological derivatives, such as rabbit-like, and so forth).
Figure 18 Lexeme for rabbit.
The attribute LRS associated with the lexeme for rabbit records which combinations of lexical rules have been attested with what frequency in the experience of the language user.29 If we assume that speakers choose well-attested high-frequency forms to realize particular senses and listeners choose well-attested high-frequency senses when faced with ambiguity, then much of the 'semi-productivity' of lexical rules can be treated as a side-effect of performance. For instance, we would predict that in the 'null' or a neutral context (54a) will be interpreted as rabbit meat, and (54b) will be interpreted as animals.
(54) a. John prefers rabbit.
b. John wants three rabbits.
c. The diners ordered three rabbits.
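As a rough illustration of how an LRS-style record could support this kind of frequency-based choice, the following sketch stores attested rule chains with counts and picks the most frequent attested chain among those compatible with the context. The class name Lexeme, its methods, the rule-chain labels, and all counts are hypothetical; they are not taken from Figure 18 or from the LKB implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    form: str
    # Rule chain (tuple of rule names; () is the basic sense) -> observed count.
    counts: dict = field(default_factory=dict)

    def observe(self, chain=(), n=1):
        """Record n further observations of the sense derived by this chain."""
        self.counts[chain] = self.counts.get(chain, 0) + n

    def probability(self, chain=()):
        """Relative-frequency estimate of Pr(lexical-entry | word-form)."""
        total = sum(self.counts.values())
        return self.counts.get(chain, 0) / total if total else 0.0

    def preferred(self, compatible):
        """Most frequent attested chain among those the context allows."""
        attested = [c for c in compatible if self.counts.get(c, 0) > 0]
        return max(attested, key=self.counts.get) if attested else None


# Hypothetical counts for rabbit, chosen so that meat grinding outranks the
# other derived senses.
rabbit = Lexeme("rabbit")
rabbit.observe((), 60)
rabbit.observe(("meat-grinding",), 25)
rabbit.observe(("meat-grinding", "portioning"), 10)
rabbit.observe(("fur/skin-grinding",), 4)
rabbit.observe(("grinding",), 1)

# Assumed split of senses by the syntactic frame (mass versus count).
MASS_CHAINS = [("meat-grinding",), ("fur/skin-grinding",), ("grinding",)]
COUNT_CHAINS = [(), ("meat-grinding", "portioning")]

if __name__ == "__main__":
    print(rabbit.preferred(MASS_CHAINS))   # mass frame, as in (54a)
    print(rabbit.preferred(COUNT_CHAINS))  # count frame, as in (54b)
```

With these invented counts the mass frame selects the meat-grinding sense and the count frame selects the basic animal sense. Nothing in the sketch models the contextual override needed for (54c); it covers only the neutral-context preference.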
On the other hand, less frequent but attested senses should be chosen when other contextual factors so dictate, as in (54c). In order to specify precisely how this interpretation is preferred, and to formalize the notion of neutral context within this framework, we would need to develop either a thorough-going account of the interaction of lexical probabilities with probabilities associated with specific sentential interpretations, or an account of how probabilities reflecting frequency of usage interact with pragmatic principles establishing discourse coherence (or both). This would take us well beyond the scope of this paper, but see e.g. Wu (1990), Lascarides et al. (forthcoming).
In addition to such lexical probabilities, we also think that probability may play a role in the application of lexical rules in novel usage. Under the current proposal, lexical rules will have something akin to the status of 'redundancy' rules in that they can be used to create appropriate lexical entries on demand for attested senses of a word form, that is, those which have a non-zero probability in the associated lexeme entry. However, in the situation where an interpretation for a novel usage is called for, an assessment of the relative probability of extant lexical rules would provide a means for adopting the most likely 'analogous' interpretation. For instance, interpreting examples such as (55), the listener who had not experienced examples of any variant of grinding with these nouns might choose the rule with the highest probability given the semantic type of the noun.
(55) a. John prefers alligator tail/mole.
b. John prefers chinchilla.
c. John prefers pig.
The probability of a lexical rule might be derived by comparing the number of lexemes to which the rule could apply (i.e. that it unifies with) where the sense is unattested, to those for which it is attested. Since grinding can apply to any count noun but will be attested for very few, whilst meat grinding can only apply to animal denoting nouns and will be attested for a higher proportion, this predicts that (55a) will be interpreted as cases of meat grinding even in a neutral context. Thus, we can account for productive or 'analogical' use of a lexical rule to interpret a novel usage.30 Assuming that the rule of fur/skin grinding is restricted to words denoting animals with fur or 'good' skin we may be able to construct a similar account for the preferred interpretation of (55b). However, the notion of semantic type may need to be more fine-grained than is plausible or desirable in a lexicon if we are to account for all such preferences in this manner, since (55b) shows a preference for fur/skin grinding probably as a result of the salience of fur in distinguishing chinchillas from other types of rodent, rabbit, or cat. Nevertheless, however this is achieved, it is ultimately a fact about the word and associated sense(s) rather than a fact about animals, since it is irrelevant whether, in reality, more chinchilla animals are worn than
eaten. The case of (55c) is different though, since this approach would predict a meat reading on the basis of the greater probability of meat grinding than grinding. However, the preferred interpretation is probably the less specific 'pig-stuff' in a neutral context, because of the blocking of this sense by pork. Thus the generation and interpretation of normally blocked forms (unblocking) seems to require a different type of explanation. Briscoe et al. (1994) proposed to account for cases of pre-emption or blocking by introducing a defeasible notion of lexical rule and allowing the output of such rules to be defeasibly overridden in the case where there was a pre-emption by synonymy or by phonological form. The (pragmatic) principle of blocking introduced case-specific defeasible blocking statements that could be themselves overridden in pragmatically marked contexts to account for the occasional usages of, for example, pig to mean meat with additional affect, and so forth. In this manner, the approach captured Bauer's (1983: 87) insight that blocking is a bar to the institutionalization (in our terms conventionalization) of a meaning rather than an outright ban on its use. In this paper, we have presented a rather different formalization of lexical rules in which the output of the rule itself is not defeasible. From our current perspective, pre-emption by synonymy can be explained simply by assuming that speakers will use higher-frequency forms to convey a given meaning. Thus an extended meaning will not become conventionalized if a common synonym exists. This does not, however, explain the exceptions where blocked forms do occur (except those where the speaker or hearer are unaware of the synonym) nor the effects of their use. The biggest challenge to our current proposal will be to develop an account of the interaction of frequency-based judgements represented as probabilities with default constraints, such as those which allow unblocking.
From the perspective of natural language processing a viable alternative might be to model all such pragmatic phenomena probabilistically, perhaps deriving data on the frequency of predicted senses from large corpora (e.g. Pustejovsky et al. 1993). However, if we wish to limit the role of probabilities to modelling the frequency-based aspects of semi-productivity and develop theoretical accounts of blocking and unblocking and, say, the interaction of frequency-based judgements with contextual factors favouring a low probability sense, then it will be necessary to utilize a non-monotonic logic in which it is possible to reason about probabilities (see e.g. Pearl 1988).
7 CONCLUSION
We have drawn a distinction between some cases of sense modulation and change which we have termed constructional polysemy and sense extension, respectively. This distinction is based on behaviour under co-predication and the traditional distinction between vagueness and ambiguity. We also pointed
out in section 5.2 that in the absence of clear tests some cases remain difficult to classify with respect to this distinction. Both constructional polysemy and sense extension are productive processes which require 'generative' lexical mechanisms, in the sense of Pustejovsky (1991). We have proposed to account for some cases of constructional polysemy utilizing the notion of nominal qualia structure and predicate coercion. We have formalized this account in a constraint-based approach to linguistic description which has been implemented, the LRL/LKB (Copestake 1992, 1993b). We have argued that this approach, unlike those of Briscoe et al. (1990) and Pustejovsky (1993), is capable of capturing many facts of 'co-predication'. However, our account requires extension in order to deal with the cases of non-constituent coordination discussed in section 5.1, in line with other constraint-based approaches to coordination (e.g. Shieber 1992). Furthermore, it needs to be supplemented with a pragmatic account of cohesive co-predication along the lines of Nunberg (forthcoming, this Journal vol. 12.2) as discussed in section 5.
We have argued that sense extensions are semi-productive related sense changes: we cannot simply list all the extended senses in the lexicon, since new 'analogous' cases which will not be listed occur. In addition, there are cross-linguistic exceptions and differences of encoding, conventionalized subcases, and so forth, which all suggest a sign-based, lexical rule account. Nevertheless, sense extensions like other lexical rules of conversion and derivation can be blocked and are applied conservatively. We outlined in section 6 an account of the semi-productivity of lexical rules in terms of a probabilistic performance account of their deployment in language production and interpretation. We have also suggested that this account should be integrated with an independent account of blocking or preemption (Briscoe et al. 1994), but this integration remains to be undertaken.
The LRL/LKB framework has also been used to represent cross-linguistic lexical translation (non-)equivalence (Copestake & Sanfilippo 1993), verbal diathesis alternations (Sanfilippo 1993), and as a target representational framework for the semi-automatic acquisition of lexical entries from machine-readable dictionaries (see papers in Briscoe et al. 1993 and references therein). In future work, we intend to extend the framework to deal more accurately with default aspects of lexical behaviour and with the integration of lexical and pragmatic phenomena.
TED BRISCOE
ANN COPESTAKE
Received: 14.01.94
Revised version received: 03.08.94
Center for Study of Language and Information, Ventura Hall, Stanford University, Stanford, CA 94305-4115, USA
e-mail: [email protected]
IRCS, University of Pennsylvania, 400C, 3401 Walnut Street, Philadelphia, PA 19104-6228, USA
e-mail: [email protected]
NOTES
6
7
8
9
10
ness which accounts for the preferred usage of -ful nominals as measure phrases (e.g. A spoon/spoonful of sugar in a recipe context). Such differences are expected given blocking/pre-emption by syno nymy (e.g. Aronoff 1976: section 6 and below). LKB and LRL are thus something of a misnomer, since the system is not specific to lexical representation, but is also used for syntagmatic description. We assume that the lexicon includes everything which is not completely com positional, that is not regularly composed from the usual meanings, that the com ponents have in isolation. We have assumed here for ease of exposi tion that the consrraint specifications in the type system are all non-defeasible, although this will not be true in general. Type resolution, however, is determined by the indefeasible constraints and there is no notion of a 'default link' in the type hierarchy itself. so the formalization of the type system itself remains very similar. In any event, there would be severe practical problems in consrructing such a system, given that the type system would have to be recompiled each time a lexical entry was added. We are using this as a simple example purely to explain the lexical rule mechan ism, but we would, in fact, propose an animal human rule to allow for (some aspects of) the metaphorical uses of pig , worm , rabbit, and so on. Briscoe et a/. ( 1990) and Godard & Jayez ( 1 993) point out that there are problems with Pustejovsky's technical approach to type coercion relating to co-predication (see section 5 and below). We omit details of this proposal here, which is described most fully in Pustejovsky ( 1 993). We use a linearized form equivalent to the FS representation here for readability.
-+
11
12
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
1 W e would like to thank Geoff Nunberg for many helpful comments on a draft version of this paper, much discussion, and several examples. We have also bene fited from the discussion at presentations of earlier versions of this paper at the Dagstuhl seminar on 'Universals in the Lexicon' (March 1 993) and at the CSLI workshop on 'Ambiguity and Under representation' (September 1 993). We thank two anonymous referees for their helpful comments and suggestions. We take full responsibility for any remaining errors and infelicities. This work was partly supported by the ESPRIT Acqui lex-11, project BR-73 1 s. grant to Cam bridge University. We would also like to express our gratitude to Xerox PARC for providing Ann Copestake with a pleasant and productive working environment while this paper was being written. 2 We use 'conventional' to refer to a sense which is accepted and well attested within a speech community. sometimes this is called 'institutionalized' (e.g. Bauer 1983: 48) or 'established' (e.g. Cruse 1 986: 68). See Clark & Clark (1 979) and Hale & Keyser ( 1 993) for two widely differing views of such denominal verbs. 4 The term 'vagueness' has been used to refer to more general, less specified senses, such as the 'humankind' sense of man , as opposed to the fuzzy peripheral denota tion of cup or game. Cruse ( 1986: 8 1 ) argues that 'generality' would b e more appropriate to the former. We continue to use 'vague' to mean general or unspeci fied in deference to existing usage. The distinction between sense modulation and sense change is similar to Bierwisch's (1982) distinction between conceptual shift and conceptual specification. The semantics of these two processes are not identical: -ful suffixation has an addi tional entailment of fullness or complete-
We will leave some aspects of the representation incomplete where the details are not relevant to our main concerns. For example, we do not specify here how the event variable e should be bound. Simple existential quantification looks unsatisfactory since there seems to be something generic or habitual about fast typist. One possible approach might be to treat the domain of events as having a lattice structure (e.g. Krifka 1987) which would allow us to make the event referred to as fast the composite of subevents of the typist typing (cf. Ojeda 1993 on generic nominals) or perhaps the composite of some contextually salient subevents. Since fast need not be fully distributive, this would not imply that all subevents were fast. But we have not worked out the details of such a treatment since it is not at all obvious how many typing events fast ought to apply to. Most work on generics and habituals makes the assumption that they can be paraphrased using normally or usually, but it is not clear that this is true of fast typist, fast car, etc. It is possible to assert that Bill is a fast typist even if he usually types at 20 words per minute but was observed doing 120 w.p.m. in a competition. An individual car can perhaps be truthfully said to be fast even if it has never been driven above 40 m.p.h. yet, as long as its potential is known. This situation is not peculiar to this class of adjectival modification: John eats snails, for example, can be true even if he has only done so once or twice (cf. Pelletier & Schubert's (1988) comments on Frenchmen eat horsemeat and similar examples).
13 The status of qualia structure in our approach is slightly different to that of Pustejovsky & Boguraev (1993) in that we include qualia structure in the lexical representation of the noun (as a component of an FS in the LKB) and specify type coercion in unification-based terms. However, we also recognize the need for interaction between qualia structure derived stereotypical eventive readings and other pragmatically or contextually determined interpretations. In our account, the stereotypical reading is specified by default as a by-product of the parsing process, but can be overridden pragmatically (see Briscoe et al. 1990; Lascarides et al. forthcoming).
14 We leave the treatment for both fast and enjoy with respect to coordination to section 5 below.
15 This description has been somewhat simplified but in any case we would not claim that it is completely adequate. It does not, for instance, cover the mass use of cloud, found in (9a), which seems to be available only with the default usage (compare (9b)):
(9) a. We flew into dense cloud.
    b. *We walked into dense cloud of smoke.
Nor does it cover the metaphorical uses, such as cloud of suspicion.
16 This makes our approach closest to that of word-based morphology (e.g. Aronoff 1976) but with the possibility of phrasal based operations as well.
17 In fact, there is more to be said on this topic, since it seems plausible that derivational rules are less ambiguous, because of the information about the process conveyed by the affix, and therefore perhaps more fine-grained in the sense modifications they produce. Discussion of such differences, though, would take us outside the scope of this paper.
18 Note that in (11) both grinding and co-composition are required; we assume that grinding of animals to meat creates an artifact which is specified for eventive telic and agentive qualia, leading to a default 'Sam enjoyed eating the lamb' interpretation.
19 Avoidance of ambiguity might apply to sense extension, but not to derivation and it is not obvious how to measure brevity/complexity. In fact, blocking is explicable simply in terms of avoiding obscurity, by which we mean that the speaker will
generally use the form which has highest frequency. At first sight it might seem that this is circular, but note that we are not trying to account for the distribution of the blocked form in the general speech community here, but only for the effects on the individual speaker. Obviously, the choices of individual speakers affect overall frequencies, giving a positive feedback effect in this case. We consider this in more detail in section 6.
20 We defer to section 6 an explanation of why this reading is preferred to one in which Sam is wearing rabbit fur.
21 Some techniques for exploiting parallelism between lexical processes in machine translation are described in Copestake & Sanfilippo (1993).
22 Many adjectives which could normally apply to the animal but which are not usually seen as affecting the meat do not appear in these constructions (??We serve happy/beheaded chicken vs. We serve the meat of happy/beheaded chickens; see Nunberg, this volume). However, we think this is explicable on the basis of general pragmatic principles outlined in section 5 below.
23 It seems relatively easy to become accustomed to metonymic usages after a particular pattern when they recur in some corpus, as though the process were becoming (locally) conventionalized (ham sandwich examples may have this status in the linguistics literature).
24 In this particular case, however, it is by no means obvious that the newspaper that you are reading has to refer to a physical copy of a paper (see below).
25 Zeugma is the traditional term for the variety of anomaly which arises when terms are inappropriately linked (yoked) together, such as in (35):
(35) He was wearing a scarf, a pair of boots, and a look of considerable embarrassment. (Cruse 1986: 13)
26 We will also assume, for the moment, that the type of the conjoined phrase is underspecified. Technically, this raises a problem analogous to that affecting conjunction in HPSG (Pollard & Sag, in press), since the type could not be fully resolved, although, in this particular case, it is possible to define a more complex type system which avoids this situation.
27 The main reason why we have maintained the distinction between qualia structure and the rest of the sign here is to avoid making the representations unnecessarily theory dependent. Within HPSG, for example, there are a variety of ways in which the qualia structure might be incorporated into the semantic representation, which would affect the way in which the qualia structure of the conjunct was derived. Qualia could be regarded as part of the BACKGROUND (that is, as presuppositional rather than truth conditional) or even be located on the INDEX (Pollard & Sag, in press). These options would carry different implications as to how the qualia should be combined in conjoined phrases. The only essential point here is that the interpretation of examples like fast typists and computers where fast distributes over the conjuncts requires that the qualia structure of the conjuncts should still be individually accessible in the phrase.
28 Modulo the probabilistic interpretation, this manner of encoding the (non-)application of a lexical rule has been deployed in many theories, e.g. Flickinger & Nerbonne (1992) and Sanfilippo (1993) in recent accounts of verbal diathesis alternations.
29 It is plausible to imagine that language users are able to memorize some estimate of the relative frequency with which a word form and sense occur, though it is unlikely that this process is accurate enough to derive probabilities. Nevertheless, probability theory offers a precise and well-understood theory within which such intuitions can be formalized.
30 Note that this account has little to say about the conditions under which novel uses will be created, so we will need a further pragmatic theory of the factors licensing novel usage and of the possibility of such usage becoming conventionalized (see e.g. Bauer 1983). It might be possible to account for the acquisition of lexical rules in terms of a post hoc process of generalization between 'basic' and 'derived' entries at some point when the productivity of the putative rule reached some probabilistic threshold.
REFERENCES

Alshawi, H. (ed.) (1992), The Core Language Engine, MIT Press, Cambridge, MA.
Apresjan, Ju D. (1973), Regular Polysemy, Mouton, The Hague, The Netherlands.
Aronoff, M. (1976), Word Formation in Generative Grammar, Linguistic Inquiry Monograph 1, MIT Press, Cambridge, MA.
Atkins, B. T. (1990), 'Lexical rules: a starter pack', MS, Oxford University Press.
Atkins, B. T. & B. Levin (1992), 'Admitting impediments', in U. Zernik (ed.), Lexical Acquisition: Using On-line Resources to Build a Lexicon, Lawrence Erlbaum, New Jersey.
Baker, M. C. (1988), Incorporation: A Theory of Grammatical Function Changing, University of Chicago Press, Chicago.
Bauer, L. (1983), English Word-Formation, Cambridge University Press, Cambridge.
Bierwisch, M. (1982), 'Formal and lexical semantics', Linguistische Berichte, 80, 3-17.
Bolinger, D. L. (1975), Aspects of Language, Harcourt, Brace & Jovanovich, New York.
Briscoe, E. J. & A. Copestake (1991), 'Sense extensions as lexical rules', Proceedings of the IJCAI Workshop on Computational Approaches to Non-Literal Language, Sydney, Australia, 12-20.
Briscoe, E. J., A. Copestake & B. Boguraev (1990), 'Enjoy the paper: lexical semantics via lexicology', Proceedings of the 13th International Conference on Computational Linguistics (COLING-90), Helsinki, 42-7.
Briscoe, E. J., A. Copestake & A. Lascarides (1994, in press), 'Blocking', in P. St. Dizier & E. Viegas (eds), Computational Lexical Semantics, Cambridge University Press, Cambridge.
Briscoe, E. J., A. Copestake & V. de Paiva (eds) (1993), Inheritance, Defaults and the Lexicon, Cambridge University Press, Cambridge.
Carpenter, R. (1992), The Logic of Typed Feature Structures, Cambridge University Press, Cambridge.
Church, K. (1988), 'A stochastic parts program and noun phrase parser for unrestricted text', Proceedings of the Second Conference on Applied Natural Language Processing (ANLP-88), Austin, Texas, 136-43.
Clark, E. V. & H. H. Clark (1979), 'When nouns surface as verbs', Language, 55, 767-811.
Cooper, R. P. (1991), 'Coordination in Unification-Based Grammars', Proceedings of the 5th Conference of the European Chapter of the Association for Computational Linguistics (EACL-91), Berlin, 167-72.
Copestake, A. (1992), 'The representation of lexical semantic information', doctoral dissertation, University of Sussex, Cognitive Science Research Paper CSRP 280.
Copestake, A. (1993a), 'Defaults in lexical representation', in E. J. Briscoe, A. Copestake & V. de Paiva (eds), Inheritance, Defaults and the Lexicon, Cambridge University Press, Cambridge, 223-45.
Copestake, A. (1993b), 'The Compleat LKB', ACQUILEX-II Deliverable, 3.1.
Copestake, A. & E. J. Briscoe (1992), 'Lexical operations in a unification based framework', in J. Pustejovsky & S. Bergler (eds), Lexical Semantics and Knowledge Representation, Proceedings of the First SIGLEX Workshop, Berkeley, CA, Springer-Verlag, Berlin, 101-19.
Copestake, A. & A. Sanfilippo (1993), 'Multilingual lexical representation', Proceedings of the AAAI Spring Symposium on Building Lexicons for Machine Translation, Stanford, CA.
Cruse, D. A. (1986), Lexical Semantics, Cambridge University Press, Cambridge.
van Deemter, K. (1990), 'The ambiguous logic of Ambiguity', Proceedings of the First CLIN Meeting, Utrecht, The Netherlands, 17-32.
Fauconnier, G. (1985), Mental Spaces: Aspects of Meaning Construction in Natural Language, MIT Press, Cambridge, MA.
Flickinger, D. & J. Nerbonne (1992), 'Inheritance and complementation: a case study of easy adjectives and related nouns', Computational Linguistics, 18, 3, 269-310.
Godard, D. & J. Jayez (1993), 'Towards a proper treatment of coercion phenomena', Proceedings of the Sixth Conference of the European Chapter of the Association for Computational Linguistics (EACL-93), Utrecht, The Netherlands, 167-77.
Hale, K. & S. J. Keyser (1993), 'On argument structure and the lexical expressions of syntactic relations', in K. Hale & S. J. Keyser (eds), The View from Building 20: Essays in Honor of Sylvain Bromberger, MIT Press, Cambridge, MA, 53-110.
Hobbs, J. R., M. Stickel, D. Appelt & P. Martin (1990), 'Interpretation as abduction', Technical Note No. 499, Artificial Intelligence Center, SRI International, Menlo Park, CA.
Jackendoff, R. (1975), 'Morphological and semantic regularities in the lexicon', Language, 51, 3, 639-71.
Krieger, H-U. & J. Nerbonne (1993), 'Feature-based inheritance networks for computational lexicons', in E. J. Briscoe, A. Copestake & V. de Paiva (eds), Inheritance, Defaults and the Lexicon, Cambridge University Press, Cambridge, 90-137.
Krifka, M. (1987), 'Nominal reference and temporal constitution: towards a semantics of quantity', Proceedings of the 6th Amsterdam Colloquium, University of Amsterdam, 153-73.
Lakoff, G. (1987), Women, Fire, and Dangerous Things: What Categories Reveal about the Mind, University of Chicago Press, Chicago.
Lakoff, G. & M. Johnson (1990), Metaphors We Live By, University of Chicago Press, Chicago.
Lascarides, A., E. J. Briscoe, N. Asher & A. Copestake (forthcoming), 'Persistent associative default unification', ACQUILEX Working Paper.
Levin, B. (1993), Towards a Lexical Organization of English Verbs, Chicago University Press, Chicago.
Link, G. (1983), 'The logical analysis of plurals and mass terms: a lattice-theoretical approach', in R. Bauerle, C. Schwartze & A. von Stechow (eds), Meaning, Use and Interpretation of Language, de Gruyter, Berlin, 302-23.
Martin, J. (1990), A Computational Model of Metaphor Interpretation, Academic Press, Cambridge, MA.
Nunberg, G. D. (1978), 'The pragmatics of reference', doctoral dissertation, CUNY Graduate Center, reproduced by the Indiana University Linguistics Club.
Nunberg, G. D. (1979), 'The non-uniqueness of semantic solutions: polysemy', Linguistics and Philosophy, 3, 145-84.
Nunberg, G. D. (1993), 'On the meaning and interpretation of lexical expressions', Linguistics and Philosophy, 16, 1-43.
Nunberg, G. D. & A. Zaenen (1992), 'Systematic polysemy in lexicology and lexicography', Proceedings of Euralex 92, Tampere, Finland.
Ojeda, A. (1993), Linguistic Individuals, CSLI Lecture Notes 31, CSLI and University of Chicago Press.
Ostler, N. & B. T. Atkins (1992), 'Predictable meaning shift: some linguistic properties of lexical implication rules', in J. Pustejovsky & S. Bergler (eds), Lexical Semantics and Knowledge Representation, Proceedings of the First SIGLEX Workshop, Berkeley, CA, Springer-Verlag, Berlin, 87-100.
Partee, B. (1992), 'Syntactic categories and semantic type', in M. Rosner & R. Johnson (eds), Computational Linguistics and Formal Semantics, Cambridge University Press, Cambridge, 97-126.
Pearl, J. (1988), Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, San Mateo, CA.
Pelletier, F. J. & L. K. Schubert (1988), 'Problems in the representation of the logical form of generics, plurals, and mass nouns', in LePore (ed.), New Directions in Semantics, Academic Press, London, 385-451.
Pelletier, F. J. & L. K. Schubert (1989), 'Mass expressions', in D. Gabbay & F. Guenthner (eds), Handbook of Philosophical Logic, Vol. IV: Topics in the Philosophy of Language, Reidel, Dordrecht, 327-407.
Pollard, C. & I. Sag (1987), An Information-Based Approach to Syntax and Semantics: Vol. 1: Fundamentals, CSLI Lecture Notes 13, Stanford, CA.
Pollard, C. & I. Sag (in press), Head-driven Phrase Structure Grammar, Chicago University Press, Chicago.
Pustejovsky, J. (1991), 'The generative lexicon', Computational Linguistics, 17, 4, 409-41.
Pustejovsky, J. (1993), 'Type coercion and lexical selection', in J. Pustejovsky (ed.), Semantics and the Lexicon, Kluwer, Dordrecht, 73-96.
Pustejovsky, J. (1994, in press), 'Linguistic constraints on type coercion', in P. St. Dizier & E. Viegas (eds), Computational Lexical Semantics, Cambridge University Press, Cambridge.
Pustejovsky, J. & B. Boguraev (1993), 'Lexical knowledge representation and natural language processing', Artificial Intelligence, 63, 193-223.
Rappaport, M. & B. Levin (1990), '-er nominals: implications for the theory of argument structure', in E. Wehrli & T. Stowell (eds), Syntax and the Lexicon, Syntax and Semantics 26, Academic Press, New York.
Riehemann, S. (1993), 'Word formation in lexical type hierarchies', M.Phil. thesis, University of Tübingen, Germany.
Sag, I., G. Gazdar, T. Wasow & S. Weisler (1985), 'Coordination and how to distinguish categories', Natural Language and Linguistic Theory, 3, 117-71.
Sanfilippo, A. (1993), 'LKB encoding of lexical knowledge from machine-readable dictionaries', in E. J. Briscoe, A. Copestake & V. de Paiva (eds), Inheritance, Defaults and the Lexicon, Cambridge University Press, Cambridge, 190-222.
Shieber, S. M. (1992), Constraint-based Grammar Formalisms, MIT Press, Cambridge, MA.
Soler, C. & M. A. Marti (1993), 'Dealing with lexical mismatches', ACQUILEX working paper no. 2.4.
Sweetser, E. (1990), From Etymology to Pragmatics, Cambridge University Press, Cambridge.
Wu, D. (1990), 'Probabilistic unification based integration of syntactic and semantic preferences for nominal compounds', Proceedings of the 13th International Conference on Computational Linguistics (COLING-90), Helsinki, 413-18.
Young, M. & W. Rounds (1993), 'A logical semantics for non-monotonic sorts', Proceedings of the 31st Conference of the Association for Computational Linguistics (ACL-93), Columbus, Ohio, 209-15.
Journal of Semantics 12: 69-108
© Oxford University Press 1995

Lexical Disambiguation in a Discourse Context

NICHOLAS ASHER
IRIT, Université Paul Sabatier and Department of Philosophy, University of Texas, Austin

ALEX LASCARIDES
Department of Linguistics, Stanford University
Abstract

In this paper we investigate how discourse structure affects the meanings of words, and how the meanings of words affect discourse structure. We integrate three ingredients: a theory of discourse structure called SDRT, which represents discourse in terms of rhetorical relations that glue together the propositions introduced by the text segments; an accompanying theory of discourse attachment called DICE, which computes which rhetorical relations hold between the constituents, on the basis of the reader's background information; and a formal language for specifying the lexical knowledge (both syntactic and semantic) called the LKB. Through this integration, we can model the information flow from words to discourse, and discourse to words. From words to discourse, we show how the LKB permits the rules for computing rhetorical relations in DICE to be generalized and simplified, so that a single law applies to several semantically related lexical items. From discourse to words, we encode two novel heuristics for lexical disambiguation: disambiguate words so that discourse incoherence is avoided, and disambiguate words so that rhetorical connections are reinforced. These heuristics enable us to tackle several cases of lexical disambiguation that have until now been outside the scope of theories of lexical processing.

1 INTRODUCTION

How is discourse information used to take lexical decisions, and lexical information used to take discourse decisions? In this paper, we observe data that illustrate the information flow between the semantics of words and the structure of discourse. From discourse to words, we illustrate how constraints on coherent discourse determine lexical sense disambiguation. From words to discourse, we illustrate how the meanings of words affect discourse structure. We go on to explain how a formal theory of discourse interpretation can be augmented with a theory of lexical semantics, so that this information flow can be modelled.
2 THE PUZZLES

2.1 The presentational problem

Consider the disambiguation of bar in text (1):

(1) a. The judge asked where the defendant was.
    b. The barrister apologised, and said he was at the pub across the street.
    c. The court bailiff found him slumped underneath the bar.
A theory of lexical processing that takes only domain information into account cannot handle this, because information about the structure of discourse is needed in order to resolve the lexical ambiguity. To see this, first consider the Sentence Scenario, where we interpret (1c) in isolation from the discourse context and with the anaphor resolved: in other words, the sentence The court bailiff found the defendant slumped underneath the bar. In this context, bar would be interpreted as the courtroom bar. This is derived in part from the domain knowledge and the word associations of the courtroom bar with defendants and bailiffs. Now consider the Discourse Scenario: the interpretation of (1c) in the discourse context provided by (1a, b). Now, bar refers to the drinking establishment, which is why the continuation (1d) is better than (1d').

(1) d. He took him to get coffee before returning to the courtroom.
    d'. ?He took him out of the courtroom to get coffee.

This disambiguation must take place because of information in the discourse. But what information? Obviously, domain knowledge still plays an important part in the Discourse Scenario. In addition, we propose that the way this domain information is presented in the discourse also has a critical effect. For consider how (1) is interpreted. Via domain knowledge, one infers that the defendant is not in a courtroom when the events described in (1a, b) occur. Now we must interpret (1c) in this context. To maintain discourse coherence, we must calculate how the meaning of (1c) is connected to the meaning of the preceding discourse. One way in which one can do this is to work out the rhetorical relation (such as Explanation, Background, Narration, Contrast, and Evidence) which connects the meaning of the segments of text (Hobbs 1985; Thompson & Mann 1988; Mann & Thompson 1987; Asher 1993a). The only candidate discourse relation for connecting (1c) to (1a, b) is Narration (Dahlgren 1988; Lascarides & Asher 1991), thus making (1) a narrative story with events described in their temporal order. An alternative interpretation would have required further information in this context; for example, an Explanation or Evidence relation could have been marked by placing (1c) in the pluperfect, but they cannot be inferred on the basis of the information as it stands.

Let us first suppose that (1c) is indeed connected to (1a, b) with Narration. This imposes a spatial constraint. Simplistically put, the constraint is as follows: unless there is explicit information in the constituents which, together with other background information, makes one believe that an actor in the text moves between the end of the eventuality in the first sentence and the start of the eventuality in the second, then the actor is stationary between these two time points. So, because the defendant is not in the courtroom when the events in (1a, b) occur, he is not in the courtroom at the start of the finding event either. But find is an achievement verb, without temporal extent. So the start and end of the finding refer to one and the same time and place. So this entails that the defendant is not in the courtroom when he is found. But by the predicate argument structure of find, the defendant is at the bar when he is found. So the bar referred to cannot be in the courtroom. This precludes interpreting bar as having its courtroom sense.

Alternatively, let us suppose that we use the domain information to disambiguate bar that was used in the Sentence Scenario. That is, we make domain knowledge about defendants and bailiffs determine the interpretation of bar in (1c), and thus assign its courtroom sense. Then the defendant would be in the courtroom. This violates Narration's spatial constraint, and would therefore preclude (1c) from forming a narrative with (1a, b). But linking the segments with Narration was the only way of maintaining discourse coherence. So using this domain knowledge here ultimately results in discourse incoherence. The fact that we do not interpret (1) as incoherent leads one to conclude that we use the following strategy when disambiguating words: avoid disambiguation that results in discourse incoherence.

So in the Discourse Scenario, we do not exploit the domain information that we used in the Sentence Scenario. Rather, we use the spatial constraints on narratives to eliminate the courtroom bar as a possibility, and so assign its pub sense. Note that we can even remove the word association between pub and the pub bar without affecting the result: replacing (1b) with (1b') preserves the pub bar interpretation in (1c).

(1) b'. The barrister apologised, and said he was talking to his family across the street.

There are four things to observe here. First, the above explanation hinges on two things: the constraints on coherent discourse that rhetorical relations impose; and a general heuristic for avoiding disambiguation that leads to discourse incoherence. The constraints on coherent discourse make crucial use of the particular rhetorical relations featured. The spatial and temporal constraints for a narrative are different from those for a contrastive discourse for example. So changing (1) to a contrastive discourse yields different meanings for bar, as is illustrated in (2), where now (1d') is a better continuation than (1d) is.

(2) The judge asked where the defendant was. His barrister apologised, and said he was at the pub across the street. But in fact, the court bailiff found him slumped underneath the bar.
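The spatial reasoning behind the Narration reading of (1), and its absence under Contrast in (2), can be sketched in code. This is a minimal illustration rather than the formalisation the paper goes on to develop: the sense labels, the location labels, and the names `SENSE_LOCATION`, `Context`, `violates_narration`, and `coherent_senses` are all assumptions introduced here. The sketch encodes only the simplified constraint stated in the text (an actor stays put between two narrated eventualities unless movement is described) and assumes, for illustration, that Contrast imposes no spatial constraint.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sense inventory: each sense of "bar" is paired with the
# place in which an object of that kind is located.
SENSE_LOCATION = {
    "bar/courtroom": "courtroom",
    "bar/pub": "pub",
}


@dataclass(frozen=True)
class Context:
    """What the preceding discourse establishes about one actor."""
    actor_location: str       # where the actor is when the context ends
    movement_described: bool  # does the text describe the actor moving?


def violates_narration(sense: str, ctx: Context) -> bool:
    """Simplified spatial constraint on Narration.

    Unless movement is described, the actor is still where the context
    left him, so a sense that places him elsewhere is ruled out.
    """
    if ctx.movement_described:
        return False
    return SENSE_LOCATION[sense] != ctx.actor_location


def coherent_senses(relation: str, ctx: Context) -> List[str]:
    """Return the senses that keep the discourse coherent under `relation`."""
    if relation == "Narration":
        return [s for s in SENSE_LOCATION if not violates_narration(s, ctx)]
    # Assumption of this sketch: other relations, such as Contrast,
    # impose no spatial constraint, so no sense is filtered out.
    return list(SENSE_LOCATION)


# (1a, b): the defendant is at the pub and no movement is described.
context_1ab = Context(actor_location="pub", movement_described=False)
narrative_reading = coherent_senses("Narration", context_1ab)
contrastive_reading = coherent_senses("Contrast", context_1ab)
```

Under these assumptions, `narrative_reading` keeps only the pub sense, which mirrors the Avoid Discourse Incoherence strategy. `contrastive_reading` keeps both senses; the sketch does not model how the contrast in (2) then favours the courtroom sense.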
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
Thus the disambiguation in ( 1) is ultimately driven by the rhetorical structure of the discourse. It illustrates a presentational problem: we must model how lexical disambiguation is affected by the way information is presented in the discourse context. The second thing to note is that rhetorical relations between sentences are not always linguistically marked. Sometimes they are inferred pragmatically, using a wide variety ofinfonnation: knowledge about syntax, semantic content, Gricean-style pragmatic maxims, the domain, and communicative intentions all play a part in inferring how segments of text should be connected together (Hobbs 1 98 5). Therefore, to model the information flow between discourse processing and lexical processing, we must use a computationally tractable reasoning mechanism that models how rhetorical relations are pragmatically inferred. The third thing to note is that the knowledge resources recruited in calculating the meaning of bar in (I) give conflicting messages, and the conflict is ultimately resolved. The two items of knowledge that apply and conflict are on the one hand the defeasible domain knowledge about bailiffs and defendants that is used to disambiguate bar in the Sentence Scenario, and on the other the defeasible knowledge that the discourse is narrative. The former knowledge entails that the defendant is in the courtroom when he is found, while via Narration 's spatial constraints the latter entails that he is not. The latter knowledge is favoured in disambiguation. We must explain why the conflict is resolved in this way. The fourth thing to note is that the above examples go beyond what current theories on lexical processing have attempted. Some techniques have been developed for resolving the ambiguity of words in context (e.g. Wilks 1 975; Boguraev 1 979; Hirst 1 987; Alshawi 1 992). 
Several theories of word meaning have addressed how pragmatic factors, like world knowledge, affect dis ambiguation (Wilks 1 975; Hayes 1 977; Schank & Abelson 1 977; Hobbs et a/. 1 990; Charniak 1 98 3; Wilensky 1 98 3, 1 990; Wilks et a/ . 1 988; Guthrie et a/. I 99I ; McRoy I 992). But this work has not attempted to tackle texts like (I) because one needs more than domain knowledge to explain them; one needs knowledge about rhetorial relations too. These theories present various techniques for modelling how domain information determines lexical disambiguation. Charniak (I98 3), for example,
Nicholas Asher and Alex Lascarides
73
( 1 ) a. The judge asked where the defendant was. b. The barrister apologised and said he was at the pub across the street. (3) But suddenly, his whereabouts were revealed by a loud snore. ( I ) c. The court bailiff found him slumped underneath at the bar.
The second technique is to assign a higher weight to associations between words that appear closer together in the text. This wrongly predicts that bar in ( 1 ) should have the courtroom sense, because court and bailiff are in closer proximity to bar in ( 1 ) than pub . The third technique is the one Guthrie et a/. ( 1 99 1 ) adopt, which is to favour the word sense with the most associations in the text. This technique would also disambiguate bar in ( 1 ) to its courtroom sense, because there are more word associations for this sense (from court , baili.Jf, judge, barrister, and defendant), than there are for the pub sense (from pub ). The problem with these techniques for resolving conflicting word associations is that they ignore the way the information is structured in the context. Obviously, modelling word associations along the lines suggested in
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
uses marker passing techniques to etablish word associations between items like bailiff and bar. More recently, Guthrie et a/. ( 1 99 1 ) use statistical techniques to tackle the same problem. They construct neighbourhoods for word senses, which contain the words that occur most frequently with that word sense in the definitions in a machine-readable dictionary. The idea is that bailiff is more likely to be in the neighbourhood of the courtroom bar than the drinking establishment bar. And one can use neighbourhoods to derive disambiguation: the word sense that has the largest intersection of its neighbourhood with the . words in the text wins. These techniques can sometimes predict the wrong results, for at least two main reasons. First, the frequencies of word co-occurrences are small, even in large corpora, so that the statistical models of disambiguation built from these frequencies can be unreliable. Second, statistical models of disambiguation using word association do not always handle conflicting word associations in the right manner. For example in ( 1 ), there are two conflicting word associations-one from pub and one from court , etc.-which mark different senses of bar. In Guthrie et a I .'s ( I 99 I ) terms, pub is in the neighbourhood of the pub sense of bar, whereas court is in the neighbourhood of the courtroom sense of bar. Given the statistical flavour of these lexical techniques, there are three ways of resolving this conflict. The first is to favour the word associations that occur more frequently in the corpus. But this will not work in general. Even if the association between pub and bar is stronger, thereby predicting the right interpretation of bar in ( 1 ), it would fail to predict that the meaning of bar is changed when (3) is inserted between ( I b) and ( I c).
Lexical Disambiguation in a Discourse Context
2.2 Strengthening rhetorical relations
Rhetorical relations also affect disambiguation in (4a, c), (4b, c), and (4a, d), but in a different way from that illustrated in (1).
the literature is an essential part of lexical processing. But in addition, we need to model how lexical processing is influenced by the rhetorical structure of the discourse. So we come to the following conclusions about how to solve the Presentational Problem. First, current theories on the pragmatics of lexical processing need to be brought together with theories on discourse interpretation that calculate how rhetorical relations are inferred, and the constraints they impose on coherent discourse. Then, disambiguation strategies like Avoid Discourse Incoherence can be represented. Second, we require a precise and computationally feasible account of how conflict among the various knowledge resources recruited during lexical processing is resolved. Now, many computational linguists model discourse by linking units of the discourse with rhetorical relations (e.g. Hobbs 1985; Hobbs et al. 1990; Lascarides & Asher 1991, 1993; Thompson & Mann 1988; Mann & Thompson 1987; Scha & Polanyi 1988; Hovy 1990; Moore & Paris 1989). However, with the exception of Hobbs et al. (1990) and Lascarides & Asher (1993), these theories do not attempt to resolve conflict among the knowledge resources during interpretation. They therefore do not supply the kind of inferential framework we need to handle conflicting knowledge resources during disambiguation. TACITUS (Hobbs et al. 1990) tackles conflict resolution by assigning weights to predicates and guiding inferences so that the conclusions inferred have the least weight. But there are no general principles behind the assignment of weights, making their account of conflict resolution unsatisfactory. Hobbs et al. (1990: 46) point out that extending the system by adding new knowledge resources requires extensive revisions to the existing representation, involving many hours of manual retuning of the weights on the target material. It is difficult to see what a solution to this retuning problem would be.
Ideally, conflict resolution should be modelled by a sound logical consequence relation. For if this is achieved, then adding new knowledge resources to the theory is straightforward. The logic will predict how the new knowledge resource interacts with the existing ones, and the interactions among the existing knowledge resources will remain the same. So no retuning will be required. This is essentially the approach adopted by Lascarides & Asher (1991, 1993). They use a logic that can resolve conflict, and exploit this to model how the various pieces of background knowledge contribute to discourse interpretation. Because the logic predicts conflict resolution, they do not have to assign weights to predicates.
Nicholas Asher and Alex Lascarides
(4) a. The EC has been acting decisively lately.
    b. The EC has been running meetings on time lately.
    c. Last night's meeting came to a conclusion by 8pm.
    d. But last night's meeting came to a conclusion before any significant matters were discussed.
(4) a. The EC has been acting decisively lately.
    e. Last night's meeting came to an agreement by 8pm.
    f. Last night's meeting came to an end by 8pm.

So the discourse is coherent regardless of how the word is disambiguated. Rather, having inferred which rhetorical relation holds between the sentences, the reader makes decisions about which sense of conclusion reinforces that rhetorical relation. So we have pinpointed a further disambiguation strategy involving rhetorical relations: disambiguate so as to reinforce rhetorical connections.
In (4a, c) the word conclusion corresponds to agreement; in (4b, c) it means end. It is plausible to assume that this lexical disambiguation is driven by the interpreter H's knowledge of what the author/speaker S is trying to do in the discourse; i.e. to present (4c) as Evidence for (4a) (or (4b)). H infers this rhetorical relation from the dispositional reading of (4a) (and (4b)), primed by the fact that they are generic, vs. the fact that (4c) refers to a particular event. Having inferred that the rhetorical connection is Evidence, H can use this to disambiguate conclusion. In (4a, c), (4c) provides better evidential support for (4a) if conclusion is interpreted as agreement rather than end; for H knows that meetings can come to an end without any decisions being made. But in (4b, c), (4c) provides better evidential support if conclusion means end. Finally, in contrast to (4a, c), conclusion means end rather than agreement in (4a, d), because S is trying to do something different in the discourse: he is contrasting two propositions, rather than providing a relation of evidential support between them. Although both (1) and (4) use discourse information to drive lexical disambiguation, the ways in which they do so are different. In (1), we claim that only the pub sense of bar will preserve discourse coherence. A disambiguation strategy of Avoid Discourse Incoherence is being followed. In contrast, the lexical choice in (4a, c) is not driven by the need to avoid discourse incoherence, because an Evidence relation is supported, whatever the interpretation of conclusion. This is illustrated by the fact that (4a, e) and (4a, f) are both acceptable cases of Evidence.
2.3 Lexical information and discourse decisions
Integrating lexical processing and discourse processing is not only of benefit to lexical interpretation. The theory of discourse interpretation benefits too. Without a representation of lexical knowledge, a theory of discourse attachment is unaware of the semantic concepts that underlie lexical entries. Consequently, it misses generalizations in interpretation across discourses that use semantically similar lexical entries (e.g. Lascarides & Asher 1991). For example, in attaching the second sentence to the first in (5), one uses causal preferences about how fall and push should be connected together. Without a theory of lexical knowledge, the rule in the domain knowledge that represents this causal preference must use the actual predicates fall and push, so that this rule applies when interpreting (5). But this misses a generalization, because we would then need separate laws of a similar nature for each text in (6), in spite of their semantic similarity.
(6) a. Max fell. John shoved him.
    b. Max tripped. John shoved him.
    c. Max stumbled. John bumped into him.

This makes it difficult to encode the rules for discourse attachment in a systematic way, because the rules must be specific to the lexical entries present. By introducing a theory of lexical knowledge of an appropriate kind, we could generalize the discourse attachment laws. As long as we encode in our lexical knowledge base that push, shove, and bump are verbs that describe forces which can cause the movement of the patient, and fall, trip, and stumble are verbs that describe movement of their subject, we can represent the information we need to do discourse attachment in (5) and (6) with just one law: when trying to attach two constituents together, where the former describes movement, and the latter describes the application of a force that can cause movement to that same individual, then normally, Explanation is the preferred discourse relation. Theories of discourse attachment at present either provide an unsatisfactory account of how the knowledge resources interact during interpretation (e.g. Hobbs 1985, Hobbs et al. 1990); or they fail to integrate lexical and discourse processing so that the laws for discourse attachment capture intuitive generalizations (e.g. Lascarides & Asher 1991, 1993). We aim to solve these problems.
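The generalization can be sketched in a few lines: once verbs are mapped to coarse semantic classes, one default covers (5) and (6a-c). This is a deliberately crude illustration. The dictionary `LEXICON`, the class labels, and the function `preferred_relation` are assumptions made for the example, and the fallback to Narration merely stands in for the default treated later in the paper.

```python
# Hypothetical lexical knowledge base: each verb is mapped to a coarse class.
LEXICON = {
    "fall": "movement", "trip": "movement", "stumble": "movement",
    "push": "force", "shove": "force", "bump": "force",
}


def preferred_relation(first_verb, second_verb, same_individual=True):
    """One general default: a movement followed by a force applied to the same
    individual normally yields Explanation; otherwise assume Narration."""
    if (same_individual
            and LEXICON.get(first_verb) == "movement"
            and LEXICON.get(second_verb) == "force"):
        return "Explanation"
    return "Narration"


# Main verbs of texts (5) and (6a-c).
TEXTS = {
    "(5)": ("fall", "push"),
    "(6a)": ("fall", "shove"),
    "(6b)": ("trip", "shove"),
    "(6c)": ("stumble", "bump"),
}

results = {label: preferred_relation(v1, v2) for label, (v1, v2) in TEXTS.items()}
print(results)
```

The rule never mentions a particular verb, so adding a new movement or force verb only requires a new lexicon entry, not a new attachment law.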
(5) Max fell. John pushed him.
3 STARTING POINT
In order to place a theory of lexical disambiguation into a discourse context, we require three ingredients. First, we require a theory of discourse structure, which stipulates how rhetorical relations affect the structure of discourse, the constraints they impose on coherent discourse, and the semantic effects they have on the constituents they relate. Second, we require an accompanying theory of discourse attachment, which models how pragmatic knowledge resources are used to infer which rhetorical relations hold between two given discourse constituents. Finally, we need a formal language for representing lexical information, such as push is a verb that describes the application of a force that can result in movement of the patient, or the courtroom sense of bar is an object in a courtroom at which a barrister addresses the member of the court. These three ingredients must be mixed together in a unified account of NL interpretation, which makes precise the above accounts of how information flows between words and discourse. As we have mentioned, there are several theories of discourse structure available that use rhetorical relations: e.g. Thompson & Mann's (1987) Rhetorical Structure Theory, Hobbs et al.'s (1990) TACITUS, Scha & Polanyi's (1988) Linguistic Discourse Model (LDM), and Asher's (1993a) Segmented Discourse Representation Theory (SDRT). Only LDM and SDRT provide a model theoretic interpretation of the representations of discourse structure. And only SDRT accounts for the semantic effects that rhetorical relations have on the constituents being related. Consequently, SDRT is the only theory that can calculate the impact that the coherence constraints of the various rhetorical relations have on the semantics of the constituents between which those rhetorical relations obtain.
Furthermore, only TACITUS and SDRT come in tandem with a theory of discourse attachment, which computes which rhetorical relations underlie the text, given the reader's background knowledge. We therefore will use SDRT as our discourse structure ingredient. And the theory of discourse attachment that accompanies SDRT is called Discourse In Common Sense Entailment (DICE) (Lascarides & Asher 1991, 1993). This will be the second ingredient we use. DICE utilizes a logic called Commonsense Entailment (CE) (Asher & Morreau 1991), which is designed to handle reasoning with conflicting knowledge resources. As we have mentioned, it refines the tools used in TACITUS, in that it supplies a logical consequence relation for resolving conflict among the knowledge resources during discourse interpretation. DICE can explain how linguistic strings can be interpreted differently in different discourse contexts. But it is not equipped with a theory of lexical knowledge, and so as it stands it cannot model the reasoning that underlies the lexical disambiguations in (1) and (4).
3.1 A description of SDRT and DICE
SDRT is a semantically based theory of discourse structure (Asher 1993a). This theory extends Kamp's (1981) Discourse Representation Theory (DRT) to represent the rhetorical relations that hold between the propositions introduced in a text. SDRT takes the basic building blocks of discourse structure to be propositions with a dynamic content, which are represented as DRSs, the representation scheme in DRT. A simple discourse structure consists of DRSs related by discourse relations, like Narration, Background, and Evidence, among others. More generally, an NL text is represented by a recursive structure called a segmented DRS (or SDRS). An SDRS is a pair of sets containing respectively: the DRSs or SDRSs representing respectively sentences or text segments, and discourse relations between them. These structures are constructed in a dynamic, incremental fashion. The default assumption is that the sentence
For the third ingredient, a language for representing lexical information, several candidates exist. Arguably from our point of view, the most promising of these is Copestake & Briscoe's (1991) Lexical Knowledge Base (LKB). They use typed feature structures (FSs) to represent lexical information. The semantic type hierarchy, and the accompanying subsumption relation, represent information like cheese is a subtype of FOOD, or a trans-verb is a subtype of verb. One reason why using FSs is advantageous from our perspective is that through exploiting re-entrancy we gain a tight interface between syntax and semantics. This is an essential requirement if we are to explore interactions between the syntactic and semantic knowledge resources recruited during text processing. The subsumption relation on the type hierarchy will also prove useful at the discourse level, for it represents in a succinct way the syntactic and semantic proximity of words. This will enable us to generalize the laws for discourse interpretation. We can build a general multi-model for the language of feature structures $\mathcal{L}_{fs}$ (Blackburn 1992), and the language of CE $\mathcal{L}_{>}$ (where $>$ is the default conditional connective): viz. a model for $\mathcal{L}_{(fs,>)}$. Nevertheless, and this is important, our formulation countenances no interaction between the CE logic of $>$ (and a fortiori the nonmonotonic consequence relation $|\!\approx$) on the one hand, and Blackburn's modal operators on the other. So from the perspective of DICE, we can translate $\mathcal{L}_{(fs,>)}$ into $\mathcal{L}_{>}$ by treating each feature structure description, and each statement about subsumption relations, as atomic formulae of $\mathcal{L}_{>}$. The semantics of $>$, its logic, and the properties of the nonmonotonic consequence relation $|\!\approx$ are as discussed in Lascarides & Asher (1993), or more generally (for the predicate case) as in Asher & Morreau (1991). We now describe SDRT, DICE, and the LKB in more detail.
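A type hierarchy with a subsumption test can be sketched very compactly. The sketch below is not the LKB's implementation: the table `SUPERTYPES`, the extra type `salmon`, and the function `subsumes` are hypothetical names, and only the two subtype facts mentioned above (cheese under food, trans-verb under verb) are taken from the text.

```python
# Minimal type hierarchy: each type lists its immediate supertypes.
# The table is illustrative; a real lexical knowledge base is far richer.
SUPERTYPES = {
    "food": [],
    "cheese": ["food"],
    "salmon": ["food"],
    "verb": [],
    "trans-verb": ["verb"],
}


def subsumes(general, specific):
    """True if `specific` is `general` itself or a (transitive) subtype of it."""
    if general == specific:
        return True
    return any(subsumes(general, parent) for parent in SUPERTYPES.get(specific, []))


print(subsumes("food", "cheese"))      # cheese is a subtype of food
print(subsumes("verb", "trans-verb"))  # trans-verb is a subtype of verb
print(subsumes("verb", "cheese"))      # unrelated types
```

Treating each such subsumption statement as an atomic fact is what allows a discourse law to be stated over a general type instead of over individual words.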
boundary marks the unit of information to be attached to the SDRS for the preceding discourse. Discourse relations modelled after those proposed by Hobbs (1985) and Thompson & Mann (1988) link together the constituents of an SDRS. We will use seven discourse relations: Narration, Elaboration, Explanation, Background, Evidence, Consequence, and Contrast. The first four of these constrain temporal structure: Narration entails that the descriptive order of events matches their temporal order; Explanation and Elaboration entail that they mismatch; and Background entails temporal overlap. Narration and Elaboration also constrain spatial structure (Asher 1993b). Further details of the semantics of these relations, and NL examples, can be found in (Hobbs 1985; Thompson & Mann 1987; Asher 1993; Lascarides & Asher 1993). The recursive nature of SDRSs gives discourse structures a hierarchical configuration. The subordinating relations are Elaboration and Explanation, and the constituents to which new information can attach are a subset of the so-called open constituents, which are a subset of those constituents on the right frontier of the discourse structure (cf. Polanyi 1985; Grosz & Sidner 1986; Webber 1991), assuming that it is built in a depth first, left to right manner. SDRT specifies where in the preceding discourse structure the proposition introduced by the current sentence can attach with a discourse relation. DICE (Lascarides & Asher 1991, 1993; Lascarides & Oberlander 1992, 1993) is a formal theory of discourse attachment, which provides the means to infer from the reader's knowledge resources which discourse relation should be used to do attachment. Here, we assume the reader's knowledge base (KB) contains: the SDRS for the text so far; the logical form of the current sentence; an assumption that the current sentence must attach at an open site (i.e. the text is coherent); all defeasible and indefeasible world and pragmatic knowledge; and the laws of logic. Lascarides & Asher (1991) argue that the rules introduced below are manifestations of Gricean-style pragmatic maxims and world knowledge, and, as we have just mentioned, they form part of the reader's KB. A formal notation makes clear both the logical structures of these rules, and the problems involved in calculating rhetorical relations. Let $\langle \tau, \alpha, \beta \rangle$ be the update function, which means 'the representation $\tau$ of the text so far, of which $\alpha$ is an open node, is to be updated with the representation $\beta$ of the current sentence via a discourse relation with $\alpha$'. Let $\alpha \Downarrow \beta$ mean that $\alpha$ is a topic for $\beta$; let $e_\alpha$ be a term referring to the main eventuality described by $\alpha$; and let $fall(e_\alpha, m)$ mean that this event is a Max falling. Let $e_1 \prec e_2$ mean the eventuality $e_1$ precedes $e_2$, and $cause(e_1, e_2)$ mean $e_1$ causes $e_2$. Finally, we represent the defeasible connective as a conditional $>$ (so $\phi > \psi$ means 'if $\phi$, then normally $\psi$'). The maxims for modelling implicature are then represented as schemas:¹
• Narration: $\langle \tau, \alpha, \beta \rangle > Narration(\alpha, \beta)$
• Axiom on Narration: $\Box(Narration(\alpha, \beta) \rightarrow e_\alpha \prec e_\beta)$
• Background: $(\langle \tau, \alpha, \beta \rangle \wedge state(e_\beta)) > Background(\alpha, \beta)$
• Axiom on Background: $\Box(Background(\alpha, \beta) \rightarrow overlap(e_\alpha, e_\beta))$
• Push Explanation Law: $(\langle \tau, \alpha, \beta \rangle \wedge fall(e_\alpha, m) \wedge push(e_\beta, j, m)) > Explanation(\alpha, \beta)$
• Axiom on Explanation: $\Box(Explanation(\alpha, \beta) \rightarrow cause(e_\beta, e_\alpha))$
• Causes Precede Effects: $\Box(cause(e_1, e_2) \rightarrow \neg\, e_2 \prec e_1)$

The rules for Narration and its Axiom convey the pragmatic effects of the textual order of events; by default, textual order mirrors temporal order. Background and its Axiom convey the pragmatic effects derived from aktionsart (states normally provide background information). The Push Explanation Law is a mixture of world knowledge (WK) and linguistic knowledge (LK). Given that $\beta$ is to be attached to $\alpha$ with a discourse relation, the events they describe must be connected somehow. Given the kinds of events that they are, the reader normally concludes that the pushing caused the falling. Therefore, the pushing explained why Max fell. Finally, Causes Precede Effects is indefeasible world knowledge. In fact, in some cases these rules are slightly modified versions of the rules in Lascarides & Asher (1991, 1993). We have made modifications to simplify the inferences underlying discourse attachment. The complexities we ignore have no bearing on the information flow between words and discourse, which is our main concern in this paper. We also have laws relating the discourse structure to the topic structure (Lascarides & Asher 1993). A Common Topic for Narrative states that any constituent related by Narration must have a distinct, common (and perhaps implicit) topic, and Topic for Elaboration states that the elaborated constituent is the topic:

• A Common Topic for Narrative: $\Box(Narration(\alpha, \beta) \rightarrow (\exists \gamma)(\gamma \Downarrow \alpha \wedge \gamma \Downarrow \beta) \wedge \neg(\alpha \Downarrow \beta) \wedge \neg(\beta \Downarrow \alpha))$
• Topic for Elaboration: $\Box(Elaboration(\alpha, \beta) \rightarrow \alpha \Downarrow \beta)$

The logic on which DICE rests is Asher & Morreau's (1991) Commonsense Entailment (CE). Motivation for choosing CE over other candidate nonmonotonic logics is discussed in detail in Lascarides & Asher (1993). Three patterns of nonmonotonic inference are particularly relevant. The first is Defeasible Modus Ponens: if one default rule has its antecedent verified, then the consequent is nonmonotonically inferred. The second is the Penguin Principle: if there are conflicting default rules that apply (conflicting in the sense that the consequents cannot all hold in a consistent KB), and the antecedents to these default rules are in logical entailment relations, then the consequent of the rule
(7) Max stood up. John greeted him.

($\alpha$) $standup(e_1, m)$
     $hold(e_1, t_1)$
     $t_1 \prec now$

($\beta$) $greet(e_2, j, m)$
     $hold(e_2, t_2)$
     $t_2 \prec now$
The only rule that applies is Narration, and its consequent is inferred via Defeasible Modus Ponens. Hence by Narration's Axiom, the standing up precedes the greeting. By contrast in text (5), the KB verifies the antecedents to two conflicting defeasible laws: Narration and the Push Explanation Law ($Narration(\alpha, \beta)$ and $Explanation(\alpha, \beta)$ cannot both hold given their Axioms).
(5) Max fell. John pushed him.
By the Penguin Principle, the Push Explanation Law wins, because its antecedent entails Narration's. Hence its consequent, Explanation, is inferred.

3.2 Typed feature structures for lexical information
As mentioned, we require a formal language for representing lexical information. We use typed FSs, similar to those described in Carpenter (1992). As shown in Copestake & Briscoe (1991), such a language allows a tight interface between syntax and semantics, and the lexicon and interpreter.
with the most specific antecedent is nonmonotonically inferred. The third is the Nixon Diamond: if there are conflicting default rules that apply but no logical relations between the antecedents, then no conclusions are inferred. Two further features of CE are essential to DICE. First, all the monotonic inferences from the premises are retrieved before embarking on the nonmonotonic inferences. The second feature is irrelevance: each one of these inference patterns also holds when information that is irrelevant to the predicates involved in the pattern is added to the premises (see Morreau (1992) for a precise definition; Lascarides & Asher (1993: appendix) have also shown the role of irrelevance in DICE). Let us illustrate how DICE works by means of two simple examples. In interpreting (7), the KB contains $\alpha$, $\beta$ and $\langle \alpha, \alpha, \beta \rangle$, where $\alpha$ and $\beta$ are respectively the logical forms of the first and second sentences.
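The three inference patterns can be imitated with a toy resolver. This is a sketch under strong simplifying assumptions, not an implementation of CE: antecedents are sets of atomic facts, "entails" is approximated by proper set inclusion, and the names `Default`, `CONFLICTS`, and `infer` are invented here. The Quaker/Republican rules are the textbook illustration of the Nixon Diamond and are not taken from this paper.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set


class Default(NamedTuple):
    """A defeasible rule: if all antecedent atoms hold, normally the consequent holds."""
    name: str
    antecedent: FrozenSet[str]
    consequent: str


# Pairs of consequents assumed to be jointly inconsistent (illustrative only).
CONFLICTS = {
    frozenset({"Narration", "Explanation"}),
    frozenset({"Pacifist", "NotPacifist"}),
}


def in_conflict(a: Default, b: Default) -> bool:
    return frozenset({a.consequent, b.consequent}) in CONFLICTS


def infer(facts: Iterable[str], defaults: Iterable[Default]) -> Set[str]:
    known = frozenset(facts)
    applicable = [d for d in defaults if d.antecedent <= known]
    conclusions = set()
    for d in applicable:
        rivals = [r for r in applicable if r is not d and in_conflict(d, r)]
        # No rivals: Defeasible Modus Ponens. All rivals strictly less
        # specific: Penguin Principle. Otherwise (Nixon Diamond): no conclusion.
        if all(d.antecedent > r.antecedent for r in rivals):
            conclusions.add(d.consequent)
    return conclusions


NARRATION = Default("Narration", frozenset({"attach"}), "Narration")
PUSH_EXPLANATION = Default(
    "Push Explanation Law", frozenset({"attach", "fall", "push"}), "Explanation")
QUAKER = Default("Quaker", frozenset({"quaker"}), "Pacifist")
REPUBLICAN = Default("Republican", frozenset({"republican"}), "NotPacifist")

text_7 = infer({"attach", "standup", "greet"}, [NARRATION, PUSH_EXPLANATION])
text_5 = infer({"attach", "fall", "push"}, [NARRATION, PUSH_EXPLANATION])
nixon = infer({"quaker", "republican"}, [QUAKER, REPUBLICAN])
print(text_7, text_5, nixon)
```

Here the atom `attach` stands for the update condition on the two constituents. Text (7) triggers only Narration, text (5) triggers both laws and the more specific one wins, and the Nixon case yields no conclusion at all.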
Features are represented by words in italics and types by words in bold. The type hierarchy is a partial ordering defined by the type subsumption relation $\sqsubseteq$. Subtypes inherit all the properties of all their supertypes. So, for example, if $salmon \sqsubseteq food$ (meaning salmon is a type of food), then salmon has all the properties specified by the type constraints on food, plus perhaps some more properties of its own. We return to this shortly. The models for typed FSs correspond to directed acyclic graphs (DAGs). One can define the semantics of typed FSs by thinking of features as modal operators that label arcs between the nodes of a DAG, types as propositions holding at nodes, and constraints on types as conditionals (cf. Blackburn 1992). As we have explained, since there is no interaction between Blackburn's models for FSs and those for $>$, we can think of Blackburn's WFFs for defining FSs as atomic WFFs on the language of DICE. Our semantic framework is based on DRT, which defines the accessibility constraints on anaphora in discourse, as well as truth-conditional content. Because we exploit this property of DRT in SDRT and DICE, we augment the semantic component of Copestake & Briscoe's lexical entries by replacing the predicate logic formulae with DRSs. These will yield the logical forms of sentences as DRSs, and consequently we can maintain DRT's explanation of anaphora resolution in discourse. We will assume, in line with Pustejovsky (1991), that the lexical entries that describe causes and their effects specify in which dimension this takes place. The dimensions of causation we use stem from Aristotle. There are four of them: Location, Form, Matter, and Intention. Locative (LOC) causes are efficient causes. They include movement, for example, such as pushing Max. Formal (FORM) causes create and destroy objects: for example, building a house, or eating a sandwich.
Matter (MATTER) causes change the shape, size, matter, or colour of an object: for example, treading on a cake, or painting a block red. Finally, Intentional (INT) causes change the propositional attitudes of individuals: for example, persuading John. It is by no means clear that these should be universal categories of a KB, or the only categories. But they do for our purposes here. Using notation similar to Sanfilippo (1992), we distinguish between those verbs that describe change, and those that cause it. So loc(change(e, y)) means that the event e describes a change in the individual y in the dimension LOC: e.g. fall(e, y) will make loc(change(e, y)) true.² And loc(cause-change-force(e, x, y)) means that x, through action e, causes a change in the LOC forces acting on y. For example, under one of its senses, push(e, x, y) will make loc(cause-change-force(e, x, y)) true. Consider the LOC sense of push. This sense of pushing involves the application of a force through contact between the agent and patient. And depending on the strength of the force and the type of object, this force can result in the patient moving; that is, the patient can undergo a LOC change. In
line with Sanfilippo (1992), we assume that this sense of push appears in the semantic type hierarchy at the meet of the types force and contact, and so would inherit features from both. The LOC sense of push is represented in the diagram.³

causal-transverb
  orth : push
  cat  : transverb-cat
         subj : [ sem : [ agent : [1] ]
                  syn : [ cat : np ] ]
         obj  : [ sem : [ patient : [2] ]
                  syn : [ cat : np ] ]
  sem  : contact-force
         qs   : [ loc : cause-change-force([3], [1], [2]) ]
         form : [1] x, [2] y, [3] e, t
                push(e, x, y)
                hold(e, t)

This FS relates syntactic elements to semantic ones via re-entrancy. Whatever fills the agent and patient positions in the syntax must also fill respectively the arguments in semantics for the individual who applies the force through contact, and the individual affected by this. In the semantic component, we see that push is classified as a contact-force verb; in other words it is a verb describing force through contact. qs stands for qualia structure, and the value of this feature supplies information about the dimension of causation (if any), the telic roles (i.e. the purpose) of objects, the form of objects, and their agentive roles (cf. Pustejovsky 1991; Copestake & Briscoe 1991, this volume). From the fact that push describes a LOC force through contact, we'll learn from the type hierarchy that the pushing action causes a change in the LOC forces acting on the
(8) loc : cause-change-force([3], [1], [2])
(9) int : encourage([3], [1], [2], p)

We assume a hierarchy of types ensures that the resultant feature structure is well typed; in particular, the type of the semantic component of the FS will be the more general type force, rather than contact-force. Or to put it another way, some semantically ambiguous words form a type hierarchy, such that each sense is a subtype of the ambiguous word. Any DAG that satisfies the LOC sense of push will also satisfy the feature structure that represents the ambiguity between the LOC and INT sense of push. When translating the NL verb push into logical form, we use the feature structure that represents the ambiguous sense until the word can be disambiguated by other information. Given the syntactic and semantic information in (10), only the LOC or INT senses of push will produce a well formed feature structure representing (10).

(10) John pushed Max.

We assume that the grammar will produce a feature structure where the logical form of (10) is the standard DRT representation (10'), that DICE used in Lascarides & Asher (1991).⁵

(10') e, x, y, t
      john(x)
      max(y)
      push(e, x, y)
      hold(e, t)
      $t \prec now$
patient y, thereby making loc(cause-change-force(e, x, y)) true, as we require. Finally, the value of the feature form is a DRS, and via the sentence grammar this is used to build the DRS representing the logical form of any sentence containing the word push. Feature structures can describe partial information, in the sense that more than one DAG may satisfy the feature structure. We can use this to represent semantic ambiguity. For example, John pushed Max can mean John applies a force to Max which, if large enough, will cause Max to move (push's LOC sense), or that he encouraged Max to p, where p is some proposition (push's INT sense).⁴ The semantic ambiguity between the LOC and INT senses of push is represented by a feature structure that looks just like that given above, save that (8) is removed from the qs, and replaced with an FS that is a supertype of the FSs in (8) and (9):
A grammar such as HPSG (Pollard & Sag 1994) or LFG (Kaplan & Bresnan 1982) could be used here. We assume at this stage that the word push has not been disambiguated, because the intra-sentential information in (10) alone does not suffice to do this. Other information will be needed, for example the preceding discourse context. However, the typed FSs represent more information than just the DRS (10'): for example, that push involves a LOC or INT force where the subject John is the agent and the object Max is the patient. This additional information can be exploited when reasoning about discourse attachment, since the typed FS is assumed to form part of the reader's KB. How modular are the representations of lexical and discourse information? The rules for NL interpretation in TACITUS (Hobbs et al. 1990) are non-modular, in that a single rule can represent lexical information, world knowledge, and linguistic knowledge as inextricably linked together. Consequently, there are no lexical entries in TACITUS; lexical information and more general background information are not represented in separate modules. In DICE, the rules for NL interpretation are also represented so that the various knowledge resources seem to be inextricably linked. But in contrast to TACITUS, lexical knowledge is added to DICE via a separate mechanism: typed FSs. We will put information into a lexical entry only if it is useful for what have been thought of as lexical processes, such as coercion, metonymy (Pustejovsky 1991), and sense extensions (Copestake & Briscoe 1991, this volume). The logic will permit complex interactions between the lexical and discourse modules, and, through this, integration among knowledge resources will be achieved.
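Leaving push underspecified until context supplies more information can be pictured as keeping a set of candidate senses and narrowing it later. The sketch below is a simplification: it replaces the supertype feature structure with an explicit candidate set, and the names `PUSH_SENSES` and `narrow` are hypothetical.

```python
# Each sense of "push" records the causal dimension of the force it describes
# and the relation placed in its qualia structure, following (8) and (9).
PUSH_SENSES = {
    "push/LOC": {"dimension": "loc", "relation": "cause-change-force"},
    "push/INT": {"dimension": "int", "relation": "encourage"},
}


def narrow(candidates, dimension=None):
    """Keep the senses compatible with what is known about the dimension.
    With no information, every sense remains a candidate."""
    if dimension is None:
        return set(candidates)
    return {name for name, fs in candidates.items() if fs["dimension"] == dimension}


# From sentence (10) alone, nothing decides between the two senses.
before_context = narrow(PUSH_SENSES)
# Suppose the discourse context later requires a LOC force.
after_context = narrow(PUSH_SENSES, dimension="loc")
print(sorted(before_context), sorted(after_context))
```

In the typed FS setting the same effect comes from the type hierarchy: the ambiguous entry is a supertype, and added information selects one of its subtypes.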
4 FROM WORDS TO DISCOURSE

4.1 Lexical semantics in service to explanations
We have extended DICE with typed FSs which represent lexical information. This extension can be used to generalize laws like the Push Explanation Law.

• Push Explanation Law: $(\langle \tau, \alpha, \beta \rangle \wedge fall(e_\alpha, m) \wedge push(e_\beta, j, m)) > Explanation(\alpha, \beta)$
This law was used to interpret (5). But it would not apply when interpreting the texts in (6), in spite of the fact that they are so closely related in meaning.

(5) Max fell. John pushed him.
(6) a. Max fell. John shoved him.
    b. Max tripped. John shoved him.
    c. Max stumbled. John bumped into him.
Our classification of the dimensions of causation in the lexical entries has utility at the level of discourse attachment, and can overcome this limitation in DICE. First, we re-encode causation's effect on discourse attachment, so that a single, new Explanation law applies to the texts in (5) and (6):

• Explanation: $(\langle \tau, \alpha, \beta \rangle \wedge n(change(e_\alpha, y)) \wedge n(cause\text{-}change\text{-}force(e_\beta, x, y))) > Explanation(\alpha, \beta)$
( 1 1 ) The cake fell off the table. Max trod on it.
Explanation will not apply to ( 1 1 ) because the causal dimensions of the verbs mismatch. And so one will infer ( 1 1 ) is narrative. Thus, using causal dimensions in the representation of lexical information is crucial if lexical processing is to influence discourse attachment in appropriate ways. How do we ensure that lexical knowledge is specified in such a way that the antecedent to Explanation is verified when doing discourse attachment in (5)? There is a further problem that must be addressed if the above story is to work. As we have already mentioned, the word push is ambiguous in the sentence
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
In words, the above law expresses a mixture of pragmatic knowledge and lexical knowledge about causation. It states that if f3 is to be attached to a with a discourse relation, and the event ea (which is introduced in the lexical entry of the main verb in a ) is a change in the individual y along the causal dimension n, and ep is an action brought about by x which changes the forces along dimension n that act on y, then normally f3 explains a (and so by Explanation's Axiom, ep caused ea , thereby leading to the inference that the n force on y was sufficient to cause the n change in y described by ea).6 Now suppose that we specify the lexical entries offall , stumble , trip , push , shove , and bump so that the Explanation Law applies when attaching the sentences in (s) and (6). This is guaranteed if these lexical entries have an rqs that looks like that of the we sense of push defined above. Then by the Penguin Principle, Explanation will be inferred in each of these texts, the conflicting laws being Explanation and Narration . By encoding lexical knowledge in such a way that the antecedent of the above Explanation Law is verified, we simplify the theory of discourse attachment. A single Explanation law for discourse attachment applies to all the above texts, rather than requiring a separate law for each lexical entry. It is essential that this new version of the Explanation law specify the causal dimensions of change. This is because we must ensure that a relation of causation is inferred only if the changes to the individual that are described in the sentences to be attached are compatible. We do not, for example, wish to infer a relation of causation between treading on a cake, which causes deformation of the cake, to the cake falling, which describes movement:
John pushed Max between its LOC and INT senses. But in the discourse context (5), push must be disambiguated to its LOC sense for the Explanation Law to apply.

How does this disambiguation take place? Intuitively, when the context fails to provide a proposition p that can be used to interpret John pushed Max as John encouraged Max to p, and when the context fails to tell you that the agent and patient cannot be in physical contact with each other, then the LOC meaning of push is preferred. No such proposition appears in (5), and there is no inference that John and Max cannot possibly be in contact with each other. So push is assigned the LOC sense. Thus Explanation applies.

How are we to formalize this reasoning process? In CE the nonmonotonic consequence relation is $\mid\!\approx$. So if the context provides an appropriate proposition p such that John encouraged Max to p holds, then the following holds: $\langle \tau, \alpha, \beta \rangle, \tau, \beta \mid\!\approx (\exists p)\,\mathit{encourage}(e, x, y, p)$.7 Similarly, if the context provides information which leads to the inference that x and y cannot possibly be in contact with each other, then the following holds: $\langle \tau, \alpha, \beta \rangle, \tau, \beta \mid\!\approx \neg\mathit{contact}(x, y)$. CE is a logic that can represent the $\mid\!\approx$ relation directly in the object language (Asher 1993c). The relation is represented as a nested conditional with the premises in the antecedent, and the conclusion in the consequent. We gloss over which formulae appear in between the antecedent and consequent in the nested conditional, since they are complex, and irrelevant to our purposes here. We simply refer to the nested conditionals that encode the above two $\mid\!\approx$ relations as $\Pi_{int}$ and $\Pi_{dist}$ respectively. So the law below captures the intuition that the LOC sense of push is chosen in (10), unless the discourse context provides the appropriate proposition p such that John encouraged Max to p, or the discourse context leads to the inference that John and Max cannot be in physical contact.

• Locative Push: $\Box(((\beta \rightarrow \mathit{push}(e, x, y)) \wedge \neg\Pi_{int} \wedge \neg\Pi_{dist}) \rightarrow \mathit{loc}(\mathit{cause\text{-}change\text{-}force}(e, x, y)))$

Note that we assume that the semantics of $\mathit{loc}(\mathit{cause\text{-}change\text{-}force}(e, x, y))$ is such that this rule does not entail that Max actually moves. Rather, it entails that the force applied to him is one that, if of sufficient strength, would cause him to move. We also assume that the metaphoric uses of John pushed Max block the rule from firing by making $\Pi_{dist}$ true. Now consider how the lexical preference stated in Locative Push will interact with the other laws in DICE, in the analysis of (5). The logical forms of the sentences in (5) are respectively $\alpha$ and $\beta$.

($\alpha$) $[\,x, e_1, t_1 \mid \mathit{max}(x),\ \mathit{fall}(e_1, x),\ \mathit{hold}(e_1, t_1),\ t_1 \prec \mathit{now}\,]$
($\beta$) $[\,y, e_2, t_2 \mid \mathit{john}(y),\ \mathit{push}(e_2, y, x),\ \mathit{hold}(e_2, t_2),\ t_2 \prec \mathit{now}\,]$
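The interplay between Locative Push and the generalized Explanation law can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the two lexicon tables, the dimension labels `"loc"` and `"deform"`, the `Context` class standing in for the nested conditionals $\Pi_{int}$ and $\Pi_{dist}$, and the function names. It is not the CE logic itself, only a procedural reading of the two antecedents.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexicon: verb -> causal dimension. The ambiguous verb "push"
# is deliberately absent; its dimension is fixed only by Locative Push.
CHANGE = {"fall": "loc", "trip": "loc", "stumble": "loc"}
CAUSE_CHANGE_FORCE = {"shove": "loc", "bump_into": "loc", "tread_on": "deform"}


@dataclass
class Context:
    encouraged_proposition: Optional[str] = None  # stands in for Pi_int
    contact_impossible: bool = False              # stands in for Pi_dist


def locative_push(ctx: Context) -> Optional[str]:
    """Indefeasible Locative Push: choose LOC unless a listed exception holds."""
    if ctx.encouraged_proposition is None and not ctx.contact_impossible:
        return "loc"
    return None  # the law is silent in this context


def force_dimension(verb: str, ctx: Context) -> Optional[str]:
    if verb == "push":
        return locative_push(ctx)
    return CAUSE_CHANGE_FORCE.get(verb)


def explanation_applies(first_verb: str, second_verb: str, ctx: Context) -> bool:
    """Antecedent of the generalized Explanation law: the change described
    first and the force described second lie on the same causal dimension."""
    change = CHANGE.get(first_verb)
    force = force_dimension(second_verb, ctx)
    return change is not None and change == force


neutral = Context()
print(explanation_applies("fall", "push", neutral))          # text (5)
print(explanation_applies("stumble", "bump_into", neutral))  # text (6c)
print(explanation_applies("fall", "tread_on", neutral))      # text (11)
print(explanation_applies("fall", "push",
                          Context(encouraged_proposition="jump")))
```

In this sketch the exceptions are checked before any default reasoning would start, which mirrors the ordering of monotonic before nonmonotonic inference in CE.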
But $\alpha$ and $\beta$ together do not provide a proposition p such that we can infer $\mathit{encourage}(e_2, x, y, p)$. There is no inference from the discourse context to $\mathit{encourage}(e_2, x, y, p)$, because the candidate propositions $\mathit{john}(y)$, $\mathit{hold}(e_1, t_1)$, $\mathit{fall}(e_1, x)$ are not of the appropriate kind; and none of the lexical entries in $\alpha$ or $\beta$ features the semantic type encourage (at least, not unambiguously). So $\neg\Pi_{int}$ holds. Moreover, $\alpha$ and $\beta$ do not allow us to infer that John and Max cannot be in contact with each other. So $\neg\Pi_{dist}$ also holds. Hence Locative Push's antecedent is verified. This is an indefeasible law, and in CE monotonic reasoning takes place before nonmonotonic reasoning. Therefore its consequent, that push is assigned its LOC sense, is inferred in DICE. Consequently, in the nonmonotonic component of DICE, the laws for discourse attachment that apply are Explanation and Narration. And by the Penguin Principle, $\mathit{Explanation}(\alpha, \beta)$ is inferred.

We have represented the preference for the LOC sense of push as an indefeasible law, with the exception to this preference explicitly represented in the antecedent, the exception being that there is a proposition p in the discourse context such that x encouraged y to p, or that x and y cannot be in contact. This appears to go against the current trend in lexical processing for using defeasible reasoning as an abbreviatory mechanism and allowing one to delete the exception statement (cf. Daelemans et al. 1992). But such an abbreviatory strategy may be dangerous when applied here. For suppose that we stated the law as defeasible and deleted the exception statement as shown below:

• Defeasible Version of Locative Push: $\mathit{push}(e, x, y) > \mathit{loc}(\mathit{cause\text{-}change\text{-}force}(e, x, y))$

Then we would fail to infer in the monotonic component of DICE that push is assigned the LOC sense. The laws that apply in the nonmonotonic component would then be Narration, and the defeasible version of the Locative Push Law. But there is conflict between them, because $\mathit{Narration}(\alpha, \beta)$ and $\mathit{Explanation}(\alpha, \beta)$ are inconsistent, and the latter is inferrable if the defeasible version of the Locative Push Law and Explanation fire in turn. But the antecedents of Narration and the Locative Push Law are unrelated. Consequently, the conflict is irresolvable, and no discourse relation between the sentences in (5) would be inferred, thus predicting, contrary to intuition, that the text is incoherent.

If we assume that lexical and discourse information are modelled by the same logic of defaults, and that lexical and discourse processing can be interleaved, we obtain the above undesirable interactions. We could avoid this by either assuming that lexical information and discourse information are modelled by two different logics of defaults, or by assuming that lexical processing is always done before discourse processing, so that the conclusion of the Defeasible Version of Locative Push is inferred before discourse attachment
is attempted. We do not want to make either of these assumptions, however, because this would preclude doing justice to the complex interactions between lexical and discourse processing. In particular, we would fail to formalize the resolution of conflict between the lexical and discourse information that we illustrated in our informal analysis of text (1).

We were able to present the lexical preference for push in (10) via an indefeasible law, because the exceptions to the preference can be explicitly stated. This contrasts with defaults for discourse attachment, where exceptions to preferences cannot be exhaustively listed. If exceptions to lexical preferences can be listed, then default statements should be avoided, or else spurious knowledge conflicts will occur.

The above framework for specifying the semantics of lexical entries ensures a straightforward interaction with the rules for discourse attachment in DICE. Inferences about causation at the lexical level help us to infer the rhetorical relation Explanation at the textual level. The analyses of the texts in (6) will be similar to that of (5), assuming that the lexical entries are similar to fall and push. So through lexical processing, we have achieved a generalization in DICE about how causation affects discourse attachment.

4.2 Lexical semantics in service to elaborations

A similar problem to DICE's original treatment of explanations arises in elaborations. Consider the following example, taken from Lascarides & Asher (1991):

(12) a. Guy enjoyed a lovely meal.
     b. He ate salmon.
     c. He devoured lots of cheese.

Lascarides & Asher (1991) use the following rules to infer that (12b) and (12c) elaborate (12a): they capture the intuition that if Guy eating a meal and Guy eating salmon (or cheese) are to be connected, then a part/whole relation is preferred, and that this in turn normally yields Elaboration:

• The Salmon Law: $\langle \tau, \alpha, \beta \rangle \wedge \mathit{eat}(e_\alpha, g, \mathit{meal}) \wedge \mathit{eat}(e_\beta, g, \mathit{salmon}) > \mathit{part}(e_\beta, e_\alpha)$
• The Cheese Law: $\langle \tau, \alpha, \beta \rangle \wedge \mathit{eat}(e_\alpha, g, \mathit{meal}) \wedge \mathit{eat}(e_\beta, g, \mathit{cheese}) > \mathit{part}(e_\beta, e_\alpha)$
• Elaboration: $\langle \tau, \alpha, \beta \rangle \wedge \mathit{part}(e_\beta, e_\alpha) > \mathit{Elaboration}(\alpha, \beta)$

These laws allow one to infer that the text is an elaboration. But this analysis misses a generalization: it is the fact that salmon and cheese are subtypes of food, which in turn constitutes the substance of the meal, that permits us to infer that (12b) and (12c) elaborate (12a). Needing two separate laws, the Cheese and Salmon Laws, to analyse the above fails to capture this generalization.
We have seen how specifying causal information in lexical semantics permitted generalization at the textual level for explanations. Here we see how subsumption relations on semantic types in lexical entries permit generalization for elaborations. (12a) is a sentence where logical metonymy takes place, as described in Briscoe et al. (1990). One must ensure that the object of enjoy is of the right type, namely, an event. Using techniques described in Pustejovsky (1991) we can assume that the meal in (12a) is coerced into the event eating a meal, because this is the telic role or purpose of a meal. Coercing meal to eating the meal is a default, and hence further discourse information could override this preference (cf. Copestake & Briscoe, this volume). We gloss over this here, however. So the relevant FSs for building the logical forms of (12a) and (12b) are respectively as shown in the diagrams below.8

event-noun
[ orth: meal
  sem:  [ eventstr: event [ agent:   [1] animate
                            patient: [2] food
                            process: [3] eat ]
          form:     [ [1] x, [2] y, [3] e | eat(e, x, y) ] ] ]

edible-fish-substance $\sqsubseteq$ food

lexical-mass-noun
[ orth: salmon
  syn:  [ cat:   noun
          count: - ]
  sem:  edible-fish-substance [ x | salmon(x) ] ]

In words, meal refers to an event of an animate individual eating food. Salmon is a mass noun and edible-fish-substance is a subtype of food.9
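As a rough illustration of the kind of information the lexical entries for meal and salmon carry, the sketch below encodes them as nested Python dictionaries together with a small type hierarchy. The dictionary layout, the `SUPERTYPE` table, the top type `"entity"` and the helper `is_subtype` are assumptions made for this example; only the attribute names and the link from edible-fish-substance to food come from the text.

```python
from typing import Optional

# Immediate supertypes in a toy semantic type hierarchy. The top type
# "entity" is an assumption added so that the hierarchy has a root.
SUPERTYPE = {
    "edible-fish-substance": "food",
    "food": "entity",
    "animate": "entity",
}


def is_subtype(sub: str, sup: str) -> bool:
    """True if `sub` equals `sup` or lies below it in SUPERTYPE."""
    current: Optional[str] = sub
    while current is not None:
        if current == sup:
            return True
        current = SUPERTYPE.get(current)
    return False


MEAL = {
    "type": "event-noun",
    "orth": "meal",
    "sem": {
        "eventstr": {"agent": "animate", "patient": "food", "process": "eat"},
        "form": "eat(e, x, y)",
    },
}

SALMON = {
    "type": "lexical-mass-noun",
    "orth": "salmon",
    "syn": {"cat": "noun", "count": False},
    "sem": {"type": "edible-fish-substance", "form": "salmon(x)"},
}

# Can salmon fill the patient role of the eating event introduced by meal?
patient_type = MEAL["sem"]["eventstr"]["patient"]
print(is_subtype(SALMON["sem"]["type"], patient_type))
print(is_subtype(patient_type, SALMON["sem"]["type"]))
```

The asymmetry of the two checks is the point: subsumption runs from the more specific type to the more general one, never the other way round.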
The above FSs encode semantic type information on the individual involved. We can use this to generalize the laws at the discourse level that are used to infer Elaboration. To do this, we first extend $\sqsubseteq$ to take DRSs $\gamma_1$ and $\gamma_2$ as arguments: $\gamma_1 \sqsubseteq \gamma_2$ if there is a discourse referent $x_1 \in U_{\gamma_1}$ such that the conditions on $x_1$ are a subtype of those on the discourse referents in $U_{\gamma_2}$, and $\gamma_2$ has no conditions on its discourse referents that are a subtype of a condition on a discourse referent in $\gamma_1$. So, for example, $\gamma_2 \sqsubseteq \gamma_1$ below, because salmon is an edible-fish-substance, which is a subtype of food.

($\gamma_2$) $[\,x \mid \mathit{salmon}(x)\,]$
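We now state the relevant laws:

• Subtype: $\Box((\theta_i(\textit{e-condn}_\alpha, \alpha, \gamma_1) \wedge \theta_i(\textit{e-condn}_\beta, \beta, \gamma_2) \wedge \textit{e-condn}_\beta \sqsubseteq \textit{e-condn}_\alpha \wedge \gamma_2 \sqsubseteq \gamma_1) \rightarrow \mathit{subtype}(\beta, \alpha))$
• Elaboration: $\langle \tau, \alpha, \beta \rangle \wedge \mathit{subtype}(\beta, \alpha) > \mathit{Elaboration}(\alpha, \beta)$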
In words, Subtype states the following: if (a) the DRSs $\gamma_1$ and $\gamma_2$ respectively identify the thematic role $\theta_i$ in $\alpha$ and $\beta$, with respect to the event conditions $\textit{e-condn}_\alpha$ and $\textit{e-condn}_\beta$ on particular events in $\alpha$ and $\beta$; (b) this condition on an event of $\beta$ is a subtype of that of $\alpha$ (for example in this case, devour is a subtype of eat); and (c) $\gamma_2$ is a subtype of $\gamma_1$; then $\beta$ is a subtype of $\alpha$. Elaboration states that if $\beta$ is to be attached to $\alpha$, and $\beta$ is a subtype of $\alpha$, then normally $\mathit{Elaboration}(\alpha, \beta)$ holds.

The above lexical information and rules for discourse attachment have an impact on the analysis of (12a, b). We assume that the sentence grammar derives the logical forms $\alpha$ and $\beta$ respectively for (12a) and (12b) from the above lexical entries, and in the FS for $\beta$, the patient of eat is of type edible-fish-substance which, as we have stated, is a subtype of food:

($\alpha$) $[\,x, e_1, t_1, e, f \mid \mathit{max}(x),\ \mathit{enjoy}(e_1, x, e),\ \mathit{hold}(e_1, t_1),\ \mathit{eat}(e, x, f),\ t_1 \prec \mathit{now}\,]$
($\beta$) $[\,e_2, t_2, s \mid \mathit{eat}(e_2, x, s),\ \mathit{salmon}(s),\ \mathit{hold}(e_2, t_2),\ t_2 \prec \mathit{now}\,]$
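The Subtype law can be sketched as a check over simplified logical forms. In the sketch below each constituent is reduced to one event condition and the semantic type of the DRS filling its patient role, and the second clause in the definition of $\sqsubseteq$ on DRSs is ignored. This reduction, the `SUPERTYPE` table and all function and variable names are assumptions for illustration; the links for devour and cheese follow the text's remarks that devour is a subtype of eat and cheese a subtype of food.

```python
from typing import NamedTuple, Optional

# Toy subsumption table covering both event predicates and semantic types.
SUPERTYPE = {
    "devour": "eat",
    "edible-fish-substance": "food",
    "cheese": "food",
}


def subsumed_by(sub: str, sup: str) -> bool:
    """Reflexive, transitive subsumption over SUPERTYPE."""
    current: Optional[str] = sub
    while current is not None:
        if current == sup:
            return True
        current = SUPERTYPE.get(current)
    return False


class Constituent(NamedTuple):
    event_condition: str  # e-condn: the predicate on the event
    patient_type: str     # semantic type of the DRS in the patient role


def subtype(beta: Constituent, alpha: Constituent) -> bool:
    """Antecedent of Subtype: the event condition and the patient DRS of
    beta are each subsumed by those of alpha."""
    return (subsumed_by(beta.event_condition, alpha.event_condition)
            and subsumed_by(beta.patient_type, alpha.patient_type))


meal_12a = Constituent("eat", "food")                     # coerced (12a)
salmon_12b = Constituent("eat", "edible-fish-substance")  # (12b)
cheese_12c = Constituent("devour", "cheese")              # (12c)

print(subtype(salmon_12b, meal_12a))
print(subtype(cheese_12c, meal_12a))
print(subtype(meal_12a, salmon_12b))
```

One check now covers both the salmon and the cheese continuation, where the earlier analysis needed a separate law for each food item.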
According to the above lexical entries for meal and salmon, the DRSs $\gamma_1$ and $\gamma_2$ above are in the same theta role patient to $\alpha$'s and $\beta$'s eating events respectively. Furthermore, $\gamma_2 \sqsubseteq \gamma_1$ and eat $\sqsubseteq$ eat, and therefore $\textit{e-condn}_\beta \sqsubseteq \textit{e-condn}_\alpha$. So the antecedent to Subtype is verified and in the monotonic component of CE we infer $\mathit{subtype}(\beta, \alpha)$. In the nonmonotonic reasoning for discourse attachment, therefore, two laws apply: Elaboration and Narration. By the Penguin Principle, $\mathit{Elaboration}(\alpha, \beta)$ is inferred. The same laws for discourse attachment will apply when analysing (12c), because cheese $\sqsubseteq$ food, and devour $\sqsubseteq$ eat.

Through exploring how to spread reasoning in interpretation between the lexicon and pragmatics, we have learnt about the kind of lexical structures that we need. Lexical entries must be structured along causal dimensions to ensure the right flow between the semantics of verbs at the lexical level and explanations at the textual level. Lexical entries must also encode semantic type subsumption relations, to ensure the right flow between thematic role information at the lexical level and elaborations at the textual level.

5 FROM DISCOURSE TO WORDS

We have seen how lexical processing can work in service to a theory of discourse attachment. Now we investigate the other side of the coin. How can the knowledge resources encoded in a theory of discourse attachment be used in lexical processing? In particular, how should we encode the effects of discourse context on lexical disambiguation in (1) and (4)?

(1) a. The judge demanded to know where the defendant was.
    b. The barrister apologised and said that he was drinking across the street.
    c. The court bailiff found him slumped at the bar.
(4) a. The EC are decisive.
    b. The EC run meetings on time.
    c. Last night's meeting came to a conclusion by 8pm.

We suggested earlier that bar in (1) is disambiguated to its pub sense on the basis of constraints on coherent discourse. In contrast, conclusion in (4a, c) and (4b, c) is disambiguated on the basis of strengthening the rhetorical link between the sentences. We now show how these proposals can work in formal detail. We analyse each of the above texts in turn.

5.1 Lexical information and discourse coherence

When disambiguating bar in text (1), we argued that the discourse information on how to disambiguate bar, which results from the coherence constraints on discourse, conflicts with the information on how to disambiguate bar which
resides in the sentence (1c). The former favours the pub sense of bar, and the latter favours the courtroom sense, and the former ultimately wins. We must represent these discourse and sentential knowledge resources, and show how the conflict between them is resolved in the logic in the appropriate way.

So let us first consider the sentential knowledge resources. The rational process that underlies the way the syntactic and semantic information in (1c), and the domain knowledge about courtroom scenarios, favour the courtroom sense of bar is not well understood. So we simply make the following crucial assumption about lexical disambiguation using word association. If words in the sentence favour by default a particular sense of an ambiguous word, we can encode this in DICE in the form of a >-rule, where the antecedent is the FS of the sentence that is constructed from the FS of the ambiguous word, and the consequent is the FS of the sentence that is now constructed from the FS of the sense of the word that is favoured. Some notation clarifies this assumption. Let $FS_{bar}$ be the feature structure that represents the sentence (1c), which is constructed from the FS for the word bar before it is disambiguated. Let $FS_{bar_1}$ be the same FS as $FS_{bar}$, save that the part of the FS that represents the ambiguous word bar is replaced with the FS that represents the courtroom sense of bar, which we label $bar_1$. Then our assumption is that these FSs are related by >, as shown in the Bar Rule.

• Bar Rule: $FS_{bar} > FS_{bar_1}$

This rule states the following: the information in the sentence normally leads one to infer that the meaning of bar is $bar_1$.10 Assuming that the default disambiguation of senses by sentential information can be expressed as a >-rule is a relatively weak assumption, but it poses a problem. The Bar Rule is specific to (1c). So how is the Bar Rule acquired from more general principles about disambiguation in a sentential context? It appears that we have replaced DICE's problem of requiring very specific laws like the Push Explanation law concerning discourse attachment with the problem of requiring very specific laws concerning sense preference in a sentential context. Ideally, we should be able to exploit general principles about word sense disambiguation in a sentential context. If these principles predicted correctly that the courtroom bar is favoured in the sentential context (1c), then we would be assuming merely that this preference is stated as a >-rule. We could, in essence, systematically generate rules like the Bar Rule from the general principles of lexical processing. But as we have mentioned, these principles are not well understood in the lexical literature, although it has been appreciated for a long time that such principles are needed in a full treatment of NL interpretation. So for now, we simply assume that the preferences that an adequate theory of sense disambiguation in a sentential context would give will be stated in DICE as a >-rule.
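The shape of the Bar Rule, an undisambiguated sentence FS paired with the same FS in which only the ambiguous word has been resolved, can be sketched as follows. The dictionary layout for (1c), the slot names and the helper `apply_sense` are hypothetical simplifications introduced for this example; only the sense label `bar1` for the courtroom sense comes from the text.

```python
import copy

# Hypothetical, heavily simplified FS for sentence (1c): only the slot for
# the ambiguous noun is spelled out, and its sense is not yet fixed.
FS_BAR = {
    "pred": "find",
    "agent": "bailiff",
    "patient": "defendant",
    "location": {"orth": "bar", "sense": None},
}


def apply_sense(sentence_fs: dict, sense: str) -> dict:
    """Return a copy of the sentence FS in which the ambiguous word's FS is
    replaced by the FS of the given sense; the input is left untouched."""
    resolved = copy.deepcopy(sentence_fs)
    resolved["location"]["sense"] = sense
    return resolved


# The Bar Rule pairs the undisambiguated FS with its preferred resolution.
# It is a default, so other rules may still override its consequent.
BAR_RULE = {"antecedent": FS_BAR, "consequent": apply_sense(FS_BAR, "bar1")}

print(BAR_RULE["consequent"]["location"])
print(BAR_RULE["antecedent"]["location"])
```

Keeping the antecedent unchanged matters here, because the discourse rules discussed in this section also take the undisambiguated FS as part of their antecedents.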
Now we turn to the discourse information that applies when interpreting text (1). First, we argued at the beginning of the paper that the spatial constraints on narrative discourse played an important role in the disambiguation of bar. What are the spatial constraints? Asher (1993b) investigates the spatial constraints of narratives in DICE, and uses these constraints to reason about how objects move in narrative discourse. First he defines how to calculate when and where eventualities start (written $\mathit{source}(e)$) and when and where they stop (written $\mathit{goal}(e)$). $\mathit{source}(e)$ is calculated by projecting $e$ onto the space line and time line: the temporal part of $\mathit{source}(e)$ is the first point on the time line in the projection, and the spatial part of $\mathit{source}(e)$ is the place where $e$ occurs at this point on the time line. $\mathit{goal}(e)$ is similarly defined. The spatial constraints in narrative that Asher (1993b) proposes then constrain the values of $\mathit{source}(e)$ and $\mathit{goal}(e)$.

The text (1), however, is more complex than the narratives considered in Asher (1993b), in that it contains the indirect speech report (1b), and this plays a significant role in reasoning about the defendant's whereabouts during the discourse. Asher (1993b) does not consider how indirect speech reports of this kind affect the inferences about space in narratives. Now, in (1), the defendant is an actor in all three sentences, and in (1b) the barrister makes a claim about where he is. Upon interpreting (1b), therefore, we gain an expectation: the defendant is at the pub across the street when the barrister is speaking. This expectation is a manifestation of Grice's Maxim of Quality; we assume that the barrister is being sincere, and we expand our beliefs with his belief about the defendant's whereabouts. It must be stressed that this is an expectation though, and not added to the truth conditional content of the constituents in the discourse. For if it were added, then it would be impossible to refute the barrister's speech report by subsequent information in the discourse, without reducing the discourse to inconsistency.

In narratives, expectations cannot be cancelled by subsequent information in the discourse: Contrast is the rhetorical relation that plays this function. So if (1c) is attached to (1b) with Narration, then the expectation that the defendant is at the pub when the barrister spoke survives when interpreting what (1c) means. But what does this tell us about where the defendant was found? There is a spatial constraint on narratives, that as long as there is no information in the compositional semantics of $\beta$ that, together with the rest of the KB, leads to an inference that actors have moved between the end of the first eventuality and the start of the second, then they do not move between these points. One example of such a signal in the compositional semantics of $\beta$ is a temporal adverbial like twenty years later, which indicates such a long lapse in time that actors are likely to have moved. Furthermore, given that expectations cannot be cancelled in narratives, if the expectations include information about where an actor is, then this can fix the location of the actors in subsequent eventualities in the narrative. Let $\Pi_{loc}$ be the embedded default that asserts the following relationship: the compositional semantics of $\beta$ and the KB $\mid\!\approx$ the actors have moved between the end of $\alpha$'s eventuality and the start of $\beta$'s. Then these ideas about locations in narrative, and in particular how expectations about locations are exploited, are captured in the following constraint:
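• Spatial Constraint on Narrative: $\Box((\mathit{Narration}(\alpha, \beta) \wedge \mathit{actor}(x, \alpha) \wedge \mathit{actor}(x, \beta) \wedge \mathit{Expt}(\tau \cup \alpha, \gamma) \wedge \neg\Pi_{loc}) \rightarrow \mathit{loc}(x, \mathit{source}(e_\beta)) = \mathit{loc}(x, \mathit{goal}(e_\alpha + e_\gamma)))$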
This law states: if $\mathit{Narration}(\alpha, \beta)$ holds; $\alpha$ and $\beta$ share an actor $x$; $\gamma$ is an expectation from the discourse context; and $\beta$ contains no information that leads you to believe things have moved between the goal of $\alpha$'s and $\gamma$'s eventualities combined, and the source of $\beta$'s; then the locations of $x$ at these spatio-temporal points are the same. More needs to be said about the source and goals of complex eventualities, but the idea is that the source and goal of these complex entities are calculated by projecting the eventualities onto a time line and space line, in a similar manner to simple eventualities. The situation is even more complicated when we consider states: what is the endpoint of the state as far as the discourse context is concerned? This is a difficult issue, and requires extensive research into the interaction between Aktionsart and discourse. So we gloss over this issue here, and will simply fix the values of $\mathit{goal}(e_\alpha + e_\gamma)$ as intuitions would dictate.

Now, the fact that the above spatial constraint allows $e_\gamma$ to play a role in calculating where $x$ is guarantees that if the expectation $\gamma$ gives information about where $x$ is, then this fixes where he is in the subsequent narrative. In this way, we exploit the expectation of where $x$ is when interpreting the subsequent discourse. But it should be stressed that this is not possible with all rhetorical relations. Narratives cannot cancel expectations, but contrastive discourses do. We will see how the above spatial constraint plays a crucial role in our interpretation of (1c).

One might think that domain knowledge would be adequate here to reason about the movement of objects. After all, many AI systems of knowledge representation include a persistence axiom that states that unless there is information to the contrary, objects remain stationary. But there are two reasons why encoding such a constraint is inadequate for our purposes. First, as we showed at the beginning of the paper, different rhetorical relations have different spatial constraints. An AI style 'spatial persistence' axiom fails to reflect this. Second, AI persistence axioms are default rules, and our spatial constraint on Narration is an indefeasible rule, which applies in quite specific circumstances. It will shortly become clear why it is important that the spatial constraint is an indefeasible law. If it were not, we would get spurious irresolvable knowledge conflicts when reasoning about discourse attachment.

Now, in the analysis of (1) we aim to formalize the following line of
reasoning. The discourse context wins over the sentential context when disambiguating bar in order to avoid discourse incoherence. How do we encode this? Discourse incoherence occurs when no rhetorical relations can be inferred for attaching the current constituent. The rule Narration is the only rule that always applies when attaching $\beta$ to $\alpha$, regardless of their content. So consider the case when Narration is the only rule that applies. Then if sentential information about disambiguation conflicts with Narration (such conflict is possible because $\mathit{Narration}(\alpha, \beta)$ constrains the semantic relation between $\alpha$ and $\beta$), a Nixon Diamond will form, no discourse relation will be inferred, and the discourse will be incoherent. So we wish to capture the intuition that these cases of irresolvable conflict are avoided by ensuring that discourse information wins over sentential information. Any Nixon Diamond between Narration and sentential information about disambiguation must be avoided if Narration is the only rule for discourse attachment that applies.

We avoid this Nixon Diamond by transforming it into a Penguin Principle. More specifically, suppose sentential information favours a particular sense of a word lex, and this preference is encoded as a >-rule $FS_{lex} > FS_{lex_1}$. Suppose, furthermore, that Narration is the only discourse attachment rule that applies. Then under these circumstances, we derive a new narrative rule which is 'strengthened', in that this sentential information $FS_{lex}$ is added as a conjunct in the antecedent of Narration. So we have a new rule $(FS_{lex} \wedge \langle \tau, \alpha, \beta \rangle) > \mathit{Narration}(\alpha, \beta)$. Then, if there is conflict between this discourse rule and the sentential rule for disambiguation, $\mathit{Narration}(\alpha, \beta)$ will be inferred via the Penguin Principle, since the new rule's antecedent entails $FS_{lex}$, making it more specific. This ensures that discourse information wins, and discourse incoherence is avoided.

This general law about the interaction between words and discourse is formally represented in DICE as follows:

• Lexical Impotence: $\Box(((FS_{lex} > FS_{lex_1}) \wedge \neg((\mathit{Info}_{lex_1}(\alpha, \beta) \wedge \langle \tau, \alpha, \beta \rangle) > (R(\alpha, \beta) \wedge R \neq \mathit{Narration}))) \rightarrow ((FS_{lex} \wedge \langle \tau, \alpha, \beta \rangle) > \mathit{Narration}(\alpha, \beta)))$

In words, it states the following: suppose there is information in the sentence $FS_{lex}$ that normally leads one to conclude that the meaning of a particular lexical item lex is $lex_1$. Suppose furthermore that even if you interpret lex as $lex_1$, you still do not add information to the two constituents $\alpha$ and $\beta$ that helps you infer a non-default discourse relation between them (i.e. a relation other than Narration). Then, indefeasibly, one can assume that updating $\alpha$ with $\beta$ normally leads to Narration, whatever lex means. That is, the intra-sentential information does not change the defaults for discourse attachment. In essence, if Lexical Impotence strikes, then discourse attachment has
Nicholas Asher and Alex Lascarides
97
(I 3) (Infoba, 1 ( a2, a3) 1\ ( r, a2, a3)) > (R (a , a3) 1\ R oF Narration ) So in the monotonic component of CE, we conclude the following law via Lexical Impotence:
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
pnonty over the lexical disambiguation from sentential information. This prioritization is ensured, because specificity will favour the strengthened narra tive rule-which is inferred via Lexical Impotence-when it conflicts with the disambiguation rule FS�cc > FStcx . Lexical Impotence carries with it the heuristic we required of disambiguat ing words so as to avoid discourse coherence. To see this, we will show how it underlies the analysis of ( 1 ), and how it ensures that we choose to preserve discourse coherence, rather than disambiguating bar in favour of the intra sentential information. Let the logical forms ( I a, b) be respectively a1 and a2• We assume that when interpreting a1 , WK is stated as intuitions would dictate, so that, given the reader's KB, the reader has an expectation that the judge is in court, and the defendant is not, and hence not in bar1 (written respectively as -.overlaps,(c , d , t0 1 ) and -.overlaps,(bar1 , d, ta 1 ), where c is the court, d is the defendant, overlaps, is 'spatially overlaps', and ta I is the time where ea I holds). 1 t Now consider attaching a2 to at· The only law that applies is Narration, and its consequent-Narration (at , a2)-is consistent with the rest of the KB. So by Defeasible Modus Ponens, it is inferred. And by logical omniscience and the Axiom on Narration, ea I < ea z· As we have mentioned, we also gain an expectation from the indirect speech report in a2 about the defendant's whereabouts: he is at the pub across the street at the time when ea 2 (the saying event) occurs. Let the DRS representing The defendant is at the pub across the street be y . Then, the reader infers Expt ( a2, y ), and assuming that goal(ea + er) is defined as intuitions would dictate, this spatia temporal referent point fixes the whereabouts of the defendant, as at the pub across the street. If he is at the pub across the street, he is not in the courtroom. 
So $loc(d, goal(e_\alpha + e_r)) \not\subseteq c$ holds, where $d$ is the defendant and $c$ is the courtroom. But, as we said before, because this was just an expectation about the defendant's whereabouts, it is not added to the truth conditional content of the discourse. It will, however, have discourse effects via the spatial constraint.

Now the task is to attach $\alpha_3$, the logical form of (1c), to the preceding SDRS. The only open constituent is $\alpha_2$, and so $\langle \tau, \alpha_2, \alpha_3 \rangle$ is added to the KB. So which rules apply when attaching $\alpha_3$? First, we must consider the monotonic reasoning component of CE. Lexical Impotence is verified. This is because, first, we have the Bar Rule; and, second, the courtroom sense of bar does not affect the candidate discourse relations that can be used to attach $\alpha_3$ to $\alpha_2$. More specifically, there is no rule of the form below, where $Info_{bar_1}(\alpha_2, \alpha_3)$ is a gloss for 'information about the semantic content of $\alpha$ and $\beta$, where bar is interpreted as $bar_1$':
By the logic CE, these monotonic inferences are performed before any nonmonotonic inferences, and so Strengthened Narration forms part of the premises to the nonmonotonic component. The rules that apply in this component are Narration, the Bar Rule, and Strengthened Narration. Given the Spatial Constraint on Narration, $Narration(\alpha_2, \alpha_3)$ and the consequent of the Bar Rule are inconsistent, given the rest of the contents of the reader's KB. For on the one hand, if $FS_{bar_1}$ represents the predicate argument structure of (1c) as intuitions would dictate, then it entails that the defendant is in the courtroom when he is found. On the other hand, the Spatial Constraint on Narration entails that if $Narration(\alpha_2, \alpha_3)$ holds, then the location of the defendant at the source of the finding event is given by $loc(d, goal(e_\alpha + e_r))$. So if $Narration(\alpha_2, \alpha_3)$ holds, then this location does now form part of the truth conditional content of the discourse: the defendant is at the pub at the source of the finding event. Hence, if the discourse is narrative, the reader infers from the spatial constraint that the defendant is not in the courtroom at the source of the finding event. But find is an achievement verb, and as such it has no temporal extent, and its source is its goal. So the defendant is not in the courtroom when he is found, if the discourse is narrative. This conflicts with the Bar Rule, which predicted that the defendant is in the courtroom when he was found.

So, since the Spatial Constraint on Narration makes $FS_{bar_1}$ and $Narration(\alpha_2, \alpha_3)$ inconsistent in the reader's KB, the Bar Rule and Strengthened Narration conflict. But the latter is more specific, because its antecedent has the added conjunct $\langle \tau, \alpha_2, \alpha_3 \rangle$. So by the Penguin Principle, $Narration(\alpha_2, \alpha_3)$ is inferred, and the consequent of the Bar Rule is not inferred. Having inferred $Narration(\alpha_2, \alpha_3)$, the Spatial Constraint on Narration applies.
So by logical omniscience, the bar where the court bailiff finds the defendant is not $bar_1$. Therefore bar must be $bar_2$ (i.e. its pub sense) rather than $bar_1$.

In this example, we saw how coherence constraints on narrative can drive lexical disambiguation. If the intra-sentential information about how a word should be disambiguated conflicts with discourse coherence constraints, then the need for discourse coherence wins. At least, this is the case so long as the intra-sentential information prefers a particular sense by default, rather than indefeasibly. This preference for discourse coherence was modelled in a general rule that encoded the interaction between sentential information and discourse information. Lexical Impotence in essence captured the following: if the disambiguation favoured by sentential information does not affect which rules for discourse attachment apply, then one can assume that the rules for discourse attachment have priority during NLP.

Now consider the contrastive discourse (2), introduced at the beginning of the paper, and compare its interpretation in DICE with that of (1).
(2) a. The judge asked where the defendant was.
    b. His barrister apologised, and said he was at the pub across the street.
    c. But in fact, the court bailiff found him slumped underneath the bar.
5.2 Lexical information and strengthening rhetorical connections
In text (1), discourse coherence constraints influenced disambiguation. We now examine a further type of discourse-word interaction. Intuitively, the discourse relations Evidence and Consequence are scalar. One can have weak or strong evidential support, and weak or strong consequential links. Here we analyse texts that feature Evidence and Consequence, and show that if a word can be disambiguated in favour of the stronger support, then this takes place. Typically, when attempting to attach constituents together with Evidence, the reader is prepared to assume information that is not stated, so as to achieve an appropriate relationship of logical or causal support between the constituents. The strength of evidential support can be measured in terms of the amount of this new information that the reader needs to infer. But the relationship between Evidence, new information, and strength of connection is very
The logical forms of (2a, b) are as before: $\alpha_1$ and $\alpha_2$. And of course, reasoning about their discourse attachment is the same as before: the reader infers $Narration(\alpha_1, \alpha_2)$, $e_{\alpha_1} < e_{\alpha_2}$, and $Expt(\alpha_2, \gamma)$, where $\gamma$ is the DRS representing The defendant is at the pub across the street. Now we must attach (2c) to the SDRS for the preceding discourse. Let the logical form of (2c) be $\alpha_3'$.

First consider the monotonic component. In contrast to $\alpha_3$, Lexical Impotence is not verified, because the information in $\alpha_3'$ allows a rule for discourse attachment to apply whose consequent is a relation other than Narration. Namely, the presence of But means that a Contrast relation must be inferred. So in this example, Strengthened Narration is not inferred. Instead we infer $Contrast(\alpha_2, \alpha_3')$.

Now consider the nonmonotonic component. Two rules apply: Narration and the Bar Rule. The Contrast relation already inferred conflicts with Narration, and so its consequent is not inferred. Therefore, the Spatial Constraint on Narration plays no role in interpreting (2). But what about the Bar Rule? In DICE, it is necessary to check that, whenever a Contrast is inferred (by the presence of but, for example), it has been coherently used: the information in $\alpha_3'$ must violate an expectation that arose from the discourse context. And indeed, if the Bar Rule fires, then such an expectation is violated. For then, $\alpha_3'$ entails that the defendant is in the courtroom when he is found, whereas our expectation from the preceding discourse is that he was not in the courtroom. So in contrast to (1), the Bar Rule does not conflict with the discourse information. It fires in the interpretation of (2), and so bar is now assigned its courtroom sense.
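The difference between the analyses of (1) and (2) can be illustrated with a small Python sketch. It is only an approximation under stated assumptions, not the logic CE: the names `Default`, `resolve` and `interpret` and the string labels are hypothetical, specificity is approximated by strict inclusion of antecedent facts, the conflict between Narration and $bar_1$ is stipulated rather than derived from the Spatial Constraint on Narration, and the outcome of the monotonic stage is hand-coded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Default:
    name: str
    antecedent: frozenset  # facts that must all hold for the rule to apply
    consequent: str        # conclusion the rule supports by default


def resolve(facts, defaults, conflicts, blocked=frozenset()):
    """Return the default conclusions that survive specificity-based conflict resolution."""
    applicable = [d for d in defaults
                  if d.antecedent <= facts and d.consequent not in blocked]
    accepted = set()
    for d in applicable:
        rivals = [r for r in applicable
                  if frozenset({d.consequent, r.consequent}) in conflicts]
        # Penguin Principle (simplified): d wins only if its antecedent is
        # strictly more specific than that of every conflicting rule.
        if all(r.antecedent < d.antecedent for r in rivals):
            accepted.add(d.consequent)
    return accepted


ATTACH = "attach(a2,a3)"  # stands in for <tau, alpha2, alpha3>
BAR = "FS_bar(a3)"        # the new constituent contains the ambiguous word "bar"

NARRATION = Default("Narration", frozenset({ATTACH}), "Narration")
BAR_RULE = Default("Bar Rule", frozenset({BAR}), "bar1")
STRENGTHENED = Default("Strengthened Narration",
                       frozenset({BAR, ATTACH}), "Narration")

# Stipulated here: Narration and the courtroom sense cannot both hold.
CONFLICTS = {frozenset({"Narration", "bar1"})}


def interpret(has_but):
    """Return the sense assigned to 'bar' in text (1) (no 'but') or text (2) ('but')."""
    facts = frozenset({ATTACH, BAR})
    if has_but:
        # Monotonic stage: Contrast is inferred, Lexical Impotence fails,
        # and the consequent of Narration is blocked.
        defaults, blocked = [NARRATION, BAR_RULE], frozenset({"Narration"})
    else:
        # Monotonic stage: Lexical Impotence yields Strengthened Narration.
        defaults, blocked = [NARRATION, BAR_RULE, STRENGTHENED], frozenset()
    conclusions = resolve(facts, defaults, CONFLICTS, blocked)
    # If the courtroom sense is not concluded, the only other sense remains.
    return "bar1 (courtroom)" if "bar1" in conclusions else "bar2 (pub)"
```

Under these assumptions, `interpret(has_but=False)` corresponds to text (1) and `interpret(has_but=True)` to text (2). Without Strengthened Narration, plain Narration and the Bar Rule would be incomparable in specificity, and `resolve` would draw neither conclusion.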
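• Strengthening Evidential Support:

$$
\begin{aligned}
\text{(a)}\quad & \Box((FS_1 \rightarrow orth(lex) \wedge FS_2 \rightarrow orth(lex) \;\wedge \\
\text{(b)}\quad & X(\alpha) \wedge Y(\beta) \wedge \langle \tau, \alpha, \beta \rangle \wedge ((X(\alpha) \wedge Y(\beta) \wedge \langle \tau, \alpha, \beta \rangle) > Evidence(\alpha, \beta)) \\
\text{(c)}\quad & \wedge\; Pl(\alpha_{FS_1} \mid \beta_{FS_1}) > Pl(\alpha_{FS_2} \mid \beta_{FS_2})) \rightarrow FS_1)
\end{aligned}
$$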
The notation $\beta_{FS_1}$ stands for the logical form of $\beta$ obtained using $FS_1$; similarly for $\alpha_{FS_1}$, $\beta_{FS_2}$ and $\alpha_{FS_2}$. So (c) above says what we wish, that assuming lex is $FS_1$ provides better evidential support than $FS_2$ would. Consequently, Strengthening Evidential Support says: the sense of lex is $FS_1$, because this reinforces the relation Evidence, which is inferrable from the semantic content of $\alpha$ and $\beta$. Now we consider the impact of this law on the analysis of (4a, c).

(4) a. The EC are decisive.
    b. The EC run meetings on time.
    c. Last night's meeting came to a conclusion by 8pm.

Let the logical forms of the sentences (4a) and (4c) be respectively $\alpha$ and $\beta$. The lexical ambiguity of conclusion in $\beta$ is as yet unresolved, because the intra-sentential information in (4c) alone fails to disambiguate conclusion in the monotonic reasoning component. (4a) is a generic, and $\alpha$ must reflect this. In particular, if $\alpha$ is true, then by WK it follows that all EC meetings are normally (or 'generically') decisive. In contrast, $\beta$ does not quantify over EC meetings; it is about a particular event token of the EC meeting, which happened last night. It says of this event token that it came to a conclusion. Now if conclusion is interpreted as agreement, then the meeting was decisive. On the other hand, if conclusion is interpreted as end, then by the linguistic
complex, and forms a focus of current research for the authors. We gloss over the complexities here by approximating their effect in a plausibility scale. $Pl(\beta \mid \alpha_1) > Pl(\beta \mid \alpha_2)$ means 'The plausibility of $\beta$ given $\alpha_1$ is greater than the plausibility of $\beta$ given $\alpha_2$'.¹² The plausibility scale measures the strength of evidential support according to the following assumption: if $Pl(\alpha \mid \beta_1) > Pl(\alpha \mid \beta_2)$, then $\beta_1$ would provide better evidential support for $\alpha$ than $\beta_2$ would.

How does the plausibility scale affect lexical disambiguation? The following law is a general principle about how lexical and discourse information interact. It states the following: if (a) a word lex is ambiguous, with (at least) two senses represented by the FSs $FS_1$ and $FS_2$; (b) Evidence is a candidate relation for attaching $\beta$ to $\alpha$ (because a default rule for inferring Evidence applies); and (c) the sense of the ambiguous word lex provided in $FS_1$ would strengthen the evidential support, compared to interpreting lex as $FS_2$, then disambiguation takes place in favour of the sense $FS_1$.
knowledge (LK) in (4c) we can infer that the meeting was over relatively quickly, and by WK we can in turn infer that, therefore, the meeting was decisive. Either way, there appears to be a nonmonotonic inference from the reader's KB when interpreting $\beta$ that the meeting referred to was a decisive one, regardless of how conclusion is interpreted. So, let $meeting(e)$ mean that $e$ is an event of the EC meeting, and $decisive(e)$ mean that this meeting was decisive. Then we obtain the following relationships between $\alpha$, $\beta$, generic laws and particular statements:
We declaratively specify a rule which states that the three above properties are normally sufficient to infer Evidence. This amounts to inductive evidential support, for condition (1) above states that $\alpha$ entails a generic statement, condition (2) states that $\beta$ describes an instance of the antecedent to that generic statement, and condition (3) states that $\beta$ and other knowledge resources in the KB lead to the consequence of the generic statement. So (2) and (3) together mean that $\beta$ is a particular example of the generic relationship entailed by $\alpha$. Thus assuming $\beta$ is Evidence for $\alpha$ is tantamount to assuming a step of induction, from a particular case ($\beta$) to the general rule ($\alpha$). This inductive step is stipulated in Inductive Evidence below, where $\Pi_Z(a)$ stands for the nested conditional which specifies in the object language that $\beta$ and the laws characterizing WK and LK together $\mid\!\approx Z(a)$ (that is, the object language specification of the information in 3 above):¹³
• Inductive Evidence:

$$
(\langle \tau, \alpha, \beta \rangle \wedge \Box(\alpha \rightarrow (\forall x)(Y(x) > Z(x))) \wedge \Box(\beta \rightarrow Y(a)) \wedge \Pi_Z(a)) > Evidence(\alpha, \beta)
$$

This law will apply when attaching (4c) to (4b) or (4a). It will also apply when trying to attach (4e) or (4f) to (4a):

(4) e. Last night's meeting was over very quickly.
    f. Last night's meeting was successful.
    g. ?Last night they had a meeting.
    h. ?Last night's meeting ended without any agreement.
    i. ?Last night's meeting was in Brussels.
1. By WK, $\alpha$ entails EC meetings are decisive: $\Box(\alpha \rightarrow (\forall e)(meeting(e) > decisive(e)))$
2. $\beta$ strictly entails that there is a meeting $e'$: $\Box(\beta \rightarrow meeting(e'))$
3. $\beta$ and LK and WK nonmonotonically entail that the meeting in question was a decisive one, regardless of how conclusion is interpreted: $\beta \cup$ laws representing WK and LK $\mid\!\approx decisive(e')$
But it will not apply when attaching (4g), (4h) or (4i) to (4a), because $\Pi_{decisive}(e)$ will be false.

Recall the logical forms of (4a) and (4c) are $\alpha$ and $\beta$. In line with intuitions, we assume a plausibility scale where reaching agreement quickly provides better evidential support for being decisive than meetings coming to an end quickly. So let $FS_1$ be the lexical entry for conclusion corresponding to its agreement sense, and $FS_2$ the lexical entry corresponding to its end sense. Then all the following hold: $FS_1 \rightarrow orth(conclusion)$, $FS_2 \rightarrow orth(conclusion)$ and $Pl(\alpha_{FS_1} \mid \beta_{FS_1}) > Pl(\alpha_{FS_2} \mid \beta_{FS_2})$. This plausibility scale, together with the fact that the above Inductive Evidence rule applies for $\alpha$ and $\beta$, ensures that when analysing (4a, c), the antecedent of Strengthening Evidential Support is verified in the monotonic component of CE. So its consequent is inferred: conclusion means agreement. This shows how the heuristic to strengthen evidential support can cause lexical disambiguation.

We turn now to the nonmonotonic reasoning in which a discourse relation between $\alpha$ and $\beta$ will be inferred. The laws that apply are Inductive Evidence and Narration. The tense structure of (4a, c) conflicts with the consequent of Narration, because $e_\alpha$ holds at the time of speech, whereas $e_\beta$ held earlier. So Evidence wins, and $Evidence(\alpha, \beta)$ is inferred.¹⁴

Now consider (4b, c). Suppose the logical form of (4b) is $\gamma$. Then the contents of the KB are the same as those above where $\alpha$ is substituted with $\gamma$, save that now the following holds: $Pl(\gamma_{FS_2} \mid \beta_{FS_2}) > Pl(\gamma_{FS_1} \mid \beta_{FS_1})$. So in the monotonic component of CE, $FS_2$ is inferred via Strengthening Evidential Support (and so conclusion means end), and in the nonmonotonic component, by a similar pattern of reasoning, $Evidence(\gamma, \beta)$ is inferred. We next consider a text where the discourse relation is Consequence.
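The two steps just described, checking that Inductive Evidence applies and then letting the plausibility scale select a sense, might be sketched as follows. This is a minimal Python illustration under assumptions of our own: the names `Attachment`, `inductive_evidence_applies` and `disambiguate` are hypothetical, the three conditions are supplied as booleans rather than derived from a KB, and the integer ranks merely stand in for the ordinal plausibility scale (only their order matters).

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    generic_from_alpha: bool  # condition 1: alpha entails a generic "Y normally implies Z"
    beta_instantiates: bool   # condition 2: beta entails an instance of Y
    beta_yields_z: bool       # condition 3: beta with WK and LK defeasibly yields Z


def inductive_evidence_applies(a):
    """Evidence is a candidate relation only if all three conditions hold."""
    return a.generic_from_alpha and a.beta_instantiates and a.beta_yields_z


def disambiguate(a, pl):
    """pl maps each sense to a rank for Pl(alpha_sense | beta_sense); higher is more plausible.

    Returns the strictly preferred sense, or None when Evidence is not a
    candidate or the scale expresses no strict preference.
    """
    if not inductive_evidence_applies(a) or not pl:
        return None
    ranked = sorted(pl, key=pl.get, reverse=True)
    if len(ranked) > 1 and pl[ranked[0]] == pl[ranked[1]]:
        return None  # a tie: clause (c) of the law is not verified
    return ranked[0]


# (4c) satisfies all three conditions; (4g) fails the third.
attach_4c = Attachment(True, True, True)
attach_4g = Attachment(True, True, False)

# Illustrative ranks only: for (4a, c) agreement is preferred, for (4b, c) end.
scale_4a = {"agreement": 2, "end": 1}
scale_4b = {"agreement": 1, "end": 2}
```

With these inputs, `disambiguate(attach_4c, scale_4a)` selects the agreement sense and `disambiguate(attach_4c, scale_4b)` the end sense, while `attach_4g` never licenses Evidence, so the scale is not consulted for it.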
Consider first text (14a, b) and let the logical forms of the sentences be $\alpha$ and $\beta$ respectively. The verb ruin is a causative verb (Sanfilippo 1992). Moreover, we assume that the selectional restrictions on ruin mean that the pronoun in $\beta$ must be resolved to the event $e_\alpha$ of putting a plant there. So $cause(e_\alpha, e_\beta)$ is inferred via the lexical semantics of ruin. This has an effect at the textual level via Consequence:¹⁵

• Consequence: $\langle \tau, \alpha, \beta \rangle \wedge cause(e_\alpha, e_\beta) > Consequence(\alpha, \beta)$
Finally, there is an analogous law to Strengthening Evidential Support for Consequence:¹⁶

• Strengthening Consequential Support:

$$
\begin{aligned}
& (FS_1 \rightarrow orth(lex) \wedge FS_2 \rightarrow orth(lex) \;\wedge \\
& X(\alpha) \wedge Y(\beta) \wedge \langle \tau, \alpha, \beta \rangle \wedge ((X(\alpha) \wedge Y(\beta) \wedge \langle \tau, \alpha, \beta \rangle) > Consequence(\alpha, \beta)) \\
& \wedge\; Pl(\beta_{FS_1} \mid \alpha_{FS_1}) > Pl(\beta_{FS_2} \mid \alpha_{FS_2})) \rightarrow FS_1
\end{aligned}
$$

(14) a. They put a plant there.
     b. It ruined the view.
     c. It improved the view.
6 CONCLUSION

In this paper, we have investigated how lexical disambiguation takes place in a discourse context. We showed that domain knowledge and word association are insufficient to account for lexical disambiguation in many cases. By augmenting a formal theory of discourse attachment with lexical knowledge, two important goals were achieved. First, we were able to model how discourse information affects lexical decisions. We offered a theory of lexical processing that uses new knowledge resources: knowledge about rhetorical relations, and the constraints they impose on coherent discourse. And we were able to specify very general heuristics for disambiguation that used this discourse information: Avoid Discourse Incoherence, and Strengthen Rhetorical Connections.

The second important goal was that, through adding lexical knowledge to a theory of discourse attachment, many of the pragmatic heuristics and causal laws for deriving discourse structure can be simplified and generalized. We were able to replace lexical items in the laws with the underlying semantic concepts that make those laws plausible, thereby allowing a single law to apply to many closely related lexical items. In essence, we showed how lexical information is used to take decisions about discourse attachment.

Although integrating discourse and lexical processing provides a forum where world knowledge and lexical knowledge interact in precise ways, answers to the long-standing question of where lexical semantics ends and
We are now in a position to see how disambiguation occurs in (14a, b). We have already mentioned that when attaching $\beta$ to $\alpha$, $cause(e_\alpha, e_\beta)$ is inferred in the monotonic component, via the lexical semantics of ruin. This entails that the antecedent to Consequence is also verified. Let $FS_1$ be the factory sense of plant and $FS_2$ its flora sense. Then the plausibility scale reflects the relative aesthetics of factories and flora as long as $Pl(\beta_{FS_1} \mid \alpha_{FS_1}) > Pl(\beta_{FS_2} \mid \alpha_{FS_2})$. In other words, it is more plausible for a factory to ruin a view than flora. So the antecedent to Strengthening Consequential Support is also verified, and consequently $FS_1$ is inferred. That is, plant in $\alpha$ is assumed to have its factory sense. In the nonmonotonic component, Consequence and Narration apply. These do not conflict, and so Consequence and Narration are both inferred. The analysis of (14a, c) is similar, save that the plausibility scale ensures one infers that plant has its flora sense $FS_2$.
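Strengthening Evidential Support and Strengthening Consequential Support differ only in the relation named and in the direction of conditioning on the plausibility scale, which suggests a single schema. The Python sketch below illustrates that idea for (14) under our own assumptions: `strengthen`, `RANK` and the paraphrased readings are hypothetical, and the integer ranks stand in for the ordinal scale, with only their order taken from the discussion above.

```python
def strengthen(relation, alpha, beta, pl):
    """Pick the sense that maximises plausibility for the given relation.

    alpha, beta: dicts mapping each sense to the reading of that constituent
    under the sense. pl(x, given) returns a rank for Pl(x | given).
    Evidence conditions alpha on beta; Consequence conditions beta on alpha.
    Returns None when no sense is strictly preferred.
    """
    if relation == "Evidence":
        def score(sense):
            return pl(alpha[sense], beta[sense])
    elif relation == "Consequence":
        def score(sense):
            return pl(beta[sense], alpha[sense])
    else:
        raise ValueError(f"no strengthening rule for {relation!r}")
    scores = {sense: score(sense) for sense in alpha}
    best = max(scores.values())
    winners = [sense for sense, value in scores.items() if value == best]
    return winners[0] if len(winners) == 1 else None


# Readings of (14a) under each sense of "plant" (paraphrases, for illustration).
ALPHA = {"factory": "they put a factory there",
         "flora": "they put some flora there"}
RUINED = {sense: "it ruined the view" for sense in ALPHA}      # (14b)
IMPROVED = {sense: "it improved the view" for sense in ALPHA}  # (14c)

# Illustrative ranks: a factory ruining a view is more plausible than flora
# doing so, and the reverse holds for improving the view.
RANK = {
    ("it ruined the view", ALPHA["factory"]): 2,
    ("it ruined the view", ALPHA["flora"]): 1,
    ("it improved the view", ALPHA["factory"]): 1,
    ("it improved the view", ALPHA["flora"]): 2,
}


def pl(x, given):
    return RANK[(x, given)]
```

Here `strengthen("Consequence", ALPHA, RUINED, pl)` yields the factory sense for (14a, b), and replacing `RUINED` with `IMPROVED` yields the flora sense for (14a, c). Calling the same function with `"Evidence"` reverses the conditioning, which is the only place where the two laws diverge in this sketch.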
(15) a. Max got angry at the newspaper.
     b. He had thrown it onto the table, and in the process he had spilt his coffee all over his contract.
     c. He had received a rude reply from the editor to his letter accusing him of libel.

This disambiguation is determined by the information connected to the anaphor it, which is constrained by the discourse structure to be resolved to the antecedent newspaper. Our theory is in a good position to model this line of reasoning, since SDRT already represents constraints on anaphora resolution imposed by discourse structure. These are the topics of current research.

Acknowledgements
We would like to thank Ted Briscoe, Ann Copestake, Claire Grover, Antonio Sanfilippo, Greg Whittemore, Yorick Wilks, and two anonymous reviewers for helpful comments on the work reported here. This paper was written while supported by CNRS at IRIT, Université Paul Sabatier, Toulouse.
NICHOLAS ASHER
Center for Cognitive Science
University of Texas
CRG 220
Austin, TX 78712
USA
e-mail: [email protected]

ALEX LASCARIDES
Center for Cognitive Science
Edinburgh University
2 Buccleuch Place
Edinburgh, EH8 9LW
Scotland

Received: 07.01.94
Revised version received: 05.08.94
world knowledge starts remain elusive. Nevertheless, we can draw general conclusions about the kinds of lexical structures we require, if lexical processing is to work in service to discourse attachment. In order to infer explanations and elaborations at the textual level, the lexicon must include typed semantic information about causation and part/whole relationships, as well as typed syntactic information. This is unsurprising. But it is a fortunate discovery, since it is compatible with current approaches to computational lexical semantics (e.g. Pustejovsky 1991; Copestake & Briscoe 1991, this volume; Sanfilippo 1992).

Many research questions remain unanswered, the most pressing of which is to reconstruct in DICE more general principles about how intra-sentential information affects lexical disambiguation. Furthermore, it is hoped that the techniques used in this paper will extend to an account of how lexical disambiguation is affected by discourse structural constraints on anaphora resolution: in (15a, b), newspaper refers to the paper object, whereas in (15a, c), it refers to the organization.
NOTES

1 Discourse structure and $\alpha \Downarrow \beta$ are given model theoretical interpretations in Asher (1993a); $e_\alpha$ abbreviates $me(\alpha)$, which is formally defined in Asher (1993a) in an intuitively correct way.
2 We consider loc to be a modal operator, because it will correspond to a feature in the lexical entry for verbs and, using Blackburn's (1992) semantics, features correspond to modal operators.
3 For the sake of simplicity, we do not represent the possible coercions of push, for example, the change in possession that occurs in Max pushed Bill the book.
4 There are other senses of push, but they are incompatible with either the sentence's syntax of John pushed Max, or with the semantic type of the object Max. Syntax rules out things like the sense of push in John pushed against the waves. And the semantic type of the object rules out the sense of push in John pushed heroin. These senses can be eliminated using techniques described in Boguraev (1979) and Alshawi (1992), and so they are not considered here.
5 Re-entrancy and unification will play a role traditionally played by λ-abstraction in deriving this logical form, as described in Moore (1989).
6 It would be useful to distinguish between those verbs that describe change and are amenable to external causation from those that are not amenable to external causation (cf. Sanfilippo 1992). But we gloss over this for now.
7 For the sake of simplicity, we gloss over the fact that encourage could be replaced by persuade or force.
8 For the sake of simplicity, we have glossed over the values of qs in these entries. In the case of salmon, some properties of its qs will be inferred via type inheritance from food.
9 Some comments about the above lexical entry for salmon are in order. This is an example of sense extension, as described in Briscoe, Copestake & Lascarides (1993). We assume a lexical rule of animal grinding turns an animal into its food substance, where the orthography remains unchanged. In sentence (12b), this food sense of salmon is used because part of the meaning of eat is that the object is of type food, and so the preferred interpretation of salmon in (12b) is as food. Consequently, when the logical form of (12b) is fixed, we already know salmon is assigned its food sense, thus ensuring Subtype and Elaboration apply in the logic CE.
10 Unlike with push, we assume we really do need >-rules here, because the exceptions to inferring the consequent cannot be exhaustively listed. This is a reasonable assumption to make since, unlike (10), the disambiguation of (1c) is dependent on domain knowledge.
11 $\neg overlaps_s(c, d, t)$ is a gloss for $\neg overlaps_s(stref(c), stref(d), t)$ as used in Asher (1993b), where stref stands for spatio-temporal reference. The semantics of $stref(x)$ is defined in Vieu (1991).
12 We assume four axioms characterize the behaviour of the Pl scale, which amount to: Validity is maximally plausible; Contradiction is minimally plausible; Asymmetry; and Transitivity. These axioms are compatible with, but weaker than, the axioms that define distributive probability functions.
13 A similar rule is specified in further detail in Asher & Lascarides (1994).
14 Even if Narration were consistent with the facts in the KB, Inductive Evidence would still win over Narration thanks to the Penguin Principle.
15 The law for inferring Consequence can be generalized, so that Consequence is inferred when the constituents are logically related, as well as when they are causally related, but we gloss over this for the sake of simplicity.
16 Using schemas, we could reduce Strengthening Evidential Support and Strengthening Consequential Support to one rule, but we gloss over this here.
REFERENCES

Alshawi, H. (1992) (ed.), The Core Language Engine, MIT Press.
Asher, N. (1993a), Reference to Abstract Objects in English: A Philosophical Semantics for Natural Language Metaphysics, Kluwer Academic Publishers, Holland.
Asher, N. (1993b), 'Temporal and locational anaphora in texts', Research Report, IRIT, Université Paul Sabatier, Toulouse, France.
Asher, N. (1993c), 'Extensions for common sense entailment', in C. Boutilier & J. Delgrande (eds), IJCAI Workshop on Conditionals.
Asher, N. & A. Lascarides (1994), 'Intentions and information in discourse', Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL94), Las Cruces, New Mexico, June 1994, 34-41.
Asher, N. & M. Morreau (1991), 'Common sense entailment: a modal theory of nonmonotonic reasoning', in Proceedings of the 12th International Joint Conference on Artificial Intelligence, Sydney, Australia, August 1991.
Blackburn, P. (1992), 'Modal logic and attribute value structures', in M. de Rijke (ed.), Diamonds and Defaults, Studies in Logic, Language and Information, Kluwer, Dordrecht, Holland (available as University of Amsterdam, ITLI, LP-92-02).
Boguraev, B. (1979), Automatic Resolution of Linguistic Ambiguities, Ph.D. thesis, Computer Laboratory, University of Cambridge.
Briscoe, T., A. Copestake & B. Boguraev (1990), 'Enjoy the paper: lexical semantics via lexicology', in Proceedings of COLING90, vol. 2, 42-7.
Briscoe, T., A. Copestake & A. Lascarides (1993), 'Blocking', in St. Dizier, P. (ed.), Computational Lexical Semantics, Cambridge University Press.
Carpenter, R. (1992), The Logic of Typed Feature Structures, Tracts in Theoretical Computer Science, Cambridge University Press.
Charniak, E. (1983), 'Passing markers: a theory of contextual influence on language comprehension', Cognitive Science, 7, 171-90.
Copestake, A. & E. J. Briscoe (1991), 'Lexical operations in a unification-based framework', in J. Pustejovsky & S. Bergler (eds), Proceedings of the ACL SIGLEX Workshop on Lexical Semantics and Knowledge Representation, Springer-Verlag, 101-19.
Copestake, A. & E. J. Briscoe (1994), 'Semi-productive polysemy and sense extension', this volume.
Daelemans, W., K. De Smedt & G. Gazdar (1992), 'Inheritance in natural language processing', Computational Linguistics, 18, 2, 205-19.
Dahlgren, K. (1988), Naive Semantics for Natural Language Processing, Kluwer Academic Publishers, Holland.
Grosz, B. J. & C. L. Sidner (1986), 'Attention, intentions, and the structure of discourse', Computational Linguistics, 12, 175-204.
Guthrie, J., L. Guthrie, Y. Wilks & H. Aldinejad (1991), 'Subject-dependent co-occurrence and word sense disambiguation', in Proceedings of the 29th Association for Computational Linguistics, Berkeley, June 1991, 55-63.
Hayes, P. (1977), 'Some association based techniques for lexical disambiguation by machine', Ph.D. thesis, University of Rochester, New York.
Hirst, G. (1987), Semantic Interpretation and the Resolution of Ambiguity, Studies in Natural Language Processing, Cambridge University Press.
Hobbs, J. R. (1985), 'On the coherence and structure of discourse', Report No: CSLI-85-37, Center for the Study of Language and Information, October 1985.
Hobbs, J. R., M. Stickel, D. Appelt & P. Martin (1990), 'Interpretation as abduction', Technical Note No. 499, Artificial Intelligence Center, SRI International, Menlo Park, CA.
Hovy, E. (1990), 'Pragmatics and Natural Language Generation', Artificial Intelligence, 43, 153-197.
Kamp, H. (1981), 'A theory of truth and semantic representation', in J. A. G. Groenendijk, T. M. V. Janssen & M. B. J. Stokhof (eds), Formal Methods in the Study of Language, Mathematical Centre Tracts 136, 277-322, Amsterdam.
Kamp, H. & U. Reyle (in press), From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory, Kluwer Academic Publishers.
Kaplan, R. M. & J. Bresnan (1982), 'Lexical functional grammar: a formal system for grammatical representation', in J. Bresnan (ed.), The Mental Representation of Grammatical Relations, MIT Press.
Lascarides, A. (1992), 'Knowledge, causality and temporal representation', Linguistics, 30, 5, 941-73.
Lascarides, A. & N. Asher (1991), 'Discourse relations and defeasible knowledge', Proceedings of the 29th Association for Computational Linguistics, Berkeley, June 1991, 55-
[...] Linguistics and Philosophy.
Lascarides, A. & J. Oberlander (1992), 'Abducing temporal discourse', in R. Dale, E. Hovy, D. Rosner & O. Stock (eds), Aspects of Automated Natural Language Generation, Springer-Verlag.
Lascarides, A. & J. Oberlander (1993), 'Temporal coherence and defeasible knowledge', Theoretical Linguistics, 19, 1.
Mann, W. C. & S. A. Thompson (1987), 'Rhetorical Structure Theory: A Theory of Text Organization', ISI Reprint Series ISI/RS-87-190.
McRoy, S. W. (1992), 'Using multiple knowledge sources for word sense discrimination', Computational Linguistics, 18, 1, 1-30.
Moore, J. D. & C. Paris (1989), 'Planning Text for Advisory Dialogues', Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, Vancouver, B.C., Canada.
Moore, J. D. & M. Pollack (1992), 'A problem for RST: the need for multi-level discourse analysis', Computational Linguistics, 18, 4, 537-44.
Moore, R. C. (1989), 'Unification-based semantic interpretation', Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, 33-41, Vancouver, B.C., Canada.
Morreau, M. (1992), 'Conditionals in philosophy and AI', Ph.D. thesis, IMS, University of Stuttgart.
Norvig, P. & R. Wilensky (1990), 'A critical evaluation of commensurable abduction models for semantic interpretation', in H. Karlgren (ed.), Proceedings of COLING90, Helsinki, Finland, July 1990, 225-230.
Polanyi, L. (1985), 'A theory of discourse structure and discourse coherence', in W. H. Eilfort, P. D. Kroeber & K. L. Peterson (eds), Papers from the General Session at the Twenty-First Regional Meeting of the Chicago Linguistics Society, Chicago, 25-27 April 1985.
Pollard, C. & I. A. Sag (1994), Head-Driven Phrase Structure Grammar, University of Chicago Press.
Pustejovsky, J. (1991), 'The generative lexicon', Computational Linguistics, 17, 4, 409-41.
Sanfilippo, A. (1991), 'Grammatical relations in unification categorial grammar', Lingua & Stile, Fall issue.
Scha, R. & L. Polanyi (1988), 'An Augmented Context Free Grammar for Discourse', Proceedings of the 12th International Conference on Computational Linguistics, Budapest, Hungary, 573-577.
Schank, R. C. & R. P. Abelson (1977), 'Scripts, plans and knowledge', in P. N. Johnson-Laird & P. C. Wason (eds), Thinking, Cambridge University Press.
Thompson, S. & W. Mann (1988), 'Rhetorical structure theory: a framework for the analysis of texts', in IPRA Papers in Pragmatics, 1, 79-105.
Vieu, L. (1991), 'Sémantique de déplacement et de la localisation en français: une étude des verbes, des prépositions et de leur relations dans la phrase simple', Ph.D. thesis, IRIT, Université Paul Sabatier, Toulouse.
Webber, B. (1991), 'Structure and ostension in the interpretation of discourse deixis', Language and Cognitive Processes, 6, 2, 107-135.
Wilensky, R. (1983), Planning and Understanding: A Computational Approach to Human Reasoning, Addison-Wesley.
Wilensky, Y. (1990), 'Extending the lexicon by exploiting subregularities', Proceedings of COLING90, Helsinki, Finland, July 1990, 407-12.
Wilks, Y. (1975), 'A preferential pattern seeking semantics for natural language inference', Artificial Intelligence, 6, 53-74.
Wilks, Y., D. Fass, C. Guo, J. McDonald, T. Plate & B. Slator (1988), 'A tractable machine dictionary as a resource for computational semantics', in B. Boguraev & T. Briscoe (eds), Computational Lexicography for Natural Language Processing, Harlow, Essex: Longman.