ISSN 0167-5133
VOLUME 8 NUMBER 3 1991
Journal of
SEMANTICS
OXFORD UNIVERSITY PRESS
JOURNAL OF SEMANTICS
AN INTERNATIONAL JOURNAL FOR THE INTERDISCIPLINARY STUDY OF THE SEMANTICS OF NATURAL LANGUAGE
MANAGING EDITOR: PETER BOSCH (IBM Germany)
REVIEW EDITOR: BART GEURTS (IBM Germany)
EDITORIAL BOARD: PETER BOSCH (IBM Germany) SIMON C. GARROD (Univ. of Glasgow) BART GEURTS (IBM Germany) PAUL HOPPER (Carnegie Mellon Univ., Pittsburgh) LAURENCE R. HORN (Yale University) STEPHEN ISARD (Univ. of Edinburgh) HANS KAMP (Univ. of Stuttgart) LEO G. M. NOORDMANN (Univ. of Tilburg) ROB A. VAN DER SANDT (Univ. of Nijmegen) PIETER A. M. SEUREN (Univ. of Nijmegen)
CONSULTING EDITORS: R. BARTSCH (Univ. of Amsterdam) D. S. BREE (Univ. of Manchester) G. BROWN (Univ. of Cambridge) Ö. DAHL (Univ. of Stockholm) G. FAUCONNIER (Univ. of California, San Diego) P. N. JOHNSON-LAIRD (MRC, Cambridge) SIR JOHN LYONS (Univ. of Cambridge) J. D. McCAWLEY (Univ. of Chicago) B. RICHARDS (Imperial College, London) H. SCHNELLE (Ruhr Univ., Bochum) M. STEEDMAN (Univ. of Pennsylvania) Z. VENDLER (Univ. of California, San Diego) Y. WILKS (New Mexico State Univ., Las Cruces) J. VAN BENTHEM (Univ. of Amsterdam) H. E. BREKLE (Univ. of Regensburg) H. H. CLARK (Stanford University) H.-J. EIKMEYER (Univ. of Bielefeld) J. HOBBS (SRI, Menlo Park) D. ISRAEL (SRI, Menlo Park) E. L. KEENAN (Univ. of California, Los Angeles) E. LANG (Univ. of Wuppertal) W. MARSLEN-WILSON (MRC, Cambridge) H. REICHGELT (Univ. of Nottingham) A. J. SANFORD (Univ. of Glasgow) A. VON STECHOW (Univ. of Konstanz) D. VANDERVEKEN (Univ. of Quebec) B. L. WEBBER (Univ. of Pennsylvania) D. WILSON (Univ. College, London).
EDITORIAL ADDRESS: Journal of Semantics, IBM Germany Scientific Center, IWBS 7000-75, Postfach 800880, D-7000 Stuttgart 80, W. Germany. Phone: (49-711) 6695-559. Telefax: (49-711) 6695-500. BITNET: bosch@ds0lilog.
New Subscribers to the Journal of Semantics should apply to the Journals Subscription Department, Oxford University Press, Pinkhill House, Southfield Road, Eynsham, OX8 1JJ. For further information see the inside back cover. Volumes 1-6 are available from Foris Publications Holland, PO Box 509, 3300 AM Dordrecht, The Netherlands.
Published by Oxford University Press
Copyright by NIS Foundation
JOURNAL OF SEMANTICS
Volume 8 Number 3
CONTENTS
ANTON BATLINER
Deciding upon the relevancy of intonational features for the marking of focus: a statistical approach 171
ROBERT BANNERT
Automatic recognition of focus accents in German 191
SUSANNE UHMANN
On the tonal disambiguation of focus structures 219
DIETER WUNDERLICH
Intonation and contrast 239
JAKOB HOEPELMAN, JOACHIM MACHATE AND RUDOLF SCHNITZER
Intonational focusing and dialogue games 253
Book Reviews 277
Journal of Semantics 8: 171-189
© N.I.S. Foundation (1991)
Deciding upon the Relevancy of Intonational Features for the Marking of Focus: a Statistical Approach
ANTON BATLINER
University of Munich
Abstract

We present results on how focus is marked intonationally in German. Six untrained speakers produced a corpus of 360 sentences. The corpus was constructed in such a way that sentence modality and place of focus could only be differentiated by intonational means. Acoustic features representing the parameters pitch, duration, and intensity were extracted manually or automatically. The relevancy of these features and the effect of several transformations were tested with statistical methods (discriminant analysis). Perceptual experiments, in which the listeners had to decide upon the place of the focal accent and to judge the naturalness and categories of the utterances, were performed as well. By calculating average values for the (appropriately transformed) relevant features we found 'normal', prototypical cases; by looking at utterances where all listeners agreed on the naturalness and (intended) categories we arrived at coinciding results. At the same time we found 'unusual' but regular productions. Finally, the speaker-specific use of the different parameters is discussed and the question is addressed as to whether the parameters can be classified as relevant or irrelevant for the intonational marking of focus.

Downloaded from jos.oxfordjournals.org by guest on January 1, 2011

MATERIAL AND PROCEDURES

This paper is concerned with the prediction of focus; focus is the part of an utterance which is semantically most important. On the phonetic surface focus is marked by the focal accent (Fa). To be more exact, we will try to predict the phrase that carries the Fa. Our material consists of 360 utterances, spoken by six untrained speakers (three male, three female). Three different sentences with a similar syntactic structure were each put in different contexts that determined sentence modality as well as place and manner of focus (simple focus, focus projection, or multiple focus); for a detailed description of the corpus and the intended focal structures, cf. Batliner & Oppenrieder (1989) and Oppenrieder (1989). In each of the sentences the last two phrases could be stressed, depending on the surrounding context. Based on the sentence modality system according to Altmann (1987), the sentences formed minimal pairs that could only be differentiated by their intonational form: focus in final vs. focus in prefinal position on the one hand, and questions vs. non-questions on the other hand. Table 1 shows an example of a context sentence, the pertinent test sentence, and the induced sentence modality and place of focus. Table 2 shows the three test sentences, an (awkward) word-by-word translation into English, an appropriate translation, and a finer description of the induced sentence modalities question/non-question (Q/NQ), NQ being either assertion, imperative, or adhortative.

Table 1 Examples of context and test sentence, induced sentence modality and place of focus
Constellation of sentence modality and focus: Assertion, focus on 'linen'
Context: Mother: 'What does the master make Nina weave at the moment?'
Sentence: Employee: 'She makes Nina weave the linen.'

Table 2 Test sentences, translation, and induced sentence modalities
Sie läßt die Nina das Leinen weben?/.
  Word-by-word: She makes the Nina the linen weave
  Translation: She makes Nina weave the linen
  Modalities: assertive question vs. assertion
Lassen Sie den Manni die Bohnen schneiden?/!
  Word-by-word: Make the Manni the beans cut
  Translation: Make Manni cut the beans
  Modalities: polar question vs. imperative
Lassen wir den Leo die Blumen düngen?/!
  Word-by-word: Let us make the Leo the flowers fertilize
  Translation: Let us make Leo fertilize the flowers
  Modalities: polar question vs. adhortative

The only instruction given to the speakers was to produce the context and the test sentence. We did not instruct the speakers to produce the Fa or Qs/NQs in a certain way: by instructing the speakers, one can eliminate certain variabilities and facilitate the analysis. On the other hand, one loses the chance to find regular and interesting deviations and merely receives several realizations of representative cases where representativeness is based on the intuition of the researcher. By evaluating a relatively large number of cases we expected to find both representative cases (which we will call central types) and rarer but acceptable cases (which we will call marginal types). The data were evaluated in two ways that proved to be converging:
(i) We extracted acoustic feature values that represent the prosodic parameters pitch, duration, and intensity. Using a statistical classifier we tested the relevancy of the features with respect to the place of the Fa. By calculating average values for the relevant features we found the central type of each Q/NQ-Fa constellation.
(ii) We presented the utterances to a panel of listeners who judged the naturalness, category, and place of Fa. Category roughly means sentence modality; as for the differences, cf. Oppenrieder (1988). By selecting the utterances that were judged to be the 'best' ones and by comparing the feature values of those utterances with the average values from (i) we found the central type as well as marginal types.
EXTRACTION OF FEATURES

For each utterance we calculated the following features:
(i) For the whole utterance: the fundamental frequency (Fo) at the end of the utterance (off); the all-point regression line of the Fo values (reg); the duration in centiseconds.
(ii) For the 2nd and 3rd phrase: the maximal and minimal Fo value; the difference of the position on the time axis of the maximal and minimal Fo value in centiseconds; the duration in centiseconds; the average and maximal logarithmic energy.
The parameter values were extracted 'by hand' on mingograms and automatically from the digitized versions of the utterances (cf. Nöth 1989 for details on the Fo algorithm and the computation of the energy values). In Batliner et al. (1989) we showed that automatically extracted Fo values produced recognition rates comparable to those from mingogram values. An automatic extraction of the durational values, however, would pose a problem (cf. Batliner & Nöth 1989: 212 f.).

PERCEPTION EXPERIMENTS

An average of twelve listeners participated in three different perception experiments:
(i) Context and test sentence were presented by earphone and at the same time in a written version. On a rating scale from 1 (test sentence matches very well with context) to 5 (test sentence does not match at all), the listeners had to judge the naturalness of the production. We will name the average rating of the listeners NAT.
(ii) The test sentence was presented in isolation. The listeners had to classify the sentence as question, assertion, imperative, exclamation, or optative. We will name the percentage of classifications as question MOD.
(iii) The test sentence was again presented in isolation. The listeners had to decide which of the phrases carried the Fa. If fa_i is the number of listeners who perceived the ith phrase as most stressed, then

\[ FOK = \frac{fa_2 - fa_3}{fa_1 + fa_2 + fa_3} \]

takes on values between 1 (all listeners perceived the 2nd phrase as stressed) and -1 (all listeners perceived the 3rd phrase as stressed).

STATISTICAL EVALUATION OF THE EXTRACTED FEATURES

'Best' transformations

Each of the intonational features was used as a predictor variable in the discriminant analysis to predict sentence modality (Q/NQ) and (position of the) Fa. Because of the combinatorial explosion the optimal feature combination had to be determined heuristically: the predictors entered the analysis separately and (if the feature was calculated for the 2nd and 3rd phrase) together with the corresponding variable for the other phrase. Several transformations for each variable were tested. In order to reduce the necessary amount of computation all cases were used both for learning and testing with learn = test (l = t). Throughout this paper, the analyses are based on this constellation, unless another constellation (l5t1 or l1t5, cf. below) is explicitly referred to. The relevant variables under the best transformation were put into multivariate discriminant analyses. We can only present the most important results; for a more detailed discussion see Batliner (1989a). The statistical method is fully described in Klecka (1980) and Norusis (1986). Further applications of this method with respect to the prediction of sentence modality can be found e.g. in Batliner (1988) and Batliner et al. (1989).

Fo

The transformation of the Hz values into semitones did not improve the classification results. A possible explanation could be that semitone transformation 'over' normalizes the different voice ranges of male and female speakers (cf. Batliner et al. 1989). A normalization of the voice register by subtracting a reference value for either the speaker or the utterance resulted in significant improvements in the prediction. In the final analyses we used semitone values and subtracted the basic value of the speaker, i.e. the lowest Fo value produced by the speaker. The transformed maximal and minimal values for the 2nd and 3rd phrase are called max2, max3, min2, and min3. The relative position of the maximal and minimal values on the time axis for the 2nd and 3rd phrase are called pos2 and pos3. These values are positive if the minimal value comes later than the maximal value; they are negative if it is the other way round.
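The Fo transformation described above can be illustrated with a minimal sketch. It assumes the usual definition of a semitone interval, 12 · log2(f / f_ref), with the speaker's lowest Fo value as reference; the function names and the sample values are hypothetical and not taken from the paper.

```python
import math

def semitones_above_base(f0_hz, base_hz):
    """Interval between f0_hz and the speaker's basic value, in semitones."""
    return 12.0 * math.log2(f0_hz / base_hz)

def phrase_f0_features(times_cs, f0_hz, base_hz):
    """Return (max, min, pos) for one phrase.

    times_cs: sample times in centiseconds
    f0_hz:    Fo samples in Hz, same length as times_cs
    pos is the time of the minimum minus the time of the maximum, so it
    is positive when the minimum follows the maximum.
    """
    i_max = max(range(len(f0_hz)), key=f0_hz.__getitem__)
    i_min = min(range(len(f0_hz)), key=f0_hz.__getitem__)
    return (
        semitones_above_base(f0_hz[i_max], base_hz),
        semitones_above_base(f0_hz[i_min], base_hz),
        times_cs[i_min] - times_cs[i_max],
    )

# Hypothetical samples for one phrase; base is the speaker's lowest Fo.
base = 100.0
times = [100, 105, 110, 115, 120]
f0 = [150.0, 200.0, 180.0, 140.0, 100.0]
mx, mn, pos = phrase_f0_features(times, f0, base)
print(mx, mn, pos)
```

With these sample values the peak lies one octave (12 semitones) above the basic value and precedes the minimum, so pos is positive, which corresponds to a falling contour within the phrase.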
Duration

Best prediction was achieved after a normalization of the speaking rate that took into consideration the average duration of that phrase for each speaker (avdur_i) and the average duration of the syllables in the utterance (dur/number of syllables):

\[ \frac{dur_i / avdur_i}{dur / \text{number of syllables}} \]

The transformed duration values for the 2nd and 3rd phrase are called dur2 and dur3. We tested several other formulas. The results did not differ much, as long as the actual duration value was put into relation to some reasonable reference value.

Intensity

The best results were achieved with the maximal energy in the 0-5000 Hz band. Average values, 'sonorant' energy sub-bands, and normalizations with respect to the average energy level of the utterance, or with respect to the different intrinsic energy values of the vowels, produced worse results. The intensity values for the 2nd and 3rd phrase are called int2 and int3.

Discarded transformations

Declination

The phenomenon of declination (the lowering of the Fo curve along the time axis) is well known. Often accents are described as excursions from this (hypothetical base-)line. In that case, an Fo peak later in the utterance need not have the same excursion height as an earlier peak to indicate an accent and/or the Fa. It could be possible for our material as well to base the analysis not on (properly transformed) absolute parameter values but on values that are put into relation to a falling declination line. We therefore computed both an abstract ('neutral') speaker-specific declination line based on NQs with an 'unmarked' declination and a concrete declination line for each utterance as an all-point regression line. The prediction of the Fa based on these values was inconsistent and generally not as good as a prediction based on the values described in the previous section. The reason might be that our computation of the declination line is not the best one. Anyway, there seems to be virtually no agreement on adequate computation (cf. Lieberman 1986; Lieberman et al. 1985; 't Hart 1986; Ladd 1984; and Batliner 1989b: 72). In our opinion, a declination line is therefore still an object of investigation rather than an appropriate reference parameter. (In any case, the discriminant analysis takes into consideration the effect of declination because it is based on the distribution of the parameter values and not only on the absolute values.)

Comparison ratios

The Fo values of the 2nd and 3rd phrase can be put into the analysis separately, or they can be combined into comparison ratios; cf. Taylor & Wales (1987): for the two phrases that could be accented in their Australian English material, they computed three different comparison ratios (a = accented, u = unaccented):
Division ratio = a/u
Subtraction ratio = a - u
Michaelson Contrast ratio = (a - u)/(a + u)
In a multivariate regression analysis, they obtained much better results with the contrast ratio than with the two other ratios; the average values of R2 ('explained variance') are:
contrast ratio: 0.85
subtraction ratio: 0.15
division ratio: 0.29
Unfortunately, Taylor & Wales have not done any analyses with the raw data that could be compared with our data. We computed comparison ratios for our variables as well and put them into regression and discriminant analyses; our results can be summarized as follows:
(i) The contrast ratio was not better than the two other ratios.
(ii) The comparison ratios were not better than the absolute values.
We cannot explain the huge differences between the results of Taylor & Wales and our results in (i); as a consequence, we did not work with comparison ratios, but with the separate parameter values of the 2nd and the 3rd phrase. (Again, the extra information contained in the comparison ratios is taken into consideration by the discriminant analysis because it is based on the joint distribution of the predictor variables of the 2nd and the 3rd phrase.)

Results
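Before the figures are discussed, the two transformations from the preceding sections that involve simple arithmetic can be sketched as follows. The duration normalization is written under the assumption that the phrase duration is divided both by the speaker's average duration for that phrase and by the average syllable duration of the utterance; the three comparison ratios follow the definitions quoted from Taylor & Wales (1987). Function names and sample numbers are hypothetical.

```python
def normalized_duration(dur_i, avdur_i, dur_utt, n_syllables):
    """Phrase duration relative to two reference values.

    dur_i:       duration of phrase i in this utterance (centiseconds)
    avdur_i:     the speaker's average duration of phrase i
    dur_utt:     duration of the whole utterance
    n_syllables: number of syllables in the utterance
    """
    return (dur_i / avdur_i) / (dur_utt / n_syllables)

def comparison_ratios(a, u):
    """Division, subtraction, and contrast ratio of an accented value a
    and an unaccented value u."""
    return {
        "division": a / u,
        "subtraction": a - u,
        "contrast": (a - u) / (a + u),
    }

# Hypothetical values: a 30 cs phrase, speaker average 25 cs,
# utterance of 160 cs with 8 syllables.
print(normalized_duration(30.0, 25.0, 160.0, 8))
# Hypothetical Fo maxima in semitones: accented 12, unaccented 8.
print(comparison_ratios(12.0, 8.0))
```

All three ratios are functions of the same two values, which is in line with the remark above that a discriminant analysis on the joint distribution of the separate values already has access to the information the ratios carry.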
In Figures 1 and 2, per cent correct classifications are displayed if only one variable is used as predictor variable in the (univariate) discriminant analysis. On the abscissa, the different variables are plotted; on the ordinate, the per cent correct classifications. For the Q/NQ classification, duration and intensity are not included, because they always produced results near chance level. For the Fa classification, duration and intensity were used instead of off and reg.

[Figure 1 Per cent correct classifications: questions/non-questions. Predictors (univariate, l = t): off, reg, max2, max3, min2, min3, pos2, pos3.]

[Figure 2 Per cent correct classifications: Fa. Groups: FaAll, FaQ, FaNQ. Predictors (univariate, l = t): max2, max3, min2, min3, pos2, pos3, dur2, dur3, int2, int3.]

For Q/NQ (Figure 1), most of the variables are relevant, the most relevant ones being off, max3, and then reg and min3. (Of course, most of these variables are more or less correlated with each other; cf. Batliner 1989a: 37 ff.) If one tries to predict the Fa and does not separate Qs and NQs (FaAll in Figure 2), the results are not very convincing; a separation of Qs and NQs yields better results. The most relevant variables are max3 and dur3 for NQs, and max2 and pos2 for Qs. Besides l = t (learn = test), multivariate analyses with two further learn and test constellations were conducted (Figure 3):
(i) Learning sample: 5 speakers; test sample: 1 speaker (simulation of speaker independence: l5t1). This is the most relevant constellation for a speaker-independent automatic speech understanding system.
(ii) Learning sample: 1 speaker; test sample: 5 speakers (generalization from a single speaker to the other speakers: l1t5).

[Figure 3 Per cent correct classifications. Predictors (multivariate). Groups: Q/NQ, FaAll, FaQ + FaNQ, FaQ, FaNQ.]

All the univariate discriminant analyses were done with l = t. If we look at the corresponding multivariate analysis (all the variables are put at the same time into the analysis; l = t in Figure 3), the classification is very good (always well above 90 per cent), best for Q/NQ; as for the Fa, the separation of Qs and NQs (FaQ and FaNQ and the weighted mean of these two groups FaQ + FaNQ in Figure 3) produces better results than an analysis with no separation of Qs and NQs (FaAll), especially for l5t1.

Figure 4 shows the correlation of the predictors with the discriminant function in a multivariate analysis for l = t. The greater the correlation, the more relevant is the predictor. For the impact of the predictor on the assignment of the Fa, the signs are irrelevant. Ceteris paribus, a positive value indicates rather Fa on the 2nd phrase, and a negative value rather Fa on the 3rd phrase. (In our case, this procedure is more appropriate than the discriminant function of the predictors, as some of the variables are correlated with each other, cf. Klecka 1980: 33 f.) The different relevancy of e.g. max2, max3, min3, pos2, and pos3 for Qs and NQs shows up clearly.

[Figure 4 Correlation: predictors with discriminant function. Predictors (multivariate, l = t): max2, max3, min2, min3, pos2, pos3, dur2, dur3, int2, int3. Ordinate: correlation in per cent, from -60 to 60.]

Generally, the results indicate that in Qs, other intonational parameters are used to mark the Fa or the same parameters are used in a different way than in NQs. The prediction is worse if Qs and NQs are analysed together than if they are treated separately. Fa is classified better in NQs than in Qs. The explanation might be that in Qs the same parameters are used to indicate sentence modality as well as place of Fa; cf. especially the variable height of the Fo offset. There are therefore more degrees of freedom in Qs and consequently more possible confusions. The results under FaQ and FaNQ were achieved with a grouping into Qs and NQs 'by hand'. For l = t the grouping of the Q/NQ-classifier was used as an input to the FaQ- and FaNQ-classifier as well. The classification errors of the first step even improved the results (cf. the error analysis below).
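The three learn and test constellations can be made concrete with a small sketch. It is not the multivariate discriminant analysis used in the paper: it substitutes a univariate two-class rule that, assuming equal priors and equal class variances, assigns a case to the class with the nearest mean. The averaging over speakers in l5t1 and l1t5, the function names, and the toy data (five similar speakers and one deviant speaker) are likewise assumptions made for illustration.

```python
from statistics import mean

def fit_nearest_mean(xs, labels):
    """Learn one mean per class from a single predictor."""
    classes = sorted(set(labels))
    means = [mean(x for x, y in zip(xs, labels) if y == c) for c in classes]
    return classes, means

def predict(model, x):
    classes, means = model
    # Assign x to the class whose mean is nearest.
    return min(zip(means, classes), key=lambda mc: abs(x - mc[0]))[1]

def percent_correct(model, xs, labels):
    hits = sum(predict(model, x) == y for x, y in zip(xs, labels))
    return 100.0 * hits / len(xs)

def evaluate(data, constellation):
    """data: list of (speaker, value, label) triples.

    'l=t'  : learn and test on all cases
    'l5t1' : learn on all other speakers, test on the held-out speaker
    'l1t5' : learn on one speaker, test on all other speakers
    For the last two, the mean over all speakers is returned.
    """
    if constellation == "l=t":
        xs = [x for _, x, _ in data]
        ys = [y for _, _, y in data]
        return percent_correct(fit_nearest_mean(xs, ys), xs, ys)
    scores = []
    for s in sorted({sp for sp, _, _ in data}):
        own = [(x, y) for sp, x, y in data if sp == s]
        rest = [(x, y) for sp, x, y in data if sp != s]
        learn, test = (rest, own) if constellation == "l5t1" else (own, rest)
        model = fit_nearest_mean(*zip(*learn))
        scores.append(percent_correct(model, *zip(*test)))
    return mean(scores)

# Toy data: hypothetical max2 values; S6 marks the Fa differently.
data = []
for k in range(1, 6):
    data += [(f"S{k}", 10.0, "Fa2"), (f"S{k}", 4.0, "Fa3")]
data += [("S6", 6.0, "Fa2"), ("S6", 8.0, "Fa3")]

for c in ("l=t", "l5t1", "l1t5"):
    print(c, round(evaluate(data, c), 1))
```

In this toy setting a single deviant speaker is enough to make l1t5 clearly worse than l5t1, because a model learned from that speaker alone misclassifies everyone else.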
CENTRAL AND MARGINAL TYPES

We will now show how the two converging strategies (cf. the first section) lead to the central types:
(i) Each of the 4 Q/NQ-Fa constellations has one central type that is characterized by the average values of the predictors.
(ii) We inspected those cases where a strong agreement among the listeners could be observed: practically all the listeners agreed upon the intended Q/NQ grouping, the place of the Fa, and the naturalness of the production (MOD > 80 for Qs and MOD < 20 for NQs, |FOK| = 1, NAT < 2). Twenty-four out of the 360 cases passed these strict criteria. Nineteen cases could be identified as representatives of the central types.

For the four central types, Figures 5-8 show the average feature values as well as the Fo contour of a typical production (four out of the nineteen cases): the dashed vertical line marks the border between the 2nd and the 3rd phrase of the actual production. For the 2nd and 3rd phrase, each of the filled squares shows averages for max2, min2, max3 and min3. The position on the abscissa corresponds to the average position on the time axis in centiseconds starting from the beginning of the utterance; the position on the ordinate corresponds to the average Fo values in semitones above the speaker-specific basic value (st_bas). At the top of each figure, the average beginning point and duration of the 2nd and 3rd phrases are displayed. In the following characterization, the terms 'High', 'Low', and 'boundary tone' (cf. the tone sequence model, e.g. in Pierrehumbert 1980) are used interchangeably with the terms 'rising'/'falling' contour.

[Figure 5 Focus on 2nd phrase, non-question, central type. Average duration based on 146 cases. Abscissa: centiseconds; ordinate: Fo in semitones above the speaker-specific basic value.]

(1) Focus on 2nd phrase, non-question (Figure 5): the contour is falling in both phrases (High Low). Max2 is markedly higher than max3; min2 and min3 do not differ.
[Figure 6 Focus on 2nd phrase, question, central type. Average duration based on 121 cases. Abscissa: centiseconds; plotted averages: max2, min2, max3, min3.]

[Figure 7 Focus on 3rd phrase, non-question, central type. Average duration based on 38 cases. Abscissa: centiseconds; plotted averages: max2, min2, max3, min3.]

[Figure 8 Focus on 3rd phrase, question, central type. Average duration based on 46 cases. Abscissa: centiseconds; plotted averages: max2, min2, max3, min3.]
(2) Focus on 3rd phrase, non-question (Figure 7): the contour is again falling in both phrases (High Low). Max3 is about as high as max2; min2 and min3 do not differ.
Comparing the two types, we can say that the absolute values for the features of the 2nd phrase in Figures 5 and 7 do not differ remarkably. It is rather the relative values of the features in comparison with the respective values of the 3rd phrase that mark the Fa.
(3) Focus on 2nd phrase, question (Figure 6): the contour is rising in both phrases (Low High).
(4) Focus on 3rd phrase, question (Figure 8): in the 2nd phrase, this type has a falling contour comparable to the NQs, whereas in the 3rd phrase, the contour is rising (Low High).
Comparing these two types, we can say that the Fo range of the phrase with the Fa is markedly greater than that of the other phrase. In the final phrase, a rising contour (high boundary tone) is used for both types to mark sentence modality.

The remaining five cases can be grouped into three marginal types which are displayed in Figures 9-11. To demonstrate the deviations from the central types, the respective average values are projected into the contours of the marginal types:
(1) One speaker typically marked Fa in prefinal position with a falling contour (High Low), even in Qs. If one looks at the average feature values for all speakers and for this specific speaker, one could say that this marginal type across speakers is a central type for this speaker (Figure 9).
[Figure 9 Focus on 2nd phrase, question, marginal type. Average duration based on 121 cases. Abscissa: centiseconds.]

[Figure 10 Focus on 2nd phrase, question, marginal type. Average duration based on 121 cases. Abscissa: centiseconds; plotted averages: max2, min2, max3, min3.]
(2) Another speaker typically marked Qs only in the phrase with the Fa; i.e. with Fa in prefinal position, the final phrase showed a falling contour comparable to NQs (Figure 10).
(3) The last marginal type, an NQ with Fa on 3rd phrase, could approximately be described as a 'hat-contour' (cf. Cohen & 't Hart 1967), i.e. a concatenation of the two Fo-peaks on the 2nd and 3rd phrase and a low Fo-value at the end of the utterance (Figure 11).
[Figure 11 Focus on 3rd phrase, non-question, marginal type. Abscissa: centiseconds; plotted averages: max2, min2, max3, min3.]
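The listener-based selection used in this section (the index FOK from the perception experiments together with the thresholds on MOD and NAT) can be sketched as follows. The function names and the listener counts in the example are hypothetical; the thresholds are the ones stated in the text.

```python
def fok(fa1, fa2, fa3):
    """FOK = (fa2 - fa3) / (fa1 + fa2 + fa3).

    fa1..fa3 are the numbers of listeners who perceived the 1st, 2nd,
    or 3rd phrase as most stressed. The result is 1 if all listeners
    chose the 2nd phrase and -1 if all chose the 3rd phrase.
    """
    return (fa2 - fa3) / (fa1 + fa2 + fa3)

def passes_strict_criteria(mod, fok_value, nat, intended_question):
    """MOD > 80 for Qs, MOD < 20 for NQs, |FOK| = 1, NAT < 2."""
    mod_ok = mod > 80 if intended_question else mod < 20
    return mod_ok and abs(fok_value) == 1 and nat < 2

# Hypothetical judgements of twelve listeners for three utterances.
print(fok(0, 12, 0))   # unanimous: 2nd phrase
print(fok(0, 0, 12))   # unanimous: 3rd phrase
print(fok(2, 7, 3))    # split judgement

print(passes_strict_criteria(92, fok(0, 12, 0), 1.4, True))
print(passes_strict_criteria(92, fok(2, 7, 3), 1.4, True))
```

Because the criterion |FOK| = 1 requires unanimity, a single dissenting listener is enough to exclude an utterance, which is consistent with only 24 of the 360 cases passing.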
ERROR ANALYSIS

For l = t, there are 27 misclassifications, 10 for Q/NQ and 19 for Fa (i.e. two double misclassifications).

Question/non-question

In all the 10 cases, Qs are misclassified as NQs. Eight cases are clearly misproductions, as they are not classified as Qs in the perception experiment (cf. (ii) in the section 'Perception experiments' above), and 9 got very low NAT scores, i.e. they were judged as unnatural productions as well. This also explains the fact mentioned above that the classification errors of Q/NQ improved the results of the Fa classification: the items in question were misproduced as NQs, and the position of the Fa could therefore be classified correctly because the Fa had the intonational shape of an NQ.

Focal accent

In all but 2 cases, there are indications that the Fa-assignment in production and/or perception is not clear-cut: in 11 cases, there is no agreement between the results of the perception experiment FOK (cf. (iii) in 'Perception experiments' above) and another experiment, where listeners only had to decide upon the place of the sentence accent (cf. Batliner 1989a: 30, 65 ff.). In 12 cases, there is a very weak agreement between the subjects in the perception experiment (|FOK| < .4). In 7 cases, the probability of group membership in the discriminant analysis is near 50 per cent (possibly because of a violation of a necessary assumption). To sum up the error analysis with respect to the placement of the Fa: this is not an easy task (cf. e.g. Lieberman 1965 and Lickey & Waibel 1985), neither for the native speaker/listener nor for the discriminant analysis. We are on safe ground when we conclude that the misclassifications did not occur because our statistical model was inadequate, but because of the inherent difficulty of placing the Fa.
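The bookkeeping behind the error counts at the start of this section can be checked with a few lines of arithmetic. The sketch assumes that both classifications (Q/NQ and Fa) were run over all 360 utterances; the variable names are hypothetical.

```python
TOTAL = 360          # utterances in the corpus
q_errors = 10        # Q/NQ misclassifications for l = t
fa_errors = 19       # Fa misclassifications for l = t
double_errors = 2    # utterances misclassified in both respects

# Utterances with at least one error (inclusion-exclusion).
utterances_with_error = q_errors + fa_errors - double_errors

# Per cent correct, assuming each task covers all utterances.
pct_q = 100.0 * (TOTAL - q_errors) / TOTAL
pct_fa = 100.0 * (TOTAL - fa_errors) / TOTAL

print(utterances_with_error, round(pct_q, 1), round(pct_fa, 1))
```

Under that assumption the count of 27 affected utterances follows directly, and both classification rates lie well above 90 per cent.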
SPEAKER-SPECIFIC USE OF THE VARIABLES

In Figures 5-11, we have seen that the production of the four different Q/NQ-Fa constellations is not uniform across speakers. In Figure 3, it can be seen that l1t5, i.e. the generalization of one speaker to the other five speakers, yields considerably worse results for the prediction of focus than l5t1 (not for the prediction of Q/NQ, by the way). It is therefore very likely that different speakers use the predictor variables in a different way. This fact is illustrated in Figures 12-14, where the correlations between each predictor variable and the discriminant function are plotted for each speaker (S1-S6) separately. The higher the correlation, the more important is the variable; the signs are irrelevant. A positive value indicates rather Q (Figure 12) or Fa on the 2nd phrase (Figures 13, 14), and a negative value rather NQ (Figure 12) or Fa on the 3rd phrase (Figures 13, 14). If the bars had roughly the same height, all the speakers would use the parameter under consideration in the same way. Of course, a certain variability is normal; some of the differences might as well be traced back to automatic (physiological) processes or to co-variation with another variable. A clear-cut difference, however, can indicate an active process: the speaker uses different parameters or the same parameters in a different way. A more detailed discussion of the speaker-specific use of the parameters can be found in Batliner (1989a: 55 ff.). We will just mention some of the most striking differences:
(i) For S2-S6, off is very relevant for the marking of Q/NQ, but not for S1 (Figure 12). S1 produces Qs with Fa on the 2nd phrase regularly with a falling contour (cf. the marginal type in Figure 10).
(ii) For the Fa assignment in Qs, pos3 is much more important for S1 than for the other speakers (Figure 13). In that case, pos3 co-varies with the height of the offset, cf. Figure 10.
(iii) S6 uses pos2 for the Fa assignment in NQs, but not S1-S5, cf. Figure 14 and the 'hat-contour' in Figure 11 that was produced by S6.
(iv) Duration and intensity are used differently by the speakers, cf. Figures 13 and 14. Although these differences might be caused by automatic processes to a certain extent, we will show in the following that these two parameters contribute to the marking of focus in their own way.
[Figure 12 Intra-speaker correlations: predictors with discriminant function: Q/NQ. Predictors (multivariate, l = t): off, reg, max2, max3, min2, min3, pos2, pos3. Ordinate: correlation in per cent.]

[Figure 13 Intra-speaker correlations: predictors with discriminant function: FaQ. Predictors (multivariate, l = t).]

[Figure 14 Intra-speaker correlations: predictors with discriminant function: FaNQ. Predictors (multivariate, l = t).]
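The quantity plotted in Figures 12-14, the correlation between a predictor and the discriminant function, can be sketched for one speaker as follows. The sketch assumes that the discriminant function is a weighted sum of the predictors; the coefficients, the predictor values, and the function names are hypothetical and not taken from the paper's analyses.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equally long value lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def structure_correlations(cases, weights):
    """Correlate every predictor with the discriminant scores.

    cases:   list of dicts mapping predictor name -> value
    weights: dict mapping predictor name -> coefficient (assumed given)
    """
    scores = [sum(weights[p] * c[p] for p in weights) for c in cases]
    return {p: pearson([c[p] for c in cases], scores) for p in weights}

# Hypothetical cases of one speaker (max2 in semitones, dur2 normalized).
cases_s1 = [
    {"max2": 12.0, "dur2": 1.2},
    {"max2": 10.0, "dur2": 1.1},
    {"max2": 5.0, "dur2": 0.9},
    {"max2": 4.0, "dur2": 1.0},
]
weights = {"max2": 1.0, "dur2": 0.0}  # hypothetical coefficients

for name, r in structure_correlations(cases_s1, weights).items():
    print(name, round(100 * r))
```

Even with a zero coefficient, dur2 correlates strongly with the scores here because it co-varies with max2. This is why correlations with the discriminant function, rather than the coefficients themselves, are informative when predictors are correlated with each other.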
WHICH VARIABLES ARE THE RELEVANT ONES?
Coming back to the title of this paper, 'Deciding upon the relevancy ...', it turned out that some transformations of the variables considerably improved the classification. We have not yet shown whether some variables might be irrelevant; candidates are of course intensity and/or duration. In Figure 15, the percentages of correct classification are plotted for l=t and l5t1 if we exclude, stepwise, (i) intensity, (ii) intensity and duration, and (iii) intensity, duration and position. It can be seen that the prediction gets worse. (The only exception to this step function is FaQ with no intensity and no duration. The reason might be that int2, int3, and dur2 are rather irrelevant for Qs, cf. Figure 2.) In this range, a difference of 2 per cent, for example (about 7 cases out of 360), is not a small difference if one considers the (informal) '80/20 rule': it takes 20 per cent of the effort to get 80 per cent of the results, but the remaining 20 per cent require 80 per cent of the effort. Note that, generally, the classification gets worse if an additional, irrelevant predictor variable is put into the analysis. In our case, the classification gets better if more variables are added; therefore, duration and intensity might be of minor importance, but they cannot be irrelevant. In other words, a purely tonal model that does not take these two parameters into consideration can say quite a lot about the placement of the Fa, but it does not tell the whole story of the marking of focus by intonational means.

Figure 15 Per cent correct classifications for the constellations FaQ (l=t), FaNQ (l=t), FaQ (l5t1), and FaNQ (l5t1). Legend: no int; no int, dur; no int, dur, pos.

CONCLUSION
The purpose of this study was to find out how focus is marked intonationally in German. We have shown that all three intonational parameters are used for this task (in order of importance: Fo, duration, and intensity). Speaker-specific or utterance-specific transformations of the features improved their relevancy. Using two different approaches, a statistical and a 'psychological' one (average values and perception experiments), we arrived at central (mostly used) and marginal (rare but acceptable) types. The results indicate that the focal accent is marked differently in questions and non-questions. Speaker-specific ways of using the intonational parameters for the marking of focus were observed. Generally, the focus could be predicted with a high probability (up to 96 per cent), depending on the chosen constellation and/or transformation.

Acknowledgements
This research was financed by the Deutsche Forschungsgemeinschaft (DFG). It was carried out in close cooperation with E. Nöth (University of Erlangen). Parts of this paper were published in Batliner & Nöth (1989); a more detailed presentation can be found in Batliner (1989a); cf. also Batliner et al. (1990).

ANTON BATLINER
Institut für Deutsche Philologie
Universität München
Schellingstr. 3
8 München 40
FRG
REFERENCES
Altmann, H. (1987), 'Zur Problematik der Konstitution von Satzmodi als Formtypen', in J. Meibauer (ed.), Satzmodus zwischen Grammatik und Pragmatik, Niemeyer, Tübingen, 22-56.
Altmann, H. (ed.) (1988), Intonationsforschungen, Niemeyer, Tübingen.
Altmann, H., A. Batliner & W. Oppenrieder (eds) (1989), Zur Intonation von Modus und Fokus im Deutschen, Niemeyer, Tübingen.
Batliner, A. (1988), 'Produktion und Prädiktion: Die Rolle intonatorischer und anderer Merkmale bei der Bestimmung des Satzmodus', in H. Altmann (ed.), 207-21.
Batliner, A. (1989a), 'Fokus, Modus und die große Zahl: Zur intonatorischen Indizierung des Fokus im Deutschen', in Altmann, Batliner & Oppenrieder (eds), 21-70.
Batliner, A. (1989b), 'Fokus, Deklination und Wendepunkt', in Altmann, Batliner & Oppenrieder (eds), 71-85.
Batliner, A. & E. Nöth (1989), 'The prediction of focus', Proceedings of the European Conference on Speech Communication and Technology, Paris, 26-28 September 1989, 210-13.
Batliner, A., E. Nöth, R. Lang & G. Stallwitz (1989), 'Zur Klassifikation von Fragen und Nicht-Fragen anhand intonatorischer Merkmale', Fortschritte der Akustik - DAGA '89, Bad Honnef: DPG-GmbH, 335-8.
Batliner, A. & W. Oppenrieder (1989), 'Korpora und Auswertung', in Altmann, Batliner & Oppenrieder (eds), 281-331.
Batliner, A., W. Oppenrieder, E. Nöth & G. Stallwitz (1990), '"Neue Information" im Sprachsignal. Die prosodische Markierung der Fokusstruktur', Fortschritte der Akustik - DAGA '90, DPG GmbH, Bad Honnef, 1059-62.
Cohen, A. & J. 't Hart (1967), 'On the anatomy of intonation', Lingua, 19: 177-92.
't Hart, J. (1986), 'Declination has not been defeated: a reply to Lieberman et al.', J. Acoust. Soc. Am., 80: 1838-40.
Klecka, W. R. (1980), 'Discriminant analysis', Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-019, Sage Publications, Beverly Hills and London.
Ladd, D. R. (1984), 'Declination: a review and some hypotheses', in J. C. Ewen & J. M. Anderson (eds), Phonology Yearbook, 1, 53-74.
Lickey, S. A. & A. Waibel (1985), 'Perceptual stress assignments', in A. Waibel (1986), Prosody and Speech Recognition, Carnegie Mellon University, Computer Science Department, 192-8.
Lieberman, P. (1965), 'On the acoustic basis of the perception of intonation by linguists', Word, 21: 40-54.
Lieberman, P. (1986), 'Alice in declinationland: a reply to Johan 't Hart', J. Acoust. Soc. Am., 80: 1840-2.
Lieberman, P., W. Katz, A. Jongman, R. Zimmerman & M. Miller (1985), 'Measures of the sentence intonation of read and spontaneous speech in American English', J. Acoust. Soc. Am., 77, 649-57.
Norusis, M. J. (1986), SPSS/PC+ Advanced Statistics, Chicago: SPSS Inc.
Nöth, E. (1991), Prosodische Information in der automatischen Spracherkennung - Berechnung und Anwendung, Niemeyer, Tübingen.
Oppenrieder, W. (1988), 'Intonation und Identifikation: Kategorisierungstests zur kontextfreien Identifikation von Satzmodi', in H. Altmann (ed.), 153-67.
Oppenrieder, W. (1989), 'Fokus, Fokusprojektion und ihre intonatorische Kennzeichnung', in Altmann, Batliner & Oppenrieder (eds), 267-80.
Pierrehumbert, J. B. (1980), 'The phonology and phonetics of English intonation', Ph.D. Dissertation, MIT.
Taylor, S. & R. Wales (1987), 'Primitive mechanisms of accent perception', Journal of Phonetics, 15: 235-46.
Journal of Semantics 8: 191-218
© N.I.S. Foundation (1991)
Automatic Recognition of Focus Accents in German
ROBERT BANNERT
Fraunhofer Institute for Industrial Engineering, Stuttgart

Abstract
This paper presents a theoretical foundation of German intonation. It describes the automatic recognition of focus accents and specifies a recognition algorithm. Results of the analyses of complex utterances concerning their tonal characteristics are presented. This work is based on an acoustic-phonetic model of generating the Fo-contour in German. This generative model, which contains phonetic and phonological rules, is modified in order to recognize focal accents.

THE GENERAL FRAMEWORK
The present research is carried out in a remarkable and promising context. Through an interdisciplinary research group it has been possible to realize a dream of speech processing: to unite semantics, syntax, speech recognition and the phonetics of intonation. A serious attempt is made to outline a model for the evaluation of focus intonation in spoken dialogue. Semantic focus and its projection into the speech signal, the focus accents, are the common goals of interest.

Focussing as coherence of text
A typical feature of natural language is the coherence of texts, whether written or spoken. To be sure, the insight that natural-language texts are not merely a concatenation of grammatically permitted sentences became part of common linguistic knowledge relatively late. The natural coherence of texts is achieved by a variety of linguistic and phonetic means. Among these means of expression, the prominence of certain words (contents, semantic parts), the so-called focus, as it appears in texts, is a very important means of clearly marking the connection of semantic parts in consecutive sentences. For the sake of illustration, the focused words in the following fragment of text are shown in capital letters:

Haben Sie schon geHÖRT? Der alte STRACHSCHEWSKI ist gestorben. Ach ... JaJA. Soll wohl ganz PLÖTZLICH gekomm' sein. Im HERBST hat er ja noch die APFELernte gebracht, aber da war er wohl schon sehr KRANK.
(Have you HEARD already? Old STRACHSCHEWSKI has died. Oh YES. It must have happened quite SUDDENLY. Last AUTUMN he managed APPLE picking, but then he must have been very ILL already.)

Focusing different words would destroy the semantic and pragmatic message of the text, so that its true coherence would be lost. To our knowledge, the tonal information contained in focus accents, which represents a direct link between semantics and phonetics, has not been exploited in speech recognition up to now. In this respect, too, our approach amounts to true innovation and pioneering effort.

Focus and focus accent
It is generally known that in a syntactically defined sentence there may be one focus or several foci, depending on the number of semantic parts or elements of the proposition that are to be made prominent to the listener. These semantically important and prominent parts of texts are marked and signalled in special ways by certain phonetic means. Due to this special signalling, the semantically important parts stand out clearly in the speech signal and are easily perceived by the listener. Today we know that semantic focus in German, among other languages, is transmitted acoustically, above all, by prosodic means and, in turn, especially by fundamental frequency (Fo). It is through this relationship that the direct connection between phonetics, i.e. the speech signal and the hearer, and semantics, i.e. the message and the speaker, is established. The projection of the semantic focus on to the basic lexical parts ('words') will be termed focus accent. This is illustrated as follows:

Speaker A: I believe that JOHN loves MARY.
Speaker B: No, he loves SUSAN.

In the second sentence, 'Susan' is focused and clearly marked by the focus accent which, in this case, is manifested primarily by tonal means. Up to now, the phonetic and linguistic literature shows no clarity or uniformity of opinion on this phenomenon and its interlinguistic connections with respect to German. Other expressions used in the literature to denote this focus projection on to the acoustic level of the speech signal are word accent (Wortakzent), main accent (Hauptakzent), and sentence accent (Satzakzent) (cf. e.g. von Essen 1956; Thorsen 1982; Bannert 1985b). The acoustic correlate of a focus accent in German is a clear and marked change of Fo which is clearly audible. The Fo-change is also clearly visible in an Fo-analysis of the speech signal and an appropriate plot of the Fo-curve. This tonal change may look different and may vary to different degrees. In auditory terms there may be a tonal jump up or down. Acoustically this amounts to an Fo-rise or fall of varying degree. The amount of the Fo-change from a minimum to a maximum and from a maximum to a minimum may vary accordingly. However, in order for the listener to be able to perceive a focus accent, the tonal change has to pass a certain threshold value. As is already known, this value cannot be set in absolute terms but depends on the absolute range of Fo-variation of the speaker. Bearing this psycho-acoustic fact in mind, focus accents in German show a rather uniform picture as far as their tonal patterns and even their temporal extension are concerned. This statement is based on a large amount of data on German intonation that was produced by a great number of speakers from different regions of West Germany and that comprised a great amount of material in which different parameters had been varied. It should also be noted in this connection that focus accents interfere in various ways with other tonal features of the language, namely the word accent and the phrase boundary.

Integration of focus recognition, speech recognition and spoken dialogue
The present research on automatic recognition of focus accents is carried out in a new and very promising framework. It is part of the project 'Model Formation for the Evaluation of Focus Intonation in Spoken Dialogue'. Apart from the recognition of focus accents, the project also contains the semantic component of dialogue handling and the acoustic component of speech recognition. Figure 1 presents a simplified diagram of the structure of the project work. A spoken sentence (an utterance), produced by the dialogue component, is analysed and processed twice: by the speech recognition component according to, above all, spectral criteria, and by the intonation component according to the purely tonal information encoded in Fo. The intonation component provides the speech recognition component with information on which word or words carry a focus accent. The speech recognition component, in turn, passes recognized sentences, including the focus markings, on to the semantic dialogue component. Finally, the dialogue component uses this feedback from the two phonetic components when carrying out its dialogue processing.

Aim
The present contribution deals exclusively with the intonation part of the project. After this short presentation of the general framework of our work, the
theoretical foundations of German intonation and the recognition of focus accents are outlined. Then the algorithm for the recognition of focus accents is described, and finally results concerning the tonal characteristics of prosodically complex utterances in German are presented for the first time.

Figure 1 Schematic and simplified representation of project structure. Components: DIALOGUE HANDLING (semantic, syntactic, lexical, pragmatic); INTONATION (Fo-analysis and processing, recognition of focus accents); SPEECH RECOGNITION (spectral analysis, Hidden Markov Models).

THEORETICAL FOUNDATIONS
In order to accomplish our aim successfully, certain theoretical foundations have to be presupposed. Some years ago an acoustic-phonetic model of generating the Fo of German utterances, the first one of its kind, was proposed (Bannert 1983, 1985a, b). Based on this intonation model, theoretical considerations concerning the principles of recognition of focus accent are worked out.

A model of German intonation
As far as German is concerned, the first attempt to design a generative model of intonation based on acoustic data was made by Bannert (1983, 1985a, b). This model, innovative as it was for German, was built taking into consideration the basic structure of the intonation models already existing for some other languages, e.g. Dutch ('t Hart & Cohen 1973), Swedish (Bruce 1977), and Danish (Thorsen 1978). It should be obvious that this first attempt, quite naturally, had to be limited in various respects. Among other things, it was restricted to simple prosodic phrases (intonational units) and to two accent types. However, basic linguistic and phonetic parameters were included, e.g. type of utterance (statement, information question, echo question), number of word accents in the phrase (1 to 8), involvement (normal/neutral intonation, emphasis), and syntactic structure (distribution of syntactic boundaries). Thus a wide range of significant tonal variation was covered. A rather extensive body of speech material spoken by four North German speakers was investigated. One major finding of this investigation was the systematic character of German intonation, which is illustrated in Figure 2. The utterance type statement is terminated by a low boundary tone, and that of question is signalled by a final high tone. Echo questions show a larger tonal range. It has to be pointed out especially that this increase of tonal range is achieved by lifting the Fo-maxima. As can be seen in Figure 2, the tonal floor defined by the accent minima remains rather constant. Word accents appear as tonal movements (rises and falls) superimposed on the tonal floor. However, there is one exception. The final word accent of statements is manifested as a tonal fall throughout the accented vowel. All these observations form the basis for the intonation algorithm which, in five steps, is capable of generating the Fo-contour of any prosodically simple utterance. The working of the tonal algorithm is shown in Figure 3. It should be noted that this model contains neither a base line nor a top line. As another positive feature, it captures the dynamic nature of the word accent, i.e. the tonal movement caused by the accent. The amount of the tonal accent movement (VG) is expressed by the general formula

$$VG = (m \cdot v + w + k + j + \dots + n)\, Fo_{\min}$$

It takes into account the contribution of several different factors and relates them to the absolute low fundamental frequency Fo (or tone) at the end of statements of a given speaker (Bannert 1985: 335).

Figure 4 illustrates the behaviour of the sentence accent, i.e. the final word accent of prosodically simple utterances. The sentence 'Der französische König war ein launischer Geselle' (The French king was a humorous guy) was produced as a statement, echo question, and information question. In the test sentences, the distribution of the sentence accent was varied across the four main word positions of the sentence in all three utterance types. In the bundles of Fo-curves, the common tonal feature of sentence accent stands out very clearly. After the sentence accent, the Fo-curve continues smoothly without any marked movement towards the high or low end of the contour.
Figure 2 Normalized and superimposed Fo-contours of four utterances containing 1, 2, 3, and 8 accents of speaker B. The line-up point here is the end of the tonal contour, which brings out the systematic structure of the contours (from Bannert 1985a). Panels: statement, echo question, information question; vertical axis: Fo in Hz.
Figure 3 The intonation algorithm illustrating the generation of the tonal structure for a sentence with 4 accents (1, 2, 3, 4) as statement (left) and information question (right) (from Bannert 1985a). Legend: accentuated vowels, unaccentuated vowels.
Figure 4 Superimposed Fo-curves of focus in statement, information question, and echo question. The focused word in each version is underlined with the symbol of the curve (from Bannert 1985b). Panels: STATEMENT and ECHO QUESTION ('Der französische König war ein launischer Geselle'), INFORMATION QUESTION ('War der französische König ein launischer Geselle?').
Starting out from these facts, an intonation model was developed that contains only a few phonological tonal features. These tonal features and their acoustic manifestation are shown in Table 1 (cf. Bannert 1985b). Their tonal specification can be expressed acoustically as Fo-value, and auditorily as semitone value. In the meantime, it has become evident that the model has to be supplemented with a few features. This concerns, in principle, the number of word accents. In my preliminary version of the intonation model, only two word accent types are listed: the high accent (peak accent) and the falling accent. The latter represents a special case; it is the final word accent in statements. Perhaps two more accent types have to be recognized, namely the low accent and the bridge accent (the hat pattern of Dutch intonation, cf. 't Hart and Cohen 1973; cf. also Wunderlich 1988).

Table 1 Linguistically motivated tonal features in German and their main acoustic manifestation

| Tonal feature | Short description and function | Tonal specification (point or level, range) | Acoustic manifestation |
|---|---|---|---|
| + ACCENT | stressing of the most salient syllable of the word | high | falling tonal movement (from high to low) during the final accented syllable in intonation type + COMPLETED (= statement) |
| | | low | rising tonal movement (from low in the pre-accented consonant to high in the post-accented consonant) during the accented syllable in all other cases |
| + PHRASE BOUNDARY | internal, medial phrase boundary, end of non-utterance-final phrase | high | high tone (tonal rise) at the end of non-utterance-final phrases |
| + COMPLETED | boundary feature, terminal juncture, end of utterance and intonation type statement | low | absolute low tone at the end of an utterance |
| - COMPLETED | boundary feature, terminal juncture, end of utterance and intonation type question | high | high tone at the end of an utterance |
| + CONTRAST | special, paradigmatic prominence. Domain: one lexical element (accent) | wide | enlarged tonal movement by raising the Fo-maximum of a given accent |
| + EMPHASIS | special, syntagmatic prominence of a phrase. Domain: the whole phrase/utterance | wide | enlarged tonal movement by raising the Fo-maxima of all accents of a phrase |

The large range of possible variations of the intonation model may be illustrated by the following example. In the prosodically simple sentence

Johannes liebt Susanne. (John loves Susan.)

taken as the utterance type statement, the following features are varied:
(1) type of word accent: peak accent/bridge accent
(2) form of word accent: high accent/low accent
(3) engagement: normal (neutral)/emphatic

Out of all the possible versions, 17 intonation patterns are chosen and shown schematically in Figure 5. A normal (neutral) accent is symbolized in orthography by ', and an emphatic accent by " over the accented vowel. The symbol T marks the low accent. Otherwise the peak accents are realized as high accents. A possible declination of the Fo-contour throughout the utterance is not taken into consideration in this schematic representation.

Figure 5 Variation of word accent types and engagement in the sentence 'Johannes liebt Susanne' (John loves Susan). Schematized Fo-curves of the 17 intonation patterns.

Concepts of focus accent recognition
The intonation model outlined above contains phonological and phonetic (tonal and temporal) rules that are capable of generating the Fo-contour of a given utterance using linguistic information. Word accents are characterized by marked tonal movements. In what way will it be possible to use the generative intonation model in order to recognize word accents? The basic idea for recognizing word accents must be to reverse the rules that generate them. Starting out from the speech signal and the analysed Fo-curve, one has to search for those tonal movements that are caused by word accents. All other kinds of information contained in the Fo-curve have to be discarded. Even if this strategy seems straightforward, some considerable obstacles must be overcome before the pure Fo-movement corresponding to word accents is obtained. This is due to the problems inherent in the speech signal and its defective acoustic analysis into Fo. These problems will be considered now.

Phonetic-acoustic aspects of word accent recognition
Due to speech production, among other factors, the speech signal contains voiced and voiceless segments. Thus the fundamental frequency Fo is only to be found at those instances in the speech signal when the vocal folds are vibrating during sound production. However, the listener perceives the intonation of an utterance as an uninterrupted whole even if parts of the Fo-curve are missing. This is demonstrated by contrasting the following sentences.

(1) Die Moni will malen lernen. (Moni wants to learn painting.)
(2) Der Papi packt den Pickel. (Daddy grabs the pick.)

After Fo-analysis of utterance (1), a complete Fo-curve is obtained (in most cases). However, the Fo-curve belonging to sentence (2) is interrupted at six places that correspond to the voiceless consonants or consonant clusters. In spite of this, the intonation of sentence (2) does not sound incomplete. When we acquired our first language, we learned to interpolate these gaps in the Fo. Furthermore, an analysed Fo-curve contains yet other features that are superimposed on to the basic Fo-curve as a consequence of sound production. This is the phenomenon of so-called micro-intonation, which is caused by segmental features other than voice and which definitely has to be seen as a distortion of the original, ideal Fo-curve intended by the speaker. It is generally known that every sound will affect the frequency of the vibrations of the vocal folds, and thus Fo proper, to a different but characteristic degree. This disturbance of the basic Fo-course shows up especially clearly where it precedes and follows those gaps in the Fo that are caused by voiceless consonants. However, even fully voiced consonants, above all voiced obstruents like [v, z, j, b, d, g], will change the Fo locally by causing the Fo to fall sharply and then rise abruptly within the domain of this segment. Further incorrect Fo-values that are to be observed in the Fo-curve may be caused by an imperfect Fo-analysis. Thus it happens that Fo-values are indicated superfluously where there cannot be any. This is the case, e.g., in the middle of voiceless consonants, especially the sibilants [s, ʃ], where the Fo-value should be zero. The opposite case is also to be found, when Fo-values are missing in fully voiced segments.

In the traditional processing of Fo-curves the phonetician, thanks to his knowledge and experience, has learned to correct these errors of Fo-analyses and distortions of the Fo-curve visually by comparing it to the complete and undistorted Fo-curve.
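The interruptions of the Fo-curve at voiceless segments, and the idea of interpolating across them, can be made concrete with a small sketch. It is not the program used in this project: the frame values are hypothetical, a frame with Fo = 0 is assumed to mean 'no Fo found', and fitting a parabola to a few voiced frames on either side of a gap is just one possible way of bridging it (the helper names and the `context` parameter are assumptions).

```python
import numpy as np


def find_gaps(f0):
    """Return (start, end) index pairs of runs with f0 == 0 (end exclusive)."""
    voiced = np.asarray(f0) > 0
    gaps, start = [], None
    for i, v in enumerate(voiced):
        if not v and start is None:
            start = i                      # a gap opens here
        elif v and start is not None:
            gaps.append((start, i))        # the gap closes before frame i
            start = None
    if start is not None:                  # gap running to the end of the track
        gaps.append((start, len(voiced)))
    return gaps


def bridge_gap(f0, start, end, context=3):
    """Fill f0[start:end] with a parabola y = a*x^2 + b*x + c fitted to
    `context` frames on each side; those frames are assumed to be voiced."""
    f0 = np.asarray(f0, dtype=float).copy()
    left = np.arange(max(0, start - context), start)
    right = np.arange(end, min(len(f0), end + context))
    x = np.concatenate([left, right])
    a, b, c = np.polyfit(x, f0[x], 2)
    inside = np.arange(start, end)
    f0[start:end] = a * inside**2 + b * inside + c
    return f0


# Hypothetical Fo track in Hz, one value per analysis frame; 0 = no Fo found.
f0 = [120, 125, 130, 0, 0, 0, 128, 122, 116, 0, 0, 110, 105, 100]

print(find_gaps(f0))                       # [(3, 6), (9, 11)]
print(np.round(bridge_gap(f0, 3, 6), 1))   # first gap bridged, second untouched
```

A gap at the very beginning or end of the track has voiced frames on one side only, so a sketch like this would need a different treatment there.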
Robert Bannert 203
A L G O R I T H M FOR R E C O G N I T I O N OF FOCUS ACCENTS From a phonetic and acoustic point of view, the following steps of processing are necessary in order to achieve the recognition of focus accents by means of pure tonal information. Two main components of this processing are distinguished: (1) restoration and simplification of the analysed Fo-curve; (2) focus accent recognition.
Restoration ofthe Fo-curve When analysing Fo from the speech signal, we use an algorithm that carries out a Fast-Fourier-Transform analysis and then processes the local maxima in the frequency range 0-1.25 kHz. Tests have shown that this algorithm works much faster than the API programme of the ILS package. However, the algorithm
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
Considering all these facts, it seems necessary as a first step automatically to correct and complete the Fo-values of the analysis algorithm before the search for the word accent can start. For this step, the following phonetician's strategy will be adopted. The Fo-values of vowels and sonorants will be kept while diose parts of the Fo-curve that are obviously incorrect or missing are corrected or completed respectively, trying to produce a smooth curve. The result should be the pure or intended Fo-curve that carries all the linguistically relevant tonal information. Another method of processing the raw, analysed Fo-curve, namely the stylization of the curve by straight lines, will be used at a later stage of the project. After having corrected and completed the Fo-curve successfully, die focus accents will stand out in the Fo-curve as relatively clear and marked tonal changes in the processed Fo-curve. These changes, due to the focus accents, may be of different kinds; a focus accent may be realized as a peak accent (a Forise immediately followed by a fall) or as a low accent. The fall is made during the pre-accented consonant. A third kind of focus accent is called a bridge accent (the hat pattern in Dutch intonation). This accent type contains two accents, the first one being a rise, the second one a fall. The bridge accent may be considered as a dilated peak accent. The external constraints for our recognition of focus accents are defined as follows. The sentences, prosodically simple or complex, will be spoken with normal, conversational loudness and clear articulation. The recognition of focus accent should be speaker-independent, i.e. it should work for different male and female voices, including different regional varieties of German.
204 Automatic Recognition of Focus Accents in German
makes many errors when the utterance contains voiceless obstruents. These errors of the analysing algorithm represent of course some problems that make the processing of the basic Fo-data much more difficult. It may even happen that no Fo-values are found by the algorithm. This is especially the case at the end of statements which is due to the rather low intensity at the end of utterances. It should be quite obvious that all these cases where the Fo is missing represent very serious problems for the successful restoration of the Fo-curve and the focus accent recognition. The restoration of the Fo-curve or the preprocessing is carried out in five steps:
Focus accent recognition After the preprocessing of the analysed Fo-values, i.e. the correction of the errors of Fo-analysis and of the phonetic distortions of micro-intonation, the search for the focus accents can be started. Focus accents are projected on to the Fo-curve as local and relatively large tonal changes either as rises or falls, superimposed on to the global declination given by the Fo-minima. The recognition of the focus accents is carried out in three steps: 1. Determination of the gross course of the Fo-curve. The tonal changing of the Fo-curve up or down is described and determined. 2. Evaluation of the course of the Fo-curve. It is determined if a given tonal change corresponds to a linguistically relevant movement, i.e. signalling a focus accent. Two conditions have to be fulfilled simultaneously: (a) the tonal movement has to extend over a certain span of time. For our male speaker it amounts to 18 frames (— 115.2 ms).
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
1. Deletion of Fo-values incorrectly indicated by the analysis algorithm. They appear sporadically in larger gaps of the Fo-curve caused by voiceless consonants. 2. Completion of Fo-values not indicated by the analysis algorithm. They are to be found sporadically. 3. Smoothing of Fo-values distorted by certain segmental features. These values are to be found before and after gaps of voiceless consonants and are caused by aerodynamic processes in the production of voiceless obstruents. 4. Straightening of the Fo-curve across the local dip (fall-rise) of voiced obstruents. 5. Bridging of the gaps of voiceless consonants. The concatenation of the Focurve is achieved by connecting the parts of the Fo-curve to the left and to the right of gaps applying a curve function of higher order, namely a parabola of the form y — ax2 + bx + c. Linear interpolation is ruled out because in this case the restoration of the Fo-curve would be too deviating.
Robert Bannert 205
(b) The tonal movement has to exceed a certain threshold value. For our male speaker, the frequency change is ΔFo ≥ 30 Hz. Both conditions reflect the two-dimensional nature of intonation. It is precisely the characteristic feature of word accents that they are easily detectable in the Fo-course of an utterance, standing out as a relatively large tonal change that extends over 1 to 3 syllables. Due to this double nature, word accents are distinguished from the effects of irrelevant factors like micro-intonation and shortcomings of the analysis algorithm.
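The two conditions can be sketched as follows, in Python rather than the original C. The sketch assumes a preprocessed, gap-free contour with one Fo-value per frame, takes a "tonal movement" to be a maximal monotonic rise or fall, and reads the 18-frame figure as a minimum duration; none of these details is spelled out in the text. The function names are hypothetical.

```python
FRAME_MS = 115.2 / 18  # 6.4 ms per frame, derived from the figures in the text


def find_movements(f0):
    """Split a contour into maximal monotonic movements.

    Returns a list of (start_frame, end_frame, delta_hz); flat steps are
    absorbed into the movement that is currently running.
    """
    moves = []
    start = 0
    direction = 0
    for i in range(1, len(f0)):
        step = f0[i] - f0[i - 1]
        d = (step > 0) - (step < 0)
        if d == 0:
            continue
        if direction == 0:
            direction = d
        elif d != direction:
            moves.append((start, i - 1, f0[i - 1] - f0[start]))
            start = i - 1
            direction = d
    if direction != 0:
        moves.append((start, len(f0) - 1, f0[-1] - f0[start]))
    return moves


def focus_candidates(f0, min_frames=18, min_delta_hz=30.0):
    """Keep movements that satisfy both conditions (a) and (b) at once."""
    return [
        (s, e, d)
        for s, e, d in find_movements(f0)
        if (e - s) >= min_frames and abs(d) >= min_delta_hz
    ]
```

The default thresholds are the values quoted for the male speaker; other speakers would presumably need other values.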
These phonetic-acoustic steps are implemented in a program package for the recognition of focus accents. It is written in C and runs on a VAX 11/750 under the VMS 4.3 operating system. Some illustrations of different Fo-contours of the sentence 'Johannes liebt Susanne' (John loves Susan), with the focus accents recognized and marked along the time axis by the recognition algorithm, are shown in Figure 6.
INTONATION IN PROSODICALLY COMPLEX UTTERANCES
In this section, an important addition to the intonation model proposed by Bannert (1985a) will be presented. A large body of prosodically complex utterances spoken by four speakers was analysed and described. To start with, I shall characterize the relationships between large prosodic units and syntax.
Prosodic and syntactic units
Today it is generally acknowledged that semantic, morphological, and syntactic units on the phonological level are divided into hierarchically structured units that are larger than the acoustic segment. Different prosodic structures of utterances that originate from one syntactic sentence depend on various factors, not only linguistic ones but also pragmatic ones such as, among others, speech tempo, register, and
3. Marking of that part of the time axis that corresponds to a focal accent or accents. Thus the tonal phenomenon of focal accent in the Fo-curve is projected on to the time axis, which is shared with the speech wave. This temporal information is passed on to the speech recognition component. This component, in turn, transfers the temporal markings of the focal accent to orthographic words (= focused words), which is the information transmitted to the dialogue component.
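The transfer from marked frames to focused words can be sketched as below. The interface between the components is not described in the text, so the data layout is an assumption: accent spans are given as frame pairs, the speech recognition component is assumed to supply word boundaries in milliseconds, and a word counts as focused if it overlaps a marked span. The function name `focused_words` and the example alignment are hypothetical.

```python
FRAME_MS = 6.4  # 18 frames = 115.2 ms, as stated for the male speaker


def focused_words(accent_spans, word_alignment, frame_ms=FRAME_MS):
    """Return the words whose time interval overlaps a marked accent span.

    accent_spans:   list of (start_frame, end_frame)
    word_alignment: list of (word, start_ms, end_ms), assumed to come from
                    the speech recognition component
    """
    focused = []
    for word, w_start, w_end in word_alignment:
        for s, e in accent_spans:
            a_start, a_end = s * frame_ms, e * frame_ms
            # Two intervals overlap if each starts before the other ends.
            if a_start < w_end and w_start < a_end:
                focused.append(word)
                break
    return focused
```

With an invented alignment such as `[("Johannes", 0, 600), ("liebt", 600, 900), ("Susanne", 900, 1500)]` and a span of frames 150 to 200, only "Susanne" would be passed on to the dialogue component.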
206 Automatic Recognition of Focus Accents in German
[Figure 6, four panels: Fo-contours plotted against time in frames (horizontal axis 50 to 300 frames; vertical axis ticks 80 to 230). Panel titles: 'Johannes liebt Susanne.' (Bridge accent); (Peak accent); (Peak accent); (Low accent).]
Figure 6 Fo-contours of the sentence 'Johannes liebt Susanne' showing different distributions of the focus accent and different accent types. The focus accents recognized by the algorithm are marked along the horizontal axis below.
psychological state of the speaker. Thus the prosodic structure of an utterance need not be identical to the syntactic structure of the corresponding sentence. As a consequence of the hierarchical structure of the prosodic units, a spoken utterance may appear as a different prosodic unit on different levels of the hierarchy. As an example, the utterance 'Ja' (Yes) represents prosodically a syllable, a word, a stress group, a phrase, an utterance, and a text simultaneously. In what follows, an attempt will be made to define some current prosodic units relevant to the intonation model. Some previously used concepts will be defined formally and related to other prosodic and phonological units, and their correspondence on the syntactic level will be shown. A basic prosodic unit of analysis is the prosodic phrase as used in Bannert (1985a). It constitutes a prosodically complete unit, defined rhythmically and tonally, that may be related to syntactically different structures. Prosodic phrases may correspond to complete sentences, e.g.
(3) Das Gemälde von Kandinsky ist gestern versteigert worden. (The painting by Kandinsky was auctioned off yesterday.)
(4) Der Müller will die Männer immer Lümmel nennen. (The miller will always call the men louts.)
Prosodic phrases, however, may also correspond to these sentences expanded, e.g. by adjectives or prepositional phrases, or to elliptical phrases, e.g. nominal phrases like
(5) Die längeren Männer. (The taller men.)
Prosodic phrases are characterized by prosodic patterns including the number and distribution of word accents and the (final) boundary tone. Parallel to the prosodic phrase (5), which corresponds to only one syntactic phrase and which could appear as an elliptical sentence in a dialogue (cf. Bannert 1985b), complete syntactic sentences can easily be found that nevertheless show the identical prosodic structure at a given prosodic level, for example in the number and distribution of word accents and in the intonation type (= final boundary tone).
(6) Die Sängerinnen schweigen. (The female singers are silent.)
(7) Ihr Sohn kann nicht tot sein. (Her son cannot be dead.)
(8) 1/Manchmal aber/1 2/und dann ohne große Ankündigung/2 3/blickte er sie finster unter seiner gerunzelten Stirn an/3 4/so daß sie beinahe das Fürchten bekam./4 (Sometimes, and then without any notice, he looked at her gloomily from under his wrinkled forehead, so that she almost got frightened.)
One possible division of this utterance into prosodic phrases is indicated by slashes (phrase boundaries), the indices denoting the serial number of the prosodic phrases. The whole utterance, which corresponds to a linguistic construction with a very complex syntactic hierarchy, is divided on the prosodic level into four smaller prosodically coherent parts, namely the Prosodic Phrases. It is evident that other divisions are possible. This prosodic division into smaller phrases is signalled by different prosodic means of expression. At the same time, the whole utterance itself appears prosodically (especially due to the tonal marking) as an inseparable entity. From this it follows that one should assume a hierarchy of prosodic units and that the Prosodic Utterance is positioned one level above the Prosodic Phrase (cf. also Nespor & Vogel 1986). Our concept of Prosodic Phrase (Prosodische Phrase) seems to correspond to the 'Intonational Phrase' of Nespor & Vogel, our Prosodic Utterance (Prosodische Äußerung) relating to their 'Phonological Utterance'. On the analysis level above the Prosodic Utterance, the unit of the Prosodic
In these examples, the prosodic phrase equals the whole utterance, i.e. the phonological unit Prosodic Phrase coincides with the phonological unit Prosodic Utterance. However, an utterance need not consist of one Prosodic Phrase only. In spontaneous speech and normally in utterances of considerable length, the utterance is divided into two or more prosodic phrases by the speaker. As an example, consider the prosodically complex utterances presented in Bannert (1985b):
Text or Discourse should be introduced. As has been shown by a number of studies (e.g. Lehiste 1975; Bruce 1982; Thorsen 1984), two or more successive and coherent sentences (of a monologue or dialogue) form a special prosodic unit showing, among other things, typical and characteristic temporal and tonal features. As an illustration, consider the following text:
(9) Die Sonne scheint. Mein Vater, der heute frei hat, mäht den Rasen. Peter geht baden. (The sun is shining. My father, who is off work today, is mowing the lawn. Peter is going swimming.)
(10) Mein Vater, der heute frei hat, mäht den Rasen.
The syntactic hierarchy of this complex sentence is roughly illustrated in Figure 7. The first Nominal Phrase of this sentence contains a subordinate clause. The prosodic representation of this sentence is shown in Figure 8. The three prosodic phrases of the prosodic utterance are evidently arranged in a linear pattern. Comparing the prosodic structure with the syntactic structure clearly shows a principle that is now generally known: the intonation of utterances is structured much more simply than their syntactic counterpart. This principle of prosodic simplicity will be used when formulating prosodic rules. The linguistic structure of the input, however, contains more than the information about the hierarchically dependent prosodic units U and P. To each of these symbols one of two values is assigned. The Prosodic Utterance may be either of the intonation type completed (statement, low boundary tone) or non-completed (question, high boundary tone). Therefore, the symbol U has to be indexed, A representing a statement (Antwort) and F a question (Frage). The Prosodic Phrase P may be progredient, i.e. linked strongly
The prosodic text is made up of three prosodic utterances, each of them marked by a boundary signal (low tone) at its end. The global course of the Fo throughout the text also shows typical features. The first and third prosodic utterances are at the same time, at the level below, prosodic phrases. The second utterance in the middle of the text contains three prosodic phrases that are set off by internal (medial) boundary tones and that, in turn, are subordinated to the prosodic utterance. It should be obvious that texts are characterized by complex and multi-level prosodic relationships. My choice of German terms is motivated by the parallelism of the expressions, which brings out the common bond in the prosodic hierarchy. In the present investigation, the two units Prosodic Utterance (U) and Prosodic Phrase (P) are at the focus of interest. Let us start with the second phrase of the text above:
[Figure 7: syntactic tree whose terminal strings are 'Mein Vater,', 'der heute frei hat,', 'mäht' and 'den Rasen.']
Figure 7 Syntactic representation of the complex sentence 'Mein Vater, der heute frei hat, mäht den Rasen'
[U [P Mein Vater, ]P [P der heute frei hat, ]P [P mäht den Rasen. ]P ]U
Figure 8 Prosodic representation of the sentence 'Mein Vater, der heute frei hat, mäht den Rasen'
to the following Prosodic Phrase, or detached. In the first case the intonation is characterized by the continuation rise (cf. Delattre et al. 1965), signalling that there are more phrases to come. In the latter case, however, the phrase may be the last one of an utterance.
Material
The speaker may signal the end of non-utterance-final prosodic phrases by different prosodic means. It is assumed that in German the tonal marking is achieved above all by a boundary tone, e.g. a high tone. Furthermore, of course, other parameters do play a role, e.g. final lengthening, intensity, spectral
(a) as a question with high final boundary tone (at the end of the utterance);
(b) as a statement with low final boundary tone.
The medial prosodic phrases were spoken with a high internal boundary tone. The distribution of the word accents was indicated in the orthography.
Analysis
Following standard phonetic practice, the recordings were subjected to an auditory examination. This was done in order to make sure that each utterance was spoken with the intended prosody with respect to the placement of the word accent, intonation type, and phrase boundary signalling. For each test sentence, four correctly pronounced utterances were digitized on a VAX 11/750 computer at a sampling rate of 10 kHz and analysed acoustically using the API programme (autoregressive spectral
structure of final segments, and voice quality. It is also assumed that the parameters cannot be used independently of each other. Instead they seem to show some interplay in certain respects (cf. e.g. Bannert 1987). Given an appropriate speech tempo and other preconditions, the end of a prosodic phrase may be signalled by a physical pause, i.e. a clear interruption of phonation extending over some span of time. For methodological reasons, in this investigation the phrase boundary is signalled by a physical pause. However, as a case of reference and for reasons of comparison, we will also investigate utterances where phrase boundaries are realized without any pause (Material A, sentences 1a, b). The whole material consisted of three parts, A, B, and C, as illustrated in Figure 9. The test sentences are listed, and the prosodic structure indicated. In prosodic investigations, a conflict very often arises between the phonetic requirements concerning the segmental structure of words and the semantic naturalness of the sentences. Usually a compromise has to be found. From a phonetic point of view, the ideal material would make it possible to show the Fo-curve throughout the whole utterance without any interruption or effect of any feature other than those investigated. In order to fulfil these segmental conditions, one has to find sentences containing only sonorants and vowels of the same degree of opening and of the same category of quantity (phonologically long or short). With respect to the semantic conditions, however, this was not possible because of the rather long sentences. Nevertheless, an attempt was made, as far as possible, to do justice to the phonetic segmental requirement in the accented words. The sentences were read by three female speakers and one male speaker with two intonation types:
[Figure 9: the bracketed layout of the prosodic structures (labels U, P1, P2, P3, boundary #) is not recoverable from the source. The surviving part headings and sentence material are listed below; accent marks indicate accented words as marked in the orthography.]
A. Different marking of phrase boundaries
Sentence (1): Viele junge Máler in Náalen leben in hellen Wóhnungen
B. Prosodic utterances and phrases showing different syntactic structures
Sentences (2) to (4), built from: Die Lóni kennt viele junge Maler in Naalen, / die in hellen Wóhnungen leben. / in Hohenláhningen. / Wenn es die jungen Maler mal wagen, / werden alle in hellen Wóhnungen leben.
C. Varying lengths of non-final prosodic phrase
Sentences (6) and (7): Viele junge Maler ... fahren nach Hohenlahningen.
(8) Viele junge Máler und dumme Mahner aus dem alten Naalen fahren nach Hohenlahningen.
(9) Viele junge Maler mit einem großen Namen aus dem alten Naalen fahren nach Hohenlahningen.
Figure 9 Test sentences (three parts A, B, C) and their prosodic structures
modelling with cepstrally based periodical estimation) of the ILS package (Interactive Laboratory System). The analysed Fo-curves were represented graphically together with the speech wave using plot and editing programs developed by our research group. Important points of the Fo-curve were defined and measured in the temporal and tonal dimensions. These points are the beginning and end of the whole Fo-curve, and the Fo-minima and Fo-maxima corresponding to word accents or phrase boundary tones. The arithmetic mean of each point was calculated; it served to draw time-normalized and schematized Fo-curves. This brings out the relevant Fo-movement of the curve. The Fo-curves of statements and questions were superimposed.
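The averaging step can be sketched as follows. The text says only that the arithmetic mean of each measured point was calculated; the sketch assumes that each repetition of a sentence is stored as a mapping from a point label to a (time, Fo) pair and that both dimensions are averaged per label. The labels and the function name `schematize` are hypothetical.

```python
from statistics import mean


def schematize(repetitions):
    """Average each labelled measuring point over all repetitions.

    repetitions: list of dicts mapping a point label to (time_ms, f0_hz).
    Returns a list of (label, mean_time_ms, mean_f0_hz) in the label order
    of the first repetition; connecting these points gives a schematized
    curve.
    """
    labels = list(repetitions[0])
    return [
        (
            label,
            mean(rep[label][0] for rep in repetitions),
            mean(rep[label][1] for rep in repetitions),
        )
        for label in labels
    ]
```

Running the function separately on the statement and the question versions of a sentence yields two point lists that can be drawn on the same axes.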
All four speakers showed rather similar tonal behaviour. Therefore a representative sample from one of the female speakers (BW) is chosen to illustrate the findings. The results concerning the intonation of syntactically and prosodically complex sentences are valid for all four speakers. As an illustration of the results, only sentence (8) of material C will be given. The speech wave and the Fo-curve of a representative production of this sentence are shown in Figure 10. Figure 11 shows the normalized, schematized and superimposed Fo-curves derived from the means for this speaker. The time axis is horizontal; the fundamental frequency axis is vertical and linear in Hz. The internal boundaries of the prosodic phrases are indicated by #.
The schematized Fo-curves
The points marked along the time axis depict the measuring points of Fo. The points corresponding to word accents are interconnected and the accented word is indicated. The results are given for each of the materials A, B and C.
Material A
Boundaries of the prosodic phrases, besides pauses in sentence (1c), are marked clearly by a local high tone appearing in the last syllable of the internal phrase of sentence (1). It should be noted that the word preceding the phrase boundaries carried the word accent. By contrast, the first word accent in the first phrase (Maler), which is not influenced by other factors, is realized as a peak accent (rise-fall). Comparison of the Fo-values of the high tone at the phrase boundaries of sentences (1b, c) shows that the tonal manifestation of the phrase boundary is not affected by the occurrence of a physical pause. However, in sentence (1a), where there is no tonally marked phrase boundary, the corresponding Fo-value
Results
[Figure 10, two panels, each showing FUNDAMENTAL FREQUENCY ([Hz]) above WAVEFORM ([Smpl]); horizontal axis in frames, 23 to 773.]
Figure 10 Fo-contour and speech wave of the sentence 'Viele junge Maler und dumme Mahner aus dem alten Naalen fahren nach Hohenlahningen'. Above: statement, below: echo question. Speaker BW
[Figure 11: Fo in Hz (vertical axis, ticks 200 to 350) over normalized time; the schematized curves for statement and question are superimposed; the accented words Maler, Mahner and Naalen and the phrase boundary # are marked along the time axis.]
Figure 11 Normalized and superimposed schematic Fo-curves of the sentence 'Viele junge Maler und dumme Mahner aus dem alten Naalen fahren nach Hohenlahningen'
is considerably lower. Thus it should become evident that the phrase boundary is signalled by an increased Fo-value.
Material B
It can be seen clearly that the phrase boundaries in all three sentences are tonally manifested identically although the boundaries have a completely different status on the syntactic level. The peaks of the first word accent stand out clearly.
Material C
When the first prosodic phrase is prolonged step by step, the final prosodic phrase being kept constant, a rather simple picture arises. The beginning and end of the first phrase keep their appearance; the prolongation is done by inserting the word accents in between as tonal peaks. Here again the systematic character of German intonation pointed out in Bannert (1985a) and shown in Figure 2 above should be recalled. The Fo-curve of the final phrase remains unchanged. As a summary for the whole material one can state: phrase boundaries of
different syntactical status are manifested tonally as one high tone. Some variations of the measured Fo-value, however, are to be observed. This variation, though, does not seem to contain any relevant linguistic information. It could rather be related to some individual features of the speakers.
Recognition of focus accents in prosodically complex sentences
(11) Es war ein Tag # wie jeder andere. (It was a day like any other.)
Then it may be impossible, relying only on acoustic information, to tell the two tonal features apart. It is obvious that in this case other kinds of information must be exploited.
Acknowledgements
'Modellbildung für die Auswertung der Fokusintonation im gesprochenen Dialog' (MAFID), supported by the DFG (German Research Foundation) and carried out at the Fraunhofer Institute for Industrial Engineering (IAO), Stuttgart, Germany.
ROBERT BANNERT
Slorgarten 10
24130 Eslöv
Sweden
How do the results of this investigation relate to the problem of recognizing focus accents in prosodically complex sentences, compared to simple sentences as studied earlier? In what way will recognition be more complicated? As has been observed earlier in different contexts, a general relationship between intonation and syntax is also to be found in these complex sentences showing varying linguistic structures. In contrast to syntax, intonation is characterized by a much simpler structure, i.e. the different syntactic features appear as one tonal feature only. In sentences spoken in a normal conversational manner, the high phrase boundary tone is clearly manifested. What is more, preceding a physical pause, it unambiguously signals a phrase boundary. The end of the whole complex utterance is also clearly manifested, either as a low or a high tone; this is also true of simple utterances. Thus complex prosodic utterances should, in principle, pose no new problems for the recognition of focus accents. The focus accents in non-final prosodic phrases can be found in the same way as in simple utterances. One theoretical problem, however, should be considered. This is the case when, preceding a phrase boundary, a focus accent and the high boundary tone coincide on the final syllable of the prosodic phrase, e.g.
REFERENCES
Altmann, H., A. Batliner & W. Oppenrieder (1989), Zur Intonation von Modus und Fokus im Deutschen, Linguistische Arbeiten 234, Niemeyer, Tübingen.
Bannert, R. (1982a), 'Temporal and tonal control in German', Lund University, Department of Linguistics and Phonetics, Working Papers 22, 1-26.
Bannert, R. (1982b), 'An Fo-dependent model for segment duration?', Uppsala University, Department of Linguistics, Report (RUUL) 8, 59-80.
Bannert, R. (1983), 'Some phonetic characteristics of a model for German prosody', Working Papers 25, 1-34, Institute of General Linguistics and Phonetics, Lund University.
Bannert, R. (1985a), 'Towards a model for German prosody', Folia Linguistica, XIX: 321-41.
Bannert, R. (1985b), 'Fokus, Kontrast und Phrasenintonation im Deutschen', Zeitschrift für Dialektologie und Linguistik, 52: 289-305.
Bannert, R. (1987), 'Independence and interdependence of prosodic features: some general remarks', in K. Gregersen and H. Basbøll (eds), Nordic Prosody IV, 31-40, Odense University Press.
Bruce, G. (1977), 'Swedish word accents in sentence perspective', Travaux de l'Institut de Linguistique de Lund XII, Lund.
Bruce, G. (1982), 'Textual aspects of prosody in Swedish', Phonetica, 39: 274-87.
Delattre, P., E. Poenack & C. Olsen (1965), 'Some characteristics of German intonation for the expression of continuation and finality', Phonetica, 13: 134-61.
Essen, O. von (1956), Grundzüge der hochdeutschen Satzintonation, Henn, Ratingen.
't Hart, J. & A. Cohen (1973), 'Intonation by rule: a perceptual quest', Journal of Phonetics, 1: 309-27.
Lehiste, I. (1975), 'The phonetic structure of paragraphs', in A. Cohen & S. Nooteboom (eds), Structure and Process in Speech Perception, 195-203, Springer, Berlin.
Nespor, M. & I. Vogel (1986), Prosodic Phonology, Dordrecht.
Thorsen, N. (1978), 'An acoustical investigation of Danish intonation', Journal of Phonetics, 6: 151-75.
Thorsen, N. (1982), 'Two issues in the prosody of standard Danish: the lack of sentence accent and the representation of sentence intonation', Annual Report of the Institute of Phonetics, University of Copenhagen 16, 10-36.
Thorsen, N. (1984), 'Intonation and text in standard Danish', PHONOLOGICA 1984, 301-9, London.
Wunderlich, D. (1988), 'Der Ton macht die Melodie: zur Phonologie der Intonation des Deutschen', in H. Altmann (ed.), Intonationsforschungen: Linguistische Arbeiten 200, 1-40, Niemeyer, Tübingen.
Journal of Semantics 8: 219-238
© N.I.S. Foundation (1991)
On the Tonal Disambiguation of Focus Structures
SUSANNE UHMANN
University of Wuppertal
Abstract
(1) was he (warned (to look out for (an exconvict (with a red (SHIRT)))))
any of the constituents in brackets may be regarded as a focus. The presence of the foci is indicated by an intonation centre placed on the constituent shirt. Although I share in principle Höhle's view,2 which underlines the importance of formulating the rules for focus projection, I am not going to deal here in detail with the rules governing focus projection, the origin of focus, or the problem of focus assignment. My main interest in this article is instead the tonal realization of sentences. The question I will try to answer is the following: does the ambiguity stated on the accentual level also show on the intonational level? To answer this question I will examine four aspects of the tonal realization more closely:
1. the internal structure of the intonational phrases (i.e. the number of pitch accents and their choice from the inventory of German pitch accents);
2. the initial boundary of the intonational phrases;
3. the pitch range;
4. the association of syllables and pitch accents.
INTRODUCTION
Before I can apply myself to these problems I will have to introduce some technical terms and outline the theoretical background guiding my empirical analysis.3 By focus structure I understand a structure in which the focused constituent(s) are marked by an abstract F-feature. As I am not interested in the semantics of focus we may leave the origin of the feature open. For the purpose of the present discussion it will be sufficient to say that I will be interested only in those foci which are controlled by questions, i.e. foci assigned by a special focusing operator.4 The presence of a focus feature is, in a language like German, indicated phonologically by at least one pitch accent within the
In this article I will be concerned with certain well-known ambiguities of focus structures arising from a special accentuation of phrases or sentences. This ambiguity has been discussed under the heading focus projection 1 in the literature. What is generally understood by focus projection goes back to Chomsky (1971) and maybe even to Hermann Paul (1880)—'avant la lettre'. Chomsky (1971: 201) observed that in a sentence such as
(2) Principle I: focus assignment
The F-feature is assigned to one or more constituents.
(3) Principle II: phonological realization
The phonological correlate of the F-feature is at least one accent, which is called pitch accent because the Fo-contour is its most important acoustic correlate.
(4) Principle III: metrical structure and focus structure
The pitch accent is associated with the metrically most prominent syllable of the phrase focused. If rather complex phrases (like NPs, VPs, or Ss) are focused, the focus rules first have to identify the focus exponent of this phrase. The pitch accent is then associated with its most prominent syllable (σ*), which becomes the focus syllable (Σ*) after association with the phonological correlate of focus (broad focus and focus projection). In the case of focused words or syllables the pitch accent goes directly to the focused syllable or to the most prominent syllable of the focused word (narrow focus).
Based on a phonological distinction between High-tones and Low-tones and inspired by the work of Pierrehumbert, four different and functionally distinctive pitch accents (PA) were included in the repertoire of German pitch accents. Based on Goldsmith's (1976, 1981) 'accentuation principle', which controls the linking between a string of syllables (σ) and a sequence of tones in such a way that metrically prominent syllables (σ* or Σ*) are linked with tones which are also marked by an asterisk (*), the inventory includes two level tones (H* and L*) and two contour tones (H* + L and L* + H).6
phrase focused, which shows up phonetically in the Fo-contour. This pitch accent has to be associated with the metrically most prominent syllable within the phrase focused. If we focus rather complex phrases (like NPs, VPs, or Sentences) the so-called focus rules have to identify the focus exponent 5 of this phrase, i.e. the constituent whose accentuation allows for maximal ambiguity concerning focus projection. The pitch accent is now associated with its most prominent syllable (σ*), which becomes the so-called focus syllable (abbreviated as Σ*) after association with the phonological correlate of focus. In this case we have a broad focus together with focus projection. In the case of focused words or syllables, there is no need for the application of the focus rules: the pitch accent goes directly to the focused syllable or to the most prominent syllable of the focused word, and we arrive at the case of a narrow focus. The assumptions which have been made up to now are summarized under (2) to (4).
(5) Accentuation principle (Goldsmith)
σ   σ   σ*   σ
        |
      H* + L
(6) Warum_i sind hier so viele Polizisten?
[F_i weil einige Fußballfans wiedermal volltrunken einen großen SIEG feiern müssen].
Tonal tier: H* + L (on SIEG), L% (final boundary).
Example (6) and its intonation correspond to principles (2) to (4) in so far as it contains the minimal tonal specification that is needed for a sentence marked with the given focus structure. This is an extreme case of broad focus and focus projection, and it is questionable whether this is indeed a natural realization. At this point it is necessary to underline that focus projection is largely optional and that speakers have other options as well, for example (6a):
(6a) [F_i weil einige FUSSballfans wiedermal VOLLtrunken einen großen SIEG feiern müssen].
Tonal tier of (6a): L* + H, H* + L, H* + L, L%.
The problem of multiple pitch accent assignment has been discussed by Gussenhoven (1983) under the heading 'focus domains'. Although his concept seems to be basically correct, I want to follow Ladd (1983) in his critique and change the name to 'accent domain' (AD). But more important than the new name is the equally new status of accent domain construction in the model of German intonation outlined here. In Gussenhoven's concept the division into accent domains is the first step after the assignment of focus features; it applies automatically and every accent domain also automatically receives its own pitch accent. The essential disadvantage of this conception is that the rules of accent domain construction carry the main load for the decision about number and placement of pitch accents. Gussenhoven's concept
In addition to the pitch accents another tone was used for the tonal characterization of German intonation contours, namely the boundary tone as the phonological correlate of the borders of the intonational phrase. Boundary tones are H% and L%. They are necessarily associated with the last and optionally with the first syllable of the intonational phrase. The picture that has been outlined so far gives us something like (6) (bold type is used for marking the focus exponent and small caps mark the syllable which receives a pitch accent):
leaves no room for optionality and it cannot do justice to German data. Therefore, in the model of German intonation outlined here the rules for accent domain construction apply only after the focus rules which locate the focus exponent. Here is an example:
(7) Warum_i hat sie ihn nicht schon längst verlassen?
(7a) [F_i wegen der KINder] (broad focus and focus projection)
(7b) [F_i [AD_1 WEgen] [AD_2 der KINder]] (broad focus and two accent domains)
(7a) and (7b) are variants which can be freely chosen by a speaker; they depend perhaps on different amounts of emphasis or tempo. But what is going on in (7c) and (7d)?
Here the division into accent domains has left the area of stylistics or pragmatics. What seems to be important in cases like (7d) is the already mentioned concept of prominence, which here means metrical strength.7 I cannot go into a detailed discussion of metrical grids or trees8 at this point, and in a way this problem is immaterial to the argument I want to develop. The metrical grid in (8) (Figure 1) gives a first impression.
(8) [Figure 1: metrical grid over one intonational phrase with two pitch accents (PA). Rows, from top to bottom: level 4 (NSR), level 3 (PA assignment), level 2 (word stress), level 1 (no [ə] or syllabic nasal), level 0 (every syllable)]
Let us return to the example (7d). There is no problem up to level 2. But the division into two accent domains with two PAs forces the assignment of two additional beats for the syllables Kin and we on level 3. A nuclear stress rule assigns another beat on level 4 to the rightmost constituent with a pitch accent, in this case wegen. Contrary to (7b) and (8), the nuclear stress rule places in (7d) the fourth level beat on a constituent which is not at the same time identical with the focus exponent of the entire focus domain. Turning a non-focus exponent into the most prominent constituent of the intonational phrase seems to be the point
(7c) [F1 der KINder wegen]
(7d) *[F1 [AD1 der KINder][AD2 WEgen]]
Susanne Uhmann 223
where the speaker's freedom to build up accent domains stops. The focus exponent has to remain the metrically most prominent constituent or—rather more precisely—the focus exponent has to contain the metrically most prominent syllable of the entire intonational phrase. These restrictions are summarized in a well-formedness condition which regulates the interplay between the metrical prominences, accent domain construction and the building up of intonational phrases in German.
After this regrettably somewhat lengthy introduction I will now approach the first question—the disambiguation of focus structures through choice and placement of pitch accents.
PITCH ACCENTS
All the contours under examination show instances of the realization of the sentence Xenja promoviert. This sentence allows focus projection via the accentuation of the verb. So promoviert serves as the focus exponent of the entire sentence.9
(10) [F1 [AD Xenja promoVIERT]] (broad focus, focus projection; PA on VIERT)
A single pitch accent which is not associated with this focus exponent therefore signals the narrow focusing of the accented constituent. Based on the F0-contours of the examples (11) to (15) I will demonstrate some realizations I have found in my data.10 The example (11) (Figure 2) with the pitch accent H* + L is unambiguous regarding its focus structure: it cannot be anything but (11a).
(9) Principle IV: Well-formedness condition
(a) A syllable which receives a pitch accent is always more prominent than a syllable without pitch accent.
(b) In case of more than one pitch accent the rightmost pitch accent bearing syllable receives an extra beat (NSR).
(c) The focus exponent (broad focus) or the focused word or syllable (narrow focus) has to contain the most prominent syllable of the entire intonational phrase.
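Principle IV can be read as a check on simplified metrical grids. The following Python sketch is an illustration only and not part of the model itself: the class `Syllable`, its fields, and the function names are hypothetical, and the grid is reduced to integer heights (levels 0 to 2 for unaccented syllables, level 3 for a pitch accent, one extra beat from the NSR).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Syllable:
    """Hypothetical record for one syllable of an intonational phrase."""
    text: str
    stress_level: int = 0      # grid levels 0-2 (2 = word stress)
    pitch_accent: bool = False
    in_focus: bool = False     # part of the focus exponent / focused word


def grid_heights(syllables: List[Syllable]) -> List[int]:
    # (a): a pitch accent lifts a syllable to level 3, above every
    # syllable without a pitch accent (capped at level 2).
    heights = [3 if s.pitch_accent else min(s.stress_level, 2)
               for s in syllables]
    # (b): with more than one pitch accent, the rightmost accented
    # syllable receives an extra beat (NSR).
    accented = [i for i, s in enumerate(syllables) if s.pitch_accent]
    if len(accented) > 1:
        heights[accented[-1]] += 1
    return heights


def satisfies_principle_iv(syllables: List[Syllable]) -> bool:
    # Assumption: a phrase without any pitch accent is not well-formed.
    if not any(s.pitch_accent for s in syllables):
        return False
    heights = grid_heights(syllables)
    top = max(heights)
    peaks = [i for i, h in enumerate(heights) if h == top]
    # (c): the single most prominent syllable must lie in the focus.
    return len(peaks) == 1 and syllables[peaks[0]].in_focus


# (7b) [WEgen][der KINder]: NSR beat falls on the focus exponent.
example_7b = [
    Syllable("we", 2, True), Syllable("gen"), Syllable("der"),
    Syllable("Kin", 2, True, True), Syllable("der", 0, False, True),
]
# (7d) *[der KINder][WEgen]: NSR beat falls outside the focus exponent.
example_7d = [
    Syllable("der"), Syllable("Kin", 2, True, True),
    Syllable("der", 0, False, True),
    Syllable("we", 2, True), Syllable("gen"),
]
```

Under these assumptions `satisfies_principle_iv(example_7b)` holds, while `example_7d` fails because the rightmost pitch accent, and with it the NSR beat, lies on wegen rather than on the focus exponent Kinder.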
(11) VII 81 GL 2: Wer promoviert?
[Figure 2: F0-contour (Hz over sec.) of Xenja promoviert, with H*+L on Xenja]
(11a) [F1 XENja] promoviert. (H*+L associated with XEN)
Notice that the F0-contour after this pitch accent is almost entirely flat. This contour follows directly from Principle IV, especially IV(c), which states that the focused constituent has to contain the most prominent syllable of the entire intonational phrase. If, for example, promoviert had received a pitch accent too, the obligatory application of the Nuclear Stress Rule would have turned promoviert into the most prominent constituent of the intonational phrase and thus violated Principle IV(c). Also unambiguous regarding its focus structure is example (12) (Figure 3). It contains two pitch accents: H* + L for the focused constituent and L* for the background constituent. Both the accentuation of the background and the choice of L* serve to disambiguate the focus structure. As promoviert is the focus exponent of the entire phrase, a single accent associated with its most prominent syllable could, on the accentual level, mark a narrow focus as well as a broad focus. But by assigning a secondary accent and by choosing the L* pitch accent from the inventory of German pitch accents the speaker disambiguates the focus structure, for L* is only possible for background information. Notice that in this case the assignment of two accents does not violate Principle IV, as the NSR turns the accented syllable of the focused constituent into the most prominent syllable of the intonational phrase.
(12) VII 162 GL 3: Was macht Xenja?
[Figure 3: F0-contour (Hz over sec.) for example (12)]
(12a) [H XENja][F1 promoVIERT] (L* associated with XEN, H* + L with VIERT)
This example shows pretty well that the assignment of pitch accents is not restricted to focused constituents and that theories which infer the presence of focus from the presence of pitch accents will run into problems. It also demonstrates the necessity of formulating Principle IV in such a way that it controls the construction of intonational phrases and that it is not restricted to focus domains. Unfortunately, the third pitch accent of the German inventory, H*, is not tied to a particular focus structure as consistently as the L* accent is. This can be shown with the help of the examples (13) and (14) (Figure 4). (13) has the same focus structure as (12), namely (13a):
(13a) [H XENja][F1 promoVIERT] (H* associated with XEN, H* + L with VIERT)
but the speaker has chosen H* for the tonal realization of the accented background constituent. H* has also been chosen in example (14), but here the tonal sequence of H* and H* + L is associated with an 'out of the blue sentence' divided into two accent domains.
(13) VII 157 GL 5: Übrigens, Xenja heiratet. Nein, Xenja promoviert.
(14) VIII 74 HK 1: Gibt's was Neues? Ja, Xenja promoviert.
[Figure 4: F0-contours (Hz over sec.) of Xenja promoviert for examples (13) and (14), each with H* on Xenja and H*+L on promoviert]
(14a) [F1 [AD1 XENja][AD2 promoVIERT]] (H* associated with XEN, H* + L with VIERT)
The only difference between the two F0-contours is the higher F0-peak in the example (13). This might be due to higher metrical prominence, but I will return to the interaction of metrical prominence and F0-peaks later. The last remaining pitch accent is L* + H and its realization is shown in example (15) (Figure 5). In this case the application of L* + H is quite interesting but a bit problematic, too, because it is not the most natural tonal
(15) VII 68 GL 1: Gibt's was Neues?
[Figure 5: F0-contour (Hz over sec.) of Xenja promoviert, with L*+H on Xenja and H*+L on promoviert]
realization of an 'out of the blue sentence'. The tonal sequence L* + H and H* + L is instead a quite natural tonal realization of a question-answer pair like (16):
(16) Wie geht es deiner Mutter? Meiner Mutter geht es GUT, aber mein Vater ist sehr krank. (L* + H associated with Mutter, H* + L with GUT)
What is the L* + H accent doing in this example? The question controlling the focus structure of the answer already contains the information that the person to be talked about next will be Mutter. The L* + H accent associated with a backgrounded constituent signals, in my opinion, that the speaker will not only give the requested information about her mother's health, but will also voluntarily add some further information, in this case her father's bad state of health. The question controlling the focus structure of example (15) does not contain the information that the person to be talked about next will be Xenja. A speaker choosing this pitch accent for a constituent in the initial field and in an 'out of the blue context' is doing something which has been called a 'double duty turn' in conversation analysis (see Turner 1976). The speaker does more than the obligation to answer a certain question forces her to do. She was asked if she knew some tellable news. Producing the answer with an intonation contour like (14) would have done this job. Choosing L* + H instead of H* does more: it announces more tellable news to come and perhaps the opening of a gossip sequence with Xenja not the only victim.
(17) Gibt's was Neues? Ja. Xenja promoviert, und Marianne wird heiraten.
(18) Alle sind nicht gekommen.
     Tones: L* + H, H* + L, L%
(19) Alle sind nicht gekommen.
     Tones: H* + L, H* + L, L%
Clearly these two sentences do not convey the same meaning. This difference is due to the choice of pitch accents assigned to the quantifier in the initial field, because the L* + H accent affects the scope of the negation in this construction. I do not want to go into a more detailed analysis of this pitch accent here (cf. Jacobs 1982) because it might take us too far away from the problem of the tonal disambiguation of focus structures. I will instead try to summarize the findings of my first point.
It seems to be important to me that the speakers I taped for my collection of data always assigned not just the one obligatory pitch accent to the focus exponent or the focused constituent. Wherever they could assign more pitch accents without violating Principle IV they did so. These pitch accents, freed from the task of serving as the phonological correlate of the focus feature, take over additional functions. But with the exception of the pitch accent L* and perhaps L* + H, too, there seems to be no straightforward one-to-one relation between certain pitch accents and certain functions.
From a more semantically or pragmatically oriented point of view, another analysis has drawn attention away from the focus-background structure to the topic-comment structure11 as another syntagmatic distinction which must not be confused or identified with the first one. Topic can be defined as the element which sets the frame for the interpretation of the remaining part of the sentence, i.e. the comment (cf. Jacobs 1984: 46). Besides several syntactic constructions, 'freies Topik' (hanging topic) or 'Linksversetzung' (left dislocation),12 Jacobs (1982, 1984) has identified another way of marking topicality in German, namely via a certain intonation contour called 'I-Topikalisierung' (Jacobs 1984: 50). What he identifies as '(...) auf dem ersten Akzent eine progrediente Intonation (...)' (roughly: 'a progredient intonation on the first accent') turns out to be the L* + H accent on a constituent in the initial field. In some cases the L* + H accent even affects the meaning of a sentence:
INITIAL BOUNDARY TONES
1. they can start the intonational phrase in a neutral, unmarked F0-range which is a widely invariant feature of a speaker's voice;
2. they can start considerably higher, or
3. they can start considerably lower.
Consider the example (20) (Figure 6). Speaker GL starts her turn at a level of 222 Hz. Irrespective of the first pitch accent she starts 59% of her intonational phrases in an F0-range between 185 and 230 Hz. If we choose a smaller pitch sector, there are still 43% in the range between 195 and 215 Hz. The table in (21) (Figure 7) shows the overall distribution of GL's onsets. A second accumulation of onsets can be seen between 155 and 173 Hz. Above the level of 230 Hz there are only a few onsets. Among them is example (22) (Figure 8). The speaker GL starts here at 260 Hz. The two sentences do not exhibit the same focus
(20) VII 168 GL 8: Was hat gebrannt?
[Figure 6: F0-contour (Hz over sec.) of Die Scheune hat gebrannt, with H*+L on Scheune]
Let us have a look now at the second tone type, the boundary tone. As I have said, the initial boundary tone is optional in the construction of well-formed intonation contours in German, whereas the final boundary tone is obligatory and plays an important role in signalling the sentence mood. Like the obligatory pitch accent assigned to the focus exponent, the final boundary tone is fixed in function and location, too. A better candidate for additional functions in the area of focus marking might therefore be the initial boundary tone. Regarding the onset level of their intonational phrase, speakers face three options.
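(21) GL-Onset-Niveau, Gesamtmenge: 226 (GL onset level, total number: 226)
[Figure 7: distribution of GL's onsets over F0 (Hz)]
(22) VII 165 GL 7: Was war los?
[Figure 8: F0-contour (Hz over sec.) of Die Scheune hat gebrannt, with an initial H% at 260 Hz and H*+L]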
structures; we have narrow focus in (20) and broad focus in (22). Unfortunately, not all instances in our test corpus which start at such a high Fo-level are 'out of the blue' sentences. But there are similar cases and they all have two additional restrictions in common.
1. The focus exponent is located near the initial phrase boundary. Multiple pitch accents are excluded by Principle IV, so only one pitch accent can be used for the tonal make-up of the entire focused phrase.
2. There is at least one syllable between the initial boundary and the most prominent syllable of the focus exponent.
PITCH RANGE AND TEXT TO TUNE ALIGNMENT
Up to now I have considered sequences of tones which show up phonetically as typical F0-contours. Via this correspondence these tones were incorporated into an abstract phonological inventory of German intonation. Besides the allocation of pitch accents and boundary tones there is another level of tonal representation that can be used for the disambiguation of focus structures. But what we are dealing with in this new area is a different kind of tonal information. Two aspects of tonal realization will be discussed: the pitch range used to realize the contour tones and the text-to-tune alignment of unstarred tones. Although these aspects are not part of the phonology of intonation themselves, the factors governing them are phonological features. The contours (23) and (24) (Figure 9) will serve as illustration. Both examples have the same tonal sequence (H* + L and L%), and on the accentual level the accenting of Scheune is ambiguous because Scheune is the focus exponent of the entire sentence. But example (24) shows clearly that the constituent Scheune is realized intonationally with a higher peak in the case of narrow focus than in the broad focus example (23) where we have focus projection. Following Pierrehumbert (1980), the actual peak height in F0-contours is achieved in this model of German intonation by an interplay between metrical prominence and tonal realization of pitch accents. The higher the prominence,
But even with these two restrictions there is no automatism which guarantees an H-tone marking of the initial phrase boundary. The problem we are facing here might be that additional factors which belong to the area of topical coherence or topical progression disturb the picture. Even in highly artificial situations and highly controlled test patterns it seems to be difficult to eliminate interferences of this sort totally. Nevertheless, in an appropriate environment the high initial boundary tone might be a good candidate for a speaker who wishes to disambiguate the focus structure of her turn, because, ex negativo, it can be said that I have found no instance in our data where this tone was used in the restricted environment mentioned above without the simultaneous presence of an 'out of the blue' sentence. I shall now discuss my last two points, namely the use of pitch range and the association of pitch accents and syllables.
(23) VIII 105 HK 7: Was war los?
(24) VIII 106 HK 8: Was hat gebrannt?
[Figure 9: F0-contours (Hz over sec.) of Die Scheune hat gebrannt for examples (23) and (24), each with H*+L on Scheune]
the higher the peaks for H* pitch accents and the lower the valleys for L* pitch accents. The building up of metrical prominence was shown at the beginning of my paper (example (8)). But metrical grid construction can involve more than what has been shown up to now. In addition to the rules building up three or four level beats, speakers have the possibility to give special emphasis13 to certain elements. Emphasis here means a supplementary device which speakers use to mark certain parts of their turn in addition to the focus-background distinction. Emphasis shows in the metrical grid through higher beat columns on focused constituents. Going a step further than Pierrehumbert, who
explicitly built up her model without any functional claims, I have formulated a principle governing extra height resulting via special emphasis which connects the phenomenon with the focus structure. (25) Principle V: Emphasis
The metrically most prominent syllable of a narrow-focused constituent can receive one or more additional beats.
1. The spreading of unstarred tones is stopped at the syllables which are already associated with a starred tone via the 'accentuation principle'. The necessity of this boundary is pretty obvious, for the main task of pitch accents in German is the tonal realization of prominent syllables. This tonal marking is spelled out phonetically via changes in the Fo-contour. If the tonal
This possibility was used quite regularly throughout our data. Although special emphasis is a possible choice for all constituents carrying a pitch accent, it seems to be important to me that the frequent combination of narrow focus and enlarged pitch range might hint at a systematic connection. Ambiguity on the accentual level may be excluded by the enlargement of the pitch range on a constituent which could as well be the focus exponent of the entire phrase. But here further systematic tests will have to be done. I will now embark upon my last point. I have used not only question-answer sequences for controlling the focus structure, but also correction sequences. With the following examples I want to show another possibility of disambiguation: the mapping of text and tune. In order to mark the narrow focused syllable Spül-, the speaker has not enlarged the pitch range in example (26) (Figure 10), but has restricted the downward slope to the focused syllable. At the end of this syllable the lowest point of the F0-contour is already reached and the rest of the contour remains totally flat. In example (27) (Figure 10) the lowest point of the H* + L pitch accent is not reached until the syllable läuft. How can the difference be accounted for in a phonologically motivated model of intonation? Up to this point I have introduced only Goldsmith's 'accentuation principle' which associates metrically prominent syllables with the starred tones of the bitonal pitch accent. I have not said anything about the association of the unstarred tones of the bitonal pitch accent. In German unstarred tones of bitonal pitch accents are dumped on to the syllable already associated with the starred element (contour tones) and they can also be subject to spreading on to the unaccented syllables to the right.14 The question now is: how far can an unstarred tone spread? Or, to formulate it in a different way, what are the limits for tonal domains in German? The answer to this question is not easy. What I can do here is motivate some boundaries which seem to be relevant.
(26) VII 135 GL 29: Das Waschwasser läuft nicht ab. Nein. Nicht das Waschwasser. Das Spülwasser läuft nicht ab. (H*+L, L%)
(27) VII 235 GL 26: Was ist los? Das Spülwasser läuft nicht ab. (H*+L)
[Figure 10: F0-contours (Hz over sec.) for examples (26) and (27)]
marking is limited to the starred syllables and, in the case of the association of contour tones, to their direct neighbourhood, the difference between syllables bearing tonal information and syllables lacking this tonal information becomes more salient. If there are only a few intervening syllables between two tonally specified syllables no further problem arises, but if their number is greater, a second restriction seems to be relevant, which further limits the spreading of unstarred tones. This restriction is connected with the tonal marking of prominence in German.
2. The rightward spreading of unstarred tones is stopped at the first syllable bearing a level 2 beat. Note that level 2 is the relevant level for this restriction because word stress is marked here. Besides the realization of prominence, pitch accents were identified as the phonological correlates of the F-feature, and this connection must not be obscured at the phonetic level.
3. The spreading of unstarred tones is stopped at focus domain boundaries.
Let us consider examples (26) and (27) (Figure 11) again with the new information about tone spreading in German.
(26a) narrow focus: Das [F SPÜL]wasser läuft nicht ab.
(27a) broad focus: [F Das SPÜLwasser läuft nicht ab.]
[Figure 11: metrical grids and tone association for (26a) and (27a), showing H*+L on the focus syllable, the final L%, and interpolation up to L%]
In example (27) the spreading of the unstarred part of the H* + L pitch accent is not stopped until the syllable läuft, which is the first syllable bearing a level 2 beat. This possibility is excluded in example (26). Spreading of L would cross the focus domain boundary. This is why the tonal slope in example (26) is limited to the focused syllable and L is only dumped on to the syllable already associated with its starred counterpart. These restrictions for tonal domains are summarized under (28).
(28) Restriction for tonal domains
(a) Focus syllables (Σ*), accented syllables (σ*), phrase-final syllables (σ%) and focus domain boundaries ([F]) are maximal boundaries for tonal domains.
(b) The rightward spreading of non-starred tones within a tonal domain is stopped at the first syllable bearing a level 2 beat.
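The restrictions in (28) can be sketched as a small procedure that computes how far the unstarred L of an H* + L accent may spread. This is an illustrative Python sketch under stated assumptions, not the formalization of the model: the class `Syl`, its fields and the function `spreading_domain` are hypothetical. It assumes that a level 2 syllable is itself the last syllable of the domain (the text reports that the low point is reached on läuft), that a following starred syllable is excluded, and that a focus domain boundary directly after the accented syllable confines L to that syllable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Syl:
    """Hypothetical syllable record for the spreading sketch."""
    text: str
    beat_level: int = 0                 # highest grid level of the syllable
    starred: bool = False               # already bears a starred tone
    focus_boundary_after: bool = False  # focus domain closes after it


def spreading_domain(syls: List[Syl], accent_index: int) -> List[int]:
    """Indices of syllables covered by the unstarred tone of the pitch
    accent on syls[accent_index], following (28) as read above."""
    domain = [accent_index]  # the tone is always dumped on its own syllable
    if syls[accent_index].focus_boundary_after:
        return domain        # (28a): no spreading across [F]
    for i in range(accent_index + 1, len(syls)):
        if syls[i].starred:
            break            # (28a): stop before another starred syllable
        domain.append(i)
        if syls[i].beat_level >= 2 or syls[i].focus_boundary_after:
            break            # (28b) level 2 beat, or (28a) focus boundary
    return domain            # the phrase-final syllable ends the loop anyway


# (27a) broad focus: [F Das SPÜLwasser läuft nicht ab.]
broad = [
    Syl("Das"), Syl("Spül", 3, starred=True), Syl("was", 1), Syl("ser"),
    Syl("läuft", 2), Syl("nicht", 2), Syl("ab", 2, focus_boundary_after=True),
]
# (26a) narrow focus: Das [F SPÜL]wasser läuft nicht ab.
narrow = [
    Syl("Das"), Syl("Spül", 3, starred=True, focus_boundary_after=True),
    Syl("was", 1), Syl("ser"), Syl("läuft", 2), Syl("nicht", 2), Syl("ab", 2),
]
```

With these inputs the domain for the broad focus case runs from Spül to läuft, whereas in the narrow focus case it contains the focused syllable only, which mirrors the difference between the slopes in (27) and (26).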
SUMMARY
- The L* pitch accent is directly linked with a notion of background information.
- The L* + H pitch accent has a special function, too. Associated with a background constituent in the initial field it signals topicality.
- For H* and H* + L the picture is unclear.
- The optional high initial boundary tone can be used to signal broad focus in certain environments.
- Enlarged pitch range realizing pitch accents is used to signal narrow-focused constituents.
- The spreading of unstarred tones is restricted to signalling narrow focus.
I do not wish to argue totally against focus projection, but in many cases the ambiguity stated at the accentual level is solved as soon as we spell out the intonational contour of the sentences under examination in more detail and do not restrict the analysis to the underlining or capitalizing of certain words or syllables to mark an unspecified intonation centre.
SUSANNE UHMANN
BUGH Wuppertal
FB4 Sprach- und Literaturwissenschaften
Postfach 100127
5600 Wuppertal 1
Germany
NOTES
1 cf. Chomsky (1971), Fuchs (1976, 1980), Halliday (1967a, 1967b), Höhle (1982), Jackendoff (1972), Klein & v. Stechow (1982), Ladd (1978), Paul (1880).
2 'Das fundamentale Problem bei allen Untersuchungen zum Fokusphänomen ist es, Regeln zur Fokusprojektion zu finden' ('The fundamental problem in all investigations of the focus phenomenon is to find rules for focus projection') (Höhle 1982: 99).
I shall now summarize my findings. On very different levels of tonal realization speakers have the possibility to disambiguate focus structures and they make use of these possibilities quite regularly:
9 The obligatory final boundary tone (L%) is neglected in the following examples.
10 The data were collected during my work in a DFG research project at the University of Konstanz directed by Arnim von Stechow.
11 cf. for example Chafe (1976), Jacobs (1982, 1984) and Reinhart (1981).
12 cf. Altmann (1981) for a detailed discussion of German examples.
13 The term 'emphasis' must not be understood here to correspond to Bolinger's (1958, 1985), Carlson's (1983) or Taglicht's (1982, 1984) use of the word. Under the heading of 'highlighting' Bolinger uses emphasis more or less in the same way as focus is used here.
14 For a detailed discussion cf. Uhmann (1987: chapter IV).
REFERENCES
Altmann, H. (1981), Formen der 'Herausstellung' im Deutschen, Niemeyer, Tübingen.
Bolinger, D. (1958), 'Stress and information', American Speech, 33: 5-20.
Bolinger, D. (1985), 'Two views of accent', Journal of Linguistics, 21: 79-123.
Carlson, L. (1983), Dialog Games: An Approach to Discourse Analysis, Dordrecht.
Chafe, W. (1976), 'Givenness, contrastiveness, definiteness, subjects and topics', in C. Li (ed.), Subject and Topic, Academic Press, New York, 27-55.
Chomsky, N. (1971), 'Deep structure, surface structure and semantic interpretation', in D. Steinberg & L. Jacobvitz (eds), Semantics: An Interdisciplinary Reader in Philosophy, Cambridge University Press, Cambridge, 183-216.
Fuchs, A. (1976), '"Normaler" und "kontrastiver" Akzent', Lingua, 38: 29-312.
Fuchs, A. (1980), 'Accented subjects in "all-new" sentences', in Wege zur Universalienforschung (Festschrift für H. J. Seiler), Narr, Tübingen, 449-61.
Giegerich, H. J. (1985), Metrical Phonology and Phonological Structure: German and English, Cambridge University Press, Cambridge.
Goldsmith, J. (1976), 'An overview of autosegmental phonology', Linguistic Analysis, 2: 23-68.
Goldsmith, J. (1981), 'English as a tone language', in D. Goyvaerts (ed.), Phonology in the 1980s, E. Story-Scientia, Gent, 287-308.
Gussenhoven, C. (1983), 'Focus, mode and the nucleus', Journal of Linguistics, 19: 377-417.
Halliday, M. A. K. (1967a), 'Notes on transitivity and theme in English', Journal of Linguistics, 3: 37-81, 199-244.
Halliday, M. A. K. (1967b), Intonation and Grammar in British English, Mouton, The Hague.
Hayes, B. (1983), 'A grid-based theory of English meter', Linguistic Inquiry, 14: 357-93.
Hayes, B. (1984), 'The phonology of rhythm in English', Linguistic Inquiry, 15: 33-74.
Höhle, T. (1982), 'Explikation für "normale Betonung" und "normale Wortstellung"', in W. Abraham (ed.), Satzglieder im Deutschen, Narr, Tübingen, 75-152.
3 For detailed discussion cf. Uhmann (1987).
4 cf. Karttunen & Peters (1979) and Jacobs (1984) for the subsumption of absolute focus under relative focus.
5 cf. Fuchs (1976, 1980).
6 For details cf. Uhmann (1987) and Uhmann (1991).
7 cf. Selkirk (1984); I have adopted her view and tried to spell it out for German in Uhmann (1987).
8 The discussion is still open. Selkirk (1984) and Prince (1983) are voting for grid notation only (I am using grids only, too), whereas Kiparsky (1979), Selkirk (1980) and Giegerich (1985) are basing their analysis on trees only. Liberman & Prince (1977) and Hayes (1983, 1984) argue for the necessity to use grids and trees.
Jackendoff, R. S. (1972), Semantic Interpretation in Generative Grammar, MIT Press, Cambridge.
Jacobs, J. (1982), Syntax und Semantik der Negation im Deutschen, Fink, Munich.
Jacobs, J. (1984), 'Funktionale Satzperspektive und Illokutionssemantik', Linguistische Berichte, 91: 25-58.
Karttunen, L. & S. Peters (1979), 'Conventional implicature', in C. K. Oh & D. A. Dinneen (eds), Syntax and Semantics 11: Presupposition, Academic Press, New York, 1-56.
Kiparsky, P. (1979), 'Metrical structure is cyclic', Linguistic Inquiry, 8: 421-42.
Klein, W. & A. von Stechow (1982), 'Intonation und Bedeutung von Fokus', Arbeitspapiere des SFB 99, Nr. 77, Konstanz.
Ladd, D. R. (1978), 'The structure of intonational meaning', Ph.D. Diss., University Microfilms International, Ann Arbor.
Ladd, D. R. (1983), 'Even focus and normal stress', Journal of Semantics, 2: 157-170.
Liberman, M. & A. Prince (1977), 'On stress and linguistic rhythm', Linguistic Inquiry, 8: 249-336.
Paul, H. (1960), Prinzipien der Sprachgeschichte, Darmstadt (Wiederabdruck der 5. Auflage 1920, 1. Aufl. 1880).
Pierrehumbert, J. (1980), 'The phonology and phonetics of English intonation', Ph.D. Diss., MIT (MS).
Prince, A. (1983), 'Relating to the grid', Linguistic Inquiry, 14: 19-100.
Reinhart, T. (1981), 'Pragmatics and linguistics: an analysis of sentence topics', Philosophica, 27: 53-94.
Selkirk, E. (1980), 'On the role of prosodic categories in English word stress', Linguistic Inquiry, 11: 563-605.
Selkirk, E. (1984), Phonology and Syntax: the Relation Between Sound and Structure, MIT Press, Cambridge.
Taglicht, J. (1982), 'Intonation and the assessment of information', Journal of Linguistics, 18: 213-30.
Taglicht, J. (1984), Message and Emphasis: On Focus and Scope in English, Longman, London.
Turner, R. (1976), 'Utterance positioning as an interactional resource', Semiotica, 17: 233-54.
Uhmann, S. (1987), 'Fokussierung und Intonation: eine Untersuchung zum Deutschen an Frage/Antwort-Sequenzen in experimentellen Dialogen', Konstanz (Diss.) (MS).
Uhmann, S. (1988), 'Akzenttöne, Grenztöne und Fokussilben: zum Aufbau eines phonologischen Intonationssystems für das Deutsche', in H. Altmann (ed.), Intonationsforschungen, Niemeyer, Tübingen, 65-88.
Uhmann, S. (1991), Fokusphonologie: Eine Beschreibung standarddeutscher Intonationskonturen im Rahmen der nichtlinearen Phonologie, Niemeyer, Tübingen.
Journal of Semantics 8
© N.I.S. Foundation (1991)
Intonation and Contrast
DIETER WUNDERLICH
University of Düsseldorf
Abstract
THE PROBLEM
In this section, I will state the problem more precisely. A particular instance where the bridge contour arises is given by (1) when uttered as an alternative question. In this case, the two alternatives, Tee and Kaffee, are separately focused and they must be assigned a pitch accent. The intonational contour which is produced will always result in a bridge. (2) shows a bridge in its abstracted form.
In this paper I will deal with one important aspect of the relationship of syntactic structure and intonational structure. Syntactic structure is organized hierarchically and may involve some co-indexing between parts of it, whereas intonational structure is organized linearly and from left to right. I shall argue that in matching these two different kinds of structure, one needs an interface level which I will call the level of contrast. The particular idea I propose is that on this level syntactic information is used to form a structure of so-called contrast phrases which is purely right-branching, and that it is this contrast structure on which pitch assignment rules apply in order to yield the lay-out of intonational structure. Because in German syntax right-branching dominates over left-branching, the contrast structure often preserves the properties of syntactic structure. But there are also clear cases with different structures at the two levels. The empirical background comes from some observations made in a former experimental project on German intonation (see Wunderlich 1988). In that project intonation contours were systematically varied by resynthesis on the basis of natural utterances, and these synthesized stimuli were then judged in perception tests. We are planning to study the predictions outlined in this paper in a similar experimental framework. For several reasons, however, these experiments have not yet been performed. Therefore, the study is still in a preliminary state. The most important observation concerns a particular type of intonation contour in German which may be called a 'bridge' (somewhat similar to the 'hat' in Dutch, which has been investigated by the Eindhoven group). It has been shown elsewhere that the bridge contour is composed of two successive pitch accents, the first one rising with a following high level, the second one falling. What has been puzzling are the conditions under which this bridge contour is realized.
It can be produced under quite different circumstances. The idea put forth in this paper is that the two pitch accents which complement each other in the bridge signal a contrast between two focus domains on which the bearers of the accent can be projected.
240 Intonation and Contrast
(1) Willst du Tee oder Kaffee? ('Do you want tea or coffee?')
(2) [Tone diagram for (1): H* H ... L*]
(3) s* → s* / __ ?
         |
         H  H
(4) s* → s* / H __
         |
         L
In (3), the first H is associated with the focus syllable (hence an H*), whereas the second H is a floating tone at this point. According to general principles, the floating tone will be associated with each syllable until it meets another preassociated syllable; in other words, it floats up to the next s*, or, alternatively, up to a right boundary. (4) states that a pitch accent becomes Low in an immediately preceding High context. Both rules are needed for independent reasons. (4) has to be applied if a phonological phrase starts with a High boundary tone and an early pitch accent. And (3) accounts for the effects of iteration. The utterance of an alternative question as in (5), which presents more than two alternatives exhaustively, will show up with an intonational pattern as in (6).
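The interplay of (3), (4) and the floating-tone convention can be illustrated by a minimal Python sketch. It is an illustration only, not part of the original analysis: the function name `assign_tones`, the representation of an utterance as a number of syllables plus focus indices, and the explicit set `left_piers` (which stands in for the still open context condition of (3)) are assumptions made for the example. High boundary tones are not modelled.

```python
def assign_tones(n_syllables, focus, left_piers):
    """Return one tone label per syllable (hypothetical helper).

    n_syllables: length of the utterance in syllables
    focus:       indices of the focus syllables (s*)
    left_piers:  those focus indices that undergo rule (3)
    """
    tones = ["L"] * n_syllables      # L is the default for non-focused syllables
    floating = False                 # is a floating H currently spreading?
    for i in range(n_syllables):
        if i in focus:
            high_context = i > 0 and tones[i - 1] in ("H", "H*")
            if i in left_piers:
                tones[i] = "H*"      # rule (3): H* plus a floating H
                floating = True
            elif high_context:
                tones[i] = "L*"      # rule (4): Low after a High context
                floating = False
            else:
                tones[i] = "H*"      # default pitch accent
                floating = False
        elif floating:
            tones[i] = "H"           # floating H spreads up to the next s*
    return tones


# (1): Willst du TEE o-der KAF-fee, focus syllables at positions 2 and 5
print(assign_tones(7, [2, 5], {2}))
```

Under these assumptions the alternative question in (1) comes out as a bridge, and adding further left piers, as in (5), simply iterates the H* H pattern before the final fall.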
As has been shown in Wunderlich (1988), German intonation can be described sufficiently by assuming the two level tones High (H) and Low (L). Pitch accents are assigned to focus syllables (s*), which are usually syllables with a word stress and projected onto a particular focus domain. Tones which are realized on the focus syllable itself are marked by a star. Other tones, which serve to lay out general characteristics of the contour (e.g. the so-called phrasal tones), are unmarked. (There are also boundary tones, which play only a marginal role in this paper.) The bridge in (2) presents an intonational gestalt in which the two pitch accents complement each other. One may consider this bridge a single unit, but then one would have to assume rather complex rules which map syntactic information onto intonational information. More promising is an approach which decomposes the bridge into two parts which are independently motivated. I propose two different pitch assignment rules which together yield the bridge in (2).
Dieter Wunderlich 241
(5) Willst du Bier oder Wein oder Saft? ('Do you want beer, or wine, or juice?')
Kommst du am Sonntag, am Montag oder am Dienstag? ('Will you come on Sunday, on Monday, or on Tuesday?')
(6)
In the case of (5), the syntactic structure of the coordinated phrase conforms with (7), at least in the most common view. But this need not be the case in other occurrences of the bridge contour. The problem we are confronted with can now be stated more precisely: what is the relevant context condition of rule (3) (which has been marked by a question mark)? Let us investigate this question on a number of points. First, it should be clear that the context cannot be empty. The default pitch accent in German is simply H*, i.e. rising on the focus syllable with a subsequent fall (because L is the default value for non-focused syllables). Second, in a case such as (1, 2) one may simply say that the first focus syllable s* globally 'sees' the second one, and therefore the first pitch accent assignment can be carried out within this greater domain. At the level of the syntactic structure, it is certainly true that the first focus syllable can 'see' the second one. But if we assume pitch accent assignment to be a lower-level rule, we expect it to be much more locally restricted in its left-to-right processing. From our experiments we know that the assignment of an L*-accent in view of a High right boundary only occurs in a context of at most three to five syllables. But the distance between the two focus syllables in a bridge contour can be much greater. And, bearing in mind the possibility of iterating the left pier as in (5,6), the view that the first left pier can 'see' the final fall must be rejected. Third, the context must be stated generally enough to include all the
(6) shows a clear left-to-right asymmetry. A sequence of left piers, all of the pattern H*H as assigned by (3), is followed by a final fall at the right corner. If one enlarges the number of alternatives, one gets iteration of the left pier (unless the speaker prefers some additional grouping). Patterns such as in (6) can easily be induced on a right-branching structure. We can abbreviate a syllable which undergoes (3) as s1, and a syllable which undergoes (4) as s2. Then the structure in which the pitch accents are assigned is an instance of the following schema.
different occurrences of the bridge contour. Considering only examples of alternative questions, one may be prompted to use the syntactic properties of coordination to formulate the context condition. The bridge contour arises, however, in a very large set of diverse cases which have nothing in common syntactically. We need, therefore, some level on which one can abstract from the peculiarities of syntactic structure. What we want to know is the proper label of the bracketings illustrated in (7).
SOME EXAMPLES OF BRIDGE PATTERN CONSTRUCTIONS
(a) 'Out of the blue' utterances:
(8) (Weißt du schon das Neueste?) Der BUNdeskanzler ist zuRÜCKgetreten. ('Do you know the latest news? The chancellor has stepped down')
(b) Proverbs:
(9) ErLAUBT ist, was geFÄLLT. ('What pleases is permitted')
SCHENken heißt ANgeln. ('To give means to fish')
Wie du MIR, so ich DIR. ('As you to me, so I to you')
(c) Contrast of two adjacent phrases:
(10) Peter KOMMT, Anna GEHT. ('Peter is coming, Anna is going')
(d) Alternative questions: see above
(e) Lists:
(11) ..., DREIzehn, VIERzehn, FÜNFzehn, SECHzehn. ('... thirteen, fourteen, fifteen, sixteen')
(f) Different syntactic bracketings:
(12) (zwei mal DREI) plus VIER ('Two times three plus four')
er (zog und verLOR) den SPRINger ('He moved and lost the knight')
(13) ZWEI mal (DREI plus VIER)
er ZOG und (verLOR den SPRINger)
The bridge pattern is a very common intonation contour in a large set of quite different structures. In some of these it is the only option (as in the exhaustive alternative question), in others it is one of several possible options. In the following examples the focus syllables are marked by capitals. Except for the last focus syllable with a falling pitch accent according to (4), all preceding ones must be (or at least can be) realized by the pitch accent assigned by (3).
The possible bracketings can be disambiguated by means of the intonation contour only. Pauses or decompositions into intonation phrases with final lengthening are additional means which may or may not be used. In (12), the first element (zwei or zog, respectively) is normally focused, too; it is, however, not realized by an H*H contour but by a default H* only.
(g) Syntactic extractions:
(14) WANN (glaubst du (daß Peter gesagt hat (daß Kate kommt)))
* * *
('When do you believe that Peter said that Kate will be coming')
(h) Gapping:
(15) Maria hat ANna besucht und PEter HEINrich ('Maria visited Anna, and Peter Heinrich')
Ich sehe ihn HÄUfiger als PEter HEINrich ('I see him more often than Peter Heinrich')
Here again, more options in the intonation pattern are possible, but each of them will realize the gapped clause in a bridge contour. The most common alternatives focus on the counterparts of the bearers of the bridge within the first clause; these can be realized by either a default pitch accent (H*), or by a left pier (H*H).
This list is by no means exhaustive. But it comprises quite a representative set of cases which forces us to look for a general solution. The question is, what is common to all these examples? And how can we determine in a principled way the conditions under which the bridge pattern (or a sequence of left piers with a final bridge) occurs? By now it should be obvious that reference to syntactic structure alone is not sufficient. The list in (11) can easily be given a right-branching structure as required by (7). The same is true of the bracketings in (13), compared with those in (12). (13) fulfils the structure of (7), whereas (12) does not, and consequently, the (optional) first focus syllable in (12) can in no case be realized by a left pier (H*H). But in a case such as (14), the syntactic structure is much too
In such an utterance as (14), the fall can take place in either one of the marked clauses, usually on the finite verb. As has been tested experimentally (by resynthesized contours and perception judgements), this correlates very strongly with the intended interpretation. The wh-word wann can be related to either one of the marked clauses (which leads to different meanings). Consequently, the bridge intonation can relate the extracted phrase to a representative of the clause which contains its trace. Clearly, (14) can be uttered with quite a lot of different intonational patterns. However, the strongest correlation with the intended interpretation is induced by the bridge contour.
rich to explain the occurrence of a bridge contour which correlates the extracted phrase with the domain of its trace. And in the case of gapping, it is hard to envisage the relevant syntactic structure to be right-branching, although gapping is a phenomenon in which left-to-right asymmetry plays an important role. What we have to look for is a level at which some of the syntactic structure is preserved, some is suppressed, and some modified.
THE CORE OF AN EXPLANATION
In this section, I will try to generalize the notion of focus. It has been assumed that pitch accent is assigned to individual focus syllables, and that a focus syllable is the representative (or exponent) of a particular focus domain. There are different sources of focus; it can be assigned by lexical means (particles such as sogar 'even', nur 'only') or by syntactic means (cleft-sentence construction). But in general, focus has been considered a phenomenon of new vs. old information. This view includes the presupposition-bearing means just mentioned as well as free-choice focus, where one part of an utterance involves a particular choice of new information against a background of given information. We may say that the new information contrasts with all other alternatives which were possible under a given information base. Just now we have used the notion of contrast in order to describe what kind of phenomenon focus is. Along this route it seems possible to generalize the traditional notion of focus such that it turns out to be a subcase of contrast. Indeed, the notion of contrast seems helpful in covering all the different constructions presented in the previous section. In the alternative question, two or more alternatives are mentioned which contrast with each other regarding a choice. In a pair of adjacent phrases such as in (10), the particular chosen values of a person and an activity contrast with each other. Concerning lists, we may say that each of the enumerated elements contrasts with the respective end of the list. (By taking this view, we immediately construct the list as a right-branching structure.) In the examples of different syntactic bracketings presented in (12) and (13), we find a more complex interplay between list aspects, different values of a syntactic category and its thematic complements.
For instance, zwei mal (drei plus vier) can be viewed as a small list of two elements, the first contrasting with the second one, which again can be viewed as a small list. In er zog und (verlor den Springer), we oppose different values of a VP to each other, which again make up a small list; and regarding the second VP, verlor den Springer, we may oppose a value of V to a value of its thematic complement (see below). In syntactic extractions such as those presented by (14), the extracted phrase contrasts with its trace (or more exactly, with the domain of its trace, since it
offers a possible completion within that domain). In the case of gapping, it is again values of a thematic category which contrast with each other (see below). In a single focus constituent, exactly one alternative (out of a set of possible choices) is expressed. This differs from the contrast construction in which more than one alternative is expressed. If an utterance is made within a particular context, it augments (or replaces) the information given so far. It is therefore quite natural to expect only one focus domain in this utterance. But what about 'out of the blue' utterances? By definition, there is nothing in the context they can rely on. Therefore the first constituent(s) of the sentence itself lay the information base for the next ones. Take, for instance, example (8) from above: der Bundeskanzler ('the chancellor') is a possible choice out of a set of alternative topics, and the same is true for the predicate ist zurückgetreten ('has stepped down'), which is a choice out of a set of alternative predicates for the given topic. These particular values of a thematic predicate-argument construction can be contrasted. It seems to be a general feature of linguistic communication to reduce possible alternatives step by step, and to mark the exponents of the choices by contrast. Proverbs, which constitute a special class of complete texts, display this general feature quite obviously. In general, proverbs have a parallel structure and contrast the respective exponent values with each other; see the examples in (9). Since it seems in principle to be possible to cover all instances of the bridge contour by the notion of contrast (even if the way we are doing this could be made more precise), I propose to consider this notion as being central to mapping syntactic structure onto intonational structure. In its broader sense, contrast also includes focus. An utterance (or intonation) phrase may have one focus domain (traditional focus) or several focus domains which contrast with each other. The difference is that in the first case the contrast is with the context, whereas in the second case it is displayed within the utterance phrase. In studying focus, it has often been assumed that there is an interface level between syntax and phonology at which focus domains as well as their respective exponents, the focus syllables, are determined. My proposal is to expand the tasks of this level to the task of determining possible contrast domains, and I shall therefore speak of the level of contrast. As in the determination of focus domains, the decomposition into contrast domains (or phrases) is free unless it is required by particular lexical or structural information. This means that, in general, there are several possible options. It is obvious that syntactic and lexical information must be available at the level of contrast. In its output, information about contrast (and focus) domains and about focus syllables must be available such that the rules of pitch accent assignment can be applied. These rules belong to the level of phonology or, more precisely, to its intonation component; see (16).
(16) [Diagram: level of contrast → intonation component]
(17) a. [ ... ]K1 [ ... ]K2
b. [ ... ]K1 [[ ... ]K1 [ ... ]K2]K2
c. [ ... ]K1 [[ ... ]K1 [[ ... ]K1 [ ... ]K2]K2]K2
We are now in a position to formulate the context condition of the pitch accent assignment rule (3). The global properties of the bridge contour and its iteration by a sequence of left piers have been reconstructed by the decomposition procedure. Therefore, the context condition itself can be stated rather locally as in (20): s* must be the last focus syllable within a K1-domain.
(20) s* → s* / __ ]K1
          |
          H  H
In the configuration of two adjacent K1 phrases, we get an intonation pattern as in (21). In this pattern, the second H* cannot be distinguished from its tonal context. Therefore a bracketing into two intonation phrases with a medial Low boundary is induced.
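The local character of the context condition in (20) can be made concrete with a small Python sketch. The function name `left_piers` and the flat representation of contrast phrases as (label, focus syllable indices) pairs are assumptions introduced for illustration; the sketch only selects those focus syllables that are final within a K1-domain.

```python
def left_piers(domains):
    """Select the focus syllables that satisfy the context of (20).

    domains: contrast phrases in left-to-right order, each given as
             (label, list of focus-syllable indices); hypothetical format.
    """
    piers = set()
    for label, focus_syllables in domains:
        if label == "K1" and focus_syllables:
            # only the LAST focus syllable within a K1-domain qualifies
            piers.add(max(focus_syllables))
    return piers


# drei-zehn mal vier-zehn plus fünf-zehn: focus syllables at 0, 3 and 6
left_branching = [("K1", [0, 3]), ("K2", [6])]             # cf. (12)
right_branching = [("K1", [0]), ("K1", [3]), ("K2", [6])]  # cf. (13)
print(left_piers(left_branching), left_piers(right_branching))
```

On these assumptions the first numeral qualifies as a left pier only when it closes a K1-domain of its own, which is the tonal difference between the two bracketings.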
Up to now we have left open whether contrast domains cover all or only some parts of an utterance phrase. Regarding the different instances of the bridge contour considered in the previous section, it seems plausible to assume the first option. This means that the decomposition procedure encompasses the whole unit. The decomposition must yield a right-branching structure as in (7), which can be arrived at by iteration. In the first step, the whole utterance phrase is decomposed (from left to right) into two contrast phrases, say K1 and K2. In the next steps, K2 (but not K1) is again decomposed into K1 and K2.
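The iterative decomposition just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the input is taken to be an already chosen left-to-right sequence of phrases, and the names `decompose` and `bracket` are hypothetical helpers, not part of the proposal itself.

```python
def decompose(phrases):
    """Build a purely right-branching contrast structure.

    phrases: left-to-right list of at least two phrase strings.
    Only K2 is decomposed further; K1 always stays a terminal phrase.
    """
    if len(phrases) < 2:
        raise ValueError("a contrast structure needs at least two phrases")
    if len(phrases) == 2:
        return (("K1", phrases[0]), ("K2", phrases[1]))
    return (("K1", phrases[0]), ("K2", decompose(phrases[1:])))


def bracket(structure):
    """Render a structure in labelled-bracket notation."""
    k1, k2 = structure
    right = k2[1] if isinstance(k2[1], str) else bracket(k2[1])
    return f"[{k1[1]}]K1 [{right}]K2"


print(bracket(decompose(["dreizehn mal", "vierzehn", "plus fünfzehn"])))
```

Because K1 is never decomposed again, the result is right-branching by construction, whatever the syntactic bracketing of the input was.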
(21) [Tone diagram: two adjacent K1 phrases, each realized as H* H]
SOME APPLICATIONS
(22) a. [der Bundeskanzler]K1 [ist zurückgetreten]K2
b. [er ist]K1 [zurückgetreten]K2
[Tone tier: H H on K1, L on K2]
The other examples of simple contrasts can be treated in a similar way. More interesting are the different syntactic bracketings shown in (12) and (13), here shown in (23) and (24). (The left pier is more distinct if s* is followed by at least one unstressed syllable; therefore the monosyllabic numerals zwei, drei, etc. have been replaced by the bisyllabic numerals dreizehn, vierzehn, etc.) In the first step, decomposition can follow the syntactic bracketing. According to the rules in (17), further decomposition is not possible in (23), but it is in (24) as an option (the a-version). The first numeral in (23) can, however, be designated as an independent focus, which will then be realized by the default pitch accent (simple H* peak). The distinction between the two bracketings can thus be realized by a minimal tonal distinction on the first numeral (or the first three syllables of the utterance). This is quite an interesting fact. In left-to-right processing it can be decided at the very beginning of the utterance whether a right-branching or a left-branching syntactic structure is at issue.
Let us first consider two examples of 'out of the blue' sentences. (It goes without saying that these and other examples can be produced by different intonation patterns; I consider here only the conditions for a bridge contour.) Compare (22a) and (22b). Although they display the same syntactic structure, they must be decomposed differently into contrast phrases. The reason for this is the condition in (18). The NP der Bundeskanzler can convey thematic information (and therefore can be focused or stressed); the pronoun er ('he') cannot. Therefore, the decomposition in (22a) can follow the first syntactic branching, whereas the decomposition in (22b) is only possible at the second syntactic branching, since here the finite verb (the perfect auxiliary) can convey assertion and hence be stressed.
(23) [(dreizehn mal vierzehn)]K1 [plus fünfzehn]K2
Tones: (H) HH L
(24) a. [dreizehn mal]K1 [[(vierzehn]K1 [plus fünfzehn)]K2]K2
Tones: HH HH L
b. [dreizehn mal]K1 [(vierzehn plus fünfzehn)]K2
Tones: HH L
(25) ((dreizehn mal vierzehn) plus fünfzehn) mal sechzehn
Tones: (H) HH HH L
(26) dreizehn mal (vierzehn plus (fünfzehn mal sechzehn))
Tones: HH HH HH L
(27) (dreizehn mal vierzehn) plus (fünfzehn mal sechzehn)
a. Tones: (H) HH HH L
b. Tones: HH L
Similarly, the distinction between a left-branching in (25) and a right-branching in (26) can minimally be signalled by the tonal contour on the first numeral. Here, however, the contour in (25) can be conflated with one of the options in (27). This makes it clear that in more complex cases the intonation contour cannot uniquely distinguish the bracketings on its own; it must be accompanied by some additional means (such as pauses or final lengthening).
Let us finally consider syntactic gapping. There are several well-known ways to describe gapping: by deletion of lexical material, by a parallel syntactic structure with phonologically empty categories, or by reconstruing a full clause structure by means of contrast properties. If we restrict ourselves to simple cases of gapping as in (28), in which the second clause is 'reduced' to a sequence of NPs, we may roughly say the following. A sequence of syntactic constituents is grammatical (and thus projects onto a full clause; this may be handled semantically or syntactically) if and only if, for every constituent, a counterpart with the same case features and, moreover, a common governor of case exist in the immediately preceding context. Let us introduce the syntactic feature [+K], which designates a contrast domain that must be mapped onto the intonation component. In (28), [+K] must be present in order to make the final sequence of NP constituents grammatical. In the reconstrual, the
respective counterparts will be marked by [+C] and co-indexed with the [+K]-constituents. As we can see from (28), the two contrasts involved cross each other. There can be no intonation contour (processed from left to right) which accounts for the two contrasts separately.
(28) Maria hat Anna besucht und Peter Heinrich.
(29) (i) A sequence of constituents [XP, +K]* is grammatical and projects onto S iff for every [XP, +K] a case-identical counterpart [XP, +C] can be found within the preceding adjacent context.
(ii) On the level of contrast, every [+K]-constituent must be included in a phrase of type K2.
(iii) For every [+K]-constituent, a designated focus syllable s* must exist.
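Condition (29i) lends itself to a small Python sketch. Everything beyond the condition itself is an assumption made for illustration: the `Constituent` record, the case labels, the function name `pair_counterparts`, and the simplification that each counterpart is used only once. The requirement of a common governor of case is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constituent:
    form: str
    case: str  # e.g. "nom", "acc", "dat"


def pair_counterparts(gapped, context):
    """Pair every [+K] constituent with a case-identical [+C] counterpart.

    Returns a list of (counterpart, gapped) form pairs, or None if some
    [+K] constituent has no counterpart in the preceding context.
    """
    available = list(context)
    pairs = []
    for k in gapped:
        match = next((c for c in available if c.case == k.case), None)
        if match is None:
            return None            # (29i) fails: the sequence is not licensed
        available.remove(match)    # assumption: each counterpart is used once
        pairs.append((match.form, k.form))
    return pairs


context = [Constituent("Maria", "nom"), Constituent("Anna", "acc")]
gapped = [Constituent("Peter", "nom"), Constituent("Heinrich", "acc")]
print(pair_counterparts(gapped, context))
```

For (28) the sketch pairs Maria with Peter and Anna with Heinrich, i.e. exactly the two contrasts that cross each other in the linear string.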
For (28), we get, for instance, the following two contrast structures and sequences of pitch accents.
(30) Maria[+C] hat Anna[+C] besucht und Peter[+K] Heinrich[+K]
a. [Maria hat Anna besucht]K1 [[und Peter]K1 [Heinrich]K2]K2
Tones: (H) HH HH L
b. [Maria hat]K1 [[Anna besucht]K1 [[und Peter]K1 [Heinrich]K2]K2]K2
Tones: HH HH HH L
It can easily be seen that the contrast structure (CS) in (30a) follows the syntactic structure (SS) of sentence (28), whereas the structure in (30b) does not. For convenience, these two different structures are presented again in (31):
(31) SS: ((Maria hat (Anna besucht)) (und Peter (Heinrich)))
CS: [Maria hat [Anna besucht [und Peter [Heinrich]]]]
In the following example (32) with gapping in the comparative complement, the [+C]-pronouns will normally be unstressed. Since, according to (18), K1 must contain a focus exponent, it has to be chosen on the comparative adjective. But again, (32) presents only one option. With thematic NPs instead of the first
If we map (28) onto the level of contrast, we have to ensure that each [+K]-constituent is included in a K2 phrase such that its exponent focus syllable falls into the [+K]-constituent. The syntactic counterparts, however, the [+C]-constituents, are free in both respects; they can but need not be included in a K2 phrase, and they can be left unstressed. The relevant conditions are expressed by (29).
two pronouns, there are also intonation patterns similar to those in (30) available.
(32) Ich[+C] schreibe ihm[+C] häufiger als er[+K] mir[+K]
Tones: HH HH
CONCLUSION
This paper has presented a systematic treatment of intonation patterns which are typical in German for a large set of different constructions. It has been argued that these constructions involve some kind of contrast in one way or another. Therefore, I tried to make the notion of contrast involved a little more precise. In particular, I proposed a level of contrast which interfaces between syntactic and intonation structure. The assumption of such a level is not new; it is independently needed as the level at which focus exponents are determined. What is new is the proposal to integrate considerations of contrast domains and of focus domains in a more general treatment; hence the tasks of this level have been enlarged. A particular property of contrast structure is that it is purely right-branching. This enables a left-to-right processing on the lower level of pitch accent assignment. When syntactic structure is already right-branching, this property will simply be copied. In other cases, right-branching must be induced by the proper principles of decomposition. In this respect, gapping constructions are of particular interest; the crossing of individual contrasts indicated by the co-indexing in examples such as (28) must be resolved within a left-to-right procedure. So far, only a few examples of gapping have been considered. Therefore, the principles outlined in (29) must be understood as very preliminary. Rather, they show the direction in which I expect further investigation to make some progress. The intonation patterns outlined in my examples are derived from theoretical assumptions. As far as I can see they conform to empirical observations. In any case, one should bear in mind that, in general, several options for intonation are possible. One intended aspect of my treatment was to predict a certain variety of possible intonation patterns.
It may be the case that there are still more options, but it is rather doubtful whether these options can also be specified under the notion of contrast. It is, however, clear from this study that strict experimental investigations (over and above pure observations) must still be carried out and, considering the possibility of strong predictions within the framework outlined here, such investigations promise to be fruitful.
Acknowledgements
This study was supported by a grant from the German Research Foundation (DFG) under the title 'Satzintonation und Fokusstruktur von W-Fragen im Deutschen' (Wu 86/7).
DIETER WUNDERLICH
Allgemeine Sprachwissenschaft
Universität Düsseldorf
Universitätsstr. 1
4000 Düsseldorf
Germany
REFERENCES
Wunderlich, Dieter (1988). 'Der Ton macht die Melodie: Zur Phonologie der Intonation des Deutschen', in Hans Altmann (ed.), Intonationsforschungen, Niemeyer, Tübingen, 1-40.
Journal of Semantics 8: 253-275
© N.I.S. Foundation (1991)
Intonational Focusing and Dialogue Games
JAKOB HOEPELMAN†, JOACHIM MACHATE* and RUDOLF SCHNITZER*
†IBM Germany, Scientific Centre
*Fraunhofer Institute for Industrial Engineering, Stuttgart
Abstract
The traditional conceptions concerning the problem of focus, which are known under names like 'Functional Sentence Perspective' and 'Topic Focus Articulation', leave unanswered many questions concerning the semantics and above all the pragmatics of utterances where intonational focusing is involved. But precisely these questions of pragmatics, such as the dialogue strategies of the dialogue participants, the usage conditions and dialogical functions of intonational focusing, and the dialogue context, have to be answered to arrive at an adequate description of the complex phenomenon of intonational focusing and its functioning in communication. In this paper we aim to discuss some of these problems within the framework of the so-called 'Dialogue Game Theory', a game-theoretically oriented discourse grammar, in order to develop a model for the interpretation of intonational focusing in dialogues.
1 FUNCTIONAL SENTENCE PERSPECTIVE
One of the main features of natural language communication is the fact that the speaker tries to modify or augment the information that exists for the listener. Within the scope of cooperative communication the speaker can mark by means of intonational focusing those parts of the utterance serving these communicative functions. According to Carlson (1984: 297), 'Intonational focusing is involved when parts of a sentence (phrases, individual words, or even parts of words) are specially emphasized by intonational means: intensity, pitch, duration.' In the sequel the information-augmenting function will be called 'informative focusing'; it is involved, for example, when an answer is given to a wh-question, as in the following dialogue:
(1) 'Who is the most famous painter?' '!Dali is the most famous painter.'
The information-modifying function will be called 'corrective focusing' in the sequel; it is involved, for example, when an alternative is proposed to an element of a preceding dialogue step, as in the following dialogue:
(2) 'Chagall is the most famous painter.' 'No, !Dali is the most famous painter.'
These two functions, informative and corrective focusing, are important parts of the dialogue model, where we try to find an answer to the question in which dialogue situations a dialogue participant has the right to make use of one of these functions. Traditional focus conceptions, like the above-mentioned Topic-Focus Articulation, have some difficulty in answering this question because they do not put the focus of interest on pragmatic questions but mainly on the analysis of the semantic structuring of sentences. They try to determine this semantic structuring by an operational criterion, the well-known question test.
According to Sgall et al. (1986: 210) the question test 'is based on the assumption that, for every sentence, the intuition of the native speaker will determine a unique set of questions that can be appropriately answered by the given sentence in different contexts.' For example, the sentence
(3) 'Chagall painted beautiful pictures in !Paris.'
can be an answer to any one of the following questions:
(4) 'What did Chagall do?' 'What did Chagall paint?' 'Where did Chagall paint beautiful pictures?'
The relation between the sentence to be analysed and the set of relevant questions will determine the Topic-Focus structure. In addition, Sgall et al. (1986) state that 'if a phrase A (from the "answer" sentence) occurs in no element of the set of relevant questions, then it is the focus proper of that sentence.' This condition is met in our example by the phrase 'in Paris', which can thereby be determined as the focus. As for the topic, Sgall et al. (1986) state: 'if phrase A (from the "answer sentence") occurs in every element of the set of relevant questions, then it is the topic of that sentence.' This condition is met by the proper noun 'Chagall', which is consequently, according to the question test, the topic of sentence (3).
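The two conditions quoted from Sgall et al. (1986) can be sketched as a simple classification procedure in Python. The sketch is illustrative only: the function name `question_test`, the representation of each relevant question as a set of normalized phrase labels, and the labels themselves (e.g. 'paint' for both 'paint' and 'painted') are assumptions introduced here, not part of the question test as originally formulated.

```python
def question_test(answer_phrases, questions):
    """Classify the phrases of an answer sentence by the question test.

    answer_phrases: phrase labels of the answer sentence
    questions:      one set of phrase labels per relevant question
    """
    result = {"topic": [], "transition": [], "focus": []}
    for phrase in answer_phrases:
        hits = sum(phrase in q for q in questions)
        if hits == 0:
            result["focus"].append(phrase)       # occurs in no question
        elif hits == len(questions):
            result["topic"].append(phrase)       # occurs in every question
        else:
            result["transition"].append(phrase)  # occurs in some, not all
    return result


# Sentence (3) with the relevant questions in (4)
questions = [
    {"Chagall"},                                 # What did Chagall do?
    {"Chagall", "paint"},                        # What did Chagall paint?
    {"Chagall", "paint", "beautiful pictures"},  # Where did Chagall paint ...?
]
phrases = ["Chagall", "paint", "beautiful pictures", "in Paris"]
print(question_test(phrases, questions))
```

Phrases that occur in some but not all questions fall into a third class, which is neither topic nor focus proper.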
1.1.1 Communicative dynamism
The phrases which are part of some but not all of the questions (e.g. in our example the verb 'paint') belong to the so-called transition. Thus the question test allows us to extend the bipartition to a whole hierarchy or scale, which is called the 'scale of communicative dynamism'. This 'communicative dynamism' is defined by Firbas (1975: 317) as follows:
1.1 The question test
By CD I understand a quality displayed by communication in its development (unfolding) of the information to be conveyed and consisting in advancing this development. By the degree of CD carried by a linguistic element, I understand the relative extent to which this element contributes to the further development of the communication.
1.2 Criticism of the question test
The analysis that is reached by employing the question test leads to certain difficulties that should be noticed because the question test is an important heuristic element of the traditional focus conception. The first thing to note is that the question test is carried out to analyse isolated sentences with the help of an artificially established context of questions. The step that seems to be lacking here is the explicit formal consideration of the natural context, for example, the preceding dialogue step and the formal introduction of the dialogue participants, their dialogical roles and their assumptions. These requirements become more obvious by noticing that informative and corrective focusing are dependent on the preceding dialogue step and the fact that the semantic scope of the focus is dependent on the actual dialogical context. For example, in the dialogue
(5) 'What did Chagall paint?' 'Chagall painted wonderful !pictures.'
the phrase 'wonderful pictures' can be determined as focus, whereas in the dialogue
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
The element with the highest degree of 'communicative dynamism' corresponds to the focus of a sentence, which is characterized hereby as that element which contributes most to the further development of communication. The element with the lowest degree of'communicative dynamism' corresponds to the topic and its information-preserving function. The elements that carry degrees of 'communicative dynamism' that range between the highest and the lowest belong to the transition. The scale of 'communicative dynamism' resulting from the question test serves to establish the semantic representation, called by Sgall 'deep-word order', that reflects the communicarive structuring of sentences. The sentence parts, or rather the semantic roles standing for the sentence parts, are arranged in this 'deep-word order' by increasing degrees of'communicative dynamism'. Thus the topic of a sentence, which carries the lowest degree of 'communicative dynamism', is placed in this kind of representation on the leftmost side, and the focus, which carries the highest degree of'communicative dynamism', on the rightmost side.
(6) 'What did Chagall do?'
    'Chagall painted wonderful !pictures.'

the verbal phrase 'painted wonderful pictures' constitutes the focus.

Another problem the question test has to deal with is the seeming equivalence of the concepts 'topic' and 'old information' and of 'focus' and 'new information'. Such an equivalence is forced by the way the question test picks out as the focus that element of the sentence which represents new information with regard to the set of relevant questions. That the above-mentioned concepts have to be distinguished can be shown by the following dialogue:

(7) 'What do you think of Dali?'
    'I !like this famous painter.'

The phrase 'this famous painter' is not part of the relevant question as a syntactic constituent, but it can serve as the topic because it preserves the object of reference which was introduced by the question. Furthermore, the topic characterizes the object of reference as a famous painter, and so we get as new information

(8) (The speaker assumes that Dali is a famous painter)

Hence, the topic can include old as well as new information. That the focus need not include only new information can be shown by the following dialogue (cf. Sgall et al. 1986: 58):

(9) 'Who is more famous, Dali or his wife Gala?'
    'I think !he is more famous.'

The personal pronoun 'he', as the focus of the answer, refers to an object which was already introduced into the context by the question and therefore represents old information. Another deficiency of the question test is that it can analyse only declarative sentences, not expressions for which no set of relevant questions can be found, such as questions themselves. We may conclude that the question test can indeed serve as a criterion for the analysis of isolated declarative sentences, but that it has to cope with two main problems:

1. Only isolated sentences can be analysed, without taking the natural dialogue context into consideration.
2. The dialogical functions of intonational focusing cannot be analysed in an adequate way.

As already mentioned, we try to solve these problems with the so-called Dialogue Game Theory.
2 THE DIALOGUE GAME THEORY

The Dialogue Game Theory was developed by Lauri Carlson as a discourse grammar which should be able to answer the main question of a dialogue theory, which is, according to Labov (1972: 252), 'how one utterance follows another in a rational rule governed manner, in other words, how we understand coherent discourse'. The dialogue rules that are developed within the Dialogue Game Theory constitute in Carlson's view 'a concept of appropriateness of a sentence to a given context'. This was mentioned above as a necessary condition for an adequate interpretation of intonational focusing.

2.1 Game Theoretical Semantics

One important theoretical starting point for the Dialogue Game Theory is 'Game Theoretical Semantics', developed by Jaakko Hintikka (cf. Hintikka 1976, 1985) as a game theoretical variant of truth conditional semantics. We must take a short look at this theory in order to understand the game theoretical features and characteristics of the Dialogue Game Theory. Game Theoretical Semantics serves to check the truth conditions of formal and natural language expressions by means of semantical games. These semantical games are verification procedures carried out by an ordered pair of semantical roles, called 'Nature' and 'Myself', that can be characterized by their argumentative function towards the expression to be analysed: 'Myself' has to try to verify the expression, whereas 'Nature' has the obligation to falsify the expression. The question of winning and losing depends on the game rules at the disposal of the players, which constitute their strategies. An expression, for example a natural language sentence S, counts as true 'if there exists in G(S) a winning strategy for Myself, i.e. a strategy that wins against any strategy of Nature' (Carlson 1983). To clarify this definition let us look at a short example. A set of premises is presupposed as being true, i.e. already accepted by Nature. For example:

Dali is a painter.
Chagall is a painter.
Picasso is a painter.
Rembrandt is a painter.
Dali smokes.
Chagall smokes.
Picasso smokes.

Suppose someone wants to verify a universally quantified sentence against this set of premises, namely
(10) All painters smoke.

According to the game rules for universally quantified sentences, 'Nature' tries to attack this sentence by introducing into the game a dialogue object X out of the set of premises for which the antecedent of the universally quantified sentence holds, i.e. which has to be a painter. 'Myself', who tries to defend his statement that all painters smoke, has to show for the dialogue object X chosen by Nature that the conclusion of the universally quantified sentence holds with regard to the set of already accepted premises.

Nature                     Myself
Dali is a painter
Chagall is a painter
Picasso is a painter
Rembrandt is a painter
Dali smokes
Chagall smokes
Picasso smokes
                           All painters smoke
Dali is a painter          Dali smokes
Chagall is a painter       Chagall smokes
Picasso is a painter       Picasso smokes
Rembrandt is a painter

Figure 1

One way in which the semantical game can be represented is depicted in Figure 1. We see that Myself cannot show for Rembrandt, who is a painter, that he smokes. So Myself loses the game. The strategy of Nature has won over that of Myself, and therefore the sentence has been shown to be false with regard to the set of premises and the dialogue rules for universally quantified sentences. This is a very brief look at Game Theoretical Semantics, but it should be enough to illustrate that in the procedure of a semantical game the semantical usage conditions given by the game rules are checked in relation to a given set of premises.
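The semantical game for sentence (10) can be replayed in a few lines. The sketch below is only an illustration under simplifying assumptions: the premises are reduced to two sets of individuals, Nature is assumed to try its candidates in alphabetical order, and the function name `play_universal` is hypothetical.

```python
# Premises already accepted by Nature, reduced to two sets of individuals.
PAINTERS = {"Dali", "Chagall", "Picasso", "Rembrandt"}
SMOKERS = {"Dali", "Chagall", "Picasso"}


def play_universal(antecedent, consequent):
    """Play the game for 'All <antecedent> <consequent>'.

    Nature picks individuals satisfying the antecedent; Myself must show that
    the consequent holds for each of them. Returns (myself_wins, counterexample).
    """
    for individual in sorted(antecedent):   # Nature's candidate attacks
        if individual not in consequent:    # Myself has no defending premise
            return False, individual
    return True, None


myself_wins, counterexample = play_universal(PAINTERS, SMOKERS)
```

With the premises above Nature wins by choosing Rembrandt, so `myself_wins` is `False` and (10) counts as false relative to the premises.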
2.2 Differences between Semantical Games and Dialogue Games

In order to use game theoretical considerations for the description of dialogues, as in Dialogue Game Theory, Game Theoretical Semantics has to undergo some modifications. The relation between the two types of game theory is defined by Jaakko Hintikka (cf. Barth & Martens 1982: 5) in the following way:

A semantical game, however, is not a parlour game at all, but an 'outdoor game' more like hide-and-seek, and should not be studied as a variety of conversation. Rather conversation, at least part of conversation, should be understood as rooted in semantic activities... consisting in non-verbal games against Nature.

In principle the following distinctions between semantical games and dialogue games can be made:

1. The moves in a dialogue game consist of complete utterances put forward by the dialogue participants in the course of the dialogue. This property is necessary for our aim because it allows us to describe intonational focusing as dependent on preceding dialogue steps.
2. The number of dialogue participants is arbitrary, but it will be fixed to two participants in our model for the sake of simplicity.
3. The participants in dialogue games hold no fixed roles, in contrast to Nature and Myself in semantical games.
4. Semantical games are 'truth-seeking games', i.e. the process of verification is the centre of interest, as we saw in the example with the universally quantified sentence. In contrast to this there are at least two variants of dialogue games: critical games, in which the tenability of a statement against a set of premises is investigated; and agreement-seeking games, in which the dialogue participants try to agree with respect to the information under discussion, which is no longer a truth conditional but a communicative aim.
5. Dialogue games take care of the dialogue context, in its explicit form, i.e. the utterances, and in its implicit form, i.e. the assumptions which are entertained by the dialogue participants.

2.3 Intonational focusing and Dialogue Games

These properties of the Dialogue Game Theory can serve to describe intonational focusing as a phenomenon of discourse. According to Carlson (1983), the
meaning of intonational focusing can hardly be described in terms of truth conditions. The difference in meaning of utterances that differ only in intonational focusing, as, for example,

(11) 'Dali was a famous !painter.'

and

(12) '!Dali was a famous painter.'

can be taken into account if the utterances are used as moves in a dialogue game with different usage conditions. Such usage conditions depend, for example, on the implicit presuppositions the utterances are based on. The concept of presupposition here is not that of logical presupposition, that is, that part of an utterance that is constant under negation. It is pragmatically defined as that part of an utterance which the speaker thinks is believed, accepted or entertained by himself and by his dialogue partner. As we saw before, it is difficult for the question test to offer an adequate description of the dialogue:

(13) 'What do you think of Dali?'
     'I !like that famous painter.'

because the constituent 'that famous painter' is not part of the question. Now we can say that a usage condition for this part of the utterance is that the speaker assumes that the hearer believes or entertains the idea that Dali is a famous painter, i.e. that he presupposes this information.

2.3.1 A simple model

Usage conditions concerning presuppositions as well as those of the focus of an utterance can be treated in the Dialogue Game Theory in a systematic way. For this purpose the dialogue can be represented by a board which, according to Carlson (1984: 300), is a two-column list of sentences, each side listing sentences successively uttered or written down by one of the dialogue participants. This is the list of the explicit dialogue moves made in the game. In addition each player keeps a private list, not seen by the other player. In this list, again on one side, are entered the player's own assumptions at each stage of the game. On the opposite side, the player enters assumptions that he takes the other player to make or have made in the game.

With this we get the representation shown in Figure 2. The abbreviations stand for

$\mathrm{AL}_A$: assumption list of speaker A
$\mathrm{AL}_B$: assumption list of speaker B
$\mathrm{INF}_A$: information that A holds
$\mathrm{INF}_{A(B)}$: information that A believes that B holds
$\mathrm{INF}_B$: information that B holds
$\mathrm{INF}_{B(A)}$: information that B believes that A holds
$\mathrm{ED}_A$: explicit dialogue move of A
$\mathrm{ED}_B$: explicit dialogue move of B

[Figure 2: board showing the assumption list $\mathrm{AL}_A$ (columns $\mathrm{INF}_A$, $\mathrm{INF}_{A(B)}$), the explicit dialogue moves ED (columns $\mathrm{ED}_A$, $\mathrm{ED}_B$) and the assumption list $\mathrm{AL}_B$ (columns $\mathrm{INF}_B$, $\mathrm{INF}_{B(A)}$)]

Figure 2 A simple model

Any state of a dialogue, a dialogue situation D at stage x, can be represented by a 3-tuple of the following form:

$$D_x = \langle \mathrm{AL}_A, \mathrm{ED}, \mathrm{AL}_B \rangle$$

This representation can be extended to:

$$D_x = \langle \langle \mathrm{INF}_A, \mathrm{INF}_{A(B)} \rangle, \langle \mathrm{ED}_A, \mathrm{ED}_B \rangle, \langle \mathrm{INF}_B, \mathrm{INF}_{B(A)} \rangle \rangle$$

But it is characteristic of natural language dialogues that the speaker's assumptions about the information that the hearer holds are not always the same as the information the hearer holds in reality, and in this way presuppositions can go wrong. One of the functions of a dialogue consists in correcting such wrong presuppositions. Our dialogue example

'What do you think of Dali?'
'I !like that famous painter.'

can be continued by

'But I don't !think he is a famous painter.'
and hence the assumptions have to be changed in a corresponding way. We see that it is necessary for our dialogue model to distinguish between:

1. the information the speaker holds;
2. the information the speaker believes the hearer holds; and
3. the information the speaker thinks is shared by himself and the hearer.

The third information set can serve as an extension of the representation of the dialogue situations, as the column SI of shared information, where we have to distinguish between the two dialogue participants' assumptions concerning this shared information. If we add the set of shared information to the description of the dialogue situations we get the following representation:

$$D_x = \langle \langle \mathrm{INF}_A, \mathrm{INF}_{A(B)}, \mathrm{SI}_A \rangle, \langle \mathrm{ED}_A, \mathrm{ED}_B \rangle, \langle \mathrm{INF}_B, \mathrm{INF}_{B(A)}, \mathrm{SI}_B \rangle \rangle$$

2.3.2 Some dialogue rules

To obtain a systematic analysis of the dialogical behaviour of the dialogue participants and the development of dialogue games, Carlson (1984) distinguishes two extreme ways in which dialogue games can be played, the so-called 'cooperative game' and the 'competitive game'. Each kind of game is connected with specific dialogical aims, which Carlson defines as follows:

The aim of the players in the cooperative game is to make their private lists match each other, to impart their own privileged information to the opposite side and to enrich their own assumptions by means of items on the list of their interlocutor.

The aim of the players in the competitive game is optimally satisfied if the opponent is forced to unilaterally give up his conflicting assumptions and accept the other's view.

We will see that the dialogical rights and the dialogue strategies of the dialogue participants depend on the type of game that is played. According to Carlson, the dialogue players' strategies are in principle constituted by the following types of dialogue moves that can be carried out by the dialogue participants:

1. Initial moves: dialogue moves which open a new dialogue or a new dialogue topic, e.g. (D. say) 'A player may assert an assumption of his.'
2. Countermoves: dialogue moves which are a reaction to earlier steps in the dialogue, e.g. (D. reply) 'When a player has put forward an assertion, the other player(s) may choose to accept it, deny it, or (just) to acknowledge it (e.g. by prompting the interlocutor to continue).'
3. Continuation moves: dialogue moves which are a continuation of preceding dialogue moves, e.g. (D. add) 'A player may add an assertion to an assertion he has already asserted.'
One type of these dialogue moves, the 'countermove', is closely connected to intonational focusing, or 'emphasis', as intonational focusing is called in the following quotation from Carlson (1984): 'The function of an emphasis is nothing else than to relate a dialogue move as a countermove to an explicit move or an implicit assumption entered in the dialogue. The focusing intonation thus serves as a pointer to an earlier step in the dialogue.' So we get, according to Carlson, the following dialogue rule employing simple surface structural representations:

(D: emphasis) When a player has put forward a sentence of the form
(i) X-A-Y
where A receives special emphasis, the listener may look for a sentence of the form
(ii) X-B-Y
among the sentences on the board or on his private record. If such a sentence is found, the listener may construe (i) as a countermove by some noninitial dialogue rule to (ii).

If the listener cannot apply this rule, for example, if there is no sentence of the form X-B-Y to which he can relate X-A-Y as a countermove, he can nevertheless conclude that the speaker has some false assumptions concerning the information on the listener's side. This is a useful rule of interpretation when listening to an utterance in which intonational focusing is involved. But if we want to describe the usage conditions of intonational focusing, we also have to develop rules describing the use of intonational focusing from the perspective of the speaker. And we have to pay attention to the communicative tasks of intonational focusing that were specified before as informative and corrective focusing.

2.3.3 Corrective focusing by constituent negation

A special kind of corrective focusing consists in constituent negation, which was described in Gabbay & Moravcsik (1987: 251) as one of the main functions
of natural language negation: 'Thus the point of denial in most typical contexts is not only to pose a contradictory to some proposition, but to claim that something is wrong with a proposition, and to indicate, in so far as possible, which is the objectionable item.' Indicating this objectionable item is precisely the task of intonational focusing, as the following examples will show. In the utterance

(15) 'Dali isn't a !musician.'

the negation refers to 'musician', whereas in the utterance

(16) '!Dali isn't a musician.'

the negation refers to 'Dali', while both utterances have the same propositional content. A feature of cooperative communication is that the speaker has to be able to propose an alternative for the negated element. So the utterance 'Dali isn't a !musician.' can be continued by

(17) 'He is a !painter.'

and the utterance '!Dali isn't a musician.' can be continued by

(18) '!Beethoven is a musician.'

So to employ constituent negation, at least two dialogical conditions have to be met:

1. There has to be an 'objectionable item' as part of an utterance of one of the dialogue participants. In our dialogue model it has to be part of the explicit dialogue moves.
2. There has to be an alternative to that 'objectionable item' on the side of the hearer's assumptions.

But these two conditions have to be supplemented by an important third one. It makes no sense, for example, to make a countermove to the utterance

(19) 'Bush is a republican.'

by using constituent negation and uttering

(20) 'No, the president of the !U.S. is a republican.'

The objectionable item and the alternative have to be, according to Gabbay and Moravcsik, 'incompatibles'. That means they have to exclude each other with regard to the given context. This condition is not met in the above-cited example, in contrast to the next one:

(21) 'Dali is the most famous painter.'
     '!Dali isn't the most famous painter, !Chagall is the most famous painter.'
Because only one person can be the most famous painter, 'Chagall' and 'Dali' are in this context in the relation of being incompatibles. All elements that meet this condition are called the 'range of incompatibles'. How this set is constituted in the course of the dialogue is too complicated a question to be discussed in detail now. So as a third condition for the right to use constituent negation we will only add:

3. The alternative has to be an element of the range of incompatibles according to the listener concerning the objectionable item and the corresponding context.

Now we can say that as a dialogical consequence of these three conditions the hearer has the dialogical right to use constituent negation. We think it plausible that constituent negation without immediately revealing the alternative is only used when the alternative is an element of the information that the speaker holds solely or that he assumes to be an element of the set of shared information, and not if it is an element of the assumptions concerning the information the dialogue partner holds solely. If we add the dialogical rights to the descriptions of the dialogue situations as a further parameter, we arrive at the following dialogue rules, which describe the dialogical conditions that lead to the right to use constituent negation:

DR1: If there is a dialogue situation $D_0$ and

$$D_0 = \langle \langle \text{X-Y-C}, \mathrm{INF}_{A(B)}, \mathrm{SI}_A, \emptyset \rangle, \langle \mathrm{ED}_A, \text{X-Y-D} \rangle, \langle \mathrm{AL}_B \rangle \rangle$$

and C and D are incompatibles, dialogue participant A gets as a dialogical consequence the right $R_1$ to use an utterance of the form

[X-Y-not-!D]

so that a new dialogue situation $D_1$ arises and

$$D_1 = \langle \langle \text{X-Y-C}, \mathrm{INF}_{A(B)}, \mathrm{SI}_A, R_1 \rangle, \langle \mathrm{ED}_A, \text{X-Y-D} \rangle, \langle \mathrm{AL}_B \rangle \rangle$$

DR2: If there is a dialogue situation $D_0$ and

$$D_0 = \langle \langle \mathrm{INF}_A, \mathrm{INF}_{A(B)}, \text{X-Y-C}, \emptyset \rangle, \langle \mathrm{ED}_A, \text{X-Y-D} \rangle, \langle \mathrm{AL}_B \rangle \rangle$$

and C and D are incompatibles, dialogue participant A gets as a dialogical consequence the right $R_2$ to use an utterance of the form

[X-Y-not-!D]

so that a new dialogue situation $D_1$ arises and

$$D_1 = \langle \langle \mathrm{INF}_A, \mathrm{INF}_{A(B)}, \text{X-Y-C}, R_2 \rangle, \langle \mathrm{ED}_A, \text{X-Y-D} \rangle, \langle \mathrm{AL}_B \rangle \rangle$$
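The two rules for constituent negation can be sketched as a small data structure with one rule function. Everything named here is an assumption made for illustration: the class `AssumptionList`, the encoding of a sentence X-Y-C as a 3-tuple, and above all the test `incompatible`, which simply treats any two distinct fillers as incompatibles, whereas the text leaves open how the range of incompatibles is constituted.

```python
from dataclasses import dataclass, field


@dataclass
class AssumptionList:
    """One participant's private list, extended by shared information and rights."""
    inf: set = field(default_factory=set)        # information the participant holds alone
    inf_other: set = field(default_factory=set)  # information attributed to the partner
    shared: set = field(default_factory=set)     # information assumed to be shared
    rights: set = field(default_factory=set)     # dialogical rights acquired so far


def incompatible(c, d):
    # Simplifying assumption: any two distinct fillers exclude each other.
    return c != d


def constituent_negation_right(al_a, move_b):
    """Return 'R1' or 'R2' if B's explicit move X-Y-D conflicts with X-Y-C held by A."""
    x, y, d = move_b
    # R1: the alternative is A's own information; R2: it is shared information.
    for label, info in (("R1", al_a.inf), ("R2", al_a.shared)):
        if any((x2, y2) == (x, y) and incompatible(c, d) for x2, y2, c in info):
            al_a.rights.add(label)
            return label
    return None  # an alternative held only in inf_other yields no right here


al_a = AssumptionList(inf={("Dali", "is a", "painter")})
right = constituent_negation_right(al_a, ("Dali", "is a", "musician"))
```

In this example A holds 'Dali is a painter' alone, so B's move 'Dali is a musician' gives A the right R1 to answer with an utterance of the form [X-Y-not-!D].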
Now it depends on the type of dialogue game that is played, a cooperative or a competitive game, whether the dialogical rights $R_1$ and $R_2$ have to be realized, i.e. whether they change to dialogical obligations, or whether they disappear. As we said, in a cooperative game the dialogue participants have to enrich their own assumptions by means of information on the list of their interlocutor in order to achieve the desired agreement. So in this case the listener has to accept the 'objectionable item' and loses the right to use constituent negation. The question as to how dialogical rights are treated is answered by a class of rules called 'frame rules'. As a first frame rule we get

FR1: In cooperative games the dialogical rights $R_1$ and $R_2$ to use constituent negation are lost.

A dialogical consequence of FR1 is that the listener has to accept the information with the objectionable item and has to enrich his list of shared information by it, and he has to delete the information on his side which contains the alternative element. After the application of DR1, which leads to dialogue situation $D_1$, and the application of FR1 we get dialogue situation $D_2$ with

$$D_2 = \langle \langle \mathrm{INF}_A \setminus \{\text{X-Y-C}\}, \mathrm{INF}_{A(B)}, \text{X-Y-D}, \emptyset \rangle, \langle \mathrm{ED}_A, \text{X-Y-D} \rangle, \langle \mathrm{AL}_B \rangle \rangle$$ ¹

Analogously, the same dialogue situation $D_2$ is reached after the application of DR2 and FR1.

In contrast to 'cooperative games', the other type of games, the 'competitive games', are characterized, as we have seen, by the attempt of the dialogue participants to force the dialogue partner to give up his conflicting assumptions. To do this the right to use constituent negation, which as we saw is a right to correct information, becomes a dialogical obligation, and therefore the dialogical rights $R_1$ and $R_2$ have to be realized. So the corresponding frame rule goes as follows:

FR2: In competitive games the dialogical rights $R_1$ and $R_2$ to use constituent negation have to be realized.

If FR2 is applied and the right to use constituent negation is realized, the conflicting information becomes part of the set of explicit dialogue moves. But the dialogue partner is not forced in this dialogue situation to accept the information and give up his own. A decision will depend on further argumentative dialogue steps. So after the application of dialogue rule DR1 and the application of FR2 for competitive games the following dialogue situation $D_2$ is reached:
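$$D_2 = \langle \langle \text{X-Y-C}, \mathrm{INF}_{A(B)}, \mathrm{SI}_A, \emptyset \rangle, \langle \text{X-Y-not-!D}, \mathrm{ED}_B \rangle, \langle \mathrm{AL}_B \rangle \rangle$$

After the application of dialogue rule DR2 we get the same dialogue situation except that the alternative information is part of the set of shared information:

$$D_2 = \langle \langle \mathrm{INF}_A, \mathrm{INF}_{A(B)}, \text{X-Y-C}, \emptyset \rangle, \langle \text{X-Y-not-!D}, \mathrm{ED}_B \rangle, \langle \mathrm{AL}_B \rangle \rangle$$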
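The different outcomes of the cooperative and the competitive treatment of a right to use constituent negation can be contrasted in a short sketch. It is a simplification under stated assumptions: sentences are plain strings, only participant A's own and shared information and the list of explicit moves are tracked, and the function `apply_frame_rule` with its parameters is hypothetical.

```python
def apply_frame_rule(game, inf_a, shared_a, board, own, uttered, negation):
    """Resolve A's pending right to negate `uttered`, given A's conflicting item `own`.

    Returns the updated triple (inf_a, shared_a, board).
    """
    if game == "cooperative":
        # Cooperative case: the right is lost; A deletes the conflicting item
        # and accepts the partner's information as shared.
        return inf_a - {own}, shared_a | {uttered}, list(board)
    if game == "competitive":
        # Competitive case: the right becomes an obligation; the negation is
        # added to the explicit moves and A's assumptions stay unchanged.
        return set(inf_a), set(shared_a), list(board) + [negation]
    raise ValueError(f"unknown game type: {game}")


inf_a = {"Dali is a painter"}
board = ["Dali is a musician"]
args = ("Dali is a painter", "Dali is a musician", "Dali isn't a !musician")

coop = apply_frame_rule("cooperative", inf_a, set(), board, *args)
comp = apply_frame_rule("competitive", inf_a, set(), board, *args)
```

In the cooperative outcome A's own list loses the conflicting item and the shared list gains the partner's sentence; in the competitive outcome only the board changes.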
2.3.4 Corrective focusing by explicit proposal of alternatives
If in 'competitive games' the right to use constituent negation is realized, the listener does not know whether DR1 or DR2 was applied, so he does not know which information set the conflicting item, the alternative, is part of. Neither does he know what this alternative looks like, and so constituent negation without explicitly stating the alternative does not seem to be a good way to reach an agreement. A more effective way to do this is to use corrective focusing by means of immediate expression of the alternative in question. Although constituent negation and the explicit proposal of alternatives have much in common (they are both, as was shown before, means to criticize information), the explicit proposal of alternatives has a dialogical function of its own and will therefore be treated independently of constituent negation. The right to propose alternatives is again based on inconsistencies between an explicit dialogue move of one of the dialogue participants and the assumption list of the other dialogue participant. That means that the three conditions cited above which lead to the right to use constituent negation also have to be met to obtain the right to propose an alternative. But in addition to the dialogue situations leading to the right to use constituent negation, the alternative can be an element of that information set which contains assumptions about the information held by the dialogue partner. So all three information sets of the assumption list can contain the alternative to the 'objectionable item'. In order to act in a cooperative way (cooperative in furthering the aim of agreement, which is also part of the competitive game), the right to use the proposal of alternatives can contain information about the source of the alternative. So we get the following dialogue rule concerning the right of corrective focusing by means of the explicit proposal of an alternative:

DR3: If there is a dialogue situation $D_0$ and

$$D_0 = \langle \langle \text{X-Y-C}, \mathrm{INF}_{A(B)}, \mathrm{SI}_A, \emptyset \rangle, \langle \mathrm{ED}_A, \text{X-Y-D} \rangle, \langle \mathrm{AL}_B \rangle \rangle$$

and C and D are incompatibles, dialogue participant A gets as a dialogical consequence the right $R_3$ to use an utterance of the form

[But I know that X-Y-!C]

so that a new dialogue situation $D_1$ arises and

$$D_1 = \langle \langle \text{X-Y-C}, \mathrm{INF}_{A(B)}, \mathrm{SI}_A, R_3 \rangle, \langle \mathrm{ED}_A, \text{X-Y-D} \rangle, \langle \mathrm{AL}_B \rangle \rangle$$

If the alternative is an element of the information A believes that B holds, dialogue participant A gets the right $R_4$ to say

[But you told me that X-Y-!C]

If the alternative is an element of the set of shared information, dialogue participant A gets the right $R_5$ to say

[But we agreed that X-Y-!C]
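The way the source of the alternative selects the form of the corrective utterance can be sketched as follows. The dictionary keys, the function name `propose_alternative`, the tuple encoding of X-Y-C and the simplified incompatibility test (any two distinct fillers count as incompatibles) are all assumptions of this sketch.

```python
# Introductory phrase for each information set that may contain the alternative.
PREFIXES = {
    "inf": "But I know that",            # A's own information (right R3)
    "inf_other": "But you told me that", # information A attributes to B (right R4)
    "shared": "But we agreed that",      # shared information (right R5)
}


def propose_alternative(assumptions, move_b):
    """Return A's corrective utterance for B's move X-Y-D, or None if A has no alternative."""
    x, y, d = move_b
    for source in ("inf", "inf_other", "shared"):
        for x2, y2, c in sorted(assumptions.get(source, ())):
            # Simplifying assumption: distinct fillers are incompatibles.
            if (x2, y2) == (x, y) and c != d:
                return f"{PREFIXES[source]} {x} {y} !{c}"
    return None


assumptions = {"shared": {("Dali", "is a", "painter")}}
utterance = propose_alternative(assumptions, ("Dali", "is a", "musician"))
```

Here the alternative is found in the shared information, so the sketch yields 'But we agreed that Dali is a !painter'.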
The question of how these dialogical rights are to be treated depends again on the type of dialogue game that is played and the corresponding frame rules. We saw before that the right to use corrective focusing by constituent negation disappears in 'cooperative games', whereas it becomes a dialogical obligation in 'competitive games'. The same holds for the explicit proposal of an alternative. So for both types of corrective focusing two general frame rules can be formulated that replace the former ones:

FR3: The right to use corrective focusing disappears in cooperative games.
FR4: The right to use corrective focusing in competitive games becomes a dialogical obligation.

Both frame rules concerning corrective focusing serve to further the aim of reaching an agreement between the assumptions of the dialogue participants by employing different dialogue strategies depending on the type of game that is played. Besides corrective focusing, another way to reach an agreement in dialogues is that of posing and answering questions by means of informative focusing.

2.4 Wh-questions and intonational focusing

As we saw before, it is one of the main dialogical functions of intonational focusing to mark that part of an answer which augments the information contained in the question. But apart from that, it is the typical function of an answer to augment the information on the side of the questioner as a whole. For example, a question of the form

(24) 'Who-X-Y?'

can be adequately answered by an utterance of the form:
(25) '!A-X-Y'

where the dialogue participant who has to answer the question has to look only in that information set of his assumption list which contains information that is accepted or entertained (maintained) only by himself. So if we leave out rhetorical questions and examination questions, we get the following dialogue rule for answering a wh-question by employing intonational focusing:

DR6: If there is a dialogue situation $D_0$ and

$$D_0 = \langle \langle \text{A-X-Y}, \mathrm{INF}_{A(B)}, \mathrm{SI}_A, \emptyset \rangle, \langle \mathrm{ED}_A, \text{Who-X-Y?} \rangle, \langle \mathrm{AL}_B \rangle \rangle$$

dialogue participant A gets as a dialogical consequence the right $R_6$ to use an utterance of the form

[!A-X-Y]

so that a new dialogue situation $D_1$ arises and

$$D_1 = \langle \langle \text{A-X-Y}, \mathrm{INF}_{A(B)}, \mathrm{SI}_A, R_6 \rangle, \langle \mathrm{ED}_A, \text{Who-X-Y?} \rangle, \langle \mathrm{AL}_B \rangle \rangle$$

Because the aim of agreement is common to both types of games and answering questions furthers this aim, we get for competitive games as well as for cooperative games the following frame rule:

FR5: The right to answer a question becomes a dialogical obligation.

2.5 Yes/no questions

Besides wh-questions, yes/no questions also fulfil a dialogical function which is connected with intonational focusing. Whereas in wh-questions the information that is asked for is substituted by a question word like 'who' or 'what', which leads the dialogue partner to a certain desired answer, question/answer dialogues involving yes/no questions function somewhat differently. Important for our dialogue model is the fact that the search for an answer to a yes/no question depends on the placement of intonational focusing, so that a certain search strategy can be forced by intonational means. Suppose that a dialogue participant has in his assumption list the following information:

Dali is a painter
Beethoven is a composer

If he is now asked by his dialogue partner the question

(26) 'Is Dali a !composer?'
he can say just 'No!', but assuming that he will behave in a cooperative way with the aim of reaching an agreement he would answer:

(27) 'No, he is a !painter.'

But if this dialogue participant is asked the question

(28) 'Is !Dali a composer?'

which differs from the former question just in the placement of intonational focusing, the answer would be

(29) 'No, Beethoven is a composer.'
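The contrast between the two answers can be sketched as a lookup whose search direction is fixed by the position of the focus. The fact base, the function name `answer_yes_no` and the two focus labels are assumptions of this sketch, as is the choice to mark the proposed alternative with '!' in both answers.

```python
# The answerer's assumption list, reduced to (individual, profession) pairs.
FACTS = {("Dali", "painter"), ("Beethoven", "composer")}


def answer_yes_no(subject, predicate, focus):
    """Answer 'Is <subject> a <predicate>?', where `focus` is 'subject' or 'predicate'."""
    if (subject, predicate) in FACTS:
        return "Yes."
    if focus == "predicate":
        # Keep the subject fixed and search for another predicate.
        alternatives = sorted(p for s, p in FACTS if s == subject)
        if alternatives:
            return f"No, {subject} is a !{alternatives[0]}."
    elif focus == "subject":
        # Keep the predicate fixed and search for another subject.
        alternatives = sorted(s for s, p in FACTS if p == predicate)
        if alternatives:
            return f"No, !{alternatives[0]} is a {predicate}."
    return "No."


reply_predicate_focus = answer_yes_no("Dali", "composer", "predicate")
reply_subject_focus = answer_yes_no("Dali", "composer", "subject")
```

The same propositional content thus leads to two different cooperative replies, one about Dali and one about Beethoven, depending only on where the focus is placed.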
This fact, that intonational focusing can be a means to force search strategies, is also interesting if we think of one of the dialogue participants as an information system. The failure to give a positive answer to a yes/no question would trigger a search process depending on intonational focusing, which leads to positive alternatives to the rejected propositional content of the question. So the equivalence between cooperatively answering a yes/no question and corrective focusing is obvious. Another parallel that becomes obvious is that between intonational focusing and negation on the one hand and intonational focusing and questioning on the other. In both cases intonational focusing marks those utterance items which are affected by the operation of negation or questioning. On the level of semantic representation, the negation element and, according to Jacobs (1983), the illocutionary type of questions can be represented by an operator which divides propositions into two complementary parts. In this two-place operation the second argument represents the focus, which is bound by a λ-operator as a variable in the first argument. This first argument represents the so-called background. For example, for the above-cited questions we get the following representations:

$$\mathrm{ERO}(\lambda X_{NP}\,[\text{Dali is a } X],\ \text{composer})$$

for 'Is Dali a !composer?' and

$$\mathrm{ERO}(\lambda X_{NP}\,[X \text{ is a composer}],\ \text{Dali})$$

for 'Is !Dali a composer?'

This type of representation shows that the dialogue participant who is going to give an answer to a question has to check whether the focus fits with the background in order to get a true proposition with regard to his assumptions. If the focus in question does not meet this condition, the dialogue participant can look for an alternative depending on the focus-background structure. We can use these considerations for our dialogue model. This means that apart from surface structures of the form X-A-Y we can deal with semantical
J. Hoepelman,J. Machate and R. Schnitzer 271
representations. Consequently under these considerations the assumption lists are constituted by propositions and the utterances have to be translated into focus-background structures of the above-cited type.2 For the purpose of our dialogue model simple representations will suffice, and we get the following dialogue rule for answering yes/no questions: DR7: If there is a dialogue situation Do and Do - «A-B-C, INFA(B), SIA> 0), (EDA, X-B-C/D?), (ALB» dialogue participant A gets as a dialogical consequence the right R7 to use an utterance of the form
[No, !A-B-C]

so that a new dialogue situation D1 arises and D1 = ⟨⟨A-B-C, INF_A(B), SI_A, R7⟩, ⟨ED_A, X-B-C/D?⟩, ⟨AL_B⟩⟩.

3 HOW TO MAKE THE DIALOGUE STRATEGIES MORE FLEXIBLE

The dialogue model as described so far is a strong simplification of natural dialogues. One way to make the dialogue model more natural is to give up the strict distinction between 'cooperative games' and 'competitive games', which are only two extreme possibilities for behaviour in dialogues. One step in the direction of more flexible dialogues could consist in leaving to the dialogue participants the decision to realize a dialogical right cooperatively or competitively; that is, the frame rules are no longer obligatorily determined for the whole dialogue with regard to a certain game that is played. The decision whether a dialogue participant cooperatively accepts the information

(30) Dali is a famous musician

although he assumes that

(31) Dali is a famous painter

or whether he realizes the right to use competitively the corrective dialogue move

(32) Dali is a famous !painter

should depend on a criterion other than a fixed type of game. Such a criterion could be, for example, the defence value of the alternative. This defence value could depend on whether one dialogue participant regards the other one as being more competent with respect to the controversial information or assumes himself to be more competent. In the first case the defence value of the alternative is low; the dialogue participant chooses a cooperative move with the corresponding frame rule and the right to use corrective focusing gets lost. In the second case the defence value of the alternative is high; the dialogue participant chooses a competitive move with the corresponding frame rule and the right to use corrective focusing will be realized. The defence value can also help to differentiate between assumptions that count as sure knowledge, with a high defence value, and those that are rather uncertain, with a low defence value. The decision as to which dialogue move should be chosen, i.e. whether a competitive move or a cooperative move should be used, could be managed by a special kind of rule which can be called a 'decision rule'. If we have a dialogue situation D0 which leads to a dialogue situation D1 with the dialogical right to use the corrective move:

Dali is a famous !painter

we get the representation of possible strategies illustrated in Figure 3.

[Figure 3: strategy diagram. Nodes: D0; D1 / R1 [Dali is a famous !painter]; D2 / ∅; D2 / "Dali is a famous !painter". Edge labels: cooperative move; competitive move; frame rule / cooperative move; frame rule / competitive move; R1; R1 -> dialogical obligation.]
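A decision rule of the kind described in this section can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the numeric competence scores, the names `Situation`, `defence_value` and `decide`, and the treatment of a tie as a low defence value are all hypothetical and are not part of the dialogue model itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Situation:
    """Simplified dialogue situation: a pending right and the move it licenses."""
    right: str        # e.g. "R1"
    correction: str   # the corrective move licensed by the right


def defence_value(own_competence: float, other_competence: float) -> str:
    """Hypothetical criterion: 'high' only if the participant rates himself
    as more competent than the other participant (a tie counts as 'low')."""
    return "high" if own_competence > other_competence else "low"


def decide(situation: Situation,
           own_competence: float,
           other_competence: float) -> Tuple[str, Optional[str]]:
    """Decision rule: choose the move type and the utterance, if any."""
    if defence_value(own_competence, other_competence) == "high":
        # Competitive move: the right to use corrective focusing is realized.
        return ("competitive", situation.correction)
    # Cooperative move: the right to use corrective focusing gets lost.
    return ("cooperative", None)


d1 = Situation(right="R1", correction="Dali is a famous !painter")
print(decide(d1, own_competence=0.9, other_competence=0.4))
print(decide(d1, own_competence=0.2, other_competence=0.8))
```

The first call corresponds to the competitive branch of Figure 3, the second to the cooperative branch in which no corrective utterance is produced.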
J. Hoepelman,J. Machate and R. Schnitzer 273
4 IMPLEMENTING DIALOGUE RULES

So far, we have described the meaning of intonational focusing within our dialogue model with regard to

- the dialogue context which is constituted by the dialogue situations;
- the communicative functions of intonational focusing, e.g. informative and corrective focusing;
- the dialogue strategies of the dialogue participants depending on the dialogue situations;
- pragmatic considerations concerning the type of the game that is played or the defence value of the information under discussion.

But what makes the paradigm of dialogue games interesting to linguistic research is not just its ability to describe human behaviour in dialogues: the rules presented here are suitable as a basis for the implementation of an intonation interpretation module of a speech-understanding system. It is this task we have dealt with in a project called MAFID, which was sponsored by the German Research Foundation (DFG). The aim of the project was the integration of both intonational focus recognition and its interpretation in an information system. With a continuous speech recognizer (COSIMA) based on Hidden Markov Modelling and connected to an intonation recognition module, we were able to show the importance of integrating focus intonation in future speech-understanding systems.

Of course, the dialogue rules, which define the rights and duties of two parties having equal rights, have to be adapted in a suitable manner. The first condition is naturally that the system's dialogue strategy should be a cooperative one. A competitive strategy would surely decrease the user's acceptance of such a system, since he would always have to ask for more concrete information. The dialogue sequent describing the progress of dialogue steps can be reduced to a triple containing the system's data base, the utterances of the user and the system's responses. A semantic representation which describes an operational semantics and is suitable for integration in the dialogue rules is produced by a parser which supports free constituent order. With the dialogue history mentioned above the system is able to refer to earlier stages of the dialogue. Hence, the assumption list of the system can be derived from its data base and the dialogue history. It is not possible to build an assumption list for the user, since his utterances are the only information the system can rely on.

However, if we take as an example a yes/no question with an intonation line indicating the focus of the user's interest, the system is able to produce an appropriate answer with respect to the focused constituent. In addition to the interpretation of intonational focus, some rules have been defined to enable the system to use intonational focus itself. Technically, this is realized by a text-to-speech board with the facility of marking words to get phonetic stress. To conclude, the system which we have briefly described here not only recognizes and interprets focus intonation, but also makes use of it.

In this paper we have concentrated on the communicative functions of intonational focusing called informative focusing and corrective focusing and have neglected many other problems. But although the dialogue model and the dialogue rules are still very preliminary, we hope we have shown that intonational focusing can be seen as a dialogical phenomenon that is important for the description of discourse strategies.
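The cooperative treatment of a focused yes/no question by an information system can be illustrated with a minimal sketch. It is not the MAFID implementation: the tuple encoding of propositions, the `HOLE` marker for the λ-bound variable, and the names `fill` and `answer` are assumptions made only for this example.

```python
HOLE = None  # marks the position of the lambda-bound focus variable


def fill(background, value):
    """Insert a focus value into the background to obtain a proposition."""
    return tuple(value if slot is HOLE else slot for slot in background)


def answer(background, focus, database):
    """Answer a yes/no question given as a focus-background structure.

    If the focus fits the background, answer positively. Otherwise search
    the data base for alternatives to the focus and answer with corrective
    focusing, marked by '!' on the corrected constituent.
    """
    hole = background.index(HOLE)
    if fill(background, focus) in database:
        return "Yes."
    alternatives = sorted(
        fact[hole]
        for fact in database
        if len(fact) == len(background)
        and all(b is HOLE or b == f for b, f in zip(background, fact))
    )
    if not alternatives:
        return "No."
    return "No, " + " ".join(fill(background, "!" + alternatives[0])) + "."


database = {("Dali", "is a", "painter"), ("Mozart", "is a", "composer")}

# Is Dali a !composer?  ~ ERO(λX [Dali is a X], composer)
print(answer(("Dali", "is a", HOLE), "composer", database))
# Is !Dali a composer?  ~ ERO(λX [X is a composer], Dali)
print(answer((HOLE, "is a", "composer"), "Dali", database))
```

The same rejected proposition thus leads to different positive alternatives depending on which constituent carries the focus, which is the behaviour the dialogue rule for yes/no questions is meant to capture.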
Address for correspondence:
JAKOB HOEPELMAN
IBM Germany, Scientific Centre, Institute for Knowledge-based Systems, Wilckensstr. 1a, 6900 Heidelberg, Germany

NOTES
1 The set theoretical notation for the information that is now included in SI_A would be SI_A ∪ {X-Y-D} or in list notation
2 In DR7 the expression X-B-C/D? corresponds to the focus-background structure ERO(λX_NP [X-B-C], D).

REFERENCES
Barth, E. M. & J. L. Martens (eds) (1982), Argumentation: Approaches to Theory Formation.
Carlson, L. (1983), Dialogue Games: An Approach to Discourse Analysis, D. Reidel, Dordrecht.
Carlson, L. (1984), 'Focus and dialogue games: a game-theoretical approach to the interpretation of intonational focusing', in L. Vaina & J. Hintikka (eds), Cognitive Constraints on Communication, Reidel.
Firbas, J. (1975), 'On the thematic and non-thematic section of sentence', in H. Ringbom et al. (eds), Style and Text: Studies Presented to N. E. Enkvist.
Gabbay, D. M. & J. Moravcsik (1987), 'Negation and denial', in F. Guenthner & C. Rohrer (eds), Studies in Formal Semantics.
Hintikka, J. (1976), 'Language games', in E. Saarinen (ed.), Game-Theoretical Semantics.
Hintikka, J. & J. Kulas (1985), Anaphora and Definite Descriptions: Two Applications of Game-Theoretical Semantics.
Jacobs, J. (1983), Fokus und Skalen: Zur Syntax und Semantik der Gradpartikeln im Deutschen, Niemeyer, Tübingen.
Labov, W. (1972), Sociolinguistic Patterns.
Sgall, P., E. Hajičová & J. Panevová (1986), 'The meaning of the sentence in its semantic and pragmatic aspects', in J. L. Mey (ed.), Language and Discourse: Test and Protest, Benjamins, Amsterdam.
Downloaded from jos.oxfordjournals.org by guest on January 1, 2011
Journal of Semantics 8: 277-286
© N.I.S. Foundation (1991)
Book Review
Eric Reuland and Alice ter Meulen (eds), The Representation of (In)definiteness. Cambridge, Mass., MIT Press, 1987. $35.00 (paperback).
PETER LUDLOW
It is rather common for edited collections to contain a number of papers which have no coherent theme interweaving them save perhaps that they are all papers falling under the general heading of semantics, or model theory, or syntax. While the individual papers may be good, it is too often the case that the collected whole is weaker than the individual parts. Happily, this volume escapes this common trap, bringing together a number of papers from somewhat different perspectives, but all related to a common theme. The common theme is a loosely grouped class of constructions which are related to the phenomenon of indefiniteness. Among these constructions are those which involve there-insertion, predication, discourse anaphora, quantifier scope, etc. The papers in this volume all address some portion of this problem space, and some even attempt to provide a unified treatment of the constructions.

The point of departure for many of the papers is the definiteness effect, discussed in Milsark (1974), Safir (1982), and Barwise and Cooper (1981). The basic observation, from Milsark, is that there-insertion is only possible with a particular class of determiners. So, for example, one gets the following distribution of facts.

(1) There's a fox in the henhouse.
(2) *There's the fox in the henhouse.
(3) There are three foxes in the henhouse.
(4) *There's every fox in the henhouse.

Milsark's (1974) generalization was that the determiners which admit there-insertion are the weak determiners (found in cardinal NPs) and that those which do not admit there-insertion are strong determiners (found in quantificational NPs). The analysis is often considered purely taxonomic, in that there is no obvious semantic basis for the distinction between cardinal and quantificational NPs. For example, why should 'a man' be classified as a cardinal NP instead of a quantificational NP? Barwise and Cooper (1981) attempted to motivate Milsark's analysis by a set-theoretic characterization of the strong/weak distinction. A weak determiner is one in which 'DET N are (is) N' is contingent. A strong determiner is one in
which 'DET N are (is) N' is either tautological or contradictory (positive strong in the former case, negative strong in the latter). Because 'three foxes are foxes' is contingent, 'three' is a weak determiner. 'Every fox is a fox' is tautological, so 'every' is a positive strong determiner, and 'None of the foxes are foxes' is contradictory, so 'none of the' is a negative strong determiner. Barwise and Cooper suggest that constructions of the form 'There is DET N' will be contradictory if DET is negative strong, and tautologous if positive strong, so strong determiners cannot appear in there-insertion contexts. The problem, however, is that being tautological or contradictory does not usually imply being ungrammatical. After all, we do utter completely grammatical contradictions and tautologies.

The volume picks up where the above proposals left off, in some cases extending the strong/weak taxonomy to other constructions, and in other cases attempting to refine the taxonomy. It contains a helpful introduction by the editors, and the following papers: 'Where does the Definiteness Effect Apply? Evidence from the Definiteness of Variables', by Irene Heim; 'Indefiniteness and Predication', by Jim Higginbotham; 'What Explains the Definiteness Effect?', by Ken Safir; 'WH-in-Situ: Movement and Unselective Binding', by David Pesetsky; 'Specifier and Operator Binding', by Tanya Reinhart; 'An Indefiniteness Restriction for Relative Clauses in Lakhota', by Janis Williamson; 'The Syntax of Chamorro Existential Sentences', by Sandra Chung; 'Existential Sentences in Chinese and (In)definiteness', by Jim Huang; 'Definiteness, Noun Phrase Configurationality, and the Count-Mass Distinction', by David Gil; 'The Compositional Nature of (In)definiteness', by Franciska de Jong; and 'A Semantic Definition of "Indefinite NP"', by Edward Keenan. These papers represent a great variety of research methods and styles.
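The Barwise and Cooper test summarized above lends itself to a small brute-force illustration. The sketch below is not taken from their paper: it treats determiners as relations between finite sets, checks 'DET N are N' for every choice of N over a four-element universe, and models the presupposition of 'none of the' by leaving the sentence undefined when N is empty. The names `DETERMINERS`, `subsets` and `classify` are hypothetical.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


# Determiners as functions from (noun set, predicate set) to True/False,
# or None where the sentence is taken to be undefined (assumption).
DETERMINERS = {
    "a": lambda a, b: len(a & b) >= 1,
    "three": lambda a, b: len(a & b) >= 3,
    "no": lambda a, b: len(a & b) == 0,
    "every": lambda a, b: a <= b,
    "none of the": lambda a, b: (len(a & b) == 0) if len(a) >= 1 else None,
}


def classify(det, universe):
    """Classify a determiner by the truth values of 'DET N are N'."""
    values = {det(n, n) for n in subsets(universe)} - {None}
    if values == {True}:
        return "positive strong"   # tautological wherever defined
    if values == {False}:
        return "negative strong"   # contradictory wherever defined
    return "weak"                  # contingent


universe = {1, 2, 3, 4}
for name, det in DETERMINERS.items():
    print(name, "->", classify(det, universe))
```

On this toy encoding 'a', 'three' and 'no' come out weak, 'every' positive strong and 'none of the' negative strong, matching the classification reported in the text.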
One weakness in the collection qua collection is that the editors could have encouraged the contributors to make their proposals more accessible to a general audience. A number of papers assume a thorough grounding in Government-Binding theory. Other papers assume a thorough grounding in formal semantics. It may well turn out that both syntacticians and semanticists will buy the book, but each will find different portions of the collection accessible. My only other complaint about the collection is that there is no contribution by Milsark, whose (1974) work mapped out a significant portion of the territory explored by the papers in this volume. It would have been interesting to read his views of recent developments. A full analysis of each of the papers in this volume would be impossible, and a short summary of each would be of limited interest, so I will focus my attention on four of the papers which together represent a cross-section of the approaches taken in the volume. In particular, I shall discuss the papers by Heim, Reinhart, Higginbotham, and Keenan. The leadoff paper in the volume is Heim's. She argues that the definiteness
effect occurs at LF, a level of syntactic representation in which operator scope is explicitly represented. The evidence for this claim is that if one thinks of the trace of a moved operator as something of a referring expression (and consequently, as a strong NP), then one would expect the trace of a moved operator to block there-insertion. For example, (1) exhibits a scope ambiguity which is not found in (2).

(1) Ralph believes that a man is spying on him.
(2) Ralph believes that there is a man spying on him.

The missing ambiguity is predicted by the fact that the wide scope reading for (2) would require an LF representation like that in (3),

(3) [NP a man]_i [S Ralph believes that [S there is [NP e_i] spying on him]]

which is blocked because [NP e] is a strong NP. The downstairs reading is available when the NP remains in place, as in (4).

(4) [S Ralph believes that [S there is [NP a man] spying on him]]

There are two worries about the analysis as developed thus far. First, we are given no indication as to how the in-place NP is to be interpreted. If it is a quantifier, then what is it binding? Perhaps the variable is introduced only in the interpretative meta-language, but this is the sort of proposal which needs some explanation of how we might execute it. On the other hand, if the NP is not a quantifier, but is rather a referring expression, then what does it refer to, some vague man? Second, the analysis appears to be inconsistent with the analysis of indefinites proposed in Heim (1982). There Heim suggested that indefinites are free variables which are bound by (sometimes implicit) operators. On such an analysis, following the proposal made by Heim in this volume, every indefinite would be a bound variable, and hence a strong NP. Given the central role that Heim (1982) played at the Groningen conference (on which this volume is based) and in the other papers in the volume, one might have expected her to address this inconsistency, or at least note it.

While the relation of Heim's proposals to her (1982) work is unclear, the papers by Reinhart and Pesetsky adopt the basic approach of that analysis and attempt to extend it. Reinhart, for example, discusses donkey anaphora constructions, and suggests a taxonomy of quantifiers which will support donkey anaphora. Specifically, Reinhart maintains that only weak determiners will support donkey anaphora. So, for example, the italicised determiners in (5) below are weak and support donkey anaphora. The italicised determiner in (6) is strong and does not support donkey anaphora.

(5) a. Everyone who owns [a donkey]_i beats it_i.
    b. Everyone who owns [three donkeys]_i beats them_i.
(6) *Everyone who owns [every donkey]_i beats it_i.

In this respect, Reinhart's proposal bears certain similarities to an often overlooked aspect of Kamp's (1981/1984) proposal: that the determiners which support donkey anaphora are those which are preserved under model extensions.¹ Reinhart's proposal makes different empirical predictions than Kamp's, and is superior in a number of cases. For example, Reinhart's proposal correctly predicts that (7), which contains a weak determiner, will support donkey anaphora.

(7) Everyone who owns [few donkeys]_i beats them_i.

There are troubles with Reinhart's proposal, however. One obvious example is the case of 'no', which is a weak determiner but which usually is not thought to support donkey anaphora.

(8) *Everyone who owns [no donkey]_i beats it_i.

There are also cases which involve strong determiners but which do seem to be perfectly acceptable cases of donkey anaphora. Examples include the following.

(9) a. Everyone who saw [the donkey to his left]_i beat it_i.
    b. Everyone who saw [every third donkey]_i beat it_i.

It has recently been suggested that any determiner can support donkey anaphora in the right context. So, for example, we find the following sorts of cases discussed in Neale (1990).

(10) a. Everyone who saw [none of the women]_i concluded they_i hadn't come.
     b. Everyone who donated [no organs]_i kept them_i instead.
     c. Everyone who interviewed [every candidate]_i evaluated him/her_i too.

So far I have not explained the mechanics of Reinhart's proposal. The point of departure is Heim's (1982) proposal that the indefinite determiner is a free variable which can be bound by the specifier of the containing NP (or a sentential operator, etc.). So, for example, in (5)a, the specifier 'every' binds the free variable in 'a donkey' and the pronoun 'it'. The result is something like the following.

(11) Every (x, y) (person x & x owns y & donkey y) (x beats y)²

One of the problems with Heim's proposal was that it only works in a very narrow class of constructions. One construction where it fails is when the specifier of the containing NP is, for example, 'most'. Heim's analysis would render (12) as (13), which does not deliver the correct truth conditions.

(12) Most people who own a donkey beat it.
(13) Most (x, y) (person x & x owns y & donkey y) (x beats y)

The problem is that (13) could be true in the case where one individual owns 50 donkeys and beats them, and 20 other people own but one donkey each and don't mistreat their donkeys. (13) is true under such circumstances because there are 50 owner/donkey pairs in which the former beats the latter, out of 70 pairs in all, even though only one of the 21 donkey owners is a donkey beater. Reinhart's proposal is that the second variable be construed as a set variable, the restriction on set membership being fixed by the weak NP. An example might help illustrate the idea.

(14) Most (x, Y) (person x & Y = {z | donkey z & x owns z}) (x beats Y)³
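The arithmetic behind the contrast between (13) and (14) can be checked with a short sketch. It assumes that 'most' means 'more than half' and encodes the scenario just described as a dictionary from owners to a donkey count and a flag for whether the owner beats them; both the encoding and the names are ours.

```python
def most(part, whole):
    """'Most' read as 'more than half' (an assumption of this sketch)."""
    return 2 * part > whole


# One owner with 50 donkeys who beats them all,
# and 20 owners with one donkey each who beat none.
owners = {"owner_0": (50, True)}
owners.update({f"owner_{i}": (1, False) for i in range(1, 21)})

# Reading (13): 'most' quantifies over owner/donkey pairs.
pairs_total = sum(n for n, _ in owners.values())
pairs_beaten = sum(n for n, beats in owners.values() if beats)
pair_reading = most(pairs_beaten, pairs_total)

# Reading (14): 'most' quantifies over owners, each taken together with
# the set Y of donkeys he owns; 'x beats Y' holds iff x beats all of Y.
owners_total = len(owners)
owners_beating = sum(1 for _, beats in owners.values() if beats)
set_reading = most(owners_beating, owners_total)

print(pairs_beaten, "of", pairs_total, "pairs:", pair_reading)
print(owners_beating, "of", owners_total, "owners:", set_reading)
```

With 50 of 70 pairs the pair reading comes out true, while with 1 of 21 owners the set reading comes out false, which is the intuitively correct verdict for (12).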
The determiner of the weak NP fixes the cardinality of the set. So, for example, (15) may be rendered as in (16).

(15) Most persons who own more than two donkeys beat them.
(16) Most (x, Y) (person x & Y = {z | donkey z & x owns z} & |Y| > 2) (x beats Y)

One worry with this proposal is that it drives a wedge between donkey anaphora and cross-sentential anaphora, assimilating the first case to a form of binding, and leaving the latter unexplained. If the first donkey pronoun in (17) is within the scope of the specifier 'Most', the second pronoun surely isn't. The two pronouns in (17) do not seem so different that one would expect different explanations of what is going on in each case.

(17) Most persons who own a donkey feed it. However, they beat it too.

There is a proposal due to Evans (1977), Parsons (1978), Cooper (1979) and Davies (1981), which unifies these accounts of anaphora and which avoids the pitfalls of the Heim (1982) analysis, and does so without violence to binding theory. The idea, simply put, is that the donkey pronouns stand proxy for definite descriptions (or, in Evans's case, have their content fixed by description).⁴ The analysis of (17) might be glossed as follows.

(18) [Most persons who own a donkey]_i feed the donkey (or donkeys) that they_i own. However, [the persons who own and feed a donkey]_i beat the donkey (or donkeys) that they_i own and feed too.

Whatever the ultimate merits of these descriptive pronoun solutions, they really deserved to be addressed.

Higginbotham's paper tries to give a unified account of a number of natural language constructions (all generally considered to be cases of the indefiniteness phenomenon), including there-insertion, predicative nominals, donkey anaphora, cleft constructions, and quantifier interdependence. The common thread in these constructions is the notion of an adjectival quantifier, where an adjectival quantifier is defined as one which is symmetric: 'Det A are B' is true just in case 'Det B are A' is true. 'A man is a lawyer' will be true just in case 'A lawyer is
a man' is true, so 'a' is adjectival. On the other hand, there are circumstances in which 'Every man is a lawyer' is true, while 'Every lawyer is a man' is false.

Consider the predicate nominal construction discussed in Williams (1983). Higginbotham argues that only some NPs appear in predicational position (as defined in Williams (1980)) and those which do cannot be thought of as truly quantificational. So, for example, the following contrast shows that only some determiners can appear in predicative position:

(19) John is a lawyer.
(20) *John is every lawyer.

Notice that (20) is problematic even if John is the only lawyer.⁵ Higginbotham follows an argument he attributes to Emmon Bach and suggests that 'a lawyer' in (19) cannot be quantificational, i.e. it cannot have the following analysis suggested by (e.g.) Montague,

(19') (∃x: lawyer x) [John is x]

for if it did have such an analysis, we would expect (21) to have the analysis given in (21').

(21) John is not a lawyer.
(21') (∃x: lawyer x) [John is not x]

(21') cannot have the same truth conditions as (21), however, for (21') can be true in cases where John is a lawyer. Higginbotham's analysis is that only determiners which are adjectival in character may appear in predicate position. Higginbotham goes on to suggest that only adjectival determiners will give rise to there-insertion, and that only adjectival monotone increasing quantifiers will support donkey anaphora.

While ambitious, Higginbotham's proposal is problematic at points. First of all, there are troubles with the analysis of predicative nominals. The problem with (20) goes much deeper than simply that NPs like 'every lawyer' cannot appear in post-copular position. It is observed in Ludlow (1985), for example, that (20') is just as bad.

(20') [Every lawyer]_i is such that John is him_i.

Moreover, there is an alternative explanation (due to Lasnik (p.c.)) for why (21) cannot be interpreted as (21'), namely that negation serves as something of a scope island for operators. Thus consider the following examples, discussed in Ludlow (1985).

(22) a. Not everyone went to the party.
     b. *[everyone]_i NEG e_i went to the party
(23) a. I don't always go to parties.
     b. *[always] NEG I go to parties
(24) a. Waldo doesn't love everyone.
     b. *[everyone]_i NEG Waldo loves e_i

The (a) examples simply cannot have the same interpretation as the (b) examples. With respect to donkey anaphora, as examples (7) and (9) above show, it is not correct that only adjectival monotone increasing quantifiers will support donkey anaphora.

It is an interesting question as to how Higginbotham's analysis fares with there-insertion constructions. The interesting cases are 'many' and 'few', which clearly support there-insertion, but which are not clearly adjectival. It seems, for example, that 'Many humans swim' is true, though 'Many swimmers are human' is false. But Higginbotham argues that there is an absolute sense of 'many' which might be used in sentences like the above. If 'many' is understood in the absolute sense, then the two sentences will have the same truth conditions. He then suggests that it is this absolute sense of 'many' which is found in there-insertion constructions. Thus there is argued to be a contrast between (25) and (26).

(25) Many people that I know are in the garden.
(26) There are many people that I know in the garden.

According to Higginbotham, (25) can be true if I don't know many people, just so long as many of the people that I know are in the garden. (26), on the other hand, is supposed to be false under the same circumstances. The point is that it is only the absolute sense of 'many' which appears in there-insertion constructions, and it is the absolute sense which seems to be adjectival.

Keenan argues that the quantifiers which support there-insertion are precisely the existential ones, where 'existential' is defined as follows:

a. A basic determiner is called existential if it is always interpreted by an existential function, where
b. A function f from properties to sets of properties is existential if for all properties p, q: p ∈ f(q) iff 1 ∈ f(q ∧ p)

Less formally, the idea is that a determiner DET will be existential just in case (27) and (28) have the same truth conditions.

(27) DET X are (is) Y
(28) DET X who are (is) Y exist

Keenan claims that (29) and (30) cannot vary in truth value, so 'some' is existential.
(29) Some men are bald.
(30) Some men who are bald exist.

On the other hand, (31) and (32) have different truth conditions, so 'every' is not existential.

(31) Every man is bald.
(32) Every man who is bald exists.

The first question to consider is whether this proposal is empirically distinguishable from Barwise and Cooper's. Put another way, are there any cases of weak determiners which are not existential, or of strong determiners which are existential? Keenan suggests that there are, and suggests further that the existential/non-existential distinction is a better predictor of there-insertion possibilities. Keenan suggests that the determiner 'fewer than zero' (which admits there-insertion) will be existential, yet strong positive. The evidence is subtle. First we must convince ourselves that the pair of sentences in (33) have the same truth conditions. Then we must convince ourselves that (34) is a tautology.

(33) a. Fewer than zero dollars are in my checking account.
     b. Fewer than zero dollars that are in my checking account exist.
(34) Fewer than zero dollars are dollars.

As I noted, the evidence is subtle, and not everyone will be comfortable hanging their choice of explanation on such evidence.

So far we have seen Keenan's argument that the relevant generalization must be that the quantifiers that admit of there-insertion are the existential ones. The question remains as to why only the existential ones permit there-insertion. An answer to this question is advertised in section 12.3.1 of the paper, but the explanation is difficult to locate. Keenan's argument proceeds in two steps. First, he argues that the structure of a there-insertion sentence like 'there is a man in the garden' is as follows.

(35) [S [NP there] [VP [V is] [NP a man] [XP in the garden]]]

Let us grant this step in the argument. The second step is to suggest that strict compositionality demands that the only way to interpret a structure like the above would be to say that the denotation of the predicate NP (Q_NP) has the property expressed by the XP (p_XP). In short, Q_NP ∈ p_XP. Let us grant this step in the argument as well. Does it follow that only existential NPs will be able to appear in these constructions? It is hard to see why. On generalized quantifier theory, an NP formed from a non-existential determiner will not differ in any interesting way from one formed from an existential determiner: both will denote a set-theoretic object which will have the property expressed by the NP (i.e. Q_NP).
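The informal test in (27) and (28), and the symmetry condition that defines Higginbotham's adjectival quantifiers, can both be tried out by brute force over a small universe. The sketch below is only an approximation of Keenan's definition: it treats determiners as relations between finite sets, lets the whole universe stand in for the property 'exist', and uses hypothetical names throughout.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_existential(det, universe):
    """'DET X are Y' and 'DET X who are Y exist' agree for all X, Y."""
    exist = frozenset(universe)
    sets = subsets(universe)
    return all(det(x, y) == det(x & y, exist) for x in sets for y in sets)


def is_symmetric(det, universe):
    """'DET A are B' and 'DET B are A' agree for all A, B."""
    sets = subsets(universe)
    return all(det(a, b) == det(b, a) for a in sets for b in sets)


DETERMINERS = {
    "some": lambda a, b: len(a & b) >= 1,
    "three": lambda a, b: len(a & b) >= 3,
    "no": lambda a, b: len(a & b) == 0,
    "every": lambda a, b: a <= b,
}

universe = {1, 2, 3, 4}
for name, det in DETERMINERS.items():
    print(name, is_existential(det, universe), is_symmetric(det, universe))
```

On this encoding 'some', 'three' and 'no' pass both tests and 'every' fails both, in line with the judgements on (29) to (32). A finite check of this kind says nothing about the harder cases discussed above, such as 'many' or 'fewer than zero'.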
I hope by examining some of the proposals in the volume I have given some indication of the complexity and pervasiveness of these problems, and of how intricately interwoven they are. Once again, I think this is a very strong collection, not merely because it has brought together a number of papers by the leading figures in the field, but because the papers all address some portion of an interesting and important problem space. One can only wish that more edited volumes provided the same service.
NOTES
1 Kamp (1984), p. 16. See also fn. 18. The idea is that if 'a donkey is tired' is true and the model is extended to include other donkeys, the truth of the sentence is preserved. If 'every donkey is tired' or 'no donkey is tired' is true, and the model is extended to include other donkeys, there is no guarantee that the truth of the sentence will be preserved. (The passage was first brought to my attention by Jim Higginbotham.)
2 The truth conditions for structures like (11) are not given, but they appear to be those standardly given in generalized quantifier theory.
3 The truth conditions for '(x beat Y)' will presumably be something like the following: '(x beat Y)' is true iff x beat all the members of Y.
4 For a discussion of these proposals, and a modification to them, see Neale (1990).
5 One can say things such as 'John is everything his mother wanted him to be', but the standard rebuttal to this observation is that some form of quantification over properties is taking place.
Department of Philosophy
State University of New York at Stony Brook
Stony Brook, NY 11794
USA

REFERENCES
Barwise, J. & R. Cooper (1981), 'Generalized quantifiers and natural language', Linguistics and Philosophy, 4, 159-219.
Cooper, Robin (1979), 'The interpretation of pronouns', in Heny and Schnelle (eds), Syntax and Semantics, vol. 10, Academic Press, New York.
Davies, Martin (1981), Meaning, Necessity and Quantification, Routledge & Kegan Paul, London.
Evans, Gareth (1977), 'Pronouns, quantifiers, and relative clauses (I)', Canadian Journal of Philosophy, 7, 467-536. Reprinted in Evans, Collected Papers, Oxford University Press (1985), Oxford.
Heim, I. (1982), 'The semantics of definite and indefinite noun phrases', doctoral dissertation, University of Massachusetts, Amherst.
Kamp, H. (1981/1984), 'A theory of truth and semantic interpretation', in J. Groenendijk et al. (eds), Formal Methods in the Study of Natural Language, Amsterdam Centre (1981). Reprinted in J. Groenendijk, T. Janssen and M. Stokhof (eds), Truth, Interpretation, and Information, Foris (1984), Dordrecht, 1-41.
Ludlow, P. (1985), 'The syntax and semantics of referential attitude reports', doctoral dissertation, Columbia University.
Milsark, G. (1974), 'Existential sentences in English', doctoral dissertation, MIT.
Neale, Stephen (1990), Descriptions, MIT Press, Cambridge, Mass.
Parsons, Terence (1978), 'Pronouns as paraphrases', MS, University of Massachusetts, Amherst.
Safir, K. (1982), 'Syntactic chains and the definiteness effect', doctoral dissertation, MIT.
Williams, E. (1980), 'Predication', Linguistic Inquiry, 11: 203-38.
Williams, E. (1983), 'Semantic vs. syntactic categories', Linguistics and Philosophy, 6: 423-46.
Journal of Semantics 8: 167-170
© N.I.S. Foundation (1991)
Book Review
Deborah Schiffrin. Discourse Markers. Cambridge University Press, 1987. 364 pages, £30 (paperback). GILLIAN BROWN
This book seems at first glance to relate to a descriptive tradition established in the last fifteen years, which relies on a large corpus of conversational data and examines the distribution of discourse particles of one kind or another within the corpus, states the distribution, and then attempts to give some functional/notional account of the 'meaning' of the particle in the different contexts in which it occurs. Schiffrin is, however, attempting to do more than this: she is trying to develop a theoretical model which will permit a principled account of the way a range of different types of discourse particles contribute to the coherence of conversational discourse, in particular how the same item has to be understood differently, depending on its role at a particular point in the conversation. Her account is based on data derived from a series of sociolinguistic interviews which she conducted among Jewish families in an area of Philadelphia. The data appear to be quite restricted, since the same chunks of transcription frequently reappear to illustrate the use of yet another discourse marker. This is not particularly harmful except when she wishes to make a quantitative point about some aspects of her data and we find, for example, a total of six instances representing 66 per cent of the total (Table 4.3). It would be sensible not to attempt a quantitative statement when the sample is so small. What does need to be made clear is that whereas many features of her data appear to be characteristic of conversations appearing in other, much larger, corpora, some features may be restricted to particular speakers. I was struck by the high incidence of rhetorical questions in the middle of a speaker's turn (often prefaced by 'now' as in 'Now what's two hundred as against six thousand years?' (240)). This may be an ethnicity feature (though it is not mentioned in Tannen's account of New York Jewish speech (1981)).
Since the data are based on a questionnaire, it is obvious that the recorded conversations will have a high incidence of questions in them in any case. Schiffrin offers an extended justification of the 'usefulness' of her data (43) in terms of the possible objections arising from her own participant-observer role: a more important issue seems to me to be that of how representative of conversation in general these data can be held to be. The position to take is surely that it is an interesting small corpus and generalisations
based on it should be extrapolated from with caution to other types of conversation and to other groups of speakers. Schiffrin devotes a chapter each to a variety of discourse markers, ranging from those judged to have minimal linguistic meaning ('oh' and 'well'), through the discourse connectives, 'and, but, or', the complements 'so, because', the temporal adverbs 'now, then', to two markers 'whose literal meanings directly influence their discourse use' (267) ('y'know' and 'I mean'). She discusses each of them within a model which is intended to give an account of discourse coherence. This presupposes a particular speaker and (at least one) particular hearer who participate in the conversation by relating to each other and to what they are talking about, and an information state in which speakers make judgments about their hearer's and their own current state of knowledge (a state which evolves as the interaction proceeds). The conversation itself consists of a series of utterances which can be analysed in terms of their ideational structure (propositional content), their action structure (the sequence of acts, what is being done by uttering a particular proposition), and, finally, their exchange structure which is concerned with the management of the interaction, the handing-over of turns, the indication that the current speaker intends to continue speaking, the provision of supportive feedback to the current main speaker and so on. Schiffrin proposes that discourse coherence can be discerned not only by relationships between items at one level of structure, an attractive and potentially constrainable position, but between successive items of different levels of structure, which does permit a large number of possible relationships. A crucial problem with undertaking an analysis of the kind Schiffrin proposes is determining the scope of the relation of each marker to the chunk of conversation in which it occurs. 
Schiffrin considers the range of possibilities and eventually concludes (37) that markers will have to be defined in relation to 'units of talk', which cannot be independently defined in terms of syntactic or phonological features. This position, taken together with the freedoms holding between different levels of analysis in the model of discourse coherence, leads to a highly complex and necessarily somewhat unconstrained analysis. In her final chapter, Schiffrin records her struggles with these extremely complex data, and points out how much she was obliged to rely on her personal knowledge of individual speakers and of their 'positions on controversial issues' (313) in reaching first an interpretation of what was said, and then an analysis of the forms used to say it. In spite of the fact that as soon as you attempt to give an account of how you understand a chunk of real human interaction, you have to call, it seems, on a multiplicity of strands of knowledge whose interrelationship is far from easy to determine, Schiffrin does produce a great deal of interesting detailed discussion which seems fully justified in terms of her data. Some of her findings are not
unfamiliar, for instance, that 'oh', used as a response, denies the correctness of the speaker's presupposition (or part of it) (87). Some draw the reader's attention to systematic differences in the use of these discourse markers and to differences in nuances of meaning which I have not encountered before, for instance, that the primary use of 'oh' lies in the management of information, marking shifts in the speaker's orientation to information. However, I do find the analysis she offers for this less than compelling in terms of her model. If its function is to mark shifts in information state in order to display these to the listener, then it seems that it functions mainly at the level of exchange structure to alert the hearer to the speaker's change in information state; it does not itself contribute to a change at the level of information structure. Schiffrin's own position on this does not seem entirely clear in the final sentence of the chapter when she writes 'Although oh is a marker of cognitive tasks, its use may have pragmatic effects in interaction' (101). This rather surprising outcome is maintained when she contrasts 'oh' with 'well': 'The main difference is that well marks responses at an interactional level, and oh marks responses at a cognitive level' (127). Working within a model, however evolutionary, and however complex and unconstrained, does permit Schiffrin to make generalisations of a kind which are not available to the simple taxonomist who assumes that all instances of a form are to be forced into one level of analysis. Her systematic examinations of the effect of varying the discourse marker used, or of excising it from the transcript, draw attention to the different levels of function of the various markers.
Her conclusion, that the context of utterance constrains the likely interpretation of the relationship between utterances, and that the speaker, by choosing to use a discourse marker, further constrains the range of possible interpretations (319), seems sensible and correct. It is an attractive book, frankly recognising the difficulties raised by the methodologies adopted, discussing the drawbacks fully and also weighing up the drawbacks of other possible solutions to the problem. One of its major strengths lies in these careful, well-informed, methodological discussions. It would be unfair, however, to suggest that it does not also make a useful contribution to the vast and complex undertaking of coming to grips with how participants in a conversation structure their contributions, bully each other, try to impose their own opinions, try to come by information, repeat themselves, and try to find out what other people think.

University of Cambridge, Centre of English as an International Language, Keynes House, Trumpington Street, Cambridge CB2 1QA, England
REFERENCE
Tannen, D. (1981), 'New York Jewish conversational style', International Journal of the Sociology of Language, 30: 133-49.