Segmental and prosodic issues in Romance phonology
AMSTERDAM STUDIES IN THE THEORY AND HISTORY OF LINGUISTIC SCIENCE General Editor E.F.K. KOERNER (Zentrum für Allgemeine Sprachwissenschaft, Typologie und Universalienforschung, Berlin) Series IV – CURRENT ISSUES IN LINGUISTIC THEORY Advisory Editorial Board Lyle Campbell (Salt Lake City); Sheila Embleton (Toronto) Brian D. Joseph (Columbus, Ohio); John E. Joseph (Edinburgh) Manfred Krifka (Berlin); E. Wyn Roberts (Vancouver, B.C.) Joseph C. Salmons (Madison, Wis.); Hans-Jürgen Sasse (Köln)
Volume 282
Pilar Prieto, Joan Mascaró and Maria-Josep Solé (eds) Segmental and prosodic issues in Romance phonology
Edited by
Pilar Prieto
ICREA – Universitat Autònoma de Barcelona
Joan Mascaró
Universitat Autònoma de Barcelona
Maria-Josep Solé
Universitat Autònoma de Barcelona
JOHN BENJAMINS PUBLISHING COMPANY AMSTERDAM/PHILADELPHIA
The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences — Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data Segmental and prosodic issues in romance phonology / edited by Pilar Prieto, Joan Mascaró and Maria-Josep Solé. p. cm. -- (Amsterdam studies in the theory and history of linguistic science. Series IV, Current issues in linguistic theory, ISSN 0304-0763 ; v. 282) Includes bibliographical references and index. 1. Romance languages--Phonology--Congresses. PC76 .S44 2007 440--dc22 2007060751 ISBN 978 90 272 4797 1 (Hb; alk. paper) © 2007 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Co. • P.O.Box 36224 • 1020 ME Amsterdam • The Netherlands John Benjamins North America • P.O.Box 27519 • Philadelphia PA 19118-0519 • USA
CONTENTS
Introduction

Part 1: Segments and processes
Detection of liaison consonants in speech processing in French: Experimental data and theoretical implications
    Noël Nguyen, Sophie Wauquier-Gravelines, Leonardo Lancia & Betty Tuller
Patterns of VCV coarticulatory direction according to the DAC model
    Daniel Recasens
The stability of phonological features within and across segments: The effect of nasalization on frication
    Maria-Josep Solé
Pre- and postaspirated stops in Andalusian Spanish
    Francisco Torreira

Part 2: Prosodic structure
Variation in the intonation of extra-sentential elements
    Lluïsa Astruc-Aguilera & Francis Nolan
Voicing-dependent cluster simplification asymmetries in Spanish and French
    Laura Colantoni & Jeffrey Steele
The phonetics and phonology of intonational phrasing in Romance
    Sónia Frota, Mariapaola D’Imperio, Gorka Elordieta, Pilar Prieto & Marina Vigário
Disentangling stress from accent in Spanish: Production patterns of the stress contrast in deaccented syllables
    Marta Ortega-Llebaria & Pilar Prieto

Part 3: Acquisition of segmental contrasts and prosody
On the effect of (morpho)phonological complexity in the early acquisition of unstressed vowels in European Portuguese
    Maria João Freitas
The perception of lexical stress patterns by Spanish and Catalan infants
    Ferran Pons & Laura Bosch
Logistic regression modelling for first and second language perception data
    Geoffrey Stewart Morrison
Rhythmic typology and variation in first and second languages
    Laurence White & Sven L. Mattys

Subject Index
INTRODUCTION
The second Phonetics and Phonology in Iberia (PaPI) conference, hosted by the Universitat Autònoma de Barcelona in June 2005, proved a great success in bringing together scholars from around the world, all of them involved in researching contemporary issues in phonetics, phonology, and related areas, including language acquisition, language variation and change, and speech technology. This volume provides a selection of the papers presented at that conference. Most of them are concerned with the relationship between phonetics and phonology, and most of them also use a methodological approach that has come to be known as ‘laboratory phonology’. This approach, whose foundations were first laid about twenty years ago, sets out to answer a wide array of research questions through the use of experimental methods. In other words, experimental methodology previously associated with phonetic studies is applied to the realm of phonology with the goal of exploring the crucial correspondence between empirical data and theoretical claims. Over these last two decades, this experiment-based approach has proved extremely fruitful in its ability to test controversial claims in phonological theory, resolve phonological issues, and discover the principles guiding linguistic mechanisms (for a good overview, see articles by Pierrehumbert et al. 2001 and D’Imperio 2005). The specific focus of the papers in the present collection is descriptive and theoretical issues in the phonology of Romance languages. These papers provide new empirical data on a number of phonetic and phonological phenomena in a variety of Romance languages and their dialectal varieties, including Catalan (Eastern and Western Catalan, Valencian and Majorcan), French (European and Quebec), Italian (Neapolitan), Portuguese (Standard European and Northern European) and Spanish (Andalusian, Argentinian, Central Peninsular and Chilean). 
Importantly, most of the contributions take a crosslinguistic or crossdialectal perspective, paving the way to a better understanding of linguistic differences in typologically close language varieties. This focus on Romance languages is motivated by our belief that there is a need for multilanguage data to test current theoretical claims and models, which often lack precisely this sort of broad crosslinguistic basis. The virtue of crosslinguistic research is that it constitutes a valuable tool to explore similarities and differences between languages and thus allows us to construct general linguistic theories, while at the same time ensuring that the peculiarities of individual languages can be characterized within the theory.
An important goal of this volume is to bridge the gap between traditional Romance linguistics—already with a long and rich tradition in data collection, cross-language comparison, and phonetic variation—and laboratory phonology. In our view, subjecting the theoretical claims and data from traditional Romance linguistics to the scrutiny of experimental techniques in the laboratory can only strengthen the scientific basis of this discipline and better integrate its findings in current phonetic and phonological theory. Though in recent years laboratory phonology has proved to be a broad and fertile interdisciplinary approach in Romance linguistics and has grown in popularity among researchers, it is still not a well-known and established approach among Romance scholars. The body of experimental work devoted to Romance languages is still far smaller than the work that has examined, for example, Germanic languages. This volume is thus an attempt to help redress that imbalance: as the reader will see, the studies collected herein present cutting-edge laboratory phonology research as applied to Romance languages. The volume has been organized into three main topic areas, which reflect the main themes of the conference. The first is concerned with segmental processes (coarticulation and assimilation processes, sandhi processes, feature cooccurrence and sequential restrictions), the second with prosodic structure (prosodic characterization of parentheticals, syllable structure, prosodic phrasing, stress and pitch accent prominence), and the third with the acquisition of segmental and prosodic features (the acquisition of vowel reduction, L1 and L2 vowel perception, initial word segmentation, and L2 rhythmic patterns). Thus we begin with the smaller segmental units, move on to larger phonological constituents, and conclude with a look at acquisition on both levels. 
Section 1, Segments and processes, comprises four papers which address the phonetic properties of segments, their mutual influence, and phonological processes across words. Three of these papers focus on the interaction of production and perception in phonological structure and sound change, mostly within the framework of gestural phonology (Browman & Goldstein 1986), which allows modelling of the biomechanical and aerodynamic constraints of speech gestures in a way that accounts for the actual observed patterns. These papers also serve to illustrate the notion that sound change can be partly explained by universal phonetic factors and has its origins in synchronic variation (Ohala 1989, 1991). Daniel Recasens examines the coarticulation patterns found in different types of VCV sequences in Catalan and Spanish and how they provide support for the ‘Degree of Articulatory Constraint’ (DAC) model of coarticulation. He argues that the workings of the speech production mechanism, and in particular the DAC model, can explain the directionality and extent of coarticulation found in VCV sequences. Acoustic data on the size of anticipatory and carryover effects between vowels [a] and [i] and a set of consonants clearly show that: (a) some VCV sequences with salient consonant anticipatory effects (like sequences with dark [l]) exhibit more vowel-to-consonant anticipation effects than carryover effects, while sequences involving alveolopalatal consonants show the reverse pattern; (b) vowel anticipation does not exhibit a comparable size for all consonants but is greater for unconstrained consonants than for more constrained ones. As predicted by the model, the direction and extent of vowel coarticulation vary inversely with the degree of constraint of the intervocalic consonant. Importantly, Recasens provides evidence in support of a close match between the predictions of the model and a number of observed sound changes and assimilatory processes in various Catalan dialects. Maria-Josep Solé argues that the articulatory-acoustic stability of phonological features may be affected not only by concurrent features, but also by features in adjacent segments when they coincide in time due to coarticulatory overlap. Specifically, she addresses the question of whether aerodynamic factors are at the origin of the incompatibility between nasality and frication. She presents the results of a series of experiments designed to explore the effects of velopharyngeal opening (or degrees of nasality) on the stability of segments requiring a high pressure build-up in the oral cavity, such as fricatives. Acoustic and aerodynamic evidence shows that in fricative + nasal sequences anticipatory velum lowering during the acoustic duration of the fricative reduces or even extinguishes the pressure difference required for frication. This is clear evidence that frication is unstable when it comes in contact with nasalization in adjacent segments. She presents important additional evidence that this instability is at the origin of a number of phonological patterns found historically and synchronically in Romance languages by which fricatives tend to lose their friction when they precede nasal consonants.
In addition, she argues that the same aerodynamic and acoustic factors responsible for the combination of features within a segment can be used to explain how features interact in contiguous segments. Francisco Torreira deals with the well-known phonological process of /s/ aspiration in coda position present in a large number of Spanish dialects. The author furnishes instrumental data showing that in Andalusian Spanish, a southern variety of Peninsular Spanish, /s/ aspiration before voiceless stops is accompanied by consistent postaspiration of the stop consonant. His analysis of spontaneous speech clearly shows that /s/ preceding a stop, though usually realized as a period of aspiration or breathy voice, may be absent in a considerable number of cases. Crucially, voiceless stops following /s/ show a consistent pattern of postaspiration. This asymmetry suggests that the conditioning factor for postaspiration in Andalusian voiceless stops may be the presence of a preceding underlying laryngeal gesture that is not strictly timed with the supralaryngeal gestures. Torreira argues that a gestural analysis (Browman & Goldstein 1986) offers a plausible account for a phenomenon that is especially difficult to explain in terms of segments. Within this framework, the timing of the starting point of the glottal opening gesture with respect to the supraglottal closure may not be very accurately specified, while the timing of the ending point with respect to the end of the stop closure is more precise.
Finally, Torreira reviews diachronic and synchronic examples from other languages which illustrate various paths of change for the same type of sound sequence. He suggests that this pattern in Andalusian Spanish, showing unstable preaspiration and more consistent postaspiration, might eventually lead towards a new category of aspirated voiceless stops, as has occurred in other languages. Noël Nguyen and colleagues address the well-known sandhi phenomenon of French liaison and the question of how liaison consonants are processed in speech perception. Specifically, the study analyses whether during speech comprehension liaison consonants are processed and represented differently from word-initial and word-final consonants. With this object, the authors undertook a series of perception experiments that examined potential differences in the detection rate of liaison consonants vs fixed consonants and attempted to determine whether these differences might be attributable to the phonetic properties of the consonants involved. The results provide evidence that liaison consonants are more difficult to detect than word-initial consonants (detection scores were lower and response times tended to be slower for the former than for the latter) and that these differences are not attributable to potential phonetic differences between the two. Nguyen and colleagues argue that the difficulty in processing liaison consonants seems to provide partial support for the autosegmental representation of liaison consonants as floating segments. Yet as detection accuracy also seemed to vary depending on the degree of lexicalisation of the carrier word sequence, the authors point out that more research is needed to evaluate the potential effects of probability of occurrence and thereby properly evaluate the predictions of exemplar-based models in the perception of liaison consonants. 
Prosodic structure and intonational phonology have been recurring themes in laboratory phonology work, partly because of the unreliability of introspective work on these issues. The four papers in Section 2, Prosodic structure, address important issues in this area of study, taking advantage of the benefits offered by using a common crosslinguistic experimental approach. Lluïsa Astruc-Aguilera and Francis Nolan examine the prosodic characteristics of extra-sentential elements (dislocated phrases, vocatives, adverbials, etc.) in Catalan and English. The authors present the results of three experiments set up to study phrasing and accentuation patterns in these constructions. Though phonological studies have traditionally considered that extra-sentential elements form prosodically independent units and are unaccented, results from these experiments show that they do not always form independent tonal units, nor are they always deaccented. Rather, they show variation in their phrasing and intonation, revealing a trade-off between prosodic independence and tonal subordination: deaccentuation only seems to be compulsory in those cases in which the extrasentential elements and the main phrase belong to the same prosodic domain. Also, the experiments reveal that accentual cues seem to be stronger and more consistent cues than phrasing cues and the degree of inter-speaker variation is lower in the former than in the
latter. In their third experiment, using both a database in which three levels of stress were specified and a masking noise technique for recording, the authors found that right-dislocated phrases were totally deaccented. Astruc and Nolan conclude that prosody signals the peripheral status of extra-sentential elements by general deaccentuation or compression of the pitch range, independent phrasing being a more optional cue. Laura Colantoni and Jeffrey Steele provide a detailed account of the behavior of stop-liquid clusters in two varieties of Spanish (Chilean and Argentinian) and two varieties of French (Quebec and European). They show that choice of cluster simplification, whether by assimilation or dissimilation, is correlated with the voicing properties of the stop and the manner properties of the liquid (tap, fricative, approximant). The results of the production experiment in the four varieties under investigation show that: (a) similarity in manner and voicing between the two members of the cluster determines the degree of cluster simplification; (b) in the case of stop-rhotic clusters, the phonetic characteristics of the rhotic determine the strategy used; in the case of Spanish, with the tap being highly similar to the stop, dissimilation via vowel epenthesis is the preferred outcome; and (c) stop voicing plays a role in determining the degree of assimilation and dissimilation: given that voiceless stops are longer than their voiced counterparts, a compensatory lengthening effect is observed, and thus shorter epenthetic vowels are found. The effects that trigger synchronic variation can also account for the evolution of stop-liquid clusters from Latin to Romance. Finally, the authors provide an Optimality Theory-based analysis of their experimental results.
The chapter by Sónia Frota and colleagues explores the phonetic realization of intonational phrasing in five Romance language varieties: Catalan, two varieties of European Portuguese, Neapolitan Italian, and Spanish. Data from a common experimental database (the ‘Romance Languages Database’) is used to analyze the phonetic realization of phrasing in the five varieties under examination. First, the authors provide a typology of combinations of nuclear pitch accents plus boundary tones used across languages, as well as their relative frequency. The dominant boundary tone used in the five varieties is the high (H) boundary tone, in the form of either a continuation rise or sustained pitch. Second, they offer a detailed analysis of the phonetics of the H boundary tone across languages. Specifically, the data reveal that nuclear pitch accent choice affects the scaling of the H boundary tone in a similar and consistent way, namely, the tone is higher after High nuclear accents than after Low nuclear accents. They interpret this as resulting from the upstep of the H boundary tone after an accentual H. Also, the data reveal mixed effects of constituent length on the scaling of H boundary tones, with the languages observed clustering in two main groups: the Catalan-Spanish group (with almost no length effects) and the Italian-European Portuguese group (with clear length effects). Marta Ortega-Llebaria and Pilar Prieto’s paper examines the acoustic correlates of stress prominence in Spanish in both accented and unaccented
environments (e.g. parentheticals). Traditional studies typically describe the correlates of stress in accented environments, thus suffering from covariation between stress and accent. This paper goes beyond traditional accounts in that the pitch accent factor is controlled for. The results of the production experiments described reveal that the stress contrast is maintained in deaccented contexts and that syllable duration and spectral tilt (intensity at high frequencies of the spectrum) are reliable acoustic correlates of this contrast in Spanish. These results contribute to the discussion about the nature of stress across languages, advocating for the view that stress prominence has its own phonetic cues, and against other views which claim that stress cues are parasitic on vowel reduction cues. Thus while American English, Dutch and Spanish differ in the degree of vowel reduction involved in marking stressed positions, they do not differ greatly in the way they use other acoustic correlates (i.e. duration and intensity) to signal the presence of stress. Section 3, Acquisition of segmental and prosodic structure, comprises four papers which address the acquisition of segmental and prosodic contrasts by infants and second language learners. The papers in this section illustrate how laboratory phonology is in fact starting to bridge the gap between psycholinguistics and phonology. Maria João Freitas focuses on the early production of vowels in unstressed position by European Portuguese-speaking children, and specifically, on how these children start acquiring two specific phonological processes of vowel reduction, namely, that of /ɛ, e/ turning into [ɨ] and that of /a/ turning into [ɐ] in unstressed positions.
Since the acquisition of vowel reduction processes has not received much attention in the acquisition literature, this paper provides new empirical data from European Portuguese, a Romance variety which presents a number of reduced vowels in unstressed position deriving from the productivity of the vowel reduction process. On the basis of longitudinal data collected for four children aged 0;10 to 2;8, it is observed that Portuguese children acquire vowel reduction relatively early in the path of development, and that syllable deletion is one of the common strategies found in children. Freitas claims that the complexity of the target vowel system increases children’s early sensitivity to vowel differences and promotes the speed of phonological development. Interestingly, the results show that vowel reduction emerges either simultaneously in word-medial and word-final position or possibly earlier word-finally. Freitas suggests that the presence of morphological content in the word-final vowel might be promoting phonological development in this position. Geoffrey Stewart Morrison introduces logistic regression analysis as applied to L1 and L2 speech perception data involving Spanish vowels. The chapter is intended as a tutorial for L2-speech-perception students and researchers who are not familiar with the technique. Using data taken from previous identification experiments on L1 Spanish vowel perception and on L1 and L2 English vowel perception, the author applies logistic regression model fitting techniques to determine which acoustic cues are attended to by listeners
when identifying stimuli. He shows that logistic regression coefficients can be successfully used to produce intuitive representations and quantify how listeners use those acoustic cues, as well as to model sequential stages in L2 learners’ perception. At the same time, these statistics can also be used to determine whether there are significant differences in the perception of stimuli by L1 vs L2 groups of listeners. In sum, Morrison shows how the logistic regression technique can be successfully used in L2 speech perception research. Ferran Pons and Laura Bosch focus on how infants under one year of age deal with word segmentation and which prosodic features they pay attention to in order to perform this task. Sensitivity to prosodic information has been observed very early in development in studies with English and Dutch children. For example, at nine months, American English infants show a trochaic bias, meaning that they prefer to listen to lists of strong-weak disyllabic words (trochaic), as opposed to lists of weak-strong disyllabic words (iambic), a stress pattern which is atypical of English. Using a slightly modified version of the Head-Turn Preference Procedure, a paradigm that has been used successfully in infant speech perception research over the past twenty years, Pons and Bosch set out to explore the metrical preferences of six-month-old Spanish- and Catalan-learning infants. The data revealed no pattern of preference for trochees. An additional experiment with nine-month-old infants revealed that, unexpectedly, even at this age they do not show a pattern of preference for trochaic or iambic stress. The authors partly attribute this crosslinguistic difference to a weaker predominance of the bisyllabic trochaic pattern in Catalan and Spanish relative to English. Yet the results cast some doubt on the usefulness of this prosodic cue (i.e. stress pattern) alone to help early word segmentation of fluent speech.
The authors suggest that phonotactic information—the fact that heavy CVC syllables appear generally in stressed positions—might be combined with stress cues at a very early age in order to predict the patterns of preference. Finally, Laurence White and Sven Mattys set out to test the discriminative performance of different metrics of rhythmic distinctions across languages. One of the novelties of this article is that it collects data from second language rhythm, in the hope that the metrics will prove useful as a tool to identify the rhythmic differences between native English speakers, for example, and Spanish speakers of L2 English. The authors’ first production experiment is designed to test how well different metrics support the distinction between the rhythm of ‘syllable-timed’ French and Spanish and that of ‘stress-timed’ Dutch and English, with the effect of L1 on L2 rhythm also considered. The results show that rate-normalised metrics of variation in vocalic interval duration clearly and effectively (a) discriminate between the classic distinction between stress-timed and syllable-timed languages; and (b) are informative about the adaptation of speakers to rhythmically-similar (Dutch and English) or rhythmically-distinct (Spanish and English) second languages. Their second production experiment examines the rhythmic contrasts between
different accents of British English, with results showing evidence of rhythmic gradience between them. Finally, results from a perceptual test find a normalised metric of vocalic interval variation to be the strongest predictor of the rating of the second language speaker’s accent as native or non-native. As a final word, we would like to thank the scholars who agreed to review the contributions included in this volume. We are greatly indebted to the anonymous reviewers at the John Benjamins office as well as the external reviewers who have participated in the assessment of articles: Laura Colantoni, Néstor Cuartero, Eva Estebas, Paula Fikkert, Chip Gerfen, Barbara Gili-Fivela, José Ignacio Hualde, Conxita Lleó, Francis Nolan, Hugo Quené, Daniel Recasens, Marija Tabain, and Laurence White. We owe the smooth progress in the production of this book to Anke de Looper of Benjamins and E.F.K. Koerner, the series editor, whom we would like to thank for their active and continuing support from the start of the project. Many thanks are also due to Michael Kennedy-Scanlon, Marianna Nadeu and Maria del Mar Vanrell for their help in textual editing and proofreading. This work was partially supported by grants BFF2003-08364-C02, HUM2004-20318-E, and HUM2006-01758/FILO awarded by the Ministerio de Ciencia y Tecnología and FEDER, and by grant 2005 ARCS1 00174 awarded by the Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR, Generalitat de Catalunya). We believe this volume will constitute a very useful companion for phoneticians, phonologists, and researchers investigating sound structure in Romance languages. It is our desire that it will spark further interest in laboratory phonology and will contribute to enlarging the body of research focusing on these languages.
Barcelona, November 2006
Pilar Prieto
Joan Mascaró
Maria-Josep Solé
References
Browman, Catherine P. & Louis Goldstein. 1986. “Towards an articulatory phonology”. Phonology Yearbook 3. 219-252.
D’Imperio, Mariapaola. 2005. “La Phonologie de Laboratoire: Finalités et quelques applications”. Phonologie et phonétique: Forme et substance ed. by Noël Nguyen, Sophie Wauquier-Gravelines & Jacques Durand. 241-264. Paris: Hermès.
Ohala, John J. 1989. “Sound change is drawn from a pool of synchronic variation”. Language Change: Contributions to the study of its causes ed. by Leiv E. Breivik & Ernst H. Jahr. 173-198. Berlin: Mouton de Gruyter.
----------. 1991. “What's cognitive, what's not, in sound change”. Diachrony within Synchrony: Language History and Cognition ed. by Günter Kellermann & Michael D. Morrissey. 309-355. Duisburg: Peter Lang Verlag.
Pierrehumbert, Janet, Mary E. Beckman & D. Robert Ladd. 2001. “Conceptual foundations of phonology as a laboratory science”. Phonological Knowledge: Conceptual and Empirical Issues ed. by Noel Burton-Roberts, Philip Carr & Gerard Docherty. 273-303. Oxford: Oxford University Press.
PART 1 SEGMENTS AND PROCESSES
DETECTION OF LIAISON CONSONANTS IN SPEECH PROCESSING IN FRENCH: EXPERIMENTAL DATA AND THEORETICAL IMPLICATIONS *
NOËL NGUYEN¹, SOPHIE WAUQUIER-GRAVELINES², LEONARDO LANCIA¹ & BETTY TULLER³
¹Laboratoire Parole et Langage, CNRS & Université de Provence; ²Structures formelles du langage, CNRS et Université de Paris VIII; ³Center for Complex Systems and Brain Sciences, Florida Atlantic University
Abstract
The goal of the present study is to better understand the mechanisms involved in the processing of liaison consonants by listeners in French. Previous work (Wauquier-Gravelines 1996) showed that liaison consonants are more difficult to detect than word-initial consonants in a phoneme-detection task. We examined to what extent such differences are attributable to the consonants’ phonetic properties, and we also compared the perception of liaison consonants with that of fixed word-final and word-medial consonants, as well as word-initial ones. The results suggest that liaison consonants have a specific perceptual status. Implications for both autosegmental and exemplar-based theories of liaison are discussed.
1. Introduction
French liaison is a well-known phenomenon of external sandhi that refers to the appearance of a consonant at the juncture of two words, when the second word begins with a vowel, e.g. un [œ̃] + enfant [ɑ̃fɑ̃] → [œ̃nɑ̃fɑ̃] “a child”, petit [pəti] + ami [ami] → [pətitami] “little friend”. Liaison consonants are usually enchaînées, i.e. realized as syllable-onset consonants, although they can also appear in coda position; compare [pə.ti.ta.mi] (with enchaînement) and [pə.tit.a.mi] (without enchaînement; Encrevé 1988). In the following, the two words at the juncture of which liaison consonants appear will be referred to as Word 1 and Word 2, respectively.
* This work was partly supported by the ACI Systèmes complexes en SHS Research Program (CNRS & French Ministry of Research) and by NSF Grant #0414657. We thank Sharon Peperkamp and Stéphanie Ducrot for drawing our attention to the missing-letter effect. We are also grateful to Robert Espesser for sharing his statistical expertise, and to Pierre Encrevé, Zsuzsanna Fagyal, Cécile Fougeron, Mariapaola D'Imperio, Maria-Josep Solé, Marina Vigário, and three anonymous reviewers for useful comments.
Among the many different approaches to French liaison that have been proposed over the last thirty years or so (see Tranel 1995; Côté 2005, for reviews), a major bone of contention relates to whether liaison is a phonological or a lexical phenomenon. The phonological approach dates back to early generative studies on French phonology, in which liaison was seen as an exception to a general process of final consonant deletion, referred to as the ‘French Truncation Rule’ by Schane (1968). By contrast, according to another proposal made later in the same general framework (e.g. Klausenburger 1974, 1977), liaison consonants arose in the course of the derivation owing to an insertion mechanism (views differed as to whether this epenthesis occurred at the end of Word 1 or at the onset of Word 2). More recent treatments of liaison in nonlinear phonology have reconceptualized the deletion/insertion dichotomy, as pointed out by Tranel (1995). Thus, in the autosegmental account proposed by Encrevé (1988) and Encrevé and Scheer (2005), liaison consonants are viewed as floating segments, with respect to both the segmental and syllabic tiers. Such consonants must be associated with both tiers to be phonetically realized, and this association takes place only under certain conditions. In both the linear and nonlinear phonological approaches, liaison is generally portrayed as being subjected to prosodic, morphological, syntactic and stylistic factors. Lexical approaches to liaison can be divided into two main strands. Suppletive analyses, as advocated by Klausenburger (1984) among others, assume that words such as petit are associated in the lexicon with two distinct allomorphs, a longer one ending in a liaison consonant (/pətit/) and a shorter one without liaison consonant (/pəti/). In contrast, in exemplar-based models, such as the one recently proposed by Bybee (2001, 2005), liaison consonants are said to take place within specific grammatical constructions, e.g.
[NOUN z- [vowel]-ADJ]Plural, in enfants intelligents [ɑ̃fɑ̃zɛ̃teliʒɑ̃] “clever children”. Constructions display different degrees of generality/abstractness, and range on a continuum from very abstract (as in the example given above), to fixed, lexicalized phrases like c’est-à-dire [setadiʁ] “that is to say”. This provides a unified account of both false liaisons, which are attributed to the overgeneralization of a high-frequency construction, as in quatre enfants [katzɑ̃fɑ̃] “four children”, and word-specific differences in the realization of liaison. Frequency of use is of central importance, as liaison is assumed to occur more often within a sequence of words characterized by a higher frequency of co-occurrence. This approach is neutral with respect to the issue of whether liaison consonants result from a deletion or insertion process, nor does it make any specific claim as to whether the consonant belongs to Word 1 or 2.

As noted above, liaison consonants when realized are usually enchaînées, i.e. syllabified into onset position. This results in a mismatch between word and syllable boundaries. Specifically, the syllable whose onset position the liaison consonant comes to occupy straddles the boundary between Word 1 and Word 2 (e.g. [pə.ti.ta.mi], where the word boundary takes place between [t]
DETECTING LIAISON CONSONANTS IN FRENCH
and the following [a]). Recent psycholinguistic studies (Wauquier-Gravelines 1996; Gaskell, Spinelli & Meunier 2002; Spinelli, McQueen & Cutler 2003) have shown that this mismatch does not necessarily make it more difficult for listeners to identify the second word, but may in fact facilitate the recognition of that word with respect to a baseline condition. This raises questions for models of speech perception in which the syllable is viewed as a primary unit of segmentation in lexical access in French (see Content, Kearns & Frauenfelder 2001, for a recent discussion in that domain). A related issue concerns the way in which liaison consonants are processed in speech perception. One may ask which perceptual mechanisms allow a liaison enchaînée to be distinguished from word-initial as well as word-final consonants, and to which of the two words the liaison consonant is associated by the listener. More generally, the question arises whether in speech comprehension liaison consonants are processed and represented in a way that is different from fixed consonants. It is this issue that is addressed in the present paper. A series of experiments are reported which together suggest that liaison consonants do have a distinct perceptual status. Implications for current models of liaison in French will be discussed.

2. Empirical evidence for a specific status of liaison consonants in speech perception

Wauquier-Gravelines (1996) examined the speed and accuracy with which listeners can detect the presence of a liaison consonant in the speech chain. Because it has not been published, we present this work here in some detail and in light of more recent findings. Wauquier-Gravelines compared listeners’ responses to liaison consonants and word-initial consonants in a phoneme detection task. Listeners were presented with a series of sentences and were asked to detect a pre-specified phoneme in each sentence. The material contained pairs of sentences that were designed so that the target phoneme appeared as a word-initial consonant in one sentence (e.g. son navire [sɔ̃naviʁ] “his ship”) and as a liaison consonant (e.g. /n/ in son avion [sɔ̃navjɔ̃] “his plane”) in the other. The two sentences in each pair were matched with respect to their syntactic, lexical and phonemic make-up. A number of filler sentences were also used. Both the test and filler sentences were recorded by a native speaker of standard French. Two experiments were conducted. Each experiment was comprised of a training phase and a test phase. The target consonant was /t/ in the first experiment and /n/ in the second one. There were fourteen subjects, all native speakers of standard French, with no known hearing impairment, and naive as to the purpose of the experiment. The data showed that listeners experienced greater difficulties in detecting the liaison than the word-initial consonant. There were significantly fewer correct responses for the liaison than for the word-initial consonant for both /t/ (liaison: 67.8%, word-initial: 92.8%, χ² = 9.56, p<0.01) and /n/
(liaison: 44.6%, word-initial: 87.5%, χ² = 21.07, p<0.01), although this difference was smaller for /t/ than /n/. These results suggest that liaison consonants are not processed in the same way as fixed consonants by listeners.

There is a potential parallel between this phenomenon and the status liaison consonants have in autosegmental phonology. As indicated above, liaison consonants display both syllabic and skeletal flotation in Encrevé’s (1988) autosegmental model. When followed by a word with a null onset (i.e. an onset with no corresponding segmental constituent and no skeletal slot), the liaison consonant is attributed a skeletal slot and, in the unmarked case, is syllabified into onset position. Thus, liaison consonants are not lexically anchored to a timing unit and are in this regard characterized by structural instability. It may be hypothesized that listeners’ behaviour in the phoneme-detection experiments is a reflection of this instability. In other words, it would be more difficult for listeners to map a liaison consonant onto a phonemic label because unlike ‘ordinary’ phonemes, i.e. fixed consonants, liaison consonants are underlyingly floating with respect to the skeleton associated with the word to which they belong. The absence of a pre-established link between the liaison consonant and one of the available timing units in the underlying lexical representation would make that consonant harder to detect in an explicit manner. This phenomenon is reminiscent of Sapir’s (1933) observation that speakers of British English are convinced they do not pronounce sawed and soared in the same way, because soared is viewed as underlyingly containing an r, even though both words may be phonetically transcribed [sɔːd] (in nonrhotic varieties of BE). In Encrevé’s model, the difference between sawed and soared is attributed to the presence of a floating r in the latter but not in the former.
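As a rough check on the χ² values reported above for the two phoneme-detection experiments, the following sketch recomputes a Yates-corrected χ² from a 2 × 2 table of correct detections and misses. The trial counts are an assumption: the text does not state them, but 56 trials per condition (e.g. 38/56 and 52/56 for /t/) reproduce the reported percentages up to rounding in the last digit. The function name and the use of the continuity correction are likewise illustrative choices, not details taken from Wauquier-Gravelines (1996).

```python
def yates_chi_square(hits_a, misses_a, hits_b, misses_b):
    """Yates-corrected chi-square statistic for a 2 x 2 table (1 df)."""
    observed = [[hits_a, misses_a], [hits_b, misses_b]]
    row_totals = [sum(row) for row in observed]
    col_totals = [hits_a + hits_b, misses_a + misses_b]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            # Continuity correction: shrink each deviation by 0.5.
            chi2 += (abs(observed[i][j] - expected) - 0.5) ** 2 / expected
    return chi2


# ASSUMPTION: 56 trials per condition; (hits, misses) inferred from the
# reported percentages, not given in the text.
T_LIAISON, T_INITIAL = (38, 18), (52, 4)   # about 67.9% vs 92.9% correct
N_LIAISON, N_INITIAL = (25, 31), (49, 7)   # about 44.6% vs 87.5% correct

chi2_t = yates_chi_square(*T_LIAISON, *T_INITIAL)
chi2_n = yates_chi_square(*N_LIAISON, *N_INITIAL)
print(round(chi2_t, 2), round(chi2_n, 2))
```

Under these assumed counts the sketch yields values that round to 9.56 for /t/ and 21.07 for /n/, in line with the statistics quoted above.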
Likewise, Wauquier-Gravelines’ (1996) findings may suggest that a liaison consonant is perceived by listeners in a way that mirrors its specific phonological status as a floating segment. In other words, syllabic/skeletal flotation may be perceptually and cognitively relevant. Although differences in the phonological status of liaison and word-initial consonants thus provide an appealing explanation for the observed perceptual patterns, other factors such as the frequency of occurrence of the Word 1-2 sequences, the target’s acoustic properties, and the target’s position within the carrier word, may also have played a role.

We begin with the issue of lexical frequency. It might be the case that liaison consonants appeared in a context that rendered them less predictable by listeners than word-initial consonants. Recent studies (e.g. Adda-Decker, Boula de Mareüil & Lamel 1999; Fougeron, Goldman & Frauenfelder 2001; Fougeron, Goldman, Dart, Guélat & Jeager 2001) suggest that the realization of liaison is partially conditioned by a complex interplay between the lexical frequencies of Words 1 and 2. Specifically, Fougeron, Goldman and Frauenfelder (2001) found that the rate of realization of liaison shows both a positive correlation with the frequency of Word 1, and a small, but significant, negative correlation with the frequency of
Word 2. Fougeron et al.’s results also revealed that the rate of liaison increases with the frequency of co-occurrence of the two words. In Wauquier-Gravelines’ experiments, however, potential lexical frequency effects were fully neutralized for Word 1 since that word was identical for both sentences in each sentence pair. In addition, Fougeron, Goldman and Frauenfelder (2001) point out that because high-frequency words are often short function words, the relationship found between frequency of Word 2 and rate of liaison may actually reflect the fact that liaison is realized less often before short function words than before longer words. Since Wauquier-Gravelines only used nouns and adjectives (most of them di- or trisyllabic) in Word 2 position, it seems unlikely that, in her material, liaison consonants had a lower probability of occurrence than word-initial consonants. Note also that words starting with a vowel are much more numerous in French than consonant-initial words with either of the two target consonants used in the experiments, /t/ or /n/. In such contexts, listeners should have been biased towards identifying the target as a liaison, rather than a word-initial consonant. This again suggests that the lower detection rate obtained for the liaison consonant was not related to the frequencies of occurrence associated with both targets.

Let us now turn to the target consonant’s acoustic properties. Differences may arise in that domain between liaison and word-initial consonants, which would make the former less perceptually salient than the latter. Such differences have indeed been found in the vicinity of the consonant in previous work (e.g. Delattre 1940; Dejean de la Bâtie 1993; Gaskell et al. 2002; Spinelli et al. 2003). Thus, Dejean de la Bâtie (1993) found that the duration of the closure and that of the following burst are both shorter for liaison /t/ compared with word-initial /t/. In Gaskell et al.
(2002), the duration of /t/, /r/ and /z/ also proved to be on average slightly but significantly shorter in liaison (73 ms) than in word-initial position (88 ms; consonant duration was taken as the time interval between the offset of the preceding vowel and the onset of the following vowel). A similar durational difference was found between liaison (64 ms) and word-initial consonants (71 ms) by Spinelli et al. (2003), for /p, r, t, n, ɡ/. Note that the shorter duration for liaison consonants reported in the above studies could be due to actual liaison shortening and/or word-initial lengthening (Fougeron 2001). Wauquier-Gravelines carried out a series of acoustic analyses on sentences analogous to those she used as stimuli in the two experiments reported above. For /t/, she found that the closure and burst had a significantly shorter duration in liaison enchaînée (mean overall value: 50 ms) than in word-initial position (70 ms), in keeping with previous findings. For /n/, however, the acoustic duration of the consonant was not found to be statistically different in liaison enchaînée (58 ms) and word-initial position (61 ms). Thus, it seems that variations in duration in liaison vs word-initial position are both subtle and specific to certain consonants (possibly obstruents) only. Although such data suggest that the observed differences in the listener’s responses to the liaison and word-initial consonants are not related to how these two types of
consonant are phonetically realized, this issue will be taken up again in the next section.

Yet another factor that may have contributed to making the liaison consonant less easily detectable than the word-initial consonant relates to the position that these consonants occupied in the carrier word. In the phonological approach espoused by Encrevé (1988), among others, the liaison consonant lexically belongs to Word 1 and occurs in final position in that word. Because a greater perceptual weight is attributed to word onsets compared with word offsets in sequential models of word recognition such as Cohort (Marslen-Wilson & Zwitserlood 1989), it may be speculated that the word-initial consonant was perceptually more prominent than the liaison consonant. Thus, to test the hypothesis that the lower detection rate for the liaison consonant is attributable to syllabic/skeletal flotation, rather than position in the word, it would be necessary to include word-final fixed consonants in the potential targets, and to show that listeners’ responses are more accurate for these consonants than for liaison consonants. In Encrevé’s model, so-called final fixed consonants are characterized by the fact that the corresponding coda constituent on the syllabic tier is floating with respect to the skeleton. This allows the model to account for the enchaînement of final fixed consonants prior to a vowel-initial word. A crucial difference between final fixed and liaison consonants, however, is that only the former are anchored to the skeleton. Wauquier-Gravelines’ material was not designed to undertake systematic comparisons between listeners’ responses to liaison and final fixed consonants. These methodological issues were addressed in the experiment described in the following section.

3. Further evidence on the specific perceptual status of liaison consonants

The goal of this experiment was to confirm and extend Wauquier-Gravelines’ findings in two directions. First, we examined to what extent differences in the detection rate of liaison consonants vs word-initial consonants are attributable to the phonetic properties of these consonants, by systematically manipulating these properties. Second, the potentially distinctive status of liaison consonants compared with fixed consonants in perception was further explored by inserting fixed word-final and word-medial consonants, as well as word-initial ones, in the material.

3.1 Method
3.1.1 Material. The material was made up of twenty sets of four test sentences. These sentences contained a target consonant which appeared in the vicinity of the boundary between two words. The target consonant was /z/ for twelve of the twenty sets and /n/ for the remaining sets. Within each set, the target consonant was located at the onset of Word 2, at the end of Word 1, in word-medial position, and as a liaison consonant at the juncture between Words 1
and 2. As an example, the position of the target /z/ in each of the four sentences for one of the sentence sets is shown in Table 1. The critical words are underlined in the orthographic transcription. A phonemic transcription of these two words is also shown, with the target consonant displayed in bold.

Sentence type   Target position   Example
1               W2-initial        Il y a des zéros /dezero/ partout dans le tableau. “There are zeros everywhere in the table”.
2               W1-final          On a eu seize élèves /sɛzelɛv/ qui ont réussi au bac. “Sixteen pupils of ours have passed the baccalaureate exam”.
3               Word-medial       J’ai rapporté du raisin /dyrɛzɛ̃/ du marché ce matin. “I brought some grapes back from the market this morning”.
4               Liaison           J’ai remis des écrous /dezekru/ en haut du radiateur. “I put some nuts back on top of the radiator”.

Table 1: Position of the target consonant /z/ in each of the four sentences, for one of the twenty sentence sets.
In all cases, liaison consonants appeared in an unmarked context which made their pronunciation obligatory: determiner + noun (e.g. des [z] écrous “nuts”), adjective + noun (e.g. lointain [n] ami “distant friend”), monosyllabic adverb (e.g. très [z] ému “very touched”) or preposition (e.g. en [n] Asie “in Asia”) before another word. In addition, both Type-1 and Type-4 sentences were locally ambiguous as to the morpho-phonological status of the target consonant, i.e. the first part of the sentence, up to the post-consonantal vowel, was in both cases consistent with the consonant being a W2-initial or a liaison consonant. This is true, for example, of the W2-initial [z] in Il y a des [z] zéros “There are zeros” (where the morpho-syntactic and phonological make-up of the first part of the sentence up to the post-consonantal vowel may allow the listener to interpret [z] as a liaison consonant, until the following word is identified) and, reciprocally, of the liaison [z] in J’ai remis des [z] écrous “I put some nuts back” (where the first part of the sentence up to the post-consonantal vowel could lead to [z] being temporarily interpreted as the initial consonant of the upcoming word by the listener). Importantly, for most Type-1 sentences, Word 1 contained a liaison consonant whose realization would be obligatory prior to a word-initial vowel. For example, the liaison consonant /z/ associated with the determiner des in des zéros is obligatorily pronounced when the following word begins with a vowel. [There were only two exceptions to this. In les délégués zaïrois “the Zairian delegates” (plural noun + adj., target cons.: word-initial /z/), the realization of the liaison consonant /z/ at the end of délégués prior to a word-initial vowel is optional. In un bien naturel “a natural resource” (sing. noun + adj., target cons.: word-initial /n/), the realization of a liaison /n/ at the end of bien used as a noun before a word-initial vowel, is excluded. The
corresponding Type-4 sequences are les avis “the notices” (det. + noun, liaison /z/) and bien appris “well learned” (adv. + past participle, liaison /n/), respectively.] Such constructions allowed us to ensure that the listeners could not predict whether the target consonant was a W2-initial or liaison consonant from the preceding words in the sentence. All sentences had about the same number of syllables (mean = 13, s.d. = 1.4) and the rank of the word in which the target consonant appeared was approximately the same across sentences (average rank, from the beginning of the sentence = 4.5 words, s.d. = 1.1). The target-bearing word was as short as possible and contained two syllables on average (s.d. = 0.6) in Type-1 sentences, one syllable (s.d. = 0) in Type-2 sentences, two syllables (s.d. = 0.2) in Type-3 sentences, and one syllable (s.d. = 0.2) in Type-4 sentences. The purpose of using such short words was to minimize the possibility for the target consonant to be anticipated by the listener in Type-2, -3 and -4 sentences. The pre- and post-target vowels were as phonetically similar as possible across the four sentences in each set, differing from each other by at most one distinctive feature (in a standard distinctive-feature system) for most sets. The pre-target vowel itself was preceded by a consonant (e.g. /d/ in des zéros) on which two constraints were imposed for Type-1 and Type-4 sentences. First, consonants appearing in that position in the two sentences had to share as many phonetic properties with each other as possible. Second, whenever possible we used consonants characterized by a well-defined acoustic transition with the following vowel, such as voiceless obstruents. These constraints were motivated by the splicing procedure to which Type-1 and Type-4 sentences were later subjected (see below).
A further phonetic constraint was that the sounds preceding the target consonant were as different from the target as possible, to avoid any perceptual interference (Stemberger, Elman & Haden 1985). In addition, the sentences had similar syntactic structures, and Word 2 was chosen to be as semantically unpredictable as possible from the first part of the sentence (on the basis of the first and second authors’ intuitions as native speakers of French). Finally, we constructed 240 filler sentences (120 without /z/ and 120 without /n/), which were similar to the test sentences with respect to overall length and syntactic structure. Furthermore, some of the words occurring in Word 1 position in Type-1 and Type-4 test sentences also appeared in the filler sentences prior to a word-initial consonant that differed from the target in the test sentences, e.g. des crêpes /dekrɛp/ “pancakes”. This means that these words were not systematically associated with the presence of the target consonant in the material, and that the listeners were thus prevented from developing a response strategy based on learning such an association over the course of the experiment (thus, des was not always followed by /z/, whether as a word-initial or liaison consonant).
3.1.2 Speaker, recording and acoustic labelling. The material was recorded by the first author, whose speech can be characterized as intermediate between Southern and standard French. In particular, this speaker does not pronounce word-final schwas, as is the case in Southern French (see Nguyen & Fagyal 2006, for further details). The recording took place in a sound-proof room using high-quality recording equipment (sampling frequency = 22050 Hz). The speaker first read the list of test sentences five times, then the filler sentences. Both the test and filler sentences were randomized. The speaker’s task was to read the sentences naturally, while maintaining the same rate, rhythm and pitch contour throughout the corpus. The acoustic data were transferred onto a personal computer for further processing. For each test sentence, markers were placed at the acoustic onset and offset of each segment in each V-target C-V sequence. The location of these acoustic boundaries was determined from both the digital speech waveform and a corresponding wideband spectrogram.

3.1.3 Stimuli and experimental design. The initial set of stimuli consisted of the 80 test sentences and 240 filler sentences. For each of the Type-2 and Type-3 sentences, one repetition out of the five available was selected, which we judged as being articulated fluently, clearly, and at a normal rate. In addition, two different versions of Type-1 and Type-4 sentences were created. In the identity-spliced version, the target consonant and preceding vowel originated from another repetition of the same sentence. In the cross-spliced version, the target consonant and preceding vowel came from either the Type-1 or Type-4 corresponding sentence, for Type-4 and Type-1 sentences, respectively.
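The splicing operation described above amounts to replacing the vowel + target consonant portion of a carrier sentence with the same portion taken from a donor recording. The sketch below illustrates the idea on plain sequences of samples. It is a minimal illustration under stated assumptions, not the procedure actually used: the function names, the representation of marker positions as (onset, offset) times in seconds, and the absence of any smoothing or zero-crossing alignment at the splice points are all hypothetical. Only the 22050 Hz sampling frequency comes from the text.

```python
SAMPLE_RATE = 22050  # Hz, the sampling frequency reported for the recordings


def to_sample(time_s, sample_rate=SAMPLE_RATE):
    """Convert a marker time in seconds to a sample index."""
    return round(time_s * sample_rate)


def splice(carrier, donor, carrier_span, donor_span, sample_rate=SAMPLE_RATE):
    """Replace the carrier's V + C portion with the donor's V + C portion.

    carrier, donor: sequences of audio samples.
    carrier_span, donor_span: (onset_s, offset_s) of the vowel + consonant
    sequence in each recording (hypothetical marker format).
    """
    c_on, c_off = (to_sample(t, sample_rate) for t in carrier_span)
    d_on, d_off = (to_sample(t, sample_rate) for t in donor_span)
    return list(carrier[:c_on]) + list(donor[d_on:d_off]) + list(carrier[c_off:])


# Toy example at 10 samples per second so the result is easy to inspect.
carrier = [0] * 10   # stands in for the carrier sentence
donor = [1] * 10     # stands in for the donor repetition or sentence
spliced = splice(carrier, donor, (0.3, 0.6), (0.2, 0.6), sample_rate=10)
print(spliced)
```

In this sketch an identity-spliced stimulus would take the donor from another repetition of the same sentence, and a cross-spliced stimulus would take it from the matching Type-1 or Type-4 sentence; the function itself is the same in both cases.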
To construct the identity- and cross-spliced stimuli, we selected those among the five available repetitions per sentence which allowed the vowel + consonant sequence to be spliced into the carrier sentence with no audible discontinuities across the splicing points. As for Type-2 and Type-3 sentences, fluency, clarity of articulation and rate were also taken into consideration. Although the consonant’s duration and that of the preceding vowel did not significantly differ when the consonant was in W2-initial compared with liaison position (as reported in Section 3.2.1 below), variations related to the consonant’s position may be shown in the vicinity of that consonant by other acoustic parameters. Cross-splicing allowed us to assess the perceptual relevance of such potential acoustic variations. These were expected to result in a lower target detection rate and/or a longer reaction time in the cross-spliced sentences than the identity-spliced sentences, which we used as a baseline condition. The experimental task was a speeded phoneme detection task, with two targets, /n/ and /z/. Thirty-four native speakers of French with no known hearing deficit participated and were partitioned into two main groups. The stimuli were blocked by target, and the order of presentation of the targets was counterbalanced across groups. Test and filler sentences were fully randomized within each block. The two subject groups were further divided into two
subgroups. For each of the Type-1 and Type-4 sentences, one subgroup was presented with the identity-spliced version and the other with the cross-spliced version. Which subgroup heard the identity-spliced vs cross-spliced version systematically changed from one sentence to the next. In this way, each subject heard each sentence only once, either the identity-spliced (for half of the sentences) or the cross-spliced version (for the other half). One of the four subgroups contained ten subjects and the others had eight subjects. The stimuli were played over headphones at a comfortable sound level. Subjects had to press a button on a response box, using their dominant hand, if and as soon as they detected the target in the sentence. Reaction time was measured from the acoustic onset of the target phoneme. The test phase was preceded by a short training phase with ten sentences. The experiment lasted about thirty minutes, and each subject received a small fee for her/his participation.

3.2 Results
3.2.1 Durational measurements. In a first attempt to characterize the acoustic properties potentially associated with the target consonant depending on its position and phonological status, we measured the duration of that consonant, along with that of the preceding vowel. Figure 1 shows the average duration for each segment in each of the four types of sentence. Repeated-measures ANOVAs revealed that duration significantly varied as a function of sentence type for /z/ (F(3,33) = 6.282, p<0.01) and the preceding vowel (F(3,33) = 17.669, p<0.001), as well as for /n/ (F(3,21) = 3.185, p<0.05) and the preceding vowel (F(3,21) = 7.101, p<0.01). Scheffé post-hoc tests showed that the duration of /z/ was significantly longer in W2-initial position than in W1-final position (p<0.01).
In addition, and for both /z/ and /n/ sentences, the preceding vowel’s duration was significantly longer in W1-final than in W2-initial (/z/ sentences: p<0.001; /n/ sentences: p<0.01), word-medial (/z/ sentences: p<0.01; /n/ sentences: p<0.05), and liaison position (/z/ sentences: p<0.001; /n/ sentences: p<0.01). Pairwise comparisons between the mean values associated with the four types of sentence yielded no significant difference for /n/ duration. To summarize, vowels in word-final closed syllables were longer than vowels in other positions and /z/ was longer when it appeared in onset position in word-initial syllables as opposed to coda position in word-final syllables. Importantly, however, the comparison between W2-initial and liaison positions, which formed the main focus of interest in this work, revealed no significant difference in the duration of either the target consonants or the preceding vowel. Note that this is not consistent with the tendency for consonants to be shorter in liaison than in W2-initial position reported previously (Dejean de la Bâtie 1993; Gaskell et al. 2002; Spinelli et al. 2003). This may be due, at least in part, to the phonetic make-up of the material used in each study. Dejean de la Bâtie’s (1993) analyses focused on /t/; the present work examines /z/ and /n/. The two other studies used a variety of target consonants that included /z/ (Gaskell et al. 2002) and /n/ (Spinelli et al. 2003), but it is unclear
to what extent /z/ and /n/ actually contributed to the observed position-dependent differences in duration because the authors only provide mean duration values across all target consonants. A more relevant comparison is with Wauquier-Gravelines (1996), who measured the duration of /n/ in liaison vs W2-initial position, and, as in the present study, found no significant difference between the two.
Figure 1: Average duration of the target consonant and pre-consonantal vowel as a function of consonant position, for /z/ and /n/.
3.2.2 Perceptual data pre-processing. Data from one subject out of the thirty-four were omitted due to the unusually high error rate (61%); data from two other subjects were omitted because their mean reaction times were more than two standard deviations above the overall mean RT. After these exclusions, the four subgroups of subjects contained seven, eight, nine and seven members. For these thirty-one subjects, the proportion of correct detections ranged from 65% to 93% over both targets, and the mean reaction time ranged from 538 ms to 1396 ms. There was a significant negative correlation between percent correct detection and mean RT per subject (R² = 0.36, t(29) = –4.02, p<0.001), i.e. subjects who tended to miss the target more often were also slower to respond when they did detect the target.

3.2.3 Target detection rates. To assess the effect of cross-splicing on phoneme detection, a by-subject repeated-measures ANOVA was carried out, with target identity, splicing type (identity-spliced vs cross-spliced) and position (W2-initial, liaison) as independent variables and percent correct detection as the dependent variable. All of the independent variables were within-group factors. The experimental design allowed us to put these three independent variables together in a by-subject ANOVA but not in a by-item ANOVA. The analysis was restricted to the W2-initial and liaison positions since cross-splicing was performed for these two positions only. Percent correct detection was
submitted to an arcsine transformation prior to the ANOVA. The percent correct detection was significantly higher for /z/ (92%) than for /n/ (70%; F(1,30) = 82.020, p<0.001). In addition, the W2-initial position was associated with a more accurate phoneme detection (92%) than the liaison position (70%; F(1,30) = 46.851, p<0.001). There was a significant interaction between target identity and position (F(1,30) = 15.560, p<0.001), which reflected the fact that the difference in the detection score between the W2-initial and liaison positions was smaller for /z/ (diff. = +11%) than for /n/ (diff. = +32%). However, the percent correct detection was not significantly different for cross-spliced sentences (80%) and identity-spliced sentences (82%; F(1,30) = 0.780, p = 0.384) and no significant interaction was found between splicing and any of the other independent variables. This shows that potential acoustic cues associated with consonant position, in the consonant itself and in the preceding vowel, did not have a measurable influence on the accuracy of the listeners’ responses. In the analyses reported below, percent correct detection scores for identity- and cross-spliced sentences were therefore pooled together. Both by-subject and by-item ANOVAs were performed using target identity and position (W2-initial, W1-final, word-medial and liaison) as independent variables. Average percentages of correct detections for the two target consonants in each of the four positions are shown in Figure 2, along with the corresponding standard deviations. Percent correct detection was found to be significantly higher for /z/ (91%) than for /n/ (73%; by-subject repeated-measures ANOVA: F(1,30) = 85.060, p<0.001; by-item ANOVA: F(1,18) = 81.408, p<0.001). Variations in percent correct detection as a function of target position were also statistically significant (by-subject ANOVA: F(3,90) = 22.970, p<0.001; by-item ANOVA: F(3,54) = 10.947, p<0.001).
A significant interaction between target identity and position was found in the by-subject ANOVA (F(3,90) = 6.994, p<0.001) but not in the by-item ANOVA (F(3,54) = 1.334, p = 0.273). For each target consonant, Scheffé post-hoc comparisons between percent correct detection associated with the four positions were performed. For /z/, percent correct detection was significantly higher in W2-initial vs W1-final position (by-subject analysis, p<0.05), W2-initial vs word-medial position (by-subject analysis, p<0.05), and W2-initial vs liaison position (by-subject analysis: p<0.001; by-item analysis: p<0.05). For /n/, percent correct detection was significantly higher in W2-initial vs liaison position (by-subject analysis: p<0.001; by-item analysis: p<0.01), W1-final vs liaison position (by-subject analysis, p<0.001) and word-final vs liaison position (by-subject analysis, p<0.001). Other pairwise comparisons between positions for each target were not statistically significant. These results replicate Wauquier-Gravelines’ (1996) earlier finding that listeners have greater difficulties detecting liaison consonants than W2-initial consonants. Our data show that this tends to be true to a greater extent for /n/ than for /z/. Moreover, they indicate that, in the case of /n/, liaison consonants were more difficult to detect than W1-final consonants. They
DETECTING LIAISON CONSONANTS IN FRENCH
15
further reveal that the nasal target is intrinsically more difficult to detect than the fricative.
Figure 2: Average percent correct detection, along with the corresponding standard deviation, in each of the four positions, for each target consonant.
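The arcsin transformation applied to the detection scores before the ANOVA can be illustrated with a minimal sketch. The text does not say which variant was used; the arcsine of the square root of the proportion is assumed here because it is the most common form, and the function name `arcsine_transform` is purely illustrative. The transform would normally be applied to each listener's score; the two overall means reported above are used only as example inputs.

```python
import math


def arcsine_transform(proportion: float) -> float:
    """Arcsine square-root transform of a proportion in [0, 1] (result in radians).

    Assumed variant: asin(sqrt(p)). Some authors use 2 * asin(sqrt(p)) instead.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


# Mean detection rates reported in the text, expressed as proportions.
rates = {"/z/": 0.91, "/n/": 0.73}
transformed = {target: arcsine_transform(p) for target, p in rates.items()}

for target, value in transformed.items():
    print(f"{target}: {value:.3f} rad")
```

The transform stretches values near 0 and 1, which helps stabilise the variance of proportion data before an analysis of variance.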
3.2.4 Target detection reaction times. We now turn to the potential influence of cross-splicing on the listeners’ reaction times. Figure 3 shows the average RT associated with correct responses to the identity-spliced and cross-spliced sentences, for each of the two targets in W2-initial and liaison positions, which again were the two positions for which cross-splicing was applied. RT partly mirrored the tendencies observed for phoneme detection in that RT was longer for the nasal than the fricative target, and longer for liaison consonants than W2-initial consonants for the identity-spliced tokens. These trends were confirmed in a by-subject repeated-measures ANOVA (target-identity main effect: F(1,22) = 50.580, p<0.001; position effect: F(1,22) = 4.446, p<0.05). However, the observed RT patterns differed from the phoneme detection patterns with respect to splicing. Specifically, whereas splicing did not interact with any of the other factors for percent correct detection, a significant interaction (F(1,22) = 6.313, p<0.05) between position and splicing was found for RT such that the difference in RT as a function of position in the identity-spliced version disappeared in the cross-spliced version.
Table 2 allows us to compare the mean RT and corresponding standard deviation for each target across the four positions. For the sake of comparison with Type-2 and Type-3 sentences, the values given for Type-1 and Type-4 sentences were computed from the identity-spliced stimuli only. A by-subject, repeated-measures ANOVA showed that RT was significantly longer for /n/ than for /z/ (F(1,26) = 76.429, p<0.001), and varied as a function of target position (F(3,78) = 12.487, p<0.001). In addition, the Target Identity × Target Position interaction proved significant (F(3,78) = 4.110, p<0.01), in keeping with the fact that mean RT varied in different directions depending on position for /z/ and /n/. Note, however, that Target Identity was the only variable whose effect on RT was significant in the corresponding by-item ANOVA (F(1,18) = 140.160, p<0.001).
Figure 3: Average reaction times associated with correct responses to identity-spliced and cross-spliced sentences, for each target consonant in W2-initial and liaison position. The corresponding standard deviations are also shown.

                                  Target identity
Sentence type   Target position   /z/          /n/
1               W2-initial        707 (345)    1040 (425)
2               W1-final          728 (437)    935 (422)
3               Word-medial       714 (434)    954 (442)
4               Liaison           755 (407)    1166 (545)

Table 2: Mean reaction times (in ms) associated with each of the two target consonants in each of the four positions. The corresponding standard deviation is shown in parentheses.
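The position effect in Table 2 is easier to see as differences relative to the liaison position. The sketch below simply re-expresses the means from Table 2; the dictionary layout and the helper name `liaison_cost` are illustrative choices, not part of the original analysis.

```python
# Mean RTs (ms) from Table 2, keyed by target consonant and target position.
mean_rt = {
    "/z/": {"W2-initial": 707, "W1-final": 728, "Word-medial": 714, "Liaison": 755},
    "/n/": {"W2-initial": 1040, "W1-final": 935, "Word-medial": 954, "Liaison": 1166},
}


def liaison_cost(target):
    """Extra mean RT (ms) in liaison position relative to each other position."""
    rts = mean_rt[target]
    return {pos: rts["Liaison"] - rt for pos, rt in rts.items() if pos != "Liaison"}


for target in mean_rt:
    print(target, liaison_cost(target))
```

For both targets the liaison position yields the longest mean RT, and the differences are larger for /n/ than for /z/. These are raw differences between means and say nothing about statistical significance, which is addressed by the ANOVAs and post-hoc tests reported in the text.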
Scheffé post-hoc tests conducted in conjunction with the by-subject analysis indicated that RT was significantly longer for the liaison target compared with the W2-initial (p<0.05) and word-medial (p<0.05) targets for /z/, and with the W1-final (p<0.001) and word-medial (p<0.01) targets for /n/. Other pairwise comparisons between positions for each target did not reach statistical significance.
4. General discussion
To summarize, liaison consonants were found to be more difficult to detect than W2-initial consonants by listeners. Detection scores were lower, and correct responses tended to be slower, for the former than for the latter. Cross-splicing liaison and W2-initial consonants tended to neutralize the difference in reaction times associated with both targets, but had no significant effect on how frequently targets were successfully detected. Proportions of correct responses for W1-final and word-medial targets were halfway between
those for W2-initial and liaison targets. Finally, correct responses were both fewer and slower for the nasal than for the fricative target. The tendency to miss liaison consonants seems to be a robust perceptual phenomenon. In our experiment, it arose even though the position of the target with respect to the beginning of the carrier sentence was roughly the same for all the sentences, which may have made it increasingly easier for listeners to predict that position over the course of the experiment. In addition, failure to detect liaison consonants occurred in spite of the fact that the phoneme detection task should have drawn the listeners’ attention to the phonetic level at the expense of higher-level (lexical, in particular) properties of the stimulus. A main issue in the present experiment was to determine the extent to which differences in detection rate between W2-initial and liaison targets can be accounted for by the targets’ acoustic characteristics. Acoustic analysis revealed no significant variation in the duration of the target consonant, nor in that of the preceding vowel, depending on whether the consonant appeared in W2-initial or liaison position. Further analyses will be needed to determine whether position-dependent variations can be found in the vicinity of the consonant along other acoustic dimensions (e.g. rate of transition at the offset of the preceding vowel and/or into the following one, formant pattern of the preceding vowel). What the listeners’ responses showed, however, was that potential variations in the target consonant and preceding vowel’s acoustic properties, depending on the target’s position, had little or no impact on response accuracy. Failure to detect liaison consonants therefore seems to be attributable to higher-level factors, which may relate to how these consonants are represented as part of the speaker/listener’s linguistic knowledge. In the introduction, two main linguistic theories of liaison were presented. 
In autosegmental theory, the liaison consonant is seen as a highly abstract phonological object whose phonetic realization involves establishing associations between tiers, and is conditioned by a number of syntactic and stylistic constraints. According to exemplar-based theory, liaison is mostly a lexical phenomenon, i.e. it forms one of the elements of frequently co-occurring sequences of morphemes or words, referred to as constructions. Let us consider how failure to detect liaison consonants can be interpreted in light of each of these theoretical viewpoints. As already suggested above, it may be the case that the listeners’ poorer performance in detecting liaison consonants compared with W2-initial consonants stems from the specific phonological status attributed to liaison consonants in French. In the autosegmental model, liaison consonants differ from fixed consonants in that the former are lexically floating with respect to both the skeletal and syllabic tiers. Most consonants are fixed, i.e. have a preestablished link with one of the available slots in the skeleton, and liaison consonants form a much more specific case. In the phoneme-detection task, it is therefore reasonable to assume that listeners expected the target to be, by default, a fixed consonant. This would explain why they showed a tendency to miss liaison consonants, which do not fall into that general category, more
often than W2-initial consonants, and why, when listeners did detect liaisons, it took more time for them to respond. Another noticeable feature in our perceptual data is how listeners responded to W1-final and word-medial target consonants. The detection rate for these targets was found to be lower than that for the W2-initial target for /z/, on the one hand, and higher than the liaison target detection rate for /n/, on the other hand, in the by-subject analyses. Although these trends were not statistically significant in the corresponding by-item analyses, they nevertheless suggest that W1-final and word-medial consonants may form an intermediate case between W2-initial and liaison consonants, as far as the listener’s capacity to detect the presence of the target in the speech chain is concerned. It is particularly interesting to compare listeners’ responses to W1-final consonants and liaisons, because both the W1-final and liaison consonants were systematically enchaînées in our material. The tendency for the detection rate to be higher for W1-final consonants than for liaison consonants may indicate that it is not enchaînement per se, i.e. the anchoring of the consonant to the onset position of the following vowel-initial word, which makes the liaison consonants more difficult to detect. Rather, failure to detect liaison consonants may be specifically due to flotation with respect to the skeleton, a property attributed to liaison consonants only in Encrevé’s model. Studies by Wauquier-Gravelines (1996), Gaskell et al. (2002) and Spinelli et al. (2003) explored the potential impact of liaison and enchaînement in spoken word recognition. These studies showed that liaison (Wauquier-Gravelines 1996; Gaskell et al. 2002; Spinelli et al. 2003) and enchaînement (Gaskell et al. 2002) make it easier for listeners to recognize the following word, compared with a control condition. According to Gaskell et al.
(2002), the facilitative effect of liaison and enchaînement may be caused by lexical knowledge about the offset of the preceding word, combined with sensitivity to the phonological context conditioning the occurrence of liaison and enchaînement. In addition, acoustic cues associated with resyllabification may contribute to facilitate the processing of Word 2 in both liaison and enchaînement conditions. Quite importantly in the context of the present work, these data show that listeners are sensitive to the presence of a liaison consonant in the speech signal, and that this consonant may provide them with early information about the phonological make-up of the upcoming word (which must begin with a null onset for liaison to occur). What our own experimental data suggest is that listeners sometimes fail to identify liaison consonants as phonemic units in an explicit phoneme detection task. In our view, facilitative effects in word identification, on the one hand, and inhibitory effects in phoneme detection, on the other hand, can both be seen as pointing to the specific status liaison consonants have in French phonology. Because they occur at the juncture between two words, and because their realization and syllabification across that juncture are highly context-dependent, liaison consonants may allow listeners to identify the upcoming word more easily while being difficult to map onto ‘ordinary’ phonemic categories.
We now turn to the exemplar-based approach to liaison as proposed by Bybee (2001, 2005). This approach differs radically from the autosegmental account, most notably because the contexts in which liaison appears are assumed to be encoded in memory as a large set of grammatical constructions with different degrees of abstractness and frequencies of occurrence, as opposed to the parsimonious and uniformly abstract representations used in autosegmental theory. Despite these theoretical differences, failure to detect liaison can also be accounted for by the exemplar-based approach. In this approach, liaison consonants are deeply entrenched in specific constructions, and the realization of liaison is highly conditioned by the strength of the associations between words within such constructions. According to Bybee (2001), liaison provides evidence for the existence and nature of storage units beyond the traditional word. The evidence presented so far strongly suggests that frequent fixed phrases are storage and processing units, as are constructions containing grammatical morphemes.
It follows that liaison consonants are processed by listeners as being part and parcel of the constructions in which they appear. As a result, listeners may find it difficult to identify them as context-independent phonemic units, as explicitly required in a phoneme-detection task. In the construction [NOUN -z[vowel]-ADJ]Plural for example, the liaison consonant /z/ is said to be tightly associated with the other elements of which this construction is composed, and it may be difficult for these elements to be abstracted away by listeners. Thus, in spite of the sharp opposition between the exemplar-based and autosegmental models of liaison (constructions being much closer to surface forms than lexical autosegmental representations), both models would seem to be consistent with the fact that detecting liaison consonants in speech is difficult. Crucially, however, it seems to us that in the exemplar-based approach, the difficulties experienced by listeners in the phoneme-detection task should not be specific to liaison and should extend to all the segments a construction may contain. In other words, W2-initial consonants should be as difficult to process as liaison consonants. More generally, the exemplar-based approach does not seem to lead to the prediction that response accuracy in the phoneme-detection task should differ depending on the position of the target in the construction. The lower detection rates observed for liaison targets compared with W2-initial targets therefore seem to provide better evidence for the autosegmental account than for the exemplar-based account. One question we have not addressed yet, and which has important implications for autosegmental and exemplar-based approaches, relates to the potential role of the syntactic status of the carrier word in the detection of liaison.
From that point of view, an interesting parallel may be drawn between failure to detect liaison consonants and a well-established effect in reading, namely the Missing-Letter Effect (MLE). The MLE refers to the fact that letter
detection in connected text is more difficult in frequent function words than in less common words (Healy 1976, 1994; Koriat & Greenberg 1994; Greenberg, Healy, Koriat & Kreiner 2004). For example, readers tend to miss the target letter t more often in “the” than in “weather”. In Healy’s unitization model, the MLE is attributed to the fact that highly common words are associated with whole-word, unitized representations in reading. The fast activation of these representations would prevent lower-level units (e.g. constituent letters) from being fully processed. By contrast, according to Koriat and Greenberg (1994), the MLE reflects the role of function words as cues for sentence structure. Early in text processing, readers focus their attention on function morphemes and use them to establish a structural frame. Subsequently, structural cues recede to the background as attention shifts from structure to content. In our phoneme-detection experiment, liaison targets generally belonged to high-frequency monosyllabic determiners, while most W2-initial targets belonged to nouns. Thus, we need to determine to what extent failure to detect liaison consonants is attributable to the syntactic status of the carrier word, rather than to the phonological status of the consonant. For the liaison /n/, there was some variation in the carrier word’s syntactic category, which may allow us to shed preliminary light on this issue. In addition to including four determiners and one preposition, the eight carrier words also comprised two adjectives and one adverb. We classified these words in two broad categories on the basis of their morphosyntactic properties, namely DET/PREP and ADJ/ADV. A by-subject repeated-measures ANOVA was conducted on the phoneme-detection data with target position (W2-initial, liaison) and syntactic category of the carrier word for liaison (DET/PREP, ADJ/ADV) as independent variables, and percent correct responses as dependent variable.
The results showed that the percentage of correct responses was significantly higher for W2-initial targets (86%) than for liaison targets (56%; F(1,30) = 38.131, p<0.001, arcsin-transformed values) whereas no significant main effect was found for syntactic category. There was, however, a significant interaction between position and syntactic category (F(1,30) = 6.469, p<0.05), which reflected the fact that for the liaison target, the percentage of correct responses was lower for DET/PREP (49%) than for ADJ/ADV (62%). In other words, the subjects tended to miss a liaison target more often when this target occurred at the end of a short function word (DET/PREP) compared with an adjective or adverb. There is, therefore, some evidence pointing towards a link between response accuracy and syntactic status of the carrier word for the liaison target, although it must be noted that position-dependent variations in response accuracy remain highly significant. Such results are at variance with the autosegmental account we have offered for failure to detect liaison, as this account focuses on the phonological properties of the liaison consonant, and assigns no role to the syntactic status and/or frequency of use of the carrier word and its neighbours. By contrast, these results seem to lend support to the exemplar-based model, as they suggest that failure to detect liaison is to some extent dependent upon the strength of the connections between the words at the juncture of which liaison is realized. In the exemplar-based framework, it may be assumed that a determiner + noun sequence such as son hôtel “his hotel” will be more likely to form a single processing unit than an adjective + noun sequence such as un lointain ami “a distant friend”, because of the much higher probability of co-occurrence of the two words in the first sequence than in the second one. As a consequence, the liaison consonant would be more deeply embedded, and therefore more difficult to detect, in a determiner + noun sequence than in an adjective + noun sequence. (Note, in that respect, that liaison is fully obligatory in determiner + noun and preposition + noun sequences, whereas it may not be realized in adjective + noun and adverb + noun sequences.) Response accuracy in liaison detection seems to decrease in carrier word sequences with a higher degree of lexicalisation, as might be predicted by the exemplar-based model.
To conclude, our data indicate that detecting liaison consonants in speech is difficult. These difficulties do not seem to be attributable to acoustic differences these consonants may show with W2-initial consonants, and may reflect the influence of higher-level properties, related to the way in which liaison is represented in the speaker-listener’s grammar. Our results are in part consistent with the hypothesis that liaison consonants are characterized by a highly specific phonological status. However, detection accuracy seems to vary to a certain extent depending on the degree of lexicalisation of the carrier word sequence. Future work, extended to non-obligatory liaisons in word sequences with a low probability of co-occurrence, will be conducted with a view to better establishing which of the phonological and lexical approaches can best account for how liaison is processed by listeners in French.
References
Adda-Decker, Martine, Philippe Boula de Mareüil & Lori Lamel. 1999. “Pronunciation variants in French: Schwa and liaison”. Proceedings of the 15th International Congress of Phonetic Sciences ed. by Maria-Josep Solé, Daniel Recasens & Joaquín Romero. 2239-2242. Barcelona: Causal Productions.
Bybee, Joan. 2001. Phonology and Language Use. Cambridge: Cambridge University Press.
----------. 2005. “La liaison: Effets de fréquence et constructions”. Langages 158. 24-37.
Content, Alain, Ruth Kearns & Uli Frauenfelder. 2001. “Boundaries versus onsets in syllabic segmentation”. Journal of Memory and Language 45. 177-199.
Côté, Marie-Hélène. 2005. “Le statut lexical des consonnes de liaison”. Langages 158. 66-78.
Dejean de la Bâtie, Bernadette. 1993. Word boundary ambiguity in spoken French. PhD diss., Monash University, Victoria, Australia.
Delattre, Pierre. 1940. “Le mot est-il une entité phonétique en français?” Le Français Moderne 8. 47-56.
Encrevé, Pierre. 1988. La liaison avec et sans enchaînement. Paris: Seuil.
---------- & Tobias Scheer. 2005. “L’association n’est pas automatique”. Proceedings of the 7th Annual Meeting of the French Network of Phonology ed. by Noël Nguyen & Mariapaola D’Imperio. 23-24. Aix-en-Provence, France: Laboratoire Parole et Langage, CNRS et Université de Provence.
Fougeron, Cécile. 2001. “Articulatory properties of initial segments in several prosodic constituents in French”. Journal of Phonetics 29. 109-135.
----------, Jean-Philippe Goldman, Alicia Dart, Laurence Guélat & Clémentine Jeager. 2001. “Influence de facteurs stylistiques et lexicaux sur la réalisation de la liaison en français”. TALN 2001 ed. by Denis Maurel, Nathalie Friburger & Béatrice Bouchou. 173-182. Tours: Université François-Rabelais de Tours.
----------, Jean-Philippe Goldman & Uli Frauenfelder. 2001. “Liaison and schwa deletion in French: An effect of lexical frequency and competition?” Proceedings of Eurospeech 2001 ed. by Paul Dasgaard, Børge Lindberg, Henrik Benner & Zheng-hua Tan. 639-642. Aalborg: Aalborg University.
Gaskell, M. Gareth, Elsa Spinelli & Fanny Meunier. 2002. “Perception of resyllabification in French”. Memory & Cognition 30. 798-810.
Greenberg, Seth N., Alice F. Healy, Asher Koriat & Hamutal Kreiner. 2004. “The GO model: A reconsideration of the role of structural units in guiding and organizing text on line”. Psychonomic Bulletin & Review 11. 428-433.
Healy, Alice F. 1976. “Detection errors on the word the: Evidence for reading units larger than letters”. Journal of Experimental Psychology: Human Perception and Performance 2. 235-242.
----------. 1994. “Letter detection: A window to unitization and other cognitive processes in reading text”. Psychonomic Bulletin & Review 1. 333-344.
Klausenburger, Jurgen. 1974. “Rule inversion, opacity, conspiracies: French liaison and elision”. Lingua 34. 167-179.
----------. 1977. “Deletion vs epenthesis: Intra- vs interparadigmatic arguments in linguistics”. Lingua 42. 153-160.
----------. 1984. French Liaison and Linguistic Theory. Stuttgart: Franz Steiner Verlag.
Koriat, Asher & Seth N. Greenberg. 1994. “The extraction of phrase structure during reading: Evidence from letter detection errors”. Psychonomic Bulletin & Review 1. 345-356.
Marslen-Wilson, William & Pienie Zwitserlood. 1989. “Accessing spoken words: The importance of word onsets”. Journal of Experimental Psychology: Human Perception and Performance 15. 576-585.
Nguyen, Noël & Zsuzsanna Fagyal. 2006. “Acoustic aspects of vowel harmony in French”. Accepted with revisions in the Journal of Phonetics.
Sapir, Edward. 1933. “La réalité psychologique des phonèmes”. Journal de Psychologie Normale et Pathologique 30. 247-265.
Schane, Sanford A. 1968. French Phonology and Morphology. Cambridge, Mass.: MIT Press.
Spinelli, Elsa, James M. McQueen & Anne Cutler. 2003. “Processing resyllabified words in French”. Journal of Memory and Language 48. 233-254.
Stemberger, Joseph Paul, Jeffrey Locke Elman & Patricia Haden. 1985. “Interference during phoneme monitoring”. Journal of Experimental Psychology: Human Perception and Performance 11. 475-489.
Tranel, Bernard H. 1995. “Current issues in French phonology: Liaison and position theories”. The Handbook of Phonological Theory ed. by John A. Goldsmith. 798-816. Cambridge, Mass.: Blackwell.
Wauquier-Gravelines, Sophie. 1996. Organisation phonologique et traitement de la parole continue. [Phonological organization and connected-speech processing]. PhD diss., Université Paris 7.
PATTERNS OF VCV COARTICULATORY DIRECTION ACCORDING TO THE DAC MODEL*
DANIEL RECASENS
Universitat Autònoma de Barcelona & Institut d’Estudis Catalans
Abstract This paper provides support for the DAC model of coarticulation by showing that directionality patterns of vowel and consonant coarticulation in VCV sequences are inversely related. Thus, VCV sequences with dark [l] exhibit more vowel-to-consonant anticipation than carryover, due to the salient consonant anticipatory effects blocking vowel carryover, whereas the opposite is the case for the alveolopalatals [] and []. It is shown that differences in articulatory constraint at consonant onset and offset need to be taken into consideration in order to account for the relative salience of the anticipatory and carryover vowel effects in those VCV sequences. Manner of articulation requirements and lingual configuration characteristics determine trends in vowel coarticulatory direction in VCV sequences with relatively unconstrained dentals and alveolars. The implications of these experimental findings for predicting regularities in assimilatory processes and phonemic planning strategies are also addressed.
* This research was funded by projects 2005SGR864 of the Generalitat de Catalunya, and BFF2003-09453-C02-01 of the Ministry of Science and Technology of Spain and FEDER. I would like to thank Maria-Josep Solé, Marija Tabain and two anonymous reviewers for their insightful remarks on previous versions of this paper.
1. Introduction
An understanding of the speech production mechanisms involves uncovering the spatiotemporal patterns of articulatory activity and coordination used by speakers for the realization of strings of phonetic segments. In our view (Recasens, Pallarès & Fontdevila 1997), the degree of articulatory constraint or DAC model of coarticulation contributes to improving our understanding of those mechanisms. The model is based on the notion that different requirements are imposed on the articulatory structures for the production of different vowels and consonants, and that the size, extent and direction of the coarticulatory effects between consecutive phonetic segments are determined by those articulatory requirements. The DAC model of coarticulation has been motivated by the complex lingual coarticulatory interactions observed in speech, and is rooted in earlier work by Öhman and
Bladon and colleagues (Öhman 1966; Bladon & Al-Bamerni 1976; Bladon & Nolan 1977). The present paper explores the predictive power of the DAC model to explain certain aspects of the direction of vowel anticipatory (right-to-left) and vowel carryover (left-to-right) effects at the tongue dorsum in VCV sequences more thoroughly than in previous studies (essentially Recasens et al. 1997 and Recasens 2002). It argues that directionality patterns of vowel coarticulation may be modelled by assuming that the direction of vocalic effects is conditioned by the direction of consonant-dependent effects. The basic hypothesis is that there should be an inverse relationship between the vocalic and consonantal effects at the temporal site where these two coarticulation types conflict with each other, i.e. between vocalic anticipation and consonantal carryover and between vocalic carryover and consonantal anticipation. Thus, the prominence of vowel-dependent anticipatory effects is expected to decrease with an increase in the degree of consonant-dependent carryover coarticulation, while the strength of the vowel-dependent carryover effects ought to vary inversely with the salience of the consonant-dependent anticipatory component. In order to investigate vowel coarticulatory direction in VCV sequences, this paper will evaluate the size of the V-to-C anticipatory and carryover effects for a set of consonants favoring different patterns of C-to-V coarticulatory direction. The implications of the experimental findings for the interpretation of sound change patterns and phonemic planning in speech production will also be addressed.
2. Degrees of articulatory constraint and coarticulatory sensitivity
Within the framework of the DAC model, the degree of coarticulation associated with a specific consonant or vowel depends on its degree of articulatory constraint or DAC value. Highly constrained segments are more resistant to coarticulation and exert more coarticulation effects on adjacent segments than less constrained ones do. Closure or constriction formation renders consonants more constrained and more resistant to coarticulation than vowels, and the degree of articulatory constraint and coarticulatory sensitivity also varies from vowel to vowel and from consonant to consonant depending on the production mechanisms involved. Regarding vowels, the DAC value decreases according to the progression high front (3) > mid front (2) > low back and back rounded (1) > schwa (0). This scale is based on the notion that the degree of articulatory constraint increases with the biomechanical properties involved in displacing the tongue dorsum upwards and frontwards for front vowels and, to a lesser degree, with coupling effects between the tongue dorsum and the postdorsal articulator for back vowels. A minimal DAC value for mid central [ə] is consistent with the absence of a clear articulatory target for this vowel. F2 and dorsopalatal contact coarticulation data in the literature show that the degree of tongue dorsum sensitivity for vowels varies inversely with the DAC values just
referred to, i.e. lowering and backing effects on front vowels are less prominent than raising and fronting effects on back vowels and schwa (Recasens 1999a). As for consonants, the DAC scale decreases according to the progression lingual fricatives [s] and [], the trill [r], dark [l] (4) > alveolopalatals (3) > dentals, alveolars (2) > labials (0). Velar consonants are excluded from this DAC classification because they are not always produced at the same articulatory zone but may be implemented as mediodorso-postpalatal or postdorso-velar depending on the contextual vowel. A minimal DAC value for labials follows from the fact that these consonants involve little or no lingual activation. Dentals and alveolars are less constrained than alveolopalatals since the former are articulated with less raising and involvement of the massive and sluggish tongue dorsum. A high DAC value for lingual fricatives, trills and /l/ is in line with tongue body requirements such as the achievement of a critical constriction and a central groove for [s, ], a posterior alveolar constriction with a lax tongue tip and a lowered and retracted tongue dorsum for [r] (Solé 2002), and a u-like configuration in addition to a front alveolar central closure for [l]. This DAC classification accords with data on V-to-C coarticulation showing more coarticulatory resistance for consonants with DAC values of 3 and 4 than for those with a DAC value of 2, and maximal V-to-C coarticulation for labials (Recasens 1999a). A lower degree of constraint for alveolopalatals such as [] than for [s, , r, l] is consistent with coarticulation data showing that consonants of the former group are overridden by consonants of the latter in clusters (Recasens & Pallarès 2001). Smaller differences in degree of constraint may also hold for consonants which have been assigned the same DAC value in the classification presented above.
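The two DAC scales described above can be written down as simple lookup tables. The following is a minimal sketch: the class labels used as dictionary keys and the helper names `rank_by_resistance` and `more_resistant` are illustrative, and only the numeric values come from the text. Velars are left out, following the classification.

```python
# DAC values for vowel classes as given in the text (higher = more constrained).
VOWEL_DAC = {
    "high front": 3,
    "mid front": 2,
    "low back / back rounded": 1,
    "schwa": 0,
}

# DAC values for consonant classes as given in the text; velars are excluded.
CONSONANT_DAC = {
    "lingual fricatives": 4,
    "trill": 4,
    "dark l": 4,
    "alveolopalatals": 3,
    "dentals": 2,
    "alveolars": 2,
    "labials": 0,
}


def rank_by_resistance(scale):
    """Order classes from highest to lowest DAC value (ties keep listing order)."""
    return sorted(scale, key=scale.get, reverse=True)


def more_resistant(scale, a, b):
    """True if class a has a strictly higher DAC value than class b."""
    return scale[a] > scale[b]


print(rank_by_resistance(VOWEL_DAC))
print(more_resistant(CONSONANT_DAC, "alveolopalatals", "dentals"))
```

Under the model, a higher DAC value goes with more resistance to coarticulation from neighbouring segments and with stronger effects exerted on them, so the ranking doubles as a predicted ordering of coarticulatory resistance. The scale is ordinal; the finer within-class differences discussed in the text are not represented here.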
Among dentals and alveolars, coupling effects between the tongue front and the tongue dorsum may account for why laminals exhibit more tongue dorsum raising and may be more constrained and coarticulation resistant than apicals (Bladon & Nolan 1977). The degree of constraint may also be somewhat higher for clear [l] vs [t, n] and for [] vs [] in line with laterals exhibiting a very anterior constriction and a relatively lowered tongue dorsum position for the passage of airflow through the mouth sides. Moreover, evidence from assimilatory processes and coarticulation in Romance leads us to believe that lingual fricatives and trills may differ in degree of constraint in the progression [r] > [] > [s] and [r] > [l].
3. Coarticulatory direction
The production conditions for the consonants under analysis allow us to formulate several predictions on vowel coarticulatory direction based on our initial hypothesis that consonantal and vocalic effects ought to conflict with each other at a given CV or VC site. As shown in Sections 3.1 and 3.2, the direction of vocalic effects conforms to a clear scenario in the case of VCV sequences with highly constrained consonants since a good deal is known about the articulatory mechanisms involved in their production. The overall
picture is less clear for less constrained dentals and alveolars, which is why coarticulation studies in the literature have found these consonants to allow opposite vowel directionality patterns (Section 3.3).

Consonant  Speaker  iCi   iCa   aCi   aCa   [a] Ant  [a] Carr  [i] Ant  [i] Carr
[ð]        PL       1992  1687  1602  1301  305      390       301      386
           DR       2129  1806  1691  1373  323      438       318      433
           FM       2268  1792  1583  1147  476      685       436      645
           CA       1986  1681  1468  1309  305      518       159      372
           X        2094  1742  1586  1283  352      508       304      459
clear [l]  FM       2400  1596  1552  1181  804      848       371      415
           CA       1990  1528  1450  1251  462      540       199      277
           X        2195  1562  1501  1216  633      694       285      346
dark [l]   PL       1461  1243  1112  1101  218      349       11       142
           DR       1307  1140  1220  1022  167      87        198      118
           JP       1282  1158  1204  1060  124      78        144      98
           JS       1444  1255  1315  1149  189      129       166      106
           DP       1290  1059  1182  992   231      108       190      67
           JC       1455  1283  1397  1128  172      58        269      155
           X        1373  1190  1238  1075  184      135       163      114
[n]        DR       2212  1730  1764  1482  482      448       282      248
           JP       2404  1616  1602  1575  788      802       27       41
           JS       2226  1744  1740  1600  482      486       140      144
           DP       2490  1868  1728  1620  622      762       108      248
           JC       2480  1693  1748  1510  787      732       238      183
           X        2362  1730  1716  1557  632      646       159      173
[s]        DR       2078  1944  1874  1712  134      204       162      232
           JP       1960  1756  1828  1728  204      132       100      28
           JS       1594  1586  1628  1680  8        -34       -52      -94
           DP       2004  1940  1856  1818  64       148       38       122
           JC       2018  1932  1815  1698  86       203       117      234
           X        1931  1832  1800  1727  99       131       73       104

Table 1: F2 vowel coarticulation data (in Hz) for several Catalan and Spanish dental and alveolar consonants. Four left columns: F2 values for [iCi, iCa, aCi, aCa]. Four right columns: anticipatory (Ant) and carryover (Carr) effects from [a] and [i] in asymmetrical VCV sequences. X = mean across speakers.
Patterns of V-to-C coarticulatory direction are explored with data on V-to-C effects from [i] vs [a] in Catalan and Spanish VCV sequences taken from Recasens 1987, Recasens et al. 1997 and Recasens & Pallarès 1999 (see Tables 1 and 2 and Figure 1). They correspond to the following subset of consonants, languages and speakers: [ð] (Catalan PL, DR; Spanish FM, CA), clear [l] (Spanish speakers FM, CA), dark [l] (Catalan PL, DR, JP, JS, DP, JC), [n, s, ʃ, ɲ] (Catalan DR, JP, JS, DP, JC), the tap [ɾ] (Catalan PL, DR, JP, JS, DP, JC), the trill [r] (Catalan PL, DR, JP, JS, DP, JC), and [ʎ] (Catalan PL, DR).

Consonant  Speaker  iCi   iCa   aCi   aCa   [a] Ant  [a] Carr  [i] Ant  [i] Carr
[ɾ]        PL       1826  1833  1511  1358  -7       315       153      475
           DR       2081  1836  1684  1417  245      397       267      419
           JP       1885  1704  1612  1466  181      273       146      238
           JS       2080  1878  1738  1588  202      342       150      290
           DP       2080  1980  1717  1588  100      363       129      392
           JC       2010  1792  1667  1438  218      343       229      354
           X        1994  1837  1655  1476  157      339       179      361
[r]        PL       1611  1414  1321  1244  197      290       77       170
           DR       1476  1379  1447  1270  97       29        177      109
           JP       1248  1214  1168  1164  34       80        4        50
           JS       1663  1698  1544  1474  -35      119       70       224
           DP       1268  1193  1160  1060  75       108       100      133
           JC       1438  1434  1320  1315  4        118       5        119
           X        1451  1389  1327  1255  62       124       72       134
[ʃ]        DR       2288  2032  1968  1768  256      320       200      264
           JP       1972  1932  1898  1764  40       74        134      168
           JS       1480  1598  1648  1582  -118     -168      66       16
           DP       2096  1984  2032  1902  112      64        130      82
           JC       2168  2056  2044  2000  112      124       44       56
           X        2001  1920  1918  1803  80       83        115      117
[ʎ]        PL       2033  1986  1827  1644  47       206       183      342
           DR       2095  2014  1898  1672  81       197       226      342
           X        2064  2000  1863  1658  64       202       205      342
[ɲ]        DR       2282  2284  2180  2160  -2       102       20       124
           JP       2360  2312  2262  2000  48       98        262      312
           JS       2248  2192  2098  2044  56       150       54       148
           DP       2176  2168  2094  2028  8        82        66       140
           JC       2326  2396  2231  2186  -70      95        45       210
           X        2278  2270  2173  2084  8        105       89       187

Table 2: F2 vowel coarticulation data (in Hz) for several Catalan and Spanish alveolar and alveolopalatal consonants. Four left columns: F2 values for [iCi, iCa, aCi, aCa]. Four right columns: anticipatory (Ant) and carryover (Carr) effects from [a] and [i] in asymmetrical VCV sequences. X = mean across speakers.
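One way to read the right-hand columns of the table is to tally, speaker by speaker, which direction yields the larger effect. The Python sketch below does this for the trill [r] values of Table 2. The decision rule (a direction counts as favoured only when it is larger for both [a] and [i]) and all names are assumptions made for this illustration, not a criterion stated in the paper.

```python
from collections import Counter

# Per-speaker effects (Hz) for the trill [r], copied from Table 2:
# (anticipation from [a], carryover from [a],
#  anticipation from [i], carryover from [i]).
TRILL_EFFECTS = {
    "PL": (197, 290, 77, 170),
    "DR": (97, 29, 177, 109),
    "JP": (34, 80, 4, 50),
    "JS": (-35, 119, 70, 224),
    "DP": (75, 108, 100, 133),
    "JC": (4, 118, 5, 119),
}


def favoured_direction(ant_a, carr_a, ant_i, carr_i):
    """Hypothetical rule: a direction is favoured only if its effect is
    larger for both vowels; anything else is labelled 'mixed'."""
    if carr_a > ant_a and carr_i > ant_i:
        return "carryover"
    if ant_a > carr_a and ant_i > carr_i:
        return "anticipation"
    return "mixed"


def tally_directions(effects_by_speaker):
    """Count how many speakers fall into each direction category."""
    return Counter(favoured_direction(*v) for v in effects_by_speaker.values())


trill_tally = tally_directions(TRILL_EFFECTS)
print(dict(trill_tally))
```

Swapping in the rows for another consonant gives the corresponding tally under the same assumed rule.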
Vocalic anticipatory and carryover effects were obtained by subtracting the F2 frequency values at consonant midpoint in asymmetrical and symmetrical VCV sequences. Information about the relative salience of the anticipatory and carryover vowel components was gained through a comparison of F2 differences between [iCa] and [iCi] and between [aCi] and
[iCi] (i.e. between the size of the anticipatory and carryover effects from [a] in the context of [i]; see unfilled and filled bars in Figure 1), and between [aCi] and [aCa] and between [iCa] and [aCa] (i.e. between the size of the anticipatory and carryover effects associated with [i] in the context of [a]; lightly cross-hatched and densely cross-hatched bars).
3.1 Dark [l] and trill [r]
Among dentals and alveolars, highly constrained consonants produced with active dorsum lowering, i.e. dark [l] and the trill [r], ought to exert more prominent C-to-V anticipatory than carryover effects. Salient consonant-dependent anticipatory effects in this case should be associated with an early tongue dorsum lowering and retraction movement in anticipation of the primary tongue tip raising gesture (Sproat & Fujimura 1993). Given that C-to-V effects for [l] and [r] are expected to favor the anticipatory direction, the DAC model predicts that C-to-V anticipation should block vowel carryover effects from V1 = [i] to a larger extent than C-to-V carryover will block vowel anticipation from V2 = [i] (in principle, effects from [a] should be smaller than effects from [i] in line with larger differences in tongue body height and fronting between [i] and [l] than between [a] and [l]).
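The anticipatory and carryover measures against which this prediction is tested are plain F2 differences at consonant midpoint between asymmetrical and symmetrical VCV sequences. The Python sketch below spells out the four subtractions. The function and field names are hypothetical, and the input values are the dark [l] figures for speaker DR in Table 1.

```python
from typing import NamedTuple


class Effects(NamedTuple):
    """Vowel effects on the consonant, in Hz (names are illustrative)."""
    ant_a: float   # anticipation from V2 = [a] in the context of [i]
    carr_a: float  # carryover from V1 = [a] in the context of [i]
    ant_i: float   # anticipation from V2 = [i] in the context of [a]
    carr_i: float  # carryover from V1 = [i] in the context of [a]


def vowel_effects(iCi, iCa, aCi, aCa):
    """Derive the four effects from F2 at consonant midpoint in the
    symmetrical ([iCi], [aCa]) and asymmetrical ([iCa], [aCi]) sequences."""
    return Effects(
        ant_a=iCi - iCa,
        carr_a=iCi - aCi,
        ant_i=aCi - aCa,
        carr_i=iCa - aCa,
    )


# Dark [l], speaker DR (Table 1).
dr_dark_l = vowel_effects(iCi=1307, iCa=1140, aCi=1220, aCa=1022)
print(dr_dark_l)
```

For this speaker both anticipatory values (167 and 198 Hz) exceed the corresponding carryover values (87 and 118 Hz), which is the pattern the prediction above expects for dark [l].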
Figure 1: Size of the anticipatory and carryover effects in F2 frequency exerted by [a] and [i] at consonant midpoint for several Catalan and Spanish consonants. Consonants range from least constrained (on the left) to maximally constrained (on the right). Vertical axis: Hz (0 to 800). Bars: anticipation from [a], carryover from [a], anticipation from [i], carryover from [i].
In agreement with the initial prediction, bars for [l] in Figure 1 reveal the existence of more vowel anticipation than vowel carryover independently of the vowel sequence considered. This trend in coarticulatory direction was found to hold for five out of the six Catalan speakers analyzed (see Table 1). F2 trajectories for [ali] and [ila] in Figure 2 (upper left graph) may help to understand this pattern of vowel coarticulatory direction. The F2 trajectories in this graph and in the other graphs of the figure correspond to single VCV tokens from a single speaker, and are representative of the F2 trajectories for other tokens and speakers. They have been lined up at consonant onset, which is where the long vertical line appears in the graphs. Short vertical lines have been placed at consonant offset. In the graph, the VC transition for [ila] barely penetrates inside the consonant, thus implying that carryover effects exerted by [i] are largely blocked by the salient anticipatory component for [l]. As shown in Table 1, this causes F2 for [l] to be equally low in the sequences [ila] and [ala], i.e. about 1200 Hz. On the other hand, the CV rising transition for [ali] in the figure rises quite rapidly from a low F2 frequency until about 1700 Hz at consonant offset, meaning that anticipatory effects associated with V2 = [i] are allowed to occur already at about consonant midpoint. Indeed, F2 at this temporal point is higher for [ali] than for [ala] (see Table 1). Contrary to our initial expectations, bars for the trill in Figure 1 are somewhat higher for the vowel-dependent carryover effects than for anticipatory effects, and this trend in vowel coarticulatory direction holds for five out of the six speakers analyzed (see Table 2). This vowel coarticulation outcome could be associated with the strict postural and aerodynamic requirements for the trill. 
Thus, while anticipatory tongue dorsum lowering for [r] blocks vowel carryover effects to a large extent (just as for [l]), trilling may cause the tongue body configuration for the consonant to be held for a long time during V2, thus preventing much vowel anticipation from occurring (Recasens & Pallarès 1999).
3.2 Alveolopalatals
The mechanico-inertial properties associated with the tongue dorsum raising and fronting gesture may render C-to-V effects from alveolopalatals more prominent at the carryover than at the anticipatory level. Larger C-to-V carryover than anticipatory effects should result in smaller anticipatory than carryover effects in tongue dorsum lowering as a function of [a] (effects from [i] should be less prominent since this vowel is also a palatal articulation). These coarticulatory patterns are consistent with articulatory differences at closure onset and offset for alveolopalatals: some tongue dorsum coarticulation is allowed to occur at [] onset since the dorsum is less involved in alveolopalatal closure formation at this temporal point, i.e. the consonant is generally postalveolar at closure onset and a full alveolopalatal contact configuration is not achieved until closure midpoint. On the other hand, tongue dorsum coarticulatory effects exerted by V2 are blocked at [] offset because the articulatory movement towards closure release starts at
the tongue front, thus leaving a [j]-like configuration before central contact is released completely. Bars for [ɲ] and [ʎ] in Figure 1 are in agreement with our expectation that vocalic carryover should prevail over vocalic anticipation in VCV sequences with alveolopalatals. Indeed, for both consonants and independently of whether effects are exerted by [i] or by [a], filled bars are higher than unfilled bars and densely cross-hatched bars are higher than lightly cross-hatched ones. This pattern of F2 coarticulatory direction was found to hold for all speakers analysed, five for [ɲ] and two for [ʎ] (see Table 2). A better understanding of trends in vowel coarticulatory direction for VCV sequences with alveolopalatal consonants may be gained from an inspection of F2 trajectories for [iɲa] and [aɲi] in the top right graph of Figure 2. According to this graph, F2 lowering effects from [a] are larger at closure onset than at closure offset and thus occur at the carryover level rather than at the anticipatory level, since they are blocked less by C-to-V anticipation than by C-to-V carryover. A relatively low F2 frequency associated with V1 = [a] in the sequence [aɲi] may be traced all the way from closure onset to about closure midpoint, while F2 lowering associated with V2 = [a] in the sequence [iɲa] does not start until consonant closure has ended. The coarticulatory direction scenario is somewhat different for the alveolopalatal fricative [ʃ]. Bars in Figure 1 show that the fricative cognate [ʃ] favors no specific vowel coarticulatory direction. Indeed, individual speakers may favor either direction or none. As a general rule, F2 values in Table 2 reveal the existence of carryover effects of a similar size (about 100 Hz) from V1 = [a] on [ʃ] and [ɲ]. Some blocking of the vowel carryover effects in VCV sequences with [ʃ] may be attributed to anticipatory manner requirements associated with the formation of a lingual groove for the fricative consonant.
On the other hand, V2 = [a] exerts more anticipation on [ʃ] than on [ɲ]. Indeed, a comparison between the F2 trajectories for [iʃa] and [iɲa] in Figure 2 (middle left and upper right graphs) shows that F2 lowering may begin already at consonant onset for the fricative but not until consonant offset for the nasal. More vowel anticipation on [ʃ] than on [ɲ] appears to result from a higher degree of tongue dorsum constraint for the nasal than for the fricative at consonant offset.
3.3 Other dentals and alveolars
Patterns of coarticulatory direction for relatively unconstrained dentals and alveolars [ð, l, n, s, ɾ] appear to depend to a large extent on the tongue body configuration characteristics. Indeed, while all these consonants share a front raising lingual gesture, they differ as to whether the tongue predorsum is slightly lowered for more apical productions or slightly raised for more laminal productions (Dart 1991). Bars in Figure 1 show a strong trend for dental and alveolar consonants [ð, l, s, ɾ] to favor vowel carryover over vowel anticipation. According to Tables 1 and 2, this directionality pattern is at work for all speakers in the case
of [ð, l, ɾ] (four, two and six, respectively) and for most speakers in the case of [s]. Catalan speakers exhibit no preference for any specific vowel coarticulatory direction in VCV sequences with [n], i.e. they may favor carryover (JP, DP), anticipation (DR, JC) or neither direction (JS).

Figure 2: F2 trajectories for [aCi] and [iCa] sequences sampled every 10 ms for [l] (JP) and [ɲ, ʃ, s, n] (DR) and at equidistant points for [ð] (JM), and lined up at consonant onset. Short vertical marks have been placed at consonant offset. Vertical axis: Hz (1000 to 2500); horizontal axis: time. Panels: [ila]/[ali] (upper left), [iɲa]/[aɲi] (upper right), [iʃa]/[aʃi] (middle left), [iða]/[aði] (middle right), [isa]/[asi] (bottom left), [ina]/[ani] (bottom right).
Inspection of the F2 trajectories for the asymmetrical sequences [iða] and [aði] in the middle right graph of Figure 2 may help to explain why vowel carryover prevails over vowel anticipation in VCV sequences with [ð, l, ɾ]. The fact that F2 lowering for [iða] proceeds earlier than F2 rising for [aði] explains why F2 is higher in the former sequence than in the latter at consonant
midpoint. It may thus be suggested that V1 = [i] prevails over V2 = [a] in the former sequence while V1 = [a] prevails over V2 = [i] in the latter. There may be two complementary reasons for this finding: much of the carryover coarticulation from V1 = [i] in the sequence [iða] may be due to the fact that apical raising and little tongue dorsum involvement for the consonant do not significantly oppose the prominent mechanico-inertial constraints associated with the tongue dorsum gesture for preceding [i]; on the other hand, much of the carryover coarticulation from V1 = [a] in the sequence [aði] is compatible with the presence of some uncontrolled tongue dorsum lowering during the consonant as the primary constriction is being formed. VCV trajectories for clear [l] and [ɾ] agree with those for [ð] since these essentially apical consonants also involve some tongue dorsum lowering for the formation of lateral channels to allow the passage of airflow ([l]) or the execution of a fast tap ([ɾ]). A comparison between the bottom left and middle right graphs of Figure 2 reveals that, while the F2 trajectories for [isa] do not differ substantially from those for [iða], F2 rises to about 1700 Hz already at consonant onset in the case of [asi], quite unlike the case of [aði]. This F2 raising characteristic is also obvious for [asa] (see Table 1), and may be associated with a requirement for the fricative to adopt a relatively high tongue dorsum position. This requirement and strict demands on the formation of a lingual groove at [s] onset may explain why there is less vowel carryover coarticulation in [VsV] than in [VðV] sequences. In order to understand why, unlike for other dentals and alveolars, vowel carryover does not prevail over vowel anticipation in [VnV] sequences, F2 trajectories for [n] ought to be compared with those for [ð] (see bottom and middle right graphs of Figure 2).
On the one hand, slightly greater anticipatory tongue dorsum raising for more laminal [n] than for more apical [ð] might account for why carryover effects exceed anticipatory effects to a larger extent in the sequence [aði] than in the sequence [ani]. On the other hand, a faster F2 lowering trajectory for [ina] than for [iða] results in less carryover coarticulation from V1 = [i] vis-à-vis anticipatory coarticulation from V2 = [a] in the former sequence than in the latter. The reason for the latter outcome is unknown but appears to be related to a trend for Catalan [n] to adopt a relatively low tongue dorsum position (see also F2 values for [ana] in Table 1). In summary, an increase in the degree of tongue dorsum raising for [s] and, to a large extent, for [n] relative to [ð, l, ɾ] appears to cause an increase in the C-to-V anticipatory component and a decrease in the relative prominence of vowel carryover effects in comparison with vowel anticipatory effects.
4. Coarticulatory organization in speech production
A look at the relative weight of the vowel anticipatory and carryover effects for all consonants under analysis in this paper reveals that most consonants favor vocalic carryover over vocalic anticipation (see Figure 1). This is so for most alveolopalatals, all dentals, most alveolars and the trill [r].
On the other hand, dark [l] favors vocalic anticipation, and [n] and [ʃ] favor no particular vowel coarticulatory direction. This finding suggests that the trend for vowel-dependent effects to favor carryover over anticipation reported in several VCV coarticulation studies in the literature is not necessarily language-specific but could be related to universal articulatory requirements on consonant production. A relevant issue is whether, unlike carryover effects, anticipatory effects reflect planning for a given phoneme independently of the articulatory requirements for the preceding phonetic segments. If so, the onset of anticipatory coarticulation ought to be relatively fixed (e.g. it should occur about 200-250 ms before the phoneme in question, according to Fowler & Saltzman 1993) while the offset of carryover coarticulation should take place at different times depending on the mechanico-inertial characteristics of the following phonetic segments. According to a less strict version of the planning hypothesis (Recasens 1989, 2002), VCV anticipatory effects would be more fixed than carryover effects though influenced to some extent by the articulatory requirements for the contextual vowel and consonant. Data on the size of the vocalic effects in Figure 1 allow us to examine whether vowel anticipation is independent from the articulatory properties of the preceding contextual segments. Inspection of the unfilled and lightly cross-hatched bars reveals that anticipatory vowel effects do not exhibit a comparable size for all consonants but are greater for unconstrained consonants than for more constrained ones. Assuming that the size and temporal extent of coarticulation are closely related, it appears that vowel anticipatory effects in VCV sequences start earlier when the intervocalic consonant is unconstrained than when it is highly constrained.
Therefore, it is not the case that vowel coarticulation is planned to start invariably at the same moment in time; instead, the temporal extent of vowel coarticulation depends inversely on the degree of constraint for the intervocalic consonant. In support of this possibility we have also found that anticipatory effects in tongue dorsum movement begin later during the more constrained vowel [i] than during the less constrained vowel [a] (Recasens 2002). This is not to say, however, that anticipatory effects do not involve any planning at all. In comparison to carryover effects, anticipatory effects appear to be more independent from the articulatory requirements for the ongoing contextual segments. Indeed, data on vowel coarticulation in Figure 1 reveal that anticipatory effects (unfilled and lightly cross-hatched bars) are often shorter and less variable than carryover effects (filled and densely cross-hatched bars). The more adaptive nature of carryover effects relative to anticipatory effects is consistent with data showing that the former become particularly long if they result from articulatory overshoot in VCV sequences composed of consecutive alveolopalatal segments, e.g. in the case of [ia], where a high F2 frequency may be carried over from V1 all the way into V2. Additional evidence may be found in dorsopalatal contact data on long-range coarticulatory effects (Recasens 1989). According to this study,
carryover effects as a function of V1 = [i] vs [a] in the sequences [VtCi] and [VtCa] were found to last until C2 if C2 was /t/ but only until V2 = [] midpoint if C2 was [], presumably because anticipatory effects from [] block vowel carryover effects to a larger extent than those from /t/. On the other hand, vowel-dependent anticipatory effects associated with V3 = [i] vs [a] in the sequences [VtV] and [VttV] turned out to start invariably during []. Had V3-dependent anticipatory effects been conditioned by C1, we would have expected them to start earlier in sequences with C1 = [t] than in those with C1 = [ʃ], the reason being that the alveolopalatal fricative is assumed to exert more prominent carryover coarticulation on the following segmental material than the dental stop. The finding of a more fixed window for anticipatory effects than for carryover effects in [VCCV] sequences provides evidence in support of the notion that the former reflect planning while the latter are exclusively conditioned by inertia and the biomechanical requirements for the contextual segments.
5. Sound change
The DAC model claims that segmental assimilations should arise from prominent coarticulatory adaptations. Evidence for a close relationship between the direction of the C-to-V coarticulatory effects and the direction of vowel assimilations and other sound changes induced by consonants has been presented earlier (Recasens 1999b). Thus, front vowel lowering and mid front and low vowel backing and rounding assimilations triggered by dark [l] are regressive but not progressive, while alveolopalatals may favor both progressive and regressive vowel raising assimilations. Assimilatory processes involving non-adjacent segments are harder to predict, which is in agreement with the corresponding coarticulatory effects being less prominent than coarticulatory effects between adjacent segments. In order to study these assimilations and other sound changes, we gathered a large database of lexical variants of Catalan dialects (Eastern and Western Catalan, Valencian, Majorcan) collected by philologists and linguists since the beginning of the 20th century. Inspection of the data collected reveals that the assimilatory processes of interest are partly in agreement with the predicted patterns of coarticulatory direction. In the first place, they are much more frequently regressive than progressive, independently of the articulatory characteristics of the triggering vowel and the intervocalic consonant, which is not in agreement with V-to-V coarticulatory effects being carryover rather than anticipatory (see Section 4). Thus, mid front vowel or schwa raising is triggered by following [i] rather than by preceding [i] (e.g. [tini] instead of [teni] from Latin TENERE “to have”), vowel backing and rounding favors the regressive direction (e.g. [roo], [ruo] from the expected form [reo] RENIONE “kidney”), and vowel lowering may favor either the regressive or progressive directions (e.g.
Western Catalan [piatat] for [pietat] PIETATE “piety”, [astarnua] for [asternua] STERNUTARE “to sneeze”). It is not clear why the direction of
vowel assimilations at a distance should be mostly regressive independently of the direction of the corresponding V-to-V coarticulatory effects. A possible reason is that listeners pay attention to those coarticulatory events which reflect phonemic planning rather than to those that do not. In agreement with predictions of the DAC model of coarticulation, examination of the data reveals that the involvement of specific vowels in V-to-V assimilatory changes may be associated with the degree of articulatory constraint for the intervocalic consonant such that a higher involvement obtains when the consonant is less constrained than when it is more highly constrained. Thus, the frequency of occurrence of specific triggering vowels in V-to-V assimilatory processes happens to be higher in VCV sequences with less constrained dentals, alveolars and labials (between 10% and 20%) than in those with more highly constrained alveolopalatal consonants and the alveolar trill (below 10%). This suggests that, in comparison to more unconstrained consonants, highly constrained consonants may act more independently of possible triggering vowels in the implementation of vowel assimilations. Indeed, alveolopalatals are often not assisted by vowels in the regressive or progressive palatalization of syllable initial [e] (e.g. [mio] MELIORE “better”, Eastern Catalan [ip] Arabic sarab “syrup”, Western Catalan [diat] GEBRATU “frozen”, [dirma] GERMANU “brother”, [sio] SENIORE “sir”). Also, the regressive and progressive lowering of syllable initial [e] by the trill [r] often applies independently of the contextual vowels (Western Catalan [rao] RENIONE “kidney”, [sarmo] SERMONE “sermon”, Valencian [rasina] RESINA “resin”, [rako] RENCORE “resentment”). The Catalan survey also suggests that regressive vowel changes may be facilitated by the simultaneous action of a consonant and a vowel in a CV sequence in a way which is consistent with the DAC model.
Thus, both dark [l] and [] could be responsible for front vowel lowering in Western Catalan variants such as [salnsi] for [silnsi] SILENTIU “silence”. It may be claimed that the simultaneous anticipatory actions of the triggering vowel and consonant reinforce each other, thus rendering the regressive vowel assimilation possible.
6. Conclusions
This paper has shown that patterns of vowel coarticulation in VCV sequences may be accounted for by patterns of coarticulatory direction for the intervocalic consonant. The rationale underlying this behavior is that the coarticulatory effects associated with vowels and consonants in a VCV string conflict with each other such that the extent to which a vowel affects the consonant depends on the prominence of the C-to-V effects at the same temporal site. This scenario clearly holds for consonants produced with much predorsum lowering and tongue dorsum retraction or much tongue predorsum raising and fronting. Indeed, maximal articulatory control at consonant onset for [l] and at consonant offset for [] accounts for why vowel carryover is blocked to a larger extent than vowel anticipation in [VlV] sequences while the opposite holds in [VV] sequences. Other factors related to manner of
articulation and tongue body configuration have been shown to determine the relative contribution of the anticipatory and carryover components in VCV sequences with [] and relatively unconstrained dentals and alveolars. Data on VCV coarticulation for velars and labials reported elsewhere may also be explained by the articulatory constraints on the intervocalic consonant (Recasens et al. 1997; Recasens 1999a): front velars exhibit essentially the same pattern of vowel coarticulatory direction as alveolopalatals (i.e. they also exert C-to-V carryover rather than anticipation), presumably since they are also articulated at the palatal zone; back velars favor vowel anticipation rather than vowel carryover, perhaps in line with the presence of a continuous tongue dorsum forward movement during closure; and bilabials favor vowel carryover or vowel anticipation depending on factors such as tongue dorsum position and involvement for the consonant, speech rate and vowel articulation. The present study has also provided evidence in support of the predictive power of the DAC model of coarticulation for the interpretation of assimilatory changes and the role of planned vs mechanical factors in speech production.
References
Bladon, Anthony & Ameen Al-Bamerni. 1976. “Coarticulation resistance in English [l]”. Journal of Phonetics 4. 137-150.
---------- & Francis J. Nolan. 1977. “A video-fluorographic investigation of tip and blade alveolars in English”. Journal of Phonetics 5. 185-193.
Dart, Sarah. 1991. “Articulatory and acoustic properties of apical and laminal articulations”. UCLA Working Papers in Linguistics 79. 1-155.
Fowler, Carol A. & Elliot Saltzman. 1993. “Coordination and coarticulation in speech production”. Language and Speech 36. 171-195.
Öhman, Sven. 1966. “Coarticulation in VCV utterances: Spectrographic measurements”. Journal of the Acoustical Society of America 39. 151-168.
Recasens, Daniel. 1987. “An acoustic analysis of V-to-C and V-to-V coarticulatory effects in Catalan and Spanish VCV sequences”. Journal of Phonetics 15. 299-312.
----------. 1989. “Long range coarticulatory effects in VCVCVC sequences”. Speech Communication 8. 293-307.
----------. 1999a. “Lingual coarticulation”. Coarticulation ed. by William J. Hardcastle & Nigel Hewlett. 80-104. Cambridge: Cambridge University Press.
----------. 1999b. “Predicting directionality patterns in assimilatory and epenthetic processes from patterns of coarticulatory directionality”. Proceedings of the 14th International Congress of Phonetic Sciences ed. by John J. Ohala, Yoko Hasegawa, Manjari Ohala, Daniel Granville & Ashlee C. Bailey, vol. 3, 1847-1850. Berkeley: Linguistics Department, University of California.
----------. 2002. “An EMA study of VCV coarticulatory direction”. Journal of the Acoustical Society of America 111. 2828-2841.
----------, Maria Dolors Pallarès & Jordi Fontdevila. 1997. “A model of lingual coarticulation based on articulatory constraints”. Journal of the Acoustical Society of America 102. 544-561.
---------- & Maria Dolors Pallarès. 1999. “A study of /ɾ/ and [r] in the light of the ‘DAC’ coarticulation model”. Journal of Phonetics 27. 143-169.
---------- & Maria Dolors Pallarès. 2001. “Coarticulation, blending and assimilation in Catalan consonant clusters”. Journal of Phonetics 29. 273-301.
Solé, Maria-Josep. 2002. “Assimilatory processes and aerodynamic factors”. Laboratory Phonology 7 ed. by Carlos Gussenhoven & Natasha Warner. 351-386. New York: Mouton de Gruyter.
Sproat, Richard & Osamu Fujimura. 1993. “Allophonic variation in English [l] and its implications for phonetic implementation”. Journal of Phonetics 21. 291-311.
THE STABILITY OF PHONOLOGICAL FEATURES WITHIN AND ACROSS SEGMENTS: THE EFFECT OF NASALIZATION ON FRICATION*
MARIA-JOSEP SOLÉ Universitat Autònoma de Barcelona
Abstract
This paper argues that the articulatory-acoustic stability of phonological features may be affected not only by concurrent features, but also by features in adjacent segments which may coincide in time due to coarticulatory overlap. Specifically, the paper illustrates how frication may be endangered by concurrent and coarticulatory nasality. We review aerodynamic and acoustic evidence showing that fricatives tend to be impaired and become unstable with co-occurring nasalization. Then we examine the stability of fricatives when they come in contact with nasality in adjacent segments. An experiment is described where aerodynamic and acoustic data were obtained for fricative + nasal sequences at slow and fast rates. The results show that anticipatory velopharyngeal opening during the acoustic duration of the fricative vents the high oral pressure required for audible frication, thus providing support for the claim that the same physical principles disfavoring the combination of frication and nasality within a segment are at play when these features combine across segments. It is argued that the instability of frication when combined with nasalization may be at the origin of a number of phonological patterns.
* Work supported by grants HUM2005-02746, BFF2003-09453-C02-C01 from the Ministry of Science and Technology, Spain, and by the research group 2005SGR864 of the Catalan Government. The insightful suggestions and comments of Daniel Recasens and an anonymous reviewer are gratefully acknowledged.
1. Introduction
It is known that the articulatory-acoustic stability of phonological features may be endangered by their combination with other features within segments. In this paper we suggest that the stability of features may be affected not only by concurrent features, but also by features in adjacent segments which may coincide in time due to coarticulatory overlap. Specifically, we examine the stability of fricatives when they combine with nasality within a segment and when they come in contact with nasality in adjacent segments with varying degrees of coarticulatory overlap. In this paper we focus on the phonetic grounding of the combination of features within a segment and across segments, and the implications for
phonological patterns. Much work is available on the physical, physiological and auditory principles that account for the combination of features within a segment, and characterize feature co-occurrence restrictions, phonological universals (i.e. why some feature combinations are universally preferred) and system gaps (i.e. why certain feature combinations do not occur) (e.g. Ohala 1983; Solé 2002a; Westbury & Keating 1986). How speech features affect each other across segments is at the origin of restrictions on the sequencing of sounds, the likelihood that segments follow one another, and feature spreading or blocking in consonant/vowel harmony. The interaction between features in contiguous segments, however, does not end there. Sound change and phonological processes may also result from the way in which features combine across segments. After all, segments change in certain segmental contexts, due to interaction with features in neighbouring segments, but not in others. Work on how features combine across segments has thus far focused on perceptual and physiological constraints. For example, Kawasaki (1986, 1992) addressed the perceptual discriminability of sound combinations. Coarticulation theories have explored the contextual restrictions on the temporal extension and magnitude of nasality, labiality, laryngeal and lingual gestures for consonants and vowels (e.g. Hardcastle & Hewlett 1999; Huffman & Krakow 1993). In contrast, the role of aerodynamic factors in the combination of features across segments has been little addressed (but see Ohala 1981, 1997a; Solé 2002b). Furthermore, not much emphasis has been placed on the relationship between how features combine across segments and how they influence each other when they co-occur within a segment. This paper goes beyond previous work in suggesting that aerodynamic factors may be at the origin of the incompatibility of features and phonological patterning. 
In addition, it is suggested that the physical and physiological principles that account for the paradigmatic arrangement of features can also explain their syntagmatic arrangement. We will explore how the stability of features may be endangered by their combination with other features within segments and by features in contiguous segments due to coarticulatory effects. We hypothesize that the physical and physiological constraints which are at work within a segment will also play a role in the combination of features across segments. If this is the case, it follows that features that do not combine well within a segment, due to physical and physiological constraints, are not likely to combine in adjacent segments since the same constraints will apply when the two features overlap in time. Before addressing our hypothesis we need to consider the theoretical framework underlying the stability of features. According to Stevens’ (1972, 1989) quantal theory, gradual and continuous articulatory movements and aerodynamic variation may have a categorical acoustic-auditory result; that is, some variations along the continuum involve abrupt acoustic changes whereas certain others do not. Features that are used in speech are those that fall in stable regions in the acoustic/auditory space to allow for articulatory, contextual, rate and prosodic variation while preserving a robust acoustic
THE STABILITY OF PHONOLOGICAL FEATURES
result. The quantal nature of speech has been illustrated by varying features in isolation along the articulatory (or acoustic) dimension and observing the acoustic/auditory result of such variations (e.g. laryngeal adduction and presence of voicing; movement of the velum and percept of nasalization). The range of allowable articulatory/aerodynamic variation within which the percept of the feature is not affected will define the stability of the articulatory-acoustic correlation. Elaborating on this view, we may consider that this stable range may vary, i.e. may be expanded, reduced, or shifted, with co-occurring features in the same segment and, as claimed in this paper, in adjacent segments. Combinations of features which result in tightly constrained articulatory or aerodynamic requirements are unstable because they allow a narrow range of variation, that is, they may easily fall into a different category with small variations in the articulatory/aerodynamic parameters. Such unstable combinations may easily change into a different percept and will tend to be disfavoured (as shown, for example, by gaps in segment inventories, and a lower lexical frequency of certain segment types). A classic example is the difficulty in maintaining the co-occurrence of voicing and obstruency. The partial or full blockage of the air exiting the oral cavity for obstruents leads to a rapid increase of oropharyngeal air pressure—required to generate turbulence for fricatives and to create an audible burst for stops—but tends to impair the transglottal flow required for voicing within a few tens of milliseconds. Unless the obstruent constriction is kept very short or the oral cavity is enlarged to accommodate more air and thus prolong voicing—both maneuvers to the detriment of a high pressure build-up for obstruency—voiced obstruents will tend to devoice. Thus, voiced obstruents require very finely tuned aerodynamic conditions in order to maintain voicing and obstruency (Ohala 1983). 
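The devoicing argument above can be made concrete with a toy simulation. The sketch below is only an illustration under stated assumptions, not a model taken from the sources cited: it treats the glottis as a simple orifice feeding a fully closed oral cavity, and every numeric value (subglottal pressure, effective glottal area, cavity compliance, the minimum transglottal pressure drop needed for voicing) is a hypothetical placeholder chosen to show the qualitative behaviour.

```python
import math

RHO = 1.14e-3            # assumed density of warm, moist air, g/cm^3
DYN_PER_CMH2O = 980.665  # dyn/cm^2 in 1 cm H2O


def voicing_duration(ps=8.0, glottal_area_cm2=0.05, compliance=0.5,
                     min_drop=2.0, dt=1e-4, t_max=1.0):
    """Seconds of voicing after the oral cavity is sealed (toy model).

    ps          subglottal pressure in cm H2O (assumed constant)
    compliance  cm^3 of extra air the cavity accepts per cm H2O (hypothetical)
    min_drop    transglottal pressure drop below which voicing stops (assumed)
    """
    po = 0.0  # oral pressure, cm H2O
    t = 0.0
    while ps - po >= min_drop and t < t_max:
        # Orifice equation: particle velocity from the transglottal drop.
        velocity = math.sqrt(2.0 * (ps - po) * DYN_PER_CMH2O / RHO)  # cm/s
        flow = glottal_area_cm2 * velocity                           # cm^3/s
        po += flow / compliance * dt  # Euler step: oral pressure builds up
        t += dt
    return t


if __name__ == "__main__":
    for c in (0.5, 1.0):
        ms = 1000.0 * voicing_duration(compliance=c)
        print(f"compliance={c:.1f} cm^3/cm H2O -> voicing lasts about {ms:.0f} ms")
```

With these placeholder values voicing dies out after some tens of milliseconds, and doubling the compliance (a crude stand-in for oral cavity enlargement) prolongs it. Both effects come at the cost of a slower pressure build-up, which is the trade-off described in the text.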
Similarly, combinations of features which result in a poor acoustic signal, for example voiceless nasals (as the low frequency amplitude modulation for nasals is impaired by voicelessness), are auditorily unstable (as measured from confusion studies) and will tend not to be used. In order to describe the interactions between features it is necessary to vary the parameters—physiological, aerodynamic or acoustic—that characterize such features not only singly but in combination (e.g. changes in oral pressure and duration of the obstruent constriction in the first example (Westbury & Keating 1986), or changes in glottal excitation for nasals and non-nasals, or during different degrees of velopharyngeal opening for the nasal, in the second example). We can then identify a set of categorial values along these parameters which remain stable with variations in the other parameters. These categories or optimal settings across the different parameters, if wide enough in range, are the more likely combinations of features into segments. The claim made in this paper is that the physical and auditory principles that account for how features interact within segments may also account for the interaction of features in contiguous segments. It is known that when two
segments are in contact their articulations necessarily overlap. Variations in the shape, position and temporal coordination of the articulators due to coarticulation with neighbouring segments may cause modifications of the articulatory trajectories or the aerodynamic conditions in the vocal tract that can, in turn, affect the acoustic and auditory result. Thus, the execution of the articulatory gestures in implementing a particular feature will vary considerably with the features in contiguous segments. As described earlier, voicing is difficult to maintain during an obstruent; however, if an obstruent is preceded by a nasal, voicing during the obstruent is facilitated. In nasal + obstruent sequences, voicing continuation into the obstruent is facilitated by nasal leakage before full velic closure is achieved and, after velic closure, by the velum continuing to rise toward the high position for obstruents, thus expanding the volume of the oral cavity. Both mechanisms, nasal leakage and oral cavity expansion, lower the oropharyngeal pressure which accumulates in the oral cavity and thus prolong transglottal flow for voicing (Hayes & Stivers 1996). Such phonetic effects have phonological significance in languages with a phonological post-nasal voicing rule, in phonotactic patterns, and in sound change. Thus, aerodynamic principles in the maintenance of voicing within segments may account for the extension (or cessation) of voicing when segments are combined. In order to test the hypothesis that the factors governing the interaction between features in the same sound may also govern the interaction of features across segments when they overlap due to coarticulation, we designed a series of experiments to explore the impact of co-occurring and coarticulatory velopharyngeal opening for nasality on the stability of segments requiring a high pressure build-up in the oral cavity, such as fricatives. 
The results may throw light on why some feature combinations fail to occur, e.g. nasal fricatives, and why certain segment combinations, e.g. fricatives followed by nasals, tend to change. We report on research in which the aerodynamic conditions in these segment types were varied (1) by venting the oral pressure with a pseudo-velopharyngeal valve, and (2) by increasing speaking rate and thus articulatory overlap of velopharyngeal opening on the pressure build-up for the fricative. The remainder of this paper is structured as follows. In Section 2 we review the aerodynamic and perceptual effects of combining frication and nasalization within a segment. In Section 3 we address the conflicting requirements of frication and nasalization when they occur in contiguous segments and we review how fricatives tend to lose their friction preceding nasals historically and synchronically. In Section 4 we provide aerodynamic and acoustic evidence that in fricative + nasal sequences anticipatory velum lowering during the acoustic duration of the fricative reduces or extinguishes the pressure difference required for frication. In Sections 5 and 6 we argue that the principles that explain why the features [nasal] and [fricative] do not combine within a segment also explain why they do not combine across segments.
2.
Co-occurring features: Nasality and frication Languages of the world have nasal stops, nasal taps, nasal approximants, nasal glides and nasal vowels but no nasal fricatives (Ohala 1975; Ohala & Ohala 1993). Segments reported as nasalized fricatives (e.g. in Umbundu (Schadeberg 1982), Coatzospan Mixtec (Gerfen 1999), and Waffa (Stringer & Hotz 1973)) are more adequately described as frictionless continuants due to the lack of high frequency aperiodic noise (Ohala & Ohala 1993; Ohala, Solé & Ying 1998). The reported nasalized fricatives in Bantu languages, Kwa languages, and Igbo appear to involve sequencing of the nasal and the fricative configuration and are, in fact, better described as prenasalized fricatives (Welmers 1973:70-73). Formal phonology attributes the lack of nasal fricatives to antagonistic constraints: “The rarity of such segments [nasalized liquids, glides and fricatives] can be attributed to an antagonistic constraint NAS/CONT: A nasal must not be continuant” (Pulleyblank 1997:76); or to a constraint hierarchy ranking segments according to their incompatibility with nasalization: “*NASOBSTRUENTSTOP >> *NASFRICATIVE >> *NASLIQUID >> *NASGLIDE >> *NASVOWEL, where the less compatible a segment is with nasality, the higher-ranked its constraint” (Walker 2000). The phonetic grounding for such antagonistic constraints or constraint hierarchy is provided by work by Ohala and coworkers. Ohala and Ohala (1993) provide an explanation based on aerodynamic principles. They suggest that obstruents require a build-up of oral pressure behind the constriction in order to create audible turbulence. An open velopharyngeal port for nasality would vent the airflow through the nasal cavity, thus reducing or eliminating the required pressure difference across the oral constriction for frication. Ohala, Solé and Ying (1998) further explored this explanation by examining the aerodynamic and acoustic effect on fricatives of concurrent nasality and its perceptual result. 
They vented oro-pharyngeal pressure (Po) with a pseudo-velopharyngeal valve (i.e. catheters of varying cross-sectional areas: 7.9, 17.8, 31.7, and 49.5 mm2; all 25 cm long, inserted into the mouth via the buccal sulcus and the gap behind the back molars), and quantified how much velopharyngeal opening for nasality was allowed before frication would be extinguished. The results of Ohala et al. (1998) show that in producing a fricative there can be some opening of the velic valve, but the impedance of this valve has to be high relative to that in the oral constriction so that the air will escape through the aperture with lower impedance and create friction at the oral constriction. Thus venting fricatives with catheters of higher impedance (7.9 mm2-area catheter) than that at the oral constriction did not affect the quality of the fricative; it only slightly attenuated the fricative noise. Catheters with impedance values similar (17.8 mm2 area) to those at the oral constriction had noticeable effects on fricatives: they lost much of their high-frequency aperiodic energy (e.g. the spectral peak at 6 kHz for [s] disappeared and the energy level dropped 20 dB). The effect was most dramatic for voiced
fricatives, which became frictionless continuants. Sibilant fricatives sounded non-sibilant. Larger area catheters (≥31.7 mm2), with a lower impedance than that in the vocal tract, extinguished frication, since airflow exited through the aperture with lower impedance, thus reducing the required pressure drop across the oral constriction to generate turbulence. Voiceless fricatives only retained the glottal friction. Voiced fricatives were more seriously affected than voiceless fricatives, becoming vowel-like. The results show that if impedance at the velopharyngeal port is lower than that at the oral constriction the air will escape through the nose (i.e. the fricative will be nasalized), but supraglottal frication will be impaired. Velic openings which do not impair frication (<17.8 mm2) would be insufficient to create the percept of nasalization in the fricative or even adjacent vowels. A greater coupling between the oral and the nasal cavity is required for vowels and sonorants to be perceived as nasalized. Data by other investigators suggest that a velo-pharyngeal opening of 36 mm2 (Whalen & Beddor 1989), 40 mm2 (Maeda 1993) or more is needed to create a robust percept of nasalization in vowels 1 . Consequently, if impedance at the velopharyngeal port is high enough not to affect the fricative quality, the fricative will not sound nasalized. In summary, to the extent that a fricative is a good fricative perceptually, it cannot be nasalized (without added biomechanical cost, e.g. increased subglottal pressure). In other words, along the independent physical parameters of frication and nasalization there are categorial values which show stable perceptual properties, i.e. a certain range within the continuum where a reliable identification of frication (or nasalization), say 80%, can be obtained. 
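The venting results summarized above follow from how air divides between two outlets. As a rough sketch, assuming that the glottis, the oral constriction and the vent all behave as simple orifices (pressure drop proportional to the square of flow over area), the glottis sits in series with the two parallel outlets, which gives Po = Ps · Ag² / (Ag² + (Ao + Av)²). The subglottal pressure, glottal areas and oral constriction area below are hypothetical illustration values, not measurements from Ohala et al. (1998). Only the vent areas are the catheter areas quoted in the text, and a 25 cm catheter has a higher impedance than an orifice of the same area, so the sketch overstates the venting effect of each catheter.

```python
PS_CMH2O = 8.0                                  # assumed subglottal pressure
ORAL_MM2 = 10.0                                 # hypothetical oral constriction area
GLOTTIS_MM2 = {"voiceless": 20.0, "voiced": 8.0}  # hypothetical effective areas
VENTS_MM2 = [0.0, 7.9, 17.8, 31.7, 49.5]        # 0 = no vent; others from the text


def oral_pressure(ps, glottal_mm2, oral_mm2, vent_mm2):
    """Oral pressure behind the constriction, in the units of ps.

    Series/parallel orifice model: Po / (Ps - Po) = Ag^2 / (Ao + Av)^2.
    """
    outlet = oral_mm2 + vent_mm2
    return ps * glottal_mm2 ** 2 / (glottal_mm2 ** 2 + outlet ** 2)


def oral_flow_share(oral_mm2, vent_mm2):
    """Fraction of the total flow that still exits through the oral constriction."""
    return oral_mm2 / (oral_mm2 + vent_mm2)


if __name__ == "__main__":
    for label, glottal in GLOTTIS_MM2.items():
        for vent in VENTS_MM2:
            po = oral_pressure(PS_CMH2O, glottal, ORAL_MM2, vent)
            share = oral_flow_share(ORAL_MM2, vent)
            print(f"{label:9s} vent={vent:5.1f} mm2  "
                  f"Po={po:4.2f} cm H2O  oral share={share:4.2f}")
```

In this toy setting the oral pressure falls monotonically as the vent widens, and the smaller effective glottal area assumed for voiced fricatives leaves them with less oral pressure at every vent size, in line with the greater vulnerability of voiced fricatives reported above.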
The solid line in Figure 1 illustrates that, for a sufficient rate of flow across the oral constriction, a stable percept of a fricative can be obtained with some velopharyngeal opening, but if the velic opening is larger than approximately 18 mm2, as shown by Ohala et al. (1998), the resulting sound will not be heard as having friction. The dashed line shows that the same sound will not be heard as nasalized unless the velic opening is at least 40 mm2, as found by Whalen and Beddor (1989), and Maeda (1993). The two ranges of reliability for frication and nasality, however, do not overlap, i.e. there is not a range of values for both frication and nasalization where we may get 80% identification for both features. Shosted (2006) obtained similar results to those of Ohala et al. (1998) with a mechanical model of the vocal tract with which he generated fricatives with different degrees of velopharyngeal opening. He found that nasalization on fricatives reduced the intensity of friction and increased the bandwidth of spectral peaks, thus changing the percept of fricatives. The difficulty in achieving a stable percept of concurrent nasalization and frication has been noted by other investigators. For example, research on a variety of languages has shown that nasalized voiced fricatives produced with perceptible nasalization tend to lose audible frication and become approximants (e.g. in Guarani, Gregores & Suarez 1967). In contrast, nasalized voiceless fricatives with audible frication do not differ much auditorily from non-nasalized fricatives, that is, the acoustic cues for nasalization are hardly detectable (Cohn 1993; Ladefoged & Maddieson 1996:132).
1
Hajek and Watson (1998) found that a velopharyngeal opening of 16.8 mm2 was sufficient to give a strong nasal percept in vowels. Even with such a small magnitude of velum opening our argument still holds. In Figure 1, right panel, the identification curve would shift towards lower values (i.e. 80% identification of a nasalized segment would be obtained at approximately 20 mm2), but the two curves would still not overlap.
[Figure 1 plot. Vertical axis: % identification (0 to 100). Horizontal axis: velo-pharyngeal opening in mm2 (0 to 70). Curves: fricative (solid line), nasalized (dashed line).]
Figure 1: Diagrammatic representation of the reported percentage of identification of a fricative and a nasal for continuous variation in velic opening for a given flow rate. See text.
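The non-overlap of the two reliability ranges in Figure 1 can be checked numerically. The sketch below is a schematic reading of the figure, not the authors' data. It assumes logistic identification curves pinned so that each reaches 80% at the limits quoted in the text (about 18 mm2 for frication, about 40 mm2 for nasalization, and about 20 mm2 for the shifted nasalization curve mentioned in footnote 1). The logistic shape and the slope value are hypothetical.

```python
import math

LOG4 = math.log(4.0)  # offset that places a logistic curve at 80% at its limit


def fricative_id(area_mm2, limit_mm2=18.0, slope=0.5):
    """Assumed share of 'fricative' responses; falls as the velic opening grows."""
    return 1.0 / (1.0 + math.exp(slope * (area_mm2 - limit_mm2) - LOG4))


def nasalized_id(area_mm2, limit_mm2=40.0, slope=0.5):
    """Assumed share of 'nasalized' responses; rises as the velic opening grows."""
    return 1.0 / (1.0 + math.exp(-slope * (area_mm2 - limit_mm2) - LOG4))


def reliable_range(curve, areas, criterion=0.8):
    """Velic openings at which identification meets the criterion."""
    # Small tolerance so the limit itself is not lost to rounding error.
    return {a for a in areas if curve(a) >= criterion - 1e-9}


AREAS = range(0, 71)  # velic opening in mm2, as on the horizontal axis
fric_ok = reliable_range(fricative_id, AREAS)
nasal_ok = reliable_range(nasalized_id, AREAS)
nasal_ok_low = reliable_range(lambda a: nasalized_id(a, limit_mm2=20.0), AREAS)

if __name__ == "__main__":
    print("frication reliable up to", max(fric_ok), "mm2")
    print("nasalization reliable from", min(nasal_ok), "mm2")
    print("shared openings:", sorted(fric_ok & nasal_ok))
    print("shared openings with the 20 mm2 limit:", sorted(fric_ok & nasal_ok_low))
```

Under these assumptions no velic opening satisfies both criteria, and that stays true when the nasalization limit is lowered to 20 mm2.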
In the next section we address whether the weakening or loss of fricatives before nasals can be explained by the same principles.
3.
Contiguous frication and nasality Related to the difficulty of combining velopharyngeal opening with frication within a segment is the precise timing of gestures in nasal + fricative and fricative + nasal sequences if both segments are to be preserved. The antagonistic requirements of turbulence generation (i.e. a tightly closed velum to allow turbulent airflow in the vocal tract) and nasal coupling (i.e. a lowered velum) in contiguous fricatives and nasals severely constrain the timing of velic movements. The relative synchronization of articulatory movements in nasal + fricative sequences has been investigated by Ali, Daniloff and Hammarberg (1979), Ohala and Busà (1995), Ohala (1997b) and Busà (in press) amongst others. Historically, such sequences may result in (i) nasal consonant loss, with associated nasalization and lengthening of the preceding vowel (e.g. Latin institutione > Italian istituto; PGmc *fimf > Old English five), and (ii) an epenthetic stop (e.g. English Hampstead, Hampshire < O.E. ham + stede, scir). Interestingly, one or the other outcome has been related to specific coarticulatory patterns (Busà in press). Ohala and Busà (1995) have argued that the first outcome, nasal consonant loss, is due to anticipatory vowel nasalization resulting from coarticulatory lowering of the velum for the upcoming nasal consonant and to associated perceptual factors. They provide perceptual evidence that in these sequences the listener attributes the acoustic
effects of velum lowering for the nasal (attenuated amplitude and increased bandwidth of F1 in adjacent vowels) to the similar effects of a wide glottal opening required for high airflow segments, such as /s/, on neighboring vowels, thus discounting the nasal consonant. The second outcome, epenthetic stops, reflects anticipatory raising of the velum (and anticipatory glottal abduction) during the oral constriction for the nasal, which ensures sufficient time and rate of flow to build up pressure for the fricative (Ali et al. 1979). Reverse fricative + nasal sequences, although not the object of extensive investigation, require equally precise coordination of velic and oral gestures. These sequences have resulted in several sound changes, including (i) fricative weakening and loss 2 , and (ii) stop epenthesis. Importantly, the tendency for fricatives to weaken, or disappear, before a nasal may be related to the difficulty involved in combining frication with velopharyngeal opening within a segment observed in the preceding section. Examples of prenasal fricative weakening in historical sound change, morphophonological alternations and dialectal-stylistic variation are shown in Table 1. Prenasal fricative weakening may result in vocalization or gliding, rhotacism, nasal assimilation, and elision. Examples 1-3 and the first example in (c) in Table 1 illustrate vocalization; 4 exemplifies rhotacism; 5-9 illustrate fricative loss, and various examples of nasal assimilation and fricative loss are presented in Section (c). The examples in (c) and the examples of /s/ vocalization in (a) illustrate processes affecting only certain frequent words or combinations of words, thus illustrating the role of frequency in phonological change. 
The weakening of prenasal fricatives in the Romance data in Table 1(a), examples 1-5, may be argued to be part of a more general historical process of coda weakening (due to a decreased oral gesture syllable finally) in Late Latin and Gallo-Romance (Straka 1964; Gess 1999) by which not only fricatives but also other obstruents were lost syllable finally, regardless of the nature of the following segment. We suggest that the weakening of fricatives before nasals found in a variety of languages may be attributed not only to articulatory reduction but more crucially to varying aerodynamic conditions due to the temporal sequencing of velic gestures. Although fricative weakening in Romance also occurs before non-nasals (e.g. Latin insula “island” > French île; Germanic *bruzdon “to embroider” > Old Occitan broidar), a number of scholars (Pope 1934:151; Rohlfs 1966; Torreblanca 1976; Recasens 2002) have noted that this process is favored by a following voiced consonant and, in particular, by a following [n], [m], [r] or
2
The term fricative ‘weakening’ is used here to indicate attenuation of the high frequency noise which characterizes fricatives, due to gestural reduction or aerodynamic factors. Fricative loss is considered the endpoint of the weakening continuum, i.e. extreme attenuation leading to the segment becoming inaudible. In perceptual terms gradient attenuation of the friction noise may result in identification of a discrete segment (e.g. a frictionless continuant, a vowel, a tap, an assimilated segment, or /h/) or in the perceptual loss of the segment (i.e. deletion).
[l] 3 . Recasens (2002:342) attributes fricative weakening before voiced consonants in general to anticipatory glottal gestures for voicing during the fricative, which result in reduced transglottal flow (due to higher glottal impedance) and lower intensity of frication vis-à-vis pre-voiceless fricatives, thus making frication more likely to be missed. Other factors besides voicing, however, may be at play in the common weakening of fricatives preceding nasals, laterals, and trills. Fricative weakening may arise from variation in the relative timing of antagonistic positional requirements of the tongue for contiguous lingual fricatives and laterals (raised tongue sides and central critical constriction for the fricative and lowered tongue sides and central contact for the lateral; Ohala 1997b), and lingual fricatives and trills (raised and advanced tongue dorsum and a central groove for /s/ vs predorsum lowering and postdorsum retraction with a lax tongue-tip for the trill; Solé 2002b); in such sequences anticipatory movements for the lateral or the trill may bleed the positional and aerodynamic requirements for audible frication for [z, s]. In a similar way, in fricative + nasal sequences variation in the relative timing of the velic opening gesture and the preceding oral constriction for the fricative (requiring a sealed velum) may result in fricative weakening. In particular, anticipatory velum lowering for the nasal may affect the aerodynamic requirements for the generation of turbulence for the fricative. The examples in Table 1 illustrate that fricatives are weakened into frictionless continuants, or are lost altogether, more often when followed by a nasal segment, involving the same or different articulators, than when followed by non-nasal segments, when they retain their fricative quality. Fricatives have also been found to disappear in connected speech in French before nasal vowels (D. 
Duez, p.c.), which involve a lower position of the velum than nasal consonants. Consonants other than fricatives may assimilate the velic state of the upcoming nasal in casual speech (e.g. let me [lemm], wouldn’t [wnnt], Gimson, 1962), but they do not result in sound change or phonological alternations 4 . The question is then why fricatives are weakened more often than other segment types, and why they are weakened more often before nasal as opposed to non-nasal sounds. We hypothesized that the same aerodynamic factors disfavouring the combination of frication and nasality within a segment are responsible for the weakening/loss of fricatives before nasals: anticipatory movements to lower the velum for the nasal may reduce the oropharyngeal pressure necessary for the generation of turbulence for the fricative. Coarticulatory nasal leakage 3
Thus, for example, in Old French /s/ weakening and loss is found earlier in blâmer < blasmer < Lat. *blastemare “blame” and mêler < mesler, medler [l] < Lat. misculare “meddle” than in fête < feste < Lat. festa “holiday” (Pope 1934:151, 449).
4
This is most likely because stops tend to show up as oral in all other contexts (i.e. prevocalically, preconsonantally, and prepausally), thus listeners can attribute nasality in stops in stop + nasal sequences to the conditioning effect of the nasal and reconstruct an oral stop. In contrast, fricatives tend to show diminished amplitude of frication not only when followed by nasals but also preconsonantally and prepausally (Solé 2003), thus it is less likely that speakers attribute the weakened fricative to a variety of contexts.
would have a larger effect on voiced fricatives as opposed to voiceless fricatives (Ohala et al. 1998; Solé 2002b), since to generate friction at the oral constriction while air is flowing out through the nasal passage due to anticipatory velopharyngeal opening would require a high volume of flow, and vocal fold vibration reduces airflow through the glottis. Thus, voiced fricatives with overlapping velic movements for the nasal would have two independent systems reducing the pressure drop across the oral constriction required for frication, high impedance to ingoing airflow at the glottis and low impedance to outgoing flow at the velum. In addition, voiced fricatives are known to be shorter than voiceless fricatives and, due to the lesser rate of flow through the vibrating glottis, they take longer to achieve the pressure difference for frication and result in a lower intensity of friction than their voiceless counterparts (Solé 2002b). This makes anticipatory velopharyngeal opening for the nasal more likely to impair audible friction in voiced than in voiceless fricatives. This prediction is in accord with the fact that the majority of examples of fricative weakening in Table 1 involve voiced fricatives. Note that our aerodynamic and perceptual explanation is not at odds with Ohala and Busà’s (1995) claim presented above that in N + fricative sequences, the nasal disappeared due to acoustic-perceptual factors. The N + fricative sequences they examine (e.g. Latin institutione > Italian istituto), in fact, illustrate that anticipatory velum lowering can be accommodated by a preceding vowel, nasalizing it, and that the acoustic effects of nasalization on the vowel may be misidentified. What we suggest is that in fricative + N sequences nasalization cannot be accommodated by a preceding fricative without compromising its spectral identity. 
The active role of the listener here would involve reconstructing a frictionless continuant at face value (for example, [j], [w] or a rhotic) or missing the fricative altogether. A second outcome of contiguous fricatives and nasals (which require a tightly closed velum and a lowered velum, respectively) is an epenthetic stop in the transition between the two articulatory configurations, due to a prolonged velic occlusion of the fricative during the oral constriction for the nasal (e.g. Old English glisnian > glisten; Sanskrit krṣṇā > Krishna ~ Krishtna). Along similar lines, the insertion of an epenthetic schwa in /sm/ > [səm] and /sn/ > [sən] sequences in Montana Salish (Ladefoged and Maddieson 1996:109-110) reflects a delayed velic lowering and oral closure for the nasal relative to the end of the fricative (i.e. an increase in the temporal distance between articulatory gestures), which avoids an overlapped lowered velum during the fricative in order to preserve frication. Thus a very precise synchronization of the velic and oral movements is required in order to sequence segments involving frication and nasality.
(a) Historical change
1. [zn], [zm] > [jn], [jm]: Latin mesnata “kids”, elemos(i)na “alms” > Catalan mainada, almoina (Badia 1951). Latin asinu “donkey” > Gascon aine (cited in Recasens 2002). Standard Catalan besnét “grandchild” > Majorcan Catalan beinét. Old French ae(s)mer, Standard Catalan esma > Majorcan Catalan eima (Alcover & Moll 1978-1979); English aim.
2. [n] > [jn], [wn]: Latin agnu “lamb”, ligna “line” > S. Italian dialects ['ajn], ['lwna] (cited in Recasens 2002).
3. *[dzm], *[m] > [wm]: Latin decimare, decimu > Catalan deumar “lessen, reduce”, deume “tribute”.
4. [zn], [zm] > [n], [m]: Latin asinu “donkey” > Old Picard arne. Latin *dis(ju)nare “to eat breakfast” > Old Occitan dis/rnar (also Cat. dinar). Latin spasmu “spasm” > Roussillon esparme (cited in Recasens 2002). S. Spanish mismo ['mimo] “same” (Recasens 2002).
5. [zm] > [m]: Old French ble(s)mir > English blemish. Vulg.Latin blastemare > Old French bla(s)mer > Fr. blâmer, English blame. Latin rosmarinu, -aris > *romarinu, -ari(u)s “rosemary” > Catalan romaní, Spanish romero. Standard Catalan quaresma “Lent” > Majorcan Catalan [ko'm].
6. *sn > [n]: IE snuṣā “daughter-in-law”, OE snoru > Latin nurus, Greek nuós, Armenian nu, Spanish nuera (Watkins 1985).
7. *sn > [n]: Burmese *sna > [na] “nose”.
(b) Phonological alternations
8. [sn] > [n]: IE *dhus-no > Welsh dwn “dull, brown colour”, OE dun(n) “dark brown”; BUT IE *dhus-ko > Latin fuscus, OE dox, English dusk (Watkins 1985). IE *dhs-no > Latin fanum “temple”, English fanatic, (pro)fane; BUT IE *dhes-to > Lat. festus “festive”, German Fest, Spanish fiesta; IE *dhes-ya > Lat. fesiae, feriae, English fair, Spanish feria (Watkins 1985).
9. [sm] > [m]: IE *gras-men > Latin gramen “fodder”, English grama, gramineous; BUT IE *gras-ter > Greek gaster “stomach”, English gastric, epigastrium (Watkins 1985).
(c) Stylistic variation
[zn] > [nn], [jn]: isn't [nnt], ain’t [ent]; doesn’t [dnnt]; wasn’t [wnnt] (Gimson 1962).
[mVn] > [mn], [mn]: something [smn], [smn].
[nsVn] > [nVn]: Vincent that [vn n ] Tyneside English (Local 2003:325).
[mfr Vn] > [mr Vn]: San Francisco [sæmrnssko] American English (N. Hilty, p.c.).
[vm] > [mm]: give me, gimme [mm], have mine [ hæmman] (Gimson 1962).
[VN] > [VN]: like them [lakm], tell them [telm]. BUT like this [lak s], tell this [tel s].
[VN] > [VN]: thank you, thanks [kju], [ks].
[zn] > [nn]: business [bdns], [bnns].
Table 1: Examples of fricative weakening/loss in fricative + nasal sequences.
4.
Experiment. Variations in articulatory overlap The following experiment was designed to find out whether the tendency for prenasal fricatives to weaken or disappear can be attributed to anticipatory velopharyngeal opening for the nasal overlapping the acoustic duration of the fricative, thus diminishing the oropharyngeal pressure required for frication, as argued for concurrent features. In order to test this hypothesis we examined whether anticipatory velic opening occurred in such sequences and, if so, how it affected the aerodynamics and acoustics of fricatives.
4.1
Experimental procedure Simultaneous oropharyngeal pressure (Po), oral and nasal flow, and audio signal were obtained with PCquirer (Scicon) for five American English speakers producing words containing fricative + nasal sequences at slow and fast speech. The two speaking rates allowed us to observe the effects of increased articulatory overlap on the relative timing of velic and oral gestures, and how such changes in timing affect the pressure build-up for fricatives. The audio signal was digitized and sampled at 12 kHz, and the dc channels were sampled at 1 kHz. Po was obtained by a catheter introduced at the side of the mouth and bent behind the rear molars and connected to a pressure transducer. The volume flow from the mouth and the nose was collected simultaneously with a Rothenberg mask, using one of the outlet holes in the mouth mask for the pressure tube. The pressure and airflow signals were low-pass filtered at 50 Hz. Po and airflow were calibrated as described in Solé (2002a). Words containing fricative + nasal sequences with C1 = [s, z] and C2 = [n, m] in word medial position (e.g. Dessna [desn], Fresno [frezno], Missmer [msm], Mesmer [mezm]) were read in a carrier phrase. Control sequences with voiced and voiceless alveolar fricatives followed by laterals, stops, fricatives and approximants, all dento-alveolar (e.g. Grizzly [zl], Gizder [zd], is the [z], Ezra [z]; Esling [sl], Esda [sd], less the [s], Esra [s]) were also analyzed for comparison. The test and control tokens were randomized in the reading list. The aerodynamic and acoustic data were collected in two different sessions. In the first session, three speakers (JO, MS and DM) read five repetitions of each token at self-selected slow and fast rates. Because the three speakers showed different patterns of velic-oral coordination, two more speakers were recorded in a second session under the same conditions. Speakers JE and RS produced six repetitions of each token at slow and fast rates. 4.2
Measurements and analysis The measurements made on the data are illustrated in Figure 2, which shows the aerodynamic and acoustic data for Say Mesmer again in slow and fast speech for one of the speakers. Measurements were made at the following points in time for the fricative + nasal sequence: onset and offset of friction on the spectrographic records; onset of increased nasal flow (channel 5), indicated
THE STABILITY OF PHONOLOGICAL FEATURES
53
by a vertical line in Figure 2, reflecting velo-pharyngeal port opening for the nasal /m/; onset of oral closure for the nasal, indicated by a drop in oral flow (dotted line in channel 4), and offset of the nasal, indicated by an increase in oral flow (dashed line in channel 4) and an increase in amplitude and formant structure for the following vowel on the spectrogram. Figure 2 illustrates anticipatory velopharyngeal opening: the vertical lines mark the onset of nasal flow (channel 5) due to velum lowering for the nasal. At this point in time oral pressure decreases (channel 2) and the high frequency noise disappears (in slow speech) or is attenuated (in fast speech) (The sliding center frequency noise after the vertical line in fast speech reflects the effect of anticipatory lip movement, which is known to filter out higher frequencies). The increase in nasal flow leads the drop in oral flow (dotted line in channel 4) for the complete oral constriction for the nasal. Thus, for a few tens of ms there is concurrent (increasing) nasal flow and (decreasing) oral flow, characterised acoustically by some low amplitude aperiodic noise around 3 kHz. In other words, the velum starts to lower during the acoustic duration of the fricative, resulting in a sudden drop in amplitude of high frequency noise. SLOW
FAST
[ s e m e z m e n] [ s e m e z m e n] Figure 2: (1) Audio signal, (2) filtered Po, (3) unfiltered Po, (4) oral airflow, (5) nasal airflow and 0-5 kHz spectrogram of Say Mesmer again in slow and fast speech. Speaker JO. See text.
The duration of fricatives in test and control sequences was measured on spectrograms and aerodynamic records. In fast speech a considerable number of cases involved blending of the gestures for the two contiguous consonants generally resulting in deocclusivized following stops (as noted by Honorof 2003) or fricativized following sonorants [l, r, n, m]. The aerodynamic records were of great help in segmenting these blended sequences.
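The timing measurements described in this section can be sketched in code. The following is a minimal illustration only, assuming that the oral and nasal flow channels are available as lists of samples at 1 kHz (so that one sample corresponds to 1 ms). The function names, the baseline window and the threshold are hypothetical and are not the procedure used in the study, where the measurement points were located on the aerodynamic and spectrographic records.

```python
from statistics import mean

SAMPLE_RATE_HZ = 1000  # dc channels were sampled at 1 kHz: 1 sample = 1 ms


def onset_of_rise(signal, baseline_samples=20, threshold=0.05):
    """Index of the first sample above baseline + threshold, or None.

    Hypothetical criterion for the onset of increased nasal flow.
    """
    baseline = mean(signal[:baseline_samples])
    for i in range(baseline_samples, len(signal)):
        if signal[i] > baseline + threshold:
            return i
    return None


def onset_of_drop(signal, baseline_samples=20, threshold=0.05):
    """Index of the first sample below baseline - threshold, or None.

    Hypothetical criterion for the drop in oral flow at oral closure.
    """
    baseline = mean(signal[:baseline_samples])
    for i in range(baseline_samples, len(signal)):
        if signal[i] < baseline - threshold:
            return i
    return None


def velic_lag_ms(nasal_flow, oral_flow):
    """Nasal-flow onset minus oral-closure onset in ms.

    Negative values mean the velum opens before the oral closure (time 0).
    """
    nasal_onset = onset_of_rise(nasal_flow)
    oral_onset = onset_of_drop(oral_flow)
    if nasal_onset is None or oral_onset is None:
        return None
    return (nasal_onset - oral_onset) * 1000 / SAMPLE_RATE_HZ


# Synthetic (invented) traces: nasal flow rises at 50 ms, oral flow drops at 70 ms.
nasal = [0.0] * 50 + [0.2] * 50
oral = [0.3] * 70 + [0.0] * 30
lag = velic_lag_ms(nasal, oral)
```

With these invented traces the nasal flow onset precedes the oral closure, so the lag is negative, which corresponds to anticipatory velopharyngeal opening.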
4.3 Results
4.3.1 Patterns of velic movements. The patterns of velic coordination observed in the data are described in patterns 1 through 4, below, and schematized in Figure 3, which shows the oral gestures for the fricative (C1) + nasal (C2) sequence followed by traces of the observed velic patterns. Time 0 is the onset of the oral constriction for the nasal, indicated by the drop in oral flow (dotted line in Figure 2). A delayed oral gesture for the nasal, resulting in an epenthetic vowel, is represented at the bottom of Figure 3.
(1) If onset of velopharyngeal opening is synchronized with the oral constriction for the nasal, nasal flow starts at 0. We take this pattern to reflect a precise synchronization of the supraglottal and velic gestures for the nasal,⁵ which allows sequencing of the fricative and the nasal consonant. The oral and velic gestures for the nasal, however, may not be precisely synchronized.
(2) Onset of velum opening, i.e. nasal flow, may occur prior to the complete oral constriction for the nasal (time 0). That is, anticipatory velopharyngeal opening overlaps the acoustic duration of the preceding fricative. In this case we find concurrent oral flow (from the fricative slit constriction) and nasal flow before time 0 in the aerodynamic data, and lack of friction or attenuated friction on the spectrogram during this time interval.
(3) Velopharyngeal opening may be delayed relative to the oral constriction for the nasal (time 0), resulting in a transitional epenthetic stop. In this case the aerodynamic data show neither nasal flow (velum up) nor oral flow (nasal constriction formed) immediately after 0, and there is a silent gap on the spectrogram. We found the epenthetic stop to be audible if the oral and velic closure overlapped for over 15 ms. Figure 3 shows that the transitional stops emerging in these sequences are always nasally released, that is, they end when the velum lowers.
(4) Finally, the oral constriction for the nasal may be delayed with respect to the release of the preceding fricative constriction, resulting in an epenthetic vowel. In this case oral and nasal flow is found immediately before 0. Only one such case was found in the data.
⁵ Whereas it is clear that the oral target gesture for the nasal is a complete constriction, the velic target gesture may not be opening the velum but rather achieving a certain magnitude of velum opening. If this were the case, onset of the oral closure for the nasal would be coordinated with a lowered velum, and nasal flow would begin before 0, as is the case in VN sequences. However, since most of the observed values cluster around 0, we will assume the suggested target coordination.
It is worth noting that these patterns parallel those found in the historical and synchronic data reviewed in Section 3, in line with Ohala’s (1989, 1993) claim that sound change emerges from synchronic phonetic variation.
1. synchronous, onset of nasal flow at 0
2. bleeding Po, oral & nasal flow before 0
3. epenthetic stop, no nasal or oral flow after 0
4. epenthetic vowel, oral & nasal flow before 0
Figure 3: Diagrammatic representation of patterns of velic coordination in fricative-nasal sequences. Traces for the supraglottal movements for C1 and C2 followed by different traces of velum position (dashed lines). See text.
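The four patterns can be restated as a simple decision rule on the interval between onset of nasal flow and onset of the oral constriction for the nasal (time 0). The sketch below is only an illustration under stated assumptions: the function names, the `vowel_between` flag and the `tolerance_ms` parameter are hypothetical, and the 15 ms value is the audibility threshold for epenthetic stops reported in the text.

```python
AUDIBLE_STOP_MS = 15  # epenthetic stops were audible with over 15 ms of overlap


def classify_token(lag_ms, vowel_between=False, tolerance_ms=0.0):
    """Assign a token to one of the four velic coordination patterns.

    lag_ms: onset of nasal flow relative to time 0 (negative = before 0).
    vowel_between: hypothetical flag, True when the oral constriction for
        the nasal is delayed after the fricative release (pattern 4).
    tolerance_ms: hypothetical window around 0 counted as synchronous.
    """
    if vowel_between:
        return "epenthetic vowel"   # pattern 4
    if abs(lag_ms) <= tolerance_ms:
        return "synchronous"        # pattern 1
    if lag_ms < 0:
        return "anticipatory"       # pattern 2
    return "epenthetic stop"        # pattern 3


def stop_is_audible(lag_ms):
    """True if velic opening lags the oral closure by more than 15 ms."""
    return lag_ms > AUDIBLE_STOP_MS


def share_anticipatory(lags_ms):
    """Percentage of tokens showing anticipatory velic opening."""
    labels = [classify_token(lag) for lag in lags_ms]
    return 100 * labels.count("anticipatory") / len(labels)


# Invented lags (ms) for five tokens, for illustration only.
example_lags = [-30, -10, 0, 5, 20]
example_share = share_anticipatory(example_lags)
```

In this invented sample two of the five tokens have negative lags, so the share of anticipatory tokens is two fifths.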
4.3.2 Coordination and timing of velic movements. The patterns of coordination of oral and velic movements in fricative + N sequences for each token at slow and fast rates are presented in Figure 4. This figure plots the time interval between onset of velopharyngeal opening and onset of the oral constriction for the nasal (time 0) for each token for the five speakers. The productions of the individual tokens have been arranged in decreasing duration of that interval. Bars to the left of 0 are cases of anticipatory velopharyngeal opening (onset of nasal flow precedes onset of oral constriction for the nasal; pattern 2 in Figure 3); tokens at 0 represent synchronous onset of oral and velic gestures for the nasal (onset of nasal flow at 0; pattern 1 in Figure 3) and, consequently, a precise sequencing of the fricative and the nasal segment; bars to the right of 0 represent cases of velopharyngeal opening lagging behind the oral closure for the nasal (transitional stops; pattern 3 in Figure 3). The single case of vowel epenthesis (Fresno pronounced [frezno]), where the oral constriction for the nasal consonant is delayed relative to fricative release and velum lowering, is indicated by a lined bar (speaker DM, slow speech). Since inspection of the patterns of sequences involving voiced (e.g. Fresno) and voiceless (e.g. Dessna) fricatives showed no differences, both types of sequences were pooled in this graph. White bars represent homorganic sequences, [sn, zn], and grey bars represent heterorganic sequences, [sm, zm]. Each bar represents one token and the number of plotted tokens ranges between 5 and 6 for each sequence, [sn, zn, sm, zm], depending on the speaker and session.
Figure 4 shows that although all observations cluster around time 0, the five speakers show three distinct patterns of velic-oral coordination. Subject JO exhibits extensive anticipatory velic opening during the acoustic duration of the fricative for homorganic and heterorganic sequences. Such anticipatory velic opening vents the high oral pressure required for turbulence and, thus, frication is attenuated or extinguished (see Figure 2). Speakers JE and MS exhibit a majority of cases of anticipatory velic opening in heterorganic sequences (/sm, zm/), but not in homorganic sequences (/sn, zn/), which exhibit a greater number of transitional stops (i.e. delayed velum lowering).⁶ However, both speakers show cases of precise synchronization (time 0) and of anticipatory velic opening for homorganic as well as heterorganic sequences. Finally, speakers DM and RS show mostly epenthetic stops, with a few cases of anticipatory velopharyngeal opening. The difference between homorganic and heterorganic sequences for these speakers seems to point in a different direction from that observed for speakers JE and MS: the epenthetic stops, resulting from prolonged velic raising while the oral occlusion for the nasal has been achieved, appear to be more common and longer for /sm, zm/ sequences than for /sn, zn/ sequences. In spite of speaker-dependent differences, all speakers show cases of anticipatory velic opening bleeding the high oral pressure for frication. Overall, anticipation of velar activity during the fricative was found in 40% of the tokens in slow speech and 26% of the tokens in fast speech.
We now turn to differences in the timing of gestures in slow and fast speech. We expected to find greater overlap of anticipatory movements of the velum with the fricative in fast vis-à-vis slow speech.
Contrary to our expectations, Figure 4 shows similar patterns of coordination of velic and oral gestures at slow and fast rates for each speaker, and no major differences in the absolute values of velic timing across rates. Any differences are in the direction of more cases (i.e. more bars with negative values) and a longer period (i.e. larger negative values) of anticipatory velic lowering in slow than in fast speech. That is, the velum appears to be freer to lower before the oral constriction for the nasal is achieved at slow rates, most likely because fast speech imposes tighter time constraints. However, since fricatives are slightly shorter at faster rates, the same period of anticipatory velopharyngeal opening has a slightly greater percentage effect in fast than in slow speech. This is illustrated in Figure 5, which plots the modal⁷ duration of anticipatory velopharyngeal opening found for each speaker and rate as a percentage of the average duration of the fricative for that speaker and rate, for voiced and voiceless fricatives separately. Figure 5 shows a slightly larger percentage of overlapping velopharyngeal opening (i.e. longer black bars) in fast than in slow speech for all speakers and sequences. Figure 5 also allows us to observe the larger percentage of anticipatory velopharyngeal opening in voiced as opposed to voiceless fricatives, due to the shorter duration of the former.
⁶ Assuming that the motor instructions for the oral and velic gestures for the nasal are synchronic, and ignoring differences in velocity of articulators, the difference between homorganic and heterorganic sequences could in part be accounted for in terms of gestures involving independent articulators overlapping in time: anticipatory overlap of the labial and velic gesture for /m/ during the alveolar fricative in /sm, zm/ sequences. In homorganic /sn, zn/ sequences involving the same articulator (the tongue tip), the oral and velic gestures for the nasal would be delayed till the tongue tip was available for repositioning. However, this interpretation does not explain why velic opening lags behind the oral closure for the nasal in homorganic sequences.
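The quantity plotted in Figure 5 is a ratio: the modal duration of anticipatory velopharyngeal opening divided by the average fricative duration. A minimal sketch of that computation follows. The durations are invented for illustration, the function names are hypothetical, and the sketch assumes that durations have been rounded (here to 10 ms) so that a mode exists; when several values tie, it arbitrarily takes the smallest.

```python
from statistics import mean, multimode


def modal_anticipation_ms(anticipation_ms):
    """Most frequent anticipation duration (smallest value on ties)."""
    return min(multimode(anticipation_ms))


def overlap_percentage(anticipation_ms, fricative_durations_ms):
    """Modal anticipation as a percentage of mean fricative duration."""
    return 100 * modal_anticipation_ms(anticipation_ms) / mean(fricative_durations_ms)


# Invented values: the same modal anticipation (20 ms) at both rates,
# but shorter fricatives in fast speech.
slow_pct = overlap_percentage([20, 20, 30], [100, 100, 100])
fast_pct = overlap_percentage([20, 20, 30], [80, 80, 80])
```

With these invented numbers the same 20 ms of anticipatory opening covers a larger share of the shorter fast-speech fricative, which is the effect described for Figure 5.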
⁷ The mode was plotted rather than the mean because of extreme values in the data.
[Figure 4 panels: one panel per speaker (JO, JE, MS, DM, RS) and rate (SLOW, FAST), each plotting the sm and sn series; y-axis: token number (1-12); x-axis: time in ms (-60 to 60).]
Figure 4: Coordination of oral and velic gestures for each production of Fresno and Dessna (white bars), and Mesmer and Missmer (grey bars) at slow and fast rates for each speaker. Tokens with voiced and voiceless fricatives have been pooled.
[Figure 5 legend: fricative duration; overlapped velopharyngeal opening. Bars from 0% to 100% for voiced (vd) and voiceless (vless) fricatives at SLOW and FAST rates for speakers JO, JE, MS, DM and RS.]
Figure 5: Modal duration of anticipatory velopharyngeal opening (heterorganic and homorganic sequences pooled) as a percentage of the total duration of the voiced or voiceless fricative in slow and fast speech for the five speakers.
4.3.3 Fricative duration. The hypothesis that nasal leakage due to anticipatory velopharyngeal opening extinguishes or attenuates frication for a few tens of ms predicts that fricatives preceding nasal or nasalized segments should be phonetically shorter than those preceding non-nasal segments. We tested this prediction by measuring fricative duration for test and control tokens in slow and fast speech. The results of the measurements show that, as predicted, fricatives preceding nasals are generally shorter than those preceding non-nasals. (Following fricatives and laterals also tend to result in shorter preceding fricatives for some speakers.) Two-way ANOVAs with fricative voicing (voiced, voiceless) and following consonant (nasal, non-nasal) as independent variables, and duration of the fricative as the dependent variable, were performed for each speaker and speech rate separately. Table 2 shows the results for the ANOVAs. The durational differences between fricatives preceding nasal vs non-nasal consonants reached significance for speakers JO, MS and DM in slow and fast speech. Since the interaction between the two factors was significant for speaker JE at faster rates, one-way ANOVAs were carried out for voiced and voiceless fricatives separately for this speaker. Voiced fricatives were found to be significantly shorter preceding nasal than oral consonants (F(1,33) = 11.629, p<0.01), whereas for voiceless fricatives the difference did not reach significance. To conclude, in seven out of ten comparisons fricatives before nasals were significantly shorter than before non-nasals, as expected. A significant main effect of voicing was found for all speakers and rates, with voiced fricatives being significantly shorter than voiceless fricatives.
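For readers who want to see how an F value of the kind reported here arises, the following is a minimal sketch of a one-way ANOVA on fricative durations. It is not the authors' analysis script: the function name is hypothetical, the durations are invented, and no p-value is computed (that would require the F distribution, e.g. from a statistics package). With two groups the degrees of freedom are (1, N - 2), so a reported F(1,33) corresponds to 35 measured tokens in total.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for two or more groups of values."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented durations (ms) of voiced fricatives before nasal and oral consonants.
prenasal = [60, 65, 70, 62, 68]
preoral = [80, 85, 90, 78, 88]
f_value, df1, df2 = one_way_anova(prenasal, preoral)
```

The two-way design used in the study additionally partitions the variance into a voicing effect and an interaction term; the one-way case is shown because it is the simplest complete instance of the same variance-ratio logic.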
JO
Slow speech Prenasal vs non-prenasal Fricative voicing Interaction Fast speech Prenasal vs non-prenasal Fricative voicing Interaction
F(1,76)=82.76, p<0.0001 F(1,76)=50.92, p<0.0001
F(1,76)=141.0, p<0.0001 F(1,76)=31.86, p<0.0001
JE
F(1,52)=147.19, p<0.0001
MS
DM
RS
F(1,76)=6.88, p<0.01 F(1,76)=7.67, p<0.01
F(1,76)=8.92, p<0.01 F(1,76)=86.60, p<0.0001
F(1,44)=104.48, P<0.0001
F(1,76)=27.02, p<0.0001 F(1,76)=30.12, p<0.0001
F(1,76)=16.78, p<0.001 F(1,76)=9.53, p<0.01
F(1,64)=183.43, p<0.0001 F(1,44)=66.26, p<0.0001 F(1,64)=14.84, p<0.001
Table 2: Significant differences for the two-factor ANOVAs performed with the independent variables in the rows and fricative duration as the dependent variable, for slow and fast speaking rates for each speaker.
5. Discussion and conclusions
The results on the timing of velic and oral gestures show that in fricative + nasal sequences the velum may lower during the acoustic duration of the fricative (in 40% of the cases in slow speech and 26% in fast speech in our data). These results are in agreement with those obtained by other investigators. Shosted (2006) obtained nasal and oral flow for VCV utterances where C was a fricative and V was an oral or nasal vowel. He found coarticulatory nasalization during the fricative, with the same acoustic output observed for nasalized fricatives: attenuation of high-frequency energy and increased bandwidth of spectral peaks. Recasens (in press) provides electropalatographic and acoustic evidence that /s/ in Majorcan Catalan may be lost in /sn/, /sm/ and /sl/ clusters whereas /sb/, /sv/, /sd/, and /sg/ clusters show less extreme cases of fricative weakening.
Overall the results indicate that nasal leakage during the fricative reduces the oral pressure required for the generation of turbulence and that frication is attenuated or extinguished for a few tens of milliseconds, which may lead to the perceptual loss of the fricative. This is more likely in voiced fricatives, which are shorter (Table 2) and, due to reduced transglottal flow, have a lower intensity of friction vis-à-vis voiceless fricatives (Solé 2002b). Thus, if anticipatory velopharyngeal opening for the nasal overlaps the latter portion of the voiced fricative, the low-amplitude turbulence may not be heard. Our data show that, in addition to the effect of voicing in the following segment observed by Pope (1934), Rohlfs (1966), Torreblanca (1976), and Recasens (2002), anticipatory velopharyngeal opening for a following nasal may also favor attenuation or loss of friction. Thus, there seems to be a gradient reduction continuum for coda fricatives: fricative + nasal > fricative + voiced consonant > fricative + voiceless consonant.
Anticipatory velopharyngeal opening was hypothesized to be larger at faster speaking rates than at slower rates; however, approximately the same period of an overlapped lowered velum during the latter portion of the fricative was found across rates. The same amount of velic overlap had a slightly larger effect at faster rates due to the slightly shorter duration of the fricative in fast speech.
The results in Section 4.3.2 also show cases of epenthetic stops in the transition between fricatives and nasals due to prolonged velum raising for the fricative when the oral constriction for the nasal has been achieved. Such transitional stops are always nasally released, and the lack of a strong release burst, which is a perceptual cue for intrusive stops (Ali et al. 1979), is most likely the reason why these epenthetic stops may not be noticed by speakers and have not phonologised, as opposed to those emerging in contexts where they are orally released (e.g. nasal + fricative, sense [nts]; nasal + flap, Catalan cambra < Latin cam(e)ra; nasal + lateral, Spanish temblar < Latin trem(u)lu; lateral + fricative, else [lts]).
Finally, the results in Section 4.3.3 show that the duration of the fricative tends to be shorter preceding nasal than oral consonants, suggesting that the effect of nasal leakage during the latter portion of the fricative is present phonetically.
The results obtained are compatible with the proposed account for the historical and synchronic defricativization or loss of fricatives before nasal or nasalized segments: velopharyngeal opening during the latter part of the fricative vents the intraoral pressure, thus reducing or eliminating the pressure difference across the oral constriction required for audible frication. These findings suggest that interarticulatory timing and associated aerodynamic effects may account for weakening of segments crucially dependent on airflow conditions. Data on the perceptual impact of these aerodynamic and temporal variations are nevertheless needed to back up the findings of this study.
6. General conclusions
We set out to test the hypothesis that the physical and physiological principles used to account for paradigmatic aspects of phonology, such as feature co-occurrence restrictions, can be used to explain syntagmatic aspects, such as phonotactic patterns, context-dependent phonological processes and sound change. The results of the experiments reported here show that speech features requiring high airflow through the oral constriction, such as fricatives,⁸ tend to be impaired and become unstable with co-occurring or coarticulatory nasalization. The results in Section 2 show that a reduction in oral pressure during the articulation of a fricative (due to venting oral pressure with a pseudo-velopharyngeal valve) reduces the pressure difference and the particle velocity of the air across the oral constriction, and frication is attenuated or lost; hence the constraint against combining the features [fricative] and [nasal] within a segment. The results of the experiment in Section 4 show that when these features occur in contiguous segments, as in fricative + nasal sequences, there can be anticipatory velopharyngeal opening during the acoustic duration of the fricative, which has the same aerodynamic and acoustic consequences on the fricative as concurrent nasalization. Reduction of the oral pressure and subsequent reduction of the intensity of the high-frequency noise may lead to a non-fricative percept or to missing the fricative altogether. Thus, the same factors responsible for the difficulty in combining the two features within a segment may be used to explain why these features do not combine across segments. Relating constraints on the combination of features within and across segments illustrates the generality that can be achieved by a physically based explanation.
The instability of frication when combined with nasalization may be at the origin of a number of phonological patterns, specifically, feature co-occurrence restrictions (e.g. lack of nasal fricatives), phonological change, morphological alternations and stylistic variation (e.g. loss/weakening of fricatives followed by a nasal), and transitional probabilities in the sequencing of sounds (lower lexical frequency of fricatives followed by nasals, Solé (forthcoming)). This is one further example of how phonological structure may emerge from physical constraints as advocated by Ohala (1974, 1983) and Lindblom (1986, 1990), and strongly suggests that the same physical principles may provide an explanation for paradigmatic and syntagmatic aspects of phonology.
⁸ Tongue-tip trills also require high airflow to set the tongue tip into vibration and, consequently, cannot be nasalized. For the incompatibility between trilling and nasality see Solé (2002b).
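The link between oral pressure and particle velocity invoked above can be illustrated numerically. The sketch below assumes the textbook Bernoulli relation for flow through a constriction, v = sqrt(2ΔP/ρ), which is not stated in the text; the air density and the pressure values are hypothetical round numbers chosen only to show the direction and size of the effect.

```python
import math

AIR_DENSITY_KG_M3 = 1.2   # assumed round value for the density of air
PA_PER_CM_H2O = 98.0665   # unit conversion: 1 cm H2O expressed in pascals


def particle_velocity_m_s(delta_p_cm_h2o):
    """Air velocity through the constriction for a given pressure drop.

    Uses the simplified Bernoulli relation v = sqrt(2 * dP / rho) and
    ignores losses and constriction shape.
    """
    delta_p_pa = delta_p_cm_h2o * PA_PER_CM_H2O
    return math.sqrt(2 * delta_p_pa / AIR_DENSITY_KG_M3)


# Hypothetical pressure drop across the fricative constriction with the
# velum closed, and after nasal venting has halved it.
v_closed = particle_velocity_m_s(8.0)
v_vented = particle_velocity_m_s(4.0)
velocity_ratio = v_vented / v_closed
```

Under this simplified relation velocity scales with the square root of the pressure drop, so halving the pressure difference lowers the particle velocity to about 71% of its original value, in line with the attenuation of turbulence described above.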
References
Alcover, Antoni M. & Francesc de B. Moll. 1978-1979. Diccionari Català-Valencià-Balear. Barcelona: Gràfiques Instar.
Ali, Latif, Ray Daniloff & Robert Hammarberg. 1979. “Intrusive stops in nasal-fricative clusters: An aerodynamic and acoustic investigation”. Phonetica 36:2. 85-97.
Badia, Antoni M. 1951. Gramática histórica catalana. Barcelona: Noguer.
Busà, Maria Grazia. In press. “Coarticulatory nasalization and phonological developments: Data from Italian and English nasal-fricative sequences”. Experimental Approaches to Phonology ed. by Maria-Josep Solé, Patrice S. Beddor & Manjari Ohala. Oxford: Oxford University Press.
Cohn, Abigail C. 1993. “The status of nasalized consonants”. Huffman & Krakow 1993. 329-367.
Gerfen, Chip. 1999. Phonology and Phonetics in Coatzospan Mixtec. Dordrecht: Kluwer.
Gess, Randall. 1999. “Rethinking the dating of Old French syllable-final consonant loss”. Diachronica 16. 261-296.
Gimson, Alfred C. 1962. An Introduction to the Pronunciation of English. London: Arnold.
Gregores, Emma & Jorge Suarez. 1967. A Description of Colloquial Guaraní. The Hague: Mouton.
Hajek, John & Ian Watson. 1998. “More evidence for the perceptual basis of sound change? Suprasegmental effects in the development of distinctive nasalization”. Proceedings of the 5th International Congress on Spoken Language Processing ed. by R. H. Mannell and J. Robert-Ribes. 1763-1765. Sydney: Causal Productions.
Hardcastle, William J. & Nigel Hewlett, eds. 1999. Coarticulation. Theory, Data and Techniques. Cambridge: Cambridge University Press.
Hayes, Bruce & Tanya Stivers. 1996. “A phonetic account of postnasal voicing”. Ms., Department of Linguistics, UCLA, Los Angeles, Calif. http://www.linguistics.ucla.edu/people/hayes/phonet.htm#postnasal.
Honorof, Douglas N. 2003. “Articulatory evidence for nasal de-occlusivization in Castilian”. Proceedings of the 15th International Congress of Phonetic Sciences ed. by Maria-Josep Solé, Daniel Recasens & Joaquín Romero, vol. 2, 1759-1763. Barcelona: Causal Productions.
Huffman, Marie K. & Rena A. Krakow, eds. 1993. Nasals, Nasalization and the Velum. San Diego, Calif.: Academic Press.
Kawasaki, Haruko. 1986. “Phonetic explanations for phonological universals: The case of distinctive nasalization”. Experimental Phonology ed. by John J. Ohala & Jeri J. Jaeger. 81-103. San Diego, Calif.: Academic Press.
Kawasaki-Fukumori, Haruko. 1992. “An acoustical basis for universal phonotactic constraints”. Language and Speech 35:1,2. 73-86.
Ladefoged, Peter & Ian Maddieson. 1996. The Sounds of the World’s Languages. Oxford: Blackwell.
Lindblom, Björn. 1986. “Phonetic universals in vowel systems”. Experimental Phonology ed. by John J. Ohala and Jeri J. Jaeger. 13-44. San Diego, Calif.: Academic Press.
----------. 1990. “On the notion of ‘possible speech sounds’”. Journal of Phonetics 18. 135-152.
Local, John. 2003. “Variable domains and variable relevance: Interpreting phonetic exponents”. Journal of Phonetics 31. 321-339.
Maeda, Shinji. 1993. “Acoustics of vowel nasalization and articulatory shifts in French nasal vowels”. Huffman & Krakow 1993. 147-167.
Ohala, John J. 1974. “Phonetic explanations in phonology”. Papers from the Parasession on Natural Phonology ed. by Anthony Bruck, Robert Fox & Michael LaGaly. 251-274. Chicago: Chicago Linguistics Society.
----------. 1975. “Phonetic explanations for nasal sound patterns”. Nasálfest: Papers from a Symposium on Nasals and Nasalization ed. by Charles A. Ferguson, Larry M. Hyman & John J. Ohala. 289-316. Stanford: Language Universals Project.
----------. 1981. “Articulatory constraints on the cognitive representation of speech”. The Cognitive Representation of Speech ed. by Terry Myers, John Laver & John Anderson. 111-122. Amsterdam: North Holland.
----------. 1983. “The origin of sound patterns in vocal tract constraints”. The Production of Speech ed. by Peter F. MacNeilage. 189-216. New York: Springer Verlag.
----------. 1989. “Sound change is drawn from a pool of synchronic variation”. Language Change: Contributions to the Study of its Causes ed. by Leiv E. Breivik & Ernst H. Jahr. 173-198. Berlin: Mouton de Gruyter.
----------. 1993. “The phonetics of sound change”. Historical Linguistics: Problems and Perspectives ed. by Charles Jones. 237-278. London: Longman.
----------. 1997a. “Aerodynamics of phonology”. Proceedings of the 4th Seoul International Conference on Linguistics. 92-97. Seoul, Korea.
----------. 1997b. “Emergent stops”. Proceedings of the 4th Seoul International Conference on Linguistics. 84-91. Seoul, Korea.
---------- & Maria Grazia Busà. 1995. “Nasal loss before voiceless fricatives: A perceptually-based sound change”. Special issue on The Phonetic Basis of Sound Change ed. by Carol A. Fowler. Rivista di Linguistica 7. 125-144.
---------- & Manjari Ohala. 1993. “The phonetics of nasal phonology: Theorems and data”. Huffman & Krakow 1993. 225-249.
----------, Maria-Josep Solé & Goangshiuan Ying. 1998. “The controversy of nasalized fricatives”. Proceedings of the 16th International Congress on Acoustics and 135th Meeting of the Acoustical Society of America. 2921-2922. Seattle, Washington.
Pope, Mildred K. 1934. From Latin to Modern French with special consideration of Anglo-Norman. Manchester: Manchester University Press.
Pulleyblank, Douglas. 1997. “Optimality theory and features”. Optimality Theory. An Overview ed. by Diana Archangeli & D. Terence Langendoen. 59-101. Oxford: Blackwell.
Recasens, Daniel. 2002. “Weakening and strengthening in Romance revisited”. Rivista di Linguistica 14:2. 327-373.
----------. In press. “Gradient weakening for syllable-final /s, r/ in Majorcan Catalan consonant clusters”. Proceedings of the 7th International Seminar on Speech Production. Ubatuba, Brazil. http://cefala.org/issp2006/camera-ready/recasens.pdf
Rohlfs, Gerhard. 1966. Grammatica storica della lingua italiana e dei suoi dialetti. Fonetica. Torino: Einaudi.
Schadeberg, Thilo C. 1982. “Nasalization in Umbundu”. Journal of African Languages and Linguistics 4. 109-132.
Shosted, Ryan. 2006. The aeroacoustics of nasalized fricatives. PhD diss., University of California, Berkeley.
Solé, Maria-Josep. 2002a. “Aerodynamic characteristics of trills and phonological patterning”. Journal of Phonetics 30:4. 655-688.
----------. 2002b. “Assimilatory processes and aerodynamic factors”. Laboratory Phonology 7 ed. by Carlos Gussenhoven & Natasha Warner. 351-386. Berlin & New York: Mouton de Gruyter.
----------. 2003. “Aerodynamic characteristics of onset and coda fricatives”. Proceedings of the 15th International Congress of Phonetic Sciences ed. by Maria-Josep Solé, Daniel Recasens & Joaquín Romero. 2761-2764. Barcelona: Causal Productions.
----------. Forthcoming. “Compatibility of features and phonetic content: The case of nasalization”. Presented at the 16th International Congress of Phonetic Sciences. Saarbrücken, Germany.
Stevens, Kenneth N. 1972. “The quantal nature of speech: Evidence from articulatory-acoustic data”. Human Communication. A Unified View ed. by Peter B. Denes & Edward E. Jr. David. 51-66. New York: McGraw-Hill.
----------. 1989. “On the quantal nature of speech”. Journal of Phonetics 17. 3-46.
Straka, Georges. 1964. “Remarques sur la désarticulation et l’amuissement de l’s implosive”. Mélanges de Linguistique Romane et de Philologie médiévale offerts à Maurice Delbouille 1. 607-628.
Stringer, Mary & Joyce Hotz. 1973. “Waffa phonemes”. The Language of the Eastern Family of the East New Guinea Highland Stock ed. by Howard McKaughan. 523-529. Seattle: University of Washington Press.
Torreblanca, Máximo. 1976. Estudio del habla de Villena y su comarca. Alicante: Instituto de Estudios Alicantinos.
Walker, Rachel. 2000. Nasalization, Neutral Segments, and Opacity Effects. New York: Routledge.
Watkins, Calvert (ed.). 1985. The American Heritage Dictionary of Indo-European Roots. Boston: Houghton Mifflin.
Welmers, William. 1973. African Language Structures. Berkeley: University of California Press.
Westbury, John R. & Patricia A. Keating. 1986. “On the naturalness of stop consonant voicing”. Journal of Linguistics 22. 145-166.
Whalen, Douglas H. & Patrice S. Beddor. 1989. “Connections between nasality and vowel duration and height: Elucidation of the Eastern Algonquian intrusive nasal”. Language 65. 457-486.
PRE- AND POSTASPIRATED STOPS IN ANDALUSIAN SPANISH *
FRANCISCO TORREIRA
University of Illinois at Urbana-Champaign
Abstract
In this paper, we provide instrumental evidence from an experiment and a spontaneous speech study that, in Andalusian Spanish, /s/ aspiration consistently implies postaspiration in following voiceless stops. In order to explain the occurrence of this sound pattern, we offer a gestural analysis based on the notion of the syllable as a unit of articulatory coordination. Finally, we briefly contrast the Andalusian pattern with parallel sound sequences in other languages and Spanish dialects, and raise the question of an ongoing sound change.
1. Introduction
In a large number of Spanish dialects, representing the majority of the world’s Spanish speakers, /s/ is reduced to [h] or deleted entirely in word-internal preconsonantal position (e.g. /este/ → [ehte], este “this”), in word-final preconsonantal position (e.g. /las#toman/ → [lahtoman], las toman “they take them”), and/or in prepausal position (e.g. /komemos/ → [komemo(h)], comemos “we eat”). In those dialects considered the most phonologically innovative (such as the Spanish of Andalusia, Extremadura, the Canary Islands, the Hispanic Caribbean, and the Pacific coast of South America), /s/ aspiration has also extended to prevocalic environments word-finally (e.g. /las#alas/ → [lahala], las alas “the wings”) and, less commonly, word-internally (e.g. /asi/ → [ahi], así “this way”). In this study, we will provide instrumental data showing that in Andalusian Spanish, a southern variety of Peninsular Spanish, /s/ aspiration before voiceless stops is accompanied by consistent postaspiration. Although this phenomenon has already been briefly reported (Maza 1999; Vaux 1998), to our knowledge no instrumental data demonstrating it have been offered so far.
* Thanks are due to José Ignacio Hualde, Jennifer Cole, Christopher Stewart, Miquel Simonet and Yudong Chen and three anonymous reviewers for their generous support and comments on previous versions of this paper. My thanks also to the audience at PaPI 2005.
1.1 Phonetic cues of /s/ aspiration
The most recognizable acoustic correlate of aspirated /s/ is perhaps a period of voiceless or voiced aspiration. In Andalusian Spanish, this correlate is typically present in utterance-medial word-final position, as in the utterance las alas [lahala]. However, when we consider multiple phonetic contexts and dialectal differences, the phonetic manifestations of aspirated /s/ turn out to be quite varied and finely detailed. For this reason, in this introduction we will only refer to the context studied here, that is, utterance-medial /s/ aspiration before voiceless stops (e.g. /sp/ as in aspa “cross”, /st/ as in casta “caste” and /sk/ as in casca “shell”). In a perception study on Miami Cuban Spanish carried out with natural stimuli, Hammond (1978) found that speakers were unable to detect utterance-final /s/, while they consistently distinguished minimal pairs such as costa “coast” ~ cota “height above sea level” (the word costa being always pronounced without a full sibilant). It follows from this that, when confronted with las costas “the coasts”, speakers relied on cues in the medial part of the utterance in order to distinguish it from la costa “the coast”, since utterance-final /s/ was categorically lost. An acoustic analysis of the stimuli used in the experiment showed that the vowel preceding aspirated /s/ (e.g. /a/ in gasto “costs”) was considerably longer than vowels in open syllables (e.g. gato “cat”). This effect was recently replicated for Puerto Rican Spanish in a study sharing the same design and test words as Hammond (1978) (Figueroa 2000). Gerfen (2002) also found that, for words sharing this pattern, Eastern Andalusian speakers tended to produce longer vowels preceding weakened /s/. However, this study explicitly mentioned that, in order for this effect to be significant, vowel duration needed to include the period of aspiration corresponding to weakened /s/. 
If this period was not counted as part of the vowel, the latter was actually slightly shorter than vowels in open syllables, as would be expected, since vowels in tautosyllabic VC sequences tend to be shorter than open-syllable nuclei. The motivation for including the aspiration period within the vowel was that, in many cases, the two could not be clearly delimited from each other during segmentation, and that in other cases speakers did not produce voiceless aspiration but rather different degrees of breathiness towards the end of the vowel. Since neither Hammond (1978) nor Figueroa (2000) explicitly mentioned this issue in their studies, one may wonder whether their stimuli were indeed totally devoid of any aspiration or breathiness, as they suggest, or whether some aspiration or breathy voicing was actually present but was not considered in their segmentation. Another widely described correlate of /s/ aspiration in this and other phonetic contexts is the opening of mid and low vowels in Eastern Andalusian varieties (EAS) (Navarro Tomás 1939; Alonso, Canellada & Zamora Vicente 1950; Alvar 1955; Salvador 1977). In these dialects, the vowels /a, e, o/ are said to be open or lax when followed by weakened /s/ (whether /s/ aspirates or
deletes). Moreover, this opening seems to extend to preceding vowels in the prosodic word, in a typical example of vowel harmony (e.g. lobo “wolf” vs lobos “wolves”). Even though these two features have been traditionally taken for granted in the phonological literature, it should be admitted that little instrumental evidence supports them so far. In reexamining experimental data from Martínez Melgar (1986), Maza (1999) argues that vowel harmony is not consistent when examined in detail (it is not clear, for example, what the domain of the phenomenon is), and that vowel opening is characteristic of both stressed syllables and of vowels preceding weakened -s, without any causal relationship between the two phenomena. Hualde and Sanders (1995) posit that EAS may have preserved etymological higher vowels in the singular than in the plural of masculine words like lobo/lobos < Lat. lupu(m)/lupos, and that this contrast may have been phonologized after final /s/ was weakened. They find support for this hypothesis in the fact that rural EAS speakers display higher final unstressed /e/ and /o/ than speakers of standard varieties. The data presented by Martínez Melgar (1986) further complicate the picture. In spite of considerable variability in her data, this author claims that open and close vowels do not differ as much in word-final position as in penultimate stressed syllables. At the present time we will conclude that, even though the situation does not seem as clear as depicted in traditional accounts, it seems reasonable to assume that speakers of Eastern Andalusian Spanish do somehow associate the occurrence of lax vowels with weakened /s/. As for other dialects, we will note that Hammond (1978) and Rodríguez Cadena (2003) also report data from Cuban where vowels undergo F1 raising in the context discussed, even though both studies found little consistency in the effect. 
Finally, in the study on Eastern Andalusian Spanish cited above, Gerfen (2002) identified stop closure duration as the most robust correlate of /s/ weakening in words of the type [CVh.CV] where the second consonant was a voiceless stop. That study found that the closure of voiceless stops preceded by aspirated /s/ was significantly longer than stop closures in V.CV contexts. The aim of the present study is to show that Andalusian voiceless stops preceded by aspirated /s/ are consistently postaspirated. Given the consistency of our data, it will be argued that this phonetic feature should be considered alongside the other cues of /s/ aspiration reviewed above, such as stop closure lengthening, vowel lengthening and vowel opening. We leave the realization of aspirated /s/ preceding consonants other than voiceless oral stops outside of our purview (see Romero 1995 for aspirated /s/ plus voiced stop sequences).
2. The experiment
The hypothesis addressed by this experiment is that, in Andalusian, an /s/-aspirating Spanish dialect, voiceless stops preceded by aspirated /s/ are also consistently postaspirated, as opposed to voiceless stops in other contexts (e.g. preceded by either a vowel or another consonant). On the other hand, standard Northern Peninsular Spanish should not display postaspiration for voiceless
stops preceded by /s/, which is expected to be articulated as a full sibilant. If this hypothesis holds true, postaspiration should be considered a potential perceptual cue of /s/ aspiration, at least for Andalusian Spanish, and eventually a potential source of sound change in this dialect. Given the design of the experiment, it will also be possible to test whether other acoustic correlates reported in the literature, such as stop closure and vowel lengthening, are also present in the studied context, and to examine potential trade-offs between these cues. We decided not to consider vowel opening in the present study, since the participants in the experiment were speakers of Western Andalusian, a dialect reported as not displaying this phenomenon (Alvar 1955).
2.1 Method
Two groups of three speakers of two different Spanish dialects (Western Andalusian and Northern Peninsular) read a list of sentences carrying the test words and distractors, the carrier sentence being Digo X para mí “I say X for myself”. All speakers were aged twenty-five to fifty and had always lived in urban centers. The test words were selected in a balanced way by considering the factors below. All were paroxytones and ended in a low or mid back vowel. In a small number of cases nonce words were used in order to satisfy certain combinations of factors:
1. STOP TYPE: /p/ vs /t/ vs /k/ (e.g. /papa/ “potato” vs /pata/ “leg” vs /paka/ “proper noun”).
2. CODA TYPE: ø vs /l/ vs /s/ (e.g. /kako/ “burglar” vs /kalko/ “calque” vs /kasko/ “helmet”).
3. VOWEL TYPE: /a/ vs /i/ (e.g. /tapo/ “(I) cover” vs /tipo/ “type”).
The test word counts were: three different words x three stops x three coda types x two preceding vowels, for a total of fifty-four test words. With three repetitions for each of the six speakers, the total number of tokens in the database was 972. The readings were carried out in a quiet room and recorded using a Shure SM10A head-mounted microphone, an M-Audio 410 Firewire audio interface/preamplifier and a Macintosh PowerBook laptop computer. The speakers were informed that they were taking part in a study of how people from different regions speak and that they should try to speak in a natural and relaxed fashion, as if they were at home with friends. These instructions were aimed at reducing the chance that standard orthography might induce Andalusian speakers to pronounce a full sibilant where the experimental design required weakened /s/. The author, a native speaker of Western Andalusian, checked that /s/ was weakened as required. The few cases where /s/ was pronounced as a sibilant were discarded and replaced with items displaying aspiration recorded at the end of the session.
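The token count given for the design above can be checked with a short enumeration. The following is only an illustrative sketch in Python: the factor levels are those listed in the Method section, while the variable and function names are our own assumptions and do not correspond to any script used in the study.

```python
from itertools import product

# Factor levels as listed in the Method section.
STOP_TYPES = ["p", "t", "k"]
CODA_TYPES = ["", "l", "s"]   # "" stands for the empty coda (ø)
VOWEL_TYPES = ["a", "i"]

WORDS_PER_CELL = 3            # three different words per factor combination
REPETITIONS = 3               # readings of each word per speaker
SPEAKERS = 6                  # two dialect groups of three speakers each


def design_cells():
    """Return every (stop, coda, vowel) combination in the design."""
    return list(product(STOP_TYPES, CODA_TYPES, VOWEL_TYPES))


def count_test_words():
    """Number of distinct test words: cells times words per cell."""
    return len(design_cells()) * WORDS_PER_CELL


def count_tokens():
    """Number of recorded tokens: words times repetitions times speakers."""
    return count_test_words() * REPETITIONS * SPEAKERS


if __name__ == "__main__":
    print(len(design_cells()), count_test_words(), count_tokens())
```

Run as a script, this prints 18 54 972, which agrees with the fifty-four test words and 972 tokens reported in the text.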
Once the data were collected, each token was extracted and the following measurements were manually taken using Praat software (Boersma & Weenink 2005):
1. VOT: Voice Onset Time for the target voiceless stops.
2. CLOSURE: Duration of the stop closure for the target voiceless stops.
3. C: Duration of the consonants preceding the target voiceless stops: [h, s, l]. For tokens lacking this consonant (e.g. [kapa] “layer”), the value for this measurement was 0 ms.
4. V: Duration of the vowels /a/ and /i/ preceding the target voiceless stops.
Following standard instrumental procedures for measuring VOT, waveforms were used rather than spectrograms, the former allowing for better resolution. After inspecting a good number of tokens, we decided to mark the end of VOT at the downward zero-crossing before the first complete cycle visible in the signal. Even though this method cannot be considered entirely faithful to the events in the speaker’s glottis, the signal being the result of overlapping supraglottal and glottal gestures, it appeared to be a consistent way of measuring VOT in the absence of articulatory data. Another issue concerning our VOT measurements is that they include the stop release and a period of supraglottal frication besides postaspiration, even though our hypothesis focuses exclusively on postaspiration. Because these three events overlap in the signal (Stevens 1998:456), the whole sequence of events at the release of the stop was preferred over strict postaspiration as a more reliable measurement. All other things being equal, we assumed that postaspirated stops should display a longer VOT than unaspirated stops. It could be argued, however, that this difference might in fact be a difference in supraglottal frication, and not in postaspiration. We note that, during segmentation, a considerable stretch of postaspiration was clearly identified in most tokens where it was expected to occur, even though we decided not to delimit it from supraglottal frication for the sake of consistency. The token in Figure 1 illustrates the presence of postaspiration within the VOT bounds. 
Another potential segmentation issue concerned the beginning and ending points of aspirated /s/ in the tokens pronounced by the Andalusian group, as was observed above in our review of Hammond (1978), Figueroa (2000) and Gerfen (2002). Here we decided to mark the beginning at the point where F2 showed a clear decrease in energy, regardless of whether the following aspiration period displayed some sort of breathy voicing or plain aspiration. As for the ending point, a problem was raised by spurious energy spots that occasionally appeared during—and well into—the stop closure. Since comparable spots appeared in the medial closure of words without aspirated /s/ such as /pata/ “leg”, they were considered as perturbations not attributable to glottal frication. For this reason, the ending point of this interval was marked with reference to high frequencies (4000-5000 Hz), where these spots were not perceived. The complete segmentation procedure is illustrated in Figure 1.
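The VOT end-point criterion described above (the downward zero-crossing preceding the first complete voicing cycle) was applied manually in Praat. Purely as an illustration of the criterion, it could be sketched in Python as follows. The function names, the toy signal and the assumption that the first full cycle has already been located by hand are ours, not part of the original procedure.

```python
def downward_zero_crossings(samples):
    """Indices i where the signal falls from >= 0 at i - 1 to < 0 at i."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] >= 0 and samples[i] < 0]


def vot_end_index(samples, first_cycle_index):
    """Last downward zero-crossing at or before the first full voicing cycle.

    first_cycle_index is assumed to have been labelled by hand; the function
    returns None when no downward crossing precedes it.
    """
    candidates = [i for i in downward_zero_crossings(samples)
                  if i <= first_cycle_index]
    return candidates[-1] if candidates else None


def vot_ms(release_index, end_index, sample_rate):
    """Duration in ms between the stop release and the VOT end point."""
    return 1000.0 * (end_index - release_index) / sample_rate


if __name__ == "__main__":
    # Hypothetical toy waveform, not real speech data.
    toy = [0, 0, 1, -1, 0, 0, 2, 3, 1, -2, -3, -1, 2, 3, 1, -2]
    print(downward_zero_crossings(toy), vot_end_index(toy, first_cycle_index=10))
```

In practice the landmark would still need visual confirmation, since aspiration noise produces many zero-crossings that are unrelated to voicing.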
2.2 Results
2.2.1 Voice Onset Time. In order to test our hypothesis, namely that Andalusian stops preceded by aspirated /s/ are consistently postaspirated, Voice Onset Time was used as the main dependent variable. Figure 2 illustrates the distribution of VOT values for each stop in the two dialectal groups depending on the preceding consonant. As predicted by our hypothesis, a longer VOT was present only in stops preceded by Andalusian aspirated /s/ (marked as underlying or etymological /s/ in all figures regardless of its different manifestations in the two dialectal groups). Table 1 summarizes the results of a series of repeated-measures ANOVAs carried out for each dialect and each stop type, VOT being the dependent variable, CODA TYPE a within-subjects factor and SPEAKER a between-subjects factor.
Figure 1: Segmentation procedure as applied to the test word pasta [pahta] uttered by an Andalusian speaker.
For the Andalusian Spanish group, VOT differed significantly depending on CODA TYPE for each of the stops, with all p values under 0.001. A series of Tukey HSD post-hoc tests confirmed that the only statistically different subset was CODA TYPE /s/, which in conjunction with the distributions in Figure 2 leads us to maintain that VOT under this condition was significantly longer for the Andalusian speakers. In the Standard Spanish data, the statistical tests broadly followed our predictions. Except for stop consonant /t/, no significant differences were found. Regarding /t/, a Tukey HSD post-hoc test revealed a difference between the CODA TYPE subsets /s/ and /l/, and no difference for the other possible pairs. Apart from the fact that this difference only appears between two CODA TYPE subsets and only for stop /t/, a look at the distributions suggests that it is not as clear as the corresponding case in the Andalusian group. Indeed, we doubt that such a difference might become categorical, while for the Andalusian data we believe
that the distributions do indeed point towards such a conclusion. In summary, given that the differences in VOT predicted by our hypothesis were highly significant for each stop consonant in the Andalusian Spanish group, and that no comparable pattern emerged from the Standard Spanish data, we will consider that our hypothesis was confirmed in this experiment.
2.2.2 Other correlates. Stop closure duration and vowel lengthening, which had been previously reported as acoustic cues of /s/ aspiration (Gerfen 2002), were also examined in the experiment. As in the results reported by Gerfen, vowel lengthening was only present if the period of aspiration or breathy voicing resulting from /s/ aspiration was considered to be part of the vowel. Otherwise, a one-tailed t-test revealed that the duration of vowels followed by aspirated /s/ was actually shorter than that of vowels in open syllables [t(322) = 1.72, p<0.005], with a mean difference of 4.4 ms. For this reason, we will refer to rhyme instead of vowel duration. Figure 3 illustrates the effects of Coda Type on stop closure and rhyme duration. Regarding stop closure duration, a series of repeated-measures ANOVAs was run on the Andalusian data for each stop consonant, with Closure as the dependent variable, Coda Type as a within-subjects factor and Speaker as a between-subjects factor. Significant differences were found for each stop [F(2,157) = 30.39, p<0.001 for /p/; F(2,157) = 28.11, p<0.001 for /t/; F(2,157) = 25.92, p<0.001 for /k/]. Tukey HSD tests revealed that these duration differences were significant for each pair of subsets (/s/ > /l/ > ø) within all three stops. Rhyme Duration also turned out to be statistically different depending on Coda Type, as indicated by a series of repeated-measures ANOVAs carried out separately for each vowel with Speaker as a between-subjects factor [F(2,238) = 233.29, p<0.001 for /a/; F(2,238) = 284.32, p<0.001 for /i/]. Here too Tukey HSD post-hoc tests revealed differences in duration between each logical pair (/l/ > /s/ > ø).
Figure 2: Voice Onset Time (ms) for all stops (/p/, /t/, /k/) grouped by Coda Type (ø, /l/ and /s/).
         /p/                           /t/                           /k/
AndSp    F(2,157) = 165.85, p<0.001    F(2,157) = 168.37, p<0.001    F(2,157) = 123.62, p<0.001
StSp     F(2,157) = 0.53, p = 0.587    F(2,157) = 7.73, p<0.001      F(2,157) = 0.06, p = 0.935
Table 1: Results of a series of repeated-measures ANOVAs carried out for each dialect and each stop type, with VOT as the dependent variable, CODA TYPE as a within-subjects factor and SPEAKER as a between-subjects factor.
Figure 3: Andalusian data: Stop closure duration for each Stop Type (left) and Rhyme duration for each Vowel Type (right), both grouped by Coda Type (ø, /l/ and /s/).
2.2.3 Trade-offs. Given that three acoustic correlates related to /s/ aspiration were identified, the possibility of finding trade-offs in their distribution needed to be considered. VOT was therefore plotted as a function of [h] (or aspirated /s/) duration and of stop closure duration. Figures 4 and 5 show no strong correlation in any of the cases examined. In Figure 5 a negative trend seems to emerge for stop consonants /t/ and /k/, though neither correlation coefficient is close to being meaningful. Given that stop closure duration is known to be partly correlated with VOT (Cho & Ladefoged 1999), we prefer not to treat these trends as characteristic of /s/ aspiration, and will rather ascribe them to universal articulatory constraints. Regarding the absence of trade-offs, it should be noted that speech rate effects might have obscured real compensations, since it seems plausible that faster utterances might display both shorter [h] and shorter VOT, while slower utterances might display both longer [h] and longer VOT. In this respect we must also note, however, that the data analyzed were all collected under roughly the same conditions, as laboratory speech, and that we did not notice sudden changes in speech rate during the recording
sessions. We will therefore conclude that no consistent trade-offs were identified between VOT and the other two acoustic correlates.
Figure 4: VOT as a function of [h] duration.
Figure 5: VOT as a function of stop closure duration for Andalusian utterances displaying aspirated /s/ (e.g. [kahpa, pahta, tahka] “dandruff, pasta, bar”).
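The search for trade-offs summarized in Figures 4 and 5 amounts to asking whether VOT covaries with [h] duration or with stop closure duration. The text does not state which correlation coefficient was computed, so the sketch below assumes Pearson's r. The function name is ours, and any durations passed to it would have to come from the measured tokens, which are not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

A true trade-off between two cues would show up as a clearly negative coefficient (longer [h] going with shorter VOT), whereas a shared speech-rate effect would push the coefficient in the positive direction.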
3. Spontaneous speech data
In order to test whether the results of our experiment could be extended to speech produced in more natural conditions, we recorded and analyzed eight interviews from Canal Sur, an Andalusian public television channel. Because the origins of the participants were well distributed over the region, these recordings also made it possible to test whether the findings of our experiment, which only involved Western Andalusian speakers from urban areas, could be extended to speakers from Eastern Andalusia and rural areas. Given that each interview only lasted between twenty and thirty minutes, several sound sequences relevant for our study had to be discarded. Ultimately, it was decided that only the pair /t/ vs /st/ would be considered, since it was the only one that offered sufficient tokens in each of the interviews (in no case fewer than twenty tokens per group). However, we believe that the results of our experiment, in which all stop consonant pairs patterned in the same way in a very significant fashion, constitute sufficient reason for extending the results obtained with stop consonant /t/ to /k/ and /p/. In the author’s judgment, the interviews were carried out in a very natural atmosphere. Most importantly for our study, underlying or etymological /s/ was never realized as a sibilant, but was either aspirated or absent from the signal. The participants, aged forty to seventy, were ordinary private citizens, and attended the show with the goal of finding a partner. The recordings were obtained during real-time broadcast of the show using an M-Audio 410 Firewire interface and a Macintosh PowerBook laptop computer connected to a digital TV receiver. Our task then was to detect every occurrence of /t/ and /st/ and extract it for analysis. In the case of the /st/ sequence, we decided to include tokens where a word boundary was present within the sequence (e.g. es tierno “it’s tender”). Even though we took special care to extract every occurrence of either of these sequences, even at very fast speech rates, we were obliged to discard certain tokens in a small number of cases (e.g. /t/ was voiced or even spirantized in some tokens). Figure 6 illustrates VOT values for the sequences /Vt/ and /Vst/, divided into two dialectal areas within Andalusia. Following traditional criteria, we considered speakers from the provinces of Huelva and Seville as being Western Andalusian speakers, while speakers native to the provinces of Granada and Jaén were classified as Eastern Andalusian. No speaker from the two central provinces in the region, Córdoba and Málaga, was included in the data. Table 2 indicates the region (W = Western Andalusia, E = Eastern Andalusia) as well as the exact origin of each speaker and the statistics of a series of one-tailed t-tests where VOT in /Vst/ was predicted to be higher than in /Vt/ sequences. The boxplots in Figure 6 show a pattern similar to the one obtained under experimental conditions, this being somewhat more marked in the case of Western Andalusian speakers than for Eastern Andalusian speakers. As shown in Table 2, the t-tests confirmed that this impression was in accordance with
statistical significance for each speaker. Given that the spontaneous speech data were subject to external factors (e.g. prosody, word position, segmental context, speech rate) that certainly affected VOT values in disparate ways, and that we nevertheless found a significantly higher VOT in /Vs.t/ than in /V.t/ sequences, we will consider that the spontaneous speech results replicate our experimental findings.
Figure 6: VOT (ms) of /Vs.t/ and /V.t/ sequences for W and E Andalusian speakers.
SPEAKER   TOWN (PROVINCE)             /t/ MEAN in ms (SD)   /st/ MEAN in ms (SD)   t-STATISTICS
W1        Écija (Sevilla)             15.81 (4.51)          37.25 (8.78)           t(53) = 11.71, p<0.001
W2        Lepe (Huelva)               20.33 (7.96)          28.98 (11.50)          t(43) = 2.98, p<0.005
W3        Sevilla (Sevilla)           14.60 (3.97)          41.81 (11.28)          t(90) = 15.55, p<0.001
W4        Camas (Sevilla)             18.09 (5.04)          43.27 (9.42)           t(98) = 17.23, p<0.001
E1        Chiclana de Segura (Jaén)   19.61 (5.46)          26.73 (6.67)           t(66) = 4.83, p<0.001
E2        Javalquinto (Jaén)          15.32 (6.02)          28.55 (10.12)          t(45) = 5.41, p<0.001
E3        La Rábita (Granada)         13.82 (5.27)          23.19 (6.82)           t(75) = 6.73, p<0.001
E4        Almuñécar (Granada)         22.90 (8.14)          39.90 (11.90)          t(47) = 5.91, p<0.001
Table 2: Origins of each speaker, VOT means and SD, and t-statistics of individual one-tailed tests where VOT in /Vst/ sequences was predicted to be greater than in /Vt/ sequences.
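For readers who wish to see how a t-statistic of the kind reported in Table 2 is obtained from group means and standard deviations, the following Python sketch computes a pooled-variance (Student) two-sample t. This is an assumption on our part: the text does not state which variant of the t-test was used, and the number of tokens per sequence type is not reported, so the group sizes in the usage example are hypothetical and the result is not expected to match the table exactly.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's two-sample t for the directional hypothesis mean_b > mean_a.

    Returns (t, df), with df = n_a + n_b - 2. A one-tailed p-value would
    additionally require the t-distribution, which is not computed here.
    """
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    if pooled_var == 0:
        raise ValueError("t is undefined when both groups have zero variance")
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_b - mean_a) / standard_error, df


if __name__ == "__main__":
    # Means and SDs for speaker W1 from Table 2; the split into 27 /t/ and
    # 28 /st/ tokens is a hypothetical choice that merely yields df = 53.
    t_value, df = pooled_t(15.81, 4.51, 27, 37.25, 8.78, 28)
    print(round(t_value, 2), df)
```

With a different split of the 55 tokens, or with a test that does not pool the variances, the statistic would change, which is why the sketch should be read as an illustration of the computation rather than a reanalysis.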
4. Discussion
The data presented above lead us to maintain that some Andalusian voiceless stops have undergone postaspiration under the influence of a preceding [h] sound. In this discussion we will try to offer a hypothesis of how this pattern may have arisen. A simplistic way of interpreting these data is to assume that the stops analyzed here are postaspirated whenever they are preceded by underlying or etymological /s/. A look at the data distributions, however, reveals that both VOT and [h] durations present a degree of variability that might be considered problematic if we were to ascribe categorical status to each of these acoustic features. As shown in Figure 4, a considerable number of tokens in our experiment (seventeen, to be precise) did not display any preaspiration at all. As for VOT values, even though significant differences were found between voiceless stops preceded by underlying /s/ and voiceless stops preceded by open syllables, we also found some overlap between the distributions of these groups. Figure 7 presents two words illustrating this point. Notice that the word lista “list” does not display any preaspiration but presents a long VOT. The word pasta “pasta”, on the other hand, contains both pre- and postaspiration, even though VOT for this token is shorter than for the former one (note that the time/space scaling of both illustrations is the same). We believe that a gestural analysis (Browman & Goldstein 1989) offers a plausible account of the patterns observed. In the Articulatory Phonology framework, phonological primitives are defined as vocal tract gestures. In the case of this study, we assume that the gesture involved is a single opening of the glottis with limits anchored within a window around the stop closure. Figure 8 shows a gestural score for the word pasta as hypothetically encoded by an Andalusian speaker according to the Articulatory Phonology model. 
In order to account for the variability found in [h] durations and less so in VOT values, we think it reasonable to assume that the timing of the glottal opening with respect to the supraglottal closure may not be very accurately specified. Because the variability in [h] durations was higher than in VOT values—no single VOT value for any voiceless stop preceded by /s/ was close to the lowest values in the other stop group—while numerous tokens did not present preaspiration at all, we will consider that the timing of the ending point in the glottal gesture with respect to the end of the stop closure is more precise in its specification. If we assume that the gestural account is correct—and even if we do not attach a strong phonological status to it—we may wonder how the timing of these gestures came about diachronically. A similar sound change is documented in the literature for at least one language: Pali is known in the historical linguistics literature for displaying voiceless aspirated stops in contexts where Sanskrit exhibited unaspirated stops preceded by /s/ (e.g. /athi/ ~ /asti/) (Vaux 1998). A different outcome, however, is offered by Old French, in which preconsonantal /s/ underwent weakening and eventually deletion without leaving any trace in the following consonant (e.g. feste > fête [fεt]
“party”). With regard to Spanish, we find contemporary dialectal variation illustrating similar patterns. Indeed, the variants that a word displaying /s/ before a voiceless stop can exhibit in different Spanish dialects parallel the Pali and French examples given above. While Castilian has the most conservative form with a full sibilant (e.g. [pasta]), the other dialects present different forms of /s/ lenition. Our current work on Porteño Spanish seems to indicate that in this dialect no gestural reorganization has taken place, with a period of aspiration corresponding roughly to the temporal slot formerly occupied by /s/. Puerto Rican Spanish, on the other hand, appears to be at a more advanced evolutionary stage where the laryngeal gesture has been further reduced, resulting in a potential merger of /VsC/ and /VC/ sequences (or reinterpretation of the contrast as one of vowel length, as claimed in Hammond (1978) and Figueroa (2000) and suggested by reported cases of hypercorrection (Morgan 1998)).
Figure 7: Utterances lista “list” (left) and pasta “pasta” (right) as pronounced by Andalusian speaker AN.
Finally, it has been claimed in this study that Andalusian presents a laryngeal gesture loosely anchored around the stop closure, resulting in pre- and postaspirated voiceless stops. More particularly, our spontaneous speech data seem to indicate that speakers from Western Andalusia exhibit a more salient contrast in VOT values than speakers from eastern parts of the region, even though we found statistical differences for every Andalusian speaker regardless of her or his exact origin. In our view, this phenomenon is particularly interesting because it illustrates a sequence difficult to capture in terms of segments, and calls for a more refined phonological model in order to account for its occurrence as well as for the phonetic variability that it presents. A question that arises is whether this pattern, displaying unstable preaspiration
and more consistent postaspiration, will eventually lead towards a new category of aspirated voiceless stops in Andalusian Spanish as has occurred in Pali. For the moment, we will only note that the older Andalusian speakers studied in the spontaneous speech data seemed to produce VOT contrasts in the same manner as the younger speakers in our experiment. A more detailed study focusing on factors such as age and the geographic origin of speakers could in principle help elucidate this question.
Figure 8: Gestural score for the word “pasta” in Andalusian Spanish.
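The gestural account can be made concrete by treating the glottal opening and the oral closure as two time intervals and reading pre- and postaspiration off their misalignment. The Python sketch below is a deliberately idealized illustration under our own assumptions: the interval values are hypothetical, the function name is ours, and measured VOT additionally contains the release burst and supraglottal frication, which are ignored here.

```python
def aspiration_intervals(glottal, closure):
    """Return (preaspiration_ms, postaspiration_ms) for one token.

    glottal and closure are (onset_ms, offset_ms) tuples for the glottal
    opening gesture and the oral stop closure respectively.
    """
    g_on, g_off = glottal
    c_on, c_off = closure
    if g_on > g_off or c_on > c_off:
        raise ValueError("each interval must have onset <= offset")
    # Glottis open before the closure is formed: audible as [h].
    pre = max(0.0, min(g_off, c_on) - g_on)
    # Glottis still open after the closure is released: postaspiration.
    post = max(0.0, g_off - max(g_on, c_off))
    return pre, post


if __name__ == "__main__":
    # Hypothetical timings (ms), loosely modelled on the two tokens in Figure 7.
    lista_like = aspiration_intervals(glottal=(100, 230), closure=(100, 190))
    pasta_like = aspiration_intervals(glottal=(60, 215), closure=(100, 190))
    print(lista_like, pasta_like)
```

Shifting the glottal interval later relative to the closure removes preaspiration while lengthening postaspiration, which is the kind of loose onset anchoring with a more tightly specified offset that the discussion proposes.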
4.1 Conclusion
This study provides instrumental evidence of the occurrence of postaspiration in Andalusian voiceless stops. Both the experiment and the analysis of spontaneous speech clearly suggest that the conditioning factor for postaspiration in Andalusian voiceless stops is the presence of a preceding underlying or etymological /s/, usually realized as a period of aspiration or breathy voice, but also absent in a considerable number of cases. Given the considerable variability found in the durations of aspirated /s/, with values ranging from 0 to 80 ms, and given that VOT values were also quite variable (although statistically different from those of unaspirated stops), we have favored a gestural analysis following the Articulatory Phonology model (Browman & Goldstein 1989). Moreover, two other correlates already reported in the literature, stop closure lengthening and rhyme lengthening, have also been identified in our data. Finally, we have offered diachronic and synchronic examples from other languages and dialects illustrating different paths of change for the sound sequence addressed in this study. We believe that these different evolutions, along with the phonetic detail offered here for Andalusian, should encourage further instrumental work addressing language variation and change.
References
Alonso, Dámaso, María Josefa Canellada & Alonso Zamora Vicente. 1950. “Vocales andaluzas”. Nueva Revista de Filología Hispánica 4. 209-230.
Alvar, Manuel. 1955. “Las encuestas del Atlas Lingüístico y Etnográfico de Andalucía”. Revista de Dialectología y Tradiciones Populares 11. 231-274.
Boersma, Paul & David Weenink. 2005. Praat: Doing Phonetics by Computer (Version 4.3.27). [Computer program]. Retrieved from http://www.praat.org/.
Browman, Catherine & Louis Goldstein. 1989. “Articulatory gestures as phonological units”. Phonology 6. 201-251.
Cho, Taehong & Peter Ladefoged. 1999. “Variation and universals in VOT: Evidence from 18 languages”. Journal of Phonetics 27. 207-229.
Figueroa, Neysa. 2000. “An acoustic and perceptual study of vowels preceding deleted postnuclear /s/ in Puerto Rican Spanish”. Hispanic Linguistics at the Turn of the Millennium: Papers from the 3rd Hispanic Linguistics Symposium ed. by Alfonso Morales-Front, Héctor Campos, Elena Herburger & Thomas J. Walsh. 66-79. Somerville, Mass.: Cascadilla Press.
Gerfen, Chip. 2002. “Andalusian codas”. Probus 14. 247-277.
Hammond, Robert. 1978. “An experimental verification of the phonemic status of open and closed vowels in Caribbean Spanish”. Corrientes actuales en la dialectología del Caribe hispánico ed. by Humberto López Morales. 33-125. Río Piedras: Editorial Universitaria, Universidad de Puerto Rico.
Hualde, José Ignacio & Benjamin Sanders. 1995. “A new hypothesis on the origin of the Eastern Andalusian vowel system”. Berkeley Linguistics Society 21: Parasession in Historical Issues in Sociolinguistics / Social Issues in Historical Linguistics. 426-437. Berkeley, Calif.: Berkeley Linguistics Society.
Martínez Melgar, Antonia. 1986. “Estudio experimental sobre un muestreo de vocalismo andaluz”. Estudios de Fonética Experimental vol. II. 197-248.
Maza, María José. 1999. “Phonetic data and functional explanation in phonology: The case of Granada Spanish”. Newcastle and Durham Working Papers in Linguistics ed. by Kyoko Oga & Geoffrey Poole, vol. V, 161-170. Durham, UK: University of Durham.
Morgan, Terrell. 1998. “The linguistic parameters of /s/ insertion in Dominican Spanish: A case study in qualitative hypercorrection”. Perspectives on Spanish Linguistics ed. by Javier Gutiérrez-Rexach & José del Valle, vol. III, 79-96. Columbus, Ohio: The Ohio State University Press.
Navarro Tomás, Tomás. 1939. “Desdoblamiento de fonemas vocálicos”. Revista de Filología Hispánica 1. 165-167.
Rodríguez Cadena, Yolanda. 2003. “La aspiración y la tonía intrínseca de las vocales”. La Tonía: Dimensiones fonéticas y fonológicas ed. by Esther Herrera & Pedro M. Butragueño. 37-74. México D.F.: El Colegio de México.
Romero, Joaquín. 1995. Gestural organization in Spanish. An experimental study of spirantization and aspiration. PhD diss., University of Connecticut.
Salvador, Gregorio. 1977. “Unidades fonológicas en el andaluz oriental”. Revista Española de Lingüística 7. 1-23.
Stevens, Kenneth N. 1998. Acoustic Phonetics. Cambridge, Mass. & London: The MIT Press.
Vaux, B. 1998. “The laryngeal specifications of fricatives”. Linguistic Inquiry 29. 497-511.
PART 2 PROSODIC STRUCTURE
VARIATION IN THE INTONATION OF EXTRA-SENTENTIAL ELEMENTS ∗
LLUÏSA ASTRUC-AGUILERA & FRANCIS NOLAN University of Cambridge
Abstract Extra-sentential elements usually form independent intonational domains. Due to this property, they have been used in the phonological literature to define the intonational phrase. This study, using English and Catalan empirical data collected in three experiments, shows significant variation in the phrasing and accentuation of these constructions. We argue that their role is primarily semantic: they are supplementary, semantically non-restrictive, and anaphorically linked to their referent. Prosody signals their grammatical function by means of independent phrasing, by reductions in pitch span leading to total deaccentuation, and/or by tonal reduplication. We argue that both tonal and junctural cues are used in combination to mark extra-sentential elements as external to the phrase.
1. Introduction
The group of so-called extra-sentential elements (ESEs) includes phrases, such as dislocated phrases (They are crazy, those Romans), and words, such as vocatives (Thanks, sir), and sentential adverbs (obviously). Extra-sentential elements have been used in the phonological literature to define the intonational phrase, the level of the prosodic hierarchy that is most relevant to intonation. Nespor and Vogel claim that there are certain types of constructions such as “parenthetical expressions, non-restrictive relative clauses, tag questions, vocatives, expletives, and certain moved elements” that usually form independent intonation domains (Nespor & Vogel 1986:188). The traditional assumption is that ESEs are syntactically independent, just as they are also prosodically independent (e.g. Pierrehumbert 1980; Nespor & Vogel 1986; Nespor 1993), and thus they have been used as evidence of the effect of syntactic constraints upon phrasing (Cooper & Paccia-Cooper 1980; Nespor & Vogel 1986; among others). Recent work has refined this account by proposing that the boundaries of extra-sentential elements are compulsorily aligned with those of the intonational phrase and by developing specific mechanisms to generate this alignment (Selkirk 1984; Truckenbrodt 1995, 1999).
∗ Many thanks to Eva Estebas-Vilaplana and to an anonymous reviewer for their very useful comments on an earlier version of this article. All the errors that remain are ours. We wish to thank our informants for kindly donating their time. The research reported here was partly supported by a scholarship Batista i Roca (Generalitat de Catalunya).
One possible problem is that extra-sentential elements show syntactic, phonetic, and perhaps also phonological variation that has not yet been studied in depth. Syntactically, ESEs do not fall into a single class. Appositions are syntactically governed because they are attached to the NP they modify, while parentheses and vocatives are commonly analysed as being external to the syntactic structure. The status of non-restrictive relatives is subject to debate: for Emonds (1979), they are attached to S, the root sentence; for Safir (1986), they are attached at the level of Logical Form; for Fabb (1990), they belong to the level of discourse structure; and for Kempson, Meyer-Viol & Gabbay (2000), they are syntactic constituents. The status of other categories such as epithets (He wouldn’t lend me his car, the bastard) has not been addressed in the literature. Consequently, the first difficulty in the study of ESEs arises when deciding which elements belong to this class. Drawing up a preliminary list of ESEs is a necessary first step in the investigation, even if this might not be a comprehensive list (which has never been attempted in previous studies, as far as we know). The first research question would be: which elements belong to this class and what should be the criteria for membership?
Another problem is that ESEs show puzzling differences between elements which are initial in the sentence and elements which are not. Initial elements receive a normal intonation, while non-initial elements receive an intonation which is tonally subordinated to that of the main phrase.
This tonal subordination is manifested either by reductions in prominence, leading even to total deaccentuation (Bonet 1984:31-32, 90; Ladd 1980:163-165; Liberman 1975:182-184; Gussenhoven 1985, 1993, 2004:290-295; Beckman & Pierrehumbert 1986:293-298; Nespor 1993:265; Prieto 2002a:409ff) and/or by tonal reduplication (Gussenhoven 1985:107, 2004:315-316; Bonet 1984:30, 34; Recasens 1993:214; Prieto 2002a:428-430). It is not clear, however, why the same element in the same phrase should be pronounced in a different way depending on whether it appears in first position or not, as it is not clear whether the different mechanisms used for signalling tonal subordination also have different functions. Tonal subordination of this kind, often right across a prosodic boundary, constitutes a challenge for intonational theory (Ladd 1996:246). A further problem, as has been observed in the literature and also in the results of a previous study (Astruc 2003), is the evidence of prosodic variation among ESEs. However, divergences in the literature can also be attributed to underlying theoretical discrepancies. Therefore, the main goal of this study is to carry out a detailed quantitative study of the phrasing and accentuation of the main categories of ESEs, thus answering the research questions: are ESEs accented? Do they always form independent intonational phrases? Is there real variation in the prosodic form of ESEs, as suggested by descriptions in the literature?
Variation in the prosodic form of these categories, within or across the languages in the study, would point to the inadequacy of the usually held view that the prosodic form of ESEs is determined by their syntactic form. The apparent lack of prosodic homogeneity casts doubt on the commonly assumed view that ESEs should compulsorily form independent phrases, since this prosodic property is taken to follow from their assumed syntactic independence. The reported asymmetry between initial and non-initial elements casts further doubts on this.
The structure of this article is as follows: in Section 1.1 we examine the core categories of ESEs, discuss the criteria for membership, and review previous studies. In Section 2 we present the overall objectives of the study, and we describe the main phonological characteristics of ESEs in English and in Catalan. In Section 3 we describe two experiments involving right-dislocated phrases in Catalan. In Section 4 we summarize the results and draw some conclusions.
1.1 Members of the class of ESEs
The decision adopted in this study is to take as the core members of the class of ESEs those categories identified on the basis of grammatical (Huddleston & Pullum 2002) and prosodic (see Nespor & Vogel 1986:188) criteria. The intersection of the two sets of categories yields the categories in (1):¹
(1) Dislocated phrases, sentential adverbs, non-restrictive relative clauses, appositions, parentheses, epithets, quotations, vocatives.
The starting assumption is that ESEs do not form a homogenous class from a grammatical point of view. Dislocated phrases and sentential adverbs are clearly syntactic constituents. Non-restrictive relatives, parentheses, sentential adverbs, appositions, and epithets are semantically governed, since they are semantically linked to the element they modify, while direct speech markers and vocatives are pragmatically governed. The lack of grammatical homogeneity of ESEs compromises the commonly assumed view that they form a single category. Two alternative explanations can be considered: one, that there is no motivation to treat ESEs as a single category; and two, that the core property that defines the set resides somewhere else, namely, in their structural role. In this paper we will argue for the second explanation and we will cite as evidence Catalan and English data.
¹ Question tags and interjections appear in both lists, but they were not included in the study, because the former are only found in English and the latter are susceptible to much paralinguistic variation.
1.2 ESEs as supplements
In Huddleston and Pullum (2002) ESEs have the role of ‘supplements’, since they add information that supplements but does not restrict or modify the propositional content of the main clause. They are external to the syntactic structure, as illustrated by this example:
(2) Pat, the life and the soul of the party, had invited all the neighbours.
If the life and the soul of the party is removed, neither the grammaticality nor the truth conditions of the sentence are affected. Therefore, the parenthetical clause is optional: it has the function of a supplement which is semantically related to the clause with which it co-occurs, the ‘anchor’. Owing to their lack of integration into the syntactic structure, supplements are semantically non-restrictive. As seen in (2), the supplement the life and soul of the party does not serve to distinguish one referent from another: it does not restrict the denotation of Pat, the head nominal. With the possible exception of sentential adverbs,² the syntactic peripherality of ESEs neatly corresponds to their function as semantic supplements, in that ESEs show a clear correspondence between grammatical form and grammatical function. It remains to be seen whether there is also a correspondence between grammatical form and prosodic form. In the next section, we examine what has been said about the phrasing and intonation of ESEs in previous studies.
1.3 Previous intonational studies
ESEs have attracted ample attention in English (Liberman 1975; Pierrehumbert 1980; Beckman & Pierrehumbert 1986; Pierrehumbert & Hirschberg 1990; Ladd 1980, 1996; Bing 1985; Gussenhoven 1985), in Catalan (Bonet 1984; Prieto 2002a, 2002b; Emonds 1979; Recasens 1993; Payà 2002, 2003), and also in other languages such as French (Wunderli 1987; Martin 1987; Delattre 1972; Fagyal 2002a, 2002b), Italian (Nespor 1993; Nespor & Vogel 1986), and Spanish (Zubizarreta 1998). There is widespread agreement as regards the phrasing of ESEs into independent units. Most studies describe ESEs as separated by audible prosodic breaks, be these tonal boundaries or pauses. There are different views about which prosodic unit ESEs belong to. Proposals range from analysing ESEs as ‘enclitic’ phrases, which are both separated from, and tonally attached to, the main phrase (Trim 1959; Liberman 1975; Pierrehumbert 1980; Gussenhoven 2004), to proposals treating them as intermediate phrases (Beckman & Pierrehumbert 1986), and to proposals describing them as full-fledged intonational phrases (Ladd 1996; Zubizarreta 1998). However, there is much less agreement in the literature regarding their accentuation. Certain categories such as epithets and direct speech markers (Ladd 1980:164-165)³ tend to appear as deaccented. Categories generally described as accented are appositions and non-restrictive relatives (with the main exceptions of Pierrehumbert & Hirschberg 1990; Pierrehumbert 1980; Cruttenden 1997). Vocatives are described as deaccented in English (Liberman 1975; Ladd 1980, 1996; Beckman & Pierrehumbert 1986; Bing 1985; Pierrehumbert & Hirschberg 1990) but as accented in Catalan (Bonet 1984:28-29, 60-61, 88; Recasens 1993:211, 214; Prieto 2002a:428-429). Cross-linguistic variation in the frequency of accentuation of certain categories should not come as a surprise since it is well known that English uses accentuation to a greater extent than Catalan does. Intra-linguistic variation, however, can prove difficult to accommodate in a model that treats ESEs as a single class, such as the model aimed at in this study, unless it can be proved that divergences in the literature have a theoretical basis. This is not unlikely, since, by their very nature, ESEs provide abundant examples of mismatches between phrasing and intonation that are not easily accommodated within the framework assumed in this study (the Autosegmental Metrical approach, henceforth AM), which does not admit the existence of intonational units that are both independent and deaccented.
² Certain types of sentential adverb (modal adverbs such as possibly) do restrict and/or modify the truth conditions of the proposition (Bing 1985; Allerton & Cruttenden 1974). Sentential adverbs, therefore, appear to have functional characteristics that set them apart from the other ESEs (see Astruc 2005; Astruc & Nolan in press for a more detailed description).
³ But see Ladd 1996:219-221, where direct speech markers are described as underspecified regarding accentuation: they can receive either pitch accents or phrase tones.
2. Experiment 1: comparative production study
2.1 Goals and methodology
The main goal of this study is to analyse the phrasing and intonation of the different elements in order to elucidate whether the categories traditionally considered as ESEs, and which are hypothesised to form a single pragmatic category, also show a consistent prosodic behaviour. Narrowing down the main research question, we can ask: are ESEs accented? Do they always form independent intonational phrases? Is there real variation in the prosodic form of ESEs, as seems to emerge from descriptions in the literature? Is it true that there is an asymmetry between the elements of the left periphery and those of the right periphery? In order to address these research questions, the experimental design has covered three studies: a general comparative study of the phrasing and intonation of the core types of ESEs in English and Catalan, two quantitative studies to find evidence that some ESEs are completely deaccented, and a further quantitative study (which is not reported here, for reasons of space) to test the hypothesis that English and Catalan use different strategies for marking ESEs prosodically.
A corpus of 605 phrases was gathered⁴ containing the target ESEs listed in Example (1), which included the main categories of ESEs usually discussed in syntactic and phonological studies. The English part consisted of 462 items: twenty-nine sentences each read twice by eight Southern British English speakers. The Catalan part consisted of 143 items: fifty sentences, each read once by three Central Catalan speakers. Recordings were conducted in a sound-proof studio, using a DAT recorder. In both corpora, some tokens had to be discarded because of reading errors. Examples of the target ESEs are shown in (3):
(3)
a. Dislocated phrases: Ella només feia que pensar en sa mare, la bona de la Norma (“She was thinking all the time about her mother, good Norma”).
b. Sentential adverbs: No saben comportar-se, honradament (“They can’t behave, honestly”).
c. Non-restrictive relative clauses: Una de les noies, que es diu Norma, es va posar malalta (“One of the girls, whose name is Norma, got ill”).
d. Appositions: Molt millor que es quedin amb son pare, el Norman (“They’d better stay with their dad, Norman”).
e. Parentheses: I es va gastar els diners, amb gran alegria, fent un viatge a una illa tropical (“And she spent the money, with great joy, by taking a trip to a tropical island”).
f. Epithets: Acabo de veure el meu ex, el cabró (“I’ve just seen my ex, the bastard”).
g. Quotation markers: Com va anar el viatge?—l’Alma demana a la Mariana (“‘How was the trip?’ Alma asks Mariana”).
h. Vocatives: L’Anna va guanyar-la, Manu (“Anna won it, Manu”).
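The corpus figures given above can be cross-checked with simple arithmetic. The sketch below is only illustrative: it assumes that the gap between the number of items implied by the design and the number of items reported corresponds to the tokens discarded because of reading errors, which the text suggests but does not state explicitly.

```python
# Items implied by the recording design described in the text.
english_planned = 29 * 2 * 8   # 29 sentences x 2 readings x 8 speakers
catalan_planned = 50 * 1 * 3   # 50 sentences x 1 reading x 3 speakers

# Items reported as making up each part of the corpus.
english_reported = 462
catalan_reported = 143

# Assumption: the shortfall equals the tokens discarded for reading errors.
english_discarded = english_planned - english_reported
catalan_discarded = catalan_planned - catalan_reported

total_reported = english_reported + catalan_reported

print("English:", english_planned, "planned,", english_discarded, "missing")
print("Catalan:", catalan_planned, "planned,", catalan_discarded, "missing")
print("Total reported:", total_reported)
```

Under this assumption the two parts add up to the 605 phrases reported for the whole corpus.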
2.2 Results
The data was digitised at 16 kHz and subsequently analysed by the first author by listening to the recording and looking at pitch traces obtained with Praat 4.1.21. The analysis comprised identifying and labelling the type and location of pitch accents and prosodic breaks. In this study, phrasing and accentuation are considered to be independent systems, as proposed by Trim (1959) and Ladd (1980:164), among others. Phrasing is taken to be based on junctural cues without tonal movement being obligatory. The criterion for deciding that a prosodic boundary was present was the presence of junctural cues, such as pauses and/or pre-boundary lengthening. The criteria for deciding that a stretch of speech was deaccented were the presence of flat F0 and reduced amplitude. A pause was defined here as a period of silence of 100 ms or longer not caused by the presence of obstruents. Anything shorter than 100 ms was considered pre-boundary lengthening. We used a stylised representation for pitch accents, combined with a numbered coding system (for both accents and prosodic breaks) that was later used for entering the data in SPSS. The analyses were repeated two further times, at intervals, without significant inconsistencies emerging. A subset of the data was checked by the second author and the analyses mostly confirmed.
⁴ The Catalan data was collected in three stages: a pilot experiment, and two further studies. The English data was recorded later, together with a study on sentential adverbs in which ESEs were used as distractors (see Astruc 2005; Astruc & Nolan in press). The imbalance in the number of examples in the two corpora does not affect the results of the phonological analysis.
The following generalizations can be drawn about the main prosodic properties of ESEs. First, ESEs tend to be prosodically independent of the main phrase, except for appositions and vocatives. In these two cases, prosodic variation seems to correlate with a dual communicative function (see Section 2.2.3). Second, there is a prosodic difference between initial and non-initial ESEs. In initial position, ESEs are both rhythmically and intonationally independent (again, with the exception of vocatives and appositions), while in medial and final position they are tonally subordinated to the main phrase. Third, this tonal subordination is carried out by means of two principal strategies:
(4)
a. by reductions in pitch span leading to total deaccentuation.
b. by reduplicating the contour of the main phrase, which can be accompanied by the use of an overall lower pitch level and a much lower voice volume.
Dislocated phrases, quotation markers, and epithets follow strategy (4a); that is, they show deaccentuation. Parentheses, non-restrictive relatives, and appositions follow strategy (4b): they show tonal reduplication at a lower level and with a compressed pitch range. Sentential adverbs and vocatives do not fit clearly into either (4a) or (4b).
2.2.1 Dislocated phrases, quotation markers, and epithets
In English, dislocated phrases form independent phrases 88% of the time, both in initial and final position (see Figure 1). In final position they are always deaccented (100% of the time) and are usually followed by a final rise (70% of the time). In the Catalan corpus, they always form independent intonational phrases (see Catalan examples in Section 3), and they are nearly always deaccented (95%). Unlike in English, they are not followed by a final rise. Direct speech markers, in both English and Catalan, are nearly always deaccented (86%) and they always form independent phrases (100%), which are usually set off by pauses (65% of the time) (see Figure 2). In English, epithets always form independent phrases, are nearly always deaccented (86%) and are frequently followed by a rise (68%). This final pitch movement is assumed to be not an accent but a boundary tone. Figure 3 shows the intonation that typically corresponds to English epithets.
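The junctural criteria stated at the beginning of Section 2.2 (a silence of 100 ms or longer counts as a pause unless it is caused by an obstruent, and anything shorter is treated as pre-boundary lengthening) can be written out as a small decision rule. The following is a minimal sketch of that rule, not the procedure actually used for the annotation, which was auditory and visual; the class name, the field names and the label for obstruent-induced silences are hypothetical.

```python
from dataclasses import dataclass

# Threshold taken from the text: silences of 100 ms or longer count as pauses.
PAUSE_THRESHOLD_MS = 100.0


@dataclass
class SilentInterval:
    """A stretch of silence found at a candidate boundary (hypothetical record)."""
    duration_ms: float
    caused_by_obstruent: bool  # e.g. the closure phase of a stop


def classify_interval(interval: SilentInterval) -> str:
    """Apply the junctural criteria described in Section 2.2."""
    if interval.caused_by_obstruent:
        # Silence due to an obstruent is not treated as a junctural cue here
        # (our reading of the definition of a pause in the text).
        return "not a junctural cue"
    if interval.duration_ms >= PAUSE_THRESHOLD_MS:
        return "pause"
    # Anything shorter than the threshold is treated as pre-boundary lengthening.
    return "pre-boundary lengthening"


print(classify_interval(SilentInterval(120.0, False)))
print(classify_interval(SilentInterval(60.0, False)))
print(classify_interval(SilentInterval(150.0, True)))
```

Either of the two junctural labels would count as evidence for a prosodic boundary under the criteria above.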
[Figure: F0 in semitones (st) against time. Left panel: “Those Romans, they’re crazy”. Right panel: “They’re crazy, those Romans”.]
Figure 1: In the left panel, an example of a left-dislocated subject in English. In the right panel, an example of a right-dislocated subject (st = semitones).
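Since the figures plot F0 in semitones, it may help to recall how a frequency in hertz maps onto that scale. The sketch below uses the standard logarithmic conversion; the reference frequency of the plots is not stated in the text, so the 100 Hz default is an assumption, and both function names are hypothetical. The size of a pitch movement in semitones does not depend on the reference chosen.

```python
import math


def hz_to_semitones(f_hz: float, ref_hz: float = 100.0) -> float:
    """Convert a frequency in Hz to semitones relative to ref_hz.

    The 100 Hz default is an assumption; the reference used for the
    figures is not given in the text.
    """
    return 12.0 * math.log2(f_hz / ref_hz)


def excursion_semitones(f_start_hz: float, f_end_hz: float) -> float:
    """Size of a pitch movement in semitones, independent of the reference."""
    return 12.0 * math.log2(f_end_hz / f_start_hz)


# One octave above the reference corresponds to 12 semitones.
print(hz_to_semitones(200.0))
# A rise from 180 Hz to 240 Hz, expressed in semitones.
print(round(excursion_semitones(180.0, 240.0), 2))
```

Expressing pitch span in semitones makes contours produced at different overall pitch levels, such as a main phrase and a compressed ESE, directly comparable.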
[Figure: F0 in semitones (st) against time (s) for the utterance “‘The meal is ready’, my mother announced”.]
Figure 2: Example of a direct speech marker in English.
[Figure: F0 in semitones (st) against time (s) for the utterance “I quite like my neighbours, the old fogeys”.]
Figure 3: Example of epithet in English.
Unlike in English, epithets in Catalan⁵ may be accented (50% of the time) and they never carry a final rise. The epithets in the corpus are at times produced in such a way that they form part of the same intonation unit as the main phrase (50%), and at other times produced so that they form a separate intonation unit and are accented (although with a very reduced pitch range). Even in those cases where the speaker added extra emphasis, the emphatic accent had a pitch range subordinated to that of the main phrase, as can be seen in the pitch trace in Figure 4. Parallel lines have been placed by hand to facilitate the comparison of pitch range and level in the main clause and epithet. From the pitch trace it is easy to appreciate that the epithet is pronounced at a lower level and with a narrower pitch range than the main phrase, and the speech wave indicates that it also has less intensity.
⁵ Epithets, however, can carry an emotional load, in which case they are accented, irrespective of language. In the data, the few occurrences of accenting correspond to the same phrase, I’ve just caught a glimpse of my ex, the bastard and its interrogative counterpart Have you seen my ex, the bastard?
[Figure: speech waveform and F0 trace in semitones (st) against time (s) for the utterance given in the caption.]
Figure 4: Example of epithet in Catalan: Acabo de veure el meu ex, el cabró (“I’ve just seen my ex, the bastard”).
2.2.2 Parentheses, non-restrictive relatives, appositions, and vocatives
Parentheses, non-restrictive relatives, and appositions follow strategy (4b): they show tonal reduplication at a lower level and with a compressed pitch range. Parentheses in English usually form independent phrases (97%, with pauses occurring 60% of the time), and are mostly accented (83%), often with a reduplicating contour (53% of the time). In Catalan, parentheses always form independent units which are mostly demarcated by pauses (58%) (see Figure 5). This is in line with descriptions in previous work (Payà 2002, 2003; Prieto 2002a, 2002b), as is the fact that they are mostly accented, with a low register and compressed pitch range.
[Figure: F0 in semitones (st) against time (s) for the utterance given in the caption.]
Figure 5: Example of parenthesis in Catalan: T’has d’estar al llit, això és un dogma, quan tens la malària (“You have to stay in bed, that’s a rule, when you have malaria”).
One of the first things that is observed about non-restrictive relative clauses is that they show more cohesion with the phrase to their left than with that to their right. Both in English and Catalan, the non-restrictive relative clause reduplicates the tonal pattern of the left-hand clause (60% of the time in Catalan, 53% in English) (see Figure 6). They are mostly prosodically detached (88% in English, 100% in Catalan) by phrase tones or pauses, and in those cases when only one pause is present, the pause occurs between the non-restrictive relative clause and the right-hand clause. This behaviour is not surprising since non-restrictive relative clauses are closely grammatically linked to their immediately preceding clause, their ‘anchor’, which contains their referent and with which they agree in number, and, in Catalan, also gender.
[Figure: F0 in semitones (st) against time (s), with the phrasing [La Ramona] [que viu a la vora] [veu l’Anna] [davant d’una botiga].]
Figure 6: Example of a non-restrictive relative clause in Catalan: La Ramona, que viu a la vora, veu l’Anna davant d’una botiga (“Ramona, who lives nearby, sees Anna in front of a shop”).
Similar to the case of non-restrictive relative clauses, appositions, both in English (88% of the time) and in Catalan (83% of the time), tend to form independent intonational phrases that reduplicate the contour of their anchor (Figure 7).⁶ Vocatives show a different behaviour in English and Catalan. In English they are mostly deaccented, while in Catalan they are accented 50% of the time (see Figure 8). As noted by Gussenhoven (1985) and Cruttenden (1997), vocatives have either an attention-catching function (in which case they are accented and receive an intonational contour of their own), or an expressive function (in which case they are deaccented and belong to the same intonational unit as the main phrase).
⁶ Appositions that provide identifying information about the referent are interpreted as ‘appositive modifiers’ and do not receive independent contours. For instance, Norman in This is my husband Norman would be interpreted as an ‘appositive modifier’, that is, a way of identifying this person, whereas in This is my husband, Norman it would be interpreted as a ‘supplement’, that is, as additional information about the referent (Huddleston and Pullum 2002:1064, 447). This distinction would correspond to that proposed by Gussenhoven (2004:290-292) between ‘incorporating’ and ‘enclitized’ ESEs.
Sentential adverbs, and adverbs in general, are more heterogeneous prosodically than any of the other categories, just as they are also more heterogeneous semantically than any of the other parts of speech. The examples in the corpus presented variation in their accentuation, but they were very consistent as regards their phrasing: they nearly always formed independent sentences. They behave, thus, according to what is predicted in the literature about ESEs in general. Their detailed description, however, exceeds the scope of this article (see Astruc 2005; Astruc & Nolan in press).
[Figure: F0 in semitones (st) against time (s) for the utterance given in the caption.]
Figure 7: Example of an apposition in Catalan: Hi compraria, a Bora Bora, una elegant vil·la (“I would buy an elegant villa there, in Bora Bora”).
[Figure: F0 in semitones (st) against time (s) for the utterance given in the caption.]
Figure 8: Example of a Catalan vocative in final position: L’Anna va guanyar-la, Manu (“Anna won it, Manu”).
3. 3.1
Experiments 2 and 3: Right-dislocated phrases in Catalan Introduction Right-dislocated phrases are clauses such as those girls in They are nice, those girls. Their main characteristic is the presence of a co-referential element within the main phrase (in this case, the pronoun they) which is linked to the dislocated NP (those girls). The main function of the right-dislocated phrase is that of introducing background information in a position where a high informative content would normally be expected (Huddleston & Pullum 2002; Geluykens 1992, 1994; Lambrecht 1981, 1994). In languages such as Catalan, which mainly use syntactic changes to signal focus, right- and left-dislocations
96
LLUÏSA ASTRUC-AGUILERA & FRANCIS NOLAN
serve the main function of removing background information out of the main clause, so that the focal accent coincides with the last pitch accent of the main phrase (Vallduví 1990, 1994, and elsewhere). Previous work on Catalan intonation has made no strong claims about the accentual status of right-dislocated phrases. It is implied that they have an accentual pattern of their own, though compressed and subordinated to that of the main phrase (Prieto 2002a; Recasens 1993; Payà 2003). However, they have also been described as lacking prominence (Bonet 1984:31-32, 90). Therefore, empirical testing is needed in order to support the claim that rightdislocations are really deaccented. It was noted in the data from Experiment 1 that there were several instances of ‘miniature accents’ following the focal accent; little ‘bumps’ with an excursion size of about 1 semitone. Post-focal accents have been identified so far in narrow focus sentences in Catalan (Estebas-Vilaplana 2000), in Spanish (Zubizarreta 1998), and in Italian (Grice 1995; D’Imperio 2002). It is doubtful, though, whether these ‘bumps’ in the present data correspond to a reduced and subordinated pitch accent, or whether they are a mere side-effect of the higher subglottal pressure associated with a stressed syllable. The main goal of the two experiments reported here is, therefore, to provide quantitative evidence that right-dislocations are indeed deaccented in Catalan, so that this analysis can be extended to the English data and to the other structures similarly described as deaccented in final position, that is, epithets, reported speech markers, parentheses, and some sentential adverbs. Experiments 2 and 3 were designed as separate experiments, although they were recorded in a single session, and their aim was to quantify the scaling of the target syllable in the dislocated element, in order to assess whether it receives a true pitch accent or not. 3.2 Experiment 2 3.2.1 Experimental material. 
The corpus consisted of fourteen sentences, each read twice by four speakers under four experimental noise conditions, thus yielding 448 sentences in total, of which 224 contained appositions and 224 right-dislocations. The fourteen sentences thus consisted of seven minimal pairs (see Appendix), each containing an apposition and a dislocation, with both elements of each pair introduced by a short background text:
(5) a. Apposition:
—La mama va veure els nuvis, abans de la boda? (“Did Mum see the bride and the groom before the wedding?”)
—Va veure la Núria, la núvia (“She saw Núria, the bride”)
b. Right-dislocated subject:
—Qui és l’amiga que va anar a veure la núvia? (“Who’s the friend that the bride went to see?”)
—Va veure la Núria, la núvia (“She saw Núria, the bride”)
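The corpus size follows directly from the factorial design just described. As a minimal sketch (the variable names are illustrative assumptions, not part of the original study), the arithmetic can be checked as follows:

```python
# Illustrative check of the Experiment 2 corpus size (variable names are hypothetical).
sentences = 14          # seven minimal pairs, two sentences per pair
repetitions = 2         # each sentence was read twice
speakers = 4
noise_conditions = 4

total_tokens = sentences * repetitions * speakers * noise_conditions
per_structure = total_tokens // 2   # half appositions, half right-dislocations

print(total_tokens, per_structure)
```

The two values correspond to the 448 sentences and the 224 tokens per structure reported in the text.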
Two sets of comparisons were planned: (1) comparison of the stressed syllables in the apposition and dislocation, that is the initial syllable nú in Nú-ria in (5a) and (5b); (2) comparison of the most prominent syllables (the penultimate ones) in the main phrase and dislocation, that is, the comparison of Nú-ria with nú-via.⁷ As can be seen, the pragmatic context is more rigorously controlled than in the comparative study described in Section 2, though both aimed at being as near to natural sounding speech as possible. The seven minimal pairs were repeated twice and mixed in random order with fillers and distractors. Among the fillers were the phrases used in Experiment 3, which will be described in Section 3.3.

3.2.2 Methodology. Masking noise was used to elicit an increase in voice volume which in turn induced an increase in pitch (the so-called Lombard effect; see Lane & Tranel 1971 and Junqua 1996, and the detailed description of the procedure in Astruc 2005, Chapter 3). It was expected that there would be an increase in the scaling of potential pitch accents. However, preliminary results showed a general raising in pitch level instead of the expected local rise in pitch. This prompted a change in methodology in Experiment 2. The first author carried out a phonological analysis, quantifying the cases of prosodic separation and the types of prosodic breaks used, as well as the instances of accentuation and deaccentuation. This was done by carefully listening to the recordings and by looking at the pitch traces obtained with Praat (4.1.21) following the criteria described in Section 2.2. This analysis was repeated twice, with an interval of a few weeks between analyses, and without having the annotations of the previous analysis at hand, and a portion of the data was checked by the second author, without finding substantial divergences.

[Pitch traces, two panels, for Vol la vela, la vella: y-axis F0 (st), x-axis Time (s).]
Figure 9: A minimal pair in Catalan. Left-hand panel, an example of right-dislocation. Right-hand panel, the apposition counterpart.
7 The planned comparisons were not carried out in the end because a better method for testing the accentuation of the right-dislocations was devised and applied in the following experiment, as reported in Section 3.3.

Figure 9 presents in the left-hand panel the phrase Vol la vela, la vella (“She wants the sail, the old lady [does]”), with a right-dislocated NP subject. This interpretation was elicited by a sentence about Mary, an old lady who has a ship in a bottle that is missing a sail, followed by the question What does the old lady want? In the right-hand panel, we see the appositive interpretation of the same segmental string, Vol la vela, la vella. Subjects were presented with the context of a fisherman fixing his boat and working on the sail, which was old. To the question What does the fisherman want?, the answer was: [He] wants the sail, the old one. Both appositions and dislocations are set off by prosodic boundaries, that is, by lengthening, tonal movements, and/or pauses. The main difference between them is that appositions receive a contour that reduplicates the contour of the main phrase at a lower voice level. Such reduplication is observed in 60% of the data.

3.2.3 Right-dislocations: Phrasing and intonation. In this section we will present an overview of the phrasing and intonation of the 224 right-dislocated phrases, which constitute half of the corpus. The analysis was performed by carefully listening to the recordings and examining the pitch traces. Following the criteria described in Sections 2.2 and 3.2.2, it was decided that a right-dislocation was deaccented if it sounded less loud than the main phrase and without any perceivable pitch movement. (A more detailed quantification of the scaling of the stressed syllables of the dislocated phrases will be undertaken in the experiment reported in Section 3.3, which uses a different methodological approach).

The criterion for deciding whether the main phrase and the right-dislocated NP formed independent units or not was the presence of lengthening, tonal movements, and/or pauses. When the constituent started with a vowel, there was also creakiness at the end of prosodic constituents and glottalization at the beginning. If any of these indications of a prosodic break was found, it was decided that they formed independent phrases. That is, ‘independent units’ includes both intermediate and intonational phrases.
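The two decision rules described above can be written out explicitly. The sketch below is only an illustration of the logic: the judgements themselves were auditory and visual, and all names (`BoundaryCues`, `forms_independent_unit`, `is_deaccented`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BoundaryCues:
    """Annotator judgements for one token (hypothetical structure)."""
    lengthening: bool = False
    tonal_movement: bool = False
    pause: bool = False
    creak_or_glottalization: bool = False  # only relevant before vowel-initial constituents

def forms_independent_unit(cues: BoundaryCues) -> bool:
    """A token counts as an independent unit if any cue to a prosodic break is present."""
    return any((cues.lengthening, cues.tonal_movement,
                cues.pause, cues.creak_or_glottalization))

def is_deaccented(less_loud_than_main: bool, perceivable_pitch_movement: bool) -> bool:
    """A dislocation counts as deaccented if it is less loud and shows no pitch movement."""
    return less_loud_than_main and not perceivable_pitch_movement

example = BoundaryCues(pause=True)
print(forms_independent_unit(example), is_deaccented(True, False))
```

Because a single cue is sufficient, the phrasing criterion is deliberately inclusive, which is why the category covers both intermediate and intonational phrases.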
According to this criterion, right-dislocations formed independent units 70% of the time. This percentage includes a sizeable degree of inter-speaker variation, as can be seen in the bar graph presented in Figure 10.

[Bar chart “independent units”: y-axis percentage, x-axis speakers; speaker 1, 38; speaker 2, 72; speaker 3, 73; speaker 4, 98.]
Figure 10: Percentages of right-dislocated phrases forming separate units (y-axis) for each speaker (x-axis).
In Figure 10, the first speaker shows a marked tendency to produce the right-dislocated phrase in the same unit as the main phrase (forming separate units only 38% of the time). On the other hand, nearly all (98%) of the productions of the last speaker fall into separate units. The other two speakers show a very similar behaviour, with 72% of the productions of speaker 2 and 73% of speaker 3 forming prosodically independent units.

[Bar chart “deaccentuation”: y-axis percentage, x-axis speakers 1 to 4; bar values 96, 85, 71 and 64.]
Figure 11: Percentages of deaccented right-dislocated phrases (y-axis) for each speaker (x-axis).
Figure 11 shows that, on average, speakers deaccented 79% of right-dislocations, with the highest percentage belonging to speaker 1 (96% of deaccented cases), and the lowest to speaker 4 (64%), with the other two speakers, 2 and 3, showing 71% and 85% deaccented tokens respectively.

3.2.4 Discussion. Accentual cues seem to be slightly stronger than phrasing cues, since 79% of the cases appear as deaccented and the degree of inter-speaker variation is lower than with phrasing cues (70% of phrases form independent units and inter-speaker variation is much higher). One possible explanation for the remarkable inter-speaker differences in phrasing may be the speakers’ different reading styles. The reading pace of speaker 1 is quite fast, while that of speaker 4 is rather slow, and that of the other two speakers can be considered normal. As has been suggested in work by Cooper and Paccia-Cooper (1980:189), the slower the reading, the more likely the speaker is to break utterances into separate prosodic phrases. Speakers who read fast make fewer prosodic breaks and deaccent more frequently, as is the case with speaker 1, the fastest reader. By contrast, speaker 4, the slowest reader, tends to produce his right-dislocated phrases in a separate unit, and he also shows a greater tendency to accent them. This seems to point to a trade-off between phrasing and accentuation in the prosodic form of right-dislocated phrases (at least in read, pre-planned speech). However, further study is needed to examine how differences in phrasing correlate with differences in reading style and with the increasing degrees of vocal effort.
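The comparison between the two cues can be made concrete with the per-speaker percentages reported in the text and in Figures 10 and 11. The sketch below is illustrative only: the helper name `summarize` is hypothetical, and the range is just one simple way of expressing inter-speaker variation, not necessarily the measure the authors had in mind.

```python
from statistics import mean

# Per-speaker percentages as reported for the four speakers.
independent_units = [38, 72, 73, 98]   # right-dislocations phrased as separate units
deaccented = [96, 71, 85, 64]          # deaccented right-dislocations

def summarize(percentages):
    """Return the mean and the range (max - min) of a list of percentages."""
    return mean(percentages), max(percentages) - min(percentages)

phrasing_mean, phrasing_range = summarize(independent_units)
deacc_mean, deacc_range = summarize(deaccented)

print(phrasing_mean, phrasing_range)
print(deacc_mean, deacc_range)
```

The means come out at 70.25% and 79%, in line with the rounded averages in the text, and the range is 60 percentage points for phrasing against 32 for deaccentuation, which is the sense in which inter-speaker variation is lower for the accentual cue.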
3.3 Experiment 3
3.3.1 Methodology. In Experiment 3, instead of using masking noise to elicit variations in pitch range, the levels of prosodic prominence were manipulated. The experimental material (see Appendix) was highly controlled. The target syllables were the initial syllable in disyllabic words with either stress on the second syllable (as in Vilà, a surname), or stress on the first syllable (as in Vila, the nickname of a football club), as well as the initial syllable in tetrasyllabic words with secondary stress on the first syllable and primary stress on the third one (as in Vilabella, a place name and also the name of a football club). In this way, the target syllables have identical segmental composition but different degrees of stress: unstressed, primary stress, and secondary stress.⁸ Therefore, the three stress conditions are:
(6) Stress conditions
a. Stress 0: Vilà (“surname”)
b. Stress 1: Vila (“nickname of a football club”)
c. Stress 2: Vilabella (“a football club”)
There were twelve such words in total embedded in three right-dislocated subject phrases and in three right-dislocated object phrases (see Appendix). The material was balanced to level out vowel-specific pitch differences (Lehiste 1970), so that half of the target syllables contained high vowels and half contained low and central vowels. Both the information structure and the semantic/pragmatic context were kept constant. To this effect, the sentences were introduced by a question calculated to elicit an out-of-focus interpretation, as in:
(7) a. Va guanyar la lliga, el Vilabella? (“Did they win the league, Vilabella?”)
b. Va guanyar-la, el Vila (“They won it, Vila”)
The target structures were mixed in random order with other phrases intended to act as ‘distractors’ to prevent subjects from falling into a repetitive reading style. Thus prepared, the text was read by six Central Catalan speakers, three males and three females (see Astruc 2003). It was expected that syllables bearing stress, whether primary or secondary, would be scaled higher than their unstressed counterparts. But if syllables with primary stress were significantly higher than those with just secondary stress, this would indicate that they receive real pitch accents and not mere stress effects.
8 Three or more unstressed syllables, as in el-Vi-la-be-lla, are not allowed in Catalan and a support stress has to be re-introduced (Oliva & Serra 2002; but see also Mascaró 2002).
3.3.2 Results. With regard to phrasing, it was found that dislocations, as in Experiment 1, tended to form separate tonal units. This can be observed in the two pitch traces in Figure 12. In both cases there is a pause between the main phrase and the dislocated phrase, which has a rather flat pitch range compared with that of the main phrase. The dislocated sentences appeared to be deaccented, both acoustically and instrumentally, as shown in Figure 13.

[Pitch traces, two panels: y-axis F0 (st), x-axis Time (s).]
Figure 12: Phrasing in two right-dislocations in Catalan: Ja li agrada, la llima (“S/he likes it, lime”); ja li agrada, la llimonada (“S/he likes it, lemonade”).
[Bar chart “scaling according to metrical weight”: y-axis semitones, x-axis stress levels; unstressed, 5.155; secondary stress, 5.559; primary stress, 5.018.]
Figure 13: Scaling of the target syllables. Fundamental frequency in semitones (on the y-axis) and three levels of stress (on the x-axis).
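Figure 13 reports F0 in semitones. The source does not state the reference frequency used, so the sketch below assumes a hypothetical reference of 100 Hz; the function name is likewise an illustrative assumption. It shows the standard Hz-to-semitone conversion and the size of the difference between the secondary-stress and primary-stress means given in the figure.

```python
import math

def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert a frequency in Hz to semitones relative to ref_hz (assumed reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)

# Mean scaling values in semitones, as reported in Figure 13.
scaling = {"unstressed": 5.155, "secondary": 5.559, "primary": 5.018}
diff_secondary_primary = scaling["secondary"] - scaling["primary"]

print(hz_to_semitones(200.0))            # one octave above the assumed reference
print(round(diff_secondary_primary, 3))  # difference between the two stressed conditions
```

Because the scale is logarithmic, a difference expressed in semitones does not depend on the choice of reference, so the comparison across stress conditions is unaffected by the assumption made here.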
As Figure 13 shows, the F0 level measured over the unstressed syllables is virtually identical to that of the target primary stressed syllable. This can be taken as evidence against the existence of pitch accents, which is further confirmed statistically by a one-factor repeated-measures mixed ANOVA run on the data of all six speakers for the three stress conditions. The ANOVA provides no significant evidence of effects of stress level upon scaling (F(2,10) = 0.547, p>0.05). The 0.50 semitones difference between the syllables with secondary stress and those with primary stress, apart from not being significant, is contrary to the initial hypothesis that syllables with primary stress would have a higher scaling. The lower scaling of all primary stressed syllables is interpreted as an artifact of the experimental procedure used, because unstressed syllables (llimona “lemon”) and syllables with primary stress (llima “lime”) are shorter than syllables with secondary stress (llimonada “lemonade”) and so the measurement point is earlier in the overall pitch downtrend.
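For readers unfamiliar with the test, the following is a minimal sketch of a plain one-way repeated-measures ANOVA. It is not the authors' analysis script, the function name is hypothetical, and the toy values are invented purely to exercise the code; they are not the study's data.

```python
def repeated_measures_anova(data):
    """One-way repeated-measures ANOVA.

    data: list of rows, one per speaker; each row holds one value per condition.
    Returns (F, df_condition, df_error).
    """
    n = len(data)        # number of speakers
    k = len(data[0])     # number of conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj   # residual after removing speaker effects

    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error

# Hypothetical toy values (NOT the study's data): 3 speakers x 3 conditions.
toy = [[1, 2, 6], [2, 4, 6], [3, 3, 9]]
print(repeated_measures_anova(toy))
```

With six speakers and three stress conditions, the same formulas give 2 and 10 degrees of freedom, which matches the F(2,10) reported above.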
There is no support for the hypothesis that right-dislocations are accented since the differences between the three stress conditions are fairly modest and, furthermore, in a direction opposite to the initial hypothesis, which was that primary stressed syllables should be higher than secondary stressed ones. In this case, secondary stressed syllables, which appear in initial position in tetrasyllabic words, are higher than primary stressed ones that belong to disyllabic words (5.559 versus 5.018 semitones in Figure 13). This is not interpreted as an indication of the existence of low pitch accents. Such a possibility is discarded, first, on auditory grounds, and, second, following the outcome of the statistical analysis which shows no significant differences in scaling under the three different stress conditions.

4. Summary and conclusion
Phonological studies have traditionally considered ESEs as forming syntactically and prosodically independent units. However, as argued in Section 1.2, they do not form a homogeneous grammatical category, even though they have certain common characteristics which are semantic in nature: their semantic scope encompasses the whole sentence, and most ESEs (except sentential adverbs) also share the semantic function of adding supplementary information. The question is, how does this functional role relate to their prosodic form?

First of all, our view is that the prosodic behaviour of dislocations and extra-sentential elements in general is governed by general principles of information structure and textual organization, as suggested by Liberman (1975:185), and as is also implicitly contained in Ladd’s notion of ‘structural pitch range effects’ (that is, downstep and pitch range shifts) to signal syntactic and textual structure (Ladd 1996:279). Our view is that the role of ESEs is primarily semantic (either that of signalling sentence-wide semantic scope, as sentential adverbs do, or that of signalling an anaphoric connection to their referent, as most ESEs do), and that such a semantic role is signalled prosodically by means of tonal and/or junctural cues.

The purpose of Experiment 1 was to establish whether ESEs always form independent tonal units and are always deaccented, or rather show variation in their phrasing and intonation. The answer is that they show both types of variation, which can be taken as an indication of an on-going trade-off between prosodic independence and tonal subordination to cue the peripheral status of ESEs. Experiment 2 used masking noise to study the accentuation of Catalan appositions and right-dislocations, which were shown to differ in the way they signalled tonal subordination (appositions, with reduplication; right-dislocations, with deaccenting).
A potential correlation between phrasing and speaking rate was also observed in both Experiment 1 and Experiment 2, such that the faster the rate, the lower the occurrence of prosodic breaks. A tightly controlled methodology is needed to find confirmation for this trend, which was not confirmed statistically, perhaps owing to inter-speaker variation. Unfortunately, the masking noise technique used in Experiment 2 did not yield quantitative confirmation of the analysis. Experiment 3 followed a much stricter methodology, which involved measuring the F0 excursion of syllables with different degrees of prosodic prominence. With this method, it was possible to conclude that right-dislocated phrases, when tightly controlled for contextual factors, are totally deaccented.

Prosodically, ESEs signal their usual peripheral role in the sentence either by being totally deaccented or, if accented, by means of a dramatic compression in pitch range coupled with prosodic separation, and often, by a combination of both strategies. Prosodic separation, in fact, is not strictly compulsory. About 70% of the tokens in the corpus were split into two units. Most of them, but not all, were also deaccented. Deaccentuation only seemed to be compulsory in those cases in which the ESEs and the main phrase belonged to the same prosodic unit, and there was scope for ambiguity. This behaviour hints at the existence of a trading relationship between rhythm and melody, a notion that can be traced back to Trim (1959) and Ladd (1980:164). If further evidence were found, standard AM would have to account for it. A possible solution (in line with Beckman & Pierrehumbert 1986 and with Truckenbrodt 1995, 1999) would be to analyse ESEs as intermediate phrases, with obligatory phrase accents and boundary tones but with optional pitch accents. In the absence of pitch accents, phrase accents would spread from right to left. When pitch accents are present, tonal subordination mechanisms such as reductions in pitch range or tonal reduplication would operate within the domain of the intermediate phrase to signal the subordinated grammatical role of ESEs.
References
Allerton, David J. & Alan Cruttenden. 1974. “English sentence adverbials: Their syntax and their intonation in British English”. Lingua 34. 1-30.
Astruc, Lluïsa. 2003. “Right-dislocations in Catalan”. Proceedings of the 15th International Congress of the Phonetic Sciences ed. by Maria-Josep Solé, Daniel Recasens & Joaquín Romero. 1265-1268. Barcelona: Causal Productions.
----------. 2005. The intonation of extra-sentential elements in Catalan & English. PhD diss., University of Cambridge (available from www.astruc.info).
---------- & Francis Nolan. In press. “Variation in the intonation of sentential adverbs”. Tones and Tunes ed. by Tomas Riad & Carlos Gussenhoven, vol. I. Berlin & New York: Mouton de Gruyter.
Beckman, Mary & Janet Pierrehumbert. 1986. “Intonational structure in Japanese and English”. Phonology Yearbook 3. 225-310.
Bing, Janet M. 1985. Aspects of English Prosody. New York & London: Garland Publishing.
Bonet, Eulàlia. 1984. Aproximació a l’entonació del català central. [A preliminary investigation of the intonation of Central Catalan]. Master’s thesis, Universitat Autònoma de Barcelona.
Cooper, William E. & Jeanne Paccia-Cooper. 1980. Syntax and Speech. Cambridge, Mass. & London: Harvard University Press.
Cruttenden, Alan. 1997. Intonation. (2nd edition). Cambridge: Cambridge University Press.
Delattre, Pierre. 1972. “The distinctive function of intonation”. Intonation: Selected Readings ed. by Dwight Bolinger. 159-174. Harmondsworth: Penguin.
D’Imperio, Mariapaola. 2002. “Italian intonation: An overview and some questions”. Probus 14. 37-69.
Emonds, Joseph E. 1979. “Appositive relatives have no properties”. Linguistic Inquiry 10:2. 211-243.
Estebas-Vilaplana, Eva. 2000. The use and realisation of accentual focus in Central Catalan with a comparison to English. PhD diss., University College London.
Fabb, Nigel. 1990. “The difference between English restrictive and nonrestrictive relative clauses”. Journal of Linguistics 26. 57-77.
Fagyal, Zsuzsanna. 2002a. “Prosodic boundaries in the vicinity of utterance-medial parentheticals in French”. Probus 14. 93-111.
----------. 2002b. “Tonal template for background information: The scaling of pitch in utterance-medial parentheticals in French”. Proceedings of the 1st International Conference on Speech Prosody ed. by Bernard Bel & Isabelle Marlien. 279-282. Aubenas d’Ardèche: Lienhart.
Geluykens, Ronald. 1992. From Discourse Process to Grammatical Construction: On Left-dislocation in English. Amsterdam: John Benjamins.
----------. 1994. The Pragmatics of Discourse Anaphora in English: Evidence from Conversational Repair. Berlin: Mouton de Gruyter.
Grice, Martine. 1995. The Intonation of Interrogation in Palermo Italian: Implications for Intonation Theory. Tübingen: Niemeyer.
Gussenhoven, Carlos. 1985. “The intonation of George and Mildred: Postnuclear generalizations”. Intonation in Discourse ed. by Catherine Johns-Lewis. 77-123. San Diego: College Hill.
----------. 1993. On the Grammar and Semantics of Sentence Adverbs. Dordrecht: Foris.
----------. 2004. The Phonology of Tone and Intonation. Cambridge: Cambridge University Press.
Huddleston, Rodney & Geoffrey K. Pullum. 2002. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press.
Junqua, Jean-Claude. 1996. “The influence of acoustics on speech production: A noise-induced stress phenomenon known as the Lombard reflex”. Speech Communication 20. 13-22.
Kempson, Ruth, Wilfried Meyer-Viol & Dov Gabbay. 2000. Dynamic Syntax: The Flow of Language Understanding. Oxford: Blackwell.
Ladd, D. Robert. 1980. The Structure of Intonational Meaning: Evidence from English. Bloomington & London: Indiana University Press.
----------. 1996. Intonational Phonology. Cambridge: Cambridge University Press.
Lambrecht, Knud. 1981. Topic, Antitopic, and Verb Agreement in Nonstandard French. Amsterdam: John Benjamins.
----------. 1994. Information Structure and Sentence Form: Topic, Focus, and the Mental Representation of Discourse Referents. Cambridge: Cambridge University Press.
Lane, Harlan & Bernard Tranel. 1971. “The Lombard Sign and the role of hearing in speech”. Journal of Speech & Hearing 14. 677-709.
Lehiste, Ilse. 1970. Suprasegmentals. Cambridge, Mass. & London: The MIT Press.
Liberman, Mark. 1975. The Intonational System of English. Bloomington, Ind.: Indiana University Linguistics Club.
Martin, Pierre. 1987. “Prosodic and rhythmic structures in French”. Linguistic Inquiry 8. 249-336.
Mascaró, Joan. 2002. “Reducció vocàlica”. Gramàtica del català contemporani ed. by Joan Solà, Maria-Rosa Lloret, Joan Mascaró & Manuel Pérez Saldanya. 89-123. Barcelona: Editorial Empúries.
Nespor, Marina. 1993. Fonologia: Le strutture del linguaggio. Bologna: Il Mulino.
---------- & Irene Vogel. 1986. Prosodic Phonology. Dordrecht: Foris Publications.
Oliva, Salvador & Pep Serra. 2002. “Accent”. Gramàtica del català contemporani ed. by Joan Solà, Maria-Rosa Lloret, Joan Mascaró & Manuel Pérez Saldanya. 345-391. Barcelona: Editorial Empúries.
Payà, Marta. 2002. “Incidental clauses in spoken Catalan: prosodic characterization and pragmatic function”. Proceedings of the 1st International Conference on Speech Prosody ed. by Bernard Bel & Isabelle Marlien. 559-562. Aubenas d’Ardèche: Lienhart.
----------. 2003. “Prosody and pragmatics in parenthetical insertions in Catalan”. Catalan Journal of Linguistics 2. 207-227.
Pierrehumbert, Janet. 1980. The phonetics and phonology of English intonation. PhD diss., MIT.
---------- & Julia Hirschberg. 1990. “The meaning of intonational contours in the interpretation of discourse”. Intentions in Communication ed. by Philip R. Cohen, Jenny L. Morgan & Martha E. Pollack. 271-311. Cambridge, Mass. & London: The MIT Press.
Prieto, Pilar. 2002a. “Entonació”. Gramàtica del català contemporani ed. by Joan Solà, Maria-Rosa Lloret, Joan Mascaró & Manuel Pérez Saldanya. 393-462. Barcelona: Editorial Empúries.
----------. 2002b. Entonació: Models, teoria, mètodes. Barcelona: Ariel.
Recasens, Daniel. 1993. Fonètica i fonologia. Barcelona: Enciclopèdia Catalana.
Safir, Ken. 1986. “Relative clauses in a theory of binding and levels”. Linguistic Inquiry 17. 663-683.
Selkirk, Elisabeth. 1984. Phonology and Syntax: The Relation between Sound and Structure. Cambridge, Mass.: The MIT Press.
Trim, John L. 1959. “Major and minor tone groups in English”. Le Maître Phonétique 112. 26-29.
Truckenbrodt, Hubert. 1995. Phonological phrases: Their relation to syntax, focus, and prominence. PhD diss., MIT.
----------. 1999. “On the relation between syntactic phrases and phonological phrases”. Linguistic Inquiry 30:2. 219-255.
Vallduví, Enric. 1990. “The role of plasticity in the association of focus and prominence”. Proceedings of the Eastern States Conference on Linguistics 7 ed. by Yongkyoon No & Mark Libucha. 295-306. Columbus, Ohio: Ohio State University.
----------. 1994. “Detachment in Catalan and information packaging”. Journal of Pragmatics 22. 573-601.
Wunderli, Peter. 1987. L’intonation des séquences extraposées en français. Tübingen: Gunter Narr Verlag.
Zubizarreta, María Luisa. 1998. Prosody, Focus, and Word Order. Cambridge, Mass. & London, England: The MIT Press.

Appendix
Experiment 2
Quasi-minimal pairs of apposition and right-dislocated phrases in Catalan (with translations in English).
Appositions
—És la Mona, la dona. Sí, es diu així: Mona. (“She is Mona, the wife. Yes, that’s her name, Mona.”)
—Va veure la Núria, la núvia. (“She saw Núria, the bride.”)
—És Móra, la Nova. No l’altra Móra, la Móra d’Ebre. (“This is Móra, la Nova. Not the other Móra, Móra d’Ebre.”)
—Fan el drama, ‘La Mama’. (“They are showing the drama, ‘The Mum’.”)
—Vol la nena, la Lena. (“She wants the little girl, Lena.”)
—Vol la vela, la vella. (“He wants the sail, the old one.”)
—Han posat Dallas, el drama. (“They have shown Dallas, the drama.”)
Dislocations
—És mona, la dona. Però no sembla pas gaire agradable. (“She’s cute, the wife [is]. But she doesn’t seem very nice.”)
—Va veure la Núria, la núvia. (“She saw Núria, the bride [did].”)
—És mora, la nova. (“She’s an Arab, the new one [is].”)
—Fa drama, la mama. (“She’s making a drama out of it, mother [is].”)
—Vol una nena, la Lena. (“She wants a girl, Lena [does].”)
—Vol la vela, la vella. (“She wants the sail, the old lady [does].”)
—Va passar a Dallas, el drama. (“It happened in Dallas, the drama [did].”)
Experiment 3
Right-dislocated phrases in Catalan with translations in English.
Ja li agrada, la mel (“S/he likes it, honey”).
Ja li agrada, el meló (“S/he likes it, watermelon”).
Ja li agrada, la melonada (“S/he likes it, watermelon juice”).
Ja en vol, de mel (“S/he wants some, honey”).
Ja en vol, de meló (“S/he wants some, melon”).
Ja en vol, de melonada (“S/he wants some, melon juice”).
Va guanyar-la, la mare (“She won it, the mother [did]”).
Va guanyar-la, la mamà (“S/he won it, the mama [did]”).
Va guanyar-la, la Mamabona (“They won it, the Mamabona [did]”).
Ja li agrada, la llima (“S/he likes it, lime”).
Ja li agrada, la llimona (“S/he likes it, lemon”).
Ja li agrada, la llimonada (“S/he likes it, lemonade”).
Ja en menja, de llima (“S/he eats it, lime”).
Ja en menja, de llimona (“S/he eats it, lemon”).
Ja en beu, de llimonada (“S/he drinks it, lemonade”).
Va guanyar-la, el Vila (“They won it, the Vila [did]”).
Va guanyar-la, el Vilà (“He won it, Vilà [did]”).
Va guanyar-la, el Vilabella (“They won it, the Vilabella [did]”).
VOICING-DEPENDENT CLUSTER SIMPLIFICATION ASYMMETRIES IN SPANISH AND FRENCH ∗
LAURA COLANTONI & JEFFREY STEELE University of Toronto
Abstract In the present work, we build upon the proposal outlined in Colantoni & Steele (2005b) that asymmetrical patterns of Spanish and French stop-liquid cluster simplification are conditioned by liquid type and stop voicing. Specifically, using data from four Romance varieties (Argentine and Chilean Spanish; European and Quebec French), we show that the longer epenthetic vowel in Spanish voiced versus voiceless stop-rhotic clusters and the restriction of voicing assimilation to voiceless stop-rhotic clusters in French can be explained with reference to asymmetrical stop length as it interacts with consonant sequence timing, and constraints on voicing in fricatives and dorsals respectively. The factors shown to condition synchronic variation can be extended to explain the evolution of stop-liquid clusters from Latin to Romance.
1. Asymmetries in cluster simplification
Spanish and French stop-liquid (SL) clusters¹ display asymmetric patterns of assimilatory and dissimilatory simplification conditioned by stop voicing and liquid type attested both synchronically and diachronically. Table 1 illustrates that a different specification for voicing in both members of the clusters is a necessary and sufficient condition for simplification via assimilation. For example, voicing assimilation applies synchronically to voiceless stop-rhotic clusters (Asymmetry types 1, 2, 5 in Table 1); in Spanish, it is restricted to coronal clusters in some varieties (e.g. Chilean Spanish, (2)). Diachronically, assimilation also occurred primarily in voiceless stop-lateral clusters, triggering the development of palatal segments in Spanish and Portuguese word-initially and medially (e.g. AMPLU (Lat.) > an[t]o “wide” (Spanish); PLAGA (Lat.) > []aga “wound” (Portuguese); Mendeloff 1969; Hall 1976; Jessen 2001) and in French word-medially and finally (e.g. COCHLEARIUM (Lat.) > cui[j]ère “spoon”; ACUC(U)LA (Lat.) > aigui[j]e “needle”).

∗ We wish to thank Travis Bradley for his detailed comments on an earlier version of this paper as well as the editors of the volume and the three anonymous reviewers for their useful feedback. All errors remain our own.
1 In this paper, we focus on word-initial and medial clusters. While French also permits word-final SL clusters (e.g. table [tabl] “table”, lettre [lt] “letter”), they are absent in Spanish and thus not discussed here.

Type | Conditioning variable: Voicing | Conditioning variable: Liquid type
Synchronic | 1. Rhotic devoicing following voiceless obstruents (French [ky] “raw” versus [y] “crane”)² | 5. Assimilation with rhotics; non-assimilation with laterals (French [ke] “create” versus [kle] “key”; Chilean Spanish [otro] “other” versus [atlas] “atlas”)
Synchronic | 2. Assibilation in voiceless clusters; non-assibilation in voiced clusters (e.g. Chilean Spanish [otro] “other” versus [dama] “drama”)³ | 6. Cluster-medial epenthesis in rhotic clusters; no epenthesis with lateral clusters (French [b] “arm” versus [ble] “wheat”; Spanish [otro] “other” versus [sopla] “blows”)
Synchronic | 3. Cluster-medial epenthesis in voiced stop-rhotic clusters only (French /i/→[i] “grey” versus /ki/→[ki] “yell”); longer epenthetic vowels in voiced clusters (Spanish) |
Diachronic | 4. Voiceless clusters simplified; voiced clusters maintained (Latin to Spanish PLUVIA > [uja] “rain” versus GLORIA > [loja] “glory”) | 7. Rhotic clusters maintained; lateral clusters simplified (Latin to Spanish QUADRUS > [kwao] “square” versus AMPLUS > [anto] “wide”)
Table 1: Selected asymmetries within Spanish and French SL clusters conditioned by voicing and liquid type.
Whereas a difference in voicing specification seems to be a necessary and sufficient requirement for assimilation, dissimilation, in contrast, is conditioned by more than one factor. This is particularly true of the type of dissimilation on which we will focus in the rest of the present work, namely cluster-medial vowel epenthesis⁴, which appears to be conditioned by voicing (Table 1, (3)) and liquid type (Table 1, (6)).

² Typical descriptions of devoicing in this context mention both laterals and rhotics. However, in the French data investigated here (Section 4, Table 3), devoicing is restricted to stop-rhotic clusters.
³ Argüello (1978) also reports assibilation in /d/ clusters. However, assibilation in voiced clusters is far less frequent in general than in their voiceless counterparts.
⁴ Malmberg (1965) proposes treating the type of cluster-medial vowel epenthesis studied here as an instance of dissimilation.

1.1 Previous approaches to cluster simplification

Previous accounts of cluster simplification have suggested that similarity in place, manner and voicing plays a role in the selection of the strategy. Côté (2004) argues that perceptual similarity motivates stop deletion in final liquid-stop clusters in Quebec French, Catalan, Hungarian and two varieties of English. According to this approach, perceptual similarity is not computed in acoustic or auditory terms, “but is mediated by phonological features, which have perceptual correlates” (2004:3). Thus, simplification patterns are determined by universal as well as language-specific principles. In order for deletion to take place, both members of the cluster should be similar on at least two of the three dimensions.

Based on data from an experimental study of Argentine Spanish and Quebec French, Colantoni and Steele (2005b) also propose that similarity in manner and voicing determines the rate of stop-liquid cluster simplification. This study focuses on patterns of cluster-medial epenthesis as opposed to final cluster simplification. As shown in Table 2, epenthetic vowels were observed in voiced rhotic clusters in French versus all rhotic clusters in Spanish.

                   French                                Spanish
Cluster            # epenthesis  Total clusters  %       # epenthesis  Total clusters  %
Obstruent + /l/    9             599             1.5     11            591             1.9
/p,t,k/ + /r/      20            356             5.6     339           360             94.2
/b,d,g/ + /r/      320           356             89.9    386           395             97.7

Table 2: Epenthesis rates in French and Spanish SL clusters (Colantoni & Steele 2005b).
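The percentages in Table 2 follow directly from the raw counts. As a quick check, the short Python sketch below recomputes them; the dictionary layout, the cluster labels and the function name `epenthesis_rate` are illustrative choices and are not part of the original study.

```python
# Counts copied from Table 2: (tokens with epenthesis, total clusters).
# The keys are shorthand labels chosen for this sketch.
TABLE_2 = {
    "French": {"obstruent+l": (9, 599), "ptk+r": (20, 356), "bdg+r": (320, 356)},
    "Spanish": {"obstruent+l": (11, 591), "ptk+r": (339, 360), "bdg+r": (386, 395)},
}


def epenthesis_rate(epenthesis: int, total: int) -> float:
    """Return the epenthesis rate as a percentage rounded to one decimal."""
    return round(100 * epenthesis / total, 1)


# Recompute every cell of the percentage columns.
RATES = {
    language: {cluster: epenthesis_rate(*counts) for cluster, counts in rows.items()}
    for language, rows in TABLE_2.items()
}

if __name__ == "__main__":
    for language, rows in RATES.items():
        for cluster, rate in rows.items():
            print(f"{language:8s} {cluster:12s} {rate:5.1f}%")
```

The recomputed values make the asymmetry easy to see: French voiceless stop-rhotic clusters pattern with lateral clusters, whereas Spanish voiceless stop-rhotic clusters pattern with the voiced ones.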
The authors argue that a precondition for epenthesis is that one of the members of the cluster be [−continuant]. Moreover, they propose that the rate of epenthesis, as well as the length of the epenthetic vowel, is a function of liquid type and the degree of similarity between the rhotic and the stop. In particular, when the rhotic is realized as a tap, epenthesis is almost categorical (Spanish), given the similarity in manner with the stop⁵. In addition, when the stop is voiced and the degree of similarity hence increases, the length of the epenthetic vowel also increases (Figures 1a and b); these vowels are highlighted in Figures 1 and 2 by a dotted box. In French, on the other hand, dissimilation only occurs when the stop is voiced (see Figure 2); when it is voiceless, assimilation which results in rhotic devoicing is the preferred strategy as per (1) in Table 1.

⁵ The similar patterning of taps and stops has been widely documented cross-linguistically (Ohala 1993a; Inouye 1995).

Neither analysis of cluster simplification, however, is completely satisfying, as both fail to account for the variable nature and specific phonetic implementation of the phenomena. Indeed, they do not explain the asymmetrical length of the epenthetic vowel in Spanish (Colantoni & Steele 2005b) and the sources of devoicing in the French stop-rhotic clusters (Côté 2004). In order to provide such an explanation, we will argue here that the phonetic characteristics of the rhotics in both languages must be considered, as well as the phonetic correlates of voicing in Romance more generally. Previous accounts have focused on a single variety, which makes it difficult to determine the productivity of each strategy in a given language as well as across varieties. In particular, if vowel epenthesis is a productive strategy in Spanish and French, it should be observed in all varieties whose rhotics and stops are phonetically realized like those of Argentine Spanish or Quebec French. Moreover, if simplification is a function of the degree of similarity of both members of the clusters in manner, place and voicing, in any variety in which the rhotic is coronal but is variably realized as a fricative in voiceless clusters (e.g. Chilean Spanish, Lenz 1940), one should predict a pattern of simplification that is intermediate between those observed in Argentine Spanish and Quebec French; that is, such varieties should exhibit a rate of epenthesis in voiced clusters as high as that of Spanish and French, yet a lower rate of epenthesis in voiceless clusters vis-à-vis Argentine Spanish.
Figures 1a and b: Epenthesis in Spanish voiceless and voiced stop-rhotic clusters (targets letra /leta/ “letter” and cobrá /koa/ “cash (2PS imperative)”).
Figure 2: Epenthesis in French voiced stop-rhotic clusters (target drapeau /dapo/ “flag”).
Building on proposals made in Colantoni & Steele (2005b), we will argue that, in order to fully account for the asymmetries observed, it is necessary to focus on (i) the aerodynamics of voicing in fricatives in order to account for voicing assimilation in French; and (ii) the phonetic correlates of phonemic voicing in stops, in order to explain the epenthetic vowel length asymmetry in Spanish. As concerns the latter, stop duration and laryngeal voicing are considered to be the most important phonetic correlates of phonemic voicing in Romance (e.g. Borzone de Manrique & Gurlekian 1980; Jacques & Gurlekian 1982; Romero 1995); some researchers even argue that duration alone is the relevant feature for the identification of stop voicing (e.g. Martínez Celdrán 1991; Pérez 1998 for Spanish). In particular, we will show that one of these acoustic correlates, namely stop duration, outweighs the other in accounting for the two competing strategies under analysis here.

First, we propose that the length asymmetry in the Spanish epenthetic vowel is directly related to asymmetries in the phonetic realization of stops: voiceless stops are longer than their voiced counterparts. As such, we predict compensatory lengthening effects with all [+continuant] segments⁶ versus no effect with [−continuant] segments (i.e. the Spanish tap). This would be consistent with previous observations of length compensation in French consonant-vowel sequences (Fischer-Jørgensen 1968) and vowel-consonant sequences (Chen 1970; O'Shaughnessy 1981) as well as lateral compensation following labial stops in Spanish (Guirao & García Jurado 1991).

Second, we argue that the strategy adopted for cluster simplification is determined by the degree of similarity in voicing and manner of the members of the clusters as well as universal markedness and aerodynamic constraints on voicing. In particular, if the liquid is high in sonority and thus differs greatly from the preceding stop, voicing will be maintained and cluster simplification will be minimal. Laterals, being acoustically similar to vowels and consequently the optimal sonorants in both languages (see Colantoni & Steele 2005a), should involve the least degree of simplification. In contrast, if the liquid is doubly marked, as is the case with French [ʁ] (voicing is strongly disfavoured both in fricatives and dorsals), a higher rate of simplification is expected. Finally, we will demonstrate that the inter- and intra-language variation attested may be formalized within Optimality Theory as competition between Faithfulness and Markedness constraints.
We will test our claims with experimental data from four Romance varieties representing two different languages, namely Argentine and Chilean Spanish and European and Quebec French. As concerns Spanish, the varieties are selected because they have been described as having different realizations of the rhotic in singletons and clusters (see Section 3 below). In the case of French, the goal is to extend the empirical coverage, particularly given previous suggestions that the epenthesis reported in Colantoni & Steele (2005b) may simply be an idiosyncrasy of this particular variety. The remainder of the paper is structured as follows. In Section 2 we motivate our hypotheses and follow this with a discussion of the methodology in Section 3. Results concerning the selection of the strategy and the role played by length and voicing are presented in Sections 4, 4.1, and 4.2 respectively. We conclude with a proposal for formalizing the variation observed in Section 5 and an evaluation of the hypotheses in Section 6.
⁶ Although Spanish laterals have sometimes been characterized as [−continuant] (e.g. Goldsmith 1981; Mascaró 1984), we assume them to be [+continuant] segments based on their acoustic characteristics.

2. Hypotheses

Our first hypothesis, following Colantoni and Steele (2005b), is that similarity in manner and voicing between the two members of the cluster determines the degree of cluster simplification. In particular, if both members of the cluster are highly similar as concerns these two parameters, the rate of simplification will be higher. As such, we predict a higher degree of simplification in stop-rhotic clusters than in stop-lateral clusters across the four varieties, given that the lateral is consistently realized as an approximant in Spanish and French, and is thus more vowel-like.

Second, as concerns stop-rhotic clusters, we hypothesize that the phonetic characteristics of the rhotic will determine the strategy selected. If the rhotic is highly similar to the stop, as is the case with the Spanish tap, dissimilation via epenthesis will be the preferred outcome. This is strictly the case for voiceless-stop-rhotic clusters. However, Spanish voiced stops may be lenited both in intervocalic position and following all consonants but nasals (e.g. Navarro Tomás 1970)⁷. Traditionally, these lenited variants have been characterized as fricatives but acoustic studies have demonstrated that they are rather approximants (Santagada & Gurlekian 1989; Martínez Celdrán 1991). Comprehensive articulatory and acoustic studies (e.g. Romero 1995; Martínez Celdrán 2006) as well as the data collected here show that non-lenited variants are also attested in intervocalic position. As such, in the formulation of our hypotheses, the voiced consonants are assumed to be stops underlyingly (see Harris 1969). In contrast, if the rhotic is less similar to the stop, assimilation is then possible. In particular, if the rhotic is a dorsal fricative as in French⁸, assimilation should be a more likely outcome in voiceless clusters.
The greater probability for assimilation with a dorsal fricative in this context is related to aerodynamic constraints against voicing in both fricatives and dorsals (Ohala 1993a, 1993b, 1997), which together result in the cross-linguistic dispreference for voiced fricatives (Maddieson 1984). If, in addition, the dorsal fricative is preceded by a voiceless consonant, the likelihood of a devoiced variant should increase. Thus, the rate of dissimilation (epenthesis) should be (i) higher in Spanish than in French, and (ii) within Spanish, more prevalent in the Argentine than Chilean variety, as the rhotic is realized as a tap in the former but may be variably articulated as a fricative in the latter. In contrast, the rate of voicing assimilation should be higher in French, where the rhotic is realized as a dorsal fricative.

Finally, stop voicing also plays a role in determining the degree of assimilation and dissimilation. Given that voiceless stops are longer than their voiced counterparts, a compensatory lengthening effect should be observed in the second member of the cluster, provided that it be [+continuant]. As such, we predict shorter epenthetic vowels, laterals and fricative rhotics following voiceless stops. Given that stop voicing varies across both languages and varieties (see Rosner, López-Bascuas, García-Albea & Fahey 2000)⁹, we also predict a variable degree of simplification across varieties.

⁷ It has also been observed that /d/ is realized as a stop after a lateral but not after a rhotic (Navarro Tomás 1970).
⁸ While French r may have other realizations including apical trills, all of the twenty French speakers tested for the present study including those from Quebec produced a dorsal rhotic.
⁹ Rosner et al. (2000) have demonstrated that VOT values differ significantly in Latin American and Peninsular Spanish varieties.

3. Methodology

In order to test these hypotheses, a reading-task experiment was conducted with forty speakers (ten each of Argentine and Chilean Spanish, and European and Quebec French). Speakers, of whom there were equal numbers of males and females for all four varieties, ranged in age from twenty-two to sixty-six; all had post-secondary education. For each language, the stimuli consisted of forty-eight target words controlled for liquid type, stop voicing and contextual effects (see Appendix), as well as twenty-seven distractors. Both targets and distractors were embedded in a carrier sentence (Spanish: Digo____otra vez; French: Je dis____encore une fois; “I say____again”) and read three times by each speaker, generating 5760 tokens (forty speakers × forty-eight words × three rounds). Participants were recorded in a quiet room using a Marantz CDR300 portable CD recorder and an Audiotechnica AT803B unidirectional condenser microphone. Following the testing sessions, sound files were downsampled (22050 Hz, 16 bits) and target words were extracted. Data analysis was performed with Praat 4.0.41.

Three parameters were measured: (i) stop and liquid duration; (ii) percentage of the segment voiced; and (iii) duration of the epenthetic vowel, when present. Percentage voicing was chosen over other correlates of phonemic voicing, including VOT, given that (i) percentage voicing can be used to compare different types of obstruents and sonorants; and (ii) voiceless stops can be voiced during the first half of their articulation, which cannot be captured with VOT¹⁰. Both the waveform and the spectrogram were used to segment the stop and the liquid, with all measurements taken at zero crossings. Voicing was measured as the percentage of the segment’s length during which the fundamental frequency was clearly visible. All measurements were entered into an Excel file, and statistics were calculated with SAS 8.2; significance was set at 0.05.

¹⁰ See Möbius (2004) for a similar argument for measuring voicing via this parameter.

4. Cluster simplification strategies in Spanish and French

Two different strategies of simplification are compared here, namely vowel epenthesis (dissimilation) and voicing assimilation. Table 3 presents the rate of epenthesis in both lateral and rhotic clusters as well as the collapsed percentage of liquid voicing in voiceless and voiced environments for all four varieties.
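The design arithmetic and the percentage-voicing measure described in the Methodology (Section 3) can be sketched as follows. This is a minimal illustration only: in the study, voicing was judged from the visibility of the fundamental frequency in Praat, whereas the sketch assumes a hypothetical list of per-frame F0 values in which `None` marks a frame with no detectable F0. The function names are likewise assumptions made for this example.

```python
from typing import Optional, Sequence


def expected_tokens(speakers: int, words: int, repetitions: int) -> int:
    """Number of tokens produced by a fully crossed reading task."""
    return speakers * words * repetitions


def percent_voiced(f0_frames: Sequence[Optional[float]]) -> float:
    """Percentage of frames in a segment that carry a defined F0.

    Assumes equally spaced analysis frames, so that the share of voiced
    frames approximates the share of the segment's duration that is voiced.
    """
    if not f0_frames:
        raise ValueError("segment contains no frames")
    voiced = sum(1 for f0 in f0_frames if f0 is not None and f0 > 0)
    return 100 * voiced / len(f0_frames)


if __name__ == "__main__":
    # Forty speakers x forty-eight target words x three readings.
    print(expected_tokens(40, 48, 3))
    # A hypothetical rhotic that is voiced only during its second half.
    print(percent_voiced([None, None, 121.0, 119.5]))
```

A frame-based proportion of this kind can be applied to stops, fricatives and sonorants alike, which is the property that motivated the choice of percentage voicing over VOT.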
A first inspection of the results reveals a clear asymmetry between laterals and rhotics across varieties.

                       Lateral                      Rhotic
Language  Variety      Epenthesis  Voicing          Epenthesis  Voicing
                       (Rate)      (% voiced)       (Rate)      (% voiced)
Spanish   Argentina    0.02        0.99             0.96        0.96
Spanish   Chile        0.05        0.98             0.91        0.80
French    France       0.07        0.94             0.49        0.52
French    Quebec       0.02        0.95             0.48        0.61

Table 3: Cluster simplification strategies in stop-lateral and rhotic clusters.

In lateral clusters, simplification is seemingly absent (i.e. there is neither vowel epenthesis nor devoicing). In rhotic clusters, in contrast, both strategies are at play, albeit at different rates. Epenthesis is almost categorical in both Spanish varieties (Argentine: 96%; Chilean: 91%), whereas it affects only half of the French rhotic tokens¹¹. In contrast, as concerns voicing assimilation, it is in French that the process is the most prevalent. Indeed, the rhotic does not undergo devoicing in Spanish¹², whereas there is a high rate of devoicing in French (France: 48%; Quebec: 39%). These first results show that both strategies of cluster simplification, dissimilatory epenthesis and voicing assimilation, are intimately tied: varieties which permit voicing assimilation are characterized by lower rates of epenthesis. We will demonstrate in the following sections that cluster simplification is also determined by the phonetic realization of stop voicing and the characteristics of the rhotic in each variety.

¹¹ As will be shown in Table 9, which gives values for voiceless versus voiced clusters, epenthesis is restricted almost exclusively to the latter in both French varieties.
¹² This statement is valid for the general purpose of comparing Spanish with French. As we will show in Section 4.2, the rhotic in Chilean Spanish is slightly devoiced in voiceless clusters, with devoicing being directly linked to manner (i.e. fricative realizations).

4.1 Length asymmetries

We will examine here the role played by stop length in the selection of each of the aforementioned simplification strategies. In particular, we will test the hypothesis that differential length in stops (i.e. greater length in voiceless stops vis-à-vis their voiced counterparts) conditions length adjustments in the following sonorant (liquid or vowel). The measures relevant to testing this hypothesis are given in Tables 4 and 5. As concerns the lateral, a clear asymmetric pattern emerges: the lateral is significantly shorter after voiceless stops¹³. It is interesting to point out that the length values for the lateral in both voiceless and voiced environments are similar for three of the four varieties (Argentine Spanish excluded)¹⁴. The lateral in Argentine Spanish is longer, and the difference in length in voiceless versus voiced contexts is greater.

¹³ Based on the results reported by Mendoza, Carballo, Cruz, Fresnada, Muñoz and Marrero (2003) for Spanish, the lateral in clusters is generally shorter than in singletons. However, values after a voiced stop are closer to those reported for singletons, and, thus, we consider that the lateral in voiceless environments undergoes shortening.

                       Lateral                      Rhotic
Language  Variety      [−voice]  [+voice]  Sig.     [−voice]  [+voice]  Sig.
Spanish   Argentina    68        87        *        27        27
Spanish   Chile        52        65        *        28        26        *
French    France       51        62        *        70        52        *
French    Quebec       52        68        *        80        53        *

Table 4: Length (ms) of the lateral and rhotic following voiceless versus voiced stops, and significance level¹⁵.
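The size of the lateral length adjustment can be read off Table 4 by subtracting the duration after voiceless stops from the duration after voiced stops. The sketch below does this for the four varieties; the names `LATERAL_MS` and `shortening` are illustrative and do not come from the original analysis.

```python
# Lateral duration in ms after (voiceless, voiced) stops, copied from Table 4.
LATERAL_MS = {
    "Argentina": (68, 87),
    "Chile": (52, 65),
    "France": (51, 62),
    "Quebec": (52, 68),
}


def shortening(voiceless_ms: int, voiced_ms: int) -> int:
    """Difference in ms between the lateral after voiced and voiceless stops."""
    return voiced_ms - voiceless_ms


# Per-variety difference and the variety with the largest adjustment.
DIFFS = {variety: shortening(*pair) for variety, pair in LATERAL_MS.items()}
LARGEST = max(DIFFS, key=DIFFS.get)

if __name__ == "__main__":
    for variety, diff in DIFFS.items():
        print(f"{variety:10s} {diff:3d} ms")
    print("largest difference:", LARGEST)
```

On these figures the adjustment ranges from 11 ms to 19 ms, with Argentine Spanish showing both the longest laterals and the largest voiceless versus voiced difference.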
This pattern can be attributed to the fact that voiceless and voiced stops in Argentine Spanish seem to be maximally distinguished by duration as shown in Table 5.

Language  Variety      p_         t_    k_         b_       d_   g_
Spanish   Argentina    110  101   100   104  106   63  57   60   63  54
Spanish   Chile        80   73    73    84   76    61  50   51   54  44
French    France       102  94    96    110  91    78  65   56   70  75
French    Quebec       89   75    77    91   82    70  58   59   64  68

Table 5: Spanish and French stop length (ms) in SL clusters.
This asymmetric pattern is interesting even if lateral clusters in these varieties do not undergo assimilation or dissimilation. The fact that the lateral adjusts in length motivates grouping it with vowels, which participate in similar compensatory lengthening in the same contexts in Romance (Fischer-Jørgensen 1968; O’Shaughnessy 1981) and other languages (Chen 1970). Additionally, this length adjustment may account for diachronic cases of cluster simplification. As mentioned in the Introduction, voiceless stop-lateral clusters simplified in the evolution from Latin to different Romance languages, specifically to palatals (laterals, fricatives or affricates), whereas voiced clusters either remained unchanged or underwent deletion of one of the members of the cluster (Lloyd 1993). The greater length of the lateral in voiced environments may have disfavoured assimilation.

The French rhotic, on the other hand, shows the opposite pattern: it is longer in [−voice] environments. This pattern has two different sources. First, the rhotic in voiceless clusters is always devoiced (see Section 4.2 below, especially Table 10). Voiceless fricatives are longer than their voiced counterparts, given the trade-off between voicing and frication (Ohala 1993a, 1997). On the other hand, in order to facilitate voicing, the ideal voiced fricative is short (50 ms for Crystal & House 1988; 71 ms for Stevens, Blumstein, Glicksman, Burton & Kurowski 1992). Second, in French, voiceless stop-rhotic clusters are not broken up by an epenthetic vowel, in contrast to voiced stop-rhotic clusters, for which epenthesis affects 90% or more of tokens (see Table 9)¹⁶. In spite of the longer rhotic in voiceless contexts, which runs contrary to the third hypothesis made earlier, the epenthetic vowel-rhotic sequence in voiced clusters is nonetheless longer than the singleton rhotic in voiceless clusters, as shown in Table 6.

¹⁴ Moreover, the lateral differs significantly in length in voiceless vs voiced environments as indicated by the asterisks.
¹⁵ Here comparisons are made between voiced and voiceless members within each category (laterals and rhotics). Those involving differences significant at the 0.05 level are marked with one asterisk. All statistics are one-way ANOVAs followed by post-hoc Fisher’s LSD tests.

           [−voice]    [+voice]
Variety    Rhotic      Rhotic    Epenthetic vowel    Sum length
France     80          53        36                  89
Quebec     70          52        37                  89

Table 6: Length (ms) of the French rhotic in voiceless clusters and of the epenthetic vowel-rhotic sequence in voiced environments.
The only [−continuant] segment in the sample is the Spanish tap. As illustrated in Table 7, this rhotic is equally long in both voiceless and voiced clusters in Argentine Spanish. In Chilean Spanish, the 2 ms difference is imperceptible yet statistically significant.

             [−voice]                                [+voice]
Variety      Tap   Epenthetic vowel   Sum length     Tap   Epenthetic vowel   Sum length
Argentina    27    29                 56             27    40                 67
Chile        28    20                 48             26    33                 59

Table 7: Length (ms) of the epenthetic vowel and tap in Spanish voiceless and voiced stop-rhotic clusters.
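The "Sum length" columns of Table 7 are simply the tap duration plus the epenthetic vowel duration, and the voicing effect shows up in the vowel rather than in the tap. The following sketch recomputes both quantities from the table; the data structure and function names are assumptions made for illustration.

```python
# Durations in ms copied from Table 7: (tap, epenthetic vowel, reported sum).
TABLE_7 = {
    "Argentina": {"voiceless": (27, 29, 56), "voiced": (27, 40, 67)},
    "Chile": {"voiceless": (28, 20, 48), "voiced": (26, 33, 59)},
}


def sums_are_consistent(table: dict) -> bool:
    """Check that every reported sum equals tap plus epenthetic vowel."""
    return all(
        tap + vowel == total
        for contexts in table.values()
        for tap, vowel, total in contexts.values()
    )


def voicing_effect(table: dict, variety: str) -> tuple:
    """Return (tap difference, vowel difference) in ms, voiced minus voiceless."""
    tap_vl, vowel_vl, _ = table[variety]["voiceless"]
    tap_vd, vowel_vd, _ = table[variety]["voiced"]
    return tap_vd - tap_vl, vowel_vd - vowel_vl


if __name__ == "__main__":
    print(sums_are_consistent(TABLE_7))
    for variety in TABLE_7:
        print(variety, voicing_effect(TABLE_7, variety))
```

In both varieties the tap changes by at most 2 ms across voicing contexts, while the epenthetic vowel is more than 10 ms longer after voiced stops.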
The small difference in Chilean Spanish could be attributed to the higher percentage of fricative variants observed in this variety (Table 8).

Variety      Tap     Approximant   Fricative
Argentina    0.99    0.01          0
Chile        0.91    0.03          0.06

Table 8: Percentage of tap, approximant and fricative realizations in the Spanish varieties under analysis.
Frication in clusters has been described as a common feature of Chilean Spanish (e.g. Lenz 1940). However, fricative rhotics are usually stigmatized (Lipski 1994), which possibly explains the low rate of frication observed in the present sample collected from educated speakers in a formal setting.

¹⁶ Diachronically, the absence of vowel epenthesis in this environment may be the consequence of the lengthening of the rhotic. If vowel epenthesis existed in Latin stop-rhotic clusters as proposed by Jensen (1999:225-226) and the vowel length asymmetry attested in modern Spanish was also present in French historically, then the combination of a shorter vowel and a longer rhotic in voiceless stop clusters may have led to the eventual loss of the vowel.
In summary, laterals in both languages undergo length adjustments which result in their being longer following [+voice] stops. This synchronic asymmetry favouring shortening in voiceless environments mirrors diachronic simplification observed in the evolution of voiceless obstruent-liquid clusters from Latin to Spanish (Lloyd 1993; Penny 2002). Length adjustments in rhotics, on the other hand, are tied to their phonetic realization. If the rhotic is realized as a tap, as in Spanish, there is no variation in length. If it is realized as a fricative, as in French, a length asymmetry is observed, although different from the one just reported for laterals. Specifically, rhotics are shorter in [+voice] environments.

4.2 Voicing asymmetries

Having examined one of the phonetic correlates of voicing, namely length, in the previous section, we will now present results relevant to the second correlate under study, namely laryngeal voicing. Recall from Section 4 that vowel epenthesis is restricted to rhotic clusters in all four Romance varieties under investigation. As was demonstrated in Section 4.1, in Spanish, epenthesis is categorical with the only asymmetry between voiceless and voiced clusters involving the length of the epenthetic vowel. In French, on the other hand, the lower rate of epenthesis reported in Table 3 can be explained with reference to the data in Table 9. As we can see, epenthesis is quasi-categorical in voiced clusters, whereas there is virtually no epenthesis in voiceless clusters in either variety.

                       % epenthesis
Language  Variety      [−voice]   [+voice]
Spanish   Argentina    0.94       0.98
Spanish   Chile        0.85       0.97
French    France       0.00       0.97
French    Quebec       0.06       0.90

Table 9: Rate of epenthesis in Spanish and French voiceless and voiced clusters.
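The relationship between the collapsed rhotic epenthesis rates in Table 3 and the split rates in Table 9 can be checked with a simple average. The sketch below assumes, purely for illustration, that voiceless and voiced clusters contribute equal numbers of tokens in each variety; the exact token counts per condition are not given in the text, so small discrepancies are to be expected.

```python
# Epenthesis rates in (voiceless, voiced) stop-rhotic clusters, from Table 9.
TABLE_9 = {
    "Argentina": (0.94, 0.98),
    "Chile": (0.85, 0.97),
    "France": (0.00, 0.97),
    "Quebec": (0.06, 0.90),
}

# Collapsed rhotic epenthesis rates, from Table 3.
TABLE_3_RHOTIC = {"Argentina": 0.96, "Chile": 0.91, "France": 0.49, "Quebec": 0.48}


def pooled_rate(voiceless: float, voiced: float) -> float:
    """Unweighted mean of the two rates (assumes equal token counts)."""
    return (voiceless + voiced) / 2


def max_discrepancy() -> float:
    """Largest absolute gap between the pooled rate and the Table 3 value."""
    return max(
        abs(pooled_rate(*TABLE_9[variety]) - TABLE_3_RHOTIC[variety])
        for variety in TABLE_9
    )


if __name__ == "__main__":
    for variety, rates in TABLE_9.items():
        print(f"{variety:10s} pooled={pooled_rate(*rates):.3f} "
              f"table3={TABLE_3_RHOTIC[variety]:.2f}")
```

Under that assumption the pooled values stay within about one percentage point of Table 3, which illustrates that the French rate of roughly one half arises from near-categorical epenthesis in voiced clusters combined with near-absence in voiceless ones.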
Cluster-medial epenthesis can be motivated both articulatorily and perceptually. In Spanish, stop-rhotic clusters involve a sequence of two closures, which is cross-linguistically marked. In addition, perceptual experiments have demonstrated that the presence of the vowel is necessary for the identification of the rhotic (Guirao & García Jurado 1991). Epenthesis in French voiced clusters is also articulatorily motivated, since the insertion of a cluster-medial vowel creates a VCV environment which facilitates voicing in the fricative rhotic.

Table 10 gives the measures relevant to determining the extent of the second strategy, namely voicing assimilation in rhotic clusters. There is no significant devoicing of the Argentine Spanish tap. In Chilean Spanish, there is slight devoicing, which is probably related to the presence of some fricative realizations¹⁷. In French, on the other hand, the percentage of devoicing is significantly higher, especially in the European variety (France: 92% devoicing; Quebec: 70%). This assimilation asymmetry may once again be articulatorily motivated. As was mentioned previously, maintenance of voicing in the rhotic is facilitated in the intervocalic environment created via epenthesis. In voiceless clusters, universal constraints on voicing override the underlying specification of the rhotic, and devoicing occurs (e.g. Maddieson 1984; Ohala 1989, 1993b). As a consequence, affrication takes place in dorsal clusters (e.g. crée /ke/→[ke]). Figure 3 illustrates such a case, where the rhotic is fully devoiced and the stop becomes shorter¹⁸.

                       % voicing
Language  Variety      [−voice]   [+voice]
Spanish   Argentina    0.95       0.97
Spanish   Chile        0.74*      0.86*
French    France       0.08*      0.90*
French    Quebec       0.30*      0.92*

Table 10: Percentage of voicing of the rhotic in stop-rhotic clusters. The asterisk indicates p < 0.05.
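The devoicing percentages cited for French are the complement of the voicing proportions reported for voiceless clusters in Table 10. A minimal sketch of that conversion follows; the function name `devoicing_percent` and the dictionary are illustrative only.

```python
# Proportion of the rhotic that is voiced in voiceless clusters, from Table 10.
VOICING_AFTER_VOICELESS = {
    "Argentina": 0.95,
    "Chile": 0.74,
    "France": 0.08,
    "Quebec": 0.30,
}


def devoicing_percent(voiced_proportion: float) -> int:
    """Convert a voicing proportion into a devoicing percentage."""
    return round(100 * (1 - voiced_proportion))


DEVOICING = {
    variety: devoicing_percent(proportion)
    for variety, proportion in VOICING_AFTER_VOICELESS.items()
}

if __name__ == "__main__":
    for variety, percent in DEVOICING.items():
        print(f"{variety:10s} {percent:3d}% devoiced")
```

The resulting ordering (Argentina, Chile, Quebec, France) lines up with the progression from an almost exclusively tapped rhotic to a dorsal fricative rhotic.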
Figure 3: Affrication in French voiceless stop-rhotic clusters (target crée “creates (3PS)”).
To our knowledge, previous experimental studies (e.g. Rochette 1973; O'Shaughnessy 1982) have not commented on such affrication. We propose that this synchronic pattern in French can be extended to account for a similar asymmetry in some Spanish varieties and used to explain the development of affricate segments in several Romance languages in the course of the evolution from Latin. If this is indeed correct, we hypothesize that evolution of Spanish or Portuguese affricates and fricatives from Latin started with voiceless clusters, most likely velars, where the lateral is shorter and where the possibility of devoicing via assimilation is higher. In voiced clusters, on the other hand, the lateral is longer and the possibility of simplification is reduced.

¹⁷ Fricative realizations in Chilean Spanish are voiced for 37% of their duration on average, whereas taps are voiced for 98% of their duration.
¹⁸ Our classification of these segments as affricates, at this moment, is mostly impressionistic, since we have not (i) measured the rise-time (Gerstman 1957); nor (ii) compared rhotic duration in singletons and voiceless clusters. As for the length of the stop, recall from Table 5 that voiceless stops are shorter in stop-rhotic clusters than in stop-lateral clusters.

5. Formalizing the asymmetries

The results presented here indicate that the length of the liquid varies as a function of the length of the preceding stop. In stops, length is one of the primary phonetic correlates of phonemic voicing; voiceless stops are longer than their voiced counterparts. Liquids, on the other hand, do not contrast for voicing in Romance and, thus, length adjustments can only be motivated by timing, that is, by the need to maintain isochrony. The exact nature of this isochronic pattern, however, deserves further explanation. Although Spanish and French have often been described as syllable-timed languages (Laver 1994:156 for Spanish; O’Shaughnessy 1981; Léon 1992 for French), there is evidence suggesting that syllables of equal length are rarely (if ever) produced (Navarro Tomás 1970; Canellada & Kuhlmann Madsen 1987 for Spanish; Astésano 2001 for French). A possible alternative to syllable-time isochrony would be to consider the relative timing of the consonantal and vocalic gestures, since it has been shown that consonant gestures are timed relative to the period between two successive vowels (Tuller & Kelso 1984). However, obtaining syllable length measurements, as well as the interval between successive vowels, exceeds the scope of this paper. Moreover, although a subset of palatal laterals (Spanish and French) and post-alveolar fricatives (Portuguese) derive from stop-lateral sequences, stop-lateral simplification is not a productive strategy synchronically. As such, the formalization of the length asymmetry will not be discussed further here.

As concerns the formalization of the voicing asymmetry, vowel epenthesis does indeed vary among varieties and between languages and environments. It is, however, unclear whether its source is primarily phonetic or phonological. Sequences of two syllable-initial occlusions are difficult to pronounce.
Moreover, the Spanish tap, being a short occlusion (its average length is barely above the perceptual threshold; i.e. 20 ms) requires an intervocalic environment to be identified (Guirao & García Jurado 1991). However, the epenthetic vowel attested in the four Romance varieties does not fully resemble intrusive vowels described elsewhere. For example, Hall (2003:1) lists five properties typically associated with such vowels. These include (i) intrusion being triggered by consonant clusters involving a sonorant; (ii) the vowel having the same quality as the vowel following the sonorant; (iii) intrusion usually being restricted to heterorganic clusters; (iv) the vowel behaving as if it does not add a syllable to the word; and (v) in many languages, the disappearance of the vowel at fast speech rates. As concerns the Spanish and French data here, only criteria (i) and (iv) are satisfied. In Colantoni & Steele (2005b), we demonstrate that, while there is an influence of the preceding and following vowels on the formant structure of the cluster-medial vowel in Argentine Spanish and Quebec French, the epenthetic vowel is not a copy of either of these vowels contra (ii). As concerns criterion (iii), epenthesis occurs with homorganic clusters (coronal-rhotic in Spanish; dorsal-rhotic in French) as well as heterorganic clusters. Finally, when obtaining the data for Quebec French, an eleventh speaker was excluded from the sample due to his extremely fast speech rate, yet this speaker’s production of the epenthetic vowel resembles the other Quebec French speakers reported here.

The conflicting phonetic evidence raises the question whether the process may have phonological status. While the epenthetic vowel does not participate in most phonological processes, such as stress assignment in Spanish (see Bradley 2005) or schwa deletion in French (see Colantoni & Steele 2005b), it is observed in morphological alternations (e.g. Spanish chacra /taka/ “farm” - chacarero /takaeo/ “farmer”)¹⁹. Consequently, while not arguing for its phonological reality, we will formalize the asymmetries analyzed above using the Markedness and Faithfulness constraints in (1) and (2), respectively. Such formalization is useful in that the interaction of the two families of phonological constraints mirrors the interplay of various phonetic parameters argued for here.
(1) Markedness constraints
    AGREE(VCE)       Consonant clusters must agree in voicing
    OCP(CONT)²⁰      Consonant clusters must differ in [±cont]
    *VCE∩FRIC(C-)    Avoid voiced fricatives post-consonantly

(2) Faithfulness constraints
    DEP(IO)          Output segments must have input correspondents (no epenthesis)
    IDENT(VCE)       Output segments must have the same specification for [±voice] as their input correspondents
The ranking of these constraints in Spanish and French is given in (3) 21 .
¹⁹ See Malmberg (1965) for further examples. In addition, see Jensen (1999) for examples documenting this alternation in Latin.
²⁰ As stated earlier, we focus on onset clusters alone here. In both Spanish and French, this constraint must be position-sensitive, as stop-fricative and fricative-stop clusters are attested heterosyllabically (e.g. Spanish subsistir [sub.sis.ti] “survive”, esclavo [es.kla.o] “slave”; French subvenir [syb.v.ni] “to attend to”, esclave [es.klav] “slave”).
(3) Spanish and French rankings
    Spanish:  *VCE∩FRIC(C-), AGREE(VCE), OCP(CONT) » DEP(IO), IDENT(VCE)
    French:   *VCE∩FRIC(C-), AGREE(VCE) » DEP(IO) » IDENT(VCE), OCP(CONT)
There exist two differences in ranking between the two languages. The first difference concerns the relative ranking of the Faithfulness constraints DEP(IO) and IDENT(VCE). Whereas these constraints are unranked with respect to each other in Spanish, DEP(IO) dominates IDENT(VCE) in French. It is this difference in ranking which accounts for the lack of epenthesis in French voiceless rhotic clusters. The second difference involves the low ranking of the Markedness constraint OCP(CONT) in French.²² The evaluation of the voiced SL clusters in the two languages is given in Tables 11 and 12, illustrated with Spanish /an/ and French /o/ respectively.

/an/                          *VCE∩FRIC(C-)   AGREE(VCE)   OCP(CONT)   DEP(IO)   IDENT(VCE)
   [an] (faithful)                                         *!
   [an] (devoiced tap)                        *            *!                    *
☞  [an] (epenthetic vowel)                                             *
Table 11: Evaluation of Spanish voiced stop-rhotic clusters.
In Spanish, the totally faithful candidate [an] as well as the candidate [an] in which the tap is devoiced violate the high-ranked OCP constraint against two consecutive segments with the same specification for [continuant].²³ The second candidate also fails to satisfy AGREE(VCE) given the difference in voicing between the stop and tap. These candidates are consequently eliminated. The winning candidate [an], which involves a cluster-medial epenthetic vowel, only violates low-ranked DEP(IO).
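The evaluation in Table 11 can be reproduced mechanically. The following is a minimal sketch, not part of the original analysis: it assumes that constraints within one stratum of (3) are mutually unranked and that their violations may therefore be pooled, and it compares candidates stratum by stratum. The candidate labels (`faithful`, `devoiced tap`, `epenthesis`) and the function names are hypothetical stand-ins introduced only for illustration.

```python
# Strata of the Spanish ranking in (3), highest-ranked first. Constraints in
# the same tuple are treated as mutually unranked (an assumption of this
# sketch), so their violation marks are pooled.
SPANISH_RANKING = [
    ("*VCE∩FRIC(C-)", "AGREE(VCE)", "OCP(CONT)"),
    ("DEP(IO)", "IDENT(VCE)"),
]

# Violation marks per candidate, as described for Table 11. The labels are
# descriptive placeholders for the three candidate transcriptions.
TABLE_11 = {
    "faithful": {"OCP(CONT)": 1},
    "devoiced tap": {"AGREE(VCE)": 1, "OCP(CONT)": 1, "IDENT(VCE)": 1},
    "epenthesis": {"DEP(IO)": 1},
}


def profile(violations, ranking):
    """Return one pooled violation count per stratum, top stratum first."""
    return tuple(
        sum(violations.get(constraint, 0) for constraint in stratum)
        for stratum in ranking
    )


def optimal(candidates, ranking):
    """Return the candidate(s) whose profile is lexicographically smallest."""
    profiles = {name: profile(marks, ranking) for name, marks in candidates.items()}
    best = min(profiles.values())
    return [name for name, prof in profiles.items() if prof == best]


if __name__ == "__main__":
    for name, marks in TABLE_11.items():
        print(name, profile(marks, SPANISH_RANKING))
    print("optimal:", optimal(TABLE_11, SPANISH_RANKING))
```

Under these assumptions the epenthesis candidate is the only one with no marks in the top stratum, which matches the outcome described for Table 11.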
²¹ While there seems to be no evidence indicating that *VCE∩FRIC(C-) is active in Spanish, it is nonetheless present in the ranking given constraint universality and is thus included here.
²² Were SL clusters the sole consideration, OCP(CONT) could be ranked identically in both languages. However, an analysis of French voiceless fricative-rhotic clusters such as frais /f/ “fresh” shows that this cannot be the case. Were OCP(CONT) to dominate the Faithfulness constraints DEP(IO) and IDENT(VCE), one would predict epenthesis in such clusters. However, as shown in Colantoni & Steele (2005b), these latter clusters pattern with voiceless stop-rhotic clusters involving assimilation (i.e. [f], *[f]).
²³ Recall from Section 2 that we assume the voiced stops to be [-continuant].
Epenthesis is also optimal in voiceless clusters, as [p, t, k] violate OCP(CONT) in the same manner as their voiced counterparts. Whereas OCP(CONT) plays a central role in determining SL cluster well-formedness in Spanish, this is not the case in French, where both the rhotic and lateral differ in continuancy vis-à-vis the stop. In this language, it is rather the high-ranked conjoined constraint *VCE∩FRIC(C-), which formalizes the double markedness of post-obstruent voiced fricatives, that is responsible for vowel epenthesis in voiced clusters versus assimilation in voiceless clusters; Table 12 illustrates the candidate evaluation for the former. The first candidate [o], which involves a voiced rhotic fricative following a stop, is eliminated since it violates the conjoined constraint. The second candidate, in which the rhotic is devoiced and thus satisfies *VCE∩FRIC(C-), nonetheless violates AGREE(VCE). As was the case in Spanish, the winning candidate [o] only violates lower-ranked DEP(IO).

/o/                           *VCE∩FRIC(C-)   AGREE(VCE)   DEP(IO)   IDENT(VCE)   OCP(CONT)
   [o] (faithful)             *!
   [o] (devoiced rhotic)                      *!                     *
☞  [o] (epenthetic vowel)                                  *
Table 12: Evaluation of French voiced stop-rhotic clusters.
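The effect of ranking DEP(IO) above IDENT(VCE) in French can be illustrated with a short sketch. The voiced tableau below copies the marks described for Table 12. The voiceless tableau is not given in the text and is an inference made here for illustration only: it assumes an underlyingly voiced rhotic, so that the faithful candidate violates both *VCE∩FRIC(C-) and AGREE(VCE) while the devoiced candidate violates only IDENT(VCE). All identifiers are hypothetical.

```python
# French ranking from (3): three strata, highest-ranked first.
FRENCH_RANKING = [
    ("*VCE∩FRIC(C-)", "AGREE(VCE)"),
    ("DEP(IO)",),
    ("IDENT(VCE)", "OCP(CONT)"),
]

# Marks as described for Table 12 (voiced stop + rhotic).
VOICED_CLUSTER = {
    "faithful": {"*VCE∩FRIC(C-)": 1},
    "devoiced rhotic": {"AGREE(VCE)": 1, "IDENT(VCE)": 1},
    "epenthesis": {"DEP(IO)": 1},
}

# Inferred marks for a voiceless stop + rhotic (assumption of this sketch,
# not a tableau from the source).
VOICELESS_CLUSTER = {
    "faithful": {"*VCE∩FRIC(C-)": 1, "AGREE(VCE)": 1},
    "devoiced rhotic": {"IDENT(VCE)": 1},
    "epenthesis": {"DEP(IO)": 1},
}


def violation_profile(marks, ranking):
    """Pooled violation counts per stratum; lists compare lexicographically."""
    return [sum(marks.get(c, 0) for c in stratum) for stratum in ranking]


def winner(tableau, ranking):
    """Return the candidate with the smallest profile (first one on a tie)."""
    return min(tableau, key=lambda cand: violation_profile(tableau[cand], ranking))


if __name__ == "__main__":
    print("voiced cluster:   ", winner(VOICED_CLUSTER, FRENCH_RANKING))
    print("voiceless cluster:", winner(VOICELESS_CLUSTER, FRENCH_RANKING))
```

On these assumptions, epenthesis wins in the voiced cluster because devoicing incurs a top-stratum AGREE(VCE) mark, whereas in the voiceless cluster devoicing violates only IDENT(VCE), which DEP(IO) dominates.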
In summary, in both languages, epenthesis in voiced clusters results from high-ranking markedness constraints on voicing in obstruent-obstruent sequences. Epenthesis in these clusters allows for the faithful realization of underlying rhotic voicing in an environment in which voicing is articulatorily favoured.

6. Voicing-dependent asymmetries and cluster simplification
In the present work, we have shown that the choice of cluster simplification strategy, whether by assimilation or dissimilation, is correlated with both stop voicing and the manner of the liquid in all four Romance varieties under analysis. As concerns the specific phonetic correlates of phonemic voicing analyzed here, we have demonstrated that asymmetrical stop length (specifically, greater length in the voiceless member of the pair) results in a length adjustment in the epenthetic vowel in Spanish, with the vowel being shorter in voiceless clusters. Laterals in all varieties behave like the epenthetic vowel, being shorter following voiceless stops. In this way, laterals mirror vowels and hence pattern as good sonorants; this constitutes further evidence that the lateral is the prototypical liquid in Romance (Colantoni & Steele 2005a). Length adjustments in rhotics, on the other hand, depend on the phonetic realization of the rhotic in each of the varieties under study. In Argentine Spanish, where the rhotic is consistently realized as a tap, there are no compensatory effects, as opposed to Chilean Spanish, in which there is a small yet statistically significant effect due to incipient variation in manner. In French, finally, there is a length asymmetry in
rhotics. The difference occurs in the opposite direction to what was predicted; indeed, the rhotic is longer in voiceless environments. We have argued that this is motivated by aerodynamic constraints on the production of fricatives. Laryngeal voicing, the second phonetic correlate under analysis, also motivated the asymmetries observed. Neither voicing assimilation nor epenthesis was documented here with laterals, which are inherently and easily voiced segments. Rhotics, once again, exhibit a more complex pattern. In Argentine Spanish, there is no devoicing of the short tap. In Chilean Spanish, the rhotic is slightly devoiced with a concomitant lower rate of epenthesis in voiceless contexts. In French, rhotic devoicing and epenthesis are in complementary distribution: the former is observed in voiceless clusters, whereas the latter is quasi-categorically present in voiced clusters. Additionally, the current experimental study documents the importance of analyzing phonetic correlates of voicing in order to account for phonetic and possibly phonological variation in the evolution from Latin to Romance, where voiceless stops in clusters became voiced (Jensen 1999:225-226). The analysis of the phonetic realization of voicing in stops and manner in rhotics provides a better account of the variable asymmetric behavior of laterals and rhotics in voiced and voiceless clusters, as it ties such variation to the interaction of multiple phonetic parameters. It has also shown that the Romance tendency towards CV syllables observed in SL clusters is variable, being conditioned by the nature of the rhotic and the degree of voicing of the stop. Finally, we have observed that the inter- and intra-linguistic variation can be adequately formalized within an OT framework. The present study, however, has only looked in detail at the phonetic correlates of voicing. 
In order to strengthen the present claims in the future, it will be necessary to investigate other parameters including timing. Nonetheless, the present research demonstrates the important contribution that the study of phonetic parameters of phonological features may make to more accurately characterizing variable phenomena. Finally, it contributes to furthering our understanding of the role of voicing in cluster simplification in Romance more generally.
References
Argüello, Fanny. 1978. El dialecto zeísta en el Ecuador: Un estudio fonético y fonológico. [The zeist dialect in Ecuador: A phonetic and phonological study]. PhD diss., Pennsylvania State University.
Astésano, Corine. 2001. Rythme et accentuation en français: Invariance et variabilité stylistique. Paris: L’Harmattan.
Borzone de Manrique, Ana María & Jorge Gurlekian. 1980. “Rasgos acústicos de las consonantes oclusivas españolas”. Fonoaudiológica 26. 326-330.
Bradley, Travis. 2005. “Systemic markedness and phonetic detail in phonology”. Experimental and Theoretical Approaches to Romance Linguistics ed. by Randall S. Gess & Edward J. Rubin. 41-62. Amsterdam & Philadelphia: John Benjamins.
Canellada, María Josefa & John Kuhlmann Madsen. 1987. Pronunciación del español. Madrid: Castalia.
Chen, M. 1970. “Vowel length variation as a function of the voicing of consonant environment”. Phonetica 22. 129-159.
Colantoni, Laura & Jeffrey Steele. 2005a. “Liquid asymmetries in French and Spanish”. Toronto Working Papers in Linguistics 24. 1-14.
----------. 2005b. “Phonetically-driven epenthesis asymmetries in French and Spanish obstruent-liquid clusters”. Experimental and Theoretical Approaches to Romance Linguistics ed. by Randall S. Gess & Edward J. Rubin. 77-96. Amsterdam & Philadelphia: John Benjamins.
Côté, Marie-Hélène. 2004. “Consonant cluster simplification in Quebec French”. Probus 16. 151-201.
Crystal, Thomas H. & Arthur S. House. 1988. “Segmental durations in connected-speech signals: Current results”. Journal of the Acoustical Society of America 83. 1553-1573.
Fischer-Jørgensen, Eli. 1968. “Voicing, tenseness, and aspiration in stop consonants with special reference to French and Danish”. Annual Report of the Institute of Phonetics of the University of Copenhagen 3. 136-164.
Gerstman, Louis J. 1957. Perceptual dimensions for the frication portions of certain speech sounds. PhD diss., New York University.
Guirao, Miguelina & Ma. Amalia García Jurado. 1991. “Los perfiles acústicos y la identificación de /l/ y /r/”. Revista Argentina de Lingüística 7. 21-42.
Goldsmith, John. 1981. “Subsegmentals in Spanish phonology: An autosegmental approach”. Linguistic Symposium on Romance Languages 9 ed. by William Cressey & Donna Jo Napoli. 1-16. Washington D.C.: Georgetown University Press.
Hall, Nancy. 2003. Gestures and segments: Vowel intrusion as overlap. PhD diss., University of Massachusetts, Amherst.
Hall, Robert Jr. 1976. Proto-Romance Phonology. New York: Elsevier.
Harris, James. 1969. Spanish Phonology. Cambridge, Mass.: The MIT Press.
Inouye, Susan. 1995. Trills, taps and stops in variation and contrast. PhD diss., University of California, Los Angeles.
Jacques, Benoît & Jorge Gurlekian. 1982. “Étude comparée de quatre paramètres acoustiques des occlusives en espagnol de Buenos Aires et en français de Montréal”. Revue Québécoise de Linguistique 22. 257-272.
Jensen, Frede. 1999. A Comparative Study of Romance. New York: Peter Lang Publishing.
Jessen, Michael. 2001. “Phonetic implementation of the distinctive auditory features [voice] and [tense] in stop consonants”. Distinctive Feature Theory ed. by Allan Hall. 237-294. Berlin: Mouton de Gruyter.
Laver, John. 1994. Principles of Phonetics. Cambridge: Cambridge University Press.
Lenz, Rodolfo. 1940. “Estudios chilenos. Fonética del Castellano de Chile”. El español en Chile ed. by Amado Alonso & Raimundo Lida, vol. VI, 79-208. Buenos Aires: Universidad de Buenos Aires.
Léon, Pierre R. 1992. Phonétisme et prononciations du français. Paris: Nathan.
Lipski, John. 1994. Latin American Spanish. New York: Longman.
Lloyd, Paul. 1993. Del latín al español. Madrid: Gredos.
Maddieson, Ian. 1984. Patterns of Sound. Cambridge: Cambridge University Press.
Malmberg, Bertil. 1965. Estudios de fonética hispánica. Madrid: CSIC.
Martínez Celdrán, Eugenio. 1991. “Duración y tensión en las oclusivas no iniciales del español: un estudio perceptivo”. Revista Argentina de Lingüística 7. 51-71.
----------. 2006. “Algunas quimeras en la fonética del español”. Paper presented at 3rd Conference on Laboratory Approaches to Spanish Phonology, Toronto.
Mascaró, Joan. 1984. “Continuant spreading in Basque, Catalan and Spanish”. Language Sound Structure ed. by Mark Aronoff & Richard Oehrle. 287-298. Cambridge, Mass.: The MIT Press.
Mendeloff, Henry. 1969. A Manual of Comparative Romance Linguistics. Phonology and Morphology. Washington D.C.: The Catholic University of America Press.
Mendoza, E., G. Carballo, A. Cruz, M.D. Fresnada, J. Muñoz & V. Marrero. 2003. “Temporal variability in speech segments of Spanish: Context and speaker related differences”. Speech Communication 40. 431-447.
Möbius, Bernd. 2004. “Corpus-based investigations on the phonetics of consonant voicing”. Folia Linguistica 38. 5-26.
Navarro Tomás, Tomás. 1970. Manual de pronunciación española. Madrid: CSIC.
Ohala, John. 1989. “Sound change is drawn from a pool of synchronic variation”. Language Change: Contribution to the Study of its Causes ed. by Leiv Breivik & Ernst Jahr. 173-198. Berlin: Mouton de Gruyter.
----------. 1993a. “The origin of sound patterns in vocal tract constraints”. The Production of Speech ed. by Peter MacNeilage. 189-216. New York: Springer-Verlag.
----------. 1993b. “The phonetics of sound change”. Historical Linguistics: Problems and Perspectives ed. by Charles Jones. 237-278. London: Longman.
----------. 1997. “Aerodynamics of Phonology”. Proceedings of the 4th Seoul International Conference in Linguistics. 92-97. Seoul, Korea: Korea University.
O'Shaughnessy, Douglas. 1981. “A study of French vowel and consonant duration”. Journal of Phonetics 9. 385-406.
----------. 1982. “A study of French spectral patterns for synthesis”. Journal of Phonetics 10. 377-399.
Penny, Ralph. 2002. A History of the Spanish Language. Cambridge: Cambridge University Press.
Pérez, Hernán Emilio. 1998. “Incidencia de dos rasgos acústicos en la percepción de la correlación /p-t-k/ vs /b-d-g/”. Revista de Lingüística Teórica y Aplicada 36. 113-126.
Rochette, Claude-E. 1973. Les groupes de consonnes en français: Étude de l'enchaînement articulatoire à l'aide de la radiocinématographie et de l'oscillographie. Quebec: Les presses de l'Université Laval.
Romero, Joaquín. 1995. Gestural organization in Spanish. An experimental study of spirantization and aspiration. PhD diss., University of Connecticut.
Rosner, Burton S., Luis E. López-Bascuas, José E. García-Albea & Richard P. Fahey. 2000. “Voice-onset times for Castilian Spanish initial stops”. Journal of Phonetics 28. 217-224.
Santagada, Miguel & Jorge Gurlekian. 1989. “Spanish voiced stops in VCV contexts: Are they fricative variants or approximants?” Revue de Phonétique Appliquée 91-93. 363-375.
Stevens, Kenneth, Sheila Blumstein, Laura Glicksman, Martha Burton & Kathleen Kurowski. 1992. “Acoustic and perceptual characteristics of voicing in fricatives and fricative clusters”. Journal of the Acoustical Society of America 91. 2979-3000.
Tuller, Betty & J.A. Scott Kelso. 1984. “The timing of articulatory gestures: Evidence for relational invariants”. Journal of the Acoustical Society of America 76. 1030-1036.

Appendix: Stimuli²⁴
Word shape  Liq  Labial                              Coronal                 Dorsal
                 p           b           f           t           d           k           g
CLV         [l]  plan        bla         flor        *           *           clan        *
                 [plan]      [bla]       [flo]                               [klan]
            [ɾ]  *           *           frac        tras        *           *           gran
                                         [fak]       [tas]                               [an]
CLV.'CV     [l]  plegó       blasón      floté       *           *           clavé       *
                 [pleo]      [blason]    [flote]                             [klae]
            [ɾ]  prevé       bramó       frenó       traté       drogó       creyó       grabé
                 [pee]       [bamo]      [feno]      [tate]      [doo]       [keo]       [ae]
CV.'CLV     [l]  soplé       doblá       *           *           *           *           *
                 [sople]     [dola]
            [ɾ]  *           cobrá       sufrí       *           podré       lucrá       lográ
                             [koa]       [sufi]                  [poe]       [luka]      [loa]
'CV.CLV     [l]  sopla       tabla       *           *           *           tecla       regla
                 [sopla]     [tala]                                          [tekla]     [rela]
            [ɾ]  lepra       sobra       cifra       letra       sidra       sacra       negra
                 [lepa]      [soa]       [sifa]      [leta]      [sia]       [saka]      [nea]
'CLV.CV     [l]  plato       bledo       flaco       *           *           claro       globo
                 [plato]     [bleo]      [flako]                             [klao]      [loo]
            [ɾ]  prado       brazo       frase       traje       drama       crema       grave
                 [pao]       [baso]      [fase]      [taxe]      [dama]      [kema]      [ae]
Table 13: Spanish SL stimuli.

²⁴ In the case of Spanish, the presence of an asterisk does not necessarily imply a gap in the language. Rather it most often simply indicates the absence of a cluster type from the stimuli set. For example, in order to control the number of variables and to facilitate between-language comparison, an effort was made to include only words containing mid and low vowels. As such, clusters followed by high vowels were kept to a minimum.

Word shape  Liq  Labial                              Coronal                 Dorsal
                 p           b           f           t           d           k           g
CLV         [l]  plat        blé         flot        *           *           clé         glas
                 [pl]        [ble]       [flo]                               [kle]       [l]
            [ʁ]  pré         bras        frais       très        drap        crée        gras
                 [pe]        [b]         [f]         [t]         [d]         [ke]        []
CLV.'CV     [l]  placer      blaguer     flatter     *           *           classer     glacé
                 [plase]     [blae]      [flate]                             [klase]     [lase]
            [ʁ]  précis      bravo       frapper     tracer      drapeau     craquer     gratter
                 [pesi]      [bavo]      [fape]      [tase]      [dapo]      [kake]      [ate]
CV.'CLV     [l]  couplet     doubler     siffler     *           *           boucler     régler
                 [kupl]      [duble]     [sifle]                             [bukle]     [ele]
            [ʁ]  mépris      cobra       coffret     vitré       poudrer     sucré       degré
                 [mepi]      [kba]       [kf]        [vite]      [pude]      [syke]      [de]
CV.CLV.'CV  [l]  déplacer    déblayer    refléter    *           *           déclarer    déglacer
                 [deplase]   [debleje]   [flete]                             [deklae]    [delase]
            [ʁ]  soprano     librairie   naufragé    détraquer   redresser   décrasser   dégrader
                 [spano]     [libei]     [nofae]     [detake]    [dese]      [dekase]    [deade]
Table 14: French SL stimuli.
THE PHONETICS AND PHONOLOGY OF INTONATIONAL PHRASING IN ROMANCE*
SÓNIA FROTA¹, MARIAPAOLA D’IMPERIO², GORKA ELORDIETA³, PILAR PRIETO⁴ & MARINA VIGÁRIO⁵
¹Universidade de Lisboa, ²Laboratoire Parole et Langage – CNRS, ³Euskal Herriko Unibertsitatea, ⁴ICREA & Universitat Autònoma de Barcelona, ⁵Universidade do Minho
Abstract
This paper examines the phonetics and phonology of intonational boundaries in five Romance languages/varieties. A typology of the boundary cues used is given, as well as their relative frequency. The phonology of the tonal boundary gesture is described by means of the inventory of nuclear accents used plus their possible combinations with the two dominant end contours: continuation rise (H) and sustained pitch (!H). A detailed analysis of the phonetics of the H boundary tone, which is the main boundary cue observed across these languages, is provided. This involved assessing the impact on H scaling of nuclear accent choice, phrase length and first peak height. Overall, it is shown that the variation found consistently groups these languages in two sets: the Catalan-Spanish group and the Italian-European Portuguese group.
* This research was funded by the Onset-Centro de Estudos da Linguagem (Onset-CEL) da Universidade de Lisboa (as part of the SILC Project), the Centro de Estudos Humanísticos da Universidade do Minho (CEHUM), and the following research grants: 9/UPV00033.13013888/2001 from the University of the Basque Country, 2002XT-00032 and 2001SGR 00150 from the Generalitat de Catalunya and HUM2006-01758/FILO from the Ministry of Science and Technology of Spain, and ACI 0220244 from the French Ministry of Research. We are grateful to the audience of the II PaPI Conference held at the Universitat Autònoma de Barcelona, especially to Bob Ladd and José Ignacio Hualde, and to two anonymous reviewers.

1. Introduction
Intonational phrasing in Romance has been the topic of recent research conducted within the Romance languages intonational phrasing project (Elordieta, Frota, Prieto & Vigário 2003; Elordieta, Frota & Vigário 2005; D’Imperio, Elordieta, Frota, Prieto & Vigário 2005; Prieto 2005, 2007; Frota & Vigário in press). The three main goals of this project are to establish the patterns of placement of intonational boundaries, to determine the influence of syntactic and prosodic factors on boundary placement, and to describe the phonetics and phonology of intonational boundaries. To attain these goals, intonational phrasing has been studied on a corpus of laboratory speech which
was designed to be comparable across languages—the Romance Languages Database (RLD). The present paper emerges from this research project and is intended to address the third goal, in other words, to determine how intonational boundaries are realized in Central Catalan, in two varieties of European Portuguese (Standard and Northern EP), in Neapolitan Italian, and in Central Peninsular Spanish. Our approach to intonation is couched within the autosegmental metrical theory (see Pierrehumbert 1980; Beckman & Pierrehumbert 1986; Ladd 1996, among the landmarks in the development of this theoretical model). For all the languages under observation, the utterances were prosodically annotated using language-specific versions of a ToBI-like transcription (see Price, Ostendorf, Shattuck-Hufnagel & Fong 1991; Beckman & Ayers 1994), which took into account the work based on the autosegmental metrical model as applied respectively to each language (e.g. for Catalan, Prieto 1995; Estebas-Vilaplana 2000; Prieto, D’Imperio, Elordieta, Frota & Vigário 2006; for European Portuguese, Vigário 1998; Grønnum & Viana 1999; Frota 2000, 2002a, 2002b; Vigário & Frota 2003; for Italian, D’Imperio, 2000, 2001, 2002; and for Spanish, Prieto, van Santen & Hirschberg 1995; Prieto, Shih & Nibert 1996; Prieto 1998; Sosa 1999; Nibert 2000; Beckman, Díaz-Campos, McGory & Morgan 2002; Face 2002; Hualde 2002; and McGory & Díaz-Campos 2002, among others). Thus the account of the phonetics and phonology of intonational phrasing we provide is necessarily informed and constrained by the tenets of the approach we have adopted. Using the same framework for describing intonation makes cross-language comparisons possible, and work on other languages has shown that languages may differ not only in the phonology (i.e. the inventory of tones and their permitted combinations) but also in the phonetics (i.e. the realization of tones) of intonational boundaries. 
For example, phonological differences between German and English question intonation are described in Ladd (1996), namely, the use of H* LH% and L*H LH%, respectively, and phonetic differences between German and British English in the way they exploit the phonetic space of H% are reported in Chen (2003). This paper is organised as follows. Section 2 reviews the main findings of previous research within the Romance languages intonational phrasing project and describes the RLD. Section 3 provides a typology of the boundary cues used in each language, as well as their relative frequency. In Section 4, we describe the phonological choices that characterize each language, that is, the inventory of nuclear accents used and the shape of nuclear contours observed. Section 5 deals with the phonetics of the dominant boundary cue used by all the languages under study: the H boundary tone. The impact of different factors on the realization of the H boundary tone, such as the type of nuclear accent or the length of the phrase, is examined. The paper concludes with an assessment of the similarities and differences that characterize intonational boundaries across Romance languages.
2. Background
2.1 Previous work comparing intonational phrasing in Romance
Previous comparative work on intonational phrasing in Romance languages has focused on the role of syntactic and prosodic factors in the placement of intonational boundaries in broad focus declarative sentences containing a subject, verb and object (i.e. SVO). The import of syntactic branching (i.e. constituency), prosodic branching (i.e. number of prosodic words), and length (i.e. number of syllables) was examined in a systematic way by approaching intonational phrasing from an empirical perspective. The collective results of these various studies demonstrate that the five Romance languages/varieties differ in their phrasing patterns. In Catalan, for example, the most common phrasing is (S)(VO) across all conditions observed (Prieto 2005; D’Imperio et al. 2005). However, Catalan is the only one of these languages where the (SV)(O) phrasing pattern was also found. This phrasing obtains due to a strong tendency to balance the length of the prosodic constituents in terms of number of syllables and also number of stresses and/or prosodic words. An effect of branchingness was also found in Catalan, but the relevant factor is prosodic and not syntactic (Prieto 2005; D’Imperio et al. 2005). Standard European Portuguese (SEP) is the only language in the group with a prevalence of the (SVO) phrasing pattern. The alternative (S)(VO) pattern, also found in the data, is triggered by phrase length, not prosodic or syntactic branching. By contrast, Northern European Portuguese (NEP) shows a higher frequency of (S)(VO) phrasing than SEP, with prosodic branchingness being more important than constituent length (Elordieta et al. 2005; D’Imperio et al. 2005; Frota & Vigário in press). The phrasing patterns shown by Italian are similar to NEP: both (SVO) and (S)(VO) are found, and the main factor triggering (S)(VO) is prosodic branchingness (D’Imperio et al. 2005).
In Spanish, as in Catalan, the most common phrasing is (S)(VO) across all conditions. However, unlike in the other languages, syntactic branching seems to be a major factor in phrasing decisions in Spanish (Elordieta et al. 2003; Elordieta et al. 2005; D’Imperio et al. 2005). In all this previous work, intonational boundaries were identified and marked, but no analysis of the type and frequency of boundary cues was made. This is the object of the present paper.

2.2 The RLD
The Romance Languages Database contains a set of comparable SVO sentences designed with all the combinations of two constituent length conditions (‘short’, meaning three syllables, and ‘long’, meaning five syllables) and the three syntactic branching conditions (non-branching, branching and double branching S and O). In a subset of these materials, the syntactic branching condition is substituted with a prosodic branching condition, namely a phrase with two prosodic words that are syntactically non-branching (for a full description of the RLD, see D’Imperio et al. 2005). The speech materials were read three times each in random order (with distractor sentences in
between) by two speakers of each of the five languages/varieties under study. Examples of the speech materials are given in (1):

(1) Non-branching Subject and branching Object (Long-Short-Long)
    Cat: La boliviana mirava la melmelada meravellosa.
         “The Bolivian woman looked at the wonderful jam.”
    EP:  A boliviana gravava uma melodia maravilhosa.
         “The Bolivian woman recorded a wonderful song.”
    It:  La boliviana mirava la serenata meravigliosa.
         “The Bolivian woman observed the wonderful serenade.”
    Sp:  La boliviana miraba la mermelada maravillosa.
         “The Bolivian woman looked at the wonderful jam.”
For the present paper, since we were exclusively interested in the phonetics and phonology of intonational phrasing, we selected only those utterances in the RLD that had been previously classified as containing a clear phrasing boundary. In other words, we analysed a subset of this database. As we were interested in the nature of the cues used to signal phrasing, we decided to consider only the uncontroversial cases of intonational phrasing, that is, those cases that were perceived by two judges (one author and one external judge) as unarguably containing a clear phrasing boundary. All cases judged unclear by one or both judges were excluded from the analysis. The results reported below are thus based on a total of 998 utterances: 239 for Catalan, 267 for EP (117 for SEP and 150 for NEP), 233 for Italian, and 259 for Spanish. Although only two speakers from each language/variety were analysed, two factors make us feel justified in claiming that we are indeed describing intonational properties of the languages and varieties under observation. First, only cases of clear phrasing were analysed. Second, the same speakers had been recorded together with other speakers for the study of different aspects of intonation and did not show deviant or atypical patterns (see Frota 2000 for Standard EP; Vigário & Frota 2003 for NEP; and D’Imperio 2000 for Italian).

3. Typology of boundary cues
All the phrasing boundaries examined show one or more of the following boundary cues: (i) the preboundary stretch is realized as a rise from/on the last stressed syllable into the boundary syllable, that is, a ‘continuation rise’; (ii) the preboundary stretch is realized as a rise on the last stressed syllable followed by a high plateau up to the boundary, that is, ‘sustained pitch’; (iii) the boundary is signalled by a High tone; (iv) the boundary is signalled by a Low tone; (v) there is ‘pitch reset’ after the boundary, at the beginning of the second phrase; (vi) the F0 drops to the speaker’s base level at the boundary; (vii) there is preboundary lengthening; and (viii) a pause (defined as a stretch of silence) is present at the phrasing boundary. The data were classified as containing one or more of these cues. Of this set of cues, pitch reset and
preboundary lengthening turned out to be extremely hard to capture in a systematic and comparable way across languages, as will be explained below. First, pitch reset was found to be either full or partial, and measuring it in a comparable fashion across languages was problematic. For European Portuguese, the peak line delineated by two (or more) preboundary peaks was used as a reference line to place the first peak of the second phrase, and peaks above the line were classified as cases of reset. This criterion matched well with the perception of pitch reset. By contrast, for Italian, a purely perceptual decision was made. For Spanish and Catalan, ratios of 0.90 or higher between the first peaks of the first and second phrases were considered cases of reset. Again, the criterion seemed to match with the auditory impression of pitch reset. Establishing the presence of preboundary lengthening in a comparable way across these languages was even harder. The database was not designed to measure lengthening, and only by chance could we avail ourselves of the same sentence uttered with and without a phrase boundary by the same speaker, the ideal case to examine lengthening effects. The few cases where such pairs were found were measured (specifically, the duration of the last stressed syllable and the preboundary syllable) and the result of the presence or absence of lengthening was extended to the utterances that were perceptually similar (with regard to the impression of lengthening) to those that were actually measured.¹

       Cont.   Sustained   Boundary Tone    Pitch   Drop   PB        Pause
       Rise    Pitch       H        L       Reset   BL     Length.
Cat    100.0    0.0        100.0    0.0     28.0    0.0    100.0     10.5
Sp      88.4   11.2         99.3    0.7     76.0    0.7     40.2     28.2
SEP     95.0    0.0         95.0    4.0     25.0    4.0     15.0      5.0
NEP     89.0    8.0         97.0    3.0     21.0    1.0     72.0     17.0
It      54.5   45.5         98.7    1.3     98.0    0.0    100.0     16.7
Table 1: Frequency of boundary cues per language.

(2) [Schematic F0 contours over the word “Badalona” (syllables Ba-da-LO-na): a. Continuation rise; b. Sustained pitch]

¹ In future work we plan to address the issues of pitch reset and preboundary lengthening by means of specifically designed experiments that can produce comparable data and the application of identical criteria for all languages.
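The two measurable pitch reset criteria described above (the 0.90 ratio used for Spanish and Catalan, and the peak line used for European Portuguese) can be sketched as follows. This is an illustrative reading only. It assumes that the ratio is taken as the first peak of the second phrase divided by the first peak of the first phrase, and that the peak line is a least-squares line through the preboundary peaks when more than two are available; neither detail is specified in the text. Function and parameter names are hypothetical.

```python
def reset_by_ratio(first_phrase_peak_hz, second_phrase_peak_hz, threshold=0.90):
    """Ratio criterion: reset if the second phrase's first peak reaches at
    least `threshold` times the first phrase's first peak (direction of the
    ratio is an assumption of this sketch)."""
    if first_phrase_peak_hz <= 0:
        raise ValueError("peak height must be positive")
    return second_phrase_peak_hz / first_phrase_peak_hz >= threshold


def reset_by_peak_line(preboundary_peaks, next_peak):
    """Peak-line criterion: reset if the first peak of the second phrase lies
    above the line through the preboundary peaks.

    preboundary_peaks: list of (time_s, f0_hz) pairs, at least two.
    next_peak: (time_s, f0_hz) of the first peak after the boundary.
    With exactly two peaks the fitted line passes through both; with more,
    a least-squares fit is used (an assumption of this sketch).
    """
    n = len(preboundary_peaks)
    if n < 2:
        raise ValueError("at least two preboundary peaks are needed")
    mean_t = sum(t for t, _ in preboundary_peaks) / n
    mean_f = sum(f for _, f in preboundary_peaks) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in preboundary_peaks)
    if sxx == 0:
        raise ValueError("peaks must occur at different times")
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in preboundary_peaks)
    slope = sxy / sxx
    intercept = mean_f - slope * mean_t
    t_next, f_next = next_peak
    return f_next > slope * t_next + intercept


if __name__ == "__main__":
    # Invented values, for illustration only.
    print(reset_by_ratio(200.0, 185.0))
    print(reset_by_peak_line([(0.2, 220.0), (0.8, 200.0)], (1.4, 195.0)))
```

The two criteria need not agree on a given utterance, which is one way of seeing why the figures for pitch reset are hard to compare across the languages.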
Table 1 shows the frequency of the different boundary cues in the languages studied. It is clear that prosodic breaks in Romance are predominantly marked by a High boundary tone. The preboundary stretch is predominantly realized as a continuation rise except in Italian, where sustained pitch is just as frequent. In all cases, the boundary is marked by a H tone. The two boundary configurations, sustained pitch and continuation rise, are phonetically distinct in those languages that show both types of contours, namely, Italian, Spanish, and NEP. The configurations in (2) illustrate the general patterns found in the Italian data. Figures 1 and 2 show typical contours produced with a prosodic break after the subject noun phrase in NEP and Spanish, respectively. In both figures the top panel shows a continuation rise after the subject, and the bottom shows sustained pitch.
Figure 1: Waveform and F0 contour of the NEP utterance (A nora morena da velha) (manuseava dinheiro libanês na mala) “The dark-haired daughter-in-law of the old woman was holding Lebanese money in her handbag”, speaker MI.
As shown in Table 1, Low boundary tones occur in all the languages except Catalan, but are rare. The F0 drop to the speaker’s base level is also rare. Pauses, although present in all languages, are not a frequent cue either.² As to pitch reset and preboundary lengthening, these cues appear in all the languages but are very frequent in only some of them. Due to the dominant use of the H boundary tone across languages and its contribution to both a continuation rise and a sustained pitch boundary configuration, this paper will focus on the phonology and phonetics of the whole tonal gesture that signals phrasing boundaries in Romance.
Figure 2: Waveform and F0 contour of two Spanish utterances. Top panel: (La niña mora) (miraba la mermelada) “The Moorish girl looked at the jam”, speaker MR. Bottom panel: (La boliviana) (rememoraba la noria de Vigo) “The Bolivian girl remembered the ferris wheel in Vigo”, speaker LM.
4. Phonological choices
This section is devoted to the phonology of the continuation rise and sustained pitch tonal gestures. In Section 4.1 we describe the nuclear accents found in each language, that is, the pitch accent preceding the boundary tone. In Section 4.2 we examine the shape of the nuclear contours as a whole.

² In our data speech rate was not controlled, as speakers were simply asked to produce the utterance in a natural way at a normal speech rate. It may thus be the case that speech rate differences are responsible for differences in pause occurrence.
4.1 Nuclear accents
Across the languages, four different nuclear accents were found before the phrasing boundary. The shapes of the tonal trajectory within the stressed syllable are schematised in (3). We describe tune-text alignment of these shapes in (4).

(3) a. L+H*   b. L*+H   c. H+L*   d. L*
[Schematic tonal trajectories within the stressed syllable.]

(4) Alignment description:
a. L+H*: High target attained within the stressed syllable (at the end of the vowel).
b. L*+H: Low target in the stressed V and high target in the poststressed syllable.
c. H+L*: Low target in the stressed V preceded by a high target.
d. L*: The stressed vowel remains low throughout.
The descriptions in (3) and (4) were instrumental in the identification of nuclear accents across languages, so that we could systematically apply the same category label to similar objects cross-linguistically. In addition, it is important to note that all these accents had independently been described in previous work as part of the tonal inventory of the languages described (e.g. Prieto 1995 for Catalan; Frota 1997, 2002a; Vigário 1998; Vigário & Frota 2003 for EP; D’Imperio 2000, 2002 for Italian; Beckman et al. 2002; Face 2002; Hualde 2002; McGory & Díaz-Campos 2002; Sosa 1999, among others for Spanish).³

The distribution of the different nuclear accent types across languages is given in Table 2. The five languages/varieties clearly form two groups: (i) those only with rising accents or where rising accents are the overwhelming choice, that is, Spanish and Catalan; and (ii) those with both rising and falling accents, that is, SEP, NEP and Italian. Within the latter group, a further distinction can be made: SEP is different from both NEP and Italian in not showing the accents L+H* and L*. This does not come as a surprise, as L* has been reported to be the most frequent nuclear accent in NEP declaratives, whereas SEP has H+L* as the declarative nucleus (Vigário & Frota 2003), and L+H* has never been reported as a possible accent in SEP to our knowledge.

³ The rare cases of falling accents found in Catalan require a further comment. These cases may fall in either the H+L* or L* category, as they are ambiguous between the two and may well be variants of the same accent. We have classified them as H+L* for the sake of simplicity.
L+H* L*+H H+L* L* Cat NM 97 0 3 0 PG 100 0 0 0 Sp LM 27 73 0 0 MR 5 95 0 0 0 0 57 43 SEP AG 0 11 89 0 MC NEP MI 19 57 7 17 MS 0 0 14 86 It LC 47 0 0 53 LD 41 0 59 0 Table 2: Distribution of nuclear pitch accent types (% relative to total of utterances by speaker) per language and speaker.
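Percentages of the kind reported in Table 2 (share of each nuclear accent type relative to the total of utterances by one speaker) can be derived from a list of per-utterance labels. The following is a minimal sketch; the function name and the example labels are hypothetical and do not come from the database.

```python
from collections import Counter
from typing import Dict, Iterable

ACCENTS = ("L+H*", "L*+H", "H+L*", "L*")


def accent_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of each nuclear accent type for one speaker."""
    labels = list(labels)
    unknown = set(labels) - set(ACCENTS)
    if unknown:
        raise ValueError(f"unknown accent labels: {sorted(unknown)}")
    if not labels:
        raise ValueError("no utterances to count")
    counts = Counter(labels)
    # Percentages are relative to the speaker's total number of utterances.
    return {accent: 100.0 * counts[accent] / len(labels) for accent in ACCENTS}


# Hypothetical speaker with four labelled utterances.
example = accent_distribution(["L+H*", "L+H*", "L+H*", "L*+H"])
```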
4.2 Nuclear contours
We will now consider the contribution of the nuclear accents to the two predominant types of nuclear contours we have found: continuation rise and sustained pitch. The four different accents participate in the continuation rise and sustained pitch boundary configurations as described in Table 3. Not surprisingly, falling/low accents only appear with continuation rises. Rising accents, on the other hand, show two interesting patterns which again divide Romance languages into the same two groupings: (i) Catalan and Spanish; and (ii) SEP, NEP and Italian. In the latter group there is a strong connection between L+H* and sustained pitch, and L*+H and continuation rise. In the former group, the connection is much less strong or simply does not hold: in Spanish, L+H* does not have to be followed by sustained pitch (in fact, with L+H* sustained pitch is observed in only 26% of the cases), and sustained pitch may appear with L*+H (9% of the cases); in Catalan, L+H* is almost the only accent (except for 3% of H+L*) and sustained pitch was not found.

Based on our data, we must conclude that from a cross-language perspective, nuclear accent type (L+H* and L*+H) and a sustained pitch or continuation rise configuration at the intonational boundary are independent choices and different languages combine these two properties of the nuclear contour in different ways.

        Cat        Sp                  SEP        NEP        It
L+H*    ContRise   ContRise/SusPitch              SusPitch   SusPitch
L*+H               ContRise/SusPitch   ContRise   ContRise
H+L*    ContRise                       ContRise   ContRise   ContRise
L*                                                ContRise   ContRise
Table 3: Dominant nuclear contour types across languages.
Finally, we would like to comment on the phonology of the two types of boundary contours. Continuation rises involve a H boundary tone that may be preceded in some cases by a Low tone (e.g. in Catalan and Spanish, though not frequently) yielding a boundary of the (L)H type. Sustained pitch also involves a H boundary tone, and the high plateau may be analyzed as the result of a HL boundary where the L tone is responsible for the final sustained level (as proposed in Pierrehumbert 1980), or simply as the result of a !H boundary, assuming that downstep is an independent intonational feature (along the lines of work by Ladd 1983, 1996). The latter analysis would have the advantage of reserving the HL boundary type for complex boundaries that do involve a real rise-fall gesture (as in the case of yes-no questions in NEP, Vigário & Frota 2003; or exhortative utterances in Catalan, Prieto, Aguilar, Mascaró, Torres & Vanrell 2007).

5. The phonetics of the H boundary tone
This section examines the phonetics of the dominant boundary cue used by all the languages under study: the H boundary tone. A detailed analysis of the impact of different factors on the realization of the H boundary tone is provided, namely the type of nuclear accent (Section 5.1), the length of the phrase (Section 5.2), and the interdependence of the realization of H with the scaling of the first peak of the phrase (Section 5.3). The section concludes with a summary and discussion of the main findings.

5.1 The impact of nuclear pitch accent choice on the scaling of the H boundary
We have seen that the languages studied, with the exception of Catalan, may show different types of accents that frequently appear in nuclear position before the phrasing boundary. The realization of the H boundary tone (HBT) may thus be affected by the choice of nuclear pitch accent in these languages, along the lines suggested by Pierrehumbert (1980) for the upstep of H% after a H tone but not after a L tone. A detailed examination of nuclear pitch accent choice as a factor constraining the scaling of HBT shows important and consistent effects across SEP, NEP, Italian and Spanish.

In SEP, HBT is higher after L*+H than after H+L* and this effect is consistent across speakers. Figure 3 (top panel) displays the data for speaker AG (who shows a significant difference in the scaling of HBT, p<0.0001). In NEP, HBT is also higher after rising accents than after falling/low accents (p<0.001). HBT is also higher after H+L* than after L*, for both speakers (though the difference does not reach significance). This is illustrated by the data for speaker MI, who shows all four types of nuclear accent (Figure 3, bottom panel). The data from Italian replicate the same basic finding that rising accents promote higher HBT than low/falling accents (for both speakers p<0.0001), as shown in Figure 4.

In Spanish two types of rising accents were found, and HBT is consistently higher after L*+H than after L+H* across speakers, though the difference only reaches significance for speaker LM (p<0.05), who shows a more balanced distribution between accents (Figure 5). This result is not surprising, as L*+H is followed by a continuation rise in 92% of cases, whereas a continuation rise appears after L+H* in only 74% of the occurrences of this accent.
Figure 3: HBT scaling by type of nuclear accent. Top panel: speaker AG from SEP; bottom panel: speaker MI from NEP. [Plots of HBT (Hz) by accent type: mean, ±std. err., ±std. dev.]
Figure 4: HBT scaling by type of nuclear accent for the two Italian speakers. [Plots of HBT (Hz) by accent type for speakers LD and LC: mean, ±std. err., ±std. dev.]
Figure 5: HBT scaling by type of nuclear accent for Spanish speaker LM. [Plot of HBT (Hz) by accent type: mean, ±std. err., ±std. dev.]
We can conclude that nuclear pitch accent choice affects the scaling of HBT in a similar and consistent way in all the languages studied that use different pitch accents. The tendency is as follows: HBT is higher after L*+H/L+H* than after H+L*/L*. Within rising accents, L*+H promotes higher HBT than L+H* in Spanish. Within accents with a (final) Low tone, H+L* promotes higher HBT than L* in NEP.

Though only containing L+H*, the Catalan data are consistent with the cross-linguistic findings in that the H boundary tone shows high values in this language. This fact is revealed by a cross-language comparison of the ratios between HBT and the F0 values at the beginning of the utterance: in Catalan the ratios are very high (indeed it is the only language with ratios above 1.30), as one would expect from a rising accent plus HBT sequence where only continuation rises are found.

The findings just described can be interpreted as resulting from the upstep of HBT after an accentual H. This would account for the higher scaling of HBT after rising accents in general, relative to low/falling accents. Moreover, this implementation rule is independent of the downstep (phonological) feature we proposed to represent the sustained pitch configuration (!H). Thus, a downstepped HBT would tend to be phonetically lower than a non-downstepped HBT, even if preceded by an accentual H.⁴

⁴ The reason why H+L* promotes higher HBT than L* does is not totally clear at this point. However, we would like to suggest a functional interpretation in terms of contrast enhancement, along the lines suggested by Rialland (2001): after a fall, a higher target is required to facilitate both the perception of the low tone and the following high tone.

5.2 The impact of phrase length on the scaling of HBT
It has long been known that F0 tends to decline over the course of phrases (and utterances) in many languages, whether we consider the tendency shown by the topline or the baseline (e.g. Bruce & Gårding 1978; Liberman & Pierrehumbert 1984; Pierrehumbert & Beckman 1988; Ladd 1996; Prieto et al. 1996; Prieto 1998, among many others). However, languages seem to differ in the sources of global trends, as the scaling of any given tone may depend on a variety of factors like phrasal length, phrasal position, temporal distance to preceding accent, F0 values of preceding accent, etc. For some languages, the global trend seems to be mainly due to localised changes in the contour (as proposed in Liberman-Pierrehumbert’s model), whereas for others some amount of global pre-planning is required (see Rialland 2001; Gussenhoven 2004). In the former languages, phrasal length is not a crucial factor, unlike the F0 value of preceding accents (e.g. Prieto et al. 1996); in the latter languages, the length of the phrase is crucial and speakers tend to begin higher as the phrase gets longer (e.g. Rialland 2001). Within the same language, the factors affecting the scaling of different tones may also vary. For example, scaling of accentual peaks in Mexican Spanish is mainly predicted by the F0 value of the previous peak (Prieto et al. 1996), whereas scaling of L tones requires a combination of contextual factors, among them phrasal length (Prieto 1998).

In this section we examine the impact of the length of the phrase on the scaling of HBT. We have measured phrase length in number of syllables, and thus the distance between HBT and the beginning of the phrase (which is also the beginning of the utterance in our data) may vary from three to fifteen syllables. In our analysis of the length factor, we looked at the ratio between HBT and the F0 value at the beginning of the utterance (UttIni) across languages, for each speaker and by nuclear pitch accent type. We then checked whether the results found were mainly due to the impact of length on HBT scaling, UttIni scaling, or both.

Figure 6: HBT/UttIni ratio as a function of phrase length in number of syllables, for both Catalan speakers. [Plots of HBT/UttIni by length (nº syllables) for speakers PG and NM: mean, ±std. err., ±std. dev.]
In Catalan, although there is a slight tendency for the HBT/UttIni ratio to be higher in longer phrases, the effect is neither consistent nor significant, as shown in Figure 6 (p>0.01).
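The measure used throughout this section, the ratio between HBT and the F0 value at the beginning of the utterance, averaged per phrase-length condition, can be sketched as below. This is only an illustration: the function name, the tuple layout and the F0 values in the example are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# (phrase length in syllables, HBT in Hz, UttIni in Hz) for one utterance
Measurement = Tuple[int, float, float]


def mean_ratio_by_length(utterances: Iterable[Measurement]) -> Dict[int, float]:
    """Mean HBT/UttIni ratio for each phrase length (in syllables)."""
    ratios = defaultdict(list)
    for n_syllables, hbt_hz, uttini_hz in utterances:
        if uttini_hz <= 0:
            raise ValueError("UttIni must be a positive F0 value")
        ratios[n_syllables].append(hbt_hz / uttini_hz)
    # Sort by length so that trends across conditions are easy to read off.
    return {n: mean(values) for n, values in sorted(ratios.items())}


# Hypothetical values for one speaker and one nuclear accent type.
example = mean_ratio_by_length([
    (3, 260.0, 200.0),
    (3, 240.0, 200.0),
    (15, 230.0, 230.0),
])
```

Keeping HBT and UttIni as separate inputs makes it possible to check afterwards whether a change in the ratio comes from the numerator, the denominator, or both.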
Figure 7: HBT/UttIni ratio as a function of phrase length in number of syllables, for the phrases produced with a L*+H nuclear accent by Spanish speaker LM. [Plot of HBT/UttIni by length (nº syllables): mean, ±std. err., ±std. dev.]
Spanish is similar to Catalan in that no significant effect of phrase length was found for any of the speakers or accent types (ANOVA results for HBT/UttIni: speaker LM, L*+H and L+H* p>0.05; speaker MR, L*+H and L+H* p>0.1; ANOVA results for UttIni: LM, L*+H and L+H* p>0.1; MR, L*+H and L+H* p>0.1). This is illustrated by the data for speaker LM showing the L*+H nuclear accent plotted in Figure 7. In addition, there is no consistent tendency in Spanish to have higher or lower HBT/UttIni ratios, across speakers or accent types.

The European Portuguese data offer a very different picture. There is an overall tendency, consistent across speakers and accent types, to have lower HBT/UttIni ratios with increasing phrase length. This effect is clear in SEP (though not statistically significant) and in NEP, where it is significant for both speakers and the different accents, whether rising or falling (NEP: speaker MI p<0.01 for L*+H and p<0.05 for L+H*; speaker MS p<0.0001 for L*). The EP results are illustrated in Figure 8.

Our attempt to determine whether this clear effect of length was mainly due to an impact on the scaling of UttIni or on the scaling of HBT, or both, revealed an interesting finding. In SEP, no effect of length on HBT was found (ANOVA results: speaker MC, H+L* accent p>0.1;⁵ speaker AG, H+L* and L*+H p>0.1). However, phrase length had a strong and significant effect on UttIni: the beginning of the utterance becomes higher with increasing phrase length (ANOVA results: speaker MC p<0.01; speaker AG p<0.05). This is shown in Figure 9, for speaker AG, which shows the reverse effect of that depicted in Figure 8 (left panel): UttIni is scaled higher as the phrase becomes longer (Figure 9), whereas the HBT/UttIni ratio diminishes with phrase length (Figure 8). Clearly, then, the effect of length on the HBT/UttIni ratio in SEP is crucially due to its effect on UttIni scaling, not HBT scaling.

⁵ For this speaker there were not enough cases of L*+H across the different length conditions.
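The text reports ANOVA results with phrase length as the factor but does not spell out the design. Assuming a one-way ANOVA with one group of measurements per length condition, the F statistic could be computed as in the sketch below; the function name is hypothetical, and obtaining a p-value would additionally require the F distribution (available, for instance, through `scipy.stats.f_oneway` in SciPy).

```python
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """F statistic of a one-way ANOVA (assumed design, one group per length)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and n > k")
    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

Each inner sequence would hold, for example, the HBT/UttIni ratios (or the UttIni values) of one speaker in one length condition.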
Figure 8: HBT/UttIni ratio as a function of phrase length in number of syllables. Left panel: phrases produced with a L*+H nuclear accent by SEP speaker AG. Right panel: phrases produced with a L* nuclear accent by NEP speaker MS. [Plots of HBT/UttIni by length (nº syllables): mean, ±std. err., ±std. dev.]

Figure 9: UttIni scaling as a function of phrase length for SEP speaker AG. [Plot of UttIni (Hz) by length (nº syllables): mean, ±std. err., ±std. dev.]
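The reasoning behind separating the two sources of the length effect can be made explicit with a simple identity. The symbols r and n (ratio and phrase length in syllables) are notation introduced here for illustration and are not used elsewhere in the paper.

```latex
r(n) = \frac{\mathrm{HBT}(n)}{\mathrm{UttIni}(n)}
\quad\Longrightarrow\quad
\log r(n) = \log \mathrm{HBT}(n) - \log \mathrm{UttIni}(n)
```

A ratio that falls as n grows can therefore result from a lower HBT, from a higher UttIni, or from both, which is why HBT and UttIni are also examined separately.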
In NEP, by contrast, there is a consistent effect of phrase length on the scaling of HBT: HBT becomes lower as phrase length increases (ANOVA results: speaker MI L*+H p<0.05; speaker MS L* p<0.0001).⁶ This is illustrated in Figure 10. As to the scaling of UttIni, there is only a slight tendency for UttIni to be higher with increasing length, but the effect is not significant (ANOVA results: speaker MI and speaker MS p>0.1). Thus in NEP, unlike SEP, the significant effect of length on the HBT/UttIni ratio is crucially due to an effect on HBT scaling. The HBT/UttIni ratio decreases with increasing length (Figure 8, right panel) because HBT is scaled lower as length increases (Figure 10). The effect is further reinforced by the slight tendency for UttIni to be scaled higher in longer phrases.

⁶ For the other nuclear accent types there was an insufficient number of cases in the different length conditions.

Figure 10: HBT scaling as a function of phrase length, for the phrases produced with a L* nuclear accent by NEP speaker MS. [Plot of HBT (Hz) by length (nº of syllables): mean, ±std. err., ±std. dev.]
In short, phrase length is a crucial factor in European Portuguese: lower HBT/UttIni ratios are obtained as the length of the phrase increases. However, in SEP length is crucial to the scaling of the beginning of the utterance, whereas in NEP length mainly affects the scaling of the end of the phrase, that is, HBT.

Italian is similar to EP, as it also shows a tendency (which does not reach significance) for HBT/UttIni ratios to fall as the phrase becomes longer. This is consistent across speakers and nuclear accent types. Figure 11 provides an illustration of this pattern for speaker LD. In Italian, phrase length does not affect the scaling of UttIni, as shown in Figure 12 (ANOVA results: speaker LD and speaker LC p>0.1). In other words, the scaling of the beginning of the utterance is not sensitive to phrase length in this language. Thus, as in NEP, the tendency shown by the HBT/UttIni ratios seems to be due to an effect of length on the scaling of HBT (which reached significance for speaker LD, p<0.01).⁷

Summing up, the impact of length on the scaling of HBT and UttIni reveals important differences across Romance languages. The languages observed seem to cluster in two main groups: (i) those showing an effect of phrase length, such as European Portuguese and Italian; (ii) those with no effect of length, such as Spanish and Catalan. Within the two groups, some further differences were found: (i) the first group shows lower ratios with increasing length; in SEP this is due to higher scaling of UttIni, while in Italian length has no effect on UttIni but an effect on HBT, and in NEP length has a strong effect on HBT (which decreases with increasing length) combined with a tendency for higher scaling of UttIni; (ii) in the second group there is no clear effect of length, but a slight tendency to higher ratios in the longer phrases appears in Catalan, whereas in Spanish this is not consistent across speakers or nuclear accent types.

⁷ For a detailed analysis of the scaling of UttIni as well as the first peak in the RLD, see Prieto et al. (2006).

Figure 11: Ratio HBT/UttIni as a function of phrase length for the Italian speaker LD, by nuclear accent type. [Plots of HBT/UttIni by length (nº syllables) for accent types H+L* and L+H*: mean, ±std. err., ±std. dev.]

Figure 12: UttIni scaling as a function of phrase length for the two Italian speakers. [Plots of UttIni (Hz) by length (nº syllables) for speakers LC and LD: mean, ±std. err., ±std. dev.]
5.3 Scaling correlation between the first peak and HBT
We saw in Section 5.2 that Romance languages vary with respect to the importance of phrase length for the scaling of HBT (and for the beginning of the utterance). In the present section we examine the relation between the scaling of HBT and the scaling of the first peak (H1) of the phrase (which is also the first peak of the utterance in our data). It is known that, at least in some languages, the first peak sets the beginning F0 value from which the following peak value is computed, the process being locally iterated between adjacent peaks within the same phrase (e.g. Liberman & Pierrehumbert 1984; van den Berg, Gussenhoven & Rietveld 1992; Prieto et al. 1996). The question we set out to answer is whether the first peak has an influence on the scaling of the boundary peak in the languages observed. In the event that the iterated changes mentioned above do apply, we would expect to find a correlation between the scaling of the first peak and the scaling of HBT. If, on the other hand, HBT is scaled independently from the first peak (and the other peaks in the phrase), no correlation would be expected.

Nuclear accents L+H* L*+H H+L* L*
Catalan PG NM 0.36* −0.14
Spanish LM MR 0.13 −0.25 0.43* 0.44*
SEP
NEP Italian MI MS LC LD 0.36 0.01 0.04 −0.13 0.16 −0.07 0.17 −0.53 0.06 0.52 0.20 0.26 Table 4: Correlation coefficients for first peak in the phrase and HBT, by speaker and nuclear pitch accent type (significant results are indicated by *; for all significant cases found p<0.001). MC
AG
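The kind of coefficient reported in Table 4 can be illustrated with a short sketch. The text does not name the coefficient used; a Pearson product-moment correlation is assumed here, and the function name is hypothetical. The `min_cases` parameter mirrors the minimum of five cases below which a table cell is left blank.

```python
from math import sqrt
from typing import Optional, Sequence


def h1_hbt_correlation(h1_hz: Sequence[float],
                       hbt_hz: Sequence[float],
                       min_cases: int = 5) -> Optional[float]:
    """Pearson r between first-peak (H1) and HBT values, paired by utterance.

    Returns None when there are fewer than `min_cases` utterances.
    """
    if len(h1_hz) != len(hbt_hz):
        raise ValueError("H1 and HBT must be paired utterance by utterance")
    n = len(h1_hz)
    if n < min_cases:
        return None
    mean_h1 = sum(h1_hz) / n
    mean_hbt = sum(hbt_hz) / n
    sxy = sum((x - mean_h1) * (y - mean_hbt) for x, y in zip(h1_hz, hbt_hz))
    sxx = sum((x - mean_h1) ** 2 for x in h1_hz)
    syy = sum((y - mean_hbt) ** 2 for y in hbt_hz)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant values")
    return sxy / sqrt(sxx * syy)
```

A coefficient close to zero would be in line with HBT being scaled independently from the first peak, whereas a clearly positive coefficient would be in line with locally iterated peak scaling.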
Correlation results are given in Table 4.⁸ In Catalan, a highly significant correlation was found for one of the speakers. Significant correlations were also found in Spanish, for both speakers, but only when the nuclear accent is of the L*+H type. Unlike Catalan or Spanish, no significant correlations were found in European Portuguese or Italian (p>0.05). Again, the languages cluster in two groups: (i) those without significant correlations between H1 and HBT, that is, European Portuguese and Italian; and (ii) those showing significant correlations, that is, Catalan and Spanish. Notably, this is the same grouping that was found previously when analysing the phrase length effect on the scaling of the H boundary tone (Section 5.2).

⁸ Blank cells in the table indicate that either the relevant accent type is not present in the speaker’s data, or the number of cases with that accent type, a H boundary tone, and a first peak is too small (i.e. less than 5).

5.4 Summary and discussion
The phonetics of the dominant boundary cue used in Romance languages, the H boundary tone, was analysed in Section 5. It was found that the choice of nuclear pitch accent is a major factor constraining the scaling of HBT in all the languages examined, and that this factor affects HBT height in a similar and consistent way across these languages: HBT is higher after rising accents (L*+H/L+H*) than after falling or low accents (H+L*/L*). This finding was interpreted as resulting from the upstep of HBT after an accentual H, along the lines suggested by Pierrehumbert (1980) for upstep relations between H tones.

Unlike nuclear pitch accent choice, the impact of phrase length on the scaling of HBT, as seen by the ratio between HBT and the beginning of the phrase (UttIni), is not consistent across the languages, and both major and minor differences were found. The languages observed cluster in two groups. European Portuguese and Italian show a clear effect of phrase length. They show lower ratios with increasing length but in SEP this is due to higher scaling of UttIni, while in Italian there is no effect on UttIni, and in NEP there is a combination of both factors: a lower scaling of HBT and a tendency for higher scaling of UttIni. In contrast, Spanish and Catalan show no effect of phrase length (though a slight tendency to higher ratios in longer phrases appears in Catalan).

The question arises as to whether the effect of length patent in the first group is a function of a whole phrase implementation effect or rather of a more local effect, such as the scaling of the previous accent. In the case of SEP, the available evidence points to a whole phrase effect, as it is the beginning of the phrase that is affected by the size of the phrase in a way similar to that described in Rialland (2001) for Dagara. In the case of NEP, evidence suggests a combined effect of local and global factors, as a main influence on HBT scaling was found together with a slight influence on UttIni. In Italian, only an influence on HBT scaling was found. How this local influence of phrase length on HBT should obtain in both NEP and Italian is a matter for future research. The best place to look seems to be the nuclear pitch accent, that is, the accent immediately preceding the boundary. We will thus explore this issue in the near future.

Like the phrase length factor, the influence of the first peak on the scaling of HBT is also not consistent across languages.
The analysis of this factor divided the languages observed into the same two groups: European Portuguese and Italian on the one hand, with no significant correlation between HBT and the first peak of the phrase (H1), and Catalan and Spanish on the other hand, with significant correlations between H1 and HBT. The interdependence between the scaling of HBT and the first peak in the latter group suggests that locally iterated changes in peak scaling within a phrase may apply in these languages. Indeed, in the Catalan data the number of accents in a phrase seems to affect the scaling of HBT, with HBT scaling higher in phrases with more accents. This may be taken as an indication that a local iterated computation between accents within a phrase (as shown by Prieto et al. 1996 for Mexican Spanish) and between the last accent and the boundary peak is an important factor to take into account in Catalan and Spanish. This is a topic to explore in further research.

Finally, the impact of the last two factors examined (phrase length and first peak scaling) on the height of the boundary peak clusters languages in exactly the same way: European Portuguese and Italian show an effect of length, but not an effect of the first peak; conversely, Catalan and Spanish show an effect of the first peak, but not a length effect. Another question to be addressed in future research within the Romance languages intonational phrasing project is whether this variation corresponds to different ways of implementing tone scaling across languages, as has been suggested in the literature (namely, via localized changes between adjacent peaks within a phrase as described in Liberman & Pierrehumbert 1984, inter alia, or via global phrasal implementation as described in Rialland 2001, among others).

6. Conclusion
This paper described the phonetics and phonology of intonational boundaries in Catalan, two varieties of European Portuguese, Italian, and Spanish. A typology of the boundary cues used was put forward and their relative frequency was established. Due to the dominant use of the H boundary tone across these languages and its contribution to the two main types of boundary configurations found, continuation rise and sustained pitch, this paper has focused on the phonology and phonetics of the whole tonal gesture that signals phrasing boundaries in Romance.

It was shown that the phonology of intonational boundaries in Romance is characterized by two main properties: on the one hand, these languages share the presence of the H boundary tone as a common feature, while on the other, nuclear pitch accent choice and the possible combinations of nuclear accent with the continuation rise/sustained pitch configurations divide these languages into two different groups. The phonetics of intonational boundaries offers a similar picture: nuclear pitch accent choice plays a major role in HBT scaling in all the languages, with HBT being upstepped after an accentual H, whereas the other factors analysed split these languages into exactly the same two groups. Overall, the variation found is between the Catalan-Spanish group on the one hand, and the European Portuguese-Italian group on the other. Within the latter, the Northern variety of European Portuguese is consistently closer to Italian than the Standard variety.

It is hoped that the present findings will add to recent work on variation in intonation (inter alia, Grabe 2002; Chen 2003; Grice, D’Imperio, Savino & Avesani 2005), and contribute to our understanding of the dimensions of variation in intonational phrasing in Romance languages.
References Beckman, Mary & Janet Pierrehumbert. 1986. “Intonational structure in English and Japanese”. Phonology 3. 255-309. ---------- & Gayle Ayers. 1994. “ToBI annotation conventions”. http://ling.ohio-state.edu/~tobi/ame_tobi/. ----------, Manuel Díaz-Campos, Julia Tevis McGory & Terrell A. Morgan. 2002. “Intonation across Spanish, in the Tones and Break Indices framework”. Probus 14:1. 9-36. Berg, Rob van den, Carlos Gussenhoven & Toni Rietveld. 1992. “Downstep in Dutch: Implications for a model”. Papers in Laboratory Phonology II:
INTONATIONAL PHRASING IN ROMANCE
151
Gesture, Segment, Prosody ed. by Gerry Docherty & D. Robert Ladd. 335-359. Cambridge: Cambridge University Press. Bruce, Gösta & Eva Garding. 1978. “A prosodic typology for Swedish dialects”. Nordic Prosody ed. by Eva Garding, Gösta Bruce & Robert Bannert. 219-228. Lund: Gleerup. Chen, Aoju. 2003. “Language dependence in continuation intonation”. Proceedings of the 15th International Congress of Phonetic Sciences ed. by Maria-Josep Solé, Daniel Recasens & Joaquín Romero. 1069-1072. Barcelona: Causal Productions. D’Imperio, Mariapaola. 2000. The role of perception in defining tonal targets and their alignment. PhD diss., Ohio State University. ----------. 2001. “Focus and tonal structure in Neapolitan Italian”. Speech Communication 33:4. 339-356. ----------. 2002. “Italian intonation: An overview and some questions”. Probus 14:1. 37-69. ----------, Gorka Elordieta, Sónia Frota, Pilar Prieto & Marina Vigário. 2005. “Intonational phrasing in Romance: The role of syntactic and prosodic structure”. Prosodies ed. by Sónia Frota, Marina Vigário & Maria João Freitas. 59-97. Berlin & New York: Mouton de Gruyter. Elordieta, Gorka, Sónia Frota, Pilar Prieto & Marina Vigário. 2003. “Effects of constituent weight and syntactic branching on intonational phrasing in Ibero-Romance”. Proceedings of the 15th International Congress of Phonetic Sciences ed. by Maria-Josep Solé, Daniel Recasens & Joaquín Romero. 487-490. Barcelona: Causal Productions. ----------, Sónia Frota & Marina Vigário. 2005. “Subjects, objects and intonational phrasing in Spanish and Portuguese”. Studia Linguistica 59:2/3. 110-143. Estebas-Vilaplana, Eva. 2000. The use and realisation of accentual focus in Central Catalan. PhD diss., University College London. Face, Timothy. 2002. Intonational Marking of Contrastive Focus in Madrid Spanish. Berlin: Lincom-Europa. Frota, Sónia. 1997. “On the prosody and intonation of Focus in European Portuguese”. 
Issues in the Phonology and Morphology of the Major Iberian Languages ed. by Fernando Martínez-Gil & Alfonso Morales-Front. 359-392. Washington, D.C.: Georgetown University Press. ----------. 2000. Prosody and Focus in European Portuguese. New York: Garland. ----------. 2002a. “Nuclear falls and rises in European Portuguese: A phonological analysis of declarative and question intonation”. Probus 14:1. 113-146. ----------. 2002b. “Tonal association and target alignment in European Portuguese nuclear falls”. Laboratory Phonology 7 ed. by Carlos Gussenhoven & Natasha Warner. 387-418. Berlin & New York: Mouton de Gruyter.
---------- & Marina Vigário. In press. “Intonational phrasing in two varieties of European Portuguese”. Tones and Tunes, Volume I, Typological and Comparative Studies in Word and Sentence Prosody ed. by Tomas Riad & Carlos Gussenhoven. Berlin & New York: Mouton de Gruyter. Grabe, Esther. 2002. “Variation adds to prosodic typology”. Speech Prosody 2002 - Proceedings of the 1st International Conference on Speech Prosody ed. by Bernard Bel & Isabel Marlien. 127-132. Aix-en-Provence: Laboratoire de Parole et Langage, Université de Provence. Grice, Martine, Mariapaola D’Imperio, Michelina Savino & Cinzia Avesani. 2005. “Towards a strategy for labelling varieties of Italian”. Prosodic Models and Transcription: Towards Prosodic Typology ed. by Sun-Ah Jun. 55-83. Oxford: Oxford University Press. Grønnum, Nina & Maria do Céu Viana. 1999. “Aspects of European Portuguese Intonation”. Proceedings of the 14th International Congress of Phonetic Sciences ed. by John Ohala, vol.3, 1997-2000. Berkeley: University of California. Gussenhoven, Carlos. 2004. The Phonology of Tone and Intonation. Cambridge: Cambridge University Press. Hualde, José Ignacio. 2002. “Intonation in Spanish and the other Ibero-Romance languages: Overview and status quaestionis”. Romance Phonology and Variation ed. by Caroline Wiltshire & Joaquim Camps. 101-115. Amsterdam & Philadelphia: John Benjamins. Ladd, D. Robert. 1983. “Phonological features of intonational peaks”. Language 59. 721-759. ----------. 1996. Intonational Phonology. Cambridge: Cambridge University Press. Liberman, Mark & Janet Pierrehumbert. 1984. “Intonational invariance under changes in pitch range and length”. Language Sound Structure: Studies in Phonology Presented to Morris Halle by his Teacher and Students ed. by Mark Aronoff & Richard Oehrle. 157-233. Cambridge, Mass.: The MIT Press. McGory, Julia Tevis & Manuel Díaz-Campos. 2002. “Declarative intonation patterns in multiple varieties of Spanish”. 
Structure, Meaning, and Acquisition of Spanish. Papers from the 4th Hispanic Linguistic Symposium ed. by James F. Lee, Kimberly L. Geeslin & J. Clancy Clements. 73-92. Somerville: Cascadilla Press. Nibert, Holly 2000. Phonetic and phonological evidence for intermediate phrasing in Spanish intonation. PhD diss., University of Illinois at Urbana-Champaign. Pierrehumbert, Janet. 1980. Phonetics and phonology of English intonation. PhD diss., MIT. ---------- & Mary Beckman. 1988. Japanese Tone Structure. Cambridge, Mass.: MIT Press.
Price, Patti, Mari Ostendorf, Stefanie Shattuck-Hufnagel & Cynthia Fong. 1991. “The use of prosody in syntactic disambiguation”. Journal of the Acoustical Society of America 90. 2956-2970. Prieto, Pilar. 1995. “Aproximació als contorns entonatius del català central”. Caplletra, Revista Internacional de Filologia 19. 161-186. ----------. 1998. “The scaling of the L values in Spanish downstepping contours”. Journal of Phonetics 26. 261-282. ----------. 2005. “Syntactic and eurhythmic constraints on phrasing decisions in Catalan”. Studia Linguistica 59:2/3. 194-222. ----------. 2007. “Phonological phrasing in Spanish”. Optimality-Theoretic Advances in Spanish Phonology ed. by Fernando Martínez-Gil & Sonia Colina. 39-61. Amsterdam & Philadelphia: John Benjamins. ----------, Jan van Santen & Julia Hirschberg. 1995. “Tonal alignment patterns in Spanish”. Journal of Phonetics 23. 429-451. ----------, Chilin Shih & Holly Nibert. 1996. “Pitch downtrend in Spanish”. Journal of Phonetics 24. 445-473. ---------, Mariapaola D’Imperio, Gorka Elordieta, Sónia Frota & Marina Vigário. 2006. “Evidence for soft preplanning in tonal production: Initial scaling in Romance”. Proceedings of Speech Prosody 2006 ed. by Rüdiger Hoffmann & Hansjörg Mixdorff. 803-806. Dresden: TUDPress Verlag der Wissenschaften. ----------, Lourdes Aguilar, Ignasi Mascaró, Francesc J. Torres & Maria del Mar Vanrell. 2007. “CatToBI (Catalan Tones and Break Indices)”. http://seneca.uab.es/atlesentonacio. Rialland, Annie. 2001. “Anticipatory raising in downstep realization: Evidence for preplanning in tone production”. Proceedings of the Symposium Cross-Linguistic Studies of Tonal Phenomena: Tonogenesis, Japanese Accentology, and Other Topics ed. by Shigeki Kaji. 301-321. Tokyo: Institute for Languages and Cultures of Asia and Africa/Tokyo University of Foreign Studies. Sosa, Juan Manuel. 1999. La entonación del español: Su estructura fónica, variabilidad y dialectología. Madrid: Cátedra. Vigário, Marina. 1998. 
Aspectos da Prosódia do Português Europeu: Estruturas com advérbio de exclusão e negação frásica. Braga: CEHUM. ---------- & Sónia Frota. 2003. “The intonation of Standard and Northern European Portuguese”. Journal of Portuguese Linguistics 2:2. 115-137.
DISENTANGLING STRESS FROM ACCENT IN SPANISH: PRODUCTION PATTERNS OF THE STRESS CONTRAST IN DEACCENTED SYLLABLES*
MARTA ORTEGA-LLEBARIA¹ & PILAR PRIETO²
¹University of Texas at Austin, ²ICREA & Universitat Autònoma de Barcelona
Abstract According to Sluijter and colleagues (1996b, 1997), stress is independent from accent because it has its own phonetic cues: stressed vowels are longer and have flatter spectral tilts than their unstressed counterparts. However, Campbell and Beckman (1997) show that, for American English, these duration and spectral tilt patterns are a consequence of vowel reduction: when unreduced vowels with different levels of stress (primary and secondary stress) are compared, duration and spectral tilt do not correlate with the stress difference. This paper contributes to the above discussion by examining the stress contrast in deaccented syllables in Spanish. Since Spanish has no phonological vowel reduction, it constitutes a good test case for the above hypotheses. Moreover, this study attempts to disentangle the correlates of stress from those of accent, something which has thus far not been done in the traditional literature on Spanish stress. The results indicate that stress contrast in Spanish is maintained in deaccented contexts by differences in duration, spectral tilt, and to a lesser extent, vowel quality.
1. Introduction
In this article we examine the phonetic characterization of the stress contrast in Spanish in accented and deaccented syllables. Stress (or ‘primary stress’) is a structural linguistic property of a word which specifies which syllable will be ‘stronger’, i.e. more prominent than the others. In stress-accent languages, stressed syllables serve as the landing site for accents, which are signalled acoustically by a pitch movement (Bolinger 1958, 1961; Pierrehumbert 1980; Beckman 1986; Ladd 1996; Beckman & Edwards 1994; Sluijter & van Heuven 1996a, 1996b, among others). However, not all
* The results of this experiment were presented at the 2005 PaPI Conference (Bellaterra, June 2005). We would like to thank the audience at this conference, and especially Lluïsa Astruc, Sónia Frota, Barbara Gili-Fivela, José Ignacio Hualde, Bob Ladd, Carme de la Mota, and Marina Vigário for very useful feedback. We are also grateful to Laura Colantoni and two anonymous reviewers for their comments on an earlier version of this paper. Finally, we thank the informants who kindly participated in the production experiment (Néstor, Ángel, Esther, Cristina, and Mari Carmen). This research has been funded by grants 2002XT-00032, 2001SGR 00150, and 2001SGR 00425 from the Generalitat de Catalunya and HUM2006-01758/FILO from the Ministry of Science and Technology of Spain.
syllables with primary stress are accented in all discourse contexts: the presence or absence of a pitch accent depends on the larger prosodic structure in which the lexical item is found. Thus, there exist at least three levels of syllabic prominence: unstressed, stressed and accented, and stressed but not accented.
Thus far, acoustic cues to stress prominence in Spanish have been studied in words and sentences spoken in intonation patterns that exhibited covariation between stress and accent. In other words, all stressed syllables also had a pitch accent, while unstressed syllables were deaccented (see Navarro Tomás 1914, 1964; Contreras 1963, 1964; Quilis 1971; Gili Gaya 1975; Solé 1984; Canellada & Kuhlman-Madsen 1987; Llisterri, Machuca, de la Mota, Riera & Ríos 2003, among others). As a consequence, the cues to stress could not be distinguished from the cues to accent in the results of these studies and, not surprisingly, the researchers found that pitch movements accompanied stressed syllables. Of these authors, only Navarro Tomás claimed that, in Spanish, the strongest cue to stress was a local increase in loudness or intensity and established the idea that stress in Spanish was mainly an ‘intensity stress’ (the so-called acento de intensidad), while relating pitch movements to intonation.¹
To our knowledge, the only results available on the production of stress cues in Spanish while controlling for the effects of accent are those of Ortega-Llebaria (2006). She finds evidence that supports Navarro Tomás’s hypothesis, namely that stress and accent in Spanish are related to different phonetic cues, i.e. pitch relates to accent while intensity cues stress. However, her study was limited to oxytone words.
In terms of other languages, Sluijter and colleagues’ experiments on the correlates of stress in Dutch and American English were among the first that controlled for stress and accent covariation (Sluijter & van Heuven 1996b; Sluijter, van Heuven & Pacilly 1997). 
They found that stressed syllables were longer and had flatter spectral tilts than their unstressed counterparts, regardless of whether they bore a pitch accent or not. Thus, they too found that intensity cues related to stress, not accent, and concluded that stress was not a weaker degree of accent:
    One would expect to observe lower values along all measure correlates in stressed syllables of unaccented words. However, what we do observe is weakening along only those dimensions that are related to the omission of accent-lending pitch movements. (Sluijter & van Heuven 1996b:2483)
¹ “El acento de intensidad, que en estado actual de la pronunciación española influye más que ningún otro elemento en la estructura prosódica de nuestras palabras, proviene directamente, en la mayor parte de los casos, de la acentuación latina.” (Navarro Tomás 1914:176, sec. 159) “A veces, bajo una misma forma, se dan dos o tres palabras distintas, que fonéticamente sólo se diferencian por el lugar en que cada una de ellas corresponde al acento de intensidad: límite-limite-limité, célebre-celebre-celebré, depósito-deposito-depositó... (..) El oído español es evidentemente más sensible a las modificaciones de intensidad que a las de otros elementos fonéticos.” (Navarro Tomás 1914:177, sec. 159)
Campbell and Beckman (1997) replicated Sluijter’s study for American English, but with a change in focus. Instead of comparing full stressed vowels with primary stress to unstressed and reduced vowels, they compared unreduced vowels with primary stress to unreduced vowels with secondary stress. Their intention was to demonstrate that the patterns obtained by Sluijter and colleagues for American English were related to the vowel reduction differences between their target vowels. Campbell and Beckman hypothesized that the absence of vowel reduction would result in an absence of duration and spectral tilt differences related to stress. Their results confirmed their hypothesis: spectral balance did not differentiate levels of stress in the absence of a pitch accent, and subjects did not use duration cues in a consistent fashion.
Thus, if Spanish patterns like Dutch, it will show stress differences based on duration and spectral tilt in deaccented contexts. If, on the other hand, it is true that in the absence of vowel reduction there are no differences between stress levels, as demonstrated for unreduced vowels in English, then Spanish, which has no vowel reduction, will not be able to maintain a stress contrast in contexts where there is no covariation between stress and accent. In the present study, in order to test these hypotheses, we will examine the phonetic cues of duration, vowel quality, intensity, and pitch movements in stressed and unstressed syllables as spoken within declarative sentences and parenthetic phrases in Spanish.
The article is organized as follows. Section 2 describes the methodology used for the production experiment. In Section 3, we present the main effects of pitch, duration, overall intensity and spectral tilt on the stress and accent dimension, as well as the results of the linear discriminant analyses. 
Finally, in Section 4, we discuss the relative strength of these four acoustic cues as correlates of stress and accent in Spanish and compare our results with the results for other languages.
2. Methodology
2.1 Materials
In order to examine the [+/−stress] contrast, we created a corpus of fifteen four-syllable verbs that end either in -nimar, like desanimar (“to discourage”), or in -minar, like determinar (“to determine”). As shown in Table 1, the target verbal forms used in the experiment had either a paroxytone stress in the present tense (e.g. desanimo “I discourage”, determino “I determine/calculate”), or an oxytone stress in the past tense (e.g. desanimó “(s)he discouraged” and determinó “(s)he determined/calculated”). In this way, we were able to contrast syllables that have the same segmental content and that differ only in degree of prominence, for example, stressed [no] in determinó vs unstressed [no] in determino, and stressed [mi] in determino vs unstressed [mi] in determinó. [N.B. throughout the article, stressed syllables are underlined.]
Spanish verbs                     Present, 1st person sing.   Past, 3rd person sing.
                                  (paroxytones)               (oxytones)
abominar “to abominate”           abomino                     abominó
determinar “to determine”         determino                   determinó
denominar “to name”               denomino                    denominó
desanimar “to discourage”         desanimo                    desanimó
descaminar “to mislead”           descamino                   descaminó
discriminar “to discriminate”     discrimino                  discriminó
diseminar “to spread”             disemino                    diseminó
eliminar “to eliminate”           elimino                     eliminó
encaminar “to guide”              encamino                    encaminó
examinar “to examine”             examino                     examinó
exterminar “to exterminate”       extermino                   exterminó
iluminar “to light”               ilumino                     iluminó
incriminar “to incriminate”       incrimino                   incriminó
predominar “to predominate”       predomino                   predominó
recriminar “to recriminate”       recrimino                   recriminó
Table 1: Target verbs used in the experiment.
Figure 1: Waveform, spectrogram with F0 track, and segmentation tier of the declarative utterance Determinó la masa “She determined the mass” (left) and of the parenthetic phrase determinó complacida “she determined in a satisfied way” (right).
In order to control for stress and accent covariation, each of the fifteen four-syllable verbs was embedded in a segmentally identical utterance fragment that was spoken with either a declarative intonation or the flat intonation of parenthetic sentences. In declarative sentences, stressed syllables also bear a pitch accent while unstressed syllables remain deaccented. In contrast, in parenthetical intonation, F0 is flat across the utterance and shows no pitch accents (Figure 1). Thus, for each verb, we obtain the four-sentence set shown in Table 2: one declarative sentence with the verb in the present tense in (a), one declarative sentence with the verb in the past tense in (b), one parenthetic sentence with the verb in the present in (c), and one parenthetic sentence with the verb in the past in (d). The data results in a total of three hundred syllabic
tokens: two syllabic positions (final and penultimate) x two utterance types (declarative and parenthetic) x fifteen verbs x five subjects.
                              Declarative sentences               Parenthetic sentences
[+stress] paroxytone verbs    (a) Determino la masa.              (c) —La masa del átomo es medible— determino complacida.
                              [+accent]                           [−accent]
[−stress] oxytone verbs       (b) Determinó la masa.              (d) —La masa del átomo es medible— determinó complacida.
                              [−accent]                           [−accent]
Table 2: Target syllable mi (in bold) in four sentences. Underlining indicates stressed syllables.
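The token count stated above can be checked by enumerating the crossed factors. The following is only a sketch of that arithmetic: the verb list is copied from Table 1, while the speaker labels S1 to S5 and all variable names are hypothetical and are not part of the original materials.

```python
from itertools import product

# Verbs copied from Table 1.
VERBS = [
    "abominar", "determinar", "denominar", "desanimar", "descaminar",
    "discriminar", "diseminar", "eliminar", "encaminar", "examinar",
    "exterminar", "iluminar", "incriminar", "predominar", "recriminar",
]
POSITIONS = ["penultimate", "final"]              # syllables mi/ni and no/mo
UTTERANCE_TYPES = ["declarative", "parenthetic"]  # [+accent] vs [-accent] context
SPEAKERS = [f"S{i}" for i in range(1, 6)]         # hypothetical speaker labels

# One entry per (position, utterance type, verb, speaker) combination.
tokens = list(product(POSITIONS, UTTERANCE_TYPES, VERBS, SPEAKERS))

print(len(tokens))  # 2 * 2 * 15 * 5 = 300
```

Crossing the factors in this way makes it easy to see that every verb contributes the same number of tokens per speaker and condition.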
2.2 Procedure
Thirty cards were prepared which each showed a verb in infinitival form, a context, and two questions with their corresponding answers. The subjects were told that they would hear a question and should then read the appropriate answer with the corresponding intonation, i.e. either with a declarative intonation or with the flat intonation of a parenthetic sentence. After shuffling the thirty cards, the experimenter (either the first or the second author) chose the card on top of the pile and then read aloud the verb, context, and first question to the subject. The subject said the answer with the appropriate intonation. Then the second question was read and the subject read out the second answer accordingly. If the experimenter thought that the subject’s pronunciation or intonation of an utterance was unnatural, the speaker was asked to repeat the sentence. The process was repeated for each one of the thirty cards. Speakers were recorded individually in a quiet room, using a Sennheiser MKH20P48U3 omnidirectional condenser microphone and a Pioneer PDR609 digital CD-recorder. Speech samples were digitized at 32000 Hz in 16-bit mono, and target utterances were double-checked to make sure that they had been produced with the intended prosody.
2.3 Subjects
Five native speakers of Barcelona Spanish, two male and three female, participated in the experiment. Their ages ranged from twenty-six to forty-two years old. All subjects had earned university degrees and spoke an educated variety of their Spanish dialect. They reported that they normally spoke this language with their parents and siblings, and had learnt Catalan later as a second language in school. No subject reported having speech or hearing problems.
2.4 Data Analysis and Measurements
The following measurements were made with Praat (Boersma & Weenink 2005; Wood 2005) on each of the three hundred syllabic tokens.
2.4.1 Fundamental Frequency. We took the general view that pitch movements are the correlate of accent. In order to test this assumption, we measured the pitch range of the target pitch accent (in the accented case) or target syllable (in the unaccented case). The valleys and peaks of the prenuclear pitch accents (see L and H marks in Figure 2) were then marked. In the cases where the pitch was completely flat, such as in the parenthetic sentences, marks were placed at the beginning and at the end of the syllable. A Praat script extracted the F0 value in Hz at the marked points and calculated the pitch range by subtracting the F0 values at L from the F0 values at H for each of the three hundred tokens. Pitch range was given in absolute values.
2.4.2 Duration. Each segment of the verb endings -mino and -nimo was marked according to the F2 transitions displayed in the spectrograms. Vowels contained the transitions (see marks for ‘m’, ‘i’, ‘n’, ‘o’ in tier 1 of Figure 2). A script calculated the duration of each segment and each syllable in milliseconds.
2.4.3 Vowel quality. Formant measurements were based on the stable part of the vowel as marked in tier 3 in Figure 2. F1, F2 and F3 were calculated as frequency averages in Barks. Vowel quality for each token was computed as the difference between F2 and F1.
2.4.4 Intensity. Following Sluijter and van Heuven (1996b), we estimated intensity in terms of both overall intensity and spectral tilt (or spectral balance). Speakers were first normalized for overall differences in intensity. By using an algorithm included in the sound editing software ‘Cool Edit’, the loudest part of the waveform was set to a specified amplitude, −10dBFS in our study, thereby raising or lowering all other parts of the same waveform by the same amount. 
In this way, we ensured that all files and all speakers had a consistent volume. Overall intensity was estimated using the command ‘Get intensity’ from Praat over the stable part of each vowel (tier 3 in Figure 2), after having levelled each sentence for loudness. To obtain the measures of spectral tilt for vowel [o], we extracted the amplitudes of two frequency bands as segmented in tier 3: band 1 ranged from 0 to 400 Hz and band 2 from 400 Hz to 4000 Hz. Band 1 contained F0 while band 2 contained the vowel formants. The same procedure could not be performed on vowel [i], because F1 frequency was too low to be separated from F0. The spectral tilt for vowel [o] was computed as the ratio of band 2 to band 1. Thus a score closer to 1 indicates that the intensity from the lower
frequencies is similar to that in the highest frequencies, while a score closer to 0 shows that the intensity of that vowel is concentrated in the lower band.
Figure 2: Waveform, spectrogram, F0 trajectory, and segmentation tier of the declarative utterance Determinó la masa “(S)he calculated the mass”.
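The derived measures described in Section 2.4 reduce to simple arithmetic once the raw values have been extracted. The sketch below illustrates them under stated assumptions: the Hz-to-Bark conversion uses Traunmüller's (1990) approximation, which the text does not specify; the band amplitudes are assumed to be linear (not dB) values; and all function names are hypothetical rather than taken from the authors' Praat scripts.

```python
def hz_to_bark(f_hz):
    """Convert Hz to Bark (Traunmueller 1990 approximation; an assumption,
    since the chapter does not state which conversion was used)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def pitch_range(f0_at_l_hz, f0_at_h_hz):
    """Pitch range in Hz: F0 at H minus F0 at L, reported as an absolute value."""
    return abs(f0_at_h_hz - f0_at_l_hz)


def vowel_quality(f1_hz, f2_hz):
    """Vowel quality as the F2 - F1 distance in Barks."""
    return hz_to_bark(f2_hz) - hz_to_bark(f1_hz)


def spectral_tilt(band1_amplitude, band2_amplitude):
    """Ratio of band 2 (400-4000 Hz) to band 1 (0-400 Hz).

    Values near 1 mean the two bands carry similar intensity; values near 0
    mean intensity is concentrated in the lower band. Linear amplitudes are
    assumed here.
    """
    if band1_amplitude <= 0:
        raise ValueError("band 1 amplitude must be positive")
    return band2_amplitude / band1_amplitude


# Hypothetical values for one token, for illustration only.
print(pitch_range(180.0, 221.0))
print(vowel_quality(450.0, 950.0))
print(spectral_tilt(2.0, 1.0))
```

Keeping each measure in its own small function mirrors the way the measurements are reported separately in Section 3.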
2.4.5 Statistical analysis. We first performed a Repeated Measures ANOVA with the factors vowel ([i], [o]) and accent (+accent/−accent) on the measurements of pitch range in order to verify that accented and stressed vowels bore a pitch accent while unaccented and stressed vowels did not. As for duration, vowel quality, spectral tilt and overall intensity, we performed two statistical analyses. First, we ran a Repeated Measures ANOVA with stress (+/−stress) and intonation (declaratives/parenthetic sentences) as main factors on each vowel ([i] / [o]) for each set of measurements. Second, in order to investigate the contribution of each set of measurements in the prediction of stress we carried out a Linear Discriminant Analysis (LDA) with duration, vowel quality, spectral tilt, and overall intensity as the predictor variables and stress or accent as the criterion variables.
3. Results
3.1 Pitch range differences
One of the first things we wanted to check was whether accented syllables (in declarative sentences) were consistently produced with a rising pitch trajectory, in contrast with unaccented stressed syllables (in parenthetics), which were expected to be flat in pitch. The graph in Figure 3 shows mean values and standard error (in Hz) of the pitch range of the stressed syllables in paroxytones (in gray) and oxytones (in black) in accented and unaccented conditions for all five speakers. As is clear from the graph, subjects consistently used a pitch increase in declarative sentences (i.e. the [+accent] condition: mean 40.93 Hz, s.d. 30.22) and practically no increase or F0 variation in the parenthetic sentences (i.e. the [−accent] condition: mean −0.91 Hz, s.d. 4.11). A one-way ANOVA corroborated this difference as
significant (F(1,298) = 526.222, p<0.0001). Thus, as expected, lexical stress was consistently cued by a pitch accent in declarative sentences, while it was not in parenthetic utterances.
Figure 3: Mean values and standard errors (in Hz) of the pitch range of stressed syllables.
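For readers who wish to see how an F ratio of the kind reported above is obtained, the following sketch computes a one-way ANOVA by hand. It is an illustration only: the pitch-range values are invented, the function name is hypothetical, and the authors' actual analysis was presumably run in a statistics package. With two conditions and 300 tokens, the within-groups degrees of freedom are 300 − 2 = 298, which matches the reported F(1,298).

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples.

    Assumes at least two groups and non-zero within-group variance.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented pitch ranges (Hz) for a handful of tokens per condition.
accented = [38.0, 45.0, 41.0, 52.0]
unaccented = [-1.0, 2.0, 0.0, -3.0]
f_stat, df1, df2 = one_way_anova([accented, unaccented])
print(f_stat, df1, df2)
```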
3.2 Duration, vowel quality, overall intensity, and spectral tilt
3.2.1 Duration. The two graphs in Figure 4 plot the confidence intervals for the mean of the penultimate syllable mi (left panel) and word-final syllable no (right panel) in different stress (stressed/unstressed) and intonation (declarative/parenthetic) conditions for all five speakers. Three patterns stand out. First, stressed syllables (in grey) are systematically longer than unstressed syllables (in black), and most importantly, this difference is maintained across intonation contexts, meaning that stressed syllables are longer even in unaccented environments. Moreover, the magnitude of lengthening of the factor [stress] is greater in word-final syllables than in penultimate syllables (mean differences between stressed and unstressed syllables: 15 ms for word-final syllables vs 7 ms for penultimate syllables). Second, we find no consistent patterns with respect to the potential lengthening effects of accent: while word-final syllables (syllable 2) are longer in accented (declarative) contexts than in unaccented (parenthetic) contexts, this effect is not obvious for syllable 1 (mean differences between declarative and parenthetic sentences: 1 ms for penultimate syllables vs 6 ms for word-final syllables). Importantly, though, the magnitude of lengthening exerted by the presence of stress is higher than that produced by accent. Finally, the graphs in Figure 4 also show that final syllables are longer than penultimate syllables in all conditions (stressed, unstressed, accented, unaccented). This effect might be related to either the inherent duration of vowels or word position.
We ran a Repeated Measures ANOVA with the factors of stress (+stress/−stress) and intonation (declarative/parenthetic) on the duration of syllables 1 and 2. The main factor of stress was significant while the interaction ‘stress x intonation’ was non-significant, meaning that stressed syllables were longer than unstressed syllables in both conditions (stress: [i] F(1,74) = 31.635, p<0.0001; [o] F(1,74) = 86.535, p<0.0001; interaction: [i] F(1,74) = 2.293, p = 0.134; [o] F(1,74) = 0.019, p = 0.891). The main factor of intonation was significant only for vowel [o] ([i] F(1,74) = 0.156, p = 0.694; [o] F(1,74) = 9.987, p = 0.002).
Figure 4: Mean syllable duration of stressed and unstressed syllables in declarative and parenthetic utterances (left panel: syllable 1 [mi]; right panel: syllable 2 [no]).
Syllable    Intonation     Contrast              Significance
1: [mi]     Declarative    [+stress, −stress]    p<0.0001
1: [mi]     Parenthetic    [+stress, −stress]    p<0.01
2: [no]     Declarative    [+stress, −stress]    p<0.0001
2: [no]     Parenthetic    [+stress, −stress]    p<0.0001
Table 3: Results of paired-samples t-tests on duration of syllable 1 and syllable 2. Significance at 0.05 alpha level. Bonferroni adjustment for multiple comparisons.
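The logic of a paired-samples t-test with a Bonferroni adjustment, as summarized in Table 3, can be sketched as follows. This is an assumption-laden illustration rather than the authors' procedure: the duration values are invented, the function names are hypothetical, and the number of comparisons used for the adjustment (four, one per table row) is a guess, since the text does not state it.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for paired samples x and y of equal length.

    Assumes the paired differences are not all identical.
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha after a Bonferroni adjustment."""
    return alpha / n_comparisons


# Invented syllable durations (ms) for the same items, stressed vs unstressed.
stressed = [152.0, 160.0, 148.0, 171.0, 158.0]
unstressed = [140.0, 143.0, 139.0, 150.0, 141.0]
t_stat, df = paired_t(stressed, unstressed)
print(t_stat, df, bonferroni_alpha(0.05, 4))
```

The resulting t value would then be compared against the critical value for the adjusted alpha, a step omitted here because it requires a t-distribution table or a statistics library.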
Table 3 shows the results of paired-samples t-tests comparing stressed with unstressed syllables within declarative and parenthetic sentences for each syllable. The results confirm that the duration differences between stressed and unstressed syllables remain significant within declarative and parenthetic sentences for both syllable 1 and syllable 2. Therefore, the differences in duration between stressed and unstressed syllables are significant regardless of the presence of an accent. Further paired-samples t-tests compared stressed accented syllables with stressed unaccented syllables, as well as unstressed and unaccented syllables from declarative and parenthetic sentences. As the results in Table 4 show, the differences between stressed accented and stressed unaccented syllables are only significant for syllable 2. Moreover, unstressed and unaccented syllables from declarative and parenthetic sentences are also significantly different only for syllable 2. These results confirm that the factor [+accent] does not yield a systematic additive effect on syllable duration and suggest that the factor involved in the lengthening of the last syllable is probably related to the inherent duration of vowels (i.e. the [i] in the penultimate syllable is shorter than the [o] in the last syllable), or within-word position (word-medial versus word-final position), rather than the property of being accented or not.
Syllable    Stress       Contrast              Significance
1: [mi]     [+stress]    [+accent, −accent]    p>0.05 n.s.
1: [mi]     [−stress]    [−accent, −accent]    p>0.05 n.s.
2: [no]     [+stress]    [+accent, −accent]    p<0.05
2: [no]     [−stress]    [−accent, −accent]    p<0.01
Table 4: Results of paired-samples t-tests on duration of syllable 1 and syllable 2. Significance at 0.05 alpha level. Bonferroni adjustment for multiple comparisons.
We conclude on the basis of these results that duration is a strong acoustic correlate of the stress difference in Spanish, but not of the presence of an accent. In our data, the presence of an accent does not obligatorily trigger lengthening on the stressed syllable.
3.2.2 Vowel quality. Given that males tend to have lower formant values than female speakers, vowel quality was measured as the distance in Barks between F2 and F1 separately for female and male speakers in our data. The graphs in Figure 5 illustrate the confidence intervals for the mean F2-F1 difference in Barks for vowel [i] (female speakers in top left panel, male speakers in top right panel) and for vowel [o] (the panels at the bottom, females left panel,
males right panel) in different stress (stressed/unstressed) and intonation conditions (declarative/parenthetic). The graphs reveal that the differences between stressed and unstressed vowels are less than 1 Bark for all the contexts. The direction of the reduction is consistent only for vowel [o]. Female and male speakers tend to increase the distance between F1 and F2 in unstressed [o], making it closer to a central vowel.
Figure 5: Mean F2-F1 difference (in Barks) for vowel [i] and vowel [o] in different stress and intonation conditions (panels: vowel [i], females; vowel [i], males; vowel [o], females; vowel [o], males).
Repeated measures ANOVA with the factors of stress (+/−stress) and intonation (declarative/parenthetic sentences) for vowel [o] shows the main
factor of stress to be significant for both female speakers (F(1,44) = 25.098, p<0.0001) and male speakers (F(1,29) = 30.856, p<0.0001), while intonation and the interaction ‘stress x intonation’ are non-significant. This indicates that only stress, and not accent, has an effect on vowel quality changes (intonation: females F(1,44) = 0.288, p = 0.594; males F(1,29) = 0.083, p = 0.775; interaction: females F(1,44) = 0.146, p = 0.704; males F(1,29) = 0.300, p = 0.588). Paired-samples t-tests confirm that the 1 Bark difference between stressed and unstressed [no] is maintained across intonation contexts by both female and male speakers.
Syllable [no]    Intonation     Contrast              Significance
Females          Declarative    [+stress, −stress]    p<0.0001
Females          Parenthetic    [+stress, −stress]    p<0.0001
Males            Declarative    [+stress, −stress]    p<0.0001
Males            Parenthetic    [+stress, −stress]    p<0.05
Table 5: Results of paired-samples t-tests on vowel quality of syllable [o]. Significance at 0.05 alpha level. Bonferroni adjustment for multiple comparisons.
In sum, we found a small but consistent effect of stress on the formant values of [o]: Unstressed [o] becomes slightly more centralized than stressed [o]. These results are in agreement with those of Quilis and Esgueva (1983), which showed a slight tendency for centralization in unstressed mid-vowels in Castilian Spanish, and with patterns of vowel reduction in unstressed syllables across Romance languages. In contrast, pitch accents did not have any significant effect on vowel quality changes.
3.2.3 Overall intensity. The two graphs in Figure 6 display the confidence intervals for the mean overall intensity (in dB) for vowel [i] (left panel) and vowel [o] (right panel) in different stress (stressed/unstressed) and intonation conditions (declarative/parenthetic) for all five speakers. In the first place, the graphs reveal that in contrast with duration and vowel quality, stressed and unstressed vowels differ in overall intensity only within declarative sentences. In parenthetic sentences, these differences tend to disappear. This means that, on the one hand, there is no consistent effect of stress on overall intensity measurements; on the other, it indicates a possible effect of accent. Moreover, note that the declarative ‘stressed-unstressed’ pattern in vowel [i] is reversed in vowel [o]. While stressed [i] has a higher overall intensity than unstressed [i], unexpectedly, for [o], it is the unstressed vowel that displays a higher overall intensity score.
Figure 6: Mean overall intensity and standard error (in dB) for vowel [i] (left panel) and vowel [o] (right panel) in different stress (stressed/unstressed) and intonation contexts (declarative/parenthetic) for all five speakers.
Results from the Repeated Measures ANOVA show that the ‘stress x intonation’ interaction is significant for both syllables (vowel [i] F(1, 74) = 27.140, p<0.0001; vowel [o] F(1, 74) = 20.559, p<0.0001), indicating that patterns of overall intensity differ across declarative and parenthetic sentences. Paired-samples t-tests indicate that the difference in overall intensity is significant only in declarative sentences (Table 6). Since declarative sentences differ with regard to accent while parenthetic sentences do not, overall intensity relates to accent, not stress.

Vowels   Intonation    Contrast             Significance
1: [i]   Declarative   [+stress, −stress]   p<0.0001
1: [i]   Parenthetic   [+stress, −stress]   p>0.05 n.s.
2: [o]   Declarative   [+stress, −stress]   p<0.0001
2: [o]   Parenthetic   [+stress, −stress]   p>0.05 n.s.

Table 6: Results of paired-samples t-tests on overall intensity of vowels 1 and 2. One-tailed, significance at 0.05 alpha level. Bonferroni adjustment for multiple comparisons.
There is an asymmetry in the overall intensity values of declarative sentences: in vowel [i], unstressed syllables display lower overall intensities than their stressed counterparts, while in [o] this pattern reverses. We may be able to explain this if we consider the F0 trajectories of each vowel in detail and assume that there is a possible covariation between F0 trajectories and overall intensity. Greater intensity is generally found in accented syllables due to the larger amplitude of vocal fold vibration related to greater speaker effort (Sluijter & van Heuven 1996b:2472). All unstressed instances of [i], i.e. [i] in parenthetic sentences (see Figure 1) and the unstressed [i] in declarative sentences, display a flat F0 trajectory. These vowels also show the lowest intensity values. By contrast, stressed accented [i]s bear a rising F0 trajectory, and, correspondingly, show the highest intensity values. As for [o], this vowel in parenthetic sentences displays a flat intonation contour and thus has lower intensity values than in declarative utterances, where it has a rising F0 trajectory. The stressed accented [o] in declarative sentences bears the pitch accent and therefore has both a rising F0 trajectory and high intensity values. Crucially, although unstressed [o]s in declarative sentences are phonologically unaccented, they bear the peak of the preceding pitch accent and display higher intensity values than stressed [o]s, which sit at the beginning of the rising F0 trajectory. In fact, the results in Table 7 below demonstrate that all subjects showed a significant positive correlation between overall intensity and F0 height for each vowel, the correlation coefficients being especially strong for subjects 2, 3, and 4. That is, the higher the pitch of the vowel, the higher the overall intensity levels obtained. Therefore, it is hypothesized that the increased overall intensity patterns found in the Spanish data are due to the interdependence between F0 levels and overall intensity.
As Sluijter and van Heuven (1996b:2482) claim, the greater intensity typically found in accented syllables is caused by the larger amplitude of the pulses in vocal fold vibration.

            vowel [i]   vowel [o]
Subject 1   0.507       0.552
Subject 2   0.713       0.641
Subject 3   0.751       0.750
Subject 4   0.783       0.721
Subject 5   0.586       0.635

Table 7: Correlation coefficients between overall intensity and F0 height for the two vowels [i] and [o] for five subjects. One-tailed, all cells were significant at 0.01 level.
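To make the kind of coefficient reported in Table 7 concrete, the following is a minimal sketch of a correlation between per-token F0 height and overall intensity. It assumes a Pearson product-moment coefficient (the text does not name the coefficient used), and the token values in the usage lines are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Assumption: Pearson's r; the study only reports 'correlation coefficients'.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-token measurements for one speaker (not from the study).
f0_hz = [180.0, 195.0, 210.0, 230.0, 250.0]
intensity_db = [62.0, 64.5, 64.0, 67.0, 69.5]
print(f"r = {pearson_r(f0_hz, intensity_db):.3f}")
```

A positive coefficient of this kind is what the text interprets as covariation between F0 height and overall intensity.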
3.2.4 Spectral tilt. In our data, spectral tilt was calculated as the ratio of the intensity in the higher frequencies to the intensity of the lower frequencies in vowel [o], as spectral tilt could not be measured for vowel [i] (see Section 2.3.3). Thus, when frequencies from the higher and from the lower part of the spectrum have similar intensities, the ratio approaches 1 and the tilt in the spectrum decreases. Figure 7 shows the mean spectral tilt ratios (and standard error values) for vowel [o] in different stress (stressed/unstressed) and intonation conditions (declarative/parenthetic) for all five speakers. First, the
spectral tilt ratios of stressed [o]s (in grey) are closer to 1 and show a flatter tilt than unstressed [o]s (in black). As with duration and vowel quality, this difference is maintained across intonation contexts, revealing a potential effect of stress on spectral tilt: stressed syllables tend to increase the intensity of the higher frequencies, and consequently have a ‘flatter’ spectral tilt than their unstressed counterparts. Second, the spectral tilt of [o] in declarative sentences is closer to 1, and therefore the tilt decreases, in contrast to parenthetic sentences. This reveals a potential effect of the presence of an accent. Results of the Repeated Measures ANOVA show that the interaction ‘stress x intonation’ is non-significant for spectral tilt measurements (F(1,74) = 1.797, p = 0.185 for vowel [o]), indicating that the effect of stress on spectral tilt is the same regardless of the presence of an accent. Moreover, paired-samples t-tests confirm that spectral tilt is a reliable acoustic correlate of stress across intonation contexts. Paired t-tests also show that there is a significant difference between accented and unaccented syllables, and between unaccented syllables from declarative and parenthetic sentences. These results indicate that there is a difference between sentence types: declarative sentences display greater intensity levels in the higher regions of the spectrum than parenthetic sentences.
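The ratio definition of spectral tilt given at the start of this subsection can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the split frequency, the use of mean band intensity, and the spectrum values are all hypothetical, since the measurement details belong to Section 2.3.3.

```python
def spectral_tilt_ratio(spectrum, split_hz):
    """Ratio of mean intensity above split_hz to mean intensity below it.

    spectrum: list of (frequency_hz, intensity) pairs.
    split_hz: hypothetical boundary between 'lower' and 'higher' frequencies.
    A value near 1 means similar intensity in both regions (flatter tilt).
    """
    low = [level for freq, level in spectrum if freq < split_hz]
    high = [level for freq, level in spectrum if freq >= split_hz]
    if not low or not high:
        raise ValueError("spectrum must have points on both sides of split_hz")
    mean_low = sum(low) / len(low)
    mean_high = sum(high) / len(high)
    if mean_low == 0:
        raise ValueError("mean low-band intensity is zero")
    return mean_high / mean_low


# Invented spectra: the second has relatively more high-frequency intensity.
steep = [(100, 60.0), (300, 60.0), (1000, 30.0), (2000, 30.0)]
flatter = [(100, 60.0), (300, 60.0), (1000, 48.0), (2000, 48.0)]
print(spectral_tilt_ratio(steep, split_hz=500))
print(spectral_tilt_ratio(flatter, split_hz=500))
```

On this definition the second spectrum yields a ratio closer to 1, which is the pattern the text describes for stressed [o].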
Figure 7: Mean spectral tilt ratios (and standard error values) for vowel [o] in different stress (stressed/unstressed) and intonation (declarative/parenthetic) contexts for all five speakers.

Vowel    Intonation    Contrast             Significance
2: [o]   Declarative   [+stress, −stress]   p<0.05
2: [o]   Parenthetic   [+stress, −stress]   p<0.0001

Table 8: Results of paired-samples t-tests on spectral tilt of vowel [o]. One-tailed, significance at 0.05 alpha level. Bonferroni adjustment for multiple comparisons.
These results therefore suggest that spectral balance is a more robust and systematic cue to stress than overall intensity, and are in keeping with previous results on other stress-accent languages (Sluijter & van Heuven 1996a, 1996b). We thus suggest that Navarro Tomás’s hypothesis that Spanish stress is strongly cued by intensity (the so-called acento de intensidad) can be interpreted as essentially correct if one understands that the acoustic correlate of increased perception of loudness is greater intensity levels in the higher parts of the spectrum. Thus the perception that a stressed syllable is more prominent probably derives from its increased intensity levels in the high regions, not the low regions, of the spectrum. This difference is maintained in unaccented contexts and enhanced in accented syllables.

Vowel    Stress      Contrast             Significance
2: [o]   [+stress]   [+accent, −accent]   p<0.0001
2: [o]   [−stress]   [−accent, −accent]   p<0.0001

Table 9: Results of paired-samples t-tests on spectral tilt of vowel [o]. One-tailed, significance at 0.05 alpha level. Bonferroni adjustment for multiple comparisons.
3.2.5 Linear Discriminant Analyses. Following Sluijter and van Heuven (1996b), the contribution of each acoustic correlate was examined by Linear Discriminant Analyses (LDA). Two LDAs with the grouping variable of stress (+stress/−stress) were performed on measurements of duration, vowel quality, spectral tilt and overall intensity for vowel [o]. Since spectral tilt could not be measured for vowel [i], LDA was not run for this vowel (see Section 2.3.3). First, all measurements were entered together in order to assess how well stress could be predicted. The obtained discriminant functions correctly classify as [+stress] or as [−stress] 71.3% of the vowel [o] tokens in declarative sentences and 70.7% in parenthetic sentences. Thus, stress could be predicted with a reasonable level of accuracy from the measurements of duration, spectral tilt, vowel quality and overall intensity. Secondly, measurements of duration, spectral tilt, vowel quality and overall intensity were entered separately into the discriminant function so as to determine the contribution of each one of these variables in the prediction of stress. As Figure 8 shows, duration correctly classified 70% of stressed syllables in declarative sentences and 66.7% in parenthetic sentences. Vowel quality classifications achieved scores of 60.7% in declarative sentences and 57.3% in parenthetic sentences. Spectral tilt scored 51.3% in declarative sentences but increased to 61% in parenthetic sentences. In contrast with the preceding measurements, classification scores for overall intensity never exceeded chance (46% in declarative sentences and 50% in parenthetic sentences). These results indicate that duration is the main cue to stress in Spanish. Duration measurements showed that stressed syllables had longer durations
than their unstressed counterparts, and the LDA results indicate that these differences in duration are sufficient to distinguish stressed from unstressed syllables with a high level of accuracy. Moreover, since the scores for declarative sentences do not vary substantially from those obtained for parenthetic sentences (3.3% difference), they show that the successful classification of stressed syllables based on duration differences takes place regardless of the presence or absence of a pitch accent. Thus, duration does effectively cue the stress contrast independently of pitch accents.

[Figure 8 plot: % correct (axis range 45-75) for each cue (duration, overall intensity, spectral tilt, vowel quality), shown separately for declarative and parenthetic sentences.]

Figure 8: Percentage of vowels correctly predicted as stressed or unstressed for each phonetic cue.
A similar pattern is found for vowel quality: the distance between F1 and F2 increases slightly in unstressed [o] in both declarative and parenthetic sentences, indicating a slight tendency towards vowel centralization in unstressed vowels, which in turn leads to correct classification scores of stressed syllables in both sentence types. This tendency to vowel centralization, however, is based on a difference of less than 1 Bark, which may call into question the perceptual relevance of this cue. Spectral tilt also contributes to the prediction of stress, but only in parenthetic sentences. This may indicate a compensatory relation between duration (and possibly vowel quality) and spectral tilt. Since duration cues have less predictive power in parenthetic sentences, spectral tilt becomes a better predictor of stress in this context. The only cue that does not contribute to the prediction of stress in any context is overall intensity. In summary, LDA of the Spanish data show that duration is the most effective correlate of stress in both parenthetic and declarative sentences. After that, vowel quality makes a significant contribution in predicting vowels as [+stressed] or as [−stressed], followed by spectral tilt. Overall intensity, however, does not contribute to this classification.
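The single-cue classification scores discussed above can be illustrated with a minimal sketch. It assumes a two-class linear discriminant on one predictor with equal priors and a pooled variance, in which case the decision boundary is simply the midpoint between the two class means. The function names and duration values are hypothetical, and the study's own analysis may have been configured differently.

```python
from statistics import mean


def fit_single_cue_lda(stressed, unstressed):
    """Return (boundary, stressed_is_higher) for a one-cue, two-class LDA.

    Assumption: equal priors and pooled variance, so the boundary is the
    midpoint between the class means.
    """
    mean_stressed = mean(stressed)
    mean_unstressed = mean(unstressed)
    boundary = (mean_stressed + mean_unstressed) / 2.0
    return boundary, mean_stressed >= mean_unstressed


def predict_stressed(value, boundary, stressed_is_higher):
    """True if the token falls on the stressed side of the boundary."""
    return (value > boundary) == stressed_is_higher


def percent_correct(stressed, unstressed):
    """Percentage of tokens classified correctly (resubstitution accuracy)."""
    boundary, higher = fit_single_cue_lda(stressed, unstressed)
    hits = sum(predict_stressed(v, boundary, higher) for v in stressed)
    hits += sum(not predict_stressed(v, boundary, higher) for v in unstressed)
    return 100.0 * hits / (len(stressed) + len(unstressed))


# Invented syllable durations in ms, for illustration only.
stressed_ms = [100.0, 110.0, 120.0]
unstressed_ms = [80.0, 90.0, 105.0]
print(f"{percent_correct(stressed_ms, unstressed_ms):.1f}% correct")
```

With two equally frequent categories, chance performance on this measure is 50%, which is the baseline against which the overall intensity scores are judged.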
4. Discussion and conclusions
In this article, we were concerned with the acoustic correlates that characterize stress and accent in Spanish. We analyzed four acoustic correlates of stress (syllable duration, vowel quality, overall intensity, and spectral balance) in four conditions, namely, stressed and unstressed syllables in both accented and unaccented environments. This allowed us to examine the relative strength of these correlates in relation to stress and see how the stress contrast is maintained in the presence or absence of a pitch accent.

The duration measurements revealed that stressed syllables are longer than unstressed syllables regardless of the presence of an accent, demonstrating that syllable duration is a strong acoustic correlate of the stress difference in Spanish. Moreover, LDA results for the Spanish data singled out the effectiveness of duration as the most robust acoustic separator between stressed and unstressed conditions. This is basically in accordance with the main results for Dutch, where duration is the most effective correlate of stress (Sluijter & van Heuven 1996a, 1996b:2475). Furthermore, in contrast with previous studies, our results show that the presence of a pitch accent does not consistently trigger additive effects on the duration cues. That is, in our data, the presence of an accent does not obligatorily trigger lengthening on the stressed syllable. Even though previous studies on other stress-accent languages have found additive effects of accent (Sluijter & van Heuven 1996a, 1996b:2475, for English and Dutch respectively), Beckman and Edwards (1994:20-25) found that this pattern varied across speakers and speech rates: while one of the speakers showed a consistent durational effect of accent, this was not the case for the other speaker.
Consequently, we claim, along with Beckman & Edwards (1994), that while duration is a crucial acoustic cue to mark a lower level prominence contrast (stressed vs unstressed), it is a secondary (and thus optional) acoustic marker of a higher-level prominence contrast (accented vs unaccented). The formant measurements for the Spanish data revealed significant effects of stress on the formant values of [o], indicating a slight tendency towards centralization in unstressed positions. Moreover, LDA results confirm the significant contribution of the vowel quality variable to stress prediction (albeit less strong than duration). These results contrast with those for Dutch, where the effects of stress on vowel quality were only partially significant and vowel quality was found to be a poor predictor of stress in LDA analyses (Sluijter & van Heuven 1996b). On the one hand, this difference could be related to the fact that Sluijter used both vowels in the LDA prediction ([a] and []) and only [a] reduced into [] in unstressed syllables. On the other hand, vowel reduction in Dutch takes place mainly in derivational suffixes (van Heuven 2001), which were not examined in Sluijter’s test materials. In our data, we included only [o], which underwent vowel quality changes, not [i], which did not. If we had included both vowels in the LDA analysis, vowel
quality might not have been such a good predictor of stress, and our results might have been closer to those of Sluijter and van Heuven. On the other hand, the presence of an accent does not affect formant frequency values, and therefore accented syllables have similar vowel qualities to unaccented syllables. Thus, both syllable duration and vowel quality cues can be interpreted as ‘primary’ cues in the stress dimension and ‘secondary’ cues in the accent dimension. This is probably due to the fact that in our data vowel lengthening was related to stress, not to accent, and as Lindblom showed (Lindblom 1963; Moon & Lindblom 1994), there is a linear relationship between duration and formant displacement: shorter vowels undergo more formant displacement (towards centralization) than longer vowels. This linear relationship between duration and formant displacement is biomechanically motivated and provides evidence for a vowel undershoot model. In shorter vowels, articulators have less time to attain their target, and as a result, vowels become reduced, thus showing more formant displacement towards a reduced vowel. Thus, unstressed vowels in Spanish become slightly more centralized than stressed vowels because they are also shorter than their stressed counterparts. However, the magnitude of this centralization is very small, probably because Spanish does not have phonological vowel reduction. It would be interesting to compare Spanish to a language with phonological vowel reduction in order to examine how these two variables cue the stress contrast in each language. We turn now to a discussion of intensity patterns. Crucially, the data presented in this article replicates Sluijter and collaborators’ (1996a, 1996b) finding that the intensity differences between stressed and unstressed vowels are mainly located in the higher regions of the spectrum and, as Campbell and Beckman (1997) showed, these differences are enhanced in accented contexts. 
It is clear that overall intensity cannot be regarded as a reliable acoustic correlate of stress in Spanish, as our ANOVA and LDA results demonstrate. By contrast, spectral balance differences (i.e. intensity levels at higher regions of the spectrum), appear to be a consistent cue for stress. Thus, we contend that the classic claim in the Spanish phonetics literature made by Contreras (1963) and Quilis (1971, 1981) that intensity plays almost no role as a cue to stress is not accurate. On the other hand, we take Navarro Tomás’ view that intensity plays an essential part in the production of stress (the so-called acento de intensidad) as essentially correct. Thus, a Spanish stressed syllable is probably perceived as more prominent due to an increase in the intensity levels in the higher, not lower, regions of the spectrum. The present findings also have implications for previous debates on acoustic correlates of Spanish stress, as this is one of the first experiments comparing the acoustic correlates of Spanish stress in accented and unaccented words. Previous studies on the acoustic characterization of Spanish stress had only studied words containing a pitch accent (see Navarro Tomás 1914, 1964; Contreras 1963, 1964; Quilis 1971; Gili Gaya 1975; Solé 1984; Canellada & Kuhlman-Madsen 1987; Llisterri et al. 2003, among many others). In this new
context, the traditional goal of searching for the main cue to stress in Spanish makes no sense, as phonetic cues are not used the same way in accented and unaccented contexts. In accented contexts, it is clear that pitch is a strong phonetic cue of stress, as claimed by Contreras (1964) or Quilis (1971, 1981) and authors of perception studies like Solé (1984), Enríquez, Casado & Santos (1989), and Llisterri et al. (2003); in this context, duration and intensity cues also accompany the pitch difference. Yet in unaccented contexts, where pitch is flat and cannot be an indicator of the stress difference, the results of this study reveal that duration, intensity, and even vowel quality are good indicators of the stress difference. In this sense, the traditional claim by Navarro Tomás that the strongest cue to stress in Spanish is a local increase in loudness or intensity is only partially true, as clearly duration is also a very strong indicator of the presence of stress. Thus in the absence of an accent, cues like duration and spectral tilt are crucial in the production of Spanish stress.

We conclude on the basis of these results that syllable duration, vowel quality, and spectral tilt (intensity at high frequencies of the spectrum) are all reliable acoustic correlates of the stress difference in Spanish. Accentual differences are acoustically marked by overall intensity cues, but our findings suggest that these might be a by-product of higher F0 levels, which covary with higher intensity levels. Thus, our results reveal that American English, Dutch and Spanish do differ fundamentally in the use of vowel reduction and consonant reduction (flapping, aspiration) to mark stressed positions, but do not differ greatly in the way they use the other acoustic correlates (duration and intensity) to signal the presence of stress and accent.
Stress is cued by duration, intensity, and vowel quality in the absence of an accent, confirming the relative independence of metrical and pitch properties. Finally, an appropriate follow-up of this research would be to examine the relevance and interaction of these factors in the actual perception of the stress contrast in Spanish.
References
Beckman, Mary E. 1986. Stress and Non-stress Accent. Dordrecht: Foris.
---------- & Jan Edwards. 1994. “Articulatory evidence for differentiating stress categories”. Phonological Structure and Phonetic Form. Papers in Laboratory Phonology III ed. by Patricia A. Keating. 7-33. Cambridge: Cambridge University Press.
Boersma, Paul & David Weenink. 2005. Praat: Doing Phonetics by Computer (Version 4.3.01). [Computer program]. Retrieved from http://www.praat.org/
Bolinger, Dwight L. 1958. “A theory of pitch accent in English”. Word 14. 109-149.
Canellada, María Josefa & John Kuhlman-Madsen. 1987. Pronunciación del español. Lengua hablada y literaria. Madrid: Editorial Castalia.
Campbell, Nick & Mary Beckman. 1997. “Stress, prominence and spectral tilt”. Intonation: Theory, Models and Applications. Proceedings of an ESCA Workshop ed. by Antonis Botinis, Georgios Kouroupetroglou & George Carayiannis. 67-70. Athens: ESCA & University of Athens Department of Informatics.
Contreras, Heles. 1963. “Sobre el acento en español”. Boletín del Instituto de Filología de la Universidad de Chile 15. 223-237.
----------. 1964. “¿Tiene el español un acento de intensidad?” Boletín del Instituto de Filología de la Universidad de Chile 16. 237-239.
Enríquez, Emilia V., Celia Casado & Andrés Santos. 1989. “La percepción del acento en español”. Lingüística Española Actual 11. 241-269.
Gili Gaya, Samuel. 1975. Elementos de fonética general. Madrid: Gredos.
Heuven, Vincent J. van. 2001. Boven de klanken / Beyond the segments. Inaugural address in acceptance of the chair in Phonetics at Leiden University, The Netherlands. Retrieved from: http://www.let.leidenuniv.nl/ulcl/faculty/vheuven/oratie-eng.htm.
Lindblom, Björn. 1963. “Spectrographic study of vowel reduction”. Journal of the Acoustical Society of America 35. 1773-1781.
Llisterri, Joaquim, María Machuca, Carme de la Mota, Montserrat Riera & Antonio Ríos. 2003. “The perception of lexical stress in Spanish”. Proceedings of the 15th International Conference of Phonetic Sciences ed. by Maria-Josep Solé, Daniel Recasens & Joaquín Romero. 2023-2026. Barcelona: Causal Productions.
Navarro Tomás, Tomás. 1926 [1914]. Manual de pronunciación española. 2nd ed. Madrid: Hernando.
----------. 1964. “La medida de la intensidad”. Boletín del Instituto de Filología de la Universidad de Chile 16. 231-235.
Moon, Seung-Jae & Björn Lindblom. 1994. “Interaction between duration, context, and speaking style in English stressed vowels”. Journal of the Acoustical Society of America 96:1. 40-55.
Ortega-Llebaria, Marta. 2006. “Phonetic cues to stress and accent in Spanish”. Selected Proceedings of the 2nd Conference of Laboratory Approaches to Spanish Phonology ed. by Manuel Díaz-Campos. 104-118. Somerville, Mass.: Cascadilla Press.
Pierrehumbert, Janet B. 1980. The phonetics and phonology of English intonation. PhD. diss., MIT.
Quilis, Antonio. 1971. “Caracterización Fonética del Acento Español”. Travaux de Linguistique et de Littérature 9. 53-72.
----------. 1981. Fonética acústica de la lengua española. Madrid: Biblioteca Románica Hispánica, Gredos.
---------- & Manuel Esgueva. 1983. “Realización de los fonemas vocálicos españoles en posición fonética normal”. Estudios de Fonética I ed. by Manuel Esgueva & Margarita Cantarero. 159-252. Madrid: Consejo Superior de Investigaciones Científicas.
Sluijter, Agaath M.C. & Vincent van Heuven. 1996a. “Acoustic correlates of linguistic stress and accent in Dutch and American English”. Proceedings of the 4th International Conference on Spoken Language Processing ed. by H. Timothy Bunnell & William Idsardi. 630-633. New Castle, Del.: Citation Delaware.
Sluijter, Agaath M.C. & Vincent van Heuven. 1996b. “Spectral balance as an acoustic correlate of linguistic stress”. Journal of the Acoustical Society of America 100:4. 2471-2485.
Sluijter, Agaath M.C., Vincent van Heuven & Jos J.A. Pacilly. 1997. “Spectral balance as a cue in the perception of linguistic stress”. Journal of the Acoustical Society of America 101:1. 503-513.
Solé, Maria-Josep. 1984. “Experimentos sobre la percepción del acento”. Estudios de Fonética Experimental I. 134-243. Universidad de Barcelona.
Wood, Sidney. 2005. Praat for Beginners. [Manual]. Retrieved from http://www.ling.lu.se/persons/Sidney/praate/
PART 3 ACQUISITION OF SEGMENTAL CONTRASTS AND PROSODY
ON THE EFFECT OF (MORPHO)PHONOLOGICAL COMPLEXITY IN THE EARLY ACQUISITION OF UNSTRESSED VOWELS IN EUROPEAN PORTUGUESE *

M. JOÃO FREITAS
University of Lisbon

Abstract
The goal of this paper is to examine the acquisition of vowels in European Portuguese by monolingual Portuguese children. The vowel inventory in this language is affected by reduction in unstressed position, which entails allophonic (and allomorphic) variation. This study focuses on the acquisition of the unstressed [ɐ] and [ɨ] resulting from the reduction of the segments /a/, /e/ and /ɛ/. Regarding the analysis of the target system, the process that reduces /a/ to [ɐ] involves the Height node (degree of openness) while the process that reduces /ɛ, e/ to [ɨ] affects both the Height node and the V-Place node (place of articulation). Given the facts attested in the adult system, data obtained from seven children in the process of acquiring Portuguese as their native language are examined in the light of the following research questions: is the order of acquisition related to the featural architecture of segments? Are children sensitive to phonological processes in the early stages of phonological development?
* I thank the audience at the PaPI 2005 conference and the three anonymous reviewers for their comments and suggestions. I am particularly grateful to Paula Fikkert for the discussions on the topic dealt with here. This research was supported by the Fundação para a Ciência e a Tecnologia through the research center Onset – Centro de Estudos da Linguagem (FCT/FEDER.POCTI/33277/LIN/2003). The database used for this paper was taken from a project on the acquisition of EP (PCSH/LIN/524/93) carried out at the Laboratório de Psicolinguística – Universidade de Lisboa.

1. Introduction
In this paper, we will provide information on the acquisition of vowels in European Portuguese (EP), by focusing on Portuguese children’s early productions of target unstressed vowels. EP presents a reduced vowel inventory in unstressed position; this derives from the productivity of the vowel reduction process, which entails both allophonic and allomorphic variation in the language. The acquisition of this language is therefore an interesting case to investigate how children deal with allophony, for it has a massive impact on the shapes of words in the target language. Moreover, the effect of allophonic variation in the acquisition of the EP vowel system is a relevant topic for the debate on the nature of early phonological representations. At present, there are two possible scenarios:

(i) If we find that all children’s vowels match their targets, this suggests that early abstract phonological representations are not present in the child’s system. In this case, the featural structure of vowels would be fully specified and no impact of target allophonic variants in the child’s phonological development would be attested.

(ii) On the other hand, if there are mismatches between the target forms and the children’s productions, we may argue that the selected repair vowels emerge as a consequence of the presence of early abstract phonological representations in the child’s system. We would then expect the repair vowels in the children’s words to be instances of the target abstract phonological vowel that undergoes the reduction process. These repair vowels will therefore be products of abstract vowels that lack featural structure under a specific node in the child’s system. If this is the case, children will show that they are aware of the complexity associated with allophonic variants in the adult system.
Research on vowel alternations in child language is scarce in the literature (see Macken 1995; Bernhardt & Stemberger 1998; Peperkamp & Dupoux 2002; Hayes 2004; Freitas 2004; Fikkert & Freitas 2006). By focusing on processes involving vowel alternation in Portuguese children’s productions, the central aim of this paper is therefore to contribute empirical evidence to the debate on how abstract the child’s phonological representations can be. In previous work on the acquisition of the EP vowel system (Freitas 2004), we observed that children are able at an early age to discriminate two different instances of the [ɨ] vowel in the target system:

(i) one that results from the neutralization of /ɛ, e/ to [ɨ] in unstressed position (e.g. rega [ʀɛɡɐ] “irrigation” – regar [ʀɨɡaɾ] “to sprinkle”; cedo [sedu] “early” – cedinho [sɨdiɲu] “early-DIMINUTIVE”);¹

(ii) one that is used as a filler of empty prosodic categories (e.g. admirar [ɐdmiɾaɾ] “to admire” → [ɐdɨmiɾaɾ]; pneu [pnew] “tyre” → [pɨnew]; mar [maɾ] “sea” → [maɾɨ]).
¹ See the next section of this paper for further examples.

It was also demonstrated that Portuguese children produced a variety of different repair vowels in the case of the target neutralized [ɨ], repair vowels which are basically possible instances of the underlying vowels assumed to exist in the lexical representation of the adult. As for the filler [ɨ], the vowel used by the children in the study consistently matched the vowel in the adult system. This finding of early sensitivity to the phonological nature of vowels in EP suggested the need to pursue further investigation on the acquisition of vowels in this language. Thus, in a second study, Fikkert & Freitas (2006) showed that Portuguese children are able to process the underlying differences associated with the only two phonological conditions where stressed [ɐ] emerges in the system (in all the other contexts in the language, a stressed dorsal is always [a] (Mateus 1975; d’Andrade 1977; Mateus & d’Andrade 2000, among others)):

(i) stressed [ɐ] shows up as an allophone of /a/ in the context of left adjacency to a nasal consonant (lama [lɐmɐ] “mud”; ano [ɐnu] “year”; aranha [ɐɾɐɲɐ] “spider”);

(ii) stressed [ɐ] surfaces as an allophone of /e/ in the context of left adjacency to a palatal (lenha [lɐɲɐ] “firewood”; espelho [ʃpɐʎu] “mirror”; rei [ʀɐj] “king”; cereja [sɨɾɐʒɐ] “cherry”; fecho [fɐʃu] “zipper”).
The results showed that at an early stage Portuguese children are able to discriminate between these two target stressed [ɐ]s. In the case of the target abstract /a/, mainly dorsal vowels occurred in the children’s production, while in the case of the target abstract /e/, it was coronals that were most often produced. The two problems mentioned above provided empirical evidence to support the claim that children are able to deal with allophony in the target system during an early period of their phonological development. In this paper, I proceed with the research on the acquisition of the vowel system in EP by confronting the neutralizations that generate the unstressed allophones [ɨ] and [ɐ]. In Section 2, I will briefly describe the properties of the target system with regard to the phonological process affecting the unstressed vowel inventory. Section 3 formulates the acquisition problem related to the mismatch between the lexical representation of words and the properties of the phonetic output of words in the target system. In Section 4, I will provide methodological information on the nature of the data observed. Section 5 contains the description of the data selected for analysis and Section 6 discusses the predictions presented in Section 3.
2. The target system
2.1 Non-final vowels
The vowel system in EP is phonologically complex, for it is affected by several different processes. Within this set of processes, the one that concerns vowel reduction in unstressed position is highly productive and generally applies to all vowels in this prosodic condition. The following neutralizations of vowels in unstressed positions are attested in the language (Morais Barbosa 1968; Mateus 1975; d’Andrade 1977; Mateus & d’Andrade 2000):
(1)
M. JOÃO FREITAS
a. /a/ reduces to [ɐ].
b. /ɛ, e/ reduce to [ɨ].
c. /ɔ, o/ reduce to [u].
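As a minimal illustrative sketch (not part of the original study), the three neutralizations in (1) can be written as a lookup table. The IPA symbols are assumed here to be the standard values for EP, and the function names are hypothetical.

```python
# Unstressed vowel reduction in EP, following the neutralizations in (1).
# The IPA values are an assumption based on the standard description of EP.
REDUCTION = {
    "a": "ɐ",  # (1a)
    "ɛ": "ɨ",  # (1b)
    "e": "ɨ",  # (1b)
    "ɔ": "u",  # (1c)
    "o": "u",  # (1c)
}


def reduce_unstressed(vowel: str) -> str:
    """Return the unstressed surface form of a vowel.

    Vowels not listed in REDUCTION (e.g. /i/ and /u/) are returned unchanged.
    """
    return REDUCTION.get(vowel, vowel)


def unstressed_inventory(stressed_vowels):
    """Collect the set of surface vowels obtained by reducing every input vowel."""
    return {reduce_unstressed(v) for v in stressed_vowels}


if __name__ == "__main__":
    stressed = ["a", "ɐ", "ɛ", "e", "i", "ɔ", "o", "u"]
    print(sorted(unstressed_inventory(stressed)))
```

Applied to the eight stressed vowels, the mapping yields a four-vowel unstressed inventory, which is the asymmetry the child has to discover.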
These phonological facts generate allophonic and allomorphic variation in the language, as we may observe by the contrasts [a, ɐ], [ɛ, ɨ], [e, ɨ], [ɔ, u] and [o, u] in (2a). On the other hand, /i/ and /u/ generally do not undergo the reduction process affecting the vowels mentioned in (1) (see the examples in (2b)):
(2) Allophonic/allomorphic variation in EP
   I. Stressed V            II. Unstressed V
a. [saku] “bag”             [skiu] “bag-DIMINUTIVE”
   [lv] “light”             [lvez] “lightweight”
   [sed] “silk”             [sdozu] “silk-like”
   [md] “fashion”           [mudit] “dressmaker”
   [lobu] “wolf”            [lubiu] “wolf-DIMINUTIVE”
b. [livu] “book”            [livi] “bookshop”
   [sumu] “juice”           [sumetu] “juicy”
The system therefore exhibits eight vowels ([a, ɐ, ɛ, e, i, ɔ, o, u]) in the stressed position (on stressed [ɐ], see footnote 2), with three degrees of openness and three places of articulation. However, only four vowels ([ɐ, ɨ, i, u]) correspond to the possible output forms in unstressed position, showing two degrees of openness ([+high] and [–high]/[–low]), while the contrasts in terms of place of articulation are maintained (dorsal, coronal and labial). The information in Figure 1 is based on Mateus & d’Andrade (2000:20), who assume the proposal in Clements and Hume (1995) as the background model for the feature geometry proposed for EP.
                   (Coronal)   Dorsal   Labial
[+high]            i           ɨ        u
[–high] [–low]     e           ɐ        o
[+low]             ɛ           a        ɔ
Figure 1: The vowel system in EP.
2 As mentioned in Section 1, under specific conditions (at the left edge of palatals and at the left edge of nasal consonants), the vowel [ɐ] may surface in stressed position. At the left edge of palatals, it is assumed to be an allophone of /e/, while at the left edge of nasal consonants, it is an allophone of /a/ (Mateus 1975; d’Andrade 1977; Mateus & d’Andrade 2000). For the acquisition of these two instances of stressed [ɐ] by monolingual Portuguese children, see Fikkert & Freitas (2006).
ACQUISITION OF UNSTRESSED VOWELS IN EUROPEAN PORTUGUESE 183
According to Mateus & d’Andrade (2000), vowels in EP show a contrast under the domain of the Height node, represented by the features [+/–high] and [+/–low]. Moreover, [a, ɐ, ɨ] are represented as Dorsal [+back] and [ɔ, o, u] are represented as Labial [+round]. The vowels [ɛ, e, i] show no specification under the V-place node. Although Mateus & d’Andrade (2000) do not use Coronal for the characterization of [ɛ, e, i], we will nevertheless refer to this natural class as the set of Coronal vowels to distinguish them from the two other sets, Dorsals and Labials. The internal structure of vowels assumed in Mateus & d’Andrade (2000) entails different levels of featural complexity under the neutralization process in unstressed position:
(3) Featural changes in vowel reduction
a. /a/ → [ɐ] (changes under the Height node).
b. /ɛ, e/ → [ɨ] (changes under the Height and V-place nodes).
c. /ɔ, o/ → [u] (changes under the Height node).
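The featural asymmetry in (3) can be made concrete with a small sketch that compares the specification of an underlying vowel with that of its reduced output. The encoding below is an assumption made for illustration only: height is collapsed into three labels, V-place is "Dorsal", "Labial" or unspecified (the set referred to in the text as Coronal), and the names `Vowel`, `VOWELS` and `changed_nodes` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Vowel(NamedTuple):
    height: str           # "high", "mid" ([-high, -low]) or "low"
    place: Optional[str]  # "Dorsal", "Labial", or None for the Coronal set


# Assumed feature values, following the classes described in the text.
VOWELS = {
    "i": Vowel("high", None), "e": Vowel("mid", None), "ɛ": Vowel("low", None),
    "ɨ": Vowel("high", "Dorsal"), "ɐ": Vowel("mid", "Dorsal"), "a": Vowel("low", "Dorsal"),
    "u": Vowel("high", "Labial"), "o": Vowel("mid", "Labial"), "ɔ": Vowel("low", "Labial"),
}


def changed_nodes(underlying: str, surface: str) -> List[str]:
    """List the nodes whose specification differs between two vowels."""
    src, dst = VOWELS[underlying], VOWELS[surface]
    nodes = []
    if src.height != dst.height:
        nodes.append("Height")
    if src.place != dst.place:
        nodes.append("V-place")
    return nodes


if __name__ == "__main__":
    for underlying, surface in [("a", "ɐ"), ("ɛ", "ɨ"), ("e", "ɨ"), ("ɔ", "u"), ("o", "u")]:
        print(f"/{underlying}/ -> [{surface}]: {changed_nodes(underlying, surface)}")
```

Under these assumptions only the coronal inputs change under two nodes, which is the complexity difference that Hypothesis 1 in Section 3 builds on.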
An additional relevant aspect of the target system relates to the frequent optional deletion of [ɨ] and [u] in spontaneous speech (see Mateus & d’Andrade 2000 and Vigário 2003, among others), as illustrated in the examples in (4):
(4) Optional deletion of [ɨ] and [u]
[dpa] → [dpa] “to unfasten”
[lum] → [lum] “light”
[futufi] → [ftufi] “photo”
The frequent vowel deletion in the input (Mateus & Delgado-Martins 1982; Mateus & d’Andrade 2000; Vigário 2003, among others) assigns a high level of opacity to the system since it frequently entails a mismatch between the lexical representation of words and the lack of vowels in the output forms that Portuguese children have access to (for more on this topic, see Frota, Vigário & Freitas 2003).
2.2 Word-final vowels
Word-final vowels are generally instances of a class marker in EP. It is possible to identify four morphological paradigms involving the presence of class markers in the EP non-verbal system (see Mateus & d’Andrade 2000):
(i) word-final [ɐ], generally (but not obligatorily) associated with feminine forms (e.g. a menina [mnin] “the girl”; o mapa [umap] “the map”);
(ii) word-final [u], generally associated with masculine forms (e.g. o menino [umninu] “the boy”);
(iii) word-final [ɨ], a non-transparent vowel in terms of gender information (e.g. o palacete [uplset] “the big house”; a idade [idad] “the age”);
(iv) the absence of a phonetic class marker, corresponding to a morphophonologically empty category (e.g. rubi [ubi] “ruby”; peru [pu] “turkey” (see footnote 3); final [fina] “final”).
The non-verbal system therefore exhibits three unstressed vowels at the right edge of words which show morphological content and are considered to be output forms of the vowel reduction process [ɐ, ɨ, u] (see examples in (5)):
(5) Class markers in EP (see footnote 4) (Mateus & d’Andrade 2000:66-67)
[saku] “bag” ([u] ← /o/)
[sed] “silk” ([ɐ] ← /a/)
[sidd] “town” ([ɨ] ← /e/)
In the case of these three word-final vowels, the absence of stress contrast available in the cases of allomorphy as illustrated in (2) does not allow the child to extract information on the nature of the abstract vowel underlying [ɐ], [ɨ] and [u]. In this case, only a generalization based on the general dynamics of allophony affecting non-final vowels in the target system could lead the child to build an abstract lexical representation for each of these three word-final unstressed vowels. The extent to which Portuguese children treat the same unstressed vowel similarly in word-final and non-final positions is therefore a relevant issue for research on how sensitive children are to abstract phonological representations in the target language.
3. The acquisition puzzle
As demonstrated in the previous section, Portuguese children are faced with a mismatch between lexical representations of words and the phonetic string in adult speech, due, in this case, to the reduced inventory of unstressed vowels [ɐ, ɨ, i, u] and the optional deletion of [ɨ, u]. Considering exclusively the effects of the regular vowel reduction process, the system therefore shows:
(i) differences in featural structure within the vowel reduction process (/ɛ, e/ → [ɨ] is more complex than /a/ → [ɐ] or /ɔ, o/ → [u]);
(ii) a stress contrast for non-final vowels within stems or derivational stems ([a, ɐ], [ɛ, ɨ], [e, ɨ], [ɔ, u] and [o, u]);
(iii) the absence of stress contrast for word-final class markers, which never occur in the stressed position [ɐ, u, ɨ].
3 These word-final vowels are possible but rare in the system; in these cases, the vowel is generally stressed and it is the last segment of the stem.
4 We are excluding here the fourth class marker in the nominal system, which is morphophonologically empty, exemplified by peru “turkey” or rubi “ruby”.
Considering these phonological facts, and the hypothesis that the complexity of the target language may enhance the rate of phonological development (see Fikkert & Freitas 1998 for the impact of Rhyme complexity on the syllabic development of Dutch and Portuguese children), our prediction is that the complexity induced by both allophonic variation and the mismatch between the phonetic string and the phonological properties of lexical representations will lead Portuguese children to build abstract phonological representations at an early age. If this prediction is true, namely, that children are able to deal with both allophony and allomorphy at early ages, the repair vowels used by children will be instances of an abstract lexical vowel, determined on the basis of the allophonic and allomorphic relations present in the target grammar. If, on the other hand, children are not able to deal with allophony and allomorphy, the formats of the unstressed vowels will match the adult’s output forms, thus confirming the idea that children store all variants (see Bybee 2001). As demonstrated in Section 2, different neutralizations in an unstressed position in EP involve different featural changes: /ɛ, e/ → [ɨ] involves changes under the Height node and the V-place node, whereas /a/ → [ɐ] and /ɔ, o/ → [u] involve changes only under the Height node. This suggests the following hypothesis:
Hypothesis 1: the number of featural changes will have an impact on the acquisition of vowel reduction; as a result, neutralizations involving one single node in the internal structure of the vowel will become stable before neutralizations involving more than one node.
In the previous section, we observed that, within stems, Portuguese children find positive evidence from both allophonic and allomorphic variation to capture information on the vowel system, namely, that the shapes of stressed and unstressed vowels are related and follow from distributional properties in the system. However, we can see that there is a contrast between final and non-final vowels: unlike non-final vowels, word-final vowels carry morphological content (as class markers) and never show up in a stressed position. The low acoustic prominence of these morphological word-final unstressed vowels allows us to formulate the second hypothesis:
Hypothesis 2: the absence of stress will delay the acquisition of word-final vowels relative to non-final ones.
The research reported in this paper will focus on the target unstressed vowels [ɐ] and [ɨ] that result from /a/ → [ɐ] and /ɛ, e/ → [ɨ] neutralizations, in both final and non-final word positions.
4. The data
In this paper, I will analyse a subset of data taken from a larger corpus consisting of 18,654 utterances produced by monolingual Portuguese children. The corpus contains longitudinal cross-sectional data from seven children aged 0;10 to 3;7. Six of them were videotaped monthly over the course of one year, and one (João) was videotaped over the course of two years. Each session took place in the child’s home, in the presence of the mother and the researcher, and lasted from thirty to sixty minutes. The database format used for the analysis of the children’s productions was the CHILDPHON Wordbase, an application of the 4th Dimension software for Macintosh, developed at the Max Planck Institute for Psycholinguistics in Nijmegen and first used in Fikkert (1994) and Levelt (1994). Since our goal was to investigate the children’s early sensitivity to the phonological processes affecting the vowel inventory, we used only data produced by the three youngest children (Inês, Marta and João). These children’s utterances were examined in order to account for their behaviour concerning the unstressed neutralized vowels [ɨ] and [ɐ]. Therefore, all non-verbal lexical targets containing one or both of these vowels were considered. The children’s ages are given below:
Children   Age
Inês       0;11.14-1;10.29
Marta      1;2.0-2;2.17
João       1;8.13-2;8.27
Table 1: Children’s ages.
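Ages in Table 1 and in the session labels below follow the years;months.days notation. As a hedged sketch for readers who want to order or compare sessions programmatically, the notation can be parsed as follows; the function names are hypothetical and the day component is treated as optional because some sessions are cited with months only (e.g. 1;2).

```python
import re
from typing import Tuple

# years;months with an optional .days component, e.g. "0;11.14" or "1;2"
AGE_PATTERN = re.compile(r"^(\d+);(\d+)(?:\.(\d+))?$")


def parse_age(age: str) -> Tuple[int, int, int]:
    """Parse an age label into a (years, months, days) tuple."""
    match = AGE_PATTERN.match(age.strip())
    if match is None:
        raise ValueError(f"not a years;months.days age label: {age!r}")
    years, months, days = match.group(1), match.group(2), match.group(3)
    return int(years), int(months), int(days or 0)


def age_in_months(age: str) -> int:
    """Age in completed months, ignoring the day component."""
    years, months, _days = parse_age(age)
    return years * 12 + months


if __name__ == "__main__":
    # Tuples compare element by element, so sessions sort chronologically.
    sessions = ["1;10.29", "0;11.14", "1;8.2"]
    print(sorted(sessions, key=parse_age))
```

Comparing the parsed tuples rather than the raw strings avoids the ordering error that string comparison would produce for labels such as 1;9.19 and 1;10.29.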
5. The results
In this section, we will describe the behaviour of the three Portuguese children observed in terms of neutralized [ɨ] and [ɐ], limiting our analysis exclusively to those instances which result from the processes of vowel reduction (/ɛ, e/ → [ɨ] and /a/ → [ɐ]). In Section 5.1, we will refer to the acquisition of non-final vowels, while in Section 5.2, the acquisition of word-final vowels will be described. In order to provide information on the phonological development of each child with regard to the issue under analysis, the data in this section will be presented separately for each child.
5.1 Non-final vowels
The examples in (6) exhibit different repair vowels for target [ɨ]; the examples in (7) refer to target lexical units containing [ɐ]:
(6)
/mnin/ → [miin] (Inês: 1;9.19) “girl”
/ptadu/ → [ptadu] (Inês: 1;8.19) “tight”
/mnin/ → [mnin] (Marta: 1;10.4) “girl”
/tli/ → [ti] (Marta: 2;0.26) “star-DIM.”
/kid/ → [kikid] (João: 2;2.28) “dear”
/tlfn/ → [tni] (João: 2;8.27) “phone”
(7)
/spatu/ → [ppatu] (Inês: 1;8.2) “shoe”
/pw/ → [papw] (Inês: 1;8.2) “hat”
/nl/ → [nl] (Inês: 1;10.29) “windows”
/knet/ → [klel] (Marta: 1;5.17) “pen”
/btat/ → [batat] (Marta: 1;7.18) “potatoes”
/ppaju/ → [kabajo]/[kbaju] (Marta: 1;8.18) “parrot”
/ki/ → [ti] (João: 1;11.13) “here”
/mkaku/ → [akaku]/[mkaku] (João: 2;3.19) “monkey”
The results reported in Tables 2 and 3 refer to Inês and the values are given in percentage of occurrences in different production forms for target reduced [ɨ] (Table 2) and for target reduced [ɐ] (Table 3) in word non-final position. For all Tables in this section, the sequences ‘∅[ɨ]’ and ‘∅[ɐ]’ stand for the deletion of the vowel; the sequence ‘∅σ’ represents the deletion of the syllable that contains the target vowel; and the leftmost column gives the child’s age at each session videotaped. The lack of information in one specific session means that no targets containing the structure under analysis were produced.
Inês      [ɨ] ∅[ɨ] ∅σ [i] [e] []
0;11.14   100
1;0.25    100
1;1.30
1;3.6     100
1;4.9     100
1;5.11    100
1;6.6     100
1;7.2     100
1;8.2     17 83
1;9.19    33 67
1;10.29   13 17 46 4 8 12
Table 2: Production of target neutralized [ɨ] (/ɛ, e/ → [ɨ]) in relative percentages.
Inês      [ɐ] ∅σ [a] [ɨ] [i] [e] [] [u] [o]
0;11.14
1;0.25
1;1.30    94 6
1;3.6     25 50 25
1;4.9     22 67 11
1;5.11    67 33
1;6.6     42 25 8 25
1;7.2     14 45 17 7 3 14
1;8.2     38 31 12 8 8 3
1;9.19    61 21 6 6 6
1;10.29   52 14 8 10 4 2 8 2
Table 3: Production of target neutralized [ɐ] (/a/ → [ɐ]) in relative percentages.
As we may observe by comparing Tables 2 and 3, Inês clearly shows that reduced [ɐ], a product of the target process /a/ → [ɐ], emerges before the reduced [ɨ] resulting from /ɛ, e/ → [ɨ] neutralization. In Table 2, the emergence of [ɨ] is registered at 1;8 and 1;10 (17% and 13%), whereas for target [ɐ], Table 3 shows that target [ɐ] starts to appear at 1;3 and that its frequency values range between 14% and 67%, with the two last sessions showing rates above 50% (see footnote 5). Moreover, syllables containing [ɨ] are preferably deleted until 1;9, while [ɐ] shows up at 1;3. With regard to the repair vowels Inês uses to deal with the two targets, the information for [ɨ] is scarce, though at 1;9, a clear preference for the coronal [i] is attested (see column [i] in Table 2). The dorsals [a] and [ɨ] are the preferred repair vowels to deal with target [ɐ] (see columns [a] and [ɨ] in Table 3). The infrequent use of labials and coronals is normally due to harmony processes in the child’s system, as we may see in the examples /nl/ → [sinl] “windows”, /ml/ → [l] “yellow” or /su/ → [u] “safe” (Inês: 1;10.29). Tables 4 and 5 provide percentage information on the occurrences in the data for Marta of production forms for target reduced [ɨ] (Table 4) and for target reduced [ɐ] (Table 5), in word non-final position.
Marta     [ɨ] ∅[ɨ] ∅σ [i] [e] [] [j] [] [u]
1;2       100
1;3.8     100
1;4.8
1;5.17    60 40
1;6.23    46 8 30 8 8
1;7.18    18 55 9 9 9
1;8.18    20 7 26 7 14 26
1;10.4    18 35 41 6
1;11.10   8 4 50 4 26 8
2;0.22    25 6 32 6 25 6
2;1.19    32 12 47 6 3
2;2.17    20 16 32 4 4 12 12
Table 4: Production of target neutralized [ɨ] (/ɛ, e/ → [ɨ]) in relative percentages.
Marta     [ɐ] ∅[ɐ] ∅σ [a] [] [i] [e] [] [u] [o] []
1;2       25 8 59 8
1;3.8     6 94
1;4.8     100
1;5.17    30 5 37 5 21 2
1;6.23    62 19 3 3 3 10
1;7.18    42 28 3 6 12 3 6
1;8.18    36 2 49 2 2 2 2 5
1;10.4    29 2 39 2 12 1 1 13 1
1;11.10   55 7 17 11 3 7
2;0.22    43 14 27 4 4 8
2;1.19    47 3 14 24 3 6 3
2;2.17    67 33
Table 5: Production of target neutralized [ɐ] (/a/ → [ɐ]) in relative percentages.
Although not as clearly shown as in Inês’s data, Marta also develops production repair strategies for target [ɐ] before she does so for target [ɨ], and syllable deletion is generally more frequent with [ɨ] than with [ɐ] (see column ∅σ in Tables 4 and 5). Considering the rate of success at 2;2 (20% for [ɨ] and 67% for [ɐ]), it is clear that [ɐ] develops before [ɨ]. Again, coronals and labials for target [ɐ] and labials for target [ɨ] emerge mainly in contexts of harmony processes in the child’s system, as shown by the examples /knet/ → [keket] “pen” (Marta: 1;6.23) or /lju/ → [ulju] “watch” (Marta: 1;8.18).
5 Following Fikkert (1994), I assume that 50% of productions matching the target structure cue the beginning of its acquisition.
For target [ɨ], the repair vowels are mainly:
(i) the dorsal [ɐ], matching the target in this specific feature (from 1;10 to 2;0, the values range between 41% and 25%);
(ii) the coronal vowels, matching the V-place of the target lexical vowels (at 1;5, the coronal [i] replaces [ɨ] in 40% of the cases).
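The two-way classification of repair vowels above (same place as the surface target versus same place as the underlying lexical vowel) can be sketched as a simple comparison of place classes. This is an illustrative assumption-laden sketch: the place assignments follow the Coronal/Dorsal/Labial grouping described in Section 2, and `PLACE` and `classify_repair` are hypothetical names.

```python
from typing import Iterable, Tuple

# Assumed place classes, following the grouping described in Section 2.
PLACE = {
    "i": "Coronal", "e": "Coronal", "ɛ": "Coronal",
    "ɨ": "Dorsal", "ɐ": "Dorsal", "a": "Dorsal",
    "u": "Labial", "o": "Labial", "ɔ": "Labial",
}


def classify_repair(repair: str, surface: str, underlying: Iterable[str]) -> Tuple[bool, bool]:
    """Return (matches_surface_place, matches_underlying_place) for a repair vowel."""
    place = PLACE[repair]
    matches_surface = place == PLACE[surface]
    matches_underlying = any(place == PLACE[v] for v in underlying)
    return matches_surface, matches_underlying


if __name__ == "__main__":
    # Target reduced [ɨ], derived from underlying /ɛ, e/.
    for repair in ["ɐ", "i", "u"]:
        print(repair, classify_repair(repair, "ɨ", ["ɛ", "e"]))
```

On this encoding a dorsal repair for target [ɨ] matches only the surface place, a coronal repair matches only the underlying place, and a labial repair matches neither, which is why labials are attributed to harmony in the text.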
The results reported in Tables 6 and 7 refer to the data for João that correspond to the percentage of occurrences of production forms for target reduced [ɨ] (Table 6) and target reduced [ɐ] (Table 7), in word non-final position.
João      [ɨ] ∅[ɨ] ∅σ [i] []
1;8.13    100
1;9.11    100
1;10.8
1;11.13
2;0.19
2;1.23    100
2;2.28    12 6 70 6 6
2;3.19    100
2;4.30    18 76 6
2;6.11    85 15
2;7.22    17 66 17
2;8.27    21 58 21
Table 6: Production of target neutralized [ɨ] (/ɛ, e/ → [ɨ]) in relative percentages.
João      [ɐ] ∅σ [a] [i] [e]
1;8.13
1;9.11    67 33
1;10.8    100
1;11.13   91 9
2;0.19    17 66 17
2;1.23    13 73 14
2;2.28    9 82 9
2;3.19    9 91
2;4.30    15 76 9
2;6.11    69 31
2;7.22    67 33
2;8.27    91 3 3 3
Table 7: Production of target neutralized [ɐ] (/a/ → [ɐ]) in relative percentages.
As is attested for Inês and Marta, João clearly masters [ɐ] before [ɨ]. The successful production of [ɨ] ranges between 12% and 21%, while, for [ɐ], João shows values above 69% from 2;6 on, and the last session corresponds to 91%. A poor system of repair vowels is exhibited by the child. For target [ɨ], only dorsals and coronals occur, matching neither the V-place of the output form nor that of the underlying vowel, with the child showing a preference for the coronal [i] in the last two sessions. With regard to the target vowel [ɐ], the dorsal [a] is the most frequent repair vowel selected by João.
5.2 Word-final vowels
In this section, I will describe the behaviour of the word-final vowels [ɨ] and [ɐ], which play the morphological role of class markers in EP. Examples in (8) refer to lexical targets with [ɨ], whereas examples in (9) correspond to productions of lexical targets with [ɐ]:
(8)
/d/ → [di] (Inês: 1;9.19) “big”
/km/ → [kmi] (Inês: 1;10) “lotion”
/tlfn/ → [tju] (Marta: 1;2) “phone”
/fm/ → [ufi] (Marta: 1;2) “hunger”
/avu/ → [afi] (João: 2;3) “tree”
/tlfn/ → [tni] (João: 2;8.27) “phone”
(9)
/tp/ → [pat]/[pata] (Inês: 1;8.2) “lid”
/mt/ → [mat]/[mata] (Inês: 1;8.2) “blanket”
/pisez/ → [tita] (Inês: 1;10.29) “princess”
/ttu/ → [tul] (Marta: 1;3.8) “turtle”
/pk/ → [pba] (Marta: 1;4.8) “dirty”
/aw/ → [awa] (Marta: 1;7.18) “water”
/aw/ → [aba] (João: 1;11.13) “water”
/t/ → [t]/[ta] (João: 2;4.30) “Horta”
The information in Tables 8 and 9 below refers to the percentage of occurrences of production forms for target reduced [ɨ] (Table 8) and target reduced [ɐ] (Table 9) in word-final position in the data for Inês. As attested for non-final vowels produced by the three children, syllable deletion is the strategy Inês initially selects to deal with the targets [ɨ] and [ɐ] (see columns ∅σ in Tables 8 and 9). If we compare the data in these two tables, we can observe that word-final [ɐ] is clearly mastered before word-final [ɨ]. In the last two sessions (at 1;9 and at 1;10) the values for word-final [ɐ] are 72% and 80%, respectively, while at the same ages, the corresponding values for word-final [ɨ] are 36% and 21%. Table 8 shows that, when repair vowels emerge, coronal [i] is the preferred one to replace the target word-final [ɨ], matching the V-place of the lexical vowels. In the case of the word-final [ɐ], although the use of repair vowels is not frequent, dorsal [a] is the preferred one. If we compare word-final vowels with their non-final counterparts (Tables 2 and 8), we see that Inês starts acquiring final and non-final [ɨ] simultaneously, at 1;8. On the other hand, non-final [ɐ] is episodically attested at 0;11 and then starts occurring systematically one session before the word-final [ɐ] (Tables 3 and 9).
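The 50% criterion adopted in footnote 5 can be sketched as a search for the first session in which target-like productions reach the threshold. The sketch below is illustrative only: the function name is hypothetical, the choice of "greater than or equal to" for the threshold is an assumption, and the two short series contain just the word-final percentages for Inês that are quoted in the preceding paragraph (sessions 1;9.19 and 1;10.29), not the full tables.

```python
from typing import List, Optional, Tuple

# Threshold taken from footnote 5; treating it as inclusive is an assumption.
ACQUISITION_THRESHOLD = 50.0


def onset_of_acquisition(
    sessions: List[Tuple[str, float]],
    threshold: float = ACQUISITION_THRESHOLD,
) -> Optional[str]:
    """Return the age label of the first session at or above the threshold.

    `sessions` must be in chronological order; None means the threshold
    is never reached in the sessions supplied.
    """
    for age, percent_target_like in sessions:
        if percent_target_like >= threshold:
            return age
    return None


# Only the values quoted in the prose above; earlier sessions are omitted.
final_reduced_a = [("1;9.19", 72.0), ("1;10.29", 80.0)]  # target word-final [ɐ]
final_reduced_e = [("1;9.19", 36.0), ("1;10.29", 21.0)]  # target word-final [ɨ]

if __name__ == "__main__":
    print(onset_of_acquisition(final_reduced_a))
    print(onset_of_acquisition(final_reduced_e))
```

Because only two sessions are supplied, the first result indicates that the criterion is met within this excerpt, not the age at which it was first met in the full data set.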
Inês 0;11.14 1;0.25 1;1.30 1;3.6 1;4.9 1;5.11 1.6.6 1;7.2 1;8.2 1;9.19 1;10.29
Inês 0;11.14 1;0.25 1;1.30 1;3.6 1;4.9 1;5.11 1.6.6 1;7.2 1;8.2 1;9.19 1;10.29
[]
∅[]
∅σ
[i]
[e]
[u]
[]
100 100 100 83
17
50 16 34 36 64 21 11 5 53 5 5 Table 8: Production of target word-final [] in relative percentages. [] 25
∅σ [a] [] [i] [e] 75 100 100 100 17 83 8 92 3 94 3 10 87 3 40 31 15 2 72 17 8 1 1 80 6 13 1 Table 9: Production of target word-final [] in relative percentages.
[u]
12 1
The results reported in Tables 10 and 11 provide the data for Marta corresponding to the percentage of occurrences of production forms for target reduced [ɨ] (Table 10) and target reduced [ɐ] (Table 11) in word-final position. These tables show that word-final [ɐ] is mastered before word-final [ɨ] in Marta’s production data. For word-final [ɐ], values above 50% occur from 1;4 on, whereas this rate only starts to appear for word-final [ɨ] at 1;10. Unlike [ɨ], [ɐ] is acquired early by the child (see the rates in columns [ɨ] and [ɐ], Tables 10 and 11). As is the case for Inês, coronal [i] is clearly Marta’s preferred repair vowel to deal with the target word-final [ɨ] (see the results for column [i] in Table 10), again matching the V-place of the lexical vowel in the target system. By comparing word-final vowels (Tables 10 and 11) with their non-final counterparts (Tables 4 and 5), we see that Marta starts acquiring final and non-final [ɨ] simultaneously, at 1;5. As for word-final [ɐ], it emerges before non-final [ɐ]. Word-final [ɐ] reaches rates above 50% from 1;4 on, while from 1;5 on, the values for its non-final counterpart range between 30% and 67%, although results above 50% occur in only three sessions. Tables 12 and 13 provide information on the occurrences of production forms for target word-final reduced [ɨ] (Table 12) and for target word-final reduced [ɐ] (Table 13) in the data for João.
Marta 1;2 1;3.8 1;4.8 1;5.17 1;6.23 1;7.18 1;8.18 1;10.4 1;11.10 2;0.22 2;1.19 2;.2.17
Marta 1;2 1;3.8 1;4.8 1;5.17 1;6.23 1;7.18 1;8.18 1;10.4 1;11.10 2;0.22 2;1.19 2;.2.17
João 1;8.13 1;9.11 1;10.8 1;11.13 2;0.19 2;1.23 2;2.28 2;3.19 2;4.30 2;6.11 2;7.22 2;8.27
João 1;8.13 1;9.11 1;10.8 1;11.13 2;0.19 2;1.23 2;2.28 2;3.19 2;4.30 2;6.11 2;7.22 2;8.27
[]
∅[] 66 25
[i]
∅σ
[e]
[a]
[]
[u] 34
75
10 70 10 21 5 48 26 20 68 12 33 11 11 45 73 20 7 80 10 7 3 73 9 9 9 63 21 11 5 64 36 Table 10: Production of target word-final [] in relative percentages.
10
[] ∅[] ∅σ [a] [] [i] [u] 41 13 33 13 24 38 21 14 3 69 6 15 10 82 4 2 10 2 83 7 5 5 84 6 6 3 91 3 3 3 81 8 3 5 3 91 2 2 5 87 7 2 4 78 4 2 2 15 86 2 10 2 Table 11: Production of target word-final [] in relative percentages. []
∅[]
[i]
∅σ 100
[e]
[]
[u]
100 14 71 22 26 43 9 11 4 11 41 26 26 43 14 29 14 14 13 31 56 33 42 8 63 37 7 18 71 4 Table 12: Production of target word-final [] in relative percentages. [] 80 45
∅[]
∅σ 50
[a]
[] 20 5
[i]
[e]
[]
43 50 5 47 43 48 7 7 7 7 22 85 13 88 8 4 84 1 10 1 2 1 88 6 4 2 88 5 5 2 96 2 2 Table 13: Production of target word-final [] in relative percentages.
[]
3
[]
14 8 17
[u]
2 10 3 1
The pattern observed in the production data from Inês and Marta is maintained in João’s data: like the two other children, João clearly masters word-final [ɐ] before word-final [ɨ]. Table 12 reveals his preference for coronal [i] as the most frequently selected repair vowel to replace target [ɨ], matching the V-place of the abstract vowel in the adult system. As for word-final [ɐ], the first six sessions (the ones where the structure is not stable; see the rates below 50% in Table 13) show that the preferred repair strategy is syllable deletion. By comparing Tables 6 and 12 with Tables 7 and 13, we observe that João acquires word-final [ɨ] and [ɐ] before their non-final counterparts. In the case of target [ɨ], syllable deletion (the strategy that cues the first stage of acquisition of unstressed vowels) is much more frequent in non-final than in word-final position. For target [ɐ], João starts exhibiting rates around 50% at 2;0 for word-final [ɐ], while the non-final vowel only reaches values above 50% from 2;6 on.
6. Discussion and conclusions
In this paper, we have reported ongoing research on the acquisition of the vowel inventory in EP, by focusing on the mastery of the two vowels [ɨ] and [ɐ] that result from the phonological processes of /ɛ, e/ → [ɨ] and /a/ → [ɐ] reduction in unstressed positions. As mentioned previously, the main goal of this research was to evaluate the impact of phonological complexity in the acquisition of vowel reduction, as associated with these two processes. Considering that /ɛ, e/ → [ɨ] involves changes under two nodes in the vocalic featural structure (Height and V-place), and that /a/ → [ɐ] involves only changes in the domain of one single node (Height), our purpose was to test the effect of these different levels of complexity in the phonological development of the Portuguese children observed. The first conclusion that we can extract from the data described in Section 5 is that syllable deletion is the repair strategy used at the initial stage of acquisition of Portuguese unstressed vowels. This confirms information in the literature on the initial deletion of syllables containing unstressed vowels, constrained by the lack of acoustic prominence of these segments as heard in the output (see Bernhardt & Stemberger 1998 for an overview). However, although these unstressed vowels are not acoustically prominent, one might assume that they are phonologically prominent in the target system since they are products of highly productive phonological processes that entail both allophony and allomorphy in EP, as shown in Section 2 in this paper. The high productivity of this vowel reduction process in EP led us to test the Portuguese children’s early sensitivity to this aspect of their target grammar.
The prediction that children may be sensitive at an early age to the phonological processes that affect the internal structure of segments in their target language has not yet been systematically investigated, and no consensus has been reached among the few scholars who refer to this subject. Some predict that children are not able to deal with allophony and allomorphy in the target system early on; others assume that children are able from an early age to build abstract lexical representations based on the information they pick up from the contrasts derived from allophonic and allomorphic variants in the target language (see, for differing perspectives, Bybee 2001; Peperkamp & Dupoux 2002; Hayes 2004; Fikkert & Freitas 2006). Already at the initial stage of syllable deletion, we observed an asymmetry between [ɨ] and [ɐ], the latter vowel emerging before the former. In general, Portuguese children start producing a variety of different vowels for the targets under analysis, providing empirical evidence to confirm the prediction that very young children are sensitive to the phonological processes affecting vowels in the adult grammar and that they may be building abstract lexical representations based on the allophonic/allomorphic variants present in the target. However, this general prediction has not yet been systematically explored in the field of phonological acquisition, and further research is therefore called for. More specifically, we observed that the amount of featural information involved has an impact on the acquisition of the vowel reduction process: for both word-final and word-internal vowels, the acquisition of /a/ → [ɐ] preceded the acquisition of /ɛ, e/ → [ɨ] in the children observed. In other words, and as expected, acquiring a process where only Height features change is easier than mastering a process involving changes under both the V-place and Height nodes (confirming our Hypothesis 1, namely that the number of featural changes will have an impact on the acquisition of vowel reduction). As for the nature of the repair vowels, we may observe that [i] is often preferred to replace target [ɨ]. This can be interpreted as the result of two processes.
In the first process, the children are reducing vowels but only the Height node is under analysis, for this is the most prominent feature change in the general process of vowel reduction in EP, given that it affects the three sets of vowels, as shown in (3) in Section 2 (/ɛ, e/ → [ɨ]; /a/ → [ɐ]; /ɔ, o/ → [u]). This is coherent with the order of acquisition mentioned above. In the second process, the children are picking up, from the allophonic and allomorphic variants, the coronal nature of the abstract underlying vowels /ɛ, e/ that surface in stressed position. This effect is even more surprising when we consider that word-final [ɨ] never contrasts with a stressed coronal vowel /e/, which is assumed to be the phonological shape for the word-final allophone [ɨ]. Our hypothesis is that the general acquisition of the /ɛ, e/ → [ɨ] process, along with the presence of non fully specified root nodes in the children’s lexical representations, may derive from the emergence of coronal vowels word-finally. As for target [ɐ], children preferably use dorsals, including [ɨ], especially in a non-final position. Again, this shows that the Height node is being acquired, although the degree of openness is not yet mastered, ranging from [a] to [ɨ]. The types of repair vowels exhibited by the children are therefore interpreted as a consequence of the presence of early abstract phonological representations in the child’s system. The absence of stress contrast for word-final vowels carrying morphological content could entail a delay in the acquisition of these target neutralized vowels relative to non-final ones. However, this does not appear to be the case. Therefore, our Hypothesis 2, namely that the absence of stress will delay the acquisition of word-final vowels relative to non-final ones, has not been confirmed by the data observed. The three children showed one of two patterns of emergence: (a) simultaneous emergence of the vowel in both structural positions; (b) emergence of the word-final vowel before its non-final counterpart. In both cases, the rate of acquisition was faster for word-final vowels than for non-final ones. This might be interpreted as the consequence of the presence of morphological content in the word-final vowel, which would promote phonological development. A similar pattern has been attested for the EP word-final Coda fricative that encodes plural marking, which is acquired before word-internal Coda fricatives, as reported in Freitas, Miguel & Faria (2001). Unlike syllable deletion, the vowel deletion strategy shows low rates of selection in the three children, both word-finally and word-internally. This tendency has already been noted by Freitas (1997, 2004) and in Frota, Vigário & Freitas (2003). The vowel deletion strategy is highly productive in the adult output forms and it entails a mismatch between abstract lexical representations of words and their output form. As stated in the research above, despite the mismatch caused by the productivity of vowel deletion in adults, Portuguese children are able to pick up information from the target grammar, which allows them to rebuild the lexical structure of words and therefore to produce vowels in contexts where adults delete them; only later in development do Portuguese children start using this deletion strategy in spontaneous speech. To conclude, the data explored in this paper show that the unstressed vowels observed are far from being stable from the beginning of speech production.
In previous research (Freitas 2004), we reported that, from the beginning of production, Portuguese children are able to distinguish the vowel [ɨ] that fills empty prosodic constituents from the vowel [ɨ] that follows from /ɛ, e/ neutralization in unstressed position. Moreover, we observed that Portuguese children also clearly distinguish two different phonological paradigms for neutralized [] in the target system (/e/ → []; /a/ → []), producing different repair vowels for each of the paradigms (Fikkert & Freitas 2006). In this paper we have shown that Portuguese children start acquiring the vowel reduction process early in phonological development. Moreover, Portuguese children reveal early sensitivity to the amount of featural information associated with the structures observed. The complexity of the target system thus seems to develop Portuguese children’s early sensitivity to the shape of vowels and to allophonic variation, showing that the richness of the system may promote the early mastery of phonological structures.
References
Andrade, Ernesto. 1977. Aspects de la phonologie (générative) du portugais. Lisbon: INIC.
Bernhardt, Barbara & Joseph Stemberger. 1998. Handbook of Phonological Development from the Perspective of Constraint-based Nonlinear Phonology. San Diego: Academic Press.
Bybee, Joan. 2001. Phonology and Language Use. Cambridge: Cambridge University Press.
Clements, George N. & Elizabeth Hume. 1995. “Internal organization of speech sounds”. The Handbook of Phonological Theory ed. by John A. Goldsmith. Oxford: Blackwell.
Fikkert, Paula. 1994. On the Acquisition of Prosodic Structure. Leiden: HIL.
---------- & Maria João Freitas. 1998. “Acquisition of syllable structure constraints: Evidence from Dutch and Portuguese”. Proceedings of the Generative Approaches to Language Acquisition 1997 Conference on Language Acquisition ed. by Antonella Sorace, Caroline Heycock & Richard Shillcock. 217-222. Edinburgh: University of Edinburgh.
---------- & Maria João Freitas. 2006. “Allophony and allomorphy cue phonological acquisition: Evidence from the European Portuguese vowel system”. Catalan Journal of Linguistics 5. 83-108.
Freitas, Maria João. 1997. Aquisição da estrutura silábica do português europeu. [Acquisition of the syllable structure in European Portuguese]. PhD diss., University of Lisbon.
----------. 2004. “The vowel [ɨ] in the acquisition of European Portuguese”. Proceedings of Generative Approaches to Language Acquisition 2003 ed. by Jacqueline van Kampen & Sergio Baauw. 163-174. Utrecht: LOT.
----------, Matilde Miguel & Isabel Faria. 2001. “Interaction between prosody and morphosyntax: Plurals within codas in the acquisition of European Portuguese”. Approaches to Bootstrapping: Phonological, Lexical, Syntactic and Neurophysiological Aspects of Early Language Acquisition ed. by Jürgen Weissenborn & Barbara Höhle. 45-58. Amsterdam: John Benjamins.
Frota, Sónia, Marina Vigário & Maria João Freitas. 2003. “From signal to grammar: Rhythm and the acquisition of syllable structure”. Proceedings of the 27th Annual Boston University Conference on Language Development ed. by Barbara Beachley, Amanda Brown & Frances Conlin. 809-821. Somerville: Cascadilla Press.
Hayes, Bruce. 2004. “Phonological acquisition in Optimality Theory”. Constraints in Phonological Acquisition ed. by René Kager, Joe Pater & Wim Zonneveld. 158-203. Cambridge: Cambridge University Press.
Levelt, Clara. 1994. On the Acquisition of Place. Leiden: HIL.
Macken, Marlys. 1995. “Phonological acquisition”. The Handbook of Phonological Theory ed. by John A. Goldsmith. 671-696. Oxford: Blackwell Publishers.
Mateus, Maria Helena. 1975. Aspectos da fonologia portuguesa. Lisbon: INIC.
---------- & Raquel Delgado-Martins. 2002 [1982]. “Contribuição para o estudo das vogais átonas [] e [u] no Português Europeu”. A Face Exposta da Língua Portuguesa ed. by Maria Helena Mateus. 137-152. Lisbon: Imprensa Nacional – Casa da Moeda.
---------- & Ernesto d’Andrade. 2000. The Phonology of Portuguese. Oxford: Oxford University Press.
Morais Barbosa, Jorge. 1983 [1968]. Études de phonologie portugaise. 2nd ed. Évora: Universidade de Évora.
Peperkamp, Sharon & Emmanuel Dupoux. 2002. “Coping with phonological variation in early lexical acquisition”. The Process of Phonological Acquisition ed. by Ingeborg Lasser. 359-385. Berlin: Peter Lang Verlag.
Vigário, Marina. 2003. The Prosodic Word in European Portuguese. Berlin: Mouton de Gruyter.
THE PERCEPTION OF LEXICAL STRESS PATTERNS BY SPANISH AND CATALAN INFANTS ∗
FERRAN PONS & LAURA BOSCH University of British Columbia & University of Barcelona
Abstract
Previous research with English-learning infants has shown that stress cues can have a powerful influence on early word segmentation. Early sensitivity to the predominant lexical stress pattern (trochaic) in the native language has been observed in English and German, two stress-timed languages (Jusczyk, Cutler & Redanz 1993; Höhle 2002). In this paper, we offer data from two syllable-timed languages: Catalan and Spanish. We report experiments aimed at studying infants’ preferential patterns and discrimination abilities for trochaic vs iambic word forms. Results indicate that neither six-month-old nor nine-month-old Catalan- and Spanish-learning infants show a preference for either stress pattern, although they are able to discriminate between them. It is argued that failure to observe a trochaic preference can be attributed to frequency factors of specific lexical stress patterns in these languages. Stress cues alone would not be sufficient for early lexical segmentation in this case.
1. Introduction
To acquire a language, infants must learn to identify and segment words from a speech stream. Identifying words is not a simple task for an infant, because speakers do not consistently produce silent pauses between words when speaking (e.g. Cole & Jakimik 1980). Most of the speech directed to infants consists of words strung together into sentences or sentence fragments (van de Weijer 1998). Even in situations in which mothers are explicitly encouraged to teach new words to their infants, only a minority of words (around 20%) are uttered in isolation (Woodward & Aslin 1990). An ability that infants may bring to bear on word segmentation is the statistical learning mechanism demonstrated by Saffran and colleagues (Aslin, Saffran & Newport 1998; Saffran, Aslin & Newport 1996). However, in an infant’s natural environment, the speech stream contains multiple redundant cues to word
∗ This research has been supported by grant SEJ2004-06429/PSIC from the Spanish Ministerio de Educación y Ciencia and the Human Frontier Science Program (HFSP) grant RGP68/2002. We thank Marta Ramon and Eva Águila for their assistance in conducting the experiments, Xavier Mayoral for technical support and the parents of the infants for their participation. We are grateful to the editors and an anonymous reviewer of an earlier version of the manuscript for their useful comments.
boundaries, including not only statistical information but also stress cues, phonotactic rules, and allophonic cues (Johnson & Jusczyk 2001; Jusczyk 1999; Jusczyk et al. 1993; Jusczyk, Hohne & Bauman 1999; Mattys & Jusczyk 2001; Mattys, Jusczyk, Luce & Morgan 1999; Thiessen & Saffran 2003). One of the acoustic cues that seem to have a powerful influence on word segmentation is syllable prominence or stress. Strong and weak syllables alternate in various ways in connected speech, forming specific rhythmic patterns that infants are able to perceive from an early age. Sensitivity to prosodic information and attention to the rhythmic properties of the input language have been observed very early in development and it has been shown that newborn infants are able to distinguish between languages from different rhythmic classes (Mehler & Christophe 1994; Mehler, Jusczyk, Lambertz, Halsted, Bertoncini & Amiel-Tison 1988; Moon, Cooper & Fifer 1993; Nazzi, Bertoncini & Mehler 1998). Significantly, for levels below sentence prosody, changes in the rhythmic patterns of alternating strong-weak (SW) syllables are already detected by infants between one and four months of age (Jusczyk & Thompson 1978). It has also been demonstrated that during the first year of life children progressively gain knowledge about the distribution of stress within words in the language. For example, at nine months of age, but not at six months of age, American infants show a trochaic bias, preferring to listen to lists of strong-weak disyllabic words (trochaic), as opposed to lists of weak-strong (WS) disyllabic words (iambic), a stress pattern atypical of English (Jusczyk et al. 1993). The predominant stress pattern of English disyllabic words is also found in other languages such as German and Dutch.
These are also considered stress-timed languages, in which stressed syllables, being more salient than unstressed ones, contribute to the languages’ specific rhythmic properties (see Ramus, Nespor & Mehler 1999, for a characterization of specific rhythm metrics that support the distinction between stress-timed and syllable-timed languages). A recent study has shown that six-month-old German infants listen significantly longer to trochaic than to iambic items when presented with two-syllable pseudo-words varying only in stress pattern but not in phonetic context (Höhle 2002). These data from German-learning and English-learning infants give evidence of the refinement of their sensitivity to distributional information present in their linguistic input. Moreover, it is important to be aware that infants not only show a preference for the most frequent word stress patterns in their speech input, but are also able to successfully exploit this information to locate possible word boundaries. The word segmentation literature offers positive evidence of infants’ use of stress cues to help them group syllables that form small units and segment speech. Morgan (1996) reported that six-month-old English-learning infants tended to perceive pairs of syllables as cohesive units only when they exhibited a trochaic (i.e. SW) rhythm. Approximately 90% of disyllabic content words in English speech follow a SW stress pattern (Cutler & Carter 1987). Given this high percentage, one could assume that strong syllables will mark the occurrence of a word boundary (i.e. the onset of a new word). Cutler and
Norris (1988) referred to this process as the Metrical Segmentation Strategy (MSS). Adult speakers of English have been shown to exploit this strategy to hypothesize word boundaries (McQueen, Norris & Cutler 1994; Norris, McQueen & Cutler 1995). Because newborns are sensitive to the rhythmic structure of their language (Mehler et al. 1988; Nazzi et al. 1998), a reasonable segmentation strategy for English-learning infants might be to posit the beginning of a new word at each stressed syllable. This hypothesis was explored in a study conducted by Jusczyk, Houston and Newsome (1999). The results revealed two significant findings. The first one was that infants at 7.5 months of age could only segment disyllabic words with a trochaic stress pattern (e.g. kingdom). Importantly, infants parsed both syllables of the word and were not just responding to the first stressed syllable (i.e. king). They could not segment WS words (e.g. guitar) from fluent speech. Furthermore, when a WS word was consistently followed by the same unstressed word (e.g. guitar followed by is), infants treated the stressed syllable and the following unstressed monosyllabic word as a unit (i.e. taris). However, although 7.5-month-old English-learning infants perceive stressed syllables as markers of word onsets, the distribution of syllables within the speech stream is also influential. For example, when two syllables consistently co-occur, they are perceived as two parts of a single unit. The second important finding was that it was not until 10.5 months that infants could segment WS words from fluent speech. It was therefore concluded that the older infants must have either used a different strategy altogether, or else complemented the MSS with another, more advanced strategy that allowed them to detect words that followed a weak-strong stress pattern.
These advanced strategies are thought to be based on phonotactic or allophonic information, which gradually becomes integrated and can be reliably used for word segmentation purposes (Jusczyk, Hohne & Bauman 1999; Mattys & Jusczyk 2001). The early availability of stress information to help find word boundaries has been further attested in studies using conflicting cues: in these situations, eight-month-old English-learning infants prioritize stress information over phonotactic or distributional cues for word segmentation (Johnson & Jusczyk 2001; Mattys et al. 1999; but see Thiessen & Saffran 2003, for an alternative account based on an artificial language study). Finally, the role of stress for word segmentation has received further support in a recent study by Curtin, Mintz and Christiansen (2005), also using a simple artificial language. They have shown that seven- and nine-month-old infants, after being familiarized with a continuous stream of CV syllables with no segmentation cues other than stress, can use an initial, strong syllable strategy to parse sequences in the familiarization stream. These results are taken as evidence that stress information does not need to co-occur with other segmental or suprasegmental information to promote word-form segmentation. Furthermore, the study also argues that stressed and unstressed syllables are represented differently and that infants encode stress information in their representation of potential word forms (Curtin et al. 2005).
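As an informal illustration of the Metrical Segmentation Strategy discussed above, the following Python sketch posits a word onset at every strong syllable. The function name, the (text, is_strong) representation of syllables and the toy inputs are hypothetical and are introduced here only for illustration; they do not reproduce the materials or the analysis of Cutler and Norris (1988) or of Jusczyk, Houston and Newsome (1999).

```python
from typing import List, Tuple

# A syllable is represented as (text, is_strong); this encoding is an
# assumption made for the sketch, not a notation used in the studies cited.
Syllable = Tuple[str, bool]


def metrical_segmentation(syllables: List[Syllable]) -> List[str]:
    """Group syllables into candidate words, opening a new word at each strong syllable."""
    words: List[List[str]] = []
    for text, is_strong in syllables:
        if is_strong or not words:
            # A strong syllable (or the very first syllable) opens a new unit.
            words.append([text])
        else:
            # A weak syllable is attached to the unit currently being built.
            words[-1].append(text)
    return ["".join(parts) for parts in words]


# Trochaic word: both syllables are kept together.
trochee = metrical_segmentation([("king", True), ("dom", False)])

# Iambic word followed by an unstressed word: the strong syllable is
# grouped with the following weak syllable, as in the "taris" example.
iamb_plus_weak = metrical_segmentation([("gui", False), ("tar", True), ("is", False)])
```

Under these assumptions a strategy based only on strong syllables keeps a trochee such as kingdom intact but missegments an iamb such as guitar, which is the pattern of errors described for the 7.5-month-olds.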
The segmentation studies reviewed above were conducted with English-learning infants. In cross-language research, it has also been shown that by eight to nine months of age, English and Dutch infants are more successful in segmenting targets with trochaic stress than iambic stress (Echols, Crowhurst & Childers 1997), and both Dutch-learning and English-learning nine-month-old infants can segment strong-initial Dutch words from fluent Dutch speech (Houston, Jusczyk, Kuijpers, Coolen & Cutler 2000). Dutch differs from English in many phonetic features, but shares with it rhythmic structure and the fact that most words begin with strong syllables (Rietveld & Koopmans-van Beinum 1987; Schreuder & Baayen 1994). Converging results in English and Dutch confirm the hypothesis that stress cues play an important role in helping the learner to determine boundaries between words, at least for stress-timed languages. Infants’ segmentation abilities have also been recently explored in both European and Canadian French, a syllable-timed language characterised by an iambic rhythm (Schane 1968). Published data are still scarce and rather controversial. For Canadian French, Polka, Sundara and Blue (2002) tested French-learning 7.5-month-old infants using the Head-Turn Preference Procedure (HPP) to measure their ability to segment disyllabic words after the infants had been familiarized with trochaic or iambic disyllables. The results demonstrated that infants could only segment disyllabic words with the predominant stress pattern of their native language, that is, WS words. However, studies of European French-learning infants have not yielded consistent results (Gout 2001; Iakimova, Nazzi, Sundara & Polka 2006; Nazzi, Iakimova, Bertoncini, Frédonie & Alcántara 2006). While some experiments suggest that monosyllabic items are the first units to be successfully segmented and disyllabic word segmentation is not reached until around sixteen months of age (Gout 2001; Nazzi et al.
2006), recent work reveals that eight-month-old French-learning infants are actually able to segment disyllabic words in certain test conditions. It has been suggested that dialectal differences might explain these controversial results (Iakimova et al. 2006). An interesting issue that stands out from this research on French-learning infants is that the syllable may play an important role as a unit of prosodic segmentation in that language (and possibly in other syllable-timed languages, such as Spanish or Catalan). This, in turn, suggests the need for more cross-linguistic research involving languages from different rhythmic classes, as a way to explore the possibility that variation in the development of infants’ segmentation abilities depends on the rhythmic properties of their native language. From the review of the word segmentation literature it can be concluded that the hypothesis that stress is a salient cue used by the infant to parse sequences from the continuous speech stream seems to be valid, especially in the case of stress-timed languages (Jusczyk, Houston & Newsome 1999). In addition, the results of this last study also reveal that after just a few months of exposure to the input language, infants are able to process distributional information regarding stress to the extent that the predominant stress pattern in
that language is identified and can be used as a cue to parse the speech input and find new word forms. But another conclusion that can be drawn from the literature is that differences in the emergence of segmentation abilities appear as soon as languages from different rhythmic classes are taken into consideration. This evidence stemming from cross-linguistic segmentation studies could also be applied to studies in which the sensitivity to native-language lexical stress patterns is explored by means of preferential attention tasks. Although English-learning and also Dutch-learning infants’ sensitivity to the predominant lexical stress patterns of native language words is well documented, a question still remains regarding the development of this sensitivity and the role of stress in word segmentation in other languages with different rhythmic properties and differences in lexical stress assignment. In this paper we take a first step toward addressing this question by studying preference for lexical stress patterns in infants raised in environments where Romance languages, in this case Catalan and Spanish, are spoken. We begin by reviewing the main findings regarding discrimination and preference for word stress patterns that have been obtained for English and other languages with a similar rhythmic structure. Subsequently, we characterize the cases of Catalan and Spanish in terms of lexical stress, and finally, we report experimental data for six- and nine-month-old infants tested for preference and discrimination with different lexical stress patterns. Results from our experiments suggest that infants may process lexical stress information differently in the languages under study.
1.1 Infants’ sensitivities to lexical stress patterns: Previous research
As mentioned above, a few studies have explored infants’ sensitivity to lexical stress in words. In contrast with the extensive word segmentation literature, research focusing on infants’ differentiation and preference for the predominant lexical stress pattern of their native language is quite limited. However, all results of research on this issue converge in showing that sensitivities to distributional information regarding stress patterns are in place during the second half of the first year of life and ready to be subsequently exploited to identify words in the speech input. This position can be held at least for infants exposed to stress-timed languages. The capacity to discriminate stress patterns was first observed in a study by Jusczyk and Thompson (1978) using the High-Amplitude Sucking (HAS) technique. They found that infants as young as one month could discriminate two-syllable utterances that differed in stress contour only (strong-weak vs weak-strong). This early capacity is certainly related to young infants’ sensitivity to the acoustic correlates of accentuation that can also be observed in newborns (Sansavini 1997). After six months of age, it becomes more relevant to analyze whether these initial sensitivities are applied to detect characteristic stress patterns in the language of exposure and further help segment the speech stream into word-form units. This is the direction of the research that was carried out a decade later by Jusczyk and collaborators
(1993). They found with their Head-Turn Preference Procedure that nine-month-old infants listened significantly longer to disyllabic words bearing a strong-weak pattern (e.g. butter and ardour) than to ones bearing a weak-strong pattern (e.g. between and arouse). Results held even for low-pass filtered speech, when segmental detail was suppressed. The material in this experiment included two types of lists (corresponding to the strong-weak and weak-strong stress patterns) containing twelve disyllabic English words each. Lists were carefully matched according to the phonetic characteristics of the syllables and frequency of the words. Stress pattern differences were carried by the strong syllables, which exhibited the expected differences in duration, intensity and pitch, and not the unstressed ones. We refer to details of this study because this work not only provided the basis for subsequent research for both English and German, but also, more relevant here, it is a crucial reference for the research involving Spanish and Catalan reported in this paper. Turk, Jusczyk and Gerken (1995) extended the findings reported by Jusczyk et al. (1993). They examined the role of syllable weight in infants’ preference for the strong-weak pattern observed in the previous study. Using the same procedure (HPP), but using nonsense words instead of real ones, they obtained the same results (infants’ preference for strong-weak). It was also observed that syllable weight was not a necessary component for the strong-weak preference reported previously. Finally, using the same procedure, Höhle (2002) found that six-month-old German infants listened significantly longer to trochaic than to iambic items when presented with two-syllable pseudo-words varying only in stress pattern. Similar studies for syllable-timed languages are nonexistent. Work on French regarding the trochaic bias (English-learning infants’ preference for trochees) has focused on production, i.e.
babbling (Vihman, DePaolis & Davis 1998), which is only indirectly related to discrimination and preference for lexical stress patterns in perception. More recent work, in which English and French infant data are analyzed, focuses on familiar word recognition and the integration of stress and segmental information in the lexical representation (Vihman, Nakai, DePaolis & Hallé 2004). Although it does not directly analyze infants’ preferences for lexical stress patterns, it shows that stress is encoded in early lexical representations and that English and French eleven-month-old infants are differently affected by changes to consonantal onset in accented syllables in a word recognition task. This can be taken as indirect evidence that knowledge of the specific word stress patterns of the native language has already been assimilated at that age. It also suggests that cross-linguistic research is crucial to identify differences in the building of this type of knowledge. Thus, infants’ sensitivity to frequent or predominant lexical stress patterns remains to be explored for syllable-timed languages, in order that data so far restricted to stress-timed languages can be considered from a broader perspective. This is the aim of the research reported below on infants growing up in Spanish- and Catalan-speaking families. These two Romance languages
offer, in addition, the possibility of exploring the development of infants’ preferences for lexical stress patterns in languages which allow stress position in the word to vary (unlike French, where stress invariably falls on the same syllable, at the right edge of the word).
1.2 Stress pattern in Spanish and Catalan
Though Spanish is a Romance language that can be considered to have a trochaic rhythm like English, it differs in the location of lexical stress. While in English stress falls primarily on the initial syllable of a word, Spanish is one of the languages in which stress does not always coincide with one of the word boundaries. Polysyllabic words in Spanish have one syllable marked for primary stress and this is usually the second-to-last syllable (stress medial pattern), although stress in other positions is also found (Navarro Tomás 1965). About 70% of trisyllabic words have stress on the medial syllable, according to computations done on the LEXESP database (Sebastián-Gallés, Martí, Carreiras & Cuetos 2000). For disyllabic words, the most frequent word structure in Spanish (half of the words infants hear are disyllabic forms, according to Prieto 2006), trochees are more often found than iambs, but only about 65% of these words begin with a strong syllable (Alcina & Blecua 1975; Guerra 1983; Quilis 1981). Corpus studies based on speech addressed to children, which report the frequency with which certain words actually occur in the input, reveal a similar trend (Roark & Demuth 2000; Saceda-Ulloa 2005). However, the actual percentage of trochees reported in the two studies differs significantly (Saceda-Ulloa 2005 finds a much higher presence of trochees in her data), probably due to the specific characteristics of the respective corpora and differences between the Spanish dialectal varieties involved. Nevertheless, the SW pattern is clearly predominant for two-syllable words in Spanish, albeit less so than what has been found for disyllabic content words in English.
Another relevant difference between Spanish and English words is the massive presence in the latter of stressed, monosyllabic items, with a binary foot structure (around 80% in English, according to Roark & Demuth 2000, and only 26% in Spanish, according to Prieto 2006). Lexical stress is also variable in Catalan, as in Spanish, and although Catalan has more monosyllabic and disyllabic iambic words than Spanish due to the productive historical loss of word-final masculine vowel markers, it can be considered a trochaic language (Prieto 2006). Around 27% of content words are monosyllabic and only 19% are tri- or polysyllabic words (Guasti & Gavarró 2003). The presence of stressed monosyllabic items is even higher (35%), based on calculations carried out on child-directed speech (Prieto 2006). However, it is far below the high percentage observed for English. Catalan is similar to Spanish in terms of the predominance of disyllabic trochees, though the percentage of trochees has been reported to be somewhat smaller in Catalan. Different authors have obtained slightly different SW:WS ratios for disyllabic content words in Catalan, ranging from 59%:41% (Guasti, de Lange, Gavarró & Caprin 2004), to 63%:37% (Prieto 2006) and 66%:34% (Cabré 1993), the
differences resulting mainly from whether the computations were based on child-directed or adult-directed speech and from the specific characteristics of the target samples. Taken together, the computations performed on Spanish and Catalan material reveal that a trochaic stress pattern is frequent, but when compared to English data, the SW pattern is far less predominant (around 90% of disyllabic content words in English speech follow a SW stress pattern, according to Cutler and Carter (1987), and around 90% of all the words English-learning infants hear begin with a strong syllable, as shown by Roark & Demuth 2000).
2. Preference studies: Spanish and Catalan infant data
The lexical stress cue seems to have a strong impact on word segmentation in English, Dutch (Curtin et al. 2005; Echols et al. 1997; Jusczyk, Houston & Newsome 1999) and French (Polka et al. 2002). As already reviewed, between six and nine months of age, infants gain sensitivity to the sound organization of their native language and rapidly increase their knowledge of the predominant stress patterns of native language words and their phonotactic constraints. The knowledge of the typical prosodic shape of individual words (word initial stress, as in English and other Germanic languages) can then be used as a segmentation strategy, and this strategy will have a high probability of success in these languages, where stress placement is rather regular. However, for languages such as Catalan or Spanish, in which stress placement is more variable, this cue is less informative and may not be reliable for word segmentation purposes. As a first step towards exploring the reliability of stress cues to word boundaries in early segmentation by Spanish- and Catalan-learning infants, the analysis of the pattern of preference for SW vs WS disyllabic word forms was undertaken. If a preferential response could be observed even in these languages, where SW words are frequent but statistically less predominant than in English, then the metrical segmentation strategy would subsequently be explored in this population. If no preference were observed, then further research should be focused on the identification of the specific cues that can be more reliably used in early word segmentation in these Romance languages. In the present experiment, in order to explore the infants’ preference for iambic or trochaic lists of words, a slightly modified version of the HPP was used (Sebastián-Gallés & Bosch 2002).
The Head-Turn Procedure has been extensively used in infant speech perception research during the past twenty years and can be considered a viable tool to study word recognition, memory and categorization of different kinds of speech materials (Kemler Nelson, Jusczyk, Mandel, Myers, Turk & Gerken 1995). The procedure has previously been used with English-learning infants to show preference for words with a strong-weak stress pattern (Jusczyk et al. 1993). In the current version of the procedure used in our laboratory, the infants were seated on their parent’s lap, while the parent listened to music through insulated headphones. There were
three computer monitors in the testing room. The central monitor was located directly in front of the infant at a distance of 90 cm. The two lateral monitors were above loudspeakers at 35° to the left and right of the infant and at a distance of 70 cm. A video camera was placed above the central monitor behind a dark curtain. The lens of the video camera protruded from a small hole in the curtain. A trial started with a colourful animated image appearing on the central monitor. This image would disappear as soon as the infant looked in that direction and a different image would appear on one of the lateral monitors. When the infant’s gaze was directed to the lateral screen (a head turn of at least 30°), the auditory material was played. The image was presented on the screen until completion of the trial, or until the infant ceased to look in that direction for more than two consecutive seconds. The researcher was in the adjacent room recording on-line the infant’s looking time by pressing and releasing a key on the computer keyboard. He was completely unaware of the type of stimuli that were being played. The behaviour of the infant was video-recorded for later reliability checking. Four training trials were initially conducted to acquaint the infant with the procedure. These were immediately followed by twelve test trials in which different trochaic and iambic materials (six lists of each type) were assigned to the two sides in random order in such a way that there were no more than two consecutive trials on the same side and no more than two lists of the same type in a row. Duration of attention to the two different types of lists (trochaic vs iambic) was measured. Stimuli for this experiment were nonsense CVCV words, created using three vowels shared by Spanish and Catalan (a, i, and u), and common consonants also found in both languages (p, t, k, b, d, m, n, l).
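The ordering constraints on the twelve test trials described above can be sketched as a simple rejection-sampling routine in Python. This is only a possible implementation, not the software used in the laboratory: the function names are hypothetical, and the equal split of trials between the left and right sides is an assumption, since the text only states that runs on the same side are limited to two.

```python
import random


def max_run(seq):
    """Length of the longest run of identical consecutive items in a non-empty sequence."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def make_test_trials(rng, n_per_type=6, max_repeat=2):
    """Return a list of (side, list_type) pairs that satisfies both run constraints."""
    types = ["trochaic"] * n_per_type + ["iambic"] * n_per_type
    # Assumption: sides are balanced (six left, six right).
    sides = ["left", "right"] * n_per_type
    while True:
        rng.shuffle(types)
        rng.shuffle(sides)
        # Keep the first random order in which neither the side nor the
        # list type repeats more than max_repeat times in a row.
        if max_run(types) <= max_repeat and max_run(sides) <= max_repeat:
            return list(zip(sides, types))


trials = make_test_trials(random.Random(2007))
for side, list_type in trials:
    print(side, list_type)
```

Rejection sampling is used here because it keeps every admissible order equally likely; with twelve trials an acceptable order is typically found after a small number of shuffles.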
Two series of experiments were run, with six- and nine-month-old infants respectively, covering the same age levels as in previous research with English-learning infants. The aim of the first experiment was to explore and compare the preference for the stress pattern in six-month-old Catalan and Spanish infants from monolingual families. A total of sixteen Catalan and sixteen Spanish infants participated in this study. Previous studies have indicated that stress pattern preferences arise during the second half of the first year of life from exposure to the ambient language, possibly after some words have already been segmented by means of statistical learning mechanisms applied to the continuous speech signal (Jusczyk et al. 1993; Morgan & Saffran 1995). Our results for the younger age group, as expected, demonstrated that six-month-old Spanish- and Catalan-learning infants do not listen significantly longer to word lists composed of items that follow a specific stress pattern (see Figure 1). Mean times spent looking at the lists were analyzed in a two (Group: Spanish vs Catalan) x two (List: trochaic vs iambic) mixed analysis of variance (ANOVA). There was no effect for List or Group (both F<1), and no significant interaction, F(1, 30) = 1.389, p = 0.248. This result is consistent with previous work reported by Jusczyk et al. (1993), in which no stress pattern preference was found in six-month-old English-learning infants.
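For readers less familiar with the two x two mixed design, the following sketch shows one way the three F ratios can be obtained from per-infant looking times. It relies on the fact that, with a two-level within-subject factor, the List and Group x List tests reduce to tests on each infant's trochaic-minus-iambic difference, and the Group test to a test on each infant's total. The function names and the looking times are invented for illustration; they are not the data of this study, and equal group sizes are assumed.

```python
from statistics import mean


def one_way_f(scores):
    """F ratio on (1, N - 2) df comparing two groups on one score per infant."""
    pooled = [x for values in scores.values() for x in values]
    grand = mean(pooled)
    ss_effect = sum(len(v) * (mean(v) - grand) ** 2 for v in scores.values())
    ss_error = sum((x - mean(v)) ** 2 for v in scores.values() for x in v)
    return ss_effect / (ss_error / (len(pooled) - 2))


def mixed_anova_2x2(data):
    """F ratios for a 2 (Group, between) x 2 (List, within) design.

    `data` maps each group name to a list of (trochaic, iambic) looking
    times, one pair per infant. Equal group sizes are assumed.
    """
    sizes = {len(pairs) for pairs in data.values()}
    assert len(data) == 2 and len(sizes) == 1, "two equal-sized groups expected"
    sums = {g: [t + i for t, i in pairs] for g, pairs in data.items()}
    diffs = {g: [t - i for t, i in pairs] for g, pairs in data.items()}

    all_diffs = [d for values in diffs.values() for d in values]
    n_total = len(all_diffs)
    # Pooled within-group variance of the difference scores (error term).
    ms_error = sum(
        (d - mean(v)) ** 2 for v in diffs.values() for d in v
    ) / (n_total - 2)

    return {
        "Group": one_way_f(sums),
        "List": n_total * mean(all_diffs) ** 2 / ms_error,
        "Group x List": one_way_f(diffs),
    }


# Invented looking times in seconds (trochaic, iambic); NOT the study's data.
looking_times = {
    "Catalan": [(10, 8), (12, 12), (9, 7), (11, 11)],
    "Spanish": [(8, 10), (12, 12), (7, 9), (11, 11)],
}

if __name__ == "__main__":
    for effect, f_ratio in mixed_anova_2x2(looking_times).items():
        print(effect, round(f_ratio, 3))
```

Each ratio has (1, N - 2) degrees of freedom, which gives the (1, 30) reported in the text for thirty-two infants. The invented numbers were chosen so that the result can be checked by hand: both main effects are zero and the interaction F is 6.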
As already discussed, previous studies have suggested that stress preference emerges around nine months of age. For this reason, a second series of experiments was run on an older sample of sixteen Catalan-learning and sixteen Spanish-learning infants of around nine months of age. The results showed again that neither of the language subgroups preferred either one of the two stimulus lists. As in the previous experiment, no effect for List or Group was found (both F<1), and no significant interaction was observed, F(1, 30) = 1.014, p = 0.322. Mean looking times for both lists (trochaic and iambic) are presented in Figure 1. Statistical analysis confirmed the absence of significant differences between the different groups in this study. Our results from nine-month-old Catalan-learning and Spanish-learning infants thus failed to replicate the pattern of preference reported by Jusczyk et al. (1993), who found a preference for the trochaic pattern stimuli in nine-month-old English-learning infants. These results suggest that infants exposed to these Romance languages, which, unlike English, do not have a strong-initial lexical stress pattern and which show a less predominant SW pattern in disyllabic words (65% vs 90% in English), may not develop an early affinity for this metrical property. Presumably, the specific characteristics of the native input would therefore be responsible for the emergence of this pattern of preference, as a consequence of the statistical predominance of certain suprasegmental cues in the ambient language.
[Bar chart. y-axis: mean looking times (sec); bars: trochaic vs iambic lists; groups: Catalan and Spanish infants at six and nine months.]
Figure 1: Attention time to trochaic and iambic CVCV stimuli for six-month-old and nine-month-old infants from Catalan and Spanish monolingual families.
However, because no satisfactory explanation can be derived from a null result, additional experiments were designed to gain a deeper understanding of the role of stress in word perception. First, specific analyses were undertaken to assess the acoustic nature of the stress markers in the test stimuli. Second, an
additional experiment with nine-month-old infants was designed to evaluate their capacity to discriminate between the two rhythmic patterns at issue.
2.1 Acoustic measures of stress pattern in the test material
The stimuli analyzed had been produced by an adult female speaker. She was instructed to read two-syllable CVCV nonsense words (half of them trochaic and the other half iambic), using a ‘motherese’ or Infant-Directed Speech (IDS) intonation, which is commonly preferred by young infants (Cooper & Aslin 1990; Pegg, Werker & McLeod 1992). IDS has been described as slower, simpler, more clearly articulated, and with higher and wider intonation contours than adult-directed speech (Fernald & Kuhl 1987; Fernald, Taeschner, Dunn, Papousek, de Boysson-Bardies & Fukui 1989), and it has been shown to facilitate different aspects of language acquisition (Fernald & Mazzie 1991; Hirsh-Pasek, Kemler Nelson, Jusczyk, Wright-Cassidy, Druss & Kennedy 1987; Kemler Nelson, Hirsh-Pasek, Jusczyk & Wright-Cassidy 1989). For example, the exaggerated acoustic properties observed in IDS have been considered relevant in infants’ acquisition of phonemic categories (Andruski & Kuhl 1996; Kuhl, Andruski, Chistovich, Chistovich, Kozhevnikova, Ryskina, Stolyarova, Sundberg & Lacerda 1997). Acoustic analyses were performed using Praat 4.2 (Boersma & Weenink 2004). Each nonsense word and its vowels were labelled. Using the Praat scripting language, three routines were written to extract and record the duration (in ms) of each labelled segment, the values of the pitch (mean pitch peak) and the intensity of the labelled vowels (duration, pitch and amplitude being the main acoustic correlates of stress). The analyses revealed that the strong syllables had longer duration, higher pitch and higher intensity than the weak ones (see Figure 2). Results from these acoustic analyses thus confirm that the stimuli used in the preference experiments carried unambiguous stress information that distinguished the trochaic from the iambic targets.
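The comparison of strong and weak vowels described above can be sketched as follows. This is not the Praat script used for the analyses; it only illustrates how measurements exported from labelled vowels might be summarised. The class and function names are hypothetical and the four measurements are invented placeholders, not values from the test stimuli.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class VowelMeasure:
    strong: bool         # True if the vowel belongs to the stressed syllable
    start_s: float       # onset of the labelled vowel, in seconds
    end_s: float         # offset of the labelled vowel, in seconds
    pitch_hz: float      # pitch value extracted for the vowel
    intensity_db: float  # intensity value extracted for the vowel

    @property
    def duration_ms(self) -> float:
        return (self.end_s - self.start_s) * 1000.0


def summarise(vowels):
    """Mean of each stress correlate, split by strong vs weak vowels."""
    summary = {}
    for label, flag in (("strong", True), ("weak", False)):
        subset = [v for v in vowels if v.strong == flag]
        summary[label] = {
            "duration_ms": mean(v.duration_ms for v in subset),
            "pitch_hz": mean(v.pitch_hz for v in subset),
            "intensity_db": mean(v.intensity_db for v in subset),
        }
    return summary


# Invented measurements for one trochaic and one iambic token (not real data).
tokens = [
    VowelMeasure(True, 0.05, 0.25, 310.0, 74.0),
    VowelMeasure(False, 0.35, 0.45, 240.0, 68.0),
    VowelMeasure(False, 0.05, 0.14, 235.0, 67.0),
    VowelMeasure(True, 0.24, 0.46, 320.0, 75.0),
]

if __name__ == "__main__":
    result = summarise(tokens)
    for correlate in ("duration_ms", "pitch_hz", "intensity_db"):
        print(correlate, result["strong"][correlate], result["weak"][correlate])
```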
However, the question still remains as to whether the lack of preference shown by Catalan and Spanish nine-month-old infants could be the consequence of their inability to discriminate between these two stress patterns.
3. Stress discrimination studies: Spanish and Catalan infant data
Although there are many studies supporting the claim that stress patterns can be discriminated from birth using disyllabic ([mama]) or trisyllabic ([takala]) words (Sansavini 1997) or very early in life (Johnson & Jusczyk 2001; Jusczyk 1999; Jusczyk et al. 1993; Jusczyk, Houston & Newsome 1999; Mattys et al. 1999; Thiessen & Saffran 2003), recent data suggest that it may not be until five months of age that infants can perceive a stress pattern change (Weber, Hahne, Friedrich & Friederici 2004). In this latter study, Weber and colleagues did not find any electrophysiological evidence for the detection of stress pattern changes in German infants before five months of age. We could then hypothesize that our Catalan- and Spanish-learning infants have even
more difficulty in distinguishing the iambic and trochaic stress patterns, as SW is not as predominant in Spanish and Catalan as it is in German. To assess this possibility we ran an additional study exploring Spanish and Catalan infants’ capacity to discriminate between SW and WS nonsense words at nine months of age.
Figure 2: Acoustic correlates of stress in the test stimuli used in the infant experiments.
The methodological approach used in this third experimental series was a modified version of the familiarization-preference procedure (Jusczyk & Aslin 1995), previously used by Bosch and Sebastián-Gallés (2001, 2003). This methodology combines features of the Head-Turn Preference Procedure and features characteristic of the word monitoring and auditory priming paradigms used in research with adults. The paradigm uses an extended familiarization phase, with presentation of stimuli contingent on infants’ looking behavior and a test phase in which listening times to different stimuli lists are monitored. Differential responses to novel lists (based on the duration of infants’ visual fixation on an image on a screen) are expected if discrimination is easily achieved (this is known as the ‘novelty effect’). Sixteen infants either from Catalan (eleven) or Spanish (five) language environments were tested on the discrimination task. They were familiarized to lists of either trochaic or iambic disyllabic words until they accumulated two minutes of attention time. After that, the test phase began, in which attention time was monitored during same and switch trials (i.e. trials presenting lists with the same stress pattern as in the familiarization phase, and trials presenting lists with the alternative pattern, respectively). There were eight test trials (four same and four switch) and mean attention time was computed for each type of list. Results from this group of nine-month-old infants from monolingual families showed that infants were able to discriminate the two
stress patterns. Mean looking times for same and switch trials in the test phase were submitted to a two (Familiarization: trochaic vs iambic) x two (Test: same vs switch) mixed analysis of variance (ANOVA). A significant main effect of Test was revealed, with infants looking longer during switch trials than during same trials, F(1, 14) = 5.943, p = 0.03 (M switch = 15.46 s, M same = 10.78 s). There was no effect for Familiarization, F(1, 14) = 1.143, p = 0.30, and no significant interaction, F(1, 14) = 0.038, p = 0.85.
[Bar chart. y-axis: mean looking times (sec); bars: same vs switch trials.]
Figure 3: Attention time during same and switch trials in the discrimination experiment for SW vs WS stress patterns in CVCV stimuli conducted on nine-month-old infants from Spanish and Catalan monolingual families.
As can be observed in Figure 3, infants looked significantly longer during switch trials (in which non-familiarized stimuli were presented) than during same trials. A closer look at individual results revealed that four Spanish-learning infants and eight Catalan-learning infants showed a novelty reaction during the switch trials, thus indicating that they were able to differentiate between the two patterns, independently of the language of exposure. Ongoing research in our laboratory is currently exploring infants’ discrimination of stress patterns in a bigger sample of Spanish monolingual infants in order to confirm and replicate data observed in this smaller sample (Pons & Bosch in preparation). These results indicate that at around nine months of age, Catalan and Spanish infants are able to discriminate different stress-patterned tokens. This discrimination capacity, which may even be present at an earlier age, does not seem to have been modified by language exposure. This result fits with previous findings that have revealed infants’ ability to discriminate stress patterns at birth and at two months of age (Sansavini 1997; Jusczyk & Thompson 1978), and that at nine months infants retain this capacity (Johnson & Jusczyk 2001; Jusczyk 1999; Jusczyk et al. 1993; Jusczyk, Houston & Newsome 1999; Mattys et al. 1999; Thiessen & Saffran 2003).
4. Final comments
The goal of this paper was to thoroughly review previous research on infants’ sensitivity to lexical stress patterns in their native language and to present experimental data on infants’ preferences for prosodic patterns in languages that differ from English in stress placement at the lexical level. More specifically, we wanted to explore how differences in the statistical predominance of a certain stress pattern (SW) may determine differences in infants’ spontaneous preferential listening to SW vs WS lists of stimuli, at two different age levels. While no preference was expected in the younger groups of infants (six-month-olds), a trochaic preference could have been found by nine months of age, in spite of the weaker predominance of this stress pattern in Catalan and Spanish compared to English. However, results indicated that neither six-month-old nor nine-month-old Catalan and Spanish infants showed a preference for the trochaic pattern, in spite of being able to distinguish between these two prosodic patterns, at least at nine months of age. Taken together, these results suggest that infants exposed to languages with a less predominant lexical stress pattern may not develop a clear sensitivity for this metrical property and, consequently, this type of cue may not be useful for word segmentation purposes. However, it can also be argued that the absence of a preference for either stress pattern does not necessarily mean that these infants never use stress cues for word segmentation. It can be hypothesized that Catalan- and Spanish-learning infants may develop a different strategy from that of English-learning infants, as they cannot initially rely solely on stress to successfully pick out words from the speech stream. Perhaps Spanish and Catalan infants manage to link other types of distributional information (e.g. phonotactics) to the stress cues at an earlier age, and in this way they can successfully exploit these prosodic cues in segmentation.
By 10.5 months of age, English-learning infants seem to know that they cannot solely rely on stress being a sufficient cue for segmentation and other cues are then integrated which allow them to better succeed in the task. Spanish and Catalan infants might follow a different process and possibly acquire this knowledge slightly earlier. Further research is needed to clarify this issue. Prior to the current study, there was ample information from languages such as English and Dutch indicating that stress is by itself a sufficient cue that is used by the infant to parse sequences from the continuous stream of speech (e.g. Jusczyk, Houston & Newsome 1999). These studies also supported the idea that after around seven months of exposure to language the predominant pattern has been identified and can be used as a cue to parse novel utterances. Cross-linguistic research, when languages with less predominant prosodic patterns or more variable stress assignment are brought into consideration, may suggest important differences in the relevance and role played by certain sublexical cues to word boundaries. The absence of a pattern of preference for trochees in Spanish- and Catalan-learning infants casts some doubts on the usefulness of this specific prosodic cue to help early word segmentation of
fluent speech, unless it is tied to other sub-lexical cues that more closely reflect the specific suprasegmental properties of the native language. For instance, syllable weight is a relevant factor for stress pattern in Catalan and Spanish. Therefore it can be hypothesized that a structural modification of the stressed syllable of disyllabic words, such as its conversion into a heavy one by the addition of a phonotactically legal consonant coda, may be informative enough to enable Catalan- and Spanish-learning infants to show a pattern of preference. Although it has been shown that syllable weight is not a necessary component of the strong-weak preference in English-learning infants (Turk, Jusczyk & Gerken 1995), it would be interesting to know whether the same is true for a Catalan or Spanish language context. If syllable weight is a relevant factor in lexical stress assignment in these Romance languages, then nine-month-old infants can be expected to show a preference for material in which stress is congruent with syllable weighting. This hypothesis is currently being tested in our laboratory with a group of nine-month-old infants from Spanish-speaking families (phonotactic constraints limit the possibility of using the same material for both Spanish and Catalan infants here). By presenting infants with CV.CVC stimuli, with either trochaic or iambic stress (the latter being the most frequent pattern in Spanish, since around 85% of content words with this syllabic structure are stress-final), we are exploring their preferences for SW vs WS words, using the same preferential procedure as in the previous experiments (Pons & Bosch in preparation). Preliminary results reveal that Spanish-learning nine-month-olds listen significantly longer to lists made up of words stressed on the last syllable. Knowledge about the most frequent stress pattern in CV.CVC words seems to have been acquired by these infants (along with the specific phonotactics that determine quantity-sensitive stress), so that they show a preference for this specific pattern, even though disyllabic words in Spanish are mostly stressed on the first syllable. Although further research is still required, these preliminary results suggest that infant preferences for lexical stress reflect the predominant pattern in the native language and that even minor variability in the statistical predominance seems to be determinant in the preferential patterns observed in infants from different linguistic environments. They also suggest that phonotactic information might be combined with stress cues at an earlier age than has been shown for English. Cross-linguistic research offers a valuable tool to explore infants’ attention to different types of sub-lexical cues to word boundaries and how they can be reliably exploited as strategies in early word segmentation.
References
Alcina, Juan & José Manuel Blecua. 1975. Gramática española. Barcelona: Editorial Ariel.
Andruski, Jean E. & Patricia K. Kuhl. 1996. “The acoustic structure of vowels in mothers’ speech to infants and children”. Proceedings of the 4th International Conference on Spoken Language Processing ed. by H. Timothy Bunnell & William Idsardi. 1545-1548. New Castle, Del.: Citation Delaware.
Aslin, Richard N., Jenny R. Saffran & Elissa L. Newport. 1998. “Computation of conditional probability statistics by 8-month-old infants”. Psychological Science 9. 321-324.
Boersma, Paul & David Weenink. 2005. Praat: Doing Phonetics by Computer (Version 4.2). [Computer program]. Retrieved from http://www.praat.org/
Bosch, Laura & Núria Sebastián-Gallés. 2001. “Evidence of early language discrimination abilities in infants from bilingual environments”. Infancy 2. 29-49.
Bosch, Laura & Núria Sebastián-Gallés. 2003. “Simultaneous bilingualism and the perception of a language specific vowel contrast in the first year of life”. Language and Speech 46. 217-244.
Cabré Monné, M. Teresa. 1993. Estructura gramatical i lexicó: el mot mínim català. [Grammatical structure and lexicon: Minimal words in Catalan]. PhD diss., Universitat Autònoma de Barcelona.
Cole, Ronald A. & Jola Jakimik. 1980. “A model of speech perception”. Perception and Production of Fluent Speech ed. by Ronald A. Cole. 136-163. Hillsdale, New Jersey: Erlbaum.
Cooper, Robin P. & Richard N. Aslin. 1990. “Preference for infant-directed speech in the first month after birth”. Child Development 61. 1584-1595.
Curtin, Suzanne, Toben H. Mintz & Morten H. Christiansen. 2005. “Stress changes the representational landscape: Evidence from word segmentation”. Cognition 96:3. 233-262.
Cutler, Anne & David Carter. 1987. “The predominance of strong initial syllables in the English vocabulary”. Computer Speech and Language 2. 133-142.
---------- & Dennis Norris. 1988. “The role of strong syllables in segmentation for lexical access”. Journal of Experimental Psychology: Human Perception and Performance 14. 113-121.
Echols, Catharine H., Megan J. Crowhurst & Jane B. Childers. 1997. “The perception of rhythmic units in speech by infants and adults”. Journal of Memory and Language 36. 202-225.
Fernald, Anne & Patricia K. Kuhl. 1987. “Acoustic determinants of infant preference for motherese speech”. Infant Behavior & Development 10. 279-293.
----------, Traute Taeschner, Judy Dunn, Mechthild Papousek, Bénédicte de Boysson-Bardies & Ikuko Fukui. 1989. “A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants”. Journal of Child Language 16. 477-501.
---------- & Claudia Mazzie. 1991. “Prosody and focus in speech to infants and adults”. Developmental Psychology 27. 209-221.
Gout, Ariel. 2001. Etapes précoces de l’acquisition du lexique. [Early Steps in Lexical Acquisition]. PhD diss., Ecole des Hautes Etudes en Sciences Sociales. Paris, France.
Guasti, Teresa & Anna Gavarró. 2003. “Catalan as a testing hypothesis concerning article omission”. Proceedings of the 27th Boston University Conference on Language Development ed. by Barbara Beachley, Amanda Brown & Frances Conlin. 288-298. Somerville, Mass.: Cascadilla Press.
----------, Joke de Lange, Anna Gavarró & Claudia Caprin. 2004. “Article omission: Across child languages and across special registers”. Proceedings of Generative Approaches to Language Acquisition 2003 ed. by Jacqueline van Kampen & Sergio Baauw. 199-210. Utrecht: LOT.
Guerra, Ramón. 1983. “Recuento estadístico de la sílaba en español”. Estudios de Fonética I ed. by Manuel Esgueva & Margarita Cantarero. 9-112. Madrid: Consejo Superior de Investigaciones Científicas.
Hirsh-Pasek, Kathy, Deborah G. Kemler Nelson, Peter W. Jusczyk, Kimberly Wright-Cassidy, Benjamin Druss & Lori J. Kennedy. 1987. “Clauses are perceptual units for young infants”. Cognition 26. 269-286.
Höhle, Barbara. 2002. Der Einstieg in die Grammatik: Die Rolle der Phonologie/Syntax Schnittstelle für Sprachverarbeitung und Spracherwerb. [The beginning of grammar: The role of the phonology/syntax interface in language processing and language acquisition]. Habilitation thesis, Freie Universität Berlin.
Houston, Derek M., Peter W. Jusczyk, Cecile Kuijpers, Riet Coolen & Anne Cutler. 2000. “Cross-language word segmentation by 9-month-olds”. Psychonomic Bulletin & Review 7. 504-509.
Iakimova, Galina, Thierry Nazzi, Megha Sundara & Linda Polka. 2006. Emerging evidence of differences in segmentation abilities between Parisian and Canadian French infants. Poster presented at the International Conference on Infant Studies, Kyoto.
Johnson, Elisabeth K. & Peter W. Jusczyk. 2001. “Word segmentation by 8-month-olds: When speech cues count more than statistics”. Journal of Memory and Language 44. 548-567.
Jusczyk, Peter W. 1999. “How infants begin to extract words from speech”. Trends in Cognitive Sciences 3. 323-328.
---------- & Elisabeth Thompson. 1978. “Perception of a phonetic contrast in multisyllabic utterances by 2-month-old infants”. Perception & Psychophysics 23:2. 105-109.
----------, Anne Cutler & Nancy Redanz. 1993. “Preference for the predominant stress pattern of English words”. Child Development 64. 675-687.
---------- & Richard N. Aslin. 1995. “Infants’ detection of the sound patterns of words in fluent speech”. Cognitive Psychology 29. 1-23.
----------, Derek M. Houston & Mary Newsome. 1999. “The beginnings of word segmentation in English-learning infants”. Cognitive Psychology 39. 159-207.
----------, Elisabeth A. Hohne & Angela Bauman. 1999. “Infants’ sensitivity to allophonic cues for word segmentation”. Perception & Psychophysics 61. 1465-1476.
Kemler Nelson, Deborah G., Kathy Hirsh-Pasek, Peter W. Jusczyk & Kimberly Wright-Cassidy. 1989. “How the prosodic cues in motherese might assist language learning”. Journal of Child Language 16. 55-68.
----------, Peter W. Jusczyk, Denise R. Mandel, James Myers, Alice Turk & LouAnn Gerken. 1995. “The head-turn preference procedure for testing auditory perception”. Infant Behavior & Development 18. 111-116.
Kuhl, Patricia K., Jean E. Andruski, Inna A. Chistovich, Ludmilla A. Chistovich, Elena V. Kozhevnikova, Viktoria L. Ryskina, Elvira I. Stolyarova, Ulla Sundberg & Francisco Lacerda. 1997. “Cross-language analysis of phonetic units in language addressed to infants”. Science 277. 684-686.
Mattys, Sven L., Peter W. Jusczyk, Paul A. Luce & James L. Morgan. 1999. “Phonotactic and prosodic effects on word segmentation in infants”. Cognitive Psychology 38. 465-494.
Mattys, Sven L. & Peter W. Jusczyk. 2001. “Phonotactic cues for segmentation of fluent speech by infants”. Cognition 78. 91-121.
McQueen, James M., Dennis Norris & Anne Cutler. 1994. “Competition in spoken word recognition: Spotting words in other words”. Journal of Experimental Psychology: Learning, Memory, and Cognition 20:3. 621-638.
Mehler, Jacques, Peter W. Jusczyk, Ghislaine Lambertz, Nilofar Halsted, Josiane Bertoncini & Claudine Amiel-Tison. 1988. “A precursor of language acquisition in young infants”. Cognition 29:2. 143-178.
---------- & Anne Christophe. 1994. “Maturation and learning of language in the first year of life”. The Cognitive Neurosciences: A Handbook for the Field ed. by Michael S. Gazzaniga. 943-954. Cambridge, Mass.: The MIT Press.
Moon, Christine, Robin P. Cooper & William P. Fifer. 1993. “Two-day-olds prefer their native language”. Infant Behavior & Development 16. 495-500.
Morgan, James L. 1996. “A rhythmic bias in preverbal speech segmentation”. Journal of Memory and Language 35. 666-688.
---------- & Jenny R. Saffran. 1995. “Emerging integration of sequential and suprasegmental information in preverbal speech segmentation”. Child Development 66. 911-936.
Navarro Tomás, Tomás. 1965. Manual de pronunciación española. Madrid: Consejo Superior de Investigaciones Científicas.
Nazzi, Thierry, Josiane Bertoncini & Jacques Mehler. 1998. “Language discrimination by newborns: Toward an understanding of the role of rhythm”. Journal of Experimental Psychology: Human Perception and Performance 24. 756-766.
----------, Galina Iakimova, Josiane Bertoncini, Séverine Frédonie & Carmela Alcantara. 2006. “Early segmentation of fluent speech by infants acquiring French: Emerging evidence for crosslinguistic differences”. Journal of Memory and Language 54:3. 283-299.
Norris, Dennis, James M. McQueen & Anne Cutler. 1995. “Competition and segmentation in spoken-word recognition”. Journal of Experimental Psychology: Learning, Memory, and Cognition 21:5. 1209-1228.
Pegg, Judith E., Janet F. Werker & Peter J. McLeod. 1992. “Preference for infant-directed over adult-directed speech: Evidence from 7-week-old infants”. Infant Behavior & Development 15. 325-345.
Polka, Linda, Megha Sundara & Stephanie Blue. 2002. The role of language experience in word segmentation: A comparison of English, French, and bilingual infants. Paper presented at The 143rd Meeting of the Acoustical Society of America: Special Session in Memory of Peter Jusczyk, Pittsburgh, Penn.
Pons, Ferran & Laura Bosch. In preparation. “Stress pattern preference in Spanish-learning infants: Some limits to the trochaic bias”.
Prieto, Pilar. 2006. “The relevance of metrical information in early prosodic word acquisition: A comparison of Catalan and Spanish”. Language and Speech 49:2. 233-261.
Quilis, Antonio. 1981. Fonética acústica de la lengua española. Madrid: Biblioteca Románica Hispánica, Gredos.
Ramus, Franck, Marina Nespor & Jacques Mehler. 1999. “Correlates of linguistic rhythm in the speech signal”. Cognition 73. 265-292.
Rietveld, Toni C. M. & Florien J. Koopmans-van Beinum. 1987. “Vowel reduction and stress”. Speech Communication 6. 217-229.
Roark, Brian & Katherine Demuth. 2000. “Prosodic constraints and the learner’s environment: A corpus study”. Proceedings of the 24th Annual Boston University Conference on Language Development ed. by Catherine Howell, Sarah A. Fish & Thea Keith-Louis. 597-608. Somerville, Mass.: Cascadilla Press.
Saceda Ulloa, Marta. 2005. Adquisición prosódica en español peninsular septentrional: La sílaba y la palabra prosódica. [Prosodic Acquisition in Northern Spanish: the Syllable and the Prosodic Word]. Master’s thesis, Universitat Autònoma de Barcelona.
Saffran, Jenny R., Richard N. Aslin & Elissa L. Newport. 1996. “Statistical learning by 8-month-old infants”. Science 274. 1926-1928.
Sansavini, Alessandra. 1997. “Neonatal perception of the rhythmical structure of speech”. Early Development & Parenting 6. 3-13.
Schane, Sanford A. 1968. French Phonology and Morphology. Cambridge, Mass.: MIT Press.
Schreuder, Robert & R. Harald Baayen. 1994. “Prefix stripping re-revisited”. Journal of Memory and Language 33. 357-375.
Sebastián-Gallés, Núria, Maria Antònia Martí, Manuel Carreiras & Fernando Cuetos. 2000. LEXESP: Léxico informatizado del español. Barcelona: Ediciones Universitat de Barcelona.
---------- & Laura Bosch. 2002. “The building of phonotactic knowledge in bilinguals: the role of early exposure”. Journal of Experimental Psychology: Human Perception and Performance 28:4. 974-989.
Thiessen, Erik D. & Jenny R. Saffran. 2003. “When cues collide: Statistical and stress cues in infant word segmentation”. Developmental Psychology 39. 706-716.
Turk, Alice E., Peter W. Jusczyk & LouAnn Gerken. 1995. “Do English-learning infants use syllable weight to determine stress?” Language and Speech 38. 143-158.
Vihman, Marilyn M., Rory A. DePaolis & Barbara L. Davis. 1998. “Is there ‘a trochaic bias’ in early word learning? Evidence from infant production in English and French”. Child Development 69. 935-949.
----------, Satsuki Nakai, Rory A. DePaolis & Pierre Hallé. 2004. “The role of accentual pattern in early lexical representation”. Journal of Memory and Language 50. 336-353.
Weber, Christiane, Anja Hahne, Manuela Friedrich & Angela D. Friederici. 2004. “Discrimination of word stress in early infant perception: Electrophysiological evidence”. Cognitive Brain Research 18. 149-161.
Weijer, Joost van de. 1998. Language Input for Word Discovery. PhD diss., Max Planck Series in Psycholinguistics 9.
Woodward, Julide Z. & Richard N. Aslin. 1990. Segmentation cues in maternal speech to infants. Paper presented at the 7th Biennial Meeting of the International Conference on Infant Studies, Montreal, Canada.
LOGISTIC REGRESSION MODELLING FOR FIRST- AND SECOND-LANGUAGE PERCEPTION DATA *
GEOFFREY STEWART MORRISON
University of Alberta ¹
Abstract
Logistic regression analysis has, for some time, been successfully applied to L1 speech perception data, but has not been widely applied in L2 speech perception research. This chapter is a tutorial which makes use of simple data sets to introduce logistic regression analysis as applied to categorical response data from L1 and L2 speech perception experiments. Data are taken from an experiment on L1 Spanish vowel perception by Álvarez González, and experiments on L1 and L2 English vowel perception by Escudero & Boersma, and Morrison. Model fitting is demonstrated as a technique to determine which acoustic cues are attended to by listeners. Logistic regression coefficients are used to quantify how listeners use those acoustic cues, to produce graphical representations of their use of acoustic cues, and as statistics in secondary analyses used to determine whether there are significant differences in the perception of stimuli by L1 versus L2 groups of listeners.
1.
Introduction Logistic regression is a statistical method suitable for analysing identification response data from speech perception experiments 2 . Although logistic regression has, for some time, been applied successfully in firstlanguage (L1) speech perception research (e.g. Benkí 2001; Breier, Gray, Fletcher, Diehl, Klaas, Foorman & Mollis 2001; de Jong, Lim & Nagao 2004; Maddox, Molis & Diehl 2002; Nearey 1990, 1997; Rosen & Manganari 2001), it has not been widely applied in second-language (L2) speech perception research. This chapter is intended to be an introduction to understanding logistic regression applied to L1 and L2 speech perception data, and is aimed especially at L2-speech-perception students and researchers who are not familiar with the technique. Using relatively simple data sets, I will illustrate some of the ways in which logistic regression can be applied. Readers should then find it easier to understand the more complex analyses in L1 perception *
The writing of this chapter was supported by the Social Sciences and Humanities Research Council of Canada. My thanks to Terrance M. Nearey, the editors, and anonymous reviewers for comments and advice. 1 Now at Boston University. 2 Although less flexible, another suitable method is probit analysis.
GEOFFREY STEWART MORRISON
papers such as Nearey (1990, 1997) and L2 perception papers such as Morrison (submitted, 2006). For general introductions to applied logistic regression see Hosmer & Lemeshow (2000), Menard (2002), and Pampel (2000).
2. Fitting a logistic regression model
2.1 One stimulus dimension, binomial responses
In speech perception research, the basic goal of logistic regression analysis is to fit a sigmoidal (S-shaped) curve to categorical response data. Consider a classic voice onset time (VOT) experiment in which there is a single acoustic dimension, VOT ranging from 0 to 60 ms in 10 ms intervals, and there are two response categories, voiced or voiceless (one stimulus dimension and binomial/dichotomous responses). Imagine the following idealised response data: A participant hears each of the seven stimuli ten times in random order and gives ten voiceless responses for all the stimuli with VOT<20 ms, eight voiceless and two voiced responses for the stimulus with VOT = 20 ms, two voiceless and eight voiced responses for the stimulus with VOT = 30 ms, and ten voiced responses for all the stimuli with VOT>30 ms. These binary response data can be converted to proportions: The proportion of voiced responses is 0 for all stimuli with VOT<20 ms, 0.2 for the stimulus with VOT = 20 ms, 0.8 for the stimulus with VOT = 30 ms, and 1 for all stimuli with VOT>30 ms. The observed proportions of voiced responses are plotted in Figure 1, as are the sigmoidal curves fitted via a logistic regression analysis to the proportions of voiced and voiceless responses.
[Figure 1 plot: proportion of responses (0 to 1) against VOT (0 to 60 ms), with fitted curves labelled voiced and voiceless.]
Figure 1: Sigmoidal logistic regression curves fitted to idealised VOT data. Dots represent proportions of voiced responses observed in the data.
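The fit shown in Figure 1 can be sketched in a few lines of Python. This is an illustration only, not the Matlab software used in this chapter: it fits a two-coefficient binomial model to the idealised VOT data by Newton-Raphson maximum likelihood, and all function and variable names (`fit_binomial_logistic`, `voiced_counts`, and so on) are hypothetical. Under those assumptions, the estimated slope and the predicted probability of a voiced response at 20 ms should come out close to the values quoted in the text.

```python
import math

# Idealised VOT data from Section 2.1: seven stimuli, ten trials each.
vot_ms = [0, 10, 20, 30, 40, 50, 60]
voiced_counts = [0, 0, 2, 8, 10, 10, 10]
n_trials = 10


def sigmoid(z):
    """Convert a logit value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_binomial_logistic(x, successes, n, iterations=50):
    """Fit logit(p) = intercept + slope * x by Newton-Raphson maximum likelihood."""
    mean_x = sum(x) / len(x)
    xc = [xi - mean_x for xi in x]  # centre x so the updates are well conditioned
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(a + b * xi) for xi in xc]
        # Gradient of the log-likelihood with respect to a and b.
        g_a = sum(s - n * pi for s, pi in zip(successes, p))
        g_b = sum((s - n * pi) * xi for s, pi, xi in zip(successes, p, xc))
        # Information matrix (negative Hessian), a 2 x 2 symmetric matrix.
        w = [n * pi * (1.0 - pi) for pi in p]
        h_aa = sum(w)
        h_ab = sum(wi * xi for wi, xi in zip(w, xc))
        h_bb = sum(wi * xi * xi for wi, xi in zip(w, xc))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2 x 2 system by hand.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a - b * mean_x, b  # intercept on the original (uncentred) scale


intercept, slope = fit_binomial_logistic(vot_ms, voiced_counts, n_trials)
p_voiced_20 = sigmoid(intercept + slope * 20)
print(f"slope = {slope:.3f} logits/ms, p(voiced | 20 ms) = {p_voiced_20:.3f}")
```

The category boundary implied by the fit is the VOT at which the logit is zero, that is, `-intercept / slope`. A general-purpose statistics package would normally be preferred to a hand-written optimiser; the loop is spelled out here only to show what the iterative maximum likelihood procedure does.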
The fitted curves are not a perfect fit to the data; for example, the predicted probability of a voiced response at 20 ms is 0.172 rather than the observed value of 0.2. However, the curve is generally very close to the data points. Goodness-of-fit can be assessed in several ways. A standard method is to measure the distance between the observed and predicted values for each stimulus and take an average over all the stimuli: Root-mean-squared (RMS)
LOGISTIC REGRESSION MODELLING
error is the sum of the squares of the differences between the observed and predicted values (sum of squared errors), divided by the residual degrees of freedom in the model, then square rooted.³ RMS error can be scaled by the number of responses per stimulus to give a percentage root-mean-squared error (%RMS). The %RMS error for the logistic regression model fitted to the data in Figure 1 is 2.6%. Another measure of goodness-of-fit is the percentage modal agreement (%MA), the percentage of times, over all the stimuli, that the most likely response predicted by the model matches the most common (the modal) response of the listener. If getting the category right is what counts, then %MA may be a more meaningful measure. The %MA for the logistic regression model fitted to the data in Figure 1 is 100%. The goodness-of-fit measure actually used when fitting logistic regression models is the deviance statistic G², which is determined as follows: For each response category at each stimulus, take the natural logarithm of the ratio of the model’s predicted value to the observed value for that category and multiply it by the observed value; then sum over all categories and stimuli and multiply by minus two. Compared to RMS error, the G² statistic is less intuitively meaningful, but, like RMS error, it decreases as goodness-of-fit improves.⁴ Several factors can affect goodness-of-fit. One factor is the appropriateness of the model: Clearly the sigmoidal curve of a logistic regression model is a better fit to our data than would be the straight line of a linear regression model. In some cases the appropriateness of the model, or lack thereof, may not be so apparent, an issue which Hosmer and Lemeshow (2000:§5.3) discuss in detail.
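The three goodness-of-fit measures just described can be computed directly for the idealised VOT model. The sketch below rests on several assumptions of mine rather than on details given in the chapter: the slope is the 0.314 logits/ms reported in Section 3.2, the intercept is chosen so that the boundary falls at 25 ms, squared errors are summed over both response categories (a choice that reproduces the 2.6% figure), and the deviance is computed on response counts. All names are hypothetical.

```python
import math

vot_ms = [0, 10, 20, 30, 40, 50, 60]
voiced_counts = [0, 0, 2, 8, 10, 10, 10]
n_trials = 10

# Assumed coefficient values (see the lead-in): slope from Section 3.2,
# intercept placing the category boundary at 25 ms.
slope = 0.314
intercept = -slope * 25.0
n_coefficients = 2


def predicted_voiced(vot):
    return 1.0 / (1.0 + math.exp(-(intercept + slope * vot)))


# Each entry is (proportion voiced, proportion voiceless) for one stimulus.
observed = [(c / n_trials, 1 - c / n_trials) for c in voiced_counts]
predicted = [(predicted_voiced(v), 1 - predicted_voiced(v)) for v in vot_ms]

# Residual df: stimuli x (categories - 1) - coefficients = 7 x 1 - 2 = 5.
residual_df = len(vot_ms) * (2 - 1) - n_coefficients

# %RMS error, summing squared differences over both response categories.
sse = sum((o - p) ** 2
          for obs, pred in zip(observed, predicted)
          for o, p in zip(obs, pred))
pct_rms = 100 * math.sqrt(sse / residual_df)

# %MA: how often the model's most probable category is the listener's modal one.
matches = sum(obs.index(max(obs)) == pred.index(max(pred))
              for obs, pred in zip(observed, predicted))
pct_ma = 100 * matches / len(vot_ms)

# Deviance G2 = -2 * sum of O * ln(E / O) on counts; cells with O = 0 add nothing.
g2 = 0.0
for obs, pred in zip(observed, predicted):
    for o, p in zip(obs, pred):
        if o > 0:
            g2 += -2.0 * (o * n_trials) * math.log(p / o)

print(f"%RMS = {pct_rms:.1f}, %MA = {pct_ma:.0f}, G2 = {g2:.3f}")
```

Because the deviance and the squared-error measures weight discrepancies differently, two models can be ranked differently by G² and by %RMS; the deviance is the quantity that the fitting procedure itself minimises.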
For formant values in vowel stimuli, goodness-of-fit typically improves when frequency is entered into the model in log Hertz (or mel, Bark, or ERB) rather than in Hertz. Since human frequency perception is closer to logarithmic than to linear, a model fitted to log Hertz values is usually more appropriate than a model fitted to Hertz values. Another factor which can decrease goodness-of-fit is noise in the data: If the listener is occasionally distracted, they may fail to hear a stimulus and press a response button at random. A certain number of responses in the data will then be from a random distribution which does not reflect the listener’s perception of the stimuli. If the number of random responses is relatively small, they may have
3 The number of residual degrees of freedom is the number of independent pieces of information in the model. For the models here, this is the number of stimuli multiplied by one less than the number of response categories, minus the number of non-redundant coefficients estimated in the model. (Since the responses are proportions, they must sum to 1, and the proportions for the last category are redundant.) There are seven stimuli and two response categories in the VOT data, and two coefficients/parameters in the logistic regression model fitted to the data; therefore there are five residual degrees of freedom in the model.
4 I follow Nearey (1990, 1997) in the use of the symbols G² for deviance and ΔG² for the difference in the deviance between two nested models (see below). Hosmer and Lemeshow (2000) use D for the former and G for the latter. Menard (2002) uses DM for the former, and GM for the latter if the smaller model is the bias-only model, but Gk for other pairs of models.
relatively little effect on the location and shape of the fitted curve; however, the random responses will likely cause the observed values for some stimuli to be further from the curve than they would otherwise have been, and so will decrease the goodness-of-fit (noise will also usually cause the slopes of the curves to be shallower). Yet another factor that can decrease goodness-of-fit is the use of data pooled across listeners. It could be that a logistic regression model fits each individual’s data well, but that the exact locations of the category boundaries vary across listeners, and hence the boundaries in the pooled data are fuzzier than each individual listener’s boundaries. Although problematic for statistical analysis,⁵ use of pooled data may be justified on linguistic grounds: If the listeners are all native speakers of the same dialect then it may be argued that they will have similar pronunciation and perception patterns, and any interlistener differences will be negligible for communication purposes. A population average model based on data pooled across listeners may reasonably be taken to characterise the perception of a group of native speakers of a given dialect.
2.2 Multiple stimulus dimensions, multinomial response categories
Let us look at some data from an actual experiment. Álvarez González (1980:Chapter 3) investigated L1-Spanish listeners’ perception of a synthetic vowel space in which F1 varied from 250-800 Hz in nine steps (ten points), F2 varied from 750-2700 Hz in eight steps (nine points), and F3 varied from 2300-2900 Hz in two steps (three points). The total number of stimuli was 231 rather than 270 since the corner where F1 would have been higher than F2 was excluded. Fifty listeners heard each stimulus once in random order in the context /_a/, and responded by circling orthographic ara, era, ira, ora, or ura on an answer sheet, thereby identifying each synthetic vowel as one of the Spanish vowels, /a/, /e/, /i/, /o/, or /u/. This constitutes three stimulus dimensions and five response categories. Álvarez González reported results pooled across participants. We will use logistic regression analysis to answer three questions regarding the Álvarez González data:
Question 1: Does the listeners’ vowel perception depend on F1 and F2?
Question 2: Does the listeners’ vowel perception depend on F3 in addition to F1 and F2?
Question 3: How do F1 and F2 affect the listeners’ vowel perception?
5 The use of pooled data obscures individual differences which increase the variance in the data, and the assumption of independence of observations is violated. Given these issues, and the lack of consensus on an appropriate approach to repeated measures data in this type of analysis, some researchers do not believe that pooling can be justified. In some instances, multi-level modelling may be applied (see Quené & van den Bergh 2004).
The software that we will use to build logistic regression models of multinomial/polytomous response data was implemented by Terrance M. Nearey based on an algorithm described in Haberman (1979).⁶ Logistic regression operates in a logistic (log odds) space,⁷ and fits a model by maximising the goodness-of-fit to the data using an iterative maximum likelihood technique. The technique selects a set of estimated coefficient values that (given the constraints of the model) result in the predicted values for each response category at each stimulus being as close as possible to the observed values (the G² average error between observed and predicted values is minimised over all the stimuli and categories).⁸ For models that will be fitted to the Álvarez González data, the set of possible logistic regression coefficients will be:
bias coefficients: α/a/, α/e/, α/i/, α/o/, α/u/
F1-tuned coefficients: β/a/F1, β/e/F1, β/i/F1, β/o/F1, β/u/F1
F2-tuned coefficients: β/a/F2, β/e/F2, β/i/F2, β/o/F2, β/u/F2
F3-tuned coefficients: β/a/F3, β/e/F3, β/i/F3, β/o/F3, β/u/F3
These include redundant coefficients since the value of the fifth coefficient in each family of coefficients (α, βF1, βF2, βF3) is known once the values of the other four coefficients are known: We use deviation-from-mean coding, hence the sum of the values of the coefficients in each family is zero, and the value of the fifth coefficient is minus the sum of the other four coefficients. In a model containing only bias coefficients, the bias coefficients would reflect the proportions of the number of responses given to each category in the whole data set, irrespective of stimulus properties. Stimulus-tuned coefficients are those which capture the changes in a listener’s responses which correlate with changes in the properties of the stimuli presented to the listener. We will assume below that the changes in the stimulus properties are the cause of the changes in the listener’s responses.
6 The software is available as Matlab code upon request from T. M. Nearey (current e-mail: [email protected]), or, along with additional code to run the analyses described in this paper, from G. S. Morrison (current website: http://cns.bu.edu/~gsm2). With some additional effort, most of the analyses described below could also be conducted using commercial software such as SPSS or STATA, or free software such as R.
7 Non-linear probability values can be transformed into linear logit values (see Pampel 2000: Ch. 1). In the case of the VOT data, the odds of a voiced response is the ratio of the probability of a voiced response to the probability of a voiceless response: odds(voiced) = p(voiced) / p(voiceless). The logit is the natural logarithm of the odds: Logit(voiced) = log(odds(voiced)).
8 Details of model fitting are beyond the scope of this tutorial. Interested readers may wish to consult, in increasing depth of coverage, Pampel (2000), Hosmer & Lemeshow (2000), McCullagh & Nelder (1983), and Haberman (1979).
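Two of the ideas above lend themselves to a quick numerical check: the redundancy of the fifth coefficient under deviation-from-mean coding, and the probability-to-logit transformation of footnote 7. The sketch below uses the Model 1b estimates for /a/, /e/, /i/, and /o/ that are reported later in Table 3; the helper names `complete_family` and `logit` are hypothetical.

```python
import math

# First four coefficients of each family for Model 1b, in the order
# /a/, /e/, /i/, /o/ (values from Table 3 of this chapter).
first_four = {
    "bias": [-35.667, -40.832, 1.519, 14.618],
    "F1-tuned": [6.804, 1.240, -5.664, 3.982],
    "F2-tuned": [-1.059, 4.774, 4.561, -5.405],
}


def complete_family(values):
    """Append the redundant fifth coefficient: minus the sum of the others."""
    return values + [-sum(values)]


families = {name: complete_family(values) for name, values in first_four.items()}


def logit(p):
    """Natural log of the odds p / (1 - p), as in footnote 7."""
    return math.log(p / (1.0 - p))


for name, values in families.items():
    print(name, [round(v, 3) for v in values])
print("logit(0.8) =", round(logit(0.8), 3))
```

The recovered /u/ coefficients agree with the values printed in Table 3 to within rounding of the third decimal place.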
We will answer Questions 1 and 2 by comparing the difference in goodness-of-fit between different logistic regression models fitted to the response data. If a model that contains F1 and F2 fits the data better than a model which does not contain F1 and F2, then this indicates that the listeners’ vowel perception depends on F1 and F2. Likewise, if a model that contains F3 fits the data better than a model which does not contain F3, then this indicates that the listeners’ vowel perception depends on F3. The models we will fit include the coefficients given in 1a-1c:⁹
Bias coefficients only: α/a/, α/e/, α/i/, α/o/, α/u/ (1a)
F1 and F2 tuning: α/a/, α/e/, α/i/, α/o/, α/u/, β/a/F1, β/e/F1, β/i/F1, β/o/F1, β/u/F1, β/a/F2, β/e/F2, β/i/F2, β/o/F2, β/u/F2 (1b)
F1, F2, and F3 tuning: α/a/, α/e/, α/i/, α/o/, α/u/, β/a/F1, β/e/F1, β/i/F1, β/o/F1, β/u/F1, β/a/F2, β/e/F2, β/i/F2, β/o/F2, β/u/F2, β/a/F3, β/e/F3, β/i/F3, β/o/F3, β/u/F3 (1c)
The difference in goodness-of-fit of nested models (models where the smaller model contains a subset of the parameters in the larger model) can be statistically assessed using the difference in the G² statistic between the two models, ΔG² (the −2 log likelihood ratio for testing the significance of a difference between two nested models). Assuming pure multinomial error, ΔG² is asymptotically distributed as a χ² with degrees of freedom equal to the difference in degrees of freedom between the two models. However, if there is overdispersion/heterogeneity in the data, such as may arise when data is pooled over participants, then the ΔG² test may suffer from a serious Type I error and indicate a significant difference when the difference is in fact not significant. One approach to dealing with this problem (provided in Nearey’s software) is to use a quasi-likelihood F-test: The F-ratio is the result of dividing the ΔG² by the difference in degrees of freedom between the two models and by the overdispersion factor (the overdispersion factor is calculated as the ratio of the Pearson χ² to the residual degrees of freedom),¹⁰ and the degrees of freedom in the F-test are the difference in degrees of freedom between the two models and the residual degrees of freedom of the larger model (see McCullagh & Nelder 1983; Nearey 1990, 1997). Table 1 shows the G², %RMS error, and %MA for each model fitted to the response data. F1 and F2 were converted to the natural logarithms of their Hertz values before fitting the logistic regression models. Table 2 shows the ΔG², overdispersion, and quasi-likelihood F-ratio for comparisons of model 1b with 1a, and 1c with 1b. Adding F1 and F2 stimulus tuning to a model containing only bias coefficients (1b vs 1a) resulted in a large (22.8 percentage point) decrease in %RMS error, and a large (54.9 percentage point) increase in %MA, and the increase in goodness-of-fit was statistically significant on the quasi-likelihood F-test.¹¹ Therefore it can be concluded that the listeners’ vowel responses did depend on F1 and F2. Adding F3 stimulus tuning to a model already containing bias coefficients and F1 and F2 tuning (1c vs 1b) resulted in a small (0.2 percentage point) decrease in %RMS error, and a small (0.8 percentage point) decrease (rather than increase) in %MA, and the increase in goodness-of-fit was not statistically significant on the quasi-likelihood F-test. There is therefore little reason to believe that the listeners’ vowel responses depended on F3 in addition to F1 and F2.
9 It is also possible to build more complex models including coefficients for quadratic square and cross-product terms, etc.
10 The Pearson χ²: For each stimulus, the square of the difference between the observed values of the responses and the model’s predicted values, then divided by the model’s predicted values, then summed over all stimuli.
Model   df    G²      χ²       %RMS   %MA
1a      920   43647   54031    34.6   35.1
1b      912   8105    125912   11.8   90.0
1c      908   7911    130478   11.6   89.2
Table 1: Goodness-of-fit measures for models fitted to the vowel perception data from Álvarez González (1980).

Models compared   Δdf   df residual   ΔG²     p(ΔG²)   overdispersion   F       p(F)
1b vs 1a          8     912           35542   0.000    58.7             75.65   0.000
1c vs 1b          4     908           194     0.000    138.1            0.35    0.843
Table 2: Comparisons of goodness-of-fit measures for models fitted to the vowel perception data from Álvarez González (1980).
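The entries in Table 2 can be recomputed from Table 1 as a check on the procedure. The sketch below is my reading of the tables rather than a description of Nearey’s software: the reported overdispersion values match the Pearson χ² of the smaller model in each comparison divided by that model’s residual degrees of freedom, and the reported F values match ΔG² divided by Δdf and then by the overdispersion. The names `models` and `compare` are hypothetical, and the p-values are not recomputed.

```python
# Goodness-of-fit values copied from Table 1.
models = {
    "1a": {"df": 920, "G2": 43647, "chi2": 54031},
    "1b": {"df": 912, "G2": 8105, "chi2": 125912},
    "1c": {"df": 908, "G2": 7911, "chi2": 130478},
}


def compare(smaller, larger):
    """Return (delta G2, delta df, overdispersion, quasi-likelihood F)."""
    delta_g2 = models[smaller]["G2"] - models[larger]["G2"]
    delta_df = models[smaller]["df"] - models[larger]["df"]
    # Assumption: overdispersion is taken from the smaller model, since
    # that choice reproduces the values printed in Table 2.
    overdispersion = models[smaller]["chi2"] / models[smaller]["df"]
    f_ratio = (delta_g2 / delta_df) / overdispersion
    return delta_g2, delta_df, overdispersion, f_ratio


for smaller, larger in [("1a", "1b"), ("1b", "1c")]:
    d_g2, d_df, disp, f = compare(smaller, larger)
    print(f"{larger} vs {smaller}: dG2 = {d_g2}, ddf = {d_df}, "
          f"overdispersion = {disp:.1f}, F = {f:.2f}")
```

The contrast between the two comparisons is instructive: a ΔG² of 194 on 4 degrees of freedom would be highly significant under pure multinomial error, but once it is scaled by an overdispersion of about 138 the F-ratio falls well below 1.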
3. Interpreting logistic regression coefficients
3.1 Graphical representations
The third question asked regarding the Álvarez González data was: How do F1 and F2 affect Spanish listeners’ vowel perception? One way to answer this question is via graphical representations of the logistic regression model of listeners’ perception. The estimated logistic coefficient values calculated for Model 1b are shown in Table 3 and the stimulus-tuned coefficient values are plotted in Figure 2. The relative locations of the perceptual vowel response categories in the F1-tuned-coefficient–F2-tuned-coefficient space in Figure 2 are reminiscent of the distribution of vowel production values in the F1-F2 space; correlations of coefficients with production patterns are frequently found in logistic regression analyses. The direct interpretation of the stimulus-tuned coefficients will be discussed below in Section 3.2.
11 McCullagh and Nelder (1983) advise using a fixed overdispersion, typically from the largest model considered. The 1b versus 1a comparison would still be significant on the quasi-likelihood F test if the overdispersion from Model 1c were used.
Category   bias coefficient (α)   F1-tuned coefficient (βF1)   F2-tuned coefficient (βF2)
/a/        −35.667                6.804                        −1.059
/e/        −40.832                1.240                        4.774
/i/        1.519                  −5.664                       4.561
/o/        14.618                 3.982                        −5.405
/u/        60.362                 −6.362                       −2.870
Table 3: Estimated values of logistic regression coefficients for Model 1b fitted to the vowel perception data from Álvarez González (1980).
[Figure 2 plot: F2-tuned coefficient (−8 to 8) against F1-tuned coefficient (−8 to 8), with each vowel label placed at its coefficient values.]
Figure 2: Plot of estimated values of stimulus-tuned logistic regression coefficients for Model 1b fitted to the vowel perception data from Álvarez González (1980), as in Table 3.
In order to obtain a predicted logistic value for a given category at a given set of stimulus values, the F1 and F2 values and estimated logistic regression coefficient values for that category are substituted into Equation 1b and all coefficients that do not correspond to the given category are set to zero. For example, to obtain the predicted logistic value for the response /u/, Logit/u/, at F1 = 250 Hz, F2 = 800 Hz, the values would be substituted into Equation 2:
$$\mathrm{Logit}_{/u/} = \alpha_{/u/} + \beta_{/u/F1} \times F1 + \beta_{/u/F2} \times F2 \tag{2}$$
$$\mathrm{Logit}_{/u/} = 60.362 - 6.362 \times \log(250) - 2.870 \times \log(800) = 6.050$$
The predicted probability for the response /u/, p/u/, is calculated as in Equation 3:
$$p_{/u/} = \frac{e^{\mathrm{Logit}_{/u/}}}{\sum_{x} e^{\mathrm{Logit}_{x}}} \tag{3}$$
$$p_{/u/} = \frac{e^{\mathrm{Logit}_{/u/}}}{e^{\mathrm{Logit}_{/a/}} + e^{\mathrm{Logit}_{/e/}} + e^{\mathrm{Logit}_{/i/}} + e^{\mathrm{Logit}_{/o/}} + e^{\mathrm{Logit}_{/u/}}}$$
$$p_{/u/} = \frac{e^{6.050}}{e^{-5.178} + e^{-2.073} + e^{0.734} + e^{0.474} + e^{6.050}} = 0.991$$
where x takes on the values of all the response categories {/a/, /e/, /i/, /o/, /u/}: Each value of Logitx is calculated as in Equation 2 using the same F1 and F2 values, and the estimated logistic regression coefficients appropriate for each response category. If a range of F1 and F2 values covering the stimulus space is substituted into equations of the type given in Equations 2 and 3, the predicted probability of each vowel response category can be calculated over the two-dimensional stimulus space and plotted in a three-dimensional probability surface plot as in Figure 3. The height of a surface above the base of the plot indicates the predicted probability of the response associated with that surface. The predicted probability of a /u/ response is close to 1 for low-F1–low-F2 values and decreases sigmoidally as either F1 or F2 or both increase. Response categories /i/, /e/, and /o/ have their highest predicted probabilities in the other corners of the stimulus space. The predicted probability of an /a/ response is highest for high-F1 and intermediate-F2 values. The maximum predicted probability of an /a/ response is quite low compared to the maximum predicted probabilities of the other response categories (the number of /a/ responses in the raw data was low; this is not an analytical error).
Figure 4 is a two-dimensional territorial map, equivalent to a view of the three-dimensional probability surface plot (Figure 3) from directly above the stimulus plane. Only the response with the highest predicted probability is visible in any part of the stimulus space. The solid lines represent the location of perceptual boundaries between vowels; on one side of the boundary one vowel is the more probable response, on the other side another vowel is more probable. The dashed and dotted lines represent the 0.5 and 0.75 predicted probability contours for the locally dominant categories.
The /i/-/e/ boundary is at lower F1 values than the /u/-/o/ boundary; this perceptual result corresponds to the finding that Spanish speakers produce /e/ with lower F1 than /o/ (e.g. Álvarez González 1980:§2.7).
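The worked example in Equations 2 and 3 can be reproduced with a short script. This is a minimal sketch using the Model 1b coefficients from Table 3; the function names `logits` and `probabilities` are hypothetical, and subtracting the largest logit before exponentiating is a standard numerical safeguard of my own rather than part of the chapter’s equations (it leaves the probabilities unchanged).

```python
import math

# Model 1b coefficients from Table 3: (bias, F1-tuned, F2-tuned) per vowel.
COEFFICIENTS = {
    "a": (-35.667, 6.804, -1.059),
    "e": (-40.832, 1.240, 4.774),
    "i": (1.519, -5.664, 4.561),
    "o": (14.618, 3.982, -5.405),
    "u": (60.362, -6.362, -2.870),
}


def logits(f1_hz, f2_hz):
    """Equation 2 for every category; formants enter as natural-log Hertz."""
    log_f1, log_f2 = math.log(f1_hz), math.log(f2_hz)
    return {vowel: bias + b_f1 * log_f1 + b_f2 * log_f2
            for vowel, (bias, b_f1, b_f2) in COEFFICIENTS.items()}


def probabilities(f1_hz, f2_hz):
    """Equation 3: exponentiate each logit and normalise over categories."""
    z = logits(f1_hz, f2_hz)
    z_max = max(z.values())  # shifting by a constant avoids overflow
    exp_z = {vowel: math.exp(value - z_max) for vowel, value in z.items()}
    total = sum(exp_z.values())
    return {vowel: value / total for vowel, value in exp_z.items()}


z = logits(250, 800)
p = probabilities(250, 800)
for vowel in COEFFICIENTS:
    print(f"/{vowel}/: logit = {z[vowel]:7.3f}, p = {p[vowel]:.3f}")
```

Evaluating `probabilities` over a grid of F1 and F2 values gives the heights of the surfaces plotted in Figure 3.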
[Figure 3 plot: probability (0 to 1) against F1 (250 to 800 Hz) and F2 (750 to 2700 Hz).]
Figure 3: Probability surface plot based on logistic regression Model 1b fitted to the vowel perception data from Álvarez González (1980). The height of a surface above the base of the plot indicates the predicted probability of the corresponding response category.
[Figure 4 plot: F2 (750 to 2700 Hz) against F1 (250 to 800 Hz).]
Figure 4: Territorial map based on logistic regression Model 1b fitted to the vowel perception data from Álvarez González (1980).
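A coarse text version of a territorial map can be produced by finding the category with the largest logit at each point of a grid. The sketch below again assumes the Model 1b coefficients from Table 3; the grid uses the axis tick values of Figures 3 and 4 rather than the actual stimulus values, and the names `modal_response` and `territorial_map` are hypothetical. Because the category with the largest logit also has the largest probability, Equation 3 does not need to be evaluated.

```python
import math

# Model 1b coefficients from Table 3: (bias, F1-tuned, F2-tuned) per vowel.
COEFFICIENTS = {
    "a": (-35.667, 6.804, -1.059),
    "e": (-40.832, 1.240, 4.774),
    "i": (1.519, -5.664, 4.561),
    "o": (14.618, 3.982, -5.405),
    "u": (60.362, -6.362, -2.870),
}


def modal_response(f1_hz, f2_hz):
    """Return the vowel with the highest predicted logit at (F1, F2)."""
    log_f1, log_f2 = math.log(f1_hz), math.log(f2_hz)
    scores = {vowel: bias + b_f1 * log_f1 + b_f2 * log_f2
              for vowel, (bias, b_f1, b_f2) in COEFFICIENTS.items()}
    return max(scores, key=scores.get)


def territorial_map(f1_values, f2_values):
    """One text row per F2 value (highest first), one column per F1 value."""
    return [" ".join(modal_response(f1, f2) for f1 in f1_values)
            for f2 in sorted(f2_values, reverse=True)]


# Axis tick values from Figures 3 and 4. The model is evaluated at every grid
# point, including the high-F1, low-F2 corner that the experiment excluded.
f1_axis = [250, 334, 447, 598, 800]
f2_axis = [750, 1033, 1423, 1960, 2700]
for row in territorial_map(f1_axis, f2_axis):
    print(row)
```

A finer grid, together with a search for the points at which two categories have equal logits, would be needed to draw boundary lines such as those in Figure 4.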
3.2 Boundary crispness
A stimulus-tuned logistic regression coefficient represents the slope of a line in the logistic space. With deviation-from-mean coding, the rate of change from one category to another along a dimension in the logistic space is the difference between the estimated stimulus-tuned logistic regression coefficient values for each category (the distance between the centres of the vowel labels in Figure 2). For example, in Model 1b fitted to the Álvarez González data, the rate of change from /i/ to /e/ as F1 increases is β/e/F1 − β/i/F1 = 1.240 − (−5.664) = 6.904 logit units per log Hertz. The rate of change from one category to another will be referred to below as the contrast coefficient.¹² The contrast coefficient slope in the logistic space is related to the slope of the sigmoidal curve representing the rate of change from one category to another in the probability space. For expository purposes, we will return to the binomial VOT example. In a binomial model, the slope of the steepest tangent to the sigmoidal curve representing rate of change in the probability space (e.g. Figure 5b) is one quarter the slope of the contrast coefficient¹³ line in the logistic space (e.g. Figure 5a).¹⁴ The size of the contrast coefficient and the corresponding steepness of the steepest tangent to the sigmoidal curve in the probability space are indicators of the crispness of the boundary between the two categories. The logistic regression model fitted to the idealised VOT data has a voiceless to voiced contrast coefficient, β(voiced-voiceless)VOT (hereafter βVOT), of 0.314 logit units per millisecond, which corresponds to a maximum rate of change in the probability of 0.079 per millisecond. Figures 5a and 5b show plots of the linear slope in the logistic space and the sigmoidal curve in the probability space, based on a contrast coefficient value four times that of the contrast coefficient value from the model fitted to the VOT data. The sigmoidal curve is almost steplike: as the VOT increases, the probability of a voiced response is essentially 0 until very close to the boundary, then jumps to essentially 1. This is therefore a very crisp categorical boundary. Figures 5c and 5d show plots of the linear slope in the logistic space and the sigmoidal curve in the probability space, based on a contrast coefficient value one quarter that of the contrast coefficient value from the model fitted to the VOT data. The sigmoidal curve is almost linear with a gradual increase in the probability of a voiced response from 0 to 60 ms VOT. This is therefore a very fuzzy categorical boundary.
12 Rates of change for any category contrast can be calculated along any arbitrary line in the stimulus space. For example, the rate of change from back vowel to front vowel identification as F2 increases: (β/i/F2 + β/e/F2) − (β/u/F2 + β/o/F2) logit units per log Hertz. Or the rate of change from /i/ to /e/ for a one log Hertz increase in F1 and a two log Hertz decrease in F2: (β/e/F1 − 2×β/e/F2) − (β/i/F1 − 2×β/i/F2) logit units per log Hertz.
13 In the binomial case, one would usually use reference-category rather than deviation-from-mean coding. The coefficient values for one category would be fixed at zero and (what I have designated) the contrast coefficients would be the only coefficients reported by the software. If reference-category coding had been adopted in the multinomial model of the Álvarez González data, the reference category, e.g., /u/, would have been at the origin of Figure 2, and the other categories would have been shifted but would have maintained the same relative locations.
14 The instantaneous value of the probability slope is the (partial) derivative of the probability with respect to the dimension of interest. Using the binomial VOT example, this is: dp/dVOT = β(voiced-voiceless)VOT × p(voiced) × p(voiceless) (see Pampel 2000:24). The steepest tangent occurs at the intersection between the lines/surfaces representing the probability of each category. In the binomial case each category has a probability of 0.5 at the intersection, hence the instantaneous slope at this point is: β(voiced-voiceless)VOT × 0.5 × 0.5 = β(voiced-voiceless)VOT × 0.25. In multinomial cases, the calculation of the slope of the maximum tangent to the sigmoidal rate-of-probability-change curve between two categories is complicated by the fact that other categories may have non-zero predicted probabilities at the intersection of the two categories of interest, thus each category of interest will not have 0.5 probability at the intersection. However, a larger contrast coefficient value will still indicate a larger value for the maximum slope of a tangent to the sigmoidal curve.
[Figure 5 plots: panels (a) and (c) show logits (−10 to 10) against VOT (0 to 60 ms); panels (b) and (d) show probability (0 to 1) against VOT (0 to 60 ms), with curves labelled voiced and voiceless.]
Figure 5: Linear slopes in the logistic space (a and c) and the corresponding sigmoidal curves in the probability space (b and d) for contrast coefficient values of 1.256 logits/ms (a and b) and 0.079 logits/ms (c and d).
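The relation between the contrast coefficient and the steepest slope of the sigmoidal curve can be verified numerically. The sketch below takes the contrast coefficient of 0.314 logits/ms from the text and assumes, for illustration only, a category boundary at 25 ms; it compares the analytical slope β × p × (1 − p) at the boundary with a finite-difference estimate, and also recomputes the /i/ to /e/ contrast coefficient along F1 from Table 3. All names are hypothetical.

```python
import math

BETA_VOT = 0.314      # voiceless-to-voiced contrast coefficient (logits/ms)
BOUNDARY_MS = 25.0    # assumed boundary location, for illustration only


def p_voiced(vot_ms, beta=BETA_VOT, boundary_ms=BOUNDARY_MS):
    """Predicted probability of a voiced response at a given VOT."""
    return 1.0 / (1.0 + math.exp(-beta * (vot_ms - boundary_ms)))


def probability_slope(vot_ms, beta=BETA_VOT, boundary_ms=BOUNDARY_MS):
    """Analytical slope dp/dVOT = beta * p * (1 - p)."""
    p = p_voiced(vot_ms, beta, boundary_ms)
    return beta * p * (1.0 - p)


# At the boundary p = 0.5, so the slope is one quarter of the coefficient.
max_slope = probability_slope(BOUNDARY_MS)

# Central finite difference as an independent check of the derivative.
h = 1e-4
numeric_slope = (p_voiced(BOUNDARY_MS + h) - p_voiced(BOUNDARY_MS - h)) / (2 * h)

# Multinomial case: /i/-to-/e/ contrast coefficient along F1 (Table 3 values).
contrast_i_e_f1 = 1.240 - (-5.664)

print(f"max slope = {max_slope:.4f} per ms, numeric = {numeric_slope:.4f}")
print(f"/i/-/e/ F1 contrast coefficient = {contrast_i_e_f1:.3f} logits per log Hz")
```

Multiplying or dividing `BETA_VOT` by four gives the coefficient values used for the crisp and fuzzy boundaries in Figure 5.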
Measures of boundary crispness or fuzziness are useful when analysing L2 perception data. Native speakers typically have crisp boundaries between categories, similar to Figure 5b. L2 learners may not have L1 categories distinguished by the same acoustic cues as the L2 categories, the L1 may not use an acoustic dimension that is used in the L2, or the range of values sampled along the dimension may all fall within a single L1 category. In such cases, the L2 learners would be expected to have very fuzzy boundaries, similar to Figure 5d. Even though their L1 may not provide them with a crisp categorical boundary, they may still be able to hear differences along the acoustic dimensions under study and respond in a gradient manner, e.g. giving more voiced responses for longer VOT, and thus have a non-zero contrast coefficient. As they learn the L2, they would be expected to approximate the perception of native speakers of the L2, their categorical boundaries would become crisper, and this would be reflected in the contrast coefficient values from logistic regression models fitted to their perception data.
3.3 Polar-coordinate contrast coefficients
We will now turn to an example of the use of logistic regression contrast coefficients applied to real L2 perception data. In Escudero & Boersma (2004), L1-English and L1-Spanish L2-English listeners gave English /i/ or /ɪ/ responses to a synthetic vowel continuum that varied orthogonally in spectral and duration properties. Morrison (2005a) fitted logistic regression models to individual participants’ responses in Escudero and Boersma’s data (data was not pooled across listeners) and derived /i/-/ɪ/ contrast coefficients βspec and βdur along the spectral and duration dimensions. The contrast coefficient values for the twenty L1-English speakers from the south of England, and for the fourteen L1-Spanish listeners learning a Southern England dialect of English, are plotted in Figure 6.
[Figure 6 plot: βdur (−0.25 to 0.75) against βspec (−1 to 2), with separate symbols for L1-Spanish and L1-English listeners.]
Figure 6: Contrast coefficient values from logistic regression models fitted to individual participant data from Escudero & Boersma (2004).
Relative to L1-English listeners, the L1-Spanish listeners had significantly larger duration-tuned contrast coefficients and significantly smaller spectral-tuned contrast coefficients: Welch’s t tests, βdur t(26.589) = 3.951, p<0.01 versus βspec t(27.858) = −4.742, p<0.001. This was taken as evidence that, compared to the L1-English listeners, the L1-Spanish listeners made greater use of duration and less use of spectral properties when distinguishing English /i/ and /ɪ/ (similar results have been reported elsewhere). Boersma and Escudero (2005) pointed out that, because of constraints imposed by the edges of the stimulus space, the spectrally-tuned and duration-tuned contrast coefficients were partially correlated, and recommended using the ratio of the two contrast coefficients in the same manner as Escudero and Boersma (2004) had used the ratio of their spectral and duration reliance measures. The ratio of the spectrally-tuned and duration-tuned contrast coefficients gives the orientation of the /i/-/ɪ/ boundary in the spectral-duration stimulus space, i.e. the orientation of the boundary line on a territorial map (the ratio is a gradient, which may be converted to an angle in degrees). However, rather than simply taking the ratio, the two contrast coefficients can be converted into polar coordinates to provide orthogonal measures of: (1) the orientation of the boundary in the spectral-duration stimulus space, i.e. polar-coordinate angle; and (2) the boundary crispness, i.e. polar-coordinate magnitude.¹⁵ The boundary crispness is the rate of change from one category to the other in the direction perpendicular to the orientation of the boundary. Two listeners could have identical boundary orientations, but one could have a crisp and the other a fuzzy boundary. Looking at boundary orientation alone would ignore this important difference in the listeners’ perception, which could signal, for example, that the first listener has a well established categorical boundary, and the second listener is responding to within-category acoustic differences.
[Figure 7 plots: three probability surface plots, panels (a), (b), and (c).]
Figure 7: Probability surface plots illustrating different boundary angles and magnitudes. (a) L1-English listener, angle 70°, magnitude 0.88; (b) L2-English listener, angle 27°, magnitude 0.35; (c) L2-English listener, angle −2°, magnitude 0.46.
The use of polar coordinates provides relatively intuitive numerical descriptors for the boundary. Figures 7a-7c provide probability surface plots which give examples of different boundary angles and magnitudes (the values are reported in the caption). Note the differences in the steepness of the curved surfaces reflecting differences in boundary crispness, and the differences in the orientation of the intersection between the curved surfaces reflecting boundary orientation. (The angles were calculated such that an angle of 90° would indicate that the listener used only spectral cues, and an angle of 0° would indicate that the listener used only duration cues.)

¹⁵ $\mathrm{angle} = \arctan(\beta_{\mathrm{spec}} / \beta_{\mathrm{dur}})$; $\mathrm{magnitude} = \sqrt{\beta_{\mathrm{spec}}^{2} + \beta_{\mathrm{dur}}^{2}}$
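The conversion given in note 15 can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the procedure used in the analyses reported here: the function name `boundary_polar`, the handling of a zero duration coefficient, and the example coefficient values are all hypothetical.

```python
import math


def boundary_polar(beta_spec, beta_dur):
    """Convert a pair of contrast coefficients into (angle in degrees, magnitude).

    Follows note 15: angle = arctan(beta_spec / beta_dur) and
    magnitude = sqrt(beta_spec**2 + beta_dur**2).
    """
    if beta_dur == 0:
        # Assumption made for this sketch: a listener with no duration
        # weighting is assigned +/-90 degrees (purely spectral boundary).
        angle = math.copysign(90.0, beta_spec)
    else:
        angle = math.degrees(math.atan(beta_spec / beta_dur))
    magnitude = math.hypot(beta_spec, beta_dur)
    return angle, magnitude


# Hypothetical coefficients: equal spectral and duration weighting gives an
# angle of 45 degrees (within floating-point rounding) and magnitude sqrt(2).
angle, magnitude = boundary_polar(1.0, 1.0)
print(round(angle, 3), round(magnitude, 3))
```

Because the angle depends only on the ratio of the two coefficients and the magnitude only on their overall size, scaling both coefficients by the same positive factor changes the magnitude (crispness) but leaves the angle (orientation) untouched.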
Comparing the two groups in Escudero and Boersma’s data, the L1-Spanish L2-English listeners’ /i/-/ɪ/ boundary angles were significantly smaller than those of the L1-English listeners, t(32) = 5.503, p<0.001, indicating a relatively greater use of duration cues. On the other hand, the L1-Spanish L2-English listeners’ /i/-/ɪ/ boundary magnitudes were not significantly smaller than those of the L1-English listeners, t(32) = 1.367, p = 0.181. Again we conclude that, compared to the L1-English listeners, the L2-English listeners made greater use of duration, but we did not find statistical evidence that, as a group, the L2 learners had fuzzier boundaries.

3.4 Additional example of the use of contrast coefficients
Like Escudero and Boersma (2004), Morrison (submitted) investigated L1-Spanish L2-English listeners’ perception of the English /i/-/ɪ/ contrast; however, in the latter study the dialect of English was General Canadian English and the study simultaneously assessed vowel and consonant perception. Listeners gave English /bit/, /bid/, /bɪt/, /bɪd/, /bt/, and /bd/ responses to stimuli from a resynthesised natural speech continuum in which the vowels varied orthogonally in spectral and duration properties. A diphone-biassed logistic regression model (see Nearey 1990, 1997) was fitted to each individual participant’s response data:
Segment bias coefficients:
α/i/, α/ɪ/, α//, α/t/, α/d/
Diphone bias coefficients:
α/it/, α/id/, α/ɪt/, α/ɪd/, α/t/, α/d/
Stimulus-tuned coefficients:
β/i/spec, β/ɪ/spec, β//spec, β/t/spec, β/d/spec, β/i/dur, β/ɪ/dur, β//dur, β/t/dur, β/d/dur
(4)
Participants were grouped via a hierarchical cluster analysis on the contrast coefficient values β(/i/-/ɪ/)spec, β(/i/-/ɪ/)dur, β(/d/-/t/)spec and β(/d/-/t/)dur, and on the basis of the crispness of their categorical boundaries the groups of L1-Spanish listeners were assigned to a modified version of Escudero’s (2000) hypothesised stages of development for L1-Spanish listeners learning the English /i/-/ɪ/ contrast:
Stage 0: no ability to distinguish the contrast
Stage ½: category-goodness assimilation to Spanish /i/
Stage 1: distinguished via duration cues
Stage 2: distinguished via a mixture of duration and spectral cues
Stage 3: native-English-like perception, distinguished primarily on the basis of spectral cues
The values of individual participants’ contrast coefficients and their assignments to stages of development are plotted in Figure 8. The hypothesised progression along the stages of development is represented by the arrow. The contiguity of the hypothesised stages along the arrow is a necessary condition for them to represent a developmental sequence.
Figure 8: Contrast coefficients from logistic regression models fitted to individual participant data from Morrison (submitted). Arrow joins contiguous groups of L1-Spanish listeners and represents a hypothesised developmental path.
4. Conclusion
This chapter introduced logistic regression analysis as applied to the type of categorical response data typically collected in speech perception experiments in which listeners are asked to identify synthetic stimuli in terms of speech-sound categories. Comparison of the goodness-of-fit of different logistic regression models was demonstrated as a means of determining which acoustic cues listeners used when identifying stimuli. This chapter also demonstrated the use of logistic regression coefficients to describe listeners’ perceptual use of acoustic cues. Logistic regression coefficients were used to produce detailed graphical representations of listeners’ use of perceptual cues. They provided a metric of intercategory boundary orientation and crispness. They were also used as statistics in secondary analyses which tested the differences in perception between L1 and L2 groups. Given that synthetic-stimuli category-identification experiments are common in L2 speech perception research, there is great potential for the application of logistic regression analysis to this field of research. I hope that this chapter has helped readers not previously familiar with the technique to gain a basic understanding of applied logistic regression analysis.
References
Álvarez González, Juan Antonio. 1980. Vocalismo español y vocalismo inglés. [Spanish and English Vowels]. PhD diss., Universidad Complutense de Madrid.
Benkí, José R. 2001. “Place of articulation and first formant transition pattern both affect perception of voicing in English”. Journal of Phonetics 29. 1-22.
Boersma, Paul & Paola Escudero. 2005. “Measuring relative cue weighting: A reply to Morrison”. Studies in Second Language Acquisition 27. 607-617.
Breier, Joshua I., Lincoln Gray, Jack M. Fletcher, Randy L. Diehl, Patricia Klaas, Barbara R. Foorman & Michelle R. Molis. 2001. “Perception of voice and tone onset time continua in children with dyslexia with and without attention deficit/hyperactivity disorder”. Journal of Experimental Child Psychology 80. 245-270.
Escudero, Paola. 2000. Developmental patterns in the adult L2 acquisition of new contrasts: The acoustic cue weighting in the perception of Scottish tense/lax vowels by Spanish speakers. MA thesis, University of Edinburgh.
---------- & Paul Boersma. 2004. “Bridging the gap between L2 speech perception research and phonological theory”. Studies in Second Language Acquisition 26. 551-585.
Haberman, Shelby J. 1979. Analysis of Qualitative Data. Vol. 2. New York: Academic.
Hosmer, David W. & Stanley Lemeshow. 2000. Applied Logistic Regression. (2nd ed.). New York: Wiley.
Jong, Kenneth J. de, Byung-jin Lim & Kyoko Nagao. 2004. “The perception of syllable affiliation of singleton stops in repetitive speech”. Language and Speech 47:3. 241-266.
McCullagh, Peter & John A. Nelder. 1983. Generalized Linear Models. London: Chapman and Hall.
Maddox, W. Todd, Michelle R. Molis & Randy L. Diehl. 2002. “Generalizing a neuropsychological model of visual categorization to auditory categorization of vowels”. Perception & Psychophysics 64:4. 584-597.
Menard, Scott. 2002. Applied Logistic Regression Analysis. Thousand Oaks, Calif.: Sage.
Morrison, Geoffrey Stewart. 2005. “An appropriate metric for cue weighting in L2 speech perception: Response to Escudero & Boersma (2004)”. Studies in Second Language Acquisition 27. 597-606.
----------. 2006. L1 & L2 production and perception of English and Spanish vowels: A statistical modelling approach. PhD diss., University of Alberta.
----------. Submitted. L1-Spanish speakers’ acquisition of the English /i/-/ɪ/ contrast: Hypothesised developmental stages.
Nearey, Terrance M. 1990. “The segment as a unit of speech perception”. Journal of Phonetics 18. 347-373.
----------. 1997. “Speech perception as pattern recognition”. Journal of the Acoustical Society of America 101:6. 3241-3254.
Pampel, Fred C. 2000. Logistic Regression: A primer. Thousand Oaks, Calif.: Sage.
Quené, Hugo & Huub van den Bergh. 2004. “On multi-level modelling of data from repeated measures designs: A tutorial”. Speech Communication 43. 103-121.
Rosen, Stuart & Eva Manganari. 2001. “Is there a relationship between speech and non-speech auditory processing in children with dyslexia?” Journal of Speech, Language, and Hearing Research 44. 720-736.
RHYTHMIC TYPOLOGY AND VARIATION IN FIRST AND SECOND LANGUAGES * LAURENCE WHITE & SVEN L. MATTYS Department of Experimental Psychology, University of Bristol
Abstract This paper explores the concept of linguistic rhythm classes through a series of studies exploiting metrics designed to quantify speech rhythm. We compared the rhythm of ‘syllable-timed’ French and Spanish with that of ‘stress-timed’ Dutch and English, finding that rate-normalised metrics of vocalic interval variability (VarcoV and nPVI-V), together with a measure of the balance of vocalic and intervocalic intervals (%V), were the most discriminant between the two rhythm groups. The same metrics were also informative about the adaptation of speakers to rhythmically-similar (Dutch and English) or rhythmically-distinct (Spanish and English) second languages, and showed evidence of rhythmic gradience within accents of British English. Patterns of scores in all studies support the notion that rhythmic typology is not strictly categorical. A perceptual study found VarcoV to be the strongest predictor of the rating of a second language speaker’s accent as native or non-native.
* This research was supported by a grant from the Biotechnology and Biological Sciences Research Council (BBSRC) to Sven Mattys (7/S18783). We thank Sarah Davies, Casimier Ludwig and Ineke Mennen for help with stimulus preparation and Elizabeth Johnson, Klaske van Leyden, Reinier Salverda, Astrid Schepman, Mike Sharwood Smith, Juan Manuel Toro, Isabelle Viaud-Delmon, Atie Vogelenzang de Jong, Eric-Jan Wagenmakers and Rod Walters for help with recordings. Thanks are also due to James Melhorn for assisting in stimulus preparation for the perceptual study reported here, and for running that experiment. The crosslinguistic production study reported here was first presented at the Phonetics and Phonology in Iberia conference 2005. We thank the organisers of this conference, the conference delegates for some very interesting discussions, and Klaske van Leyden, Rod Walters and three anonymous reviewers for very useful comments on an earlier draft of this paper.

1. Introduction
1.1 Speech rhythm and rhythm classes
It has long been asserted that languages fall into distinct rhythm classes (e.g. Pike 1945). Within the languages of Western Europe, Romance languages, such as Spanish, are described as ‘syllable-timed’ and Germanic languages, such as English, are described as ‘stress-timed’. Initial attempts to quantify this distinction appealed to the notion of isochronous units in speech timing: the syllable in syllable-timed languages and the stress-delimited foot in stress-timed languages (e.g. Abercrombie 1967:96-98). Dauer (1983) and others showed, however, that syllable duration varies substantially in syllable-timed languages and, conversely, that inter-stress intervals are highly variable in stress-timed languages (see Ramus, Nespor & Mehler 1999, for a review).
Despite the lack of evidence for isochrony-based rhythm classes, there remains the perception that the contrast between stressed and unstressed syllables is greater in, for example, English than Spanish, at least to native English listeners. Lloyd James (1940) expressed this as the distinction between ‘Morse code’ English rhythm and ‘machine gun’ Spanish rhythm. In line with this distinction, Dauer and others (Roach 1982; Dasher & Bolinger 1982) suggested that differences between rhythm classes emerge from contrasts in syllable structure. The phonotactic rules of stress-timed languages typically allow greater complexity in syllable onsets and codas, so that a syllable like strands, with three onset and three coda segments, is permissible in English but phonotactically illegal in Spanish or French, the latter two having a much higher preponderance of open syllables such as simple CV syllables. In addition, stress-timed languages have stressed vowels that are substantially longer than (typically reduced) unstressed vowels, whereas syllable-timed languages have much less contrast in vowel duration between stressed and unstressed syllables.

1.2 Rhythm metrics
A number of rhythm metrics have recently been proposed to exploit these differences in syllable structure and vowel duration. Ramus et al. (1999) suggested indices of rhythm based on a division of the speech signal into vocalic and consonantal intervals, specifically:
ΔV    Standard deviation of vocalic interval duration
ΔC    Standard deviation of consonantal interval duration
%V    Percentage of utterance duration that is made up of vocalic rather than consonantal intervals
They showed that combinations of these ‘interval measures’ successfully captured the stress-timed vs syllable-timed distinction for a range of languages in a speech-rate controlled corpus. Ramus (2002) suggested that speech rate normalisation might be necessary when applying interval measures to corpora with variable speech rate. Barry, Andreeva, Russo, Dimitrova and Kostadinova (2003) provided evidence supporting normalisation, showing that both ΔV and ΔC are inversely related to speech rate. In contrast, Dellwo and Wagner (2003) found little evidence of a consistent relationship between %V and speech rate, suggesting normalisation may not be necessary for this metric. Dellwo (2006) exploited a rate-controlled metric, VarcoC (the standard deviation of consonantal interval duration divided by the mean consonantal interval duration), and found that it was better than ΔC at all speech rates for discriminating stress-timed English and German from syllable-timed French.
Taking a parallel approach, again based on a division of speech into vocalic and consonantal intervals, Low, Grabe and Nolan (2000) argued that it is the sequential nature of rhythm that is critical. They proposed a rate-normalised pairwise variability index (PVI) to exploit specifically the durational contrast between successive vocalic intervals, derived by dividing the difference between each pair of successive vocalic intervals by the mean of the two intervals. The normalised PVI for vocalic intervals, nPVI-V, is calculated thus (where m is the number of intervals and d_k is the duration of the kth interval):

(1a) $\mathrm{nPVI} = 100 \times \left( \sum_{k=1}^{m-1} \left| \frac{d_k - d_{k+1}}{(d_k + d_{k+1})/2} \right| \right) / (m - 1)$
Grabe and Low (2002) further proposed a non-rate normalised PVI measure for consonantal intervals, suggesting that normalisation could mask rhythmically-relevant variation in onset and coda structure. The raw PVI for consonantal intervals, rPVI-C, is calculated thus:

(1b) $\mathrm{rPVI} = \left( \sum_{k=1}^{m-1} \left| d_k - d_{k+1} \right| \right) / (m - 1)$
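Formulas (1a) and (1b) translate directly into code. The following Python sketch is offered only as an illustration of the two indices; the function names `npvi` and `rpvi` and the example durations are hypothetical, and the sketch assumes a list of strictly positive interval durations in utterance order.

```python
def npvi(durations):
    """Normalised pairwise variability index, following formula (1a)."""
    m = len(durations)
    if m < 2:
        raise ValueError("at least two intervals are needed")
    # Each successive pair contributes |difference| / mean of the pair.
    total = sum(
        abs((a - b) / ((a + b) / 2))
        for a, b in zip(durations, durations[1:])
    )
    return 100 * total / (m - 1)


def rpvi(durations):
    """Raw pairwise variability index, following formula (1b)."""
    m = len(durations)
    if m < 2:
        raise ValueError("at least two intervals are needed")
    # Mean absolute difference between successive intervals, unnormalised.
    total = sum(abs(a - b) for a, b in zip(durations, durations[1:]))
    return total / (m - 1)


# Hypothetical interval durations in milliseconds (not data from the study).
intervals_ms = [100, 50, 100, 50]
print(npvi(intervals_ms), rpvi(intervals_ms))
```

Multiplying every duration by a constant, as a uniform change in speech rate would, leaves the nPVI unchanged but scales the rPVI by that constant, which is the sense in which only the former is rate-normalised.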
Utilising both of these metrics, Grabe and Low examined a range of languages and found evidence for stress-timed and syllable-timed groups, as well as rhythmically-intermediate languages, such as Polish and Catalan.
Few direct comparisons have been made between interval measures and pairwise variability indices. We report here on studies in which we attempted to evaluate these metrics. Firstly, we looked for evidence of the traditional distinction between stress-timed and syllable-timed languages, and also we examined the influence of first language (L1) on second language (L2) rhythm. Secondly, we sought evidence for rhythmic distinctions between accents of British English. Finally, we considered how well rhythm metrics predict the ratings of speakers of English as native or non-native.

2. Production studies of speech rhythm
Here we present results from two studies designed to evaluate the power of the various rhythm metrics. The first study, reported in detail in White and Mattys (in press), examined speech rhythm in first and second languages. The study was designed to test how well different metrics supported the distinction between stress-timed (English and Dutch) and syllable-timed (Spanish and French) languages. In addition, the effect of L1 on L2 rhythm was considered, by analysing the L2 rhythm of speakers with first and second languages in different rhythm classes (English and Spanish). We hypothesised that, where L2 speakers have a clearly non-native accent, rhythm metric scores in L2s should reflect the rhythmic properties of both the L1 and the L2. There has been a limited amount of previous research on the influence of L1 on L2 rhythm metric scores, such as a study by Carter (2005) of American Hispanic bilinguals. He found that these speakers had nPVI-V scores which were intermediate between the higher scores for L1 English and the lower scores for L1 Spanish. We predicted that the most useful rhythm metrics should show this pattern in our study, and similar intermediacy of scores for native English speakers of L2 Spanish. We also looked at the L2 rhythm of speakers with first and second languages in the same rhythm class (English and Dutch), with the expectation that there should be little difference in rhythm scores between L1 and L2 speakers in this case.
The second study, reported in detail in White and Mattys (in preparation), examined evidence for rhythmic contrasts between different accents of British English. Accents such as Bristolian English have been held to manifest less widespread vowel reduction and less contrast between the length of tense and lax vowels (Hughes & Trudgill 1996). Accents with pitch-peak delay, such as Welsh Valleys English, may show levelling of the duration contrast between stressed and post-stress syllables. This may arise either from relative shortening of the stressed syllable, or at least the vocalic part of the stressed syllable, or from relative lengthening of the post-stress syllable (see Walters 2003, for Welsh Valleys English). This durational levelling may underpin the perception of Welsh Valleys English as being more syllable-timed than Standard Southern British English (Mees & Collins 1999). The native dialects of speakers from the Orkney Islands also show pitch-peak delay (van Leyden 2004), and this may also be manifest in Orcadians’ production of Standard English.
For comparison with Orkney English, we also analysed the rhythm of English as spoken by natives of Shetland, which, despite being geographically proximate to Orkney and sharing a Scandinavian substrate for its indigenous dialect, lacks the distinctive pitch-peak delay of Orcadian (van Leyden 2004). Orkney and Shetland speech have some phonological features in common that distinguish them from SSBE and the other accents in this study. In particular, both show the operation of the Scottish Vowel Length Rule (SVLR), whereby most vowels are short, but are lengthened in certain contexts, such as before voiced fricatives in stressed syllables (e.g. Aitken 1981). The consequences for rhythm metric scores of the SVLR have not been examined: given the existence of phonological contexts in which certain vowels may be substantially lengthened, it seems likely that SVLR speakers will, other things being equal, show more durational variability between vowels than, for example, Spanish speakers.

2.1 Method
2.1.1 Participants: Cross-linguistic study. There were eight groups of speakers and six speakers in each group. Four groups were composed of L1 speakers, all speaking near-standard European varieties of their native languages: Standard Southern British English – EngEng; Dutch (Algemeen Nederlands) – DutDut; Spanish (castellano) – SpSp; French (français neutre) – FrFr.
The other four groups were composed of L2 speakers: English speakers of Dutch – DutEng; Dutch speakers of English – EngDut; English speakers of Spanish – SpEng; Spanish speakers of English – EngSp. Because our hypotheses regarding expected rhythm scores rely on L2 speakers manifesting some degree of non-native accent, we only used L2 speakers who sounded non-native and had not learnt their L2 in early childhood. L2 speakers had a minimum residency of five months in the country of their second language and had to have at least reasonable competence in the L2, as evidenced by their ability to describe a route around a map with minimal preparation, as well as to read sentences and a short story intelligibly and without widespread hesitations or restarts.
2.1.2 Participants: British accents study. There were five groups of speakers and six speakers in each group. The Standard Southern British English (SSBE) speakers were the same as those in the cross-linguistic study, with the same utterance tokens analysed for both studies. In addition, there were four groups of speakers of regional accents of English, from Bristol, the Welsh Valleys, Orkney and Shetland. It should be noted that both Orkney and Shetland dialects are significantly different in syntax and lexis from Standard English or Standard Scots. Speakers reading Standard English sentences, as in this study, are therefore speaking what could be regarded as a second language, albeit an early acquired one, and may not necessarily manifest all the prosodic features of their native dialect.
2.1.3 Materials. All sentence materials for the analyses reported here are listed in full in White and Mattys (in press). The five English sentences were adapted from a larger set created by Nazzi, Bertoncini and Mehler (1998) and used in Ramus et al.’s (1999) investigation of rhythm metrics.
The adaptations were designed to exclude the approximants /j/, /w/, /r/ and /l/, to facilitate measurement of speech interval duration, given that boundaries between vowels and approximants can be difficult to identify reliably from visual analysis of waveforms and spectrograms. Speakers from Bristol, Orkney and Shetland may, however, realise orthographic post-vocalic “r” as an approximant. Where this occurred, the approximant was included within the vocalic interval: the impact of this procedure on rhythm scores is discussed below. The sentences for the other languages were constructed along similar lines, with the same set of approximants excluded, although other allophonic approximants were not systematically excluded. Excluding the approximant [] from all the French sentences proved problematic, so it was taken as part of the vocalic interval in the words biscuits “biscuits”, mois “month” and nuit “night”, as it could not be separated from the vowel with adequate consistency. Given the cross-linguistic variation in syllable complexity, we attempted to match sentences for overall duration by constructing sentences with slightly more syllables for Spanish and French than for English and Dutch.
2.1.4 Procedure. Each recorded sentence was labelled into vocalic intervals and consonantal intervals through visual analysis of the waveform and spectrogram in Praat (www.praat.org), based on standard criteria (e.g. Peterson & Lehiste 1960; see White & Mattys in press, for a description of the specific measurement criteria applied). The duration of these intervals was derived from the labelled speech files using a Praat script. Mid-utterance pauses were excluded from the analysis, as were utterance-initial consonants, and glottalised sections between vowels. The rhythm metrics calculated for each utterance from the vocalic and consonantal interval durations were as follows.
ΔV        Standard deviation of vocalic interval duration
ΔC        Standard deviation of consonantal interval duration
%V        Sum of vocalic interval duration divided by the total duration of vocalic and consonantal intervals
VarcoV    Standard deviation of vocalic interval duration divided by mean vocalic interval duration, multiplied by 100
VarcoC    Standard deviation of consonantal interval duration divided by mean consonantal interval duration, multiplied by 100
nPVI-V    Normalised Pairwise Variability Index for vocalic intervals (see formula 1a)
rPVI-C    Raw Pairwise Variability Index for consonantal intervals (see formula 1b)
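The interval measures defined above can be sketched in Python as follows. This is an illustrative sketch only, not the Praat script used in the study: the function name `interval_measures` and the example durations are hypothetical, the population standard deviation (`statistics.pstdev`) is an assumption because the definitions do not say which estimator was used, and %V is expressed as a percentage to match the scale of the reported scores.

```python
from statistics import mean, pstdev


def interval_measures(vocalic, consonantal):
    """Return interval-based rhythm metrics for one utterance.

    `vocalic` and `consonantal` are lists of interval durations in the
    same unit (e.g. milliseconds).
    """
    delta_v = pstdev(vocalic)       # assumption: population SD
    delta_c = pstdev(consonantal)
    total = sum(vocalic) + sum(consonantal)
    return {
        "deltaV": delta_v,
        "deltaC": delta_c,
        "percentV": 100 * sum(vocalic) / total,   # %V as a percentage
        "VarcoV": 100 * delta_v / mean(vocalic),
        "VarcoC": 100 * delta_c / mean(consonantal),
    }


# Hypothetical durations in ms; the second call doubles every interval,
# mimicking the same utterance produced at half the speech rate.
fast = interval_measures([80, 120], [100, 100])
slow = interval_measures([160, 240], [200, 200])
print(fast["deltaV"], slow["deltaV"])   # raw SD grows with slower speech
print(fast["VarcoV"], slow["VarcoV"])   # rate-normalised value is unchanged
```

The comparison of the two calls shows why ΔV and ΔC are sensitive to speech rate while VarcoV, VarcoC and %V are not affected by a uniform change in tempo.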
All pairwise comparisons reported below between rhythm metric scores are two-tailed Tukey HSD.

2.2 Results and discussion
2.2.1 First and second language rhythm. First-language scores for all the rhythm metrics are shown in Table 1. As outlined below, the only metrics that consistently discriminated stress-timed English and Dutch from syllable-timed Spanish and French were %V, VarcoV and nPVI-V.

                                SpSp        FrFr        EngEng      DutDut
Interval measures
  ΔV                            32 (1.9)    44 (2.2)    49 (2.2)    49 (2.6)
  ΔC                            40 (2.3)    51 (3.6)    59 (2.4)    49 (4.1)
  %V                            48 (0.8)    45 (0.5)    38 (0.5)    41 (1.2)
  VarcoV                        41 (2.0)    50 (0.9)    64 (1.7)    65 (1.5)
  VarcoC                        46 (2.0)    44 (0.8)    47 (1.0)    44 (1.8)
Pairwise variability indices
  nPVI-V                        36 (1.6)    50 (1.8)    73 (1.2)    82 (2.4)
  rPVI-C                        43 (2.1)    56 (4.3)    70 (2.8)    52 (4.2)
Table 1: Rhythm metric scores (standard errors) for first language speakers.
Spanish had significantly lower ΔV scores than all other groups [vs EngEng: p<0.001; vs DutDut: p<0.001; vs FrFr: p<0.005]. Spanish ΔC scores were significantly lower than those for English [p<0.005]. There were no other significant differences between languages for either ΔV or ΔC scores. Thus, both metrics suggested that Spanish belongs to a distinct rhythm class, but French appeared to be rhythmically more similar to Dutch and English. One likely reason for this is the greater speech rate for Spanish than French (8.0 syls/s vs 5.6 syls/s): as discussed above, both ΔV and ΔC scores are inversely correlated with speech rate. Indeed, as seen below, the rate-normalised vocalic interval metric, VarcoV, was more successful in distinguishing between traditional rhythm classes.
Both Spanish and French %V scores were higher than those for English and Dutch [SpSp vs EngEng: p<0.001; SpSp vs DutDut: p<0.001; FrFr vs EngEng: p<0.001; FrFr vs DutDut: p<0.05]. Stress-timed languages, given their widespread vowel reduction in unstressed syllables and their higher occurrence of onset and coda consonant clusters, had lower %V than syllable-timed languages. However, there was also suggestive evidence of differences within rhythm classes [EngEng vs DutDut: p = 0.062; SpSp vs FrFr: p = 0.085]. If the difference between Dutch and English is reliable, this could relate to the reported less widespread occurrence of vowel reduction in Dutch (Swan & Smith 1987).
English and Dutch VarcoV scores were significantly higher than those for Spanish and French [p<0.001 for all four comparisons]. French also had significantly higher VarcoV than Spanish [p<0.005]. For VarcoC, however, rate normalisation appeared to eliminate all distinctions between languages, with no significant differences in scores.
Both English and Dutch had higher nPVI-V scores than Spanish and French [p<0.001 for all four comparisons]. In addition, Dutch had a higher nPVI-V score than English [p<0.01] and French had a higher score than Spanish [p<0.001].
The patterns for VarcoV and nPVI-V were similar to each other and in line with expectations based on the greater durational difference between stressed and unstressed vowels in stress-timed languages. As with %V, differences within rhythm classes were also evident, suggesting that this classification is not straightforwardly categorical. English had significantly higher rPVI-C scores than all other languages [vs DutDut: p<0.01; vs FrFr: p<0.05; vs SpSp: p<0.001] and the difference between French and Spanish approached significance [p = 0.078]. As with ΔC, scores did not strongly support the traditional rhythmic distinctions between these languages, perhaps due, once again, to the lack of rate normalisation for these metrics of consonantal interval variation and the consequent influence of speech rate on scores. Given their power to discriminate expected rhythm classes, reporting of subsequent results will focus on the metrics %V, VarcoV and nPVI-V. (Scores for all metrics in this study are given in White & Mattys in press.) Results for comparisons of L1 and L2 rhythm are shown in Table 2 (there was no L2 analysis for French).
                    English                             Dutch                   Spanish
                    EngEng      EngDut      EngSp       DutDut      DutEng      SpSp        SpEng
Interval measures
  %V                38 (0.5)    40 (0.4)    41 (0.9)    41 (1.2)    38 (1.6)    48 (0.8)    52 (0.8)
  VarcoV            64 (1.7)    61 (2.7)    54 (3.2)    65 (1.5)    65 (1.7)    41 (2.0)    52 (1.3)
Pairwise variability indices
  nPVI-V            73 (1.2)    70 (1.6)    66 (4.3)    82 (2.4)    75 (1.6)    36 (1.6)    51 (2.4)
Table 2: Rhythm metric scores (standard errors) for L1 and L2 speakers.
We consider first the case of languages from the same rhythm class. There was no significant difference in VarcoV between EngEng and EngDut and no difference between DutDut and DutEng. The same was true for nPVI-V. The lack of differentiation between first and second languages by the measures of vocalic interval variability reinforces the idea that Dutch and English are rhythmically similar. There was no significant difference in %V scores between EngEng and EngDut, but %V for DutDut was higher than for DutEng, the difference approaching significance [p = 0.093], and the %V scores for DutEng being the same as for EngEng. This trend was the only distinction between L1 and L2 within the same rhythm class. The fact that English speakers of L2 Dutch have %V scores so similar to those of L1 English speakers, and likewise Dutch speakers of L2 English to those of L1 Dutch speakers, suggests that L2 speakers either do not perceive subtle rhythmic distinctions, such as that between Dutch and English, or do not realise them because communication is not compromised by ignoring them.
All three metrics showed discrimination between L1 and L2 when the two languages belonged to different rhythm classes. EngEng had lower %V scores than EngSp [approaching significance: p = 0.083]. SpSp had significantly lower %V scores than SpEng [p<0.05], a surprising result given that L1 Spanish had higher %V than L1 English. EngEng had significantly higher VarcoV scores than EngSp [p<0.05] and SpSp had significantly lower scores than SpEng [p<0.05]. For nPVI-V, there was no significant difference between EngEng and EngSp, but scores were significantly lower for SpSp than for SpEng [p<0.005]. Comparing metrics of vocalic interval variability, VarcoV appeared slightly more successful in capturing the differences between L1 and L2 rhythm than nPVI-V.
The latter metric showed no difference between English L1 and Spanish L2 speakers of English; in contrast, both L1 vs L2 English and L1 vs L2 Spanish were distinguished by VarcoV. As discussed further below, there are several segmental and suprasegmental processes which contribute to patterns of vowel and consonant duration. Thus, the working assumption of this study is that L2 speakers, where they have clear non-native accents, should be distinguished from L1 speakers in terms of rhythm scores, which exploit these patterns of vowel and consonant duration. The marginal preference for VarcoV over nPVI-V stems from this working assumption. Figure 1 shows the VarcoV scores for all L1 and L2 speakers plotted against the %V scores.
Figure 1: VarcoV and %V scores and standard error bars for all first and second language groups. Eng – English; Dut – Dutch; Sp – Spanish; Fr – French.
For VarcoV, L2 speakers had rhythm scores intermediate between scores for their L1 and those for native speakers of the L2. VarcoV scores suggest that Spanish speakers of English accommodate towards the shorter unstressed vowels of their L2, and may also produce longer stressed vowels than in their L1, but do not make the distinction between stressed and unstressed vowels as great as do native English speakers. English speakers of Spanish appear to produce unstressed vowels that are longer than those of English but not as long as those of native Spanish speakers. Given this general intermediacy of L2 rhythm, the pattern for %V in L1 and L2 Spanish is rather surprising: as shown in Figure 1, English speakers of L2 Spanish actually had %V scores higher than those of L1 Spanish speakers. There are a number of segmental and suprasegmental differences between Spanish and English that could account for this pattern. At the segmental level, English speakers of Spanish may produce vowels with generally greater duration than Spanish speakers, particularly where the closest English vowel is a diphthong: for example, English speakers may realise Spanish [e] as the diphthong [eɪ], or at least retain its greater duration. At the suprasegmental level, English may have more marked prosodic lengthening. For example, Ortega-Llebaria and Prieto (this volume) reported little evidence of accentual lengthening in Castilian Spanish, in contrast with American or British English (Turk & Sawusch 1997; Turk & White 1999). Likewise, perceived phrase-final lengthening is less widespread in Castilian Spanish than some other Romance languages (Frota, D’Imperio, Elordieta, Prieto & Vigário, this volume), whereas phrase-final lengthening is very well attested in varieties of English (e.g. Wightman, Shattuck-Hufnagel, Ostendorf & Price 1992).
If English speakers retain their native prosodic lengthening patterns in L2 Spanish, these processes may conspire to increase the vocalic proportion of the total
LAURENCE WHITE & SVEN L. MATTYS
utterance: the preponderance of open syllables in Spanish means, for example, that final lengthening is likely to affect vowels more than consonants.
The overall pattern that emerges from this study of first and second language rhythm is that certain rhythm metrics, particularly %V and VarcoV, and to a slightly lesser extent nPVI-V, capture cross-linguistic differences in vowel duration and syllable onset and coda phonotactics. It is these durational and phonotactic contrasts that conspire to make stressed syllables relatively strong in English and Dutch, and make the relative strengths of stressed and unstressed syllables less different in Spanish and French. Clearly, given the range of contributing factors, the relative strength of stressed and unstressed syllables is gradiently variable, so, although there is evidence here for stress-timed vs syllable-timed language grouping, this typology is highly unlikely to be categorical.
2.2.2 Accents of British English. We report the results for the three rhythm metrics that were found to be most useful for discriminating stress-timed from syllable-timed languages, namely, %V, VarcoV and nPVI-V. (Scores for accents of British English for the other metrics discussed above are reported in White & Mattys in preparation.) Table 3 shows the scores for these metrics for the five accents of British English studied.

                                SSBE         Shetland    Orkney      Welsh       Bristol
                                (= EngEng)                           Valleys
Interval measures
  %V                            38 (0.5)     39 (0.6)    40 (1.0)    42 (0.7)    41 (0.5)
  VarcoV                        64 (1.7)     59 (1.2)    53 (2.0)    53 (1.8)    57 (2.2)
Pairwise variability indices
  nPVI-V                        73 (1.2)     77 (2.6)    70 (1.9)    66 (2.9)    70 (2.1)

Table 3: Means (standard errors) of rhythm metrics for SSBE (Standard Southern British English), Shetland, Orkney, Welsh Valleys and Bristol English.
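The three metrics reported in Table 3 can be illustrated with a minimal sketch. It follows the standard definitions of these measures: %V as the vocalic share of total duration, VarcoV as the rate-normalised standard deviation of vocalic intervals, and nPVI-V as the mean normalised difference between successive vocalic intervals. The function names, the use of the sample standard deviation and the duration values are assumptions made for illustration and are not taken from the study.

```python
from statistics import mean, stdev


def percent_v(vocalic, consonantal):
    """%V: percentage of total utterance duration that is vocalic."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def varco(intervals):
    """Varco: standard deviation of interval durations divided by their mean, x 100.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return 100.0 * stdev(intervals) / mean(intervals)


def npvi(intervals):
    """nPVI: mean absolute difference between successive intervals,
    each difference divided by the mean of the pair, x 100."""
    terms = [abs(a - b) / ((a + b) / 2.0)
             for a, b in zip(intervals, intervals[1:])]
    return 100.0 * mean(terms)


# Hypothetical durations in milliseconds, invented for illustration only.
alternating = [100, 50, 100, 50]  # long and short vowels alternate
grouped = [100, 100, 50, 50]      # the same vowels in a different order
consonants = [80, 80, 80, 80]

print(percent_v(alternating, consonants), percent_v(grouped, consonants))
print(varco(alternating), varco(grouped))
print(npvi(alternating), npvi(grouped))
```

In this toy case the two orderings give identical %V and VarcoV values, because both are global measures over the whole utterance, whereas nPVI changes with the order because it compares neighbouring intervals only.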
SSBE had significantly lower %V scores than Bristol and Welsh Valleys accents [p<0.005 for both comparisons], and the difference between SSBE and Orkney approached significance [p = 0.059]. Shetland also had significantly lower %V scores than Bristol and Orkney accents [p<0.05 for both comparisons]. No other differences in %V scores between accents were significant. A similar pattern of discrimination was observed for VarcoV: scores for SSBE were significantly greater than those for all other accents [vs Bristol: p<0.05; vs Welsh Valleys: p = 0.001; vs Orkney: p = 0.001; vs Shetland: p<0.05]. In addition, VarcoV scores for Shetland were significantly higher than those for Welsh Valleys and Orcadian [p<0.05 for both comparisons]. No other differences in VarcoV scores between accents were significant. Thus, both %V and VarcoV suggest a rhythmic accent grouping of Bristolian, Welsh Valleys and Orcadian, with a lesser degree of stress-timing and more evidence of syllable-timing than SSBE. This grouping appears to
accord well with the segmental and/or suprasegmental properties of these accents which reduce the vowel duration contrast between stressed and unstressed syllables. The %V scores of Bristolian, Orcadian and Shetland may have been somewhat elevated due to the inclusion of post-vocalic /r/ in vocalic intervals, so some caution must be exercised in interpreting these results. The fact that Orkney had higher %V than Shetland, however, indicates that the treatment of rhoticity is not the sole cause of the apparently greater syllable-timing in the former, given that both Orkney and Shetland speakers manifest rhoticity.
In contrast, the scores for nPVI-V were less supportive of this distinction. Although the mean scores were lower for Bristol, Welsh Valleys and Orkney than for SSBE, only the comparison between SSBE and Welsh Valleys approached significance [p = 0.077]. Shetland had the highest nPVI-V scores, significantly greater than those for Welsh Valleys [p<0.05], with the difference approaching significance for the comparison with Orkney English [p = 0.064].
2.2.3 Discussion: Evaluation of rhythm metrics. Thus, the results of the cross-linguistic study and the study of British accents converge on the conclusion that %V and VarcoV offer the best discrimination between accent and language groups that are usually held to differ rhythmically. These distinctions are illustrated for British accents in Figure 2, where it can be seen that Bristol, Orkney and Welsh Valleys English have %V and VarcoV scores that are intermediate between those of SSBE and Castilian Spanish.
Figure 2: VarcoV and %V scores and standard error bars for British accent groups. SSBE – standard Southern British English; Sh – Shetland; Br – Bristol; Or – Orkney; WV – Welsh Valleys. Scores for L1 Spanish are shown for comparison.
The scores for nPVI-V also offer some support for these distinctions, but in both studies expected differences were not found and some differences emerged which had not been predicted. In particular, nPVI-V did not discriminate between native English and L2 English spoken by Spanish speakers. It is also worth noting that Dutch and English had rather different, though not significantly different, nPVI-V scores, with Dutch appearing more stress-timed, a distinction not apparently motivated by what is known about the languages and not reflected in the rhythmic differences between L1 and L2 speakers in the English/Dutch comparison. Finally, nPVI-V did not discriminate between the rhythm of SSBE and that of Bristol and Orkney accents of English, a distinction that is consistent with the processes affecting stressed vs unstressed vowel duration and is supported by the scores for %V and VarcoV. It may be that nPVI-V, based as it is on syntagmatic comparisons of vowel duration, is actually too sensitive to the characteristics of individual utterances, with the cruder global measures %V and VarcoV better able to capture broad rhythmic trends.
3. Rhythm metrics and the perception of native and non-native accent
Of all the rhythm metrics assessed in the production studies described above, %V and VarcoV best capture expected differences in speech rhythm between languages and language varieties. There is a strong case for arguing, however, that linguistic rhythm is primarily a perceptual phenomenon. Research is clearly required on how these rhythm metrics relate to an individual’s perception of speech and of the differences between languages, language varieties and individual speakers.
Studying rhythmic perception in isolation is not straightforward. Segmental information can be removed from speech by low-pass filtering and fundamental frequency contours can be flattened to remove distinctive intonation patterns, but such an approach may leave insufficient information for listeners to make perceptual judgments (e.g. van Leyden 2004). Other researchers have used resynthesis to generate segmentally and intonationally monotonous speech whilst preserving the durational characteristics of the original signal (e.g. Ramus, Dupoux & Mehler 2003).
As a first pass at establishing perceptual correlates of rhythm metrics, we here report a simple accent judgment experiment, in which native English participants rated accents of English as sounding more or less native or non-native. Clearly, with unprocessed speech, there will be many segmental and suprasegmental indicators of a speaker’s linguistic origin available, some of which will have no bearing on rhythm. Given, however, that a range of linguistic processes influence the duration of segments, in particular, the relative duration of stressed and unstressed vowels, we predict that the most effective rhythm metrics should capture something of the variability that leads to perceptions of speech as native and non-native.
3.1 Method
The methodology for the perception experiment was that utilised in a review study of native accent assessment (Piske, MacKay & Flege 2001). Participants were given a nine-point scale of accent nativeness/non-nativeness and told to rate each of a series of auditorily-presented utterances according to this scale. The only difference in our methodology was that we counterbalanced the polarity of the scale: half of the participants were told that a rating of 1 should indicate ‘no foreign accent’ and a rating of 9 should indicate ‘strong foreign accent’; for the other participants, the ratings scale was reversed, with 1 indicating ‘strong foreign accent’ and 9 indicating ‘no foreign accent’. This was done to control for potential response bias in the use of the scale. In calculating the mean overall ratings, the ratings were inverted for the second set of participants, so that lower ratings consistently meant a more native-like accent.
3.1.1 Participants. Participants were twelve native speakers of English with no self-reported hearing or speaking problems. They were paid a small honorarium or received course credit for their participation.
3.1.2 Materials. Three groups of speakers were used from the cross-linguistic study described above: native SSBE speakers, native Dutch speakers and native Spanish speakers, with three female and three male speakers in each group. Each speaker read each of the five English experimental sentences. Thus there were a total of ninety experimental utterances; another ninety similar utterances were also rated.
3.1.3 Procedure. Participants were seated in front of a computer monitor and keyboard, and were presented with the utterances over headphones. A nine-point scale was marked on the keyboard and participants were told to rate each utterance according to its degree of foreign accent, as described above.
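The inversion of ratings for the reversed-scale participants, and the count of experimental utterances, can be sketched as follows. This is only an illustration of the arithmetic described above; the function and constant names are hypothetical and do not come from the study.

```python
SCALE_MIN, SCALE_MAX = 1, 9  # the nine-point rating scale described in the text


def to_common_polarity(rating, reversed_scale):
    """Map a raw rating onto the polarity where 1 = 'no foreign accent'.

    reversed_scale is True for participants who were told that
    1 = 'strong foreign accent' and 9 = 'no foreign accent'.
    """
    if not SCALE_MIN <= rating <= SCALE_MAX:
        raise ValueError("rating must lie between 1 and 9")
    if reversed_scale:
        return SCALE_MIN + SCALE_MAX - rating  # 9 -> 1, 5 -> 5, 1 -> 9
    return rating


# Three speaker groups, six speakers per group, five sentences per speaker.
n_experimental_utterances = 3 * 6 * 5

print(to_common_polarity(9, reversed_scale=True))
print(n_experimental_utterances)
```

After this mapping, a lower value means a more native-like accent for every participant, whichever version of the scale they used.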
After a short practice block using a sample of the utterances, participants were played all of the 180 utterances three times, in three separately randomised blocks.
3.2 Results
Mean accent ratings were: for SSBE speakers, 1.4; for Dutch English speakers, 3.6; for Spanish English speakers, 6.7. A by-Subjects repeated measures ANOVA showed a main effect of accent group on ratings [F(2,22) = 445.83, p<0.001]. Mean ratings for all accent groups differed from each other at the p<0.001 level. Figure 3 shows the mean ratings for each accent group broken down by speakers. ANOVAs showed main effects of speaker on ratings for all accent groups [SSBE: F(5,55) = 2.88, p<0.05; Dutch English: F(5,55) = 49.41, p<0.001; Spanish English: F(5,55) = 38.66, p<0.001]. Thus, for all groups,
accent rating varied between speakers. As Figure 3 indicates, this variation was much less for SSBE speakers than for the other groups. In line with the primary distinction between stress-timed and syllable-timed languages, Dutch speakers were rated as being more native-like in their production of English than Spanish speakers. As can be seen from Figure 3, two Dutch speakers were rated as much more non-native than the other four, indicating that, of course, factors other than rhythm influence accent perception.
[Figure 3 plot. Vertical axis: mean accent rating (1 to 9). Speakers grouped as SSBE, Dutch English and Spanish English.]
Figure 3: Mean accent ratings by speakers in each accent group.
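The degrees of freedom of the by-subjects ANOVAs reported above follow directly from the design: twelve participants, three accent groups, and six speakers per group. The sketch below shows this bookkeeping together with a textbook one-way repeated-measures F ratio. It is a minimal illustration with invented toy data and hypothetical function names, not a reconstruction of the authors' analysis.

```python
def rm_anova_df(n_levels, n_subjects):
    """Degrees of freedom (effect, error) for a one-way repeated-measures ANOVA."""
    return n_levels - 1, (n_levels - 1) * (n_subjects - 1)


def rm_anova_f(scores):
    """F ratio for a one-way repeated-measures design.

    scores[i][j] is the mean rating given by subject i in condition j.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    # The error term is what remains once condition and subject effects are removed.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = rm_anova_df(k, n)
    return (ss_cond / df_cond) / (ss_error / df_error)


print(rm_anova_df(3, 12))  # accent group effect: 3 groups, 12 raters
print(rm_anova_df(6, 12))  # speaker effect within a group: 6 speakers, 12 raters

# Toy data (3 subjects x 2 conditions), invented for illustration only.
toy = [[1, 2], [2, 4], [3, 6]]
print(rm_anova_f(toy))
```

With three accent groups and twelve raters the function returns (2, 22), and with six speakers it returns (5, 55), which matches the F(2,22) and F(5,55) statistics given in the text.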
Table 4 shows a correlation matrix between accent ratings, rhythm metrics and speech rate. There are a number of trends to note in the pairwise correlations. Firstly, there are expected correlations between measures of vocalic interval variability (ΔV, VarcoV, nPVI-V) and between measures of consonantal interval variability (ΔC, rPVI-C). Secondly, the non-rate-normalised measures (ΔV, ΔC, rPVI-C) show evidence of inverse correlations with speech rate, as was discussed in the introduction. Thirdly, only the three metrics that emerged from the production studies as most effectively discriminant between rhythmic groups show strong significant positive (%V) or negative (VarcoV, nPVI-V) correlations with accent ratings. As these measures are also correlated with each other, further analysis is reported below to assess which metrics best predict accent rating. Finally, speech rate and accent rating are also inversely correlated, as would be expected, given that fluent speakers should tend to speak more quickly than less fluent speakers.
The mean scores for the seven rhythm metrics and speech rate were used as predictor variables for accent ratings in a stepwise linear regression, the results of which are shown in Table 5. VarcoV was found to be the best single predictor of accent rating (r² = 0.541). A model incorporating both VarcoV and speech rate accounted for a greater proportion of the accent ratings (r² = 0.669)
than VarcoV alone, but no additional rhythm metrics made significant further contributions to predicting accent ratings.
               Accent rating   ΔV         ΔC         %V         VarcoV     VarcoC     nPVI-V     rPVI-C
ΔV            −0.127
ΔC             0.019           0.601**
%V             0.647**         0.000     −0.151
VarcoV        −0.735**         0.556**    0.183     −0.603**
VarcoC        −0.264           0.170      0.664**   −0.177      0.316
nPVI-V        −0.561*          0.661**    0.472*    −0.539*     0.863**    0.463†
rPVI-C        −0.123           0.495*     0.953**   −0.227      0.193      0.698**    0.451†
Speech rate   −0.486*         −0.685**   −0.576*    −0.338      0.178      0.136     −0.095     −0.437†

Table 4: Pairwise Pearson correlations between accent ratings, rhythm metrics and speech rate. Two-tailed significance levels: †: 0.10>p≥0.05; *: p<0.05; **: p<0.01.

                 r        r²       Adjusted r²    β          t
Model 1          0.735    0.541    0.512
  VarcoV                                          −0.735     −4.34**
Model 2          0.818    0.669    0.625
  VarcoV                                          −0.671     −4.44**
  Speech rate                                     −0.363     −2.41*

Table 5: Stepwise linear regression with accent rating as dependent variable and rhythm metrics (ΔV, ΔC, %V, nPVI-V, rPVI-C, VarcoV, VarcoC) and speech rate as potential predictor variables. Significance levels: *: p<0.05; **: p<0.01.
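The figures in Tables 4 and 5 can be cross-checked against each other using standard regression identities. The sketch below assumes that the regression was run on one mean score per speaker, giving n = 18 (three groups of six speakers); the text implies this but does not state the sample size outright. The function names are hypothetical, and the small discrepancies from the published values are what one would expect from rounding of the input correlations.

```python
from math import sqrt

N_SPEAKERS = 18  # assumption: 3 groups x 6 speakers, one mean score per speaker


def adjusted_r2(r2, n, n_predictors):
    """Adjusted r-squared for a model with n observations and n_predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def t_single_predictor(r, n):
    """t statistic for the slope of a one-predictor regression with correlation r."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


def two_predictor_model(r_y1, r_y2, r_12):
    """Standardised betas and r-squared for two predictors, from correlations only."""
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r2 = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r2


# Correlations taken from Table 4.
r_rating_varcov = -0.735
r_rating_rate = -0.486
r_varcov_rate = 0.178

print(adjusted_r2(0.541, N_SPEAKERS, 1))   # compare with 0.512 in Table 5
print(adjusted_r2(0.669, N_SPEAKERS, 2))   # compare with 0.625 in Table 5
print(t_single_predictor(r_rating_varcov, N_SPEAKERS))  # compare with -4.34
print(two_predictor_model(r_rating_varcov, r_rating_rate, r_varcov_rate))
```

Under the n = 18 assumption, the adjusted r² values and the Model 1 t statistic agree with Table 5 to within rounding, and the two-predictor identities give betas and an r² close to the reported −0.671, −0.363 and 0.669.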
To look at the contribution of VarcoV further, we examined the partial correlation of VarcoV with accent rating, taking speech rate into account, separately for the three native speaker groups: English, Dutch and Spanish. Given the small number of different speakers, we used the scores for each of the five utterances by each of the six speakers (rather than the speaker means), giving a total of thirty scores for each language group. The partial correlation between VarcoV and accent rating was not significant for English and Dutch native speakers [English: r = −0.003, p = 0.998; Dutch: r = −0.147, p = 0.446], but it was significant for Spanish speakers [r = −0.440, p<0.05]. Figure 4, showing a scatter-plot and a partial regression line, gives an indication of this relationship, demonstrating that utterances by Spanish speakers are likely to be rated as more non-native if VarcoV scores are lower.
3.3 Discussion
The results of this simple perceptual experiment offer support for the conclusions drawn from both of the production studies discussed above
regarding the value of the VarcoV metric as a measure of speech rhythm. VarcoV was found overall to be the best predictor of accent ratings, even when the contribution of speech rate, clearly an important difference between native and non-native speakers, was also considered.
[Figure 4 plot. Horizontal axis: VarcoV (20 to 80). Vertical axis: accent rating residuals, controlling for speech rate (−2.5 to 2.5).]
Figure 4: Relationship between accent ratings and VarcoV, showing partial regression line, for Spanish speakers of L2 English.
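The partial correlation used in this analysis, the correlation between VarcoV and accent rating once speech rate has been taken into account, can be sketched with the standard first-order formula. The function names below are hypothetical. The worked value uses the across-speaker correlations from Table 4 purely as an illustration; it is not one of the per-group results reported in the text, which were computed on utterance-level scores that are not available here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_from_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


def partial_corr(x, y, z):
    """Partial correlation of x and y given z, computed from raw scores."""
    return partial_from_r(pearson(x, y), pearson(x, z), pearson(y, z))


# Illustration with the across-speaker correlations in Table 4:
# x = VarcoV, y = accent rating, z = speech rate.
print(partial_from_r(-0.735, 0.178, -0.486))
```

Because VarcoV is only weakly correlated with speech rate in Table 4, controlling for rate changes the across-speaker relationship between VarcoV and accent rating very little in this illustration.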
The power of VarcoV as a predictor of accent rating was clearly demonstrated for the utterances produced by native Spanish speakers, though not for the other two groups. The correlation between accent rating and VarcoV is readily interpretable in terms of the accommodation required for a native Spanish speaker to produce L2 English: speakers who produce short, reduced vowels in unstressed syllables in L2 English speech will, other things being equal, have higher VarcoV scores than speakers who consistently produce long, full unstressed vowels in L2 English. The former group, with higher VarcoV scores, are likely to be rated as more English-like in their speech production. We would also predict that English speakers who have lower VarcoV scores in L2 Spanish should be rated as more native-like, reflecting their production of relatively long, full-vowel unstressed syllables and stressed syllables that are not greatly longer. The failure to find a similar relationship between VarcoV and accent rating for the other two language groups is not surprising. For native English speakers, although there was evidence of slight variation between speakers, accent ratings were essentially at floor level, with a mean (1.35) only just above the lowest possible rating (1). For Dutch speakers, the fact that rhythm may play a lesser role in accent perception is indicated by the presence of an essentially bimodal distribution of accent ratings (see Figure 3): thus, two speakers had much higher ratings of non-nativeness than the other four, although their scores across the range of rhythm metrics were not markedly
different. As indicated in the cross-linguistic production study reported above, the rhythmic differences between Dutch and English are slight and, to the extent that they exist at all, are indexed by %V rather than VarcoV. Thus segmental and prosodic processes not impinging on linguistic rhythm seem likely to account for variations in the perception of the nativeness of Dutch speakers of English.
4. General discussion
4.1 Rhythm metrics
We have reported on three studies designed to assess the discriminative performance of a range of speech rhythm metrics. The metrics %V, VarcoV and nPVI-V most clearly discriminated stress-timed English and Dutch from syllable-timed Spanish and French. Of these measures, nPVI-V did not discriminate quite as effectively as %V or VarcoV in the comparison of first and second language rhythm, finding no rhythmic difference between native English speakers and Spanish speakers of L2 English, despite the perceptible non-native accent of the latter group.
We also looked for evidence of rhythmic differences between accents of British English. Once again, %V and VarcoV were the most discriminant metrics, suggesting that there may be significant variability in rhythm even within a canonically stress-timed language like English. This variability is likely to arise, at least in part, from segmental and suprasegmental processes that affect the durational balance of stressed and post-stress syllables.
Finally, we reported a perceptual experiment which tested the power of rhythm metrics to predict ratings of English speech as native or non-native. VarcoV proved the most effective metric for this task, indexing variability in vowel duration, a factor that should be a key indicator of native or non-native accent. Scores for %V and nPVI-V also showed correlations with accent ratings, but their correlations with VarcoV (negative and positive respectively) meant that they did not emerge as independently reliable predictors of rating.
4.2 Rhythmic typology and variation
The evidence from the production study of first and second languages reported above, and from studies such as Grabe and Low (2002), suggests that stress-timing vs syllable-timing is a gradient distinction between languages. The study of English accents reported here also showed support for postulated rhythmic variation between varieties of British English. Clearly, rhythmic variation may also be found between varieties of other languages. It has been held, for example, that southern varieties of Italian are relatively more stress-timed than the syllable-timed northern varieties (e.g. Grice, D’Imperio, Savino & Avesani 2005), although few quantitative data appear to be available. Also, Latin American Spanish subjectively conveys the impression of conferring greater salience on at least some stressed syllables than Castilian Spanish. Rhythmic metrics suggest a method of quantifying these apparent variations within Romance languages.
Drawing together different strands of work on speech timing provides suggestive evidence that gradient rhythmic distinctions may be paralleled in differences in prosodic timing processes. Stress-timed languages such as English use strong durational cues to indicate syllable stress, and also indicate prosodic structure with localised lengthening effects (e.g. Wightman et al. 1992). As we have seen, the durational difference between stressed and unstressed syllables is less marked in Spanish. Recent studies suggest that timing may also have a lesser role in the indication of prosodic structure in Spanish. Ortega-Llebaria and Prieto (this volume) reported that pitch accent in Castilian Spanish was not consistently marked by lengthening within the accented word, in contrast with English (e.g. Turk & White 1999). Similarly, Frota et al. (this volume) reported that perceived final lengthening was not a reliable feature of Castilian Spanish prosodic phrasing.
Any link between rhythm typology and prosodic timing processes is unlikely to be categorical. Indeed, the evidence from Frota et al.’s study is that phrase-final lengthening is widespread in Italian, but this could reflect the particular variety of Italian analysed. Their study analysed Neapolitan Italian, held to be more stress-timed than northern varieties. The possibility that varieties of a single language could show covariance in their degree of stress-timing and in their durational marking of prosodic boundaries is intriguing. More empirical work is required to settle this question, but it does suggest a promising direction for research utilising rhythm metrics.
4.3 Rhythm perception
It is clear that further perceptual studies are necessary to determine what rhythm metrics such as %V and VarcoV tell us about the experience of linguistic rhythm for the listener. Firstly, the role of rhythm in linguistic discrimination should be explored further. Using speech resynthesis techniques designed to eliminate cues other than rhythm, Ramus et al. (2003) showed that language classes suggested by rhythm metric scores corresponded to listeners’ perceptual groupings. Similar techniques could be applied to assessing the extent of rhythmic variation between accents of the same language.
The possibility of gradience in the role of rhythm in the perception of linguistic juncture should also be explored. It has been shown that the statistical predominance of word-initial stress in Germanic languages is used by listeners in the identification of word boundaries (e.g. Cutler & Norris 1988, for English; Vroomen & de Gelder 1997, for Dutch), at least when other, more reliable cues are not available (Mattys, White & Melhorn 2005). In contrast, Romance languages tend to have stressed syllables in penultimate or word-final position. The consequence of this latter arrangement for segmentation has barely been tested, however. The results of this and other recent research raise the possibility that rhythmic gradience within languages (specifically, variation in the relative strength of stressed and unstressed syllables) may be paralleled by gradience in listeners’ exploitation of stress-based segmentation
strategies. The rhythm metrics positively evaluated here, in particular %V and VarcoV, provide tools that will facilitate future research in such directions.
References
Abercrombie, David. 1967. Elements of General Phonetics. Edinburgh: Edinburgh University Press.
Aitken, Adam J. 1981. “The Scottish vowel length rule”. So meny People, Longages and Tonges: Philological Essays in Scots and Mediaeval English presented to Angus McIntosh ed. by Michael Benskin & Michael L. Samuels. 131-157. Edinburgh: The Middle English Dialect Project.
Barry, William J., Bistra Andreeva, Michela Russo, Snezhina Dimitrova & Tania Kostadinova. 2003. “Do rhythm measures tell us anything about language type?” Proceedings of the 15th International Congress of Phonetic Sciences ed. by Maria-Josep Solé, Daniel Recasens & Joaquín Romero. 2693-2696. Barcelona: Causal Productions.
Carter, Phillip M. 2005. “Quantifying rhythmic differences between Spanish, English, and Hispanic English”. Theoretical and Experimental Approaches to Romance Linguistics: Selected Papers from the 34th Linguistic Symposium on Romance Languages ed. by Randall S. Gess & Edward J. Rubin. 63-75. Amsterdam & Philadelphia: John Benjamins.
Cutler, Anne & Dennis G. Norris. 1988. “The role of stressed syllables in segmentation for lexical access”. Journal of Experimental Psychology: Human Perception and Performance 14. 113-121.
Dasher, Richard & Dwight L. Bolinger. 1982. “On pre-accentual lengthening”. Journal of the International Phonetic Association 12. 58-69.
Dauer, Rebecca M. 1983. “Stress-timing and syllable-timing reanalyzed”. Journal of Phonetics 11. 51-62.
Dellwo, Volker. 2006. “Rhythm and speech rate: A variation coefficient for deltaC”. Language and Language Processing: Proceedings of the 38th Linguistic Colloquium ed. by Pawel Karnowski & Imre Szigeti. 231-241. Frankfurt: Peter Lang.
---------- & Petra Wagner. 2003. “Relations between language rhythm and speech rate”. Proceedings of the 15th International Congress of Phonetic Sciences ed. by Maria-Josep Solé, Daniel Recasens & Joaquín Romero. 471-474. Barcelona: Causal Productions.
Grabe, Esther & Ee Ling Low. 2002. “Durational variability in speech and the rhythm class hypothesis”. Papers in Laboratory Phonology 7 ed. by Natasha Warner & Carlos Gussenhoven. 515-546. Berlin: Mouton de Gruyter.
Grice, Martine, Mariapaola D’Imperio, Michelina Savino & Cinzia Avesani. 2005. “A strategy for intonation labelling varieties of Italian”. Prosodic Typology: The Phonology of Intonation and Phrasing ed. by Sun-Ah Jun. 362-389. Oxford: Oxford University Press.
Hughes, Arthur & Peter Trudgill. 1996. English Accents and Dialects. London: Arnold.
Leyden, Klaske van. 2004. Prosodic characteristics of Orkney and Shetland dialects. PhD diss., Leiden University. Utrecht: LOT Dissertation Series 92.
Lloyd James, Arthur. 1940. Speech Signals in Telephony. London: Pitman.
Low, Ee Ling, Esther Grabe & Francis Nolan. 2000. “Quantitative characterisations of speech rhythm: ‘Syllable-timing’ in Singapore English”. Language and Speech 43. 377-401.
Mattys, Sven L., Laurence White & James F. Melhorn. 2005. “Integration of multiple speech segmentation cues: A hierarchical framework”. Journal of Experimental Psychology: General 134. 477-500.
Mees, Inger M. & Beverley Collins. 1999. “Cardiff: A real-time study of glottalization”. Urban Voices: Accent Studies in the British Isles ed. by Paul Foulkes & Gerard Docherty. 185-202. London: Arnold.
Nazzi, Thierry, Josiane Bertoncini & Jacques Mehler. 1998. “Language discrimination by newborns: Towards an understanding of the role of rhythm”. Journal of Experimental Psychology: Human Perception and Performance 24. 756-766.
Peterson, Gordon E. & Ilse Lehiste. 1960. “Duration of syllable nuclei in English”. Journal of the Acoustical Society of America 32. 693-703.
Pike, Kenneth. 1945. The Intonation of American English. Ann Arbor: University of Michigan Press.
Piske, Thorsten, Ian R.A. MacKay & James E. Flege. 2001. “Factors affecting degree of foreign accent in an L2: A review”. Journal of Phonetics 29. 191-215.
Ramus, Franck. 2002. “Acoustic correlates of linguistic rhythm: Perspectives”. Proceedings of Speech Prosody ed. by Bernard Bel & Isabel Marlien. 115-120. Aix-en-Provence: Laboratoire Parole et Langage.
----------, Marina Nespor & Jacques Mehler. 1999. “Correlates of linguistic rhythm in the speech signal”. Cognition 73. 265-292.
----------, Emmanuel Dupoux & Jacques Mehler. 2003. “The psychological reality of rhythm classes: Perceptual studies”. Proceedings of the 15th International Congress of Phonetic Sciences ed. by Maria-Josep Solé, Daniel Recasens & Joaquín Romero. 337-342. Barcelona: Causal Productions.
Roach, Peter. 1982. “On the distinction between ‘stress-timed’ and ‘syllable-timed’ languages”. Linguistic Controversies ed. by David Crystal. 73-79. London: Edward Arnold.
Swan, Michael & Bernard Smith. 1987. Learner English. Cambridge: Cambridge University Press.
Turk, Alice E. & James R. Sawusch. 1997. “The domain of accentual lengthening in American English”. Journal of Phonetics 25. 25-41.
---------- & Laurence White. 1999. “Structural influences on accentual lengthening in English”. Journal of Phonetics 27. 171-206.
Vroomen, Jean & Beatrice de Gelder. 1997. “Activation of embedded words in spoken word recognition”. Journal of Experimental Psychology: Human Perception and Performance 23. 710-720.
Walters, J. Robert. 2003. “On the intonation of a South Wales ‘Valleys accent’ of English”. Journal of the International Phonetic Association 33:2. 211-238.
Wightman, Colin W., Stefanie Shattuck-Hufnagel, Mari Ostendorf & Patti J. Price. 1992. “Segmental durations in the vicinity of prosodic phrase boundaries”. Journal of the Acoustical Society of America 91. 1707-1717.
White, Laurence & Sven L. Mattys. In press. “Calibrating rhythm: First language and second language studies”. Journal of Phonetics.
INDEX OF TERMS AND CONCEPTS
Distributional information 185, 200-203, 212. Downstep 140, 142; see also upstep. Duration 69, 71, 73-75, 78, 112-113, 115, 117, 135, 155, 157, 160-166, 169, 171-174, 204, 209, 230-231, 233, 238-243, 245, 248, 254. Vowel duration 68, 71, 73, 115, 162, 164, 238, 244, 246-248, 253. Consonant duration 7, 11-13, 17, 41, 43-44, 52-57, 59-62, 71, 80, 244. E. Elision 48; see also deletion. Enchaînement, see liaison. Enclitic phrases 88. Epenthesis 4, 111-114, 116, 118-125. Cluster-medial epenthesis 110-111, 119. Epenthetic stop 47-48, 50, 54-56, 61. Epenthetic vowel 50, 54-55, 109-113, 115, 118-119, 121-123. Epithet 86-87, 89-93, 96. Exemplar-based Phonology 3-4, 17, 19-21. Extra-sentential element 85-86, 102. F. F0, see pitch. Familiarization-preference procedure 210; see also Head-Turn Preference Procedure. Feature 10, 18, 41-46, 52, 61-62, 69, 78, 113, 119, 140, 150, 182-183, 189, 194, 202, 241, 254. Combination of features 41-44, 47-49, 61-62. Feature compatibility 45. Feature co-occurrence restriction 42, 61-62. Feature stability 41-43, 46, 61-62. Incompatibility of features 42, 45. Interaction of features 42-44, 60. Phonological feature 41, 111, 125, 142, 240. Flap, see tap. Floating segment 4, 6, 8, 17. Frication 41, 44-50, 52, 56, 59-62, 71, 118-119. Fricative 15, 17, 27, 32, 34, 36, 41, 43-62, 109, 112-115, 117-122, 124-125, 195, 240.
Fricative duration 41, 44, 52-54, 56-57, 59-62. G. Gesture 30-32, 34, 42, 44, 47, 49-50, 53, 56, 71, 78-79, 121, 131, 137, 140, 150. Gestural reorganization 79. Gestural score 78, 80. Velic gesture 48, 52, 54-56, 58, 60. Oral gesture 48, 52, 54-56, 58, 60. Glide formation 48. Gliding, see glide formation. Glottis 50, 71, 78. Goodness-of-fit 220-225, 234. H. Head-Turn Preference Procedure (HPP) 202, 204, 206, 210. Historical change 48, 61; see also sound change. I. Iambic foot 200, 202, 204-211, 213; see also trochaic foot. Intensity 46, 49-50, 60, 62, 93, 156-157, 160-161, 168-170, 173-174, 204, 209. Overall intensity 157, 160-162, 166-172, 174. Intonation 85-86, 88-89, 92, 96, 98, 102, 132, 134, 150, 156, 158-159, 161-170, 209, 248. Intonational phrasing, see phrasing. Variation in intonation 85-87, 89, 91, 94, 100, 102. L. Labial 27, 37-38, 128-129, 182-183, 188. Language acquisition, see acquisition. Lateral 27, 34, 49, 59, 61, 109-110, 113-117, 119-121, 124-125; see also liquid. Lengthening 7, 47, 69-70, 73, 80, 162, 164, 172-173, 240, 245-246, 254. Compensatory lengthening 113-114, 117. Preboundary lengthening 90-91, 98, 134-135, 137. Lexical frequency 6-7, 43, 62. Lexical stress 162, 203, 205-206, 213. Lexical stress pattern 203-205, 208, 212; see also stress pattern preference. Liaison 3-21. Liquid 45, 109-111, 113, 115-116, 121, 124. Lombard effect 97.
Logistic regression 219-231, 233-235. Logit 223, 226-227, 229-230. Loudness 156, 160, 170, 174; see also intensity. M. Missing-Letter Effect (MLE) 19-20. Model fitting 219-221, 224. Multinomial/polytomous data 222-224. N. Nasal 15, 17, 32, 41-56, 59-62, 114, 181. Nasalization 41, 43-47, 50, 59-62. Non-restrictive relative 85-91, 93-94. Nuclear contour 132, 138-139. O. Optimality Theory 113. P. Parentheses 86-87, 90-91, 93, 96. Parenthetic utterances 85, 88, 157-162, 164-165, 167-169, 171-172. Perception 5, 8, 204, 208, 219-222, 224-226, 228, 230, 232-233, 235, 240, 248-250, 252-254. Rhythm perception 254. Speech perception 5, 135, 170, 174-175, 206, 219-220, 234-235. Phoneme detection 3, 5-6, 11, 13-15, 17-20. Phonetic correlate 68-70, 73-75, 80, 111-112, 119, 121, 124-125; see also correlates of stress, correlates of accent. Phonological pattern 42, 62. Phonological process 42, 48, 61, 122, 179, 181, 186, 193-194. Phonological representation 180, 184-185, 194. Phonotactics 44, 61, 200-201, 206, 212-213, 238, 246. Phrase boundary 135. Phrase length 133, 142-149. Phrasing 85-86, 88-91, 94-95, 98-99, 101-102, 131-134, 137, 149-150, 254; see also phrase boundary. Pitch 90-93, 97-98, 100-103, 155-157, 160-162, 168, 174, 204, 209, 240; see also pitch accent. Pitch reset 134-135, 137. Pitch span 85, 91. Sustained pitch 134-137, 139, 142, 150.
Pitch accent 90, 95-97, 100-103, 138, 140, 142-143, 148-149, 150, 156-158, 160-162, 166, 168, 171-172, 174, 254. Prenuclear pitch accent 160. Place of articulation 182. Polar coordinate 230-232. Post-nasal voicing 44. Probability 7, 21, 62, 220, 227, 229-230. Probability surface plots 227-228, 232. Prosodic boundary cue 131-133; see also boundary tone, phrase boundary. Q. Quantal theory 42-43. Quotation 87, 90-91. R. Rhotacism 48, 247. Rhotic 50, 109-125. Rhythm 11, 103, 200-203, 205, 209, 237-241, 243-248, 250, 252-254. Rhythm metrics 200, 238-242, 246-248, 250-251, 253-255. Rhythm perception 254. Rhythmic typology 237, 253-254. Rhythmic variation 253-254. S. Schwa 26-27, 36, 50, 122. Second language 159, 219-220, 230, 232-233, 235, 237, 241, 244-245, 253. Second language rhythm 239-240, 242, 253. Segment 11-12, 19, 25-26, 35-36, 41-45, 47-50, 53, 55, 59-62, 109, 113-115, 118, 120, 122, 123, 125, 179, 193, 233, 238, 248. Segment loss 47-49, 51, 60-62, 205; see also elision. Segment weakening 47-51, 60-62, 69, 78. Segmentation 5, 68, 71-72, 201-203, 206, 212, 254-255. Metrical Segmentation Strategy (MSS) 201. Word segmentation 199-203, 206, 212-213. Sentential adverb 85, 87-88, 90-91, 94, 96, 102. Sequential restriction Simplification 109-117, 119, 121, 124-125; see also elision.
Sound change 26, 36, 42-44, 48-49, 51, 55, 61-62, 67, 70, 78, 80; see also historical change. Spectral balance 157, 160, 170, 172, 174; see also spectral tilt. Spectral tilt 155-157, 160-162, 169, 171-172, 174. Speech rate 41-42, 44, 52, 55-61, 74, 76-77, 121-122, 172, 238, 243, 250-252. Speech segmentation 199-201, 203, 207. Speech timing 237, 254. Spontaneous speech 67, 76-77, 79-80, 183, 195. Stop 36, 43, 45, 52-56, 61, 67-76, 78-80, 109-121, 123-125. Stop closure 69-71, 73-75, 78-80. Voiceless stop 67-69, 71, 78-80, 113-118, 121, 124-125. Stress 100-102, 122, 133, 155-158, 161-175, 184-185, 194-195, 200-207, 209-213. Correlates of stress 155-157, 165, 169, 172, 174, 209. Stress cue to segmentation 199-203, 206, 212-213. Stress-accent language 155, 170, 172. Stress discrimination Stress pattern preference 200, 203-208, 212. Stress-timed language 199-200, 202-204, 237-239, 242-243, 246, 248, 250, 253-254. Syllable 3-5, 10, 12, 37, 48, 67-69, 73, 78, 96, 98, 100-103, 121, 125, 133-135, 138, 143, 155-158, 160-174, 187-188, 190, 193-195, 200-202, 204-205, 237-238, 240-241, 243, 246-247, 252-255. Strong syllable 200-202, 204-206, 209. Syllable constituent Syllable weight 204, 213. Syllable-timed language 121, 199-200, 202, 204, 237, 239-240, 242-243, 246, 250, 253. Weak syllable 200. Syntagmatic aspect 61-62. T. Tap 29, 34, 45, 61, 111, 113-114, 118-121, 123-125, 174. Territorial map 227-228, 231. Timing 78, 109, 121, 125, 237, 246-247, 253-254; see also speech timing.
Timing of articulatory movement 47, 49, 52, 55-56, 60-61. Tonal reduplication 85-86, 91, 93, 103. Tonal scaling 78, 96-98, 101-102, 140-150. Trill 37; see also rhotic, rhotacism. Trochaic foot 199-202, 204-205, 207-213; see also iambic foot. U. Upstep 140, 142, 148; see also downstep. V. Velar 27, 38, 120. Velo-pharyngeal opening 46-47, 53. Vocative 85-87, 89-91, 93-95. Vocalization 48. Voicing assimilation 109, 112, 114, 116, 119, 125. Voice Onset Time (VOT) 71-73, 220. Vowel 3-4, 7-14, 17-19, 25-38, 42, 45-50, 53-55, 60, 68-70, 73-74, 79, 98, 113, 116-117, 119, 121, 138, 155, 157, 160-161, 163, 165-173, 179-191, 193-195, 219, 221-222, 224-228, 230, 233, 238, 240-243, 245-246, 248, 252. High vowel 69, 100. Low vowel 36, 68-70, 100. Vowel opening 68-70. Vowel quality 155, 157, 160-162, 165-166, 169, 171-174. Vowel reduction 155, 157, 166, 173-174, 179, 181, 183-186, 193-195, 240, 243. W. Word recognition 5, 8, 18, 204, 206.
CURRENT ISSUES IN LINGUISTIC THEORY
E. F. K. Koerner, Editor
Zentrum für Allgemeine Sprachwissenschaft, Typologie und Universalienforschung, Berlin Current Issues in Linguistic Theory (CILT) is a theory-oriented series which welcomes contributions from scholars who have significant proposals to make towards the advancement of our understanding of language, its structure, functioning and development. CILT has been established in order to provide a forum for the presentation and discussion of linguistic opinions of scholars who do not necessarily accept the prevailing mode of thought in linguistic science. It offers an outlet for meaningful contributions to the current linguistic debate, and furnishes the diversity of opinion which a healthy discipline must have. A complete list of titles in this series can be found on the publishers’ website, www.benjamins.com 284 Salmons, Joseph C. and Shannon Dubenion-Smith (eds.): Historical Linguistics 2005. Selected papers from the 17th International Conference on Historical Linguistics, Madison, Wisconsin, 31 July - 5 August 2005. Expected August 2007 283 Lenker, Ursula and Anneli Meurman-Solin (eds.): Connectives in the History of English. Expected August 2007 282 Prieto, Pilar, Joan Mascaró and Maria-Josep Solé (eds.): Segmental and prosodic issues in Romance phonology. 2007. xv, 262 pp. 281 Vermeerbergen, Myriam, Lorraine Leeson and Onno Crasborn (eds.): Simultaneity in Signed Languages. Form and function. 2007. viii, 356 pp. (incl. CD-Rom). 280 Hewson, John and Vit Bubenik: From Case to Adposition. The development of configurational syntax in Indo-European languages. 2006. xxx, 420 pp. 279 Nedergaard Thomsen, Ole (ed.): Competing Models of Linguistic Change. Evolution and beyond. 2006. vi, 344 pp. 278 Doetjes, Jenny and Paz González (eds.): Romance Languages and Linguistic Theory 2004. Selected papers from ‘Going Romance’, Leiden, 9–11 December 2004. 2006. viii, 320 pp. 277 Helasvuo, Marja-Liisa and Lyle Campbell (eds.): Grammar from the Human Perspective.
Case, space and person in Finnish. 2006. x, 280 pp. 276 Montreuil, Jean-Pierre Y. (ed.): New Perspectives on Romance Linguistics. Vol. II: Phonetics, Phonology and Dialectology. Selected papers from the 35th Linguistic Symposium on Romance Languages (LSRL), Austin, Texas, February 2005. 2006. x, 213 pp. 275 Nishida, Chiyo and Jean-Pierre Y. Montreuil (eds.): New Perspectives on Romance Linguistics. Vol. I: Morphology, Syntax, Semantics, and Pragmatics. Selected papers from the 35th Linguistic Symposium on Romance Languages (LSRL), Austin, Texas, February 2005. 2006. xiv, 288 pp. 274 Gess, Randall S. and Deborah Arteaga (eds.): Historical Romance Linguistics. Retrospective and perspectives. 2006. viii, 393 pp. 273 Filppula, Markku, Juhani Klemola, Marjatta Palander and Esa Penttilä (eds.): Dialects Across Borders. Selected papers from the 11th International Conference on Methods in Dialectology (Methods XI), Joensuu, August 2002. 2005. xii, 291 pp. 272 Gess, Randall S. and Edward J. Rubin (eds.): Theoretical and Experimental Approaches to Romance Linguistics. Selected papers from the 34th Linguistic Symposium on Romance Languages (LSRL), Salt Lake City, March 2004. 2005. viii, 367 pp. 271 Branner, David Prager (ed.): The Chinese Rime Tables. Linguistic philosophy and historical-comparative phonology. 2006. viii, 358 pp. 270 Geerts, Twan, Ivo van Ginneken and Haike Jacobs (eds.): Romance Languages and Linguistic Theory 2003. Selected papers from ‘Going Romance’ 2003, Nijmegen, 20–22 November. 2005. viii, 369 pp. 269 Hargus, Sharon and Keren Rice (eds.): Athabaskan Prosody. 2005. xii, 432 pp. 268 Cravens, Thomas D. (ed.): Variation and Reconstruction. 2006. viii, 223 pp. 267 Alhawary, Mohammad T. and Elabbas Benmamoun (eds.): Perspectives on Arabic Linguistics XVII–XVIII. Papers from the seventeenth and eighteenth annual symposia on Arabic linguistics. Volume XVII–XVIII: Alexandria, 2003 and Norman, Oklahoma 2004. 2005. xvi, 315 pp.
266 Boudelaa, Sami (ed.): Perspectives on Arabic Linguistics XVI. Papers from the sixteenth annual symposium on Arabic linguistics, Cambridge, March 2002. 2006. xii, 181 pp. 265 Cornips, Leonie and Karen P. Corrigan (eds.): Syntax and Variation. Reconciling the Biological and the Social. 2005. vi, 312 pp.
264 Dressler, Wolfgang U., Dieter Kastovsky, Oskar E. Pfeiffer and Franz Rainer (eds.): Morphology and its demarcations. Selected papers from the 11th Morphology meeting, Vienna, February 2004. With the assistance of Francesco Gardani and Markus A. Pöchtrager. 2005. xiv, 320 pp. 263 Branco, António, Tony McEnery and Ruslan Mitkov (eds.): Anaphora Processing. Linguistic, cognitive and computational modelling. 2005. x, 449 pp. 262 Vajda, Edward J. (ed.): Languages and Prehistory of Central Siberia. 2004. x, 275 pp. 261 Kay, Christian J. and Jeremy J. Smith (eds.): Categorization in the History of English. 2004. viii, 268 pp. 260 Nicolov, Nicolas, Kalina Bontcheva, Galia Angelova and Ruslan Mitkov (eds.): Recent Advances in Natural Language Processing III. Selected papers from RANLP 2003. 2004. xii, 402 pp. 259 Carr, Philip, Jacques Durand and Colin J. Ewen (eds.): Headhood, Elements, Specification and Contrastivity. Phonological papers in honour of John Anderson. 2005. xxviii, 405 pp. 258 Auger, Julie, J. Clancy Clements and Barbara Vance (eds.): Contemporary Approaches to Romance Linguistics. Selected Papers from the 33rd Linguistic Symposium on Romance Languages (LSRL), Bloomington, Indiana, April 2003. With the assistance of Rachel T. Anderson. 2004. viii, 404 pp. 257 Fortescue, Michael, Eva Skafte Jensen, Jens Erik Mogensen and Lene Schøsler (eds.): Historical Linguistics 2003. Selected papers from the 16th International Conference on Historical Linguistics, Copenhagen, 11–15 August 2003. 2005. x, 312 pp. 256 Bok-Bennema, Reineke, Bart Hollebrandse, Brigitte Kampers-Manhe and Petra Sleeman (eds.): Romance Languages and Linguistic Theory 2002. Selected papers from ‘Going Romance’, Groningen, 28–30 November 2002. 2004. viii, 273 pp. 255 Meulen, Alice ter and Werner Abraham (eds.): The Composition of Meaning. From lexeme to discourse. 2004. vi, 232 pp. 254 Baldi, Philip and Pietro U. Dini (eds.): Studies in Baltic and Indo-European Linguistics. 
In honor of William R. Schmalstieg. 2004. xlvi, 302 pp. 253 Caffarel, Alice, J.R. Martin and Christian M.I.M. Matthiessen (eds.): Language Typology. A functional perspective. 2004. xiv, 702 pp. 252 Kay, Christian J., Carole Hough and Irené Wotherspoon (eds.): New Perspectives on English Historical Linguistics. Selected papers from 12 ICEHL, Glasgow, 21–26 August 2002. Volume II: Lexis and Transmission. 2004. xii, 273 pp. 251 Kay, Christian J., Simon Horobin and Jeremy J. Smith (eds.): New Perspectives on English Historical Linguistics. Selected papers from 12 ICEHL, Glasgow, 21–26 August 2002. Volume I: Syntax and Morphology. 2004. x, 264 pp. 250 Jensen, John T.: Principles of Generative Phonology. An introduction. 2004. xii, 324 pp. 249 Bowern, Claire and Harold Koch (eds.): Australian Languages. Classification and the comparative method. 2004. xii, 377 pp. (incl. CD-Rom). 248 Weigand, Edda (ed.): Emotion in Dialogic Interaction. Advances in the complex. 2004. xii, 284 pp. 247 Parkinson, Dilworth B. and Samira Farwaneh (eds.): Perspectives on Arabic Linguistics XV. Papers from the Fifteenth Annual Symposium on Arabic Linguistics, Salt Lake City 2001. 2003. x, 214 pp. 246 Holisky, Dee Ann and Kevin Tuite (eds.): Current Trends in Caucasian, East European and Inner Asian Linguistics. Papers in honor of Howard I. Aronson. 2003. xxviii, 426 pp. 245 Quer, Josep, Jan Schroten, Mauro Scorretti, Petra Sleeman and Els Verheugd (eds.): Romance Languages and Linguistic Theory 2001. Selected papers from 'Going Romance', Amsterdam, 6–8 December 2001. 2003. viii, 355 pp. 244 Pérez-Leroux, Ana Teresa and Yves Roberge (eds.): Romance Linguistics. Theory and Acquisition. Selected papers from the 32nd Linguistic Symposium on Romance Languages (LSRL), Toronto, April 2002. 2003. viii, 388 pp. 243 Cuyckens, Hubert, Thomas Berg, René Dirven and Klaus-Uwe Panther (eds.): Motivation in Language. Studies in honor of Günter Radden. 2003. xxvi, 403 pp. 242 Seuren, Pieter A.M. 
and Gerard Kempen (eds.): Verb Constructions in German and Dutch. 2003. vi, 316 pp. 241 Lecarme, Jacqueline (ed.): Research in Afroasiatic Grammar II. Selected papers from the Fifth Conference on Afroasiatic Languages, Paris, 2000. 2003. viii, 550 pp. 240 Janse, Mark and Sijmen Tol (eds.): Language Death and Language Maintenance. Theoretical, practical and descriptive approaches. With the assistance of Vincent Hendriks. 2003. xviii, 244 pp. 239 Andersen, Henning (ed.): Language Contacts in Prehistory. Studies in Stratigraphy. Papers from the Workshop on Linguistic Stratigraphy and Prehistory at the Fifteenth International Conference on Historical Linguistics, Melbourne, 17 August 2001. 2003. viii, 292 pp.
238 Núñez-Cedeño, Rafael, Luis López and Richard Cameron (eds.): A Romance Perspective on Language Knowledge and Use. Selected papers from the 31st Linguistic Symposium on Romance Languages (LSRL), Chicago, 19–22 April 2001. 2003. xvi, 386 pp. 237 Blake, Barry J. and Kate Burridge (eds.): Historical Linguistics 2001. Selected papers from the 15th International Conference on Historical Linguistics, Melbourne, 13–17 August 2001. Editorial Assistant: Jo Taylor. 2003. x, 444 pp. 236 Simon-Vandenbergen, Anne-Marie, Miriam Taverniers and Louise J. Ravelli (eds.): Grammatical Metaphor. Views from systemic functional linguistics. 2003. vi, 453 pp. 235 Linn, Andrew R. and Nicola McLelland (eds.): Standardization. Studies from the Germanic languages. 2002. xii, 258 pp. 234 Weijer, Jeroen van de, Vincent J. van Heuven and Harry van der Hulst (eds.): The Phonological Spectrum. Volume II: Suprasegmental structure. 2003. x, 264 pp. 233 Weijer, Jeroen van de, Vincent J. van Heuven and Harry van der Hulst (eds.): The Phonological Spectrum. Volume I: Segmental structure. 2003. x, 308 pp. 232 Beyssade, Claire, Reineke Bok-Bennema, Frank Drijkoningen and Paola Monachesi (eds.): Romance Languages and Linguistic Theory 2000. Selected papers from ‘Going Romance’ 2000, Utrecht, 30 November–2 December. 2002. viii, 354 pp. 231 Cravens, Thomas D.: Comparative Historical Dialectology. Italo-Romance clues to Ibero-Romance sound change. 2002. xii, 163 pp. 230 Parkinson, Dilworth B. and Elabbas Benmamoun (eds.): Perspectives on Arabic Linguistics. Papers from the Annual Symposium on Arabic Linguistics. Volume XIII-XIV: Stanford, 1999 and Berkeley, California 2000. 2002. xiv, 250 pp. 229 Nevin, Bruce E. and Stephen B. Johnson (eds.): The Legacy of Zellig Harris. Language and information into the 21st century. Volume 2: Mathematics and computability of language. 2002. xx, 312 pp. 228 Nevin, Bruce E. (ed.): The Legacy of Zellig Harris. Language and information into the 21st century. 
Volume 1: Philosophy of science, syntax and semantics. 2002. xxxvi, 323 pp. 227 Fava, Elisabetta (ed.): Clinical Linguistics. Theory and applications in speech pathology and therapy. 2002. xxiv, 353 pp. 226 Levin, Saul: Semitic and Indo-European. Volume II: Comparative morphology, syntax and phonetics. 2002. xviii, 592 pp. 225 Shahin, Kimary N.: Postvelar Harmony. 2003. viii, 344 pp. 224 Fanego, Teresa, Belén Méndez-Naya and Elena Seoane (eds.): Sounds, Words, Texts and Change. Selected papers from 11 ICEHL, Santiago de Compostela, 7–11 September 2000. Volume 2. 2002. x, 310 pp. 223 Fanego, Teresa, Javier Pérez-Guerra and María José López-Couso (eds.): English Historical Syntax and Morphology. Selected papers from 11 ICEHL, Santiago de Compostela, 7–11 September 2000. Volume 1. 2002. x, 306 pp. 222 Herschensohn, Julia, Enrique Mallén and Karen Zagona (eds.): Features and Interfaces in Romance. Essays in honor of Heles Contreras. 2001. xiv, 302 pp. 221 D’hulst, Yves, Johan Rooryck and Jan Schroten (eds.): Romance Languages and Linguistic Theory 1999. Selected papers from ‘Going Romance’ 1999, Leiden, 9–11 December 1999. 2001. viii, 406 pp. 220 Satterfield, Teresa, Christina M. Tortora and Diana Cresti (eds.): Current Issues in Romance Languages. Selected papers from the 29th Linguistic Symposium on Romance Languages (LSRL), Ann Arbor, 8–11 April 1999. 2002. viii, 412 pp. 219 Andersen, Henning (ed.): Actualization. Linguistic Change in Progress. Papers from a workshop held at the 14th International Conference on Historical Linguistics, Vancouver, B.C., 14 August 1999. 2001. vii, 250 pp. 218 Bendjaballah, Sabrina, Wolfgang U. Dressler, Oskar E. Pfeiffer and Maria D. Voeikova (eds.): Morphology 2000. Selected papers from the 9th Morphology Meeting, Vienna, 24–28 February 2000. 2002. viii, 317 pp. 217 Wiltshire, Caroline R. and Joaquim Camps (eds.): Romance Phonology and Variation. 
Selected papers from the 30th Linguistic Symposium on Romance Languages, Gainesville, Florida, February 2000. 2002. xii, 238 pp. 216 Camps, Joaquim and Caroline R. Wiltshire (eds.): Romance Syntax, Semantics and L2 Acquisition. Selected papers from the 30th Linguistic Symposium on Romance Languages, Gainesville, Florida, February 2000. 2001. xii, 246 pp. 215 Brinton, Laurel J. (ed.): Historical Linguistics 1999. Selected papers from the 14th International Conference on Historical Linguistics, Vancouver, 9–13 August 1999. 2001. xii, 398 pp. 214 Weigand, Edda and Marcelo Dascal (eds.): Negotiation and Power in Dialogic Interaction. 2001. viii, 303 pp.
213 Sornicola, Rosanna, Erich Poppe and Ariel Shisha-Halevy (eds.): Stability, Variation and Change of Word-Order Patterns over Time. With the assistance of Paola Como. 2000. xxxii, 323 pp. 212 Repetti, Lori (ed.): Phonological Theory and the Dialects of Italy. 2000. x, 301 pp. 211 Elšík, Viktor and Yaron Matras (eds.): Grammatical Relations in Romani. The Noun Phrase. with a Foreword by Frans Plank (Universität Konstanz). 2000. x, 244 pp. 210 Dworkin, Steven N. and Dieter Wanner (eds.): New Approaches to Old Problems. Issues in Romance historical linguistics. 2000. xiv, 235 pp. 209 King, Ruth: The Lexical Basis of Grammatical Borrowing. A Prince Edward Island French case study. 2000. xvi, 241 pp. 208 Robinson, Orrin W.: Whose German? The ach/ich alternation and related phenomena in ‘standard’ and ‘colloquial’. 2001. xii, 178 pp. 207 Sanz, Montserrat: Events and Predication. A new approach to syntactic processing in English and Spanish. 2000. xiv, 219 pp. 206 Fawcett, Robin P.: A Theory of Syntax for Systemic Functional Linguistics. 2000. xxiv, 360 pp. 205 Dirven, René, Roslyn M. Frank and Cornelia Ilie (eds.): Language and Ideology. Volume 2: descriptive cognitive approaches. 2001. vi, 264 pp. 204 Dirven, René, Bruce Hawkins and Esra Sandikcioglu (eds.): Language and Ideology. Volume 1: theoretical cognitive approaches. 2001. vi, 301 pp. 203 Norrick, Neal R.: Conversational Narrative. Storytelling in everyday talk. 2000. xiv, 233 pp. 202 Lecarme, Jacqueline, Jean Lowenstamm and Ur Shlonsky (eds.): Research in Afroasiatic Grammar. Papers from the Third conference on Afroasiatic Languages, Sophia Antipolis, 1996. 2000. vi, 386 pp. 201 Dressler, Wolfgang U., Oskar E. Pfeiffer, Markus A. Pöchtrager and John R. Rennison (eds.): Morphological Analysis in Comparison. 2000. x, 261 pp. 200 Anttila, Raimo: Greek and Indo-European Etymology in Action. Proto-Indo-European *aǵ-. 2000. xii, 314 pp. 199 Pütz, Martin and Marjolijn H. 
Verspoor (eds.): Explorations in Linguistic Relativity. 2000. xvi, 369 pp. 198 Niemeier, Susanne and René Dirven (eds.): Evidence for Linguistic Relativity. 2000. xxii, 240 pp. 197 Coopmans, Peter, Martin Everaert and Jane Grimshaw (eds.): Lexical Specification and Insertion. 2000. xviii, 476 pp. 196 Hannahs, S.J. and Mike Davenport (eds.): Issues in Phonological Structure. Papers from an International Workshop. 1999. xii, 268 pp. 195 Herring, Susan C., Pieter van Reenen and Lene Schøsler (eds.): Textual Parameters in Older Languages. 2001. x, 448 pp. 194 Coleman, Julie and Christian J. Kay (eds.): Lexicology, Semantics and Lexicography. Selected papers from the Fourth G. L. Brook Symposium, Manchester, August 1998. 2000. xiv, 257 pp. 193 Klausenburger, Jurgen: Grammaticalization. Studies in Latin and Romance morphosyntax. 2000. xiv, 184 pp. 192 Alexandrova, Galina M. and Olga Arnaudova (eds.): The Minimalist Parameter. Selected papers from the Open Linguistics Forum, Ottawa, 21–23 March 1997. 2001. x, 360 pp. 191 Sihler, Andrew L.: Language History. An introduction. 2000. xvi, 298 pp. 190 Benmamoun, Elabbas (ed.): Perspectives on Arabic Linguistics. Papers from the Annual Symposium on Arabic Linguistics. Volume XII: Urbana-Champaign, Illinois, 1998. 1999. viii, 204 pp. 189 Nicolov, Nicolas and Ruslan Mitkov (eds.): Recent Advances in Natural Language Processing II. Selected papers from RANLP ’97. 2000. xi, 422 pp. 188 Simmons, Richard VanNess: Chinese Dialect Classification. A comparative approach to Harngjou, Old Jintarn, and Common Northern Wu. 1999. xviii, 317 pp. 187 Franco, Jon A., Alazne Landa and Juan Martín (eds.): Grammatical Analyses in Basque and Romance Linguistics. Papers in honor of Mario Saltarelli. 1999. viii, 306 pp. 186 Mišeska Tomić, Olga and Milorad Radovanović (eds.): History and Perspectives of Language Study. Papers in honor of Ranko Bugarski. 2000. xxii, 314 pp. 185 Authier, Jean-Marc, Barbara E. Bullock and Lisa A. 
Reed (eds.): Formal Perspectives on Romance Linguistics. Selected papers from the 28th Linguistic Symposium on Romance Languages (LSRL XXVIII), University Park, 16–19 April 1998. 1999. xii, 334 pp. 184 Sagart, Laurent: The Roots of Old Chinese. 1999. xii, 272 pp. 183 Contini-Morava, Ellen and Yishai Tobin (eds.): Between Grammar and Lexicon. 2000. xxxii, 365 pp.
182 Kenesei, István (ed.): Crossing Boundaries. Advances in the theory of Central and Eastern European languages. 1999. viii, 302 pp. 181 Mohammad, Mohammad A.: Word Order, Agreement and Pronominalization in Standard and Palestinian Arabic. 2000. xvi, 197 pp. 180 Mereu, Lunella (ed.): Boundaries of Morphology and Syntax. 1999. viii, 314 pp. 179 Rini, Joel: Exploring the Role of Morphology in the Evolution of Spanish. 1999. xvi, 187 pp. 178 Foolen, Ad and Frederike van der Leek (eds.): Constructions in Cognitive Linguistics. Selected papers from the Fifth International Cognitive Linguistics Conference, Amsterdam, 1997. 2000. xvi, 338 pp. 177 Cuyckens, Hubert and Britta E. Zawada (eds.): Polysemy in Cognitive Linguistics. Selected papers from the International Cognitive Linguistics Conference, Amsterdam, 1997. 2001. xxviii, 296 pp. 176 Van Hoek, Karen, Andrej A. Kibrik and Leo Noordman (eds.): Discourse Studies in Cognitive Linguistics. Selected papers from the 5th International Cognitive Linguistics Conference, Amsterdam, July 1997. 1999. vi, 187 pp. 175 Gibbs, Jr., Raymond W. and Gerard J. Steen (eds.): Metaphor in Cognitive Linguistics. Selected papers from the 5th International Cognitive Linguistics Conference, Amsterdam, 1997. 1999. viii, 226 pp. 174 Hall, T. Alan and Ursula Kleinhenz (eds.): Studies on the Phonological Word. 1999. viii, 298 pp. 173 Treviño, Esthela and José Lema (eds.): Semantic Issues in Romance Syntax. 1999. viii, 309 pp. 172 Dimitrova-Vulchanova, Mila and Lars Hellan (eds.): Topics in South Slavic Syntax and Semantics. 1999. xxviii, 263 pp. 171 Weigand, Edda (ed.): Contrastive Lexical Semantics. 1998. x, 270 pp. 170 Lamb, Sydney M.: Pathways of the Brain. The neurocognitive basis of language. 1999. xii, 418 pp. 169 Ghadessy, Mohsen (ed.): Text and Context in Functional Linguistics. 1999. xviii, 340 pp. 168 Ratcliffe, Robert R.: The “Broken” Plural Problem in Arabic and Comparative Semitic. 
Allomorphy and analogy in non-concatenative morphology. 1998. xii, 261 pp. 167 Benmamoun, Elabbas, Mushira Eid and Niloofar Haeri (eds.): Perspectives on Arabic Linguistics. Papers from the Annual Symposium on Arabic Linguistics. Volume XI: Atlanta, Georgia, 1997. 1998. viii, 231 pp. 166 Lemmens, Maarten: Lexical Perspectives on Transitivity and Ergativity. Causative constructions in English. 1998. xii, 268 pp. 165 Bubenik, Vit: A Historical Syntax of Late Middle Indo-Aryan (Apabhraṃśa). 1998. xxiv, 265 pp. 164 Schmid, Monika S., Jennifer R. Austin and Dieter Stein (eds.): Historical Linguistics 1997. Selected papers from the 13th International Conference on Historical Linguistics, Düsseldorf, 10–17 August 1997. 1998. x, 409 pp. 163 Lockwood, David G., Peter H. Fries and James E. Copeland (eds.): Functional Approaches to Language, Culture and Cognition. Papers in honor of Sydney M. Lamb. 2000. xxxiv, 656 pp. 162 Hogg, Richard M. and Linda van Bergen (eds.): Historical Linguistics 1995. Volume 2: Germanic linguistics. Selected papers from the 12th International Conference on Historical Linguistics, Manchester, August 1995. 1998. x, 365 pp. 161 Smith, John Charles and Delia Bentley (eds.): Historical Linguistics 1995. Volume 1: General issues and non-Germanic Languages. Selected papers from the 12th International Conference on Historical Linguistics, Manchester, August 1995. 2000. xii, 438 pp. 160 Schwegler, Armin, Bernard Tranel and Myriam Uribe-Etxebarria (eds.): Romance Linguistics: Theoretical Perspectives. Selected papers from the 27th Linguistic Symposium on Romance Languages (LSRL XXVII), Irvine, 20–22 February, 1997. 1998. vi, 349 pp. + index. 159 Joseph, Brian D., Geoffrey C. Horrocks and Irene Philippaki-Warburton (eds.): Themes in Greek Linguistics II. 1998. x, 335 pp. 158 Sánchez-Macarro, Antonia and Ronald Carter (eds.): Linguistic Choice across Genres. Variation in spoken and written English. 1998. viii, 338 pp.
157 Lema, José and Esthela Treviño (eds.): Theoretical Analyses on Romance Languages. Selected papers from the 26th Linguistic Symposium on Romance Languages (LSRL XXVI), Mexico City, 28–30 March, 1996. 1998. viii, 380 pp. 156 Matras, Yaron, Peter Bakker and Hristo Kyuchukov (eds.): The Typology and Dialectology of Romani. 1997. xxxii, 223 pp. 155 Forget, Danielle, Paul Hirschbühler, France Martineau and María Luisa Rivero (eds.): Negation and Polarity. Syntax and semantics. Selected papers from the colloquium Negation: Syntax and Semantics. Ottawa, 11–13 May 1995. 1997. viii, 367 pp. 154 Simon-Vandenbergen, Anne-Marie, Kristin Davidse and Dirk Noël (eds.): Reconnecting Language. Morphology and Syntax in Functional Perspectives. 1997. xiii, 339 pp.