CONTEMPORARY MUSIC REVIEW
Editor in Chief: Nigel Osborne

Music and the Cognitive Sciences 1990
Issue Editors: Ian Cross and Irène Deliège
Volume 9
Proceedings of the Cambridge Conference on Music and the Cognitive Sciences, 1990
harwood academic publishers Published in Switzerland
CONTEMPORARY MUSIC REVIEW
Editor in Chief: Nigel Osborne (UK)
Regional Editors: Peter Nelson (UK), Stephen McAdams (France), Fred Lerdahl (USA), Jō Kondō (Japan), Tōru Takemitsu
Editorial Boards
UK: Paul Driver, Alexander Goehr, Oliver Knussen, Bayan Northcott, Anthony Payne
USA: John Adams, Jacob Druckman, John Harbison, Tod Machover
JAPAN: Joaquim M. Benitez, S.J., Shōno Susumu, Tokumaru Yoshihiko
USSR: Edward Artemyev, Edison Denisov, Yury Kholopov, Alfred Schnittke, Vsevolod Zaderatsky
This edition published in the Taylor & Francis e-Library, 2005. To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to http://www.ebookstore.tandf.co.uk/. Aims and Scope: Contemporary Music Review is a contemporary musicians’ journal. It provides a forum where new tendencies in composition can be discussed in both breadth and depth. Each issue will focus on a specific topic. The main concern of the journal will be composition today in all its aspects—its techniques, aesthetics and technology and its relationship with other disciplines and currents of thought. The publication may also serve as a vehicle to communicate actual musical materials. Notes for contributors can be found at the back of the journal. © 1993 Harwood Academic Publishers GmbH. All rights reserved. No part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and recording, or by any information storage or retrieval system, without permission in writing from the Publisher. Ordering Information Each volume is comprised of an irregular number of parts depending upon size. Issues are available individually as well as by subscription. 1993 Volume: 7–8 Orders may be placed with your usual supplier or directly with Harwood Academic Publishers GmbH care of the addresses shown on the inside back cover. Journal subscriptions are sold on a per volume basis only. Claims for nonreceipt of issues will be honored free of charge if made within three months of publication of the issue. Subscriptions are available for microform editions; details will be furnished upon request. All issues are dispatched by airmail throughout the world. Subscription Rates Base list subscription price per volume: ECU 58.00 (US $69.00).* This price is available only to individuals whose library subscribes to the journal OR who warrant that the journal is for their own use and provide a home address for mailing. Orders must be sent directly to the Publisher and payment must be made by personal check or credit card. Separate rates apply to academic and corporate institutions. These rates may also include photocopy license and postage and handling charges. Special discounts are available to continuing subscribers through our Subscriber Incentive Plan (SIP). *ECU (European Currency Unit) is the worldwide base list currency rate; payment can be made by draft drawn on ECU currency in the amount shown or in local currency at the current conversion rate. The US Dollar rate is based on the ECU rate and applies to North American subscribers only. Subscribers from other territories should contact their agents or one of the offices listed on the inside back cover. To order direct and for enquiries, contact: Europe Y-Parc, Chemin de la Sallaz 1400 Yverdon, Switzerland Telephone: (024) 239–670 Fax: (024) 239–671
Far East (excluding Japan) Kent Ridge, PO Box 1180 Singapore 9111 Telephone: 741–6933 Fax: 741–6922 USA PO Box 786, Cooper Station New York, N.Y. 10276 Telephone: (212) 206–8900 Fax: (212) 645–2459 Japan Yohan Western Publications Distribution Agency 3–14–9, Okubo, Shinjuku-ku, Tokyo 169, Japan Telephone: (03) 3208–0181 Fax: (03) 3209–0288 License to Photocopy This publication and each of the articles contained herein are protected by copyright. The subscription rate for academic and corporate subscribers includes the Publisher’s licensing fee which allows the subscriber photocopy privileges beyond the “fair use” provision of most copyright laws. Please note, however, that the license does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotion purposes, for creating new collective works, for resale, or as agent, either express or implied, of another individual or company. A subscriber may apply to the Publisher for a waiver of the license fee. For licensing information, please write to Harwood Academic Publishers GmbH, Y-Parc, Chemin de la Sallaz, 1400 Yverdon, Switzerland. Reprints of Individual Articles Copies of individual articles may be obtained from the Publisher’s own document delivery service at the appropriate fees. Write to: SCAN, PO Box 786, Cooper Station, New York, NY 10276, USA or Y-Parc, Chemin de la Sallaz, 1400 Yverdon, Switzerland. Special Fax Service—USA: (212) 645– 2459 or Switzerland: (024) 239–671. Permission to reproduce and/or translate material contained in this journal must be obtained in writing from the Publisher. Please contact Rights and Permissions Officer, Harwood Academic Publishers GmbH, Y-Parc, Chemin de la Sallaz, 1400 Yverdon, Switzerland. Distributed by STBS—Publishers Distributor SEPTEMBER 1993 ISBN 0-203-39328-7 Master e-book ISBN
ISBN 0-203-39705-3 (OEB Format) ISBN 3-7186-54202 (Print Edition)
Contents

Introduction: Cognitive science and music—an overview
  Ian CROSS and Irène DELIÈGE  1

Music in Culture
An interactive experimental method for the determination of musical scales in oral cultures: Application to the vocal music of the Aka Pygmies of Central Africa
  Simha AROM and Susanne FÜRNISS  7
An interactive experimental method for the determination of musical scales in oral cultures: xylophone music of Central Africa
  Vincent DEHOUX and Frédéric VOISIN  14
The influence of the tambura drone on the perception of proximity among scale types in North Indian classical music
  Kathryn VAUGHN  21

Constraints on Music Cognition—Psychoacoustical
Pitch properties of chords of octave-spaced tones
  Richard PARNCUTT  37
Identification and blend of timbres as a basis for orchestration
  Roger A. KENDALL and Edward C. CARTERETTE  55
What is the octave of a harmonically rich note?
  Roy D. PATTERSON, Robert MILROY and Michael ALLERHAND  75
Brightness and octave position: are changes in spectral envelope and in tone height perceptually equivalent?
  Ken ROBINSON  89

Constraints on Music Cognition—Neural
A cognitive neuropsychological analysis of melody recall
  David W. PERRY  102
Split-brain studies of music perception and cognition
  Mark Jude TRAMO  119

Musical Structure in Cognition
The influence of implicit harmony, rhythm and musical training on the abstraction of “tension-relaxation schemas” in tonal musical phrases
  Emmanuel BIGAND  132
Is the perception of melody governed by motivic arguments or by generative rules or by both?
  Archie LEVEY  150
Transformation, migration and restoration: shades of illusion in the perception of music
  Zofia KAMINSKA and Peter MAYER  163
Associationism and musical soundtrack phenomena
  Annabel J. COHEN  175
Rhythm perception: interactions between time and intensity
  Claire GÉRARD, Carolyn DRAKE and Marie-Claire BOTTE  192
Mechanisms of cue extraction in memory for musical time
  Irène DELIÈGE  204
Generativity, mimesis and the human body in music performance
  Eric F. CLARKE  221

Representations of Musical Structure
Issues on the representation of time and structure in music
  Henkjan HONING  235
A connectionist and a traditional AI quantizer: symbolic versus sub-symbolic models of rhythm perception
  Peter DESAIN  254
Computer perception of phrase structure
  Robert FRASER  274
Critical study of Sundberg’s rules for expression in the performance of melodies
  Peter van OOSTEN  287
Contribution to the design of an expert system for the generation of tonal multiple counterpoint
  Agostino di SCIPIO  296
Computer-aided comparison of syntax systems in three piano pieces by Debussy
  David MEREDITH  307

General Issues in Cognitive Musicology
Psychological analysis of musical composition: composition as design
  Ron ROOZENDAAL  329
How do we perceive atonal music? Suggestions for a theoretical approach
  Michel IMBERTY  336

Index  353
Introduction: Cognitive science and music—an overview
Ian Cross and Irène Deliège
Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 1–6
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
Over the last decade, cognitive science has increasingly come to be seen as offering an appropriate framework within which to explore and to explain issues in musical listening, performance, composition, development and analysis. There are a number of reasons for this. As cognitive science develops, it provides progressively more sophisticated and plausible accounts of the phenomena of mental life. Moreover, cognitive science appears to offer frameworks of understanding (or at least modes of enquiry) which are largely “culturally-neutral”. This is of profound importance given the culturally-diffuse nature of music as it exists now in the West and the fact that most musicological frameworks of understanding can be thought of as highly ethnocentric and culturally-specific. In addition, a number of different dynamics are impelling what might be called the “computerisation” of music, or the embodiment of aspects of music in computer software and hardware. This drive towards representing elements of music in computational terms is motivated by powerful aesthetic, educational and commercial imperatives.

On the whole, the application of cognitive science to music can be thought of as being intended to bridge the gap between what music feels like—its experiential texture—and the language that is used to describe it and to teach it. To be more specific, the development of a cognitive science of music can help to span the disjunction that exists between the ways that music is experienced by listeners and by practising musicians and the rational frameworks of discourse that conventionally constitute music theory, i.e. that are used to describe and to define music. This development proceeds by seeking to provide accounts of music that are consonant with the concept of computability and with empirically-derived evidence about musical perception, performance and creation.

To approach music by means of cognitive science involves the scientific study of all aspects of the musical mind and of musical behaviour at all possible levels of explanation—be it neurophysiological, psychoacoustical or cognitive-psychological—by theoretical or empirical inquiry, and by means of computer modelling or practical experiment. The objects of study—musical behaviour and the musical mind—can be conceived of as comprising the capacity to experience—and to learn to experience—patterns of events in time as music, and the faculty of conceiving or producing, and of learning to conceive or produce, particular sequences of events as music.
The idea of musical behaviour as an object of study is not unique to cognitive science. In fact, the cognitive science of music shares with ethnomusicology a concern with accounting for musical behaviours; however, while ethnomusicology tends to do so on the basis of the cultural and social function or utility of such behaviours, the cognitive sciences of music do so in terms of the inferred mental processes underlying such behaviours. Given this aim, it is not surprising that it is only in the last thirty years that music has become a focus of study for cognitive scientists, whether psychologists, computer scientists, philosophers or musicologists. After all, behaviour—particularly musical behaviour—is intrinsically evanescent. It is only in this century that we have come to possess methods whereby we can reliably assess the representativeness of particular observed behaviours in respect of broader theoretical classes of behaviour; only in the last thirty years have we had appropriate metaphors in terms of which to express the mental processes that can be inferred as underlying observed musical behaviours; and only perhaps in the last twenty years have we had the technology to record and to examine musical behaviours economically and accurately.

Despite these advances in the means of enquiry, until recently it might have been suggested (and sometimes was) that attempts to understand music in cognitive terms were inadequate. Studies were condemned as being over-reductionist (e.g., attempting to account for the cognition of melody in terms of the perception of single notes presented in isolation) or as being musically or psychologically simplistic. This can be traced to a lack of communication which existed between musicians and researchers in the cognitive sciences; the comments and critiques encountered by researchers were all too often directed at narrow issues of theory and method, whilst musicians were simply unaware of—or unable to comprehend—the issues, methods and findings of cognitive studies.

However, these circumstances have changed. As John Sloboda (1985) puts it, “the psychology of music has come of age”. This “coming of age” he equates with the appearance of Lerdahl and Jackendoff’s seminal Generative Theory of Tonal Music (1983), a text which constitutes a highly sophisticated attempt to provide a theory of tonal music consonant with the findings of cognitive psychology. Over the last decade research into music cognition has increasingly aspired—and frequently risen—to this level of sophistication, seeking to reflect an awareness of and a responsiveness to historical, analytical, practical and pedagogical perspectives on music. The constant need to stress the requirements to avoid over-reductionism and to strive for a high degree of “ecological validity” in studying music perception and production has diminished as psychologists, computer scientists and musicians have come together in communication and collaboration.

The papers collected in this volume clearly reveal the range, diversity and sophistication of current cognitive-scientific accounts of music. These papers also attest to the growing realisation that one of the most significant contributions that cognitive science can make to the elucidation of music is in the exploration of music in its cultural context.
In this volume it will be seen that the “culturally-neutral” character of cognitive-scientific explanation in combination with the close analysis that exemplifies ethnomusicological method can yield insights about music unattainable by other means. Bruner (1990) suggests that: “It is culture…that shapes human life and the human mind, that gives meaning to action by situating its underlying intentional states in an
interpretive system. It does this by imposing the patterns inherent in the culture’s symbolic systems—its language and discourse modes, the forms of logical and narrative explication, and patterns of mutually dependent communal life.” The dynamics of the “forms of logical and narrative explication” that shape the mind within a given culture are rarely amenable to conscious introspection; they are usually not consciously knowable by members of that culture. They can only be unravelled by means that are often oblique, but which are centred on cognitive-scientific method that is sensitively and imaginatively applied.

A further powerful current in recent developments in the application of cognitive science to music that is evident in these papers is the drive towards the “computerisation” of music. Indeed, this can appear to be the governing force in these developments. This is unsurprising, given the confluence of intellectual, aesthetic, educational and commercial imperatives that become manifest when one considers the interaction of music and cognitive science. Since its inception, one of the main engines of cognitive science has been the concept of computability, the idea that computational logic should constitute the principal criterion whereby to judge the efficacy or adequacy of theories of mind. This idea can exist in “harder” versions, wherein computational theory is taken to represent the fundamental substrate of mind (e.g., Churchland, 1986), or “softer” versions, in which computational logic serves as a functional metaphor in the description of mental processes (e.g. Bruner, 1990). This permeation of cognitive science by the concept of computability has increasingly determined the tools, methods and output of cognitive science.

At the same time, the practical application of computers in music has recently sustained exponential growth. As technology has developed and advanced, computers have pervaded music at all levels, from the school to the studio, from the concert-hall to the field-trip. While the genesis of the use of computers in music lies in post-war musical aesthetics, their current ubiquity arises from commercial interests responding to, and leading, consumer demand for accessible and populist musical tools. The broader utility of these tools in composition, performance and in music education is by and large a fortunate and highly productive spin-off.

There is obvious scope for exciting and innovative practical development in the coming-together of music and cognitive science. A cognitive-scientific understanding of the nature of perception and performance can help to shape new tools for composers and performers, providing new means of control over complex musical systems and structures. It can enhance and vastly expand the ways in which human and computer “performers” may interact. It might even help to mitigate the baleful influence on the development of computer-based learning systems that is exerted by those political ideologies for which cost-effectiveness is more important than any enrichment of the human experience that might arise from the process of learning.

Overall, then, cognitive science has much to contribute to our understanding of music. It may even play some role in determining what we come to accept as music. However, even if we reject the proposition that cognitive science should be prescriptive in the domain of music, there is still space for it to make a major contribution to the frameworks
of discourse in terms of which we describe and conceive of music. Music itself has value as a subject for the cognitive sciences; the understanding of a non-verbal auditory domain, rich in associative power, multifarious in form and culturally-emblematic, surely has much to offer to the quest for a better understanding of the human mind. The papers in this volume demonstrate the breadth and complexity of some of what has already been achieved, and point towards some likely future developments.
Volume structure and contents

This volume arises out of the 2nd International Conference on Music and the Cognitive Sciences, which was held in Cambridge in 1990. Its contents are laid out across six sections: Music in Culture, Psychoacoustical Constraints on Music Cognition, Neural Constraints on Music Cognition, Musical Structure in Cognition, Representations of Musical Structure and General Issues in Cognitive Musicology. These divisions arose from the ways in which papers submitted for the Cambridge Conference fitted into the four thematic areas proposed for that Conference—Music in Culture, Music in Action, Representing Musical Structure and Cognitive Musicology.

In the first instance, these four themes chosen for the Conference were intended to embrace aspects of cognitive science which would be differentiable largely by the methods that they employed. Thus, it was felt, papers within the theme Music in Culture would be likely to reflect ethnomusicological practices (reliance on informants, immersion of researcher(s) in specific cultures via field-work, etc.); papers in Music in Action should reflect experimental work employing conventional psychological empirical methods; those in Representing Musical Structure would focus on issues in, and applications of, modelling music cognition via computer; and those in Cognitive Musicology should reflect the ways in which aspects of cognitive-scientific studies had fed back into the general theory and practice of music.

However, the papers submitted for the Conference provided a different picture from that which had been anticipated. Most papers fell within the theme Music in Action, with those falling into Representing Musical Structure being the next most numerous. Those which dealt with issues of Music in Culture did so in fascinating and unexpected ways, while few papers actually demonstrated the applications of cognitive science to musicological concerns and thus fell under the heading of Cognitive Musicology. From the wealth of papers considered for this volume, it appeared most sensible to retain slightly altered versions of the four original themes but to split Music in Action into three categories: Psychoacoustical Constraints on Music Cognition, Neural Constraints on Music Cognition and Musical Structure in Cognition. This division acknowledges the differences in methodology which exist within the experimental tradition, and points up the ways in which psychoacoustical and neuropsychological studies sketch the boundaries of enquiry for cognitive science as a whole.

The first section, Music in Culture, contains three papers. Two of these are by members of a group of French researchers, directed by Simha Arom, who have developed and applied new research methods to issues in African music that would appear otherwise intractable. The third paper, by Kathryn Vaughn, explores aspects of the perception and
cognitive representation of North Indian music, using sophisticated experimental psychological techniques.

The second section, Psychoacoustical Constraints on Music Cognition, contains papers which examine, respectively, the degree to which psychoacoustical considerations can be said to underlie our perceptions of harmonic structure (Parncutt), the ways in which we respond to real instrumental sonorities in making judgments about timbral quality (Kendall and Carterette), and the interaction—or, perhaps, the relative inseparability—of pitch and timbre when considered from psychoacoustical perspectives (papers by Patterson, Milroy and Allerhand, and by Robinson). The next section, Neural Constraints on Music Cognition, contains papers by Perry and by Tramo, addressing aspects of our musical-perceptual and memory abilities via neurological data.

The fourth section, Musical Structure in Cognition, starts with a paper by Bigand that explores the issue of how, in listening, we abstract those elements of musical structure that may well determine our emotional responses to music. A paper by Levey then studies our sensitivity to different types of music-theoretic relationships between melodies. The next paper, by Mayer and Kaminska, outlines a number of different experimental approaches to determining the similarities between musical and verbal processing. Cohen’s paper presents a rare study of ways in which musical sound contributes to our overall perception by examining the interaction of sound and vision via film. A paper by Gérard, Botte and Drake then investigates the factors that play a role in our perception of rhythm. Following this, a paper by Deliège presents the results of a study of the ongoing perceptions that arise in the course of listening to a piece of music. The section ends with a paper by Clarke, in which an exploration of real and “artificial” (i.e. computer-generated) rubato provides clues as to the relation between musical structure and expression.

The first two papers in the following section, Representations of Musical Structure, by Honing and by Desain, present theoretical considerations of many of the issues addressed experimentally by Clarke within the framework of their functional connectionist model of rhythmic quantization processes. In contrast, Fraser’s subsequent paper adopts an explicitly grammatical approach to questions of how the cognitive representation of musical phrase structure might be modelled on computer. The paper by van Oosten returns to the connectionist framework to provide a critique of Sundberg’s model of musical performance. The section concludes with two papers examining, respectively, constraints on the computational representation of contrapuntal composition (di Scipio) and the formal representation of harmonic structure in music analysis (Meredith).

The concluding section, General Issues in Cognitive Musicology, comprises three papers. The first, by Meeùs, explores issues in the development of a non-linguistically based semiotics of music. The second, by Roozendaal, reports an attempt to delineate the cognitive processes involved in the act of musical composition. Fittingly, the final paper of the volume is by Michel Imberty. At the Cambridge Conference, a meeting convened by Stephen McAdams led to the formation of the European Society for the Cognitive Sciences of Music (ESCOM). This body held its first formal colloquium and assembly in 1991, at which Professor Imberty was elected as its president.
It is only fitting, then, that his paper on the perception of atonal music should round off the present volume. The editors would like to thank all of those who contributed to the organisation of the Cambridge Conference and the subsequent production of this volume. Thanks should go
to all members of the Organising Committee, to all of those who chaired the sessions of the Conference and, in particular, to Marie-Isabelle Collart, Diana Stammers and Jane Woods for all of their administrative and practical assistance. The editors would also like to express their gratitude to the British Council and to the British Academy, without whose generosity the Conference could not have taken place.
References

Bruner, J. (1990) Acts of Meaning. London: Harvard University Press.
Churchland, P.S. (1986) Neurophilosophy. Cambridge, Mass.: M.I.T. Press.
Lerdahl, F. and Jackendoff, R.A. (1983) A Generative Theory of Tonal Music. Cambridge, Mass.: M.I.T. Press.
Sloboda, J. (1985) The Musical Mind. Oxford: Oxford University Press.
Music in Culture

An interactive experimental method for the determination of musical scales in oral cultures
Application to the vocal music of the Aka Pygmies of Central Africa

Simha Arom and Susanne Fürniss
Lacito, CNRS, Paris

Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 7–12
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
The contrapuntal vocal polyphony of the Aka Pygmies is based on a pentatonic scale but the nature of this scale is difficult to determine by ear. To overcome this problem and to circumvent difficulties of articulating such abstract concepts as musical scales which, for the Aka, are not subject to verbalisation, a method based on the use of a synthesiser was conceived and applied in a series of experiments among the Pygmies. In these experiments, polyphonic music of their own culture was simulated with different underlying scale models and submitted to their cultural judgment. This method was shown not only to cope with the initial problem but also to provoke a series of non-verbal interactions that open new dimensions for the study of cognitive aspects of musical systems in oral traditions. KEY WORDS: Ethnomusicology, methodology, modelling, scales, experimental research.
Aka Pygmy music is essentially vocal. It is characterized by a contrapuntal polyphony based on four constituent parts. Each of these four parts has a name in the Aka language: mò tángòlè, ngúé wà lémbò, ò sêsê and dì yèí. Moreover, each of them can be distinguished
from the others by several distinct traits: the presence or absence of words, its position in the sound space, or the vocal technique with which it is executed. The mò tángòlè1 is the principal voice which begins the song and pronounces the incipit, the essential words of the song.2 This part is generally sung by a man. Ngúé wà lémbò, which means literally “the mother of the song”, is (as the name indicates) the support of the song. This part has longer values than the other ones and, according to Aka theory, is a men’s part as well. The ò sêsê3 is a female middle voice characterized by movement contrary to the mò tángòlè’s, with a generally descending melodic line. These three parts are sung with the chest voice. The fourth part, dì yèí, which means literally “yodelling”, is a yodelled part which is sung above all the other parts by the women.

Aka music is an isoperiodic music, being embedded in invariant periodic cycles. These temporal matrices consist of regular and always even numbers of beats. The rigorousness of the periodic framework is attested by the recurrence of similar musical material in identical positions of each cycle. Each of the four parts has its own melodic scheme for every single song. This scheme serves as a referent for the several variations in terms of which it is realised. It consists of a minimal and non-varied version of the part, that is, a real pattern, determined by the presence of certain notes systematically located in certain positions within the cycle. With one exception, the contrapuntal repertoire of the Pygmies is based on polyrhythmic “blocks”, performed by several percussion instruments of which the pitches are not relevant (Arom 1985:408). This means that their singing has no reference whatever to any predetermined pitch. Seen from the perspective of its scale structure, this music can be considered as being sung a cappella.

When listening to a polyphonic Pygmy song there can be no doubt that it is based on a pentatonic system. Nevertheless, one can observe non-systematic phenomena in the realisation of the scale-degrees, the exact positions of which are problematic. For example, although a semitone is never sung as a successive interval, it does appear in certain musical contexts. On the one hand, certain contiguous intervals are modified according to whether they are part of a rising or descending passage; on the other hand, the performance of disjunct, yodelled intervals can give way to what appears now as a major sixth, now as a minor seventh.

The songs we analysed in this study4 have been recorded by means of a re-recording technique5; the different constituent parts are not recorded simultaneously but successively, each of the singers hearing in his headphones the part or parts of the previous singer(s). By this means, each part of the polyphony can be isolated on a separate track of a tape (Arom 1976). When examining the relationship between the different—separately recorded—parts of a song, a divergence emerged between horizontal and vertical listening, which led to different conclusions. Thus, two scale-degrees which appeared an octave apart vertically were revealed to form an interval of a major seventh on separate audition of the two parts which contained them. This means that a physically identical phenomenon with stable elements—as nothing physical varies on the tape—is being perceived in different ways.
This last interval, a major seventh, creates a problem, since it is excluded from the anhemitonic pentatonic system.
In order to interpret these kinds of phenomena, and to determine whether they were casual or structural, some acoustical analyses were carried out, starting with measurements of fundamental frequencies and the resulting intervals.6 However, this procedure did not uncover any systematic explanation of the observed phenomena; if in one case the seconds and sevenths were quite close to the Pythagorean system, in other cases they were closer to equipentatonic intervals. The possibility was also explored that the alternation of different vocal timbres associated with the use of specific vowels in the yodelled production of disjunct scale-degrees might account for the variations of pitch observed. However, an analysis based on this hypothesis provided no basis for an explanation of the observed interval usage.

It was felt that a survey of scalar theories in the history of music since antiquity might suggest some scale models based on different orderings of scale-degrees within an octave, at least one of which might correspond to the musical scale of the Aka. A scale obtained by superimposition of fifths (c-g-d-a-e, the Pythagorean system), a scale made of disjunct tetrachords (c-f/g-c) each divided into two equal intervals, and some other scales, each corresponding to a different sectional sequence from the harmonic series, were considered. Here again, it was impossible to give preference to one of these models by mere acoustical or musical analysis.

Accordingly, it was thought that it might be better to try a method of analysis-by-synthesis. This would allow us to submit different scale models to the Pygmies’ own judgement. In order to communicate with the Aka on such an abstract level of musical theory, we had to take several steps towards them, in order to approach their own understanding as closely as possible. Only by the application of the Aka’s musical concepts—as known from earlier studies7—could we hope to find a point of intersection common to both sides. This was regarded as the basic step or origin of the methodology that will now be discussed.

It was evident that a purpose-designed toolkit was required with which interactive experiments could be conducted in field conditions with the Aka themselves. Any experiment should be based on contrapuntal pieces of their own culture, in which the different scale models would be embodied. This latter aspect is of great importance; in cultures with implicit musical theory and without institutionalized musical apprenticeship, investigation of scale structure has to be carried out in a concrete musical context, coming as close as possible to a traditional performance. The method used should ideally circumvent difficulties of articulating such abstract concepts as musical scales which, for the Aka, are not subject to verbalisation. It should further allow a non-directive investigation but at the same time elicit an answer to questions which, in certain respects, had not previously been posed. It was intended to discover which of the proposed scale models would be considered as adequate and which not.

As we had many difficulties in synthesizing an acceptable vocal timbre8, we discarded the idea of imitating the human voice and contented ourselves with instrumental timbres developed from the pre-programmed timbres of a Yamaha DX7 IIFD synthesizer9.
The latter enables the use of an integrated micro-tuning programme—fundamental for the application of our method—permitting extremely fine adjustments of pitch at 1/85th of a semitone (that is 1.17 cents). With this micro-tuning programme we were able to program the scale models which seemed to be the closest to those of Pygmy music and which we wanted to submit to the Aka’s judgement.
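By way of illustration, here is a minimal sketch (our own reconstruction for this text, not the authors' actual DX7 programming) of two of the candidate scale models expressed in cents, a stack of pure fifths folded into one octave and an equipentatonic division, rounded to the 1/85-semitone (about 1.17 cent) micro-tuning grid described above. The reference interval values are standard; everything else is assumed for the example.

```python
# Illustrative sketch only: two candidate pentatonic scale models in cents,
# rounded to a micro-tuning grid of 1/85 of a semitone (about 1.17 cents),
# the resolution mentioned in the text.  Not the authors' actual programme.
STEP = 100.0 / 85.0  # one micro-tuning step, in cents

def pythagorean_pentatonic():
    """Stack of pure fifths c-g-d-a-e, folded back into a single octave."""
    fifth = 701.955  # a pure 3:2 fifth, in cents
    return sorted((i * fifth) % 1200.0 for i in range(5))

def equipentatonic():
    """Five equal divisions of the octave."""
    return [i * 1200.0 / 5 for i in range(5)]

def quantize(scale_cents):
    """Round each scale degree to the nearest micro-tuning step."""
    return [round(c / STEP) * STEP for c in scale_cents]

for name, scale in [("Pythagorean pentatonic", pythagorean_pentatonic()),
                    ("equipentatonic", equipentatonic())]:
    print(name, [f"{c:.2f}" for c in quantize(scale)])
```

Run as-is, the sketch simply prints the degree positions (in cents above an arbitrary reference) that such a micro-tuning table would hold for each model.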
Who were the judges? In an Aka community, musical practice is shared by all of its members. Each member of an encampment knows the totality of the repertoire. In this respect, everybody is equally performer, listener and judge. The encampment of Mbonzo, where we worked, consists of about 25 people, and the majority of its members, i.e. about 15 persons, participated in our experiments.

Not until we had reached Africa did Frédéric Voisin discover the sequencer which is integrated within the synthesizer and thanks to which we could transform it into a real “mediator” between Aka music and our hypotheses about it. Using tapes and transcriptions already to hand, it would become possible to simulate several versions of the same polyphonic song.

However, before being able to undertake the experiments properly speaking, it was necessary to get the Pygmies not to reject the very sound of the synthesizer from the outset. To do this, we had the idea of letting them hear some utterly different music which would share with theirs some common features. This was intended to avoid confusion between the following parameters: structural periodicity, vocal parts and timbre. The music chosen for this preliminary phase of the experiment was the opening of the Andante of the first movement of the Partita in C minor by J.S. Bach. It was performed by one of us on the synthesizer. In order to reproduce the principle of periodicity, the piece was put into a loop with a cycle of eight beats, at the end of which the same segment reappeared. We told our Pygmy participants, “You are going to hear a piece of our own music, but one which works like your own songs. There are not four parts in this music as there are in yours; there are only two, which correspond to your mò tángòlè and ngúé wà lémbò. You will be able to follow them by listening, and it will help you to look at our musician’s hands.”

Following this, we would need to get the Pygmies to recognise their own vocal music reconstituted via the synthesizer in a timbre noticeably different from that of the human voice. To try to help them to overcome this difficulty, we took the opening of the Andante and let them hear it at first with a very different timbre, like the sound of a bell. We repeated this piece several times with different timbres, until we converged on the timbre of one of their own instruments, a little wooden flute. At this stage, some Aka melodies simulated by the synthesizer with this timbre were identified almost immediately. The Aka not only got used very quickly to the sound of the synthesizer and the reproduction of their vocal songs on an instrument, but also themselves spontaneously tried to play on the synthesizer. All experimental sessions were filmed in their entirety with a video camera.

The scale experiment properly speaking consisted of letting the Aka listen to each of the simulated versions several times, each version having been programmed with one of the different scale models; they were asked, after each of these versions, to accept or to reject it. For this type of experiment, we chose two polyphonic songs, each of them having been programmed in two versions: one in a schematic form, that is, the mere polyphonic four-part pattern without any variation, and the other coming close to a conventional performance, with variations in every constituent part. Both versions were put into a loop in order to restore the periodic character of the music. The variations effected were
chosen arbitrarily from the paradigms of every part’s proper variations, as the Aka usually do when singing in a real performance. In order to be sure that they concentrated their listening on the melodic structure of each part and on the vertical resultant of their superimposition, we asked each of them to follow his individual part while listening and to tell us if it was correct or not, and if they agreed with its relation to the other parts. Apart from some remarks about rhythmic inaccuracies and the particular sound of the “voice of the machine”, there was no rejection either on the melodic axis or on that of simultaneity. Several times, and after having repeated very different versions from the point of view of the scale structure, the Aka said to us: “What the machine does and what we sing is the same thing”.

The result of this series of experiments was completely unexpected: the Aka accepted every one—the totality—of the versions we had submitted to their judgement. In other words, they considered the ten different scale models as all being equivalent.

Some of the Pygmies expressed the desire to try to play more on the synthesizer. This is why, on one occasion, we left them the instrument for three hours while we were absent from the camp. In the meantime, a fixed video camera filmed their exploration of the keyboard and their attempts to reproduce especially those melodies we had worked on that morning. Following this familiarisation with the synthesizer, we could integrate into our experiments performance on the synthesizer by the Aka themselves, but always in the presence of the other members of the encampment. To our great astonishment, this revealed that the vocal parts can start from any degree of the same pentatonic scale. Such liberties modify considerably each interval’s width with respect to the contour of each melody. Additionally, watching later the video of the Aka’s own experiments on the synthesizer while we were absent, we found the same phenomenon: one and the same melody was performed several times, starting each time from a different degree, but without changing the pentatonic system (see Figure 1).
Figure 1 “Mutations” of a melody, performed by the Aka when playing on the synthesizer.
The Aka themselves performed melodies with several “transpositions”, or rather, “mutations” (involving the modification of some intervals’ widths within the same melody). This led us to the following hypothesis: if the key to the melodic structure of Aka Pygmy song is not to be found in the scales used, it can only be found in the progressive unfolding of the parts. The order of succession of the degrees in a pentatonic scale seems to prevail over the width of the intervals between them, subject to the condition that the melodic contour—which is characteristic for each and every song—be respected. A last experiment, which dealt specifically with the recognition of the melodic contours of each of the constituent parts, confirmed this hypothesis.

Generally accepted theories about interval systems are based on the idea that a scale system—apart from the more-or-less large margins of tolerance it allows—consists of a mental template, a kind of mental grid, in which each degree of the scale has its predetermined, more-or-less fixed, position. The results of the experiments presented here may make such an idea questionable; indeed, we found that Aka Pygmy music admits ten different scale models.

The role of the synthesizer in the heuristics of this result is pre-eminent. Not only did it permit us to investigate the issues that we had intended when preparing the field work; by allowing the Pygmies to participate as much as possible, it also provoked a series of interactions which reorientated and adjusted our research in directions that we had not imagined before, such as the idea of investigating melodic contours. Thus, it appears that the application to the study of scale systems of a method of investigation which associates high technology with field work opens new dimensions for the study of cognitive aspects of musical systems in oral traditions.
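To make the hypothesis concrete, the following sketch is a purely hypothetical illustration (the scale, the contour and all numbers are ours, not transcriptions of Aka melodies): it realises one and the same contour, understood as an order of succession of steps through a pentatonic scale, from different starting degrees, showing how the interval widths change while the contour itself is preserved.

```python
# Hypothetical illustration of a "mutation": the same contour (in scale steps)
# started from different degrees of an anhemitonic pentatonic scale.
PENTATONIC = [0, 200, 400, 700, 900]  # degree positions in cents (an invented tuning)

def realise(contour_steps, start_degree):
    """Turn a contour (signed steps through the scale) into pitch values in cents."""
    degree = start_degree
    pitches = [PENTATONIC[degree % 5] + 1200 * (degree // 5)]
    for step in contour_steps:
        degree += step
        pitches.append(PENTATONIC[degree % 5] + 1200 * (degree // 5))
    return pitches

contour = [+1, +1, -2, +1]  # an invented contour: up, up, down two degrees, up
for start in range(3):
    pitches = realise(contour, start)
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    print(f"start degree {start}: pitches {pitches}, successive intervals {intervals} cents")
```

The printed interval sequences differ from one starting degree to the next, even though the order of succession of degrees, the contour, is identical in each case.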
Notes

1. Literally “the one who counts”.
2. All the other parts are mostly sung with non-significant syllables.
3. Literally “underneath”, which means inferior in hierarchy to the mò tángòlè.
4. Our corpus consisted of 8 polyphonic contrapuntal songs of 4 parts and 8 tale-songs, each of them recorded with different versions, the sum of which is 158 isolated parts.
5. They were collected during several field trips between 1974 and 1983 by the first author.
6. This was done by the second author at Hamburg’s Tonstudio of the Staatliche Hochschule für Musik und darstellende Kunst in 1987 with a sampler CASIO FZ-1, a synthesizer SYNTHI 100 (Electronic Music Studios London Ltd.) and a frequency-meter CA 51 N (Schurig). Other measurements were made at the Department of Music of the Hebrew University of Jerusalem on the Cohen-Katz-Melograph, and others again with the programme FØana2 of the workstation S_TOOLS at the Kommission für Schallforschung, Austrian Academy of Sciences.
7. The first author has worked on Pygmy music since 1971.
8. Knowing that the timbre of synthesized sounds could be problematic for the acceptability of the experiments, the second author first tried to sample Pygmy songs on a PDP 11.04 computer. By this means, the variation of scale-degrees would not have modified the characteristics of the voice timbre proper to the Pygmies. But the synthesis programmes at hand, based on the treatment of speech, were not accurate enough to be used for the simulation of singing; even the simple reproduction of the sampled melody was completely unsatisfying.
9. This was done by Frédéric Voisin.
References

Arom, S. (1976) The use of play-back techniques in the study of oral polyphonies. Ethnomusicology, 20(3), 483–519.
Arom, S. (1985) Polyphonies et polyrythmies instrumentales d’Afrique Centrale. Paris: SELAF.
Arom, S. (1987) La musique des Pygmées. Le Courrier du CNRS, 69–70, 60.
Arom, S. (1991) A synthesizer in the Central African bush: a method of interactive exploration of musical scales. In Für Ligeti. Die Referate des Ligeti-Kongresses Hamburg 1988. Laaber-Verlag, pp. 163–178.
Arom, S. & Dehoux, V. (1978) Puisque personne ne sait à l’avance ce que tout autre que lui-même va chanter dans la seconde qui suit. Musique en jeu, 32, 67–71.
Arom, S. & Fürniss, S. (1991) The pentatonic system of the Aka-Pygmies of Central Africa. In Selected articles of the VIIth European Seminar on Ethnomusicology, October 1990. Berlin: Intercultural Music Studies (in the press).
Chailley, J. (1960) L’imbroglio des modes. Paris: A. Leduc.
Chailley, J. (1964) Ethnomusicologie et harmonie classique. In Les Colloques de Wégimont IV, 1958–1960, pp. 249–269. Paris: Les Belles Lettres.
Chailley, J. & Viret, J. (1988) Le symbolisme de la gamme. Paris: Revue Musicale 408–409.
Fürniss, S. (1991) Die Jodeltechnik der Aka-Pygmäen in Zentralafrika. Eine akustisch-phonetische Untersuchung. Berlin: Dietrich Reimer Verlag (in the press).
Kubik, G. (1985) African tone-systems: a reassessment. Yearbook for Traditional Music, 17, 31–63.
Sallée, P. (1985) Quelques hypothèses, constatations et expériences à propos de l’échelle pentaphone de la musique des Pygmées Bibayak du Gabon. Paper presented at the European Seminar in Ethnomusicology, Belfast, March 1985.
An interactive experimental method for the determination of musical scales in oral cultures
Application to the xylophone music of Central Africa

Vincent Dehoux and Frédéric Voisin
Lacito-CNRS, Paris

Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 13–19
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
When trying to determine the scalar system of the Central African xylophones, one is confronted by the irrelevance of physical measures, which do not correspond to pitch as perceived, as well as by the lack of verbalisation in respect of musical scales in these societies. In order to permit a field investigation into these scalar systems, we conceived an interactive experimental method founded upon the simulation of these instruments with a synthesizer. While the Central African musicians were playing the synthesizer like their own xylophones, we submitted tuning structures to them, which could also be retuned by the musicians themselves. The analysis of the data generates a model of tuning structure which determines the interval ratios, the margin of tolerance, and the pertinent timbre structures of the xylophones. The results show various conceptions of the scalar system corresponding to four Central African ethnic groups, especially in terms of pitch-and-timbre interaction. KEY WORDS: Centrafrican xylophones, experimentation, scalar system, pitch, timbre, synthesis, modelling.
The portable xylophone with multiple gourd resonators is in general use in Central Africa. This kind of xylophone always serves to accompany songs and is found in orchestral ensembles which include percussion instruments. While certain ethnic groups use the xylophone as the leading melodic soloist instrument, one can observe in other groups orchestral ensembles with two, three or four xylophones. The corpus collected since 1983 covers all types of possible formations, which are:
ethnic groups using one xylophone:
– the Manza, who use one xylophone of five bars;
– the Gbaya, who use one xylophone of twelve bars;
and ethnic groups using ensembles of several xylophones:
– the Ngbaka-manza, who use an ensemble of 3 xylophones (of 9, 7 and 4 bars respectively);
– the Banda Gbambiya, who use an ensemble of 4 xylophones (of 8, 7, 7 and 5 bars respectively).

The xylophone musics studied use a scale close to a pentatonic anhemitonic scale. This scale is characterized by a division of the octave into five unequal intervals and above all by the lack of a semitone used as a melodic or successive interval.
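As a concrete reading of this definition, the sketch below (our own illustration; the cents values are invented, not measured xylophone tunings, and the semitone threshold is our assumption) checks a candidate five-degree scale both for unequal step sizes and for the absence of a semitone between successive degrees.

```python
# Hypothetical illustration: test whether a five-degree scale (given in cents)
# is "anhemitonic" in the sense used here, i.e. no successive interval close
# to a semitone, and whether it divides the octave into unequal parts.
SEMITONE_LIMIT = 150.0  # treat successive intervals below ~150 cents as semitone-like (our threshold)

def successive_intervals(scale_cents):
    """Intervals between adjacent degrees, closing the cycle at the octave."""
    degrees = sorted(scale_cents)
    degrees = degrees + [degrees[0] + 1200.0]
    return [b - a for a, b in zip(degrees, degrees[1:])]

def describe(scale_cents):
    steps = successive_intervals(scale_cents)
    anhemitonic = all(step >= SEMITONE_LIMIT for step in steps)
    unequal = len({round(s) for s in steps}) > 1
    return steps, anhemitonic, unequal

# An invented pentatonic tuning (cents above an arbitrary reference degree).
candidate = [0.0, 210.0, 390.0, 705.0, 915.0]
steps, anhemitonic, unequal = describe(candidate)
print("successive intervals:", steps)
print("anhemitonic:", anhemitonic, "| unequal divisions of the octave:", unequal)
```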
The tuning of the xylophones

When trying to determine with precision the scale system used by the xylophones of Central Africa, one is confronted by numerous difficulties such as, for example, the ambiguous nature of intervals. To attempt to overcome this problem, we undertook acoustical measurements1 of xylophone tunings recorded in the field. From the analysis of these measurements, it was not possible to establish a satisfactory and coherent interpretation; there seemed no firm grounds for any particular interpretation. The difficulties in the determination of the xylophones’ tuning appear to be due equally to the presence of two factors: the “roughness” of perceived pitch and the complexity of timbre.

One of the intrinsic qualities of sounds produced by Central African xylophones is their roughness. In this case, the roughness can be considered as a group of “virtual” frequencies close to the fundamental frequency of each degree of the scale. It is a perceptual phenomenon due to the inharmonic structure of the spectrum. The complexity of timbre is tied to the specific organology of these xylophones. Each bar of the xylophones is coupled to its own resonator. On each resonator is placed a mirliton (a small buzzing membrane) whose vibration is added to the bar’s own resonance. But if the mirliton’s forced vibration makes the perception of the pitch unambiguous, its strength is not the same for the different bars of each xylophone. The perception of the intervals is consequently modified in proportion.
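As a rough illustration of the kind of inharmonic spectrum at issue (our own sketch: the partial ratios, amplitudes and frequencies are invented, not measured from these instruments), the code below constructs a bar-like tone whose second partial is displaced from the exact 2:1 ratio; such displacements are one way in which "virtual" frequencies close to the fundamental, and the resulting perceived roughness, can arise.

```python
# Sketch of an inharmonic, bar-like tone: all partial ratios and amplitudes
# are invented for illustration, not measured from Central African xylophones.
import math

def inharmonic_partials(f0, shift_cents):
    """Partials of a tone whose 2nd partial is shifted away from exactly 2*f0."""
    second = 2.0 * f0 * 2 ** (shift_cents / 1200.0)
    return [f0, second, 3.1 * f0]  # a slightly stretched 3rd partial, also invented

def synthesize(partials, amps, duration=0.1, sr=22050):
    """Return raw samples of the summed sinusoids (no file output)."""
    n = int(duration * sr)
    return [sum(a * math.sin(2 * math.pi * f * t / sr) for f, a in zip(partials, amps))
            for t in range(n)]

for shift in (0, 15, 50, 100):  # cents of displacement of the 2nd partial
    p = inharmonic_partials(220.0, shift)
    print(f"shift {shift:3d} cents -> partials {[round(f, 1) for f in p]} Hz")
    samples = synthesize(p, amps=[1.0, 0.5, 0.25])  # samples could be written to audio for listening
```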
The scale of the vocal part

Because of the difficulty of determining the scale from the tuning of the xylophones alone, and also because in every case the xylophones accompany sung melodies, we approached the problem from the angle of the scales used in the vocal part. Listening to the vocal part, another phenomenon appears, tied to the formal structure of the music. These songs are in a responsorial form: a fragment of the melody sung by the soloist singer is followed by a second fragment stated by a choir or by a second singer. The leader and choral vocal parts fall within, respectively, one higher and one lower register, which have some common degrees in a middle register. So, neither of the two
protagonists covers the whole range of the melody; rather, they divide it into two complementary registers in which the realisation of the scale-degrees is not the same from one register to the other.

However, if neither the vocal nor the instrumental parts could help us in determining the scale used in the xylophone music, one might be able to assume a parallelism between the ambiguity of the tuning of the xylophone and the variable realisation of the pitches in the vocal parts. The alternative possibilities would then be as follows:

i) the scale is pentatonic, with a large margin of tolerance in the realisation of the degrees;
ii) the scale is what we shall term a “composite scale”, in which each register has its own type of pentatonic scale, linked together by common degrees; the passage from one register to the other brings a change of pentatonic mode.

1 Measurements made at the Musée de l’Homme in Paris, with the collaboration of Jean Schwarz, and at IRCAM, also in Paris, with the collaboration of Jean-Baptiste Barrière.
2 These last served as a reference point for our experiments.
An interactive simulation tool

The choice between these two possibilities could only be established by a new investigation in the field, because the African musicians were the only ones who could provide evidence for or against the alternatives. For this purpose, it was necessary to establish a new field-work methodology able to address our requirements concerning scalar systems. As a result of a number of talks with specialists like John Chowning, Jean-Claude Risset and Louis Dandrel, Simha Arom had the idea of undertaking this field-work with the support of a Yamaha DX7 II synthesizer. This machine made available a combination of three facilities necessary for the research: (i) each key of the DX7 II could be “microtuned” to make all the scales needed, with a precision of ca. one cent (1/100 semitone); (ii) the successive order in which keyboard sounds were produced could be modified (an important facility, as the topology of Central African xylophones does not correspond to a continuous order of pitches as on, e.g., a piano keyboard); and (iii) novel timbres could be easily generated and modified (a condition indispensable in simulating the sound of each of the xylophones subjected to investigation). Moreover, all of the information concerning the results of the operations undertaken (programming and modifications) could be stored in the memory of the DX7 II.

We considered several different approaches to making use of the synthesizer in the field. The first idea was to present the synthesizer to the musicians as though it were a xylophone of a particular type, and to note the reactions of the musicians when we ourselves played examples of different scales. However, we were obliged to reject this approach because of the difficulty of entering into a discourse on abstract concepts like scale and timbre with the Central African musicians; considering the limited number of cases where musicians have volunteered such information, our results would necessarily have had to be treated with extreme caution.
At this point, we thought to gauge the Central African musicians’ reactions not to “abstract” instrumental tunings, but to the music itself. To this end, we thought of playing a piece of music ourselves, with the help of transcriptions which the first author had made from his field recordings, and of changing the scales on each playing. This approach would depend on our performing competence in a Central African repertoire; because of the strictness of the tempo and of the rhythmic articulation observed in this music, any mistake of rhythm could lead to an immediate rejection by the Central African musicians. So, our ideas on tunings and timbre might be rejected not because of the scales or sounds used, but because of the performance—if the wrong tempo or rhythmic articulation were introduced.

Thus we arrived at the following conclusion: it was necessary for the musicians to play the synthesizer themselves, as though it were one of their own xylophones. In order to make this possible, the keyboard of the synthesizer was transformed; its “western” configuration was made changeable according to the different typologies of Central African xylophones. Longish bars of plywood, large enough to be hit by mallets, were affixed to some of the keys of the DX7, projecting outwards from the keyboard; the keys not to be sounded were rendered mute. In this way, using Velcro to fasten the bars to the keys of the DX7, we were able to simulate at leisure as many types of xylophone on the synthesizer as necessary. Each xylophone player could thus imagine that he was in front of his own xylophone.
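A minimal sketch of the kind of remapping involved is given below; the bar layout, key numbers and cents offsets are all hypothetical, chosen only to illustrate how a subset of keyboard keys can stand in for xylophone bars, each with its own micro-tuning offset, while the remaining keys are treated as mute.

```python
# Hypothetical mapping from xylophone bars to synthesizer keys: all numbers
# below are invented for illustration, not the tunings used in the field-work.
BAR_TO_KEY = {          # bar index -> (MIDI key under the plywood bar, cents offset)
    0: (48, -12.0),
    1: (50, +5.0),
    2: (53, 0.0),
    3: (55, -20.0),
    4: (57, +8.0),
}
MUTED_KEYS = [k for k in range(36, 73)
              if k not in {key for key, _ in BAR_TO_KEY.values()}]

def key_frequency(midi_key, cents_offset):
    """Equal-tempered frequency of a key, displaced by a micro-tuning offset."""
    return 440.0 * 2 ** ((midi_key - 69 + cents_offset / 100.0) / 12.0)

for bar, (key, offset) in BAR_TO_KEY.items():
    print(f"bar {bar}: key {key}, offset {offset:+.1f} cents, "
          f"{key_frequency(key, offset):.2f} Hz")
print(f"{len(MUTED_KEYS)} other keys muted")
```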
Experimental method

The procedures used in our investigations in the field were the same for the different ethnic groups: the synthesizer was presented to a musician, who was asked to play a piece of his own repertoire, and note was taken of whether or not he found the timbres and tunings provided acceptable. It should be stated that none of the xylophone players had any problems with the machine. During the experiments, the xylophone player was surrounded by several people, singers as well as other xylophone players, so that his choices would reflect a general consensus. Our approach always took the form of asking in the first place whether the musicians would accept or reject the tuning structures and timbres which had been programmed. As the sessions progressed, the musicians gave us more and more commentary about their choices. We filmed our work in real time as the sessions took place, from the first moment that the musicians came into contact with the synthesizer until the end of the last work session. The video recorded the presentation order of the tuning structures, the reactions and commentaries of the musicians, and the different manipulations made on the synthesizer.

Having determined the interactive principle of the experiments—using a genuine musical situation wherein the musicians played the DX7 as a "xylophone"—the focus can shift to the scales of the xylophones. Since the problem was the same for each of the different ethnic groups, each experiment was based on the same theoretical tuning structures. These tuning structures were programmed into the DX7 before leaving for the field-work.
Because of the influence of the xylophone's timbre on its perceived pitch, we had to introduce into the tuning structures two other parameters: the degree of roughness and of inharmonicity of the xylophone tones. So each theoretical tuning structure is a combination of three parameters: pitches (scalar tuning system), roughness, and inharmonicity.

(i) Pitch. Since our tuning structures make a distinction between pitch and roughness, we could consider pitches as fixed frequencies. These pitches correspond to different scalar systems according to our hypothesis:
– five possible pentatonic anhemitonic systems
– one possible pentatonic anhemitonic system incorporating a tritone (these were derived from a twelve-tone equitempered system)
– one possible equipentatonic system
– the original tunings of the xylophones, as determined from the acoustic measurement2
It appeared that it was necessary to develop in the field new tuning structures corresponding to the particular scale conceptions of each ethnic group, as revealed by the current experiments. Indeed, the reactions of the Central African musicians to the tuning structures differed from one group to the next. They concerned not only the scalar system of pitches, but also the inharmonicity of timbre. For example, the five-bar xylophone simulation of the Manza people had no less than 25 tuning structures, including several inversions and permutations of the original tuning. For the three-xylophone ensemble of the Ngbaka-Manza people, we programmed scales in which major sevenths or minor ninths replaced the octaves.

(ii) Roughness. Each scalar tuning structure could be combined with roughness, or not. The aim was to determine whether roughness has an independent function in the structure of the scale itself, or whether it is only due to the inharmonicity of the spectrum. In the latter case, we had to concentrate on the timbre and its interaction with pitch.

(iii) Inharmonicity. After having synthesized on the DX7 an initial harmonic—periodic—timbre of a xylophone, we shifted its harmonic components step by step. This shifting concerned essentially the position of the second harmonic, which was progressively displaced from precise harmonicity within a range of ±15 to 100 cents. With this operation, the timbre became more and more inharmonic. The difficulty was to maintain the impression of fusion of the timbre without any change in the other parameters except the harmonic ratios. Our collection contained twelve inharmonic degrees of the same timbre. Acoustical analyses of the spectra, of the synthesized timbres developed by ear and of the originals, gave us the opportunity of verifying the similarity of their spectral structures.

Each theoretical tuning structure, as a combination of pitch, roughness and inharmonicity, was then submitted to the Central African xylophone players, who could test the tuning structures by playing the xylophone simulation on the DX7 and accepting or rejecting each of them. When musicians rejected a tuning structure, we asked them to retune the synthesizer, or to choose another timbre, or simply to say what they considered wrong. Sometimes the tuning structure was so wrong that the xylophone players could not retune it, and said "everything is wrong, I can't do anything". Sometimes just one bar of the tuning structure was wrong, and the musicians could retune it, lower or higher (thanks to the micro-tuning function of the DX7); at times they also expressed a wish to repair the "mirliton" on the gourd.
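As a minimal illustration (not from the original field protocol) of how the pitch component of such tuning structures can be laid out in cents before being loaded into the synthesizer, compare one anhemitonic pentatonic subset of twelve-tone equal temperament with the equipentatonic system; the particular subset chosen here is only an example:

```python
# Candidate scale systems expressed as cent values above an arbitrary reference bar.
TET12_STEP = 100.0            # one equal-tempered semitone, in cents
EQUIPENTA_STEP = 1200.0 / 5   # 240 cents, the equipentatonic step

anhemitonic_degrees = [0, 2, 4, 7, 9]   # one illustrative pentatonic anhemitonic subset
penta_from_12tet = [d * TET12_STEP for d in anhemitonic_degrees]
equipentatonic = [i * EQUIPENTA_STEP for i in range(5)]

print(penta_from_12tet)   # [0.0, 200.0, 400.0, 700.0, 900.0]
print(equipentatonic)     # [0.0, 240.0, 480.0, 720.0, 960.0]
```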
Results and future perspectives

The data were derived both from the synthesizer—pitch adjustments and timbre choices—and from the video—the reactions of the musicians. The latter served as a reference point for our experiments.
As all the experiments were filmed in real time, the video allowed us not only to note a tuning structure as accepted or rejected, but also to determine what the musicians wanted to do. Their comments were important in order to validate models of their scale systems. As noted, we can see in an initial analysis that the conception of scale is quite different for each ethnic group. The importance given to the specific scalar systems used is greater when a single xylophone was played alone than when three or four xylophones were played as an ensemble. The distinction between harmonic and inharmonic timbres is also relevant for the Manza, Ngbaka-Manza and Gbaya populations, but not for the ensemble of four xylophones. Our current conclusion is that for the solo xylophone this harmonic/inharmonic distinction corresponds to different instrumental registers, some bars requiring a stricter harmonicity than others. For the three-xylophone ensemble, this distinction may correspond to different xylophones, a whole xylophone being less inharmonic than the others.

Let us see now what kind of tuning structure model our data analysis suggests for the five-bar xylophone. The rules of the scalar system are:
a) the scale needs three types of intervals:
– a major second (200 cents)
– an equipentatonic interval (240 cents)
– a small minor third (285 cents), which can be replaced by a major second
– the margin of tolerance of these intervals is more or less 15 cents
b) several combinations of these intervals are correct, while others are wrong
c) the principal constraint is the pitch range across the bars: it must be between 900 cents and 940 cents, the mean adjusted pitch range being of the order of 930 cents
d) the roughness is due to the inharmonicity of timbre, which must be avoided at the three highest pitches (or bars)
e) the lower bars can be slightly inharmonic.
We intend to verify these rules in our next field-work trip (a schematic check of these rules is sketched below). Because the synthesizer became increasingly easy to handle for both the researchers and the African musicians, the experimental set-up is able to reveal knowledge which is not commonly verbalised in these Central African cultures.
Here we provide an account of research which places the accent on the problems relating to musical systematics, but even so, such specific work leads quickly into a cognitive dimension. Indeed, it is important to note that the present work with the synthesizer was preceded by a long familiarisation with the musics concerned. In other words, and insofar as this experimental format is characterized by processes involving much coming and going from one side to the other—from the researcher to the informant and back—it has become possible to imagine different protocols of investigation, reversing the conventional roles of the participants. Becoming the active protagonist in the research, the informant musician controls the flow of the processes of discovery according to his own musical behaviour. This kind of transformation of functions is one of the original aspects of this innovative experimental method in the field. The relation informant/researcher is now superseded by a new relation—informant–mediator–researcher—in which, as its name indicates, the synthesizer takes charge of joining up two behaviours which are radically different, suggesting a new kind of agreement between the participants in the research.
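As a concrete, purely illustrative reading of the scalar rules listed above for the five-bar xylophone, the following sketch checks a candidate tuning, given as four successive steps in cents, against rules a), c), d) and e). Rule b), which depends on field data about which interval combinations are acceptable, is not encoded, and the whole encoding is my own rather than part of the authors' protocol.

```python
# Check a candidate five-bar tuning against the scalar rules summarised above.
# intervals: the four successive steps in cents, from the lowest bar to the highest.
# inharmonic: five booleans, True where a bar's timbre is noticeably inharmonic.
ALLOWED_STEPS = (200.0, 240.0, 285.0)   # major second, equipentatonic step, small minor third
TOLERANCE = 15.0                        # cents, rule a)

def acceptable(intervals, inharmonic):
    if len(intervals) != 4 or len(inharmonic) != 5:
        return False
    # Rule a): each step must lie within tolerance of one of the three interval types.
    if not all(any(abs(step - ref) <= TOLERANCE for ref in ALLOWED_STEPS) for step in intervals):
        return False
    # Rule c): the total range across the bars must lie between 900 and 940 cents.
    if not 900.0 <= sum(intervals) <= 940.0:
        return False
    # Rules d) and e): inharmonicity must be avoided on the three highest bars.
    return not any(inharmonic[2:])

print(acceptable([240, 240, 240, 200], [True, False, False, False, False]))  # True
print(acceptable([240, 240, 240, 240], [False] * 5))  # False: total range of 960 cents is too wide
```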
References

Arom, S. (1985) Polyphonies et polyrythmies d'Afrique Centrale. Paris: SELAF.
Arom, S. (1990) A synthesizer in the Central African bush: a method of interactive exploration of musical scales. In für Ligeti. Die Referate des Ligeti-Kongress Hamburg 1988, Hamburger Jahrbuch für Musikwissenschaft, 11 (in press).
Dehoux, V. (1986) Chants à penser Gbaya (Centrafrique). Paris: SELAF.
Dehoux, V. & Voisin, F. (1990) Procédures d'analyse des échelles dans les musiques avec xylophones d'Afrique Centrale. In pre-publication of the VIIth European Seminar of Ethnomusicology. Berlin: IICM.
Jones, A.M. (1971) Africa and Indonesia. Leiden, Netherlands: E.J. Brill.
Pelletier, S. (1988) Description des échelles musicales d'Afrique Centrale: problématique, hypothèses, heuristique. Mémoire de DEA.
Discography

Arom, S.: Central African Republic. UNESCO "Atlas Musical", EMI 1653901.
Duvelle, C.: Musique Centrafricaine. OCORA OC43, Radio-France.
Tracey, H.: Xylophones, "Musical Instruments no. 5", The Music of Africa series. Kaleidophone KMA 5.
The influence of the tambura drone on the perception of proximity among scale types in North Indian classical music

Kathryn Vaughn
Department of Ethnomusicology and Systematic Musicology, University of California, Los Angeles, USA

Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 21–33
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
The unique character of the tambura drone is the result of a nonlinear interaction of the strings with the bridge of the instrument. The placement of juari, or "life giving" threads, at the curved bridge causes the energy to spread into the higher partials. Experimental evidence indicates that the perception of timbre resulting from this acoustical property alters similarity judgments of pitches within the three most commonly used tambura tunings. The rags of North Indian classical music can be grouped into a system known as the Circle of Thats, based on 32 possible scales, of which 10 are the most commonly used. The perceptual relation among the ten scales and the three tambura drone tunings was investigated using multidimensional scaling and cluster analysis of experimental data from both North Indian and Western musicians. It was found that the perceptual relation among the ten scale types in the absence of the drone is very close to the theoretical Circle of Thats. In the presence of the PA-SA drone the scales tend to cluster on the basis of common tones, placement of gaps and tetrachord symmetry. Correlation between subjects was unrelated to original cultural background but significantly related to length of time spent studying this musical tradition.

KEY WORDS: Tambura drone, North Indian rag, nonlinear phenomena, psychoacoustics, figure-ground pattern recognition, indigenous cognition.
Introduction

The classical music of India is based on both theoretical and acoustical systems. The classification of North Indian rags by scale type, or That (framework), is one such system. A series of experiments was designed to determine a possible perceptual basis for the system of That groupings, and to test cross-culturally what effect the psychoacoustical context provided by the tambura drone may have on the perception of proximity among the most commonly used Thats.

The tambura is an unfretted, long-necked lute with four to six strings, used to accompany one or more melodic instruments with percussion. Characterized by a repeating bass pitch pattern and its unusual 'wash' of sound, it provides a shimmering backdrop which permeates the overlay of musical structure created by the other instruments with which it is used. The unique timbral modulation of the tambura drone is the result of the interaction of the strings with the bridge of the instrument. C.V. Raman (1920) showed that, in addition to those harmonics predicted by the Young-Helmholtz law, other partials which have a node at the point of excitation are also generated. This is due to the grazing contact between the string and the curved surface of the bridge. The angle at the point of contact is altered by placing threads of material known as the juari ['life-giving'] threads between each string and the bridge of the instrument. The placement of the juari causes the energy to spread into the higher partials. It has been confirmed that the shimmering, "buzzing" sound of the sitar is due to the wrapping and unwrapping of the strings around the elastic boundary created by the parabolic shape of the bridge (Burridge, Kappraff & Morshedi, 1982). In the case of the tambura, the bridge is flatter, but the addition of the juari creates a similar smoothly rounded shape. The resulting non-linear behavior of the strings creates variation in the string lengths. This produces amplitude modulation and frequency modulation sidebands from upper partials, causing interactions between the individual strings and giving rise to harmonically related clumps (Benade & Messenger, 1982). Initial experimental evidence indicates that the perception of timbre created by addition of the juari threads alters similarity judgments between separate fundamental pitches of the tambura strings within the three primary tunings: {Pa Sa Sa SA} {Ma Sa Sa SA} {Ni Sa Sa SA}. Furthermore, these results suggest that timbre may be at least as strong a factor in rating similarity between single tones as is the fundamental pitch (Vaughn & Carterette, 1989).

Function of the drone

Emergence of the tambura drone in Indian music is documented from the sixteenth century. Its increasing use parallels a transition from a modal system with a movable "tonic" or base pitch, to a system of scales with variable intervals relative to a single basic tone. An equivalent transition occurred in European music as composition based on the "church modes", with their moveable finalis tones, was gradually displaced in favor of the use of transposed modes known as "church keys" and eventually toward functional harmony. The rag and its time cycle, or tal, interact against the constant background of the drone.
Jairazbhoy (1971) holds that the need for resolution within a rag melody is affected by two distinct types of consonant/dissonant polarities. The drone establishes the static framework wherein the relation between any note and the groundnote underlies the dynamic quality of the note, a perceptual aspect of which is the relative tension created and its tendency towards completion, or resolution. Indeed, the tambura does provide a continual unchanging bass pattern, but the constantly shifting emphasis along its rich spectrum simultaneously imbues the performance with a sense of constant fluctuation as well. As the strings are plucked successively in continual, unaccented iteration, the result is a dynamic complex of tones that interact with each other and with the melodic line. In North Indian classical music the four-stringed bass tambura is most often used for accompaniment in one of three tunings, known by the name of the altered string: Pa, Ma, or Ni.

Pa  Sa  Sa  SA4      5  8  8  1      5th
Ma  Sa  Sa  SA4      4  8  8  1      4th
Ni  Sa  Sa  SA4      7  8  8  1      7th
The relative frequencies are as follows:

Sa1     70 Hz    C#2
Ma      93 Hz    F#2
Pa     106 Hz    G#2
Ni     132 Hz    C3
SA4    140 Hz    C#3
Spectral analysis of these three tunings has shown that the distribution of power is strong and significantly harmonically related up through the 20th partial, to approximately 5000 Hz (Carterette, Vaughn & Jairazbhoy, 1989). If the spectrum of the tambura were to interact with the sitar on the level of pitch perception, specific tones of a given melodic pattern could be emphasized. For example, the PA SA drone, which establishes the ground note and the fifth, has partials at 742 Hz, the pitch of F#5, at an amplitude 50 dB greater than the fundamental pitch of the PA string. This means the sitar scale (C#4–C#5) could have its natural fourth degree (F#4) enhanced at the octave of that tone.1 Therefore, one could expect similarity judgments between scales having a sharpened fourth degree and those having the natural fourth degree to be affected either positively or negatively by the addition of the fifth degree in the tambura drone tuning. Notwithstanding the timbre of the instrument, one would expect the addition of a fundamental at 106 Hz to add some measure of dissonance to those same sets of scales in any case.
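A quick arithmetic check of the figures quoted above, using the string frequencies from the table (a sketch only; the equal-tempered A4 = 440 Hz reference used for the note-name comparison is my own assumption):

```python
import math

PA_FUNDAMENTAL = 106.0            # Hz, Pa string from the tuning table above
partial_7 = 7 * PA_FUNDAMENTAL    # = 742 Hz, the partial cited in the text

# Compare with equal-tempered F#5 (assuming A4 = 440 Hz).
f_sharp_5 = 880.0 * 2 ** (-3 / 12)                          # three semitones below A5
cents_off = 1200 * math.log2(partial_7 / f_sharp_5)
print(partial_7, round(f_sharp_5, 1), round(cents_off, 1))  # 742.0 740.0 4.7
```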
A model for the relationship between Thats

Circle of That

The system of Thats formalized by Pandit V.N. Bhatkhande (1930) classified rags into ten major groups based on the set of tones from which each rag in that group was observed to be composed.
A cyclic component is inherent in Bhatkhande's system, since the categories he found were related to the assigned time of day at which each group of rags was (and for the most part still is) performed. Each twenty-four hour day is divided into two segments, from midnight to midday, so that one cycle repeats each twelve-hour period. One cycle is subdivided into three main periods:

4 to 7     Sunrise and sunset {Transition from night to day}
7 to 10    Following sunrise and sunset {Morning/Evening}
10 to 4    Preceding sunrise and sunset {Day/Night}
The rags in each of these time "zones" fall into consecutive subsets, which Bhatkhande identifies as: 4–5–6, those with a natural 3rd and natural 6th; 1–2–3, those with flatted 3rd and 6th degrees; 7–8–9, those with flatted 2nd and natural 7th degrees.2 Jairazbhoy (1974) proposed that this classification system may be considered to have circular properties, based on inter-That relations in musical characteristics such as a cycle of fourths, similar to the Western circle of fifths. Hence the Circle of Thats.
Figure 1 Circle of Thats after Bhatkhande
Note: That 9 is somewhat of a conundrum in that it has taken the place of a hypothetical That which was presumed to have used an altered fifth and is no longer in existence. Its place in the sequence can be considered unstable, so that the locations of 9 and 10 in the circle are interchangeable.

Feature analysis of scale types

A model of the ten scales, based on common tone relations, arrangement of successive intervals, and internal symmetry, was created in order to determine how various strategies might affect the mapping of proximities among the modes. Figure 2 lists the ten Thats in traditional Western notation, which, in spite of deviation from equal temperament, represents very well the relative differences between each of the modes. Figure 3 shows the same ten scales represented as shapes formed around the size of the gaps between the tones of each scale. The overlaid geometric contour represents the difference curve between each set of tones. Each profile represents the step size from note to note as a peak of either 1, 2, or 3 {semitone, whole tone, diminished 2nd}. The pitch series has been detrended so that the octave rise from C to C, present in all the scales, is not a factor in the analysis. Modelling the scale patterns in this way helps to visualize the component features which appear as each scale is heard through time. For instance, one can easily see that the beginning and ending slopes create some sense of symmetry or asymmetry. Also, a sequence of even step sizes creates a plateau in the difference contour. It is possible to derive a measure of distance between any two scales by numerically encoding the sequences in question. For instance, the ten Thats are seven-tone subsets of the set of twelve possible tones.

Sa           1   C
Re-komal     2   Db
Re           3   D
Ga-komal     4   Eb
Ga           5   E
Ma           6   F
Ma-tivr      7   F#
Pa           8   G
Dha-komal    9   Ab
Dha         10   A
Ni-komal    11   Bb
Ni          12   B
Bilaval That (major scale) can thus be represented as:

1  3  5  6  8  10  12
{C  D  E  F  G  A  B}

Kalyan That (lydian mode) can be represented as the sequence:

1  3  5  7  8  10  12
{C  D  E  F#  G  A  B}
Based on this representation of the scales as profiles of tone sequences, the 'distance' between these two scales has a value of (1). Thus a matrix of distances between each pair of scales was derived.3 Similarly, two additional sets of distances were produced by encoding the scales on the basis of successive interval size (the size of each peak as in Figure 3) and on the differences between successive intervals. A simulated "perceptual space" for each condition was then determined by applying multidimensional scaling (MDS)4 to those calculations. Figure 4 is the two-dimensional MDS solution derived from distances based on common tones. Figure 5 is derived from the gap-size solution based on the size of successive intervals in each That. The resemblance between the Circle of That and the MDS solution for the common tones is striking, but quite logical if one considers that for the most part each That happens to be one tone different from its predecessor. The circular space is very similar to that of the color scale described by Shepard (1962). The classes of data structures which give rise to "circumplexes" are discussed by Guttman (1954). The present data may arise from the Cartesian product of points on two simple underlying dimensions. Since there is no linear solution for more than three points which are consecutively separated by a distance of (1), two dimensions are necessary to describe the space. Further, the analysis clearly showed that the three-dimensional solution yielded a negligible reduction in Kruskal stress, and the individual weights for the third dimension under INDSCAL were consistently insignificant.
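A minimal sketch of the common-tone encoding just described (the two That encodings are taken from the text above; treating the 'distance' between two Thats as the number of tones by which they differ is my reading of the description, and the MDS step itself is not shown):

```python
# Each That is encoded as its set of chroma numbers (1-12, as in the table above).
bilaval = {1, 3, 5, 6, 8, 10, 12}   # C D E F G A B
kalyan  = {1, 3, 5, 7, 8, 10, 12}   # C D E F# G A B

def common_tone_distance(a: set, b: set) -> int:
    """Number of tones by which two seven-tone scales differ."""
    return len(a.symmetric_difference(b)) // 2

print(common_tone_distance(bilaval, kalyan))  # 1, as stated in the text

# A full 10 x 10 matrix of such distances between the ten Thats would then be
# submitted to multidimensional scaling to obtain the simulated perceptual space.
```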
Figure 2 The Thats
Figure 3 Profiles based on size of successive intervals and differences between intervals
Figure 4 Two dimensional perceptual space simulated using similarities based on number of common tones between Thats
Figure 5 Two dimensional space simulated using similarities based on comparison of successive gaps in each scale

Experiments

The experimental design was intended to discover the effect of the drone. However, the process seems to have uncovered additional evidence for a perceptual basis for the Circle of That. The work of Castellano, Bharucha & Krumhansl (1984) suggested that correlations of hierarchical ratings of tones from ten rag phrases, falling into eight of the Thats, could be represented by a somewhat circular MDS space.5
The common tone space presented above in Figure 4 explains that circular relationship. However, the effect of the PA-SA drone is found to disturb that arrangement.

General experimental procedures

The ten scales and the PA-SA tambura drone were played by Ustad Imrat Khan and recorded separately on a Panasonic SV 250 Digital Audio Tape recorder and two Neumann FET 100 condenser microphones. Each scale was then combined with the drone in a professional recording studio using a Studer A-80 24-track recorder and SSL mixing console. The 20 stimuli were then digitally sampled to disk at 50,000 samples per second with 16-bit resolution. Playback was at the same rate as sampling and was smoothed and converted with a cut-off of 20 kHz. The group of subjects were experts in this music and included both Indian musicians and Westerners who had studied North Indian music for at least ten years and perform the music in a professional capacity. In all cases the number of years spent learning this musical tradition was at least 50% of the subject's life. Random pairs of scales were presented to the subjects, who rated the similarity on a continuous scale ranging from 0 to 100 by selecting a point on the computer screen corresponding to their choice. The data were then stored for analysis.

Experiment 1: Scaling scales

The purpose here was to see how the musicians' perceptual scaling of the Thats without the tambura drone in the background compared to the common tone and gap-size models described above, and to the Circle of Thats. All possible pairs of the ten modes, including identical pairs, were presented in random order to the group of experts. The two-dimensional multidimensional scaling solution derived by ALSCAL is shown in Figure 6.6 The results show a striking resemblance to the Circle of Thats discussed above. The scales are dispersed in a circular manner from 1 (Bhairvi) to 10 (Tori), with number 9 (the conundrum) being slightly out of line, as would be expected for the reason given above. The spacing from one That to the next is consistent, except for the grouping of 7–8–9–10, all of which contain at least one gap of a diminished 2nd (about three semitones). This grouping is supported by cluster analysis using the technique of overlapping clusters, which allows a non-exclusive clustering pattern. The first two groups to appear from this proximity matrix are divided between {7, 8, 9, 10} and {1, 2, 3, 4, 5, 6}, with no overlap.

Experiment 2: Effect of the tambura drone

The same group of subjects rated similarity among the Thats in the presence of the PA-SA drone. The order of presentation between Experiments 1 and 2 was also randomized. The results of similarity scaling in the presence of the PA-SA drone, the most commonly used tambura drone tuning, are shown in Figure 7. Here the scales start to cluster more clearly into Bhatkhande's original groupings, i.e. {4–5–6} {1–2–3} {7–8–9–10}. A sub-division also appears between 7–8 and 9–10, with 10 moving somewhat closer to That number 1. It thus appears that the presence of the PA-SA drone creates a context in which the subjects categorize more strictly than in the absence of the PA-SA drone.
The tighter groupings seem to be related more by features, as in Figure 4, than simply by the number of tones two scales have in common. Furthermore, there is some increase in the separation between the protomajor, protominor, and gapped scales. If one were to make a case for the influence of the tambura harmonics on these groupings, it would have to be limited to the strength of the major triad within the partial structure of the drone. Otherwise the natural seventh present in mode 10 (Tori) would set it further from mode 1 (Bhairvi) instead of closer to it. It is more likely that features become increasingly influential because the framework provided by the PA-SA drone gives the listener more information. The essential point here is that this most common of drone tunings does alter judgment of proximity, even within a highly trained group of subjects who claim that the PA-SA drone is such an integral part of the music that they always assume its presence, even if it is not there. In general, the subjects asserted that they performed identically under both conditions. There was no significant difference between the musicians born and educated in India and the non-Indian Americans. However, initial analysis shows that the correlation across subject spaces for each condition (with and without the drone) is significant on the basis of the number of years spent studying this musical tradition.
Figure 6 Perceptual space for the ten Thats without the tambura drone

Figure 7 Perceptual space for the ten Thats in the presence of the PA SA (I–V) drone

Discussion

It is clear that the tambura drone establishes a context on two levels: the first a tonal center, i.e. the harmony of the fundamental pitches; the second, a subtle reinforcement of the upper partials. A third factor, however, must also be considered. As the partials are strengthened at higher and higher frequencies, the apparent "buzziness" creates a separate effect. It is this aesthetically pleasing distortion which is most obvious to the listener. The buzz is a constant source of arousal, transforming the repetitive bass pitch pattern into the reference point for a binary distinction between the melody and the background pattern.

Preliminary results on the additional two drone tunings

Experiments using the NI SA tuning (7 8 8 1), which is the most dissonant of the three, show similar results to PA-SA, with slight differences in distances within the subsets. However, the MDS solution for the MA SA tuning (4 8 8 1) places the Thats in an even more evenly distributed circular pattern than under the condition of no tambura. That 9 (Bhairav) comes much closer to 1 (Bhairvi). The ambiguity created by including the interval of the 4th in the drone is somewhat like modulation to the dominant and seems to create a new tonal center entirely. The results thus far indicate that modeling cognition of melodic patterns should include consideration of the performance context, i.e. the musical fabric, as well as the influence of culturally specific prototypes which may have been learned by musicians and audiences. For the higher-level music processing task presented here, the amount of exposure to the performance practice was more of a factor than the original cultural background of the musician. The listener's integration of background and foreground information seems to enhance the encoding process, contributing cues regarding switching and summations between explicit and implicit levels of processing.

Acknowledgements

This research has been supported by grants from the Charles F. Scott Foundation and the Regents of the University of California. I would like to express my appreciation to the Ali Akbar Khan College of Music and the Los Angeles Philharmonic Institute for their generous cooperation. Special thanks to Professors Nazir Jairazbhoy, Edward Carterette, Roger Kendall, Sue Carole De Vale, and Eric Holmann for their expert advice. I am deeply indebted to Ustad Imrat Khan for contributing his invaluable time and consummate talent in support of this work.
Notes

1. I have measured the F#4 to be a frequency of 371 Hz, or 508 cents above the C#4 at 278 Hz.
2. Bhatkhande associates That number 10 (Tori) with the group 1–2–3 on the grounds that rag Tori is sometimes played with both natural and sharpened fourth degrees.
3. The distance matrix used for the MDS space presented here was derived by the CLUSTAN program, which will give a distance matrix for any set of profile data using values between 0 and 1.
4. Multidimensional scaling is a class of mathematical methods which try to find a space of dimensional size n which best fits a set of points according to some minimum-error criterion. If n=1, a line fits the points; if 2, a plane; if 3, a cube or sphere, and so forth. The iterative criterion used here was for 2 dimensions. The task of interpreting the dimensions is, of course, the problem peculiar to a given domain, in this case music and cognition.
5. Although there are some problematic issues in that excellent study (i.e. the use of transcriptions by Danielou, who embedded his own notion of a hierarchy in his versions of the rag phrases, synthetic stimuli, and some misinterpretation of Jairazbhoy's theory), the results suggested that neither incorrect temperament nor "unreal" timbre would interfere with subjects' recognition of the common tone relation between those eight Thats.
6. Disregard the order of 9 and 10. For purposes of illustration the points are joined by an interpolative smooth line fitted using a spline routine which fits a cubic spline that minimizes a linear combination of the sum of squares of the residuals of fit and the integral of the square of the 2nd derivative.
References

Bhatkhande, V.N. (1930) A comparative study of some of the leading music systems of the 15th, 16th, 17th and 18th centuries—A series of articles published in Sangita, Lucknow.
Benade, A.H. & Messenger, W.G. (1982) Sitar spectrum properties. Journal of the Acoustical Society of America, Supplement 1, 71, 583.
Burridge, R., Kappraff, J. & Morshedi, C. (1982) The sitar string: A vibrating string with a one-sided inelastic constraint. SIAM Journal of Applied Mathematics, 42, 1231–1251.
Carterette, E.C., Vaughn, K. & Jairazbhoy, N.A. (1989) Perceptual, Acoustical and Musical Aspects of the Tambura Drone. Music Perception, 7, No. 2, 75–108.
Castellano, M.A., Bharucha, J.J. & Krumhansl, C.L. (1984) Tonal Hierarchies in the Music of North India. Journal of Experimental Psychology, 113, No. 3, 394–412.
Jairazbhoy, N.A. (1971) The rags of North Indian Music: Their structure and evolution. London: Faber & Faber.
Raman, C.V. (1920) On some Indian stringed instruments. Indian Association for the Cultivation of Science, 7, 29–33.
Shepard, R.N. (1962) The analysis of proximities: Multidimensional scaling with an unknown distance function. II. Psychometrika, 27, 219–246.
Steblin, Rita (1983) A History of Key Characteristics in the 18th and 19th Centuries. Ann Arbor, Michigan: UMI Research Press.
Vaughn, K. & Carterette, E.C. (1989) The effect on perception of tambura non-linearity. Proceedings of the First International Conference on Music Perception and Cognition, 187–191.
Constraints on music cognition—psychoacoustical

Pitch properties of chords of octave-spaced tones

Richard Parncutt
Department of Music Acoustics, Royal Institute of Technology, Stockholm, Sweden

Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 35–50
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
Listeners were presented with simultaneities of 1, 2, 3, or 4 octave-spaced tones (Shepard tones). In Experiment 1, they were asked how many tones they heard in each chord (its multiplicity). In Experiment 2, they heard a chord followed by a tone, and were asked how well the tone went with the chord; this resulted in a tone profile for each chord. In Experiment 3, they heard successive pairs of chords, and were asked to rate their similarity. The experiments may be regarded as octave-generalized versions of experiments reported in Parncutt (1989). Results were modelled by adjusting and extending a psychoacoustical model for the root of a chord (Parncutt, 1988). The model predicts the multiplicity of a chord, the salience (probability of noticing) of each tone in a chord, and the strength of harmonic relationships between chords (pitch commonality). Implications for the theory of roots, implied scales, and harmonic relationships are discussed.

KEY WORDS: Pitch salience, chord, root, tone profile, pitch commonality, similarity, multiplicity.
Introduction

In Western music theory chords have roots, and imply scales. For example, a C added sixth chord (CEGA) normally has the root C, and implies the scale of C major. Roots and implied scales are usually somewhat ambiguous: a C6 chord may have other roots (such as A), or imply other scales (such as G major), depending on context. This paper investigates the roots and scales implied by musical chords by comparing the results of listening experiments with calculations according to a psychoacoustical model. The model accounts for roots and implied scale tones by means of a single parameter, pitch salience. Roots are supposed to have high pitch salience, or perceptual importance; additional, implied scale tones have intermediate pitch salience. The model further accounts for harmonic relationships, measured by similarity judgments of pairs of chords, by means of a parameter called pitch commonality. The model explains the sensory origins of roots, implied scales, and harmonic relationships, but neglects culture-specific effects such as conditioning by particular, arbitrary chord sequences.

Octave-spaced tones were used in the experiments so as to enable octave-generalized aspects of music theory to be investigated as directly as possible. By building chords from octave-spaced tones, effects of octave register (pitch height) and voicing were minimized. Remaining register effects (such as the "tritone paradox" investigated by Deutsch, 1987) were avoided in Experiments 2 and 3 by random transposition of trials (see procedure sections). In this research, octave equivalence is regarded firstly as an axiom of music theory. As a perceptual phenomenon, it is assumed to be primarily learned from music (see e.g. Burns, 1981). It would appear to be unnecessary to postulate a neurophysiological basis for octave equivalence, as e.g. Ohgushi (1983) has done, in order to account for octave-generalized aspects of music theory.

The paper begins by considering the number of simultaneously noticed tones in a chord, here called its multiplicity.1 This parameter is later used in the model to scale pitch saliences as absolute values, representing probabilities of noticing.
Experiment 1: Multiplicity

The number of tones simultaneously noticed in a musical chord does not necessarily correspond to the number of pure tone components (Thurlow and Rawling, 1959) or complex tone components (DeWitt and Crowder, 1987; Parncutt, 1989). In the present experiment, sounds were constructed from octave-spaced tones (Shepard, 1964) and listeners were asked how many such tones they heard. Apart from providing some new experimental data, the experiment aimed to test the algorithm for pitch ambiguity in Parncutt (1988). A similar algorithm (for multiplicity) is presented below as part of a model for the salience of a chroma (pitch class) in an octave-generalized chord, e.g. a chord made of octave-spaced tones. The multiplicity algorithm allows saliences to be expressed as absolute values: "probabilities of noticing" which may be compared across different chords.
Method

Listeners. 26 adults participated in the experiment. Their musical experience (here measured in terms of the number of years spent regularly practising or performing music, on either an instrument or voice) had a mean of 11 years and a standard deviation of 10 years.

Equipment. Waveforms were calculated by adding pure tone components in alternating cosine and sine phase (to reduce maximum amplitude) and transferred by analog signal to a digital sampling synthesizer (Casio FZ-1). During the experiment, sounds were called via MIDI by a Le_Lisp program running on a Macintosh II personal computer. They were amplified and reproduced over a loudspeaker in a sound-isolated room. Listeners responded by pressing keys on the computer keyboard.

Sounds were composed of octave-spaced tones of equal amplitude. By contrast to the tones used by Shepard (1964), pure tone components had equal amplitude (before amplification) across the range 16 Hz to 16 kHz.2 All pure tone components were tuned to the standard equally-tempered scale, with A=440 Hz and no octave stretching. Twenty different sounds were presented, each in two different transpositions, six semitones apart. Pitches were chosen to produce a balance around the chroma cycle, so as not to emphasize any particular pitch. The 20 sounds consisted of one single octave-spaced tone (monad), six dyads of octave-spaced tones (spanning intervals 1 to 6 semitones), five triads (037, 047, 048, 036 and 057) and eight tetrads (047Q, 037Q, 047L, 037L, 0369, 036Q, 057Q and 046Q).3 Sounds had durations of 0.2 s. All components in each sound started exactly simultaneously; this was important, as the auditory system is remarkably sensitive to asynchrony in onset times, and uses asynchrony to discriminate musical tones in performance (Rasch, 1978). Overall loudness was adjusted to a comfortable level by each listener.

Procedure. In each trial, one of the 20 sounds was presented twice, with a pause of 0.5 s between presentations. The task was to indicate how many tones they heard in the sounds. No upper limit was set on their responses. Listeners could take as long as they wished to respond. They were asked, however, to respond spontaneously, without thinking too hard. It was stressed that this was not a test of musical ability. Each sound was presented twice, making a total of 40 trials. To avoid serial effects, trials were presented in a random order which was different for each listener. The experiment was preceded by a practice session. During the practice, listeners were told after they responded whether the chord had contained one, or more than one, (octave-spaced) tone. In the experiment proper, no feedback was given.
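A minimal sketch of how such octave-spaced tones could be synthesized (the sample rate, normalization and starting octave are illustrative assumptions; the alternating cosine/sine phase and the equal-amplitude 16 Hz to 16 kHz component range follow the description above):

```python
import numpy as np

SAMPLE_RATE = 44100  # Hz, an assumption for this sketch

def octave_spaced_tone(chroma_hz: float, duration: float = 0.2) -> np.ndarray:
    """Sum equal-amplitude partials of `chroma_hz` spaced by octaves between
    16 Hz and 16 kHz, alternating cosine and sine phase to reduce peak amplitude."""
    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    f = chroma_hz
    while f / 2 >= 16.0:        # drop to the lowest octave transposition above 16 Hz
        f /= 2
    signal = np.zeros_like(t)
    use_cosine = True
    while f <= 16000.0:
        signal += np.cos(2 * np.pi * f * t) if use_cosine else np.sin(2 * np.pi * f * t)
        use_cosine = not use_cosine
        f *= 2
    return signal / np.max(np.abs(signal))   # normalization is illustrative only

tone_a = octave_spaced_tone(440.0)           # an A-chroma octave-spaced tone
chord_047 = (octave_spaced_tone(261.63)      # a major-triad chord class built from
             + octave_spaced_tone(329.63)    # three octave-spaced tones
             + octave_spaced_tone(392.00))
```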
Results

The non-musicians initially had some difficulty with the task, but after some practice found they were able to distinguish a small number of response categories (e.g. 1 to 3). Still, they were not sure that they actually heard all the tones that they guessed were present. Musicians' responses generally covered larger ranges. Many said afterwards that they had responded on the basis of musical experience (e.g. responding "4" on
recognizing a seventh chord). All results correlated positively with the actual number of tones in the sounds, so none were eliminated from the analysis.
Figure 1 Points: mean responses (52 data per point). Bars: 95% confidence intervals. Squares: calculations according to (7) with kM=45, kW=3.9, and kS=0.91 (see below). Chord classes are indicated by intervals in semitones above the nominal root (e.g. "047"=major triad).

Results are graphed in Fig. 1 as means and 95% confidence intervals of responses.4 Responses corresponded closely to the "actual" number of octave-spaced tones in each sound (but clearly not to the number of pure tone components) for the monad, dyads and triads. In the case of the monad, the result was not surprising, as listeners had been taught to recognize monads during the practice.5 The tetrads were heard to contain about 3.5 tones, agreeing with Huron's (1989) finding that the accuracy of identifying the number of concurrent voices in polyphonic music drops markedly at the point where a three-voice texture is augmented to four voices. There was an additional tendency for consonant sounds to have lower, and dissonant to have higher, multiplicity. So, for example, dyad 05 (perfect fourth/fifth) was heard to have significantly fewer (p<0.05) tones than dyad 04 (M3/m6), triad 047 (major) fewer than triad 048 (augmented), and tetrad 047Q (major-minor) fewer than tetrad 047L (major) with its dissonant semitone (L0). It is not clear whether this was a "real" effect, due to the number of tones actually heard, or an artifact: many listeners appear to have used
dissonance as a basis upon which to guess the number of tones in each sound. Neither of these approaches explains why the mean response for the dissonant 037L (minor-major) tetrad was lower than that for the 047L (major) tetrad.
Experiment 2: Pitch analysis

Krumhansl and Kessler (1982) measured the relative musical importance of the twelve chroma in octave-generalized musical chords by presenting a chord (constructed from octave-spaced tones) followed by a tone (also octave-spaced) and asking how well, in a musical sense, the two went with each other. In the resultant tone profiles, the highest peak normally corresponded to the root of the chord, and subsidiary peaks corresponded to other notes. In the experiment to be described, I repeated this paradigm for five different chord classes, and compared results with calculations according to a psychoacoustical model for the root of a chord (Parncutt, 1988).
Method

Listeners. 32 people participated. Of these, 5 were later eliminated, as their results correlated negatively with the actual presence or absence of tones in the chords. The eliminated listeners had very little or no experience of practising or performing music. The remaining 27 had a mean of 12 years, standard deviation 9 years, of musical experience.

Equipment was the same as in Experiment 1.

Sounds. Five of the chords presented in the previous experiment were analyzed: the major triad 047, the minor triad 037, the major-minor tetrad 047Q, the half-diminished tetrad 036Q and the diminished tetrad 0369. As before, the chords were composed of octave-spaced tones. Comparison tones were also octave-spaced.

Procedure. The listeners considered this experiment more difficult than Experiments 1 and 3. So in most cases they did it after Experiment 3, to allow themselves more time to become familiar with the kinds of sounds used, and the general procedure. The experiment began with a practice session, during which listeners were told (after each response) whether the tone in each chord-tone pair had actually been part of the chord. In the experiment proper, no feedback was given. There were 60 trials, in which 5 chords were each compared with tones at 12 chromatic intervals above their (nominal) roots. Trials were presented in a random order that was different for each listener. Each trial consisted of a chord followed by a tone, then the same chord-tone pair repeated. Both chords and tones had durations of 0.2 s. The time interval between chord and tone was 0.35 s; between repetitions, 0.55 s. Each chord-tone pair was transposed through a random chromatic interval, with the exception of the 12 pairs including the diminished tetrad 0369, which were held at the same pitch (so as to allow investigation of absolute pitch effects: see below). As before, listeners could take as long as they wished to respond, but were asked to do so as spontaneously as possible.
The task was to indicate how well the tone went with the chord on a scale from 0 (very badly) to 3 (very well). If listeners thought that the tone was actually in the chord, they were asked to select the response “3”.
Results

Results for all five analysed chords are shown in Fig. 2. Mean responses were generally higher for notes actually in the chord. In the case of the major-minor tetrad 047Q and the half-diminished tetrad 036Q, there was a further clear peak in the responses at the root (0 and 3 respectively). So 036Q was most often heard as a minor sixth chord (0379). Surprisingly, in the case of the major (047) and minor (037) triads, the response at the root (0) was not significantly different from that at the fifth (7). In ordinary musical voicings of these two chords, the root is clearly more salient than other chroma (Parncutt, 1989; Terhardt, Stoll, and Seewann, 1982a).

A pitch height (or absolute pitch) effect occurred for the diminished tetrad 0369, which (unlike the other chords) was not subjected to random transpositions. Of the four results corresponding to actual notes, the lowest occurred at interval 9, which in this case was always the note A. This effect is consistent with the finding of Terhardt, Stoll, Schermbach, and Parncutt (1986) that the distribution of the main pitches of octave-spaced tones is centred around 300 Hz, or between D4 and Eb4 (Eb4 lies opposite A on the chroma cycle).

Results for notes not actually present in the chords also showed some structure. The major triad 047 went better with notes in the corresponding major scale than with other notes: specifically, it went better with 2 than 1, 5 than 6, 9 than 8, and 11 than 10. Similarly, the minor triad 037 went better with notes in the corresponding harmonic minor scale (2 than 1, 8 than 6, 10 than 11). The major-minor tetrad 047Q went better with notes in the major scale on 5, the scale in which it is a dominant seventh chord (2 better than 1, 5 than 6, 9 than 8). The half-diminished tetrad 036Q went particularly well with 8, a note which, if actually present, would function as its root, producing a dominant ninth chord.
Figure 2 Points: mean responses (27 data per point). Bars: 95% confidence intervals. Squares: calculations according to (8) with kM=8, kW=2.2, and kS=0.9. Calculations are adjusted linearly to have the same mean and standard deviation as mean responses over all 60 values.
According to music theory (as well as the model of Parncutt, 1988), the diminished tetrad 0369 should go well with the notes 1, 4, 7 and Q (all of which can function as the root of the chord). In the experiment, however, these notes went no better with the chord than the remaining chroma (2, 5, 8, L).
Experiment 3: Similarity

The harmonic relationship between musical chords may be modelled by their pitch commonality, or the extent to which they have pitches in common, and measured by similarity judgments (Parncutt, 1989). The aim of the current experiment was to measure the strength of harmonic relationships for some common chord progressions in music theory.
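In the modelling reported below (see the caption of Figure 3), the predicted relatedness of two chords is taken as the correlation between their tone profiles. A minimal sketch of that computation, using made-up placeholder profiles rather than data from the experiments:

```python
import numpy as np

def pitch_commonality(profile_a, profile_b) -> float:
    """Correlation between two 12-element tone (chroma-salience) profiles,
    used as a rough index of how harmonically related two chords are."""
    a = np.asarray(profile_a, dtype=float)
    b = np.asarray(profile_b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])

# Made-up illustrative salience profiles over the chroma 0..11 for two chords.
chord_047 = [1.0, 0, 0.1, 0, 0.8, 0.1, 0, 0.9, 0, 0.2, 0, 0.1]   # a major-triad-like profile on 0
chord_7L2 = [0.2, 0, 0.9, 0, 0.1, 0, 0, 1.0, 0, 0.1, 0, 0.8]     # the same shape a fifth higher
print(round(pitch_commonality(chord_047, chord_7L2), 2))
```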
Method

Listeners. 26 people took part, with mean 11 years and standard deviation 10 years' experience of regular musical practice.

Equipment and sounds were as previously.

Procedure. Five chord pairings were each compared with 12 different chromatic intervals between the two roots. In the first 12 trials, both chords were major triads, so the actual chord pairs were 047–047, 047–158, 047–269, … 047–L36. In the next 12 trials, major triads were compared with minor triads (047–037, 047–148, … 047–L26). Then, major-minor tetrads were compared with major triads (047Q–047, 047Q–158, etc.), and half-diminished tetrads with major-minor tetrads (036Q–047Q, 036Q–158L, etc.). Finally, diminished tetrads were compared with major triads (0369–047, 0369–158, 0369–269), minor triads (0369–037, –148, –259), major-minor sevenths (0369–047Q, –158L, –2690) and half-diminished tetrads (0369–036Q, –147L, –2580). All but the major-major (047–047, etc.) pairs were also presented in the reverse order (e.g. 037–047 as well as 047–037), making a total of 60+48=108 trials. Each listener heard the trials in a different random order, and each chord pair was transposed through a random chromatic interval. Listeners were asked to rate the similarity of the chords on a 4-point scale from 0 (very different) to 3 (very similar). Musicians were told that "similarity" may be interpreted as meaning "harmonically related"; they were nevertheless asked to avoid thinking in terms of music theory and to respond instead according to how related they perceived the chords to be. As before, listeners were allowed to practice for as long as they wished.
Results

Figure 3 shows the results for all trials. Note that the confidence intervals in part (a) of the figure are larger than in parts (b) to (e), as pairs of major triads (part a) were presented only once in the experiment, while the other chord pairs were presented in two different orders.
Figure 3 Points: mean responses (26 data per point in part a, 52 in parts b–e). Bars: 95% confidence intervals. Squares: correlation coefficients between tone profiles according to (8), with kM=4 and kW=2.1, adjusted linearly to have the same mean and standard deviation as mean responses over all 60 values. In part e, the comparison chord is 047 (major triad) for the first three points, 037 (minor triad) for the second three, 047Q (major-minor tetrad) for the third three, and 036Q (half-diminished tetrad) for the last three.

Figure 3a contains the only trial in which the chords were identical. Almost all listeners responded "very similar" to this trial. Next in the rank order of mean responses were pairs of major triads (047) with intervals of 5 and 7 semitones between their roots. Surprisingly, the rising fifth (7) was judged more similar than the musically more usual rising fourth (5) (cf. Bharucha and Krumhansl, 1983). Next in line were third relationships (3, 4, 8—but, surprisingly, not 9—semitones). Like fifth relationships, these involve one common tone (for example, the tone 7 is common to 047 and 37Q). The low response at the tritone interval (6) is in line with music theory.

Part (b) of the figure shows results for pairs of major (047) and minor (037) triads. The "tonic minor/major" relationship (interval 0) was judged most similar. Next comes interval 4—the mediant relationship in the major key, or the submediant in the minor (both called Gegenklang by Riemann, 1893; De la Motte, 1976)—and the familiar dominant-tonic cadence in a minor key (interval 5). Only after these comes the relative major/minor relationship (9), together with the dominant minor or subdominant major relationship (5). The response for the unusual progression 047–148 (interval 1) was presumably enhanced by the presence of a common tone (4).

The results in Figure 3c are headed by interval 0 (for the almost identical chords 047Q and 047). Next, surprisingly, comes interval 4 (the "German augmented sixth" relationship in the major key), along with the familiar dominant-tonic (interval 5). After this come interval 7 (cf. the progression II–V7), interval 8 (cf. V7–III in the minor key), and interval 3 (e.g. I–VI7).

The results in Figure 3d are less structured, due to the relatively high pitch ambiguity of the half-diminished tetrad 036Q (allowing for many equally satisfactory resolutions or preparations). Similarity with the major-minor tetrad 047Q is highest for intervals 0, 2, 3, 6, 8 and 11 semitones between the nominal roots. Of these, the progression at interval 2 is unusual, apparently because it involves parallel fifths; interval 11 corresponds to the resolution of the Tristan chord (from the opening of Wagner's Tristan und Isolde); and the other progressions tend in musical contexts to sound like chromatic shifts over the same root. The most functionally important resolution of the half-diminished tetrad, ii7–V7 in a minor key (corresponding to interval 5 in the figure), is next in the rank order of similarity.

Figure 3e shows resolutions of (or preparations for) the diminished tetrad 0369. For the major and minor triads (the first six points), similarity is higher for interval 0 (e.g. 0369–047) due to pitches in common, and for interval 1 (e.g. 0369–158), which is functionally important due to its good voice-leading and V–I implication. Overall, however, similarity is low at these six points due to the difference in timbre (or consonance) between the major/minor triads and the diminished tetrad. In the case of the major-minor tetrad 047Q, similarity is highest for interval 2 (0369–2690: cf. the resolution of a minor ninth interval
to an octave over a dominant seventh chord) due to pitches in common. The high result for the half-diminished tetrad 036Q at interval 0 is also due to common pitches.
Model

Results were simulated by a model of pitch salience in chords of octave-spaced tones. The model is based on the model for the root of a musical chord of Parncutt (1988). It may also be regarded as an octave-generalized version of the model of pitch salience and tonal relationship of Parncutt (1989). These models are in turn based on Terhardt's (1982) model for the root of a chord, and his algorithm for the pitch and pitch salience of complex tonal signals (Terhardt et al., 1982b). In Parncutt (1988), I aimed primarily to predict the root of a chord class (i.e. a chord built from full harmonic complex tones, which may appear in different inversions, spacings and doublings). The present model aims instead to simulate the results of the above experiments on pitch properties of octave-spaced tones. The experiments suggested that pitch salience in chords of octave-spaced tones is similar to, but not exactly the same as, musical chord-root salience (consider, for example, the responses at intervals 0 and 7 in Figure 2, parts a and b). Accordingly, the present model differs from its predecessor in a number of ways. It accounts for masking, places less emphasis on the recognition of harmonic pitch patterns, and contains some free parameters.
Input

As the stimuli were realized from octave-spaced tones, it is appropriate to restrict the model to a single octave register. This register may be supposed to lie in the most important region of pitch perception—say, 500 to 1000 Hz (Fletcher and Galt, 1950), or register 5. Within this register, chroma (pitch classes) c take values 0, 1, 2,…9, Q and L, corresponding to the musical note names C, C#/Db, D,…A, Bb/A#, and B. Pure tone components are assumed to have SPLs of 50 dB. The sounds in the experiment were not as quiet as this may suggest, as they included pure tone components in many critical bands, covering almost the entire range of hearing (see Zwicker, Flottorp, and Stevens, 1957). Note also that the predictions of the model are almost independent of input level over quite a wide range of levels.
Masking and audibility

Masking is accounted for in the model between all pure tone components less than one octave apart. Each octave-spaced tone masks every pure tone component of every other octave-spaced tone from both sides. The extent to which a component at chroma c′ masks another component c is expressed in terms of the effective reduction (in dB) of the level at c due to c′: (1a)
(1b) Here, “mod12” means “modulo 12” (clock arithmetic). The “masking parameter” kM is the first free parameter in the model. Its value is expected to lie in the vicinity of 9 dB per semitone, i.e., 27 dB per critical band (Zwicker and Feldtkeller, 1967), given that the critical bandwidth is approximately equal to 3 semitones in the region above 500 Hz.6 Contributions to masking of component c by component(s) c′ are combined by adding amplitudes: (2) where the summation is carried out over all values of c′ not equal to c. The audible level AL (in dB) of each pure tone component—its level above masked threshold—is then simply: (3) where the “max” function prevents negative values of AL. The audibility Ap of each pure tone component is assumed to saturate with increasing audible level: (4)
Harmonic pitch pattern recognition

The recognition of harmonic pitch patterns among pure tone sensations is simulated by means of a template of root-support weights w(i), where i stands for “interval” and ranges from 0 to 11 semitones. In Parncutt (1988), these weights were estimated at 1, 0, 1/5, 1/10, 1/3, 0, 0, 1/2, 0, 0, 1/4, 0, for intervals 0, 1, 2,…, 11. Here, two changes are made to these values. First, the value at interval 3 is set at zero (instead of 1/10), as interval 3 is not among the first 10 harmonics (0, 0, 7, 0, 4, 7, 10, 0, 2, 4 semitones).7 Second, the weights at the remaining root-support intervals 2, 4, 7 and 10 are adjusted to optimize the fit between calculations and the results of the above experiments. First, they are treated as independent free parameters (Table 1 below). Then, the weights listed above are all raised to the power kW (the “harmonic weight parameter”): the second free parameter in the model (Table 2). The effect of harmonic pattern recognition on the overall audibility A of tone components is accounted for by the following template matching procedure: (5)
Multiplicity and salience
A first estimate of the number of tones heard in a simultaneity of octave-spaced tones—its multiplicity—is: (6) where Amax is the maximum value of A, i.e. the audibility of the most audible tone component. M′ is scaled in the model by raising it to a power less than one: M = (M′)^kS (7) where the “simultaneity perception parameter” kS is the third free parameter in the model. The perceptual salience S of each tone sensation is defined as its probability of being noticed. If S is proportional to the audibility A, and the sum of the saliences S of all tone sensations in a sound equals its multiplicity M, then: S(c) = M·A(c) / ΣA(c′). (8)
A graph of calculated pitch salience against chroma (for c=0 to 11) is called a calculated tone profile (cf. Krumhansl and Kessler, 1982).
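The chain just described (masking, audibility, template matching, multiplicity, salience) can be summarized in a short sketch. The exact functional forms of equations (1), (2), (4), (5) and (6) are not reproduced in this text, so the versions used below (linear masking growth with chroma distance, exponential saturation of audibility, a weighted template sum, and multiplicity estimated as the sum of audibilities divided by the maximum) are assumptions modelled loosely on Parncutt (1988, 1989), not the published formulas.

```python
import numpy as np

# Root-support weights w(i), with interval 3 set to zero as described above.
ROOT_SUPPORT = {0: 1.0, 2: 1/5, 4: 1/3, 7: 1/2, 10: 1/4}

def tone_profile(chroma_set, level_db=50.0, kM=6.0, kW=2.0, kS=1.0, sat_db=15.0):
    """Calculated tone profile (pitch salience per chroma) for a chord of
    octave-spaced tones.  sat_db and the masking form are assumptions."""
    present = np.zeros(12)
    for c in chroma_set:
        present[c % 12] = 1.0

    # Equations (1)-(2): masking of chroma c by every other sounded chroma c',
    # assumed to fall off at kM dB per semitone of circular distance, with the
    # partial masked levels combined by adding amplitudes.
    masked_level = np.zeros(12)
    for c in range(12):
        amps = []
        for cp in np.flatnonzero(present):
            if cp == c:
                continue
            dist = min((c - cp) % 12, (cp - c) % 12)
            ml = level_db - kM * dist
            if ml > 0:
                amps.append(10 ** (ml / 20))
        masked_level[c] = 20 * np.log10(sum(amps)) if amps else 0.0

    # Equation (3): audible level = level above masked threshold, never negative.
    audible_level = np.maximum(present * level_db - masked_level, 0.0) * present

    # Equation (4): audibility saturates with increasing audible level (assumed form).
    audibility = 1.0 - np.exp(-audible_level / sat_db)

    # Equation (5): harmonic pitch-pattern recognition by template matching
    # (assumed form: weighted sum over root-support intervals, weights raised to kW).
    A = np.zeros(12)
    for c in range(12):
        A[c] = sum((w ** kW) * audibility[(c + i) % 12] for i, w in ROOT_SUPPORT.items())

    # Equations (6)-(7): multiplicity (assumed first estimate: sum / max), scaled by kS.
    M = (A.sum() / A.max()) ** kS

    # Equation (8): saliences proportional to A and summing to M.
    return M * A / A.sum()

if __name__ == "__main__":
    profile = tone_profile({0, 4, 7})              # major triad 047
    print(np.round(profile, 3))
    print("most salient chroma:", int(profile.argmax()))
```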
Modelling of experimental data

The mean responses for the 20 sounds in Experiment 1 (multiplicity), the 60 chord-tone pairs in Experiment 2 (pitch analysis) and the 60 chord pairs in Experiment 3 (similarity) were compared with calculated responses according to the model. Calculated values for Experiment 1 were the multiplicities M (equation 7); for Experiment 2, the pitch saliences S (8); and for Experiment 3, correlation coefficients r between calculated tone profiles of pairs of chords. In Experiment 1, the values of the free parameters were independently adjusted by small steps until the root-mean-square difference between the 40 experimental and 40 theoretical values reached a minimum. In the other two experiments, parameters were adjusted until the correlation coefficient between 60 experimental and 60 theoretical values was a maximum.8 The entire adjustment and minimizing procedure was performed automatically by computer. First, six parameters were varied: the masking parameter kM (equation 1), the weights w(i) for i=2, 4, 7 and 10, and the simultaneity perception parameter kS (equation 7). Results are shown in Table 1.
Table 1 Optimal values of six free parameters

Experiment            kM    w(2)   w(4)   w(7)   w(10)   kS    r
1 (Multiplicity)      51    .00    .00    .04    .05     .91   .98
2 (Pitch Analysis)     7    .11    .00    .30    .07     .8    .89
3 (Similarity)         4    .08    .18    .25    .00     −a    .77

a. No value is given for kS in Experiment 3. This parameter only affects the scaling of pitch saliences, so it has no influence on correlation coefficients between tone profiles of chords, i.e. on predicted chord similarities.
As shown in the Table, optimal values of the weights w in the different experiments were unstable, and did not permit improvement on previous, theoretical estimates (Parncutt, 1988). So the previous estimates were retained, and subsequently only three parameters (kM, kW and kS) were varied (Table 2).
Table 2 Optimal values of three free parameters

Experiment            kM    kW    kS    r
1 (Multiplicity)      45    3.9   .91   .98
2 (Pitch Analysis)     8    2.2   .9    .88
3 (Similarity)         4    2.1   —     .76
“typical”              6    2     1
The impossibly high values of kM and kW for Experiment 1 (in both tables) support the contention that listeners estimated numbers of tones by recognizing whole sounds, guessing on the basis of musical experience, rather than by counting tone sensations (as assumed in the model). Values of kM for the other two experiments are more plausible: assuming a critical bandwidth (cb) of 3 semitones in the dominance region of spectral pitch perception, they correspond to gradients in the range 12–24 dB/cb. A “typical” value of 6 (i.e. 18 dB/cb) is proposed for music-theoretic applications (see footnote 6). Values of kW for Experiments 2 and 3 are high by comparison to the effective value of 1 in Parncutt (1988). This suggests an essential difference between the salience (probability of noticing) of a tone in a chord of octave-spaced tones, and the probability that a tone will function as the root of a chord class in music theory. Values of kS in Experiments 1 and 2 are also high, compared to the value of 0.5 proposed in Parncutt (1989). This presumably compensates for the octave-generalized model’s failure to consider all the tones in a sound. The model, in effect, only looks at one octave register, neglecting the possibility that tones in other octaves might also be noticed.
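The automatic adjustment procedure described above can be sketched as a simple grid search. The fragment below assumes the tone_profile() function from the earlier sketch is in scope; the chord pairs, ratings and parameter grids are illustrative placeholders, not the experimental data or the step sizes actually used.

```python
import itertools
import numpy as np

def predicted_similarity(chord_a, chord_b, **params):
    """Predicted chord similarity: correlation between calculated tone profiles."""
    return np.corrcoef(tone_profile(chord_a, **params),
                       tone_profile(chord_b, **params))[0, 1]

def fit_parameters(chord_pairs, mean_ratings):
    """Step through a coarse grid of (kM, kW) values and keep the combination that
    maximizes the correlation between observed and predicted similarities.
    kS is omitted: it only rescales saliences, as noted in Table 1."""
    best = (None, -np.inf)
    for kM, kW in itertools.product(np.arange(2, 12, 1.0), np.arange(0.5, 4.5, 0.5)):
        pred = [predicted_similarity(a, b, kM=kM, kW=kW) for a, b in chord_pairs]
        r = np.corrcoef(mean_ratings, pred)[0, 1]
        if r > best[1]:
            best = ((kM, kW), r)
    return best

if __name__ == "__main__":
    # Placeholder chord pairs and mean similarity responses, for illustration only.
    pairs = [({0, 4, 7}, {0, 4, 7}), ({0, 4, 7}, {5, 9, 0}), ({0, 4, 7}, {6, 10, 1})]
    ratings = [4.0, 3.0, 1.5]
    (kM, kW), r = fit_parameters(pairs, ratings)
    print(f"best kM={kM}, kW={kW}, r={r:.2f}")
```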
Conclusion

The model was found to simulate the results of the experiments quite accurately, in spite of the listeners’ relatively low confidence in their responses, and the unavoidable influences of cultural conditioning. This confirms the validity of the model in music-theoretical applications, as described in Parncutt (1988). The model explains important aspects of octave-generalized harmony theory such as the root of a chord, chord-scale compatibility, and harmonic relationship, in terms of a
psychoacoustical theory of pitch perception. As described in Parncutt (1989), the model may be used to simulate—quite accurately—the experimental key profiles of Krumhansl and Kessler (1982), by summing tone profiles of individual chords. This confirms Butler’s (1989) claim that these key profiles, at least in the case of chord progressions, are largely artifacts of effects of short-term memory: they may be essentially “sensory” rather than “cognitive”. Note, however, that the model does not directly explain tone profiles obtained from scales (Krumhansl and Shepard, 1979) or other tone sequences (e.g., Cuddy and Badertscher, 1987). Tone profiles in these cases appear to be a result of cultural conditioning by exposure to music containing chords (including broken chords) and chord sequences in major and minor keys. These latter profiles may be described as indirectly sensory, or “sensory in origin” (Parncutt, 1989). The model may be relevant for the automated performance of music: it provides a possible psychoacoustic basis for the concepts melodic charge and harmonic charge in Sundberg’s (1988) performance rules. In stochastic composition, the model could be used to compose chord progressions where the probability of occurrence of a particular chord depends in some consistent fashion on its multiplicity, pitch commonality with previous and following chords, or pitch commonality with a nominal tonic chord, or some combination of these. Like Parncutt (1989), the present study addresses the issue of whether harmonic relationships in music are sensory or cognitive in nature. According to Shepard (1982, p. 346), there is: a fundamental limitation inherent in any purely unidimensional representation of pitch…there is no way in which such a unidimensional scale can represent the fact that under appropriately musical conditions, two tones separated by an especially significant interval, such as the octave…, are perceived to be more closely related than two tones separated by a slightly smaller interval, such as the major seventh. In order to accommodate an increase in similarity between all tones separated by a particular interval, the rectilinear scale must be deformed into some more complex structure requiring…a higher-dimensional embedding space. The present research argues against Shepard’s claims, in that most of the results could be accounted for by means of a sensory model based on a one-dimensional pitch scale. It was not necessary to invoke cognitive-structural representations. This casts some doubt on the existence of such structures. Later in the same article (p. 350), Shepard makes the important point that: considerations of the sensory limitations of the input transducer—such as the reduced efficiency of the ear in discriminating nearby amusical pitches at the low end of the continuum of audible frequencies, which such psychophysical scales of pitch as the mel scale had been designed to represent—are essentially irrelevant to the problem of the representation of the cognitive structures that underlie the interpretation of musical sequences.
In other words, the mel scale has little or nothing to do with musical pitch relationships. In the terminology of the present study, musical pitch relationships depend instead on pitch commonality and familiarity. Both of these involve information processing in some way, and so may be regarded as cognitive, as well as sensory, in nature.

Acknowledgments

I am grateful to Johan Sundberg for the use of his laboratory and equipment, for friendly support, and for comments on the manuscript. Special thanks also to Sten Ternström for calculating the waveforms of the sounds, and for general technical assistance. This research was supported by a scholarship from the Swedish Institute. Helpful suggestions from an anonymous reviewer are gratefully acknowledged.
Notes

1. Other possible terms for multiplicity are numerosity, ambiguity, and complexity.
2. Shepard’s (1964) tones had a bell-shaped amplitude envelope, tailing off at low and high frequencies. The tones used here had flat amplitude envelopes across most of the audible spectrum (rounded by the frequency response of the loudspeaker). In spite of this difference, they sounded practically identical to Shepard’s tones (Pollack, 1978). The lowest-pitched components of the tones used here were inaudible due to masking, and both the highest and the lowest audible components were irrelevant for pitch perception, in particular for the formation of virtual pitch (complex tone sensation), due to the dominance effect in spectral pitch perception (Fletcher and Galt, 1950; Terhardt et al., 1982b). The main pitch of both Shepard’s tones and the tones used here lies in the vicinity of 300 Hz (Terhardt et al., 1986). A difference between the sound of the two kinds of tone did, however, become apparent when tones were superposed to form chords: chords composed of the tones used here sounded rougher, as they had more low-pitched components. This presumably had no effect on their pitch salience patterns, which were determined mainly by pure tone components in the dominance region (near 700 Hz).
3. Note on terminology. In this paper, chords are specified in terms of the number of semitones between the music-theoretical root (or other reference pitch) and the other notes (Parncutt, 1990). So, for example, a major triad is written “047”, where “0” denotes the root, “4” the major third and “7” the perfect fifth. Intervals of 10 or 11 semitones above the root are specified by the symbols Q and L respectively (obtained by superimposing the component symbols 0 and 1 of these numbers; L also stands for “leading note”). So the dominant seventh chord is called 047Q. Confusion between diatonic (traditional) and chromatic interval labels (e.g. “seventh” versus “7 semitones”) is avoided by consistent use of the terms monad, dyad, triad and tetrad for single tones, (simultaneous) intervals, chords of three tones, and chords of four tones, respectively. The minor seventh chord 037Q, for example, is called a minor tetrad, and 0369 is called a diminished tetrad. Incidentally, the term “dominant tetrad” is reserved for 047Q chords actually on the dominant scale degree (7 semitones above the tonic); when tonal context is not specified, the 047Q chord is called a major-minor tetrad (cf. major-minor seventh). Confusion with conventional interval names (third, fifth, etc.) is further avoided by the formulation “interval 0” for unison, “interval 1” for one semitone etc.
4. To a good approximation, the mean responses for two different trials are significantly different (p<0.05) if they differ by more than a 95% confidence deviation (i.e. the half-width of a 95% confidence interval) divided by root 2, provided the confidence deviation for the two experiments is about the same.
5. In a different context (Parncutt, 1989), single octave-spaced tones were mostly heard to comprise between two and three tones. This is an example of the general rule that the perceived number of tones in a musical sound depends on context.
6. At medium to high sound levels, the masking pattern of a pure tone is considerably higher and longer on its upper than its lower side. However, at low levels the pattern is almost symmetrical with respect to critical-band rate (Zwicker and Jaroszewski, 1982).
7. In Parncutt (1988), interval 3 was assigned a small root-support weight (1/10), as the third harmonic of interval 3 (3+7=10) corresponds to a harmonic (the seventh) of interval 0. It may therefore contribute to the root sensation. Here, octave-spaced tones were used, containing only harmonics 1, 2, 4, 8, 16, etc., so interval 3 could not contribute to the root—except, perhaps, by cultural conditioning.
8. Note that each parameter may be regarded as a measure of how analytically sound is perceived: kM at the level of spectral analysis (discrimination of pure tone components), kW at the level of “hearing out” of pure (as opposed to complex) tone components, and kS at the level of simultaneous perception of tones in a sound.
References

Bharucha, J. & Krumhansl, C.L. (1983) The representation of harmonic structure in music: Hierarchies of stability as a function of context. Cognition, 13, 63–102.
Burns, E.M. (1981) Circularity in relative pitch judgments for inharmonic complex tones: The Shepard demonstration revisited, again. Perception & Psychophysics, 30, 467–472.
Butler, D. (1989) Describing the perception of tonality in music: A critique of the tonal hierarchy theory and a proposal for a theory of intervallic rivalry. Music Perception, 6, 219–242.
Cuddy, L.L. & Badertscher, B. (1987) Recovery of the tonal hierarchy: Some comparisons across age and musical experience. Perception & Psychophysics, 41, 609–620.
de la Motte, D. (1976) Harmonielehre. Kassel: Bärenreiter.
Deutsch, D. (1987) The tritone paradox: Effects of spectral variables. Perception & Psychophysics, 41, 563–575.
DeWitt, L.A. & Crowder, R.G. (1987) Tonal fusion of consonant musical intervals: The Oomph in Stumpf. Perception & Psychophysics, 41, 73–84.
Fletcher, H. & Galt, R.H. (1950) The perception of speech and its relation to telephony. Journal of the Acoustical Society of America, 22, 89–151.
Huron, D. (1989) Voice denumerability in polyphonic music of homogeneous timbres. Music Perception, 6, 361–382.
Krumhansl, C.L. & Kessler, E.J. (1982) Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 89, 334–368.
Krumhansl, C.L. & Shepard, R.N. (1979) Quantification of the hierarchy of tonal functions within a diatonic context. Journal of Experimental Psychology: Human Perception and Performance, 5, 579–594.
Ohgushi, K. (1983) The origin of tonality and a possible explanation of the octave enlargement phenomenon. Journal of the Acoustical Society of America, 73, 1694–1700.
Parncutt, R. (1988) Revision of Terhardt’s psychoacoustical model of the root(s) of a musical chord. Music Perception, 6, 65–94.
Parncutt, R. (1989) Harmony: A Psychoacoustical Approach. Berlin: Springer-Verlag.
Parncutt, R. (1990) Chromatic chord symbols. Computer Music Journal, 14 (2), 13–14.
Pollack, I. (1978) Decoupling of auditory pitch and stimulus frequency: The Shepard demonstration revisited. Journal of the Acoustical Society of America, 63, 202–206.
Rasch, R.A. (1978) The perception of simultaneous notes such as in polyphonic music. Acustica, 40, 21–33.
Riemann, H. (1893) Vereinfachte Harmonielehre. London: Augener.
Shepard, R.N. (1964) Circularity in judgments of relative pitch. Journal of the Acoustical Society of America, 36, 2346–2353.
Shepard, R.N. (1982) Structural representations of musical pitch. In D. Deutsch (Ed.), The Psychology of Music, pp. 344–390. London: Academic Press.
Sundberg, J. (1988) Computer synthesis of music performance. In J.A. Sloboda (Ed.), Generative Processes in Music. Oxford: Clarendon.
Terhardt, E. (1982) Die psychoakustischen Grundlagen der musikalischen Akkordgrundtöne und deren algorithmische Bestimmung. In C. Dahlhaus & M. Krause (Eds.), Tiefenstruktur der Musik. Berlin: Technical University of Berlin.
Terhardt, E., Stoll, G., Schermbach, R., & Parncutt, R. (1986) Tonhöhenmehrdeutigkeit, Tonverwandtschaft und Identifikation von Sukzessivintervallen. Acustica, 61, 57–66.
Terhardt, E., Stoll, G., & Seewann, M. (1982) Pitch of complex tonal signals according to virtual pitch theory: Tests, examples and predictions. Journal of the Acoustical Society of America, 71, 671–678 (a).
Terhardt, E., Stoll, G., & Seewann, M. (1982) Algorithm for the extraction of pitch and pitch salience from complex tonal signals. Journal of the Acoustical Society of America, 71, 679–688 (b).
Thurlow, W.R. & Rawling, L.L. (1959) Discrimination of number of simultaneously sounding tones. Journal of the Acoustical Society of America, 31, 1332–1336.
Zwicker, E. & Feldtkeller, R. (1967) Das Ohr als Nachrichtenempfänger, 2nd ed. Stuttgart: Hirzel-Verlag.
Zwicker, E., Flottorp, G., & Stevens, S.S. (1957) Critical bandwidth in loudness summation. Journal of the Acoustical Society of America, 29, 548–557.
Zwicker, E. & Jaroszewski, A. (1982) Inverse frequency dependence of simultaneous tone-on-tone masking patterns at low levels. Journal of the Acoustical Society of America, 71, 1508–1512.
Identification and blend of timbres as a basis for orchestration

Roger A. Kendall and Edward C. Carterette
Departments of Ethnomusicology & Systematic Musicology and Psychology, University of California, Los Angeles, USA

Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 51–67
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
We report on a series of experiments directed toward questions concerning the timbres of simultaneous orchestral wind instruments. Following a background exposition on instrumentation and orchestration, we discuss previous experimental research on the properties of multiple timbres. To augment and explicate previous findings, we conducted two experiments: Experiment 1 was directed at subject ratings of the blend of oboe, trumpet, clarinet, alto saxophone, and flute dyads. Experiment 2 required subjects to identify the constituent instruments of a pair. Results demonstrated that increasing blend correlated with decreasing identification, and was related to the distribution of time-variant spectral energy. Oboe dyads, which were rated in other experiments as highly “nasal,” produced the lowest blend values and the highest identification. The findings are discussed in terms of a theoretical model of timbral combination and the possibilities for composition and musicological analysis.

KEY WORDS: Timbre, tone color, wind instrument, orchestration, instrumentation, cognition, perceptual scaling, blend.

Background

The study of orchestration as an element of musical composition is, historically, a relatively recent development. In the Middle Ages and Renaissance, the assignment of instruments to parts was dictated largely by availability of resources; orchestration in the sense of a planned, structural use of instruments and instrument combinations was not employed. The Baroque era witnessed the increasing specificity of instrumental combinations. For example, Monteverdi suggests certain instrumental combinations for his opera Orfeo (1607). By the end of the Baroque, concert works might include parts for flutes, oboes, bassoons, horns, trumpets, timpani, and continuo, plus strings. The
conception of the orchestra as consisting of contrasting families of timbres can be seen, for example, in Handel’s Music for the Royal Fireworks (1749). In the classical period, Mozart was innovative in his application of winds to solo melodic lines; his wind concerti, in fact, are beautifully crafted explorations of wind virtuosity. The origin of the scholarly study of orchestration, and its establishment as a study “independent of [and equal to] the three other great musical powers [i.e. melody, harmony, and rhythm],” can be attributed to Berlioz (1844/1856, p. 4). Instead of conceiving of orchestration as the application of instruments to already completed music, Berlioz saw orchestration as an essential part of the musical ideas themselves, an attitude expressed in the opening pages of A Treatise on Modern Instrumentation and Orchestration (1844/1856, p. 4). Novel combinations of winds and strings abound in his music, for example, the trombone and flute trio in the “Hostias” and “Agnus Dei” of his Grande messe des morts [Requiem] (1837), and the opening woodwind and horn choir in the Symphonie Fantastique. His invention was remarkable; consider, for example, the instruction for muted clarinet enveloped in a leather bag in Lélio. Technological improvement of instruments, such as the Boehm flute, ca. 1850, and the invention of such important winds as the valved horn and trumpet, was the catalyst for compositional experimentation in orchestration. By the time of Debussy, the winds had at least an equal footing with the strings in orchestral stature. Contemporary use of timbral combinations includes the innovations of such composers as Schoenberg in the Klangfarbenmelodie of the third of the Five Orchestral Pieces, op. 16 (1909), the structural use of timbre by Messiaen and Stockhausen, and the dodecaphonic organization of timbres in Slawson’s theoretical work and compositions in Sound Color (1985). It is startling that, with the increasing importance of timbre as a compositional force, particularly accelerated by computer synthesis, experimental studies of timbre perception are so rare. Partially to blame may be the pitch-centricity of Western scholarship; another factor may be the difficulty of manipulating the multidimensional timbral parameters of real instruments. In fact, much of the experimental timbre literature has used brief, steady-state, synthetic signals which are amenable to precise control and analysis. This work is in the tradition of Helmholtz (1863/1885/1954), whose subjective impressions of vowel and instrument timbres were quite insightful. Plomp (1976) reviews classical psychoacoustical studies on timbre, including his own work using single periods of musical instruments and organ stops. Temporal aspects of the acoustic spectrum, particularly the attack transient, have been extensively studied since Stumpf (1926); the criterial importance of the attack has been only recently challenged (Kendall, 1984). Very little timbre research has been conducted since Grey’s An Exploration of Musical Timbre (1975), which was based on real instruments. However influential this study has become, it has significant imperfections. Some unusual instruments were used, for example, soprano saxophone, Eb clarinet, bass clarinet, particularly considering that the more common members of the instrument family, such as Bb soprano clarinet, were not included. The trombone, bassoon, and cello were in moderately high registers, and the use of mutes and atypical bowing techniques should be noted.
Another limitation was the brevity of the signals (ca. 330 msec) and the fact that line-segment resynthesized tones were employed (see Kendall & Carterette, 1991 for additional details).
Recently, we have addressed many of these concerns in our explorations of simultaneously sounding wind instruments. Since most music is not monophonic, it is astonishing that little work has been done on perceptual aspects of simultaneously sounding instruments. To our knowledge, the first such experimental investigations were those of Carterette & Kendall (1989) and Kendall & Carterette (1989), which form the basis for the present work, discussed below.1 In addition, Sandell (1989a, 1989b) has reported preliminary work on the “blend” of “concurrent timbres” using 15 of Grey’s (1975) line-segment approximations of brief real instrument tones, whose limitations were just mentioned. Note that the combinations of timbres were mixed and adjusted computationally by Sandell, and not performed in duet; the resultant dyads were not equalized in loudness. Subjects rated the “blend” of all possible 120 pairs of 15 instruments. For each instrument its average blend with all other instruments was calculated. Some results of interest demonstrate that blend is related to the summed distribution of energy in the harmonic series of the two tones, with less blend correlated with more energy in higher harmonics contrasted to lower.

Perceptual scaling of simultaneous timbres

In a previous investigation, we required subjects to rate the degree of similarity among all combinations of five wind instruments: oboe, clarinet, flute, alto saxophone, and trumpet (Kendall & Carterette, 1991). We utilized six musical contexts: Unison, unison “melody” (consisting of scale degrees 3, 4, 5, 3 based on Bb4), major thirds (Bb4–D5), and harmonized melody (I–IV6–V6–I), the latter two contexts with both instruments of the pair as the soprano (we refer to the principal instrument order [e.g. OF=an oboe-flute dyad with flute in the soprano] as the noninverted major third and noninverted harmony contexts; the reversed instrument orders are called the inverted-instrument harmony and inverted-instrument major third contexts, e.g. FO. Therefore there are two contexts for each of the harmony and major third conditions). Duet performances by professional instrumentalists were digitally recorded in a concert hall; the resulting dyads were equalized in loudness (see Kendall & Carterette, 1991, for details). The ratings of similarity were subjected to a multidimensional scaling analysis (MDS). Briefly stated, the rating of similarity between two sounds is treated as a distance in some geometrical space. MDS attempts to find the “best” configuration of points in this space which minimizes the amount of error. The essential idea is easily illustrated (Figure 1). Suppose that the distance between A and B is 2; B and C is 10; A and C is 12. In this case, the configuration of points is best fit by Figure 1a, a line in one dimension. However, if the distance between A and B is 5; B and C is 9; A and C is 12, then (since AB+BC=14, rather than 12), the points are best fit by Figure 1b, a triangle in two dimensions.
Figure 1 Illustration of one (a) and two (b) dimensional solutions for distances among three hypothetical stimulus points.
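The one- versus two-dimensional example of Figure 1 can be reproduced with classical (Torgerson) scaling, which recovers coordinates directly from a distance matrix by eigendecomposition. The published analyses used nonmetric MDS and INDSCAL rather than this method; the sketch below only illustrates the principle that the pattern of distances determines how many dimensions are required.

```python
import numpy as np

def classical_mds_eigenvalues(D):
    """Double-centre the squared distance matrix and return its eigenvalues,
    sorted from largest to smallest; the count of clearly positive eigenvalues
    indicates the number of dimensions needed."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred Gram matrix
    return np.sort(np.linalg.eigvalsh(B))[::-1]

# Case (a): AB = 2, BC = 10, AC = 12 -> the points lie on a line (one dimension).
D_line = [[0, 2, 12], [2, 0, 10], [12, 10, 0]]
# Case (b): AB = 5, BC = 9, AC = 12 -> the points form a triangle (two dimensions).
D_tri = [[0, 5, 12], [5, 0, 9], [12, 9, 0]]

print(np.round(classical_mds_eigenvalues(D_line), 3))  # one non-zero eigenvalue
print(np.round(classical_mds_eigenvalues(D_tri), 3))   # two non-zero eigenvalues
```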
Figure 2 Three-dimensional solution for wind instrument dyad similarity ratings for the Bb4–D5 major third context. Subjects were nonmusic
majors. O=oboe, F=flute, T=trumpet, S=saxophone, C=clarinet. For example, a pair is represented by a letter combination: OF is oboe-flute. The second instrument was the soprano (D5).

Figure 2 graphs the three-dimensional solution for similarity ratings of the Bb4–D5 major third context. Notable is the clustering of wind dyads on the basis of “dominant” instrument: Oboe (right side), trumpet (left front), and saxophone (far rear corner). These subjects were nonmusic majors; we have found that music major results produce a similar graph, with even a tighter clustering of dyads. In order to interpret dimensions, we conducted extensive verbal rating experiments (Kendall & Carterette, In Press, a, b). As far as we know, this was the first study to obtain verbal ratings using adjectives derived from a musical source (Piston, 1955), rather than simply relying on nonmusicians’ intuition. Contrary to previous research which has emphasized “sharpness” (Bismarck, 1974), “acuteness” (Slawson, 1985), or “brightness” (Bismarck, 1974; Risset & Wessel, 1982), we found the principal dimension to be one of “nasality” (Figure 2): D1=nasal (negative, −2) vs. non-nasal (positive, +1). Note that OF (oboe-flute) was the most nasal sounding instrument combination; SC (saxophone-clarinet) was the least nasal. Dimension 2 represents “brilliant” (−1) vs. “rich” (+1), which generally separates trumpet from saxophone. The FC dyad surprisingly is rated as relatively “brilliant,” a finding which holds across contexts. Dimension 3 relates to “strong” and “complex” (+1) vs. “weak” and “simple” (−1). For ease in observing two-dimensional pairs, the points are projected as diamonds on the floor and walls of the figure. For example, D1 vs. D2 is the floor; D1 vs. D3 is the rear wall; D2 vs. D3 is the left side wall. We were struck by the fact that “nasal” and “rich” were first used by Helmholtz (1863/1885/1954, p. 118) to describe the relationship between instrument timbres and vowels.
Figure 3 Two-dimensional INDSCAL solution summarizing wind instrument dyad distances across contexts. Left to right is “nasal” to “not nasal”; top to bottom is “rich” to “brilliant.” From Kendall & Carterette (1991).
Figure 4 Two-dimensional configuration of single instruments positioned by a professor of musicology on the basis of mental image of timbre (Bb4). Dimension 1 (left to right) was “nasal” to “not nasal” and dimension 2 (top to bottom) was “rich” to “brilliant.” From Kendall & Carterette (1991). We submitted the entire set of scalings over all contexts to INDSCAL (INdividual Differences SCALing), which provided a summary solution. The two dimensions gave essentially as good a fit as three; we present the two-dimensional solution here (Figure 3). The result was a circumplex with a notable gap on both sides of the oboe dyads. Other well-known circumplexes are those for pitch chroma (Shepard, 1964) and visual colors (Shepard, 1962). In terms of verbal attributes, left to right along dimension 1 (Figure 3) is “nasal” to “not nasal”; top to bottom is “rich” to “brilliant.” A circumplex like Figure 3 might arise from the vector sum of the positions of single instruments arranged in two dimensions (Figure 4): Nasal-not nasal and brilliant-rich. In fact, the results of
positioning single instruments in such a space by a professor of musicology generated a quasi-circumplex similar to Figure 3 (Kendall & Carterette, In Press, a, b).
Experiments in identification and blend

We report here new work designed to complement and extend the results of our previous research. We wanted to know the relationship between our verbal and perceptual scaling data and the degree of identifiability of the constituent instruments of a dyad as well as the rated “blend.” In our study, we operationalize “blend” in terms of the extent of “oneness” versus “twoness.” Orchestration treatises differ regarding the degree of emphasis placed upon blend. Indeed, most monographs treat sound combinations lightly, instead focusing on the characteristics of instruments and their uses as soloists within the orchestra. Piston (1955) distinguishes between “instrumentation” and “orchestration” in order to emphasize the fact that orchestration involves more than memorization of instrument properties, and devotes several chapters to instruments in combination—yet blend does not appear as an overriding concern. Rimsky-Korsakov (1913), on the other hand, seems preoccupied with the concept. Some authors suggest that instrument combinations which “blend well” are more desirable than those which do not. However, even a cursory examination of orchestral music leads to the conclusion that the degree of blend is a variable manipulated by the composer according to the demands of the musical context. Therefore it is as useful to know what does not blend as it is to know what does. The present study investigated blend with two experimental approaches: 1. Ratings, where a subject indicates the degree of “oneness” to “twoness” of wind instrument combinations; 2. Identification, where a subject must name the constituent instruments of a dyad.
Experiment 1: Ratings of perceived blend

Methods and materials

The aim of this experiment was to discover the degree to which a pair of simultaneously playing natural instruments blended, or fused, perceptually. The stimuli were all possible combinations of flute, clarinet, oboe, trumpet, and alto saxophone, performing the six contexts outlined above. Stimuli were digitally recorded in stereo on stage in a moderately reverberant concert hall (without audience, reverberation time=ca. 1.6 sec). The recordings were sampled in stereo directly to hard disk (35714 samples/sec per channel) with five-pole Butterworth anti-aliasing filters with a cut-off frequency set to 10 kHz.2 Playback and control of experiments was handled by an IBM 80386-based computer and custom software3 (Kendall, 1988). The method required the listener to make a simple response along a 12.7 cm bar displayed on a graphics screen. At the left end of the bar the word “one” was displayed, and at the right end of the bar the word “two.” The subject’s task was to move the pointer
from its initial, randomly set position along the bar to the position which the subject felt best described his feeling of “oneness” or “twoness”. The subject set the position of the pointer by moving a mouse on a pad. Subjects knew that every sound was played by a pair of wind instruments. There were 9 subjects, all of whom were music majors with at least ten years of formal instruction. Each subject heard six blocks of stimuli, grouped by context, in a random order; the six blocks were presented twice and data were averaged.

Results and discussion

Subject ratings were converted to blend by taking the complement: High scores indicating “twoness” became low scores for blend; low scores indicating “oneness” became high scores for blend. Analysis of Variance (ANOVA) on repeated measures indicated that the mean values for blend across contexts were statistically significantly different (df=5,40; F=3.7; p<.008), as were the mean values across instruments (df=9,72; F=15.2; p<.0009), and the interaction of context with instrument (df=45,360; F=2.4; p<.0009). Post-hoc analyses (Newman-Keuls) indicated that the inverted-instrument harmony condition (mean=39), over all instruments, was less blended than the unison context (mean=56); also the inverted-instrument harmony was, on average, less blended than the noninverted-instrument harmony (mean=53.3). Figure 5 presents a bar graph of mean blend for each instrument combination averaged over the six contexts. Q-critical (Tukey) is 15.6; any means which differ by greater than 15.6 points are significantly different. For example, OF, SO, and OT dyads are not different from one another, but are significantly less blended than SF, SC, FT, FC, and TC dyads. In fact, the oboe dyads are consistently less well blended than other combinations. The most important data come from the relationship between context and instrument, consisting of all components of the design. Figure 6 provides a graph of the context by instrument interaction. We have separated the data into three graphs for the sake of clarity. Q-critical (qk=22) for the interaction was determined by the geometric mean of the extremes of the Newman-Keuls gap-order differences. In general, the set of contexts with inverted instruments led to less blending. In particular, saxophone dyads, with the instrument playing in a moderately high register and mostly open, blended less well in inverted-instrument contexts (Figure 6, center & right). Conversely the trumpet/clarinet (CT)4 combination is more blended in inversion (Clarinet-Bb4; Trumpet-D5), as are FO, TO, and CO for inverted-instrument major thirds (IMAJOR3), contrary to orchestration treatises (see Piston, 1955, p. 423). However, when a musical context is admitted, the oboe
dyads become less blended (Figure 6, right panel).

Figure 5 Rank order, from least to most, of mean blend (averaged over context) for the ten dyads.

We speculate that, in the static single-note major-third, the contrast in time-variant spectral properties between a relatively stable oboe and flute, or trumpet, or clarinet is perceptually salient in noninverted-instrument form, with oboe on the bottom, and the dyads become more blended with the relatively stable oboe on the top. This is most evident for OF, which is the least blended condition. The flute, playing with vibrato, is perceptually salient in the soprano, breaking apart from oneness with the oboe. With oboe in the soprano, the flute is less salient, and the “steady” oboe timbre dominates the results in increasing blend, a point supported by the restricted RMS range of the oboe reported in Kendall & Carterette (1990). It is interesting to note that the distance of a pair of instruments from each other in Figure 4 is a rather good predictor of blend, with the greater the distance the less the blend. For example SO, OT, OC are far apart, and they do not blend well. On the other hand, FC, TC, FT are relatively close together and have high blend (Figure 5). Therefore, there is a relationship between the perceptual scaling of similarity (Figure 2) and blend, although not perfect. “Nasal” combinations blend less well than “brilliant” or “rich.” We can conceive of the task of assigning a verbal attribute, such as “blend,” to a sound event as superimposing two multidimensional configurations, one sonic, the other verbal. The verbal configuration relates the target word, “blend,” to the multiplicity of meanings
about itself in relation to all other words. The “meaning” of the target timbre is defined by its position relative to mental structures of timbres.
Figure 6 Mean blend of the ten dyads for each of the six contexts (dyad×context interaction). For clarity, similar contexts are paired.

The mapping of one structure to another is variable, and fragile. In short, there is no isomorphic mapping of verbal spaces to sonic. Therefore, a perfect correlation between ordinal values for distances in perceptual space and rating values of “blend,” or any other verbal attribute, should not be expected, nor was such found in this study.
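The rating-to-blend conversion and the per-dyad averaging behind Figure 5 amount to a few array operations. In the sketch below the ratings are random placeholders, and the assumption that pointer positions were stored on a 0–100 scale is ours (the reported means of roughly 39–56 suggest it, but the text does not state the units).

```python
import numpy as np

rng = np.random.default_rng(0)
dyads = ["OF", "OT", "OC", "SO", "SF", "ST", "FT", "FC", "SC", "TC"]

# Placeholder "twoness" ratings, shape (subjects, contexts, dyads).
twoness = rng.uniform(0, 100, size=(9, 6, 10))

blend = 100.0 - twoness                      # complement: "oneness" scored high
mean_blend = blend.mean(axis=(0, 1))         # average over subjects and contexts

# Rank order of mean blend, least to most blended (cf. Figure 5).
for name, value in sorted(zip(dyads, mean_blend), key=lambda p: p[1]):
    print(f"{name}: {value:5.1f}")
```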
Experiment 2: Identification of component instruments within dyads

Methods and materials

This experiment was designed to discover the extent of a subject’s ability to explicitly identify the two instruments sounding in a dyad under differing musical contexts. We hypothesized that dyads with greater blend would be more difficult to identify, and that the confusions among the errors would be related to proximity of instruments in perceptual space. The same stimuli and contexts used in Experiment 1 were employed. Eight musically trained subjects were presented with six blocks of stimuli grouped by context. In order to participate in the experiment, subjects had to identify the five single instruments (oboe, clarinet, alto saxophone, flute, trumpet) playing the melody “All Through the Night” with complete accuracy, twice. The subject’s task was to select, from a list presented on a computer screen, the correct pair of instruments then sounding from the ten possibilities (i.e. a closed set). All presentations were completely randomized; the experimental sessions were controlled by the same equipment described under Experiment 1.

Results and discussion

Tables 1a and 1b present the identification data for the unison and harmony contexts (space does not permit presentation of all six matrices). Rows represent the dyad heard,
and columns the number of responses made to a particular choice. For example, in Table 1a, the OF×OF upper left-hand corner indicates that 6 correct responses were made; reading across the row, one finds that one incorrect response was made to each of OC and SO. Therefore, the diagonals represent correct identifications, with a perfect score of 8; all off-diagonal entries are errors. Note the spread of incorrect entries for the unison context in comparison to the harmony context. In general, the number and spread of incorrect choices decreases from unison to unison melody to major third to harmony. For example, note the wide spread of incorrect responses for SC in unison; for harmony, however, errors were concentrated in a single dyad, SF. It is worth noting that the OF and OT conditions are particularly easy to identify in the unison context. It may be that the vibrato of the flute and trumpet in the soprano aids in the separation, that is, the time-variant spectral modulation is used as a perceptual cue for instrument identity. In fact, frequency×amplitude×time plots of unison spectra (Kendall & Carterette, In Preparation) indicate that contrasts of regions of relatively stable energy with regions of time-variant energy (in all combinations) are related to degree of identifiability.5 The inverted-instrument third and harmony contexts yielded slightly fewer correct responses than the non-inverted condition, in particular for the oboe dyads. We note that for inverted-instrument conditions the blend ratings increased, corresponding to an identification accuracy decrease.
Table 1a Context: Unison. [Confusion matrix: rows give the dyad heard (OF, OT, OC, SO, SF, ST, FT, FC, SC, TC) and columns the dyad chosen from the same ten alternatives; diagonal entries, shown in angle brackets, are correct identifications out of a possible 8.]

Table 1b Context: Harmony. [Confusion matrix with the same layout as Table 1a.]
As hypothesized, errors tend toward choices with an instrument in common with the sounding dyad. For example, even for SC in the unison condition, 4 responses are made to other saxophone dyads, and three are made to other clarinet dyads. In fact, regarding the constituents of the dyad (Figure 4), once an identification of one instrument of the pair is made, an erroneous second instrument is picked from those in close proximity. This would seem to imply a rather close relationship between blend and identifiability, a possibility quantified below. Each subject’s responses were converted to score data by simply counting correct answers. Therefore the maximum mean for a given context would be 8.0. Across instruments, identification accuracy ranged from 4.625 for unisons to 7.875 for noninverted-instrument harmony. The only significant differences were between the unison single-note context and the group of non-unison contexts (df=5, F=4.524, p<.003). Average correct identifications for instruments across contexts are presented in Figure 7. There is a general inverse relationship with blend (Figures 4 & 5); the correlation of mean identification and mean blend (across contexts) is low and
negative (−0.347). However, a stronger relationship is found for individual, rather than averaged, contexts: The correlation of mean identification and blend for unisons was −0.731; for harmony, −0.588. The reason for the lower correlation for the harmony condition is almost certainly due to a “ceiling” effect; that is, the relative accuracy of identification was nearly perfect for the harmony context (mean=7.9). The results indicate that, in general, the greater the blend the poorer the identification. In unison contexts, constituents of dyads are difficult to identify (mean=4.6, 57.5%), even for highly trained musicians.

Figure 7 Rank order, from least to most, of mean correct identification (averaged over context) for the ten dyads.

General discussion and conclusions

In this study, we ascertained the degree of blend and identifiability of soprano orchestral winds. In general, the unison context produced the highest blend ratings, and the lowest identification. There was a moderately high negative correlation between degree of blend and accuracy of identification. We found that the degree of blend corresponded with the positions of instruments in a two-dimensional similarity space, that is, distances of instruments from one another in similarity corresponded to their degree of blend (Figures
3 & 4). Ordinal rankings of blend (Figure 5) and of identification (Figure 7) only loosely correspond to single dimensions in similarity space (Figures 2 & 3). In particular, nasal dyads are easiest to identify and produce the least blend.6 We note that blend is probably correlated with both energy and time-variancy contrasts of lower to upper partials. In fact, the energy ratio of fundamental to other components (the so-called total harmonic ‘distortion’) in a dyad is correlated with the degree of nasality, and thus with blend as well (Kendall & Carterette, 1989). One question which arises is how “blend” relates to the results of the perceptual similarity spaces reported earlier. In order to answer this question, we decomposed the blend ratings for dyads into a lower-half triangular matrix without the diagonal—e.g. we placed the blend rating for OF as an entry in row O column F, and the blend rating for ST in row S column T, and so forth. This triangular matrix was subjected to classical MDS with Kruskal stress reduction. The resulting two-dimensional spaces (stress<0.00031) for the five instruments are shown in Figures 8a (Unisons) and 8b (Noninverted-instrument major third). These should be compared to Figure 4, which arose from the positioning of instruments in the space by a musicologist; these spaces are nearly identical, a fact which will now be quantified. We used the data points of Figures 8a and 8b in order to compute for all possible pairs the vector-sums from the theoretical procedure in Kendall & Carterette (1991); a sketch of this procedure is given below. These points create a hypothetical “space” corresponding to that derived from perceptual similarity scaling. Figures 9a and 9b show the results of this calculation. The Pearson correlations between this “blend” space and that of composite similarity are: Dimension 1, 0.893 (Unison) and 0.874 (Major third); Dimension 2, 0.843 (Unison) and 0.920 (Major third). This demonstrates a remarkable convergence between differing methods and models. These results should be useful for the traditional composer/orchestrator as well as for those working with electronic music. A space based upon nasality and richness-brilliance can be used as a basis for choosing timbral patterns, paralleling Slawson’s (1985) dodecaphonic techniques which are based on vowel timbres. Such a space also provides the basis for musicological analysis based on weighted sums of instrumental densities (specific gravities) in scores. The emergence of a circumplex (Figure 3) implies the existence of prototypes (for example OC, TC, ST, SO) as theorized by Lerdahl (1987). What is needed, however, is a thorough set of perceptual scalings for tones in different tessituras for a relatively large body of instruments. We believe that instrumental scalings such as those of Figures 2 & 3 should not be limited to a single octave. Clearly, having high blend is not a criterion for frequency of occurrence in musical scores. The highest mean blend was found for TC, which is not a particularly common pairing. Yet OF, which is common, has the lowest mean blend. As we noted in our theory of musical communication (Kendall & Carterette, 1990), the musical fabric is comprised of time-ordered sequences of state-changes, the fundamental operant principle of which is that of contrast. Blend contrasts are employed in generating musical messages. We theorize that timbral contours, that is, changes of direction and magnitude in timbral space, demarcate important macro- and micro-structural points.
The stratified layering of various periodic and quasi-periodic contours in time leads to the composite texture of the musical work. Composite timbres and blend contours are one such layer, and deserve more attention than they have been given.
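The decomposition referred to above (dyad blend ratings arranged as distances between the five instruments, scaled into two dimensions, and dyad points reconstructed as vector sums of instrument positions) can be sketched as follows. The blend values are placeholders, and treating 100 minus blend as a distance is an assumption; the published analysis used MDS with Kruskal stress reduction rather than the classical scaling shown here.

```python
import numpy as np
from itertools import combinations

instruments = ["O", "C", "F", "S", "T"]
# Placeholder mean blend ratings for the ten dyads (0 = no blend, 100 = full blend).
blend = {("O", "F"): 30, ("O", "T"): 32, ("O", "C"): 35, ("O", "S"): 31,
         ("C", "F"): 60, ("C", "T"): 65, ("C", "S"): 58,
         ("F", "T"): 62, ("F", "S"): 55, ("S", "T"): 50}

# Arrange the dyad ratings as a symmetric instrument-by-instrument distance matrix.
n = len(instruments)
D = np.zeros((n, n))
for (a, b), value in blend.items():
    i, j = instruments.index(a), instruments.index(b)
    D[i, j] = D[j, i] = 100.0 - value        # less blend -> greater distance

# Classical scaling of the five instruments into two dimensions.
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1][:2]
coords = vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

# Hypothetical dyad points as vector sums of the two instrument positions.
dyad_points = {a + b: coords[instruments.index(a)] + coords[instruments.index(b)]
               for a, b in combinations(instruments, 2)}
for name, point in dyad_points.items():
    print(name, np.round(point, 2))
```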
Figure 8 Classical multidimensional scaling solution for decomposed blend
ratings. Stress was less than 0.00031 in two dimensions. Figure 8a is for unison and Figure 8b for noninverted-instrument major third contexts.
Figure 9 Theoretically-derived dyad points from vector sums of the
positions of instruments in Figure 8. Figure 9a is for unison and Figure 9b for noninverted-instrument major third contexts.

Notes

1. Pepinsky (1941) conducted an investigation of combinations of brass instruments, but the procedures were informal and reports of composite timbral quality were subjective—not based on operational definition. It is notable that fifty years ago Pepinsky’s composite mathematical structures took into account the Fourier spectra of the individual instruments played at intensities from piano to mezzo forte, including the psychophysical masking effects of the partials of the individual instruments. It is unfortunate that the technology to synthesize tones based upon these calculations was not yet available. We note that Pepinsky also used the terms “rich” and “brilliant” to describe the timbres of brasses, the second dimension of our analysis.
2. Therefore, at the frequency threshold of 20 kHz, attenuation was approximately 30 dB IL.
3. Additional details can be found in Kendall & Carterette, 1991.
4. Order of letters corresponds to the bottom and top instruments of the dyad: OF=Oboe-Bb4, Flute-D5; FO=Flute-Bb4, Oboe-D5. Graph labels are consistently in noninverted order for sake of clarity.
5. OF, which was easy to identify, has a stable fundamental partial, with highly time-variant partials 2–4; SC, which was hard to identify, had highly time-variant partials 1–3, with a stable upper partial region.
6. Sandell (1989a) provides an ordinal ranking of blend for a small subset of his dyads (these were not naturally recorded or mixed instruments, see above). We are unsure of the relation of his work to the present study, since he did not operationalize the verbal attribute of “brightness” vs. “darkness.” The saxophone-oboe combination was least blended in his study, which was not the case in ours. Differences in methods, materials, analytical techniques, and the limited results given in Sandell make it difficult to compare the two studies.
References

Berlioz, H. (1856) A Treatise on Instrumentation and Orchestration. Mary Cowden Clarke (trans.) London: Novello (original edition 1844).
Bismarck, G. von (1974) Timbre of steady sounds: A factorial investigation of its verbal attributes. Acustica, 30, 146–159.
Carterette, E.C. & Kendall, R.A. (1989) Dynamics of musical expression. Journal of the Acoustical Society of America, 85, Supplement 1, S141.
Grey, J. (1975) An Exploration of Musical Timbre. Doctoral Dissertation published as report STAN-M2, Stanford University.
Helmholtz, H. von (1954) On the sensations of tone as a physiological basis for the theory of music. [Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik.] English translation by A.J. Ellis, reprinted by Dover Publications, New York. Original edition, Braunschweig, F. Vieweg & Sohn, 1863.
Kendall, R. (1984) The role of transients in listener categorization of musical instruments: An investigation using digitally recorded and edited musical phrases. Dissertation Abstracts International, 45(8), 2297A. (University Microfilms No. 82–25754).
Kendall, R. (1988) A sample-to-disk system for psychomusical research. Behavior Research Methods, Instruments & Computers, 20(2), 129–136.
Kendall, R.A. & Carterette, E.C. (1989) Perceptual, verbal, and acoustical attributes of wind instrument dyads. Proceedings of the First International Conference of Music Perception and Cognition. Kyoto, Japan: Japanese Society for Music Perception and Cognition, 365–370.
Kendall, R. & Carterette, E. (1991) Perceptual scaling of simultaneous wind instrument timbres. Music Perception, 8(4), 369–404.
Kendall, R. & Carterette, E. Verbal attributes of simultaneous wind instrument timbres I: von Bismarck’s adjectives. Music Perception, In Press, a.
Kendall, R. & Carterette, E. Verbal attributes of simultaneous wind instrument timbres II: Adjectives induced from Piston’s Orchestration. Music Perception, In Press, b.
Kendall, R. & Carterette, E. Acoustical attributes of simultaneous wind instrument timbres. In Preparation.
Kendall, R. & Carterette, E. (1990) The communication of musical expression. Music Perception, 8(2), Winter, 129–164.
Lerdahl, F. (1987) Timbral hierarchies. Contemporary Music Review, 2(1), 135–160.
Pepinsky, A. (1941) Masking effects in practical instrumentation and orchestration. Journal of the Acoustical Society of America, 12, 405–408 (Abstract).
Piston, W. (1955) Orchestration. New York: W.W. Norton.
Plomp, R. (1976) Aspects of Tone Sensation. London: Academic Press.
Rimsky-Korsakov, N. (1964) Principles of Orchestration. Trans. by E. Agate. New York: Dover. Original publication, St. Petersburg, 1913.
Risset, J. & Wessel, D. (1982) Exploration of timbre by analysis and synthesis. In D. Deutsch (Ed.), The Psychology of Music. New York: Academic Press.
Sandell, G. (1989a) Perception of concurrent timbres and implications for orchestration. Proceedings, International Computer Music Conference, 268–272.
Sandell, G. (1989b) Effect of spectrum and attack properties on the evaluation of concurrently sounding timbres. Unpublished text of paper delivered at the 118th meeting of the Acoustical Society of America.
Shepard, R. (1962) The analysis of proximities: Multidimensional scaling with an unknown distance function. II. Psychometrika, 27, 229–237.
Shepard, R. (1964) Circularity in judgments of relative pitch. Journal of the Acoustical Society of America, 36, 2346–2353.
What is the octave of a harmonically rich note? Roy D.Patterson, Robert Milroy and Michael Allerhand MRC Applied Psychology Unit, University of Cambridge, UK Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 69–81 Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
A sound composed of the harmonics of 131 Hz produces the note C3 with a buzzy timbre if the components have equal amplitude and are in cosine phase. Notes with the same tone chroma and a tone height between C3 and C4 can be produced by 1) attenuating the lower harmonics of the sound, 2) attenuating the odd harmonics, or 3) shifting the phase of the odd harmonics. The effects of the manipulations were measured in an octave experiment: a note was chosen at random and used to play a brief melody on the notes around C; the octave varied from C1 to C6 and the listeners judged the octave of each melody on a scale from C0 to C7. The results show that waves with the same period can lead to average octave judgements that differ consistently by more than half an octave, and that a substantial component of many timbre differences (e.g. that between a piano and a harpsichord) is actually a tone-height difference. The effects of manipulations 2) and 3) are difficult to explain with traditional hearing theories because the manipulations do not affect the centre of gravity of the spectrum of the sound. The effects can be explained by the “spiral” model of pitch (Patterson, 1987) because the spokes of the multi-channel spiral contain both a spectral dimension (within circuits) and a temporal dimension (across circuits). It appears that octave judgements are closely related to the position of the centre of gravity of activity on the main spoke of the spiral. KEY WORDS: Pitch perception, timbre perception.
Introduction There is a serious discrepancy between the psychological representation of pitch, the mel scale, and the musical representation of pitch (the pitch helix). The mel scale is a monotonic, unidimensional mapping of the frequency of a pure tone (a sine wave); the helix is a cyclic, bi-dimensional mapping of the repetition rate of multi-harmonic tones (musical notes). The circular dimension of the pitch helix is tone chroma and the longitudinal dimension is tone height (see Ueda & Ohgushi, 1987, for a review). Recently, Patterson (1989, 1990) has emphasized the bi-dimensionality of pitch by demonstrating that one can construct a sequence of notes in which tone height rises an octave while tone chroma remains fixed. Consider a sound composed of 20 harmonics of 100 Hz, and the perceptual change that occurs as the odd harmonics (100, 300, 500,…) are attenuated, as a group, by an ever increasing amount. The tone height rises smoothly from 100 to 200 Hz without any change in tone chroma. In retrospect, this is not surprising; when the attenuation is greater than about 20 dB, the odd harmonics are effectively removed, leaving a harmonic series that is the octave of the original note (200, 400, 600,…). What it demonstrates, however, is that the mel scale is completely inadequate as a representation of pitch; it cannot explain how we move continuously from a note to its octave without going through all the intervening tone chromas. The pitch helix has a separate dimension for tone height and, in this case, the new data can be accommodated simply by assuming that it is possible to move continuously along a line from a note to its octave, as well as around the chroma circle. The fact that notes exist between the circuits of the helix means that it is actually a helical cylinder rather than a helical wire, as suggested previously. But the helical cylinder is an obvious extension of the traditional representation. Patterson (1990) showed that there are several ways to alter the tone height of a multiharmonic tone without changing its tone chroma. One can attenuate the even harmonics rather than the odd harmonics, and one can phase shift either the even or the odd harmonics. In general, however, he employed only the extremities of the stimulus dimensions; that is, he used only complete attenuation or 90-degree phase shifts. In the first part of the current paper, we report data from an experiment designed to measure intermediate points on the attenuation and phase-shift functions. The effect of phase on tone height is particularly difficult to explain within a spectral theory of pitch perception; the long-term spectrum is not affected by the phase shift. In the latter part of the paper we describe a spectro-temporal model of hearing that explains the tone-height changes and how one might measure tone height.
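To make the demonstration concrete, the following minimal sketch (written here in Python with NumPy, which is of course no part of the original study; the sample rate and duration are arbitrary assumptions) builds the 20-harmonic complex and attenuates the odd harmonics as a group:

```python
import numpy as np

def note_with_odd_attenuation(atten_db, f0=100.0, n_harmonics=20,
                              dur=1.0, fs=44100):
    """Sum of the first n_harmonics harmonics of f0 (equal amplitude,
    cosine phase), with the odd harmonics (f0, 3*f0, 5*f0, ...) attenuated
    as a group by atten_db decibels."""
    t = np.arange(int(dur * fs)) / fs
    gain = 10 ** (-atten_db / 20)
    wave = sum((gain if k % 2 else 1.0) * np.cos(2 * np.pi * k * f0 * t)
               for k in range(1, n_harmonics + 1))
    return wave / np.max(np.abs(wave))

# Beyond roughly 20 dB the odd harmonics are effectively gone and what
# remains is the 200-Hz series, i.e. the octave of the original note.
tones = [note_with_odd_attenuation(a) for a in (0, 6, 12, 18, 24, 30)]
```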
I. The experiment Method The control stimulus was a set of 28 harmonics of a fundamental, fo. The amplitude of the harmonics was reduced at the rate of 1.5 dB per octave from harmonic 1 to 24; beyond
that the amplitude fell 12 dB per component. The fundamental ranged from 31.25 to 1000 Hz in octave steps; for convenience, the notes are designated C1–C6, although they are about 5% below the corresponding keyboard frequencies. All the components started at their maximum value and so these control stimuli are referred to as ‘cosine-phase’ or CPH sounds. In one experimental condition, the amplitude of all the odd harmonics was reduced by 9, 18 or 27 dB. These stimuli are referred to as ‘alternating-amplitude’ or AAMP sounds. They have the same spectral centre of gravity as the corresponding control sound. In another condition, the starting phase of the odd harmonics was shifted by either 60 or 90 degrees. These stimuli are referred to as ‘alternating-phase’ or APH sounds. They and the CPH sound have identical long-term spectra. In the final condition, the experimental manipulations were combined; the odd harmonics were attenuated by a fixed 9 dB and then these same harmonics were phase shifted either 60 or 90 degrees. They are referred to as AAMP/APH sounds. A detailed study of listeners’ abilities to detect the phase manipulation is presented in Patterson (1987b) along with a review of previous studies of APH sounds. On each trial of the experiment, one of the sounds was chosen at random (without replacement) and used to construct a short melody that converged on three identical half notes. The pitch of the half notes was one of the notes C1–C6; the duration of the half notes was 500 ms. The listener’s task was to judge the octave of the half notes. The primary concern in these studies is the musical perception of sounds rather than the audibility of individual harmonics. Presenting the sound as a melody promotes synthetic listening over analytic listening. Details of the procedure and rationale are presented in Patterson (1990). The musical designation of the C’s on the keyboard (C0–C8) was explained to the listeners (C4 is middle C). Although the notes in the experiment ranged from C1–C6, the listeners were told to use the range C0–C7 to ensure that their responses were not artificially restricted by the response scale. We told them that some of the notes would have intermediate octave values and instructed them to use two-digit responses with the second digit indicating the position within the octave. There were five listeners and all of them found the task easy to perform. There were two training runs in which all of the stimuli were presented once, and then 10 replications of the complete experiment. Although the average response did vary across listeners, they all produced the same pattern of results, and so, for brevity, the discussion is limited to the average data. Thus, there are 50 judgements per point for the average data presented in the next section (5 listeners by 10 replications). Results The data are presented as ‘confusion-matrix’ plots in Figure 1; the physical octave (that is, the true period of the acoustic waveform measured in octaves of 31.25 Hz) is the abscissa, the average octave response is the ordinate. If the listeners invariably gave the physical octave as their response, the data would fall along the central dashed diagonal running from (1,1) to (6,6) in each subfigure. The upper and lower diagonals show where responses an octave above and below the physical octave would fall, respectively. The AAMP results are presented by the bold lines in Figure 1a.
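Before turning to the individual curves, it may help to make the stimulus construction of the Method concrete. The sketch below is an illustrative reconstruction, not the original synthesis code; the sample rate, the peak normalization and the exact handling of the 12-dB-per-component rolloff above harmonic 24 are assumptions.

```python
import numpy as np

def make_note(f0, odd_atten_db=0.0, odd_phase_deg=0.0,
              n_harmonics=28, dur=0.5, fs=44100):
    """CPH/AAMP/APH stimuli after the Method: 28 harmonics, amplitudes
    falling 1.5 dB per octave up to harmonic 24 and a further 12 dB per
    component above that; the odd harmonics may be attenuated and/or
    phase shifted as a group."""
    t = np.arange(int(dur * fs)) / fs
    wave = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        level_db = -1.5 * np.log2(k)              # 1.5 dB/octave rolloff
        if k > 24:
            level_db -= 12.0 * (k - 24)           # extra 12 dB per component
        phase = 0.0
        if k % 2 == 1:                            # odd harmonics
            level_db -= odd_atten_db
            phase = np.deg2rad(odd_phase_deg)
        wave += 10 ** (level_db / 20) * np.cos(2 * np.pi * k * f0 * t + phase)
    return wave / np.max(np.abs(wave))

cph     = make_note(125.0)                                     # control (CPH)
aamp_18 = make_note(125.0, odd_atten_db=18)                    # AAMP(18)
aph_90  = make_note(125.0, odd_phase_deg=90)                   # APH(90)
combo   = make_note(125.0, odd_atten_db=9, odd_phase_deg=90)   # AAMP(9)/APH(90)
```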
The bottom data line (solid line, no symbols) presents the data for the control stimulus (the CPH sound). These
data show that listeners can readily identify the octave of the basic sound and label it as requested with octave numbers. The average response bias is only 0.16 octaves and the largest average deviation is 0.49 octaves. The top line (filled diamonds) presents the average response when the attenuation is 27 dB. When compared with the upper dashed diagonal, it shows that the average response is about an octave above the response to the corresponding CPH sound, indicating that 27 dB is effectively complete attenuation. The average bias in this case is 1.17, or 0.17 above the upper octave. The remaining lines with open squares and open diamonds present the data for the conditions where the attenuation is 9 and 18 dB, respectively. The 18-dB data fall above the 9-dB data and both fall in the range bounded by the CPH and 27-dB data. The average biases for the 9- and 18-dB data are 0.45 and 0.93, respectively, which, when measured relative to the average biases for the CPH and 27-dB conditions, shows them to be 29% and 77% of the way from the initial to the final octave, on average (e.g. for the 9-dB condition, (0.45 − 0.16)/(1.17 − 0.16) ≈ 0.29). These percentages are reasonably representative for the upper three octaves (4, 5 & 6), but as the physical octave decreases from 3 to 1, a given attenuation value has progressively more effect on the octave response. In summary, the AAMP data show that a) octave judgements are highly regular, b) the attenuation required to raise the response a full octave is surprisingly large, 27 dB, and c) it is possible to measure tone height as a function of the attenuation of the odd harmonics. The APH results are presented in Figure 1b. The solid line (no symbols) presents the same CPH data as in Figure 1a. The phase shift elevates the octave response for the lowest two octaves but not at the higher octaves. The 60- and 90-degree APH data give the same results in this case. Patterson (1987b) showed that the detectability of the phase shift was limited to sounds with repetition rates less than 400 Hz, which
Figure 1 Average octave responses for five listeners as a function of physical octave presented, for sounds in which the odd harmonics are (a) attenuated 0–27 dB, (b) phase shifted 0–90 degrees, or (c) attenuated 9 dB and
phase shifted 0–90 degrees. The central dashed diagonal in each subfigure shows the position of responses at the physical octave. means that phase-shift effects were not expected for the upper two octaves (500 and 1000 Hz). A difference might have been expected, however, for the middle two octaves (125 and 250 Hz). The AAMP/APH results are presented in Figure 1c. The solid line (open squares) shows the 9-dB AAMP data replotted from Figure 1a as the appropriate comparison. The upper two curves show the data for sounds where the odd harmonics are attenuated 9 dB and the same components are phase shifted either 60 degrees (open triangles) or 90 degrees (filled triangles). In this case we see the expected effects. Both phase shifts raise tone height for octaves below 400 Hz and not above, and the 90-degree phase shift has more effect than the 60-degree shift. It seems likely that the elevation of the CPH responses to sounds at octaves three and four limits our ability to measure a phase effect when there is no attenuation of the odd harmonics. In summary, attenuating the odd harmonics and phase shifting the odd harmonics both raise tone height, and the manipulations combine to produce additional rises in tone height. These results are difficult to explain with spectral models of pitch perception.
II. Modelling octave perception The waveform for the CPH sound, with fundamental 125 Hz, is presented in Figure 2, along with examples of the 125-Hz waveforms produced by attenuating or phase shifting the odd harmonics. The CPH wave (Figure 2a) is a modified pulse train; the small oscillations leading away from the pulses simply indicate the absence of full-size harmonics in the region above the 24th. A comparison of the CPH wave with the AAMP(9) and AAMP(18) waves in Figures 2b and 2c shows that attenuating the odd harmonics produces a secondary pulse halfway through the period of the CPH wave. As the attenuation increases, the secondary pulses grow and the primary pulses shrink, and when the attenuation is carried to completion, the pulses are equal in size and the fundamental becomes 250 Hz. The APH(60) wave in Figure 2d shows that phase shifting the odd harmonics also introduces a secondary pulse in the waveform, but in this case, the peak of the pulse is just after the mid-point of the period. Both the primary and secondary pulses have negative excursions and they are on opposite sides of the pulses. As the phase shift increases to 90 degrees (Figure 2e) the primary and secondary pulses converge on the same height but the asymmetries in pulse position and pulse shape remain. As the odd harmonics of the APH waves are attenuated, the asymmetries in pulse position and pulse shape diminish, as shown by the AAMP(9)/APH(90) wave in Figure 2f. Auditory sensation processing and auditory images
The APH sounds were originally introduced in a study of monaural phase perception (Patterson, 1987a, b) designed to support a spectro-temporal model of hearing—the Pulse Ribbon Model. The work has now been extended, with the addition of a temporal integration mechanism, to the point where it can simulate the Auditory Sensation Processing necessary to convert acoustic waves into a reasonable representation of the auditory images we hear when presented with musical notes (Patterson & Holdsworth, 1990). The first stage in the model is an auditory
Figure 2 Four-cycle segments of selected stimuli with 8-ms periods (fo=125 Hz). The sounds are (a) CPH,
(b) AAMP(9), (c) AAMP(18), (d) APH(60), (e) APH(90), and (f) AAMP(9)/APH(90). The attenuation and phase-shift manipulations introduce secondary peaks in the central portion of the period of the CPH wave. filterbank which performs a spectral analysis much like that of the basilar membrane (Patterson & Holdsworth, 1991). The second stage is a bank of two-dimensional, adaptive-threshold generators that perform compression, rectification, adaptation and suppression on the outputs of the individual channels of the filterbank (Holdsworth, 1990). In so doing, the generators simulate the function of the hair cells in the cochlea and convert the output of the filterbank into a simulation of the neural activity pattern flowing from the cochlea. The final stage of the model converts the fast-flowing neural activity patterns of periodic sounds into stabilised auditory images through a process referred to as triggered, quantized, temporal integration (Patterson & Holdsworth, 1990, 1991). In essence, the larger peaks in each channel of the neural activity pattern are used as strobe pulses for the integration process. When they occur, a portion of the neural activity pattern in that channel is transferred as a unit to the corresponding channel of the auditory image and added point for point to what is already there. When a sound is periodic, the strobe pulses are synchronised to the period of the wave and the sections of the neural pattern transferred to the image are all very similar. They are also aligned and, as a result, they accumulate to form an auditory image that is stationary even though the neural activity pattern is streaming past at a rapid rate. The auditory model has been implemented as a computer program that converts waves into auditory images; the images of the waves in Figure 2 are presented in Figure 3. The filterbank has 49 channels with the lowest and highest filters centred at 100 and 2500 Hz, respectively; the filter spacing is quasi-logarithmic. Both the neural activity pattern and the auditory image have the same number of channels as the filterbank. The channels are indicated by the horizontal lines in each subsection of Figure 3. The abscissa is ‘time since the last strobe pulse’. The auditory image of the CPH sound is presented in Figure 3a. The period of the wave is 8 ms and, in the auditory image, the channels with activity show a peak at this time. In the upper channels there is no activity in the centre portion of the period. Lower channels containing resolved harmonics have peaks with reduced amplitude in the centre portion of the period. The harmonic number of a resolved harmonic can be identified by the number of peaks in one period of the wave. For example, the second harmonic, which appears in channels 7–10, has two peaks per period. The effect of attenuating the odd harmonics is shown in the auditory images of the AAMP(9) and AAMP(18) sounds (Figures 3b and 3c). The initial attenuation (9 dB) causes a thinning of the ridge of activity at 8 ms, and an increase in activity in the central portion of the period. As the attenuation continues (Figure 3c), the patterns of activity along the ridge and in the central section become more and more similar, and for attenuations in excess of 25 dB,
the pattern is essentially the same; that is, the CPH 8-ms sound has become a CPH 4-ms sound. Thus, in the auditory image model, the perception of a continuous progression from a note to its octave (Figure 1a) occurs through continuous change in the relative strength of peaks in the image that the note and its octave have in common. The effect of phase shifting the odd harmonics is shown in the auditory images of the APH(60) and APH(90) sounds (Figures 3d and 3e). The phase shift does not cause a thinning of activity on the 8-ms ridge because it does not reduce the level of the odd harmonics. Rather, it reduces the suppression of peaks in the centre of the period for all harmonics. As a result, the even resolved harmonics and all of the unresolved harmonics are a little stronger in the central portion of the APH images than they are in the CPH image. When the odd harmonics are reduced by 9 dB, the
Figure 3 Auditory images of the sounds presented in Figure 2: (a) CPH, (b) AAMP(9), (c) AAMP(18), (d) APH(60), (e) APH(90), and (f) AAMP(9)/APH(90). The manipulations introduce activity into the central portion of the auditory
image where the CPH image has no activity.
phase shifting has a greater effect. The activity in the central portion of the AAMP(9)/APH(90) sound (Figure 3f) is greater than it is in the corresponding portion of either the AAMP(9) sound (Figure 3b) or the APH(90) sound (Figure 3e). Indeed, phase
shifting the attenuated components increases activity in the central portion of the image to levels like those in the AAMP(18) image. A comparison of the data in Figures 1c and 1a shows that the combination of manipulations also increases the octave responses for AAMP(9)/APH(90) to near the level of those for AAMP(18). Spiral Excitation Patterns In the auditory image model, the tone chroma of the sound is determined with the aid of a “spiral processor” (Patterson, 1986; Patterson & Holdsworth, 1990). In essence, the processor looks at the activity on sets of vertical slices of the auditory image that are separated by doublings in time to see if one such set has more activity than the others (Patterson & Nimmo-Smith, 1986). For the sounds in the current experiment, the sequence of vertical slices at 1, 2, 4, 8, 16, 32 and 64 ms is correctly identified as the tone chroma in every case. The tone height of the notes is not specified by the spiral processor in its original form. But the fact that the octave-response data are closely related to the activity on the 4- and 8-ms slices of the auditory images suggests that the spiral algorithm might be extended to include a measure of tone height, and so become a complete model of musical pitch. Accordingly, the model was used to extract slices through the auditory images in Figure 3 at 1 ms and its successive doublings. The slices at the 8-ms point and the 4-ms point are presented in the left and right columns of Figure 4, respectively. The abscissa is channel number; the spectral resolution was increased to 99 channels for these plots. For convenience, all of these activity patterns will be referred to as ‘spiral excitation patterns’, or simply ‘excitation patterns’ when there is no ambiguity. In the lefthand column of Figure 4a is the spiral excitation pattern at the physical period of the CPH sound (8 ms). It is much like the traditional excitation pattern that any spectral model of pitch would produce for the CPH sound, with five largely resolved harmonics in the lower channels (0–50) and a band of largely unresolved harmonics in the higher channels. The spiral excitation patterns on slices farther along the sequence (e.g. at 16, 32 and 64 ms) are all very similar to that at 8 ms. The same is not true, however, at shorter periods in the sequence. The righthand portion of Figure 4a shows the spiral excitation pattern at 4 ms for the CPH sound. Only the channels associated with the second and fourth harmonics appear in this excitation pattern, since only these components have peaks halfway through the period of the sound. This abrupt change in the shape of the spiral excitation pattern signals the position of the octave. Excitation patterns at 2 and 1 ms show even less activity than that at 4 ms. One procedure for locating the position of rapid change would be as follows: Choose a low fundamental of the given tone chroma, say 31.25 Hz, and for each harmonic of that fundamental, calculate the difference between the level of the harmonic in one excitation pattern and the next. The point in the sequence where the level difference is greatest is the tone-height estimate associated with that harmonic. The overall tone-height value for the sound is a weighted average of the individual estimates. Consider the case of the CPH sound. The spiral excitation
Figure 4 Spiral excitation patterns extracted at 8 ms (lefthand column) and 4 ms (righthand column) from the
auditory images in Figure 3: (a) CPH, (b) AAMP(9), (c) AAMP(18), (d) APH(60), (e) APH(90), and (f) AAMP(9)/APH(90). Tone height increases when the similarity between the 8- and 4-ms patterns increases. patterns beyond 8 ms in the sequence are all very similar to that at 8 ms, and so the level differences would be small for each and every harmonic. For most harmonics, there is one large level difference which occurs between the 8- and 4-ms excitation patterns, as shown in Figure 4a. The patterns at 2 and 1 ms have no activity for most of the harmonics. The only exception is harmonic 2. Its strong presence in the 4-ms excitation pattern means that its tone-height estimate will occur between the 4- and 2-ms patterns—an octave higher than the rest. Thus, the overall tone-height value will be a little over octave 3. Now consider how the tone-height values would change as the odd harmonics are attenuated, or phase shifted, or both. The attenuation of the odd harmonics rapidly removes them from the 8-ms excitation pattern as shown by the patterns for AAMP(9) and AAMP(18) on the lefthand sides of Figures 4b and 4c. At the same time, however, it removes the odd harmonics from later patterns in the sequence and so the odd harmonics rapidly drop out of the calculation altogether. The attenuation increases the activity of the even harmonics in the 4-ms excitation patterns as shown on the righthand sides of the same subfigures. This gradually shifts the maximum level difference of more and more harmonics from the 8-ms/4-ms pair of patterns to the 4-ms/2-ms pair of patterns, and so gradually raises the overall tone-height value from octave 3 to octave 4. Shifting the phase of the odd harmonics has essentially no effect on the 8-ms excitation pattern of the sound and those further along in the sequence (compare the pattern in the lefthand column of Figure 4a with those in Figures 4d and 4e). It does, however, affect the 4-ms pattern because it reduces suppression of peaks in the even harmonics in the centre portion of the period (compare the righthand column of Figure 4a with those in Figures 4d and 4e). Thus, for the even harmonics, it decreases the level differences associated with the 8-ms/4-ms pair of excitation patterns and increases the level differences for the 4-ms/2-ms pair of patterns. The effect on the overall tone-height value is rather small in this case, but it is also a small effect in the data (Figure 1b). The combination of 9 dB of attenuation and a 90-degree phase shift raises the even harmonics in the 4-ms excitation pattern to near the level of those in the 4-ms pattern of the AAMP(18) sound. Thus, it moves the position of the large level differences to the 4-ms/2-ms pair of patterns for many of the even harmonics. At the same time, the attenuation reduces the influence of the odd harmonics, and so there is a stronger rise in the overall tone-height value—a rise that is mirrored in the octave-response data. In summary, then, it is possible to produce a tone-height measure that will track the rise in tone height produced by attenuating and/or phase shifting the odd harmonics of the CPH sound. In traditional spectral models, the spiral excitation patterns do not exist as separable entities and so this type of measure is precluded from the outset.
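A minimal sketch of the level-difference procedure described above is given below. The excitation levels would in practice come from the spiral excitation patterns of the auditory image model, which is not reimplemented here; the 64-ms reference period, the equal default weights and the toy numbers are illustrative assumptions only.

```python
import numpy as np

def tone_height_estimate(levels_db, slice_periods_ms, weights=None):
    """levels_db[h, s] is the level of harmonic h (of the chosen low
    fundamental) in the spiral excitation pattern at slice s; slices run
    from the longest period down to the shortest, each half the previous."""
    levels_db = np.asarray(levels_db, dtype=float)
    periods = np.asarray(slice_periods_ms, dtype=float)
    # Level drop from each slice to the next shorter-period slice.
    drops = levels_db[:, :-1] - levels_db[:, 1:]
    # The pair with the largest drop marks each harmonic's estimate; it is
    # expressed as the octave of the longer-period member of the pair,
    # with 64 ms as the reference so that 32 ms (31.25 Hz, C1) maps to
    # octave 1 and 8 ms to octave 3.
    best = np.argmax(drops, axis=1)
    per_harmonic_octave = np.log2(64.0 / periods[best])
    if weights is None:
        weights = np.ones(levels_db.shape[0])
    return float(np.average(per_harmonic_octave, weights=weights))

# Toy example loosely following the CPH case: harmonic 2 stays strong in
# the 4-ms pattern, so its estimate sits an octave above the others.
slices = [64, 32, 16, 8, 4, 2, 1]
levels = [[0, 0, 0, 0, -40, -40, -40],   # harmonic 1: big drop after 8 ms
          [0, 0, 0, 0,   0, -40, -40],   # harmonic 2: big drop after 4 ms
          [0, 0, 0, 0, -40, -40, -40]]   # harmonic 3: big drop after 8 ms
print(tone_height_estimate(levels, slices))   # ~3.33, a little over octave 3
```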
Acknowledgements The authors would like to thank John Holdsworth for programming the cochlea simulation and the platform for the auditory image model. The work was supported by the MRC and two grants: MOD PE XTR/2239 and Esprit BRA 3207.
References
Holdsworth, J. (1990) Two-dimensional adaptive thresholding. Annex 4 of APU AAM-HAP Report 1.
Patterson, R.D. (1986) Spiral detection of periodicity and the spiral form of musical scales. Psychology of Music, 14, 44–61.
Patterson, R.D. (1987a) A pulse ribbon model of peripheral auditory processing. In W.A.Yost and C.S. Watson (Eds.), Auditory Processing of Complex Sounds, pp. 167–179. New Jersey: Erlbaum.
Patterson, R.D. (1987b) A pulse ribbon model of monaural phase perception. Journal of the Acoustical Society of America, 82, 1560–1586.
Patterson, R.D. (1989) The tone height of multi-harmonic tones. In Proceedings of the First International Conference on Music Perception and Cognition. Kyoto, Japan.
Patterson, R.D. (1990) The tone height of multiharmonic sounds. Music Perception, 8, 201–211.
Patterson, R.D. & Holdsworth, J. (1990) An introduction to auditory sensation processing. Annex 1 of APU AAM-HAP Report 1.
Patterson, R.D. & Holdsworth, J. (1991) A functional model of neural activity patterns and auditory images. In W.A.Ainsworth (Ed.), Advances in Speech, Hearing and Language Processing, Vol. 3. London: JAI Press.
Patterson, R.D. & Nimmo-Smith, I. (1986) Thinning periodicity detectors for modulated pulse streams. In B.C.J.Moore and R.D.Patterson (Eds.), Auditory Frequency Selectivity, pp. 299–307. New York: Plenum.
Ueda, K. & Ohgushi, K. (1987) Perceptual components of pitch: Spatial representation using a multidimensional scaling technique. Journal of the Acoustical Society of America, 82, 1193–1200.
Brightness and octave position: are changes in spectral envelope and in tone height perceptually equivalent? Ken Robinson MRC Applied Psychology Unit, University of Cambridge, UK Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 83–95 Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
Rapid changes in spectral envelope have been reported to influence estimation of octave position by musically trained listeners. Musically trained and untrained listeners were asked to judge changes in spectral envelope as well as octave position for successively presented stimuli. Untrained listeners erroneously perceived unison intervals concurrent with a change in spectral envelope as octave intervals. Trained listeners were at ceiling performance. In a second experiment, performance was measured as a function of stimulus duration. Increases in duration did not affect spectral envelope discrimination, but did improve pitch interval identification. Furthermore, all listeners confused unison intervals as octave with a change in spectral envelope for durations of 7.5 ms. As durations increased, trained listeners improved their unison interval identification performance, but untrained listeners were unable to improve. The results support the view that brightness is formed earlier than pitch. Listeners with little or no musical training rely primarily on spectral cues, so that their octave position judgements are partially dependent on spectral envelope. KEY WORDS: Pitch perception, timbre perception.
This research investigates the relationship between octave position and spectral envelope. Octave position is the physical counterpart of tone height, which is that attribute of pitch used to discriminate between notes that are separated by octave intervals. Spectral envelope is the physical determinant of brightness, which is that attribute of timbre that is varied by the tone control on many transistor radios. The tone control varies the degree of
bass or treble that may be heard in the sound signal, by adjusting the relative intensity of each of the signal’s component frequencies. Timbre differences between instruments have been found to contribute towards octave errors (Bachem, 1937; Hesse, 1982). Bachem asked listeners to match the pitch of notes produced by a number of musical sources by pointing to the appropriate note on a piano. The musical sources included tuning forks, piano, violin and the tin whistle. Judgement errors of one to two octaves were reported for those notes which were characterized by a high number of upper harmonics. If tone height were perceptually interpreted as synonymous with brightness, as has been suggested by van Noorden (1982), it would explain why octave position judgments were confused by variations in spectral envelope. Hesse (1982) systematically manipulated spectral envelope independently of octave position, and asked 27 music students to identify pitch intervals spanning from unison to octave. Listeners made errors of an octave when spectral envelope was varied. For example, when two successive notes shared the same periodicity, and one of the two notes varied in spectral envelope, listeners reported the interval as being octave rather than unison. Hesse suggested that brightness was an attribute of pitch. Hesse’s findings, however, may be an artifact of his task. Listeners were only asked to judge the pitch interval between two notes, and were never given an opportunity to clarify their perceptual experience by also using a timbre-related response. Therefore, any auditory difference would have biased listeners towards an octave interval judgment rather than a unison interval judgment. The experiments reported in this paper measured the degree and direction of confusion between variation in octave position and concurrent variation in spectral envelope. The general design is a modification of Hesse’s (1982) experiment, ensuring that listeners were aware that both tone height and brightness were being varied. Moreover, to assess the nature of confusion, both musically trained and untrained listeners were tested. In the first experiment, listeners were asked to judge the pitch interval of two notes as either unison or octave, and also to judge whether a change in spectral envelope had occurred. To facilitate the brightness decisions, a constant spectral envelope was described to listeners as being the same laboratory instrument, and a change in spectral envelope was described as being a different laboratory instrument. Hence, there were four response categories: (1) unison interval, same instrument; (2) octave interval, same instrument; (3) unison interval, different instrument, and (4) octave interval, different instrument. The stimuli were presented either at a fundamental frequency of 131 Hz (octave below Middle C) or 262 Hz (Middle C). Spectral envelope was varied by using one of four different spectral weighting functions (see Figure 1). In the second experiment, the experimental task was simplified so that listeners did either the instrument discrimination task or the pitch interval identification task on any given daily session. This had the advantage of minimizing response confusions. To balance the design, there were two categories of pitch (C3/C4) and two categories of instrument (flute/brass). The stimuli themselves were selected to be more typical of a musical instrument by recording the flute and brass voices from a Yamaha DX-9 synthesizer at C3 and C4.
The peak cycle of each waveform was then excised and iterated to make up the steady-state experimental stimuli. To further investigate the perceptual relationship between spectral envelope and octave position, performance was measured as a function of stimulus duration ranging from 7.5 to 240 ms.
In predicting the outcome of the experiments, it is worth reviewing the theoretical ideas of van Noorden (1982) and of Hesse (1982). Van Noorden (1982) has suggested that brightness and tone height judgments share the same underlying process, and Hesse interpreted his results as indicating that brightness was an attribute of pitch. If brightness and tone height were perceptually equivalent, then they should be indiscriminable, and the resultant confusions should be symmetric. If, however, tone height and brightness were perceptually independent, then performance in either instrument discrimination or pitch interval identification should be unaffected by a change in the irrelevant dimension. Instrument discrimination performance should be equally good in conditions where the pitch interval was unison or octave, and pitch interval identification performance should be equally good in conditions where the instrument remained the same or was different across the successive note stimuli. The pattern of confusions may be either symmetric or asymmetric, the latter
Figure 1 Spectral weighting functions used to produce the different laboratory instruments in Experiment
I. The spectra depicted are the stimuli synthesized at Note C3. suggesting a one-way dependency between the two percepts of brightness and tone height.
Experiment I Method Stimuli Pulse trains with a 3 dB/octave rolloff were initially generated at fundamental frequencies of 131 and 262 Hz. Four spectral weighting functions were used (Figure 1). The upper cutoff frequency was set at 0.7 kHz for Spectra A and C, and 2.1 kHz for Spectra B and D. The lower cutoff was set at 0.4 kHz for Spectrum C and 1.2 kHz for Spectrum D. Stimuli were sampled at 10 kHz using 12-bit D2A converters, and low-pass filtered at 4.2 kHz using two cascaded Barr & Stroud EF3-02 filters (each 48 dB/octave slope). All notes had 20 ms onsets and offsets shaped by a raised cosine function. Stimuli were presented binaurally at 68 dB SPL through TDH-39 headphones in sound-attenuated IAC booths. Loudness cues were further controlled by randomizing stimulus intensity levels within a 5 dB range on a trial-by-trial basis. In addition, a random pitch variation within 5 percent of the centre frequency was introduced to reduce memory effects. This resulted in a roving pitch range of a semitone. Procedure Listeners heard two notes that randomly varied in spectral envelope and/or in octave position. For ease of identification, a constant spectral envelope between the two note stimuli was called the same laboratory instrument, whereas a change in spectral envelope was called a different laboratory instrument. Listeners were asked to make one of four possible decisions: (1) Same instrument, unison interval; (2) Same instrument, octave interval; (3) Different instrument, unison interval, and (4) Different instrument, octave interval. All listeners were tested over one 2 hr session. At the beginning of the session, a demonstration of each of the eight notes in each of the four decision categories was given. Each listener repeated the experiment ten times (10 runs). In addition, listeners were given a practice run in the first session to ensure that they were familiar with the task. There were 96 trials in each run, with all four decision categories in equal proportion. Each trial consisted of two 200 ms notes interspersed by 100 ms silence. A further 500 ms elapsed, and the stimulus was repeated, after which the listeners’ responses were collected. No knowledge of results was given to listeners until the completion of the experiment. Subjects
Nine listeners ranging in age from 30 to 50 years were tested. Four listeners were classed as musically untrained as they had either never studied a musical instrument or had studied for less than 1 year, whereas listeners were classed as musically trained if they had formal musical training of over 1 year. All listeners had normal binaural absolute thresholds measured at pure tone frequencies of 500, 2000 and 8000 Hz.
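As an illustration of the stimulus construction described in the Method above, the sketch below approximates the band-limited pulse trains as sums of sine-phase harmonics. The original synthesis code is not available; the component phases, the exact rove ranges and the peak normalization are assumptions.

```python
import numpy as np

def exp1_note(f0, upper_hz, lower_hz=0.0, dur=0.2, fs=10000, ramp=0.02):
    """Band-limited note after Experiment I: harmonics of f0 with a
    3 dB/octave rolloff, restricted to [lower_hz, upper_hz], with 20-ms
    raised-cosine onset and offset."""
    t = np.arange(int(dur * fs)) / fs
    wave = np.zeros_like(t)
    k = 1
    while k * f0 <= fs / 2:
        if lower_hz <= k * f0 <= upper_hz:
            amp = 10 ** (-3.0 * np.log2(k) / 20)      # 3 dB/octave rolloff
            wave += amp * np.sin(2 * np.pi * k * f0 * t)
        k += 1
    n = int(ramp * fs)
    env = np.ones_like(wave)
    env[:n] = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))   # raised cosine
    env[-n:] = env[:n][::-1]
    return wave * env / np.max(np.abs(wave))

# Spectrum A at C3 and Spectrum D at C4, with illustrative level and
# pitch roves (the text specifies "within a 5 dB range" and "within 5
# percent of the centre frequency").
rng = np.random.default_rng(0)
f0 = 131.0 * (1 + rng.uniform(-0.05, 0.05))
note_c3_A = exp1_note(f0, upper_hz=700.0)                       # Spectrum A
note_c4_D = exp1_note(262.0, upper_hz=2100.0, lower_hz=1200.0)  # Spectrum D
note_c3_A *= 10 ** (rng.uniform(-2.5, 2.5) / 20)                # level rove
```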
Figure 2 Group mean performance in the instrument discrimination and pitch identification tasks for each classification category used in Experiment I. The legend in capital letters refers to the decision taken by listeners, whereas the lower-case legend refers to the status of the irrelevant dimension. Results and Discussion Performance was worse for the musically untrained listeners than for the musically trained listeners (77 and 93 percent correct respectively, p<0.01). Nevertheless, the musically untrained listeners performed at better than chance levels, which shows that they were able to discriminate between variation in octave position and in spectral envelope. Musically trained listeners performed at ceiling in discriminating between octave position and spectral envelope changes (97 and 90 percent respectively, n.s.). Untrained listeners found more difficulty with octave position decisions than with spectral envelope decisions (73 and 82 percent correct respectively, p<0.01). One condition (Unison interval, different instrument) contributed most of this confusion (see Figure 2). When the spectral envelope changed, untrained listeners erroneously classified the unison pitch interval as “octave interval, same instrument”. Hence, changes in spectral envelope
confused the unison interval decisions of the untrained listeners, whereas changes in pitch interval did not interfere with spectral envelope decisions. The results for the musically trained listeners are inconsistent with Hesse’s (1982) finding that spectral envelope changes are confused with octave position decisions. The performance of the untrained listeners, however, is in accord with the results Hesse reported. Moreover, in the current study a performance asymmetry was found: untrained listeners found that changes in octave position did not disturb spectral envelope judgments, but changes in spectral envelope disturbed octave position judgments. This asymmetry in response suggests that separate processes were used for the octave position and spectral envelope judgment tasks. For untrained listeners, the judgment of tone height is dependent on a stable timbre, whereas the judgment of timbre is not dependent on a stable tone height. The lack of effect found with the musically trained listeners was probably due to their ceiling performance. There were no significant effects in any of the conditions for these listeners: the task was too easy for musically trained listeners. Another problem for interpretation is task demand, especially on musically untrained listeners. It can be argued that the task of four alternative responses was too much for untrained listeners, especially in one single testing session. The necessity to keep track of both instrument discrimination and pitch interval identification at the same time may have placed confusing response demands on the untrained listeners. Hence, untrained listeners may have perceived the stimuli correctly, but merely confused their responses. Accordingly, in Experiment II, the design was simplified so that listeners could concentrate on one dimension at a time, while at the same time being aware that the other dimension was freely varying. In the second experiment, listeners made either instrument discrimination judgments or pitch interval identification judgments on a given daily session. The ceiling performance of the musically trained listeners was avoided by making the task more difficult: the duration of the stimuli was manipulated from 7.5 to 240 ms. Moreover, to generalize the finding that spectral envelope affected pitch interval perception, more representative musical stimuli were used.
Experiment II Method Stimuli and procedure Stimuli were recorded from a Yamaha DX-9 synthesizer on the Flute and Brass voice settings at C3 and C4. The peak cycle of each recorded note was then excised and iterated to make up the steady-state stimuli for the experiment. Spectral envelopes of the Flute and Brass stimuli at C3 are shown in Figure 3. Stimuli were sampled at 16.384 kHz using 12-bit D2A converters, and low-pass filtered at 6.5 kHz using the two cascaded Barr & Stroud filters. Stimuli consisted of a note of varying duration (7.5, 15, 30, 60, 120 or 240 ms); each note was presented twice. The
Figure 3 Spectra for the four sounds used in Experiment II. The top two spectra represent the steady-state brass stimuli synthesized at notes C4 and C3. The bottom two spectra represent the steady-state flute stimuli synthesized at notes C4 and C3. inter-stimulus-interval was 200 ms. Stimuli were presented binaurally through Sennheiser-414 headphones at 63 dB SPL in sound-attenuated IAC booths. Loudness cues associated with increasing duration were controlled by reducing the level of the stimulus by 3 dB with each doubling in duration. Loudness cues associated with the occurrence of higher harmonics resulted in brass stimuli sounding louder than flute stimuli. The loudness was subjectively equalized by increasing the level of the flute stimuli by 7.5 dB. In addition, a random pitch variation within +/−12 percent of the centre frequency was introduced, resulting in an overall pitch range of two whole tones.
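The peak-cycle iteration and the duration-based level compensation described above can be sketched as follows. The DX-9 recordings are of course not available, so a synthetic stand-in waveform is used; the choice of 7.5 ms as the reference duration for the 3-dB-per-doubling rule and the peak-cycle selection heuristic are assumptions.

```python
import numpy as np

def iterate_peak_cycle(recorded, f0, fs, dur, ref_dur=0.0075,
                       atten_per_doubling_db=3.0):
    """Excise the period of `recorded` containing the largest peak,
    repeat it to the requested duration, and lower the level 3 dB for
    each doubling of duration relative to the 7.5-ms reference."""
    period = int(round(fs / f0))
    peak = int(np.argmax(np.abs(recorded)))
    start = max(0, min(peak - period // 2, len(recorded) - period))
    cycle = recorded[start:start + period]
    n_cycles = int(np.ceil(dur * fs / period))
    wave = np.tile(cycle, n_cycles)[:int(round(dur * fs))]
    doublings = np.log2(dur / ref_dur)
    return wave * 10 ** (-atten_per_doubling_db * doublings / 20)

# Stand-in "recording" of a C3 note at the 16.384-kHz sample rate.
fs = 16384
t = np.arange(fs) / fs
fake_brass_c3 = np.sin(2 * np.pi * 131 * t) + 0.5 * np.sin(2 * np.pi * 262 * t)
stim_30ms = iterate_peak_cycle(fake_brass_c3, f0=131, fs=fs, dur=0.030)
```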
Listeners heard two notes that varied in instrument and/or in pitch interval. They were asked either to discriminate the instrument or identify the pitch interval on any given daily session. Each of the four listeners was tested over six 2 hr sessions. At the beginning of each session, and after every 48 experimental trials, a demonstration of four trials with answers was given to remind listeners of the task. Furthermore, the first session was designated practice for the instrument discrimination task, and the second session was practice for the pitch interval identification task. There were 10 runs of the experiment for each session, and the order of tasks over the six days was ABBAAB, where A was instrument discrimination and B pitch interval identification. Knowledge of results was given to listeners after every experimental trial, and a summary of their performance was given to them after every run. Subjects Four listeners ranging in age from 21 to 32 years were tested. TN and CF were classed as musically untrained as neither had ever studied a musical instrument. Listener NR is a post-graduate music student, whilst author KR has had six years of formal musical training. All listeners had normal binaural absolute thresholds measured at pure-tone frequencies of 500, 2000 and 8000 Hz. Results and discussion The salient result was that increasing stimulus duration had no effect on instrument discrimination performance, yet had a large effect on pitch interval identification performance. Figure 4 shows performance collapsed over “same” and “different” instrument discrimination and “unison” and “octave” pitch interval identification trials. Instrument discrimination did not improve over increasing duration for any of the four listeners. In contrast, pitch interval identification performance improved an average of 24 percent for stimulus durations from 7.5 to 240 ms. For all listeners, performance in pitch interval identification was worse than performance in instrument discrimination for the 7.5 ms duration. Clearly, the two tasks show different patterns of performance, and further show that the brightness percept is formed earlier than the pitch percept. Figure 4 also shows that the performance improvement in pitch interval discrimination for musically trained listeners (NR, KR) occurred over 7.5 to 15 ms stimulus durations. For the musically untrained listeners (TN, CF), improvement in pitch interval identification occurred throughout the range of stimulus durations. Hence, musical training reduces the amount of time required to discriminate pitch intervals. The data were further analyzed over both tasks of instrument discrimination and
Figure 4 Instrument discrimination and pitch interval identification performance as a function of stimulus duration (Experiment II). TN and CF are musically untrained listeners, NR and KR musically trained listeners.
Figure 5 Same instrument and unison interval judgments as a function of stimulus duration. TN and CF are musically untrained listeners, NR and KR musically trained listeners. pitch interval identification, for “same” instrument and “unison” pitch interval decisions, and “different” instrument and “octave” pitch interval decisions. For the instrument discrimination task, there were “same instrument” decisions for conditions where the pitch interval was either unison or octave. Similarly, there were “unison interval” decisions for conditions where the instrument was common or changed over the successive note stimuli. Figure 5 shows that a change in pitch interval between the two stimuli causes minimal disruption to instrument discrimination performance for all listeners. The pitch interval identification performance, however, was markedly reduced by a change in instrument for all listeners at durations of 7.5 and 15 ms. Musical training had an influence on the pitch interval identification performance: trained listeners improved their performance as stimulus duration increased, whereas the untrained listeners did not. Moreover, untrained listeners almost always incorrectly classified unison pitch interval trials as “octave intervals” when there was a change in instrument. Hence, the untrained listeners confused the octave interval with an instrument change. This replicates the result reported by Hesse (1982) and Experiment I in this paper. Performance for “different” instrument decisions as a function of stimulus duration is depicted in Figure 6. “Different instrument” decisions were made in conditions when the pitch interval was either unison or octave. Similarly, “octave interval” decisions were made when the instrument was common or changed over the successive notes. The results show that there are no effects for any conditions except for the octave interval identifications when the instrument was common across the successive notes. For this condition, stimulus duration has a clear effect
Figure 6 Different instrument and octave interval judgments as a function of stimulus duration. TN and CF are musically untrained listeners, NR and KR musically trained listeners. on all listeners. Patterns of improvement, however, differ for the musically trained and untrained listeners. Improvements occur for trained listeners only from 7.5 to 15 ms stimulus durations. The untrained listeners improve performance up to a ceiling of 120 ms stimulus durations. The results of the separate analyses show a distinct asymmetry in performance, mirroring the finding reported in Experiment I. For short durations of 7.5 and 15 ms, listeners incorrectly classify unison intervals as “octave” when the instrument changes. Musically trained listeners are able to remedy the confusion when stimulus durations are increased from 15 to 30 ms. Untrained listeners, however, are unable to use the information from increased stimulus duration. Moreover, when the pitch interval is octave, and the instrument remains constant, all listeners are able to use increased stimulus durations to correctly identify the pitch interval.
General discussion Results from Experiments I and II show that the perceptions of brightness and of tone height are not mediated by the same underlying process. In Experiment I, musically untrained listeners incorrectly classified unison pitch intervals as octave when the spectral envelope changed. Experiment II replicated this finding, and extended it to
musically trained listeners for stimulus durations of 7.5 and 15 ms. Moreover, in Experiment II stimulus duration had remarkably different effects on the instrument discrimination and pitch interval identification tasks. Instrument discrimination performance did not improve with increasing duration for any listener, whereas performance in pitch interval identification improved with increasing stimulus duration. The results are consistent with the view of separate processes being used for octave position and spectral envelope judgments. Furthermore, brightness appears to be processed at an earlier stage than tone height, given that duration does not improve instrument discrimination judgments but does improve pitch interval identification judgments. The performance asymmetry reported in both experiments suggests that tone height is at least partially dependent on brightness, and that as stimulus duration increases the initial dependence disappears. The results are clearly inconsistent with the theory that brightness and tone height share the same underlying process (van Noorden, 1982). The effect of stimulus duration, and the asymmetric dependency of tone height on brightness, argue against this theory. Van Noorden suggested that brightness and tone height were jointly encoded by a common analyser. Data from Experiments I and II show that brightness and tone height judgments are perceptually different, and that tone-height judgments are partially dependent on brightness. This indicates two analysers. Although the dependence of tone height on brightness is consistent with Hesse’s findings, it is unusual that his musically trained listeners confused unison intervals as octave with a concurrent spectral envelope change. His stimulus durations were 1 s each, well over the 7.5 to 15 ms durations where musically trained listeners were shown to make confusions in Experiment II. Hesse’s musical listeners, however, were only asked to judge the pitch interval, and were never given an opportunity to clarify their perceptual experience by also using a timbre-related response. Moreover, the testing occurred over one session, so his results may have also been due to relative unfamiliarity with the stimuli he used. There is other experimental evidence that pitch is affected by spectral envelope. For example, Bachem (1937; 1940) found a similar asymmetry to that reported in this paper. For pitch decisions, listeners with absolute pitch were little disturbed by spectral envelope change, whereas those without absolute pitch were disturbed by the spectral envelope change. Similarly, Risset (1976; p. 531) has reported that untrained listeners do not immediately perceive a periodicity pitch, but rather a spectral pitch. Risset has contended that musically trained listeners have better tuned periodicity pitch processors, whereas listeners with little or no musical training do not have as well developed periodicity pitch processors, and rely primarily on spectral pitch. Musically untrained listeners are therefore more confused between tone height and brightness than those with musical training. The asymmetry in results for both trained and untrained listeners at short durations suggests that their tone height judgments are at least partially dependent on the spectral envelope being stable. Risset’s explanation encompassing the idea of periodicity pitch is particularly apt, as the musically trained listeners showed asymmetric performance for durations of 7.5 ms.
This is approximately the period of one cycle of the note C3, and with only one period it is impossible to derive an estimate of the periodicity of the note. At this duration, the musically trained listeners act like untrained listeners in that they rely on the spectral envelope to estimate tone height. When durations become longer,
musically trained listeners may have been able to derive a clearer estimate of periodicity pitch with which to make their pitch interval identifications. Houtsma and Goldstein (1972) have reported a related result using listeners with “quite extensive musical training.” They asked their listeners to identify musical intervals, whilst randomly varying the harmonic number of a few consecutive components. Given the results of the two experiments reported in this paper, it is predicted that untrained listeners would be unable to replicate their finding, as the change in spectral envelope would overwhelm their undeveloped pitch interval sense. Similarly, B.C.J.Moore (personal communication) has recently reported that a change in harmonic structure markedly reduces pitch discrimination performance for most listeners. Four of his five listeners showed worse pitch discrimination performance under the changing harmonic structure condition than when the harmonic structure was unchanged. His one musically trained listener did not find that the spectral envelope change disrupted her performance. In summary, brightness and tone height can be perceived independently given enough musical training and stimulus durations of over 30 ms. Musically trained listeners will confuse changes in spectral envelope with pitch interval at stimulus durations of one or two cycles, and musically untrained listeners will confuse changes in spectral envelope with pitch interval over most stimulus durations. Untrained listeners can perform well in pitch interval identification only under conditions when the spectral envelope remains constant. Acknowledgements Special thanks to Dr. Roy Patterson for discussion. Supported by M.T.Meyer Graduate Studentship, Girton College Cambridge.
References
Bachem, A. (1937) Various types of absolute pitch. Journal of the Acoustical Society of America, 9, 146–151.
Bachem, A. (1940) The genesis of absolute pitch. Journal of the Acoustical Society of America, 11, 434–439.
Hesse, H.-P. (1982) The judgment of musical intervals. In M.Clynes (Ed.), Music, Mind and Brain: The Neuropsychology of Music, pp. 217–225. New York: Plenum Press.
Houtsma, A.J.M. & Goldstein, J.L. (1972) The central origin of pitch of complex tones: Evidence from musical interval recognition. Journal of the Acoustical Society of America, 51, 520–529.
Noorden, L.van (1982) Two channel pitch perception. In M.Clynes (Ed.), Music, Mind and Brain: The Neuropsychology of Music, pp. 251–269. New York: Plenum Press.
Risset, J.-C. (1976) Musical acoustics. In E.C.Carterette and M.P.Friedman (Eds.), Handbook of Perception Vol. IV: Hearing, pp. 521–564. New York: Academic Press.
Constraints on music cognition—neural
A cognitive neuropsychological analysis of melody recall David W.Perry Cognitive Neuropsychology Laboratory, Good Samaritan Hospital & Medical Center, Portland, Oregon, USA Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 97–111 Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
Studies in which recognition performance is compared for melodies presented to each ear, although most frequently indicating a left ear asymmetry (LEA), have generated a complex set of results in which the ear asymmetry appears to be fundamentally determined by the mode of processing engaged. In a melody recall experiment, experienced pianists were presented with melodies excerpted from the fugal expositions of J.S.Bach’s Well Tempered Clavier Book II, and asked to reproduce them on a keyboard, with performance recorded by computer. For right-handed pianists, reliable left ear asymmetries in accuracy of pitch sequence recall and in response latency were observed. The results suggest that perception of novel pitch sequences by musically-experienced subjects, under conditions demanding attention to the entire melody, consistently enlists specialized processors in the non-language-dominant hemisphere. A possible neural substrate supporting working memory for pitch information, involving connections between auditory and prefrontal cortices, is proposed. KEY WORDS: Ear difference, cerebral hemisphere, melody, recall, recognition, working memory.
Introduction The search for the neural substrates of musical functions focused hopefully on the right hemisphere following reports from unilateral temporal lobectomy patients (Milner, 1962), and from dichotic melody recognition (Kimura, 1964), both of which suggested a predominant role for it in processing musical stimuli. Initial dichotic listening reports indicated a left ear advantage for processing melodies, both with a recognition and a recall task, and with musically-experienced and inexperienced subjects (Kimura, 1964; 1967). The contralateral ear-hemisphere connection appears to be consistently stronger in man. A larger surface electrical (Tanguay, Taub, Doubleday and Clarkson, 1977), depth electrode-recorded (Liegeois-Chauvel, Musolino, and Chauvel, 1991), magnetic (Rogers et al., 1990), and bloodflow (Lauter et al., 1985) response is observed from the temporal lobe of the contralateral hemisphere, and presumably a greater number of cortical neurons fire in response to the contralateral ear signal (Middlebrooks, Dykes, and Merzenich, 1980). Therefore, it can be inferred that significantly better performance with one ear, particularly if in a manner related to handedness or language dominance, may reflect some degree of specialization or preference of the opposite hemisphere for the type of melody perception engaged. This assumption implies an averaging over individual differences in the degree of contralateral predominance, which may also result from variations in the subcortical interaction of contralateral and ipsilateral signals (Sidtis, 1982; Efron, Crandall, Koss, Divenyi, and Yund, 1983).
Melody recognition

The seminal paper of Bever and Chiarello (1974) upset the simplicity of early interpretations. They found a right ear advantage (REA) with musicians and a LEA with non-musicians for monaural melody recognition. In a secondary task that involved recognition of a 2-note excerpt, only the musicians performed above chance level. The imposition of a secondary excerpt recognition task (and its successful performance by the musician group only) was taken as evidence for the inducement of an 'analytic' strategy in musicians. Subsequent research, most with multiple-choice or same-different recognition tasks, has generated a complex set of results. In particular, apparent shifts toward a right ear/left hemisphere advantage have been noted, most frequently for musically experienced subjects. It has been suggested (Bever and Chiarello, 1974), and demonstrated by comparing performance asymmetries to subjects' reports (Peretz and Morais, 1980), that task strategy may be the factor that leads to this ear shift, e.g. using an analytic strategy leads to a left hemisphere advantage, for both musicians and non-musicians. As Peretz and Morais (1988) have pointed out, under conditions in which melodies are constructed in such a way as to exclude comparison based on pitch changes at specific positions, LEAs are most frequently obtained. For example, Zatorre (1979) constructed melodies so that no interval was ever repeated in the same position, and obtained a relatively consistent LEA [71% of right-handers overall, with both musicians (N=24) and nonmusicians (N=24)]. However, identical or similar constraints on the construction of melodies-to-be-compared have not always produced statistically reliable LEAs (e.g. Piazza, 1980; Peretz
and Morais, 1983). It is possible that constraints on the construction of melodies-to-be-compared alone may not effectively eliminate strategies that preferentially engage left-hemisphere processors in recognition studies. In general, demonstrating consistent, reliable left ear advantages for melody perception has proved to be much more elusive than demonstrating right ear advantages for language perception (Bryden, 1988). Although musical processing is almost certainly less lateralized than language processing (Marin, 1982), part of the reason for the greater variability in ear asymmetry may be that all melody recognition tasks involve tone-sequence comparison, and may potentially engage analytic processing strategies and hence left-hemisphere processors, at least to some degree and for some subjects. The percentage of LEAs among right-handers for delayed melody recognition (Zatorre, 1979) or immediate visual melody identification (Mazzucchi, Parma, and Cattelani, 1981) peaks at about 70–75%, vs. an average of 80% REAs for speech stimuli (Bryden, 1988), and peaks of greater than 90% (e.g. Geffen and Traub, 1980).
Neuroimaging Various techniques for the imaging of brain activity have been used in the analysis of auditory stimuli related to melody. Roland, Skinhøj, and Lassen (1981) measured regional cerebral bloodflow during performance of the Seashore rhythm test, which involves a same-different response to two uniform-pitch, rhythmically-varied tone sequences. Results indicated increased activity in the posterior temporal-parietal regions, and in medial and lateral frontal areas, both right greater than left. Brain electrical activity mapping (BEAM), or topographic plotting of spectrally analysed electroencephalogram data, was utilized to study the same task (Duffy, McAnulty, and Schachter, 1985). Data from one subject were sampled just prior to the end of stimulus presentation, and immediately following, as the subject evaluated the stimulus in preparation for a response. For the alpha frequency range (8 to 12 Hz), the authors were able to distinguish between a greater increase in posterior activity (posterior temporal and occipital) during stimulus presentation, and a greater increase in frontal activation during stimulus evaluation. Although these or similar time-course sensitive techniques of analysis have not been applied to melody recognition paradigms, they highlight the importance of response mode for engaging different neural processors. A similar differentiation between patterns of brain activity associated with melody perception and evaluation seems plausible. Measurement of regional glucose metabolism was used to investigate brain activity during the Seashore tonal memory test (Mazziotta, Phelps, Carson, and Kuhl, 1982). The local cerebral metabolic rate of glucose (LCMRGlc) was calculated from the results of positron emission computed tomography following injection of 18F-fluorodeoxyglucose. Measurements were taken during a 30 minute period during which subjects responded to pairs of taped stimuli by depressing a microswitch to indicate identity or difference. Significant increases in LCMRGlc above hemispheric mean were observed in right posterior superior and middle temporal cortex. Subjects were differentiated by strategy, on the basis of posttest interviews, into two groups. Subjects who reported a structured system of visual imagery (e.g., “Visualizing a frequency histogram”, or “seeing the notes
on the musical scale”) were classified as analytic (N=3). Subjects who reported no strategy (N=3), or who reported that they had mentally “resung” the sequence of tones (N=2) were classified as nonanalytic. Subjects in the nonanalytic group had right greater than left metabolic asymmetries in superior prefrontal, inferior prefrontal, inferior premotor, and parietotemporal regions; subjects reporting an analytic strategy showed greater metabolism in the left posterior superior temporal cortex, and no right greater than left metabolic areas. This neurophysiological result thus provides converging evidence for the left-hemisphere biasing effect of analytic modes of tone sequence comparison observed in ear advantages for melody recognition.
Factors determining melody recognition ear asymmetry

A hierarchy of factors (see Figure 1) appears to determine EAs [for more complete reviews of lateralized pitch information results, see Peretz and Morais, 1988; Perry, 1991]. Asymmetries in the perception of particular psychoacoustic parameters of the stimulus that are relatively fixed for each individual (e.g. ear dominance for pitch, Efron and Yund, 1974) may interact with cognitively-mediated asymmetries. In order to produce cognitively-based asymmetries that appear to be related to hemisphere differences, certain stimulus information is necessary but not sufficient for engaging a particular set of lateralized cortical processors. Specific task demands that utilize this information must then be made, and subject choice of strategy (if more than one mode of processing is possible, given the constraints of stimulus and task conditions) finally determines the direction and consistency of EA.
Figure 1 Hierarchy of factors determining cognitively-mediated ear asymmetry.

Subject choice of strategy can be quite flexible, both within and between subject performances. Musical experience is a particularly crucial subject variable in determining the range of strategies that are available for a given subject. It seems clear that comparison of tone sequences with pitch changes at discrete positions, and excerpt-recognition tasks, tend to produce rightward shifts in EA with both musicians and non-musicians, presumably by encouraging analytic processing. It is not clear, however, that such apparently left-hemisphere preferential, analytic processing modes are ever fully excluded in melody recognition tasks. If comparison of tone sequences always allows the possibility of adopting an analytic processing mode that engages left-hemispheric processors, then the weakness and variability of the LEA for melody recognition tasks would be explained. The direction and consistency of EA would then be determined by individual variations in choice of processing strategy, and by subtle variations in task conditions and melody construction that influence this choice, thus shifting the balance of relative hemispheric engagement.
Melody recall

A melody recall task was presented to pianists with an average of 24 years of musical experience (Perry, 1990). The melodies were derived from the fugal expositions in J.S. Bach's Well Tempered Clavier Book II, and ranged in length from 7–13 notes. Two sets of 12 melodies were selected, matched in length and as closely as possible in difficulty. In order to prepare additional sets of melodies that were motorically and harmonically different, but exactly matched for difficulty, each original set was simultaneously transposed into the opposite mode and into a key opposite on the circle of fifths (with the same number of sharps or flats, and ratio of black and white keys). Melodies were presented monaurally in blocks of 12, alternating between ears in ABBA fashion. Each ear received one set in original mode, and one in opposite mode and key in the first half (blocks 1–4). In the second half (blocks 5–8), each ear received the blocks presented to the opposite ear in the first half. The melodies in the second half of the experiment thus consisted of exact repeats of melodies presented in the first half.

Subjects were instructed to attempt to reproduce the melody immediately following presentation. They were allowed to rehearse their recall attempt until reasonably satisfied that it matched what they remembered hearing, before playing one final recall attempt. They were then asked to rate the difficulty of each melody on a scale of 1–7. In order to avoid floor and ceiling effects that could prevent the emergence of ear asymmetries, subjects first performed a tempo titration task. Rate of presentation was systematically altered in order to produce a 50–75% pitch sequence recall accuracy rate. This was necessary due to the wide variation in melody recall ability. Ear-related performance asymmetries were then compared for the first presentation (blocks 1–4), for the second presentation (blocks 5–8), and collapsed across presentations.

The primary goal of this experiment was to measure the ear advantage for pitch sequence recall accuracy, and to demonstrate that the group LEA for vocal recall obtained with questionable data, and scored for accuracy in an all-or-none fashion (Kimura, 1967), could be reliably obtained on an individual basis for instrumental recall, with a sufficient number of trials, scored on a note-by-note basis. Pitch sequence accuracy was scored by a computer-programmed matching of recall attempts to stimulus melodies that allowed for insertions, omissions, and variable recall lengths. The choice of stimuli was based on a top-down approach, beginning with rich musical stimuli first, before attempting to gradually abstract simpler features (Dowling, 1989). If an overall LEA can be demonstrated for right-handed pianists with complex tonal melodies, then it cannot be claimed that, under experientially veridical conditions, musicians preferentially process melodies in the left hemisphere. Further experiments can then determine more precisely what characteristics of the task and melodies account for the differential engagement of lateralized processors. Two measures of pitch sequence recall accuracy were examined: unadjusted accuracy, and accuracy adjusted for guessing. A proportion (1/4) of the unmatched responses was subtracted in order to partially remove correct responses resulting from improvisational guessing within the correct key.
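The scoring scheme just described can be made concrete with a short sketch. This is an illustrative reconstruction only: the article does not specify the exact matching algorithm, so a longest-common-subsequence alignment is assumed here, and the function and variable names are hypothetical.

```python
def score_recall(stimulus, recall):
    """Align a recall attempt to the stimulus melody, allowing insertions,
    omissions, and variable recall lengths, by counting the longest common
    subsequence of pitches (one plausible choice; not the original program)."""
    m, n = len(stimulus), len(recall)
    # lcs[i][j]: matched notes between stimulus[:i] and recall[:j]
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if stimulus[i - 1] == recall[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    matched = lcs[m][n]
    unmatched = n - matched                      # recall notes with no match
    unadjusted = matched / m
    adjusted = (matched - 0.25 * unmatched) / m  # partial correction for guessing
    return unadjusted, adjusted

# Example: 8-note stimulus (MIDI note numbers), recall with one wrong note inserted
stim = [60, 62, 64, 65, 67, 69, 71, 72]
rec  = [60, 62, 64, 67, 66, 69, 71, 72]
print(score_recall(stim, rec))   # (0.875, 0.84375)
```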
In addition, a number of other ear-related performance asymmetries were investigated, for which precedents do not exist in the literature. Keyboard responses were recorded by computer, in units of 0.25 ms, permitting investigation of timing variables. Performance time intervals studied were: response latency (from onset of the last stimulus note to onset of the first recall note), keyboard rehearsal (from onset of the first recall note to onset of the first final recall note), and length of the final recall attempt (from onset of the first final recall note to onset of the last final recall note) (see Figure 2).

Figure 2 Division of time in one trial into response latency, length of keyboard rehearsal, and length of the final recall attempt. Vertical lines indicate onsets of the last stimulus note, the first recall attempt note, and the first and last notes of the final recall. (The example melody is derived from the subject of Fugue XXIII of the Well Tempered Clavier, Book II, J.S. Bach.)

Response latency in the present experiment is not the equivalent of reaction time, since subjects did not perform under speeded conditions. Timing information was also used to derive an estimate of the extent of temporal segregation, or number of chunks, in each final recall attempt, based on analysis of the pattern of inter-onset intervals. Chunk boundaries were defined as the first shortening of inter-onset interval greater than 1 standard deviation (acceleration) following a lengthening of inter-onset interval greater than 1 standard deviation (deceleration). Lastly, ear asymmetries in difficulty rating were examined, with the expectation that lower difficulty ratings would be associated with the more accurate ear.

For right-handed pianists (N=10), left ear presentation resulted in higher pitch sequence recall accuracy (unadjusted, p<.01; adjusted, p<.001) and longer response latency (p<.005) for the first presentation. Individual ear asymmetry indices, (R−L)/(R+L), for both measures were positively correlated with each other (r=.70). A REA in final recall length (i.e. faster final recall with left ear presentation) reached statistical significance, but only as a single comparison (p<.05). EAs for the remaining measures in the first presentation, and for all measures in the second presentation, were clearly not significant. All right-handed pianists showed the same direction of EA in both adjusted pitch sequence recall accuracy and response latency, with 9 out of 10 showing LEA. One
of three subjects with reported absolute pitch ability showed a REA on both measures, as well as the only simple reversal of the typical octave illusion anisotropy (Deutsch, 1970a). (For a more complete discussion of this experiment, see Perry, 1991.) The results indicate that perception of novel pitch sequences by musically experienced subjects, under conditions demanding attention to the entire melody, consistently enlists specialized processors in the non-language dominant hemisphere. The positive correlation between EAs for response latency and pitch sequence accuracy, together with subject reports, suggests that covert singing of the melody may be characteristic of the mode of processing that preferentially enlists right hemisphere processors.
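The chunk-boundary rule and the ear asymmetry index used above are simple enough to sketch. The following is an illustration only, not the original analysis code; the use of the standard deviation of inter-onset intervals within each recall attempt, and the function names, are assumptions of the sketch.

```python
from statistics import stdev

def chunk_boundaries(onsets):
    """Estimate chunk boundaries in one final recall attempt from its note-onset
    times: a boundary is scored at the first inter-onset shortening greater than
    1 SD (acceleration) that follows a lengthening greater than 1 SD (deceleration)."""
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    sd = stdev(iois)
    boundaries, decelerated = [], False
    for i in range(1, len(iois)):
        change = iois[i] - iois[i - 1]
        if change > sd:                      # lengthening > 1 SD
            decelerated = True
        elif change < -sd and decelerated:   # first shortening > 1 SD after it
            boundaries.append(i)             # index of the accelerating interval
            decelerated = False
    return boundaries

def ear_asymmetry_index(right, left):
    """(R - L) / (R + L): positive values indicate a right ear advantage."""
    return (right - left) / (right + left)

# Example: onsets in seconds, with a pause before the fifth note
onsets = [0.0, 0.5, 1.0, 1.5, 2.8, 3.1, 3.6, 4.1]
print(chunk_boundaries(onsets))         # -> [4]
print(ear_asymmetry_index(0.60, 0.75))  # -> about -0.11 (a left ear advantage)
```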
A preliminary model of component processes for melody recall and their neural substrates

Although stimulus lateralization allows focus on the perceptual/cognitive aspects of this multidimensional task, melody recall is certain to involve a complex set of coordinated cognitive operations and neural processors (Posner, Petersen, Fox, and Raichle, 1988). These neural processors may each exhibit varying degrees of localization, hemispheric specialization, and bilateral equipotentiality. It is not possible to specify with any degree of certainty at this time all of the cognitive and motoric processes involved in melody recall, nor their precise neural substrates. However, a sequence of cognitive and motoric operations and hypothetical neural substrates can be suggested.

The entire task can be understood as an example of acoustic-motor coordination (Luria, 1962). Luria regarded investigation of the perception and reproduction of pitch relations as a crucial part of the clinical analysis of lesions of the temporal and premotor cortices. He pointed out that such tasks often offer the only evidence of right temporal dysfunction. The tests he outlined for investigation of the reproduction of pitch relations involve listening to a series of tones, up to 5 for non-musically trained patients, and then reproducing them by singing. He points out that poor vocal reproduction can result from defects in acoustic analysis and synthesis resulting from temporal lesions, a condition he refers to in advanced forms as "sensory amusia"; or from deficits related to the motor process of vocalization. The latter can result from a variety of lesions, including those in anterior divisions of the sensorimotor zone, leading to pseudobulbar disorders (especially in the right hemisphere), or in the basal ganglia, resulting in intermittent, unmodulated vocalizations. Inferior premotor lesions can result in motor perseveration and an inability to switch from the production of one sound to another. Finally, extensive postcentral lesions that result in kinesthetic apraxia can produce a vocal apraxia in pitch reproduction as well, a phenomenon he calls "motor amusia" (Luria, 1962). All of these systems are involved in keyboard recall as well, with the substitution of manual for vocal motor control. However, the only component that can be convincingly related to auditory lateralization of the melodic stimulus is the first stage of acoustic analysis and synthesis. The following discussion describes an interconnected series of cognitive operations involved in melody recall, and provides suggestions regarding the neural processors subserving them.
Before beginning an analysis of the neural processors involved in melody recall, it will be helpful to briefly review the comparative anatomy of the auditory cortical fields in mammals, primates, and man, as revealed by electrophysiological, cytoarchitectonic, and connectional analyses. Initial sensory registration is tonotopically organized, from the cochlea, through the brainstem nuclei and the inferior colliculus, terminating in the medial geniculate body (MGB) of the thalamus, which projects ipsilaterally via the auditory radiation to the transverse gyri of Heschl in man. In all mammals, there appears to be a relatively sharply-tuned auditory cortical core (e.g. AI in the cat, AI or KA in man and rhesus monkey, Merzenich and Brugge, 1973) that receives the only input from the laminar portion of the MGB, and a surrounding belt of secondary cortices with complex or nontonotopic organization, that receives thalamic input exclusively from nonlaminar nuclei (Neff et al., 1975). Neff et al. (1975) regard this basic arrangement of core and belt areas as "the mammalian plan". It must be cautioned that correspondences between the monkey and human brain, while compelling on cytoarchitectonic and connectional grounds, must still be regarded as hypothetical, especially in terms of function. The following is a preliminary framework for viewing the set of cognitive operations required for melody recall (see Figure 3), and the network of neural processors hypothesized to subserve them.
Figure 3 Cognitive operations involved in melody recall performance.

Pitch perception of single tones

Pure tones

In spite of the precise tonotopic arrangement of at least some auditory cortex, bilateral ablations do not appear to significantly impair frequency discrimination in animals (Neff et al., 1975), although increased detection thresholds in the monkey have recently been demonstrated (Heffner and Heffner, 1986).

Complex tones

Auditory cortices in the vicinity of the gyrus of Heschl in the non-language dominant hemisphere appear to be crucial for computation of the fundamental frequency of
complex tones, when spectral pattern analysis is required, e.g. when energy at the fundamental frequency is not present (Zatorre, 1988). However, projection of spectral information to more posterior cortices that are spared by anterior temporal lobectomy cannot be ruled out. The patient described by Tramo, Bharucha, and Musiek (1990) with bilateral damage to Heschl's gyri exhibited a severe impairment in the ability to discriminate musical consonance from dissonance, a perhaps related deficit since it also requires spectral pattern analysis. Since Heschl's gyrus was severely infarcted on the right, this result is consistent with, though does not confirm, the above hypothesis, since the left transverse gyrus was also infarcted, as well as right anterior cortex. Involvement of cortex anterior to Heschl's gyrus may not be crucial, as Zatorre's results indicate for perception of the missing fundamental. Most crucially, however, results from focal posterior lesions in other cortices receiving projections from the primary auditory cortex, but with sparing of the primary auditory area itself, are needed. It does seem, however, that the sharply-tuned information received by Heschl's gyrus from the lateral division of the MGB is required by the right hemisphere for normal proficiency in the computation of the fundamental frequency of complex tones, and possibly in other aspects of auditory perception requiring spectral analysis such as consonance. It thus appears that cortical processing becomes necessary for complex spectral analysis of single tones or chords.
Maintenance in auditory-tonal working memory

Once the initial spectral analysis has been accomplished, and simple pitch discrimination has been achieved, some mechanism for the short-term retention of pitch information must be called upon. Neff et al. (1975, p. 345) conclude from results of animal studies that the auditory cortex becomes necessary whenever "the effects of neural activity set off by acoustic stimuli must be maintained in the brain in some form for short intervals (milliseconds to a few seconds)". For any complex cognitive task, some sort or sorts of auditory working memory (Baddeley, 1986; McCarthy and Warrington, 1987) is required.

Large bilateral ablations removing all of the superior temporal gyrus in dogs were required in order to disrupt memory for same-different discrimination of 2-tone sequences (same-same vs. same-different) (Neff et al., 1975). Ablation in monkeys of the first and second temporal gyri anterior to the primary auditory area with sparing of the auditory core was sufficient to prevent learning of the same-different positive response. They were still able to learn an analogous visual task. Monkeys with bilateral inferotemporal lesions showed the reverse pattern of deficits (Stepien, Cordeau and Rasmussen, 1960). More recently, Colombo et al. (1990) have confirmed these findings in monkeys with lesions of secondary auditory cortices that more clearly spared the primary receiving areas. Short term memory (20 seconds) for two tones widely spaced in frequency was mildly impaired following bilateral lesion.

Temporal lobectomy patients were tested using a paradigm modeled after Deutsch (1970b). Two frequencies were compared for identity or difference over an interval of 5 seconds filled either by silence, or by six random tones (Zatorre and Samson, in press).
Deutsch (1970b) found severe interference with recall performance, even though the tones were ignored. Performance was much higher with intervening digits, even when they were attended to for recall. All temporal lobe patients were able to perform the task well with no interference, but right anterior temporal patients, with or without encroachment on Heschl’s gyrus, were impaired with interference (Zatorre and Samson, in press). Zatorre (1989) concluded that lesions of the right anterior temporal lobe are “sufficient to disrupt short-term tonal memory”. Deutsch’s results suggest that tonal memory can only hold one absolutely-pitched event (or set of events in the case of a melody) in mind at once, as if frequency-specific auditory short-term memory is a tape that can be written on in an all-or-none fashion only, and then either held, recoded or erased. It is tempting to view this kind of memory as an actual maintenance of the firing of frequency-specific cells, and their networks, which appears to require processing in secondary association cortices. Analogous parallel distributed neural networks of the type proposed for visuospatial working memory (Goldman-Rakic, 1987; Goldman-Rakic, 1988) may be involved in auditory working memory as well. Parietal-prefrontal connections are hypothesized to be crucial for the short-term maintenance of visuospatial information, perhaps as part of a reverberating circuit in which feedback projections from prefrontal cortex serve to maintain excitation of parietal-prefrontal feedforward pathways. Prefrontal processors are then well-situated for influencing motor response (Goldman-Rakic, 1987). Chavis and Pandya (1976) suggested such an analogy, and proposed that connections from what they refer to as the “third association” auditory areas to the orbital prefrontal cortex might be particularly involved. Galaburda and Pandya (1983) point out that the pattern of long temporo-frontal connections for the auditory cortices parallels their proposed architectonic levels of development, since the temporopolar fields are connected primarily with orbitofrontal and medial prefrontal cortex, the midtemporal fields are primarily connected with the lateral prefrontal areas, and the caudal-most temporal fields primarily to the premotor region (Jones and Powell, 1970; Chavis and Pandya, 1976). Petrides and Pandya (1988) have further defined prefrontal projections from the superior temporal region of the rhesus monkey. They summarized their findings in terms of three major bundles: from rostral auditory cortices to orbital and medial frontal cortex via the uncinate bundle, from middle auditory cortices to lateral and dorsomedial frontal cortex via the rostral portion of the extreme capsule, and from posterior auditory cortices to lateral frontal cortices via the arcuate fasciculus. Although still entirely speculative at this point, it seems plausible, as suggested by Chavis and Pandya (1976) that a network similar to that proposed for visuospatial working memory could be involved in the short-term maintenance of auditory information as well. If the example of the visuospatial system is generalizable, then a much larger network of reciprocally connected areas may be involved, perhaps including common thalamic innervation from the medial geniculate and/or the pulvinar. Such functionally distinct but parallel networks have been proposed for subregions of the inferior parietal lobule and their specific prefrontal targets, e.g. 
area 7a with area 46, and area 7ip with area 8A, for spatial guidance of eye and hand movements respectively (Goldman-Rakic, 1988).
It does not seem possible to strongly differentiate which auditory cortices are crucial for immediate memory at this time. Samson and Zatorre’s (in press) results suggest that short-term memory can be disrupted by lesions solely to more rostral temporal cortices (or their projections). In any case, auditory working memory is likely to involve a complex network of distributed processors, that may serve to maintain patterns of activity originating in auditory association cortices. In man there appears to be some indication that this system favors the right hemisphere for tonal stimuli. This stands in clear contrast to the obvious left-hemisphere preference for auditory-verbal working memory, as seen in the repetition defects of conduction aphasia. Frequent asymmetries in the volumes of analogous cytoarchitectonically distinct auditory cortices are observed between the cerebral hemispheres in man. Area Tpt was observed to be as much as 7 times larger on the left than the right (Galaburda and Sanides, 1983). Areas PaAi and PaAe were larger on the right in all 3 brains examined. This asymmetry and the close proximity to primary koniocortex suggest a possibly crucial role for these regions in auditory working memory for multi-tonal frequency patterns. Their asymmetry also suggests a possible role in the complex spectral analysis for which the right hemisphere also seems to be favored. However, until further behavioral correlations are available, these speculations regarding structure-function correspondences must be regarded as tentative guidelines for investigation.
Rehearsal via auditory-vocal loop

Auditory short-term memory as discussed thus far need not involve the type of covert singing described by many subjects in the present experiment, and in other experiments (Peretz and Morais, 1980; Mazziotta et al., 1982) that likewise resulted in LEA or greater right hemisphere activation. In a manner similar to the articulatory loop proposed for verbal short-term memory (Baddeley, 1982), this component of the task would serve to "refresh" the auditory representation held in short-term memory through subvocal or covert vocalization.
Encoding of sequential pitch patterns

Once the sequence of pitches is registered in some sort of auditory working memory, encoding of the type required for multiple pitch patterns can begin. Various mechanisms of neural encoding have been proposed.

As sequential tones of differing frequency

Neff et al. (1975) observe that discrimination of simple tone sequences that vary only in pattern requires "interaction of neural events elicited by the component" tones. The single-neuron studies of Weinberger and McKenna (1988) for perception of 5-tone sequences in the cat showed effects of omission, in which the response to the remaining tones in a repeating pentad was altered when one was omitted. After several repetitions,
such effects were observed in most AI neurons, but in all AII neurons. This suggests that secondary auditory cortices are crucial for the encoding of pitch sequences.

As diatonic pitch chroma

The neural substrate is unknown, although development and learning are crucial, with adult ability to perceive absolute pitch in familiar melodies reached near the sixth year of life (Dowling, 1982). Thus involvement of later-maturing higher order association cortices is suggested.

As a pattern of finger movements on a familiar instrument

The importance of this potential mode of perceptual registration is not known, and it of necessity applies only to musicians, but it has been suggested at least for some absolute pitch possessors (Zatorre and Beckett, 1989; Perry, 1990). Encoding of sequential pitch patterns as sequential finger movements is likely to involve higher order acoustico-motor association cortices—temporal, parietal, and/or frontal.
Consolidation of internal auditory sequential representation

Gordon (1983) proposes that projections from the temporal pole to the limbic system are important for consolidation of the short-term trace, and perhaps the formation of long-term memory. In support of this hypothesis, primacy errors in his study of recall for binary tone sequences were associated with temporopolar lesions.
Translation of internal auditory representation into motor program

The hypothesis of a functional temporo-prefrontal network provides ready access for auditory internal representations to centers for motor planning. Frontal and parietal cortices are likely to be involved in planning the sequence of movements in space. Neurons in the inferior parietal lobe that fire to acoustic input may be especially involved. A cerebellar component may also be important even at the stage of mental rehearsal for a planned, skilled motor movement, and is certainly involved in its execution. Although important differences must exist between motor programs for finger movements on a musical instrument, and for vocal cord movements in singing or for labial movements in whistling, a similar translation, from internal representation to motor program, must occur. In vocal or whistled recall, however, the motor program is not mediated by an internalized spatial representation of diatonic pitch, and hence does not involve the integration of both a spatial and an auditory representation.
Expression and correction of the motor program

If needed, correction of the intended motor program occurs through analysis of acoustic feedback, and comparison to the internalized mental representation. At this point, the entire motor system for control of hand or vocal cord movements is required, as well as all of the previous levels discussed, beginning with simple pitch perception, for acoustic analysis of feedback from recall attempts. Incorrect attempts may interfere with the representation held in auditory working memory. Although complex and multidimensional, melody recall is a phenomenon well suited to hypotheses and investigations aimed at understanding the links between brain activity and perception, cognition, and behavior.
Conclusion

Melody perception for recall appears to rely preferentially on neural processors in the non-language dominant hemisphere, and may also rely on a network of component neural processors in which temporal-prefrontal interconnections are crucial.
Acknowledgements

I would like to thank John Chowning and others at the Stanford University Center for Computer Research in Music and Acoustics, where the keyboard melody recall experiment described was conducted, and Earl Schubert and Oscar Marin for their guidance during the formation of these ideas.
References

Baddeley, A.D. (1986) Working Memory. Oxford: Clarendon Press.
Bever, T. and Chiarello, R. (1974) Cerebral dominance in musicians and nonmusicians. Science, 185, 537–539.
Bryden, M.P. (1988) An overview of the dichotic listening procedure and its relation to cerebral organization. In K. Hugdahl (Ed.), Handbook of dichotic listening: Theory, Methods and Research. London: John Wiley & Sons Ltd.
Chavis, D. and Pandya, D.N. (1976) Further observations on corticofrontal pathways in the rhesus monkey. Brain Research, 117, 369–386.
Colombo, M., D'Amato, M., Rodman, H., & Gross, C. (1990) Auditory association cortex lesions impair auditory short-term memory in monkeys. Science, 247, 336–338.
Deutsch, D. (1970a) An auditory illusion. Nature (London), 251, 307–309.
Deutsch, D. (1970b) Tones and numbers: specificity of interference in short-term memory. Science, 168, 1604–1605.
Dowling, W.J. (1982) Melodic information processing and its development. In D. Deutsch (Ed.), The Psychology of Music. New York: Academic Press.
Dowling, W.J. (1989) Simplicity and complexity in music and cognition. In S. McAdams & I. Deliège (Eds.), Proceedings of the Symposium on Music and the Cognitive Sciences, 1988, Paris, France, Contemporary Music Review, 4, 247–253.
Duffy, F., McAnulty, G., and Schachter, S. (1984) Brain electrical activity mapping. In N. Geschwind & A.M. Galaburda (Eds.), Cerebral Dominance: The Biological Foundations. Cambridge: Harvard University Press.
Efron, R., Crandall, P., Koss, B., Divenyi, P.L., and Yund, E.W. (1983) III. The "cocktail party" effect and anterior temporal lobectomy. Brain and Language, 19, 254–263.
Efron, R. and Yund, W. (1974) Dichotic competition of simultaneous tone bursts of different frequency-I. Dissociation of pitch from lateralization and loudness. Neuropsychologia, 12, 249–256.
Galaburda, A.M. and Pandya, D.N. (1983) The intrinsic architectonic and connectional organization of the superior temporal region of the rhesus monkey. The Journal of Comparative Neurology, 221, 169–184.
Galaburda, A.M. and Sanides, F. (1980) Cytoarchitectonic organization of the human auditory cortex. The Journal of Comparative Neurology, 190, 597–610.
Geffen, G. and Traub, E. (1980) The effects of duration of stimulus, preferred hand and familial sinistrality in dichotic monitoring. Cortex, 16, 83–94.
Goldman-Rakic, P.S. (1987) Circuitry of the prefrontal cortex and regulation of behavior by representational memory. In: Handbook of Physiology, Vol. V, Part 1, 373–417.
Goldman-Rakic, P.S. (1988) Topography of cognition: parallel distributed networks in primate association cortex. Annual Review of Neuroscience, 11, 137–156.
Gordon, W.P. (1983) Memory disorders in aphasia-I. Auditory immediate recall. Neuropsychologia, 21, 325–339.
Heffner, H.E. and Heffner, R.S. (1986) Hearing loss in Japanese macaques following bilateral auditory cortex lesions. Journal of Neurophysiology, 55, 256–271.
Jones, E.C. and Powell, T.P.S. (1970) An anatomical study of converging sensory pathways within the cerebral cortex of the monkey. Brain, 93, 793–820.
Kimura, D. (1964) Left-right differences in the perception of melodies. Canadian Journal of Psychology, 15, 156–165.
Kimura, D. (1967) Functional asymmetry of the brain in dichotic listening. Cortex, 3, 163–178.
Lauter, J.L., Herscovitch, P., Formby, C. and Raichle, M.E. (1985) Tonotopic organization of the human auditory cortex revealed by positron emission tomography. Hearing Research, 20, 199–205.
Liegeois-Chauvel, C., Musolino, A., and Chauvel, P. (1991) Localization of the primary auditory area in man. Brain, 114, 139–153.
Luria, A.R. (1962) Investigation of acoustic-motor coordination. In: Higher Cortical Functions in Man (pp. 436–443). New York: Basic Books.
Marin, O.S.M. (1982) Neurological aspects of music perception and performance. In D. Deutsch (Ed.), The Psychology of Music. New York: Academic Press.
Mazziotta, J., Phelps, M., Carson, R., and Kuhl, D. (1982) Tomographic mapping of human cerebral metabolism: auditory stimulation. Neurology, 32, 921–937.
Mazzucchi, A., Parma, M., and Cattelani, R. (1981) Hemispheric dominance in the perception of tonal sequences in relation to sex, musical competence, and handedness. Cortex, 17, 291–302.
McCarthy, R.A. and Warrington, E.K. (1987) The double dissociation of short-term memory for lists and sentences: evidence from aphasia. Brain, 110, 1545–1563.
Merzenich, M.M. and Brugge, J.F. (1973) Representation of the cochlear partition on the superior temporal plane of the Macaque monkey. Brain Research, 50, 275–296.
Middlebrooks, J., Dykes, R., and Merzenich, M. (1980) Binaural response-specific bands in primary auditory cortex (AI) of the cat: topographical organization orthogonal to isofrequency contours. Brain Research, 181, 31–48.
Milner, B. (1962) Lateralization effects in audition. In V.B. Mountcastle (Ed.), Interhemispheric Relations and Cerebral Dominance. Baltimore: Johns Hopkins Press.
Neff, W., Diamond, I., and Casseday, J. (1975) Behavioural studies of auditory discrimination: central nervous system. In W.D. Keidel & D. Neff (Eds.), Handbook of Sensory Physiology, Auditory System, Vol. V/2, pp. 307–400. Berlin, Heidelberg, New York: Springer.
Peretz, I. and Morais, J. (1980) Modes of processing melodies and ear asymmetry in nonmusicians. Neuropsychologia, 18, 477–489.
Peretz, I. and Morais, J. (1983) Task determinants of ear differences in melody processing. Brain and Cognition, 2, 313–330.
Peretz, I. and Morais, J. (1988) Determinants of laterality for music: towards an information processing account. In K. Hugdahl (Ed.), Handbook of dichotic listening: Theory, Methods, and Research. London: John Wiley & Sons Ltd.
Perry, D.W. (1990) Monaural ear differences for melody recall. Journal of the Acoustical Society of America, 88, S90.
Perry, D.W. (1991) Ear and hemisphere differences for melody recall. Doctoral dissertation, California School of Professional Psychology, Berkeley.
Petrides, M. and Pandya, D.N. (1988) Association fiber pathways to the frontal cortex from the superior temporal region in the rhesus monkey. Journal of Comparative Neurology, 273, 52–66.
Piazza, D. (1980) The influence of sex and handedness in the hemispheric specialization of verbal and nonverbal tasks. Neuropsychologia, 18, 123–176.
Posner, M.I., Petersen, S., Fox, P.T. and Raichle, M.E. (1988) Localization of cognitive operations in the human brain. Science, 240, 1627–1631.
Rogers, R.L., Papanicolaou, A.C., Baumann, S.B., Eisenberg, H.M., and Saydjari, C. (1990) Spatially distributed excitation patterns of auditory processing during contralateral and ipsilateral stimulation. Journal of Cognitive Neuroscience, 2, 44–50.
Roland, P.E., Skinhøj, E., and Lassen, N.A. (1981) Focal activations of human cerebral cortex during auditory discrimination. Journal of Neurophysiology, 45, 1139–1151.
Sidtis, J.J. (1982) Predicting brain organization from dichotic listening performance: cortical and subcortical functional asymmetries contribute to perceptual asymmetries. Brain and Language, 17, 287–300.
Stepien, L.S., Cordeau, J.P., and Rasmussen, T. (1960) The effect of temporal lobe and hippocampal lesions on auditory and visual recent memory. Brain, 83, 470–489.
Tanguay, P., Taub, J., Doubleday, C., and Clarkson, D. (1977) An interhemispheric comparison of auditory evoked responses to consonant-vowel stimuli. Neuropsychologia, 15, 123–131.
Tramo, M.J., Bharucha, J.J., and Musiek, F.E. (1990) Music perception and cognition following bilateral lesions of auditory cortex. Journal of Cognitive Neuroscience, 2, 195–212.
Weinberger, N.M. and McKenna, T. (1988) Sensitivity of single neurons in auditory cortex to contour: toward a neurophysiology of music perception. Music Perception, 5, 355–389.
Zatorre, R. (1979) Recognition of dichotic melodies by musicians and nonmusicians. Neuropsychologia, 17, 607–617.
Zatorre, R. (1988) Pitch perception of complex tones and human temporal lobe function. Journal of the Acoustical Society of America, 84, 566–572.
Zatorre, R. and Beckett, C. (1989) Multiple encoding strategies in the retention of musical tones by possessors of absolute pitch. Memory and Cognition, 17, 582–589.
Zatorre, R. and Samson, S. (in press) Short-term auditory memory deficit after excision of right temporal neocortex. Brain.
Split-brain studies of music perception and cognition

Mark Jude Tramo

Program in Cognitive Neuroscience, Dartmouth Medical School, Hanover, New Hampshire

Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 113–121. Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
The biological basis of music perception and cognition is explored from the perspective of empirical work with brain-damaged patients. By examining patients whose left and right cerebral hemispheres have been surgically separated, the role of each half-brain in music processing can be assessed. This approach was used to analyse the lateralization of brain mechanisms mediating timbre discrimination and recognition. We found that each hemisphere of our split-brain patients was able to recognize musical instrument timbres, and that the left hemisphere was superior to the right; on the other hand, only their right hemispheres were able to discriminate subtle differences in timbre between pairs of steady-state harmonic spectra which had no cognitive referents outside the auditory domain. These observations suggest that acoustic-discriminative and semantic-associative processes mediating timbre perception are differentially distributed within the two half-brains. In addition, they emphasize the need to use real-world musical sounds rather than, or in addition to, laboratory-contrived acoustical stimuli when investigating brain mechanisms in music. KEY WORDS: Music, brain, neuropsychology, auditory perception, timbre, hemisphere.
Introduction

Like all intellectual and aesthetic life, music cognition emanates from neural activity occurring within the two great hemispheres of the human brain (Figure 1). Why this anatomical duality exists in the face of unified conscious experience remains an
unanswered question that is of fundamental interest in cognitive neuroscience. By combining the experimental techniques of cognitive psychology with the formal neurological assessment of brain structure and function, investigators working in cognitive neuroscience hope to unravel the paradox of how mind emerges from brain, or at least keep this ancient philosophical concern in sight deep in the modern pursuit of acquiring empirical data about mental operations and their corresponding biological machinery. While the pre-eminent role of the left hemisphere for language functions has been firmly established in the neurological literature, it remains unclear to what degree the wide range of other cognitive functions are divided between—or shared by—the paired half-brains. The study of music perception and cognition in patients who have undergone surgical disconnection of the cerebral hemispheres1 provides an opportunity to examine how certain aspects of music processing are carried out in each hemisphere.

Figure 1 Magnetic resonance scans depicting lateral (a) and medial (b) views of the left cerebral hemisphere. These high-resolution images of the living brain, particularly in this plane (sagittal), became available to clinicians and brain scientists only in the last decade. In (a), the dotted line demarcates the temporal lobe (T); the top portion (superior surface) of the temporal lobe houses the auditory area in man. In (b), the arrows indicate the extent of the corpus callosum, the semilunar bundle of nerve fibers which connects the left and right hemispheres. In split-brain patients, the callosum is cut.

Of broader interest in cognitive neuroscience, split-brain studies of music perception and cognition permit us to examine whether a special set of psychological functions lying outside the verbal domain is differentially lateralized in the brain. This allows us empirically to address the theoretical possibility that the dichotomous organization of brain mechanisms mediating psychological functions reflects a fundamental dichotomy in human brain-behavior relationships which transcends (and perhaps subsumes) the lateralization of language functions. Music comprises a particularly well-suited nonverbal stimulus set by which to probe this possibility, for like speech it is ubiquitous in our culture, is a highly structured form of communication, is primarily learned and communicated through the special sense of hearing, and is conveyed to the hemispheres via the peripheral auditory system and the heavily-crossed ascending auditory pathways in the brainstem. Furthermore, defining the degree to which functional subcomponents underlying music processing are neurologically dissociable might provide biological validity for hypotheses which postulate a modular organization of serial and/or parallel processes on purely theoretical grounds or on the basis of experimental data collected from normal subjects. Similarly, defining neurally specialized subsystems mediating musical functions may provide empirical evidence for specialized nodes and layering schemes postulated in computational models; while the absence of neuroscientific verification can by no means refute such constructs, the presence thereof would provide strong support, and until such evidence can be accrued the appellation alternatively bestowed upon computational models, "neural networks", is rendered somewhat specious.

Right hemisphere specialization in timbre perception?

In order to consider the relative roles of the cerebral hemispheres in music, I shall focus on the problem of timbre perception, which is widely held in the neuropsychological literature to be a right hemisphere function. This belief is largely rooted in the work of Brenda Milner (1962), who administered Carl Seashore's six-part "Measures of Musical Talents" (Saetvit, Lewis & Seashore's 1942 version) to patients who underwent partial excision of their left or right temporal lobe2—that is, the front (anterior) portion of the part (lobe) of the hemisphere which houses the auditory areas (as well as portions of the
memory and visual areas; Figure 1). Milner found that performance on the subtests called the “Timbre Test” and the “Tonal Memory Test” dropped significantly in patients who had the right anterior temporal lobe removed, whereas patients who underwent left anterior temporal lobectomy showed no difference between pre-operative and postoperative scores. Noting the normative data obtained by Seashore and colleagues from thousands of normal young subjects, Milner expressed some reservations about the Timbre Test results and wrote, “the sensitivity of the Timbre Test to right temporal lobectomy may be related to task difficulty.” While Milner herself did not overtly interpret her data as evidence of right hemisphere specialization for timbre perception as it pertains to music per se (her paper addresses the more general issues of laterality effects underlying the perception of nonverbal auditory stimuli), the use of a test labelled as “Measures of Musical Talents” broached this possibility, and over the intervening twenty-eight years a number of authors have cited Milner’s work as evidence of right hemisphere dominance for timbre perception in music (e.g. Damasio & Damasio 1977; Gates & Bradshaw 1977; Gordon 1983; Sidtis 1985). Because the term “timbre” is ambiguously defined in psychoacoustics as that attribute of sound which distinguishes two different sounds of the same pitch and loudness (American Standards Association 1960), yet is usually defined in music to denote the distinctive tonal quality produced by a particular musical instrument (Apel 1972), let us consider the nature of the stimuli used in Seashore’s Timbre Test and if, in fact, this task measures musical talent. The Seashore Timbre Test contains five blocks of ten trials, each of which requires subjects to determine if two successive complex tones sound the same or differ in “tonal quality”. The stimuli are essentially steady-state spectra composed of a 500 cycle per second (cps) fundamental and its first five harmonics. In half the trials, the standard and comparison differ by the reciprocal intensities of the third and fourth harmonics, with overall intensity remaining constant. The blocks became increasingly difficult over the course of the test in that the reciprocal intensities differ by 10dB in the first block and 4dB in the fifth block. While other physical features of the complex tones are not specified in the test booklet, spectrographic analysis of a digital sample of one of the test stimuli shows a stimulus duration of 1148msec, with an attack time of 160msec and a release time of 148msec (Tramo & Gazzaniga 1991). Most musicians who hear these complex tones agree that there is very little about their tonal quality which can be aptly described as “musical”. Indeed, R.A.Henson and Maria Wyke (1982), working at London Hospital, administered the Timbre Test [and the rest of the 1960 version of the Seashore battery (Seashore, Lewis & Saetvit 1960)] to twenty-one members of “an internationally celebrated symphony orchestra” to determine if the skills of this gifted collection of musicians would be manifest on these purported measures of musical talent. The findings for the Timbre Test were striking—the orchestra members performed significantly worse than the mean of the 4,319 secondary school students used to establish norms! Overall, the musicians scored in the 31st percentile on the Timbre Test. Henson and Wyke concluded that the notion that the Seashore tests measure musical talent is untenable. 
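To make the physical simplicity of these stimuli concrete, the following sketch synthesizes a Seashore-like tone pair. It is an illustration under stated assumptions, not a reconstruction of the actual test recordings: sinusoidal harmonics, a flat baseline spectrum, and an arbitrary sample rate are assumed; the measured attack and release ramps are ignored; and the mapping of the reported dB differences onto the two harmonics is my own reading of the test description.

```python
import math

def timbre_test_tone(third_vs_fourth_db, f0=500.0, dur=1.148, sr=22050):
    """Steady-state complex tone: a 500-cps fundamental plus its first five
    harmonics.  The 3rd harmonic is raised and the 4th lowered by
    third_vs_fourth_db/2 dB each (negative values reverse the pattern),
    a rough stand-in for the 'reciprocal intensities' manipulation."""
    amps = [1.0] * 6
    amps[2] *= 10 ** (+third_vs_fourth_db / 40.0)   # 3rd harmonic
    amps[3] *= 10 ** (-third_vs_fourth_db / 40.0)   # 4th harmonic
    norm = sum(amps)
    return [sum(a * math.sin(2 * math.pi * f0 * (k + 1) * n / sr)
                for k, a in enumerate(amps)) / norm
            for n in range(int(dur * sr))]

# Easiest block: standard and comparison whose 3rd/4th harmonic balance
# differs by 10 dB; the hardest block would use 4 dB instead.
standard   = timbre_test_tone(+5.0)
comparison = timbre_test_tone(-5.0)
print(len(standard))   # 25313 samples at 22050 Hz
```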
Reservations about the use of steady-state spectra to analyze timbre perception have been expressed by a number of authors (e.g. Grey 1977; Risset and Wessel 1982; Balzano 1986). At the 1969 International Symposium on Frequency Analysis and Periodicity Detection in Hearing, Schouten used the example of violin timbre and its constancy across different pitches, loudnesses, and durations to voice his objection to the
standard psychoacoustic definition of timbre and to Reinier Plomp's elegant mathematical treatment of timbre as a multidimensional attribute of complex sounds, which assumed steady-state frequency components in the analysis (Plomp 1970). More recently, Balzano (1986) has expressed similar concerns about Fourier analysis as a model for timbre perception in music, since musical instrument sounds rarely possess distinctive steady-state characteristics and since simple variations of Fourier spectra do not produce the wide variety of naturally-occurring timbres encountered in the musical environment.

Before I was aware of Henson and Wyke's observations, I had heard the Seashore Timbre Test stimuli and was taken aback by the repeated citations of Milner's study in the neuropsychology literature as evidence for right hemisphere specialization in music. I was motivated to undertake a series of experiments with split-brain patients in collaboration with Michael Gazzaniga (Tramo & Gazzaniga, 1989, 1991) who, working with Roger Sperry in the early 1960s, developed the methodology for studying split-brain animals and, ultimately, humans when Joseph Bogen began to perform commissurotomies to treat epilepsy (Gazzaniga, Bogen & Sperry 1962). In order to examine the operations of one half-brain independent of its partner, Gazzaniga and colleagues took advantage of the organization of the visual system, which in the face of commissurotomy allowed them to deliver stimuli to only one half-brain using a tachistoscope (Figure 2).

Figure 2 Split-brain paradigm. The patient fixates a central point on a computer screen while the musical instrument sounds are played in free field. Since the visual pathways are normally crossed, when a picture is flashed in one visual field (here the picture of the violin is flashed in the left visual field), it is conveyed to the contralateral hemisphere (here the right hemisphere). In normal subjects, because of visual transfer across the corpus callosum, both hemispheres see the picture; in split-brain patients, no such transfer is possible, so only one hemisphere sees the picture. A tachistoscope is used to flash the picture for 150msec, faster than the eye can move, otherwise the picture might reach both hemispheres because of eye movements. A fast Fourier transform of the first 400msec of the violin sound is shown at the top of the figure [y axis=time in msec, x=kilohertz (kHz, or 1000cps), and height=relative amplitude (quantized sample value units)]. An example of one of the five melodic structures used to present the instrument sounds (here chromatic-variable) is also shown. In the next trial, a trumpet will play and a picture of a piano is flashed to the left hemisphere, and so on. (See text for further details of the experimental procedure.)

Musical instrument timbre recognition by the two half-brains

In order to consider the relative roles of the hemispheres in timbre perception, I presented two split-brain patients (neither of whom was an accomplished musician) with five different musical instrument sounds I had used in a New York City recording studio (a Yamaha DX-7 piano sound, a Roland TR-505 snare drum sound, and Ensoniq Mirage violin, trumpet, and bell samples). The instrument sounds were presented in free field in the
context of five eight-note melodies (ascending diatonic scale, ascending chromatic scale, variable diatonic set, variable chromatic set, and monotonic) played at one note per second in the registers of C4 (piano, trumpet, bell) and C5 (violin) (A4=440cps); of course, the snare drum was presented in a monotonic sequence only. Figure 2 illustrates the experimental procedure. During the playing of the sixth or seventh note, the subject was reminded to fixate a central dot on a Macintosh Plus computer screen, and a picture of a musical instrument was flashed for 150msec in one or the other visual field. Subjects were instructed to press the space bar on the computer keyboard if what they saw was what they heard. Thus, in this match-to-sample task, the match was made available to only one hemisphere, and it matched the sample in 50% of trials. The visual field the picture was flashed in, the hand used to respond, the instrument sound, the melodic context, and whether or not the picture matched the sound were counterbalanced across two hundred trials. The results are illustrated in Figure 3. In both patients, the left hemisphere and the right hemisphere performed significantly better than chance, revealing that each half-brain was able to discriminate and recognize musical instrument timbre. In addition, the results indicated that in both patients the left hemisphere was superior to the right hemisphere. We replicated the latter finding in a second experiment, in which we measured reaction time for correct responses as well as response accuracy in one of the patients. Not only was the left hemisphere again more accurate than the right hemisphere, but it was significantly faster than the right hemisphere on correct responses, even when the left hand, which is controlled predominantly by the right hemisphere, was used to make the response. We had tried to avoid verbal stimuli in the experimental design by using musical instrument sounds and line drawings, but because we wanted to give each subject timbres played in musical contexts, our auditory stimuli were lengthy, and likely allotted enough time for the left hemisphere to name the sound, and then match the picture to the name. If this were the case, the covert verbal mediation could have led to the observation of left hemisphere superiority, not on the basis of timbre perception but on the basis of a superior capacity for word-picture matching (by the left hemisphere) over sound-picture matching (by the right hemisphere). That is, the left hemisphere, having heard the sound, named it, and then matched the picture to the name, while the language-poor right hemisphere was unable to name the sound, and therefore had to match the sound itself with the picture.
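To make the factorial structure of this match-to-sample task concrete, the following is a minimal sketch of how a counterbalanced trial list of the kind just described might be generated. The original study's software is not described in the text, so the factor names, the full crossing, and the handling of the snare drum are illustrative assumptions rather than the authors' actual procedure.

```python
import itertools
import random

# Factors named in the text; the full crossing is an illustrative assumption.
VISUAL_FIELDS = ["left", "right"]
RESPONSE_HANDS = ["left", "right"]
INSTRUMENTS = ["piano", "snare drum", "violin", "trumpet", "bell"]
CONTEXTS = ["diatonic ascending", "chromatic ascending",
            "diatonic variable", "chromatic variable", "monotonic"]
MATCH = [True, False]   # the picture matches the sound on half of the trials

def make_trials(seed=0):
    """Build a shuffled trial list crossing the counterbalanced factors."""
    rng = random.Random(seed)
    trials = []
    for vf, hand, sound, ctx, is_match in itertools.product(
            VISUAL_FIELDS, RESPONSE_HANDS, INSTRUMENTS, CONTEXTS, MATCH):
        if sound == "snare drum" and ctx != "monotonic":
            continue  # the snare drum was presented in a monotonic sequence only
        picture = sound if is_match else rng.choice(
            [i for i in INSTRUMENTS if i != sound])
        trials.append({"visual_field": vf, "response_hand": hand,
                       "sound": sound, "context": ctx,
                       "picture": picture, "match": is_match})
    rng.shuffle(trials)
    return trials

print(len(make_trials()))  # number of trials in this hypothetical crossing
```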
Figure 3 Bar graph illustrating the observed “double dissociation”: the left hemisphere was significantly more accurate than the right (p<.01) on the musical instrument timbre recognition task, while the right hemisphere performed significantly better than the left hemisphere on the Seashore Timbre Test (p<.05). For convenience, per cent accuracy is averaged across the two subjects, both of whom showed the same pattern of significant left-right differences (Tramo & Gazzaniga 1989, 1990). In order to investigate this possibility, we carried out a third series of experiments, in which the subject was presented with written names instead of pictures as the visual target “match” for the musical “sample”. If in the first two experiments the left hemispheres had named the sound and then had matched the picture to the name, we would expect the left hemisphere to be faster for word targets than for picture targets. The results showed that, in fact, the left hemisphere was faster when pictures were flashed than when words were flashed. Although this difference did not reach statistical significance, the left hemisphere certainly was not faster for words than for pictures. These results argued strongly against covert verbal mediation as the basis of left hemisphere superiority in timbre perception. As expected, the right hemisphere was significantly faster for pictures than for words.3
We interpreted the data as evidence that each half-brain was able to recognize musical instrument timbre, and that the left hemisphere was superior to the right hemisphere in performing our particular auditory-visual (cross-modal) match-to-sample task. We thought the difference between our results and Milner’s results using the Seashore Timbre Test was primarily related to stimulus parameters: whereas the Seashore Timbre Test used steady-state spectra which were unfamiliar and had no cognitive referents outside the auditory domain, we had used well-known complex sounds whose spectra contained rapid transients and which had visual and lexical as well as auditory referents. However, there were numerous differences between our experimental design and Milner’s other than stimulus parameters, not the least of which was our small and select population of split-brain patients. Therefore, we decided to carry out a fourth experiment in which we administered the Seashore Timbre Test itself to each disconnected half-brain. Instead of having a response sheet in front of our subjects with columns marked “S” and “D”, as in the standard administration of the test, we tachistoscopically flashed the response choices “SAME” and “DIFF” alternately in the upper and lower quadrants of one or the other visual field and asked our subjects to point to the correct response with the hand on the same side of the visual stimulus. Our subjects were familiar with this forced-choice procedure from numerous other experiments conducted in our laboratory. The results are illustrated in Figure 3. In both patients, only the right hemisphere performed significantly above chance, and right hemisphere performance was superior to left hemisphere performance. Thus, an anomalous lateralization of brain mechanisms mediating timbre perception in our split-brain patients could not account for the data from the musical instrument timbre recognition task: as in Milner’s patients, performance on the Seashore Timbre Test appeared to be a right hemisphere function.
Conclusion

In summary, both the left hemisphere and the right hemisphere of our split-brain patients were able to recognize musical instrument timbres, and the left hemisphere was more accurate and faster than the right hemisphere. Covert verbal mediation did not appear to be the basis of this left hemisphere advantage. Conversely, only the right hemisphere of each patient performed significantly above chance on the Seashore Timbre Test, and right hemisphere performance was superior to that of the left. These results argue against a strict verbal/nonverbal dichotomy as the basis of hemispheric specialization in auditory perception. An alternative hypothesis is that the two half-brains perform complementary functions in auditory pattern discrimination and recognition: while the right hemisphere is specialized to perform fine-grained acoustic-discriminative functions, the left hemisphere excels in semantic-associative functions and multimodal integration, by which the meaning of nonverbal sounds is derived. It would appear that the nature of auditory stimuli used to examine brain mechanisms in music is likely to influence the interpretation of brain-behavior relationships governing music perception and cognition.
Notes

1. Split-brain surgery, formally termed “commissurotomy” or “corpus callosotomy” (the latter if the anterior commissure is not sectioned), is performed in some medical centres in an attempt to treat epilepsy that is out of control despite anticonvulsant medications. The interhemispheric fiber tract [corpus callosum (Figure 1)] which connects the two hemispheres is cut, while the hemispheres themselves remain unscathed.
2. Excision of the temporal lobe, termed “temporal lobectomy”, usually spares the back (posterior) portion of the temporal lobe and parts of the auditory area contained therein. It is widely used in the surgical treatment of epilepsy caused by a focus of aberrant electrical activity within the anterior temporal lobe that is not satisfactorily controlled by anticonvulsant medications.
3. Both patients have demonstrated limited language capacities in the right hemisphere, including the ability to read simple nouns. A discussion of right hemisphere reading in these split-brain patients and in other populations is beyond the scope of this paper; the reader is referred to Baynes (1990) for a comprehensive review of the literature.
References

American Standards Association (1960) Acoustical Terminology, S1.1–1960. New York: American Standards Association.
Apel, W. (1972) Harvard Dictionary of Music. Cambridge, MA: Belknap Press.
Balzano, G.J. (1986) What are musical pitch and timbre? Music Perception, 3, 297–314.
Baynes, K. (1990) Language and reading in the right hemisphere. Journal of Cognitive Neuroscience, 2, 159–179.
Damasio, A.R. & Damasio, H. (1977) Musical faculty and cerebral dominance. In M. Critchley and R.A. Henson (Eds.), Music and the Brain, pp. 141–155. London: Heinemann Medical Books, Ltd.
Gates, A. & Bradshaw, J.L. (1977) The role of the cerebral hemispheres in music. Brain and Language, 4, 403–431.
Gazzaniga, M.S., Bogen, J.E. & Sperry, R.W. (1962) Some functional effects of sectioning the cerebral commissures in man. Proceedings of the National Academy of Sciences U.S.A., 48, 1765–1769.
Gordon, H.W. (1983) Music and the right hemisphere. In A.W. Young (Ed.), Functions of the Right Hemisphere, pp. 65–86. London: Academic Press.
Grey, J.M. (1977) Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America, 61, 1270–1277.
Henson, R.A. & Wyke, M.A. (1982) The performance of professional musicians on the Seashore measures of musical talents. Cortex, 18, 153–157.
Milner, B. (1962) Laterality effects in audition. In V.B. Mountcastle (Ed.), Interhemispheric Relations and Cerebral Dominance, pp. 177–195. Baltimore: Johns Hopkins University Press.
Plomp, R. (1970) Timbre as a multidimensional attribute of complex sounds. In R. Plomp and G.F. Smoorenburg (Eds.), Frequency Analysis and Periodicity Detection in Hearing, pp. 397–414. Leiden: A.W. Sijthoff.
Risset, J.C. & Wessel, D.L. (1982) Exploration of timbre by analysis and synthesis. In D. Deutsch (Ed.), The Psychology of Music, pp. 25–58. Orlando: Academic Press.
Seashore, C.E., Lewis, D. & Saetveit, J. (1960) Seashore Measures of Musical Talents (Revised). New York: Psychological Corporation.
Sidtis, J.J. (1984) Music, pitch perception, and the mechanisms of cortical hearing. In M.S. Gazzaniga (Ed.), Handbook of Cognitive Neuroscience, pp. 91–114. New York: Plenum Press.
Tramo, M.J. & Gazzaniga, M.S. (1989) Discrimination and recognition of complex tonal spectra by the cerebral hemispheres: Differential lateralization of acoustic-discriminative and semantic-associative functions in auditory pattern perception. Society for Neuroscience Abstracts, 15, 1060.
Tramo, M.J. & Gazzaniga, M.S. (1991) Cerebral specialization in auditory pattern perception: Timbre (in preparation).
Musical structure in cognition
The influence of implicit harmony, rhythm and musical training on the abstraction of “tension-relaxation schemas” in tonal musical phrases

Emmanuel Bigand
Laboratoire de Psychologie de la Culture, Université Paris X Nanterre, Paris, France

Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 123–137
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
Tension-relaxation schemas are an important meaningful structure in music. The two present experiments investigate the main factors involved in their abstraction. The experimental method, similar to the procedure used by Palmer and Krumhansl (1987), consists in segmenting into several fragments 6 melodies varying in their implicit harmony or their rhythmic structure. Asking the subjects to evaluate the degree of completeness of each fragment may be thought of as an indirect way of measuring the degree of musical stability. The collected responses define an average profile which may be considered as an approximation of the musical tension/relaxation network abstracted by the listener. Results indicate that a level of coding exists where the musical phrase could be represented by its network of musical tensions and relaxations, which is in accordance with Lerdahl & Jackendoff’s prolongational hypothesis. Abstraction of this network is influenced by the implicit harmony, the rhythmic structure and, for musician subjects, by the interaction of these two factors. Results of the second
experiment seem to suggest that the psychological processes involved in such an abstraction are not strongly influenced by musical training. In conclusion, some suggestions about a systematic formalisation of the rules involved in the determination of tension-relaxation schemas are put forward.

KEY WORDS: Prolongational reduction, pitch hierarchy × rhythm interaction, musical training, tension-relaxation network, psychological representation of musical structures, musical phrase.
Music is one of the most complex acoustical structures of our environment. The study of the way a listener perceives, organises and memorises musical pieces fundamentally improves our knowledge about the complex perceptual and cognitive processes human beings are able to perform. But considering music only as a complex acoustical structure would be restrictive; from a psychological point of view, music is primarily an informative structure which enables us to exchange different emotions, and so to communicate in a non-verbal fashion. Understanding how a listener uses all his perceptual and cognitive competence to extract musical informative structure is the main goal of the research. From this perspective, there are two questions to be distinguished: the first concerns the nature of meaningful musical structures, the second the psychological processes involved in their abstraction. Research on musical expressivity and on musical semantics, carried out by Francès (1958) and Imberty (1979, 1981), showed the essential part played by musical tension and relaxation schemas; these schemas are extracted from the musical piece and then assimilated to kinetic and emotional schemas of tension and relaxation, which accumulate all of the affective experience of the listener. Therefore, it seems reasonable to consider that the most important part of musical expressivity might be determined firstly by the specific way each composer organises the musical tension and relaxation in time, and secondly by the kinds of musical tension and relaxation the listener manages to abstract. I shall now expand upon this second point. What are the psychological processes involved in such an abstraction; how does a listener interpret all the musical parameters to determine the musical tensions and relaxations of the piece? Lerdahl and Jackendoff’s theory is a very important contribution to this question. The authors claim that intuitions about tension and relaxation are determined by the combination of the grouping structure, the metrical structure, and the tonal hierarchies. This combination leads to the abstraction of an event hierarchy from which a hierarchy of tensions and relaxations (“Prolongational reduction”) may be derived.
Figure 1 Schematic representation of the theory by Lerdahl 1989.

Many of these components are psychologically plausible. First, we know that grouping is a major characteristic of perception (Fraisse 1974) and the experiments carried out by Deliège (1987) confirm the psychological validity of different grouping rules involved in the model. Second, several experiments have shown the listener’s ability to abstract metrical structure (Povel 1981, Essens & Povel 1985, Sloboda & Parker 1985), and many others have pointed out a very sophisticated implicit knowledge of tonal hierarchies by the listener (Krumhansl 1979, Krumhansl & Kessler 1982, Bharucha & Krumhansl 1983). Other data indicate that the listener manages to organise musical events in a hierarchical way (Deutsch 1980, Stoffer 1985, Serafine 1989). At the very least, the possibility of extracting a link between different variations and a theme suggests that a level of coding exists where the musical phrase is represented by its underlying network of tension and relaxation (Bigand 1990a, 1990b, 1990c). The main problem, however, is to understand how these different components really interact. Lerdahl and Jackendoff’s theory suggests that the metrical and grouping structures have a double function. First they divide the piece into groups, and then they add rhythmic values to the tonal hierarchy to determine the relative stability of each event.
Figure 2 Influence of the rhythmic structure on the tension-relaxation schemas by Lerdahl & Jackendoff 1983.

Let us look at figure 2. Because of the tonal hierarchy the D (quaver) creates a strong musical tension that is resolved on E. At a more abstract level, the D (minim) institutes a relative tension which will be resolved on C. Finally, at the most abstract level, the E (minim) produces a fundamental tension which will be resolved on C. Because of this
implicit knowledge of the tonal hierarchy, we may reasonably suppose that a listener would abstract these three levels of tension and relaxation. But what will happen if the rhythmic structure is changed? As Lerdahl and Jackendoff remarked, the musical tensions will be differently organised.
Experiment 1

Thus, it is probable that rhythmic values are added to the tonal weight to determine the musical tension. The few experiments on this issue suggest that the structural importance of the notes comprising a melody is not only determined by the tonal hierarchy but also by their rhythmic position (Palmer and Krumhansl 1987a, 1987b). More recently Serafine (1989) has observed that this structural importance is also strongly influenced by the metrical position: events on the strong beat tend to be perceived as more important. Though these experiments are of great interest, the way these different factors really interact in music perception is still little known and many problems remain. The first concerns the psychological importance of the tonal hierarchy in the perception of a musical phrase. Can two melodies having the same rhythmic structure, the same melodic contour, the same tempo, the same dynamic, but differing in their implicit harmonies, really generate different networks of musical tensions and relaxations? The second concerns the role of the rhythmic structure in the determination of musical stability: can two melodies differing only in their rhythmical structure generate different tension/relaxation schemas? It may be interesting to distinguish which part of this effect relates to the metrical structure and which to the different durations. The third problem concerns the possible interaction of the pitch hierarchy and the rhythmic structure. Palmer and Krumhansl’s results (1987a, 1987b) tend to confirm an independent relationship suggesting that each structure is treated by two separate cognitive processes. As Peretz and Morais (1989) emphasised, this question is important for the cognitive sciences, since it improves our knowledge of the possible modularity of the musical mind. The last question concerns the role of musical training. Are processes involved in the abstraction of a network of musical tension determined by musical training, or do they reveal a competence to structure musical pieces, which does not require any particular learning, as in the case of the understanding of language? The purpose of these two experiments is to address these questions.
Experimental method

To measure the tension/relaxation schemas generated by a musical phrase it is necessary to register the degree of musical stability of each note. The procedure used by Palmer and Krumhansl (1987a, 1987b) appears very efficient: it consists in segmenting the melody into several fragments, each stopping on a different note and thus defining a different ending unit. In H1R1 (fig. 3), the first fragment stops on the G#, the second on the A, and so on. A melodic fragment stopping on a very stable note (a tonic in a strong rhythmical position for example) does not contain any musical tension: in this case a musical continuation
seems unnecessary and the fragment can be considered concluded. But when the melodic fragment stops on an unstable note (leading note) it contains a strong musical tension that requires musical continuation: in this case the melodic fragment would appear weakly concluded.
Figure 3 The 6 melodies of the first experiment with their prolongational structure defined by Lerdahl.

Therefore, asking the subjects to evaluate the degree of completeness of each fragment may be thought of as an indirect way of measuring the degree of musical stability. As musical tension may be varied in a subtle way, it is necessary to provide the subjects with
a scale of responses. Here this scale contains 7 steps and the subjects’ task is to choose the step which best corresponds to the degree of completeness of each fragment. The collected responses define an average profile which may be considered as an approximation of the musical tension/relaxation network abstracted by the listener. If a tonal melody does not generate musical tensions, these profiles would look like a straight line. If a musical phrase generates a hierarchy of musical tension/relaxation effectively, the profiles would present a great contrast.

Material

In order to study the effects of the tonal hierarchy on this profile we must define another melody with the same rhythm and same melodic contour, but differing in its musical progression. As indicated in figure 3, the melodies H1 and H2 differ in their implicit harmonies and thus in their prolongational structures; the main tension appears on bar 3 in H2, whereas the main relaxation appears in H1, and the main tension appears on bar 4 in H1, whereas there is a large relaxation in H2. Because of this different implicit harmony the tonal weight of an identical ending unit would not be the same considered with respect to H1 or to H2. For example, the E on ending unit 12 is more stable in H2, where it is a prolongation of the local tonic, than in H1, where it is a third subordinate to the local tonic C. In order to study the effect of the rhythmic organisation we now define two other melodies. R2 is obtained by shifting the rhythmic structure of R1 by one quaver; let us note that many ending units which were on a strong beat in R1 are on a weak beat in R2. A more important rhythmic change is performed in R3 so that the effect of duration may be measured. Let us note, for example, that the quaver A in H1R1 becomes a dotted crotchet in H1R3 (ending unit 2). Finally, in order to study the interaction, these rhythmic changes are applied to the melody H2. Each of the six melodies is segmented into 19 fragments, 114 fragments in all, played by a computer, strictly at the same tempo and without accentuation.

Procedure and subjects

The procedure is similar to that of Palmer and Krumhansl (ibid.): each subject listens to all the fragments and the presentation order is varied randomly. 18 subjects are employed: 9 musicians from the Marseilles Philharmonic Orchestra, and 9 non-musicians who have never played or learned music.
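As a rough sketch of how the average profile described above can be computed, the fragment ratings can simply be averaged across subjects for each ending unit. The array shape and the simulated ratings below are placeholders for the actual data; only the averaging step reflects the procedure described in the text.

```python
import numpy as np

# Placeholder ratings: one 7-point completeness judgement per subject (rows)
# and per ending unit of a given melody (columns), e.g. 9 musicians x 19 units.
rng = np.random.default_rng(1)
ratings = rng.integers(1, 8, size=(9, 19))

# The average profile approximates the tension/relaxation network:
# a high mean completeness marks a stable (relaxed) ending note,
# a low mean completeness an unstable (tense) one.
profile = ratings.mean(axis=0)

for unit, value in enumerate(profile, start=1):
    print(f"ending unit {unit:2d}: mean completeness = {value:.2f}")
```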
Experimental hypotheses

1. If the tonal hierarchy influences the tension/relaxation schemas, the profiles should differ in the melodies H1 (R1, R2, R3) and the melodies H2 (R1, R2, R3).
2. If the rhythmic structure influences the tension/relaxation schemas, the profiles of R1, R2, R3 should differ. In this case two other hypotheses might be tested: one concerning the effect of the duration, the other that of the metrical structure.
3. If tonal hierarchy and rhythm are two independent musical dimensions, changing the rhythmic structure from R1 to R3 should alter these profiles in the same way in H1 and in H2.
4. Finally, if musical training influences the abstraction of tension/relaxation schemas, the effect of the preceding factors should differ in musicians and non-musicians.
Results of the first experiment

First let us consider the musicians’ results. Profiles obtained in each experimental situation are shown in Figure 4. We can see immediately that each melody generates varied tension/relaxation schemas, and that these schemas differ strongly in each experimental situation. These differences are examined using multivariate statistical analysis. First the effects of the factors are analysed on each ending unit (univariate analysis of variance). The profile differences are then tested by a multivariate analysis of variance.
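A minimal illustration of the first, univariate step of this analysis is sketched below with simulated data; the factor compared (H1 vs. H2) and the data shapes are assumptions. The omnibus profile comparison reported in the paper (Wilks, Pillai and Hotelling statistics) would require a full multivariate analysis of variance and is not reproduced here.

```python
import numpy as np
from scipy.stats import f_oneway

# Simulated completeness ratings for 19 ending units under two implicit
# harmonies (H1, H2), 9 subjects in each condition; placeholder values only.
rng = np.random.default_rng(2)
h1 = rng.normal(4.0, 1.0, size=(9, 19))
h2 = rng.normal(4.5, 1.0, size=(9, 19))

# Step 1: a univariate analysis of variance on each ending unit separately.
for unit in range(h1.shape[1]):
    f_stat, p_val = f_oneway(h1[:, unit], h2[:, unit])
    print(f"unit {unit + 1:2d}: F = {f_stat:5.2f}, p = {p_val:.3f}")
```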
Figure 4 Average profiles of musical tension obtained in the first experiment.

These are the main results of this analysis.

1. The difference in the profiles observed in H1 and H2 is significant at p<0.001 (Wilks, Pillai and Hotelling-Lawley tests). The superimposition of these profiles shows how the two implicit harmonies generate different tension/relaxation schemas (fig. 5).
Fig. 5 Effect of the implicit harmony in R1, R2, R3.
While a musical stability period appears at H1 in bar 3, a strong tension period is observed in H2 in bar 4. The phenomenon is reversed at bar 5, which accords with the prolongational tree. Let us consider some of these differences in more detail. At ending unit 5 we have the tonic A at H1 and a D# at H2, which does not belong to the tonality and introduces a strong musical tension. Because of this different tonal weight the degree of musical stability observed here is higher in H1. An opposite result is observed on the ending unit 8. A more interesting fact is shown for ending unit 12. Here the two melodies have the same note E, but as the implicit harmonies are different, those Es do not have the same musical function and therefore the same tonal weight: at H1 the E is a third subordinate to the local tonic C, and at H2 the E is a prolongation of the local tonic. Experimental data confirm that the musicians perceived these different functions; indeed the degree of stability is higher in H2 and this difference is significant at p<0.039. These results demonstrate that two melodies having an identical superficial structure, but differing in their harmonic progression, effectively generate different tension/relaxation schemas.

2. Consider now the effect of the rhythmic structure. Differences between the profiles of R1, R2, R3 are significant at p<0.007: musical tensions are not only determined by the tonal hierarchy. Two different rhythmic effects can be distinguished. The responses observed in R1 for ending units 2, 6, 9, 13, 16, 20, which are on the strong beat, are compared to the responses observed for the same ending units in R2, which are now on the weak beat. The Mann-Whitney test indicates that the differences are not significant. In this experiment the metrical structure does not influence the musical stability of the note. The responses obtained in R1 for the ending units 2, 6, 10, 14, 18, 20 (quaver) are now compared to those obtained on the same ending units in R3 (dotted crotchet). The degree of musical stability is systematically higher in the second case (p<0.01): all the musical parameters being equal, the longer the note, the higher its musical stability. In comparing other ending units we can observe a significant effect of this factor even for a very small difference of duration (quaver/crotchet), (p<0.01).

3. Given these two main effects, the third question concerns the interaction. Results of the multivariate analysis of variance indicate an interaction between the implicit harmony and the rhythm significant at p<0.007. Changing the rhythmic structure does not have the same effect on H1 as on H2. Consider for example what happens on ending unit 5. The note is the tonic A in H1 and a leading note of the key of E in H2. As we can see in figure 6, the same rhythmic changes strongly affect the musical stability of the A but not that of the D# (interaction significant at p<0.019).

4. Let us now consider the non-musicians’ results. Profiles are less contrasted than those of the musicians, and differences between each experimental situation are less important. The effect of the implicit harmony is only significant at p<0.078, suggesting that the different harmonic progression generates only a small difference in the tension/relaxation schemas for the non-musicians. The effect of rhythmic structure is also significant at p<0.001: for the non-musicians, changing the rhythm of a melody alters its network of musical tension/relaxation.
As previously, the effect of duration appears highly significant (p<0.01) even when the difference in duration is only that between a quaver and a crotchet (p<0.04). In contrast, the effect of the metrical structure is not significant here. Finally, comparison of the profiles shows that the implicit harmony/rhythm interaction is not significant.
Figure 6 Example of local interaction observed with musician subjects.

Therefore, these results suggest that musical training has a strong effect on the abstraction of the tension/relaxation schemas: they seem to suggest that the non-musicians’ aptitude in abstracting these schemas is less developed. Indeed, their profiles appear almost monotonically related to overall melody duration, suggesting that for them, irrespective of the musical function, the longer the fragment, the higher its degree of completeness.
Comments

Before interpreting these different results, let us note that two of them are surprising. First, the lack of effect of the metrical structure might suggest that this structure has no influence on the musical tension and relaxation. Second, the strong difference between the two populations is inconsistent with other recent experimental results indicating that the non-musician has a very sophisticated musical competence (Bigand 1990b, Deliège 1990). For these two reasons the experimental procedure was considered critically. Two main weaknesses should be mentioned. First, each subject listens to all the musical fragments. As these fragments are very similar, this design produces interference which could obscure many subtle effects of the different factors. Second, as the presentation order is determined at random, it often happens that a short fragment follows a longer one. In this case the degree of completeness is not only defined by musical stability but also by the listener’s knowledge of the continuation of the melody. These two defects might
seriously confuse the subjects: the main purpose of the second experiment is to remedy them.
Experiment 2

Only four melodies are used (H1R1, H1R2, H2R1, H2R2). Eight independent groups of 9 subjects are formed: 4 groups of musicians (graduate conservatory students studying musicology), 4 groups of non-musicians (students of the same age, but without formal musical training or practice). 2×4×9 (72) subjects were required for this experiment. Each group listened to fragments of only one of the 4 melodies. This time, a fragment (x) immediately follows the fragment (x−1) and precedes the fragment (x+1). This new presentation respects the chronology of the melody and permits observation of how the listener abstracts the different stages of the musical progression. As the listener does not know when the melody will stop, his responses can only be based on the musical tension or relaxation he perceives. Because many interferences are now ruled out, it may be conjectured that the effects of the factors will appear more clearly.
Figure 7 Average profiles of musical tension obtained in the second experiment.
Results

As we can see in figure 7, the musicians’ profiles are roughly the same as those of the first experiment, although much more contrasted. Again we find a main effect of the implicit harmony significant at p<0.001. The differences between the harmonic progressions are localised in exactly the same way as mentioned in experiment 1. The rhythmic factor too is significant at p<0.013. The effect of the duration, significant at p<0.001, confirms that all things being equal, the longer a note, the higher its musical stability. The first difference appears when we measure the effect of the metrical structure. The average degree of musical stability observed on the ending units situated on a strong beat in one experimental situation (U2, U6, U9, U13 in R1 for example) is systematically higher than that observed for the same ending units when situated on the weak beat in the other experimental situation (U2, U6, U9, U13 in R2). This difference, significant at p<0.011, indicates that, all things being equal, a note on a strong beat is perceived as more stable than one on the weak beat. Finally, the interaction between the implicit harmony and the rhythmical structure is significant at p<0.05. The effect of changing the rhythm of a melody depends on its implicit harmony. This effect is investigated in more detail in the following way. First, the degrees of stability observed for the ending units U1R1, U7R2, U8R1, U14R2, U15R1 (crotchet) are compared to those observed in U1R2, U7R1, U8R2, U14R1, U15R2 (quaver) respectively. This comparison is effected in H1 and in H2. A multivariate analysis of variance shows that the effect of the duration is not the same in H1 and in H2. All things being equal, the effect of increasing the duration of a note depends on its musical function. For example, increasing the duration of a tonic (ending unit 7) does not have the same effect as increasing the duration of a leading note (Fig. 8a). The effect of the metrical structure is detailed in the same way. It differs significantly between H1 and H2 (p<0.035): all things being equal, a note on a strong beat is perceived as more stable than one on the weak beat, but this gain depends on the tonal weight of the note. For example, the E in H2 (ending unit 12) is more strongly affected by the change in its metrical position than the E in H1 (fig. 8b).
Figure 8 Examples of the interaction between the tonal weight and the duration (short/long) observed on ending unit 7 (a), and between the tonal weight and the metrical position (weak/strong beat) observed on ending unit 12 (b) with musician subjects.

To summarise, the musicians’ results are consistent with those of the first experiment and they give more information about the interaction between implicit harmony×rhythm. It is not the same when we look at the non-musicians’ results. These new profiles differ radically from those of the first experiment: they are more contrasted, and more sharply differentiated across the experimental situations. The multivariate analysis of variance points to an effect of the implicit harmony significant at p<.001, which is the most important difference between the two experiments. Here, each musical progression generates contrasted tension and relaxation schemas. Differences between them are qualitatively the same as those observed with musicians (see exp. 1). An interesting phenomenon is observed again for the ending unit 12. The average degree of musical stability is 3.33 for the E in H1, and 4.39 for the E in H2. This difference, significant at p<0.03, proves that even non-musician subjects are able to abstract very subtle differences in musical function. The effect of the rhythmical structure is significant at p<0.05, as is that of the duration (p<0.001), which is consistent with the results of the previous experiment. But this time, the effect of the metrical structure is highly significant at p<0.009: all things being equal, a note tends to be more stable when it is played on a strong beat. Finally, comparison of the profiles shows that the changes of the rhythmic structure have the same effect in H1 and in H2. Given this lack of interaction, it was interesting to compare the effects of the metrical structure and of the duration in H1 and in H2. Firstly, the effect of the metrical structure appears not to be the same in H1 and in H2 (p<0.033). For the non-musicians too, the musical stability of a note tends to increase when this note is played on the strong beat, but this gain depends on the tonal function of the note.
Considering now the effect of the duration, we note that it is the same in H1 and in H2. So, for the non-musicians, independent of its musical function, the longer a note, the higher is its musical stability. This result explains the absence of a global interaction between implicit harmony×rhythm for the non-musicians. In sum, the main result of this second experiment is to indicate a very sophisticated competence of the non-musician listener in abstracting musical tension and relaxation. As we can see in figure 7, their results are quite similar to those of the musicians; in this second experiment, musician subjects do not reveal a musical competence which would not exist in the non-musician subjects.
General interpretation

Taken together, these two experiments confirm that a level of coding exists where the musical phrase is represented by its network of musical tension and relaxation. For all the listeners, abstraction of this network is influenced by at least four factors. The main effect of the implicit harmony shows that two melodies having the same superficial aspects but differing in their implicit harmony generate very different tonal weights and are therefore perceived as having different musical stability: as we have noted, difference in musical stability tends to follow difference in tonal weight. But the musical stability of each note contained in a melody is not only determined by its tonal weight. The main effect of the rhythmic structure in these two experiments suggests that it is also strongly influenced by rhythmic value. The experiments showed that two kinds of rhythmic values are relevant. The first concerned the metrical position of the event, the second its duration. As we have ascertained, even a small difference in duration can slightly alter the local tension/relaxation schemas. This result explains how subtle variations in duration made by the performer may be appreciated by the listener. These results demonstrate, in accordance with Lerdahl and Jackendoff’s theory, that tonal weight and rhythmic value interact in the determination of tension/relaxation schemas. They show that an interactive relation between these two musical dimensions cannot be ruled out; it means that musical dimensions are not processed independently; as we have seen, the effect of a change on one dimension depends on the musical context where it appears. This result contradicts others obtained by Palmer and Krumhansl (1987a, 1987b). This divergence may be explained by the different methodologies employed. In order to study the interaction between two factors, it could be better to use a classical factorial design where the two factors are systematically varied. The experiment of Palmer and Krumhansl varied only one: the rhythmic factor. The presence of interaction suggests that the cognitive processes implicit in the determination of tension/relaxation schemas are unlikely to be modular. The last factor which influenced this outcome is musical training. Results of the two experiments seem to suggest that the tension and relaxation schemas abstracted by the musicians tend to be more varied. However the results of the second experiment point to the fact that this difference should not be exaggerated; a detailed analysis will convince us that these listeners managed to abstract very subtle differences in musical structure. Obviously, a musical competence to structure musical pieces exists, which does not
Musical structure in cognition
147
require any specific training: for this reason, the main hypothesis of Lerdahl and Jackendoff’s model may be extended to inexperienced listeners as well.
Conclusion

In conclusion, I would like to offer some suggestions about the formalisation of the rules involved in the establishment of tension/relaxation schemas. Given the effect of the tonal hierarchy we could assign to each note of a melody a specific tonal weight. The tonic of the main key would receive a weight of 7, the dominant a weight of 6, the third a weight of 5, the other notes of the key a weight of 4, and a note which does not belong to the key a weight of 3 or 2, depending on whether it belongs to a near or far key. When a modulation appears the notes in the new key may be assigned values using the same system with a decrease of 1 or 2 depending on whether the new key is near or far from the main key. Applied to the melody H1, this system produces line 1 (fig. 9). Of course this line differs from H1 to H2. Given the effect of duration we can allocate to each note a duration value varying here from 1 to 2 (line 2). We can apply a similar system in determining metrical value, varying here from 1 to 3 (line 3). Of course these lines vary from R1 to R2. If we consider the musician subjects, because of the interaction, we can assume that the multiplication of these three lines can produce a theoretical profile of musical tension/relaxation, which should not be too different from that observed.
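The weighting scheme proposed above translates directly into a small computation: each note receives a tonal weight, a duration value and a metrical value, and the product of the three gives the theoretical tension/relaxation profile. The sketch below follows the weights stated in the text; the example notes themselves are hypothetical, since the actual values for H1 and H2 would have to be read off the scores.

```python
def tonal_weight(degree, in_key=True, near_key=True):
    """Weights as proposed in the text: tonic 7, dominant 6, third 5,
    other notes of the key 4, non-key notes 3 (near key) or 2 (far key)."""
    if not in_key:
        return 3 if near_key else 2
    return {"tonic": 7, "dominant": 6, "third": 5}.get(degree, 4)

# Hypothetical opening notes: (scale degree, duration value 1-2, metrical value 1-3).
notes = [
    ("tonic",    2, 3),
    ("third",    1, 1),
    ("dominant", 1, 2),
    ("other",    2, 1),
]

# Theoretical profile = product of the three lines, as suggested for the
# musician subjects; modulation (weights lowered by 1 or 2) is omitted here.
profile = [tonal_weight(degree) * dur * met for degree, dur, met in notes]
print(profile)  # -> [42, 5, 12, 8]
```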
Figure 9 Comparison of the theoretical and the real profiles.
Certainly the similarity observed in figure 9 is a good starting point, but obviously the fit is not complete, which implies that other factors should be included. This system could be useful in formalising the missing factors, and it will be the object of future experiments.
References

Bigand, E. (1990a) Perception et compréhension des phrases musicales. Thèse de Doctorat de Psychologie, Université Paris X Nanterre, France. Universal microfilm ISSN: 0294–1767, n° 09882/90.
Bigand, E. (1990b) Abstraction of two forms of underlying structure in a tonal melody. Psychology of Music, 18(1), 45–60.
Bigand, E. (1990c) Perception des schémas de tensions et détentes dans une phrase musicale. Actes du 1er Congrès Européen d’Analyse Musicale de Colmar.
Bharucha, J. & Krumhansl, C. (1983) The representation of harmonic structure in music: Hierarchies of stability as a function of context. Cognition, 13, 63–102.
Deliège, I. (1987) Grouping conditions in listening to music: An approach to Lerdahl & Jackendoff’s grouping preference rules. Music Perception, 4(4), 325–360.
Deliège, I. (1990) Mechanisms of cue extraction in musical grouping: A study on Sequenza VI for Viola by L. Berio. Psychology of Music, 18(1), 18–44.
Deutsch, D. (1980) The processing of structured and unstructured tonal sequences. Perception & Psychophysics, 28, 381–389.
Essens, P. & Povel, D. (1985) Metrical and non metrical representations of temporal patterns. Perception & Psychophysics, 37, 1–7.
Fraisse, P. (1974) La psychologie du rythme. Paris: PUF.
Francès, R. (1958) La perception de la musique. Paris: Vrin. Trans. by W.J. Dowling, The Perception of Music, Hillsdale, N.J.: Lawrence Erlbaum Associates (1988).
Imberty, M. (1979) Entendre la musique: sémantique psychologique de la musique, tome 1. Paris: Dunod.
Imberty, M. (1981) Les écritures du temps: sémantique psychologique de la musique, tome 2. Paris: Dunod.
Krumhansl, C. (1979) The psychological representation of musical pitch in a tonal context. Cognitive Psychology, 11, 346–374.
Krumhansl, C. & Kessler, E. (1982) Tracing the dynamic changes in perceived tonal organisation in a spatial representation of musical keys. Psychological Review, 89, 334–368.
Lerdahl, F. (1989) Structure de prolongation dans l’atonalité. In S. McAdams & I. Deliège (eds.), La musique et les sciences cognitives, Bruxelles: P. Mardaga, 103–135. English version in Contemporary Music Review, Harwood Academic Publishers.
Lerdahl, F. & Jackendoff, R. (1983) A Generative Theory of Tonal Music. Cambridge, MA: MIT Press.
Palmer, C. & Krumhansl, C. (1987a) Independent temporal and pitch structures in determination of musical phrases. Journal of Experimental Psychology: Human Perception & Performance, 13(1), 116–126.
Peretz, I. & Morais, J. (1989) La musique et la modularité. In S. McAdams & I. Deliège (eds.), La musique et les sciences cognitives, Bruxelles: P. Mardaga, 393–414. English version in Contemporary Music Review, Harwood Academic Publishers.
Povel, D. (1981) The internal representation of simple temporal patterns. Journal of Experimental Psychology: Human Perception and Performance, 7, 3–18.
Sloboda, J. & Parker, D. (1985) Immediate recall of melodies. In P. Howell, I. Cross & R. West (eds.), Musical Structure and Cognition, London: Academic Press, 143–168.
Stoffer, T. (1985) Representation of phrase structure in the perception of music. Music Perception, 3, 191–430.
Serafine, M.-L. (1989) The cognitive reality of hierarchic structure in music. Music Perception, 6, 397–430.
Is the perception of melody governed by motivic arguments or by generative rules or by both?

Archie Levey
Medical Research Council Applied Psychology Unit, Cambridge, UK

Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 139–150
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
Traditional formal analyses of music have emphasised the sequence of motivic arguments as the source of the listener’s understanding and appreciation. More recently the concept of a listening grammar related to the compositional grammar by shared generative rules has offered an alternative model of musical processing. Experimental subjects listened to melodic materials which embodied comparable generative rules but differing thematic arguments in order to compare the relative contribution of these components to ratings of musical interest and preference. Ratings were also obtained for hybrid melodies constructed by alternating motivic materials from the source melodies. If the motivic structure is the main determinant of processing, the intact melodies should be preferred to the synthetic hybrids. In fact, the results suggested that the generative rules were the more prominent vehicle of processing. Individual differences in the processing of melodies were also clearly evident.

KEY WORDS: Cognition general, cognitive processing, generative grammar, music, melody
Introduction: Generative rules in music

The concepts of generative rules and of local grammars, which are subsets of a universal grammar, are tools that have been applied to a wide range of musical analyses. The specific application of the principles of generative grammar to music has been described by Sloboda (1985) and an informative overview of alternative grammars has been provided by West, Howell and Cross (1985). The versatility of generative models in
application to musical composition has been discussed by Jones (1989) in the context of computer orientated music. The most widely quoted account of a generative grammar of music is that of Lerdahl and Jackendoff (1983) who used the principles of Chomskian linguistics, together with concepts drawn from cognitive psychology, to produce a grammar of tonal music for which they claim a measure of universality. One of the implications of this approach is that the generative rules embodied in the grammar are shared by the listener (Lerdahl, 1988). Another is that the rules themselves, sui generis, can give rise to coherent musical utterances. The serious import of this and similar approaches is that rule-based processes can of themselves account for music. The term ‘generative’ does not imply that musical or linguistic utterances are actually generated by the rules. For cognitive science, correct identification of generative mechanisms serves to facilitate understanding of the underlying structure of mental events. Nor should it be thought that the composer, the performer or the listener must be explicitly aware of the rules. A successful composer may neither use the grammar nor be aware of its existence without inflicting any damage on the theory. It must be possible, however, to give explicit definition and precise format to every term in the generative formulae. Failure to understand these principles has led to merely trivial criticisms of the usefulness, theoretical validity and practicality of the concept of generative grammatical processes in music. There is another criticism, however, that is not trivial, which concerns the productive role of generative rules. Generative grammars are unable to determine what the speaker intends to say except insofar as they restrict utterances to those that are grammatical. This problem of intentionality has raised interesting issues in cognitive psychology which cannot be discussed here. They are illustrated by the fact that competent native speakers of a natural language are very frequently unable to specify the rules of its grammar while being able to communicate accurately the information they intend to convey. In application to music, the corpus of generative rules appropriate to any compositional idiom cannot embody the composer’s intentions; it can only place constraints on what can or cannot be written. Interestingly this problem in the application of rule-based approaches to composition was recognised by Johnson-Laird (Laird, 1961), a pioneer of computational approaches, long before generative grammars had been formally introduced into the study of music. One of the earliest applications of the principle of generative rules to music was undertaken by Sundberg and Lindblom (1976). These investigators looked at two sources in Swedish music, a series of eight bar nursery tunes written originally by a 19th Century composer (Alice Tegner) and a collection of indigenous folk songs said to have a common folkloric origin. By analysing the metric structure, pitches and harmony of these sources they were able to formulate discrete sets of generative rules for each of them, which they claimed could adequately account for the music. To test this claim they used the rules for the nursery rhymes to produce new tunes which were recognised by Swedish listeners as ‘similar’ to those of Tegner. The details of the local generative grammar are given in the original article and are too extensive to be described here.
In principle, they work by defining a harmonic and rhythmic framework within which there is a legal set of alternative pitches for each note of the eight bar melody. In the construction of new melodies, notes were selected at
random from each set of permitted alternatives. In other words, the element of compositional intention was eliminated and the claim was that coherent melodies were produced by the rules alone. It was scepticism about this claim that prompted the series of investigations to be reported shortly.
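The principle just described, a fixed harmonic and rhythmic frame with a legal set of alternative pitches for each position from which one pitch is drawn at random, can be sketched in a few lines. The pitch sets below are placeholders over an implied tonic harmony; they are not Sundberg and Lindblom's actual rule output.

```python
import random

# Placeholder "legal" pitch alternatives (MIDI note numbers) for each position
# of a short phrase; these sets stand in for the rule-derived alternatives and
# are NOT Sundberg and Lindblom's actual rules.
LEGAL_PITCHES = [
    [60, 64, 67],        # position 1: tones of the tonic triad
    [62, 64, 65, 67],    # position 2: permitted continuations
    [64, 67, 72],        # position 3
    [60],                # position 4: obligatory return to the tonic
]

def generate_melody(seed=None):
    """Pick one permitted pitch per position; no compositional intention."""
    rng = random.Random(seed)
    return [rng.choice(options) for options in LEGAL_PITCHES]

print(generate_melody(seed=42))
```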
The concept of motivic argument

Before describing these investigations a more traditional approach to the metaphor of language in music must be considered. Thomas Campion, writing in the 16th Century, compared the ‘ayres’ of music to the ‘epigrams’ of poetry. In 1958 one of the seminal works of modern music theory, now translated and reissued (Francès, 1988), declared that: ‘Music appears to us, first of all, as a coherent discourse in which the moments are punctuated by more or less clear suspensions and conclusions…’ (p. 159). This concept of motivic argument, widely accepted in the past, uses the metaphor of language but uses it in a way which is quite different from grammatical analysis. Central to the view is the idea that structural antecedents are followed by their appropriate consequents. Successive units of the musical material comment, as in a discourse, on those that have gone before. There are sequiturs and non-sequiturs, expectations and surprises which build the musical content of the piece. In a psychological sense, the motivic argument is concerned with a sequence of organised sounds which cannot be understood without reference to memory and which must involve continuous cognitive processing. Equally, the objects, events or gestures of more recent music are ordered in a sense which may no longer parallel formal discourse but which nevertheless reflects the concepts of meaning and intention. Even the introduction of indeterminacy can be seen as a recognition that the problem of intentional sequence is central to music. Thus there is a conflict, long-standing and widely acknowledged, between the view of music as language in a rule-based grammatical sense (e.g., serialism) or in the alternative sense of content defined by intentional goals. The latter sense is not confined to traditional music. Ligeti has argued, for example, in the context of contemporary music, that an international vocabulary of intervals can be used to define the discursive argument of a piece in place of specific motivic content (Ligeti, 1983, p. 94 ff). The essential distinction is between musical syntax and musical rhetoric (Francès, 1988) and the conflict concerns their relative importance. The psychological question that arises from this conflict is: what is the activity of the listener? Is the understanding and enjoyment of music based primarily on knowledge and appreciation of its structural rules or on understanding and appreciation of its sequences and their goals?
An experimental approach

In the experiments to be reported subjects were required to listen to melodies embodying the same generative rules but differing motivic patterns. As a means of comparing the relative contribution of these two components to their perception of the music, ratings of preference and of interestingness were used to assess subjective reactions. The choice of
ratings as measures of the listener’s response has the merit of being simple and comes close to defining the listener’s enjoyment of a particular melody. The melodic materials in such experiments must be of proven musical interest and they should be historically valid in the sense of being consistent with a musical tradition accessible to a wide range of listeners. These criteria were adopted to ensure a realistic assessment of musical preferences. The experiments were conducted with subjects drawn from a pool of available volunteers, the APU subject panel. To these were added associates and colleagues known to have musical interests in order to ensure a broad spectrum of response. There were no restrictions on musical preferences or tastes. This strategy made it possible to divide the sample into three levels of musical competence. Criteria for the highest level, the musicians, included at least 8 years’ musical training and regular continuing activity. The lowest level, the non-musicians, had no musical training and expressed little or no interest in music. The middle group, the majority, expressed interest in and liking for music and tended to listen selectively, but not to pursue other than casual musical activities. Melodic materials were recorded on cassette from an acoustic piano and were played without earphones in an informal testing situation. The intention was to produce a real-life sound but not to emphasise performance values, other than to articulate obvious phrases. Subjects were asked to respond to the melodies as such, rather than to the performance. Data were analysed using the statistical techniques commonly employed in psychological experiments but these procedures will not be described in detail. The term ‘significant’ will be used to indicate results or comparisons that are highly unlikely to have occurred by chance alone. A number in parentheses (e.g., 0.02) will indicate the best approximation to chance likelihood, in this case 2 in 100, which it is the function of the statistical procedures to estimate.
Experiment I: compatible generative rules

Twenty subjects were asked individually to rate simple eight bar melodies on a scale from 0 to 10 in terms of degree of liking. They were also asked to rate from 0 to 10 the interestingness of each melody. It was stressed that these ratings were to reflect their own personal reactions. Each item was played twice, the rating being given during or after the second hearing. Subjects were assigned to one of two groups who heard the melodies in differing order to balance out the effects of fatigue or boredom.

Materials

Verdi’s ‘La donna è mobile’, from the opera Rigoletto, and the Neapolitan ‘folk’ song ‘Santa Lucia’, composed in the 19th Century, provided eight bar major diatonic source materials designated D and L respectively. The two melodies are identical in rhythmic and harmonic structure and differ only in pitch contour. Sloboda (1985) has shown that within the rhythmic and harmonic framework the pitches of the Verdi melody can be accounted for by three simple rules. Exactly the same rules can also account for the second melody. The number of possible pitches is very large and the assumption is made that the pitches for each of the melodies were chosen to satisfy an optimum motivic
structure. Two hybrid melodies were formed by alternating successive two-bar phrases across each of the source melodies to produce new melodic structures. These melodies obey the same generative rules and exhibit the same overall pattern of tension, but antecedent and consequent phrases can be assumed to be motivically inappropriate. These hybrids, DL and LD, are reproduced in Figure 1, Ex. 1 and 2. Each of the samples included a simple chordal accompaniment. The prediction to be tested seems intuitively compelling: that the authentic composed melodies will be preferred to the synthetic hybrids. It can be argued that a melody produced from a combination of familiar materials would itself be perceived as unfamiliar and hence less well liked. In order to control for the effects of familiarity and also to assess the sensitivity of the rating scales, four corollary melodies were included which were compatible with the generative rules of the test materials. A Strauss waltz provided a familiar eight bar melody and the same melody occurred later with a simple error in harmony, intended to detect subjects’ sensitivity to musical grammar. Two unfamiliar test melodies were written to conform to the generative rules one of which was intended to be appealing while the other was intentionally banal (see Figure 1, Ex. 3).
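The hybridisation procedure can be made concrete with a short sketch. The following Python fragment is illustrative only: it assumes each melody is represented simply as a list of eight bars, and the bar labels stand in for the actual Verdi and Neapolitan material, which is not reproduced here.

    def hybridise(melody_a, melody_b, unit_bars=2):
        # Interleave two equal-length melodies, taking alternate units
        # (two-bar phrases in Experiment I) and starting with melody_a.
        assert len(melody_a) == len(melody_b)
        hybrid = []
        for start in range(0, len(melody_a), unit_bars):
            source = melody_a if (start // unit_bars) % 2 == 0 else melody_b
            hybrid.extend(source[start:start + unit_bars])
        return hybrid

    # Placeholder eight-bar melodies; 'D1'..'D8' merely label the bars.
    D = [f"D{n}" for n in range(1, 9)]   # Verdi source (D)
    L = [f"L{n}" for n in range(1, 9)]   # 'Santa Lucia' source (L)

    DL = hybridise(D, L)   # D phrase, L phrase, D phrase, L phrase
    LD = hybridise(L, D)   # L phrase first, then D, and so on

The same routine, with unit_bars set to 1 or with single notes as the units, would cover the bar-level and note-level hybrids used later in Experiment IIa.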
Figure 1 Musical examples referred to in the text. D—La Donna è Mobile; L—Santa Lucia. The order of the upper case symbols denotes the order of alternation of the hybrids, e.g. LD—L first phrase, D second.

Results

The two source melodies were familiar to 85% of the subjects and those who failed to recognise them were from the group of non-musicians. Musical competence did not affect overall levels of rating and the musically competent subjects did not differ from the others in their preferences either for the original or for the hybrid melodies. Contrary to the prediction being tested, the hybrid melodies did not differ from the source melodies in ratings either of interest or of preference. For the corollary materials, the attractive new
melody, a simple Ländler, received an average preference rating equal to the preferred source melody (D), indicating that the generative rules could effectively accommodate a new melodic structure. The harmonic error, in the Strauss melody, produced a nonsignificant decrement in preference due entirely to the ratings of the competent listeners. The banal melody was least liked, though not dramatically so, and this was the only significant (.01) departure from the average level of liking. None of the ratings of interestingness differed significantly from one another and they will not be referred to again. There was no effect of familiarity as such. These results support the claim that simple generative rules can of themselves produce acceptable melodies even when compositional intentions are disrupted or suspended. The unfavourable response to the banal melody is important because it shows that the null results were not due either to unfamiliarity or to insensitivity in the ratings. The argument of the experiment rests, however, on the assumption that the similarity of the ratings grew out of the common rule base. The next experiment attempted to challenge this assumption by comparing hybrid melodies drawn from differing generative sources.
Experiment II: incompatible generative rules

Thirty subjects were exposed to test melodies under essentially the same conditions as those described above. None of these subjects had taken part in the first experiment. The ratings of preference employed a scale from 0 to 100 and ratings of interestingness were not included. The melodies were presented without accompaniment. Otherwise the experimental conditions were comparable and order of presentation was again made different for two equal groups to control for fatigue or boredom.

Materials

The designation of melodies differing in generative rules raised difficulties which there is not space to discuss in detail but which were very informative. They stemmed largely from the fact that such rules are nowhere specified and must be inferred. The solution adopted was to choose eight bar sections of two simple folk melodies from modal sources on the assumption that the generative rules underlying melodies foreign to the major diatonic framework would differ from those involved in the original set. The melodies were also foreign in the sense of being identified with a different geographical region, Scotland, and hence a different musical tradition. The pentatonic melody of the Skye Boat Song (coded O for 'Over the Seas') was selected on the basis of its familiarity. Another melody was sought that would duplicate the previous rhythmic pattern and this requirement proved difficult to fulfil, possibly because regional traditions tend to embody strong rhythmic conventions. The melody chosen was a traditional Lament, arguably in the mixolydian mode, coded S for 'Scottish' (or S for 'Slavonic'—a variant of the same melody occurs in Dvorak's Slavonic Dances Opus 72 and a further variant occurs in a Chopin Mazurka!). All four of the source melodies were presented. Hybrids were formed from alternate phrases: SL, OD and DS. The remaining combination, LO, was excluded because of a technical fault. The three hybrids are illustrated in Figure 2, Ex. 1–3. It
should be noted that the modal intervals are ambiguous in a major diatonic context. The prediction to be tested is again that the source melodies will be preferred to the hybrids.
Figure 2 Musical examples referred to in the text. O—Skye Boat Song ('Over the seas'); S—Scottish Lament. The order of the upper case symbols again denotes the order of alternation.

Results

On average 82% of subjects were familiar with the source melodies, D, L and O, but only 15% recognised the melody S, selected to duplicate the rhythmic pattern of the previous materials. Subjects were classified on the basis of their awareness of the experimental manoeuvres into three levels: those fully aware; vaguely aware; and unaware. Interestingly, both fully aware and unaware subjects rated all materials significantly higher (.03) than the vaguely aware, as if the uncertainty of knowing that something odd was going on, without knowing what it was, had interfered with liking. The awareness dimension overlapped significantly (0.05) with the dimension of musical competence but neither of these significantly influenced the differential rating of intact or hybrid melodies. The averaged preference ratings are shown in Table 1, which makes it clear that all of the hybrids were significantly (0.0001) less liked than the source melodies. The unfamiliar melody was also less well liked and this agrees with the well-known relationship between familiarity and preference. The table shows, however, that the hybrid that did not contain this source was no better liked than the two that did; the decreased liking for the hybrids cannot therefore be attributed to unfamiliarity.
Table 1 Averaged ratings for compatible and incompatible sources (a)

Stimuli:      D       L       O       S       DS      OD      SL
Hypothesis:   D   =   L   =   O   =   S   >   DS  =   OD  =   SL
Ratings:      66.9 =  65.6 =  69.8 >  49.8 =  46.0 =  47.5 =  51.4

(a) Symbols are identified in the text.
The results supplement those of the previous experiment to suggest that the generative rules were the principal determinant of listeners' reactions to the music. Hybrids formed from valid musical materials that share a common generative source (Experiment I) were indistinguishable in terms of musical preference from the original sources, their motivic units being interchangeable. Hybrids formed from source materials for which a different set of generative rules can be inferred (Experiment II) were less preferred, indicating that their motivic materials were not interchangeable. This was true even though one of the sources was itself rated low in preference; the decrease in favourable response to the hybrids was independent of the initial level of preference. Even though the materials were very simple, listeners were sensitive to the results of mixing motivic materials from different sources. This is consistent with the fact that specific generative rules will constrain the possible motivic shapes with which the rules are consistent: the substitute answering phrase does not give a sensible answer. The experiments described above were concerned with interchanges of the natural units of strophic melodies consisting of symmetrical two-bar phrases. If it is true that these natural units are shaped by the generative rules and that resistance to interchangeability is a function of the extent to which the rules differ, an interesting prediction follows. Hybrids formed from units other than the natural phrases should disrupt preferences for melodies regardless of their generative sources, but the disruption should be greater if the sources are incompatible. A further experiment was undertaken to test this prediction.
Experiment IIa: rule compatibility and level of parsing

The subjects of the previous experiment believed that it was a practice run for a more elaborate experiment in which the melodies were rated in sets of three, having in common a property which would not be disclosed. This allowed for a more powerful experimental design involving multiple paired comparisons (Gulliksen and Tucker, 1961) in which each stimulus is compared once with every other and occurs in each possible order. A limitation of these designs is that the number of comparisons is fixed by the size of the set. Pilot studies indicated that more than 3 melodies could not meaningfully be compared and this limits the materials to 7 stimuli and a total of 21 comparisons. The resulting data are the number of 'votes' for each stimulus, ranging from a minimum of 0 for the melody never preferred within a set to a maximum of 6 for the melody consistently preferred. Though originally rated, the stimuli are thus ranked in order of preference for each individual subject. A principal advantage of the method is that the relatively variable subjective ratings are replaced by ranks having greater metric stability.
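To make the voting scheme concrete, the following Python sketch tallies pairwise 'votes' from triad judgements. The grouping of the seven stimuli into triads and the form of the preference data are invented for illustration; the text does not specify which melodies shared a set, only that every pair was compared exactly once.

    from itertools import combinations

    STIMULI = ["D", "L", "S", "SLp", "SLb", "DLb", "DLn"]

    # Hypothetical triads: seven sets of three in which each of the 21 pairs
    # of stimuli occurs exactly once (a Steiner-type arrangement).
    TRIADS = [
        ("D", "L", "S"), ("D", "SLp", "SLb"), ("D", "DLb", "DLn"),
        ("L", "SLp", "DLn"), ("L", "SLb", "DLb"),
        ("S", "SLp", "DLb"), ("S", "SLb", "DLn"),
    ]

    def votes(preferences):
        # `preferences` maps each triad to its members ordered best-to-worst
        # for one subject; every stimulus ends up with 0-6 pairwise wins.
        tally = {s: 0 for s in STIMULI}
        for triad in TRIADS:
            ordering = preferences[triad]
            for better, worse in combinations(ordering, 2):
                tally[better] += 1
        return tally

Each stimulus appears in three triads and is therefore involved in six comparisons, which is why the votes range from 0 to 6.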
Materials Within the limit of seven stimuli the following were used. The three source melodies were D, L and S, which share the same metric pattern. A hybrid of S and L sharing alternate phrases (SLp), a hybrid of S and L sharing alternate bars (SLb), a hybrid of D and L sharing alternate bars (DLb) and a hybrid of D and L sharing alternate notes (DLn) constituted the test melodies. In analysing the data, D and S were each averaged with L to produce stable estimates, S/L and D/L. Thus all the materials to be rated held constant a common melodic source (L), permitting comparison of the effect of generative source (same or different) and level of hybridisation (phrase, bar or note). Figure 3, Ex. 1–3, shows the three new hybrids. The melody SLp was identical to SL, shown previously. The predictions are clear and can be expressed in the formula: S/L=D/L>SLp>DLb>SLb>DLn. This implies that the generative source and the level of unitising will summate to determine the rating of preference. It should be noted in passing that two of the hybrids are drawn from the sources of Experiment I which had already shown that no difference in preference occurred when they were hybridised at the level of the phrase. Results The results are displayed in Table 2 which shows the average ranking for each of the melodic materials. The prediction is surprisingly well confirmed in terms of order of preference and the overall difference among ranks was significantly above chance (.01). The statistical analysis also showed, however, that the difference between SLp and DLb was not significant nor was that between SLb and DLn. The fact that the most structurally deviant of the hybrids, that consisting of alternate notes from two sources, DLn, did not differ from its neighbour deserves special attention. Examination of individual ratings showed that five subjects had rated this oddly angular little melody as most preferred, thus artificially raising its average ranking. Significantly, all of these subjects were from the musically
competent group who were presumably able to parse the melody in its own terms, independent of the influence of the generative rules.

Figure 3 Musical examples referred to in the text. D—La Donna è Mobile; L—Santa Lucia; S—Scottish Lament. The order of the upper case symbols denotes the order of alternation while lower case symbols indicate the level of unitising: p—phrase; b—bar; n—note.
Evaluation: generative rules in music The investigations just described were framed in terms of determinants of the listeners’ enjoyment or liking of very simple melodies. They were designed to assess the relative importance of the grammatical rules, which organise the musical material, and the motivic content, which presumably reflects its composer’s intentions. Music entails the tantalising quality of ‘aboutness’; it seems to mean something. Yet the results of these simple experiments imply that the meaning is illusory, that the grammar, which is void of specific meaning, represents the musical content. The further demonstration that the level of unitising can be manipulated to disrupt these processes works both ways. It suggests that by disrupting the composer’s ‘intention’, rearranging the notes in ways that violate the compositional units of the material, the music loses some of its effectiveness. But it can as well be argued that what is disrupted is simply the next level of processing, a level at which the integrity of units in the concealed grammar is specified. Whichever way this argument is carried, the notion of a music in which the motivic argument is the vehicle of musical meaning comes off rather badly. Before accepting the intuitively awkward conclusion, however, that the generative rules alone account for the music, some critical alternatives must be considered.
Table 2 Averaged ranks for three levels of hybrid (a)

Stimuli:      D/L     S/L     SLp     DLb     SLb     DLn
Hypothesis:   D/L  =  S/L  >  SLp  >  DLb  >  SLb  >  DLn
Ratings:      4.52 =  4.28 >  3.75 =  3.70 >  2.83 =  2.32

(a) Symbols are identified in the text.
It is possible that ratings of liking or musical enjoyment did not reflect the listeners’ response to music other than in a superficial way. It is easy to ‘like’ the sound of background music, for example, without really hearing it. A number of well known cognitive laboratory techniques, speed of recognition, ease of memorising, remote recall, etc. might provide more sensitive measures of musical processing which would reflect the role of purely motivic elements. Judgements of ‘melodiousness’ have been shown to
be intuitively easy for experimental subjects (Watkins and Dyson, 1985) and might be more revealing than judgements of liking. Until these alternatives are explored it would be rash to conclude that the traditional view of motivic goals and intentions should be abandoned. It is also possible that the materials chosen to exemplify comparable generative sources did not do so. It can well be argued that the raucous, almost vulgar accompaniment originally provided by Verdi to project his cynical little song contains elements essential to its musical meaning. It can further be argued that there is no proof that modal materials originate from different generative rules. The case for the results rests heavily on the specific materials and on the specific recombinations chosen for analysis. Further experiments should use a wider range of materials and should include control comparisons based on more complex musical models. Interpretation of the results also rests on assumptions about composers’ intentions that may not be justified. Composers have often commented on their own compositional processes. Stravinsky (1947) described a procedure closely resembling trial and error in which the final material is recognised as being right rather than being planned. The use by Xenakis (1985) of stochastic algorithms to ‘generate’ indeterminacy bears no resemblance to the generative rules of ordinary language. Similarly the use of fractals, simple mathematical expressions that ‘generate’ the formal property of self-similarity, is devoid of linguistic implications but shows promise as a compositional device (Cutting and Garvin, 1987). If the notions of generative grammar and motivic argument bear no relation to the activities of those who generate music then the results of the experiments require some other explanation. They may simply have been concerned with degrees of similarity characteristic of the type of strophic materials studied.
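The reference to fractals can be illustrated with a toy example; the following sketch uses midpoint displacement, one common way of producing statistical self-similarity, and is not the procedure of Cutting and Garvin (1987) or of any composer discussed here. The parameter values are arbitrary.

    import random

    def fractal_contour(depth=4, spread=12.0, roughness=0.5):
        # Build a self-similar pitch contour by repeatedly inserting midpoints
        # displaced by a random amount that shrinks at each level of detail.
        contour = [0.0, 0.0]
        for _ in range(depth):
            refined = []
            for a, b in zip(contour, contour[1:]):
                refined += [a, (a + b) / 2 + random.uniform(-spread, spread)]
            contour = refined + [contour[-1]]
            spread *= roughness
        return [round(p) for p in contour]   # semitone offsets from a reference pitch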
The role of the listener

The idea of a listening grammar, related to the compositional grammar by shared generative rules, proposed by Lerdahl (1988), borrows credibility from the experience of ordinary language. The empirical problem is whether it accords with the facts. The present results, though simple and preliminary, suggest that it may not. The differences in preference were very small in magnitude and examination of the individual rankings for each of the materials of the last experiment revealed an interesting reason for this blurring of outcomes. Without exception every one of the 7 melodies, originals and hybrids, each based on the average of three separate judgements, was selected as most liked by at least one subject and least liked by one or more of them. While the ratings, on average, showed the systematic trends described earlier, the individual preferences were widely diverse. This was most clearly reflected in the preferred choice of the least liked hybrid by the musically competent listeners. Clearly, the melodies were processed in highly individual ways and by mechanisms that were only weakly related to the purposes of the experiment. This may be the most sensible conclusion that can be drawn from the data. Since the issues that are raised concern similarities between music and language it may be useful to look at alternative approaches to language comprehension. An interesting experimental literature has grown up around the linguistic concepts of coherence and plausibility (e.g., Black, Freeman and Johnson-Laird, 1986). Applied to
music they suggest that a text may be understood (appreciated, valued) in terms not of shared generative rules but of perceived self-consistency and conformity to a knowledge base that makes it plausible. This approach would imply that the listener individually construes the music, processing it by whatever public or private conventions can be found or invented to make sense of it. The results of the present experiments seem peculiarly well fitted by this formulation. The selection of materials for the first experiment may have gratuitously ensured coherence of the thematic elements in all the melodies without the necessity of invoking generative rules. Similarly, the attempt to select materials differing in their generative sources may simply have ensured that thematic coherence in the hybrids would be disrupted. Experienced listeners who liked the deviant melody found it plausible in terms of a broad knowledge base which the less experienced did not have. Similarly, the preferred corollary melody, because it was modelled on the form of the Ländler, was plausible in terms of familiar knowledge while the banal melody was implausible by any standards. The principles governing natural language differ from those of music in one major particular. Every comprehending listener knows the same language. What does the musical listener hear? Is Birtwistle's Tragoedia a subjective ritual, rhythmic tour-de-force, sensuous play of timbres, tightly argued formal structure or all of these? How intimate were Janáček's Intimate Letters? How novel were its interval structures, with their associated sonorities? And which is the more appropriate question? Listeners are not only free to process music by their own strategies, they have no other choice. The probable answer to the question embodied in the title of this paper is that both components, generative rules and motivic content, contribute to the listener's perception of music but only in part and in different ways for different listeners.
References

Black, A., Freeman, P. & Johnson-Laird, P.N. (1986) Plausibility and comprehension of text. British Journal of Psychology, 77, 51–62.
Cutting, J.E. & Garvin, J.J. (1987) Fractal curves and complexity. Perception and Psychophysics, 42, 365–370.
Francès, R. (1988) The Perception of Music, translated by W. Jay Dowling. Hove and London: Lawrence Erlbaum Associates.
Gulliksen, N.A. & Tucker, L.D. (1961) A general procedure for obtaining paired comparisons from multiple rank orders. Psychometrika, 26, 173–183.
Jones, K. (1989) Generative models in computer assisted musical composition. Contemporary Music Review, 3, 177–196.
Laird, P. (1961) Composers or computers? The Listener, 65, 785–786.
Lerdahl, F. (1988) Cognitive constraints on compositional systems. In J.A. Sloboda (Ed.), Generative Processes in Music, pp. 231–259. Oxford: Clarendon Press.
Lerdahl, F. & Jackendoff, R. (1983) A Generative Theory of Tonal Music. Cambridge, Mass.: MIT Press.
Ligeti, G. (1983) György Ligeti in conversation with Péter Várnai, Josef Häusler, Claude Samuel and Himself. London: Eulenburg.
Sloboda, J.A. (1985) Music, language and meaning. Chapter 2 of The Musical Mind: The Cognitive Psychology of Music, pp. 11–66. Oxford: Oxford University Press.
Stravinsky, I. (1947) Poetics of Music. Cambridge, Mass.: Harvard University Press.
Sundberg, J. & Lindblom, B. (1976) Generative theories in language and music description. Cognition, 4, 99–122.
Watkins, A.J. & Dyson, M.C. (1985) On the perceptual organisation of tone sequences and melodies. In P. Howell, I. Cross and R. West (Eds.), Musical Structure and Cognition, pp. 71–119. London: Academic Press.
West, R., Howell, P. & Cross, I. (1985) Modelling perceived musical structure. In P. Howell, I. Cross and R. West (Eds.), Musical Structure and Cognition, pp. 21–52. London: Academic Press.
Xenakis, I. (1985) Arts/Sciences: Alloys. Aesthetics in Music, No. 2. New York: Pergamon Press.
Transformation, migration and restoration
Shades of illusion in the perception of music
Zofia Kaminska and Peter Mayer
Psychology Department, City University, London, UK
Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 151–161
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
The traditional dichotomy between speech and non-speech sounds was questioned. The fate of musical stimuli was explored in three illusion-producing paradigms: ‘transformation’ of veridical perception under conditions of invariant stimulus input, ‘migration’ of extraneous sounds revealing cognitive restructuring of the linear input, and ‘restoration’ of absent sounds. These effects, arising from the low correspondence between physical input and conscious representation characteristic of speech perception, are considered to be speech-specific. Experiment 1 revealed musical sounds to be as susceptible to perceptual transformations as speech. Experiment 2 demonstrated cognitively imposed structuring in music evidenced by migration of extraneous sounds. Experiments 3 and 4 provided evidence of perceptual restoration of missing fragments of music. These parallels between music and speech in terms of deviation of the psychological representation from the underlying signal suggest that listening to music is far from the linear, data-driven process assumed by psychological theory and implicate strong top-down influences in the perception of music. KEY WORDS: Perception, speech, music, auditory illusions.
Introduction Cognitive psychology has traditionally divided the world of auditory events into two distinct categories: speech sounds and non-speech sounds. Speech sounds form a world within a world, a closed set with its own unique indirect mode of processing in which the
relationship between the acoustic properties of the stimulus and the emergent percept is opaque (Liberman, Cooper, Shankweiler, Studdert-Kennedy, 1967). The remainder of the auditory world is composed of non-speech sounds, far more prosaic than speech in that processing from stimulus to percept proceeds in a lawful, direct and more transparent manner, and it is among these sounds that music mingles.

This dichotomous world has been drawn with good reason. The evidence for the uniqueness of speech sounds is formidable, converging from several directions. Neurology and neuropsychology point to the usually unilateral, typically left-hemispheric, and precise specialisation for, and localisation of, speech processing (McCarthy and Warrington, 1990). Speech sounds are also unique in that their perception is intertwined at the neural and psychological levels with their production. Evidence from electrical stimulation mapping of the brain at the level of individual neurones implicates the same neurones in both discrimination and articulation of phonemes (Ojemann, 1982), while visual information about the articulation of one sound will fuse with acoustic information defining another sound to produce an auditory percept representing neither (McGurk and MacDonald, 1976), again implicating cross-referencing between input and output pathways.

But the most pervasive and persuasive characteristic of speech distinguishing it from other sounds is the lack of a discernible relationship between the acoustic signal which serves as the auditory stimulus in perception, and the way in which that stimulus is actually perceived. This lack of correspondence between acoustic characteristics and emergent percept is observable at different levels of analysis. At the acoustic level, a speech sound basically consists of two (sometimes three) fundamental bands of energy at, or sweeping across, particular frequencies. In the wider auditory world such bands are heard as two sounds at a steady, or changing, pitch, in a fairly lawful manner. Harmony between the physical and psychological world is preserved. But speech sounds are outlaws. Their psychological reality stems from a more obscure mode of translation from the physical, and the bands of energy are heard as a single sound, a unified entity—a vowel, or a consonant (Liberman et al., 1967; Foss and Blank, 1982).

The same cavalier disregard for the actual physical characteristics of the underlying stimulus surfaces at a less fine-grain level of analysis as an unusually high degree of top-down or conceptually-driven processing, and gives rise to some intriguing illusions. For example, a single repeated token of a word will undergo transformation to produce distorted or new, phantom words (Warren, 1968). A sound superimposed on a speech stream will perceptually migrate from its actual locus to points of linguistic breaks in the stream (Fodor and Bever, 1965). A missing fragment of speech will be perceptually restored to produce an intact word (Warren, Obusek and Ackroff, 1972). In each case the conscious realisation of the speech stimulus is relatively untrammelled by physical reality. The relationship between what exists at the level of acoustic information and what is heard is tenuous, and in some cases so tenuous as to be non-existent. This contrasts with the fate of non-speech sounds, which are trapped as psychological echoes of the physical stimulus.
The validity of this traditional dichotomy has rarely been questioned, but this is precisely the question posed here: Can non-speech sounds, such as music, ever escape from their imprisonment? Is music always perceived in a way that directly reflects its
characteristics in real time and energy? To answer it we set out to investigate whether musical stimuli would give rise to the illusory effects thought to be peculiar to speech, and examined the fate of musical stimuli within the three paradigms of transformation, migration and restoration outlined above.
I. Transformation This effect was first described by Warren and Gregory (1958) and subsequently labelled, nota bene, the Verbal Transformation Effect. The effect is obtained when a single token of speech—a word or short phrase—is repeated continuously on a closed-loop principle, so that the acoustic waveform remains invariant across repetitions. Under such conditions the word soon starts to undergo perceptual changes. These changes may be minor ones of stress, or of initial syllable location, or more significant ones of phonetic structure. But the most dramatic transformations are those where what is heard bears no discernible relationship to the actual stimulus. Completely new words are created by the listener. The relationship between physical reality and psychological representation is lost. Of the theoretical accounts of this phenomenon, which have ranged from seeing it, somewhat misguidedly, as an analogue of the visual reversible figure (Evans, 1967) or, more logically but with no more empirical support, as the result of neural fatigue (Warren and Gregory, 1958), the one which has remained unchallenged is that subsequently proposed by Warren (Warren, 1968, 1982). In his view the phenomenon reflects a mechanism essential and specific to the resolution of ambiguities in speech, a highly creative mechanism which can—because it has to in the face of imprecise information conveyed by speech signals—construct a psychological representation with minimal recourse to physical data. If this hypothesis were correct, then the effect should not occur in non-speech sounds. This prediction was put directly to the test in Experiment 1, in which the efficacy of music in evoking transformations was compared with that of speech stimuli.
Experiment 1 Method Two categories of auditory stimuli were used: Speech (S)—complete words, selected to cover a broad range of frequency of occurrence in the language and of phonetic patterns. Musical motifs (M)—selected to represent a range of musical styles, complexity and familiarity. Ten stimuli in each category were prepared in the following way: A sample of each stimulus (produced vocally for words, synthesized on a Casio CZ1 piano synthesizer for music) was recorded on a closed loop with cycle duration of 2 seconds. 120 repetitions of the cycle were recorded on continuous tape, making a total presentation time of 4 minutes.
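As a rough digital analogue of the closed-loop tape, the cycle could be tiled in software along the following lines; the file names, sample-rate handling and the soundfile library are assumptions, since the original stimuli were prepared on tape.

    import numpy as np
    import soundfile as sf  # assumed WAV I/O library

    CYCLE_SECONDS = 2      # duration of one token, as in the experiment
    REPETITIONS = 120      # 120 cycles of 2 s = 4 minutes of presentation

    token, rate = sf.read("token.wav")                  # one recorded word or motif
    token = token[: CYCLE_SECONDS * rate]               # trim to a single cycle
    reps = (REPETITIONS,) + (1,) * (token.ndim - 1)     # handle mono or stereo
    sf.write("looped_stimulus.wav", np.tile(token, reps), rate)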
The design was within-subjects, each subject participating in each stimulus condition, with presentation of stimuli blocked within conditions, and randomised across subjects. Order of conditions was randomised across subjects. Subjects were asked to listen to each stimulus tape and to report verbally any changes they heard. Results and discussion Two response measures were obtained—type of transformation, and the interval between the onset of the stimulus and the occurrence of the first transformation (T1). Transformations were classified into two types: i. Identity changes—where the subject would claim to hear a new word, or that a different piece of music was being played, ii. Non-identity changes—involving relatively minor changes of quality, such as a change in stress or loudness. Table 1 shows the absolute number of each type of transformation, and the mean times to T1 per stimulus category.
Table 1 Transformations in speech and music.

                Speech              Music
                Id      Non-id      Id      Non-id
                239     131         270     94
Total           370                 364
T1 (secs)       49.7                51.2
It is clear that transformations are by no means limited to speech sounds. Musical sounds undergo changes in a manner and to an extent comparable to that of speech. Analyses of variance failed to reveal any significant differences between speech and music for either of the two sub-categories of transformation, total number of transformations, or time to first transformation. So it is evident that at least as far as this phenomenon is concerned, music sounds are just as labile as speech, and just as liable to detach themselves from the underlying signal and reach a form of conscious realisation which has no support from the physical stimulus.
II Migration ‘Migration’ refers to the perceptual migration of sounds extraneous to the main auditory input from their actual locus in time to positions of segmental breaks in the input. The phenomenon, first described by Fodor and Bever (1965), is well documented in speech.
If an extraneous sound, such as a click, is superimposed at the point marked * in an auditorily-presented sentence such as, for example:
Some articles are *interesting, others are boring.

the click is not heard at the point in time in which it actually occurs (in this case, at the first syllable of the word 'interesting'), but tends to migrate to a natural break in the speech stream, marked, in the case of a sentence spoken with natural intonation, by a coincident grammatical and intonational (prosodic) boundary. (In the example above, this would be the gap between 'interesting' and 'others'.) In fact it has been shown that both grammar and prosody individually are capable of acting as articulating forces in the perceptual segmentation of speech (Wingfield and Klein, 1971). It is as if decisions as to what has just been heard are being made not continuously, paralleling the temporal linearity of the input, but at intervals, perceptually organising the stream into quantal chunks. Extraneous signals can only be perceived during a break, because the perceptual chunk itself is indivisible. The conscious realisation of the speech stream must therefore involve cognitive restructuring of physical input, creating greater articulation into, and cohesion within, segments, and thereby deviating from a real-time representation.

Although there have been reports of a similar effect occurring in music (Sloboda and Gregory, 1980; Stoffer, 1985), the demonstrations have been somewhat equivocal. In the former study segmentation based apparently on processes of auditory cognition may have in fact been mediated by visual cues, since subjects had access to an accurate score during stimulus presentation. The results of the latter study are of limited generalisability since they were produced under conditions of extremely rigorous subject training and selection for musical experience. Experiment 2 therefore sought to establish the effect in music as a more robust phenomenon and place it clearly within the realm of auditory cognition, while at the same time testing details of its parallelism with speech.
Experiment 2

Method

The musical counterpart of grammar in speech was taken to be metrical structure of musical stimuli, and that of spoken intonation to be performance intonation. Their individual and combined influence on click-localisation was investigated under three conditions of boundary definition:

Condition 1: Co-incident metrical structure and intonation. Natural, interpretative performance with variation in duration and intensity of notes consonant with metrical structure.

Condition 2: Metrical structure alone. Intonational information eliminated by isochronic performance.

Condition 3: Intonation alone. Intonation inappropriate to metre (the total elimination of metrical structure would eliminate the musical nature of the input).
A set of six melodic lines, composed of isochronic notes and conforming to the same basic structure—a metrical boundary dividing two melodically-identical but pitch-different phrases—was generated by computer. Each line was of approximately 5 seconds duration. A click of equal volume to the notes was superimposed in the line, in a pre- or post-boundary position, the click-to-boundary distance being constant at 4 units of input (a unit being defined as a note or inter-note interval). An example of a typical tune showing the boundary definition and both pre- and post-boundary locus of clicks is shown in Figure 1.

Figure 1 Boundary and click positions in a Condition 2 tune.

Musically untrained subjects were asked to indicate the locus of the click using a schematic visual representation of the tune (not the score), either immediately, as soon as they thought they heard the click ('Imm'), or retrospectively, at the end of the line ('Ret'). During listening they were required to perform two subsidiary tasks—ratings of pleasantness and of musicality of the tune—to encourage listening to the tune as a whole as opposed to concentrating on click detection. Eighteen subjects took part in the experiment, which took the form of a 3 by 2 mixed factorial design, with mode of boundary definition (Conditions 1, 2 and 3) as a within-subjects factor and report time (Imm or Ret) as a between-subjects factor.

Results and discussion

Click migration, in both forward (pre-boundary clicks) and backward (post-boundary clicks) directions to a boundary, occurred in all three conditions. That is, clicks tended to be localised closer to a boundary, irrespective of the way in which that boundary was defined, than their actual position in the input sequence, implicating both grammar and intonation, acting independently or in combination, as forces of re-structuring in the perception of music. The bi-directionality of the migration points to a true boundary pull, precluding any interpretation in terms of a constant error tendency in temporal localisation. The measure of migration used for purposes of analysis is the distance, in units of input, between the apparent location of pre- and post-boundary clicks, for a particular tune, timing of report, and boundary definition. This is exemplified in Figure 2, which shows the actual localisation of clicks by 18 subjects for a single tune and report timing in Condition 2.
Figure 2 Real and apparent locus of clicks about a boundary [Melodic line in Condition 2]. An absence of a boundary effect, or a simple constant error of temporal localisation, would give rise to mean values of 8 units, the actual separation of pre- and post-boundary click positions, as their apparent separation. A boundary effect is therefore given by the difference between the real and apparent separation of click positions on either side of a boundary. The mean values of computed apparent separations of click positions per experimental condition, summed over subjects and tunes, are shown in Table 2.
Table 2 Distance between apparent loci of pre- and post-boundary clicks. Imm=Immediate report; Ret=Retrospective report.

               Condition 1          Condition 2       Condition 3
               Grammar + prosody    Grammar only      Prosody only
Imm            7.42*                7.63              6.94*
Ret            5.67**               6.03*             6.68*
Overall mean   6.55*                6.83*              6.81*

Statistically significant deviation from the expected value (based on the null hypothesis of no boundary pull), established by one-sample t-test, is indicated by * where P<0.05 and by ** where P<0.01.
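A minimal sketch of the boundary-pull measure behind Table 2 follows; the localisation values are invented, and the analysis shown (a one-sample t-test against the true separation of 8 units) simply mirrors the test named in the table footnote.

    from scipy import stats

    TRUE_SEPARATION = 8   # each click was placed 4 units from the boundary

    def apparent_separation(pre_loci, post_loci):
        # Mean reported distance, in units of input, between the apparent
        # positions of pre- and post-boundary clicks for one tune/condition.
        return sum(post_loci) / len(post_loci) - sum(pre_loci) / len(pre_loci)

    # Invented apparent separations, one value per tune, for one condition.
    separations = [7.1, 6.4, 6.9, 5.8, 6.5, 7.3]
    t_value, p_value = stats.ttest_1samp(separations, TRUE_SEPARATION)
    # Values reliably below 8 indicate migration toward the boundary.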
These data indicate that both metrical structure and intonation, operating in combination (Condition 1), or in isolation (Conditions 2 and 3) are capable of acting as articulating forces in the perceptual analysis of musical input. (Overall values for each condition show significant deviations from the value of 8, the expected value if there were no boundary effect on localisation of clicks). Their combined influence, however, is not a simple additive function of their individual effects. Deviation from 8 in Condition 1 is not equal to the sum of the deviations in Conditions 2 and 3. It seems also that the manner in which they exert their influence differs, since their potency is differentially affected by the timing of report. A comparison of immediate and retrospective reports shows that the effect of intonation (Condition 3) does not depend on the point in time at which the report is made, since its power at an immediate report is comparable to that of a retrospective report. (Imm—Ret difference in Condition 3 is not significant). The emergence of structure based on grammar, however, (Condition 2) seems to require a retrospective analysis of the input. The immediate report fails to reach statistical significance, whereas the retrospective report does. That this should be so has intuitive appeal. Information about phrases is embedded deep in the total metrical structure. The whole has to be appreciated before a decision as to how to parse it can be made; early decisions would be too error-prone. Intonational information, on the other hand, is carried at the surface level, is intrinsic to the ongoing acoustic input, and is available immediately as a travelling wave as the sequence unfolds. Hence this study demonstrates that music can be released from direct determination by the linear temporal characteristics of the input by forces arising from the metrical structure and intonation. Furthermore, these forces operate in a manner directly comparable to the operation of their counterparts in the apparently exclusive domain of speech sounds.
III Restoration

In speech, if a part of a word—a single phoneme or a syllable—is obliterated from the input stream and replaced by an irrelevant noise, such as a short burst of white noise similar to the crackle of static interference, then provided the spectral characteristics of the noise are sufficiently broad to potentially cover the range of speech sounds (Warren, Obusek and Ackroff, 1972), the missing segment tends to be perceptually restored, so that all the component sounds of the word are heard (Warren, 1982). That the identity of the missing sound is not deduced consciously from the context is evident from the fact that listeners cannot identify which sound was missing. Furthermore, the delay between the point at which the missing segment is subjectively heard, and when it could potentially be disambiguated by the context, can be considerable (Warren and Sherman, 1974). For example, in this sentence:

The *eel fell off the battered old "– – –".

what is heard in place of the crackle (indicated here by *) will depend on the disambiguating word at the end of the sentence ("– – –"), but listeners will be convinced
that they heard the word appropriate to the context occurring at the beginning of the sentence. In the case of this example, they might report hearing 'Heel' if the last word is 'shoe', or 'Wheel' if the last word is 'car'. This is the quintessence of constructivity in the perception of speech. The same spectral characteristics, which themselves are not derived from vocalisation, can subserve totally different speech-based interpretations. There is a hint in the literature that restoration effects are possible in music (Sasaki, 1980) but in that instance the effect was intertwined with migration of extraneous sounds, and it still remains to demonstrate a pure form of the effect independent of possible confounding factors. Therefore Experiments 3 and 4 sought to explore restoration effects in music and the possible role played by familiarity (Experiment 3) and predictability (Experiment 4) of the musical line in generating restoration.

Experiments 3 and 4

General method

Single notes were deleted from musical lines approximately 6 bars in length, and replaced by a burst of white noise (covering a broad spectral range) and sounding like a crackle. Subjects were asked to respond on a rating scale from 1 to 7 (1 indicating certainty that they could not hear music beneath the crackle, 7 indicating certainty that they could hear music spanning the time span of the crackle). The missing note was always located at or near the middle of the musical line.

Method

Experiment 3

The effect of familiarity of the musical line on restoration was examined in a single two-level factor, within-subjects design. Two sub-sets of tunes, one of high (High F) and the other of low (Low F) familiarity, were selected from a larger sample on the basis of familiarity ratings obtained from a pilot study. The tunes were all excerpts from classical pieces, the High F tunes being matched for musical style with Low F by using tunes composed by the same composer. Familiar and unfamiliar tunes were presented in one of four pre-prepared random orders to 25 subjects.

Experiment 4

Predictability was defined as intervallic similarity, repetition of phrasing and smooth continuity of melodic line. Tunes conforming to high (High P) and low (Low P) degrees of predictability (12 in each category) were generated by computer. The design was again a repeated measure on the single, two-level factor of predictability. 25 subjects rated all tunes, presented in a random order.

Results and discussion

The results of Experiments 3 and 4 are shown in Table 3. There is clear evidence of restoration, in the sense of subjective experience of continuation of music across the time
span of the missing fragment beneath the crackle, with all ratings in the upper half of the scale. More importantly, the finding that restoration is not affected by familiarity (there is no significant difference between High F and Low F tunes) suggests the underlying process is not one of conscious deduction of the missing segment by reference to a stored representation of the tune. It must therefore be driven by a different mechanism, perhaps related more directly to the actual input sequence. This interpretation is reinforced by the findings related to the effect of predictability on restoration.
Table 3 Mean ratings of tunes, averaged over tunes and subjects.

            High F    Low F         High P    Low P
            4.32      4.29          5.61      3.27
                P>0.05                  P<0.05

P = probability of obtaining a difference between High and Low pairs by chance, based on ANOVA.
Restoration is greatly enhanced by predictability of the musical line. (Ratings of High P lines are significantly higher than ratings of Low P lines). This suggests that its origin lies in automatic processes of realisation of the implication of the line; stored representations can play no role since the lines are all novel ones. It seems that the perceptual realisation of an element in a sequential input is well on its way before the occurrence of the physical stimulus, and does not require very much sensory support from the physical input, so that it does not make too much difference to perception if that support is not forthcoming. The term ‘implication’ has so far been assumed to suggest a process whereby preceding context projects information forward about what is to follow. However, judgements in the experiments were made at the end of a musical line, and since the missing element never occurred in the ultimate position, some succeeding context was available. It has been suggested that succeeding context, allowing for “retrospective” implication, may also be effective in the realisation of implication of musical lines (Krumhansl and Schellenberg, 1990). The results of some of our preliminary investigations into the role of succeeding context contrasted with preceding context within the restoration paradigm suggest that implication can be retrospective as well as prospective in generating restoration of missing musical segments. So, in relation to Experiments 3 and 4, it is likely that at least part of the effect is driven by succeeding context. Again, this mirrors the speech prototype, where the disambiguating information can occur at the end of the utterance and act retrospectively to restore information missing at the beginning of the sequence.
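A sketch of the noise-replacement manipulation described in the General method, assuming a digital signal rather than the original recordings; the sample rate and the choice of amplitude matching are assumptions.

    import numpy as np

    RATE = 44100  # assumed sample rate

    def replace_with_noise(signal, start_s, duration_s, rng=None):
        # Overwrite the span occupied by one note with broad-band white noise
        # of comparable RMS level, producing the 'crackle' used in the study.
        rng = rng or np.random.default_rng()
        out = signal.copy()
        i = int(start_s * RATE)
        j = int((start_s + duration_s) * RATE)
        level = np.sqrt(np.mean(signal ** 2))   # match overall loudness roughly
        out[i:j] = rng.normal(0.0, level, j - i)
        return out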
Summary and conclusion The three illusory effects which have been thought to arise from the special nature of speech processing have been shown to be by no means limited to speech sounds. Within each illusion-producing paradigm there is ample evidence of cognitive creativity at work
in the perception of music, and at work in a way that closely parallels its operation in speech. Musical stimuli will perceptually transform into representations that have no basis in the physical signal. Musical lines are subject to cognitive re-structuring and articulation into chunks. Musical sounds are subjectively created where none exist in reality. It is clear that listening to music is by no means the linear, data-driven unfurling of auditory events that has long been assumed. The parallels of divergence between stimulus parameters and cognitive representations in speech and music signal at the theoretical level, and substantiate at the empirical, the constructive processes involved in listening to music. This of course questions the traditional division of the auditory world into speech and non-speech events, especially where that division is made on the basis of differential relative weighting of bottom-up to top-down processes. But, more importantly, the above experiments provide an empirical demonstration of what may have long been implicitly recognised in musical spheres—that the psychological world of music is not necessarily in complete harmony with the physical world. There exists a considerable degree of freedom in the conscious realisation of music, and what is heard may not be so much an echo of the physical dimensions of sounds as a subjectively generated variation on the theme. Acknowledgements Our thanks go to Ian Cross of the Faculty of Music, University of Cambridge, on whose talents as a music theoretician, computer expert, performer and composer we relied so heavily.
References

Evans, C., Longden, M., Newman, E.A. & Pay, B.E. (1967) Auditory 'stabilized images': fragmentation and distortion of words with repeated presentation. National Physical Laboratory, Autonomics Division, Report 30.
Fodor, J.A. & Bever, T.G. (1965) The psychological reality of linguistic segments. Journal of Verbal Learning and Verbal Behaviour, 4, 414–420.
Foss, P.J. & Blank, M.A. (1980) Identifying the speech codes. Cognitive Psychology, 12, 1–31.
Krumhansl, C. & Schellenberg, G. (1990) An empirical investigation of the implication-realization model. Paper presented at the Conference on Music and the Cognitive Sciences, Cambridge, U.K.
Liberman, A.M., Cooper, F.S., Shankweiler, D.P. & Studdert-Kennedy, M. (1967) Perception of the speech code. Psychological Review, 74, 431–461.
McCarthy, R.A. & Warrington, E.K. (1990) Cognitive Neuropsychology, pp. 122–151. Academic Press.
McGurk, H. & MacDonald, J. (1976) Hearing lips and seeing voices. Nature, 264, 746–748.
Ojemann, G.A. (1982) Inter-relationships in the brain of language-related behaviours: Evidence from electrical stimulation mapping. In U. Kirk (Ed.), The neuropsychology of language, reading and spelling. Academic Press.
Sasaki, T. (1980) Sound restoration and temporal localisation of noise in speech and music sounds. Tohoku Psychological Folia, 39, 79–88.
Sloboda, J.A. & Gregory, A.H. (1980) The psychological reality of musical segments. Canadian Journal of Psychology/Review of Canadian Psychology, 34, 274–280.
Stoffer, T.A. (1985) Representation of phrase structure in music. Music Perception, 3, 191–220.
Warren, R.M. (1968) Verbal transformations and auditory perceptual mechanisms. Psychological Bulletin, 70, 261–270.
Warren, R.M. (1982) Auditory Perception, pp. 141–185. Pergamon Press.
Warren, R.M. & Gregory, R.L. (1958) An auditory analogue of the visual reversible figure. American Journal of Psychology, 71, 612.
Warren, R.M., Obusek, C.J. & Ackroff, J.M. (1972) Auditory induction: Perceptual synthesis of absent sounds. Science, 176, 1149–1151.
Warren, R.M. & Sherman, G.L. (1974) Phonemic restoration based on subsequent input. Perception and Psychophysics, 16, 150–156.
Wingfield, A. & Klein, J.F. (1971) Syntactic structure and acoustic pattern in speech perception. Perception and Psychophysics, 9(1A), 23–25.
Associationism and musical soundtrack phenomena
Annabel J. Cohen
Dalhousie University, Halifax, N.S., Canada
Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 163–178
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
It is often assumed that musical soundtracks influence the interpretation of film. Film music theorists further assume that such musical influences depend on the combination of meanings derived from musical and film material. The present article suggests that these assumptions about film music fall within an associationist tradition and translate into testable hypotheses about mental function. This view is supported first by a discussion of the reliance of musical meaning on experience and then by a review of three recent experiments which investigate the influence of music on film meaning. Experiment 1 uses simple musical and visual materials and measures one affective meaning at a time. Experiment 2 uses slightly more complex materials and measures multiple scales of affective meaning employing the semantic differential technique. Experiment 3 uses realistic complex materials and measures both affective and denotative meaning. In all studies, the direct influence of musical meaning on film meaning was often observed. To accommodate the data, it is proposed that musical and film information independently activate associations of both affect and denotation and that meaning at any point in time is the resultant of the total associations generated. This foundation provides a basis for examining further assumptions and hypotheses about the functions of film music, thus revealing facts about film music necessary for future cognitive modelling. KEY WORDS: Associationism, music, film, soundtrack, meaning, cognition
"Change the score on the soundtrack, and the image-track can be transformed." So says film-music theorist Claudia Gorbman (1987, p. 30), and she is not alone. The notion that film music1 influences the interpretation of film is basic common sense, a part of folk psychology. Experts schooled in the disciplines of film and/or music agree as well: composers of film music such as Aaron Copland, Bernard Herrmann, and Miklos Rozsa take for granted that film music provides meanings, emphasis, tension, and connection in the drama that cannot be conveyed in other ways (cf., Carroll, 1988; Palmer, 1980). These assumptions are based primarily on introspection and may differ in detail from individual to individual. Nevertheless, the general agreement among experts suggests that it would be valuable to regard their assumptions as hypotheses about complex cognitive processes underlying film music perception. I suggest here that these hypotheses about the conjunction of effects from music and film align themselves with an experimental psychological approach which seeks to account for complex mental phenomena in terms of interconnected component elements. This approach, called associationism, embraces empirical study and controlled experimentation and would help to translate expert opinion into specific facts and theory. The application of the associationist viewpoint to understanding film music was first suggested by the author in an overview of psychological perspectives on film music (Cohen, 1990). The present article develops the approach by mapping first the features of associationism onto examples from film music, then by discussing experimental origins of musical meanings, and finally, by describing three experiments on the effect of musical meaning on the meaning of film.
Associationism

Associationism has a long history in experimental psychology (cf., Anderson & Bower, 1973; Earhard, 1974). It is rooted in the philosophy of Aristotle and the British empiricists, Hobbes, Locke, Hume and Hartley, and its influence has been noted in classical conditioning, stimulus-response theory, theories of memory and contemporary connectionism.2 Anderson and Bower (1973, pp. 9–12) describe four features of associationism. As will be shown, each feature is well exemplified by typical assumptions made about film-music phenomena.

First, associationism is sensationalistic; it identifies the basic components of mental experience with sensory experience. Anderson and Bower (1973) include emotions within the category of sensory experience (see also, Bower, 1981). Simple ideas are identified with elementary, unstructured sensations. The many statements about film music which acknowledge the sensory (auditory or emotional) origin are clearly sensationalistic. For example, "Music…can set up emotional vibrations in the mind of the audience that may complement, supplement or even contradict the visual image. It supplies an all-important third dimension" (Palmer, 1990, p. 11). Or consider the words of Jean-Paul Sartre during the silent film era which describe the emotional experience produced by the musical accompaniment:

Above all, I liked the incurable muteness of my heroes. But no, they weren't mute, since they knew how to make themselves understood. We communicated by means of music; it was the sound of their inner life.
Persecuted innocence did better than merely show or speak of suffering: it permeated me with its pain by means of the melody that issued from it. I would read the conversation, but heard the hope and bitterness; I would perceive by ear the proud grief that remains silent. (In Anderson, 1988, p. xv).
In addition, there is a notion that music is more closely aligned to these sensory origins than are other kinds of both auditory and non-auditory information. Rubin (1985, p. 284), for example, refers to film music and silence as two of the purest of the soundtrack’s four voices (the other two being speech and sound effects), and Carroll (1988) speaks of music as having a privileged or direct route to the emotions. This concurs with the psychoanalytic literature and with clinical psychological literature which shows that mood can be induced more effectively by music than by standard verbal techniques (Albersnagel, 1988). Hence, sensationalism is an appropriate premise for current thought about film music.
Second, associationism is mechanistic. Simple algebraic rules predict the properties of complex configurations from the properties of the underlying simple ideas. Thus, the effects of combining dimensions of music (e.g., rhythms and harmonies) or of combining music and film can be predicted (following Anderson & Bower, 1973). This is consistent with the view that music independently adds meaning. For example, Eisenstein (1928/1949, p. 259) believed that sound was a separate element in the montage3 which provided new means for expression that were otherwise limited by using visual images alone. Carroll (1988) more recently writes “music modifies”. He specifically claims that music may attribute to the visuals an otherwise unavailable quality and that “…the associated musical elements are modifiers which attribute expressive properties to the referent…” (p. 222).
Hand in hand with its mechanistic nature, associationism has as a third feature, reductionism; the events to be explained decompose into the basic stock of simpler units. This is compatible with attempts to understand film music phenomena in terms of the component visual and musical elements. For example, Eisenstein described a sequence from Alexander Nevsky in stills juxtaposed to the musical score to demonstrate the exact audiovisual correspondence that he and Prokofiev supposedly achieved (Eisenstein, 1942; Gorbman, 1987).
Finally, associationism is connectionist.2 This means that ideas, sense data, memory nodes, or mental elements are associated together in the mind through simultaneous or contiguous experience. To translate this notion into the realm of film music, we have only to consider the principle of leitmotiv, which has been regarded as the structural premise of most film scores (Palmer, 1980, p. 550). The term is borrowed from its use in opera, where it denotes a musical theme which typically accompanies a particular character or activity in the drama and takes on the meaning of that character or activity, such that the music alone evokes its memory (Grout, 1973, p. 417; Lipscomb, 1990, p. 6). Countless examples are found in discussions and critiques of film music (e.g., Carroll, 1988, pp. 217–218; Rozsa, 1982, p. 133). Gorbman (1987) describes Steiner’s score for Mildred Pierce in terms of themes which become associated with the main characters: the bond between one motif (B) and its character “is established rapidly” (p. 94), a fourth theme comes “gradually to associate Mildred with this music” (p. 93) and, later, “The association theme (D) is a jaunty melody associated with Mildred’s restaurant business…” (p. 95).
The concept of connectionism is clearly at the heart of such statements.
The examples given above show that much discourse about film music fits within an associationist framework and is replete with associationist hypotheses. The discussions of film music, however, rarely acknowledge the hypothetical status of these statements4 and, to the best of the author’s knowledge, have not considered the statements within a broader associationist view. With its emphasis on the role of experience, associationism is also compatible with the scientific method which tests hypotheses through empirical research. Testing associationist hypotheses about film music would contribute to a more precise understanding of how the mind can process the audiovisual complexities of film music.
In summary, within the associationist framework, it can be hypothesized that both musical soundtracks and film activate basic percepts and emotions, that the effects of combining music and film depend on the summation of these activated elementary percepts and emotions, and that further multimodal phenomena of film music can be understood in terms of connections formed between these elements. Within the associationist framework, the scientific method can be applied to test the hypotheses through controlled experiments leading to reproducible results. If the hypotheses are not supported, they can be rejected or other theoretical viewpoints can be sought. The initial problem, however, is to begin to collect data that can be examined in terms of the hypotheses.
Having now suggested that associationism provides a useful starting framework for investigations of musical soundtrack phenomena, an example will be given of testing an associationist assumption about film music. The hypothesis or assumption in question is that music alters film meaning. It will be addressed in three experiments. The successive experiments use materials of increasing complexity to suggest that the principles appropriate to simple stimuli are equally appropriate to complex stimuli. It will be concluded that associationist concepts help us to isolate the role of musical meaning in film, and that this approach could be applied to a variety of questions about film music. However, before embarking on this plan, we first examine the position that musical meaning itself is derived through associationist principles. The overall hypothesis is that music acquires its meaning through prior associations and that music transfers these meanings to portions of films which it accompanies.
Musical meanings and associationist origins
It is often claimed by music theorists such as Hopkins (1979) and Meyer (1956) that meanings of music are broad and tend to lack specific denotation. Film-music theorists agree on this point (Gorbman, 1987; Carroll, 1988) and, analogous to the 18th-century music theorists who argued that music required text to make it intelligible (Schroeder, 1990), film-music theorists claim that music finds specific meaning through conjunction with film. Nevertheless, some music has quite specific meanings on its own (e.g., Kracauer, 1960, p. 141). These specific meanings will be referred to as denotative meanings. To give a simple example, the tune Auld Lang Syne will bring to the minds of
many people the image of a New Year’s Eve party. The occurrence is no mystery, since the song is sung primarily in this situation. Such denotative meaning clearly depends upon experiencing the connection between the music and a particular event. Classical composers such as Haydn often quoted musical excerpts from folksongs and choral works assuming that the denotative meanings would make the new composition “intelligible” without the direct use of words (cf., Schroeder, 1990). Another aspect of musical meaning is the emotional or affective aspect. It has been shown in a number of empirical studies that listeners agree on emotional connotations of music which we will henceforth refer to as affective meaning (Cunningham & Sterling, 1988; Gardner, Silverman, Denes, Semenza & Rosenstiel, 1977; Levi, 1982; Nielzen & Cesarec, 1981; Riggs, 1964). Both denotative and affective meaning can be accounted for in terms of associationist principles as will be described below. As discussed earlier with reference to connectionism, one basic associationist notion is that two mental events which are contiguous in time or space will become linked together in the mind such that the presence of one of them will give rise to the other. In more specific terms, if A and B are presented sequentially many times, then eventually A will on its own elicit a representation of B. (This incidentally is the theory consistent with leitmotiv as previously referred to in the discussion of connectionism. It could be added that the film-music theorists who describe the formation of links between a musical theme and the film event, often assume that this occurs in just one presentation, cf. Gorbman, 1987, pp. 27–28). A slight variation of this idea resembles a simplified version of Pavlovian or classical conditioning which exemplifies associationist principles. Here, suppose A is a bell and B is a piece of cheese. The cheese B causes a salivation response, C. Thus, whenever A is followed by B, C occurs. Eventually, the presentation of A in the absence of B will still generate the response of salivation, C. Salivation is an objectively measureable response to an object. But less overt responses such as images and emotions may also be elicited by objects. For example with respect to denotative meaning, B can be regarded as the New Year’s Eve party and C, the image of the party. In the absence of B, the song A which is always sung at the New Year’s Eve party may bring to mind the image of the party, C. With respect to affective meaning, a picture of a young child B1 may generate representations of tenderness C1 in a viewer, or the picture of a massacre B2 may bring feelings of rage C2. The conjunction of music A with experiences of certain events, B1 or B2, may evoke representations of the emotional meanings, C1 or C2, aroused by these experiences. That is, music A may evoke emotion C in the absence of event B which typically evoked C. Thus, music A can acquire affective meaning C through association with B. For example, Auld Lang Syne may evoke emotions of nostalgia even in the absence of New Year’s Eve. These examples of musically evoked meanings are arbitrarily tied to experiences that are determined by cultural, social, and historical situations. They are not necessarily universal. In Japan, for example, Auld Lang Syne is associated with the closing of a department store. There are, however, certain auditory/behavioural contingencies that are universal and could account for universal meanings of music. 
For example, Sundberg (1982, pp. 146–147) notes a direct relation between human anatomy, emotions and vocal production, what he refers to as the body language of emotions. He said that “It is likely that expressive body movements are translated into acoustic terms in voice production.” When one is sad, the musculature is different than when one is happy or angry. Since
vocal production depends on the musculature, a change in the musculature changes the quality of the sounds that are produced. Thus, humans have constant exposure to correlated information about emotional state and the auditory parameters of the voice. In sadness, muscle activity is minimized, and reduced activity in specific muscles may result in soft voice intensity, low pitch and non-dynamic pitch contour, impoverished harmonic structure, etc. This does not mean that all low, slow sounds are regarded as sad. Rather, abstracted over many contexts, sadness may be the predominant emotion associated with these parameters. Considerable research shows that humans can categorize auditory patterns into emotional categories on the basis of such auditory parameters (Scherer & Oshinsky, 1977; Sundberg, 1982). Moreover, contingencies between intonation pattern and specific interactional and motivational contexts have been noted in infant-directed speech (Stern, Spieker & MacKain, 1982), and infants have been shown to link happy- and sad-sounding voices with corresponding facial expressions (Walker, 1982). This sensitivity to available contingencies is consistent with the idea that emotional-auditory connections are learned universally through experience. In other words, there is an associationist account for the meaning of simple acoustic parameters in speech and in vocal and nonvocal music.
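The contiguity mechanism just outlined—music A repeatedly co-occurring with an event B and thereby coming to evoke, on its own, the image or affect C that B arouses—can be illustrated with a short sketch. The code below is only an illustration of the principle, under the assumption that association strength grows with simple co-occurrence counts; the class, the cue and event labels, and the repetition numbers are invented for the example and are not part of the studies described here.

```python
from collections import defaultdict

class AssociativeStore:
    """Toy contiguity learner: a cue (A) experienced together with an event (B)
    gradually comes to evoke that event's image or affect (C) when presented alone."""

    def __init__(self):
        # cue -> {associate -> co-occurrence count}
        self.counts = defaultdict(lambda: defaultdict(int))

    def experience(self, cue, associate):
        # Each contiguous pairing strengthens the cue-associate link.
        self.counts[cue][associate] += 1

    def evoked(self, cue):
        # Presenting the cue alone retrieves its associates, strongest first.
        links = self.counts[cue]
        return sorted(links, key=links.get, reverse=True)

memory = AssociativeStore()
for _ in range(20):                                   # hypothetical learning history
    memory.experience("Auld Lang Syne", "New Year's Eve party / nostalgia")
memory.experience("Auld Lang Syne", "department store closing")  # e.g. the Japanese usage

print(memory.evoked("Auld Lang Syne"))   # the party/nostalgia association dominates
```

On this reading, cultural differences in musical denotation are simply differences in the learning histories fed to one and the same mechanism.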
Changing film meaning by music: three experiments
The previous section suggested that musical meaning is itself the product of associations. The present section examines whether associations can account for the often-cited film-music phenomenon in which the meaning of the film is changed by the accompanying musical context. More specifically, the object is to relate the change in meaning of a visual pattern to the acoustic parameters of the musical background, and to explicate Gorbman’s pronouncement which began this article: “Change the score on the soundtrack, and the image-track can be transformed”. We begin with a simple example and then turn to increasingly more complex cases.
1. Elementary auditory and visual patterns
In the first example, very simple visual and auditory stimuli were examined in isolation and in conjunction in a number of studies. The visual stimulus was a bouncing ball represented on a computer screen. The bounce varied in two dimensions, height (low, middle or high) and tempo (slow, moderate or fast). The auditory stimulus was a simple melody (repeating tones or a broken major or minor triad) which also varied in two dimensions, pitch (low, middle or high) and tempo (slow, moderate or fast). When the stimuli were presented alone, in control conditions, subjects, using a 7-point scale, rated high sounds happier than low, and fast tones happier than slow, confirming Riggs’ (1964) earlier observations for more complex musical stimuli. The same pattern of response was shown for the visual stimuli. Figure 1 (top Tempo, bottom Height/Pitch) shows the results for 12 subjects who saw the stimulus and 12 subjects who heard it. For these unimodal conditions, the roughly parallel lines indicate that the emotional rating is controlled by the level on the dimensions in the same way for the two modalities. The effects of combining values on the two dimensions within a modality were also additive.
For example, within the visual modality, high values on height and tempo produce a very high rating and a high value on height and a low value on tempo sum to a lower rating. What then would happen when the audio and visual information were combined (a) congruently and (b) incongruently? The results of 12 subjects shown in Figure 2 (top panel) for tempo reveal for the congruent condition exactly what would be expected, basically the same pattern as for the individual modalities. Since this could be explained by attending to either the auditory or visual modality, the effects of the auditory information on the visual judgment are more easily discerned in a comparison with the incongruent condition. In the incongruent condition, when the high values of the visual modality are now combined with low auditory values, the original effects for the visual values are moved in the opposite direction. Thus, for the incongruent condition, the range of values is greatly reduced as compared to the congruent case. Figure 2 (bottom panel) shows the same counteractive effect of the incongruous Height/Pitch on the emotional judgment, although in this case, the effect of the auditory information is not as pronounced. Thus, at the level of simple elementary dimensions of tempo and pitch, effects of music on the interpretation of the visual display are predictable. But these were not the only dimensions found to influence visual meaning. Melodic structure was a third dimension that had an independent effect: balls which bounced to a background of ascending-descending major triads were judged as happier than balls which bounced to a background of repeating tones. Minor triads produced intermediate judgments. These influences are schematically represented by Figure 3 which illustrates the effects of the separate components of the two modalities on the final meaning. The components represent only a few of the many possible dimensions.
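A minimal sketch of the additive integration suggested by these results, assuming—purely for illustration—that each level on each dimension contributes a fixed signed increment to a single happiness score on the 7-point scale; the numerical weights below are invented and are not estimates from the experiment.

```python
# Hypothetical additive model of the bouncing-ball judgments: each auditory and
# visual dimension contributes independently, so congruent audio and visual values
# reinforce one another while incongruent values cancel, compressing the range.
CONTRIBUTION = {"low": -1.0, "middle": 0.0, "high": 1.0,
                "slow": -1.0, "moderate": 0.0, "fast": 1.0}

def predicted_happiness(visual_height, visual_tempo, pitch, auditory_tempo,
                        baseline=4.0):
    """Predicted rating on a 7-point sad (1) to happy (7) scale."""
    total = sum(CONTRIBUTION[level] for level in
                (visual_height, visual_tempo, pitch, auditory_tempo))
    return max(1.0, min(7.0, baseline + total))

congruent = predicted_happiness("high", "fast", "high", "fast")    # pushed to the happy end
incongruent = predicted_happiness("high", "fast", "low", "slow")   # auditory values pull back
```

A further term for melodic structure (major, minor, repeating tone) could be added in exactly the same way.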
Figure 1 Mean sadness/happiness rating as a function of visual or auditory tempo (top panel) and of
visual height or auditory pitch (bottom panel) for a bouncing ball and a simple melody, respectively.
Figure 2 Mean sadness/happiness rating of the bouncing ball as a function of tempo of the bounce and a
congruent or incongruent background melodic temporal pattern (top panel) and as a function of height of the bounce and congruent or incongruent background melodic pitch height (bottom panel).
Figure 3 Schematic representation of a simple model of information integration from multiple visual and auditory dimensions which gives rise to the resultant meaning of the audiovisual stimulus.
2. Complex animation
The previous example examined one rating scale. Another technique developed some years ago uses many rating scales to measure affective meaning on three basic dimensions: Evaluative, Potency and Activity (Osgood, Suci & Tannenbaum, 1957). The technique, called the semantic differential, is associationist, in that it assumes that an object will give rise to affective associations on these dimensions and their strength can be measured through ratings on separate bipolar scales which tap the particular dimension (e.g., fast-slow taps the Activity dimension). In one study (Marshall & Cohen, 1988), individual groups of subjects provided semantic differential ratings for (1) two
contrasting pieces of specially composed music, called Strong and Weak, (2) a short animated film by Heider and Simmel (1944) involving three geometric figures and, finally, (3) the film accompanied by the two musical backgrounds. The overall ratings of the film changed in the presence of the music. For the Activity and Potency dimensions, an averaging of musical and film meaning was consistent with the data, but the role of music on the Evaluative dimension was more complex and was accounted for in terms of a compatibility factor rather than a simple integration of the visual and auditory Evaluative ratings. As well, ratings of the individual characters in the film (the large triangle, small triangle and circle) also changed as a function of the music background, as shown in Figure 4. A simple associationist account of the data was complemented by the proposal of a Congruence-Associationist model in which music governed attention to certain visual elements. The musical associations were then attached to these attended visual elements. The control of visual attention was thought to be guided by simple intermodal relations, such as temporal similarity.
Figure 4 Mean rating on the Evaluative, Potency, and Activity dimensions for the three film characters as a function of the two contrasting musical soundtracks (Weak and Strong) and a condition with no soundtrack (Control). If there were no effect of background music, the data for the two music conditions would not differ from the Control conditions and all lines would be horizontal. (© 1988 by the Regents of the University of California, reprinted from Music Perception, Vol. 6, No. 1, Fall, p. 107, by permission.)
3. Realistic materials
These previous examples have considered abstract visual materials and simple auditory materials. What of more realistic and complex materials that would be viewed in a movie theatre, typical of the everyday film experience? In one study of such complex materials, short (1 min) film music excerpts, entitled Conflict (M1) and Say Hello To Love (M2), were presented to over 50 student subjects who rated them on semantic differential scales. As a measure of denotative meaning, subjects also rated the appropriateness of various titles, including Say Hello to Love and Conflict. Figure 5 (left top row) shows that the Evaluative, Potency and Activity ratings differ significantly for the two pieces. Figure 5 (right top) shows that the judged appropriateness of the two titles also differed for the two pieces. (Two other titles were also judged but, for simplicity, are not shown.) As well, two contrasting film excerpts were selected. One (F1) showed a male chasing a female; the other (F2) showed a fist fight between two males. These were rated on semantic differential and titles by another large group of subjects and again, as shown in the second row of Figure 5, the patterns of affective and denotative meanings differ for the two film excerpts. Comparing the judgments for the music and film, it can be seen, for example, that M1 and F2 are denotatively similar and M2 and F2 are opposites. If music influences the meaning of film, then the different musical backgrounds should alter judgments of the meaning of the films. To test this, different groups of subjects rated the films in the presence of the different soundtracks. One group rated M1F1, M2F2 and the other rated M1F2, M2F1. As can be seen in the bottom half of Figure 5, the judgments of the combined stimuli reflect the meanings of the individual components in many cases, although not all. F1 is influenced directly by M1 but not so obviously by M2 for both affective and denotative meaning. F2 tends to dominate over M1 and M2. These results lead to further questions about the roles of salience and ambiguity of meaning of separate audiovisual components on the direct influence of music on film judgments.5
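The pattern of results in Experiments 2 and 3—averaging on some dimensions, dominance of the more salient or less ambiguous component on others—can be pictured as a weighted combination of the meanings contributed by each source. The sketch below is a schematic reading only; the weight and the illustrative ratings are invented, and the experiments report ratings rather than fitted model parameters.

```python
def combined_meaning(music, film, music_weight=0.5):
    """Hypothetical weighted integration of meanings (e.g. semantic differential
    dimensions or title-appropriateness scores) contributed by a musical excerpt
    and a film excerpt; music_weight stands in for the salience/clarity of the music."""
    w = music_weight
    return {scale: w * music[scale] + (1 - w) * film[scale]
            for scale in music.keys() & film.keys()}

# Invented illustrative ratings on 7-point scales.
m1 = {"Evaluative": 2.5, "Potency": 6.0, "Activity": 6.5}   # a piece like "Conflict"
f1 = {"Evaluative": 4.0, "Potency": 4.5, "Activity": 5.0}   # a chase scene
print(combined_meaning(m1, f1, music_weight=0.6))
```

Setting music_weight low for a highly salient film excerpt reproduces, qualitatively, the dominance of F2 over either soundtrack.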
Conclusion
Discourse on film music implies a number of psychological phenomena concerning the interaction of music and film. This discourse often assumes that musical meanings, when added to the visual information, set a mood, modify the interpretation of an event, provide a narrative function, establish context of time and place, etc. It also assumes that music has a direct path to the emotions and sensations, and that musical meaning can be acquired by contiguity with film meaning, such that the music can take on a meaning of its own and introduce information about the film in the absence of the film itself. It was argued that these assumptions are consistent with the sensationalist, reductionist, mechanistic, and connectionist features of associationism and also, still within the tradition of associationism, that these assumptions admit of empirical testing.
An example of how such empirical work can be conducted was given with respect to the question of the contribution of musical meaning to the interpretation of visual information in a film. Three increasingly complex examples were presented. In the first, an auditory pattern varying in tempo and height altered the happiness/sadness judgments of a bouncing ball. In the second example, contrasting music influenced semantic differential judgments about a simple animation overall and for the three geometric film “characters”. In the final study, with examples of real film music and realistic film excerpts, once again the meaning of the music in some instances shifted both the affective and denotative meaning of the film. Thus all studies substantiated the claim that the soundtrack can transform the image-track. It is suggested that the principles demonstrated in the simplest experiment (the bouncing ball) are applicable to the more complex experiments which also showed effects of music on the interpretation of the visual information. Thus, all of these examples are consistent with a model that assumes that both music and film generate representations of specific denotation and affect (see Figure 6). These may be weighted according to their salience and clarity and then combined for the final outcome meaning. The associationist framework can help to provide the empirical data needed to fine-tune this theory. The data, however, may or may not be most parsimoniously accounted for in terms of associationism. Related and more recent cognitive models may prove to be better candidates, but at present we do not have the appropriate data to judge this. This paper, therefore, does not argue that musical phenomena ultimately will be explained by an associationist theory. The intent of the argument for associationism is instead pragmatic and methodological. If the argument has been successful, the reader will agree that associationism provides a good foundation for exploiting the insights that both film music and film-music theorists offer us about complex musicognitive processes.6
Figure 5 Mean rating on the Evaluative, Potency, and Activity dimensions (left panels) and title appropriateness ratings (right panels) for two music selections, for two film selections, and for the combinations of music and film selections.
Figure 6 Schematic representation of the simple associative model of meaning of film which integrates affective and denotative meanings from both musical and visual (scene) information from the film.
Acknowledgements
Comments on an earlier draft from Professors John Barresi, James Clark, Douglas Mewhort and David Schroeder and Dr. Archie Levey are greatly appreciated. The technical assistance of Debora Dunphy is also acknowledged. The work was supported by the Social Sciences and Humanities Research Council of Canada through a Canada Research Fellowship and a Research Grant.
Notes
1. The discussion generally addresses background music (also known as underscore, incidental, functional and non-diegetic music) as opposed to music that is part of the drama (also known as foreground, musique de scène, source music, realistic or diegetic music). It is often assumed that background music is unheard by the audience (e.g., Gorbman, 1987; Lipscomb, 1990). That is a separate testable question which has been given preliminary attention (Cohen & Dunphy, 1990; A. Levey, personal communication; Lipscomb, 1990).
2. Fodor and Pylyshyn (1988) claim that all connectionists are associationists but that not all associationists are connectionists. Anderson and Bower (1973) claim that all associationists are connectionists. To argue one way or another would not be a profitable endeavor when, clearly, what is at stake here is precise definitions. Anderson and Bower’s notion of connectionism differs in detail from more recent definitions.
3. Montage appears to be regarded by many film theorists as the most characteristically cinematic device. It provides a sequence of views that are not related to each other by overlap or common background (Hochberg & Brooks, 1978).
4. The film-score composer and theorist Hanns Eisler in the early 1940s envisioned a scientific project on film music “with theoretical determination of special problems, experiments, and
public tests of the results” and had obtained funding from the Rockefeller Foundation for the project; however, the project did not fully materialize (cf. Marks, 1979, p. 301; Eisler, 1947).
5. Other groups of subjects rated meanings of the music when “accompanied” by the film. No effects of the affective information of the film on judgments of the affect of the music were observed, although effects of the denotative information of the film on denotative judgments of the music were apparent. Thus, for these particular examples of film and music, the associationist process was asymmetrical for affective meanings. In other words, denotative but not affective information was transmitted from the film to the music interpreter. Denotative and affective meanings thus may be independent. It is also noteworthy that for many film/music combinations, judgments of the music as foreground statistically differed from judgments of the film as foreground. Thus, subjects apparently could direct attention reliably to either music or film.
Lipscomb (1990) also examined the effect of background music on excerpts from a mainstream film. He showed that the pattern of semantic differential judgments differed as a function of the background musical excerpt. Although ratings of the sound and visual excerpts were not examined independently, the results for the combined stimuli provide evidence that the musical soundtrack can transform the meaning of the image-track.
6. The present three experiments focused on one issue, the issue of meaning. A more complex question is that of the leitmotiv phenomenon, concerning the links that form between musical meaning and the accompanied film. Empirical work which addresses this question has been recently reported by Boltz, Schulkind, and Kantra (1991); see also Cohen (1990); Cohen and Dunphy (1990).
References
Albersnagel, F.A. (1988) Velten and musical mood induction procedures: A comparison with accessibility of thought associations. Behaviour Research and Therapy, 26, 79–96.
Anderson, G.B. (1988) Music for the Silent Films: 1894–1929. A Guide. Washington: Library of Congress.
Anderson, J.R. & Bower, G.H. (1973) Human associative memory. Washington: Winston.
Boltz, M., Schulkind, M. & Kantra, S. (1991) Effects of background music on the remembering of filmed events. Memory and Cognition, 19 (6), 593–606.
Bower, G. (1981) Mood and memory. American Psychologist, 36, 129–148.
Carroll, N. (1988) Mystifying movies. New York: Columbia University Press.
Cohen, A.J. (1990) Understanding musical soundtracks. Empirical Studies of the Arts, 8 (2), 111–124.
Cohen, A.J. & Dunphy, D. (1990) Musical and visual processing in film. Canadian Psychologist, 31, 220 (Abstract). Poster presented at the Annual Meeting of the Canadian Psychological Association, Ottawa.
Cunningham, J.G. & Sterling, R.S. (1988) Developmental change in the understanding of affective meaning in music. Motivation and Emotion, 12, 399–413.
Earhard, B. (1974) Association (and the nativist-empiricist axis). In E.C. Carterette & M.P. Friedman (Eds), Handbook of perception. Vol. 1 (pp. 93–108). New York: Academic.
Eisler, H. (1947) Composing for the films. New York: Oxford University Press.
Eisenstein, S.M. (1942) The film sense. J. Leyda (Ed. & Trans.). New York: Harcourt, Brace, Jovanovich.
Eisenstein, S.M. (1949) Film form. J. Leyda (Ed. & Trans.). New York: Harcourt, Brace, Jovanovich.
Fodor, J.A. & Pylyshyn, Z.W. (1988) Connectionism and cognitive architecture. Cambridge, MA: MIT Press.
Gardner, H., Silverman, J., Denes, G., Semenza, C. & Rosenstiel, A. (1977) Sensitivity to musical denotation and connotation in organic patients. Cortex, 242–256.
Gorbman, C. (1987) Unheard melodies: Narrative film music. Bloomington: Indiana University Press.
Grout, D.J. (1973) A history of western music (Revised edition). New York: Norton (Original published in 1960).
Heider, F. & Simmel, M. (1944) An experimental study of apparent behavior. American Journal of Psychology, 57, 243–259.
Hochberg, J. & Brooks, V. (1978) The perception of motion pictures. In E.C. Carterette & M.P. Friedman (Eds), Handbook of Perception. Vol. X (pp. 259–303). New York: Academic.
Hopkins, A. (1979) Understanding music. London: Dent.
Kracauer, S. (1960) Theory of Film. London: Oxford University Press.
Levi, D.S. (1982) The structural determinants of melodic expressive properties. Journal of Phenomenological Psychology, 13, 19–44.
Lipscomb, S.D. (1990) Perceptual judgment of the symbiosis between musical and visual components in film. M.A. thesis, University of California, Los Angeles.
Marks, M. (1979) Film music: The material, literature and present state of research. Notes, 36, 265–325.
Marshall, S. & Cohen, A.J. (1988) Effects of musical soundtracks on attitudes toward animated geometric figures. Music Perception, 6, 95–112.
Meyer, L. (1956) Emotion and meaning in music. Chicago: University of Chicago Press.
Nielzen, S. & Cesarec, Z. (1981) On the perception of emotional meaning in music. Psychology of Music, 9, 17–31.
Osgood, C.E., Suci, G.J. & Tannenbaum, P.H. (1957) The measurement of meaning. Urbana: University of Illinois Press.
Palmer, C. (1980) Film music. In Stanley Sadie (Ed.), The New Grove Dictionary of Music and Musicians (pp. 549–566). Washington, D.C.: Macmillan.
Palmer, C. (1990) The Composer in Hollywood. New York: Marion Boyars.
Riggs, M.G. (1964) The mood effect of music: A comparison of data from four investigators. Journal of Psychology, 58, 427–438.
Rozsa, M. (1982) Double life. New York: Hippocrene Books, Inc.
Rubin, M. (1985) The voice of silence. In E. Weis & J. Belton (Eds), Film Sound: Theory and Practice (pp. 277–285). New York: Columbia University Press.
Scherer, K.R. & Oshinsky, J.S. (1977) Cue utilization in emotion attribution from auditory stimuli. Motivation and Emotion, 1, 331–346.
Schroeder, D. (1990) Haydn and the enlightenment: the late symphonies and their audience. Oxford: Clarendon Press.
Stern, D.N., Spieker, S. & MacKain, K. (1982) Intonation contours as signals in maternal speech to prelinguistic infants. Developmental Psychology, 18, 727–735.
Sundberg, J. (1982) Speech, song and emotions. In M. Clynes (Ed.), Music, mind and brain (pp. 137–150). New York: Plenum.
Walker, A.S. (1982) Intermodal perception of expressive behaviors by human infants. Journal of Experimental Child Psychology, 33, 514–535.
Rhythm perception: Interactions between time and intensity
Claire Gérard, Carolyn Drake and Marie-Claire Botte
Laboratoire de Psychologie Expérimental, CNRS, Paris, France
Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 179–189
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
A series of experiments in the field of rhythm perception, examining the mental representation of the temporal and intensity-based properties of a rhythmic sequence, demonstrates that errors made by adults in judging the intensities of sounds depend on the location of the judged sound in short rhythmic sequences, and that perceived temporal intervals at the end of a group are systematically distorted (in adjustment tasks where the temporal organization and the number of elements were varied). The results of these experiments underline the need to distinguish between the perception of objective variations and the subjective representation of the properties of sounds when they are embedded in a sequence (immediate context) and used as the basis of various decisions (larger context). These results are discussed in the light of data on masking, subjective accents, temporal summation, perceptual illusion and judgmental processes.
KEY WORDS: Rhythmic sequence, adjustment of intensity.
For cognitive psychologists, information processing implies the creation of an internal representation of external events. The properties of the internal representation of a given event depend on: 1) The physical parameters of the external event, 2) The properties of the sensory system involved, 3) The immediate objective context of the event (e.g. the events surrounding it), and 4) The larger context defined by the kind of task involved and the subject’s prior knowledge about the event being processed. In this last class of determinants, judgmental aspects lead the subject to emphasize or, on the contrary, to ignore some dimensions of the event, thus modifying their internal representation. Therefore, internal representations are never accurate copies of events, and mainly reflect steps 2, 3 and 4 of our non exhaustive list. These four steps can easily be applied to the processing of music. Musical events are defined by their pitch, timbre, duration, and intensity, all parameters of step 1 which must be in the range of sensitivity of the auditory system (step 2). They make up sequences of sounds organized into groups of various size and complexity (step 3). Psychologists
attempt to elucidate the principles of the creation of internal representations with various paradigms (step 4) such as the discrimination between two sounds or two sequences of sounds, the reproduction of short sequences, the segmentation into sub-units, the detection of targets, and, at the extreme, listening to and/or interpretation of complex holistic pieces of music. Nothing warrants the claim that the internal representation of the same sequence of sounds should be the same when the task differs, unless invariants are found under various contextual conditions. Concerning the perception of sound dimensions, the paradigms for studying step 4 can be roughly divided into two main approaches. In the first, studies aim at measuring perceptual phenomena directly by discrimination tasks (Monahan & Carterette, 1985) with, for instance, forced-choice paradigms (Povel & Okkerman, 1981). This last study is of particular interest here; by asking subjects to identify which sound (in a repeating group of two sounds of equal intensity) was the loudest, the authors showed that perceived intensity depends on the relation between inter- and intra-group intervals. This sort of task requires the subjects to choose between two possible answers and thus can be influenced by judgmental aspects. In the second approach, studies measure perceptual phenomena indirectly, by way of observable performance. Musicians are asked to play musical sequences chosen for their particular temporal or intensive structure (Clarke, 1985; 1988; Gabrielsson, 1974; Shaffer, Clarke & Todd, 1985). Systematic variations from a purely mechanical norm are taken to reflect characteristics of the internal representation of the sequence. For instance, Drake & Palmer (in preparation) examine how pianists emphasize structurally important notes. Once these systematic variations in performance variables have been identified, it is possible to examine their relative importance when timing and intensity features conflict. This second approach has the disadvantage of being only an indirect measure, and it is often difficult to distinguish systematic distortions related to structural aspects from those related to expressive intentions. In this paper, we adopt a third method which, it was hoped, would overcome the drawbacks of the first two. As we will discuss later, this aim is partly fulfilled but new problems are created. In this method, subjects are asked to adjust either one temporal interval or the intensity of one sound in a series of sounds. Any difference between ideal and adjusted values will be taken to reflect distortions in the internal representation of these parameters. This paper examines the processing of temporal and intensity information, their hierarchy and their interactions. We wish to investigate how tasks involving different strategies and decisions on the part of the subject influence this hierarchy. These studies concern simple rhythmic sequences containing variations in time and intensity but not in pitch. The question of the relative importance of temporal and intensity aspects of a rhythmic sequence was first brought to our attention by a previous experiment which has already been described in detail (Gérard & Drake, 1990). In a discrimination task, which criteria do naive listeners such as children choose?
Children were asked to discriminate between short rhythmic sequences in order to ascertain the relative importance of differences in tempo, accentuation and temporal organization, and their possible development with age. Four groups of 30 children (5, 6, 7 and 8 years) were presented 12 pairs of rhythmic patterns (6 identical, 6 different), and for each pair, the child had to say if the 2 patterns
were “the same music or not the same”. The patterns of interest and the results are presented in Table 1. Four observations can be made: 1) Surprisingly, for the pair containing two isochronous sequences with different tempi (inter-onset intervals were respectively 500 ms and 250 ms), the majority of the children considered that “it was the same music” and their opinion did not change with age (Table 1 line a), 2) When the same temporal organization was presented firstly without accent and then with metrical accent (the level of accented sounds was at least 14 dB louder than that of non accented-sounds), the scores were below chance level (and thus the rhythms were considered “the same”) up to the age of 8 years. Thus, intensity differences did
not appear to be taken into account (line b), 3) Two different temporal organizations were generally well discriminated, since scores were above chance level for 6 and 8 year-olds (line c), 4) Finally, intensity played an interesting role, since adding intensity differences to the temporal patterns of line c improved the discrimination scores for all subjects: the difference was detected more often than chance as young as 5 years (line d).

Table 1 Percentages of correct detection of the differences between the two rhythmic patterns.

                               Age groups (years)
Different patterns             5        6        7        8
a: 1 1 1 1 / ssssssss          23.33    36.67    36.67    23.33
b: 1 1 ssl / 1 1 ssl           63.33    60       60       66.67
c: 1 1 ssl / 1 ssl l           53.33    86.67    63.33    90
d: 1 1 ssl / 1 ssl l           70       90       90       96.67

1 = long interval, s = short interval; l and s = accented sounds.

This experiment led to the conclusion that children’s decisions are mainly based on the temporal structure of long and short intervals. Time and intensity interact, since variations in the intensity domain influence the processing of variations in the temporal domain. Whereas intensity differences and tempo are coded in the internal representation, these factors are not sufficient in themselves for children to decide about the similarity or differences between two “musics”. This first experiment led us to examine possible analogous interactions between time and intensity parameters in adult subjects.
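The statements about chance level can be checked with a simple binomial calculation. The sketch below assumes that each child contributes one correct/incorrect response per pair and that chance is 50%; the article does not specify the test actually used, so this is only a plausibility check on the reported percentages.

```python
from math import comb

def p_at_least(k, n, p=0.5):
    """Probability of observing k or more correct responses out of n under chance p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Line c, 8-year-olds: 90% of 30 children = 27 correct -> far above chance.
print(p_at_least(27, 30))   # about 4e-06
# Line a, 6-year-olds: 36.67% of 30 children = 11 correct -> not above chance.
print(p_at_least(11, 30))   # about 0.95
```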
Experiment 1: systematic distortion of intensity
Most data on distortions of intensity properties of sounds have come from performance studies. As stated above, the need to reproduce and to communicate a musical structure to a listener may produce changes in the internal representation. Thus, we have studied the
perception of the intensity of sounds embedded in sequences in an adjustment task, in order to 1) reduce motor programming to a minimum, 2) ensure that attention was focused on loudness alone, 3) suppress any aesthetic or expressive aim related to the communication with a listener. This experiment was in two parts: in the first, isochronous equitone sequences were presented as a control condition, in the second, temporal organization was introduced into the sequences. Thus, the immediate context of the sounds to be processed was varied.
Method
Procedure. A computer and a sound generator presented repeated short sequences of sounds monaurally through one earphone in a sound-proof room. In the first part of the experiment, the isochronous sequences contained 5 or 6 tones with an SPL of 70 dB, frequency 1000 Hz, and duration 50 ms. Inter-onset intervals were constant within a sequence but varied from one sequence to another: 300, 500 or 700 ms. The pause between sequences was twice the inter-onset interval. The intensity of one of the sounds (randomly chosen by the computer, and whose position in the sequence was indicated by a number appearing on the screen of the computer placed in front of the subject) was either louder or softer than the other tones in the sequence by between 15 and 18 dB. The subject’s task was to adjust the intensity of this sound (by turning a potentiometer) so that it seemed equal in loudness to the other sounds in the sequence. The subject heard the adjusted sequence with the correction he had asked for and could adjust it again as many times as he wished. He pushed a button once he was satisfied with the adjustment. The value of the adjustment in dB was recorded by the computer.
In the second part of the experiment, sequences of 6 sounds were generated with the same characteristics as above and presented in the same manner, but the sequences were not isochronous. Three different rhythms containing two different intervals, short (s) and long (l), were selected; they can be coded as r1=ssssl, r2=ssssll and r3=ssl ssl. The same three tempi gave inter-onset values for the short intervals equal to 150, 250 or 350 ms, and to 300, 500 or 700 ms for the long intervals. In the same way as before, each sound of each rhythm for each tempo had to be adjusted.
Subjects. 10 adult subjects accustomed to psychoacoustic experiments, students or researchers in our laboratory, participated in all the experimental sessions.
Results
An analysis of variance on the adjusted levels in the isochronous sequences as a function of position (5 or 6), tempo (3), starting value (2: louder or softer), and repetition (2) revealed no significant effects. The mean adjustments obtained were very regular and showed an adjusted level slightly below the reference level (general mean level adjusted to 69.2 dB with reference level at 70 dB). They varied neither with tempo, nor when the modified sound began louder or softer, nor with the location in the sequence, and they did not vary with the number of elements of the sequences. Figure 1a shows the values in dB obtained for the two control sequences (5 or 6 elements), as a function of the position of the adjusted tone. The curves are flat, indicating that for isochronous sequences nothing
prevents the subjects from having a quasi-constant representation of the intensity of the sounds. The second part of the experiment examined how the introduction of temporal organizations perturbs this regular pattern of estimations. In the second part of the experiment, an analysis of variance of the adjusted values as a function of position (6), rhythms (3), tempo (3), and starting values (2) revealed significant differences for specific positions (F(5,40)=3.02, p<.05), as shown in Figure 1 (b, c, d). For rhythm r1, the adjusted intensities were nearly 2 dB higher for the first (F(1,8)=14.51, p<.01) and the last sound (F(1,8)=9.88, p<.05) as compared to the central sounds in positions 2, 3, 4 and 5. Thus, a sound which begins a long interval is adjusted to a higher intensity. In the same way, for rhythm r2, the last sound was adjusted louder (1.5 db) than the others (F(1,8)=9.47, p<.05). For rhythm r3, the last sound as well as the third sound (which correspond to the
beginning of long intervals) were adjusted 1 or 2 dB higher than all the others (last sound: F(1,8)=9.60, p<.05; third sound: F(1,8)=9.54, p<.05). The tempi and the starting value of the modification (softer or louder) had neither significant effects per se, nor interactions with the effect of position.
Figure 1 Adjusted intensity in dB as a function of the position of the sound for control sequences of 5 and 6 sounds, and for the three rhythms.
As these effects are independent of tempo in the two parts of the experiment, it is not the duration of the intervals per se which must be taken into account but rather their
relative values. Let us now compare the two parts of the experiment. The control sequence with 6 elements can be written sssssl since the pause separating two repetitions was twice as long as the inter-onset interval. Rhythm r2 differs from this sequence by the presence of one long interval: ssssll. If the changes in intensity occurred mainly to underline the boundaries of a group, we would have found variations in the control sequences as well. On the contrary, nothing appeared in part 1. Thus, it is really the coding of two relative values inside a sequence which produces variations in adjusted levels.
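For readers who want to see the shape of such data, the following is a minimal sketch of how adjusted levels might be summarized per position and compared with the 70 dB reference; the data layout (position, adjusted dB pairs) and the example values are assumed for illustration and are not the authors' analysis code.

```python
from collections import defaultdict
from statistics import mean

REFERENCE_DB = 70.0

def adjustment_by_position(trials):
    """trials: iterable of (position, adjusted_db) pairs from the adjustment task.
    Returns, for each position, the mean adjusted level and its deviation from the
    reference; positive deviations correspond to sounds set louder than the rest."""
    levels = defaultdict(list)
    for position, adjusted_db in trials:
        levels[position].append(adjusted_db)
    return {pos: (mean(vals), mean(vals) - REFERENCE_DB)
            for pos, vals in sorted(levels.items())}

# Invented example in which the third and last sounds are set about 2 dB louder.
print(adjustment_by_position([(1, 69.0), (2, 69.2), (3, 71.1),
                              (4, 69.1), (5, 69.3), (6, 71.4)]))
```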
Conclusion
Subjects are able to adjust to the same level each sound in an isochronous sequence of 5 or 6 tones. The perceived intensity does not vary with the position of the sounds in the sequence. However, when the temporal structure of the sequences is made more complex by including two interval lengths (long and short), the ability to adjust sounds in each position in the same way disappears. The adjusted level depends on the temporal structure. We can neither explain the results by masking (as will be discussed later) nor by a “boundary phenomenon” (where the beginning and end of any sequence would give rise to a change in intensity), since adjustments do not vary in the control sequences which also end with a long interval. A second experiment investigates what happens when subjects adjust the inter-onset intervals between two sounds in sequences where intensity was objectively maintained constant.
Experiment 2: A systematic temporal distortion
A series of experiments (Drake & Gérard, in preparation) examining the reproduction of musical rhythms by young children, adult musicians and nonmusicians has identified a systematic temporal distortion in reproduced rhythms: in a series of notes separated by short intervals, the last interval before a long interval is systematically lengthened by between 10 and 20%. Is this distortion related to motor programming or to a perceptual error coded in the internal representation of the sequence? The present experiment also uses an adjustment paradigm which reduces motor factors to a minimum, and which focuses subjects’ attention on time: the subject is asked to adjust the duration of the interval between 2 tones. Full details of the procedure are presented in Drake, Botte & Gérard (1989).
Method
Models. Figure 2 (bottom) presents the 8 rhythms created by varying the number of short intervals (s=2, 3, 4 or 6) and the number of long intervals which follow (l=1 or 2). The rhythms can be coded: r1=ssl, r2=sssl, r3=ssssl, r4=ssssssl; r5, r6, r7 and r8 were the same rhythms with an additional long interval at the end of the sequence (for example, r5=ssll, and so on). The sounds were 400 Hz, 60 ms tones at 70 dB SPL. The inter-onset
short interval was always 300 ms, long intervals being 600 ms for r1, r3, r5 and r7 and 900 ms for r2, r4, r6 and r8.
Procedure. The subjects heard the “correct” rhythm followed by a “distorted” rhythm in which one short interval (called the “target interval”) was much shorter (90 ms) or much longer (390 ms) than it should be. Two target intervals could appear: the short interval just before the first long interval (target 2) or the short interval two places before the first long interval (target 1). The subjects could lengthen or shorten this interval by pressing one of two buttons. They then heard the “adjusted” rhythm with the correction they had asked for, could adjust it again and hear their adjustment as many times as they wished. They pressed a last button once satisfied that their “adjusted” rhythm was the same as the “correct” rhythm. The experiment was in two parts—a) a training session in which the subjects did five blocks of models until the target interval was always adjusted in the right direction, and b) the experimental situation in which the models were presented in a random order. The subject was in a sound-proof room and heard the rhythms, synthesized by computer, over a loudspeaker.
Subjects. 20 psychology students (nonmusicians and not accustomed to psychoacoustic experiments) took part in the experiment.
Results
As the duration of the sound was 60 ms and the short inter-onset interval 300 ms, the correct adjusted value for the target interval was 240 ms. An analysis of variance as a function of rhythms (8), starting values (2), and targets (2) showed a significant effect of targets (F(1,19)=19.36, p<.001). The mean adjustment for target 1 was 240.20 ms, which shows how accurate the adjustment of time intervals inside a group of sounds can be, but the mean adjustment for target 2 was 249.54 ms, which shows a systematic distortion appearing just before a long interval. This lengthening of the last short interval turned out to be valid for two conditions: 1) when the distorted value started off at 390 ms (F(1,19)=23.82, p<.001) and not when it started off at 90 ms, and 2) when the starting value was 390 ms, it only occurred if the rhythm contained 4 and 6 short intervals, that is to say in the longer rhythms (for 4 short intervals: F(1,19)=12.5, p<.005; for 6 short intervals: F(1,19)=13.35, p<.005). The number of long intervals (1 or 2) after the short ones had no significant effect. These results are presented in Figure 2 (top).
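The figures above fit together as simple arithmetic: with 60 ms tones and a 300 ms short inter-onset interval, the silent target interval to be restored is 240 ms, and the two starting values lie 150 ms on either side of it. The snippet below just records that check (interpreting the adjusted interval as the silent gap between tone offset and the next onset, which is the reading consistent with the 240 ms value).

```python
TONE_MS = 60                    # tone duration
SHORT_IOI_MS = 300              # short inter-onset interval
START_VALUES_MS = (90, 390)     # distorted starting values of the target interval

correct_gap = SHORT_IOI_MS - TONE_MS                  # 240 ms: value to be restored
offsets = [v - correct_gap for v in START_VALUES_MS]  # [-150, 150] ms
mean_target2_error = 249.54 - correct_gap             # +9.54 ms, about 4% overall;
# the roughly 10% lengthening reported below refers to the longer rhythms only.
```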
Figure 2 Adjusted intervals in ms for target 2 (last short interval) as a function of the number of short intervals in the sequence (N), for the two starting values.
The lengthening of the specific target 2 in longer sequences (which are comparable to the control sequences of Experiment 1) is about 10%, that is to say at the short end of the range observed in reproduction tasks (10–20%). Therefore we suppose that such a distortion is primarily of a perceptual nature and that additional motor factors enhance it.
Conclusion
Experiment 2 shows that subjects adjust quite accurately the short intervals in simple rhythmic sequences such as ssl—remember that they had not adjusted correctly the intensity of the last sound in such sub-groups in Experiment 1 (rhythm r3 for example). Moreover, they lengthen specifically the short interval preceding the last sound in long sequences like ssssl or ssssssl—remember that they adjusted correctly the intensity of the last sound in such sequences of Experiment 1 (control sequences). Therefore, the internal
representation of a sequence manifests either a temporal or an intensity distortion depending on the number of elements (size of the sequence), the internal composition of this sequence, and the dimension on which the subject focuses his attention. Can we conclude that, before any constraint related to production or expression, and independently of any knowledge of musical laws (or, for children, naive definitions of “music”), the internal representation of a rhythmic sequence is never an exact copy of the physical stimuli because the processes of perceptual coding introduce distortions in time and/or intensity domains? The discussion which follows will try to answer this question.
General discussion
The main purpose of this paper was to examine the internal representation of simple rhythmic sequences using a “perceptual” paradigm. In a previous discrimination task with children, whereas both temporal and intensity information was perceived and encoded, temporal structure was considered far more important in the definition of “music” than intensity-based structure. The interactions between time and intensity observed when the results of Experiments 1 and 2 are examined lead us to consider that judgmental aspects are not the only determinants. Let us now try to evaluate the possible bases of these results.
1 Judgmental aspects in rhythm perception
A tempting explanation which is parsimonious though not providing a complete account could be adopted. Subjects may perceive intensities and intervals accurately, but they are not satisfied with the results of their adjustments because the units thus created do not conform with the usual prototype or template of musical productions (Handel, 1989). The pattern of intensity or temporal variations is therefore modified in order to obey traditional configurations of musical sequences, and “accented” events are marked off from surrounding events by differences in either time or intensity. Therefore, perceptual distortions, observed even with our so-called “perceptual” method, are not purely perceptual. They are judgmental and based on the comparison with ideal models, that is to say related to adults’ definition of what music must be, in the same way as children totally ignored intensity variations as sources of differences.
2 Perceptual distortions: some new phenomena
Suppose now, on the contrary, that our method really does reflect perceptual representations. Strict logic assumes that in an adjustment task where sounds are (or should be) of equal intensity, if subjects adjust one sound louder than another, it is because they perceive the first softer than the second. Therefore, the “percept” and the adjusted value are in opposite directions. We will now try to apply traditional interpretations of modifications in loudness to our results, and see if they fit. Figure 3 will help our reasoning by presenting an example.
Figure 3 Some hypotheses.

Let us see what happens with masking. If pro- and retroactive masking (Zwicker, 1982) were at work, sounds beginning short intervals would in fact be perceived as softer, because they receive masking from both the sounds preceding and following them, compared with sounds preceded or followed by longer intervals. If this were the case, the constancy of the adjustments in the control sequences could not be explained, nor could the absence of a tempo effect (since the degree of masking is usually closely related to the time between two sounds), nor could the direction of the distortion we have observed: sounds in the vicinity of long intervals would have been adjusted softer, not louder, if they received less masking. For these reasons, and also considering our rather slow tempi and the short durations of the sounds, this hypothesis can be rejected. Do subjective accents permit another explanation? Povel & Okkerman (1981) observed subjective accents (which they did not explain by masking or temporal summation) on the second tone in repeated groups of two tones. Our control sequences showed that this phenomenon cannot be generalized to groups of 5 or 6 sounds and, in the experimental sequences, when intensity differences appeared, the direction of the
distortion was the inverse of that observed by Povel & Okkerman. Therefore, this hypothesis must also be rejected. Now, if temporal summation, which appears for intervals as long as 200 ms (Irwin & Zwislocki, 1971; Elmasian & Galambos, 1975; Scharf, 1978), were at work, the first sound of a sequence would enhance the loudness of the second, and so on to the end. The “percept” would be softer for the first sound as compared to the others, and the adjusted values would be the inverse. This is partly what we observed, as shown in Figure 3. Suppose now that temporal summation could be generalized with pro- and retroactive effects. Then, as shown in the figure, the “percept” for the first and the last sound would be softer, and the adjusted values louder. This is consistent with our results. This interpretation has the advantage of preserving, at least, the direction of the changes; however, it has to be tested in further experiments, since rhythms 2 and 3 of Experiment 1, which were adjusted louder at the end (like the rhythm 1 selected for Figure 3), were not distorted at the beginning.

A last solution, that of a new perceptual illusion, cannot be excluded. In Experiment 1, the only factor capable of explaining all the distortions is that they occurred only when the temporal window within which the sound appeared was longer. We can therefore deduce that in a chain of short intervals ssss, the fast burst of sounds gives an impression of loudness x; when a sound is followed by a longer interval, its subjective loudness diminishes with time and this sound gives an impression of loudness y, softer than x. The subjective intensities which are coded in the internal representation of the sequence would be a combination of intensity and time. The main explanation of the role of the immediate context of the sounds would therefore be that the mental representation of intensity is subject to a perceptual illusion of lower loudness when longer intervals follow.

Finally, the temporal distortions have to be interpreted. We assume again that when subjects adjust an interval longer than it should be, it is because they perceive this interval as shorter than it is. This difference would be another new perceptual illusion. Similar results and interpretations for time illusions were presented recently by Nakajima, Sasaki, van der Wilk & ten Hoopen (1989) and by ten Hoopen, Vis, Hilkhuysen & Nakajima (1989), who proposed a model of “trace-overlap”: the processing of short intervals would not be finished when a new sound arrives, so the subjective beginning of the following interval would be postponed, and thus this second interval would be perceptually shortened.

The heuristic value of such interpretations (for temporal as well as for intensity distortions) is interesting because a psychophysical basis for explaining performance data can be proposed. Performance data usually show an increase in the intensity of the first and the last sounds reproduced in rhythmic groups, and a lengthening of the last short interval before a long one. If performers increase the temporal and/or intensity values of these notes in rhythmic groups, it could be because they (and the listeners) perceive them as shorter and/or softer than they are. In an initial perceptual step of processing, the performers would thus attempt to restore regularities in the group, for themselves and for the listeners. We are collecting new adjustment data in order to confirm this conclusion.
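To make the compensatory logic concrete, here is a minimal numerical sketch, assuming purely illustrative values (a 250 ms short interval and a hypothetical 25 ms trace-overlap, neither of which is taken from the experiments reported here): the percept is shortened by the overlap, and an adjustment that makes the interval sound correct therefore moves the physical value in the opposite direction.

```python
# Illustrative sketch of the trace-overlap reasoning; all values are hypothetical.
physical_interval = 250.0   # ms, a short interval preceding a longer one
overlap = 25.0              # ms, assumed spill-over of processing onto the next interval

perceived_interval = physical_interval - overlap   # the interval is heard as shorter
# To make the interval *sound* like 250 ms, the subject lengthens it until the
# percept matches the target, i.e. moves the adjustment opposite to the percept.
adjusted_interval = physical_interval + overlap    # 275 ms, a lengthening of about 10%

print(f"perceived: {perceived_interval:.0f} ms, adjusted: {adjusted_interval:.0f} ms")
print(f"relative lengthening: {adjusted_interval / physical_interval - 1:.0%}")
```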
References

Clarke, E.F. (1985). Structure and expression in rhythmic performance. In P. Howell, I. Cross & R. West (Eds), Musical Structure and Cognition. London: Academic Press, 209–236.
Clarke, E.F. (1988). Generative principles in music performance. In J.A. Sloboda (Ed.), Generative Processes in Music: The Psychology of Performance, Improvisation and Composition. Oxford: Oxford Science Publications, 299–328.
Drake, C., Botte, M.C. & Gérard, C. (1989). A perceptual distortion in simple musical rhythms. Proceedings of the Fifth Annual Meeting of the International Society of Psychophysics, Cassis, France.
Drake, C. & Gérard, C. (in preparation). The role played by accents in the reproduction of simple musical rhythms by musicians, nonmusicians and children.
Drake, C. & Palmer, C. (in preparation). Accent structures and piano performance.
Elmasian, R. & Galambos, R. (1975). Loudness enhancement: monaural, binaural, and dichotic. Journal of the Acoustical Society of America, 58, 229–234.
Gabrielsson, A. (1974). Performance of rhythmic patterns. Scandinavian Journal of Psychology, 15, 63–72.
Gérard, C. & Drake, C. (1990). The inability of young children to reproduce intensity differences in musical rhythms. Perception and Psychophysics, 48(1), 91–101.
Gérard, C. & Auxiette, C. (1988). The role played by melodic and verbal organization in the reproduction of rhythmic groups by children. Music Perception, 6, 173–192.
Handel, S. (1989). Listening: An introduction to the perception of auditory events. Cambridge, Mass.: MIT Press.
Irwin, R.J. & Zwislocki, J.J. (1971). Loudness effects in pairs of tone bursts. Perception & Psychophysics, 10, 189–192.
Monahan, C.B. & Carterette, C. (1985). Pitch and duration as determinants of musical space. Music Perception, 3(1), 1–32.
Nakajima, Y., Sasaki, T., van der Wilk, R.G.H. & ten Hoopen, G. (1989). A new illusion in time perception I. In Proceedings of the First International Conference on Music Perception and Cognition, Kyoto, Japan, 17–19 October.
Povel, D.J. & Okkerman, H. (1981). Accents in equitone sequences. Perception and Psychophysics, 30(6), 565–572.
Shaffer, L.H., Clarke, E.F. & Todd, N.P. (1985). Meter and rhythm in piano playing. Cognition, 20, 61–77.
Scharf, B. (1978). Loudness. In Handbook of Perception, Vol. IV. Academic Press Inc.
ten Hoopen, G., Vis, G., Hilkhuysen, G. & Nakajima, Y. (1989). A new illusion in time perception II. In Proceedings of the First International Conference on Music Perception and Cognition, Kyoto, Japan, 17–19 October.
Zwicker, E. (1982). Psychoakustik. Springer Verlag.
Mechanisms of cue extraction in memory for musical time

Irène Deliège
Unité de Recherche en Psychologie de la Musique, Université de Liège, Belgium

Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 191–205
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
Starting from the hypothesis explored in previous research, that there exists a mechanism for cue extraction in the organisation of listening to a musical work, the incidence of this mechanism in memory for temporal form is studied here. The experiment, conducted using a contemporary work, Eclat by Pierre Boulez, consisted of the localisation of extracts after three hearings of the work during which subjects carried out grouping tasks made necessary by a previous study. In comparing the results of professional musicians and non-musicians, we observe that the latter retain a smaller number of cues whilst listening, and are less sensitive to their tonal qualities. Moreover, whilst the realisation of the “temporal line” (ligne temporelle) of the work by musicians corresponds to its actual duration, it is noticeably more “centred” by non-experts.

KEY WORDS: Cue, group, grouping, variant, invariant, time memory.
The first Music and Cognitive Science symposium, which took place in Paris in 1988, was, for me, a chance to introduce the subject of some work in progress on The psychological organisation of listening to music. An overview was presented there of three experimental studies centred on the hypothesis of the existence of a mechanism for cue extraction in the construction of a schema of the work. This first phase of the work took into account the role of so-called cues in the formation of groupings of groups, leading to the capture of an overall mapping of the work during its perception in real time. Pieces from the contemporary repertoire—the Sequenza VI for viola solo by Luciano Berio, and Eclat by Pierre Boulez (I.Deliège, 1989; I.Deliège et al., 1990)—
were chosen for the experiments. Quite a strong similarity between the performance of professional musicians and non-musicians was observed.

The following experiment is concerned with the schema of the work memorised whilst listening and performing a localisation judgement task for extracts consisting of complete and well-formed groups. Nevertheless, what is investigated here is not the effect of the grouping itself as an entity, nor the perception of its boundaries, but that of the cue which it embodies, and which serves as a label for memory for the temporal succession of musical events. It should be made clear that the localisation judgement task was introduced at the end of the experiment discussed above, which was concerned with the perception of groupings of groups during the hearing in real time of Eclat for fifteen instruments by Pierre Boulez. Thus the subjects had heard the piece three times before undertaking the localisation of extracts task. They had not been told about this task beforehand. The research aims to separate out the effect of the cue in remembering the schema of the work, i.e. the foundation markers laid by the cues whilst listening with no other preparation. We endeavoured therefore to create experimental conditions which avoided the interference of any other indicators.

It is not unhelpful here to re-emphasize that, to a large extent, cue extraction brings into play the notions of invariant and variant in musical perception, notions which we know to be widely exploited in composition and analysis. The invariant function is inherent in the very nature of the cue. As soon as it has been identified, this invariant takes its true form. It is a relatively stable entity, something made discrete. It could be said, to take up an expression of Leonard Meyer, that it has become a “palpable object” (1973, p. 90). However, the notion of invariant, transposed into the framework of cue extraction in listening to music, is slightly distanced from the exclusively thematic nature which is traditionally assigned to it. This had already been noted during the analysis of the formation of groupings of groups just mentioned (Deliège, ibid.). In fact, even if a thematic quality of the detected cues was perceived in the results obtained for Sequenza VI by Berio, the extracts certainly did not have characteristics comparable to those evident when listening to Eclat by Boulez. However, the similarity of the groupings recorded for such different works was blatant, and it was from here that the idea of a listening procedure with the same underlying mechanism came. In other words, in both cases it was plausible to postulate the operation of cue extraction, the nature and auditory characteristics of these cues being dictated by other structural characteristics of the composition. In this case it was a question of combinations of pitches and timbres. Certain invariants could be detected in the frequent use of trills and resonating sounds, or even isolated sounds bounded by groups of rapidly flowing sounds.

Hence a cue always contains rare but striking characteristics which facilitate recall. It has a role as a place marker, a simple and efficient way—as Ribot, one of the great masters of European psychology, pointed out about a century ago now—of treating large quantities of data. The cues allow the discrimination of resemblances and differences. Thus they appear to be figures around which a musical process is constructed and progresses.
It might be sufficient at this point to employ the old saying “Birds of a feather flock together” to give a good idea of the process to which the cue gives rise during listening. Regrouping of structures by similarity operates as long as they have the same fundamental invariants. It ends as soon as a new contrasting element is introduced, and
reappears as soon as new cue indicators are available. It takes effect in the setting up of a Same-Different principle which leads to the definition of temporal regions in the flow of the work.

We could ask whether the cue would by its very nature preserve the memory of the temporal organisation of the work, or at least facilitate it. To ask the question in such a way is to come back to the idea formulated in terms of a temporal horizon by numerous authors who have considered the question of time perception (Janet, 1928; Fraisse, 1967; Cottle, 1976; Richelle & Lejeune, 1979; Michon, 1979). However, it is exclusively the retrospective aspect, that of memory for musical events, which will be considered here. It is very much a question of a retrospective process rather than a prospective process, in the sense understood by Husserl (1928–1964), to the extent that only the “stores” tapped into by listening will be accounted for. This retrospective aspect distinguishes this type of memory from the current concept of the temporal horizon which, originally considered to be a product of memory, generally integrates three temporal concepts: past, present and future (Fraisse, 1967, p. 161; Lejeune, 1984). The process of retention during listening to music ought to be distinguished from one which contains a finite boundary or horizon, whilst at the same time it preserves the symbolic character of this concept with reference to space that is captured by all authors, and also found in Husserl. Retention is in fact seen by this author as “degraded, starting from the immediate past”, and “is compared to an object in space whose dimension appears to diminish gradually as we move away from it” (in C.Deliège, 1989, p. 172).

In the end, we could perhaps suggest the idea of a temporal line capable of symbolising the whole extent of the work in memory: the specific events, labelled by cues which set them up for their storage and recall, being progressively set down along it during listening. No doubt such formulae only have a metaphorical value. Nevertheless, the relationship with space, in music, is very immediate and contemporary, not least for things relating to time. “Even our representation of time, our configuration of time, is in spatial form”, writes Guyau (1890, p. 70) about the relationship between time and memory. And elsewhere we can still read that: “Time (objectively) brings about necessary changes in space that we represent, sometimes by infinite lines, sometimes by fixed lines (periods).” (ibid., 48). Applied to the representation of the flow of the musical work, the idea of a line could well just be a way of actively taking up a common symbol which has an ordinary role in our everyday lives. It could be suggested that the concept is one which underlies—and may even be “unconscious” in authors, when we consider how well incorporated the idea has become—Imberty’s dynamic vectors which take the weight of the structures during listening (1981, Chs. 3 & 4); in Fraisse, when he writes: “The temporal perspectives born out of past and future experiences can only be made the object of a representation if we place the events side by side1 in relation to one another. This is a natural transcription, because temporal order often coincides with spatial order”. (1967, p. 
302); or again in a composer like Xenakis (1956/71) speaking of duration in an examination of the different components of sound: “Time is considered to be a straight line on which it is a question of marking points corresponding to variations of the components.” (p. 10). Understood in its immediate sense, the notion of line is no doubt most directly adaptable to the representation of certain types of music, from certain periods, notably when the melodic element is prevalent. To represent the succession of musical events in
memory necessarily implies the inclusion of many other parameters. Hence temporal localisation, as envisaged in this paper, in the context of a work like Boulez’ Eclat, assumes that the notion will be taken in its broadest sense: it is more a case of situating sonorous regions in proximity to one another, ordering them—as Fraisse said when talking about the concept of the temporal horizon—according to a plan of their succession, and achieving the laying down of temporal perspectives analogous to the spatial perspectives.2

There are few empirical data which pertain to memory and the localisation of musical events in the context of works of a significant length. Moreover, the investigation of the temporal component of music is more often restricted to aspects of rhythmic or metrical order within a piece whose length has been deliberately limited. Recently, Halpern (1988) addressed the problem of the localisation of pitch events within the whole time course of a song, based on an ensemble of well-known songs which subjects had to memorise, but which they did not hear in real time. It appears that subjects’ decision times are longest when the items to be detected are further from one another in time, that is, when the items are situated far apart within the song. The goals of this study were certainly not comparable to those of the present research, but a parallel can justifiably be drawn to the extent that similar mental operations, which Clarke and Krumhansl define in terms of a “re-run strategy” (1990, p. 218), appear to have been employed to execute the tasks in both experiments. In other words, the subjects perform a sweep, a sort of scan of the temporal horizon, which consolidates the material in order to detect the correct region before responding. Contrary to Clarke and Krumhansl, who estimate that such observations contain nothing which can be applied to material of a greater length, it can be imagined that if the cue is the marker—sufficiently clear by nature—which is postulated here, it can mark regions, generating by its repetitions a sensation of the measure of time past and of the space occupied by the figures featuring in the architecture of the work.
The experimental material

Fifteen extracts were selected from the work by a precise division of the recording on magnetic tape. They were defined in such a way as to present complete groups corresponding to the well-formedness and preference rules of Lerdahl and Jackendoff (1983, Ch. 3). These extracts were distributed throughout the greater part of the work’s duration. They include a variety of local cues, but were not selected according to the groupings perceived by the subjects, as these were not known when the materials were set up. The analysis of the data from this experiment revealed that the subjects had divided the work up into five sections. The fifteen extracts are distributed throughout them as follows: Extract 1—the beginning of the work—is situated in the first section, which comes to a close at 1′15″. Extracts 2 to 8 are situated in the second section (between 1′15″ and 4′50″). Extracts 9 to 11 are situated in the third section (between 4′50″ and 7′16″). Extract 12 is situated in the fourth section (between 7′16″ and 8′50″). Extracts 13 and 14 are situated in the fifth section, and extract 15 concludes the piece.
The cue markers characterising the extracts are shown below with reference to the figures on the score, summarizing the sonorous elements likely to have constituted the cues:3

– extract 1: rapid piano figure followed by a long resonance and a chord in the wind instruments, ending in a trill (score nr 1).
– extract 2: characterised by harp sonorities with a regular beat and a strong contrast in tempo compared with the structures of the preceding piano solo (score nr 3).
– extract 3: groups of trilled harmonics followed by a sub-group (score nr 4).
– extract 4: contrast between groups of rapid sounds at the beginning and at the end of the extract which bound isolated sounds in the centre (score nr 6).
– extract 5: brief group followed by resonances (score nr 6).
– extract 6: group of trills ending on trills and appoggiaturas (score nr 7).
– extract 7: two ascending figures with various instruments, interrupted in the middle by some piano sounds (score nr 8).
– extract 8: long resonances of glockenspiel and bells onto which rapid figures of cimbalom, guitar and harp are grafted, followed by vibraphone, celeste and piano (beginning of score nr 9).
– extract 9: isolated dry sounds, repetition of bells and resonances (score nr 14).
– extract 10: piano solo in a low register (score nr 16).
– extract 11: follows on from the previous extract: similar tempo, but addition of new timbres (score nr 17).
– extract 12: trill in an augmented forte chord, ending with a small group on the piano in the bass (end of score nr 24).
– extract 13: return of the wind instruments on a piano chord followed by a long pause, ending in a staccato forte group on several instruments (score nrs 25 & 26).
– extract 14: structures with a fairly metrical character having simultaneous onsets and giving a clear impression of verticality (beginning of score nr 27).
– extract 15: end of the work: isolated groups with orchestral tutti; piano presence, prolonged sonority and long resonance. Some ambient sounds in the room can be detected.

In summary, the different extracts can be regrouped into five categories as follows, according to the dominant cue they possess: A—Extracts 1 & 15, with which 9, 10 & 13 are associated; B—Extracts 2, 9 & 14; C—Extracts 3, 6 & 12; D—Extracts 4 & 7; E—Extracts 5 & 8. As this dominant cue is not exclusive, other more hidden aspects may suggest links between the categories of extracts.

Further considerations

The groupings of groups determined in the task which had preceded the present experiment resulted in a very similar schema in all subjects irrespective of their level of
musical training (cf. I.Deliège, 1989). We might in the first instance expect that memory for this schema, the topic of the following investigation, might result in the same order of similarity. However, such reasoning could prove hasty. In fact the identity of perceived groupings in real time does not necessarily mean the same thing as the identity of mnemonic processing. The formation of the groupings does not necessarily inform us about the perceived “meaning”, in other words about the internal substance which has been processed mentally on hearing the groupings. In order to replicate a comparable similarity in the results, it would have to be postulated that the musicians and the non-musicians had registered the events in the same way, and with equivalent pertinence, which hardly seems likely. Consequently, given the large difference in musical competence between the two samples, we would expect performance by the experts to be more accurate.

Moreover, the perception of the schema of the work in real time provides little information as to whether the musicians and the non-musicians produce their groupings out of an equally rich pool of cues. It is, in fact, perfectly plausible that similar groupings are perceived by non-musicians, but from a smaller collection of cues. A similar theory has already been advanced with regard to the less accurate performance by non-musicians in another experiment which also pertains to the incidence of cue extraction in the memorised schema (I.Deliège, 1989, Expt. 3). The subjects were asked to reclassify short extracts (5 to 20″) as a function of the six sections of Sequenza VI by Berio, but there the coordinates of the schema were explicitly referred to in the instructions, as well as during the hearings which preceded the task. Consequently, in an attempt to bring the question closer to the relative quantity of cues gathered, the extracts to be localised in this new test were expressly chosen to be of a more variable length (7–35 seconds), which ought to allow us to observe whether the length of the extract, i.e. a varying density of information, influences the precision of the localisation positively or negatively. In addition, conjointly with the particular cues that they possess, an overall primacy and/or recency effect, well known in memory data (Murdock, 1962), ought, in the results specific to extracts 1 and 15, to offer supplementary information for the study of memory for cues and for the perceived temporal schema.

Finally, this experiment will also be an opportunity for a new approach to the phenomenon of the “imprint”. We should remember that the imprint is a sort of prototypical figure, resulting from the reiteration of the same type of invariant. It is, in fact, difficult to postulate that the particular characteristics of a group of presented structures could all be memorised. On the contrary, a summary is filtered in memory which retains the principal coordinates (I.Deliège, 1987, 1989). In other words, the imprint relieves the memory of details, but in so doing reduces its efficacy in favour of a more restricted data base. A certain “haziness” in performance can result, which, if the phenomenon is verified, should be seen principally in less precise localisations when the extracts belong to a temporal region perceived as being part of the same section in the mental schema of the work (see above): for extracts coming from a grouping which is defined mentally as being constructed around a certain type of invariant structure, in the same way as was the case in the experiment just referred to, we should observe that the use of similar cues at points which are temporally spaced out in the work leads to confusion in the localisation judgements.
Method

Subjects

This experiment required 32 subjects, 16 of whom were accomplished musicians familiar with the contemporary repertoire, and 16 non-musicians.

Procedure and materials

It should be noted that the subjects carried out this test immediately after the experiment on the perception of groupings, for which they had heard the work three times from beginning to end. They had not been forewarned of the present task, in order to avoid the use of a premeditated strategy for remembering. The response sheets were marked with a horizontal line 22.5 cm long, divided into 15 boxes of 1.5 cm, representing the total length of the work and the possible location points of the extracts, to be numbered from 1 to 15. The subjects were not obliged to fill in all the boxes. The instructions allowed the subjects the opportunity of marking extracts between others already localised on the line, so that responses were not limited by preceding ones. The extracts to be localised were recorded on magnetic tape and played in a different random order for each subject. Each subject was tested individually.
Results and comments

1. Overall examination of the data

a) Mean serial positioning and perceived duration of the work by the subjects

Table 1 shows, in ascending order, the mean positions of the 15 extracts for each category of subjects. It can be seen that the non-musicians have a noticeably more “centred” view of the overall length of the work. On a scale representing the length of the work in time, ranging from 1 to 15, the musicians’ mean judgements spread between 1.7 and 14.7, approximately equivalent to the extent of the work itself, whereas the non-musicians limited this spread to between 6.0 and 12.1. However, if we look at the relationship which exists between the actual position of the extract in the work and the mean position attributed by subjects, we observe a correlation which is statistically highly significant both for the musicians (.91; n=15; p<0.001) and for the non-musicians (.78; n=15; p<0.001).
Table 1 Mean position of the extracts in ascending order of the localisation judgements.

            MUSICIANS                                 NON-MUSICIANS
extract n°   mean localisation judgement   extract n°   mean localisation judgement
    1                 1.7                      1                  6.0
    2                 5.1                      6                  6.3
    6                 6.2                      9                  6.5
    3                 6.3                      4                  7.2
    4                 7.2                     10                  7.5
    9                 7.3                      2                  7.8
    5                 7.7                      5                  7.9
   10                 8.5                      3                  8.1
    7                 9.3                      7                  8.9
    8                 9.5                      8                  8.9
   11                 9.9                     11                  9.3
   12                10.5                     12                  9.4
   13                10.7                     13                 10.7
   14                13.3                     15                 11.3
   15                14.7                     14                 12.1
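To illustrate how the relationship just reported can be computed from Table 1, here is a minimal sketch, assuming actual position is coded simply as extract rank (1–15); the published coefficients may have been computed on positions expressed in playing time rather than rank, so the sketch is indicative rather than a reconstruction of the original analysis.

```python
# Sketch: correlation between actual extract order (rank 1-15) and the mean
# localisation judgements from Table 1.
from statistics import correlation  # Pearson's r (Python 3.10+)

musicians = {1: 1.7, 2: 5.1, 3: 6.3, 4: 7.2, 5: 7.7, 6: 6.2, 7: 9.3, 8: 9.5,
             9: 7.3, 10: 8.5, 11: 9.9, 12: 10.5, 13: 10.7, 14: 13.3, 15: 14.7}
non_musicians = {1: 6.0, 2: 7.8, 3: 8.1, 4: 7.2, 5: 7.9, 6: 6.3, 7: 8.9, 8: 8.9,
                 9: 6.5, 10: 7.5, 11: 9.3, 12: 9.4, 13: 10.7, 14: 12.1, 15: 11.3}

for label, means in (("musicians", musicians), ("non-musicians", non_musicians)):
    actual = sorted(means)                  # extract ranks 1..15
    judged = [means[k] for k in actual]     # mean judged position for each extract
    print(label, round(correlation(actual, judged), 2))
```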
b) Influence of the duration of the extract on the precision of the localisation judgements

The extracts were divided into two groups according to the variable extract duration, based on a mean duration of 19″35 (median=20″). Group 1 (short extracts) consisted of the extracts whose duration was less than the mean duration, and group 2 (long extracts) consisted of extracts whose duration was greater than the mean. From Table 2 we can see that the average deviation between the actual location and the location judgement diminishes for the long extracts. At the same time, this effect is more marked for the musicians, who show a benefit of more than a minute in precision of localisation for the long extracts, whereas this difference is only fifteen seconds for the non-musicians. An analysis of variance on the absolute value of the deviation between the actual location and the average location attributed by the subjects (cf. Table 2), with the duration of the extract as defined by the distribution described above (short vs. long extracts) as a factor, shows that this factor has no statistical significance for the non-musicians (F[1,13]=.46; p=.51). The musicians, however, benefit significantly from the supply of greater temporal information [F(1,13)=5.57; p=0.035].
Table 2 In relation to the mean duration of the extracts (19″35), the table shows the mean displacement between the localisation judgements and the actual position of short extracts (<19″35) and long extracts (>19″35), for musicians and non-musicians respectively.

                 short extracts   long extracts
musicians            2′33″           1′23″
non-musicians        3′11″           2′56″
c) Correct localisations as a function of the perceived sections of the piece

Figure 1 regroups the correct responses with respect to the 5 sections defined by the subjects in the experiment preceding the localisation of extracts. It can be observed that the musicians recognise the initial extract of the work remarkably well (section 1). Their percentage of correct localisations is also higher for the final section, which reflects a primacy and recency effect, as observed in the recall curves for serial position in a list of words.
Figure 1 Boulez—Eclat: % of correct responses by section. Order of the sections x % of correct responses by section.

The non-musicians did not memorise the initial extract of the work quite so well, globally speaking. The overall view shows that their localisation judgements are better for section 2. Hence the existence of a recency and a primacy effect does not seem to appear in such a striking way. It nevertheless appears in some individual subjects when the data for this test are analysed differentially (see point 2 below).

d) Perceived proximity of the extracts as a function of the memorised schema

Figures 2a & b give an idea of the way in which the extracts appear to be organised according to a criterion of proximity relative to the sections defined in the subjects’ mental representation during the hearing of the work.
Figure 2a Boulez, Eclat: Perceived proximity of the extracts by musicians.
Figure 2b Boulez, Eclat: Perceived proximity of extracts by non-musicians.

For this approach, a method of hierarchical partitioning known as “clustering” was used, i.e. the regrouping of extracts in terms of perceived proximity, represented graphically in the form of branching trees. The five sections are shown by 5 equidistant points in Euclidean space in 4 dimensions. Each extract is also represented by a point in this 4-dimensional space. In order to do this, each section is shown as having a weight attributed to it
corresponding to the number of references which have been made to it. The extract is represented by the centre of gravity of the 5 points showing the sections, taking into account their weights. Proceeding in this manner, an extract which had been located by all subjects in the same section would be represented by the same point as the section itself. This representation of the extracts allows us to use the Euclidean distance as a measure of their proximity. At each stage, the 2 extracts (or clusters) which are grouped together are replaced by their centre of gravity.

The musicians’ results arrange the extracts quite well in proximity to one another as a function of the sections determined during hearing. Outside the tree we can see extracts 1 and 12, alone representing sections 1 and 4 respectively and appearing in isolation, and extracts 13, 14 and 15, all coming out of the last section, grouped together. The non-musicians also detach the initial and final sections of the work in their “schema” of proximities, but extract 12 (the only extract from section 4) is mixed up with the extracts from sections 2 & 3, which are grouped together by all subjects in an equivalent manner in the centre of the tree. This effect is the result of the formation of an imprint which creates a sort of equivalence between characteristics encountered within a grouping, and which leads to localisation judgements that are relatively less precise when several extracts occur in the same section of the piece.

e) Curve of localisation judgements in relation to the actual positions of the extracts

In parallel with the propositions already put forward, it may not be superfluous to give a further illustration of how close the performance of the subjects is for the central sections (2 & 3) of the piece. We can in fact note (Figure 3) that the curve of mean localisation of extracts as a function of their actual position in the work takes a distinct line for the musicians at the beginning and end of the work. For the localisations of the central extracts, however, the curves take a similar line for all subjects.
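The partitioning procedure just described can be sketched compactly. The following is a minimal illustration, assuming hypothetical response counts (the actual per-extract counts are not reproduced in this paper): the five sections are coded as mutually equidistant points (here, unit basis vectors, which are pairwise equidistant and span a four-dimensional subspace), each extract is placed at the weighted centre of gravity of those points, and extracts are then merged agglomeratively, each merged pair being replaced by its centre of gravity.

```python
# Sketch of the "centre of gravity" clustering described in the text.
# The response counts below are hypothetical placeholders, not the experimental data.
import numpy as np

sections = np.eye(5)   # 5 mutually equidistant points spanning a 4-D subspace

# Hypothetical counts: how many subjects located each extract in each of the 5 sections.
counts = {
    1:  [14, 2, 0, 0, 0],
    9:  [0, 6, 9, 1, 0],
    12: [0, 3, 5, 7, 1],
    15: [0, 0, 1, 2, 13],
}

# Each extract is the centre of gravity of the section points, weighted by its counts.
points = {e: np.average(sections, axis=0, weights=w) for e, w in counts.items()}
clusters = [([e], p) for e, p in points.items()]

# Agglomerative step: repeatedly merge the two nearest clusters and replace them
# by their (size-weighted) centre of gravity.
while len(clusters) > 1:
    i, j = min(((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
               key=lambda ab: np.linalg.norm(clusters[ab[0]][1] - clusters[ab[1]][1]))
    (ma, pa), (mb, pb) = clusters[i], clusters[j]
    merged = (ma + mb, (len(ma) * pa + len(mb) * pb) / (len(ma) + len(mb)))
    print("merge", ma, "+", mb)
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
```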
Figure 3 Boulez, Eclat: Mean localisation judgement of extracts. Actual location of extracts x perceived location of extracts.

2. Differential examination of the data

a) Mode and mean localisations of the extracts

A relative variability of performance was noted in the data collected during the course of this task. Table 3 presents the extracts in their normal order of occurrence in the piece, i.e. 1 to 15, and shows 1) the principal mode, i.e. the localisation most frequently chosen by all subjects from each category separately (the other numbers in the mode column are “secondary” modes in cases where the localisations have been rather inconsistent); 2) the mean position of the extract; 3) the calculated mean distance of the extract from its actual position. As has already been pointed out, the localisations are more dispersed for the extracts in the middle sections of the piece but, as predicted by the hypotheses, this effect is more pronounced for the non-musicians than for the musicians. The recency effect is clearer for the musicians, but is also evident in the responses of a few non-musicians who can be distinguished by a level of response particularly suited to the question at hand, as will be noted below.
Table 3 This table shows the extracts in the order in which they appear in the piece. For each category of subjects it gives the mode,1 the mean localisation judgement observed and the displacement of this from the actual position.

                     MUSICIANS                          NON-MUSICIANS
Extract    mode         mean    displacement    mode          mean    displacement
  1        1             1.7    +0.7            1              6.0    +5.0
  2        2/5           5.1    +3.1            2/12           7.8    +5.8
  3        6/4 (*)       6.3    +3.3            11/12–5 (*)    8.1    +5.1
  4        5/2–9 (*)     7.2    +3.2            3/7 (*)        7.2    +3.2
  5        5/7           7.7    +2.7            12/5–3         7.9    +2.9
  6        6             6.2    +0.2            2–5 (*)        6.3    +0.3
  7        9 (*)         9.3    +2.3            9/7            8.9    +1.9
  8        7–8–9–10      9.5    +1.5            4–6 (*)        8.9    +0.9
  9        4/9           7.3    −1.7            3 (*)          6.5    −2.5
 10        12/7–3 (*)    8.5    −1.5            10/5–8         7.5    −2.5
 11        11/10         9.9    −1.1            9/11           9.3    −1.7
 12        12           10.5    −1.5            9/12–10        9.4    −2.6
 13        13           10.7    −2.3            13–6          10.7    −2.3
 14        14           13.3    −0.7            15/14         12.1    −1.9
 15        15           14.7    −0.3            15/14         11.3    −3.7

1 The first figure given is the principal mode. The figures after a slash represent a secondary mode, i.e. the most frequent response after the principal mode. The numbers separated by a hyphen are the cases where 2 positions were chosen equally often. (*)=extracts for which the modes do not reflect their actual location.
b) Relationship between the actual position of the extract and the localisation judgements of each subject

A certain degree of variability in the aptitude for remembering the time course of the work is apparent in the level of correlation between the actual order of the extracts and the order shown in the responses of each of the subjects. These levels are between .14 and .99, with a mean of .62, for the musicians, and between −.38 and .79, with a mean of .30, for the non-musicians (see Table 4). Negative correlations only occurred for non-musicians, the subjects who showed them having localised extracts from the end of the work at the beginning and vice versa.
Table 4 Correlations between the actual order of the 15 extracts and the order perceived by each of the subjects.

     musicians             non-musicians
    r        p             r        p
   .99     <.001          .79     <.001
   .94     <.001          .78     <.001
   .90     <.001          .74     <.001
   .89     <.001          .58     <.02
   .79     <.001          .51     <.05
   .74     <.001          .50     <.05
   .72     =.001          .39     ns
   .68     <.01           .37     ns
   .65     <.01           .22     ns
   .53     <.02           .20     ns
   .50     <.05           .17     ns
   .44     ns             .11     ns
   .43     ns            −.04     ns
   .35     ns            −.05     ns
   .16     ns            −.09     ns
   .14     ns            −.38     ns
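As an illustration of how the per-subject coefficients in Table 4 can be obtained, here is a minimal sketch for a single, entirely hypothetical subject; the paper does not state whether product-moment or rank-order correlations were used, so a rank-order (Spearman) coefficient is shown as one plausible choice.

```python
# Sketch: correlation between the actual order of the 15 extracts and one subject's
# localisation judgements. The judgements below are invented for illustration only.
from scipy.stats import spearmanr

actual_order = list(range(1, 16))   # extracts 1..15 in their true order of appearance
judged_position = [1, 4, 7, 6, 8, 5, 9, 10, 7, 8, 11, 10, 12, 14, 15]  # hypothetical boxes

rho, p_value = spearmanr(actual_order, judged_position)
print(f"rho = {rho:.2f}, p = {p_value:.3f}")
```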
Discussion

Speaking about the treatment of temporal information some ten years ago, John Michon (1979, p. 276) regretted that such questions had so rarely been broached until then. “The principles on which the access and storage of temporal information are based are still, for the most part, unknown,” he wrote. “There is reason to suppose that during storage events can in some way be given a time label. In this case these time labels could be used to reconstruct temporal relations; (…) The labelling of time used in this way is one of the most powerful strategies we have for remembering temporal relations.”

Some elements of replies to the problems put forward by Michon emerge from the research discussed in this paper, at least as far as the temporal organisation of listening to music is concerned. The two previous approaches (I.Deliège, 1989, Expts. 1 & 2) allowed us to see that cue indicators are based on a topographic organisation of the work and lead to the formation of a temporal schema which is similar whatever the level of musical training of the listener. The regions defined in this way are subsequently accessed in memory via a marker left by the auditory characteristics of the cues detected (see ibid., Expt. 3). However, the performance of musicians was superior in the latter task. The results of the approach presented here, which tackles analogous problems via a different procedure,
confirm the previous findings. There is a distinction between the organisation of groupings whilst listening on the one hand, and their retention on the other, which is controlled by the relevant cognitive processes in relation to the degree of training acquired.

Effect of the quantity of cue information

Whether an extract was relatively longer or shorter had no effect on the precision of the required localisation judgements in lay subjects. The musicians, on the other hand, benefited noticeably from information of a longer duration, which suggests that not only do they detect more cues but they also remember relatively more detailed content information. The procedure employed in the experiment, which tapped into the “left-overs” which had been stored without further support, thus emphasises that the influence of practice and familiarity with the material has a greater effect on the storage and the number of “markers” acquired than on the formation of groupings in real time whilst listening, which may operate in a similar manner from a smaller basis of cues. It would be interesting to see whether the results of non-musicians would become significantly more like those of the musicians, with regard to the complexity of the memory schema and the cue markers, after several repetitions of the task.

Centred perception of events

Independently of the hypotheses studied concerning the effect of the cues extracted whilst listening to the work in question, a noticeably more restricted time course was noted in the complex schema of the lay subjects. The localisation task required the subjects to engage in a sort of retrospective scanning of what had just been heard. If we take into account a common belief, it might be imagined that a new experience would appear longer than a familiar experience, and hence we would expect a greater impression of length in non-musicians. What, then, is the mechanism which has produced the opposite effect in the present case, i.e. the amplification of the “format” of the mental schema produced by the musicians? William Friedman, in his recent work About time: Inventing the fourth dimension (1990), brings out elements consistent with our hypotheses. Notably, touching on the phenomenon of the distortion of subjective duration, the author notes amongst other things: “An interval seems longer if we remember more of its contents or if it was made up of more distinct segments. It seems shorter if we think of it in a simpler way” (p. 20). Consequently, in the case which presently concerns us, the “less centred” temporal schema observed as a function of the competence of the subjects constructing it could be an indication that it was constructed from a less limited cue extraction procedure during listening. This proposal, which could at first glance be interpreted as being outside the hypotheses proposed hitherto, supports, in what is perhaps a rather unexpected way, the line of argument employed in the previous point of the discussion.
Memory for cue characteristics and permeability of the image

Speaking of memory for temporal events, Fraisse notes: “…not everything from our past experience is transferred into memory. A large part is not fixed. There is a large discrepancy between the immediate richness of a perception and what we can recall a few seconds later. Moreover, the deficit is not homogeneous, and in reality there is no correlation between the richness of the perceptual content and what we transfer into memory (…)” (1967, p. 167). In view of the points discussed above, we can add to this affirmation that training, as far as memory for musical structures is concerned, opens up not only a larger memory, but also a better safeguard of the labelling of stored elements. Experience appears, to a certain extent, to act as a barrier to the imprint phenomenon. Evidence of this is found in the primacy and recency effects observed in musicians and apparently absent in non-musicians. In this way, the timbre of the piano, present at both the beginning and the end of Eclat, and the similar organisation of the initial and final structures of the work produced this imprint phenomenon in the non-musicians, demonstrated by frequent confusions in their localisations, contrary to what is observed for the musicians. In the same way, the negative correlations noted only in the responses of non-musicians could be the result of an analogous imprint process. In fact, it appears that it is once more the strong piano presence which may cause the production of an imprint, frequently leading to the placement of extracts 11 & 12 towards the beginning of the line in non-musicians, something which does not occur in musicians.

In conclusion, it appears that the mechanism for the extraction of cues, which is the basis of the perception of groupings, stems from psychological processes present prior to training. It is for this reason that, contrary to the memory process, it does not lead to such divergent results between subjects and between different types of training. As far as memory is concerned, beyond the effects of musical training which have been noted, a relative inter-subject variability comes to light, such that, on the one hand, the occasional musician has a fairly low level of performance whilst, on the other, some non-musicians show performance levels which approach those of the most precise musicians. Should we conclude that this type of test could reveal specific musical aptitudes? This aspect was not one of the concerns of this work, but it would certainly merit further investigation.

(translated from the French by Diana Stammers)
Notes

1. My emphasis.
2. It may be of some use to emphasize, as regards the representation of the work in memory, that the type of organization that cue extraction brings out is necessarily of a hierarchical nature, in view of the fact that the cue automatically becomes a dominant element in relation to a group in the first instance, and in relation to a grouping of groups when the extraction is confirmed. The hierarchical effect of the cue resides, then, within the constituted grouping: a sequential and/or hierarchical form will characterize the nature of the different groupings of groups in the mental representation as a function of the syntactic organisation of the composition.
3. The extracts are not reproduced here. The interested reader may wish to refer to the score (Universal Edition, UE 14283).
References

Clarke, E. & Krumhansl, C. (1990) Perceiving musical time. Music Perception, 7, 213–252.
Cottle, T.J. (1976) Perceiving time. New York, Wiley.
Deliège, C. (1989) De la forme comme expérience vécue. In S.McAdams et I.Deliège (éds) La Musique et les Sciences cognitives. Bruxelles, Pierre Mardaga, pp. 159–179. Trad. anglaise: On form as actually experienced, Contemporary Music Review, vol. 4, 1989, pp. 101–117.
Deliège, I. (1987) Le parallélisme, support d’une analyse auditive de la musique: Vers un modèle des parcours cognitifs de l’information musicale. Analyse musicale, 6, 73–79.
Deliège, I. (1989) Approche perceptive de formes contemporaines. In S.McAdams et I.Deliège (éds) La Musique et les Sciences cognitives. Bruxelles, Pierre Mardaga, pp. 305–326. Trad. anglaise: A perceptual approach to contemporary musical forms, Contemporary Music Review, vol. 4, 1989, pp. 213–230.
Deliège, I. & El Ahmadi, A. (1989) Mécanisme d’extraction d’indices dans le groupement. Etude de perception sur la Sequenza VI de Luciano Berio. Contrechamps, 10, avril 1989, 85–104.
Deliège, I. & El Ahmadi, A. (1990) Mechanisms of cue extraction in musical groupings: A study of perception on Sequenza VI for viola solo by L.Berio. Psychology of Music, 18, 1, 18–44.
Fraisse, P. (1967) La psychologie du temps. Paris, Presses universitaires de France.
Friedman, W.J. (1990) About time: Inventing the fourth dimension. Cambridge, Mass., MIT Press, A Bradford Book.
Guyau, J.M. (1890) Genèse de l’idée de temps. Paris, Alcan.
Halpern, A.R. (1988) Perceived and imagined tempos of familiar songs. Music Perception, 6, 2, 193–202.
Janet, P. (1928) L’évolution de la mémoire et de la notion de temps. Paris, A.Chahine.
Imberty, M. (1981) Les écritures du temps. Sémantique psychologique de la musique. Tome 2. Paris, Dunod.
Lejeune, H. (1984) Régulations temporelles et estimations des durées. In Le temps, des milliards d’années au milliardième de seconde. Liège, Edition de la Maison de la Science, Université de Liège, pp. 119–133.
Lerdahl, F. & Jackendoff, R. (1983) A generative theory of tonal music. Cambridge, Mass., MIT Press.
Meyer, L.B. (1973) Explaining music. Essays and explorations. Chicago, London, The University of Chicago Press.
Michon, J.A. (1979) Le traitement de l’information temporelle. In Fraisse et al. (éds) Du temps biologique au temps psychologique. Paris, Presses Universitaires de France, pp. 255–287.
Murdock, B.Jr. (1962) The serial position effect in free recall. Journal of Experimental Psychology, 90, 65–74.
Richelle, M. & Lejeune, H. (1989) Le temps de la psychologie. In P.Grotard & D.Thieffry (éds) Le temps. Bruxelles, Cercle de philosophie, U.L.B., pp. 115–130.
Xenakis, I. (1971) Musique, Architecture. Tournai, Casterman, coll. Mutations-Orientations.
Generativity, mimesis and the human body in music performance

Eric F.Clarke
City University, London, UK

Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 207–219
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
Musical performance has been studied from a variety of perspectives, but the most pervasive has been concerned with the relation between structure and expression. Research into this has tended to be either of the “analysis-by-synthesis” variety (e.g. Sundberg, 1988) or performance analysis (Shaffer, Todd and others). This paper examines the relationship between the generative basis of expression, on which both these approaches have been built, and the role played by directly mimetic and acoustical representations and by body movement. Two studies of performers’ attempts to reproduce different types of performances, either real performances or artificial transformations of real performances, are discussed. The results of these studies appear to support the generative/representational approach to performance expression, but it is suggested that a capacity for direct mimesis may play a significant role in determining expression in performance. The results of a further study that sheds light on the influence of body movement on performance expression are then discussed, and it is proposed that a reappraisal is required of the premises upon which many recent studies of performance have been based.

KEY WORDS: Musical performance, generative, mimesis, imitation, echoic, expression, mind, body, dualism.
Introduction

Research on musical performance over the past decade or so has focused on how it is that performers are able to produce a stable and coherent performance from the notation of a piece of music, and in particular how they expressively transform a piece in performance. In addressing this issue, the authors of empirical research have converged virtually unanimously on a single model of the origin and control of expression in music. This is a generative model, in which the expressive aspects of a performance (the deviations in timing, dynamics, articulation, etc. from the instructions of the score) are generated from a structural description of the music in the performer’s mind. There are three primary features of the empirical data collected by different researchers that have influenced the general adoption of this model.

The first is the remarkable stability that has been observed in musical performances: performers clearly have an impressive ability to replicate the expressive profile of a piece in performance, with a degree of variability in the timing properties of a performance of one percent or less, sometimes across performances that are years apart (Shaffer, Clarke & Todd, 1985; Clynes and Walker, 1982). Quite apart from begging the question as to the origin of the expressive profile in the first place, a theory that proposed that a performer constructed some kind of literal record of all the timing data required to specify the entire performance (often many thousands of events) would put completely unrealistic demands on the memory system, and goes against all the evidence of research on the organisation of memory.

The second critical piece of evidence is the exact converse of replicability: performers have the ability to change their interpretation of a piece spontaneously (or occasionally on demand) and to produce a different expressive treatment at very little or no notice, and with no obvious period of experimentation and memorisation. Evidence for this can be found in Clynes & Walker (1982) and in recent work by Shaffer (1990), who discusses data in which performers give two or three different interpretations of the same piece of music in immediate succession, some of which are at the request of the experimenter and are entirely unprepared.

The third piece of critical evidence is the existence of expression in sight-read performances: once again Shaffer has shown that a pattern of expression exists in sight-read performances (Shaffer, 1981) which is replicated in subsequent performances. It is clearly, therefore, not an arbitrary expressive profile, nor is it something that the performer can have learned previously.

The most convincing way to account for these three kinds of evidence is to regard the expressive profile as being generated from the performer’s conception of the structure of the music at the time of its performance. The extraordinarily high level of long-term stability that has been found is thus the consequence of a performer retaining a very stable conception of how the music is organised. Likewise, the ability to produce different expressive versions of the same piece at a moment’s notice is the result of being able to imagine a number of different structural interpretations of the piece, each with its own expressive consequences.
Finally, sight-read performances have a non-arbitrary expressive profile because even when first encountering a piece of music, a performer is obliged to form some kind of structural interpretation of it as s/he goes along, and provided that the music is in a familiar style this interpretation is likely to contain many of the same features as would
be present after a period of study and practice, albeit in a somewhat more rough and ready form. (Because there is still very little empirical work that has investigated what happens to an interpretation in the course of practice, it is not possible to be more precise about the extent or ways in which a first performance and a highly practised performance by the same performer differ).

The main components of such a generative model are summarised in Figure 1. If the music exists as notation, then the first stage is for the notation to be parsed in some fashion so as to yield structural units of various kinds—groups, metrical units, harmonic units and melodic units among others. In the case of music passed on from performer(s) to performer(s) in an oral culture, the input is sound rather than notation, and for a memorised piece of music, or an improvised performance, this first stage is absent, and the whole process begins from the knowledge representation, which is the next point in the figure. This is intended to represent the knowledge that the performer has of the piece in terms both of its specifics (the notes, durations, dynamics, timbres, etc. of the particular piece) as well as more general knowledge of musical style, performance practices, music history, and so on. In the final analysis this general component extends potentially to include all the accumulated knowledge that a performer possesses—both musical and non-musical—since it is possible for an almost limitless range of knowledge to bear on a performance. To take a somewhat extreme but by no means implausible example,
Figure 1 Schematic representation of the principal components of a generative model of expressive musical performance (see text for details).
To take a somewhat extreme but by no means implausible example, it is possible to imagine that a person's knowledge of Hegelian dialectics might influence the performance of a Beethoven sonata, since sonata form structures can themselves be seen as an example of the operation of a dialectical process. (The same might equally be said of a performer whose knowledge of biology led him/her to interpret a through-composed piece according to a metaphor of organic growth).

Returning to Figure 1, this combination of piece-specific and more general knowledge can be regarded as the basis for two parallel paths: one is the construction of a motor program, which contains the abstract specification for all the motor activities required to perform the piece. This clearly draws principally on the piece-specific component of the knowledge representation, though general knowledge of musical structure is involved here too (Shaffer, 1981). The second path from the knowledge representation leads to a set of generative principles which take this structural, stylistic and aesthetic knowledge as their input and give a set of expressive modifiers as their output. These modifiers act on the first of two motor programs, which are conceptually distinct though fused in reality. This first form, labelled 'canonical' in the figure, contains all the information necessary for an accurate rendition of the piece, but not sufficient for an expressive performance. This can be thought of as the motor programming equivalent of the information contained within the score, but structurally parsed, and represented in terms of the requirements for action. The expressive modifiers, which are the output from the generative expressive principles, act on this canonical program so as to give rise to a second version of the program, labelled 'expressive' in the figure, which contains the necessary and sufficient information for an expressive performance of the music. As noted above, these two versions of the motor program are only conceptually distinct: a number of authors, going right back to Seashore (1938/67), have shown that performers are unable to produce performances that are devoid of expression, demonstrating either that a canonical program does not exist as an independent step in the process, or at the very least that it has no access to the later stages of the production system other than via an expressively modified version.

The output from the expressive motor program is still a rather abstract specification; it is convenient to keep real-time out of the specification until late in the whole process, since it then allows the performance to be run off at a variety of absolute tempi without having to restructure the detailed specifications of the piece. It is important to note, however, that absolute tempo does affect the structure of a performance, apparently by modifying the performer's representation of the musical structure: studies have shown (Clarke, 1982; 1985) that performers modify the grouping structure of the music and the number of distinct rhythmic categories at different performance tempi. Figure 1 therefore indicates the influence of tempo on the representation of musical structure. Finally, the timed movement specifications are converted into muscle commands which are sent to the appropriate effector organs.
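To make the flow of Figure 1 concrete, the following toy sketch is entirely hypothetical: the function name, the single "phrase-final lengthening" modifier and the numbers are illustrative assumptions of my own, not part of the model as published. It shows a canonical, tempo-independent specification being transformed by expressive modifiers, with absolute tempo applied only at the final step.

```python
# Hypothetical sketch of the generative pipeline: canonical (metrically
# exact) durations are modified by expressive "modifiers" derived from a
# structural description, and real time enters only at the last stage.

def expressive_durations(canonical_beats, phrase_ends, tempo_bpm,
                         final_lengthening=1.4):
    """Return note durations in seconds for a toy expressive performance.

    canonical_beats   -- notated duration of each note, in beats
    phrase_ends       -- indices of notes that close a phrase or group
    tempo_bpm         -- absolute tempo, applied only at the last step
    final_lengthening -- illustrative modifier: slow down at phrase ends
    """
    # Generative principles: map structure (here, just phrase boundaries)
    # onto an expressive modifier for each note.
    modifiers = [final_lengthening if i in phrase_ends else 1.0
                 for i in range(len(canonical_beats))]

    # Expressive program: canonical values transformed by the modifiers,
    # still expressed in beats and therefore tempo-independent.
    expressive_beats = [b * m for b, m in zip(canonical_beats, modifiers)]

    # Real time enters only now, so the same program can be "run off"
    # at any absolute tempo without restructuring.
    seconds_per_beat = 60.0 / tempo_bpm
    return [b * seconds_per_beat for b in expressive_beats]


# Example: a four-note group with a phrase boundary on the last note, at 90 bpm.
print(expressive_durations([1, 1, 1, 1], phrase_ends={3}, tempo_bpm=90))
```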
A great deal of the cognitive psychology of music has been concerned with attempting to model the more musically specific aspects of our representation of musical knowledge—albeit primarily from the perspective of listening rather than performance (e.g. Longuet-Higgins, 1979; Lerdahl & Jackendoff, 1983; Lerdahl, 1988; West, Howell
and Cross, 1985; West, Cross and Howell, 1987; Krumhansl, 1990). Similarly, a considerable amount of empirical research has been devoted to attempts to model the expressive principles and the way in which they map musical structure into expression (e.g. Shaffer, 1981; Todd, 1985; Clarke, 1988; Sundberg, 1988; Palmer, 1989). This all adds up to an impressive body of research which has done much to advance the status of the cognitive psychology of music, and which has begun to coalesce into a reasonably coherent view of the relationship between music perception and performance. However, some recent empirical work suggests that this picture may be a little too neat, and that a less representational and mentalistic approach may be required. It is to the empirical studies that I will turn first, before considering the wider issue of the drawbacks of an overly mentalistic approach.
Generativity and mimesis in the imitation of expression

One of the predictions of the generative approach to expressive performance is that expression will only be producible and reproducible if it has some structural basis. This was the prediction investigated in a study of performers' ability to imitate performances of short melodies which they heard in a variety of expressive and inexpressive versions (Clarke & Baker-Short, 1987). In the study, pianists heard two short melodies in three versions: deadpan (a perfectly accurate rendering of the melody, with no expressive features); expressive (a pattern of expressive timing which corresponded to structural features of the music); and 'perverse' (a pattern of 'expressive' timing which bore no relation to the structure of the melody). Each pianist heard a randomly selected version of one of the melodies played three times in immediate succession, and then made three attempts at imitating what s/he had heard.

The results showed a number of interesting features. First, the performers' attempts to reproduce the deadpan versions revealed a clear pattern of rubato reflecting the structural organisation of the melodies; phrase or group boundaries were marked by a slight decrease in tempo. This demonstrates the insuppressible nature of expression in performance, rubato and other expressive features being symptomatic of a performer's unconsciously formed mental representation of the musical structure of the melody. As motor skills research has demonstrated (e.g. Shaffer, 1981; Sloboda, 1982), quite abstract representations such as these are essential for the performer to construct the motor program that controls the movements of his/her performance. Second, performers' attempts to reproduce the 'perverse' versions of the melodies showed a greater combination of inaccuracy and instability than did their attempts to imitate expressive or deadpan versions. This is essentially confirmation of the generative prediction that only when expression is meaningfully related to musical structure can it be produced (or in this case reproduced) by a performer. The third main finding, however, appears to go somewhat contrary to the most rigid version of the generative approach: when the mean timing profile for the performers' attempts to imitate the perverse versions was analysed it showed a clear correspondence to the aberrant profile that the performers were trying to reproduce. Since this profile had no meaningful relation to the structure of the melody, we must assume one of two things: either the performers are constructing some kind of perverse mental representation of the music's structure, from which the timing profile is then generated; or they are suspending the
normal output of the generative system and resorting to a more literal approach to the reproduction of the melody, relying on some kind of auditory image, somewhat like an extended echoic trace. There is no independent way to decide between these alternatives, but the first alternative seems a little unlikely since the performers have very little exposure to the melody (three hearings in immediate succession) on which to base the construction of a novel and idiosyncratic representation that would furthermore be inherently unstable.

The results of this first study of imitation seemed to raise a number of interesting issues, but the experiment itself suffered from a number of design problems. The most serious of these was that the expressive version did not actually correspond to a real performance, so that we cannot be sure that the timing profile it embodied was any more generatively spontaneous than that used in the perverse version. (There was actually some evidence from the study that this was the case: the rubato introduced spontaneously by the performers in their attempts to reproduce the deadpan versions bore a strong resemblance to the timing profile of the expressive versions). Furthermore, the total amount of rubato in the expressive and perverse versions, although reasonably similar, was not equalised as precisely as it might have been.

A recent study (Clarke, in preparation) has tackled the same question of performers' abilities to imitate different expressive versions of a melody, whilst trying to overcome the shortcomings of the earlier study. Rather than starting from a deadpan version, the new study begins from real performances, producing dislocations between structure and expression by transforming these real performances in various ways, all of which leave the total amount of rubato unchanged. These transformations (which are made possible by the POCO system—see Honing, this volume) are of two types: one is an inversion of the timing profile such that a proportional deviation of a note from its exact metrical value becomes the reciprocal of that value. (For example, a note that in the original performance is played at 0.8 of the value indicated by the notation is transformed into a note that is played with 1/0.8=1.25 of its notated duration). This produces a timing profile that is the mirror image of the original, but which preserves the total amount of timing deviation. The second kind of transformation can be described as cyclical translation, where the timing profile is shifted along the melodic sequence such that the proportional timing deviation of a note x becomes the timing value of note x+n, where n is the amount of translation or shift. In the present study two values of n were used, corresponding to a shift by one beat, and a shift of a measure plus a beat. The cyclical nature of the translation means that the timing values for the last notes in the melody are shifted round to become the values for the first notes. Once again, this transformation preserves the total amount (and in this case even the pattern) of rubato, but redistributes it in relation to the melody.

The procedure of the experiment was much the same as in the previous study (except that the performers heard each stimulus melody between each of three imitation attempts), and made use of all four versions (original, inversion and two translations) of four melodies which ranged from 10 to 36 notes in length. The deadpan condition, having been demonstrated to be 'unreproducible', was omitted from this study.
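The two transformations are simple enough to state in a few lines of code. The sketch below is my own illustration and not the POCO implementation referred to above; it assumes a timing profile given as proportional deviations, so that a value of 0.8 means a note was played at 0.8 of its notated duration.

```python
# Illustrative sketch of the two transformations described in the text,
# applied to a timing profile of proportional deviations from the score.

def invert_profile(profile):
    """Inversion: each proportional deviation becomes its reciprocal
    (0.8 -> 1.25), mirroring the profile while preserving the total
    amount of timing deviation."""
    return [1.0 / p for p in profile]

def cyclic_translation(profile, shift):
    """Cyclical translation: the deviation of note x becomes the value
    of note x + shift, with wrap-around so that the last notes' values
    become those of the first notes."""
    n = len(profile)
    return [profile[(i - shift) % n] for i in range(n)]

# Example with a toy eight-note profile, shifted by one position.
original = [1.0, 0.95, 1.05, 0.8, 1.0, 0.9, 1.1, 1.25]
print(invert_profile(original))
print(cyclic_translation(original, shift=1))
```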
The results of the study seem to bear out the general pattern of behaviour illustrated in the earlier study, but the more systematic experimental design and construction of the materials have allowed a more thorough analysis of the data. The basic result remains that the original versions are reproduced with greater accuracy and less variability than the transformed versions, and
that the inversion in particular, which is the most radical transformation, is generally reproduced with a striking decrease in accuracy and a correspondingly elevated variability. Of the two translations, the one-beat shift seems less disruptive than the measure-plus-beat shift, suggesting that rubato is organised at a level higher than the level of beats within the bar (as has been shown in other studies, e.g. Todd, 1985). Once again, however, there is a significant ability to reproduce the timing profile of the transformations of the melodies, though there is an interaction here between transformations and melody. In particular, the inversion of the shortest melody is reproduced with an accuracy that is close to that of reproductions of the original. This seems to support the idea that imitation of the transformed versions is accomplished in an echoic or mimetic manner rather than by generation from a transformed structure: a directly mimetic strategy is likely to be inversely affected by the length of the melody because of the well-documented temporal and informational limits of immediate memory.

Closer analysis of performers' attempts to imitate the inversions shows that the 'success' of their attempts to imitate the inversion of the shortest melody is due to their not being drawn towards the pattern of timing that characterised the original. By contrast, the failure to shake off the timing pattern of the original in the longer melodies is shown in a multiple regression analysis where the timing pattern of the original actually accounts for a greater proportion of the variance in the imitation attempts than does the inverted timing pattern that the performers are trying to reproduce. The result, therefore, is a compromise profile that bears a closer similarity to the original performance than to the version being imitated. It would be a mistake, however, to imagine that this compromise approach always results in a profile which lies somewhere between the inversion and the original: if this were literally the case the performance would be perfectly metrical, since the original and its inversion have a precisely reciprocal relation to one another.
In at least some cases it is apparent that a performer embarks on a profile that accurately imitates the inversion, but is then drawn back to the original timing profile some way into the melody. Figure 2 illustrates a striking example of this in one performer's attempt to imitate the inversion of the longest melody. The melody is in two identical phrases, and it is clear in the figure that while the player keeps very close to the inverted timing profile through the first phrase, he crosses over at the mid-point and imitates the original timing profile through the second phrase. Without further evidence it is hard to be certain about the explanation for this, but the performance pattern fits the mimetic argument proposed above: the performer manages to hold on to an echoic trace of the version to be imitated for long enough for the first phrase to be completed, but by the time the second phrase starts the memory trace has disappeared, leaving the standard expressive profile as a default.

Figure 2 The timing profile of an attempt to imitate an inverted version of one of the four melodies used in the imitation experiment (see text). Note the 'cross over' halfway through the performance, where the performer, having followed the inversion target quite closely, reverts to the timing profile of the original performance target.

One final feature of the data throws further light on the performers' strategies in attempting to imitate the transformed versions of the melodies. If the data are analysed for learning effects across the three attempts at each version of each tune, it emerges that only the more moderate transformations (the two translations) of the two longer melodies (32 and 36 notes long) show any significant improvement over the three attempts. Again, further work is needed to establish the reliability of this pattern of results and to provide a more secure basis for interpreting it, but a plausible explanation might hinge on the distinction between an analytic and a wholistic approach to the task. For the shorter melodies (10 and 15 notes long), it is possible for the performers to hold the melody in immediate memory in a wholistic fashion for the short amount of time from first hearing the melody to finishing their imitation attempt (about 5 to 10 seconds). This has a kind of 'all-or-nothing' effect on their imitation attempts: either the whole thing is grasped reasonably accurately and then imitated quite successfully, or else very little is grasped and the imitation is poor. There is little scope for improving over the three trials because the basis of the imitation attempt is not a structural representation from which the imitation is regenerated, but is more like a perceptual 'snapshot' which is simply reactivated. For the longer melodies, however, this kind of immediate memory strategy is simply impossible, since the melody lasts too long and contains too much information to be held in an extended echoic manner. The only option is to form some kind of more abstract representation. The results show that the original and inverted versions do not improve over successive attempts—but possibly for quite opposite reasons: the
well-structured characteristics of the original mean that it is readily understood and represented by the performers, leaving little opportunity for improvement, whereas the inversion is so peculiar that it very largely defies being represented at all. The two translations, being less radical as disruptions of the relation between structure and expression, give the performers enough that can be grasped to lay down the basic outlines of a representation on the first attempt, but which is improved upon in the subsequent repetitions. The two translations improve because they lie between the "all" of the original version, the conventionality of which makes it easy to represent on first hearing, and the "nothing" of the inversion, the bizarreness of which makes it almost impossible to represent in the course of three hearings.

To summarise the findings of the two studies, there seems to be considerable support for the generative/representational approach to performance expression both in the patterns of accuracy and stability that the imitation task produces, and also (though rather more indirectly) in the way in which different versions of different melodies do or do not improve over successive attempts. However, both studies also quite clearly illustrate a capacity to imitate that seems not to be consistent with this generative/representational view, and which appears to be far more directly mimetic in character.

In the literate classical music culture of the West, this kind of direct mimesis may seem to be something of an anachronism, given our predisposition to regard expression as something that arises out of understanding, but for oral cultures the ability to imitate before a full understanding is reached, and to reach a full understanding through imitating and repeating what at first seems strange and unfamiliar, is essential if the culture is not to stagnate. The same, of course, is true of the Suzuki method of instrumental tuition, and is the source of much of the controversy that accompanies the method: a young pupil is expected to imitate the performance of a teacher, or of a reference recording, before s/he can have any real understanding of the structure, style and performance practices of the music concerned. The idea is that the child's understanding of the music follows this initially mimetic approach, a more abstract and generative/representational conception being aided by the practical experience of the music that the initially imitative approach brings. Whether this is so, or whether basing the development of a child's performance on a mimetic origin results in unimaginative and derivative performers, is still an open question.
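The multiple regression analysis mentioned above can be sketched briefly. The following is a hypothetical illustration only: the variable names, the use of ordinary least squares and the R-squared summary are my assumptions rather than a description of the published analysis. An imitated timing profile is regressed simultaneously on the original and the inverted target profiles, and the relative size of the two coefficients indicates which target dominates the imitation.

```python
import numpy as np

def profile_regression(imitation, original, inverted):
    """Regress an imitated timing profile on the original and the
    inverted target profiles entered together; return the two
    regression coefficients and the overall R^2 of the fit."""
    X = np.column_stack([original, inverted, np.ones(len(imitation))])
    coefs, *_ = np.linalg.lstsq(X, imitation, rcond=None)
    fitted = X @ coefs
    ss_res = float(np.sum((imitation - fitted) ** 2))
    ss_tot = float(np.sum((imitation - np.mean(imitation)) ** 2))
    return coefs[:2], 1.0 - ss_res / ss_tot

# Toy example: an imitation that is drawn back towards the original target.
original = np.array([1.0, 0.95, 1.05, 0.8, 1.0, 0.9, 1.1, 1.25])
inverted = 1.0 / original
imitation = 0.7 * original + 0.3 * inverted
(coef_original, coef_inverted), r2 = profile_regression(imitation, original, inverted)
print(coef_original, coef_inverted, r2)
```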
Mind and body in music performance

The distinction between a view of performance that emphasises an abstract mental structure as its basis, and one that sees it in a more mimetic light, also relates to a more fundamental opposition between mind and body in music. From informal observation of the performers in the imitation experiment discussed above, it was evident that an important aspect of their approach to the task was the influence of a kinaesthetic factor: particularly for the versions that were more difficult to grasp, many performers seemed to be trying to find a pattern of movements in their hands, arms and upper torso that captured and conveyed the intention behind the peculiar timing pattern. This was also evident during the original recording of the melodies, when the performer was asked to play each melody in three different ways: in a normally expressive manner; without any
expression at all; and with exaggerated expression. For all four melodies he remained almost completely still with his torso rigidly upright throughout the expressionless version, was far more flexible and mobile in the normally expressive version, and moved in a striking and exaggerated way for the exaggeratedly expressive version.

This informal impression of the relationship between the degree of body movement and the expressivity of the performance has been investigated more thoroughly by Jane Davidson, a postgraduate student at City University, who has demonstrated that video recordings of performers playing pieces of music with the instruction to play in one of the three manners mentioned above (normal, expressionless, and exaggeratedly expressive) are reliably identified by independent observers as conforming to each of the three manners. Even more strikingly, when observers are shown videos that show only points of light at each of the major joints of the performer's body (in the manner of Johansson, 1973), they are still extremely reliable in identifying which of the three performance manners they are watching (Davidson, 1991).

It is, however, not just the quantity of movement that is of interest in Davidson's study. She also showed that the quality of performers' movements seems to relate to aspects of the musical structure, and that a performer has a relatively stable repertoire of movements that appear in equivalent musical contexts across a range of musical styles. Performers are partially aware of these movements, and may even have some awareness that certain characteristic movements are linked to recurrent musical contexts, but a good deal of this moving seems to be unconscious. When Davidson asked one of her pianists to indicate what movements he thought he might make in playing six different pieces, and how they related to the organisation of the music, his indications showed only a partial overlap with the movements observed in video recordings of previous performances of the pieces.

We may ask what the function of these movements is, since they are all technically speaking "unnecessary" to piano performance: if one adopts a naive "time-and-motion study" approach to piano playing, the most efficient strategy would be to keep the body as still as possible, to keep the hands close to the keyboard and the wrists flat, and to keep the hands and forearms parallel to the horizontal plane of the keyboard and as close to it as is feasible when moving up and down the instrument. This is emphatically not what good keyboard players do. The fairly large curvilinear movements of their arms and hands up from the keyboard and then back down onto it as they move around the instrument, as well as swaying motions of the whole body and depression and raising of the wrists, are all commonly observed features of skilled piano performance that appear to be an integral part of expressive production. As yet we have little empirical evidence as to the precise function and motivation behind these movements, but it seems clear that they play a mixture of technical and expressive roles. At a technical level, they may be important in controlling the timing and fluency of the performance: rather than timing everything in an abstract, central, clock-like manner, it seems likely (as suggested in Shaffer, 1982) that timing is a distributed property of the whole body, arising out of the natural periodicities of the body's different pendular components.
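For reference, the relations invoked here are the textbook expressions for the natural period of an idealised mass-spring or pendulum component; they are a standard reminder rather than equations taken from the studies cited, with m, k, L and g denoting effective mass, stiffness, pendulum length and gravitational acceleration.

```latex
T_{\text{mass-spring}} = 2\pi\sqrt{\frac{m}{k}},
\qquad
T_{\text{pendulum}} = 2\pi\sqrt{\frac{L}{g}}
```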
This view of the body as a coupled system of masses and springs, that can be modelled with the equations of pendular motion, has been developed by Turvey and his co-workers (e.g. Tuller, Turvey and Fitch, 1982), and has been used by Winold, Thelen and Ulrich as a framework to describe the bowing movements of cellists playing pieces of music at different speeds
(Winold, Thelen, and Ulrich, 1990). In the case of piano playing we can imagine that the swaying movements of the whole body may be an effective way of regulating the fluctuations in tempo of the piece, that the large curvilinear departures of the hand and arm away from, and back to, the keyboard may be a way of controlling the precise intensity and onset time of important notes, and that the depression and raising of the wrists may serve a similar function. In short, the apparent choreographical extravagance of expressive performance may actually be quite a practical way to control the finer details of performance by exploiting the natural tendencies and resistances of the body as a physical system. There is also evidence, however, that these movements have a more deliberately expressive function. Davidson showed that a performer may have a relatively fixed and stereotyped vocabulary of movements that s/he makes use of in performing music in a range of styles, and on different occasions. In keyboard music ranging from Bach to Schoenberg, Davidson showed that a single professional performer used a repertoire of about six identifiably different movements to project the music, and that these movements seemed to have reasonably specific functions in relation to the structure of the music being played: one was associated with arrival at a particularly important point in the music, another with the release of tension in the music, a third with the completion of a phrase, and so on. While there may still be a purely productional element behind these movements, their appearance across a range of musical styles and in quite different local musical contexts, though in the same generic structural context, suggests strongly that they have a significant expressive function. Indeed, as mentioned above, the performer in Davidson’s study was able subsequently to go through the music he had played and indicate the movements that he would expect to make in performing the piece, and although these imagined movements turned out to have only a loose relation to the movements he actually made in the original recording, it is significant that he was aware that he made certain stereotyped movements that bore some systematic relation to what was going on in the music. Whether these movements are intended to have a communicative function for the audience, or whether they are a response to the music on the part of the performer, and a way of his becoming fully involved with the music, is difficult to assess. It seems likely that they start off with the latter function and may develop so as to acquire an explicitly communicative function for a more extrovert performer. Delalande (1990) has observed that video recordings of Glenn Gould from early and late in his career show a marked difference in the degree to which his movements are stereotyped. Gould is notorious for moving a great deal when he plays, but what Delalande observes is that the earlier recording shows movements that are far more fluent and unpredictable, whereas in the later recording the movements have become more fixed into an identifiable repertoire of movements that recur in a more predictable fashion. Interestingly, Gould had by this time given up public performing, so we must assume either that the change in the character of his movements was for the benefit of the camera, or that it was not motivated by communicative concerns at all. 
The importance of human movement in performance is not limited to its role in our understanding of the control of expression and the communication of expressive meaning. Baily (1985) has also shown how a consideration of the movements involved in music on a particular instrument can help to account for the way in which a musical style develops. Baily showed that the characteristic style features of the music of a stringed
instrument from Afghanistan (the Dutar) could be parsimoniously explained in terms of the movements necessary to produce the music, and that this Dutar music was actually a derivative of the music for another Afghan stringed instrument, the Rubab. The style differences between the two kinds of music could be accounted for simply by looking at what adjustments were necessary, from a purely physical point of view, in transferring Rubab music onto the Dutar. In a similar way, the highly original orchestration for which Berlioz's music is well-known has been attributed by a number of commentators to the fact that he was a guitarist, and to the particular perspective on the instrumental character of music that this gave him (Rushton, 1983; Cook, 1990).

In essence, these examples argue for a greater recognition of the role of interactions between the human body and the physical possibilities and constraints of musical instruments in determining musical structure in even the most apparently cerebral and literate music. A striking case is the solo string music by J.S. Bach: the basic musical intention to produce polyphonic music for instruments which are so constructed as to virtually exclude anything but fairly simple two-part polyphony resulted in a structural solution of the greatest originality. The pseudo-polyphony that arises out of the melodic figurations of Bach's music (possibly derived from the style brisé of lute and vihuela music of the sixteenth century), particularly in the faster movements of the sonatas, partitas and suites (e.g. the Gigue of the second violin partita), is a structural device which allows adherence to the original musical intention within the constraints and possibilities of the chosen instruments. According to this perspective we can see musical structure as the dialectical synthesis of the compositional intention of a composer and the instrumental possibilities of the forces for which s/he is writing.

There has in the past been a tendency in thinking about musical structure and musical performance to adopt too cerebral and abstract an approach. The purpose of this paper has been to redress the balance by showing that a number of phenomena in musical performance, and musical structure, are better understood when the embodied nature of musical thinking is recognised. Even within the rather abstract generative approach to musical performance, a number of authors have made use of the characteristics of physical movement, and of basic physical laws, to model the pattern of rubato found in expressive performance (Kronman & Sundberg, 1987; Todd, in press). It is as if the human body acts as a channel or filter between the laws of physics and the abstract and culturally specific principles of composition and performance. It is important, of course, not to go too far in the direction of seeking explanations for musical phenomena in terms of physics or kinematics; the history of music is already littered with the remains of theories that pursued too simple and literal a relation between music and physics, or music and movement. But it is also true that the limits of a purely formal approach to psychological issues in music, as is adopted by cognitive science in its purest form, have become clearer in recent times. However much we may want to model human intelligence in terms of abstract inference engines, the inescapable fact of the human body, and of its profound influence on human thinking, cannot be overestimated.
Summary

Starting from a generative account of expression in musical performance, this paper has exposed certain limitations in the rather abstract view of musical performance that the approach exemplifies. The limitations are revealed in studies that show that a performer is able to reproduce performances in which the relationship between structure and expression has been disrupted—albeit more inaccurately and variably than when a more appropriate relationship exists. The result suggests that performers must be able to adopt some kind of directly mimetic strategy and that this may be a more significant aspect of performance, particularly in non-literate musical cultures, than we are accustomed to recognise. The mimetic component itself seems to be closely related to the bodily involvement of the performer, apparently redundant bodily movements being both a way of achieving precise control of the acoustical aspects of expression in performance and a dimension of musical expression in their own right. Ultimately this is an argument for the abandonment of the old Cartesian mind/body dualism in favour of a more integrated view of human mental and physical processes which recognises the fundamental importance of the embodiment of human thought in human action.

Acknowledgements

The research in this paper was supported by grant no. A413254004 from the Economic and Social Research Council. I am grateful to Peter Desain and Henkjan Honing for their comments on an earlier version, for providing the POCO environment that made the experimental work possible, and for their help and ideas in running the experiment.
References

Baily, J. (1985) Music structure and human movement. In P. Howell, I. Cross and R. West (Eds.), Musical Structure and Cognition, pp. 237–259. London: Academic Press.
Clarke, E.F. (1982) Timing in the performance of Erik Satie's "Vexations". Acta Psychologica, 50, 1–19.
Clarke, E.F. (1985) Structure and expression in rhythmic performance. In P. Howell, I. Cross and R. West (Eds.), Musical Structure and Cognition, pp. 209–237. London: Academic Press.
Clarke, E.F. (1988) Generative principles in music performance. In J.A. Sloboda (Ed.), Generative Processes in Music, pp. 1–27. Oxford: The Clarendon Press.
Clarke, E.F. (in preparation) Imitating real and artificial rubato.
Clarke, E.F. & Baker-Short, C. (1987) The imitation of perceived rubato: a preliminary study. Psychology of Music, 15, 58–75.
Clynes, M. & Walker, J. (1982) Neurobiologic functions of rhythm, time and pulse in music. In M. Clynes (Ed.), Music, Mind and Brain: the Neuropsychology of Music, pp. 171–217. New York: Plenum.
Cook, N. (1990) Music, Imagination and Culture. Oxford: Oxford University Press.
Davidson, J. (1991) The perception of expressive movement in music performance. Unpublished Ph.D. thesis, City University, London.
Delalande, F. (1990) The musical gesture, from the sensori-motor to the symbolic. Paper presented at the Second International Colloquium on the Psychology of Music, Ravello, Italy.
Johansson, G. (1973) Visual perception of biological motion and a model for its analysis. Perception and Psychophysics, 14, 201–211.
Kronman, U. and Sundberg, J. (1987) Is the musical ritard an allusion to physical motion? In J.A. Sloboda (Ed.), Action and Perception in Rhythm and Music, pp. 57–69. Stockholm: Publications issued by the Royal Swedish Academy of Music, no. 55.
Krumhansl, C.L. (1990) Cognitive Foundations of Musical Pitch. New York: Oxford University Press.
Lerdahl, F. and Jackendoff, R. (1983) A Generative Theory of Tonal Music. Cambridge, Mass.: MIT Press.
Lerdahl, F. (1988) Tonal pitch space. Music Perception, 5, 315–350.
Longuet-Higgins, H.C. (1979) The perception of music. Proceedings of the Royal Society of London, B 205, 307–322.
Palmer, C. (1989) Mapping musical thought to musical performance. Journal of Experimental Psychology: Human Perception and Performance, 15, 331–346.
Rushton, J. (1983) The Musical Language of Hector Berlioz. Cambridge: Cambridge University Press.
Seashore, C. (1938/1967) Psychology of Music. McGraw-Hill. Republished by Dover Books: New York, 1967.
Shaffer, L.H. (1981) Performances of Chopin, Bach and Bartok: studies in motor programming. Cognitive Psychology, 13, 326–376.
Shaffer, L.H. (1982) Rhythm and timing in skill. Psychological Review, 83, 109–122.
Shaffer, L.H. (1990) Studies of expression in musical performance. Paper presented at the Second International Colloquium on the Psychology of Music, Ravello, Italy.
Shaffer, L.H., Clarke, E.F. & Todd, N.P. (1985) Metre and rhythm in piano playing. Cognition, 20, 61–77.
Sloboda, J.A. (1982) Music Performance. In D. Deutsch (Ed.), The Psychology of Music. New York: Academic Press.
Sundberg, J. (1988) Computer synthesis of music performance. In J.A. Sloboda (Ed.), Generative Processes in Music, pp. 52–70. Oxford: The Clarendon Press.
Todd, N.P. (1985) A model of expressive timing in tonal music. Music Perception, 3, 33–58.
Todd, N.P. (in press) The kinematics of musical expression. Journal of the Acoustical Society of America.
Tuller, B., Turvey, M.T. and Fitch, H.L. (1982) The Bernstein perspective: II. The concept of muscle linkage or coordinative structure. In J.A. Scott-Kelso (Ed.), Human Motor Behaviour: An Introduction, pp. 253–271. Hillsdale, N.J.: Lawrence Erlbaum.
West, R., Cross, I. and Howell, P. (1987) Modelling music cognition as input-output and as process. Psychology of Music, 15, 7–30.
West, R., Howell, P. and Cross, I. (1985) Modelling perceived musical structure. In P. Howell, I. Cross and R. West (Eds.), Musical Structure and Cognition, pp. 21–53. London: Academic Press.
Winold, H., Thelen, E. and Ulrich, V.W. (1990) Coordination and control in the bow arm movements of highly skilled cellists. Unpublished manuscript, Indiana University.
Representations of musical structure

Issues on the representation of time and structure in music

Henkjan Honing
Centre for Knowledge Technology, Utrecht, The Netherlands and Music Department, City University, London, UK

Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 221–238
This article discusses the issues in the design of a representational system for music. Following decisions as to the primitives of such a system, their time structure and general structuring are discussed. Most of the issues are presented as controversies, using extremes to clarify the underlying problems. Associating time intervals and their constraints with the components of musical structure turns out to be essential. These constraints on time intervals model an important characteristic of musical knowledge and should be part of the representation, i.e. part of the syntax. It is concluded that a representation of music should, in the short run, be made as declarative, explicit and formal as possible, while actively awaiting representation languages that can deal with the presented issues in a more flexible way.

KEY WORDS: Representational systems, music representation, knowledge representation, temporal representations, structure.
Introduction

This article describes a number of important issues in the representation of music with respect to the structuring of musical information. The set of issues presented is in no way complete, but indicates the most influential decisions that have to be taken in the representation of structure. The identification of the problems is central and there will be no speculation on possible solutions. The discussion will be restricted to the descriptive issues of music representation, concentrating on its primitives and their structuring. Of course, a purely technical description of a representation of music is not sufficient; its cognitive aspects should be incorporated as well. Although a discussion of the modeling of the "musical mind" is not the aim here, a cognitive viewpoint will add an essential perspective in the identification of the issues in the design of a general representation of music. Since a representation of the real world (represented world) has to do with cognition, the image (representing world) will have most of cognition's characteristics.

In the cognitive sciences, and in particular subfields like computational psychology and artificial intelligence, the use of computational models (or representational systems) is central. Their merits, together with the proposal of the term "cognitive science", were described by Christopher Longuet-Higgins as:

[…] it sets new standards of precision and detail in the formulation of models of cognitive processes, these models being open to direct and immediate test. (Longuet-Higgins, 1973)

The hope is that these formulations will contribute to a new theoretical psychology. Apart from the discussion of whether a computational psychology is possible at all, a computational theory sets an important foundation: by describing a theory in terms of a formal system, together with its interpretation, it can be used to define what is faulty or inadequate (i.e. it can be falsified) and might help us in defining what kind of theoretical power we actually need. Or, as Margaret Boden states:

It provides a standard of rigour and completeness to which theoretical explanations should aspire (which is not to say that a program in itself is a theory). (Boden, 1990, p. 108)

Representation is an essential part of such a formal system and decisions made in its design will undoubtedly influence the behavior of the computational model, embodying the theory. It is these decisions, to be made with regard to a representational system of music, that this article is aiming at.
Different perspectives

A number of different areas of research have a direct interest in specifying an appropriate representation of music. The latter either forms the basis of their studies or is a subject of
study in itself. In the following short overview the different viewpoints and their specific demands will be described. The main difference is contained in the distinction between representations of a technical nature and representations of a cognitive nature (conceptual or mental representations).

Music analysis and production

Musicology

Notation has always played a central role in musicological research. The design and adaptation of notations or representations have been developed along with the specific theories of analysis. Different overlapping or contradicting theories have been proposed (Schenker, 1956; Meyer, 1973; Narmour, 1977; Lerdahl & Jackendoff, 1983). Most theories agree that there is more in music than what is written in the score. In this sense, the opinion of the philosopher Nelson Goodman (1968) that a piece can be characterized as the set of performances in conformance with its score is an exception. The question here is whether a piece of music resides in the notation, in the air, or in people's minds, or in other words, whether music is cognitive or not.

Computer music

In the field of computer music there is an interest in the design of appropriate data structures for music systems that form the basis of, for example, composition tools, interactive systems, and notation systems. Several projects have proposed different kinds of representation, suited to the specific demands of the particular problem or even to the software or hardware used (see Loy, 1988 for an elaborate survey of computer music systems). A distinction can be made between representations designed for real-time systems that are process-oriented (e.g. Puckette, 1988), and non-real-time systems that have a static global view of the music (e.g. Dannenberg, 1989). They differ, respectively, in their tacit and explicit representation of time (see below: The representation of time). All systems have their own way of representing music and share little common ground. The only widespread standard is the industry proposed MIDI standard: a communication protocol (described in Loy, 1985) and file format. It is a very low-level stream-like and structureless representation (criticized in Moore, 1988) designed for communication between electronic instruments and computers. Within the computer music community several initiatives (Dannenberg et al., 1989; ANSI, 1989) have been taken towards a more general and high-level representational standard.

Music publishing and retrieval systems

In music archiving the need for the standardization of notated music has resulted in several proposals for the storage and printing of music (Erickson, 1975; Byrd, 1984; Gourlay, 1986). Most of them are based on a visual description (e.g. notes positioned on staves) and are not very general in their applicability. The ANSI standardization committee for music representation (ANSI, 1989) is a recent attempt to make a technical and methodological specification for a standard music description language, useful in areas such as music publishing, music databases, computer assisted instruction, music analysis, and music production. In general, these standards seem to concentrate more on
pragmatics (e.g. efficiency, in terms of size and speed requirements) than on generality and consistency.

AI and cognitive modeling

Another large area of research is artificial intelligence (AI) and the cognitive sciences. Both have their own specific goals and demands. I will describe them here briefly.

AI and knowledge representation

In AI the concern is to notate descriptions of the world in such a way that an intelligent machine can come to conclusions about its environment by formally manipulating these descriptions. In knowledge representation, a subfield of AI, research is focussed on the development of representation languages and the design of inference schemes (e.g. to model reasoning about knowledge). Both are based in the tradition of (predicate) logic while more recent languages can be classified as structured object representations (e.g. frames; Minsky, 1975), associational representations (e.g. semantic networks; Quillian, 1968), and procedural representations and production systems (Newell, 1973). It is important to note that AI and knowledge representation are about feasible ways to build intelligent systems and not so much about modeling cognitive behavior. AI and music is also an important field of research where representation is becoming one of the central issues (Balaban et al., 1991).

Cognitive and computational psychology

In the cognitive sciences, mental and knowledge representations are important subjects of study. It seems impossible to imagine a cognitive system in which a representation does not play a central role (Anderson, 1983; Fodor, 1983; Johnson-Laird, 1983). There is, however, no general agreement on the assumption that mental activity is mediated by internal or mental representations, and when there is, there is still some discord on the precise nature of these representations. Proposals for knowledge representation can be grouped into three categories: propositional representations (discrete symbols or propositions), analogical representations (use of images), and procedural representations (i.e. modeled as processes or procedures). To this last category also belong distributed representations (e.g. connectionist networks).

Music perception and cognition

In the psychology of music, alongside research in music production and comprehension, the majority of work has consisted of describing the nature of musical knowledge and its representation. Elaborate studies have been done in the domains of pitch (Krumhansl, 1979; Shepard, 1982), rhythm (Povel & Essens, 1981; Longuet-Higgins & Lee, 1984; Desain & Honing, 1989) and timbre (Grey, 1977; Wessel, 1979). But here also, there is no general agreement on the precise nature of these representations (see McAdams, 1987 for a more complete overview, or Sloboda, 1985; Dowling & Harwood, 1986).
General approaches to representation

This section will outline the main approaches to representation. Identifying the problems of representation in general will be shown to be of direct benefit to the debate concerning music representation.

Knowledge representation hypothesis

An important assumption in a formalist approach to representation is the knowledge representation hypothesis. It is summarized by Brian C. Smith (1982) as follows:

Any mechanically embodied intelligent process will be comprised of structural ingredients that a) we as external observers naturally take to represent a propositional account of the knowledge that the overall process exhibits, and b) independent of such external semantical attribution play a formal but causal and essential role in engendering the behavior that manifests that knowledge.

Such a "mechanically embodied intelligent process" is presumed to be an internal process that manipulates a set of representational structures, in such a way that the intelligent behavior of the whole results from the interaction of parts. It is presumed only to react to the form or shape of these representations, without regard to what they mean or represent. As an illustrative example one can use a technique that is sometimes used in making enlarged copies of pictures, for instance, by artists who make large chalk drawings of well-known paintings on the street. They copy these paintings from a small reproduction, holding it upside-down. This minimizes the distorting influence a perspective has on the copying of the actual proportions: an unwanted interpretation that imposes 'meaning' not present in the picture. This example shows that one has to watch out for interpretive knowledge, so easily added by human observers, not present in the representation itself. A representation is only syntax and should have all knowledge embodied in this syntax, independent of the interpretive system.

A representational system can be defined as "a formal system for making explicit certain entities or types of information, together with a specification of how the system does this" (Marr, 1982). In the formalist definition entities in a formal system might have complex mechanisms.1 In deciding on any particular representational system and its entities, there is a trade-off; certain information will become explicit at the expense of other information being pushed into the background, making it possibly hard to recover.

Procedural and declarative approaches

There is a classic distinction between declarative and procedural ways of representing knowledge: declarative being the knowledge about something, while procedural knowledge states the knowledge in terms of how to do something. Declarative knowledge tends to be accessible: it can easily be examined and combined. Procedural knowledge
tends to be inaccessible, guiding a series of actions but allowing little examination. We seem to have conscious access to declarative knowledge whereas we do not have this access to procedural knowledge (Rumelhart & Norman, 1985). Declarative representations have the merit of being composable, i.e. the meaning of a complex expression is based on or can be derived from the meaning of its parts and their combinations. There are no interactions between separate entities, which makes the representation extremely modular. Knowledge can simply be added as long as it keeps the system consistent. All knowledge is open for introspection. In procedural representations the emphasis is on interaction. Procedural representations are, not surprisingly, very powerful in modeling knowledge that is procedural by nature. There is no separation between facts and processes. Interactions are strong but deriving semantics is very hard (if not impossible). Addition or change is only reached by modification (and a resulting debugging process). Introspection and reflection are impossible. The problem here is the way in which procedures can be represented so that they can be interpreted. The question becomes what they do, instead of how they do it (see Table 1 for an overview).
Table 1 Procedural and declarative knowledge representations compared

Declarative knowledge | Procedural knowledge
accessible | inaccessible
modular (no interaction) | interaction (no separation between facts and processes)
composable semantics | impossible (or hard) to derive semantics
open to introspection and reflection | closed to introspection and reflection
knowledge can easily be added, if consistent | addition only by modification
control structure obscure | control structure explicit
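As a purely illustrative sketch of the contrast drawn in Table 1 (the melody, the encoding and all names are my own assumptions, not drawn from the article), the same four-note fragment can be written declaratively, as data that any part of a system may inspect, or procedurally, as a routine that produces the notes but hides them inside its control flow.

```python
# Declarative: the fragment is explicit data (pitch as a MIDI-style
# number, duration in beats).  It can be examined, combined and extended
# without touching any process that uses it.
FRAGMENT = [
    {"pitch": 60, "duration": 1.0},
    {"pitch": 62, "duration": 0.5},
    {"pitch": 64, "duration": 0.5},
    {"pitch": 65, "duration": 2.0},
]

# Procedural: the same fragment exists only as the side effects of a
# routine.  It is easy to execute but hard to inspect: to know *what* it
# plays you must trace *how* it runs.
def play_fragment(play_note):
    pitch = 60
    for step, duration in [(0, 1.0), (2, 0.5), (2, 0.5), (1, 2.0)]:
        pitch += step
        play_note(pitch, duration)

play_fragment(lambda p, d: print(f"note {p} for {d} beats"))
```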
Mixed and multiple representations

In general, the distinctions between procedural and declarative representations are about efficiency, control, modularity, and the accessibility of knowledge. For computer science the first two are most important, while cognitive psychology is most interested in the last two. Terry Winograd (1975) emphasized the duality between modularity and interaction, interaction being a strong characteristic of procedural representations and modularity of declarative representations. Many complex systems can be viewed as "nearly decomposable systems", a notion introduced by Herbert Simon (1969).2 A single module can be studied separately without constant attention to its interaction(s) with other modules. Interactions among these subsystems are weak but not negligible. In representational terms, this forces us to have representations that facilitate these weak interactions. Mixed representations (i.e. both modular and interactive), as described by
Winograd and others, have been further developed in the design of object-oriented languages (e.g. Minsky, 1975; Hewitt, 1975). In mixed representations different parts of the represented world are described in different ways. Some parts might be described procedurally, while others are described in a declarative way. Another approach is to have multiple representations of the same ‘world’, each describing the represented world completely. Instead of a mixture of, for example, procedural and declarative representations, describing different parts of the world, there is a procedural representation describing the whole world and a declarative representation describing the whole world in parallel. Here the trade-off is extra power against the problem of coordinating the information in the separate representations: when a change is made, all structures have to be kept consistent so as to reflect the same represented world.
Issues in music representation

The remainder of this article will address issues specific to the representation of music. Three sub-areas will be discussed: the primitives of a music representation, time structuring and general structuring. The notion of structuring depends on the possibility of decomposing a representation into meaningful entities, so we must first answer the important question: what are we structuring?

The primitives: building blocks of a representation

Decomposability

How to decompose a representation of music into the appropriate parts? What are the building blocks, the primitives of such a representation? As described earlier, this decision is essential and has implications for what kind of information will be lost and what information will clearly be represented. There seems to be a general consensus on the notion of discrete elements (e.g. notes, sound events or objects) as the primitives of music. It forms the basis of a vast amount of music-theoretical work and research in the psychology of music, but a detailed discussion and argument for this assumption is missing from the literature. In music theory, as Robert Erickson (1982, p. 533) points out, there is no clear definition of what such a primitive object might be. In the psychology of music, John Sloboda (1985, p. 24), for example, just states "the basic phoneme of music is a note", and Diana Deutsch (1982) founds her discussion on grouping mechanisms in music on a 'given' set of basic acoustic elements. Yet the essential question of what these elements or 'phonemes' are is not answered. Research in psycho-acoustics on streaming shows how difficult it is to decide on such elements from a perceptual point of view (McAdams & Bregman, 1979; Bregman, 1990). A distinction has to be made between natural and artificial discretization of dimensions, or, in other words, the existence of possibly innate perceptual mechanisms and a learned division of continuous signals. In going from a continuous acoustic signal to a discrete signal one loses information. This quantization process should be looked at as a separation process instead: both types of information, the continuous and the discrete, are needed, and probably interact with each other (cf. Desain & Honing, 1989, with regard to this separation process in rhythm perception). So, next to decomposition,
the issue of the characterization of the primitives of a representation, as continuous, discrete or a combination of the two, is very important.
Continuous or discrete?
By way of illustration, imagine Billie Holiday singing “I cried for you.” How can the sound be represented in such a way that all expressive and structural information is incorporated? What is the relation between the actual perception and the notes originally notated in the score? Does the sentence as sung consist of several discrete entities, or should it be described in a continuous way? Or a combination of both? For example, discrete phonemes, syllables or notes; continuous expression over these discrete structural elements; continuous fluctuations of pitch and amplitude within them; and so on, combined into several levels of closely related discrete and continuous types of information. In music cognition, the assumption of discrete elements finds a lot of support (McAdams, 1989). Stephen McAdams makes a distinction between three auditory grouping processes that organize the acoustic surface into musical events, connect events into musical streams, and ‘chunk’ event streams into musical units (simultaneous, sequential and segmentational grouping, respectively); and perceived discrete qualities that are based on learning (e.g. scale, meter, harmony) (McAdams, 1989, p. 182). These discrete elements of music are assumed to carry structure, while the continuous aspects carry expression (Clarke, 1987). Mary Louise Serafine (1988) stands quite alone in arguing for a continuous basis. She blames music perception research for reducing music to false elements such as discrete pitches, scales and chords: “[they] are not the elements or building blocks of music” (p. 52). She accounts for these elements as an after-the-fact notion of music. But, as David Huron (1990) observes, these are speculative claims with no empirical support. It is clear that there is still quite a lot of discussion and research needed, especially on the rules of the segregation of acoustic signals, before we can decide on the discrete elements of a general representation of music. Currently, most music representation systems use either notes or sound events/objects as the building blocks of their descriptions. In these systems, the distinction between continuous and discrete is normally between sound generation and the discrete events which describe the sound in several attributes, or, in other words, between the instrument and the score. This division rests on the assumption that sound is continuous by nature (e.g. signals, wave forms), whereas the score is mainly a collection of discrete events. The continuous aspects of the score (e.g. timing and dynamics) are often taken care of by different kinds of procedures or ‘modifiers’ (e.g. Pope, 1989; Dyer, 1990) acting on the score: their descriptions are not part of the score representation (see below: Granularity). The trade-off made in these decompositions is rarely discussed or even acknowledged.
The relations: issues in structuring
When we have decided on the primitives of the representation, their structuring becomes of great importance. This structuring will be described in two separate sections. Since time and its structuring is an important factor in music, with its own specific issues related to it, it will be discussed separately from the issues in general structuring.
However, in the end it will be shown that they are not very different. Time structuring will be discussed first.
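Returning for a moment to the primitives discussed above, here is a minimal, hypothetical sketch of how discrete events and continuous information might be combined in one description (the field names, curves and pitches are invented and are not a transcription of the recording):

```python
import math

# A discrete note event carrying continuous expressive detail (hypothetical scheme).
def note(pitch, onset, duration, pitch_curve=None, amp_curve=None):
    """A discrete structural element (a note) plus optional continuous
    functions of time describing fluctuations *within* the note."""
    return {
        "pitch": pitch, "onset": onset, "duration": duration,
        "pitch_curve": pitch_curve or (lambda t: 0.0),   # deviation in semitones
        "amp_curve": amp_curve or (lambda t: 1.0),       # relative amplitude
    }

# A hypothetical three-note phrase: discrete syllable/note events, the second
# carrying a slow vibrato (continuous) superimposed on its notated pitch.
phrase = [
    note(67, 0.0, 0.4),
    note(70, 0.4, 1.2, pitch_curve=lambda t: 0.3 * math.sin(2 * math.pi * 5.5 * t)),
    note(65, 1.6, 0.8),
]

# Sampling the second note halfway through recovers notated pitch plus deviation.
t = 0.6
sounding_pitch = phrase[1]["pitch"] + phrase[1]["pitch_curve"](t)
```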
The representation of time
A number of distinctions need to be made in trying to narrow down discussion of the representation of time. There are three different areas of interest: temporal representation, temporal logic or reasoning, and planning and scheduling. All of them influence the design of a representation of time. This section will concentrate on the first. The representation of time can be subdivided into three categories: 1) tacit (time is not represented at all); 2) implicit (time is represented, but explicit time relations are not); and 3) explicit (time is represented with explicit time relations). The issues will be spread over these categories.
Tacit time structuring
Some real-time systems can be called ‘no-time’ systems (e.g. Bharucha, 1987; Puckette, 1988). Because time is not explicitly represented in the primitives, there is only the notion of now. There is no explicit formulation of the system's dependence on time and no information regarding time (except ‘now’) can be derived or manipulated.
Implicit time structuring
In this category, time is represented without explicit time relations. Time is expressed in an absolute way (e.g. note lists (Matthews, 1969)) or relative to an arbitrary point of reference. Time relations (e.g. this note occurs before that note, or, these notes overlap) have to be calculated since they are not explicitly stated in the representation.
Primitives: points vs intervals
The decision to represent time as points or intervals is not arbitrary, even though they can, theoretically, be expressed in terms of each other (an interval is a collection of points, a point is a very short interval3). A point-based representation (McDermott, 1982) implies the occurrence of only one event at a time and lacks the concept of an event ‘taking’ time. As Allen (1983) argues, there seems to be a strong intuition that, given an event, we can always “turn up the magnification” and look at its structure. He therefore proposes an interval-based representation. Intervals form a strong basis for the computability of meaningful relations, i.e. time intervals that overlap, meet, are during, before, and after each other, etc. In music representation there are examples of both choices. Mira Balaban (1989), for instance, describes a representation based on pairs of a sound object and a time point, and Desain & Honing (1988) use sound objects with a duration (i.e. time interval) as the basis of a representation of time.
Time base: absolute vs relative
The time base that can be chosen is either absolute or relative, or, in other words, real time (e.g. in seconds) or proportional time (e.g. a quarter note). With an absolute time base, (onset-) time is an attribute of the musical object, whereas with a relative time base it isn’t. Some music representation systems (Smith, 1972; Schottstaedt, 1983) use lists of notes with absolute times, whereas later systems tend to describe time in terms of a relative time base or relative to the enclosing time context, i.e. expressed as a function of this context (Dannenberg, 1989; Balaban, 1989). But both time bases seem to be needed. For example, in representing a trill as being twice as long as another trill, one has to decide whether to stretch or to extend the description of this related trill, i.e. is the new trill half the speed (using relative time) or is the speed the same (using absolute time) and are there just more notes added (or any other particular way of extending a trill). Both types of behavior need to be representable, which requires both time bases.
Granularity: discrete vs continuous
What is the grain or grid size of the time bases mentioned above? Is time expressed as a discrete value labeling events, or is it expressed as a continuous function? As well as discrete time, a continuous way of representing time is needed, for example, when representing an accelerando or rubato over a series of notes.4 Most representational systems make these notions available as global operations acting upon the representation instead of making them part of the representation.
Explicit time structuring
An example of explicit structuring in music is the use of two basic structuring relations called ‘parallel’ and ‘sequential’ (Desain & Honing, 1988). These two time relations, and combinations of them, can express many constellations of discrete sound events. Similar time structuring is proposed by several other authors (e.g. Rodet & Cointe, 1984; Dannenberg, 1989). Allen (1983) describes a list of thirteen possible relationships. A set of basic explicit time relations forms a solid basis for higher level notions of time structuring and makes operations on time, or operations depending on time, very elegant (Desain, 1990).
Controversy: declarative vs procedural
The controversy over declarative and procedural representations is also very important in the representation of music. Take the example of a trill: a sequence of notes, alternating in pitch, filling up a certain time interval. This “filling up” is most naturally represented in a procedural form. But, as discussed previously, this type of representation has considerable disadvantages. Problems occur when there is, for instance, a nesting of these trills defined in terms of each other (e.g. a higher-level trill composed by combining the definitions of some other, i.e. lower-level trills): the definition of the high-level trill depends on the result of the low-level trills, a result that is only available after execution of the procedural description of these low-level trills. There is no way in which the duration of the high-level trill can be decided upon without evaluating the definition of the low-level trills since this knowledge is represented in a procedural form. The
declarative representation (a low-level trill of a certain length) has to be replaced by the result (a sequence of notes adding up to a certain length) and information is lost (e.g. knowledge on how the trill was composed). Both kinds of representation seem to be needed in the representation of music. The marriage of both types of knowledge is, as described before, still a topic of research.
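To make the trill example concrete, here is a minimal sketch (not any published system; all names and values are invented): the declarative description keeps the total duration accessible as data, so a higher-level structure can be defined over it without evaluation, whereas the procedural expansion must be executed before its duration, or the fact that it was a trill, can be recovered.

```python
# Declarative: a trill as data; its duration can be read off directly.
def trill(pitch, upper, total_duration, note_duration=0.125):
    return {"type": "trill", "pitch": pitch, "upper": upper,
            "total_duration": total_duration, "note_duration": note_duration}

# Procedural: a trill as a process that fills up a time interval with notes;
# its duration is only known by executing it and inspecting the result.
def expand(t):
    notes, onset, high = [], 0.0, False
    while onset + t["note_duration"] <= t["total_duration"]:
        notes.append({"pitch": t["upper"] if high else t["pitch"],
                      "onset": onset, "duration": t["note_duration"]})
        high = not high
        onset += t["note_duration"]
    return notes

low = trill(60, 62, total_duration=1.0)
# A higher-level structure can be defined over the declarative description ...
high_level = {"type": "sequential",
              "parts": [low, trill(64, 65, 2 * low["total_duration"])]}
# ... while the expanded (procedural) result has lost the fact that it was a trill.
flat = expand(low)
```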
The representation of structure
Structural descriptions of music can be divided into two areas. One is the description of musical structure independent of psychological considerations, based on an analysis by a musicologist. The other is the description of the structural properties of mental representations of music: the goal of music psychology research. The described issues are relevant to both areas. In describing general structuring, we can employ the same division used in the subfield of time structuring: 1) tacit structural relations, 2) implicit structural relations, and 3) explicit structural relations.
Tacit structural relations
When no structure is represented, we are left with only the primitives of the representation. This is the case in the earlier-mentioned MIDI protocol that represents a piece of music as a structureless stream of note-onsets and offsets (with an integer key number, a velocity value and a channel number as attributes).
Implicit structural relations
Implicit structural relations are those that have to be calculated from the representation. As an example, from a MIDI file format the following structural information can be obtained: all notes on channel 1 belong to one unit called a ‘track’; every two seconds there is a bar and all notes within that time span are part of it; etc. The structural relations that can be derived from a representation (with only implicit structuring) depend heavily on the choice of primitives and their attributes.
Explicit structural relations
Structure is the common denominator for a large class of possible relations made between the entities of a representation. One can say that almost everything, except the entities themselves, is structure. Very few representational systems for music supply explicit structuring mechanisms, and even when they are available, they only represent specific kinds of structure (e.g. meter, bars, instrumental parts) or support annotation (e.g. “this is an important note”). The following paragraphs discuss the issues in the design of a general structuring mechanism.
What kinds of structural types are needed?
One way of describing different kinds of relations (so as to have a handle to talk about them in a general way) is to divide them into binary and n-ary relations. A special kind of
binary relation is a tree or hierarchy. A part-of relation defines such a hierarchical relation between objects. It propagates behavior between objects. A part-of relation could denote relations such as “all notes part-of chord”, or the often-used bar, beat, and note hierarchy. They are quite general and flexible in describing musical structure (see Honing, 1990). Another hierarchical relation, orthogonal to the part-of relation, is the is-a relation. It defines inheritance of behavior and characteristics, specifying a generalization hierarchy of objects: a structure of concepts which are linked to those of which they are specializations. Examples are: a dominant chord being a special kind of seventh chord, a chord being a kind of cluster, a cluster being a kind of collection of notes, etc. (see e.g. Pope, 1989). A great number of music theories use hierarchies as their only kind of structuring (Lerdahl & Jackendoff, 1983). Hierarchies are very useful in relating local and global information, but other kinds of relations are needed as well. Other binary relations like associative relations are useful in relating, for example, a theme with its variations. Functional relations are also needed (e.g. the function of a particular chord in a scale) as well as referential relations (e.g. a theme referring to a previously presented or already known motif). N-ary relations can structure more complex types of relation: for instance, the dependency of a certain chord on scale, mode and the context in which it is used is a ternary relation. The structural types described here are the ones most relevant to music, though a complete overview of all musical constructs and their expression in these structural types would take considerably more space.5
Relations between musical constructs: generalization vs dedication
Not everything is said about musical structure by simply assigning one of the structural types described above. Within one type of structure (e.g. defined in terms of part-of relations) refinement is needed to distinguish between the different musical constructs described by means of this type (e.g. what is the difference between a chord and a bar when both are described in terms of part-of relations?). There are two extremes in approaching this problem. One approach is dedication: all the well-known or often used musical constructs (chord, arpeggio, bar, beat, trill, grace note, etc.) are described, more or less ad hoc, as primitives with their own specific relations (and resulting behavior), with little or no hierarchy. The other approach is generalization and is based on parsimony: there are no special musical constructs defined as primitives, all constructs being based on some very general primitive (e.g. a time interval). The bias is on generality: new musical constructs have to be defined in terms of existing ones, in a hierarchical way. The first is a popular and pragmatic approach. For instance, in a computer composition system a reasonable set can be provided that takes care of most needs. The main drawback is that extensions have to be made in an ad hoc fashion and often need to have their own processes (or transformations) defined for the user to be able to access or manipulate them. In the latter approach the choice of the right generalities is the problem. But when they are available, extensions are simply defined in terms of these generalities or higher-level
constructs. There is no need to ‘tell’ the processes acting on the representation about these new constructs.
Direction: bottom-up, top-down or both?
In expressing one of the above-mentioned relations, it is important to note how the information flow is supported by the representation. In music theory and the psychology of music, different directions are proposed: from the conceptual level down (top-down; Schenker, 1956), and from the low-level data up (bottom-up; Narmour, 1977), or in both directions, as in modeling tonal hierarchies with interactive activation networks (Bharucha, 1987).
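A minimal sketch of the two hierarchical relations discussed above (the class names are invented): is-a is expressed through inheritance, part-of through composition, and behavior such as duration propagates along the part-of links.

```python
# is-a: a generalization hierarchy expressed through inheritance.
class Cluster:                       # a cluster is a collection of notes
    def __init__(self, notes):
        self.notes = notes           # part-of: the notes are parts of the cluster
    def duration(self):
        return max(n.duration() for n in self.notes)   # simultaneous parts

class Chord(Cluster):                # a chord is-a kind of cluster
    pass

class DominantChord(Chord):          # a dominant chord is-a kind of chord
    pass

class Note:
    def __init__(self, pitch, dur):
        self.pitch, self.dur = pitch, dur
    def duration(self):
        return self.dur

# part-of: a bar is composed of lower-level objects (here chords); the
# duration of the whole is derived from the durations of its parts.
class Bar:
    def __init__(self, parts):
        self.parts = parts
    def duration(self):
        return sum(p.duration() for p in self.parts)   # sequential parts

chord = DominantChord([Note(55, 1.0), Note(59, 1.0), Note(62, 1.0), Note(65, 1.0)])
bar = Bar([chord, Chord([Note(60, 1.0), Note(64, 1.0), Note(67, 1.0)])])
assert bar.duration() == 2.0
```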
Musical structure: association with time intervals and their constraints essential
In what way is musical structure different from any general structuring mechanism (e.g. the part-of and is-a relations we described before)? Since time is an influential factor in most, if not all, types of structure in music, musical structure can be described as a collection of structuring mechanisms that have time intervals associated with their components (i.e. structural objects). It is the constraints on these time intervals that specialize the different types of structuring. As an example, let’s look at two simple part-of hierarchies: a bars, bar, note hierarchy (see Figure 1a) and a progression, chord, note hierarchy (see Figure 2a). In the first hierarchy it is clear that the structural object ‘bars’ and its parts have a duration: they hold for a certain time interval. This is also the case for the ‘progression’ object and its parts. Both constructs have the same part-of structure but differ in the kind of constraints they have on their associated time intervals. In a ‘bars’ structure, if one bar becomes longer, the other one has to become shorter: they have to satisfy the meet constraint (using Allen’s (1983) terminology). In the ‘progression’ structure, the comparable structural objects have a before relation. The musical constructs are characterized by the specific constraints on these time intervals associated with their structural objects (see Figures 1b and 2b).6 These constraints should be part of the representation, i.e. part of the syntax, so that operations on the representation produce the behavior resulting from these restrictions for free; the semantics of musical constructs (e.g. what does an arpeggio mean, and how does it differ from a chord or a run of notes) should be moved to the syntax. In this way the representation has embedded knowledge of how to deal with particular kinds of structure. These musical constructs can be compared with small machines: they have a clear and accessible behavior that cannot be altered.
Multiple representations: power vs coordination and consistency
Multiple representations are needed in a complete description of music, i.e. several structural descriptions being applied to the same primitives (e.g. a note is part of a meter and a tonal hierarchy at the same time). One could think of multiple
Figure 1 A ‘bars’ structure with part-of relations (a), its associated time intervals and constraints (b).
Figure 2 A ‘progression’ structure with part-of relations (a), its associated time intervals and constraints (b).
structural representations as analogous to a ring binder: the spiral resembles the primitives, the pages the different kinds of structural relations.7 As described before (see General approaches to representation), the consistency and coordination of the information between the pages is the problem here. Inconsistencies may occur when two structural descriptions clash (i.e. the constraints on both structural descriptions can’t be solved or unified) and exceptional or preferred behavior has to be provided. It seems that in these situations, the demand for consistency is too strong (e.g. a slowed-down chord structure might turn into an arpeggio). It may not be possible to formalize a representation of music in a way that guarantees consistency.8 More research is needed into the formalization of musical constructs (i.e. their definition and behavior) and of the ways in which their combination might result in exceptional or preferred behavior.
Modularization: musical knowledge vs annotation
Here the issue is whether structuring is used to add musical knowledge or just as annotation. Structure can be used as an annotation of the basic elements of the representation assigning different kinds of information, but it can also be interpreted as
musical knowledge. Using structure in both ways facilitates modularity: not all knowledge about music has to be part of the representation, since structure can be used as a hook to import information from outside the system. This improves the modularity of the system considerably (as advocated by Simon (1969) in technical terms, and by Fodor (1983) in cognitive terms).
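As a sketch of the constraint idea introduced above (the relation names follow Allen's terminology; the data layout is invented for illustration), the same part-of structure is specialized into a 'bars' construct or a 'progression' construct purely by the constraint imposed on the time intervals of its successive parts:

```python
# Time intervals as (start, end) pairs and two of Allen's (1983) relations.
def meets(a, b):
    return a[1] == b[0]          # a ends exactly where b starts

def before(a, b):
    return a[1] < b[0]           # a ends some time before b starts

def check(parts, constraint):
    """A part-of structure is specialized by the constraint that must hold
    between the time intervals of its successive parts."""
    return all(constraint(a, b) for a, b in zip(parts, parts[1:]))

two_bars    = [(0.0, 3.0), (3.0, 6.0)]   # a 'bars' structure must satisfy 'meets'
progression = [(0.0, 1.5), (2.0, 3.5)]   # chords in a 'progression': 'before'

assert check(two_bars, meets)
assert check(progression, before)
# Lengthening the first bar without shortening the second violates the constraint:
assert not check([(0.0, 3.5), (3.0, 6.0)], meets)
```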
Conclusion
Representational systems have a central position in the cognitive sciences, especially in the fields of computational psychology and artificial intelligence. A formalist approach to representation, as summarized in the “knowledge representation hypothesis”, applied to the representation of music has turned out to be beneficial. Representing musical knowledge in syntactical terms makes a theory within the psychology of music explicit and verifiable. Discussing the issues in the design of such a representational system for music is what this article has aimed at. Before talking about structuring, the question “what are we structuring?” was asked. The decomposability of a representation of music was discussed as well as the expression of its primitives in either discrete or continuous terms (or a combination thereof). Research in the segregation of acoustical signals (Bregman, 1990) is essential in deciding on the primitives of a general representation of music. Currently, most research is based on the assumption that the basic elements of music are discrete. The discussion of time structuring, as a special case of general structuring, showed that the choice of either points or intervals, a relative or absolute time base, discrete or continuous representations, and the use of procedural or declarative descriptions of musical knowledge are controversies for which solutions that combine these polarities have to be found. Several types of general structuring were discussed. An important point is the observation that structure in music is often associated with a time interval (for which it ‘holds’). The constraints on these time intervals model specific musical constructs and their behavior. Time structuring and general structuring differ in the sense that time structuring makes these constraints explicit: they are represented as structural objects (e.g. ‘parallel’ and ‘sequential’ relations), while in general structuring they are implicit: they are used to restrict the behavior of the specific structure, but are not explicitly represented as structural objects. In conclusion:
1. A representation should be as formal as possible. Even when the meaning is removed from the formal system it must be possible to prove its correctness (i.e. not dependent on knowledge outside the formal definition).
2. A representation should be as declarative as possible. Declarative representations were shown to be preferable to procedural representations, even though some information is more naturally represented in a procedural way.
3. A representation should be as explicit as possible. All relations and knowledge should be explicitly stated in the representation.
4. All the controversies presented above need combined solutions in which both extremes can be expressed. The idea of having multiple representations of the same ‘world’ seems useful.
5. Musical structure should be associated with time intervals. Constraints on these time intervals model the specific musical constructs and their behavior. These constraints should be part of the representation, i.e. part of the syntax, so that operations on the representation get the behavior resulting from these restrictions for free.
In the short term, it is concluded that it would be best to construct representations of music so as to be as declarative, explicit and formal as possible, while actively awaiting developments in representation languages or schemes that can deal with the issues presented here in a more flexible way.9
Acknowledgements
Thanks to David Huron, Christopher Longuet-Higgins, Steve McAdams, Stephen Pope, Maria Ramos, and my colleagues at City University, Music Department and the Centre for Knowledge Technology for useful discussions and advice. Special support by Johan den Biggelaar, Ton Hokken and Thera Jonker is highly appreciated. Thanks for proofreading and valuable suggestions and improvements on earlier versions to Eric Clarke, Joop Ringelberg, and an anonymous referee. The research was in part supported by an ESRC grant under number A413254004. Finally, special thanks to Peter Desain for his encouragement, insights, and generous sharing of ideas.
Notes
1. Distributed representations (e.g. connectionist networks), in this sense, manipulate symbols of an unusual kind. An individual unit of such a network does not implement an identifiable symbol; a meaningful representation only exists at a level made up of a number of units.
2. Simon (1969) describes nearly decomposable systems as having the property “the short-run behaviour of each of the component subsystems is approximately independent of the short-run behaviour of the other components” (p. 100).
3. Allen’s theory (1983) describes points as intervals that are durationless, i.e. a duration less than a value ε, adjusted to the reasoning task.
4. It has been shown that structure is essential in the performance of the continuous and discrete aspects of musical time (e.g. Clarke, 1987, 1988). Therefore a complete representation of time should facilitate the expression of these aspects in terms of structure to be of any perceptual or musical relevance (see Desain & Honing, 1991a).
5. A complete overview of all musical constructs will quite likely turn out to be a large, if not infinite, collection, but they probably can be grouped into a considerably smaller set of prototypical relations, with their specific characteristics being modeled as refinements of a particular structural type (see issue on Musical structure: association with time intervals and their constraints essential).
6. The constraints on the time intervals, as shown in Figures 1b and 2b, give a rough characterization of the example structures, just for comparison. For a more complete characterization of such structures the logic-based constraints of Allen (1983) are not enough. Other kinds of constraints are needed as well to be able to express relations like, for example, all bars have the same length, or, a bar is half the length of ‘bars’.
7. These pages could be of different shapes and material, standing for structural descriptions of a completely different nature. This analogy was suggested by Morris Halle in a seminar at
Sussex University in 1987 when talking about conceptual representations of linguistic structure.
8. Recent work done in the field of artificial intelligence on non-monotonic logic and truth maintenance might therefore be applicable to music.
9. Since this article was written (autumn, 1990) work has been done on partial solutions of the issues presented above. Some of the issues on the representation of time have been resolved in a generalized concept of time functions (Desain & Honing, in press). A proposal for a specification and transformation formalism of expressive timing described in terms of structure is published as Desain & Honing (1991b).
References Allen, J.F. (1983) Maintaining Knowledge about Temporal Intervals. In Communications of the ACM, 26 (11). Anderson, J.R. (1983) The Architecture of Cognition. Cambridge, Mass.: Harvard University Press. ANSI (American National Standards Institute) (1989) X3V1.8M/SD-6 Journal of Development Standard Music Description Language (SMDL). San Francisco: Computer Music Association. Balaban, M. (1989) Music Structures: A Tempora-Hierarchical Representation for Music. Musikometrika, Vol. 2. Balaban, M., Ebcioglu, K. & Laske, O., eds (1991) Musical Intelligence. Menlo Park: The AAAI Press. (forthcoming). Bharucha, J.J. (1987) MUSACT: A Connectionist Model of Musical Harmony. In Proceedings of the Cognitive Science Society. Hillsdale, New Jersey: Erlbaum. Boden, M.A. (1990) Has AI helped psychology? In: The foundations of artificial intelligence. A source book, edited by D.Partridge and Y.Wilks. Cambridge: Cambridge University Press. Bregman, A.S. (1990) Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, Mass.: Bradford books, MIT Press. Byrd, D. (1984) Music Notation by Computer. Ph.D.Dissertation, Computer Science Department, Indiana University. Ann Harbor: University Microfilms. Clarke, E.F. (1987) Levels of structure in the organisation of musical time. In S.McAdams (Ed.), “Music and psychology: a mutual regard”. Contemporary Music Review, 2 (1). Clarke, E.F. (1988) Generative principles in music performance. In J.A.Sloboda (Ed.), Generative processes in music. The psychology of performance, improvisation and composition. Oxford: Science Publications. Dannenberg, R. (1989) The Canon Score Language. Computer Music Journal, 13 (1). Dannenberg, R., Dyer, L.M., Garnett, G.E., Pope, S.T. & Roads, C. (1989) Position papers. In Proceedings of the 1989 International Computer Music Conference. San Francisco: Computer Music Association. Desain, P. & Honing, H. (1988) LOCO: A Composition Microworld in Logo. Computer Music Journal, 12 (3). Desain, P. & Honing, H. (1989) Quantization of Musical Time: A Connectionist Approach. Computer Music Journal, 13 (3). Reprinted and updated in Todd & Loy (1991). Desain, P. & Honing, H. (1991a) Tempo curves considered harmful. In J.D.Kramer (Ed.), “Music and Time”. Contemporary Music Review. London: Harwood Press, (forthcoming). Desain, P. & Honing, H. (1991b) Towards a calculus for expressive timing in music. Research Report. Utrecht: Centre for Knowledge Technology. Desain, P. & Honing, H. (in press) Time functions function best as functions of multiple times. To appear in Computer Music Journal. Desain, P. (1990) Lisp as a second Language. Perspectives of New Music, 28 (1).
Deutsch, D. (1982) Grouping Mechanisms in Music. In D.Deutsch (Ed.), The Psychology of Music. New York: Academic Press. Dowling, W.J. & Harwood, D. (1986) Music Cognition. New York: Academic Press. Dyer, L. (1990) Ensemble. Proceedings of the 1990 International Computer Music Conference. San Francisco: Computer Music Association. Erickson, R. (1975) The DARMS Project: A Status Report. Computers and the Humanities, 9 (6). Erickson, R. (1982) New Music and Psychology. In D.Deutsch (Ed.), The Psychology of Music. New York: Academic Press. Fodor, J. (1983) The Modularity of the Mind: An Essay on Faculty Psychology. Cambridge, Mass.: Bradford Books, MIT Press. Goodman, N. (1968) The Languages of Art: An Approach to a Theory of Symbols. Indianapolis: Bobbs-Merril Co. Gourlay, J.S. (1986) A Language for Music Printing. Communications of the ACM, 29 (5). Grey, J.M. (1977) Multidimensional Perceptual Scaling of Musical Timbres. Journal of the Acoustical Society of America, 61. Hewitt, C. (1975) How to use what you know. Proceedings of the Fourth International Joint Conference on Artificial Intelligence. Los Altos, CA.: Morgan Kaufmann. Honing, H. (1990) POCO: An Environment for Analysing, Modifying, and Generating Expression in Music. Proceedings of the 1990 International Computer Music Conference. San Francisco: Computer Music Association. Huron, D. (1990) Book review of Music as Cognition by M.L.Serafine. Psychology of Music, 18. Johnson-Laird, P.N. (1983) Mental Models: Towards a Cognitive Science of Language, Inference, and Consciousness. Cambridge, Mass.: Harvard University Press. Krumhansl, C.L. (1979) The Psychological Representation of Musical Pitch in a Tonal Context. Cognitive Psychology, 11. Lerdahl, F. & R.Jackendoff (1983) A Generative Theory of Tonal Music. Cambridge, Mass.: MIT Press. Longuet-Higgins, H.C. (1973) Comments of the Lighthill Report. Artificial Intelligence—A Paper Symposium. London: Science Research Council. Reprinted in Longuet-Higgins (1987). Longuet-Higgins, H.C. (1987) Mental Processes. Cambridge, Mass.: MIT Press. Longuet-Higgins, H.C. & Lee, C.S. (1984) The Rhythmic Interpretation of Monophonic Music. Music Perception, 1. Reprinted in Longuet-Higgins (1987). Loy, G. (1985) Musicians Make a Standard: The MIDI Phenomenon. Computer Music Journal, 9 (4). Reprinted in Roads (1989). Loy, G. (1988) Composing with Computers—A Survey of Some Compositional Formalisms and Music Programming Languages. In M.V.Matthews and J.R.Pierce (Eds.), Current Directions in Computer Music Research. Cambridge, Mass.: MIT Press. Marr, D. (1982) Vision: A Computational Investigation into Human Representation and Processing of Visual Information. San Francisco: W.H.Freeman. Matthews, M.V. (1969) The Technology of Computer Music. Cambridge, Mass.: MIT Press. McAdams, S. & Bregman, A. (1979) Hearing Musical Streams. Computer Music Journal, 3 (4). Reprinted in Roads & Strawn (1985). McAdams, S. (1987) Music: A Science of the Mind? In S.McAdams (Ed.), “Music and Psychology: A Mutual Regard”. Contemporary Music Review, 2 (1). McAdams, S. (1989) Psychological constraints on form-bearing dimensions in music. In S.McAdams and I.Deliège (Eds.), “Music and the cognitive sciences”, Contemporary Music Review, 4 (1). McDermott, D.V. (1982) A Temporal Logic for Reasoning about Processes and Plans. Cognitive Science, 6. Meyer, L.B. (1973) Explaining Music: Essays and Explorations. Berkeley: University of California Press.
Minsky, M. (1975) A Framework for Representing Knowledge. In P.Winston (Ed.), The Psychology of Computer Vision. New York: McGraw-Hill. Moore, F./R. (1988) The Dysfunctions of MIDI. Computer Music Journal, 12 (1). Narmour, E. (1977) Beyond Schenkerism: The need for Alternatives in Music Analysis. Chicago: University of Chicago Press. Newell, A. (1973) Productions systems: models of control structures. In Visual Information Processing, edited by W.G.Chase. New York: Academic Press. Pope, S.T. (1989) Modeling Musical Structures as Event Generators. Proceedings of the 1989 International Computer Music Conference. San Francisco: Computer Music Association. Povel, D.J. & Essens, P. (1981) Perception of temporal patterns. Music Perception, 2. Puckette, M. (1988) The Patcher. In Proceedings of the 1988 International Computer Music Conference. San Francisco: Computer Music Association. Quillian, M.R. (1968) Semantic Memory. In M.L.Minsky (Ed.), Semantic Information Processing. Cambridge, Mass.: MIT Press. Roads, C. (ed.) (1989) The Music Machine. Cambridge, Mass.: MIT Press. Roads, C. & Strawn, J. (eds.) (1985) Foundations of Computer Music. Cambridge, Mass.: MIT Press. Rodet, X. & Cointe, P. (1984) FORMES: Composition and Scheduling of Processes. Computer Music Journal, 8 (3). Reprinted in Roads (1989). Rumelhart, D.E. & Norman, D.A. (1985) Representation of Knowledge. In A.M.Aitkenhead and J.M.Slack (Eds). Issues in Cognitive Modeling. London: Lawrence Erlbaum Ass. Schenker, H. (1956) Der Freie Satz. Vienna: Universal Edition. Schottstaedt, W. (1983) PLA: A Composer’s Idea of a Language. Computer Music Journal, 7 (1). Reprinted in Roads (1989). Serafine, M.L. (1988) Music as Cognition: The Development of Thought in Sound. New York: Columbia University Press. Shepard, R.N. (1982) Structural approximations of musical pitch. In D.Deutsch (Ed.), The Psychology of Music. New York: Academic Press. Simon, H. (1969) The Architecture of Complexity. In The Sciences of the Artificial. Cambridge: MIT Press. Sloboda, J. (1985) The Musical Mind: The Cognitive Psychology of Music. Oxford: Clarendon Press. Smith, B.C. (1982) Reflection and Semantics in a Procedural Language. Ph.D.dissertation. Technical Report MIT/LCS/TR-272, Cambridge, Mass.: MIT. Smith, L. (1972) SCORE—A Musician’s Approach to Computer Music. Journal of the Audio Engineering Society, 20. Todd, P.M. & Loy, D.G. (eds.) (1991) Music and Connectionism, Cambridge, Mass.: MIT Press. Wessel, D. (1979) Timbre space as a musical control structure. Computer Music Journal, 3 (2). Winograd, T. (1975) Frame Representations and the Declarative/Procedural Controversy. In D.G. Bobrow and A.M.Collins (Eds.), Representation and Understanding: Studies in Cognitive Science. New York: Academic Press.
A connectionist and a traditional AI quantizer, symbolic versus sub-symbolic models of rhythm perception
Peter Desain
Utrecht School of Arts, The Netherlands and Music Department, City University, London, UK
Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 239–254
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
The Symbolic AI paradigm and the Connectionist paradigm have produced some incompatible models of the same domain of cognition. Two such models in the field of rhythm perception, namely the Longuet-Higgins Musical Parser and the Desain & Honing connectionist quantizer, were studied in order to find ways to compare and evaluate them. Different perspectives from which to describe their behavior were developed, providing a conceptual as well as a visual representation of the operation of the models. With these tools it proved possible to discuss their similarities and differences and to narrow the gap between sub-symbolic and symbolic models. KEY WORDS: Rhythm perception, quantization, (sub)symbolic processing, connectionism
Introduction
So-called Good Old-Fashioned Artificial Intelligence has firmly established itself over the past decades as a research methodology. The methods and tools it uses are symbolic, highly structured representations of domain knowledge and transformations of these representations by means of formally stated rules. These rule-based theories can function (and are vital) as abstract formal descriptions of aspects of cognition, constraining any cognitive theory. However, some authors go beyond that and claim that mental processes are symbolic operations performed on mental representations of rules (see Fodor, 1975). Until the connectionist paradigm emerged there was no real alternative to that view. But now, in this new paradigm, the departure from reliance on the explicit mental representation of rules is central, and thus the conception of cognition is fundamentally different (Bharucha & Olney, 1989). This holds regardless of the fact that the behavior of connectionist models could be formally described in rules. These distributed models
consist of a large number of simple elements, or cells, each of which has its own activation level. These cells are interconnected in a network, the connections serving to excite or inhibit others. Connectionism opened the possibility of defining models which have characteristics that are hard to achieve in traditional AI, in particular robustness, flexibility and the possibility of learning (Rumelhart & McClelland, 1986). The connectionist boom brought forth much interesting work, including work in the field of music (Todd & Loy, forthcoming). Although many researchers lost their critical attitude, impressed by the good performance of some (prototypical) models, it soon became clear that more study of the limitations of these models was needed. A connectionist model that ‘works’ well constitutes in itself no scientific progress when questions about its sensitivity to parameter changes, its scalability to larger problems and its dependency on a specific input representation cannot be answered. However, it is possible to describe the behavior of a connectionist model from different abstract perspectives that provide more insight into its limitations and its validity as a cognitive model than simulations or test runs alone. These perspectives are also fruitful for the analysis of traditional AI models. In this article we pursue this approach for a connectionist and a traditional AI model of rhythm perception as a case study of the wider issue of how the paradigms themselves relate.
The quantization problem
In performed music there are large deviations from the time intervals as they appear in the score (Clarke, 1987). Quantization is the process by which the time intervals in the score are recovered from the durations in a performed temporal sequence; to put it another way, it is the process by which performed time intervals are factorized into abstract integer durations representing the notes in the score and local tempo factors. These tempo factors are aggregates of intended timing deviations like rubato and unintended timing deviations like noise of the motor system. This process of separating different discrete and continuous aspects of musical timing, however simple it may seem at first sight, and despite being a rather basic musical skill, has proved very hard to model (Desain & Honing, forthcoming). As an example one could try to recover the intended rhythmic interpretation of the following temporal sequence (in milliseconds):
476:237:115:135:174:135:155:240:254:118:112:118:138:476
This task, however hard to do by calculation, yields an obvious and simple answer when the data is converted to an auditory stimulus: a sequence of drumbeats (the solution is given in note 1).
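As a hedged numerical illustration of this factorization (the numbers below are invented and unrelated to the sequence above), each performed interval can be written as an integer score duration times a local tempo factor:

```python
# A hypothetical performed fragment (seconds) and one possible factorization
# into integer score durations (in sixteenth notes) times local tempo factors.
performed = [0.26, 0.24, 0.51, 0.99]   # performed inter-onset intervals
score     = [1, 1, 2, 4]               # intended durations in sixteenths
sixteenth = 0.25                       # nominal duration of a sixteenth note (s)

tempo_factors = [p / (s * sixteenth) for p, s in zip(performed, score)]
print([round(f, 2) for f in tempo_factors])
# -> [1.04, 0.96, 1.02, 0.99]: the expressive and noisy residue of the performance
```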
Known methods The simplest method of quantization, used by most commercially available music transcription programs, is the round-off of any point in time to the nearest point on a fixed time grid, with a resolution equal to, or an integral factor smaller than, the smallest
duration to be expected. This method is totally inappropriate: even when enhanced with facilities like user control over the grid resolution, it yields results that make no musical sense, even when the performer is forced to play along with a metronome. However, this method can serve as the basis of more reasonable models in which the time grid is adapted if consistent deviations (notes being late or early) are detected. In this so-called ‘tempo-tracking’ the design of the control behavior becomes crucial: the extraction of an error signal between time grid and note onsets, and the way in which this error influences the tempo of the grid. The most elaborate example is the ‘real time foot tapper’ (Dannenberg & Mont-Reynaud, 1987; Boulanger, 1990), but still a 30% error rate is reported for this system. A symbolic, rule-based system for quantization was in place at the Stanford automatic transcription project (Chowning, Rush, Mont-Reynaud, Chafe, Schloss & Smith, 1984). It used knowledge about preferable ratios between time intervals as a basis for an optimal quantized description of the data. In such a knowledge-based system it is relatively easy to use information from other domains (e.g. dynamic, harmonic) to help the quantization process, but one has to keep in mind that this increases the risk of style dependency and therefore brittleness. Because it is designed as an unordered collection of rules, its behavior, like that of all rule-based systems, is impossible to characterize in non-operational terms. The musical parser (Longuet-Higgins, 1987) constitutes another symbolic AI approach to quantization, besides methods of tonal and articulation analysis that will be ignored here. It is highly hierarchical in its music representation and has reasonably good performance. Furthermore, it has the advantage that a published program is available. A Lisp version of this program is published in Desain (1990). The connectionist quantizer (Desain & Honing, 1989, 1991; Desain, Honing & de Rijk, 1989) is a distributed model of fairly simple processing elements. This model displays desirable properties like robustness, graceful degradation and precedence of local context, but as a model it is hard to understand why it works so well, and what its limitations are. These last two methods will now be described in more detail.
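Before turning to those two models, a minimal sketch of the naive grid round-off criticized above may be useful (grid size and onsets are hypothetical; this is not any particular commercial program):

```python
def grid_quantize(onsets, grid):
    """Round each onset time to the nearest multiple of the grid resolution.
    This is the naive method: it ignores context and musical structure."""
    return [round(t / grid) * grid for t in onsets]

# A slightly rushed triplet-like figure forced onto a sixteenth-note grid (0.25 s)
# comes out mangled, which is why the method makes little musical sense.
onsets = [0.0, 0.32, 0.68, 1.01]
print(grid_quantize(onsets, grid=0.25))   # -> [0.0, 0.25, 0.75, 1.0]
```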
The Longuet-Higgins Musical Parser, a symbolic model Using just a little knowledge about meter, and exploiting that to the extreme, the Longuet-Higgins Musical Parser builds a metrical tree from performance data, and thus implicitly manages to quantize it. This method is supplied with an initial notion of a time interval called the beat. This interval is subdivided recursively in 2 or 3 parts looking for onset times near the start of each part, until the interval contains no more onsets. The ‘best’ subdivision is then returned. At each recursive level the interval length is adjusted on the basis of the onsets found, just as in simple tempo-tracking models. The output of the system consists of a list of trees, one for every analyzed beat. Each tree is of a combined binary-ternary nature, which means that each node has 0 (in case it is a leaf of the tree) or 2 or 3 sub-trees. During the construction of the tree there is a horizontal flow of information through the layers of the tree, seeking to maintain the same kind of subdivision at a certain level as long as possible. The description of the proposed subdivisions at each level of the tree is called meter. During the construction of
the tree a strict left-to-right order is maintained, and new sub-trees are created on a generate-and-test basis. This means that a proposed (and constructed) binary sub-tree may be rejected in favour of a ternary one. The generate-and-test procedure is non-standard in that it may, after checking and rejecting the first alternative, still reject the second, in which case the first alternative is chosen after all. There is one parameter (called tolerance) identified in the program. It is used in different places as the allowed margin of deviation in deciding if notes start or stop at a certain time. In this way the model does depend elegantly on global tempo by limiting the possibility of further subdivisions when an absolute time span (the tolerance) is reached: onsets that happen within the tolerance interval are considered synchronous.
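The following is a deliberately crude sketch of the idea of recursive binary/ternary subdivision only; it is not the published parser and omits tempo adjustment, meter propagation across beats and the generate-and-test backtracking described above (all parameter values are invented):

```python
def subdivide(onsets, start, length, tolerance=0.05, min_span=0.1):
    """Toy version: try to divide the span [start, start+length) into 2 or 3
    parts, scoring each division by how many part boundaries fall near an
    onset, and recurse into the parts."""
    inside = [t for t in onsets if start <= t < start + length]
    if len(inside) <= 1 or length < min_span:
        return len(inside)                       # leaf: number of onsets explained
    best = None
    for n in (2, 3):
        part = length / n
        boundaries = [start + i * part for i in range(n)]
        score = sum(1 for b in boundaries
                    if any(abs(t - b) <= tolerance for t in inside))
        subtrees = [subdivide(onsets, b, part, tolerance, min_span)
                    for b in boundaries]
        if best is None or score > best[0]:
            best = (score, n, subtrees)
    return best[1:]                              # (chosen division, its subtrees)

# A hypothetical two-beat span: the first beat divided in two, the second in three.
onsets = [0.0, 0.5, 1.0, 1.33, 1.67]
print(subdivide(onsets, 0.0, 2.0))
```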
The Desain & Honing connectionist quantizer: a sub-symbolic model
A class of connectionist models, known as interactive activation and constraint satisfaction networks, generally behaves so as to converge towards an equilibrium state given some initial state. The connectionist quantization model is designed to converge from non-metrical performance data to a metrical equilibrium. The network topology is fixed (hard-wired) and so is the kind of interaction between cells: no learning takes place. The net comprises cells for each time interval in a temporal sequence, be it basic (one inter-onset interval) or compound (spanning several notes). Two cells representing neighboring time intervals may interact and push each other to their ‘perfect’ values implied by an integer ratio, propagating the changes through the net. After a while the net stabilizes and a quantized temporal sequence can be read out. The interaction between cells, the change of their ratio, depends on the ratio of their durations, via a so-called interaction function. Since the ratio of two time intervals is the only determinant of local behavior, the quantization result does not depend on absolute global tempo, nor can it handle polyphony. The interaction function is a section-wise polynomial with two parameters called peak and decay; the first reflects the size of the ‘capture’ range around an integer ratio, the second represents the decreasing influence of higher ratios. It has to be stressed that all aspects of the global behavior of the system are determined completely by these parameters. A model that takes a whole temporal sequence into consideration at once is not feasible when the aim is to develop a cognitive model. Luckily, it proved quite simple to design a version of the quantizer which operates upon a window of events. In such a model tempo tracking can handle slow global tempo changes. For reasons of space this part of the connectionist quantizer will not be described here.
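A hedged sketch of the core idea, pairwise interactions that pull the ratio of neighbouring intervals towards integer values, is given below; it is emphatically not the published network or its interaction function, and the update rule, parameter names and values are invented for illustration:

```python
def relax(intervals, rounds=100, rate=0.1, capture=0.15):
    """Toy relaxation: neighbouring inter-onset intervals pull their duration
    ratio towards the nearest integer ratio (n:1 or 1:n), keeping their sum
    constant.  A crude stand-in for a constraint-satisfaction network."""
    x = list(intervals)
    for _ in range(rounds):
        for i in range(len(x) - 1):
            a, b = x[i], x[i + 1]
            r = a / b
            target = round(r) if r >= 1 else 1 / round(1 / r)
            if abs(r - target) < capture * target:
                # small step towards the 'perfect' values implied by the ratio
                new_a = (a + b) * target / (1 + target)
                x[i]     += rate * (new_a - a)
                x[i + 1] += rate * ((a + b - new_a) - b)
    return x

performed = [0.48, 0.27, 0.24, 1.03]            # hypothetical performance
print([round(v, 2) for v in relax(performed)])  # drifts towards 2:1:1:4 proportions
```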
Differences
The models described can be characterized as polar opposites in a number of respects. They are summarized roughly in the table in figure 1. The huge differences made comparing them quite hard, but in the end the work was gratifying. Because the systems are prototypical of the two main AI paradigms, the results may well generalize to other cases.
Different perspectives
Different perspectives for describing these models will now be given, each at its own level of abstraction. Some perspectives will generalize over sets of inputs or parameters, some will reduce the amount of variability by keeping certain concepts fixed. I hope to show that this search for different representations of the behavior of a computational cognitive model, conceptual as well as visual, is fruitful, even for analyzing traditional symbolic AI programs.

Longuet-Higgins musical parser    Desain & Honing connectionist quantizer
symbolic                          numerical
central                           distributed
search                            optimization
hierarchical                      heterarchical
knowledge based                   knowledge free

Figure 1 A summary of the differences of the two models under study
The most direct and raw representation of a computational model is a trace of the computation itself, an overview of how the internal state of the system changes in the course of a complete calculation as a function of the computation-time or the number of computation steps taken. A visualization of such a trace for the connectionist model is shown in Desain & Honing (1989) and for the Longuet-Higgins parser similar graphic representations can be devised. A deficit of these representations is that they can only be given for one example input at a time, and thus are extremely dependent on the choice of input: in a sense it is easy to ‘lie’ with these examples by picking one that behaves well. But, on the other hand, these representations show in full detail the ongoing processes and thus enable interpretations and hypothesis formation. At the other end of the spectrum of possible perspectives is the statistical method, reducing all the information to a number of correct responses. We can assume that, when a skilled performer plays a rhythm, the performed temporal sequence should be quantized as the presented score. Collecting a set of performances and counting the numbers quantized correctly by the model then gives us an indication of its validity. The number of correct quantizations will in general be a function of the possible parameter values given to the model. Visualization of this dependency is useful in the study of the parameter-sensitivity of the models. Often connectionist models behave badly in this respect. They might need specific parameter settings for different problems. Or they might not ‘scale up’: for larger problems the model only works for an increasingly smaller range of parameter settings specific to the problem at hand. Parameters might also have no cognitive relevance, and as such could not be used to control the global emerging behavior of the model, or they might be highly interdependent. A visualization of the parameter space can detect such problems. Both models behave well in this respect, but because of space limitations we cannot present the parameter spaces here.
Both perspectives have their drawbacks, one being too specific, the other one too general. If we give up some detail on the speed and order of processing that was available in the computation trace, and give up the free choice of musical material that was available in the parameter space we can characterize the precise behavior of the system for a family of sequences: all possible sequences of a fixed small length. This set can be considered to constitute a rhythm space: the problem space of quantization.
Rhythm space perspective
Let us consider the three-dimensional space of all possible temporal sequences of three inter-onset intervals (four bangs on a drum). Every point in this space represents a unique temporal sequence. One could envisage this space projected in a room, with one corner as the origin. The distance along one wall represents the length of the first time interval, the distance along the second wall represents the length of the second interval, and the height above the floor represents the third. In this space certain points will be perfectly metronomical sequences, other points will represent performed deviations from them. Let us call this space ‘rhythm space’ although ‘temporal sequence space’ would be more appropriate. A quantization process maps each point in this space to another; it assigns to each sequence a solution of the quantization. Thus the problem space of a quantization method is the whole rhythm space, the solution space is a set of points within this rhythm space. A further characterization of the solution space (e.g., what constraints limit the set of permissible quantizations; is, for instance, a complex temporal pattern such as 7:11:2 to be considered allowable?) cannot be given at the moment, which is part of the reason why quantization is a difficult problem to define.
Trajectories in rhythm space
If the model has intermediate processing states that are temporal sequences themselves, as is the case with the connectionist model, the computation trace becomes a trajectory through this rhythm space. Otherwise a simple straight line can indicate the mapping from problem to solution. Easy visualization requires mapping of this space to two dimensions, which can be done by assuming that the whole time-length of the temporal sequence is kept constant: the third interval then follows from the first two and the first two durations can be taken as the only independent variables: the x and y axes of a diagram. This normalization, which factors out global tempo, reduces the general applicability of the method if the theory is itself dependent on global tempo. The connectionist quantizer does not (but we admit that to model human rhythm perception accurately, it should). The Longuet-Higgins model does depend on global tempo and for this model the rhythm space can only be shown for one global tempo at a time. If all three intervals are restricted to lie between a minimum and a maximum time span, the allowed portion of the two-dimensional projection forms a parallelogram. In figures 2a and 2b the rhythm space for the two models is shown, given an input sequence of three notes between a sixteenth and a double-dotted quarter note. The whole sequence has a total duration of three quarter notes. The different solutions are indicated by small circles. Note that in the Longuet-Higgins model some solutions contain inter-onset intervals of
length zero. That is because this model interprets two onsets that happen within the tolerance as synchronous.
Regions in rhythm space
Because the behavior of the connectionist model is completely determined by a temporal sequence, any point on a trajectory will be mapped to the end point of that trajectory. This means that the connectionist model ‘carves up’ rhythm space into little compartments around each solution. Each compartment or region contains all the sequences that will be quantized equivalently. Now we can abstract from the trajectory from initial state to solution and only characterize the compartments. These areas, the so-called basins of attraction, can be shown as a partitioning of the
Figure 2a Trajectories through rhythm space in the connectionist model
Figure 2b Mapping in rhythm space in the traditional AI model
rhythm space, as is depicted in figure 3a. In the Longuet-Higgins model the solution itself does not always lie within the region that maps to that solution. But still the region of all points that map to the same solution (the equivalence classes of the mapping) can be shown as in figure 3b. We can now check the differences between the models. For example, the region that maps to the rhythm of three quarter notes (marked A in the figures) is much larger in the Longuet-Higgins model than it is in the connectionist model. Another difference is the behavior around the region marked C in figure 3a, which corresponds to a 2:1:2 rhythm. This solution is not present in the Longuet-Higgins model, because it is based on a five-fold division.
Influence of context
A good way to understand the influence of context (previously presented musical material) is to consider how these maps change under its influence. In figure 4a, a context of two dotted quarter notes was presented before the notes shown in the rhythm space. This context heavily biases the behavior of the connectionist model to quantize the following inter-onset intervals into subdivisions thereof, as is shown by the enlargement of the area marked B. The area marked C completely disappears in the light of the contextual evidence for these subdivisions. The quantization in the Longuet-Higgins model is also influenced (or even guided) by context. This is done by propagating the
established meter to the processing of the remaining data. Using a duple meter as context, a very similar distortion of the regions in rhythm space can be seen (figure 4b).

Influence of parameters

These maps can also be used to understand the influence of the parameters of the models, by interpreting how the maps change under different parameter values. In figure 5a the so-called peak parameter of the connectionist quantizer is slightly increased. This yields a denser map of smaller regions and adds new regions around ‘difficult’ rhythms. In the Longuet-Higgins model we can achieve a similar change by decreasing the tolerance. The model will then behave more ‘precisely’ and new solutions, with small regions around them, emerge.

Cognitive interpretation

To judge the cognitive validity of the models we really need to compare these maps to the corresponding maps for the human listener. In principle it is possible to obtain this empirical data in categorical perception experiments, presenting subjects with temporal sequences from the space in a transcription task. But mapping out the whole space would be an enormous task, even for such short sequences. Data about the borderlines between some regions can be found in Schulze (1989) and Clarke (1987).
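In practice such maps can be approximated by brute force. The sketch below (my own illustration; the quantizer shown is a deliberately crude grid-rounding stand-in, not the connectionist or the Longuet-Higgins model, and the bounds simply echo the example discussed above) samples the projected rhythm space on a grid and groups the sample points by the solution they map to, which is essentially how region plots of this kind can be produced.

```python
# Sketch (mine): partition a sampled rhythm space into regions that share a
# quantized solution.  The quantizer below is a crude stand-in (rounding to a
# sixteenth-note grid), NOT either of the two models discussed in the text.

def quantize(intervals, grid=0.25):
    """Placeholder quantizer: snap each interval to the nearest grid unit."""
    return tuple(max(grid, round(i / grid) * grid) for i in intervals)

def region_map(total=3.0, lo=0.25, hi=1.75, steps=40):
    """Sample all three-interval sequences with the given total duration and
    interval bounds (a sixteenth to a double-dotted quarter, as in the text);
    return a dict mapping each solution to its 'region' of (x, y) samples."""
    regions = {}
    step = (hi - lo) / steps
    for i in range(steps + 1):
        for j in range(steps + 1):
            x = lo + i * step
            y = lo + j * step
            z = total - x - y
            if lo <= z <= hi:                 # stay inside the parallelogram
                solution = quantize((x, y, z))
                regions.setdefault(solution, []).append((x, y))
    return regions

for solution, points in sorted(region_map().items()):
    print(solution, len(points), "sample points")
```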
Figure 3a Regions in rhythm space of the connectionist model
Figure 3b Regions in rhythm space in the traditional AI model
Figure 4a Influence of context (two dotted quarter notes) in the connectionist model
Figure 4b Influence of context (duple meter) in the traditional AI model
Figure 5a Influence of the peak parameter in the connectionist model
Figure 5b Influence of the tolerance parameter in the traditional AI model

Expectancy space perspective

The previous representations were based on the abstraction of a whole temporal sequence that served as input to the system. Since the full models work incrementally, a representation that makes explicit how a previously established context influences future decisions would be useful. We have to ignore here any influence of new incoming data back on the previously processed results, which is a reduction for both models. In the full process model of the connectionist quantizer we can ‘clamp’ the whole of the network state to the partial solution obtained and study what would happen to a new incoming onset. This virtual new onset, acting as a measuring probe, will be moved by the model to an earlier or a later time. If it is given a positive time shift, to a later time, the model clearly had not yet ‘expected’ an event. If we postulate a measure of expectation of an event, it has to be larger at a later time for this ‘early’ event. Vice versa: a negative movement, a shift to an earlier time, indicates a dropping expectancy: the event is late. So we can integrate the movement to yield an expectancy measure. It forms a curve with peaks at places where an event, were it to happen there, would stay in place. We could also rephrase this explanation in terms of potential and energy. The potential curve projected into the future by the network is then the inverse of the expectancy. But in the context of cognitive models expectancy seems a more appropriate concept. This process
of calculating an expectancy can even be done in an incremental way: the expectancy is calculated until a real new event happens, that event is added to the context, and the process starts all over again. In figure 6 this curve is shown for a rhythm in 2/4, and the peaks in between and at the note onsets are clearly positioned at important metrical boundaries. Note that for the sake of clarity the input sequence is already idealized here to a metronomical performance. To show that these curves indeed capture an abstract property of the input data, we can look at the last part of the curve in figure 6 (the last measure, between time 16 and 20) and study it for different 2/4 contexts, as is done in figure 7. It shows how the different rhythms project a very similar expectancy into the future. This even prompts the challenging thought that these curves constitute a kind of rhythmic ‘signature’ that can be compared to produce a kind of distance measure, a metric, of rhythms. One further corroboration of the usefulness of these curves is shown in figure 8. Here the expectancies of two rhythms are compared: one in 6/8 (a division in 2 and then in 3) and the other with time signature 3/4 (a division in 3 and then in 2). The prominent peak in the curve of the first one is located at half the measure length; lesser peaks appear at 1/6 and 2/6, and at 4/6 and 5/6. The curve of the second one has prominent peaks at 1/3 and 2/3 of the measure, and somewhat less pronounced peaks at 1/6, 1/2 and 5/6. These findings clearly correspond with the musical notion of the importance of the different points in time given these meters. For the Longuet-Higgins model it is a bit difficult to ‘clamp’ the internal state to a partial solution because of possible backtracking. However, no backtracking can take place across beat boundaries, and after each beat the model only propagates the established meter and the length of a beat (the tempo) to the processing of the next beat, expecting them to apply there too. So given a beat length and a meter, they will determine the points in time where notes will be found and assigned to a metrical level. Together with the resolution of this decision (the tolerance), a comparable kind of expectancy of future onset times can be postulated.
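A rough idea of the probe-and-integrate procedure for the connectionist model can be given in code. In the sketch below (my own; probe_shift is a toy stand-in that merely attracts the probe toward an assumed beat, whereas the real model would derive the shift from the clamped network state and the context) the shift given to a virtual onset is integrated over time, so that the curve peaks wherever a probe would stay in place.

```python
# Sketch (mine): an expectancy curve obtained by moving a virtual probe onset
# through time and integrating the shift the model would give it.  probe_shift
# is a toy stand-in, NOT the connectionist network.

def probe_shift(context, t, beat=1.0):
    """Signed time shift given to a virtual onset at time t (toy version:
    attraction toward the nearest multiple of `beat`; `context` is unused
    here but would drive the real model)."""
    nearest = round(t / beat) * beat
    return 0.5 * (nearest - t)

def expectancy_curve(context, t_start, t_end, dt=0.05):
    """Integrate the probe shift: a positive shift (probe pushed later, the
    event is 'early') means expectancy is still rising, a negative shift
    means it is falling, so the integral peaks where a probe stays in place."""
    curve, expectancy, t = [], 0.0, t_start
    while t <= t_end + 1e-9:
        expectancy += probe_shift(context, t) * dt
        curve.append((round(t, 2), round(expectancy, 4)))
        t += dt
    return curve

context = [0.0, 1.0, 1.5, 2.0]        # previously quantized onsets (arbitrary)
for time_point, value in expectancy_curve(context, 2.0, 4.0)[::10]:
    print(time_point, value)
```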
Figure 6 Expectancy of onsets in the connectionist model
Figure 7 Expectancy of onsets in different 2/4 contexts in the connectionist model
Figure 8 Expectancy of onsets in 6/8 versus 3/4 rhythm in the connectionist model
Figure 9 Expectancy of onsets in 2/4 context of the two models

Of course the expectancy can only be given on an ordinal scale: onsets at higher metrical levels are expected ‘more’. In figure 9 such a measure is shown for a twofold two-division (2/4 meter) and a beat length of one bar, together with the expectancy curve of the connectionist model from figure 6. It is striking to see how the peaks in both curves now coincide. I feel that this is the point where the two models meet. Meter, a symbolic, structural concept at the very heart of the Longuet-Higgins parser, emerges out of the global and abstracted behavior of the connectionist quantizer. Here we are on the verge of the possibility of ‘reading out’ symbolic representations from a sub-symbolic model.
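For the symbolic model this ordinal expectancy can be made explicit quite directly. The sketch below (mine; the representation of meter as a list of successive divisions is an assumption) assigns each position in the bar the highest metrical level on which it falls, reproducing the peak pattern described above for 6/8 and 3/4.

```python
# Sketch (mine): ordinal expectancy for the symbolic model, given a bar length
# and a list of successive divisions ([2, 2] for 2/4 in eighths, [2, 3] for
# 6/8, [3, 2] for 3/4).  Smaller level numbers mean higher metrical levels,
# i.e. onsets that are expected 'more'.
from fractions import Fraction

def metrical_expectancy(divisions, bar_length=1):
    """Return {position in bar: metrical level}, level 0 being the bar onset."""
    expectancy = {Fraction(0): 0}
    unit = Fraction(bar_length)
    for level, div in enumerate(divisions, start=1):
        unit /= div
        pos = unit
        while pos < bar_length:
            expectancy.setdefault(pos, level)   # keep the highest level reached
            pos += unit
    return expectancy

print(sorted(metrical_expectancy([2, 3]).items()))   # 6/8: main peak at 1/2
print(sorted(metrical_expectancy([3, 2]).items()))   # 3/4: peaks at 1/3 and 2/3
```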
Conclusion

It is possible to represent the behavior of two incompatible models of rhythm perception, the symbolic Longuet-Higgins musical parser and the Desain & Honing connectionist quantizer, in different perspectives that make them comparable. These perspectives—the process state trace, the parameter and rhythm space, and the expectancy perspective—highlight different aspects of the models. Visualizations of these representations turned out to be crucial, even if the dimensionality or the flexibility had to be reduced. These methods also showed the richness of the topic of quantization, a process that lies at the heart of rhythm perception. It is central because it separates two fairly different kinds of timing data: the discrete and the continuous, each of which forms the postulated
input of different theories of temporal perception. It is well known that the concept of meter is of great importance in encoding, interpretation and memory in musical tasks (Palmer & Krumhansl, 1990), and it is not surprising that this symbolic concept, even though not represented explicitly in the connectionist models, is still present implicitly and can emerge from the net with the help of an appropriate measuring method.

Acknowledgements

I would like to thank the colleagues who helped in this research: Eric Clarke for providing a very stimulating research environment at City University. Christopher Longuet-Higgins for encouragement and fruitful discussions about his parser. Steve McAdams for his support. Michel Koenders, who automated the categorical perception experiment, and Jeroen Schuijt, who programmed a test suite for the quantizer, for their work and enthusiasm. Siebe de Vos and Peter van Oosten for commenting on drafts of this text. And especially Henkjan Honing for answering my midnight telephone calls.
Notes

1. The solution is the rhythm:
References

Bharucha, J.J. & Olney, K.L. (1989) Tonal Cognition, Artificial Intelligence and Neural Nets. Contemporary Music Review, 4.
Boulanger, R. (1990) Conducting the MIDI Orchestra, Part 1: Interviews with Max Mathews, Barry Vercoe, and Roger Dannenberg. Computer Music Journal, 14 (2).
Chowning, J., Rush, L., Mont-Reynaud, B., Chafe, C., Schloss, A. & Smith, J. (1984) Intelligent Systems for the Analysis of Digitized Acoustical Signals. CCRMA Report No. STAN-M-15.
Clarke, E.F. (1987) Levels of Structure in Musical Time. Contemporary Music Review, 2 (1).
Clarke, E.F. (1987) Categorical Rhythm Perception: An Ecological Perspective. In A. Gabrielsson (Ed.), Action and Perception in Rhythm and Music. Stockholm: Royal Swedish Academy of Music, vol. 55.
Dannenberg, R.B. & Mont-Reynaud, B. (1987) An On-line Algorithm for Real Time Accompaniment. Proceedings of the 1987 International Computer Music Conference. San Francisco: Computer Music Association.
Desain, P. & Honing, H. (1989) Quantization of Musical Time: A Connectionist Approach. Computer Music Journal, 13 (3); also to appear in (Todd & Loy, forthcoming).
Desain, P., Honing, H. & de Rijk, K. (1989) A Connectionist Quantizer. Proceedings of the 1989 International Computer Music Conference. San Francisco: Computer Music Association.
Desain, P. & Honing, H. (forthcoming) The Quantization Problem: Traditional and Connectionist Approaches. In Musical Intelligence, edited by M. Balaban, K. Ebcioglu and O. Laske. Menlo Park: AAAI book.
Desain, P. (1990) Parsing the Parser, a Case Study in Programming Style. Computers in Music Research, 2.
Fodor, J.A. (1975) The Language of Thought. New York: Crowell.
Longuet-Higgins, H.C. (1987) Mental Processes. Cambridge, Mass.: MIT Press.
Palmer, C. & Krumhansl, C.L. (1990) Mental Representations for Musical Meter. Journal of Experimental Psychology: Human Perception and Performance.
Rumelhart, D.E. & McClelland, J.L. (Eds.) (1986) Parallel Distributed Processing. Cambridge, Mass.: MIT Press.
Schulze, H. (1989) Categorical Perception of Rhythmic Patterns. Psychological Research, 51.
Todd, P.M. & Loy, D.G. (Eds.) (forthcoming) Music and Connectionism. Cambridge, Mass.: MIT Press.
Computer perception of phrase structure

Robert Fraser
Colchester Institute, School of Music, UK

Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 255–266
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
A computer algorithm for modelling phrase perception in a monophonic rhythm is described and its relationship to phrase structure in general is discussed. The algorithm groups durations by proximity of attack and by connections between relatively long notes, producing a hierarchy of groupings and identifying durational phrases among these. The connection between longer notes is also an aspect of metrical perception. Supplementary metrical information enables durational phrases to be read within a combined hierarchical representation of meter and phrasing, in tree-diagram form. In a wider context, structural characteristics perceptually stronger than duration, such as repetition, dissonance and resolution, and cadence, are seen to be responsible for the predominant phrase grouping when incongruent with durational phrasing. Despite the algorithm’s limitations, durational phrasing provides the foundation for a broader model of phrase perception.

KEY WORDS: Rhythm, Phrase grouping, Hierarchical structure, Computer modelling.
Introduction

This paper is concerned with the computer modelling of phrase grouping: that is, with analytical operations that are analogous to (but not necessarily the same as) the human perceptual processes and preferences that result in the grouping together of notes into phrases. The computer program discussed below does not deal with every aspect of phrase grouping, but merely identifies durational phrases in a single-line rhythm, in accordance with a simple grouping algorithm and various pre-defined “Closure Rules”. The wider purpose of the paper is to set these durational grouping mechanisms within the context of a more general theory of rhythmic structure and its perception (Fraser, 1982), of which they form the first stage, and to illustrate some aspects of the relationship between durational phrases and other characteristics which influence phrase grouping in general.
The term “phrase grouping”, or simply grouping (Lerdahl & Jackendoff, 1983), distinguishes the notion of phrases as relatively closed, or self-contained, musical units from that of the articulated phrasing associated with performance. The present view is that articulation is simply one of several characteristics which give rise to phrase grouping. While it may greatly assist the communication of phrases, it is normally an essential part of rhythmic structure only when it runs counter to other characteristics; it may therefore be taken into account at a more advanced stage of investigation. It is a part of standard musical theory to think of phrase structure as a hierarchy, with small phrases being grouped together to form larger ones and these in turn becoming part of larger musical “sentences”, “paragraphs” and sections. From this model come two important qualifications to the sense of closure: a phrase may be (a) closed in one sense and open-ended in another, an obvious example being the first half of a Classical antecedent-consequent phrase, or (b) closed on one level of the hierarchy and open on the next higher one (Meyer, 1973). Although the hierarchical principle is central to the concept of grouping, it is important to keep in mind that large-scale musical organisation is not necessarily hierarchical, for example in music based on variation, contrast, or continuous changes in texture or timbre. Furthermore, the structural importance of phrase groupings as musical units has not yet been satisfactorily established by musical theory, except where they coincide with tonal cadences, as they must share the stage with other segmentations of equal importance: harmonic, metrical and thematic units and the partitioning of pitch and timbral streams into musical lines (McAdams and Bregman, 1985). Since phrases are most often contained within individual musical parts, re-scorings such as Webern’s arrangement of the Ricercare from Bach’s Musical Offering can, in creating alternative phrases, appear to undermine the relevance of those groupings most apparent in the original, by drawing attention to the many other arbitrary segmentations possible. Nevertheless, phrase grouping is empirically an important structural principle in its own right, the investigation of which may help to shed light on its relationship to other aspects of musical organisation, especially meter and tonal harmony, which is fundamental to several recent lines of research (Schachter, 1980; Lerdahl & Jackendoff, 1983; Marsden, 1987; Ebcioglu, 1988).
Pre-metrical grouping of durations

Considered on its own, the rhythm of a single musical line has only one feature that could account for any possible grouping: the different lengths of its notes, which describe what may be termed its duration contour. In the computer program, these duration values are coded proportionally, as in musical notation, but using integers such that the shortest duration typically has the value 1. For present purposes, a duration may be defined as the timelength between adjacent attack points of a given musical line, rests being subsumed by the preceding note as an aspect of articulation. One of the main musical features of a rhythm is the sense of movement from shorter to longer durations, in which the latter may act as points of closure, capable of absorbing the “momentum” of the preceding shorter notes. Thus one of the two hypotheses on which the grouping
algorithm is based is that notes tend to be grouped together according to the relative proximity of their points of attack (Figure 1). This notion is familiar as an example of the Gestalt principle of proximity (Sloboda, 1985) and is supported by experimental evidence (see Fraisse, 1982).
Figure 1 Grouping by proximity of attack.

It also forms the basis of previous theories of durational grouping, such as those of Cooper & Meyer (1960), Tenney & Polansky (1980) and Lerdahl & Jackendoff (1983). The first and last of these consider several different forms of grouping simultaneously in their analyses, without sufficient distinction between them to constitute the explicit relationships needed for a computer algorithm; Lerdahl & Jackendoff’s theory has nevertheless, with some modifications, been developed into at least two computer programs (Rosenthal, 1988; Baker, 1989). Tenney & Polansky’s theory is also implemented as a computer program, though it is difficult to equate the mathematical basis of the algorithm with the human experience of rhythmic grouping. The second hypothesis underlying the grouping algorithm is that we make mental connections between non-adjacent relatively long notes (Longuet-Higgins & Lee, 1982; Sloboda, 1985, p. 187; Povel & Essens, 1985). Such notes are often called agogic accents (Riemann, 1884), though the sense of accent is not dynamic but rather the more general one of being “marked for consciousness in some way” (Cooper & Meyer, 1960, p. 8), in this case by relative length. An accent in this general sense may be termed a structural accent, meaning simply a type of accentuation that gives rise to structure; such accents are potentially metrical, but may instead be perceived as cross-rhythmic (Fraser, 1982). Lerdahl and Jackendoff (1983, p. 17) call these “phenomenal accents” and assign the term “structural accent” to pitch and harmonic events in tonal music which articulate “structural beginnings” at the start of phrase groupings and cadences at the end (1983, pp. 30–35).
Figure 2 Schubert, Arpeggione Sonata: pre-metrical grouping of durations. An asterisk denotes a phrase terminator, the grouping being the entire tree subtended from that level’s branches.

The grouping algorithm is initially pre-metrical (i.e. non-metrical) and is hierarchical, proceeding from bottom to top, with the various durations being grouped in ascending order of size. A typical example is illustrated by Figure 2, although the actual program output is in the form of coded numbers, equivalent to the tree-diagram. On each hierarchical level, the shortest timelength (interval between attacks) is found and notes separated by this timelength are connected together. (The timelengths are shown on the left of Figure 2.) Notes of the smallest remaining duration value that have been connected in this way to subsequent notes are then eliminated from the analysis, as if replaced by rests, and the procedure is repeated to form the next higher level. Figure 2 shows that on each level the separate tree structures, with the exception of the last branch, depict either the grouping of a duration with the following note or the connection of a structurally accented duration to a subsequent non-adjacent note. A note connected to a following shorter one is coded to allow for both separation and continuity (shown by a dotted line in Figure 2). The last branch normally has the longest duration value of the structure, and connections to a final shorter note are disregarded by the program. However, such connections are upheld (as in Figure 3, grouping A) if that note’s own connection to a subsequent note would otherwise be made over a longer time-span (Figure 3, grouping B). A structure on one level may be linked to one or more structures on other levels by notes in common (vertically aligned branches) to form a composite tree. A primitive form of grouping may thus be read as the string of durations specified by such a tree, that is, a connected structure on one level and all linked structures subtended from it. An example in Figure 2 is the string 4 3 1 5, subtended from the structure on the third level (timelength 8).
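A much simplified reconstruction of this level-by-level procedure is sketched below (my own reading of the description above, not the author's program; in particular the elimination step is simplified to removing, among the notes that start a connection, those with the smallest duration value). It records which attacks are connected on each level but builds neither the full tree nor the phrase analysis.

```python
# Sketch (my simplified reconstruction): level-by-level connection of attack
# points, eliminating on each level the connected notes of smallest duration.

def premetrical_levels(durations):
    """durations: proportional integer values of a monophonic rhythm.
    Returns a list of (timelength, connections) pairs, one per level,
    where connections are (from_note_index, to_note_index) pairs."""
    onsets = [sum(durations[:i]) for i in range(len(durations))]
    active = list(range(len(durations)))        # notes still in the analysis
    levels = []
    while len(active) > 1:
        gaps = [onsets[active[k + 1]] - onsets[active[k]]
                for k in range(len(active) - 1)]
        shortest = min(gaps)
        links = [(active[k], active[k + 1])
                 for k, gap in enumerate(gaps) if gap == shortest]
        levels.append((shortest, links))
        # eliminate, among the notes that start a connection, those with the
        # smallest duration value, as if replaced by rests (a simplification)
        smallest = min(durations[i] for i, _ in links)
        drop = {i for i, _ in links if durations[i] == smallest}
        active = [i for i in active if i not in drop]
    return levels

for timelength, links in premetrical_levels([2, 1, 1, 2, 2, 4]):
    print("timelength", timelength, "connections", links)
```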
Figure 3

Although such primitive groupings normally end on the longest duration, they do not always specify phrases. The program therefore tests each grouping against several predefined Closure Rules, which attempt to define durational phrases:

Closure Rule 1: A separate grouping, whose last duration has the greatest value of the grouping, is a durational phrase if the grouping contains at least one smaller duration value.
Closure Rule 2: If the duration following the last note of a grouping is of equal or greater length, the grouping is not a durational phrase. This rule does not apply if the grouping’s last note falls at the end of a metrical unit whose duration string is also a durational phrase according to Closure Rule 1.

Closure Rule 3 (Repetition): A grouping that can be subdivided into two (or more) identical duration strings is not a durational phrase.
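The three Closure Rules can be read as simple predicates over a duration string, as in the sketch below (mine; the metrical qualification to Closure Rule 2 is not modelled, and the test values echo the examples discussed in the text).

```python
# Sketch (mine): the Closure Rules as predicates over a list of durations.

def closure_rule_1(grouping):
    """Rule 1: the last duration is the greatest of the grouping and at
    least one smaller value occurs."""
    return grouping[-1] == max(grouping) and min(grouping) < grouping[-1]

def closure_rule_2(grouping, following=None):
    """Rule 2: fails if the duration following the grouping is of equal or
    greater length (the metrical exception is not modelled here)."""
    return following is None or following < grouping[-1]

def closure_rule_3(grouping):
    """Rule 3 (Repetition): fails if the grouping is two or more repetitions
    of the same duration substring."""
    n = len(grouping)
    for size in range(1, n // 2 + 1):
        if n % size == 0 and grouping == grouping[:size] * (n // size):
            return False
    return True

def is_durational_phrase(grouping, following=None):
    return (closure_rule_1(grouping) and
            closure_rule_2(grouping, following) and
            closure_rule_3(grouping))

print(is_durational_phrase([4, 3, 1, 5]))            # True
print(is_durational_phrase([1, 1, 2], following=4))  # False (Closure Rule 2)
print(is_durational_phrase([1, 1, 2, 1, 1, 2]))      # False (Closure Rule 3)
```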
Groupings which are identified as durational phrases are coded accordingly by the program; in the tree-diagrams these are indicated by asterisks. In Figure 2 each primitive grouping is also a durational phrase by Closure Rule 1.
Figure 4 Mozart, Symphony No. 40. Some pre-metrical connections are also metrical; others, such as the top level, are not.

The other two Closure Rules are illustrated by Figure 4. In the string 1 1 2 4 the substring 1 1 2 is not a durational phrase by Closure Rule 2 (although one could argue that it displays temporary closure, especially at a very slow tempo, and also as a repetition of the prior phrase grouping 1 1 2). The grouping 1 1 2 1 1 2 is invalid as a durational phrase by Closure Rule 3, as is the string of the example as a whole. The string 1 1 2 1 1 2 1 1 2 4 is a durational phrase at the level of the timelength 4, as is indicated by the asterisk, even though the highest-level branch associated with its final note is on the next level down.
Metrical implications

It is noticeable in the examples of Figures 2 and 4 that many of the connections, for example those on the level of the timelength 4 in both examples, fall on main beats; one can go further and say that in these instances the connections between structural accents of duration are responsible for the perception of metrical accents on the notes concerned. Similar metrical links between relatively long notes are made by the metrical perception programs of Longuet-Higgins and his collaborators (Longuet-Higgins and Steedman, 1971; Longuet-Higgins, 1976; Longuet-Higgins and Lee, 1982), but further constraints are incorporated into these programs both to avoid dubious upbeats and to resolve conflicts between the competing claims of different “longer” notes. In particular, an
initial regular meter (such as the minim pulse indicated by the timelength 4 in Figure 4) is normally given precedence over a subsequent long duration that is incongruent with it (such as the minim in bar 2 of Figure 4), resulting in a syncopation rather than a reappraisal of the meter. However, it is not always clear, on the basis of durational information alone, when a program should prefer syncopation to an unsyncopated solution or perceive upbeats instead of a main beat on a relatively short note; these questions are discussed in detail by Lee (1985). As the present program is concerned with phrase grouping, no metrical constraints have been applied and Figures 2, 4, 5 and 6 therefore present confusing information if read metrically. Figure 2 appears to put forward a two-bar metrical unit (length 16) beginning on bar 2, with a one-bar upbeat, whereas in fact the top level merely shows two lower-level groupings united to create a longer phrase.
Figure 5 Wagner, Prelude to Die Meistersinger. The pre-metrical bracket on the fourth level (6) would be lost in a metrical interpretation.

In Figure 4 the top-level beam would be incompatible metrically with the branches on the minim level (4); likewise in Figure 5 the connection on the fourth level (6) in bar 4 is incompatible with the continuation of the pulse established on the minim level (4) and the semibreve level (8). In an example where there is no clear regular meter, such as the fugue subject of Figure 6, the connections between structural accents could be seen as putting forward an irregular metrical structure, but it is by no means certain that they are likely to be perceived as such. Un-metrical connections not only indicate interesting subsidiary structures but also highlight potential discrepancies that a metrical program would have to resolve by factors such as a preference for regular over irregular meter or the influence of characteristics present in the harmony, melody or accompanying rhythm; these aspects are beyond the scope of this paper and are discussed at length elsewhere (Fraser, 1982).
Figure 6 Bach, Well-Tempered Clavier, Bk. 1; Fugue in F sharp minor.

Durational phrasing within rhythmic structure

There is no suggestion that in metrical music durational phrases are formed by pre-metrical connections independently of their metrical context. The perception of meter and the perception of phrase grouping go hand in hand, with phrases being understood within the established meter and phrase lengths influencing the next level of meter (if any). Thus a further refinement of the program is the input of metrical information, so that durational phrase groupings may be read within a metrical context, irrespective of how this has been derived. Working from the lowest level upwards, metrical levels and their branches are merged with the pre-metrical ones as follows: (1) if a metrical branch is already present as a pre-metrical one, the latter is retained; (2) if a pre-metrical branch is contradictory with a metrical one on the same level, it is superseded by the latter; (3) pre-metrical branches that are not vertically aligned above established branches on the next lower level are bypassed. In fact, the entire pre-metrical analysis is stored separately from the metrically adjusted one, both to preserve superseded structures as subsidiary ones and so that the durational grouping analysis may be transferred to the new structure. This is done by identifying the highest-level (metrical) branches that subtend the grouping as a whole, and placing a phrase marker at that level. The addition of metrical information means that phrase groupings can be associated with a particular metrical level. Figure 7 illustrates how duration closure is identified at various levels: for example, the first four-bar phrase is indicated by the first asterisk at the metrical level 16.
Figure 7 Haydn, Symphony No. 104. Added metrical information (thin-line brackets) is combined with compatible pre-metrical brackets (bold lines) to show the phrase structure in a metrical context.

Its two highest-level branches at this level, together with all of their subtended branches, specify the duration string of the phrase, just as their pre-metrical equivalents do, though the latter are placed on different notes (the second and third durations 4). The metrical timelength associated with the whole phrase (32) is not assigned until the next level, even though the phrase is first identified one level down, where it can be counted metrically as 4×8. The sub-phrases within this phrase are likewise associated with the metrical timelengths 4 4 8, an anapestic phrase contour which can be reduced proportionally to the pattern 1 1 2. The association of phrase groupings with certain metrical levels also opens up the possibility of grading the lowest levels. Since one or two metrical levels normally predominate, as determined by tempo—the idea of the tactus—groupings below these levels are usually of lesser interest, in context. Conversely, at higher levels of phrase organisation, such as 32 bars, a quasi-metrical connection on the level above the phrase may have little significance other than to create a larger grouping; while we do make such groupings, we would seem to become progressively less aware of the length of the phrase as a single timelength associated with the phrase’s first note; such awareness or otherwise is also affected by tempo. The tendency for phrase groupings to remain the same within different metrical contexts is not lost in the combined representation. However, in some examples containing adjacent equal durations, the imposition of different meters (for example by a longer-note accompaniment) can produce different groupings; the metrical qualification to Closure Rule 2 is invoked in such cases. Two types of durational phrase arising in a metrical context, which are currently not recognised by the algorithm because they do not end on the longest duration of the phrase, are matters for future investigation. One involves the recognition of a metrical contour, by distinguishing between regular metrical pulses (on the same level) that fall on notes and those that do not. In Figure 8, at the crotchet level (4), this contour would be 1 2 1 1 2 … (crotchet equals 1). The application of Closure Rule 1 to this rhythm would produce an initial two-bar phrase, ending on the dotted crotchet (6), that is not recognised by the pre-metrical analysis. This phrase is indicated by the asterisk on the top level (16) of Figure 8, whose structure is otherwise as found by the program.
Figure 8 Chopin, Sonata in B minor. The opening two-bar phrase indicated by the top-level bracket is not found by the program and requires an analysis of the crotchet-level rhythm.

The second type is what Cooper and Meyer (1960, p. 30) call a “closed trochee”. A subtree separated from a preceding longer note and forming an upbeat to the next metrical accent (Figure 9a) can itself have closure that counteracts the previous separation and suggests an alternative grouping (Figure 9b), particularly in conjunction with other characteristics. Such rhythms may be seen as inherently ambiguous in respect of phrase grouping.
Figure 9 Closed trochee.

Relationship to other structural characteristics

Although the grouping algorithm provides a means of identifying durational phrases and of grading these into a hierarchy of structural levels, it merely considers one structural characteristic of music in isolation. In a wider musical context, including pitch and harmonic information, it is soon found to give an inaccurate account of the phrase grouping in many cases. Rather than assume the method to be inherently flawed, it is preferable to see phrase grouping as a product of several structural characteristics, each with its own perceptual “strength”. When these come into conflict (are cross-rhythmic) it is normal for one grouping to prevail and for the alternatives either to be lost, if they are very weak, or to be perceptible as subsidiary structures. Several writers, including Cooper
& Meyer (1960) and Lerdahl & Jackendoff (1983), draw attention to the existence of different factors in the perception of rhythmic structure, but do not attempt to distinguish these in relative strength. This has been seen as a major shortcoming in Lerdahl & Jackendoff’s theory (Clarke, 1986); they appear to place equally weighted phrase boundaries, derived by various means, without provision for their incongruence. A detailed study of the relationships between the various characteristics that influence phrase grouping may be found elsewhere (Fraser, 1982); for the present, it is sufficient to summarise some findings of the earlier work, in noting some instances of characteristics that are stronger than durational grouping when incongruent with it:

1. Dissonance and resolution. Since the dissonance is understood as an inflection of its chord of resolution, the two are almost always bound together in the same phrase (Figure 10). In a dotted-note rhythm, however, especially at a fast tempo, the extreme durational separation can predominate with interesting results (Chopin, Andante Spianato, op. 22, b. 187–8).
Figure 10 Mozart, Sonata in F, K. 332. Dissonance and resolution (d-r) are normally stronger than durational grouping; the dual branch in bar 8 reflects phrase elision at the cadence.

2. Tonal cadence. Like dissonance and resolution, a cadence cannot normally be split by durational separation; however, a dual branch is often useful, as in Figure 10, bar 8, where both groupings are relevant and the “structural” duration, as it were, of the cadence note is that of a whole metrical unit, rather than that of the surface rhythm.

3. Repetition. A whole phrase need not be repeated for repetition to be established, although for a point of durational repetition to be recognisable, it must involve at least one change of duration value that could make the repetition distinctive. (Steedman, 1977, attempts a more explicit definition for the purposes of metrical perception.) Such a point of repetition is one of the strongest ways of establishing a phrase boundary, normally at the same level as the original statement. Figure 11 illustrates the predominance of the repetition of an upbeat structure, where the durational closure would precede the repeated upbeat. However, in the opposite case, if notes preceding the point of repetition are strongly bound by durational grouping to a note of duration closure within the proposed upbeat, then a phrase boundary at the point of repetition is unlikely to be asserted (unless aided by articulation), although the repetition is still likely to be perceptible.
Where a point of repetition falls on the first note of a metrical unit, the same conditions apply as for an upbeat repetition; however, it is common for a note or notes to be grouped durationally to the point of repetition, as an upbeat, so that the repeated phrase has to be understood in two senses: both with and without its upbeat, the latter structure being much stronger, due to the repetition. Being relatively weak, such upbeat notes tend to be understood as a link, rather than a true upbeat (Figure 10, bar 8). In Figure 5, bar 4, the durational upbeat to the point of repetition in bar 5 is almost separated from it by a phrase boundary, due to its second function as an extended resolution.
Figure 11 Brahms, Symphony No. 3. Repetition produces a two-bar grouping on the third level (12), as opposed to the program’s groupings ending on the dotted-crotchets (12).

It is obvious that repetition need not be exact to be recognisable, but this leads to questions of reduction and variation, which are beyond the scope of this paper (Steedman, 1977).
Conclusions

The pre-metrical grouping algorithm identifies structures in a monophonic rhythm which appear to correspond to the durational phrases commonly understood by the musician, at least in certain respects. In long passages it may identify groupings that are beyond normal perceptual abilities, though these may be perceptible in conjunction with other characteristics, such as tonal cadences. The hierarchy of structural levels, derived from the connections between relatively long notes, is related to the perception of meter; however, in music in which regular meter is normally perceived, these connections may be at odds with metrical tendencies such as the continuation of an established pulse. When metrical information is included as input data, the hierarchy becomes a representation of rhythmic structure combining meter and phrase grouping. Such structures require interpretation in respect of tempo, and in many cases involving repetition the model needs further refinement to take this structural characteristic into
account. In the context of a melodic line with tonal harmonic implications, other structural characteristics which influence the predominant phrase grouping, such as the perception of dissonance and resolution and tonal cadence, would need to be included in the model, to avoid “incorrect” groupings. Although the algorithm may appear rigid and prescriptive in its identification of structures, it simply makes choices on the basis that some structures are perceptually more likely than others, in the given context. Its analyses should not be construed as depicting what a person, even a trained musician, would necessarily hear or perceive in a score. Alternative structures may be harder to perceive, and hence “weaker”, for instance the metrical connection between irregularly spaced longer notes in preference to a regular pulse and syncopation, but such structures are nonetheless open to observation if the individual chooses, and may be highlighted by other musical lines. Different groupings may co-exist, even though one predominates; this is easily verified in the case of incongruent durational and melodic groupings; the latter are generally weaker and subsidiary, but they are not imperceptible. There is also a distinction to be made between what any one individual can perceive and what is relevant to the understanding of a piece, however hard to perceive. For practical purposes, the algorithm works with the given rhythm as a whole, but it is quite possible to have a succession of program runs on an increasingly longer rhythm, which would go further towards modelling a “left-to-right” and “bottom-to-top” hierarchical construction; however, as we do not know the precise order of perceptual operations used by an individual, this exercise would seem of little value, except in a “real-time” program reacting to “live” musical input. The structural processes considered here have been greatly simplified by the use of rationalised durations and a single presupposed musical line; a more sophisticated model might take as its input MIDI data recorded on a sequencer, in which neither presumption can be made. In isolating separate structural characteristics and modelling their effect on structure, aspects of our own understanding of music can be revealed or verified. Although there is a complex relationship between the various characteristics, durational phrase grouping provides a suitable foundation for a broader cognitive model of rhythmic structure.
References

Baker, M. (1989) An Artificial Intelligence Approach to Musical Grouping Analysis. Contemporary Music Review, 3 (1), 43–68.
Clarke, E. (1986) Theory, Analysis and the Psychology of Music: A Critical Evaluation of Lerdahl, F. and Jackendoff, R., A Generative Theory of Tonal Music. Psychology of Music, 14, 3–16.
Cooper, G. & Meyer, L. (1960) The Rhythmic Structure of Music. Chicago: University of Chicago Press.
Ebcioglu, K. (1988) An Expert System for Harmonizing Four-Part Chorales. Computer Music Journal, 12 (3), 43–51.
Fraisse, P. (1982) Rhythm and Tempo. In D. Deutsch (Ed.), The Psychology of Music, pp. 149–180. San Diego and London: Academic Press.
Fraser, G.R.M. (1982) Rhythmic Structure in Music: a study of the perception of metrical and phrase structure, from a mechanistic viewpoint. Unpublished PhD thesis, University of Durham.
Lee, C.S. (1985) The Rhythmic Interpretation of Simple Musical Sequences: Towards a Perceptual Model. In P. Howell, I. Cross and R. West (Eds.), Musical Structure and Cognition, pp. 53–69. London: Academic Press.
Lerdahl, F. & Jackendoff, R. (1983) A Generative Theory of Tonal Music. Cambridge, Mass.: M.I.T. Press.
Longuet-Higgins, H.C. & Steedman, M.J. (1971) On Interpreting Bach. In B. Meltzer and D. Michie (Eds.), Machine Intelligence 6, pp. 221–239. Edinburgh: Edinburgh University Press.
Longuet-Higgins, H.C. (1976) Perception of Melodies. Nature, 263, 646–653.
Longuet-Higgins, H.C. & Lee, C.S. (1982) The Perception of Musical Rhythms. Perception, 11, 115–128.
Marsden, A. (1987) A Study of Cognitive Demands in Listening to Mozart’s Quintet for Piano and Wind Instruments, K.452. Psychology of Music, 15, 30–57.
McAdams, S. & Bregman, A. (1985) Hearing Musical Streams. In C. Roads and J. Strawn (Eds.), Foundations of Computer Music, pp. 658–698. Cambridge, Mass.: M.I.T. Press.
Meyer, L.B. (1973) Explaining Music. Berkeley: University of California Press.
Povel, D. & Essens, P. (1985) Perception of Temporal Patterns. Music Perception, 2 (4), 411–440.
Riemann, H. (1884) Dynamik und Agogik. Hamburg: Rahter.
Rosenthal, D. (1988) A Model of the Process of Listening to Simple Rhythms. In Proceedings of the 1988 International Computer Music Conference, pp. 243–249. San Francisco: Computer Music Association.
Schachter, C. (1980) Rhythm and Linear Analysis: Durational Reduction. Music Forum, 5, 197–232.
Sloboda, J. (1985) The Musical Mind. Oxford: Oxford University Press.
Steedman, M.J. (1977) The Perception of Musical Rhythm and Metre. Perception, 6, 555–562.
Tenney, J. & Polansky, L. (1980) Temporal Gestalt Perception in Music. Journal of Music Theory, 24, 205–229.
Critical study of Sundberg’s rules for expression in the performance of melodies

Peter van Oosten
Utrecht School of the Arts, The Netherlands

Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 267–274
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
During the last decade Sundberg, Frydén and Friberg have been working on a rule-system for the generation of expression in music. The system is constructed using an analysis-by-synthesis strategy which operates by making explicit the intuitions of a musical expert. The rules enable one to explain expression in a local musical context (e.g. two or three notes). There is still a need for more detailed structural information and for rules of a more hierarchical nature in order to explain expression from a higher structural level.

KEY WORDS: expression, music performance, timing
Introduction

Traditional musical notation has developed over about ten centuries from a primitive memory aid to an apparently precise representation of music in sounds and actions. However, modern notation is still not complete enough to reflect all the details and subtleties of a composition. What we gain by this shortcoming is a variety of interpretations, a possibility of re-creating the same piece. In fact, the deviations from the written material constitute an essential part of music communication. Evidently, discrepancies can occur as a result of too little experience or lapses of attention. For this research only those performance variations are interesting that are intentional. Because the performer is not always aware of these intentions, it is an interesting question why a performance deviates from the notation and how we can formalize the motives behind this, thus making it possible to generate a musical performance from a score. Not only is there a considerable scientific interest in this subject, but also musicians using computer music sequencers in step time feel the need for a ‘human touch’ feature in quantised sequences. Sundberg, Clynes and Todd among others have been working on this subject (Clynes, 1987; Sundberg, 1988; Todd, 1989). The project described here is a part of POCO, an environment for the study of expression in music, developed at the City University of London (Honing, 1990).
The aim of the project is to implement a rule system in POCO, based on the research of Sundberg et al., that enables a computer to generate musical performances (Thompson, Sundberg, Friberg & Frydén, 1989; Friberg, 1989). At present the focus of our research is on aspects of timing. Future directions of research will comprise the inclusion of other sound parameters as well as comparison with other models of expressive timing.
Analysis-by-synthesis

Music performance can be studied by means of two different strategies: analysis-by-measurements and analysis-by-synthesis. For the analysis-by-measurements approach, first the characteristics of the data are measured, then hypotheses are formulated on the basis of the results. Since these measurements have proved to be very complex, Sundberg uses an analysis-by-synthesis approach as an alternative. In this particular strategy a musical expert evaluates the performance of the system and proposes improvements. This method is very similar to a typical music lesson situation where a pupil is told how to improve his or her performance one step at a time. Sundberg’s strategy is formalised in the shape of a rule-based system. Thus we can extend the system by adding or combining rules and improve the rules by tuning the parameters. The analysis-by-synthesis approach is presented as a useful complement to any research that involves measurements of actual performances. It provides a unique study of the question as to whether an expert musician is able to make explicit the expressive devices that he or she uses in practice (Thompson et al., 1989).
Sundberg’s rule system

In the present rule system there are about 20 rules. Three of them are used to synchronize different voices in ensemble music. Here only those rules that act upon a single melody will be discussed. Rules can modify the sound parameters of a note and a rest. Each event (note or rest) is examined by the rule and, based on its context, a modification can be made. The rules test structural features of the music, e.g. leaps (micro structure), phrase boundaries (macro structure) etc. A modification can be proportional, an absolute value or a combination of both. A simple example is the rule faster uphill as provided by Friberg (1989): “Duration of note is shortened by 2*k ms if higher note follows and previous one is lower. This shortening is also applied on the first note starting a series of rising intervals.” Another rule, durational contrast1, increases the durational contrasts by simply shortening notes with duration between 30 and 600 milliseconds. The rule of leap articulation states that a micropause is inserted between the two notes of a leap.
The rule of phrase marks phrases and subphrases by inserting a micropause between them and by lengthening the last note of a phrase. The rule of double duration increases the duration of a short note surrounded by longer notes if the preceding note is exactly twice as long.
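To give a flavour of how such rules operate, the sketch below (my paraphrase of the verbal rule descriptions above, not the original implementation; the pitch/duration representation and the unspecified shortening and lengthening amounts are assumptions) implements three of them over a list of (MIDI pitch, duration in ms) pairs and applies them in succession, which also illustrates how independently acting rules accumulate on the same notes.

```python
# Sketch (mine): three of the rules described above, over (pitch, duration_ms)
# pairs.  Amounts not quoted in the text are guesses.

def faster_uphill(notes, k=1.0):
    """Faster uphill: shorten a note by 2*k ms when a higher note follows
    (this single condition covers both clauses of the rule as quoted)."""
    return [(p, d - 2 * k) if i + 1 < len(notes) and notes[i + 1][0] > p else (p, d)
            for i, (p, d) in enumerate(notes)]

def durational_contrast(notes, k=1.0, amount=10.0):
    """Durational contrast: shorten notes whose duration lies between 30 and
    600 ms (the amount per application is a guess)."""
    return [(p, d - amount * k) if 30 <= d <= 600 else (p, d) for p, d in notes]

def double_duration(notes, k=1.0, amount=20.0):
    """Double duration: lengthen a short note surrounded by longer notes if
    the preceding note is exactly twice as long (amount is a guess)."""
    out = list(notes)
    for i in range(1, len(notes) - 1):
        p, d = notes[i]
        if notes[i - 1][1] == 2 * d and notes[i + 1][1] > d:
            out[i] = (p, d + amount * k)
    return out

melody = [(60, 250), (62, 250), (64, 500), (62, 250), (60, 1000)]
print(double_duration(durational_contrast(faster_uphill(melody))))
```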
Evaluation of the system

At this moment a few experiments have been done with the rule-system. Two experiments are described by Thompson et al. (1989). Only small scale experiments have been done with our own implementation of the system. Nevertheless, some critical remarks can be made already. These critical comments are presented below, first in respect of general characteristics of the model and then in more specific terms.
General remarks

Is a rule-system a good model of mental architecture? Many authors have their doubts (see for example Johnson-Laird, 1989 for a general discussion). The weakness of this system becomes clear e.g. when a cocktail of rules is applied. Because all rules act independently, the effects on a note’s parameter accumulate, sometimes resulting in an unmusical performance. The action of the system is, however, partly controlled by the fact that the rules are not interacting, as is the case with many rule-systems for other applications. Sundberg recognizes this problem and solves it by carefully selecting rules and setting the parameters separately for each example. Rules can also compete with each other, in such a way that the effect of one rule is undone by another. If two rules are always each other’s complement then they must not be applied together. The problem arises when they complement one another only incidentally. What is the meaning of such competing rules in musical practice? It could mean that, in a certain context, a performer can have different interpretations. In that case he or she makes a choice and only one of the rules is then applied! The rule-system is not capable of indicating such preferences. Most models for the generation of expression are based on the hypothesis that there is a strong relation between expression and structure (Clynes, 1987; Todd, 1989). In classical music, musical structure can be seen as hierarchically organized, e.g. a piece is built of parts, sections, phrases, subphrases and so on. This is reflected in professional music education where students learn to be aware of the complete musical structure of a piece, its levels of organization and its ambiguities. Novice students, however, are given more instructions at a note level. In Sundberg’s rule-system, most rules act at note level. Perhaps they have to, because the system is still a ‘novice student’. The context of most rules is 1, 2 or 3 notes, e.g. ‘modify the duration of note A with sound parameters S, if it is preceded by B and followed by C’. Obviously, three notes are not enough to determine any kind of hierarchy. Even for local structure, a wider context is often needed. How should a rule modify a parameter: by adding a certain percentage or by adding a specific amount? In Sundberg’s publications we can observe a shift of preference from
the one to the other. At an early stage in the research most modifications were proportional, while in the latest publications, especially where duration is modified, a fixed value is added (no reasons are given). There are many reasons which could be advanced in favour of a proportional modification or a combination with a fixed value. For instance, the lengthening of the first note of a leap (as is done by the rule leap tone duration) can be felt in relation to the beat length. When the same piece is played twice as fast, then the lengthening of that same note will be proportionally less. In fact, all tempo aspects of expressive timing are ignored by the rules when only fixed values are added. Moreover, a fixed value becomes inaudible when the duration of the note is relatively long.
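The tempo argument can be illustrated numerically. In the sketch below (mine; the figures are arbitrary) a fixed lengthening in milliseconds shrinks relative to the note when the piece is played twice as fast, whereas a proportional lengthening keeps its relation to the beat.

```python
# Sketch (mine): fixed versus proportional lengthening at two tempi.

def lengthen_fixed(duration_ms, amount_ms=40):
    return duration_ms + amount_ms

def lengthen_proportional(duration_ms, factor=1.1):
    return duration_ms * factor

for tempo_scale in (1.0, 0.5):                      # 0.5 = twice as fast
    quarter = 600 * tempo_scale
    print(tempo_scale,
          lengthen_fixed(quarter) / quarter,         # relative effect shrinks
          lengthen_proportional(quarter) / quarter)  # relative effect constant
```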
Figure 1 Beethoven Sonata in A flat opus 26

More detailed observations on some rules

In our experiments two fragments of piano music have been used: Beethoven Sonata in A flat, opus 26 and Mozart Sonata in F-major, KV.332. Both fragments contain a reasonable variety of musical events (leaps, rests, repetitions, appoggiaturas).
A good example

In Sundberg’s articles many different examples of music are given; each rule in fact has its own example. Why is this? The reason may be obvious: the success of a rule strongly depends on a well-chosen example! For instance, in Friberg (1989) the rule leap articulation is applied to a bourrée by J.S. Bach. This fragment consists of 23 notes, of which 7 are influenced by the rule. Now let’s see what the effect of the rule is on the first 64 notes of the Beethoven Sonata. Surprisingly, there is no effect at all! It is stated that the rule may only be applied to notes shorter than 100 ms. Though there are enough leaps in the sonata, none of the notes has a duration shorter than 100 ms. The same goes for the rule double duration. Again there is no effect on the Beethoven sonata. When applied to the Mozart sonata, only one note is affected, namely in measure 4.
Figure 2 Mozart Sonata in F-major, KV.332

But there it is an unintentional application of the rule, because the longer note that follows is the first note of a new phrase. The rule only makes sense from a musical point of view if the three notes—long, short, long—appear in the same phrase. Thus, it is apparently possible to compose a set of rules which, when applied to these sonata fragments, has no significant effect. The experiments described in Thompson et al. (1989) are also based on well-chosen examples for each rule tested.
Examining the context of a note

Another problem with the rules is their extremely local character. All rules presented in Sundberg’s publications operate on one single note in relation to its direct neighbours. For example, leap articulation considers only two notes: the first and last note of the leap. However, from a musical point of view, it can be very important to know in what context the leap occurs, e.g. is it part of a sequence of leaps or does it stand alone; does the leap start on an upbeat or on a downbeat; what happens at a phrase boundary? Other rules such as leap tone duration, accents, melodic charge, harmonic charge and double duration also operate within a very small context.

Poco a poco

Deviations in tempo and dynamics can be made gradually or abruptly, both ways having their own psychological effect. Just as with driving a car, you can stop either by slowing down or by hitting a wall. Gradual changes are the most common in musical practice, sometimes explicitly mentioned in the score as crescendo, diminuendo, ritenuto or accelerando. If not explicitly mentioned, a performer can use gradual changes to signal specific structural events: e.g., a phrase boundary can be signalled by gradually slowing down. In Sundberg’s rule-system only two rules support this type of deviation: harmonic charge and final ritard. Other rules, such as phrase, just “hit the wall”. It is true that phrases are separated by the rule (a micropause is added and the last note of a phrase is lengthened), but the effect is not natural. In fact, it can be argued that the amount of ritenuto at the end of a phrase should be calculated in relation to the depth of nesting of the phrase; a phrase at the highest level thus getting the most ritenuto.

Melodic charge

The rule melodic charge is mentioned in many articles by Sundberg and is apparently of great importance. It is one of the few rules where the harmonic context is considered. The meaning of melodic charge2 is intuitively defined as: “…a measure of remarkableness (i.e. salience, PvO) of a tone, given the harmony over which it appears…” (Friberg, 1989). The formal description is: amplitude and duration are added in proportion to the tone’s melodic charge Cmel relative to the root of the prevailing chord. The melodic charge for the various scale tones in a C major/minor context is:

Tone:   C    G    D    A    E    B    F#   Db   Ab   Eb   Bb   F
Cmel:   0    1    2    3    4    5    6    6.5  5.5  4.5  3.5  2.5

∆R  = 0.2 * Cmel * k   [dB]   (maximum 1.3 dB for k=1)
∆DR = 2/3 * Cmel * k   [%]    (maximum 4.3% for k=1)
As a musician one would infer from the above intuitive definition of melodic charge:

– chord tones are less remarkable than non-chord tones;
– for each chord-type (major, minor, diminished etc.) values for melodic charge will be different;
– remarkableness of a tone within a chord depends on the melodic function of that tone (e.g. an appoggiatura is more remarkable than a passing note).

However, none of these conclusions is supported by the formal description of the rule. For one thing, the values increase in the first half of the circle of fifths and decrease in the second half, i.e. the circle of fifths is the starting-point instead of the chords themselves. For another, the melodic charge values are the same for each chord type. And finally, the melodic charge value of a note is independent of its melodic or harmonic function. The context is just one note: no preceding or following notes are examined. To give but a few examples: according to the rule, the chord-tone E in a C major context is more remarkable than a B flat, F, D or A. Even more peculiar, in a C minor context, the sharp dissonant E is less remarkable than the chord-tone E flat!
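For reference, the formal rule as quoted above is easily expressed in code. The sketch below (mine; indexing the table by the interval between the tone and the chord root follows from the phrase “relative to the root of the prevailing chord”, but the transposition itself is my assumption) also reproduces the E versus E flat example of the critique.

```python
# Sketch (mine) of the melodic charge rule as formally described above.
# Cmel per pitch class for a chord root of C (0 = C, 1 = Db, ..., 11 = B);
# other roots are handled by transposing the interval to the root.
CMEL_FROM_C = {0: 0, 1: 6.5, 2: 2, 3: 4.5, 4: 4, 5: 2.5,
               6: 6, 7: 1, 8: 5.5, 9: 3, 10: 3.5, 11: 5}

def melodic_charge(pitch_class, chord_root, k=1.0):
    """Return (delta amplitude in dB, delta duration in percent) for a tone,
    given the root of the prevailing chord."""
    cmel = CMEL_FROM_C[(pitch_class - chord_root) % 12]
    return 0.2 * cmel * k, (2 / 3) * cmel * k

# The chord-tone E versus the 'sharp dissonant' E flat over a C root:
print(melodic_charge(4, 0))   # E over C:  Cmel = 4
print(melodic_charge(3, 0))   # Eb over C: Cmel = 4.5
```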
Some proposals for improvement

The above observations directly lead to the following three proposals for improvement of melodic charge:

1. adjust the values of melodic charge in order to reflect the relationship of a note with the prevailing chord;
2. for each chord-type, compose a table with melodic charge values;
3. add devices to the system that can compute a note’s melodic and/or harmonic function.

The rule of phrase can easily be improved by adding two functions: one to calculate the phrase nesting and another to spread the added time over a sequence of notes.
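A minimal sketch of the second of these two functions is given below (my own proposal-style illustration, not an existing POCO routine; the ramp shape, the number of notes affected and the maximum stretch are arbitrary assumptions): the added time is spread over the last notes of the phrase, scaled by the depth of nesting so that the highest-level phrase ending receives the most ritenuto.

```python
# Sketch (mine): spread a ritenuto gradually over the end of a phrase,
# scaled by nesting depth (depth 1 = highest-level phrase end).

def spread_ritard(durations_ms, depth, notes_affected=4, max_stretch=0.15):
    """Lengthen the final `notes_affected` durations with a linear ramp,
    the largest stretch on the last note; deeper nesting (larger `depth`)
    gets proportionally less ritenuto."""
    stretch = max_stretch / depth
    out = list(durations_ms)
    n = min(notes_affected, len(out))
    for i in range(n):
        ramp = (i + 1) / n                 # grows toward the phrase end
        out[len(out) - n + i] *= 1 + stretch * ramp
    return out

phrase = [300, 300, 600, 300, 300, 600]
print(spread_ritard(phrase, depth=1))   # end of a top-level phrase
print(spread_ritard(phrase, depth=3))   # end of a deeply nested subphrase
```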
Conclusions Sundberg’s rule-system is one of the few expression-generating models. The combination of an analysis-by-synthesis approach with a rule-system is quite unique. This method results in a relatively large collection of rules, acting independently of each other, each rule adding a specific aspect of expression. The model assumes that performance rules lie behind a performer’s actions. Given this assumption, it is likely that the rules have to be more complex. Extensive analysis is needed for each note before a rule can be applied. The application of a rule has to be more flexible. A professional musician can shift from one rule to another during his performance. In other words, in a live performance a rule is generally not fixed for a whole piece. What the system achieves is local expression but, since the rules do not cooperate, there is no guarantee that the numerous details will result in a coherent unity. A rule-system as
discussed here, in conjunction with a model based on hierarchical structure, could turn out to be promising, combining detailed expression at the musical surface with a conception of the global timing profile.
Acknowledgements
I would like to thank Peter Desain and Henkjan Honing for their comments, help and advice, and all the colleagues at the Centre for Knowledge Technology in Utrecht, where this research was started. Thanks also to Marineke Scholtes for critical remarks on this text.
Notes
1. Rule names used in this text (printed in italics) are the same as those used in 'Generative rules for music performance. A formal description of a rule system' by Anders Friberg (1989).
2. In previous publications the terms 'harmonic distance' and 'tonic distance' have also been used.
References
Clynes, M. (1987) What can a musician learn about music performance from newly discovered micro-structure principles (PM and PAS)? In A. Gabrielsson (Ed.), Action and Perception in Rhythm and Music, pp. 201–233. Publications issued by the Royal Swedish Academy of Music No. 55.
Friberg, A. (1989) Generative rules for music performance. Publications issued by the Royal Swedish Academy of Music, No. 04.
Friberg, A., Sundberg, J. (1986 I) A LISP environment for creating and applying rules for musical performance. ICMC '86 Proceedings.
Friberg, A., Sundberg, J. (1986 II) Using rules to control the musical performance. Actes du Symposium "Systèmes personnels et informatique musicale", IRCAM, Octobre 1986.
Frydén, L., Sundberg, J. (1984) Performance rules for melodies. Origin, functions, purposes. ICMC '84 Proceedings.
Honing, H. (1990) POCO, An environment for Analyzing, Modifying and Generating Expression in Music. ICMC '90, San Francisco: CMA.
Johnson-Laird, P.N. (1989) The computer and the mind. Fontana Press: London.
Lerdahl, F., Jackendoff, R. (1983) A Generative Theory of Tonal Music. MIT Press.
Oosten, P.W.J. van (1990) POCO RUBATO. Een systeem voor de generatie van expressieve timing in muziek. Publications issued by the School of the Arts, Utrecht.
Oosten, P.W.J. van (1990) POCO RUBATO User Manual. Publications issued by the School of the Arts, Utrecht.
Palmer, C. (1989) Structural Representations of Music Performance. Proceedings of the Cognitive Science Society, pp. 349–356. Hillsdale, NJ: Erlbaum.
Palmer, C. (1989) Mapping Musical Thought to Musical Performance. Journal of Experimental Psychology: Human Perception and Performance, 15, pp. 331–346.
Sundberg, J., Askenfelt, A., Frydén, L. (1983) Musical performance: A synthesis-by-rule approach. Computer Music Journal, 7 (1).
Sundberg, J. (1988) Computer synthesis of music performance. In J.A. Sloboda (Ed.), Generative Processes in Music, pp. 52–69. Oxford: Clarendon Press.
Sundberg, J., Friberg, A., Frydén, L. (1989) Rules for automated performance of ensemble music. Contemporary Music Review, 3, pp. 89–109.
Thompson, W.F., Sundberg, J., Friberg, A., Frydén, L. (1989) The use of rules for expression in the performance of melodies. Society for Research in Psychology of Music and Music Education.
Todd, N.P. (1985) A model of expressive timing in tonal music. Music Perception, 3, pp. 33–58.
Todd, N.P. (1989) A computational model of rubato. Contemporary Music Review, 3, pp. 69–88.
Contribution to the design of an expert system for the generation of tonal multiple counterpoint Agostino di Scipio Centro di Sonologia Computazionale, University of Padova, Italy Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 275–283 Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
In this paper I examine some main features shown by recent applications of Artificial Intelligence methods to a specific set of musical problems concerning counterpoint. Particular emphasis is given to the “multiple viewpoints”—oriented techniques. However, theoretical and practical criticisms are introduced, concerning the sense and the nature of both the knowledge-representation and the intended musical tasks. A practical example is then described, in which a nontraditional formalization of contrapuntal fundamentals has been used as a minimal (but satisfying) knowledge-base for an expert system whose design is briefly outlined: it still preserves a “multiple viewpoints” orientation, but proposes a hierarchical organisation of competence viewpoints operating at successive moments of the solution process. Future developments should result in the expert system being re-shaped as a self-organising complex net. KEY WORDS: AI and music, contrapuntal compositional processes, musical knowledge representation, simulation models, multiple viewpoints.
Introduction
In general, an expert system is a set of computer programs and hardware interfaces to external devices (or even sensors to input data from the external world) intended to solve several problems within the class of problems its knowledge base implies (Sovalmico, 1987). Recent studies regarding methods of Artificial Intelligence in musical applications have focused attention on a sort of "elastic" and multi-dimensional knowledge-representation which allows the system to arrange the solution of a specific step with reference to all aspects of the entire context. This is the case for studies concerning the harmonization of four-part chorales (Ebcioglu, 1985; Ebcioglu, 1988) and the generation of contrapuntal scores (Schottsteadt, 1984). Such systems are required to observe the input and the output of every single step from multiple viewpoints: vertical intervals, horizontal intervals, overall relevance to the context, rhythmic pattern, linguistic and stylistic appropriateness. The influence of each viewpoint on decision-making must be weighted, depending on the context. Viewpoints usually refer to a knowledge-base encoded in terms of three types of propositional structure: production rules, constraints and heuristics. Heuristics make alternative paths available, in an attempt to get through problems with a certain degree of common sense. A "penalty value" is assigned to the solution at each step, and constraints derived from the sum of penalties from the various viewpoints are used to eliminate some paths and to switch the solution process back to an alternative path available at the step deemed responsible for the inappropriate solution (dependency-directed backtracking algorithms are described in Stallman & Sussman, 1977; Ebcioglu, 1985; Ebcioglu, 1988). Lischka (1991) suggests that each viewpoint should be thought of as a separate automaton, and the expert system as a group of highly specialized automata. This view relates to a connectionist model, each automaton being a computational unit (a "neuron") receiving information from and sending information to other units. The fact that all projected structures would thus rely on parallel processing makes this hypothesis very interesting. As far as the musical output is concerned, expert systems designed according to these methods are supposed to give the sections of the generated music a certain coherence and interrelation, in order to emphasize the character of causality proper to a flow of events experienced in a musical way (not only to a correct, formally error-free, flow).
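To fix ideas, the decision scheme can be caricatured as follows; all names and thresholds are my own, and plain chronological backtracking stands in for the dependency-directed variety described in the papers cited:

```python
from typing import Callable, List, Optional

class Viewpoint:
    """A viewpoint scores a candidate continuation of a partial solution;
    lower penalties are better, and a weight expresses its current influence."""
    def __init__(self, name: str, weight: float,
                 penalty: Callable[[list, object], float]):
        self.name, self.weight, self.penalty = name, weight, penalty

def solve(steps: List[list], viewpoints: List[Viewpoint],
          max_penalty: float, partial: Optional[list] = None) -> Optional[list]:
    """Depth-first search over the candidates of each step: candidates are tried
    in order of total weighted penalty, and the search backtracks to an earlier
    step whenever every remaining candidate exceeds max_penalty."""
    partial = partial or []
    if len(partial) == len(steps):
        return partial
    scored = [(sum(v.weight * v.penalty(partial, c) for v in viewpoints), c)
              for c in steps[len(partial)]]
    scored.sort(key=lambda pair: pair[0])
    for total, candidate in scored:
        if total > max_penalty:
            break                       # prune this branch
        result = solve(steps, viewpoints, max_penalty, partial + [candidate])
        if result is not None:
            return result
    return None                         # forces backtracking in the caller
```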
Simulation-objects
Nevertheless, an experienced musician would rarely consider the musical results to be stylistically characterised and well-formed, even if correct: in other words, these expert systems do not seem to pass Turing's1 test. Turing's test is mentioned precisely because it is a fundamental criterion within a "strong" (traditional) AI paradigm: in fact, except for the general guidelines of Lischka's approach, musical expert systems like the above ones seem to be applied experiments in a "strong" AI paradigm, where the solution-object is a more or less complicated function of an input. Broadly speaking, in this view mind is a synonym of program, of formal system. In opposition, other scholars have introduced a "weak" AI paradigm, where the mind is a function of the brain: mind is produced by a neurobiological network we call the brain (Searle, 1984; Davies, 1988). What appears relevant (apart from the many related philosophical implications) is that a "weak" perspective also underlines the essential formalism observable in the "program-mind" view. In particular, it is to be noticed that no semantic level (no access to significance, no emergent meaning) is inherent in formal models, and that no real intentionality—a determinant of intelligent behaviour and a main aspect of the cognitive processes—is implied. Similar indications come also from many disciplinary areas (Thom, 1980; Atlan, 1985).
Therefore, until some completely new technology becomes widely accessible, what musical expert systems can do is perform simulations of musical cognitive processes, and their solution-objects should be conceived as simulation-objects, simulations of musical output. Of course simulation-objects can be very sophisticated in some cases: as S. Papert has noticed, no one uses a digital watch as if it were a good imitation of a mechanical watch (Pagels, 1984). But normally a watch has, with very few exceptions, a purely functional role; it is an object-having-a-project (Monod, 1970), it serves some pragmatic functions. A chorale, by contrast, like most other music, even if composed for practical reasons, is intended to have an aesthetic function, and its parts have a syntactical (formal) but also an aesthetically-oriented role. In practice, AI musical applications seem to be more interesting when intended to study musical cognitive processes by means of simulation models than when meant to produce integrally artificial musical intelligences (Sloboda, 1985). But even admitting and accepting the "simulatory" nature of the designed systems, some other limitations are observable. It should be pointed out that the desired goals are often excessive with reference to the real possibilities of the formalized knowledge-bases. Some correlative hypotheses might be: the knowledge-base does not imply a given problem (and, in any case, the expert system cannot infer2 new inherent aspects); or the knowledge-representation does not result in a knowledge-base conforming exactly to the human knowledge taken into account; or even the particular human knowledge is not characterised as accurately as supposed. The following section is a brief discussion of these hypotheses.
Non-intentionally aimed systems
Who actually assigns the "penalty values"? It seems almost banal to answer that, of course, the expert system does not: whether it has learnt by itself or has been taught by the programmer, an expert system usually consults a table of previously stored judgments, and thus it is we who define the importance of the rules and specify the related penalties in order to backtrack. The practical principles according to which the expert system (the programmer) makes decisions about the musical quality and the appropriateness of a certain solution should conform exactly to the principles contained in one or more treatises articulating intuitions concerning music as experienced. It can be argued that counterpoint is the most "formalizable" category of western tonal music (except for some constructivist strands of contemporary music), and that representational techniques have been growing more and more powerful, but it remains to be proven that J. Fux's (1725), C. Koechlin's (1926) and others' treatises are correctly translatable and fully encapsulatable in terms of formal systems. Sometimes musical strategies seem to share something with logical reasoning, but even in a solely combinatorial machine such as A. Kircher's tabula mirifica (1650), what makes a decision pertinent and a rule logical is the emergent significance and the musical sense embodied in the substrate of an intelligent process. Whereas treatises ideally are intentionally aimed systems—they always take the aesthetic meaning as being implicit, the reader and the writer sharing it without explicit explanations—knowledge-based expert systems are non-intentionally aimed systems. In most cases, however, musical knowledge has always been handed down and explained in terms of musical thought which seldom appears to conform to a body of effective procedures, especially in non-contemporary styles. Moreover, the contents of ancient and modern treatises are themselves not to be considered complete and definitive formulations about the music of the past; they can be integrated with, and also replaced by, new and complementary descriptions. For instance, all that concerns the independence of the melodic profiles in polyphonic structures and the consequent non-coincident formal development of each voice (perceived, nevertheless, as a part of a whole entity) is assumed to depend on the reader's talent, and few hints as to how this might be achieved are typically given (composers learn more about this from musical scores and from listening than from treatises). Finally, a more precise solution-object should be specified: harmonizing a chorale melody could be an oversimplification of what contrapuntal practice requires, given that the realisation of a figured chorale is not just a matter of harmony. In fact, functional harmony cannot be completely responsible for good voice-leading. The latter two critical arguments—namely the inadequacy of ancient treatises to provide a formal approach with non-ambiguous and effective strategies, and the need for more appropriate specifications of the goal—have suggested the basic criteria of the work which will now be presented.
A practical example from a composer's viewpoint
In related research I have tried to minimize any bias in the solution-objects and to introduce a procedure as close as possible to the actual contrapuntal compositional process (Di Scipio, 1989), while still remaining in the area of simulation models. In order to provide the expert system with a minimal and fully computable knowledge-base, the inductive system introduced by G. Bizzi (1982) has proven very useful. It introduces a reconsidered representation of contrapuntal fundamentals, and allows the expert system to generate, in the first instance, up to three superposed voices in triplum reversible counterpoint. Briefly summarizing:3 assume that only the diatonic pitch classes are considered; that only simple interval classes are considered (i.e. neither the diminished nor the augmented intervals); that, hence, all possible horizontal relations (successive intervals) are the (ascending or descending, minor or major) second, the (ascending or descending, minor or major) third and the (ascending or descending) fourth (the other simple intervals being implied by these); and finally that each note may be the root or the third of a fifth-missing triad (the root being doubled). Then all horizontal and vertical relations are reduced to a few in number (see top of Fig. 1). The entire set of relations, considered in both direct and retrograde order, can be transposed to each pitch class of the diatonic scale (see Fig. 1). The software implementation of such a reduced knowledge-base is simple and will not be described here. However, it is important to emphasize that, from an operating point of view, the system computes only the role of a note within the current triad, according to the preceding and succeeding notes in the preceding and succeeding
triads; the absolute pitch and the pitch classes are not taken into account except insofar as the system interacts with a user or interfaces to external devices. Notice that two constraints must be applied: the first occurs on the relation of a third, where two voices must move in contrary directions; the second occurs when the root note is a leading-tone, which cannot be doubled (this constraint will trigger some procedures in a higher-level viewpoint, see below); both constraints avoid undesired parallel octaves. In all other cases the expert system works with production rules alone, and no penalties are needed because all solutions are syntactically permitted (correct) and musically acceptable. No errors and no deficit in quality must be controlled. Even though backtracking must sometimes be performed (because some paths to the solution-object are barred), all the acceptable solutions are available, and a modal chordal skeleton is generated.
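A rough sketch of how such a reduced knowledge-base and its two constraints might be encoded follows; the encoding is my own illustration, not Bizzi's or the author's actual data structures:

```python
# Degrees 1-7 of the diatonic scale; every vertical sonority is a fifth-missing
# triad whose voices are either the (doubled) root or the third.
ALLOWED_STEPS = {0, 1, 2, 3}     # unison, second, third or fourth, up or down
LEADING_TONE = 7                 # degree that may not be doubled

def horizontal_ok(prev_degree: int, next_degree: int) -> bool:
    """A voice may repeat its note or move by a second, third or fourth."""
    step = min((next_degree - prev_degree) % 7, (prev_degree - next_degree) % 7)
    return step in ALLOWED_STEPS

def vertical_ok(chord_degrees: tuple, third_relation: bool, contrary_motion: bool) -> bool:
    """The two constraints mentioned in the text: a leading-tone root may not be
    doubled, and the relation of a third requires contrary motion between two voices."""
    if list(chord_degrees).count(LEADING_TONE) > 1:
        return False
    if third_relation and not contrary_motion:
        return False
    return True
```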
Supra-line applied competence viewpoints

Figure 1 Numerals 1–7 stand for the notes of a diatonic scale; a small set of relations implies all horizontal and vertical intervals useful in generating a modal skeleton with three voices in reversible counterpoint (by permission of Edizioni Kappa, Rome).

On this base the expert system can work according to viewpoints which are not only multiple but also successive, that is, applicable after the completion of the entire skeleton (the ground-level solution), so that the output of one can serve as input to another, recursively. Procedures modeling higher and higher levels of competence can be called on for various tasks, such as the "tonalization" of the modal skeleton, the generation of a fourth melodic line, the fioriturae or figuration (arrangement and embellishment of the melodic lines), the absolute pitch assignment, etc. Obviously, the knowledge-base described provides no information about some specific viewpoints—e.g. rhythmic patterns. In such cases, which are in a certain sense simpler than those concerning pitch and intervals in this context, conventional rules of structure can serve as defaults. The "tonalization" of the modal skeleton, which also serves to transform any triad with a leading-tone root into a diminished seventh or a dominant seventh chord, the generation of the fourth voice and the absolute pitch assignment have already been successfully implemented; they have confirmed that operating with three voices and generating the fourth from a higher (subsequent) level reduces the number of constraints to be considered at lower levels and, more importantly, makes available a wider spectrum of good solutions than otherwise. A hierarchy of multiple viewpoints is also involved, shaped by the order in which the viewpoints are arranged and by their weighting. This should be seen as a main feature of the expert system with reference to the format of the solution-object: different orders and different weightings can result in stylistically different output. Figure 2 shows the realisation of a four-part chorale (up to the third fermata) from a Bachian melody. For an instrumental piece in a sixteenth-century style, one may apply a radically weakened "tonalization" view, and give the melismatic motion of each voice (the fioriturae above) a major weight (see Figure 3).
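Schematically, the supra-line idea amounts to a pipeline over complete intermediate solutions; the sketch below illustrates only the control flow, with hypothetical names and a deliberately simplistic treatment of weights:

```python
from typing import Callable, List, Tuple

# A competence viewpoint refines a complete intermediate result; the order in
# which viewpoints are applied, and their weights, shape the stylistic outcome.
CompetenceViewpoint = Tuple[str, float, Callable[[dict], dict]]

def apply_supra_line(modal_skeleton: dict,
                     viewpoints: List[CompetenceViewpoint]) -> dict:
    """Successive (supra-line) application: each viewpoint takes the whole
    output of the previous one as its input, starting from the ground-level
    modal skeleton."""
    result = modal_skeleton
    for name, weight, transform in viewpoints:
        if weight > 0:                  # a zero weight switches a viewpoint off
            result = transform(result)
    return result

# Hypothetical ordering corresponding to the chorale of Figure 2:
# "tonalization", "generation of the fourth voice", "absolute pitch assignment".
```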
Figure 2 The beginning of a four-part chorale generated by the expert system outlined in the paper, according to multiple viewpoints arranged in the following order of succession: "modal skeleton", "tonalization", "generation of the fourth voice" and "absolute pitch assignment". The soprano line is a Bachian melody used also in Ebcioglu (1988).
Figure 3 A modal example realised as follows: “modal skeleton”, fioriturae (figuration) and “absolute pitch assignment”. The cantus firmus has been drawn from Koechlin (1925).
Figure 4 The over line procedure is preferred to the typical on line one: the solution-object is generated applying the competence viewpoints in successive moments, according to a given hierarchy (order of succession and weighting). The algorithms which generated the fioriturae in these musical examples were quite simple; they referred to a large data base and performed ordinary data manipulations (after all, this is not without analogies with the composers’ use of standard musical
formulae). However, more sophisticated algorithms (such as some proposed in other studies) can be used when modeling higher-level viewpoints. They can be operationalised in the context of the multiple-viewpoint oriented approach proposed here, which preferentially adopts a cumulative, supra-line design (superposition of outputs from successive competence viewpoints), rather than the typical on-line one, where decisions of any level are taken at the current step, and backtracking is performed iff…(see figure 4). This recursive process seems consistent with a purely contrapuntally-derived idea of music and is analogous to the compositional practice of superposing successive elaborations. (Interestingly, such "vertical" growth of materials resembles some compositional techniques of this century.) Examples of this musical approach can be seen in J.S. Bach (from the simplest two-voice introduction to Die Kunst der Fuge—Bizzi's formalisation can be seen almost explicitly in the six Trio Sonatas BWV 525–530) and his immediate German precursors. A more specific approach is likely to be required for earlier Flemish polyphony, where a completely independent process would be needed for each individual voice (though taking into account the possibility of additional voices).
Possible developments
As the present approach comes from analytical observations of contrapuntal music, it seems to satisfy a particular condition, which requires the synthesis procedure to be complementary to the analysis procedure and to be a complex dynamical process evolving by refinement at successive moments. Elsewhere the definition of modèle informatique (Riotte, 1989)—rather than formal model—has been proposed for models of this kind. Once again, formal procedures have a merely simulatory nature, and the methods described are themselves simply intended to simulate a (compositional) cognitive process. Nevertheless, the state of the project seems to imply some particular characteristics and enables us to see interesting developments in which the emergence of some musical significance could also be involved, that is, in which the basic and irreducible set of the relationships described above would be structured as a self-organising dynamic system. Recently, similar systems have been used for both instrumental and computer music composition (see, for example, Di Scipio, 1990 and Truax, 1990); in particular, it could be investigated how to represent the core of the expert system outlined above using a network of random Boolean automata (Atlan, Fogelman-Soulie, Solomon & Weisbuch, 1981; Atlan, 1985). The related concepts of dynamic resonance and system competence—sensitiveness (Thom, 1980)—should provide useful, applicable ideas.4
Notes
1. In simple terms, Turing's test is a scientific criterion of traditional AI intended to compare the computer's behaviour (running some given programs) to a specialized human behaviour; in a successful test, experts do not recognise the artificial nature of the machine's output.
2. The Latin meaning of the verb to infer is to bring inside or to bring into oneself, i.e., in this case, to make new concepts a substantial part of an already established knowledge-base.
3. Because of the limited space available, this paper cannot report on many necessary musical details; for further information refer to Bizzi (1982, 1990).
4. I am grateful to Prof. Giancarlo Bizzi for our long and fruitful discussions. I also thank my colleagues Paolo Speca and Serafino Di Eusanio for having synthesized some of the sound examples played during the presentation of this paper at the Conference; they used a Yamaha CX5 computer. The major part of the sound examples, however, was generated via software tools I have implemented on an IBM AT personal computer or using a particular version of the MusicV program designed at the Centro di Sonologia Computazionale, University of Padova.
References
Atlan, H. (1985) Complessità, Disordine e autocreazione del significato. In G. Bocchi and M. Ceruti (Eds.), La sfida della complessità, pp. 158–178. Milan: Feltrinelli.
Atlan, H., Fogelman-Soulie, F., Solomon, J. & Weisbuch, G. (1981) Random Boolean Networks. Cybernetics and Systems, no. 12, pp. 103–121.
Bizzi, G. (1982) Specchi invisibili dei suoni. Rome: Edizioni Kappa.
Bizzi, G. (1990) Il Canone e la Fuga, Manuale. Ancona: Berben.
Davies, P. (1988) The Cosmic Blueprint. New York: Simon & Schuster.
di Scipio, A. (1989) An expert system for automatic generation of tonal multiple counterpoint. Unpublished paper presented at the European Workshop on AI and Music, University of Genova.
di Scipio, A. (1990) Composition by exploration of non-linear dynamic system. In S. Arnold and G. Hair (Eds.), Proceedings of ICMC 1990, pp. 324–328. Glasgow: Computer Music Association.
Ebcioglu, K. (1985) An expert system for the schenkerian synthesis of chorales in the style of J.S. Bach. In W. Buxton (Ed.), Proceedings of ICMC 1984, pp. 135–142. San Francisco: Computer Music Association.
Ebcioglu, K. (1988) An expert system for harmonizing four-part chorales. Computer Music Journal, 12 (1), pp. 43–51.
Fux, J.J. (1725) Gradus ad Parnassum. Vienna.
Kircher, A. (1650) Musurgia Universalis. Rome.
Koechlin, C. (1926) Précis de règles de Contrepoint. Paris: Heugel.
Lischka, C. (1991) Artificial Intelligence and representation of musical sounds. In G. de Poli, A. Piccialli & C. Roads (Eds.), Representations of Musical Signals. Boston: MIT Press.
Monod, J. (1970) Le Hasard et la nécessité. Paris: Seuil.
Pagels, H.R. (Ed.) (1984) Computer Culture: The Scientific, Intellectual and Social Impact of the Computer. New York Academy of Sciences.
Riotte, A. (1989) Modèles et métaphores: les formalismes et la musique. In S. McAdams and I. Deliège (Eds.), La musique et les Sciences Cognitives, pp. 523–533. Liège: P. Mardaga.
Schottsteadt, W. (1984) Automatic species counterpoint. Report no. STAN-M-19, Stanford University.
Searle, J. (1984) Minds, Brains and Science. Harvard University Press.
Sloboda, J. (1985) The Musical Mind. The Cognitive Psychology of Music. Oxford: The Clarendon Press.
Sovalmico, M. (1987) Intelligenza Artificiale. Milan: Scienza & Vita Nuova/Hewlett Packard.
Stallman, R.M. & Sussman, G.J. (1977) Forward reasoning and dependency-directed backtracking in a system for computer-aided circuit analysis. Artificial Intelligence, no. 9.
Thom, R. (1980) Modèles mathématiques de la morphogénèse. Paris: C. Bourgois.
Truax, B. (1990) Chaotic non-linear systems and digital synthesis: An exploratory study. In S. Arnold and G. Hair (Eds.), Proceedings of ICMC 1990, pp. 100–104. Glasgow: Computer Music Association.
Computer-aided comparison of syntax systems in three piano pieces by Debussy David Meredith University of Nottingham, UK Contemporary Music Review 1993, Vol. 9, Parts 1 & 2, pp. 285–304 Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
1 Introduction 1.1 Syntax systems Figure 1 shows four sequences of letters, each sequence accompanied by a network. If letter L2 follows L1 in a sequence then an arrow is drawn from L1 to L2 in the network. The letters represent syntactic units. The networks represent syntax systems and are called syntactic networks. The sequences of syntactic units can be generated by the syntactic networks.
Figure 1. Sequences, syntactic units and syntactic networks.
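Because the networks are first-order (each choice depends only on the preceding unit), a network can be treated simply as a set of ordered pairs of units, and the simplest network generating several sequences is the union of their pairs. A minimal sketch (the example sequences are invented, not those of Figure 1):

```python
def network_from_sequences(sequences):
    """Return the set of arrows (u1, u2) such that unit u2 follows unit u1
    somewhere in the given sequences."""
    arrows = set()
    for seq in sequences:
        arrows.update(zip(seq, seq[1:]))
    return arrows

def can_generate(network, sequence):
    """True if every adjacent pair of units in the sequence is an arrow of the network."""
    return all(pair in network for pair in zip(sequence, sequence[1:]))

# The union of the arrows of several sequences is the simplest network capable
# of generating them all (cf. Figure 1.5).
seqs = ["ABCAB", "ABCB", "BCAC", "CABA"]
net = network_from_sequences(seqs)
assert all(can_generate(net, s) for s in seqs)
```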
Figure 1.5 shows the simplest syntactic network capable of generating all the sequences in Figures 1.1 to 1.4. It contains all and only the arrows in the networks in Figures 1.1 to 1.4. Figure 1.6 shows another network capable of generating all the sequences in Figures 1.1 to 1.4. However, this network is not as simple as that in Figure 1.5. If a syntax system can be represented by a syntactic network of the type shown in Figure 1, it will be a simple type of syntax system in which the choice of syntactic unit at any particular point in the sequence depends only on the identity of the preceding syntactic unit. An advantage of representing syntax systems by syntactic networks of this type is that metrics can be defined which measure how similar such networks are to each other.
1.2 The aim of this project
The aim of this project was to devise a computable analytical procedure for comparing the pitch structures of different pieces of music. In his study of proportional structure in the works of Debussy, Howat (1983)1 showed that there are close correspondences between "Clair de lune" (no. 3 from Suite Bergamasque (1890; revised in 1905)), "Reflets dans l'eau" (no. 1 from Images, première série (1905)) and "Cloches à travers les feuilles" (no. 1 from Images, deuxième série (1907)) in the proportional positioning in these pieces of events of structural significance. The similarities in the proportional structure of these pieces suggested the possibility of there being similarities between the pieces in their pitch organisation. The aim of this project was to investigate this by representing the pieces as sequences of syntactic units and then comparing these sequences to discover whether they could have been generated by a common syntactic network.
1.3 Some preliminary concepts
A segment is any part of a score which is bounded by two distinct temporal locations in the score. In Figure 2.1 the portion of the bar including the first minim of the bar forms a segment, but that portion of the bar which includes the top notes in each chord, (G4–C5–C5), is not a segment.
Figure 2 Pitch class simultaneity representation.
An operational pitch class set for a specified segment is equal to the set of all and only the pitch classes used within the segment. Pitch classes are coded from A = 0. The operational pitch class set for the segment which includes the first three crotchets of Figure 2.1 is {3, 7, 10}. A pitch class simultaneity is any segment of a score whose pitch class content is constant throughout its extent and whose pitch class content differs from that of neighbouring segments. The pitch class content of a general rest is defined as equal to that of the preceding pitch class simultaneity. The operational pitch class set for any given pitch class simultaneity is called a simultaneity set (Figure 2.1). A score can thus be converted into a pitch class simultaneity representation of the score. This is a list of all the pitch class simultaneities in the piece in order of location in the score (Figure 2.2).
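The conversion of a score into a pitch class simultaneity representation can be sketched as follows (the note-event input format is an assumption of mine):

```python
def simultaneity_representation(notes):
    """notes: (onset, offset, pitch_class) triples, pitch classes coded from A = 0.
    Returns (start, end, pitch_class_set) segments with constant content,
    merging neighbours with identical content; a general rest inherits the
    content of the preceding simultaneity, as defined in the text."""
    boundaries = sorted({t for on, off, _ in notes for t in (on, off)})
    segments = []
    for start, end in zip(boundaries, boundaries[1:]):
        pcs = frozenset(pc for on, off, pc in notes if on < end and off > start)
        if not pcs and segments:                 # general rest
            pcs = segments[-1][2]
        if segments and segments[-1][2] == pcs:  # same content: extend previous segment
            segments[-1] = (segments[-1][0], end, pcs)
        else:
            segments.append((start, end, pcs))
    return segments
```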
2 Method
The analytical procedure described below has been embodied in five computer programs: SaveData, TranSets, Superset, UniMap and Network. The programs are applied to the data in this order.
2.1 SaveData program
The SaveData program allows pitch class simultaneity representations to be input and edited. The first four columns of Figure 3 show the pitch class simultaneity representation of Bach's harmonisation of the Chorale "Christus, der ist mein Leben" (number 6). The pitch class simultaneity representations of different pieces are comparable sequences of defined syntactic units (pitch class simultaneities). If all pitch class simultaneities with
distinct simultaneity sets are considered to be distinct syntactic units, then Figure 4 shows the simplest network capable of generating the pitch class simultaneity representation in Figure 3.

NUMBER  BAR  FRACTION  SIMULTANEITY SET   TRANSITION SET
1       1    3/4       {8, 0, 3}          —
2       2    1/4       {10, 3, 7}         {8, 0, 3, 10, 7}
3       2    2/4       {6, 3, 8, 0}       {10, 3, 7, 6, 8, 0}
4       2    3/4       {5, 8, 1}          {6, 3, 8, 0, 5, 1}
5       3    0/4       {3, 7, 10}         {5, 8, 1, 3, 7, 10}
6       3    1/8       {3, 8, 10, 5}      {3, 7, 10, 8, 5}
7       3    2/8       {3, 10, 7}         {3, 8, 10, 5, 7}
8       3    3/8       {3, 7}             {3, 10, 7}
9       3    2/4       {0, 8, 3}          {3, 7, 0, 8}
10      3    3/4       {8, 5, 1}          {0, 8, 3, 5, 1}
11      4    0/4       {0, 3, 8}          {8, 5, 1, 0, 3}
12      4    1/4       {1, 8, 5, 10}      {0, 3, 8, 1, 5, 10}
13      4    3/8       {1, 7, 10}         {1, 8, 5, 10, 7}
14      4    2/4       {0, 8, 3}          {1, 7, 10, 0, 8, 3}
15      4    3/4       {10, 8, 3}         {0, 8, 3, 10}
16      4    7/8       {10, 7, 3}         {10, 8, 3, 7}
17      5    0/4       {8, 3, 0}          {10, 7, 3, 8, 0}
18      6    0/4       {2, 8, 0, 5}       {8, 3, 0, 2, 5}
19      6    1/8       {2, 8, 10, 5}      {2, 8, 0, 5, 10}
20      6    1/4       {3, 7, 10}         {2, 8, 10, 5, 3, 7}
21      6    3/8       {7, 3}             {3, 7, 10}
22      6    7/16      {7, 3, 1}          {7, 3, 1}
23      6    2/4       {8, 0, 7, 5}       {7, 3, 1, 8, 0, 5}
24      6    5/8       {8, 0, 5}          {8, 0, 7, 5}
25      6    3/4       {7, 0, 3}          {8, 0, 5, 7, 3}
26      6    7/8       {7, 0, 3, 10}      {7, 0, 3, 10}
27      7    0/4       {5, 0, 3, 8}       {7, 0, 3, 10, 5, 8}
28      7    1/8       {5, 0, 3}          {5, 0, 3, 8}
29      7    1/4       {5, 10, 2}         {5, 0, 3, 10, 2}
30      7    2/4       {3, 10, 7}         {5, 10, 2, 3, 7}
31      7    3/4       {0, 8, 3}          {3, 10, 7, 0, 8}
32      8    0/4       {10, 1, 8}         {0, 8, 3, 10, 1}
33      8    1/8       {1, 7, 10}         {10, 1, 8, 7}
34      8    1/4       {0, 3, 8}          {1, 7, 10, 0, 3, 8}
35      8    2/4       {1, 4, 8, 10}      {0, 3, 8, 1, 4, 10}
36      8    3/4       {3, 7, 10}         {1, 4, 8, 10, 3, 7}
37      8    15/16     {3, 1, 7, 10}      {3, 7, 10, 1}
38      9    0/4       {8, 0, 3}          {3, 1, 7, 10, 8, 0}
ENDING LOCATION: 9 ¾
Figure 3. Pitch class simultaneity representation of Chorale number 6.

Figure 4 Syntactic network for pitch class simultaneity representation of Chorale number 6.

The syntactic network in Figure 4 is very complex because a pitch class simultaneity is a very small syntactic unit and many distinct pitch class simultaneity sets can be used even in short pieces. It is therefore desirable to express pieces in terms of syntactic units which are, in general, longer than pitch class simultaneities. This aim requires a computable procedure to be defined for unifying pitch class simultaneities into longer segments according to some specified criteria. Lerdahl and Jackendoff (1983) propose a method for defining segments whose duration is in general longer than single pitch class simultaneities.2 Unfortunately, Lerdahl and Jackendoff's procedure is not defined in a sufficiently formal manner to enable straightforward embodiment in computer programs. They explicitly state that they are "not concerned whether or not [their] theory can readily be converted into a computer program…"3. However, I am not alone in believing that a formal psychological theory ought to have a computational basis.4 The non-computability of Lerdahl and Jackendoff's theory is exemplified by the terms in which they state their fourth "Grouping Preference Rule" (GPR4):
"Where the effects picked out by GPRs 2 and 3 are relatively more pronounced, a larger-level group boundary may be placed."5 Because they at no point describe a formal procedure for deciding which of several given potential grouping boundaries predicted by GPRs 2 and 3 are "relatively more pronounced", this rule cannot be directly embodied within a program. There are many other rules in their theory which are not defined sufficiently precisely to enable translation into a formal procedure: for example, their first, fourth, seventh and eighth "Metric Preference Rules"; their fourth "Time Span Reduction Well-Formedness Rule"; and their second, fourth and seventh "Time Span Reduction Preference Rules". The fact that many of these rules are not expressed in sufficiently defined terms to enable embodiment in the form of computer programs contradicts Lerdahl and Jackendoff's claim that their theory is a "formal" one. They assert that their grouping well-formedness and grouping preference rules "constitute a formal theory of musical grouping".6 Yet, although they admit that these rules "would not be sufficient to provide a foolproof algorithm for constructing a grouping analysis from a given musical surface…",7 they do not admit that these rules are insufficient even to provide a formal prediction of one or more grouping structures which may be perceived by the listener. Since they do not provide the basis of a formal procedure for achieving this, I believe that Lerdahl and
Jackendoff’s theory fails to achieve the necessary level of definition which would be required for it to be considered “formal”. The analyst is therefore incapable of arriving at a prediction of perceived grouping structure without making many tacit assumptions and decisions which Lerdahl and Jackendoff’s theory does not tell him how to make. Strictly comparable analyses of different pieces can only be generated if the method of analysis is sufficiently well-defined for the analyst to be utterly certain that he is applying an identical analytical method to each piece. 2.2 TranSets program Most music analysts who aim to account for the pitch structure of “tonal” pieces in terms of collections of pitch classes attempt to do so by means of standard pitch class set types such as diatonic sets, whole-tone scale sets and so on. That is, most analysts assume a lexicon of “allowable scale types” which they feel at liberty to use in their account of music which they are attempting to interpret. However, the choice of pitch class set types to include in and exclude from the lexicon is guided by the musical knowledge and intuition of the analyst, and by pre-selecting (either consciously or unconsciously) a list of allowable set types in this manner the analyst is necessarily limiting the range of discoveries which he or she is likely to make from the resulting analyses. I therefore decided to invent a method of analysis which did not rely on a predetermined lexicon of allowable pitch class set types. Instead, the list of allowable scale types was to be derived from the piece being analysed by a formal, strictly defined and repeatable procedure. The TranSets program performs the first step in this procedure. A segment consisting of just two pitch class simultaneities is called a pitch class simultaneity transition. The operational pitch class set for any given pitch class simultaneity transition is called the transition set for that pitch class simultaneity transition. For example, the transition set for the first pitch class simultaneity transition in Chorale number 6 would be:
(see Figure 3). The TranSets program finds and lists the transition set for each pitch class simultaneity transition in the pitch class simultaneity representation of the piece (Figure 3, column 5).
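A minimal sketch of this step, together with the superset filtering described in the next subsection (function names are mine):

```python
def transition_sets(simultaneity_sets):
    """TranSets step: the union of each pair of consecutive simultaneity sets."""
    return [a | b for a, b in zip(simultaneity_sets, simultaneity_sets[1:])]

def maximal_supersets(tsets):
    """Superset step (2.3 below): keep only those distinct transition sets
    that are not subsets of any other transition set."""
    distinct = set(map(frozenset, tsets))
    return [s for s in distinct if not any(s < t for t in distinct)]

# First three simultaneity sets of Chorale number 6 (Figure 3):
sims = [frozenset({8, 0, 3}), frozenset({10, 3, 7}), frozenset({6, 3, 8, 0})]
print(transition_sets(sims))   # the transition sets {8,0,3,10,7} and {10,3,7,6,8,0}
```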
2.3 Superset program
The Superset program derives a list of "allowable pitch class sets" from the list of transition sets by examining this list and recording all those distinct transition sets which are not subsets of any other transition set. The list of pitch class sets which results for Chorale number 6 is shown in Figure 5. Every transition set in the fifth column of Figure 3 is either equal to one of the sets in this list or a subset of one or more sets in this list. If a potentially operational pitch class set for a segment is defined as any pitch class set which contains the operational pitch class set for the segment, then the set of pitch class sets in Figure 5 has the property that for any transition in the pitch class simultaneity representation there exists a potentially operational pitch class set in the set of sets in Figure 5. If a set of pitch class sets has this property for a given piece, the set is said to be sufficient to express the piece.
1. {10, 3, 7, 6, 8, 0}
2. {6, 3, 8, 0, 5, 1}
3. {5, 8, 1, 3, 7, 10}
4. {0, 3, 8, 1, 5, 10}
5. {1, 7, 10, 0, 8, 3}
6. {2, 8, 10, 5, 3, 7}
7. {7, 3, 1, 8, 0, 5}
8. {7, 0, 3, 10, 5, 8}
9. {0, 3, 8, 1, 4, 10}
10. {1, 4, 8, 10, 3, 7}
11. {8, 3, 0, 2, 5}
12. {2, 8, 0, 5, 10}
13. {5, 0, 3, 10, 2}
Figure 5 Transition supersets generated by the Superset program for Chorale number 6.
2.4 UniMap program
The UniMap program uses the list of supersets generated by the Superset program to express the piece as sequences of defined syntactic units which are adjacently distinct in terms of pitch class content. Such a sequence of syntactic units is called a segmentation map of the piece. The set of pitch class sets generated by the Superset program can be used to generate other sets of pitch class sets which are sufficient to express the piece. It is first useful to define the relation, S, over the set of all pitch class sets, C. In general, a relation, R, defined over a set, a, is denoted
and the order of a set, a, is denoted
Given
and
then the relation, <S; C> is defined as follows:
Also,
If the set of pitch class sets which is generated by the Superset program for a given piece is denoted.
then it is now possible to define by induction some other sets of pitch class sets which are sufficient to express the piece,
In general, a member of such a set will be denoted:
For all pairs of sets,
such that
there exists a set,
such that
Also if for a given ti,j there exists no ti,k such that
then
The set Ti+1 is then a sufficient set for expressing the piece given that Ti is such a set. Ti+1 contains only those sets which satisfy one of the two conditions stated above. The first part of the UniMap program generates the sets Ti where T1 is defined as the set of pitch class sets generated by the Superset program (Figure 6).
Figure 6 The sets, Ti, generated by the UniMap program for Chorale number 6.
The second part of the UniMap program generates segmentation maps. One segmentation map is generated for each of the sets, Ti. Each segment in each segmentation map has an associated potentially operational pitch class set which is either a member of the set, Ti, for the map being generated or is a subset of two or more members of the set, Ti. The procedure by which the UniMap program generates segmentation maps will now be described in outline. The algorithm is too complex to be described in detail here. To find the first segment in the segmentation map for the set of sets, Ti, the program finds the longest initial segment for which there exists at least one set
which contains the simultaneity set for each pitch class simultaneity in the segment. Let this initial segment contain all the pitch class simultaneities from s1c to skc and be denoted [s1c, skc]. If the operational pitch class set for this segment is a subset of a unique set, then the latter set is considered to be the potentially operational pitch class set for this initial segment. If the operational pitch class set for [s1c, skc] is a subset of more than one set, then the potentially operational pitch class set for the segment is considered to be the operational pitch class set for this segment. The succeeding segments in the segmentation map are generated in a similar way.
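In outline, this first part of the procedure can be rendered as a greedy search for the longest prefix covered by a single allowable set; a simplified sketch that ignores the details and tie-breaking of the real program:

```python
def segmentation_map(simultaneity_sets, allowable_sets):
    """Greedy outline of the UniMap segmentation: each segment is the longest run
    of consecutive simultaneities whose combined pitch class content is contained
    in at least one allowable set."""
    allowable = [frozenset(s) for s in allowable_sets]
    segments, i = [], 0
    while i < len(simultaneity_sets):
        union, j = frozenset(), i
        while j < len(simultaneity_sets):
            candidate = union | simultaneity_sets[j]
            if not any(candidate <= s for s in allowable):
                break
            union, j = candidate, j + 1
        if j == i:                      # no allowable set covers even this simultaneity
            union, j = frozenset(simultaneity_sets[i]), i + 1
        covering = [s for s in allowable if union <= s]
        # A unique covering set becomes the potentially operational set; otherwise
        # the operational set of the segment itself is used, as in the text.
        label = covering[0] if len(covering) == 1 else union
        segments.append((i, j - 1, label))
        i = j
    return segments
```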
The segmentation map which results for Chorale number 6 using the set … is shown in Figure 7.
Figure 7 Segmentation map for Chorale number 6 which uses …
2.5 Network program
The segmentation maps generated by the UniMap program are then examined by the Network program, which generates the simplest syntactic network which can account for each segmentation map. Figure 8 shows the syntactic networks for the segmentation maps using … and … in Chorale number 6. The pitch class sets are represented in the network by the number of their position in the set, when this set is written in the order in which it is generated by the UniMap program. For Chorale number 6, this order is that in which the sets are written in Figure 6.
Figure 8 Syntactic network for … and … for Chorale number 6.
Figure 9 syntactic networks for three pieces by Debussy.
Sets with "E" prefixes are operational pitch class sets for their segments and are subsets of more than one set from the set … for the segmentation map being examined. The networks are drawn so that the pitch class sets are in their order of first appearance in the segmentation map, starting with the node at the top of the network and going clockwise. (Compare Figures 7 and 8.)
3 Results for three piano pieces by Debussy
3.1 Comparison of networks for … segmentation maps
Figure 9 shows the syntactic networks generated by the Network program for the sets … for "Clair de lune", "Reflets dans l'eau" and "Cloches à travers les feuilles". The segmentation maps and a complete listing of the sets for these pieces are available on request.8
3.1.1 Syntactic function
If a potentially operational pitch class set emerges early in a segmentation map, it will be positioned "more anti-clockwise" in the corresponding network. If a pitch class set is adjacent to many distinct other sets in the segmentation map, it will have a large number of arrows originating or terminating at it in the corresponding network. By considering how early a pitch class set emerges in a segmentation map and the number of other pitch class sets to which it can be adjacent in a segmentation map, it is possible to gain some idea of the type of syntactic function which the pitch class set serves in its segmentation map. Figure 9 reveals that set 1 in "Cloches à travers les feuilles", set 3 in "Clair de lune" and set 8 in "Reflets dans l'eau" occur early and interact with a relatively high number of other sets. They appear to serve similar syntactic functions in their respective segmentation maps. Note also that "Clair de lune" and "Reflets dans l'eau" are both "in D flat major", and that set 3 in "Clair de lune" is equal to set 8 in "Reflets dans l'eau". Both are the set {4, 6, 8, 9, 11, 1, 3}—the D flat major diatonic set. It is encouraging that the same pitch class set emerges as performing a similar syntactic function in two pieces traditionally regarded as being in the same key. It is especially encouraging to discover that this apparently important "nexus" function is served by the diatonic set of that key in both pieces. Thus, by an entirely automated analytical method which does not embody any prejudice towards expressing the music in terms of traditional scale-type pitch class sets, the D flat major diatonic set emerged as having a nexus-type syntactic function in two pieces which are traditionally regarded as being "in D flat major". This observation suggests the possibility of finding universal syntactic function categories which are represented in many different pieces but not necessarily by the same pitch class set type in every piece. The three networks in Figure 9 were therefore compared with a view to classifying the syntactic functions of the pitch class sets in their
segmentation maps into a number of categories, each category being represented by at least one pitch class set in each piece. The following section describes how syntactic function can be formally characterised.
3.1.2 Nodal order proportion and angular proportion
The nodal order proportion of a node in a network is an index of the extent to which the set represented by the node interacts with the other potentially operational pitch class sets in its segmentation map. The order of a node is defined as the number of arrows originating or terminating at the node. The nodal order proportion is defined as the ratio of the order of the node representing the pitch class set to the order of the node with the greatest order in the network. Thus the nodal order proportion of set 9 in "Cloches à travers les feuilles" in Figure 9 is: nodal order proportion of set 9 = (nodal order of set 9) ÷ (nodal order of set 1) = 5 ÷ 9 ≈ 0.556. The angular proportion of a set is an index of the order of the set in the sequence in which the sets are introduced in the piece. It is defined as the ratio of the order in which the set is introduced, less 1, to the total number of sets in the network. Thus the angular proportion of set 9 in "Cloches à travers les feuilles" would equal (3−1) ÷ 24 = 1/12 ≈ 0.083.
3.1.3 Function graphs
The above definitions of nodal order proportion and angular proportion allow a graph to be constructed of nodal order proportion against angular proportion. Such a graph is called a function graph. Since these two quantities were devised as two measures allowing the characterisation of syntactic function, the point on a function graph corresponding to a particular pitch class set in a syntactic network indicates the syntactic function of that set within its segmentation map. The syntactic functions of sets in different pieces can be compared by plotting them on the same graph. The similarity between the syntactic functions of different sets in their respective segmentation maps can be measured by their proximity on the function graph. Figure 10 shows a function graph with all the potentially operational pitch class sets from the three networks in Figure 9 represented.
3.1.4 Function groups
Each potentially operational pitch class set in each piece is considered to correspond in terms of syntactic function most closely with the pitch class set from each of the other pieces being compared which lies closest to it on the graph.
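Both indices are easy to compute from the arrow representation of a network (a sketch; the argument formats are assumptions about how the data might be held):

```python
def nodal_order_proportion(network, node):
    """Order of a node = number of arrows originating or terminating at it
    (self-loops counted once here), divided by the greatest order in the network."""
    nodes = {n for arc in network for n in arc}
    order = {n: sum(n in arc for arc in network) for n in nodes}
    return order[node] / max(order.values())

def angular_proportion(introduction_order, node):
    """(position of the node in the order of first appearance, less 1) divided by
    the total number of sets in the network."""
    return introduction_order.index(node) / len(introduction_order)

# Worked example matching the text: a node of order 5 in a network whose
# largest order is 9 has nodal order proportion 5/9, about 0.556.
```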
Figure 11 shows that in general, if a1, a2 and a3 are pitch class sets from piece A, and b1 and b2 are pitch class sets from piece B, then it is possible for the pitch class set in piece B which is closest to a1 on the function graph to be b1, while the pitch class set in piece A which is closest to b1 on the function graph is a2. When this is the case, a1, a2 and b1 must fall into the same essential syntactic function category, if such categories exist. a1, a2 and b1 form a function group. For any function group, it is possible to derive a mean syntactic function for the pitch class sets forming the function group. This is the syntactic function represented by the two-dimensional mean of the points on the graph representing the pitch class sets in the function group. This point is called the centroid of the function group.
Figure 10 Function graph for sets … for all three Debussy pieces. A: Clair de lune B: Reflets dans l'eau C: Cloches à travers les feuilles
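The pairing and the centroid computation can be sketched as follows (points are (angular proportion, nodal order proportion) pairs; the pairing rule is a simplification of the one described):

```python
import math

def nearest(point, candidates):
    """Closest point among the candidates."""
    return min(candidates, key=lambda q: math.dist(point, q))

def function_groups(piece_a, piece_b):
    """Group each set of piece A with the piece-B set nearest to it; sets of A
    sharing the same nearest neighbour end up in the same group, as in Figure 11."""
    groups = []
    for p in piece_a:
        q = nearest(p, piece_b)
        group = {r for r in piece_a if nearest(r, piece_b) == q} | {q}
        if group not in groups:
            groups.append(group)
    return groups

def centroid(group):
    """Two-dimensional mean of the points in a function group."""
    xs, ys = zip(*group)
    return sum(xs) / len(xs), sum(ys) / len(ys)
```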
3.1.5 Derivation of function group centroids for the three pieces by Debussy When all three networks for the three Debussy pieces are graphed on the same function graph and function groups are derived, only one function group emerges. However, the syntactic functions of the various sets in the three networks can still be categorised by this method of comparison. This is done by comparing the pieces in pairs, deriving three sets of function group centroids, one for each pair comparison, and then comparing these centroids on the same graph. When this is done, the syntactic functions of
Figure 11 Function groups.
Figure 12 Function graph for sets with XYZ centroids for all three Debussy pieces. A: Clair de lune B: Reflets dans l'eau C: Cloches à travers les feuilles XYZ: Centroids for function groups
the sets in the three networks can be categorised into four syntactic function categories (function groups), each category represented by a centroid. Figure 12 shows these centroids (XYZ1–4), plotted with all the points representing the potentially operational pitch class sets in the … segmentation maps for the three pieces. If it is assumed that the syntactic function of a given potentially operational pitch class set falls into the category represented by the XYZ centroid which is closest to it, then the graph of Figure 12 can be divided into the four regions shown, each region representing a syntactic function category. The sets … for "Clair de lune" and "Reflets dans l'eau" were mentioned above as both being equal to the D flat major diatonic set and as having similar syntactic functions in their syntactic networks. Note therefore that these sets emerge as belonging to the same syntactic function category (XYZ2).
3.2 Comparison of networks of similar complexity
Figure 13 shows the syntactic networks for the … segmentation maps for "Clair de lune" and "Reflets dans l'eau" and the … syntactic network for "Cloches à travers les feuilles".
Figure 13 networks for “Clair” and “Reflets” and network for “Cloches”.
Figure 14 Function graph for networks in Figure 13. A: Clair de lune B: Reflets dans l'eau C: Cloches à travers les feuilles ABC: Centroids of function groups
These three networks are of a similar complexity. When these three networks are compared on a function graph, four function groups emerge (Figure 14). The segmentation maps corresponding to the networks in Figure 13 can be expressed in terms of syntactic function categories with the aid of Figure 14. In these segmentation maps, each set is represented by the centroid of the function group of which it is a member (ABC1–4). The resulting segmentation maps and the implied syntactic function category networks are shown in Figure 15. Figure 16 shows the simplest network capable of generating all the sequences in Figure 15.
This network represents a syntax system which could have given rise to the pitch structures of all three pieces.
Figure 15 Segmentation maps and syntactic function category networks for networks in Figure 13.
Figure 16 Sum network for networks in Figure 15.
Figure 17 Comparison of co-ordinates of ABC and XYZ centroids and best-fit networks.
4 Conclusions
In each of the comparative analyses described above, the comparison procedure classified the syntactic functions of the potentially operational pitch class sets in the segmentation maps from all three pieces into four syntactic function categories. The table in Figure 17.1 compares the co-ordinates of the set of centroids, ABC, with those of the set of centroids, XYZ. The best-fit networks derivable from these co-ordinates are identical (Figure 17.2). This suggests that a very similar syntax system is operating on at least two levels of structure in all three pieces being compared. In each comparison, the syntactic function categories which emerge can be qualitatively described by means of the co-ordinates of their centroids. For example, because ABC1 and XYZ1 have a low angular proportion, they represent sets which emerge early in the segmentation maps. Because they have a low nodal order proportion, they represent sets which do not interact extensively with other sets in their segmentation maps. The method described above is an example of a formal and repeatable analytical procedure. When this procedure was applied to three pieces which were suspected of having deep similarities in their pitch organisation, it was sufficient to show some ways in which this was so. In particular, it was shown that it is possible to derive a syntax system from the pitch structure of the pieces themselves which is capable of accounting
for the structure of all three pieces. This was achieved without recourse to an arbitrary predetermined lexicon of allowable potentially operational pitch class set types.
Notes 1. Howat, R. (1983) Debussy in Proportion, Cambridge: C.U.P. pp. 44, 146. 2. Lerdahl, F. and Jackendoff, R. (1983) A Generative Theory of Tonal Music. M.I.T.Press: Cambridge, Mass.; pp. 146–178. 3. ibid. p. 333 (note 6 to Chapter 1). 4. See also Johnson-Laird (1983), Laske (1983) and Marr (1977). 5. Lerdahl, F. and Jackendoff, R. (1983) A Generative Theory of Tonal Music. M.I.T.Press: Cambridge, Mass.; pp. 49 and 346. 6. ibid; p. 53. 7. ibid. 8. Please address enquiries to David Meredith, St. Anne’s College, Oxford, OX2 6UQ, U.K. and include a self-addressed A4 envelope.
Bibliography
Howat, R. (1983) Debussy in Proportion. C.U.P.: Cambridge.
Johnson-Laird, P.N. (1983) Mental Models. C.U.P.: Cambridge.
Laske, O.E. (1983) Artificial intelligence topics in computer-aided composition; paper presented at the 1983 International Computer Music Conference, Oct. 7–10; Rochester, N.Y.
Lerdahl, F. and Jackendoff, R. (1983) A Generative Theory of Tonal Music. M.I.T. Press: Cambridge, Mass.
Marr, D.C. (1977) Artificial intelligence—A personal view; Artificial Intelligence, Vol. 9, pp. 37–48.
General issues in cognitive musicology

A semiotic approach to music

Nicolas Meeùs
Conservatoire Royal de Musique de Bruxelles, Belgium

Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 305–310
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
The purpose of this paper is to sketch a semiotic approach to music that avoids relying on linguistic models. Peirce’s triadic description of the sign is drawn into a landscape in which any sign is viewed as a node in a latent network of potential intersemic relationships. Whenever a sign is uttered, in speech or in musical performance, it activates an area of the emitter’s and the receiver’s semiotic network; the felicity of the exchange depends on the level of similitude between the networks. Basic musical techniques such as repetition, variation or development are the means of structuring the listener’s network. The musical discourse tentatively controls the listener’s meandering through the semiotic network, and the score is a notation of the discursive strategy.

KEY WORDS: Semiotics, network (semiotic), intersemic relationships, discursive strategy.
Musical semiotics as an approach to a general semiotics

The purpose of this paper is to sketch a semiotic approach to music that would avoid relying on linguistic models; it proceeds from the conviction that these models, especially that of verbal communication, are not suitable for the development of a true semiotics of music. The need to distinguish what is particular to language from what belongs to a general semiotics had already been stressed by Ferdinand de Saussure when he proposed
his programmatic description of the task of the linguist, “to define what makes the language a special system within the set of semiological facts” (1916/1972:33). But this programme has not been fulfilled, and a certain confusion reigns today between linguistics and semiotics. A general semiotics by definition should account for non-linguistic semiotic facts as well as for linguistic ones. Music, by reason of its particular position with respect to language, analogous to it in several ways, essentially different in many others, may for the time being form a better model than language for a general semiotic theory. The present discussion, therefore, indirectly aims at testing the possibility of such a general semiotics.

It is a common prejudice that communication should form a primary concern of semiotics. “Semiology, writes Buyssens, can be defined as the study of the processes of communication” (1967:11). This is the position of the functionalists, for whom semiotics is concerned with signs only insofar as they have communication as their function (Prieto, 1966; Mounin, 1968, 1970). Jean-Jacques Nattiez, however, shows that “communication is but a particular case of the various modes of exchange, one among the possible consequences of the processes of symbolization” (1987:39). This is not to say that the matter of communication is of no importance, but merely that it would form a rather uncomfortable premise for the construction of a musical semiotics. Even if there may exist in music “a fructuous coincidence between the intention, the structure of the work and the expectations of the listeners” (D. Stockmann, quoted in Nattiez, 1987:40), it cannot be decided a priori that this coincidence necessarily amounts to an inter-personal communication in the strict sense of the term.

Most of the problems of extending a linguistic semiotics to non-linguistic signs exist at the interface between the semiotic system proper and its surroundings: problems arise when signs have to be related to their meaning, their reference, on the one hand, and to things or states of the world, their referent, on the other hand. And despite Eco’s claim (Eco, 1977), the theory of codes, which is a theory of the relationship between signs and meanings, is too much oriented towards language to form the basis of a general semiotic theory. The tendency to consider signs as consisting of form and content (Saussure, 1916/1972:25ss., 98, and passim; Hjelmslev, 1943/1971:65ss.) has a linguistic basis.

At the same time, it must be realized that the preponderance of linguistic considerations in the semiotic debate is not merely the result of some excessive arrogance on the part of linguistics. The study of the meaning of non-linguistic signs often supposes a verbalization; because language is the main vehicle of thought, introspection usually requires the use of words; also, as language remains the best vehicle for interpersonal communication, sharing a semiotic experience seems to demand some form of verbalization. As Barthes said, “man is condemned to articulate language” (1967/1981:9). A general semiotics therefore must keep language among its models—but that, needless to say, is not at all the same thing as making use of linguistic models.

It is necessary to return to the hard stone of semiotics, the formal study of signs and of their relationships within a single semiotic system. Matters of intersystemic relationships (e.g. 
correspondences between musical and verbal signs) or of the relation of signs to extrasemiotic systems (including systems of referents) may be considered less pressing, if not less important.
The semiotic network

Peirce’s theories, because they centre on signs as such, offer a useful departure point for the present undertaking. Peirce’s triadic description of the sign is now well known (see Eco, 1979/1985:68ss; Nattiez, 1979–1980); in a somewhat simplified presentation, it may be summarized as follows: a sign, a representamen, is something which stands for something, its object, and projects an equivalent sign, its interpretant (cf. 2.228).1 What is perhaps less obvious is that, in Peirce’s conception, both the object and the interpretant are signs themselves. The interpretant, therefore, cannot be identified with the meaning (see Deledalle in Peirce, 1978:222), nor with Saussure’s signifié, nor with Ogden and Richards’ reference. The Peircean object, similarly, is not to be confused with Ogden and Richards’ referent (Ogden and Richards, 1923/1966:10s.). “The object of a representation, writes Peirce (who means a mental representation, a representamen), can be nothing but a representation of which the first representation is the interpretant (…). The interpretant is nothing but another representation to which the torch of truth is handed along; and as representation, it has its interpretant again” (1.339; see also 1.538–542). The objects form “an endless series of representations, each representing the one behind it” (1.339), and the sign “determines (…) its interpretant to refer to an object to which itself refers (…) in the same way, the interpretant becoming in turn a sign, and so on ad infinitum” (2.303).

Peirce’s description offers an unlimited chain of signs, each of which determines its down-stream neighbour, its interpretant, to refer to its up-stream neighbour, its object. Peirce stresses that the triadic relation “cannot consist in any actual event that ever can have occurred; for in that case there would be another actual event connecting the interpretant to an interpretant of its own of which the same would be true; and thus there would be an endless series of events which could have actually occurred, which is absurd. For the same reason the interpretant cannot be a definite individual object. The relation must therefore consist in a power of the representamen to determine some interpretant to being a representamen of the same object” (1.542). The relation, in other words, remains potential until realized by a process of semiosis. But it is clear that, before an actual semiosis occurs, any sign potentially relates to several signs-objects and to many signs-interpretants. Peirce’s theory of an unlimited semiotic chain may therefore be extended to a conception in which any sign is viewed as a node in a latent network of potential intersemic relationships.

The idea of this network is not entirely new: Eco spoke of “the network of interconnected properties that form the Global Semantic Field” (1979/1985:112); Nattiez defines a “symbolic form” as “a sign or a set of signs to which is attached an infinite complex of interpretants” (1987:30); and there are obvious correspondences between the idea of a semiotic network and those of cognitive frames or networks. Whenever a representamen comes to the mind, either in a mental process or as the result of an external solicitation, it activates the surrounding network, defining an up-stream part of it as the Peircean object and a down-stream part as the Peircean interpretant. But the network must be preexistent, if only in a latent form: “The Sign can only represent the Object and tell about it. 
It cannot furnish acquaintance with or recognition of that Object; for that is what is meant (…) by the Object of a Sign; namely,
that with which it presupposes an acquaintance in order to convey some further information concerning it” (2.231).

Peirce did not provide a very coherent theory of meaning; we saw that the meaning cannot be equated with the interpretant.2 A sign alone has no meaning (Hjelmslev, 1943/1971:62); it acquires significance only when triadically related to one or several objects and to one or several interpretants. The latent relations of a sign to possible objects and interpretants, i.e. the area of the semiotic network surrounding it, therefore establish for the interpreter the range of its possible significations. In this conception, the signification of a sign is a manière d’être, a syntactic quality of the semiotic network around it. It is in this sense that the signification of a piece of music may be thought to reside entirely in its syntax. The signification so defined may relate to an external meaning, to a concept; the study of this possible relationship is the field of semantics. But it might be argued that the relation to an external meaning is necessary only in the case of language.
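As a toy illustration of this conception (my own sketch, not a formalism proposed by the author or by Peirce), a latent network can be held as a directed graph in which uttering a sign activates a bounded up-stream region of potential objects and a down-stream region of potential interpretants; all sign names and the depth limit are invented.

```python
# Toy latent semiotic network: nodes are signs; a directed edge u -> v means
# "v is a potential interpretant of u" (equivalently, u is a potential object of v).
# All sign names and the depth limit are illustrative assumptions.
network = {
    "o2": ["o1"],
    "o1": ["r"],
    "r":  ["i1", "i2"],
    "i1": ["i3"],
    "i2": [],
    "i3": [],
}

def downstream(sign, depth):
    """Potential interpretants reachable from `sign` in at most `depth` steps."""
    found, frontier = set(), {sign}
    for _ in range(depth):
        frontier = {v for u in frontier for v in network.get(u, [])} - found
        found |= frontier
    return found

def upstream(sign, depth):
    """Potential objects: signs from which `sign` is reachable in at most `depth` steps."""
    reverse = {u: [v for v in network if u in network[v]] for u in network}
    found, frontier = set(), {sign}
    for _ in range(depth):
        frontier = {v for u in frontier for v in reverse.get(u, [])} - found
        found |= frontier
    return found

def activate(sign, depth=2):
    """'Uttering' a sign activates an area of the latent network around it."""
    return {"objects": upstream(sign, depth), "interpretants": downstream(sign, depth)}

print(activate("r"))   # objects {'o1', 'o2'}, interpretants {'i1', 'i2', 'i3'}
```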
Interpersonal semiotic exchanges

The above description expresses an essentially abstract conception in which the semiosis is described as a process within an individual’s semiotic network. The case of the interpersonal semiotic exchange must now be considered; it may be viewed as a process of socialization of the individual semiosis. Whenever a sign is brought forth in an actual utterance, as in speech or in musical performance, it not only activates an area of the semiotic network of the emitter, but may also activate a similar sign and a corresponding area in the receiver’s network: the emission of the sign that triggered the semiosis in the emitter’s network allows it to trigger a similar semiosis in the receiver’s network. This exchange does not necessarily imply the communication of a message, since, as remarked above, the triadic semiotic relation does not necessarily convey an external meaning. But it is clear that the success, the “felicity” of the exchange depends on a level of similitude between the individual semiotic networks; it is the similitude between the networks that has been described as involving a “code”.

Let us first envisage the case of a verbal exchange. The structure of the semiotic networks in this case is strongly determined and regulated by the meaning of the signs: the syntactic structure is subordinated to the semantic one; the depth of activation of the interlocutors’ networks, i.e. the number of active denotative or connotative interpretants, is also largely determined by the semantic structure. The success of the exchange therefore supposes both a syntactic and a semantic validity of the statement uttered. “Syntactically valid”, in the present context, means “belonging to the interlocutor’s semiotic networks”.

Musical exchanges may be quite similar, in some cases, to verbal ones. An often quoted example is that of the Wagnerian Leitmotive, which may convey a conventional meaning. This is possible, of course, only if the listener is aware of the convention; that is, if the particular signifier of the Motive belongs to the latent intersemic relationships in that listener’s own semiotic network. Two points must be noted in this respect. One is that the signification considered here almost inescapably involves a verbalization: the network of the Wagnerian initiate must include a potential relation of the musical motif to
verbal interpretants, and it is this connotative relation which gives rise to the motif’s particular meaning. The second point is that this relation to verbal interpretants is not necessarily activated when the motifs are heard: the listener may not notice them as such, or may not want to think of them in this particular sense. This demonstrates the absence of inner necessity in the latent relationships that make up the semiotic network. But the semiotic function of Wagnerian Leitmotive cannot be considered representative of a musical semiotics at large. Although any musical sign may connote verbal interpretants,3 it is the syntax of musical signs themselves that must engage us here.

A non-linguistic syntax, because it is largely independent from any external meaning, in general has a much higher degree of freedom than a linguistic one. In the case of music, it appears that the work itself often provides the means for structuring the listener’s network. Basic musical techniques such as repetition, variation, development, etc., are techniques which state intersemic relationships before building upon them. Tonal music, similarly, often begins with a statement of the principal tonal functions, tonic, subdominant and dominant, in what Sadai (1980) calls a functional cycle; this initial statement establishes essential semiotic relationships before making use of them in a semiosis that triggers ever more distant interpretants. As the conventionality of the musical style increases, the need for such initial statements lessens; this is particularly obvious in late tonal music, in the second half of the 19th century, where the initial tonal statement at times completely disappears. But even in such works, the musical structure remains based on stated or implied intersemic relations. Modern techniques in music analysis aim at identifying this when they stress a musical linearity, when they illuminate musical implications and their realization, or when they evidence a paradigmatic structure of the work.

A musical style may be defined as involving a predetermined semiotic network. “A style, writes Nattiez, is the identification of recurrent figures. But these figures are not the same to everyone, because style itself is a semiologic fact: the composer wrote a work, an ensemble of works, and from these traces the listeners have formed a more or less precise image of what the style of Wagner or Debussy may be” (1975:88). This “more or less precise image” consists in an awareness of the particular syntactic validity of some intersemic relationships, i.e. in a competence inscribed in one’s individual semiotic network. But the theory of the semiotic network also explains how the awareness of a particular musical style may be a very personal competence and why, even in the absence of this competence (e.g. in intercultural exchanges), a true musical experience remains possible.

A discourse, be it musical or verbal, tentatively controls the listener’s meanderings through her own semiotic network; a certain feedback is possible in conversation or perhaps in musical improvisation. In the case of a written text or of a musical score, however, the emitter controls the uttered signs only and can but hope that the interpretants that they trigger in the receiver’s network are similar to those in his own network. Some of these non-uttered signs depend on context, on topic, on circumstances: they may be called the pragmatic objects and the pragmatic interpretants. 
The persuasiveness of a discourse consists either in rendering it as independent as possible from pragmatic signs, that is in uttering signs whose relation to their objects and interpretants is as unambiguous as possible, or in making correct hypotheses about
pragmatic relationships that may be triggered in the receiver’s network. It goes without saying that the co-textuality of the uttered signs is a factor in removing ambiguity.

The way in which a theory of the semiotic network may fit within the tripartition of Molino and Nattiez (see Molino, 1975) must now briefly be considered. A score, or more generally a text, is the notation of a strategy of persuasion. The poïetic work by which the strategy is elaborated usually involves tests carried out by the emitter on his own semiotic network. More than once during the creative process, the artist must put himself in the position of the receiver: the composer must listen to his work, the painter must step back to embrace her canvas in one look. So doing, they leave their semiotic network as free as possible to develop pragmatic interpretants, in order to form some idea of how the semiosis may unfold in the receivers’ network. In this sense, the poïesis involves an activity in many ways comparable to an aesthesis. In the case of musical performance, the performer filters the musical text through his own pragmatic objects and pragmatic interpretants, producing an acoustic image that again is a strategy. Finally, on the side of the receiver, the aesthesis again involves an activation of pragmatic objects and interpretants, an activity which resembles poïesis. The poïesis could be described as a semiosis that utters signs, the aesthesis as one triggered by external signs.

But both poïesis and aesthesis are essentially individual processes, involving darker areas of the individual semiotic networks. The text, the score (or its acoustic image) is the material trace of the semiotic exchange and, as such, the first and the main object of study. It reveals little, however, of the semiotic processes proper, which occur in the secrecy of the individual semiotic networks.
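As a closing caricature of the claim that the felicity of an exchange depends on the similitude between the individual networks, the following sketch (again my own illustration, not the author’s formalism) measures the overlap between the areas that one uttered sign activates in two different networks; the networks, the sign and the depth limit are all invented.

```python
# Caricature of "felicity": the overlap (Jaccard index) between the areas that the
# same uttered sign activates in the emitter's and in the receiver's networks.
def triggered(network, sign, depth=2):
    found, frontier = set(), {sign}
    for _ in range(depth):
        frontier = {v for u in frontier for v in network.get(u, ())} - found
        found |= frontier
    return found

def felicity(emitter_net, receiver_net, sign):
    a, b = triggered(emitter_net, sign), triggered(receiver_net, sign)
    return len(a & b) / len(a | b) if (a | b) else 1.0

emitter  = {"theme": ["repetition", "variation"], "repetition": ["closure"], "variation": []}
receiver = {"theme": ["repetition"], "repetition": ["closure"], "closure": []}
print(felicity(emitter, receiver, "theme"))   # ~0.67: the two networks overlap, imperfectly
```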
Notes
1. Following the usage, references to Peirce’s Collected Papers (1931–1960) consist in the number of the volume followed by the number of the paragraph.
2. The most extended discussion of meaning is in 5.475ss. At first reading, it might seem that Peirce equates meaning with interpretant. But he can only be thinking of the interpretant as it relates to the representamen, and this relation, being triadic, necessarily involves that between sign and object: Peirce’s conception, therefore, is similar to the one proposed here.
3. There are reasons to believe, however, that the verbalization of music took on a particular importance during a relatively short period in the history of occidental music, namely from the end of the 16th century or the beginning of the 17th, when rhetorical considerations became normative in composition, until the end of romanticism. Verbalization may be less developed in pre-Baroque music, as in some non-European cultures.
References
Barthes, R. (1967/1981) Système de la mode. Paris: Seuil.
Buyssens, E. (1967) La communication et l’articulation linguistique. Paris: PUF.
Eco, U. (1977) A Theory of Semiotics. London: Macmillan.
Eco, U. (1979/1985) Lector in Fabula ou la coopération interprétative dans les textes narratifs, traduit de l’italien par M. Bouzaher. Paris: Grasset.
Hjelmslev, L. (1943/1971) Prolégomènes à une théorie du langage, traduit du danois par U. Canger avec la collaboration d’A. Wewer. Paris: Minuit.
Molino, J. (1975) ‘Fait musical et sémiologie de la musique’. Musique en jeu, 17, 37–62.
Mounin, G. (1968) Clefs pour la linguistique. Paris: Seghers.
Mounin, G. (1970) Introduction à la sémiologie. Paris: Minuit.
Nattiez, J.-J. (1979–1980) ‘Les fondements théoriques de la notion d’interprétant en sémiologie musicale’. Journal canadien de recherche sémiotique, VII/2, 1–19; reproduced with complements in Nattiez, 1988:143–171.
Nattiez, J.-J. (1987) Musicologie générale et sémiologie. Paris: Bourgois.
Nattiez, J.-J. (1988) De la sémiologie à la musique, Cahiers du département d’études littéraires 10. Montréal: Université du Québec.
Ogden, C.K. & Richards, I.A. (1923/1966) The Meaning of Meaning. A Study of the Influence of Language upon Thought and of the Science of Symbolism. London: Routledge & Kegan Paul (1st ed., 1923; 10th ed., 6th impression, 1966).
Peirce, Ch.S. (1931–1960) Collected Papers. Cambridge (Mass.): Harvard University Press, 8 vols.
Peirce, Ch.S. (1978) Écrits sur le signe, rassemblés, traduits et commentés par G. Deledalle. Paris: Seuil.
Prieto, L. (1966) Messages et signaux. Paris: PUF.
Sadai, Y. (1980) Harmony in its Systemic and Phenomenological Aspects. Jerusalem: Yanetz.
Saussure, F. de (1916/1972) Cours de linguistique générale, Genève, 1916; éd. critique par T. de Mauro. Paris: Payot, 1972.
Psychological analysis of musical composition

Composition as design

Ron Roozendaal
Utrecht School of the Arts, The Netherlands

Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 311–324
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
In this paper we describe how techniques from psychology and knowledge engineering are used in forming a conceptual model of the musical composition process. The research in compositional processes which is described is based on research into design processes in different domains: written, architectural, industrial and musical composition. Experiments are conducted in which designers, in this case composers, are observed and recorded while performing their tasks. Recorded observations, including auditory, visual and spoken data, are analyzed afterwards. A model of the composition process derived from research in the above design domains can and will be refined and used in developing computer-aided compositional tools in further research. These tools will not be mere “type-setting and printing” aids, as are many of the computer-based tools used nowadays, but will assist the process of musical composition itself. Tools can be automated, as in computer programs, or non-automated, as in a description of a new design environment.

KEY WORDS: Musical composition, cognition, cognitive science, modelling, design processes.
Introduction

Views on compositional processes date back a long time. The first statements, and different points of view, in this field can be found in the texts of two famous Greek philosophers and writers: Plato and Aristotle. As in many issues in current psychology, philosophy and other fields, they have set the scale on which views on (written) composition can be positioned. In their times, differences between musical composers and writers were marginal: poetry was either lyrical or epic, and the writer was also a
composer. Their views will be shown to be translatable into musical design processes when discussing musical composition in particular.

Plato is known as a philosopher with a great belief in ‘the divine’, while Aristotle is famous for his view of humans as craftsmen. This difference shows in their thoughts on compositional processes, and on “good” instances of those processes in particular. Aristotle writes in his Poetics:

In constructing plots and completing the effect by the help of dialogue the poet should, as far as possible, keep the scene before his eyes. Only thus by getting the picture as clear as if he were present at the actual event will he find what is fitting and detect contradictions. (…) The stories, whether they are traditional or whether you make them up yourself, should first be sketched in outline and then expanded by putting in episodes. I mean that one might look at the general outline, say of the Iphigeneia, like this: (…). Not until this has been done should you put in names and insert episodes: and you must mind that the episodes are appropriate, as, for instance, in the case of Orestes the madness that led to his capture and his escape by means of purification. […] Now in drama the episodes are short, but it is through them that the epic gains its length. The story of the Odyssey is quite short. (…) That is the essence, the rest is episodes. [Aristotle, Poetics, XVIII]

This is a totally different opinion from that stated by Plato in his Ion:

I do observe it, Ion, and I am going to point out to you what I take it to mean. For, as I was saying to you just now, this is not an art in you, whereby you speak well on Homer, but a divine power, which moves you like that in the stone which Euripides named a magnet, but most people call ‘Heraclea stone’. (…) In the same manner also the muse inspires men herself, and then by means of these inspired persons the inspiration spreads to others, and holds them in a connected chain. For all the good epic poets utter all those fine poems not from art, but as inspired and possessed, and the good lyric poets likewise. [Plato, Ion]

With the rise of cognitive psychology, Aristotle’s view is more and more accepted as that likely to bring about progress in understanding the creative process, whether totally correct or not. This has resulted in research into compositional processes in different areas: writing, architecture, industrial design, musical composition and many more. In this paper I compare results in those fields and apply them in the field of musical composition.
Written composition

Models of the processes of written composition are described by authors such as Britton and Glynn (1989) and Bereiter and Scardemalia (1987). One can distinguish algorithmic,
descriptive and intermediary models. The first are aimed mostly at demonstrating by means of computer implementations that the algorithmic results for certain writing tasks resemble those of humans. Other models only describe mental processes going into writing and stop at a level where algorithmic description would be required. Intermediary models combine aspects of both descriptive and algorithmic models. Descriptive models can easily be used as hypothetical descriptions of design processes in research aimed at supporting different design strategies. Although algorithmic descriptions can be seen as more accurate, they tend to cover one task per algorithm. Distinctions between design strategies on different sets of tasks can more easily be shown when using a finite sample of the available models of those strategies.

Scardemalia and Bereiter build their psychology of written composition on two descriptive models: that of “knowledge-telling” and that of “knowledge transforming”. In text composition, three stages of expertise are often observed. The first phase in the development is named conversation. Children at a young age seldom have the capacity to write as fluently as they speak. They do not catch up until after a few years of attending school (Loban, 1976). The first reason why they are unlikely to do so at an earlier point in their development is the material of which texts are composed. Other reasons are more abstract, such as “thinking without partners” (which may cause great difficulties when tried at first). The initial model of Scardemalia and Bereiter, that of knowledge-telling, describes the processes used after a few years of attending school, and is shown graphically in Figure 1. Notes made by children writing often show a sequential process. They are in an
intermediate stage between conversation and knowledge transforming.

Figure 1 The Knowledge-telling model.

Knowledge transforming is described in the second model of Bereiter and Scardemalia. Writers using this strategy can be found among people at advanced levels in any intellectual discipline (Bereiter and Scardemalia, 1987). Aldous Huxley describes his thoughts about the process as follows:

Generally I write everything many times over. All my thoughts are second thoughts. And I correct each page a great deal, or rewrite it several times as I go along… Things come to me in driblets, and when the driblets come I have to work hard to make them into something coherent. (Cited in Bereiter and Scardemalia, 1987, page 10)

This description indicates that writing through the knowledge-transformation process involves much rethinking, but still is based on unstructured ideas. A graphical description of this second model of Bereiter and Scardemalia is given in Figure 2. The model still
includes knowledge-telling to compose or generate particular units of the text, but this generation now is planned beforehand.

Figure 2 The Knowledge-transformation model.

Observing people writing texts has provided clues about observable differences between writers using either or both models given by Bereiter and Scardemalia, although knowledge-telling is in fact one of the possibilities of the transformation model. Beginning with a new text in knowledge-telling models only requires a first item to be retrieved, whereas in transformation models start-up time would vary according to the goals set by the author, the kinds of problems to be solved in advance and the complexity of the plan that must be constructed to reach the goals that are set. This start-up time, in opposition to the start-up time in the knowledge-telling model, would be highly variable. In note-making, the writing down of notes and eventual text, the knowledge-telling model would account for a highly sequential process, without much structural information added. The transformation model, however, would lead to structural additions, structural modifications, etc. Both note-making types are observed in the actions of
writers of texts, professional and non-professional alike. Authors thinking aloud would, when using the knowledge-transforming approach, utter many words which would not appear in the final text. This could include planning, structure, plot, etc. In the case of a knowledge-telling approach, however, most of the uttered words would enter the text. Finally, differences can be found in the need for revision. As repeated planning and goal analysis is only represented in the knowledge-transforming model, large-scale revisions will occur in this model only.

The main concepts in both models, as in other linguistic models (cf. Hayes and Flowers, 1980), are cognitive processes, intermediate products, goals and constraints. Smith and Landsman introduce, on the basis of the identification of these aspects and their interdependencies, the concept of cognitive mode. A cognitive mode is defined as a particular way of thinking that writers adopt in order to accomplish some part of the overall writing task. These modes are described as a certain combination of cognitive processes, intermediate products, goals and constraints. Their resulting model is described in Table 1.

The work of Smith and Landsman has resulted in a text editor providing writers with three different outlooks on their composition: a network of free-standing and interconnected subjects, a structured tree and a final text. All representations are interconnected, and can be shown simultaneously. As cognitive modes are not sequential (Smith and Landsman, 1989), all three representations can be used in a random sequence.
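A very schematic sketch of such an environment, under assumptions of my own about its shape (this is not Smith and Landsman’s code, and all names are invented), might hold the three interconnected views like this:

```python
# Sketch of an editor holding three simultaneous, interconnected views of one
# composition: a loose network of subjects, a structured tree, and linear text.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    text: str = ""                                  # prose attached in writing mode
    links: set = field(default_factory=set)         # network view: free associations
    children: list = field(default_factory=list)    # tree view: outline position

class Workspace:
    def __init__(self):
        self.nodes, self.top = {}, []

    def add(self, label):                            # exploration: externalize an idea
        self.nodes[label] = Node(label)

    def link(self, a, b):                            # exploration: associate two ideas
        self.nodes[a].links.add(b)
        self.nodes[b].links.add(a)

    def place(self, child, parent=None):             # organization: build the hierarchy
        (self.nodes[parent].children if parent else self.top).append(child)

    def text_view(self, labels=None):                # writing/editing: linear text
        labels = self.top if labels is None else labels
        lines = []
        for label in labels:
            lines.append(self.nodes[label].text or f"[{label}]")
            lines += self.text_view(self.nodes[label].children)
        return lines

ws = Workspace()
for label in ("opening", "theme", "coda"):
    ws.add(label)
    ws.place(label)
ws.link("opening", "theme")          # the views can be worked on in any order
print(ws.text_view())                # ['[opening]', '[theme]', '[coda]']
```

Because the views share the same underlying nodes, work done in one outlook is immediately visible in the others, which is what allows them to be used in any sequence.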
Architectural and industrial design

As described, research in design processes has occurred in more than one application area. Architecture is among the first disciplines to be found in research reports (Eastman, 1970). Based on results in this field, a Dutch researcher, C. Teyken, describes in his dissertation the design strategies of industrial designers. A process model of the design process, based on Newell and Simon’s description of problem solving and on eight models given by former research in architectural design, is given by Teyken as shown in Table 2 (Teyken, 1988). The products and processes mentioned by Teyken resemble the cognitive modes given by Smith and Landsman.
Musical composition

Few researchers (cf. Reitman, 1965; Sloboda, 1985) have been involved in research in musical composition. Most resulting ideas, which we will call models but which are still very vague, are based on the researchers’ analyses of their own compositional processes. The only person to have conducted experimental research in this field is Reitman, who asked a subject to compose a fugue in the style of Bach. Detailed results or protocols of this 1965 experiment, however, have never been published; the fact that the experiment was conducted is mentioned only in passing (Reitman, 1965).

Stages of the composition process and the knowledge used are treated globally by Sloboda (Sloboda, 1985). In his model (Figure 3) a musical idea develops into a final form through various steps. This idea may vary from a first impression of the form of the musical work to even a non-musical concept like the aim of composing the piece. Not
every single musical composition process needs an initial idea. This resembles the distinction drawn by Otto Laske (1977) between transformational and creative composition. While engaged in the former, ideas develop into musical works. While engaged in creative composition, musical works evolve while ‘just trying’. A particular compositional process thus may be more or less bound to initial ideas. In general, a compositional process will be a combination of both creative and transformational composition. Sloboda points out that composers use a repertoire of compositional devices. These are techniques that are used to transform themes into complete musical works.
Table 1 Cognitive Modes model by Smith and Landsman.

Exploration
  Processes: Recalling; Representing; Clustering; Associating; Noting sub/superordinate relations
  Products: Individual concepts; Clusters of concepts; Networks of related concepts
  Goals: To externalize ideas; To cluster related ideas; To gain general sense of available concepts; To consider various possible relations
  Constraints: Flexible; Informal; Free expression

Situational analysis
  Processes: Analyzing objectives; Selecting; Prioritizing; Analyzing audiences
  Products: High-level summary statement; Prioritized list of readers (types); List of major actions desired
  Goals: To clarify rhetorical intentions; To identify and rank potential readers; To identify major actions; Consolidate realization; To set high-level strategy for document
  Constraints: Flexible; Extrinsic perspective

Organization
  Processes: Analyzing; Synthesizing; Building abstract structure; Refining structure
  Products: Hierarchy of concepts; Crafted labels
  Goals: To transform network of concepts into coherent hierarchy
  Constraints: Rigorous; Consistent; Hierarchical; Not sustained prose

Writing
  Processes: Linguistic encoding
  Products: Coherent prose
  Goals: To transform abstract representation of concepts and relations into prose
  Constraints: Not necessarily refined

Editing: global organization
  Processes: Noting large-scale relations; Noting and correcting inconsistencies; Manipulating large-scale structural components
  Products: Refined text structure; Consistent structural cues
  Goals: To verify and revise coherence relations within intermediate-sized components
  Constraints: Focus on large-scale features and components

Editing: coherence relations
  Processes: Noting coherence between sentences and paragraphs; Restructuring to make relations coherent
  Products: Refined paragraphs and sentences; Coherent logical relations between sentences and paragraphs
  Goals: To verify and revise coherence relations within intermediate-size components
  Constraints: Focus on structural relations

Editing: expression
  Processes: Reading; Linguistic analysis; Linguistic transformation; Linguistic encoding
  Products: Refined prose
  Goals: To verify and revise structural relations among sentences and paragraphs
  Constraints: Focus on document; Rigorous logical and structural thinking
Table 2 Model of the design process by Teyken.

Orientation (process: product)
  Recognition of the design problem: Problem notion
  Analysis of the problem instruction: Essentials and subordinates
  Planning of the orientation process: Problem approach
  Specification of criteria and constraints: Criteria and constraints
  Determine and weigh problem variables: Variables
  Gain information: Extra information
  Idea production: Ideas and representations
  Solution production by combining ideas: Solutions and representations
  Evaluate process: Judgement
  Analyze and evaluate ideas and solutions: Conclusions, judgements and sub-problems
  Choosing a solution: Definitive solution
  Sketching a solution: Sketch

Experimentation (process: product)
  Planning the experimental phase: Problem approach
  Recognizing sub-problems on basis of sketch: Sub-problems
  Evaluating sub-problems on solvability: Solvable/non-solvable sub-problems
  (Re)production of ideas on sub-problems: Ideas, design variants, representations
  Evaluating process: Process judgement
  Choice of elements and working methods: Material and techniques
  Mock-up experiments: (Im)possibilities
  Designing first try: Try-out model
  Analyzing and evaluating first try: Conclusions, judgements and sub-problems
  Improve prototype: Ideas and representations

Implementation (process: product)
  Planning the implementation: Problem approach
  Producing working sketch/description: Sketch and description
  Analysis and evaluation of prototype: Judgements, conclusions and sub-problems
  Production of ideas on sub-problems: Ideas and representations
  Implementation as prototype or end product: Prototype and/or end product
Figure 3 Model of musical composition by Sloboda.

Some of these devices may be valid for a group of composers, while some may be personal. Musical knowledge plays an important role in all musical processes and is often acquired while listening to pieces (Serafine, 1988). Lerdahl (1988) in his model focuses on the activity of listening. His theories are based on concepts from linguistics, and grammars play an important role. The basis of the model is the assumption that composers, while listening to their own themes, use a listening grammar to judge the themes and to alter their compositional grammar. This is strongly supported by Serafine (1988), who states that not only composers but all persons possess, to a considerable degree, an implicit understanding of music. Music, according to her, arises from a core
set of cognitive processes, which are the basis of composition, listening and performing alike.
Composers’ notes

Different strategies in written composition are shown by notes made while composing. This would, in musical terms, account for the different forms of sketches to be found in composers’ notebooks. Strauss, in one of his letters to Hugo von Hofmannsthal, shows a combination of knowledge-telling and knowledge-transformation when describing the adding of music to lyrics (Weiss, 1967, letter 292b):

The final scene is magnificent: I’ve already done a bit of experimenting with it today. I wish I’d got there already. But since, for the sake of symphonic unity, I must compose from the beginning to the end, I’ll just have to be patient.

Another way of composing, more closely resembling knowledge-transformation, is described by Beethoven:

I carry my thoughts with me for a very long time, before writing them down… I change many things, discard others, and try again until I am satisfied; then, in my head, I begin to elaborate the work in its breadth, its narrowness, its height, its depth and, since I am aware of what I want to do, the underlying idea never deserts me. (Sloboda, 1985, page 107)

Even the differences between Plato and Aristotle return in modern-age composers. Schubert writes, in a letter to his father and stepmother (Weiss, 1967, letter 148):

I think this is due to the fact that I have never forced devotion in myself and never compose hymns or prayers of that kind unless it overcomes me in unawareness; but then it is usually the right and true devotion.

Plato himself could almost have written these words; he also would argue that a composer has to be in a state of devotion to be able to write music. Wiedebein, however, only stays in this state of devotion for a short time, and describes the state of dedication and the work afterwards in a letter to Robert Schumann (Weiss, 1967, letter 156b):

We ought wholly to surrender ourselves to the fine rapture of moments of exalted dedication; after which, however, calm, inquiring reason must also assert its rights and intervene with its bear’s paw, mercilessly scratching out whatever human failing may have got smuggled in. Wild things grow wildly; nobler fruits demand cultivation. Wine, however, requires not only the most assiduous cultivation, but also the knife; (…)
As composers’ notes cannot be said to give a valid description of the compositional process as a whole, we only use these statements as a means of comparison. They may show deficits in models to be defined, or demonstrate at least that models cannot be proven wrong when evaluated against these notes.
A hypothetical model of musical composition

Based on the above research and on models of different design processes we may propose a hypothetical model of the musical composition process. Aspects of the models that are present in the hypothetical model are shown in Table 3.
Table 3 Included aspects of the different models.

Knowledge-telling: The concept of two types of knowledge; The concept of a mental assignment; The concept of identifiers; The concept of probes; The testing of elements; The concept of a mental representation

Knowledge-transformation: The concept of planning, analysis and goal; The use of two problem spaces and their connection

Cognitive modes: The concept of modes; The concept of constituents; The concept of concepts, clusters and networks, hierarchy, coherence, structure

Sloboda: The concepts of idea, theme, intermediate form and final form; the concept of compositional devices; the concept of unconscious knowledge
The global assumption of our model is that a set of ideas, themes and intermediate parts (cf. Sloboda’s model) develops into a final composition using different types of knowledge. We will, following Smith and Landsman, define three levels of organization: global, coherence relations and expressions. Expressions come up, as Sloboda described in his model, by inspiration. We will assume these expressions to be the smallest units of investigation. Knowledge-telling, which may be assumed to lead to a particular expression, is thus left unexplored as an underlying process. Note that we do not want to design a composing system. We will not assume a sequential process. The only presumption we make is an increase in the number of structured ideas as opposed to unstructured ideas during the
compositional process. This is in accordance with Smith and Landsman’s model, in which editing results in refined structures on all three levels of organization. In written composition, rhetorical constraints effectively reduce the size of the content problem-space. In musical composition we expect to find less influence of the “form and direction” problem-space (cf. Sloboda, 1985) on the size of the content problem-space, and thus on the number of partly developed musical units at a certain point in the compositional process.
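One way of operationalizing this presumption, assumed here purely for illustration and not defined in the paper, is to track the running proportion of protocol events tagged as structured ideas:

```python
# Illustrative measure (my assumption, not the paper's): given a time-ordered
# protocol in which each noted idea is tagged "structured" or "unstructured",
# track the proportion of structured ideas produced so far; the presumption
# predicts that this proportion tends to rise as the session unfolds.
def structured_proportion(events):
    structured, curve = 0, []
    for i, tag in enumerate(events, start=1):
        structured += (tag == "structured")
        curve.append(structured / i)
    return curve

protocol = ["unstructured", "unstructured", "structured", "unstructured",
            "structured", "structured", "structured"]          # invented protocol data
print(structured_proportion(protocol))   # rises from 0.0 towards about 0.57
```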
Table 4 The hypothetical model of musical composition.

Assignment
  Processes: Representing; Planning
  Products: Representation; Plan
  Goals: To form a mental representation; To form a plan
  Constraints: Flexible

Exploration
  Processes: Knowledge-telling; Representing; Clustering; Associating; Noting relations
  Products: Individual, clustered and networked musical or non-musical concepts
  Goals: To externalize ideas; To cluster related concepts; To gain general sense of available concepts; To consider various possible relations
  Constraints: Flexible; Informal; Free expression

Situational analysis
  Processes: Analyzing objectives; Selecting; Prioritizing; Analyzing audiences
  Products: High-level summary statement; Audience description; List of major actions desired
  Goals: To identify and rank potential public; To identify major actions; Consolidate realization; To set high-level strategy for piece
  Constraints: Flexible; Extrinsic perspective

Organization
  Processes: Analyzing; Synthesizing; Building abstract structure; Refining structure
  Products: Hierarchy of concepts; Crafted labels
  Goals: To transform network of concepts into coherent hierarchy
  Constraints: Rigorous; Consistent; Hierarchical; Not sustained prose

Writing
  Processes: Musical encoding through knowledge-transformation
  Products: Coherent music
  Goals: To transform abstract representation of concepts and relations into music
  Constraints: Not necessarily refined

Editing: global organization
  Processes: Noting large-scale relations; Noting and correcting inconsistencies; Manipulating large-scale structural components
  Products: Refined composition structure; Consistent structural cues
  Goals: To verify and revise coherence relations within intermediate-sized components
  Constraints: Focus on large-scale features and components

Editing: coherence relations
  Processes: Noting coherence between parts and constituents; Restructuring to make relations coherent; Arranging
  Products: Refined parts and constituents; Coherent logical relations between parts and constituents
  Goals: To verify and revise coherence relations within intermediate-size components
  Constraints: Focus on structural relations among parts and constituents; Rigorous logical and structural thinking

Editing: transforming, extending, developing and modifying musical units
  Processes: Reading, imagining, playing and listening; Musical analysis and judgement on basis of constraints on form and direction; Musical transformation (via compositional devices); Musical encoding; Instrumentation
  Products: Refined musical unit
  Goals: To verify and revise musical units
  Constraints: Focus on musical unit
In designing our hypothetical model we decided to use Smith and Landsman’s model as a framework and adapted it to the musical design process. The concepts recognized as important within the other models (see Table 3) are also included. The resulting hypothetical model is shown in Table 4. Describing compositional processes with the aim of identifying possible tasks to be assisted requires the identification of processes that could need assistance. The hypothetical model would at least account for the occurrence, in observing composers at work, of the processes and products shown in Table 5.
Table 5 Processes and products accounted for by the hypothetical model (process: product).

Planning: Musical and non-musical probes (ideas)
Knowledge-telling: Musical concepts/units
Clustering of concepts: Clustered and networked musical concepts
Noting relations between concepts: Summary statements
Associating concepts: Hierarchies of musical concepts
The building and refining of a structure of concepts: Compositional structure; Structural cues
Noting large-scale relations
Noting and correcting inconsistencies in large-scale relations
Manipulating large-scale relations
Noting coherence between parts and constituents
Restructuring of parts and constituents
Judgement of musical units
Transformation of musical units
In our experiments we expect to observe difficulties in all the processes mentioned. In addition we expect to find a distinct number of cognitive modes, combining sets of processes and products.
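Before turning to the experiments, the shape of the hypothesized model can be summarized in a small sketch. This is a paraphrase of Table 4, not software used in the study, and all identifiers are invented.

```python
# Sketch: a cognitive mode as a bundle of processes, products, goals and constraints;
# the model is an unordered collection of such modes, since no sequential traversal
# of the modes is assumed.
from dataclasses import dataclass

@dataclass(frozen=True)
class CognitiveMode:
    name: str
    processes: tuple
    products: tuple
    goals: tuple
    constraints: tuple

exploration = CognitiveMode(
    name="Exploration",
    processes=("knowledge-telling", "representing", "clustering", "associating",
               "noting relations"),
    products=("individual, clustered and networked musical or non-musical concepts",),
    goals=("externalize ideas", "cluster related concepts",
           "gain general sense of available concepts", "consider possible relations"),
    constraints=("flexible", "informal", "free expression"),
)

MODEL = {exploration.name: exploration}   # the other rows of Table 4 would be added likewise
```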
Testing the model: experiments

The experiments designed to test the model mainly consist of observing composers performing compositional tasks. Using modern recording equipment we are able to observe composers at work and record auditory data (e.g. humming and playing), visual data (sketching and “scratching”) and linguistic data produced when talking aloud. Several problems had to be overcome (Roozendaal, 1990), but the eventual experiments were conducted with a number of composers, who all took part in a number of sessions.

Interestingly, one of the problems to be overcome was a resistance to co-operate among composers. Some of them, as it happened, shared Plato’s view of the compositional process as being “of higher nature” and not subject to research. This is probably the main reason why experiments in musical composition had not been conducted on a large scale before. Sloboda already noticed this when writing:

There is a vast body of literature on the musical compositions which figure prominently in our art culture, but most of this deals with the product of composition, not the process (Sloboda, 1985).

The resistance among university-based musicologists who had had training in composition, as opposed to conservatory composers, was not as high. This led to a group of subjects consisting of two conservatory-trained composers (out of eight invited) and six university-based composers (out of eight invited).
Sessions

Experiments consist of a number of sessions, an overview of which can be found in Table 6.
Table 6 Overview of sessions.

Session 1: Welcome and introduction
Session 2: Methodology introduction
Session 3: Complete compositional process
Session 4: Detailed experiments on parts of the process
Last session: Complete compositional process
The first of these is short and only serves as an introduction to the institute, the experimenter and the research project itself. Although different issues come up for discussion, this session generally lasted no longer than one hour.

The second session serves as an introduction to talking aloud, analyzing a video protocol and, last but not least, composing in an experimental room. At first talking aloud is practised in multiplication tasks. The difference between thinking and talking aloud is shown in quasi-visual tasks. After these tasks the composer is instructed only to use the talking aloud procedure in the rest of the sessions. The musical part of the second session consists of a modulation task and a completion task. The modulation task was chosen as it is simple enough to allow the training of talking aloud. Two modulating tonal melodies are requested, one very uncommon (from C major to F sharp major and back) and one less complex (from D major to B flat major and back). In the completion task subjects are asked to complete two incomplete melodies. One of these contains no modulation whereas the other does. After all the tasks have been completed the videotape is rewound and played back. One of the musical tasks is picked out to be retrospectively analyzed at a composer-controlled pace. In this way the use of the recorder and the generation of retrospective reports are practised.

The third session comprises the making of a new musical piece. To ensure that every subject starts at the same point in the musical composition process, i.e. in the exploration mode, the experimenter asks the subject to start composing on a certain non-musical theme, like a poem. These themes are chosen from the sources of inspiration mentioned by the composer in the first session. Although the composition process thus is directed to a certain degree, this is done in an attempt to ensure that all subjects are equally attracted to and motivated by the task to be performed. The third session enables us to get an overview of the composition process for a particular composer. The next sessions (two, for the most part) are used to analyze parts of the composition process that were not clear in the third session. In the last session the experiments are ended with another complete composition. This is included so as to be able to compare the results of this session with the model of the particular compositional process that is to be built in the analysis phase.
Results

Experimental data are at the moment being evaluated using the hypothetical model of musical composition. The hypothetical model serves as an interpretative model for the experimental data. An interpretative model represents human problem-solving behavior on certain categories of tasks, in this case synthesis tasks in general and design tasks in particular (Breuker et al., 1985). First results are in accordance with the model and show a number of cognitive, perceptual and motor problems, which might be alleviated by the use of a computer-assisted composition tool. Among these are those shown in Table 7.
Table 7 Problems identified in musical composition processes.

• An inability to clearly state and plan the design problem
• A focus on one subproblem, which can not be solved at the moment
• An inability to structure ideas that come up
• An inability to keep track of networks of concepts
• An inability to obtain an overview of more than one musical unit
• An inability to imagine musical units
• An inability to play musical units
• An inability to imagine extreme aspects of certain instruments, outside the normal use
Support for the model may also be found in the recognition of different levels of editing in the experimental data. This support was also found for the other cognitive modes. The conclusion may be that the model, although it does not explain unit-generation (the “inspiration” in Sloboda’s model), serves its purpose.
Towards a composition-assisting tool

While our conclusions are necessarily speculative, it seems possible to suggest particular types of prospective computer-assisted compositional tools. This can best be done by comparing future musical systems to the new text editors to which Smith and Landsman’s research leads, as their model of cognitive modes closely resembles our musical composition model. Music composition-assisting systems could thus provide a hierarchy of abstraction, combining networks, structured relations and final musical units. Design of such a system is in the definition phase. After implementation we will conduct the same experiments as described in this paper, with the only difference being the use of the compositional system by the subjects. This will show whether or not problems in the compositional process have been mitigated.
References
Aristotle, Poetics, XVII. In Aristotle (23 vols), (23), London: Harvard University Press.
Bereiter, C., Scardemalia, M. (1987) The psychology of written composition. Hillsdale: Lawrence Erlbaum Associates.
Breuker, J. et al. (1987) Model Driven Knowledge Acquisition: Interpretation Models. Esprit Project P1098, Deliverable D1 (task A1). Amsterdam: University of Amsterdam and STL Ltd.
Britton, B.K., Glynn, S.M. (Eds.) (1989) Computer writing environments: theory, research and design. Hillsdale: Lawrence Erlbaum Associates.
Eastman, C.M. (1970) On the analysis of intuitive design processes. In G.T. Moore (Ed.), Emerging methods in environmental design and planning. Cambridge: MIT Press.
Hayes, J.R., Flowers, L.S. (1980) Identifying the organization of writing processes. In L.W. Gregg and E.R. Sternberg (Eds.), Cognitive processes in writing, pp. 3–30. Hillsdale: Lawrence Erlbaum Associates.
Laske, O. (1977) Music, memory and thought. Pittsburgh: The Music Department of the University of Pittsburgh.
Lerdahl, F. (1988) Cognitive constraints on compositional systems. In J.A. Sloboda (Ed.), Generative processes in music: the psychology of performance, improvisation and composition. Oxford: Clarendon Press.
Loban, W. (1976) Language development: kindergarten through grade twelve. Urbana, Ill.: National Council of Teachers in English.
Plato, Ion. In Plato (8 vols), (8), London: Harvard University Press.
Reitman (1965) Cognition and thought—an information processing approach. New York: Wiley and Sons.
Roozendaal, R. (1990) Generative processes in music: musical composition. To appear in: Proceedings of the Music and Informatics conference, Marseille.
Serafine, M.L. (1988) Music as cognition: the development of thought in sound. New York: Columbia University Press.
Sloboda, J.A. (1985) The musical mind: the cognitive psychology of music. New York: Oxford University Press.
Smith, J.B., Landsman, M. (1989) A cognitive basis for a computer writing environment. In B.K. Britton and S.M. Glynn (Eds.), Computer writing environments: theory, research and design, pp. 1756. Hillsdale: Lawrence Erlbaum Associates.
Teyken, C. (1988) Ontwerpen als proces. Amsterdam: Swets and Zeitlinger.
Weiss, P. (Ed.) (1967) Letters of composers through six centuries. London: Chilton Book Company.
How do we perceive atonal music? Suggestions for a theoretical approach
Michel Imberty
University of Paris X, Nanterre, France
Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 325–337
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
In the discussion that follows, I will attempt to formulate some propositions for a model of the perception and mnemonic encoding of atonal music. These propositions aim to address the psychological paradox on which, at least at the outset, Schoenberg and his pupils founded their theoretical work: the suppression of hierarchical tonal functions at the heart of musical structure. It is a psychological paradox to the extent that today we know that there is no visual or auditory perception without a hierarchical organisation of the information, which at the same time allows for economical storage in memory regardless of the complexity of that information. We must then suppose that, in the case of an atonal work, the listener himself constructs new hierarchical rules which are substituted for the tonal rules, and that the coherence of the musical work depends on the listener’s capacity to process the musical information outside the conventions he has acquired by listening to tonal works. More exactly, it is a question of knowing how, in the absence of defined structural referents, the listener manages to reduce the musical surface in order to be able to reconstitute a network of very general temporal relations which can be applied beyond the present listening event.

KEY WORDS: Atonal, hierarchy, reduction, schema, syntax.
A classic experiment: the identification of a series via its transformations
As early as 1958, R. Francès, in his book La perception de la musique, showed the type of difficulty the listener is confronted with when listening to serial music. In a preliminary experiment pertaining to memory for short sequences of sounds, it appears that atonal
sequences are a lot less stable in immediate memory than tonal sequences, this being as much the case for subjects who are musicians as for non-musicians. The outcome of this experiment leads us to interpret the results as reflecting a structural instability of atonal melodies as compared with tonal melodies: in fact, the experiment consists of the detection of a change of note within an initial melody during its reproduction. The change is all the easier to detect when it results in a modification of the overall form, or at least of the network of relationships between the surrounding intervals. However, in atonal melodies the network is too weakly organised, and changing the notes does not affect the form of the sequence, which itself has a hazy and imprecise form in the subjects’ perception, since scalar functions have completely disappeared and all the notes bear equivalent weight in the temporal flow.

But the main experiment will allow me to consider the problem in a more complete way. Francès works from the idea that the series, in the musical language of Schoenberg, must play the same unifying role in the musical framework as tonality does in the tonal system. In other words, and according to Schoenberg himself, the series is the unifying principle of the piece: all the transformations that it undergoes must be perceived in relation to it, and they must have a family link perceived by the listener, which guarantees the coherence of the whole. On the other hand, two different series must allow the development of two groups of sequences resulting from their respective transformations, each group being totally distinct from the other and unable to have the same unifying link. In brief, it is a matter of showing whether the listener really recognises the link between the original series and its derivatives, or whether the link is purely an abstract effect of a written procedure.

In order to do that, Francès asks a serial composer to write 28 musical examples, 24 from transformations of a first series, and four others from another series. The two series differ in the order of the last 6 notes (dodecaphonic series of 12 notes). The subjects have to identify not both series, but only the principal series, by detecting the four sequences coming from the second series as being incoherent in relation to the 24 sequences coming from the first series. It is thus the errors in identification of the first series which are studied. The sequences composed are of various types, rhythmically and polyphonically. The subjects are professional musicians distributed into two groups: one is composed of composers and instrumentalists trained in serial techniques, the other of music teachers who have only received very general instruction in this area. First the subjects are played each series twice; then, for each of the 28 sequences, they must determine whether it belongs to the first or the second series.

The results are well known—as they caused quite a stir at the time—and I will summarise them in the following way: identification errors are very numerous, more than 50% in the polyphonic sequences and in the order of 30 to 40% in the purely melodic sequences. Francès concludes from this that “serial unity is of a conceptual rather than a perceptual order: impeded by the melodic movement, the rhythm, the harmonic organisation of notes, it survives audition with great difficulty” (1958, p. 144). This experiment, however, calls for some comments.
First of all, it rests on the implicit idea of a mental reduction of the musical surface by the listener, a reduction by which he may grasp, over and above immediate differences, the structural traits common to the sequences proceeding from the same series and the original series. The results appear to indicate that this reduction was impossible, hence the numerous identification errors. But
one could ask whether this impossibility results from the structure of the sequences—or rather from the lack of any perceptible underlying reduced structure—or whether the question posed was really appropriate for bringing to light an adapted reduction procedure. In other words, and this is my second point, it is not certain that the series is anything but a written process, a formal combination employed by the composer in order to create new forms where the relationships between the notes are organised according to very different parameters which have nothing to do with the series itself. We should remember that Berg, however precise and systematic the compositional procedures he used, wanted all of that to be forgotten by the listener during listening. What I would like to suggest here is that the similarities or differences between the responses are not a result of a common underlying functional structure—the series—but that the compositional play lies in the melodic, rhythmic and dynamic features, without reference to the stabilities or instabilities defined by scalar hierarchies. Atonal musical structure rests on other polarities, not situated at the level of the series itself, which hence cannot serve as a prototype or frame of reference in perception and memory. Moreover, as Francès notes, the series employed are not very different; but if we can suppose that, had they been entirely different, they would have been more easily identifiable, it is no doubt because they would have encompassed the global melodic contours, themselves more distinguishable. A comment of a similar nature could be made in relation to the preliminary experiment: the difficulty in detecting the changed note in the reproduction of the original melody is also related to the fact that, in atonal melodies as in tonal melodies, the modification depends on a very small interval (a semitone or a tone). In a strong hierarchical organisation, as is the case for the tonal system, the size of the interval is only a secondary cue, as it is the tonal function of the modified note which is determinant. In a weak hierarchical organisation, where the functions of the notes are equivalent, the size of the interval becomes a pertinent cue in the organisation of the musical surface. If the modification had systematically played on the size of the interval, changing the melodic feature, auditory detection might perhaps have been easier. Consequently, we note that Francès’ experiment rests on the hypothesis that, in order to identify the series to which the sequences belong, the listener makes a mental reduction of the extended sequences to bring them back to a schema of the initial series: the unifying principle. But the psychological reality of this mental operation is not brought to light, and the hypothesis itself seems ill-adapted to atonal music.
The perceptual organisation of tonal phrases. Extension to atonal phrases
In 1983, Lerdahl and Jackendoff put the final touches to what they themselves considered to be a model of the experienced listener, under the name of the Generative Theory of Tonal Music (GTTM). Recently, Lerdahl (1989) has made some proposals for generalising this model to atonal music. I would like to examine these proposals and show how certain questions posed by Francès’ experiment can be resolved.
a) Firstly it should be remembered that GTTM is a model of the comprehension of tonal music at the final stages of the cognitive processing implied by the activity of listening and auditory analysis. From this perspective, GTTM does not take into account the cognitive processes which occur during listening, in the real time of actual listening. More exactly, it is necessary to theorise about the final stage of comprehension in order later to be able to consider the processes in real time. At first GTTM was restricted to tonal music, and this no doubt explains the importance attributed from the outset to the notion of functional hierarchy: the hypothesis is indeed that the listener, in order to understand and remember a phrase of tonal music—memory being indissociable from comprehension—tries to locate the most important elements of the structure whilst reducing the musical surface (the concrete form in all its detail) to a highly organised, economical schema. The idea is then that the listener carries out mental operations of simplification which allow him not only to understand the complexity of the surface, but also to reconstruct this complexity from the simplified schema, and indeed to produce other musical surfaces, other phrases of the same type, by the reactivation of this structure. We can see then that GTTM, as a model of the listener, rests on the hypothesis that there is a correspondence between the hierarchical structure of tonal music and the mental procedures by which the listener seeks to extract the fundamental underlying structure through the perceptual variety of the surface. The hypothesis is thus at the same time musical—based on tonal theory itself—and psychological, as the authors of GTTM infer that the strategy of the listener corresponds to this structural organisation. On the cognitive level the hypothesis no doubt also implies that all perception, all comprehension and indeed all memory of the relevant phenomena are carried out by processes which are themselves hierarchical. At bottom it is a return to one of the principles of Gestalt theory, according to which there is no spontaneous perception without a minimum of stable organisation, but perhaps here in an even more constraining way. The hierarchy concept can be explained more precisely in the following way: it is an organisation in which elements can subsume or contain other elements. These embeddings determine the levels, such that none of the elements of a given level can subsume or contain elements from the same level. The relation of “subsumption” or inclusion is recurrent from level to level (Lerdahl and Jackendoff, 1983 (a), Ital. trans. 1987).
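To make the subsumption relation concrete, a minimal sketch follows; it is not drawn from GTTM itself, and the class and label names are purely illustrative. It represents a grouping hierarchy in which every element subsumes only elements of strictly lower levels, so that no element contains another element of its own level.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Group:
    """A grouping unit (motif, phrase, section ...) that may contain smaller groups."""
    label: str
    children: List["Group"] = field(default_factory=list)

def level(g: Group) -> int:
    """The level of a group is given by how many layers of subsumed groups lie beneath it."""
    return 1 + max((level(c) for c in g.children), default=0)

# A toy piece: a section subsumes two phrases, each of which subsumes motifs.
piece = Group("section A", [
    Group("phrase 1", [Group("motif a"), Group("motif b")]),
    Group("phrase 2", [Group("motif a'"), Group("motif c")]),
])

# Because inclusion is recurrent from level to level, a group always has a
# strictly higher level than any of the groups it contains.
assert all(level(piece) > level(child) for child in piece.children)
```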
The theory proposes analysing the whole tonal piece in terms of four hierarchical structures: grouping structure, which reflects a hierarchical segmentation of the piece into units determined by the listener and of variable size (motif, phrase, section…), each smaller unit of a given level being included in a larger unit of the level immediately above; the metrical structure, which is a hierarchy of rhythmic and tonal accents, of which the smallest unit is the alternation of strong and weak beats; time-span reduction, which establishes the relative weight of pitch events within the rhythmic and grouping structures: it is no longer a case of direct segmentation by the listener, but rather of a reduction of the apparent complexity of the rhythmic units or pitch groupings to their essential schema, the underlying structure of the segmentation; and finally prolongational reduction, which establishes the succession or progression of tensions and relaxations within the time spans and between them. It is an underlying structure encompassing the whole of the time-span reduction,
one which corresponds to the most abstract and most fundamental organisation of the musical piece, and which could perhaps be called (although Lerdahl and Jackendoff do not employ these terms) the generative structure of the piece, that which most resembles Schenker’s idea of the “Ursatz”. Hence we can see that these four types of hierarchy do not have the same status in GTTM: the first two come from a deep analytical description of what is heard, that is to say of the perceived musical object (and the analysis procedures remain very close to those used in semiology, as they are presented by Nattiez (1987)); the final two, on the other hand, are rather mental operations carried out on that which is perceived, procedures for processing the musical surface whose main concern is the economical reduction of the information and its storage in memory. b) It must be understood that, from an epistemological point of view, there is in GTTM a certain equivalence between the actual or supposed structure of the musical piece and the psychological requirement for hierarchical structure. This is, as we have seen, exactly what poses a problem in the case of atonal music. But we have to go a little further in our reflection on GTTM in order to understand what very general principles can be extracted from it, outside of the specific case of tonal music. The beginning of a psychological verification of the hypotheses of Lerdahl and Jackendoff has just been produced in my laboratory by E. Bigand (1990). In a whole series of experiments, this researcher put into practice the idea that the operations of reduction of musical surfaces are a psychological reality without which it would be impossible to explain certain behaviours of listeners. Out of the 7 experiments conducted, I will refer to three which make explicit and operationalise the concept of reduction. The first experiment investigates the recognition of melodies subjected to two types of deformation: the first type consists of strongly modifying the rhythmic structure whilst preserving all the notes of the original melody. This process masks the underlying harmonic structure, since the strong rhythmic cues no longer coincide with the main points of harmonic articulation. The second type of deformation consists of simplifying the melodic and rhythmic surface, but preserving the essential articulations from a harmonic point of view (global contour unchanged and underlying structure apparent). It turns out that the recognition of the original model (a popular melody very familiar to the subjects) is much more difficult in the first case than in the second, the perceived underlying structure allowing subjects to mentally reconstruct a simplified version of the original surface. The second experiment pertains to the recognition of melodic sequences as variations on an original theme. Seven variations corresponding to different levels of abstraction of the theme were constructed from a well-known popular melody (Au clair de la lune), along with five other melodies with the same rhythmico-melodic contours but not corresponding to the same underlying time-span structure. The subjects, musicians and non-musicians, have to indicate whether each melody they hear is a variation of the theme or not, and evaluate the degree of certainty of their response.
The results, whilst quite rich and complex, clearly show that the abstraction of the underlying structure functions in all subjects, musicians or non-musicians, as a principal criterion for the recognition of a theme through its variations. Hence the experiment supports the hypothesis that time-span reduction is not only an analysis procedure employed by subjects possessing musical knowledge and habits, but that it corresponds to a group of very general cognitive procedures which extend beyond the framework of music perception. I
will return more precisely to this aspect, which puts us on the path towards a reply to our problem concerning atonal music. The last experiment that I will put forward is even more radical, and resembles in nature the experiment of Francès on the identification of series in serial sequences; consequently, it will be accorded a little more space. Four melodies with the same underlying structure, but with different rhythmico-melodic organisations, are constructed. These melodies are presented twice to both musicians and non-musicians; then they are mixed with four other melodies having the same surface structures as the previous ones, but a different underlying structure. The underlying structures of these four new melodies are themselves of two sorts, pitch structure and function structure, which correspond to the two types of reduction defined by GTTM: in fact, time-span reduction consists of locating the same notes in the same strategic positions of one surface structure as of another, despite the differences; prolongational reduction consists of identifying the stability or instability of certain functions throughout the length of the compared surface structures. In the experiment the subjects must identify whether or not the melodies they hear belong to the first family, homogeneous by virtue of its underlying structure. The experimental instructions speak specifically of a family link, which corresponds to the idea of the unit to be recognised in Francès’ experiment. The results are very clear, even though the experiment appears to be very difficult for all subjects: “The listener hears the melodies of a family as being variations of an underlying pattern which is structurally more important than the little differences appearing at the level of the musical surface. In other words, the syntactic aspect of musical organisation takes precedence over the featural aspect” (Bigand 1990, p. 368). However, it should be added that the pertinence of the underlying prolongational organisation is revealed only in so far as there is a coincidence between it and the grouping structures. This rigorous parallelism is a condition of the results obtained here: in this case, there is indeed abstraction of a family link between melodies which have neither the same contour, nor the same notes, nor the same rhythm. And what is most remarkable is that the non-musicians come to the same abstraction in a way which is not very different from the musicians: the latter do not appear to use their knowledge of harmony, grasping the family link in an intuitive way without being able to analyse it explicitly. c) The perception, comprehension and memory of tonal music therefore undergo cognitive processes of hierarchicalisation and reduction of musical surfaces, themselves conforming to rules of syntax. In Francès’ experiment one would be right in saying that it is the absence of these same hierarchies which leads to the impossibility of retrieving the initial series as an organisational underlying structure. But one could also ask how the listener actually organises the sequences which are presented to him; in particular, we could suppose that it is a question of discovering the cognitive rules which apply to the grouping structures and metrical structures which are presented during listening. In other words, it is probably not the series which constitutes the underlying structure of Francès’ experimental sequences; rather, in the absence of a real grammaticality, the subjects are led to define their own listening strategies.
In some recent work, Lerdahl (1989) has made some theoretical proposals concerning atonal music, based on analyses of pieces by Schoenberg.
The main difficulty is that GTTM is based on the fact that one can define the stability conditions of certain groups compared with others, melodically as well as harmonically. Atonal music does not possess such conditions. But there appear, amongst the preference rules, alongside the rules concerning stability, rules concerning salience. An auditory event is salient when it is in a strong metrical position, in an extreme register, or when it takes on a particular formal significance (thematic, for example). As atonal music does not have any conditions of stability, listeners, according to Lerdahl, organise the musical surfaces according to the relative salience of events. Alongside metrical position or register, salience in atonal music comes from timbre, from attack, and from all sorts of parallelisms which can come to light during analysis. Salience is a major perceptual phenomenon in atonal music, whilst it plays a minor role in tonal music, as a result of the existence of hierarchical conditions of stability. Lerdahl proposes, then, that the prolongation structures in atonal music are hierarchical structures of salience: at the level of the surface, it is the auditory events which immediately attract the attention of the listener; at the most abstract levels, it is motivic relations and structural parallelisms. If we want to describe briefly and intuitively the principle of the analysis proposed by Lerdahl in the examples presented, salience is most frequently the “obsessive” repetition of a sonority (chord) (op. 19, nos. 2 and 6; op. 11, no. 1). From this auditory event, the whole of the musical surface is born for the listener. We see that the salient element plays the role of the stable tonal element in tonal music, and leads to what we might then call prolongation by iterations. We can now ask whether the experimental sequences proposed by Francès, despite the fact that they are constructed from two distinct series, actually present prolongation structures of salience which are very similar to one another. The hypothesis at least merits being examined: in particular, the two examples in harmonic presentation which Francès himself gives (1958, p. 148) do seem to correspond to the same phenomenon as that which forms the basis of Lerdahl’s analysis of Schoenberg op. 19, nos. 2 and 6. Despite the fact that they belong to two distinct series, both of Francès’ sequences in fact rest on a major 7th sonority repeated in a strong metrical position, and both end on an augmented 4th: how can we help thinking, in this case, that it is these two salient events which lead the subjects to judge that there is a family link, a link which, for the subjects, has nothing to do with the constituent series? d) This leads me to formulate two points. The first is already clearly proposed by Lerdahl, whom I quote: “The crux of the theory outlined above is the decision to regard contextual salience in atonal music as analogous to stability in tonal music. This step amounts to an acknowledgement that atonal music is not very grammatical. I think this is an accurate conclusion. Listeners to atonal music do not have at their disposal a consistent, psychologically relevant set of principles by which to organise pitches at the musical surface. As a result, they grab on to what they can: relative salience becomes structurally important, and within that framework the best linear connections are made. Schoenberg had reason to invent a new system” (1989, p. 84 in English version).
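Purely as an illustration of this substitution of salience for stability, the selection of the most salient event of a span can be caricatured as follows; the cues are those listed above, but the field names, weights and scoring function are invented for the example and carry no theoretical authority.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    pitch: int               # MIDI note number, used to gauge extremeness of register
    metric_strength: float   # 0.0-1.0, higher on strong metrical positions
    attack: float            # 0.0-1.0, marked attacks and timbres stand out
    is_thematic: bool        # carries a particular formal (e.g. motivic) significance

def salience(e: Event, register_centre: int = 60) -> float:
    """Toy combination of the salience cues; the weights are arbitrary."""
    register_extremeness = abs(e.pitch - register_centre) / 24.0
    return (2.0 * e.metric_strength
            + 1.5 * register_extremeness
            + 1.0 * e.attack
            + (1.0 if e.is_thematic else 0.0))

def head_of_span(span: List[Event]) -> Event:
    """Without tonal stability conditions, the most salient event heads the span
    and prolongational structure is organised around such heads."""
    return max(span, key=salience)
```

Under any such weighting, an insistently repeated sonority in a strong metrical position dominates its span, which is the intuition behind the “obsessive” repetitions mentioned above.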
We are going to rediscover this problem in a moment with respect to the perception of whole atonal pieces, compared to the perception of tonal pieces. Relative salience determines a provisional hierarchy which is always modifiable from hearing to hearing; and it is without doubt one of the characteristics of atonal music, and more specifically of serial music, which is extremely fluid for the hearer, to have no definitive structure.
The second point nevertheless concerns a certain ambiguity in the status of the notion of salience in the application of GTTM to atonal music: Lerdahl and Jackendoff insist on the fact that their model concerns the “final phase” of the cognitive processing of music, and that in a certain sense it is totally abstract in comparison with actual listening. In this respect they quite rightly distinguish between the theoretically possible and ideal procedures which the reductions are, and the procedures which occur in real time during listening. It is clear that if we want to construct a true model of the listener, we also have to work in real time. But as far as atonal music is concerned, this is even more indispensable, to the extent that we cannot refer to a pre-existing syntactic system, to a general structure underlying all contemporary styles. The only possible route is that which we have embarked on in my laboratory with regard to the tonal musical phrase, but also concerning the auditory structuring of tonal and atonal pieces (Imberty, 1981, 1985, 1987).
The perception of tonal and atonal pieces
Paradoxically, the experiments and the theoretical elaborations concerning the mental operations of listening in real time are more numerous for whole pieces of a certain length than for short phrases. No doubt this is because the problems posed by taking a complete work into account in a model, and by the verification of all the hypotheses about the underlying cognitive processes, are too complex to be tackled in their entirety. Conversely, the experimental—and more empirical—work that I have carried out myself, and that which Irène Deliège (1989, 1990) has presented, reveals a certain number of facts which may provide some answers to the questions which atonal music and its agrammaticality raise. a) I. Deliège has recently presented two experimental concepts which may illuminate our knowledge of the perception of atonal music: the extraction of cues at the musical surface, and the imprint. In a series of experiments on Sequenza VI by Berio and Éclat by Boulez, she shows that the listeners construct, during successive hearings of works which they do not know, and for which cues for tonal structures do not operate, a simplified schema of what they hear in the form of an imprint stored in memory, where the details are laid down in a prototype, unique with respect to the multiple variations across successive hearings. This imprint—which is in some respects the image of the piece which the listener retains—is progressively elaborated from cues recognised on the musical surface. At first these cues are anything which can attract the attention of the listener, and in a sense anything which elevates the salience of certain events in comparison to others. As the work proceeds, and then as the repetitions proceed, certain cues are abandoned and others are reinforced, which define the pitch groupings and the rhythmic groupings, and links are established between them. There again, the most stable cues, which are repeated from group to group, either adjacently or remotely, allow a larger structure which goes beyond the succession of groupings to be constructed. In the piece by Berio, there are numerous invariant elements which can give rise to cue extraction. All the groups are characterised by a specific cue which reinforces the caesura between them, in that it is starting from these cues that the listeners identify the groups and hence differentiate them from one another. In the work by Boulez, the
invariants are a lot less numerous and salient, by virtue of a style of writing which highlights the sonorous mutations. I. Deliège notes, however, that even in this case the subjects used some sonorous colours as cues, with great consistency. And it is these sonorous colours which, from hearing to hearing, help in the recognition of other events with structural value. Cue extraction is thus a very general cognitive step, which progressively generates an imprint in memory. This is confirmed by the experiment which concludes I. Deliège’s paper: when the listeners have finished the auditory analysis of the piece, they are given the whole to listen to again, with the limits of the five or six large sections which they have themselves recognised indicated to them. Then they are given a whole series of small fragments to listen to in random order, and they must indicate to which section each belongs. The results show that the cues extracted during analytical listening constitute the framework of the imprint and lead to a precise recovery of the information; it is they which allow the identification and classification of the structures, and their hierarchicalisation. The salient cues thus function well, on a cognitive level, as prolongational elements. b) It is in this spirit that I myself broached the question of the role of perceptual and cognitive hierarchies in listening to works. In conclusion, therefore, I will develop a series of hypotheses based around three concepts: macrostructure, dynamic vectors, and the perceptual hierarchy of changes and events, concepts which are in fact linked with the concepts of imprint, salience and cue extraction. First of all, this signifies that a musical work, tonal or not, is, from the perceptual point of view, a hierarchy of changes, contrasts and breaks perceived during listening. Every work listened to is segmented into units or groups of varying length, relying on the perception of qualitative changes, or of events of varying significance in the musical temporal flux. The perceived structure results, then, from this segmentation executed in real time, and from the relationships induced between the groups, which define a perceptuo-cognitive hierarchy of what Lerdahl and Jackendoff call time spans. This hierarchy is a hierarchy of salience. But the concept of macrostructure is more complex and more dynamic than that of the imprint. As I have shown in my previous work on Brahms and Debussy (1981 and 1985) and expanded in my work on Berio (1987), the auditory segmentation of a musical piece, and memory for it, depend, quite apart from the structures of salience, on reference models, encoded and stored in memory, which have been learnt or form part of the general knowledge of the subject, and which favour certain strategies during listening. In fact, each one of us, in our culture, has, a priori, a certain idea of what a piece of music, a popular song, etc. is, which is a mental representation of the moments, episodes and parts which make up that which he is going to hear. Without going as far as to join C. Lévi-Strauss (1972) when he affirms that all music is structured like an “adventure” (problem/tension, crisis, outcome/resolution), one might think that for the amateur, a priori, a musical work is composed of something like an exposition, or idea A, a development, or second idea B, and a recapitulation, or return to A.
Such a structure, maintained by three centuries of tonal music, rests on a very general principle of the alternation of tensions and relaxations, of moments or periods of instability and their resolution in a new phase of stability, a principle which may also govern the organisation of the imaginary time of narrative.
Thus, the macrostructure is first of all a schema of the structuring of time, an a priori ordering of sonorous events in time, according to rules stemming from perceptuo-cognitive mechanisms which allow the detection of changes and salient elements in the sonorous flux. In sum, the macrostructure of a musical piece is made up of the perception and retention in memory of some particularly significant changes, which determine the overall progression of the piece for the listener. This macrostructure is in part constituted of formal stereotypes coming from the musical culture of the subject, and I have demonstrated their role in relation to the comparative perceptual analysis of pieces by Debussy and Brahms (Imberty, 1981, 1985). Subjects were asked to listen to each piece, then to say how many parts they had heard and to describe them briefly. In this case, while for the piece by Brahms the ABA stereotype on which it had been constructed was easily identified, for the piece by Debussy the subjects who were not musicians imposed the same stereotype on it, ignoring certain dynamic aspects of the structure: they thus simplified in the extreme, by bringing a structure infinitely more complex than that of the piece by Brahms into an identical schema which they knew well. c) But beyond stereotypes of a cultural origin, dynamic elements which reveal general processes of the organisation of time come to light. It is these which we can look for as the principal (prolongational) unifiers in the case of pieces of atonal music, since the cultural stereotypes learnt under the influence of tonal hierarchy cannot serve as cues. We can then hypothesise that the macrostructure consists of what we can call dynamic vectors, which determine the progression of the whole. By dynamic vectors, I mean the musical elements which convey the temporal significance of orientation, progression, diminution or growth, of repetition or reversal. They are thus elements assimilated in the psychological experience of time, whose function is to relate the present moment with its near past and immediate future in the [coordinates], or to relate the furthest past with the most remote future. In short, these dynamic vectors, distinct from syntactic or thematic elements, are the elements of salience placed in the order of the progression of the piece and playing on parallelisms, i.e. on the paradigmatic equivalences which define the repetition of structures at a longer or shorter distance in time, a repetition which is itself more or less varied. In pieces where the syntactic structure is weakly hierarchicalised, reductions and prolongations are founded psychologically on these equivalences: “Briefly, if two events are heard as connecting prolongationally (at the musical surface or at an underlying level), and if the second event is heard as less stable than the first, the overall progression is felt as tensing. If the second event is more stable than the first the progression is felt as relaxing” (Lerdahl and Jackendoff 1983 (a), Ital. trans. 1987; pp. 242–244 in the English version). I do indeed say weak syntactic structure, which, as we have seen, is the case for atonal music, but beyond this, for all music which has attempted, or which attempts, to free itself from prolongational tonal structures and organisations.
I have shown, for example, that in Debussy, in the two Préludes analysed, La Puerta del Vino and La Cathédrale engloutie, which are nevertheless very different, this character of weak hierarchicalisation of the macrostructure is very evident in comparison with the macrostructure of an intermezzo by Brahms: the multiplicity of caesuras, the short duration of the groups and their number at the musical surface, but equally their high number at the higher levels of the hierarchical tree, all testify to it, whereas the macrostructure of the piece by Brahms is practically reduced to its stereotypical tonal ABA (Imberty 1981 & 1987).
When professional musicians, who do not, however, know the pieces, are asked to point out, to identify and to describe briefly the salient elements of the perceptual structure which they themselves have just constructed, we are struck by the fact that in Debussy, elements with a thematic or harmonic function are practically never taken into consideration, but isolated notes, isolated chords perceived as sound colours, contrasts, augmentations and diminutions of expressivity or of density of writing, and the ostinato rhythm of La Puerta nearly always are. These elements in fact constitute the basis of cues which “prolong” from one group to the other and orient the tensions and progressions of the groups of the pieces analysed, allowing the perceptual anticipations on which the parallelisms of salience are constructed. I agree with I. Deliège (1987) in saying that, musically, these elements can be anything: only their contextual position, their individual differentiation and their repetition in time are determinant. But it is clear that they allow the listener to retrieve a temporal, and not a fixed, image of the musical piece which he is remembering.
Conclusion: Berio’s Sequenza III
I will end with a brief examination of the perception and comprehension of a complete atonal work, moreover one coming from within the framework of analysis centred on pitch groupings: L. Berio’s Sequenza III. a) The work can be presented in the following way for the reader who may be unfamiliar with it. Straight away, it introduces an alternation of two types of sound material: a group of “sung” sequences, where the voice lays out sounds of fixed pitch and sketches codified intervals in the system of the normal musical scale; and a group of “noise” sequences, where the voice uses all possible noises made with the tongue, the mouth, the fingers, whispered speaking or breathing. However, this alternation is neither regular nor marked. In particular, certain “sung” sequences are embellished with “noises” (warbles, laughs, whispers) without these noises being organised into an autonomous sequence. The problem of segmentation—and of the make-up of the macrostructure—resides in the way in which the perceptual differentiation of the two types of material will be used by the listener. Hence I presented Sequenza III to a group of 24 subjects, musicians who did not know the work, and I asked them to segment it during listening. The experiment proceeded in the following way: after hearing the experimental instructions, the subjects listened to the work a first time in order to familiarise themselves with it. During the first experimental listening, they had to indicate, using the chronometric method (Imberty 1981), the changes, contrasts, breaks or any sort of sound event which was salient for them. A second experimental listening allowed them to begin the segmentation again, the instructions this time indicating that the subjects should mark all the changes they perceived, the most subtle as well as the most marked. b) The results of the first hearing bring out 4 sections: the first two sections are well-defined (p. 1 to stave III, 3rd subdivision, before wistful) and (1, III, 3 to 2, III, 2); the third and fourth sections are less well defined in relation to each other (2, III, 2 to 3, I, 2) and (3, I, 2 to the end). Furthermore, two subdivisions appear, at (1, IV, 3) by a descending whispered grouplet and a clicking of the tongue which divides the long “melodic”
sequence, and at the end (3, II, 2), by the beginning of a relatively short “noise” sequence. The first part appears to be a sort of heterogeneous introduction from which the song, still barely recognisable, slowly emerges; it is mainly made from tongue noises, voice noises and isolated sounds. Then comes the first long sung sequence: this is the second part, which gives way to a new eruption of shouts, gasps and whisperings, in brief, everything which arises directly from bodily expression. This third section leads in an indecisive manner to the final section: the delimitation only corresponds in part to what the score suggests, if we retain the principle of segmentation by contrast between the two vocal materials used by Berio. In fact, this time we only pass progressively from “noises” to song, and the two materials appear to be very closely interwoven here. c) The second hearing only partly confirms this breakdown: the perceptual recenterings are important. In particular, the segmentation at the end of the first section is less well perceived, but a new break appears (1, IV, before faintly). It is thus easy to recognise the short ascending whispered motif and the tongue clicking which we have already noted as the subdivision of the second part. Hence it is here that all the theoretical suggestions about salience and parallelisms as bases of prolongational reduction in atonal music are made concrete: this short motif plays a decisive role in the organisation of the macrostructure from the second hearing onwards. In fact, it can be noted that after the break between the first and the second sections, a new segmentation appears at (1, IV, 3), then at (2, I, 1), (2, II, 1 reversed), (2, II, 3). From then on it punctuates the long sung section, and it becomes the basis of the structuring of the third section, where it is easily identified even in sound material of the same nature (“noise” sequence): hence at (2, III, 4), (2, IV, 2, twice in succession), (2, IV, 3). Its role as a cue for order, or in Lerdahl’s terms for prolongation, becomes explicit when it separates, while at the same time articulating, the 3rd and 4th sections at (3, I, 2), even though it is slowed down. It is found three more times in the final section, where its structuring role is clear for the subjects. It is interesting that the importance of this motif only comes out during the second hearing, and that it goes together with an accentuated hierarchicalisation of the macrostructure, itself reinforced as the listener moves forward in listening to the piece (sections 3 and 4 being significantly more hierarchicalised than the first two, as the detailed results presented in Imberty 1987 show). We can thus say that this element is like a cue which allows a perceptual anticipation tending to homogenise musical time, to weave long-term relationships between rhythmic, timbral, material and pitch groups. d) Hence, in atonal music, the macrostructure appears to be made up of the hierarchicalisation of saliences and tensions. We can thus distinguish, as do Bharucha (1984), Deutsch (1984) and Lerdahl himself (1989), between event hierarchies and tonal hierarchies: “An event hierarchy is part of the structure that listeners infer from temporal music sequences” (cognitive operations in real time); “a tonal hierarchy is a non-temporal mental schema that listeners utilize in assigning event hierarchies” (Lerdahl 1989).
The macrostructure corresponds to a combination of two types of hierarchy which I myself have distinguished in terms of order structure and order relation structure. At least at the level of cognitive functioning, the schemes for temporal organisation make up two distinct structural systems, corresponding to these structures. In tonal music the order relation structures correspond for the most part to the knowledge of the subject, and the stereotypes of the macrostructure make up the base of the network of prolongations. In
atonal music, the order structures (event hierarchies) are primary, although the cues or the dynamic vectors of the macrostructure establish remote links between rhythmic groups or groups of pitches. In this case the iterations, the parallelisms and the induced temporal sections make up the non-syntactic prolongational links in which salience takes precedence over stability. Perhaps implicative models of the type proposed by Narmour or Meyer would suit quite well here, but they do not yet seem able to broach this type of question. In any case, it is clear that the perception and comprehension of atonal music remain more uncertain, or, if you prefer, more open, than the perception and comprehension of tonal music. The role of the listener, with his past, his culture, his knowledge, is more important in this case. After all, it would not be so bad to have been able to bring out something more essential to atonal music than its eminently formalist and combinatorial character: the part left to the creative imagination of the listener in the psychological elaboration of the work.
References
Bharucha, J.J. (1984) Event hierarchies, tonal hierarchies and assimilation: A reply to Deutsch and Dowling. Journal of Experimental Psychology: General, 113(3), 421–425.
Bigand, E. (1990) Perception et compréhension des phrases musicales. Thèse de Doctorat en Psychologie, Université de Paris X-Nanterre.
Deliège, I. (1987) Le parallélisme, support d’une analyse auditive de la musique: vers un modèle des parcours cognitifs de l’information musicale. Application au Syrinx de Debussy. Analyse Musicale, 6, 73–79.
Deliège, I. (1989) A perceptual approach to contemporary musical forms. In S. McAdams & I. Deliège (Eds.), Music and the cognitive sciences. Contemporary Music Review, vol. 4, pp. 213–231.
Deliège, I. & El Ahmadi, A. (1990) Mechanisms of cue extraction in musical groupings. Psychology of Music, 18(1), 18–44.
Deutsch, D. (1984) Two issues concerning tonal hierarchies: Comment on Castellano, Bharucha and Krumhansl. Journal of Experimental Psychology: General, 113(3), 413–416.
Francès, R. (1958) La perception de la musique. Paris: Vrin (rééd. 1972).
Imberty, M. (1981) Les écritures du temps. Paris: Dunod.
Imberty, M. (1985) La Cathédrale engloutie de Claude Debussy: de la perception au sens. Revue de Musique des Universités Canadiennes, 6, 90–160.
Imberty, M. (1987) De la perception du temps musical à sa signification psychologique: à propos de la Cathédrale engloutie de Debussy. Analyse Musicale, 6, 28–37.
Imberty, M. (1987) L’occhio e l’orecchio: Sequenza III di Berio. In L. Marconi & G. Stefani (Eds.), Il senso in musica. Bologna: CLUEB, 163–186.
Lerdahl, F. (1989) Structure de prolongation dans l’atonalité. In S. McAdams & I. Deliège (Eds.), La musique et les sciences cognitives. Bruxelles: Mardaga, 103–135.
Lerdahl, F., Jackendoff, R. (1983a) An overview of hierarchical structure in music. Music Perception, 1(2), 229–247. (Italian translation: Grammatica generativa e analisi, in L. Marconi & G. Stefani (Eds.), Il senso in musica, Bologna: CLUEB, 197–220.)
Lerdahl, F., Jackendoff, R. (1983b) A generative theory of tonal music. Cambridge, MA: MIT Press.
Meyer, L.B. (1973) Explaining music. Berkeley: University of California Press.
Narmour, E. (1989) Le “code génétique” de la mélodie: structures cognitives engendrées par le modèle de l’implication-réalisation. In S. McAdams & I. Deliège (Eds.), La musique et les sciences cognitives. Bruxelles: Mardaga, 75–101.
Nattiez, J.-J. (1987) Musicologie générale et sémiologie. Paris: Bourgois.
Name index
Contemporary Music Review, 1993, Vol. 9, Parts 1 & 2, pp. 339–349
Photocopying permitted by license only
© 1993 Harwood Academic Publishers GmbH Printed in Malaysia
Page numbers in bold italic type denote references
Ackroff, J.M. 152, 158, 161 Albersnagel, F.A. 164, 177 Allen, J.F. 229, 232, 235n, 236n, 236 Allerhand, M. 4 Anderson, G.B. 164, 177 Anderson, J.R. 164, 176, 177, 223, 236 ANSI 116, 121, 223, 236 Apel, W. 116, 121 Aristotle 164, 311–12, 318, 323 Arom, S. 4, 8, 12, 15, 19 Askenfelt, A. 274 Atlan, H. 276, 282, 283 Auxiette, C. 189 Bach, J.S. 10. 97, 100, 216, 217, 315 Bachem, A. 83, 94, 95 Baddeley, A.D. 105, 107, 109 Badertscher, B. 47, 49 Baily, J. 217, 218 Baker, M. 257, 265 Baker-Short, C. 210, 218 Balaban, M. 223, 229, 236 Balzano, G.J. 116, 121 Barrière, J-B. 14(fn) Barthes, R. 306, 310 Baumann, S.B. 110 Baynes, K. 120, 121 Beckett, C. 108, 111 Beethoven, L. van 318 Benade, A.H. 22, 32 Bereiter, C. 312, 313, 323 Berg, A. 326 Berio, L. 191, 192, 196, 326, 324–4 Berlioz, H. 51, 66, 217 Bever, T.G. 98, 109, 152, 154, 161 Bharucha, J.J. 29, 32, 43, 49, 105, 111, 124, 136, 228, 232, 236, 239, 253, 336, 336
Bigand, E. 5, 124, 136, 329, 330, 336 Birtwistle, H. 149 Bismarck, G. 54, 66 Bizzi, G. 278, 283 Black, A. 149, 149 Blank, M.A. 152, 161 Boden, M.A. 222, 236 Bogen, J.E. 116, 121 Boltz, M. 177, 177 Botte, M.C. 5, 184, 189 Boulanger, R. 241, 254 Boulez, P. 191, 192, 193, 332 Bower, G.H. 164, 176, 177 Bradshaw, J.L. 115, 121 Brahms, J. 333, 334 Bregman, A.S. 227, 234, 236, 237, 256, 265 Breuker, J. 324 Britton, B.K. 312, 324 Brooks, V. 176, 177 Brugge, J.F. 103, 110 Bruner, J. 2, 3, 5 Bryden, M.P. 98, 109 Burns, E.M. 35, 49 Burridge, R. 22, 32 Butler, D. 47, 49 Buyssens, E. 305, 310 Byrd, D. 223, 236 Campion, T. 140 Carrol, N. 163, 164, 165, 166, 177 Carson, R. 99, 110 Carterette, C. 180, 189 Carterette, E.C. 22, 23, 32, 33, 52, 56, 58, 60, 63, 66 Casseday, J. 110 Castellano, M.A. 29, 32 Cattelani, R. 98, 110 Cesarec, Z. 166, 177 Chafe, C. 241, 254 Chailley, J. 12 Chauvel, P. 97, 110 Chavis, D. 106, 109 Chiarello, R. 98, 109 Chomsky, N. 139 Chopin, F. 144 Chowning, J. 15, 241, 254 Churchland, P.S. 3, 6 Clarke, E.F. 5, 180, 189, 194, 205, 207, 210, 211, 218, 227, 235n, 236, 240, 246, 254, 262, 265 Clarkson, D. 97, 111 Clynes, M. 207, 208, 218, 267, 269, 274 Cohen, A.J. 5, 164, 176, 177, 177 Cointe, P. 229, 238
Colombo, M. 105, 109 Cook, N. 217, 218 Cooper, F.S. 151, 161 Cooper, G. 257, 262, 265 Copland, A. 163 Cordeau, J.P. 105, 111 Cottle, T.J. 192, 205 Crandall, P. 97, 109 Cross, I. 139, 150, 210, 219 Crowder, R.G. 36, 49 Cuddy, L.L. 47, 49 Cunningham, J.G. 166, 177 Cutting, J.E. 148, 149 D’Amato, M. 109 Damasio, A.R. 115, 121 Damasio, H. 115, 121 Dandrel, L. 15 Danielou, A. 32n Dannenberg, R. 222, 223, 229, 236, 240, 254 Davidson, J. 215, 216, 218 Davies, P. 276, 283 De La Motte, D. 43, 49 de Rijk, K. 241, 254 Debussy, C. 309, 333, 334 Dehoux, V. 12, 19 Delalande, F. 216, 218 Deliège, C. 193, 205 Deliège, I. 5, 124, 136, 191, 195, 196, 202, 203, 205, 332, 334, 336, 337 Denes, G. 166, 177 Desain, P. 5, 224, 227, 229, 235n, 236, 240, 241, 242, 250, 253, 254 Deutsch, D. 35, 49, 102, 106, 109, 124, 136, 227, 236, 336, 337 DeWitt, L.A. 36, 49 Di Scipio, A. 5, 278, 282, 283 Diamond, I. 110 Divenyi, P.L. 97, 109 Doubleday, C. 97, 111 Dowling, W.J. 101, 108, 109, 224, 237 Drake, C. 5, 180, 184, 189 Duffy, F. 99, 109 Dunphy, D. 176n, 177, 177 Duvelle, C. 19 Dvorak, A. 144 Dyer, L.M. 228, 236, 237 Dykes, R. 97, 110 Dyson, M.C. 148, 150 Earhard, B. 164, 177 Eastman, C.M. 315, 324 Ebcioglu, K. 236, 256, 265, 275, 276, 283 Eco, U. 306, 310
Efron, R. 97, 99, 109 Eisenberg, H.M. 110 Eisenstein, S.M. 165–165, 177 Eisler, H. 176, 177 El Ahmadi, A. 205, 337 Elmassian, R. 188, 189 Erickson, R. 223, 226, 237 Essens, P. 124, 136, 224, 237, 257, 265 Evans, C. 153, 161 Feldtkeller, R. 44, 50 Fitch, H.L. 216, 219 Fletcher, H. 44, 48, 49 Flottorp, G. 44, 50 Flowers, L.S. 314, 324 Fodor, J.A. 152, 154, 161, 176, 177, 223, 234, 237, 239, 254 Fogelman-Solie, F. 282 Foss, P.J. 152, 161 Fox, P.T. 103, 110 Fraisse, P. 124, 136, 192, 193, 204, 205, 256, 265 Francès, R. 123, 136, 140, 141, 149, 325, 326, 327, 329, 330, 331 Fraser, G.R.M. 225, 257, 260, 263, 265 Fraser, R. 5 Freeman, P. 149, 149 Friberg, A. 267, 268, 272, 274 Friedman, F. 203, 205 Fryden, L. 267, 274 Furniss, S. 12 Fux, J.J. 277, 283 Gabrielsson, A. 180, 189 Galaburda, A.M. 106, 107, 109 Galambos, R. 188, 189 Galt, R.H. 44, 48, 49 Gardner, H. 166, 177 Garnett, G.E. 236 Garvin, J.J. 148, 149 Gates, A. 115, 121 Gazzaniga, M.S. 116, 121 Geffen, G. 98, 110 Gérard, C. 5, 180, 184, 189 Glynn, S.M. 312, 324 Goldman-Rakic, P.S. 106, 107, 110 Goldstein, J.L. 94, 95 Goodman, N. 222, 236 Gorbman, C. 163, 165, 166, 168, 177 Gordon, H.W. 115, 121 Gordon, W.P. 108, 110 Gould, G. 216 Gourlay, J.S. 223, 237 Gregory, A.H. 155, 161
Gregory, R.L. 152, 153, 161 Grey, J.M. 52, 66, 116, 121, 224, 237 Gross, C. 109 Grout, D.J. 165, 177 Gulliksen, N.A. 146, 149 Guttman, L. 25 Guyau, J.M. 193, 205 Halle, M. 236n Halpern, A.R. 193, 205 Handel, S. 186, 189 Hartley 164 Harwood, D. 224, 237 Hayes, J.R. 314, 324 Heffner, H.E. 104, 110 Heffner, R.S. 104, 110 Heider, F. 171, 177 Helmholtz, H. von 52, 55, 66 Henson, R.A. 116, 121 Herrman, B. 163 Hesse, H.P. 83, 84, 88, 93, 95 Hewitt, C. 226, 237 Hjelmslev, L. 306, 307, 310 Hobbes, T. 164 Hochberg, J. 176, 177 Hofmannsthal H. von 318 Holdsworth, J. 73, 75, 81 Honing, H. 15, 227, 229, 231, 235n, 236, 237, 240, 241, 242, 250, 253, 254, 267, 274 Hoopen, G. ten 188, 189 Hopkins, A. 166, 177 Houtsma, A.J.M. 94, 95 Howat, R. 286, 304 Howell, P. 139, 150, 210, 219 Hume, D. 164 Huron, D. 38, 49, 227, 237 Husserl, E. 192, 193 Huxley, A. 313 Hylkhuysen, G. 188, 189 Imberty, M. 5, 123, 136, 193, 205, 331, 333, 334, 336, 337 Irwin, R.J. 188, 189 Jackendoff, R.A. 2, 6, 123, 124, 135, 137, 139, 149, 194, 205, 210, 219, 231, 237 255, 256, 257, 262, 265, 274, 287, 289, 304, 327, 328, 329, 333, 334, 337 Jairazbhoy, N.A. 23, 32n, 32 Janacek, L. 149 Janet, P. 192, 205 Jaroszewski, A. 49, 50 Johanssen, G. 215, 219 Johnson-Laird, P.N. 140, 149, 149, 269, 274, 223–224, 237, 304 Jones, A.M. 19
Jones, E.C. 106, 110 Jones, K. 139, 149 Kaminska, S. 5 Kantra, S. 177, 177 Kapproaff, J. 22, 32 Kendall, R.A. 4, 52, 56, 58, 60, 63, 66 Kessler, E.J. 38, 46, 47, 49, 124, 137 Kimura, D. 97, 101, 110 Kircher, A. 277, 283 Klein, J.F. 155, 161 Koechlin, C. 277, 283 Koss, B. 97, 109 Kracauer, S. 166, 177 Kronman, U. 217, 219 Krumhansl, C.L. 29, 32, 38, 43, 46, 47, 49, 123, 124, 125, 127, 135, 136, 137, 160, 161, 194, 205, 210, 219, 224, 237 254, 254 Kubik, G. 12 Kuhl, D. 99, 110 Laird, P. 140, 149 Landsman, M. 315, 316, 319, 321, 323, 324 Laske, O. 236, 304, 315, 324 Lassen, N.A. 98, 110 Lauter, J.L. 97, 110 Lee, C.S. 224, 257, 259, 237, 265 Lejeune, H. 192, 193, 205 Lerdahl, F. 2, 6, 63, 66, 123, 124, 135, 137, 139, 148, 149, 194, 205, 210, 219, 231, 237, 255, 256, 257, 262, 265, 274, 287, 289, 304, 318, 324, 327, 328, 329, 330, 331, 333, 334, 336, 337 Levey, A. 5, 176 Levi, D.S. 166, 177 Levi-Strauss, C. 333 Lewis, D. 115, 116, 121 Liberman, A.M. 151, 152, 161 Liegeois-Chauvel, C. 97, 110 Ligeti, G. 141, 149 Lindblom, B. 140, 150 Lipscomb, S.D. 165, 176, 177 Lischka, C. 276, 283 Loban, W. 312, 324 Locke, J. 164 Longden, M. 161 Longuet-Higgins, H.C. 210, 219, 221, 237, 239, 241, 244, 246, 250, 253, 254, 257, 259, 265 Loy, G. 222, 223, 237, 238, 239, 254 Luria, A.R. 103, 110 MacDonald, J. 161 MacKain, K. 167, 178 Marin, O.S.M. 98, 110 Marks, M. 176, 177 Marr, D. 225, 237, 304
Marsden, A. 256, 265 Marshall, S. 171, 177 Matthews, M.V. 228, 237 Mazziotta, J. 99, 107, 110 Mazzuchi, A. 98, 110 McAdams, S. 5, 224, 227, 237, 256, 265 McAnulty, G. 99, 109 McCarthy, R.A. 105, 110, 151, 161 McDermott, D.V. 229, 237 McGurk, H. 152, 161 McKenna, T. 107, 111 McLelland, J.E. 239, 254 Méeus, N. 5 Meredith, D. 5 Merzenich, M.M. 97, 103, 110 Messenger, W.G. 22, 32 Meyer, L.B. 166, 177, 192, 205, 237, 256, 257, 262, 265, 336, 337 Meyer, P. 5 Michon, J.A. 192, 202, 205 Middlebrooks, J. 97, 110 Milner, B. 97, 110, 115, 121 Milroy, R. 4 Minsky, M. 223, 226, 237 Molino, J. 309, 310 Monahan, C.B. 180, 189 Monod, J. 276, 283 Mont-Reynaud, B. 240–241, 254 Moore, B.C.J. 94 Moore, F.R. 223, 237 Morais, J. 98, 99, 107, 110, 125, 137 Morshedi, C. 22, 32 Mounin, G. 305, 310 Murdock, B. Jr. 196, 205 Musiek, F.E. 105, 111 Musiek, F.E. 105, 111 Musolino, A. 97, 110 Nakjima, Y. 188, 189 Narmour, E. 232, 237, 336, 337 Nattiez, J.-J. 305, 306, 307, 309, 310, 328, 337 Neff, W. 103, 104, 105, 107, 110 Newell, A. 223, 237, 315 Newman, E.A. 161 Nielzen, S. 166, 177 Nimmo-Smith, I. 81 Noorden, L.van 83, 84, 93, 95 Norman, D.A. 225, 238 Obusek, C.J. 152, 158, 161 Ogden, C.K. 306, 310 Ohgushi, K. 36, 49, 69, 81
Ojemann, G.A. 152, 161 Okkerman, H. 188, 189 Olney, K.L. 239, 253 Oosten, P.W.J.van 5, 274 Osgood, CE. 171, 177 Oshinsky, J.S. 167, 178 Pagels, H.R. 276, 283 Palmer, C. 123, 125, 137, 163, 164, 165, 177, 180, 189, 210, 219, 254, 254, 274 Pandya, D.N. 106, 109, 110 Papanicolaou, A.C. 110 Papert, S. 276 Parker, D. 124, 137 Parma, M. 98, 110 Parncutt, R. 4, 35–49, 49, 50 Patterson, R.D. 4, 69, 70, 71, 73, 75, 81 Pay, B.E. 161 Peirce, C.S. 305, 306, 307, 310, 310 Pelletier, S. 19 Pepinsky, A. 66n, 66 Peretz, I. 98, 99, 107, 110, 125, 137 Perry, D.W. 4, 99, 100, 102, 108, 110 Petersen, S. 103, 110 Petrides, M. 106, 110 Phelps, M. 99, 110 Piazza, D. 98, 110 Piston, W. 54, 56, 66 Plato, I. 311, 312, 318, 321, 324 Plomp, R. 52, 67, 116, 121 Polansky, L. 257, 265 Pollack, I. 48n, 49 Pope, S.T. 228, 231, 236, 237 Posner, M.I. 103, 110 Povel, D.J. 124, 136, 137, 188, 189, 224, 237, 257, 265 Powell, T.P.S. 106, 110 Prieto, L. 305, 310 Puckette, M. 222, 228, 237 Pylyshyn, Z.W. 176, 177 Quillian, M.R. 223, 237 Raichie, M.E. 103, 110 Raman, C.V. 21, 32 Rasch, R.A. 37, 49 Rasmussen, T. 105, 111 Rawling, L.L. 36, 50 Rikchards, I.A. 306, 310 Richelle, M. 192, 205 Riemann, H. 43, 49, 257, 265 Riggs, M.G. 166, 168, 178 Rimsky-Korsakov, N. 56, 67
Riotte, A. 282, 283
Risset, J.-C. 15, 54, 67, 94, 95, 116, 121
Roads, C. 236, 237
Robinson, K. 4
Rodet, X. 229, 238
Rodman, H. 109
Rogers, R.L. 97, 110
Roland, P.E. 98, 110
Roozendaal, R. 5, 321, 324
Rosenstiel, A. 166, 177
Rosenthal, D. 257, 265
Rozsa, M. 163, 165, 178
Rubin, M. 164, 178
Rumelhart, D.E. 225, 238, 239, 254
Rush, L. 241, 254
Rushton, J. 217, 219
Sadai, Y. 308, 310
Saetveit, J. 115, 121
Sallée, P. 12
Samson, S. 106, 107, 111
Sandell, G. 52, 67
Sanides, F. 107, 109
Sartre, J.-P. 164
Sasaki, T. 158, 161, 188, 189
Saussure, F.de 305, 306, 310
Saydjari, C. 220
Scardamalia, M. 312, 313, 323
Schachter, C. 256, 266
Schachter, S. 99, 109
Scharf, B. 188, 189
Schellenberg, G. 160, 161
Schenker, H. 232, 238, 328
Scherer, K.R. 167, 178
Schermbach, R. 39, 50
Schloss, A. 241, 254
Schoenberg, A. 216, 325, 326, 330, 331
Schottstaedt, W. 229, 238, 275, 283
Schouten, J.F. 116, 121
Schroeder, D. 166, 178
Schubert, F. 318
Schulkind, M. 177
Schulze, H. 246, 254
Schumann, F. 318
Schwarz, J. 14(fn)
Searle, J. 276, 283
Seashore, C.E. 98, 99, 115, 116, 121, 210, 219
Seewann, M. 39, 50
Semenza, C. 166, 177
Serafine, M.L. 124, 125, 137, 227, 238, 318, 324
Shaffer, L.H. 180, 189, 207, 208, 209, 210, 211, 216, 219
Shankweiler, D.P. 151, 161
Shepard, R.N. 25, 32, 36, 47–48, 49, 56, 67, 224, 238
Sherman, G.L. 158, 161
Sidtis, J.J. 97, 111, 115, 116, 121
Silverman, J. 166, 177
Simmel, M. 171, 177
Simon, H. 226, 234, 235n, 238, 315
Skinhøj, E. 98, 110
Slawson, W. 54, 63
Sloboda, J.A. 2, 6, 124, 137, 139, 142, 150, 155, 161, 211, 224, 227, 238, 256, 257, 266, 276, 283, 315, 317, 319, 321, 323, 324
Smith, B.C. 224, 238
Smith, J. 241, 254
Smith, J.B. 315, 316, 319, 321, 323, 324
Smith, L. 229, 238
Solomon, J. 282
Somalvico, M. 275, 283
Sperry, R.W. 116, 121
Spieker, S. 167, 178
Stallman, R.M. 276, 283
Steblin, R. 33
Steedman, M.J. 258, 263, 264, 265, 266
Stepien, L.S. 105, 111
Sterling, R.S. 166, 177
Stern, D.N. 167, 178
Stevens, S.S. 44, 50
Stockman, D. 306
Stoffer, T.A. 124, 137, 155, 161
Stoll, G. 39, 50
Strauss, J. 142–3
Stravinsky, I. 148, 150
Strawn, J. 237
Studdert-Kennedy, M. 151, 161
Stumpf, C. 52
Suci, G.J. 171, 177
Sundberg, J. 47, 49, 140, 150, 167, 178, 207, 210, 217, 219, 267–274, 274
Sussman, G.J. 276, 283
Tanguay, P. 97, 111
Tannenbaum, P.H. 171, 177
Taub, J. 97, 111
Tegner, A. 140
Tenney, J. 257, 266
Terhardt, E. 39, 44, 48, 50
Teyken, C. 315, 317, 324
Thelen, E. 216, 219
Thom, R. 276, 282, 283
Thompson, W.F. 267, 268, 271, 274
Thurlow, W.R. 36, 50
Todd, N.P. 180, 189, 207, 210, 212, 217, 219, 267, 269, 274
Todd, P.M. 238, 239, 254
Tracey, H. 19
Tramo, M.J. 5, 105, 111, 116, 121
Traub, E. 98, 110
Truax, B. 282, 283
Tucker, L.D. 146, 149
Tuller, B. 216, 219
Turing, A. 276
Turvey, M.T. 216, 219
Ueda, K. 69, 81
Ulrich, J.W. 216, 219
Vaughn, K. 4, 22, 23, 32, 33
Verdi, G. 142–3, 148
Viret, J. 12
Vis, G. 188, 189
Voisin, F. 9, 19
Wagner, R. 309
Walker, A.S. 167, 177
Warren, R.M. 152, 153, 158, 161
Warrington, E.K. 105, 110, 151, 161
Watkins, A.J. 148, 150
Weinberger, N.M. 107, 111
Weisbuch, G. 282
Weiss, P. 318, 324
Wessel, D.L. 54, 67, 116, 121, 224, 238
West, R. 139, 150, 210, 219
Wiedebein, G. 318
Wilk, R.G.H. van der 188, 189
Wingfield, A. 155, 161
Winograd, T. 226, 238
Winold, H. 216, 219
Wyke, M.A. 116, 121
Xenakis, I. 148, 150, 193, 205
Yund, E.W. 97, 99, 109
Zatorre, R. 98, 105, 106, 107, 108, 111
Zwicker, E. 44, 49, 50, 188, 189
Zwislocki, J.J. 188, 189
Subject index
absolute pitch 94, 102, 108
accent 180, 186, 328
accent, agogic 257
accent, metrical 180
accent, subjective 179, 188
Afghanistan 217
Alexander Nevsky (Prokofiev) 165
alternating-amplitude sounds (AAMP) 70
alternating-phase sounds (APH) 70
analysis 1, 5, 139, 192
analysis-by-synthesis 9, 207, 267, 268
Andante Spianato (Chopin) 263
Arpeggione Sonata (Schubert) 257
artificial intelligence 221, 223
artificial intelligence and music 257–283
artificial intelligence, good old fashioned (GOFAI) 239
associational representations 223
associationism 163–177
associationism and connectionism 165
associationism and mechanistic philosophy 164–165
associationism and reductionism 165
asynchrony, auditory sensitivity to 37
atonal music 5, 325–327, 330–336
Au clair de la lune 329
partials, audibility 45
auditory illusions 102, 179
auditory image 74–75, 211
auditory system, neurological aspects of 75
automatic performance 47
bottom-up processing 160
bottom-up vs. top-down representation 232
boundary (effect of perceived intensity) 184
boundary 154–156, 211
Central Africa 7–12, 13–19
cerebral hemisphere 97–109, 113, 151
children 180, 184, 312
chord 105
chord progression 41–43
chord roots 35, 38–41, 43
Christus der ist mein Leben (Bach) 102, 155, 160
chunk 102, 155, 160
circle of fifths 24, 101
Clair de lune (Debussy) 285–304
click localisation 154–158
Cloches à travers les feuilles (Debussy) 285–304
cochlea 75
cognitive processes 104
cognitive science 1–4, 218
commissurotomy 116–118
common tone 24, 25, 28, 32n
complex tones, perception of 105
composition 1, 22, 55, 192, 311–323, 326
computational models 1, 4, 5, 221–222, 225–266
computer analysis 283–304
computer composition 275–283
computer counterpoint 275–283
connectionism 5, 165, 239–240
consonance 22, 105
consonance and pitch multiplicity 38
constraints 275
contour 11, 12, 142, 327
cosine-phase sounds (CPH) 70
counterpoint 5, 7, 8
critical band 44
cross-modal experimental paradigm 115–120
cue extraction 191–204, 332, 333
cycle, time 7, 22, 23
development 1, 312
diatonicism 108, 144
Die Meistersinger, Prelude from (Wagner) 259
dissonance 22, 23, 105
dissonance and pitch multiplicity 38
distributed models of cognition 239–240
representations, distributed 235n
duration 83, 90–91, 179, 198, 208
duration, perceived 197
dynamic vectors 193, 333, 334
dynamics 208
ear difference 97–109
Eclat (Boulez) 191–202, 332
encoding 32, 108, 186
ensemble 13
equal temperament 24
ESCOM 5
ethnomusicology 1, 2, 4
excitation 22
expectation 141
experience, effect of 100
expert systems 275–283
expression 5, 207–218
expression, formal theories of 267
film 5, 10, 16–18
film music 163–177
Five Orchestral Pieces, Op. 16 (Schoenberg) 140
folk song 140
formal theories 287–290
frequency (sound) 9, 23, 30, 86, 105, 106, 152
frequency discrimination 104
frequency, fundamental 8, 14, 23, 84, 105
Fugue in F sharp minor from WTK I (Bach) 260
Gegenklang 43
generative processes 207, 209–211, 214, 218
generative structure 328
Gestalt principle of proximity 256
grammar 5, 154, 155, 157
grammar, compositional 139, 140, 148, 318
grammar, generative 139–149
Grande Messe des Morts (Berlioz) 52
grouping see segmentation
handedness 97
harmonic charge 47
harmonic structure 4, 5, 22, 140, 142
harmonics see partials
hemispheric specialisation see cerebral hemisphere
heuristics 275
hierarchical organisation 124, 231–233, 325, 327, 328
hierarchical phrase structure 255–266
hierarchy 204(fn), 327, 330–333, 336
illusion, perceptual 188
implication 160, 336
improvisation 160, 336
INDSCAL (Individual Differences Scaling) 56
information processing 179
inharmonicity 16, 17
instrument discrimination 84, 88, 90–94
instrumentation 51–67
intensity 86, 154, 179
intensity, perceived 90, 180, 181–4, 186–188
intensity, performed 189
intention, compositional 140, 147, 148
intention, expressive (performance) 180
intentionality 140, 141, 276
intersemic relationships 307
interval structure 149, 159
interval, pitch 8, 11, 13, 14, 18, 22, 24, 25, 27, 83, 84, 86, 88, 90–94, 98, 141, 325, 327, 335
interval, temporal 106, 179–188, 203
Intimate Letters (Janacek) 149
IRCAM 14(fn)
Klangfarbenmelodie 52
knowledge representation 224–225
Kruskal stress reduction 63
Lélio (Berlioz) 52
La Cathédrale engloutie (Debussy) 334
La donna è mobile (Verdi) 142–3
La Puerta del Vino (Debussy) 334
Leitmotiv 165, 308
LISP 36
loudness see intensity
macrostructure 333
masking 44–45, 179, 188
meaning 147
mel scale 48, 69
melodic charge 47, 272–273
melody 5, 7, 10, 11, 14, 22, 23, 30, 97–104, 107–109, 139, 155–6, 159–60, 193, 211–214
memory 5, 86, 191–204, 325, 327, 328
memory, echoic 211–214
memory, long-term 108
memory, storage 193, 328
memory, working 97, 105–109
metrical structure 140, 146, 155, 193, 328, 330
MIDI 223
Mildred Pierce (Steiner) 165
missing fundamental 105
modelling 18, 24, 30, 312–318, 319–23
modes 22, 24, 25, 30, 144, 148
modularization 234
modulation 30
motive 141, 145, 153–4
motivic argument 139, 140, 147
motivic content 149
motivic goals 148
motivic relations 330
motor (acoustico-motor association cortices) 108
motor processes 103, 186
motor program 108, 109, 184, 209
motor response 106
motor skills 211
movement 108, 215–218
multidimensional scaling 25, 28–31, 32n
multidimensional scaling of timbre 53–56
multiple viewpoints 275–283
Music for the Royal Fireworks (Handel) 51
music and meaning 163–177
music archiving 223
music publishing 223
music theory 140
music, definition of 222
musical structure 5, 21, 210, 211, 214, 217, 218, 221–238, 325, 333
neural networks 106
neuropsychology 4, 97–109, 151–152
object data structures 231
octave-generalised chords 35–49
orchestration 51–67, 217
Orfeo (Monteverdi) 51
parsing 146, 208, 209
partials 21–23, 30, 113
pattern recognition of harmonic tones 45
pentatonicism 7, 8, 11, 14–17, 144
percussion 8, 13, 21
performance 1, 5, 15, 30, 97, 139, 180, 189, 207–218
performance practice 32, 208, 214
periodicity 9, 10, 84, 94
phenomenal accents 257
phoneme 152, 153
phoneme, restoration 158
phrase 157, 211, 213
phrase grouping 255–266
phrase structure 5, 145, 159
phrase structure, cognitive representation of 123–124, 255–266
phrase structure, perception of 255–266
pitch 8, 13–16, 21, 35–49, 98, 100, 140, 179, 192
pitch ambiguity see pitch multiplicity
pitch chroma 36, 56, 69, 78
pitch chroma: physiological basis 69–80
pitch class see pitch chroma
pitch class sets 287–304
pitch commonality 35, 41–43
pitch discrimination 105
pitch height 69, 83–95
pitch height: physiological basis 69–81
pitch helix 69–70
pitch multiplicity 35–38, 45
pitch perception, spiral model of 69
pitch salience 35, 45–46
pitch similarity 35, 41–43
pitch, perception 14, 16, 23, 83–95, 98, 99, 104, 109
POCO 267
poïesis 309
poïetic work 309
polyphony 7, 9, 10, 217, 326
polyrhythms 8
predicate logic 223
primacy 108, 198–9, 204
problem-solving 317, 323
procedural representations 223
production rules 275
production systems 223
prolongation 328–335
prolongational reduction 123–127
proportional analysis 286
psychoacoustics 1, 4, 21
Pulse Ribbon Model of hearing 73
Pythagorean scale 8
quantization, rhythmic 5, 240–254
rag 21–33
recall, melody 97, 100–104, 106, 109
recency 198–9, 204
recognition, melodic 97, 98–100, 141–145
reduction 328, 329
reduction, mental 325
Reflets dans l’eau (Debussy) 285–304
register 14, 15
rehearsal 101, 102, 107, 108
representamen 306–310
representation, cognitive 11, 108, 109, 179, 184, 186, 193, 199, 204(fn), 209–211, 214, 215, 224, 326, 327, 332, 333
representation decomposability 226
representation, levels of 229–230
representation, mixed 226
representation, multiple 226, 232–233
representation of musical structure 221–238
representation of pitch in cognition 48, 224
representation, real-time music 222
representation, score 228
representations, structured object 223
representation, tacit, implicit and explicit 228–237
representational systems 224–225
Requiem (Berlioz) 52
resolution 22
response latency 97, 101, 102
rhythm 5, 15, 98, 140, 142, 326, 336
rhythm perception 179–189, 193, 240–254
rhythm space 243–245
Musical Offering, Ricercare from (Bach, arr. Webern) 256
Rigoletto (Verdi) 142
roughness 14, 16, 17, 18
rubato see timing, expressive
rules-based models 267–274
salience 330, 332, 333, 334, 335, 336
scale 7–12, 13–19, 21–33, 327, 335
Seashore Timbre Test 113–120
segmentation 125, 126, 155, 179, 191–196, 204, 211, 256, 286–304, 328, 330, 333–336
semantic differential 171–176
semiology 5, 305, 328
semiotic networks 305–310
semiotic vs. linguistic models of music 305–310
sequence, tone 25
sequencer 9
Sequenza III (Berio) 334–336
Sequenza VI (Berio) 19, 192, 196, 332
serial music 141, 325–327
Shepard tones 35–49
sight-reading 208
signifié 306
similarity judgment 22, 28–30
simulation 15
simulation objects 276
sitar 22, 23
Skye Boat Song 144
Slavonic Dances Op. 72 (Dvorak) 144
Sonata in A flat Op. 26 (Beethoven) 270
Sonata in B minor (Chopin) 262
Sonata K332 (Mozart) 263, 270
song 7, 10
spectral envelope 83–95
spectrum 22, 23, 105
speech, perception 151–160
spiral processor (chroma perception) 78
split brain and timbre perception 113–121
stochastic composition 47
accent, structural 257
structure, temporal 186
syllable restoration 158
symbolic form 307
symbolic vs subsymbolic processing 239–254
symmetry 24
Symphonie Fantastique (Berlioz) 52
Symphony no. 104 (Haydn) 261
syntactic function 296
syntactic networks 285, 294–304
syntax systems 285–304
synthesiser 7, 9–18, 89, 153
tambura 21–33
technology 3, 12
tempo 15, 182
temporal horizon 192–4
temporal line 191, 193, 328
temporal summation 188
tension 142, 336
tension-relaxation networks 123
That 21–29, 32n
theory of music 9, 104
timbral blend contrasts 63
timbral contours 63
timbre 4, 10, 14–17, 32n, 51–67, 179, 192, 204, 208, 330, 336
timbre and Fourier analysis 116
timbre blend 51–67
timbre discrimination, neurophysiological basis of 113
timbre identification 51–67
timbre perception 22, 83–95, 113–121
timbre, ANSI definition of 116
timbre, inharmonic 18
timbre, instrumental 9, 13
timbre, modulation 21
timbre, multidimensional representation of 116
timbre, vocal 8, 9
time perception 192–4, 202
time, representation of 228–230
time-variant energy spectra 58
timing, expressive 5, 207–218, 267–274
timing, performance 102
tonal centre 30
tonal hierarchy 124, 125, 135
tonal hierarchy and rhythm 124–125
tonal music and semiotics 308–310
tonality 326, 327
tone height see pitch height
tone profile 35, 38–41
top-down (influences on perception) 151
top-down processing 152, 160
Tragoedia (Birtwistle) 149
training, effect of 83–88, 90–94, 141, 143, 148–149, 184, 191, 195–204
transcription 15, 32n
transformation effect 152–154
Tristan chord 43
tritone paradox 35
tuning 14–18, 22
verbal processing 5
vocal reproduction 103
Well-Tempered Clavier Book II (J.S. Bach) 97, 100–102
wind instruments 51–67
xylophone 13–19
yodelling 7, 8
NOTES FOR CONTRIBUTORS

Typescripts
Papers should be typed with double spacing on good quality paper and submitted in duplicate to the Editor, Contemporary Music Review, c/o Harwood Academic Publishers, at 5th Floor, Reading Bridge House, Reading Bridge Approach, Reading RG1 8PP, UK; or 50 West 23 Street, New York, NY 10010, USA; or 14–9 Okubo 3-chome, Shinjuku-ku, Tokyo 160, Japan; or directly to the issue editor.
Submission of a paper to this journal will be taken to imply that it represents original work not previously published, that it is not being considered elsewhere for publication, and that if accepted for publication it will not be published elsewhere in the same form, in any language, without the consent of the editors and publishers. It is a condition of the acceptance by the editor of a typescript for publication that the publisher acquires automatically the copyright of the typescript throughout the world.

Languages
Papers are accepted only in English.

Abstract
Each paper requires an abstract of 100–150 words summarizing contents.

Key words
Up to six key words (index words) should be provided by the author. These will be published at the front of the paper.

Illustrations
All illustrations should be designated as “Figure 1” etc., and be numbered with consecutive arabic numerals. Each illustration should have a descriptive caption and be mentioned in the text. Indicate an approximate position for each illustration in the margin, and note the paper title, the name of the author and the figure number on the back of the illustration (please use a soft pencil for this, not a felt tip pen).
Preparation: All illustrations submitted must be of a high enough standard for direct reproduction. Line drawings should be prepared in black (india) ink on quality white card or paper or on tracing paper, with all the necessary lettering included. Alternatively, good sharp photographs (“glossies”) are acceptable. Photographs intended for halftone reproduction must be good, glossy original prints of maximum contrast.
Unusable illustrations and examples will not be redrawn or retouched by the printer, so it is essential that figures are well prepared.

Musical examples
These, like the illustrations, must be of a high enough standard for direct reproduction. Musical examples should be prepared in black (india) ink on quality white card or white music manuscript paper, or on tracing paper, with any necessary lettering included. If staves are hand drawn, ensure that the lines are of uniform thickness. Unusable musical examples will not be redrawn or retouched by the printer, so it is essential that examples are well prepared.

References and notes
References and notes are indicated in the text by consecutive superior arabic numerals (without parentheses). The full list should be collected and typed at the end of the paper in numerical order. Listed references should be complete in all details, including article titles and journal titles in full. In multi-author references, the first six authors’ names should be listed in full, then “et al.” may be used. Examples:
1. Smith, F.J. (1976) Editor. In Search of Musical Method, pp. 70–81. New York and London: Gordon and Breach.
2. Cockrell, D. (1982) A study in French Romanticism. Journal of Musicological Research, 4 (1/2), 85–115.
NB: authors must check that reference details are correct and complete; otherwise the references are useless. As a final check, please make sure that the references tally with the citations in the text.

Proofs
Contributors will receive page proofs (including illustrations) by air mail for correction, which must be returned within 48 hours of receipt. Please ensure that a full postal address is given on the first page of the typescript, so that proofs arrive without delay. Authors’ alterations in excess of 10% of the original composition cost will be charged to authors.

Page charges
There are no page charges to individuals or institutions.