Computational Processing of the Portuguese Language: 6th International Workshop, PROPOR 2003, Faro, Portugal, June 26-27, 2003. Proceedings

Lecture Notes in Artificial Intelligence Edited by J. G. Carbonell and J. Siekmann Subseries of Lecture Notes in Comput...

Author: Nuno J. Mamede | Jorge Baptista | Isabel Trancoso | Maria das Gracas Volpe Nunes

9 downloads 776 Views 3MB Size Report

This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!

Report copyright / DMCA form

DOWNLOAD PDF

Lecture Notes in Artificial Intelligence Edited by J. G. Carbonell and J. Siekmann

Subseries of Lecture Notes in Computer Science

2721

3

Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Tokyo

Nuno J. Mamede Jorge Baptista Isabel Trancoso Maria das Gra¸cas Volpe Nunes (Eds.)

Computational Processing of the Portuguese Language 6th International Workshop, PROPOR 2003 Faro, Portugal, June 26-27, 2003 Proceedings

13

Series Editors Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA J¨org Siekmann, University of Saarland, Saarbr¨ucken, Germany Volume Editors Nuno J. Mamede Isabel Trancoso Technical University of Lisbon, L2F, INESC-ID Lisboa Rua Alves Redol, 9, 1000-029 Lisbon, Portugal E-mail: {nuno.mamede, isabel.trancoso}@inesc-id.pt Jorge Baptista University of Algarve, Faculty of Humanities and Social Sciences Campus de Gambelas, 8005-139 Faro, Portugal E-mail: [email protected] Maria das Gra¸cas Volpe Nunes NILC, ICMC-USP S˜ao Carlos Av. do Trabalhador S˜ao-Carlense, 400, 13560-970 S˜ao Carlos, SP, Brazil E-mail: [email protected]

Cataloging-in-Publication Data applied for A catalog record for this book is available from the Library of Congress Bibliographic information published by Die Deutsche Bibliothek Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliographie; detailed bibliographic data is available in the Internet at .

CR Subject Classification (1998): I.2.7, F.4.2-3, I.2, H.3, I.7 ISSN 0302-9743 ISBN 3-540-40436-8 Springer-Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. Springer-Verlag Berlin Heidelberg New York, a member of BertelsmannSpringer Science+Business Media GmbH http://www.springer.de © Springer-Verlag Berlin Heidelberg 2003 Printed in Germany Typesetting: Camera-ready by author, data conversion by PTP-Berlin GmbH Printed on acid-free paper SPIN: 10928967 06/3142 543210

Preface

Since 1993, PROPOR Workshops have become an important forum for researchers involved in the Computational Processing of Portuguese, both written and spoken. This PROPOR Workshop follows previous workshops held in 1993 (Lisboa, Portugal), 1996 (Curitiba, Brazil), 1998 (Porto Alegre, Brazil), 1999 ´ (Evora, Portugal) and 2000 (Atibaia, Brazil). The workshop has increasingly contributed to bring together researchers and industry partners from both sides of the Atlantic. The constitution of an international program committee and the adoption of high-standard referee procedures demonstrate the steady development of the ﬁeld and of its scientiﬁc community. This can also be seen in the realization of the satellite workshop AVALON, which constitutes the ﬁrst evaluation campaign of Portuguese NLP systems. Each one of the 64 submitted papers received a careful, triple blind-review by the program committee. All those who contributed are mentioned in the following pages. The reviewing process led to the selection of 41 papers for oral presentation, 24 regular papers and 17 short papers, which are published in this volume. The workshop and this book were structured around the eight following main topics: (i) speech analysis and recognition; (ii) speech synthesis; (iii) pragmatics, discourse, semantics, syntax, and the lexicon; (iv) tools, resources, and applications; (v) dialogue systems; (vi) summarization and information extraction; and (vii) evaluation. We would like to express here our thanks to all the reviewers for their quick and excellent work. We are specially grateful to our invited speakers, Julia Hirschberg (AT&T, USA) and Robert Gaizauskas (Sheﬃeld University, UK), for their invaluable contribution, which undoubtedly increased the interest in the workshop and its quality. We are indebted to a number of individuals for taking care of speciﬁc parts of the workshop program: Jo˜ ao Rochate, for his help in building and maintaining the PROPOR Web site, which served as our main information channel; Susana Mendon¸ca and Nuno Vieira, who provided the indispensable administrative support; and Susana Mendon¸ca, Luc´ılia Chacoto, and Cristina Palma, for their help in the secretariat. We would like to publicly acknowledge the institutions without which this event would not have been possible: The Departamento de Letras Cl´ assicas e Modernas, Faculdade de Ciˆencias Humanas e Sociais, Universidade do Algarve; the Laborat´ orio de Sistemas de L´ıngua Falada, Laborat´ orio de Engenharia da Linguagem; the N´ ucleo Interinstitucional de Ling´ıstica Computacional of the Universidade de S˜ ao Paulo; the Funda¸c˜ao para a Ciˆencia e a Tecnologia, Minist´erio da Ciˆencia e do Ensino Superior; the Funda¸c˜ao Luso-Americana para o Desenvolvimento; and the British Council in Portugal.

VI

Preface

Universidade do Algarve deserves a special mention for having generously hosted the workshop.

April 2003

Nuno J. Mamede Jorge Baptista Isabel Trancoso Maria das Gra¸cas Volpe Nunes

Conference Chair Jorge Baptista, Universidade do Algarve and LabEL (CAUTL-IST)

Program Co-chair Jorge Baptista, Universidade do Algarve and LabEL (CAUTL-IST) Isabel Trancoso, Technical University of Lisbon and L2F (INESC-ID Lisboa) Maria das Gra¸cas Volpe Nunes, University of S˜ ao Paulo and NILC

Publication Chair Nuno J. Mamede, Technical University of Lisbon and L2F (INESC-ID Lisboa)

VIII

Program Committee Ana Frankenberg-Garcia (ISLA, Portugal) Ant´onio Branco (University of Lisbon, Portugal) Ant´onio Teixeira (University of Aveiro, Portugal) Belinda Maia (University of Porto, Portugal) Bento Carlos Dias-da-Silva (S˜ao Paulo State University, Brazil) Carlos Lima (University of Minho, Portugal) Caroline Hag`ege (Xerox Research Centre Europe, France) Dante Barone (Federal University of Rio Grande do Sul, Brazil) Diamantino Freitas (University of Porto, Portugal) Diana Santos (Linguateca/SINTEF, Portugal/Norway) Donia Scott (ITRI, UK) Elisabete Ranchhod (University of Lisbon and LabEL, Portugal) ´ Eric Laporte (University of Marne-la-Vall´ee and IGM, CNRS, France) Fernando Perdig˜ ao (University of Coimbra, Portugal) Francisco Casacuberta (Universitat Polit`ecnica de Val`encia and ITI, Spain) Horacio Franco (SRI, USA) ´ Irene Rodrigues (University of Evora, Portugal) Isabel Trancoso (Technical University of Lisbon and L2F, Portugal) Jo˜ao Paulo Neto (Technical University of Lisbon and L2F, Portugal) Jorge Baptista (University of Algarve and LabEL, Portugal) Julia Hirchsberg (Columbia University and AT&T Labs, USA) L´ ucia H. Machado Rino (S˜ ao Carlos Federal University and NILC, Brazil) Lu´ıs Caldas de Oliveira (Technical University of Lisbon and L2F, Portugal) Manuel C´elio Concei¸ca˜o (University of Algarve and Termip, Portugal) Marcelo Finger (University of S˜ ao Paulo, Brazil) Maria das Gra¸cas Volpe Nunes (University of S˜ao Paulo and NILC, Brazil) Maria do C´eu Viana (CLUL, Portugal) Maria Helena Mira Mateus (University of Lisbon and ILTEC, Portugal) Max Silberztein (University of Franche-Comt´e and GRELIS, France) ´ Michel Gagnon (Ecole Polytechnique de Montr´eal, Canada) N´estor Becerra Yoma (University of Chile, Chile) Nuno Mamede (Technical University of Lisbon and L2F, Portugal) Palmira Marrafa (University of Lisbon, Portugal) Pl´ınio Barbosa (Campinas State University, Brazil) Renata Vieira (UNISINOS, Brazil) Rute Costa (New University of Lisbon and Termip, Portugal) Teresa Lino (New University of Lisbon and Termip, Portugal) Vera L´ ucia de Lima (Pontiﬁcal Catholic Univ. of Rio Grande do Sul, Brazil) Violeta Quental (Pontiﬁcal Catholic Univ. of Rio de Janeiro, Brazil)

IX

Referees Ana Frankenberg-Garcia Ant´onio Branco Ant´onio Teixeira Belinda Maia Bento Dias-da-Silva Carlos Lima Caroline Brun Caroline Hag`ege Dante Barone Diamantino Freitas Diana Santos Donia Scott Elisabete Ranchhod ´ Eric Laporte Fernando Perdig˜ ao Francisco Casacoberta Gagnon Michel Gra¸ca Nunes Herv´e D´ejean Horacio Franco Irene Rodrigues

Isabel Trancoso Jorge Baptista Jo˜ ao P. Neto Lu´ıs Oliveira L´ ucia Rino Manuel Concei¸c˜ao Marcelo Finger Maria Mateus Maria Viana Max Silberztein Michel Gagnon Nestor Yoma Nuno Mamede Palmira Marrafa Paulo Quaresma Pl´ınio Barbosa Renata Vieira Rute Costa Teresa Lino Vera Lima Violeta Quental

Table of Contents

Speech Analysis and Recognition Devoicing Measures of European Portuguese Fricatives . . . . . . . . . . . . . . . . Luis M.T. Jesus, Christine H. Shadle

1

AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hugo Meinedo, Diamantino Caseiro, Jo˜ ao Neto, Isabel Trancoso

9

Pitch Restoration for Robust Speech Recognition . . . . . . . . . . . . . . . . . . . . . Carlos Lima, Adriano Tavares, Carlos Silva

18

Speech Synthesis Grapheme-Phone Transcription Algorithm for a Brazilian Portuguese TTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Filipe Barbosa, Guilherme Pinto, Fernando Gil Resende, Carlos Alexandre Gon¸calves, Ruth Monserrat, Maria Carlota Rosa

23

Improving the Accuracy of the Speech Synthesis Based Phonetic Alignment Using Multiple Acoustic Features . . . . . . . . . . . . . . . . . . . . . . . . . S´ergio Paulo, Lu´ıs C. Oliveira

31

Evaluation of a Segmental Durations Model for TTS . . . . . . . . . . . . . . . . . . Jo˜ ao Paulo Teixeira, Diamantino Freitas

40

From Portuguese to Mirandese: Fast Porting of a Letter-to-Sound Module Using FSTs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Isabel Trancoso, C´eu Viana, Manuela Barros, Diamantino Caseiro, S´ergio Paulo

49

A Methodology to Analyze Homographs for a Brazilian Portuguese TTS System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Filipe Barbosa, Lilian Ferrari, Fernando Gil Resende

57

Automatic Discovery of Brazilian Portuguese Letter to Phoneme Conversion Rules through Genetic Programming . . . . . . . . . . . . . . . . . . . . . . Evandro Franzen, Dante Augusto Couto Barone

62

Experimental Phonetics Contributions to the Portuguese Articulatory Synthesizer Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ant´ onio Teixeira, Lurdes Castro Moutinho, Rosa L´ıdia Coimbra

66

XII

Table of Contents

Pragmatics, Discourse, Semantics, Syntax, and the Lexicon A Study on the Reliability of Two Discourse Segmentation Models . . . . . . Eva Arim, Francisco Costa, Tiago Freitas

70

Reusability of Dictionaries in the Compilation of NLP Lexicons . . . . . . . . Bento C. Dias-da-Silva, Mirna F. de Oliveira, Helio R. de Moraes

78

Homonymy in Natural Language Processes: A Representation Using Pustejovsky’s Qualia Structure and Ontological Information . . . . . . . . . . . Claudia Zavaglia, Juliana Galvani Greghi

86

Using Adaptive Formalisms to Describe Context-Dependencies in Natural Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jo˜ ao Jos´e Neto, Miryam de Moraes

94

Some Regularities of Frozen Expressions in Brazilian Portuguese . . . . . . . . Oto Ara´ ujo Vale

98

Tools, Resources, and Applications Selva: A New Syntactic Parser for Portuguese . . . . . . . . . . . . . . . . . . . . . . . . 102 Sheila de Almeida, Ariadne Carvalho, Lucien Fantin, Jorge Stolﬁ An Account of the Challenge of Tagging a Reference Corpus for Brazilian Portuguese . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 Sandra Alu´ısio, Jorge Pelizzoni, Ana Raquel Marchi, Luc´elia de Oliveira, Regiana Manenti, Vanessa Marquiaf´ avel Multi-level NER for Portuguese in a CG Framework . . . . . . . . . . . . . . . . . . . 118 Eckhard Bick HMM/MLP Hybrid Speech Recognizer for the Portuguese Telephone SpeechDat Corpus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 Astrid Hagen, Jo˜ ao P. Neto Managing Linguistic Resources and Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 David M. de Matos, Joana L. Paulo, Nuno J. Mamede Using Morphossyntactic Information in TTS Systems: Comparing Strategies for European Portuguese . . . . . . . . . . . . . . . . . . . . . . . 143 Ricardo Ribeiro, Lu´ıs Oliveira, Isabel Trancoso Timber! Issues in Treebank Building and Use . . . . . . . . . . . . . . . . . . . . . . . . . 151 Diana Santos A Lexicon-Based Stemming Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 Gilberto Silva, Claudia Oliveira

Table of Contents

XIII

Contractions: Breaking the Tokenization-Tagging Circularity . . . . . . . . . . . 167 Ant´ onio Horta Branco, Jo˜ ao Ricardo Silva A Linguistic Approach Proposal for Mechanical Design Using Natural Language Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 Jo˜ ao Carlos Linhares, Altamir Dias Identiﬁcation of Direct/Indirect Discourse in Children’s Stories . . . . . . . . . 175 Nuno J. Mamede Curupira: A Functional Parser for Brazilian Portuguese . . . . . . . . . . . . . . . . 179 Ronaldo Martins, Gra¸ca Nunes, Ricardo Hasegawa ANELL: A Web System for Portuguese Corpora Annotation . . . . . . . . . . . . 184 Cristina Mota, Pedro Moura Email2Vmail – An Email Reader . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 Hugo Pereira, Pedro Teixeira, Lu´ıs C. Oliveira A Large Speech Database for Brazilian Portuguese Spoken Language Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 Carlos Alberto Ynoguti, Pl´ınio Almeida Barbosa, F´ abio Violaro

Dialogue Systems Interpretations and Discourse Obligations in a Dialog System . . . . . . . . . . . 197 M´ arcio Mour˜ ao, Pedro Madeira, Nuno J. Mamede Using Dialogues to Access Semantic Knowledge in a Web IR System . . . . 201 Paulo Quaresma, Irene Rodrigues Managing Dialog and Access Control in Natural Language Querying . . . . . 206 Luis Quintano, Irene Rodrigues

Summarization and Information Extraction GistSumm: A Summarization Tool Based on a New Extractive Method . 210 Thiago Alexandre Salgueiro Pardo, Lucia Helena Machado Rino, Maria das Gra¸cas Volpe Nunes Topic Indexing of TV Broadcast News Programs . . . . . . . . . . . . . . . . . . . . . . 219 Rui Amaral, Isabel Trancoso

Evaluation An Initial Proposal for Cooperative Evaluation on Information Retrieval in Portuguese . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227 Rachel Aires, Sandra Alu´ısio, Paulo Quaresma, Diana Santos, M´ ario J. Silva

XIV

Table of Contents

Evaluation of Finite-State Lexical Transducers of Temporal Adverbs for Lexical Analysis of Portuguese Texts . . . . . . . . . . . . . . . . . . . . . 235 Jorge Baptista Evaluating Automatically Computed Word Similarity . . . . . . . . . . . . . . . . . . 243 Caroline Varaschin Gasperin, Vera L´ ucia Strube de Lima Evaluation of a Thesaurus-Based Query Expansion Technique . . . . . . . . . . 251 Luiz Augusto Sangoi Pizzato, Vera L´ ucia Strube de Lima Cooperatively Evaluating Portuguese Morphology . . . . . . . . . . . . . . . . . . . . . 259 Diana Santos, Lu´ıs Costa, Paulo Rocha

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267

Devoicing Measures of European Portuguese Fricatives Luis M.T. Jesus1 and Christine H. Shadle2 1

Escola Superior de Sa´ ude da Universidade de Aveiro, and Instituto de Engenharia Electr´ onica e Telem´ atica de Aveiro Universidade de Aveiro, 3810-193 Aveiro, Portugal [email protected] http://www.ieeta.pt/˜lmtj/ 2 Department of Electronics and Computer Science University of Southampton, Southampton, SO17 1BJ, UK [email protected] http://www.ecs.soton.ac.uk/info/people/chs/

Abstract. This paper presents a study of devoicing of European Portuguese fricatives. Two devoicing criteria (a manual criterion and a criterion based on the ratio of variances of the laryngograph signal during the VF transition and during the fricative) were used to classify the examples into two or three categories. The results of the automatic and manual measures of devoicing are compared, and an explanation for observed misclassiﬁcations is presented. Devoicing occurs extensively in Portuguese; results are compared to those for English.

1

Introduction

Studies of Portuguese fricatives have been limited in comparison with other languages, yet are important for applications such as speech synthesis. In this paper we focus on devoicing of fricatives using recordings by four subjects of a set of corpora. The segmentation method and two devoicing classiﬁcation methods using acoustic and laryngograph signals are described. Results are presented and compared to studies of devoicing in English.

2

Design and Recording of a Corpus of Portuguese Fricatives

A speech corpus has been designed to explore the fricatives of standard European Portuguese. The phonetic and phonological evidence underlying the design of the corpus have been described in Jesus (2001) and Jesus and Shadle (2002). Corpora were devised that included the fricatives /f, v, s, z, S, Z/ in the following contexts: sustained (Corpus 1), repeated nonsense words following Portuguese phonological rules (Corpus 2), Portuguese words containing fricatives in frame sentences (Corpus 3), and the same set of words in sentences (Corpus 4). N.J. Mamede et al. (Eds.): PROPOR 2003, LNAI 2721, pp. 1–8, 2003. c Springer-Verlag Berlin Heidelberg 2003

2

L.M.T. Jesus and C.H. Shadle

The subjects used in this study were two male (LMTJ and CFGA) and two female (ACC and ISSS) adult Portuguese native speakers, with no reported history of hearing or speech disorders. Subject LMTJ, age 26, is from the city of Aveiro (at the centre of Portugal), and CFGA, age 26, is from Braga (in north Portugal). Speaker ACC, age 33, is from Sintra (a city very close to Lisbon), and ISSS, age 21, is from Lisbon. At the time of the recordings all subjects had been studying in England for a period of two to three years. Recordings were made in a sound treated room using a Bruel & Kjaer 4165 1 inch microphone located 1 m in front of the subject’s mouth, connected to 2 a Bruel & Kjaer 2639 pre-ampliﬁer. The signal was ampliﬁed and ﬁltered by a Bruel & Kjaer 2636 measurement ampliﬁer, with high-pass cut-on frequency of 22 Hz and low-pass cut-oﬀ frequency of 22 kHz. A laryngograph signal (Lx) was also collected using a laryngograph processor1 . The acoustic speech signal and Lx were recorded with a Sony TCD-D7 DAT system at 16 bits, with a sampling frequency of 48 kHz, and digitally transferred to a computer for post-processing. A 94 dB, 1000 Hz calibration tone produced by a Bruel & Kjaer 4620 calibrator was also recorded on the same tape on which speech was recorded (Jesus 2001).

3

Method for Segmentation and Annotation

The acoustic and laryngographic signals were recorded on DAT tape (16 bit, sampling frequency 48 kHz) and digitally transferred to .wav computer ﬁles. The time waveforms of all the corpus words were manually analysed to detect the start of the vowel-fricative (VF) transition, the start of the fricative, the end of the fricative, and the end of the fricative-vowel (FV) transition. During the VF transition, there is a decrease in amplitude, voicing ceases (for unvoiced fricatives) and frication noise starts, as shown in Fig. 1. During the FV transition, there is an increase in amplitude, voicing starts (for unvoiced fricatives) and frication noise ceases (Docherty 1992, pp. 118–119). These events do not occur simultaneously or always in the same order, making the segmentation a somewhat subjective process. However, it is important to segment consistently, because the results of the analysis methods depend on where the boundaries are placed (Docherty 1992, pp. 103–110). The amplitude and voicing changes appear in both acoustic and Lx signals, which aids the segmentation process. For example, as can be seen in Fig. 1, the start of the VF transition was chosen because both signals begin to decrease in amplitude noticeably at that point. The start of the fricative is placed where the noise in the acoustic signal becomes more apparent (in an unvoiced fricative, it would be placed after the Lx signal has lost all signs of periodicity); and the end of the fricative is placed where the frication noise ceases to be visible in the acoustic signal, and where voicing begins again in the LX signal (as in this partially devoiced example, or in an unvoiced fricative). Note that in this case, the end of the fricative is much more abrupt than its start, making the FV transition region easier to segment. 1

Model LxProc, type PCLX produced by Laryngograph Ltd (UK).

Devoicing Measures of European Portuguese Fricatives

3

Laryngograph Signal 0.3

Amplitude

0

−0.3

Start VF Trans.

End Fricative

Start Fricative

End FV Trans.

Acoustic Signal 0.15

0

−0.15 120

140

160

180

200

220

240

Time (ms)

Fig. 1. Laryngograph signal and acoustic signal of fricative /z/ in azar /åÉza|/, showing the start of the vowel-fricative transition, the start of the fricative, the end of the fricative, and the end of the fricative-vowel transition. Corpus 3 (Speaker ISSS)

Once segmented, the sample numbers for each boundary were entered into an annotation ﬁle that was used by the analysis programs.

4

Measures of Devoicing

In the word corpora (Corpus 3 and 4), there were large amounts of devoicing. When the vocal tract is constricted for a voiced fricative, voicing is often maintained over only part of the fricative, because there is a trading relation between the pressure drops across the glottal and tongue constrictions (Stevens 1987, p. 388). When voiced fricatives devoice, it is with a whisper phonation (Abercrombie 1967, p. 137), distinguishing them from their voiceless counterparts which are realised with a glottal abduction gesture. Smith (1997) also suggested that the glottis is in a state intermediate between voicing and voicelessness, like the state of the glottis that is used in whisper, with the glottis open but the folds very close together. Haggard (1978) deﬁned devoicing as presence of measurable friction in the absence of continued glottal vibration, i.e., the periodic component of the voiced fricative ceased before the friction component. Smith (1997, p. 478) used a criterion for devoicing in American English based on the amplitude of the electroglottograph (EGG) cycles: “The fricative was considered to be voiced during the portion of its duration that the amplitude of the EGG cycles exceeded one-tenth of the EGG cycle amplitude at the time of maximum energy in the preceding

4

L.M.T. Jesus and C.H. Shadle

vowel”. A study by Pirello et al. (1997, p. 3756) presents an alternative measure of voicing based on the acoustic signal: “An amplitude diﬀerence greater than 10 dB between the amplitude of the vowel and frication noise was classiﬁed as voiceless. A diﬀerence of less than or equal to 10 dB sustained over 30 ms was classiﬁed as voiced”. 4.1

Manual Criterion for Devoicing

Both the acoustic signal and the laryngograph signal were used to determine if a fricative was devoiced. A fricative was called devoiced when less than one-third of the frication interval showed periodic structure in the acoustic or laryngograph signals. The term partially devoiced was used when more than one-third but less than half of the frication interval contained steady acoustic and laryngograph signal cycles. A fricative was called voiced when more than half of the frication interval showed steady acoustic and laryngograph signal cycles, even if the amplitude was much lower than in the vowel (this follows Docherty’s deﬁnition 1992, p. 13). If the laryngograph signal was clearly periodic, the interval was classiﬁed as voiced; if the laryngograph signal was zero or distorted, the signal was classiﬁed as voiced only if the acoustic signal was unambiguously periodic. 4.2

Automatic Criterion for Devoicing

As pointed out by Docherty (1992, p. 102), many techniques for automatically detecting whether a portion of a signal is voiced or not have been used in the past, but proved to be unsuitable for fricatives because in this class of speech sounds voicing has low energy. Therefore a new criterion, based on the laryngograph signal, was tested for the corpora used in the present study. The sample mean x=

N 1 xi N i=1

(1)

and sample variance N

σ 2 (x) =

1 2 (xi − x) N − 1 i=1

(2)

of the laryngograph signal samples xi , i = 1, ...N were calculated during the VF transition and during the fricative. The ratio of variances of the two intervals, rσ2 (x) =

σt2 (x) , σf2 (x)

(3)

where σt2 (x) is the variance of the signal during the VF transition and σf2 (x) is the variance of the signal during the fricative, was used as an automatic criterion for devoicing. Obviously the ratio is larger if the ﬂuctuations in the laryngograph signal during the fricative are very small compared to those during the transition region. A heuristic threshold of 15 was used: for rσ2 (x) ≥ 15, the

Devoicing Measures of European Portuguese Fricatives

5

fricative is labelled devoiced; if rσ2 (x) < 15, voiced. Fricatives manually classiﬁed as partially voiced were considered to be in the devoiced category when comparing the manual and automatic criteria. The laryngograph signal presents, in some voiced fricative examples and in most unvoiced fricative examples, a slowly increasing or decreasing amplitude over the frication interval, which results in a large variance, and therefore a misclassiﬁcation as voiced. This problem has been solved using an averaged rσ2 (x) . We have computed the mean x and the variance σ 2 (x) for three consecutive equal length sections of the frication interval, calculated the average frication interval variance, and used it to compute a new ratio of variances. We have tried using a larger number of sections over which we calculate the averaged rσ2 (x) but this does not signiﬁcantly improve the eﬃciency of this measure of devoicing.

5

Results of Devoicing Analysis Using the Manual Criterion

In this section we consider ﬁrst the results for Corpus 3, the words in frame sentences, and then the results for Corpus 4, sentences made using the same words. The full corpus is given in Jesus and Shadle (2002). Corpus 3 results showed that 71% (241 out of 341) of fricatives devoiced (averaged across all positions within word), and the percentage of devoicing increased as the place of articulation moved posteriorly. For all four subjects 55% (70 out of 127) of the examples of fricative /v/ were totally devoiced (see Sect. 4.1 for devoicing criterion); 74% (79 out of 107) of the examples of fricative /z/ were totally devoiced; 86% (92 out of 107) of the examples of fricative /Z/ were totally devoiced. Most word-ﬁnal fricative examples (93% – 55 out of 59) were totally devoiced. Corpus 4 results showed that 61% (252 out of 413) of fricatives devoiced, and most word-ﬁnal fricatives were devoiced. For all four subjects 44% (77 out of 177) of the examples of fricative /v/ were totally devoiced; 78% (86 out of 110) of the examples of fricative /z/ were totally devoiced; 71% (89 out of 126) of the examples of fricative /Z/ were totally devoiced. The Corpus 4 fricatives devoiced mostly word-ﬁnally, but less often than in Corpus 3: word-initial – 97/157 = 62% (in Corpus 3, 75/127 = 59%); word-medial – 111/195 = 57% (in Corpus 3, 111/120 = 93%); word-ﬁnal – 44/61 = 72% (in Corpus 3, 55/59 = 93%). Some of the fricatives in the sentences of Corpus 4 that have been classiﬁed as wordﬁnal are followed by voiced phonemes. Some of the words that follow these fricatives even start with a vowel. This might account for the lower word-ﬁnal average percentage of devoicing in Corpus 4 when compared with Corpus 3. Indeed, some voiceless fricatives become voiced in Corpus 4, likely as a result of cross-word coarticulation: eleven tokens of word-ﬁnal /S/ were produced as [Z] by speakers LMTJ and ACC when followed by a word starting with a voiced phoneme ([d] or [m]). There is a slightly lower number of devoiced examples of /Z/ than /z/, which contradicts the very clear results of Corpus 3 fricatives (in which the percentage of devoiced examples decreases as the place of articulation moves anteriorly).

6

L.M.T. Jesus and C.H. Shadle

One possible explanation could be that /Z/ is produced in a more anterior place in continuous speech than in isolated word production. This hypothesis can only be conﬁrmed with additional articulatory data, which is planned as future work. These results can be compared to those of Smith (1997) and Pirello et al. (1997) for American English. Smith (1997) studied only /z/ in a range of contexts and measured 47% devoiced and 36% partially devoiced. Pirello et al. (1997) studied /v, z/ in nonsense words (fricatives in initial and stressed position) and measured, respectively, 5%, 20% devoiced, and 35%, 40% partially devoiced. The comparable ﬁgures for initial stressed /v, z/ in Corpora 3 and 4 are 32%, 55% devoiced and 16%, 23% partially devoiced. We must of course be cautious, because we have only four subjects and they may have been inﬂuenced by the two-to-three years spent in England. But subject to these reservations, it appears that there is more devoicing in European Portuguese than in American English.

6

Evaluation of the Automatic Devoicing Criterion

The ratio of variances (rσ2 ≥ 15), described in Sect. 4.2, was used as the criterion for devoicing for the Corpus 3 and 4 fricatives of Speaker LMTJ. Results are as shown in Table 1. There are some examples which are classiﬁed diﬀerently from the manual criterion (see Sect. 4.1). Still, the percentage of examples from Corpus 3 which were classiﬁed in the same category using the two methods is quite high: /v/ – 86.1% (31 out of 36), /z/ – 93.3% (28 out of 30), and /Z/ – 83.3% (25 out of 30). The percentage of “correctly classiﬁed” examples from Corpus 4 was: 79% (27 out of 34) of /v/; 77% (17 out of 22) of /z/; and 64% (14 out of 22) of /Z/. Overall, 83% of voiced fricatives are classiﬁed correctly. For Corpus 3 most of the discrepancies result from cases on the partially devoiced/completely devoiced borderline, giving promise that this automatic measure can be reliably used in the future. Some examples present a few peaks in the laryngograph waveform, which contribute to a larger variance than initially expected. Whether these peaks are included in the fricative or in the adjacent Table 1. Inventory of all cases of complete devoicing (using the automatic criterion). Values given are in the form x/y, where x = number of devoiced examples, and y = total number of examples. Corpus 3 (top) and Corpus 4 (bottom). Speaker LMTJ

/v/ /z/ /Z / All Fric. /v/ /z/ /Z / All Fric.

Word-Initial 9/14 (64.3%) 6/10 (60%) 5/10 (50%) 20/34 (58.8%) 3/14 (21.4%) 3/8 (37.5%) 4/8 (50%) 10/30 (33.3%)

Word-Medial 8/13 (61.5%) 15/17 (88.2%) 13/15 (86.7%) 36/45 (80%) 7/18 (38.9%) 8/10 (80%) 4/9 (44.4%) 19/37 (51.4%)

Word-Final 8/9 (88.9%) 3/3 (100%) 4/5 (80%) 15/17 (88.2%) 0 3/4 (75%) 3/5 (60%) 6/11 (54.6%)

All Pos. 25/36 (69.4%) 24/30 (80%) 22/30 (73.3%) 71/96 (74%) 10/34 (29.4%) 14/22 (63.6%) 11/22 (50%) 35/78 (44.9%)

Devoicing Measures of European Portuguese Fricatives

7

vowels depends on the criteria used for segmentation. There are also some examples in Corpus 3 manually classiﬁed as voiced but with a ratio of variances greater than 15. Although there is voicing throughout the whole frication interval, the amplitude of the laryngograph signal during the fricative is much lower than during the VF transition. Most examples present a signiﬁcant amplitude reduction of the laryngograph signal for the duration of the voiced fricative. A total of 39 examples from Corpus 4 were misclassiﬁed as voiced because there was no VF transition or there was devoicing during the VF transition (51% – 20 out of 39), and because there were a few cycles of the laryngograph during the production of the fricative (41% – 16 out of 39). The remainder of misclassiﬁed examples (8% – 3 out of 39) resulted from a dc drift in the laryngograph signal for the duration of the fricative. The rσ2 metric was also successful when used for the unvoiced fricatives /f, s, S/ of Corpus 3 and 4, with an overall 83% correctly classiﬁed. The percentage of “correctly classiﬁed” unvoiced examples of Corpus 3 was: /f/ – 96.2% (25 out of 26); /s/ – 85.2% (23 out of 27); and /S/ – 93.8% (30 out of 32). For Corpus 4 the results were: 81% (13 out of 16) of /f/; 55% (16 out of 29) of /s/; 85% (17 out of 20) of /S/.

7

Conclusions

Devoicing rate is generally very high for European Portuguese fricatives, especially when compared with studies of other languages, and devoicing occurs more often in word-ﬁnal than word-initial position. A possible explanation for such high percentages of devoicing could be that, due to the structure of the language and its vocabulary, Portuguese speakers are very seldom faced with confusions between voiced and devoiced examples. It is thought that this is an important characteristic of European Portuguese, which would have to be incorporated in any production model to obtain more natural-sounding synthetic speech. Factors that might be correlated with devoicing were investigated using Corpora 3 and 4 for two of the subjects, LMTJ and ACC. We note that the speakers all produced a large number of repeated tokens of nonsense words in one breath (more than 12 tokens). This high rate of speech (compared with previous recordings of similar corpora by French, American English and German speakers) could be one of the reasons why there are so many devoiced examples. A preliminary evaluation of the automatic criterion for devoicing showed great potential for the use of this technique in future work. The percentage of voiced fricative tokens from Corpus 3 and 4 which were classiﬁed in the same category using the two methods (manual and automatic) is quite high (overall 83%; range 64–93%).

Acknowledgements. This work was partially supported by Funda¸c˜ao para a Ciˆencia e a Tecnologia, Portugal.

8

L.M.T. Jesus and C.H. Shadle

References [1] Abercrombie, D. (1967). Elements of General Phonetics. Edinburgh: Edinburgh University Press. [2] Docherty, G.J. (1992). The Timing of Voicing in British English Obstruents. Berlin: Foris Publications. [3] Haggard, M. (1978). The devoicing of voiced fricatives. Journal of Phonetics 6, 95–102. [4] Jesus, L.M.T. (2001). Acoustic Phonetics of European Portuguese Fricative Consonants. Ph.D Thesis, Department of Electronics and Computer Science, University of Southampton, Southampton, UK. [5] Jesus, L.M.T. and C.H. Shadle (2002). A parametric study of the spectral characteristics of European Portuguese fricatives. Journal of Phonetics 30 (3), 437–464. [6] Pirello, K., S.E. Blumstein, and K. Kurowski (1997). The characteristics of voicing in syllable-initial fricatives in American English. Journal of the Acoustical Society of America 101 (6), 3754–3765. [7] Smith, C.L. (1997). The devoicing of /z/ in American English: Eﬀects of local and prosodic context. Journal of Phonetics 25 (4), 471–500. [8] Stevens, K.N. (1987). Interaction between acoustic sources and vocal-tract conﬁgurations for consonants. In Proceedings of the 11th International Congress of Phonetic Sciences (ICPhS 87), Volume 3, Tallinn, Estonia, USSR, pp. 385–389.

AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language Hugo Meinedo, Diamantino Caseiro, Jo˜ ao Neto, and Isabel Trancoso L2 F – Spoken Language Systems Lab INESC-ID / IST, Rua Alves Redol, 9, 1000-029 Lisboa, Portugal {Hugo.Meinedo,Diamantino.Caseiro,Joao.Neto,Isabel.Trancoso}@l2.inesc-id.pt http://l2f.inesc-id.pt

Abstract. Many applications such as media monitoring are experiencing a large expansion as a consequence of the diﬀerent emerging media sources and can beneﬁt dramatically by using automatic transcription of audio data. In this paper, we describe the development of a speech recognition engine, AUDIMUS.MEDIA used in the Broadcast News domain. Additionally we describe recent improvements that permitted a relative recognition error decrease of more than 20% and a 4x speed-up.

1

Introduction

The development of speech recognition systems associated to Broadcast News (BN) tasks open the possibility for novel applications where the use of automatic transcriptions is a major attribute. We have been developing a system for selective dissemination of multimedia information in the scope of an IST-HLT European programme project. To accomplish that goal we have been working in the development of a broadcast news speech recognition system associated with automatic topic detection algorithms [1]. The idea was to build a system capable of identifying speciﬁc information in multimedia data consisting of audio-visual streams, using continuous speech recognition, audio segmentation and topic detection techniques. The automatic transcriptions produced by our speech recognition system are used for a topic detection module that outputs a set of topics to be sent to end users. This paper describes in detail our BN speech recognition system engine, AUDIMUS.MEDIA. Section 2 introduces the BN corpus used for training the system. Section 3 describes the acoustic models. In Sects. 4 and 5 we present the vocabulary and lexicon building and the language model. The decoder algorithm is described in Sect. 6, followed by our latest BN recognition results in Sect. 7. Finally some concluding remarks are presented in Sect. 8. N.J. Mamede et al. (Eds.): PROPOR 2003, LNAI 2721, pp. 9–17, 2003. c Springer-Verlag Berlin Heidelberg 2003

10

2

H. Meinedo et al.

BN Corpus

To support the research and developments associated with this task it was necessary to collect a representative Portuguese BN corpus in terms of amount, characteristics and diversity. We started by deﬁning the type of programs to monitor, in close cooperation with RTP the Portuguese public broadcast company, being selected as primary goals all the news programs, national and regional, from morning to late evening, including both normal broadcasts and speciﬁc ones dedicated to sports and ﬁnancial news. Given its broader scope and larger audience, the 8 PM news program was selected as the prime target. This corpus serves two main tasks: the development of a BN speech recognition system and a system for topic segmentation and indexing. In that sense we divided our corpus in two parts: the Speech Recognition Corpus and the Topic Detection Corpus. Since each of these parts serves diﬀerent purposes it will also have diﬀerent features. Prior to the collection of these corpora we started with a relative small Pilot Corpus of approximately 5 hours, including both audio and video, which was used to setup the collection process, and discuss the most appropriate kind of programs to collect. The Speech Recognition Corpus was collected next, including 122 programs of diﬀerent types and schedules and amounting to 76h of audio data. The main goal of this corpus was the training of the acoustic models and the adaptation of the language models used in the large vocabulary speech recognition component of our system. The last part of our collection eﬀort was the Topic Detection Corpus, containing around 300 hours of audio data related to 133 TV broadcast of the 8 PM news program. The purpose of this corpus was to have a broader coverage of topics and associated topic classiﬁcation for training our topic indexation module. RTP as data provider was responsible to collect the data in their installations. The transcription process was jointly done by RTP and our institution, and made through the Transcriber tool following the LDC Hub4 (Broadcast Speech) transcription conventions. Most of the audio data was ﬁrst automatically transcribed. The orthographic transcriptions of the Pilot Corpus and the Speech Recognition Corpus were manually veriﬁed. For the Topic Detection Corpus, we have only the automatic transcriptions and the manual segmentation and indexation of the stories made by RTP staﬀ in charge of their daily program indexing. Our institution was responsible for the validation process, training the annotators and packaging the data. 2.1

Pilot Corpus

The Pilot Corpus was collected in April 2000 and served as a test bed for the capture and transcription processes. We selected one of each type of program resulting in a total of 11 programs and a duration of 5h 33m. After removing the jingles and commercial breaks we ended up with a net duration of 4h 47m. The corpus was manually transcribed at RTP. The corpus includes the MPEG-1 ﬁles (.mpeg) where the audio stream was recorded at 44.1 kHz at 16 bits/sample, a separated audio ﬁle (.wav) extracted from the MPEG-1 data, a transcription ﬁle (.trs) resulting from the manual annotation process in the Transcriber tool, and a

AUDIMUS.media: A Broadcast News Speech Recognition System

11

Table 1. Pilot Corpus programs Program Not´ıcias Jornal da Tarde Pa´ıs Regi˜ oes Pa´ıs Regi˜ oes Lisboa Telejornal Remate 24 Horas RTP Economia Acontece Jornal 2 Grande Entrevista Total

Duration 0:08:02 0:57:36 0:16:05 0:24:21 0:45:31 0:07:30 0:24:23 0:09:43 0:20:39 0:49:34 1:09:38 5:33:00

Type Morning news Lunch time news Afternoon news Local news Evening news Daily sports news Late night news Financial news Cultural news Evening news Political / Economic interview (weekly)

.xls (Excel format) ﬁle including the division into stories and their corresponding summary that results from the daily process of indexation at RTP. The ﬁnal contents of the Pilot Corpus in terms of programs, duration and type is presented in Table 1. 2.2

Speech Recognition Corpus

The Speech Recognition Corpus was collected from October 2000 to January 2001, with small changes in the programs when compared with the Pilot Corpus. This corpus was divided in three sets: training, development and evaluation. A complete schedule for the recordings was elaborated leaving some time intervals between the diﬀerent sets. We got 122 programs in a total of 75h 43m. We adopted the same base conﬁguration as for the Pilot Corpus, except that we did not collected the video stream. Also the audio was recorded at 32 kHz, due to restrictions of the hardware, and later downsampled to 16 kHz which was appropriate to the intended processing. This corpus was automatically transcribed and manually veriﬁed. As a result it includes only a audio stream ﬁle (.wav) and a transcription ﬁle (.trs). The ﬁnal contents of the Speech Recognition Corpus in terms of programs, duration and type is presented in Table 2.

3

AUDIMUS.MEDIA Recognition System

AUDIMUS.MEDIA is a hybrid speech recognition system that combines the temporal modeling capabilities of Hidden Markov Models (HMMs) with the pattern discriminative classiﬁcation capabilities of multilayer perceptrons (MLPs). In this hybrid HMM/MLP system a Markov process is used to model the basic temporal nature of the speech signal. The MLP is used as the acoustic model estimating context-independent posterior phone probabilities given the acoustic data at each frame. The acoustic modeling of AUDIMUS.MEDIA combines phone

12

H. Meinedo et al. Table 2. Speech Recognition Corpus programs Training

Programs

Not´ıcias Jornal da Tarde Pa´ıs Regi˜ oes Pa´ıs Regi˜ oes Lx Telejornal 24 Horas RTP Economia Acontece Jornal 2 Total

Number

7 8 12 7 30 4 13 9 7 97

Development

Duration Number

0:43:52 7:55:46 6:43:47 2:16:47 32:41:34 1:18:53 1:53:02 3:05:38 4:53:39 61:32:58

1 1 1 1 3 2 2 1 1 13

Evaluation

Duration Number

0:10:38 1:13:10 0:32:49 0:20:32 3:37:13 1:02:43 0:10:38 0:19:47 0:46:04 8:13:34

1 1 1 1 2 2 2 1 1 12

Duration

0:10:41 1:02:59 0:33:46 0:20:16 1:54:18 0:38:35 0:19:58 0:17:50 0:38:26 5:56:49

Type

Morning news Lunch time news Afternoon news Local news Evening news Late night news Financial news Cultural news Evening news

probabilities generated by several MLPs trained on distinct feature sets resulting from diﬀerent feature extraction processes. These probabilities are taken at the output of each MLP classiﬁer and combined using an average in the logprobability domain [2]. All MLPs use the same phone set constituted by 38 phones for the Portuguese language plus silence and breath noises. The combination algorithm merges together the probabilities associated to the same phone. We are using three diﬀerent feature extraction methods and MLPs with the same basic structure, that is, an input layer with 9 context frames, a non-linear hidden layer with over 1000 sigmoidal units and 40 softmax outputs. The feature extraction methods are PLP, Log-RASTA and MSG.

4

Language Modeling

During the few last years we have been collecting Portuguese newspapers from the web, which allowed us to build a considerably large text corpus. Until the end of 2001 we have texts amounting to a total of 24.0M sentences with 434.4M words. A language model generated only from newspaper texts becomes too much adapted to the type of language used in those texts. When this language model is used in a continuous speech recognition system applied to a Broadcast News task it will not perform as well as one would expect because the sentences spoken in Broadcast news do not match the style of the sentences written in the newspaper. A language model from Broadcast News transcriptions would probably be more adequate for this kind of speech recognition task. The problem is that we do not have enough BN transcriptions to generate a satisfactory language model. However we can adapt a newspaper text language model to the BN task by combining it with a model created from BN transcriptions using linear interpolation, and thus improve performance. One of the models is always generated from the newspaper text corpus while the other is a backoﬀ trigram model using absolute discounting and based on the training set transcriptions of our BN database. The optimal weights used in

AUDIMUS.media: A Broadcast News Speech Recognition System

13

the interpolation are computed using the transcriptions from the development set of our BN database. The ﬁnal interpolated model has a perplexity of 139.5 and the newspapers model has 148.0. It is clear that even using a very small model based in BN transcriptions we can obtain some improvement in the perplexity of the interpolated model.

5

Vocabulary and Pronunciation Lexicon

From the text corpus with 335 million words created from all the newspaper editions collected until the end of 2000, we extracted 427k diﬀerent words. About 100k of these words occur more than 50 times in the text corpus. Using this smaller set, all the words were classiﬁed according to syntactic classes. Diﬀerent weights were given to each class and a subset with 56k words was created based on the weighted frequencies of occurrence of the words. To this set we added basically all the new words present in the transcripts of the training data of our Broadcast News database then being developed, giving a total of 57,564 words. Currently the transcripts contain 12,812 diﬀerent words from a total of 142,547. From the vocabulary we were able to build the pronunciation lexicon. To obtain the pronunciations we used diﬀerent lexica available in our institution. For the words not present in those lexica (mostly proper names, foreign names and some verbal forms) we used an automatic grapheme-phone system to generate corresponding pronunciations. Our ﬁnal lexicon has a total of 65,895 diﬀerent pronunciations. For the development test set corpus which has 5,426 diﬀerent word in a total of 32,319 words, the number of out of vocabulary words (OOVs) using the 57k word vocabulary was 444 words representing a OOV word rate of 1.4%.

6

Weighted Finite-State Dynamic Decoder

The decoder underlying the AUDIMUS.MEDIA system is based the weighted ﬁnitestate transducer (WFST) approach to large vocabulary speech recognition [3]. In this approach, the search space used by the decoder is a large WFST that maps observation distributions to words. This WFST consists of the composition of various transducers representing components such as: the acoustic model topology H; context dependency C; the lexicon L and the language model G. The search space is thus H ◦ C ◦ L ◦ G1 , and is traditionally compiled outside of the decoder, which uses it statically. Our approach diﬀers in that our decoder is dynamic and builds the search space ”on-the-ﬂy” as required by the particular utterance being decoded [4]. Among the advantages provided by the dynamic construction of the search space are: a better scalability of the technique to large language models; reduction of 1

We use the matrix notation for composition.

14

H. Meinedo et al.

the memory required in runtime; and easier adaptation of the components in runtime. The key to the dynamic construction of the search space is our WFST composition algorithm [5] specially tailored for the integration of the lexicon with the language model (represented as WFSTs). Our algorithm performs simultaneous the composition and determinization of the lexicon and the language model while also approximating other optimizing operations such as weight pushing and minimization [6]. The goal of the determinization operation is to reduce lexical ambiguity in a more general way than what is achieved with the use of a tree-organized lexicon. Weight pushing allows the early use of language model information, which allows the use of tighter beams thus improving performance. Minimization essentially reduces the memory required for search while also giving a small speed improvement. 6.1

Alignment

In order to allow the registration of time boundaries the decoder allows the use of a special label EOW in the input side of the search space transducer. Whenever that label is crossed while searching, the decoder records the time of the crossing. That label is thus used to mark word or phone boundaries. Using the EOW labels the decoder can be used for alignment, in alignment mode the search space is usually build as H ◦ L ◦ S where S is the orthographic transcription of the utterance being recognized. The fact that the decoder imposes no a priori restrictions on the search space structure gives us great ﬂexibility, for example, alternative pronunciation rules can be used by compiling then in a ﬁnite-state transducer R, and building the search space as H ◦ R ◦ L ◦ S[7]. The EOW label is also given other uses in the decoder, for example, it can be used as an aid in the construction of word lattices wherein the labels mark the end of segments corresponding to arcs in the lattice. One other use of the label is to collect word-level conﬁdence features that can be used to compute conﬁdence scores. 6.2

Pruning

Pruning is fundamental to control the search process in large vocabulary recognition. Our decoder uses 3 forms of pruning: beam pruning; histogram pruning; and phone deactivation pruning. Each form of pruning deals with a diﬀerent aspect of the search process. Beam pruning is probably the most important, and consists of pruning the hypotheses with a score worse than a given amount (the beam) from the best one among the hypotheses ending at a particular frame. This form of pruning is used by most large vocabulary decoders. Our form of beam pruning diﬀers from most in its eagerness, which allows the pruning of hypotheses while they are being generated, by using the cost of the best hypothesis so far as a reference. When an hypothesis with cost ct at time t is

AUDIMUS.media: A Broadcast News Speech Recognition System

15

propagated though an edge with input label d and weight w, its cost in the next frame is updated with two components: a transition weight w that incorporates linguistic constraints; and an acoustic weight distr(d, t + 1) obtained from the speech signal. Because the transition weight is often of the same order of magnitude as the beam, we obtain signiﬁcant improvements by performing the pruning test twice: ﬁrst the cost ct + w is tested and then ct + w + distr(d, t + 1). If the ﬁrst test fails, we avoid the expensive computation of both distr(d, t + 1) and the bookkeeping associated with the expansion of the hypothesis. The function of histogram pruning is to reduce peak resource usage, time and memory, to a reasonable limit. It consists of establishing the maximum number m of hypotheses that are expanded at each frame. Whenever their number is over the limit, only the m best are kept and the other are pruned. If the value of m is set to a reasonable value (such as 100000) then it was virtually no negative eﬀect on the accuracy of the decoder while preventing it from staling when there is a severe acoustic mismatch relative to the training conditions. Phone deactivation pruning [8] takes advantage of the fact that the MLP directly estimates the posterior probability of each phone. This form of pruning consists of ﬂooring the posterior probability of a given phone to a very low value when it is below a given threshold. This has the eﬀect of allowing the MLP to deactivate some unlikely phones. There is usually an optimal value for the threshold, if too large then the search will be faster but more error prone, if it too low, then the search will be slower with no advantage regarding the accuracy (sometimes the accuracy is even worse due to the MLP having diﬃculty modeling low probabilities).

7

Speech Recognition Results

Speech recognition results were conducted in the development test set which has over 6 hours of BN data. The experiments were conducted in a Pentium III 1GHz computer running Linux with 1Gb RAM. Table 3 summarizes the word error rate (% WER) evaluation obtained by AUDIMUS.MEDIA. The lines in Table 3 show the increase in performance by each successive improvement to the recognition system. The ﬁrst column of results refers to the F0 focus condition where the sentences contain only prepared speech with low background noise and good quality audio. The second results column refers to the WER obtained in all test sentences, including noise, music, spontaneous speech, telephone speech, non-native accents and also including the F0 focus condition sentences. The ﬁrst line of results in Table 3 were obtained using MLPs with 1000 hidden units and a stack decoder. Compared with the second line of results we see that there was a signiﬁcant increase in performance obtained when we switched to the new WFST dynamic decoder, especially in decoding time, expressed in the last column as real-time speed. The third line of results shows the improvement obtained by substituting the determinized lexicon transducer by one that was

16

H. Meinedo et al. Table 3. BN speech recognition evaluation using the development test set MLPs 1000 1000 1000 4000 4000 4000

Decoder stack WFST + min det L “ + eager pruning + shorter HMMs

% WER F0 All 18.3 33.6 18.8 31.6 18.0 30.7 16.9 29.1 16.7 28.9 14.8 26.5

xRT 30.0 4.8 4.3 3.7 3.9 7.6

also minimized. The fourth line shows 8% relative improvements obtained from increasing the hidden layers to 4000 units. This increase was necessary because the acoustic models MLPs were no longer coping with all the variability present in the Speech Recognition and Pilot corpus that were used as training data. The ﬁfth line shows the positive eﬀect of the eager pruning mechanism described in Sect. 6.2. Our current system, shown in line six, achieves another 8% relative improvement by decreasing the minimum duration of phone models by one frame.

8

Concluding Remarks

Broadcast News speech recognition is a very diﬃcult and resource demanding task. Our recognition engine evolved substantially through the accumulation of relatively small improvements. We are still far from perfect recognition, the ultimate goal, nevertheless our current technology is able to drive a number of very useful applications, including audio archive indexing and topic retrieval. In this paper we have described a number of improvements that permitted a relative recognition error decrease of more than 20% and speed-up from 30x real-time to as little as 7.6 xRT.

References 1. Amaral, R., Langlois, T., Meinedo, H., Neto, J., Souto, N., Trancoso, I.: The development of a portuguese version of a media watch system. In: Proceedings EUROSPEECH 2001, Aalborg, Denmark (2001) 2. Meinedo, H., Neto, J.: Combination of acoustic models in continuous speech recognition. In: Proceedings ICSLP 2000, Beijing, China (2000) 3. Mohri, M., Pereira, F., Riley, M.: Weighted ﬁnite-state transducers in speech recognition. In: ASR 2000 Workshop. (2000) 4. Caseiro, D., Trancoso, I.: Using dynamic wfst composition for recognizing broadcast news. In: Proc. ICSLP ’2002, Denver, Colorado, USA (2002) 5. Caseiro, D., Trancoso, I.: On integrating the lexicon with the language model. In: Proc. Eurospeech ’2001, Aalborg, Denmark (2001) 6. Caseiro, D., Trancoso, I.: Transducer composition for “on-the-ﬂy” lexicon and language model integration. In: Proc. ICASSP ’2003, Hong Kong, China (2003)

AUDIMUS.media: A Broadcast News Speech Recognition System

17

7. Caseiro, D., Silva, F.M., Trancoso, I., Viana, C.: Automatic alignment of map task dialogs using wfsts. In: Proc. PMLA, ISCA Tutorial and Research Workshop on Pronunciation Modelling and Lexicon Adaptation, Aspen, Colorado, USA (2002) 8. Renals, S., Hochberg, M.: Eﬃcient search using posterior phone probability estimates. In: Proc. ICASSP ’95, Detroit, MI (1995) 596–599

Pitch Restoration for Robust Speech Recognition Carlos Lima, Adriano Tavares, and Carlos Silva Department of Industrial Electronics of University of Minho, Campus de Azurém 4800-058 Guimarães, Portugal {carlos.lima,adriano.tavares,carlos.silva}@dei.uminho.pt

Abstract. This article suggests a based spectral normalization method, which purpose is to alleviate both the changing on peaks structure and on the flat zones of the speech spectrum, caused by additive noise. The proposed spectral normalization can be viewed as a noise estimate done in a frame by frame basis by assuming the clean database as lightly corrupted. This noise estimate is then used to restore both the peaked and the flat spectral zones of the speech spectrum. This algorithm was implemented over a baseline spectral normalisation method, which purpose is to emphasize the robustness in the regions of less energy of the speech spectrum, since in these regions the noise is more dominant.

1 Introduction In [1] it is argued that a proper spectral normalization, which concentrates essentially on the speech regions of less energy, could improve significantly the robustness of speech recognition systems when operating under additive noise conditions. The spectral regions with small energy would need a large degree of noise robustness since, assuming that the noise is speech independent, they are more corrupted. The spectral regions of small energies usually correspond to unvoiced sounds regions, which are spectrally not very well defined. Roughly speaking nearly half of the consonants can be classified as unvoiced, while the other half and the vowels are generally classified as voiced. Generally the importance of the vowels in classification and representation of written text is very low; however, most practical automatic speech recognition systems rely heavily on vowel recognition to achieve high performance, forgetting the speech regions of small energy, which perhaps contains the most important degraded information regarding to speech recognition tasks. Others authors [2] have also given an increasing importance to the spectral regions of small energy of the speech spectrum, although by using alternative approaches. The algorithm proposed in [1] can be improved by considering the properties of both the voiced and unvoiced speech regions. The voiced speech regions are characterized by peaked spectral zones, which are flattening, while the unvoiced speech regions are raised as the noise becomes more and more dominant. This join effect is perhaps the main cause of performance degradation under additive noise conditions. N.J. Mamede et al. (Eds.): PROPOR 2003, LNAI 2721, pp. 18–22, 2003. © Springer-Verlag Berlin Heidelberg 2003

Pitch Restoration for Robust Speech Recognition

19

The algorithm proposed in this paper tries to restore partially both the original spectral peaks and the flat spectral zones of a normalized speech spectrum. This approach assumes the clean database lightly contaminated and the noise power is estimated in a frame-by-frame basis by the lowest power of all the sub-bands in each segment. The speech spectrum is normalized (by subtraction) by the smallest component, which is proportional to the noise level and contains minor speech dependence. The algorithm does not assume noise existence, in the sense that the features are extracted exactly in the same way in both noisy and noise free conditions.

2 Noise Effect on the Baseline Spectral Normalization Domain The main goal of a robust features extraction method is providing robustness against noise or other sources of variability by ignoring its presence. Although the noise can be compensated, the effectiveness of this approach becomes very dependent on the accuracy of the noise estimate, which is a very hard task in practical situations. Hence our main goal was searching for a compensation process independent of the noise level or characteristics, although the proposed baseline normalisation assumes a wide band additive noise for maximal performance. More details can be found in [1]. For task uniformity in both clean and noisy conditions the clean database must be considered lightly contaminated. Trying to clean the database, which can be viewed as another kind of normalisation, represents a procedure compatible with the noise compensation paradigm, however the noise compensation is behind the spectral normalization procedure. We propose to estimate the second normalisation factor (the first normalisation factor is behind the baseline normalisation procedure [1]) by taking the value of the smallest component of the power spectrum density in each speech frame. The noise effect can then be alleviated by taking into account that the peaked spectral regions are flattened and the flat spectral regions are raised by the noise effect, as can be seen in [1]. This type of procedure presumes an efficient peak detector, which must be able to distinguish peaks of voiced nature (pitch) from weak peaks occurring in the speech regions of low energy, where the baseline system [1] is efficient concerned to the attenuation of the additive noise effect. This peak classification suggests the use of thresholds, where the key question is how to calculate the threshold level? Based only in practical considerations especially in the inspection of the selected peaks we concluded that roughly speaking a peak which energy is above at least three times the mean of the rest of components in the frame must be classified as a true peak. Otherwise the selected peak must be ignored in order to preserve the benefits of the baseline normalisation [1] on low energy segments.

3 Proposed Noise Compensation To cope simultaneously with the noise effect on the peaked and on the flat spectral regions we must consider two types of compensation procedures, for the peaked and

20

C. Lima, A. Tavares, and C. Silva

for the flat spectral regions. The flat spectral regions are raised by the noise effect, so we suggest subtracting the smallest component to each other component of the observed vector, so we are implicitly considering wide band noise. The procedure must be improved in the future to account for narrow band noise. To account for the second type of normalisation maintaining however compatibility between the two types of normalisation, the features extraction described in [1] must be changed so that

 S i − min{S i } , S i ≠ min{S i }  S ci =  . S i  , otherwise  S

(1)

where Si and S denotes respectively the power in sub-band i and the power of the considered speech segment. The noise compensation in the peaked spectral regions is made by increasing the speech coefficient that was decreased (flattened) by the noise effect. Assuming clean speech (not lightly contaminated speech) the speech features are related [1] by B

∑c

i

= 1.

(2)

i =1

where B is the number of sub-bands. For a speech frame where n peaks are detected, these peaks must be increased by a noise dependent factor. Assuming that each spectral sub-band was decreased proportionally to its value, which seems to be true by analysing figure 1 in [1], the noise compensation can be made by computing cj as

    ( B − n)  (S j )1 + n min{Si } Sj   ∑ j =1   cj = S

(3)

Therefore, the energy subtracted in the flat spectral regions is restored in the peaked zone in order to invert the additive noise effect whereas the sum of all the speech features for each frame is maintained unitary as supposed by the baseline spectral normalisation.

Pitch Restoration for Robust Speech Recognition

21

4 Experimental Results The proposed algorithm was tested in an Isolated Word Recognition system using Continuous Density Hidden Markov models. The database of isolated words used for training and testing is from AT&T Bell and is more accurately described in [1]. The noise has white noise characteristics, is speech independent and computationally generated at various SNR as shown in Table 1. The goal is to compare the performance of the method proposed in this article with the performance of the baseline spectral normalization method when used alone. The performance of the baseline spectral normalization when compared with contemporary speech robust features can be found in [1]. In Table 1 Norm. stands for the baseline spectral normalisation algorithm proposed in [1], PR stands for the pitch restoration algorithm proposed in this paper and N.+MMC stands for the conventional Markov model composition technique over the baseline spectral normalisation. Table 1. Performance of the pitch restoration algorithm under a test set of 400 digits SNR (dB)

15

10

5

0

-5

Norm.

98.5

97.75

93.75

88

42.5

PR

99.25

98.25

95

89.75

61.5

N.+ MMC

99.5

98.75

97.25

92.25

84.75

Table 1 shows that the pitch restoration procedure is really effective in situations where the noise is unknown. However, as expected, if the noise is known, which is not frequently the case in practical situations, the baseline spectral normalization overtake the suggested pitch restoration method suggested in this article.

5

Discussion

The main advantage of this multi-normalisation process is the recognition performance obtained when no knowledge of the noise statistics exists. As a robust extraction features, the baseline method [1] seems to be superior to the most used nowadays, which performance can be increased by the method of restoration of the peaks structure of the speech spectrum suggested in this paper. If isolated noise samples exist, the noise can be estimated and this knowledge can be incorporated into the system, and consequently increasing the recogniser performance.

22

C. Lima, A. Tavares, and C. Silva

References 1.

2.

Lima, C., Almeida, Luís B. and Monteiro, João L.: Improving the Role of Unvoiced Speech Segments by Spectral Normalisation in Robust Speech Recognition. 7th International Conference on Spoken Language Processing (ICSLP’2002). Raj, Biksha: Reconstruction of Incomplete Spectrograms for Robust Speech Recognition. Ph. D. Thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University (2000).

Grapheme-Phone Transcription Algorithm for a Brazilian Portuguese TTS Filipe Barbosa1 , Guilherme Pinto1 , Fernando Gil Resende1 , Carlos Alexandre Gon¸calves2 , Ruth Monserrat2 , and Maria Carlota Rosa2 1 2

Escola Polit´ecnica, Universidade Federal do Rio de Janeiro, Brazil {filipe,guilherme,gil}@lps.ufrj.br Faculdade de Letras, Universidade Federal do Rio de Janeiro, Brazil [email protected] [email protected] [email protected]

Abstract. This paper describes one of the aspects of an ongoing research to improve the synthetic speech quality for a Brazilian Portuguese text-to-speech synthesis system. This paper focuses on the grapheme-phone transcription algorithm. A complete set of rules for every grapheme is presented. Experimental results, obtained running the proposed algorithm through a text database, gave rise to 98,43% of accuracy rate.

1

Introduction

A text-to-speech (TTS) synthesizer is a system which should be able to read, with some degree of intelligibility and naturalness, any text in a given language, Brazilian Portuguese (BP) in the case of this paper. As simple as it seems, this deﬁnition poses a problem from the outset: which accent to choose amongst the diﬀerent accents in BP. If the variety of Portuguese spoken in Rio de Janeiro was preferable to any other in Brazil in the beginning of the 20th century [1,2, 3], its status has been changing drastically since the last decades of the century [4,5], and now it is considered too much nasalized, and sizzingly. So, in place of a no more prestigious accent, it was decided to provide the model with a neuter accent, understood as an accent which most of the Brazilian speakers could identify. According to Ramos [4], the accent in Jornal Nacional, the television news broadcast with the largest indexes of audience in Brazil, is felt in diﬀerent degrees, by speakers of many parts in the country, as representative of their own dialects. This work describes a complete transcription algorithm intended to support a BP TTS with a neuter accent. The proposed grapheme-phone transcription algorithm was tested using a randomly selected part of the CETEN-Folha text database [6]. The transcribed phones were checked, and 98,43% were correctly transcribed in a set of 7805 phones from 1364 words. In this paper, SAMPA [7] phonetic alphabet is used with a few extensions that we propose for BP. Oﬃcial BP orthography is represented in italic fonts. N.J. Mamede et al. (Eds.): PROPOR 2003, LNAI 2721, pp. 23–30, 2003. c Springer-Verlag Berlin Heidelberg 2003

24

F. Barbosa et al.

The remaining of this paper is organized as follows. Section 2 presents the transcription rules. Section 3 discusses experimental results, and Sect. 4 focuses on our conclusions and future work.

2

The Transcription Rules

This section presents the symbols used in the transcription algorithm, followed by the rules for every grapheme. Table 1 contains the symbolization adopted for the description of grapheme rules. As some units needed for our BP transcription algorithms are not listed in the SAMPA phonetic units, we proposed the units listed in Table 2. 2.1

The Transcription Rules

The starting point for the transcription was the phone list proposed by Alcaim, Solewicz & Moraes [8], modiﬁed in Pinto, Barbosa & Resende Jr [9]. The rules for transcribing a given grapheme to the correspondent phone are shown in Tables 3–13. These tables are organized by grapheme blocks of rules. The left column has the grapheme sequences, the middle column contains the phone transcription, and the right column comes with one example for each rule. Table 1. Table of symbols used to express the transcription rules Symbol Value ... Any character (Punctuation or Grapheme) <x> A given grapheme x [y] A given acoustic unit y Hf Sp V

hifen Space between words Any vowel

V s

Stressed Vowel

V us Unstressed vowel C Any consonant

Symbol C v

Value Voiced consonants

C uv

Unvoiced consonants Except for <x>, any consonant Pnt Punctuation ( , . ! ? ; ) W bgn Beginning of word <x><[y]> When the transcription of the grapheme after <x> is [y] <x(STATE)> STATE modiﬁes x. Possibilities: V us, V s or W bgn <x1 , x2 > Graphemes x1 or x2 <x1 , x2 , (x3 , x4 )> <x1 x2 x3 > or <x1 x2 x4 >

Table 2. List of SAMPA extension units proposed for BP [fS] afta [bZ] abnegado [i m] mbi´ a [kS] krenak [gZ] magn´ıﬁco [m i] amn´esia [pS] piano [vZ] Ambev [i n] ntogapide

Grapheme-Phone Transcription Algorithm for a Brazilian Portuguese TTS Table 3. Table of rules for graphemes Grapheme Sequence for algorithm ... ... ... ... ... ... ...

... ... <m,n>... ... <´ a>... ... <˜ a,ˆ a>... ... <˜ a o>... ...

Computational Processing of the Portuguese Language, 8 conf., PROPOR 2008

Read more

Evolutionary Multi-Criterion Optimization: Second International Conference, EMO 2003, Faro, Portugal, April 8-11, 2003, Proceedings

Read more

Computational Processing of the Portuguese Language: 7th International Workshop, PROPOR 2006, Itatiaia, Brazil, May 13-17, 2006, Proceedings

Read more

Multiple Classifier Systems: 4th International Workshop, MCS 2003, Guilford, UK, June 11-13, 2003, Proceedings

Read more

Advanced Parallel Processing Technologies: 5th International Workshop, APPT 2003, Xiamen, China, September 17-19, 2003, Proceedings

Read more

Computational Methods in Sciences and Engineering 2003: Proceedings of the International Conference (Iccmse 2003)

Read more

Functional Imaging and Modeling of the Heart: Second International Workshop, FIMH 2003, Lyon, France, June 2003, Proceedings

Read more

php|architect (June 2003)

Read more

Business Process Management: International Conference, BPM 2003, Eindhoven, The Netherlands, June 26-27, 2003, Proceedings

Read more

Scientific american (June 2003)

Read more

Advances in Natural Language Processing: Third International Conference, PorTAL 2002, Faro, Portugal, June 23-26, 2002. Proceedings

Read more

Self-Stabilizing Systems: 6th International Symposium, SSS 2003, San Francisco, CA, USA, June 24-25, 2003, Proceedings

Read more

Algorithms in Bioinformatics: Third International Workshop, WABI 2003, Budapest, Hungary, September 15-20, 2003, Proceedings

Read more

Cooperative Information Agents VII: 7th International Workshop, CIA 2003, Helsinki, Finland, August 27-29, 2003, Proceedings

Read more

Information Security: 6th International Conference, ISC 2003, Bristol, UK, October 1-3, 2003, Proceedings

Read more

Experimental and Efficient Algorithms: Second International Workshop, WEA 2003, Ascona, Switzerland, May 26-28, 2003, Proceedings

Read more

Rewriting Techniques and Applications: 14th International Conference, RTA 2003, Valencia, Spain, June 9-11, 2003, Proceedings

Read more

Rewriting Techniques and Applications: 14th International Conference, RTA 2003, Valencia, Spain, June 9-11, 2003, Proceedings

Read more

Hybrid Systems: Computation and Control: 6th International Workshop, HSCC 2003 Prague, Czech Republic, April 3-5, 2003, Proceedings

Read more

Computational linguistics and intelligent text processing: 4th international conference, CICLing 2003, Mexico City, Mexico, February 16-22, 2003: proceedings

Read more

Graph Based Representations in Pattern Recognition: 4th IAPR International Workshop, GbRPR 2003, York, UK, June 30 - July 2, 2003. Proceedings

Read more

Intelligent Virtual Agents: 4th International Workshop, IVA 2003, Kloster Irsee, Germany, September 15-17, 2003, Proceedings

Read more

Information Processing in Sensor Networks: Second International Workshop, IPSN 2003, Palo Alto, CA, USA, April 22-23, 2003, Proceedings

Read more

Computational Linguistics and Intelligent Text Processing: 4th International Conference, CICLing 2003, Mexico City, Mexico, February 16-22, 2003. Proceedings

Read more

ICONIP 2003, Istanbul, Turkey, June

Read more

Developments in Language Theory: 7th International Conference, DLT 2003, Szeged, Hungary, July 7-11, 2003, Proceedings

Read more

Physics and Technology of Thin Films IWTF 2003: Proceedings of the International Workshop, Tehran, Iran 22 February - 6 March 2003

Read more

International Investment Perspectives 2003

Read more

2003)

Read more

2003)

Read more

Recommend Documents

Computational Processing of the Portuguese Language, 8 conf., PROPOR 2008

Lecture Notes in Artificial Intelligence Edited by R. Goebel, J. Siekmann, and W. Wahlster Subseries of Lecture Notes i...

Evolutionary Multi-Criterion Optimization: Second International Conference, EMO 2003, Faro, Portugal, April 8-11, 2003, Proceedings

Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis, and J. van Leeuwen 2632 3 Berlin Heidelberg New Y...

Computational Processing of the Portuguese Language: 7th International Workshop, PROPOR 2006, Itatiaia, Brazil, May 13-17, 2006, Proceedings

Lecture Notes in Artificial Intelligence Edited by J. G. Carbonell and J. Siekmann Subseries of Lecture Notes in Comput...

Multiple Classifier Systems: 4th International Workshop, MCS 2003, Guilford, UK, June 11-13, 2003, Proceedings

Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis, and J. van Leeuwen 2709 3 Berlin Heidelberg New Y...

Advanced Parallel Processing Technologies: 5th International Workshop, APPT 2003, Xiamen, China, September 17-19, 2003, Proceedings

Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis, and J. van Leeuwen 2834 3 Berlin Heidelberg New Y...

Computational Methods in Sciences and Engineering 2003: Proceedings of the International Conference (Iccmse 2003)

Proceed'ings of the METHODS in Sciences and Engineering 2003 (ICCMSE 2003) This page intentionally left blank Proc...

Functional Imaging and Modeling of the Heart: Second International Workshop, FIMH 2003, Lyon, France, June 2003, Proceedings

Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis, and J. van Leeuwen 2674 3 Berlin Heidelberg New Y...

php|architect (June 2003)

JUNE 2003 VOLUME II - ISSUE 6 The Magazine For PHP Professionals PHP Unit Testing Agile software development with PHP...

Business Process Management: International Conference, BPM 2003, Eindhoven, The Netherlands, June 26-27, 2003, Proceedings

Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis, and J. van Leeuwen 2678 3 Berlin Heidelberg New Y...

Scientific american (June 2003)

TEST TUBE BABIES AND CLONES • THE NEXT STEP FOR PHYSICS Mad Cow–Type Plague Strikes Wild Deer JUNE 2003 WWW.SCIAM.COM C...