SYNTACTIC PATTERN RECOGNITION FOR SEISMIC OIL EXPLORATION Hi) u- Yuunil mi ni> 60
■ MACHINE PERCEPTION ARTIFICIAL INTELLIGENCE ^ ^ ^ V o l u m e 46 ^ ^ 1
World Scientific
50
30
20
10
SYNTACTIC PATTERN RECOGNITION FOR SEISMIC OIL EXPLORATION
SERIES IN MACHINE PERCEPTION AND ARTIFICIAL INTELLIGENCE* Editors: H. Bunke (Univ. Bern, Switzerland) P. S. P. Wang (Northeastern Univ., USA)
Vol. 34: Advances in Handwriting Recognition (Ed. S.-W. Lee) Vol. 35: Vision Interface — Real World Applications of Computer Vision (Eds. M. Cherietand Y.-H. Yang) Vol. 36: Wavelet Theory and Its Application to Pattern Recognition (V. V. Tang, L. H. Yang, J. Liu and H. Ma) Vol. 37: Image Processing for the Food Industry (E. Fl. Davies) Vol. 38: New Approaches to Fuzzy Modeling and Control — Design and Analysis (M. Margaliot and G. Langholz) Vol. 39: Artificial Intelligence Techniques in Breast Cancer Diagnosis and Prognosis (Eds. A. Jain, A. Jain, S. Jain and L Jain) Vol. 40: Texture Analysis in Machine Vision (Ed. M. K. Pietikainen) Vol. 41: Neuro-Fuzzy Pattern Recognition (Eds. H. Bunke and A. Kandel) Vol. 42: Invariants for Pattern Recognition and Classification (Ed. M. A. Rodrigues) Vol. 43: Agent Engineering (Eds. Jiming Liu, Ning Zhong, Yuan Y. Tang and Patrick S. P. Wang) Vol. 44: Multispectral Image Processing and Pattern Recognition (Eds. J. Shen, P. S. P. Wang and T. Zhang) Vol. 45: Hidden Markov Models: Applications in Computer Vision (Eds. H. Bunke and T. Caelli) Vol. 46: Syntactic Pattern Recognition for Seismic Oil Exploration (K. Y. Huang) Vol. 47: Hybrid Methods in Pattern Recognition (Eds. H. Bunke and A. Kandel) Vol. 48: Multimodal Interface for Human-Machine Communications (Eds. P. C. Yuen, Y. Y. Tang and P. S. P. Wang) Vol. 49: Neural Networks and Systolic Array Design (Eds. D. Zhang and S. K. Pal)
*For the complete list of titles in this series, please write to the Publisher.
Series in Machine Perception and Artificial Intelligence - Vol. 46
SYNTACTIC PATTERN RECOGNITION FOR SEISMIC OIL EXPLORATION
Kou-Yuan Huang Department of Computer and Information Science National Chiao Tung University, Taiwan Hsinchu, Taiwan
V|S* World Scientific w l
NewJersey Sinqapore •»lLondon • Hong Kong New Jersey • Singapore
Published by World Scientific Publishing Co. Pte. Ltd. P O Box 128, Farrer Road, Singapore 912805 USA office: Suite IB, 1060 Main Street, River Edge, NJ 07661 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
SYNTACTIC PATTERN RECOGNITION FOR SEISMIC OIL EXPLORATION Series in Machine Perception & Artificial Intelligence Volume 46 Copyright © 2002 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-02-4600-5
Printed in Singapore by Mainland Press
In memory of my father
To my mother, my wife, Jen-Jen, my children Yuh-Shan Cathy and Harry
In memory of the late Goss Distinguished Professor King-sun Pu School of Electrical and Computer Engineering Purdue University
This page is intentionally left blank
AUTHOR'S BIOGRAPHY Kou-Yuan Huang received the B.S. in Physics and M.S. in Geophysics from National Central University, Taiwan, in 1973 and 1977, respectively, and the M.S.E.E. and Ph.D. degrees in Electrical and Computer Engineer ing from Purdue University, West Lafayette, Indiana, in 1980 and 1983, respectively. Since 1978, he was a Graduate Research Assistant at Purdue Univer sity. From 1978 to 1979 he was in Geophysics, Department of Geoscience. From 1979, he was in the School of Electrical and Computer Engineering. He joined the Laboratory for Applications of Remote Sensing (LARS) in 1979. From 1981 to 1983, he was in the Advanced Automation Research Laboratory. From September 1983 to August 1988, he was the Faculty in the Department of Computer Science, University of Houston. Now he is the Professor at the Department of Computer and Information Science at National Chiao Tung University. From August 1992 to July 1993, he was the Visiting Scholar at University of Texas at Austin for one semester and later at Princeton University. From August 1996 to July 1997, he took his sabbatical leave at Rice University and University of Houston. He widely published papers in journals: Geophysics, Geoexploration, Pattern Recognition, IEEE Transactions on Geoscience and Remote Sensing,..., etc. His major contributions are in the areas of seismic pattern recognition using image processing, statistical, syntactic, neural networks, and fuzzy logic methods.
VU
This page is intentionally left blank
PREFACE The use of pattern recognition has become more and more important in seismic oil exploration. Interpreting a large volume of seismic data is a challenging problem. Seismic reflection data in the one-shot seismogram and stacked seismogram may contain some structural information from the response of subsurface. Syntactic/structural pattern recognition techniques can recognize the structural seismic patterns and to improve seismic inter pretations. In 1-D seismic data analyses, different Ricker wavelets represent differ ent pattern classes. We can describe the Ricker wavelets into strings or sentences of symbols. Then we can parse the testing string by grammar or compute the distance between testing string and training string, and assign the testing string to its correct class. In 2-D seismic data analyses, the primary reflection from the geologic structure of gas and oil sand zones can show a seismic structure (bright spots). The bright spot pattern can be processed and represented as a tree representation. Then we can use the tree recognition system to recognize this kind of seismic pattern. In 1-D syntactic analyses, the methods include (1) the error-correcting finite-state parsing for the recognition of the 1-D string of Ricker wavelets, (2) the modified error-correcting Earley's parsing and (3) the parsing using match primitive measure for the recognition of the 1-D attributed string of the wavelets, and (4) the Levenshtein distance computation and (5) the likelihood ratio test for wavelet recognition. In the 2-D tree automata, the methods include (6) the weighted minimum-distance structure preserved error-correcting tree automaton and (7) the modified maximum-likelihood structure preserved error-correcting tree automaton for syntactic parsing of the 2-D seismic bright spot patterns. Finally we present (8) a hierarchical system to recognize seismic patterns in a seismogram.
IX
X
PREFACE
Syntactic seismic pattern recognition can be one of the milestones to ward geophysical intelligent interpretation system. The syntactic methods in this book can be applied to other fields, for example: medical diagnosis system. This book has been written for geophysicists, computer scientists and electrical engineers. I thank Kevin M. Barry of Teledyne Exploration for providing real seismic data. I am indebted to my graduate students, especially at the University of Houston — University Park (1983-1988), in many helpful discussions. At last I thank Professor C. H. Chen at the University of Mas sachusetts at Dartmouth. He encouraged me to write a paper, "Syntactic pattern recognition," in the book, Handbook of Pattern Recognition & Com puter Vision, edited by C. H. Chen, L. F. Pau, and P. S. P. Wang, World Scientific Publishing, 2nd edition, 1998/99. Then using the syntactic ap proaches to the seismic exploration data, I can finish this book. This book was partially supported by the National Science Council, Taiwan, under grant NSC-78-0408-E009-16, NSC-80-0408-E-009-17, NSC-81-0408-E-00912 and NSC-82-0408-E-009-065. Kou-Yuan Huang Hsinchu, Taiwan
CONTENTS AUTHOR'S BIOGRAPHY
vii
PREFACE
ix
1
INTRODUCTION TO SYNTACTIC PATTERN RECOGNITION 1.1. SUMMARY 1.2. INTRODUCTION 1.3. ORGANIZATION OF THIS BOOK
2
INTRODUCTION TO FORMAL LANGUAGES AND AUTOMATA
2.1. SUMMARY 2.2. LANGUAGES AND GRAMMARS
2.3. 2.4. 2.5.
2.6.
Type 0 (unrestricted) grammar Type 1 (context-sensitive) grammar Type 2 (context-free) grammar Type 3 (finite-state or regular) grammar FINITE-STATE AUTOMATON EARLEY'S PARSING FINITE-STATE GRAMMATICAL INFERENCE 2.5.1. Inference of Canonical Finite-State Grammar 2.5.2. Inference of Finite-State Grammar Based on K-Tails . . STRING DISTANCE COMPUTATION
xi
1 1 1 3 7
7 7 9 9 9 9 10 14 16 16 17 18
xii
ERROR-CORRECTING FINITE-STATE A U T O M A T O N FOR RECOGNITION OF RICKER WAVELETS 3.1. SUMMARY 3.2. INTRODUCTION 3.3. SYNTACTIC PATTERN RECOGNITION 3.3.1. Training and Testing Ricker Wavelets 3.3.2. Location of Waveforms and Pattern Representation 3.4. EXPANDED GRAMMARS 3.4.1. General Expanded Finite-State Grammar 3.4.2. Restricted Expanded Finite-State Grammar 3.5. MINIMUM-DISTANCE ERROR-CORRECTING FINITE-STATE PARSING 3.6. CLASSIFICATION OF RICKER WAVELETS 3.7. DISCUSSION AND CONCLUSIONS
CONTENTS
3
ATTRIBUTED GRAMMAR A N D ERROR-CORRECTING EARLEY'S PARSING 4.1. SUMMARY 4.2. INTRODUCTION 4.3. ATTRIBUTED PRIMITIVES AND STRING 4.4. DEFINITION OF ERROR TRANSFORMATIONS FOR ATTRIBUTED STRINGS 4.5. INFERENCE OF ATTRIBUTED GRAMMAR 4.6. MINIMUM-DISTANCE ERROR-CORRECTING EARLEY'S PARSING FOR ATTRIBUTED STRING 4.7. EXPERIMENT
. .
21 21 21 22 22 25 25 25 28 31 32 37
4
A T T R I B U T E D G R A M M A R A N D MATCH PRIMITIVE M E A S U R E (MPM) FOR RECOGNITION OF SEISMIC WAVELETS 5.1. SUMMARY 5.2. SIMILARITY MEASURE OF ATTRIBUTED STRING MATCHING 5.3. INFERENCE OF ATTRIBUTED GRAMMAR 5.4. TOP-DOWN PARSING USING MPM
39 39 39 41 41 42 45 47
5
51 51 51 55 56
CONTENTS
xiii
5.5. EXPERIMENTS OF SEISMIC PATTERN RECOGNITION 5.5.1. Recognition of Seismic Ricker Wavelets 5.5.2. Recognition of Wavelets in Real Seismogram 5.6. CONCLUSIONS
58 58 60 64
6
S T R I N G DISTANCE A N D LIKELIHOOD RATIO TEST FOR D E T E C T I O N OF C A N D I D A T E B R I G H T SPOT 6.1. SUMMARY 6.2. INTRODUCTION 6.3. OPTIMAL QUANTIZATION ENCODING 6.4. LIKELIHOOD RATIO TEST (LRT) 6.5. LEVENSHTEIN DISTANCE AND ERROR PROBABILITY 6.6. EXPERIMENT AT MISSISSIPPI CANYON 6.6.1. Likelihood Ratio Test (LRT) 6.6.2. Threshold for Global Detection 6.6.3. Threshold for the Detection of Candidate Bright Spot 6.7. EXPERIMENT AT HIGH ISLAND T R E E G R A M M A R A N D A U T O M A T O N FOR SEISMIC PATTERN R E C O G N I T I O N 7.1. SUMMARY 7.2. INTRODUCTION 7.3. TREE GRAMMAR AND LANGUAGE 7.4. TREE AUTOMATON 7.5. TREE REPRESENTATIONS OF PATTERNS 7.6. INFERENCE OF EXPANSIVE TREE GRAMMAR 7.7. WEIGHTED MINIMUM-DISTANCE SPECTA 7.8. MODIFIED MAXIMUM-LIKELIHOOD SPECTA 7.9. MINIMUM DISTANCE GECTA 7.10. EXPERIMENTS ON INPUT TESTING SEISMOGRAMS 7.11. DISCUSSION AND CONCLUSIONS
65 65 65 66 67 68 69 72 72 72 73
7
75 75 75 77 78 84 85 86 92 94 . . 95 102
xiv
A HIERARCHICAL R E C O G N I T I O N SYSTEM OF SEISMIC PATTERNS A N D F U T U R E S T U D Y 8.1. SUMMARY 8.2. INTRODUCTION 8.3. SYNTACTIC PATTERN RECOGNITION 8.3.1. Linking Processing and Segmentation 8.3.2. Primitive Recognition 8.3.3. Training Patterns 8.3.4. Grammatical Inference 8.3.5. Finite-state Error Correcting Parsing 8.4. COMMON-SOURCE SIMULATED SEISMOGRAM RESULTS 8.5. STACKED SIMULATED SEISMOGRAM RESULTS 8.6. CONCLUSIONS 8.7. FUTURE STUDY
CONTENTS
8
103 103 103 107 107 107 108 109 109 110 117 121 121
REFERENCES
123
INDEX
131
Chapter 1
INTRODUCTION TO SYNTACTIC PATTERN RECOGNITION
1.1.
SUMMARY
In this chapter we discuss the fundamental idea, system, methods, and applications of syntactic pattern recognition; the reason to use syntactic methods in seismic data. Also we describe the content of each chapter. 1.2.
INTRODUCTION
Syntactic pattern recognition has been developed over two decades, re ceived much attention and applied widely to many practical pattern recog nition problems, such as (1) English and Chinese character recognition, (2) fingerprint recognition, (3) speech recognition, (4) remote sensing data analysis, (5) biomedical data analysis in chromosome images, carotid pulse waves, EEG signals,..., etc., (6) scene analysis, (7) texture analysis, (8) 3-D object recognition, (9) two-dimensional mathematical symbols, (10) spark chamber pictures, (11) chemical structures, (12) geophysical seismic signal analysis,..., etc. [3, 6, 13-16, 19, 22-24, 30, 37, 39, 41, 46, 48, 49, 53-55, 58, 59, 62, 64-66, 72, 73, 78-81, 85-89, 92, 96, 98, 100, 112, 113, 116]. In the pattern recognition problems, besides the statistical approach, the structural information that describes the pattern is important, so we can use syntactic methods to recognize the pattern. A pattern can be decomposed into simpler subpatterns, and each simpler subpattern can be l
INTRODUCTION
2
Fig. 1.1.
TO SYNTACTIC
PATTERN
RECOGNITION
Block diagram of a syntactic pattern recognition system.
decomposed again into even simpler subpatterns, and so on. The simplest subpatterns are called primitives (symbols, terminals). A pattern can be described as a representation, i.e., a stringof primitives, a tree, a graph, an. array, a matrix, or an attributed string,..., etc. [33, 43, 64-66, 68, 91, 110]. We can parse the representation and assign the pattern to its correct class. A basic block diagram of the syntactic pattern recognition system is shown in Fig. 1.1. The system consists of two major parts: training and recognition. The training part consists of primitive (and relation) selec tion, grammatical inference, automata construction from the training pat terns, and the recognition part consists of preprocessing, segmentation or decomposition, primitive (and relation) recognition, construction of pattern representation, and syntactic parsing analysis for the input testing pattern. The finite-state grammar, context-free grammar and context-sensitive grammar of the formal language are adopted in the description of 1-D string representation of the pattern [2, 41]. The 1-D string grammars also include programmed grammar, indexed grammar, grammar of picture de scription language, transition network grammar, operator precedence gram mar, pivot grammar, plex grammar, attributed grammar,..., etc. [16, 36, 37, 39, 41, 49, 55, 90, 92, 95, 97, 101]. The syntactic parsing analy ses include finite-state automata, pushdown automata, top-down parsing, bottom-up parsing, Cocke-Younger-Kasami parsing, Earley's parsing,..., etc. [2, 41]. The description power can be extended from 1-D string grammars to high-dimensional pattern grammars for the analysis of 2-D and 3-D pat terns. The high-dimensional pattern grammars include tree grammar,
ORGANIZATION OF THIS BOOK
3
array grammar, web grammar, graph grammar, shape grammar, matrix grammar,..., etc. [17, 33, 41, 43, 66, 68, 88, 91, 94, 110]. The syntactic parsing analyses include tree automata, array automata,..., etc. For consideration of substitution, insertion, and deletion errors in the pattern, the automata can be expanded to error-correcting automata to accept the noisy pattern or distorted pattern [1, 64, 65, 84, 101, 102, 106, 115]. The 1-D string grammars and high-dimensional pattern grammars also include stochastic grammars, languages, and the corresponding parsers [29, 40, 44, 86, 101, 105]. The use of pattern recognition has become more and more important in seismic exploration [4, 5, 10, 11,18-21, 26, 50, 52-68, 99]. However, most of the papers emphasize statistical seismic pattern recognition. Interpreting a large volume of seismic data is a challenging problem. Seismic data in the one-shot seismogram and stacked seismogram may contain some phys ical and structural information from the response of subsurface. So before interpreting seismic data, it is better to have the techniques to process the seismic data and to improve seismic interpretation. Here using the structural information of seismic data, we propose the important syntactic approach to seismic pattern recognition.
1.3.
ORGANIZATION OF THIS BOOK
In Chapter 2, we start to discuss the fundamental theory of formal lan guages and parsing methods. There are four kinds of basic grammars and languages: finite-state, context-free, context-sensitive, and unrestricted. Finite-state automaton can recognize the finite-state language. Earley's parsing algorithm can recognize the context-free language. Finite-state grammar can be inferred from sample strings. Levenshtein distance is the distance computation between two strings. In Chapter 3, syntactic pattern recognition techniques are applied to the analysis of 1-D seismic traces to classify Ricker wavelets. Seismic Ricker wavelets have structural information in shape, and each Ricker wavelet can be represented by a string of symbols. To recognize the strings, we use a finite-state automaton to identify each string. The automaton can accept strings having substitution, insertion, and deletion errors of the symbols. There are two attributes, terminal symbol and weight, in each transition of
4
INTRODUCTION TO SYNTACTIC PATTERN
RECOGNITION
the automaton. A minimum-cost error-correcting finite-state automaton is proposed to parse the input string. Two methods of parsing attributed string are proposed. One is the modified error-correcting Earley's parsing in Chapter 4, and the other is a parsing using the match primitive measure (MPM) in Chapter 5. In Chapter 4, the modified minimum distance error-correcting Earley parsing for an attributed string can handle three types of error. The recognition criterion of the modified Earley's algorithm is "minimumdistance." We discuss the application of the parsing method to the recog nition of seismic Ricker wavelets and the recognition of wavelets in real seismic data in Chapter 5. In Chapter 5, the computation of the match primitive measure between two attributed strings using dynamic programming is proposed. The MPM parsing algorithm for an attributed string can handle three types of er ror. The MPM parsing algorithm is obtained from the computation be tween the input string and the string generated by the attributed grammar. The MPM parsing is more efficient than the modified Earley's parsing. The recognition criterion of the MPM parsing algorithm is "maximummatching". The parsing method is applied to the recognition of seismic Ricker wavelets and the recognition of wavelets in real seismic data. In Chapter 6, Levenshtein distance computation is applied to detect the candidate bright spot, trace by trace, in the real seismograms. The system for one-dimensional seismic analysis includes a likelihood ratio test, optimal amplitude-dependent encoding, probability of detecting the sig nal involved in the global and local detection, plus minimum-distance and nearest-neighbor classification rules. The relation between error probability and Levenshtein distance is proposed. In Chapter 7, tree automaton of syntactic pattern recognition is adopted to recognize 2-D structural seismic patterns. The tree automaton system includes two parts. In the training part of the system, the training seis mic patterns of known classes are transformed into their corresponding tree representations. Tree representations can infer tree grammars. Several tree grammars are combined into one unified tree grammar. Tree gram mar can generate the error-correcting tree automaton. In the recognition part of the system, each input testing seismogram passes through pre processing and tree representation of seismic pattern. Each input tree is parsed and recognized into the correct class by the error-correcting tree
ORGANIZATION OF THIS BOOK
5
automaton. Because of complex variations in the seismic patterns, three kinds of automaton are adopted in the recognition: weighted minimum distance structure preserved error-correcting tree automaton (SPECTA), modified maximum-likelihood SPECTA, and minimum distance generalized error-correcting tree automaton (GECTA). Weighted minimum distance SPECTA and modified maximum-likelihood SPECTA take only substi tution errors of the tree structure into consideration. Minimum-distance GECTA takes substitution, deletion, and insertion errors of the tree struc ture into consideration. The bright spot seismic pattern is shown as the example in the parsing steps. Tree automata could be applied to the recog nition of other seismic patterns, such as pinch-out, flat spot, gradual sealevel fall, and gradual sealevel rise patterns. The tree automaton system pro vides a tool for recognition of seismic patterns, and the recognition results can improve seismic interpretation. In Chapter 8, we present a hierarchical system to recognize seismic patterns in a seismogram. The seismic patterns are hierarchically decom posed or recognized into single patterns, straight-line patterns or hyperbolic patterns, using syntactic pattern recognition. The Hough transformation technique is used for reconstruction, pattern by pattern. The system of syntactic pattern recognition includes envelope generation, a linking pro cess in the seismogram, segmentation, primitive recognition, grammatical inference, and syntax analysis. The seismic patterns are automatically recognized and reconstructed.
This page is intentionally left blank
Chapter 2
INTRODUCTION TO FORMAL LANGUAGES AND AUTOMATA
2.1.
SUMMARY
In this chapter we introduce the fundamental theory of formal lan guages and syntactic parsing methods: finite-state grammar and language, context-free grammar and language, finite-state automaton for recognition of finite-state language, the Earley's parsing of context-free language, the inferences of finite-state grammar from training samples, and the distance computation between two strings. We use these fundamental methods in the later chapters.
2.2.
LANGUAGES AND GRAMMARS
The formal language theory was initially developed to understand the basic properties of natural languages. The phrase-structure grammar with a set of rewriting rules can be used as a method for describing languages. The phrase-structure languages and their relation to automata were described by Aho and Ullman [2]. Before defining the formal languages, we analyze a basic English sen tence, "The boy runs fast," using the English grammar. We can parse the sentence in tree. We can have a set of production rules or rewriting rules from the tree: (1) <sentence> -> <noun phrase>
7
INTRODUCTION TO FORMAL LANGUAGES AND AUTOMATA
(2) (3) (4) (5) (6) (7)
<noun phrase> -» <article> <noun> —> <article> -» The <noun> -> boy —> runs -> fast
where the symbol "—»•" means "can be rewritten as." The sentence can be derived by the production rules from (1) to (7) using the left-most substitution: <sentence> =>■ <noun phrase> => <article> <noun> =>■ The <noun> => The boy => The boy => The boy runs =>• The boy runs fast Definition 2.1 In the formal language theory, a grammar is a fourfold multiple, G — (Vjv, Vr,P,S), in which (1) Vjv is a finite set of nonterminal (nonprimitive) symbols, (2) VT is a set of terminal (primitive) symbols, V/vUVr = V, VjvU VT = , (3) P is a finite set of production rules denoted by a —>■ /3, where a is in (VJV U VTO'VAKVAT U VT)* and /3 is in (Vjv U
VT)*,
(4) 5 ( e VJV) is a starting nonterminal symbol, where (Vjv U Vp)* denotes the set containing all strings over (Vjv U VT) including zero length string (sentence) A. Prom the starting symbol 5, a string (sentence) is derived by using the production rules of P, S —► ao => cc\ => ■ • • => a m = <*• The language generated by a grammar G is L(G) = {a\a is in VT* and 5 => a } . S =>■ a G
G
LANGUAGES AND GRAMMARS
9
represents that we can derive from S to the string (sentence) a in several derivation steps using production rules in G. Depending on the form of the production rules, the grammars can be divided into finite-state grammar, context-free grammar, context-sensitive grammar, and unrestricted grammar. Type 0 {unrestricted) grammar There is no restriction on the production rules. The languages generated by type 0 grammars are called type 0 languages. Type 1 {context-sensitive)
grammar
The production rules of type 1 (context-sensitive) grammars are of the form a\Aa2 —>• aif3a2 where A £ V/v, a i , a2 G V*, and (3 G V+ {j3 € V* and j3 ^ A). That nonterminal A is replaced by string /3 is dependent on the contexts of the both sides of A. The languages generated by contextsensitive grammars are called type 1, or context-sensitive, languages. Type 2 {context-free) grammar The production rules of type 2 (context-free) grammars are of the form A —> (3 where A 6 Vjf, and /3 G V+. That nonterminal A is replaced by string /? is independent on the contexts of the both sides of A. The languages generated by context-free grammars are called type 2, or contextfree, languages. Type 3 {finite-state or regular) grammar The production rules of type 3 (finite-state or regular) grammars are of the form A -» aB or A —> b where A, B e V^ and a, b G Vy. The languages generated by finite-state grammars are called type 3, or finite-state (or regular), languages. Example 2.1 Finite-state grammar and language Consider the grammar G={VN, VT, P, S), where VN = {S, A}, VT = {a, b}, and P: (1)
S-^aA
INTRODUCTION TO FORMAL LANGUAGES AND AUTOMATA
10
(2) (3)
A^aA A^b
From the form of the production rules, the grammar is a finite-state gram mar. A typical sentence is generated by the derivation S => aA =>■ aaA => aaaA => aaab. In general, L(G) = {anb\n = 1,2,...}. E x a m p l e 2.2 Context-free grammar and language Consider the grammar G = (VN, VT, P, S), where VN = {S, A}, VT = {0,1}, and P: (1) S-^OAO (2) A^OAQ (3) A -+ 1 The grammar is a context-free grammar and L(G) = {0™10n|n = 1,2,...}.
2.3.
FINITE-STATE A U T O M A T O N
A finite-state automaton is the simplest recognizer (recognition device) to recognize the strings (sentences) of the language which are generated from finite-state grammar. Definition 2.2
A deterministic finite-state automaton A is a quintuple
A=(^2,Q,8,q0,F)
,
where ^ is a finite set of input symbols, Q is a finite set of states, 6 is a mapping of Q x £) onto Q (the next state function), qo (G Q) is the initial state, and F C Q is the set of final states. T(A) is the language accepted by A A convenient representation of a finite-state automaton is given in Fig. 2.1. The finite control, in one of the states in Q, reads symbols from an input tape sequentially from left to right. Initially, the finite control is in state go a n d is scanning the leftmost symbol of a string in YT, which appears on the input tape. J2* denotes the set containing all strings over ^2 including A, the empty string. The interpretation of S{q,a)=ql,
q,q'eQ,
d £ ^
FINITE-STATE
AUTOMATON
11
(b) Fig. 2.2. (a) Graphical representation of 8{q, a) = q'. (b) A state transition table of % , o ) =q'.
is that the automaton A, in present state q and scanning the present input symbol a, goes to next state q' and the input head moves one square to the right. A convenient way to represent the mapping 6 is by use of a state transition diagram or a state transition table. The state transition diagram and table corresponding to S(q, a) = q' are shown in Figs. 2.2(a) and 2.2(b). The mapping 5 can be extended from an input symbol to a string of input symbols by defining S(q,X) = q,
6(q,xa) = 6(6(q,x),a),
x G Y J and a G \]
.
Thus, the interpretation of S(q, x) = q' is that the automaton A, starting in state q and scanning through the string x on the input tape, will be in state q' and the input head moves to the right from the portion of the input tape containing x.
INTRODUCTION
12
TO FORMAL LANGUAGES AND AUTOMATA
A string or a sentence w is said to be accepted by A if S(qo, w) =p for some p € F. The set of strings accepted by A is defined as T(A) = {w\5(q0,w) G F } . There is a theorem that transforms a finite-state grammar into a finitestate automaton [41]. The relation is that production rules in P become the mapping 5 in A Theorem 2.1 Let G = (V/v, VT,P, S) be a finite-state grammar. Then there exists a finite-state automaton A = (]T},<2,<5, qo,F) with T{A) = L(G), where: (1) (2) (3) (4)
H =
VT,
Q = VN\J{T], 9o = S, / / P contains the production S -> A, t/ten F = {S1, T}; otherwise, F = {T}, (5) IfB^raC is in P, then S(B,a) = C, (6) IfB^-a is in P, S(B,a) = T. Definition 2.3 tuple
A nondeterministic finite-state automaton A is a quin
A=(^2,Q,S,qo,F)
,
where ^ is a finite set of input symbols, Q is a finite set of states, 5 is a mapping of Q x ^ onto subsets of Q, <jo(€ Q) is the initial state, and F C Q is the set of final states. T(A) is the language accepted by A. The only difference between the deterministic and nondeterministic case is that 6(q, a) may have more than one next state rather than a single next state in nondeterministic case. The nondeterministic finite-state automa ton can be transformed into deterministic finite-state automaton that is easy to design in the computer program. Both automata accept the same language.
FINITE-STATE
AUTOMATON
13
Theorem 2.2 Let L be a set of strings accepted by a nondeterministic finite-state automaton A = (J^, Q, 5, qo,F). Then there exists a determin istic finite-state automaton A' = (£) ,Q',5',qb,F') that accepts L. The states of A' are all the sebsets of Q; that is Q' = 2® and ^2' = Y2- F' is the set of all states in Q' containing a state of F. A state of A' will be denoted by \qi,q2, ■.. ,qi] € Q', where qi,q2,---,qi £ Q, q'0 = [go]<S'([9i,?2>--.,«i],a) = \pi,P2,---,Pi] if and only if 5({qi,q2,... ,qi},a) = U L i %fe.°) = {P1.P2. • • ■ ,Pi}Example 2.3 Given the finite-state grammar G = (Vjv, Vr, P, S), where VN = {S,B}, VT = {a, b}, and P: S -*aB B -+aB B^bS B ->a Using Theorem 2.1, the nondeterministic finite-state transition table can be found:
6 S B T
a B B,T
b S
where T is the final state. Using Theorem 2.2, the deterministic finite-state transition table is:
8' [S] [B] [T] [S,B] [S,T\ [B,T] [S,B,T] Some states are not entered by A'. transition table becomes:
a
b
[B] [B,T] [S] [B,T] [S] [B] [B,T] [S] [B,T] [S] The final deterministic finite-state
14
INTRODUCTION TO FORMAL LANGUAGES AND AUTOMATA
6'
[S] [B] [B,T] [S,B,T]
a
b
[B] [B,T] \S] [B,T] [S] [B,T] IS)
where [B, T] and [S, B, T] are the final states. 2.4.
EARLEY'S PARSING
An efficient context-free parsing algorithm has been developed by Earley [32]. Given a context-free grammar G = (Vjv, VT,P,S) and an input string w = a\a,2 ■ ■ ■ an, we can construct the item lists LQ, L\,..., Ln for w. The list Lj includes the item [A —> a»/3,i], where A —> a/3 is a production in P and 0 < j < n. The item [A —> a • f3, i] in the list Lj represents that we can derive partial string from position i to j of the input string us. The parsing algorithm for an input string w is shown in the following. Algorithm 2.1
Earley's parsing algorithm
Input:
A context-free grammar G = (VV, Vp, P, S) and an input string w = aia2 • ■ • an. Output: The parse lists LQ,L\,. .. ,Ln for w. Method: First, construct L 0 . (1) If S —> a is a production in P, add [S —> »a, 0] to LQ. Perform step (2) until no new items can be added to LQ. (2) For each item [A -> »Bf3,0] is in LQ and for all productions in P of the form B —> 7, add the item [B —> »7,0] to LQ. Now, construct Lj from
LQ,LI,
...,
Lj-\.
(3) For each item of the form [^4 —> a • a/3, i] in Lj-\ such that a = aj, add item [A -4 aa • f3, i] to Lj. Perform steps (4) and (5) until no new items can be added to Lj.
EARLEY'S
PARSING
15
(4) For each item [A -¥ a», i] in Lj and each item [B —> 7 • Af), k\ in Li, add [B ->■ 7 A • /?, ft] to Xj. (5) For each item [A —► a • i?/3, i] in Lj and for all productions in P of the form B —¥ 7, add the item [B —>■ #7, j] to L,-. Step (3) is the derivation to match the terminal a,j. Step (4) is the backward relay or connection in derivation of partial string. Step (5) is the forward derivation or expansion. If [S —>■ a»,0] is in Ln, then S can derive from position 0 to n for the string w = aia^- • • an, i.e., w is in L(G), otherwise the string w is not in L{G). The space complexity of Earley's parsing is Oin2), where n is the length of the input string. If the grammar is ambiguous, the time complexity is 0(n3), and if the grammar is unambiguous, the time complexity is 0(n2). Example 2.4 VN = {S,T,F},
Given the context-free grammar G = (VJV, Vr,P,S), VT = { « , + , * , ( , ) } , and P : (1)S-*S
+T
where
(4)S^T
(2) T ->■ T*F
(5)T->F
(3) F -» (S)
(6)F->a.
Let w = a*a. Applying the Earley's parsing algorithm, we obtain the parse lists for w. L\ :
L2 '■
L3 :
[5-*-«5 + T,0]
[F-tam.O]
[T->T*.F,0]
[F-¥
[S->«T,0]
[r^F.,0]
[F^.(5),2]
[T^r*F.,0]
[T-^.T*F,0]
[S^T;0]
[F-+«a,2]
[5-*r»,0]
[T->»F,0]
[T^-T«*F,0]
[F-+.(S),0]
[ S - + 5 • + r , 0]
LQ
:
a;2]
[T-+T»*F,0] [S-»5»+r,0]
[F -> «a, 0] Since [5 -> T», 0] is in L3, the input string a*a is in L(G). After a tring is accepted, its one or all possible derivation trees can be extracted [2]. Earley's parsing algorithm is used for error-correcting parsing of attributed string in Chapter 4.
INTRODUCTION
16
2.5.
TO FORMAL LANGUAGES AND AUTOMATA
FINITE-STATE G R A M M A T I C A L I N F E R E N C E
The grammar can be directly inferred from a set of sentences, L(G). The problem of learning a grammar based on a set of sentences is called gram matical inference. A basic diagram of a grammatical inference machine is shown as follows. Two finite-state grammar inference techniques are dis cussed: one is the canonical inference and the other is the K-tai\ inference.
Source grammar 2.5.1.
Inference
•^1 ) -*"2 '
•'
x
t}
of Canonical
Inference algorithm Finite-State
Inferred grammar Grammar
A basic canonical definite finite-state grammar can be inferred from a set of training strings [41]. The canonical definite finite-state grammar Gc associ ated with the positive sample set S+ = {x\, X2, ■ ■ ■, xn} is defined as follows: GC =
(VN,VT,P,S)
where S is the starting symbol, VN, VT, and P are generated using the following steps. (1) Check each string Xi e S+ and identify all of the distinct terminal symbols used in the strings of S+. Call this set of the terminal symbols as VT. (2) For each string x; € S+, Xi = a^aii ■ ■ ■ a,in, generate the corresponding production rules S —¥ CLuZil Zn -¥
aaZi2
Zi2 —► a^Ziz
^i,n—\
' din
Each Zij represents a new nonterminal symbol. (3) The nonterminal symbol set Vjv consists of S and all the distinct non terminal symbols Zitj produced in step (2). The set P consists of all the distinct production rules generated in step (2).
FINITE-STATE
GRAMMATICAL
INFERENCE
17
Example 2.5 Given a training string abbe, the inferred canonical definite finite-state grammar GC(VN, Vp, P, S) is as follows: VN = {S,A,B,C}
VT = {a,b,c}
S = {S} .
The production rule set P:
(0)S^aA,
(l)A-¥bB,
(2)B-+bC,
(3)C-+c.
We use this example in Chapter 3 for inference of expanded
finite-state
grammar.
2.5.2.
Inference of Finite-State on K- Tails
Grammar
Based
Initially, we define the derivative of a language with respect to the symbol, the new set of derivative is assigned as a state and the relation can become a production rule reversely. Then we combine the derivative and AT-tails to infer the finite-state grammar from a set of training strings or sentences. Definition 2.4 The formal derivative of a set of strings S with respect to the symbol a 6 Vp is defined DaS = {x\ax € 5 } . The formal derivative can easily be extended so that if ai<22 is a sequence of two symbols, Daia2S = Da2(DaiS) Definition 2.5 Let z £ Vr*'. Then the if-tail of z with respect to the set of the strings S, denoted by g(z, S, k), is defined as follows: g(z, S, k) = {xe VT*\zx G S and |z| < k} where \x\ denotes the length of string x, and k > 0. The string from zx to x is the derivative. In the if-tail finite-state inference grammar, for each z, the derivative g(z, S, k) will recover to a production rule. The example is in the following. Example 2.6 Let S = {01,100,111,0010}, k = 4, then the finite-state inference grammar is:
18
INTRODUCTION
TO FORMAL LANGUAGES AND
Derivative
AUTOMATA
Corresponding production rule
g(\, S, 4) = UQ = {01,100, 111, 0010} D0U0 = {1,010} = UX D1U0=
U0 ->OUi u0 ^1U2
{00,11} = U2
^ou3
D0UX = D00Uo = {10} = U3
Ui
D&
u1-»1 u2 -+0U4 u2 -+lJ7s
= DoiU0 = {A}
D0U2 = £>io^o = {0} = U4 D1U2 = DnU0 = {1} = Us D0U3 = D0D00Uo = D000U0 = DtUs = DiDoolIo = DooiUo = {0} = J74
U3 ->■ lE/"4
A>io#o = <> / ■DonC/o = 4> = D100U0 = {A}
t/ 4 -»■ 0
A ^ 4 = DxDwUQ = DWiUo = 4> D0U5 = DoDuUo = DlwUo = 4> DxUs = DiDnUo = DinU0 = {A}
J75 -► 1
D0U4 = D0D10U0
For k = 3, g(A,5,3) ={70 = {01,100,111}. Because the states from derivative of language are the same as those of A; = 4, the production rules are the same. We can also have inference grammars for k = 2, k = 1. We note that L{Gk) D S
2.6.
L{Gk) D L(Gk+i)
for k > 0
L(Gj)
for J > max |:EJ|.
= S
S T R I N G DISTANCE COMPUTATION
Due to noise and distortion in the practical applications of syntactic pattern recognition, misrecognition of primitives is regarded as substitution errors, and segmentation errors are regarded as deletion and insertion errors [41]. We discuss the error transformations and the distance computation between two strings in the following.
STRING DISTANCE COMPUTATION
19
Definition 2.6 Error transformations [41] For two strings x, y £ Vf", we can define three error transformations on T : Vf, —>• Vf such that y £ T(x). rp
(1) Substitution error transformation: a^b
LJIOMJ^
>-$ LJibu)2, for all a,b £ Vx,
rp
(2) Deletion error transformation: u>iau>2 *-$■ u\W2, for all a £ Vx rp
(3) Insertion error transformation: uj\U2 i-4 Wj6w2, for all 6 —>■ Vx, where wi,u>2 £ Vf. Definition 2.7 Levenshtein distance between two strings [1, 41, 75, 109] The distance between two strings x, y £ Vf, dL(x,y), is defined as the smallest number of error transformations required to derive y from x. Example 2.7 Given a sentence x = abed and a sentence y = accbd, then x = abed i-| acc
b I1
c
c
b
1
d Insertion
l ^ b ' c>l
x-
\ Deletion
\Substitution
Definition 2.8 Weighted error transformations [41] Similar to the defi nition of error transformations, the weighted error transformations can be defined as follows. (1) Weighted substitution error transformation u\au)2 S|—>' ui\bw2, for a,b £ Vx, a y£ b, where S(a,b) is the cost of substituting a by b. Let S(a,a) =0.
20
INTRODUCTION TO FORMAL LANGUAGES AND AUTOMATA
(2) Weighted deletion error transformation a;iatJ2 °h—> ^x^i-, for a € Vp, where D(a) is the cost of deleting a from u}iau)2(3) Weighted insertion error transformation W1UJ2 (■—> W1&W2, for b S Vp, where 1(b) is the cost of inserting b. Definition 2.9 Weighted distance between two strings The weighted dis tance between two strings x, y £ Vp, dw(x,y), is defined as the smallest cost of weighted error transformations required to derive y from x. A l g o r i t h m 2.2
Weighted distance between two strings [109]
Input:
Two strings x = a\a2 • ■ -an and y = &i&2 • • • bm, substitution error cost S(a, b), S(a,a) = 0, deletion error cost D(a), and insertion error cost 1(a), 0 , 6 6 Vp. Output: d(x,y). Method: Step 1. D(0,0) = 0. Step 2. Do i = l , n . D{i,0) = D(i - 1,0) + D(ai) Do j = l,m. D(0,j) = D{0,j - 1) + I(bj) Step 3. Do i = 1, n; do j = 1, m. el=D(i-l,j-l)+S(ai,bj) e2 = D(i-l,j) + D(ai) e3 = D(i,j-l)+I(bj) D(i,j) = m i n ( e i , e 2 , e 3 ) Step 4. d(x,y) = D(n,m). Exit.
We may consider context-deletion and context-insertion errors, then the deletion cost Del(a,b), deleting o in front of b or after b, and the insertion cost I(a,b), inserting b in front of a or after a, must be included. The rela tion between string distance and error probability has ever been presented in the detection of wavelets [58, 59]. For a given input string y and a given grammar G, we can find the minimum distance between y and z using parsing technique, where string z is in L(G). The parsing technique is called minimum-distance errorcorrecting parser (MDECP) (1, 41]. We use MDECP in the finite-state parsing, attributed string parsing and tree automaton.
Chapter 3
E R R O R - C O R R E C T I N G FINITE-STATE AUTOMATON FOR RECOGNITION OF RICKER WAVELETS
3.1.
SUMMARY
Syntactic pattern recognition techniques are applied to the analysis of onedimensional seismic traces for classification of Ricker wavelets. Seismic Ricker wavelets have structural information in the shape. Each Ricker wavelet can be represented by a string of symbols. In order to recognize the strings, we use the finite-state automaton for recognition of each string. Then each Ricker wavelet can be classified. The automaton can accept the strings with the substitution, insertion, and deletion errors of the symbols. There are two attributes, terminal symbol and weight, in each transition of the automaton. A minimum-cost error-correcting finite-state automaton is proposed to parse the input string. The recognition results of Ricker wavelets are quite encouraging.
3.2.
INTRODUCTION
Since seismic Ricker wavelets have structural information, it is natural to adopt a syntactic (structural) approach in seismic pattern analysis [58, 64, 65]. Each Ricker wavelet can be represented by a string of symbols (terminals, primitives). Each of these strings can then be recognized by the finite-state automaton allowing each Ricker wavelet to be classified. A block diagram of the classification system is shown in Fig. 3.1. The system includes a training (analysis) part and a recognition part. The 21
ERROR-CORRECTING
22
Input seismic signals
Location of wavelets
FINITE-STATE AUTOMATON .
String pattern representation Amplitude Segmen dependent tation encoding
Error-correcting finite-state parsing
Inference of finite-state grammar
Expanded >| grammar and automaton
Classification results
Recognition ' Training T Training wavelets
String pattern representation
Fig. 3.1. A classification system of seismic wavelets using error-correcting finite-state parsing.
training part establishes p a t t e r n representation and grammatical inference. T h e recognition p a r t includes location of waveforms, p a t t e r n representation, and error-correcting finite-state parsing. P a t t e r n representation performs p a t t e r n segmentation and primitive recognition to convert a Ricker wavelet into a string of primitives (symbols). Grammatical inference infers finitestate grammar from a set of training strings. T h e finite-state g r a m m a r is expanded to contain three types of error symbol transformations: dele tion, insertion, a n d substitution errors. T h e a u t o m a t o n can be constructed from error-correcting finite-state g r a m m a r . Then, the minimum-distance error-correcting finite-state a u t o m a t o n can perform syntactic parsing a n d classification of input Ricker wavelets.
3.3. 3.3.1.
SYNTACTIC PATTERN Training
and Testing
RECOGNITION
Ricker
Wavelets
T h e eight classes of zero-phase Ricker wavelets with different frequencies (15, 25, and 35 Hz) and maximum amplitudes (—0.30 to 0.30) are used in the classification a n d shown in Table 3.1. T h e 28 Ricker wavelets of eight classes with r a n d o m noise are generated in the seismic traces in Fig. 3.2. T h e sampling interval is 1 ms. T h e class of each Ricker wavelet in the seismic traces is shown in Table 3.2. Eight Ricker wavelets are chosen as the training wavelets and one for each class.
SYNTACTIC
PATTERN Table 3.1.
Ricker wavelet class
RECOGNITION
23
Selected Ricker wavelets in string representations.
Frequency (Hz)
Reflection coefficient
Training strings (corrupted by noise)
0.25 0.15 -0.15 -0.25 0.20 -0.20 0.30 -0.30
cccaacccaaccoAbcoCCBCCC AACCCBBC ccACoccbBAbaBAbccoBCCBaaBCCC CCBAaoCCCbbBCBbaCCbcoAcccoAccbab BCCCDCooCCAcaCCocbBacccbcccBacc ccccccccoBBBCCCCCC CCCCCCBBCCocccccccc dddddcbBCCDDD DDDCCCCbcccddd
1
15
2
15
3
15
4
15
5
25
6
25
7
35
8
35
Table 3.2.
Classes of Ricker wavelets in Fig. 3.2. Sample
Class
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
5 6 1 5 1 8 1 7 3 4 2 1 7 8 8 8 5 6 7 5 4 2 3 6 3 7 1 3
ERROR-CORRECTING
FINITE-STATE AUTOMATON
1. OS4 T I SHIT)
3.2. Twenty-eight Ricker wavelets in the seismic traces.
,..
EXPANDED
3.3.2.
GRAMMARS
Location Pattern
25
of Waveforms and Representation
We use pick detection method to locate each Ricker wavelet. Then the pattern representation can transform each wavelet into a syntactic string of primitives. The time interval of the segmentation is 1 ms and each segment is assigned as a primitive. In order to classify the Ricker wavelets with different amplitudes, the amplitude-dependent encoding of the modified Freeman's chain code is used [58, 65]. The difference di = y%+i — yi of vertical coordinates of two adjacent data points, (ti,yi) and (£j + i,yj + i), is assigned as a terminal symbol according to the following: Wi = d,
if 0.05027 < di,
Wi
— c,
if 0.01497 < di < 0.05027,
Wi
= b,
if 0.00668 < di < 0.01497,
Wi
= a,
if 0.00199 < di < 0.00668,
Wi
= o,
if - 0.00199 < di < 0.00199,
Wi
= A
if - 0.00668 < di < -0.00199,
Wi
= B
if - 0.01497 < di < -0.00668,
Wi
= C
if - 0.05027 < di < -0.01497,
Wi
_ D
if di < -0.05027.
and
After primitive recognition, the string representations of the eight training Ricker wavelets are shown in Table 3.1.
3.4. 3.4.1.
EXPANDED GRAMMARS General
Expanded
Finite-State
Grammar
Because of noise and distortion problems, three types of error symbol trans formations may occur in the strings: substitution errors, insertion errors, and deletion errors. After the inference of finite-state grammar, the gram mar is expanded to include the three types of error production rules. This
26
ERROR-CORRECTING
FINITE-STATE AUTOMATON
...
type of grammar is called a general expanded finite-state grammar. The following steps are used to construct this grammar. (1) The original production forms of a finite-state grammar are A —► aB, or A -» a. Change A —> a to A —i aF, where F is the new nonterminal with ending terminal a. (2) The production forms added to account for substitution errors are A -» bB, where a^b. (3) The production forms added to account for insertion errors are A —> aA. (4) The production forms added to account for deletion errors are A —> XB, where A is the empty terminal. We can put weights on these production rules if we wish. The algorithm is as follows. Algorithm 3.1
Construction of general expanded finite-state grammar
Input: A finite-state grammar G = (Vpf, Vp, P, S). Output: The general expanded finite-state grammar G' = (yN, VT, P', S"), where P' is a set of weighted production rules. Method: (1) For each production rule in P with the form A —> a in the grammar G, change the rule to the form A —► aF, where F is a new nonterminal. (2) Set V± = VTU {A}, Vff=VNU {F}, S' = S. (3) Let the production rule set P' = P with a weight of zero for each original production rule. (4) For each nonterminal A in V^ add the production A -¥ XA (with the weight 0) to P'. (5) Substitution error production: For each production A —> aB in P do For each terminal b in Vp do If A -> bB is not in the P' then add the production A —► bB (with the weight 1) to P'. (6) Insertion error production: For each nonterminal A in VN do For each terminal a in Vr do If A -> aA is not in the P' then add the production A —>■ aA (with the weight 1) to P'.
EXPANDED
27
GRAMMARS
(7) Deletion error production: For each production A —» aB in P (A ^ B) do If A -» AS is not in the P ' then add the production A -» AP (with the weight 1) to P'. (8) Add the production P -» A (with weight 0) to P . (9) Output C . Equal unit weight is assigned to each error production rule in the algorithm. Different weights may be assigned in steps (5), (6) and (7). E x a m p l e 3.1 From the previous Example 2.5, given the training string abbe, the inferred general expanded finite-state grammar G'(V^,Vr',. P ' , S") using Algorithm 3.1 is as follows: Vif = {S,A,B,C,F}
Vi = {a,b,c,X}
S> = {S}
The production rule set P' is: (0)5 ->aA,0
13)
B-+aC,l
26) C -> aC, 1
(l)A^-bB,0
14) B ->• cC, 1
27) C -»■fcC,1
(2)B->bC,0
15) C -4 aP, 1
28) C -> cC, 1
(3) C -> cF, 0
16) C -»• 6F, 1
29) F -> aP, 1
(4) 5 -> AS, 0
17) S -> oS", 1
30) P -4 6P, 1
(5)A->AA,0
18) 5 -* 65,1
31) P -> cP, 1
( 6 ) P - > AP,0
19) 5 -4 c5,1
3 2 ) 5 ^ A.4,1
(7) C -»■ AC, 0
20)
A-^aA,l
33)A->AP,1
( 8 ) F - > AF,0
21) A ->• &A, 1
34) B -s- AC, 1
(9) S -»■ 64, 1
22)
35) C - > A F , 1
(10) 5 - + c . 4 , 1
23) P -4 aB, 1
(ll)A->aP,l
24) P -)■ bB, 1
(12) A-> cP, 1
25)P->cP,l
A-^cA,l
36)P->A,0
Production rules (9) to (16 handle the substitution errors, rules (17) to (31) handle the insertion errors, and rules (32) to (35) handle the deletion errors. The corresponding error-correcting finite-state automaton is shown in Fig. 3.3.
ERROR-CORRECTING
28
(c,1)
(c.1)
(G,1)
(c.1)
FINITE-STATE AUTOMATON
(c,1)
(C,1)
(c,1)
(CO)
...
(c,1)
Fig. 3.3- Transition d i a g r a m of t h e general e x p a n d e d g r a m m a r G' of E x a m p l e 3.1.
3.4.2.
Restricted
Expanded
Finite-State
Grammar
For insertion error, we can insert an error terminal symbol before or after some arbitrary terminal symbol. Then we can expand the form of the production rule A —> aB with restricted insertion error as follows:
or
A —»• bB\,
B\ -4- aB
A —> ai?2 ,
B2 —> bB
(Insert b in front of a), (Insert b after a).
The proposed algorithm is described in the following. A l g o r i t h m 3.2
Construction of restricted expanded finite-state grammar
Input: A finite-state grammar G = (Vjv, Vp, P, S). Output: The restricted expanded finite-state grammar G' = {Vpj, Vp, P', S'), where P' is a set of weighted production rules. Method: (1) For each production rule in P with the form A —¥ a in the grammar G, change the rule to the form A —► aF, where F is a new nonterminal. (2) Let P' = P with the weight 0 for each original production rule. (3) Substitution error production: For each production A —> aB in P do For each terminal b in Vr do
EXPANDED
(4)
(5)
(6) (7) (8) (9)
GRAMMARS
29
If A -s- bB is not in the P' then add the production A —> bB (with the weight 1) to P'. Insertion error production: For each production A —t aB in P do { Insert b in front of a } For each terminal b in V? do add the production A ->■ frBi (with the weight 1) to P', add the production Bi —>■ aB (with the weight 0) to P ' , and { Insert 6 after a } For each terminal b in V? do add the production A —>■ a P 2 (with the weight 0) to P ' , add the production Bi —> bB (with the weight 1) to P'. Deletion error production: For each production A -> aB in P (A ^ B) do If A -> AB is not in the P ' then add the production A —>■ AB (with the weight 1) to P ' . Set S' = S,Vj, = VTL> {A}, V^ = all the distinct nonterminal symbols inP' For each nonterminal A in V^ do add the production A —► AA (with the weight 0) to P ' . Add the production F ->• A (with weight 0) to P ' . Output G'.
E x a m p l e 3.2 From the Example 2.5, given the training string abbe, the inferred restricted expanded finite-state grammar G"(V^, V^,P",S") using Algorithm 3.2 is as follows: Vj} = {S,A,B,C,D,E,F,G,H,I,J,K,L}
V^ = {a,b,c,\}
The production rule set P" is: (0)S->aA,0
(21)P->aA,l
(42) C ->■ cK, 1
(1) A -» 6B, 0
(22) E->bA,l
(43) if -> cF, 0
(2) B -4 bC, 0
(23) P -> cA, 1
(44) C -> cL, 0
(3) C -4 cF, 0
(24) A ->• aG, 1
(45) !■ -4 aF, 1
(4) 5 -4 bA, 1
(25) A -4 bG, 1
(46) L -4 bF, 1
S" = {S}.
ERROR-CORRECTING
30
(5)
S-¥cA,l
(6) A-¥aB,l (7) A -¥ cB, 1 (8) B -> aC, 1 (9) £ -> cC, 1 (10) C->aF,l (11)C->6F,1 (12) 5 -> XA, 1 (13)4-+ A 5 , l (14) B -»• AC, 1 (15) C -»• AF, 1 (16) S -4 aZ>, 1 (17) S -► &D, 1 (18) 5 -}■ eD, 1 (19) D - > a.4,0 (20) 5 -> a F , 0
FINITE-STATE AUTOMATON .
26 A -* cG, 1 !27 G - + 6 5 , 0 '28; A ^ 6iJ, 0 29 H -*aB,l 30 H->bB,l !31 H^cB,l 32 B^-a.1,1 33 B^bl,l 34 B->cI,l 35 J -> 6C, 0 '36 B^-bJ,0 37 J ->■ aC, 1 38 J -4 6C, 1 39. J -!• CC, 1 40 C-+aK,l '41 C-+bK,l
(47) Z, ->• cF, 1 (48) 5 -+ AS,0 (49)4 -s- XA, 0 (50) 5 -> XB, 0 (51)C-> AC,0 (52) D -+ AD, 0 (53) £ -s- XE, 0 (54) F ->• AF, 0 (55)G->AG,0 (56) H->XH,0 (57)/(58) J (59) # (60) £ -
A/,0 • AJ,0 *Aif,0 >AZ,,0
(61)F->A,0
The corresponding error -correcting finite-state automaton is shown in Fig. 3.4.
Fig. 3.4. Transition diagram of the restricted expanded grammar G" of Example 3.2.
MINIMUM-DISTANCE ERROR-CORRECTING
3.5.
...
31
MINIMUM-DISTANCE E R R O R - C O R R E C T I N G FINITE-STATE PARSING
Input testing pattern strings can be analyzed by a finite-state automa ton that can accept strings derived by finite-state grammar. Given a finite-state grammar G, there exists a finite-state automaton M such that the set of strings accepted by M is equal to the set of strings L(G) derived by the finite-state grammar G. The automaton M can be represented by a finite-state transition diagram [41]. The production rule A —¥ aB in the finite-state grammar corresponds to the transition S(A, a) = B in the automaton. An input string can go from the initial state to the final state of the automaton if the string is accepted by the automaton. Here, each transition of the error-correcting finite-state automaton has two attributes, input terminal symbol and weight value. For the production rule A -4- aB with weight w, we use CAB{O) — w a s the cost representation in the transition of the automaton. We want to parse the input string from the initial state to the final state by minimum cost. The following algorithm is proposed to compute the minimum cost (distance) by using the dynamic programming technique [41]. Algorithm 3.3 Minimum cost of error-correcting finite-state parsing with two attributes in each transition Input:
An error-correcting finite-state automaton with n nodes num bered 1,2,... ,n, where node 1 is the initial state and node n is the final state. Two attributes, terminal symbol and its cost function Cij(a), for 1 < i, j < n, a € (Vp U {A}), with Cij(a) > 0, for all i and j . An input testing string S. Output: M\n the minimum cost of the path from node 1 to node n when the parsing sequence is equal to the terminal sequence of the input string S. Method: (1) M\\ = 0, Mij = maxint (a large number), 1 < j < n. (2) For 1 < j < n do Mij = min{Mifc + Ckj(X), l
32
ERROR-CORRECTING
FINITE-STATE AUTOMATON
...
(4) Let M[k = Mik, l
3.6.
CLASSIFICATION OF RICKER WAVELETS
Following the procedures of the syntactic pattern recognition system in Fig. 3.1, 28 Ricker wavelets from the eight classes (Table 3.1) are
8
I Table 3.3. Parsing result of test string aabc using Algorithm 3.3 according to the transition diagram of restricted expanded grammar in Fig. 3.4. Minimum cost is 1. h
Mi.io
Mi,ii
Mi,i2
Ml, 13
inf inf
inf inf
inf inf
inf inf
inf 4
inf inf
3 3
inf inf
4 4
inf inf
4 3
1 1
inf inf
2 2
inf inf
3 3
inf inf
3 3
inf inf
2 2
1 1
2 2
1 1
3 3
inf inf
3 2
inf inf
inf inf
inf inf
2 2
inf inf
2 2
1 1
1 1
Step in algorithm
Afi.i
Afi,2
Afi,3
Afi,4
Ml,5
Ml,6
1 2
0 0
inf 1
inf 2
inf 3
inf inf
1
4 5
inf inf
0 0
2 1
3 2
2
4 5
inf inf
1 1
1 1
3
4 5
inf inf
inf inf
4
4 5
inf inf
inf inf
Afl,7
Afl,8
inf inf
inf inf
inf inf
1 1
0 0
2 2
2 2
inf inf
inf inf
1 1
1 1
inf inf
2 2
2 2
inf inf
Ml,g
o o •*]
S3
8 to
I
I 3
CO CO
ERROR-CORRECTING
34
FINITE-STATE
AUTOMATON
...
mixed with random noise and used as the testing wavelets in the seismic traces in Fig. 3.2. After locating wavelets and pattern representation by amplitude-dependent encoding of the modified Freeman's chain code, each wavelet is converted into a syntactic string of symbols. There are two kinds of training in the experiments. The first kind is the use of Algorithms 3.1 and 3.3 and the second is the use of Algorithms 3.2 and 3.3. For the first kind of training, using Algorithm 3.1, general expanded Table 3.4. Parsing result of test string aabc using Algorithm 3.3 according to the tran sition diagram of general expanded grammar in Fig. 3.3. Minimum cost is 1. Step in algorithm
Mi.i
Mi, 2
Ml,3
Ml,4
Ml,5
1 2
0 0
infinite 1
infinite 2
infinite 3
infinite 4
1
4 5
1 1
0 0
2
3 2
4 3
2
4 5
2 2
1 1
2 2
3 3
3
4 5
3 3
2 2
1 1
3 2
4
4 5
4 4
3 3
2 2
1 1
h
2 2
Table 3.5. Numbers of nonterminals, production rules, and CPU time for generating the general expanded grammar of each class of Ricker wavelet using Algorithm 3.1.
Class
Number of nonterminals
Number of productions
1 2 3 4 5 6 7 8
32 29 33 32 20 20 14 15
631 571 651 631 391 391 271 291
CPU time on VAX11/780 12.55 (sec) 10.50 13.13 12.50 5.66 5.69 3.33 3.67
CLASSIFICATION
OF RICHER
WAVELETS
35
finite-state grammar for the eight training strings of the Ricker wavelets are generated. The numbers of nonterminal symbols and production rules and the CPU time on a VAX11/780 required to construct the general ex panded grammar for each class of the training Ricker wavelet are shown in Table 3.5. Four cases of different weighting costs are used in the calculation of minimum-cost finite-state parsing. Different weights in Table 3.6 are as signed in each case to the automaton. The first case assigns equal weights to each error production rule. The second case normalizes the lengths of two calculated strings (one is the training string and the other is the testing Table 3.6.
Four cases: of different weights in automata.
Case no.
Case descriptions
1
Equal weights are assigned to insertion, deletion, and substitution transformations.
2
Normalized distance with equal weight assigned to . insertion, deletion, and substitution transformations.
3
Nonequal weights are assigned to insertion, deletion, and substitution transformations. weight (insertion) = 0.5 weight (deletion) = 0.5 weight (substitution) = 1.0
4
Nonequal weights are assigned to insertion, deletion, and substitution transformations. weight (insertion) = 2.0 weight (deletion) = 2.0 weight (substitution) = 1.0
Table 3.7. Parsing time and percentage of correct wavelet classification Alogrithm 3.1 and 3.3 in four cases of different weights.
Case no.
Average CPU time for one string (sec)
1 2 3 4
3.37 3.38 3.71 3.35
Percentage of correct classification 20/28 25/28 17/28 24/28
71.43 89.29 60.71 85.71
using
36
ERROR-CORRECTING
FINITE-STATE
AUTOMATON
...
string). The third and the fourth cases use the different weighting costs in error transformations. The parsing classification results of the 28 testing Ricker wavelets in Fig. 3.2 using Algorithm 3.3 are shown in Table 3.7. Because the classes of the testing wavelets are known, the percentage of correct classification in Table 3.7 can be calculated. The average parsing time is also shown in Table 3.7. The second case using normalized distance has a better classi fication result. Comparing the third case with the fourth case, we assign more weights to the insertion and deletion errors than to the substitution errors in the fourth case, then the classification result is improved. For the second kind of training, using Algorithm 3.2, a restricted expanded grammar is generated for the eight training strings of Ricker
Table 3.8. Numbers of nonterminals, production rules, and CPU time for generating the restricted expanded grammar of each class of Ricker wavelet using Algorithm 3.2.
Class
Number of nonterminals
Number of productions rules
1 2 3 4 5 6 7 8
93 84 96 93 57 57 39 42
1023 924 1056 1023 627 627 429 462
CPU time on VAX11/780 20.15 (sec) 17.08 21.49 20.76 9.63 9.73 5.91 6.45
Table 3.9. Parsing time and percentage of correct wavelet classification Algorithms 3.2 and 3.3 in four cases of different weights.
Case no.
Average CPU time for one string (sec)
Percentage of correct classification
1
315.82
24/28
85.71%
2
316.50
25/28
89.29%
3
323.74
22/28
78.57%
4
313.24
24/28
85.71%
using
DISCUSSION AND CONCLUSIONS
37
wavelets. The number of nonterminal symbols, production rules, and the CPU time on a VAXll/780 needed to construct the restricted ex panded grammar for each class of the training Ricker wavelet are shown in Table 3.8. The seismic data in Fig. 3.2 have also been tested. Four cases of different weighting costs are used in parsing the same as the first kind. The average parsing time and percentages of correct classification using Algorithm 3.3 and different weights are shown in Table 3.9. Comparing the results in Tables 3.7 and 3.9, Table 3.9 has a better classification rate but uses a longer CPU time because more nonterminals and production rules are used in the restricted expanded grammar. The percentages of correct classification for different weighting for Cases 1 and 3 are more improved in Table 3.9. 3.7.
DISCUSSION A N D CONCLUSIONS
Syntactic pattern recognition techniques are important in recognizing seis mic structural patterns. The error-correcting finite-state automaton in cludes substitution, insertion, and deletion error transformations in the state transition and is used to classify Ricker wavelets. Minimum-distance parsing can parse the input testing string using dynamic programming techniques with two attributes, input terminal symbol and weight value, in the state transition. The classification results of Ricker wavelets are encouraging. If each transition of the finite-state automaton has equal weight, then the nondeterministic finite-state automaton can be transformed into a deterministic finite-state automaton and a reduced number of states of finite-state automaton [41].
This page is intentionally left blank
Chapter 4
ATTRIBUTED G R A M M A R AND ERROR-CORRECTING EARLEY'S PARSING
4.1.
SUMMARY
Two methods of parsing attributed string are proposed. One is the modified error-correcting Earley's parsing in Chapter 4, and the other is a parsing using the match primitive measure (MPM) in Chapter 5. The modified minimum distance error-correcting Earley's parsing for an attributed string can handle three types of error. The recognition criterion of the modified Earley's algorithm is "minimum-distance." We discuss the application of the parsing method to the recognition of seismic Ricker wavelets and the recognition of wavelets in real seismic data in Chapter 5. 4.2.
INTRODUCTION
Applications of syntactic pattern recognition to digital signal process ing have received much attention and achieved considerable success [41]. Recently, syntactic pattern recognition has been applied in seismic signal analysis [58, 59, 62, 64-66]. Here, the attributed grammar parsing is used for classification of seismic wavelets. In the syntactic approach, after the segmentation and primitive assignment of the seismic trace, the same pri mitive may repeat several times. This often makes the size of the pattern strings and inferred grammars unnecessarily large. Instead of keeping track of all these identical primitives, we can use one syntactic symbol to represent 39
ATTRIBUTED
40
GRAMMAR AND
ERROR-CORRECTING...
the type of primitive with a n a t t r i b u t e to indicate t h e length of t h a t pri mitive. This leads to the application of the length a t t r i b u t e to seismic and other similar digital signal analysis. A t t r i b u t e d g r a m m a r has been applied in p a t t e r n recognition [41]. You and Fu [114] implemented a t t r i b u t e d g r a m m a r s in shape recognition. Shi and Fu [98] proposed a t t r i b u t e d graph grammars. A g r a m m a r of which each production is associated with a set of semantic rules is called an a t t r i b u t e d grammar [41]. B o t h the inherited and the synthesized a t t r i b u t e s often lead to significant simplification of grammars [41]. T h e advantages of using attributed grammars for p a t t e r n recognition are two-fold. First, the inclusion of semantic information increases flexibi lity in p a t t e r n description. Second, it reduces the syntactic complexity of the p a t t e r n g r a m m a r . T h e resulting grammar size for each p a t t e r n class is reduced. Similarity a n d dissimilarity measures between two strings have been discussed in m a n y articles [69, 71, 75, 109]. Here, the similarity measure between two a t t r i b u t e d strings is proposed and is called the m a t c h primitive measure ( M P M ) . Two parsing methods for attributed strings are proposed. One is the modified minimum distance error-correcting Earley's parsing algorithm a n d t h e other is the parsing algorithm using the m a t c h primitive measure ( M P M ) . Figure 4.1 is the system that two parsing methods can be used for t h e recognition of seismic wavelets [64].
Input signals
Location of wavelets
Attributed string representation
Error-correcting Earley's parsing or parsing using match primitive measure
Classification results
Recognition Training Training wavelets
Attributed ♦(string representation
Inference of attributed grammar
Fig. 4.1. System of error-correcting Earley's parsing or parsing using match primitive measure for recognition of seismic wavelets.
ATTRIBUTED
4.3.
PRIMITIVES AND STRING
41
A T T R I B U T E D PRIMITIVES A N D S T R I N G
In this study, each primitive is accompanied by a length attribute. That is, each pattern primitive, a, can be represented by a two-tuple, a = (s: y), where s is a syntactic symbol denoting the basic segment, and y represents the length of a. For example, a pattern string is aaadgggggeeaaagg. It can be simplified by merging the identical symbols. Thus, the above string becomes (a,3)(d,l){g,5)(e,2)(a,3)(g,2) as the attributed string where each number represents the number of du plications for that symbol. This idea leads to some storage improvement in string representation. 4.4.
DEFINITION OF ERROR T R A N S F O R M A T I O N S FOR A T T R I B U T E D STRINGS
In minimum-distance error-correcting parsing for context-free languages, three different types of error transformations (insertion, deletion, and sub stitution) have been defined [1, 75, 109]. In order to handle these three error transformations in the parsing of attributed strings, error transfor mations for attributed strings have been defined. Errors can be classified as global and local deformations. For global deformations, errors consist of insertion and deletion errors, and each of them can deform syntactically and semantically. (1) A syntactic insertion error is the replacement of a null string A (the length of A is zero) by a syntactic symbol, i.e., (\,0)-*
(s,y).
(2) A semantic insertion error is the addition of a length attribute, i.e., (s,Vi) ->■ (5,2/2)
where
yi < y2 ■
42
ATTRIBUTED GRAMMAR AND
ERROR-CORRECTING...
When a syntactic insertion error has occurred, the associated semantic length is added also. However, a semantic insertion error can occur without any syntactic insertion error. (3) A syntactic deletion error is the replacement of a syntactic symbol by a null string A, i.e., (a,y)->(A,0). (4) A semantic deletion error is the removal of an attribute from a semantic length, i.e., (s, 2/1) -> (s, 2/2) where
j/i > y2 •
When a syntactic deletion error has taken place, the corresponding semantic length is deleted also. A semantic deletion error can occur without a syntactic deletion error. For local deformation, a substitution error can take place. (5) A syntactic substitution error is denned as the replacement of primitive s by another primitive t, i.e., (s,y) ->
(t,y).
A semantic substitution error is not denned, because it may be counted as a semantic insertion or deletion error.
4.5.
I N F E R E N C E OF A T T R I B U T E D G R A M M A R
An attributed context-free grammar is a 4-tuple G = (Vn,Vt,P,S), where Vn is the nonterminal set, Vt is the terminal set, P is a finite set of pro duction rules, and S e Vn is the starting symbol. In P, each production rule contains two parts: one is a syntactic rule, and the other is a semantic rule. Each symbol X £ (Vn U Vt) is associated with a finite set of attributes A(X); A(X) is partitioned into two disjoint sets, the synthesized attribute set Ao(X) and the inherited attribute set A\{X). The syntactic rule has the following form: Xkfi -+ Xk,lXk£ ■ ■ ■ Xk,nk ,
INFERENCE OF ATTRIBUTED
GRAMMAR
43
where k means the fcth production rule. The semantic rule maps values to the attributes of Xk,o,Xk,i,Xkt2y ■ ■ ,Xk,Uk- The evaluation of synthesized attributes is based on the attributes of the descendants; therefore it pro ceeds in bottom-up fashion. On the other hand, the evaluation of inherited attributes is based on the attributes of the ancestors; therefore it proceeds in top-down fashion. To explain the inference procedure, let us consider the example of the previous string aaadgggggeeaaagg, where each primitive has a unit length attribute 1. First, it will be converted into the following string by merging identical primitives: (a,3)(
Semantic rules
(1) 5 ->
L(A!) = 3, L(D) = 1, L ( d ) = 5, L(E) = 2, L(A2) = 3, L(G2) = 2
(2)
ADGEAG
A->aA
y{Ax) = y(a) + y(A2)
(3) A -> a
y(A) = y(a)
(4)
D-tdD
y(D1)^y(d)
(5)
D^d
y(D) = y{d)
(6)
E^reE
y(E1)=y(e)+y(E2)
(7)
E->e
+ y(D2)
y(E) = y(e)
(8) G -+ gG
y(Gl) = y(g) + y(G2)
&)G-+g
y(G) = y(g)
where L denotes the inherited length attribute, and y denotes the synthe sized length attribute. The number right after the nonterminal symbol in the semantic rules is used to distinguish between occurrences of the same nonterminal. For example, in the production rule (2), A\ represents the nonterminal A on the left side; A2 represents the nonterminal A on
ATTRIBUTED
44
GRAMMAR AND
ERROR-CORRECTING...
the right side of the syntactic part. It is noted that the inherited length attribute, L, is not down to the descendents as it usually is; rather it is used to maintain the semantic information of the training string and as a reference for comparison in parsing. For simplicity, let y(a) = 1 for all a £ Vf. Consider the second input string
aakkdddffeeeea. We convert it into (a,2)(M)(d,3)(/,2)(e,4)(a,l) and add the following productions to the inference grammar.
Syntactic rules
Semantic rules
5 -> AKDFEA L(Ai) = 2, L(K) = 2, L(D) = 3, K^kK K->k F^fF F->f
L(F) = 2, L(E) = 4, L{A2) = 1 y(K1)=y(k) + y(K2) y(K) = y(k) y(F1) = y(f) + y(F2) y(F)=y(f)
For the new input string, there will be no need to add those production rules, A —> a A, A —► a,..., etc. One production rule is created for each input string, i.e., the first production rule in the above example. In fact, there are (2m + n) production rules for a set of n training strings, where m is the number of nonterminal symbols. We now formulate the inference algorithm of attributed grammar which uses the length attribute [80]. Algorithm 4.1
Inference of an attributed context-free grammar
Input: A set of training strings. Output: An inferred attributed context-free grammar. Method: (1) Convert each input string to the attributed string by merging identical primitives.
MINIMUM-DISTANCE ERROR-CORRECTING EARLEY'S ...
45
(2) For each input attributed string 0.10,20,3 ■ ■ ■ a,}., add the production S —¥ A1A2A3 ■ ■ ■ Ak to the inference grammar, where Ai is the nonterminal corresponding to terminal a;, and the semantic rule L(Ai) = j/i, 1 < i < k, where yi is the length attribute of primitive aA, y(Ai) — y(a) + 2/(^2) and A -> a, y(A) = y(a) to the inference grammar, if they are the new production rules.
This inferred grammar will generate excessive strings if we apply syn tactic rules only. However, we can use semantic rules (inherited attributes) to restrict the grammar so that no excessive strings are generated.
4.6.
MINIMUM-DISTANCE ERROR-CORRECTING EARLEY'S PARSING FOR A T T R I B U T E D STRING
A modified Earley's parsing algorithm is here developed for attributed context-free languages. Here, errors of insertion, deletion, and substitu tion transformation are all considered in the derivation of Earley's item lists. Let the attributed grammar G = (Vn,Vt,P, S) be a CFG (contextfree grammar), and let z = &1&2 • • • °n be an input string in Vt*. The form [A —> ct»P, x, y, i] is called an item for z if A —¥ a/3 is a production in P and 0 < i < n [2, 32]. The dot in a • (3 between a and f3 is a meta-symbol not in Vn or Vt, which represents the parsing position; x is a counter for local syntactic deformation which accumulates the total cost of substitution of terminal symbols. When A = S, y is used as a counter for global deforma tion which records the total cost of insertion and deletion errors. On the other hand, if A ^ S, then y is used as the synthesized attribute of A. The meaning of index i is the starting parsing position of the string, and it is the same pointer as the conventional item of Earley's parsing algorithm. The parsing algorithm for an input string z is shown in the following. Algorithm 4.2 Minimum-distance an attributed string
error-correcting Earley's parsing for
ATTRIBUTED
46
Input:
GRAMMAR AND
ERROR-CORRECTING...
An attributed grammar G= (Vn, Vt,P,S) and a test string z = b\ b2 ■ ■ ■ bn in Vt*
Output: The parse lists IQ, I\,...,In, and the decision whether or not z is accepted by the grammar G together with the syntactic and semantic deformation distances. Method: (1) Set j = 0 and add [S -> • a, 0,0,0] to Ij if S —> a is a production in P. (2) Repeat steps (3), (4) and (5) until no new items can be added to Ij. (3) If [B —> £ •, xi,yi,i] is in Ij,B ^ S, and (a) if [A —> a • B(3, x2,y2, k] is in h and A ^ S, then add item [A —> a,B • f3,xi +x2,yi +V2,k] to Ij. (b) if [S* —y a • Bfi, x2, y2, k] is in Ii, then add item [S -¥ aB • (3,x\ + x2,V2 + \L{B) -yi\,k] to Ij. (c) if [S —> a • Cf3,x2,y2,k] is in i,, C ^ B, then add item [S1 —^ a • C/3, x2,yi + y2, k] to Ij. (d) if [S —>• a •, a;2, j/2, A] is in Ii, then add item [S —>■ a •, x2, y\ + y2, k] to Ij. (4) If S —> ^ is a production in P, and if [A —> a • B(3, x, y, i] is in Ij, then add item [B -> • £ , 0,0, j] to Ij. (5) If [5 -4 a»B/3, x,y,i] is in i j , then add item [5 —>■ aB»/3, x,y + L(B),i) to i j . (6) If j" = n, go to step (8); otherwise increase j to j + 1. (7) For each item [A —>■ a • a/3, x, y, ij in i j - i , add item [A —J- a a • /3, x + ^(a, 6j), y + y(a),«] to Ij, where y{a) is the synthesized attribute of a. For simplicity, let y(a) = 1 for all a in VJ. S(a, bj) is the substitution cost, and S(a,a) = 0. Go to step (2). (8) If item [S —> a»,x,y,0] is in In, then string z is accepted by grammar G where x is the local deformation distance, and y is the global de formation distance; otherwise, string z is not accepted by grammar G. Exit. In the above algorithm, step (3b) handles the semantic insertion and deletion errors, steps (3c) and (3d) handle the syntactic insertion errors, step (5) handles the syntactic deletion errors, and step (7) handles the syn tactic substitution errors. It is possible for collision to occur in the process of developing a new item; i.e., the old item has already been in the list
EXPERIMENT
47
when a new item is to be put in the list. Under this situation, the one with the minimum distance (minimum of x + y) is selected for that item. Actually, collision may occur with only the items related to 5-productions, because insertion and deletion transformations are allowed for those items only. Since the error-correcting grammar is ambiguous, the time complexity is 0(n3), and the space complexity is 0(n2), where n is the length of the input string. The parsing is inefficient if the length of the input string is large. 4.7.
EXPERIMENT
Given a training string abbe. Using Algorithm 4.1, the inferred attributed grammar is shown in Fig. 4.2. An input test string aabc is parsed by Algorithm 4.2. The corresponding item lists are shown in Fig. 4.3. As we
Training string : abbe The attributed grammar G(Vn,Vt,P,S) is as follows : Vn = {S,A,B,C} Vt={a,b,c) S ={S} The production set P is as follows : Syntactic rules Semantic rules S-^ABC A->aA A—>a B->bB B->b C->cC C->c
Fig. 4.2.
L(A) = 1 L(B) = 2 U.Q = 1 y(A) = y(a)+y(A) y(A) = y(a) y(B)=y(b)+y(B) y(B)=y{b) y(Q = y(c) + y(Q y(Q = y(c)
Training string abbe and its inferred atributed grammar for Earley's parsing.
48
ATTRIBUTED GRAMMAR AND
ERROR-CORRECTING...
Test string aabc and its item lists I[0] contains [S-+ABC* [C->»c [C^fcC [S^>AB*C [B->»b [B^'bB [S->A»BC [A—>»a [A-^faA [S^'ABC
,0,4,0] ,0,0,0] ,0,0,0] ,0,3,0] ,0,0,0] ,0,0,0] ,0,1,0] ,0,0,0] ,0,0,0] ,0,0,0]
I [4] contains [C->cC* [B->bB» [A—»oA» [A-^aA* [B->bB»
[C->cO [C-»c [C-^'cC [B->»b [B->«6B [A->«a [A-^aA» [C->cO [B->kB« [S->»ABC [A-*aA* [S-+ABC* [S^>AB*C [S->A'BC [A-^>a»A [A->a»
Mil
I[3] contains [A->aA» ,1,3,0] [B^,bB» ,2,3,0] ,3,3,0] [C->cO ,0,0,3] [C-*»c ,0,0,3] [C^'cC ,0,0,3] [S-»£ [B-^'bB ,0,0,3] [A->»a ,0,0,3] [A—>*aA ,0,0,3] [C-^cC* ,2,2,1] [B^>bB* ,1,2,1] ,0,3,0] [S-fABC [A-)aA» ,1,2,1] ,1,0,0] [S-»Afl»C [S^ABC* ,1,1,0] ,0,2,0] [S^>A*BC [A-^a*A ,1,1,2] [A—>a* ,1,1,2] [B->b*B ,0,1,2] [B->6« ,0,1,2] [C->c»C ,1,1,2] [C->c» ,1,1,2]
I[l] contain, [C->»c [C^»cC [£->•* [B->.W? [A-»«a [A->*aA [S->ABC [S^AmBC [S^AB»C [S-+ABC* [A-*a*A [A-*a» [B^b'B [£->£■• [C-^cC [C^>c
I[2] contains ,0,0,1] ,0,0,1] ,0,0,1] ,0,0,1] ,0,0,1] ,0,0,1] ,0,1,0] ,0,0,0] ,0,2,0] ,0,3,0] ,0,1,0] ,0,1,0] ,1,1,0] ,1,1,0] ,1,1,0] ,1,1,0]
[C^'c [C->*cC [£->•& [B-**bB [A->»a [A^>*aA [C->cO [B-+bB» [S-*»ABC [A-^aA* [S^ABC* [S^>AB»C [S^A'BC [A-»a»A [i4-»a» [B-)b»B [B->6. [C->c*C [C-^c»
,0,0,2] ,0,0,2] ,0,0,2] ,0,0,2] ,0,0,2] ,0,0,2] ,2,2,0] ,2,2,0] ,0,2,0] ,0,2,0] ,1,2,0] ,1,1,0] ,0,1,0] ,0,1,1] ,0,1,1] ,1,1,1] ,1,1,1] ,1,1,1] ,1,1,1]
,3,4,0] ,3,4,0] ,2,4,0] ,2,3,1] ,2,3,1] ,2,3,1] ,0,0,4] ,0,0,4] ,0,0,4] ,0,0,4] ,0,0,4] ,0,0,4] ,1,2,2] ,1,2,2] ,0,4,0] ,2,2,2] ,1,0,0] ,1,1,0] ,0,3,0] ,1,1,3] ,1,1,3] ,0,1,3] ,0,1,3] ,1,1,3] ,1,1,31
Fig. 4.3. Item lists of the Earley's attributed parsing on the test string aabc.
EXPERIMENT
49
can see from the derived item lists, the three kinds of errors are considered. The corresponding items are generated for each possible error transforma tion. Because the item [S -> ABC •, 1,0,0] is in I± list, the string aabc is accepted with local syntactic deformation distance 1 and global deformation distance 0.
This page is intentionally left blank
Chapter 5
ATTRIBUTED G R A M M A R AND M A T C H P R I M I T I V E MEASURE ( M P M ) FOR R E C O G N I T I O N OF SEISMIC WAVELETS
5.1.
SUMMARY
The computation of the match primitive measure between two attributed strings using dynamic programming is proposed. The MPM parsing algorithm for an attributed string can handle three types of error. The MPM parsing algorithm is obtained from the computation between the in put string and the string generated by the attributed grammar. The MPM parsing is more efficient than the modified Earley's parsing. The recognition criterion of the MPM parsing algorithm is "maximum-matching". The Earley's parsing and MPM parsing methods are applied to the recognition of seismic Ricker wavelets and the recognition of wavelets in real seismic data.
5.2.
SIMILARITY M E A S U R E OF A T T R I B U T E D S T R I N G MATCHING
Although the modified Earley's parsing algorithm considers all three types of errors, the parsing is inefficient. Here, the parsing of an attributed string using the match primitive measure (MPM) is proposed. The similarity measure between two attributed strings is proposed and discussed in the following. The match primitive measure (MPM) is defined as the maximum num ber of matched primitives between two strings. The computation of the MPM between two length-attributed strings can be implemented by the 51
52
ATTRIBUTED
GRAMMAR AND MATCH PRIMITIVE
...
dynamic programming technique on grid nodes as shown in Fig. 5.1. For each node, three attributes are associated, i.e., (f,h,v). Let a be an attributed string, where a[i] denotes the ith primitive in a; a[i].s and a[i].y denote t h e syntactic symbol and length attribute of a[i], respec tively. Let (i,j) indicate the position in the grid. f[i,j] represents the MPM value from point (0, 0) to (i,j), i.e., the MPM value between two attributed substrings (a[l].s,a[l].y)(a[2].s, a[2].j/) • • -(a[i].s,a[i].y) and (b[l}.s,b[l].y)(b[2].s,b[2}.y)---(b[j}.s,b[j}.y) of attributed strings a and b. h[i,j] and v[i,j] represent the residual length attributes of primitive a[i] and b[j], respectively, after the match primitive measure (MPM) between two attributed substrings (a[l].s,a[l].j/)(o[2].s, a[2].y) • ■ -(a[i].s,a[i].y) and (b[l}.s,b[l}.y)(b[2}.s,b[2].y)---(b{j}.s,b[j].y) of attributed strings a and b. The partial MPM f[i,j] can be computed from the partial MPM's f[i — l,j] and f[i, j — 1] as shown in Fig. 5.1. The following algorithm is proposed to compute the MPM between two attributed strings.
Ai-hj-l]
fiij-l] •O-
fii-hfl O
Fig. 5.1. Partial MPM f[i,j] computed from f[i,j — 1] and f[i — l,j].
Algorithm 5.1 Computation of the match primitive measure between two attributed strings Input:
(MPM)
Two attributed strings a and b. Let a = (a[l].s,a[l].y)(a[2].s,a[2].y) ■ • • (a[m].s,a[m].y) b = (b[l].s, b[l].y)(b[2}.s, b[2].y) ■ ■ ■ (b[n].s, b[n).y) where m, n are the number of primitives of a and b, respectively. Output: The maximum MPM S(a, b).
SIMILARITY
MEASURE OF ATTRIBUTED STRING MATCHING
53
Method: (1) /[0,0]:=0;/i[0,0j:=0;t;[0,0]:=0; (2) for i:— 1 to m do begin /[t,0]:=0; h[i,0] :=a[i].y; v[i,0}:=0; end; (3) for j := 1 to n do begin /[0,j]:=0; h[0,j]:=0; v[0,j] :=b[j}.y; end; (4) for i := 1 to m do begin for j := 1 to n do begin nodi := hmove(z, j) nod2 := vmove(j, j) if n o d i . / > nod2./ then nodefi, j] : = nodi else node [i, j] := nod2; end; (5) Output S{a,b):= f[m,n\/ y/yi x y2; where 2/i = Eia[il-J/>2/2 = E i 6l?1-2/ Functions hmove and vmove are written as follows: function hmove(i, j): node_type; {node(t'-l, j) -*• node (i,j)} begin if a[i].s 7^ b[j}.s then eW := 0; else dt := min(v[z — 1, ji],a[i].j/); hmove./ := f[i - 1, j] + oK; hmove./i := a[i].y — d£; hmove.'u := u[i - 1, j] - di; return(hmove); end; function vmove(i, j): node.type; {node(i, j - 1) ->■ node (i, j)} begin if a[i].s ^ b[j].s then d^ := 0;
ATTRIBUTED
54
GRAMMAR
AND MATCH PRIMITIVE
...
else dl := min(/i[i, j - 1], b[j].y); + d£; vmove./ = f[i,j-l} vmove./i = h[i,j -1] -d£; vmove. v = b[j].y - dlreturn(vmove); end; In the above algorithm, two functions are used. Function hmove is used to compute the variation of attributes (/, h,v) from node (i — l,j) to node (i, j). Function vmove is used to get the value of (/, h, v) at point (i,j) from point (i, j — 1). An example of the MPM computation of two attributed strings is shown in Table 5.1. The normalized MPM value is calculated. Table 5.1. Calculation of the MPM between two attributed strings a and b, where a = (a, 3)(d, l)(c, 5)(h, 2)(a, 3)(g, 2). 6 = (a, 4)(c, 2)(d, 2)(c, 4)(h, l)(g, 3). Two strings:
a = {a,3)(d, l)(c, 5)(h, 2)(o, 3)(g, 2) 6 = (a,4)(c,2)(d,2)(c,4)(h,l)( 9 > 3)
/ v' 0 0 (a,4) 1 (c.2) 2 * (cf,2) 3 (c,4) 4 (M) 5 (5.3) 6
(a,3) (of,1) (c,5) (/7,2) (a,3) (0,2) 1 2 3 4 5 6
f=0
f=0
f=0
f=0
f=0
f=0
f=0
/) = 0
fc = 3
ft=1
h=5
/i = 3
h=2
v=0
v=0
v=0
v=0
h=2 v= 0
v=0
v=0
/=0
f=3
f=3
/=3
1=3
f=4
f=4
h=0
/i=0 i/= 1
/i = 1 i/=1
h=5 i/=1
h=2
h=2 i/=n
h=2 i/=n
v=4
v=t
f=0
f=3
f=3
f=5
f=5
/=5
f=5
h=0 v= 2
h=0
/i=1
h=3
h=2
h=2
v=2
y=2
v=0
v=0
h=3 v=0
f=0
f=3
f=4
f=5
f=5
f=5
f=5
h=0 v=2
h=0
/) = 0
/j = 3
v=1
v=2
h=2 v= 2
h=0 v= 2
h=2
v=2
f=0
f=3
/=4
/=8
f=8
f=8
f=8
h=0
h=0
ft = 0
h=0
v=4
v=4
v=4
v=1
h=2 v=-\
h=3 v=-\
h=2 v= 1
v=0
v=2
/=0
f=3
f=4
f=8
f=9
1=9
/=9
/) = 0
h=0
/1 = 0
h=0
h=1
ft = 2
v=\
v=1
v=1
^=1
v= 0
h=3 v=0
/=6
(=3
f=4
t=8
f=9
/=9
Mi
ft = 0
h=0
h=0
ft = 0
ft=1
v=3
v=3
v=3
v=3
/! = 3 v= 3
/i = 0
v=3
s(a, b) = l W l 6 x l 6 = 0.6875
v=0
v=1
INFERENCE
5.3.
OF ATTRIBUTED
55
GRAMMAR
I N F E R E N C E OF A T T R I B U T E D G R A M M A R
For the parsing of an attributed string using the property of the MPM, the attributed grammar for the training strings is inferred first. The inference procedure of an attributed grammar is similar to Algorithm 4.1 and is described below. Algorithm 5.2
Inference of attributed grammar
Input: A set of training strings. Output: An attributed grammar. Method: (1) Convert each input string to the attributed string by merging identical primitives. (2) For each input attributed string 0102^3 • • • a^, add to the grammar the production S -> A1A2A3 ■ ■ -Ah, where Ai is the nonterminal corres ponding to terminal a,; and the semantic rule L(Ai) = yi, 1 < i < k, where j/j is the length attribute of primitive a*. (3) For each primitive a, add the production rule A -» a, y(A) = y(a) and y(a) — y, where y is the length attribute of primitive a. The example is shown in Fig. 5.2.
Training string : abbe The attributed grammar G(Vn,Vt,P,S) is as follows : Vn = {A, B, C, S] Vt={a,b,c] S ={S] The production set P is as follows Syntactic rules Semantic rules
Fig. 5.2.
S-> ABC
■-\L(B)--= 2 L(Q = 1 UA) =
A-^a B^b C^c
y(A)-- ■-y(a), y(B) =--y(b), y(Q =--y(c),
y(a) = 1 y(b) = 2 y{c) = 1
Training string abbe and its inferred attributed grammar for the MPM parsing.
ATTRIBUTED
56
5.4.
GRAMMAR.AND MATCH PRIMITIVE
...
T O P - D O W N PARSING U S I N G M P M
Given a n attributed grammar G and input attributed string z, the value of the MPM between z and L(G), the language generated by the grammar G, is calculated. Consider an 5-production rule in the grammar, which has the form S -> AiA2Az
■ ■ ■ Am .
For each nonterminal at the right-hand side of 5-production rule, two at tributes are associated with it. f[k] denotes the MPM value calculated from the beginning up to the parse of fcth nonterminal. h[k] is a kind of residual attribute used for the calculation later on. The proposed algorithm to compute the MPM between z and L{G) is described in the following. Algorithm 5.3
Top-down parsing using the MPM
Input:
An attributed grammar G = (Vn, Vt, P, 5) and an input string z. Let m = the number of primitives in z. n = the length of z = J2t zW-2/Output: The maximum MPM between z and L(G). Method: (1) (2) (3) (4)
(5)
(6) (7) (8)
Set N = the number of 5-production rules, and max_MPM = 0. Set /[0] = 0 and h[0] = 0. For all 1 < k < N do steps (4) to (10). Apply the fcth ^-production rule with the form Sk -» Ak,iAk,2 ■ ■ ■ Afc,mfc, where mk is the number of nonterminals at the right-hand side of the &th 5-production rule to do steps (5) to (8). For all 1 < i < m^ do { f\i} = 0; h[i] = L(Akli); }. For all 1 < j < m do steps (7) and (8). Set VQ = z[j].y and v = VQ. For all 1 < i < mk do Apply production rule Akyi -» a,k,i(a) if z[j].s = a,k,i, then dt - min(y(afcii), v) else d£ = 0; fi=f[i-l]+dl;
TOP-DOWN
PARSING
USING
57
MPM
hi = y(ak,i) - di; vi = v — d£; (b) if z[j].s = a,k,i, then di = min(/i[i], vo) else cK = 0;
/ 2 = /[*]+<#; ft2 = h[i] - di; V2 = v0— di; (c) if / i > f2 then { /[<] = / i ; /i[i] = hi; v-vi; } else{ f[i] = / 2 ; ft[i] = /i 2 ; v = v2; } (9) MPM = f[mk]/V^h, where Zfc = JJJi L(Ak,^. (10) If MPM > max_MPM, then max_MPM = MPM. (11) Output max_MPM. Here the normalized MPM is calculated. Algorithm 5.3 is obtained from the comparison between the input string and the string generated by the 5-production rule. Example 5.1 The training string abbe and its inferred attributed gram mar are shown in Fig. 5.2. One input string aabc has been tested, and the parsing result is shown in Table 5.2. The MPM value is 0.75 after normalization. Table 5.2.
Parsing result for the test string aabc; the MPM value is 0.75. Test string:aa6c = (a, 2)(6, l)(c, 1)
it
1
2
3
1 1
0 1
0 1
0 1
0 1
1 1
2 3
1 1
2 2
2 3
max_MPM = 3 tf%x%~ = 0.75
58
ATTRIBUTED
5.5.
GRAMMAR
AND MATCH
PRIMITIVE
...
E X P E R I M E N T S OF SEISMIC PATTERN RECOGNITION
5.5.1.
Recognition
of Seismic
Ricker
Wavelets
Since the seismic wavelets have structural information, it is natural to adopt a syntactic approach in seismic pattern analysis [58, 59, 62, 6466]. The above proposed Earley's parsing in Chapter 4 and MPM parsing in Chapter 5 are used for the classification of seismic Ricker wavelets. A block diagram of the classification system is shown in Fig. 4.1. Twenty eight zero-phase Ricker wavelets of eight classes with Gaussian noise are generated in the simulated seismic traces in Fig. 3.2. Eight class wavelets with different frequencies and reflection coefficients are selected as the train ing patterns and listed in Table 3.1. Each wavelet in seismic trace and its corresponding class is listed in Table 3.2. The signal of each seismic trace is converted into a syntactic string of primitives. The sampling interval is 1 ms. Each segment is assigned as a primitive. The modified Freeman's chain code is used. Since the one dimensional seismic data are processed, nine primitives with rightward direction are defined in Chapter 3. The strings of the eight training Ricker wavelets corrupted by noise are listed in Table 3.1. In Earley's parsing approach, the attributed grammar for each class is inferred from the eight training strings by using Algorithm 4.1. The recognition rate and average cpu time of the modified Earley's parsing by using Algorithm 4.2 are listed in Table 5.3. The computer is VAX 11/780. The parsing speed is not good due to the inefficiency of Algorithm 4.2. However, the recognition rate is good. Table 5.3. The average parsing time and percentage of correct classification of the classifier using the attributed error-correcting Earley parser (Algorithm 4.2). Average CPU time for one string (sec) 1,974.8
Percentage of correct classification (25/28)
89.29%
In the MPM parsing approach, the same set of test data are used again. The attributed grammar is inferred by using Algorithm 5.2. Four cases with different weighting of errors are listed in Table 5.4. The recognition
EXPERIMENTS
OF SEISMIC
Table 5.4.
PATTERN
RECOGNITION
59
The four different cases and the descriptions for each case.
Case no.
Case descriptions
1
Normalized mpm. Estimating the number of substitution errors by use of an upper bound. Estimating the number of insertion and deletion errors by use of lower bound.
2
Nonequal weights are assigned to insertion, deletion and substitution transformations. weight(insertion) = 2.0 weight (deletion) = 2.0 weight(substitution) = 1.0 Estimating the number of substitution errors by use of a lower bound. Estimating the number of insertion and deletion errors by use of an upper bound.
3
Nonequal weights are assigned to insertion, deletion and substitution transformations. weight(insertion) = 2.0 weight(deletion) = 2.0 weight (substitution) = 1.0 Estimating the number of substitution errors and the number of insertion and deletion errors by use of the averages of upper bound and lower bounds.
4
Nonequal weights are assigned to insertion, deletion and substitution transformations. weight (insertion) = 2.0 weight (deletion) = 2.0 weight (substitution) = 1.0
Table 5.5. The average parsing time and percentage of correct classification of the attributed-grammar parser using Algorithm 5.3 for four different cases.
Case no.
Average CPU time for one string (sec)
1 2 3 4
0.244 0.244 0.243 0.242
Percentage of correct classification 21/28 23/28 17/28 21/28
75.00% 82.14% 60.71% 75.00%
60
ATTRIBUTED
GRAMMAR AND MATCH PRIMITIVE
...
rate and the cpu time of the MPM parsing by using Algorithm 5.3 are listed in Table 5.5. The parsing speed is fast. Nevertheless, the recognition rate is not better than that of Earley's parsing. If the method of estimating distance is used, the recognition rate is improved; especially for Case 2. In Case 2, some of the insertion or deletion errors are counted as substi tution errors. In Case 3, all of the substitution errors are considered as insertion and deletion errors. Due to the effect of noise, substitution errors much more easily occurred than the insertion and deletion errors. Insertion and deletion errors are more important than the substitution errors in our study.
5.5.2.
Recognition
of Wavelets
in Real
Seismogram
A real seismogram in the Mississippi Canyon is studied for the classification of wavelets and is shown in Fig. 5.3. The selected training traces are the 6th, 14th, 22nd, 30th, 38th, 46th, 54th, and 62nd traces. The wavelets are extracted through the process of peak detection and wavelet determination. The training samples are clustered into six classes by using the hierarchical JO
2D
30
"O
50
»
Fig. 5.3. Real seismogram at Mississippi Canyon.
EXPERIMENTS
OF SEISMIC »
PATTERN 2D
RECOGNITION 30
«
50
61 »
D.OSE
1.0SEC
2 0 SEC
Fig. 5.4(a). Canyon.
The detected waveforms of the 1st class in the seismogram from Mississippi
ID
2D
30
50
»
fcOSEC
USE ■
2* EEC • Fig. 5.4(b). Canyon.
The detected waveforms of the 2nd class in the seismogram from Mississippi
ATTRIBUTED
62
»
»
GRAMMAR
AND MATCH
»
»
«
PRIMITIVE
.
»
6.0 SEC
1.0 ££C
2 0 SEE
Fig. 5.4(c). Canyon.
The detected waveforms of the 3rd class in the seismogram from Mississippi
»
IB
30
«
50
60
0.0 5£C
1.0 SEC -
IP 2 0 2C
Fig. 5.4(d). Canyon.
The detected waveforms of the 4th class in the seismogram from Mississippi
EXPERIMENTS
OF SEISMIC
»
PATTERN
RECOGNITION
»
»
SO
63
»
CLCSEC
MSEC
iosr Fig. 5.4(e). Canyon.
The detected waveforms of the 5th class in the seismogram from Mississippi
» CLO
20
»
«
50
»
EEC
1.CEEC ■
10 5EC • Fig. 5.4(f). Canyon.
T h e detected waveforms of the 6th class in the seismogram from Mississippi
64
ATTRIBUTED GRAMMAR AND MATCH PRIMITIVE
...
dendrogram. Because of the better recognition rate in the above analysis, the modified Earley's parsing is chosen in this process. After the attributed parsing, the classification results are shown in Fig. 5.4(a)-5.4(f). 5.6.
CONCLUSIONS
Using an attribute, it is possible to reduce the length of pattern string and the size of an inferred attributed grammar. Due to the repetition of primitives in the string representation of wavelets, attributed strings and grammar are a better way to describe the wavelets. The computation of the match primitive measure between two attributed strings using dynamic programming has been proposed. The MPM parsing algorithm has been obtained by modifying the comparison between two attributed strings to the comparison between an input string and the strings generated by the attributed grammar. The recognition criterion of the modified Earley's algorithm is minimum-distance, but the recognition criterion of the MPM parsing algorithm is maximum-matching. The parsing methods have been applied to the recognition of seismic Ricker wavelets and the recognition of wavelets in real seismic data. The recognition results can improve seismic interpretation.
Chapter 6
STRING DISTANCE AND LIKELIHOOD RATIO TEST FOR DETECTION OF CANDIDATE BRIGHT SPOT
6.1.
SUMMARY
Syntactic pattern recognition techniques are applied to the classification of wavelets in the seismograms. The system for one-dimensional seis mic analysis includes a likelihood ratio test, optimal amplitude-dependent encoding, probability of detecting the signal involved in the global and local detection, plus minimum-distance and nearest-neighbor classification rules. The relation between error probability and Levenshtein distance is proposed. 6.2.
INTRODUCTION
In a seismogram, the wavelets of a bright spot have a specific structure. So syntactic pattern recognition is proposed to detect the candidate bright spot trace by trace. A block diagram of the 1-D syntactic pattern recognition system for the detection of candidate bright spot is shown in Fig. 6.1. The characteristic of 1-D syntactic approach is the string matching in the seismic trace. For the detection of candidate bright spots, testing traces are selected from the input seismogram and tree classification techniques are used in the detection of candidate bright spots [56, 57]. From the detected candidate bright spot, the sample patterns of the wavelets are extracted. Amplitudedependent encodings of optimal quantization are used. The global detection is to detect the possible wavelets. Levenshtein distance [75] is computed 65
66
STRING
DISTANCE
Global. Input seismo ram
Optimal quantization encoding
String
detection: extract possible wavelets
AND LIKELIHOOD
-
distance computation
Thresholding to detect candidate bright spots
RATIO
TEST
...
Classification result
Recognition ' Testing i
Tree classification tobright spot pattern s
1 Optimal quantization
Encoded pattern 3f J br ght s pots
Fig. 6.1. A block diagram of syntactic pattern recognition system (for the detection of candidate bright spot in a seismogram).
between the possible wavelet string and the extracted strings of bright spots. The local detection is to extract the candidate wavelet. Using the proba bility of detection, a threshold is set to detect the candidate bright spot. The system is used to detect the candidate bright spot, trace by trace, in the real seismograms of Mississippi Canyon and High Island.
6.3.
OPTIMAL QUANTIZATION E N C O D I N G
Initially, the seismic trace is encoded by amplitude-dependent encoding. Assign the ith pair of waveform points [(a?i, 2/i), (a;»+i, 2/i+i] to the sym bol Wi denoting the slope characteristic of the line segment joining the two points. Let di = j/j + i — yi. For the amplitude-dependent encoding, the assignment of di = yi+i — yi to a symbol is a quantization problem. The optimal quantization of 8 levels for the Gaussian samples is used [47]. From the experiments described here, if the standard deviation as of di from the signal is larger than 1.5an of di from noise, then an 8level optimal quantization is good. The pattern primitives are defined as follows.
LIKELIHOOD RATIO TEST (LRT)
67
Wi = d
for
di > 1.76(7,
cs = 2.15
Wi = c
for
1.05 < d» < 1.76cr,
Wi = b
for
0.5 < di < 1.05cr,
Wi = a
for
0.0 < di < 0.5a,
c 5 = 0.24a
Wi=A
for
- 0.5
,
Wi = B
for
- 1.05 < di < -0.5t7 ,
Wi = C
for
- 1.76
Wi = D
for
di < 1.76a,
O.OCT
c7 = 1.34<7 C6 = 6.75<7
c4 = -0.24
-1.05cr,
c3 = -0.75<7 c2 = -1.34er
Ci = —2.15<7
where a is the standard deviation of di = T/J+I —J/J distribution for the signal and Ci,i = 1, 2 , . . . , 8 are the conditional sample-mean value of each interval. The extracted pattern samples of bright spots in the real seismograms at Mississippi Canyon and High Island are shown in Table 6.1. Table 6.1.
The encoded string of the extracted patterns. String representation
Pattern For Mississippi Canyon (1) (2) (3) (4)
Pattern Pattern' Pattern Pattern
from from from from
the the the the
13th 21st 29th 37th
trace trace trace trace
BDCbcddB ABBAAbdc BDCaddbc ACCAcddc
the the the the
45th 50th 51st 53rd
trace trace trace trace
BBAbccaBBA BBBacdbBCB BBAbdcaCCB BBAbdcaBBA
For High Island (1) (2) (3) (4)
6.4.
Pattern Pattern Pattern Pattern
from from from from
LIKELIHOOD RATIO TEST (LRT)
The distribution of the differences di = t/i + i — yi} i = 1,2, ...,JV, of the extracted bright spot sample is a Gaussian distribution with N(0,<J^). Assume dj is corrupted by Gaussian noise ./V(0, <7^) and the signal and the noise are independent Gaussian. In the detection problem, there are two
68
STRING DISTANCE AND LIKELIHOOD RATIO TEST ...
hypotheses. One is the signal plus noise, N(0,a2 + &%), the other is the noise only, N(0, a%). For the following hypotheses:
HQ : Gassian noise P{r) =
1 \-r2' .—— exp —5V27TCTn
L2c7n.
1 \-r2~\ H\ : Signal + noise P(r) — —==— exp —=• , V 27T(7i
[
2<J
where a\ = o\ + a2
1 J
Let the Likelihood Ratio Test (LRT) [108] be
A{r}
p(r/Ho) n^i
2
[a2
a2)_
'
If A(r) > r), then H\ is true. If A(r) < 77, then H0 is true. Taking In on both sides and rearranging the terms, we obtain the following result. Let ln7?-ln(^)_^
If r2 > [I2, then Hi is true. If r2 < (32, then H0 is true, i.e., If \r\ > /3, then Hi is true. If |r| < {3, then Ho is true. Suppose that Ho and Hi are equal probable, then 7} = 1, which is used in the experiments.
6.5.
L E V E N S H T E I N DISTANCE A N D ERROR PROBABILITY
Levenshtein distance between two strings is denned as the minimum number of symbol insertions, deletions, and substitutions required to transform one string into the other string [41]. The distance calculation is based on the error transformations. Combining the idea of detection theory and Levenshtein distance, the error (missing and false) probabilities using 8level quantized value can be calculated. The example of the calculation is in the following experiment.
EXPERIMENT AT MISSISSIPPI
6.6.
CANYON
69
E X P E R I M E N T AT MISSISSIPPI C A N Y O N
1'he real seismogram at Mississippi Canyon in Fig. 6.2 is processed. Tree classification with partitioning-method is used. The result of this prepro cessing for locating the wavelets is shown in Fig. 6.3(a). Test traces are the 5th, 13th, 21st, 29th, 37th, 45th, 53rd, and 61st traces. The selected samples of detected candidate bright spot are on the 13th, 21st, 29th, and 37th traces in Fig. 6.3(a). The smallest variance of di distribution on the 21st trace is used in the analysis of signal and noise. The distribution of of di the extracted samples (signal plus noise) of the 21st trace is Gaus sian with JV(0, a\) and a = 0.055478. The distribution of di for noise only on the 1st trace is Gaussian with N(0,
60 50 10 30 20 tO 0 .OSgc 'ii 111 M u I H 111 111 11 11 M I 1 1 I I I 111 111 11 111 11 111 11 111 11 M i u I I n 11 il
Fig. 6.2. Real seismogram at Mississippi Canyon (negative on the right).
70
STRING
DISTANCE
AND LIKELIHOOD
37
O.OSec
29
RATIO
21
13
•
.
TEST .
t.OSec
'
|
:
«>
2.0S*cFig. 6.3(a).
Tree classification result of bright spots.
/<«) 20.00 to.oo 0-000 1879
-.1252
-.0626
.0000
0626
Fig. 6.3(b). 8-level optimal quantization encoding for di of bright spot pattern of Mississippi Canyon data.
EXPERIMENT AT MISSISSIPPI
71
CANYON
PD = .9319
-.\er*
1252
-.0626
• 0000
.0626
Fig. 6.3(c). Probability of detection using the conditional mean of each interval in bright spot pattern.
and as = 0.053673. The 8-level optimal quantization encoding is deter mined by using as. Four extracted pattern samples from above four traces can he encoded as strings of candidate bright spot patterns. Four strings have different lengths. In order to be conveniently calculated in global and local detections, 8 symbols of each string are selected, i.e., minimum of four sample lengths and selecting the central part of longer strings. The training strings of candidate bright spot are listed in Table 6.1.
STRING DISTANCE AND LIKELIHOOD RATIO TEST
72
6.6.1.
Likelihood
Ratio
Test
...
(LRT)
Prom the likelihood ratio test, the threshold for signal and noise is deter mined. Signal is present, if r > 0.0240482, or r < -0.0240482. 6.6.2.
Threshold for Global
Detection
For an input seismic string, the wavelets must be detected and extracted. Because the detection is under the Levenshtein distance of string cal culation, the detected signal is defined as the possible wavelets. The accompanying detection is defined as the global detection. Comparing the 8-level optimal quantization with LRT in Fig. 6.3(b), the closest levels to /3 = ±0.0240482 are the end points at ±0.0268365 = ±0.5crs. Then, ±0.0268365 = ±0.5crs are selected as the new threshold. The areas above 0.0268365 and below -0.0268365 in Fig. 6.3(b) are the detected areas of the signal, i.e., the intervals of b, c, d, B, C, and D. The probability of detecting b, c, d, B, C, and D is 0.617. For input 8 symbols of string, 8*0.617 = 4.936. Here a threshold is set: if the number of symbol (b, c, d, B, C, or D) is equal to or larger than 5, then the possible wavelet is detected. Otherwise, input the next 8 symbol string. 6.6.3.
Threshold for the Detection Bright Spot
of
Candidate
The extracted string of the 21st trace is ABBAAbdc. Using the 8-level optimal quantization, the di value is quantized to the quantized value Cj, the conditional mean of each quantization interval [47], Assume that Gaussian noise is added to the quantized value c;, then the probability of detection in the quantized interval can be calculated from Fig. 6.3(c). The extracted string of the 21st trace has 8 symbols. Each symbol has its quantized value Cj. For 8 c» of the quantized value of the candidate bright spot, the probability of detecting every Cj can be calculated by using the statistical table of normal distribution. The extracted string of the 21st trace has 3A, 2B, lb, lc, and Id. From Fig. 6.3(c), the sum of these 8 detection probabilities is 5.7354(3*0.66 + 2*0.704+1*0.704+1*0.7115 + 1*0.9319 = 5.7354). Truncated 5.7354 = 5. The approximated number of detected symbols is 5. The number of missing and false symbols, i.e., error or undetected symbols, is 8 — 5 = 3.
EXPERIMENT AT HIGH ISLAND
73
The input string belongs to the string of candidate bright spot if the Levenshtein distance is less than 3 symbols. So 3 is selected as a threshold. Suppose that x1, x2, z 3 and xi are candidate bright spot strings on the 13th, 21st, 29th, and 37th traces, iidL{x1,y) < ku or dL(x2,y) < k2, or L 3 L i d (x ,y) < k3, or d (x ,y) < fc4, then y is the detected candidate bright spot string. The threshold values of ki, k2, k3 and k4 are 3 in this experi ment. The classification result is shown in Fig. 6.4. O.OSec
60
30
to
30
so
to
t.OSffC
«
it
KOXs
it
((titt ti
2.05»c Fig. 6.4.
6.7.
1-D classification result at Mississippi Canyon.
E X P E R I M E N T AT HIGH ISLAND
Similarly the real seismogram at High Island is shown in Fig. 6.5 and the classification result is shown in Fig. 6.6.
74
STRING DISTANCE AND LIKELIHOOD RATIO TEST . ■«
30
20
to
w Fig. 6.5. Real seismogram at High Island (negative on the right). so
Fig. 6.6.
to
1-D classification result at High Island.
Chapter 7
TREE GRAMMAR AND AUTOMATON FOR SEISMIC PATTERN RECOGNITION
7.1.
SUMMARY
In a number of synthetic seismograms, there may exist certain structures in shape. In order to recognize seismic patterns and improve seismic interpre tation, we use the method of tree automaton. We show that the seis mic bright spot pattern can be represented as tree, so we can use the tree automaton in the recognition of seismic patterns. The system of tree automaton includes two parts. In the training part, the training seismic patterns of known classes are constructed into their corresponding tree representations. Trees can infer tree grammars. Several tree grammars are combined into one unified tree grammar. Tree grammar can generate the error-correcting tree automaton. In the recognition, each input testing seismogram passes through preprocessing, pattern extraction, and tree repre sentation of seismic pattern. Then each input tree is parsed and recognized by the error-correcting tree automaton. Several fundamental algorithms on tree construction of the seismic pattern and tree grammar inference are pro posed in this study. The method is applied to the recognition of bright spot, pinch-out, fiat spot, gradual sealevel fall, and gradual sealevel rise patterns.
7.2.
INTRODUCTION
Tree grammars and the corresponding recognizers, tree automata, have been successfully used in many applications, for example: English character 75
TREE GRAMMAR AND AUTOMATON
76
...
recognition, LANDSAT data interpretation, fingerprint recognition, classi fication of bubble chamber photographs, and texture analysis [41, 78, 85, 89,116]. Fu pointed out that "By the extension of one-dimensional concate nation to multidimensional concatenation strings are generalized to trees." [41]. Comparing with other high dimensional pattern grammars: web gram mar, array grammar, graph grammar, plex grammar, shape grammar,..., etc. [41], tree grammar is easy and convenient to describe a pattern using data structure of tree, especially in the tree traversal and in the substitu tion, insertion, and deletion of a tree node. The system of tree automaton is shown in Fig. 7.1. In the training part of the system, the training seismic patterns of known classes are constructed into their corresponding tree representations. Trees can infer tree grammars [8, 12, 76, 77, 86]. Several tree grammars are combined into one unified tree grammar. Tree grammar can generate the error-correcting tree automaton.
Input testing seismogram
Preprocessing: (1) Envelope (2) Thresholding (3) Compression (4) Thinning
Pattern representation: (1) Pattern extraction (2) Primitive recognition (3) Tree construction
ErrorClassification correcting tree automata results
Training seismic
Pattern representation: (1) Pattern extraction (2) Primitive recognition
Tree grammar inference
Recognition ' Training i
Fig. 7.1.
A tree automaton system for seismic pattern recognition.
In the recognition part of the system, each input testing seismogram passes through preprocessing and tree representation of seismic pattern. The preprocessing includes envelope [35, 56, 57, 104], thresholding, com pression in the vertical time-axis direction, and thinning [63, 67, 107, 117]. Tree representation of seismic pattern includes the extraction of seismic patterns, primitive recognition, and construction of tree representation. So a seismic pattern can be constructed as a tree. Then the tree is parsed by the error-correcting tree automaton into correct class. Three kinds of tree automaton are adopted in the recognition: weighted minimum distance structure preserved error-correcting tree automaton
TREE GRAMMAR AND LANGUAGE
77
(SPECTA), modified maximum-likelihood SPECTA, and minimum dis tance generalized error-correcting tree automaton (GECTA). We have some modifications on the methods of weighted minimum distance SPECTA and maximum-likelihood SPECTA. We show that the seismic bright spot pat tern can be represented as tree, so we can use the tree automaton in the recognition of seismic patterns.
7.3.
TREE GRAMMAR AND LANGUAGE
A tree domain (tree structure) is shown below [41]. Each node has its ordering position index and is filled with a terminal symbol. 0 is the root index of a tree. 0
/ I \ 0.1 0.2 0.3 . . . / \ 0.1.1 0.1.2 . . . / 0.1.1.1
\ 0.1.1.2
Each node has its own children, except bottom leaves of the tree. The number of children at node is called the rank of the node. Although there are different kinds of tree grammars [41], we use the expansive tree grammar in the study because of the following theorem. Theorem 7.1 For each regular tree grammar Gt, one can effectively con struct an equivalent expansive grammar G't, i.e., L(G't) = L(Gt) [41]. An expansive tree grammar is a four-tuple Gt = (V, r, P, S), where V =
VNUVT,
Vw = the set of nonterminal symbols, VT = the set of terminal symbols, S : the starting nonterminal symbol,
78
TREE GRAMMAR AND AUTOMATON
...
r : the rank of terminal symbol, i.e., the number of children in the tree node, and each tree production rule in P is of the form (1)
XQ
-»
x
I X\
or
(2)
Xo^-x
I --A X2
Xr(x)
where x e VT and XQX\X2 ■ ■ • Xr(x) € Vjf. For convenience, the tree production rule (1) can be written as XQ —> X1X2 • • • Xr(x) [41]. From the starting symbol S, a tree is derived by using the tree pro duction rules of P, S —> ao => a\ =>■ ■ ■ ■ => am = a. The tree language generated by Gt is denned as L(Gt) = {a is a tree \S^a in Gt}, where * represents several derivation steps using tree production rules in Gt. 7.4.
TREE AUTOMATON
The bottom-up replacement functions of the tree automaton are generated from the expansive tree production rules of the tree grammar. Expansive tree grammar is Gt = (V,r,P,S), and tree automaton is Mt = (Q,f,S), where Q is the set of states, / are the replacement functions, and S becomes the final state. If tree production rule XQ
-> X / I •■■\
Xi X2
Xn
is in P, then bottom-up replacement function in the tree automaton can be written as X0 4- x /|...\,or X\ X2
Xn
/(
x ) -> X0 /|...\, or
Xi X2
fx(XuX2,...,Xn)^X0.
Xn
Tree automaton is an automatic machine to recognize the tree and has the tree bottom-up replacement functions which are the reverse direction of the tree production rules. Tree grammar is forward and top-down derivation to derive the tree. The tree automaton is a backward replacement of the states from the bottom to the root of the tree. If the final replacement state
TREE AUTOMATON
79
is in the set of the final states, then the tree is accepted by the automaton of the class. Otherwise, the tree is rejected. Example 7.1 The following tree grammar Gt = (V,r,P,S) where V = {S, A, B, $, a, b}, VT = {•$, ia,^ b}, r{a) = {2,0}, r(b) = {2,0}, r($) = 2, andP: (1) S -»• $
(2) A -» a
I\ A
(3) B -+ b
I\
B
A
(4) A-Ki
(5) B -> b
I \
B
A
b
can generate the patterns, for example, using productions (1), (4), and (5),
/ \ a b or using productions (1), (2), (3), (4), (5), (4), and (5). $
t
^W
w r
Vi
r
1 a 1 \ a b
\ b 1\ a b
f
The tree automaton which accepts the set of trees generated by Gt is Mt = (Q, fa, fb, f$,F), where Q = {qA,qB, qs}, F = {qs}, and / : fa = qA, fb = qB, fa(qA,
qB) = qA, h(qA, qs) = qB, / $ ( « u , qB) = qs ■
Example 7.2 The following tree grammar can be used to generate trees represent at ing L-C networks.
$,
©
rtflTK
!_npftV
rtv
rr\,
_1*
TREE GRAMMAR AND AUTOMATON
80
...
Gt = (V,r,P,S), where V = {S,A,B,D,E,$,Vin,L,C,W}, VT = {$,Vin,L,C,W}, r($) = 2, r(Vin) = 1, r(L) = {1,2}, r(C) = 1, r(W) = 0, andP: (1) 5 -> $ (2) A-+Vin
(3) 5 -> Z (4) 5 ->■ L (5) D -* C (6) £J -> W
/ \
I
/ \
I
I
A B
E
D B
D
E
For example, after applying productions (1), (2), (3), (6), (5), (4), (6), (5), and (6), the following tree is generated. $ / \
K L I W
/ \ C L I I
©
v.„
L
L
-'TffiP'
j—'TflfffV
7T\
>Tv
w c I
w The tree automaton which accepts the set of trees generated by Gt is Mt = (Q,fw,fc,fL,fvin,f$,F), where Q = {qE,qD,qB,qA,qs}, F = {qs}, and / : fw{ )=qE,
fc(qB) = qo,
/v s „{1E) = qA , Example 7.3 spot pattern.
/ L ( ? D ) = qB,
fUqD,qB)
= qB,
fs(qA, qB) = qs ■
Tree representation and tree grammar of seismic bright
(A) Bright spot pattern: The seismogram of the primary reflection of bright spot is generated from a geologic model [27, 28, 93]. The geologic model is shown in Fig. 7.2 and the seismogram is shown in Fig. 7.3. After envelope processing, thresh olding, compression in the vertical direction, and thinning processing, the bright spot pattern can be shown below. We can scan the pattern from left to right, then top to bottom. The segments (branches) can be extracted in
TREE
AUTOMATON
Deroiv-2.0gm/cm**3 Veloc*^*2.0 knt/jec
D-2.5 V-2.59
Fig. 7.2. Geologic model. Station
...
10
20
30
40
(0
50
] >t
ul$y}§4
111
C\
!
!
imliSfffiliMififll
> Ci5 1 A-
v* •ir
l
«?fti?
^ H
1 1
ttOijiis
u
t| ^T
J7 TMCSOSIJII
2.0-
^H
«
«?
Mrr It
ip
^f&ymra aKiM>u«m
iuSi |SS c/rr?
l
I [ >L rf; 1if SJ
Fig. 7.3. Bright spot seismogram.
|J(|}M7
TREE GRAMMAR AND AUTOMATON
82
...
Eight-directional Freeman chain codes [38] are used to assign primitives to the segments. By expanding the segments (branches), the tree representation of the seismic bright spot pattern is constructed, and from the tree the production rules of the tree grammar can be inferred.

[Thinned bright spot pattern: starting from the root symbol $, the segments are labeled with the chain-code primitives 5, 0, and 7, corresponding to the tree in (B) below.]
Primitives: the eight directional Freeman chain codes [38], numbered 0 through 7 (code 0 pointing in the +x direction, the codes increasing counterclockwise), and the terminal symbol @ (indicating that the neighboring segment has already been expanded).
(B) (1) Tree representation of the bright spot after scanning and tree construction:

            $
          /   \
         5     7
        / \   / \
       5   0 @   7
      / \       / \
     5   0     @   7
(2) Corresponding tree node positions:

                    0
                 /     \
              0.1       0.2
             /   \     /   \
         0.1.1 0.1.2 0.2.1 0.2.2
         /   \             /   \
   0.1.1.1 0.1.1.2   0.2.2.1 0.2.2.2
(C) Tree grammar: Gt = (V, r, P, S), where
V = set of terminal and nonterminal symbols = {$, 0, 5, 7, @, S, A, B, C, D, E, F, G, H, I, J},
VT = the set of terminal symbols = {$, 0, 5, 7, @},
$ : the starting point (root) of the tree,
@ : represents that the neighboring segment has already been expanded,
S : the starting nonterminal symbol,
r : r(5) = r(7) = {2, 0}, r($) = 2, r(@) = 0, r(0) = 0, and P:

    (1) S → $(A B)    (2) A → 5(C D)    (3) B → 7(E F)    (4) C → 5(G H)
    (5) D → 0         (6) E → @         (7) F → 7(I J)    (8) G → 5
    (9) H → 0         (10) I → @        (11) J → 7
The tree derivation steps are as follows:

    S ⇒ $(A B)                          by (1)
      ⇒ $(5(C D) 7(E F))                by (2), (3)
      ⇒ $(5(5(G H) 0) 7(@ 7(I J)))      by (4), (5), (6), (7)
      ⇒ $(5(5(5 0) 0) 7(@ 7(@ 7)))      by (8), (9), (10), (11)
Following the steps in (A) and (B), each seismic pattern can be represented as a tree; from step (B) to step (C), the tree production rules can be inferred from the tree; and the tree production rules in turn derive trees. Each tree corresponds to its pattern class.

(D) Tree automaton from the tree production rules of (C): The tree automaton generated by Gt is Mt = (Q, f$, f5, f7, f0, f@, S), where Q = {S, A, B, C, D, E, F, G, H, I, J}, S is the final state, and the bottom-up replacement functions f are:

    (1) f$(A, B) → S    (2) f5(C, D) → A    (3) f7(E, F) → B    (4) f5(G, H) → C
    (5) f0 → D          (6) f@ → E          (7) f7(I, J) → F    (8) f5 → G
    (9) f0 → H          (10) f@ → I         (11) f7 → J
The number on the left-hand side of each bottom-up replacement function corresponds to the number of the production rule of the tree grammar, and each replacement function is the reverse of the corresponding production rule. The tree in (B) can be reduced by the replacement functions step by step from the bottom to the root, and is accepted by this tree automaton Mt as the seismic bright spot pattern.
7.5. TREE REPRESENTATIONS OF PATTERNS
In the tree automaton system, patterns must be extracted from the image data and constructed as tree representations. In order to construct the tree representation of a pattern automatically, a scanning algorithm is proposed. The following Algorithm 7.1 constructs a tree representation by scanning an input pattern. The algorithm works for both four-neighbor and eight-neighbor connectivity. The scanning is from left to right and top to bottom on the binary image. Breadth-first tree expansion is adopted so that the depth of the tree is smaller and the parsing time of the input tree by the tree automaton is shorter in parallel processing.

Algorithm 7.1 Construction of a tree representation from a pattern

Input: Image of a pattern after thinning.
Output: Tree representation of the pattern.
Method:
(1) While scanning the image from left to right and then top to bottom:
(a) If the scan reaches a point of the pattern, that point is the root (node) of a tree.
(b) Trace all branches (segments) that follow from a node, and assign a terminal symbol to each branch (segment) by its chain code.
(c) If the lower end of a branch (segment) has sub-branches (sub-segments), go to step (b); trace the sub-branches and expand all children nodes, from the left-most child, in the same tree level. After all children in the level are expanded, go to step (b) to expand the descendants, from the left-most, in the next level down. Expand level by level until there is no node left to expand.

Then one pattern has been extracted and its corresponding tree representation constructed. Several patterns may exist in the image data; Algorithm 7.1 extracts one pattern. The following Algorithm 7.2 extracts all non-overlapping patterns in the image and constructs their tree representations; a simplified sketch of both algorithms follows it.

Algorithm 7.2 Extract all patterns without overlapping and construct their tree representations from the binary image

Input: Image data after thinning.
Output: Tree representations of all patterns.
Method:
(1) Scan the image from left to right, then top to bottom.
(2) When the scan reaches a point of a pattern, follow Algorithm 7.1 to extract the pattern and construct its corresponding tree representation, then erase the extracted pattern from the image.
(3) Go to step (1); continue to scan, extract the next pattern, and construct its tree representation until there is no pattern left to extract.
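As a rough illustration, this Python sketch collapses Algorithms 7.1 and 7.2 into one loop; it is a simplification in that single pixels are expanded breadth-first where the book traces whole chain-coded segments, and all names are hypothetical.

```python
from collections import deque

# Simplified sketch of Algorithms 7.1/7.2: img is a 0/1 list of lists
# after thinning; branch tracing is collapsed into one-pixel steps.
NBRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
        (0, 1), (1, -1), (1, 0), (1, 1)]      # eight-neighbor connectivity

def extract_trees(img):
    """Scan left-to-right, top-to-bottom; build one breadth-first tree per
    pattern and erase each pattern once it is extracted (Algorithm 7.2)."""
    h, w = len(img), len(img[0])
    trees = []
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            root = {'pos': (y, x), 'children': []}   # step (1)(a): new root
            img[y][x] = 0
            queue = deque([root])                    # breadth-first expansion
            while queue:
                node = queue.popleft()
                ny0, nx0 = node['pos']
                for dy, dx in NBRS:
                    ny, nx = ny0 + dy, nx0 + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx]:
                        img[ny][nx] = 0              # erase: use each point once
                        child = {'pos': (ny, nx), 'children': []}
                        node['children'].append(child)
                        queue.append(child)
            trees.append(root)
    return trees
```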
7.6. INFERENCE OF EXPANSIVE TREE GRAMMAR
In the training part of the tree automaton system of Fig. 7.1, tree representations of the training patterns must be given in order to infer the tree
production rules. The following Algorithm 7.3 infers an expansive tree grammar from the tree representation of a pattern.

Algorithm 7.3 Inference of expansive tree grammar

Input: Tree representation of a pattern.
Output: Expansive tree grammar.
Method:
(1) From top to bottom of the tree, for every node, derive a tree production rule

    X → a(X1 X2 ... Xn)

where X is the nonterminal symbol assigned to the node, a is the primitive (terminal) of the node, and X1, X2, ..., Xn are the nonterminals of the direct descendants (children) that cover the next-level subtrees.
(2) Go to step (1) to handle the other nodes in the same level of the tree, until every node is reached. Handle the nodes level by level. The algorithm can be implemented as one recursive procedure; a sketch follows.
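A minimal sketch of Algorithm 7.3, under the assumption that trees are the nested tuples of the earlier sketches; the generated nonterminal names are illustrative.

```python
from collections import deque

# Minimal sketch of Algorithm 7.3: one fresh nonterminal per node,
# handled level by level, one expansive production per node.
def infer_grammar(tree):
    productions = []                 # entries (lhs, terminal, rhs tuple)
    counter = [0]
    def fresh():
        counter[0] += 1
        return 'X%d' % counter[0]
    queue = deque([(tree, 'S')])     # handle the nodes level by level
    while queue:
        node, lhs = queue.popleft()
        terminal, children = node[0], node[1:]
        rhs = [fresh() for _ in children]
        productions.append((lhs, terminal, tuple(rhs)))
        for child, nt in zip(children, rhs):
            queue.append((child, nt))
    return productions

# The bright-spot tree of Example 7.3 yields the 11 productions of (C),
# up to renaming of the nonterminals:
t = ('$', ('5', ('5', ('5',), ('0',)), ('0',)),
          ('7', ('@',), ('7', ('@',), ('7',))))
for p in infer_grammar(t):
    print(p)
```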
7.7. WEIGHTED MINIMUM-DISTANCE SPECTA
Due to noise and distortion, a terminal may be recognized as one of its neighboring terminals in the process of primitive recognition. The tree may then contain substitution-error terminals, and the tree automaton must be expanded to recognize such error trees. In the tree W below, the node b at position 0.1 is substituted by the node x, giving W'; the trees W and W' have the same tree structure. This substitution error is written as W → W'.

    Tree W =   a        Substitute node b at        a
              / \       position 0.1 by x:         / \    = W'
             b   c                                x   c
For a given tree grammar Gt or tree automaton Mt, the minimum-distance SPECTA (structure-preserved error-correcting tree automaton) is formulated to accept an input tree and to generate a parse that consists of the minimum number of substitution-error tree production rules or error replacement functions, while the tree structure is still preserved. For an input tree W, parsing by the minimum-distance SPECTA is a search for the minimum distance; it is a backward procedure that constructs a tree-like transition table, with all candidate states and their corresponding costs recorded from the leaves to the root of W. For each tree node a (index position in the tree), there is a corresponding transition box t_a in the transition table, consisting of triplet items (X, #k, c), where X is the state, #k is the kth production rule or replacement function, and c is the accumulated error cost. An example follows.

Example 7.4 Parsing by minimum-distance SPECTA

Given the production rules P:

    (1) S → $(A B)    (2) A → 5    (3) B → 7

and the input tree $(5 7), the parsing of the input tree by the minimum-distance SPECTA is generated as below.

    t_0:    (S, 1, 0)
    t_0.1:  (A, 2, 0) (B, 3, 1)      (input terminal 5)
    t_0.2:  (A, 2, 1) (B, 3, 0)      (input terminal 7)
Box t_0.1 includes state A (nonterminal) in (A, 2, 0) and state B (nonterminal) in (B, 3, 1). The triplet item (A, 2, 0) represents that we can use A in production rule (2) A → 5 to derive the terminal 5 with 0 substitution errors, because the input terminal is 5; the triplet item (B, 3, 1) represents that we can use state B in production rule (3) B → 7 to derive the terminal 7 with 1 substitution error, again because the input terminal is 5. Box t_0.2 has a similar explanation. Box t_0 has state S in the triplet item (S, 1, 0): we can use S in production rule (1) to derive $(A B) with 0 substitution errors, where the 0 in (S, 1, 0) is the sum of the 0 from (A, 2, 0) and the 0 from (B, 3, 0). Although there are other combinations of the states (nonterminals) A and B from the triplets, i.e., $(A A), $(B A), and $(B B), only S in production rule (1) can derive $(A B) with minimum error; the others count as larger errors and are neglected.

If X is a candidate state of the tree node at position index a, then a triplet item (X, #k, c) is added to box t_a, where #k specifies the kth production rule or bottom-up replacement function used, and c is the accumulated minimum number of substitution errors from the leaves up to node a in the subtree of W rooted at a, when node a is represented by state X. The algorithm of the minimum-distance SPECTA was given by Fu [41]. To take account of weighted substitution error costs between pairs of terminals, Basu and Fu, 1987 [9], presented the weighted minimum-distance SPECTA. Here we make some modifications. We expand each production rule to cover the substitution-error production rules and embed a cost in each production rule, so that an expanded tree grammar with weighted error costs is generated. The production rule X → x is expanded to the substitution-error
production rules X → y with error cost c for every y ≠ x, and X → x with c = 0; and the production rule

    X → x(X1 X2 ... Xn)

is expanded to

    X → y(X1 X2 ... Xn) with error cost c,    where c = 0 if y = x.
Initially the expanded tree grammar with error costs must be generated. The algorithm of the weighted minimum-distance SPECTA is then as follows.

Algorithm 7.4 Weighted minimum-distance SPECTA

Input: An expanded tree grammar Gt (or a tree automaton Mt) with error costs, and an input tree W.
Output: Parsing transition table of W and the minimum distance.
Method:
(1) Replace each bottom leaf of the input tree W by a state. If a bottom leaf (terminal) of the input tree is x, i.e., r[W(a)] = 0 (the rank of bottom node a in tree W is 0) and W(a) = x (the terminal at bottom node a is x), then for any terminal y and production rule (#k) X → y, the expanded rule X → x with cost c applies: replace the leaf x by state X and store (X, #k, c) in box t_a, where c is the cost of substituting y by x. Do steps (2) and (3) until there is no further replacement at the root of the tree.
(2) Replace each subtree by a state using a bottom-up replacement function. If the subtree is x(X1 X2 ... Xn), i.e., r[W(a)] = n > 0 and W(a) = x, then for any terminal y and production rule (#k) X → y(X1 X2 ... Xn), the expanded rule X → x(X1 X2 ... Xn) with cost c applies: replace the subtree by state X and store (X, #k, c') in table box t_a, where c' = c'1 + c'2 + ... + c'n + c and c'i is the accumulated cost in table box t_a.i for state Xi.
(3) Whenever more than one item (X, #k, c') in t_a has the same state X, keep the item with the smaller cost and delete the others.
(4) If items (S, #k, c') are in t_0, choose the item with the minimum cost c'; the input tree W is then accepted with distance c'. A sketch of the computation follows.
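The following Python sketch illustrates the transition-table computation; the rule encoding and the sub_cost function are assumptions, the grammar expansion above is folded into sub_cost, and the recursion plays the role of the backward procedure from the leaves to the root.

```python
# Sketch of steps (1)-(3) of Algorithm 7.4.  A rule is
# (number, lhs, terminal, child-nonterminals); sub_cost(x, y) returns the
# weighted substitution cost of terminal pair (x, y), e.g. from Fig. 7.7.
def parse(tree, rules, sub_cost):
    """Return box t_a for the root: {state: (rule number, accumulated cost)}."""
    label, children = tree[0], tree[1:]
    child_boxes = [parse(c, rules, sub_cost) for c in children]
    box = {}
    for num, lhs, term, rhs in rules:
        if len(rhs) != len(children):
            continue                       # the tree structure must be preserved
        cost = sub_cost(term, label)       # 0 if term == label
        ok = True
        for nt, cbox in zip(rhs, child_boxes):
            if nt not in cbox:
                ok = False
                break
            cost += cbox[nt][1]            # add the child's accumulated cost
        if ok and (lhs not in box or cost < box[lhs][1]):
            box[lhs] = (num, cost)         # keep the minimum-cost item per state
    return box

# Example 7.4: the tiny grammar with unit substitution costs.
rules = [(1, 'S', '$', ('A', 'B')), (2, 'A', '5', ()), (3, 'B', '7', ())]
cost = lambda x, y: 0 if x == y else 1
print(parse(('$', ('5',), ('7',)), rules, cost))   # {'S': (1, 0)}
```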
Example 7.5 Parsing of an error bright spot pattern by minimum-distance SPECTA

(A) Bright spot pattern with primitive errors:
[Thinned bright spot pattern with primitive errors: some segments that should carry the chain-code primitive 7 are labeled 0.]
(B) Tree representation with primitive errors: Using the eight directional Freeman chain codes [38] and @, the tree representation of the bright spot with primitive errors is shown below.
            $
          /   \
         5     7
        / \   / \
       5   0 @   0
      / \       / \
     5   0     @   0
(C) Parsing by minimum-distance SPECTA: Using the tree automaton inferred from the bright spot pattern in Example 7.3, the transition table for parsing the tree of the error bright spot pattern of Example 7.5 by the minimum-distance SPECTA can be generated as follows. Here the costs of terminal substitution errors are all set equal to 1. The explanation of each box from the bottom-up replacement is the same as that of Example 7.4.
    t_0.1:                  (A, 2, 0)
    t_0.1.2 (terminal 0):   (D, 5, 0) (E, 6, 1) (G, 8, 1) (H, 9, 0) (I, 10, 1) (J, 11, 1)
    t_0.1.1.1 (terminal 5): (D, 5, 1) (E, 6, 1) (G, 8, 0) (H, 9, 1) (I, 10, 1) (J, 11, 1)
    t_0.1.1.2 (terminal 0): (D, 5, 0) (E, 6, 1) (G, 8, 1) (H, 9, 0) (I, 10, 1) (J, 11, 1)
    t_0.2.2.1 (terminal @): (D, 5, 1) (E, 6, 0) (G, 8, 1) (H, 9, 1) (I, 10, 0) (J, 11, 1)
    t_0.2.2.2 (terminal 0): (D, 5, 0) (E, 6, 1) (G, 8, 1) (H, 9, 0) (I, 10, 1) (J, 11, 1)
7.8. MODIFIED MAXIMUM-LIKELIHOOD SPECTA
When the probabilities of the tree production rules and the substitution deformation probabilities on pairs of terminal symbols are available, the maximum-likelihood SPECTA can be used for the recognition of patterns [41]. The stochastic expansive tree grammar Gs = (V, r, P, S) has production rules in P of the form

    (1) X0 → x(X1 X2 ... Xr(x)) with probability p,    or    (2) X0 → x with probability p,

where p is the probability of the production rule, x ∈ VT, and X0, X1, X2, ..., Xr(x) ∈ VN. The major steps of the maximum-likelihood SPECTA in Fu, 1982, are as follows [41]. Given the stochastic expansive tree grammar Gs, the terminal substitution probabilities q(y/x), and an input tree W:

(1) Replace each leaf of the input tree by a state. If r[W(a)] = 0 (the rank of node a in tree W is 0), W(a) = y (the symbol at node a is y), and X → x is the kth production rule in P with probability p, then add (X, #k, p') to t_a with p' = p × q(y/x).
(2) Replace each subtree by a state using a bottom-up replacement function. If r[W(a)] = n > 0, W(a) = y, and X → x(X1 X2 ... Xn) is the kth production rule in P with probability p, then add (X, #k, p') to table box t_a, where p' = p'1 × p'2 × ... × p'n × p × q(y/x) and p'i is the probability in table box t_a.i for state Xi, i = 1, ..., n.

The probability p' is calculated from the multiplication of the production-rule probability p and the terminal substitution probability q(y/x). Instead of this calculation, a modification is proposed here. Similar to the previous expanded tree grammar
with error costs, each production rule is expanded to cover substitution-error production rules with probabilities, i.e., if

    X → x(X1 X2 ... Xn)

is in P, then it is expanded to

    X → y(X1 X2 ... Xn) with probability p,

for all terminals y, y = x or y ≠ x. The summation of the probabilities from the expansion of one tree production rule is 1. A terminal may be recognized as one of its neighboring terminals in the process of primitive recognition, so the probability of a substitution pair of terminals is set inversely proportional to the angle between the pair. Each tree production rule of the tree grammar Gt is expanded to cover the substitution-error production rules with probabilities. Based on the expanded grammar, the maximum-likelihood SPECTA is modified here for the recognition of seismic patterns. The algorithm is as follows.

Algorithm 7.5 Modified maximum-likelihood SPECTA

Input:
(1) An expanded tree grammar Gt (or tree automaton Mt) with a probability on each production rule, and (2) an input tree W.
Output: Parsing transition table of W and the maximum probability.
Method:
(1) Replace each bottom leaf of the input tree W by a state. If r[W(a)] = 0 (the rank of node a in tree W is 0), W(a) = x, and X → x is the kth production rule in P with probability p, then add (X, #k, p') to t_a with p' = p. Do steps (2) and (3) until there is no further replacement at the root of the tree.
(2) Replace each subtree by a state using a bottom-up replacement function. If r[W(a)] = n > 0 (the rank of node a in tree W is n), W(a) = x (the terminal at node a is x), and

    X → x(X1 X2 ... Xn)

is the kth production rule in P with probability p, then add (X, #k, p') to table box t_a, where p' = p'1 × p'2 × ... × p'n × p and p'i is the probability in table box t_a.i for state Xi, i = 1, ..., n.
(3) Whenever more than one item (X, #k, p') in t_a has the same state X, keep the item with the larger probability and delete the item with the smaller probability.
(4) If items (S, #k, p') are in t_0, choose the item with the maximum probability p'; the input tree W is then accepted with probability p'. If no item in t_0 is associated with the starting nonterminal S, the input tree W is rejected.
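Relative to Algorithm 7.4, the modification is local: in the sketch given after Algorithm 7.4, the additive cost and the keep-the-minimum rule become a multiplicative probability and a keep-the-maximum rule. A one-function illustration (hypothetical names):

```python
# Step (2) of Algorithm 7.5 in the style of the earlier sketch:
# p' = p1' * p2' * ... * pn' * p, and per state the LARGER p' is kept.
def combine(rule_prob, child_probs):
    p = rule_prob
    for child_p in child_probs:
        p *= child_p
    return p
```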
7.9. MINIMUM-DISTANCE GECTA
Due to noise, distortion, and interference of the wavelets, a tree may have structural errors; the errors may leave the tree structure preserved or not preserved. If the tree structure is preserved, then the weighted minimum-distance SPECTA and the modified maximum-likelihood SPECTA can be applied in the recognition of patterns. If the tree structure is not preserved, then the minimum-distance GECTA [41] can be applied. The syntax errors between two trees may include substitution, deletion, and insertion errors, and the insertion error includes three types: stretch, branch, and split errors; in total there are five types of syntax errors on trees. The distance between two trees is defined as the least-cost sequence of error transformations needed to transform one into the other [41, 82, 103]. Because there are five possible error transformations, each production rule of the tree grammar must be expanded to cover all five syntax errors on trees. The expanded grammar can then generate a recognizer, i.e., the minimum-distance generalized error-correcting tree automaton (GECTA). Similar to the weighted minimum-distance SPECTA, the parsing of an input tree W using the minimum-distance GECTA also constructs a tree-like transition table with all
candidate states and their corresponding costs recorded. The procedure is backward, from the leaves to the root of W, searching for the least-cost solution [41].
7.10. EXPERIMENTS ON INPUT TESTING SEISMOGRAMS
Here, three kinds of tree automata are applied to the recognition of five classes of seismic patterns: bright spot, pinch-out, flat spot, gradual sealevel fall, and gradual sealevel rise. In the training part of the system in Fig. 7.1, we design the five classes of training seismic patterns. Each training seismic pattern passes through extraction of the seismic pattern, primitive recognition, and construction of the tree representation. The 24-primitive assignment for scaling invariance is shown in Fig. 7.4. The tree of the bright spot training pattern is shown in Fig. 7.5(a), and the tree of the flat spot training pattern in Fig. 7.5(b). Each tree then infers its tree grammar, and the five tree grammars are combined into one unified tree grammar. The tree grammar generates the error-correcting tree automaton.
6a 6b 6c Fig. 7.4.
Twenty-four primitives.
Fig. 7.5. (a) Tree of bright spot pattern, (b) tree of flat spot pattern.
In the recognition part, each input testing seismogram of the primary reflection is generated from a geologic model adopted from Dobrin [27, 28] and Payton [93]. The input testing seismograms are shown in Fig. 7.6(a)-(g). Each input testing seismic pattern is extracted and constructed as a tree, and each tree is parsed by the error-correcting tree automaton into the correct class. In the minimum-distance SPECTA, the costs of substitution-error terminal pairs are designed as shown in Fig. 7.7 and are used in the tree production rules of the seismic training patterns. The error cost of each substitution terminal pair is proportional to the angle between the pair of terminals. For example, 0a to 1a has cost 0.2, and 0a to 2a has cost 0.4. The largest substitution error cost is 1, e.g., from 0a to 4a, 4b, or 4c, which are the reverse direction of 0a. The recognition results of the input testing seismograms using the weighted minimum-distance SPECTA are shown in Table 7.1. The threshold is set to 2.5. A blank inside Table 7.1 means that the distance is large and is neglected.
Fig. 7.6. Synthetic seismic patterns: (a) bright spot, (b) pinch-out, (c) flat spot, (d) gradual sealevel fall, (e) gradual sealevel rise, (f) gradual sealevel fall, (g) gradual sealevel rise.
Fig. 7.7. Costs of substitution errors: the 26 × 26 cost matrix over the terminals 0a-7c, @@, and $$ (cost 0 on the diagonal, 0.1 between scales of the same direction, rising through 0.2, 0.4, and 0.8 to 1 for reversed directions; substitutions involving @@ or $$ cost 1).
In the modified maximum-likelihood SPECTA, each tree production rule is expanded to cover the substitution-error production rules with probabilities, where the probability of a substitution pair of terminals is inversely proportional to the angle between the pair. The recognition results of the input testing seismograms using the modified maximum-likelihood SPECTA are shown in Table 7.2. A blank inside Table 7.2 means that the probability is small. In the recognition results of Tables 7.1 and 7.2, each pattern is recognized correctly except the pinch-out pattern. Because the tree representation of the pinch-out pattern of Fig. 7.6(b) is interfered with too much at the top of the pinch-out junction, the tree structure is not preserved, and the tree cannot be recognized as the pinch-out pattern by the weighted minimum-distance SPECTA or the modified maximum-likelihood SPECTA. To account for substitution, deletion, and insertion errors, the minimum-distance GECTA is applied to the input testing seismograms. Here the costs of substitution errors of terminal pairs in Fig. 7.7 are used, and the costs of deletion and insertion errors are set to 1. The recognition results are shown in Table 7.3. The threshold of acceptance in the recognition is set to 3.8. Each pattern is recognized correctly.
Table 7.1. Recognition results of input seismograms by the weighted minimum-distance SPECTA. A blank inside the table means that the distance is large. Columns A100-A900 are the final states.

    Input         A100  A200  A300  A400  A500  A600  A700  A800  A900   Recognition result
    Fig. 7.6(a)    0.2   2.4                                             A1 Bright spot
    Fig. 7.6(b)                                                          Cannot be accepted
    Fig. 7.6(c)                2.1                                       A3 Flat spot
    Fig. 7.6(d)                                              0.2         A8 Gradual sealevel fall
    Fig. 7.6(e)                                                    0.2   A9 Gradual sealevel rise
    Fig. 7.6(f)                                  0.5                     A6 Gradual sealevel fall
    Fig. 7.6(g)                                        0.2               A7 Gradual sealevel rise
Table 7.2. Recognition results of input seismograms by the modified maximum-likelihood SPECTA. A blank inside the table means that the probability is small. Columns A100-A900 are the final states.

    Input         A100    A200  A300     A400  A500  A600    A700    A800    A900    Recognition result
    Fig. 7.6(a)   4.8E-4                                                              A1 Bright spot
    Fig. 7.6(b)                                                                       Cannot be accepted
    Fig. 7.6(c)                 1.0E-11                                               A3 Flat spot
    Fig. 7.6(d)                                                        6.1E-4         A8 Gradual sealevel fall
    Fig. 7.6(e)                                                                6.1E-4 A9 Gradual sealevel rise
    Fig. 7.6(f)                                        1.4E-5                         A6 Gradual sealevel fall
    Fig. 7.6(g)                                                6.1E-4                 A7 Gradual sealevel rise
Table 7.3. Recognition results of input seismograms by the minimum-distance GECTA. Columns A100-A900 are the final states.

    Input         A100  A200  A300  A400  A500  A600  A700  A800  A900   Recognition result
    Fig. 7.6(a)    0.2   2.4  26.6  11.0  11.6  13.4  13.9  10.4  11.5   A1 Bright spot
    Fig. 7.6(b)    5.8   3.6  26.6  13.0  12.2  13.4  13.3  13.0  12.1   Cannot be accepted
    Fig. 7.6(c)   15.4  17.0   2.1  26.0  25.6  17.8  16.5  24.4  29.7   A3 Flat spot
    Fig. 7.6(d)   12.0  13.4  33.8   1.0   5.2  14.1  16.2   0.2   8.1   A8 Gradual sealevel fall
    Fig. 7.6(e)   11.7  12.4  33.3   3.0   1.3  14.7  12.4   4.0   0.2   A9 Gradual sealevel rise
    Fig. 7.6(f)    8.4  10.0  30.4  12.6  12.4   0.5  13.7  11.8  11.7   A6 Gradual sealevel fall
    Fig. 7.6(g)    9.1   9.2  25.9  11.6  11.2   8.7   0.2  13.6  11.2   A7 Gradual sealevel rise
7.11. DISCUSSION AND CONCLUSIONS
In a number of synthetic seismograms there exist certain structural seismic patterns. In order to recognize seismic patterns and improve seismic interpretation, we use the method of the tree automaton. In the training part, an error-correcting tree automaton is generated from the seismic training patterns. In the recognition part, each input testing seismic pattern is scanned and represented by a tree, and each tree is parsed by the tree automaton into the correct class. Because of the complex variations in the seismic patterns, three kinds of automata are adopted in the recognition: the weighted minimum-distance SPECTA, the modified maximum-likelihood SPECTA, and the minimum-distance GECTA. In the experiments, the analyzed 2-D synthetic seismic patterns are the bright spot, pinch-out, flat spot, gradual sealevel fall, and gradual sealevel rise patterns. The tree automaton system provides a method for the recognition of seismic patterns, and the recognition results are encouraging for the improvement of seismic interpretation. The maximum-likelihood SPECTA is based on the probability approach; alternatively, fuzzy logic may be used as a certainty approach in the tree automaton experiments [74]. It is also necessary to design the costs (weights, certainties) of the terminal-pair error transformations and the tree production rules in the minimum-distance SPECTA and GECTA more objectively.
Chapter 8

A HIERARCHICAL RECOGNITION SYSTEM OF SEISMIC PATTERNS AND FUTURE STUDY
8.1. SUMMARY
Hierarchical syntactic pattern recognition and the Hough transformation are proposed for automatic recognition and reconstruction of seismic patterns in seismograms. In the first step, the patterns are hierarchically decomposed or recognized into single patterns, straight-line patterns, or hyperbolic patterns, using syntactic pattern recognition. In the second step, the Hough transformation technique is used for reconstruction, pattern by pattern. The system of syntactic seismic pattern recognition includes envelope generation, a linking process in the seismogram, segmentation, primitive recognition, grammatical inference, and syntax analysis. The seismic patterns are automatically recognized and reconstructed.
8.2. INTRODUCTION
In a seismogram there are many seismic patterns. In a common-source (one-shot) seismogram (Fig. 8.1), the traveltime curves of the direct-wave and refracted-wave patterns are straight lines, and the traveltime curve of the reflected-wave pattern is hyperbolic. Direct-wave and reflected-wave patterns exhibit some severe interference beyond some distance from the source. In the stacked seismogram (Fig. 8.2), there are diffraction curves and a continuous straight-line reflection. Huang et al. [63] showed the usefulness of the Hough transformation for the detection of straight lines
and hyperbolic curves in a seismogram.

Fig. 8.1. Direct-wave, reflection, and refraction patterns of a one-shot seismogram.

Fig. 8.2. Horizontal reflector and diffraction patterns of a stacked seismogram.
However, the detection results showed that some interference existed among nearby patterns in the common-source seismogram. In order to detect different and varied types of patterns in a seismogram, a hierarchical syntactic pattern recognition and Hough transformation system has been developed. The new system decomposes the complex patterns into uniform patterns with similar properties, and further decomposes the uniform patterns into each type of pattern. Figure 8.3 shows the hierarchical pattern recognition scheme for composite seismic patterns. A syntactic pattern recognition system is used in hierarchical detection; Figure 1.1 has shown the block diagram of a syntactic pattern recognition system. After each single pattern is detected with syntactic pattern recognition, a Hough transformation [7, 31, 51, 63, 67, 70] is used in the reconstruction of the seismic patterns, pattern by pattern. Figure 8.4 shows the block diagram of the hierarchical syntactic pattern recognition and Hough transformation system. The Hough transformation part of the system includes envelope generation, thresholding, Hough transformation, and parameter determination [63, 67]. The envelope describes the outer shape of the wavelet [58, 59]. A Hough transformation transforms each image element in the picture space (t − x) into elements in parameter space. In line detection using the Hough transformation, the line equation of the direct-wave and refracted-wave
Fig. 8.3. Hierarchical pattern recognition for composite seismic patterns: seismic patterns are split into straight lines (direct-wave, refracted-wave, and horizontal patterns) and hyperbolas (reflected-wave and diffraction patterns).
Fig. 8.4. Block diagram of the hierarchical syntactic pattern recognition and Hough transformation system (training: sample patterns → primitive selection → grammatical inference; recognition: input seismogram → envelope → linking and segmentation → primitive recognition → hierarchical syntax analysis; detection: each single pattern → Hough transformation → reconstructed patterns).
patterns in t − x space is ρ = xi cos θ + ti sin θ. A line in the t − x plane corresponds to one point in the ρ − θ plane. Every point (xi, ti) in the picture space maps onto one curve of ρ − θ in the parameter space; N points (xi, ti) of the same line map onto N curves in the parameter space, and the intersection of the N curves is the one point (ρ, θ) that gives the parameters of the line equation ρ = xi cos θ + ti sin θ. The method for detecting hyperbolic patterns using the Hough transformation is similar [63, 67]. A visual inspection method, a local peak detection method, a maximum peak detection method, and a clustering algorithm determine the parameters; the clustering algorithm includes a K-means algorithm with a bottom-up hierarchical algorithm and the PFS-value method. Parameters corresponding to the detected patterns are transformed back to their pattern equation, and the patterns are then detected in the picture space.
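As an illustration of the line-detection step just described, the following Python sketch accumulates votes in a discretized ρ − θ parameter space and reads off the maximum peak; the grid sizes and ranges are arbitrary assumptions.

```python
import math

# Minimal sketch of straight-line detection with the Hough transformation:
# each picture-space point (x, t) votes along one curve in (theta, rho).
def hough_lines(points, n_theta=180, n_rho=200, rho_max=4.0):
    acc = [[0] * n_rho for _ in range(n_theta)]
    for x, t in points:
        for i in range(n_theta):
            theta = math.pi * i / n_theta
            rho = x * math.cos(theta) + t * math.sin(theta)
            j = int((rho + rho_max) / (2 * rho_max) * (n_rho - 1))
            if 0 <= j < n_rho:
                acc[i][j] += 1           # one vote per (theta, rho) cell
    # maximum peak detection: the best cell gives the line parameters
    i, j = max(((i, j) for i in range(n_theta) for j in range(n_rho)),
               key=lambda ij: acc[ij[0]][ij[1]])
    theta = math.pi * i / n_theta
    rho = 2 * rho_max * j / (n_rho - 1) - rho_max
    return rho, theta                    # line: rho = x cos(theta) + t sin(theta)
```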
The system of syntactic pattern recognition includes envelope generation, a linking process in the seismogram, segmentation, primitive
recognition, grammatical inference, and syntax analysis. Linking processing extracts seismic patterns using a linking algorithm based on the pattern-growing technique and a function approximation algorithm. Primitives are assigned by amplitude-dependent encoding, and a grammar is inferred by K-tail finite-state inference. Syntax analysis is performed by an error-correcting finite-state automaton. Finally, the seismic patterns are automatically recognized and reconstructed.
8.3. SYNTACTIC PATTERN RECOGNITION

8.3.1. Linking Processing and Segmentation
The linking process [52, 67] extracts the seismic patterns from the seismogram. First, each waveform of each trace is detected by scanning the seismogram in a vertical direction from left to right. Second, these detected waveforms are linked as skeletons according to their geometrical properties. A pattern-growing algorithm in the linking process automatically extracts patterns using the principles of branch-and-bound search [111] and function approximation. Initially, the algorithm creates a new skeleton when the peak of the detected waveform is in the starting trace. The assignment of each new peak point (xj, yj) on the next trace to one of the existing skeletons, or to a new skeleton, is based on the distance between the location of the peak estimated at the current trace (from the existing skeleton) and the currently analyzed peak location. The current new peak is assigned to one of the existing paths if the distance is within the threshold of the function approximation; otherwise, a new path with this new peak point is created. Using this linking algorithm, all the input peak points are linked as several skeletons; a sketch of the linking step follows. Next, each linked pattern is segmented: every four sampling intervals (traces) form one segment, and if the number of sampling intervals is less than 4, no segment is created.
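A rough sketch of the pattern-growing idea (an illustration only: linear extrapolation of the last two peaks stands in for the function approximation, and the threshold is a free parameter):

```python
# Assign each new peak to the nearest predicted skeleton, or start a new one.
def link_peaks(traces, threshold=3.0):
    """traces[x] is the list of peak times t detected on trace x."""
    skeletons = []
    for x, peaks in enumerate(traces):
        for t in peaks:
            best, best_d = None, threshold
            for sk in skeletons:
                x1, t1 = sk[-1]
                if len(sk) > 1 and sk[-2][0] != x1:
                    x0, t0 = sk[-2]
                    t_est = t1 + (t1 - t0) * (x - x1) / (x1 - x0)
                else:
                    t_est = t1                    # no slope estimate yet
                d = abs(t - t_est)
                if d < best_d:
                    best, best_d = sk, d
            if best is not None:
                best.append((x, t))               # extend the closest skeleton
            else:
                skeletons.append([(x, t)])        # otherwise start a new one
    return skeletons
```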
8.3.2. Primitive Recognition
Each segment is assigned a primitive, so each linked pattern can be represented as a sentence, i.e., a string of primitives. The amplitude-dependent encoding method [58, 59] is used in the assignment of primitives. Since the increment in the x direction is the same for any two consecutive peak points, the average slope in each segment is defined as

    di = (s1 + s2 + s3 + s4) / 4,    (1)

where s1 = tj+1 − tj, s2 = tj+2 − tj, s3 = tj+3 − tj, s4 = tj+4 − tj, and tj+i (i = 1-4) is the peak time of the seismogram. For each sentence string W, the primitive wi is defined as follows:
if di > m,
Wi =b,
if n < di
Wi—c,
if o < di < n,
Wi = d,
if p < di < m,
Wi = e,
if q < di < p,
Wi = f ,
if di < q,
<m,
and
where m, n, o, p, and q are predefined values and a, b, c, d, e, and / are assigned primitive symbols. 8.3.3.
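A direct transcription of Eq. (1) and the threshold encoding into Python (the thresholds m > n > o > p > q are assumed to be chosen per seismogram, as in the common-source and stacked cases of Secs. 8.4 and 8.5):

```python
# Amplitude-dependent encoding: one primitive per 4-trace segment.
def encode(peak_times, m, n, o, p, q):
    """peak_times[j] is t_j; returns the sentence string of primitives."""
    t = peak_times
    sentence = []
    for j in range(0, len(t) - 4, 4):
        d = ((t[j+1] - t[j]) + (t[j+2] - t[j])
             + (t[j+3] - t[j]) + (t[j+4] - t[j])) / 4.0   # Eq. (1)
        if d > m:
            sentence.append('a')
        elif d > n:
            sentence.append('b')
        elif d > o:
            sentence.append('c')
        elif d > p:
            sentence.append('d')
        elif d > q:
            sentence.append('e')
        else:
            sentence.append('f')
    return ''.join(sentence)
```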
8.3.3. Training Patterns

Training patterns to be used for grammatical inference are generated for a common-source seismogram using different apparent velocities for direct-wave patterns, refracted-wave patterns (straight lines), and reflected-wave patterns (hyperbolas). Training patterns for a stacked seismogram with different velocities in horizontal patterns (straight lines) and diffraction patterns (hyperbolas) are also generated. In addition to whole training patterns, a training pattern can be a portion of a whole pattern formed by the intersection of many patterns; the intersected patterns may be discontinuous because of interference. All these patterns are included in the training pattern set. Using segmentation and primitive recognition, all these training patterns are transformed
into strings of primitives. The length of the segment at the end of each pattern may not be sufficient, so truncation is considered. A finite-state grammar can then be inferred from the set of training strings of each class.
8.3.4. Grammatical Inference
The K-tail finite-state grammatical inference algorithm [41] is used to infer grammars from a set of training patterns. The algorithm was discussed in Chapter 2. It reduces the number of derived grammars: it finds the canonical grammar first and then merges the states which are K-tail equivalent. The algorithm is adjustable; the value of K controls the size of the inferred grammar. A set of training sentences for each class is used to generate the source grammar, and each input sample can then be recognized as a member of a particular class. A sketch of the inference follows.
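A compact Python sketch of the K-tail idea (an illustration, not the book's implementation): states are identified with their K-tails, which merges K-tail-equivalent states of the canonical grammar automatically; the bookkeeping of completed strings is simplified here.

```python
# K-tail grammatical inference sketch over one class of training strings.
def k_tail_grammar(strings, k):
    prefixes = {s[:i] for s in strings for i in range(len(s) + 1)}
    def tail(p):
        # the K-tail of prefix p: its continuations of length <= K
        return frozenset(s[len(p):] for s in strings
                         if s.startswith(p) and len(s) - len(p) <= k)
    name = {}
    for p in sorted(prefixes, key=len):
        name.setdefault(tail(p), 'U%d' % (len(name) + 1))
    productions = set()
    for p in prefixes:
        lhs = name[tail(p)]
        for a in {s[len(p)] for s in strings
                  if s.startswith(p) and len(s) > len(p)}:
            if (p + a) in strings:
                productions.add((lhs, a, None))               # U -> a
            if any(s.startswith(p + a) and len(s) > len(p) + 1
                   for s in strings):
                productions.add((lhs, a, name[tail(p + a)]))  # U -> a V
    return name[tail('')], productions

# The reflected-wave strings of Table 8.1(b) with K = 14 yield a grammar
# close to Table 8.2(b) (the handling of completed strings differs slightly
# in this sketch):
start, prods = k_tail_grammar(
    ['abbccccccccccc', 'abbccccdcddd', 'abbcccddddd', 'abbcccdddd'], 14)
```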
8.3.5. Finite-state Error-correcting Parsing
Due to distortion and noise, the syntax analysis uses finite-state error-correcting parsing, which tries to correct an input sentence by looking for and removing errors. Three types of errors can occur: insertion errors, substitution errors, and deletion errors. In this study, all segments are continuously sampled without insertion errors; therefore, only substitution errors and deletion errors are considered, and the finite-state grammar is expanded to include only those errors (a sketch of the expansion follows at the end of this subsection). The original production forms of a finite-state grammar are

    A → aB    or    A → a.

The production forms added to account for substitution errors are

    A → bB    or    A → b,    a ≠ b.

The production forms added to account for deletion errors are

    A → λB    or    A → λ,

where λ is the empty string. Input pattern strings are analyzed by a finite-state automaton [41], which can only accept languages defined by finite-state grammars; a finite-state automaton is the simplest recognizer (recognition device) for the strings (sentences) of a language. A deterministic finite-state automaton, derived from the nondeterministic finite-state automaton (Chapter 2), is used to parse the input strings. The input linked strings are analyzed by this deterministic finite-state automaton; the input patterns are therefore detected automatically by the computer.
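A small sketch of the grammar expansion (the production encoding is an assumption: (A, a, B) stands for A → aB and (A, a, None) for A → a; the empty string plays the role of λ):

```python
# Expand a finite-state grammar with substitution and deletion errors,
# matching this study's choice of error types.
def expand(productions, terminals):
    expanded = set(productions)
    for (A, a, B) in productions:
        for b in terminals:
            if b != a:
                expanded.add((A, b, B))   # substitution error: A -> bB / A -> b
        expanded.add((A, '', B))          # deletion error: A -> lambda B / A -> lambda
    return expanded
```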
8.4. COMMON-SOURCE SIMULATED SEISMOGRAM RESULTS
Figure 8.5 is the skeleton of the common-source (one-shot) seismogram (Fig. 8.1) produced by the linking processing algorithm. For each sentence string W in the common-source seismogram, the primitive wi is defined as

    wi = f,  if di > 30,
    wi = e,  if 25 < di < 30,
    wi = d,  if 17 < di < 25,
    wi = c,  if 12 < di < 17,
    wi = b,  if 5 < di < 12,
    wi = a,  if di < 5,

where di is given by Eq. (1).

Fig. 8.5. Skeleton of the one-shot seismogram of Fig. 8.1.

When the hierarchical pattern recognition system is applied to the common-source seismogram, lines and hyperbolic curves are first separated
using the deterministic finite-state automaton derived from the first two terminals of the straight-line training patterns (cc, dd) and the hyperbolic-curve pattern (ab). Here, cc is equivalent to a refraction pattern with slope between 12 and 17 over 8 traces, and dd is equivalent to a direct-wave pattern with slope between 17 and 25 over 8 traces. After separating patterns of different types, each pattern can be further separated from other patterns of the same type based on the class recognizers generated by the training patterns. Table 8.1(a) shows the nine velocity combinations used in generating nine training patterns of each type of pattern in a common-source seismogram. Using the primitive recognition method, Table 8.1(b) lists all possible training strings; these include complete strings and partial strings caused by the intersection of two patterns. Deletion errors at the end of each pattern are included. In the grammatical inference step, different K values are tried. For K = 14, Table 8.2(a) shows the set of nonterminals used to infer the finite-state grammar for reflected-wave patterns of the common-source seismogram. The inferred finite-state grammar is shown in Table 8.2(b), Table 8.2(c) is the expanded grammar with substitution errors included, and the deterministic finite-state automaton is derived in Table 8.2(d). Similar procedures are performed for the direct-wave patterns and the refracted-wave patterns.
Table 8.1(a). Training patterns for velocity data of the one-shot seismogram.

         Velocity 1st layer (m/s)    Velocity 2nd layer (m/s)
    (1)  1900                        2500
    (2)  1900                        2700
    (3)  1900                        2900
    (4)  2100                        2700
    (5)  2100                        2900
    (6)  2100                        3100
    (7)  2300                        2900
    (8)  2300                        3100
    (9)  2300                        3300

Table 8.1(b). Training strings for the one-shot seismogram.

    Direct wave pattern: dd, ddd, dddd, ddddd, ddddddddd, dddddddddd, ddddddddddd, ddddddddddddd, dddddddddddddd
    Refracted wave pattern: cc, ccc, cccc, ccccc, cccccc, ccccccc, cccccccc, ccccccccc, cccccccccc, ccccccccccc, cccccccccccc
    Reflected wave pattern: abbcccdddd, abbcccddddd, abbccccdcddd, abbccccccccccc
Table 8.2(a). K-tail nonterminal table for reflected-wave patterns, K = 14.

    U1 = {abbccccccccccc, abbccccdcddd, abbcccddddd, abbcccdddd}
    U2 = {bbccccccccccc, bbccccdcddd, bbcccddddd, bbcccdddd}
    U3 = {bccccccccccc, bccccdcddd, bcccddddd, bcccdddd}
    U4 = {ccccccccccc, ccccdcddd, cccddddd, cccdddd}
    U5 = {cccccccccc, cccdcddd, ccddddd, ccdddd}
    U6 = {ccccccccc, ccdcddd, cddddd, cdddd}
    U7 = {cccccccc, cdcddd, ddddd, dddd}
    U8 = {ccccccc, dcddd}
    U9 = {dddd, ddd}
    U10 = {cccccc}
    U11 = {cddd}
    U12 = {ddd, dd}
    U13 = {ccccc}
    U14 = {ddd}
    U15 = {dd, d}
    U16 = {cccc}
    U17 = {dd}
    U18 = {d}
    U19 = {ccc}
    U20 = {cc}
    U21 = {c}
Table 8.2(b). K-tail grammar table for reflected-wave patterns, K = 14.

    G = (Vn, Vt, P, S)
    Vn = {U1, U2, ..., U21}
    Vt = {a, b, c, d, e, f}
    S = U1
    P:
    U1 → aU2     U8 → cU10     U15 → dU18
    U2 → bU3     U8 → dU11     U15 → d
    U3 → bU4     U9 → dU12     U16 → cU19
    U4 → cU5     U10 → cU13    U17 → dU18
    U5 → cU6     U11 → cU14    U18 → d
    U6 → cU7     U12 → dU15    U19 → cU20
    U7 → cU8     U13 → cU16    U20 → cU21
    U7 → dU9     U14 → dU17    U21 → c
Table 8.2(c). Expanded grammar table for reflected-wave patterns, K = 14.

    G = (Vn, Vt, P, S)
    Vn = {U1, U2, ..., U21}
    Vt = {a, b, c, d, e, f}
    S = U1
    P:
    U1 → aU2
    U2 → bU3
    U3 → bU4
    U4 → bU5    U4 → cU5    U4 → dU5
    U5 → bU6    U5 → cU6    U5 → dU6
    U6 → bU7    U6 → cU7    U6 → dU7
    U7 → bU8    U7 → cU8    U7 → dU8
    U7 → cU9    U7 → dU9    U7 → eU9
    U8 → cU10   U8 → dU11
    U9 → dU12
    U10 → cU13  U11 → cU14  U12 → dU15
    U13 → cU16  U14 → dU17
    U15 → dU18  U15 → d
    U16 → cU19  U17 → dU18  U18 → d
    U19 → cU20  U20 → cU21  U21 → c
The input extracted strings of the common-source seismogram (Fig. 8.1) are shown in Table 8.3(a); four linked skeleton strings are listed. The results are shown in Table 8.3(b). The deterministic finite-state automata are constructed to analyze the input strings, and the detected patterns are reconstructed by using the Hough transformation. Figure 8.6 shows the reconstruction of all the detected patterns of the common-source seismogram.
Table 8.2(d). Deterministic finite-state automaton for reflected-wave patterns (K = 14): the state transition table over the inputs a-f for states 1-25, with the final states marked.

Table 8.3(a). Input extracted strings from the one-shot seismogram.

    cc
    cccc
    abbcccdddd
    ddddddddddd
STACKED SIMULATED SEISMOGRAM RESULTS Table 8.3(b).
Results of the one-shot seismogram. Type
Input strings
Results in Fig. 8.6
cc
Line
Line-2
cccc
Line
Line-2
abbcccdddd
Hyperbolic curve
Hyperbolic curve
ddddddddddd
Line
Line-1
Fig. 8.6. Summing patterns for the one-shot seismogram of Fig. 8.1.
8.5. STACKED SIMULATED SEISMOGRAM RESULTS

Figure 8.2 shows the input patterns of the stacked seismogram. By using the linking processing to separate the hyperbolic-curve patterns from the
straight-line patterns, we obtain the skeleton of the stacked seismogram, shown in Fig. 8.7.

Fig. 8.7. Skeleton of the stacked seismogram of Fig. 8.2.

For each sentence string W in the stacked seismogram, the primitive wi is defined as

    wi = f,  if di > 8,
    wi = e,  if 4 < di < 8,
    wi = d,  if 0 < di < 4,
    wi = c,  if −4 < di < 0,
    wi = b,  if −8 < di < −4,
    wi = a,  if di < −8.
The first three symbols of each pattern are used to separate the linear and hyperbolic-curve patterns of the stacked seismogram; then each individual class recognition process is performed. Table 8.4(a) lists the nine training patterns with different velocity combinations, and Table 8.4(b) lists all sets of training strings, including any deletion errors at the ends of the patterns. Following the same procedures as above, the set of nonterminals, the inferred finite-state grammars, the expanded grammars, and the finite-state automata are constructed for the horizontal patterns, the left diffraction patterns, and the right diffraction patterns. Table 8.5(a) shows the input extracted strings for the stacked seismogram, and Table 8.5(b) shows the results for the seismic patterns detected in the stacked seismogram.
Training patterns for velocity data of a stacked seismogram. Velocity 1st layer (m/s)
Velocity 2nd layer (m/s)
(1)
1900
2500
(2)
1900
2700
(3)
1900
2900
(4)
2100
2700
(5)
2100
2900
(6)
2100
3100
(7)
2300
2900
(8)
2300
3100
(9)
2300
3300
Table 8.4(b). Training strings for a stacked seismogram.

    Horizontal reflector patterns: cccccc, ccccccc
    Diffraction patterns (1): eef, eff, deef, deff, eeff
    Diffraction patterns (2): aab, aabb
Table 8.5(a). Input extracted strings from a stacked seismogram.

    eeff
    aabb
    ccccccd
Table 8.5(b). Results of a stacked seismogram.

    Input strings    Type                Results
    eeff             Hyperbolic curve    Hyperbolic-1 curve
    aabb             Hyperbolic curve    Hyperbolic-2 curve
    ccccccd          Line                Line
Fig. 8.8. Summing patterns for the stacked seismogram of Fig. 8.2.
Summing all the detected patterns results in Fig. 8.8, which shows all patterns in the stacked seismogram.
8.6. CONCLUSIONS

Syntactic (structural) pattern recognition is an important technique for recognizing seismic structural patterns. The syntactic pattern recognition system can recognize the classes of patterns, and the Hough transformation system can reconstruct the patterns; the two systems are combined for the recognition and reconstruction of seismic patterns. The direct-wave, refracted-wave, and reflected-wave patterns were reconstructed in a common-source (one-shot) seismogram, and a horizontal pattern and a diffraction pattern were reconstructed in a stacked seismogram. In the syntactic pattern recognition system, amplitude-dependent encoding is used in the primitive recognition, and K-tail grammatical inference and an expanded finite-state grammar are used; substitution and deletion errors are considered in the expanded finite-state grammar. The experimental results obtained using hierarchical syntactic pattern recognition and a Hough transformation system are quite good, and better than the results of using the Hough transformation only. The hierarchical syntactic pattern recognition system may be applicable to complex 2-D and 3-D seismic patterns caused by complex geology and velocity distribution.
8.7. FUTURE STUDY
Chapters 2 through 8 have presented the proposed fundamental syntactic techniques and their applications to seismic pattern recognition. In future study, combining the syntactic and the semantic approach can expand the power of syntactic pattern recognition [42]. Semantic information often provides spatial information, relations, and reasoning between primitives, subpatterns, and patterns, and can be expressed syntactically, for example by attributed strings and attributed graphs. Attributed 2-D and 3-D pattern grammars, for example attributed tree, graph, and shape grammars, may be studied [14, 98, 113-115]. The distance computation between two attributed patterns (attributed strings, attributed trees,
attributed graphs, etc.) may also be studied [64]. The error-correcting finite-state parsing, Earley's parsing, tree automata, etc., may likewise be expanded for attributed strings, trees, and so on [64, 65, 80]. The distance can be computed between an input pattern y and a language L(G), or between an input pattern and a training pattern. Using a distance or similarity measure, clustering methods such as the minimum-distance classification rule, the nearest-neighbor classification rule, the K-nearest-neighbor classification rule, and hierarchical clustering can easily be applied to syntactic patterns [41, 45, 82, 83, 116]. If a pattern has an inherent structural property, globally we can use the syntactic approach to recognize the pattern; locally we can use neural network techniques in the segmentation and recognition of primitives, so that syntactic pattern recognition improves and becomes more robust against noise and distortion. In the study of certainty effects, besides the probability approach, fuzzy logic may be considered in grammars and automata, for example fuzzy tree automata [74]. Parallel parsing algorithms can speed up the parsing time [17, 25, 29]; for example, the tree automaton can parse from the bottom leaves to the top root of the tree in parallel. Further, a syntactic approach to time-varying pattern recognition may also be a research topic [34]. We also expect that syntactic pattern recognition techniques can be applied to more complex geophysical pattern recognition problems and improve seismic interpretations.
REFERENCES

[1] A. V. Aho and T. G. Peterson, A minimum distance error-correcting parser for context-free languages, SIAM J. Comput. 1 (1972) 305-312.
[2] A. V. Aho and J. D. Ullman, The Theory of Parsing, Translation, and Compiling, Vol. 1: Parsing (Prentice-Hall, Englewood Cliffs, NJ, 1972).
[3] F. Ali and T. Pavlidis, Syntactic recognition of handwritten numerals, IEEE Trans. Syst. Man Cybern. 7 (1977) 537-541.
[4] F. Aminzadeh (ed.), Pattern recognition & image processing, Handbook of Geophysical Exploration: Section I. Seismic Exploration 20 (Geophysical Press, London, 1987).
[5] F. Aminzadeh, S. Katz and K. Aki, Adaptive neural nets for generation of artificial earthquake precursors, IEEE Trans. Geosci. Remote Sensing 32 (1994) 1139-1143.
[6] K. R. Anderson, Syntactic analysis of seismic waveforms using augmented transition network grammars, Geoexploration 20 (1982) 161-182.
[7] D. H. Ballard and C. M. Brown, Computer Vision (Prentice-Hall, 1982).
[8] A. Barrero, Inference of tree grammars using negative samples, Pattern Recogn. 24 (1991) 1-8.
[9] S. Basu and K. S. Fu, Image segmentation by syntactic method, Pattern Recogn. 20 (1987) 33-44.
[10] P. Bois, Autoregressive pattern recognition applied to the delimitation of oil and gas reservoirs, Geophys. Prospecting 28 (1980) 572-591.
[11] P. Bois, Some applications of pattern recognition to oil and gas exploration, IEEE Trans. Geosci. Remote Sensing 21 (1983) 416-426.
[12] J. M. Brayer and K. S. Fu, A note on the K-tail method of tree grammar inference, IEEE Trans. Syst. Man Cybern. 7 (1977) 293-299.
[13] I. Bruha and G. P. Madhavan, Use of attributed grammars for pattern recognition of evoked potentials, IEEE Trans. Syst. Man Cybern. 18 (1988) 1046-1089.
[14] H. Bunke, Attributed programmed graph grammars and their application to schematic diagram interpretation, IEEE Trans. Pattern Anal. Mach. Intell. 4 (1982) 574-582.
[15] H. Bunke and A. Sanfeliu (eds.), Special issue: Advances in syntactic pattern recognition, Pattern Recogn. 19, 4 (1986).
[16] H. Bunke and A. Sanfeliu (eds.), Syntactic and Structural Pattern Recognition — Theory and Applications (World Scientific, 1990).
[17] N. S. Chang and K. S. Fu, Parallel parsing of tree languages for syntactic pattern recognition, Pattern Recogn. 11 (1979) 213-222.
[18] C. H. Chen (ed.), Computer-aided seismic analysis and discrimination, Methods in Geochemistry and Geophysics 13 (1978).
[19] C. H. Chen (ed.), Special issue: Seismic signal analysis and discrimination, Geoexploration 20, 1/2 (1982).
[20] C. H. Chen (ed.), Special issue: Seismic signal analysis and discrimination III, Geoexploration 23, 1 (1984).
[21] C. H. Chen (ed.), Special issue: Artificial intelligence and signal processing in underwater acoustic and geophysics problems, Pattern Recogn. 18, 6 (1985).
[22] C. H. Chen, L. F. Pau and P. S. Wang (eds.), Handbook of Pattern Recognition and Computer Vision (World Scientific, 1993).
[23] J. C. Cheng and H. S. Don, A graph matching approach to 3-D point correspondences, Int. J. Pattern Recogn. Artif. Intell. 5 (1991) 399-412.
[24] Y. C. Cheng and S. Y. Lu, Waveform correlation by tree matching, IEEE Trans. Pattern Anal. Mach. Intell. 7 (1985) 299-305.
[25] Y. T. Chiang and K. S. Fu, Parallel parsing algorithm and VLSI implementations for syntactic pattern recognition, IEEE Trans. Pattern Anal. Mach. Intell. 6 (1984) 302-314.
[26] R. J. P. deFigueiredo, Pattern recognition approach to exploration, in K. C. Jain and R. J. P. deFigueiredo (eds.), Concepts and Techniques in Oil and Gas Exploration (Soc. Explor. Geophys., 1982).
[27] M. B. Dobrin, Introduction to Geophysical Prospecting, 3rd ed. (McGraw-Hill, New York, 1976) Chapter 10.
[28] M. B. Dobrin and C. H. Savit, Introduction to Geophysical Prospecting, 4th ed. (McGraw-Hill, New York, 1988).
[29] H. S. Don and K. S. Fu, A parallel algorithm for stochastic image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 8 (1986) 594-603.
[30] D. Dori, A syntactic/geometric approach to recognition of dimensions in engineering machine drawings, Comput. Vision Graph. Image Process. 47 (1989) 271-291.
[31] R. O. Duda and P. E. Hart, Use of the Hough transformation to detect lines and curves in pictures, Commun. ACM 15 (1972) 11-15.
[32] J. Earley, An efficient context-free parsing algorithm, Commun. ACM 13 (1970) 94-102.
[33] M. A. Eshera and K. S. Fu, A graph distance measure for image analysis, IEEE Trans. Syst. Man Cybern. 14 (1984) 398-408.
[34] T. I. Fan and K. S. Fu, A syntactic approach to time-varying image analysis, Comput. Graph. Image Process. 11 (1979) 138-149.
[35] J. S. Farnback, The complex envelope in seismic signal analysis, Bull. Seismol. Soc. Am. 65 (1975) 951-962.
[36] T. Feder, Plex languages, Inf. Sci. 3 (1971) 225-241.
[37] G. Ferrate, T. Pavlidis, A. Sanfeliu and H. Bunke (eds.), Syntactic and Structural Pattern Recognition (Springer-Verlag, 1988).
[38] H. Freeman, On the encoding of arbitrary geometric configurations, IEEE Electron. Comput. 10 (1961) 260-268.
[39] K. S. Fu, Syntactic Methods in Pattern Recognition (Academic Press, New York, 1974).
[40] K. S. Fu, Syntactic image modeling using stochastic tree grammars, Comput. Graph. Image Process. 12 (1980) 136-152.
[41] K. S. Fu, Syntactic Pattern Recognition and Applications (Prentice-Hall, Englewood Cliffs, NJ, 1982).
[42] K. S. Fu, A step towards unification of syntactic and statistical pattern recognition, IEEE Trans. Pattern Anal. Mach. Intell. 5 (1983) 200-205.
[43] K. S. Fu and B. K. Bhargava, Tree systems for syntactic pattern recognition, IEEE Trans. Comput. 22 (1973) 1087-1099.
[44] K. S. Fu and T. Huang, Stochastic grammars and languages, Int. J. Comput. Inf. Sci. 1 (1972) 135-170.
[45] K. S. Fu and S. Y. Lu, A clustering procedure for syntactic patterns, IEEE Trans. Syst. Man Cybern. 7 (1977) 734-742.
[46] J. E. Gaby and K. R. Anderson, Hierarchical segmentation of seismic waveforms using affinity, Pattern Recogn. 23 (1984) 1-16.
[47] R. M. Gagliardi, Introduction to Communications Engineering (John Wiley & Sons, 1978) 485-490.
[48] P. Garcia, E. Segarra, E. Vidal and I. Galiano, On the use of the morphic generator grammatical inference (MGG) methodology in automatic speech recognition, Int. J. Pattern Recogn. Artif. Intell. 4 (1990) 667-685.
[49] R. C. Gonzalez and M. G. Thomason, Syntactic Pattern Recognition (Addison-Wesley, Reading, MA, 1978).
[50] D. C. Hagen, The application of principal components analysis to seismic data sets, Proc. 2nd Int. Symp. Comput. Aided Seismic Anal. Discrimination, North Dartmouth (1981) 98-109.
[51] P. V. C. Hough, Method and means for recognizing complex patterns, U.S. Patent 3,069,654 (1962).
[52] K. Y. Huang, Branch and bound search for automatic linking process of seismic horizons, Pattern Recogn. 23 (1990) 657-667.
[53] K. Y. Huang, Pattern recognition to seismic exploration, in Ibrahim Palaz and Sales K. Sengupta (eds.), Automated Pattern Analysis in Petroleum Exploration (Springer-Verlag, New York, 1992) 121-154.
[54] K. Y. Huang, Syntactic pattern recognition, in C. H. Chen, L. F. Pau and P. S. P. Wang (eds.), Handbook of Pattern Recognition & Computer Vision, 2nd ed. (World Scientific, Singapore, 1999).
[55] K. Y. Huang, W. Bau and S. Y. Lin, Picture description language for recognition of seismic patterns, Society of Exploration Geophysicists International 1987 Meeting, New Orleans, 326-330.
[56] K. Y. Huang and K. S. Fu, Decision-theoretic pattern recognition for the classification of Ricker wavelets and the detection of bright spots, 52nd Ann. Int. Mtg., Soc. Explor. Geophys. (Dallas, 1982) 222-224.
[57] K. Y. Huang and K. S. Fu, Detection of bright spots in seismic signal using tree classifiers, Geoexplor. 23 (1984) 121-145.
[58] K. Y. Huang and K. S. Fu, Syntactic pattern recognition for the classification of Ricker wavelets, Geophys. 50 (1985) 1548-1555.
[59] K. Y. Huang and K. S. Fu, Syntactic pattern recognition for the recognition of bright spots, Pattern Recogn. 18 (1985) 421-428.
[60] K. Y. Huang and K. S. Fu, Decision-theoretic approach for classification of Ricker wavelets and detection of seismic anomalies, IEEE Trans. Geosci. Remote Sensing 25 (1987) 118-123.
[61] K. Y. Huang and K. S. Fu, Detection of seismic bright spots using pattern recognition techniques, in Fred Aminzadeh (ed.), Handbook of Geophysical Exploration: Section I. Seismic Exploration, 20, Pattern Recognition & Image Processing (Geophysical Press, 1987) 263-301.
[62] K. Y. Huang, K. S. Fu, S. W. Cheng and Z. S. Lin, Syntactic pattern recognition and Hough transformation for reconstruction of seismic patterns, Geophys. 52 (1987) 1612-1620.
[63] K. Y. Huang, K. S. Fu, S. W. Cheng and T. H. Sheen, Image processing of seismogram: (A) Hough transformation for the detection of seismic patterns; (B) Thinning processing in the seismogram, Pattern Recogn. 18 (1985) 429-440.
[64] K. Y. Huang and D. R. Leu, Modified Earley parsing and MPM method for attributed grammar and seismic pattern recognition, J. Inf. Sci. Eng. 8 (1992) 541-565.
[65] K. Y. Huang and D. R. Leu, Recognition of Ricker wavelets by syntactic analysis, Geophys. 60 (1995) 1541-1549.
[66] K. Y. Huang and T. H. Sheen, A tree automaton system of syntactic pattern recognition for the recognition of seismic patterns, 56th Ann. Int. Mtg., Soc. Explor. Geophys. (1986) 183-187.
[67] K. Y. Huang, T. H. Sheen, S. W. Cheng, Z. S. Lin and K. S. Fu, Seismic image processing: (I) Hough transformation, (II) Thinning processing, (III) Linking processing, in Fred Aminzadeh (ed.), Handbook of Geophysical Exploration: Section I. Seismic Exploration, 20, Pattern Recognition & Image Processing (Geophysical Press, 1987) 79-109.
[68] K. Y. Huang, J. J. Wang and Vram M. Kouramajian, Matrix grammars for syntactic pattern recognition, 1990 Telecommunications Symposium in Taiwan, 576-581.
[69] J. W. Hunt and T. G. Szymanski, A fast algorithm for computing longest common subsequences, Commun. ACM 20 (1977) 350-353.
[70] J. Illingworth and J. Kittler, A survey of the Hough transform, Comput. Vision Graph. Image Process. 44 (1988) 87-116.
[71] S. Kiran Kumar and C. Pandu Rangan, A linear space algorithm for the LCS problem, Acta Informatica 24 (1987) 353-362.
[72] A. Koski, M. Juhola and M. Meriste, Syntactic recognition of ECG signals by attributed finite automata, Pattern Recogn. 28 (1995).
[73] Lawrence H. T. Le and Edo Nyland, An application of syntactic pattern recognition to seismic interpretation, in A. Krzyzak, T. Kasvand and C. Y. Suen (eds.), Computer Vision and Shape Recognition (World Scientific, 1988) 396-415.
[74] E. T. Lee, Fuzzy tree automata and syntactic pattern recognition, IEEE Trans. Pattern Anal. Mach. Intell. 4 (1982) 445-449.
[75] V. I. Levenshtein, Binary codes capable of correcting deletions, insertions and reversals, Sov. Phys. Dokl. 10 (1966) 707-710.
[76] B. Levine, Derivatives of tree sets with applications to grammatical inference, IEEE Trans. Pattern Anal. Mach. Intell. 3 (1981) 285-293.
[77] B. Levine, The use of tree derivatives and a sample support parameter for inferring tree systems, IEEE Trans. Pattern Anal. Mach. Intell. 4 (1982) 25-34.
[78] R. Y. Li and K. S. Fu, Tree system approach for LANDSAT data interpretation, Symp. Mach. Process. Remotely Sensed Data, West Lafayette, IN, June 29-July 1, 1976.
[79] W. C. Lin and K. S. Fu, A syntactic approach to 3D object representation, IEEE Trans. Pattern Anal. Mach. Intell. 6 (1984) 351-364.
[80] H. H. Liu and K. S. Fu, A syntactic approach to seismic discrimination, Geoexplor. 20 (1982) 183-196.
[81] S. W. Lu, Y. Reng and C. Y. Suen, Hierarchical attributed graph representation and recognition of handwritten Chinese characters, Pattern Recogn. 24 (1991) 617-632.
[82] S. Y. Lu, A tree-to-tree distance and its application to cluster analysis, IEEE Trans. Pattern Anal. Mach. Intell. 1 (1979) 219-224.
[83] S. Y. Lu and K. S. Fu, A sentence-to-sentence clustering procedure for pattern analysis, IEEE Trans. Syst. Man Cybern. 8 (1978) 381-389.
[84] S. Y. Lu and K. S. Fu, Error-correcting tree automata for syntactic pattern recognition, IEEE Trans. Comput. 27 (1978) 1040-1053.
[85] S. Y. Lu and K. S. Fu, A syntactic approach to texture analysis, Comput. Graph. Image Process. 7 (1978) 303-330.
[86] S. Y. Lu and K. S. Fu, Stochastic tree grammar inference for texture synthesis and discrimination, Comput. Graph. Image Process. 9 (1979) 234-245.
[87] L. Miclet, Structural Methods in Pattern Recognition (North Oxford Academic, London, 1986).
[88] W. Min, Z. Tang and L. Tang, Using web grammar to recognize dimensions in engineering drawings, Pattern Recogn. 26 (1993) 1407-1416.
[89] B. Moayer and K. S. Fu, A tree system approach for fingerprint pattern recognition, IEEE Trans. Comput. 25 (1976) 262-274.
[90] R. Mohr, T. Pavlidis and A. Sanfeliu (eds.), Structural Pattern Recognition (World Scientific, 1990).
[91] T. Pavlidis, Linear and context-free graph grammars, JACM 19 (1972) 11-22.
[92] T. Pavlidis, Structural Pattern Recognition (Springer-Verlag, New York, 1977).
[93] C. E. Payton (ed.), Seismic Stratigraphy — Applications to Hydrocarbon Exploration (AAPG Memoir 26, American Association of Petroleum Geologists, Tulsa, OK, 1977).
[94] J. L. Pfaltz and A. Rosenfeld, Web grammars, Proc. 1st Int. Joint Conf. Artif. Intell., Washington, D.C. (1969) 609-619.
[95] A. Rosenfeld, Picture Languages (Academic Press, New York, 1979).
[96] A. Sanfeliu, K. S. Fu and J. Prewitt, An application of a graph distance measure to the classification of muscle tissue patterns, Int. J. Pattern Recogn. Artif. Intell. 1 (1987) 17-42.
[97] A. C. Shaw, The formal picture description scheme as a basis for picture processing system, Inf. Control 14 (1969) 9-52.
[98] Q. Y. Shi and K. S. Fu, Parsing and translation of (attributed) expansive graph languages for scene analysis, IEEE Trans. Pattern Anal. Mach. Intell. 5 (1983) 472-485.
[99] A. Sinvhal and H. Sinvhal, Seismic Modelling and Pattern Recognition in Oil Exploration (Kluwer Academic Publishers, Netherlands, 1992).
[100] L. Stringa, A new set of constraint-free character recognition grammars, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 1210-1217.
[101] P. H. Swain and K. S. Fu, Stochastic programmed grammars for syntactic pattern recognition, Pattern Recogn. 4 (1972).
[102] E. Tanaka and K. S. Fu, Error-correcting parsers for formal languages, IEEE Trans. Comput. 27 (1978) 605-616.
[103] E. Tanaka and K. Tanaka, The tree-to-tree editing problem, Int. J. Pattern Recogn. Artif. Intell. 2 (1988) 221-240.
[104] M. T. Taner, F. Koehler and R. E. Sheriff, Complex seismic trace analysis, Geophys. 44 (1979) 1041-1063.
[105] M. G. Thomason, Generating functions for stochastic context-free grammars, Int. J. Pattern Recogn. Artif. Intell. 4 (1990) 553-572.
[106] M. G. Thomason and R. C. Gonzalez, Error detection and classification in syntactic pattern structures, IEEE Trans. Comput. 24 (1975) 93-95.
[107] Y. F. Tsao, Skeleton Processing for Shape Analysis and Image Generation (Ph.D. thesis, Purdue University, 1982).
[108] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Vol. I (Wiley, New York, 1968).
[109] R. A. Wagner and M. J. Fischer, The string-to-string correction problem, JACM 21 (1974) 168-173.
[110] P. S. P. Wang (ed.), Special issue on array grammars, patterns and recognizers, Int. J. Pattern Recogn. Artif. Intell. 3, 3&4 (1989).
[111] P. H. Winston, Artificial Intelligence (Addison-Wesley, Reading, MA, 1984).
[112] G. Wolberg, A syntactic omni-font character recognition system, Int. J. Pattern Recogn. Artif. Intell. 1 (1987) 303-322.
[113] A. K. C. Wong, S. W. Lu and M. Rioux, Recognition and shape synthesis of 3-D objects based on attributed hypergraph, IEEE Trans. Pattern Anal. Mach. Intell. 11 (1989) 279-290.
[114] K. C. You and K. S. Fu, A syntactic approach to shape recognition using attributed grammars, IEEE Trans. Syst. Man Cybern. 9 (1979) 334-345.
[115] K. C. You and K. S. Fu, Distorted shape recognition using attributed grammars and error-correcting techniques, Comput. Graph. Image Process. 13 (1980) 1-16.
[116] T. Y. Young and K. S. Fu (eds.), Handbook of Pattern Recognition and Image Processing (Academic Press, New York, 1986).
[117] T. Y. Zhang and C. Y. Suen, A fast parallel algorithm for thinning digital patterns, Commun. ACM 27 (1984) 236-239.
INDEX

K-means algorithm, 106
K-nearest neighbor classification rule, 122
K-tail finite-state grammatical inference, 109
K-tail finite-state inference, 107

amplitude-dependent, 107
amplitude-dependent encoding, 25, 34, 65
attributed context-free grammar, 42
attributed context-free languages, 45
attributed grammar, 2, 40, 56
attributed grammar parsing, 39
attributed strings, 2, 39, 45, 51, 52, 122
attributed tree, 121
bottom-up hierarchical algorithm, 106
bottom-up replacement function(s), 78, 84
branch and bound search, 107
breadth-first tree expansion, 84
bright spot, 75, 95
bright spot pattern, 80, 90
bright spot seismic pattern, 5
candidate bright spot, 65, 69
canonical definite finite-state grammar, 16
common-source, 103
common-source (one-shot) seismogram, 110
common-source seismogram, 108
context-free grammar, 2, 7, 9, 14
context-free language, 7
context-sensitive grammar, 9
dendrogram, 64
detection problem, 67
detection theory, 68
deterministic finite-state automaton, 10, 13, 37, 110
diffraction curves, 103
diffraction patterns, 108
direct wave, 103
direct-wave patterns, 108
dynamic programming, 4, 31, 51, 52
Earley's parsing, 2, 3, 7, 14, 58, 122
eight-neighbor connectivity, 84
encoding method, 107
error correcting parsing, 109
error probability, 65, 68
error transformations, 19
error-correcting finite-state automaton, 22, 31, 107
error-correcting finite-state parsing, 31, 122
error-correcting tree automaton, 4, 75, 76
expanded tree grammar, 89, 93
expansive tree grammar, 77
finite-state, 109
finite-state automata, 2
finite-state automaton, 3, 7, 10, 12, 31
finite-state grammar, 2, 7, 9, 10, 12, 26, 31
finite-state language, 7
flat, 95
flat spot, 5, 75
formal language, 7, 8
four-neighbor, 84
Freeman's chain codes, 82
function approximation, 107
fuzzy logic, 102, 122
fuzzy tree automaton, 122
general expanded finite-state grammar, 26
generalized error-correcting tree automaton (GECTA), 5, 94
geophysical pattern recognition, 122
global detection, 65, 72
gradual sealevel fall, 5, 75, 95
gradual sealevel rise patterns, 5, 75
gradual sealevel rise seismic patterns, 95
grammatical inference, 2, 16, 103, 107, 109
hierarchical, 60
hierarchical clustering, 122
hierarchical syntactic pattern recognition, 103, 105
hierarchical syntactic pattern recognition system, 121
hierarchical system, 5
High Island, 66, 73
Hough transformation, 5, 103, 105, 115
Hough transformation system, 121
inference algorithm of attributed grammar, 44
inference of an attributed context-free grammar, 44
inference of attributed grammar, 55
inference of canonical finite-state grammar, 16
inference of expanded finite-state grammar, 17
inference of expansive tree grammar, 85
inference of finite-state grammar, 17, 25
inherited, 40
inherited attribute, 42
Levenshtein distance, 3, 19, 65, 68
likelihood ratio test (LRT), 65, 67
linking process, 107
local detection, 66
match primitive measure (MPM), 4, 39, 40, 51, 52
maximum-likelihood SPECTA, 92
maximum-matching, 4
minimum distance GECTA, 5, 77, 94, 98
minimum-cost error-correcting finite-state automaton, 21
minimum-cost error-correcting finite-state parsing, 32
minimum-distance error-correcting Earley's parsing, 45
minimum-distance error-correcting parser (MDECP), 20
Mississippi Canyon, 60, 66, 69
modified error-correcting Earley's parsing, 39
modified Freeman's chain code, 25, 34, 58
modified maximum likelihood SPECTA, 5, 77, 93, 98
modified minimum distance error-correcting Earley's parsing, 39
MPM parsing, 58
MPM parsing algorithm, 4
nearest neighbor classification rule, 122
nearest-neighbor classification, 65
neural network, 122
nondeterministic finite-state automaton, 12, 13, 37, 110
one-shot seismogram, 3, 103
optimal quantization, 65
optimal quantization encoding, 66
parallel parsing, 122
parallel processing, 84
pattern-growing, 107
PFS-value method, 106
pinch-out, 5, 75, 95
primitive recognition, 107
primitives, 2
recognition system, 105
reflected wave pattern(s), 103, 108
refracted wave pattern(s), 103, 108
restricted expanded finite-state grammar, 28, 29
Ricker wavelets, 3, 21, 32, 58
seismic interpretations, 122
semantic deletion error, 42
semantic information, 121
semantic insertion error, 41
semantic rules, 40
semantic substitution error, 42
similarity measure, 40, 122
simplest recognizer, 10, 110
stacked seismogram, 3, 103, 108, 117
stochastic expansive tree grammar, 92
syntactic and semantic, 121
syntactic deletion error, 42
syntactic insertion error, 41
syntactic parsing analysis, 2
syntactic pattern, 105
syntactic pattern recognition, 121
syntactic pattern recognition system, 2
syntactic substitution error, 42
syntax analysis, 103, 107
synthesized attribute(s), 40, 42
terminals, 2
time-varying pattern recognition, 122
top-down parsing using MPM, 56
training pattern, 95
traveltime curves, 103
tree automata, 3, 75
tree automaton, 84, 122
tree classification, 65, 69
tree grammar(s), 2, 75, 76
tree grammar inference, 75
tree language, 78
tree of bright spot, 95
tree of flat spot training pattern, 95
tree representation(s), 75, 82, 84
unrestricted grammar, 9
weighted distance, 20
weighted error transformations, 19
weighted minimum distance structure preserved error-correcting tree automaton (SPECTA), 5, 77, 88, 89, 96
zero-phase Ricker wavelets, 22
The use of pattern recognition has become increasingly important in seismic oil exploration. Interpreting a large volume of seismic data is a challenging problem. Seismic reflection data in the one-shot seismogram and in the stacked seismogram may contain structural information from the response of the subsurface. Syntactic/structural pattern recognition techniques can recognize such structural seismic patterns and improve seismic interpretation. The syntactic analysis methods include: (1) error-correcting finite-state parsing, (2) modified error-correcting Earley's parsing, (3) parsing using the match primitive measure, (4) the Levenshtein distance computation, (5) the likelihood ratio test, (6) error-correcting tree automata, and (7) a hierarchical system. Syntactic seismic pattern recognition can be one of the milestones of a geophysical intelligent interpretation system. The syntactic methods in this book can also be applied to other areas, such as medical diagnosis systems. The book will benefit geophysicists, computer scientists and electrical engineers.
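To give a flavor of method (4): the Levenshtein distance [75] between two primitive strings is the minimum number of substitution, insertion and deletion errors needed to transform one string into the other, and it can be computed by the dynamic-programming scheme of Wagner and Fischer [109]. The sketch below is a minimal illustration in Python of that standard computation, not code from this book; the example strings of waveform primitives are hypothetical.

    def levenshtein(x, y):
        """Minimum number of substitutions, insertions and deletions
        transforming string x into string y (dynamic programming)."""
        m, n = len(x), len(y)
        # d[i][j] = edit distance between the prefixes x[:i] and y[:j]
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i          # delete all i symbols of x[:i]
        for j in range(n + 1):
            d[0][j] = j          # insert all j symbols of y[:j]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if x[i - 1] == y[j - 1] else 1   # substitution error
                d[i][j] = min(d[i - 1][j] + 1,            # deletion error
                              d[i][j - 1] + 1,            # insertion error
                              d[i - 1][j - 1] + cost)
        return d[m][n]

    # Hypothetical strings of waveform primitives (e.g., chain-code symbols):
    # a distorted string can be classified by its distance to class prototypes.
    print(levenshtein("aabbccba", "aabccba"))   # -> 1 (one deletion)

In a syntactic classifier, each pattern class is represented by one or more prototype strings, and an input string is assigned to the class of the nearest prototype under this distance.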