Fundamentals of Convolutional Coding ROLF JOHANNESSON KAMIL Sh. ZIGANGIROV
John B. Anderson, Series Editor
IEEE Press 445 Hoes Lane, P.O. Box 1331 Piscataway, NJ 08855-1331
IEEE Press Editorial Board Roger F. Hoyt, Editor in Chief
A. H. Haddad R. Herrick S. Kartalopoulos D. Kirk P. Laplante
J. B. Anderson P. M. Anderson M. Eden M. E. El-Hawary S. Furui
M. Padgett W. D. Reeve G. Zobrist
Kenneth Moore, Director of IEEE Press John Griffin, Senior Acquisition Editor Marilyn G. Catis, Assistant Editor IEEE Communications Society, Sponsor COMM-S Liaison to IEEE Press, Salah Aidarous Information Theory Society, Sponsor IT-S Liaison to IEEE Press, Sanjeev Kulkarni Vehicular Technology Society, Sponsor VT-S Liaison to IEEE Press, J. R. Cruz
Cover photo: After Gottfried Ungerboeck, Courtesy of IBM, Zurich Laboratory, Switzerland Cover design: William T. Donnelly, WT Design
Technical Reviewers Daniel J. Costello, Jr., University of Notre Dame John Proakis, Northeastern University
Books of Related Interest from the IEEE Press ... DIGITAL TRANSMISSION ENGINEERING
John B. Anderson A volume in the Digital & Mobile Communication Series An IEEE Press book published in cooperation with Prentice Hall 1999 400 pp Hardcover IEEE Order No. PC5714
ISBN 0-7803-3457-4
TRELLIS CODING Christian Schlegel 1997 Hardcover
ISBN 0-7803-1052-7
304 pp
IEEE Order No. PC4609
AN INTRODUCTION TO STATISTICAL COMMUNICATION THEORY.- An IEEE Classic Reissue
David Middleton 1996 Hardcover
IEEE Order No. PC5648
ISBN 0-7803-1178-7
REED-SOLOMON CODES AND THEIR APPLICATIONS Stephen B. Wicker and Vijay K. Bhargava 1994 Hardcover 336 pp IEEE Order No. PC3749
ISBN 0-7803-1025-X
1,184 pp
FUNDAMENTALS OF CONVOLUTIONAL CODING Rolf Johannesson Lund University, Sweden Kamil Sh. Zigangirov Lund University, Sweden
IEEE Communications Society, Sponsor IEEE Information Theory Society, Sponsor IEEE Vehicular Technology Society, Sponsor
IEEE PRESS
John B. Anderson, Series Editor
The Institute of Electrical and Electronics Engineers, Inc., New York
This book and other books may be purchased at a discount from the publisher when ordered in bulk quantities. Contact: IEEE Press Marketing Attn: Special Sales Piscataway, NJ 08855-1331 Fax: (732) 981-9334 For more information about IEEE PRESS products, visit the IEEE Home Page: http://www.ieee.org/
© 1999 by The Institute of Electrical and Electronics Engineers, Inc. 3 Park Avenue, 17th floor, New York, NY 10016-5997. All rights reserved. No part of this book may be reproduced in any form, nor may it be stored in a retrieval system or transmitted in any form, without written permission from the publisher. Printed in the United States of America 10
9
8
7
6
5
4
3
2
1
ISBN 0-7803-3483-3 IEEE Order Number: PC5739
Library of Congress Cataloging-in-Publication Data Johannesson, Rolf, 1946Fundamentals of convolutional coding / Rolf Johannesson, Kamil Sh. Zigangirov. p. cm. -- (IEEE Press series on digital and mobile communication) "IEEE Communications Society, sponsor. IEEE Information Theory Society, sponsor. IEEE Vehicular Technology Society, sponsor." Includes bibliographical references and index. ISBN 0-7803-3483-3
1. Coding theory. 2. Convolutions (Mathematics). 3. ErrorI. Zigangirov, K. Sh. correcting codes (Information theory) II. IEEE Communications Society. III. IEEE Information Theory Society. IV. Vehicular Technology Society. V. Title. VI. Series. TK5102.92.J64 1998 003' .54--dc21
98-36706
CIP
To
Regina, Katrin, Peter, and Hanna and Ira, Dima, and Valja
Contents
PREFACE
xi
ACKNOWLEDGMENTS xiii CHAPTER 1
INTRODUCTION
1
1.1 Why Error Control? 1
1.2 Block Codes-A Primer 6 1.3 A First Encounter with Convolutional Codes 16 1.4 Block Codes versus Convolutional Codes 21 1.5 Capacity Limits and Potential Coding Gain Revisited 23 1.6 Comments 25 Problems 26
CHAPTER 2 CONVOLUTIONAL ENCODERSSTRUCTURAL PROPERTIES 31 2.1 Convolutional Codes and Their Encoders 31 2.2 The Smith Form of Polynomial Convolutional Generator Matrices 38 2.3 Encoder Inverses 45 2.4 Equivalent and Basic Encoding Matrices 52 2.5 Minimal-Basic Encoding Matrices 55 2.6 Minimal Encoding Matrices and Minimal Encoders 61 2.7 Canonical Encoding Matrices* 73 2.8 Minimality via the Invariant-Factor Theorem* 87 vii
viii
Contents
2.9 Syndrome Formers and Dual Encoders 91
2.10 Systematic Convolutional Encoders 96 2.11 Comments 103 Problems 103
CHAPTER 3 DISTANCE PROPERTIES OF CONVOLUTIONAL CODES 109 3.1
Distance Measures-A First Encounter 109
3.2 Active Distances 117 3.3 Properties of Convolutional Codes
via the Active Distances 123 3.4 Lower Bound on the Distance Profile 128 3.5 Upper Bounds on the Free Distance 132 3.6 Time-Varying Convolutional Codes 136 3.7 Lower Bound on the Free Distance 139 3.8 Lower Bounds on the Active Distances* 143 3.9 Distances of Cascaded Concatenated Codes* 149 3.10 Path Enumerators 153 3.11 Comments 158 Problems 159
CHAPTER 4 VITERBI DECODING 163 The Viterbi Algorithm Revisited 163 4.2 Error Bounds for Time-Invariant Convolutional Codes 168 4.3 Tighter Error Bounds for Time-Invariant Convolutional Codes 181 4.4 Upper Bounds on the Output Error Burst Lengths 186 4.5 Error Bounds for Periodically Time-Varying Convolutional Codes 195 4.6 Lower Error Bounds for Convolutional Codes 203 4.7 Error Bounds for Time-Varying Convolutional Codes 211 4.8 Error Bounds for Finite Back-Search Limits 220 4.9 Tailbiting Trellises 223 4.10 Quantization of Channel Outputs 230 4.11 Comments 233 Problems 234 4.1
CHAPTER 5 LIST DECODING 239 5.1
List Decoding Algorithms 239
5.2 List Decoding-Performance 242 5.3 The List Minimum Weight 247
5.4 Upper Bounds on the Probability of Correct Path Loss 255
Contents
Ix
5.5
Lower Bound on the Probability of Correct Path Loss 261
5.6 Correct Path Loss for Time-Invariant
Convolutional Codes 264
5.7 Comments 266 Problems 267
CHAPTER 6 SEQUENTIAL DECODING 269 The Fano Metric 269 6.2 The Stack Algorithm 274 6.3 The Fano Algorithm 276 6.4 The Creeper Algorithm* 278 6.5 Simulations 288 6.1
Computational Analysis of the Stack Algorithm 289 Error Probability Analysis of the Stack Algorithm 296 Analysis of the Fano Algorithm 305 Analysis of Creeper* 310 6.10 Comments 313 Problems 313 6.6 6.7 6.8 6.9
CHAPTER 7 ITERATIVE DECODING 317 Iterative Decoding-A Primer 317 7.2 The Two-Way Algorithm for APP Decoding 321 7.3 The Two-Way Algorithm for Tailbiting Trellises 330 7.4 The One-Way Algorithm for APP Decoding 334 7.5 Low-Density Parity-Check Convolutional Codes 337 7.1
7.6 Comments 344 Problems 344
CHAPTER 8 CONVOLUTIONAL CODES
WITH GOOD DISTANCE PROPERTIES 347 Computing the Distance Spectrum 347 8.2 Some Classes of Rate R = 1/2 Convolutional Codes 351 8.3 Low-Rate Convolutional Codes 357 8.4 High-Rate Convolutional Codes 360 8.5 Tailbiting Trellis Encoders 362 8.1
8.6 Comments 370
CHAPTER 9 MODULATION CODES 371 Bandlimited Channels and QAM 373 9.2 Coding Fundamentals 378 9.3 Lattice-Type Trellis Codes 384 9.1
Contents
x
9.4 Geometrically Uniform Trellis Codes 384 9.5 Decoding of Modulation Codes 387 9.6 Comments 388 Problems 389
APPENDIX A MINIMAL ENCODERS 393
APPENDIX B WALD'S IDENTITY 399
BIBLIOGRAPHY 407
INDEX 419 ABOUT THE AUTHORS 427
Preface
The material presented in this book has been maturing in our minds for more than 10 years and reflect 20 years of research activities on convolutional codes by our graduate students and us. The material presented herein will be of interest to both industry researchers as well as designers. The book was initially written for use as a textbook in graduate level course study. There has been a great exchange of ideas and theories between us through our own re-
search and that of our graduate students. For example, Harro Osthoff wrote a thesis on list decoding of convolutional codes. As we found that problem both interesting and fun, more work went into extending the theoretical aspects and the result became Chapter 5. Johan Nystrom's thesis was devoted to a new algorithm for sequential decoding. Both of us have a long-standing weakness for sequential decoding and thus it constitutes a substantial part of Chapter 6.
We also believed that the appearance of some of David Forney's important structural results on convolutional encoders in a textbook was long overdue. For us, that generated new research problems. This interplay between research and teaching was a delightful experience. This book is the final product of that experience. Chapter 1 is an introduction to the area of convolutional codes. Shannon's coding limits are discussed and we introduce some terminology. In Chapter 2, we define convolutional codes and convolutional encoders. Various concepts of minimality are discussed in depth. Chapter 3 is devoted to distances of convolutional codes. Upper and lower bounds are derived. The following four chapters, Chapters 4-7 describe and analyze different decoding methods, viz., Viterbi (maximum-likelihood), list, sequential, and iterative decoding, respectively. In Chapter 8, we provide lists of good convolutional encoders of various types, rates, and memories. Chapter 9 contains some results on modulation codes. In Appendix A we demonstrate how to minimize two examples of convolutional encoders and in Appendix B we present Wald's identity and related results that are necessary for our analyses in Chapters 3-6. For simplicity's sake, we have only considered binary convolutional codes. In most of
our derivations of the various bounds we have only considered the binary symmetric
xi
xii
Preface
channel (BSC). Although inferior from a practical communications point of view, we believe that its pedagogical advantages outweigh that disadvantage. Each chapter ends with some comments, mainly historical in nature. The problems are highly recommended and most have been used as exam questions in our classrooms. Note that sections marked with asterisk (*), can be skipped at the first reading without loss of continuity. There are various ways to organize the material into an introductory course on convolutional coding. Chapter 1 should always be read first. Then one possibility is to cover the following sections, skipping most of the proofs found there: 2.1-2.6, 2.9, 2.10, 3.1, 3.5, 3.10, 4.1,
4.2, 4.9, 4.10, 5.1, 5.2, 6.1-6.3, 6.5, 7.1-7.3, and perhaps also, 8.1. With our own younger students, we emphasize explanations, discussion of algorithms, and assign a good number of the problems at the end of the chapters. With more experienced students we stress proving the
theorems, because a good understanding of the proofs is an asset in advanced engineering work. Finally, we do hope that some of our passion for convolutional codes has worked its way into these pages.
Rolf Johannesson Kamil Sh. Zigangirov
Acknowledgments
For the past ten years, the material presented in this book has been in development. Needless to say, several persons have contributed in various ways. Above all, we are indebted to Lena Mansson who with great enthusiasm and outstanding skill typed, retyped, proofread, retyped, suggested improvements, retyped, retyped, and again retyped the manuscript in what often seemed to be an endless sequence of retypings-it is hard to believe that it is over now. Without Lena this project would certainly have been aborted just short of ten years ago. We would not have reached our view of convolutional codes without our work with former and present graduate students: Mats Cedervall, Ben Smeets, Harro Osthoff, Johan Nystrom, Gunilla Bratt, Per Ljungberg, Joakim Persson, Alberto Jimdnez, Leif Wilhelmsson, Emma Wittenmark, Kristian Wahlgren, Stefan Host, Karin Engdahl, Per Stahl, OlaWintzell, and Michael Lentmaier, who have all contributed in many ways. They produced research results, did simulations, provided us with various figures, suggested improvements, saved us from some blunders, etcetera, etcetera. Mats Cedervall, Bertil Lindvall, and Ben Smeets did an outstanding job as TeX-gurus.
In our research we have benefited considerably from our cooperation with coauthors John Anderson, Dave Forney, Zhe-Xian Wan, and Victor Zyablov.
Special thanks go to our Editor in Chief, John Anderson, who made us sign a contract with IEEE Press a year ago and thereby-at the cost of an annus horribilis-forced us to finish this book. Many thanks to Marilyn Catis, John Griffin, and SavoulaAmanatidis at the IEEE Press and Copyeditor Betty Pessagno for their support and encouragement.
Rolf would also like to thank Goran Einarsson who almost 30 years ago not only suggested convolutional codes as Rolf's thesis topic, but also recommended that he spend a year of his graduate studies with Jim Massey at the University of Notre Dame. This year was the beginning of a lifelong devotion to convolutional codes and a true friendship-the influence and importance of this cannot be overestimated. Finally, Rolf would also like to acknowledge Dave Forney's outstanding contribu-
tions to the field of convolutional codes; without his work convolutional codes would have been much less exciting.
Kamil would also like to thank his colleagues at the Institute for Problems of Information Transmission in Moscow. In particular, he would like to mention the "daytime" seminars organized
xiii
xiv
Acknowledgments
by the late Roland Dobrushin and by Mark Pinsker and the "evening" seminars organized by Leonid Bassalygo. During these seminars, Kamil had the opportunity to use the participants as guinea pigs when he wanted to test many of the fundamental ideas presented in this book. Special thanks go to Mark Pinsker and Victor Zyablov. Mark introduced Kamil to convolutional codes, and he inspired Kamil for more than 30 years. Over the years, Kamil has also benefited from numerous discussions with Victor. Rolf fohannesson Kamil Sh. Zigangirov
I Introduction
1.1 WHY ERROR CONTROL? The fundamental idea of information theory is that all communication is essentially digital-it is equivalent to generating, transmitting, and receiving randomly chosen binary digits, bits.
When these bits are transmitted over a communication channel-or stored in a memory-it is likely that some of them will be corrupted by noise. Claude E. Shannon showed in his 1948 landmark paper, "A Mathematical Theory of Communication" [Sha48], that the problem of communicating information from a source over a channel to a destination can always be separated-without sacrificing optimality-into the following two subproblems: representing the source output efficiently as a sequence of binary digits (source coding) and transmitting binary, random, independent digits over the channel (channel coding). In Fig. 1.1 we show a general digital communication system. We use Shannon's separation principle and split the encoder and decoder into two parts each as shown in Fig. 1.2. The channel coding parts can be designed independently of the source coding parts, which simplifies the use of the same communication channel for different sources. To a computer specialist, "bit" and "binary digit" are entirely synonymous. In information theory, however, "bit" is Shannon's unit of information [Sha48] [Mas82]. For Shannon, information is what we receive when uncertainty is reduced. We get exactly 1 bit of information from a binary digit when it is drawn in an experiment in which successive outcomes are independent of each other and both possible values, 0 and 1, are equiprobable; otherwise, the information is less than 1. In the sequel, the intended meaning of "bit" should be clear from the context.
Shannon's celebrated channel coding theorem states that every communication channel is characterized by a single parameter Ct, the channel capacity, such that Rt randomly chosen bits per second can be transmitted arbitrarily reliably over the channel if and only if Rt < C.
We call Rt the data transmission rate. Both Ct and Rt are measured in bits per second. Shannon showed that the specific value of the signal-to-noise ratio is not significant as long as it is large enough, that is, so large that Rt < Ct holds; what matters is how the information bits are encoded. The information should not be transmitted one information bit at a time, but long information sequences should be encoded such that each information bit has some influence on many of the bits transmitted over the channel. This radically new idea gave birth to the subject of coding theory. 1
2
Chapter 1
Introduction
Noise
I Source
Digital
Encoder
Decoder
channel
Destination
Figure 1.1 Overview of a digital communication system.
Source encoder
Source
Binary
Channel
digits
encoder
Noise -
Binary
Source
Destination
decoder
digits
Digital
channel
Channel decoder
Figure 1.2 A digital communication system with separate source and channel coding.
Error control coding should protect digital data against errors that occur during transmission over a noisy communication channel or during storage in an unreliable memory. The last decade has been characterized not only by an exceptional increase in data transmission and storage but also by a rapid development in microelectronics, providing us with both a need for and the possibility of implementing sophisticated algorithms for error control. Before we study the advantages of coding, we will consider the digital communication channel in more detail. At a fundamental level, a channel is often an analog channel that transfers waveforms (Fig. 1.3). Digital data uo, u 1, u2, ... , where ui E {0, 11, must be modulated into waveforms to be sent over the channel. Noise
i Waveform
Modulator
Demodulator
channel
Analog waveform
Analog waveform
- - - - - - - - - - - - - - - - Figure 1.3 A decomposition of a digital communication channel.
In communication systems where carrier phase tracking is possible (coherent demodulation), phase-shift keying (PSK) is often used. Although many other modulation systems are in use, PSK systems are very common and we will use one of them to illustrate how modulations generally behave. In binary PSK (BPSK), the modulator generates the waveform EE
st(t) = 0,
coswt,
0
(1.1)
otherwise
for the input 1 and so(t) = -sl (t) for the input 0. This is an example of antipodal signaling. Each symbol has duration T seconds and energy Es = ST, where S is the power and w = T
Why Error Control?
Section 1.1
3
The transmitted waveform is 00
V(t)=Y, s,ti;(t-iT)
(1.2)
i_o
Assume that we have a waveform channel such that additive white Gaussian noise (AWGN) n(t) with zero mean and two-sided power spectral density N0/2 is added to the transmitted waveform v(t), that is, the received waveform r(t) is given by
r(t) = v(t) + n(t)
(1.3)
E[n(t)] = 0
(1.4)
E[n(t + r)n(t)] = Zo6(r)
(1.5)
where
and
Based on the received waveform during a signaling interval, the demodulator produces an estimate of the transmitted symbol. The optimum receiver is a matched filter with impulse response
h(t) =
1T cos cot
0
0,
else
(1.6)
which is sampled each T seconds (Fig. 1.4). The matched filter output Z, at the sample time
iT, iTT
r(r)h(iT - r) dr
=
(1.7)
(i-1)T
is a Gaussian random variable N(µ, a) with mean coscor)
(cosw(T
T
- r))
dr = f Es
(1.8)
where the sign is + or - according as the modulator input was 1 or 0, respectively, and variance /T
U2 = Zo
J0
2
T cos mr
dr = No/2
(1.9)
After the sampler we can make a hard decision, that is, a binary quantization with threshold zero, of the random variable Zi. Then we obtain the simplest and most important binary-input and binary-output channel model, viz., the binary symmetric channel (BSC) with crossover probability c (Fig. 1.5). The crossover probability is closely related to the signalto-noise ratio E,s/No. Since the channel output for a given signaling interval depends only on the transmitted waveform during that interval and not on other transmissions, the channel is said to be memoryless.
Because of symmetry, we can without loss of generality assume that a 0, that is, -El T cos cot, is transmitted over the channel. Then we have a channel "error" if and T(t)
Figure 1.4 Matched filter receiver.
T cos wt
GI
iT, i = 1, 2,
Zi
Chapter 1
4
Introduction
1-E
1-c
Figure 1.5 Binary symmetric channel.
only if the matched filter output at the sample time i T is positive. Thus,
E=P(Zi>0I0sent) where Zi E N(- E,, No/2). Since 1
.fz; (z) =
e
we have 0°
E = 7r No
f
(i+
e
s)2
"o
dz (1.12)
2Jr JI2Es/No
e-yZ/2dy
= Q ( 2E,./No)
where
1 fe
Q(x) =
_ yZ/2dy
(1.13)
27r
is the complementary error function of Gaussian statistics (often called the Q-function). When coding is used, we prefer measuring the received energy per information bit, Eb, rather than per symbol. For uncoded BPSK, we have Eb = E,. Letting Pb denote the bit error probability (or bit error rate), that is, the probability that an information bit is erroneously delivered to the destination, we have for uncoded BPSK
Pb = Q
2Eb/No)
(1.14)
How much better can we do with coding? When we use coding, it is a waste of information to make hard decisions. Since the influence of each information bit will be spread over several channel symbols, the decoder can benefit from using the value of Z; (hard decisions use only the sign of Z;) as an indication of how reliable the received symbol is. The demodulator can give the analog value of Z, as its output, but it is often more practical to use, for example, a three-bit quantization-a soft decision. By introducing seven thresholds, the values of Z, are divided into eight intervals and we obtain an eight-level soft-quantized discrete memoryless channel (DMC) as shown in Fig. 1.6. Shannon [Sha48] showed that the capacity of the Gaussian memoryless channel with two-sided noise spectral density No/2 and without bandwidth limitation isl
Cr ° = lim W log I+ W
S
Noln2
--
(1.15)
bits/s
where W denotes the bandwidth and S is the signaling power. If we transmit K information 'Here and hereafter we write log for loge.
Section 1.1
Why Error Control?
5
Figure 1.6 Binary input, 8-ary output, DMC.
bits during r seconds, where r is a multiple of T, we have Eb
_ Sr K
(1.16)
Since the data transmission rate is Rt = K/r bits/s, the energy per bit can be written S Eb _ Rt
(1.17)
Combining (1.15) and (1.17) gives C`° Rt
_
Eb No In 2
(1.18)
From Shannon's celebrated channel coding theorem [Sha48] it follows that for reliable communication, we must have Rt < C°°. Hence, from this inequality and (1.18) we have Eb
No
>In 2=0.69=-1.6dB
(1.19)
which is the fundamental Shannon limit. In any system that provides reliable communication in the presence of Gaussian noise the signal-to-noise ratio Eb/No cannot be less than the Shannon limit, -1.6 dB!
On the other hand, as long as Eb/No exceeds the Shannon limit, -1.6 dB, Shannon's channel coding theorem guarantees the existence of a system-perhaps very complex-for reliable communication over the channel. In Fig. 1.7, we have plotted the fundamental limit of (1.19) together with the bit error rate for uncoded BPSK, that is, equation (1.14). At a bit error rate of 10-5, the infinite-bandwidth additive white Gaussian noise channel requires an Eb/No of at least 9.6 dB. Thus, at this bit error rate we have a potential coding gain of 11.2 dB! If we restrict ourselves to hard decisions, it can be shown (Problem 1.2) that for reliable communication we must have
No > 2ln2=1.09=0.4dB
(1.20)
In terms of capacity, soft decisions are about 2 dB more efficient than hard decisions. Although it is practically impossible to obtain the entire theoretically promised 11.2 dB coding gain, communication systems that pick up 2 to 8 dB are routinely in use. We conclude this section, which should have provided some motivation for the use of coding, with an adage from R. E. Blahut [B1a92]: "To build a communication channel as good as we can is a waste of money"-use coding instead!
6
Chapter 1
Introduction
10-1
10-2
Shannon limit soft decisions
10-3
(-1.6 dB) 1b
Hard decisions
104
(0.4 dB)
11.2 dB 10-5
2.0 dB
10-6
-2
0
2
4
6
8
10
12
Eb/No(dB) Figure 1.7 Capacity limits and regions of potential coding gain.
1.2 BLOCK CODES-A PRIMER For simplicity, we will deal only with binary block codes. We consider the entire sequence of information bits to be divided into blocks of K bits each. These blocks are called messages and denoted u = uoul ... UK -I. In block coding, we let u denote a message rather than the entire information sequence as is the case in convolutional coding to be considered later.
A binary (N, K) block code B is a set of M = 2K binary N-tuples (or row vectors of length N) v = v0v1 ... vN_1 called codewords. N is called the block length and the quantity
R=
10N = K/N
(1.21)
is called the code rate and is measured in bits per (channel) use. The data transmission rate in bits/s is obtained by multiplying the code rate (1.21) by the number of transmitted channel symbols per second:
Rt = R/T
(1.22)
EXAMPLE 1.1 The set 13 = {000, 011, 101, 1101 is a (3, 2) code with four codewords and rate R = 2/3.
An encoder for an (N, K) block code B is a one-to-one mapping from the set of M = 2K binary messages to the set of codewords B.
Section 1.2
Block Codes-A Primer
7
EXAMPLE 1.2 uOul
VOVJV2
"Out
V V1 V2
00
000
00
101
01
011
01
011
10
101
10
101
11
110
11
000
and
are two different encoders for the code 13 given in the previous example.
The rate R = K/N is the fraction of the digits in the codeword that are necessary to represent the information; the remaining fraction, 1 - R = (N - K)/N, represents the redundancy that can be used to detect or correct errors. Suppose that a codeword v corresponding to message u is sent over a BSC (see Fig. 1.8). The channel output r = r0r1 ... rN_1 is called the received sequence. The decoder transforms
the received N-tuple r, which is possibly corrupted by noise into the K-tuple u, called the estimated message. Ideally, u will be a replica of the message u, but the noise may cause some decoding errors. Since there is a one-to-one correspondence between the message u and the codeword v, we can, equivalently, consider the corresponding N-tuple v as the decoder output. If the codeword v was transmitted, a decoding error occurs if and only if v v. (Channel) Encoder
Figure 1.8 A binary symmetric channel (BSC)
Message
BSC
Codeword
with (channel) encoder and decoder.
r
(Channel) Decoder
Received sequence
u or
Estimated message
Let PE denote the block (or word) error probability, that is, the probability that the estimated codeword v differs from the actual codeword v. Then we have
PE =EP(vovIr)P(r)
(1.23)
r
where the probability that we receive r, P (r), is independent of the decoding rule. Hence, in order to minimize PE, we should specify the decoder such that P (v v I r) is minimized for aet all r or, equivalently, such that P (v I r) P (v = v I r) is maximized for all r. Thus the block error probability PE is minimized by the decoder, which as its output chooses u such that the corresponding v maximizes P(v I r). That is, v is chosen as the most likely codeword given that r is received. This decoder is called a maximum a posteriori probability (MAP) decoder. Using Bayes' rule, we can write
P(v I r) = P(r I V)P(v)
(1.24) P (r) The code carries the most information possible with a given number of codewords when the codewords are equally likely. It is reasonable to assume that a decoder that is designed for this
case also works satisfactorily-although not optimally-when the codewords are not equally likely, that is, when less information is transmitted. When the codewords are equally likely, maximizing P(v r) is equivalent to maximizing P(r v). The decoder that chooses its estimate i to maximize P (r I v) is called a maximum-likelihood (ML) decoder. Notice that in an erroneously estimated codeword some of the information digits may nevertheless be correct. The bit error probability, which we introduced in the previous section, I
I
Chapter 1
8
Introduction
is a better measure of quality in most applications. However, it is in general more difficult to calculate. The bit error probability depends not only on the code and on the channel, like the block error probability, but also on the encoder and on the information symbols. The use of block error probability as a measure of quality is justified by the inequality
Pb
(1.25)
When PE can be made very small, inequality (1.25) implies that Pb can also be made very small.
The Hamming distance between the two N-tuples r and v, dH(r, v), is the number of positions in which their components differ. EXAMPLE 1.3 Consider the five-tuples 10011 and 11000. The Hamming distance between them is 3.
The Hamming distance, which is an important concept in coding theory, is a metric; that is,
(i) dH(x, y) > 0, with equality if and only if x = y (positive definiteness) (ii) dH(x, y) = dH(y, x) (symmetry) (iii) dH(x, y) < dH(x, z) + dH(z, y), all z (triangle inequality)
The Hamming weight of an N-tuple x = x0x1 ... xN_1, denoted WH(x), is defined as the number of nonzero components in x. For the BSC, a transmitted symbol is erroneously received with probability E where E is the channel crossover probability. Thus, assuming maximum-likelihood (ML) decoding, we must choose our estimate v of the codeword v to maximize P(r I v); that is,
v = argmax{P(r I
(1.26)
v)}
U
where P(r I v) = EdH(r,v)(1 - )N-dH(r,v)
E)N
E
1-E
dH(r,v)
(1.27)
Since 0 < E < 1/2 for the BSC, we have
0<
E
1-E
<1
(1.28)
and, hence, maximizing P (r I v) is equivalent to minimizing dH (r, v). Clearly, maximum-likelihood (ML) decoding is equivalent to minimum (Hamming) distance (MD) decoding, that is,
choosing as the decoder output the message u whose corresponding codeword v is (one of) the closest codeword(s) to the received sequence r.
In general, the decoder must compare r with all M = 2K = 2RN codewords. The complexity of ML or MD decoding grows exponentially with the block length N. In order to develop the theory further, we must introduce an algebraic structure. A field is an algebraic system in which we can perform addition, subtraction, multiplication, and division (by nonzero numbers) according to the same associative, commutative, and distributive laws we use with real numbers. Furthermore, a field is called finite if the set of numbers is finite. Here we will limit the discussion to block codes whose codewords have components in the simplest, but from a practical point of view also the most important, finite field, viz., the binary field, ]F2, for which the rules for addition and multiplication are those of
Section 1.2
Block Codes-A Primer
9
modulo-two arithmetic, namely 0
1
0
0
1
1
1
0
0
1
0
0
0
1
0
1
We notice that addition and subtraction coincide in F2! The set of binary N-tuples are the vectors in an N-dimensional vector space, denoted ]F2 , over the field F2. Vector addition is component-by-component addition in F2. The scalars are the elements in F2. Scalar multiplication is carried out according to the rule:
a(xo, xi, ... , xN-1) = (axo, ax1, ..., axN-1)
(1.29)
where a E IF2. Since a is either 0 or 1, scalar multiplication is trivial in FN . Hamming weight and distance are clearly related:
dH(x,y) = WH(X -y) = WH(X +y)
(1.30)
where the arithmetic is in the vector space FN and where the last equality follows from the fact that subtraction and addition coincide in F2The minimum distance, d,in, of a code B is defined as the minimum value of dH(v, v') over all v and v' in B such that v v'. EXAMPLE 1.4 The code 13 in Example 1.1 has d,,,;n = 2.
Let v be the actual codeword and r the possibly erroneously received version of it. The error pattern e = eoe1 ... eN_1 is the N-tuple that satisfies
r= v+e
(1.31)
wH(e) = dH(r, v)
(1.32)
The number of errors is
Let £, denote the set of all error patterns with t or fewer errors, that is,
£,= {eIwH(e)
(1.33)
We will say that a code B corrects the error pattern e if for all v the decoder maps r = v + e into v = v. Theorem 1.1 The code B can correct all error patterns in £t if and only if d,,,;n > 2t. Proof. Suppose that din > 2t. Consider the decoder which chooses v as (one of) the codeword(s) closest to r in Hamming distance (MD decoding). If r = v + e and e e £,, then dH(r, v) < t. The decoder output v must also satisfy dH(r, v) < t since v must be at least as close to r as v is. Thus,
dH(v, v) < dH(v, r) + dH(r, v) < 2t < drain
(1.34)
which implies that v = v and thus the decoding is correct. Conversely, suppose that dn,in < 2t. Let v and v' be two codewords such thatdH(v, v') = drain, and let the components of r be specified as
vi = v',
all i such that vi = v'
v/,
v' (if t < dn,in) or the first t positions with vi all positions with vi # v' (otherwise) the remaining dn,in - t positions (if any)
Vi,
(1.35)
Chapter 1
10
Introduction
Thus, dH(v, r) = t and dH(v', r) = dmin - t < t (if t < dmin, and dH(v, r) = drain and dH(v', r) = 0 otherwise). Next we observe that both error patterns e and e' satisfying
r=v+e=v'+e'
(1.36)
are in Er, but the decoder cannot make the correct decision for both situations, and the proof is complete.
To make codes easier to analyze and to simplify the implementation of their encoders and decoders, we impose a linear structure on the codes. A binary, linear code 13 of rate R = K/N is a K-dimensional subspace of the vector space FN; that is, each codeword can be written as a linear combination of linearly independent vectors gl , 92, ... , gK, where gj E F2 , called the basis for the linear code B. Then we call the K x N matrix G having g1, 92, ... , gK as rows a generator matrix for B. Since the vectors . gK are linearly independent, the matrix G has full rank. The row space of G is 13 91, 92, itself. EXAMPLE 1.5 For the code in Example 1.1, which is linear, the codewords 011 and 101 form a basis. This basis determines the generator matrix
G=
(0
0
(1.37)
1)
The generator matrix offers a linear encoding rule for the code 13:
v = uG
(1.38)
where G =
911
912
...
g1N
g21
g22
...
g2N
9Kl
gK2
(1.39)
9KN
and the information symbols u = uou1 ... uK_1 are encoded into the codeword v = vovl ... VN-1
A generator matrix is often called an encoding matrix and is any matrix whose rows are a basis for B. It is called systematic whenever the information digits appear unchanged in the first K components of the codeword; that is, G is systematic if and only if it can be written as
G = (IK P)
(1.40)
where IK is the K x K identity matrix. EXAMPLE 1.6 The generator matrix
G=( 1
0 1)
(1.41)
is a systematic encoding matrix for the code in Example 1.1.
Two codes 13 and 13' are said to be equivalent if the order of the digits in the codewords v E B are simply a rearrangement of that in the codewords v' c B'.
Theorem 1.2 A linear code B has either a systematic encoding matrix or there exists an equivalent linear code B' which has a systematic encoding matrix.
Section 1.2
Block Codes-A Primer Proof.
11
See Problem 1.5.
Let G be an encoding matrix of the (N, K) code B. Then G is a K x N matrix of rank K. By the theory of linear equations, the solutions of the system of linear homogeneous equations
GxT = 0
(1.42)
where x = (x1, x2 ... xN), form an (N - K)-dimensional subspace of FZ . Therefore, there exists an (N - K) x N matrix H of rank N - K such that
GHT = 0
(1.43)
We are now ready to prove a fundamental result.
Theorem 1.3 An N-tuple v is a codeword in the linear code 13 with encoding matrix G if and only if
vHT = 0
(1.44)
where H is an (N - K) x N matrix of full rank which satisfies
GHT = 0 Proof.
(1.45)
Assume that the N-tuple V E B, then v = uG for some u E FZ F. Thus,
vHT = uGHT = 0
(1.46)
Conversely, suppose that vHT = 0. Since GHT = 0 and both H and G have full rank, the rows of G form a basis of the solution space of xHT = 0. Therefore, v = uG for some u E ]FZ , that is, v E B.
El
From (1.45) it follows that each row vector of H is orthogonal to every codeword; the rows of H are parity checks on the codewords, and we call H a parity-check matrix of the linear code B. Equation (1.44) simply says that certain coordinates in each codeword must sum to zero.
It is easily verified that an (N, K) binary linear code with systematic encoding matrix
G = (IK P) has
H = (pT IN-K)
(1.47)
as a parity-check matrix. EXAMPLE 1.7 The code in Example 1.1 with an encoding matrix given in Example 1.6 has
H=( 1
1
(1.48)
1
as a parity-check matrix.
Next, we will consider a member of a much celebrated class of single-error-correcting codes due to Hamming [Ham50]. EXAMPLE 1.8 The binary (7,4) Hamming code with encoding matrix 0
0
0
0
1
1
0
1
0
0
1
0
1
0
0
1
0
1
1
0
0
0
0
1
1
1
1
1
G
(1.49)
Chapter 1
12
Introduction
has
H=
0
1
1
1
1
0
0
1
0
1
1
0
1
0
1
1
0
1
0
0
1
(1.50)
as a parity-check matrix. Note that vHT = 0 can be written as VI VO
V0
+ + +
V2
+ +
V1
+
V2
V3 V3 V3
+ V4 = 0 + +
v5 = 0 v6 = 0
(1.51)
so that each row in H determines one of the three parity-check symbols v4, v5, and v6. The remaining four code symbols, viz., V0, V1, v2, and v3, are the information symbols.
Since the N - K rows of the parity-check matrix H are linearly independent, we can use H as an encoding matrix of an (N, N - K) linear code X31, which we call the dual or orthogonal code of B. Let v1 E B' and assume that v1 = u1H, where u1 E FN-K. Then from (1.45) it follows that
v1GT = u1HGT = u1(GHT )T = 0
(1.52)
Conversely, assume that v1GT = 0 for v1 E JF2 . Since HGT = 0 and both H and G have full rank, v1 is a linear combination of the rows of H, that is, v1 E B'. Hence, G is a K x N parity-check matrix of the dual code 81 and we have proved Theorem 1.4 An (N - K) x N parity-check matrix for the linear code B is an (N - K) x N encoding matrix for the dual code X31, and conversely. From (1.44) and (1.52) it follows that every codeword of B is orthogonal to those of X31 and conversely. EXAMPLE 1.9 The code B in Example 1.1 has the dual code Bt = {000, 111 }.
If 5 = 51, we call B self dual. EXAMPLE 1.10 The (2, 1) repetition code 100, 111 is self dual.
The minimum weight, Wmin, of a linear code B is the smallest Hamming weight of its nonzero codewords.
Theorem 1.5 For a linear code, dmin = wmin
(1.53)
Proof. The theorem follows from (1.30) and from the fact that for a linear code the sum of two codewords is a codeword.
For the class of linear codes, the study of distance properties reduces to the study of weight properties that concern only single codewords! A most convenient consequence of this reduction is the following.
Theorem 1.6 If H is any parity-check matrix for a linear code B, then drain = Wmin equals the smallest number of columns of H that form a linearly dependent set.
Section 1.2
Block Codes-A Primer
Proof.
13
Follows immediately from vHT = 0 for v E 13.
EXAMPLE 1.11 Consider the (7,4) Hamming code with parity-check matrix
H
0
1
1
0
1
1
1
0
1
110 1
0
0
0
(1.54)
1
All pairs of columns are linearly independent. Many sets of three columns are linearly dependent, for example, columns 1, 6, and 7. It follows from the previous theorem that d,,,;,, = 3. All single errors (i.e., all error patterns of weight one) can be corrected.
Theorem 1.7 (Singleton bound) If 13 is an (N, K) linear code with minimum distance dmi,,, then the number of parity-check digits is lower-bounded by
N-K>dmin-1
(1.55)
Proof. A codeword with only one nonzero information digit has weight at most 1 + N - K. Then, from Theorem 1.5 follows
Wmin=dmin
(1.56)
0 An (N, K) linear code that meets the Singleton bound with equality is called a maximumdistance-separable (MDS) code.
The only binary MDS codes are the trivial ones, viz., the (N, N) code B = l±2 , the (N, 1) repetition code B = {00...0, 0,11 ... 1}, and the (N, N - 1) code consisting of all even-weight N-tuples. (For a nontrivial code, 2 < K < N - 1.) The most celebrated examples of nonbinary MDS codes are the Reed-Solomon codes. Let B be an (N, K) linear code. For any binary N-tuple a, the set
a+ B=def[a + v I V E 131
(1.57)
is called a coset of B. Every b E FN is in some coset; for example, b + B contains b. Two Z binary N-tuples a and b are in the same coset if and only if their difference is a codeword, or, equivalently, (a + b) E B. Every coset of B contains the same number of elements, viz., 2K, as 13 does.
Theorem 1.8 Any two cosets are either disjoint or identical. Suppose that c belongs to both a + B and b +13. Then c = a + v = b + v', where
Proof.
v,v'EB. Thus,a=b+v+v'Eb+B,andsoa+Bcb+13. Similarlyb+Bca+13. Hence,a+B=b+B. From Theorem 1.8 follow immediately
Corollary 1.9 ]F'2 is the union of all the cosets of B. and
Corollary 1.10 A binary (N, K) code B has 2N-K cosets. Suppose that the binary N-tuple r is received over the BSC. Then r = v + e
(1.58)
where v E B is a codeword and e is an error pattern. Clearly r is in the coset r + B. From (1.58) it follows that the coset r + B contains exactly the possible error patterns! The N-tuple
Chapter 1
14
Introduction
of smallest weight in a coset is called a coset leader. (If there is more than one N-tuple with smallest weight, any one of them can be chosen as coset leader.) An MD decoder will select as its output the error pattern, e say, which is a coset leader of the coset containing r, subtract (or, equivalently in FN , add) a from (to) r, and, finally, obtain its maximum-likelihood estimate v. We illustrate what the decoder does by showing the standard array. The first row consists of the code B with the allzero codeword on the left. The following rows are the cosets e; + B arranged in the same order with the coset leader on the left: 0
v2K_1
v1
V2K_j +el
e2N-K-1
...
v1 +e2N-K_1
V2K-1
+e2N-X_1
The MD decoder decodes r to the codeword v at the top of the column that contains r. EXAMPLE 1.12 The (4,2) code B with encoding matrix
G = (0
1
(1.59)
1
)
has the following standard array:
0110
1101
0011
1110
0101
1111
0010
1001
1010
0111
1100
0000
1011
1000 0100 0001
Suppose that r = 1001 is received. An MD decoder outputs v = 1101.
Theorem 1.11 An (N, K) binary linear code B can correct all error patterns in a set £ if and only if these error patterns all lie in different cosets of F' relative to B. Proof. Suppose that e and e' are distinct error patterns in the same coset. Then there is a v E B such that v + e = e'. No decoder can correct both e and e'. Conversely, suppose that all error patterns in £ lie in different cosets. If v is the actual transmitted codeword and e the actual error pattern, then r = v + e lies in e + B. Thus, all error patterns in £ can be corrected by a decoder that maps r into the error patterne E E (if any) that lies in the same coset e + B as r does. The syndrome of the received N-tuple r, relative to the parity-check matrix H, is defined as
s aef
rHT
(1.60)
Assume that the transmitted codeword is v and that r = v + e, where e is the error pattern. Both r and H are known to the receiver, which exploits (1.44) and forms
s = rHT = (v + e)HT
=vHT +eHT =O+eHT
(1.61)
Block Codes-A Primer
Section 1.2
15
so that
s = eHT
(1.62)
The syndrome depends only on the error pattern and not on the codeword! In medical terminology, a syndrome is a pattern of symptoms. Here the disease is the error pattern, and a symptom is a parity-check failure. Equation (1.62) gives N - K linearly independent equations for the N components of
the error pattern e. Hence, there are exactly 2K error patterns satisfying (1.62). These are precisely all the error patterns that are differences between the received N-tuple r and all 2K different codewords v. For a given syndrome, these 2K error patterns belong to the same coset. Furthermore, if two error patterns lie in the same coset, then their difference is a codeword and it follows from (1.61) that they have the same syndrome. Hence, we have
Theorem 1.12 Two error patterns lie in the same coset if and only if they have the same syndrome. From the two previous theorems follows
Corollary 1.13 An (N, K) binary linear code 8 can correct all error patterns in a set e if and only if the syndromes of these error patterns are all different. No information about the error pattern is lost by calculating the syndrome! In Fig. 1.9 we show the structure of a general syndrome decoder. The syndrome former, HT, is linear, but the error pattern estimator is always nonlinear in a useful decoder. Clearly, a syndrome decoder is an MD, or, equivalently, an ML decoder. r
-O+
Error
HT
Figure 1.9 Syndrome decoder for a linear block
pattern
estimator
code.
EXAMPLE 1.13 Consider the (7, 4) Hamming code whose parity-check matrix is given in Example 1.11. Let r = 0010001 be the received 7-tuple. The syndrome is
s = (0010001)
0
1
1
1
0
1
1
1
0
1
1
1
1
0
0
0
1
0
0
0
1
= 111
(1.63)
Since s # 0, r contains at least one erroneous component. For the Hamming codes there is a one-to-one correspondence between the single errors and the nonzero syndromes. Among all 2K = 16 possible error patterns, the MD decoder chooses the one with least Hamming weight, viz., the single-error pattern
corresponding to the given syndrome s = 111. Since the fourth row in HT is the triple 111, the MD decoder gives as its output e = 0001000 (a single 1 in the fourth position). It immediately follows that
U=r+e = 0010001 + 0001000 = 0011001
(1.64)
16
Chapter 1
Introduction
If v = 0011001 was sent, we have corrected the transmission error. However, if v = 0000000 was sent and e = 0010001, the Hamming code is not able to correct the error pattern. The syndrome decoder will in this case give as its output v = 0011001.
Suppose that the (7,4) Hamming code is used to communicate over a BSC with channel error probability E. The decoder can correctly identify the transmitted codeword if and only if the channel causes at most one error. The block (or word) error probability PE, that is, the probability that the estimated codeword differs from the actual codeword, is
PE =
-E)7-i
2
(1.65) = 21E2 - 70E3 + ...
For the (7,4) Hamming code, it can be shown (Problem 1.21) that for all digits
Pb = 9E2(1 - E)5 + 19E3(1 - E)4 + 16E4(I - E)3 +12e5(1 - E)2 + 7E6(1 - E) + E7
(1.66)
= 962 - 26E3 + ...
Finally, we notice that if we start with the 7-tuple 1011000, take all cyclic shifts, and form all linear combinations, we obtain a (7,4) Hamming code with a parity-check matrix 1
H=
1
1
0
1
0
0
0
1
1
1
0
1
0
1
1
0
1
0
0
1
(1.67)
which is equivalent to the one given in Example 1.7. The two codes differ only by the ordering of the components in their codewords (permute columns 1-4). The (7,4) Hamming codes are
members of a large class of important linear code families called cyclic codes-every cyclic shift of a codeword is a codeword. The class of cyclic codes includes, besides the Hamming codes, such famous codes as the Bose-Chaudhuri-Hocquenhem (BCH) and Reed-Solomon (RS) codes. The cyclic behavior of these codes makes it possible to exploit a much richer algebraic structure that has resulted in the development of very efficient decoding algorithms suitable for hardware implementations. 1.3 A FIRST ENCOUNTER WITH CONVOLUTIONAL CODES Convolutional codes are often thought of as nonblock linear codes over a finite field, but it can be an advantage to treat them as block codes over certain infinite fields. We will postpone the precise definitions until Chapter 2 and instead begin by studying a simple example of a binary convolutional encoder (Fig. 1.10).
Figure 1.10 An encoder for a binary rate R = 1/2 convolutional code.
Section 1.3
A First Encounter with Convolutional Codes
17
The information digits u = uou 1 ... are not as in the previous section separated into blocks. Instead they form an infinite sequence that is shifted into a register, in our example, of length or memory m = 2. The encoder has two linear output functions. The two output sequences v(1) = and v(2) = uo2)v12) ... are interleaved by a serializer to form a single-output sequence that is transmitted over the channel. For each information digit that enters the encoder, two channel digits are emitted. Thus, the code rate of this encoder is R = 1/2 bits/channel use.
Assuming that the content of the register is zero at time t = 0, we notice that the two output sequences can be viewed as a convolution of the input sequence u and the two sequences 11100... and 10100... , respectively. These latter sequences specify the linear output functions; that is, they specify the encoder. The fact that the output sequences can be described by convolutions is why such codes are called convolutional codes. In a general rate R = b/c, where b < c, binary convolutional encoder (without feedback) the causal, that is, zero for time t < 0, information sequence
U - u pu 1 ... = u(1)u(2) ... U (b) ... 0 0 ... u(b)u(1)u(2) 0 1
1
(1.68)
is encoded as the causal code sequence V = vovl ... = VO(1)vo(2) ... V0(c)V1(1) Vi(2) ... V1(c)
...
(1.69)
where
vt = .f (ut, ut-1, ... , ut-m)
(1.70)
The parameter m is called the encoder memory. The function f is required to be a linear function from
F(m+1)b 2
to 1Fc. It is often convenient to write such a function in matrix form: 2
vt =utGo+ut-1G1 +...+ut-mGm
(1.71)
where Gi, 0 < i < m, is a binary b x c matrix. Using (1.71), we can rewrite the expression for the code sequence as vovl
... = (uouI ...)G
(1.72)
or, in shorter notation, as
v = uG
(1.73)
where Go
G=
G1
...
Gm
Go
G1
...
Gm
(1.74)
and where here and hereafter the parts of matrices left blank are assumed to be filled in with zeros. We call G the generator matrix and Gi, 0 < i < m, the generator submatrices. In Fig. 1.11, we illustrate a general convolutional encoder (without feedback). rot
Figure 1.11 A general convolutional encoder (without feedback).
18
Chapter 1
Introduction
EXAMPLE 1.14 The rate R = 1/2 convolutional encoder shown in Fig. 1.10 has the following generator submatrices:
Go = (11) G1 = (10) G2 = (11)
(1.75)
The generator matrix is
G=
/ 11
10
11
I
11
10
(1.76)
11
EXAMPLE 1.15 The rate R = 2/3 convolutional encoder shown in Fig. 1.12 has generator submatrices 1
0
1
0
1
1
(1
1
0
0
0
1
G0
G1
Gz
(0 0 1
0
) / I
(1.77)
1
1
0
The generator matrix is
G=
110
000
001
101
101
110
000
011
001
101
(1.78)
v(1)
Figure 1.12 A rate R = 2/3 convolutional encoder.
It is often convenient to represent the codewords of a convolutional code as paths through a code tree. A convolutional code is sometimes called a (linear) tree code. The code tree for the convolutional code generated by the encoder in Fig. 1.10 is shown in Fig. 1.13. The leftmost node is called the root. Since the encoder has one binary input, there are, starting at the root, two branches stemming from each node. The upper branch leaving each node corresponds to
the input digit 0, and the lower branch corresponds to the input digit 1. On each branch we
Section 1.3
A First Encounter with Convolutional Codes
19 00 00 11
00/
10 11
01 11 11
0
10
A
00
00 01
01 10
00 11 11
11
10 10
00
01"Z Transmitted sequence 11
01
01
10
Figure 1.13 A binary rate R = 1 /2 tree code.
B
00 01 10
have two binary code digits, viz., the two outputs from the encoder. The information sequence 1011 ... is clearly seen from the tree to be encoded as the code sequence 11 10 00 01 .... The state of a system is a description of its past history which, together with a specification of the present and future inputs, suffices to determine the present and future outputs. For the
encoder in Fig. 1.10, we can choose the encoder state o, to be the contents of its memory elements; that is, at time t we have 17t = ut-lut-2 (1.79) Thus, our encoder has only four different encoder states, and two consecutive input digits are enough to drive the encoder to any specified encoder state. For convolutional encoders, it is sometimes useful to draw the state-transition diagram. If we ignore the labeling, the state-transition diagram is a de Bruijn graph [Gol67]. In Fig. 1.14, we show the state-transition diagram for our convolutional encoder.
Figure 1.14 A rate R = 1/2 convolutional encoder and its state-transition diagram.
Chapter 1
20
Introduction
Let us return to the tree code in Fig. 1.13. As an example, the two input sequences 010 (node A) and 110 (node B) both drive the encoder to the same encoder state, viz., v = 01. Thus, the two subtrees stemming from these two nodes are identical. Why treat them separately? Let us replace them with one node corresponding to state 01 at time 3. For each time or depth in the tree, we can similarly replace all equivalent nodes with only one-we obtain the trellis-like structure shown in Fig. 1.15, where the upper and lower branches leaving the encoder states correspond to information symbols 0 and 1, respectively.
Figure 1.15 A binary rate R = 1 /2 trellis code.
The information sequence 1011 ... corresponds in the trellis to the same code sequence as in the tree, viz., 1110 00 0 1 .... The trellis is just a more convenient representation of the same set of encoded sequences as is specified by the tree, and it is easily constructed from the state-transition diagram. A convolutional code is often called a (linear) trellis code. We will often consider sequences of finite length; therefore, it is convenient to introduce the notations X[o,n) =XOX1 ... Xn_1
(1.80)
X[on] =XOX1 ...Xn
(1.81)
and
Suppose that our trellis code in Fig. 1.15 is used to communicate over a BSC with crossover probability e, where 0 < e < 1/2. We start the encoder in encoder state o = 00, and we feed it with the finite information sequence u[o,n) followed by m = 2 dummy zeros in order to drive the encoder back to encoder state a = 00. The convolutional code is terminated and, thus, converted into a block code. The corresponding encoded sequence is the codeword V[O,n+m) The received sequence is denoted r[o,n+m)
To simplify the notations in the following discussion, we simply write u, v, and r instead Of u[0,n), V[0,n+m), and r[o,n+m)
We will now, by an example, show how the structure of the trellis can be exploited to perform maximum-likelihood (ML) decoding in a very efficient way. The memory m = 2 encoder in Fig. 1.10 is used to encode three information digits together with m = 2 dummy zeros; the trellis is terminated and our convolutional code has become a block code. A codeword
consisting of ten code digits is transmitted over a BSC. Suppose that r = I 1 00 1100 10 is received. The corresponding trellis is shown in Fig. 1.16. (In practice, typically a few thousand information bits are encoded before the encoder is forced back to the allzero state by encoding m dummy zeros.)
As shown by the discussion following (1.27), the ML decoder (and the MD decoder) chooses as its estimate i of v the codeword v that minimizes the Hamming distance d1.1(r, v) between r and v. That is, it minimizes the number of positions in which the codeword and the received sequence differ. In order to find the codeword that is closest to the received sequence, we move through the trellis from left to right, discarding all subpaths that could not turn out to be the prefix of the best path through the trellis. When we reach depth m = 2, we have four subpaths-one for each encoder state. At the next depth, however, there are eight subpaths-
Section 1.4
Block Codes Versus Convolutional Codes r= 11
00
00 n 00 r1 00
21 11
00
r-1 00
10
r-1 00
Figure 1.16 An example of Viterbi decoding for the received sequence r = 110011 1100 1
two per encoder state. For each encoder state at this depth, we keep only one subpath-the one that is closest to the corresponding prefix of the received sequence. We simply discard the poorer subpath into each encoder state since this poorer subpath could not possibly be the prefix of the best path through the trellis. We proceed in this manner until we reach encoder state 00 at depth 5. Because only one path through the trellis has survived, we have then found the best path through the trellis. In Fig. 1.16, the Hamming distance between the prefix of the received sequence and the best subpath leading to each encoder state is shown above the encoder state. The discarded poorer subpath is marked with the symbol x on the branch that enters the encoder state. Two subpaths leading to an encoder state may both have the same Hamming distance to the prefix of the received sequence. In fact, this happened at state 01, depth 4. Both subpaths have distance three to the prefix of the received sequence! Both are equally likely to be the prefix of the best path-we can discard either subpath without eliminating all "best paths" through the trellis, in case there are more than one best path. We arbitrarily chose to discard the upper of the two subpaths entering encoder state 01 at depth 4. The best codeword through the trellis was found to be v = 11 101100 00, which corresponds to the information sequence i = 100. If the estimate v = 1110 1100 00 happened to be the transmitted codeword, we have corrected two transmission errors. How many errors can we correct? The most likely error event is that the transmitted codeword is changed by the BSC so that it is decoded as its closest (in Hamming distance) neighbor. It is readily seen from Fig. 1.16
that the smallest Hamming distance between any two different codewords is 5, for example, dH (00 00 00 00 00, 1110 11 00 00) = 5. This minimum distance is called the free distance of the convolutional code and is denoted dfree. It is the single most important parameter for determining the error-correcting capability of the code. (The free distance and several other distance measures will be discussed in detail in Chapter 3.) Since dfree = 5, we can correct all patterns of two errors. The ML decoding algorithm described above is usually called the Viterbi algorithm in honor of its inventor [Vit67]. It is as simple as it is ingenious, and it is easily implementable. Viterbi decoders for memory m = 6 (64 states) are often used in practice. In Chapter 4, we will study Viterbi decoding in more detail and obtain tight upper bounds on the decoded bit error probability. 1.4 BLOCK CODES VERSUS CONVOLUTIONAL CODES
The system designer's choice between block and convolutional codes should depend on the application. The diehard block code supporters always advocate in favor of block codes, while their counterparts on the other side claim that in almost all situations convolutional codes
22
Chapter 1
Introduction
outperform block codes. As always, the "truth" is not only somewhere in between, but it also depends on the application. The theory of block codes is much richer than the theory of convolutional codes. Many sophisticated finite field concepts have been used to design block codes with a beautiful mathematical structure that has simplified the development of efficient error-correcting decoding algorithms. From a practical point of view, the Reed-Solomon (RS) codes constitute the most important family of block codes. They are extremely well suited for digital implementation. Berlekamp's bit-serial RS encoders [Ber82] have been adopted as a NASA standard for deepspace communication. The RS codes are particularly powerful when the channel errors occur
in clusters-burst errors-which is the case in secondary memories such as magnetic tapes and disks. All compact disc players use RS codes with table-lookup decoding. Assuming that a decoded bit error rate of 10-5 is satisfactory, which is the case, for example, for digitized voice, convolutional codes in combination with Viterbi decoding appear to be an extremely good combination for communication when the noise is white and Gaussian.
For example, Qualcomm Inc. has on a single chip implemented a memory in = 6 Viterbi decoder for rates R = 1/3, 1/2, 3/4, and 7/8. The rate R = 1/2 coding gain is 5.2 dB at Pb = 10-5. This very powerful error-correcting system operates either with hard decisions or with eight-level quantized soft decisions. The major drawback of RS codes is the difficulty of making full use of soft-decision information. As we will see in Chapter 4, the Viterbi algorithm can easily exploit the full softdecision information provided at the decoder input and thus easily pick up the 2 dB gain over hard-decision. Furthermore, code synchronization is in general much simpler for convolutional codes than for block codes. If a combination of a high level of data integrity, Pb = 10-10 say, and a larger coding gain than a Viterbi decoder can provide is required, then we could use either an RS code or a large memory, m = 25 say, convolutional encoder. The complexity of the Viterbi decoder,
which is essentially 2', will be prohibitively large in the latter case. Instead, we could use sequential decoding (Chapter 6) of the convolutional code whose complexity is essentially independent of the memory of the encoder. In many applications where the noise is predominantly Gaussian, the best solution is obtained when block and convolutional codes join forces and are used in series. In Fig. 1.17 we show a concatenated coding system, where we use a convolutional code as the inner code to clean up the channel. The Viterbi decoder will correct most channel errors but will occasionally output a burst of errors. This output then becomes the input to the outer decoder. Since an RS code is well suited to cope with bursts of errors, we use an RS code as the outer code. Such a concatenated coding system combines a high level of data integrity with large coding gain
and low complexity. Often an interleaver is used between the outer and inner encoders and a corresponding deinterleaver between the inner and outer decoders. Then the output error Input -H
Outer RS encoder
Inner cony. encoder
Noise -
Output ---
Outer
Inner
RS decoder
Viterbi decoder
Channel
Figure 1.17 Concatenated coding system.
Section 1.5
Capacity Limits and Potential Coding Gain Revisited
23
burst from the inner decoder will be smeared out by the deinterleaver before the outer decoder has to cope with it.
1.5 CAPACITY LIMITS AND POTENTIAL CODING GAIN REVISITED We will now return to the problem of determining the regions of potential coding gain which we first encountered in Section 1.1. Let us consider Shannon's formula for the capacity of the bandlimited Gaussian channel [Sha48]: CtW
= W log C1 + \\\
S
No W
bits/s
(1.82)
where W as before denotes the bandwidth. Assume that we are transmitting at the so-called Nyquist rate (i.e., at a rate of 2W samples per second) and that we use a rate R = K/N block code. If we transmit K information bits during r seconds, we have
N = 2Wr samples per codeword
(1.83)
Rt = K/r = 2WK/N = 2WR bits/s
(1.84)
Hence,
(Assuming a constant transmission rate Rt, the required bandwidth W is inversely proportional to the code rate R.) By combining (1.17) and (1.84), we obtain S W No
= 2REb/NO
(1.85)
For reliable communication, we must have Rt < Cr', that is,
R t = 2WR < W log I+
2REb No
(1.86)
or, equivalently,
22R-1
(1.87) > 2R Letting R - 0, we obtain (1.19). Since the right-hand side of inequality (1.87) is increasing with R, we notice that in order to communicate close to the Shannon limit, -1.6 dB, we have to use both an information rate Rt and a code rate R close to zero. Furthermore, if we use a rate R = 1/2 code, it follows from (1.87) that the required signal-to-noise ratio is Eb/NO
Eb/NO > 1 = 0 dB
(1.88)
In Fig. 1.18 we show the coding limits according to (1.87) and the regions of potential coding gain for various rates R.
When we derived the coding limits determined by inequality (1.87), we assumed a required error probability Pb arbitrarily close to zero. If we are willing to tolerate a certain given value of the error probability Pb, we can obtain a larger coding gain. It follows from Shannon's rate distortion theory [McE77] that if we are transmitting the output of a binary symmetric source and can tolerate an average distortion of K Pb for a block of K information symbols, then we can represent Rt bits of information per second with only Rt (1 - h(Pb)) bits
Chapter 1
24
Introduction
10-1 +
10-2 +
103+ Pb
10-4 +
10-5
Eb/No(dB) Figure 1.18 Coding limits and regions of potential coding gains for various rates R.
per second, where h(x) is the binary entropy function; that is,
h(x) = -x logx - (1 - x) log(1 - x)
(1.89)
These Rr (1 - h (Pb)) bits per second should now be transmitted over the channel with an error probability arbitrarily close to zero. Hence, instead of (1.86), we now have the inequality
R,(1 - h(Pb)) = 2WR(1 - h(Pb)) < W log
1 + 2oEb
(1.90) J
or, equivalently, 22R(t-h(1'b)) - 1
(1.91)
Eb/NO > 2R In Fig. 1.19 we show the coding limits according to (1.91) and the regions of potential coding gains for various rates R when we can tolerate the bit error probability Pb. We also show a comparison between these coding limits and Qualcomm's Viterbi decoder performance and that of a rate R = 3/4 (256,192) RS decoder. Remark. In order to achieve the capacity Cu' promised by (1.82), we have to use nonquantized inputs to the channel. If we restrict ourselves to the binary input Gaussian channel, then the formula for Cr', (1.82), must be replaced by a more complicated expression, and the coding limits shown in Fig. 1.19 should be shifted to the right by a small fraction of a dB [BMc74].
Section 1.6
25
Comments
Pb
Eb/No(dg) Figure 1.19 Regions of potential coding gains for various rates R when we can tolerate bit error probability Pb and a comparison with the performance of two convolutional codes and a block code.
1.6 COMMENTS Back in 1947 when Hamming had access to a computer only on weekends, he was very frustrated over its behavior: "Damn it, if the machine can detect an error, why can't it locate the position of the error and correct it?" [Tho83]. That question inspired the development of errorcorrecting codes. Hamming's famous single-error-correcting (7,4) block code is mentioned by Shannon in "A Mathematical Theory of Communication" [Sha48], but Hamming's paper was not published until two years later [Ham50]. The first paper published in coding theory was that of Golay [Go149], which in less than one page gave the generalization of Hamming codes to nonbinary fields, gave the only two multi-error-correcting perfect codes aside from the trivial binary repetition codes of odd length, and introduced the parity-check matrix (see also Problem 1.19). Elias introduced convolutional codes in 1955 [E1i55]. The first decoding method for these codes was sequential decoding suggested by Wozencraft in 1957 [Woz57] and further developed by Fano, who in 1963 presented a most ingenious decoding algorithm [Fan63]. The conceptually simplest algorithm for sequential decoding is the stack algorithm introduced by Zigangirov 1966 [Zig66] and Jelinek 1969 [Jel69]. In the meantime, Massey had suggested threshold decoding of convolutional codes [Mas63]. In Viterbi's famous paper from 1967 [Vit67], the Viterbi algorithm was invented as a proof technique and presented as "a new probabilistic nonsequential decoding algorithm." Forney [For67] was the first to draw a trellis,
26
Chapter 1
Introduction
and it was he who coined the name "trellis," which made understanding of the Viterbi algorithm
easy and its maximum-likelihood nature obvious. Forney realized that the Viterbi algorithm was optimum, but it was Heller who realized that it was practical [For94]. Later, Omura [Omu69] observed that the Viterbi algorithm can be viewed as the application of dynamic programming to the problem of decoding a convolutional code. The most important contributions promoting the use of convolutional codes were made by Jacobs and Viterbi when they founded Linkabit Corporation in 1968 and Qualcomm Inc. in 1985, thereby completing the path "from a proof to a product" [Vit90].
PROBLEMS 1.1 The channel capacity for the ideal bandlimited channel of bandwidth W that contains AWGN of variance WNo is given by
C, = W log (1 + WN ) bits/s The signal power can be written S = EbR,. Define the spectral bit rate r by
r = R,/W (bits/s)/hertz and show that Eb
2'-1
No
r
for rates R, less than capacity. Sketch r as a function of Eb/No expressed in dB. 1.2 Consider an ideal bandlimited AWGN channel with BPSK and with hard decisions. Based on transmitting R, = 2W bits/s, the capacity is
C,=2W(1+Eloge+(1-E)log(1-E)) bits/s where c = Q
rEb/No) and r is the spectral bit rate r = R,/ W.
Show that Eb No
>
2
nln2
for reliable communication. Hint: The Taylor series expansion of C, is
Ct=2W
22
1.2
1
2'
\12
\2-E) +3.4
/1
26
4
\2-E) +5.6
/1
6
\2-E) +...
loge
and
E=Q( rEb/No)>
1
2
-
1
rEblNo
2n
1.3 Show that a block code B can detect all patterns of s or fewer errors if and only if dm;,, > s. 1.4 Show that a block code 8 can correct all patterns of t or fewer errors and simultaneously detect all patterns of t + 1, t + 2, ... , t + s errors if and only if dmin > 2t + s. 1.5 Prove Theorem 1.2. 1.6 Consider a block code B with encoding matrix
G= ( 1
1
0
0
0
1
1
1
ff\V
1)
0
(a) List all codewords. (b) Find a systematic encoding matrix and its parity-check matrix. (c) Determine drain.
Problems
27
1.7 Consider the following two block codes. B, = { 110011, 101010, 010101, 011001, 100110, 111111, 001100, 0000001
B2 = {010101, 101010, 001100, 110110, 111111, 011001, 110011, 100110}
(a) Are the two codes linear? (b) Determine wmin for each of the codes. (c) Determine dmin for each of the codes. (d) Determine the rate R = K/N. 1.8 Consider the block code B = {000000, 110110, 011011, 101101}
(a) Is B linear? (b) Find the rate R = K/N. (c) Find, if it exists, a linear encoder. (d) Find, if it exists, a nonlinear encoder. (e) Determine dmin 1.9 Show that if 8 is a linear code and a V B, then B U (a + B) is also a linear code. 1.10 Consider the binary (6, K) even-weight code. (All codewords have even weight.) (a) Find K. (b) Give the encoding and parity-check matrices. 1.11 Consider the binary (4,3) even-weight code. (a) Construct a standard array. (b) For each coset give its syndrome. (c) How many errors can it correct? (d) Determine dmin.
1.12 Show that a binary code can correct all single errors if and only if any parity-check matrix has distinct nonzero columns. 1.13 Consider a binary code with encoding matrix
G=
1
0
0
0
1
01
1
0
1
0
1
0
0
1
1
1
0
1
(a) Find a parity-check matrix. (b) Construct a standard array. (c) List all codewords. (d) Determine from the standard array how many errors it can correct. (e) Determine dmin (f) For each coset give its syndrome. (g) Suppose that r = 110000 is received over a BSC with 0 < E < 1/2. Find the maximumlikelihood estimate of the information sequence u. 1.14 Consider a block code B with encoding matrix
G=
1
1
0
0
1
1
1
1
1
1
0
0
(a) Find a parity-check matrix. (b) List all codewords. (c) Determine dmin.
(d) Suppose that r = 000111 is received over a BSC with 0 < E < 1/2. Find the maximumlikelihood estimate of the information sequence u. 1.15 Consider a binary (N, K) code B with parity-check matrix H and minimum distance d. Assume that some of its codewords have odd weight. Form a code B by concatenating a 0 at the end of every codeword of even weight and a 1 at the end of every codeword of odd weight. This technique is called extending a code.
28
Chapter 1
Introduction
(a) Determine dmin for 13. (b) Give a parity-check matrix for the extended code C3,
1.16 Consider the (8,4) extended Hamming code. (a) Give a parity-check matrix.
(b) Determine din. (c) Find an encoding matrix. (d) Show how a decoder can detect that an odd number of errors has occurred. 1.17 The Hamming sphere of radius t with center at the N-tuple x is the set of ally in FZ such that dH (x, y) < t. Thus, this Hamming sphere contains exactly VC =
, =o \/ '
distinct N-tuples. Prove the Hamming bound for binary codes, that is, VI
2N(1-R)
I<
L
2
J
which is an implicit upper bound on dmin in terms of the block length N and rate R. 1.18 The systematic parity-check matrices for the binary Hamming codes can be written recursively as
and
Hm_I
Hrri_1
0
1...1
0...0
1
m > 3.
Find the parameters N, K, and dm1 for the mth Hamming code. 1.19 A code for which the Hamming bound (see Problem 1.17) holds with equality is called a perfect code. (a) Show that the repetition code, that is, the rate R = 1/N binary linear code with generator matrix G = (11 ... 1), is a perfect code if and only if N is odd. (b) Show that the Hamming codes of Problem 1.18 are perfect codes. (c) Show that the Hamming bound admits the possibility that an N = 23 perfect binary code with dmi = 7 might exist. What must K be? Remark. The perfect code suggested in (c) was found by Golay in 1949 [Go149]. There exist no perfect binary codes other than those mentioned in this problem. 1.20 Suppose that the block code 13 with parity-check matrix
H0 1
0
0
0
1
0
1
1
111 0
1
0
1 1
is used for communication over a BSC with 0 < E < 1/2. (a) Find dm;,,.
(b) How many errors can the code correct? (c) How many errors can the code detect? (d) For each syndrome give the error pattern a that corresponds to the error-correcting capability of the code. (e) For r = 0111011 find v, the maximum-likelihood decision. (f) For r = 0110111 find v, the maximum-likelihood decision.
29
Problems
1.21 Verify formula (1.66). Hint: The (7,4) Hamming code has one codeword of weight 0, seven codewords of weight 3, seven codewords of weight 4, and one codeword of weight 7. The error probability is the same for all bits. 1.22 Given a Hamming code 13 with parity-check matrix H, (a) Construct an extended code 13ext with parity-check matrix
(b) Determine dmi for Bext. (c) Construct an expurgated code Bexp with parity-check matrix
(d) Determine dm; for 13exp. (e) What is characteristic for the weights of the codewords of Bexp?
1.23 Consider the trellis given in Fig. 1.16. (a) List all codewords.
(b) Find the ML estimate of the information sequence for the received sequence r = 01 100110 11 on a BSC with 0 < E < 1/2. 1.24 Consider the convolutional encoder shown in Fig. P1.24. (a) Draw the trellis corresponding to four information digits and m = 1 dummy zero. (b) Find the number of codewords M represented by the trellis in (a). (c) Use the Viterbi algorithm to decode when the sequence r = 110110 1001 is received over a BSC with 0 < E < 1/2. V(1)
Figure P1.24 Convolutional encoder used in
U
Problem 1.24.
1.25 Consider the convolutional encoder with generator matrix 11
G =
10
01
11
11
10
01
11
(a) Find the rate and the memory. (b) Draw the encoder. (c) Find the codeword v that corresponds to the information sequence u = 1100100... . 1.26 Consider the code C with the encoding rule
v = uG + (11011110 11...)
30
Chapter 1
where G
- (11
10
01
11
=
11
10
01
I
Introduction
11
(a) Is the code C linear? (b) Is the encoding rule linear? 1.27 Consider the rate R = 2/3, memory m = 2 convolutional encoder illustrated in Fig. 1.12. (a) Draw the trellis diagram. (b) Find the encoder matrix G. (c) Let u = 101101 10 00 00 ... be the information sequence. Find the corresponding codeword v. 1.28 Plot in Fig. 1.19 the bit error probability for the (7,4) Hamming code when used to communicate over the Gaussian channel with hard decisions.
Hint: From formula (1.12), viz., c = Q
2Es1No), where E, = REb, we obtain the
following table:
Es/N0(dB)
E
0
0.79. 10-1
2
0.38. 10-'
4
0.12. 10-1 0.24. 10-2 10_3 0.19. 0.39. 10-5 0.90. 10-8
6 8 10 12
K Convolutional Encoders Structural Properties
After defining convolutional codes and convolutional encoders, we show that a given convolutional code can be encoded by many different encoders. We carefully distinguish code properties from encoder properties. The Smith form of a polynomial matrix is used to obtain important structural results for convolutional encoders. We give several equivalent conditions for an encoding matrix to be minimal-that is, to be realizable by as few memory elements as any encoder for the code, though not necessarily in controller or observer canonical forms. We also show that a systematic encoding matrix is always minimal. 2.1 CONVOLUTIONAL CODES AND THEIR ENCODERS
In general, the rate R = b/c, b < c, convolutional encoder input (information) sequence u = ... u_1uoulu2 ... , where ui = (u(i1)u(i2) ... u(i ), and output (code) sequence v = ... v_lvovl V(. )), must start at some finite time (positive or negative) and V2 .... where vj = may or may not end. It is often convenient to express them in terms of the delay operator D(D-transforms):
u(D) = ...+u_ID-1+uo+ulD+u2D2+
(2.1) (2.2)
In the sequel, we will not distinguish between a sequence and its D-transform.
Let F2((D)) denote the field of binary Laurent series. The element x(D) _ E°°r xi D` E F2((D)), r E Z, contains at most finitely many negative powers of D. The delay of a Laurent series is the "time index" at which the Laurent series starts. For example,
x(D)=D-2+1+D3+D7+D12+...
(2.3)
is a Laurent series with delay
(2.4) del x(D) = -2 Let F2[[D]] denote the ring of formal power series. The element f (D) = E'° fiD' E
F2[[D]] is a Laurent series without negative powers of D. Thus, F2 [[D]] is a subset of IF2((D)).
The element f (D) = F°°o f, D' with fo = 1 has delay del f (D) = 0 and is called delayfree. A polynomial p(D) = Y°Oo pi D' contains no negative and only finitely many positive powers of D. If po = 1, we have a delayfree polynomial, for example, 31
Convolutional Encoders-Structural Properties
Chapter 2
32
p(D) = 1 + D2 + D3 + D5
(2.5)
is a binary delayfree polynomial of degree 5. The set of binary polynomials F2[D] is a subset of 1F2[[D]] and, hence, a subset of ]F2((D)). Multiplication of two polynomials is ordinary polynomial multiplication with coefficient operations performed modulo 2. Since 1 is the only polynomial with a polynomial as its multiplicative inverse, viz., 1 itself, ]F2[D] cannot be a field. It is a ring-the ring of binary polynomials. Given any pair of polynomials x(D), y(D) E F2[D], with y(D) # 0, we can obtain the element x(D)/y(D) c ]F2((D)) by long division. Since sequences must start at some finite time, we must identify, for instance, (1 + D)/D2 (1 + D + D2) with the series D-2 + 1 + D + instead of the alternative series D-3 + D-5 + D-6 + that can also be obtained D3 + by long division but that is not a Laurent series. Obviously, all nonzero ratios x(D)/y (D) are invertible, so they form the field of binary rational functions ]F2(D), which is a subfield of the field of Laurent series IF2((D)). As an element in ]F2((D)), a rational function has either finitely many terms or is ultimately periodic, but a Laurent series can be aperiodic! Finite rational functions are also called Laurent polynomials. The degree of a Laurent polynomial is the "time index" at which the Laurent polynomial ends. For example,
x(D)=D-2+1+D3+D7
(2.6)
is a Laurent polynomial with degree
deg x (D) = 7
(2.7)
W e can consider n-tuples of elements from ]F2[D], ]F2L[D]], ]F2(D), or ]F2((D)). For ex-
ample, an n-tuplex(D) = (x(1)(D) x(2)(D) ... x(")(D)), where x (1) (D), x(2)(D), ... , x("I (D)
E ]F2((D)), can be expressed as x(D) = F°°,(xi(1) xe2i...x;"i)Di, r E Z, if x(>>(D) _
< j < n.
So we denote the set of n-tuples of elements from ]F2((D))
by ]F"2 ((D)), which is the n-dimensional vector space over the field of binary Laurent series ]F2((D)). Relative to ]FZ((D)) the elements in the field ]F2((D)) are usually called scalars. Similarly, we have >F2 [D], Fn [[D]], and Fn (D).
x(D)
If x(D) E ]FZ[D], we say that x(D) is polynomial in D. The degree of the element (1) (2) . x(") )Di is definedto bem, provided (x(1) ... x(")) (0 0 ... 0). m x(2) m 0(xi xi i m
For simplicity we call the elements in ]FZ[[DI] formal power series also when n > 1.
For our rate R = b/c encoder, we have the input sequences u(D) E ]F2((D)) and the output sequences v(D) E IFZ((D)). We next consider the realization of linear systems. Consider for simplicity the controller canonical form of a single-input, single-output linear system as shown in Fig. 2.1. The delay elements form a shift register, the output is a linear function of the input and the shift register contents, and the input to the shift register is a linear function of the input and the shift register contents. The output at time j m
v1 = E fi w1-i i=0
has the D-transform 00
v(D) j=-00
=k=-1
00
m
vjDj = E Y, /_-00 i-o
(Dz) wkDk = .f (D)w(D)
(2.9)
Section 2.1
33
Convolutional Codes and Their Encoders
Figure 2.1 The controller canonical form of a rational transfer function.
where we have replaced j - i by k, and where .f (D) = f o + f t D + ... + .fm Dm
(2.10)
and
w(D) _
wkDk
(2.11)
qi wj-i
(2.12)
k=-oo
From Fig. 2.1 it follows that m
wj = uj + Upon defining qo = 1, (2.12) can be rewritten as m
uj = Egiwj-i
(2.13)
i=o
or, by repeating the steps in (2.9), as
u(D) = q(D)w(D)
(2.14)
where 00
u(D) = E
(2.15)
j=-oo and q(D)=1+q1D+...+gmDm
(2.16)
Combining (2.9) and (2.14) we have
v(D) = u(D).f (D)lq(D) = u(D)
fo + ft D + ... + fm Dm
1+q1D+...+q Dm
(2.17)
Let g(D) = f (D)/q(D), then v(D) = u(D)g(D), and we say that g(D) is a rational transfer function that transfers the input u (D) into the output v(D). From (2.17), it follows that every rational function with a constant term 1 in the denominator polynomial q (D) (or, equivalently, with q (0) = 1 or, again equivalently, with q (D) delayfree) is a rational transfer function that can be realized in the controller canonical form shown in Fig. 2.1. Every rational function
g(D) = f (D)/q(D), where q(D) is delayfree, is called a realizable function. In general, a matrix G (D) whose entries are rational functions is called a rational transfer function matrix. A rational transfer function matrix G (D) for a linear system with many inputs or many outputs whose entries are realizable functions is called realizable.
34
Chapter 2
Convolutional Encoders-Structural Properties
In practice, given a rational transfer function matrix we have to realize it by linear sequential circuits. It can be realized in many different ways. For instance, the realizable function
fo + fl D + ... + fm Dm
8(D) = 1+q1D+...+gmDm
(2.18)
has the controller canonical form illustrated in Fig. 2.1. On the other hand, since the circuit in Fig. 2.2 is linear, we have
v(D) = u(D)(fo + f1 D + ... + fm Dm) +v(D)(giD+...+gmDm)
(2.19)
which is the same as (2.17). Thus, Fig. 2.2 is also a realization of (2.18). In this realization, the delay elements do not in general form a shift register as these delay elements are separated by adders. This is the so-called observer canonical form of the rational function (2.18). The controller and observer canonical forms in Figs. 2.1 and 2.2, respectively, are two different realizations of the same rational transfer function.
Figure 2.2 The observer canonical form of a rational transfer function.
We are now prepared to give a formal definition of a convolutional transducer.
Definition A rate R = b/c (binary) convolutional transducer over the field of rational functions F2(D) is a linear mapping
r : F"2 ((D)) u(D)
F2'((D)) v(D)
which can be represented as
v(D) = u(D)G(D)
(2.20)
where G(D) is a b x c transfer function matrix of rank b with entries in F2(D) and v(D) is called a code sequence arising from the information sequence u(D). Obviously, we must be able to reconstruct the information sequence u(D) from the code sequence v(D) when there is no noise on the channel. Otherwise the convolutional transducer would be useless. Therefore, we require that the transducer map is injective; that is, the transfer function matrix G(D) has rank b over the field F2(D). We are now well prepared for the following
Definition A rate R = b/c convolutional code C over F2 is the image set of a rate R = b/c convolutional transducer with G(D) of rank b over F2(D) as its transfer function matrix.
It follows immediately from the definition that a rate R = b/c convolutional code C over F2 with the b x c matrix G(D) of rank b over F2(D) as a transfer function matrix can
Section 2.1
Convolutional Codes and Their Encoders
35
be regarded as the 1F2((D)) row space of G(D). Hence, it can also be regarded as the rate R = b/c block code over the infinite field of Laurent series encoded by G(D). In the sequel we will only consider realizable transfer function matrices and, hence, we have the following
Definition A transfer function matrix (of a convolutional code) is called a generator matrix if it (has full rank and) is realizable.
Definition A rate R = b/c convolutional encoder of a convolutional code with generator matrix G(D) over F2(D) is a realization by a linear sequential circuit of a rate R = b/c convolutional transducer whose transfer function matrix G(D) (has full rank and) is realizable. We call a realizable transfer function matrix G(D) delayfree if at least one of its entries 0. If G(D) is not delayfree, it can be written as
f (D)/q(D) has f (0)
G(D) = D'Gd(D)
(2.21)
where i > I and Gd(D) is delayfree. Theorem 2.1 Every convolutional code C has a generator matrix that is delayfree. Proof. Let G(D) be any generator matrix for C. The nonzero entries of G(D) can be written
gij(D) = Ds'ifij(D)lgij(D) (2.22) where sij is an integer such that fi j (0) = qi j (0) = 1, 1 < i < b, 1 < j < c. The number sij is the delay of the sequence
gij(D) = Ds".fij(D)lgij(D) = Ds"
+gs=j+IDs'j+t +....
(2.23)
Let s = mini, j {sij }. Clearly,
G'(D) = D-SG(D)
(2.24)
is both delayfree and realizable. Since D-s is a scalar in IF2((D)), both G(D) and G'(D) generate the same convolutional code. Therefore, G'(D) is a delayfree generator matrix for the convolutional code C.
A given convolutional code can be encoded by many essentially different encoders. EXAMPLE 2.1 Consider the rate R = 1/2, binary convolutional code with the basis vector vo (D) = (1 + D + D 2 1 +D2). The simplest encoder for this code has the generator matrix
Go(D) = (1 + D + D2
1 + D2)
(2.25)
Its controller canonical form is shown in Fig. 2.3.
V(2)
Figure 2.3 A rate R = 1/2 convolutional encoder with generator matrix Go(D).
Theorem 2.2 Every convolutional code C has a polynomial delayfree generator matrix. Proof. Let G (D) be any (realizable and) delayfree generator matrix for C, and let q (D) be the least common multiple of all the denominators in (2.22). Since q (D) is a delayfree
Convolutional Encoders-Structural Properties
Chapter 2
36
polynomial,
G'(D) = q(D)G(D)
(2.26)
0
is a polynomial delayfree generator matrix for C.
An encoder that realizes a polynomial generator matrix is called a polynomial encoder. EXAMPLE 2.1 (Cont'd) If we choose the basis to be vi (D) = a, (D)vo(D), where the scalar a, (D) is the rational function al(D) = 1/(1 + D + D2), we obtain the generator matrix
G1(D)=
I
I+D
1+D+D2
(2.27)
for the same code. The output sequence v (D) _ (v (1) (D) v(2)(D)) of the encoder with generator matrix GI (D) shown in Fig. 2.4 can be written as
v(')(D) = u(D) v(z)(D) = u(D)
1 + D2 1 + D + D2
(2.28)
The input sequence appears unchanged among the two output sequences. vltl v(2)
Figure 2.4 A rate R = 1 /2 systematic convolutional encoder with feedback and generator matrix G I (D).
Definition A rate R = b/c convolutional encoder whose b information sequences appear unchanged among the c code sequences is called a systematic encoder, and its generator matrix is called a systematic generator matrix.
If a convolutional code C is encoded by a systematic generator matrix, we can always permute its columns and obtain a generator matrix for an equivalent convolutional code C' such that the b information sequences appear unchanged first among the code sequences. Thus, without loss of generality, a systematic generator matrix can be written as
G(D) = (Ib R(D))
(2.29)
where Ib is a b x b identity matrix and R(D) a b x (c - b) matrix whose entries are rational functions of D. Being systematic is a generator matrix property, not a code property. Every convolutional code has both systematic and nonsystematic generator matrices! (Remember that the code is the set of code sequences arising from the set of information sequences; the code does not depend on the mapping.) EXAMPLE 2.1 (Cont'd) If we further change the basis to v2(D) = a2(D)vo(D), where a2(D) E F2(D) is chosen as a2(D) _ 1 + D, we obtain a third generator matrix for the same code, viz.,
G2(D)=(1+D3 1+D+D2+D3)
(2.30)
Section 2.1
Convolutional Codes and Their Encoders
37
Definition A generator matrix for a convolutional code is catastrophic if there exists an information sequence u(D) with infinitely many nonzero digits, WH(u(D)) = 00, that results in codewords v(D) with only finitely many nonzero digits, wH(v(D)) < oo. EXAMPLE 2.2 The third generator matrix for the convolutional code given above, viz.,
G2(D)=(1+D3 1+D+D2+D3)
(2.31)
is catastrophic since u(D) = 1/(1 + D) = 1 + D + D2 + . has WH(u(D)) = oo but v(D) = u(D)G2(D) = (I + D + D2 1 + D2) = (1 1) + (1 0)D + (1 1)D2 has wH(v(D)) = 5 < 00. In Fig. 2.5 we show its controller canonical form. vhl
vhl
Figure 2.5 A rate R = 1/2 catastrophic convolutional encoder with generator matrix G2(D).
When a catastrophic generator matrix is used for encoding, finitely many errors (five in the previous example) in the estimate v(D) of the transmitted codeword v(D) can lead to infinitely many errors in the estimate u (D) of the information sequence u (D)-a "catastrophic" situation that must be avoided! Being catastrophic is a generator matrix property, not a code property. Every convolutional code has both catastrophic and noncatastrophic generator matrices. Clearly, the choice of the generator matrix is of great importance. EXAMPLE 2.3 The rate R = 2/3 generator matrix
1 + D + D2
D I + D3
1 + D3
D2
1
1
I+D3
1+D3
1+D
1
G(D)
1
(2.32)
has the controller and observer canonical forms shown in Figs. 2.6 and 2.7, respectively. vlrl
V v(3)
Figure 2.6 The controller canonical form of the generator matrix G(D) in (2.32).
38
Chapter 2
Convolutional Encoders-Structural Properties
(2)
v(2)
xFigure 2.7 The observer canonical form of the generator matrix G (D) in (2.32).
In Chapter 3 we will show that generator matrices G (D) with G (0) of full rank are of particular interest. Hence, we introduce Definition A generator matrix G(D) is called an encoding matrix if G(0) has full rank. We have immediately the following
Theorem 2.3 An encoding matrix is (realizable and) delayfree.
The polynomial generator matrices G0(D) (2.25), G1(D) (2.27), and G2(D) (2.31) as well as the rational generator G(D) in Example 2.3 are all encoding matrices. But the polynomial generator matrix
G(D)
1 1D
D+D2 1+D
(2.33)
is not an encoding matrix since G(0) has rank 1. In the sequel we will see that all generator matrices that are interesting in practice are in fact encoding matrices! The generator matrix for the convolutional encoder shown in Fig. 1.11 can and where are the entries of the b x c matrix Gk, 0 < k < m, in (1.74). Remark.
be written G(D) = (g,j (D))1«
We now present a useful decomposition of polynomial convolutional generator matrices. This decomposition is based on the following fundamental algebraic result [Jac85]:
Section 2.2
The Smith Form of Polynomial Convolutional Generator Matrices
39
Theorem 2.4 (Smith form) Let G(D) be a b x c, b < c, binary polynomial matrix (i.e., G(D) = (gij(D)), where g,j(D) E FA[D], 1 < i <_ b, 1 < j < c) of rank r. Then G(D) can be written in the following manner:
G(D) = A(D)F(D)B(D)
(2.34)
where A(D) and B(D) are b x b and c x c, respectively, binary polynomial matrices with unit determinants, and where I'(D) is the b x c matrix yt (D) Y2(D)
I' (D) =
yr (D)
(2.35)
0
0 ... 0/ which is called the Smith form of G(D), and whose nonzero elements y; (D) E F2[D], 1 < i < r, called the invariant factors of G(D), are unique polynomials that satisfy y r (D) I
yy+t (D), i = 1, 2, ... , r - 1
(2.36)
Moreover, if we let Ai (D) E IF2[D] be the determinantal divisor of G(D), that is, the greatest common divisor (gcd) of the i x i subdeterminants (minors) of G(D), then (2.37)
Ai-1(D) where AO (D) = 1 by convention and i = 1 , 2, ... , r.
By definition a generator matrix has full rank. Its Smith form decomposition is illustrated in Fig. 2.8. We call the matrices A(D) and B(D) "scramblers" because they define permutations. The input sequences are scrambled in the b x b scrambler A(D). The b outputs of this scrambler are multiplied by the b invariant-factors. These b products plus c - b dummy zeros are then scrambled in the c x c scrambler B(D) to give the output sequences. WO
71(
A
bxb
7z(D)
(2)
B
scrambler
7b(D) 0
cXC scrambler
0
Figure 2.8 Smith form decomposition of a rate R = b/c polynomial convolutional encoder.
Before we give a proof of Theorem 2.4, we introduce two types of elementary operations.
Type I:
The interchange of two rows (or two columns). The following nonsingular matrix performs the interchange of rows (or columns)
i and j depending on whether it is used as a pre- (or post-) multiplier.
Chapter 2
40
/
Convolutional Encoders-Structural Properties
1
1
row i
Pit = row j I 1
It is immediate that Pi.i 1 = P,j. Moreover, det(P,1) = -1 = 1 since we are working in F2.
Type II: The addition to all elements in one row (or column) of the corresponding elements in another row (or column) multiplied by a fixed polynomial in D.
The following nonsingular matrix performs the addition to the elements in row (or column) i of the elements in row (or column) j multiplied by the polynomial p(D) E FF2[D] when used as a pre- (or post-) multiplier:
/1
1 1
- - - p(D) -
row i
R,1(p(D)) =
row j
1
1/ It is easy to check directly that One sees trivially that det(R,j (p (D))) = 1.
R,1(-p(D)) = Rij (p (D)).
Premultiplication by any of these elementary matrices results in the associated transformation being performed on the rows, whereas postmultiplication does the same for columns. Proof of Theorem 2.4. If G(D) = 0, there is nothing to prove. Therefore, we assume
that G(D) = 0. First, we show that starting with G(D) we can obtain, by elementary operations only, a matrix whose entry in the upper-left corner is nonzero and has minimal degree of all nonzero entries in all matrices that can be obtained from G(D) by elementary operations. We can bring the polynomial of lowest degree in G(D) to the upper-left corner by elementary operations.
Assume now that it is there. Let a,j (D), 1 < i < b, 1 < j < c, be the elements in this new matrix. Divide an element in the first row, a11(D), j > 1, by all (D). Then we have a11(D) = al l (D),B1(D) + pl1(D), where deg(,Bi1(D)) < deg(a11(D)). Now add the first column multiplied by ,Bi (D) to the jth column. This elementary operation replaces all (D) by $l j (D). Thus, if Fit j (D) # 0, we obtain a matrix for which the lowest degree of a nonzero entry has been reduced. Repeat this procedure for the new matrix. Similarly, we can obtain a matrix for which the lowest degree of the nonzero entries in the first column has been reduced. Since the degree is reduced at each step the procedure is finite. Thus, repeating this process yields a matrix G'(D) = (fi,j (D)) in which N11(D) has minimal degree and f311(D) divides
Section 2.2
The Smith Form of Polynomial Convolutional Generator Matrices
41
both 01 j (D) and fi 1(D) for all i and j. Except for the upper-left corner, we next clear the first row and first column to zero by elementary operations. This gives a matrix of the form 811 (D)
...
0
0
0
G"(D) 0
where we let 11(D) = yj (D). Next we will prove that yj (D) divides every element in G"(D) = (Sid (D)). If yj (D) I 8i! (D), then we add the jth column to the first column and obtain a new first column. Repeating
the procedure described above then yields an element in the upper-left corner with a degree less than deg(p11(D)) which is a contradiction. Hence, our matrix can be written as
Yi(D)
...
0
0
0
Yi(D)G*(D) 0
Repeating the procedure for G*(D) (induction) proves the first part of the theorem. Now we will prove that A, (D)
aef
the gcd of all i x i minors of G(D)
(2.38)
is unaffected by elementary row and column operations on G (D). Let A (D) = (ail (D)) be a b x b scrambler with entries in FAD]. The (i, j) entry of A(D)G(D) is >k aik(D)gkj (D). This shows that the rows of A(D)G(D) are linear combinations of the rows of G(D). Hence, the i x i minors of A(D)G(D) are linear combinations of the i x i minors of G(D). Thus, the gcd of all i x i minors of G(D) is a divisor of the gcd of all i x i minors of A(D)G(D). Since A(D) has a unit determinant, A1 (D) is also a b x b polynomial matrix. By repeating the argument above for the matrix A-' (D) (A (D) G (D)) we can show that the gcd of all i x i minors
of A(D)G(D) is a divisor of the gcd of all i x i minors of A-1(D)(A(D)G(D)) = G(D). Hence, the gcd of all minors of G(D) and A(D)G(D) are the same. Since A(D) can be any product of elementary row operations, we have shown that Di (D) is unaffected by elementary row operations. Similarly, we can show that Di (D) is unaffected by elementary column operations. We have now proved that we can also identify
Di (D) = the gcd of all i x i minors of I'(D)
(2.39)
The form of F(D) then shows that
A, (D) = y1(D) O2(D) = Yl(D)Y2(D)
(2.40)
A,(D) = Y1(D)Y2(D) ... y, (D)
or (with 00(D) = 1) Yi(D) =
Di (D)
Di-1(D)'
i = 1, 2, ..., r
(2.41)
Moreover, the uniqueness of Di (D), i = 1, 2, ... , r, implies that of yj (D), i = 1, 2, ... , r, which completes the proof.
El
42
Convolutional Encoders-Structural Properties
Chapter 2
EXAMPLE 2.4 To obtain the Smith form of the polynomial encoder illustrated in Fig. 2.9 we start with its encoding matrix
1 +D D
G(D) =
D2
1
(2.42)
1+D+D2
1
and interchange columns 1 and 3: 0
1+D D DZ
1
1+D+D2
1
)(\0 1
0
1
1
0
0
0
(2.43)
D 1+D
1
1+D+D2
D2
1
Now the element in the upper-left corner has minimum degree. To clear the rest of the first row, we can proceed with two type two operations simultaneously:
1+D+D2
1
D2
D 1+D
1
D 1+D
1
/
0
1
0
0
0 (2.44)
1
0
1
0
1+D+D2+D3 1+D2+D3
1+D+D2
01
Next, we clear the rest of the first column: C
1
1+D+D2
0
0
1+D+D2 1+D+D2+D3 1+D2+D3
1
i (2.45)
0
1
\
0
0 1+D+D2+D3 1+D2+D3
Following the technique in the proof, we divide I + D2 + D3 by I + D + D2 + D3:
1 +D2+D3 = (1 +D+D2+D3)1 +D
(2.46)
Thus, we add column 2 to column 3 and obtain: 0
1
0
0 1+D+D2+D3 1+D2+D3 (2.47) 0
0
1
0 1+D+D2+D3 D Now we interchange columns 2 and 3: 0
1
0
0 1+D+D2+D3 D 1
0
1
0
0
0
0
1
0
1
0
(2.48)
0
0 D 1+D+D2+D3 Repeating the previous step gives
1+D+D2+D3=D(1+D+D2)+1
(2.49)
Section 2.2
The Smith Form of Polynomial Convolutional Generator Matrices
Figure 2.9 A rate R = 2/3 convolutional en-
43
u(2)
coder.
and, hence, we multiply column 2 by 1 + D + D2, add the product to column 3, and obtain 1
0
0
0 D 1+D+D2+D3
1
0
0
0
1
1+D+D2
0
0
1
f1
0
0
0
D
1
(2.50)
)
Again we should interchange columns 2 and 3: 1
0
0
0
0
0
I
1
0
0
0
0
1
D
(2.51)
1
and, finally, by adding D times the column 2 to column 3 we obtain the Smith form: 1
0
0
0
1
D
0
0
1
= (O
0 ) = F(D)
(2.52)
All invariant factors for this encoding matrix are equal to 1. By tracing these steps backward and multiplying F (D) with the inverses of the elementary matrices (which are the matrices themselves), we obtain the matrix:
G(D) = A(D)F(D)B(D)
0\
I
1+D+D2 1 )r(D )
x
1+ D+D 2
0
1
1
D
00
01
01
0
0
00
00
1
0
1
1 D 1+D
1 00 00 1
0
1
1
0
1
0
0
0
0 0
1
0
0
1
1
0
1
0
00 1
0
1
0
1
00
1
110 1
0
1
(2.53)
0 0
and we conclude that 1
0)
1+D+D2 1
(2.54)
Chapter 2
44
Convolutional Encoders-Structural Properties
and
1+D D 1+D2+D3 1+D+D2+D3 0 D+D2 1+D+D2 0 1
B(D) =
(2.55)
Thus, we have the following decomposition of the encoding matrix G(D):
G(D) _ (
0
1
1
0 0
1+D+D2 1 (0 0) 1+D D 1+D2+D3 1+D+D2+D3 0 D+D2 1+D+D2 0 1
(2.56)
1
x
The extension of the Smith form to matrices whose entries are rational functions is immediate. Let G(D) be a b x c rational function matrix, and let q(D) E F2[D] be the least common multiple (lcm) of all denominators in G (D). Then q (D) G (D) is a polynomial matrix with Smith form decomposition
q(D)G(D) = A(D)r'q(D)B(D)
(2.57)
Dividing through by q(D), we obtain the so-called invariant factor decomposition of the rational matrix G(D) [KFA69][HaH70]:
G(D) = A(D)r'(D)B(D)
(2.58)
h(D) = r'q(D)/q(D)
(2.59)
where
with entries in ]F2(D). Thus Y1 (D)
q(D) Y2 (D)
q (D)
F(D) =
(2.60)
Yr (D)
q (D) 0
0 ... 0) where )"q(D)) I Let
yz(D)
q(D)'...,
Y '(D) are called the invariant factors of G(D). q(D) y, (D)
a; (D)
q(D)
Pi(D)
t = 1,2,
,r
where the polynomials a; (D) and ,B, (D) are relatively prime. Since y, (D)
1,2,.. ,r-1; that is, q (D)
(D) I q(D) "+t "Pi (D) (D) 8i +t (D)
(2.61) I
yi+1(D),
i=
(2.62)
we have ai (D)Or+1 (D) I a;+i (D)8j (D)
(2.63)
Section 2.3
45
Encoder Inverses
From (2.63) and the fact that gcd(ai (D), fi, (D)) = 1 , i = 1 , 2, ... , r, it follows that a; (D) I ai+1(D)
(2.64)
Ni+1(D) I fi (D)
(2.65)
and
i = 1, 2, .
.
.
, r - 1.
EXAMPLE 2.5 The rate R = 2/3 rational encoding matrix (2.32) in Example 2.3 D
1
1+D3
I+D3
1
1
I+D3
1+D
1
I+D+D2
G(D)=
3
D
I+D3
(2.66)
has
q(D)=lcm(1+D+D2, 1+D3, 1+D)=1+D3
(2.67)
Thus, we have
q(D)G(D) =
1+D D D2
1
1
(2.68)
1 -}- D + D2
which is equal to the encoding matrix in Example 2.4. Hence, from (2.56), (2.59), and (2.67) it follows that
G(D) =
1
0
I +D+D2
1
1+D3
X
1+D
0
1
0
1+D3
0
7
0
D
(2.69)
1
1+D2+D3 I+D+D2+D3 0 D+D2
1+D+D2
0
2.3 ENCODER INVERSES Let us consider a convolutional encoder in a communication system as shown in Fig. 2.10. From
the information sequence u(D) the encoder generates a codeword v(D), which is transmitted over a noisy channel. The decoder operates on the received data to produce an estimate u(D) of the information sequence u(D). As the case was with block codes in Chapter 1, we can split the decoder conceptually into two parts: a codeword estimator that from the received data r(D) produces a codeword estimate v(D), followed by an encoder inverse t-1 that assigns to the codeword estimate the appropriate information sequence u(D). This communication system is illustrated in Fig. 2.11. A practical decoder is seldom realized in two parts, although the decoder can do no better estimating the information sequence directly from the received data than by first estimating the codeword and then obtaining the decoded sequence via the inverse map r-1. it
U Encoder
Figure 2.10 A convolutional encoder in a communication situation.
Information sequence
Decoder
Channel
Codeword
Received
data
Decoded sequence
Convolutional Encoders-Structural Properties
Chapter 2
46
U -.a
Channel
Encoder
Codeword
Encoder
estimator
inverse
r-F
L7_ - - - - - Information
Codeword
Codeword estimate
Received
sequence
data
-1
Decoded sequence
Figure 2.11 A conceptual split of decoder into two parts.
The inverse map r that is,
is represented by a c x b right inverse matrix G t(D) of G(D),
G(D)G-'(D) = Ib
(2.70)
where Ib is the b x b identity matrix. In general, a right inverse G-t (D) of a generator matrix is not realizable. Theorem 2.5 A convolutional generator matrix G (D) that has a realizable right inverse is an encoding matrix. Proof. Let G t (D) be a realizable right inverse of G(D), that is,
G(D)G-t(D) = Ib
(2.71)
Substituting 0 for D in (2.71), we obtain
G(O)G-'(0) = Ib
(2.72)
Hence, G (0) has full rank.
Theorem 2.6 A rational convolutional generator matrix has a realizable and delayfree inverse if and only if ab(D) is delayfree. Proof. For any rational generator matrix G(D) a right inverse can be obtained from the invariant-factor decomposition G(D) = A(D)F(D)B(D):
G-' (D) = B-' (D)r-' (D)A-t (D)
(2.73)
where q (D) Yi (D)
q (D) Y2(D)
r-t(D) = q(D)rq'(D) _
q (D)
(2.74)
Yb(D) 0
k
dl
is a right inverse of the b x c polynomial matrix r(D). The matrix rq(D) is the Smith form of the polynomial matrix q (D) G (D), where q (D) is the least common multiple of the denominators in G(D) and y, (D), 1 < i < b, are the invariant factors of q(D)G(D). We have
r(D)r-1(D) = Ib
(2.75)
Section 2.3
Encoder Inverses
47
From (2.61) it follows that r-' (D) can be written
7 Pi (D) al (D) P2 (D)
a2(D)
r-1(D) =
1b(D)
(2.76)
ab (D) 0
0
Suppose that ab(D) is delayfree. From (2.64) we deduce that all ai(D), 1 < i < b, are delayfree. Then the right inverse matrix I'-1 (D) (2.76) is realizable. Hence, all the entries in (D)r-1(D)A-1(D)
(2.73) are realizable. From (2.72) it follows that G-1(0) has full rank. Hence, G-1 (D) is delayfree. Conversely, suppose that the rational generator matrix G(D) has a realizable right inverse G-1 (D). Since both G-1 (D) and B-' (D)F-' (D)A-1(D) are (possibly different) right inverses of G(D), it follows that
G-'(D) = B-1
G(D)(G-1(D) + B-1(D)r-'(D)A-'(D)) = 0
(2.77)
Substituting the invariant-factor decomposition G(D) = A(D)I'(D)B(D) into (2.77), we have
A(D)r(D)B(D)(G-1(D) + B-1(D)F-1(D)A-'(D)) = 0
(2.78)
F(D)(B(D)G-1(D) + r-1(D)A-1(D)) = 0
(2.79)
or, equivalently,
From (2.79) we conclude that
B(D)G-1(D) + r-1(D)A-1(D) =
0
L1(D)
(2.80)
where L, (D) is a (c - b) x b matrix with entries in 1F2(D). Thus, we have B(D)G-1(D)
=
(L,D) ) + r1(D)A 1(D)
) ( L, (D)
LA D )
2.81)
+
where L2(D) is a b x c matrix with entries in F2(D). Since G-1(D) is realizable, so is B(D)G-1(D) and, hence, L2(D). Thus, F-1(D)A-1(D) is realizable. Therefore, ab(D) is delayfree, and the proof is complete.
Corollary 2.7 A polynomial convolutional generator matrix has a realizable and delayfree inverse if and only if yb(D) is delayfree. Proof. matrix.
It follows immediately from (2.61) that ab(D) = yb(D) for a polynomial
Chapter 2
48
Convolutional Encoders-Structural Properties
EXAMPLE 2.6 Since the encoding matrix G(D) for the encoder in Fig. 2.9 satisfies the condition of Corollary 2.7, it has a realizable and delayfree inverse. A right inverse matrix G-1 (D) for G(D) is
G-' (D) = B-' (D)F-' (D)A-' (D) 0
0 1
1+D+D2 1+D+D2+D3 D+D2 1+D2
1
0
1+D2+D3
0
1
1+D+D3
0
0
1
\ 1 +D+D2
0 1
i
(2.82)
1+D2+D4 1+D+D2 D+D4 D+D3+D4
D+D2 1+D2
which happens to be polynomial. In Fig. 2.12 we show the controller canonical form of G-' (D).
U(2)
Fr v(2)
Figure 2.12 The controller canonical form of the encoding right inverse matrix G-I (D) in Example 2.6.
EXAMPLE 2.7 The rate R = 1/2 convolutional encoder in Fig. 2.3 has encoding matrix
G(D) = (1 + D + D2
(2.83)
1 + D2)
Following the technique in the proof of Theorem 2.4, we divide g12(D) by g1 l (D):
1+D2=(1+D+D2)1+D
(2.84)
Thus, we add 1 + D + D2 to 1 + D2 and obtain the sum D:
1 )=l+D+D2
(1+D+D2 1+D2)(
D)
(2.85)
We interchange the columns by an elementary operation:
(1 + D + D2
D)
p
(
) = (D 1 + D + D2)
(2.86)
1
Repeating the first step yields
1+D+D2=D(1+D)+1
(2.87)
Section 2.3
49
Encoder Inverses
and we add 1 + D times the first column to the second column: 1
(D 1+D+D2)
=(D
0
(2.88)
1)
Again we interchange columns:
(D
1) (
0)
1
= (1
D)
(2.89)
Finally, adding D times the first column to the second column yields D)
(1
=(I
0
(2.90)
0) = T(D)
Since we did not perform any elementary operations from the left, the scrambler A(D) = (1). Since the inverses of the elementary matrices are the matrices themselves, the scrambler B(D) is obtained as
B(D)
1
l
0
Ol
0)
1
0
1 1D I1
(1
Ol
0 /
0
1
(2.91)
1+D+D2 1+D2 1+D
D
Hence, we have the right inverse matrix
G1(D) = B ' (D)P ' (D)A ' (D) = (
D
1+DZ
1+D 1+D+DZ
1
0 (2.92)
D
1+D Its observer canonical form is shown in Fig. 2.13. V(1)
v(2)
Figure 2.13 The observer canonical form of the encoding right inverse matrix G (D) in Example 2.7.
EXAMPLE 2.8 Consider the rate R = 2/3 convolutional encoding matrix
G(D) =
1
1+D
1+D+D2
1+D+D3 1+D2+D3 1+D+D4
(2.93)
Since all entries in G(D) are relatively prime, it follows that Ai (D) = 1 and, hence, yi (D) = 1. The determinantal divisor A2(D) is the gcd of all 2 x 2 subdeterminants of G(D); that is,
A2(D) = gcd (D4, D + Ds, D + D2 +D 4) =D
(2.94)
Thus, we have
A2(D) = D
(2.95) y2(D) = A(D) which is not delayfree. By Theorem 2.6 none of the right inverses of the encoding matrix G(D) is realizable!
Theorem 2.8 A rational convolutional generator matrix has a (delayfree) polynomial right inverse if and only if a1, (D) = 1.
Chapter 2
50
Proof.
Convolutional Encoders-Structural Properties
Suppose that ab(D) = 1. From (2.64) it follows that at (D) = a2(D)
ab(D) = 1. Then l
,d2 (D)
G-'(D) = B-'(D)
A-' (D)
,fib (D)
(2.96)
0 /
0
is a polynomial right inverse of G(D). Since G(D) is realizable, it follows that G-' (D) is delayfree.
Conversely, suppose that G(D) has a polynomial right inverse G '(D). Since A(D) has unit determinant, there exists a polynomial vector (xl (D) X2 (D) ... xb (D)) such that
(x1 (D) x2(D) ... xb(D))A(D) = (0 0 ... 0 fib (D))
(2.97)
Then (x1(D) X2 (D) ... xb(D)) = (xl (D) X2 (D) ... xb(D))G(D)G-' (D)
= (xi (D) X2 (D) ...
xb(D))A(D)F(D)B(D)G-l (D)
(2.98)
= (0 0... 0 flb(D))r(D)B(D)G-' (D)
= (0 0...0 ab(D) 0...0) B(D)G-'(D) = (ab(D)yl (D) ab(D)y2(D) ... ab(D)yb(D))
where (yi(D) y2(D)
... yb(D)) is the bth row of B(D)G-'(D). From (2.97) we deduce
that 13b(D) is the greatest common divisor of the polynomials xl (D), X2 (D), ... , and xb(D).
Then it follows from (2.98) that fib(D) is the greatest common divisor of ab(D)yl (D), ab(D)y2(D),... , and ab(D)yb(D). Clearly, ab(D) is a common divisor of ab(D)yl (D), ab(D)y2(D), ..., and ab(D)yb(D), which implies that ab(D) ,b(D). Since gcd(ab(D), I
13b(D)) = 1, we conclude that ab(D) = 1 and the proof is complete. From Theorem 2.8 follows
Corollary 2.9 A polynomial convolutional generator matrix has a (delayfree) polynomial right inverse if and only if yb(D) = 1. EXAMPLE 2.9 Since the encoding matrix G(D) for the encoder in Fig. 2.9 satisfies the condition of Corollary 2.9 it has a polynomial right inverse. See Example 2.6.
The catastrophic error situation when finitely many errors in the estimated codeword sequence could cause infinitely many errors in the estimated information sequence is closely related to the existence of a polynomial right inverse matrix G-' (D): if v (D) contains finitely
many nonzero digits, then u(D) = v(D)G-'(D) also contains finitely many nonzero digits since G-1(D) is polynomial. No catastrophic error propagation can occur! This result is included in the following
Theorem 2.10 A rational convolutional generator matrix G(D) is noncatastrophic if and only if ab(D) = for some integer s > 0.
Section 2.3
Encoder Inverses
51
Proof. Suppose that ab(D) = Ds. Then from (2.64) it follows that a1 (D) divides DS for 1 < i < b. Thus, the matrix
Pi (D)
al(D) P2 (D)
a2(D) DSG-1(D) = Ds B -'(D)
$b (D)
A-1(D)
(2.99)
ab(D) 0
0 1 is polynomial and, hence, G(D) is noncatastrophic. Conversely, suppose that ab(D) is not a power of D. Then ab(D) has a delayfree factor, and it follows that wHC8b(D)/ab(D)) = cc. The input sequence
u(D) = (0 0...0 ND)/ab(D))A-1(D)
(2.100)
then also has wH(u(D)) = oc. But now we see that
v(D) = u(D)G(D) (2.101) = (0 0...0 13b(D)/ab(D))A 1(D)A(D)P(D)B(D) = (0 0...0 1)B(D) which is polynomial and wH(v(D)) < oc follows; that is, the generator matrix G(D) is catastrophic. The proof is complete.
For the special case when G (D) is polynomial we have
Corollary 2.11 A polynomial convolutional generator matrix is noncatastrophic if and only if yb (D) = DS for some integers > 0. From Corollary 2.11, (2.36), and (2.40) follows immediately [MaS68]
Corollary 2.12 A polynomial convolutional generator matrix is noncatastrophic if and only if Ab(D) = DS for some integers > 0. Any c x b matrix G-1 (D) over IF2(D) is called a right pseudo inverse of the b x c matrix
G(D) if
G(D)G-1(D) = DsIb
(2.102)
for some s > 0.
Corollary 2.13 A rational convolutional generator matrix G(D) is noncatastrophic if and only if it has a polynomial right pseudo inverse G -I (D). Proof. Follows from the proof of Theorem 2.10. EXAMPLE 2.10 The rate R = 1/2 polynomial convolutional encoding matrix G2(D) = (1 + D3 1 + D + D2 + D3), whose realization is shown in Fig. 2.5, has the Smith form decomposition (Problem 2.5)
G2(D) = (1) (1 + D 0)
1DD2 + + 1D2 + 1+D
D
Since yb(D) = I + D # D`, the polynomial encoding matrix G2(D) is catastrophic.
(2.103)
52
Convolutional Encoders-Structural Properties
Chapter 2
EXAMPLE 2.11 From the Smith form of the rate R = 2/3 polynomial convolutional encoding matrix in Example 2.4 follows 1
0
0)
\0
1
0
r(D) = I
(2.104)
Hence, the encoding matrix in Example 2.4 is noncatastrophic. EXAMPLE 2.12 The rate R = 2/3 rational convolutional encoding matrix
G(D) = I
1
D
1
1+D2
1+D2
1+D
D
1
1+D2
(2.105)
I 1
1+D2
has the invariant-factor decomposition (Problem 2.6) 0
G(D)
I+D2 (D O1) (
(
0 1+D 0)
1
D
1+D
0
1+D
1
0
1
0
(2.106)
The encoding matrix G(D) is noncatastrophic since y2(D) = 1 + D divides q (D) = 1 + D2 and, hence,
a2(D) = 1. EXAMPLE 2.13 The Smith form of the rate R = 2/3 polynomial convolutional encoding matrix G(D) given in Example 2.8 is
r(D) = ( 1
D
0
)
(2.107)
The encoding matrix G(D) is noncatastrophic since y2(D) = D, but the right inverse G-'(D) is not
realizable (y2(D):1). 2.4 EQUIVALENT AND BASIC ENCODING MATRICES In a communication context, it is natural to say that two encoders are equivalent if they generate
the same code C. It is therefore important to look for encoders with the lowest complexity within the class of equivalent encoders.
Definition Two convolutional generator (encoding) matrices G(D) and G'(D) are equivalent if they encode the same code. Two convolutional encoders are equivalent if their generator matrices are equivalent.
Theorem 2.14 Two rate R = b/c convolutional generator (encoding) matrices G(D) and G'(D) are equivalent if and only if there is a b x b nonsingular matrix T (D) over F2(D) such that
G(D) = T(D)G'(D)
(2.108)
Proof. If (2.108) holds, then G(D) and G'(D) are equivalent. Conversely, suppose that G (D) and G'(D) are equivalent. Let gi (D) E F2 '(D) be the
ith row of G(D). Then there exists a ui(D) E IFz((D)) such that
gi (D) = ui (D) G'(D)
(2.109)
Section 2.4
Equivalent and Basic Encoding Matrices
53
Let
ul(D) u2(D)
T (D) _
2.110)
Ub(D)
Then
G(D) = T(D)G'(D)
(2.111)
where T (D) is a b x b matrix over 1F2((D)). Let S'(D) be a b x b nonsingular submatrix of G'(D) and S(D) be the corresponding b x b submatrix of G(D). Then S(D) = T (D) S'(D). Thus, T (D) = S(D)S'(D)-' and, hence, T (D) is over F2(D). Since G(D), being a generator matrix, has rank b, it follows that T (D) also has rank b, and, hence, is nonsingular.
Definition A convolutional generator (encoding) matrix is called basic if it is polynomial and it has a polynomial right inverse. A convolutional encoder is called basic if its generator matrix is basic. From Theorem 2.5 follows immediately
Theorem 2.15 A basic generator matrix is a basic encoding matrix. Next we have the important
Theorem 2.16 Every rational generator matrix is equivalent to a basic encoding matrix. Proof. By Theorem 2.2 every rational generator matrix has an equivalent polynomial delayfree generator matrix. Let the latter be G(D) with the invariant-factor decomposition
G(D) = A(D)I'(D)B(D), where A(D) and B(D) are b x b and c x c polynomial matrices, respectively, of determinant 1, and
r(D) =
Y2(D) 0
Yb(D)
...
J
(2.112)
G'(D)
(2.113)
Let G'(D) be a generator matrix consisting of the first b rows of B(D). Then
G(D) = A(D)
y2 (D) Yb
(D)
Since both A(D) and Y2(D) Yb(D)
are nonsingular matrices over F2(D), it follows from Theorem 2.14 that G(D) and G'(D) are equivalent. But G'(D) is polynomial and since B(D) has a polynomial inverse, it follows that G'(D) has a polynomial right inverse (consisting of the first b columns of B-1(D)). Therefore, G'(D) is a basic generator matrix. Then from Theorem 2.15 follows that G'(D) is a basic encoding matrix. From Corollary 2.9 follows immediately
Convolutional Encoders-Structural Properties
Chapter 2
54
Theorem 2.17 A generator matrix is basic if and only if it is polynomial and yb (D) = 1.
Corollary 2.18 A basic encoding matrix G (D) has a Smith form decomposition
G(D) = A(D)h(D)B(D)
(2.114)
where A(D) is a b x b polynomial matrix with a unit determinant, B(D) is a c x c polynomial matrix with a unit determinant, and r (D) is the b x c matrix 1
J
(2.115)
Corollary 2.19 A basic encoding matrix is noncatastrophic. Proof. Follows from Corollary 2.11 and Theorem 2.17. In the sequel unless explicitly stated otherwise, we shall consider only basic encoders. As long as we do not require that the encoder should be systematic, we have nothing to gain from feedback! Now we have the following
Theorem 2.20 Two basic encoding matrices G(D) and G'(D) are equivalent if and only if G'(D) = T (D)G(D), where T (D) is a b x b polynomial matrix with determinant 1. Proof. Let G' (D) = T (D) G (D), where T (D) is a polynomial matrix with determinant 1. By Theorem 2.14, G(D) and G'(D) are equivalent. Conversely, suppose that G'(D) and G(D) are equivalent. By Theorem 2.14 there is a nonsingular b x b matrix T (D) over ]F2(D) such that G'(D) = T (D)G(D). Since G(D) is basic, it has a polynomial right inverse G-1 (D). Then T (D) = G'(D)G-' (D) is polynomial. We can repeat the argument with G(D) and G'(D) reversed to obtain G(D) = S(D)G'(D)
for some polynomial matrix S(D). Thus, G(D) = S(D)T(D)G(D). Since G(D) has full rank, we conclude that S(D)T (D) = Ib. Finally, since both T (D) and S(D) are polynomial, T (D) must have determinant 1 and the proof is complete.
Corollary 2.21 Let G(D) = A(D)r(D)B(D) be the Smith form decomposition of a basic encoding matrix G(D), and let G'(D) be the b x c polynomial matrix that consists of the first b rows of the matrix B(D). Then G(D) and G'(D) are equivalent basic encoding matrices. Proof. Since G(D) is basic, it follows from Corollary 2.18 that G(D) = A(D)G'(D), where A(D) is a b x b unimodular (determinant 1) matrix. Applying Theorem 2.20 completes the proof. EXAMPLE 2.14 The encoding matrix for the rate R = 2/3 convolutional encoder shown in Fig. 2.14
1+D
G'(D)
D
1
1+D+D2+D3 0
l
(2 . 116)
is simply the first two rows of B(D) in the Smith form decomposition of the encoding matrix (2.56) for the encoder in Fig. 2.9, viz.,
1+D D
G(D)
(
D2
I
1
1+D+DZ
(2.117)
Thus, G(D) and G'(D) are equivalent, and the encoders in Figs. 2.9 and 2.14 encode the same code.
Section 2.5
Minimal-Basic Encoding Matrices
55
u(1)
V
Figure 2.14 The controller canonical form of the encoding matrix G'(D) in Example 2.14.
2.5 MINIMAL-BASIC ENCODING MATRICES
We begin by defining the constraint length for the ith input of a polynomial convolutional generator matrix as vi = max {deg gil (D)} 1<j
(2.118)
the memory m of the polynomial generator matrix as the maximum of the constraint lengths, that is,
m = max{vi} 1
(2.119)
and the overall constraint length as the sum of the constraint lengths b
V =
Vi
(2.120)
i=1
The polynomial generator matrix can be realized in controller canonical form by a linear sequential circuit consisting of b shift registers, the ith of length vi, with the outputs formed as modulo 2 sums of the appropriate shift register contents. For example, in Fig. 2.9 we have shown the controllable canonical form of the polynomial encoder given by the encoding matrix
G(D)=(ID2D 1 1+D+D2)
(2.121)
whose constraint lengths of the first and second inputs are 1 and 2, respectively, and whose overall constraint length is 3. The number of memory elements required for the controller canonical form is equal to the overall constraint length. In Fig. 2.14 we show the controller canonical form of a rate R = 2/3 encoder whose constraint lengths of the first and second inputs are 1 and 3, respectively, and whose overall constraint length is 4. We will now proceed and characterize the basic encoding matrix whose controller canonical form requires the least number of memory elements over all equivalent basic encoding matrices.
Chapter 2
56
Convolutional Encoders-Structural Properties
Definition A minimal-basic encoding matrix is a basic encoding matrix whose overall constraint length v is minimal over all equivalent basic encoding matrices. In the next section we shall show that a minimal-basic encoding matrix is also minimal in a more general sense.
Let G(D) be a basic encoding matrix. The positions for the row-wise highest order coefficients in G(D) will play a significant role in the sequel. Hence, we let [G(D)]h be a (0, 1) -matrix with 1 in the position (i, j) where deg g11 (D) = vi and 0 otherwise. Theorem 2.22 Let G(D) be a b x c basic encoding matrix with overall constraint length v. Then the following statements are equivalent:
(i) G(D) is a minimal-basic encoding matrix. (ii) The maximum degree it among the b x b subdeterminants of G(D) is equal to the overall constraint length v. (iii) [G(D)]h has full rank. Proof.
Let us write
G(D) = Go(D) + G1 (D)
(2.122)
where Dv1
Dv2
GI(D) =
[G(D)]h
(2.123)
Dvb
Then all entries in the ith row of G0(D) are of degree < vi. The maximum degree g among the b x b subdeterminants of G(D) is < v. It follows immediately from (2.123) that (ii) and (iii) are equivalent. Thus, we need only prove that (i) and (ii) are equivalent. (i = ii). Assume that G(D) is minimal-basic.
Suppose that g < v, that is, rank [G(D)]h < b. Denote the rows of G(D) by r1, r2, ... , rb and the rows of [G(D)]h by [rl], [r2], ... , [rb]. Then there is a linear relation
[rill + [rig] +
(2.124)
+ [rid] = 0
The i th row of G 1(D) is Dv' [ri ]. Without loss of generality we can assume that vid > Vii'
1, 2, ..., d - 1. Adding Dv'd
v'l Dv'1 [rill + Dv'd-v2 Dv'2 [rill + ... +
Dv'd-v'd-1 D°'d-1 [rid-1 ]
(2.125)
= Dv'd ([ril ] + [rig l + ... + [rid-1 ])
to the idth row of G1 (D) reduces it to an allzero row. Similarly, adding r(D) = Dv'd-v'1 rl + Dv'd - v'2r2 + ... + Dv'd
_"d-l
rid-1
(2.126)
to the idth row of G(D) will reduce the highest degree of the idth row of G(D) but leave the other rows of G(D) unchanged. Thus, we obtain a basic encoding matrix equivalent to G(D) with an overall constraint length that is less than that of G(D). This is a contradiction to the assumption that G(D) is minimal-basic, and we conclude that g = v. (ii i). Assume that g = v. Let G'(D) be a basic encoding matrix equivalent to G (D). From Theorem 2.20 it follows that G'(D) = T (D) G (D), where T (D) is a b x b polynomial matrix with determinant 1. Since det T (D) = 1, the maximum degree among the b x b subdeterminants of G'(D) is equal to
Minimal-Basic Encoding Matrices
Section 2.5
57
that of G(D). Hence, µ is invariant over all equivalent basic encoding matrices. Since µ is less than or equal to the overall constraint length for all equivalent basic encoding matrices, it follows that G(D) is a minimal-basic encoding matrix. EXAMPLE 2.15 The basic encoding matrix for the encoder in Fig. 2.9
+DD
G(D)
DZ
1
1
1+D+D2
1
0
(2.127)
has
[G(D)]h =
1
(2.128)
Ol)
with full rank and, hence, is a minimal-basic encoding matrix.
Corollary 2.23 Let G(D) be a b x c basic encoding matrix with maximum degree it among its b x b subdeterminants. Then G(D) has an equivalent minimal-basic encoding matrix whose overall constraint length v = µ. Proof. Follows from the proof of Theorem 2.22 and the fact that µ is invariant over all equivalent basic encoding matrices. EXAMPLE 2.16 Consider the encoding matrix for the encoder in Fig. 2.14, viz.,
G'(D)
D 1 1+D 1+D2+D3 1+D+D2+D3 0 )
(2.129)
The rank of
[G'(D)]h =
1
1
0
1
1
0
(2.130)
is one. Hence, G'(D) cannot be a minimal-basic encoding matrix. On the other hand, G'(D) has the following three b x b subdeterminants:
1+D+D3, 1+D2+D3, 1+D+D2+D3 and, thus, it = 3. Hence, any minimal-basic matrix equivalent to G'(D) has overall constraint length 3.
The equivalent basic encoding matrix for the encoder in Fig. 2.9 has [G(D)]h of full rank (see Example 2.15) and, hence, is such a minimal-basic encoding matrix.
We can use the technique in the proof of Theorem 2.22 to obtain a minimal-basic encoding matrix equivalent to the basic encoding matrix G'(D) in Example 2.16. We simply multiply the first row of G'(D) by D"-°' = D2 and add it to the second row: 1
(
D2
0
1+D
D
1
1) 1+D2+D3 1+D+D2+D3 0 1+D 1
D
(2.131)
1
1+D+D2 D2
It is easily shown that the two minimal-basic encoding matrices (2.127) and (2.131) are equivalent. Thus, a minimal-basic encoding matrix equivalent to a given basic encoding matrix is not necessarily unique.
Chapter 2
58
Convolutional Encoders-Structural Properties
In general, we have [For70a] the following simple algorithm to construct a minimal-basic encoding matrix equivalent to a given basic encoding matrix:
Algorithm MB (Minimal-basic encoding matrix) MB1. If [G(D)]h has full rank, then G(D) is a minimal-basic encoding matrix and we STOP; otherwise go to the next step.
MB2. Let [ri, ], [ri2.... [rid] denote a set of rows of [G (D)]h such that vid > vii, 1 <
j
D°`d-°'Iri, + DV'd-°'2ri2 + ... + D'jd-V`d-irid_, to the idth row of G(D). Call the new matrix G(D) and go to MB1. By combining Theorem 2.16 and Corollary 2.23, we have Corollary 2.24 Every rational generator matrix is equivalent to a minimal-basic encoding matrix.
Before we prove that the constraint lengths are invariants of equivalent minimal-basic encoding matrices, we need the following
Lemma 2.25 Let V be a k-dimensional vector space over a field F, and let {al, a2, ... ak} be a basis of V. Let {,Q1, N2, ... , O e) be a set of f, £ < k, linearly independent vectors of V. Then there exist k - £ vectors aie+,, aie+2, ... , aik, 1 < ie+i < ,
< ik < k, such that {01, / 2, ... -)3e, aie+1, aie+2, ... , aik} is also a basis of V. Consider the vectors in the sequence 0C31, a2..... pe, a1, a2, ... , ak one by one successively from left to right. If the vector under consideration is a linear combination of vectors to the left of it, then delete it; otherwise keep it. Finally, we obtain a basis Q1, a2, ... , Of, aie+1+ aie+2, ... , aik, 1 < 1e+1 < ... < ik < k, of V. ie+2 <
Proof.
Theorem 2.26 The constraint lengths of two equivalent minimal-basic encoding matrices are equal one by one up to a rearrangement. Proof. Let G(D) and G'(D) be two equivalent minimal-basic encoding matrices with constraint lengths v1, v2, ... , vb and v v2, ... , vb, respectively. Without loss of generality, <_ vb and vi < v2 < ... < vb. we assume that vl <_ v2 < 1',
Now suppose that vi and vl are not equal for all i, 1 < i < b. Let j be the smallest v'. Then without loss of generality we assume that vj < v'. From index such that vj the sequence g, (D), g2(D), ... , g1(D), gi (D), g2 (D), ... , g'b (D), according to Lemma 2.25 we can obtain a basis gi (D), 92(D), ..., gj (D), g'.+ (D), g' .}2 (D)..... gee (D) of C. These b row vectors form an encoding matrix G"(D) which is equivalent to G'(D). Let {g'1(D),gi(D), ... ,gb(D)} \ {g'. (D),g' +2
_
(D)....,gih(D)}
(2 .132)
From our assumptions it follows that (2.133) e=1
e=1
Section 2.5
Minimal-Basic Encoding Matrices
59
Then we have b
j
V" = L Ub e=1
t=j+1
b
I
Vie <
vie = V'
vie +
(2.134)
e=j+1
c=1
where v' and v" are the overall constraint lengths of the encoding matrices G'(D) and G" (D), respectively. From Theorem 2.14 it follows that there exists a b x b nonsingular matrix T (D) over F2(D) such that
G"(D) = T(D)G'(D)
(2.135)
Since G'(D) is basic, it has a polynomial right inverse Gi-1(D), and it follows that
T (D) = G"(D)Gi-1(D)
(2.136)
is polynomial. Denote by t' and u" the highest degrees of the b x b minors of G'(D) and G"(D), respectively. It follows from (2.135) that
µ" = deg I T (D) I +A'
(2.137)
Clearly, v" > p" and, since G'(D) is minimal-basic, v' = tt' by Theorem 2.22. Thus,
v">degIT(D)I+v'>v'
(2.138)
which contradicts (2.134) and the proof is complete.
Corollary 2.27 Two equivalent minimal-basic encoding matrices have the same memory.
Next, we will consider the predictable degree property for polynomial generator matrices, which is a useful analytic tool when we study the structural properties of convolutional generator matrices. Let G (D) be a rate R = b/c binary polynomial generator matrix with vi as the constraint length of its ith row gi (D). For any polynomial input u(D) _ (u1 (D) u2(D) ... ub(D)), the output v(D) = u(D)G(D) is also polynomial. We have b
deg v(D) = deg u(D)G(D) = deg
ui (D)gi (D) =1
(2.139)
< max (deg ui (D) + vi } 1
where the degree of a polynomial vector is defined to be the maximum of the degrees of its components. Definition A polynomial generator matrix G(D) is said to have the predictable degree property if for all polynomial inputs u(D) we have equality in (2.139). The predictable degree property guarantees that short codewords will be associated with short information sequences. We have the following
Theorem 2.28 Let G(D) be a polynomial generator matrix. Then G(D) has the predictable degree property if and only if [G(D)]h has full rank. Proof. Without loss of generality, we assume that V1 >
V2>...> Vb
(2.140)
Let us write
G(D) = Go(D) + G1(D)
(2.141)
60
Convolutional Encoders-Structural Properties
Chapter 2
where Dv' D v2
[G(D)1h
G1(D) =
(2.142)
D°H
Then, all entries in the ith row of Go(D) are of degree < vi. Assume that [G(D)]h has full rank. For any input polynomial vector u (D) we have
v(D) = u(D)G(D) = u(D)(Go(D) + G1(D)) (2.143)
b
ui (D) (goi (D) + D' [G(D)l hi ) i-1
where goi(D) and [G(D)]hi are the ith rows of G0(D) and [G(D)]h, respectively. Since [G(D)]h has full rank, we have
[G(D)]hi # 0, i = 1, 2, ... , b
(2.144)
Thus,
deg(ui(D)goi(D)+ui(D)D [G(D)lhi) = deg(ui (D)D' [G(D)]hi) = deg ui (D) + v,
(2.145)
It follows from (2.143) and (2.145) that
deg v(D) = max {deg ui (D) + vi} 1
(2.146)
Now assume that [G(D)]h does not have full rank. Then, there exists a nonzero constant binary vector u(0) = (u (0)u(20) ... ue°)) such that u (0) [G(D)]h = 0
Let u(0)(D) = (ui°) u2 °)D°1-°2 v(01(D)
( 2 .147)
ub°)D°'-°e) be a polynomial input vector. Then,
= u(0)(D)G(D) = u (0) (D) (Go (D) + G1(D)) = u(0)(D)Go(D) b b
_
(2.148)
u(0)Dv'-V'goi(D)
i-1
Since deggoi(D) < vi, it follows that
degu(°)D' 'goi(D) < v1, i = 1, 2, ... , b
(2.149)
degv(0)(D) < v1
(2.150)
and, hence, that
But max {deg u(°)Dv1
1
' + vi} = v1
(2.151)
Therefore, the G(D) does not have the predictable degree property.
Since a basic encoding matrix is minimal-basic if and only if [G(D)]h has full rank (Theorem 2.22), we immediately have the following theorem [For73a].
Theorem 2.29 Let G (D) be a basic encoding matrix. Then G (D) has the predictable degree property if and only if it is minimal-basic.
Section 2.6
Minimal Encoding Matrices and Minimal Encoders
61
EXAMPLE 2.17 The catastrophic (and, hence, not basic) encoding matrix
G(D)=(1+D3 1+D+D2+D3)
(2.152)
has the predictable degree property since
[G(D)]h = (
1
(2.153)
1
has full rank.
EXAMPLE 2.18 The basic encoding matrix (cf. Example 2.16)
G(D) =
1+D
D
1
1+D2+D3 1+D+D2+D3 0 /
(2.154)
has
FG(D)lh =
(2.155)
of rank 1 and, hence, does not have the predictable degree property.
2.6 MINIMAL ENCODING MATRICES AND MINIMAL ENCODERS We will now proceed to show that a minimal-basic encoding matrix is also minimal in a more general sense, but first we need the following definitions.
Definition The encoder state o, of a realization of a rational generator matrix G(D) is the contents of its memory elements. The set of encoder states is called the encoder state space. If G(D) is polynomial, then the dimension of the encoder state space of its controller canonical form is equal to the overall constraint length v. Definition Let G(D) be a rational generator matrix. The abstract state s(D) associated with an input sequence u(D) is the sequence of outputs at time 0 and later, which are due to that part of u(D) that occurs up to time -1 and to the allzero inputs thereafter. The set of abstract states is called the abstract state space. The abstract state depends only on the generator matrix and not on its realization. Distinct abstract states must spring from distinct encoder states at time 0. The number of encoder states is greater than or equal to the number of abstract states.
Let P be the projection operator that truncates sequences to end at time -1, and let Q = 1 - P be the projection operator that truncates sequences to start at time 0. That is, if u(D) = udDd + ud+l Dd+1 +
.
(2.156)
then (
u(D)P = {
udDd +ud+1Dd+1 +
+u_1D-1, d < 0
0,
d>0
(2.157)
and
u(D)Q = uo + u1 D + u2D2 +
(2.158)
Chapter 2
62
Convolutional Encoders-Structural Properties
Clearly,
P+Q=1
(2.159)
Thus, the abstract state s(D) associated with u(D) can be written concisely as
s(D) = u(D)PG(D)Q
(2.160)
The encoder in Fig. 2.14 has 16 encoder states and 8 abstract states. The correspondence between them is tabulated in Table 2.1. TABLE 2.1 Correspondence between encoder and abstract states of encoder G'(D) in Fig. 2.14.
Encoder states (at time 0) 0 000
0 001 0
010
(
X X
0 011
101
0 110
000
011
010 X1
100
0
1
001
X1
0
101
X X
Abstract state
1
100 1
111
0
X1
111
110
(0 0 0)
(1
1
0)
(1+D 1+D 0) (D D 0)
(D+D2 1+D+D2 0)
(1+D+D2 D+D 2
0)
(1 + D2 D2 0)
(D2 I+ D2 0)
For example, the input sequence
u(D) = +(01)D-3+(01)D-2+(00)D-' +
(2.161)
will give the encoder state
at time 0. The corresponding abstract state is
s(D) = u(D)PG(D)Q = ((D-3 + D-2)(1 + D2 + D3) (D-3 + D-2)(1 + D + D2 + D3) 0) Q
(2.162)
= (D D 0) In Fig. 2.15 we give the observer canonical form for the encoding matrix G'(D) in Example 2.14. Note that in the observer canonical form of a polynomial generator matrix the abstract states are in one-to-one correspondence with the encoder states since the contents of the memory elements are simply shifted out in the absence of nonzero inputs. Since the abstract state does not depend on the realization we have the same abstract states in the observer canonical form as in the controller canonical form.
Section 2.6
Minimal Encoding Matrices and Minimal Encoders
63
(2)
F N V(2)
Figure 2.15 The observer canonical form of encoding matrix G'(D) in Example 2.14.
At a first glance it might be surprising that we have only 8 abstract states but as many as 64 encoder states in the observer canonical form in Fig. 2.15. We have some (in this case 56) encoder states that cannot be reached from the zero state-this realization is not controllable. The encoder state space of the controller canonical form of a generator matrix of overall
constraint length v contains 2' states. This type of realization plays an important role in connection with minimal-basic encoding matrices, as we shall see in the sequel, but first we prove a technical lemma.
Lemma 2.30 Let G(D) be a minimal-basic encoding matrix, and let n
u(D) _
(u(l)u;2)
... u(b))Di
(2.163)
where m is the memory of G(D) and n > -m. If u(D)G(D)Q = 0, then u(D) = 0. Proof.
Let
v(D) = u(D)G(D)
(2.164)
v(D)Q = u(D)G(D)Q = 0
(2.165)
Then, by assumption,
Thus, each coefficient of Di, i > 0, in v(D) must be 0. Write G(D) as in (2.122), then we have Dv, Dv2
v(D) = u(D)Go(D) + u(D)
[G(D)]h
(2.166)
DVb
Without loss of generality, we can assume that
m=vl
=v2=...=Vg >Ve+1 >_
>_
Then the coefficient of Dm+n where m + n > 0, in v(D) is (uni) un2)
...
unr) 0
... 0)[G(D)]h
(2.167)
Convolutional Encoders-Structural Properties
Chapter 2
64
which must be 0. Since G (D) is a minimal-basic encoding matrix, [G (D)]h has rank b. Hence,
ult = u_ = u= 0. Proceeding in this way, we can prove that u(D) = 0. The following theorem shows that the controller canonical form is a minimal realization (minimal number of memory elements) of a given minimal-basic encoding matrix.
Theorem 2.31 Let G(D) be a minimal-basic encoding matrix whose overall constraint length is v. Then1
# {abstract states} = 2°
(2.168)
Proof. Consider the controller canonical form of the minimal-basic encoding matrix G(D). Input sequences of the form V1
u(D) _ ' u_;(1)D -i r=t
Vy
Vz
-r
(2) u_1 D -l
(b) u_i D -L
(2.169)
ti-r
will carry us to all encoder states at time 0. Then we have the abstract states
s(D) = u(D)G(D)Q
(2.170)
where u(D) is of the form given in (2.169). Every abstract state can be obtained in this way, and we have
# {abstract states} <2 °
(2.171)
To prove that the equality sign holds in (2.171), it is enough to show that u(D) = 0 is the only input that produces the abstract state s(D) = 0. This follows from Lemma 2.30. EXAMPLE 2.19 The encoder illustrated in Fig. 2.9 has the following eight abstract states:
(0 0 0), (1 0 1), (D 0 D), (1 + D 0 D), (1 (1+D 1 1 + D), (D 1 D)
1
0), (0
1
1),
Definition A convolutional generator matrix is minimal if its number of abstract states is minimal over all equivalent generator matrices. Before we can show that every minimal-basic encoding matrix is also a (basic) minimal encoding matrix, we have to prove the following lemmas:
Lemma 2.32 Only the zero abstract state of a minimal-basic encoding matrix G(D) can be a codeword. Proof We can assume that the abstract state s(D) arises from an input u(D), which is polynomial in D-r and of degree < m and without a constant term, that is, u(0) = 0. Thus,
s(D) = u(D)G(D)Q
(2.172)
u(D)G(D) = w(D) +s(D)
(2.173)
Then it follows that
where w(D) is polynomial in D-1 without a constant term. Assume that s(D) is a codeword; that is, there is an input u'(D) E 1F2((D)) such that
s(D) = u'(D)G(D)
(2.174)
Since s(D) is polynomial and G(D) has a polynomial inverse, it follows that u'(D) E F2[D]. '# {
} denotes the cardinality of the set {
}.
Section 2.6
Minimal Encoding Matrices and Minimal Encoders
65
Combining (2.173) and (2.174), we have
(u(D) + u'(D))G(D) = w(D)
(2.175)
(u(D) +u'(D))G(D)Q = 0
(2.176)
u(D) + u'(D) = 0
(2.177)
Consequently
By Lemma 2.30,
and, since u(D) is polynomial in D-t without a constant term and u'(D) is polynomial, we conclude that u'(D) = 0. It follows from (2.174) that s(D) = 0.
Lemma 2.33 Let G(D) and G'(D) be equivalent generator matrices. Then, every abstract state of G (D) can be expressed as a sum of an abstract state of G'(D) and a codeword. Furthermore, if G'(D) is minimal-basic, then the expression is unique. Proof. Assume that G(D) = T(D)G'(D), where T (D) is a b x b nonsingular matrix
over 1F2(D). Any abstract state of G(D), SG(D), can be written in the form u(D)G(D)Q, where u(D) is polynomial in D-t without a constant term. Thus, we have
SG(D) = u(D)G(D)Q = u(D)T(D)G'(D)Q = u(D)T (D) (P + Q)G'(D)Q = u(D)T (D)PG'(D)Q + u(D)T (D)QG'(D)Q Since u(D)T (D) P is polynomial in
D-1
(2.178)
without a constant term, it follows from (2.160) that
SG'(D) = u(D)T(D)PG'(D)Q
(2.179)
is an abstract state of G'(D). Furthermore, u(D)T (D) Q is a formal power series, and so is
u(D)T(D)QG'(D). Hence, v(D)
def
u(D)T(D)QG'(D)Q = u(D)T (D)QG'(D)
(2.180)
is a codeword encoded by G'(D). Combining (2.178), (2.179), and (2.180), we obtain
SG(D) = sG'(D) + v(D)
(2.181)
and we have proved that every abstract state of G(D) can be written as a sum of an abstract state of G'(D) and a codeword. Assume now that G'(D) = Gmb(D) is minimal-basic. To prove uniqueness we assume that (2.182) SG(D) = Smb(D) + v(D) = Smb(D) + v'(D) where Smb(D), snb(D) are abstract states of Gmb(D), and v(D), v'(D) are codewords. Since the sum of two abstract states is an abstract state and the sum of two codewords is a codeword, it follows from (2.182) that
Smb(D) = Smb(D) +s;,,b(D) = v(D) + v'(D) = v"(D)
(2.183)
is both an abstract state of Gmb(D) and a codeword. From Lemma 2.32 we deduce that smb(D)
=0
(2.184)
and, hence, that Smb(D) = Smb(D)
(2.185)
v(D) = v'(D)
(2.186)
and
which completes the proof.
13
Chapter 2
66
Convolutional Encoders-Structural Properties
Theorem 2.34 Let G (D) be any generator matrix equivalent to a minimal-basic encoding matrix Gmb(D). Then
# {abstract states of G(D)} > # {abstract states of Gmb(D)} Proof.
(2.187)
Consider the following map:
{abstract states of G(D)} -* {abstract states of Gmb(D)} SG(D) H Smb(D) where
SG(D) = Smb(D) + v(D)
(2.188)
in which v(D) is a codeword. From Lemma 2.33 it follows that 0 is well defined. By the first statement of Lemma 2.33 we can prove that every abstract state Smb (D) can be written as a sum of an abstract state of G(D) and a codeword. Hence, we conclude that 4> is surjective which completes the proof. Remark. The map 0 in Theorem 2.34 is linear. Moreover, if G(D) is a minimal encoding matrix, then 0 is necessarily an isomorphism of the abstract state space of G(D) and that of Gmb(D).
From Theorem 2.34 follows immediately
Corollary 2.35 Every minimal-basic encoding matrix is a (basic) minimal encoding matrix.
Next we will prove the following little lemma:
Lemma 2.36 Let G(D) be a b x c matrix of rank b whose entries are rational functions of D. Then a necessary and sufficient condition for G(D) to have a polynomial inverse is: for each u(D) E Fb(D) satisfying u(D)G(D) E 1F2 [D] we must have u(D) E 1F2[D]. Proof. Since the necessity of the condition is obvious, we shall prove only the sufficiency. Let us assume that G (D) does not have a polynomial inverse. Then, from Theorem 2.8 it follows that a, (D) fii (D)
G(D) = A(D)
a2(D)
B(D)
,82 (D)
ab(D)
0
(2.189)
0
fib (D)
where ab(D)
1. Clearly,
u(D) _ (0...0 fib (D) )A1(D) V F [D]
(2.190)
but
u(D)G(D) = (0...0 ab(D) )A1(D)G(D)
(2.191)
= (0...0 1 0...0)B(D) E FZ[D] Hence, we have proved our lemma.
Define the span of a Laurent series f (D) in D as the interval from the index of the first nonzero component of f (D) to the index of the last nonzero component, if there is one, or to
Section 2.6
Minimal Encoding Matrices and Minimal Encoders
67
infinity otherwise. In other words, if f (D) is a Laurent series, then
span f (D) _
[del f (D), deg f (D)], [del f (D), oo),
if deg f (D) < oo otherwise
(2.192)
We are now well prepared to prove the following beautiful theorem on minimal generator matrices:
Theorem 2.37 Let G(D) be a generator matrix and Gmb(D) be an equivalent minimalbasic encoding matrix. Then the following statements are equivalent:
(i) G(D) is a minimal generator matrix. (ii) # {abstract states of G(D)} = # {abstract states of Gmb(D)}. (iii) Only the zero abstract state of G (D) can be a codeword. (iv) G(D) has a polynomial right inverse in D and a polynomial right inverse in D-1. (v) span u(D) c span u(D)G(D) for all rational input sequences u(D). Proof. It follows immediately from Theorem 2.34 that (i) and (ii) are equivalent. Next, we prove that (ii) and (iii) are equivalent. In the proof of Theorem 2.34 we have defined a surjective (linear) map
{abstract states of G(D)} -> {abstract states of Gmb(D)} SG(D) H Smb(D) where
SG(D) = Smb(D) + v(D)
(2.193)
in which v(D) is a codeword. Clearly, 0 is injective if and only if (ii) holds, and if and only if (iii) holds. Hence, (ii) and (iii) are equivalent. Then we prove that (iii) and (iv) are equivalent. (iii iv). Suppose that (iii) holds. First, we shall prove that G(D) has a polynomial
right inverse in D-1. Let u(D) E lFb(D) and assume that v(D) = u(D)G(D) is polynomial in D-1. Then D-1 v(D) is polynomial in D-1 without a constant term, that is, D-1v(D)Q = 0 (2.194) But
D-'v(D)Q = D-lu(D)(P + Q)G(D)Q = D-1u(D)PG(D)Q + D-lu(D)QG(D)Q = 0
(2.195)
D-1u(D)QG(D)Q = D-lu(D)QG(D)
(2.196)
where
is a codeword. Hence, from (2.195) and (2.196) it follows that the abstract state D-lu(D) PG(D)Q is a codeword and, then, since (iii) holds, it is the zero codeword. Thus,
D-lu(D)QG(D) = 0
(2.197)
D-lu(D)Q = 0
(2.198)
and, since G(D) has full rank,
or, in other words, u(D) is polynomial in D-1. Since every rational function in D can be written as a rational function in D-1, G (D) can be written as a matrix whose entries are rational
functions in D-1, we can apply Lemma 2.36 and conclude that G(D) has a polynomial right inverse in D-1.
Chapter 2
68
Convolutional Encoders-Structural Properties
Now we prove that G(D) has a polynomial right pseudoinverse in D. Let G-1(D) be a polynomial right inverse in D-1 of G(D). Then there exists an integer s > 0 such that DSG=1(D) is a polynomial matrix in D and
G(D)DSG=1(D) = DSIb
(2.199)
That is, DSG=1(D) is a polynomial right pseudo-inverse in D of G(D). Next, we prove that G(D) also has a polynomial right inverse in D. Let u(D) E Fb(D) and assume that v(D) = u(D)G(D) is polynomial in D. Then,
v(D) = u(D)PG(D) + u(D)QG(D)
(2.200)
where u(D) Q is a formal power series. Thus, u(D) QG (D) is also a formal power series and, since v(D) is polynomial in D, it follows that u(D)PG(D) is a formal power series. Then
u(D)PG(D) = u(D)PG(D)Q
(2.201)
is an abstract state. From (2.200) it follows that it is also a codeword, and, since (iii) holds, we conclude that
u(D)PG(D) = 0
(2.202)
u(D)P = 0
(2.203)
Since G(D) has full rank,
or, in other words, u(D) is a formal power series. Since v(D) is polynomial and DSG=1(D) is a polynomial matrix in D, it follows that
v(D)DSG=1(D) = u(D)G(D)DSG(D) = u(D)DS
(2.204)
is polynomial; that is, u(D) has finitely many terms. But u (D) is a formal power series. Hence, we conclude that it is polynomial in D. By Lemma 2.36, G(D) has a polynomial right inverse
in D. (iv
iii). Assume that the abstract state sG (D) of G(D) is a codeword; that is,
SG(D) = u(D)G(D)Q = u'(D)G(D)
(2.205)
where u(D) is polynomial in D-1 but without a constant term and u'(D) E lFb((D)). Since sG(D) is a formal power series and G(D) has a polynomial right inverse, it follows that u(D) is also a formal power series. Let us use the fact that
Q=1+P
(2.206)
SG(D) = u(D)G(D) + u(D)G(D)P = u'(D)G(D)
(2.207)
and rewrite (2.205) as
Let G=1(D) be a right inverse of G(D) whose entries are polynomials in D. Then
u(D)G(D)G=1(D) +u(D)G(D)PG=1(D) = u'(D)G(D)G=1(D)
(2.208)
which can be simplified to
u(D) + u(D)G(D)PG-1 (D) = u'(D)
(2.209)
Since u(D)G(D)P is polynomial in D-1 without a constant term, it follows that u(D)G(D) PG=1(D) is polynomial in D-1 without a constant term. Furthermore, u(D) is polynomial in D-1 without a constant term and u'(D) is a formal power series. Thus, we conclude that u'(D) = 0 and, hence, that SG (D) = 0. It remains to prove that (iv) and (v) are equivalent.
Section 2.6
Minimal Encoding Matrices and Minimal Encoders
69
(iv = v). Assume that G(D) has a polynomial right inverse in D, G 1(D), and let v(D) = u(D)G(D) be any sequence in C. Then u(D) = v(D)G(D), so (i) deg u(D) < oo if deg v(D) < 00 (ii) del u(D) > del v(D) Similarly, if G(D) has a polynomial right inverse in D
(iii)
then
degu(D) <degv(D)
Conditions (i) - (iii) imply that span u(D) c span v(D). (v= iv). It follows from (v) that for each u (D) E IFb(D) satisfying u (D) G (D) E F2[D] we must have u(D) E lFz[D]. By Lemma 2.36, G(D) has a polynomial right inverse in D. Similarly, (v) implies that for each u(D) E Fb(D) satisfying u(D)G(D) E F2'[D-1] we must have u(D) E Fz[D-1]. Again, by Lemma 2.36, G(D) has a polynomial right inverse in D-1.
Corollary 2.38 A minimal generator matrix is a minimal encoding matrix. Proof. Follows immediately from Theorem 2.5 and Theorem 2.37 (iv). Corollary 2.39 A minimal encoding matrix is noncatastrophic. Proof. Follows immediately from Corollary 2.13 and Theorem 2.37 (iv). Theorem 2.37 (v) is closely related to the following (non)minimality criteria [LFM94]:
Theorem 2.40 A generator matrix can be nonminimal in only three ways:
(i) If there is an infinite nontrivial sequence of (encoder) states (not the zero state sequence) that produces an allzero output sequence.
(ii) If there is a nontrivial transition from the zero state (not to the zero state) that produces a zero output.
(iii) If there is a nontrivial transition to the zero state (not from the zero state) that produces a zero output. Proof. Condition (i) corresponds to a case in which there is an infinite input u(D) that produces a finite output u(D)G(D) (the "catastrophic" case); condition (ii) corresponds to a case in which there is an input u(D) that produces an output u(D)G(D) with del u(D)G (D) > delu(D); and condition (iii) corresponds to a case in which there is a finite input u(D) that
produces a finite output u(D)G(D) with degu(D)G(D) < degu(D). It is easy to see that spanu(D)G(D) does not cover spanu(D) if and only if one of these three conditions is satisfied.
The following simple example shows that not all basic encoding matrices are minimal. EXAMPLE 2.20 Consider the basic encoding matrix
G(D)
DD 1
1
+DD )
(2.210)
which has it = 0 but v = 2. Clearly, it is not minimal-basic. The equivalent minimal-basic encoding matrix, Gmb(D) =
1\
1
0
1
1
(2.211)
has only one abstract state, viz., Smb = (0, 0), and can be realized without any memory element. Since G(D) has two abstract states, viz., so = (0, 0) and s1 = (1, 1), it is not minimal!
Before we state a theorem on when a basic encoding matrix is minimal, we will prove two lemmas:
Chapter 2
70
Convolutional Encoders-Structural Properties
Lemma 2.41 Let ft (D), f2(D), ... , fe (D) E ]F2LD] with gcd(fi(D), .f2(D), ... , .fe(D)) = 1
(2.212)
n = max {deg fi(D), deg f2(D), ... , deg fe(D)}
(2.213)
and let
Then form > n D-m ft (D), D-m f2(D), ... , D-m fe (D) E F2[D-t] and
gcd(D-m.fl(D), D-mf2(D),... , D-mfe(D)) = Proof.
D-(m-n)
(2.214)
Let
f (D) = DS' gi (D), i = 1, 2, ... , f
(2.215)
where si is the start of fi (D) and gi (D) E ]F2[D] is delayfree. From (2.212) follows min {st, s2, ... , se} = 0
(2.216)
and
gcd(gi(D), g2(D), ... , ge(D)) =
(2.217)
1
Form > n
D-mf (D) = D-mDS'gi(D) = D-(m-s' -deg gi (D)) (D- deg gi (D) gi (D))
(2.218)
= D-(m-degf;(D)) (D- deg gi (D) gi (D)), i = 1, 2, ... , £
where the last equality follows from the fact that
deg f i (D) = si + deg gi (D), i = 1, 2, ... , £ Since D- deg g; (D) gi (D), i = 1, 2, ..., £, are delayfree polynomials in ]F2 (2.218) that
(2.219) [D-1
] it follows from
gcd (D-m fl (D), D-m f2(D), ... , D-m fe (D)) = D-(m-n) x gcd D - degg1 (D) gt (D), D - deg92 (D) gz(D)> ... , D - de gge (D) ge(D))
(2.220)
gcd (D-degg1(D)gl(D), D-deg92(D)g2(D), ... , D-degge(D)ge(D)) = 1
(2.221)
Clearly,
and the proof is complete.
Lemma 2.42 Let G(D) be a basic encoding matrix, and let r and s be the maximum degree of its b x b minors and (b - 1) x (b - 1) minors, respectively. Then the bth invariant 1/D-(r-s) factor of G(D) regarded as a matrix over 1F2(D-1) is Proof. Let G (D) = (gi1 (D)), 1 < i < b, 1 < j < c, and let n = maxi, {deg Write G(D) as a matrix over F2 (D 1) as gi1(D)}.
G(D) =
D
n
n
.
e n(D)
_
I
n
G-1(D)
(2.222)
where
G-1(D) = (D-ngij(D))
(2.223)
is a matrix of polynomials in D-l. Since G(D) is basic it follows, by definition, that it has a polynomial right inverse. Hence, it follows from Theorem 2.8 and (2.64) that
al (D) = a2(D) = ... = ab(D) = 1
(2.224)
Section 2.6
Minimal Encoding Matrices and Minimal Encoders
71
Let Di(G(D)) be the greatest common divisor of the i x i minors of G(D). Since (2.40) (2.225) Ai (G(D)) = a, (D)a2(D) ... ai (D) we have in particular (2.226)
Ab(G(D)) = Ab-t(G(D)) = 1
An i x i minor of G_t (D) is equal to the corresponding minor of G(D) multiplied by D-n` Hence, by Lemma 2.41, we have
Ab(G-1(D)) =
D-(nb-r)
(2.227)
D-(n(b-1)-s)
(2.228)
and
Ab-1(G-1(D)) = Thus, the bth invariant-factor of G_1 (D) is (2.41) Yb
Ob(G-1(D)) _
(G
1
D-n
(D)) = Ab-1(G-1(D))
(2 . 229)
D-(r-s)
From (2.222) and (2.229) it follows that the bth invariant-factor of G (D), regarded as a matrix over ]F2[D-1], is I D-n 1 D-n
D-(r-s)
(2.230)
D-(r-s)
We are now ready to prove the following theorem.
Theorem 2.43 A basic encoding matrix G(D) is minimal if and only if the maximum degree of its b x b minors is not less than the maximum degree of its (b - 1) x (b - 1) minors. Proof. From Theorem 2.37 it follows that a basic encoding matrix G(D) is minimal if and only if it has a polynomial right inverse in D-1. By the invariant-factor decomposition, G(D) has a polynomial right inverse in D-1 if and only if the inverse of its bth invariant-factor,
regarded as a matrix over F2[D-1], is a polynomial in D-1. By applying Lemma 2.42 the theorem follows. We will now briefly describe a "greedy" construction of a minimal-basic encoding matrix for a convolutional code C [Roo79]. The construction goes as follows. Choose the generator g, (D) as a nonzero polynomial code sequence of least degree. Choose g2(D) as a nonzero polynomial code sequence of least degree not in the rate R = 1/c code CI encoded by gi (D); choose g3 (D) as a nonzero polynomial code sequence of least degree not in the rate R = 2/c code C2 encoded by g1 (D) and g2(D), and so forth until a set G(D) = {gi (D), 1 < i < b} of b generators has been chosen that encodes C. It is easy to see that the degrees deggi (D) are uniquely defined by C; in fact, they are the constraint lengths vi. The sum >i vi is minimal over all equivalent polynomial generator matrices; it follows from Corollary 2.35 that it is minimal and, thus, from Theorem 2.37 (iv) that it is also basic. Hence, G(D) is minimal-basic. EXAMPLE 2.21 Let C be the rate R = 2/3 convolutional code encoded by the encoding matrix in Example 2.14,
G'(D) _
gi (D) g2(D)
=
1zD
D
z
1+D +D s 1 + D + D +D s
1
0
(2.231)
The shortest nonzero polynomial code sequence is gi (D), so v1 = 1. The next shortest code sequence not dependent on g' (D) has degree 2; for example,
g2(D) = D2g, (D) +gz(D) _ 1 1 + D + D2
D2)
(2.232)
72
Chapter 2
Convolutional Encoders-Structural Properties
so v2 = 2. A minimal-basic encoding matrix for C is
G(D) _
I+D 1
D
1
(2.233)
1+D+D2 D2
We return to our favorite basic encoding matrix given in Example 2.14, viz.,
G'(D)=
D 1+D 1+D2+D3 1+D+D2 +D3
1
(2.234) 1
In Example 2.16 we showed that G'(D) is not minimal-basic, that is, µ < v. Its controller canonical form (Fig. 2.14) requires four memory elements, but the controller canonical form of an equivalent encoding matrix (Fig. 2.9) requires only three memory elements. However, G'(D) is a basic encoding matrix, and, hence, it has a polynomial right inverse. Furthermore, it has a polynomial right inverse in D-1, viz.,
1+ D-t + D-2 + D-3
G-i (D) =
1+ D-t + D-3 D-2
+ D-3
D-1
D-1
(2.235)
D-1
Thus, from Theorem 2.37 we conclude that G'(D) is indeed a minimal encoding matrix. Its eight abstract states are given in Table 2.1. We will conclude this section by considering realizations of minimal encoding matrices. Definition A minimal encoder is a realization of a minimal encoding matrix G (D) with a minimal number of memory elements over all realizations of G(D).
Theorem 2.44 The controller canonical form of a minimal-basic encoding matrix is a minimal encoder. Proof. Follows immediately from Corollary 2.23 and Corollary 2.35. El EXAMPLE 2.22 The realization shown in Fig. 2.16 of the minimal encoding matrix G'(D) given in (2.234) is a minimal encoder. (The realization is obtained by minimizing G'(D) using a standard sequential circuits minimization method, see, for example, [Lee781. See also Appendix A.)
V(2)
Figure 2.16 A minimal encoder for the basic encoding matrix G'(D) given in (2.234).
Notice that the minimal realization shown in Fig. 2.16 is neither in controller canonical form nor in observer canonical form. This particular minimal encoding matrix does not have
Canonical Encoding Matrices*
Section 2.7
73
a minimal controller canonical form, but it has an equivalent minimal-basic encoding matrix whose controller canonical form (Fig. 2.9) is a minimal encoder for the same convolutional code.
2.7 CANONICAL ENCODING MATRICES* In this section we will revisit rational generator matrices and study them in depth.
For simplicity we consider first a rate R = 1/c rational generator matrix g(D) _ (g1(D) 92 (D) ... gc(D)), where g1(D), $2(D), ... , gc(D) are rational realizable functions. We may write
gi(D) = fi(D)lq(D), i = 1, 2, ... , c
(2.236)
where f1 (D), f2(D), ... , fc(D), q(D) E F2[D] and gcd(fl (D), .f2(D), ... , fc(D), q (D)) = 1
(2.237)
We define the constraint length of the rational 1 x c matrix
g(D) = (g, (D) 92(D) ... gc(D))
(2.238)
as
v def
max{deg f, (D), deg f2(D), ... , deg fc(D), degq(D)}
(2.239)
Clearly, g(D) can be realized with v delay elements in controller canonical form.
Next, we consider a rate R = b/c rational generator matrix G(D) and define its ith constraint length v, as the constraint length of the ith row of G(D), its memory m as m def
max {v1}
1
(2.240)
and its overall constraint length v as v
def
b V
(2.241)
i=1
For a polynomial generator matrix, these definitions coincide with the original definitions of the ith constraint length, memory, and overall constraint length given by (2.118), (2.119), and (2.120), repectively. A rational generator matrix with overall constraint length v can be realized with v memory elements in controller canonical form. This leads to
Definition A canonical generator matrix is a rational generator matrix whose overall constraint length v is minimal over all equivalent rational generator matrices. We have immediately the following two theorems.
Theorem 2.45 A minimal-basic encoding matrix is canonical. Proof. Let Gmb(D) be a minimal-basic encoding matrix with overall constraint length vmb and let G, (D) be an equivalent canonical generator matrix with overall constraint length v, Then v, < v,,,b. Since G,,,b(D) is minimal, its number of abstract states, 21mb, is minimal over all equivalent encoding matrices. Thus, 2vmb < #{abstract states of G,(D)} < 2°,. 'Therefore, Vmb < vc. Hence, Vmb = v, and Gmb(D) is canonical.
Theorem 2.46 A canonical generator matrix is minimal. *Note: Sections marked with an asterisk (*) can be skipped at the first reading without loss of continuity.
Chapter 2
74
Convolutional Encoders-Structural Properties
From the proof of Theorem 2.45 it follows that the number of abstract states
Proof.
of G,(D), 2° = 2'-', is minimal over all equivalent generator matrices. Hence, G,(D) is minimal.
Since a canonical generator matrix G(D) is minimal, the following theorem follows immediately from Corollary 2.38. Theorem 2.47 A canonical generator matrix is a canonical encoding matrix.
In Theorem 2.26 we proved the invariance of the constraint lengths of minimal-basic encoding matrices. By a straightforward generalization of that proof we show
Theorem 2.48 The constraint lengths of two equivalent canonical encoding matrices are equal one by one up to a rearrangement. Proof. Let C be the code encoded by two equivalent canonical encoding matrices G (D) and G'(D) with constraint lengths v1, v2, ... , vb and vl , vZ, ... , vb, respectively. Without loss < ve. of generality we assume that vi < v2 < ... < vb and vi < v2 < Now suppose that v, and vi are not equal for all i, 1 < i < b. Let j be the smallest index
such that v j 0 v'. Then without loss of generality we can assume that v j < v'. From the sequencegi (D), g2(D), ... , gj (D), g1 '(D), g' (D), ... , g'b(D) we can obtain by Lemma 2.25 a basis gi (D), g2(D), ..., gj (D), g'i,+ (D), g'i }z (D), ..., gee (D) of C. These b row vectors form an encoding matrix G"(D) which is equivalent to G'(D). Let {gi (D), gi(D), ... , gb (D)} \ {g';+1 (D), gi;+2 (D), ... , g'b (D)}
= {g1 (D), giz (D), ... , (D)} From our assumptions it follows that j b j b j V" _ V, + E vie < ve + 1: Vie < vie + e=1
e=j+1
e=1
e=j+1
e=1
(2.242)
b
E
vie =V '
(2.243)
e=j+1
where v' and v" are the overall constraint lengths of the encoding matrices G(D) and G" (D), respectively. The inequality (2.243) contradicts the assumption that G'(D) is canonical. This completes the proof. In virtue of Theorem 2.48, we may define the constraint lengths of a convolutional code
to be the constraint lengths of any canonical encoding matrix that encodes the code. By Theorem 2.45, a minimal-basic encoding matrix is canonical. Thus, we have Theorem 2.49 The constraint lengths of a convolutional code are equal to the constraint
lengths of any minimal-basic encoding matrix that encodes the code one by one up to a rearrangement.
The following lemma will be useful in the sequel when we prove our main results for canonical encoding matrices.
Lemma 2.50 Letg(D) = (gi (D) $2(D) ... g, (D)) be a 1 x c rational generator matrix, where gi (D), $2(D), ..., gc(D) E F2(D). Write
gi(D)=fi(D)/q(D), i=1,2,...,c
(2.244)
gcd(.fi(D), f2 (D), ... , .fc(D), q(D)) = 1
(2.245)
and assume that
Then g(D) is canonical if and only if both (i) and (ii) hold:
Section 2.7
Canonical Encoding Matrices*
75
(i) degq(D) < max{deg f, (D), deg f2(D), ... , deg ff(D)} (ii) gcd(.fi (D), .f2(D), ... , ff(D)) = I Proof. Let f (D) = (f1(D) f2 (D) ... ff(D)) = gcd(fi(D), f2(D), ... , .ff(D))t (D) It is clear that g(D), f (D), and £(D) are equivalent generator matrices. Suppose that g(D) is canonical. Then by the definitions of vg and v f
vg = max{deg f1 (D), deg f2(D), ... , deg f, (D), deg q (D)} < max{deg fi (D), deg f2(D), ... , deg fe(D)} = v f
(2.246)
from which (i) and vg = V f follow. Moreover,
vg = of = deg gcd(fl (D), f2 (D), ... , f, (D)) + ve (2.247) where ve is the constraint length of £(D). From (2.247), the equivalence of the generator matrices g(D) and £(D), and the assumption that g(D) is canonical, it follows that deg gcd(f1 (D), f2 (D), ... , f, (D)) = 0
(2.248)
which is equivalent to (ii). Conversely, suppose that (i) does not hold; that is,
degq(D) > max{deg f, (D), deg f2(D), ..., deg ff(D)}
(2.249)
From (2.249) it follows that
vg = deg q (D) > v f
(2.250)
and, hence, since g(D) and f (D) are equivalent, that g(D) is not canonical. Finally, suppose that (i) holds and that (ii) does not hold. Then from (2.247) follows that vg > ve and, since g(D) and £(D) are equivalent, that g(D) is not canonical. Next, we will introduce a few concepts from valuation theory [Jac89]: Let P be the set of irreducible polynomials in ]F2[D]. For simplicity we write p for the irreducible polynomial p(D) E P. For any nonzero g(D) E ]F2(D) we can express g(D) as
g(D) = pep(g(D)) h(D)/d(D)
(2.251)
where ep(g(D)) E Z, h(D) and d(D) E F2[D], gcd(h(D), d(D)) = 1, and p%h(D)d(D). The exponents e p (g (D)), p E P, that occur in this unique factorization are called the valuations
of the rational function g(D) at the primes p, or the p-valuations of g(D). By convention we define ep (0) = oo. The map
ep:FAD) -+ ZUfool g(D) H ep(g(D)) is called an exponential valuation of ]F2(D). Moreover, for any nonzero g(D) E ]F2(D) we can express g(D) as
g(D) = .f (D)/q(D)
(2.252)
where f (D), q(D) E ]F2[D]. We define
eD (g(D)) = degq(D) - deg f (D)
(2.253)
and eo-1 (0) = oo. Then eo-1 is also called an exponential valuation Of FAD). For notational convenience we introduce
P* =P U JD-')
(2.254)
It is easily verified that the p-valuation ep(g(D)) for each p e P* satisfies the properties:
76
Chapter 2
(i) (uniqueness of 0): (ii) (additivity): (iii) (strong triangle inequality):
Convolutional Encoders-Structural Properties
ep(g(D)) = oo, if and only if g(D) = 0 ep(g(D)h(D)) = ep(g(D)) + ep(h(D)) ep(g(D) + h(D)) > min{ep(g(D)), ep(h(D))}
From (2.251) and (2.253) we have the important product formula, as it is called in valuation theory, written here in additive form since we are using exponential valuations:
T ep(g(D)) deg p = 0
(2.255)
pEP*
for all nonzero g(D) E F2(D), where the degree of D-1 is defined as 1. EXAMPLE 2.23
Let g(D) = (D3 + Ds)/(1 + D + D2). Then we have el+n+Dz(g(D)) = -1, el+D(g(D)) = 2, eD(g(D)) = 3, eD-1(g(D)) = -3, and ep(g(D)) = 0 if p E P* and p # I + D + D2, 1 + D, D, D-1
(2.256)
It is easy to verify that
Y, ep (g (D)) deg p = 0
(2.257)
Pep*
The delay of a rational function g(D) can be expressed in terms of the exponential valuation eD as
delg(D) = eD(g(D))
(2.258)
Similarly, the degree of a rational function g(D) may be expressed as
degg(D) = -eD-4g(D))
(2.259)
A rational function g(D) is
(i) causal, (ii) polynomial, (iii) finite,
if del g(D) > 0, i.e., if eD(g(D)) > 0 if ep(g(D)) > 0 for all p E P if ep(g(D)) > 0 for all p E P except possibly D
Remark. Causal rational functions are also sometimes called proper (particularly when z-transforms are used rather than D-transforms).
We mentioned in Section 2.1 that a rational function g(D) may be expanded by long division into a Laurent series in powers of D and thus identified with a semi-infinite sequence over IF2 that begins with allzeros; for example,
1/(1+D) = 1+D+D2+
(2.260)
In this way, the set of rational functions F2(D) may be identified with a subset of the set F2((D)) of Laurent series in D over F2, which we shall call the rational Laurent series. These are precisely the Laurent series that eventually become periodic. The first nonzero term of a Laurent series expansion of g(D) in powers of D is the term involving Deo(g(D)) = Dde1g(D) that is, the Laurent series in D "starts" at a "time index" equal to the delay eD(g(D)) of g(D). Remark. Alternatively, a rational function may be expanded similarly into a Laurent series in D-1; for example,
1/(1+D)=D-1+D-2+
(2.261)
In this way, F2(D) may alternatively be identified with a subset of F2 ((D ')). If elements of IF2((D-1)) are identified with semi-infinite sequences over IF2 that finish with allzeros, then
Section 2.7
Canonical Encoding Matrices*
77
g(D) "ends" at a time equal to the degree -eD-I (g(D)) of g(D). This hints at why we use the notation p = D-1 for this valuation. We should emphasize that this second, alternative expansion is a purely mathematical construct. When we wish to identify a rational function in g(D) with a physical sequence of elements of F2, we shall always use the first Laurent series expansion in powers of D.
A rational function g(D) may be expanded as a Laurent series in powers of p with coefficients in the residue class field F2[D]p = F2[D]/plF2[D] for any p in P*, as follows. Let g(D) = f (D)/q(D), where f (D) and q(D) 0 are polynomial. If f (D) = 0, then the Laurent series in powers of p of g(D) is simply f (D) = 0. If f (D) 0, then we may write .f (D) = [f (D)l ppe°(f (D)) + .f (1) (D)
(2.262)
where [f (D)] p is the residue of f (D) p-ep(f (D)) modulo p, which is an element in the residue
class field F2[D]p = ]F2[D]/p]F2[D] and f(1)(D) is a polynomial (possibly 0) whose evaluation is greater than ep(f (D)). Iterating this process, possibly indefinitely, we obtain a Laurent series in powers of p, which is an element in F2[D]p((p)) and whose first nonzero term is [f (D)]p pep(f(D)). Similarly, we may expand the denominator q(D) into a Laurent series in powers of p whose first nonzero term is [q(D)] p peP(q(D)). Then by long division we obtain a Laurent series expansion of g(D) in powers of p whose first term is [g(D)] p pep(g(D)), where
eD(g(D)) = ep(f(D)) - ep(q(D))
(2.263)
[g(D)]p = [f (D)]pl[q(D)]p
(2.264)
and
This division is well-defined because [ f (D)] p and [q (D)] p are nonzero residues of polynomials in ]F2[D] modulo p.
If g(D) = 0, then in addition to ep(0) = oo, we define [O]p = 0 for all p in P*.
Note that this general expansion method works perfectly well for p = D-1, if we and [q(D)ID-1 to be the coefficients of the highest-order terms of f (D) and take [f (D)ID-
q (D), respectively, that is, the coefficients of Ddeg f (D) and Ddeg q(D) respectively. Again, this
explains our use of the notation p = D-1 for this valuation. EXAMPLE 2.24
For f (D) = D + DZ + D3 (or indeed for any nonzero polynomial in D), the Laurent series in the polynomial D is simply f (D). We have eD (f (D)) = 1, [f (D)ID = 1, and the first nonzero term of the series is [f (D)]DDeD(f (D)) = D. Similarly, for p = D-1, we have eD-I (f (D)) = -3, [f (D)ID-1 = 1, and the Laurent series in D-' is
.f (D) = (D-')-3 +
(D-1)-2
+ (D-1)-'
(2.265)
whose first nonzero term is [f (D)ID-1 (D-')eD-1(f (D)) = (D-')-3. For p = 1+D, wehave el+D (f (D)) _ 0, [f (D)] 1+D = 1, and the Laurent series in 1 + D is
f (D) = (1 + D)° + (1 + D)3
(2.266)
whose first nonzero term is [f (D)]1}D(1 + D)ej+D(f(D)) = (1 + D)°. For p = 1 + D + D2, we have eI+D+D2 (f (D)) = 1, [f (D)] 1+D+D2 = D, and the Laurent series in 1 + D + D2 is simply
f (D) = D(1 + D + D2)'
(2.267)
whose first and only nonzero term is [f (D)]1+D+D2 (1 + D + D2)el+D+D2 (f(D)) = D(1 + D + D2)' .
EXAMPLE 2.25
Let g(D) = (D3 + Ds)/(1 + D + D2). Then eD(g(D)) = 3 = delg(D), eD-,(g(D)) = -3 = -degg(D), el+D(g(D)) = 2, e,+D+DD(g(D)) = -1, and all other p-valuations are zero. It is
Chapter 2
78
Convolutional Encoders-Structural Properties
easy to verify that the product formula holds. Also, [g(D)]D = [g(D)ID-1 = [g(D)]t+D = 1 and [g(D)]i+D+DZ = D.
Next we extend the valuations to vectors (or sets) of rational functions.
Let g(D) = (gi (D) $2(D) ... any p E P* we define
g, (D)), where gi92(D),..., g, (D) E F2(D). For (D),
ep(g(D)) = min{ep(gi(D)), ep($2(D)), ... , ep(g,(D))}
(2.268)
Remark. Equality (2.268) generalizes the notion of the "greatest common divisor." Indeed, if g(D) is a set of polynomials, then the greatest common divisor of the set g(D) is
gcdg(D) = H pep(g(D))
(2.269)
pEP
Now we can write Lemma 2.50 in a more symmetric form:
Lemma 2.51 Let g(D) = (gt (D) $2(D) ... ge(D)) be a 1 x c generator matrix over IF2(D). Then g(D) is canonical if and only if
ep(g(D)) < 0, all p E P*
(2.270)
Proof. We will prove that (2.270) is equivalent to (i) and (ii) of Lemma 2.50. From (2.244), (2.245), and (2.268) it follows that
eD-l (g(D)) = degq(D) - max{deg fi (D), deg f2(D), ... , deg ff(D)}
(2.271)
eD-, (g(D)) < 0 .. deg q (D) < max{deg ft (D), deg f2(D), ... , deg fc (D)}
(2.272)
Hence,
For the second half of the proof, let p be any irreducible polynomial of F2[D]. First, we assume that p I q (D). Since (2.245) holds, p X fi (D) for some i. Then we have both
ep(g(D)) = min{ep(f1(D)/q(D)), ep(f2(D)/q(D)), ..., ep(ff(D)/q(D))}
= ep(fi(D)/q(D)) = -ep(q(D)) < 0
(2.273)
and
p X gcd(fi (D), f2 (D), ... , f (D)) Now we assume that p X q (D). Then
ep(g(D)) = min{ep(fi(D)), ep(.f2(D)), ... , ep(.fc(D))} > 0
(2.274)
ep(g(D)) < 0 .. p X fi (D) for some i q P X gcd(fi (D), f2 (D), ... , f, (D))
(2.275)
Thus,
Therefore,
ep(g(D)) < 0 for all irreducible polynomial p
.. gcd(fi(D), f2(D), ..., ff(D)) = I
2.276)
which completes the proof.
In the following, we will give necessary and sufficient conditions for a b x c rational generator matrix to be canonical but first some prerequisites. Properties (i)-(iii) given below (2.254), appropriately generalized, continue to hold:
Section 2.7
79
Canonical Encoding Matrices*
(i) ep(g(D)) = oo, if and only if g(D) = 0 (ii) ep(k(D)g(D)) = ep(k(D)) + ep(g(D)) for all k(D) E JF2(D) (iii) ep(g(D) + h(D)) > min{ep(g(D)), ep(h(D))} However, the product formula becomes an inequality, since for any i
E ep(g(D)) deg p < T, ep(gi(D)) deg p = 0
(2.277)
pEP*
PEP*
We therefore define the defect of a 1 x c nonzero vector g(D) = (gl (D) $2(D) ... g,(D)) over F2(D) to be [For75]
defg(D)dol= _ E ep(g(D)) deg p
(2.278)
PEP*
We may generalize the definition of delay and degree to a vector g(D) as
delg(D) = eD(g(D)) = min{delg,(D)}
(2.279)
degg(D) = -eD-l (g(D)) = max{deg g; (D)}
(2.280)
Then (2.278) can also be written as
defg(D) = degg(D) - Y ep(g(D)) deg p pEP
= degg(D) - deg(gcdg(D))
(2.281)
In view of property (ii) and the product formula, we have for all nonzero k(D) E IF2(D)
def k(D)g(D) = defg(D)
(2.282)
Thus, every nonzero vector in a one-dimensional rational vector space has the same defect. When c = 1, that is, when g(D) reduces to a nonzero g(D) E F2(D), we have
def g(D) = - Y, ep(g(D)) deg p
(2.283)
PEP*
From the product formula it follows that for any nonzero g(D) E IF2(D), def g(D) = 0. The following lemma shows the significance of def g(D).
Lemma 2.52 Letg(D) = (gI (D) $2(D) ... g, (D)) be a 1 x c nonzero generator matrix over F2(D). Write
g; (D) = fi (D)/q(D), i = 1, 2, ... , c
(2.284)
where f , (D), q(D) E F2[D], i = 1, 2, ... , c, and
gcd(.fi(D), f2 (D), ... , .fc(D), q(D)) = 1
(2.285)
and assume that g(D) is canonical. Then,
defg(D) = max{deg f, (D), deg f2 (D), ... , deg ff(D)} and def g(D) is the constraint length of g(D).
(2.286)
Chapter 2
80
Proof.
Convolutional Encoders-Structural Properties
We have
defg(D) _ - E ep(g(D)) deg p PEP*
(eDI((D))+
e(g(D)) deg p plq(D)
ep(g(D)) deg p)
+
(2.287)
p%q(D)
=-
(degq(D) - max{deg f, (D), deg f2(D), ... , deg ff(D)})
- E ep(q(D))degp+0 plq(D)
where in the last equality the first term follows from (2.271), the second term from (2.273), and the last term from (2.274) and Lemma 2.51. The observation that
degq(D) = Y, ep(q(D)) deg p
(2.288)
plq(D)
and application of Lemma 2.50 complete the proof. EXAMPLE 2.26 Let D
g(D) =
1+D
2.289)
2)
1+D 1+D+D2
B y definition ,
ei+D+D2(g(D)) = min{0, -1, 01 = -1
(2.290)
Similarly, ei+o(g(D)) = -1, eD(g(D)) = 0, eD-l(g(D)) = -2, and ep(g(D)) = O if p E P* and p 541 + D + D2, 1 + D, D, D '. It follows from Lemma 2.51 thatg(D) is canonical. We can express g(D) as
g(D)
D+D2+D3 1+D3
1+D2 1+D3
D2+D5 1+D3
(2.291)
which can be implemented by five delay elements in controller canonical form.
Let G(D) = {gi (D), 1 < i < b} be a set of vectors g, (D) E lF2(D). In view of properties (ii) and (iii), for any vector v(D) = Yz ui (D)g1 (D) and any p c P*, we have ep(v(D)) > min{ep(u;(D)gi(D))} = min{ep(u; (D)) + ep(g; (D))}
((iii)) ((ii))
(2 . 292)
t
Monna [Mon70] defines the set G(D) to be p-orthogonal if equality holds in (2.292) for all rational b-tuples u(D); that is, if for all u(D) in 1F2 (D) we have
ep(v(D)) = min{ep(uti (D)) + ep(g; (D))}
(2.293)
If G(D) is p-orthogonal for all p in P*, then the set G(D) is called globally orthogonal. A b x c polynomial matrix G(D) = {g; (D) E 1F2 [D], 1 < i < b} was said in Section 2.5 to have the predictable degree property (PDP) if for all v(D) = u(D)G(D), where u(D) and v(D) are polynomial vectors, deg v(D) = max{deg u; (D) + degg1 (D)}
(2.294)
Canonical Encoding Matrices*
Section 2.7
81
Equivalently, in the terminology we are using here, G(D) has the PDP if for all v(D) _ u(D)G(D) eD-I (v(D)) =
min{eD-1 (ui (D)) +
eD-I (g; (D))}
(2.295)
that is, if G(D) is D-1-orthogonal. Hence, the PDP is naturally generalized as follows:
Definition A rational matrix G(D) has the predictable degree property (PDP) if it is D-1-orthogonal.
Definition For any p E P*, a rational matrix G(D) has the predictable p-valuation property (PVPp) if it is p-orthogonal.
Definition A rational matrix G(D) has the global predictable valuation property (GPVP) if it is globally orthogonal. We will see below that the GPVP is an essential property of canonical encoding matrices of convolutional codes.
Let g(D) E ]FZ(D). We define the residue vector [g(D)]p as the vector whose components are residues of the corresponding components of the vector g(D) p-eo(g(D)) modulo
p in the ring of formal power series ]F2[D]p[[p]]. Thus, if ep(g;(D)) > ep(g(D)), then [g; (D)] p = 0, even if g; (D) # 0. If g(D) is expanded as a vector of Laurent series in powers of p with coefficients in ]Fz[D]p, then [g(D)] p peo(g(D)) is the first nonzero term in the expansion. In Section 2.5 for a polynomial generator matrix G(D) the matrix [G(D)]h was defined as consisting of the high-order coefficient vectors [g; (D)]h which we would call here the residue vectors [g; (D)]D-l . It was shown that for the PDP to hold for G(D) it is necessary and sufficient that {G(D)]h have full rank. We have the following natural generalization [For75]: Definition Given a rational matrix G(D), its p-residue matrix [G(D)] p is the ]F2[D] pmatrix whose ith row is the residue vector [g; (D)] p, 1 < i < b. The following theorem then gives a basic test for p-orthogonality:
Theorem 2.53 For any p E P*, a rational matrix G(D) has the PVPp (is p-orthogonal) if and only if its p-residue matrix [G(D)] p has full rank over ]F2[D] p. Proof. In general, if v(D) = u(D)G(D), where u(D) = (u1 (D) u2(D) ... Ub(D))
and ui (D) E F2(D), 1 < i < b, then
ep(v(D)) > d
(2.296)
d = min{ep(ui (D)) + ep(g, (D))}
(2.297)
where
Let I be the set of indices such that the minimum is achieved, that is,
I = {i I ep(ui(D)) +ep(g1(D)) = d}
(2.298)
Then, if v(D) 0, the Laurent series expansion of v(D) in powers of p with coefficients in R[D]p may be written as
v(D) = vd pd + vd+l pd+1 + ...
(2.299)
G(D) is p-orthogonal if and only if for all u(D) 0 ep(v(D)) = d, that is, Vd # 0. We may write the Laurent series expansions of the nonzero u; (D) and of the gi (D) as
(2.300)
gj(D) = [g1(D)]ppe°(gi(D)) +g;1)(D), 1 < i < b
(2.301)
u,(D) =
[ui(D)]ppev(ut(D)) +ui1)(D), 1
Convolutional Encoders-Structural Properties
Chapter 2
82
where for all i [ui(D)]p 0, ep(ui(D)), [gi(D)]p ep(gi (D)). Then the lowest-order coefficient of v(D) is given by Vd =
0, and
T[ui(D)]p[gi(D)]p
(2.302) iEl If vd = 0, then the p-residue vectors [gi (D)] p are linearly dependent over F2 [D] p and [G (D)] p does not have full rank. Conversely, if [G(D)] p does not have full rank over F2[D]p, then there exists some nontrivial linear combination of rows that equals zero:
Y ui (D) [gi (D)]p = 0
(2.303)
where ui(D) E F2[D]p, therefore, with the input sequence (u1(D)p-ep(gi(D))
u2(D)p eP(92(D)) ... ub(D)p-eP(gd(D)))
we have d = 0 and Vd = 0, so
ep(v(D)) > 0 = min{ep(ui(D)peP(g'(D))) + ep(gi(D))}
(2.304)
which implies that G (D) is not p-orthogonal. This completes the proof. This test involves only the determination of the rank of a b x c matrix [G(D)] p over the field F2[D] p and is thus easy to carry out for any polynomial p of moderate degree. From Theorem 2.53 follows immediately
Corollary 2.54 A rational matrix G(D) has the GPVP (is globally orthogonal) if and only if its p-residue matrix [G(D)] p has full rank over F2[D] p for all p E P. EXAMPLE 2.27 Consider the rational encoding matrix D
G(D) -
2
gi(D)
1 +D
D
( g2(D) )
(2.305)
I
1+D+D2 1+D+ D2
1
Let p = D. Then we have eD(1) = eD(I+D) = eD(l+D+D2) = 0, eD(i+D) = 1, and eD(]+D+D2) = 2. Hence, we have eD(g1(D)) = eD(g2(D)) = 0, and [ gi (D)] D
_
1
1+ D
D° _
1+ D
1
z
[92 (D)I D
1 + D + D2
1 + D + D2
0
1
D° _ (0
1
(mod D) 1
1
(2 . 306)
)
(mod D)
(2. 307)
Thus, we have 0
(2.308) 1
1
which has full rank over ]F2[D]D = IF2.
For p = 1 + D we have el+D(1) = ei+D(i+D+D2) = ej+D(I +D+D2) = 0 and el+D(i+D) = -1. Hence, ei+D(gi(D)) = -1, ei+D(gz(D)) = 0, and
[gi(D)]i+D = ( 1 0
1
1
+DD 1
)
1+D
(1 + D)-(-')
(mod (1 + D))
(2.309)
z
[g2(D)]i+D =
_(
1 + D+ Dz 1
1
1
)
1 + D + Dz (mod (1 + D))
1
)(1+ D)° (2.310)
Section 2.7
Canonical Encoding Matrices*
83
Thus, we have
(
[G(D)]1+D =
1
1
1)
1
1
1J
(2.311)
which has full rank over ]F2[D]I+D. Next, we let p = 1 + D + D2 and obtain el+D+D2 (1) = et+D+D2 (1+D) = e1+D+D2 (i1+D) = 0 and
ei+D+o2 (i+D+D2) = e1+D+D2 (l+D+D2) = -1. Hence, ei+D+D2 (gl (D)) = 0, el+D+D2 (g2(D)) _ -1. A simple way to calculate 1 °D (mod (1 + D + D2)) is as follows: (mod (1 + D + D2)). From First, we need the inverse of the denominator, that is, (1 + D)-' Euclid's algorithm, it follows that
1=D(1+D)+(1+D+ D2)
(2.312)
(1 + D)-' = D (mod (1 + D + D2))
(2.313)
and, hence,
Then we have
D =D(1+D)-'=D2=1+D
(mod (1+D+D2))
1+D
(2.314)
Similarly, we have 1
= D (mod (1 + D + D2))
1+D
(2.315)
Thus, [gi (D)] i+D+D2 =
D 1+D
1
1
[g2(D)]i+D+D2
1
(1 + D + D2)°
1+D
1+D D)
(2.316)
(mod (1+D+D2))
D
1
1 + D + D2
1 + D + D2
1+D
0)
1
(1 + D +
1
(2.317)
(mod (1 + D + D2))
Thus,
1+D D
1
fG(Dll,. , »2 =
1+D
1
(2.318)
0
which has full rank over IF2[D]l+D+D2.
Finally, let p = D ' . Then we have eD-I (1) = eD-i (,+D) = eD-' (1+D+D2) = 0, eD-' (I+D) = 1,
and eD-I (l+) = 2. Hence, eD-I (g1 (D)) = eD-I (92(D)) = 0, and [gt (D)]D-1 _
1
1
[gz(D)7D
1+D 1
0)
1+D) (D
D2
1
1+D+D2
0
1
)
(2.319)
(mod D-1)
1+D+D2 1
1)0
)(D-1)o 1
(2.320)
(mod D 1)
Thus, we have (2.321)
which has full rank over IF2[D]D-1 = IF2.
Convolutional Encoders-Structural Properties
Chapter 2
84
For p :A D, 1 + D, 1 + D + D2, D-1 we have ep(g; (D)) = 0, i = 1, 2. Thus, D 1
1+D
D 2
1
(mod p)
1+D+D2 1+D+D2
(2.322)
1
which has full rank over IFz[D]P.
Since [G(D)]p has full rank over 1F2[D]p for all p E P*, we conclude that G(D) has the GPVP.
EXAMPLE 2.28 Consider the rational encoding matrix
1+D+Dz 1+D+Dz
G(D) =
(2.323)
1+D+D2 1+D+D2 By repeating the steps in the previous example we can show that the p-residue matrix [G(D)]P has full rank over 1F2[D]P for p = D, 1 + D, and D-1. D+ 2 For 1+D+Dzwehave e
21 = 0, e20 =no,ande
) = ei+D+D3 (t+) _ -1. Hence, we have ei+D+D2 (gi (D)) = eJ+D+D2 (92 (D)) _
eI+D+D2 (l+
-1, and
+D
[gi(D)]1+D+D2 =
0
1
_ (0 [$2(D)]1+D+D2
0
0 0
1 + D2
D2
)(1+D+D2)1> 1+D+Dz 1+D+Dz D 1+D) (mod (1+D+Dz)) D
1
0
1+D+Dz 1+D+D2 1
1+D 1)
(2.324)
(1 + D + D (2.325)
(mod(1+D+D2))
Thus, we have [G(D)]l+D+D2 =
(0 0 0
D
0 1+D
1+D
(2.326)
1
whose rows are linearly dependent over 1F2[D]1+D+D2 and it follows that G(D) does not have the GPVP.
Let G(D) be a b x c generator matrix over 1F2(D) and let Mb denote the set of all b x b submatrices of G(D). For all p E P* we define [For75]
ep(G(D))
m
def
{ep(I Mb(D) 1)}
(2.327)
Mb
and then correspondingly define the internal defect of G (D) to be [For91]
intdef G(D)
aef
-
ep(G(D)) deg p
(2.328)
pEP*
Then we can prove the following important result: Theorem 2.55 The internal defect intdef G (D) is an invariant of the convolutional code C that is encoded by G(D). Proof. Let T (D) be a b x b nonsingular matrix over 1F2(D). Then
ep(I T(D)Mb(D) I) =ep(I T(D) I)+ep(I Mb(D) I)
(2.329)
Canonical Encoding Matrices*
Section 2.7
85
Hence,
ep(T (D)G(D)) = ep(I T(D) 1) + ep(G(D))
(2.330)
It follows from (2.283), (2.328), and (2.330) that
intdef (T (D)G(D)) = def I T (D) I + intdef G(D)
(2.331)
But I T (D) I E F2(D) and I T (D) 10. By the product formula (2.255) and (2.283),
def I T (D) I = 0. Hence,
intdef (T (D)G(D)) = intdef G(D)
(2.332)
Theorem 2.55 motivates us to introduce the defect of the code C encoded by the generator
matrix G(D) to be [For9l] def C
aef
intdef G(D)
(2.333)
We define the external defect of G(D) as the sum of the generator defects: extdef G (D)
aef
b
> def gi (D)
(2.334)
i=1
Before we give five equivalent conditions for a generator matrix to be canonical we shall prove a lemma. Let (2.335)
G(D) = (gij(D))i
gi (D) = (gi I (D) 9i2 (D) ... gic(D)), i = 1, 2 ... , b
(2.336)
gij (D) = fj (D)/qi (D), i = 1, 2, ... , b; j = 1, 2, ... , c
(2.337)
where f j (D), qi (D) E F2[D], i = 1 , 2, ... , b; j = 1, 2, ... , c, and assume that gcd(.fi I (D), .fi2(D), ... , fic(D), qi (D)) =
1,
i = 1, 29 ... , b
(2.338)
[G(D)]p
(2.339)
Then define G I (D, p) by pen (gi (D))
p ep(92(D))
G1(D, p) =
pev(gb(D))
and GO(D, p) by
G(D) = Go(D, p) + Gi (D, p) From (2.340) and (2.339) we have
(2.340)
Convolutional Encoders-Structural Properties
Chapter 2
86
Lemma 2.56 Let G(D) be a b x c rational generator matrix and let p E P. Then
(i) ep([G(D)]p) = 0 if and only if ep(G(D)) = yb1 ep(gi(D)) (ii) ep([G(D)] p) 0 O if and only if ep(G(D)) > Lb 1 ep(gi (D)) We are now well prepared to prove the following
Theorem 2.57 Let G(D) be a b x c rational generator matrix with rows g1(D), g2(D),
.... 9b (D). Then the following statements are equivalent: (i) G(D) is a canonical encoding matrix.
(ii) For all p E P*: ep(gi(D)) < 0, 1 < i < b, and ep([G(D)]p) = 0 (iii) For all p E P*: ep(gi(D)) < 0, 1 < i < b, and ep(G(D)) = =1 ep(gi(D)) (iv) For all p E P*: ep(gi (D)) < 0, 1 < i < b, yib
and intdef G(D) = extdef G(D) (v) For all p E P*: ep(gi (D)) < 0, 1 < i < b, and G(D) has the GPVP Proof. (i = ii). Assume that G(D) is canonical. Suppose that ep(gi (D)) < 0 does not hold for some p E P* and some i, then, by Lemma 2.51, gi (D) is not canonical, and, hence, G(D) is not canonical. Suppose that ep([G(D)]p) = 0 does not hold for some p E P*. Then, by Lemma 2.56, for any p E P* such that ep([G(D)]p) = 0 does not hold, we have b
ep(G(D)) >
ep(gi (D))
(2.341)
i=1
and for any p E P* such that e p ([G (D)] p) = 0 holds, we have b
ep(G(D)) _
ep(gi(D))
(2.342)
i=1
Thus, by combining (2.341) and (2.342) we obtain b
ep(G(D)) deg p < - Y Y, ep(gi (D)) deg p
intdef G(D)
pep* i=1
pEP* b
_
- Y ep(gi (D)) deg p i=1
(2.343)
PEP*
b
_ Y' def gi (D) = extdef G(D) i=1
Hence, G(D) is not canonical. (ii = iii). Follows from Lemma 2.56. (iii = iv). Follows from (2.278) and (2.328). (iv i). By Lemma 2.52, the hypothesis means that intdef G (D) is the overall constraint length of G(D). Let G,(D) be a canonical encoding matrix equivalent to G(D). Then, from
Theorem 2.55 it follows that intdef G,(D) = intdef G(D). By (i = iv), intdef G,(D) is the overall constraint length of G,(D). Thus, intdef G,(D) is minimum over all equivalent generator matrices, and so is intdef G(D). Hence, G(D) is canonical.
Section 2.8
Minimality via the Invariant-Factor Theorem*
87
(ii * v). ep([G(D)]p) = 0 means that there exists at least one b x b minor of [G(D)]p not divisible by p, which together with Corollary 2.54 completes the proof. EXAMPLE 2.29 In Example 2.27 we showed that the rational encoding matrix
D 1 +D
1
G(D)
D2
1
1+D
(2.344)
1
1+D+D2 1+D+D2
has the GPVP. Clearly, for all p E P*, ep(g,(D)) < 0, i = 1, 2. Therefore, condition (v) in Theorem 2.57 is satisfied, and we conclude that G(D) is canonical and, hence, minimal. EXAMPLE 2.30 The rational encoding matrix
I + D2
G(D) =
D2
1+D+D2 1+D+D2 D2
(2.345)
1
1+D+D2 1+D+D2 has a (trivial) right inverse,
which is polynomial in both D and D-1. Hence, from Theorem 2.37 (iv) it follows that G (D) is minimal. In Example 2.28 we showed that G(D) does not have the GPVP. Hence, from Theorem 2.57 we conclude that G(D) is not canonical, although it is minimal.
Corollary 2.58 Let C be a convolutional code. Then any canonical encoding matrix of C has def C as its overall constraint length. Moreover, the number of memory elements in any encoder of C is > def C. Proof. Let G(D) be a canonical encoding matrix of C. By Theorem 2.57 (iv), intdef G(D) = extdef G (D). By (2.333) and (2.334) def C = I:d_i def gi (D) and from Lem-
ma 2.52 that yb1 defgi(D) is the overall constraint length of G(D). Among the rational generator matrices that encode a convolutional code, we have singled out the class of canonical encoding matrices, which can be realized by the least number of delay elements in controller canonical form among all equivalent generator matrices. Thus the position of canonical encoding matrices within the class of rational generator matrices corresponds to that of minimal-basic encoding matrices within the class of polynomial generator matrices.
The set of canonical encoding matrices is a proper subset of the set of minimal rational encoding matrices. This is a generalization of the previous result that the set of minimal-basic encoding matrices is a proper subset of the set of minimal polynomial encoding matrices. 2.8 MINIMALITY VIA THE INVARIANT FACTOR THEOREM* In this section we will use the invariant-factor theorem with respect to both F2 [D] and F2 [D-t ]
to derive a result on minimality of generator matrices. First we state the invariant-factor decomposition of a rational matrix introduced in Section 2.2 in the following form, where for simplicity we assume that the rational matrix has full rank.
Convolutional Encoders-Structural Properties
Chapter 2
88
Theorem 2.59 (Invariant-factor theorem) Let G(D) be a full-rank b x c rational matrix, where b < c. Then G(D) may be written as
G(D) = A(D)f(D)B(D)
(2.346)
where A(D) and B(D) are, respectively b x b and c x c matrices with unit determinants,
and where r(D) is a diagonal matrix with diagonal elements yi(D), I < i < b, called the invariant factors of G(D) relative to the ring ]F2[D]. The invariant-factors are uniquely determined by G(D) as
Yi (D) = Oi (D)/Oi-t (D) where 00(D) = 1 by convention and
(2.347)
Oi(D) = [ pmin{ep(detM;(D))IM1(D)EM;)
(2.348)
PEP
where Mi is the set of i x i submatrices of G(D), 1 < i < b. Consequently, b
(2.349)
Yr (D) = Ab(D) b
ep(yi(D)) = ep(Ab(D)), p E P
(2.350)
For all p in P, the invariant-factors satisfy the divisibility property
ep(Y1(D)) < ep(Yi+i(D)), 1 < i < b
(2.351)
It is easy to show that if G(D) is regarded as a matrix over ]F2(D-' ), then the invariantfactors yi (D-1) of G(D) with respect to ]F2[D-1] have the same p-valuations as the invariantfactors yi (D) for all p in P except for D. Therefore, it makes sense to define the p-valuations of the invariant-factors of G(D) for all p in P* and all i by YD,i = eD(Yi(D)), YD-1,i = eD-1
((D-1)),
if p = D
(2.352)
if p = D-'
(2.353)
yp,i = ep(yi(D)) = ep(yi(D ')), otherwise
(2.354)
If we define .0(D-1) = 1 and Oi(D-1) _
H pEP*\{D}
pmin{ep(detM;(D-'))IM,(D-')EM;}
(2.355)
then Yi(D-1)
=
2ii(D-1)/Oi-1(D-1)
(2.356)
To simplify the computation of these p-valuations for small generator matrices we define Sp,o = 0 for all p in P* and
8p,i = ep(Di(D)), p c P
(2.357)
i = eD (Xi (D-i)) and then we have for all p in P* and 1 < i < b
(2.358)
= Sp,i - bp,i-1
(2.359)
bD
Yp,i
which implies b
8p,b = E Yp,i i=1
(2.360)
Section 2.8
Minimality via the Invariant-Factor Theorem*
Remark.
89
We can now recognize that for p E P
ep(G(D)) = min{ep(detMb(D)) I Mb(D) E Mb) = ep(Ab(D)) = bp,b
(2.361)
and
eD-I(G(D))
Mb(D-1) E Mb}
=min{eD-I(detMb(D-1))
I
(2.362)
= eD-I (Ob(D-1)) = SD--,b
Thus, the internal defect can be computed directly from the p-valuations of the invariant factors
of G(D)by b
Sp,b deg p = -
intdef G(D)
pEP*
PEP*
(Yi)
deg p
(2.363)
i=1
Now we have
Theorem 2.60 Let G(D) be a b x c rational generator matrix. Then G(D) is minimal if and only if yp,b < 0 for all p in P*. Proof. If G(D) is minimal, then Theorem 2.37 (v) implies that a polynomial output sequence u(D)G(D) must be generated by a polynomial input sequence u(D), and an output sequence u(D-1)G(D-1) that is polynomial in D-' must be generated by an input sequence
u(D-') that is polynomial in D-1. Let G(D) = A(D)F(D)B(D) be an invariant-factor decomposition of G(D); then G-1 (D) = B-'(D)1'-1(D)A-'(D) is a right inverse of G(D), where A-1(D) and B-1(D) are polynomial since A(D) and B(D) have unit determinants. Suppose that Yp,b > 0 for some pin P. Then u(D) = (0 0... 1/p)A-'(D) is nonpolynomial (since u(D)A(D) is nonpolynomial), but u(D)G(D) is polynomial and we have a contradiction. Using the invariant-factor theorem with respect to F2[D-1], we can show a similar contradiction if YD-l,b > 0. Conversely, assume that yp,b < 0 for p in P, then yp,i < 0, since by the invariant-factor
theorem yp,, < Yp,b for i < b. Hence, if yp,b < 0 for all p in P, F-' (D) is polynomial, and then G-1 (D) = B-1(D)P-1(D)A-' (D) is the desired polynomial right inverse of G (D). Similarly, if also YD-I,b < 0, then r-1(D-') is polynomial in D-1, and the invariant-factor theorem with respect to F2[D-'] yields an ]F2[D-' ]-inverse of G(D). The minimality of G(D) follows immediately from Theorem 2.37. Remark. The condition that Yp,b < 0 for p = D is equivalent to the condition that no nontrivial zero-output (encoder) state transition starting from the zero state occurs in the minimal state realization of G(D) when it is used as a state realization of C. Similarly, the
condition Yp,b < 0 for p = D-' is equivalent to the condition that there is no nontrivial zero-output state transition ending in the zero state. Finally, the condition Yp,b < 0 for all other p E P* is equivalent to the condition that there is no nontrivial infinite zero-output state path (the "noncatastrophic" condition). EXAMPLE 2.31 Let
G(D) =
(
1
0
D
1
)
(2.364)
be an encoding matrix over ]F2. Then the I x 1 minors of G(D) are {1, 0, D, 11, and the 2 x 2 minor is the determinant det G (D) = 1. The greatest common polynomial divisor of the 1 x 1 minors is 1, so 6p,1 = 0 for all p E P. However, the maximum degree of the 1 x 1 minors is 1, so it follows that
90
Convolutional Encoders-Structural Properties
Chapter 2
SD-1,1 = -1. Since detG(D) = 1 we have 8p,2 = 0 for all p E P*. Therefore, Yp2=Sp2-SP1=
10, pEP
(2.365)
p_D-1
1,
so G(D) is not minimal. Indeed, G(D) does have an ]F2[D]-inverse, namely, its unique inverse G ' (D) _ -
/
0
1
1D
(2.366)
1
but G-'(D) is not an ]F2[D-1]-matrix, so G(D) has no F2[D-']-inverse. EXAMPLE 2.32 Let 1
D
1
1+D
1+D
(2.367)
D2
1+D+D2
1+D + D2
be an encoding matrix over F2. The greatest common divisor of the 1 x 1 minors of G(D) is Al (D) =
1/(1+D3)andthatof the 2x2minors isA2(D) = 1/(1+D3). Therefore,y2(D) = A2(D)/A1(D) = 1. G(D) can also be written as a rational matrix in D-1, viz., 1
G(D) =
1
1
D-1
1 - D-1
1+ D'
I
(2.368)
D
1+D 1
1
+D-2
+ D-2 We have, similarly, ii(D ') = 1/(1 + D-') and A2(D ') = 1/(1 + D-'). Therefore, we also have 1+D-1
y2(D ') = 1. Thus, Yp,2 = 0 for all p e P*. By Theorem 2.60, G(D) is minimal.
Corollary 2.61 Let G(D) be a minimal b x c rational generator matrix with rows g1 (D), 92 (D), ... , gb (D). Then
ep(g; (D)) < 0, 1 < i < b, all p E P*
(2.369)
Proof. Suppose that ep(gj (D)) > 0 for some j with 1 < j < b and p in P. Then u(D) = (0...0 1/p 0...0),where l/pisinthe jth position, is nonpolynomial, but u (D) G (D) is polynomial. Similarly, if eD-1 (gj (D)) > 0, then u(D 1) = (0 ... 0 1/D ' 0 ... 0) is nonpolynomial in D ', but u(D 1)G(D ') is polynomial in D-'. Hence, Theorem 2.37 (v)
implies that G(D) is nonminimal, which is a contradiction. It is easily seen that the converse of Corollary 2.61 does not hold; for example, the basic encoding matrix (cf. Example 2.20)
G(D) =
1+D
D
D
1+D
(2.370)
is not minimal, although its rows satisfy (2.369). By combining Theorems 2.45 and 2.57 (v) and Corollary 2.61, we have
Corollary 2.62 Let G(D) be a b x c rational generator matrix with rows g, (D), 92(D), gb(D). Then G(D) is canonical if and only if it is minimal and has the GPVP.
The encoding matrix given in Example 2.30 is minimal but does not have the GPVP. Hence, it is not canonical. But the encoding matrix given in Example 2.32 is both minimal and has the GPVP (cf. Example 2.27) and, hence, canonical.
Section 2.9
Syndrome Formers and Dual Encoders
91
We have shown that canonicality is the intersection of two independent properties: minimality and the global predictable valuation property. A minimal encoding matrix need not have the GPVP, and a matrix with the GPVP need not be minimal.
2.9 SYNDROME FORMERS AND DUAL ENCODERS We will now use the invariant-factor decomposition of a rational convolutional generator matrix to construct generator matrices for a dual code.
Let G(D) = A(D)F(D)B(D). In Section 2.4 we have shown that the first b rows of the c x c polynomial matrix B(D) can be taken as a basic encoding matrix G'(D) equivalent to G(D). A polynomial right inverse G'-1 (D) consists of the first b columns of B-1(D). Let HT (D), where T denotes transpose, be the last c - b columns of B-1(D). Then the last c - b rows of B(D), which is a (c - b) x c matrix, is a left inverse of HT (D). Thus, the transpose of the matrix formed by the last c - b rows of B(D) is a right inverse of H(D) and can be denoted H-1(D). Hence, the last c - b rows of B(D) can be denoted (H-1(D))T . Summarizing, we have
B(D) =
G '(D)
(H-1(D))T
(2.371)
and
B-1(D) = (G'-1(D) HT (D))
(2.372)
The matrix HT (D) has full rank and is both realizable and delayfree.
Let g'(D) be a row among the first bin B(D). Since HT (D) consists of the last c - b columns in B-1(D), it follows that
g'(D)HT (D) = 0
(2.373)
Then for each codeword v(D) = u(D)G'(D) we have
v(D)HT (D) = u(D)G'(D)HT (D) = 0
(2.374)
Conversely, suppose that v(D)HT (D) = 0. Since rank HT (D) = c - b, it follows from (2.373) that v(D) is a linear combination of the first b rows of B(D) with Laurent series as coefficients, say v(D) = u(D)G'(D). Thus, v(D) is a codeword. It follows that the output of the c-input, (c - b)-output linear sequential circuit whose transfer function is the polynomial matrix HT (D), is the allzero sequence if and only if the input sequence is a codeword of the code C encoded by G(D). Thus, C could equally well be defined as the sequences v(D) such that
v(D)HT (D) = 0
(2.375)
or the null space of HT (D). We call the matrix H(D) the parity-check matrix and the matrix HT (D) the syndrome
former corresponding to G(D) = A(D)F(D)B(D). In general, any c x (c - b) realizable, delayfree transfer function matrix ST (D) of rank c - b is called a syndrome former of C if
G(D) ST (D) = 0
(2.376)
The syndrome former HT (D) can be expanded as
HT (D) = Hp + Hl D + ... + HT Dm,
(2.377)
where HfT , 0 < i < m,., is a c x (c - b) matrix and m, is the memory of the syndrome former. In general, the memory of a syndrome former is not equal to the memory of the generator matrix G(D).
92
Convolutional Encoders-Structural Properties
Chapter 2
Using (2.377), we can write equation (2.375) as
+ v,-,,, HMS = 0, t E Z
vrH0 + vt_1Hi +
(2.378)
For causal codewords v = v0v1 v2 ... we have equivalently
vHT = 0
(2.379)
where
HT HT 0
HT
m,.
1
Ho HI ...
HT =
(2.380)
H,n
is a semi-infinite syndrome former matrix corresponding to the semi-infinite generator matrix G given in (1.74). Clearly, we have
GHT = 0
(2.381)
EXAMPLE 2.33 For the basic encoding matrix G(D) whose encoder is illustrated in Fig. 2.9 we have (Example 2.4)
0 1+D+D2 1+D+D2+D3 B-'(D) =
1+D2+D3
D+D2 1+D2
0 1
1+D+D3
(2.382) 1
Hence, we have the following syndrome former
HT(D)=
I
(2.383)
I+D2+D3
I+D+D3
1
whose controller canonical form is illustrated in Fig. 2.17.
+
+
z
vhl
Figure 2.17 The controller canonical form of the syndrome former in Example 2.33.
Its observer canonical form is much simpler (Fig. 2.18).
vhl
Figure 2.18 The observer canonical form of the syndrome former in Example 2.33.
Syndrome Formers and Dual Encoders
Section 2.9
93
The corresponding semi-infinite syndrome former matrix is
HT =
1
1
1
1
1
0
1
1
1
1
0
1
1
1
1
1
1
0
1
1
1
1
0
1
(2.384)
We notice that in the previous example the observer canonical form of the syndrome former requires exactly the same number of memory elements as the controller canonical form of the corresponding minimal-basic encoding matrix G(D). Before we prove that this holds in general, we shall consider H(D) as a generator matrix.
Definition The dual code C1 to a convolutional code C is the set of all c-tuples of sequences v1 such that the inner product
(v vi) def = v(v1)T
(2.385)
is zero, that is, v and v1 are orthogonal, for all finite v in C.
The dual code C1 to a rate R = b/c convolutional code is a rate R = (c - b)/c convolutional code.
Theorem 2.63 Let the rate R = b/c convolutional code C be generated by the semiinfinite generator matrix G and the rate R = (c - b)lc dual code C1 be generated by the semi-infinite generator matrix G1, where G is given in (1.74) and
Go G 1 G1 =
(
...
GG1
Gm1
...
Ga m
(2.386)
Then
G(G1)T = 0 Proof.
(2.387)
Let v = uG and v1 = u1G1, where v and v1 are orthogonal. Then we have
v(v1)T = uG(u1G1)T = uG(G1)T (u1)T = 0
(2.388)
and (2.387) follows.
The concept of a dual convolutional code is a straightforward generalization of the corresponding concept for block codes (see Section 1.2). For convolutional codes, we also have a closely related concept: Definition The convolutional dual code C1 to a convolutional code C, which is encoded by the rate R = b/c generator matrix G(D), is the set of all codewords encoded by any rate R = (c - b)/c generator matrix G,(D) such that
G(D)GT (D) = 0
(2.389)
Consider a rate R = b/c convolutional code C encoded by the polynomial generator matrix
G(D)
(2.390)
94
Convolutional Encoders-Structural Properties
Chapter 2
Let G1 (D) denote the rate R = (c - b)/c polynomial generator matrix GL(D) = Gml1 + Gml_t D +
+ GODml
(2.391)
which is the reciprocal of the generator matrix
GL(D) = Go + Gi D +
+ Gm1Dml
(2.392)
for the dual code C1. Then we have
G(D)(GL(D))T = G0(Gml)T + (G0(Gml-t)T + Gl(Gml)T )D
+ ... + Gm =
\=-m
(G)TDm+ml
o
(2.393)
YGi(G(+1)T
Dm+j =Q
i=0
where the last equality follows from (2.387). Let C1 be the reversal of the dual code CL, that is, the rate R = (c - b)/c convolutional code encoded by GL (D). Then, we have
Theorem 2.64 The convolutional dual to the code encoded by the generator matrix G (D) is the reversal of the convolutional code dual to the code encoded by G (D). That is, if C is encoded by G(D), then
CL = JL
(2.394)
Remark. Often the convolutional dual code CL is simply called the dual of C, which could cause confusion since it is in general not equal to CL.
It follows from (2.374) and (2.389) that the transpose of the syndrome former for the code C can be used as a generator matrix for the convolutional dual code CL; that is, we may take
GL(D) = H(D)
(2.395)
and, equivalently, the transpose of the reciprocal of the syndrome former for the code C can be used as a generator matrix for the dual code C'; that is, we may take
G1(D) = H(D)
(2.396)
EXAMPLE 2.34 Let the rate R = 2/3 encoding matrix G(D) in Example 2.4 encode the code C. Two realizations of its syndrome former HT (D) are shown in the previous example. The convolutional dual code is encoded by the rate R = 1/3 encoding matrix
H(D)=(1+D+D2+D3
1+D2+D3
1+D+D3)
(2.397)
whose controller canonical form is illustrated in Fig. 2.19. VG)
v(2)
Figure 2.19 The controller canonical form of the encoding matrix for the convolutional dual code in Example 2.34.
Section 2.9
Syndrome Formers and Dual Encoders
95
The following lemma leads to an important theorem relating a code to its convolutional dual code. Lemma 2.65 Let i 1 , i2, ... , i, be a permutation of 1 , 2, ... , c. Then the b x b subdeterminant of the basic encoding matrix G'(D) formed by the i I , i2, ... , ib columns is equal to the (c-b) x (c - b) subdeterminant of the syndrome former HT (D) formed by the ib+1, ib+2, , is rows. Proof. It is sufficient to consider the case that i 1 = 1 , i2 = 2, ... , is = c. Recall that
G'(D) is the first b rows of B(D) and that HT (D) is the last (c - b) columns of B 1(D). Write
B(D)
_
B1l(D)
B12(D)
B21(D)
B22(D)
(2.398)
and
B11(D)
B-1(D)
=
\
B2 1(D)
B22(D) B12(D)
(2.399)
/
where
G'(D) = (B1l(D) B12(D))
(2.400)
and
HT (D) =
B
12(D) B22 (D)
(2.401)
where Bt 1(D) is a b x b matrix, and B22(D) is a (c - b) x (c - b) matrix. Consider the matrix product
C Btl(D) 0
B12(D)
Bll(D)
B12(D)
Ic-b
B21(D) 0
B22(D)
Ib
B21(D)
B22(D)
(2.402)
Taking the determinants, we have
det(B11(D)) det(B-1(D)) = det(B22(D))
(2.403)
Since det(B-1(D)) = 1, we have now shown that the leftmost subdeterminant of G'(D) is equal to the lower subdeterminant of HT (D).
Theorem 2.66 If G(D) is a minimal-basic encoding matrix with maximum degree µ among its b x b subdeterminants, then the convolutional dual code has a minimal-basic encoding matrix H(D) with overall constraint length A. Proof. Follows from Theorems 2.20 and 2.22, Corollary 2.23, and Lemma 2.65. Since µ = v for a minimal-basic encoding matrix we have
Corollary 2.67 If G(D) is a minimal-basic encoding matrix for a convolutional code C and H(D) is a minimal-basic encoding matrix for the convolutional dual code C1, then # {abstract states of G(D)} = # {abstract states of H(D)}
(2.404)
If we connect the encoder outputs directly to the syndrome former input, then the output of the syndrome former will be the allzero sequence. From this it follows that a close connection exists between the abstract states of a generator matrix and the abstract states of its syndrome former:
96
Chapter 2
Convolutional Encoders-Structural Properties
Theorem 2.68 Let the output of the encoder with generator matrix G(D) drive its syndrome former HT (D). Then, whenever the abstract state of G(D) is s(D), the abstract state of HT (D) will be s, (D) = s(D)HT (D). Proof. Let u(D) be the input of G(D) associated with the abstract state s(D), that is, s(D) = u(D)PG(D)Q, and let v(D) be the output when we truncate the input at time zero, that is, v(D) = u(D)PG(D), then it follows that s(D) = v(D)Q. Since v(D) is a codeword, we have (2.375) v(D)HT (D) = 0. Using P + Q = 1 we get 0 = v(D)HT (D) = v(D)HT (D)Q = v(D)(P + Q)HT (D) Q = v(D)PHT (D)Q + v(D)QHT (D) Q (2,405) = v(D)PHT (D) Q +s(D)HT (D) Q = ss (D) + s(D) HT (D) where s,. (D) = v (D) P H T (D) Q is the abstract state of the syndrome former corresponding to the input v(D). Corollary 2.69 Assume that both the encoding matrix G (D) of a convolutional code and the encoding matrix H (D) of its convolutional dual code are minimal-basic. Then the abstract
state spaces of the encoding matrix G(D) and its syndrome former HT (D) are isomorphic under the map
s(D) H ss(D) = s(D)HT (D) Proof.
(2.406)
Following the notation of Theorem 2.68, it is clear that
s(D) i-+ s, (D) = s(D)HT (D) (2.407) is a well-defined linear map from the abstract state space of G(D) to that of HT(D). By Lemma 2.32, the map is injective. Furthermore, we have # {abstract states of G(D)} = # {abstract states of H(D)} _ # {encoder states of the controller canonical form of H(D)} = # {states of the observer canonical form of HT (D) = # {abstract states of HT (D)} Therefore, the map is also surjective. Hence, we have an isomorphism.
(2.408)
Suppose that the codeword v(D), where v(D) = u(D)G(D), is transmitted over a noisy additive, memoryless channel. Let r(D) be the received sequence, then
r(D) = v(D) + e(D)
(2.409)
where e(D) is the error sequence. If we pass the received sequence through the syndrome former HT (D), we obtain the syndrome
z(D) = r(D)HT (D) = (v(D) + e(D))HT (D) = e(D)HT (D)
(2.410)
We notice that the syndrome is 0 if and only if the error sequence is a codeword. Furthermore,
the syndrome is independent of the transmitted codeword-it depends only on the error sequence. If we use a syndrome former, then the decoding rule can be simply a map from the syndromes to the apparent errors e(D). 2.10 SYSTEMATIC CONVOLUTIONAL ENCODERS Systematic convolutional generator matrices are in general simpler to implement than general
generator matrices. They have trivial right inverses, but unless we use rational generator matrices (i.e., allow feedback in the encoder), they are in general less powerful when used together with maximum-likelihood decoding.
Section 2.10
97
Systematic Convolutional Encoders
Since a systematic generator matrix has a submatrix that is a b x b identity matrix, we have immediately the following Theorem 2.70 A systematic generator matrix is a systematic encoding matrix. EXAMPLE 2.35 Consider the rate R = 2/3 systematic convolutional encoder with the basic encoding matrix
G(D) =
1
0
1+D2
0
1
1 + D + D2
(2.411)
In Figs. 2.20 and 2.21, we show the controller canonical form and the observer canonical form, respectively.
V(2)
V(3)
Figure 2.20 The controller canonical form of the systematic encoder in Example 2.35.
(2)
Figure 2.21 The observer canonical form of the systematic encoder in Example 2.35.
V(2)
V(3)
From Theorem 2.17 and equation (2.40) it follows that every basic encoding matrix has the greatest common divisor of all b x b minors equal to 1. Thus, it must have some b x b submatrix whose determinant is a delayfree polynomial, since otherwise all subdeterminants would be divisible by D. Premultiplication by the inverse of such a submatrix yields an equivalent systematic encoding matrix, possibly rational. Thus, we have the following
Theorem 2.71 Every convolutional generator matrix is equivalent to a systematic rational encoding matrix. Remark. If the determinant of the leftmost b x b submatrix of G (D) is not a delayfree polynomial, then we can always, by permuting the columns of G(D), find a generator matrix G'(D) whose leftmost b x b submatrix has a determinant which is a delayfree polynom, where G'(D) encodes an equivalent code. Hence, without loss of generality we can write a systematic encoding matrix G(D) = (Ib R(D)).
Convolutional Encoders-Structural Properties
Chapter 2
98
EXAMPLE 2.36
Consider the rate R = 2/3 nonsystematic convolutional encoder illustrated in Fig. 2.9. It has the minimal-basic encoding matrix
1+D D
G(D)=
D2
1
(2.412)
1+D +D2
1
with µ = v = 3. Let T (D) be the matrix consisting of the first two columns of G(D): T (D) =
++ZD
(2.413)
1
)
We have det(T(D)) = 1 + D + D3, and
T-' (D) =
I + D + D3
(
D2
1
(2.414)
+DD
Multiplying G(D) by T-1(D) yields a systematic encoding matrix Gsys(D) equivalent to G(D):
Gsys(D) = T-'(D)G(D) 1
1
1+D+D3
D
1+D D
D2 1+D ) \
D2
1
1
1+D+D2 (2.415)
1+D+D2+D3 1+D+D3 1+D2+D3
1+D+D3 Its realization requires a linear sequential circuit with feedback and it = 3 memory elements as shown in Fig. 2.22.
Figure 2.22 The observer canonical form of the systematic encoding matrix in Example 2.36.
The systematic encoding matrix in the previous example was realized with the same number of memory elements as the equivalent minimal-basic encoding matrix (Example 2.15). Hence, it is a minimal encoding matrix. Every systematic encoding matrix (2.29),
G(D) = (Ib R(D))
(2.416)
where Ib is a b x b identity matrix and R (D) is a b x (c - b) matrix whose entries are rational functions of D, has a trivial right inverse, viz., the c x b matrix
G-' (D) =
(2.417)
which is polynomial in both D and D-t. Hence, it follows from Theorem 2.37 that this minimality holds in general:
Theorem 2.72 Every systematic encoding matrix is minimal.
Section 2.10
Systematic Convolutional Encoders
99
EXAMPLE 2.37 Consider the rate R = 2/4 minimal-basic encoding matrix
1+D D
G(D) =
D
1
D
1
(2.418)
D I+ D
with µ = v = 2. Let T (D)
=
1+D
D
D
1
(2.419)
Then, we have
T-'(D) =
1
1
1+D+D2
D
(2.420)
D 1+D
and
Gsys(D) = T-'(D)G(D) D
1+D+D2 1
0
0
1
(
D 1+D X 1 DD
1+D2
1
D
1+DD (2.421)
D2
1+D+D2 1+D+D2 D2
1
1+D+D2 1+D+D2 /
Gsys(D) has neither a minimal controller canonical form nor a minimal observer canonical form, but by a standard minimization method for sequential circuits [Lee78] we obtain the minimal realization shown in Fig. 2.23. (This minimization is described in Appendix A.)
V(3)
Figure 2.23 A minimal realization of the systematic encoding matrix in Example 2.37.
Consider the c x 1 polynomial syndrom e former
hi (D) HT (D) =
h2 (D)
(2.422)
hT(D)
w ith
gcd(hi (D), hz (D), ... , hT (D)) = 1
(2.423)
for a rate R = (c - 1)/c convolutional code C. From (2.423) it follows that at least one of the polynomials hT (D) is delayfree. We can without loss of essential generality assume that
Convolutional Encoders-Structural Properties
Chapter 2
100
hT (D) is delayfree. Then we can rewrite (2.375) as
hi (D) (v(1)(D) v(z)(D)
... v(`) (D))
h2 (D)
=0
(2.424)
hT (D)
which can be used to construct a systematic encoder as follows. Assume that the first c - 1 output sequences are identical to the c - 1 input sequences; that is,
v(`)(D) = u(`)(D), i = 1, 2, .. . , c -
1
(2.425)
Inserting (2.425) in (2.424) gives the following expression for determining the last output sequence v(`) (D) = (hT
(D))-1(ul1)
(D)hi (D) + ... + u(c-1) (D)h (D))
(2.426)
Hence, we have
v(D) = u(D)Gsys(D)
(2.427)
where
hi (D)/hC (D) h2 (D)/ hT (D)
1
Gsys(D) = 1
(2.428)
h 1(D)/ he (D)
is a (c - 1) x c systematic rational encoding matrix for the code C obtained directly from the syndrome former HT (D). Gsys(D) is realizable since hT (D) is assumed to be delayfree. EXAMPLE 2.36 (Cont'd) The syndrome former corresponding to the generator matrix G(D) given in (2.412) was determined in Example 2.33: h1 (D)
HT
(D) =
h2
(D)
1+D+D2+D3 =
1 + DZ + D3
1+D+D
h,T (D)
(2.429)
3
By inserting (2.429) in (2.428) we again obtain the systematic encoding matrix given in (2.415). Only a slight modification of the syndrome former in Fig. 2.18 is required to obtain the observer canonical form of the systematic encoding matrix in Fig. 2.22.
Next we consider the c x (c - b) polynomial syndrome former
HT (D) =
h11 (D)
h12(D)
h2 1(D)
h2 2(D)
...
h1(D) h (D) ...
h1(C_b)(D)
h2(c-b)(D)
hC
(2.430)
_b) (D)
for a rate R = b/c convolutional code C. Assume that H(D) is an encoding matrix for the convolutional dual code C1; then H (0) has full rank and there exists a (c - b) x (c -b) submatrix of H (0) whose determinant is 1. It follows that the determinant of the same submatri x of H (D)
Section 2.10
Systematic Convolutional Encoders
101
is a delayfree polynomial and, thus, has a realizable inverse. Assume without loss of essential generality that this submatrix consists of the last (c - b) rows in HT (D). We can now construct a systematic rational encoding matrix for C from (2.375) as follows. Assume that the first b output sequences are indentical to the b input sequences; that is,
v(`)(D) = u(`)(D), i = 1, 2, ... , b
(2.431)
Inserting (2.431) into (2.375) where HT (D) is given by (2.430) gives (u(1)(D) u(2)(D)
x
... u(b)(D)v(b+1)(D) v(b+2)(D) ... v(c)(D))
h11(D)
h12(D)
... h(c_b)(D)
h21(D)
h22 (D)
...
h 1(D)
h2(D)
... h
(2.432)
h2(c-b) (D)
=0
(c-b) (D)
or, equivalently, (v(b+1)(D) v(b+2)(D)
X
... v(c)(D))
h(b+1)1(D)
h(b+1)2(D)
...
T h(b+l)(c-b)(D)
h(b+2)1(D)
h(b+2)2(D)
...
h(b+2)(c-b) (D)
...
h (c-b) (D)
hC (D)
h
2(D)
h11(D)
_ (u(1)(D)u(2)(D)...u(b)(D))
h12(D)
(2.433)
... h(c-b)(D)
(hl(D)h2D...hC_bD) hb1(D)
hb2(D)
...
hb(c-b)(D)
Then we have (v(b+l)(D) v(b+2)(D)
... vW (D))
(2.434)
= (u(1)(D) u(2)(D) ... u(b)(D))H'(D) where
hil(D) h12(D) ... hl(c-b)(D) h22(D)
...
hbl(D) hb2(D)
...
h21(D)
H'(D) =
h2(c b)
(D)
hb(c-b)(D)
(2.435)
x
h(b+1)1(D)
h(b+1)2(D)
...
h(b+2)1(D)
h(b+2)2(D)
...
h 2(D)
h2(D)
h T(b+1)(c-b) (D)
h(b+2)(c-b) (D)
T
h (c
b)
(D)
is a b x (c - b) realizable rational matrix. Thus, we have a rational systematic encoding matrix for the convolutional code C
GSys(D) = (Ib H'(D)) where H'(D) is given in equation (2.435).
(2.436)
Convolutional Encoders-Structural Properties
Chapter 2
102
EXAMPLE 2.38 The rate R = 2/4 minimal-basic encoding matrix (2.418) in Example 2.37 has the Smith form decomposition
G(D) = A(D)r(D)B(D)
/ 1+D 1
D
0 1/ 1
0
0
0
1
0
1
0 0
D
1
D2
1+D2
0
D 1+D+D2
1
1
0
0
0
0
0
1
(2.437)
and, hence,
B ' (D) =
0
1
0
1
D2
1+D+D2 1+D+D2
1
1
1 + D + D2
1 + D2
0
0
0
1
1+D2
(2.438)
Since the syndrome former is the last c - b columns of B -'(D) (2.372), we have
H' (D) _
1
1 + D2 n2
1 + D + D2
1 + D + D2
1 + D2
0
1
1
_,_ n -j- n2 (2.439)
The encoding matrix for the convolutional dual code C1, viz.,
H(D)=
1+D2
1+D+D2 0
D2
1+D+D2 1+D+D2
1+D2
(2.440)
1
is not minimal-basic; [H(D)In does not have full rank. Since H(D) is obtained from the unimodular matrix B(D), it is basic and, hence, we can apply our minimization algorithm (Algorithm MB, given in Section 2.5) and obtain an equivalent minimal-basic encoding matrix for the convolutional dual code C1,
H b(D) =
1+D D
D
1
D 1+D
D
(2.441)
I
The (c - b) x (c - b) matrix formed by last two rows of H,nb(0) has full rank, which implies that the determinant of the same submatrix of H,nb(D) is a delayfree polynomial and, thus, has a realizable inverse. Hence, we have from (2.434) that (v(3)(D) v(4) (D)) 1
_ (u(1) (D) u(2) (D))
D
D 1+D
1+D
D
11-
D
(2.442)
=
(u (" (D) u (2' (D))
1
1 + D2 + D + DZ
DZ
1+D
DZ
1
1+D+D2 1+D+ D2 Finally, we have the following systematic rational encoding matrix obtained via the syndrome former: 1 + D2 GSys(D) =
D2
1+D+D2 1+D+D2 D2
1
1+D+D2 1+D+D2 which is identical to (2.421).
\ (2.443)
Problems
103
2.11 COMMENTS Massey made early contributions of the greatest importance to the structural theory of convolutional encoders. Together with Sain [MaS67], he defined two convolutional generator matrices to be equivalent if they encode the same code. They also proved that every convolutional code
can be encoded by a polynomial generator matrix. Later, they studied conditions for a convolutional generator matrix to have a polynomial right inverse [MaS68] [SaM69]. Massey's work in this area was continued by his students Costello [Cos69] and Olson [O1s70]. Costello was apparently the first to notice that every convolutional generator matrix is equivalent to a systematic encoding matrix, in general nonpolynomial. By exploiting the invariant-factor theorem and the realization theory of linear systems, Forney generalized, deepened, and extended these results in a series of landmark papers [For70a] [For73a] [For75]. Among the pre-Forney structural contributions, we also have an early paper by Bussgang [Bus65].
In the late 1980s and early 1990s, there was a renewed interest in the structural analysis of convolutional codes. In 1988 Piret published his monograph [Pir88], which contains an algebraic approach to convolutional codes. In a semitutorial paper, Johannesson and Wan [JoW93] rederived many of Forney's important results and derived some new ones using only linear algebra-an approach that permeates this chapter. In [For91a] Forney extended and deepened his results in [For75]. This inspired Johannesson and Wan to write [JoW94], which together with [FJW96] constitute the basis for Sections 2.7 and 2.8. Other recent important developments are reported in [FoT93] [LoM96]. Finally, we would like to mention Massey's classic introduction to convolutional codes [Mas75].
PROBLEMS 2.1 Consider the rate R = 1/2 convolutional encoder illustrated in Fig. P2.1. (a) Find the generator matrix G(D).
(b) Let G(D) = A(D)r(D)B(D). Find A(D), F(D), and B(D). (c) Find G'(D), where G'(D) is the first b rows in B(D). (d) Is G' (D) minimal? If not, find a minimal encoding matrix G,,,i (D) equivalent to G' (D).
(e) Find A-'(D), B-' (D), and G-' (D). (f) Find H(D) and verify that G(D)HT (D) = 0. V(1)
Figure P2.1 Encoder used in Problem 2.1.
2.2 Repeat Problem 2.1 for the encoding matrix 11
G=
10
01
11
11
10
01
11
11
10
01
11
Convolutional Encoders-Structural Properties
Chapter 2
104
2.3 Draw the encoder block diagram for the controller canonical form of the encoding matrix
G(D) =
1+D D 1+D D
1
1
and repeat Problem 2.1. 2.4 Consider the encoder shown in Fig. P2.4. V(1)
Figure P2.4 Encoder used in Problem 2.4.
(a) Find the generator matrix G(D). (b) Find the Smith form. (c) Is the generator matrix catastrophic or noncatastrophic? (d) Find G-1(D). 2.5 Consider the rate R = 1/2 convolutional encoding matrix G (D) = (1 + D3
1 + D + D2 +
D3).
(a) Find the Smith form decomposition.
(b) Find G-1(D). 2.6 Consider the rate R = 2/3 rational convolutional encoding matrix
G(D) =
1
D
1
1+D2
1+D2
1+D
D
1
1+D2
1+D2
(a) Find the invariant-factor decomposition.
(b) Find G-1(D). 2.7 The behavior of a linear sequential circuit can be described by the matrix equations
Qt+l = AQt + Bu,
ut = Co + Hu, where u, v, and o,, are the input, output, and encoder state, respectively, at time t. (a) Show that applying the D-transforms to each term in the matrices yields
o-) = o,(D)A+u(D)B v(D) = o'(D)C + u(D)H where oro is the initial value of a,. (b) Show that the transfer function matrix is
G(D) = H + B(I + AD)-1 CD (c) Show that the abstract state s(D) corresponding to the encoder state o can be expressed as
s(D) = o(I + AD)-1C 2.8 Consider the encoding matrix in Problem 2.3. Find the correspondence between the encoder and abstract states. 2.9 Consider the two minimal-basic encoding matrices
G' (D)
(l+D =
D2
D
1
1
1 + D + DZ
Problems
105 and
GZ(D)=
7 1+D
D
1
1+D+D2 D2
1
(a) Show that GI (D) and G2(D) are equivalent. (b) Find an isomorphism between the state spaces of Gi (D) and G2(D). Hint: Study the proof of Lemma 2.33. 2.10 Consider the rate R = 1/2 convolutional encoding matrix G(D) = (1 + D + D2 1 + D2). (a) Find the syndrome former HT (D) and draw its controller and observer canonical forms. (b) Find a systematic encoding matrix Gsys(D) equivalent to G(D) and draw its controller and observer canonical forms. (c) Find an encoding matrix for the convolutional dual code and draw its observer canonical form.
2.11 Consider the rate R = 2/3 convolutional encoding matrix
G(D)=(l+D D 1+D D
1
1
/
(a) Find the syndrome former HT (D) and draw both its controller and observer canonical forms.
(b) Find a systematic encoding matrix Gsys(D) equivalent to G(D) and draw its controller and observer canonical forms. (c) Find an encoding matrix for the convolutional dual code and draw its controller canonical form.
(d) Find a right inverse to the encoding matrix in (b) and draw its observer canonical form. 2.12 Consider the rate R = 2/3 convolutional encoding matrix
G(D) = 1
1
D2
1+ D+ D2
1
1+D
D
and repeat Problem 2.11. 2.13 Consider the rate R = 1/2 nonsystematic encoding matrix G (D) = (1 + D + D2
1 + D2). Find two systematic polynomial encoding matrices that are equivalent over a memory length.
2.14 Consider the rate R = 2/3 convolutional encoding matrix
G(D) =
I+D
D
1
1+D2+D3 1+D+D2+D3 0
(a) Find without permuting the columns an equivalent systematic rational encoding matrix and draw both its controller and observer canonical forms. (b) Find a systematic polynomial encoding matrix that is equivalent to G(D) over a memory length. (c) Compare the result with the systematic encoding matrix in Example 2.36. 2.15 Consider the rate R = 2/3 convolutional encoding matrix
G(D) = C
1
D2
1 + D + D2
1
1+D
D
(a) Is G(D) basic? (b) Is G(D) minimal? (c) Let G' (D) be the b first rows in the unimodular matrix Bin the Smith form decomposition
of G(D). Is G'(D) minimal? 2.16 Consider the rate R = 2/3 convolutional encoding matrix
G(D) _ (
1
1
0
1
1+D
1
106
Convolutional Encoders-Structural Properties
Chapter 2
(a) Find G-1(D). (b) Is G(D) basic? (c) Is G(D) minimal? If not, find a minimal encoding matrix Gmi (D) equivalent to G(D). 2.17 Consider the rate R = 2/3 convolutional encoding matrix
G=
1
1
D
1+D3
1+D+D2
1+D3
D2
D
1+D+D2+D3
1+D3
I+D3
1+D3
(a) Find G-1(D). (b) Is G-1(D) realizable? (c) Is G(D) catastrophic? 2.18 Consider the rate R = 2/3 convolutional encoding matrix
G(D) =
I D2
D+D3
I
D2+D3
I+D3 1+D2+D3+D4 D+D2+D3+D4
(a) Is G(D) basic? (b) Is G(D) minimal? 2.19 Consider the rate R = 4/5 convolutional generator matrix
I+D+D4
G(D)=
1+D 1+ D 1+D2
D4
1+D2
1+D4
D+D3 1+D3 D2+D3 I
D
D+D2 1+D
D2+D4 1+D2+D3
0
1
D2
1+D+D2
(a) Is G(D) an encoding matrix? (b) Is G(D) basic? (c) Is G(D) minimal? If not, find a minimal encoding matrix Gmin(D) equivalent to G(D). 2.20 Consider the rate R = 2/3 convolutional encoder illustrated in Fig. P2.20. (a) Find the generator matrix G(D). (b) Is G (D)minimal-basic? If not, find a minimal-basic encoding matrix Gmb(D) equivalent
to G(D). (c) Is G(D) minimal? (d) Is the encoder in Fig. P2.20 minimal?
V(3)
Figure P2.20 Encoder used in Problem 2.20.
107
Problems
2.21 Consider the rate R = 2/3 convolutional encoder illustrated in Fig. P2.21. (a) Find the generator matrix G(D). (b) Is G (D) minimal-basic? If not, find a minimal-basic encoding matrix G. b (D) equivalent to G(D). UM V(2)
Figure P2.21 Encoder used in Problems 2.21 and 2.26.
2.22 Consider the rate R = 2/3 convolutional encoding matrix
G(D) =
(D D k.
1
1
D2 1+D+D2
(a) Is G(D) minimal? (b) Find a systematic encoding matrix Gsys(D) equivalent to G(D). 2.23 Consider the rate R = 2/3 convolutional encoding matrix
G(D) =
1
D
1
1+D+D2
I+D3
1+D3
D2
1
1
1+D3
I+D3
I+D
(a) Is G(D) minimal? (b) Find a systematic encoding matrix equivalent to G(D). (c) Find G-1(D). 2.24 Consider the two rate R = 2/3 convolutional encoding matrices 1
(I + D)3 1 + D + D3
G1(D) =
1+D2 +D I+D+D3
0
3
0
1
and
G2(D)
1+D
1
D
D2
1 + D + D2
1
(a) Are G1(D) and G2(D) equivalent? (b) Is G I (D) minimal? (c) Is G2(D) minimal? 2.25 Consider rate R = 1/3 convolutional code with encoding matrix
G(D)=( I+D+D2 1+D2+D4 1+D+D2 1+D4
1+D4
1+D2
(a) Is 1 11 010 011 000 000 000... a codeword? (b) Find a minimal encoder whose encoding matrix is equivalent to G(D). 2.26 Consider the rate R = 2/3 convolutional encoder illustrated in Fig. P2.21. (a) Is the encoder in Fig. P2.21 minimal? (b) Find a systematic encoder that is equivalent to the encoder in Fig. P2.21.
Convolutional Encoders-Structural Properties
Chapter 2
108
2.27 Consider the rate R = 2/4 convolutional encoder illustrated in Fig. P2.27. Is the encoder in Fig. P2.27 minimal?
Figure P2.27 Encoder used in Problem 2.27.
2.28 Consider the cascade of a rate R° = b°/c° outer convolutional encoder with generator matrix G°(D) and a rate R; = b,/ci inner convolutional encoder with generator matrix G' (D), where bi = c° (Fig. P2.28). Show that if G°(D) and G' (D) are both minimal, then the cascaded generator matrix defined by their product G°(D) = G°(D)G' (D) is also minimal [HJS98].
b°
Outer convolutional encoder
co = bi
Inner
convolutional encoder
ci
Figure P2.28 A cascade of two consecutive convolutional encoders used in Problem 2.28.
2.29 Let Gmb(D) be a minimal-basic encoding matrix with memory mb. Show that mmb is minimal over all equivalent polynomial generator matrices. Hint: Use the predictable degree property. 2.30 * Let Gc(D) be a canonical encoding matrix with memory mc. Show that me is minimal over all equivalent rational generator matrices. Hint: Use the GPVP.
K Distance Properties of Convolutional Codes
Several important distance measures for convolutional codes and encoders are defined. We also derive upper and lower bounds for most of these distances. Some of the bounds might be useful guidelines when we construct new encoders, others when we analyze and design coding systems. The distance spectrum for a convolutional code is obtained via the path enumerators that are determined from the state-transition diagram for the encoder in controller canonical form.
3.1 DISTANCE MEASURES-A FIRST ENCOUNTER In this section we discuss the most common distance measures for convolutional codes.
Consider a binary, rate R = b/c convolutional code with a rational generator matrix G(D) of memory m. The causal information sequence
u(D) =uo+u1D+u2D2+
(3.1)
is encoded as the causal codeword
v(D)=vo+viD+u2D2+
(3.2)
v(D) = u(D)G(D)
(3.3)
where
For simplicity we sometimes write u = uou1 ... and v = v0v1 ... instead of u(D) and v(D), respectively. First, we consider the most fundamental distance measure, which is the column distance [Cos69].
Definition Let C be a convolutional code encoded by a rational generator matrix G(D). The jth order column distance djc of the generator matrix G(D) is the minimum Hamming distance between two encoded sequences v[0,J] resulting from causal information sequences u[o,l] with differing u0.
109
Chapter 3
110
Distance Properties of Convolutional Codes
From the linearity of the code it follows that d is also the minimum of the Hamming weights of the paths v[o.1] resulting from causal information sequences with uo 0 0. Thus, dd = min{WH(v[o,j])}
(3.4)
uo:A0
where wH( ) denotes the Hamming weight of a sequence. Let
G(D)=Go+G1D+- - - +GmDm
(3.5)
be a polynomial generator matrix of memory m and let the corresponding semi-infinite matrix G be (1.74): Go
G1 Go
G=
...
Gm
G1
...
Gm
(3.6)
where Gi, 0 < i < m, are binary b x c matrices. Denote by Gc the truncation of G after j + 1 columns; that is, Go
... Gj
G1
G2
Go
G1
Gj_1
Go
Gj_2
Gi =
(3.7)
Go
where Gi = 0 when i > m. Making use of (1.72), we can rewrite (3.4) as djc = min {WH (u[o,l]Gc)}
(3.8)
uo#0
From (3.7) and (3.8) it follows that to obtain the jth order column distance dj' of the polynomial generator matrix (3.5), we truncate the matrix G after j + 1 columns. EXAMPLE 3.1 Consider the convolutional code C encoded by the encoding matrix D
G(D) = ( Ol
0
(3.9)
)
where
G(0) = Go =
oil
\
100
(3.10)
/
has full rank. The encoding matrix G(D) has the column distances do = min{wH(uoGo)} = WH ((01)
( 0100 )) = 1
(3.11)
and
di = m in WH UO-AO I
= wH ((0100)
11 Go / /
Go 011
000
100
0010 1
100
(3.12)
_2
Section 3.1
Distance Measures-A First Encounter
111
The equivalent generator matrix
G(D)=\ 1 ID
I
)(
D
0)
I+D)
1
1
I
1+D
(3.13)
1
has Go
_
111 111
(3.14)
of rank Go = 1. By choosing uo = 11 we obtain do' = WH(uoG0) = 0
(3.15)
Hence, there is a nontrivial transition from the zero state (not to the zero state) that produces a zero output. The two equivalent generator matrices for the code C have different column distances.
From Example 3.1 it follows that the column distance is an encoder property, not a code property.
In Chapter 2 we defined an encoding matrix as a generator with G(0) of full rank. The main reason for this restriction on G(0) is given in the following.
Theorem 3.1 The column distance is invariant over the class of equivalent encoding matrices. Proof.
Let Ccd be the set of causal and delayfree codewords; that is, C,d
aef
{v E C I vi = 0,
i < 0, and vo : 0}
(3.16)
The set Ccd is a subset of C, Cd C C, but it is not a subcode since it is not closed under addition.
The set of causal and delayfree codewords Cd depends only on C and is independent of the chosen generator matrix. Then the theorem follows from the observation that for encoding matrices the minimization over u0 0 in (3.4) is a minimization over {vl0,Jl I V E C,d}. Theorem 3.1 leads to the following
Definition Let C be a convolutional code. The jth order column distance of C is the jth order column distance of any encoding matrix of C. The mth order column distance dm of a rational generator matrix of memory m, where
the memory of a rational generator matrix is defined by (2.240), is sometimes called the minimum distance (of the generator matrix) and denoted d,,,;,,. It determines the error-correcting capability of a decoder that estimates the information symbol u0 based on the received symbols over the first memory length only, that is, over the first n,,, = (m + 1)c received symbols. A good computational performance for sequential decoding (to be discussed in Chapter 6) requires a rapid initial growth of the column distances [MaC71 ]. This led to the introduction of the distance profile [Joh75].
Definition Let G(D) be a rational generator matrix of memory m. The (m + 1)-tuple
dP=(dp,di,...,dm)
(3.17)
where djc, 0 < j < m, is the jth order column distance, is called the distance profile of the generator matrix G(D). The distance profile of the generator matrix is an encoder property. However, since the jth order column distance is the same for equivalent encoding matrices and the memory is the same for all equivalent minimal-basic (canonical) encoding matrices, we can also define the distance profile of a code:
Chapter 3
112
Distance Properties of Convolutional Codes
Definition Let C be a convolutional code encoded by a minimal-basic encoding matrix Gmb(D) of memory m. The (m + 1)-tuple
d'=(do,di,...,dc,,)
(3.18)
where dj, 0 < j < m, is the jth order column distance of Gmb(D), is called the distance profile of the code C.
A generator matrix of memory m is said to have a distance profile dP superior to a distance profile dP' of another generator matrix of the same rate R and memory m, if there is some £ such that d`
=dj, j=0,1,...,.E-1 >djc',
j=f
(3.19)
Moreover, a convolutional code C is said to have an optimum distance profile (is an ODP code)
if there exists no generator matrix of the same rate and memory as C with a better distance profile.
A generator matrix G(D) with optimum do must have G(0) of full rank. Hence, a generator matrix of an ODP code is an ODP encoding matrix. An ODP encoding matrix causes the fastest possible initial growth of the minimal separation between the encoded paths diverging at the root in a code tree. Remark. We notice that only in the range 0 < j < m each branch on a code sequence v[o.J] is affected by a new portion of the generator matrix as one penetrates into the trellis. The great dependence of the branches thereafter militates against the choice
dp _ (do, dl, ... , dam)
(3.20)
as does the fact that dc,, is probably a description of the remainder of the column distances, which is quite adequate for all practical purposes. Let G'(D) be a rational encoding matrix of memory m'. We denote by G'(D) j,,,, where m < m', the truncation of all numerator and denominator polynomials in G'(D) to degree m. Then it follows that the encoding matrix G(D) of memory m and the encoding matrix G"(D) = T (D)G(D) Im, where T (D) is a b x b nonsingular rational matrix, are equivalent over the first memory length m. Hence, they have the same distance profile. Let, for example,
G(D) be a systematic encoding matrix. Then we can use this property to generate a set of nonsystematic encoding matrices with the same distance profile. EXAMPLE 3.2 The systematic encoding matrix
Gsys(D) = (1
1 + D + D2)
(3.21)
has the optimum distance profile dP = (2, 3, 3). The nonsystematic encoding matrix
G(D)_(1+D+D2)(1 1+D+D2)12
=(1+D+D2 1+D2)
(3.22)
is equivalent to GSys(D) over the first memory length and, hence, has the same distance profile.
Theorem 3.2 The column distances of a generator matrix satisfy the following conditions: (i)
d<
j = 0, 1, 2, ...
(3.23)
Section 3.1
Distance Measures-A First Encounter
113
(ii) The sequence do, d', d2, ... is bounded from above. (iii) dj' becomes stationary as j increases. Proof.
(i) Assume that
dj+t = WH(v[o,.i+]])
(3.24)
d+1 > WH(v[o,i]) > d'
(3.25)
where vo 0 0. Then,
(ii) It follows from the controller canonical form of a generator matrix that for any j there exists an input sequence u[o,J] such that the Hamming weight of the output sequence v[o,j] is less than or equal to the number of nonzero coefficients in the numerator polynomials of G(D). This number is finite. (iii) Follows immediately from (i) and (ii).
Thus, the column distance d is a nondecreasing function of j. It is sometimes called the column distance function [ChC76]. Moreover, the limit
d'00 = lim d; -.oo
(3.26)
do
(3.27)
exists, and we have the relations
Definition Let C be a convolutional code. The minimum Hamming distance between any two differing codewords, dfree = min{dH(v, v')} v#v'
(3.28)
is called the free distance of the code.
From the linearity of a convolutional code it follows immediately that dfree is also the minimum Hamming weight over the nonzero codewords. The free distance is a code property! In Fig. 3.1 we illustrate an example of two codewords, v and v', in a trellis. Assume that v = 0. Then the free distance of a code C is the smallest Hamming weight of a codeword v' that makes a detour from the allzero codeword.
v=0
Figure 3.1 Two codewords in a trellis.
The free distance is the principal determiner for the error-correcting capability of a code when we are communicating over a channel with small error probability and use maximumlikelihood (or nearly so) decoding. Let .5, be the set of all error patterns with t or fewer errors. As a counterpart to Theorem 1.1 for block codes we have:
Theorem 3.3 A convolutional code C can correct all error patterns in Ft if and only if dfree > 2t. Proof.
See the proof of Theorem 1.1.
Distance Properties of Convolutional Codes
Chapter 3
114
Theorem 3.4 For every convolutional code C, (3.29)
dfree = d. Proof.
From Theorem 3.2 (iii) it follows that there is an integer k such that dk = dk+1 =
= d.
(3.30)
Clearly, there exists a codeword of weight dk. Let G(D) be a polynomial encoding matrix of C. By definition of dk an encoded sequence V[O,k] results from a causal information sequence 0 such that wH(V[o,k]) = dk. Since Go = G(0) has full rank, for each i > k u[o,k] with uo we can choose ui such that
ui Go + ui_1 G1 + ... + ui_mGn = 0
(3.31)
by induction, where u, = 0 if n < 0. Then, (u[O,k]uk+luk+2 ...)G = (V[0 k]O 0...) E C
(3.32)
where G is the matrix (3.6), and we conclude that WH(V[o,k]O 0...) = WH(V[O,k]) = dk
(3.33)
dfree < dk = d,0
(3.34)
Hence,
We can assume that C has a codeword v = vovl ... of weight dfree with vo 0 0. For all j we have djc < WH(VOV1 ... v1) < dfree
(3.35)
Let j > k and the proof is complete. Often we need more detailed knowledge of the distance structure of a convolutional code. Let ndfree+i denote the number of weight dfree + i paths which depart from the allzero path at the root in the code trellis and do not reach the zero state until their termini. We call ndf,ee+i the (i + 1)th spectral component. The sequence ndfree+i ,
i = 0, 1, 2....
(3.36)
is called the weight spectrum of the code. The generating function for the weight spectrum,
T(W) =
ndfe+jWdf-+i
(3.37)
i=o
is called the path weight enumerator and will be studied in depth in Section 3.10. The zero state driving information sequence for a rational generator matrix with a given encoder realized in controller canonical form is the sequence of information tuples that causes the memory elements to be successively filled with zeros. The length of this sequence is at
most m, where m is the length of the longest shift register. Denote the zero state driving information sequence starting at time t + 1 by uz t+m] In general, it depends on the encoder state at time t, and it can happen that the encoder is driven to the zero state in fewer than m steps. To simplify notations we also use t+m] as the zero driving information sequence in these cases. For a polynomial generator matrix of memory m we have u(s,t+m] = 0. As a counterpart to the column distance we have the row distance [Cos69].
Definition The jth order row distance df of a rational generator matrix of memory m realized in controller canonical form is the minimum of the Hamming weights of the paths V[o,j+m] resulting from information sequences u[o,j+m] = u[o,j]uzs,j+m], where u[o, J] 0 0, and
remerging with the allzero path at depth at most j + m + 1.
Section 3.1
Distance Measures-A First Encounter
115
Let G(D) be a polynomial generator matrix and let the corresponding semi-infinite matrix G be (3.6). Denote by G j the matrix formed by truncating the semi-infinite matrix G after its first j + 1 rows, that is, Go
G1 Go
G=
...
Gm
G1
...
Gm
Go
G1
...
Gm
Go
G1
(3.38)
...
Gm
Suppose that G(D) is realized in controller canonical form. Then we have d = min { WH (u[o, j]G'j) }
(3.39)
Theorem 3.5 The row distances of a rational generator matrix realized in controller canonical form satisfy the following conditions: (i)
d j+l < d; , j = 0, 1, 2....
(3.40)
djr >0, j=0,1,2,...
(3.41)
(iii) dj becomes stationary as j increases. Proof.
(i) Assume that djr = WH(V[O,j+m])
(3.42)
where v[o,j+ml results from an information sequence u[o, j+m] = u[o,j]u(J,j+m] with u[o,j] 0 0 and such that the path remerges with the allzero path at depth at most j + m + 1. Then there exists an information tuple u'j+1 such that
dj
= WH(V[O,j+m]0)
> min[ wH(VEo j+l+m])} > dj+l
(3.43)
ui+1
where v'o j+]+mJ results from an information sequence u[O j+1+m] with u[O+1] = u[O,j]uj+l
and such that the path remerges with the allzero path at depth at most j + m + 2. (ii) Assume that dj' = 0 for some j. Then there is an input sequence u[o, j+m] _ u[o,j]u1 j+m] with u[o, j] 0, which produces the output sequence v[o,j+m] = 0[o,j+m] and the zero state. This contradicts G (D) having full rank. (iii) Follows immediately from (i) and (ii). We define
d
def
lim d' j-*0
(3.44)
and have the relations
0
(3.45)
If we think of the row distances in terms of a state-transition diagram for the encoder, it follows
that do is the minimum weight of the paths resulting from only one freely chosen nonzero information tuple followed by k zero state driving information tuples, where v,,,i < k < m and vm;f = min, { v, }. These paths diverge from the zero state at some time instant and return to the zero state at k + I time instants later. Higher order row distances are obtained by allowing
Chapter 3
116
Distance Properties of Convolutional Codes
successively more freedom in finding the minimum weight path diverging from and returning to the zero state. The jth order row distance dj' is the minimum weight of paths of length at most j + m + 1 branches diverging from and returning to the zero state. Eventually, d is the minimum weight of paths diverging from and returning to the zero state. Since the column distance di is the minimum weight of a path of length i + I branches and with a first branch diverging from the zero state, it is obvious that
di <
all i and j
(3.46)
and, thus, that
do
(3.47)
Furthermore, if there are no closed circuits of zero weight in the state diagram except the trivial zero weight self-loop at the zero state, it follows that
d. =
(3.48)
We will give a formal proof of the important equality (3.48) below. First, we remark that the existence of a zero weight nontrivial circuit in the state-transition diagram is equivalent to the existence of an information sequence with infinite weight that is encoded as a finite weight sequence-a catastrophic encoder.
Theorem 3.6 Let G(D) be a noncatastrophic generator matrix. Then,
dg. =
(3.49)
Proof. If G(D) is noncatastrophic, then there exists a codeword v resulting from an information sequence of the form u = u[o,i]uz i+m]O, where u[o,i] # 0, such that dg, = WH(V) = WH(v[o,i+m]) ? di
(3.50)
Let k be the least positive integer such that dr k = drk+t
dro.
(3.51)
Then di > dk for i < k. By combining (3.47), (3.50), and (3.51) we obtain
di
dk
(3.52)
d` o0 oo = drk = dr
(3.53)
It follows that i > k and di = dk. Hence,
0 The row distance could probably blame equality (3.49) for not getting much attention in the literature. However, its significance should not be underestimated. It is easy to calculate and serves as an excellent rejection rule when generator matrices are tested in search for convolutional codes with large free distance. EXAMPLE 3.3 The state-transition diagram for the rate R = 1/2, binary, convolutional encoder of memory m = 3 and encoding matrix G(D) = (1 + D + D2 + D3 I + D2 + D3) shown in Fig. 3.2 is given in Fig. 3.3. V 1)
v(2)
Figure 3.2 A rate R = 1/2 ODP convolutional encoder.
Section 3.2
Active Distances
117
da=7 dtT
1/11
1/01 1
0/01 C) 1/00
T T d2=...=dam=6
da2 did2=3
0/00
d3=d4=4 d5=d6=5
0/11
dc=6
Figure 3.3 The state-transition diagram for the encoder in Fig. 3.2.
100 -+ 010
The row distance do is obtained from the circuit 000
d', dz. ... . d are obtained from the circuit 000 --> 100
110
011
001 ---> 000, and
001 -* 000. The
column distance do is obtained from 000 -> 100, d' from, for example, 000 100 -* 110, dz from 000 -+ 100 -+ 010 -+ 101, and so on. The row and column distances are shown in Fig. 3.4. We have the free distance dfree = d' = d' = 6 and the distance profile dP = (2, 3, 3, 4). This ODP code has an optimum free distance (OFD), which will be shown in Section 3.5.
d dr i
dfree
Figure 3.4 Row and column distances for the encoder in Fig. 3.2.
0
1
0
2
4
6
8
3.2 ACTIVE DISTANCES The column distance introduced in the previous section will not increase any more when the lowest weight path has merged with the allzero path. In this section we shall introduce a family of distances that stay "active" in the sense that we consider only those codewords that do not pass two consecutive zero states. As we shall see in Section 3.9, these distances are of particular importance when we consider concatenated encoders. Let the binary m-dimensional vector of b-tuples o t be the encoder state at depth t of a realization in controller canonical form of the generator matrix and let 0'(') be the b-tuple representing the contents of position i of the shift registers (counted from the input connections).
(When the jth constraint length vj < m for some j, then we set the jth component of OM to be 0.) Then we have crt = o,, Ol(t2) ... at"'). To the information sequence u = uout ... corresponds the state sequence or = oroo't ....
118
Chapter 3
Distance Properties of Convolutional Codes
Let S"[tl,tzl 2 denote the set of state sequences Q[t,,t2] that start in state o l and terminate in state 0'2 and do not have two consecutive zero states in between; that is, def
Qty = 01, ot2 = 0'2 and
{a'th,tzl I
(3.54)
ai, ai+i not both = 0, tl < i < t2} The most important distance parameter for a convolutional generator matrix is the active row distance; it determines the error-correcting capability of the code.
Definition Let C be a convolutional code encoded by a rational generator matrix G (D) of memory m which is realized in controller canonical form. The jth order active row distance is
ar def =
min
{WH(V[O, j+ml)}
(3.55)
S10,j+il°°i+i+i=0,1
where o, denotes any value of the state o j+i such that a(i) denotes the i j+1 =A 0, and first positions of the shift registers (counted from the input connections); that is, j+I+i =O'(l,i)
01
(1)
(2)
W
aj+l+i . . . j+l+i'
Let vmin be the minimum of the constraint lengths vi, i = 1, 2, ... , b, of the generator matrix G(D) of memory m; that is, vmin = mini{vi} and m = maxi{vi}. Then the active row distance of order j is the minimum weight of paths that diverge from the zero state at depth 0, possibly "touches" the allzero path only in nonconsecutive zero states at depth k, where 1 + vmin < k < j, and, finally, remerges with the allzero path at depth £, where
j+I+Vmin<_ e<j+1+m. For a polynomial generator matrix realized in controller canonical form we have the following equivalent formulation:
ar =
min,
{WH (u[o,j]G5)}
(3.56)
uj0, S[o.j+n
where o, denotes any value of the state o j+l with Go
G1
...
Gm
Go
G1
...
u j and Gm
(3.57) Go
G1
...
Gm
is a (j + 1) x (j + 1 + m) truncated version of the semi-infinite matrix G given in (3.6). Notice that the active row distance sometimes can decrease. As we shall show in Section 3.8, however, in the ensemble of convolutional codes encoded by periodically time-varying generator matrices there exists a convolutional code encoded by a generator matrix such that its active row distance can be lower-bounded by a linearly increasing function. From the definition follows immediately
Theorem 3.7 (Triangle inequality) Let G (D) be a rational generator matrix with vmin -m. Then its active row distance satisfies the triangle inequality
aj < ai +aj-i-l-m
(3.58)
where j > i + m and the sum of the lengths of the paths to the right of the inequality is
i+m+1+(j-i-m-1)+m+1= j+m+l that is, equal to the length of the path to the left of the inequality.
Furthermore, we have immediately the following important
(3.59)
Section 3.2
Active Distances
119
Theorem 3.8 Let C be a convolutional code encoded by a noncatastrophic generator matrix. Then min{a } = dfree
(3.60)
.i
The following simple example shows that the triangle inequality in Theorem 3.7 would not hold if we do not include state sequences that contain isolated inner zero states in the definition of S°''°z [ri,tzl EXAMPLE 3.4 Consider the memory m = 1 encoding matrix
G(D) = (1
D)
(3.61)
The code sequences corresponding to the state sequences (0, 1, 0, 1, 0) and (0, 1, 1, 1, 0) are (10, 01, 10, 01) and (10, 11, 11, 01), respectively. It is easily verified that ao = 2, a' = 4, and a2 = 4, which satisfy the triangle inequality
aZ < ao + ao
(3.62)
If we consider only state sequences without isolated inner zero states, the lowest weight sequence of length four would pick up distance 6 and exceed the sum of the weight for two length two sequences, which would still be four, violating the triangle inequality.
Remark. If we consider the ensemble of periodically time-varying generator matrices G (or G (D)) to be introduced in Section 3.6 and require that the corresponding code sequences consist of only randomly chosen code symbols (i.e., we do not allow transitions from the zero state to itself), then for a given length the set of state sequences defined by Sr`'zj is as large as possible.
Next, we will consider an "active" counterpart to the column distance:
Definition Let C be a convolutional code encoded by a rational generator matrix G (D) of memory m realized in controller canonical form. The j th order active column distance is def
a; = min {wH(vlo,jl)}
(3.63)
s
where o denotes any encoder state. For a polynomial generator matrix, we have the following equivalent formulation:
a = min {WH (u[o,l]Gi))
(3.64)
S[O, j+il
where n, denotes any encoder state and Go
G`j =
G1
...
Go
G1
Gm Gm
Go
G1
Go
...
G.
(3.65)
Gm-1
Go
I
is a (j + 1) x (j + 1) truncated version of the semi-infinite matrix G given in (3.6). It follows from the definitions that (3.66)
120
Chapter 3
Distance Properties of Convolutional Codes
where k < min{ j, Um;,,} and, in particular, if Vin = m < j, then we have (3.67) aj-m From (3.66) it follows that when j >_ vi the active column distance of order j is upperbounded by the active row distance of order j - vmi,,, that is, by the minimum weight of paths ac j <
of length j + 1 starting at a zero state and terminating at a zero state without passing consecutive zero states in between.
The active column distance a' is a nondecreasing function of j. As we will show in Section 3.8, however, in the ensemble of convolutional codes encoded by periodically timevarying generator matrices there exists a convolutional code encoded by a generator matrix such that its active column distance can be lower-bounded by a linearly increasing function. Definition Let C be a convolutional code encoded by a rational generator matrix G(D) of memory m. The jth order active reverse column distance is
a rc
= min
def
,o
(3.68)
{WH(V(m,j+ml)}
where a denotes any encoder state.
For a polynomial generator matrix we have the following equivalent formulation to (3.68):
arc = min o,o
(3.69)
{WH (u[o,j+mlG5`)}
S[m,m+i+tl
where o denotes any encoder state and Gm
G.
Gm-1
Gm-1
Gri
Go
(3.70)
Gm
Go
Gm-1
Go
/
is a (j + m + 1) x (j + 1) truncated version of the semi-infinite matrix G given in (3.6). The active reverse column distance a nondecreasing function of j. As we will show in Section 3.8, however, in the ensemble of convolutional codes encoded by periodically timevarying generator matrices there exists a convolutional code encoded by a generator matrix such that its active reverse column distance can be lower-bounded by a linearly increasing function. Furthermore, the active reverse column distance of a polynomial generator matrix G (D) is equal to the active column distance of the reciprocal generator matrix diag (1)°1 D°2 ... D°e)G(D-1)
Definition Let C be a convolutional code encoded by a rational generator matrix G (D) of memory m. The jth order active segment distance is
a,s def = min ,
2
s[m,m+;+n
where o j and 0'2 denote any encoder states.
{WH(v[m,j+ml))
(3.71)
Section 3.2
Active Distances
121
For a polynomial generator matrix, we have the following equivalent formulation:
a' = min x .02
{ WH
(3.72)
(u[o,j+m]G$) }
Stm,M+;+u
where o l and 0'2 denote any encoder states, and G' = G". If we consider the segment distances for two sets of consecutive paths of lengths i + 1 and (j - i - 1) + 1, respectively, then the terminating state of the first path is not necessarily identical to the starting state of the second path (see Fig. 3.5). Hence, the active segment distance for the set of paths of the total length j + 1 does not necessarily satisfy the triangle inequality. However, we have immediately the following t = to
t=to+i+1
t=to+j+1
Figure 3.5 An illustration to Theorem 3.9.
Theorem 3.9 Let G(D) be a generator matrix of memory m. Then its active segment distance satisfies the inequality
aj > a, +a'_i_t
(3.73)
where j > i and the sum of the lengths of the paths to the right of the inequality is
i+l+j-i-1+1=j+1
(3.74)
that is, equal to the length of the path to the left of the inequality.
The active segment distance a is a nondecreasing function of j. As we will show in Section 3.8, however, in the ensemble of convolutional codes encoded by periodically timevarying generator matrices there exists a convolutional code encoded by a generator matrix such that its active segment distance can be lower-bounded by a linearly increasing function. The start of the active segment distance is the largest j for which a' = 0 and is denoted Js
The jth order active row distance is characterized by a fixed number of almost freely chosen information tuples, j + 1, followed by a varying number, between vn,in and m, of zero state driving information tuples ("almost" since we have to avoid consecutive zero states Qi cri+l for 0 < i < j + 1 and ensure that cr 0.) Sometimes we find it useful to consider a corresponding distance between two paths of fixed total length, j + 1, but with a varying number of almost freely chosen information tuples. Hence, we introduce the following (final) active distance:
Definition Let C be a convolutional code encoded by a rational generator matrix G (D) of memory m. The jth order active burst distance is a1b
aef
min {wH(v[o,j])} S[O,;+
where j > Vmin
(3.75)
Chapter 3
122
Distance Properties of Convolutional Codes
For a polynomial generator matrix, we have the following equivalent formulation: b def
aJ = mm {wH(ulo,J]G )}
(3.76)
s[0,;+11
where G1 is given in (3.65). The active row and burst distances are related via the following inequalities:
j-
ab > min i {aj_1
I
(3.77)
aj > min jab
}
When vmi,, = m, we have
r undefined,
1 ar_,,,,
0<j<m j>m
(3.78)
For a noncatastrophic generator matrix we have min {ab} =
dfree
(3.79)
J
From the definition it follows that the active burst distance satisfies the triangle inequality. EXAMPLE 3.5 In Fig. 3.6 we show the active distances for the encoding matrix G(D) = (1 + D + D2 + D3 + D7 + D8 + D9 + D11 1 + D2 +D 3 + D7 + D$ + D9 + D'1). Notice that the active row distance of the 0th order, a', is identical to the row distance of the 0th order, do = 15, which upper-bounds dfree = 12, and
the start j, = 9. aj
01 20
40
60
80
100
NJ
Figure 3.6 The active distances for the encoding matrix in Example 3.5.
From the definitions it follows that the active distances are encoder properties, not code properties. However, it also follows that the active distances are invariants over the set of minimal-basic (or canonical if rational) encoding matrices for a code C. Hence, when we in the sequel consider active distances for convolutional codes, it is understood that these distances are evaluated for the corresponding minimal-basic (canonical) encoding matrices.
Section 3.3
Properties of Convolutional Codes via the Active Distances
123
3.3 PROPERTIES OF CONVOLUTIONAL CODES VIA THE ACTIVE DISTANCES We define the correct path through a trellis to be the path determined by the encoded information sequence, and we call the (encoder) states along the correct path correct states. Then we define an incorrect segment to be a segment starting in a correct state o-t, and terminating in a correct state ort2, tl < t2, such that it differs from the correct path at some but not necessarily all states
within this interval. Let e[k,e) denote the number of errors in the error pattern e[k,e), where e[k,e) = ekek+l ... et-1 For a convolutional code C with a generator matrix of memory m, consider any incorrect segment between two arbitrary correct states, mot, and o't2. A minimum distance (MD) decoder can output an incorrect segment between o t, and trt2 only if there exists a segment of length j + 1 c-tuples, Vm;n < j < t2 - tl, between these two states such that the number of channel errors e[t,,t2) within this interval is at least ab/2. Thus, we have the following Theorem 3.10 A convolutional code C encoded by a rational generator matrix of memory m can correct all error patterns e[t,,t2) that correspond to incorrect segments between any two correct states, ort, and o't2, and satisfy b e[t,+k,t,+l+i) < ai-k/2
(3.80)
for0
Corollary 3.11 A convolutional code C encoded by a rational generator matrix of memory m and smallest constraint length vmin = m can correct all error patterns e[t,,t2) that correspond to incorrect segments between any two correct states, Qt, and (7t2, and satisfy air-k-m/2
e[tl+k,ti+1+i) <
(3.81)
for0
e[i,t) < a[ _i-1/2, tl < i < t
e[t,J) < aj t-1/2, t < j < t2
(3.82)
Proof. Assume without loss of generality that the correct path is the allzero path. The weight of any path of length t - i diverging from the correct path at depth i, i < t, and not having two consecutive zero states is lower bounded by ar i_1 (see Fig. 3.7). Similarly, the weight of any path of length j - t, j > t, remerging with the correct path at depth j and not having two consecutive zero states is lower-bounded by t_1. Hence, if e[i t) < a't__1/2 and e[t, J) < a"t_, /2, then oi must be correct. Since c re at-i-1 + aJ-t
< l
b
aJ-i-1
(3.83)
(see Fig. 3.7), it follows that we can regard Theorem 3.10 as a corollary to Theorem 3.12.
Chapter 3
124
Distance Properties of Convolutional Codes
Figure 3.7 An illustration used in the proof Theorem 3.12.
EXAMPLE 3.6
Assume that the binary, rate R = 1/2, memory m = 2 convolutional encoding matrix G(D) = (1 + D + D2 1 + D2) is used to communicate over a binary symmetric channel (BSC) and that we have the following error pattern e[O,20) = 1000010000000001000000001000000000100001
(3.84)
or, equivalently,
e[o,20)(D) = (10) + (Ol)D2+ (O1)D7+ (1O)D12+ (1O)D17+ (Ol)D19
(3.85)
The active distances for the encoding matrix are given in Fig. 3.8. From Theorem 3.10 it is easily seen that if we assume that t7o is a correct state and that there exists a t' > 20 such that Q, is a correct state, then, despite the fact that the number of channel errors e[0,20) = 6 > dfree = 5, the error pattern (3.84) is corrected by a minimum distance decoder. aj
Figure 3.8 The active distances for the encoding matrix in Example 3.6.
The error pattern e[0 20) = 1010010000000000000000000000000000101001
(3.86)
or, equivalently,
e[o,2o)(D) = (10) + (10)D+ (01)D2+ (10)D17+ (10)D18+ (Ol)D19
(3.87)
also contains six channel errors but with a different distribution; we have three channel errors in both the prefix and suffix 101001. Since v,,;,, = m = 2 and the active row distance a' = 5, the active burst
Section 3.3
Properties of Convolutional Codes via the Active Distances
125
distance a2 = 5. Hence, Theorem 3.10 does not imply that the error pattern (3.86) is corrected by a minimum distance decoder; the states Ql, 02, 018, and 019 will in fact be erroneous states. However, from Theorem 3.12 it follows that if 0o is a correct state and if there exists a t' > 20 such that (7,, is a correct state, then at least 010 is also a correct state.
We will now study the set of code sequences corresponding to encoder state sequences that do not contain two consecutive zero states. From the properties of the active segment distance it follows that such code sequences can contain at most js + 1 zero c-tuples, where js is the start of the segment distance. Lower bounds on the number of nonzero code symbols between two bursts of zeros are given in the following
Theorem 3.13 Consider a binary, rate R = b/c convolutional code, and let and v[m,j,+m] denote code sequences corresponding to state sequences in So' ,'+1], S[m,'m+j'+1] respectively, where or, 01, and 02 denote any encoder states.
1,],
1,],
and
(i) Let w denote the number of ones in (the weight of) a code sequence vco j,] counted
from the beginning of the code sequence to the first burst of j consecutive zero c-tuples. Then wc satisfies c > acc
(3.88)
wj
(ii) Let denote the number of ones in (the weight of) a code sequence j,] counted from the last burst of j consecutive zero c-tuples to the end of the code sequence. Then w'C satisfies (3.89)
rc > rc
'ui
aj+[wrc/C]-1
(iii) Let
denote the number of ones in (the weight of) a code sequence vim, j,_m] j2 counted between any two consecutive bursts of j, and j2 consecutive zero c-tuples, respectively. Then w' J2 satisfies s
WS j2
-> ajl+j2+[wi,,iz/c]-1 s
(3.90)
Proof. (i) The subsequence up to the beginning of the first burst of j consecutive zero c-tuples consists of at least 1 w'/c] c-tuples. Thus, the length of the subsequence that includes
c-tuples; hence, w' must
the first burst of j consecutive zero c-tuples is at least j + satisfy (3.88). (ii) The proof is analogous to the proof of (i).
(iii) Since w', j2 is the weight of the subsequence between the two bursts of J1 and j2 consecutive zeros, respectively, the total length including these bursts of zeros is at least ji + 1w1,1, /c] + j2. The weight of a subsequence of this length is lower-bounded by the corresponding active segment distance, which completes the proof. EXAMPLE 3.7 Consider the code sequences encoded by the binary, rate R = 2/3, minimal-basic convolutional encoding matrix
G(D)=
D2
1+D
l+D+D2 1+D+D2
1+D+D2 1
(3.91)
The active distances are given in Fig. 3.9. In Fig. 3.10 we give the three paths in the trellis which correspond to the minimum values of w', w7, and wj1 J2, respectively. From these paths we obtain the following tables:
Distance Properties of Convolutional Codes
Chapter 3
126
J
1
2
3
min{w'} min{w"}
2
3
3
2
2
3
J2
m in
{war J2 }
1
2
1
0
0
2
0
1
3
1
3
n
Figure 3.9 The active distances for the encoding matrix in Example 3.7.
w:
w rc
.
000
7ODi
S.
00 011
10 000
11
00
00
11 10
000
00
11 000
01 000
10 000
11
10
01
00
00
10 000
11
11
01
01 00
11 000 10
01 000
10 000
11
01
00
00
000
010
101
10 000
11 #000
00
00
01 01
000
010
11 10
000
01 01
011
00 00
010
11
000
01 O1
000
10
00 00
10
Figure 3.10 The trellis paths used in Example 3.7.
00
000
11
00
#00
Section 3.3
Properties of Convolutional Codes via the Active Distances
127
Theorem 3.13 gives the following lower bounds on w', w", and w',, j2 :
J
WC
1
2
3
2
3
3
2
2
3
f2
{Ws 11 Jh
i
wr`
Ji
1
1
2
1
0
0
2
0
1
3
1
1
3
The evaluated lower bounds are tight.
We saw in Example 3.7 that for the encoding matrix given in (3.91) w J2 is symmetric in j1 and j2. The following example shows that this is not true in general: EXAMPLE 3.8 Consider the code sequences encoded by the binary, rate R = 1/2, ODP convolutional encoding matrix
G(D)=(1+D+D4 1+D2+D3+D4)
(3.92)
The active distances are given in Fig. 3.11. For this encoding matrix the values of min{w'}, min{w"}, and min{wj, 12} are given in the following tables:
J
1
2
3
min{w }
3
4
4
3
3
3
J2
min {wit J2}
1
2
1
0
0
2
0
1
3
1
3
aj
Figure 3.11 The active distances for the encoding matrix in Example 3.8.
3
3
Chapter 3
128
Distance Properties of Convolutional Codes
Notice that wj1 iz is not symmetric in ji and j2. From Theorem 3.13 we have the following lower bounds:
J WC
1
2
3
3
4
4
3
3
3
.12
{ws ii'h
1
2
3
1
0
0
1
2
0
1
3
1
1
i
wrc
i
1i
The lower bounds on w and wj` are tight, but that on w',
J2
2
is not.
3.4 LOWER BOUND ON THE DISTANCE PROFILE
In this section we shall derive a lower bound on the distance profile for the ensemble of convolutional codes. First, we give the following
Definition The ensemble F(b, c, m) of binary, rate R = b/c convolutional codes of memory m is the set of convolutional codes encoded by generator matrices G in which each digit in each of the matrices Gi, 0 < i < m, is chosen independently with probability 1 /2 and where G is given in (3.6).
Lemma 3.14 Let E(b, c, m) be the ensemble of binary, rate R = b/c convolutional codes with memory m. The first (m + 1)c code symbols on a path diverging from the allzero path are independent and equally likely to be 0 and 1. Proof. Assume without loss of generality that a path diverge from the allzero path at the root, that is, uo 0. Then we have
vi=u1Go+ul_1G1+
+uoGj
(3.93)
where 0 < j < m. Since vi is a vector determined by Go, G1..... Gi_1 plus uoG1, where 0 and Gj is equally likely to be any b x c binary matrix, vi assumes each of its 2' possible values with the same probability. Furthermore, it is independent of the previous code symbols vo, v1, ... , v1_1 for 1 < j < m and the proof is complete. uo
We will now prove
Theorem 3.15 There exists a binary, rate R = b/c convolutional code with a generator matrix of memory m whose column distances satisfy
d > pc(j + 1)
(3.94)
for 0 < j < m, and where p, 0 < p < 1/2, is the Gilbert-Varshamov parameter; that is, the solution of
h(p) = I - R
(3.95)
where h(p) is the binary entropy function (1.89). Before proving this theorem, we show the optimum distance profile together with the lower bound (3.94) for a rate R = 1/2 (p = 0.11), binary convolutional code in Fig. 3.12.
Section 3.4
129
Lower Bound on the Distance Profile
d'. pe (j + 1) 30 +
m 60
40
20
80
100
Figure 3.12 ODP and its lower bound for rate R = 1 /2 and 0 < m < 96.
Proof. Let do,e, 0 < < 2b, denote the weights of the branches with uo 0 0 stemming from the root. From Lemma 3.14 we have
P(do,e = k) =
(c) (1)
(3.96)
for 0 < k < c and 0 < f < 2b. Consider all paths stemming from the £th initial branch with uo ¢ 0 for 0 < f < 2b. That is, these paths begin at depth 1 and not at the root! Now let us introduce the random walk So = 0, Si, S2, ... , where Sj
Zi, j = 1, 2, .. .
(3.97)
t-t
where
Zi = E V,1
(3.98)
e=1
with Yie = a if the £th symbol on the (i + 1)th branch is 1 and Yie = ,B if it is 0, and an absorbing barrier which will be specified later. According to Lemma 3.14, P(Yie = a) = P(Yie = ,B) = Z, 0 < i < m, 1 < £ < c (see Example B.2). The random walk begins at depth 1 with So = 0. Let wj be the weight of the corresponding path of length j + 1 branches starting at the root. Then equation (3.97) can be rewritten as
Sj _ (wj - k)a + (jc - wj + k),B, j = 0, 1, ...
(3.99)
Furthermore, we notice that if we choose a = 1 and fi = 0, then k + Sj should stay above the straight line in Fig. 3.12; that is,
k + S3 = wj > pc(j + 1)
(3.100)
for 0 < j < m for all paths in order to guarantee the existence of a code with a generator matrix of memory m satisfying our bound. Since it is more convenient to analyze a situation with an absorbing barrier parallel to the j-axis, we choose 1
-p
p
(3.101)
Distance Properties of Convolutional Codes
Chapter 3
130
By inserting (3.101) into (3.99), we obtain
k+ Sj = wj(1 - p) - (jc - wj)p
=wj - pcj, j =0,1,...
(3.102)
Thus, from (3.102) it follows that the random walk Sj will stay above an absorbing barrier at
cp - k; that is, (3.103)
or, equivalently, (3.104)
if and only if (3.105)
for all paths with uo
0.
In the sequel we consider only k > ko = [cpl. If k < ko, then we say that the random walk was absorbed already at the beginning. The probability of this event, Po, is upper-bounded by
PO < (2b - 1)Y
(k) (2)c
(3.106)
k=O
To estimate the probability that the random walk is absorbed, we introduce a random variable (indicator function) ji such that ji = 1, if the random walk for a path leading to the ith node at depth j + 1 will cross the barrier for the first time at depth j + 1, and i;ji = 0 otherwise. The average of the random variable ji is equal to the probability that the random walk S j hits or drops below the barrier cp - k for the first time at depth j + 1; that is, (3.107)
Let P(k) denote the probability that the random walk is absorbed by the barrier at cp - k. Then, summing (3.107) over 1 < j < m and 1 < i < 2bj, we get m
2"
00
P(k) _
(3.108)
j=1
j=1 i=1
Using the notation from Appendix B (B.55), we have
fo,j(cp - k, v)
(3.109)
v
where fo, j (cp - k, v) denotes the probability that the random walk is not absorbed at depths i < j by the barrier at (cp - k) and Sj = v. Hence, the right side of inequality (3.108) can be rewritten as 00
YY
fo,j (cp - k, v)2bj
j=1 v
In Appendix B we prove (cf. Corollary B.6) that for any Ao < 0 such that the moment-generating
function of the random variable Zi given by (3.98) equals g(Ao) = E[2A0zu] = 2-b
(3.110)
and such that g'(Ao) < 0 we have 00
f,,, j (cp - k, v)2bj < 2-A0(cp-k) j=1 v
(3.111)
Lower Bound on the Distance Profile
Section 3.4
131
Choose Ao = log
P
(3.112)
1-p
Then we have (see also Example B.2)
g(,o) =
(2Aoa I
c
+
)
22xoR
= (l2(1_P)1o5i
+ 12-P log
c
=(2
(3.113)
(
-
_ (P_P(l 1
p)1-P)
`
= 2(-1+h(p))c = 2-Rc = 2-b and
Ix=xo =
C (°°
+ 2A00)
C
1
(20a +
2Aofl)
In 2
1(1p2(1-p)log
=C(12'O +12,'os)
(3.114)
P
)Pp
ln2=0
Combining (3.111) and (3.113), we get (cf. (B.81)) P(k) < 2-a.o(cp-k) k > k0
(3.115)
where we have used the facts that A,0 < 0 and g'(),o) = 0. Since 2-ko(cp-k) > I
(3.116)
fork < ko = Lcp J , we can further upper-bound (3.106) by Po < (2b - 1)
2-ko(cp-k)
E (C) (2)C
(3.117)
k
Finally, using (3.117) and summing over all nodes at depth 1 with uo the probability of absorption by
Po + (2b - 1)
E
_
(2b-c
-
k=0 c
=
(2(R 1)c - 2-c)2cplog (2-h(P)c
- 2-`) ((I -
k IP -P
(3.118)
Y
(1 + 2-109 p)-(1-P)p-P)c
=I-2-c(1-h(P))=1-2-b <1
for0
(k)
(C)2_klo1g
2-c)2cplog1PP
0
=
(i)C20(cP-k)
(k) (1)c P(k) < (2b - 1)
k=ko+1
0, we upper-bound
=
(2-h(P)c
-
2-c)2ch(P)
Chapter 3
132
Distance Properties of Convolutional Codes
Since the probability of absorption is strictly less than 1, there exists a convolutional code with a generator matrix of memory m whose distance profile satisfies the bound and, hence, the proof is complete. We have immediately the following
Corollary 3.16 There exists a binary, rate R = b/c convolutional code with a generator matrix of memory m whose minimum distance satisfies
drain > pc(m + 1)
(3.119)
where p is the Gilbert-Varshamov parameter.
3.5 UPPER BOUNDS ON THE FREE DISTANCE
We will now prove an upper bound on the free distance based on Plotkin's bound for block codes. For the sake of completeness we start by proving Plotkin's upper bound on the minimum distance for block codes.
Lemma 3.17 (Plotkin) The minimum distance for any binary block code of M codewords and block length N satisfies
NM (3.120) 2(M - 1) Consider an arbitrary column in the list of M codewords. Suppose that the drain <
Proof. symbol 0 occurs no times in this column. The contribution of this column to the sum of the distances between all ordered pairs of codewords is 2no(M - no), whose maximum value M2/2 is achieved if and only if no = M/2. Summing the distances over all N columns, we have at most NM2/2. Since drain is the minimum distance between a pair of codewords and since there are M(M - 1) ordered pairs, we have
M(M - 1)dmin <
NM2 2
(3.121)
and the proof is complete.
Heller [He168][LaM70] used Plotkin's bound for block codes to obtain a surprisingly tight bound on the free distance for convolutional codes. We regard Heller's bound as an immediate consequence of the following
Theorem 3.18 The free distance for any binary, rate R = b/c convolutional code encoded by a minimal-basic encoding matrix of memory m and overall constraint length v satisfies dfree
min
(m+i)c 11 20 - 2v-b(m+i))
]j
(3.122)
Proof. Any rate R = b/c convolutional code can be encoded by a minimal-basic encoding matrix whose realization in controller canonical form has 2v encoder states. Consider
i = 1, 2, ... , information sequences. There exist 2b("`+`)/2° information sequences starting in the zero state leading to the zero state. The corresponding code sequences constitute a block code with M = 2b(m+`)_U codewords and block length N = (m + i)c for i = 1, 2, ... . Apply Lemma 3.17 for i = 1, 2, ... and the proof is complete. 2b("`+`)
Since v < bm we have
Section 3.5
Upper Bounds on the Free Distance
133
Corollary 3.19 (Heller) The free distance for any binary, rate R = b/c convolutional code encoded by a minimal-basic encoding matrix of memory m satisfies min
dfree
(m + i)c J} 2(1 {[ - 2-bi)
(3.123)
From Heller's bound follows immediately
Corollary 3.20 (Heller asymptotic bound) The free distance for any binary, rate R = b/c convolutional code encoded by a minimal generator matrix of memory m satisfies urn
dfree
<
m-oo mC
1
2
(3.124)
In fact, Heller's bound is valid not only for convolutional codes but also for a larger class of codes, viz., the so-called class of nonlinear, time-varying trellis codes. For convolutional codes (i.e., linear, time-constant trellis codes), we can use Griesmer's bound for block codes to obtain slight improvements for some memories [Gri60].
Lemma 3.21 (Griesmer bound for linear block codes) For a binary, linear, rate R = K/N block code with minimum distance dmin we have x-1
TI
2' I< N
dmin
2'
(3.125)
I
II
Proof. Without loss of generality, we assume that the first row of the generator matrix G is (111 ... 10 ... 0) with drain ones. Every other row has at least [dmin/2] ones or [drain/2] zeros in the first d,nin positions. Hence, in either case they have at least [drain/2] ones in the remaining N - dmin positions. Therefore, the residual code with respect to the first row is a
rate R = (K - 1)/(N - d,nin) code with minimum distance > [dmin/2]. Using induction on K completes the proof. Consider the binary, linear block code in the proof of Theorem 3.18 with M = 2b(m+i)-v codewords and block length N = (m + i) c, i = 1, 2, .... The number of information symbols
K = log M > b (m + i) - v > bi, i=1,2,...
(3.126)
The minimum distances f o r these block codes, i = 1 , 2, ... , must satisfy the Griesmer bound. Hence, we have
Theorem 3.22 (Griesmer bound for convolutional codes) The free distance for any binary, rate R = b/c convolutional code encoded by a minimal-basic encoding matrix of memory m satisfies bi-1 J
L1
TI dfte
I2'I
< (m+i)c
(3.127)
fori = 1,2.....
EXAMPLE 3.9
(a) LetR = 1/2andm = 16. Since mini, i{[(16+i)/(1-2-i)j} = 21, it follows from the Heller bound that any binary, rate R = 1/2 convolutional code with a generator matrix with memory m = 16 must have dfree < 21. Since E j-o [21/21] = 41 -Z (16+4) 2 = 40, any binary, rate R = 1/2 convolutional
code with a generator matrix with memory m = 16 must have dfree < 21. The Griesmer bound gives an improvement by one. Such a code exists with dfree = 20.
Distance Properties of Convolutional Codes
Chapter 3
134
(b) LetR = 1/2andm = 18. Since mini,t{[(18+i)/(1-2-i)j} = 23, it follows from the Heller bound that any binary, rate R = 1/2 convolutional code with a generator matrix with memory m = 18 must have dfree < 23. Since 123/2] 1 < (18+i)2 for all i > 1, the Griesmer bound for convolutional codes does not give any improvement over the Heller bound for convolutional codes in this case. The largest free distance for any binary, rate R = 1/2 convolutional code with a generator matrix with memory m = 18 has been determined by exhaustive search to be dfree = 22. Thus, the Griesmer bound is not tight.
For rate R = 1/2, binary convolutional codes, we have calculated Heller's upper bound for memories 1 < m < 39. Using Griesmer's bound, we obtained an improved bound for some values of m by one or two (the bold values): m
0
1
2
3
4
5
6
7
Heller Griesmer
2 2
4 4
5
6
8
5
6
8
9 8
10 10
8
9
10
11
12
11
12
13
14
16
11
12
13
14
16
17 16
13
14
15
16
17
18
19
20
21
22
23
24
25
18
19
20
21
28
29
30
20
25 24
27
20
24 24
26
18
22 22
23
17
26
27
28
29
30
26
27
28
29
30
31
32
33
34
35
36
37
38
39
32
33 32
34 32
35 34
36 35
37 36
38 37
39 38
40 40
41 40
42
43 42
44 44
45 44
32
23
41
These results are shown in Fig. 3.13 and compared with the free distance for optimum free distance (OFD) fixed convolutional codes. The upper bound is surprisingly tight. dfree
I
m
0 + 0
10
20
30
Figure 3.13 Heller and Griesmer upper bounds on the free distance and the free distance for rate R = 1/2 OFD codes.
Section 3.5
Upper Bounds on the Free Distance
135
We notice for example, that, for memory m = 4 there is a gap. The Griesmer bound implies that dfree < 8, but the best rate R = 1/2 convolutional code of memory m = 4 has dfree = 7. However, there exists a rate R = 2/4 convolutional code of the same complexity, that is, memory m = 2, with dfree = 8. Its encoding matrix is [JoW98]
G(D)
-
D+D2 1+D D2 1 + D + D 2 D+D2 1+D 1+D+D2) \ I
(3.128) .128)
and the first few spectral components are 12, 0, 52, 0, 260. The optimum free distance rate R = 1/2 convolutional code has encoding matrix
G(D) = (1 + D + D4
1 + D2 + D3 + D4)
(3.129)
and first few spectral components 2, 3, 4, 16, 37. Next, we will consider convolutional codes encoded by polynomial, systematic encoding matrices and derive counterparts to Heller's and Griesmer's bounds for convolutional codes encoded by general generator matrices.
Theorem 3.23 (Heller) The free distance for any binary, rate R = b/c convolutional code encoded by a polynomial, systematic encoding matrix of memory m satisfies
(I (m(1-R)+i)c Il
(3.130) J1 Consider a convolutional code encoded by a polynomial(!), systematic encoding dfree
min jl i>1
L
2(1 - 2-bi)
Proof. matrix realized in controller canonical form. The code sequences of length (m + i)c starting at the zero state and remerging at the zero state constitute a block code of M = 2b(m+i)-» codewords, where v is the overall constraint length. Append to the shift registers m - vi, 1 < i < b, memory elements without connecting them to the output. The corresponding block code encoded by this new encoder is an expurgated block code whose minimum distance is at least as large as the minimum distance of the original block code. In order to obtain the merger at the zero state, we now have to feed the encoder with m allzero b-tuples. Hence, the expurgated block code has only Mexp = 2bi codewords, and each of them has b zeros as the first code symbols on each of the m last branches before merging with the zero state. The "effective" block length is reduced to
N = (m+i)c-mb=(m(1-R)+i)c
(3.131)
Applying Lemma 3.17 completes the proof. From Theorem 3.23 follows immediately the systematic counterpart to Corollary 3.20:
Corollary 3.24 (Heller asymptotic bound) The free distance for any binary, rate R = b/c convolutional code encoded by a polynomial, systematic encoding matrix of memory m satisfies
lim
dfree
m-*oo mC
< 1-R 2
(3.132)
Finally, by using (3.131) in the proof of Theorem 3.22, we obtain
Theorem 3.25 (Griesmer bound for convolutional codes) The free distance for any rate R = b/c convolutional code encoded by a polynomial, systematic encoding matrix of memory m satisfies bY,
rdfe1 fori = 1,2.....
(3.133)
Chapter 3
136
Distance Properties of Convolutional Codes
3.6 TIME-VARYING CONVOLUTIONAL CODES So far we have considered only time-invariant or fixed convolutional codes, that is, convolutional codes encoded by time-invariant generator matrices. When it is too difficult to analyze the performance of a communication system using time-invariant convolutional codes, we can often obtain powerful results if we study time-varying convolutional codes instead. Assuming polynomial generator matrices, we have
vt = utGo + ut_iGI +
+ ut_mGm
(3.134)
where Gi, 0 < i < m, is a binary b x c time-invariant matrix. In general, a rate R = b/c, binary convolutional code can be time-varying. Then (3.134) becomes
vt = utGo(t) + ut-t G i (t) + ... + ut-mGm (t) (3.135) where Gi (t), i = 0, 1, ... , m, is a binary b x c time-varying matrix. In Fig. 3.14 we illustrate a general time-varying polynomial convolutional encoder. As a counterpart to the semi-infinite matrix G given in (3.6), we have
Go(t) Gt(t+1) Gt =
Go(t + 1)
...
Gm(t+m)
Gl (t + 2)
...
Gm(t + 1 + m)
(3.136)
Figure 3.14 A general time-varying polynomial convolutional encoder.
Remark.
With a slight abuse of terminology we call for simplicity a time-varying poly-
nomial transfer function matrix a generator matrix, although it might not have full rank.
We have the general ensemble of binary, rate R = b/c, time-varying convolutional codes with generator matrices of memory m in which each digit in each of the matrices G, (t) for 0 < i < m and t = 0, 1, 2, .. . is chosen independently and is equally likely to be 0 and 1. As a special case of the ensemble of time-varying convolutional codes, we have the ensemble of binary, rate R = b/c, periodically time-varying convolutional codes encoded by a polynomial generator matrix Gt (3.136) of memory m and period T, in which each digit in each of the matrices Gi (t) = Gi (t + T) for 0 < i < m and t = 0, 1, ... , T - 1, is chosen independently and is equally likely to be 0 and 1. We denote this ensemble E(b, c, m, T). The ensemble E(b, c, m) of (time-invariant) convolutional codes that we encountered in Section 3.4 can be considered as the special case E(b, c, m, 1), and the ensemble of general time-varying convolutional codes defined above will be denoted E(b, c, m, oo). Before we define the active distances for periodically time-varying convolutional codes encoded by time-varying polynomial generator matrices, we introduce the following sets of information sequences, where we always assume that ti < t2.
Section 3.6
Time-Varying Convolutional Codes
137
Let U[t, t2+m] denote the set of information sequences ut, -mut, -m+1 . . . ut2+m such that the first m and the last m subblocks are zero and such that they do not contain m + 1 consecutive zero subblocks; that is, r
def
u[t,-m,t2+m] = {u[ti-m,t2+m] I u[t1-m,t,-1] = 0,
(3.137)
u[t2+1,t2+m] = 0, and u[i,i+m] :0, ti - m < i < t2} t2] denote the set of information sequences ut1-mut,-m+1 ... ut2 such that the first m subblocks are zero and such that they do not contain m + 1 consecutive zero subblocks; that
Let U,',-m is,
def
c
u[ti-m,t2] = {u[h-m,t2] I u[t1-m,t,-l] = 0,
(3.138)
0,t1 - m
andu[i,i+m]
Let U1t; _m,t2+m] denote the set of information sequences ut, -muti _m+1 ... ut2+m such that the
last m subblocks are zero and such that they do not contain m + 1 consecutive zero subblocks; that is, def
rc
u[h-m,t2+m] = {u1t1-m,t2+m] I u[t2+1,t2+m] = 01
(3.139)
and u[i,i+m] 0 0, tl - m < i < t2} Let Uti -m t2] denote the set of information sequences ut,-mut,-m+1
. .
. ut2
such that they do not
contain m + 1 consecutive zero subblocks; that is, U[[tI -m, t2]
def
{u[t,-m,t2] I u[i,i+m]
0, tl - m < i < t2 - m)
(3.140)
Next, we introduce the (j + m + 1) x (j + 1) truncated, periodically time-varying generator matrix of memory m and period T : Gm (t)
Gm(t + 1)
Gm-1(t)
Gm-1(t + 1)
G[t,t+j] =
Go(t)
:
Go(t + 1)
(3.141)
Gm(t + j) Gm_1(t + j)
Go(t + j)
I
where G1(t) =Gi(t+T)for0 < i <m. We are now well-prepared to generalize the definitions of the active distances for convolutional codes encoded by polynomial generator matrices to time-varying convolutional codes encoded by polynomial time-varying generator matrices:
Definition Let C be a periodically time-varying convolutional code encoded by a periodically time-varying polynomial generator matrix of memory m and period T. The jth order active row distance is a; def
min t
min
U r,-m,,+j+m]
{WH(u[t-m,t+j+m]G[t,t+j+m])}
(3.142)
The jth order active column distance is
aic def = min uMin {WH(u[t-m,t+j]G[t,t+j])} `, m.,+n
(3.143)
Distance Properties of Convolutional Codes
Chapter 3
138
The j th order active reverse column distance is
a r`
def
min min {WH(u[t-m,t+.i]G[t,t+i])}
(3.144)
The jth order active segment distance is def
aas= min min {WH(u[t-m,t+l]G[t,t+ i])} t
.
(3.145)
u[t-.,t+;,
For a periodically time-varying convolutional code encoded by a periodically timevarying, noncatastrophic, polynomial generator matrix with active row distance a' we define its free distance by a generalization of (3.60) aef
(3.146)
ditee
In the following two sections we will derive lower bounds on the free distance and on the active distances. There we need the following
Theorem 3.26 Consider a periodically time-varying, rate R = b/c, polynomial generator matrix of memory m and period T represented by Gt, where Gt is given in (3.136). (i) Let the information sequences be restricted to the set U[t-m,t+j+m] Then the code symbols in the segment v[t,t+j+m] are mutually independent and equiprobable over
the ensemble £(b, c, m, T) for all j, 0 < j < T. (ii) Let the information sequences be restricted to the set U[t-m,t+j] Then the code symbols in the segment v[t,t+.i] are mutually independent and equiprobable over the
ensemble £(b, c, m, T) for all j, 0 < j < max{m + 1, T}. (iii) Let the information sequences be restricted to the set u[t`m,t+j] Then the code symbols in the segment v[t,t+i] are mutually independent and equiprobable over the
ensemble £(b, c, m, T) for all j, 0 < j < max{m + 1, T}. (iv) Let the information sequences be restricted to the set u[t-m t+i] Then the code symbols in the segment v[t,t+.i] are mutually independent and equiprobable over the
ensemble £(b, c, m, T) for all j, 0 < j < T. Proof.
It follows immediately that for 0 < j < T the code tuples vi, i = t, t +
1, ... , t + j, are mutually independent and equiprobable in all four cases. Hence, the proof of (iv) is complete. In cases (ii) and (iii) it remains to show that the statements also hold for
T < j <m when m > T. (ii) Consider the information sequences in the set u[t-m t+j], where 0 < j < m. Let t < i < t + j ; then, in the expression vi
=uiGo(i)+u1_1G1(i)+...+ui-mGm(i)
(3.147)
there exists a k, 0 < k < m, such that at least one of the b-tuples ui_k is nonzero and all the
previous b-tuples ui_kt, k < k' < m, are zero. Hence, vi and vi,, t < i < i' < t + j, are mutually independent and equiprobable. This completes the proof of (ii).
(iii) Consider the information sequences in the set u[t`-m,t+j,, where 0 < j < m. Let t < i < t + j; then, in (3.147) at least one of the b-tuples ui_k, 0 < k < m, is nonzero, and all the following b-tuples ui_kt, 0 < k' < k, are zero. Hence, vi and vi-_ t < i < i' < t + j, are mutually independent and equiprobable. (i) For the information sequences in u[t-m,t+i+m] it remains to show that vi and vi- are
mutually independent and equiprobable also for T < i' - i < T + m. From the definition of u[t-m,t+I+m] it follows that u[t_m,t_1] = 0, ut
0, ut+.i
0, and u[t+,i+1,t+1+m] = 0. For
Section 3.7
Lower Bound on the Free Distance
139
j = T, we can choose, for example, u[t_m,t+m] = u[t+T-m,t+T+m] E U[t-m,t+T+m] which implies that v[t,t+m] = V[t+T,t+T+m] However, for T - m < j < T, vi, t < i < t + m, and vi,, t + j < i' < t + j + m, are mutually independent and equiprobable. From Theorem 3.26 follows immediately
Corollary 3.27 Consider a rate R = b/c polynomial generator matrix of memory m represented by G, where G is given in (3.6).
(i) Let the information sequences be restricted to the set U[m,t+j] Then the code symbols in the segment v[t,t+j] are mutually independent and equiprobable over the
ensemble £(b, c, m, 1) for all j, 0 < j < m. (ii) Let the information sequences be restricted to the set U[t`m,t+j+m] Then the code symbols in the segment v[t,t+j] are mutually independent and equiprobable over the
ensemble £(b, c, m, 1) for all j, 0 < j < m. 3.7 LOWER BOUND ON THE FREE DISTANCE In this section we will derive a lower bound on the free distance for the ensemble of periodically time-varying convolutional codes. This bound is due to Costello [Cos74], but our proof
is slightly different. Our goal is to find a nontrivial upper bound on the probability that dfree < d, that is, to prove that for the ensemble of periodically time-varying convolutional codes P (dfree < d) < 1, since then we know that at least one code exists within our ensemble that has dfree ? d. First, we combine the definition of the free distance (3.146) with the definition of the jth order active row distance (3.142) and obtain
dfree = minmin [min {WH(u[t_m,t+j+m]G[t,t+j+m])} j U,-m,t+j+m] t
(3.148)
Then we consider the ensemble of periodically time-varying convolutional codes encoded by polynomial generator matrices with the information sequences restricted to Urt,-m,t2+m] The probability that the free distance for a randomly chosen code in this ensemble is less than d is equal to the probability that for at least one u[t_m,t+j+m] E U[t-m,t+j+m], t = 0, 1, ... , T - 1
and j = 0, 1, 2, ..., we have WH(u[t-m,t+j+m]G[t,t+j+m]) < d
(3.149)
Thus, we have P(dfree < d) = P (mino
min min
min
0
+ P I min min 0
T
< T P (O<j
min
min
{WH(u[t_m,t+j+m]G[t,t+j+m])} < d {WH(u[t_m,t+j+m]G[t,t+j+m])} < d
{WH(u[t-m,t+j+m]G[t,t+j+m])} < d
U(t-m,t+j+m]
+ T P min
min
j>T U[,-m,,+j+m7
{WH(u[t-m,t+j+m]G[t,t+j+m])} < d
where the two inequalities follow from the union bound.
(3.150)
140
Distance Properties of Convolutional Codes
Chapter 3
In the sequel we will call code sequences (3.151)
v[t,t+j+m] = u[t_m,t+j+m]G[t,t+j+m]
where u[t_m,t+j+m] E u[t-m,t+j+m] for incorrect sequences. Then it follows from Theorem 3.26 that for 0 < j < T the incorrect sequences are sequences of mutually independent, equiprobable, binary symbols. For any fixed t, the set u[t-m,t+j+m] contains (2b - 1) sequences for j = 0 and at most 2(j-1)b(2b - 1)2 sequences for j > 1. We use 2(j+1)b as an upper bound on the cardinality of u[rt-m,t+j+m].
First, we consider only incorrect sequences v[t,t+j+m] whose lengths are at most T + m. The probability for each of these sequences is 2-(j+m+1)c, and there are ((j+m+1)c) ways of choosing exactly i ones among the (j + m + 1)c code symbols. Hence, we have
min
P(
{WH(u[t-m,t+j+m]G[t,t+j+m])} <
d) (3.152)
I (j+m+1)cl l
i=0
for 0 < j < T and t = 0, 1, 2, ... . Using the union bound we can obtain an upper bound on
P (min
{wx(u[t_m,t+j+m]G[t,t+j])} <
min
d)
by summing the probabilities for all incorrect sequences of lengths m + j + 1 < T + m with weights less than d. Thus, we have
P
\
( min
min
{wx(u[t-m,t+j+m]G[t,t+j])} < d I
T-1 d-1
< Y,
2(j+ 1)b
((' + m + 1)c 2 (j+m+1)c l
j=0 i=0
(3.153)
0o d-1
< E E 2(j+1)b (J + m + 1)c 2_(j+m+1)c j=0 i=0
We use the substitution
k = (j + m + 1)c
(3.154)
and upper-bound (3.153) by summing over k = 0, 1, 2, ...,
P ( min
min
{wx(u[t-m,t+j+m]G[t,t+j])} <
O< j
d1 oc < 2-mb
(k)_1)
d) (3.155)
i=0 k=0
Let
x=2R-1 1 <x < 2
(3.156)
Section 3.7
Lower Bound on the Free Distance
141
and rearrange (3.155) as min P (0<.i
min
(WH(u[t-m,t+.i+m]G[r,r+,i])} < d 00
2-mb
d-1 xi
k(k-1)...(k-i+1)xk-i
i=0
=2 -mb
k=0 oc
d-1 xi ; II
x.k)
(1)
-2
d-1 mb
\
1
i!
(1 -X )
'X x
- 2
1-xi_0 1-x
-
2-mb
(0
xi
1
(3.157)
-x )d
1-x
IXx-1
2-mb
< (2x - 1)(x-1 - 1)d Using (3.156), we obtain
P ( min
min
O<j
{WH(u[t-m,t+i+m]G[t,t+j])} < d (3.158)
2-mb
< (2R - 1)(21-R -
1)d
Next, we will consider sequences of lengths greater than T + m, that is, sequences for which
j > T. Then we have
P min
min
j>T ur,-m.,+;+m,
{WH(u[t-m,t+j+m]G[t,r+,i+m])} < d
{WH(u[t-m,t+T-1]G[t,t+T-1])} <
min
d)
(3.159)
d-1 < 2Tb
d` (Tc)2_Tc l
where the first inequality follows from the fact that the weight of a sequence is a nondecreasing function of the length of the sequence. To obtain the second inequality, we use 2Tb as an upper bound on the cardinality of UU-m,t+T-11 We will now interrupt our derivation and prove the following useful
Lemma 3.28 For 0 < y < 1/2, Lyn]
(n) < 2h(y)n
(3.160)
where h(y) is the binary entropy function (1.89).
Since 0 < y < 1/2, we have
Proof. i=0
()
(n) 1y
n
yn
i=O
(1; =
)Yfl
(n) =p
=(1-y)yny-yn
\1 Yy)`
(1+ yy)n 1
Distance Properties of Convolutional Codes
Chapter 3
142
_ (1 - y)Yn-nY-Yn =
(y-Y(l
-
y)-(1-Y))n
(3.161)
= 2h(y)n
F-I
From Lemma 3.28 it follows that d-1
(Tc)
< 2h(T,i)Tc < 2h(r )Tc
(3.162)
i=o
Hence, we obtain
P min
min
j>T L[ , -m.,+j+m]
{WH(u[t-m,t+j+m]G[t,t+j+m])} < d (3.163)
< 2(h(r )+R-I)Tc By combining (3.150), (3.158), and (3.163) we obtain mb
P(dfree < d) < T
- l)d +2
((2R - 1)(21_R
(h(Tc)+R-1)Tc)
(3.164)
Let us now choose the period T >> m, T = m2, say, and choose d = d < cm, where d satisfies 2 mb
2
m
(2R -
+ 2(h(
)+R-1)m2c
1)(21-R - l)d
(3.165)
mb 2
< m2
+2
C(2R
(h(m)+R-I)m2c
<1
1)(21-R - 1)d For large memories m such a d always exists. From (3.165) it follows that'
d>
-mb
log(21-R
log(2R - 1) + log(m-2 - 1)
-
-2
log(21 R -
(h(')+R-1)m2c)
m
1)
(3.166)
-mb log(21-R - 1)
- O (log m)
and, finally, we have proved
Theorem 3.29 (Costello) There exists a binary, periodically time-varying, rate R = b/c convolutional code with a polynomial generator matrix of memory m that has a free distance satisfying the inequality dfree
me
>
+ O (logm)
R
- log(21-R - 1)
(3.167)
m
Since the overall constraint length
v < mb
(3.168)
we have
Corollary 3.30 There exists a binary, periodically time-varying, rate R = b/c convolutional code with a polynomial generator matrix of overall constraint length v that has a free distance satisfying the inequality dfree
vR-1
R
-log(21
- 1)
+O
log y
(3.169)
V
'Here and hereafter we write f (x) = O(g(x)) if I f (x) I< Ag(x) for x sufficiently near a given limit, A is a positive constant independent of x and g(x) > 0. We have, for example, f (x) = log x, x > 0, can be written as when x + cc. f (x) = O(x) when x - oc and f (x) = +x+] , x > 0, can be written as f (x) = 0 SOX
Section 3.8
143
Lower Bounds on the Active Distances
As a counterpart to Theorem 3.29 we can prove (see Problem 3.17) the following
Theorem 3.31 There exists a binary, periodically time-varying, rate R = b/c convolutional code with a polynomial, systematic generator matrix of memory m that has a free distance satisfying the inequality R)
dfree
-1 g(2f-
me
I)
+O
logmm
(3.170)
Remark. The Costello bound is also valid for time-invariant convolutional codes with "sufficiently long" branches, that is, for large c [Zig86]. In [ZiC91] [CSZ92] it was shown that
the free distance for rate R = 2/c, c > 4, time-invariant convolutional codes asymptotically meets the Costello bound. 3.8 LOWER BOUNDS ON THE ACTIVE DISTANCES* In this section we will derive lower bounds on the active distances for the ensemble of periodically time-varying convolutional codes. First, we consider the active row distance and begin by proving the following
Lemma 3.32 Consider the ensemble E(b, c, m, T) of binary, rate R = b/c, periodically time-varying convolutional codes encoded by polynomial generator matrices of memory m. The fraction of convolutional codes in this ensemble whose jth order active row distance
a', 0 < j < T, satisfies
a' < ai < (j +m + l)c/2 does not exceed +i
j+m +1 T2 ( '
R+h( (j +
'
+1)c
(3.171)
)-1)(j+m+) c
where h( ) is the binary entropy function (1.89). Proof. Let (3.172)
v[t,t+j+m] = u[t_m,t+j+m]G[t,t+j+m] where u[t_m,t+j+m] E L [t-m,t+j+m] and assume that
a < (j + m + 1)c/2
(3.173)
Then, it follows from Theorem 3.26 that P(WH(v[t,t+j+m])
;) = - a((j <2
h
+ m + 1)c)-(j+m+1)c 2
-o
a+'+i>c
(3.174)
`
(j+m+1)c
0< j< T- m
where the last inequality follows from Lemma 3.28. [Notice that we need the denominator "2" in the right inequality in (3.171) in order to be able to apply inequality (3.160)]. Using 2(j+l)b = 2(j+l)Rc (3.175) as an upper bound on the cardinality of u[t-m,t+j+m], we have min
P (Ufr-m,r+j+ml
(WH(V[t,t+j+m])] <
a
°.r
< 20+1 )Rc 2 (h( u+m+oc
-I)(j+m+l)c (3.176)
-2
aj
j+t R+h i'+'
(j+m'
1 .) -)(j+m+l)c
for each t, 0 < t < T. Using the union bound completes the proof.
Distance Properties of Convolutional Codes
Chapter 3
144
For a given f, 0 < f < 1, let jo be the smallest integer j satisfying the inequality
j+1
TZ (j+m+1)c>loglf j+m+1R
(1
(3.177)
For large memories m such a value always exists. Let a, ,
0 < a < (j + m + 1)c/2
(3.178)
denote the largest integer that for given f, 0 < f < 1, and j, j > jo, satisfies the inequality +1
j+m+1R+h < - log
aj (j + m + 1)c
-1 (j+m+1)c (3.179)
T2
1-f
Then, from Lemma 3.32 follows that for each j, jo < j < T, the fraction of convolutional codes with jth order active row distance satisfying (3.171) is upper-bounded by T2_ log
1T ff
=
f
1
T
(3.180)
Hence, we use the union bound and conclude that the fraction of convolutional codes with active row distance a < a for at least one j, jo < j < T, is upper-bounded by T-m-1 1 j=jo
-f T
<1-f
(3.181)
Thus, we have proved the following
Lemma 3.33 In the ensemble £(b, c, m, T) of periodically time-varying convolutional codes, the fraction of codes with active row distance
a' > a', jo < j < T
(3.182)
is larger than f, where for a given f, 0 < f < 1, jo is the smallest integer satisfying (3.177) and a the largest integer satisfying (3.179). By taking f = 0, we have immediately
Corollary 3.34 There exists a binary, periodically time-varying, rate R = b/c, convolutional code encoded by a polynomial generator matrix of period T and memory m such that its jth order active row distance for jo < j < T is lower-bounded by a is the largest integer satisfying aj
1+1
___
j+m+1 R+h\ (j+ m+1)c -1 ) (J + m +1) c
(3.183)
< -2logT and jo is the smallest integer satisfying C1
j+1 j+m+1R (j +m+1)c>21ogT
(3.184)
In order to get a better understanding for the significance of the previous lemma, we shall study the asymptotical behavior of the parameters jo and a for large memories. Let the period T grow as a power of m greater than one; choose T = m2, say. Then, since jo is an integer, for large values of m we have jo = 0. Furthermore, the inequality (3.183)
Section 3.8
Lower Bounds on the Active Distances
145
can be rewritten as
h (j+m+1)c aIr
J+1 R+0(logm) <1- j+m+1 m ))
(3.185)
or, equivalently, as2
j+1 R)(j+m+1)c+O(log m) j+m+1
(3.186)
Finally, we have proved
Theorem 3.35 There exists a binary, periodically time-varying, rate R = b/c, convolutional code encoded by a polynomial generator matrix of memory m that has a j th order active row distance satisfying the inequality
J+1 R (j+m+1)c+O(logm) j+m+1
(3.187)
forj>0. The main term in (3.187) can also be obtained from the Gilbert-Varshamov bound for block codes using a geometrical construction that is similar to Forney's inverse concatenated construction [For74a]. Consider Gilbert-Varshamov's lower bound on the (normalized) minimum distance for block codes [MaS77], viz.,
Nn >
h-1(1
- R)
(3.188)
where N denotes the block length. Let Sr(j) =
h-1 (i - j+1 R) (j + 1 + m)c .+1+m (3.189)
me denote the main term of the right hand side of (3.187) normalized by mc. The construction is illustrated in Fig. 3.15 for R = 1/2. The straight line between the
points (0, &(j)) and (R, 0) intersects h t(1 - R) in the point (r, h t(1 - r)). The rate r is chosen to be R r= j +j+1 1+m
(3.190)
That is, it divides the line between (0, 0) and (R, 0) in the proportion (j + 1) : m. Then we have
Sr(j)
j+1+m
h-t(1 -r)
m
(3.191)
which is equivalent to (3.189). The relationship between r and j in Fig. 3.15 is given by (3.190).
We will now derive a corresponding lower bound on the active column distance. Let v[t,t+j] = u[t-m,t+j]G[t,t+j] where u[r-m,t+j] E u[r-m,t+j] and let
(3.192)
be an integer satisfying the inequality
ac < (j + 1)c/2 2Here and hereafter we write h-1 (y) for the smallest x such that y = h(x).
(3.193)
146
Chapter 3
h-'(1 -R)
Distance Properties of Convolutional Codes 10)
Figure 3.15 Geometrical construction of the relationship between the lower bound on the active row distance for convolutional codes and the Gilbert-Varshamov lower bound on the minimum distance for block codes.
Then, as a counterpart to (3.174) we have P(wH(v[r,r+j])
a,) =
((i + 1)c 2-(j+1)c z -o 2(h(j+o )-1)(j+1)c
3.194)
0<J
We use (3.175) as an upper bound on the cardinality of U _m t+j] and obtain
P
min {t11H(V[t,t+j])} < ac
< 2(j+1)Rc2
- 2(
)-1)(j +I)c
(h( (
(3.195) R+
h( a
I
) )(j
+1 )c
for each t, 0 < t < T. Using the union bound completes the proof of the following Lemma 3.36 Consider the ensemble S (b, c, m, T) of binary, rate R = b/c, periodically time-varying convolutional codes encoded by polynomial generator matrices of memory m. The fraction of convolutional codes in this ensemble whose jth order active column distance
a', 0 < j < T, satisfies
a < ac < (j + 1)c/2
(3.196)
does not exceed
T2
R+h(
-I)(1+1)c
Next, we choose jo to be the smallest integer j satisfying the inequality
(1-R)(j+1)c>logT2
(3.197)
0
(3.198)
Let aic,
Section 3.8
147
Lower Bounds on the Active Distances
denote the largest integer that for given j, j > jo, satisfies the inequality
(R+h((Jl)C)_l)U+l)c_<_logT2
(3.199)
Then, from Lemma 3.36 it follows that for each j, jo < j < T, the fraction of convolutional codes with a jth order active column distance satisfying (3.198) is upper-bounded by T2-logT2
=
1
(3.200)
T
Hence, we use the union bound and conclude that the fraction of convolutional codes with active column distance ac < a' for at least one j, jo < j < T, is upper-bounded by T-1
T<
(3.201)
J=Jo
Thus, we have proved the following
Lemma 3.37 There exists a periodically time-varying, rate R = b/c, convolutional code encoded by a polynomial generator matrix of period T and memory m such that its jth order active column distance for jo < j < T is lower-bounded by ac, where ac is the largest integer satisfying a;
- 1 (j + 1)c < -2log T
(R+h( (j + 1)c
(3.202)
and jo is the smallest integer satisfying
(1-R)(j+1)c>2logT
(3.203)
If we as before choose T = m2, then jo = 0(logm), and the inequality (3.202) can be rewritten as ajc
h
(
(J+1)c)
<1-R
4log m
(j+1)c
(3.204)
for j = 0(m) or, equivalently, as ac
< h-i(1 - R)(j + 1)c + 0(logm)
(3.205)
Thus, we have proved
Theorem 3.38 There exists a binary, periodically time-varying, rate R = b/c, convolutional code encoded by a polynomial generator matrix of memory m that has a jth order active column distance satisfying the inequality
a' > p(j + 1)c + 0(logm)
(3.206)
for j = 0(m) > jo = 0(logm) and p is the Gilbert-Varshamov parameter (3.95). Analogously, we can prove
Theorem 3.39 There exists a binary, periodically time-varying, rate R = b/c, convolutional code encoded by a polynomial generator matrix of memory m that has a jth order active reverse column distance aic which is lower-bounded by the right-hand side of the inequality
(3.206) for all j = 0(m) > jo = 0(logm). For the active segment distance we have the following
148
Distance Properties of Convolutional Codes
Chapter 3
Theorem 3.40 There exists a binary, periodically time-varying, rate R = b/c, convolutional code encoded by a polynomial generator matrix of memory m that has a j th order active segment distance satisfying the inequality
ajs > h-1
1-
J+m+1 R (j + 1)c+ O(log m)
j+l
(3.207)
for j = 0(m) > j, where R
JS < 1 - R m + O (log m)
(3.208)
Proof. Consider the ensemble E(b, c, m, T). First, we notice that the cardinality of Ult,t+jl is upper-bounded by 2mb2(j+1)d = 2(j+m+1)Rc
(3.209)
Using (3.209) instead of (3.175) and repeating the steps in the derivation of the lower bound on the active column distance will give h
a (j+1)c
<1- j+m+1R
j+l
41ogm
(j+1)c
(3.210)
for all j = 0(m) > js, or, equivalently,
ajs < h1 1 -
j+m+1 R j+1
(j + 1)c + 0(logm)
(3.211)
where
0
(3.212)
instead of (3.204), (3.205), and (3.198), respectively, and the proof is complete.
The parameter js is the start of the active segment distance which was introduced in Section 3.2 (cf. Fig. 3.6). For R < 1/2 there exist binary, time-invariant convolutional codes with (see Problem 3.18)
js <
R 1-R(m+
(3.213)
Next, we consider our lower bounds on the active distances, viz., (3.187), (3.206), and (3.207), and introduce the substitution
f_(j+l)/m
(3.214)
Then we obtain asymptotically-for large memories m-the following lower bounds on the normalized active distances: Theorem 3.41 (i) There exists a binary, periodically time-varying, rate R = b/c, convolutional code encoded by a polynomial generator matrix of memory m whose normalized active row distance asymptotically satisfies Se def ajr
mc
>h-1 I- t+1 f R (t + 1) + O (logmm )
(3.215)
fort >0. (ii) There exists a binary, periodically time-varying, rate R = b/c, convolutional code encoded by a polynomial generator matrix of memory m whose normalized active
Section 3.9
149
Distances of Cascaded Concatenated Codes
column distance (active reverse column distance) asymptotically satisfies Sc
def ajc me
Src def f
aJrc
>h 1(1-R)f +O
m
(3.216)
(109In
me
ford > to = 0 (109M) (iii) There exists a binary, periodically time-varying, rate R = b/c, convolutional code encoded by a polynomial generator matrix of memory m whose normalized active segment distance asymptotically satisfies
f
SPdef-c>h-1(l_JR)t+oQ2!_) for f > 2s = 1 RR + O
(3.217)
r logtm )
The typical behavior of the bounds in Theorem 3.41 is shown in Fig. 3.16. Notice that by minimizing the lower bound on the normalized active row distance (3.215), we obtain nothing but the main term in Costello's lower bound on the free distance (3.167), viz., R
- log(21-R - 1) be
Figure 3.16 Typical behavior of the lower bounds on the normalized active distances of Theorem 3.41.
3.9 DISTANCES OF CASCADED CONCATENATED CODES* Consider the simplest concatenated scheme with two convolutional encoders, viz., a cascade
of a rate Ro = b,/c, outer encoder of memory mo and a rate Ri = bi/c, inner encoder of memory mi, where bi = co. The cascaded concatenated code Cc is encoded by the rate def Rc = RoR, = bo/c, b/c convolutional encoder whose memory (2.119) in general could be less than the sum of the memories of the constituent encoders, that is, me < mo +mi (Fig. 3.17).
Chapter 3
150
bo
Outer
Inner
encoder
encoder
Distance Properties of Convolutional Codes
ci Figure 3.17 A cascade of two consecutive en-
coders.
Consider the ensemble EC(b, c, m, T) of periodically time-varying, cascaded convolutional codes constructed in the following way: Choose as outer convolutional code a binary, periodically time-varying with period T, rate Ro = bo/c,, convolutional code encoded by a minimal-basic encoding matrix of memory mo = mink {v0,k}, where vo,k is the constraint length of the kth input and whose active segment distance has the start j10 =
R°
1-Ro
(3.218)
mo + O(log mO)
The existence of such a convolutional code follows from Theorem 3.40. For Ro < 1/2, the outer convolutional code can be chosen to be time-invariant (cf. (3.213)). The ensemble of inner convolutional codes is the ensemble of binary, periodically timevarying with period T, rate Ri = bi/ci convolutional codes encoded by time-varying polynomial generator matrices of memory mi > j°. Then we have the ensemble EC(b, c, mc, T) of periodically time-varying with period T, rate Rc = b/c, cascaded convolutional codes encoded by convolutional encoders of memory m. As a counterpart to Theorem 3.26 we have
Theorem 3.42 Consider a periodically time-varying, rate R, = b/c, cascaded convolutional code encoded by a convolutional encoder of memory mc. (i) Let the information sequences be restricted to the set U t_m_t+i+m'] Then the code symbols in the segment v[r,r+i+mc] are mutually independent and equiprobable over
the ensemble EC(b, c, mc, T) for all j, 0 < j < T. (ii) Let the information sequences be restricted to the set U _m"t+.il. Then the code symbols in the segment v[r,r+i] are mutually independent and equiprobable over the
ensemble EC(b, c, mc, T) for all j, 0 < j < T. (iii) Let the information sequences be restricted to the set U1r1cm,,r+i I. Then the code symbols in the segment v[t,t+i] are mutually independent and equiprobable over the
ensemble EC(b, c, mc, T) for all j, 0 < j < T. (iv) Let the information sequences be restricted to the set u[r-mc t+i] Then the code symbols in the segment v[t,t+il are mutually independent and equiprobable over the ensemble EC(b, c, mc, T) for all j, 0 < j < T. Proof.
Analogously to the proof of Theorem 3.26.
If we let
k = (j + 1)/me
(3.219)
then for cascaded convolutional codes we have the following counterpart to Theorem 3.41:
Theorem 3.43 (i) There exists a cascaded convolutional code in the ensemble EC(b, c, m, T) whose normalized active row distance asymptotically satisfies 6r aef
fort >0.
h-t
C1 -
k+1
R°) (k + 1) + O
ClogMmc.
(3.220)
Section 3.9
Distances of Cascaded Concatenated Codes
151
(ii) There exists a cascaded convolutional code in the ensemble £C(b, c, m, T) whose normalized active column distance asymptotically satisfies 8e
aef a j'`
h-'(1 - R,)f + O
me
(log me )
(3.221)
m
fore > to = 0 ( logm,m° (iii) There exists a cascaded convolutional code in the ensemble EC(b, c, mc, T) whose normalized active segment distance asymptotically satisfies
8'aef
a> >h-t(l-±+IRC)f+O(logmc) f
mCc
(3.222)
log m° m°
R° fort > toS == t_R° +O
Proof.
mc
Analogously to the proof of Theorem 3.41.
The behavior of the bounds in Theorem 3.43 is the same as that for the bounds in Theorem 3.41 (see Fig. 3.16). The free distance for a convolutional code is obtained as the minimum weight of the nonzero codewords. The only restriction on the input sequence is that it should be nonzero. Now, let us consider the cascade in Fig. 3.17. The free distance for the cascaded convolutional code, dfree, is obtained as the minimum weight of the nonzero codewords; again the minimum
is evaluated over all nonzero inputs. Since the inputs to the inner encoder are restricted to be codewords of the outer encoder, we will not obtain a useful estimate of dfree from the free distance of the inner code, drreeIt is somewhat surprising that, given only a restriction on the memory of the inner code,
there exists a convolutional code obtained as a simple cascade with a free distance satisfying the Costello bound:
Theorem 3.44 (Costello bound) Consider a cascade Cc of an outer binary, periodically time-varying convolutional code Co of rate R. = bo/co and encoder memory mo and an inner binary, periodically time-varying convolutional code C; of rate R; = b, /c, and encoder memory m; , where b; = co. If
m, >
R°
1-Ro
mo + 0(log mo)
(3.223)
then there exists a pair of codes, Co and C; , such that the code Cc of rate R, = Ro R; and encoder
memory me = mo + m; has a free distance satisfying the inequalities mcciRc
dfree
- -log(21-R, - 1) +
0
me
m,c,R,
moc0Ro -log(21-R°
logmC
- 1)
-log(21-R; - 1)
(3.224)
+0(logMO) +0(logm`l mo
Proof.
m,
Choose an inner code C, such that Ro m`=11-R0 mo
(3.225)
Distance Properties of Convolutional Codes
Chapter 3
152
Then, by minimizing (3.220) over f, we obtain dfree
>
m`b° - - log(21-Rc - 1)
+O
+
m°c`R°
O
- log(21-Rc - 1) mobo
-
- log(21-Rc - 1) +
-
+
mocoRo log(21-Ro
(logm`)
-
- 1)
me
(logmc) me
mici Rc
- 1)
log(21-Ro
m;ciRi log(21-R,
+O
(logm')
(3.226)
me
- 1)-+0(
logmc mC
where we in the last inequality have used the fact that
- log(21-R - 1) is increasing and R
- log(21-R - 1) is decreasing functions of R.
Theorem 3.44 shows that, given the restriction (3.225) on the ratio mi/mo, from the Costello lower bound point of view, we lose nothing in free distance by splitting a given amount of convolutional encoder memory into two cascaded convolutional encoders. Remark. Assume that for the cascaded encoder in Theorem 3.44 we have vmi,,,o = mo and vmin,i = mi and that mi = R mo holds. Then the total number of states in the outer and inner encoders are 2mib; + 2m,b, = 2mobo(2mib;-mobo + 1) 11
(3.227)
= 2(mo+mi )bo (1 + 2-mi bo )
which is essentially equal to the total number of states of a generator matrix G,(D) with vmin,c = mc. The second equality follows from the equality mi = RRo mo. 11
We have shown that the performances for all active distances for the ensemble of timevarying cascaded convolutional codes are the same as for the ensemble of time-varying convolutional codes, although the former ensemble is essentially smaller. Consider again the concatenated scheme given in Fig. 3.17. Since the inputs to the inner encoder are restricted to the codewords of the outer encoder, we will not obtain a useful estimate of dfree from the free distance of the inner code, dfree. Consider the situation when the shortest constraint length for the inner encoder exceeds the length of the allzero sequence considered when we determine the active segment distance for the outer encoder, that is, when mink {vi,k} > j°, where Vi,k is the constraint length for the kth input of the inner encoder. Then a nontrivial lower bound on the free distance for the binary concatenated code, dfree, can be obtained as follows: Assume a nonzero input sequence to the outer encoder. The length of its output sequence is at least mink { vo,k } + 1, where vo,k is the constraint length for the kth input of the outer encoder.
Since this output sequence serves as input to the inner encoder and since the weight of the output sequence of the inner encoder is lower-bounded by the active row distance of the inner encoder, a'.', at the length of its input sequence, it follows that dfree >
min
{a"' }
j> bi
II
(3.228)
Section 3.10
Path Enumerators
153
where em;n is the length of the shortest burst of code symbols, which is lower-bounded by min > max{(min{vo,k} - 1)co + 2, dfree)
(3.229)
k
From (3.228) we conclude that, in order to obtain a large free distance for this cascaded concatenated code, the inner encoder should have a rapidly increasing active row distance and the outer encoder should have long nonzero sequences as outputs.
Finally, we consider a cascade of a binary rate Ri = 1/ci convolutional inner encoder of memory mi and a set of mi + 1 parallel, binary, rate Ro = b0/c0 convolutional outer encoders each of memory mo. The m, + 1 outputs of the outer encoders are via a buffer of size (mi + 1) x co connected to the single input of the inner encoder (Fig. 3.18). The overall concatenated code CPC is a rate RpC = R0Rt = bo/ci convolutional code with an encoder consisting of mp, = (m0 + co)(mi + 1) + mi memory elements. Encoder Oo
Encoder 0t
Encoder 02
Encoder I
Figure 3.18 A cascade with a set of m, parallel outer encoders, each of memory mo and rate Ro = bo/co, followed by a buffer of size (m; + 1) x c0, and an inner encoder of memory m; and rate R; _
1/c;.
Encoder 0m
Theorem 3.45 There exists a binary, periodically time-varying, rate Rpc = RoR, cascaded concatenated convolutional code CPC encoded by the scheme described above that has a free distance satisfying the inequality3 mocoRo
dfree > Pict (mt + 1)
+ o(mo)
(3.230)
- log(2t-R0 - 1) where pi is the Gilbert-Varshamov parameter (3.95) for the inner convolutional code. Proof. Choose outer convolutional codes satisfying the Costello bound and an inner convolutional code satisfying the active column distance bound. We have at least dfree ones in
the output from any outer encoder. They will be at least mi + 1 positions apart at the input of the inner encoder. Since each one at the input of the inner encoder will contribute at least pi ci (mi + 1) to the weight of the output sequence, it follows that its weight will be at least the product of the two lower bounds, and the proof is complete.
The lower bound in Theorem 3.45 can be interpreted as the product of a GilbertVarshamov-type lower bound on the minimum distance (Corollary 3.16) for the inner convolutional code and the Costello bound on the free distance (Theorem 3.29) for the outer convolutional code. 3.10 PATH ENUMERATORS
For a convolutional encoder the paths through the state-transition diagram beginning and ending in the (encoder) zero state when the self-loop at this state is removed determines the 3Here and hereafter we write f (x) = o(g(x)) if limx.a s(x) = 0, where a is any real number or oo. We have, for example, f (x) = x2 can be regarded as f (x) = o(x) when x -- 0 and f (x) = can be regarded as f (x) = o(x) when x -a oo. We also have f (x) = x can be regarded as f (x) = 0(1) when x -* 0.
Distance Properties of Convolutional Codes
Chapter 3
154
distance spectrum of the corresponding code. We shall now obtain a closed-form expression whose expansion yields the enumeration of all such paths. The method, which is due to Viterbi [Vit7I], is best explained by an example. Consider the rate R = 1/2 convolutional encoder and its state-transition diagram given in Fig. 1.14. For reference the figure is repeated as Fig. 3.19. The self-loop at the zero state in the state-transition diagram is removed, and the zero state is split in two-a source state and a sink state. Then the branches are labeled W° = 1, W, or W2, where the exponent corresponds to the weight of the particular branch. The result is the so-called signal flowchart shown in Fig. 3.20.
Figure 3.19 A rate R = 1/2 convolutional encoder and its state-transition diagram.
Figure 3.20 Signal flowchart for the encoder illustrated in Fig. 3.19.
Let the input to the source (left zero state) be 1, and let T(W) denote the generating function for the path weight W. We call T (W) the path weight enumerator. In connection with signal flowcharts, it is often called the transmission gain and can be found by the standard signal flowchart technique [MaZ60]. For some applications, this method is an efficient way of formulating and solving a system of linear equations. Here we prefer the straightforward method used by Viterbi.
Let 1, 2, and 3 be dummy variables representing the weights of all paths from the left zero state to the intermediate states. Then from Fig. 3.20 we obtain the following system of linear equations
t
W2
T (W) =
W2
(3.231)
and
6
(3.232)
Section 3.10
Path Enumerators
155
Equation (3.231) can be rewritten as
-1
0
1
i
-W 1-W 0 H f;2
-W -W
(3.233) 0
1
Using Cramer's rule, we obtain
3=
det
-1
0
1
-W 1 - W
-W -W
0
))-'( -W
0
w2
1-W
0
-W
0
1
det
-W
1
(3.234)
= W3/(1 - 2W) Combining (3.232) and (3.234), we find
T(W) = W5/(1 - 2W)
=W5+2W6+4W7+...+2kWk+5
(3.235)
+...
Hence, we have dfree = 5, and the spectral components n_++, i = 0, 1, 2, ... , are 1, 2, 4, ... . EXAMPLE 3.10
Consider the rate R = 2/3, memory m = 1, overall constraint length v = 2, convolutional encoding matrix
G(D) - ( 1 + D D
1 +DD
(3.236)
)
Its controller canonical form is shown in Fig. 3.21. v(t) v(2)
Figure 3.21 Controller canonical form of the encoding matrix in Example 3.10.
U(2)
From G (D) or from the encoder block diagram (Fig. 3.21) we can easily obtain the state-transition diagram shown in Fig. 3.22. A modification of the state-transition diagram yields the signal flowchart given in Fig. 3.23. From the signal flowchart we obtain the following system of linear equations
1-W
-W -W
-W2
-W
1 - W2 -W2 = 1-W33 -1
W2 W2
(3.237)
w2
and
T(W)=W3 1+W2 2 +
3
(3.238)
Solving these equations yields
T(W) = 2W3 +5W° + 15W5 +43W6+ 118W7 +329W8 +
(3.239)
Chapter 3
156
Distance Properties of Convolutional Codes
10/010
Figure 3.22 State-transition diagram for the encoder in Fig. 3.15.
Figure 3.23 Signal flowchart for the encoder in Fig. 3.15.
Thus, our code has two codewords of weight dfree = 3, which can easily be verified by tracing paths in the signal flowchart or state-transition diagram.
Viterbi also used the signal flowchart to obtain an extended path enumerator, which counts the paths not only according to their weights, but also according to their lengths L and to the number of 1's I in the information sequence. We return to our encoder in Fig. 3.19 and label the branches not only by W'W, where w
is the branch weight, but also by LP, where i is the number of l's among the information symbols corresponding to the particular branch. Thus, we have the extended signal flowchart shown in Fig. 3.24.
Section 3.10
Path Enumerators
157
LI
Figure 3.24 Extended signal flowchart for the encoder illustrated in Fig. 3.13.
From this extended signal flowchart we obtain the following linear equations W2LI 1 0 -LI
-WLI 1 - WLI -WL -WL
4l
0 1
I
I
2
=
3
0 0
(3.240)
and
T(W, L, I) =
(3.241)
Solving (3.240) and inserting the solution into (3.241) yield
T(W, L, I) =
WSL3I
1 - WL(1 + L)I
= W5L31 + W6L4(1 + L)I2 + W7Ls(1 + L)213
( 3 . 242 )
+... + W5+kL3+k(1 + L)kI l+k + .. . The path weight enumerator is a code property, while the extended path enumerator depends
not only on the code but also on the map between the information symbols and the code symbols, that is, on the encoding matrix. In the next chapter we use these path enumerators to obtain error bounds for maximum-likelihood decoding. Finally, using an example we will show how to determine the active burst distance by using a simple modification of the signal flowchart. Consider the rate R = 1/2 convolutional encoder in Fig. 3.19 and its signal flowchart in Fig. 3.20. In order to determine the active burst distance, we add another zero state such that when we reach this zero state we will leave it in the next step corresponding to the "bounces" in the zero state. We also label all branches except the first one with J in order to count the order of the active burst distance. Hence, we obtain a modified signal flowchart as illustrated in Fig. 3.25.
Figure 3.25 Modified signal flowchart for the encoder illustrated in Fig. 3.19.
Distance Properties of Convolutional Codes
Chapter 3
158
As before, we use the dummy variables i, 2, 3, and 4 to represent the weights and depths from the left zero state to the intermediate states. Thus, we obtain the following set of linear equations
WJ2
42 =
(3.243)
S4 = W2J3 and
T(W, J) = W2J3
(3.244)
Equation (3.243) can be rewritten as 0
1
-WJ 1 - WJ -WJ
-WJ
0
0
-J
-W2J
0
0 0
1
W2J
w2 :2
0
s
0
11
4
1
l
(3.245)
0
Using Cramer's rule, we obtain 1
0
-WJ 1 - WJ ,3 =
x det
-J
-W2J
0
0
-WJ
-WJ
1
0
0
0
-W2J
1
1
0
W2 -W2J
-WJ 1-WJ -WJ
-WJ
0
0
'
0 0
0 0 0
(3.246)
1
W3J
1-WJ-WJ2-W5J3 and, hence,
T(W' J) = W2 J3
W5 J2
1 - WJ - WJ2 - W5J3 = WSJ2(1 + WJ + (W + W2)J2 + (2W2 + W3 + W5)J3 +(W2+3W3+W4+2W6)J4+.,.)
(3.247)
= W5J2+W6J3+(W6+W7)J4+(2W7+W8+W10)J5
+(W7+3W8+W9+2W11)J6+ From the expansion of T (W, J) given in (3.247) we have the following active burst distances
az = 5, a3 = 6, a4 = 6, a5 = 7, a6 = 7.... or since vmin = m = 2, equivalently, the active row distances ao = 5, al = 6, a2 = 6, a3 = 7, a4 = 7, ... , in agreement with the initial part of the curve in Fig. 3.8.
3.11 COMMENTS Most distance measures for convolutional codes were born at the University of Notre Dame: column distance, row distance, free distance [Cos69], and distance profile [Joh75]. In 1974 Costello published important bounds on the free distance [Cos74]. The lower bound on the distance profile was obtained by Johannesson and Zigangirov [JoZ89].
Problems
159
A family of extended distances was introduced for the class of unit memory (UM), that is, m = 1, convolutional codes by Thommesen and Justesen [ThJ83]; see also [JTZ88]. They were generalized to m > 1 convolutional codes by Host, Johannesson, Zigangirov, and Zyablov and presented together with the corresponding bounds in 1995 [HJZ95] [JZZ95]; they are closely related to the active distances [HJZ99].
PROBLEMS 3.1 Consider the convolutional encoding matrix (cf. Problem 1.25) G =
11
01
11
10
01
11
(a) Draw the state-transition diagram. (b) Find the path u = 11001 in the state-transition diagram. (c) Find the lowest weight path that leaves the zero state and returns to the zero state. 3.2 Consider the rate R = 2/3, memory m = 2, overall constraint length v = 3, convolutional encoder illustrated in Fig. P3.2 (cf. Problem 1.27).
(2)
Figure P3.2 Encoder used in Problem 3.2.
(a) Draw the state-transition diagram. (b) Find the path u = 101101 10 in the state-transition diagram.
3.3 Consider the rate R = 1/2 convolutional code with encoding matrix G(D) = (1 + D + D2
1 + D2).
(a) Find the column distances do, d...... dg. (b) Find the distance profile dP. (c) Find the row distances do, d...... dom.
3.4 Consider the rate R = 1/2 convolutional code with encoding matrix G(D) = (1 + D + D2 + D3
1 + D2 + D3) and repeat Problem 3.3.
3.5 Consider the rate R = 1/3 convolutional code with encoding matrix G(D) = (1 + D + D2 1 + D + D2 1 + D2) and repeat Problem 3.3. 3.6 Consider the rate R = 2/3 convolutional code with encoding matrix
G(D)=h 1+D D 1+D 1
and repeat Problem 3.3.
1
D
Chapter 3
160
Distance Properties of Convolutional Codes
3.7 Consider the rate R = 1/2 convolutional code with encoding matrix G(D) = (1 + D + 1 + D3) (cf. Problem 2.4). (a) Draw the state-transition diagram. (b) Find an infinite-weight information sequence that generates a codeword of finite weight. D2
(c) Find dg,, and dom.
3.8 Find the distance profile and the free distance for the rate R = 2/3 convolutional code with encoding matrix (a)
G1(D)=
(1+D D D2
1
1
1+D+DZ
(b)
G2(D)D 1+D2+D3 1+D+D2+D3 0 D
1+
1
(c) Show that GI(D) and G2(D) encode the same code. 3.9 Consider the rate R = 1/2 systematic convolutional encoding matrix G(D) = (1
1+D+
D2).
(a) Draw its controller canonical form. (b) Draw the state-transition diagram. (c) Draw the signal flowchart. (d) Find the extended path enumerator T (W, L, I). 3.10 Consider the rate R = 1/2 convolutional encoding matrix G (D) _ (1 + D + D2 + D3 D2 + D3). (a) Draw the state-transition diagram. (b) Draw the signal flowchart. (c) Find the path weight enumerator T (W). 3.11 Consider the rate R = 2/3 convolutional encoding matrix
G(D)=
1+
1+D D 1+D 1
1
D
Find the extended path enumerator T (W, L, I). 3.12 Consider a rate R = 1/2 convolutional code with a memory m = 4 encoding matrix. (a) Calculate the Heller bound. (b) Calculate the Griesmer bound. Remark.
The optimum time-invariant code with an encoding matrix of memory m = 4 has
d = df ee = 7, but there exists a catastrophic time-invariant encoding matrix with dr. = 8. 3.13 Repeat Problem 3.12 for rate R = 1/2 and memory m = 5. 3.14 Repeat Problem 3.12 for rate R = 1/2 and memory m = 28. 3.15 Consider the rate R = 2/3 convolutional encoding matrix
G(D) =
1
1
\ D 1+D
0 1
(a) Find the column distances do, d ' ,., dg. (b) Find the extended path enumerator T (W, L, I). 3.16 Consider the rate R = 1/3 convolutional code with encoding matrix G(D) = (1 1 + D + D2 1 + D2). Find the spectral component n16. 3.17 Prove the Costello bound for periodically time-varying, rate R = b/c convolutional codes with polynomial, systematic generator matrices (Theorem 3.31). Hint: Modify the proof of Theorem 3.29 by using the idea from the proof of Theorem 3.23.
Problems
161
3.18 Prove that for R < 1/2 there exist binary, time-invariant convolutional codes whose start of the active segment distance is R
Js<1-R(m+1) 3.19 *Prove the Costello bound for periodically time-varying, rate R = b/c convolutional codes with period T
=
Rm/p - log(21-R - 1)
where p is the Gilbert-Varshamov parameter. Hint: Use inequalities (3.150) and (3.159).
Viterbi Decoding
In this chapter we give a general description of the Viterbi algorithm which was introduced in Chapter 1 and show that it is an efficient decoding method, particularly when the advantage
of soft decisions is fully exploited. The path weight enumerators and the extended path enumerators obtained from the state-transition diagram of the convolutional encoder are used to derive tight upper bounds on the decoding error probabilities for both hard and soft decisions.
From these bounds we can estimate the coding gain without the need for experiments or simulations. We also derive various upper and lower bounds for the ensemble of periodically time-varying convolutional codes.
4.1 THE VITERBI ALGORITHM REVISITED
First, we return to the Viterbi algorithm, which we introduced in Chapter 1. Suppose that the controller canonical form of the rate R = 1/2, memory m = 2 convolutional encoding matrix G(D) = (1 + D + D2 1 + D2) is used to communicate over the BSC with 0 < e < 1/2; that is, the decoder operates on hard decisions. For simplicity we encode only four information symbols followed by m = 2 dummy zeros in order to terminate the convolutional code into a block code. This kind of termination is called the zero-tail (ZT) method. Let r = 10 01 10 010100 be the received sequence. We recall that when comparing the subpaths leading to each state, the Viterbi algorithm discards all subpaths except the one closest (in Hamming distance) to the received sequence, since those discarded subpaths cannot possibly be the initial part of the path v that minimizes dH(r, v), that is,
i= argmin(dH(r, v)}
(4.1)
U
This is the principle of nonoptimaliry. In case of a tie, we can arbitrarily choose one of the closest subpaths as the survivor. If we are true to the principle of nonoptimality when we discard subpaths, the path remaining at the end must be the optimal one. The Hamming distances and discarded subpaths at each state determined by the Viterbi algorithm are shown in Fig. 4.1. The estimated information sequence is u = 1110. The successive development of the surviving subpaths through the trellis is illustrated in Fig. 4.2. 163
164
Chapter 4 r =
01
10
10
01
01
Viterbi Decoding
00
Figure 4.1 An example of Viterbi decoding-hard decisions.
The decoding delay is as long as the codeword since an optimum decision cannot be made until the surviving paths to all states at a certain depth share a common initial subpath. In principal, this may not happen before the decision is made at the final node and only one path through
the trellis remains. However, in practice, for rate R = 1/2 a fixed decoding delay of four to five times the encoder memory m causes negligible degradation. In our simple example the maximum decoding delay is four (see Fig. 4.2).
(b)
(e)
'
(c)
%'
(f)
Figure 4.2 Development of subpaths through the trellis.
From this example it is clear that the Viterbi algorithm is a general method to do minimum
distance decoding efficiently when the overall constraint length is small enough, v < 10 say. The maximum-likelihood (ML) decoder selects as its estimate v that encoded sequence v which maximizes P(r I v), or, equivalently, since we consider only memoryless channels vke>) (DMC), IIkP(rk I Vk) = IIkIIeP(rke) Taking the logarithm, subtracting a sum that depends only on the received sequence, and multiplying by an arbitrary positive constant A, we find that the maximum-likelihood rule reduces to choosing that encoded sequence v which maximizes the Viterbi metric I
l-ev(r, v) = E l-ev(rk, vk) = E AV (rke), vke)) k
k
(4.2)
e
where [Mas84] µV (rke), vke))
A
(log
P (rke)
I vke)) - f(e) (rke)))
(4.3)
We call the quantity µv(rk, vk) the Viterbi branch metric, and µv(rke), vke)) is the Viterbi symbol metric. For convenience, we often choose the arbitrary function fee)
Yke)) = min log P (rke) M
uk
(4.4)
I V ke)) j
if the minimum exists, since then the minimum value of the symbol metric for each received digit rke) is zero. Furthermore, we choose the constant A so that the Viterbi symbol metric µv(r(e), vke)) can be well approximated by integers.
The Viterbi Algorithm Revisited
Section 4.1
165
Suppose that the information sequence u[o,n) followed by mb dummy zeros are encoded by a convolutional encoder of memory m and that the corresponding codeword is transmitted over the BSC with crossover probability E. Let r[o,n+m) denote the received sequence. The ML decoder chooses as its estimate v[o,n+m) of the transmitted codeword that codeword v[o,n+m) which maximizes P(r[O,n+m)
I
E)(n+m)c-dH(r[o,n+m),u[o,n+m))
V[O,n+m))
EdH(r[o,n+m),v[0m+m))
X
dH(r[o,n+m),v[o,n+m))
E
=(1-E)(n+m)c
(4.5)
E)
(n+m-l)c
dH(rl,vf)
E
H 1-E ) t-O (
where the maximization is done over all codewords v[o,n+m). Hence, the symbol metric Nw ( rt,
vt ) = A
(log
= (Alog
((1
-
E
I
I - E ///
)(
dH (re , yr )
i
)
) - fi ( .
re ) (4.6)
dH ( rtv t ) + Alo g (I - E) - Aft (rt )
By choosing A
-(l og j
=
1
E E
)
(4 . 7)
and
ft (rt) = lo ge
(4.8)
AV(rt, vt) = 1 - dH(rt, vt)
(4.9)
we get
or
µv (rt, vt)
_
1,
if rt = vt
(4.10) otherwise From (4.10) it is readily seen that, when communicating over the BSC, maximizing the Viterbi metric is equivalent to minimum (Hamming) distance (MD) decoding. Before stating the Viterbi algorithm, we remark that finding the shortest route through 0,
a directed graph is an old problem in operations research. The following algorithm, first discovered by Viterbi in this context [Vit67], appears in many variants in the shortest-route literature and as dynamic programming in the control literature:
Algorithm V (Viterbi) VI. Assuming that the convolutional encoder is at the zero state initially, assign the Viterbi metric zero to the initial node; set t = 0. V2. For each node at depth t + 1, find for each of the predecessors at depth t the sum of the Viterbi metric of the predecessor and the branch metric of the connecting branch (ADD). Determine the maximum of these sums (COMPARE) and assign it to this node; label the node with the shortest path to it (SELECT). V3. If we have reached the end of the trellis, then stop and choose as the decoded codeword a path to the terminating node with largest Viterbi metric; otherwise increment t by 1 and go to V2.
166
Chapter 4
Viterbi Decoding
EXAMPLE 4.1 Consider the rate R = 2/3, memory m = 1, overall constraint length v = 2 convolutional encoder shown in Fig. 4.3. Suppose that two information two-bit symbols followed by a two-bit dummy 0 are encoded and that r = 010111000 is received over a BSC with 0 < e < 1/2. V(I) v(2) V(3)
Figure 4.3 The rate R = 2/3 encoder used in
u(2)
Example 4.1.
The Hamming distances and discarded subpaths at each state determined by the Viterbi algorithm are shown in Fig. 4.4. The estimated information sequence is u = 10 10.
r=
010
111
000 2
)00
0
000
O1\
0
011 110
t
110
001 0
00 011 001
100 110
010 0
0
1
0
111
101
3
2
001
Figure 4.4 An example of Viterbi decodingrate R = 2/3.
010
We will now consider how to do Viterbi decoding in a soft decisions situation. This gives us an "extra" 2 to 3 dB coding gain for "free" (cf. p. 180). EXAMPLE 4.2 Consider the binary input, 8-ary output DMC shown in Fig. 4.5 with transition probabilities P(r given by the following table: r
V
0 1
04
03
02
Ot
11
12
13
14
0.434 0.002
0.197 0.008
0.167 0.023
0.111 0.058
0.058 0.111
0.023 0.167
0.008 0.197
0.002 0.434
I
v)
The Viterbi Algorithm Revisited
Section 4.1
167
Taking the logarithms, we have r 04
V
0
-0 8 3
11
12
79
-2 20 .
-2 85
-3 77
-4
83
-6 21
-3.77
-2.85
-2.20
-1.79
-1.62
-0.83
-1
-
1
01
02
- 1 . 62 4.83
.
-6.2
1
03
.
.
14
13
.
.
.
For each r we subtract min {log P (r I v) } and obtain r
V
0 1
04
03
5.38 0.00
02
01
11
12
13
14
3.21
1.98
0.65
0.00
0.00
0.00
0.00 0.65
0.00
0.00
1.98
3.21
0.00 5.38
Finally, after scaling (A = 1.5) and rounding we have the following table:
r
V
04
03
02
01
11
12
13
14
0
8
5
3
1
0
0
0
0
1
0
0
0
0
1
3
5
8
These Viterbi symbol metrics will be used by the Viterbi algorithm when decoding a sequence received over the channel shown in Fig. 4.5.
0 0 01
Figure 4.5 Binary input, 8-ary output, DMC.
Suppose that the same encoder as in the hard decisions example, viz., the controller canonical form of G(D) = (1 + D + D2 I + D2), is used and that again four information symbols followed by two dummy zeros are encoded. Let r = 1 t 04 0,12 110, 0111 0,13 0403 be the received sequence. The Viterbi metrics /v are calculated for all subpaths leading to each state. Only the subpath with the largest metric is kept-all other subpaths are discarded. The trellis with Viterbi metrics and discarded subpaths is shown in Fig. 4.6. The estimated information sequence is u = 0110. The successive development of the surviving subpaths through the trellis is shown in Fig. 4.7.
Notice that the received sequence in our hard decisions example is exactly the received sequence, which we obtain if we merge the soft decisions outputs 01, 02, 03, 04 and 11, 12, 13, 14 in Example 4.3 into the hard decisions outputs 0 and 1, respectively. The estimated information
Chapter 4
168 r = 1104
0112
0111
1101
00 r- 00 f-1 00 r10
i
0113
Viterbi Decoding
0403
00
Figure 4.6 An example of Viterbi decoding-soft decisions. (d)
(a)
(b)
<`X
(e)
Figure 4.7 Development of subpaths through the trellis.
sequences based on hard and soft decisions, respectively, differ in the first digit. Thus, here we have a specific example showing the importance of exploiting the full information provided by the soft output demodulator. From this example it is clear that the Viterbi algorithm is an efficient maximum-likelihood decoding procedure that can easily exploit soft decisions.
We conclude this section by observing that when comparing the subpaths leading to each state, we discard all subpaths except the one with the largest Viterbi metric, since those discarded subpaths cannot possibly be the initial part of the path v that maximizes P (r I v). In case of a tie, we can arbitrarily choose one of the maximizing paths as the survivor. The path remaining at the end must be the maximum-likelihood estimate, and we have the following important theorem [For67] [For94]. Theorem 4.1 The Viterbi algorithm is a maximum-likelihood decoding algorithm. 4.2 ERROR BOUNDS FOR TIME-INVARIANT CONVOLUTIONAL CODES In practice, the Viterbi algorithm is used for a rate R = b/c convolutional code to decode long sequences of received symbols, typically a few thousand bits, before the decoder is forced back to the zero state by a tail of mb dummy zeros that are fed into the encoder in order to terminate the frame (the ZT method). An erroneously decoded path will in general remerge with the correct path long before it reaches the end of the trellis. Thus, a typical error event consists of a burst of erroneously decoded information digits. Such a burst always starts and ends with an error. Furthermore, if we have mb or more consecutively correct information digits among the erroneously decoded information digits, then the erroneous path has remerged with the correct path before it diverges again-a multiple-error event, which consists of separate error bursts. The block error probability is not the appropriate quality measure for Viterbi decoding. If we use very long frames, the block error probability will be close to one even if the system
Section 4.2
Error Bounds for Time-Invariant Convolutional Codes
169
provides adequate protection of the information digits. As in Chapter 1, we shall use the bit error probability, Pb, as our quality measure. Again we would like to stress that the bit error probability is not only a code property but also an encoding matrix property. It depends on the map between the information sequences and the codewords. We shall also introduce the burst error probability, PB, which is the probability that an error burst starts at a given node. The burst error probability is a code property and is sometimes called first event error probability or node error probability. Since it is easier to obtain good bounds on the burst error probability than on the bit error probability, we shall study the burst error probability first. The burst error probability for a convolutional code consisting of finite length codewords is always upper-bounded by the burst error probability for infinite length codewords if they are encoded by the same encoder. This is readily seen from the fact that we cannot do better by adding more adversary paths. The burst error probability is not the same for all nodes along the correct path. This is readily seen from the following argument. Suppose that the first burst starts at depth i, i > 0. Typically, this burst has not been caused by an error event containing many channel errors in the beginning of it since such an error event would have caused a burst starting at an earlier depth. Hence, the burst error probability at depth i, i > 0, is not greater than that at the root. However, our upper-bounding of the burst error probability is valid for any node. Suppose that the convolutional code whose trellis diagram is shown in Fig. 4.8 is used for communication over a BSC with crossover probability c, 0 < E < 1/2.
Figure 4.8 Trellis diagram for a convolutional encoder.
What is the probability that the Viterbi decoder selects the codeword 1110 1100 00 ... in favor of the transmitted allzero codeword? This particular decoding error will occur when there are three or more errors among the five digits in which the codewords differ, that is, in positions 1, 2, 3, 5, and 6. The probability for this event is
p5 = P (3 or more l's in 5 positions) 5
()eeu -
=
e=3
E)5-e
(4.11)
e
If the distance d between the correct path and its adversary is even, which, for example, is the case if we consider the codeword 1101011100 ... instead, we do not necessarily make a decoding error when there are d/2 errors among the d digits in the positions in which the codewords differ. In this case, there is a tie between the two paths, and we will discard the correct one only with probability 1/2. For d = 6 we have P6 =
)E3(l
2
(6)ei - E)6-e
- E)3 +
(4.12)
e=4
In general, we have
Ed
//e=(d+1)/2
d
e
(1 - E) d-e
d odd (4.13)
Pd = dd/2),5d/2(i
- .5)d/2 + Le=d/2+1 lelEe(1
d even
Chapter 4
170
Viterbi Decoding
Since Ee(1 - E)d-e is increasing with decreasing e, we notice that for d odd,
(d)Ee(1
Pd = e=(d+1)/2
a
- E)d-e < E (d)Ed/2(1 - 6)d/2 e=(d+1)/2 e
(d)
= Ed/2(1 - E)d/2
e
e=(d+1)12
_
< Ed/2(1 - E)d/2 y e=0
(d)
(4.14)
e
V\d (2
E(1 - E))
It can be shown (Problem 4.17) that (2,/c (I - E))d is an upper bound on Pd also when d is even. Hence, we have the Bhattacharyya bound [Bha43]
Pd < (2 E(1
- E))d def zd
alld
(4.15)
where z is called the Bhattacharyya parameter for the BSC. EXAMPLE 4.3 Consider the BSC with E = 0.01. For this channel 2JE(1 - E) -= 0.2
(4.16)
and
Ps < 0.25
3.2
10_4
(4.17)
which is much less than the channel crossover probability c = 0.01.
Assuming Viterbi decoding, a necessary, but not sufficient, condition for a burst error to occur at the root is that the received sequence given an incorrect path diverging from the correct path at the root is more likely than the received sequence given the correct path. This condition is not sufficient since, even if it is fulfilled, the ultimately chosen path might begin with a correct branch. Let E(k) be the event that a burst error at the root is caused by path k. Then we have P(E(k)) PB < P(UE(k)) < (4.18) where the second inequality follows from the union bound, and the union and the sum are over all incorrect paths diverging from the root. Since a convolutional code is linear, we can without loss of generality assume that the correct path is the allzero sequence. Then, if the Hamming weight of the kth incorrect path is d, we have
P(E(k)) = Pd where Pd is given in (4.13). Combining (4.18) and (4.19), we obtain
(4.19)
00
PB <
ndPd
(4.20)
d=dfree
where nd is the number of weight d paths, that is, the weight spectrum of the code; cf. (3.36). The number of weight d paths for d = dfree, dfree + 1, ... is given by the path weight enumerator T (W) discussed in Section 3.10: Oc
T(W) =
ndWd
(4.21)
d=dfree
Combining (4.15), (4.20), and (4.21), we obtain [Vit71]
Theorem 4.2 (Viterbi) The burst error probability when using a convolutional code for communication over the BSC with crossover probability 6 and maximum-likelihood decoding
Section 4.2
Error Bounds for Time-Invariant Convolutional Codes
171
is upper-bounded by PB <
00
nd
(2 E(1
d
- E))
= T(W) I W=2 F(l-E)
(4.22)
d =d free
EXAMPLE 4.4
Consider the BSC with c = 0.01 and the rate R = 1/2 convolutional code with the encoding matrix G(D) = (1 + D + D2 1 + D2). Its path weight enumerator is (3.235)
T(W) _
W5
(4.23)
1-2W
Viterbi's upper bound on the burst error probability is
PB < T (2 E(l - E)) I T (O.2)
_
0.25
1-2.0.2
(4.24)
X5.10 a
Van de Meeberg [Mee74] noticed that (see Problem 4.18) (4.25) P2i-1 = P2i, i > 1 That is, for each path the decoding error probability when d is odd is the same as that for the case when d is increased by 1! Thus, when d is odd, the Bhattacharyya bound (4.15) can be
replaced by d+1
Pd < (2
E(1 -_c))
d odd
,
(4.26)
Then from
Y, ndWd + Y ndWd+1 d even 1
d odd
= 2(T(W) + T(-W)) +
W 2
(4.27)
(T(W) - T(-W))
we have
Theorem 4.3 (Van de Meeberg) The burst error probability when using a convolutional code for communication over the BSC with crossover probability c and maximum-likelihood decoding is upper-bounded by PB <
G(T(W) + T(-W)) + W (T(W) -
T(-W))) (4.28)
_ Remark.
(1 2T(W)+ 2W T(-W))
jW=2 E(1-E)
Actually, van de Meeberg [Mee74] derived a slightly tighter bound (Problem
4.19).
EXAMPLE 4.5 Consider the same channel and encoding matrix as in Example 4.4. Van de Meeberg's upper bound on the burst error probability is PB <
I+2 E(1-E) I-2 E(I -E) T (-2 E(1 - E)) 2 T (2 E(1 - )) + 2 (4.29)
1+0.2
T (0.2) +
1-0.22T(-0.2) - 2.
10-a
which is an improvement by a factor 2.5 compared to Viterbi's bound.
Chapter 4
172
Viterbi Decoding
In Fig. 4.9, simulations of the burst error probability for Viterbi decoding are compared with Viterbi's and van de Meeberg's upper bounds when the rate R = 1/2, memory m = 2 encoder with encoding matrix G (D) = (1 + D + D2 1+ D2) is used for communication over the BSC. Notice the close agreement between van de Meeberg's bound and the measurements, in particular for small E. The perhaps somewhat surprisingly large difference between the two
bounds is explained by the fact that this code has an odd free distance, viz., dfree = 5. In Fig. 4.10 we show the corresponding curves for Qualcomm's rate R = 1/2, memory m = 6 encoder with encoding matrix G(D) = (1 + D + D2 + D3 + D6 1 + D2 + D3 + DS + D6) and dfree = 10.
E
Figure 4.9 Viterbi's and van de Meeberg's upper bounds on the burst error probability for the encoding matrix G(D) = (1 + D + D2 1 + D2) and the BSC with crossover probability E.
We will now use the extended path enumerator and derive an upper bound on the bit error probability. Suppose that the received sequence r = rort ... rK_ I of length K c-tuples is decoded as = vovt . . . vK_t and that the corresponding estimate of the information sequence k = uoul ... uK_t contains I (K) errors. Let I denote the number of erroneously estimated information symbols in a burst of length L b-tuples, and let N denote the number of b-tuples between two bursts. In particular, burst j, j = 0, 1, 2, ... , contains Ij erroneously estimated information symbols, is of length Lj b-tuples, and is separated by Nj b-tuples from the previous burst, where
Nj > m, j = 0, 1, 2....
(4.30)
Section 4.2
Error Bounds for Time-Invariant Convolutional Codes
173
e
Figure 4.10 Viterbi's and van de Meeberg's upper bounds on the burst error probability
for Qualcomm's R = 1/2, m = 6 encoder and the BSC with crossover probability E.
Erroneously estimated information symbols separated by m or fewer b-tuples belong by definition to the same error burst. We define the bit error probability Pb to be the ratio between the number of erroneously estimated information symbols and the total number of information symbols. According to the law of large numbers, we have
lim PI
Pb -
I(K)
K-oc
Kb
>e)=0
(4.31)
for any e > 0 or, equivalently,
Pb = lim (K) with probability 1 K-oo Kb
(4.32)
Assume that the I (K) errors are distributed over J bursts with lo, h, ... , Ij_1 errors, of lengths Lo, L1, ... , LJ_1, and separated by error-free intervals of lengths No, N1.... , NJ-1. From (4.32) it follows that
lIi
Pb = lim
J-oo b
i=o
4-o (Nj + L1) (4.33)
J 1I < lim
j=0 j f
.1-,oobF_j=oNi
with probability 1
Chapter 4
174
Viterbi Decoding
and that
Pb <
limj" i
I.i
blimj,")
with probability 1
Nj
(4.34)
According to the law of large numbers, the limit in the numerator is equal to the expected value of the number of bit errors in a burst; that is, I J-1
E[I I burst] = lim Y I j with probability 1 J-*00 J
(4.35)
j=0
and the limit in the denominator is equal to the expected value of the length (in branches) of the error-free interval; that is, I J-1
E[N] =J-*oo lim -
Nj with probability 1
(4.36)
j=0
By combining (4.34), (4.35), and (4.36), we obtain
E[I I burst] bE[N]
Pb
(4.37)
The probability, PB, that an error burst starts at a given node can be defined as the limit when
J -+ oc of the number of bursts J divided by the number of instances where a burst could have started. This latter number is less than - j_o Nj (In fact, it is less than or equal to
- j=o (Nj - m).). Hence, we have
PB > lim
J J-1 N
_
1
(4.38) 1
E[N]
limJ-O<) Y j=o Nj
with probability 1
From the law of large numbers and from inequalities (4.37) and (4.38), we deduce that 1
Pb < b E [I I burst] PB
(4.39)
Let p (i) be the probability of a burst introducing i errors in the estimated information sequence. Then we have 00
PB =
L p(i)
(4.40)
i=1
00
p(i I burst) = 1
(4.41)
p (i burst) = P(i)
(4.42)
i=1
PB
00
E[I I burst] _
i p(i I burst) = Z-`=' p(
(4.43)
PB
i=1
From (4.39) and (4.43) we conclude that 00 1
Pb < b T i p(i) i=1
(4.44)
Section 4.2
Error Bounds for Time-Invariant Convolutional Codes
175
Let n(w, f, i) be the number of weight w paths of length f introducing i errors in the estimated information sequence. From the Bhattacharyya bound (4.15) it follows that the probability that a path of weight w causes a decoding error is upper-bounded by
(2,/c (I
- E)) w
where c is the crossover probability of the BSC. Then, by applying the union bound, we obtain 00
00
p(i) <
n(w, f, i) (2 E(1 - e))w
(4.45)
W=dfree E=Vmin+1
where Vmin = mini[ vi}. The numbers n (w, f, i) are given by the extended path enumerator T (W, L, I) discussed in Section 3.10:
T(W, L, I) _
1: n(w, f, i)WwLEI` w
E
(4.46)
i
Combining (4.44), (4.45), and (4.46), we obtain [Vit71]
Theorem 4.4 (Viterbi) The bit error probability when using a convolutional code with generator matrix G(D) of memory m for communication over the BSC with crossover probability E and maximum-likelihood decoding is upper-bounded by 1
00
00
00
Pb < b
i n(w, f, i) (2 E(1 - E))w w=dfr,. E=vm;n+1 i=1
_
(4.47)
1 8T(W, L, I) b 8I
W=2
L=1
f
)
I = 1
where T (W, L, I) is the extended path enumerator for the generator matrix G (D). EXAMPLE 4.6
Consider the BSC with E = 0.01 and the rate R = 1/2 convolutional code with the encoding matrix G(D) = (1 + D + D2 1 + D2). Its extended path enumerator is (3.242) WSL3I
T(W' L, I)
(4.48)
I - WL(1 + L)I
Since
8T(W, L, I) 8I
W5L3
(4 . 49)
(1 - WL(1 + L)I)2
Viterbi's upper bound on the bit error probability is Pb <
(2
(1 - 2
E(1 --E))'
(2,/E-
- E)))2
0.25
ti
0.9.10-3
(4.50)
(1 -2-0.2)2
which shows that even by using this simple coding we can achieve an improvement of at least a factor 10 over the raw "error probability" of the channel.
As a counterpart to van de Meeberg's tightening of Viterbi's bound on the burst error probability, we have (Problem 4.20)
Theorem 4.5 (Van de Meeberg) The bit error probability when using a convolutional code with generator matrix G (D) of memory m for communication over the BSC with crossover
176
Chapter 4
Viterbi Decoding
probability a and maximum-likelihood decoding is upper-bounded by
Pb
b1 (1+W0T(W,L,I) 2
8I
1-W0T(-W, L, I)) +
8I
2
W= 2 E(1-E) L=1
(4.51)
1= 1
where T (W, L, I) is the extended path enumerator for the generator matrix G (D). In Figs. 4.11 and 4.12 we compare Viterbi's and van de Meeberg's upper bounds on the bit error probability with simulations for the encoding matrix G(D) = (1 + D + D2 1 + D2) and Qualcomm's R = 1/2, m = 6 encoder, respectively.
E
Figure 4.11 Viterbi's and van de Meeberg's upper bounds on the bit error probability for the encoding matrix G(D) = (1 + D + D2 1 + D2) and the BSC with crossover probability E.
Now suppose that a rate R convolutional code is used to communicate over the additive white Gaussian noise (AWGN) channel with BPSK modulation at the signal-to-noise ratio Es/No, where the symbol energy ES is related to the bit energy through the rate R: ES = REb (4.52) We assume that the allzero codeword 0 is transmitted and that there is a codeword v with wH(v) = d. To simplify notations, we assume without essential loss of generality that the first d symbols of the codeword v are nonzero. What is the probability, Pd, that the Viterbi decoder selects v in favor of the transmitted allzero codeword? The probability for this event equals the probability that the ratio of the conditional density functions is greater than 1; that is,
Section 4.2
Error Bounds for Time-Invariant Convolutional Codes
177
E
Figure 4.12 Viterbi's and van de Meeberg's upper bounds on the bit error probability
for Qualcomm's R = 1/2, m = 6 encoder and the BSC with crossover probability E.
Pd=P(P(r
P(n p(rI v) >1)
I0)>1)
/
(4.53)
r
where 1
P(rt 1o)= Noe
_
ry+
'
ft
(4.54)
and
(,,-
1
P(r1 11) = From (4.53) it follows that
Noe
(4.55)
(_)2 d
Pd =P
No 2
e 1=1 e
>1
No
d
= P
)2
No
No
2
(-
(rj
-
2
+ (rj + ES)/ S/
0)
(4.56)
Chapter 4
178
Viterbi Decoding
Since the channel noise values at different sample times are uncorrelated, it follows that j 1 rj is the sum of d independent Gaussian random variables and, hence, that Ea=1 rj e N(-d Es, dNo/2). Thus, we have ' °° _ (r+a, s)2 e dNo
1
Pd =
J
ndNo 0
dr
Oc
e-y2 /2dy
2n
2dEs/N
= Q ( 2dEs/No)
(4.57)
= Q ( 2dREb/No) where Q O is the complementary error function (1.13). From (4.20) it follows that the burst error probability when communicating over the AWGN channel and exploiting soft decisions is upper-bounded by 00
(4.58)
ndPd
PB _< d=dfree
where nd is the number of weight d paths and Pd is given by (4.57).
Lemma 4.6 For x > 0, Q(x) = Proof.
1
2n
f
e-y2/Zdy <
e_x2/2
(4.59)
00
For any A > 0, we have 2
e z +)'(y x)dy
Q(x) <
2n
<e
X
(4.60)
aX+ 2
e
`Y 2
dy = e-xx+
z
Choosing A = x completes the proof. By combining (4.58) and Lemma 4.6, we obtain
Theorem 4.7 (Viterbi) The burst error probability when using a rate R convolutional code for communication over the AWGN channel with BPSK modulation at signal-to-noise ratio Eb/No and maximum-likelihood decoding is upper-bounded by 00
PB <
nde
dREb/No
= T (W) IW=e ReblNo
(4.61)
d=df,ee
Viterbi's bound (4.61) can be tightened [Vi079] by using the following inequality:
Lemma 4.8 For x > 0, z > 0,
Q( x{ z < Q (J) e-z/2 with equality if and only if z = 0.
(4.62)
Section 4.2
Error Bounds for Time-Invariant Convolutional Codes
179
Proof
z f- Q (/)
Q( x =
_
'
1
/
2n 1
e-z/2
e-Y2I2dy
x+z
-
I
1
e-Y2dy
27L
(e-(Y+ x+z)2/2
2n Jo __
e-z/2
- e-z/2e-(Y+x)2/2) dy
(4.63)
e- z -x2 (e-Y x+z _ e) dy < 0
1
\
2n Jo
J
where the inequality follows from the fact that e-Y x+z <
(4.64)
We have equality in (4.63) and (4.64) if and only if z = 0 and the proof is complete.
Theorem 4.9 The burst error probability when using a rate R convolutional code for communication over the AWGN channel with BPSK modulation at signal-to-noise ratio Eb/No and maximum-likelihood decoding is upper-bounded by ao
nde-dREbtN0
PB < Q ( 2dfreeREb/NO) d=dam
=Q( Proof.
2dfreeREb/NO) edfreeREbINOT(W)
(4.65) IW_e-REb/NO
Let x = dfree and z = d - dfree. Then by combining (4.57) and (4.62) we obtain
Pd < Q ( 2dfreeREb/No)
e-(d-dfae)REbINo
(4.66)
Inserting (4.66) into (4.58) completes the proof.
In order to upper-bound the bit error probability, we combine (4.57) and Lemma 4.6. Thus, we have 00
00
p(i) <
n(w, £, i) (e-REb/No\w
(4.67)
w=dam E=vm;n+1
From (4.44) and (4.67) follows
Theorem 4.10 (Viterbi) The bit error probability when using a rate R convolutional code with generator matrix G(D) of memory m for communication over the AWGN channel with BPSK modulation at signal-to-noise ratio Eb/No and maximum-likelihood decoding is upper-bounded by Pb < b w=df6=vme,+1 i=1
_
l n(w' ,
L)e-wREb/No
(4.68)
1 8T (W, L, I) b
87
W= e-REb/NO
L= 1
1=1
where T (W, L, I) is the extended path enumerator for the generator matrix G(D). As a counterpart to Theorem 4.9 we have [Vi079]:
Chapter 4
180
Viterbi Decoding
Theorem 4.11 The bit error probability when using a rate R convolutional code with generator matrix G (D) of memory in for communication over the AWGN channel with BPSK mod-
ulation at signal-to-noise ratio Eb/No and maximum-likelihood decoding is upper-bounded by Pb
<
1
b
Q ( 2dfreeREb/No) e df-REblNo !)e-wREb1No
X (w=df_ f=vmin+l i=1 i n(tv j,
= Q( b
(4.69)
2dfreeREb/No ) edfreeREb/No aT (
L, I)
W= a-REb INO
01,
L=1
I=1 where T (W, L, I) is the extended path enumerator for the generator matrix G (D). Proof. Follows immediately from (4.44), (4.62), and (4.67). I
It is interesting to compare the bit error probability bounds for hard decisions (BSC) and soft decisions (AWGN channel). For high signal-to-noise ratios, the terms at distance dfree dominate the bounds. Thus, the dominating term in (4.47) is b
(2 E(1
-
E))df-
ti
(4.70)
1 2dfr-Edfree/2
From (1.12) and (4.52) it follows that
E=Q
2REb/No)
(4.71)
Hence, we have 2REb/No))dfree/2
2dfreeEdfree/z _ 12dfree (Q (
(4.72) 12dfreee- 2dfreeREb/No
b where the inequality follows from Lemma 4.6. Thus, for the BSC we have asymptotically Pb < b2dfreee-2dfreeREb/No
(
Y
L, 1 n(dfee, f, i) I
/
t=vmjn+l i=1 00 00
(4.73)
For the AWGN channel we have asymptotically °°
Pb
b
Q ( 2dfreeREb/No)
i n(dfree, £, i) E=vmin+l i
°° e-dfreeREb/No
b
°°
r
(4.74)
l n(dfree, t, 1)
£=vmin+l itJ1
By comparing (4.73) with (4.74) we see that the exponent of (4.74) is larger by a factor of 2. Thus, we have a 3-dB energy advantage for the AWGN channel over the BSC. In the Viterbi's bounds we pick up asymptotically (high signal-to-noise ratios) a 3-dB gain by using soft decisions instead of hard decisions. In Chapter 1(Fig. 1.7) we used Shannon's channel capacity theorem (low signal-to-noise ratios) to show that soft decisions are about 2 dB more efficient than hard decisions. For high signal-to-noise ratios, the bit error probability is determined by the exponent dfreeREb/No in (4.74). In the uncoded case, we have dfree = R = 1. Hence, we define the
Section 4.3
Tighter Error Bounds for Time-Invariant Convolutional Codes
181
asymptotical coding gain to be
y = 10loglo(dfreeR) dB
(4.75)
The asymptotical coding gain is a rough estimate of the performance of a coded system representing the potential increase due to coding.
For a rate R = 1/2, memory m = 2 convolutional code with dfCee = 5, we have y = 101oglo(5 2) = 4 dB. The true coding gain at Pb = 10-5 is approximately 3.5 dB
[Wil96]. The difference at this bit error probability level is due to the nonnegligible contribution of the other terms in the extended path enumerator. Less powerful codes approach their asymptotic coding gain much faster than more powerful codes. For example, the rate R = 1/2, memory m = 6 convolutional code with dfCee = 10
has an asymptotic coding gain y = 101ogto(10. 2) = 7 dB, but at Pb = 10-5 the coding gain is only 5 dB [Wi196].
4.3 TIGHTER ERROR BOUNDS FOR TIME-INVARIANT CONVOLUTIONAL CODES Most of the bounds we derived in the previous section are quite tight when few channel symbols
are in error, that is, for high signal-to-noise ratios. However, they become trivial (> 1) when we have many errors among the channel symbols (i.e., for low signal-to-noise ratios). In this section, we will derive an essentially tighter bound for the burst error probability for the BSC [CJZ84a]. Since the bound will not depend on the starting point for the error burst, we will without loss of generality consider bursts starting at the root. Let us separate the error event into two disjoint events corresponding to "few" (F) and "many" (M) errors, respectively. If we let E denote the burst error event (i.e., the event that an error burst starts at a given node if maximum-likelihood decoding is used on the BSC), we have
PB = P(E) = P(E I F)P(F) + P(E I M)P(.M) < PV I .F)P(F) + P(M)
(4.76)
= P(E,-F)+P(M) To obtain an upper bound on the probability that we have many channel errors, P(M), we use a random-walk argument that is similar to what we used in Section 3.4 when we derived a lower bound on the distance profile. Let us accrue a metric a when the channel symbol is correctly received and a metric 0 when we have a channel error. Then the cumulative metric along the correct path is a random walk 0, So, Si, S2, ... , where Z;
SL =
(4.77)
i-o
for t > 0. The branch metrics Z; , i = 0, 1, 2, ... , are independent, identically distributed random variables and can be written C
Zi=EYif
(4.78)
where the Yie's are independent, identically distributed random variables, Y,e =
f
a > 0, with probability 1 - E 0 < 0, with probability E
(4.79)
Chapter 4
182
Viterbi Decoding
where c is crossover probability of the BSC. Clearly (cf. Example B.1),
P(Zi = ka + (c - k)/j) =
- )k,,c-k
(4.80)
Let us choose the metrics a = log
1-E 1-a
(4.81)
and
=logE
a where E < a < 1 is a parameter to be chosen later. Suppose we have wt errors among the first (t + 1)c channel symbols. Then,
St = wtt8 + ((t + 1)c - wt)a
(4.82)
(4.83)
Now we can more precisely state what we mean by "few" and "many" errors. Those error patterns for which St stays above the barrier at u < 0 contain few errors, and those error patterns for which the cumulative metric hits or crosses the barrier contain many errors. Few errors, that is, Smin = min( St} > u
(4.84)
wto + ((t + 1)c - wt)a > u, t > 0
(4.85)
t
is equivalent to
or
w < W,
-u
a-0
+ (t + 1)c
a
= rt,
def
t>0
a-
(4.86)
In Fig. 4.13 we illustrate inequality (4.86).
"Many" errors it
"Few' errors
-u+cc a-Q
t
Figure 4.13 Barrier at rt which separates paths with many and few errors.
From Wald's identity (Corollary B.6) we have
P(M) = P(Smin < u) < 2-A0
(4.87)
where u is a parameter to be chosen later and .lo < 0 is the root of the following equation (see Example B.1): g(o)
def ((1
-
E2"#)` = 1
(4.88)
Tighter Error Bounds for Time-Invariant Convolutional Codes
Section 4.3
183
That is,
,lo = -1
(4.89)
To upper-bound the probability that a burst starts at the root and that we have few channel symbol errors, P (E, F), we use the union bound and obtain
P(£, F) <
P(£(k), F)
(4.90)
k
where £(k) is the event that a burst error starting at the root is caused by path k. A path has few channel errors only if it stays below the barrier in Fig. 4.13 for all t > 0. If we take all paths of length t + 1 with wt < rt channel errors, we will get all paths with few channel errors together with the paths with many channel errors that take one or more detours above the barrier. Hence, for path k of length j + 1 we have
P(£(k),F) < P(£(k), Wj < rj)
(4.91)
where
((I +
P(£(k), Wj < rj) = E
Wj
wj
1)c
I(1 -6) (j+1)C-Wj6Wjp(E(k)
/
I wJ)
(4.92)
If we multiply the terms of the sum in (4.92) by rt-(rj-wj), 0 < 17 < 1, we obtain the inequality
P(£(k), Wj < rj) < rl-ri
E ((j +wJ1)c)(7E)Wi(1 l
E)(J+1)c-w;P(£(k)
I
wJ.)
(4.93)
wj
Let us introduce E def
7)E <E o= i]E+1-E -
(4.94)
and
1-E
1-def
EO
1)E+1-E
> 1-E
(4.95)
Substituting (4.94) and (4.95) into (4.93) and rearranging (4.93), we get
E(1 -6O))r' ( I -E ) (J+1)c
P(£(k), Wj < rJ) < (60(1
- E)
1 - EO
((I + 1)c)EO'
x wj
w
(4.96)
(1 - Eo)(j+1)c-wj P (£(k) I WJ )
Overbounding by summing over all 0 < w j < (j + 1)c, we have
P(£(k)wJ
<
r1)
(E(i_60)y)
(iE)u+l)c
<
P(£ (k))d j+1 Eo
(4.97)
where P(£(k))d, j+1,,, is the probability that a decoding error is caused by the kth path of weight d and length j + 1 on an improved BSC with crossover probability Eo. Using the Bhattacharyya bound (4.15), we obtain from (4.97)
1E (1-EO) 1- E P((k)wJ
d
(J+1)c
-EO))
(4.98)
Using the definition of rj given in (4.86) and rearranging (4.98), we obtain P(£(k), Wj
<
r1)
< (EO(1 -EE))
WdLj+1
(4.99)
Chapter 4
184
Viterbi Decoding
where
W=2
EO(1 - EO)
(4.100)
and
1-E
E(1
( L- \1-EO)
-60)
(4.101)
\EO(1-E))
Finally, we combine (4.90) and (4.91) with (4.99) and obtain the following upper bound on the probability of having few channel symbol errors and making an error when decoding the first information symbol:
P(E, F) <
E(1 - EO)
(\EO(1 -E))
n(d, j + 1) Wd Lj+l d
j
(4.102)
E(1 - EO)
where n (d, j + 1) is the number of paths of weight d and length j + 1 stemming from the root and
T(W, L)
def
T(W, L, 1)1,=1
(4.103)
where T (W, L, I) is the extended path enumerator. We now combine the bounds (4.76), (4.87), and (4.102) to obtain the following upper bound on the burst error probability:
P(E) <
(E(1 -EO))°="
T(W, L)+2"
E0(1 - E)
(4.104)
This bound is valid for all u. By taking the derivative of the right side of (4.104), we find that its minimum is obtained for
(a - ,B) log(a uo
tog
log T (W, L) - log log'(1-60j CO (I '(1-CO)
E00-E)
+a-
(4.105)
Inserting (4.105) and rearranging (4.104) give the upper bound P(S) < 2h(Y)T(W, L)Y
(4.106)
where h( ) is the binary entropy function (1.89) and E(1-EO) E0(1-E)
y-1 = 1 + log
a-fl
(4.107)
Finally, we use (4.81) and (4.82) and obtain a Viterbi-type bound:
Theorem 4.12 The burst error probability when using a convolutional code for communication over the BSC with crossover probability E and maximum-likelihood decoding is upper-bounded by PB < inf inf 2h(Y)T(W, L)1' (4.108) O<EO<E E
where Y
1
= 1+
log
EO(1
log E (1
EE) E
a) and W and L are given by (4.100) and (4.101), respectively.
(4 . 109 )
Tighter Error Bounds for Time-Invariant Convolutional Codes
Section 4.3
185
The bound (4.108) is significantly better than Viterbi's bound (4.22) for low signal-tonoise ratios. The latter bound can be obtained by choosing Eo = E rather than minimizing over Eo in (4.108).
Van de Meeberg's strengthening of Viterbi's bound (4.22) is also valid here, and we obtain
Theorem 4.13 The burst error probability when using a convolutional code for communication over the BSC with crossover probability e and maximum-likelihood decoding is upper-bounded by PB <
inf 2h(y) inf o<EO<ee
(4.110)
X C12W T(W,L)+
1
2W T(-W,L)
where y, W, and L are given by (4.109), (4.100), and (4.101), respectively. In Fig. 4.14 we compare our tightened van de Meeberg-type bound (4.110) with van de Meeberg's bound (4.28) and simulations for Qualcomm's R = 1/2, m = 6 encoder. Remark. In the next section we introduce a parameter for the BSC called the computational cutoff rate Ro = 1 - log(l + 2/e(1 - E)). For a BSC with crossover probability E = 0.045, we have Ro = 1/2. We notice that in Fig. 4.14 van de Meeberg's upper bound is nontrivial only for rates 0 < R < R0, while our tightened bound is nontrivial for rates 100
Simulation Tightened van de Meeberg van de Meeberg
10-1
10-2
I
I
PB 10-3
I
10_1
F
10-2
10-3
E
Figure 4.14 Tightened van de Meeberg-type bound compared with van de Meeberg's bound
on the burst error probability for Qualcomm's R = 1/2, m = 6 encoder and the BSC with crossover probability E.
Chapter 4
186
Viterbi Decoding
0 < R < C, where the channel capacity for the BSC is C = 1 - h(e); that is, e = 0.11 corresponds to C = 1/2. 4.4 UPPER BOUNDS ON THE OUTPUT ERROR BURST LENGTHS The errors in the output from the Viterbi decoder are grouped in error bursts. In this section, we will upper-bound the distribution of the lengths of these error events for the ensemble of periodically time-varying convolutional codes. In the next section we will use these upper bounds to obtain upper bounds on the burst error probability. For simplicity we will only consider the binary symmetric channel (BSC). Consider the ensemble E(b, c, m, T) of binary, rate R = b/c, periodically time-varying convolutional codes encoded by polynomial, periodically time-varying generator matrices of memory m and period T which we introduced in Section 3.6. Let u and e = u + i denote the estimated information sequence and the error sequence in the decoder output, respectively. In Section 3.6 we also introduced the set U1r,-m,t2+m] of information sequences ut,-m utl-m+l .. ut2+m such that the first m and the last m subblocks are zero and such that they do not contain m + 1 consecutive zero subblocks. We have the following Definition The decoder output error sequence e[t_m_1,t+j+m+1] is called an error burst
or error event of length j + 1 starting at time t and ending at time t + j + 1, if et-m_1 = 0, e[t-m,t+j+m] E u[t-m,t+j+m]+ and et+j+m+1 = 0.
In order to upper-bound the distribution of the length of an error burst starting at time t, we consider the block code lit (j) given by
Bt(j) _ {v[t,t+j+m] v[t,t+j+m] = 0 or I
v[t,t+j+m] = u[t_m,t+j+m]G[t,t+j+m], where
u[t-m,t+j+m] E'
[rt-m,t+j+m]}
where G[t,t+j+m] is given by (3.141). The rate of the block code Bt (j) is upper-bounded by
r(j) =
j+1 R j+m+1
(4.112)
(This is an upper bound since we have imposed a restriction on Assume that the transmitted sequence is the allzero sequence, and let Lt (j) denote the event that an error burst starting at depth t has length j + 1. A necessary-but not sufficientcondition for Lt (j) to occur is that the block code Bt (j) will be erroneously decoded. Thus, we have
P(Gt(j)) < P(Et(j))
(4.113)
where Et (j) denotes the event that Bt (j) is erroneously decoded. Hence, we can obtain an upper bound on P (Lt (j)) by upper-bounding the error probability of the block code 13, (j). For a periodically time-varying with period T convolutional code, we define the probability of the event that an error burst has length j + 1, j < T, to be
P(G(j))
aer
max {P(Gt(j))}
0
(4.114)
The probability P(L(j)) is upper-bounded by T-1
P(G(j)) < E P(G,(j)) t-o
Before we proceed we need the following
(4.115)
Section 4.4
Upper Bounds on the Output Error Burst Lengths
187
Definition The computational cutoff rate Ro for the BSC with crossover probability E is given by
Ro def =
1-log (1 +2/c(1 _e))
(4.116)
Using the Bhattacharyya parameter z (4.15), we can express Ro as
Ro = 1 - log(1 + z)
(4.117)
which can be rewritten as
2 Ro =
1+z
(4.118)
2
Now we have
Lemma 4.14 (Random coding bound) Consider the ensemble E(b, c, m, T) of binary, rate R = b/c, periodically time-varying convolutional codes encoded by polynomial, periodically time-varying generator matrices of memory m and period T, where T = 0(m2). For rate r (j) given by (4.112) the average probability that an error burst has length j + 1 when we communicate over the BSC and use maximum-likelihood decoding is upper-bounded by the inequality
E[P(1(j))] <
2-(Ro-r(j)+o(1))(j+m+1)c
(4.119)
for j < T and rates 0 < r (j) < R0, where Ro is the computational cutoff rate. Proof. From Theorem 3.26 and (4.15) it follows that over the ensemble E(b, c, m, T) the probability that each codeword v[t,:+j+m] 0, j < T, causes an error is upper-bounded by
(j+m+1)c
((J +m + 1)c)2-(j+m+1)c Zi -
r=o
(1 2+ z (j+m+1)c = 2-Ro(j+m+1)c
(4.120)
a
where the last equality follows from (4.118). We combine (4.120) with the upper bound, 2r(j)(j+m+1)c on the total number of codewords in 3t (j) and apply (4.113) and (4.115) and the lemma follows. Next we have
Definition The expurgation rate ReXp for the BSC with crossover probability a is given by
+22 E(1E(1-c-)
ReXp = 1 - h 1
(4.121) E)
where h ( ) is the binary entropy function (1.89). For rates r (j) less than the expurgation rate ReXp, we can obtain an essentially stronger bound. Consider the fraction of convolutional codes in the ensemble E(b, c, m, T) whose active row distances satisfy the condition in Lemma 3.33. Since this fraction is larger than f it follows analogously to the proof of Lemma 4.14 that the average error probability for this expurgated ensemble is upper-bounded by Eexp[P(E(J))] = coax {Eexp[P(Et(J))]} 0
T f
(j+m+1)c
((J +m + 1)c)2(r(i)-1)(j+m+1)czi
Jo <j < T
(4 .122)
i
where d is the largest integer satisfying (3.179) and Jo is the smallest integer satisfying (3.177).
Chapter 4
188
Viterbi Decoding
For any A > 0, we can further upper-bound (4.122): T (j+m+1)c
Eexp [ f P(.6W)
I
E
((j + m + 1)c)2_2J)_1)(J+m+1)cZt
i=a + 1
7
U+m+1)c
2(rU) 1)U+m+1)c2 x&j
< f
f
=T
((j + m + 1)c 2;iZi
(4.123)
l
i-p
2(r(j)-1)(j+m+l)c2-xa; (I + 2Az)(j+m+1)c
jo < j < T
Let us choose p 2 = (1-,)z, a,>0 a
(4.124)
where ar
z
i
(j+m+1)c
>
1+z
(4.125)
and where the inequality follows from A > 0. Then we can rewrite (4.123) as 2(r(j)-1)(j+m+1)c2h(P)(j+m+1)czP(j+m+1)c
Eexpl[P(E(j))l J < T
f
2(h(P)+r(j)-1)(j+m+1)cZP(j+m+1)c
ff
< I
zP(j+m+1)c
(4.126)
jo < j < T
T
where the last inequality follows from (3.179). Let us choose f = 1/2, then we have p"(j+m+1)c jo < j < T Eexp[ P ((j))] < 1Tz
(4.127)
From (3.179) it follows that log(2T2) h(p) < I - r(J) -
(j+m+ 1)c
(4.128)
By choosing T = O (m2), we obtain that for large values of m we have jo = 0 (cf. Section 3.8). Furthermore, (4.129)
and, hence, for large values of m we can replace p by p + O (lomm), where p is the GilbertVarshamov parameter of rate r (j). Thus, we have Eexp[P(S(j))Il < 2p(j+m+1)clogz+O(logm) (4.130)
where j < T and cf. (4.125) z
p> (1+z)+O
log m
(4.131)
m
For the BSC we have the Bhattacharyya parameter, cf. (4.15),
z=2 E -I- E )
(4.132)
Eexp[P(£(j))] < 2p(j+m+1)clog(2 e(1-e))+O(logm)
(4.133)
and, finally, we obtain
Section 4.4
Upper Bounds on the Output Error Burst Lengths
189
where j < T and
p>
2
E(1 _-E)
+
1 + 2 E(1 - E)
(logm) mJ
0
(4.134)
or, equivalently,
r (j) < Rexp
+0
log m
(4.135)
M
where the expurgation rate Rexp is given by (4.121). From inequality (4.113) we obtain
Lemma 4.15 (Expurgation bound) Consider the ensemble E(b, c, m, T) of binary, rate R = b/c, periodically time-varying convolutional codes encoded by polynomial, periodically time-varying generator matrices of memory m and period T, where T = 0 (m2). There exists a subset containing at least half of the codes such that the average probability that an error burst has length j + 1 when used to communicate over the BSC with maximum-likelihood decoding is upper-bounded by the inequality Eexp[P(G(j))l < 2P(i+m+1)clog(2 E(1-E))+0 (logm)
(4.136)
for j < T, 0 < r(j) < Rexp + 0 1-9 m ) and where p = h-1(1 - r(j)) is the GilbertVarshamov parameter of the rate r(j) and Rexp is the expurgation rate (4.121). Next, we introduce
Definition The critical rate Rent for the BSC with crossover probability c is given by
Rcrit=1-h
,/6- +
(4.137) 1
where h() is the binary entropy function (1.89). We will now derive the so-called sphere-packing bound for rates r(j) > Rc it. We will exploit the idea of "few" (.F) and "many" (M) errors introduced in Section 4.3. Assume that the allzero sequence is transmitted over the BSC and that r[t,t+j+m] is received and decoded by a maximum-likelihood decoder. Let M = {r[t,t+J+m] I wx(r[r,r+j+m]) ? p(j + m + 1)c}
(4.138)
where p is the Gilbert-Varshamov parameter of the rate r (j ). The set .T is the complement of M. Following the thread in Section 4.3, we upper-bound the average error probability for the ensemble S(b, c, m, T) by (cf. (4.76))
E[P(EE(J))] < E[P(EE(j), .T')) + P(M)
(4.139)
Let io be smallest integer larger than or equal to p (j + m + 1) c; that is,
io = [p(J + m + 1)cl
(4.140)
Then from (4.138) it follows that
P(M) ==
(J+m+1)c
((j + m + 1)c)
i=io
i
E
E` (1 - E)(J+m+l)c-i
(4.141)
Chapter 4
190
Viterbi Decoding
For any A > 0 we have (J+m+1)c
P (.M) <
((j + m + 1)c
=io
6'(1 - )(j+m+')ci2)(i
io)
i
(4.142) (J+m+l)c
E
< 2 XP(l+m+1)c
(j +M + I)C
i=io
l
l
C
(c2")'(1 -
)(J+m+l)c-i
Next, we upper-bound the right side of (4.142) by extending the summation to i = 0. Then we obtain (j+m+ 1)c
E
P(M) < 2-AP(i+m+1)c
(j + m + 1)C
\
i=0
= 2-XP(J+m+1)c(1 - E +
lI (E2X)i (1 -
)(J+m+1)c-i
(4.143)
J
l
2AE)(j+m+1)c
Its minimal value is achieved when
a.=Ao=log
p(1 - E)
(4.144)
(1 - P)E
The minimizing value ,lo > 0 if and only if p > E, which corresponds to (4.145)
r(j) < C = 1 - h(E)
where C is the channel capacity of the BSC. Inserting ,lo into (4.143) yields
p(I - E) P(te)<((I-p)E)
P(J+m+1)c
=2-(Pluge+(1-P)log
(1-E+
)(J+m+1)c
,p
PO - E) )(J+m+1)C
>
I-p
(4.146)
E
Next we will upper-bound E[P(Et(j),F)]. Suppose that the word received over the BSC contains i errors where i < io; that is, we have a situation with "few" errors. Assuming maximum-likelihood decoding, we can upper-bound the error probability by the probability that a randomly chosen codeword is at distance at most i from the received word. Since these two events are independent, we have the following upper bound: io-1
i
E[P(Et(J), F)] < E
E2r(J)(j+m+1)c(J + + k
i=o k=O X
((j + m + 1)C)ci(I
(4.147)
-
E)(J+m+1)c-i
Now we introduce two parameters A' > 0 and µ > 0 such that we can extend the summation intervals without increasing the bound too much. Thus, we can further upper-bound (4.147)
Section 4.4
Upper Bounds on the Output Error Burst Lengths
191
as follows:
i0-1(I+m+1)c ((j + m + 1)c) ((j +M + 1)c) k J i
E i=0
k=0
1-i)Ee
2(r(I)-1)(I+m+1)c2µ(1
x
(1
-
((I+m+1)c ((j +
< 2a'(i0_1)2(r(j)-1)(j+m+I)c
1: k=0
E)(I+m+1)c-i
m + 1)c - µk k
2
(4.148) is
1
X
((j + m + 1)c)(E2µ-A')i(1 i
i=0
<
-
E)(I+m+1)c-i
2-µ)(I+m+1)c `0
X
1
E ((j
+m+1)c E)(J+m+1)c-i
l
i=0
J
where we have used (4.140) to obtain the last inequality. Extending the summation over i to i = (j + m + 1) c, we obtain:
E[P(Et(j), L)) <
2-P(J+m+1)c2(r(I)-1)(I+m+1)c(1
+ 2µ)(j+m+1)c
X (1 - E + E2µ )(I+m+1)c
(4.149)
It is straightforward (but tedious) to show that the upper bound (4.149) achieves its minimum value for
µ=/t0=log
1-p
(4.150)
p
and - p)ZE X' = Xo = log
4.151)
(
P (1 - E) 2
where p is the Gilbert-Varshamov parameter. The inequality) > 0 is equivalent to (4 . 152)
p< '/6- + 1-E which corresponds to
r (j ) >
1-h
\\\\ 'A-
+
)
=
Rc,;t
( 4. 153 )
1 - E JJJJ
where Rat is the critical rate (4.137). By inserting (4.150) and (4.151) into (4.149), we obtain
E[P(et(j),.:)l
<2-(Ploge+(1-P)log
)(I+m+1)c
4.154)
which holds for j < T and r (j) > Rcpt We can obtain an upper bound by combining (4.139), (4.146), and (4.154). Somewhat surprisingly, we can obtain a more elegant upper bound directly from (4.139), (4.142), and
Chapter 4 • Viterbi Decoding
192
(4.148) by inserting the parameters A
and
F)1 + P(M)
< <
A0, A' =
/(1
— Pi
E
\ p(j+m+1)c
21)+m+1)c(1 —
—
+
+ 1
+
((1
p)E
—
—
y(f+rn+l)C (J+m+l)c((
e))
+m+ i
'\
f-
—
(1 —
e)p
1
p
—
(4.155)
)
x (1 — —
(j+m+(j
((1—
((i +m +
((1—
x (1 — where the last equality follows from
r(j)= 1—h(p)
(4.156)
By evaluating the sum in (4.155), we obtain / (1
—
p)E \ p(f+m+1)c "(1
p(l—e) )
—
€)p
i—p
+
\ (J+m+1)c i — E)
(4.157)
= ((P)_0
= 2—(p'g
log
which holds for rates
Lemma 4.16 (Sphere-packing bound) Consider the ensemble E(b, c, m, T) of binary,
rate R = b/c, periodically time-varying convolutional codes encoded by polynomial, periodically time-varying generator matrices of memory m and period T, where T = 0(m2). For rates r (j) given by (4.112), the average probability that an error burst has length j + 1 when we communicate over the BSC and use maximum-likelihood decoding is upper-bounded by the inequality
<
for j < T,
2_-(piog
< r(j) < C, and where p
parameter of the rate r(j) and
log
(4.158) —
r(j))
is the Gilbert-Varshamov
is the critical rate (4.137).
From our derivations we can also deduce the following (cf. [Ga168]):
Theorem 4.17 There exists a binary block code 13(j) of rate r(j) and block length (I + m + 1)c such that the event e(J) that 8(j) is erroneously decoded, when it is used to communicate over the BSC together with maximum-likelihood decoding, is upper-bounded by
P (E(j)) where
(
) is the block coding exponent
(4.159)
Upper Bounds on the Output Error Burst Lengths
Section 4.4
E13(r) =
-p log (2 E(1 - e)) , Ro - r, P log + (1 - p) log
193
0 < r < Rexp r < Rcrit Rexp Rcrit
(4.160)
r
and where
p = h-1(1 - r)
(4.161)
is the Gilbert-Varshamov parameter, Rexp is the expurgation rate, and Rerit is the critical rate.
In Fig. 4.15 we show the block coding exponent E13(r) for the BSC with crossover probability e = 0.045, which corresponds to Ro = 1/2. The transition points between the expurgation, random coding, and sphere-packing bounds are marked with . Eti(r)
-1log(2
E(1 - E))
r Rexp
Rcrit
Figure 4.15 The block coding exponent Eg(r) for the BSC with crossover probability c = 0.045, that is, Rp = 1/2.
From (4.113) it immediately follows that a convolutional code exists in the ensemble £ (b, c, m, T) such that the probability of the event G (j) that an error burst has length j + 1, 0 < j < T, is upper-bounded by the right side of inequality (4.159); that is, P(C(j)) < 2-(En(r(1))+o(1))(j+,n+l)c
(4.162)
where EB (r (j )) and r (j) are given by (4.160) and (4.112), respectively. We define the error burst length exponent L(f) to be
L(2)
,ef Eri(r(j))(j +m + 1)
(4.163)
M
where
f = (j + 1)/m
(4.164)
Then we have
Theorem 4.18 There exists a binary, rate R = b/c, periodically time-varying convolutional code encoded by a polynomial, periodically time-varying generator matrix of memory m and period T, where T = 0 (m2), such that the probability of a length j + 1 error burst from a maximum-likelihood decoder when used to communicate over the BSC is upper-bounded by
P(G(j)) <
2-(L(e)+o(1))mc
(4.165)
where j < T, £ = (j + 1)/m, and L(t) is the error burst length exponent given by (4.163).
Chapter 4
194
Viterbi Decoding
The error burst length exponent L(E) can be constructed geometrically from the block coding exponent EB(r(j)). This construction is similar to Forney's inverse concatenated construction (cf. Section 3.8). Consider the block coding exponent EB(r(j)) as given in Fig. 4.16. Draw a straight line from the point (R, 0) through (r(j), EB(r(j))). This line intersects the EB(r)-axis in the point (0, E0). From Fig. 4.16 and (4.112) it follows that
R - r(j) _
E13 (r(j))
R
E0
1
r(j)
m
R
j+-M+1
( 4 . 166 )
£+1
Thus, by combining (4.163) and (4.166) we obtain
L(() = E0
(4.167)
as illustrated in Fig. 4.16. E13(r)
0.5
0.5
E13(r)
0
0' 0
r
4
2
B
6
Figure 4.16 Geometrical interpretation of the relation between the block coding exponent E13(r) and the error burst length exponent L(P) for the BSC with crossover probability s = 0.045.
In Fig. 4.17 we show the error burst length exponent L (f) for various rates for the BSC. The transition points between the expurgation, random coding, and sphere-packing bounds are
marked with . L(f) R = 0.4
1.5
1
R = 0.5
0.5
R = 0.6
00,
2
4
6
8
10
e
Figure 4.17 The error burst length exponent L(E) for various rates for the BSC with
crossover probability e = 0.045.
Section 4.5
Error Bounds for Periodically Time-Varying Convolutional Codes
195
The value of e = (j + 1)/m for which L(e) achieves its minimum is called the critical length and denoted ecrit. It is the most probable (normalized) length of an error burst. In Fig. 4.18 we show the critical length as a function of the rate R. We notice that ecrit approaches infinity when R approaches the channel capacity. In other words, when we communicate at
rates close to the channel capacity the typical error bursts are very long. Thus, we cannot draw any conclusions about the burst error probability from the initial part of the path weight enumerator. The discontinuity at the computational cutoff rate Ro is due to the straight line in the block coding exponent Era (r (j)) for rates ReXp < r (j) < Rerit. For R = R0, we have E0 = Ro. Thus, the critical length makes a jump at Ro from Ro
ecrit =
-1=
Eti(Rexp)
Rexp
,
f or R = Ro
- E, E > 0
(4. 168)
,
for R = Ro + E, E > 0
(4.169)
Ro - Rexp
to
=
R n0
E13 (Rcrit)
-1=
Rcrit
Ro - Rcrit
Since the critical length is derived from the block coding exponent EC3(r), which is a lower bound on the "true" error exponent, it follows that the critical length shown in Fig. 4.18 is a lower bound on the "true" value. However, in Section 4.6 we show that in the sphere-packing region the block coding exponent is tight. Hence, ecrit assumes the correct value for rates
Ro
5
Figure 4.18 The critical length ecrit for the BSC with crossover probability E = 0.045 correspond-
0'
--- R
0
ing to R = Ro = 1/2.
C
4.5 ERROR BOUNDS FOR PERIODICALLY TIME-VARYING CONVOLUTIONAL CODES In order to upper-bound the burst error probability via the distribution of the error burst lengths
for the ensemble £(b, c, m, T), we have to restrict the burst error probability to those error events caused by bursts of lengths that are at most T. Let us denote this restricted burst error probability by PB . Then we use the union bound and obtain from (4.162) that T-1
Ps <
P(G(j)) =o
(4.170)
< T2-(E13(r(Jm:n))+o(1))(Jmin+m+l)c
where the real number j,," is the value of j (here we allow j to be any real number) that minimizes the exponent of the right hand side of (4.162).
Chapter 4
196
Viterbi Decoding
Let us introduce
f (j) = EL3(r(j))(j + m + 1)c
(4.171)
pB <
(4.172)
Then we have
In the expurgation region, that is, for rates 0 < r (j) < Rexp, we have
f (j) = -p(j + m + 1)c log z
(4.173)
where z is the Bhattacharyya parameter (4.15) and p is the Gilbert-Varshamov parameter, that is,
j+1 R=1-h(p) r(j)= j+m+1
(4.174)
= 1+plogp+(1 -p)log(1 -p) Regarding j as a continuous variable and taking the derivatives of (4.173) and (4.174), we obtain (notice that p is a function of j)
f'(j) = -p'(j + m + 1)c log z - pc log z and m
r'(j) = (j+m+1)2 R
_
-p
1° g
l
p
-p
(4.175)
(4 . 176)
Since (4.177)
f (jmin) = 0 we obtain from (4.175) and (4.176) that
p=-P'(jmin+m+1) mR (jmin + m + 1) log
(4.178) PP 1
where
p = 1 - h-1(r(jmin))
(4.179)
Hence, by inserting (4.178) into (4.173) we obtain
f (jmin) _ -p(jmin + m + 1)c log z mcR
log(,P)
t og z
(4.180)
It is beneficial to introduce a parameters, 1 < s < oo, such that z,/s 4 . 81)
1 + zl/'' (If 0 < p < ,+Z, then we can always find such an s.) Then we obtain from (4.178) that p log 1,
PP
mR
= - jmin + m + 1 = r (jmin) - R =1+plogp+(1-p)log(l-p)-R
(4.182)
where the last equality follows from (4.174). Using (4.181), we can rewrite (4.182) as
R=1+log(1-p)=1-log(l+zi/s)
(4.183)
Section 4.5
Error Bounds for Periodically Time-Varying Convolutional Codes
197
By inserting (4.181) into (4.180), we obtain
f (join)/mc = sR = s(1 - log(1 + zl/S))
(4.184)
where the last equality follows from (4.183). Next, we introduce the expurgation function def
Gexp(s) = s(1 - log(1 + Z1/S))
4.185)
where 1 < s < oo and z is the Bhattacharyya parameter. Hence, in the expurgation region we have the parametric dependence
J f (Jmin)/me = Gexp(s)
(4.186)
= Gexp(s)/s
11 R
where 1 < s < oo or, equivalently, 0 < R < Ro. From (4.173), (4.181), and (4.186) it follows that
l=
jmin + rn
-Gexp(s)(l + Z1/S)m ZI/S log Z
(1 - log(1 + z1/S))(1 + z1/S)m
(4.187)
_
R(1 + z1/S)m
zI/S log zt/S
Zl/S log z1/S
From (4.185) and (4.186) we have zl/S = 21-R
-1
(4.188)
which we insert into (4.187) and obtain Jmin + I
m
R21-R + (21-R
_
(21-R
- 1) log(21-R - 1)
- 1) log(21-R - 1)
0 < R < Ro
(4.189)
In particular, when R -a Ro or, equivalently, when s -+ 1, we have lim
S-> 1
Jmin + 1 m
- - (1 + z) log
12 z
+ z log Z
-
Rexp
Ro -Rexp
z log z
(4.190)
which coincides with (4.168). In the random coding region, that is, for rates Rexp < r (j) < Ro, we have (cf. (4.160)):
.f (j) = (Ro - r (j )) (j + in + 1)c
= Romc - r(j)(j +m + 1)c+ (j +1)Roc
=Romc-(j+1)Rc+(j+1)Roc
(4.191)
= Romc + (j + 1)(Ro - R)c The minimum value of f (j) is obtained for j = 0. Hence, for the random coding region we have
f (0)/mc = Ro + (Ro - R)/m
(4.192)
This bound is valid for the same region of rates R as the expurgation bound, viz., 0 < R < Ro, and since it is weaker than (4.186) it can be omitted. Finally, we consider the sphere-packing region, that is, rates Ro < r(j) < C, where
f(j)=(plogP+(1-p)log 1-p)(j+m+1)c 6 45
(4.193)
Chapter 4
198
Viterbi Decoding
Then the minimizing value of j, viz., jmin, is the root of pr(Jmin
f (r jmin) =
+
p(1 - E) (1 - p)E 1 p - p) log l - E c 11n 2 = 0
+m + 1)clog p
(Plog+(l
-
(4.194)
where p' satisfies (4.176). Using (4.194) we can rewrite (4.193) as
f (jmin) = -p,(jmin + m + 1)2c log P 1
p)E (4.195)
= -Rmc
log (1(1 P)E
where the last equality follows from (4.176). Again it is beneficial to introduce a parameters. Here we choose s, 0 < s < 1, such that
p=
E i+s
(4.196)
E ,+s + (1 - E) I+s
or , equivalentl y, log
p _ l-p l+s log l - E E
1
(4.197)
Then , from (4 . 195) we obtain (4.198)
f (jmin)/mc = sR From (4.194) it follows that
p
mi n
+m+ 1)
p log
p 25
+ (1 - p) log I -P l og
p(I-E) (I-P)E
(4 . 199)
We insert (4.196) inside the logarithms in the numerator of (4.199), split the denominator, and obtain: p log p (Jmin + m +
1) _ -
, E
,
+(1-E)
log
+ (1- p) log Er+s+(1-E)r+s , I-E) P 1-p - log
E
1-E
(4.200)
-I+s(ploge+(1-p)log(1-E))-log(E++s+(1-E)i+ log Ia - (1 + S) log I-P-P
where we used (4.197) in order to simplify the denominator. From (4.196) it follows that
loge=(1+s)(log p+log(E+s+(1-E)+))
(4.201)
log(1 - E) = (1 + s) (log(1 - p) + log (E 1 +' + (1 - E) I+s ))
(4.202)
and
Section 4.5
Error Bounds for Periodically Time-Varying Convolutional Codes
199
Inserting (4.201) and (4.202) into (4.200) yields
P'(jmin + m + 1)
-s(plogp+(1-p)log(1-p))-(1+ s) log (Eh+(1-E)11+5) s log
=
p 1
tss
-(p log p + (1 - p) log(1 - p)) -
(4.203)
p
log (E 'Ti + (1 - E) 1+s
ip
log
p
I
Equality (4.176) can be rewritten as
n'(;- m 4-m+1)=
mR (jmin + m + 1) log i Pp
(4.204)
and from (4.174) it follows that
R-r(j)= j +mR m1 +
(4.205)
Combining (4.204) and (4.205) yields
P'(jmin+m+ 1)_
R - r(j) log
% P
p
_ R - 1 -plogp-(1 -p)log(1 -p)
(4.206)
log p-p 1
where the last equality follows from (4.174). From (4.203) and (4.206) we conclude that
R=1 - 1+slog(E1+s+(1-E)1+s) s
(4.207)
Let us define the Gallager function for the BSC to be
G(s) = s - (1 + s) log (E t+s + (1 - E) t+s)
(4.208)
Then, for the sphere-packing region we obtain from (4.198) and (4.207) the following parametric expression: J
l
.f (jmin)/me = G(s)
(4.209)
R = G(s)/s
where 0 < s < 1. (The existence of a solution for all R, 0 < R < C, of the second equation in (4.209) follows from the properties of G(s) shown in Problem 4.21.) From (4.193), (4.207), (4.208), and (4.209) it follows that G(s)m imtn +m+1 = p log + (1 - p) log tI-P -E
sRm
plogGT+s +(1 - p)log(l -E)t+s + log(E1+s +(1 -E)1+s
(4.210)
Rm
plogp+(1 -p)log(1 -p)+
'SS log(Ett+s
+(1 -E)1+s)
or, equivalently,
jmin+l _ m
plogp+(1-p)log(1-p)+1 plogp+(1-p)log(1-p)+1-R
(4.211)
200
Chapter 4
Viterbi Decoding
where p is defined by (4.196). In particular, for R = Ro we have Jmin + I
_
m
Rcrit
(4.212)
Ro - Rcrit
which coincides with (4.169). From (4.170) and (4.171) it follows that the restricted burst error probability can be upper-bounded by (4.213)
PT < T2-f(i-io)+o(1)(j+m+1)c
By choosing T = 0(m2), say, we can summarize our efforts in the following
Theorem 4.19 There exists a binary rate R = b/c, periodically time-varying convolutional code encoded by a polynomial, periodically time-varying generator matrix of memory m and period T, where T = O(m2), such that the burst error probability due to error bursts of lengths at most T from a maximum-likelihood decoder when used to communicate over the BSC is upper-bounded by pT < 2 (Ec(R)+o(1))mc -
(4.214)
B
where {Gexp(S)
p(s)/s,
E(R)
G(s)
IR = G(s)/s,
1s< oo, 0RRo (4.215)
0<s<1, Ro
is the convolutional coding exponent for the expurgation and sphere-packing regions, respectively, and where Ro is the computational cutoff rate (4.116), Gexp(s) is the expurgation function (4.185), G(s) is the Gallager function (4.208), and C is the channel capacity for the BSC.
From (4.183) it follows that R - 0, when s -+ oo. Hence, from (4.185) we obtain 1
EC (0) = - log z
(4.216)
2
where z is the Bhattacharyya parameter (4.15). In Fig. 4.19 we show the convolutional coding exponent EC (R) for the BSC with c = 0.045, which corresponds to Ro = 1/2. Rc(R)
- 21og(2
E(1 - E))
R Ro
Figure 4.19 The convolutional coding exponent Ec (R) for the BSC with c = 0.045, that is, Ro = 1/2.
Section 4.5 U Error Bounds for Periodically Time-Varying Convolutional Codes
201
Forney [For74a] showed that the convolutional coding exponent can easily be constructed from the block coding exponent and vice versa. Let r denote the rate of the block code 8 that gives the largest contribution to P, that is, r = r Then, for the expurgation region we combine (4.174), (4.181), (4.183), (4.185), and (4.186) and obtain Gexp(S) (i
sR
(1
—
—
—
h(p))s (4.217)
= where 0 r < Rexp and 0 R < R0. Similarly, for the sphere-packing region we combine (4.161), (4.196), (4.197), (4.208), and (4.209) and obtain
G(s)
(i —
= G(s) (i — —(1
=
—loge
R
_h(P))s)
+s) log (€th +(1
=
_€)th) +h(p)s
+ (1 + s)logp + h(p)s
=—loge+
where
(1
log —i-1-e
log
(4.218)
flog \log
logp+h(p)(
Hence, it follows from (4.217) and (4.218) that the convolutional and block coding exponents are related as
=
R
(4.219)
R—r
The corresponding geometrical construction is shown in Fig. 4.20. Notice that the random coding region for the block coding exponent collapses into the point (R0, R0) on the (R) curve.
In Section 4.2 we showed that the free distance is the principal determiner for the error probability when we use maximum-likelihood decoding at large signal-to-noise ratios, and in Section 3.5 we showed that the free distance for convolutional codes encoded by systematic, polynomial, encoding mathces is off by a factor (1 — R) compared to the free distance for convolutional codes encoded by nonsystematic generator matrices. In the proof of Heller's bound (Theorem 3.23), the "effective" length when we used systematic, polynomial, encoding matrices was only (m(l — R) + i)c instead of (m + i)c for nonsystematic generator matrices. We have the same reduction in the derivation of the corresponding upper bound on the error probability. Hence, we have
Theorem 4.20 There exists a binary rate R = b/c, periodically time-varying convolutional code encoded by a systematic, polynomial, periodically time-varying encoding matrix of memory m and period T, where T = 0(m2), such that the burst error probability due to error bursts of lengths at most T from a maximum-likelihood decoder when used to communicate over the BSC is upper-bounded by
0<
R
(4.220)
202
Chapter 4
Viterbi Decoding
EB(r), Ec(R)
ES(r)
r, R
Figure 4.20 Geometrical construction of Ec (R) from E13 (r) and vice versa for the BSC with crossover probability e = 0.045.
where Es S(R)
= Ec(R)(1 - R)
(4.221)
is the convolutional coding exponent for convolutional codes encoded by systematic, polynomial encoding matrices. In Fig. 4.21 we compare the convolutional coding exponents ES (R) and EC (R) for the BSC with e = 0.045, which corresponds to Ro = 1/2. Ecrs(R), Ec(R)
Figure 4.21 The convolutional coding exponents Es s(R) and EC(R) for the BSC with e = 0.045.
We will conclude this section by upper-bounding the bit error probability via the distribution of the error burst lengths for the ensemble E(b, c, m, T). Thus, we have to restrict the bit error probability to those events caused by bursts of lengths that are at most T. We denote this restricted bit error probability by Pb .
Section 4.6
Lower Error Bounds for Convolutional Codes
203
Since an error burst of length j + 1 can cause at most (j + 1)b bit errors, we have T-1
Pb < E(j + 1)b P(1(j))
(4.222)
j=0
Combining (4.222) and (4.159), we obtain T-1
PT b < (j + 1)b2-(Eg(r(j))+o(1))(j+m+1)c
(4.223)
j=0
Now we rewrite (j + 1)b as (j + 1)b = 21og((j+1)b) _ 2o(1)(j+m+1)
(4.224)
which is inserted into (4.223). Thus, the restricted bit error probability is upper-bounded by T-1
PT (J + 1)b2 b -<j=o
(4.225)
< T2-(EB(r(Jmmn))+0(1))(jmi.+m+l)c
which coincides with (4.170). Hence, for the restricted bit error probability we obtain as a counterpart to Theorem 4.19: Theorem 4.21 There exists a binary rate R = b/c, periodically time-varying convolutional code encoded by a polynomial, periodically time-varying generator matrix of memory m and period T, where T = 0(m 2), such that the bit error probability due to error bursts of lengths at most T from a maximum-likelihood decoder when used to communicate over the BSC is upper-bounded by PbT < 2-(Ec(R)+0(1))mc
0
(4.226)
where the convolutional coding exponent EC (R) is given by (4.215).
As a counterpart to Theorem 4.20 we have
Theorem 4.22 There exists a binary rate R = b/c, periodically time-varying convolutional code encoded by a systematic, polynomial, periodically time-varying encoding matrix of memory m and period T, where T = 0 (m2), such that the bit error probability due to error bursts of lengths at most T from a maximum-likelihood decoder when used to communicate over the BSC is upper-bounded by
Pb
< 2-(Eca(R)+o(1))mc
0
(4.227)
where the convolutional coding exponent E s(R) is given by (4.221). Asymptotically, for increasing memory, the upper bound on the restricted bit error probability decreases exponentially with the same exponent as the upper bound on the restricted burst error probability.
4.6 LOWER ERROR BOUNDS FOR CONVOLUTIONAL CODES As a counterpart to our upper bounds on the burst and bit error probabilities, we will derive lower bounds; that is, we will be concerned with finding the minimum burst and bit error probabilities any rate R convolutional code of memory m must exceed. First, we need the corresponding bound for block codes. For simplicity we study only the BSC.
204
Chapter 4
Viterbi Decoding
Consider an arbitrary rate r binary block code of block length N and assume that it is used to communicate over the BSC with crossover probability E. The number of codewords is
M=2 rN
(4.228)
Our goal is to obtain a lower bound on error probability P (£), where
P(£) = 1 - P(E)
def
1 - min{P(£ I i)} = max{P(£ I i)} i
i
(4.229)
where P (E i) and P (S i) denote the probability of correct and erroneous decoding, respectively, when the codeword v(i), i = 0, 1, ... , M - 1, is transmitted. Let D; denote the decoding region for the ith codeword, that is, I
I
D,
daf
{r I r is decoded as vii)}
(4.230)
i = 0, 1, ... , M - 1. Then it follows that
P(£Ii)=EP(rIv(`))
(4.231)
rED;
Clearly, min{I Di 11 <
M = 2( 1-r)N
(4.232)
For the BSC we have
P(r I
V(i))
= EdH(rl00))(1
- E)N-dH(rtvI'I)
(4.233)
where dH(r, v(i)) is the Hamming distance between the received word r and the ith codeword V(i).
The conditional probability of receiving r when the i th codeword is transmitted, P (r vii)), is monotonically increasing with decreasing dH(r 10)). Hence, in order to achieve its maximal value, the sum (4.231) should be taken over those received words r that are equal to vii), at Hamming distance 1 from vii), at Hamming distance 2 from vii), and so on until we have reached I Di I terms. Let ko denote the largest integer such that I
kp-1 /Nl (
k=O
k
) < min{ I Di I } i
and let k°-1
o
(4.235)
k=O
Then for a given value of I Di I, the maximum possible value of P (£ I i) will be
P(£ I i) _
k0-1 N
(k)eku - E)N k
(N)(l
AiEkO(1
- E)N-=
-
(4.236)
(4.237)
Section 4.6
Lower Error Bounds for Convolutional Codes
205
the smallest possible value of the error probability P (E 1 i) will be
P(EIi)=1-P(EIi) _ k=ko
()i -
E)N-k
N ),ko+l(1
- AiEko(1 - ,)N-ko
- E)N ko
ko +
(4.238)
1
where the inequality follows from (4.235). From (4.232) and (4.235) we obtain ko-1
2( 1_r)N
min[ I Di 11
(N)
>
(4.239)
k=0
By combining (4.229), (4.238), and (4.239), we obtain the following parametric lower bound for rate r block codes: 2(1-r)N
l
N > Go-1)
P(E) > (ko
(4.240) -E)N-ko-l
1)Eko+1(1
Remark. The bound (4.240) is known as the sphere-packing bound for the following reason. The set of sequences at distance ko or less from a codeword can be interpreted as a sphere of radius ko around that codeword. If we could choose codewords such that the set of spheres of radius ko around the different codewords exhausted the space of binary N-tuples and intersected each other only on the outer shells of radius ko, then the error probability would be lower-bounded by (4.240).
To obtain a more illuminative form of our lower bound, we need the following
Lemma 4.23 The binomial coefficient (k) satisfies the inequality
\kJ
8k(N-k)2
`
(4.241)
N
> where hO is the binary entropy function (1.89). Proof.
The lemma follows from a refinement of Stirling's formula [Fe168]: 27tH n n e
ne(12n+1)-'
< n! <
2nn nne
"e(12n)-'
(4.242)
Hence, we have
(N) = k
N!
k! (N - k)! NN N e(12N+1)-'-(12k)-'-(12(N-k))-' 27rk(N - k) kk(N - k)N-k
(4.243)
We note that
-(12k)-' - (12(N - k))-1 > -1/9
(4.244)
except for k = 1, N - k = 1; k = 1, N - k = 2; and k = 2, N - k = 1. Thus, with these exceptions, e(12N+1)-'-(12k)-'-(12(N-k))-' > e-119 >
2n 8
(4.245)
Chapter 4
206
Viterbi Decoding
and
>
(N
k)
N
2Tr2
2nk(N - k)
Alog(1 N))N
NIOgN-(1
8
(4.246)
N
2h(# )N
8k (N - k) For the exceptions the inequality can easily be verified numerically; in fact, for k = N - k = 1 we have equality.
We have ko < N/2, and, hence, we can rewrite (4.241) as
N
>
2h(p)N
1
(4.247)
8 (1 - p)N
ko
where p = ko/N < 1/2. Then from (4.240) we obtain 1
1 - r > h(p-1)
- 2N
log(8p-1(I - p-1)N)
(4.248)
= h(p + o(1)) + o(1)
where
p-1 = (ko - 1)/N = p - 1/N
(4.249)
and
- N log P (S) < p+1 log
pE 11
+ (1 - p+1) log
pE 1
11
1
+ 2N log(8p+1(I - p+1)N)
(4.250)
(p + o(1)) log P + o(1) + (1 - p + o(1)) log
+E 1
(1) + o(1)
1
where
p+1=(ko+1)/N=p+1/N Both h(p) and p log
(4.251)
+ (1 - p) log -P are continuously differentiable functions of p. It
follows from (4.248) that
p > h-1(1 - r) + o(1) When N that
(4.252)
oo we can always choose p to be arbitrarily close to h 1(1 - r), and it follows
- N P (E) < p log P + (1 - p) log
1
- p +0(l)
(4.253)
where
p = 1 - h-1(r)
(4.254)
Thus, we have proved
Theorem 4.24 For any rate r block code B with block length N that is used to communicate over the BSC with crossover probability c, the error probability is lower-bounded by
P(E) >
(')+o(1))N
(4.255)
Section 4.6
207
Lower Error Bounds for Convolutional Codes
where E h(r) is the block sphere-packing exponent P E,,sph(r) = p log E + (1 - p) log I1 -- pe
(4.256)
and p is the Gilbert-Varshamov parameter (4.254).
We are now well prepared to derive the corresponding lower bound on the burst error probability for convolutional codes.
Lemma 4.25 For any rate R = b/c convolutional code encoded by a generator matrix of memory m and overall constraint length v = bm that is used to communicate over the BSC with crossover probability e, the burst error probability is lower-bounded by PB
> 2-(EC°(R)+o(1))mc
(4.257)
where Ec h(R) is the convolutional sphere-packing exponent
E(R) h = G(so) C
(4.258)
R = G(so)/so
(4.259)
so satisfies
and0 0 there exists a block length NE such that for any N > NE we have
P(E) > 2-Er(r)+E)N
(4.260)
Analogously, Lemma 4.25 states that for any e > 0 there exists a memory mE such that for any in > mE we have PB >
°
(EC(R)+E)mc
(4.261)
Now suppose that inequality (4.261) does not hold. Then as a consequence there exists a certain a such that for any large enough mE there exists a memory m > mE such that
PB <
2-(EC°(R)+2e)mc
(4.262)
Then we terminate this convolutional code (with very good burst error probability according to (4.262)) into a block code B of rate r such that Ec h(R)
R
R-r
EL3h(r)
(4.263)
and that the block length is me N = EE Sph (R) (r)
(4.264)
L3
It is easily shown that such a rate r exists for all R, 0 < R < C. Next, we will show that this block code has such good error probability that it violates (4.260) and thus cannot exist. Let us choose mE such that
Ec h(R) - Eah(r)
E (r)
m E <2 EM'C
(4 . 265)
h
Then for any m > mE we have the number of information subblocks def j+1 =
Ec h(R) sph E,, (r)
-1 m <2Emc
(4.266)
Viterbi Decoding
Chapter 4
208
The error probability P (E) for our block code B is equal to the probability that we will have an error burst starting in at least one of the j + 1 positions. Thus, there exists an m > me such that (ECE(R)+2e)mc
P(E) < (j + 1)Pg < (j + 1)2
(4.267)
where the last inequality follows from (4.262). Combining (4.266) and (4.267) yields (E'°(R)+e)mc
P(E) < 2
(4.268)
Inserting (4.264) into (4.268) gives
P(E) <
h(R))N
-
- 2-(Eg (r)+E')N
4.269)
h
where El
=
EEsph(r)
4 . 270)
s ph
Ec (R)
Thus, assuming that inequality (4.261) does not hold, we have constructed a block code whose error probability satisfies (4.269) which contradicts Theorem 4.24. Hence, we conclude that inequality (4.261) must hold and the proof is complete. Next we will derive a simple lower bound that is much better than the bound in Lemma 4.25 for low rates. From Heller's asymptotic bound (Corollary 3.20) it follows that for any codeword there always exists a codeword at distance
d=
(+o(1))(m+1)c (4.271)
1
_ (2 +0(1)) me or less. Assume that such a codeword is transmitted over the BSC with crossover probability E. Then, assuming that d is even, the burst error probability is lower-bounded by the probability that an error is caused by a codeword of weight d; that is, d
PB > 2 (d
l
2)Ed/2(l
(dd2)Ed/2(1
(d)6i
d
2 8(d12)(d - d/2) I
2 2d
(I -
E)d-i
)d/2
d
1
=
d
i=d/2+1
2
>
- E)d-d/2 +
(4.272) E)d12
(2 E(I - E))d = 2(logz+o(l))d
202 logz+o(1))mc
where the last inequality follows from Lemma 4.23 and z is the Bhattacharyya parameter (4.15). It can be shown (Problem 4.22) that (4.272) also holds for d odd. Thus, we have
Lemma 4.26 For any rate R = b/c convolutional code C encoded by a generator matrix of memory m and overall constraint length v = bm that is used to communicate over the BSC
Lower Error Bounds for Convolutional Codes
Section 4.6
209
with crossover probability c, the burst error probability is lower-bounded by
P B > 2(i logz+o(1))mc
(4.273)
where z is the Bhattacharyya parameter (4.15).
In the sphere-packing region, Ro < R < C, in Section 4.5 we restricted the values of s to 0 < s < 1. Here we extend the permissible values of s to include s > 1. Then it follows from (4.196) and (4.207) that p - 1 /2 and R - 0, when s -+ oo. Then, from (4.218) we obtain
ECh (0) _ -log z
C and, hence, at R = 0 the exponent E' h(0) is twice the exponent in Lemma 4.26. We summarize our results in the following
(4.274)
Theorem 4.27 For any rate R = b/c convolutional code C encoded by a generator matrix of memory m and overall constraint length v = bm that is used to communicate over the BSC with crossover probability e, the burst error probability is lower-bounded by PB > 2-(E' (R)+o(1))mc
(4.275)
where E "'(R) is the convolutional lower bound exponent
(R) =min
{Er(R),
-2log (2 e(1 - ))
(4.276)
and0
Ec(R). Hence, in this region the convolutional coding exponent is asymptotically optimal. In Fig. 4.22 we show the convolutional coding and lower-bound exponents for our upper and lower bounds. EE°"'(R), Ec(R)
Figure 4.22 The convolutional coding exponent EC (R) and the lower-bound exponent E" for the BSC with crossover probability e = 0.045.
210
Chapter 4
Viterbi Decoding
In order to lower-bound the bit error probability we return to (4.33); that is,
Pb J-oobj=O(N1+Lj) = limj-0 Ij
with probability 1
(4.277)
where the jth burst contains Ij errors, is of length Lj, and is separated from the previous burst by Nj b-tuples. Since according to our definition an error burst cannot have more than m consecutive error-free b-tuples, it follows that (4.278)
h>
m+1 By combining (4.277) and (4.278), we can lower-bound the bit error probability as follows: J-1
i=° L
Pb > lim
J-oo b(m + 1) Y j=o (Nj + Lj)
limj-,, i -j=o L1 b(m + 1) (limj,,,, -j' >J=o Nj + limj,,,,
1
-1 I:J=0 Lj
(4.279)
E[L]
b(m + 1)(E[N] + E[L]) 1
>
- b(m + 1)(E[N] + 1)
with probability 1
where the last inequality follows from the fact that the error burst length is lower-bounded by 1. Next we return to the burst error probability, which can be expressed as the limit of the ratio between the number of error bursts and the number of nodes in which an error burst could have started, that is, (cf. (4.38)),
J = J-- J+Yi=o(Nj -m) 1 +limj-,
PB = lim
1
j=o Nj -m (4.280)
with probability 1
E[N]+1-m Equation (4.280) can be rewritten as
E[N]+1
PBm+l - m+1
(4.281)
where we used that PB < 1 to obtain the inequality. Combining (4.279) and (4.281) yields Pb >
PB
=
PB2-1og(b(m+1)2) = PB2-o(1)mc
b(m + 1)2 Finally, we lower-bound PB by (4.275) and obtain
(4.282)
Theorem 4.28 For any rate R = b/c convolutional code C encoded by a generator matrix of memory m and overall constraint length v = bm that is used to communicate over the BSC with crossover probability e, the bit error probability is lower-bounded by Pb > 2 (Ec"(R)+o(1))mc
(4.283)
where EC'-(R) is the convolutional lower bound exponent given by (4.276) and 0 < R < C.
Section 4.7
Error Bounds for Time-Varying Convolutional Codes
211
Asymptotically, for increasing memory, the lower bound on the bit error probability decreases exponentially with the same exponent as the lower bound on the burst error probability.
4.7 ERROR BOUNDS FOR TIME-VARYING CONVOLUTIONAL CODES
In Sections 4.4 and 4.5, we studied the ensemble of periodically time-varying convolutional codes when used to communicate over the BSC. In this section we will consider a more general channel, viz., the binary input, q-ary output discrete memoryless channel (DMC). We need a more general ensemble, viz., the ensemble of (nonperiodically) time-varying convolutional codes. For this ensemble we define PB aer
1
T-1
P
Imo T
T- 1 Lt
U
j= 0
j=0
1 T-1 T-1
(j)
>P < T imo T t=0 j=0
(,C, (j ))
(4.284)
where P (L, (j)) is the probability that a burst starting at t has length j + 1. Consider the situation when we use a convolutional code together with maximumlikelihood (ML) decoding to communicate over a binary input, q-ary output DMC with input v and output r and transition probabilities P (r I v). First we will obtain an upper bound on the decoding error probability P2 (£) when the code consists of only two codewords. This simple case is both nontrivial and interesting. Let P2 (£ I i) denote the error probability when the codeword v('), i = 0, 1, is transmitted, and let Di denote the decoding region for the ith codeword, that is,
D, def = {r I r is decoded as v(`)}
(4.285)
Then it follows that
P2(£ 10 _
(4.286)
P(r I v(`)) r¢D;
Although (4.286) appears to be very simple it will be most useful to have an upper bound on the error probability that is independent of i. Hence, we multiply each term of the sum of (4.286) by (P(r I v(1))/P(r I v(t)))A
where 0 < >,, < 1 and i denotes the binary complement of i. Since we assume maximumlikelihood decoding, it follows that our estimate v = v('); that is, r E Di, if P(r I V(')) > P(r I 0)) (ties are resolved arbitrarily) or, that v $ v('); that is, r ¢ Di, if P(r 10)) < P(r 10')). In other words, the ratio (P(r I v('))/P(r I v(')))'' is at most one when r E Di and at least one when r V Di. Hence, we obtain the following upper bound P2(£ I
i) <
P(r I v('))(P(r I v(`))/P(r I v
_ >(P(r I
v(c)))A(P(r I
v(')))'->r
(4.287)
r¢D;
< 1:(P(r I v(')))x(P(r I v(i))1r
Let A = 1/2; then the last sum in the bound (4.287) is independent of the transmitted codeword. Hence, we have
P(rI v(0))P(rI v(1))
P2(£)=P2(£I0)=P2(£I1)< r
(4.288)
Chapter 4
212
Viterbi Decoding
Since the channel is memoryless we can simplify (4.288):
P2(E) < E E ... jj P(rj I v(°')P(rj vil)) I
_
ri
r2
j
r
(4.289)
v `(r I v(°))P(r I
where
VP (r I vi°))P(r I vii)) = 1, for vi°l = viii
(4.290)
r
Let d denote the Hamming distance between the two codewords; that is,
d = dx(v(0), v(')
(4.291)
We have now proved
Theorem 4.29 (Bhattacharyya bound) The decoding error probability when using two codewords for communication over a binary input DMC with transition probabilities P (r I v) and maximum-likelihood decoding is upper-bounded by
Pd<(v'P(r r
I 0)P(r Ii)
(4.292)
where d is the Hamming distance between the two codewords. From Theorem 4.29 follows immediately the Bhattacharyya bound for the BSC (4.15). EXAMPLE 4.7 Consider the binary input, 8-ary output DMC given in Example 4.2. For this channel P (r 10) P (r
1
1) =
+ 0.002.0.434
0.434.0.002 +
= 2 (,/0.434 - 0.002 + 0.197 .0.008 +
0.167 .0.023 + 10.111. 0.058)
(4.293)
1.7. 10-4
(4.294)
= 0.42 and
plo < 0.4210
Consider a rate R = b/c, memory m time-invariant convolutional code with weight spectrum nd, cf. (3.36), used for communication over a binary input DMC and maximumlikelihood decoding. From (4.20) and (4.292) it follows that the burst error probability, PB, is upper-bounded by PB
00
(4.295)
nd Pd
d=dfr
where
Pd<( r
d
P(rI0)P(YI1)/
defzdd
1))
where z is the Bhattacharyya parameter for the binary input DMC.
(4.296)
Section 4.7
Error Bounds for Time-Varying Convolutional Codes
213
Next consider the ensemble E(b, c, m, oo) of binary, rate R = b/c time-varying convolutional codes encoded by polynomial, time-varying, generator matrices of memory m which we introduced in Section 3.6. The distance spectrum does in general depend on the time t, but its average E[nd] is independent oft. The average of the burst error probability taken over this ensemble is upper-bounded by 00
E[PB] <
(4.297)
E[nd]Zd d=0
where E[nd] can be calculated by summing the probabilities for all incorrect paths of weight d. Since the number of incorrect paths of length (j + m + 1) branches is upper-bounded by 2(j+1)b and since the probability that each of these incorrect paths has weight d is equal to ((j+m+l)c)2-(j+m+1)c we obtain d
E[nd] < 00E 2(j+1)b((j + m+ 1)c)2-(j+m+l)c d = 0, 1, 2....
(4.298)
j=0
Hence, by inserting (4.298) into (4.297) we have
E[PB] <
E L.
+m +
=00 j=0
d
00
00 (j+m+l)c
j=0
ZdLd
2(j+1)b((J +m +
zd
d
d=0
(1f.Z)(m)C 2(J+1)b
(4.299)
2
j=0
(_i)
Zmc 0)
2R
2
2
j_0
where the last sum converges if 2R 1
2Z
<1
(4.300)
Before we proceed with upper-bounding the burst error probability, we generalize our definition of the computational cutoff rate for the BSC (4.116) to a general discrete memoryless channel:
Definition The computational cutoff rate R0 for a general DMC with transition probabilities P (r I v) and input distribution Q (v) is given by R0
aef
_ log (nun
>
(IP(r I v ) ! 2 (v))
(4.301)
It is most interesting and, perhaps, somewhat surprising that for binary input channels the probability distribution
Q(0) = Q(1) =
2
(4.302)
214
Chapter 4
Viterbi Decoding
is the minimizing one (Problem 4.24). Hence, expanding the square in (4.301) gives
(P(r 0)+2 P(r I 0)P(r
Ro = - log \4 log
I
P(r
1+ r
C2
11))
1)
P(r 1)))
(4.303)
I
I
P(r 10)P(r 11))
= 1 - log (1 + r
and we have
Lemma 4.30 The cutoff rate Ro for a binary input DMC with transition probabilities
P(r I v) is Ro = 1 - log(1 + z)
(4.304)
where z is the Bhattacharyya parameter defined by (4.296). EXAMPLE 4.8 For the BSC with crossover probability c, it follows from (4.304) that
Ro = 1 - log (i + 2\/E(1 - E))
(4.305)
which coincides with the definition for the BSC (4.116). As a specific instance, we find
Ro =
1
2
,
when E = 0.045
(4.306)
EXAMPLE 4.9 Consider the binary erasure channel (BEC) with erasure probability 8 shown in Fig. 4.23. Its cutoff rate is
Ro=1-log(1+ (1-8)0+ 88+ /0(1 - 8) (4.307)
= 1 - log(1 + 8) As a specific instance, we find 1
Ro = 2, when 8 =
(4.308)
- 1 = 0.414
So 4.5% "channel errors" are as bad as 41.4% erasures!
1-6
Figure 4.23 Binary erasure channel.
Both the BSC and the BEC have
Ro = 1, when c = 0 and 8 = 0, respectively
(4.309)
Ro = 0, when c = i and 8 = 1, respectively
(4.310)
and
Error Bounds for Time-Varying Convolutional Codes
Section 4.7
215
Now we return to our derivation and rewrite (4.304) as
2Ro1+z=1
(4.311)
2
By comparing the inequality (4.300) with (4.311), we conclude that the last sum in (4.299) converges if R < Ro. Hence, we have
E[Pal
I
<
+z\mc
(2R 11+z)c
2(R-Ro)c
2
2
1 - (2R 1±z ), 2
= C(R)2-Romc =
-2
2-(Ro+o(1))mc
Rome
1 -2 (RRo)c
(4.312)
for R < Ro
where 1
(4.313)
c(R) = 2(Ro-R)c - 1
Since the average of the burst error probability taken over our ensemble is upper-bounded by (4.312), we have
Theorem 4.31 There exists a binary, rate R = b/c, time-varying, convolutional code encoded by a polynomial, time-varying generator matrix of memory m such that its average burst error probability when used to communicate over a binary input DMC with maximumlikelihood decoding is upper-bounded by
for R < Ro
PB < 2-(Ro+o(1))mc
(4.314)
where Ro is the computational cutoff rate.
Next we will obtain a corresponding upper bound on burst error probability for the sphere-packing region, that is, for rates Ro < R < C. Then we need the Gallager function for the binary input DMC with transition probabilities P (r v): G(s)
-log
(P(r
1+s
2P(r
1
I
1)i+-,
(4.315)
where 0 < s < oo. For the BSC, definition (4.315) coincides with definition (4.208). We also notice that for s = 1 we have the important equality
G(1) = Ro
(4.316)
For simplicity we will in the sequel only consider binary input and output-symmetric DMC; that is, we impose the restriction that
P(r 10) = P(-r 1
1),
all r
(4.317)
All channels we consider in this book are output-symmetric.
For the sphere-packing region we will again exploit the idea of separating the error event £ into two disjoint events corresponding to "few" .F and "many" M errors, respectively. Hence, we have (cf. (4.76))
E[PBI = E[P(£)l < E[P(£, F)l + P(M)
(4.318)
Since we are considering output-symmetric channels, we can without loss of generality assume that the allzero sequence is transmitted. Let r = rorl .... where r; = r,1)r((2) ... ri ` , denote the received sequence. Then we introduce the cumulative metric r
Zi
Sr = i-o
(4.319)
Chapter 4
216
Viterbi Decoding
where (4.320)
z; _ Y, A if f=t
P (ri(t) 1 0)+s
(g) µit def = µ(ri) = log
,
p (tre) 0) +s + 2 p
1) 7+s
-R
(4.321)
1
and the value of the parameters will be chosen later. Remark.
For the BSC, s(0) = sa and µ(1) = s,8, where a and $ are defined
by (4.81) and (4.82), respectively, the parameter a in (4.81) and (4.82) is given by a = s / (E I I+s + (1 - E)1+s ), and s satisfies G(s) = sR, where G(s) is given by (4.208). +E
As in Sections 3.4 and 4.3, we have a random walk 0, So, St, S2, ... , and we say that those error patterns for which St hits or crosses (from above) a certain barrier u contain many errors. From Wald's identity (Corollary B.6) follows
P(M) = P(min{Sr} < u) < 2-X0"
(4.322)
t
where Ao < 0 is a root of the equation g(A)
def E[2'`u(r)}
_
2"µ(r)P(r 10) = 1
(4.323)
r
Combining (4.321) and (4.323) yields
2-a0R (P(r
0)
+t (1P(r 0) + + p(r
1) +s
\
J
=
Jl
/
1
(4.324)
Exploiting the output-symmetry of the channel, we can rewrite (4.324) as 2-k0R
(1:
(2P(r 10)1+t + 2P(r 1
+1)
1)
(4.325) X
(P(r 0)1+, + p(r
1)
1
/
Now we choose s = so to be a positive root of
J
G(s) = sR
(4.326)
where G(s) is the Gallager function and 0 < so < 1 if R > Ro. We can easily verify that Ao = -so satisfies (4.325). Hence, for s < so we have P (M) < 2so" < 2su (4.327) where so is a positive root of (4.326). To upper-bound the probability that a burst starts at the root and that we have an error pattern with few errors, P (E, F), we use, as before, the union bound and obtain
P(E, F) < E P(E(k), .P')
(4.328)
k
where e(k) is the event that a burst error starting at the root is caused by path k. Let us assume that the allzero codeword is transmitted and that path k has weight wk and
remerge with the allzero path at depth j + m + 1, j > 0. Then,
p(E(k), y) =
P(r[o,l+m] 10[o,i+m]) r[o.l+mI E Dk..F
(4.329)
Section 4.7
217
Error Bounds for Time-Varying Convolutional Codes
where Dk is the decoding region for the kth path and F is the region corresponding to few errors. In the region Dk we have X1
P(r[O,j+m]
V[O,j+m])
P(r[O,j+m]
I O[O,j+m])
>
(4.330)
where v[O,j+m] is the first j + m + 1 c-tuples of the kth path and X, > 0. Similarly, in the region F we have 2)L2(Sj+m-u) > 1
(4.331)
where X2 > 0. Thus, combining (4.329), (4.330), and (4.33 1) yields (k)
P(r[o,j+m]
y) <
p(£(k)
Al
y[0,j+m])
r[O.j+m]EDk,.F
'
x 2'12 ('Sj+m -u ) P (r[0 j+m] I <
P(r[o,j+m]
X-1 allr[o,j+m]
O[O, j+m])
(4.332)
Al
V10)j+m])
I
(P(r[O,j+m] 10[0,j+m])
x2 k2(Sj+.-U)Pr
0
Next we insert (4.319), (4.320), and (4.321) into (4.332) and exploit the memorylessness of the DMC. Then we obtain j+m c I
Pi (r(e)l v
P((k) j7) < 2-A2u fl fl 2 >2R i=0 a=1
x\ (2 P( ri(e) =
2-k2u
(P(r(t)
r`(e)
2 P (ri(e) 11)1'+s
10)111+s
+
(2_A2R
P(r1(e) 10)i
0)>
+1
1
1
l/
- A2l /
11T P(r 10)-x1+1 P(r 11)"1 r
x x
(ZP(r 0)1+s + ZP(r 11)1+s)
-12
wk
f )
2-12R
E P(r 10)
x1+7
+1
r
x where v;
k)
=
(2p(r
v`k)(1)v`k)(z)
V
0)11+5 + 2p(r I l)+s I
)r2
) (j+m+1)c-wk
... vlk)(c) is the ith c-tuple of the codeword corresponding to the kth
path. Let 1
X1 =
l+s
(4. 334)
and
X2 = 1 - s
(4.335)
p(g(k) F) < 2-(1-s)uZ k2c(j+m+l)-wk
(4.336)
Then we have
Chapter 4
218
Viterbi Decoding
where
Z1 = 2-(' -s)R Y (P (r 0)P(r
1))1+s
r
G p(r 10)'+s + p(r I
X
(4.337)
1+s
1)=+s
and
Z2 =
P (r 0) 1+s ( P (r 1 0)+s +
2-(1-s)R
- P (r 11)1+5
)_1+S
r
2
=2-(1-s)R
C-P(r 10).+s + 2P(r 1)1+s)
(4.338)
1+s
0)1+s + 2p(r 1)1+s/
X
where the last equality follows from the output-symmetry of the channel. By combining (4.328) and (4.336), we obtain 00 (j+m+1)c s)u
E[P(E, )r)] < 2 (1
E[n(w, j + m + 1)] j=0
(4.339)
w=1
X ZwZ(j+m+l)c-w 1
2
where n (w, j + m + 1) denotes the number of paths of weight w and length j + m + I c-tuples. For the ensemble E(b, c, m, oo) we have
((j +m + 1)c)
)
w
(l)U+m+1)c 2
(4.340)
C(j + m + 1)c) 2-(j+m+1)e
< 2b(j+l)
w
Further upper-bounding of (4.339) by extending the sum over w to include w = 0 and inserting (4.340) into (4.339) yield co
E[P(E '')] < 2
2b(j+l)-(j+m+1)c
(1-s)u
j=0 X
Y, w=0
((j +M + 1)c) W
Zu'Z(j+m+l)c-w 1
2
00
=
(j+m+l)c
2-(1-s)u
Y. j=0
= 2-(1-s)u
2b(j+l)-(j+m+l)c(Z1 + Z2)(j+m+])c
ZI + Z2
'me o0
Y" (2R-] (ZI + Z2))(j+l)c
2 2-(1-s)u
(4.341)
j=0
(Zi + Z2 \ me 2
J1
(2R-l (Z1 + Z2))c
1 - (2R-1(Z1 + Z2))c
where the last equality requires that
R < log
2
ZI + Z2
(4.342)
219
Error Bounds for Time-Varying Convolutional Codes
Section 4.7
From (4.337) and (4.338) it follows that
Zl + Z2
=2-(1-s)R
2
x E(2 P (r
0) 1+s + P(r 0) '+s P(r 1) +s + -
P(r
1)'+s
2
r 1+s
x 2-(1-s)R
Y' (1 P(r
10)-+s +
2
x
G P (r
0)'+s +
2
1)+s
I2 P(r
(4.343)
l+s
-P(r
I
1)'+s
(2 P(r 1 0)11+s + 2 P(r
= 2-(1-s)R
)z
1+s
1)1+s
r
= 2-(1-s)R-G(s)
Hence, inequality (4.342) is equivalent to
R < G(s)/s
(4.344)
S < So
(4.345)
or, again equivalently,
where so is defined by (4.326). Combining the bounds (4.318), (4.327), and (4.341) with equality (4.343) yields
E[PBI < E[P(E, .E)] + P(M) < 2-(1-s)u2-((1-s)R+G(spmc
(2R-1(Z1 + Z2))c + 2su 1 - (2R 1(Z1 + Z2))c
(4.346)
where s < so. Let us choose
u = -((1 - s)R + G(s))mc
(4.347)
Then we obtain
E[PB] < (1 +
(2R-1(Z1 + Z2))`'
l 2-s((1-s)R+G(s))mc
1 - (2R-1(Z1 + Z2))` /
(4.348)
2-s((1-s)R+G(s))mc
1
1 - (2R-1(Z1 + Z2))c Now, if we choose s to be slightly less than so,
s=so-1/m
(4.349)
say, and use (4.326), then we obtain 2((1-s)R+G(s))c+(sG(so)-soG(s))mc
E[PB] <
=
1-
(2R-1(Zl + Z2))`
2-(G(so)+o(1))mc
where 0 < so < 1, or, equivalently, Ro < R < C
2-soRmc
(4.350)
220
Chapter 4
Viterbi Decoding
Finally, since the average of the burst error probability can be upper-bounded by (4.350), we have proved the following
Theorem 4.32 There exists a binary, rate R = b/c, time-varying convolutional code encoded by a polynomial, time-varying generator matrix of memory m such that its average burst error probability when used to communicate over a binary input and output-symmetrical DMC with maximum-likelihood decoding is upper-bounded by PB < 2-(Ec(R)+o(1))mc
(4.351)
where
0
Ro,
Ec(R) =
{'G(s)/s, 0 < s
1, Ro
R<
(4.352)
for the random coding and sphere-packing regions, respectively, and where G (s) is the Gallager function for the DMC (4.315) and Ro is the computational cutoff rate for the DMC (4.301).
Consider a rate R = b/c, memory m, minimal-basic convolutional encoding matrix whose overall constraint length is v = bm. Let S denote the set of encoder states of the controller canonical form realization of the encoding matrix. Thus, the complexity of the encoder is S I = 2bm
(4.353)
and, hence, the complexity of the encoder satisfying (4.351) is at most 2bm. By combining (4.351), (4.352), and (4.353) we obtain
Theorem 4.33 There exists at least one time-varying, rate R < C, convolutional code of memory m with burst error probability satisfying j1
PB
where o(1)
(I S I)-Ra/R+o(1),
< 11 ((I S I)-so+o(1)
0 < R < Ro Ro < R < C
(4.354)
0 when I S I-+ oc and so is given by (4.326).
The burst error probability decreases algebraically with increasing complexity. In particular, it follows from Theorem 4.33 that the single number Ro not only determines
a range of rates, 0 < R < Ro, over which reliable communication is possible, but also determines the coding complexity necessary to obtain a given error probability.
4.8 ERROR BOUNDS FOR FINITE BACK-SEARCH LIMITS
In the Viterbi decoding described in Section 4.1, we postponed the decision of the decoded codeword until we had reached the end of the trellis, which was terminated using the zerotail method. Here we will consider a suboptimal version of Viterbi decoding which when the decoding reached depth t, t > r, outputs an estimate of the information b-tuple at depth t - r; r is called the back-search limit. Our aim is to derive upper bounds on the burst error probability for the ensemble of rate R = b/c, time-invariant convolutional codes encoded by generator matrices of memory m for the back-search limit r = m + 1. The decoding rule is simple: for each depth t > r choose among the paths leading to the states at this depth the one or, in case of ties, arbitrarily one of those with maximum Viterbi metric and output as the decoder estimate the (t - r)th information b-tuple corresponding to the chosen path. Since we consider the ensemble of randomly chosen time-invariant convolutional codes, without loss of generality we can study the burst error probability at the root.
Section 4.8
Error Bounds for Finite Back-Search Limits
221
To upper-bound the burst error probability we will use the block code B given by B = {v[o,m]
v[o,m] = 0 or arises from u[o,m] with uo
I
0}
(4.355)
The number of codewords is M = (2b - 1)2bm + 1
(4.356)
N = (m + 1)c
(4.357)
and the block length is
Hence, the rate of the block code is slightly less than the rate of the convolutional code R = b/c.
Assume that the allzero sequence is transmitted over the BSC, and let £ denote the error event for the Viterbi decoder with back-search limit r = m + 1. Then, a necessary and sufficient condition for S is that the block code B is erroneously decoded.
Lemma 4.34 (Random coding bound) There exists a binary, rate R = b/c, timeinvariant convolutional code encoded by a generator matrix of memory m such that its burst error probability when used to communicate over the BSC with Viterbi decoding with backsearch limit r = m + 1 is upper-bounded by the inequality
PB < 2
(Ra-R)(m+1)c
0 < R < Rcit
(4.358)
where Ro and Rc,it are the computational cutoff rate and the critical rate, respectively. Proof. The required statistical properties follow from Lemma 3.14. Then from (4.15) it follows that over the ensemble £(b, c, m, 1) the probability that each codeword v[o,m] that arises from an information sequence u(o,m) with uo # 0 causes an error is upper-bounded by (m+1)c
(l±)(m+l)c
((m
Y, i=0
2-Ro(m+1)c
(4.359)
i
where the last equality follows from (4.118) and z is the Bhattacharyya parameter (4.15). We combine (4.359) with the upper bound 2R(m+l)c on the total number of codewords in B and the lemma follows.
Lemma 4.35 (Expurgation bound) In the ensemble £(b, c, m, 1) of binary, rate R = b/c, time-invariant convolutional codes encoded by generator matrices of memory m, there exists a subset containing at least a 2-bth fraction of the codes such that their average burst error probability when used to communicate over the BSC with Viterbi decoding with back-search
limit r = m + 1 is upper-bounded by the inequality Eexp[PBI < (2b - 1)2p(1og(z E(1 F)))(m+1)c 0 < R < Rexp
(4.360)
where p is the Gilbert-Varshamov parameter of the rate R, Rexp is the expurgation rate, and e is the crossover probability for the BSC. Proof. The probability that the minimum distance between the allzero codeword and any other codeword in the block code B given in (4.356) does not exceed do is upper-bounded by d.
(M - 1) E ((m + 1)c)2-(m+1)c
< (2b
-
1)2bm2(h((m
do
)-1)(m+1)c
(4.361)
Let
do = [p(m + 1)cj
(4.362)
where p is the Gilbert-Varshamov parameter. Then (4.361) can be further upper-bounded by (2b
_
1)2bm2(h(n)-1)(m+1)c = (1-2 -b )2 b(m+ 1)2 -R(m+l)c = 1-2 -b
(4.363)
Chapter 4
222
Viterbi Decoding
Hence, at least the 2-bth fraction of the block codes has such distances that are not less than do + 1. These block codes form an expurgated ensemble. The average error probability for this subensemble is upper-bounded by (cf. (4.122)) (m+1)c /(yn
b
Eexp[P(E)]
2 (M - 1) Y (
i=do+1 \
+ 1)C
/
Z
2-(m+1)czi
(4.364)
where z is the Bhattacharyya parameter (4.15). Then analogously to (4.123) we obtain (2b
Eexp[P(E)}
_
(m+1)c
I)2R(m+1)c
((m + 1)C 2x(i
_ (2
b
(m+T, 1)c
-
1)-(m+1)czi
da
Z
i=do+1 1)2(R-1)(m+1)c2-z(do+1)
(4.365)
((m + 1)c '
i=do+1
where we have chosen A to be
A = log
p
(1 - p)z
> 0, 0 < R < ReXp
(4.366)
(cf. (4.124)). Next we upper-bound (4.365) by extending the summation to start at i = 0. Then we obtain (26
E'exp[P(E)]
((in + 1)c )
- 1)2(R-1)('"+1)c2-+l)
i
i=o
_
(26 _ 1)2(R-1)(m+1)c2-X(do+1)(1
(4.367)
+2Az)(m+1)c
Inserting (4.366) into (4.367) and using (4.362) yield
(2b - 1)2(R-1)(m+1)c (1 - p)Z p
I-
1)2(R-1)(m+1)c
= (2b -
1)2(R-1+h(p))(m+1)cZp(m+1)c
(2b
_
(1 - p)-(m+1)c
p)-(1-p) p_P)(m+1)c zp(m+1)c
< (2b -
= (2b - 1)zp(m+l)c =
do+1
J
(4.368)
I)2p(1og(2 E(1-c)))(m+1)c
and the proof is complete.
For the sphere-packing region, Rcrit < R < C, we have the following
Lemma 4.36 (Sphere-packing bound) There exists a binary, rate R = b/c, timeinvariant convolutional code encoded by a generator matrix of memory m such that its burst error probability when used to communicate over the BSC with crossover probability E with Viterbi decoding with back-search limit r = m + 1 is upper-bounded by the inequality
PB < 2 (p l°g+(1 p) log ;
(m+1)c
Rerit < R < C
(4.369)
where p is the Gilbert-Varshamov parameter of the rate R; that is,
p = h-1(1 - R)
(4.370)
Rcrit is the critical rate, and C is the channel capacity for the BSC. Proof. As usual, the proof in the sphere-packing region uses the idea of splitting the error events in "few" (F) and "many" (M) errors introduced in Section 4.3. Let M = {r[o,m+1] I wx(r[o,m+1] >- p(m + 1)c)}
(4.371)
Section 4.9
Tailbiting Trellises
223
and .E' is the complement of M. The average burst error probability is now upper-bounded by, cf. (4.139),
E[PB] = E[P(E)] < E[P(E), F] + P(M)
(4.372)
where P(M) is upper-bounded by, cf. (4.143), P(M) < 2-AP(m+1)c(1 - F + 2xE)(m+1)c
(4.373)
where A is given by (4.144). Analogously, E[P(E), .F ] is upper-bounded by, cf. (4.149),
E[P(E), 91 <
2a'P(m+1)c2(R-1)(m+1)c(1
+ 2-µ)(m+1)c
x (I - E + E24-A)(m+1)c
(4.374)
where µ and A' are given by (4.150) and (4.151), respectively. By inserting (4.373) and (4.374) into (4.372) we obtain, cf. (4.157),
E[P(E)] < 2
2-(p log e+(1-p) log 'I-PO (M+I)C
(4.375)
If we instead use more elegant bounds analogously to (4.142) and (4.148), we can avoid the factor 2 and obtain (4.369). We can now summarize our results in the following
Theorem 4.37 There exists a binary, rate R = b/c, time-invariant convolutional code encoded by a generator matrix of memory m such that its burst error probability when used to communicate over the BSC with crossover probability c with Viterbi decoding with backsearch limit r = m + 1 is upper-bounded by the inequality PB <
2-Efbs(R)(m+1)c
(4.376)
where EtS(R) is the finite back-search limit exponent given by E tbs (R)
-p log (2 c(1 -E))+ 1(m(+1)c) ,
0 < R < ReXp
E13 (R),
ReXp
(4.377)
and where
p = h-1(1 - R)
(4.378)
is the Gilbert-Varshamov parameter of the rate R, Ea (R) is the block coding exponent (4.160), Rexp is the expurgation rate, RCi41 is the critical rate, and C is the channel capacity for the BSC. We notice that when we use Viterbi decoding with back-search limit r = m + 1, the burst error probability is determined by the block coding exponent. When we use Viterbi decoding with the decoding postponed to the end of the trellis, the burst error probability is determined by the convolutional coding exponent. Finally, we remark that for the ensemble E(b, c, m, oc) of time-varying convolutional codes encoded by generator matrices of memory m, Viterbi decoding with back-search limit r can also be analyzed for r > m + 1 [Zig72].
4.9 TAILBITING TRELLISES
So far we have used the zero-tail (ZT) method to terminate convolutional codes into block codes. If the trellis is short, the rate loss due to the terminating m dummy b-tuples (zeros when the generator matrix is polynomial and realized in controller canonical form) might not be acceptable. In this section, we will briefly describe a method to terminate convolutional
Chapter 4
224
Viterbi Decoding
codes into block codes without any rate loss. This method is called tailbiting and can be used to construct powerful trellis representations of block codes. Consider a rate R = b/c convolutional code C encoded by a polynomial generator matrix (4.379)
of memory m realized in controller canonical form as shown in Fig. 4.24.
Figure 4.24 A polynomial generator matrix G(D) realized in controller canonical form.
Let us truncate the causal codewords after L c-tuples, where L > m. Then, assuming that the encoder state is allzero at time t = 0, we have
vt=utGo+ut_1Gi+---+ut-mGm, 0 < t < L
(4.380)
V[O,L) = U[O,L)GL
(4.381)
or, equivalently,
where Go
G1
...
Go
G1
Gm
Gm Go
GL =
G1
...
Go
Gi
Gm
...
Gm-t
(4.382)
G1
Go
is an L x L matrix and G1, 0 < i < m, are b x c matrices. Let QL = (011 O L2 ... OLm) denote the encoder state at time t = L. Clearly, we have (4.383)
QL = (UL-1 UL-2 ... UL-m)
Now we assume that we have the encoder state at time t = 0 equal to the encoder state at time L, that is, Qo = QL and that the input is the allzero sequence. If we let L) denote the output sequence corresponding to initial encoder state 0 L and allzero input sequence, then L) can be expressed as Zl
Zi
zi
(4.384)
v[o,L) = u[0,L)GL
where U"[0,L) = (0 0... 0 UL-m UL-m+t
...
UL-1)
(4.385)
Section 4.9
225
Tailbiting Trellises
and
GzL
_
(4.386)
Gm Gm-1
G.
G1
G2
... G.
is an L x L matrix. Hence, we conclude the codewords of the tailbiting representation of the block code Btb obtained from the convolutional code C encoded by generator matrix G (D) by using the reciprocal of the last m input b-tuples as the initial encoder state can be compactly written as (4.387)
v[o,L) = u[o,L)GtL
where
GL = GL + Gz' Go
G1
...
Go
G1
Gm Gm
Go Gm
Gm-1
G.
G1
G2
...
Gm
G1
...
Go
G1
Gm
...
Gm-1
(4.388)
Gi Go
is the L x L generator matrix for the tailbiting representation of the block code Btb Since we require that we have the same state at the beginning as at the end, we can use a circular trellis for the tailbiting representation of a block code. In Fig. 4.25 we show as an example a circular trellis of length L = 6 for a four-state tailbiting representation.
00 01 10 10
Figure 4.25 Circular trellis of length L = 6 for a four-state tailbiting representation of a block code.
226
Chapter 4
Viterbi Decoding
A circular trellis of length L corresponds to a total of K = bL information symbols, c code symbols per branch, block length N = Lc, 2b branches per trellis node; the number of codewords is
M=2 K = 2bL
(4.389)
R = K/N = b/c
(4.390)
and its rate is
The block code is the union of 2bm subsets corresponding to the paths that go through each of the 2bm states at time 0. These are 2b"` cosets of the (N, K - bm) zero-tail terminated convolutional code corresponding to the paths that go through the allzero state at time 0. It remains to find the minimum distance dmin of the block code. Since it is linear, it is enough to find the minimum weight of the nonzero codewords. The nonzero block code trellis paths fall into two cases, which are illustrated in Fig. 4.26. Case (i)
Case (ii)
Figure 4.26 Two different types of paths.
Case (i): The neighbor path touches the allzero path at least once; all paths considered are within the subset of paths leaving the allzero state. By the tailbiting condition they will sooner or later remerge with the allzero path. Finding the minimum weight among such paths is the same as finding the minimum weight path for a convolutional code. By the symmetry of the circular trellis, the behavior of paths out of one allzero node is the same as for all the other allzero nodes. The minimum weight of the paths within this subset is the (L - m - 1)th order row distance, but we call it the intra minimum distance of the tailbiting trellis representation of the block code and denote it dintra If L is long enough, dintra is equal to the free distance of the corresponding convolutional code, but in general we have dintra > dfree Case (ii): The neighbor path never touches the allzero path. This case is unique to block codes. For each nonzero starting state, we have to find the minimum weight path and then take the minimum over all outcomes. We call the minimum distance between the code subsets, that is, between the subset considered in case (i) and its cosets, the inter minimum distance of the tailbiting trellis representation of the block code and denote it dinter. The minimum distance of the block code is d,n1n = min{dintra, dieter}
(4.391)
If the tailbiting circle is long enough, case (i) paths lead to the minimum distance, but for short circles case (ii) codewords may lead to the minimum weight path, that is, dinter might be less than dintra. Hence, short tailbiting representations of block codes have quite different optimal (largest distance) generators than do zero-tail terminated convolutional codes. So far, we have constructed tailbiting representations of block codes from convolutional codes by first truncating the semi-infinite generator matrix G for the convolutional code,
Go G i
G=
Go
... G. ... G.
G1
(4.392)
Section 4.9
Tailbiting Trellises
227
after L rows. Then we obtain Go
G1
... G.
Go
G1
Gm Go
G1
...
Go
G.
(4.393)
Gm-1
Gm
Go
Gi
...
Gm
/
which is an L x (L + m) matrix. Then we "wraparound" the part consisting of the last m columns and obtain Gt as given in (4.388). We can also construct convolutional codes by "unwrapping" a tailbiting generator matrix of the type Gt and then extending the "unwrapped" matrix to a semi-infinite generator matrix for the corresponding convolutional code. EXAMPLE 4.10 Consider the rate R = 1/2 convolutional code with encoding matrix G(D) = (1 + D2 +D 3 1 + D) of memory m = 3 and with dfree = 5. By the "wraparound" technique, we obtain the following generator matrix for a tailbiting representation of the corresponding (12,6) block code:
Gtb 6
=
11
01
10
10
00
00
11
01
10
10
00 00
00
00
11
01
10
10
10
00
00
11
01
10
10
10
00
01
10
10
00 00
11
01
00
11
(4.394)
It can be verified that dintra = dfree = 5, dinter = 4, and dm;,, = 4. (Adding rows 3, 4, 5, and 6 gives the codeword 0 100 01 10 00 10 of Hamming weight 4 which upper-bounds dm n.) By increasing the number of information symbols by two to K = 8, we obtain a (16,8) block code with dd1 = dfree = 5.
The (24,12) extended Golay code B24 with dmin = 8 is a quite remarkable block code that has often been used as a benchmark in the studies of code structure and decoding algorithms. Surprisingly enough, there exists a rate R = 1/2 tailbiting representation of B24 that requires only 16 states [CFV97]. It has the following generator matrix:
G tb12-
11
01
10
11
11
00 00 00
00
11
01
11
01
11
11
01
11
01
11
00 00 00 00 00
00 00 00 00 00 00 00
11
11
10
01
11
00 00 00 00
00 00 00 00
11
01
10
11
11
00 00 00 00 00
11
01
11
01
11
00 00 00 00
11
01
11
01
00 00 00
11
11
10
00 00
11
01
00
11
/ 11
01
11
01
11
00
00 00 00 00 00 00 00
11
11
10
01
11
11
00 00 00 00 00 00 00
01
11
00 00 00 00 00 00
10
11
11
00 00 00 00 00 00 00
01
11
01
11
00 00
00 00
00 00 00 00
00 00
00 00 00 00 00 00 00
(4.395)
We notice that if we regard Gt as a rate R = 1 /2 generator matrix, the corresponding generator
matrix for the convolutionalacode is time-varying with period T = 4 and of memory m = 4. That is, we need a 2bm = 21.4 = 16-state trellis. However, Gtz can, for example, also be
228
Viterbi Decoding
Chapter 4
regarded as a rate R = 4/8 generator matrix with a corresponding time-invariant generator matrix of memory m = 1, that is, 2bm = 24.1 = 16 states. EXAMPLE 4.11 By "unwrapping" G;b, we obtain, for example, the rate R = 1/2 Golay convolutional code (GCC) with the time-varying (period T = 4) encoding matrix of memory m = 4,
GCC
Lit
(D) _
(1+D2+D4 (1+D+D2+D4 (1+D2+D3+D4 (1+D2+D4
F
1+D+D2+D3+D4), t =0,4,... 1+D+D3+D4), t=1,5,... 1+D+D3+D4), t =2,6,...
(4.396)
1+D+D2+D3+D4), t=3,7,...
(Notice that G1(D) = G,+3 (D).) Alternatively, we can obtain the R = 4/8 Golay convolutional code with the time-invariant encoding matrix of memory m = 1,
G GCC (D)
=
1+D
0
1
0
0
I+ D
1
1
D
D D
1+D
0
0
0
1+D
1+D D 0 D
1
1
1
1+ D
1
0
1+D
1
D
1+D
D D
(4.397)
The rate R = 4/8 Golay convolutional code has dfree = 8 and the following path weight enumerator: T(W) =
W8(49 - 20W4 - 168W8 +434W12 - 560W16 + 448W20 - 224W24 + 64W28 - 8W32) 1 - 28W4 - 17W8 + 118W12 - 204W16 + 204W20 - 128W24 + 48W28 - 8W32
(4.398)
=49W8+1352W12+38521W'6+1096224W20+
When we consider the GCC as a time-varying rate R = 1/2 convolutional code, it is reasonable to average the spectra for the four different phases. With this convention we have T (W) = 12.25W' + 338 W 12 + 9455.25 W 16 + 264376 W 20 +
(4.399)
If we multiply the values n8 and n12 in (4.399) by four, then we obtain the numbers given by (4.398) but 4n16 = 4.9455.25 = 37821
(4.400)
which is 700 less than the corresponding number for the rate R = 4/8 GCC. The time-varying
rate R = 1 /2 GCC has memory m = 4. Hence, its shortest detour from the allzero sequence is of length (1 + m)c = (1 + 4)2 = 10 code symbols. The rate R = 4/8 GCC has memory m = 1 and thus, a shortest detour of (1 + m)c = (1 + 1)8 = 16. The information sequence 100001000000... is encoded as 110111011111111001110000 ... and 11011101 11111110 01110000... , respectively. The first code sequence corresponds to two consecutive detours, each of length 10 code symbols and of weight 8. The second code sequence corresponds to a single detour of length (2 + 1)8 = 24 code symbols and of weight 16. The path weight enumerator counts only single detours. Hence, this code sequence is counted when we consider the GCC as a rate R = 4/8 convolutional code but not when we consider it as a rate R = 1/2 convolutional code. This phenomenom explains the discrepancy in the numbers of weight 16 code sequences. The GCC is doubly-even, that is, the weights of the codewords grow in steps of four, and it can easily be verified that its codewords are self-orthogonal, that is, the Golay convolutional code is self-dual. A code that is both doubly-even and self-dual is called Type H.
Section 4.9
229
Tailbiting Trellises
EXAMPLE 4.12 The rate R = 4/8, time-invariant, memory m = 1 (16-state), convolutional code encoded by encoding matrix [JSW98]
G(D)=
0 D
0 0
1+D
D
1+D 1+D
D 1+D 1+D
1
0
1
1+D D 0
0
1
D D
1
1+D 1+D
D 1
0
1
0
1
D
1
(4.401)
is also Type II and has free distance de,, = 8, but its path weight enumerator T(W)
_ W8(33-6W4+8W8-138W12+260W16-226W20+112W24-32W28+4W32) (1 - 30W4 - W8 + 20W12 +
32W16- 74W20 + 56W24 - 22W28 + 4W32)
(4.402)
= 33W8+984W12+29561 W16 + 886644 W20 + is better than that of the GCC (4.398). EXAMPLE 4.13 Through the "wraparound" technique, we can from the rate R = 4/8 encoding matrix in Example 4.12 obtain a tailbiting representation of a rate R = 16/32 block code with drain = 8:
G=
/ 00101110
00011101
00000000
00000000
00011111 10100010 11000101
10000110
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00011101 10000110 11111000 \ 11001010
00101110 00011111 10100010
00000000 00000000 00000000 00011101 10000110 11111000 11001010 00101110 00011111 10100010
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00011101 10000110 11111000 11001010 00101110
11111000 11001010
11000101
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
11000101
00000000 00000000 00000000 00000000
(4.403)
00011111 10100010 11000101
An obvious decoding method for tailbiting trellises is to use the Viterbi algorithm for each of the 2° subcodes where v is the overall constraint length and a subcode consists of the codewords going through a given state. This procedure leads to 2° candidate codewords, and the best one is chosen as the decoder output. A simpler but suboptimal decoding method is to initialize the metrics for all states at
t = 0 to zero and then decode with the Viterbi algorithm going around the cyclic trellis a few times. Stop the Viterbi algorithm after a preset number of cycles n. Trace the winning path backward to determine whether it contains a cycle of length L that starts and ends in the same state. If such a cycle exists, it is chosen as the decoder estimate; otherwise the result is an erasure. Experiments have shown that a few decoding cycles often suffice [ZiC89]. However, the decoding performance may be significantly affected by "pseudo-codewords" corresponding to trellis paths of more than one cycle that do not pass through the same state at any integer multiple of the cycle length other than the pseudo-codeword length. In Fig. 4.27 we compare the bit error probabilities when these two algorithms are used to decode the 16-state tailbiting representation of the extended Golay code when it is used to communicate over the BSC.
Chapter 4
230
Viterbi Decoding
PB
Figure 4.27 Comparison of the bit error probabilities for the 16-state extended Golay code. The bundles of curves correspond to soft decisions (left) and hard decisions (right). In each bundle the curves correspond to (from left to right) ML-decoding and suboptimal decoding with 10, 5, and 3 cycles, respectively.
4.10 QUANTIZATION OF CHANNEL OUTPUTS In Section 4.2 we introduced the computational cutoff rate R0, which is an important design criterion. Massey [Mas74] showed that it can be effectively employed in modulation system
design. Here we will use the Ro criterion to determine the quantization thresholds on the AWGN channel when BPSK modulation is used. It is easily seen from (4.304) that for the unquantized output channel we have
Theorem 4.38 The computational cutoff rate Ro for a binary input, unquantized output channel is
Ro = 1 - log (1 + J where p,iv(
11)da)
p,j,(a j
(4.404)
-00
denotes the transition density function.
Corollary 4.39 The computational cutoff rate Ro for the AWGN channel with BPSK modulation at signal-to-noise ratio Es/No is
Ro = 1 - log (1 + Proof.
e-E,lNo)
(4.405)
J
From (4.404) follows
Ro = 1 - log C1 + j
1
nNo
r °°
= 1- log 1+ J
1
°°
= 1 - log (1 + e-E,lNo)
nNo
e
1
e
nNo +Es
"o da
e_
da (4.406)
Section 4.10
Quantization of Channel Outputs
231
EXAMPLE 4.14 The unquantized AWGN channel with BPSK modulation at signal-to-noise ratio ES/No = 0 dB is
Ro(oo) = 1 - log(1 + e-1) = 0.548
(4.407)
Using hard decisions, that is, only two output values, we obtain a BSC with crossover probability (see Problem 1.28):
e=Q
(i) = 0.079
(4.408)
and, hence,
R0(2) = 1 - log (1 + 2,/0.079(t - 0.079)) = 0.379
(4.409)
which is substantially smaller than Ro(oo).
Which signal-to-noise ratio is required for a BSC to achieve the same computational cutoff rate Ro as for the unquantized AWGN channel at E5/No = 0 dB? From (4.304) it follows that the crossover probability for the BSC can be expressed as
E = 2 - 2R°(1 - 2-Ro)
(4.410)
By combining (4.407) and (4.410) we obtain e = 0.035
(4.411)
and by solving (1.12)
E = Q ( 2Es/No)
(4.412)
E, /No = 2.15 dB
(4.413)
that is, we have a slightly more than 2-dB loss due to hard decisions. The binary input, 8-ary output DMC in Example 4.2 is a quantization to q = 8 output levels of an AWGN channel with BPSK modulation at signal-to-noise ratio E, /No = 0 dB. This particular quantization was in actual use in the space program. The corresponding thresholds were set before we knew how to find the optimum ones [Mas74].
Definition We will say that a given quantization scheme with q different outputs is optimum if it maximizes Ro for all ways of quantizing the channel to q outputs.
For a given q, the AWGN channel with BPSK modulation reduces to a binary input, q-ary output DMC. Let the q output symbols be r1, r2, ... , rq. From (4.304) it follows that
Ro = 1 - log 1 +
P (rj 10) P (rj 11) I
/
i=1
(4.414)
Maximizing Ro is equivalent to minimizing the sum in (4.414). Let Tj be the quantization threshold between the regions where r, and r;+1 will be the outputs. A necessary condition for this minimization is that for each i the derivative of the sum with respect to T, equals 0. Since only the terms for j = i and j = i + 1 in the sum depend on Ti, we have
P(r; 10)P(r, 11) +
d T,
/P-(r,+1
11))
1
2
P(r; 10)P(r; 11)
(P(r1 10)p(Tj 11) + p(T1 10)P(r, 11))
1
+2 P(r,+1 I 00)P(r,+t 11) x (-P(r;+1 10)p(Tt 11) - p(T 10)P(ri+1 11))
(4.415)
232
Chapter 4
Viterbi Decoding
Equation (4.415) follows from
j=i i+1 jai, j#i+l
p(TiIv),
dP(rj I v)
- p (T
i
dTi
v)
I
,
0,
(4 . 416)
where we have used the fact that
d X d x ,I f (z)dz = f (x)
4.417)
and d
i
f f(z)dz = -f (x)
(4.418)
dx The condition that (4.415) equals 0 can be written P (ri 1 0)
4
P(ri 11)P
1)+
(Ti 1
P (ri
1
1)
P(ri 10)p
P(rj+j 1 0) (T, P(rj+j 11) p 11) +
(Tt 10) (4.419)
P(rj+j 11) p (T, 0) , P(rj+j 10) 1
a ll
i
which is equivalent to
P(rj+j 10) all i p(T 11) P(ri 11) P(ri+1 11) Given that r = a, we define the likelihood ratio on the unquantized channel as p(Ti 10) =
P(ri 10)
X(a)
det p(a 1 0)
(4 . 420)
(4.421)
p(a 1 1)
For the DMC obtained by quantization of the continuous channel, we define the likelihood ratio for the output letter rj as i P (rJ 10) XrJ = P (r; 1)
(4.422)
l
The necessary condition (4.420) can then be formulated as
Theorem 4.40 A quantization scheme with thresholds Ti which converts a binary input, continuous output channel into a DMC with q output levels is optimum in the sense of maximizing Ro for all quantization schemes giving q output only if
X(Ti)=all i
(4.423)
that is, the likelihood ratio when r = Ti on the unquantized channel must be the geometric mean of the likelihood ratios for the two output letters whose quantization regions the threshold Ti divides. Remark. Lee [Lee76b] has constructed an example showing that the optimality condition (4.423) is in general not sufficient. EXAMPLE 4.15 For the AWGN with BPSK modulation at signal-to-noise ratio E,./No we have
=e e
which is a single-valued function of a.
NO
a
(4.424)
Section 4.11
Comments
233
Massey suggested the following algorithm to compute the set of thresholds [Mas74]: Choose T1 arbitrarily. This determines ..,, as well as a.(T1). Hence, we can then choose T2 such that the resulting A,,2 will satisfy (4.423). We can then choose T3 such that the resulting ,1,3 will satisfy (4.423) and so on. If we can complete the procedure up to the choice of Tq_1 and this choice gives as well
A(Tq-1) =
(4.425)
Arq-iArq
then we stop. If we are unable to choose some Ti along the way or if A(Tq-1) >
(4.426)
Arq-i Arq
then we have to decrease T1 and repeat the procedure. On the other hand, if A(Tq-1) <
(4.427)
Arq-i Arq
then we know that our choice of T1 was too small and we must adjust our guess of T1 and repeat the procedure. EXAMPLE 4.16 By applying the previous procedure, we obtain the following quantization thresholds for the AWGN channel with BPSK at signal-to-noise ratio ES/No = 0 dB and assuming Es = 1:
q
Ro(q)
T,
T2
2
0.378 0.498 0.534 0.548
0 -1.032 -1.798
0
1.03 2
-1.07 5
-0.5 12
4 8
00
T3
T4
T5
T6
T7
0
0.512
1.075
1.798
Symmetry considerations indicate that we have only one local maximum of Ro. This is also the global maximum; hence, the quantization thresholds are optimum. EXAMPLE 4.17 To achieve a cutoff rate Ro(oo) = 0.548, we need the following signal-to-noise ratios:
q
Ro
E,lNo dB
2
0.548 0.548 0.548 0.548
2.148 0.572 0.157 0
4 8
00
From the previous example it follows that compared to hard decisions there is a considerable gain in using four levels of quantization but not much of an advantage to go beyond eight levels.
4.11 COMMENTS Error probability bounds for block codes when used to communicate over the B SC were already presented by Elias in 1955 [E1i55]. The modem versions of the upper bounds for block codes as presented here are inspired by Fano [Fan6l] and Gallager [Ga165] [Ga168]. The `few" and
234
Chapter 4
Viterbi Decoding
"many" idea which we have exploited several times in this volume can be found in Fano's textbook. For convolutional codes it was first used in [CJZ84a]. Lower bounds on the error probability for block codes were derived by Shannon, Gallager, and Berlekamp [SGB67]. For convolutional codes the upper bounds on the error probability were derived by Yudkin in 1964 [Yud64] and the lower bounds by Viterbi in 1967 [Vit67]. The tightest upper bounds on the error probability were derived in [Zig85].
The ideas of constructing the convolutional coding exponent from the block coding exponent and vice versa, as well as the concept of critical length, go back to Forney [For74a]. For general, nonlinear trellis codes, Pinsker [Pin67] derived a lower bound on the error probability for decoding with finite back-search limit r. His bound is similar to the spherepacking bound for block codes of block length N = rc. Tailbiting representations of block codes were introduced by Solomon and van Tilborg [Sov79]. See also [MaW86].
PROBLEMS 4.1 Consider therateR = 1/2, memory m = 1, systematic encoding matrixG(D) = (1 1+D). (a) Draw the length f = 5 trellis. (b) Suppose that the encoder is used to communicate over a BSC with crossover probability 0 < E < 1/2. Use the Viterbi algorithm to decode the received sequence r = 1001 11 11 11 11. (c) How many channel errors have occurred if the optimal path corresponds to the transmitted sequence?
4.2 Consider the rate R = 1/2, memory m = 2 encoding matrix G (D) = (1+D+D2 1+D2). (a) Draw the length £ = 4 trellis. (b) Suppose that the encoder is used to communicate over a BSC with crossover probability 0 < c < 1/2. Use the Viterbi algorithm to decode the received sequence r= 11 11 00 11 01 11.
(c) Suppose that the information sequence is u = 1011. How many channel errors are corrected in (b)? 4.3 Repeat Problem 4.2 for r = 10 11 10 10 01 00 and u = 0100. 4.4 Repeat Problem 4.2 for r = 00 1110 0 100 00 and u = 0000. 4.5 Consider a binary input, 8-ary output DMC with transition probabilities P(r I v) given in Example 4.3. (a) Suppose that the rate R = 1/2 encoder with encoding matrix G (D) = (1 + D + D2 1 + D2) is used to communicate over the given channel. Use the Viterbi algorithm to decode r = 1314 1304 0204 0113 0311 0411. (b) Connect the channel to a BSC by combining the soft-decision outputs 01, 02, 03, 04 and 11, 12, 13, 14 to hard-decision outputs 0 and 1, respectively. Use the Viterbi algorithm to decode the hard-decision version of the received sequence in (a). 4.6 Repeat Problem 4.5 for r = 0403 1301 0111 1101 0112 11044.7 Repeat Problem 4.5 for r = 1104 0112 1101 0111 1301 0403. 4.8 Consider the binary input, 4-ary output DMC with transition probabilities given in Fig. P4.8. (a) Suppose that the rate R = 1/2 encoder with encoding matrix G (D) = (1 + D + D2 1+ D2) is used to communicate over the given channel. Use the Viterbi algorithm to decode r = 0111 1201 1102 0111 1211 1112(b) Connect the channel to a BSC by combining the soft-decision outputs 01, 02 and 11, 12 to hard-decision outputs 0 and 1, respectively. Use the Viterbi algorithm to decode the hard-decision version of the received sequence in (a).
235
Problems
Figure P4.8 DMC used in Problem 4.8.
4.9 Consider the binary input, 8-ary output DMC shown in Fig. 4.5 with transition probabilities P(r I v) given by the following table: r
0 1
04
03
02
01
11
12
13
14
0.1415 0.0001
0.3193 0.0025
0.2851 0.0180
0.1659 0.0676
0.0676 0.1659
0.0180 0.2851
0.0025 0.3193
0.0001 0.1415
Suppose that the rate R = 1/2 encoder with encoding matrix G (D) = (1+D2 1+D+D2) is used to communicate over the given channel. After appropriate scaling and rounding of the metrics, use the Viterbi algorithm to decode r = 0112 1213 0302 0103 0302 0201. 4.10 Consider the binary input, 8-ary output DMC shown in Fig. 4.5 with transition probabilities P (r v) given by the following table: I
r
0 1
04
03
02
01
11
12
13
14
0.2196 0.0027
0.2556 0.0167
0.2144 0.0463
0.1521 0.0926
0.0926 0.1521
0.0463 0.2144
0.0167 0.2556
0.0027 0.2196
Repeat Problem 4.5 for r = 0213 1202 0113 1111 1214 1111. 4.11 Consider the communication system in Problem 4.5. (a) Which parameters determine the system's error-correcting capability? (b) Find the values of the parameters in (a).
4.12 Consider a binary input, q-ary output DMC with a high signal-to-noise ratio. hich of the following three convolutional encoding matrices will perform best in combination with ML-decoding?
G1(D)=(1+D 1+D+D2) G2(D)=(1+D+D2+D3 1+D3) G3(D) = (1
1 + D2 + D4)
4.13 Suppose that we use the rate R = 1/2 convolutional encoding matrix G (D) = (1 + D2 1+ D) to encode 998 information bits followed by two dummy zeros. The codeword is transmitted over a BSC with crossover probability E. (a) What is the total number of possibly received sequences? (b) Find the total number of codewords. (c) Suppose that the received sequence r = (rlr2 ... r?) is given by
r,_
1,
0,
i = 536, 537 otherwise
{ Find the ML-estimate of the information sequence.
236
Chapter 4
Viterbi Decoding
4.14 Puncturing a given convolutional code is a method of constructing new convolutional codes
with rates that are higher than the rate of the original code. The punctured codes are in general less powerful than nonpunctured codes of the same rate and memory, but they have two advantages: From a given original low-rate convolutional code we can obtain a series of convolutional codes with successively higher rates. They can be decoded by the Viterbi algorithm with essentially the same structure as that for the original code [CCG79]. A puncturing sequence is a binary sequence; 0 and 1 means that the corresponding code symbol is not transmitted and transmitted, respectively. This is illustrated in the following little example: 11
10
00
10
11
00
original code symbols
11
10
01
11
10
01
puncturing sequence
11
1
0
10
1
0
punctured code symbols
We have used the periodic sequence [111001]°° as puncturing sequence, where [ ]°° denotes a semi-infinite sequence that starts at time 0 and that consists of an infinite repetition of the subsequence between the square brackets. Suppose that this puncturing sequence is used together with the encoding matrix G(D) _ (1 + D2 1 + D + D2) to communicate over a BSC with crossover probability E. (a) Find the rate for the punctured code. (b) Use the Viterbi algorithm to decode r = 01010101. 4.15 Consider the encoding matrix in Problem 3.9. Evaluate the upper bounds on the burst error probability for a BSC with crossover probability E.
(a) E=0.1 (b) E = 0.01 (c) E = 0.001 4.16 Repeat Problem 4.15 for the encoding matrix in Problem 3.11. 4.17 Verify (4.15) ford even. 4.18 Verify (4.25). 4.19 Van de Meeberg [Mee74] derived a slightly tighter bound than (4.28). (a) Show that P2i
(2_ 1\2 _2d (2E
2i
where 8=Id-+1
2 Hint: (1 - E)2i-t[[ Le-il (
((
(b) Show that PB
<
(28_
l
)e
I-E
= 2-21(L (2,/ )2i
1E2E
1)2-2a
J
X (2 (T(W) + T(-W)) + 2 (T(W) - T(-W))) IW=2,` 4.20 Prove van de Meeberg's bound on the bit error probability. 4.21 Show the following properties of the Gallager function (4.208).
(a) G(s) > 0, for s > 0 (b) G'(s) > O, for s > 0 (c) G"(s) < 0, for s > 0
(d) lim, G'(s) = 0 (e) G'(0) = C (f) G'(1) = Rcrit
Problems
237
4.22 Verify (4.272) for d odd. 4.23 Find a Bhattacharyya-type upper bound for the burst error probability when using a convolutional code with two codewords for communication over a binary input, unquantized output DMC with transition density function p,,,. 4.24 Show that for a binary input DMC, Ro is maximized when the inputs are used with equal probability. 4.25 (a) Find Ro for the binary input, 8-ary output DMC given in Example 4.3. (b) Convert the channel in (a) to a binary erasure channel (BEC) by combining the softdecision outputs 04, 03; 02, 01, 11, 12; and 13, 14 to hard-decision outputs 0, A, and 1, respectively. Find Ro for this channel. (c) Find Ro for the BSC in Example 4.2. 4.26 (a) Find Ro for the AWGN channel with BPSK modulation at ES/No = 1 dB. (b) Convert the channel in (a) to a 4-ary DMC with optimum Ro and find the quantization thresholds. (c) Repeat (b) for an 8-ary DMC.
4.27 Repeat Problem 4.26 for ES/No = 2 dB. 4.28 Assume that Gilbert-Varshamov's lower bound on the minimum distance for block codes (3.188) and Costello's lower bound on the free distance for convolutional codes (3.167) are asymptotically tight. Show for a rate R tailbiting trellis of length L encoded by a generator matrix of memory m that the best error correcting capability for a given decoding complexity is obtained for
Ll m =
R
where p is the Gilbert-Varshamov parameter.
-plog(21-R
- 1)
+ o(m)
List Decoding
Viterbi decoding (Chapter 4) is an example of a nonbacktracking decoding method that at each time instant examines the total encoder state space. The error-correcting capability of the code
is fully exploited. We first choose a suitable code and then design the decoder in order to "squeeze all juice" out of the chosen code. Sequential decoding (Chapter 6) is a backtracking decoding method that (asymptotically) fully exploits the error-correcting capability of the code. In list decoding, we first limit the resources of the decoder; then we choose a generator matrix with a state space that is larger than the decoder state space. Thus, assuming the same decoder complexity, we use a more powerful code with list decoding than with Viterbi decoding. A list decoder is a powerful nonbacktracking decoding method that does not fully exploit the error-correcting capability of the code. In this chapter we describe and analyze list decoding which is an important and interesting
decoding method based on the idea that we at each time instant create a list of the L most promising initial parts of the codewords. For a given decoder complexity, list decoding of convolutional codes encoded by systematic encoding matrices is in fact superior to Viterbi decoding of convolutional codes encoded by nonsystematic encoding matrices. 5.1 LIST DECODING ALGORITHMS List decoding is a nonbacktracking breadth-first search of the code tree. At each depth, only the L most promising subpaths are extended, not all, as is the case with Viterbi decoding. These subpaths form a list of size L. (Needless to say, starting at the root, all subpaths are extended until we have obtained L or more subpaths.) Since the search is breadth-first, all subpaths on the list are of the same length; finding the L best extensions reduces to choosing the L extensions with the largest values of the Viterbi metric (4.2).
Assuming a rate R = b/c convolutional encoder of memory m, we append, as was the case with Viterbi decoding, a tail of bm dummy zeros to the information bits in order to terminate the convolutional code into a block code. The following algorithm is the simplest version of a list decoding algorithm:
Algorithm LD (List decoding)
LD1. Load the list with the root and metric zero; set t = 0.
LD2. Extend all stored subpaths to depth t + 1 and place the L best (largest Viterbi metric) of their extensions on the list.
LD3. If we have reached the end, then stop and choose as the decoded codeword a path to the terminating node with the largest Viterbi metric; otherwise increment t by 1 and go to LD2.
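As an illustration, the following Python sketch implements Algorithm LD for a rate 1/c binary polynomial encoder and a BSC. It uses the accumulated Hamming distance to the received sequence, so "largest Viterbi metric" becomes "smallest distance"; the helper names, the data layout, and the terminating tail handling are our own choices, not part of the text.

```python
# Illustrative sketch of Algorithm LD for a rate 1/c encoder over a BSC.
# A smaller Hamming distance corresponds to a larger Viterbi metric.

def encode_branch(generators, state, u):
    """One tree branch: state = tuple of the m previous information bits."""
    bits = (u,) + state
    out = tuple(sum(gi & bi for gi, bi in zip(g, bits)) % 2 for g in generators)
    return out, (u,) + state[:-1]          # (c output bits, next state)

def list_decode(r, generators, m, L):
    """r: one received c-tuple per depth, dummy tail included."""
    paths = [(0, (0,) * m, [])]            # (distance, state, decoded bits)
    for t, r_t in enumerate(r):
        tail = t >= len(r) - m             # force zeros in the dummy tail
        ext = []
        for dist, state, bits in paths:
            for u in ((0,) if tail else (0, 1)):
                v, nxt = encode_branch(generators, state, u)
                d = sum(a != b for a, b in zip(v, r_t))
                ext.append((dist + d, nxt, bits + [u]))
        ext.sort(key=lambda p: p[0])       # LD2: keep the L best extensions
        paths = ext[:L]
    return paths[0][2][:len(r) - m]        # LD3: best path to a terminating node

# Example with G(D) = (1, 1 + D + D^2 + D^4) and r = 00 11 01 00 00 00 00 00:
G = ((1, 0, 0, 0, 0), (1, 1, 1, 0, 1))
r = [(0, 0), (1, 1), (0, 1), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0)]
print(list_decode(r, G, m=4, L=3))
```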
Let us assume that the allzero information sequence is encoded by the rate R = 1/2, memory m = 4, systematic encoder with encoding matrix G(D) = (1  1 + D + D² + D⁴) given in Fig. 5.1 and that r = 00 11 01 00 00 00 00 00 ... is received over a BSC.
Figure 5.1 A rate R = 1/2 systematic convolutional encoder with encoding matrix G(D) = (1 1 + D + D2 + D4).
In Fig. 5.2 we show the code tree that is partially explored by the list decoding algorithm.
We have used a list of size L = 3, which should be compared with the 16 states that would have been examined at each depth by the Viterbi algorithm. At the root and at depth 1, only one and two states, respectively, are extended. The upper (lower) branch stemming from an extended state represents information bit 0 (1). We notice that the correct path is lost at depth 4 and that the correct state is recovered at depth 6. A correct path loss is a serious kind of error event that is typical for list decoding. It will be discussed in depth later in this chapter.

The extension process that takes place in Fig. 5.2 is illustrated in Fig. 5.3 in a way that resembles an implementation of the list decoding algorithm. The upper branches enter states at even positions counted from the top (positions 0, 2, ..., 2L − 2), and the lower branches enter states at odd positions (positions 1, 3, ..., 2L − 1). An efficient representation of the extension process is given in Fig. 5.4. It is understood that we have the states in the same positions as in Fig. 5.3. The extended states are denoted by "*". The extended paths can be traced backward through the array in the following way. Suppose that the * at breadth k and depth j + 1 represents the best path traced backward so far. The best position at depth j is then represented by the (⌊k/2⌋ + 1)th * (counted from the top). Furthermore, the estimated information bit is k (mod 2).

EXAMPLE 5.1
Consider the array in Fig. 5.4 and suppose that the * at breadth 0 and depth 5 represents the best path traced backward so far. Then we obtain (⌊0/2⌋ + 1) = 1, that is, the first * at depth 4, and u₄ = 0 (mod 2) = 0. The first * at depth 4 is at breadth 2. Thus, continuing backward we obtain (⌊2/2⌋ + 1) = 2, that is, the second * at depth 3, and u₃ = 2 (mod 2) = 0. The second * at depth 3 is at breadth 2. Hence, we have (⌊2/2⌋ + 1) = 2, that is, the second * at depth 2, and u₂ = 2 (mod 2) = 0. The second * at depth 2 is at breadth 1. Thus, (⌊1/2⌋ + 1) = 1, that is, the first * at depth 1, and u₁ = 1 (mod 2) = 1. Finally, the first * at depth 1 is at breadth 0 and, hence, u₀ = 0. In conclusion, the estimated information sequence is û = (01000...).
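The backward-tracing rule of Example 5.1 is purely arithmetic and is easy to mechanize. The sketch below is our own; it assumes the array is stored as, for each depth, the sorted breadth positions that carry a "*" (an assumed data layout, not something fixed by the text).

```python
def trace_back(stars, k_final):
    """stars[d]: sorted breadth positions marked '*' at depth d (d = 1 .. D).
    k_final: breadth of the chosen '*' at the final depth D."""
    D = max(stars)
    u = [0] * D
    k = k_final
    for d in range(D, 0, -1):
        u[d - 1] = k % 2                 # estimated information bit u_{d-1}
        if d > 1:
            k = stars[d - 1][k // 2]     # the (floor(k/2) + 1)th '*' at depth d-1
    return u
```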
Figure 5.2 List decoding (L = 3). Partially explored tree demonstrating a correct path loss at depth 4 and a spontaneous recovery of the correct state at depth 6.
Figure 5.3 An illustration of the extension process for list decoding (L = 3).
Figure 5.4 An example of an array representing the extension process for list decoding (L = 3).
Let us return to the partially explored tree in Fig. 5.2. We notice that at depth 8 we have a duplicate of the state (0000). The two states (0000) have accumulated Hamming distances 2 and 3, respectively. The two subtrees stemming from these two nodes are, of course, identical. Hence, a path passing the first of these two states will always be superior to a corresponding path passing the latter one. We will obtain an improved version of the list decoding algorithm if at each depth we delete inferior duplicates before we select the L best extensions to be put on the list. Searching for duplicates of only one state (in our case the tentatively best) can easily be done in time linear in L. We do not worry about duplicates of other states, since subpaths stemming from nonoptimal states are deleted from the list of the L best subpaths very fast. Furthermore, finding the Lth poorest path in the list is of order L [Knu73], and comparing the remaining paths to this one is also of order L. Hence, the list decoding algorithm is linear in L [MoA84][AnM91].
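A sketch of one decoding step that incorporates this improvement: before the L best extensions are selected, every inferior duplicate of the tentatively best state is discarded. The helper names are hypothetical, and Python's heapq.nsmallest is used only for convenience; it is not strictly linear in L, so a true linear-time selection would be needed to match the complexity claim exactly.

```python
import heapq

def extend_one_level(paths, r_t, branches, L):
    """paths: list of (distance, state, bits); branches(state) yields
    (info_bit, output_bits, next_state). Returns the L best extensions
    after removing duplicates of the tentatively best state."""
    ext = []
    for dist, state, bits in paths:
        for u, v, nxt in branches(state):
            d = dist + sum(a != b for a, b in zip(v, r_t))
            ext.append((d, nxt, bits + [u]))
    best = min(ext, key=lambda p: p[0])
    # Linear scan: keep only the best copy of the tentatively best state.
    ext = [p for p in ext if p[1] != best[1]] + [best]
    return heapq.nsmallest(L, ext, key=lambda p: p[0])
```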
5.2 LIST DECODING-PERFORMANCE

Consider a rate R = b/c, memory m encoder for a convolutional code C, the BSC, and the received binary sequence

r_[0,t] = r_0 r_1 ... r_t    (5.1)

where r_i ∈ F_2^c, 0 ≤ i ≤ t. We look at the sphere S_δ(r_[0,t]) with radius δ around the received sequence r_[0,t], that is,

S_δ(r_[0,t]) =def {y_[0,t] | d_H(r_[0,t], y_[0,t]) ≤ δ}    (5.2)

where y_[0,t] ∈ F_2^{(1+t)c}. The number of (initial parts of length (1 + t)c of the) codewords v in this sphere is

N_t(δ, r_[0,t]) =def |{v_[0,t] ∈ S_δ(r_[0,t]) | v ∈ C}|    (5.3)

Let N(e, r) be the maximal number of codewords which can occur in a sphere with center r and radius e over all t, that is,

N(e, r) =def max_t {N_t(e, r_[0,t])}    (5.4)

Maximizing over all possible received sequences, we get the sphere of radius e with the maximal number of codewords for the code C:

N_max(e) =def max_r {N(e, r)}    (5.5)

For a list decoder that has to correct all e-error combinations, no matter where they start, it is sufficient to keep at least L = N_max(e) paths in the decoder memory. Otherwise the decoder might lose the correct path, and there exists at least one e-error sequence that may not be corrected. How large should we choose L? If

L ≥ N_max(e)    (5.6)

then the following statements hold:
(i) If at most e errors occur, then a list decoder of list size L will not lose the correct path.
(ii) If the code that is used has free distance d_free ≥ 2e + 1, then all e-error combinations will be decoded correctly.
The parameter N_max(e) can be illustrated by assuming that all codewords at some trellis level are points in a plane and by using a coin [And89]. Move a coin with radius e around until it covers the largest number of codewords. The largest such number at any trellis level is N_max(e), the least L which is necessary to avoid losing the correct path.

EXAMPLE 5.2
The following table shows N_max(e) for the systematic rate R = 1/2, memory m = 11, convolutional encoding matrix G(D) = (1  1 + D + D² + D⁵ + D⁶ + D⁸ + D¹⁰ + D¹¹) with free distance d_free = 9.

e           2    3    4
N_max(e)    4    9   19
A list decoder for this encoder decodes all 4-error combinations correctly, if L is large enough. The sphere around any received sequence with radius 4 contains at most Nmax(4) = 19 codewords. So we have to use at least L = 19 paths in order to fully reach the 4-error correction potential of the given code.
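For very short encoders, N_max(e) can be estimated by brute force: enumerate all truncated codewords and all possible sphere centers up to some depth and count. The sketch below (our own) does this for the small memory m = 4 encoder of Fig. 5.1 rather than for the memory-11 encoder of the table, which is out of reach for exhaustive search; because the search stops at a fixed depth t_max, it can only give a lower estimate of the true N_max(e).

```python
from itertools import product

def codewords(generators, m, t):
    """All distinct initial codeword parts v_[0,t] of a rate 1/c encoder."""
    words = set()
    for u in product((0, 1), repeat=t + 1):
        state, v = (0,) * m, []
        for ut in u:
            bits = (ut,) + state
            v += [sum(g & b for g, b in zip(gen, bits)) % 2 for gen in generators]
            state = (ut,) + state[:-1]
        words.add(tuple(v))
    return words

def n_max(generators, m, e, t_max):
    best = 0
    for t in range(t_max + 1):
        cw = codewords(generators, m, t)
        for r in product((0, 1), repeat=(t + 1) * len(generators)):
            cnt = sum(sum(a != b for a, b in zip(v, r)) <= e for v in cw)
            best = max(best, cnt)
    return best

G = ((1, 0, 0, 0, 0), (1, 1, 1, 0, 1))   # G(D) = (1, 1 + D + D^2 + D^4)
print(n_max(G, m=4, e=2, t_max=4))       # lower estimate of N_max(2)
```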
In Chapter 2 we showed in Theorem 2.71 that every convolutional generator matrix is equivalent to a systematic rational encoding matrix. Consider a rate R = b/c, memory m, nonsystematic polynomial convolutional encoding matrix, and its systematic rational equivalent. Expand the rational functions of the systematic encoding matrix into power series in D and truncate after D^m. Then we obtain a systematic polynomial encoding matrix that is equivalent to the nonsystematic one over the first memory length. That is, their code trees are identical over the first m + 1 branches. Hence, the two encoders have exactly the same distance profile.
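For rate 1/2 the construction just described, expanding the ratio of the generator polynomials into a power series and truncating after D^m, is a short polynomial long division over F₂. The routine below is our own helper (polynomials as coefficient lists from D⁰ upward); the worked example uses the nonsystematic pair G(D) = (1 + D + D², 1 + D²) from the problems of this chapter, whose truncated quotient we computed ourselves.

```python
def systematic_partner(g1, g2, m):
    """Coefficients of D^0..D^m of g2(D)/g1(D) over F2; g1[0] must be 1."""
    q = []
    rem = list(g2) + [0] * (m + 1)
    for k in range(m + 1):
        q.append(rem[k])
        if q[k]:
            for i, gi in enumerate(g1):       # add (= subtract over F2) q_k D^k g1(D)
                if k + i < len(rem):
                    rem[k + i] ^= gi
    return q

g1 = [1, 1, 1]                 # 1 + D + D^2
g2 = [1, 0, 1]                 # 1 + D^2
print(systematic_partner(g1, g2, m=2))   # -> [1, 1, 1], i.e. (1, 1 + D + D^2)
```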
Suppose that both a nonsystematic polynomial encoder of memory m and a systematic polynomial encoder, in general also of memory m, that are equivalent over the first memory length, are used together with a list decoder. For a range of interesting values of L, the list decoder will operate mostly in the identical parts of the code trees encoded by the two encoders. The burst error probability measured at the root will be almost the same for both encoders. Consider the memory m = 31 ODP nonsystematic convolutional encoding matrix (octal notation, see Chapter 8) G_nonsys = (74041567512  54041567512) with d_free = 25. By long division of the generator polynomials and truncation, we obtain the following memory m = 31 ODP systematic convolutional encoding matrix G_sys = (40000000000  67115143222) with d_free = 16. These two encoding matrices are equivalent over the first memory length. In Fig. 5.5 we compare for various L the burst error probability P_B measured at the root of the code tree when the list decoder is used to decode sequences received over the BSC. The simulations give striking support of the conclusions given above.

A very different result occurs when encoders with widely varying distance profiles are tested. In Fig. 5.6 we compare the burst error probabilities at L = 32 for five encoders, all with d_free = 10, each of whose distance profiles successively under-bounds the others. The result is a sequence of widely varying P_B curves arranged in the same order. The encoders are

G_1 = (400000 714474), m = 15, d_free = 10, d = (2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 8, 8)
G_2 = (400000 552234), m = 15, d_free = 10, d = (2, 2, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8)
Figure 5.5 Burst error probability, P_B, measured at the root for list decoding: systematic versus nonsystematic convolutional encoders.
Figure 5.6 A nested set of distance profiles leads to a nested set of PB curves.
G_3 = (400000 447254), m = 15, d_free = 10, d = (2, 2, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 6, 7, 7, 7)
G_4 = (400000 427654), m = 15, d_free = 10, d = (2, 2, 2, 2, 3, 3, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6)
G_5 = (400000 417354), m = 15, d_free = 10, d = (2, 2, 2, 2, 2, 3, 4, 5, 5, 6, 6, 6, 6, 6, 6, 7)
It appears that for a given list size L almost all the variation in the burst error probability
for different encoders can be traced to variations in the distance profiles. The burst error probability performance depends almost entirely on the distance profile and neither on the free distance nor on the type of the encoder (systematic or nonsystematic).
Now we will turn to perhaps the most important comparison: List versus Viterbi decoding. In Fig. 5.7 we show the results when nonsystematic encoders with optimum free
distances are used. The list decoder decodes a memory m = 9 encoder at L = 16, 32, 64, 128, and 256, while the Viterbi algorithm decodes three encoders of memory 7, 8, and 9, whose survivor numbers are 128, 256, and 512, respectively. From the simulations we conclude that for the same burst error probability the Viterbi algorithm requires somewhat more than twice the survivors of the list decoder. The encoders are

G_1 = (4734 6624), m = 9, d_free = 12
G_2 = (712 476), m = 7, d_free = 10
G_3 = (561 753), m = 8, d_free = 12
Figure 5.7 Burst error probability, PB. List versus Viterbi decoding.
So far we have considered only burst error probability. The situation is quite different when we deal with bit error probability. A correct path loss is a severe event that causes many bit errors. If the decoder cannot recover a lost correct path it is, of course, a "catastrophe," that is, a situation similar to the catastrophic error propagation that can occur when a catastrophic
encoding matrix is used to encode the information sequence. The list decoder's ability to recover a lost correct path depends heavily on the type of encoder that is used. A systematic, polynomial encoder supports a spontaneous recovery. This is illustrated in Fig. 5.8, where we compare the bit error probability for list decoders with various list sizes when they are used to decode sequences received over a BSC and encoded with both systematic and nonsystematic, polynomial encoders. Both encoders have the same distance profile. The free distance of the systematic, polynomial encoder is by far the smaller, yet its bit error probability is more than ten times better. The encoders are

G_sys = (4000000 7144655), m = 20, d_free = 13
G_nonsys = (6055051 4547537), m = 20, d_free = 22
Figure 5.8 Bit error probability, Pb, for list decoding-systematic versus nonsystematic, polynomial convolutional encoders.
The only advantage of the nonsystematic encoder is its larger free distance. Yet this extra distance has almost no effect on either the burst or the bit error probability. Nor does it change the list size L needed to correct e errors, as long as e falls within the powers of the systematic, polynomial encoder. A suggestion of why systematic, polynomial encoders offer rapid recovery of a lost correct path may be found by considering the trellises of rate R = 1 /2 random systematic, polynomial, and nonsystematic encoders. Suppose the correct path is the allzero one and no
errors occur for a time, and consider an arbitrary trellis node. The 0-branch (the one that inserts a zero into the encoder shift register) is the one leading back to the correct path. For a systematic, polynomial encoder, the distance increment of this "correct" branch is 0.5 on the average, while the incorrect branch has increment 1.5. For a nonsystematic encoder, these average increments are both 1 and give no particular direction of the search back to the correct path. In conclusion, using systematic, polynomial convolutional encoders essentially solves
the correct path loss problem with list decoders. Since both systematic and nonsystematic encoders have the same error rate in the absence of correct path loss, systematic, polynomial encoders are clearly superior to nonsystematic ones. In Fig. 5.9 we compare list and Viterbi decoding with respect to the bit error probability.
We have chosen Viterbi decoders with complexities that are twice the list sizes of the list decoders. The list decoders outperform the corresponding Viterbi decoders. The encoders are
G_1 = (4000000 7144655), m = 20, d_free = 13
G_2 = (712 476), m = 7, d_free = 10
G_3 = (561 753), m = 8, d_free = 12
G_4 = (4734 6624), m = 9, d_free = 12
Figure 5.9 Bit error probability, Pb. List versus Viterbi decoding.
5.3 THE LIST MINIMUM WEIGHT
In this section we will introduce the list minimum weight for convolutional codes. It is an important parameter when we analyze the error performance of list decoding. It is related to the number of errors that can be guaranteed to be corrected by a list decoder.
In the previous section we used a sphere of fixed radius e. By counting the number of codewords within the sphere when we moved the center of the sphere to all possible received sequences with at most e errors, we obtained guidelines on how to choose the list size L for a given guaranteed error-correcting capability e. Now we will turn the problem around. Consider a list decoder with a fixed list size L. For every t = 0, 1, ... and every received sequence r_[0,t] ∈ F_2^{(1+t)c}, let δ_L(r_[0,t]) denote the largest radius of a sphere S_{δ_L(r_[0,t])}(r_[0,t]) with center r_[0,t] such that the number of codewords in the sphere is

N_t(δ_L(r_[0,t]), r_[0,t]) ≤ L    (5.7)

The smallest such radius is of particular significance, and we have the following

Definition. For a list decoder with a given list size L, the list minimum weight w_min is

w_min = min_t min_{r_[0,t]} {δ_L(r_[0,t])}    (5.8)

where r_[0,t] is the initial part of the received sequence r.

We have immediately

Theorem 5.1 Given a list decoder of list size L and a received sequence with at most w_min errors. Then the correct path will not be forced outside the list of L survivors.

Unfortunately, w_min is hard to estimate. This leads us to restrict the minimization to those received sequences that are codewords. Thus, we obtain

Definition. For a given list size L the list weight w_list of the convolutional code C is

w_list = min_t min_{v_[0,t]} {δ_L(v_[0,t])}    (5.9)

where v_[0,t] is the initial part of the codeword v ∈ C.

The importance of the list weight can be inferred from the following:

Theorem 5.2 The list minimum weight w_min is upper- and lower-bounded by w_list according to

⌊w_list/2⌋ ≤ w_min ≤ w_list    (5.10)

Proof. The right inequality of (5.10) follows immediately from (5.8) and (5.9). To prove the left inequality of (5.10), we consider the spheres S_{⌊w_list/2⌋}(r_[0,t]) of radius ⌊w_list/2⌋ with centers at all received sequences r_[0,t]. If all such spheres contain less than L codewords, we are done. If not, consider any one of the nonempty spheres and take another sphere S_{w_list}(v_[0,t]) of radius w_list with center at any codeword v_[0,t] in the smaller sphere (Fig. 5.10). Clearly, we have

S_{⌊w_list/2⌋}(r_[0,t]) ⊆ S_{w_list}(v_[0,t])    (5.11)

By the definition of w_list, the larger sphere contains at most L codewords. Thus, from (5.11) it follows that the smaller sphere contains at most L codewords.

From Theorem 5.1 and Theorem 5.2 we have immediately

Corollary 5.3 Given a list decoder of list size L and a received sequence with at most ⌊w_list/2⌋ errors. Then the correct path will not be forced outside the list of L survivors.

If the number of errors exceeds ⌊w_list/2⌋, then it depends on the code C and on the received sequence r whether or not the correct path is forced outside the list.
Figure 5.10 Illustration for the proof of Theorem 5.2.
EXAMPLE 5.3
Both the list minimum weight w_min and the list weight w_list for the convolutional code C encoded by the rate R = 1/2, memory m = 4, systematic convolutional encoding matrix G(D) = (1  1 + D + D² + D⁴) are shown in Fig. 5.11.
Figure 5.11 The list minimum weight wmin and the list weight wlist for the convolutional code C given in Example 5.3.
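The list weight of Example 5.3 can be checked numerically for small list sizes: for each truncation depth t and each codeword center, find the largest radius δ for which the sphere contains at most L codewords, and take the minimum. The sketch below is our own; because the search stops at a finite depth t_max it can only give an upper estimate of the true w_list.

```python
from itertools import product

def encode(generators, m, u):
    state, v = (0,) * m, []
    for ut in u:
        bits = (ut,) + state
        v += [sum(g & b for g, b in zip(gen, bits)) % 2 for gen in generators]
        state = (ut,) + state[:-1]
    return tuple(v)

def w_list_estimate(generators, m, L, t_max):
    best = None
    for t in range(t_max + 1):
        cw = {encode(generators, m, u) for u in product((0, 1), repeat=t + 1)}
        for center in cw:                        # centers restricted to codewords
            dists = sorted(sum(a != b for a, b in zip(v, center)) for v in cw)
            # largest radius with at most L codewords inside the sphere
            delta = dists[L] - 1 if len(dists) > L else (t + 1) * len(generators)
            best = delta if best is None else min(best, delta)
    return best

G = ((1, 0, 0, 0, 0), (1, 1, 1, 0, 1))           # G(D) = (1, 1 + D + D^2 + D^4)
for L in (1, 2, 3):
    print(L, w_list_estimate(G, m=4, L=L, t_max=5))
```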
To prove a random coding lower bound on w_list we consider the ensemble E(b, c, ∞, 1), that is, the ensemble of infinite memory, time-invariant convolutional codes with generator matrix

G = | G_0  G_1  G_2  ...  |
    |      G_0  G_1  ...  |    (5.12)
    |           G_0  ...  |
    |                ...  |

in which each digit in each of the matrices G_i, i = 0, 1, ..., is chosen independently with probability 1/2. Hence, over the ensemble E(b, c, ∞, 1) all code symbols on a subpath diverging from the allzero path are mutually independent (cf. Lemma 3.14). Furthermore, these code symbols are also equally probable binary digits. The next lemma and theorem establish lower bounds on the list weight w_list that are similar to Costello's lower bound on the free distance of a convolutional code (Theorem 3.29):

Lemma 5.4 The fraction of binary, rate R = b/c, infinite memory, time-invariant convolutional codes with polynomial encoding matrices used with list decoding of list size L, having list weight w_list satisfying the inequality

w_list > (log L + log((2^R − 1)(1 − f)))/(−log(2^{1−R} − 1)) − 1    (5.13)

exceeds f, where 0 ≤ f < 1.

Proof. Suppose that C belongs to the ensemble of infinite memory, time-invariant convolutional codes. Let C_[0,t) denote the truncation at depth t of the code C. One of the paths
in the code tree representing C_[0,t) corresponds to the allzero path. In this code tree, 2^b − 1 subpaths exist differing from the allzero path at depth (t − 1), and, in general, there exist (2^b − 1)2^{ℓb} subpaths differing from the allzero path at depth (t − ℓ − 1), ℓ = 0, 1, ..., t − 1. Since all code symbols on the subpaths differing from the allzero path are independent and equiprobable binary digits (Lemma 3.14), it follows that the probability that a subpath differing from the allzero path at depth (t − ℓ − 1) has weight i is equal to

((ℓ+1)c choose i) (1/2)^{(ℓ+1)c}

The probability that the weight of this subpath is less than w is equal to

Σ_{i=0}^{w−1} ((ℓ+1)c choose i) (1/2)^{(ℓ+1)c}

Let v_k[0,t) be a code sequence arising from a nonzero information sequence, and let us introduce the indicator function

φ_w(v_k[0,t)) = { 1, if w_H(v_k[0,t)) < w
               { 0, else    (5.14)

Let N_w(t) denote the number of subpaths in the truncated code tree for C_[0,t) having weight less than w. Clearly,

N_w(t) = Σ_k φ_w(v_k[0,t))    (5.15)

Hence, we have

E[N_w(t)] = Σ_k E[φ_w(v_k[0,t))] = Σ_k P(w_H(v_k[0,t)) < w)
          = 1 + Σ_{ℓ=0}^{t−1} (2^b − 1)2^{ℓb} Σ_{i=0}^{w−1} ((ℓ+1)c choose i) (1/2)^{(ℓ+1)c}    (5.16)

By extending the sum over ℓ in (5.16) to all nonnegative integers, we obtain an upper bound that is independent of t. Omitting the argument t in the random variable N_w(t) and upper-bounding (2^b − 1)2^{ℓb} by 2^{(ℓ+1)b}, we obtain

E[N_w] < 1 + Σ_{ℓ=0}^{∞} 2^{(ℓ+1)b} Σ_{i=0}^{w−1} ((ℓ+1)c choose i) 2^{−(ℓ+1)c}    (5.17)

We use the substitution

k = (ℓ + 1)c    (5.18)

and upper-bound the right side of (5.17) by summing over k = 0, 1, 2, ...:

E[N_w] < Σ_{k=0}^{∞} Σ_{i=0}^{w−1} (k choose i) 2^{k(R−1)}    (5.19)

Substituting

x = 2^{R−1}    (5.20)

into (5.19) and using the formulas

Σ_{k=0}^{∞} (k choose i) x^k = x^i/(1 − x)^{i+1}    (5.21)

and

Σ_{i=0}^{w−1} x^i/(1 − x)^{i+1} = (1/(2x − 1)) ((x/(1 − x))^w − 1)    (5.22)

for 0 < x < 1, we obtain

E[N_w] < 1/((2^R − 1)(2^{1−R} − 1)^w)    (5.23)

For a given list size L and value f, we choose a w such that

1/((2^R − 1)(2^{1−R} − 1)^w) ≤ L(1 − f) < 1/((2^R − 1)(2^{1−R} − 1)^{w+1})    (5.24)

Then, from (5.23) and the left inequality of (5.24) it follows that

E[N_w] < L(1 − f)    (5.25)

and, thus, for more than a fraction f of the codes in the ensemble, we must have N_w < L, which implies a list weight w_list that is not less than w. By rewriting the right inequality of (5.24) as

w > log L/(−log(2^{1−R} − 1)) + log((2^R − 1)(1 − f))/(−log(2^{1−R} − 1)) − 1    (5.26)

we have completed the proof.

Next we shall show that our lower bound on the list weight w_list for general convolutional codes also holds for convolutional codes encoded by systematic, polynomial encoding matrices.
Consider the ensemble E(b, c, ∞, 1) of binary, rate R = b/c, infinite memory, time-invariant convolutional codes with systematic, polynomial encoding matrices

G = | G_0  G_1  G_2  ...  |
    |      G_0  G_1  ...  |    (5.27)
    |           G_0  ...  |
    |                ...  |

in which each (b × c) submatrix G_i, i = 0, 1, ..., is systematic, that is,

G_i = | 1 0 ... 0   g_{1,b+1}  g_{1,b+2}  ...  g_{1,c} |
      | 0 1 ... 0   g_{2,b+1}  g_{2,b+2}  ...  g_{2,c} |    (5.28)
      | . . ... .      .          .       ...     .    |
      | 0 0 ... 1   g_{b,b+1}  g_{b,b+2}  ...  g_{b,c} |

and each digit g_{ij}, i = 1, 2, ..., b, j = b + 1, b + 2, ..., c, is chosen independently with probability 1/2 to be 0 or 1.

Lemma 5.5 The fraction of binary, rate R = b/c, infinite memory, time-invariant convolutional codes with systematic, polynomial encoding matrices used with list decoding of list size L, having a list weight w_list satisfying inequality (5.13), exceeds f, where 0 ≤ f < 1.

Proof. Suppose that C belongs to the ensemble of infinite memory, time-invariant convolutional codes with systematic, polynomial encoding matrices. Let C_[0,t) denote the truncation at depth t of the code C. In the code tree there exist (b choose i) subpaths differing from the allzero path at depth (t − 1) with a weight of information symbols equal to i, i = 1, 2, ..., b. Correspondingly, there exist no more than ((ℓ+1)b choose i) subpaths differing from the allzero path at depth (t − ℓ − 1) with a weight of information symbols equal to i, i = 1, 2, ..., b(ℓ + 1). The probability that the weight of the parity check symbols of the subpath differing from the
allzero path at depth (t − ℓ − 1) is (j − i), j ≥ i, is equal to

((ℓ+1)(c−b) choose j−i) (1/2)^{(ℓ+1)(c−b)}

Thus (cf. (5.17)), the average of the number of subpaths of C_[0,t) having weights less than w is upper-bounded by the inequality

E[N_w(t)] ≤ 1 + Σ_{ℓ=0}^{t−1} Σ_{j=0}^{w−1} Σ_{i=0}^{j} ((ℓ+1)b choose i) ((ℓ+1)(c−b) choose j−i) 2^{−(ℓ+1)(c−b)}
          = 1 + Σ_{ℓ=0}^{t−1} Σ_{j=0}^{w−1} ((ℓ+1)c choose j) 2^{−(ℓ+1)(c−b)}    (5.29)

where the equality follows from the identity

Σ_{i=0}^{j} (n choose i)(k choose j−i) = (n+k choose j)    (5.30)

Since (5.29) is analogous to (5.17), the remaining part of the proof is similar to the proof of Lemma 5.4.

By letting f = 0 and combining Lemma 5.4 and Lemma 5.5, we obtain the following

Theorem 5.6 There exist binary, rate R = b/c, infinite memory, time-invariant convolutional codes with nonsystematic and systematic, polynomial generator matrices used with list decoding of list size L, having a list weight w_list satisfying the inequality

w_list ≥ log L/(−log(2^{1−R} − 1)) + O(1)    (5.31)

where

O(1) = log((2^R − 1)(2^{1−R} − 1))/(−log(2^{1−R} − 1))    (5.32)

Since a list decoder has L states and a Viterbi decoder for a convolutional code of overall constraint length ν has 2^ν states, our lower bound on w_list (5.31) is similar to Costello's lower bound on d_free (3.169). It follows from Theorem 5.6 that for convolutional codes encoded by nonsystematic generator matrices and by systematic, polynomial encoding matrices, the principal determiner of the correct path loss probability is lower-bounded by the same bound. For the free distance, which is the principal determiner of the error probability with Viterbi decoding, Costello's lower bounds differ by the factor (1 − R), depending on whether the convolutional code is encoded by a nonsystematic generator matrix or by a systematic, polynomial encoding matrix (cf. Theorem 3.29 and Theorem 3.31). This different behavior reflects a fundamental and important difference between list and maximum-likelihood decoding.

Next we will derive an upper bound on the list weight that resembles our lower bound. In the derivation we will use a Hamming-type upper bound on the list weight for convolutional codes, but first we need the corresponding bound for block codes (cf. the Hamming bound for binary block codes given in Example 1.15). The list weight w_list for a block code B of block length N is obtained if we let t = 0 and use c = N code symbols on each branch in the definition of the list weight w_list for a convolutional code C (5.9).
Lemma 5.7 The list weight w_list for a binary block code B of block length N and rate R when used with list decoding of list size L satisfies the inequality

Σ_{i=0}^{⌊w_list/2⌋} (N choose i) ≤ L 2^{N(1−R)}    (5.33)

Proof. Assume that the inequality does not hold. Then, counting all points in the spheres of radius ⌊w_list/2⌋ around the 2^{NR} codewords, we will fill the whole space (2^N points) more than L-fold. Hence, at least one of these points in the spheres must have been counted at least L + 1 times; that is, it must belong to at least L + 1 spheres. The codewords at the centers of these at least L + 1 spheres with one point in common are at most 2⌊w_list/2⌋ apart. Let any one of these codewords be the center of a sphere of radius 2⌊w_list/2⌋ ≤ w_list. That sphere contains at least L + 1 codewords, which contradicts the definition of w_list. Thus, the inequality (5.33) must hold.

Corollary 5.8 The list weight w_list for a binary block code B of block length N and rate R when used with list decoding of list size L satisfies the inequality

(h(⌊w_list/2⌋/N) − 1 + R) N ≤ (1/2) log N + log L + 1/2    (5.34)

where h(·) is the binary entropy function (1.89).

Proof. We have the following series of inequalities:

Σ_{i=0}^{⌊w_list/2⌋} (N choose i) ≥ (N choose ⌊w_list/2⌋) ≥ (1/√(2N)) 2^{N h(⌊w_list/2⌋/N)}    (5.35)

where the first inequality is obvious and the second inequality follows from Lemma 4.23. By combining Lemma 5.7 and (5.35), we obtain

(1/√(2N)) 2^{N h(⌊w_list/2⌋/N)} ≤ L 2^{N(1−R)}    (5.36)

Taking the logarithm completes the proof.

For convolutional codes we have the following counterpart to Lemma 5.7:

Lemma 5.9 The list weight w_list for a convolutional code of rate R = b/c when used with list decoding of list size L satisfies the inequality

Σ_{i=0}^{⌊w_list/2⌋} ((t+1)c choose i) ≤ L 2^{(t+1)c(1−R)}    (5.37)

for t = 0, 1, ....

Proof. Follows immediately from Lemma 5.7 applied to C_[0,t) for each t = 0, 1, ....

Theorem 5.10 The list weight w_list for a binary, rate R = b/c convolutional code when used with list decoding of list size L satisfies the inequality

w_list ≤ 2 log L/(−log(2^{1−R} − 1)) + O(√(log L))    (5.38)
Proof. For convolutional codes we obtain from (5.37) the following counterpart to (5.34):

(h(⌊w_list/2⌋/((t+1)c)) − 1 + R)(t+1)c ≤ (1/2) log((t+1)c) + log L + 1/2    (5.39)

for t = 0, 1, .... Let

t_0 + 1 = ⌊ ⌊w_list/2⌋ / (c(1 − 2^{R−1})) ⌋    (5.40)

Then, for any given rate R > 0 there exists a sufficiently large ⌊w_list/2⌋ such that the inequalities

(2/c)⌊w_list/2⌋ ≤ t_0 + 1 ≤ ⌊w_list/2⌋ / (c(1 − 2^{R−1}))    (5.41)

hold. Let t = t_0 in (5.39). From the left inequality in (5.41) it follows that the argument of the binary entropy function in (5.39) is at most 1/2. From the right inequality in (5.41) it follows that this argument is at least (1 − 2^{R−1}). Since the binary entropy function is increasing in the interval [0, 1/2], we obtain

(h(1 − 2^{R−1}) − 1 + R)(t_0 + 1)c ≤ (1/2) log((t_0 + 1)c) + log L + 1/2    (5.42)

By simple manipulations, we can show that

h(1 − 2^{R−1}) − 1 + R = −(1 − 2^{R−1}) log(2^{1−R} − 1)    (5.43)

Combining (5.42) and (5.43) gives the important inequality

−(t_0 + 1)c(1 − 2^{R−1}) log(2^{1−R} − 1) ≤ (1/2) log((t_0 + 1)c) + log L + 1/2    (5.44)

By using the well-known inequality

log x ≤ x − 1,  x > 0    (5.45)

it follows from (5.44) that

A(t_0 + 1)c − √((t_0 + 1)c) − log L ≤ 0    (5.46)

where

A = −(1 − 2^{R−1}) log(2^{1−R} − 1)    (5.47)

By combining (5.41) and (5.46) we obtain

w_list < 2 log L/(−log(2^{1−R} − 1)) + (1 − 2^{R−1})(1 + √(1 + 4A log L))/A² + (1 − 2^{R−1})c + 2
       = 2 log L/(−log(2^{1−R} − 1)) + O(√(log L))    (5.48)

Thus, the proof is complete.

It is interesting to notice that the main term in the upper bound for w_list is exactly twice the main term in the corresponding lower bound. Finally, from Theorem 5.6 and Theorem 5.10 we obtain:
Theorem 5.11 Given any received sequence with at most e errors. Then there exists a rate R = b/c convolutional code such that the correct path is not lost when it is used with list
decoding of list size L satisfying (asymptotically) the inequalities

(1/(2^{1−R} − 1))^e ≲ L ≲ (1/(2^{1−R} − 1))^{2e}    (5.49)

In particular, we have

Corollary 5.12 Given any received sequence with at most e errors. Then there exists a rate R = 1/2 convolutional code such that the correct path is not lost when it is used with list decoding of list size L satisfying (asymptotically) the inequalities

(1 + √2)^e ≲ L ≲ (1 + √2)^{2e}    (5.50)

As expected, the required list size grows exponentially with the number of errors to be corrected.
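A quick numerical reading of Corollary 5.12, assuming the rate R = 1/2 specialization (5.50) as reconstructed above (1/(2^{1−R} − 1) = 1/(√2 − 1) = 1 + √2), shows how quickly the required list size grows with e:

```python
import math

# Asymptotic list-size range implied by (5.50) for R = 1/2.
base = 1.0 / (2 ** 0.5 - 1.0)           # = 1 + sqrt(2)
for e in range(1, 6):
    lo, hi = base ** e, base ** (2 * e)
    print(f"e = {e}:  {math.ceil(lo):>4} <~ L <~ {math.ceil(hi):>6}")
```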
5.4 UPPER BOUNDS ON THE PROBABILITY OF CORRECT PATH LOSS

The correct path loss at the tth step of a list decoding algorithm is a random event E_t^cpl which consists of deleting at the tth step the correct codeword from the list of the L most likely codewords. In this section, we derive both expurgation and sphere-packing upper bounds on the probability of this correct path loss for the ensemble E(b, c, ∞, 1) of infinite memory, time-invariant convolutional codes.

Our expurgation bound is valid for transmission rates R less than the computational cutoff rate R_0:

Lemma 5.13 (Expurgation bound) For a list decoder of list size L and the BSC with crossover probability ε, there exist infinite memory, time-invariant, binary convolutional codes of rate R = b/c with systematic and nonsystematic, polynomial generator matrices such that the probability of correct path loss is upper-bounded by the inequality

P(E_t^cpl) ≤ L^{−s} O(1)    (5.51)

for rates 0 < R < R_0, where s, 1 ≤ s < ∞, satisfies

R = G_exp(s)/s    (5.52)

G_exp(s) is the expurgation function (4.185), and O(1) is independent of L.

Remark. When the rate R → R_0 the exponent of the L-dependent factor of the upper bound in Lemma 5.13 approaches −1, while the second factor approaches +∞.

In the proof we need the following

Lemma 5.14 (Markov inequality) If a nonnegative random variable X has an average E[X], then the probability that the outcome exceeds any given positive number a satisfies

P(X ≥ a) ≤ E[X]/a    (5.53)

Proof. The lemma follows immediately from

E[X] = Σ_x x P(X = x) ≥ Σ_{x≥a} x P(X = x) ≥ a Σ_{x≥a} P(X = x) = a P(X ≥ a)    (5.54)
Proof of Lemma 5.13. Consider the ensembles E(b, c, ∞, 1) of binary, rate R = b/c, infinite memory, time-invariant convolutional codes with systematic or nonsystematic, polynomial encoding matrices, respectively. Let w_{⌊L/2⌋-list} denote the list weight when the list size is ⌊L/2⌋. Then we let the ⌊L/2⌋ codewords of weight w_{⌊L/2⌋-list} or less be on the list of size L. The remaining ⌈L/2⌉ places are filled with codewords having weights more than w_{⌊L/2⌋-list}. This is done as follows. Expurgate from the code ensemble all codes for which w_{⌊L/2⌋-list} does not satisfy the inequality

w_{⌊L/2⌋-list} > log⌊L/2⌋/(−log(2^{1−R} − 1)) + log((2^R − 1)(1 − f))/(−log(2^{1−R} − 1)) − 1 =def w_0    (5.55)

and consider from now on only the subensembles for which the inequality (5.55) is satisfied. According to Lemmas 5.4 and 5.5, each of these subensembles contains more than a fraction f of the codes of the ensemble. Assuming list decoding of list size L of the codes of each subensemble, a necessary condition to lose the correct path at the tth step of the decoding is that the number of subpaths in the tree at depth t that have weight more than w_{⌊L/2⌋-list} and surpass the correct path exceeds ⌈L/2⌉, since then the total number of paths exceeds L. The probability that a particular subpath of weight j would surpass the correct (allzero) path is upper-bounded by the Bhattacharyya bound (4.15), viz., by (2√(ε(1−ε)))^j. The fraction of subpaths diverging from the allzero path at depth (t − ℓ − 1) having weight j is upper-bounded by

((ℓ+1)c choose j) 2^{−(ℓ+1)c}

The average of the number of subpaths having weight not less than w_0 and surpassing the correct path evaluated over the ensemble is upper-bounded by

Σ_{ℓ=0}^{∞} Σ_{j=w_0}^{∞} 2^{b(ℓ+1)} ((ℓ+1)c choose j) 2^{−(ℓ+1)c} (2√(ε(1−ε)))^j

If we evaluate the average over the subensemble containing only the fraction f of the codes, then the upper bound is weakened by a factor of 1/f. Hence, for f > 0, in the subensemble the mathematical expectation of the number N of subpaths which surpass the correct path is upper-bounded by

E[N] ≤ (1/f) Σ_{ℓ=0}^{∞} Σ_{j=w_0}^{∞} ((ℓ+1)c choose j) 2^{−(ℓ+1)(c−b)} (2√(ε(1−ε)))^j
     ≤ (1/f) Σ_{k=0}^{∞} Σ_{j=w_0}^{∞} (k choose j) 2^{k(R−1)} (2√(ε(1−ε)))^j
     = (1/((1 − 2^{R−1}) f)) Σ_{j=w_0}^{∞} (2√(ε(1−ε))/(2^{1−R} − 1))^j    (5.56)

The sum on the right-hand side of (5.56) converges if

2√(ε(1−ε))/(2^{1−R} − 1) = (2^{1−R_0} − 1)/(2^{1−R} − 1) < 1    (5.57)
that is, if R < R_0. From (5.56) we have for f > 0 that

E[N] < (1/((1 − 2^{R−1}) f)) (1 − (2^{1−R_0} − 1)/(2^{1−R} − 1))^{−1} ((2^{1−R_0} − 1)/(2^{1−R} − 1))^{w_0}
     = (1/(f(1 − 2^{R−R_0}))) ((2^{1−R_0} − 1)/(2^{1−R} − 1))^{w_0}
     ≤ (1/(f(1 − 2^{R−R_0}))) (⌊L/2⌋ (2^R − 1)(1 − f)(2^{1−R} − 1))^{1 − log(2√(ε(1−ε)))/log(2^{1−R} − 1)}    (5.58)

where 2^{1−R_0} − 1 = 2√(ε(1−ε)) and the last inequality follows from (5.55). In Chapter 4 we introduced the expurgation function (4.185)

G_exp(s) = s(1 − log(1 + (2√(ε(1−ε)))^{1/s}))    (5.59)

Let s satisfy (5.52); then combining (5.52) and (5.59) yields

s(1 − log(1 + (2√(ε(1−ε)))^{1/s})) = sR    (5.60)

Assuming s ≠ 0, we obtain

s = log(2√(ε(1−ε)))/log(2^{1−R} − 1)    (5.61)

Thus, inequality (5.58) can be rewritten as

E[N] ≤ (1/(f(1 − 2^{R−R_0}))) ⌊L/2⌋^{1−s} ((2^R − 1)(1 − f)(2^{1−R} − 1))^{1−s}    (5.62)

From Lemma 5.14 it follows that the probability of correct path loss at the tth step of list decoding for rates R < R_0 is upper-bounded by the inequalities

P(E_t^cpl) ≤ E[N]/⌈L/2⌉ ≤ L^{−s} O_f(1)    (5.63)

where

O_f(1) = (2^s/(f(1 − 2^{R−R_0}))) ((2^R − 1)(1 − f)(2^{1−R} − 1))^{1−s}    (5.64)

If we choose f = 1/2, say, we get

O_{1/2}(1) = ((2^R − 1)(2^{1−R} − 1))^{1−s} (4ε(1−ε))^{log_{2^{1−R}−1} 2} / (1 − 2^{R−R_0}) =def O(1)    (5.65)

and the proof is complete.
For the sphere-packing region, Ro < R < C, we have the following upper bound on the probability of correct path loss:
Lemma 5.15 (Sphere-packing bound) For a list decoder of list size L and the BSC with crossover probability ε, there exist infinite memory, time-invariant, binary convolutional codes of rate R = b/c with systematic and nonsystematic, polynomial generator matrices such that the probability of correct path loss is upper-bounded by the inequality

P(E_t^cpl) ≤ L^{−s} O(log L)    (5.66)

for rates R_0 < R < C, where s, 0 < s < 1, satisfies

R = G(s)/s    (5.67)

G(s) is the Gallager function (4.208), and O(log L)/log L → constant when L → ∞.

Proof. For the sphere-packing region we shall as before exploit the idea of separating the error event E_t^cpl into two disjoint events corresponding to "few" (F) and "many" (M) errors, respectively. Hence, we have (cf. (4.76))

E[P(E_t^cpl)] ≤ E[P(E_t^cpl, F)] + P(M)    (5.68)
Without loss of generality, we assume that the allzero sequence is transmitted. Let r[o,t) _ rorl ... rt_l, where r; = (r,lr,2 ... r«), denote the received sequence of length t c-tuples. We introduce a backward random walk 0, So, Sl, ..., St-1, where f
Se=EZ;, 0
(5.69)
=o
starting with 0 at a node at depth t, So at depth t - 1, and so on until we have St -I at the root. The branch metric Z, is given by C
Zi_
(5.70)
Yii i=1
where
YIi=
a,
if r(t_l_i)i = 0
,B,
otherwise
(5.71)
and (cf. 4.321)
a = log __
(1_E)T
zil_E
p
to b
E
2(1
-R
2 T+s
_E).+Z
(5.72)
- R
The parameter s will be chosen later. We say that those error patterns for which St hits or crosses (from above) a certain barrier u contain "many" errors. Following the same steps as in the derivation given by (4.322) to (4.327), we obtain (see also Corollary B.6)
P(M) = P(min{SP} < u) < 2s"
(5.73)
where 0 < s < so and so is a positive root of the equation
G(so) = soR
(5.74)
Next we upper-bound the probability that the correct path is lost at depth t and that we have an error pattern with "few" errors. Let Nt denote the number of paths at depth t which surpass the correct path. Then, since the correct path is lost at depth t if Nt > L, it follows from the
Markov inequality (Lemma 5.14) that P(Et
p1
E[NN I -F]P(.F)
, ") <
(5.75)
L
Consider an arbitrary subpath which diverges from the correct path at depth (t - £ - 1), 0 < f < t. A necessary condition that this path surpasses the correct one is that the Hamming distance between this path and the received sequence is not more than the number of channel errors i in the last (f + 1) subblocks. In case of the event F, that is, that we have "few" errors, the random walk must satisfy St > u. Thus, assuming nonsystematic generator matrices, we obtain
E[Nr I F]P(.F) <
(f + I)C
Zb(e+1)
i
t=0 iISe>u w=0
x
(5.76)
C(t + 1)c)2-(t+1)c w
For St > u we have 2>z(st u)
>1
(5.77)
where X2 > 0. Combining (5.76) and (5.77) yields t-1
E[Nt I F]P(F) <
+ I)c
Y2b(e+1)
E (1 -
E)(e+1)c-i
i
t=O i Isc>u w=0
x ((t + 1)c)2_(t+1)c2X2(St-u) W
t-1 (8+1)c
i
< 2->wzu
2
b(P+1)
(f + 1)c)
Et (I -
2=0 i=0 w=0 x
(5.78)
(y + W
< 2 )2
J)
t-1 (t+1)c (t+1)c
((Q + 1)C
Y, E 2b(t+1)
e=0 i=0
e (1 -
w=0
x ((t + 1)c)2-(t+1)c2A1(i-w)2A2(a((t+1)c-i)+13i) W
where 1 > 0. From (5.78) it follows that E [Nt I F] P (Y) < 2-;12U Y 2e(t+1)
1+2_;"\V+l)c 1v
2
Jt
(5.79)
f=0
x (62''i+)12 + (I Let 1
1 -6
1 +S log
(5.80)
E
),2=1-s
(5.81)
u = -logL
(5.82)
and
Then we obtain I
2b(e+1) (ET +, + (1 - E) t+s
E[Ni I F]P(.F) < 2(1-s)logL
(e+1)c
I
2(1 - E) ++s
e=o
(1=E\ t+s El
02
(e+1)c
s)R + (1 -
(1
X
E)l+i+s2-(1-s)R
1-s
12Et'+=+z(1-E)t+sl
=
E00 2b(e+1) E -L
(1+s)(e+1)c
t
1
L1-s
+ (1 - E) t+o
2
s(e+1)c2(s-1)R(e+1)c
e=0
=
(5.83)
00 L1-s
T2
R(f+ l)c2 -G(s) ('+')c2(s- 1)R(f+l)c
e=0 00
L1-s E2- (G(s)-sR)(f+l)c e=0
=
00
2-(G(s)-sG(so)/so)(e+l)c
L1-s
e=o
= L1-s
1
2(G(s)-sG(so)/so)c
-1
where G(s) is the Gallager function (4.208) and the last sum converges for all s such that 0 < s < so and so satisfies (5.67). Combining (5.68), (5.75), (5.82), and (5.83) yields 1 E[P(£,Pl)] < Ls 2(G(s)-sG(so)/so)c
=
- 1+2 slogL
(5.84)
1
L-s
1 - 2-(G(s)-sG(so)/so)c
Let us choose
S=so
1
- log L
(5 85) .
Then we obtain
E[P(EEPl)] < L-soO(logL) where
0(log L)
log L is complete.
(5.86)
-+ constant when L -+ oo, and the proof for nonsystematic generator matrices
For systematic, polynomial generator matrices, we replace (5.76) by (cf. (5.29)) e-1
L' [Nt
i
w
I F]P(.F e=0 ` EiISe>u E w=0 Y k=0 X
(f + 1)c l
Et (1 -
E)(e+1)c-i
(5.87)
((t + 1)b'\ ((f + 1)(c - b)'\2-(e+1)(c-b) k
w-k
Using the identity (5.30), we can rewrite (5.87) as (5.76) and then repeat the steps (5.78)-(5.89).
Since
0 (log L) = 20(IoglogL) = L
0 ogL L)
= L0(l)
(5.88)
where o(1) → 0 when L → ∞, it follows that the expurgated bound (5.51) can be written

P(E_t^cpl) ≤ L^{−s+o(1)}    (5.89)

for rates 0 < R < R_0, where s satisfies (5.52). We can now summarize Lemma 5.13 and Lemma 5.15 in the following

Theorem 5.16 For a list decoder of list size L and the BSC with crossover probability ε, there exist infinite memory, time-invariant, binary convolutional codes of rate R = b/c with systematic and nonsystematic, polynomial generator matrices such that the probability of correct path loss is upper-bounded by the inequality

P(E_t^cpl) ≤ L^{−s+o(1)}    (5.90)

where s satisfies

R = G_exp(s)/s,  0 < R < R_0
R = G(s)/s,      R_0 < R < C    (5.91)

G_exp(s) is the expurgation function (4.185), G(s) is the Gallager function (4.208), and o(1) → 0 when L → ∞.

The upper bound in Theorem 5.16 can be rewritten as

P(E_t^cpl) ≤ L^{−s+o(1)} = 2^{−(s+o(1)) log L} = 2^{−(E_C(R)+o(1))(log L)/R}    (5.92)

where E_C(R) is the convolutional coding exponent (4.215). Hence, if we choose the list size L equal to the number of encoder states, assuming that ν = bm, that is,

L = 2^{bm}    (5.93)

then our upper bound on the probability of correct path loss (5.92) coincides with the upper bound on the burst error probability (4.214). For the ensemble of general, nonlinear trellis codes it can be shown that for list decoding the exponent of (5.90) in the sphere-packing region, viz., s = G(s)/R, is, somewhat surprisingly, correct for all rates 0 < R < C [ZiK80]! We conjecture that list decoding of convolutional codes encoded by systematic, polynomial generator matrices is superior to Viterbi decoding of convolutional codes encoded by nonsystematic generator matrices. This conjecture is in fact given strong support by the experiments reported in Section 5.2.

5.5 LOWER BOUND ON THE PROBABILITY OF CORRECT PATH LOSS
As a counterpart to the upper bound on the probability of correct path loss for list decoding when used to communicate over the BSC, we will derive a lower bound. Following the path of reasoning in Section 4.6, we begin by proving the corresponding lower bound for block codes.
Theorem 5.17 Suppose that a block code B of rate r and block length N with list decoding of list size L is used to communicate over the BSC with crossover probability ε. If

r̃ = r − (log L)/N    (5.94)

and

N = E_C^sph(r) log L / (E_B^sph(r̃) r)    (5.95)

where E_C^sph(r) and E_B^sph(r̃) are the convolutional sphere-packing exponent (4.258) and the block sphere-packing exponent (4.256), respectively, then the probability that the correct codeword will not be on the list of L codewords chosen by the list decoder is lower-bounded by

P_L(E^cpl) ≥ L^{−s+o(1)}    (5.96)

where s, 0 < s < ∞, satisfies

r = G(s)/s    (5.97)

G(s) is the Gallager function (4.208), and o(1) → 0 when L → ∞.
Proof. Let P(E°pl i) denote the probability that the transmitted codeword v(`), i = 0,1,...,2 rN - 1, will appear on the list of the L most probable codewords that is produced by the list decoder. For each received sequence r, a list of L codewords is produced. Let D; I
denote the set of received sequences r such that the codeword v(`) is on the list corresponding to r. Then we have P(Ecpl I
i) _
P(r I v(`))
(5.98)
rED;
Clearly, -r)N min{I D; I} < L2 rN = 2(1
(5.99)
P(Ecpl) J max{P(ECpl
5.100)
r
where r is given by (5.94). Let I
i)}
1
Then, by repeating the steps (4.233)-(4.240) we obtain the following parametric lower bound I2(1 -i)N >
N ko-1
P(Ecp) > ( NJ) Eko+1(1 - E)N-kp-l ko+
(5 . 101)
where ko is the largest integer such that (cf. (4.234)) ko-1 k=O
k (N\
< min{ I D, 11
(5.102)
Following the steps in the proof of Theorem 4.24 yields
P(Ecpl) >
2-(Esph(T)+o(1))N
(5.103)
where ESph r is given by (4.256) and o(l) --->. 0 when N -+ no. Inserting (5.95) into (5.103) yields ° = L-s+o(1) P(Ecp1) > 2-(E'(r)/r+o(1))IogL
(5.104)
where s, 0 < s < no, satisfies (5.97). We notice that r as given by (5.94) is related to r by the equation
EC (r)
r
r-r
_
E' (r)
(5.105)
which shows the existence (at least for large N) of r and N satisfying (5.94) and (5.95), respectively. We will now prove the corresponding lower bound on the probability of correct path loss for convolutional codes.
Theorem 5.18 Suppose that a rate R = b/c convolutional code with list decoding of list size L is used to communicate over the BSC with crossover probability ε. Then the probability of correct path loss is lower-bounded by the inequality

P(E^cpl) ≥ L^{−s+o(1)}    (5.106)

where s, 0 < s < ∞, satisfies

R = G(s)/s    (5.107)

and o(1) → 0 when L → ∞.

Proof. The proof is similar to the proof of Lemma 4.25. The theorem states that for any ε > 0 there exists a list size L_ε such that for any L ≥ L_ε we have

P(E^cpl) ≥ L^{−(s+ε)}
(5.108)
Now suppose that inequality (5.108) does not hold. Then as a consequence, a convolutional code and a certain e > 0 exist such that for any large LE there exists a list decoder with list size L > LE such that P(Ecpl) < L-(s+2E)
(5.109)
Construct a block code by truncating this convolutional code so that its rate is equal to the rate R of the convolutional code (no zero termination) and its block length N is given by sph
EC (R) log
Es (r")R L
N
(5.110)
where RlogL
(5.111)
For this block code the probability that the correct codeword will not be on the list is upperbounded by the probability of correct path loss for the convolutional code at depths 1, 2, ... , N/c. Using the union bound, we obtain from (5.109) and (5.110) that PL (Ecp) < N L-(s+2E)
sph = EC (R) log L L-(s+2E) E13sph
C
(r)Rc
(5.112)
Let us choose LE such that
E h(R) log LE
< (LE)E
(5.113)
Es3h(r)Rc Then, for any L > LE we have PL (Ecpl) < L-(s+E)
(5.114)
in contradiction to Theorem 5.17. Hence, we conclude that inequality (5.108) must hold and the proof is complete.

Let us rewrite the lower bound in Theorem 5.18 as

P(E^cpl) ≥ L^{−s+o(1)} = 2^{−(s+o(1)) log L} = 2^{−(E_C^sph(R)+o(1))(log L)/R}    (5.115)

where E_C^sph(R) is the convolutional sphere-packing exponent (4.258). Thus, if we choose

L = 2^{bm}    (5.116)

then our lower bound on the probability of correct path loss coincides with the lower bound on the burst error probability for high rates given in Lemma 4.25. For maximum-likelihood
decoding, we derived a tighter lower bound for low rates (Lemma 4.26). Such a bound does not exist for list decoding.

5.6 CORRECT PATH LOSS FOR TIME-INVARIANT CONVOLUTIONAL CODES

In this section, we use a path weight enumerator and derive upper bounds on the probability of correct path loss for time-invariant convolutional codes which are similar to Viterbi's bounds in Section 4.2. Consider the trellis for a rate R = b/c, time-invariant convolutional code encoded by a generator matrix of memory m. The signal flowchart introduced in Section 3.10 consists of 2^{bm} + 1 states, since the zero state is split into two: a source and a sink state. Let ξ_j(W), j = 1, 2, ..., 2^{bm} − 1, be dummy variables representing the generating functions of the weights of all paths leading from the left zero state to the 2^{bm} − 1 intermediate states, respectively. Assuming W < 1, order these dummy variables in decreasing order; that is,

ξ_1(W) ≥ ξ_2(W) ≥ ... ≥ ξ_{2^{bm}−1}(W)    (5.117)

Notice that since W < 1, a large value of ξ_j(W) corresponds to low weight on the paths that are represented by ξ_j(W). The ℓ-list path weight enumerator is

T_ℓ(W) =def Σ_{j=ℓ+1}^{2^{bm}−1} ξ_j(W)    (5.118)

As before, E_t^cpl denotes the event that the correct path is deleted at the tth step from the list of the L most likely codewords. Then we have the following

Theorem 5.19 For convolutional codes encoded by a generator matrix with ℓ-list path weight enumerator T_ℓ(W) and used to communicate over the BSC with crossover probability ε, the probability of correct path loss for a list decoder of list size L is upper-bounded by

P(E_t^cpl) ≤ min_{0≤ℓ<L} { T_ℓ(W)|_{W=2√(ε(1−ε))} / (L − ℓ) }    (5.119)

Proof. Since a convolutional code is linear, we can without loss of generality assume that the allzero codeword has been transmitted. The states at depth t are ordered according to increasing weights of the best paths leading to these states. If the allzero state (i.e., the state corresponding to the transmitted path) is not among the L best states, a correct path loss has occurred. The probability that a certain path of weight w will be ranked by the list decoder higher than the allzero path is upper-bounded by the Bhattacharyya bound (4.15), viz., (2√(ε(1−ε)))^w. Then, it follows that the average of the number of states ranked higher than the allzero state is upper-bounded by

T_0(W) = Σ_{j=1}^{2^{bm}−1} ξ_j(W)    (5.120)

where

W = 2√(ε(1−ε))    (5.121)

From (5.120) and Lemma 5.14 we obtain the upper bound

P(E_t^cpl) = P(N ≥ L) ≤ T_0(W)|_{W=2√(ε(1−ε))} / L    (5.122)

where N denotes the number of states ranked higher than the allzero state.
The bound (5.122) can be improved if we assume that the ℓ, 0 ≤ ℓ < L, states represented by the ℓ largest values of the dummy variables, viz., ξ_j, 1 ≤ j ≤ ℓ, always remain on the list. This assumption holds when both the crossover probability ε and the list size L are relatively small (i.e., in the situations that are of practical interest). Assuming that these ℓ states will always stay on the list, we expurgate these states and consider only list size L − ℓ. Then the average of the number of remaining states ranked higher than the allzero state is upper-bounded by

T_ℓ(W) = Σ_{j=ℓ+1}^{2^{bm}−1} ξ_j(W)    (5.123)

where W is given by (5.121). Again, we apply Lemma 5.14 and obtain

P(E_t^cpl) = P(N > L − ℓ) ≤ T_ℓ(W)|_{W=2√(ε(1−ε))} / (L − ℓ)    (5.124)

Optimizing over ℓ completes the proof.

EXAMPLE 5.4
The signal flowchart for the rate R = 1/2, memory m = 3, systematic encoding matrix G(D) = (1  1 + D + D³) is given in Fig. 5.12. We obtain the following linear system of equations:
1(W) = 2(W) +W3(W) 42(W) = WS4(W) +45(W)
4(W) = W2 +
(5.125)
5(W) = W25(W)
6(W) =
S7(W) = W46(W) + W247(W)
Solving (5.125) gives:
2(W)
def
';l(W) = W31-3 W2 4W6
W6 4W8+ W10
def
4
W2
2 (W) = W4 1-3WZ-4W47W6 4W8+Wbo = s (W) defo(W) 2-3WZ+3W6-W 3(W) = u/4 4
1-3W -4W +7W -4W 8 +W 10 4 6 - -W6+W 12W2
def
3-2W2
def
t4(W) = W2 1-3W2- 4W +7W -4W +W W5
S5(W) = t6
(W)
=
= °(W) I
(5.126)
1-3W2-4W4+7W6-4W8+W10 =7 (W)
(W)
def -2W2+2W6-W +7W -4W 8+Wio
W3
I -3W2+4W
def tt
4
2
30
s7 (W) = W4 1-3W2-4W41+7W6 4W$+W1o - s6 (W)
where 1=io(W), 1 < i < 7, denotes a reordering in decreasing order for small values of W. For small values of W we obtain from Theorem 5.19 the following upper bounds on the probability of the correct path loss: 7
L = 1 : P(E°PI) <
i (W)
W2
(5.127)
j=1 7
7
L = 2 : P() < min 2 t i (W), 1 i (W) j=1 j=2
ti 2W3
7
7
7
j=1
j=2
j=3
L=3:P(EIP1)<min
(5.128)
W3
(5.129)
Figure 5.12 Signal flowchart for the encoding matrix in Example 5.4.
7
7
L = 4 : P(E°p') < min 4 Y, Co (W), 3
j=3
j=1
J
7
i (W), 4
7
7
(W), Y mw), j=2
YO(W),
Ej (W), 2
(5.131) 7
Yj (W)
2W4
j=5
j=4
j=3
(5.130)
j=4
7
L=5: P(S°pl) < min 5
j (W), j=2
j=1
3
A probability of correct path loss of the order W3 requires L = 2, while a probability of correct path loss of the order W4 requires L = 4.
For the Gaussian channel we have the corresponding bound:

Theorem 5.20 For convolutional codes of rate R encoded by generator matrices with ℓ-list path weight enumerator T_ℓ(W) and used to communicate over the AWGN channel with signal-to-noise ratio E_b/N_0, the probability of correct path loss for a list decoder is upper-bounded by

P(E_t^cpl) ≤ min_{0≤ℓ<L} { T_ℓ(W)|_{W=e^{−R E_b/N_0}} / (L − ℓ) }    (5.132)

Proof. This theorem follows immediately from the proof of Theorem 5.19 if we replace (2√(ε(1−ε)))^w by the corresponding Bhattacharyya bound (Theorem 4.29) for the Gaussian channel, viz., (e^{−R E_b/N_0})^w (see also Theorem 4.7).
5.7 COMMENTS

Elias introduced list decoding for block codes in 1955 in the very same paper where convolutional codes were introduced for the first time [Eli55]. Anderson introduced a list algorithm for source coding in his 1969 M.Sc. thesis [And69]. He called it the "M-algorithm," where M denotes the number of extended paths. Actually, it was conceived in August 1968 during a camping trip in Quebec, Canada. "It was very cold and I tried to forget how cold it was!" Early theoretical work on list decoding was done by Zigangirov and Kolesnik [ZiK80]. Further investigations of the list decoding algorithm (the M-algorithm) were done by Anderson
and his students; see, for example, [Lin86] [And89] [And92]. The bounds on the list weight were derived by Johannesson and Zigangirov, see [JoZ96], which together with [Zi093] contains various bounds of the same type as those presented here. Use of systematic convolutional encoders to solve the correct path loss problem was first reported in [Ost93]. This phenomenon is discussed in depth in [OAJ98].
PROBLEMS 5.1 Consider a BSC with the error sequence e = 0101 10 00 00 .... Decode the received sequence for the following combinations of encoders and decoders: (a) Systematic convolutional encoding matrix G(D) = (1 1 + D + D2 + D4), dfree = 5. List decoder with L = 3. (b) Systematic convolutional encoding matrix G(D) = (1 1 + D + D3 + D4 + D5 + Ds), dfree = 7. List decoder with L = 3. (c) Nonsystematic convolutional encoding matrix G (D) _ (1 + D + D2 1+D 2), dfree = 5. Viterbi algorithm.
(d) Nonsystematic convolutional encoding matrix G(D) = (1+D+D4 1+D2+D3+D4), dfree = 7. Viterbi algorithm.
5.2 Consider a BSC with the error sequence e = 010 1 10 01 00 00 .... Decode the received sequence for the following combinations of encoders and decoders: (a) Systematic convolutional encoding matrix G(D) = (1 1 + D + D2 + D4), dfree = 5. List decoder with L = 3. (b) Systematic convolutional encoding matrix G(D) = (1 1 + D + D2 +D 5 + D6 + D8 +
D10 + D"), dfree = 9. List decoder with L = 3. (c) Nonsystematic convolutional encoding matrix G(D) = (1+D+D2 1+D2),dfree = 5. Viterbi algorithm.
5.3 Consider the rate R = 1/2 convolutional encoding matrix G(D) = (1 + D3 + D4 1 + D4). Suppose that the encoder is used to communicate over a BSC with crossover probability E. Use the list decoder with L = 4 to decode the received sequence r = 11 01 10 10 11 00 10 11 10 0101 11. Eight information bits followed by four dummy zeros have been encoded. 5.4 Consider the binary input, 8-ary output DMC shown in Fig. 4.5 with transition probabilities P(r I v) given by the following table:
r           0_4      0_3      0_2      0_1      1_1      1_2      1_3      1_4
P(r | 0)    0.2196   0.2556   0.2144   0.1521   0.0926   0.0463   0.0167   0.0027
P(r | 1)    0.0027   0.0167   0.0463   0.0926   0.1521   0.2144   0.2556   0.2196
An information sequence is encoded by the encoding matrix G(D) = (1+D+D2 1+ D2). Use a list decoder of list size L to decode r = 0213 1202 0113 111, 1214 1111. Find the minimum value of L for which the output from the list decoder is a maximumlikelihood estimate of the transmitted information sequence. (Use appropriate scaling and rounding of the metrics.) 5.5 Consider the systematic convolutional encoding matrix G(D) = (1 1 + D) with dfree = 3. Find the following parameters: (a) N,,,_ (e) for e = 1, 2. (b) wmin for L = 1, 2, 3. (c) wl;st for L = 1, 2, 3. 5.6 Repeat Problem 5.5 for the systematic convolutional encoding matrix G(D) = (1 1 + D + D2) with dfree = 4.
5.7 Repeat Problem 5.5 (a)-(c) for the nonsystematic convolutional encoding matrix G(D) =
(1+D+D2 (d) Find two systematic convolutional encoding matrices that are equivalent over the first memory length.
5.8 Consider the rate R = 1/2, memory m = 3, nonsystematic encoding matrix G(D) = (1 + D + D² + D³  1 + D² + D³), and assume that it is used together with list decoding with list size L = 3 to communicate over the BSC. Show that P(E_t^cpl) ≤ 1.5 W³, where W = 2√(ε(1−ε)).
Sequential Decoding
In Chapter 4, we saw that Viterbi decoding is maximum-likelihood and that the complexity of Viterbi's algorithm grows exponentially with the overall constraint length v of the encoder. If a certain application requires extremely low error probability, we may have to use an encoder
with such a large overall constraint length, v = 25 say, that the Viterbi decoder would be hopelessly complex. In this chapter, we describe a class of algorithms, called sequential decoding algorithms,
whose complexity is essentially independent of the memory of the encoder. In sequential decoding, only one encoder state is examined at each time instant. By allowing backtracking, sequential decoding approaches asymptotically a maximum-likelihood estimate of the transmitted sequence (with increasing encoder memory). It is an example of a decoding method that (asymptotically) fully exploits the error-correcting capability of the code. We choose a code whose encoder memory is long enough to warrant essentially error-free decoding. Sequential decoding algorithms are almost maximum-likelihood, and they can be easily implemented. When we consider convolutional encoders with overall constraint lengths so large that
Viterbi decoding is impractical, we usually view the encoding process as a walk through a tree instead of through a trellis. The fundamental idea behind sequential decoding is that in the decoding process we should explore only the most promising paths. If a path to a node looks "bad" we can discard all paths stemming from this node without any essential loss in performance from that of a maximum-likelihood decoder. We begin by introducing a quality measure that could guide a decoder in its search for the most promising path to explore. This will lead us naturally to the stack algorithm, which is the simplest one to describe and analyze. After discussing the stack algorithm, we will be well prepared for the more intricate Fano algorithm. Then we discuss Creeper which combines the best properties of the other two algorithms. For all three algorithms we analyze their computational performances and obtain upper bounds on the decoding error probabilities. 6.1 THE FANO METRIC
Consider the tree diagram for a rate R = 1/2, binary tree code shown in Fig. 6.1. Starting at the root at time 0, we traverse the tree from left to right and choose the upper branch when the corresponding information symbol is 0 and the lower branch when it is 1. Thus the information sequence 101 ... is encoded as the code sequence 010001... . 269
Chapter 6
270
Sequential Decoding
Code sequence
Figure 6.1 A rate R = 1/2 tree code.
First we will consider "hard decisions," and then we will extend our results to "soft decisions" Suppose that we receive the sequence r = 00 1001 ... over a BSC and that we in the decoding process in some unexplained way have obtained the partially explored tree shown in Fig. 6.2.
Figure 6.2 Partially explored tree.
Regardless which path the decoder eventually will choose, it must pass through exactly one of the leaves in this tree. It is tempting to ask: `Which of these four paths is the most promising to extend?' In the Viterbi algorithm we always compare paths of the same length and prefer the one closest to the received sequence-in, for example, Hamming distance. Here the situation is more subtle. Should we, for instance, prefer 0) with one erroneous symbol of two, instead of v(2) with two erroneous symbols of six? It depends on the channel-if the crossover probability is very small, then v10) is more promising than v(2), but if it is very large, then we should extend v(2)! We will now analyze the situation in more detail. Let us assume that the tree code shown in Fig. 6.1 belongs to the ensemble of binary tree codes in which each digit on each branch is chosen independently with probability 1/2. All information bits are also chosen independently with equal probability. Let X denote
a particular configuration of all symbols in the partially explored tree, and let Hi denote the hypothesis that the transmitted path corresponds to the information sequence u (0, i = 0, 1, ... . The noise symbols are independent and 1 and 0 occur with probability E and 1 - E, respectively, where E is the crossover probability of the BSC. Let ni denote the length of the path vli) in c-tuples, and let di = dH (r10,,,,), vii)) denote the Hamming distance between the path vii) and the corresponding part of the received sequence,
r[o,n,), i = 0, 1, .... Suppose that Hj maximizes P (Hi I X, r), i = 0, 1, .... Then the natural choice for the most promising path to extend is V>>! Using Bayes' rule, we have
P(Hi I X, r) =
P(X I Hi, r)P(Hi I r) P(X I r)
(6.1)
Section 6.1
The Fano Metric
271
where
P(X I Hi, r) =
P(r
I X, Hi)P(X I Hi) P(r I Hi)
(6.2)
and, since the hypothesis Hi and the received sequence r are independent if we do not know the configuration X,
P(Hi I r) = P(Hi) =
2-n;Rc
(6.3)
Without additional knowledge, neither the received sequence r nor the configuration X is dependent on the hypotheses. Hence, we have
P (r I Hi) = P (r)
(6.4)
P(X I Hi) = P(X)
(6.5)
and
Inserting (6.4) and (6.5) into (6.2) and then inserting (6.2) and (6.3) into (6.1) and multiplying by the factor P (r) P (X I r)/P(X), we see that we can equivalently maximize E)n,c-d'd2-(n-n,)c2-n;Rc P(r I X, H,)P(H,) = (I -R)njc-dj (E21-R)dd = 2-nc ((1 -6)21
( 6 . 6)
where n > maxi {nj } is the length of the received sequence in c-tuples. Again, equivalently, in the case of hard decisions we can maximize def
ILF(r[o,n,), v[o n,))
(nic - di)(log(l - E) + 1 - R) + di (loge + 1 - R)
(6.7)
i = 0, 1, ... , which quantity is usually called the hard-decision Fano metric (although it is not a metric in the mathematical sense). The most promising path to extend is the one whose Fano metric 1tF(r[o,n;), v[0 n,)) is the largest! In general, the Fano metric for the path v[o,t) is t-1
AF(r[o,t), v[o,t)) = Y liF(rk, vk)
(6.8)
k=0
where ,F(rk, Vk) is the Fano branch metric. For notational convenience we sometimes use
a
def
R def
log(1 - E) + I - R (6.9)
log 6
+1-R
Then the Fano branch metric can be expressed as
ttF(rk, vk) = ja + (C - j),R
(6.10)
where c - j = dH(rk, vk). It has the distribution
P(iiF(rk, Vk) = ja + (c - j)$) = (c)(1
-
E)JE`-j
(6.11)
G)
if the branch is on the correct (transmitted) path, and, for the ensemble of random tree codes, the distribution
P(µp(rk, Vk) = ja + (c - j)fl) = (c)2 -c J
if the branch is on an incorrect (erroneous) path.
(6.12)
272
Chapter 6
Sequential Decoding
From (6.11) it follows that along the correct path we have
E'[,IF(rk, Vk)] _ ((1 - E)a + EMC
_ ((1 - E)(log(1 - E) + 1 - R) +E(1ogE + 1 - R))c
(6.13)
_ (1 - h(E) - R)c = (C - R)c where h(E) is the binary entropy function and C = 1 - h(E) is the channel capacity for the BSC.
If R < C, then on the average we have an increase in the Fano metric along the correct path.
Along an incorrect path we have
(1
1 R1
E[N-F(rk, vk)] = 2a + 2~ J c
_ ((log(1 - ) + 1 - R) + I(logE+1-R))c
(6.14)
=(1-R+log E(1-E))c<-Rc with equality if and only if c = 1/2. On the average, the Fano metric will always decrease along an incorrect path. The Fano symbol metric is (t)
def
14 = where a and
(e)
def
V")) _ {
a,
if rk') = Vkf)
N,
if Yk')
(6.15)
vkt
are given by (6.9).
EXAMPLE 6.1 Consider a rate R = 1/2 convolutional code used to communicate over a BSC with crossover probability c = 0.045. Then we have the following Fano symbol metric: (e)
(e)
l.LF(rk , vk
)_
I a = log 0.955 + 1 - 1/2 = 0.434,
P=log0.045+1-1/2=-3.974,
if r,(') =
vke)
ifrke)Ovke)
(6.16)
For simplicity, we scale and round the metrics and obtain:
µr
v F(ke) > ke)) =
+0.5,
no error
-4.5,
error
(6.17)
for the BSC with E = 0.045.
Remark.
We can scale the Fano metrics with any positive factor y. If we choose y = (1 + s)/s
(6.18)
R = G(s)/s
(6.19)
where s satisfies
and G (s) is the Gallager function (4.208), then we obtain the metric (4.321) used to prove the error bound for time-varying convolutional codes for the sphere-packing region (Theorem 4.32).
Section 6.1
The Fano Metric
273
We now extend the Fano metrics to the situation when soft demodulator outputs are available. The derivation given above is valid up to equation (6.6), which should be replaced by ni-1
n
P(r I X, HOP(Hi) = fl P(rj I vj`)) 11 j=0
P(rk)2-niRc
k=n;
(6.20)
P(rj I
= F7 P(rk) j=0
k=o
yj`)) 2-niRc
P(rj)
where ni is the length of the codeword v(i) = v(i)vl`) ... vnW- 1 and n > maxj (n j) is the length 0 of the received sequence. Again we take the logarithm and, equivalently, maximize n.-1
def
W
Clog
AF(r[o,ni), V[o n;)) _
P(r1 I P(r1)
- Rc)
/
j=0 ni-1
(6.21)
P(r(f) I v(i)(1))
c
E E log j=0 f=I (
R
vj(i) = vji)(1)vji)(z)
i = 0, 1..... The terms in the and first sum are the Fano branch metrics, and the terms in the double sum are the Fano symbol metrics. where rj =
EXAMPLE 6.2 Consider a rate R = 1/2 convolutional code used to communicate over a binary input, 8-ary output DMC with equiprobable inputs and transition probabilities P(r I v) given by the table: r
0
v
1
04
03
02
01
11
12
13
14
0.434 0.002
0 197 0.008
0. 167 0.023
0.111 0.058
0.058 0.111
0. 023 0.167
0 .008 0.197
0 .002 0.434
.
Then we have the following Fano symbol metrics: r 04
03
02
0
0.49
0.44
0.31
1
-7.27
-4.18
-2.55
01
11
12
13
14
-0.11 -1.04
-1.04 -0.11
-2.55
-4.18
-7.27
0.31
0.44
0.49
For simplicity we scale and round to integers and obtain: r 04
v
02
03
01
11
12
-18
-46
-75
-131
6
8
9
0
9
8
6
-2
1
-131
-75
-46
-18
-2
13
14
274
Sequential Decoding
Chapter 6
6.2 THE STACK ALGORITHM The discussion in the previous section leads us naturally to the obvious stack algorithm, in which we store the Fano metrics for all paths leading to a terminal node and then extend to next depth the path with the greatest Fano metric. The adjective "stack" is used to indicate that the paths in the partially explored tree are ordered by the Fano metrics. The best path is placed at the top of the stack; then we have the second best path and so on. Remark. Although it is an abuse of notations, we have chosen to follow a deeply rooted tradition in using the word "stack" instead of the proper word "list." When we use a rate R = b/c convolutional encoder of memory m to communicate a finite number of information bits, fb say, we append a tail of mb dummy zeros to the information bits in order to terminate the convolutional code into a block code. If the string of dummy zeros is shorter than mb we do not fully exploit the error-correcting capability of the code to protect the last information symbols.
Algorithm S (Stack) S1. Load the stack with the root and the metric zero. S2. Remove the top node and place its successors in the stack according to their metrics.
S3. If the top path leads to the end of the tree, then stop and choose the top path to be the decoded codeword; otherwise go to S2. EXAMPLE 6.3 Consider the rate R = 1/2 binary convolutional encoding matrix G(D) = (1 + D + D2 1 + D2) with memory m = 2. We encode f = 3 information symbols and m = 2 dummy zeros. Assuming that the code is used together with a BSC with c = 0.045, we use the Fano symbol metrics given in Example 6.1, viz.,
+0.5,
-4.5,
no error error
(6.22)
The received sequence r = 01 10 01 10 11 is decoded by the stack algorithm as follows (the stack entries are the paths u(') and the corresponding Fano metrics): Step
stack contents
0 0
1
2
3
4
0, -4
11-4 00,_8 01, -8
10, -3
100, -7
00, -8 01, -8
101,-7 00, -8
1,-4
01,-8 1],-13
11, -13
Step
stack contents
5
6
100,-7 00, -8
1010, -6
7
'
10100,-5
00, -8 - 00, -8
01,-8 - 01,-8
1000,-11 - 1000,-11
11,-13 - 11,-13
01,-8 1000,-11
11,-13
The partially explored tree is shown in Fig. 6.3 together with the Fano metrics. The stack algorithm's estimate is the path 10100 or the information sequence 101.
Section 6.2
The Stack Algorithm
275 r =
01
01
10
10
11
Figure 6.3 Partially explored binary tree-hard decisions.
EXAMPLE 6.4 The same code as that used in Example 6.3 is used to communicate over a binary input, 8-ary output DMC with equiprobable inputs and transition probabilities P(r v) given in Example 6.2 where the following Fano metrics per symbol were calculated: I
04
V
03
02
13
14
01
11
12
-18
-46
-75
-131
6
8
9
0
9
8
6
-2
1
-131
-75
-46
-18
-2
The soft demodulator outputs r = 0112 1302 0413 1103 1211 are decoded by the stack algorithm as follows: Step
stack contents
0
2
1
0, -12 1,_48
0
N
3
0, -48
10, +2
01-48 ,_133
101, -64 100,-121
I1, -133 Step
stack contents
4
6
5
101, -64 - 1010,-58 - 10100,-54
01,-86 - 01,-86 - 01,-86
00, -117 - 00, -117 - 00, -117
100,-121 -
=
100,-121 - 100,-121
l1,-133 - 1l,-133 - l 1, -133
The partially explored tree is shown in Fig. 6.4 together with the Fano metrics per symbol. The stack algorithm's estimate is the path 10100 or the information sequence 101.
r = 0112
0413
1302
-117
Figure 6.4 Partially explored binary tree-soft decisions.
1103
276
Chapter 6
Sequential Decoding
6.3 THE FANO ALGORITHM
The Fano algorithm is an extremely clever algorithm for sequential decoding. It generally visits more nodes than the stack algorithm, but since it requires essentially no memory it is more suitable for hardware implementations. In practice, this more than compensates for the additional computations needed to decode a given sequence. The Fano decoder moves from a certain (current) node either to its predecessor or to one of its immediate successors-it never jumps. The flowchart of the Fano algorithm is shown in Fig. 6.5, where µtook and µpre are the Fano metrics of the successor node which we look at and the predecessor node, respectively. If the decoder looks backward from the root, we assume
that the "predecessor" of the root has a metric value of -oo.
Figure 6.5 Flowchart of the Fano algorithm.
The decoder can visit a node only if its Fano metric AF is larger than or equal to the cur-
rent value of a certain threshold T which takes on only discrete values ... , -2z, -A, 0, 0, 20, ... , where 0 is the stepsize. If the decoder visits a new node then the threshold should be increased by the largest multiple of 0 such that the new threshold T does not exceed the current metric, i.e., such that T < µcur < T + A holds. If it explores the immediate predecessor of the current node and the predecessors metric is lower than the threshold then the threshold should be lowered by A.
If the successor's metric falls below the threshold, then the Fano decoder first moves backward and then tries to move forward to its next best successor. If this fails, another move
Section 6.3
277
The Fano Algorithm
backward is necessary, and so on. Eventually, all paths with metrics above the threshold T have been systematically visited. When the Fano decoder reaches the node where the threshold T was increased the previous time, the Fano decoder lowers T by A and tries to move forward again, now with a lower threshold. This means that when a node is revisited, T + A < µ,u, holds. Eventually, the Fano decoder will reach the end of the tree and complete the decoding procedure. "First visit?" can be detected using Acur and µpre as follows. If µpre < T + A holds, the previous node npre cannot have been visited with a threshold higher than T. Hence, the previous visit of npre was a "first visit" and, since ncur is a successor of np1e, ncur is also visited for the first time.
If Ape > T + A and acur < T + A hold, then it follows that ncur cannot have been visited before, but npre has. Since µCur < T + A, the threshold is already tight. If µpre > T + A and µcur > T + A hold, both npLe and ncur have been visited before and the threshold should not be increased. Hence, the condition for the threshold to be increased can be written:
1 µpreT+A
(6.23)
where the first inequality in (6.23) indicates a first visit and the second inequality in (6.23) indicates that the threshold is not already tight. We conclude this section by giving the Fano algorithm in pseudo-code: Start; Init;
Look forward to best node;
while (alook < T)
\\ see note 1.
{
T 4-T-O
} Move forward; while (not end of tree)
{
\\ see note 2.
if (First visit) { Tighten threshold; }
Look forward to best node;
while (Nµbok < T)
if
(//-pre > T)
\\ see note 3.
{ {
repeat
\\ see note 5. Move back; (From worst node) & (µpre > T) ); until (not \\ see note 7. if (not (From worst node)) { Look forward to next best node; \\ see note 6. } else { (
)
T <-T-O; Look forward to best node; }
} else {
T -T - A; Look forward to best node; } }
Move forward; }
Stop;
\\ see note 4.
278
Chapter 6
Sequential Decoding
Notes:
1. If the threshold is too high, then it must be lowered. 2. This is the main loop. As long as we do not encounter any channel errors, we will execute this loop. 3. The metric is decreased on the chosen path. 4. We have moved forward and tightened the threshold, so we cannot move backward. Since we cannot proceed forward either, we have to lower the threshold in order to continue. 5. After having examined both successor nodes, we have returned to the present node. We will continue backward as long as we are coming from the worst node and are staying above the threshold. 6. As in 5 but here the threshold is too high. We have to decrease the threshold before we can proceed forward. 7. We have moved backward and are looking forward to the next best node that we have not examined.
6.4 THE CREEPER ALGORITHM* The stack algorithm is simple to describe and analyze. It is very attractive from a pedagogical point of view. However, any practical implementation includes accesses to the large stack (external) memory. These accesses will limit the clock frequency. In the Fano algorithm, we can eliminate these external memory accesses and thus execute the algorithm at a higher clock frequency. On the other hand, the Fano algorithm will visit more nodes; this drawback does not override the strong implementational advantages. In this section we will describe an algorithm that is a compromise between the stack and Fano algorithms. It visits more nodes than the stack algorithm but fewer than the Fano algorithm, while it can be implemented without the need of an external memory that slows down the clock frequency. We call this sequential decoding algorithm Creeper since it explores the code tree in a way that resembles the behavior of a creeping plant. Consider the partially explored tree in Fig. 6.6. Let ncur denote the current node with metric µcur, and let its two successors be nx and ny with metrics µx and µy, respectively. Also, assume without loss of generality that ,a > µy holds. We call the path leading to the current node the stem of the partially explored tree. The subtrees stemming from the stem are denoted TI, T ..... When we have computed (for the first time) the metrics of the successor nodes of a given node, we say that these successor nodes are examined but not visited. Let µt, µ2, ... , i17 be the metrics of the best, that is, largest metric, examined but not visited nodes in the subtrees TI, T2, ... , T7, respectively. In Fig. 6.7 we let only the root of subtree Ti, viz., nr,, together with the largest metric of an examined node in that subtree, pi, represent subtree Ti. Assume that the metric of the current node ncur is the largest among the metrics of all examined but not visited nodes in the subtrees, that is,
I-tcur ? max{µt, µ2, ... , /17}
(6.24)
Furthermore, assume that the metric /µ3 is the second largest. The stack algorithm would remove the current top node of the stack, ncur, calculate the metrics of its two successor nodes, nx and ny, and compare these metrics with µ3, the metric of n3, the new top node of the stack. If /ix > µ3 > µy, then the stack algorithm would proceed to node nx, which would be the next top node; n3 would again be the node with the second largest metric on the stack. If A3 > µx > µy, then node n3 would remain as the top node of the stack. In the next step, the stack algorithm would remove the top node (n3) from stack and so on.
Section 6.4
The Creeper Algorithm*
279
Figure 6.6 A partially explored tree.
nr2, µ2
Figure 6.7 The stem corresponding to the partially explored tree in Fig. 6.6.
nTI, Jul
Creeper has only stored (some of) the nodes shown in Fig. 6.7. Hence, it cannot move directly to n3 in the subtree T3. Instead, since 3 is the largest metric, Creeper will move to n 3 and with n as the new current node work its way to node n3. When n 3 is made the current node, the stem is pruned at node n3, which is stored together with the metric value 1 3
280
Chapter 6
Sequential Decoding
representing the metrics of all examined nodes stemming from n3, viz.,
µ3 = max{µa, /15, µ6, h7, µ.z}
(6.25)
and the decoding process is continued. (In general, Creeper stores only a subset of the nodes along the stem. Moreover, only a subset of these is considered important enough to warrant the storage of the associated it-value.) If we, as in the stack algorithm, always visit the nodes with the largest metrics, we will jump to nT3 also when µ3 is only slightly larger than it,. This causes a substantial increase in the number of computations. By introducing a threshold T, we can restrict the algorithm to return to only those nodes that are sufficiently promising, that is, with metrics above the threshold T. These nodes are called buds and are stored on a node stack. The nodes on the node stack are referred to by the node stack pointer NP. The top node
is denoted n(NP), the second n(NP - 1), and so on. The node n(NP - 1) corresponds to the closest bud. Some of the nodes on the node stack are considered so good that their metrics should affect the threshold, viz., increase it. These nodes are called T-nodes, and to each T -node there is a metric value stored on the threshold stack. This metric value, denoted it,, is the metric of the best examined but not visited node in the subtree stemming from that node the previous time that subtree was explored. This value is a measure of what we expect to find if we later should (re-)visit that subtree. For each node on the node stack there is also a flag, F, denoted F(NP), that indicates whether or not the corresponding node is a T-node. The elements on the threshold stack are referred to by the threshold stack pointer T P, and the elements on the stack are denoted µT (T P), it, (T P - 1), and so on. We also denote the subtree stemming from the T-node corresponding to the value it, (T P) by i (T P), and so on. The current threshold T is computed as Q (µT (T P - 1) ), where
Q(x)=IAXI A
(6.26)
is a quantization function that converts x into the largest multiple of a positive parameter 0 not
larger than x. By convention, we let Q(-oo) = -oo. The threshold for the current subtree is constructed with the help of the metric of the best examined but not visited node in r (T P - 1). Creeper must remember the maximum metric, µmax, of all nodes that have been examined so far. If it finds a node (nx) with a metric larger than Amax and also ny has a metric above T, then these nodes are stored as T-nodes and the new threshold will be computed as T = Q(µy).
Thus, the threshold is raised only when it is likely that the decoder is following the correct path, that is, when a new lµmax-value is found. It is less likely that a threshold raise occurs while examining an incorrect path. All non-T-nodes in r (T P) that are stacked have metrics it satisfying T < It < A., but these non-T-node buds are not necessarily stacked according to increasing metrics. The threshold T will determine the further actions of the algorithm. Creeper will continue to explore the tree forward to n,, if it, > T. If µ y > T, n y should also be stored on the node stack. If it, < T, the decoder cannot move forward and will leave the current subtree for the subtree stemming from n(NP - 1). Depending on whether or not that node is a T-node, different actions must be taken. If it is a T-node, the decoder must decide whether or not to keep the T-node on the threshold stack. If it is a non-T-node, a backtracking move is made in order to systematically examine all paths above the threshold. We are now well prepared to give a formal description of Creeper. For simplicity we consider only rate R = 11c convolutional codes. The Creeper algorithm for rate R = b/c convolutional codes is given at the end of this section.
Section 6.4
281
The Creeper Algorithm*
Algorithm C (Creeper rate R =1/c)
C1. Store the root node, n(0), and a dummy sibling, n(-1), on the node stack as T-
nodes, i.e., F(0) = F(-1) <- 1. Set µr(-1) F- -oo on the threshold stack. Set µm
-oo, T P = NP
0, and ncur 4- root. See Fig. 6.8.
> t) and the threshold T = C2. Compute the successor metrics it., t t, Q(.t (T P - 1)). Ties are resolved by choosing n, among the successors in an arbitrary way. C3. Perform one of six actions depending on the corresponding conditions. The actions are specified in Fig. 6.9.
C4. If we have reached the end of the tree, stop, and output the current path as the estimated information sequence; otherwise go to C2. A40) _ ? n(O) = ncur = root
Figure 6.8 The stack looks like this when Creeper starts. The purpose of the dummy node,
or rather its associated threshold µr (-1) = -oo, is to guarantee that Rule One is the very first rule
n( -1) = dummy
to be applied. Thus, in the sequel none of the
µr(-1)=-0o
root's successors will be removed.
The exchange ( , )-operation in Fig. 6.9 simply changes places of the two elements on the top of the appropriate stack. Stacking at most two (2b in general) elements at each depth makes the hardware memory requirements roughly proportional to the code tree depth. Note that the initial conditions guarantee that at the very first iteration Rule One always will apply. It is easy to see that always exactly one of the six conditions above will apply. If the received sequence is error-free, the condition one will always hold, thus, making the decoder stack all nodes as T-nodes. In Figs. 6.10-6.17 we illustrate how Creeper explores the tree and how the stacks are affected by the different rules. Note that in these figures T P and N P have not been updated in order to make it easier to follow the actions.
Consider the rate R = 1/2 binary convolutional encoding matrix G(D) = (1 + D + 1 + D2) with memory m = 2. Assuming the encoder is used together with a BSC with crossover probability e = 0.045, we use the Fano symbol metrics D2
I
µF
a = +0.5,
- jl $ = -4.5,
no error error
(6.27)
Since we have two symbols on each branch, the Fano symbol metrics in (6.27) correspond to the Fano branch metrics +1, -4, and -9, when we have 0,1, and 2 assumed errors per branch, respectively. Let r = (01 01 00 10 00 ...) be the received sequence. The code tree and its node metrics are given in Fig. 6.18. Finally, we assume the threshold spacing A = 3. In Fig. 6.19(a-h) we show the first eight steps of Creeper's travel through the code tree and how the variables involved change. In case of a tie between the successor metrics, we
assume that the zero path is always chosen first. Big black circles are T-nodes, big white circles are non-T-nodes, and the small black circles correspond to nodes that have been visited but that are not stored on the node stack. The node currently being visited is the node from which the two dotted arrows originate. These arrows indicate that the nodes pointed to are currently being examined. The dotted lines
282
Chapter 6 Rule
Condition T < Ay
One
Sequential Decoding
Decoder actions
F nx n(NP + 1) ny n(NP + 2) F nx
and Ax > Amax
Ar(TP+1)
tty
A.r(TP+2)-00 F(NP + 1) - 1 F(NP + 2) E- 1 NP NP+2
TP -TP+2 Amax F Ax
T < Ay
Two
ncur <-- nx
and
n(NP + 1) E-- ny
Ax <_ Amax
n(NP+2) Fnx F(NP + 1) - 0 F(NP + 2) E-0 NP NP+2
Ay < T < Ax
Three
ncur F- nx
Ar(TP) F max{Ay, A,(TP)} Amax <- max{Ax, Amax}
n(NP - 1) A,(TP)
Ax < T
Four
and
F(NP - 1) = 1 and max{Ax, Ar(TP)}
> Q(AV(TP - 3)) Ax < T and
Five
F(NP - 1) = 1
(T
exchange (Ar(T P), Ar(T P - 1)) exchange (n(NP), n(NP - 1))
exchange (F(NP), F(NP - 1)) /.6r(TP) F -00
n,ar - n(NP - 1) Ar(TP - 2) F-- max{Ax,
A,(TP), Ar(TP - 2)}
NP F- NP - 2
and max{Ax,Ar(TP)}
TP-2
TP
< Q(Ar(TP - 3)) cur F- n(NP -
Ax < T and
Six
F(NP-1)=0
1)
Ar (T P) <- max{Ax, Ar (T P)}
NP SNP-2
Figure 6.9 Six rules that specify the behavior of Creeper for rate R = 1/c.
p,(TP - 2)
n(NP - 4)
&T(TP)
n(NP - 2) n(NP)
nx
Figure 6.10 The situation could look like this before one of the rules One, Two, Three or Six is ap-
plied. Thenodesn(NP-2), n(NP-3), n(NP4) and n (N P - 5) are T-nodes and marked , the
n(NP - 5) Pr(TP - 3) Pr(TP - 2) n(NP - 4)
n(NP - 3) n(NP - 1)
ny
P7(TP - 1)
Pr(TP)
n(NP - 2) n(NP)
nodes n(NP - 1) and n(NP) are non-T-nodes and marked 0, the nodes marked . are not stored on the stack.
P7(TP+ 2) _ -oo n(NP + 2) = ncur
n(NP - 5)
n(NP - 3) n(NP - 1)
n(NP+ 1) = ny
P,r(TP - 3)
/1r(TP - 1)
P7(TP+ 1) = Py
Figure 6.11 Rule One: Move forward and stack both successors as T-nodes. Two elements, corresponding to the two new T-nodes, are stacked on the threshold stack.
Section 6.4
The Creeper Algorithm*
283
p,(TP - 2) n(NP - 4)
Figure 6.12 Rule Two: Move forward and stack both successors as non-T-nodes.
n(NP - 5) p,r(TP - 3) p,(TP - 2)
Figure 6.13 Rule Three: Move forward and update AT(TP) with max{Ay, Ar(T P)}. Since ny has been examined but will not be visited this time, r(TP) is visited and consequently A, (T) will be updated using Ay.
delete the two top elements on the node stack. Update it, (T P) with max{Ax, Ar (T P)}.
Figure 6.15 The situation could look like this before one of the rules Four or Five is applied. The metric value max{A,r, Ar (T P)} of the best examined but not visited node in r (T P) is compared
to Q(Ar (T P - 3)).
node on the stack and exchange the two top elements of the node and threshold stacks.
n(NP - 3) n(NP - 1) iir(TP - 1)
1 n(NP - 5) p,(TP - 3)
n(NP - 3) n(NP - 1) p,(TP - 1)
recur
o
µ7(TP)
n(NP - 4)
n(NP - 2)
n(NP - 5) p,(TP - 3)
n(NP - 3)
p,(TP - 2) n(NP - 2)
µr(TP) n(NP)
n(NP - 3)
n(NP - 1) pr(TP - 1)
recur
M,(TP - 1)
nr
recur
A
pT(TP - 3)
n(NP - 2)
pr(TP - 1) n(NP - 1)
n(NP - 3)
n(NP) = recur
Ar(TP - 3)
pT(TP) _ -co
pr(TP - 2) n(NP - 2)
Figure 6.17 Rule Five: Move to the closest (T)node on the stack and delete the two top elements on the node and threshold stacks.
n(NP + 1) = ny
1i (TP)
n(NP - 2) n(NP)
pT(TP - 2)
Figure 6.16 Rule Four: Move to closest (T)-
n(NP + 2) = recur
n(NP - 4)
s, (TP - 2)
Figure 6.14 Rule Six: Move sideways-backward to the closest (non-T-)node on the node stack and
pT(TP)
n(NP - 2) n(NP)
n(NP - 3) pr(TP - 3)
recur
ny
284
Chapter 6 r =
01
00
01
Sequential Decoding
00
10 00
-8
00 11
10 11
00
01
-4
11
11
10
00 00
0
-8
01 01 10
0
00
-13
11
11 11
10
00
10
01
-11
-4
-7 01
11
01
00
-3
01 10
-7
10
11
- 16
01 -a -10
6
10
-10
Figure 6.18 The part of the code tree with its node metrics that is used in the example.
correspond to the subtrees in r(TP) that have been examined or visited and that have been discarded. The metrics at the end of these lines are the metrics of the nodes at the end of these paths, that is, nodes that have been examined but not visited. Evidently, these metrics lie below the current threshold. For each graphic in Fig. 6.19(a-h), we describe the situation and state which condition will hold for that situation.
(a) The initial condition guarantees that Condition One holds, leading to stacking of the successors as T-nodes and updating lµmax to -4.
(b) The threshold T is given by Q(-4) = -6, so both successors lie below T. Thus, it, (2) is updated to
The closest node on the stack is aT-node, andmax{-8, -oo}
is not less than Q(µr(-1)) = -oo, so Condition Four holds. (c) Only one successor node lies above the threshold, so Condition Three holds, making the other node update µr (2). Also, µmaX is updated to -3. (d) Both successors lie above T, but none exceeds /µmaX, so Condition Two holds.
(e) Both successors lie below T, and the closest node on the stack is a non-T-node. Thus, Condition Six holds, so p (2) is updated to -11. (f) Only one successor node lies above the threshold, so Condition Three holds, making the other node update it, (2), which will not change.
(g) Both successors lie below T and the closest node is a T-node, and max{-10, -11 } is not less than Q(µ(-1)) = -oo, so Condition Four holds. (h) We have the same situation as in (b) except that we have a threshold that is lowered by an amount of 2A compared to (b), and a new Amax value.
From Fig. 6.19(g) it is evident that all nodes in r(2) with metrics above T = -9 have been visited (metrics -4, -3, -7, -7, -6) and that no node with metric below T has been visited. Furthermore, the best metric of all nodes that have been examined but not visited
Section 6.4
The Creeper Algorithm*
0
(a)
.;
285
-4
(b)
-4
-4
1) _ -oo
µT(
µT(1) = -4
T= -oo
T= -6 T= -9
T= -9 µT(1)=-8 -4
o
i-13
(c)
-4
µT(1) = -8 0
+--3
-7
µT(2) = -13
T= -9
-11
µTM = -8
-11
T= -9 -11
0
(f)
(e)
-4 -3 -
-7
T= -9
µ (1) = -8
-4 -13
-4 -13
16
µT(2) = -13
0
-7
-4 -3
µT(2) = -oo
-4 -3
-4 -13
(d)
µT(2) = -11
-11 µT(2) = -oo
-11 16,e-10
0
-4 8
(h)
(g)
-10 -4 -3 -7 µT(2) = -11
-6
4 -8
-4 µT(1) = -10
T= -12
Figure 6.19 The first eight steps of Creeper when A = 3, and the received sequence is (01 01 00 10 00 ...). The node metrics are shown next to the corresponding nodes.
(metrics -13, -11, -11, -16, -10, -10) is remembered (µ7 (1) in Fig. 6.19(h) after the exchange of stack elements). The example shows that Creeper, contrary to the Fano algorithm, does not always increase
(tighten) the threshold when a node is visited for the first time. Instead, Creeper continues in the current subtree as long as the best known alternative is not better; the threshold is constructed using the metric of the best known alternative as T = Q(µr (T P - 1)). When a new largest (so far) metric, µm, has been found, a new T-node is stacked (Rule One) and a threshold increase occurs. Usually, the threshold increases occur along the correct path; along the incorrect paths the threshold is usually kept constant, allowing Creeper to exhaust most of the incorrect subpaths to be explored in one forward attempt. In the incorrect subtree the Fano algorithm moves forward, tightens the threshold, backs up, lowers the threshold, continues forward, and so on, visiting the same incorrect nodes several times. After the threshold T has been decreased (as a consequence of Rules Four or Five), Creeper will always visit at least one node that it had not visited before. We conclude this section by briefly discussing an extension of Creeper to rate R = b/c convolutional codes. In this case, any number of the 26 successor nodes to the current node could be suitable for further investigation, that is, have a metric above the current threshold. At each depth a variable number of nodes could be stacked, not just zero or two as in the original
Creeper for R = 1 /c.
286
Chapter 6
Sequential Decoding
Stems from earlier levels are not allowed to be visited until all possible paths stemming from nodes at larger depths are exhausted. Therefore, we have to store the number of promising
nodes at each depth along with these nodes, in order to keep track of remaining candidate stems. This number is decreased as paths leading into "bad" parts of the tree are discarded using a certain threshold, but could be increased when the parent node is revisited with a lower threshold, which may allow more successor nodes to be promising than at the previous visits. Since we have to control a varying number of stacked nodes and corresponding µr -values at a certain depth, we introduce two stacks similar to those in the rate R = 1 /c version, viz., a node object stack and a threshold object stack. An object (of any of the two kinds) now holds several elements, all associated with sibling nodes at the same depth. The node object stack
pointer, NP, is used as reference to the objects on the node object stack, and the threshold object stack pointer, T P, as reference to objects on the threshold stack. A node object contains: N, denoting the number of promising sibling nodes at that depth; the N sibling nodes denoted n1, n2, ... , nN; and a flag F, denoting whether the N nodes are T-nodes (F = 1) or not (F = 0). The nodes are initially sorted such that n, has a metric not less than that of n,+1
A node object is denoted N(NP), N(NP - 1), and so on, depending on the relative position of the object from the top of the node object stack. A threshold object contains N metric values, µ.1r, µ7, ... , Ar . where N is the number of corresponding T-nodes. These metrics are kept sorted so that the relations µ`r > µ`7+1, i =
1,2,...,N-1, hold. Each such value is the metric of a node that has been examined but not yet visited in the subtree stemming from the corresponding node in an appropriate node object on the node object stack. A threshold object is denoted µ(T P), µ(T P -1), and so forth, depending on the relative position of the object from the top of the threshold object stack. A specific value or node within a particular object is denoted N(NP).N (the number of promising siblings in the topmost node object). The flag F and the N nodes are referred to in a similar way, and so are the µ`r values in the threshold objects in the threshold object stack. The currently visited node is denoted The 26 successors to ncur are denoted n 1 , n2, ... , n2b and their corresponding metrics > µ2b, possibly after sorting. are µl > µ2 > Now we are ready to give a formal description of the rate R = b/c version of Creeper.
Algorithm C (Creeper rate R = b/c)
C1. Let TP = 0. Let NP = 0. Store a node object, N(NP), on the node object stack
with N(NP).N = 2, N(NP).F = 1, N(NP).nl = root, and N(NP).n2 = a dummy node. Let µ(T P) be a corresponding threshold object with
µ(T
µ(T P).µ7 = -no
Finally, let T = -oc, root, and Amax = -00. C2. Compute the threshold T = Q(µ(T P).µ7). Compute and sort the metrics of all successors to nCur. Ties are resolved in an arbitrary way. Let N be the number of successors having metrics not smaller than T. C3. Perform one of six actions specified in Fig. 6.20.
C4. If we have reached the end of the tree, stop, and output the current path as the estimated information sequence; otherwise go to C2.
Section 6.4
The Creeper Algorithm*
Rule
Condition
One
N>2 and µI > µmax
287
Decoder actions near = ni
N(NP + 1).ni = ni, i = 1, 2, ... , N N(NP + 1).N = N N(NP + 1).F = 1
NP=NP+l
µ(TP + 1).µr = -oc
=µi,i =2,3,...,N if N < 2b, then µ(T P).µ4 = max{µN+,, µ(T P).µi}
TP=TP+1 /.tmax = max{/-tmax, Ad
Two
N>2 and µt < Amax
ncur = n,
N(NP + 1).ni = ni, i = 1, 2, ... , N N(NP + 1).N = N N(NP + 1).F = 0
NP=NP+1 Three
N=1
if N < 2b, then µ(T P).µi = max{µN+,, µ(T P).µ4} ncur = ni Itmax = max{Amax, /-ti}
Four
N=0 and N(NP).F = 1 and
Five
max{µ,, µ(TP).µi} Q(µ(TP - 1).µt) N=0 and N(NP).F = 1 and
max{µ,, µ(TP).µi} < Q(µ(T P - 1).µi)
µ(T P).µ4 = max{µ2i µ(T P).µ4} near = N(NP).n2
µ(TP).µT = max{µt, µ(TP).µi} Resort the N(NP).N thresholds in µ(TP) Resort the N(NP).N nodes in N(NP) accordingly
µ(TP).p = -oo ncur = N(NP).n2 µ(TP - 1).µT = max{µl, µ(TP).µi, µ(TP - 1).µi)
Delete N(NP).nt Delete µ(TP).µi Renumber the remaining N(NP).N - 1 thresholds in µ(TP) Renumber the remaining N(NP).N - 1 nodes in N(NP)
N(NP).N = N(NP).N - 1 if N(NP).N = 1, then NP = NP - 1, and TP = TP - 1 Six
N=0 and N(NP).F = 0
ncur = N(NP).n2 µ(T P).µ4 = max{µi, µ(T P).µ4}
Delete N(NP).nl Renumber the remaining N(NP).N - 1 nodes in N(NP) N(NP).N = N(NP).N - 1 if N(NP).N = 1, then NP = NP - 1
Figure 6.20 Six rules that specify the behavior of Creeper for rate R = b/c.
The "resort" operation in Rule Four is simple. All thresholds in the threshold object are already sorted except for threshold one, which only has to be inserted in the proper place. Node one is inserted in the same place in the node object so that the node and its corresponding threshold always have the same order number. The "renumber" operation in Rules Five and Six is even simpler since after threshold one has been deleted the remaining thresholds remain sorted as before. The "resorting" operation corresponds to an exchange of two elements when we have a rate R = 1 /c code.
288
Chapter 6
Sequential Decoding
6.5 SIMULATIONS In the table below, we compare the stack and Fano algorithms with Creeper. We have simulated the transmission of a large number of frames over a BSC with crossover probability e = 0.034
which corresponds to a rate R = 0.9R0, that is, 10 % below the computational cutoff rate Ro when R = 1/2. We used the rate R = 1/2, memory 23, ODP convolutional encoding matrix G(D) = (75744143 55346125) with free distance dreee = 25. (The polynomials are given in octal notation.) Information sequences of 500 bits were encoded as a 1046 bit code sequence before the transmission; that is, m = 23 dummy information zeros were used to terminate the convolutional code (ZT). We have simulated the transmission of 1,000,000 frames (200,000 for Fano) when the average number of computations needed to decode the first branch, C1, was estimated. When the average number of computations needed to decode a branch, C, was estimated, 200,000 frames were transmitted. The frames were aborted when C and C1 exceeded 100. Due to the large encoder memory, no frames whose decoding terminated normally were erroneously decoded. (The A-values in the table are optimized empirically and shown before scaling and quantizing to integers.) Algorithm
Ct
C
stack
1.20 1.23 1.71
1.29 1.54
Creeper, A = 2.6 Fano, A = 4.8
2.26
The estimations of C suggest that Creeper is significantly more efficient than the Fano algorithm and only slightly worse than the stack algorithm. Finally, in Figs. 6.21 and 6.22, we show simulations of the computational distributions for the number of computations P(C1 > x) and P (C > x), respectively, for the three algorithms. 10°
Figure 6.21 Simulations of the computational distribution functions P(C> > x) for the Fano 100
101
x
102
(F), Creeper (C), and stack (S) algorithms for the BSC with crossover probability E = 0.034.
Section 6.6
Computational Analysis of the Stack Algorithm
289
100
Figure 6.22 Simulations of the computational distribution functions P (C > x) for the Fano (F), Creeper (C), and stack (S) algorithms for the BSC with crossover probability E = 0.034.
100
102
6.6 COMPUTATIONAL ANALYSIS OF THE STACK ALGORITHM
The curse of sequential decoding is that the number of computations needed to decode a received sequence varies with the channel noise in a particularly nasty way. Consider the ensemble E(b, c, oo, 1) of infinite memory, time-invariant, rate R = b/c convolutional codes, and the partially explored binary tree shown in Fig. 6.23. For simplicity we assume that the decoder operates continuously in the infinite code tree, and since the codes have infinite memory we assume that the decoder never makes any decoding errors. Let Ti, i = 1, 2, ... , denote the set of extended nodes in the ith incorrect subtree (0 denotes the empty set.). If we count as a computation every extension of a node, then the number of computations needed to decode the ith correct node, C,, can be written as
Ci=1+ITi 1
(6.28)
i = 1 , 2, ... , where I T I denotes the cardinality of the set Ti. Since for the ensemble of random
Figure 6.23 Partially explored binary tree.
Chapter 6
290
Sequential Decoding
infinite tree codes the statistical properties are the same for all subtrees, the random variables Ci, i = 1, 2, ... , all have the same distribution, but they are certainly not independent. Thus, for the average number of computations per branch, Cav, we have Cav = E[Ci}
(6.29)
all i. Without loss of generality, we will only consider the first incorrect subtree, that is, the incorrect subtree stemming from the root. It is obvious that Cav will depend on the distribution of the minimum of the metrics along the correct path. In Section 6.1 we introduced the Fano metric. Now we will consider a more general symbol metric for the BSC, viz.,
µ - Sl
a=log(1-e)+l-B, no error + 1 - B,
,B =log 6
error
(6.30)
where the parameter B is called the bias. When B = R, we have the Fano symbol metric. The following lemma is important when we want to characterize the behavior of the metric along the correct path. It follows from Wald's identity (Corollary B.6), and it gives a most useful upper bound on the minimum of the metric along the correct path.
Lemma 6.1
Consider the BSC and let µm;n denote the minimum of the metric
µ(r[o,t), v[o,t)) along the correct path; that is, /Lmin = min{/,(,(r[o,t), V[p t))}
(6.31)
Then P(/imin _< x) <
2-fox
(6.32)
where .lo is the smallest root of the equation
g(,)
def E[2Aoh(rt,vr)}
_ ((1 - E)2"oa + 62xof)c = 1
(6.33)
where a and 0 are given by (6.30). Equation (6.33) always has the root Xo = 1. To obtain a nontrivial bound from (6.32), equation (6.33) should have a root X0 < 0. This problem is addressed in the following (see Example B.1)
Corollary 6.2 Consider a convolutional code of rate R < C, where C is the channel capacity of the BSC. Then, for the Fano metric there exists a negative root of (6.33), viz., A0
1
s s
(6 . 34)
where s satisfies the equality
G(s) = sR
(6.35)
and G (s) is the Gallager function (4.208). Let the information sequence ulo J) also denote the corresponding node at depth j in the first incorrect subtree, and let us introduce the function (u [o,J) )
1,
if node u'o ) is extended
0,
else
(6 . 36)
Then, from (6.28) it follows that 00
Cav = E[Cil = I +
L
Y, E[tp(u[o,J))}
J=1 u[o.i)ET
(6.37)
Section 6.6
Computational Analysis of the Stack Algorithm
291
Consider the random walk So = 0, S',_., S associated with the node u(0 J), where t-1
S; _
Z,
(6.38)
i=o
and Z =
ri) is the ith branch metric along the incorrect path u[o j). In the ensemble £(b, c, oo, 1) the branch metrics along an incorrect path, Z', are independent, identically distributed, random variables with the distribution (see Example B.2)
P(Zi' = ja + (c - j)$) = (
j)
(i) 2
(6 . 39) c
j = 0, 1, ... , c. Hence, we have
E[Z']=I
Ia+
c=(1-R+ E(1-e))c<0
1
(6.40)
and
gz; (),) = E[2xz']
_G
2Ya +
(6.41)
22)LP)c
For a node u[o J) in the incorrect subtree to be extended, it is necessary that the random walk So, S ' 1 ,
.
. ,
S'. does not drop below a barrier at Amin. What will happen when mint {S" } = Amin?
It depends on how the stack algorithm is implemented. A common and efficient method of resolving the ties when two nodes have exactly the same cumulative metrics is to use the "last in/first out" principle. In our analysis, we will adopt the more pessimistic view: in case of a tie between a node on the correct path and an incorrect node, always extend the incorrect node. Let
fj(y,z)
def
P(St,
>y, 0
(6.42)
Then the probability that a node u1l0 j) in the first incorrect subtree will be extended, given that /-tmin = y, can be upper-bounded by E[tp(u[0,J))
y] _
I
(6.43)
fj(y, Z) z>>Y
Thus, we have 00
E[cp(u'[O J)) I Amin = y]
E[C1 I /-train = y] = I + J=1 U0i)-T,
(6.44)
= 1 + Y(2b - 1)2b(J-1) E fj (y, z) 00
j=1
z?Y
From (6.40) and Lemma B.1 it follows that the infinite random walk So, Si, ... , S', ... will cross from above any finite barrier at y < 0; that is, 00
57
(6.45)
fJ (y, z) = 1
j=1 z
Furthermore, since the random walk SI will eventually cross the barrier y, it follows that 00
E fj (y, Z) _ E E ft (y, z) zay
(6.46)
t=J+1z
Hence, inserting (6.46) into (6.44) yields 00
E(2b E[C1 I /min = y] = 1 + j=1
-
00
1)2b(J-1) E
t-j+1z
ft
(y,
Z)
(6.47)
Chapter 6
292
Sequential Decoding
Next we replace the "1" in (6.47) by the double sum in (6.45) and obtain 00
E[CI
/ti. = Y] _
00
ft(Y, z)
- 1)2b(J
T(2b
00 1)
00
ft (y, Z)
t=j+1z
j=1
t=1 z
00
00
=EEft(Y,z)+>2bJ E Ef(Y,z)
r
t=j+1z
j=1
t=1 z
-E2 b(j-1) r Oc' :: E ft (Y, Z) 00
00
= y2b' j=0
00
t=j+1Z
j=0
)7, ft (y, z)
t=j+2 z
cc
2b(t-1) E ft (Y, Z)
2b' E fj+l (Y, z) j=0
00
Eft (y, z) - y2bJ
00
_
(6.48)
t=j+1z
j=1
z
z
t=1
From Wald's identity (Corollary B.4) it follows that 00
T
2bt
t=1
ft (y, z) < 2-xi(y+cT)
(6.49)
z
where Al > 0 is a positive root of gz; (A 1) = 2-b
(6.50)
(That g'z, (.k1) < 0 and the existence of such a A 1 for the Fano metric are shown in Example B.2.)
Then, we obtain by combining (6.48) and (6.49) the upper bound E[C1 I Nmin = Y] < 2-b2-Ai(y+cf)
(6.51)
Thus, we have proved the following
Lemma 6.3 The average number of computations needed to decode the first correct node given that the minimum metric along the correct path is /-min = y is upper-bounded by
E[CI I
= y] < 2-"
(y+c#)-b
(6.52)
where ),I is a positive root of equation (6.50) and f is given by (6.30).
For the Fano metric we show in Example B.2 that .l1 = 1+ is a positive root of (6.50) where s satisfies (6.35). Hence, we have
Corollary 6.4 For the Fano metric we have E[C1 I /-min = Y] < 2
I+s
(6.53)
where s is the solution of equation (6.35). Without any essential loss of accuracy we assume that the metric values can be written
a = aot
(6.54)
0 =1808
(6.55)
and
Section 6.6
293
Computational Analysis of the Stack Algorithm
where ao and 00 are integers and S > 0. Then {µn-,in} = {0, -8, -28, -36, ...} and we have
E[C1 I A .in = -i6]P(btmin = -iS)
Cav = E[Cil = i =o 00
<
2 xl(i8-cfi)-b P(1lA,min
-= -iS)
i=0
=
-(i + 1)6)))
-i8) - P(iLmin
(6.56)
,=0
= 2-x,c9-b
(2ixlP(.n <
00
j S) -
= 2-x,c,B-b ((1
2(i-1)x"P(N-min < -l8) i=1
i=0
- 2-X's)
1\
i=0 \E
2`;"sP(Htmin < -iS) + 2-x13
Now we use Lemma 6.1 and obtain Cav < 2
1 c #-b
((1 - 2 x18
(00 2ix1821A08)
+2
(6.57)
x's/
i=0
The sum converges if A0 + X1 < 0 and, finally, we have
1 - 2 x,s
Cav < 2-)lco-b
+ 2_x,8
1 - 2(xo+x,)s
1 - 2x08
= 2-xi c,6-b
(6.58)
< 2-x,
A0
,lo + Al
I - 2(),o+x1)8
where the last inequality follows from the fact that 1 have proved
X-x
is decreasing for x > 0. Thus we
Theorem 6.5 The average number of computations per branch for the stack algorithm when used to communicate over the BSC is upper-bounded by < A0 2-xlcfi-b (6.59)
Cav
)'0 + X 1
where A0 and .l1 are the negative and positive roots of equations (6.33) and (6.50), respectively,
andA0+A1 <0. The factor xo' in the upper bound (6.59) regarded as a function of the bias B achieves +Xl its minimum for B = R, that is, for the Fano metric (see Problem 6.10). Hence, the following corollary is of particular interest. Corollary 6.6 If we use the Fano metric, then the average number of computations per branch for the stack algorithm when used to communicate over the BSC is upper-bounded by Cav <
s
s-1 2
i'+=cfl-"b
for R < Ro
(6.60)
where s is the solution of equation (6.35) and Ro is the computational cutoff rate. Proof. For the Fano metric the inequality
0+X1 =
1-s
l+s <0
is satisfied if and only ifs > 1, that is, if and only if R < R0.
(6.61)
294
Chapter 6
Sequential Decoding
By choosing the bias B = Ro we obtain the Gallager symbol metric [Ga168] for the BSC,
a=log(1-E)+1-Ro
11,0 =
,
Remark.
+ - R oo
= log's
(6.62)
Although the Gallager metric gives a weaker bound on the average number
of computations, it is, as we will see in the sequel, important when we analyze the error probability for sequential decoding. Since the Gallager metric is the same as the Fano metric when R = R0, it follows from
(6.34) that Ao = - 1/2 for the Gallager metric (where we also have used that s = 1 when R = Ro, cf. (4.316)). For R < Ro and bias B < Ro the root Al < 1/2 (see Problem 6.11). Corollary 6.7 If we use the Gallager metric, then the average number of computations per branch for the stack algorithm when used to communicate over the BSC is upper-bounded by Cap <
-2-x' cfl X0
b
for R < Ro
(6.63)
where Ao = -1/2, Al is the positive root of the equation c
(2Ata +
2
b
(6.64)
a and B are the Gallager symbol metrics given in (6.62), and A0 +'k I < 0. Since Cav is finite for R < Ro, we obtain from the Markov inequality (Lemma 5.14):
Theorem 6.8 When the stack algorithm is used to communicate over the BSC, then the computational distribution for the number of computations needed to decode any correct node, C, , i = 1, 2, ... , is upper-bounded by
P(C1 > x) < Cavx
1,
for R < Ro
(6.65)
where Cay is the average number of computations per branch and Ro is the computational cutoff rate. Remark. For the ensemble E(b, c, oc, 1) of rate R = b/c, infinite memory, timeinvariant convolutional codes, Zigangirov [Zig66] has shown that for the stack algorithm used
with the Fano metric the sth moment of C,, E[C'] is finite for 0 < s < 1 if R < G(s)/s and, hence, that
P(C, > x) < E[C;
]x_s
(6.66)
Furthermore, for the ensemble of general, nonlinear, infinite memory trellis codes, Zigangirov strengthened (6.66) to
P(C1 > x) < OR 0)X_'
(6.67)
R < R(S) = G(s)/s
(6.68)
for
where 0 < s < oo and OR (1) depends on the rate R but not on x. As a counterpart to the upper bound on the computational distribution (6.65) we have the following lower bound:
Section 6.6
Computational Analysis of the Stack Algorithm
295
Theorem 6.9 When the stack algorithm is used to communicate over the BSC, then the computational distribution for the number of computations needed to decode the information b-tuple u; _ 1, C; , i = 1, 2, ..., is lower-bounded by
P(C; > x) > x-s+o(1)
0
(6.69)
for at least one information b-tuple u;_t, i = 1, 2, ... , where s is the solution of (6.35), that is, s = EC h(R)/R, and o(1) --* 0 when x -p oo. Proof. The proof is similar to the proof of Lemma 4.25 and Theorem 5.18. The theorem states that f o r at least one i , i = 1 , 2, ... , and any s > 0 there exists a value xE such that for any x > xE we have
P(Ci > x) > x-(S+E)
(6.70)
Suppose that (6.70) does not hold. Then as a consequence there exist a rate R = b/c convolutional code decoded with the stack algorithm and a certain s > 0 such that for all i , i = 1 , 2, ... , and any large xE there exists an x > xE such that
P(C, > x) < x-(s+20 i = 1, 2....
(6.71)
We terminate this convolutional code (with very good computational distribution according to (6.71)) into a block code B of rate R (no zero-tail) and block length N. In order to decode this code, we use the stack algorithm as a (block) list decoder with list size L = x. Assume that the decoder operates only up to depth N/c, where the block length is
N=
E' h(R) log x
(6.72)
Sph _
E. (r)R
and
r=R-1Nx
(6.73)
Each time the decoder reaches the depth N/c, it stores the corresponding node on the list and chooses the next node from the stack and operates according to the rules of the stack algorithm.
Assume that the stack algorithm stops after making x(2b - 1)-t computations. Since each computation of the stack algorithm increases the number of nodes on the stack by (2b - 1), the number of stored paths is equal to x. Extend in an arbitrary way all paths that are shorter than N/c c-tuples. The list decoder will not make an error if the number of computations for none of the N/c information b-tuples
exceeds x/(N/c). Thus, using the union bound and (6.71), the probability of error for the (block) list decoder is upper-bounded by PL (9)
N xc < c
-(s+2E)
(N)
N = (N)
1+s+2E
x (s+zE)
(6.74)
Let us choose xE such that
E
(R) logx8
)1+S+26
< (xE)E
(6.75)
E,h(r)Rc Then, for any x > xE we have PL (c) < x-(S+8)
(6.76)
in contradiction to Theorem 5.17. Hence, we conclude that inequality (6.70) must hold and the proof is complete.
296
Chapter 6
Sequential Decoding
6.7 ERROR PROBABILITY ANALYSIS OF THE STACK ALGORITHM In this section, we will upper-bound the probability of decoding error for the stack algorithm.
Hence, we must assume that the codes have finite memory since otherwise we have zero probability of decoding errors. We consider the ensemble E(b, c, m, oo) of binary, rate R = b/c, time-varying convolutional codes of memory m. Then we have v = uGt
(6.77)
where G,
I
Go(t)
G1(t + 1)
...
Gm(t +m)
Go(t+1)
G1(t+2)
...
Gm(t+1+m)
(6.78)
in which each digit in each of the matrices Gi(t) for 0 < i < m and t = 0, 1.... is chosen independently and is equally likely to be 0 and 1. Assume without loss of generality that the allzero sequence is transmitted over the BSC
and that the stack algorithm with the metric µ with a general bias B > 0 (6.30) is used to decode the received sequence r. Since the analysis of the computational behavior of the stack algorithm is the same for all correct nodes, we consider only the root and the decoding of the first information b-tuple uo. In the first incorrect subtree stemming from the root, there are (2b - 1) nodes at depth m + 1 which have the same state as the (m + 1)th correct node. In general, in the first incorrect subtree there are (2b - 1)2bj nodes at depth (j + m + 1 ) , j = 0, 1, ... , which have the same
state as the (j + m + 1)th correct node. We call these nodes mergeable. First, we assume that the rate is less than the computational cutoff rate, that is, 0 < R < Ro.
Consider an arbitrary mergeable node u1o,j+m], j = 0, 1, ... , in the first incorrect subtree. A necessary condition that an error burst will start at the root is that at least one of the mergeable nodes in the first incorrect subtree will be extended. Let us introduce the function 1
O(u[o,j+m]) =
1,
if the mergeable node u,j+m] is extended
0,
else
(6.79)
The probability that the mergeable node u,j+m] is extended is simply E[tp(u,j+mi)]. Hence, the probability that an error burst will start at the root is upper-bounded by the average number of the extended, mergeable nodes in the first incorrect subtree; that is, 00
T E[co(u[o,j+mi)]
P(E1) <
(6.80)
J=O uj0.j+,,jET'
where 7m is the set of mergeable nodes in the first incorrect subtree. Consider the random walk So = 0, S'1, .. , Sj+m+1 associated with the node u[o j+m]> where t-1
Z'
S;
(6.81)
i=O
and Z, = µ(r,, vl) is the ith branch metric along the path uo j+mJ. A necessary condition for a mergeable node u'o j+m] in the first incorrect subtree to be extended is that the random walk So, Si..... Sj+m+1 does not cross (from above) a barrier at µmit,, where ttmin is the minimum
Section 6.7
Error Probability Analysis of the Stack Algorithm
297
metric along the correct path. Assume that µnun = y and let us introduce (see (6.44)) def
.fj+m+1(Y, Z)
P(S,
y 0 < t < j + m, Sj+m+1 = z)
(6.82)
Then
E[tp(u[0 j+m]) I /Pin = Yl = E fj+m+l(Y, z)
(6.83)
z?Y
Analogously to (6.80) we have P(E1 I
/-fmin = Y)00< E Y,
E[WP(u[O,j+m]) I µmin = Y]
(6.84)
j=0 u[o.i+.JETm
Inserting (6.83) into (6.84) yields 00
P(E1 I µ'.i. = Y) < E
Y, fj+m+1(Y, Z)
L..r
j=0 u[o,j+m]ET^ Z?Y
(6.85)
00
_
2b
- 1)2bi
j=0
.f j+m+1(Y, z) z?y
Since in the incorrect subtree we will eventually cross the barrier we have, analogously to (6.46), 00
fj+m+1(Y, Z) _
ft (Y, z)
(6.86)
t=j+m+2z
Z?Y
Then we insert (6.86) into (6.85) and obtain P(E1 I /burin = Y) <
00
E(2b - 1)2bi
ft (y, z)
j=0
t=j+m+2 z
00
= 2-bm (2b - 1) Y2 b(j+m) Y E .ft (Y, Z) j=0
t=j+m+2 z
The right side of (6.87) can be further upper-bounded by 00
00
2b'
P(E1 I /2.j. = Y) < 2-bm(2b - 1) j=0
.ft(Y, z)
t=j+lz
= 2-bm (2b - I)001: 1: 2b' Y, ft (Y, z) t=1
= 2-bur (2b - 1)
E t=1
j=0
z
2bt_1 2b - 1 T ft (Y, Z)
(6.88)
z
00
< 2-bur 57 2bt 1: ft (Y' Z) t=1
z
(cf. (6.48).) Analogously to (6.52), we obtain 2-bm2-Ai(y+cc) (6.89) P(E1 I gmin = y) < where;,l satisfies (6.50) and P is given by (6.30) and, then, analogously to (6.56) we can show
that 00
P(EI) _ E P(-'l l-min = -is)P(ILmin = -i8) I
i=0
<
2-bm2-xlcfl
(1
-
00
2-a,s)
E i=0
2i(xa+x,)s
+2-)"s
(6.90)
298
Chapter 6
Sequential Decoding
The sum in inequality (6.90) converges if A.o + Al < 0, and we obtain analogously to (6.58) that 2-Rmc
(6.91)
for R < Rp
(6.92)
P(E1) <2" Ao + Al
where Ao + ) i <0. If we choose the Fano metric, then
P(EA) < 0(1)2-Rmc where
0(1) = 2-hc
s
s-1
(6.93)
We notice that for the Fano metric we have obtained an upper bound on the error probability
for the stack algorithm whose exponent R decreases linearly with the rate R! The reason is that although the Fano metric should be chosen in order to obtain a good computational performance, it is not optimal from an error probability point of view. In fact, for rates R close to zero, it is far from optimal.
Next we consider the sphere-packing region Ro < R < C. Then, as always when we consider this region, we separate the event that an error burst starts at the root node into two disjoint events corresponding to "few" (F) and "many" (M) errors, respectively. Then we have
P(Et) = P(E1 I F)P(F) + P(E1 I M)P(M) < P(E1 I F)P(F) + P(M)
(6.94)
where we have upper-bounded P(E1 I M) by 1. To state more precisely what we mean by "few" and "many" errors, we consider the minimum metric µ1along the correct path. Let those error patterns for which the metric stays above the barrier u <0, that is, for which µmin > u
(6.95)
contain "few" errors, and those error patterns for which the metric hits or crosses the barrier contain "many" errors. From Lemma 6.1 it follows that
P(M) = P(Nmin
(6.96)
where Ao is the smallest root of (6.33). The probability
P(E1, F) = P(E1 I F)P(F)
(6.97)
is equal to the joint probability that µn,;n > u and that at least one of the mergeable nodes in the first incorrect subtree is extended. Then, for µ;n = y > u we have (cf. (6.89)) P (E1
where Al satisfies (6.50) and
I
/5min = 3)< 2-bm 2-a i
(6.98)
is given by (6.30). We let
u = -(io + 1)S
y = -iS
(6.99) (6.100)
Section 6.7
299
Error Probability Analysis of the Stack Algorithm
and, then, analogously to (6.56), we get io
P(El I .F)P(F) < >
'2-bm2-x1(-z3+cfi)P(Amin
= -is)
i=0 io
= 2-bm-xicp
(2Al(P(.n < -is) - P(gmin < -(i + 1)6)) i=0 io+1
io
= 2-bm-xicj6
L2"I"P(/min < -is) -1:2Ai(i-1)SP(ILmin < -is) i=0
i=1
(6.101)
= 2-bm-xicp ((1
-
io
2-x18)
2x1iSP(/t.in
< -is)
i=0
+2-x13P(µmin < 0) -- 2`I'03P(/-tmin _< -(io + 1)5))
<
2-bm-xic,8
C(1 - 2-x,8)
(io
2xiid2X018 Y i=0
+2-x'8
where the last inequality follows from Lemma 6.1. Assuming that a,0 + Al 0 0, we have P(E1
I flP(F) < 2
bm-xi c
((1
-
(2(x0+x)('b0+ 1)S 2(xo+xi)s - 1
2-x S ) 1
- 1) + 2-;L16
(6.102)
=
2-bm-xic
(1 - 2-x13)2-(xo+xi)u + 2x03 - 1 2(xo+xl)S - 1
By inserting (6.102) and (6.96) into (6.94), we obtain
P(El) < 2
-bm-xic
(1 - 2-xl8)2-(x0+x,)u + 2x03 - 1 2(x0+x, )s
_1
+2
x
°u
(6.103)
For the Fano metric we conclude from (6.34) and Example B.2 that S
l +S
X0
(6.104)
and 1
X1
1+s
(6.105)
G(s) = sR
(6.106)
where s is the solution of
and G(s) is the Gallager function (4.208). Hence,
A0+A1=
1-s
1+s >0
(6.107)
if and only ifs < 1, that is, if and only if R > R0. Since A0 < 0 and A0 + Al > 0, it follows that 2x03
-1
(6.108)
300
Chapter 6
Sequential Decoding
Furthermore,
1 - 2`l6 1 / (),0 + A1)8 1
(1 - 2-x18) 2(xo+x,)s
-1
Al
)
1 - 2-(xa+x')s) GO + Al J
8.18
(6.109)
Xo + .l1
where in order to obtain the last inequality we have used that x. Thus, we can further upper-bound (6.103): Al
P(E1) <
is decreasing with increasing
x
2-bm-XIcfi2-()Lo+xi)u+2-x0u
(6.110)
)'0+A1 In order to transform our bound (6.110) into a more illuminative form, when R > R0 we use the optimal value of the barrier, viz.,
l+s G(s)mc
u
S
Next we insert (6.104), (6.105), (6.107), and (6.111) into (6.110) and obtain
P(El) <
2- +scP21 G(s)mc-bm +2-G(s)mc
1
1-s 1
G
2
i+scfl2mc-Rmc + I) 2-G(s)mc
_ (1___2_thcfl
+ i) 2-G(s)mc
(6.112)
Final/lily,
where the last equality follows from (6.106).
we obtain
P(E1) < 0(1)2-G(s)mc
for R > R0
(6.113)
where
O(1)= 1
(6.114)
1
Remark. It is somewhat surprising that the constant factor in the upper bound (6.113), that is, O (1), is better than the corresponding factor for maximum-likelihood decoding (cf. Theorem 4.32). The explanation is that here we have used a more advanced bounding procedure.
Next we consider the special case R = R0, which implies that s = 1. Hence, assuming the Fano metric, we have
A0=-X1 = -1/2
(6.115)
That is, we have A0 + k1 = 0. We return to the last inequality of (6.101) and obtain
P(El I .F')P(.F) < 2-bm-x,c ((1
- 2-612) > 1 +
2-8/2
i=0
= 2-bm-c#/2((1 -2 -3/2)(io + 1)+2 -8/2)
<
2-bm-cP12(I
(6.116)
+ (i0 + 1)8/2)
By inserting (6.116) and (6.96) into (6.94), we obtain P(E1) < 2-bm-c,1/2(1
- u/2) + 2u/2
(6.117)
Again we use the optimal value of the barrier, viz.,
u = -2R0mc
(6.118)
Section 6.7
Error Probability Analysis of the Stack Algorithm
301
and obtain P(El) < 2-Romc-c,8/2(1
+ Romc) + 2-Romc
< (2-cP12(1 + Romc) + 1) 2-Romc
(6 . 119)
Finally, we obtain
P(Et) < 0(m)2 Romc,
for R = Ro
(6.120)
where
0(m) = 2-`P12(1 + Romc) + 1
(6.121)
We summarize our results in the following
Theorem 6.10 If we use the Fano metric, then for the ensemble of rate R = b/c, timevarying convolutional codes of memory m the burst error probability for the stack algorithm when used to communicate over the BSC is upper-bounded by 0(1)2-Rmc,
P(E1) <
0(m)2-Romc
0(1)2-G(s)mc
0 < R < Ro R = RO
(6.122)
Ro < R < C
where s is the solution of (6.106), G(s) is the Gallager function given by (4.208), and Ro is the computational cutoff rate.
For rates Ro < R < C we have the same exponents in our upper bounds on the error probability for sequential decoding as we have for maximum-likelihood (ML) decoding, but in the region 0 < R < RO the bound for sequential decoding is much worse than that for ML decoding. This is because from the error probability point of view the Fano metric is not the best choice for rates R < Ro. By replacing the Fano metric by the Gallager metric, we can for rates R < RO obtain a stronger bound on the error probability at the expense of an increased number of computations as follows: By replacing b (= Rc) by def
bo = Roc
(6.123)
in (6.85), we can obtain a strengthening for rates R < R0. As before, A0 is the smallest root of (6.33), but now k1 is the positive root of 2
12x'P = 2-Ro 2
(6.124)
For the Gallager metric it is easily verified that (6.125) and
,ll =
1
(6.126)
2
Since A0 + Ai = 0 we can use the Gallager metric and repeat the derivation of (6.120) and obtain
Theorem 6.11 If we use the Gallager metric, then for the ensemble of rate R = b/c, timevarying convolutional codes of memory m the burst error probability for the stack algorithm when used to communicate over the BSC is upper-bounded by O(m)2-Romc
P(it) <
0(1)2-G(s)mc
0 < R < Ro Ro < R < C
(6.127)
Chapter 6
302
Sequential Decoding
where s is the solution of (6.106), G(s) is the Gallager function given in (4.208), and R0 is the computational cutoff rate. Hitherto we have assumed that the sequential decoder had no back-search limit; that is, the decoder could, if required, back up as far as to the root of the code tree. Now we assume (cf. Section 4.8) that we have a finite back-search limit r that is an excursion further than r branches back from the foremost node is prohibited. For simplicity we will restrict our analysis to the case r = m + 1, where m is the encoder memory. Consider the ensemble £(b, c, m, 1) of binary, rate R = b/c, time-invariant convolutional codes of memory m. In the first incorrect subtree there are (2b -1)2b(T -1) nodes at depth r. A necessary condition that the subpath corresponding to an error burst will start at the root is that it will pass one of these nodes at depth r, that is, that the metric of such a node is not less than the minimum metric along the correct path. In order to minimize the probability that an error burst starts at the root when we have the finite back-search limit r = in + 1, P(£s), we need a metric different from the Fano and Gallager metrics, viz., the Zigangirov symbol metric for the BSC:
( a = log(1 - E) + 1 - G(s,)/s,
µz
1 $=logE
+1-G(s,)/s,
(6.128)
where s, 0 < sZ < 1, is the solution of
G'(s) = R
(6.129)
if this solution is upper-bounded by 1, and where sz = 1, if the solution of (6.129) is greater than 1. As before, G(s) is the Gallager function (4.208) and G'(s) is its derivative. (The value of sZ maximizes the reliability function
EB(R) = G(s) - sR, for Rcrit < R < C
(6.130)
and the pair (E13 (R), G'(se)) defines parametrically the sphere-packing bound for block codes in this region.) For the BSC it is easily verified that
=
log
1-E
- 1, for R,r;t < R < C
P
1
sZ
log
(6.131)
P
where p is the Gilbert-Varshamov parameter with respect to the rate R, that is,
p = h-1(1 - R)
(6.132)
The rate corresponding to sZ = 1 is the critical rate Rc1;t (4.137). For rates R < R,it the Zigangirov metric coincides with the Gallager metric. It is easily verified that for the Zigangirov metric the negative root A0 of (6.33) is A0
1
s +Z
(6.133)
Thus, P(M) satisfies (6.96) when A0 is given by (6.133). Remark.
For rates Rcrit < R < C, the Zigangirov symbol metric (6.128) can be written
as
a = l+s, tog 0-0 l-P S,
I
(6.134)
'+SZ loge
which can be scaled to
a =log
1-e 1=P
8'=log'
(6.135)
Section 6.7
Error Probability Analysis of the Stack Algorithm
303
Analogously, for rates 0 < R < Rcrit we obtain a' =
1-E
log
1-Pcrit
= log
I
(6.136)
E
Pcrit
where pait is the Gilbert-Varshamov parameter with respect to the rate Rcrit, viz., (6.137)
pcrit = h-1(1 - Rcrit)
We have Al = 1/2 and
1{ s <
1/2
(6.138)
z
for rates 0 < R < Rcrit and Rcrit < R < C, respectively, and it is easily verified that
+ 12"
12a1a
= 2-G(s,)1s,
(6.139)
2
2
Consider the rates Rcrit < R < C. The conditional probability that at least one of the nodes at depth r = m + 1 in the first incorrect subtree is visited given that we have few errors, that is, P(g bS I .F), is upper-bounded by the conditional expectation of the number of visited nodes at depth (m + 1) in the first incorrect subtree given the event F. Then, analogously to (6.44) we obtain (for r = m + 1): P(SIbs
/-lmirl =
Y)
< Y(2b - 1)2bm.fm+1(y> Z) z?Y
< 2b(m+l) E P(pm+l =
z)2'`1(z-Y)
z?Y
(6.140)
(P(IL.+l = Z)2;"z 2->`tY
< 2b(m+1) allz
= 2b(m+1)
12Ala
+
(m+1)c
2-aty
12)1 2
where Al > 0. Next, we let u = -(i0 + 1)8, y = -i8, and assume that Ao + Al > 0, that is, R > Rcrit. Then analogously to (6.101) and (6.102) we can show that P(E]bs
io
.F)P(F) _
-i8)
µmin = i=0
< 2b(m+l)
1 (2Ala
+ 12>rt,6
(m+1)c
2
x
((1 -
2-)"8)2(A0+)wt)(io+1)8 + 2`°6
2(xo+x1)8 - 1
= 2b(m+l)
12)"a + 12>w1f
(m+1)c
2
X
(1 - 2-),.18)2-(A0+A1)u + 2''08 - 1 2(Xo+)t)3 - 1
-1
(6.141)
304
Chapter 6
Sequential Decoding
Analogously to (6.103) we obtain
P(EtS)
< 2b("'+1)
(21
1+
(m+1)c
2x"d)
2
((I x
2-xla)2-(Xo+`,)u + 2X0 - 1 2(Ao+),,)8
(6.142)
+ 2-A 0u
-1
Using (6.108), (6.109), and (6.139), it follows from (6.142) that Al
P(Eo'S) <
2b(m+1)2-(m+1)c2 (xo+x,)u
A0 + Al
+ 2°u
(6.143)
We choose the barrier to be
u=-l+sz(G(sz)-szR)(m+1)c
(6.144)
sZ
insert (6.133), (6.138), and (6.144) into (6.143), and obtain Al
2b(m+1)2-G(sz)(m+1)c2(G(si)-s,R)(m+1)c
Ao + Al +2-(G(sz)-s,R)(m+1)c -
A0+2 ,12-(G(s,)-siR)(m+1)c
(6.145)
Ao + Al
Finally, we have
P(e s) <
0(1)2-(G(si)-siR)(m+1)c
for R > Rcrit
(6.146)
and
O(1) =
Iko + 2a,1
= 2 - sz
(6.147)
1-si ,k0+A1 Next, we consider the special case when R = Rcrit, which implies that sz = 1. Hence, assuming the Zigangirov metric we have
,10=-X1 = -1/2
(6.148)
Then, analogously to (6.116) and (6.117) we can show that for the barrier
u = -2R0(m + 1)c
(6.149)
we have
P(EES) < 0(m)2 (R0 R)(m+1)c
for R = Rcrit
(6.150)
where
O(m)=(Ro(m+1)c+1)+1
(6.151)
Finally, to make the analysis complete, we consider rates R < Rcrit. Then the Zigangirov metric coincides with the Gallager metric, and analogously to (6.127) we obtain P(Etbs) < O(m)2 (Ro-R)(m+1)c
0 < R < Rcrit
(6.152)
where 0(m) is given by (6.15 1). We summarize our results in the following (cf. Theorem 4.37)
Theorem 6.12 If we use the Zigangirov metric, then for the ensemble of rate R = b/c, time-invariant convolutional codes of memory m the burst error probability for the stack algorithm with back-search limit m + 1 when used to communicate over the BSC is upperbounded by
P(E AS ) <
0(m)2-(Ro-R)(m+1)c 0(1)2-(G(s,)-s,R)(m+1)c
0 < R < Rcrit Rcrit < R < C
(6.153)
Section 6.8
Analysis of the Fano Algorithm
305
where sZ is given by (6.13 1), G(sZ) is the Gallager function given by (4.208), Ro is the computational cutoff rate, and R,;t is the critical rate. Remark. If we use a tail that is shorter than bm the protection of the last information symbols will be weakened [Joh77b]. Remark. The sequential decoder will perform about as well with a rate R = b/c systematic encoder of memory me/(c - b) as with a rate R = b/c nonsystematic encoder of memory m. Due to the systematicity, the first b symbols on each branch in the tail, which is
used to terminate the encoded sequence, are zeros. These zeros are known beforehand and can be omitted before transmission. Hence, the two encoders require the same allotted space for transmission of their corresponding tails. The factor c/(c - b) by which the memories differ is the same factor by which Heller's upper bounds on the free distance differ when systematic and nonsystematic encoders are used.
6.8 ANALYSIS OF THE FANO ALGORITHM Our analysis of the Fano algorithm is based on the results we obtained for the stack algorithm.
First we analyze the computational behavior and consider, as for the stack algorithm, the ensemble E(b, c, oc, 1) of infinite memory, time-invariant, rate R = b/c convolutional codes. We need the following properties of the Fano algorithm. They follow immediately from the description of the algorithm.
Property F1 A node u[o,t) is extended by the Fano algorithm only if its metric µ(rlo,t), v[o,t)) is not less than the threshold T.
Property F2 Let pain denote the minimum metric along the correct path. The threshold T is always greater than µmin - A, where A is the stepsize; that is, def
(6.154)
Tmin e min(T) > Amin - A
Property F3 For any value of the threshold T, each node can be visited at most once. From Properties F1 and F3 it follows that given Tmin = y, the number of visits to any node with metric A = z, where z > y, is upper-bounded by ZAyI
(6.155)
Since the root has metric 0, it can be visited at most -y/A + 1 times. Analogously to (6.44), we can show that the conditional average of the number of computations needed to decode the first correct node, given Tmin = y, is upper-bounded by E[CI I Tmin = y] < (-y/A + 1) 00
+ E (2b - l)2b(J-1) j=1
E(
y+1
z
)
fj (y, Z)
Z>>y
00
_ -y/A +
(1 +
Y(2b -
1)2b(J-1)
Z>y
j=1
-
E fj(y, Z)
00
A-1
E(2b - 1)2b(J-1) E(y j=1
Z>y
- z)f1(y, Z)
(6.156)
306
Chapter 6
Sequential Decoding
In order to upper-bound the right side of (6.156), we use the following inequality from Appendix B (see Corollary B.5): 00
yln2
EtEft(.Y,x) t=1
(6.157)
g%(0)
x
where
g (0) = I - a + 2 /3) c In 2 = E [ Z; ] In 2
(6.158)
and Z' is the ith branch metric µ(r,, vi) along the incorrect path u'o j). Combining (6.156) and (6.157) yields 00
E[C1 I Tmin = y] < -y/o + (1 +
-
E(2b
1)2b(j-1) E
j=1
-
fj (y, z))
Z>Y
cc E[Z']L-1
E(2b -
1)2b(j-1) 1: 1:
Z>Y t=1
1=11
(6.159)
t E fr(y - Z, x)fj (y, z) x
From (6.44) and (6.51) we conclude that 00
1+
E(2b
-
1)2b(j-1) Y fj
j=1
(y, z) < 2-b-ai
(6.160)
z>y
The sum
Y, ft (Y - z, x)
x
is the probability that the random walk, given that it started from level z, where z > y, crosses the barrier y for the first time at the depth t. Then,
Y fi(y, z) E f, (Y - z, x) x
Z>Y
is the probability that the random walk, given that it started from level 0, crosses the barrier y for the first time at the depth j + t. This latter probability can also be expressed as
E fj+t (y, z) z
Hence, we have
fj(y, z) E ft(y - z, x) = Y fj+t(y, z) Z>Y
x
z
(6.161)
Section 6.8
Analysis of the Fano Algorithm
307
Using (6.160) and (6.161) we can further upper-bound (6.159): E[C1
I
2-b-xi(y+cp)
Tmin = y] < -y/A +
00
00
-
t
E[Z;]A-1 E(2b _ 1)2b(j-1)
j=1
f1+t(y, z) z
t=1
2-b-xI(y+cp)
= - y/A +
oo
k
-E[Zi]A-1(1 -2-b)E(k-j)2bi Efk(y,z) k=1 j=1
z
(6.162)
-y/A+2-b-xi(y+cp) -E[Z']A-1(1 -2-b)
=
-
b 2bk
00
1
k2 2b - 1 -
x
k2b(k+2)
_
2b(k+1) ( k
+ 1 ) + 2b
(2b - 1)2
fk(y,z)
)
k=1
z
_
2b
2b(k+1)
k2b
fk(y,Z)
(26-1)2)Z
2b71 + (2b-1)2
X k-1
- 2-b)
E[Z;]A-1(1
-y/A+2-b-x,(y+cp) -
In order to further upper-bound (6.162) we use the following two simplifications: From (6.157) it follows that 00
fk(y, z) > E[Z
k z
k=1
i
(6.163)
]
and from (6.49) it follows that 00
Y 2bk Y, fk (y, Z) < k=1
2-X I (Y+cp)
(6.164)
z
Furthermore, from (6.45) we have 00
E Y MY, Z) = 1
(6.165)
k=1 z
Inserting (6.163), (6.164), and (6.165) into (6.162) yields E[C1 I Tmin = y] < -y/A + 2-b-x1(y+cp)
- E[Z ]A1(1 - 2 b) (\
`
_
y2b (2b - 1)E[Z;]
-y/A+2-b-ai(Y+cp) +y/A
<2
-b-xi(y+cp)
-
E [ Z] i
- E[Ar126
2-x1 (Y+cp)
A(2b - 1)
E[Zi]
(2-b - A(2b - 1)
2-xicp2-x,Y
+
2b2-xl(Y+cp) (2b - 1)2
xi(Y+cp)
-
2b (2b - 1)2
+ AE[Z']1)
(6.166)
308
Chapter 6
Sequential Decoding
By combining (6.154) and (6.166), we obtain
E[C1 I /min = Y] <
E[Zi}
(2- b - (2b
= 1-
-x1cP+x1A-x1Y
- 1)A) 2
(6.167)
E[Zil
2x'A2-b-x'cP2-xiy
(1 - 2-b)0 )
The function 2)"'/ 0 attains its minimum when (6.168)
A 1 In 2
Since in a practical situation the second term in the parenthesis in the right-hand side of (6.167) is essentially greater than 1, we use (6.168) as a good approximation of the optimal value of the stepsize. Inserting (6.168) into (6.167) gives (6.169)
E[C1 I /min = A < CF2-x'y where CF
I-
=
e2-b-xicb
(a +,B)A1 In 2
(6.170)
)
2(1 - 2-b)
Without any essential loss of accuracy we assume that the metric values can be written (6.171)
a = 0108 and
(6.172) P = Po8 where ao and,Bo are integers and 8 > 0. Then /min E 10, -8, -28, -38, ...} and we have
C. = E[C1] _
E[C1 I /min = -ls]P(/min = -is) i=0
00
< Y, CF2'x'6(P(/min < -is) - P(/min < -(i + 1)8)) i=0
(6.173)
00
2('-1)x'6P(/min
= CF Y, 2"-"P(/mi. < -is) i=0
< -is)
i=1
+2-x'3
= CF C(1 - 2-x'6) Y, 2"-'6P(/min < -is) i=0
Now we use Lemma 6.1 and obtain 00
Cap < CF I (1 - 2 x,s)
(2iXI82lO8) + 2-x's
(6.174)
i=O
The sum converges if A0 + Al < 0 and, finally, we have Cav
<
_
CF
1 - 2-x,s
+
1 - 2x0+xOs
2-x16)
(6.175) 1 - 2x06
<
CF1 -2(xo+xi)s
where the last inequality follows from the fact that proved
CF-
A0
FA0+X1
X
is decreasing for all x. Thus, we have
Section 6.8
Analysis of the Fano Algorithm
309
Theorem 6.13 The average number of computations per branch for the Fano algorithm when used to communicate over the BSC is upper-bounded by )10
Cav < CF
(6.176)
A0+A1 where A0 and ),1 are the negative and positive roots of equations (6.33) and (6.50), respectively, such that A0 + Al < 0, and CF is given in (6.170).
From (6.104) and (6.107), we have
Corollary 6.14 If we use the Fano metric the average number of computations per branch for the Fano algorithm when used to communicate over the BSC is upper-bounded by Cav < CF
s
for R < R0
s- 1 '
(6.177)
where s is the solution of (6.35) and R0 is the computational cutoff rate. Also for the Fano algorithm we can use the Markov inequality (Lemma 5.14) and obtain
Theorem 6.15 When the Fano algorithm is used to communicate over the BSC then the computational distribution for the number of computations needed to decode any correct
node, Ci, i = 1, 2, ... , is upper-bounded by
P(C; > x) < Cavx-1,
for R < R0
(6.178)
where Cav is the average number of computations per branch and R0 is the computational cutoff rate. Remark. For the ensemble S(b, c, oo, 1) of rate R = b/c, infinite memory, timeinvariant convolutional codes Falconer [Fa166] has shown that for the Fano algorithm the sth
moment of C;, E[Cf ], is finite for 0 < s < 1 if R < G(s)/s and, hence, that
P(C; > x) < E[Cs]x-'
(6.179)
For the ensemble of general, nonlinear, infinite memory trellis codes Savage [Sav66] strengthened (6.179) to
P(C, > x) <
OR(1)x-S
(6.180)
for
R<
R(S)
= G(s)/s
(6.181)
where s is a strictly positive integer and OR(1) depends on the rate R but not on x. The lower bound on the computational distribution that we derived in Section 6.6 (Theorem 6.9) is also valid for the Fano algorithm. The error probability analysis of the Fano algorithm is quite similar to the corresponding analysis of the stack algorithm. The only difference is in the necessary condition for a mergeable node u[O,J+m] in the incorrect subtree to be extended. For the Fano algorithm, the condition is that the random walk So, Si, . Sj+m+l does not cross a barrier at Tn,in, while for the stack algorithm the barrier is at Amin. Thus, we use inequality (6.154) and instead of (6.51) we obtain A'0
P(51) <
2-Rmc
(6.182)
AO +Al
where A0 + A 1 < 0 and R < R0. Analogously to Theorem 6.10 we can prove
Theorem 6.16 If we use the Fano metric, then for the ensemble of rate R = b/c, timevarying convolutional codes of memory m the burst error probability for the Fano algorithm
310
Chapter 6
Sequential Decoding
when used to communicate over the BSC is upper-bounded by
0(1)2-Rmc
P(E1) <
0(m)2-Romc
O(1)2-G(s)mc
0 < R < Ro R = Ro
(6.183)
Ro < R < C
where s is the solution of (6.106), G(s) is the Gallager function given by (4.208), and Ro is the computational cutoff rate. For the Gallager metric we can show the following counterpart to Theorem 6.11: Theorem 6.17 If we use the Gallager metric, then for the ensemble of rate R = b/c, timevarying convolutional codes of memory m the burst error probability for the Fano algorithm when used to communicate over the BSC is upper-bounded by O(m)2-Romc
P(El) <
{
O(1)2 -G(s)mc
0 < R < Ro
Ro < R < C
(6.184)
where s is the solution of (6.106), G(s) is the Gallager function given by (4.208), and Ro is the computational cutoff rate. Finally, when we use the Fano algorithm with a finite back-search limit, we can show the following counterpart to Theorem 6.12:
Theorem 6.18 If we use the Zigangirov metric, then for the ensemble of rate R = b/c, time-invariant convolutional codes of memory m the burst error probability for the Fano algorithm with back-search limit m + 1 when used to communicate over the BSC is upperbounded by O(m)2-(Ro-R)(m+l)c
tbs
O(1)2-(G(s,)-s,R)(m+1)c
{
0 < R < Rcrt Rcrit < R < C
(6.185)
where sZ is given by (6.13 1), G(s,) is the Gallager function given by (4.208), Ro is the computational cutoff rate, and Rciit is the critical rate.
6.9 ANALYSIS OF CREEPER* For the analysis of the computational behavior, we consider as before the ensemble E (b, c, 00, 1)
of binary, rate R = b/c, time-invariant convolutional codes with infinite memory. The following properties are quite similar to those of the Fano algorithm:
Property C1 A node u[o,t) is extended by Creeper only if its metric µ(r[o,t), v(o,t)) is not less than the threshold T. From the description of Creeper it follows that the value of the threshold is a multiple of the stepsize A and that it is decreased at least twice between two successive visits to a certain node. Hence, we have
Property C2 The threshold decrement between two successive visits to a certain node is at least 2A. Property C2 implies that Creeper will not get stuck in a loop. Property C3 Let ttmin denote the minimum metric along the correct path. The minimum value of the threshold for Creeper is lower-bounded by
Tmin > /Lmin - 2A + c/
(6.186)
311
Analysis of Creeper*
Section 6.9
Property C4 For each value of the threshold T Creeper can visit a node at most once.
From Properties Cl and C4, it follows that for given T;n = y the number of visits to any node with metric µ = z, where z > y, is upper-bounded by
z - y < z-y + 20 2A
(6.187)
Thus, since the root has metric 0 it can be visited at most -y/2A + 1 times. Analogously to (6.156) we can show that the conditional expectation of the number of computations needed to decode the first correct node given that Tmin = y is upper-bounded by the inequality E[C1 I Tmin = Y] <
-ZA + 1 + E(2b -
(z - Y + 1) fj(Y, Z)
1)2b(j-1)
j=1
(6.188)
z?y
The upper bound (6.188) for Creeper differs from its counterpart for the Fano algorithm only by the factor of 2 in front of the stepsize A. Therefore, repeating the steps (6.157)-(6.166) yields E[C1 I Tmin = YA < (1 \\
- 2(1E[Zi1 - 2-b)A
22XIo-b-xic142-A'y
(6.189)
4AiA-b-2xicP 2 xly
(6.190)
Combining Property C3 and (6.189) yields (cf. (6.167)) E[C1
I
/min = A <
E[Zi1
1 - 2(1 - 2-b)A) 2
The function 24A A/ A attains its minimum when 1
(6.191)
4X1 In 2
Since in a practical situation the second term in the parenthesis in the right-hand side of (6.190) is essentially greater than 1, we use (6.191) as a good approximation of the optimal value of the stepsize. Inserting (6.191) into (6.190) gives
E[C1 I Amin = A < Cc2-'"y
(6.192)
where
Cc = C1 _
2(a + fi).i In 2) e2-b-2x,cfi 1 - 2-b
(6.193)
and analogously to (6.175) we obtain
C. = E[C1] < Cc
Xo
.0 + 7.1
(6.194)
Thus, we have proved
Theorem 6.19 The average number of computations per branch for Creeper when used to communicate over the BSC is upper-bounded by Cav < Cc
)10
Ik0 +;.1
(6.195)
where Ao and Al are the negative and positive roots of equations (6.33) and (6.50), respectively, such that Al < 0, and Cc is given by (6.193).
From (6.104) and (6.107), we have
312
Chapter 6
Sequential Decoding
Corollary 6.20 If we use the Fano metric, the average number of computations per branch for Creeper when used to communicate over the BSC is upper-bounded by Cap < CC
s
s-
for R < Ro
(6.196)
where s is the solution of (6.35) and Ro is the computational cutoff rate.
Again we can use the Markov inequality (Lemma 5.14) and obtain
Theorem 6.21 When Creeper is used to communicate over the BSC then the computational distribution for the number of computations needed to decode any correct node, Ci, i = 1, 2, ... , is upper-bounded by
P(Ci > x) < Cavx-I,
for R < Ro
(6.197)
where Ca, is the average number of computations per branch and Ro is the computational cutoff rate. The lower bound on the computational distribution given in Theorem 6.9 is also valid for Creeper. By comparing Cc and CF, one might expect Creeper to perform worse than the Fano
algorithm. In our proof technique, we (have to) use a lower bound on the threshold, viz., Turin > Amin -20+c1, which does not show that the average behavior of Creeper is much better. For the Fano and stack algorithms, we can use the more realistic bounds, Tmin > AAmin - 0 and Tmin = Arvin, respectively. The simulations reported in Section 6.5 show the superior computational performance of Creeper compared to the Fano algorithm. For the error probability, we can analogously to Theorem 6.16 prove
Theorem 6.22 If we use the Fano metric, then for the ensemble of rate R = b/c, time-varying convolutional codes of memory m the burst error probability for Creeper when used to communicate over the BSC is upper-bounded by O(1)2-Rmc,
P(EA) <
O(m)2-ROmc
0(l)2-G(s)mc
0 < R < Ro R = Ro Ro < R < C
(6.198)
where s is the solution of (6.106), G(s) is the Gallager function given by (4.208), and Ro is the computational cutoff rate. For the Gallager metric, we can show the following counterpart to Theorem 6.17:
Theorem 6.23 If we use the Gallager metric, then for the ensemble of rate R = b/c, time-varying convolutional codes of memory m the burst error probability for Creeper when used to communicate over the BSC is upper-bounded by
P(-'i) <
0(m)2 -Romc
0 < R < Ro
O(1)2- G(s)mc
Ro < R < C
(6.199)
where s is the solution of (6.106), G(s) is the Gallager function given by (4.208), and Ro is the computational cutoff rate. Finally, when we use Creeper with a finite back-search limit, we can show the following counterpart to Theorem 6.18:
Theorem 6.24 If we use the Zigangirov metric, then for the ensemble of rate R = b/c, time-invariant convolutional codes of memory m the burst error probability for Creeper with
Problems
313
back-search limit m + 1 when used to communicate over the BSC is upper-bounded by 0(m)2-(Ro-R)(m+1)c
P(E s) <
J1
0(1)2-(G(s,)-s,R)(m+1)c
0 < R < Rcrit Rcr,t < R < C
(6.200)
where sZ is given by (6.131), G(si) is the Gallager function given by (4.208), Ro is the computational cutoff rate, and Rcrt is the critical rate.
6.10 COMMENTS
Wozencraft [Woz57] presented the first sequential decoding algorithm. It inspired all the subsequent discoveries but is much more complicated as well as less effective than the other sequential decoding algorithms. In 1963, Fano published a most ingenious algorithm for sequential decoding [Fan63]. It is still considered to be the most practical sequential decoding algorithm. The Fano algorithm was intensively studied by graduate students at MIT during the 1960s; among them were Falconer [Fa166], Haccoun [Hac66], Heller [He167], Savage [Sav66], Stiglitz [Sti63], and Yudkin [Yud64]. It was used for space and military applications in the late 1960s. Zigangirov published the stack algorithm in 1966 [Zig66] and Jelinek in 1969 [Je169]. Zigangirov used recursive equations to analyze sequential decoding algorithms [Zig74]. An embryo of Creeper appears in [Zig75]. It was further developed in [CJZ84b], and the final version and its analysis are given in [Nys93] [NJZ97]. Among all of those analysts who addressed the problem of bounding the distribution function of the number of computation, we would like to mention Jacobs and Berlekamp [JaB67].
Massey [Mas75]: "It has taken a long time to reach the point where one understands why the early sequential decoding algorithms `worked' and what they were really doing"
PROBLEMS 6.1 Consider the rate R = 1/2, memory m = 2 convolutional encoding matrix G(D) = (1 + D + D2 1 + D2). (a) Draw the complete code tree for length f = 3 and tail m = 2. (b) Find the Fano metrics for a BSC with crossover probability E = 0.045, that is, R = Ro = 1/2. (c) Use the stack algorithm to decode the received sequence r = 00 01 10 00 00. How many computations are needed? (d) Repeat (c) with the Fano algorithm. (e) Are the decisions in (c) and (d) maximum-likelihood? 6.2 Repeat Problem 6.1 (c-c) for r = 01 01 00 00 11. 6.3 Consider a binary input, 8-ary output DMC with transition probabilities P(r I v) given in Example 6.2. Suppose that four information symbols (and a tail of three dummy zeros) are encoded by G (D) = (1 + D + D2 + D3 1 + D2 +D 3). Use the stack algorithm to decode the received sequence r = 0202 0104 0304 0213 1411 1312 04036.4 Repeat Problem 6.3 for G(D) _ (1 + D + D2 1 + D2) and r = 1104 0112 1101 0111 1301 0403. Is the decision maximum-likelihood? 6.5 Consider the BEC with erasure probability 8 = 1/2. Suppose that three information symbols
(and a tail of two dummy zeros) are encoded by the encoding matrix G(D) = (1 + D + 1 + D2). (a) Find the Fano metric. (b) Use the stack algorithm to decode the received sequence r = 1 A A A OA A I A A. D2
314
Chapter 6
Sequential Decoding
6.6 Consider the Z channel shown in Fig. P6.6. Suppose that three information symbols (and a tail of two dummy zeros) are encoded by the encoding matrix G(D) = (1 + D + D2 1 + D2). (a) Find the Fano metric. (b) Use the stack algorithm to decode the received sequence r = 01 00 00 10 10. (c) Is the decision in (b) maximum-likelihood?
Figure P6.6 The Z channel used in Problem 6.6.
6.7 Consider a rate R = 1/2 convolutional code with encoding matrix G(D) = (1 + D + D2 1 + D2) that is used to communicate over a binary, input, 8-ary output DMC with transition probabilities P(r I v) given by the table:
r
V
0 1
04
03
02
01
11
12
13
14
0 .2196 0.0027
0 .2556 0.0167
0 .2144 0.0463
0 . 1521 0.0926
0.0926 0.1521
0.0463 0.2144
0.0167 0.2556
0.0027 0.2196
Suppose that four information symbols (and a tail of two dummy zeros) are encoded.
(a) Use an appropriate choice of integer metrics and the stack algorithm to decode the received sequence r = 0213 1202 0113 1111 1214 1111. (b) Is the decision in (a) maximum-likelihood?
6.8 Consider a rate R = 1/3 convolutional code with encoding matrix G(D) = (1 + D + D3 1 + D2 + D3 1 + D + D2 + D3) that is used to communicate over the BSC with crossover probability E = 0.095. (a) Find Ro for this channel. (b) Suppose that a tail of three information symbols is used. Make an appropriate choice of integer metrics and use the stack algorithm to decode the received sequence. r = 010 001 101 100 000 011 110. 6.9 Consider the rate R = 1/2, convolutional code with the systematic encoding matrix G (D) = (1 1 + D + D2 + D4) that is used to communicate over the binary input, 8-ary output DMC with transition probabilities P(r I v) given by the table: r
V
0 1
04
03
02
01
11
12
13
14
0.1402 0.0001
0.3203 0.0024
0.2864 0.0177
0.1660 0.0671
0.0671 0.1660
0.0177 0.2864
0.0024 0.3203
0.0001 0.1402
Suppose that after the information symbols and a tail of four dummy zeros have been encoded, the four "known" (systematic encoder!) zeros in the encoded tail are deleted before transmission. The sequence r = 1201 0212 1311 1211 0312 is received. (a) How many information symbols were encoded? (b) After an appropriate choice of integer metrics, use the stack algorithm to decode r.
Problems
315
6.10 Show that the factor +x a in the upper bound on the average number of computations per branch for the stack algorithm (Theorem 6.5) regarded as a function of the bias B achieves its minimum for B = R, that is, for the Fano metric. 6.11 Show that for rates R < Ro and bias B < Ro the positive root X1 of the equation
(12'i° + I2a'6) = 2-R 2
is strictly less than 1/2 (cf. Corollary 6.7).
2
I
N Iterative Decoding
In the three previous chapters, we have described and analyzed Viterbi decoding, which is a nonbacktracking decoding method that fully exploits the error-correcting capability of a code of low complexity; list decoding, which is a nonbacktracking decoding method that partially exploits the error-correcting capability of a code of relatively high complexity; and sequential decoding, which is a backtracking decoding method that at the cost of variable computational efforts (asymptotically) fully exploits the error capability of a code of high complexity. In this chapter, we address an entirely different decoding method-iterative decoding. An iteration results in an a posteriori probability that is used as input for the next iteration. Iterative decoding is very powerful and gives at reasonable complexities performances close to Shannon's capacity limits. For equiprobable information sequences, the Viterbi algorithm outputs the most probable transmitted code path. We consider a posteriori probability (APP) decoding algorithms, which calculate the a posteriori probabilities for each of the transmitted information symbols. Combined with a decision rule which decides in favor of the most likely information symbols given the received sequence, we have a decoder whose outputs are the most probable information symbols. In this latter case, the corresponding sequence of estimated code symbols is, in general, not the same as that of the most probable code path.
7.1 ITERATIVE DECODING-A PRIMER In order to explain iterative decoding, we consider the systematic encoder for a rate R = 1/3 parallel concatenated convolutional code given in Fig. 7.1. Remark. An encoder of the type shown in Fig. 7.1 is often used together with iterative decoding. During the decoding, the output is partly fed back in a manner that somewhat resembles the principle behind the turbo engine. Therefore, when a convolutional code is encoded by a parallel concatenated scheme, it is often with an abuse of notation called a "turbo code."
Assume that the interleaver is very large, such that symbols that are close in time in the information sequence are, in general, far apart in the interleaver output. We also assume that 317
Chapter 7
318
Iterative Decoding
(2)
Figure 7.1 A rate R = 1/3, systematic, parallel concatenated convolutional encoder. the sequence
v = vov i... = ll00 v(1)v(2)uiv(1)v(2) o 1
(7.1))
1
is transmitted over the additive white Gaussian noise (AWGN) channel. The first symbol in each triple vt is the corresponding information symbol, ut, and v(1) = v0o1)v (1) i
..
(7.2)
and
v(2) = V(2)V(2) 0
(7.3)
1
are the first and second check sequences, respectively. Let
r = r0r1 ... = r0 r0 r0 (0)
(1)
(2)
(0)
r1
ri(1) r1(2)
...
(7.4)
denote the received sequence and let P (ut = 0) denote the a priori probability that information
symbol ut = 0, t = 0, 1, .... In practice, we often have P(ut = 0) = 1/2. Each iteration of our iterative decoding is executed in two phases. The idea is that the a posteriori probabilities of one phase are used as the a priori information for the following phase. The decoder consists of two a posteriori probability (APP) decoders. Such decoders are described in the following sections of this chapter.
Let r (i), £ = 1, 2, i = 1, 2, ..., denote the a posteriori probability that the tth information symbol, ut, equals zero, which is obtained in fth phase of the ith iteration. We also let
nt(0)
def
P(ut = 0)
(7.5)
That is, 7rt (0) is the a priori probability that ut equals zero. Let r(0), r('), and r(2) denote the sequences of received symbols corresponding to the information sequence u, the first check sequence v(1), and the second check sequence v(2), respectively. Then we let r(() denote the sequence of the received symbols corresponding to
the information sequence u except the received symbol rt°); that is, (0) (0) r/(o) = r°(0)ri(0) ... rt-Irt+i ...
(7.6)
During the first phase of the first iteration, the decoder calculates the a posteriori probabilities
that the information symbols ut, t = 0, 1, ... , equal zero, given the received sequences corresponding to the information sequence u and the first check sequence v(1); that is, 7r
(1) (1) def P (ut
= 0 1 r(0), r('))
(7.7)
Section 7.1
Iterative Decoding-A Primer
319
Next, we would like to separate the dependence on ri°). Hence, we rewrite (7.7) as
nil)(1) = P (ut = 0 I ri0) (,O), r(1)) P
(ri0) I
ut = 0) nr(0)P (rO, r(1) I Ut = 0)
(7.8)
P (r(0) r(0), r(1)/
Since the channel is memoryless by assumption, P(rr°)
I
ut = 0) does not depend on the a
priori probabilities P(uj = 0) for j = 0, 1. .... Analogously to (7.8), we have
1 -4rt1)(1) = P (ur = 1
ri°),rO,r(1)) I
P(ri°) Jut=1)(1-nr(0))P(r,),r(1) but=1)
(7.9)
P (r,(0), r(0), r(1))
Let A(') (1) denote the ratio of the a posteriori probabilities for the information symbol ut after the first phase of the first iteration; that is,
Ail (1)
def
ni1)(1)
(7.10)
1 - ntl)(1)
Then, combining (7.8) and (7.9) yields
A(1)( 1) = At
nt(0)
P (rr°)
u`
1 - 7T, (0) p (r(0) r
0) P (r>,r(1)
0)
ut
(7.11)
Ut = 1) P (r(0) f r(1) ur = 1)
The ratio At (0) def
nt (0)
(7.12)
1 - nt (0) is the likelihood ratio of the a priori probabilities of the information symbol ut. If P(ut = 0) = 1/2, then At(0) = 1. We introduce the likelihood ratio for the received symbol tion symbol ut, Lint
def
41 °)
corresponding to informa-
P rt(°) ut = 0 P (ri°)
(7.13)
lur=1)
and call it the intrinsic information about the information symbol ut. Finally, we have the extrinsic information about the information symbol ut obtained after the first phase of the first iteration from the received sequence r() corresponding to the information sequence except the information symbol ut and from the received sequence r(1) corresponding to the first check sequence, viz., Alxt(1)(1) def
P ((0)° r r(1)
P (r,
r(1)
ut = 0)
(7.14)
I Ut = 1)
Thus, we can rewrite (7.11) as
A(')(1) = At(0)Lint(ri°))AiXt(1)(1), t = 0, 1, which is the outcome of the first phase of the first iteration.
...
(7.15)
320
Chapter 7
Iterative Decoding
During the second phase of the first iteration, the decoder calculates the likelihood ratios
of the a posteriori probabilities for the information symbols ut, t = 0, 1, ... , based on an interleaved version of the received sequence r(0) and on the received sequence r(2) which corresponds to the second check sequence. Furthermore, the decoder can also exploit its knowledge of the a posteriori probabilities 7rt 1)(1), t = 0, 1, ... , or, equivalently, the ratio A(1) (1), t = 0, 1, ... , obtained from the first phase of the first iteration. In the calculation of A(l)(1), we use the intrinsic information about the information symbol ut, the likelihood ratio of the a priori probabilities of the information symbol ut, as well as the extrinsic information about the information symbol ut obtained from the received sequences stemming from the information sequence and the first check sequence. To obtain the ratio of a posteriori probabilities A (2) (1) we add only the influence of the extrinsic information about the information symbol ut obtained during the second phase of the first iteration from the received sequence r j) corresponding to the information sequence except the information symbol ut and from the received sequence r(2) corresponding to the second check sequence, viz.,
P rt , r (o)
def
(2)
ut = 0
(7.16)
P (rf),r(2) I ut = I) If the interleaver is very large, then the sequence r(o) used in (7.16) is independent of the sequence r() used in (7.14). The a posteriori probabilities after the first phase of the first iteration are used as a priori probabilities when we calculate the ratio of the a posteriori probabilities Ail) (1) after the second phase of the first iteration. Then we obtain an expression that contains the factor Lint(ri°)) twice-the extra factor of Lilt(r,°)) comes from the a priori probabilities via Art) (1). To avoid using the same information twice in our decoder, we simply cancel one factor of Lint (rr °)) and use
A(2)(1) =
At(0)Lint(r(0))Aext(1)(1)Arxt(2)(1)
(7.17)
as the ratio of a posteriori probabilities after the second phase of the first iteration. The a posteriori probabilities calculated during the second phase of the first iteration can be used as a priori probabilities during the first phase of the second iteration. Then we will get the following approximation of the ratio of the a posteriori probabilities for the tth symbol
ti Ar2)(1)Lint(rt°))Atxt(1)(2) (o)
= At(0)Lint (rt )At
xt(1)
ext(2)
(I)At
ext(1)
(I)L int (rt )At (o)
(7.18)
(2)
(This is only an approximation of the ratio of the a posteriori probabilities since we do not have independence between the data used in the different iterations.)
The information obtained from the received sequences r(°) and r(i) is used in both (The received sequence r(0) is also used in Atxt(2)(1), but since
Arxt(1)(1) and
the interleaver is very large, in practice it can be regarded as an independent sequence.) This means that we are using the same information twice, first as a priori probability in Atxt(1)(1) and then as channel information in Arxt(t)(2). To avoid this, we simply exclude the factor Aext(1) (1) together with one of the two factors Lint(ri°)) when we calculate A(1)(2) and instead of (7.18) we have
A(1)(2) =
At(0)Lint(rt°))Atxt(2)(I)Atxt(1)(2)
(7.19)
The equality in (7.19) holds only if all symbols of the sequences r(0), r(i), and r(2) that are involved in the calculations of Aext(2)(1) and Aext(l) (2) are independent. Otherwise (7.19) is only an approximation.
Section 7.2
The Two-Way Algorithm for APP Decoding
321
In the next phase, we use (0)Lint(rlo))A,xt(1)(2)A,X`(2'
At 2'(2) =A,
(2)
(7.20)
Again, the equality in (7.20) holds only if the involved symbols are independent. The decoder alternates in this way for 10 to 20 iterations, or until convergence is achieved. Eventually, we will have dependence between these symbols. (The larger the interleaver is, the longer it will take.) Then our expressions will only be approximations of the ratios of the a posteriori probabilities, but for simplicity we will keep our notations. The decision of ut is obtained by comparing the final likelihood ratio with the threshold 1. Performance within 1 dB of the Shannon limit can be achieved. An iterative decoder is illustrated in Fig. 7.2. A«t(2)(n - 1)
At(O)Lint(r(to))
Deinterleaver
APP
Aext(1)(n)
decoder 1
Interleaver
At(0)Lint(rio))
At(O)Lint(r(t0)) Interleaver
i
i
APP decoder 2
After N Aert(2)(n)
iterations
At(2)(N)
t
Figure 7.2 Iterative decoding procedure.
7.2 THE TWO-WAY ALGORITHM FOR APP DECODING
The two-way algorithm is the most celebrated algorithm for APP decoding of terminated convolutional codes.
Suppose that a binary, rate R = b/c, convolutional code of memory m is used to communicate over the binary input, q-ary output DMC (introduced in Section 4.7). Let, as before, v denote the code sequence, r the received sequence, P(r I v) the channel transition probabilities, and use the zero-tail (ZT) method to terminate the convolutional code into a block code of block length N = (n + m)c code symbols. Furthermore, let U[O,n+m) = UO U I ... Un+m-1
= U0(I)U0 ...U0 U1 U1(2) (2)
(b)
(1)
(b)
... U1
a) (2) (b) ...Un+m-IUn+m-I...Un+m-1
(7.21)
denote the sequence of bn information symbols followed by a tail of bm dummy symbols, which terminates the convolutional code into a block code. We let P(u;k) = 0) denote the a priori probability that information symbol 0, i = 0, 1, ..., n - 1, k = 1, 2, ..., b. In general, P(u(ik) = 0) 1/2. Let P(ink) = 0 1 r[O.n+m)) denote the a posteriori probability that u;k) = 0 given the received sequence r[o,n+m), where r[o.n+m) = r0 rl ... rn+m-1
= r0 r0 ... r0 r1 (1)
(2)
(c)
(1)
(2)
r1
(2) (c) (1) rn+m-I ... rn+m-1 ... r1(c) ... rn+m-I
(722)
Let U[O.n) denote the set of all information sequences u0u1 ... un-I, and let ll[0,n)i denote the
Chapter 7
322
Iterative Decoding
set of information sequences uoul ... un_1 such that ui(k) = 0. Then we have the following expression for the a posteriori probability that u(k) = 0 given the received sequence r[o,n+m):
p
(k)
/
\
u
(k)
P(r[O,n+m))
EL[(k)
P(r[O,n+m) I u[O,n))P(u[O,n))
(7.23)
Y-U[o,,,)EU[o,n) P(r[O,n+m) I u[O,n))P(u[O,n))
where P (r[o,n+m) I u [0,n)) is the probability that r[o,n+m) is received given that the code sequence
corresponding to the information sequence u[o,n) followed by a tail of m dummy zeros is transmitted and P(u[o,n)) is the a priori probability of the information sequence u[O,n). Our goal is to compute the sequence of a posteriori probabilities P(uo1) = 0 I r[o,n+m)), P(uoz) = 0 I r[o,n+m)), ... , P(uob) = 0 I r[O,n+m)),
P(u(') = 0 r[on+m)), P(ui2) = 0 I r[o,n+m)), ... , P(uib) = 0 I r[o,n+m)), ... , P(un11 = 0 I r[o,n+m)), P(un-1 = 0 I r[o,n+m)), ... , P(unb1 = 0 I r[O,n+m)) For an encoding matrix of overall constraint length v, we denote for simplicity the encoder state at depth t by at, where a t = or, or = 0, 1, ... , 2° - 1. We denote by S[O,n+m] the set of state sequences a[O,n+m] such that ao = an+m = 0, that is, def
S[O,n+m]
{a[O,n+m] = oval ... an+m I 0'0 = an+m = 01
(7.24)
and by S[o n+m]i the set of state sequences a-[O,n+m] such that ao = an+m = 0 and that the transition from state ai at depth i to state ai+1 at depth i + 1 implies that ui(k) = 0, that is, (k) S(k)
def
= {a[o,n+m] = a0a1 ... an+m I 0`0 = an+m = 0
(7.25)
= U(k) = 0) We always start and end the encoding in the zero state, and we have a one-to-one correspondence between the information sequence u[o,n) and the sequence of encoder states a[O,n+m] = a0a1 . . . an+m, where ao = an+m = 0. Hence, we can rewrite (7.23) as & or,
P ui(k)
a,+1
= 0 I r[O,n+m)) F_.,[o,n+m] ES[(o
P
Y_n[o,,,+m]ES[o.n+m]
(r[O,n+m)
I a[0,n+m]) P (a[0,n+m] )
(7.26)
P(r[O,n+m) 10'[0,n+m])P(Q[0,n+m])
a[o,n+m]) is the probability that r[o,n+m) is received given that the code is transmitted and is the a sequence corresponding to the state sequence where P(r[O,n+m)
I
priori probability of the state sequence a-[O,n+m]
Let P (r,, at+1 = a'
I
at = a), where Or, a' E {0, 1, ... , 2° - 1), be the conditional
probability that we at depth t receive the c-tuple rt and that the encoder makes the state transition
to at+1 = a' at depth t + 1, given that it is at at = or at depth t, and let P(at+1 = a' I at = a) be the probability of the same state transition. Next, we introduce the 2° x 2° state-transition matrix
Pt = (pt(a, a')),,
(7.27)
pt(a, a') = P(rt, at+1 = a' I at = a) = P(rt I at+1 = a', at = a)P(at+1 = a' I at = a)
(7.28)
where
and or, a' E {0, 1, ... , 2° - 11. The matrix Pt is a sparse matrix; in each row and each column, only 2b elements are nonzero.
The Two-Way Algorithm for APP Decoding
Section 7.2
323
Let
ei=(0...010...0), 0
(7.29)
i
Consider the product
eoPoPl ... Pn+m-i = (Y 0... 0)
(7.30)
where the equalities to 0 in the last 2' - 1 entries follow from the fact that the tail of m zero state driving b-tuples causes the state transitions an -4 an+1 - -' an+m to terminate in an+m = 0. The value y obtained by (7.30) is the conditional probability that we receive r[o,n+m) given that a code sequence is transmitted, that is, - -
Y=
(7.31)
P(r[o,n+m) 10'[O,n+m])P((7[O,n+m]) Q[O,n+m] ES[Q,,,+m]
which is the denominator of (7.26). In order to calculate the numerator of (7.26), we introduce, as a counterpart to Pt, the state-transition matrix Pp(k) _ (Pt") (a,
a'))
(7.32)
where
prk)(a> a') = P (rr, ar+1 = a', utk) = 0 I ar = a)
= P (rt I at+1 = a', at = or, utk) = 0) (u(k) X P (at+1 = a' I at = or, u(tk) = 0) P
(7.33)
0)
and Or, a' E {0, 1, ... , 21 - 1}. The matrix element p(k)(a, a') is the conditional probability that we at depth t receive the c-tuple rt, that the encoder makes the state transition to at+1 = a'
at time t + 1, and that the kth information symbol at depth t is utk) = 0, given that it is at at = or at depth t. As a counterpart to (7.30), we have eoPoPl .. . Pt-1Pi(k)Pi+1 ... Pn+m-1 = (yi(k)0...0)
(7.34)
where yi(k) is the conditional probability that we receive r[o,n+m), given that a code sequence corresponding to u(ik) = 0 is transmitted, that is, Yi(k) _
P(r[O,n+m) 10,[O,n+m])P(a[O,n+m])
(7.35)
[n+m] ES[0 n+m]i
which is the numerator of (7.26). Hence, we can rewrite (7.26) as
P
(u(k)
= 0 I r[o,n+m)) =
yi(k)/Y
(7.36)
where y and yi(k) are given by (7.30) and (7.35), respectively.
The most crucial part of the two-way algorithm is the calculation of yi(k) for i =
0, 1, ... , n - 1 and k = 1, 2, ... , b. First, we start at the root and go forward and calculate aef (7.37) ai = (ai (0) ai (1) ... ai (2° - 1)) eoPoPl ... Pi-1, 1 < i < n + m By convention, we have ao = eo. For each depth i, i = 1, ... , n + m, the components of ai
are stored at the corresponding states. Then we start at the terminal node at depth n + m, go
Chapter 7
324
Iterative Decoding
backward, and calculate pIk)
_
(k) (0) fik)(1) ...,0ik)(2°
def = eoP
- 1)) (7.38)
T
+m-1 Pn+m-2
Pi+1 (Pi(k))
,
0 < i < n, 1 < k < b
By combining (7.34) with the definitions of ai and )3 k), we obtain 2'-1 9ik)(a),
Yi(k) _ E ai (a)
0
(7.39)
U=0
From (7.31) and the definition of ai (a) it follows that Y = an+m (0)
Hence, combining (7.36), (7.39), and (7.40) yields EQ=o ai (a)pi(k) (a) 0 < i < n, 1 < k < b P (k) _ an+m(0)
(7.40)
(7.41
Since the matrix Pt is sparse, it is efficient to compute yi(k) by trellis searches. For each state we will calculate both forward and backward multiplicative metrics. In the forward direction, we start at depth t = 0 with the metric
u.n(al =
or = 0 otherw ise
1,
1 0,
(7.42)
Then for t = 1, 2, ... , n + m we calculate the forward metric At (or) as 2'-1 it, W) = Y, µt_i(a)pt-1(a, a'), 0 < a' < 2v
(7.43)
a=o
where pt-I (a, a') is given by (7.28). The sum has only 2b nonzero terms. The forward metrics At (a') are stored at the corresponding states. It is easily verified that
µi(a)=ai(Q), 0
(7.44)
where ai (a) is given by (7.37). In the backward direction, we start at depth n + m with the metric
µn+. (a") =
1
1,
a'=0
0,
otherwise
(7.45)
Then for t = n + m - 1 , n + m - 2, ... , 1 we calculate the backward metric 2°-1
At+1(a')pt(a, a'), 0 < a < 2°
At(a)
(7.46)
a'=0
where pt (Q, a') is given by (7.28). It is easily shown that 2°-1
ik>(or),
p(k)(a,
0 < i < n, 1 < k < b, 0 < a < 2°
(7.47)
a'=0
where pik)(a, a') and
P,k)(a)
are given by (7.33) and (7.38), respectively.
Finally, we obtain
2-1 2-1 Yi(k)
= a=0 L a'=0 L µi (a)pik)(a, a')µi+i (U')
The two-way algorithm can be summarized as follows:
(7.48)
Section 7.2
The Two-Way Algorithm for APP Decoding
325
The two-way algorithm for APP decoding TWA1. Initialize µo(0) _ %tn+m(0) = 1, AO(a) _ i2n+m(a) = 0 for all nonzero states
(0,00). TWA2. For t = 1, 2, ... , n + m calculate the forward metric 2"-1
It, (0") = E At-1(a)pt-1(a, a'), 0 < a' < 2° a=0
TWA3. For t = n + m - 1, n + m - 2, ... , 1 calculate the backward metric 2°-1
t-tt+1(a')pt(a, a'), 0 < a < 2°
At (a) a'=0
TWA4. For i = 0, 1 , ... , n - l and k = 1, 2, ... , b calculate 2"-1 2"-1 Yi(k)
=
/ti(a)Pik)(a, a')Ai+1(a') a=0 a'=O
and output
P (u(k) = 0 I r[On+m) = Yi1k)//tn+m(0) In iterative decoding, for example, we use the a posteriori probabilities that are calculated
by the two-way algorithm as a priori probabilities in the following step of the iteration. In maximum a posteriori probability (MAP) decoding, the a posteriori probabilities are used to obtain estimates of the information symbols; we simply use the rule Il(k) =
(7) .49
f 0, if p (uik) = 0 I r[,+,,)) > 1/2 1,
`
otherwise
EXAMPLE 7.1
Consider the binary input, 8-ary output DMC shown in Fig. 4.5 with transition probabilities P(r I v) given by the following table:
0 1
04
03
02
01
11
12
13
14
0.434 0.002
0.197 0.008
0.167 0.023
0.111 0.058
0.058 0.111
0.023 0.167
0.008 0.197
0.002 0.434
Suppose that the same encoding matrix G(D) = (1 + D + D2 1+D 2 ) as in Example 4.2 is used and that four information symbols followed by two dummy zeros are encoded. Assume that the a priori probabilities for the information symbols are P(u, = 0) = 2/3, t = 0, 1, 2, 3. For the dummy zeros we have P(u, = 0) = 1, t = 4, 5. Let r = 1104 0112 1101 0111 0113 0403 be the received sequence. The trellis is shown in Fig. 7.3. We will use the two-way algorithm to obtain the a posteriori probabilities P(ut = 0 1 r[o,6)), t = 0, 1, 2, 3. First, we calculate the probabilities pt (a, a') and pt') (or, a'). Then we calculate the forward metrics At (a') according to (7.42) and (7.43) and write the values next to the corresponding states in Fig. 7.4 (TWA2). The backward metrics At (a) are calculated according to (7.45) and (7.46) and their values are written next to the corresponding states in Fig. 7.5 (TWA3).
326
Iterative Decoding
Chapter 7 r =
1104
0112
0403
0113
Figure 7.3 The trellis used in Example 7.1.
1.000
1.678.
10-2
2.85610-5
4.57210-7
1.22910-7
6.263-10-8
9.621.10-12
2.44210-9
8.22710-13
3.22510 10
Figure 7.4 The forward metrics µ1(a') are written next to the corresponding states.
8.22710 13
4.808-10-11
5
6.103-10-9
3.25910 7
7.59210
3.369-10-8
8.025-10-6
3.49910 7
8.55010-2
Figure 7.5 The backward metrics µ1(a) are written next to the corresponding states.
We have now reached step TWA4 and calculate y,(1) according to (7.48). Then we obtain
yol) -0-8069. 10-14 yi 1) = 0.1747 10-14
yzl) -0-1854. 10-14 Y3(1) -0-8226. 10-14
Section 7.2
The Two-Way Algorithm for APP Decoding
327
corresponding to the four information symbols and (to check our calculations)
Y41) = 0.8227 10-14
Y/) = 0.8227. 10-14 corresponding to the two dummy zeros in the tail. Since t6(0) = 0.8227 10-14 (see Fig. 7.4), we have the a posteriori probabilities
p (uo1) = 0 I r[o5)) = 0.9808
p (ui) = 0 I r[o,6)) = 0.2124 p (t41) = 0 I r[o6)) = 0.2254
p
(t41)
= 0 I r[o6)) = 0.9999
(and for the two dummy zeros, as expected, p (u41) = 0 I r[o6)) = 1.0000
P
(u51)
= 0 I r[0,6)) = 1.0000)
Using (7.49), we obtain the estimated information symbols u o)4) = uo1)ui1)u21)u31) = 0110. It is interesting to notice that the maximum-likelihood sequence estimate obtained by the Viterbi algorithm in Example 4.2 is the same.
In Fig. 7.6, we show the bit error probabilities when the two-way algorithm is used to communicate over the AWGN channel. Simulation results are shown for the two rate R = 1 /2
encoding matrices GI(D) = (1 + D + D2 1 + D2), and G2(D) = (1 + D + D2 + D3 + D6 1 + D2 + D3 + D5 + D6) for both L = 30 and 100 information symbols, followed by a tail of m dummy zeros. 100
Pb
Figure 7.6 Bit error probabilities for G1 (D) =
(1 + D + D2
1 + D2) and G2(D) = (1 +
D+D2+D3+D6 I+D2+D3+D5+ D6) for the two-way algorithm and the AWGN channel. The curves from left to right are G2(D) with L = 100 information symbols, G2(D) with
L = 30 information symbols, G1(D) with L = 100 information symbols, and G 1(D) with L = 30 information symbols.
328
Chapter 7
Iterative Decoding
Next, we assume that the encoder is systematic; that is,
= u(k), 1 < k < b
v(k)
(7.50)
From (7.36) we obtain
r [o,.+.)) = 1 -
P (u(k) = 1
Y - yi(k) (k)/
(7.51)
Let Aik) denote the likelihood ratio of the a posteriori probabilities for the kth information symbol in the ith b-tuple; that is, (k) def
Al
p (ui \
0
l
r[O ,n+m) )
(k)
(7.52)
)
(k)
Combining (7.36) and (7.51) yields -`(k)
A(k)
=
(7.53)
- Yi(k) Y
Now we will factorize A( k) into factors representing the intrinsic information about the information symbol uik), the ratio of a priori probabilities of u(k), and the extrinsic likelihood ratio. Let rig) denote the tth received c-tuple except its kth symbol; that is,
rt(A)
- rt(1) rt(2) ... rt(k-1)rt(k+l) ... rt(b)
(7.54)
Since the channel is memoryless, we can rewrite (7.33) as follows: Pi k) (or, Q')
= P ri XP
0'i+1 = Q', 0'i = a,
I
U( ik)
= 0)
= o, 0,i = a, uik) = 0) P u k)
(0,i+1
= P (ri(k) Qi+1 = Q', Qi = Or,
u(ik)
X P (ri K) 0'i+1 = Q', Qi = or,
0)
= 0)
u( ik)
= 0) (7.55)
xp(ai+1= P (rI(k)
I
uik)
XP (rig)
I
0) 0) p (uik)
0)
01i+1 = 0', 01i = 01, uik)
= 0)
XPiai+1=a'I Qi=O,u(ik)=0 where the last equality follows from the systematicity of the encoder (ri(k) depends only on u( k) and not on the state transition). Let us introduce piX)(O,
Q')
def
=P =P
riK>
Qi+1 = Q' I Qi = or, uik) = 0 Qi+1 = o'', ori = or, uik)
= 0)
X P (Qi+1 = Q' 1 ori = Q, uik) = 0)
(7.56)
Section 7.2
The Two-Way Algorithm for APP Decoding
329
Then, combining (7.55) and (7.56) yields p(k) (a
a') = P (ri(k)
uik)
= 0) P (U (k) = 0) pi ) (a a")
(7.57)
u(k)
= 0) P (uik) = 0)
(7.58)
or, equivalently, in matrix form p(k)
=P
(rl(k)
where (9)
> 0")) = pi(') (0"
Pi
(7.59)
Analogously to (7.55), we have
Pj(a, a') - p(k)(a, a') = P (r6(k) I u`k) = 1) P (uik) = 1)
x P (r`X) ai+t = a,, ai = or, u(k) = 1)
(7.60)
xP(ai+i=a'Iai=a,u;k)=1) or, equivalently, in matrix form
A - Pi(k) = p (r(k) I u;k) = I) p (u(k) = 1) PrX) 1.
I
(7.61)
where the "complementary" matrix (corresponding to u(k) = 1) Pic(K)
_ ( C°)(Q a')) pi
(7.62)
a a'
and where PiCW
' def (a, a) = P((lE) ri , ai+i = a'
=P
ai = or, u i(k) = 1
(rig) I a=+1 = a', a = or, u(k) = 1)
(7.63)
xP(ai+i=o'I ai=a,u;k)=1) Analogously to (7.36), we obtain the following expression for the likelihood ratio of the a posteriori probabilities for
(k) -
P (u( = 0 I r[Qn+m)
Al
(
)
k)
(
/
(7.64)
P (uik) = 0) p (ri(k) I u(k) = 0)
P (u` k) = 1) p
u(k)
1) = 1)
(K) Yi
yiCM
(r`(k)
where
yi(K)
and yi") are defined by eoPoPl ... Pi-I P i ( ' )
... Pn+m-t def (Yi(X)0...0)
(7.65)
and
eoPoPl .
.
.
Pi-1Pic(')Pi+t . .
. Pn+m-tdef - (Yi c(tE)0... o)
(7.66)
respectively. The first factor in (7.64) is the ratio of the a priori probabilities of the information symbol u;k), the second is the intrinsic likelihood ratio, and the third is the extrinsic likelihood ratio.
330
Iterative Decoding
Chapter 7
7.3 THE TWO-WAY ALGORITHM FOR TAILBITING TRELLISES
In Section 4.9, we described trellis representations of tailbiting block codes as well as their maximum-likelihood decoding. Here we consider a posteriori probability (APP) decoding of rate R = b/c tailbiting block codes of block length N = Lc code symbols, where L is the block length in branches. We assume that the underlying rate R = b/c convolutional code has memory m and overall constraint length v. Let at denote the encoder state at depth t, where
atE{0,1,...,2°-1). We impose the tailbiting condition that only those paths that start and end at the same state are valid. Then as a counterpart to S[O,n+m] in (7.24) we let S[0,L](a) denote the set of all
state sequences 0[0,L] such that a0 = aL = a; that is, S[0,L] (a)
def
{°[0,L] = a0a1 ...aL I ao = aL = a},
(7.67)
aE{0,1,...,2°-1}
As a counterpart to S(o,n+m]i in (7.25), we let S1 i (a) denote the set of state sequences Q[o,L]
such that a0 = aL = or and that the transition from state ai at depth i to state ai+l at depth i + 1 implies that u(k) = 0; that is, S[o L]i (a)
de_f
{a[0 L] = a0 a1 ... aL I ao = aL = a & ai
ai+1
u(k)
= 0}, (7.68)
aE{0,1,...,2°-1} To obtain the a posteriori probability that u`k) = 0 given the received sequence r[O,L), we have to sum both the numerator and denominator of (7.26) over all starting and ending states such that a0 = aL = or. Then we obtain the following expression: (k)
P ui(k)
(r[o L)+ Ui = 0 I r[0 L)
P
=
P (r[0, L) )
2°-1 Ea TfO,L]ES[oL],(Q) a=0
P(r[o,L) 10[0,L])P(0[o,L])
21-1
Es
Q
( 7.69 )
P(r[o,L) I ci[0,L1)P(0[0,L])
0
Ai=P0P1...Pi_1, 1
(7.70)
and (Pi(k))T
Bi(k) = Pi where Pi and
Pi(k)
1
Pi2 ... P+1
0 < i < L, 1
(7.71)
are given by (7.27) and (7.32), respectively. By convention
A0 = I2
(7.72)
where I2 is the 2° x 2° identity matrix. As a tailbiting counterpart to yi(k) (cf. (7.35)), we have the 2° x 21 dimensional matrix
CM = A (B(k))T
=PoP1...Pi-1(Pi(k)lIPi+1 ... PL 1,0
(7.73)
Because of the tailbiting condition, we are interested in the sum of the diagonal elements of
Section 7.3
The Two-Way Algorithm for Tailbiting Trellises
331
the matrices AL and C. Along the diagonals of AL and C;k) we have
T
P(r(o,L) 10[o,Ll)P(O[o,Ll), 0 < or < 2°
O[O,LJ ES[O,L] (o)
and
P(r[o,L) I 0[0,L])P(o[0,LJ), 0 < or < 2°
L_.. O[O. L]ES[Ok).L]i (Q)
respectively. Hence, we can rewrite (7.69) as P (u; k) = 0 I r[o,L))
_
TrC(k) `
Tr AL
,
0 < i < L, 1 < k < b
(7.74)
where Tr M m;; is called the trace of the matrix M = (m, ). Also in the tailbiting version of the two-way algorithm, we exploit the sparseness of the
matrix P, and compute the a posteriori probabilities P(uik) = 0 I r[o,L)) by trellis searches. Let us introduce the forward (vector) metric ii (a') = (Ato(a') N-:1(a') ...1Lt(2°-1) (a'))
(7.75)
In the forward direction, we start at depth t = 0 with the (vector) metric
l-to(a) = e0, 0 < a < 20
(7.76)
Then for t = 1, 2, ... , L we calculate the forward (vector) metric 2°-1
tit(a') = i ILt-1(a)pt-1(a, a'), 0 < a' < 2°
(7.77)
-o
where pt _1(a, a') is given by (7.28). The vectors Mt(a'), 0 < a' < 2°, are stored at the corresponding states. Analogously to the forward (vector) metric µt (a'), we introduce the backward (vector) metrics
/-t, (Or) _ (µto(a) lit1(a) ... µt(2"-1)(a))
(7.78)
and
µtk)(a)
_
(µ o)(a) µ 1) (a)
... At(2 ,-,)(Cr))
(7.79)
In the backward direction we start at depth L with the (vector) metric
PL(a)=e0, 0
(7.80)
Then for t = L - 1, L - 2, ... , 0 we calculate the backward metrics 2°-1
N't+i (a') pt (a, a'), 0 < a < 2°
ltt (a) =
(7.81)
=o
and
T-1 (k) (a) = Y Ftt+1(a')P(tk) (a, a'), 0 < a < 2V , 1 < k < b
(7.82)
a'=0
where pt (a, a') and p(tk) (a, a') are given by (7.28) and (7.33), respectively. Finally, we have the (scalar) metrics 2°-1
µ,o((,')jt1o)(a'), 0 < i < L, 0 < a < 2°, 1 < k < b =o
(7.83)
332
Chapter 7
Iterative Decoding
Since
2"-1
E AL, (a) = Tr AL
(7.84)
a=0 and
2°-1
1: t,;Q) = TrCik) 0 < i < L, 1 < k < b
(7.85)
a=0
we have the following two-way algorithm for a posteriori probability decoding of tailbiting trellises.
The two-way algorithm for APP decoding of tailbiting trellises
TWAT1. Initialize µo(a) = jL(a) = ea, 0 < or < 2° TWAT2. For t = 1, 2, ... , L calculate 2°-1
Nr(or') = E mt-1(a)pt-1(a, a'), 0 < a' < 2° U =O
T W A T 3 . For t = L - 1 ,
L - 2, ... , 0 calculate 2°-1
iit(a) _ E N-t+1(a')pt(a, o`), 0 < a < 2" a'=0 and 2°-1
flik)(a) _ Y 1jt+1(a')p(,k)(a, a'), 0< a< 2°, 1< k< b a'=0
TWAT4. For i = 0, 1, ..., L - 1 and k = 1, ..., b calculate 2°-1
(a,)N'ia)(a,),
0< I< L, 0< a< 2°, 1< k< b
a'=0
and output
P (Um = 0 1 r[0 L)) =
'`2°-1
(k)
L.a=O I'ia 2v-1
La=o ALa(a)
EXAMPLE 7.2 Consider the same channel as in Example 7.1 and suppose that the encoding matrix G(D) 1 + D) = (1 is used to encode a tailbiting representation of a block code of block length N = 6 code symbols. Assume that the a priori probabilities for the K = 3 information symbols are P(u, = 0) = 2/3, t = 0, 1, 2. The trellis together with the received sequence is shown in Fig. 7.7. r = 1104
0102
1102
00 r--i 00 r--i 00 v v
11
11
1
11
ul
ui
U1 1
1
1
Figure 7.7 The tailbiting trellis used in Example 7.2.
We will use the two-way tailbiting algorithm to obtain the a posteriori probabilities P(ut = 0 r[o,3)), t = 0, 1, 2. First we calculate the probabilities p, (a, a') and p,( l)(a, a'). Then we calculate the forward (vector) metrics p.t(a') according to (7.76) and (7.77) and write the values next to the corresponding states in Fig. 7.8 (TWAT2).
Section 7.3
333
The Two-Way Algorithm for Tailbiting Trellises 0.0077).10-2
10-4
(1.0000 0.0000)
(1.6781
(2.0750 0.2828).
(0.0000 1.0000)
(0.0074 1.6058) .10-2
(0.0770 0.5189)10-4
(1.3467
(0.2242
0.2287).10-6
0.3447).10-6
Figure 7.8 The forward (vector) metrics lit (a') are written next to the corresponding states.
The backward (vector) metrics µt (a) and µ(k) (a) are calculated according to (7.81) and (7.82), and their values are written next to the corresponding states in Figs. 7.9 and 7.10, respectively, (TWAT3). (1.3467
0.2242).10-6
(0.8019 0.1327) .10-4
0.2140)10-4
(0.1386
(0.2287 0.3447).10 6
(0.6457 0.0851).10-2
(1.0000 0.0000)
(0.0889 0.6179). 10-2
(0.0000 1.0000)
Figure 7.9 The backward (vector) metrics µ,(a) are written next to the corresponding
states. (1.3457
0.2227).10-6
(0.0062 0.0010)10 6
0.0000).10-2
0.1052)10-4
(0.7980
(0.6457
(0.1.099 0.0145). 10-4
0.0000).10-2
(0.0889
Figure 7.10 The backward (vector) metrics jii])(a) are written next to the corresponding states.
We have now reached step TWAT4 and calculate µ(1) according to (7.83). Then we obtain
µ0o + µol) = 1.3467
.10-6
/ii0+i =1.3640.10-6 A20 + /L21) = 1.3467 10-6
corresponding to the three information symbols. Since /ILO(0) + µL1(1) = 1.6914. 10-6
we have the a posteriori probabilities
P (uo') = 0 1 rt0,3)) = 0.7962 P (ul1) = 0 1 r[0,3)) = 0.8065
P(
(1)
=0
r[0,3)
= 0.7962
Hence, the maximum a posteriori estimate of the information symbols is k = 000.
In Fig. 7.11, we compare the bit error probabilities when the two-way tailbiting algorithm is used to decode the 16-state and 64-state tailbiting representations of the extended Golay code (described in Section 4.9) when they are used to communicate over the BSC. The discrepancy between the two curves is due to the different mappings between the information symbols and the codewords for the two representations.
334
Chapter 7
Iterative Decoding
loo
10-2
Pb
10-4
3
1
5
9
7
Eb/No [dB]
Figure 7.11 Bit error probabilities for the 16state (left curve) and 64-state (right curve) tailbiting representations of the extended Golay code 824.
7.4 THE ONE-WAY ALGORITHM FOR APP DECODING The two-way algorithm is applicable only to terminated convolutional codes. In this section, we consider the one-way algorithm, which is a forward-only algorithm for a posteriori decoding of convolutional codes. It uses a sliding window and can be considered as the APP decoding counterpart of the Viterbi algorithm with a finite back-search limit considered in Section 4.8.
The one-way algorithm calculates the a posteriori probability for u(ik) = 0 given that the receiver has reached depth i + r, that is, based on the received sequence r[o,i+T). Hence, analogously to (7.26) we have (k)
P ui
P
= 0 I r[o,i+r)
(k)
)
\
(7.86)
-
F-a[o,i+r]ES(oi+r]i P(r[o,i+r) I o,[o,i+r])P(cr[oI+r])
Y_U[o,i+r]ES[o.i+r] P(r[o,i+r) 10[Oi+r])P(0[O,i+r])
1
where S[o,i+r] and S[o,i+r]i are given by S[O,i+r]
def
_ {o [o,i+r] = aoai ... Qi+r
1 QO = 0)
(7.87)
and (k)
def
`S[O,i+r]i = }°[o,i+r] = QOQ1 ... Qi+r I Qo = 0 & Qi -+ Qi+1 = uik) = 0}
(7.88)
respectively. Let del
2"-1
Yi+r = E ai+r (Q) a=0
(7.89)
Section 7.4
The One-Way Algorithm for APP Decoding
335
where ai+r (a) is given by (7.37). Then it follows that yi+r equals the denominator of (7.86). Let a(k) i+r
_
(ask) i+r (1) t+r (0) a(k)
def
... a(k) i+r (2°
- 1))
e0 P0P1 ... Pi-] Pi(k) Pi+1 ... Pi+r
(7.90)
1
where Pi and Pi(k) are given by (7.27) and (7.32), respectively. Then let 2°-1 (k) def
(k)
ai+r (Q)
Yi+r
(7.91)
0=0
We conclude that the numerator of (7.86) equals yi(k) . Hence, we can rewrite (7.86) as
P u (k) i
(k) = 0 r[o i+r> = Yi+r/yi+r
)
(7.92)
Both yi k) and yi+r can be calculated recursively. First, we consider yi+r. For yi+r it follows immediately from (7.37) that
=
J ao ai+r+l
e0
(7.93)
ai+r Pi+r
which together with (7.89) yields yi+r In order to calculate yi(k) we introduce
ate) = defeoPOPl
1))
(7.94)
...Pi-I Pi(k)Pi+i ...Pj-i, j-t
Let
Aij=
a(2)
tj
, j-t
(7.95)
a (b)
be a b x 2° matrix and let At be a (b(r - 1) + 1) x 2° matrix whose first t - 1 entries are the matrices Ait, t - r < i < t - 1, and the last entry is the vector at given by (7.37); that is,
(7.96)
It is easily shown that
At-r+i,t+i At-r+2,t+1 (7.97)
At-i,t+l If we delete the top matrix At-r+i,t+i from At Pt, shift all matrices At-i+i,t+i, I < i < r, up
Chapter 7
336
Iterative Decoding
one position, and replace the matrix At_1,,+1 by the matrix
(7.98)
At,t+l =
Ob then we obtain
(7.99)
At+1 =
The rows of the deleted top matrix At-,+1,t+1 are the vectors a t+1, 1 < k < b, defined by (7.90). They are used to calculate the probabilities P (u(k)T+1 = 0, r[o,t+T)) according to (7.92).
The sparseness of the matrices Pt and Pt(k) can be exploited to simplify the calculation of the elements of the matrix At. Assign to each of 2° states at depth t of the trellis a (br + 1)dimensional column vector metric (b)
(2)
o O p()t +1,t( Q) r rO Q ILt_t+1,t (or) ,L _+l,t lit (or) = (µ (1-) t (a) µ(i)1 t(a) µ()1 t(a) ... i41(a) /bt(a))T , t = 0, 1, ... , 0 < or < 2°
(7.100)
such that µ0
(a)
(00... 01)T, if or = 0 (00...00)T, otherwise
jl
(7.101)
For t = 1, 2, ... we first calculate 2°-1
, lit-1(a)Pt-1(a, a')
At (0") _
(7.102)
U=0
then we exclude the first b components of (7.102) (for t > r they are used for calculating the I a posteriori probabilities P(u(k) r[O,t)), 1 < k < b), shift all components except the last one
b positions up, and replace the following entries b(r - 1) + 1, b(r - 1) + 2, ... , and br by Pt -I (a, or')
(a')
df =e
X1(2) 1
2°-1
E µt
P(_
(a a') (7.103)
1(a)
=o
/-,(Or') (a') I
(b)
Pt 1(a or ')
Then we obtain the metric µt+, (a'). As mentioned earlier, the first b entries of the vectors At (a'), 0 < a' < 2°, t > r viz., 2-1
bit-r-l,t-1Pt-1(a, a'), 1 < k < b
t-t-1,t(a') _
(7.104)
=0
is used together with p (a'), 0 < a' < T', to calculate the a posteriori probability
p
(u`k)r = 0 1 r[o,t)) = Yt(k)/Yt, 1 < k < b
(7.105)
Low-Density Parity-Check Convolutional Codes
Section 7.5
337
where 2"-1
Yt(k) _
1
ik1T-l,t (Q'),
(7.106)
=0 and
2"-1
Yt = E ltr(Q)
(7.107)
0=0
Finally, we obtain the a posteriori probabilities from (7.92). The one-way algorithm requires more memory than the Viterbi algorithm but less than the two-way algorithm. The decoding delay is much less than that of the two-way algorithm. EXAMPLE 7.3 Consider the binary input, 8-ary output DMC used in Example 7.1. Suppose that the encoding matrix G(D) = (1 + D + D2 1 + D2) is used and assume that the a priori probabilities for the information
symbols are P(ut = 0) = 2/3, t = 0, 1. .... Let r = 1104 0112 1101 0111 0113 0403 0411 1201 0111 1201 0401 0302 1101 011, ... be the received sequence. For the one-way algorithm we obtain the following results:
r=6
t
1i+1r
1't+r
P(ui")=01 r[o,t+r)J
ut
0
5
0.3132.10-22
0.5459. 10-12 0.9089. 10-14 0.5249. 10`16 0.3780. 10-18 0.2599. 10-20 0.5014. 10-22
6
0.6478. 10-24
8
0.3536. 10-24 0.1108. 10-26 0.1841. 10-28
0.9849 0.3483 0.2557 0.5652 0.8844 0.6246 0.5458 0.2374 0.5984
0
4
0.5376. 10-12 0.3166. 10-14 0.1342. 10-16 0.2137. 10-18 0.2299. 10-20
t
'Yi+)r
'Yt+r
1
2 3
7
0.4668 10-26 0.3076. 10-28
1 1
0 0 0 0
0 1
r = 12
0
0.6393
1
0.1944. 10-26
0.6478 0.4668
2
0.6923
0.3076 10-28
10-24 10-29
P\ui1)= 0 I
10-24 10-26
0.9868 0.4164 0.2251
ur 0 1 1
In Figs. 7.12 and 7.13, we show the bit error probabilities for the AWGN channel when the one-way APP decoding algorithm is used for maximum a posteriori probability (MAP) decoding for rate R = 1/2 convolutional codes encoded by the encoding matrices G(D) _
(1+D+D2 1+D2)andG(D)=(1+D+D2+D3+D6 1+D2+D3+D5+D6) (Qualcomm's memory m = 6 encoder), respectively. 7.5 LOW-DENSITY PARITY-CHECK CONVOLUTIONAL CODES
In this section we will introduce a class of convolutional codes that are suitable for iterative decoding. Consider rate R = b/c, time-varying convolutional codes defined by time-varying syndrome formers (cf. Sections 2.9 and 2.10). We have the following straightforward generaliza-
338
Chapter 7
Figure 7.12 Bit error probabilities for G(D) _ (I + D + D2 I+ D2) for the one-way algorithm and the AWGN channel. The curves from right to left are for r = 4, 6, 10, and 20.
Pb
Figure 7.13 Bit error probabilities for G(D) =
(l+D+D2+D3+D6 1+D2+D3+D5+
D6) for the one-way algorithm and the AWGN channel. The curves from right to left are for r = 12, 18, 30, and 60.
Iterative Decoding
Section 7.5
339
Low-Density Parity-Check Convolutional Codes
tion of (2.378) to time-varying syndrome formers:
+ vt_m Hm (t) = 0, t E Z
v,HO (t) + v,_1H,T(t) +
(7.108)
where HiT (t), 0 < i < ms, are the c x (c - b) submatrices of the bi-infinite syndrome former
Ho (-1) Hi (0) Ho (0)
...
Hm (ms - 1) H,n (ms)
Hi (l)
(7.109)
and ms is the memory of the syndrome former. We assume that Ho (t) has full rank and that Hms (t) 0 0 for all t E Z. Furthermore, we assume without loss of generality that the last c - b rows of Ho (t) are linearly independent. The parity-check symbols belonging to the rows of the bi-infinite syndrome former HT can be compactly written as (c - b)(ms + 1)-dimensional binary vectors hi, -00 < i < oo. We have the following Definition Let the rate R = b/c convolutional code C be defined by its bi-infinite syndrome former HT of memory ms. It is called a low-density parity-check convolutional code (LDPC) if the row vectors hi are sparse for all i; that is, if
wy(hi) << (c - b)ms, -oo < i < oo
(7.110)
If both all rows and all columns of the syndrome former of an LDPC convolutional code have constant numbers of ones, although in all nontrivial cases not the same for the rows and the columns, then the LDPC convolutional code is called homogeneous [JiZ97]. The class of homogeneous LDPC convolutional codes is a subclass of the turbo codes examplified in Section 7.1 and defined below. Next we will introduce various scramblers.
Definition A bi-infinite matrix S = (sib), i, j E Z, that has one 1 in each row and one 1 in each column and that satisfies the causality condition sib = 0,
i<j
(7.111)
is called a convolutional scrambler. Let x denote the bi-infinite binary input sequence of a convolutional scrambler, and let
y =XS
(7.112)
denote the corresponding output sequence. The convolutional scrambler permutes the symbols in the input sequence.
The identity scrambler has ones only along the diagonal and is a special case of a convolutional scrambler. If
si,i+s = 1, i E Z
(7.113)
we have a delay scrambler with delay 3. The identity scrambler is a delay scrambler with delay S = 0. A block interleaver consists of two arrays; the input symbols are written in the interleaver arrays one row at a time and read one column at a time. In Fig. 7.14 we show a 2 x 2 block interleaver. When the fifth input symbol is written in the second array, then the first input symbol is read; when the sixth input symbol is written, then the third input is read; and so on.
340
Chapter 7
1
2
5
6
3
4
7
8
Iterative Decoding
Figure 7.14 A 2 x 2 block interleaver.
The 2 x 2 block interleaver is a convolutional scrambler with scrambling matrix 2
-1 0
3
4
5
6
7
8
9
10
11
12
13
1 1
1
1
1
I (7.114)
1
1
1 1
8
1
9
1
Next we will describe a construction method due to Jimenez and Zigangirov [JiZ97]. Consider first an n x n diagonal matrix. By permuting the columns, we can obtain any of the n! n x n
matrices with exactly one 1 in each row and one 1 in each column. Then we unwrap the submatrix which is below the diagonal. The procedure is illustrated in Fig. 7.15.
1
1
1
Figure 7.15 Unwrapping a 5 x 5 matrix.
The unwrapped matrix is then repeated indefinitely, and we obtain the convolutional scrambler shown in Fig. 7.16, where we also have shifted the matrix one position to the right and added a diagonal of only zeros to avoid having an input symbol appear directly at the output. (As before, all blanks denote zeros; the zeros along the diagonal are the only zeros actually written as zeros in the figure.) When the construction is based on randomly chosen equiprobable permutations of the columns of the diagonal matrix, we obtain the class of uniform convolutional scramblers. We have the following generalization of a convolutional scrambler:
Definition A bi-infinite matrix S = (sf1), i, j E Z, that has at least one 1 in each row and one 1 in each column and that satisfies the causality condition (7.111) is called a multiple convolutional scrambler.
Multiple convolutional scramblers not only permute the input symbols; they also make multiple copies of them.
Low-Density Parity-Check Convolutional Codes
Section 7.5
9
341
1
01
1
1
0 0 0 0
1
0
1
0
1 1
0
1 1 1 1
Figure 7.16 A convolutional scrambler obtained from the unwrapped matrix in Fig. 7.15.
As we will see below, a convolutional scrambler is often used in cascade with a syndrome
former. Then its input sequence consists of subblocks of c binary symbols; hence, it is convenient to represent the scrambler by a matrix S = (S11), whose entries are c x d, d > c, submatrices Sij, i, j E Z. Since each column of S has one 1, it follows that the rows on the average will have d/c ones. The ratio d/c is called the rate of the scrambler. If all rows have the same number of ones, then the scrambler is called homogeneous. Let S denote the set of nonzero submatrices Si j of the matrix S. Then
S=max{j - i S,j ES} I
i. j Ez
(7.115)
is called the delay of the scrambler and is a straightforward generalization of the delay of a delay scrambler as defined by (7.113). The delay of the scrambler shown in Fig. 7.16 is S = 5. We are now well prepared to define a turbo code. Definition A low-density parity-check (LDPC) convolutional code is called a turbo code if its bi-infinite syndrome former H u can be written as the product of a multiple scrambler S and a bi-infinite syndrome former H,, of a combination of constituent convolutional codes; that is,
HT = SHCC
(7.116)
in such a way that the delay 8 of scrambler is much larger than the memory me of syndrome former of the constituent convolutional code; that is, 8 >> mc. The most important and most difficult aspect of constructing turbo codes is the design of the multiple scrambler. An attractive method is to combine several simple scramblers with a more complicated one.
Definition Consider two matrices S(') = (ST) and S(2) _ (Si(?)) whose entries are of sizes c x d1 and c x d2, respectively. The matrix S = Sw m S(2) = (Sij), i, j E Z (7.117) is called column-interleaved if Si(2j)
=S
Si (2j+1) =
foralli, j EZ.
(2)
Iii
(7.118)
342
Chapter 7
Iterative Decoding
The entries Si(2j) and Si(2j+1) of the matrix S have sizes c x d1 and c x d2, respectively. In general, d1 d2. If we join the two entries in (7.118), then we can regard S as a matrix whose submatrices are of size c x (d1 + d2). EXAMPLE 7.4 Consider the identity scrambler and the scrambler given by (7.114). By column-interleaving, we obtain the rate 2/1 scrambler .
1
1 1
1
S=
I
(7.119)
1
1 1 1
1
with delay 1 = 5.
Next we will take the column-interleaving construction one step further:
Definition Consider two matrices S(1) = S ()) and S(2)
_
(S(?)) whose entries are of
sizes cl x d1 and c2 x d2, respectively. The matrix
S = S(1) ®S(2) = (Sij), i, j E Z
(7.120)
is called row-column interleaved if (1)
S(2i)(2j)
= Sij
S(2i)(2j+1)
=0 =0
S(2i+1)(2j)
(7.121) (2)
S(2i+1)(2j+1) = Sij
foralli, j EZ. If we join the four submatrices in (7.121), then we can regard S as a matrix whose submatrices are of size (c1 + c2) x (d1 + d2). EXAMPLE 7.5 Consider the column-interleaved scrambler (7.119) in Example 7.4 and two identity scramblers. By row-column interleaving, we obtain the rate 4/3, nonhomogeneous scrambler
1
1 1 1 1
1 1
S=
1
(7.122) 1
1 1 1 1 1
with delay 8 = 5.
Section 7.5
Low-Density Parity-Check Convolutional Codes
343
EXAMPLE 7.6 Consider the rate R = 1/2 constituent convolutional code encoded by encoding matrix
Gc(D) = (1 + D + D2 1 + Dz)
(7.123)
In Example 2.7 we computed the Smith form decomposition of Gc(D) and found that B`
(D) _
1+D2
D
1+D 1+D+D2
(7.124)
In Section 2.9 we showed that a syndrome former HT (D) for a convolutional code C is obtained as the last c - b columns of B-1 (D). Hence, it follows that for this example we have D
1+D
H` (D) =
(7.125)
Dz
From (2.428) it follows that Gsys(D) =
1+D
1
(7.126)
1+D+Dz
is a systematic encoding matrix obtained from the syndrome former H, (D) for the convolutional code C. Furthermore, since we can write HT (D) as
HT(D)=Ho+HTD+HTD2
(l) D + (i)D2
_ (1) +
(7.1 27)
it follows from (2.380) that the bi-infinite syndrome former for the constituent convolutional code is
1
0
1
1
1
1
1
0
1
1
1
1
(7.128)
HIT =
Next we will construct a bi-infinite syndrome former for the turbo encoder shown in Fig. 7.1 when its interleaver is the 2 x 2 block interleaver given in Fig. 7.14. We use the rate 4/3 scrambler described in Example 7.5 with the received triplets corresponding to (u,, v(1), v(2) as inputs. The outputs of the scrambler correspond to the four-tuples (u,, u,', v('), where u, is the information symbol at the output of the interleaver in the encoder. The bi-infinite syndrome formers for the constituent encoders having inputs corresponding to (u,, v')) and (u,', v,2) are given by (7.128). By row-column interleaving two such syndrome formers with submatrices of size 2 x 1, we obtain a bi-infinite syndrome former for the combination of the two constituent encoders,
HccT =
H'' ®HT =
1
0
1
1
1
1
1
0
1
1
1
1
(7.129)
The order of the rows in H, corresponds to (u, v0)' ut', V(, ). Hence, to obtain a match with the
344
Chapter 7
Iterative Decoding
interleaver outputs, we swap rows 2 and 3, 6 and 7, and so on. Then we obtain 1
.
0
1
1
0
1 1
1 1
1 1
1
1
H'T= cc -
(7.130)
1
O
1
0
1
1
1
A bi-infinite syndrome former HT. for the turbo code encoded by the turbo encoder given in Fig. 7.1 with a 2 x 2 block interleaver can be written as
HT = SH'T to
cc
(7.131)
where S is the multiple convolutional scrambler given by (7.122) and H'Tc is a bi-infinite syndrome former for the combination of the two identical constituent encoders given by (7.130).
In practice, the interleaver size is much larger than 2 x 2, typically several hundred times several hundred. LDPC convolutional codes are usually decoded iteratively as described in Section 7.1. For large interleavers, the performance is close to the Shannon limit.
7.6 COMMENTS Elias introduced iterative decoding methods as early as 1954 [E1i54]. In his thesis, Gallager used an iterative APP decoding algorithm for the decoding of low-density parity-check (LDPC) block codes [Ga162][Ga163]. He showed that good performances could be achieved with relatively simple decoding methods. Interest in iterative decoding was regenerated when Berrou, Glavieux, and Thitimajshima presented their remarkable findings at the ICC '93 in Geneva [BGT93] [BeG96]. They showed that it was indeed practically possible to communicate near the Shannon limit. The two-way algorithm has a long history. The earliest known application is Gallager's APP decoding algorithm for LDPC block codes mentioned earlier. A variant, often called the backward forward algorithm, has been repeatedly reinvented for maximum a posteriori symbol detection; see, for example, [ChH66][BaP66]. In current coding literature it is often called the BCJR algorithm after Bahl, Cocke, Jelinek, and Raviv [BCJ74]. For a tutorial introduction to a general class of two-way algorithms, the reader is referred to [For97]. The one-way algorithm was suggested by Trofimov [Tro94] and independently by Zigangirov [Zig98]. Engdahl, Jimenez, and Zigangirov extended Gallager's concept of LDPC block codes to convolutional codes [JiZ97][EnZ98]. Finally, although they are beyond the scope of this book, we would like to mention the important contributions of Kotter, Loeliger, Tanner, and Wiberg to iterative decoding of codes defined on general graphs [Tan81][WLK95][Wib96].
PROBLEMS 7.1 Consider therateR = 1/2,memorym = 1, systematic encoding matrixG(D) = (1 1+D). Suppose that the encoder is used to communicate over a BSC with crossover probability c = 0.045. Assume that the received sequence r = 10 01 11 11 11 11.
Problems
345
Use the two-way algorithm to decode the received sequence when (a) the information symbols a priori are equiprobable.
(b) P(u, = 0) = 2/3. (c) P(u, = 0) = 1/3. 7.2 Repeat Problem 7.1 for the rate R = 1/2, memory m = 2 encoding matrix G(D) _ 0 +D+D2 1 +D2) andr= 11 11 00 11 01 11. 7.3 Repeat Problem 7.1 for the rate R = 2/3, memory m = 1 encoding matrix 1
G(D) =
1
D 1+D
D 1
and r = 010 111 000. 7.4 Consider the binary input, 4-ary output DMC with transition probabilities given in Fig. P7.4.
Suppose that the rate R = 1/2 encoder with encoding matrix G(D) = (1 + D + 1 + D2) is used to communicate over the given channel. Use the two-way algorithm to decode r = 0111 1201 1102 0111 1211 1112 when (a) the information symbols a priori are equiprobable. D2
(b) P(u, = 0) = 2/3. (c) P(u, = 0) = 1/3.
Figure P7.4 DMC used in Problem 7.4.
7.5 Consider the binary input, 8-ary output DMC shown in Fig. 4.5 with transition probabilities P(r I v) given by the following table:
r
0 1
04
03
02
01
11
12
13
14
0.1415 0.0001
0.3193 0.0025
0.2851
0.1659 0.0676
0.0676 0.1659
0.0180
0.0025 0.3193
0.0001 0.1415
0.0180
0.2851
Repeat Problem 7.4 forr = 0112 1213 0302 0103 0302 0201.
7.6 Consider the binary input, 8-ary output DMC shown in Fig. 4.5 with transition probabilities P(r I v) given by the following table:
r
0 I
04
03
02
01
11
12
13
14
0.2196 0.0027
0.2556 0.0167
0.2144 0.0463
0.1521 0.0926
0.0926
0.0463 0.2144
0.0167 0.2556
0.0027 0.2196
0.1521
Repeat Proble m 7.4 for r = 0213 120 2 0113 1111 1214 1111.
7.7 Construct a two-state (8, 4) tailbiting representation by using the convolutional encoding matrix G(D) = (1 1 + D). Suppose that this block code is used to communicate over
346
Chapter 7
Iterative Decoding
the BSC with crossover probability e = 0.045. Assume that the received sequence r = 10 01 11 11. Use the two-way tailbiting algorithm to decode the received sequence when (a) the information symbols a priori are equiprobable.
(b) P(u, = 0) = 2/3. (c) P(ut = 0) = 1/3. 7.8 Construct a four-state (4, 2) tailbiting representation by using the convolutional encoding
matrix G (D) = (1 +D+D2 1 +D2). Suppose that this block code is used to communicate over the BSC with crossover probability e = 0.045. Assume that the received sequence
r= 1111. Use the two-way tailbiting algorithm to decode the received sequence when (a) the information symbols a priori are equiprobable. (b) P(u1 = 0) = 2/3. (c) P(u, = 0) = 1/3. 7.9 Use the 4-ary output DMC given in Problem 7.4 and repeat Problem 7.7 forr = 0111 1201 1102 0111.
7.10 Use the 4-ary output DMC given in Problem 7.4 and repeat Problem 7.8 for r = 0111 1201. 7.11 Use the 8-ary output DMC given in Problem 7.5 and repeat Problem 7.7 forr = 0111 1213 0302 0103.
7.12 Use the 8-ary output DMC given in Problem 7.5 and repeat Problem 7.8 for r = 0111 12137.13 Repeat Problem 7.1 for the one-way algorithm when r = 2, 3, and 4. Decode only the first two information symbols.
7.14 Repeat Problem 7.2 for the one-way algorithm when r = 4. Decode only the first two information symbols.
7.15 Repeat Problem 7.4 for the one-way algorithm when r = 4. Decode only the first two information symbols. 7.16 Let WH (Sij) denote the number of ones in the submatrix Sip The maximal number of inputs which the scrambler has in its memory is called the size of the scrambler and denoted ssc. Then
Ew
ssc = max j k l ik
H ('Sij) JJ
(a) Find the size of the scrambler in Example 7.4. (b) Find the size of the scrambler in Example 7.5. (c) Show that, if wH (Sij) does not depend on i, then ssc does not depend on k. 7.17 Find a convolutional scrambler matrix for the 3 x 3 block interleaver. 7.18 Combine the identity scrambler and the delay scrambler with S = 5 by column-interleaving and find the delay and size of the combination. 7.19 Combine the identity scrambler and the scrambler shown in Fig. 7.16 by column-interleaving and find the delay and size of the combination. 7.20 Combine the resulting scramblers of Problems 7.18 and 7.19 by row-column interleaving and find the rate, delay, and size of the combination.
Convolutional Codes with Good Distance Properties We have previously shown that, when convolutional codes are used to communicate over channels at moderate to high signal-to-noise ratios, the distance spectrum is the principal determiner of the burst error probability when maximum-likelihood (or nearly so) decoding is used. We
have also seen that an optimum distance profile is desirable to obtain good computational performance with sequential decoding. Thus, it is important to find methods for constructing convolutional encoders with both a good distance spectrum and a good distance profile. So far there has been little success in finding good convolutional encoders by algebraic methods. Most encoders used in practice have been found by computer search. In this chapter, we discuss an algorithm for computing the distance spectrum. Extensive tables of good convolutional encoders are given. 8.1 COMPUTING THE DISTANCE SPECTRUM
To compute the distance spectrum for a convolutional code encoded by a noncatastrophic generator matrix, that is, the number of paths with a given distance, d say, to the allzero path, we exploit the linearity of the code and count the number of weight d sequences with uo 0 0 stemming from the zero state and terminating for the first time at the zero state. For simplicity, we limit our discussion to binary codes of rate R = 1/2. The extension to rate R = 1 /c is trivial, and to rate R = b/c is straightforward. We assume that rate R = 1 /2 convolutional code is encoded by a memory m generator matrix that is realized in controller canonical form. As usual, ut and vt denote the input and output at time t, respectively. Suppose we are in an arbitrary node at depth t in the code tree and that we have produced channel symbols whose total weight is Wt. Then, in each subtree stemming from this node we have to spend the weight (d - Wt). Hence, let us label each node with the state of the encoder
and the remaining weight; that is, W = d - W. Let o t = (Qt") ... a(m)), where the state variable at = ut_n for n = 1, 2, ... , m, and ut = 0 for t < 0, denote the state of the encoder. From each state we have two successor states, o +1 = (0 ut_1 ... ut_m_1) and al+1 = (1 ut_1 ... ut_in_1), corresponding to information symbol ut equal to zero and one, respectively. To simplify the notations, we suppress the index t in the sequel. For given encoders, we can use the state of a node to determine the at(2)
347
348
Chapter 8
Convolutional Codes with Good Distance Properties
weights, w° and w 1 of the branches stemming from that node. By using these branch weights together with the node weight W, we can determine the two new node weights W° = W - w°
and W 1= W - w t (see Fig. 8.1).
Figure 8.1 Successor nodes at time t.
When searching for a path in the code tree with a given weight, we explore a subtree if and only if the new node weight, W", is nonnegative and if the state of the new node, a", differs from the zero state. Let us arbitrarily give priority to the zero branch when we have to select between two new possible nodes. A straightforward algorithm for determining the number of paths of a given weight d can be formulated as follows: Start at state o = (10 ... 0) with weight W = d - do, where do is the 0th order column distance, and move forward in the code tree. If cr = (0 0 ... 0 1) and W° = 0, then increase the path counter. If the new node weight is negative or if the new node is in its zero state, then we will move backward. Thus, we have to remember all of the previous information symbols so that we can move backward until we find a new "one"-branch with a nonnegative node weight. Then we move forward again. A stop condition appears when we reach the root. This basic algorithm is very time consuming. To measure the performance of the algorithm, we count the total number of nodes visited. Each visit to a node, regardless of whether or not we have been there before, is counted as a visit. As an example, we can use this basic algorithm to verify that the memory m = 3 encoder with encoding matrix G(D) = (g11(D) g12(D)) = (1+D+D2+D3 1+D2+D3),or in octal notation G = (gil g12) = (g11 g11 1 2 gig) = (1111 1011) = (74 54), has one path of weight dfree = 6. (The binary digits are collected in groups of three starting from the left.) We visit as many as 121 nodes in the explored tree. Now by an illuminative example we will show how we can obtain a substantial reduction in the number of nodes we have to visit. Our encoding matrix G = (74 54) has an optimum distance profile dp = (2, 3, 3, 4). In Fig. 8.2 we show only that part of its trellis which contains the weight 6 (= dfree) path. This path corresponds to the information sequence 11000, that is, to the encoded sequence V10,41 = 110 101 00 11. Since the column distance is the minimum of the Hamming weights of the paths with u0 = 1, the distance profile can be used as a lower bound of the decrease of the node weight along the path. In steps 1, 2, and 4 in Fig. 8.2, this bound is tight. If we traverse this path in the opposite direction, we will get the same total weight but different node weights. In Fig. 8.3 we can use the distance profile as a lower bound of the node weights along the path. Notice that if a node has weight less than this bound, then every path leading backward to the zero state will give a negative node weight at the root node. For example, if the node weight in state (001) is less than d3 = dm = 4, we must not extend this node when we are traversing the trellis backward. More generally, the weight of a backward path stemming from a node in state a 0 (00 ... 0), starting with a one-branch and eventually leading to the root node (zero state), is lower-bounded by dm. In Fig. 8.3 we notice, for example, that if the node weight in state (110) will be less than dl = 3, then we must not extend this node. g11)
g1)
g12)
g12)
g12)
Section 8.1
349
Computing the Distance Spectrum
Start
do=2
d3=4
d2=3
d1o =3
Figure 8.2 An example of a weight dfree path. s
0
000
4
000
Start
001 4
011
2
100 3
110
do=2
d1o =3
d2=3
d3=4
Figure 8.3 The weight dfree path traversed backward.
Use of the distance profile as a lower bound works for every path from the end to the a-1, where root node. Moving backward from state o, we can reach the states t7-0 and
00)
(?...??1
(8.1)
e-i zeros
° = (? ...?100 ... 00)
(8.2)
e zeros
= (?
(8.3) e-1 zeros
The minimum weights of backward paths stemming from the states Q-0 and o
are lower-
bounded by dm_e_t and dm_ 1, respectively.
Instead of moving backward in the trellis, we can reverse the entries in the generator matrix and move forward in the corresponding tree and use the distance profile (of the nonreciprocal generator matrix) to effectively limit the part of the tree that must be explored. We will now describe a Fast Algorithm for Searching a code Tree (FAST) in order to determine the distance spectrum for a convolutional code with generator matrix G = (911 912) and distance profile dp. Let
glk = (gik' g'1- k-"
...
g(lk)
(8.4)
k = 1, 2, denote the generators for the reversed convolutional code encoded by the reciprocal
generator matrix G = (gil g12). The distance profile of the generator matrix is denoted dp = (do , dp, ... , dm ). To calculate the ith spectral component, we start at state o = (10 ... 0) with weight W = dfree + i - ao in the code tree generated by the reciprocal generator matrix G. Then we reduce this weight by the weights of the branches that we traverse when the code tree is searched for nodes, with both node weight and state equal to zero. For the state of each explored node, we use the column distances dmor dm _1 to lower-bound the weight of any path leading to a zero state. If the weight is less than this bound, we will always reach a weight
350
Chapter 8
Convolutional Codes with Good Distance Properties
that is zero or negative at a nonzero state. Hence, it is only necessary to extend a node if the node weight is larger than or equal to this bound. If both successor nodes are achievable, then we follow the zero-branch and save (push) the one-branch node (state rr1 and weight W1) on a stack. Thus, we can avoid calculating the node weight for the same node twice, and the algorithm will be twice as fast. (The basic algorithm should also be implemented with a stack.) Our FAST algorithm is shown in Fig. 8.4. If dp > dp; then the reciprocal entries in the generator matrix, else the entries in the generator matrix, are used as inputs to FAST. Start
i
Figure 8.4 Flowchart of the FAST algorithm. Notice that w' is calculated using the reciprocal generator matrix G = (gn g12).
Using FAST to verify that ndfree = 1 for our encoding matrix G = (74 54), we visit only five nodes! Since we are interested in the spectral components for encoders with optimum (or good) distance profiles, it is interesting to notice that the better the distance profile is, the faster FAST runs.
Section 8.2
Some Classes of Rate R = 1/2 Convolutional Codes
351
8.2 SOME CLASSES OF RATE R = 1/2 CONVOLUTIONAL CODES An exhaustive search for convolutional codes with large dfree is practically impossible even for relatively short memories. Therefore, we need efficient rejecting rules that limit the computation of dfree to a small fraction of the complete ensemble of encoders. As we mentioned in Chapter 3, the row distances can be used as a rejecting rule. Their efficiency is shown by the following example. Let (8.5)
The total number of rate R = 1/2, memory m = 16 (G,,, = (10), (01), or (11)) encoding = 3,221,225,472. A simple way to generate all the matrices with Go = (11) is 3 22m-2
encoding matrices G = (gi1 912) and eliminating the encoding matrices G' = (912 9i1) is as follows: for each $11 test only those g12 for which g12 < g11 (in obvious binary notation). The number of encoding matrices is reduced to 3 22m-3 - 2in-2 Thus, we have 1, 610, 596, 352 encoding matrices left to test. Hoping to find an encoding matrix with dfree = 20, we reject successively all encoding matrices with dd < 20, j = 0, 1, ... , 15, where 15 is arbitrarily chosen (see Fig. 8.5). After having used the row distance d15 as a rejecting rule, only 1034 candidates are left. Another 123 of these can be rejected since they suffer from catastrophic
error-propagation. Of the remaining 911 encoding matrices, 200 have dfree = 20. The best one is given in Table 8.2. One might suspect that there exists a memory m = 17, R = 1/2, encoding matrix with dfree = 21. However, all candidates have row distance dio < 21. The efficiency of using the row distances as rejecting rules in this case is shown in Fig. 8.5. # encoders J
with dr ? 20
0
543537361 267253166 84145636 19788663 4764506 1138502
# encoders 1
2 3
4 5
J
with dj' ? 21
0
7
2204679293 791375586 160725370 16854476 1471120 101684 5098 236
1
2 3
6 7 8
9
10
13
3954 2488 1650
14
1233
15
1034
11
12
Figure 8.5 Two examples of using the row distance as a rejection rule.
309889 96872 35853 14974 7167
4 5
6 8
16
9
2
10
0
In Tables 8.1 and 8.2 we give an extensive list of nonsystematic rate R = 1/2 optimum distance profile (ODP) encoding matrices, and in Table 8.3 we list a few encoding matrices that have dfree superior to that of an ODP encoding matrix of the same memory given in Table 8.1.
Massey and Costello [MaC71] introduced a class of rate R = 1/2 nonsystematic convolutional encoding matrices called quick-look-in (QLI) encoding matrices in which the two
TABLE 8.1 Weight spectra ndfr +i, i = 0, 1, ... , 9, for nonsystematic rate R = 1/2 ODP encoding matrices G = (g11 g12).
m
drree 0
912
911
2
1
4
3
5
6
7
7
5
5
1
2
4
8
16
32
64
128
3
74 62 77 634 626 751 7664 7512 6643 63374 45332 65231 727144 717066 745705 6302164 5122642 7375407 67520654 64553062 55076157
54 56 45 564 572 557 5714 5562 5175 47244 77136 43677 424374 522702 546153 5634554 7315626 4313045 50371444 42533736 75501351
6
1
3
5
11
25
55
121
7
2
3
4
16
176
2
3
8
15
37 41
68
8
90
224
10
12
0
53
0
1517
10
123
321
267 432 515 0 764
340
875 441
1951
940
2214
2615
0 1531 1720 2095 2395 2095 3460 4222 5318 0 3099 0 4578 0 6790 9073
15276
5531 0
3611
8675
5
6 7 8
9 10 11
12 13
14 15
16 17 18 19
20 21
22 23
24 744537344 472606614 25 665041116 516260772
1
6
13
0 20
12 10
9
30
51
12
1
8
8
31
14
19
0
80
234 64 156 73 450
150
14
1
10
25
0 46
105
0 258
15
2
10
29
55
138
301
16
5
15
21
381
3
16
44
172
455
18
5
15
21
56 62 56
161
17
161
381
19
9
16
48 58 72 160 49 251
112 125 161
259 314 369 916 234
596 711 914 0
6 31 13 34 22 26 0 20 21
22
1
17
0
108 0 1379
521
0 0 24 40 24 4 27 75 147 331 817 0 2014 0 0 331 26 65 26 10 45 91 235 465 1186 27 24 54 125 278 637 1599
616 692 879 1025 879 1457 1819 2167 5154 1310 7812 1956 11359 2882 3779
TABLE 8.2 Total number of bit errors for the codewords of weight dfree for nonsystematic rate R = 1/2 ODP encoding matrices G = (gli g12). m
911
drr,,
912
ndrree
# bit errors
1
6
4
3
1
2
7
5
5
1
1
3
1
2
7
2
4
8
2
10
12
4 46
8
751
9 10
22
7664 7512 6643 63374 77442 65231 764474 717066 663256 745705 7746714 7315626 7375407 67520654 75457402
23
75501351
54 56 45 564 572 557 5714 5562 5175 47244 56506 43677 573304 522702 513502 546153 5634664 5122642 4313045 50371444 46705066 55076157 472606614
6
7
74 62 77 634 626
4 5
6
11
12 13
14 15
16 17 18 19
20 21
24
744537344
10 12
1
10
1
6
40
12
1
2
14
19
82
4
14
1
15
2
6
16
5
16
17
3
17
18
5
20 55 53
19
9
19
11
20
6
21
13
30 73
22 22 24
26
130
24 26 26
1
2
40 4
260
65
498
10
82
16
256 589 925 1239 8862 1858
9
2
4
352
8
512 1299 2156 2896 0
4442
5127 11589
4199 10245 5085 12207 5853 14487 5085
12207
8257 10502 12937 29386 7433 45858 11053 65585 16618 21831
20562 25222 31241 0
18264 0
27282 0
39794 52929
Section 8.2
Some Classes of Rate R = 1/2 Convolutional Codes
353
TABLE 8.3 Weight spectra ndfa+i, i = 0, 1, ... , 9, for nonsystematic rate R = 1/2 OFD (optimum free distance) encoding matrices G = (gil 912)-
i m 11
12 14 15
16
gti
dr, ee 0
912
7173 53734 63057 533514 626656
2
1
5261
15
72304 44735 653444 463642
16
14 21 14 38
18 19 20 22
26 0 30 67 43 0 0 65
18 4551474 6354344
4
3
6
5
7
8
9
1373 3317 101 249 597 8014 19559 35 108 724 1604 4020 9825 23899 342 4844 0 28513 165 0 845 0 0 54 167 632 1402 2812 7041 18178 43631 265 0 1341 0 7613 0 44817 0 349 0 10947 0 63130 0 1903 0 34
entries in each encoding matrix differ only in the second position: (8.6)
GQLI(D) _ (g11(D) g11(D) + D)
The main feature of QLI encoding matrices is that they have a feedforward right pseudo-inverse, viz.,
i
G QLI (D)
=
1
(8.7)
1
which can be implemented by a simple modulo 2 adder. Clearly, GQLI(D)GQLI(D) = D
(8.8)
This makes it easy to extract an estimate of the information digits from the hard-decisioned received sequences. Furthermore, since the feedforward right inverse has "weight" two, the error amplification factor A = 2 is the smallest possible for nonsystematic encoders. In Tables 8.4 and 8.5, we list some QLI encoding matrices, and in Tables 8.6, 8.7, and 8.8, we present extensive lists of systematic ODP encoding matrices G = (4 g12). TABLE 8.4 Weight spectra ndf_++, i = 0, 1, ... , 9, for QLI rate R = 1/2 encoding matrices.
m
die
911
0
1
2
3
2
5
5
1
2
4
8
3
6
1
3
5
11
4
54 46
8
7
6 10
15
55
2 2
4
5
6 7
454 542
4
8
11
1
4
13
25 25
8
551
9 9 10
9
18
9
5664 5506 5503 56414 46716 51503 510474 522416 454643 5522214 4517006 5036543 47653514 51102726 53171663 510676714
11
3
6
19
12
3
11
23 26 47 32 36 50 35 38 62 42 50 69 42 70 77
10 11
12 13
14 15
16 17 18 19
20 21
22 23 24
7
1
13
8
16
14
10
21
14
3
12
15
6
14
16
11
29
16
2
21
17
5
18 18
6 2
23 26
19
4
20 20
7
24 26
1
17
21
3
31
22
7
38
16
18
4
5
16 25 37 49 70
32
64
55
121
51
83 124 181
115 172
6
191
292 405 270 379
30 37 47 67 90
73 83 99 146
207 234 361
587 870
210
520
1311
71
141
335
68 122
176
51
155
90
230 326
469 688 376 499 727 445 586 867 476 754 948
877 1006 1637 898 1227 1614 1120
139 78 97 138 93 136 171
269
173
253 349 215 327 423
450
1441
1965 1096 1866 2231
7 128 267 442 678 945 686 992 1146 1474 2128 3096 1991
2390 3955 2164 2994 4070 2610 3525 4803 2733 4531 5469
8
256 589 1015 1576
2279 1663 2495 2719 3535 5205 7458
4852 5924 9574 5337 7233 10189 6158 8526 11405
9
512 1299
2334 3694 5414 3955 5735 6631 8363 12510 17856 11775 14285
22960 12891 17526
24338 14933
20482 27759
6640
16127
10676 13466
26209 32186
354
Chapter 8
Weight spectra ndf,ee+i, i = 0, 1, ... , 9, for QLI rate R = 1/2 ODP encoding matrices.
TABLE 8.5
m
g12
9 11
12 13
14 15
16 17 18
19
20 21
22 23 24 25
26 27 28 29 30 31
5
6
32 55
64
128
121
267 209 678 366 686 316 492
7
1
2
4
16
1
3
5
25
1
1
3
18
40
87
8
2
7
10
124
292
714 742 743 7434 7422 7435 74044 74046 74047 740464 740462 740463 7404634 7404242 7404155 74041544 74042436 74041567 740415664 740424366 740424175 7404155634 7404241726 7404154035 74041567514 74041567512
8
1
2
5
49 27
157
9 9 10
1
4
13
51
68 115
270
1
1
5
21
51
127
219 345 427 162 278 402 260 404 585 283
5
10
4
3
6 6
4
8
2
1
5
3
7
0
df 7 74 76 75
2
6
Convolutional Codes with Good Distance Properties
11
2 2
6
31
112
5
14
57
5
3
10
81
146 183
12 11
1
1
5
13
2
6
7
14
8
12
14
2 2
15
3
5
11
16
2
9
15
16
1
2
13
33 48 71 61 67 114 43
15
1
0
2
19
18
12
15
4 9
126 78
19
2 2 2
13
19
1
2
8
20 20 22 22
1
8
11
1
3
8
7
6 38
2
6
14
23
2 6
7 24
24 32
18
24 23 25
1
6
1
6
62 115 184
89 168 231 139
48 226
1
1
6
202 34
5
11
15
134
1020 377
676 981
633 992 1344 741
315 1412 1026
552 439 539 302 427 242 806 467 685
225 105
170 117 311 186
164 93 105
841
143
183
96 50 67 54
270 473
1195
88
208
332
841
8
1283
722 939 567 1996 1141
1589 2653 559 2072
256 589 476 1576 914
912
dfree
0
1
2
3
4
1
6
3
1
1
1
1
1
2
7
4
2
0
0 0
13 16
4
1
0
4
64 66
5 6
5
2
10
21
73
3
13
0
6
1
3
4
11
55 25
7
674 714
2
0
9
0
46
8
671
6 6 6 7
2 0
1
5
1
5
5
17
35
8
4 3
0 0
19
8
9 9 10 10 10
3
5
11
0 0 26
1
10
15
27
0
12
1
4 0 0 0
94 79 52 46
12
13
0
12
13
12
4
12
3
0 0 0
46 46
0 0 0
12
1
3
9 10 11
12 13 14 15
16 17 18
19
20 21 22 23 24 25 26 27 28 29 30 31
7154 7152 7153 67114 67116 71447 671174 671166 671166 6711454 7144616 7144761 71447614 71446166 67115143 714461654 671145536 714476053 7144760524 7144616566 7144760535 67114543064 67114543066
5
4
16
16
0 0 0 25
124 105 78
263 263
3
23 23 10
1
0
7
0
6 2
0 0
44
0 0
189
14 15
5
15
3 8
62 44 0
115 112
16
7 7 0
16
7
16
3
18
22
12 14
0 0 0
16
1
1
18
11
0
38 23 16
54 73 38
154
92 53 66 168
0
289 350
0
134
118
0
10 53
15
695 36 307
0
6
5
7
8
4070 5589 3560
2470 3179
5903 7850
1717 725 3329 2419 3131 1702 2325 1447
4040
4828 2658 3936 6687 1293
4878
9
1
1
1
1
0
0 29 0 53 0 70 0 0 124 104 0 0 0 0 0 0 0 110 0 0 0 256 244 0 0 0 0
180
89 232 332
0
69 77 298 118 248
0
1401
274 0 452
654
101
0
173
542 457 317 224 777 517 437 1486 1486 817
556 263 314
1870 1368
4529 3138
0 0
0
2391
0
9019 9019 4896 3472
0 0
0
0 0
947 669 578
0
1971
834 3926 225 1742
0 1430 0 2415
576 0 0
0 0 676
1648 1312 0 0
0 0 596 0
3159 2618
0 711
0 0 4364 3322
0 0 821
1132
1691
1289 993
1593 1842
6570 5726 3999 3267 9609 11624 5052 22788 1342 10218
2161
1726 2391 1466
1205
34
0
3694
2070 2593 930
780
0
1
512 1299 1096
3955 1886 2846 4956 6186 2352
1663
TABLE 8.6 Weight spectra ndfree+i, 1 = 0, 1, ..., 9, for systematic rate R = 1/2 ODP encoding matrices G = (4 $12).
m
9
0 0 0
3838 0 0 0
9703 8097 0 0 0 0
3298 0
1825
8109 6049 7534 4064 5702 3525 12103 6545 9611 16203 3051 11683
Section 8.2
Some Classes of Rate R = 1/2 Convolutional Codes
355
TABLE 8.7 Minimum distances dm and # truncated codewords v[Qml of weight dm for systematic rate R = 1/2 ODP encoding matrices G = (4 g12). m
dm
1
3
1
2
3
1
3
3
4
4 4
5
5
5
6
5
2
7
6
11
8
6
5
9
6
1
10
7
12
# v[o.ml of weight dm
1
11
7
5
12
8
29
13
8
12
14
8
6
15
8
1
16
9
18
17
9
7
18
9
3
19
10
31
20
10
13
21
10
4
22 23 24
10
1
11
27
11
11
25
11
5
26 27
11
1
12
21
28
12
8
29 30
12 13
2 43
31
13
15
32 33 34 35 36
13
4
37 38 39 40
14
2
15
31
15
12
15
3
41
15
1
42 43 44 45 46 47 48 49 50
16
31
16
14
16
5
13
1
14
34
14
14
14
5
16
1
17
39
17
13
17
4
17
1
18
38
51
18
16
52 53 54 55 56
18
7
57
19
2
58
20 20
60 25
20 20
10
59 60 61
18
2
19
43
19
20
19
7
2
62 63 64
20
1
21
25
21
10
65
21
2
66 67
22
71 29
22
912 6 6
64 64 65 650 670 670 6710 6710 6711 67114 67114 67115 671150 671144 671151 6711514 6711454 6711454 67114544 67115142 67114543 671145430 671151572 671151505 6711454574 6711454306 6711454311 67114545754 67114545754 67114545755 671145457554 671145457556 671145454470 6711454544704 6711454544676 6711454575564 67114545755644
67114545755712 67114545755713 671145457556464 671145457556464 671145457556153 6711454575561314 6711454575564666 6711454575564667 67114545755646674 67114545755646676 67114545755646676 671145457556466760 671145457556466760 671145457550027077 6711454575571301174 6711454575571301176 6711454575571301176 67114545755713011760 67114545755713011760 67114545755646670367 671145457556466703670 671145457557130117610 671145457557130117611 6711454575571301176114 6711454575571301176114 6711454575571301176114 67114545755713011761144 67114545755646670367016
TABLE 8.7 (Cont'd) Minimum distances dm and # truncated codewords vlp,m] of weight dm for systematic rate R = 1 /2 ODP encoding matrices G = (4 g12).
m dm 68 69
70 71 72
22 22 22 23 23
#
of weight dm 9
4 1
46 16
73 74
23 23
2
75 76
24 24 24 24
56 20
25
74
25
33
25 25 25 26 26
16
77 78 79 80 81
82 83 84 85
86 87 88
89 90 91
92 93 94 95 96
26 26 27 27 27 27 27 28 28 28 28
5
8 3
4 1
41
20 6 2
62 28 11
5 1
42 20 5 1
g12 67114545755646670367017 671145457556466703670170 671145457556466703670170 671145457557130117611463 6711454575564667036701444 6711454575564667036701446 6711454575564667036701447 67114545755713011761146370 67114545755713011761146342 67114545755713011761146373 671145457557130117611463424 671145457557130117611463432 671145457557130117611463433 6711454575571301176114634334 6711454575564667036701447272 6711454575564667036701447277 67114545755646670367014472730 67114545755713011761146343362 67114545755713011761146343363 671145457557130117611463433634 671145457556466703670144727304 671145457556466703670144727305 6711454575564667036701447273054 6711454575564667036701447273056 6711454575564667036701447273357 67114545755646670367014472730510 67114545755646670367014472730512 67114545755646670367014472730511 671145457556466703670144727305110
TABLE 8.8 Total number of bit errors for the codewords of weight dfree for systematic rate R = 1 /2 ODP encoding matrices G = (4 g12). m
ndr_
# bit errors
3
1
1
4
2
3
4
1
1
5
2
6
3
4 6
6
1
2
7
6 7 64 66 73 674 714
6
2
3
8
671
7
1
9
7154 7152 7153 67114 67116 71447 671174 671166 671166 6711514 7144616 7144761 71447614 71446166 67115143 714476124 671145536 714476053 7144760524 7144616566 7144760535 71446165670 67114543066
8
4
11
8
3
8
9
3
9
9
1
10
5
13
10
4
9
1
2 3
4 5
6
10 11
12 13
14 15
16 17
18 19
20 21
22 23
24 25 26 27 28
29 30 31
356
drpe
912
10
1
1
1
2
12
13
44 44
12
4
9
12
3
7
12
1
2
12
1
2
14
6
14
2
20 4
15
5
17
15
3
11
16
8
28
16
7
28
16
3
11
18
22
89 2 50
12
13
16
1
18
11
Section 8.3
357
Low-Rate Convolutional Codes
8.3 LOW-RATE CONVOLUTIONAL CODES
In Tables 8.9-8.12, we list rate R = 1/3 and R = 1/4 systematic as well as nonsystematic ODP convolutional encoding matrices. Their free distances are compared with Heller's and Griesmer's upper bounds in Figs. 8.6 and 8.7.

TABLE 8.9 Weight spectra n_dfree+i, i = 0, 1, ..., 9, for systematic rate R = 1/3 ODP encoding matrices G = (4 g12 g13). (The table lists, for each memory m, the octal generators g12 and g13, dfree, and the spectral components n_dfree+i, i = 0, ..., 9.)
TABLE 8.10 Weight spectra n_dfree+i, i = 0, 1, ..., 9, for nonsystematic rate R = 1/3 ODP encoding matrices G = (g11 g12 g13). (The table lists, for each memory m, the octal generators g11, g12, and g13, dfree, and the spectral components n_dfree+i, i = 0, ..., 9.)
TABLE 8.11 Weight spectra n_dfree+i, i = 0, 1, ..., 9, for systematic rate R = 1/4 ODP encoding matrices G = (4 g12 g13 g14). (The table lists, for each memory m, the octal generators g12, g13, and g14, dfree, and the spectral components n_dfree+i, i = 0, ..., 9.)
TABLE 8.12 Weight spectra n_dfree+i, i = 0, 1, ..., 9, for nonsystematic rate R = 1/4 ODP encoding matrices G = (g11 g12 g13 g14). (The table lists, for each memory m, the octal generators g11, g12, g13, and g14, dfree, and the spectral components n_dfree+i, i = 0, ..., 9.)
Figure 8.6 The free distances for rate R = 1/3 systematic and nonsystematic ODP convolutional codes and comparisons with Heller's and Griesmer's upper bounds.
Figure 8.7 The free distances for rate R = 1/4 systematic and nonsystematic ODP convolutional codes and comparisons with Heller's and Griesmer's upper bounds.
8.4 HIGH-RATE CONVOLUTIONAL CODES

In Table 8.13 we list rate R = 2/3 systematic, polynomial, ODP convolutional encoding matrices

G = ( 1   0   g13 )
    ( 0   1   g23 )                                          (8.9)

Rate R = 2/3 nonsystematic, polynomial ODP convolutional encoding matrices

G = ( g11  g12  g13 )
    ( g21  g22  g23 )                                        (8.10)

are listed in Table 8.14.

TABLE 8.13 Weight spectra n_dfree+i, i = 0, 1, ..., 9, for systematic rate R = 2/3 ODP encoding matrices. (The table lists, for each memory m, the octal generators g13 and g23, dfree, and the spectral components n_dfree+i, i = 0, ..., 9.)
TABLE 8.14 Weight spectra n_dfree+i, i = 0, 1, ..., 9, for nonsystematic rate R = 2/3 ODP encoding matrices. (The table lists, for each memory m and overall constraint length ν, the octal generators g11, g12, g13, g21, g22, g23, dfree, and the spectral components n_dfree+i, i = 0, ..., 9.)
In Section 2.9 (Theorem 2.66), we proved that if an encoding matrix G(D) and the encoding matrix H(D) of the convolutional dual code both are minimal-basic, then they have the same overall constraint length ν. Furthermore, in Section 2.10 we showed how to use H(D) to build systematic encoders with feedback of memory ν = m. Thus, it is natural and quite useful when we search for encoding matrices to extend the concept of distance profile to d^ν = (d0, d1, ..., dν). The encoding matrices in Table 8.15 are ODP in this extended sense [JoP78]. In Table 8.16 we give the encoding matrices for the convolutional duals of the codes encoded by the encoding matrices in Table 8.15.

TABLE 8.15 Weight spectra n_dfree+i, i = 0, 1, ..., 9, for nonsystematic rate R = 2/3 ODP encoding matrices. These encoding matrices are ODP in the d^ν sense.
(The table lists, for each overall constraint length ν, the octal generators g11, g12, g13, g21, g22, g23, dfree, and the spectral components n_dfree+i, i = 0, ..., 9.)
TABLE 8.16 Encoding matrices H = (h1 h2 h3) for the convolutional dual of the codes encoded by the encoding matrices in Table 8.15. (The table lists, for each overall constraint length ν, the octal generators h1, h2, and h3.)
8.5 TAILBITING TRELLIS ENCODERS

In this section, we give short and moderate-length polynomial convolutional encoding matrices for tailbiting (TB) representations of block codes for various rates. Tailbiting encoders can generate many of the most powerful binary block codes. Tables 8.17-8.20 list the best encoding matrices for rates R = 1/4, 1/3, 1/2, and 2/3, respectively. For each number of information bits K up to 50, the best encoding matrices are listed at each encoder memory m. Encoders of fixed memory m achieve higher distances as the tailbiting trellis lengthens. At a given distance, a longer trellis reduces the number of nearest neighbors at d. Eventually, neither d nor nd improves further with tailbiting length; in fact,
TABLE 8.17 Minimum distance d for rate R = 1/4 encoding matrices for tailbiting representations for given K, 2 ≤ K ≤ 50. (For each number of information bits K and each encoder memory m, the table gives the best octal generators G, the minimum distance d, and the number nd of codewords of weight d.)
TABLE 8.18 Minimum distance d for rate R = 1/3 encoding matrices for tailbiting representations for given K, 2 ≤ K ≤ 50. (For each number of information bits K and each encoder memory m, the table gives the best octal generators G, the minimum distance d, and the number nd of codewords of weight d.)
TABLE 8.19 Minimum distance d for rate R = 1/2 encoding matrices for tailbiting representations for given K, 2 ≤ K ≤ 50. (For each number of information bits K and each encoder memory m, the table gives the best octal generators G, the minimum distance d, and the number nd of codewords of weight d.)
the best generators are now those of the memory m convolutional code with the best free distance and best nd. When the columns of a table reach this point, no further entries are shown. The neighbor number continues to grow with K, according to the linear rate indicated. For a fixed circle length K, there is also an encoder memory m, going across a table row, at which no further distance growth occurs, although sometimes a larger m allows an encoder with fewer nearest neighbors. There is thus a saturation as m grows for a given K, as well as vice versa. As a rule, a code with a given d should be as short as possible, in order to obtain the best decoded data error rate. Tables 8.21-8.24 show the best tailbiting generators found at each K, regardless of encoder memory. Tables 8.21-8.24 also list the best feedforward systematic encoders found by exhaustive search. It is clear that short-circle TB encoders can be feedforward systematic without much growth in encoder memory. Note that better long-K encoders with longer memories than those shown here are likely to exist. Figures 8.8-8.11 compare d and K for the best TB encoders to the best bounds given in Brouwer and Verhoeff [BrV93] for all four rates. Note that the TB encoders in the right half of each figure were found by random search and are not optimal; wherever the TB encoders are known to be optimal, they essentially match the B-V table.
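To make the tailbiting construction concrete, the following sketch (in Python, not taken from the tables) encodes K information bits with a rate R = 1/2 feedforward encoder whose indices are read circularly, so that the encoder starts and ends in the same state, and for small K it finds d and nd by exhaustive search. The taps used in the example (octal 5 and 7, memory m = 2) are chosen only for illustration.

# Tailbiting encoding of K information bits with a rate R = 1/2 feedforward
# convolutional encoder (illustrative sketch, not taken from the tables).
def tb_encode(u, g1, g2):
    # g1, g2 are tap lists (g_0, g_1, ..., g_m); indices are taken modulo K,
    # so the encoder starts in the state formed by the last m information bits
    # and returns to it: a tailbiting (circular) trellis.
    K = len(u)
    v = []
    for i in range(K):
        window = [u[(i - j) % K] for j in range(len(g1))]
        v.append(sum(a * b for a, b in zip(g1, window)) % 2)
        v.append(sum(a * b for a, b in zip(g2, window)) % 2)
    return v

def tb_minimum_distance(K, g1, g2):
    # Exhaustive search over all nonzero information words (small K only).
    best_d, n_d = None, 0
    for x in range(1, 2 ** K):
        u = [(x >> j) & 1 for j in range(K)]
        w = sum(tb_encode(u, g1, g2))
        if best_d is None or w < best_d:
            best_d, n_d = w, 1
        elif w == best_d:
            n_d += 1
    return best_d, n_d

if __name__ == "__main__":
    # memory m = 2 encoder with taps 101 and 111 (octal 5 and 7), K = 8
    print(tb_minimum_distance(8, [1, 0, 1], [1, 1, 1]))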
TABLE 8.20 Minimum distance d for rate R = 2/3 encoding matrices for tailbiting representations for given K, 4 ≤ K ≤ 50. (For each number of information bits K and each encoder memory m, the table gives the best octal generators G, the minimum distance d, and the number nd of codewords of weight d.)
TABLE 8.21 Minimum distance d for rate R = 1/4 encoding matrices for tailbiting representations for given K, 2 ≤ K ≤ 50. The nonsystematic encoders are found by exhaustive search for 1 ≤ m ≤ 5 and by random search for m ≥ 6. The systematic encoders are all found by exhaustive search. The best K = 23 nonsystematic encoder is best also for 24 ≤ K ≤ 50 with nd = K. The best K = 14 systematic encoder is best also for 15 ≤ K ≤ 50 with nd = K. (For each K the table gives, separately for nonsystematic and systematic encoders, the best octal generators G, the memory m, d, and nd.)
TABLE 8.22 Minimum distance d for rate R = 1/3 encoding matrices for tailbiting representations for given K, 2 ≤ K ≤ 50. The nonsystematic encoders are found by exhaustive search for 1 ≤ m ≤ 6 and by random search for m ≥ 7. The systematic encoders are all found by exhaustive search. The best K = 28 nonsystematic encoder is best also for 29 ≤ K ≤ 50 with nd = 4K. The best K = 16 systematic encoder is best also for 17 ≤ K ≤ 50 with nd = K. (For each K the table gives, separately for nonsystematic and systematic encoders, the best octal generators G, the memory m, d, and nd.)
TABLE 8.23 Minimum distance d for rate R = 1/2 encoding matrices for tailbiting representations for given K, 2 ≤ K ≤ 50. The nonsystematic encoders are found by exhaustive search for 1 ≤ m ≤ 8 and by random search for m ≥ 9. The systematic encoders are all found by exhaustive search. (For each K the table gives, separately for nonsystematic and systematic encoders, the best octal generators G, the memory m, d, and nd.)
TABLE 8.24 Minimum distance d for rate R = 2/3 encoding matrices for tailbiting representations for given K, 4 ≤ K ≤ 50. The nonsystematic encoders are found by exhaustive search for 1 ≤ m ≤ 3 and by random search for m ≥ 4. The systematic encoders are all found by exhaustive search. (For each K the table gives, separately for nonsystematic and systematic encoders, the best octal generators G, the memory m, d, and nd.)
Figure 8.8 Best minimum distance found for R = 1/4. (dmin versus K; the plot compares the tailbiting trellis representations with the Brouwer-Verhoeff upper and lower bounds and the Gilbert-Varshamov bound.)
Figure 8.9 Best minimum distance found for R = 1/3. (dmin versus K; the plot compares the tailbiting trellis representations with the Brouwer-Verhoeff upper and lower bounds.)
Figure 8.10 Best minimum distance found for R = 1/2. (dmin versus K; the plot compares the tailbiting trellis representations with the Brouwer-Verhoeff upper and lower bounds.)
Figure 8.11 Best minimum distance found for R = 2/3. (dmin versus K; the plot compares the tailbiting trellis representations with the Brouwer-Verhoeff upper bound and the Gilbert-Varshamov bound.)
8.6 COMMENTS
Our search for good convolutional encoding matrices started with the introduction of the distance profile, which could be used to give an essential reduction of the set of potential candidates that we had to investigate [Joh75]. It turned out that for short rate R = 1/2 encoding matrices, the ODP condition could be imposed at no cost in free distance. More ODP encoding matrices were reported in [Joh76] [Joh77a] [JoP78]. With the development of the FAST algorithm, we achieved a dramatic improvement in the speed at which we could determine the weight spectra for convolutional codes. The algorithm, together with extensive tables of rate R = 1/2 convolutional encoding matrices, was reported in [CeJ89]. For several memories, slightly better convolutional codes exist [JoS97]. The tailbiting representations reported in Section 8.5 are from [SAJ97].
9 Modulation Codes
In this chapter we will study digital transmission over bandlimited channels. In particular, we are interested in combining coding and modulation techniques in order to achieve significant coding gains over conventional, uncoded, multilevel modulation without compromising bandwidth efficiency.
9.1 BANDLIMITED CHANNELS AND QAM

Under ideal conditions, we can transmit 2W pulses without intersymbol interference over a bandlimited channel with (two-sided) bandwidth W. This is illustrated in Fig. 9.1, showing the classical quadrature amplitude modulation (QAM) system in which W pairs of amplitude-modulated input pulses are transmitted per second. The outputs of the ideal low-pass filters (LPF) of the receiver are sampled W times per second. Assuming no noise, perfect filters, perfect carrier and timing recovery, and so on, we get out exactly what we put in. The general form for the signal is

s(t) = Σ_i ( v_Ri √2 cos ω₀t − v_Ii √2 sin ω₀t ) g(t − iT)   (9.1)

where T = 1/W is the duration of the signaling interval and g(t) is the pulse used to form the QAM waveform. The two phase-quadrature carriers are modulated by a set of discrete amplitudes v_Ri ∈ R and v_Ii ∈ I. The amplitudes v_Ri and v_Ii are called the in-phase and quadrature modulation components, or the real and imaginary modulation components, respectively. The set of pairs (v_Ri, v_Ii) forms a signal constellation, that is, a set of signal points, in a two-dimensional Euclidean signal space R².
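Equation (9.1) is straightforward to simulate. The sketch below (Python, illustrative only; the rectangular pulse g(t), the carrier frequency, and the sampling rate are arbitrary choices, not taken from the text) forms samples of s(t) for a short sequence of in-phase/quadrature amplitude pairs.

from math import cos, sin, pi, sqrt

def qam_waveform(pairs, W=1.0, f0=4.0, samples_per_T=64):
    # s(t) = sum_i (v_Ri sqrt(2) cos w0 t - v_Ii sqrt(2) sin w0 t) g(t - iT), cf. (9.1)
    T = 1.0 / W                       # signaling interval
    w0 = 2 * pi * f0
    s = []
    for k in range(len(pairs) * samples_per_T):
        t = k * T / samples_per_T
        i = k // samples_per_T        # rectangular pulse g: only symbol i contributes
        vR, vI = pairs[i]
        s.append(vR * sqrt(2) * cos(w0 * t) - vI * sqrt(2) * sin(w0 * t))
    return s

# three 16-QAM symbols with amplitudes from {-3/2, -1/2, 1/2, 3/2}
print(len(qam_waveform([(0.5, 1.5), (-1.5, 0.5), (1.5, -0.5)])))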
As an example, suppose that R = I = {−3/2, −1/2, +1/2, +3/2}. Then we have the 16-ary QAM signal constellation shown in Fig. 9.2. Since we are transmitting 4 bits per signaling interval, we have the bandwidth efficiency (or spectral bit rate, cf. Problem 1.1) R_t/W = 4 (bits/s)/hertz, where R_t is the data rate in bits/s. Suppose that this 16-ary QAM signal constellation is used to communicate over an AWGN channel whose noise has two-sided power spectral density N_0/2. Furthermore, suppose that
Figure 9.1 QAM system for transmitting W pairs of pulses per second over an ideal bandlimited channel.
Figure 9.2 A 16-ary QAM signal constellation.
a Gray code is used to map the four input bits onto the signal constellation. Thus, two input tuples at Hamming distance 1 are mapped onto two signal points at Euclidean distance 1. Then the bit error probability is expressed by (Problem 9.1)

P_b = (3/4) Q(√(4E_b/5N_0)) + (1/2) Q(3√(4E_b/5N_0)) − (1/4) Q(5√(4E_b/5N_0))   (9.2)

where E_b denotes the energy per information bit. For large signal-to-noise ratios, (9.2) is approximated by

P_b ≈ (3/4) Q(√(4E_b/5N_0))   (9.3)
In Fig. 9.3 we have plotted the bandwidth efficiency R_t/W versus the signal-to-noise ratio E_b/N_0 at the bit error probability P_b = 10⁻⁵. The potential for improving the 16-ary QAM system by coding is large, 7.8 dB. The error performance of our system depends on the Euclidean distance between the closest pair of signal points. Let d²_Emin be the minimum of the squared Euclidean distance between any two signal points in the signal constellation. This distance is determined by the positions and the number of signal points and by the average energy E of the constellation. Assume that the input symbols are independent and identically distributed with a uniform distribution over the set of signal points. Let the minimum squared Euclidean distance for a 16-ary QAM signal constellation be d². Then we have the average energy

E = (1/16) d² ( 4 · 2/4 + 8 · 10/4 + 4 · 18/4 ) = 2.5 d²   (9.4)

or, since we are transmitting 4 bits per signal point,

E_b = E/4   (9.5)
Figure 9.3 Bandwidth efficiency R_t/W versus signal-to-noise ratio E_b/N_0 at bit error probability P_b = 10⁻⁵. (Curves are shown for BPSK, QPSK, 8-ary PSK, and 16-ary QAM.)
Hence, we have

d²_Emin = d² = 1.6 E_b   (9.6)

for 16-ary QAM.
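As a quick numerical check of (9.4)-(9.6), the following sketch (Python, not part of the text) enumerates the 16-ary QAM constellation with unit spacing d, computes its average energy and E_b, and evaluates the high-SNR approximation (9.3) as reconstructed above; the Q-function is expressed through the complementary error function.

from math import erfc, sqrt
from itertools import product

def Q(x):                      # Gaussian tail function
    return 0.5 * erfc(x / sqrt(2.0))

d = 1.0                        # unit spacing
amps = [-1.5 * d, -0.5 * d, 0.5 * d, 1.5 * d]
points = list(product(amps, amps))           # 16-ary QAM

E = sum(x * x + y * y for x, y in points) / len(points)
Eb = E / 4                                   # 4 bits per signal point
print("E  =", E, "(2.5 d^2 expected, cf. (9.4))")
print("Eb =", Eb)                            # (9.5)
print("d^2_Emin / Eb =", d * d / Eb)         # 1.6, cf. (9.6)

EbN0_dB = 10.0
EbN0 = 10 ** (EbN0_dB / 10)
Pb = 0.75 * Q(sqrt(4 * EbN0 / 5))            # high-SNR approximation (9.3)
print("Pb at Eb/N0 = 10 dB:", Pb)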
We can easily improve the error performance of our QAM system by increasing the signaling energy. An alternative is to make the signal constellation more sparse, which, at the cost of a reduced data rate, increases the minimum distance. A more clever approach is to introduce interdependencies between consecutive signal points in order to increase the distance between the closest pair of sequences of signal points. This can be done by expanding the signal constellation and using relatively simple codes. Then we can gain much more in minimum distance between sequences than what is offset by the increase of the average energy owing to the expansion of the signal constellation.
9.2 CODING FUNDAMENTALS

By introducing interdependencies between signal points in a sequence, we can exclude some sequences from the set of possible sequences. A general model for the coding procedure is shown in Fig. 9.4. The exclusion of some sequences of signal points is caused by the memory part. The set of output sequences of signal points from a coded modulation system is called a modulation code. The minimum squared Euclidean distance between two different sequences of signal points is called its free squared Euclidean distance d²_Efree.

Figure 9.4 A general model for coded modulation. (Data bits enter a memory part and a constellation selector; a signal point is then selected from the chosen constellation.)
Using a maximum-likelihood sequence detector at the receiver yields an asymptotic (for high signal-to-noise ratios) coding gain of

γ = 10 log10 [ (d²_Efree/E) / (d²_Emin^(u)/E^(u)) ] dB   (9.7)

where d²_Emin^(u) is the minimum squared Euclidean distance between signal points in the uncoded scheme, and E and E^(u) are the average signal energies of the coded and uncoded schemes, respectively. Ungerboeck [Ung82] introduced "mapping by set partitioning" to break down a signal constellation into a suitable number of subsets such that in each subset the signal points are farther apart. We will now illustrate this procedure by an example. Consider the 64-ary QAM signal constellation shown in Fig. 9.5. The constellation can be divided into two subsets by assigning alternate signal points to each subset. In Fig. 9.5 we
Figure 9.5 Set partitioning of a 64-ary QAM signal constellation.
have repeated the partitioning to obtain first four and then eight subsets. The minimum squared Euclidean distance between signal points within a subset is doubled for each partitioning. This procedure can be repeated until we have 64 subsets of only 1 signal point each. As an example, let the memory part in the coded modulation scheme in Fig. 9.4 be the rate R = 1/3, memory m = 3 convolutional encoder shown in Fig. 9.6.

Figure 9.6 Convolutional encoder used in the modulation scheme shown in Fig. 9.7.

In Fig. 9.7 we show a coded 64-ary QAM modulation scheme in which one of the four input bits, viz., u_i^(1), is encoded by the convolutional encoder shown in Fig. 9.6. The encoder output v_i = v_i^(1) v_i^(2) v_i^(3) is used to select one of the eight subsets A, B, ..., H of the signal points in the third level of the set partitioning in Fig. 9.5. The remaining three uncoded input bits, u_i^(2) u_i^(3) u_i^(4), select the signal point from the subset selected by the encoder output.
Figure 9.7 Encoding four information bits into a 64-ary QAM constellation. (The bit u^(1) enters the rate R = 1/3 convolutional encoder, whose output selects one of the eight constellations; the bits u^(2), u^(3), u^(4) select the signal point from the chosen constellation, giving one of the 64 signal points.)
The smallest squared Euclidean distance between two different sequences of signal points for which the convolutional encoder outputs are the same is equal to the minimum squared Euclidean distance between two signal points in any one of the eight subsets A, B, ..., H. Let us denote this squared Euclidean distance d²_Emin(S); that is,

d²_Emin(S) = min_S min_{s,s'∈S, s≠s'} {d²_E(s, s')}   (9.8)

where S ∈ {A, B, ..., H} and where d²_E(s, s') is the squared Euclidean distance between the signal points s and s'. Clearly,

d²_Efree ≤ d²_Emin(S)   (9.9)

Let d²_Emin(S, S') denote the smallest squared Euclidean distance between a signal point in subset S and a signal point in subset S'; that is,

d²_Emin(S, S') = min_{s∈S, s'∈S'} {d²_E(s, s')}   (9.10)

where S, S' ∈ {A, B, ..., H}.
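The quantities (9.8)-(9.10) are easy to evaluate numerically. The sketch below (Python, illustrative only) builds the 64-ary QAM constellation with unit spacing, forms eight third-level subsets of a standard set partitioning (the subset of a point is derived from the parities of its integer coordinates, which reproduces the doubling of the intra-subset distance at each level but need not coincide with the book's labeling A, ..., H), and computes d²_Emin(S) and the smallest inter-subset distance.

from itertools import product

# 64-ary QAM, unit spacing: coordinates are half-odd integers.
pts = [(i + 0.5, j + 0.5) for i, j in product(range(-4, 4), repeat=2)]

def subset_label(p):
    # Three partition steps of the Ungerboeck tree: each step keeps
    # every other point, which doubles the intra-subset distance.
    i, j = int(p[0] - 0.5) + 4, int(p[1] - 0.5) + 4
    return ((i + j) % 2, i % 2, (i // 2 + j // 2) % 2)

subsets = {}
for p in pts:
    subsets.setdefault(subset_label(p), []).append(p)

def d2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def d2_min_within(S):
    return min(d2(a, b) for a in S for b in S if a != b)

def d2_min_between(S, T):
    return min(d2(a, b) for a in S for b in T)

print("number of subsets:", len(subsets))                               # 8
print("d2_Emin(S) =", min(d2_min_within(S) for S in subsets.values()))  # 8 d^2
labels = list(subsets)
print("smallest inter-subset distance:",
      min(d2_min_between(subsets[a], subsets[b])
          for a in labels for b in labels if a != b))                   # d^2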
Let us denote the free squared Euclidean distance of sequences whose symbols are the subsets A, B, ..., H by d²_Efree(S); then we have

d²_Efree(S) = min_{S≠S'} Σ_{i=0}^{∞} d²_Emin(S_i, S'_i)   (9.11)

where S = (S_0, S_1, ...) and S' = (S'_0, S'_1, ...) are two infinite code sequences stemming from the same state and whose symbols are the subsets A, B, ..., H. Combining the fact that d²_Efree cannot exceed d²_Efree(S) and definition (9.8), we have

d²_Efree = min{d²_Efree(S), d²_Emin(S)}   (9.12)
In order to build up a large free squared Euclidean distance for the code whose codewords are sequences of the subsets A, B, ..., H, we assign subsets that have a common immediate predecessor in the partitioning tree (Fig. 9.5) to encoder output tuples that belong to the branches stemming from or terminating at the same state. The map between encoder output and signal subset that is given in Table 9.1 is constructed according to these rules. Suppose that the minimum distance between two signal points, that is, the unit of the spacing, for our 64-ary signal constellation is d. Then we have

d²_Emin(S) = 8d²   (9.13)
The smallest distances between points in different subsets are given by

d²_Emin(A, B) = 4d²
d²_Emin(A, C) = 2d²
d²_Emin(A, D) = 2d²   (9.14)
d²_Emin(A, E) = d²
d²_Emin(C, D) = 4d²

TABLE 9.1 Selection of constellations. (The eight encoder output tuples v_i = 000, 111, 010, 101, 100, 011, 110, 001 are mapped onto the eight subsets A, B, ..., H according to the rules above.)
In Fig. 9.8 we show the trellis for our code. The branches are labeled according to the map specified in Table 9.1. We have, for example,

d²_Efree(S) = d²_Emin(A, B) + d²_Emin(A, E) + d²_Emin(A, C) + d²_Emin(A, B) = 4d² + d² + 2d² + 4d² = 11d²   (9.15)

Inserting (9.13) and (9.15) into (9.12), we have

d²_Efree = min{d²_Efree(S), d²_Emin(S)} = min{11d², 8d²} = 8d²   (9.16)
Figure 9.8 A rate R = 1/3 trellis with eight states.
for our coded modulation scheme using a rate R = 1/3, memory m = 3 convolutional encoder and 64-ary QAM. Its average energy is (cf. Problem 9.2)

E = (1/64) d² ( 4 · 2/4 + 8 · 10/4 + 4 · 18/4 + 8 · 26/4 + 8 · 34/4 + 8 · 50/4 + 4 · 50/4 + 8 · 58/4 + 8 · 74/4 + 4 · 98/4 ) = 10.5 d²   (9.17)
Since uncoded 16-ary QAM with d²_Emin = d² has E^(u) = 2.5d², we have a coding gain (9.7) of

γ = 10 log10 [ (8d²/10.5d²) / (d²/2.5d²) ] = 2.8 dB   (9.18)

for our coded 64-ary QAM scheme over uncoded 16-ary QAM. Since the free distance for our scheme is determined by the minimum distance between two signal points within a subset, using a more powerful (longer memory) rate R = 1/3 convolutional encoder will not improve the coding gain. However, we can continue the partitioning process and obtain subsets whose signal points are farther apart and thus obtain an increase in the coding gain. In Fig. 9.9 we show the set partitioning of subset A. For this partition we have

d²_Emin(S') = 32d²   (9.19)

where S' denotes any one of the 32 subsets with only two signal points, viz., S' ∈ {AA, AB, ..., HD}. Four of them, viz., AA, AB, AC, and AD, are shown in Fig. 9.9. Now we let one of the four input bits, u^(4) say, select a signal point from a subset in S'. The remaining input bits, u_i^(1) u_i^(2) u_i^(3), are inputs to a rate R = 3/5 convolutional encoder whose output v_i = v_i^(1) v_i^(2) ... v_i^(5) selects the subset. By choosing the encoder memory large enough, we can obtain

d²_Efree(S') ≥ d²_Emin(S') = 32d²   (9.20)
Figure 9.9 Set partitioning of subset A. (The subsets AA, AB, AC, and AD are shown.)
Hence, we have

d²_Efree = 32d²   (9.21)

which gives us a large coding gain,

γ = 10 log10 [ (32d²/10.5d²) / (d²/2.5d²) ] = 8.8 dB   (9.22)

Finally, we can also use the ultimate partitioning of the 64-ary QAM constellation. Let S'' ∈ {64 subsets of 1 signal point} and choose a rate R = 4/6 convolutional encoder. Since there is only one signal point in a subset S'', we have

d²_Emin(S'') = ∞   (9.23)

and it follows that d²_Efree is determined solely by the convolutional encoder and the map from the encoder outputs onto the set of signal points.
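The asymptotic coding gains quoted in (9.18) and (9.22) follow directly from (9.7); the short calculation below (Python, illustrative) recomputes the average energies of the 16-ary and 64-ary QAM constellations with unit spacing and the resulting gains for d²_Efree = 8d² and 32d².

from itertools import product
from math import log10

def qam_average_energy(M_per_dim, d=1.0):
    amps = [d * (k - (M_per_dim - 1) / 2) for k in range(M_per_dim)]
    pts = list(product(amps, amps))
    return sum(x * x + y * y for x, y in pts) / len(pts)

E16 = qam_average_energy(4)          # 2.5 d^2, uncoded reference (16-QAM)
E64 = qam_average_energy(8)          # 10.5 d^2, expanded constellation

def coding_gain_dB(d2_free_coded, E_coded, d2_min_uncoded, E_uncoded):
    # Asymptotic coding gain (9.7)
    return 10 * log10((d2_free_coded / E_coded) / (d2_min_uncoded / E_uncoded))

print(coding_gain_dB(8.0, E64, 1.0, E16))    # ~2.8 dB, cf. (9.18)
print(coding_gain_dB(32.0, E64, 1.0, E16))   # ~8.8 dB, cf. (9.22)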
9.3 LATTICE-TYPE TRELLIS CODES
Our coded 64-ary QAM scheme in the previous section can be put into the mathematical framework of lattices and lattice cosets. The signal constellation should be regarded as a finite set of points taken from an infinite lattice, and the partitioning of the signal set into subsets corresponds to the partitioning of that lattice into a sublattice and its cosets. For signal constellations that are subsets of a lattice, we can construct coded modulation schemes using algebraic and geometric properties of the lattice. An N-dimensional real lattice A is an infinite discrete set of points (vectors, N-tuples) in the real Euclidean N-space RN that forms a group under ordinary vector addition. The sum of two points in A is in A, and the point 0 is also in A. Clearly, a lattice is a regular array.
EXAMPLE 9.1 The set Z of all integers is a one-dimensional lattice (Fig. 9.10).
EXAMPLE 9.2 The set Z² of all integer-valued two-tuples is the two-dimensional square lattice (Fig. 9.10).

Figure 9.10 Lattices Z and Z².
Given a lattice A we can obtain closely related lattices by scaling and orthogonal transformations:

Scaling: If r is any real number, then rA is the lattice consisting of all multiples rλ, where λ ∈ A.
Orthogonal transformation: If T is any orthogonal transformation of N-space, then TA is the lattice consisting of all transformations Tλ, where λ ∈ A.

EXAMPLE 9.3 The lattice 2Z of all even integers is obtained by scaling the lattice Z by r = 2.

Consider the two-dimensional lattice Z² shown in Fig. 9.10. The lattice RZ², where

R = ( 1   1 )
    ( 1  −1 )                                   (9.24)

is obtained by rotating Z² by 45° and scaling by √2. This operation is illustrated in Fig. 9.11. The lattice RZ² is a sublattice of Z².

Figure 9.11 Lattice Z² and its sublattices RZ² and R²Z² = 2Z².

Since R² = 2I₂, we have

R²Z² = 2Z²                                      (9.25)

which is illustrated in Fig. 9.11.
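The sublattice chain Z²/RZ²/2Z² is easy to verify numerically. The sketch below (Python, illustrative) applies the operator R of (9.24) to integer points, checks that R² = 2I₂, and confirms that every image is an integer point with even coordinate sum, so that RZ² is a proper sublattice of Z².

from itertools import product

R = [[1,  1],
     [1, -1]]          # the operator R of (9.24); R^2 = 2 I_2

def apply(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

R2 = [[sum(R[i][k] * R[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print("R^2 =", R2)                                   # [[2, 0], [0, 2]], cf. (9.25)

pts = list(product(range(-3, 4), repeat=2))
images = [apply(R, p) for p in pts]
# Every image is an integer point whose coordinate sum is even:
# R Z^2 is Z^2 rotated by 45 degrees and scaled by sqrt(2).
print(all((x + y) % 2 == 0 for x, y in images))      # True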
A coset of a lattice A, denoted by A + c, is the set of all points obtained by adding a fixed point c to all lattice points λ ∈ A. Geometrically, the coset A + c is a translate of A by c. If c ∈ A, then A + c = A. A sublattice A' of a given lattice A is a subset of the points in A that is itself a lattice; that is, A' is a subgroup of the additive group A. The set of all cosets of a sublattice is denoted A/A' and is called a partition of A. In Fig. 9.12 we show the chain of two-way partitions Z²/RZ²/2Z². We will now use the structure of a lattice to construct good modulation codes.
Figure 9.12 The lattice partition chain Z²/RZ²/2Z² and its cosets. (The cosets shown are Z², RZ², RZ² + (1, 0), 2Z², 2Z² + (1, 1), 2Z² + (1, 0), and 2Z² + (0, 1).)
Definition A coset code C(A/A'; C) is the set of all signal points that lie within a sequence of cosets of A' that could be specified by a sequence of coded bits from C. In Fig. 9.13 we show the general structure of an encoder for a coset code.

Figure 9.13 General structure of an encoder for a coset code C(A/A'; C). (An encoder for C maps k bits to k + r coded bits, which the coset selector uses to choose one of 2^(k+r) cosets of A'; together with the n − k uncoded bits, this selects one of the 2^(n+r) signal points.)

If C is a block code, then we call the coset code C(A/A'; C) a lattice code.

Definition A lattice-type trellis code (or simply a trellis code) is a coset code C(A/A'; C), where C is a rate R = k/(k + r) convolutional code; r is the redundancy of the convolutional code.

EXAMPLE 9.4 A C(Z²/2Z²; C) trellis code with a rate R = 1/2 convolutional code C is transmitting 5 bits per two dimensions with the squared 64-ary QAM signal constellation. Its encoder is shown in Fig. 9.14. In order to minimize the average energy, the signal constellation is actually chosen from a translate of the
lattice Z², viz., Z² + (1/2, 1/2).

Figure 9.14 Encoder for trellis code C(Z²/2Z²; C). (One bit enters the rate R = 1/2 convolutional encoder; its 2 coded bits select one of the four cosets of 2Z², and the 4 uncoded bits select one of the 64 signal points.)
For trellis codes we will use the notations d²_Efree(C), d²_Efree(A'), and d²_Emin(A') instead of d²_Efree, d²_Efree(S), and d²_Emin(S), respectively. Then, as we have already shown (9.12), for a trellis code we have

d²_Efree(C) = min{d²_Efree(A'), d²_Emin(A')}   (9.26)

The principal geometric parameters of a lattice A are the minimum squared Euclidean distance between its points, d²_Emin(A), and its fundamental volume, V(A), which is the volume of the N-space corresponding to each lattice point. If C is a trellis code with reasonably many signal points that lie in a region B, then the distribution of its signal points in N-space is well approximated by a uniform continuous distribution over the region B. Forney and Wei [FoW89] call this the continuous approximation. Thus, the average energy E(B) of a signal constellation is approximated by

E(B) ≈ ( ∫_B |b|² dV ) / (N V(B)/2)   (9.27)

where |b|² is the energy of a signal point b ∈ B and

V(B) = ∫_B dV   (9.28)

is the volume of B, 1/V(B) is the density of a uniform continuous probability distribution over B, and the division by N/2 normalizes to two dimensions.
{-M12,..., -1/2, 1/2, ... , M/2}. Using the continuous approximation, E is a line segment of length M centered at the origin. Then we have
E( H - (
/2
K
b2db /(M/2) = M2/6
(9.29)
/2
normalized per two dimensions. An exact calculation (Problem 9.2) gives
= (M2 - 1)/12
(9.30)
E(I3) = (M2 - 1)/6
(9.31)
E( H
per one dimension, or
per two dimensions.
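The continuous approximation of Example 9.5 can be checked numerically; the sketch below (Python, illustrative) compares the exact per-two-dimensions average energy (M² − 1)/6 of an M-ary PAM constellation with the approximation M²/6.

def pam_energy_exact(M, d=1.0):
    # points at +-d/2, +-3d/2, ..., +-(M-1)d/2; energy per two dimensions
    pts = [d * (2 * k - M + 1) / 2 for k in range(M)]
    return 2 * sum(a * a for a in pts) / M          # twice the one-dimensional energy

def pam_energy_continuous(M, d=1.0):
    return (M * d) ** 2 / 6                         # (9.29)

for M in (2, 4, 8, 16, 32):
    print(M, pam_energy_exact(M), (M * M - 1) / 6, pam_energy_continuous(M))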
We will now show that under the continuous approximation the coding gain y (C) for a trellis code C(A/A'; C) is separable into two parts:
The fundamental coding gain γ_c(C), which depends on the code C and on the partition A/A'.

The shape gain γ_s, which is determined not by C but by the choice of the signal constellation bounding region B.

Let our uncoded reference scheme be an N-dimensional cubic signal constellation consisting of M^N signal points chosen from a lattice A with d²_Emin(A) = 1 and centered at the origin. Its average energy per two dimensions is M²/6 (9.29). Then, for a trellis code C(A/A'; C) with constellation bounding region B, we have the coding gain

γ(C) = 10 log10 [ (d²_Efree(C)/E(B)) / (d²_Emin(A)/E^(u)) ]   (9.32)

which can be written as

γ(C) = 10 log10 [ d²_Efree(C) M² / V(B)^(2/N) ] + 10 log10 [ V(B)^(2/N) / (6 E(B)) ]   (9.33)

where the first term is the fundamental coding gain γ_c(C) and the second term is the shape gain γ_s.
The signal constellation for our trellis code contains M^N 2^r signal points, where r is the redundancy of the convolutional code C. Each signal point occupies the volume V(A). Hence, for reasonably large signal constellations, we have

V(B) = M^N 2^r V(A)   (9.34)

Combining (9.33) and (9.34), we have the coding gain

γ(C) = γ_c(C) + γ_s   (9.35)

where

γ_c(C) = 10 log10 [ d²_Efree(C) / (2^(2r/N) V(A)^(2/N)) ]   (9.36)

and

γ_s = 10 log10 [ V(B)^(2/N) / (6 E(B)) ]   (9.37)
EXAMPLE 9.6 The two-dimensional trellis code in Example 9.4 has d²_Emin(A') = 4. C can be chosen such that

d²_Efree(A') ≥ d²_Emin(A')

and, hence, such that d²_Efree(C) = 4. Since the convolutional code C has redundancy r = 1 and since V(A) = V(Z²) = 1, we have the fundamental coding gain

γ_c(C) = 10 log10 (4/2) = 3 dB

Using the continuous approximation, we have E(B) = M²/6, which together with V(B) = M² inserted into (9.37) yields the shape gain

γ_s = 0 dB

as was expected since the constellation is square. Thus, the total coding gain is 3 dB.

EXAMPLE 9.7
Consider an encoder for the trellis code C(Z²/2RZ²; C), where C is a rate R = 2/3 convolutional code encoded by the encoder shown in Fig. 9.15. The free squared Euclidean distance (Problem 9.4) is d²_Efree(C) = 5. Hence, we obtain the fundamental coding gain

γ_c(C) = 10 log10 (5/2) = 3.98 dB   (9.38)

Let the signal constellation be the 32-ary so-called cross constellation shown in Fig. 9.16. It has the average energy (Problem 9.5)

E(B) = 31/6

Since

V(B) = 32

we obtain the shape gain

γ_s = 10 log10 [ V(B)^(2/N) / (6 E(B)) ] = 10 log10 [ 32 / (6 · 31/6) ] = 0.14 dB

The total coding gain is

γ(C) = γ_c(C) + γ_s = 4.12 dB

Figure 9.15 Encoder for the convolutional code used in Example 9.7.

Figure 9.16 Cross constellation boundary for 32-ary QAM.
Because it is more like a circle, the cross constellation in the previous example is slightly more efficient than a square (0.14 dB). For a given area, a circle minimizes the average energy. The best bounding region B is a circle.

EXAMPLE 9.8 A circle with radius ρ has area πρ². Its average energy is

E_c = (1/(πρ²)) ∫_0^ρ b² 2πb db = ρ²/2

Thus, the shape gain for a circle is

γ_s = 10 log10 [ V(B)^(2/N) / (6 E(B)) ] = 10 log10 [ πρ² / (6ρ²/2) ] = 10 log10 (π/3) = 0.20 dB

An N-sphere of radius ρ has, for N even, the volume [FoW89]

V(B) = π^(N/2) ρ^N / (N/2)!   (9.39)

and average energy

E(B) = 2ρ²/(N + 2)   (9.40)

Thus, the shape gain for an N-sphere is

γ_s = 10 log10 [ V(B)^(2/N) / (6 E(B)) ] = 10 log10 [ π(N + 2) / (12 ((N/2)!)^(2/N)) ]   (9.41)

Using the Stirling approximation

n! ≈ (n/e)^n   (9.42)

we can easily see that the shape gain

γ_s ≈ 10 log10 [ π(N + 2) / (12 N/(2e)) ] = 10 log10 [ πe(N + 2) / (6N) ]   (9.43)

approaches its optimum value

10 log10 (πe/6) = 1.53 dB   (9.44)

when N → ∞.
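The shape gains of Examples 9.6-9.8 can be reproduced with a few lines; the sketch below (Python, illustrative) evaluates γ_s = 10 log10(V(B)^(2/N)/(6E(B))) for the square, for the 32-point cross constellation of Fig. 9.16, and for N-spheres of growing dimension.

from math import pi, e, lgamma, log10, log

def shape_gain_dB(V_2_over_N, E_per_2D):
    return 10 * log10(V_2_over_N / (6 * E_per_2D))   # (9.37)

# Square region of side M (Example 9.6): V = M^2, E = M^2/6  ->  0 dB
M = 8.0
print(shape_gain_dB(M * M, M * M / 6))

# 32-ary cross constellation (Example 9.7): V = 32, E = 31/6  ->  ~0.14 dB
print(shape_gain_dB(32.0, 31.0 / 6))

# N-sphere of radius rho (Example 9.8), N even:
# V = pi^(N/2) rho^N / (N/2)!,  E = 2 rho^2 / (N + 2)
def sphere_shape_gain_dB(N, rho=1.0):
    logV = (N / 2) * log10(pi) + N * log10(rho) - lgamma(N / 2 + 1) / log(10)
    V_2_over_N = 10 ** (2 * logV / N)
    return shape_gain_dB(V_2_over_N, 2 * rho * rho / (N + 2))

for N in (2, 4, 8, 16, 64, 256):
    print(N, round(sphere_shape_gain_dB(N), 3))
print("limit:", round(10 * log10(pi * e / 6), 3), "dB")   # 1.53 dB, cf. (9.44)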
9.4 GEOMETRICALLY UNIFORM TRELLIS CODES

Designing an efficient modulation system is to some extent a geometrical problem. An isometry is a distance-preserving transformation of some metric space.
Definition An isometry η of R^N is a transformation

η: R^N → R^N, x ↦ η(x)

that preserves Euclidean distances,

||η(x) − η(y)||² = ||x − y||², x, y ∈ R^N   (9.45)

Translations, rotations, and reflections are important examples of isometries.
Definition A geometrical figure S is any set of points of R^N. The image of a figure S under an isometry η is denoted as η(S). Two figures S and S' are geometrically congruent (or simply congruent) if there exists an isometry η such that η(S) = S'. If S and S' are geometrically congruent, then we say that S and S' have the same shape.
We call figures that are sets of discrete points signal sets. A finite signal set is a signal constellation.
Definition A signal set S is geometrically uniform if, given any two points s and s' in S, there exists an isometry η_{s,s'} that transforms s to s' while leaving S invariant:

η_{s,s'}(s) = s',  η_{s,s'}(S) = S

EXAMPLE 9.9 The signal sets in Fig. 9.17 are (a) a geometrically uniform constellation and (b) an infinite geometrically uniform signal set or a regular array.
11
..... ..... ..... .....
11
Figure 9.17 Geometrically uniform signal sets.
(a) Geometrically uniform constellation
(b) Regular array
Most signal sets S used in communications are geometrically uniform. EXAMPLE 9.10 The geometrically uniform signal set shown in Fig. 9.17 (a) is an 8-ary phase-shift keying (8-ary PSK) signal constellation. The constellation is invariant under rotations by multiples of 45°.
Let S/S' be a partition of a geometrically uniform signal set such that the subsets S' are geometrically uniform and mutually congruent. In Fig. 9.18 we show an example of such a partition. EXAMPLE 9.11 The subsets in Fig. 9.5 are geometrically uniform and mutually congruent signal constellations.
Let S/S' be a geometrically uniform partition of a signal set S into subsets and let an encoder for a convolutional code C generate a label sequence v. This label sequence is mapped into a subset sequence (Fig. 9.19). A specific signal sequence s is then selected according to the n − k uncoded bits. A signal space code C(S/S'; C) is a subset of the set of all sequences of elements of the signal set S. If S/S' is a partition of a lattice-type signal set S = A + c into cosets of a sublattice A' of A, then S/S' is isomorphic to A/A' and the signal space code C(S/S'; C) is a lattice-type trellis code C(A/A'; C) defined in the previous section. A geometrically uniform PSK-type trellis code is a signal space code C(S/S'; C) with a finite signal set S and a convolutional code C.
Figure 9.18 Partitioning of a geometrically uniform signal constellation into subsets A, B, C, and D that are geometrically uniform and mutually congruent.
Figure 9.19 Encoder for a signal space code C(S/S'; C). (An encoder for C maps k information bits to the label sequence v; the labels are mapped to subsets, giving the subset sequence, and a signal point selector driven by the n − k uncoded bits produces the signal point sequence s.)

EXAMPLE 9.12
The signal set S in Fig. 9.18 is an 8-ary PSK signal constellation. The subsets A, B, C, and D are geometrically uniform and mutually congruent. The rate R = 1/2 convolutional encoder shown in Fig. 9.20 is used to generate the label sequences. Labels stemming from (or entering into) the same states are mapped onto subsets stemming from the same subsets in the previous level in the partitioning tree. Thus, we have, for example,

Label   Subset
00      A
01      B
10      C
11      D
Then, using the distances between signal points shown in Fig. 9.21, where we have assumed unit energy, we obtain

d²_Efree(S) = d²_Emin(A, B) + d²_Emin(A, C) + d²_Emin(A, B) = 2 + (2 − √2) + 2 = 4.59   (9.46)

One uncoded bit selects the signal point from one of the subsets A, B, C, and D. Since

d²_Emin(S') = 4   (9.47)

our coded 8-ary PSK scheme has

d²_Efree(C) = 4

and a coding gain over uncoded QPSK (4-ary PSK) of

10 log10 [ d²_Efree(C) / d²_Emin(QPSK) ] = 10 log10 (4/2) = 3 dB   (9.48)

Figure 9.20 A rate R = 1/2 convolutional encoder and its trellis.

Figure 9.21 Distances between signal points in an 8-ary PSK constellation.
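The squared Euclidean distances used in (9.46)-(9.48) follow from the unit-energy 8-PSK constellation. The sketch below (Python, illustrative) computes the intra- and inter-subset distances for four antipodal subsets (one possible labeling consistent with (9.46); the actual labeling of Fig. 9.18 may differ) and the path metric of (9.46).

from math import cos, sin, pi

# Unit-energy 8-ary PSK signal points
s = [(cos(2 * pi * k / 8), sin(2 * pi * k / 8)) for k in range(8)]

def d2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

# Four subsets of antipodal points (an assumed labeling for illustration)
A = [s[0], s[4]]; B = [s[2], s[6]]; C = [s[1], s[5]]; D = [s[3], s[7]]

def d2_min_between(S, T):
    return min(d2(a, b) for a in S for b in T)

def d2_min_within(S):
    return min(d2(a, b) for a in S for b in S if a != b)

print("d2_Emin(S') =", d2_min_within(A))                   # 4, cf. (9.47)
print("d2_Emin(A,B) =", d2_min_between(A, B))              # 2
print("d2_Emin(A,C) =", d2_min_between(A, C))              # 2 - sqrt(2)
# Path metric of (9.46): A->B, A->C, A->B
print("d2_Efree(S) =",
      d2_min_between(A, B) + d2_min_between(A, C) + d2_min_between(A, B))   # ~4.59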
9.5 DECODING OF MODULATION CODES

In this section, we show by a simple example how to use Viterbi decoding to decode modulation codes.

EXAMPLE 9.13 The rate R = 1/2 convolutional encoder shown in Fig. 9.20 is used in a coded QPSK scheme with the following mapping from labels onto signal points:
The labels are used to denote branches in the trellis shown in Fig. 9.22.

Figure 9.22 An example of Viterbi decoding of a signal space code. (The received sequence r is written above the trellis.)
The free squared Euclidean distance is, for example,

d²_Efree(C) = d²_Emin(00, 01) + d²_Emin(00, 10) + d²_Emin(00, 01) = 4 + 2 + 4 = 10   (9.49)

The minimum squared Euclidean distance for BPSK is

d²_Emin(BPSK) = 4   (9.50)

and, hence, the coding gain for our coded QPSK scheme over uncoded BPSK is

γ = 10 log10 [ d²_Efree(C) / d²_Emin(BPSK) ] = 10 log10 (10/4) = 3.98 dB   (9.51)
Suppose that four information symbols followed by two dummy zeros have been encoded and that r = 10 10 00 11 01 00 has been received over the Gaussian channel (hard decisions). In Fig. 9.22 we show the trellis with cumulative squared Euclidean distances and discarded subpaths. The maximum-likelihood estimate of the information sequence is û = 1010.
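A Viterbi decoder with squared Euclidean branch metrics, as used in Example 9.13, can be sketched in a few lines. In the Python sketch below the generator taps G(D) = (1 + D + D², 1 + D²), the label-to-QPSK-point mapping, and the hard-decision received points are assumptions for illustration; they need not match Figs. 9.20 and 9.22, so the decoded sequence may differ from the example.

from math import inf

G = ((1, 1, 1), (1, 0, 1))                     # assumed rate R = 1/2 encoder taps
QPSK = {(0, 0): (1, 0), (0, 1): (0, 1), (1, 1): (-1, 0), (1, 0): (0, -1)}

def step(state, u):                            # state = (u_{i-1}, u_{i-2})
    window = (u,) + state
    label = tuple(sum(g * b for g, b in zip(row, window)) % 2 for row in G)
    return label, (u, state[0])

def viterbi(received, n_info):
    states = [(a, b) for a in (0, 1) for b in (0, 1)]
    metric = {s: (0.0 if s == (0, 0) else inf) for s in states}
    paths = {s: [] for s in states}
    for i, r in enumerate(received):
        new_metric = {s: inf for s in states}
        new_paths = {}
        for s in states:
            if metric[s] == inf:
                continue
            for u in ((0, 1) if i < n_info else (0,)):   # tail forced to zero
                label, s_next = step(s, u)
                p = QPSK[label]
                m = metric[s] + (r[0] - p[0]) ** 2 + (r[1] - p[1]) ** 2
                if m < new_metric[s_next]:
                    new_metric[s_next] = m
                    new_paths[s_next] = paths[s] + [u]
        metric, paths = new_metric, new_paths
    return paths[(0, 0)][:n_info]

# Hard decisions: each received label pair is replaced by its QPSK point.
r = [QPSK[(1, 0)], QPSK[(1, 0)], QPSK[(0, 0)],
     QPSK[(1, 1)], QPSK[(0, 1)], QPSK[(0, 0)]]
print(viterbi(r, 4))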
9.6 COMMENTS

When Ungerboeck presented the set partitioning idea at a conference in Ronneby, Sweden, in 1976 [UnC76] (see also [Ung82][Ung87a][Ung87b]), its importance was immediately recognized. This celebrated work led to a flurry of research papers, and it did not take long before Ungerboeck's trellis-coded modulation schemes appeared in consumer products such as modems. Our presentation is based on Forney's treatments of the subject [For88a][For88b][For91b].
PROBLEMS

9.1 Verify equation (9.2). Hint: The bit error rate for 16-ary QAM is the same as for 4-ary AM.
9.2 Prove that the average energy for a two-dimensional M²-ary QAM signal constellation is
E = (M² − 1)d²/6
where d is the unit of the spacing. Hint: The average energy for a two-dimensional M²-ary QAM signal constellation is twice the average energy for a one-dimensional M-ary AM signal constellation. Furthermore,
Σ_{k=0}^{n} k² = n(n + 1)(2n + 1)/6
9.3 Use a rate R = 1/3 convolutional encoder and design a coded 16-ary QAM scheme. What is the coding gain compared with an uncoded 4-ary QAM system?
9.4 Consider the trellis code C(Z²/2RZ²; C), where C is a rate R = 2/3 convolutional code encoded by
G(D) = ( 1         D        0
         D + D²    1 + D²   D )
(a) Determine d²_E,free(C).
(b) Let B be the region containing the 32-ary QAM cross constellation. Draw the eight subsets of the signal constellation.
9.5 Show by using the continuous approximation that the 32-ary QAM cross constellation in Fig. 9.16 has
E(B) = 31/6
9.6 Consider a signal space code C(S/S'; C) for a coded 8-ary PSK scheme with a rate R = 1/2 convolutional code C and a mapping:
Figure P9.6
Determine its free distance d²_E,free(C) when the encoding matrix for the convolutional encoder is
(a) G(D) = (1 + D + D²   1 + D²)
(b) G(D) = (1   1 + D)
9.7 Use the convolutional encoder shown in Fig. 9.15 and construct a coded 8-ary PSK scheme; find its coding gain over uncoded QPSK.
9.8 The convolutional encoder shown in Fig. 9.20 is used in a coded 8-ary PSK scheme with mapping:
Figure P9.8
Three information tuples followed by two dummy zero-tuples have been encoded, and r = 010 111 010 010 010 has been received over the Gaussian channel (hard decisions). Find the maximum-likelihood estimate of the information sequence.
9.9 Consider a coded 8-ary PSK scheme with the following encoder and mapping:
Figure P9.9
(a) Find an equivalent encoder without feedback.
(b) Determine the free distance d²_E,free(C).
9.10 Consider an encoder for a signal space code C(S/S'; C) for an 8-ary PSK scheme. The code C is encoded by the encoding matrix G(D) = (1 + D²   D), and the mapping is the same as the one given in Problem 9.6. Find the coding gain for the scheme.
9.11 Repeat Problem 9.10 with the following mapping:
Figure P9.11
9.12 Consider the convolutional encoding matrix G(D) = (1 + D²   D).
(a) Design a coded 8-ary PSK scheme.
(b) Find the coding gain over an uncoded scheme.
9.13 Consider the convolutional encoding matrix G(D) = (D   1 + D   D + D²).
(a) Design a coded 16-ary QAM scheme.
(b) Find the coding gain over an uncoded scheme.
9.14 Consider an encoder for a signal space code C(S/S'; C) for an 8-ary PSK scheme. The code C is encoded by the encoding matrix G(D) = (1 + D + D² + D³   1 + D² + D³), and the following mapping is used:
Figure P9.14
Decode the received sequence r = 110 111 000 001 010 001 111.
9.15 Consider the convolutional code with encoding matrix
G(D) = ( 1 + D   1 + D   0
         D       1       1 + D )
(a) Design a coded 8-ary PSK scheme.
(b) What is the coding gain?
9.16 Consider the following TCM scheme:
Figure P9.16
The constellation (s1, s3, s5, s7) is rotated +60 degrees compared to the constellation (s0, s2, s4, s6).
(a) Which of the following three mappings {m_a, m_b, m_c} from v = (v^(1) v^(2) v^(3)) to s_i is best?
(b) Assume the best mapping in (a) is used. What is the asymptotic coding gain for the scheme?

  m_a   m_b   m_c
  111   111   111
  010   010   110
  101   001   001
  110   011   011
  000   000   000
  001   101   101
  011   110   010
  100   100   100

Each row gives the label v = (v^(1) v^(2) v^(3)) assigned to one of the signal points s_i of Figure P9.16 under the three mappings.
Appendix A
Minimal Encoders

In this appendix we show how to obtain the minimal encoders in Examples 2.22 and 2.37. (Readers who would like to learn more about realizations of sequential circuits are referred to standard textbooks, for example, [Lee78].)

EXAMPLE A.1
In Example 2.21, we showed that the encoding matrix
G'(D) = ( 1 + D             D                   1
          1 + D² + D³       1 + D + D² + D³     0 )      (A.1)
is minimal, that µ = 3, and that there exists a minimal realization with only three memory elements. In order to obtain such a realization, we introduce the state variables shown in Fig. A.1. First, we notice that the input of a memory element whose output is σ_t^(jk) is σ_{t+1}^(jk). Then we have

σ_{t+1}^(11) = u_t^(1)
σ_{t+1}^(21) = u_t^(2)
σ_{t+1}^(22) = σ_t^(21)
σ_{t+1}^(23) = σ_t^(22)
v_t^(1) = σ_t^(11) + σ_t^(22) + σ_t^(23) + u_t^(1) + u_t^(2)
v_t^(2) = σ_t^(11) + σ_t^(21) + σ_t^(22) + σ_t^(23) + u_t^(2)
v_t^(3) = u_t^(1)      (A.2)
For simplicity we use hexadecimal notation for the state four-tuples (σ_t^(11) σ_t^(21) σ_t^(22) σ_t^(23)). Then, from (A.2) we obtain Table A.1 for the behavior of the encoder in Fig. A.1. In order to find which states are equivalent, we start by merging those states that correspond to the same outputs. Hence, we obtain partition P1:

P1: {0, 3, 9, A}, {1, 2, 8, B}, {4, 7, D, E}, {5, 6, C, F}

The states that belong to the same set in P1 are said to be 1-equivalent. Two states are 2-equivalent if they are 1-equivalent and their successor states are 1-equivalent. For example, the states 0 and 3 are not 2-equivalent since their successor states are not 1-equivalent.
Figure A.1 The controller canonical form of the encoding matrix G'(D) in Example A.1.
TABLE A.1 Successor state and output (σ_{t+1}^(11) σ_{t+1}^(21) σ_{t+1}^(22) σ_{t+1}^(23) / v_t^(1) v_t^(2) v_t^(3)) as a function of the present state (σ_t^(11) σ_t^(21) σ_t^(22) σ_t^(23)) and the input (u_t^(1) u_t^(2)).

                        Input
Present state    00     01     10     11
      0         0/0    4/6    8/5    C/3
      1         0/6    4/0    8/3    C/5
      2         1/6    5/0    9/3    D/5
      3         1/0    5/6    9/5    D/3
      4         2/2    6/4    A/7    E/1
      5         2/4    6/2    A/1    E/7
      6         3/4    7/2    B/1    F/7
      7         3/2    7/4    B/7    F/1
      8         0/6    4/0    8/3    C/5
      9         0/0    4/6    8/5    C/3
      A         1/0    5/6    9/5    D/3
      B         1/6    5/0    9/3    D/5
      C         2/4    6/2    A/1    E/7
      D         2/2    6/4    A/7    E/1
      E         3/2    7/4    B/7    F/1
      F         3/4    7/2    B/1    F/7
From Table A.1 and P1 we obtain

P2: {0, 9}, {3, A}, {1, 8}, {2, B}, {4, D}, {7, E}, {5, C}, {6, F}

The states that belong to the same set in P2 are 2-equivalent. Two states are 3-equivalent if they are 2-equivalent and their successor states are 2-equivalent. Hence, we see that in this example P3 = P2. Thus, we can stop the procedure, and the eight sets in P2 represent the eight states of the minimal encoder. (In general, we proceed until P_{k+1} = P_k.)
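The refinement just carried out by hand is easy to mechanize. The following sketch runs the same procedure on Table A.1: states are first grouped by their output rows (P1), and the partition is then refined until the successor states of equivalent states remain equivalent. The transition table is transcribed from Table A.1; everything else is generic.

```python
# Partition refinement (state minimization) as described above.
# next_out[s][i] = (successor, output) for present state s and input i = 00,01,10,11.
next_out = {
    0x0: [(0x0, 0), (0x4, 6), (0x8, 5), (0xC, 3)],
    0x1: [(0x0, 6), (0x4, 0), (0x8, 3), (0xC, 5)],
    0x2: [(0x1, 6), (0x5, 0), (0x9, 3), (0xD, 5)],
    0x3: [(0x1, 0), (0x5, 6), (0x9, 5), (0xD, 3)],
    0x4: [(0x2, 2), (0x6, 4), (0xA, 7), (0xE, 1)],
    0x5: [(0x2, 4), (0x6, 2), (0xA, 1), (0xE, 7)],
    0x6: [(0x3, 4), (0x7, 2), (0xB, 1), (0xF, 7)],
    0x7: [(0x3, 2), (0x7, 4), (0xB, 7), (0xF, 1)],
    0x8: [(0x0, 6), (0x4, 0), (0x8, 3), (0xC, 5)],
    0x9: [(0x0, 0), (0x4, 6), (0x8, 5), (0xC, 3)],
    0xA: [(0x1, 0), (0x5, 6), (0x9, 5), (0xD, 3)],
    0xB: [(0x1, 6), (0x5, 0), (0x9, 3), (0xD, 5)],
    0xC: [(0x2, 4), (0x6, 2), (0xA, 1), (0xE, 7)],
    0xD: [(0x2, 2), (0x6, 4), (0xA, 7), (0xE, 1)],
    0xE: [(0x3, 2), (0x7, 4), (0xB, 7), (0xF, 1)],
    0xF: [(0x3, 4), (0x7, 2), (0xB, 1), (0xF, 7)],
}

def signature(s, cls):
    """Successor classes and outputs of state s under the current partition."""
    return tuple((cls[ns], out) for ns, out in next_out[s])

# P1: merge states with identical output rows (1-equivalence), then refine.
cls = {s: tuple(out for _, out in next_out[s]) for s in next_out}
while True:
    refined = {s: (cls[s], signature(s, cls)) for s in next_out}
    if len(set(refined.values())) == len(set(cls.values())):
        break
    cls = refined

classes = {}
for s, c in cls.items():
    classes.setdefault(c, []).append(s)
print([[format(s, 'X') for s in sorted(v)] for v in classes.values()])  # eight pairs, e.g. ['0','9']
```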
We let the first octal digit in each set represent the states (σ_t^(1) σ_t^(2) σ_t^(3)) of the minimal encoder. (In order to obtain a linear realization of the minimal encoder, we must let (000) represent the state {0, 9}!) Then we obtain Table A.2.

TABLE A.2 Successor state and output (σ_{t+1}^(1) σ_{t+1}^(2) σ_{t+1}^(3) / v_t^(1) v_t^(2) v_t^(3)) as a function of the present state (σ_t^(1) σ_t^(2) σ_t^(3)) and the input (u_t^(1) u_t^(2)).

                        Input
Present state    00     01     10     11
      0         0/0    4/6    1/5    5/3
      1         0/6    4/0    1/3    5/5
      2         1/6    5/0    0/3    4/5
      3         1/0    5/6    0/5    4/3
      4         2/2    6/4    3/7    7/1
      5         2/4    6/2    3/1    7/7
      6         3/4    7/2    2/1    6/7
      7         3/2    7/4    2/7    6/1
From Table A.2 we obtain the following table for σ_{t+1}^(1):

                             (u_t^(1) u_t^(2))
(σ_t^(1) σ_t^(2) σ_t^(3))   00   01   10   11
   000                       0    1    0    1
   001                       0    1    0    1
   010                       0    1    0    1
   011                       0    1    0    1
   100                       0    1    0    1
   101                       0    1    0    1
   110                       0    1    0    1
   111                       0    1    0    1

Clearly,

σ_{t+1}^(1) = u_t^(2)      (A.3)
For σ_{t+1}^(2) we have

                             (u_t^(1) u_t^(2))
(σ_t^(1) σ_t^(2) σ_t^(3))   00   01   10   11
   000                       0    0    0    0
   001                       0    0    0    0
   010                       0    0    0    0
   011                       0    0    0    0
   100                       1    1    1    1
   101                       1    1    1    1
   110                       1    1    1    1
   111                       1    1    1    1

Thus,

σ_{t+1}^(2) = σ_t^(1)      (A.4)
For σ_{t+1}^(3) we have

                             (u_t^(1) u_t^(2))
(σ_t^(1) σ_t^(2) σ_t^(3))   00   01   10   11
   000                       0    0    1    1
   001                       0    0    1    1
   010                       1    1    0    0
   011                       1    1    0    0
   100                       0    0    1    1
   101                       0    0    1    1
   110                       1    1    0    0
   111                       1    1    0    0

It is easily seen that

σ_{t+1}^(3) = σ_t^(2) + u_t^(1)      (A.5)
Repeating this procedure for each of the outputs, we obtain

v_t^(1) = σ_t^(2) + σ_t^(3) + u_t^(1) + u_t^(2)
v_t^(2) = σ_t^(1) + σ_t^(2) + σ_t^(3) + u_t^(2)
v_t^(3) = u_t^(1)      (A.6)

The minimal encoder is shown in Fig. 2.16.
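As a consistency check, the minimal encoder defined by (A.3)-(A.6) and the controller canonical realization (A.2) of G'(D) must produce identical output sequences when both are started in the all-zero state. The short exhaustive test below verifies this for all input sequences of length 5.

```python
import itertools

def run_full(u_seq):
    """Controller canonical realization (A.2), four memory elements."""
    s11 = s21 = s22 = s23 = 0
    out = []
    for u1, u2 in u_seq:
        v1 = (s11 + s22 + s23 + u1 + u2) % 2
        v2 = (s11 + s21 + s22 + s23 + u2) % 2
        out.append((v1, v2, u1))
        s11, s21, s22, s23 = u1, u2, s21, s22
    return out

def run_minimal(u_seq):
    """Minimal encoder (A.3)-(A.6), three memory elements."""
    s1 = s2 = s3 = 0
    out = []
    for u1, u2 in u_seq:
        v1 = (s2 + s3 + u1 + u2) % 2
        v2 = (s1 + s2 + s3 + u2) % 2
        out.append((v1, v2, u1))
        s1, s2, s3 = u2, s1, (s2 + u1) % 2
    return out

pairs = list(itertools.product((0, 1), repeat=2))
for u_seq in itertools.product(pairs, repeat=5):
    assert run_full(u_seq) == run_minimal(u_seq)
print("outputs of (A.2) and (A.3)-(A.6) agree")
```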
EXAMPLE A.2
In Example 2.37 we showed that

G_sys(D) = ( 1   0   (1 + D²)/(1 + D + D²)   D²/(1 + D + D²)
             0   1   D²/(1 + D + D²)         1/(1 + D + D²) )      (A.7)

is equivalent to a minimal-basic encoding matrix with µ = ν = 2. Since G_sys(D) is systematic, it is minimal. Hence, it is also realizable with two memory elements, albeit not in controller canonical form. Its realization in controller canonical form uses four memory elements and is shown in Fig. A.2. From the figure it follows immediately that
σ_{t+1}^(11) = σ_t^(11) + σ_t^(12) + u_t^(1)
σ_{t+1}^(12) = σ_t^(11)
σ_{t+1}^(21) = σ_t^(21) + σ_t^(22) + u_t^(2)
σ_{t+1}^(22) = σ_t^(21)
v_t^(1) = u_t^(1)
v_t^(2) = u_t^(2)
v_t^(3) = σ_t^(11) + σ_t^(22) + u_t^(1)
v_t^(4) = σ_t^(12) + σ_t^(21) + σ_t^(22) + u_t^(2)      (A.8)
We can ignore the outputs v^(1) and v^(2) when we minimize the encoder. By repeating the state minimization procedure described in Example A.1, we obtain the following table for the successor state and output (σ_{t+1}^(1) σ_{t+1}^(2) / v_t^(3) v_t^(4)) as a function of the present state and present input:
Figure A.2 A realization of the systematic encoding matrix in controller canonical form.
                          (u_t^(1) u_t^(2))
(σ_t^(1) σ_t^(2))      00       01       10       11
      00             00/00    01/01    10/10    11/11
      01             10/01    11/00    00/11    01/10
      10             11/10    10/11    01/00    00/01
      11             01/11    00/10    11/01    10/00
From the table we obtain

σ_{t+1}^(1) = σ_t^(1) + σ_t^(2) + u_t^(1)
σ_{t+1}^(2) = σ_t^(1) + u_t^(2)
v_t^(3) = σ_t^(1) + u_t^(1)
v_t^(4) = σ_t^(2) + u_t^(2)      (A.9)

The minimal realization is shown in Fig. 2.23.
Finally, we remark that different mappings between the states and their binary representations will give different realizations that in general are nonlinear. However, there always exist mappings such that the encoders can be realized by linear sequential circuits consisting of only memory elements and modulo two adders. For the systematic encoder in Example A.2, there exist six different linear realizations. Two of them require five adders, three require six adders, and one as many as eight adders.
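Here too the reduction can be checked mechanically: started in the all-zero state, the two-memory-element realization (A.9) must produce the same parity outputs v^(3), v^(4) as the four-memory-element realization (A.8) for every input sequence. A small exhaustive check:

```python
import itertools

def parity_full(u_seq):
    """Four-memory realization (A.8); only v^(3), v^(4) are returned."""
    s11 = s12 = s21 = s22 = 0
    out = []
    for u1, u2 in u_seq:
        out.append(((s11 + s22 + u1) % 2, (s12 + s21 + s22 + u2) % 2))
        s11, s12, s21, s22 = (s11 + s12 + u1) % 2, s11, (s21 + s22 + u2) % 2, s21
    return out

def parity_minimal(u_seq):
    """Two-memory realization (A.9)."""
    s1 = s2 = 0
    out = []
    for u1, u2 in u_seq:
        out.append(((s1 + u1) % 2, (s2 + u2) % 2))
        s1, s2 = (s1 + s2 + u1) % 2, (s1 + u2) % 2
    return out

pairs = list(itertools.product((0, 1), repeat=2))
assert all(parity_full(u) == parity_minimal(u)
           for u in itertools.product(pairs, repeat=6))
print("realizations (A.8) and (A.9) agree on v^(3), v^(4)")
```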
Appendix B
Wald's Identity
Let a random walk be represented by the sequence of random variables S_0 = 0, S_1, S_2, ..., where

S_j = Σ_{i=0}^{j−1} Z_i      (B.1)

for j ≥ 1, and where the Z_i's are independent, identically distributed random variables such that

P(Z_i > 0) > 0      (B.2)

and

P(Z_i < 0) > 0      (B.3)
We introduce an absorbing barrier at u < 0 such that the random walk will stop whenever it crosses the barrier from above, that is, as soon as S_j < u. Let

g(λ) = E[2^{λZ_i}]      (B.4)

be the moment generating function of Z_i. From (B.2) and (B.3) it follows that g(λ) → ∞ both when λ → ∞ and when λ → −∞. Furthermore,

g(0) = 1      (B.5)

and

g′(λ)|_{λ=0} = E[Z_i] ln 2      (B.6)

In Fig. B.1 we show the typical shapes of the moment generating function g(λ). The curve to the left corresponds to E[Z_i] > 0 and the one to the right to E[Z_i] < 0. We will now consider the real roots of the equation

g(λ) = 1      (B.7)
Figure B.1 Typical g(λ) when E[Z_i] > 0 (left) and E[Z_i] < 0 (right).
From (B.5) it follows that λ = 0 is always a root. From (B.6) we conclude that for E[Z_i] > 0 we have at least one negative root λ_0 < 0 and for E[Z_i] < 0 at least one positive root λ_1 > 0. In general, we have

g(λ) = g_0      (B.8)

where 0 < g_0 ≤ 1. This equation has at least two roots if

min_λ {g(λ)} < g_0      (B.9)

but only one root if

min_λ {g(λ)} = g_0      (B.10)

EXAMPLE B.1
Let

Z_i = Σ_{s=1}^{c} Y_s      (B.11)

where the Y_s's are independent, identically distributed random variables,

Y_s = { α > 0,  with probability 1 − ε
        β < 0,  with probability ε      (B.12)

and

P(Z_i = kα + (c − k)β) = \binom{c}{k} (1 − ε)^k ε^{c−k}      (B.13)
From (B.13) we have

g(λ) = E[2^{λZ_i}] = Π_{s=1}^{c} E[2^{λY_s}] = ((1 − ε)2^{λα} + ε2^{λβ})^c      (B.14)

The real root of the equation g(λ) = 1 is equal to the real root of

f(λ) = (1 − ε)2^{λα} + ε2^{λβ} = 1      (B.15)

Let

α = log((1 − ε)/(1 − a)),   β = log(ε/a)      (B.16)

where 0 < a < 1. Then it is easily verified that λ = −1 is a root. Next we let
α = log(1 − ε) + 1 − B
β = log ε + 1 − B      (B.17)
where we assume that

f′(0) = ((1 − ε)α + εβ) ln 2 = (1 − h(ε) − B) ln 2 > 0      (B.18)

or, equivalently, that

1 − h(ε) > B      (B.19)

where

h(x) = −x log x − (1 − x) log(1 − x)      (B.20)

is the binary entropy function. Then there exists a negative root λ_0 < 0 of the equation g(λ) = 1. Inserting (B.17) into (B.15) yields

(1 − ε)^{1+λ_0} + ε^{1+λ_0} = 2^{λ_0(B−1)}      (B.21)

or, equivalently,

B = 1 + (1/λ_0) log((1 − ε)^{1+λ_0} + ε^{1+λ_0})      (B.22)
Let

λ_0 = −s/(1 + s)      (B.23)

where s is the solution of

R = G(s)/s      (B.24)

and G(s) is the Gallager function for the BSC (4.208). Then we obtain from (B.22) the important equality

B = R      (B.25)
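Numerically, the root λ_0 and the equality (B.25) are easy to exhibit. The sketch below picks an illustrative crossover probability ε and rate B satisfying (B.19), finds the negative root of (B.15) with α, β as in (B.17), and evaluates G(s)/s at s = −λ_0/(1 + λ_0); the standard form of the Gallager function for the BSC is assumed here, since (4.208) is not reproduced in this appendix.

```python
import numpy as np
from scipy.optimize import brentq

eps, B = 0.05, 0.5                                   # illustrative values
h = lambda x: -x * np.log2(x) - (1 - x) * np.log2(1 - x)
assert B < 1 - h(eps)                                # condition (B.19)

alpha = np.log2(1 - eps) + 1 - B                     # (B.17)
beta = np.log2(eps) + 1 - B

f = lambda lam: (1 - eps) * 2.0 ** (lam * alpha) + eps * 2.0 ** (lam * beta) - 1
lam0 = brentq(f, -20, -1e-9)                         # negative root of f(lam) = 1
s = -lam0 / (1 + lam0)                               # from (B.23)

# Assumed standard Gallager function for the BSC.
G = lambda s: s - (1 + s) * np.log2((1 - eps) ** (1 / (1 + s)) + eps ** (1 / (1 + s)))
print(lam0, G(s) / s)                                # G(s)/s equals B, cf. (B.25)
```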
EXAMPLE B.2
Consider Z_i given by (B.11), but in this example we let

Y_s = { α > 0,  with probability 1/2
        β < 0,  with probability 1/2      (B.26)

That is,

P(Z_i = kα + (c − k)β) = \binom{c}{k} (1/2)^c      (B.27)
and

g(λ) = Σ_{k=0}^{c} \binom{c}{k} (1/2)^c 2^{λ(kα+(c−k)β)} = ((1/2)2^{λα} + (1/2)2^{λβ})^c      (B.28)
The real root of the equation

g(λ) = g_0      (B.29)

is equal to the root of

f(λ) = (1/2)2^{λα} + (1/2)2^{λβ} = g_0^{1/c}      (B.30)
Let

g_0 = 2^{−Rc}      (B.31)

or, equivalently,

g_0^{1/c} = 2^{−R}      (B.32)

and let

α = 1 − ρ,   β = −ρ      (B.33)

where ρ is the Gilbert-Varshamov parameter; that is,

ρ = h^{−1}(1 − R)      (B.34)

Then the function
f(λ) = (1/2)2^{λα} + (1/2)2^{λβ}      (B.35)

has its minimum for λ_min < 0 such that

f′(λ_min) = (1/2)(α2^{λ_min α} + β2^{λ_min β}) ln 2 = 0      (B.36)

or, equivalently, for

λ_min = log(ρ/(1 − ρ))      (B.37)
Inserting (B.37) into (B.30) yields

f(λ_min) = (1/2)2^{(1−ρ) log(ρ/(1−ρ))} + (1/2)2^{−ρ log(ρ/(1−ρ))}
         = (1/2)((ρ/(1 − ρ))^{1−ρ} + (ρ/(1 − ρ))^{−ρ})
         = 2^{h(ρ)−1} = 2^{−R}      (B.38)
That is, equation (B.30) has only one root, viz., λ_min.
Consider next the case when α and β are given by (B.17). Then, for B = R we have

f′(λ)|_{λ=0} = (1/2)(α + β) ln 2 = (−R + log(2√(ε(1 − ε)))) ln 2 < 0      (B.39)

and there exists a λ_1 > 0 such that

f(λ_1) = 2^{−R}      (B.40)
From (B.30) it follows that

(1/2)((1 − ε)^{λ_1} + ε^{λ_1}) 2^{λ_1(1−R)} = 2^{−R}      (B.41)

or, equivalently, that

R = 1 − (1/(1 − λ_1)) log((1 − ε)^{λ_1} + ε^{λ_1})      (B.42)

If we write λ_1 as

λ_1 = 1/(1 + s)      (B.43)

then it follows from (B.42) that s satisfies (B.24).
Let the random variable N denote the time at which the random walk first crosses the threshold u < 0; that is,

P(N = n) = P(S_j > u, 0 ≤ j < n, and S_n ≤ u)      (B.44)

Then we have the following

Lemma B.1  If E[Z_i] < 0, then the random walk will eventually cross the threshold u < 0; that is,

lim_{n→∞} P(N > n) = 0      (B.45)

Proof.  Clearly,

P(N > n) ≤ P(S_n > u)      (B.46)
From (B.1) it follows that

E[2^{λS_n}] = Π_{j=0}^{n−1} E[2^{λZ_j}] = g(λ)^n      (B.47)

and for λ > 0 we have

E[2^{λS_n}] = Σ_{all s} 2^{λs} P(S_n = s) ≥ Σ_{s>u} 2^{λs} P(S_n = s) ≥ 2^{λu} P(S_n > u)      (B.48)

Then, by combining (B.46), (B.47), and (B.48) we obtain

P(N > n) ≤ 2^{−λu} E[2^{λS_n}] = 2^{−λu} g(λ)^n      (B.49)
From the assumption E[Z_i] < 0 it follows that there exists a λ > 0 such that g(λ) < 1. Then, from (B.49) we conclude that (B.45) holds.
Now we consider the general case when E[Z_i] can be not only negative but also positive.
When E[Z_i] > 0, we change the probability assignment of the random walk in such a way that the new random walk will have a negative drift and hence will be absorbed with probability one. We introduce the "tilted" probability assignment

q_{Z,λ}(z) = f_Z(z) 2^{λz} / g(λ)      (B.50)

where f_Z(z) is the probability assignment for the original random walk. We notice that

q_{Z,λ}(z) ≥ 0,  all z      (B.51)

and

Σ_z q_{Z,λ}(z) = Σ_z f_Z(z) 2^{λz} / g(λ) = E[2^{λZ_i}]/g(λ) = 1      (B.52)
Let us introduce the corresponding random walk S_{0,λ} = 0, S_{1,λ}, S_{2,λ}, ..., where

S_{j,λ} = Σ_{i=0}^{j−1} Z_{i,λ}      (B.53)

for j ≥ 1, where the Z_{i,λ}'s are independent, identically distributed random variables. We find that

E[Z_{i,λ}] = Σ_z q_{Z,λ}(z) z = (1/g(λ)) Σ_z f_Z(z) z 2^{λz} = (1/(g(λ) ln 2)) dg(λ)/dλ      (B.54)

Thus, by choosing λ such that g′(λ) < 0, we see that E[Z_{i,λ}] < 0 and our tilted random walk has a negative drift.
Let f_{λ,n}(u, v) denote the probability that the tilted random walk is not absorbed by the barrier at u < 0 at time j < n and assumes the value v at time n; that is,

f_{λ,n}(u, v) = P(S_{j,λ} > u, 0 ≤ j < n, and S_{n,λ} = v)      (B.55)

For the tilted random walk, (B.44) can be rewritten as

P(N = n) = Σ_v f_{λ,n}(u, v)      (B.56)

If we choose λ such that the drift is negative, then it follows from Lemma B.1 that the tilted random walk will eventually achieve a value less than u with probability 1; that is,

Σ_{n=1}^{∞} Σ_v f_{λ,n}(u, v) = 1      (B.57)

Furthermore, for any sequence a_0, a_1, ..., a_{n−1},

Π_{i=0}^{n−1} P(Z_{i,λ} = a_i) = Π_{i=0}^{n−1} P(Z_i = a_i) 2^{λa_i} / g(λ)      (B.58)

Hence, we have

f_{λ,n}(u, v) = f_{0,n}(u, v) 2^{λv} g(λ)^{−n}      (B.59)

where Σ_{i=0}^{n−1} a_i = v. Combining (B.57) and (B.59) we obtain

Σ_{n=1}^{∞} Σ_v f_{0,n}(u, v) 2^{λv} g(λ)^{−n} = 1      (B.60)

which is known as Wald's identity for a random walk with one absorbing barrier at u < 0 [Wal47].
Wald's identity can also be written in a more compact form:

E[2^{λS_N} g(λ)^{−N}] = 1      (B.61)

and we have the following.
Theorem B.2  Let S_j be the random walk given by (B.1) and let g(λ) be the moment generating function of the random variable Z_i which satisfies (B.2) and (B.3). Let N be the smallest n for which S_n < u, where the barrier u < 0. Then, for all λ such that g′(λ) < 0, Wald's identity (B.61) holds.

Corollary B.3  Let S_j be the random walk given by (B.1) with E[Z_i] > 0 and let g(λ) be the moment generating function of the random variable Z_i which satisfies (B.2) and (B.3). Let λ_0 < 0 be a root of equation (B.7) such that g′(λ_0) < 0. Then,

P(S_min < u) ≤ 2^{−λ_0 u}      (B.62)

where

S_min = min_j {S_j}      (B.63)
Proof.  From (B.60) we have

1 = Σ_{n=1}^{∞} Σ_v f_{0,n}(u, v) 2^{λ_0 v} g(λ_0)^{−n}
  ≥ 2^{λ_0 u} Σ_{n=1}^{∞} Σ_v f_{0,n}(u, v) = 2^{λ_0 u} P(S_min < u)      (B.64)

since g(λ_0) = 1, λ_0 < 0, and the absorbed values v satisfy v < u.
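The bound of Corollary B.3 is easy to test by simulation. In the sketch below Z_i takes the values +1 and −1 (an illustrative choice) with P(Z_i = +1) = p > 1/2, so the drift is positive, g(λ) = p·2^λ + (1 − p)·2^{−λ}, and the nonzero root of g(λ) = 1 is λ_0 = log((1 − p)/p) < 0. The Monte Carlo estimate of the probability that the walk ever reaches the level u is compared with 2^{−λ_0 u}; for this particular step distribution there is no overshoot, so the bound happens to be tight.

```python
import numpy as np

# Monte Carlo check of P(S_min <= u) against the bound 2**(-lambda_0 * u), cf. (B.62),
# for the illustrative step distribution Z = +1 w.p. p, Z = -1 w.p. 1 - p (p > 1/2).
rng = np.random.default_rng(1)
p, u, n_steps, n_trials = 0.6, -8, 1000, 20000
lam0 = np.log2((1 - p) / p)                  # nonzero root of g(lam) = 1, negative

hits = 0
for _ in range(n_trials):
    walk = np.cumsum(rng.choice([1, -1], size=n_steps, p=[p, 1 - p]))
    hits += walk.min() <= u

print(hits / n_trials, 2.0 ** (-lam0 * u))   # simulated probability vs. bound
```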
In a more general case we would like to upper-bound

Σ_{n=1}^{∞} Σ_v f_{0,n}(u, v) 2^{bn}      (B.65)

Then we choose λ = λ_1 > 0 such that

g(λ_1) = 2^{−b}      (B.66)

Assume that g′(λ_1) < 0. Hence, we can use Wald's identity (B.60) to obtain

1 = Σ_{n=1}^{∞} Σ_v f_{0,n}(u, v) 2^{λ_1 v} g(λ_1)^{−n} ≤ 2^{λ_1 u} Σ_{n=1}^{∞} Σ_v f_{0,n}(u, v) 2^{bn}      (B.67)

or, equivalently,

Σ_{n=1}^{∞} Σ_v f_{0,n}(u, v) 2^{bn} ≥ 2^{−λ_1 u}      (B.68)
Suppose that

min{Z_i} = z_min < 0      (B.69)

Then,

S_N ≥ u + z_min      (B.70)

where z_min is called the maximal overshoot, and we obtain from (B.60) that

1 ≥ 2^{λ_1(u+z_min)} Σ_{n=1}^{∞} Σ_v f_{0,n}(u, v) 2^{bn}      (B.71)

or, equivalently,

2^{−λ_1(u+z_min)} ≥ Σ_{n=1}^{∞} Σ_v f_{0,n}(u, v) 2^{bn} ≥ 2^{−λ_1 u}      (B.72)

where the last inequality follows from (B.68).
where the last inequality follows from (B.68). Notice that (B.68) is still valid if g'(A1) = 0 for Al > 0. Then we have to replace b in
(B.66) by b - e, where e > 0. Let Al(e) be the root of
g(A) = 2-b+E Then Al (e) < Al and g'(A1(e)) < 0. Hence, we use Wald's identity and obtain
(B.73)
00
1: E fo,n n=1 v
(u,
v)2(b-E)n > 2-x'(R)u
(B.74)
When e - 0 (B.74) will approach (B.68). (If Al < 0 is a root of (B.66), then the inequalities in equation (B.74) are reversed.) Corollary B.4 Let S1 be the random walk given by (B. 1) with the absorbing rule Sj < u, where the barrier u < 0, and let g (A) be the moment generating function of the random variable Zi which satisfies (B.2) and (B.3) and whose minimum value zmin = min{Z; } < 0. Let Al > 0 be a root of equation (B.63) such that g'(A1) < 0. Then, inequality (B.72) holds. If Al < 0 is a root of equation (B.66) such that g'(A1) < 0, then the inequalities in (B.72) are reversed.
Corollary B.5 (Wald's equality)  Let S_j be the random walk given by (B.1) and assume that the random variable Z_i satisfies (B.2) and (B.3) and has E[Z_i] < 0. Then E[N] exists and

E[N] = Σ_{n=1}^{∞} n Σ_v f_{0,n}(u, v) = E[S_N]/E[Z_i]      (B.75)

Furthermore, if a maximal overshoot z_min exists, then

(u + z_min)/E[Z_i] ≥ E[N] ≥ u/E[Z_i] = u ln 2 / g′(0)      (B.76)

where u < 0 is the barrier and g(λ) is the moment generating function satisfying (B.5).

Proof.  Taking the derivative of Wald's identity (B.60) with respect to λ yields

ln 2 Σ_{n=1}^{∞} Σ_v v f_{0,n}(u, v) 2^{λv} g(λ)^{−n} − g′(λ) Σ_{n=1}^{∞} Σ_v n f_{0,n}(u, v) 2^{λv} g(λ)^{−n−1} = 0      (B.77)

Since

g′(0) = E[Z_i] ln 2      (B.78)

we have for λ = 0:

Σ_{n=1}^{∞} Σ_v v f_{0,n}(u, v) = E[Z_i] Σ_{n=1}^{∞} Σ_v n f_{0,n}(u, v)      (B.79)

which is equivalent to (B.75). Since

u + z_min ≤ S_N ≤ u      (B.80)

(B.76) follows.
Finally, Wald's identity is valid not only in situations when the random walk crosses the barrier u < 0 but also when it hits it, that is, as soon as S_j ≤ u. The corresponding reformulations of Lemma B.1, Theorem B.2, and Corollary B.5 are left as exercises. Since the corresponding reformulations of Corollaries B.3 and B.4 are used in Chapters 3-6, we give them here. As a counterpart to Corollary B.3, we have
Corollary B.6  Let S_j be the random walk given by (B.1) with E[Z_i] > 0, and let g(λ) be the moment generating function of the random variable Z_i which satisfies (B.2) and (B.3). Let λ_0 < 0 be a root of equation (B.7) such that g′(λ_0) < 0. Then,

P(S_min ≤ u) ≤ 2^{−λ_0 u}      (B.81)

where

S_min = min_j {S_j}      (B.82)
Finally, as a counterpart to Corollary B.4 we have

Corollary B.7  Let S_j be the random walk given by (B.1) with the absorbing rule S_j ≤ u, where the barrier u < 0, and let g(λ) be the moment generating function of the random variable Z_i which satisfies (B.2) and (B.3) and whose minimum value z_min = min{Z_i} < 0. Let λ_1 > 0 be a root of equation (B.66) such that g′(λ_1) < 0. Then inequality (B.72) holds. If λ_1 < 0 is a root of equation (B.66) such that g′(λ_1) < 0, then the inequalities in (B.72) are reversed.
Bibliography
[AnB89] [And69]
Anderson, J. B. and Balachandran, K. (1989), Decision depths of convolutional codes. IEEE Trans. Inform. Theory, IT-35:455-459. Anderson, J. B. (1969), Instrumentable tree encoding of information sources. M.Sc. Thesis, School of Electrical Engineering, Cornell University, Ithaca, N.Y., Sept. 1969.
Anderson, J. B. (1989), Limited search trellis decoding of convolutional codes. IEEE Trans. Inform. Theory, IT-35:944-956. [And92] Anderson, J. B. (1992), Sequential decoding based on an error criterion. IEEE Trans. Inform. Theory, IT-38:987-1001. [AnM84] Anderson, J. B. and Mohan, S. (1984), Sequential decoding algorithms: A survey and cost analysis. IEEE Trans. Commun., COM-32:169-176. [AnM91 ] Anderson, J. B. and Mohan, S. (1991), Source and Channel Coding An Algorithmic Approach. Kluwer Academic Publishers, Boston. Bahl, L. R. and Jelinek, F. (1971), Rate 1/2 convolutional codes with complemen[BaJ71] tary generators. IEEE Trans. Inform. Theory, IT-17:718-727. Bahl, L. R. and Jelinek, F. (1972), On the structure of rate 1 /n convolutional codes. [BaJ72] IEEE Trans. Inform. Theory, IT-18:192-196. [BaP66] Baum, L. E. and Petrie, T. (1966), Statistical inference for probabilistic functions of finite-state Markov chains. Ann. Math. Stat., 37:1554-1563. [BCJ74] Bahl, L., Cocke, J., Jelinek, F., and Raviv, J. (1974), Optimal decoding of linear codes for minimizing symbol error rate. IEEE Trans. Inform. Theory, IT-20:284[And89]
287. [BeG96] [BeH89]
[BeH93]
Berrou, C. and Glavieux, A. (1996), Near-optimum error-correcting coding and decoding: Turbo codes. IEEE Trans. Commun., COM-44:1261-1271. Begin, G. and Haccoun, D. (1989), High-rate punctured convolutional codes: Structural properties and construction technique. IEEE Trans. Commun., COM37:1381-1385. Belzile, J. and Haccoun, D. (1993), Bidirectional breath-first algorithms for the decoding of convolutional codes. IEEE Trans. Commun., COM-41:370-380. 407
[Ber80] [Ber82]
[BGT93]
[Bha43]
[BHP90]
[BiM53] [B1a79]
[Bla90]
Berlekamp, E. R. (1980), The technology of error-correcting codes. Proc. IEEE, 68:564-593. Berlekamp, E. R. (1982), Bit-serial Reed-Solomon encoders. IEEE Trans. Inform. Theory, IT-28:869-874. Berrou, C., Glavieux, A., and Thitimajshima, P. (1993), Near Shannon limit errorcorrecting coding and decoding: Turbo codes (1). Proc. ICC '93, 1064-1070, Geneva, Switzerland. Bhattacharyya, A. (1943), On a measure of divergence between two statistical populations defined by their probability distributions. Bull Calcutta Math. Soc., 35:99-110. Begin, G., Haccoun, D., and Paquin, C. (1990), Further results on high-rate punctured convolutional codes for Viterbi and sequential decoding. IEEE Trans. Commun., COM-38:1922-1928. Birkhoff, G. and MacLane, S. (1953), A Survey of Modern Algebra, rev. ed.. Macmillan, New York. Blahut, R. E. (1979), Transform techniques for error-control codes. IBM J. Res. Devel., 23:299-315. Blahut, R. E. (1990), Digital Transmission of Information. Addison-Wesley, New York.
[B1a91 ]
[B1a92]
[BMc74]
[BrV93]
[BuH70] [Bus65]
[CCG79]
Blahut, R. E. (1991), Algebraic Methods for Signal Processing and Communication Coding. Springer-Verlag, New York. Blahut, R. E. (1992), Presentation at Workshop on Information Theory, Mathematisches Forschungsinstitut, Oberwolfach, Germany, Apr. 5-11, 1992. Butman, S. A. and McEliece, R. J. (1974), The ultimate limits of binary coding for a wideband Gaussian channel. DSN Progress Report 42-22, vol. May-June 1974, Jet Propulsion Laboratory, Pasadena, California, 78-80. Brouwer, A. G. and Verhoeff, T. (1993), An updated table of minimum distance bounds for linear codes. IEEE Trans. Inform. Theory, IT-39:662-677.
Bucher, E. A. and Heller, J. A. (1970), Error probability bounds for systematic convolutional codes. IEEE Trans. Inform. Theory, IT-16:219-224. Bussgang, J. J. (1965), Some properties of binary convolutional code generators. IEEE Trans. Inform. Theory, IT-11:90-100. Cain, J. B., Clark, G. C., Jr., and Geist, J. M. (1979), Punctured codes of rate (n - 1)/n and simplified maximum-likelihood decoding. IEEE Trans. Inform. Theory, IT-25:97-100.
[CeJ89]
Cedervall, M. and Johannesson, R. (1989), A fast algorithm for computing distance spectrum of convolutional codes. IEEE Trans. Inform. Theory, IT-35:1146-1159.
[CFV97]
Calderbank, A. R., Forney, G. D., Jr., and Vardy, A. (1997), Minimal tail-biting trellises: the Golay code and more. IEEE Trans. Inform. Theory, forthcoming. Chevillat, P. R. and Costello, D. J., Jr. (1976), Distance and computing in sequential decoding. IEEE Trans. Commun., COM-24:440-447. Chevillat, P. R. and Costello, D. J., Jr. (1977), A multiple stack algorithm for erasure-free decoding of convolutional codes. IEEE Trans. Commun., COM25:1460-1470. Chevillat, P. R. and Costello, D. J., Jr. (1978), An analysis of sequential decoding for specific time-invariant convolutional codes. IEEE Trans. Inform. Theory, IT24:443-451.
[ChC76]
[ChC77]
[ChC78]
[Che76]
Chevillat, P. R. (1976), Fast sequential decoding and a new complete decoding algorithm. Ph.D. Thesis, IIT, Chicago, Ill.
[ChH66]
Chang, R. W. and Hancock, J. C. (1966), On receiver structures for channels having memory. IEEE Trans. Inform. Theory, IT-12:463-468.
[CJZ84a]
Cedervall, M., Johannesson, R., and Zigangirov, K. Sh. (1984), A new upper bound on the first-event error probability for maximum-likelihood decoding of
fixed binary convolutional codes. IEEE Trans. Inform. Theory, IT-30:762-766. [CJZ84b] Cedervall, M., Johannesson, R., and Zigangirov, K. Sh. (1984), Creeper-an easily implementable algorithm for sequential decoding. Proc. Sixth Int. Symp. Inform. Theory, Tashkent, USSR. [C1C83] Clark, G. C. and Cain, J. B. (1983), Error-Correcting Coding for Digital Communications. Plenum Press, New York. [Co192] Collins, O. M. (1992), The subtleties and intricacies of building a constraint length 15 convolutional decoder. IEEE Trans. Commun., COM-40:1810-1819. Costello, D. J., Jr. (1969), A construction technique for random-error-correcting [Cos69] convolutional codes. IEEE Trans. Inform. Theory, IT-19:631-636. Costello, D. J., Jr. (1974), Free distance bounds for convolutional codes. IEEE [Cos74] Trans. Inform. Theory, IT-20:356-365. [CoS88] Conway, J. H. and Sloane, N. J. A. (1988), Sphere Packings, Lattices and Groups. Springer-Verlag, New York. [CSZ92]
Chepyzhov, V. V., Smeets, B. J. M., and Zigangirov, K. Sh. (1992), The free distance of fixed convolutional rate 2/4 codes meets the Costello bound. IEEE
Trans. Inform. Theory, IT-38:1360-1366. Dholakia, A. (1994), Introduction to Convolutional Codes. Kluwer, Boston. [DMW82] Daut, D. G., Modestino, J. W., and Wismer, L. D. (1982), New short constraint length convolutional code constructions for selected rational rates. IEEE Trans. Inform. Theory, IT-28:793-799. [DSZ94] Dettmar, U., Sorger, U. K., and Zyablov, V. V. (1994), Concatenated codes with convolutional inner and outer codes. Proc. Workshop on Inform. Protection, Moscow, Russia, Dec. 1993, 80-87. Also in Error Control, Cryptology, and Speech Compression (Eds. A. Chmora and S. B. Wicker), Lecture Notes in Computer Science 829. Springer-Verlag, Berlin. [E1i54] Elias, P. (1954), Error-free coding. IRE Trans. Inform. Theory, PGIT-4:29-37. Also in Key Papers in the Development of Coding Theory (Ed. E. R. Berlekamp) (1974). IEEE Press, New York. [Eli55] Elias, P. (1955), Coding for noisy channels. IRE Cony. Rec., pt. 4, 37-46. Also in Key Papers in the Development of Coding Theory (Ed. E. R. Berlekamp) (1974). IEEE Press, New York. [EnZ98] Engdahl, K. and Zigangirov, K. Sh. (1998), On the statistical theory of turbo-codes. Proceedings of the Sixth International Workshop on Algebraic and Combinatoric Coding Theory, Pskov, Russia, Sept. 6-12, 1998. [Fal66] Falconer, D. D. (1966), A hybrid sequential and algebraic decoding scheme. Ph.D. Thesis, Dept. of E. E., MIT, Cambridge, Mass. [Fan6l] Fano, R. M. (1961), Transmission of Information. MIT Press, Cambridge, Mass. and Wiley, New York. [Fan63] Fano, R. M. (1963), A heuristic discussion of probabilistic decoding. IEEE Trans. Inform. Theory, IT-9:64-74. [Dho94]
[Fe168]
[FeM91] [FGL84]
[FJW96]
[For66] [For67]
[For7Oa]
[For70b] [For71] [For73a] [For73b] [For74a]
[For74b] [For75] [For88a] [For88b]
[For9la]
[For9lb] [For94]
Feller, W. (1968), An Introduction to Probability Theory and Its Applications, 3rd ed.. Wiley, New York. Fettweis, G. and Meyr, H. (1991), High-speed parallel Viterbi decoding: Algorithms and VLSI-architecture. IEEE Commun. Mag., 46-55. Forney, G. D., Jr., Gallager, R. G., Lang, G. R., Longstaff, F. M., and Qureshi, S. U. (1984), Efficient modulation for band-limited channels. IEEE J. Select. Areas Commun., SAC-2:632-647. Forney, G. D., Jr., Johannesson, R., and Wan, Z.-X. (1996), Minimal and canonical rational generator matrices for convolutional codes. IEEE Trans. Inform. Theory, IT-42:1865-1880. Forney, G. D., Jr. (1966), Concatenated Codes. MIT Press, Cambridge, Mass. Forney, G. D., Jr. (1967), Review of random tree codes (NASA Ames. Res. Cen., Contract NAS2-3637, NASA CR 73176, Final Rep.;Appx A). See also Forney, G. D., Jr. (1974), Convolutional codes II: Maximum-likelihood decoding and convolutional codes III: Sequential decoding. Inform Contr., 25:222-297. Forney, G. D., Jr. (1970), Convolutional codes I: Algebraic structure. IEEE Trans. Inform. Theory, IT-16:720-738. Forney, G. D., Jr. (1970), Use of a sequential decoder to analyse convolutional codes structure. IEEE Trans. Inform. Theory, IT- 16:793-795. Forney, G. D., Jr. (1971), Correction to "Convolutional Codes I: Algebraic Structure." IEEE Trans. Inform. Theory, IT-17:360. Forney, G. D., Jr. (1973), Structural analyses of convolutional codes via dual codes. IEEE Trans. Inform. Theory, IT-19:512-518. Forney, G. D., Jr. (1973), The Viterbi algorithm. Proc. IEEE, 61:268-278. Forney, G. D., Jr. (1974), Convolutional codes II: Maximum-likelihood decoding and convolutional codes. Inform. Contr., 25:222-266. Forney, G. D., Jr. (1974), Convolutional codes III: Sequential decoding. Inform. Contr., 25:266-297. Forney, G. D., Jr. (1975), Minimal bases of rational vector spaces, with applications to multivariable systems. SIAM J. Control, 13:493-520. Forney, G. D., Jr. (1988), Coset codes-Part I: Introduction and geometrical classification. IEEE Trans. Inform. Theory, IT-34:1123-1151. Forney, G. D., Jr. (1988), Coset codes-Part II: Binary lattices and related codes. IEEE Trans. Inform. Theory, IT-34:1152-1187. Forney, G. D., Jr. (1991), Algebraic structure of convolutional codes, and algebraic system theory. In Mathematical System Theory (Ed. A. C. Antoulas). SpringerVerlag, Berlin, 527-558. Forney, G. D., Jr. (1991), Geometrically uniform codes. IEEE Trans. Inform. Theory, IT-37:1241-1260. Forney, G. D., Jr. (1994), Trellises old and new, 115-128. In Communications and Cryptography-Two Sides of the One Tapestry (Eds. R. E. Blahut et al.). Published in honor of James L. Massey on the occasion of his 60th birthday. Kluwer, Boston, 1994.
[For97] [FoT93]
Forney, G. D., Jr. (1997), On iterative decoding and the two-way algorithm. Proc. Int. Symp. on Turbo Codes & Related Topics, Brest, France. Forney, G. D., Jr. and Trott, M. D. (1993), The dynamics of group codes: State
spaces, trellis diagrams, and canonical encoders. IEEE Trans. Inform. Theory, IT-39:1491-1513.
Forney, G. D., Jr. and Wei, L.-F. (1989), Multidimensional constellations-Part I: Introduction, figures of merit, and generalized cross constellations. IEEE Journ. Select. Areas in Com. JSAC-7:877-892. [FWH91] Ferreira, H. C., Wright, D. A., Helberg, A. S. J., Shaw, I. S., and Wyman, C. R. (1991), Some new rate R = k/n (2 < k < n - 2) systematic convolutional codes with good distance profiles. IEEE Trans. Inform. Theory, IT-37:649-653. Gallager, R. G. (1962), Low-density parity-check codes. IRE Trans. Inform. The[Ga162] ory, IT-8:21-28. Gallager, R. G. (1963), Low-Density Parity-Check Codes. MIT Press, Cambridge, [Ga163] Mass. Gallager, R. G. (1965), A simple derivation of the coding theorem and some ap[Ga165] plications. IEEE Trans. Inform. Theory, IT-11:3-18. [Ga168] Gallager, R. G. (1968), Information Theory and Reliable Communication. Wiley, [FoW89]
New York. [Gei71 ]
[Gei73] [Go149]
[Gol67]
[Gri60] [HaB89] [Hac66] [HaF75] [Hag77] [Hag88] [HaH89] [HaH70] [Ham50] [Has93]
[HeC77] [HeC80] [HeJ71 ]
Geist, J. M. (1971), An empirical comparison of two sequential decoding algorithms. IEEE Trans. Commun., COM-19:415-419. Geist, J. M. (1973), Some properties of sequential decoding algorithms. IEEE Trans. Inform. Theory, IT-19:519-526. Golay, M. J. E. (1949), Notes on digital coding. Proc. I. R. E., 37:657. Golomb, S. W. (1967), Shift Register Sequences, Holden-Day, San Francisco, 1967. Revised ed., Aegean Park Press, Laguna Hills, Calif., 1982. Griesmer, J. H. (1960), A bound for error-correcting codes. IBM J. Res. Develop., 4:532-542. Haccoun, D. and Begin, G. (1989), High-rate punctured convolutional codes for Viterbi and sequential decoding. IEEE Trans. Commun., COM-37:1113-1125.
Haccoun, D. (1966), Simulated communication with sequential decoding and phase estimation. S. M. Thesis, Dept. of E. E., MIT, Cambridge, Mass. Haccoun, D. and Ferguson, M. J. (1975), Generalized stack algorithms for decoding convolutional codes. IEEE Trans. Inform. Theory, IT-21:638-651. Hagenauer, J. (1977), High rate convolutional codes with good distance profiles. IEEE Trans. Inform. Theory, IT-23:615-618. Hagenauer, J. (1988), Rate-compatible punctured convolutional codes (rcpc codes) and their applications. IEEE Trans. Commun., COM-36:389-400. Hagenauer, J. and Hoeher, P. (1989), A Viterbi algorithm with soft-decision outputs and its applications. Proc. GLOBECOM '89, 47:1-7, Dallas, TX. Hartley, B. and Hawkes, T. O. (1970), Rings, Modules and Linear Algebra. Chapman and Hall, London. Hamming, R. W. (1950), Error-detecting and error-correcting codes. Bell Sys. Techn. J., 29:147-160. Hashimoto, T. (1993), A coded ARQ scheme with the generalized Viterbi algoHemmati, F. and Costello, D. J., Jr. (1977), Truncation error probability in Viterbi decoding. IEEE Trans. Commun., COM-25:530-532. Hemmati, F. and Costello, D. J., Jr. (1980), Asymptotically catastrophic convolutional codes. IEEE Trans. Inform. Theory, IT-26:298-304.
Heller, J. A. and Jacobs, I. M. (1971), Viterbi decoding for satellite and space communications. IEEE Trans. Commun., COM-19:835-848.
[He167] [He168]
Heller, J. A. (1967), Sequential decoding for channels with time varying phase. Sc.D. Thesis, Dept. of E. E., MIT, Cambridge, Mass. Heller, J. A. (1968), Short constraint length convolutional codes. Jet Propulsion Lab., California Inst. Technol., Pasadena, Space Programs Summary 37-54,3:171177.
[HJS98]
Host, S., Johannesson, R., Sidorenko, V. R., Zigangirov, K. Sh., and Zyablov, V. V. (1998), Cascaded convolutional codes. In Communications and Coding (Eds. M. Darnell and B. Honary), 10-29. Published in honor of Paddy G. Farrell on the occasion of his 60th birthday. Research Studies Press Ltd. and John Wiley & Sons, 1998.
[HJZ95]
[HJZ99]
Host, S., Johannesson, R., and Zyablov, V. V. (1995), On the construction of concatenated codes based on binary conventional convolutional codes. Proc. Seventh Joint Swedish-Russian Int. Workshop on Inform. Theory, St. Petersburg, Russia, 114-118. Host, S., Johannesson, R., Zigangirov, K. Sh., and Zyablov, V. V. (1999), Active distances for convolutional codes. IEEE Trans. Inform. Theory, 45:March.
[HOP96]
Hagenauer, J., Offer, E., and Papke, L. (1996), Iterative decoding of binary block and convolutional codes. IEEE Trans. Inform. Theory, IT-42:429-445.
[HSS90]
Hagenauer, J., Seshadri, N., and Sundberg, C. W. (1990), The performance of ratecompatible punctured convolutional codes for digital mobile radio. IEEE Trans. Commun., COM-38:966-980.
[HuW76] Huth, G. K. and Weber, C. L. (1976), Minimum weight convolutional codewords of finite length. IEEE Trans. Inform. Theory, IT-22:243-246.
Jacobs, I. M. and Berlekamp, E. R. (1967), A lower bound to the distribution of computation for sequential decoding. IEEE Trans. Inform. Theory, IT- 13:167-174. [Jac67] Jacobs, I. M. (1967), Sequential decoding for efficient communication from deep space. IEEE Trans. Commun., COM- 15:492-501. Jacobs, I. M. (1974), Practical applications of coding. IEEE Trans. Inform. Theory, [Jac74] IT-20:305-310. [Jac85] Jacobson, N. (1985), Basic Algebra 1, 2nd ed.. Freeman, New York. [Jac89] Jacobson, N. (1989), Basic Algebra II, 2nd ed.. Freeman, New York. [Je169] Jelinek, F. (1969), A fast sequential decoding algorithm using a stack. IBM J. Res. Dev., 13:675-685. [JiZ97] Jimenez, A. and Zigangirov, K. Sh. (1997), Periodically time-varying convolutional codes with low-density parity-check matrices. Submitted to IEEE Trans. Inform. Theory, June 1997. [Joh75] Johannesson, R. (1975), Robustly optimal rate one-half binary convolutional codes. IEEE Trans. Inform. Theory, IT-21:464-468. [Joh76] Johannesson, R. (1976), Some long rate one-half binary convolutional codes with an optimum distance profile. IEEE Trans. Inform. Theory, IT-22:629-631. [Joh77a] Johannesson, R. (1977), Some rate 1/3 and 1/4 binary convolutional codes with an optimum distance profile. IEEE Trans. Inform. Theory, IT-23:281-283. [Joh77b] Johannesson, R. (1977), On the error probability of general trellis codes with applications to sequential decoding. IEEE Trans. Inform. Theory, IT-23:609-611. [Joh79] Johannesson, R. (1979), On the distribution of computation for sequential decoding using the stack algorithm. IEEE Trans. Inform. Theory, IT-25:323-331. [JaB67]
[JoP78]
Johannesson, R. and Paaske, E. (1978), Further results on binary convolutional codes with an optimum distance profile. IEEE Trans. Inform. Theory, IT-24:264268.
[JoS97]
[JoW91] [JoW93] [JoW94]
Johannesson, R. and Stahl, P. (1997), New rate 1/2, 1/3, and 1/4 binary convolutional encoders with an optimum distance profile. Submitted to IEEE Trans. Inform. Theory, Oct. 1997. Johannesson, R. and Wan, Z.-X. (1991), Submodules of F[X]" and convolutional codes. Proc. First China-Japan Int. Symp. on Ring Theory, Guilin, China. Johannesson, R. and Wan, Z.-X. (1993), A linear algebra approach to minimal convolutional encoders. IEEE Trans. Inform. Theory, IT-39:1219-1233. Johannesson, R. and Wan, Z.-X. (1994), On canonical encoding matrices and the generalized constraint lengths of convolutional codes. In Communications and Cryptography-Two Sides of the One Tapestry (Eds. R. E. Blahut et al.), 187-200. Published in honor of James L. Massey on the occasion of his 60th birthday. Kluwer Academic Publisher, Boston, 1994.
[JoW98]
[JoZ85]
[JoZ89]
[JoZ92] [JoZ96]
[JSW98]
Johannesson, R. and Wittenmark, E. (1998), Two 16-state, rate R = 2/4 trellis codes whose free distances meet the Heller bound. IEEE Trans. Inform. Theory, 44:1602-1604. Johannesson, R. and Zigangirov, K. Sh. (1985), On the distribution of the number of computations in any finite number of subtrees for the stack algorithm. IEEE Trans. Inform. Theory, IT-31:100-102. Johannesson, R. and Zigangirov, K. Sh. (1989), Distances and distance bounds for convolutional codes. In Topics in Coding Theory-In honour of Lars H. Zetterberg, Einarsson, G. et al.. Springer-Verlag, Berlin, 109-136. Johannesson, R. and Zigangirov, K. Sh. (1992), A trellis coding scheme based on signal alphabet splitting. Probl. Peredachi Inform., 4:14-23. Johannesson, R. and Zigangirov, K. Sh. (1996), Towards a theory for list decoding of convolutional codes. Probl. Peredachi Inform., 1. Johannesson, R., Stahl, P., and Wittenmark, E. (1998), A note on Type II convolutional codes. Submitted to IEEE Trans. Inform. Theory.
[JSZ88]
Johannesson, R., Sidorenko, V., and Zigangirov, K. Sh. (1988), On sequential decoding for the Gilbert channel. IEEE Trans. Inform. Theory, IT-34:1058-1061.
[JTZ88]
Justesen, J., Thommesen, C., and Zyablov, V. V. (1988), Concatenated codes with convolutional inner codes. IEEE Trans. Inform. Theory, IT-34:1217-1225. Justesen, J. (1973), New convolutional code constructions and a class of asymptotically good time-varying codes. IEEE Trans. Inform. Theory, IT-19:220-225.
[Jus73]
Justesen, J. (1975), An algebraic construction of rate 1/v convolutional codes. IEEE Trans. Inform. Theory, IT-21:577-580. [JWW98] Johannesson, R., Wan, Z.-X., and Wittenmark, E. (1998), Some structural properties of convolutional codes over rings. IEEE Trans. Inform. Theory, 44:839-845. Johannesson, R., Zigangirov, K. Sh., and Zyablov, V. V. (1995), Lower bounds [JZZ95] on the free distance for random concatenated convolutional codes. Proc. Seventh Joint Swedish-Russian Int. Workshop on Inform. Theory, St. Petersburg, Russia, [Jus75]
[KFA69]
[Knu73]
133-136. Kalman, R. E., Falb, P. L., and Arbib, M. A. (1969), Topics in Mathematical System Theory. McGraw-Hill, New York. Knuth, D. E. (1973), The Art of Computer Programming, Vol. 3, Searching and Sorting. Addison-Wesley, Reading, Mass.
Kudryashov, B. D. and Zakharova, T. G. (1989), Block codes from convolutional codes. Prob. Peredachi Inform., 25:98-102. [LaM70] Layland, J. and McEliece, R. (1970), An upper bound on the free distance of a tree code. Jet Propulsion Lab., California Inst. Technol., Pasadena, Space Programs Summary 37-62, 3:63-64. [Lar73] Larsen, K. J. (1973), Short convolutional codes with maximum free distance for rates 1/2, 1/3, and 1/4. IEEE Trans. Inform. Theory, IT-19:371-372. [Lau79] Lauer, G. S. (1979), Some optimal partitial-unit-memory codes. IEEE Trans. Inform. Theory, IT-25:240-243. [Lee74] Lee, L.-N. (1974), Real-time minimal-bit-error probability decoding of convolutional codes. IEEE Trans. Commun., COM-22:146-151. [Lee76a] Lee, L.-N. (1976), Short unit-memory byte-oriented binary convolutional codes having maximal free distance. IEEE Trans. Inform. Theory, IT-22:349-352. [Lee76b] Lee, L.-N. (1976), On optimal soft-decision demodulation. IEEE Trans. Inform. Theory, IT-22:437-444. [Lee78] Lee, S. C. (1978), Modern Switching Theory and Digital Design. Prentice-Hall, Englewood Cliffs, N.J. [LFM94] Loeliger, H.-A., Forney, G. D., Jr., Mittelholzer, T., and Trott, M. D. (1994), Minimality and observability of group systems. Linear Algebra and Its Applications, 205-206:937-963. [LiC83] Lin, S. and Costello, D. J., Jr. (1983), Error Control Coding: Fundamentals and Applications, Prentice Hall, Englewood Cliffs, N.J. [Lin86] Lin, C.-F. (1986), A truncated Viterbi algorithm approach to trellis codes. Ph.D. Thesis, ECSE Dept., Rensselaer Poly. Inst., Troy, N.Y., Sept. 1986. [LoM96] Loeliger, H.-A. and Mittelholzer, T. (1996), Convolutional codes over groups. IEEE Trans. Inform. Theory, 42:1660-1686. [MaC71] Massey, J. L. and Costello, D. J., Jr. (1971), Nonsystematic convolutional codes for sequential decoding in space applications. IEEE Trans. Commun. Technol., COM-19:806-813. [Mas63] Massey, J. L. (1963), Threshold Decoding. MIT Press, Cambridge, Mass. [MaS67] Massey, J. L. and Sain, M. K. (1967), Codes, automata, and continuous systems: Explicit interconnections. IEEE Trans. Automatic Control, AC-12:644-650. [MaS68] Massey, J. L. and Sain, M. K. (1968), Inverses of linear sequential circuits. IEEE Trans. Comput., C-17:330-337. [Mas72] Massey, J. L. (1972), Variable-length codes and the Fano metric. IEEE Trans. Inform. Theory, IT-18:196-198. [Mas74] Massey, J. L. (1974), Coding and modulation in digital communications. Proc. Int. Zurich Seminar on Digital Communications, E2(1)-E2(4). [Mas75] Massey, J. L. (1975), Error bounds for tree codes, trellis codes, and convolutional codes with encoding and decoding procedures. In Coding and Complexity-CISM Courses and Lectures No. 216 (Ed. Longo, G.). Springer-Verlag, Vienna. [MaS77] MacWilliams, F. J. and Sloane, N. J. A. (1977), The Theory of Error-Correcting Codes. North-Holland, Amsterdam. [Mas82] Massey, J. L. (1982), What is a bit of information? Scienta Electric, Band 28, Heft 1:1-11. [Mas84] Massey, J. L. (1984), The how and why of channel coding. Proc. 1984 Int. Zurich Seminar on Digital Communications, Zurich, 67-73. [KuZ89]
[Mas85]
Massey, J. L. (1985), Coding theory. Handbook of Applicable Mathematics (Ed. W. Ledermann), Vol. V, Part B, Combinatorics and Geometry (Eds. W. Ledermann and S. Vajda). Wiley, Chichester & New York. [MaW86] Ma, J. H. and Wolf, J. K. (1986), On tail-biting convolutional codes. IEEE Trans. Commun., 34:104-111. [MaZ60] Mason, S. and Zimmermann, H. (1960), Electronic Circuits, Signals, and Systems. [McE77] [McE98]
Wiley, New York. McEliece, R. J. (1977), The Theory of Information and Coding. Addison-Wesley, Reading, Mass. McEliece, R. J. (1998), The algebraic theory of convolutional codes. Handbook of Coding Theory (Eds. V. Pless and W. C. Huffman), Vol. 1. Elsevier Science, New York.
Van de Meeberg, L. (1974), A tightened upper bound on the error probability of binary convolutional codes with Viterbi decoding. IEEE Trans. Inform. Theory, IT-20:389-391. [MoA84] Mohan, S. and Anderson, J. B. (1984), Sequential coding algorithms: A survey and cost analysis. IEEE Trans. Commun. Technol., COM-32:169-176. [Mon70] Monna, A. F. (1970), Analyse Non-Archimedienne. Springer-Verlag, Berlin. [MSG72] Massey, J. L., Sain, M. K., and Geist, J. M. (1972), Certain infinite Markov chains and sequential decoding. Discrete Mathematics, 3:163-175. [NJZ97] Nystrom, J., Johannesson, R., and Zigangirov, K. Sh. (1997), Creeper-an algorithm for sequential decoding. Submitted to IEEE Trans. Inform. Theory. [Nys93] Nystrom, J. (1993), Creeper-An Algorithm for Sequential Decoding. Ph.D. Thesis, Dept. of Inform. Theory, Lund University, Lund, Sweden. [OAJ98] Osthoff, H., Anderson, J. B., Johannesson, R., and Lin, C.-F. (1998), Systematic feed-forward convolutional encoders are better than other encoders with an Malgorithm decoder. IEEE Trans. Inform. Theory, 44:831-838. [Ode70] Odenwalder, J. P. (1970), Optimal decoding of convolutional codes. Ph.D. Thesis, UCLA, Los Angeles. [O1s70] Olson, R. R. (1970), Note on feedforward inverses of linear sequential circuits. IEEE Trans. Comput., C-19:1216-1221. [Omu69] Omura, J. K. (1969), On the Viterbi decoding algorithm. IEEE Trans. Inform. Theory, IT-15:177-179. [Ost93] Osthoff, H. (1993), Reduced complexity decoding with systematic encoders. Ph.D. Thesis, Dept. Inform. Theory, Lund University, Lund, Sweden. [Paa74] Paaske, E. (1974), Short binary convolutional codes with maximum free distance for rates 2/3 and 3/4. IEEE Trans. Inform. Theory, IT-20:683-689. [Pa193] Palazzo, R., Jr. (1993), A time-varying convolutional encoder better than the best time-invariant encoder. IEEE Trans. Inform. Theory, IT-39:1109-1110. [Pin67] Pinsker, M. S. (1967), Bounds for error probability and for number of correctable errors for nonblock codes. Probl. Peredachi Inform., 3:58-71. [Pir88] Piret, P. (1988), Convolutional Codes: An Algebraic Approach. MIT Press, Cambridge, MA. [Pos77] Post, K. A. (1977), Explicit evaluation of Viterbi's union bounds on convolutional code performance for the binary symmetric channel. IEEE Trans. Inform. Theory, IT-23:403-404. [Pro89] Proakis, J. G. (1989), Digital Communications. McGraw-Hill, New York. [Mee74]
[Roo79]
[SAJ97]
Roos, C. (1979), On the structure of convolutional and cyclic convolutional codes. IEEE Trans. Inform. Theory, IT-25:676-683. Stahl, P., Anderson, J. B., and Johannesson, R. (1997), Optimal and near-optimal
encoders for short and moderate-length tailbiting trellises. Submitted to IEEE [SaM69] [Sav66]
[Sch97] [SeS89]
Trans. Inform. Theory. Sain, M. K. and Massey, J. L. (1969), Invertability of linear time-invariant dynamical systems. IEEE Trans. Automatic Control, AC- 14:141-149.
Savage, J. E. (1966), Sequential decoding-the computation problem. Bell Syst. Tech. J., 45:149-176. Schlegel, C. (1997), Trellis Coding. IEEE Press, New York. Seshadri, N. and Sundberg, C.-E. W. (1989), Generalized Viterbi algorithms for error detection with convolutional codes. Proc. IEEE GLOBECOM Conf., 15341538.
[SGB67]
[Sha48]
[Sha93] [Sov79] [SPA79]
[Sti63]
[SVP78] [Tan81 ]
[ThJ83] [Tho83]
[Tro94]
[UnC76]
[Ung82] [Ung87a]
Shannon, C. E., Gallager, R. G., and Berlekamp, E. R. (1967), Lower bounds to error probability for coding on discrete memoryless channels. Inform. Contr., 10, Part I: 65-103 and Part II: 522-552. Shannon, C. E. (1948), A mathematical theory of communication. Bell Sys. Tech. J. 27:379-423 (Part I), 623-656 (Part II). Also reprinted in Key Papers in the Development of Information Theory (Ed. D. Slepian) (1974). IEEE Press, New York, 5-29. Shannon, C. E. (1993): Claude Elwood Shannon: Collected Papers (Eds. N. J. A. Sloane and A. D. Wyner). IEEE Press, New York. Solomon, G. and van Tilborg, H. C. A. (1979), A connection between block and convolutional codes. SIAM J. Appl. Math., 37:358-369.
Schalkwijk, J. P. M., Post, K. A., and Aarts, J. P. J. C. (1979), On a method of calculating the event error probability of convolutional codes with maximum likelihood decoding. IEEE Trans. Inform. Theory, IT-25:737-743. Stiglitz, I. G. (1963), Sequential decoding with feedback. Sc.D. Thesis, Dept. of E. E., MIT, Cambridge, Mass. Schalkwijk, J. P. M., Vinck, A. J., and Post, K. A. (1978), Syndrome decoding of binary rate k/n convolutional codes. IEEE Trans. Inform. Theory, IT-24:553-562. Tanner, R. M. (1981), A recursive approach to low-complexity codes. IEEE Trans. Inform. Theory, IT-27:533-547. Thommesen, C. and Justesen, J. (1983), Bounds on distances and error exponents of unit-memory codes. IEEE Trans. Inform. Theory, IT-29:637-649. Thompson, T. M. (1983), From Error-Correcting Codes Through Sphere Packings to Simple Groups, The Carus Mathematical Monographs No. 21, Mathematical Association of America. Trofimov, A. (1994), Soft output decoding algorithm for trellis codes. Submitted to IEEE Trans. Inform. Theory, Nov. 1994. Ungerboeck, G. and Csajka, I. (1976), On improving data-link performance by increasing channel alphabet and introducing sequence coding. Proc. IEEE Int. Symp. Inform. Theory, Ronneby, Sweden, 53. Ungerboeck, G. (1982), Channel coding with multilevel phase signals. IEEE Trans. Inform. Theory, IT-28:55-67. Ungerboeck, G. (1987), Trellis-coded modulation with redundant signal sets, part I: Introduction. IEEE Commun. Mag., 25:5-11.
[Ung87b] Ungerboeck, G. (1987), Trellis-coded modulation with redundant signal sets, part II: State of the art. IEEE Commun. Mag., 25:12-21. [Vi079] Viterbi, A. J. and Omura, J. K. (1979), Principles of Digital Communication and Coding. McGraw-Hill, New York. [Vit67] Viterbi, A. J. (1967), Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inform. Theory, IT-13:260-269. [Vit71 ] Viterbi, A. J. (1971), Convolutional codes and their performance in communication systems. IEEE Trans. Commun. Technol., COM-19:751-772. Viterbi, A. J. (1990), From proof to product. 1990 IEEE Commun. Theory Work[Vit90] shop, Ojai, CA. [VOJ88] Vinck, A. J., Osthoff, H., Johannesson, R., van der Vleuten, R., and Smeets, B. (1988), Linear complexity decoding for convolutional codes and intersymbol interference channels. Proc. IEEE Int. Symp. Inform. Theory, Kobe, Japan, 28. [VPS80] Vinck, A. J., de Paepe, A. J. P., and Schalkwijk, J. P. M. (1980), A class of binary rate one-half convolutional codes that allows an improved stack decoder. IEEE Trans. Inform. Theory, IT-26:389-392. Wald, A. (1947), Sequential Analysis. Wiley, New York. [Wa147] Wiberg, N. (1996), Codes and decoding on general graphs. Ph.D. Thesis, Dept. E. [Wib96] E., Linkoping University, Linkoping, Sweden. [Wil96] Wilson, S. G. (1996), Digital Modulation and Coding. Prentice-Hall, London. [WLK95] Wiberg, N., Loeliger, H.-A., and Kotter, R. (1995), Codes and iterative decoding on general graphs. Euro. Trans. Telecommun., 6:513-526. [WoJ65] Wozencraft, J. M. and Jacobs, I. M. (1965), Principles of Communication Engineering. Wiley, New York. [Woz57] Wozencraft, J. M. (1957), Sequential decoding for reliable communication. IRE Cony. Rec., Vol. 5, pt. 2, 11-25. See also Wozencraft, J. M. and Reiffen, B. (1961), Sequential Decoding. MIT Press, Cambridge, Mass. [YKH84] Yasuda, Y., Kashiki, K., and Hirata, Y. (1984), High-rate punctured convolutional codes for soft decision Viterbi decoding. IEEE Trans. Commun., COM-32:315319. [Yud64]
[ZiC89]
Yudkin, H. L. (1964), Channel state testing in information decoding. Sc.D. Thesis, Dept. of E. E., MIT, Cambridge, Mass. Zigangirov, K. Sh. and Chepyzhov, V. V. (1989), Study of tail biting convolutional
codes. Proc. 4th Joint Swedish-Soviet Int. Workshop Inform. Theory, Gotland, Sweden, 52-55. [ZiC91]
[Zig66] [Zig72] [Zig74] [Zig75]
Zigangirov, K. Sh. and Chepyzhov, V. V. (1991), On the existence of time-invariant
convolutional codes with transmission rates 2/c, c > 4 which meets the Costello bound. Probl. Peredachi Inform., 3. Zigangirov, K. Sh. (1966), Some sequential decoding procedures. Probl. Peredachi Inform., 2:13-25. Zigangirov, K. Sh. (1972), On the error probability of the sequential decoding in the BSC. IEEE Trans. Inform. Theory, IT-18:199-202. Zigangirov, K. Sh. (1974), Procedures of Sequential Decoding. Svjaz, Moscow. Zigangirov, K. Sh. (1975), Procedures of sequential decoding. In Coding and Complexity-CISM Courses and Lectures No. 216 (Ed. G. Longo). SpringerVerlag, Vienna.
[Zig85] [Zig86] [Zig98]
[ZiK80]
[Zi093]
Zigangirov, K. Sh. (1985), New upper bounds for decoding error probability for convolutional codes. Probl. Peredachi Inform., 21:20-31. Zigangirov, K. Sh. (1986), New asymptotic lower bound on the free distance for time-invariant convolutional codes. Probl. Peredachi Inform., 2:34-42. Zigangirov, K. Sh. (1998), APP decoding of convolutional codes. Submitted to Probl. Peredachi Inform.. Zigangirov, K. Sh. and Kolesnik, V. D. (1980), List decoding of trellis codes. Problems of Control and Information Theory, No. 9:347-364. Zigangirov, K. Sh. and Osthoff, H. (1993), Analysis of global-list decoding for convolutional codes. European Trans. Telecommunications, 4:165-173.
Index
A
Abstract state, 61 space, 61 Active burst distance, 121 see also Active distance Active column distance, 119, 137 see also Active distance Active distance active burst distance, 121 active column distance, 119, 137 lower bound, 147 normalized, 149 active reverse column distance, 120, 138 lower bound, 147 normalized, 149 active row distance, 118, 137 lower bound, 145 normalized, 148 active segment distance, 120, 138 lower bound, 148 normalized, 149 Active reverse column distance, 120, 138 see also Active distance Active row distance, 118, 137 see also Active distance Active segment distance, 120, 138 see also Active distance Additive white Gaussian noise channel (AWGN), 3 Anderson, J. B., 242, 243, 266, 267, 370 Antipodal signaling, 2
A posteriori probability (APP) decoding, 317 backward-forward algorithm, 344 backward metric, 324 forward metric, 324 one-way algorithm, 334-337 two-way algorithm tailbiting trellis, 330-334 terminated convolutional code, 321-329, 344 APP decoding, see A posteriori probability (APP) decoding Arbib, P. L., 44 Asymptotic coding gain, 181, 374 AWGN, see Additive white Gaussian noise channel B
Back-search limit, 220, 302, 310, 312, 334 Bahl, L., 344 Bandwidth efficiency, 371 Basis, 10 Baum, L. E., 344 Bayes' rule, 7, 270 BCH, see Block code (BCH) BEC, see Binary erasure channel Berlekamp, E. R., 22, 234, 313 Berrou, C., 344 Bhattacharyya, A., 170, 212 Bhattacharyya bound, 170, 212 Bhattacharyya parameter, 170, 197, 212, 222 Binary entropy function, 24, 205, 401 Binary erasure channel (BEC), 214 419
420
Index
Binary phase-shift keying (BPSK), 2 Binary symmetric channel (BSC), 3, 4 Bit, 1 Bit energy, 5, 176
Bit error probability, 4, 169, 170, 173, 175, 180 Bit error rate, 4 Blahut, R. E., 5 Block code, 6 Bose-Chaudhuri-Hocquenhem (BCH), 16 coset, 380 cyclic, 16 dual, 12 expurgated, 29 extended, 27, 29 Golay, 28, 228, 229 Hamming, 11 inner, 22 linear, 10 maximum-distance-separable (MDS), 13 minimum distance, 9 orthogonal, 12 outer, 22 perfect, 28 Reed-Solomon (RS), 13 repetition, 12, 28 self dual, 12 single-error-correcting, see Hamming code tailbiting, 224, 362, 369 Block coding exponent, 192 see also Exponent Block error probability, 7 Block length, 6 BPSK, see Binary phase-shift keying Branch metric, 181 Brouwer, A. G., 363 BSC, see Binary symmetric channel Burst error length, 186 Burst error probability, 169, 170, 178, 179, 184, 185
Bussgang, J. J., 103 Butman, S. A., 24 C
Cain, J. B., 236 Calderbank, A. R., 227 Capacity limit, 6, 23 Cascaded concatenated code, 149 Catastrophic error propagation, 351 Cedervall, M., 181, 234, 313, 370 Chang, R. W., 344 Channel capacity, 1 Channel coding, 1 theorem, 5 Chepyzhov, V. V., 143, 229 Chevillat, P. R., 113
Clark, G. C., Jr., 236 Cocke, J., 344 Code, see Block code; Convolutional code; Trellis code Code rate, 6 Code sequence, 31, 34 Code tree, 18 Codeword, 6 Codeword estimator, 45 Coding gain, 5 Coding theory, 1 Coherent demodulation, 2 Column distance, 109 active, 119, 137 active reverse, 120, 138 function, 113 lower bound, 147 normalized, 149 Column-interleaved matrices, 341 Complementary error function, 4 Complexity of decoding, 8, 220 Computational cutoff rate, 187, 213, 230 Computing distance spectrum algorithm, 347-350 stop condition, 348 Concatenated coding, 22, 149-153 Constraint length, see Convolutional generator matrix overall, see Convolutional generator matrix Controller canonical form, 32 Convolutional code, 16, 34 convolutional dual code, 93 definition, 34 distance profile, 111 dual, 93, 361 equivalent, 36, 97 free distance, 113 high-rate, 360-361 low-density parity-check, 337-344 homogeneous, 339 low-rate, 357-359 minimum distance, 111 optimum distance profile (ODP), 112 punctured, 236 redundancy, 380 time-invariant, 148 time-varying, 136 Convolutional coding exponent, 200, 202 lower bound, 209 see also Exponent Convolutional encoder, 17, 35 equivalent, 52 minimal, 72, 393 parallel concatenated, 317-318
parity-check matrix, 91 polynomial, 36 systematic, 36, 96 time-varying, 136 Convolutional encoding matrix, 38 basic, 52 catastrophic, 37 definition, 38 equivalent, 52 minimal-basic, 52 noncatastrophic, 37 ODP, 112 quick-look-in (QLI), 351, 353, 354 systematic, 97 systematic polynomial, 251 systematic polynomial periodically time-varying, 201 see also Convolutional generator matrix Convolutional generator matrix, 17, 35 canonical, 73 catastrophic, 37 constraint length, 55, 73 overall, 55, 73 equivalent, 52 memory, 55 minimal, 64 noncatastrophic, 37 nonsystematic, 36 polynomial, 47 rational, 46 realizable, 46 reciprocal, 120 systematic, 36, 97 see also Convolutional encoding matrix Convolutional scrambler, 339 delay, 339 homogeneous, 341 identity, 339 multiple, 340 rate, 341 size, 346 uniform, 340 Convolutional sphere-packing exponent, see Exponent Convolutional transducer, 34 Correct path, 123, 271 Correct path loss, 240 expurgation bound, 255 lower bound, 261 sphere-packing bound, 258 upper bound, 261 Viterbi-type, 264, 266 Correct state, 123 Coset, 13 Coset leader, 14
Costello, D. J., Jr., 103, 109, 111, 113, 114, 139, 142, 151, 158, 351 Costello bound, see Free distance cascaded concatenated code, 151 Creeper algorithm, 278-287 buds, 280 computational analysis, 310-312 current node, 278 error probability analysis, 312-313 examined node, 278 exchange operation, 281 node stack, 280 stem, 278 subtrees, 278 successors, 278 threshold, 280 threshold stack, 280 Critical length, 195 Critical rate, 189, 193 Crossover probability, 3 Csajka, J., 388 Cumulative metric, 181 Fano, see Fano metric Gallager, see Gallager symbol metric Viterbi, see Viterbi metric Zigangirov, see Zigangirov symbol metric D
Data transmission rate, 1 De Bruijn graph, 19 Decoder, 7 APP, see A posteriori probability (APP) decoding List, see List decoding MD, see Minimum distance decoding ML, see Maximum-likelihood (ML) decoding sequential, see Sequential decoding Viterbi, see Viterbi decoding Decoding error, 7 bit, see Bit error probability burst, see Burst error probability first event, see First event error probability word, see Word error probability Defect, 79 external, 85 internal, 84 Degree, 32 Deinterleaver, 22 Delayfree element, 31 Delayfree matrix, 35 Demodulation, 2 Determinantal divisor, 39 Discrete memoryless channel (DMC), 4, 211 output symmetric, 215
Distance profile, see Convolutional code optimum (ODP), see Convolutional code Distance spectrum, 347 DMC, see Discrete memoryless channel D-transform, 31 E
Elementary operations, 39 Elias, P., 25, 233, 266, 344 Encoder, 6 inverse, 45 right, 46 memory, 17 state, 19, 61, 347 state space, 61 Encoding matrix block codes, 10 convolutional codes, see Convolutional encoding matrix Encoding rule, linear, 10 Engdahl, K., 344 Ensemble of convolutional codes, 128 time-varying, 136, 211 Equivalent block code, 10, 16 Equivalent convolutional code, 36, 97 Equivalent convolutional encoder, 52 Equivalent convolutional encoding matrix, 52 Error amplification factor, 353 Error bound, convolutional code bit error, periodically time-varying, 203 lower bound, 210 systematic, 203 bit error, time-invariant van de Meeberg, 175 Viterbi, 175, 179 burst error, periodically time-varying, 200 lower bound, 207, 209 systematic, 201 burst error, time-invariant finite back-search limit, 220, 223 expurgation, 221 random coding, 221 sphere-packing, 222 tighter Viterbi-type, 184 van de Meeberg, 171, 185 Viterbi, 170, 178, see also 179 burst error, time-varying sequential decoding Creeper, 312, 313 Fano, 310 Stack, 301, 304 Viterbi decoding, 215, 220
burst length, periodically time-varying, 193 expurgation, 189 random coding, 187 sphere-packing, 192 Error burst length exponent, see Exponent Error control coding, 2 Error event, 186, 195 Error exponent, see Exponent Error pattern, 9 Estimated message, 7 Even-weight code, 27 Exponent block coding, 192 sphere-packing, 207 convolutional coding, 200 lower bound, 209 sphere-packing, 207 error burst length, 193 finite back-search limit, 223 Expurgation bound, 189, 221 see also Error bound Expurgation function, 197, 261 Expurgation rate, 187, 193 Extended signal flowchart, see Signal flowchart Extrinsic information, 319 F
Falb, P. L., 44 Falconer, D. D., 309, 313 Fano, R. M., 25, 233, 313 Fano algorithm, 269, 276-278 computational analysis, 305-309 error probability analysis, 309-310 stepsize, 276 threshold, 276 Fano metric, 271, 293, 301, 309, 312 branch, 271 symbol, 272 FAST, 349-350 Field binary, 8 binary Laurent series, 31 binary rational functions, 32 Finite back-search limit exponent, 223 First event error probability, 169 First memory length, 111 Forney, G. D., Jr., 25, 26, 58, 60, 69, 81, 84, 85, 103, 145, 168, 227, 234, 344, 384, 388 Forney's inverse concatenated construction, 194 Free distance, 21 Costello bound, 139 definition, 113 Griesmer bound, 133, 135, 359 Heller bound, 132, 135, 359
asymptotic, 133, 135, 208 squared Euclidean, 373 G
Gallager, R. G., 199, 233, 234, 344 Gallager function, 199, 236, 261, 401 Gallager symbol metric, 294, 301, 310, 312 Gaussian memoryless channel, 4 capacity, 4 Geist, J. M., 236 Generator matrix block, 10, 35 convolutional, see Convolutional generator matrix Generator submatrix, 17 Gilbert-Varshamov bound, see Minimum distance Gilbert-Varshamov parameter, 128, 132, 188, 189, 192, 193 Glavieux, A., 344 Globally orthogonal set, 80 Golay, M. J. E., 25, 28 Golay code, see Block code Golomb, S. W., 19 Griesmer, J. H., 133 Griesmer bound convolutional codes, see Free distance linear block codes, 133 H
Haccoun, D., 313 Hamming, R. W., 25 Hamming bound, see Minimum distance Hamming code, 11 Hamming distance, 8, 204 Hamming sphere, 28 Hamming weight, 8 Hancock, J. C., 344 Hard decisions, 3, 5 Hartley, B., 44 Hawkes, T. O., 44 Heller, J. A., 26, 132, 313 Heller bound, see Free distance Host, S., 159 I
Incorrect path, 271 Incorrect segment, 123 Incorrect sequences, 140 Information, 1 Information sequence, 31, 34 Information symbols, 10 Information theory, 1 Interleaver, 22, 339
Intrinsic information, 319 Invariant-factor, 39, 44 Invariant-factor decomposition, 44 Invariant-factor theorem, 87-89 Iterative decoding, 317-346
J Jacobs, I. M., 26, 313 Jacobson, N., 38, 75 Jelinek, F., 25, 313, 344 Jimenez, A., 340, 344 Johannesson, R., 103, 111, 158, 159, 181, 234, 267, 313, 370 Justesen, J., 159 K
Kalman, R. E., 44 Knuth, D. E., 242 Kolesnik, V. D., 261, 266 Kotter, R., 344 L
Lattice, 378 coset, 378
fundamental values, 381 orthogonal transformation, 379 partition, 378 scaling, 379 square, 378 sublattice, 378 Laurent polynomials, 32 Laurent series, 31 delay, 31 field, see Field Layland, J., 132 Lee, L.-N., 232 Lee, S. C., 72, 99, 393 Likelihood ratio for output letter, 232 on the unquantized channel, 232 Lin, C.-f., 266 Linkabit Corporation, 26 List decoding, 239-268 algorithm, 240 List minimum weight, 247, 248 List path weight enumerator, 264 List size, 239 List weight, 248, 249, 251, 253 Loeliger, H.-A., 34, 69, 103 M
Ma, J. H., 234 M-algorithm, 266
McEliece, R. J., 23, 24, 132 MacWilliams, F. J., 145 MAP, see Maximum a posteriori probability Markov inequality, 255 Mason, S., 154 Massey, J. L., 25, 103, 111, 230, 231, 233, 313, 351
Matched filter, 3 Maximum a posteriori probability (MAP) decoding, 7, 325 Maximum-likelihood (ML) decoding, 7 see also Viterbi decoding MD, see Minimum distance decoding MDS, see Block code (MDS) Memory, see Convolutional generator matrix Memoryless channel, 3 discrete (DMC), 4, 211 Mergeable node, 296 Message, 6 Metric, 8 see also Cumulative metric Minimum distance block code, 9 Brouwer-Verhoeff lower bound, 363,
368-369 Brouwer-Verhoeff upper bound, 363, 368-369 Gilbert-Varshamov bound, 145, 363-369 Hamming bound, 28 inter, 226 intra, 226 Plotkin bound, 132 Singleton bound, 13 tailbiting, 226 convolutional code, 111 Minimum distance (MD) decoding, 8, 165 Minimum weight, 12 Mittelholzer, T., 69, 103 ML, see Maximum-likelihood decoding see also Viterbi decoding Modulation, 2 Modulation code, 373 decoding, 387-389 Modulo-two arithmetic, 9 Mohan, S., 242 Moment generating function, 399 Monna, A. F., 80 Multiple-error event, 168 N
Node error probability, 169 Nonoptimality, principle, see Principle of nonoptimality Number of errors, 9 Nyquist rate, 23 Nystrom, J., 313
O Observer canonical form, 34 ODP, see Optimum distance profile OFD, see Optimum free distance encoder Olson, R. R., 103 Omura, J. K., 69 Optimum distance profile (ODP) code, 112 definition, 111, 112 encoder, 351-352, 354, 355, 356, 357, 358, 359, 360, 361
generator matrix, 111 Optimum free distance (OFD) encoder, 353 Osthoff, H., 267 P Paaske, E., 370 Parity check, 11 Parity-check matrix, 11 Parity-check symbols, 12 Path weight enumerator, 114, 153, 154 extended, 156 Petrie, T., 344 Phase-shift keying (PSK), 2 Pinsker, M. S., 234 Piret, P., 103 Plotkin bound, see Minimum distance, block code Polynomial, 32 degree, 32 Laurent, 32 ring, 32
p-orthogonal set, 80 Power series, 31 ring, 31 Predictable degree property, 59, 81 Predictable p-valuation property, 81 global, 81 p-residue matrix, 81 Principle of nonoptimality, 163 Product formula, 76 Pseudo inverse matrix, 51 PSK, see Phase-shift keying Pulse-amplitude-modulation, 381 Q
QAM, see Quadrature amplitude modulation Quadrature amplitude modulation (QAM), 371-372 in-phase component, 371 quadrature component, 371 Qualcomm Inc., 22, 26 Quantization, 230 optimum, 231
R
Random coding bound, see Error bound Random walk, 129, 399 absorbing barrier, 399 threshold, 402 tilted, 403 Rate distortion theory, 23 Rational function, 32 causal, 76 degree, 76 delay, 76 field, 32 finite, 76 Rational transfer function, 33 Rational transfer matrix, 33 Raviv, J., 344 Realizable function, 33 Received sequence, 7 Reciprocal generator matrix, 120 Redundancy, 7 Remaining weight, 347 Ring binary polynomials, 32 formal power series, 31 Roos, C., 71 Row-column-interleaved matrices, 342 Row distance, 114 active, 118, 137 lower bound, 145 normalized, 148 Row space, 10 RS, see Block code (RS)
S
Sain, M. K., 103 Savage, J. E., 309, 313 Scalar multiplication, 9 Sequential decoding, 22, 269-315 finite back-search limit, 300 Set partitioning, 374 Shannon, C. E., 1, 4, 5, 23, 25, 234 Shannon limit, 5, 344 Shannon's separation principle, 1 Signal average energy, 372 Signal constellation, 371 cross constellation, 383 PSK, 386-387 QAM, 374 Signal flowchart, 154 extended, 156 Signal point, 371 Signal set, 385 geometrically uniform, 385 regular array, 385 Signal-to-noise ratio, 1
Singleton bound, see Minimum distance Sloane, N. J. A., 145 Smeets, B. J. M., 143 Smith form, 38 Soft decisions, 4, 5 Solomon, G., 234 Source coding, 1 Span of Laurent series, 66 Spectral bit rate, 26 Spectral component, 114 Sphere-packing bound, 192, 205, 222, 258 Stack algorithm, 269, 274-275 average number of computations, 290-295 computational analysis, 289-295 error probability analysis, 296-305 error probability bound, 301, 302 lower bound for computational distribution, 295
upper bound for average number of computations, 293 upper bound for computational distribution, 294 Standard array, 14 State-transition diagram, 19, 153 State variable, 347 Stiglitz, I. G., 313 Stahl, P., 370 Symbol energy, 176 Syndrome, 14, 96 Syndrome decoder, 15 Syndrome former, 91 time-varying, 339
T
Tanner, R. M., 344 Thitimajshima, P., 344 Thommesen, C., 159 Thompson, T. M., 25 Transfer function delayfree, 35 matrix, 33, 34 rational, 33 realizable, 33 Transformation of a metric space, 384 isometry, 384 Transmission gain, 229 Tree code, 269 Trellis, 20, 26 circular, 225 tailbiting, 223 Trellis code, 20 fundamental coding gain, 382 geometrically uniform, 384-387 lattice type, 378-384, 380 shape gain, 382
Trofimov, A., 344 Two-way algorithm tailbiting trellis, 330-334 terminated convolutional code, 321-329, 344 see also A posteriori probability decoding U
Ungerboeck, G., 374, 388 V
Valuations, 75 exponential, 7 p-valuations, 75 van de Meeberg, 171, 175, 236 van de Meeberg bound, see Error bound, convolutional code van Tilborg, H. C. A., 234 Vardy, A., 227 Vector addition, 9 Vector space, 9 Verhoeff, T., 363 Viterbi, A. J., 21, 25, 26, 154, 165, 170, 178, 179, 234 Viterbi algorithm, 21, 165, 229 Viterbi bound, see Error bound, convolutional code Viterbi decoding, 387
Viterbi metric, 164 branch, 164 symbol, 164
W Wald, A., 404 Wald's equality, 405 Wald's identity, 182, 216, 290, 404-406 Wan, Z.-X., 103 Wei, L.-F., 384 Weight spectrum, 114 Wiberg, N., 344 Wilson, S. G., 181 Wolf, J. K., 234 Word error probability, 7 Wozencraft, J. M., 25, 313 Y
Yudkin, H. L., 234, 313
Z Zero state driving information sequence, 114 Zigangirov, K. Sh., 25, 143, 158, 159, 181, 223, 229, 234, 261, 266, 267, 294, 313, 340, 344
Zigangirov symbol metric, 302, 304, 310, 312 Zimmermann, H., 154 Zyablov, V., 159
About the Authors
Rolf Johannesson was born in Sweden in 1946. He received the M. S. and Ph.D. degrees in 1970 and 1975, respectively, both from Lund University, Sweden. From 1970 to 1976, Dr. Johannesson held various teaching and research positions in the Department of Telecommunication Theory at Lund University. During 1973 to 1974, he spent 14 months in the Department of Electrical Engineering, University of Notre Dame, IN, working on convolutional codes under the supervision of Professor James L. Massey. While at the University of Notre Dame, he was awarded the American-Scandinavian Fellowship for the period of August 1973 to June 1974. From 1976 to 1985, he was an associate professor of Computer Engineering at Lund University. In 1985, he received the Chair of Information Theory at Chalmers Institute of Technology, Gothenburg, Sweden, and soon thereafter he received the Chair of Information Theory at Lund University. During 1986 to 1996, he was Head of the Department of Information Theory, and since 1996 he has been Head of the newly formed Department of Information Technology, both at Lund University. During 1988 to 1993, he was Dean of the School of Electrical Engineering and Computer Sciences and, during 1990 to 1995, he was a member of the Swedish Research Council for Engineering Science. Dr. Johannesson was associate editor of the Journal of Information & Optimization Sciences from 1980 to 1993, and since 1998 he has been an associate editor of the International Journal of Electronics and Communications. His scientific interests include information theory, error-correcting codes, and cryptography. In addition to papers in the area of convolutional codes and cryptography, he has published two textbooks on switching theory and digital design (both in Swedish) and one on information theory (in both Swedish and German). He became a Fellow of the IEEE in 1998 for "contributions to the understanding and application of convolutional codes".
Kamil Sh. Zigangirov was born in the USSR in 1938. He received the M. S. degree in 1962 from the Moscow Physico-Technical Institute and the Ph.D. degree in 1966 from the Institute of Radio Engineering and Electronics of the USSR Academy of Sciences, Moscow. From 1965 to 1991, Dr. Zigangirov held various research positions at the Institute for Problems of Information Transmission of the USSR Academy of Sciences, Moscow, first as a Junior Scientist and later as a Main Scientist. During this period, he visited several times
as a guest researcher at universities in the U.S., Sweden, Italy, and Switzerland. He organized several symposia on information theory in the USSR. In 1994, Dr. Zigangirov received the Chair of Telecommunication Theory at Lund University. His scientific interests include information theory, coding theory, detection theory, and mathematical statistics. In addition to papers in these areas, he has published a book on sequential decoding of convolutional codes.
ELECTRICAL ENGINEERING
Fundamentals of Convolutional Coding
A volume in the IEEE Press Series on Digital and Mobile Communication
John B. Anderson, Series Editor
Convolutional codes, among the main error control codes, are routinely used in applications for mobile telephony, satellite communications, and voice-band modems. Written by two leading authorities in coding and information theory, this book brings you a clear and comprehensive discussion of the basic principles underlying convolutional coding. Fundamentals of Convolutional Coding is unmatched in the field for its accessible
analysis of the structural properties of convolutional encoders. Other essentials covered in Fundamentals of Convolutional Coding include:
Distance properties of convolutional codes
Viterbi, list, sequential, and iterative decoding
Modulation codes
Tables of good convolutional encoders
Plus an extensive set of homework problems
The authors draw on their own research and more than 20 years of teaching experience to present the fundamentals needed to understand the types of codes used in a variety of applications today. This book can be used as a textbook for graduate-level electrical engineering students. It will be of key interest to researchers and engineers in wireless and mobile communication, satellite communication, and data communication.
ABOUT THE AUTHORS Rolf Johannesson holds the Chair of Information Theory at Lund University, Sweden. He has written two textbooks on switching theory and digital design, as well as a textbook on information theory. Dr. Johannesson's research interests include information theory, error-correcting codes, and cryptography. He is a Fellow of the IEEE.
Kamil Sh. Zigangirov holds the Chair of Telecommunication Theory at Lund University, Sweden. He is widely published in the areas of information theory, coding theory, detection theory, and mathematical statistics. Dr. Zigangirov is the inventor of the stack algorithm for sequential decoding.
IEEE Press 445 Hoes Lane P.O. Box 1331 Piscataway, NJ 08855-1331 USA 1-800-678-IEEE (Toll-Free USA and Canada) or 1-732-981-0060
IEEE Order No. PC5739
ISBN 0-7803-3483-3