This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
and ( p , q ) -. 2. If p~,(z) e F~,[z] is right (left) orthogonal to z k, then p#v(z) e F~,[z] is left (right) orthogonal to z "-k . Proof. 1. For the first part we see that
(p, q)
_
pHT~,,~, q _ []; p]H[ ~ T~,,,,, .I ][ I q]
=
[.i p]H[T~,,~,]T[ .i q] -- [i q]TT~,,~,[ i p]
= q#HTt,,~,p#--. For the second equality we note that
(zp, zq)
-: = -
[0 pH]Tv+l.v+l[0 qT]T [pH o]ZTTv+I.~+IZ[qT 0]T [pg 0][T..~ | 0][q T 0] T pHT~,,~,q -- (p, q).
2. For the second property note that (zk,p(z)> - 0 implies that also
- 0 by part 1 of this theorem. El
4.9.3
Iohvidov indices
We introduce now indices which are closely related to the notion of characteristic of a Toeplitz matrix as given by Iohvidov [156, p. 106] or the one given by Heinig and Kost [144, p. 91]. In those definitions, the characteristic is a triple of integer numbers that can be associated with an arbitrary Toeplitz matrix. We shall associate with an infinite Toeplitz matrix T two sequences of couples of integers: a sequence of left couples and a sequence of right couples. They will fix the size of the sequence of nonsingular leading principal submatrices of T, hence the block sizes for the biorthogonal polynomials. We shall call them Iohvidov indices, although this might be different from what is meant by the same term in other publications.
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
184
Let T - [#./_~] be a Toeptitz matrix whose nonsingular leading principal submatrices are Tk, k - 1, 2 , . . . of size tck. Let bo,k and ao,k be the uniquely defined TOPs of degree ~r associated with the moment matrix T. Define ak+ 1 to be the smallest integer a >_ O, such that
Definition
4.17 (Iohvidov
indices)
-[~ct+l
ak+ 1+
-''
~k+ct+l]aO,k r
0
tO be the smallest integer a _> 0 such that
. . . /z-a]ao,k 7~ 0.
The numbers ( a-~+l , ak+ + 1 ) are called the right Iohvidov indices for block k. The left Iohvidov indices (fl~-+l, #+ k+l) for block k are defined similarly, using the left orthogonal polynomial bo,k. Note t h a t the Iohvidov indices can be infinite. Clearly, using T h e o r e m 4.25, we also have C~k+l
a++l
fl~-+l
flk+ 1
(1,z~+~o,k(~)) r 0}
=
min{O ___ a
"
=
min{O _< a
9
=
min{O<_a
9
=
min{O_
"
=
min{O<_fl
9
=
min{O _< fl 9
--
min{O_<j5
=
min{O <_ fl 9
a
o,k(z), z'~ TaT1 ) r 0}r o} r o} ,bo,k( z
o}
9 I , z #h# t ~o,k~z)) r
o}
A n o t h e r way of expressing these relations is by saying t h a t
Zk
, ~o,k(z)) -
while
/ /
-~- r
o,,,+
rk r
k - - a k - ~ _1 _ - 1 + k - - a k 4 . 1 , 9 9 Nk -~- ak4.1 - 1 k-
+ N;k -~-O:k+ 1.
-~; r O, ,,+
sk r
-1 k-
~k + fl+ k+l"
4.9.
T O E P L I T Z CASE
185
These left and right indices are of course not completely independent. Comparing the expressions for ak+ 1 + and flk+l , and also the other two indices, already suggests that "a0,k is like b0,k # and ~k+l + is like f~k-+1 + 1" and similarly for the other two. It will turn out t h a t we usually do have the relations ak+ 1 + - ~k-+l + 1 and f~k+l - ak+ 1 + 1, while, up to a normalization a0,k is the same as bO,k" # This can not be true in general since the indices are nonnegative integers, it is impossible for c~k+ 1 + to be ~k+l + 1 when ak+ 1 + - O. This will be an exception to the previous rule. The next l e m m a shows that ak+ 1 + - 0 happens iff ~k+t+ - O. L e m m a 4.26 Let T be Toeplitz then the Iohvidov indices satisfy ak+ 1+ -0 iff flk+l + - 0 iff block k has size 1. P r o o f i Set u + 1 = ~k, then T~,~ is invertible by definition. Let Au+l be the unit upper triangular matrix containing the stacking vectors of the first u + 2 right block orthogonal polynomials. Similarly, B v + l contains the corresponding vectors for the left polynomials. Set
Av+l -
0
a~,+l
a n d B~,+I -
0
Then
B~,H T~,.~,A~, - D~ and B~H1T~+I,~+IA~+x - D~ | d u + l , with det B~ - det Bz,+l - det A,, - det A v + l - 1 and thus det T~,,~, -det D~ ~ 0 and det Tv+t,v+l = d~+l det D~. Thus T~,+1,~+1 is singular iff dv+l - (b~,+l, a~,+l) - 0. The remaining equivalences follow easily since
(b~+l, av+l}
-- / z v+l ,avTi / - - ( b v T i , z ~ - k - 1 ) . [:]
4.9.4
Row
recurrence
The Szeg6-Levinson recursion gives a recurrence that was quite unlike the three t e r m recurrence in the Hankel case. It works especially for polynomials orthogonal with respect to a measure on the complex unit circle. We shall return to this in a later section. However, there is also a so called row recurrence (since one moves along a row in a certain Pad~ table) which is
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
186
the Toeplitz version of the three term recurrence. This is what we shah look at in this subsection. The generalized three term recurrence relation can be derived in several ways. In the papers by Jonckheere and Ma [158, 159] we find a derivation from the linear algebra point of view which is parallel to the deduction in our Section 2.2. In [38], the derivation is more Euclidean style as in our first chapter. Many other authors have obtained similar algorithms. The reader is referred to the references in those papers. We summarize these results to illustrate how closely they follow the Hankel case. The next l e m m a shows that a number of right orthogonal polynomials can be chosen simply by shifting a0,n if a~+ 1 is positive and a number of left orthogonal polynomials can be obtained by shifting b0,n if ~ + 1 is positive. L e m m a 4 . 2 7 Let T be a Toeplitz matrix with nonsingular leading principal submatriz Tn-1 - T~,,~, 5.e., u Jr 1 - '~n) and an-+1 a Iohvidov indez for block n. Then we can choose a~,+k+l(Z)- zkav+l(Z) - zkao,n(z)
for
k - O,...,an-+l
where {ak} is a set of right block orthogonal polynomials for the moment matrix T. Of course a dual result holds for the left polynomials. P r o o f . Note that the invertibility of Tv,~, implies t h a t the monic a~+l (z) = ao,n(z) is uniquely defined. Furthermore, we have by definition t h a t
( 1,zka~,+l)-0
for
k-0,...,an+
1.
If we add to this the fact t h a t a~+l is orthogonal to all z k for k - 0, 1 , . . . , v by definition of block orthogonality, we get (z ~ , a ~ + l ) - 0 Therefore
/ z ~, zka~,+l / -
for
a--an+l,...
0 for i - k - - a n + l , . . .
,v. , u. Consequently, zkav+l
is right orthogonal to z i for i - k - a ~ + l , . . . , u + k. In other words, zka~,+l is right block orthogonal as long as k - an+ 1 _< 0 or k _< an+ 1. D We could of course compute the orthogonal polynomials by a block twosided Gram-Schmidt algorithm. However, we shall propose a generalized three term recurrence which will compute the right orthogonal polynomials in blocks, independently from the left polynomials. Unfortunately, these blocks will not coincide with the blocks we considered in previous sections. It
4.9.
T O E P L I T Z CASE
187
was already suggested by the previous lemma that if a~+ 1 > 0 and an+ 1 + - 0, then shifting of a0,n gives a simple update across a~+l classical blocks of size 1. We shah rearrange the right block orthogonal polynomials (where block has the classical meaning of being related to nonsingular leading principal submatrices of the moment matrix) in somewhat larger blocks. For example, the shifted KOPs of the previous lemma would typically belong to the same block of ROPs, even though they might (or might not) belong to different classical blocks of size 1. Thus we rearrange the ROPs in blocks defined in function of the computational procedure. To make the distinction, we call them right blocks (R-blocks) while the classical blocks related to the nonsingular leading principal submatrices of T are called T-blocks (T-blocks do not depend on the adjective left or right). A formal definition of the Rblocks is given below. The R-blocks are the same as the T-blocks, except that some of the l~-blocks may group a sequence of T-blocks of size 1. Our previous block structure is a refinement of the R-block structure: The boundaries of l~-blocks are boundaries of T-blocks, but the converse need not be true. There is a similar algorithm for the left orthogonal polynomials which will generate these as (in principle) yet another set of blocks, which will be called L-blocks. We shall develop the recursion for the right orthogonal polynomials, leaving the left dual as an exercise for the reader. We shall describe how to compute R-blocks of right orthogonal polynomials. The R-block sizes will be denoted by pk, k - 1, 2, .... We denote also P0,n - ~ = 1 Pk, which are the l~-block indices. The right Iohvidov indices corresponding to l~-block number n will be denoted by Pn+l• They form a subsequence of the Iohvidov indices a,~+l+ which correspond to the T-blocks. We are now ready to start the derivation of the row recursion. We first show how to start up the recursion. L e m m a 4.28 Let T = [#j_i] be a moment matrix and let ao = 1. The right Iohvidov indices for R-block 1 (and thus also for T-block 1) are p{--a{ pl+ - a
t
-
min{a 9 / Z a + l - {1, za+la0) # 0 } - o r d f + ( z ) -
--
min{a 9 # _ , - ( z " , a 0 ) # 0 } - o r d f - ( z ) .
1
1. If p+ - oo, then T is strictly upper triangular and there is no invertible leading principal submatrix. There is only one infinite T-block which is also an R-block. The right orthogonal polynomials do not have to satisfy any orthogonality condition and they are arbitrary of appropriate degree.
C H A P T E R 4.
188
ORTHOGONAL POLYNOMIALS
2. If Pl - oo, then we may choose ak(z) - z k, k - O, 1 , . . . which gives an infinite R-block, the only one there is. (a) If P+I - O, then T is lower triangular with nonzero diagonal. All the T-blocks have size 1. (b) If p+ > O, then T is strictly lower triangular. There is no invertible leading principal submatriz. There is only one infinite T-block which is an infinite R-block. The above choice for the right orthogonal polynomials still holds but any other set of polynomials of appropriate degree would do as well since there are no orthogonality conditions to be satisfied. 3. If v - p+ + p~ < oo then there is an R-block number 0 of size pl - v + 1 containing the right orthogonal polynomials ak(z) - z k, k - 0 , . . . , v. (a) T~,,~, is the first nonsingular leading principal submatriz of T for which a monic true right orthogonal polynomial a~,+l exists, which can not be chosen to be z ~'+1 . (b) If p+ - O, the R-block number 0 contains v T-blocks of size 1. (c) If P+I > O, then T~,,~, is the first nonsingular leading principal submatriz of T. It has the form 0
...
0
I~l+p~
...
9
#,, 9
9
0
,L/,l_l_p~
#_p~
0
9
9
9
,u_,,
9
~
""
~-o~
0
...
0
with #_p+ #l+p~- ~ 0. (d) The right orthogonal polynomial av+l, which is the first one of R-block number 1 can be computed by solving the system Tv,~,~,+l -- - [ # v + l -
z
+
"'" #I]T.
The system can be solved very
e.ffi-
ciently. P r o o f . We only check case 3. The fact t h a t the polynomials in B.-block 0 can be chosen to be the powers of z follows from L e m m a 4.27 if p+ - 0, and
4.9.
189
TOEPLITZ CASE
in the case pl+ > 0, this is a trivial observation since T~,~ is then the first nonsingular leading principal submatrix of T. For k < v, we can choose for ak any monic polynomial of degree k because it need not be orthogonal to any other polynomial of lower degree. We have chosen R-block 0 to be of size v + 1 because it is the first value of n for which the orthogonality requirements- 0 for k - 0 , . . . , n, lead to a unique nontrivial monic solution. Indeed, there is either a trivial solution z n+l (for n - 0 , . . . , p ~ - - 1) or no monic solution (for n - & , . . . , v 1). Since T~,~ is nonsingular, the system defining a~+l and reflecting the true right orthogonality of a~+l, is uniquely solvable. [:] !
Note t h a t we could have a situation with IZ0 ~ 0 and ~1 - 0. In that case p+ - 0 and p~- > 0. The submatrices T0,0 as well as T1,1 are nonsingular, while the polynomial al is obtained by shifting a0, i.e. both are considered to be in the same R-block number 0 of right orthogonal polynomials while T-block number 0 in the sense of earlier sections clearly has size 1. It was said in the previous l e m m a that if p~- - cr then an infinite 1%block of right orthogonal polynomials can be obtained by shifting the initial one. This observation is true in general. If T,,_I - T~,~ (v + 1 - P0,n) is nonsingular and the corresponding Iohvidov index Pn+l is infinite, then an infinite block of shifted right orthogonal polynomials exists. This fact is shown in the next lemma. L e m m a 4.29 Let T be a Toeplitz matrix and assume T~,,v is nonsingular. A s s u m e it defines the first n R-blocks, thus v + 1 = po,,~. The monic right block orthogonal polynomial a~,+l is a T O P and thus uniquely defined and therefore also the corresponding right Iohvidov indices Pn+ + 1 are defined. Let a~,+l (z), k O, 1 , . . . is a set of right P,~+I be infinite. Then a v + k + l ( Z ) - z k orthogonal polynomials. The R-block number n is infinite. If moreover pn+ 1 - O, then this R-block consists of infinitely m a n y Tblocks of size 1 9 If however Pn+l + > O, then there is no leading principal submatrix Tk,k that is invertible for some finite k > v, i.e. the T-block is also infinite. P r o o f . By L e m m a 4.27, we know that zka~,+l(z) is right orthogonal to all z i for i - 0 , . . . , v + k. We can express this in matrix terms as ([lz ...],[a0al
. . . a: l a~+l za~+l . . . ]> - T A - R
190
C H A P T E R 4.
ORTHOGONAL POLYNOMIALS
with T the infinite moment matrix and A the infinite unit upper triangular matrix containing the stacking vectors of the right orthogonal polynomials, as we have described them above. The matrix R will have the form R
R0o R10
0 ] Rll
with R00 of size v + 1, block lower triangular and invertible and Rll is lower triangular Toeplitz with zero diagonal element iff P++x > 0. Comparing the principal leading principal submatrices on the left and on the right, we see that all submatrices Tk,k are singular for k -> v + 1 if P,~+I + > 0 or they are all invertible if Pn+l + - O. [3 m
Note that the condition Pn+l - c~ is sufficient to have an infinite block of right orthogonal polynomials, but the extra condition Pn+l + > 0 is necessary to have an infinite block of singular submatrices. We can check this by considering the following example E x a m p l e 4.4 Consider the moments satisfying /z0 ~t 0 while #k - 0 for all k > 0. Then p~- - oo but p+ - 0 and all leading principal submatrices are invertible (all T-blocks have size 1) while there is an infinite It-block of right orthogonal polynomials which are just the powers of z.
The situation where P~,+I - oo clearly requires the simplest possible updating procedure. Another simple situation occurs when Pn+l + - oo. This situation is discussed in the next lemma and after that the general update will be given. Before that we have to define formally how we subdivide the right orthogonal polynomials into K-blocks, which, as we have repeatedly said, are not always the same blocks as the T-blocks corresponding to invertible leading principal submatrices. Deft n i t i o n 4.18 ( R - b l o c k s ) Let T be a m o m e n t matrix which is Toeplitz. We set by definition po,o - O. R-block number 0 has size pl - p{ + p+ + 1. Set v + 1 = pl, then if v < oa, the matrix T~,,~, is invertible. Hence its Iohvidov indices are defined. Denote them by p~. R-block number I of right orthogonal polynomials will have size p2 - p2 + p+ + 1. In general denote the cumulative R-block sizes as po,n - ~'~=1 pk. It will then be proved below that for v + 1 = po,n, the leading principal submatrices T~,,~, are all invertible whenever v < oo. Hence its lohvidov indices Pn+l + are well defined and R-block number n of right orthogonal polynomials will have size Pn+ l - Pn+ l + P++a + 1.
4.9.
TOEPLITZ CASE
191
The invertibility of T~,~ when v + 1 = p0,n proves t h a t boundaries of Rblocks are also boundaries of T-blocks. This property will be proved by induction in the following lemma's. The initialization of this induction was already given in L e m m a 4.28. In K-block number n of right orthogonal polynomials, two polynomials will be of special interest to describe the recursions. T h a t is the first one of the R-block, which we denote as a0,n (referred to as the T O P of t h a t block) and h,~ - zP:+ ~+1 ao,n (referred to as the NOP of t h a t block see below). This is a slight abuse of notation. Previously a0,n referred to a~., the T O P of T-block n, whereas we now use it with the meaning of ap0,. , the T O P of l~-block n. We are convinced though that this will not lead to confusion. Considering the sequence of polynomials zkao,n for k - 0, 1 , . . . , the polynomial an is the first one in the sequence which is not a right orthogonal polynomial from P~-block n, because it is not right orthogonal to the constant polynomial 1. Indeed, by definition of P~+I, the inner product
<1, an>- (1, z p'~--+'+Iao,n > # ( zk
II
0. Note also that
an)-O
for k - 1,
9
9
9
,po,n+l-1
but (zp~247 ~ , a . )
-
zP~
~ , a0,.
r 0
by definition of Pn+l. + For easy reference we shall call the polynomial fin the NOP for K-block n, referring to the fact that it is Not a right Orthogonal Polynomial. We can refer to it as the N OP for block n if we assume it is uniquely defined by a monic normalization. We now give the l e m m a describing the u p d a t e when PO,n+ 00. L e m m a 4 . 3 0 Let T be Toeplitz and T~,,v with v + 1 = po,n a nonsingufar leading principal submatriz. Let the corresponding Iohvidov indices be P++I - oo and Pn-+a < 00. Then all Tk,k are singular for k > v. There is an infinite R-block of right orthogonal polynomials which can be
computed as follows: Set a~+k+~ (z) - zka~+l (Z) for k - 0 , . . . , Pn-+l. Shifting the last of these polynomials once more will give the N O P fin. Setting - v + P~+I + 2, we can now find a nonzero constant cn+x = - ~ / r n - - 1 (see page 18g) and monic polynomials s of degree j such that a~,+j(z)
-- z3~l,n_l (z)cn+ 1 -~- an(z)d~+l,j(z), j - 0, 1, 2 , . . .
P r o o f . T h a t Tk,k is singular for all k > v follows from the fact t h a t Ta~+l - 0 because Pn+l + - oo. Let A be the unit upper triangular m a t r i x
192
C H A P T E R 4.
ORTHOGONAL POLYNOMIALS
whose u + 2 first columns are the stacking vectors of the right orthogonal polynomials ak, k = 0 , . . . , u + 1 and the remaining columns are columns from the identity matrix. Then, setting T A = R, we see that the determinant of a leading principal submatrix Tk,k of T will be equal to the determinant of the corresponding leading principal submatrix of R. Since the latter always contains a zero column for k > u, all Tk,k are singular for k>u. The choice for the first P~+I + 1 polynomials a~,+i+l is correct by Lemma 4.27. The remaining polynomials can be obtained as indicated. For example, take j = 0. We have
{ :riO f o r k - 0 '
-0
for k -- 1 , 2 , . . .
Similarly (
){ z k,a,,-1
Thus there exists a right orthogonal to More generally, right orthogonal to
#0 -0
fork-O fork-l,...,p0,n
-1
nonzero constant Cn+I such that ap - a,n-lc,,+1 + an is z k for k - 0 , . . . , p 0 , n - 1. suppose ap+i for i - 0 , . . . , j 1 have been computed all z k for k - 0 , . . . , p 0 , n - 1. Now, z j ap - z ~(a,,_ 1 c,,+ 1 + a . )
is right orthogonal to z k for k - j , . . . , p 0 , n - 1 + j. By adding multiples of z'~n, i - 0, 1 . . . , j 1, we can satisfy the remaining orthogonality conditions. It remains to be shown that the polynomials ap+j are monic of the correct degree. This will be the case if deg 5n-1 < deg ~,,. Now deg an-1
=
Po,n-1 + P~, + 1
<
po,,,-~ + pT, + p+ + 1 - Po,n
<
Po,n + P~+l + 1
=
deg an.
This proves the lemma. The most general case occurs when both P++I and P,,-+I are finite. We then have a finite block of right orthogonal polynomials of size pn+l = - -l--P,,+I + 1. The update within this block is as described in the previous Pn+l +
4.9.
193
TOF, P L I T Z C A S E
lemma but we do this only a finite number of times. The update required to leap out of the block and compute a0,n+l, the first vector of the next block is new in this situation. The result naturally generalizes the classical three term recurrence relation for polynomials orthogonal with respect to a positive definite Hermitian Toeplitz matrix. T h e o r e m 4.31 ( t h r e e t e r m r e c u r r e n c e ) Let T be an infinite m o m e n t matrix which is Toeplitz. Suppose v + 1 - po,n is the index denoting the start of an R-block of right orthogonal polynomials, then the leading principal submatrix T~,,~ is nonsingular. Let the corresponding Iohvidov indices P~+I and Pn+l+ both be finite. Then the block size of block number n of right orthogonal polynomials is P~+I - 1 + P~+I + Pn++l < oo. The first monic true right orthogonal polynomials, i.e., the first of each block are given: ao,k, k - 0 , . . . , n. In particular a~+l - ao,n. The remaining polynomials o] R-block number n can then be obtained as follows. The first P~+I + 1 are given by 9
a~+i+l(Z) - z'a~+l (Z)
for
m
i - 0 , . . . , Pn+l"
(4.24)
For the remaining Pn+l+ , there exist a nonzero constant Cn+l and monic polynomials d~+l, j of degree j such that a~,+j(z) - zJ~tn_l(Z)Cn+l -~- ctn(z)dln+l,j(z),
j -- 0 , ' ' ' , P n ++ I -- 1
(4.25)
where we have denoted ~ - ~ + P~+I + 2 and ak zPk--+l+lao,k, k - n - 1 , n . The first polynomial in the next R-block can be found by a recursion of the form + a0,n+l (Z) -- zP'~+1 a,n-l(Z)Cn+l -~- ao,n(z)dn+l (z) (4.26) -
-
where cn+ 1 iS a nonzero constant and dn+l(Z) is a monic polynomial of degree pn+l. Let Tv,,v, be the leading principal submatrix of size •' + 1 - Po,n + pn+l = Po,n+l 9 Then, if Pn+l + > 0 it is the smallest nonsingular leading principal submatrix of T which contains T~,,~,. In other words, pn+l has the meaning of the block size of a T-block. If Pn+l + - O, then all T~,+i,~+i for i - O, 99 9, Pn+l are nonsingular. The corresponding right orthogonal polynomials are all TOPs, but Tv,,~,, is the smallest submatrix containing T~,~ for which the T O P a~,+l can not be obtained by simply shifting ao,n.
P r o o f . The choice for the first P~+I + 1 polynomials a~+i+l is justified by Lemma 4.27.
CHAPTER
194
4. O R T H O G O N A L
POLYNOMIALS
To prove that the remaining polynomials in the same R-block are of the indicated form, we may repeat the proof of the previous lemma for a finite number of updates. To show the claim about the (non)singularity of the leading principal submatrices, we consider the current block of right orthogonal polynomials. Express the right orthogonality of the polynomials ak, k = 0 , . . . v ' , which we have obtained so far in a matrix form. If we put their stacking vectors in a unit upper triangular matrix A~, and if we denote Tn = T~,,~,, then it holds that T,.,A,., = R,., where Rn is block lower triangular. The determinants of the leading principal submatrices of Tn are equal to the determinants of the leading principal submatrices of Rn because An is unit upper triangular. Now we can write R,~ as (suppose for the moment that Pn+l+ > 0)
R,, -
[Rnl ~ X
[~ R12] ,47,
R,,,,,
R22
X
with R22 a lower triangular Toeplitz matrix with diagonal elements
,
-
# o.
Its size is Pn-+i + 1. The right top block R12 is lower triangular Toeplitz and nonsingular, with diagonal elements (zP~ ~, a n - l ) c n ~ 0 and its size is + Pn+l. It follows that the first nonsingular leading principal submatrix of Rn which contains Rn-1 is Rn itself. If Pn+l + - 0, then the block Rn,n consists only of the R22 block, which is a nonsingular lower triangular Toeplitz. This proves our claims about the singularity and nonsingularity of the leading principal submatrices of T. It remains to show the updating for the first polynomial of the next block. The proof begins along the same lines as in the previous construction. + First we observe that zP-+~a0,n - z p~+~~,~ is not orthogonal to the initial powersz k k - 0 + P,~+I By doing similar operations as before, we can generate the polynomial +
a'(z)
=
zP,~+~an_l(z)c,+l + 5,.,(z)d~+l(z ) p+
(4.28)
-
z "+~an-l(z)cn+l + zP"+~+lao,n(z)d~+l(z) !
with Cn.t_1 a nonzero constant and dn+ 1 a polynomial, in such a way that it is of degree exactly P0,n+l and such that it is orthogonal to the powers z k for k - 0 , . . . , p0,n + p++l - 1. The polynomial d~+ 1 plays the same role as
4.9. T O B P L I T Z CASE
195
' but now with j the previously mentioned dn+i,j t h a t it is a monic polynomial of degree Pn+i + "
+ This means e.g. Pn+i"
This is not enough for a'(z) to be block orthogonal though. Since T~,~, is invertible, the polynomial a~,+i should be right orthogonal to all powers z k for k - O , . . . , v ' - p0,n+i - 1. To obtain this, we add multiples of z ta0,n, I - O, 9 9 9 Pn+-1 to the polynomial a'(z) to get also the remaining orthogonality relations. The polynomial + zlao,,., is indeed right orthogonal to z k for k - 0 , . . . , po,n + Pn+l + l - 1 but not right orthogonal for k - p0,n + P++I + I. Thus by adding these multiples to a'(z), we get a~,+l(Z) which is a true right orthogonal polynomial and which satisfies the recursion as it was announced. The last u p d a t e is denoted as
II where dn+i(z ) is a polynomial of degree at most P,,+iu p d a t e can be rewritten as
-
-
+ zP"+l~Zn-1
Thus the whole
( , ,, 1 + ao,n(Z) z p-~+~+ldn+l(z ) + dn+l(Z
+
=
+
where dn+l (z) - zP:+ ~+1 dn+ , 1 ( z ) + dn+ l(z) is a monic polynomial of degree Pn+l
.
[]
Some readers prefer not to work with the series and polynomials and like to see the operations described in terms of linear algebra and matrices. For them, we can give the following reformulation. Let Tk be the invertible leading principal submatrices of the Toeplitz m a t r i x T corresponding to the l~-blocks. Set v + 1 = p0,n+x, so t h a t Tn = T~,~ is invertible. For k - n, n - 1, let a0,k - [~T,k 1]T with a_0,k a vector of size Po,k which solves Tk-la0,k = -[#p0,h "'" tq] T" These vectors are shifted and if necessary extended with zeros to get
T~,~+I [ Zp,,- +p~+~ + +1 ao,n-1
l ao,n Zao,n
.-.
Zp-+~ ao,n ]
(4.29)
C H A P T E R 4.
196
ORTHOGONAL POLYNOMIALS u
9" + Pn+l + l , n - 1
9 ''
Tl,n
T + Pn+l+l, n
9
9
T1jn-1
Tlzn
0
0
r O+, n - 1
r O,n + 9
T
+
On--+l ,n-- 1
0
0
0
.
7`+-
r.+ O,n
...
P n + 1 ,n
where the bold zeros have p0,n rows. The series O O
(z)
-
--
--
1
,
rl, k -- r k -
r
0
i=1
+
k+O++
,
,~+ 0,k--rk r
i=O
will be identified as residuals in two-point Pad~ approximants below. Thus the triangular Toeplitz matrices in the right-hand side are nonsingular and they are used to eliminate the elements in the first column. More precisely, t H define the vectors dn+ 1 and dn+ 1 as the solutions of the systems Tl,n
9 ''
9
T+ P,~+I + l , n
..
T+ P,~+I + l , n -
d~+ 1
9
-
-
_
1 Cn+ 1
9
9
(4.30)
9
Tl,n
Tl,n--1
and r 0,n +
r 0+, n - 1
: r+
"'.
Pn--+i ,n
...
d"+l
-
r+
-
c~+1 r +_
O,n
P n + I ,n-- 1
then, multiplying the relation (4.29) from the right with the vector [Cn+l
dnT+l]T
where
dn+l
- - [ d nII+ 1 T
t T]T , dn+l
yields all zeros in the right-hand side. Hence the corresponding combination of the a0,k columns in the left-hand side should give a0,n+l. In the series notation, the solutions of the triangular Toeplitz systems are formulated as divisions of series, which corresponds to solving the systems by backward and forward substitution.
4.9. T O E P L I T Z CASE
197
It should be admitted that it is not really a simple generalization of the generic three term recurrence. It is more complicated in the sense that the block upper Hessenberg matrix Tn in the general relation
F(ao,n+l )An = A,~Tn
(4.32)
of Theorem 4.9, which expressed the recurrence, does not reduce to a block tridiagonal form here. It remains unit upper Hessenberg with a special structure, but which is too involved to describe easily. We leave it as a challenge for a devoted reader. It is however possible to find another matrix relation involving simpler matrices. We shall do this in Section 4.9.9. Note that the generic case in which all the leading principal submatrices of T are nonsingular and p~- = 0 is contained in Theorem 4.31. All the • Pk - 0 and all the 1~- and T-blocks are of size 1. There is no update within a block (no inner polynomials) and the only recursion that remains has the form
ao,nTl(Z) = zao,n-l(Z)Cn+l + ao,n(z)dnTl(Z), with dn+l(z) - z + d~+ 1 a monic, first degree polynomial.
4.9.5
Triangular
factorization
As we have pointed out several times before, by re-reading the previous proofs carefully, it is found that at the boundaries of the R-blocks we have a nonsingular leading principal submatrix T~,~. However, there may be nonsingular leading principal submatrices found at every position between two successive boundaries. The latter happens for R-block number n of size pn+l > l i f t Pn+l + -0andpn+l >0 With these observations in mind, we can now give the structure of the diagonal blocks in the block triangular factorization of the moment matrix. Note that in the next theorem, we subdivide the biorthogonal polynomials in the finer set of T-blocks and not R.-blocks. We also use the much stronger condition of quasi-orthogonality (indicated by the dot) and not just block biorthogonality. T h e o r e m 4.32 Let the moment matrix of the inner product be Toeplitz and let {bi, izi}~ be sets of left/right normalized quasi-orthogonal polynomials, divided in T-blocks bn, itn with block sizes an+l and block indices ~n+l. Let us denote the TOP (first true orthogonal) polynomial bo,n of b,~ as b and the TOP (first true orthogonal) polynomial iZo,,~ of it,~ as a. Then the matriz
198
C H A P T E R 4.
Dn, n - ( [ ) n , i t n )
ORTHOGONAL POLYNOMIALS
form
of size ~ n + l has the
V,n,n
0
"
(4.33)
The matrices un,n and Vn,n are identity matrices of size T]n_t_l + and ~7~ 1 + 1 respectively. These numbers are related to the Iohvidov indices Oln_t_1 of Tn-1 = T~,~ with u + 1 = an by
+ 77n+1 and
{o
~7n+1
-
-
+ Otn+l
-
/.f (a, b) r 0 r
-
O:n..i.. 1
+ 1 -0 O~n_l_
otherwise.
P r o o f . First suppose that for block n the Iohvidov index an+ + 1 ~ 0, then the T-block is also an B.-block and we can apply the results for R-blocks that were just obtained. The left orthogonal polynomials are subdivided into blocks conformal to these R-blocks. We start with the block orthogonal polynomials (bn, an) and show how they are transformed in a set of quasiorthogonal polynomials (l~,, hn). The block diagonal Dn,n = (bn, an) is obtained by multiplying the matrix Rn of (4.27) with B H, where Bn is the unit upper triangular matrix of stacking vectors of the left orthogonal polynomials. Thus the block Dn,n is found by multiplying the block Rn,n of (4.27) from the left with the lower unit triangular block B 1'gi.~7"1~of Bn Because Rn,n is a column permuted lower triangular, we can multiply with a permutation matrix
P-
[o11] I2
0
w i t h / 1 the identity matrix of size O~n-_l..1 + 1 and /2 the identity matrix of size an+ 1 + to make Rn,n lower triangular (and nonsingular). It follows that also Sn - BHnR,,,,~P is lower triangular. By rescaling the left orthogonal polynomials of the current block, i.e., by replacing bn with I~n - bnSn H, w e find that is an identity matrix. Thus ([)n,an/- p T , whence
(bn,anPI
(l~n, an) is a set of blocks of normalized quasi-orthogonal polynomials and + therefore P is unique. This proves the theorem for the case an+ 1 > 0. Now suppose that the T-block structure of the leading principal submatrices of T is finer than the K-block structure of the right orthogonal polynomials 9 Then P n++ l - - ~ n + - 0 " In this case of course /)n,n is 1 by 1 +l
4.9. T O E P L I T Z CASE
199
The formula (4.33) still holds if we use the fact that Un,n has size O~n§ 1 § -- 0 and therefore vanishes. So bn,,~ - Vn,n which contains only 1 element (which is ~,~ ~ 0). This explains the definition of the sizes v/n+1 • 9 o This theorem has a most interesting corollary. C o r o l l a r y 4.33 Let the left and right Iohvidov indices an+ and ~n+l+ for the Toeplitz matrix T satis]y an+ + 1 ~ 0 and thus also f~n+l + ~ 0, then ~n+l + -9 ~n+l ~,~+1, there is in general no an+l + l anda++l ~ , + 1 + 1 If + - 0 + relation between an-+1 and ~ + 1 . P r o o f . The previous theorem fixes the size of the identity matrices Un,n and vn,n. A completely symmetric treatment of the problem using left block orthogonal polynomials would lead to the conclusion that un,n should be of size f~n-bl - 1 and Vn,n should be of size f~n+l. Since the permutation matrix P is fixed, even though the quasi-orthogonal polynomials are not, we can identify the sizes of the Un,n and vn,n which leads to the result. [:] Of course, as in the general case the block orthogonal polynomials give an inverse LDR factorization of the nonsingular leading principal submatrices of the moment matrix. T~, 1 - AnD~ 1B H. However, using the persymmetry for the Toeplitz case, we also have Tn T -
i T~ 1 i
- ( I An i )( i D~, 1 l )( i Bn i )H
which is an LDR (and not an RDL) factorization of T~ T. 4.9.6
Rational
approximation
If we want to give appropriate rational approximation properties related to this Toeplitz case, we should first prove a corollary to Theorem 4.31. C o r o l l a r y 4.34 Let us use the same notation as in Theorem ~.31. Then every polynomial ao,n which starts a new R-block of right orthogonal polynomials has a nonvanishing constant term, i.e. a0,n(0) ~ 0. P r o o f . The proof goes by induction. Since a0,0 - 1, we should only prove that a0,n+l(0) ~ 0 if a0,n(0) ~ 0. Following the update formula (4.26) of Theorem 4.31, it is sufficient to show that the constant term dn+l(0) ~ 0 because the constant in 5n-1 - zP;+la0,n-1 vanishes. Now express the fact
CHAPTER 4. ORTHOGONAL POLYNOMIALS
200 that
a 0 , n + l is orthogonal to z k with
+ k - po,n + Pn+l" This gives only two
terms from the right-hand side (see (4.29)) 0
cn+l + + (zk,ao,n> dn+l(O).
--
Both of the inner products are nonzero by definition of p+ and Pn+~ + respectively. Because we also know t h a t cn+~ is nonzero, it follows t h a t also dn+l (0) can not be zero. D Note however t h a t the previous corollary is only true for polynomials starting a new P~-block. T-blocks corresponding to nonsingular leading principal submatrices may start with polynomials with vanishing constant terms if they do not coincide with the start of an R-block. We are now able to prove an approximation property which will be identified as a two-point Pad6 approximation in a later section. The two series t h a t will be approximated are
f +(z) --f_(z)
-
-
-
[tO "{- ~ - l Z + ~-2 z 2 + ' ' "
(4.34)
#1 z - 1 + ~2 z - 2 + ' ' '
(4.35)
The series f+ is the same as the previously mentioned f + , but for ease of notation we have used f_(z) for what was previously denoted as f - ( z - 1 ) . The rational approximant has denominator a0,n and the n u m e r a t o r can be found as follows. First note that for v + 1 - p0,n, we may write /tO
~1
"""
~v
9
#-1 9
9
#-~,
#o
".
,
~v+ 1
9
#,,
9
.
a v + l -- 0
.
9
9
/.tl
"'"
#-1
9
tto
#1
which can be split up into /.to ~
Cv+l
--
/to
#-1
av+l 9
~--v+l
0
~1
~2 9
~ 9
~
9 ..
/to
9 .
.
/.tv+
0 1
o
av+l. ~2
0
#~
4.9. T O E P L I T Z CASE
201
Clearly c~+1 is the stacking vector of a polynomial c~+1 = co,,, which has degree at most v. It can be described as the polynomial part of f_(z)ao,n(z) or as the initial terms in f+(z)ao,n(z). Using the notation of Section 1.5, we can express this as
co,,(z)
-
--
f_(z)ao,,(z) div 1 ZP~
(4.36) zP~
(4.37)
It follows readily from this definition t h a t
r~, (z) - f_ (z)ao,n(z) - Co,n(z)
-
~ z - l - P ~ + 1 + lower degree terms §
r+(z) - f+(z)ao,n(z) - C0,n(Z)
-
~+z~'+l+~
+ higher order terms
with nonzero ~ , and ~ . Now because the constant t e r m as well as the leading coefficient in a0,n is nonzero, we can t r a n s f o r m this into f_(~) - c0,,~a0, -In
=
r~-n z -.-2-.:+ 1 + lower degree terms
(4.38)
-1 -- C.o,nao, n
__
~
(4.39)
f +(z)
ZU-[-1 +o,~+~ +
+ higher order terms.
The rational function co,nao, n-1 which has 2u + 2 degrees of freedom, fits in the two series f+ a total of 2(u + 1) + pn+l - 1 coefficients with pn+l + Pn+l + P~,+I + 1 >_ 1. Since one a p p r o x i m a t i o n is in the neighborhood of z - 0 and the other in the neighborhood of z - oo one calls this a two-point Pad6 a p p r o x i m a n t . The complete analysis of the generic case and the block s t r u c t u r e of the nongeneric case can be found in the book [41]. 4.9.7
Euclidean
type
algorithm
W h e n using the n o t a t i o n of series as we just did, it is not difficult to give a Euclidean type algorithm for the Toeplitz case too. It is related to the type of algorithms described in [38]. First note t h a t the nonzero initial coefficients ~+ and ~ t h a t we introduced above, correspond to
~+ --rn
_
-(
( : 0 , . ++~+~, a0,~ ) z - 1 - ~ 2 4 7 , ao,n
(z~~ ~+~, a ~ )
) -- (1, an)
This means t h a t the constant cn+l in the three t e r m recurrence can be expressed as
~+~ = - ( ~ _ ~ ) - ~ ( ~ ) .
CHAPTER 4. ORTHOGONAL POLYNOMIALS
202
The monic polynomial d~+l(Z ) which also occurs in the recursion, has a stacking vector satisfying (4.30). This system expresses the equality of the polynomial part of oo
oo
(E ?'i,n Z -
-i+1
t
)dn+l(Z)
+
- -Cn+l(E
i=1
?'i,n-lZ-i+l)
Zp'+I"
k=l
!
Thus d,+z(z ) is obtained as -+~ +pZ-p~+l r~,_z (z)c~+l div r~, (z). d~+ (z) - - z p+ All the other dn+l, j (z) are obtained by taking fewer terms in this division. ' To be precise d'+ a,j(Z) -- -- zJ+PZ -':"~+1 7 ,-n _ l ( Z ) C n + l div r~, (z), quadj - 0,...,P~+I. + t t t t is Note that dn+xj(z ) - zdn+xj_x(z ) + dn+l, i(0), i.e. that the next dn+ld obtained by continuing the division one step further. Since, according to Theorem 4.3 , the largest j for which we need d~+~j(z) in the recurrence is j - Pn+l, + it is sufficient to compute d~+l(z ) - d'n+l,p++ (z). iF we know
this one, we know them all. The polynomial dn+x(z) in the three term recursion is a composition of two parts:
d n + l ( z ) - z~
!
I!
dn+l(Z ).
(4.40)
It is explained above how to find the higher degree part d~+ 1. The lower II II degree part dn+ l(z) has a stacking vector d,~+l , satisfying (4.31). As for d~+l, this system expresses the equality of the first P~+I + 1 coefficients in oo
oo
ljn
i=0
z' ) d"n + l ( Z ) - - c n + l
$1n
-1 z' ) "
i=0
This polynomial d~+~(z) is at most of degree Pn-+l" The previous relation means that it can be obtained by dividing suitably shifted versions of r+_l (z) and r+(z). To be precise a"+l(Z)
-
-
+ -p~+ +o;+1 r n+_
1
divr+(z)]
Thus to get an algorithm in Euclidean division style, we should be able to compute the successive r k+ and r~-. By the linearity of their definitions one can easily find that they obey exactly the same recurrence as the denominators a0,k. We can collect these results in Algorithm 4.6. To show the
4.9.
TOEPLITZ
203
CASE
similarity with the Hankel case Euclidean algorithm, one could write the recurrence in terms of G and V-matrices. So if we set
Ok -
?'k-1
Tk
ao ,k- 1
ao ,k
rk_ 1
rk
I
+
,
+
+ -Fpk-I-1
[
0
V k + 1 --
Z ph+l
1
Ok+ 1
dk+l(Z)
]
then Gk+l = G k V k + l . Note however t h a t the G m a t r i x contains two residual series and the denominators, whereas for the Hankel case we had the numerators, denominators and one residual series. We now give an example which illustrates the result of the algorithm. Example
4.5 We consider the following series I+(~)
:
I z2 + ~ 1 Z3 + g 1 Z4 + 1-6 z5 + 9 z6 + "
I-(~)
=
-z -I+~
l z _ 3 _ 1 -s
~z
+g
lz
-
7
1
-9
- 1--~z + . . .
The algorithm initializes a0,0 - 1 and c0,0 - 0 and the corresponding residuals are
r 0(z) _
=
Iz_3_ -z -1+ ~
1
~z
-s
I
+ gz
-7
1 -9 - 1--~z + . . .
The Iohvidov indices are p+ - 2 and p~- - 0 so t h a t Po,l -
Pl -
P+ -I- P l -I- I - 3.
The leading coefficient in r o is ro - - 1 . To find a0,1(z) we should solve a homogeneous Toeplitz system. Setting a0,1 - [ ~ T 1 1] T, this system is
I~176 0 1
0
1
a_o,x-
0 0
-i
The m a t r i x of this system is the first nonsingular leading principal s u b m a t r i x of the m o m e n t m a t r i x T - [#j_,]. The solution of this system defines a0,1(z) ~s , t
ao,l(z) - z a + ~ z L
2.
C H A P T E R 4.
204
ORTHOGONAL POLYNOMIALS
Algorithm 4.6: Toeplitz_Euclid Civen the y+(z) - E~~ # - k z k ~nd - y _ (z) - E~~ t, kz -k Set aoo - 1, ~oo - O, %+(z) - f § p~- - min{p _> 0 " ~l+p # O} p+ - min{p > 0 " # _ v # 0} Set ro - - t tp~-+1
To(Z)-
y(z),
poo-
0
Solve T,,,v+lao,1 - 0 , T,,,,,+I = [ t t j - i j i l~,,v+ = o j = o1
with u + 1 - Po,1 - Pl - P t + P l + 1 Set Co,1(z) - f(z) - ao, 1 (z) d i v 1 r t ( z ) -- f + ( z ) a o , l ( Z ) - c0,1(z) r l ( Z ) -- f - ( z ) a o , l ( Z ) - C0,1(Z) for k - 1 , 2 , . . . Define Pk+l and ~- by r k ( z ) -- r k ( Z ) Z-I-p-~+I if 1.d.t., ~- ~ 0 Define Pk+l + by + r k ( z ) - r v+ k ( z ) zPO,k+P++~ -F h . o . t . , rv+ k r
Define Ck+l - - ( ~ -
1)-1~ -
Set d~+ l(z) - -[zP++ 1+P; r~-_l (z)ck+l] div[z pk+l r~- (z)] m
It Set dk+l(z ) _ _zpk+~ [zpk++,-pk+~ +p~-+l rk_ 1 + (z)ck+l d i v rk+ (z)]
Set dk+l(Z) -- dg+l(Z ) + ZPk-+l+ldlk+l(Z ) Tk+l Tk-1 + +p~-+1 a0,k+l aO,k-1 z pk+~ Ck+l + a O , k dk+l + + r+k rk+l rk-1 co,k+l(Z) - f _ ( z ) a o , k + l ( z ) d i v 1 + Set pk+l - Pk+l + Pk-+l + 1 and po,k+l - po,k + pk+l endfor -
-
4.9.
TOEPLITZ
CASE
205
The corresponding n u m e r a t o r c0,1 can be found as the polynomial part of f _ (z)a0,1(z). It is C0,1(Z) -- . f _ ( z ) a o , l ( Z
) div 1 -
-2.
The corresponding residuals are now obtained as
~+(z)
~{-(z)
--
f+(z)ao,l(z)-
_-
_l_z~_ _lz,_ A z ~ _ ! z ~ 4
8
16
f_(z)ao,l(z)-
-
=
C0,1(Z) 32
c0,1(z)
2z_ 1 _ I Z - 3 + l2z _ s + Oz_ s . . .
This shows us i m m e d i a t e l y t h a t p~ - 0 and ~{- - 2. Hence P0,2 4. We can check the correct orders of approximation"
C.O,l(z)ao,l(Z) -1
1
=
- - z 2 ( z 3 -[- ~ z - -
=
_z-1
_-
~l z 2 +
-
-
P0,1 + 1
-
-
2) - 1
+ ~1 z - 3 - 2 z - 4 + . . .
z3+
1 z4 + " -
W h e n we compare this with the series f_ and f+ respectively, we see t h a t the difference starts deferring from zero with the t e r m in z -4 where 4 p0,1 + P2 + 1 and for the other one with the t e r m in z 3 where 3 - p0,1 + p+. We s t a r t now a regular iteration. Next we have to identify c2 as E2-
--('PO)-I'p1
--"
2 --1 = 2.
Then we c o m p u t e
d~2 -
- [ r o C l ] div[r{-]
=
- [ - 2 z -1 + z -a - l_z-S + . . . ] d i v [ 2 z -1 - - Z -~ + ...] 2
:
1.
In fact, the above calculation is unnecessary because we know this should be 1 because the polynomial has to be monic. The second part of the recursion polynomial d2 is found as
=
-[z 3+
----
4.
z 4 + ~l z s
+'"
i divi_ z3_ z4+ i
206
C H A P T E R 4.
ORTHOGONAL
POLYNOMIALS
Consequently the polynomial d2(z) is d2(z) - z + 4. We can now do an update.
ao,~(z)
=
++p;-+a ao,o zp' c2 + ao,l(z)d2(z) 1 1.z.2+(z ~+~z-2)(z+4)
=
z4+4z 3+~ lz2 +2z-8.
-
The residuals can be u p d a t e d by similar recursions"
~+(z)
~+(z)z.~++p?+~ ~ + ~+(z)d~(z)
=
~ - (z)
lz4 4
_
lzs
_
8
_lz6 +.... 16
~o- (z)z.~++p~-+l ~ + ~ (z)a~(z)
-
:
8Z - 1 _ 4Z - 3 +
2z - s + . . . .
This gives p3~ - 0 again and r2 - 8. Thus/90,3 P0,2 W 1 -- 5. We have again the predicted correspondence between the expansions of c0,2ao, ~. We have indeed t h a t -
c0,2(z)
:
-
f_ (z)a0,2(z) d i v 1 _ z 3 _ 4z 2.
The expansions give co,2(z)ao,2(z) -1
_
x + 1_z-3 33 -s -- - - Z ~2 4 l z 2 + iz3 + L z 4 + . . .
-
~
=
--Z--
~
"'"
32
We check just one more iteration. The constant c3 is ~
-
_(~-)-~
-
--- --4.
Now the higher degree part of the recurrence polynomial is d~
=
-[r{-c2] div[r~-] - [ - 8 z -1 + 4 z - 3 . . . . 1.
] div[8z - 1 _ 4z -3 + --.]
4.9. TOE, PLITZ CASE
207
Again, we know in advance that this should be 1 because the polynomial has to be monic. The lower degree part is again of degree zero and is given by d~
-
-[zr+c2] div[r +]
=-[z4+ --
zS+...]div[-~
z 4 -- i z S
....
]
8
4.
This delivers d3(z) - z + 4. Now we are ready for another update: a0,3(z)
--
aO,l(z)zP+3+P;+lc3 Avao,2(z)da(z)
=
(z 3 + ~ z1 - 2 ) z ( - 4 ) + ( z
=
zS+4z 4+~6z3+2z
4 + 4 z 3 + ~ 1 z2 + 2 z - 8 ) ( z +
4)
2+8z_32.
2
The corresponding numerator is c0,3(z) - - z 4 - 4z 3 - 16z 2. The expansions of the approximants are =
1 -3 1 s - - Z - 1 -~- ~ Z -- ~ Z --
_
_1z2 +
-
2
lz3 + lz4 +
32z -6 + . . .
...
etc.
4.9.8
Atomic
version
The previous algorithm was formulated in terms of R-blocks, which in a computational sense is the most natural thing to do. It computed explicitly the TOPs of the R-blocks, and not the inner polynomials of the blocks. It is of course possible to explicitly compute the inner polynomials as well and at the same time define the T-blocks. This would correspond to an atomic version of the previous algorithm. We do not give all the details, but we describe the decomposition of R-block number n. Since the T-block partitioning is a refinement of this R-block partinioning, this decomposition
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
208
should also give the T-block recursion. Therefore we let the index n refer to a T-block. Suppose we know the T O P s of two successive l~-blocks. This is what we need to make the recursion work. Suppose t h a t the second l~-block corresponds to T-block n u m b e r n. We will then denote the T O P as a0,n and the corresponding residuals as r + and r~,. Thus co,,,
=
f - a0,n d i v 1
rn
:
f-ao,n
- - Co,n
r I"L +
:
f+ao,n
- - Co,n
m
The T O P of the previous l~-block will also be the T O P of some previous T-block, which need not necessarily be T-block n u m b e r n - 1. Assume it is T-block n u m b e r 7i with fi. <_ n - 1. The T O P and corresponding residuals + We define the m a t r i x are therefore denoted as a0,,~ and r,~ and r,~.
G(n~
r~, r; 1
- Gn --
aO,h
ao,n
7"h+
r+
9
and c o m p u t e V
-(k+)1 _ ~(k-1)~/(k) "n-l-I "n-l-l~
k - 1 ~ ' " 9~ a , + l
"
The polynomials at position (2,2) of G(k+)1 are the inner polynomials of block n for k - 1 )" " "~ an+l - 1 and Gn+l - ~ ( ~ + ~ ) " For simplicity of notation, "-'nh-1 we drop the index n + 1 everywhere. The V - V,~+I m a t r i x for T-block n u m b e r n is now decomposed as V
-- V(1)V
(2)...V
(an+l).
First we check if a + - 0. This can be done by computing an inner p r o d u c t or by inspecting the a p p r o p r i a t e coefficient in r +. If a + - 0, we know t h a t the n t h T-block has size 1. We then check if a - - 0 again by inspecting an inner product or an a p p r o p r i a t e coefficient in r,~. If a - > 0, we know t h a t we can find the T O P of T-block n + 1 as a0,n+l = zao,r,. Thus V_V(1)_
[ 1 0
0] Z
"
We can replace n by n + 1 and restart the procedure.
4.9.
209
TOEPLITZ CASE
I f a + = a - = 0 o r i f a + > 0, the T-block is of size a = a + + a +1 and it is treated exactly as an l~-block, except t h a t the p's are replaced by a's. In these cases, the V matrix is decomposed as
[ ] [ 1 0
0 z
~
0 z
cz ~'~ 0
with V'-
V'(~
] [ [ V'
'(~+) -
--
z-(~-+l) 0
1
0
d'(z)
Z a++l
0 1
]
'
] [ V"
d'
z-(~-+l) 0
~
(Z) -- E
0 1
]
'z ~ ,
di
i=o
V'(i) _
0
:,+-i z
Vtt-Vtt(~
'
i- 0 ,.. . , a + ,
[ za-+lO
do,+ - 1,
dt'(Z)
d"(z)-
E'"' aiz
,
i=O
V"(i)-[ zO d~']l , i-O,.. ,. a-. These factors are grouped into V(0 factors, computing the inner polynomials as follows. First, for i = 1 , . . . , a - :
V(0-[
10 0]z
,
i=
l,...,a-
which are just shifts for the second column of G. The next step is to generate
This has two objectives. First it brings 5~ and the corresponding residuals to the first column of G and second, it eliminates the nonzero leading coefficient of the r - residuM. This leading coefficient indeed prevents ~,~ from being an orthogonal polynomial. The remaining a + - 1 inner polynomials for T-block n are obtained by multiplying with
V (a-+i) - V t(i-1), i - 2,..., (2+. The effect of these multiplications is that the previous inner polynomial is multiplied by z, causing an unwanted coefficient to appear in the corresponding r - residual, wich is immediately eliminated. This is easy since the ~,~ and its residuals are available in the left column of G.
210
CHAPTER 4. ORTHOGONAL POLYNOMIALS
This describes how to compute all the inner polynomials of block n. To make the transit to the next T-block, we have to multiply with
V(~,,+,) = V'(~ +) [ The first factor works as the previous ones, shifting the last inner polynomial of block n and eliminating the unwanted r - coefficient. The second factor restores a0,n and corresponding residuals from 5,, in the first column of G. The factors of V" are needed to eliminate one by one, the unwanted r + coefficients. This is obtained by subtracting multiples of shifted versions of the first column of G, so these shifts are also built up in this first column. Thus finally by the last factor, we restore a0,n in the first column once again. 4.9.9
Block bidiagonal
matrices
As we have promised earlier, we now come back to an alternative for the matrix relation (4.32) expressing the three term recurrence with the help of an Hessenberg matrix Tn. Instead of the less simple Hessenberg matrix Tn, we shah derive another matrix expression describing the recurrence involving much simpler matrices. The reason that Tn is filled up considerably, even though we have a recursion which is of a three term type, is basically because some a,, polynomials depend upon some tilded one(s), i.e., upon NOPs of the current or a previous block and these are not elements from the sequence ak, k < v. Since Tn acts upon An which contains only the computed orthogonal polynomials, we should express these NOPs in terms of the ak which involves several of the ak's and this generates the nonzero entries in Tn. However, these NOPs are of the form z times a polynomial which is one of the preceding ak. In the general case, the Frobenius m a t r i x F(ao,n+l) in the left-hand side of (4.32) precisely generated these shifted polynomials. This suggests t h a t we should bring the term involving the tilded polynomials to the left-hand side. This is what we shall do now for the different updating possibilities' of Theorem 4.31. Since we need several v and P wlues, we give them an index: We set vk + 1 -- P0,k and v k - 1 -- vk + Pk+l + 1. From the updating given in (4.24) we get
za,,,.,+,+l(z)- a~,,.,+,+2(z)for
i-
0 , . . - , P n + , - 1.
(4.41)
From update (4.25) for j - 0, we thus get +
-
(4.42)
4.9. T O E P L I T Z CASE
211
(recall that
d'n+l, 0 - 1 and ~ n ( Z ) - z p~+~-bla0,n(z)). I I I For j > 0 in (4.25), we use dn+ld(Z ) - zdn+lj_ l(z) + dn+15(0 ) as we found before. So we can rewrite (4.25) as
afir~q_j(Z ) -- Z (zJ-lr
Af_ ~l,n(z)dn+l,j_l(Z) ) -JF Cl,n(z)dn+l,j(O).
The expression between the big brackets is a ~ + i - l ( Z ) . Using the definition of ?~n(z), we can bring this in the form
z (a~,,.,+j-1 ( z ) + zP-~+lao,n(z)dn+l,j(O)) - ac,,.,+j(z).
(4.43)
Finally from the recursion (4.26), using (4.40), we get
+ zp..+l~l,n_l(Z)Cn+ 1 _~_ao,n(Z ) ( zp~+ 1 +1 dn+l! (z) + d ni+t ! (z) )
ao,n+l(z)
(+
z z~
'
an-l(z)cn+l + an(z)dn+ 1,P,~+I + -1
(0) + -
(z))
(z)
z (av,,+p++l_ 1 (Z) AV aDn_ 1 (z)dtn+l (0)) -~ ao,n(z)d~+ 1(z)
Which is equivalent with
Z
(av,~+1(Z)
-[- apn_l(z)dtn+l(O))
- ao,n+l(Z) - ao,n(z)d~+l(Z).
(4.44)
Collecting the relations (4.41-4.44)into one matrix relation results in something of the form F(ao,,.,+l )Ar, U,., - A,.,Sn (4.45) where F(a0,n+l) is still the Frobenius matrix for the (monic) polynomial a0,n+l of degree p0,n+l and An is the unit upper triangular matrix whose columns are the stacking vectors of the polynomials ak for k - 0 , . . . , p0,n+l1. The unit upper Hessenberg matrix Tn from (4.32) is written here as SnU(_, 1 where now Sn is a simplified unit upper Hessenberg and Un is unit upper triangular. Puzzling the pieces together we find that U,, is block bi-diagonal V 00
V 01 U 11
r n
m
U12 9
, 9
,
n-l,n
C H A P T E R 4. O R T H O G O N A L
212
POLYNOMIALS
with U~ - l ' n a typical off diagonal block of size pn x p . + l having the form
Pn+l
U~ - 1 ' " -
1
P n++ l
p;,
o
o
o
1
0
c.+1
0
p+~
o
o
o
]
]
and Unn, a typical diagonal block of size pn+l x p , + l of the form p~,+~
1
I 0
0 1
0
0
P~+l Vnnn - 1 +
p.+~
P n++ l
0 -| [ i d_n+ ' 1 ]T .
J
I
The middle row contains the reversed stacking vector of the polynomial d~+l(Z ). As for the matrix Sn, this is also block bi-diagonal
so0 s~~ s 2 9
o 9
, o
Snn,n 1
S,,nn
with a typical block S~" of size p.+x x p . + l of the form
S n~ -
P n- + l
P n+ + l
1
0
0
Pn+l
I
0
0
I
+
p.+~
1
_ I,
dn+l 0
1
]
which is in fact the Frobenius matrix for the polynomial z p"+I + d~+l(Z ). The subdiagonal blocks are zero everywhere except for the right top element which is 1. From (4.45) we can now derive the following determinant formula for
~0,.+~(z)" ao,n+ 1 ( Z )
--
d e t ( z I - F(ao,,.,+l))
=
d e t ( z I - A,.,S,.,U(_, x A~ 1)
=
d e t ( z I - SnU(, 1)
=
det(zU,., - S,.,)
because Un and An are unit upper triangular. The latter relation gives a determinant expression in terms of the simple matrices Un and Sn rather than the general relation a 0 , n + l ( Z ) - d e t ( z I - S,.,U(_,1) in which S,.,U(_,1 is a unit upper Hessenberg matrix with many nonzero elements in general.
4.9.
TOEPLITZ
4.9.10
213
CASE
Inversion
formulas
To find inversion formulas for a Toeplitz matrix, we need a so called fund a m e n t a l system. This system is defined as follows. Suppose T~,~ is an invertible Toeplitz m a t r i x , then the system
T~,~+I
P0
q0
0
1
o
9
0
0
9 P~ 1
9 q~ 0
.
.
-
(4.46)
00
will have a unique solution. The couple of vectors (q, q) with q given by q - [qo "'" qL, 0] T and p - [po "'" pv 1] T, is called a f u n d a m e n t a l system. Obviously, for v + 1 = ~n, the vector p corresponds to the stacking vector of the T O P a0,n. The vector q is not explicit in our row recurrence scheme. On condition t h a t a + > 0, thus t h a t the T-block has size larger t h a n 1, we find t h a t the N O P 5n-1 is of degree nn-1 + a ~ + 1 which is less t h a n the degree of a0,n which is n,, - '~n-1 + a~, + a + + 1, while its stacking vector satisfies an equation like q with the 1 in the right-hand side replaced by ~,-1 ~ 0. Thus in this case q - ~,,_1(~,_1) -1. If however a + - 0, then an-1 has the same degree as a0,n and we do not get the zero at the position v + 1 of vector q. We could in t h a t case set [qr
0it
_
_
whereby the offending leading coefficient is eliminated without changing the right-hand side. This however is not so nice to work with. This is because we only used the right polynomials. At this point, it is b e t t e r to introduce the left polynomials as well since the vector q appears more naturally in terms of a reversed left polynomial as we shall see. For the dual results, we use the following notation. The analog of (4.34,4.35) is g+(z) --g_(z)
Note t h a t g ( z ) a p p e a r in
g+(z)-
-
#o + # l z + # 2 z 2 + . . .
(4.47)
--
~_1 z-1 + ~_2 z-2 + ' ' ".
(4.48)
g_(z) -
f(z-1).
The corresponding residuals
with s~ (z)
-
~, z - l - ~ + ~ + lower degree terms
s+ (z)
-
~+ z"+l+~++ ~ + higher order terms
214
CHAPTER
4.
ORTHOGONAL
POLYNOMIALS
with nonzero $~, and $+. Using Theorem 4.25 and the definition of the Iohvidov indices fl+ , it follows that q'(z) - z ~+ b0,,,-1(z) # is right orthogonal to all z k for k - 1 , . . . , u - ~,~- 1 but not for k - 0 since (1, q'(z)) - sn_ ~+ x O. Recall that the size of the T-blocks is not left-right sensitive, and thus { an-~0,nandan-1 a n - a + +a~, + l - f l n - / 3 + + fl~ + l
ifa +-f~+-0 otherwise.
Thus the previous orthogonality property means that the polynomial q = ( sn_ ~+ 1)- ~q~ of degree at most a,~-I + f~+ < an has a stacking vector which satisfies the defining equations of the vector q. This choice is always valid, independent of the size of the T-block. Of course it should be the same as before, since the system has a unique solution. The left and right polynomials can be computed independently from each other by the algorithms we have seen. There are however combined computational schemes, which are generalizations of the Levinson and Szeg6 algorithms. These will be given in Section 4.9.11. Since the fundamental polynomial p ( z ) - zTp -- ao,n(z) is a TOP polynomial, we shall denote the corresponding residuals r~ temporarily as r~. Recall r ; (z)
-
f _ ( z ) a o , , ( z ) - co,,(z),
r+(z)
-
f+(z)ao,n(z) - Co,n(z),
deg r~-(z) - - a ~ + 1 - 1 + o r d r + ( z ) - an + an+x
and that f ( z ) p ( z ) - r + ( z ) - rp(Z). For the other fundamental polynomial q(z), we introduce similar notations. We set ?'q- (Z)
--
_ ('~d-Sn_l)-18n_d- 1 ( z - l ) z'~'-~+~+ ,
deg r~-(z) - 0
Tq+ (Z)
__
__ 8n_l)("d- -1 8 n _ l (Z -1 )Z ~;'~-1 +fl: ,
oral ?'q
-
To motivate this choice, note that
g ( z ) b o , n _ l ( Z ) - 8n_ § 1 (Z) -- Sn_ I(z) -- f ( z -1 )b0,n-1 (z) f(z)b-o,n-l(Z - 1 ) - 8 dn _ i ( Z - 1 ) - - 8 n -_ l ( Z - 1 ) f ( z ) b ~ n _ l ( Z ) - Z~n-l[,~n+ i(Z--1 ) -- 8 n _ l ( Z - - 1 ) ] . Multiply the last relation with z fl~+ (Sn+_l)-1, t h e n the above definitions lead to f ( z ) q ( z ) - r + ( z ) - r~-(z). L e m m a 4.35 With our definitions, just introduced, we have r+(z)p(z)-
r+(z)q(z)-
z '~" - r ; ( z ) p ( z ) -
r;(z)q(z).
4.9. T O E P L I T Z CASE
215
P r o o f . The relations f ( z ) q ( z ) lead to
0
1
r+(z)-r;(z)and f(z)p(z)-
q(z)
p(z)
q(z)
r+ (z)-r~- (z)
p(z)
"
The determinant of the first matrix is 1. Checking the orders of the residual series r + and r + it follows that the determinant of the second matrix has an order at least '~n. Similarly, checking the degrees of the series rp and r~and the polynomials p and q, it follows that the determinant of the third matrix has a degree at most '~n. More precisely, it is of the form z ~ + lower degree terms. Combining these results, the only possibihty is that the determinants of the second and third matrices are equal to z ~ . This proves the result. [:3 To express r~- and r~- in terms of the vectors p and q, and the given Toeplitz matrix, we use the following notation. q0
P0
Pl
ql
P0
qo
~ ~
Lp
P0 Pv
q~,
Lq
q~
Pl 9
q0 ql
(4.49)
9 , ,
0
,
"'.
: qv
Pls
1
0
The invertible Toeplitz matrix T - Tn-i - T~,~ has size u -{- 1 - ,~n. If we extend it with a square block T s of the same size so that
then, by definition of the residual series rp- and r~-, the products Up]
_
R~-and
[
IT']
[ Lq
]-R;
(4.50)
deliver both upper triangular Toeplitz matrices containing the initial coefficients of these series. R~-
9 "rO,p
with
r ; ( z ) - ~ - ~_ r_~ . p_ z k-O
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
216
and similarly for R~-. Before we prove the inversion formula, we need one last lemma. L e m m a 4.36 With the notation just introduced, R g U, - R ;
-
P r o o f . We recall that the multiplication of series in z -1 is equivalent with the multiplication of upper triangular Toeplitz matrices. For f, g 6 F[[z -1]], and denoting by Th for h 6 F[[z-~]], an upper triangular Toeplitz matrix with symbol h, then TfTg - Tfg. When f, g 6 F(z -1 ), the Woeplitz matrices are not exactly upper triangular. They are however upper triangular up to a certain shift and a similar result holds. Applying this to the relation r;(z)q(z)-
z
and cutting out matrices of appropriate sizes gives the desired relation.
Now we can give the inversion formulas for Toeplitz matrices which is due to Heinig [134], see also [144]. T h e o r e m 4.37 ( T o e p l i t z i n v e r s i o n f o r m u l a s ) Let p(z), q(z) be a fundamental pair of polynomials for the invertible Toeplitz matrix Tn-1 = T~,,~,, u T 1 = nn. This means that their stacking vectors satisfy the equations (~.46). Furthermore, define the triangular Toeplitz matrices Lp, Up, iq, Uq as in (~.49). Then
T(,~_~ - L q U p - LpUq - U p L q - UqLp. P r o o f . Using (4.51)and (4.50), we find
I-
T(LqUp - LpUq) + T'(UqUp - UpUq).
Since both Up and Uq are upper triangular Toeplitz matrices, they commute and therefore the last term in the right-hand side will vanish. The first formula then follows immediately. For the second formula, we can make use of the persymmetry of Toeplitz matrices i.e., .f T .~ - T T for any Toeplitz matrix T. When applied to the first inversion formula, we get the second one. [:] As an immediate consequence of the inversion formula, we find a Christoffel-Darboux type formula for block orthogonal polynomials with respect to an arbitrary Toeplitz matrix.
4.9.
TOEPLITZ CASE
217
T h e o r e m 4.38 ( C h r i s t o f f e l - D a r b o u x f o r m u l a ) Let Tn-1 = T~,~ with v + i - an be an invertible Toeplitz matrix. Set Kn-1 - T(~T , then its generator is a reproducing kernel for the polynomials of degree at most v. Let ak and bk be the blocks of monic block orthogonal polynomials associated with T. These reduce Tn-1 to the block diagonal matrix Dn-1 by a decomposition of the form B,~H1Tn_IAn_I - Dn-1. If (p, q) is a fundamental pair .for Tn-1, then p(z) - ao,n(z) and q(z) have
-($+_l)-lzf3+b o,,~# 1 ( z ) and we
n-1
bk(y)D-k, Tak(m) T k-O
q(~lp#(y) q#(y)p(~) -
i-
xy
P r o o f . With the notations of the previous theorem, we get
[
Lq
Uq Up - Up Uq
-
0
-
Uq L p - Up L q
01
T,~11
"
The last equality follows from the Toeplitz inversion formulas and the commutativity of upper (and lower) triangular Toeplitz matrices. Multiplying from the left with xT and from the right with y gives (1 - (xy) ~+1) K n _ l ( X , y) - [q(x)p#(y) - p(x)q#(y)](1 + x y + . . . + (xy) v) where (p(z), q ( z ) ) i s a fundamental pair for Tn_ 1 and the reciprocal is taken for degree v -{- 1 for both of them. Since (1 - z)(1 + z + . - . + z ~) - 1 - z ~+1 , the result follows. [:] A direct expression in terms of the orthogonai polynomials is not so nice. Substituting for the fundamental polynomials in terms of the orthogonal polynomials, we get
Kn_l(x,y) -
~"-+ b0,._~ (y)~o,~(~) - ~'-+ b0,n-l(x) # ~#o,n(Y) v+
sn_ i (I -
x~)
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
218 if a + > 0 and
Kn_l(x,y)
=
~bo,._~(y)~o,~(~)- b#o,._~(~) ~ ( y ) v+ Sn_ 1(1 - x y )
if a + - fl+ - O. These Christoffel-Darboux formulas are not s y m m e t r i c in their left/right aspects, although the defining formula for Kn-1 is. This is because our analysis was based on the f u n d a m e n t a l pair, which should in fact be called a right f u n d a m e n t a l pair, since there is a complete s y m m e t r i c definition for a left f u n d a m e n t a l pair. A left f u n d a m e n t a l pair (/5, ~) would be defined as
[
i5o ~o
"'" "'"
i5, ~,~
]
1 0
T,~+I,,~=
[00 0] 1
0
...
0
or equivalently
<
1
0
0
0
P~
0~
9
.
:
:
-
0
15o qo
=
,
1
In this case ~5(z) - bo,,(z) and ~(z) - the following symmetrical formula
g,,_~(~,y)
T < - [~j-i-1]"
o o
('~+7'n-1)-1Z~ a---~0,n-1(z). This suggests
+a#O,n-1 (Y)bo~,n( x ) - T'?n ao, n-1 (x)bo,n(Y) ,~+ 'rn_l(1
-- Xy )
with -y,~ - max{ 1, ~+ }. It follows immediately from the form of the Christoffel-Darboux formula that Tn-T (x, y) -- g n _ l ( X , y) - - H ( y ) H ~ G ( x ) , i-
my
with q(z
'
and
H(y)-
P#(Y) q#(y) ] .
Thus Corollary
4 . 3 9 The inverse of a Toeplitz matrix is a quasi-Toeplitz matrix.
4.9.
T O E P L I T Z CASE
219
We conclude this section with a block form of the Christoffel-Darboux formula. First we have to introduce the reciprocal for a block of polynomials. For a single polynomial we had
~(~) - [ 1 ~ ... ~ao~~
then
~#(~)-
~" i [1 z ... zd~176T.
Now let a(z) - [al(z), a 2 ( z ) , . . . , an(z)] - [1 z ... zd~S']A be a vector of polynomials with v - deg a(z) dr max{0i -- deg ai" i - 1 , . . . , n} then we define a#(z)
-
A g ] [l z ... zdr
T
=
[z "-~ a#~ (z), z v-~ a2# ( z ) , . . . , z v-~ a~ (z)].
Now we can formulate the theorem. Theorem
Suppose T is a Toeplitz biorthogonal polynomials {bk,ak}. Let (bi, aj} = is an invertible submatriz of T, corresponding to Let Kn-1 - T~,T and let K,,-1 (x, y) - yHK,,_lX be Then
4.40 (block C h r i s t o f f e l - D a r b o u x )
matriz with blocks of ~ijDij. Suppose Tn_l the blocks 0 , . . . , n - 1. the reproducing kernel.
b~(:r.)D;,Ta~(y) H - an(x)D,~,,~bn (y)-I K , , _ ~ ( x , y) -
H
1 - (xy)~',,+ ~
"
P r o o f . This is based on the formula T,~ 1 - A n D ~ I B H, so t h a t
g . ( ~ , y)
-
y'Tjrx
- xrTj~y
=
g n _ ~ ( x , y ) + a n ( x ) Dn,nbn(y) -1 H.
(4.52)
On the other hand, using the p e r s y m m e t r y :
K,,(x, y)
-
yH ] T~I ] x _ yH I A,,D~ 1B H ] x n
-
# _~n-lb#
)W
k=O
-
T (z~) '~'+' K,,_, (z, y) + an#( y)D,,,,,b,,(m) -' #
=
(my) ~"+~ K , _ I ( m , y) + b~(m)D,,--.Ta~(y) g.
(4.53)
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
220
Subtracting (4.53)from (4.52) yields (1 -- (x~) ~'~+~) K n - l ( X , y) - b#~ (z)D~,Ta~(y) H - a,,,(z)D~lbn(y) H, [::]
which is the desired formula.
Also this formula reduces to a well known formula in the case where all blocks are of size 1. Of course one can imagine that there is a corresponding inversion formula for Tn-1 in terms of block triangular Toeplitz matrices. The formula is less interesting though because in general it requires the whole blocks an and bn and Dn,n which are only known if Tn+l is known. Thus we need much more information than really needed to invert Tn-1. 4.9.11
SzegS-Levinson
algorithm
The concept of para-conjugate, of reciprocal polynomial and the corresponding orthogonality relations lay at the base of a generalization of the Szeg6Levinson recursion. A complete account on the algebraic aspects of this type of orthogonal polynomials, which are formal generalizations of the Szeg5 polynomials, can be found in Baxter [10] and more recently in Bultheel [41] at least for the nondegenerate case. See also [120]. For the degenerate case, we describe some generalizations in the rest of this section. Such generalizations have been found before e.g., in [82, 116, 134, 144, 199, 203,218, 224]. We shall follow the approach of Freund and Zha [98] who use the terminology of formal orthogonal polynomials. For the formulation of the following results, it will be convenient to subdivide the vector z - [1, z, z2,...]T into blocks conformally With the blocks of orthogonal polynomials. So we set z - [1, z, z2,...IT _ [z~0z0T,z~lzT ... ]T with Zk
--
[1, Z, Z 2 , . . . ,
Zak+t-1] T,
k - 0, 1, .
.
.
.
(4 54)
Note that in the normal case, when all blocks have size 1, the zk - 1 and Z~k -- Z k.
We can now formulate the next theorem. T h e o r e m 4.41 Let the moment matrix M - T be Toeplitz and let an and bn be the nth blocks of right and left biorthogonal polynomials. Let zk be as defined above. Then the matrices
4.9.
TOEPLITZ
221
CASE
are invertible and
<.~ b.*(z/) - * while for k = a n + l , . . . ,
~,~+1
<4(zl,
-o
- O.
and
Proof. Clearly T'~A'~- R'~- [ Rn- Rnn,O
].
(4.55)
Because Tn and An are nonsingular (An is unit upper triangular), 0 r det Tn = det R n = det R n - 1 det Rn,n. Thus Rn,n is nonsingular. The proof for Sn,n is similar. The last block column of (4.55) reads Tn A.,n -
[ 0] Rn,n
implying that ] R,,,n0 ] and since i T n i
-T T,weget A T.,. ] T,,, - [RT,,,, /
Io...o]
which implies that
- {0, ~
RI, I,I•
i,
k-0,1,...,n-1 k :
g/,,
The proof for the left polynomials is by duality. We are now ready to derive the general Levinson type algorithm. The idea is that the block orthogonal polynomials are obtained by shifting the previous polynomial and then modify it to get the required orthogonality. So let us start with zao,n and see which orthogonality condition is missing. Any polynomial of the right block n, thus also the TOP, is right orthogonal to all the previous blocks. Thus \
I z k ,ao,n(Z) ) -- O, I
k-
O, 1, .. ., ~ , - 1.
CHAPTER 4. ORTHOGONAL POLYNOMIALS
222 Hence
, z~0,.(~))
-
-
-
:,
but in general
t~ doj (:, za0,.(z)) need not be zero. Thus
(-.._:, z~0,.(z))
- [t~ 0 ... 0] r
while we know that
(Zn-1, bn~-I (z)) - I Sn-l,n-1 " Therefore zao,n(z)+b#n_:(z)pln will be right orthogonal to all previous blocks if the vector p~ makes
(zn_:,zao,,,,(z) +
'k
bL:(z)p~/
-~- i S---n_l,n_lp 1 o
zero. Thus p~ should be chosen as
~,~ -- -Sn_l,n_ --: 1[o
... o t~] T
This procedure can be repeated for all the inner polynomials of block n: Set v-,r
(z k, zav_l(Z)) -- ( tin'O, k - O k-
1,2,...,an
and thus
zav_:(z)-Jr b L l ( z ) p / will be right orthogonal to all the previous blocks 0 , . . . , n the vector pin to be
P/n--
--
~-1
n - l , n - 1
[0 ... 0 ti]T
~
1 if we choose
tni _ (l za~_l(Z)> ~
9
This allows to compute all the inner polynomials of the right block n. They are not unique and we can for example add to a~,(z) in block n any linear combination of preceding polynomials in block n. Thus we may add an
4.9. TOE, PLITZ CASE
223
element of the form an(z)q~ where q/n is a vector containing i - 1 arbitrary constants, followed by zeros. The T O P of block n + 1 is uniquely defined. We apply the same procedure as before by shifting a~_l, u = ~n+a and making it first orthogonal to all the blocks 0, 1 , . . . , n - 1. Thus
za,,_l (z) + b,,#l (z)p~ is made right orthogonal to blocks 0 , . . . , n -
pO+, - - - S- -n, _ l . n _ 1 [o . . . o t,~-+ ,]r ,
1 by taking
r
--<1 , z a u _ l ( Z ) > .
Note that now we could as well have used b~#(z) since this block has just been completed. In that case, one has
za~,_l(z) + b~#(z) Pn+l 0 with P no+ l
:
- -S-n', n
[0
"""
0 t.~-+~ ]r
t.~"+~ =
Suppose we take the second choice. It now remains to make the result also right orthogonal to block n. Therefore we add a linear combination of the polynomials an(z). All of these polynomials in block n are right orthogonal to previous blocks and adding such a combination will not destroy the orthogonality to block 0, 1 , . . . , n - 1 that we have obtained already. So the question is how to choose the vector 0 qn+l C ]W~'+~ such that ao,,,+l(Z)
-
za,._l
(z)
+ b.#(
o o Z)Pn+l + an(Z)qn+l
is right orthogonal to block n as well. Note that the second term is right orthogonal to z k for k - 1 , . . . , ~ n + a since b~#(z) is right orthogonal to z k for k - a n + l , . . . , ~n+a by Theorem 4.41 while the vector p~ is chosen such that b~#(z) Pn+l 0 eliminates the element (1, za~,_l(z)), thus in the vector zn, b~#(z
P,,+I, only the first element can be nonzero. Thus
_ + < z ~ . z . , ~ . ( z ) > qn+l o where
(z~"z"' ~ " - l ( Z ) > -
[o] t.
'
= R . , . .
224
C H A P T E R 4.
ORTHOGONAL
POLYNOMIALS
The vector tn C IF~''+~ -1 appears in the last column of R,~,,~"
[XX'nix Thus right orthogonality to block n is obtained by setting
o
,[0]tn
- -R~,,~ZR,,,,,e~,+~_I.
q n + l - - -- R n , n
A similar derivation can be obtained for the left orthogonal polynomials. The results are resumed in the following theorem for further reference. T h e o r e m 4.42 Let T be a Toeplitz matrix and (bk, ak) the blocks of biorthogonal polynomials. Define zk as above in (~.5~) and
Rn'n-[ XX tnIx ' t,~ E~a"+~-1)xl
SH
[ X X]
n,n-
SH
X
~
Sn ~
]~c~-+l-1)xl
and for ~ , - an + i, 1 < i < an+l - 1
-
tin
i and- sn.
Then
~,(z)
b,(z)
z a ~ _ l ( z ) + b~_l(z)p/,, + a,,(z)q~
-
-
z b ~ _ l ( z ) + a~ l(z)v~ + b,,(z)w/,,
with
r
-- ~-~ n--l,n-1
[ 0 . . . 0 tn]T~ ~
q ~ - [y ..~.~ 0 . . . 0]r,
V 'n - - R n - l-,-n~- 1
[ 0 . . . 0 4] r
~ . - ' [? ... ~ 0 . . . 0]r,
i-1
i-1
where ? represent arbitrary elements. The T O P s of the next blocks are obtained for v - an+l as
~o,.+~(z)
__
za~_l(Z)+
bo,n+l(z)
--
z b v - l ( Z ) + a~(z)v~
bn#( z)Pn+l o
o + an(z)qn+l + bn(z)w~
4.9. TOEPLITZ CASE
225
where
o
1[ t~,~+:0 ] , Vn+l o
o
1[0]
Pn+I -- -- S n , n
qn + l -- -- R n , n
tn
'
1[ &~,~+: 0 ],
-- -- R n , n
o
110]
Wn +1 -- -- Sn,n
Sn
"
This is a generalization of the classical Levinson algorithm for arbitrary Toeplitz matrices. Originally, the Levinson algorithm was derived in the context of prediction filters [180] by Levinson. It was then realized t h a t it was equivalent with the recurrence relation of Szeg6 for orthogonal polynomials on the unit circle [229.]. In the special case where all ak = 1, we only have updating steps of the last kind since there are only TOPs. We then have an+ I (Z) -- z a n ( z ) Jr b ~ ( Z ) P no + l ,
o Pn+l --
b n T l ( Z ) _ z b n ( z ) jr a~n ( z ) v o + l ,
V no+ 1
_
--
-1
which is the recurrence found by Baxter [10]. See also [41] for an extensive survey of the algebraic aspects in the normal case. If moreover T is Hero mitian, then an - bn and thus Pn+: - %0+ : , which are then the classical reflection coefficients or Szeg5 parameters. To link this up with the previous derivations, note that the second t e r m in the recurrence relation for an inner polynomial a,, is b n#- l P ni .
# . -S-n:. - : , n. - 1 [0 . bn_: .
.
0 1]Tt~
i b n#- 1 S- - n- - -1 1,n-- 1 e a r - 1 tn.
Now T),).-1 .it B-.,n-1 -
[S#_l,n_l
.[
0
...
0]
and hence T n - 1 I - B . , n - l ~ -nI _ l , n _ l e c ~ , ~ _ l
--[l
0 "'" 0] T
and this means that q n _ : ( z ) - b_# ,,- :(z]Si-11,n_1e==_: , , ,,is the q-polynomial in the fundamental pair for T , _ : . Thus b n ~ _ l ( Z ) ~-n- _: l , n _
l eoLn-1 -- q n - 1 (Z) -- -- z
bO # ,n-l(Z) /<-
19
The other fundamental polynomial for Tn-1 is of course P n - 1 (Z) --- ao,n(Z ). Let v = '~n + i, then it was shown above t h a t a~ is a polynomial linear combination of p n - : ( z ) = a0,n(Z)and qn_:(z), the fundamental polynomials for Tn_:. More precisely, for i = 1 , . . . , an+: - 1
a,,(z)
- z t p n _ 1 ( z ) -~
q,,_: (z)dn,i(z)
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
226 where
dn,i(z)-
t l z i-1 + t 2 z i - 2 + . . .
+ t TL" i
Denoting as earlier the minus residuals for Pn-1 and qn-1 as OO
r p- ( z ) -
OO
- ~ r i , p-z - '
,
and
r~-(z)- - ~
i=0
~ -, , ~ z-i,
(%
_
1)
i=0
then the coefficients tin solve the system TO,q Ooo
~
~
~'o-:~
~
~
t~
~'l,p
Since ri,~ - 0 for i - 0 , . . . , a n , also tin -- 0 and consequently also dn,i(z) - 0 for i - 0 , . . . , a ~ . Moreover d=,i(z) is given by z'r~- (z) d i v r~- (z).
4.10
Formal
orthogonality
on
an
algebraic
curve
So far we have considered a Hankel or a Toeplitz moment matrix. These are the most intensively studied cases. The reason is probably because they are direct formal generalizations of the polynomials orthogonal on the real line and on the unit circle. However, other special cases were considered recently. For example, Brezinski gives a survey of formal orthogonality on an algebraic curve [27]. The idea is that the equation of the curve implies certain relations for the moments so that the moment matrix gets a specific structure which can be used to derive appropriate recurrence relations. We give a short introduction to this idea. Let [#/j] be a moment matrix for the bilinear form (-,-). Suppose for simplicity that it is nondegenerate, i.e., suppose that all its leading principal submatrices are nonsingular. Then, if we define the polynomials /~o,o
Pk(z) - Pk det
9
9
/zO,k
9
9
#k-l,O
1
9
"'"
#k-l,k
...
zk
with pk an arbitrary nonzero constant, we have
(z ~ P~(z)) - O
~
i-
O 1 ~
~~
oo~
k-1
9
4.10. F O R M A L O R T H O G O N A L I T Y ON A N A L G E B R A I C CURVE 227 This is an obvious observation for the reader who is familiar with formal orthogonal polynomials. Those who are not can convince themselves by the observation that the coefficients pf Pk should be proportional to the last column of M~-1 which follows from the orthogonality condition for Pk. When this last column is computed by Cramer's rule, then the same expression is obtained as for the Laplace expansion of the previous determinant along its last row. A similar argument can be used to see that by defining the polynomials #o,o
999
#k,o
:
Qk(z) - qk det
9
.
#o,k-~ 1
9 9 9
,uk,k-~
zk
. . .
with qk an arbitrary nonzero constant, we have Qk, z i / - 0 ,
i-O, 1,...,k-1.
Therefore, (Qk, Pt)=Ck~k~,
Ck~0,
k,l=0,1,...
which expresses the biorthogonality of these polynomials with respect to (.,.). Note that ifpk - qk for all k and #k,t - #t,k for all k and l, then Pk(z) = Qk(z) for all k. Define now a linear form L on F[z, ~] by L(~z j)-#iS,
i,j-0,1,...
The relation with the bilinear form (., .) is that
(Q(z), P(z)) = L(Q(z)P(z)). The linear form L is more general than the bilinear form (., .) because we can apply L to any element in F[z, ~] i.e., any polynomial in the two variables z and ~ which is in general of the form ~k,t ak,t zkz--t. On the other hand, (., .) will always lead to the application of L to a separable polynomial of the form Q(z)P(z) with P and Q polynomials. We can now define orthogonality on an algebraic variety. Definition
4.19
(orthogonality
on an algebraic
algebraic variety defined by n
j -0, k,j=O
c F.
variety)
Let F be an
228
CHAPTER
4.
ORTHOGONAL
POLYNOMIALS
T h e n if the m o m e n t s satisfy n
E
ak,JtzJ+m, k+i - O,
m , i - O, 1 , . . .
k,j=O
the polynomials Pk are said to be orthogonal with respect to L on F.
Consider for example F - C and let ~ be the complex conjugate of z C C, then a variety of complex dimension 1 is called an algebraic curve. For example if f ( x , y) - 0 describes an algebraic curve F, then setting x -
z +-2
2
z--5 y - ~ , 2i
and
we see that f ( x , y ) - 0 is of the form (4.56) with ak,t - -dt,k. In general, there are several possibilities. The description of the algebraic variety leads to either one equation, or to two equations. In the first case, the variety is called indeterminate: it contains infinitely many points. This happens for example if aik - - ~ - d k i . Then we get one equation f ( x , y) - 0 with z - x + iy. The variety is an algebraic curve. When it gives rise to two different equations, there is only a finite number of points in the intersection. The algebraic variety is called regular. When the intersection is void, the variety is called inconsistent. Polynomials orthogonal on algebraic curves appear naturally in the context of root location problems. See [131, 168]. In section 6.7 we shall discuss in detail the root location problem for the imaginary axis (Routh-Hurwitz test) and for the unit circle (Schur-Cohn-Jury test). Several algebraic curves were studied. We refer to [27] for several cases and many more references. Here we restrict ourselves to the introduction of the most classical cases where we have orthogonality on the real line or on the unit circle. The real line is characterized by y - 0, thus z - ~ - 0. When we multiply this with zi~ " and apply L, we get # m , i + l -- I~m+l,i
which means that M - [#i5] is a Hankel matrix. The unit circle is given by x 2 + y 2 _ 1 or z ~ with z i ~ 'n and applying L gives ~m+l,i+l
--
1. Again multiplying
~m,i
which means that M - [/z,,j] is a Toeplitz matrix. Note however that it is not Hermitian in general.
4.10.
FORMAL ORTHOGONALITY
O N A N A L G E B R A I C C U R V E 229
Polynomials orthogonal on the union of the real line and the unit circle are orthogonal with respect to r , described by ( z - g ) ( z 2 - 1 ) = 0. In this case the moment matrix is the sum of a Hankel and a Toeplitz matrix. Finally, we have a look at the vector orthogonal polynomials of dimension d as we discussed them in Section 4.6. In this case, the moments satisfy (possibly after rearrangement) [s
1 -- ~i+d,j
for some fixed d. Obviously, d - 1 corresponds to the Hankel case and d - - 1 to the Toeplitz case. These polynomials are orthogonal with respect to the variety I' described by z - z -d. (4.57) For d - 1, this is the real line and for d - - 1 , this is the unit circle. For d 7~ +1, we have a regular variety which consists of the finite number of points which are on the intersection of the curves characterized by (4.57). Clearly, z - 0 is a solution if d > 1. If z 7~ 0, we set z - re {~ which gives for d 5r 1 that Td-1
_
ei(d+l)O.
Thus, r - 1. Therefore, besides the origin, the other points in the variety are given by the points e iO where the 0 are solutions of e iO - - e - i d O
or
e i(d+l)O =
1.
That is, these 0 are given by 2k
O- ~~r,
k-O,l,...,d
d+l
ford>
1 and by
0for d < - 1 . points"
2k
~~', d+l
For example, for d -
(0, 0), (1, 0), and for d -
k-O,-1,...,d+2 2, the algebraic variety consists of 4
(-1/2, vf/2)
- 2 , there is only the one point (1, 0).
Chapter 5
Padd approximation The history of continued fractions, and associated with it, the problem of Pad~ approximation is one of the oldest in the history of mathematics. One can read about the history in Brezinski's book [25]. There are very early predecessors, but the study was really started in the 18th century and came to maturity in the 19th century. The serious work started with Cauchy in his famous Cours d'analyse [50] and Jacobi [157] and was continued by Frobenius [100] and Pad~ [198]. A current standard work is the book by G. Baker Jr. and P. Graves-Morris [7]. For the proofs of the theorems given in this section we refer to the latter. For the less patient reader, there are surveys by Brezinski and van Iseghem [33, 35]. This chapter is mainly devoted to linking the previous algorithms to the recursive computation of Pad~ approximants on a cetain path in the Pad~ table. The recursions are mostly well known and follow also immediately from the recursion formulas for continued fraction expansions of the given formal series. Many of them can be found in the book of Wall [238] and in the references on Pad~ approximants given above. Of course these relations can be translated in a terminology of orthogonal polynomials and the recursive computation corresponds to the recurrences of orthogonal polynomials and adjacent orthogonal polynomials. For this interpretation we refer to [24, 41, 238, 240] for the normal case and to [87, 88] for the more general situation. One of our main intentions in this chapter is the introduction of the notion on minimal Pad~ approximant in Section 5.6. This is the translation of the notion of minimal partial realization that is extensively discussed in Chapter 6. In this chapter, we shall assume again that the coefficients are taken from a skew field F. This implies that we should discuss again left/right duality. We shall give a development for the right sided terminology. The 231
232
C H A P T E R 5. PADF, A P P R O X I M A T I O N
left sided analog is left as an exercise for the reader.
5.1
Definitions
and terminology
Let us start with the definition, or rather some definitions, since one could accept different definitions for a Pad6 approximant. Definition 1.13 for Pad6 approximant that we have given at the end of Section 1.5 corresponds to the Baker-type definition. We repeat it here in a slightly different form. D e f i n i t i o n 5.1 ( P A - B a k e r ) Given a formal power series f ( z ) - E~=o fk zk C F[[z]], with fk C F, we define a right Pad6 approximant ( P A ) [ ~ / a ] of type (~, a), a , ~ > O, as a rational fraction c(z)a(z) - t satisfying 1.
deg c(z) <_ fl
(5.1)
2.
deg a(z)<_ a
(5.2)
3.
ord ( f ( z ) -
(5.3)
4. 5.
>
c(z)a(z) -1) ___w + 1
+
deg a(z) as low as possible.
(5.4) (5.5)
The following uniqueness property is well known. T h e o r e m 5.1 All Padd approximants of type (~, a) for fixed values of fl and a are rational fractions representing the same rational function. Hence, the Pad6 approximants [~/a] are essentially unique modulo a trivial normalization. Because the reduced form always has a(0) ~ 0, we can assume without loss of generahty that a(0) = 1. Note that from this theorem and the definition of PAs, it follows that the minimality of condition (5.4) is equivalent with c ( z ) a n d a(z) being right coprime. Thus the previous definition is equivalent with the earher Definition 1.13 of Pad6 approximant. As we already said, if the PA of the previous definition exists, then we can always consider its reduced form and without loss of generality, we can assume that it is comonic i.e., a(0) = 1. Modulo this trivial normalization we can speak about the Pad~ approximant [fl/a]. Unless mentioned otherwise, we assume this normalisation in the sequel. The approximants can be arranged in a table as follows. Definition 5.2 ( P a d 6 t a b l e ) The table with as (fl, a) entry the Padg approzimant [~/a], if this exists, is called the Pad6 table. A Padd table is said to be normal if all its entries exist. Otherwise it is called nonnormal.
5.1. D E F I N I T I O N S A N D T E R M I N O L O G Y
233
A Pad@ table is represented by a vertical fl-axis pointing downwards and a horizontal a-axis pointing to the right. In the sequel we need the following properties of the normal and nonnormal Pad6 tables. It describes the block structure of the Pad6 table. It is referred to as the Pad@ theorem. A proof based on Hankel determinants is given in [7, Theorem 1.4.3]. See also [118]. T h e o r e m 5.2 /f the Padd table is normal d e g c = / ~ , d e g a = a and
and
c(z)a(z)
-1
-
[~/a], then
ord ( f ( z ) - c ( z ) / a ( z ) ) - N with N - a + fl + 1. A nonnormal Padd table contains singular blocks. If fo # 0 these are square blocks in the Padd table where on and above the main antidiagonal of the block all elements are equal to the left top element and below this antidiagonal, the Padg approzimants do not exist. When f ( z ) is the expansion of a rational function c(z)a(z) -1 (c(z) and a(z) coprime) an infinite block of PAs with left top element c ( z ) a ( z ) - I occurs. If fo - " " - ft-1 - O, ft ~ O, an infinite rectangular block of t rows with left top element 0/1 occurs in the Padd table. Examples of such tables will be given later. Above, we have given the Baker definition of the [~/a] Pad6 approximant. Not all [~/a] approximants exist, as stated in the previous theorem. To ensure existence of all [~/a] approximants we can use the older Frobenius type definition of a PA which is essentially a linearized form of the Baker definition. D e f i n i t i o n 5.3 ( P A - F r o b e n i u s )
A rational fraction c(z)a(z) -1 is a PA of type (fl, a), a, fl > O, for the series f ( z ) e F[[z]] iff 1.
deg c(z) <_ ~
(5.6)
2.
deg a(z) <_ a
(5.7)
3.
ord ( f ( z ) a ( z ) -
4.
w>_a+~
5.
deg a as low as possible.
c(z)) >_ w + 1
(a(z) ~ O)
(5.8) (5.9) (5.10)
Such a PA will also be denoted as For the Baker definition, a(0) ~ 0 and thus a comonic normalisation seemed natural. For the Frobenius definition, it can not be ensured that a(0) ~ 0 and another normalisation has to be adopted. We shall occasionally assume
234
CHAPTER
5.
PADE APPROXIMATION
that the PA is normalised by making a ( z ) monic, i.e., that a,~ = 1 with = deg a. By this normalisation, the PA will be fixed uniquely. Note that conditions (5.5) and (5.10) of the two definitions are the same and that (5.8)is the linearized form of (5.3). The original Frobenius definition [7] is the above definition without condition (5.10). Without this restriction, it allows a complete equivalence class of PAN for entries in a singular block of the Pad@ table. A single one could be selected by taking the reduced form, i.e., the rational function defined by it. Then in a square block of the Pad~ table, all the PAN (viewed as rational functions) are equal to the left top element, including those below the main antidiagonal of the block. Note that the Frobenius definition is weaker than the Baker definition. A PA in the Baker sense will automatically be a PA in Frobenius sense, but the converse is not true in general. Depending on the approach we take, (see the next two sections), the Euclidean algorithm sometimes computes entries of the Pad~ table that are below the antidiagonal of a square block in an unreduced form. Therefore it might then be convenient to define the PAN by conditions (5.6)-(5.9)and make them as simple as possible by imposing condition (5.10). This is our main incentive to introduce the Frobenius type of PA besides the stronger Baker definition. In the sequel we shah assume the PAN are defined by the Baker type Definition 5.1 or the Frobenius type Definition 5.3 depending on what will be the most appropriate. E x a m p l e 5.1 To illustrate the difference between both definitions, consider the example F ( z ) - 1 + z 4 + z ~ + z 9 + . . . . Its Pad@ table has a 4 by 4 block in its left top corner. It can be filled in as in Table 5.1. The original Table 5.1: Pad~ table for different definitions of PA
1/1 1/1 1/1 1/1
1/1 1/1 1/1
1/1 1/1
Definition 5.1 Baker
1/1
0 1/1 1 1/1 2 1/1 3 1/1
1/1 1/1 1/1 z/z
1/1 1/1
1/1 z/z
z/z z2/z 2
Z2 / Z 2 Z3 / Z 3
Definition 5.3 Frobenius
5.2.
COMPUTATION
235
OF D I A G O N A L P A S
Frobenius definition has 1/1 all over the square. Note that our Baker and Frobenius definitions are equiwlent iff a(0) 7t 0.
We shall see next how the Euclidean algorithm or one of its variants, computes the Pad6 approximants which are located at certain paths in the Pad6 table.
5.2
Computation of diagonal PAs
It has already been discussed in Section 1.7 on Viscovatoff's algorithm that the Euclidean algorithm (and the Euchdean algorithm is a special case obtained by setting the parameter v - 0 in the general algorithm) will produce Pad~ approximants for a series f e F[[z]]. This property of the Euclidean algorithm was described in Theorem 1.18 (case v - 0). However, it was assumed there that the series f - - s - l r has ord f _> 1. When we work with power series it is more conventional to start the series with a constant term f0, which need not be zero. Thus we should allow f to be replaced by a series f - f0 + f with f0 C F, possibly nonzero. This is a simple m a t t e r since to make this work, we only have to replace the matrix
V0 -
[vo 0] by [vo oao] 0
ao
0
a0
with a0, v0 E F0. Thus if f0 7t 0, the algorithm starting with V0 will compute approximants which are different from the approximants computed in the case f0 = 0, i.e., when it starts with Vo. However, it is easily checked that if Co,k and ao,k denote the polynomials as they are generated in Theorem 1.18 for the series f and if Co,k and 50,k are the polynomials generated with the initial matrix Vo, replacing f by ] - f0 + f, then it holds that ~o,ka0,k--~ ---fo + Co,kao,k-1 and thus ~o,k(z) - f o a o , k ( z ) + co,k(z) and ao,k - ao,k. The degrees of the c0,k are still bounded by the Kronecker indices ~k and the orders of approximation are still the same. By using the same method as in the proof of Theorem 1.18, it can be shown that c0,k and a0,k are right coprime and hence that ~0,kho,~ is a [~k/~k] PA for the series ]. Thus we can formulate the following theorem.
T h e o r e m 5.3 Let s - - 1 and f - s - l r - r - ~_,k~176f k z k C F[[z]]. We apply the Extended Euclidean algorithm with the above initialisation step,
236
C H A P T E R 5.
PADE APPROXIMATION
which means the following. Define
ilol
G-l-
0
1
8
r
Vo-
[ ivao vo
foao
Vo, ao C Fo arbitrary.
For k > 1 set
I UO,k vo,k aO,k ~o,k1
Gk-
$k
ckzk 1
'
rk
~k~"~ ~k(~)
and zk - - s k 1rk with ak - ord rk-1 -- ord Sk-~, ak(z) -- --(sk-1 l d i v rk-1)ckz ~k Ck, uk C F0 arbitrary Vk -- O. Then,
-s
-
l
al
I+
[
.
Ir-fo+v
o
~ 1 - - O:1,
~ k - - O~k-1 -~-O~k,
aO,k(O)- a o a l ( O ) ' - "ak(O) ~ O,
+ ...+
a2
l an+unz~
ao 1 ,
k _> 2
k ~_ 0 k
degc0,k _< 'ok,
degao,k _< 'ok,
ord zk
--
O~k+l, i=1
f
--1
a-1
- c0,k a0,k - rk 0,k
ord rk
-
-
ord sk +
r
-- ak+l
+ 2~;k.
1 The polynomials co,k and ao,k are right coprime and all co,k a -o,k are Padd approximants for f .
Note t h a t since ao,k(0) ~ 0, we need not distinguish between the Baker and the Frobenius definition here. Let us illustrate this with an example. E x a m p l e 5.2 The example is nothing but a translation of E x a m p l e 1.5 to the present situation. It also shows t h a t the algorithm works in more general situations of given Laurent series s and r. The restriction to s - - 1
5.2. COMPUTATION OF DIAGONAL PAS was j u s t for t h e ease of f o r m u l a t i o n .
237
N o t e also t h a t in this e x a m p l e , it so
h a p p e n s t h a t fo - 0. W e c h o o s e for t h e g i v e n s a n d r t h e following L a u r e n t series $-
r-
--Z - 1 -~- Z 4 a n d
1 + z 4.
W e shall n o t d i s t i n g u i s h b e t w e e n left a n d r i g h t in o u r n o t a t i o n . Vo - ao -
We take
1, t h e n we c a n d i r e c t l y s t a r t w i t h so - s a n d ro - r. S u p p o s e we
c h o o s e ck a n d
uk e q u a l t o 1 for all k _ 1. T h e n t h e successive c o m p u t a t i o n s
give us t h e following results" a l - o r d ro - o r d so - 0 - ( - 1 ) al
--
Note that deg al - 0 < 31
d i v r 0 ) z "~ - z -1 z, t h u s a l -
--(s0 al
--
-
1. T h u s
1.
1.
[~o ~o],1
T1 ]
-
l+z'][
~
_ [z+z~ ~ , § N e x t we find t h a t O~2 -- o r d r l - o r d
S 1 -- 4 -
1 - 3. So t h a t we get
a2 -- - - ( 8 1 d i v r l ) Z 3 -- - 1 + z -
z 2 ~- z 3.
Thus 82
T2 ]
[~1 ~]~ _ [~§ z4§
F o r t h e n e x t s t e p : a3 - o r d r2 - o r d
[oz~
s2 - 8 -
7-
-1
+ z-
z)12.
= [zT+z ~ 2~8][ z0
-(1
-(s2 divr2)z
-
T h i s gives $3
T3] -[~ -
~ ]v~ [2z 9 o].
]
1 a n d t h u s we find
-(1 +
a3 -
z 2 + Z3
z ] + z)/2
238
CHAPTER
5.
PADE
APPROXIMATION
Here the algorithm stops because r3 - 0 which means that - s - l r and co,3ao, t3 correspond exactly. The given data corresponded to a rational function which is recovered after 3 steps of the algorithm. The successive V0,k matrices, containing the successive approximants are given by Vo,1-
[0z] z
Vo,3_
1
[z4 ; Vo,2 - Vo,I V2 -
Vo,2V3-
-
za
+
-z+z
-l
-
+ z-
+
z~ + za + z4
(z +
)/2
2-z 3+z 4+z 5 (1-z5)/2
;
"
The degree properties can be easily checked. For example, we find that the series expansions of f - - s - l r - [1 + z 4 ] / [ z - 1 z 4] and C0,aa0-~ = [(z + z S ) / 2 ] / [ ( 1 zS)/2] match completely. Note that f - - s - l r has the continued fraction expansion z4
[+
whose successive convergents a r e c o , k a o , k ,-1 for k - 1 ,2 3, . A picture representing the computed elements graphically is given in Figure 5.1. The successive approximants are indicated by an X.
Figure 5.1: Pad~ table for f ( z ) - z + z s + z 6 + z t~ + z it + z is + . . - : Diagonal path
0 1 2 3 4 5 6 7 8
x0i
i i i :X~
,
,-,
~
i i i
" ,-,
,
,-,
,
, : :
:
: ,
,
r
,
,-,
,
,-,
,
,
~
,',
,
,
,
,-,
,
,
,
,
. .
:Iil
"
'i
...........
,,
, ' , ,
~
x2
............
.
~
,
, ' , ,
, ' , ,
,
,.
,
,%
,
,',
,
,-~
,
,-,
,
,-,
.
.
,
,
.
,_-:
:
~ , ,
,::
=
:,
=
~ ,
,_-:
.
.
~",,
,
,
~
"~
,
,
r
,
,
o
.
,",
, ~
.
.
,
.
.
,-,
.
.
.
.
In the case where we replace the previous f by f + 1, i.e., we choose for example s - - z -1 + z 4 and r l + z -1 then we can start with the
5.2.
COMPUTATION
OF D I A G O N A L P A S
239
multiplication with the V0 matrix to get
'30 7'0 ]
and thus, after this initialisation step, we can reuse the previous calculations. Thus all the V0 matrix does is to reduce the problem with a term f0 to a problem without a constant term. Note also t h a t if f0 - 0, V0 - V0. We have seen t h a t the extended Euclidean algorithm computes (reduced) PAs on the main diagonal of the Pad6 table. We could wonder whether we did not miss any, i.e., is there any other PA on the main diagonal of the Pad6 table in between Co,k_lao,k_ -a 1 and co,kao, ~. The answer to the latter question is negative. We indeed find them all. If the main diagonal cuts successive singular blocks in the Pad6 table, then the algorithm produces for each block precisely one PA, which is the reduced one in the left top corner of the block. The correctness of this statement should be clear if we identify the tck as the Kronecker indices of the sequence {fk}k>l and the ak as the Euclidean indices. Knowing that the denominator
a,~(z) = ao + ' . . + a,,,z '~ of an [a/a] approximant in the Baker sense should satisfy the Hankel system 9 ""
f
+l
"'"
fa+2 9
-.-
0
aa-1 9
ao
0 ~
,
ao#O.
0
It then follows from the definition of Kronecker indices and the interpretation as right orthogonal polynomials of the denominators t h a t these denominators are uniquely defined (up to a nonzero constant factor) when a is some Kronecker index tck and all solutions of denominators in between are polynomial multiples of a0,k. The same polynomial multiples will then occur in the corresponding numerators c0,k. Thus all PAs t h a t might exist between two successive Kronecker indices tck and tck+l can be represented in the reduced form cO,kao,k. -1
C H A P T E R 5. PADE A P P R O X I M A T I O N
240
In conclusion we can say that the (main) diagonal algorithm computes the PAs along a diagonal, precisely one per singular block the diagonal passes through, namely its reduced left top element. If we are interested in other approximants than those on the main diagonal, then we can use the following theorem (see [7, Theorem 1.5.4]). T h e o r e m 5.4 ( T r u n c a t i o n t h e o r e m )
Given f ( z ) -
~
fkz k ~
F[[z]].
Define fk = 0 for k < O. Define for some shift parameter a C Z oo
f~(z)-
~
~ f(z)fkz ~+k and p ~ ( z ) - [ 0
z-"f,,(z)
if a < 0 if a > O
k=-a
Suppose a(z) and c(z) are polynomials satisfying ord (fa(z)a(z) - c(z))
>
deg a(z)
_<
deg c(z)
_<
2a+1
0~.
Furthermore we define a(z) - ~(z) ~ a
~(z) -
p~(z)a(z)+ z -~(~)
Then these are polynomials and they satisfy ord ( f ( z ) g z ( z ) - ~(z))
>_ 2a - a + 1
deg~(z)
_< a
deg~-(z)
_< a - a .
P r o o f . First suppose a >_ 0. Then
f~(z) - z~ f ( z ) and p~(z) - O. From the defining relations for a(z) and c(z), it follows that the solution has got to have the form c ( z ) - z~c'(z), with c'(z) a polynomial of degree a - a at most (it is zero when a < a). Hence ~ ( z ) - z - ~ c ( z ) - c'(z) has a degree as claimed. Also the order of approximation is as it should be since
f(z)a(~)
-
e(z)
-
z-.(f~(~)~(z)- ~(z)).
This gives the proof for a _> 0. Now let a < 0. Then
f~(z) p~(z)
--
f_o.
-~- f _ o . + l
Z .4- . . .
--
f o -~- f l Z .-~ . . . ..~ f _ o . _ l Z
--o'--1
5.3. COMPUTATION OF ANTIDIAGONAL PAS so that f(z) - p,,(z) + z-"f,,(z). degree < a - a and the order of f(z)a(z)-
241
Thus ~(z) - p,~(z)a(z)+ z -'rc(z) has
e(z) :
_
is as claimed.
[3
The theorem says that if we want to find a PA for f(z) on diagonal a, we can always produce one starting from a diagonal 0 approximant for the series f,~(z). Thus it is sufficient if we know how to compute PAs on the main diagonal of the Pad~ table. Another fact that can be investigated is an adaptation of the atomic Euclidean algorithm to the case of series in F(z). It turns out that if we do this carefully, that we can obtain approximants that are on a downsloping staircase path in the Pad~ table. In the special case of a normal Pad~ table for example, the adaptation of the atomic Euclidean algorithm can be arranged such that it decomposes one step of the extended Euclidean algorithm into two more elementary steps. The first one will find an approximant whose order of approximation is one higher and which is located on the neighboring diagonal. The second one adds again one to the order of approximation and brings us back on the main diagonal again. One can imagine how this has to be extended to the nonnormal case where blocks may occur in the Pad~ table. For further details we refer to [38, 39, 87] etc. In Section 5.4 we discuss the Viscovatoff algorithm which also computes a staircase. Also the Berlekamp-Massey algorithm is an atomic version of the diagonal algorithm and it computes staircases. We shall come back to this in greater detail in Section 5.7. In the paper by Brent, Gustavson and Yun [22] a divide-and-conquer strategy was applied to speed up the algorithms.
5.3
Computation
of antidiagonal
PAs
The idea which was used in the previous section to translate the results for series in F(z -1) into corresponding results for series in F(z) was a transformation z ~ z -1. This resulted in an algorithm which computed the reduced elements in the sequence [ a - a / a ] for a - O, 1 , . . . , where a refers to the chosen diagonal in the Pad~ table, a - 0 refers to the main diagonal, negative a's number diagonals below it and positive a's refer to diagonals above the main diagonal. Hence, the downward sloping diagonals in the Pad~ table,
242
C H A P T E R 5. PADE A P P R O X I M A T I O N
which are described by a constant a, will be called a-lines for short. The upward sloping antidiagonals are orthogonal to the a-lines. These are lines where the lower bound w + 1 for the order of approximation is constant. We shall call them w-lines. Thus the coordinate net of fl's and a's can be replaced by another orthogonal net which is rotated over 45 degrees and which numbers the diagonals in the Pad6 table. The diagonal algorithm discussed in the previous section computes Pad6 approximants along a a-line. To compute the PAs along an w-fine another technique is used than the one proposed in the previous section. Note that the length of an w-line is always finite. If we have to fit the first w + 1 coefficients of f e F[[z]], then it is sufficient to know only h for k = 0 , . . . , w. The remaining coefficients do not enter the computations and could as well be thought of as being zero. Since an w-line starts at the point (w, 0) and ends at the point (0, w), we immediately know how to start the computations since [w, 0] - ~ = o h z k / 1 is obviously the simplest element in [w/0]. As on a a-line, the algorithm should compute the elements on the current w-line, or rather only those that correspond to the first entry found at the intersection of the w-line and the blocks in the Pad6 table. However, it will be most convenient now to use a monic normalization for the denominators of the approximants. If we consult Table 5.1 again, we see that, if the w-fine intersects a block below its main antidiagonal, then according to the Baker definition, (left-hand side table in Table 5.1) the PA does not exist there. Since we do need a representative for each block, we shah use the Frobenius definition (righthand side table in Table 5.1) and use a reducible (by some power of z) PA (below the antidiagonal) as a representative of the block. Note that we can not take the reduced left top corner element as in the previous section, since this one will not meet the required order of approximation imposed by w if the intersection is below the antidiagonal. Hence the monic normalization for the denominator (rather than a comonic one) is a correct choice here. The use of the Euclidean algorithm in this context is based on the following observation. Let f be a series from F[[z -1]]. The extended Euclidean algorithm gave the relation a-1 a-1 f - c o , k O,k-- rk o,k which can obviously be written as
This means that we can freely interchange the role of numerator and residual, certainly if all the series involved are actually finite series, i.e., polyno-
5.3.
COMPUTATION
OF A N T I D I A G O N A L
243
PAS
mials. Here we can choose for f the strictly proper series =
k=0
k=0
From the analysis of the extended Euclidean algorithm applied to a series -1 from F[[z -1]], we know that the degrees of ao,k and c0,ka0, k are nondecreasing while the degree of rka0-,~ is nonincreasing. Taking rk in the role of numerator and c0,k in the role of residual, this is precisely what we want to obtain when walking along an w-hne from left to right. The desired quantities then will only need a shifting back with the power z ~+1. This leads us to the following theorem which is an analog of Theorem 1.13. T h e o r e m 5.5 Let f ( z ) - E F : o h zk e F[[z]] be given. Let s - - z w+l and r - ~ f k z k. Apply the right sided extended Euclidean algorithm with the initial matrix
I- ol
G-l-
This means F \ { 0 } and vk - 0 and polynomials Set
the following uo - C o - O. ak - - ( s k - 1 and should be
1
0 8
.
r
9 Choose nonzero constants vo and ao C F0 F o r k >_ 1 and until rk - O, choose uk, ck C Fo, l d i v r k _ l ) c k . The ldiv operation acts here on considered as a division of elements from F ( z - 1 ) .
V k _ [ v k uk
ck] ' ak
k > O,
and generate G k -
a-
x V o V~ . . . V k
-
vO,k
-I cO,k |
ltO,k 9Sk
aO,k J 9 rk
Then the following relations hold
degao-ao-O,
anddegak-ak_>
1,
k>_ 1
k
deg ao,k - ~k - ~ i = o ai ordvo,k >_ w + 1 and ordco,k >_ w + 1. degrk--w+l-t~k+l ord (fao,k - rk) - ord co,k >_ w + 1. The rational fractions r k a o,k - 1 ate P A s of f of type (w - ~k, ~k). This means they are approzimants on the w-line number w in the Padd table of f .
C H A P T E R 5. PADF, A P P R O X I M A T I O N
244
The algorithm shall eventually end because rn will become zero for finite n <_ w.
a
P r o o f . We can keep the results of Theorem 1.13 unchanged if we started with the initial matrix G~ 1 which is defined by
-1
e !
I 1 0 -1
0 ] 1 ] -
z-(,,,+l) 0
0 1
0 0
0
0
z -(~+l)
f'
Zw+l 0 1 0 1 , --Z w + l T
The rightmost matrix is the matrix G-1 as proposed in the theorem, the element f ' stands for - s - l r which is in our case z - ( ~ + l ) r - z -(~+1) ~ fkz k. Suppose t h a t the quantities that are obtained by the starting m a t r i x G'- 1 are indicated by a prime, while the quantities that are obtained by starting with the matrix G-1 of this theorem are denoted without a prime. Then the primeless quantities are obtained from the primed ones by multiplying them with z ~+1 as far as v0,k, c0,k, sk, and rk are concerned while u0,k and a0,k are left unchanged" [u~,k a ~ , k ] - [u0,k a0,k]. Thus the statement about the degrees of the ak and a0,k can be taken immediately from Theorem 1.13. ' k and c0, ' k are polynomials by Theorem 1.13, both Since the elements v0, ord v0,k and ord c0,k are at least w + 1 as claimed. This implies also the last relation of the theorem, since the equahty [ - 1 r - 1 ]G-1 - 0 which holds at the start, is not affected by multiplication with some matrix V0,k from the right. Hence, it holds for any k that [ -1
r
-1 ]G0,k-0,
and this implies e.g. that ra0,k - co,k + rk. Thus we have that ord (ra0,k - rk) - ord c0,k _> w + 1. Since moreover f and r differ only in terms of order at least w + 1, we have proved the last relation of the theorem. The relation deg r k' - - n k + l from Theorem 1 13 transforms into d e g r k - w + 1 - nk+~. Note also that both sk and rk are polynomials, since at the start so and ro are polynomials and they are only transformed into polynomial combinations. Because ~k+l - ~k + ak+l > ~k - deg ao,k, it follows that rkao--,~ satisfy the degree-order conditions of a Pad4 approximant (in Frobenius sense) on the indicated w-line.
5.3.
COMPUTATION
OF ANTIDIA
GONAL
245
PAS
By Theorem 1.16, Co, k~ and a0,k are right coprime. Since by f a o , k - C o , k rk, any common right divisor of c0,k and a0,k is also a common right divisor of a0,k and rk, it follows that the only common divisor that rk and ao,k can have is a power of z. To show that the degree of the denominator is as low as possible, we should prove that if rk and a0,k have a common factor z, then it is impossible to divide out this factor in the relation f a o , k - rk = co,k without violating the order condition, thus that if a0,k(0) = 0, then ord c0,k is exactly equal to w + 1. To prove this, we introduce
[1121w12],2 Note that Wo,,~ is a polynomial matrix (see e.g. Theorem 1.4). Thus we have 0
1
uo,k
aO,k
1/321 1022
or explicitly __
VO,k1011 -Jc CO,k1/321
(5.11)
0
:
'00,k1012 ~ C0,k1022
(5.12)
0
=
U0,k 1/311 "~- ao,k w21
(5.13)
1
--
U0,klOl2 -[- a0,kl/322 .
(5.14)
ZW+ 1
From (5.11) it follows that min{ord vo,k + ord w11, ord Co,k + ord w21 } _< w + 1. Since ord v0,k _ w + 1 and ord c0,k _> w + 1, this is only possible if (ordvo,k=w+l (ordco,k=w+l Now suppose that
aO,k(O) =
0,
& ordwll-O) - 0). & ord
(5.15)
i.e. ord ao,k _> 1, then
(5.13) :::> Uo,k(O)w11(O)- 0 1
(5.14)
or
1
f
11(0) = 0.
But if w11(0) = 0, then ord wll _> 1 and thus by (5.15) we find that ord co,k = w + 1, which confirms what we said before. Thus the degree of a0,k is indeed as low as possible and this means that the approximants rka0--,~ are Pad6 approximants in Frobenius sense.
CHAPTER 5. PADJE APPROXIMATION
246
T h a t the algorithm will end is obvious since the degree of rk is strictly decreasing with k (because ak _ 1) and therefore, one shall end up with an rn - 0. The computations actually correspond to the (extended) Euclidean algorithm computing the greatest common left divisor o f - s and r. When rn has become zero, then sn is the g.c.l.d, of s and r. [::] Note that we do not necessarily have ord
( f - rka 0,k) -1 > w + 1
since we are not sure that a0,k(0) ~ 0. T h a t this constant term may indeed be zero is illustrated in the next example where X2 is reducible and a0,2(0) 0. E x a m p l e 5.3 Take the example w - 6, the initial matrix G-1 is
G~I
f(z) - 1 + z 4 + z s + z 9 + . . . .
z7
0
0 -z 7
1 1 + z4 + zs
Choose
We choose the matrix V0 to be the identity matrix, so that Go Consequently, we have the initial approximant -1 7'0,0a0, 0 :
G-1.
1 -~- Z 4 -[- Z 5
In the rest of this example, we shall always choose the constants uk and ck such that - s k as well as a0,k are monic polynomials. This is always possible to obtain: in the initialisation step by an appropriate choice of the m a t r i x V0 and in every subsequent step ck is used to make ak monic and if this is monic, then also a0,k will be monic by induction. On the other hand, uk which is used to transform r k - 1 into sk, can be used to make - s k monic (see the normalization Theorem 1.14). For example we can choose ul - - 1 , while we can take cl = 1. Thus we get
iv1 c1] [01 1] U1
a 1
z2- z + 1
--
Note that al is the quotient of z 7 divided by z S + z 4 + 1. This gives
Go,l-
I
0
-1 -z s-z 4-1
Z7
z2-z+1 z4+z 2-z+1
1
-1
andr1%,l =
1--Z+Z2+Z 4 l-z+z
2
5.3. COMPUTATION OF ANTIDIAGONAL PAS
247
T h e n e x t V - m a t r i x is given by
iv2 c2] [01 1] u2
a2
-
z + 1
w h e r e c2 = - u 2 = 1 a g a i n a n d a2 is t h e q u o t i e n t of z 5 + z 4 + 1 a n d z4+z 2-z+l. So we find t h a t
I Go,2 -
--Z7
Z 8 q- z 7
-z 2 + z- 1 -z 4 - z 2 + z- 1
z3 z3
N e x t we find
,03 C3 ] '/t3 a3
V3
I
z3 a n d r2ao, 2 = z3.
_[o i] --i
Z
and
I _ Z8 _ Z 7 _Z 3 Go,3 : _Z 3
z 9 + z 8 _ z ~' z4- z 2 + z- 1 I -z2+z-1
-l+z-z a n d r3ao, 1 -
2
--l+z--z2+z
4"
A n e x t step gives
~34 ~4
V4
0 1
C4 ] --
a4
-1 l+z
]
and I
Go,4 --
z9 + z 8 z4-
z7
z 2 + z-
z 10 _[- 2z 9 + z 8 z s + z4-
1
-z2+z-1
1
-1 a n d r4ao, ~ -
T h e last step gives
Y~ --
us
1
z 1~ + 2z 9 + z a
I
z s + z4-1
+ z 4 + z S"
[0 1]
a5
1 - z + z2
with
Go,5
-1
--1
1
z 12 ~-
Z11 _~-Z 7 z7 0
1 .
T h e successive solutions we o b t a i n e d are i n d i c a t e d by an X in t h e P a d ~ t a b l e of f ( z ) in F i g u r e 5.2.
248
CHAPTER
Figure 5.2" Pad6 table for Antidiagonal path
f(z)
-
5.
PADE
APPROXIMATION
1 + z 4 § z S -Jr- z 9 -+- z 1~ -+- z 14 -}- . . . .
0 1 2 3 4 5 6 7 8
II
I
! I•
-1 0 1 "i'"i'"i ....... X;!"!"i'" 2 : X3: : :X2 3 4 I J• ..... 5 6 Xo: 7 8 o
r ,
, . ~ ,
, - , ,
,-,
,
,_-
:
e , ,
,
.
.
.
.
.
-,
~
,
, ' , ,
, 7 ,
,',
,
,r:
,-,
,
~
:
_-,
0
; ~
,
r
, ;
,
,
il
,
,'0
~
~
,
,-,
,
~
o
~
,
~
,-,
,
-
-
,
,
,
"~
.
Note that the entry X2 is reducible since it is below the antidiagonal of its block. All the other entries indicated are irreducible because they are in the upper left triangular part of the blocks. The entry Xs is purely artificial. It corresponds to the "approximant" in G0,s which is O / z ~. We shall say more about it in Section 5.6.
This version of the Euclidean algorithm is often called the Kronecker algorithm [176] (see [7]). Also Chebyshev was aware of this algorithm. In [58] he describes how to expand a rational function into a continued fraction and this corresponds to what we have described in this section.
5.4
C o m p u t a t i o n of staircase PAs
In the previous sections, we have considered the computation of PAs on a diagonal or an antidiagonal PAs, depending on the initial conditions and on whether the algorithm is working on series in z or series in z -1. The fact that for the antidiagonal, the algorithm stops when the upper border of the Pad~ table is reached does only happen because one starts the algorithm by taking a polynomial as the initial condition. If we would add to this polynomial infinitely many negative powers of z, then the algorithm could go on just as it does in the case of series in z on a downward sloping diagonal. So
5.4.
COMPUTATION
OF
STAIRCASE
249
PAS
if we consider bi-infinite Laurent series, then depending on whether these are truncated on the side of positive powers or on the side of negative powers, we can compute an infinite diagonal in one direction or the other. The transformation z ~ z -1 then flips the corresponding bi-infinite (half plane) Pad4 table upside down along the horizontal axis and the duality is quite natural. This is the idea used in the construction of Laurent-Pad6 approximants which were considered in [41]. W h a t has been discussed for the diagonals can be repeated for staircases that alternate between two neighbouring diagonals. This is precisely what the Viscovatoff algorithm does. If we consider the general Viscovatoff algorithm with the parameter v 6 {0, 1} as described in Section 1.7, then the case v = 0 corresponds to what has been explained in the foregoing sections and the algorithm computes PAs on a diagonal or antidiagonal. When the Viscovatoff algorithm is applied to series in F(z>, with the parameter v set to 1, then it follows from Theorem 1.18 that the successive elements which are computed are the relevant elements on a staircase path of the Pad6 table. E x a m p l e 5.4 One can have a second look at the Example 1.8 and it can be observed that the elements which are computed are located on the staircase path as given in Figure 5.3. All of them are Pad6 approximants in the Baker sense. Figure 5.3:Pad4 table for Staircase path
= z + z 5 -4- z 6 + z 1~ -4- z 11 + z 15 _[_ ...:
f(z)
012345678 .
~o: ,-,,- , ,
,,i,,:~
-. '
: .
"
,
,"
]
.
.
.
.
.
[
~
.
. . . . . . . . . .
..................
.
o,
.
:..:,.?,,:
250
C H A P T E R 5. PADE A P P R O X I M A T I O N
Using the Truncation Theorem 5.4, it is possible to compute other staircases parallel to the main one. For more details see [7, Section 4.2] or [38]. When the Viscovatoff algorithm is applied to polynomials which are considered as truncated power series in z -1, one can also compute upward sloping staircases. In this case, like for the antidiagonal, not all the approximants computed will be Pad~ approximants in the Baker sense because the approximant which is computed is the first one that one finds in a block at the place where that path enters a block. Since an upward sloping staircase can enter a block either from below or from the left, one may have a reducible Pad6 approximant in the Frobenius sense when the p a t h enters the block from below but not in the first column of the block. In that case the first entry that is find in the block is below the main antidiagonal of that block where no Baker type Pad6 approximant exists. We have identified the Berlekamp-Massey algorithm as a variant of the Euclidean algorithm in Chapter 1. Thus it will obviously compute diagonal paths in a Pad6 table. However, the original idea behind the design of this algorithm is the construction of a minimal annihilating polynomial. This polynomial can be chosen as the denominator of a rational approximant that is "as simple as possible" and approximates up to a certain order. Depending on what is defined to be "as simple as possible" or "minimal", one can generate diagonals or staircases. This leads us to the notion of minimal Pad6 approximation. Minimal Pad~ approximation is a translation of the notion of minimal partial realization. The latter is usually discussed in a framework of series in z -1. Translating this to a framework of power series in z, as we have done several times before in this book, we obtain minimal Pad~ approximants. This is what we shall do in the next sections.
5.5
M i n i m a l indices
Before we discuss minimal Pad6 approximation, we introduce in this section the notion of minimal indices and minimal annihilating polynomials. These are needed to connect the denominator and numerator degrees of minimal Pad6 approximants with the Kronecker indices. In a sense, the way minimal indices are defined is an alternative way to characterize the Kronecker indices. Here is the definition.
Definition 5.4 (minimal indices, mAP)
Let
f
-
{fk)k>~
sequence and r > 0 an integer. Consider the Hankel matrices //~-~-1,~,
a = O, 1 , . . . , r
1
be a given
5.5. M I N I M A L I N D I C E S
251
]k,l
with Hk,t - [fi+j+lji,j=o. We define H-I,r as an empty matrix with zero rows and r + 1 columns. Let a(0; f ) - 0 and for r > 1, we define a(r f ) as the smallest integer a >_ 0 for which the homogeneous system
I~176176 9
-
.=
9
,
(5.16)
0
has a nonzero solution. The a(r f ) is called the minimal index of order r and a(z) - ~/]~ akz k is called a minimal annihilating polynomial ( m A P } of order r for the sequence f . If we impose the extra condition aa 7~ 0 in (5.16}, then we call a(r f ) the constrained minimal indices and a(z) the constrained m A P . This definition means that a(r f ) is the smallest a for which the columns of Hr are linearly dependent. The matrix has a kernel or right null space, that is of dimension at least 1. Any vector from the kernel defines a minimal annihilating polynomial. If we consider H = [f~+j+l] as a moment matrix, we can rephrase this by saying that a(z) - ~ akz k is the right orthogonal polynomial of degree a at most which is right orthogonal to all polynomials of degree < r a with a as low as possible. We represent this in a picture so that the relation with the Kronecker indices will become obvious. Let H be the infinite moment matrix and A the unit upper triangular matrix whose columns are the stacking vectors of the block orthogonal polynomials. Then ATHA - D where D is block diagonal with block sizes ak. Each block is a lower Hankel matrix. This block diagonal D is depicted in Figure 5.4. The orthogonality requirement formulated above means that we should walk along the diagonal r from left to right and find the first column for which there exists among all columns considered so far, one which is completely zero, on and above the C-line. There are two possibilities to consider: either the C-line cuts a block above its main antidiagonal or not. This corresponds to the left and right picture in Figure 5.4. In the first case (to which we shall refer as case A), there is no doubt: the minimal index is a Kronecker index of the sequence fl, f2, .... This is true for the constrained as well as for the unconstrained case. The column always coincides with the starting point of a new block, i.e., a(r f ) is a Kronecker index. If we denote the Kronecker indices of fl, f 2 , . . , by ~k and the Euclidean indices
252
CHAPTER
5.
PAD.E A P P R O X I M A T I O N
Figure 5.4: Minimal indices and orthogonal polynomials
~(r
r
.
a1(r
~2(r
r
/
/
/
/
/
/
/
/
/
/
/
/
II
/ /
/
/
/
I /
/
P
/ /
/
,/
/
/
/ /
by ak, then in case A case A"
a(r f ) -
nk
for
2~;k _< r < 2nk+l - a k + l .
As a m A P we should take the true right orthogonal polynomial ( T O P ) a0,k of degree ~k. The second case, when the C-line cuts a block on or below its main a n t i d i a g o n a l - see the right-hand side situation of Figure 5 . 4 will be called case B. It now depends on the minimal indices being constrained or not what value they will have. When we are in case B and choose the constrained minimal indices, (case Bc) then the m A P is an orthogonal polynomial that should be of strict degree a and then the only possible choice for the minimal indices is the next Kronecker index, which is indicated in the figure by a2(r Hence for the constrained minimal indices, we have case Bc:
a(r f ) - ~k
for
2~k - ak _< r < 2~;k.
As a m A P we can not only take the TOP a0,k, but also any polynomial of strict degree ~;k which is right orthogonal to z i for i - 0 , . . . , r ~ k - 1. That means any polynomial from the manifold
~0,k + ~p~n{~'a0,k_," i - 0,...,2~k - (r + 1)}. Their stacking vectors span the kernel of Hr
for a - ~;k.
253
5.5. M I N I M A L INDICES
If the minimal indices are unconstrained (case Bu), then we find the minimal indices to be caseBu:
a(r162
for
2,r162
This is indicated by a1(r in Figure 5.4. For the mAP we can take the TOP ao,k-l(z). Note however that for r = 2,r 1, the minimal index is a(r f) = 'ok, thus for that particular value of r we can choose any combination of a0,k and a0,k-1 as a mAP. This exceptional value r = 2,r - 1 with 'on a Kronecker index, will be needed several times in the sequel. For further reference, we call it an E (for exceptional) situation or an E value of r We summarize in the following theorem what has been found so far. T h e o r e m 5.6 Let f l , f 2 , . . , be a sequence with Kronecker indices 'ok and Euclidean indices ak. Then for given r > O, the (unconstrained) minimal indices are given in case A by
a(r f ) -- tCk
2tCk_< r < 2~k+1 -
for
O:k+l
with corresponding m A P equal to ao,k. In case B, they are
a(r162
for
2,r -- at: <_ C < 2~k
with corresponding m A P ao,k-l(Z), ezcept for the E value r = 2,r the m A P is any linear combination of a0,k-1 (z) and a0,k(z). The constrained minimal indices are Or(C; f )
= K,k
for
1, where
2K, k -- Otk <_ r < 2K, k + l -- Otk+l
which covers both cases A and B. The corresponding m A P is equal to ao,k in case A and in case B, it can be any polynomial from the manifold
a0,k + span{ Z i aO,k_ 1
9 i -- O , . . . , 2 K , k -- ( r
1)}.
Note that a(r f) indicates that a is defined by the first r coefficients fl, f z , . . . , f t . We can formulate the following special case when all Euchdean indices O~k+l - - ~ k + l
- - ~ k ---- 1.
C o r o l l a r y 5.7 The (un)constrained minimal indices a ( r
for the sequence f satisfy the following relations when all Euclidean indices ak = 1.
ct(C; f) - [r + 1 ]
I r -2r
when r is odd when r is even
254
CHAPTER 5. PADF, APPROXIMATION
The mAP for r even is a right orthogonal polynomial ar of degree r but when r is odd, one can take any combination of the right orthogonal polynomials of degree ( r 1)/2 and (r + 1)/2 of the form ca(r + da(r ). In the constrained case, c should be nonzero. P r o o f . For the constrained indices, there are only two possible values of r for which a(r f) is some Kronecker index ~k, namely r = 2ak - 1 and r = 2ak which implies the result above. When the indices are unconstrained, then for r = 2 a k - 1, a(r f ) = r + 1 - tck = t% and for r = 2t%, we find a(r f ) = ~k. The corresponding mAPs can be found easily if we realize that an even r corresponds to case A and an odd r corresponds to case B, which consists of the exceptional value only. [:] We are now ready to return to the notion of minimal Pad6 approximation.
5.6
Minimal Pad6 approximation
The Pad~ approximants that were obtained in Section 5.3 have a well-defined optimality property. We shall call them minimal Padd approzimants (mPA). This term is inspired by the terminology of minimal partial realization. The minimality property of these Pad~ approximants is a direct translation of the minimality property in the problem of minimal partial realization. See also the next chapter concerning these system theoretic matters. For the minimal Pad~ approximation problem, it is a more natural parameterization to replace the fl-a grid that we used originally, by the a-w grid that has been introduced in Section 5.3. The solutions of the problem, i.e., the mPAs will be Pad6 approximants along an antidiagonal in the Pad~ table that are computed by the Euclidean algorithm, precisely as it was done in Section 5.3. If we are given the order w, and ask for the simplest c(z) and a(z) that satisfy the linearized order condition (5.8) for PAs, then which specific PA we shall select will depend on the relative importance we want to attach to the numerator degree and denominator degree. Such a relative importance could e.g. be imposed by requiring that deg a(z) <_ a while deg c(z) _< a - a for some a ( - w ~ a _ w). In view of the notation that was imposed by the algorithm at the end of Section 5.3, where the approximant turned out -1 it might be better to denote a mPA by to be rkae, ~ rather than c0,ka0,k, r(z)a(z) -1. This is what we shall do. So we consider the following minimal Pad6 approximation (mPA) problem.
255
5.6. M I N I M A L PAD]~ A P P R O X I M A T I O N
D e f i n i t i o n 5.5 ( m P A - F r o b e n i u s , d e f e c t ) Given a formal power series f ( z ) E F[[z]], an integerw E N and an integer a, - w <_ a <_ w. Find polynomials a( z) and r( z) such that ord ( f ( z ) a ( z ) -
.
(a(z) ~t O)
(5.17)
a is minimal in
2.
.
r(z)) ___w + I
2a.
deg a(z) < a
(5.18)
2b.
deg r(z) <_ a - a
(5.19)
The defect d which is defined as d -
max{a
-
deg a, a - a - deg r}
(5.20)
is mazimal .
Among the solutions satisfying 1-3, minimize deg a(z)(5.21)
The approzimant ra -1 is called a minimal Padd approzimant (mPA) of type (w, a) in the Frobenius sense with minimal indez a and defect d. It is denoted as [w, a]. Note that at least one of (5.18) or (5.19) has to be an equality if a is minimal. Thus of the two numbers defining the defect, one is zero, the other is nonnegative. A positive defect is either caused by the denominator or by the numerator degree being deficient. Although we have yet to prove that the solutions of the mPA problem are usual PAs (in the sense of Frobenius), it is good to keep already in mind that this will turn out to be the case. We have coined this a Frobenius type mPA since it uses the linearized order condition (5.17). As explained before, in this case, ra -1 may be reducible, as is the case if the coordinates (w, a) refer to an entry in the Pad6 table which is below the antidiagonal of a singular block. Expressing that the approximation order is _ w + 1 while deg a _ a and deg r _< a - a, leads us to the relations fw
fw-1
"..
fw-a
f,,,-x 9
9
9
.
f~_.+~
...
f a-cr-1
999 9
9
9
...
O~
/_.+~
9
fo
a
0
(5.22) ~
]r
256
C H A P T E R 5. PADE A P P R O X I M A T I O N
where a is the stacking vector of the denominator and ] r is the reversed stacking vector of the numerator. We supposed that fk - 0 for k < 0. The first part gives the defining conditions for the denominator. The second part describes the numerator in terms of the denominator, but does not give extra conditions for the denominator itself. At the left most endpoint of the w-line, i.e., for a - - w and a - 0, the first of these two systems disappears because there are no equations left in the first one. Only the second system has to be satisfied. The denominator a(z) - 1 and the numerator r(z) - ~ ' fkz k form a solution with minimal O~0. In general, the first system represents w - a + a conditions for the denominator. This means that there will always exist a solution for c~ < w + a because for a - w + a, there are no conditions at all and then any polynomial of degree w + a is a good choice for the denominator. This simply proves the existence of at least one mPA of type (w, a). It may happen that r(z) - 0, in which case we set d e g r - - o o . To avoid exceptional rules for the case r(z) - 0, we add an artificial row - 1 to the Pad~ table with entries [ - l / k ] - 0 / z k. With this extension we shall then be able to prove that a solution for problem mPA of type (w, a), will indeed be a PA in the Frobenius sense of a type to be specified. With the theory of minimal indices and m A P s of the previous section, it will be clear by now that the minimal a of the mPA definition on some w-line for successive a values ( - w <__a <_ w) are given by the minimal indices of the sequence ] - {f~, f ~ - l , . . . } ( f - k - 0 for k > 0) where the role of r is now played by w + a. Because on an w-line, w is fixed, we shall write a ( a ) , rather than a(w + a; f). Note that r ranged over 0, 1 , . . . , while a ranges over - w , . . . ,w. Moreover, this is the only meaningful range for the mPA problem at hand. If we have recognized the a as a minimal index, then we shall also see that the denominator of a minimal Pad~ approximant will be a corresponding minimal anihilating polynomial. Let us give a visual insight in the notion of minimal Pad~ approximation in the case of a normal Pad@ table (See Figure 5.5). It serves as a justification of our definition and explains the rationale behind it. Draw the upward sloping diagonal (w-line) ~ + a - w, which cuts off the left top corner of the Pad@ table. Consider the remaining set So~ of all [~/a] PAN with f l + a _> w. Every element in S0~ will satisfy condition (5.17). Next, draw the downward sloping diagonal (i.e., a-line) a - ~ - a. The conditions (5.18) and (5.19) mean that we have to select our approximants from a rectangle R~(a) which has its right b o t t o m element on the a-line we
5.6. M I N I M A L PAD.E A P P R O X I M A T I O N
257
Figure 5.5" mPA table
S~
w
have just drawn. The approximant should therefore be in the intersection S,,, N R,,(a). If a has to be minimal, we should push up the b o t t o m right corner of R~(a) as far as we can on the a-line leaving the intersection with S~ not empty. Depending on the values of w and a, there is only one PA left in the minimal intersection (when w + a is even) or there are three solutions left (when w + a is odd). The second situation is illustrated in Figure 5.5. This dichotomy corresponds to the two possibilities given in Corollary 5.7 for (unconstrained) minimal indices. There may be one and only one m A P if r - w + a is even, or there may be the choice between a0,k-1, a0,k, or a combination of them when r = w + a is odd. The choice of a0,k-1 gives the denominator polynomial of least degree, i.e., the leftmost of the three approximants remaining in the intersection of Figure 5.5. The choice of a0,k gives the top most and a particular combination of them will give the third one, i.e. the one on the a-line. This illustrates that the denominators of the PAs are mAPs, but not every m A P is the denominator of a PA. By the maximization of the defect, we exclude the third possibility, but both of the others are possible choices. For the left most, the denominator degree is deficient by 1 and for the top most, the numerator degree is deficient by one. The maximization of the defect does not define the mPA uniquely. We can now choose which of the two possible deficiencies giving the maximal defect is the most important. In our definition, we have chosen to minimize the denominator degree, but the minimization of the numerator degree would have worked as well. Thus taking all conditions into account, we are left with a0,k-1 as the only choice which corresponds to the left most
258
CHAPTER
5.
PADE
APPROXIMATION
of the three. It has a defect 1 caused by its denominator. The mPA is uniquely defined up to a constant factor in numerator and denominator. Note that the maximization of the defect is not important for the normal table. We could as well have dropped this condition from the definition and choose a unique one by the minimization of the denominator degree only. However, in the nonnormal case, it does make a difference, and moreover an appropriate normalization makes it easier to compare with the Baker type definition to be given later. If the Pad~ table is nonnormal and thus has singular blocks of size larger than 1, the situation is much more complicated because the coordinates in the table do not correspond to precise degrees anymore. It is possible to describe the Pad~ table in terms of w and a as we have said before. A constant w refers to an upward sloping diagonal while a fixed refers to a downward sloping diagonal. All the entries and block structure etc. can be described in terms of the grid (w, a) with w = 0, 1 , . . . and a = - w , . . . , w . The mPA table is essentially the PA table rotated over 45 degrees, but there is something more to it. On an w-fine, in the usual Pad~ table there are only w + 1 different entries possible. However, in the table of mPAs, the a on an w-line ranges from - w to w. Hence, the number of entries is about doubled on such a line. Let us give a part of the mPA table for the example of Section 5.3. E x a m p l e 5.5 ( E x . 5.8 c o n t i n u e d ) The mPA table is given in Figure 5.6 as it corresponds to the Pad~ table that was represented in Figure 5.2 for Example 5.3. Figure 5.6: mPA table for f ( z ) = 1 + z 4 + z s + z 9 + . . .
- 7 - 6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8
W
................
! XoXoXoXo I..:...:...:..
................
i
I
5.6. MINIMAL PADE APPROXIMATION
259
We have indicated which mPAs are found on the w-line number 6 for t h a t example. One should compare with Figure 5.2 and an explicit numbering of the downward sloping diagonals (0 for the main diagonal, negative below and positive above) will help to u n d e r s t a n d Figure 5.6 better. The Euclidean and gronecker indices for the sequence f - {f6, f s , . . . } can be easily found from Figure 5.2. The Euclidean indices are given by the number of elements the w-line cuts of from each singular block. The Kronecker indices are therefore ~0 = 0, 'r = 2, ~2 = 3, ~3 = 4, ~4 = 5 and ~;5 = 7. For r = w + a = 0, the minimal a ( - w ) = 0. This gives us X0 for [ 6 , - 6 ] . The current block for the sequence f has size 2. The solution • is kept until r = w + a = 2,r - a l = 2. For r = 1, t h a t is for a = - 5 , we reach the main diagonal of the current block in the Pad~ table of f. Here a(a) is still 0 and [ 6 , - 5 ] = X0. For a = - 4 , the minimal a ( a ) becomes 1 since r = w + a is then 2. But we still select a m A P of degree ,Co = 0, giving the solution [ 6 , - 4 ] = X0 again. For a = - 3 is r = w + a = 3, and this means t h a t we are in an E situation: r = 2,r - 1 giving a minimal a ( - 3 ) = 'r = 2. We have a choice for the m A P . Both X0 and X1 are good candidates for the mPA at t h a t position. However the one with minimal defect is definitely [ 6 , - 3 ] - X0. For a - - 2 , i.e., r = 2~1 = 4, we leave the current block, and this time we have to take X1 as a solution. The diagonal a = - 2 goes right through the 1 • 1 block of X1 in Figure 5.2. The size of the block is a2 = 1. For a = - 1 , we get r = 5 = 2,r - 1 and this means t h a t the minimal a(a) is now 3. We have again the choice for the m A P . The one with minimal denominator degree is X1. We skip a few steps and come to a = 3. Then r = w + a = 9 = 2,r - 1, an E value. We can now choose between X3 and X4. The precise degrees for X3 are 2/4, while for X4 they are 0/5. Since a ( 3 ) : 5 and a ( 3 ) - a : 2, we see t h a t the defect of X3 is 1, while for X4 it is 2. Hence, we should choose X4 here. For a = 4, 5, 6, X4 is the only possible choice. For a = 7 i s w + a = 2,r 1, another E situation. We can choose between X4 and Xs. Because Xs represents 0 / x 7, with n u m e r a t o r degree - o o , we get an infinite defect by choosing Xs here.
In general, to find all different mPAs on some w-line, we should find all true right orthogonal polynomials a0,k for the moment sequence f = {f~, f ~ - l , . . . } and choose them as denominators while the corresponding
260
C H A P T E R 5. PADE, A P P R O X I M A T I O N
numerators are found by the second system in (5.22). This is precisely what the antidiagonal algorithm of Section 5.3 does if we remember its interpretation in the context of orthogonal polynomials of the previous chapter. Hence the following theorem is true. 5.8 The algorithm for the antidiagonal as formulated in Section 5.3 computes all different minimal Padd approzimants on that antidiagonal.
Theorem
Although we implicitly used the fact t h a t a m P A of type (w, a) is also some PA, we show now explicitly of what type it is. 5.9 If ra -1 is a mPA in the Frobenius sense of Definition 5.5 of type (w, a), then it is a PA in Frobenius sense of Definition 5.3 of type ( a - a, a)~where a is the minimal index for the approximant, hence for the sequence f - {f~, f ~ - l , . . .} of order w + a, except in an E situation, where it is still a PA, but the type may be different.
Theorem
P r o o f . Clearly, the only thing t h a t has deg r. The bounds deg a _< a and deg r show t h a t 2 c ~ - a _< w or, 2a _< w + a the notation of the previous section. In case A, where a - 'ok for 2,~k <_ r t h a t 2a <__r
to be checked is t h a t w _> deg a + _ a - a say that it is sufficient to r Note t h a t a - a ( r ] ) . We use < 2,r
-ak+l,
we have trivially
In case B, where a - r + 1 - 'ok for 2,r - a k _< r < 2~;k, we find t h a t , if we exclude the E situation r - 2,r - 1, so t h a t also for case B, 2a g r For an E situation, the defect will always be positive: a - 'ok and either we have a defect d - ak in the denominator if we choose the orthogonal polynomial a 0 , k - 1 as the denominator, or we have a defect d - ak+l in the n u m e r a t o r if we choose the orthogonal polynomial a0,k as the denominator. Thus we have a PA of type (a - a , a - d) or of type (a - a - d , a ) . In fact, the latter statement is true for all w and a and not only for the E situation. D In view of the minimal Pad~ approximation problem, we shall come back to the computation of PAs along a downward sloping d i a g o n a l , i.e., a a-hne. From the discussion of the diagonal algorithm in Section 5.2, it has become clear t h a t we compute there PAs in the Baker sense, i.e., the denominator has nonvanishing constant term. Hence, a comonic normahzation for the denominator is appropriate. This indicates t h a t we should also give a Baker type definition of the mPA problem.
5.6. M I N I M A L PAD]~ A P P R O X I M A T I O N
261
D e f i n i t i o n 5.6 ( m P A - B a k e r , d e f e c t ) Given a formal power series f ( z ) e F[[z]], an integer w C N and an integer a, - w < a < w. Find polynomials q(z) and p(z) such that 1.
ord ( f ( z ) - p(z)q(z) -1) > w + 1
2.
a is minimal in
2=. 2b. 3.
(5.24) (5.25)
deg q(z) < a degp(z) < a -
(5.23)
a
The defect d which is defined as d = m a x { a - deg q, a - a - degp}
(5.26)
is maximal 4.
A m o n g those satisfying 1-3, minimize deg q(z)
(5.27)
The approzimant pq-1 is called a minimal Padd approzimant ( m P A ) of type (w, a) in the Baker sense with minimal index a and defect d. It is denoted as [w, a]. Note that the order condition (5.23) together with the minimality of a requires that q(0) y~ 0. If q(0) were zero, then p(0) would also be zero and thus p(z)q(z) -~ would be reducible, which would mean t h a t a solution with lower a existed, which is impossible because a is minimal. Hence q(0) ~ 0 and we can suppose t h a t it is 1, i.e., we suppose that we have a comonic normalization for the denominator. Under this restriction, the linearized order conditions (5.22) are equivalent with the rational ones (5.23). We rewrite t h e m in view of the fact t h a t now the a value is constant while the w values increase in the following relations. (Again fk = 0 for k < 0.)
0
fo iq-
f-~
f-a+l
f-=+l f-a+2 f-~+2
"
"
"
9
9
9
p
(5.2s)
fc,-a+l iq-
0, q(0)#0
f~-~ where /~ q is the reversed stacking vector of the denominator q(z) which corresponds to a certain w value and which is supposed to have a degree < a. The stacking vector p corresponds to the associated n u m e r a t o r p(z) which has degree _ a - a.
262
CHAPTER
5.
PADE APPROXIMATION
The reader will recognize here a problem related to constrained minimal indices for the sequence ] - { f - ~ + l , . . . } . It is constrained by the extra condition q(0) ~ 0. However, note that although the polynomial z~ -1) is of strict degree a, the polynomial q(z) itself need not be of strict degree a. Since in this case we are working on a a-fine where only w changes, we shall denote the (constrained) minimal indices as a ( w ) i n s t e a d of a ( w + a ; ]). We shall in the rest of this section always mean constrained minimal index if we say minimal index, and the same holds for the mAPs. As in the Frobenius type definition, the minimality of a = a ( w ) impries that at least one of the inequalities (5.24) or (5.25) is an equality and the defect will be caused by a deficiency in either the numerator or the denominator degree, while the other degree is exact. To see the effect of the maximization of the defect, we consider the following example for mPAs without the extra restrictions 3 and 4 of the definition. E x a m p l e 5.6 ( E x . 5.3 c o n t i n u e d ) Consider f ( z ) - 1 + z 4 + z s + z 9 + . . . and choose a = 0 a n d w = 4. The minimal value for a i s a = a ( w ) = 4. Thus not only (1 + z4)/1 and 1 / ( 1 - z 4) have this minimal a, but one can choose e.g., any of the PAs among [0/4], [1/4], [2/4], [3/4] [4/0], [4/1], [4/2], [4/3] [4/4] to be minimal solutions (see Figure 5.2). For the diagonal a = 1, we have the choice among [0/4], [1/4], [2/4], [3/4] and for a = - 1 we can choose among [4/0], [4/1], [4/2], [4/3]. Hence, to select the "simplest" one, we should maximize the defect, which would give us [0/4] for a = 1 and [4/0] for a = - 1 . For a = 0 we stin have the choice between both of the latter candidates. Choosing the minimal denominator degree will give a unique (up to normalization) choice here.
5.6. MINIMAL PADE APPROXIMATION
263
Suppose we fix some a. In fact the computations in Section 5.2 are done for a = 0. A combination of Theorem 1.18 (v = 0), Theorem 1.16 and the Truncation Theorem 5.4 gives the results for a general a-fine. From the theory of minimal indices, we know that the jumps occur for w+a=r at 0 -- too, 2~i
-- ~I,
2~2
-- a2,
999
where the tck represent the successive Kronecker indices of the sequence f - { f - ~ + l , f - ~ + 2 , . . . } . By our convention we have for w >_ - a that a(w) is the minimal index of this sequence of order w + a. To see better what is going on, we show a picture of a general situation in Figure 5.7. It represents a part of the Pad~ table for f. The downward Figure 5.7: Diagonal algorithm
/~k-1
~k
I I
I I --
~k-1
--
(7
dk-1
I
I I ~k
/';k+l
ot k
\
I I
-----,I
,",, I
tgk+l
I
/1 /
'
I
I,
ak+~
~l I I
-- G
--
(7
u
sloping diagonal is the a-fine on which the algorithm progresses for increasing w _> - a . The picture shows two singular blocks crossed by the diagonal.
264
CHAPTER
5.
PADE APPROXIMATION
The first one is entered from the top. The point where the a-line enters this block refers to an w + a value equal to 2,r The left top element of the block is selected and the defect dk-1 in the denominator degree is indicated. The excess in approximation order is ak. Thus we may increase the w value up to w + a = 2~k-1 + ak - 1 = 2,r - ak - 1 and it will not be necessary to change the approximant. This approximant is indeed a mPA of type (w, a) for f because the minimal a(w) = 'r which means that we should look for an approximant which is at the left of the vertical 'ok-1 and above the horizontal 'r - - a and has still an approximation order at least w + 1. It is clear that then the left top element of the block should be selected since this is the one that maximizes the defect within these bounds. When w + a = 2 , ~ k - ak, we have reached the antidiagonal of the first singular block. Now the order condition is not satisfied any more by the left top element. The minimal degree jumps to a ( w ) - 'r The algorithm computes the left top element of the next singular block. Note that the a-line shall always enter the blocks alternatingly from the top and from the left except when it coincides with the main diagonal of the block, in which case the next situation is undecidable. This implies that a positive defect occurs alternatingly in the n u m e r a t o r and the denominator degree, unless the defect is zero. The minimal a ( w ) = ~k now is maintained for all w values from the antidiagonal of the first singular block to just before the antidiagonal of the second block. Once w has crossed the first antidiagonal, the approximant should be found left of the vertical ~k and above the horizontal ~ k - a. The algorithm computes the reduced approximant of the second block, but this is not a mPA yet. For example, when w + a = 2~;k- ak, the PA ['r -a/,ck] is a mPA with (numerator) defect ak, while the reduced element of the second block is the PA [~k - a - dk/,c.k] with a (numerator) defect dk that is smaller (at least in the example of the picture). The latter is a mPA of type (w, a) as soon as w + a becomes 2~;k - d k where dk is the defect of the second block, and it stays being a mPA until w reaches the antidiagonal of the second block i.e., when w + a = 2,r - ak+l - 1. We remark that it is possible to adapt the algorithm to make it generate a mPA at every step. For example, for the picture in Figure 5.7 when w + a = 2,r - ak, the mPA is ['r a/~k], but the algorithm chose to compute the PA [ ~ k - a - d k / ~ k ] instead, which is not a mPA. This choice is the result of the fact that the algorithm takes the orthogonal polynomial ao,k as the next denominator, and this is but one particular choice for the mAP. However, we can add polynomial multiples to it of the previous orthogonal polynomial a0,k-1 up to powers ak - 1. These can be used to decrease the -
5.7.
THE MASSEY
ALGORITHM
265
numerator degree that goes with a0,k down to at least ' ~ k - a - ak which gives a defect of at least ak. If ['r is an element from a larger block, then the defect may be larger than ak. Although we have not given all the proofs, it should be clear from this discussion that the diagonal algorithm for diagonal a does not compute all the mPAs of type (w, a) for w - - a , - a + 1 , . . . , but it does compute all the PAs on the a-line and these are mPAs for certain w values. The PA ['ok - a/,ck] from block k with defect d >_ 0 is a mPA of type (w, a) for all w values satisfying 2,~k - d _< w + a < 2,~k+1 - ak+l. The algorithm could be modified to find also the lacking mPAs for the other w values. We shall not do this here. The next section will describe an atomic version of the diagonal algorithm. This will give several intermediate results because the jump in degree with ak will be composed of ak + 1 elementary steps. But also there we shall not do the extra effort to assure the minimality of the approximant in every step.
5.7
The Massey algorithm
We shall here formulate an atomic version of the algorithm for the main diagonal, which, after some manipulations, will turn out to be precisely the algorithm as proposed by J.L. Massey in the context of BCH decoding [i92]. It was originally given by Berlekamp in Chapter 7 of his book [14] which appeared in 1968. It was given a more practical implementation with shift registers by Massey in [192]. See also Chapter 8 of [193] and Section 5.4 of [62]. It was formulated in a different terminology since it was designed to decode BCH codes. These codes were named after Bose and l~ay-Chaudhuri who published their results in 1960 [20, 21] and Hocquenghem who had his paper published in 1959 [152]. Later on, these codes were generalized by Goppa in 1970 [115]. Among these, the Keed-Solomon codes [207] are probably best known. They are the ones that are used to store the information on compact discs. They are at the same time a generalization of BCH codes and a subclass of Goppa codes. It was only in 1975 that Sugiyama et al. [219] recognized that the Euclidean algorithm was in fact the method to decode Goppa codes. By that time however, the names of Berlekamp and Massey were already associated with the algorithm. That is why it became popular under their name. Suppose we fix some a. In fact the computations in Section 5.2 are done for a = 0 but in the previous section we already gave results for arbitrary diagonals.
C H A P T E R 5. P A D E A P P R O X I M A T I O N
266
We shall thus fix some integer a and walk along this diagonal by using the parameter w, which has the same meaning as in the previous section. Thus we know that the constrained minimal indices (and hence also the PAs we want to compute) jump for w + a - r values at 0 - - too, 2 ~ i
-
-
C~l ~ 2~;2 -- a 2 ,
999
where the ~k are the Kronecker indices associated with the sequence f
-
"[/-~r+1,/-~+2,...}. Let us start with the recurrence formula of the Euclidean algorithm for series in F(z). It follows from the continued fraction in Theorem 5.3 that this recurrence is
aO,k(Z) - ao,k-2(z)'u,k_lCk zak-l+ak + aO,k-l(Z)ak(z)
(5.29)
where ak - - ( s k - a ldiv rk-1)ckz"". Suppose that we have just computed the polynomial a0,k-1, that we know the value of 'r and the value of # + 1 = 2,r - a k - 1 , that is the point where the latest jump in the minimal degree has occurred. Let us denote a0,k-1 as qr for r = w + a = 2,~k-1 (recall that w counts twice as fast as a). Note that in this section we shall use r to count the elements on diagonal a rather than w. This is for notational reasons, because r starts from 0 on, unlike the w which started with w = - a . This implies that the order of approximation is now at least w + 1 = r a + 1. To find the next value of ak, or equivalently, the next value of 'ok, we should inspect the order of the corresponding residual. We have explained at the end of Section 1.6 that this can be done using inner products. We therefore should compute re-
[fr
fr
"" "]qr
where qr is the stacking vector of the current polynomial qr ~r is zero, we do not have to change qr and thus just set qr know that the first nonzero ~r will show up when r - 2tck--ak -Thus we may then compute nk as tck - nk-1 + r 2nk-1 - r Now the updating comes into action. We shall suppose that
As long as - qr We 2~k-1 + a k . nk-1.
ak(z) - qk,o + qk,1 z + . . . + qk,~k Z'~k. We consider an atomic version, which means that the updating with ak of relation (5.29) is decomposed into the following set of updates:
5.7.
THE MASSEY
267
ALGORITHM
r 2tCk - { 2 k 2 n k - ak + 1 2~k - ak + 2
update qr
: :
2tCk
--
q2~,_t-otk_,-lZCtk+ak-tUk-lCk + qk,oqr
qr 1 = qr + qk,1 zq2,~k -,,,h- 1
qk,2z 2 q2~,-otk-1
qr
-- qr +
qr
= qr + qk,~h z~h q2~k--ak--1
We observe t h a t the first step is different from the other ones. It can be given the same appearance by making the following observations. First, we introduce the normalization as we have defined it for series in F(z -1 ) in the normalization Theorem 1.14, but now translated to power series in F{z). As one may well recall, this leads to reversing the polynomials ak and thus, what was a monic normalization for ak in Theorem 1.14 becomes a comonic normalization in the present case. This means t h a t qk,0 = 1 and u k - l c k -- --[~k-2]-l~k-1 where ri, i - k - 2 , k - 1 are the residuals associated with a0,i. Second, recall t h a t # + 1 = 2,~k-1 - a k - 1 when we do the first updating step, t h a t is when r + 1 = 2 , r ak, so t h a t the power z ~k-~+~k is then equal to z r Moreover, q2~k_2 - q2~k_~-~k_~-I = q~," This suggests a more appropriate notation for the residuals. We use rt, instead of ~k-2 since it is associated with qg and for r + 1 = 2,r - ak. For similar reasons, we denote ~k-1 as ~r since it is associated with qr Mixing this all together, we see t h a t the first u p d a t e step can be written as qr
-- qr + ( - - ~ l ~ r 1 6 2
After this step, we can adapt the # value, since from now on, the last j u m p has occurred, not at the old # + 1, but at the r 1 value which has just been processed. Therefore if we now set the new value of # equal to the current value of r we give # + 1 the correct interpretation as the point where the most recent j u m p in the minimal degree has occurred. Using this new # value, we can rewrite the subsequent updates for r = # + i, i = 1 , . . . , ak as qr 1 - qr + qk,i z r and these have the same formal expression as the first step. Now we should find out how to compute qk,i. If we recall t h a t this coefficient serves to eliminate a residual, we easily find t h a t qk,i -- _ ~ 1 ~ r
where for a r b i t r a r y r and a - 0, ~r denotes the coefficient of zr in the residual associated with qr for the series ~_,~=1 fk zk. T h a t is the coefficient
CHAPTER 5. PADE APPROXIMATION
268
of z r in the series fqr In general, for diagonal a, this is the coefficient of z r in the expansion (~k~176 fkzk)qr As explained at the end of Section 1.6, this corresponds to
~r [fr
fr
fr
...]qr
Because qr has degree at most equal to the constrained minimal index, a(r which is the current Kronecker index 'ok, we know the number of terms we have to evaluate in the inner product. Apart from the starting values, we have now proved the algorithm of Massey which is described in Algorithm 5.1. Algorithm 5.1" Massey Given f0, fl, f 2 , . . . , fk - 0 for k < 0 and a C Z Set # - - 1 , qu(z)- O, a ( ~ ) - 0 and r t , - 1 Set r qr 1 and a ( r 0 for r 0,1,2,... ~r x-'~(r m z...,k=O f r 1 6 2
if ~ r qr
0 then - qr a ( r + 1 ) - a ( r
else qr if r + 1 < a(r + else c~(r g-r
qr - ~-~l~r162 2a(r t h e n 1) - a ( r 1)- r
1 - a(r
endif endif endfor
We have denoted in the algorithm qr - z-~k=0 x-"~(r qk,r z k . Note also that the algorithm tests whether the residual ~r is zero or not. If it is zero, nothing has to be done: either the zero occurs during the updating cycle, where some residual happens to be zero accidentally, or it happens in the cycle after the updating steps have been completed where the number of zeros determines the next value of ak. When it then happens that a residual is not zero, it will depend on whether it is still during the updating cycle, in which case the updating just continues, or whether it happens after the
5.7.
THE
MASSEY
269
ALGORITHM
value of ak has been found, then one has to do the first of the updating steps, which is kind of special. The distinction between both can be made by checking whether r 1 <_ 2 a ( r or not. If r 1 > 2a(r then the first updating step is performed, which is formally the same as the other updating steps, but besides the updating also the new value of # has to be installed and the jump in a ( r should be made. It remains to check the initial conditions, but this is a trivial m a t t e r . E x a m p l e 5.7 ( E x . 5.3 c o n t i n u e d ) We choose again the example f ( z ) 1 + z 4 + z s + z 9 + . . . with a - 1. The initializations are /~---1, q_1--0, r
q0-1,
r-1
a(-1)--O,
-- 1
a(0)-0.
The first step gives r0 - 1, so that ql -- q0 -- ( r 0 / r - 1 ) q - 1 -- 1. Because 2 a ( 0 ) -
0 < 1, a ( 1 ) i s computed as
a(1)-r
I
a n d / z - 0. The next ~r - 0 for r - 1,2,3, also qr values. Then r4 - 1 ~ 0 and we get an u p d a t e qs(z) -
q4(z)- (r4/ro)Zr
1 and a ( r
-
-
0 for these
I - z 4.
We find 2a3 - 2 < r + 1 - 5, and we have the updates a(5) - r + 1 - a(4) - 4 and # - 4. For the next step we have again ~5 - 1 with u p d a t e q6(z)- q5(z)-
(1/1)zq4(z)-
( 1 - z 4) - z -
but because 2 a ( 5 ) - 8 > r + 1 - 6, we keep a ( 6 ) For r 6, we get r6 - - 1 , giving q7(z) - q6(z)- [(-1)/1]z2q4(z)
and
-
1-
1- z-
a(5)-
4.
z + z2 - z4
z4
270
CHAPTER
Then f ~ -
5.
PADE
APPROXIMATION
1, so that
qs(z) - q T ( z ) -
(1/1)z3q4(z)
-
1-
z + z2 - z3 - z4
and still 2 a ( 7 ) - 8 _> r + 1 - 8 and therefore a ( 8 ) - a ( 7 ) - 4. The last step of this example gives rs - - 2 with updates qo(
) -
qs(z)-
-
-
z + 2
-
+ z
It just happened here that the denominators computed are the denominators of the PAs that border the 4 by 4 singular block from the right. Hence we have at every stage a mPA, however this is not the rule in general. The algorithm, as we gave it here is the same as the one given by J.L. Massey in [192] with the exception of the initial condition. He chooses the starting value to be q-1 - 1 instead of our q-1 - 0. It is our choice that generated the PAs in the previous example. Massey's initial condition does not give all these PAs. Thus the algorithm of Massey, when used in the context of Pad~ approximation is a simplified (not computing the numerator polynomials) atomic version of the diagonal computations and therefore a variant of the atomic Euclidean algorithm. Note that it just computes the denominators and the necessary residuals ~ are found via inner products. We have given it in that form to make it resemble the original algorithm of Massey, but it will be clear that it is not difficult to formulate an extended version which simultaneously updates the numerators and the residuals too. These satisfy the same recurrence relations as the denominators and we thus only have to find appropriate initial conditions for them. In the context of shift-register synthesis of BCH decoding, Massey was not interested in these other quantities. Only the denominator being a minimal annihilating polynomial was important. The problem there is to find for a given sequence {fk}k>0 the numbers { a k } ~ = l such that -
j-a,a+l,...
_
k--1
with a as small as possible. The Massey algorithm (for a - 1) does compute the coefficients ak and the minimal number a needed on a recursive basis, i.e., they are rechecked and updated when needed as the data fk become available.
Chapter 6
Linear systems and partial realization This chapter will serve to introduce several concepts from linear system theory and to give an interpretation of the previously obtained results in this context. Since it is our experience that specialists in approximation theory are not always familiar with the notions and the terminology from linear systems, we think that it is worth introducing the basic concepts. Eventually we shall work with and apply our results to very specific systems that can be defined quickly. However, we think that then the meaning of certain assumptions will be lost and in that case one is playing with mathematical objects that haven't any physical interpretation. This is what we want to prevent and therefore we start with the most general situation and introduce several concepts at the lowest possible level. We hope that this will give the reader a grip on what all the terminology means. Of course we restrict ourselves to the bare minimum since we do not have the ambition to rewrite any of the excellent books on linear system theory that already exist. The reader is warmly recommended to consult the extensive literature [36, 101, 165, 169]. Linear system theory is a place where many mathematical disciplines meet. In harmony with the previous chapters of the book, we shall give a rather formal algebraic treatment. So most of the definitions will be given for a commutative field. However at a certain point this will not be possible anymore. We then switch from formal series and rational forms to functions of a complex variable. In contrast with this relatively extensive introduction, our interpretation of the results of previous chapters will be relatively short. Most of the theorems will be just reformulations of earlier ones in a different terminology 271
272
CHAPTER
6. L I N E A R S Y S T E M S
without needing a new proof. Other results, like the theory of orthogonal polynomials with respect to a Toeplitz moment matrix, as we have discussed it, will be definitely too thin to serve system theoretic demands. In both cases, the discussion will be brief.
6.1
Definitions
In this section, we shall introduce the notion of a hnear system and several ways to represent them. To give the reader insight into these mathematical objects, called hnear systems, we shall develop some results without demanding much other mathematical background. We start by defining what we mean by a signal.
Definition 6.1 (signal) Let F be a field and ~ C_ R a time set.
Then a scalar signal is a f u n c t i o n defined everywhere on ~ with scalar values in F. Similarly, a vector signal is a f u n c t i o n ~ ~ yr and a m a t r i z signal is a f u n c t i o n ~ ~ Fpxm with the integer numbers p, m > O. The set of all possible signals is denoted as S.
Note that S is a vector space over F. It is in fact equal to 1~, the set of mappings from lr to F (or of (IP') ~ for the vector signals). In this chapter, we shall only work with scalar signals; if not, this will be stated explicitly. Two choices for the time set 11are of particular interest. When ]I - R, the set of real numbers, we have a continuous time set and the signals are continuous time signals or continuous signals for short. On the other hand, also 1[ - Z, the set of all integers, is important. This is a discrete time set and the signals are called discrete (time) signals. Now we can introduce the concept of a system with a single input and a single output. It is basically a device that maps certain (input) signals into (output) signals.
Definition 6.2 ( S I S O s y s t e m ) A single-input single-output s y s t e m or SISO s y s t e m is a triple (U, Y, 7( ) where U C S is a set of input signals, Y C S is a set of output signals and 75 9 U ~ 3{ is an input-output map. connects with each input signal u C U an output signal y - TI u C 3{.
7(
This is a rather general definition. In the sequel, we shah impose further limitations onto the set of systems to get more interesting subsets.
Definition 6.3 ( l i n e a r s y s t e m ) If the input and output sets U and Y are subspaces of S, then we call the system (U, u 7"l ) linear if 7-[ is a vector space homomorphism.
Thus
273
6.1. D E F I N I T I O N S
Because we have an ordering on the time set, we can speak about the past, the future and the present with respect to a certain time r. D e f i n i t i o n 6.4 ( p a s t , p r e s e n t , f u t u r e ) For every ~" E ~ we can partition the time set into
If we consider the singleton { r } as the present, then we refer to the first set as the past and to the last set as the future. The signal space S can be written as a direct sum of three subspaces
where S~+ is called the space of past signals: it contains all signals that are zero on the present and future, S r- is the space of future signals" it contains all signals that are zero on the present and past and S~ is the space of present signals" it contains all signals that can only be nonzero for t - 7. The projectors onto the past (present, future) signals are denoted as IV+,
II ).
e S,
H
) the
(p,
ent,
of the signal s with respect to 7. Occasionally we shall use combinations: e.g. (II~_ + II~)s is called the past and present of s and it will be abbreviated as II~_0s etc.
Note that we have indicated the past by a "+"-sign, the present by a "0" and the future by a "-"-sign. The reason for this will become clear later on when we define the z-transform following the engineering convention. We introduce now the concept of causality. D e f i n i t i o n 6.5 ( c a u s a l i t y ) Let the space of input signals U be projection invariant, i.e. VT C ~ " IIT"U C U, where IIT" is any of the projectors introduced above. A system is called causal or dynamical if with respect to all possible times T the present input signals can only influence present and future output signals" 7"[ H~U C II~_u for all 7" C ~. A strictly causal system is a system where present input signals can only influence future output signals, i.e. 7-l II~U C IV_u Note that causalitycan be expressed in other, equivalent ways. A system is causal if and only if two input signals with the same past, have corresponding outputs with the same past" VT E ][,
VUl,U2 C U
7"
7"
9 I I : u I -- I I + u 2 ==~ H + ~ ~ u I -
7"
H+']-~ U 2.
274
C H A P T E R 6. L I N E A R S Y S T E M S
Equivalently I I ~ _ u - 0 => II~_~ u - 0 or 7-/II~_U C_ II~_Y. Yet another way to describe causality is II~.~ II~_ - 0 for all ~" C ]I. In the sequel we shall use the latter expression, but the other interpretations may be useful too. Now we define time invariance of a system. D e f i n i t i o n 6.6 ( t i m e i n v a r i a n t ) Let the time set ~ and the space of input signals U be shift invariant, which means VT, t C ~ " t - 7 C ~ and Z - ~ U C_ U, where Z - r is the backward shift or delay operator, defined on S by Z - r s ( t ) s(t - T), s E S. A system (U, u 7-/) with a shift invariant set U of input signals is called a constant or time invariant system if Z -~" and 7[ commute for all 7 E ~, i.e. 7-[ Z - ~ u - Z-~7-l u. Note that the continuous time set R as well as the discrete time set Z are shift invariant. To obtain further results, we shall limit the discussion for the m o m e n t to discrete linear time invariant causal systems. The time set wiU now be ]I = Z. Moreover, the set of input and output signals will be equal to the set of all possible functions Z ~ F: U = u = S = F ~. A parallel t r e a t m e n t can be given for continuous systems, but we do not want to complicate the exposition more than strictly necessary. Note that in the discrete case, the signal space S can be described in terms of the canonical basis consisting of shifted impulse signals. D e f i n i t i o n 6.7 ( i m p u l s e s i g n a l ) The impulse signal ~ is the Kronecker delta function at O, i.e., ~k = 1 for k = 0 and zero elsewhere. W i t h the impulse signal and the shift operator, we can describe the signal space S as span~{Zk6 9 k 6 Z}. It will be useful in certain instances to describe a system in an isomorphic framework. The isomorphic image of the signal space S will turn out to be the space of two-sided formal Laurent series, which is defined as follows. D e f i n i t i o n 6.8 ( t w o - s i d e d f o r m a l L a u r e n t s e r i e s ) The space of the twosided formal Laurent series F(z, z -1) is the set of all mathematical objects of the f o r m ~ = - o ~ sk z - k with sk E F. Clearly F(z, z -1) is a vector space over F. It is isomorphic to the space S of discrete signals. The isomorphism is given by the z-transform. D e f i n i t i o n 6.9 ( z - t r a n s f o r m ) The z-transform s - ~ of a discrete time signal s C S is defined as the two-sided formal Laurent series whose coefficients are the signal values at the discrete times, i.e. ~.s - ~k=-oo+c~ skz -k with sk - s ( k ) , k C ~ - Z. It is clear that also the inverse z-transform
6.1. D E F I N I T I O N S
275
is defined for all two-sided formal Laurent series generating a signal whose values at the discrete times are the coefficients of the formal Laurent series. So we have a one-to-one relationship between s and ~. s is an isomorphism between S and F(z, z - l ) 9 S - / Z S - F(z, z-l}. The inverse z-transform is a generalization of what we described previously as a stacking operation. As we associated with a polynomial or a formal series a column matrix which was called its stacking vector, the inverse z-transform does basically the same thing. To describe the properties of a discrete linear time invariant causal system, we can work with the space S, but it is also possible to describe it in its isomorphic image S - F(z, z -1). This has been given a name. D e f i n i t i o n 6.10 ( t i m e d o m a i n , f r e q u e n c y d o m a i n ) The signal space S is called the time domain and the space of two-sided formal Laurent series is called the frequency domain. The signal space S is shift invariant with respect to the shift operator Z. That we have represented the shift operator by Z is not a coincidence because the corresponding operator Z in S defines a multiplication with z. The reader will recognize it as the down shift operator for stacking vectors. We shall often write the multiplication operator ,~ simply as z. Thus S being shift invariant means that Vs E S,VT E I[ : s
= z~s
which can also be expressed a s / : Z - 2 ~ o r / : - 2 - 1 ~ Z - ZZ:Z -1. Not only the shift operator in S has an isomorphic equivalent in S. We can generalize this to any operator acting on S. We shall use the convention that an operator T acting on S has a corresponding operator in S which is denoted by T. They are related b y / : T - T/:. This holds especially for the input-output map 7-/ and the projection operators we introduced before. Adding the shift operation to the vector space structure of F ( z , z - 1 ) , we see that the multiplication of a two-sided formal Laurent series with a polynomial is well defined, because each coefficient of the result can be computed with finitely many operations in F. Multiplication with a formal Laurent series, two-sided or not, is in general not defined. Recall however that multiplication of elements in the subspaces of formal Laurent series F(z} or F(z -1 } are well defined. In the frequency domain S - F ( z , z - 1 ) , the properties of our linear system can be reformulated as linearity:
is a linear operator on F(z, z - l ) .
C H A P T E R 6. LINEAR S Y S T E M S
276
" T ~ II"T0 _ - 0 f o r a l l T E l I II+
causality:
7:t 2 - 27:t.
time invariant"
Note that time invariance implies Z-17-/Z = 7-I which means that with respect to the canonical basis in S, the operator 7/ must have a matrix representation with a bi-infinite Toeplitz structure. Suppose that we are considering the past, present and future input and output signals with respect to a certain time T = k C lI = Z. Then in the frequency domain, the past, present and future subspaces correspond to
n~
_
~-k+~F[[z]] aoj F~
~ik__.~
__
z _ k _ 1 F[[Z_ 1 ]]
de__fFk (Z_ 1 ).
Note that the following property holds for the projection operators
HI + H0~ + rt ~_ - z
(6.~)
and for II any of the possible projection operators, we have
Hk -
Z - k I I ~ k.
(6.2)
In general one can write for s = {sk}k C S that IIko_3 -- B k Z - k ~
-3l-
IIok_ +1 s.
(6.3)
Suppose that we subdivide the signal space S into the direct sum S-
S~_ | Sok_ with S~_ - H~_S and Sok_ - IIok_S.
Accordingly, the effect of an operator can be split up in four parts, e.g. for the operator 7-/:
-
+
+
no
+ no
no _.
(6.4)
For example II~_7-/II0k_ maps the present and future into the past. For a causal system, we know that this is zero. It is of practical importance to describe what the reaction of a system will be at present and in the future when the system is driven by an arbitrary input signal. Thus we should be able to describe the effect of II0k_7-/. As we just said, for a causal system, the second term in the decomposition - II~_7/II~_ + II~_7/II0k_ gives zero anyway. If the input signal has
II~_7-I
6.1. DEFINITIONS
277
no past, then the so called impulse response of the system is sufficient to describe its full effect. However, if the input signal has a past, then we shall need the notion of state of a system. The state condenses in some sense the complete history in a compact form. The state depends on the past of the input signal and the set of all possible states a system can reach at a certain m o m e n t is the state space. These are the i m p o r t a n t concepts to be introduced next. We start with the impulse response. For a shift invariant system, the output can partly be described in terms of the impulse response which is the output generated by the system when driven by the impulse signal. D e f i n i t i o n 6.11 ( i m p u l s e r e s p o n s e ) The impulse response h of a discrete time system is the output signal produced by the system when the impulse ~ is given as input signal, i.e. h = 7-[ ~ or in the frequency domain formulation h - ~ ~ - ~ 1. The impulse response has the following properties. 9 If the system is time invariant, we know that zh - 7~ Z~ Zh = ZTI ~ = 7"l Z~.
-
7~ z
or
9 If the system is linear and time invariant, ]~(azk+flz l) - ~ (azk+flzt), or ( Z k h a + Zlht3) - 7"l ( a Z k + flZt)6, 9 If the system is causal, we know that II~]z - 0, i.e., ]z E F[[z-1]] F_l(Z -~) or h C S~ If the system is strictly causal, we know t h a t l~I~0]z- 0, i.e., ]z e F0(z -1 ) or h e S ~ Now suppose that u C Sok_ has a zero past. Then H ~ u - 0. If moreover the system is causal, then we can see from the decomposition (6.4) t h a t
y=7-lu
-
iIo _
=
Ilok_g
=
Ho
=
h.(II0k_u)-
h.u
where 9 denotes the convolution operator. D e f i n i t i o n 6.12 ( c o n v o l u t i o n ) If h and g are two discrete signals, then their convolution f = h . g is defined as another signal given by
f(j) - ~ iEz
h ( i ) g ( j - i),
278
C H A P T E R 6. L I N E A R S Y S T E M S
whenever this exists for all j E Z.
Note that these sums are infinite and need not converge in general. However when both h and g are only half infinite, i.e., when both have a zero past or future for some time instant, then these sums are actually finite and the convolution is then well defined. For example, in the situation above, h C S ~ and u C S0k_, so that their convolution y - h . u is only nonzero for time instances > k and J j f(k
+ j) -
h(i)
(k + j -
i=0
-
h(j -
+
j>o.
i=0
Thus y E S0k_. We may now conclude that we have proved the following theorem. T h e o r e m 6.1 Let (S,S, 7-I ) be a discrete linear time invariant causal system with impulse response h - 7-[ 5. For given k C Z, let u C Sko_ be an input signal with zero past. Then the output signal y = 7-[ u is given by the convolution of h and u: y-
7-l u -
h.u
k
C So_.
In the frequency domain, this translates into the following. Let it C S ko_ = Fk-1 (z -1 ) be given and 7~ a linear shift invariant map on ~ - F(z, z -1 ) with impulse response h - 7~ 1 e F_I (z -1 ). Then ~1 - 7~ iz is given by the product
f/-- ~ ~t-- h~ E Fk-l(Z -1). In other words, 7~ represents the multiplication operator with h in S.
Note that with respect to the canonical basis in S0k_, the convolution operator II0k 7-/II0k_ for a causal system is represented by a lower triangular Toeplitz matrix whose nonzero entries are the impulse response values. However not all input signals are in S0k_, so how about the contribution of the past component H~_u C S~_ to the present and future output? That is IIok 7-/IIku. To describe this we shall need the notion of state. D e f i n i t i o n 6.13 ( s t a t e , s t a t e s p a c e ) The state at time k of a discrete system with input signal u is defined as the present-future component of the output signal generated by the past of u which is shifted to time O:
The set of all possible states at time k is called the state space at time k and denoted by S k : s
-
c s~
6.1. D E F I N I T I O N S
279
We have the following properties. T h e o r e m 6.2 If the system is time invariant, then the state space S k is independent of the time k. If the system is linear, then the state space forms a vector space. P r o o f . We can rewrite the state space for time k as
s~-
z~iio~_~ ii~s =
Z kZ-klIo_Zkl-I Z-kII~
= n o_ ~ n~ z~s = n~
H~
For the second line we used (6.2) and for the third line the time invariance of the system while the last line follows from the shift invariance of S. The fact that the state space is a vector space is obvious. [3 Because the state space is independent of time k, we shall drop the superscript k from the notation. We are interested in systems with a finite-dimensional state space. D e f i n i t i o n 6.14 ( f i n i t e - d i m e n s i o n a l s y s t e m ) A discrete linear time invariant system is called finite-dimensional if the state space is finite-dimensional. Hence, for a discrete finite-dimensional linear time invariant system, we can span the complete state space by some finite basis of signals from So_, e.g. yl, y 2 , . . . y n if the state space has dimension n. We can now completely describe the effect of II0k_H with the help of the state space basis and the impulse response. We get the following result. T h e o r e m 6.8 For a discrete linear time invariant causal finite-dimensional system with state space dimension n and state space basis signals yl, y 2 , . . . , yn, and impulse response h, we can write the present-future component of the output signal at time k as IIko_"]-~ "Ul,- z - k [ y
~ith z-k[y ~ y ~ . . . r
1 y2 . . .ynjT, k ._[_h .
(IIko_'U,),
x k C:_.~ ' ,
n~o_n H~_~.
P r o o f . We know from Theorem 6.1 that II0k~ u
- n0~_ n ~0~_= + n0~_ n n~ = h 9 (n0~_=) + z -~z~n0~_n n~
(6.5)
C H A P T E R 6. L I N E A R S Y S T E M S
280
Because the system is finite-dimensional, we can interpret in the second t e r m ZkIIko_~ II~.u ~s a state and write it as a linear combination of the state space basis signals y l , . . . , yn determined by the coefficient vector xk. D
Note that, given the state space basis, the vector zk characterizes the state of the system uniquely. Note also that the state space is independent of time, but the vector zk is not. D e f i n i t i o n 6.15 ( s t a t e v e c t o r ) Once we have chosen a state space basis for a discrete linear time invariant causal finite-dimensional system, we can write the state at time k of the system uniquely as a linear combination of these basis vectors. The vector containing the coordinate coefficients is called the state vector x k. To describe the time dependence of the system, we should be able to describe the evolution of the state, or equivalently, of the state vector. In the next theorem, we give the relationship of the next state vector Xk+l and the output value yk in function of zk and uk. Such a description of a system is called a state space description. T h e o r e m 6.4 ( s t a t e s p a c e d e s c r i p t i o n ) For every discrete linear time invariant causal finite-dimensional system the present-future output II'o_ TI u for any input signal u, can be described by a state space description, characterized by a quadruple (A, B, C, D), as follows. Given an initial state xi, it holds for k >_ i that zk+~ -
Azk +
A E ]~x,., B E ]W xl
Buk,
CE]~
Yk -- cTT, k + Duk,
xl
DEF.
Vice versa, every quadruple (A, B, C, D) of the given form can be interpreted as the characterization of a state space description of a discrete linear time invariant finite-dimensional system. For a strictly causal system, i.e. with D = O, we can characterize the state space by a triple (A, B, C) instead of a quadruple. A is called the state transition matrix. P r o o f . If we can prove the theorem for i - 0, the same result will also be valid for i r 0 because the system is time invariant. We start from (6.5) in Theorem 6.3
n0 _n
- H0 _y - z-
[y
+ h,
6.1.
281
DEFINITIONS
Now we use property (6.3) to rewrite
1Io~_ y
-
y~ z -~ ~ + no~_+a y
no~_.
-
. ~ z - ~ 6 + Ho~_+'.
II~ yJ
-
yJoS + Il lo_ yJ , j - 1 ,
-
ho~ + n~_h.
n~
The signals Z I I ~ _ y J and Z I I ~ h we can write
. . .,n
are both in the state space and therefore n
znx_yJ
E
yiaij,
aij C F
i=1 n
ZH~_h
E
Yibi,
bi G F.
i=1 Using all this we get yk z -k ~ + Hko+_~U
z-~[yo~ yo~...y~]~ + Z - k - l [ y l y2 . . . y ~ ] A z k +
(h05 if- z - l [ y 1 y2 . . . yn]B ) , (,U,k Z - k 5 .4_ ii0k+lu) where the components of A 6 F nxn and B 6 F nxl are aij and bi respectively. Taking the projection IIok results in
Yk -- C T x k + D u k , with C T - [ y ~
y ~ . . . y'~] 6 ~lxn and D - ho 6 F. The projection IIk gives
no~_+ly
z - k - 1 [yl y2 . . . y " ] A z k + Z - k - 1 [yl y2 . . . y " ] B u k +
h 9 (IIok_+lu) which should be compared with
Ho~+~y - z - ~ - i [ y x y ~ . . . y " ] ~ + ~ + h 9 (no~_+i=). Because the signals yl, y 2 , . . . , yn are linearly independent, we can write zk+~ - A z k + B u k .
This proves the first part of the theorem.
282
C H A P T E R 6. L I N E A R S Y S T E M S
It is clear that every quadruple (A, B, C, D) characterizes a state space description of a discrete hnear time invariant causal system. We prove t h a t this system is finite-dimensional. The state space is the set of the presentfuture output signals generated by all possible past input signals. In this case, the past is characterized by a vector x0. All possible present-future output signals with initial vector x0 (x0 represents the past input) and present-future input values uk - 0, k _ 0 have values yk - C T A k - l x o . Therefore, any state y can be written as a linear combination of the states y' with y~ the ith component of the row vector C T A k-1. Hence, the system is finite-dimensional. Note that the dimension of the state space is not necessarily equal to n. It could be smaller. The system is strictly causal if and only if D - h0 - 0. [:3 The state space description is especially interesting when we want to know the present-future output signal from a certain time on, e.g. time 0. The influence of the past input signal is then condensed into the state vector x0. In the following theorem, we shall write the impulse response in terms of the state space description. T h e o r e m 6.5 Given the state space quadruple ( A , B , C , D ) of a discrete linear time invariant causal finite-dimensional system, the impulse response h of this system can be written in terms of this quadruple as h0 hk
-
or in the frequency domain h -
D C T A k- ~B,
k>0
~.h can be written compactly as
h(z) - D + C T ( z I -
A)-IB
where the operations have to be carried out in the field ]~(z-1).
P r o o f . The impulse response is h - 7-/6. Knowing t h a t the system is causal, and I I ~ - 0, the state at time k - 0 is II0~ I I ~ - 0. Hence the state vector x0 - 0. From the state space description theorem, we then find easily that yo-
D,
xl - B,
yt - C T B ,
z2 - A B ,
y2 - C T A B , . . . .
For the induction step, suppose that xk-1
_
A k-
2
B
and
hk-1
_
Yk-1
_
C T A k- 2 B, for some k _ 2,
6.1. D E F I N I T I O N S
283
then we get xk-
Axk_l - A k - ~ B
and
hk - Yk
-
c T T, k -- C T A
k-lB.
Hence, we have proved the first part of the theorem. To prove the second part, we use the fact that if the operations are carried out in the field F(z -1), we get (zI-
A) -1
-
I z -1 + A z -2 + A2z -3 + A3z -4 + . . . .
Hence D + CT(zI-
A ) - I B - D + C T B z -1 + C T A B z -2 4- C T A 2 B z -3 + " ".
Another way to prove the second part of the theorem starts from the state space description, which we can write using the z-transform as follows z ~ , ( z ) - A~,(z) + B'~(z) ~)(z) - cT~,(z) + Diz(z) where & denotes the z-transform of a vector signal. Eliminate & and use the fact that u - 6 ( ~ - 1) and xo - 0 to get h - D + C T ( z I - A ) - I B . [:] From now on, we assume that the systems we are working with are discrete linear time invariant causal and finite-dimensional. Moreover we assume that the field F is commutative. Suppose that it is strictly causal (D - 0), then the impulse response can be represented in the frequency domain by a polynomial fraction h(z)-CT(zI-A)-IB
-
C T adj(zI- A)B d e t ( z I - A)
where a d j ( z I - A) represents the adjugate matrix containing the cofactors of z I - A. Note that the degree of the denominator d e t ( z I - A) is equal to n with n the dimension of the state transition matrix A. The degree of the numerator C T a d j ( z I - A ) B is smaller than n. Conversely every polynomial fraction with the degree of the numerator smaller than the degree of the denominator (a strictly proper fraction) can be interpreted as the frequency domain representation h of the impulse response as stated in the following theorem. ^
284
C H A P T E R 6. L I N E A R S Y S T E M S
T h e o r e m 6.6 ( p o l y n o m i a l f r a c t i o n r e p r e s e n t a t i o n ) Given a state space triple (A, B, C) of a system, then its impulse response is in the frequency domain given by the polynomial fraction h(z) - C T ( z I - A ) - I B -
C r adj(zI- A)B det(zI - A)
with the degree of the numerator smaller then the degree of the denominator the latter being equal to the dimension of the state transition matriz A. Vice versa, every polynomial fraction pq-1 with deg p < deg q = n can be interpreted as a polynomial fraction representation in the frequency domain of the impulse response of a system. P r o o f . The first part of the proof was given above. To prove the second part, we shall explicitly give one of the possible state space representations of the system with polynomial fraction representation pq-1. We know that the impulse response satisfies ]~ - ~ = 1 hkz -k - pq-1 E F(z -1 ). Note that the system does not change when we normalize the polynomial fraction representation by making q monic. A possible state space representation is given by A-
F(q),
B-
[l O O...O] and C T - [ h 1
h2...h,.,],
with n - deg q and where F(q) is the companion matrix of q (see Definition 2.3). The reader should check that hk - C T A k - I B , k > 0. Note that the C-vector which consists of the first values of the impulse response h can be computed as follows 1 q,.,-1 "'" 0 1 ... C T-
[hl h 2 " " h n ] - [P,.,-1 p,.,-2"" "P0]
9
.
9
o
0
0
"'"
0
0
'
ql q2
--I
qn-1
1
with p ( z ) - ~k=o ,.,-1 Pk zk and q(z) - ~'~=o qk zk The state space description that we have given here is called the controllability form by Kailath [165, p. 51]. KI With the state space description, two other concepts are easily introduced. D e f i n i t i o n 6.16 ( o b s e r v a b l e , r e a c h a b l e ) If ( A , B , C) is a state space triple representing a system with state transition matriz A of dimension
6.1. DEFINITIONS
285
n, then we call (A, C) and the system itself observable if the observability matrix 0 - [C A T c ... (AT)n-Ic]T has full rank. The couple ( A , B ) and also the system itself is called reachable if the reachability matrix ~ = [B A B ... An-IB] has full rank. The matrix H = O ~ is the n x n Hankel matrix consisting of the first impulse response entries. It is called the Hankel matrix of the system. With respect to the appropriate basis, it is the matrix representation of the operator II0k_?-/II~_ mapping past into present and future. The matrices O and T~ appeared already as Krylov matrices in our discussion of the Lanczos algorithm. The reader should recall that if rank O = n then it has reached its maximal possible rank. Extending the Krylov matrix further has no influence on the rank. The same observation holds for the rank of T~. Note that ~t(z) - p(z)q(z) -1 - C T ( z I - A ) - I B when we consider z as a variable can also be interpreted as a rational function and not just as a formal ratio. According to Definition 3.2 the state space triple (A, B, C) is then a realization of the rational function pq-i. This leads us to the definition of the transfer function which we shall also give for the case where D might be nonzero. D e f i n i t i o n 6.17 ( t r a n s f e r f u n c t i o n , r e a l i z a t i o n ) /f ( A , B , C , D ) is the
state space quadruple of a system, the rational function f(z) - D + CT(zI- A)-IB is called the transfer function of the system. Describing a system with data like the transfer function, the impulse response or the polynomial fraction description is often called an external description, while the state space representation is called an internal description. It is clear that each realization of a rational transfer function is another name for a state space description of the system whose impulse response can be found from a polynomial fraction representation for the rational function. Therefore, we can speak about the (minimal) realization of a system as being the (minimal) realization of the transfer function of the system. We can also speak about equivalent realizations for the same system as defined by Definition 3.3. For completeness, we repeat it here. D e f i n i t i o n 6.18 ( ( m i n i m a l ) r e a l i z a t i o n ) If f ( z ) is a rational transfer function of a system and B , C C F"xl, D E F and A C Fnxn such that f ( z ) - D + C T ( z I - A ) - I B . Then we call the quadruple ( A , B , C , D ) a realization of f(z). If n is the smallest possible dimension for which a realization ezists, then the realization is called minimal.
CHAPTER 6. LINEAR SYSTEMS
286
Two realization quadruples ( A, B, C, D) and ( ~i, B, C, D) are equivalent if they represent the same transfer function D e f i n i t i o n 6.19 ( E q u i v a l e n t r e a l i z a t i o n s )
f(z) - D + C T ( z I - A ) - I B - b + C T ( z I - /~)-1 and have the same size. Two realizations related by D-D,
C-
FTC,
JB- F - 1 B
and
tt- F -1AF
with F some nonsingular transformation matrix are equivalent and if the realizations are minimal the converse also holds [165, Theorem 2.4-7]. Translating Definition 3.4 of the McMillan degree within this context gives us
The McMillan degree of a system is the dimension of the state transition matrix A of a minimal realization quadruple (A, B, C, D) for the system. It is clear that the minimal denominator degree of all polynomial fraction representations of the system is also equal to this McMillan degree. Hence, the McMillan degree is equal to the denominator degree of a coprime representation of the transfer function.
D e f i n i t i o n 6.20 ( M c M i l l a n d e g r e e )
The McMillan degree can be seen as a measure for the complexity of a system. Indeed, all systems having McMillan degree n can be represented by a coprime polynomial fraction representation pq-1 with deg p < deg q = n. It is a well known fact that a minimal realization can be characterized in several ways. For example. T h e o r e m 6.7 Let the triple (A, B, C) be a state space triple for a system
with the dimension of A equal to n. Let T~ and 0 be the reachability and observability matrices and C ( z I - A ) - I B the transfer function. Then (A, B, C) is a minimal realization iff one of the following conditions is satisfied. 1. C ( z I - A ) - I B is irreducible, i.e. d e t ( z I - a) and C a d j ( z I - A)B are coprime. 2. rank 7~ - rank 0 = n. I.e., the system is reachable and observable. 3. rank OT~ - n. P r o o f . See for example in Ka~lath [165, p. 126-128].
6.2. M O R E DEFINITIONS A N D P R O P E R T I E S
6.2
287
More definitions and properties
Most of the derivations in the previous section were given for discrete systems. A similar route can be followed for continuous ones. However, where the discrete case allows easily a formal approach, since we are working with formal series and sequences, this is much more difficult to maintain for continuous systems. Since we want to include again continuous time systems in the discussion, we shall suppose for simplicity that our basic field F is the field of complex numbers C. The series we shall consider are now in a complex variable and do (or do not) converge to complex functions in certain regions of the complex plane. In the same way as for discrete time systems, we can define the continuous time impulse signal as a generalized function 6 (a Dirac impulse) satisfying
f+f
6 ( t - T)f(t)d(t)
f(~').
The transform which is the equivalent of the discrete z-transform, is the Laplace transform of the signal. We shall not build up the theory for continuous systems from scratch as we did for the discrete time case, but immediately summarize the most important results. The reader should consult the literature if he wants to know more about the derivations. Although not really necessary, we shall suppose that all the signals we consider in the sequel are zero for negative time. If the system is causal, this means that also the initial state at time zero will be zero. So we come to the following characterization of a system. Some authors see it as the actual definition of a system.
The state space description of a linear time invariant causal finite-dimensional system has the following
D e f i n i t i o n 6.21 ( s t a t e s p a c e d e s c r i p t i o n )
form: s:(t)
-
Az(t)+Bu(t),
y(t)
-
c T x ( t ) + Du(t),
~(o)
-
O,
t E ~,
BE][~"~,xl
A E F "~xn, C e F ~xl
,
.D E l~l x l ,
t>_O.
As before, u is the input, y is the output and z is the state vector. For a discrete time (d.t.) system the operator S represents the shift operator Z, i.~. s=(t) - z=(t) - =(t + 1). Fo~ a ~ o ~ t i ~ o ~ tim~ (~.t.) ~y~t~m S ~ t ~ ~ for differentiation Sx (t) - d x (t). Let s denote the Laplace transform in continuous time or the discrete Laplace transform (i.e. z-transform) in discrete time. This means ] - s f
CHAPTER 6. LINEAR SYSTEMS
288
depends on the frequency domain variable z, in the following way oo
:(z) - ~ I ( t ) z - '
(d.t.)
t=0
-
Jo"
e-*tf(t)dt
(c.t.)
whenever the right-hand side makes sense. In the frequency domain, we can write the state space description as z~(z)
=
A~(z)+B~(z),
f/(z)
-
cT~(z) + D~(z)
which gives an expression for the transfer function Z(z) as
f/(z) - Z(z)~(z) with Z(z) - D + C T ( z I - A)-IB. The reaction of the system to an impulse signal 5 as input, which is the Kronecker delta 5 = {50k}ke~ for d.t. or the Dirac impluse function with spike at zero for c.t., is called the impulse response h. Since in both cases, the frequency domain equivalent of an impulse is the constant function - /:6 - 1, the impulse response will correspond to the transfer function in the frequency domain. ]z(z) - Z(z). 1 - Z(z). Therefore, the impulse response itself is the inverse transform of Z(z)" h - s Thus we have the following relations between the transfer function Z(z) and the impulse response h(t) whenever they make sense:
Z(z)- ~oo h(t)z-'; h(t)- ~1
fo 2~ e~:tZ(e~ ~ )d~
(d.t.)
(6.6/
F
(c.t.).
(6.7)
t--0
Z(z)-
/o e-*th(t)dt;
h(t)-
ei~tZ(iw)dw
oo
Without loss of generality, we shall assume strict causality, i.e. D = 0 in the sequel. Hence, the system can be described by the transfer function Z(z) connected with the state space triple (A, B, C) as follows:
Z(z)-CT(zI-A)-IB. It is a strictly proper rational function. In the theory of linear systems, it is important that the system model is stable. Stability means the following.
6.2. MORE DEFINITIONS AND PROPERTIES
289
D e f i n i t i o n 6.22 ( s t a b i l i t y ) We shall call a system (bounded-input-bounded-output) stable (or BIBO stable) iff a bounded input generates a bounded output. This is known to be equivalent with
(~.t.)
fo ~ ]h(t)]dt < M < ~o oo
Ih(t)l < M < oo
(~.t.)
t=l
with h(t) the impulse response of the system. A system is called internally stable iff
rt~(r
< o
(~.t.)
ICzl < 1
(d.t.)
where (i are the eigenvalues of the matrix A from the state space representation (A, B, C) for the system. Re (() is the real part of the complex number
r Internal stability implies BIBO stability but the converse is only true if the state space representation is minimal. For a physical system, we should expect that it is BIBO stable. However, for the approximants that are generated by (minimal) partial realization (see next section), we cart not guarantee that they will be stable. Some issues about the stability checks for systems (or more generally for polynomials) will we discussed in Section 6.7. To come to the definition of the minimal partial realization (mPR) problem , we introduce the Markov parameters of a linear system as follows" D e f i n i t i o n 6.28 ( M a r k o v p a r a m e t e r s ) For a linear system with transfer function Z(z), and state space triple (A, B, C), we cart the Mk, defined by
1 dkZ(z) Mk = k~ dz k
- CA k - l B ,
k - 1,2,...
(6.8)
z~oo
the Markov parameters of the system. Using equations (6.6-6.8) we can write the Markov parameters in function of the impulse response as follows
Mk - h(k) Mk-
dk-lh(t) dtk-1
(d.t.) (c.t.). t=0
C H A P T E R 6. LINEAR S Y S T E M S
290
This indicates that the first Markov parameters contain information about the transient response of the linear system. That is the behavior of the system for small values of time. This is easily verified for a discrete system. The output for small t depends on the input for small t and the initial Markov parameters, i.e., the first terms in the impulse response. For continuous time, it is somewhat harder to see, but it remains basically the same. Where the Markov parameters of a system give the most adequate information about the transient behavior, the time moments will be better to model the steady state behavior, i.e. the behavior for t ~ oo. We define them as follows. 6.24 ( t i m e m o m e n t s ) /f Z(z) is the transfer function of a continuous system and Z(z) is analytic in z - O, then its time moments are defined as mk--(--1)k (d) k , k - 0,1,2,... (6.9)
Definition
z--0
If Z(z) is the transfer function of a discrete system and Z(z) is analytic in z - 1, then its time moments are defined as
mk - ( - 1 )
(d)k
Z(z)
,
k - 0,1,2,...
(6.10)
z=l
Note that for stable systems the analyticity condition for the transfer function Z will be satisfied. From this definition we get the following connection between the time moments and formal power series. T h e o r e m 6.8 Matching the first ~ time moments is equivalent with matching the first ~ coefficients Tk in oo
T(z) - ~
Tkz k
k=O
for continuous systems, where T(z) is the MacLaurin ezpansion of Z(z), and with matching the first A coefficients Tk in T(z) - ~
Tk(z - 1) k
k=O
for discrete systems, where T(z) is the Taylor expansion of Z(z) around z - 1. These coefficients Tk are called time coefficients.
6.2. M O R E DEFINITIONS A N D P R O P E R T I E S
291
P r o o f . For continuous systems this is clear from the definition of the time moments. For discrete systems we get from (6.10) that each time moment mk is a linear combination of the time coefficients To, T 1 , . . . , Tk. The coefficient in this linear combination for Tk is different from zero. This proves the theorem. [::] To show that time moments give information about the steady state behavior of stable linear systems, we give the following theorem. T h e o r e m 6.9 Let mk be the time moments of a stable linear system with
impulse response h(t) (see (6.6-6.7)). Then these time moments are equal to
mk - ~ tkh(t)
(d.t.)
t=O
and ink-
tkh(t)dt
~0~176
(c.t.).
P r o o f . See the paper of Decoster and van Cauwenberghe [?4].
[::]
From this theorem it is clear that the time moments m k of a stable system are a weighted average of the impulse response h(t). When k becomes larger more emphasis is laid on the impulse response for larger t. Hence, we can conclude that matching the first time moments of a stable system will model the steady state behavior if the approximating system is also stable. If some initial Markov parameters are known, we know the initial coefficients of a series expansion of the transfer function in the neighborhood of infinity; of a series in F[[z -~ ]] say. We can thus construct with our algorithms some rational approximant that will match these Markov parameters and since these parameters model the transient behavior of the system, we may hope that we get a good approximate model for the system as it behaves for small time. This type of approximation problem is the partial realization problem. An important issue is the simplicity of the approximant, i.e., its minimality. Hence, one is usually interested in a minimal partial realization problem. We shall interpret the previously obtained results in this context in the next section. A severe drawback of this type of approximation however is that we can not be sure that the approximant will correspond to a stable system, even if we started out with a stable one. So stability checks as will be discussed in Section 6.7 are in order. Other techniques consist in allowing only restricted types of approximants that will be guaranteed to be stable. For this a price is paid because less coefficients can be fitted. Some examples are reviewed in the survey paper [42].
C H A P T E R 6. L I N E A R S Y S T E M S
292
If some time coefficients are known, we can use the division algorithms of the previous chapters for fls in F(z) to generate approximants. This is a problem like described in Pad~ approximation since we are now dealing with power series (in z or z - 1). Thus we shall give a system theoretic interpretation of the Pad~ results in Section 6.4. Of course, one can think of a mixed minimal partial realization problem, i.e. where both time coefficients and Markov parameters are given. One may hope to have a model that is good for the transient behavior as well as for the steady state behavior. For that we should approximate both a series in z -1 (Markov parameters) and a power series in z or z - 1 (time coefficients). Hence we have here a sort of two-point Pad6 approximation problem. If we want to do the previous computations recursively, i.e., we suppose more and more Markov paramaters (or time coefficients) become available as the computations go on, we are basically facing a problem with "Hankel structure". We mean that the data can be collected in a Hankel matrix which is bordered with new rows and columns as new data flow in. If there are only data of one sort, i.e., either only Markov parameters or only time coefficients, then the data expand in only one direction, just like a bordered Hankel matrix gets only new entries in its SE corner. Even if we have a mixed problem but with a preset, fixed number of data of one kind, we are basically in the same situation, viz., data expanding in only one direction. However, if Markov parameters as well as time moments keep flowing in, the problem has a "Toeplitz structure". If a Toeplitz matrix is bordered with an extra b o t t o m row and an extra last column, the new d a t a appear in the NE and the SW corner of the matrix, which illustrates the growth in two different directions. Thus in the first case, we shall be dealing with the results we obtained for Hankel matrices, while in the second case we shall have to interpret the results we have given for Toeplitz matrices. We shall investigate this in more detail in the following sections.
6.3
The minimal partial realization problem
We assume in the rest of this section that the systems taken into consideration are linear time invariant and strictly causal. Usually, the realization problem is defined as the problem of finding a state space description triple (A, B, C) given the Markov parameters Mk, k - 1,2,3, .... However, because the state space approach is equivalent to the polynomial fraction description approach, we can also look for a
6.3. THE MINIMAL PARTIAL R E A L I Z A T I O N P R O B L E M
293
polynomial fraction. In other words, given the coefficients in the expansion
M1z-1 -[- M2z -2 + M3z -3 + . . . , find the function it represents, either in the form of a state space realization Z(z) - C T ( z I - A ) - I B or in the form of a polynomial fraction Z(z) c(z)a(z) -1. Thus our definition is as follows" D e f i n i t i o n 6.25 ( ( m ) ( P ) R - M p r o b l e m ) Given the Markov parameters of a finite-dimensional system, find a polynomial fraction c(z)a(z) -1 which represents the transfer function of this system. We speak about a minimal problem if we look for a polynomial fraction c(z)a(z) -1 with a minimal degree for the denominator a(z). This is equivalent to finding a state space representation(A, B, C) of the transfer function with the order n of the state transition matrix A as small as possible. In short, we look for a minimal state space description or a coprime polynomial fraction representation of the transfer function, with a complexity equal to the McMiUan degree of the system. The adjective partial refers to the fact that we do not consider all Markov parameters (not necessarily connected to a finite-dimensional system) but .y only the first ones, {hk}k=l say. In this case the set of (finite-dimensional) systems having the same first 7 Markov parameters, i.e. the set of partial realizations of degree 7 (not to be confused with the degree of the transfer function), consists of more than one element. The partial realization problem looks for a polynomial fraction description of one or all of these systems, while the minimal problem looks for all those with the smallest possible McMillan degree. By solving a partial realization problem, one finds a (rational) transfer function which can be expanded as an element of F(z -1) whereby the first 7 coefficients in the expansion correspond to the given Markov parameters. We shall have to show that the previously described techniques give indeed (minimal) solutions. Let us start by looking again at the results of Section 1.5 where we obtained approximations for formal series from F(z -1 }. From Theorem 1.13, we can easily derive the following reformulation which can be stated without proof. T h e o r e m 6.10 Given the Markov parameters M k , k >_ 1 and using the notation of Theorem 1.13, the (right sided) extended Euclidean algorithm applied to s - 1 and r ( z ) f(z)Z(z)E ~ - i Mk z-k, the given transfer function, generates partial realizations co,kao,k-1 of degree 2t~k_ 1 + ak, . 9 2ak, . . . , 2ak + ak+l -- 1 having McMillan degree ak.
C H A P T E R 6. L I N E A R S Y S T E M S
294
In fact, the partial realizations co,kao, ~ are minimal. This result is not contained in Theorem 1.13, but it follows from the corresponding results about orthogonal polynomials. In fact, the previous theorem is a special case of Theorem 6.12 to be given below. We continue our scan of the results of the previous chapters. The pre-1 The intervious theorem says something about the approximants co,k a o,k" mediate results that are generated by the atomic Euclidean algorithm are partial realizations too. Indeed, interpretation of the results of Section 1.6 within the context of linear systems leads to the following theorem. T h e o r e m 6.11 Given the Markov parameters Mk, k > 1 and using the notation of Section 1.6, the atomic Euclidean algorithm applied to s - - 1 and Ci), (i) _~ , r - f ( z ) - Z ( z ) - ~-]~=1 hk z - k generates partial realizations Vo,ktUo,k)
i -- O, 1 , . . . , ak, of degree 2,r
+ ak + i having McMillan degree 'ok.
P r o o f . From equation (1.17), we get u0,k - ~0,k with deg,~'0,k (i) -
'ok and deg s (i) -
-'r
right-hand side of equation (6 " 11) by -(i) ~'O,k
(6.11) i - 1 9 Dividing the left and proves the theorem,
o
Note that the polynomial fractions Co,kao,~ as well as ~0,k(~0,k) can be written as continued fractions using the results of Section 1.4. Some of the linear algebra results of Chapter 2 can also be restated in the current context of linear systems. Let us start by rewriting the partial realization problem as a linear algebra problem. Given the Markov parameters M k , k > 0, i.e. the transfer function f ( z ) - Z ( z ) , the partial realization ca -1 of degree 7 has to satisfy f - ca -1 - r with deg r _< - 7 - 1. Suppose deg a - a. Hence, we get the equivalent linearized condition f a c - r a - r ~ with deg r ~ <_ - 7 + a - 1. Hence the denominator polynomial a has to satisfy the set of homogeneous linear equations Hx_~_l,~a - 0 with HT_~_l,a a Hankel matrix defined by equation (2.1). The entries of the Hankel matrix are Markov parameters. Using the results about minimal indices obtained in Section 5.5, we can easily prove the following theorem. T h e o r e m 6.12 Let the Markov parameters Mk and hence also the transfer function y ( z ) - Z ( z ) - E ~ = t Mk z - k be given. Suppose it has Kronecker indices tck and Euclidean indices ak, the monic denominator polynomials of all minimal partial realizations of degree 7 > 0 with 2~k - ak _< 7 < 2nk+x - ak+x
6.3.
THE M I N I M A L P A R T I A L R E A L I Z A T I O N P R O B L E M
295
are given by the manifold ao,k +pao,k-1 with p an arbitrary polynomial having degp < 2 ~ k - (7 + 1) and a0,k and ao,k-~ the true (right) monic orthogonal polynomial for the Hankel moment matrix [Mi+j-1] having degrees ~k and ~k-1 respectively. For each of the denominator polynomials a = ao,k + pao,k-1 of this manifold, the corresponding numerator polynomial c can be found as the polynomial part of f a or c = Co,k + pco,k-1. For each different choice of p we get a coprime polynomial fraction representation of a different system, each system having McMillan degree nk. P r o o f . The first part of the theorem follows immediately from Theorem 5.6. The polynomial fraction ca -1 is coprime because, otherwise, the reduced rational form c~a~-1 would be a partial reahzation with smaller degree for the denominator. This is impossible because dega - ~k is minimal. Because c and a for different choices of p are always coprime and because ao,k, ao,k-1,..., z'ao,k-1 with i - 2~k - (7 + 1) are hnearly independent, ca -1 is a coprime polynomial fraction which is different for each choice of p. Therefore, for each choice of p this polynomial fraction represents a different system. [3 We assumed above that we started from the Markov parameters Mk or, equivalently, from the transfer function Z(z) - ~ = 1 Mk z-k. We then looked for a representation ca -1 having smallest possible McMillan degree and having the same first 7 Markov parameters. Instead of the Markov parameters we could consider the situation where a polynomial fraction is being given, one of high complexity say and possibly not coprime. We can then simplify the model (the given polynomial fraction) by replacing it by another one with a lower complexity. This is called model reduction. Thus suppose that in that case we know the transfer function for the given system in the form of a numerator-denominator pair ca -1. We can then also apply the different versions of the Euclidean algorithm which we described before with initial data s - a and r - c and get a reduced model (one with reduced McMillan degree or one with the same McMillan degree but represented by a coprime polynomial fraction). Instead of a given set of Markov parameters or a transfer function or its given polynomial fraction representation, we could consider the case where an input signal u ~ 0 is given together with its corresponding output signal y. When we assume that the past input signal is zero, we can also use the different versions of the algorithm of Euclid with initial data s - ~2 and r - -~) because the transfer function equals Z(z) - fl~z-1. In this case the problem of constructing a rational representation for Z(z) in one way or another from these data is usually called system identification. If the initial system is finite-dimensional the
296
CHAPTER
6.
LINEAR
SYSTEMS
algorithm will stop after a finite number of steps as in the previous case and we shall get a coprime polynomial fraction for this system. Otherwise, the algorithm will generate coprime polynomial fractions being minimal partial but never full realizations of the given system. E x a m p l e 6.1 Let us reinterpret the results of Example 1.5 within the context of linear system theory. We can view it(z) - - s ( z ) - z - z -4 as the z-transform of the input signal of a discrete system with corresponding output signal ~(z) - r ( z ) - 1 + z -4. This is the system identification interpretation. Because the input signal as well as the output signal have only a finite number of nonzero values, we can also adapt a model reduction viewpoint. We could see ca -1 with a(z)-z4~(z)-z
~- 1
and
4+1
c(z)-z4~(z)-z
as a polynomial fraction representation of the transfer function of a systern with great complexity for which we want to compute a less complex approximation (i.e. having smaller McMillan degree) or a reduced polynomial fraction representation. Implicitly we work with the transfer function, which for a discrete system corresponds to the z-transform of the impulse response Z(z)
-
-
-
z
+
z
+
z
+
z
+....
In the two cases, we can use the extended Euclidean algorithm to compute the true (monic) orthogonal polynomials a0,k together with the corresponding numerators c0,k. For the system identification problem, we start with the series s and r while for the model reduction problem, we start with the polynomials a and c. We could even use the transfer function (impulse response) explicitly by starting the algorithm with s - - 1 and r - Z - h. The Kronecker indices are ^
' ~ 1 - 1,
'~2-4
and
'~k-5
for
k>_3.
Hence, following the previous theorem, the denominator polynomials of all minimal partial realizations of degree 7 with 5 _< 7 < 9 are given by a 0 , 2 -~-
pao, 1
with
a0,2(z) - z 4 - z 3 + z 2 - z - 1
and
a0,1(z )
-- Z
and p an arbitrary polynomial with deg p _ 8 - (7 + 1). For example, take 7 - 7. Hence deg p _< 0 or p C F. All minimal partial realizations of degree 7 can be written as
(gO,2 -~- Pgo,1)(ao,2 + pa0,1) -1
6.3. THE MINIMAL PARTIAL R E A L I Z A T I O N P R O B L E M
297
with impulse response
z -1 + z -5 4- z -6 --pz -8 - - ( 2 p - 2)z -9 - ( p -
3)z -1~ + . . . .
Note that for p # 0 the first 7 Markov parameters are the same, while for p = 0 exactly 8 Markov parameters are matched.
Up until now, we have always looked for a (minimal) (partial) realization in the polynomiM fraction representation. We can easily translate the results to state space descriptions. T h e o r e m 6.13 Suppose the Markov parameters Mk, k > 0 are given defining the transfer function f ( z ) = Z(z). Using the notations of Corollary 2.30 and Theorem 2.7, we can give several state space representations (A, B, C) for the minimal partial realization co,kao, -1k of Theorem 6.12:
9 A- F(ao,k)- Hk-l(f)-lHk_l(zf), B T - [1 0 . . . 0 ] and C T - [ M 1 M2"..M,~k], with F(ao,k) the companion matrix of the (monic) polynomial ao,k. 9 A - Tk_l, B T [l 0 ' ' ' 0 ] and C T - eTDk_l - - [ 0 . . - 0 ~00--'0] where ro ~ 0 is the a l th element with a l the first Euclidean index of -
Y. P r o o f . Lemma 2.5 gives us
F(ao,k)- Hk-l(f)-lHk-l(zf). The controllability form of Theorem 6.6 gives us the first state space representation. Starting with this controllability form, we can construct the equivalent representation (Ak-_l1F(ao,k)Ak-1, A-kl_IB, AT_I C). Equation (2.29) gives us
A k l l F(ao,k)Ak_l
-
-
Tk_l"
Because Ak-1 is a unit upper triangular matrix, A~-_ll will also be unit upper triangular. Hence A~-_ll B - a0 - e0.
298
CHAPTER
6. L I N E A R S Y S T E M S
Finally, C T A k - 1 - eT H k - l A k - 1 -- e T ] R k - 1 .
This proves the theorem. Note that the results also follow from Corollary 4.14 applied to the special case where the moment matrix is equal to a Hankel matrix. From equation (4.12), we also have C T A k _ I -- e T D k _ l . [:]
E x a m p l e 6.2 Let us continue Example 6.1. A state space representation (A, B, C ) o f the minimal partial realization of degree 8 is given by
A
=
F(a0,2)=
BT
-
[1 0 0 0 ] ,
c r
=
[1000].
0 1 0 0
0 0 1 0
0 0 0 1
1 1 -1 1
The other state space representation based on the block tridiagonal matrix T1 gives the same representation for this example. Note that T1 is the upper left part of T2 given in Example 2.3. <5
6.4
Interpretation of the Padd results
The Pad~ approximants were associated with series from F[[z]], i.e. with positive powers of z. As we have seen in Section 6.2, such series with time coefficients Tk appear in system theory as series expansions of the transfer function. It is somewhat simpler to discuss this m a t t e r for continuous systems because then the time coefficients are the same as the time moments, but most of all because these coefficients are the MacLaurin coefficients of the transfer function Z ( z ) . For discrete systems, the one-to-one relation between time moments and time coefficients is more complicated and the time coefficients appear in a Taylor series expansion at z = 1, rather than z = 0. Of course, this is not a fundamental problem, but for simplicity, we assume in the sequel, that we work with continuous systems. For a discrete time system, we can always transform the independent variable z such that the time moment information is around z = 0 as for continuous time systems. In the same way as for the Markov parameters, we define the (minimal) (partial) realization problem for the time coefficients as follows.
OF T H E PADF, R E S U L T S
299
D e f i n i t i o n 6.26 ( ( m ) ( P ) R - T p r o b l e m ) Given the To, T 1 , . . . , the partial realization problem of order ~ systems having the same first )~ time coefficients T o , . . . , having minimal McMiUan degree is called a minimal order )~. I f )~ = cr then we drop the word "partial".
time coefficients looks for one or all Ta_:. Such a system partial realization of
6.4. I N T E R P R E T A T I O N
We are looking for the realization(s) in coprime polynomial fraction representation ca -1. To couple the minimal Pad~ results to the realization problem here, we rewrite the time coefficients Tk, k - 0, 1 , . . . as fk+l - Tk, k _ 0. The condition of having the same first )~ time coefficients, leads to the following set of homogeneous linear equations for the coefficients of the denominator polynomial a H~_~_:,~ i a - 0 with deg a - a and a(0) ~ 0 and Hx-~-I,~ the Hankel matrix built upon the coefficients {fk}k>l having )~ - a rows and a + 1 columns. As for the mPK-M problem for the Markov parameters, all solutions of the mPR-T problem for the time moments are described in the following theorem. T h e o r e m 6.14 Given the time coefficients Tk, k >_ O. Associate with it the Hankel m o m e n t matriz [Ti+j] and the Kronecker indices tck and Euclidean indices ak. Let us write the minimal partial realizations for these time coefficients in the form of a coprime polynomial fraction representation as c'(z)a'(z) -1 with a'(z) - z~a(z -1) with deg a - a and c'(z) - z ~ - l c ( z - 1 ) . I f 2~k - ak < ~ < 2tck, the reversed polynomials a, associated with the denominator polynomials a ~ of all minimal partial realizations of order ~ for the given time coefficients can be written as a-
ao,k + pao,k-l~
degp < 2~;k - (~ + 1)
and with the extra condition that a(0) = ao,k(0)+ p(0)ao,k-l(0) ~ 0. The polynomials ao,k and ao,k-1 are the true (right} orthogonal polynomials with respect to the m o m e n t matriz [Ti+j] of degree tck and ~k-1 respectively. If 2t;k < ,k < 2t;k+l - a k + l , we have to make a distinction between two other cases. If ao,k(0) ~ 0, the unique minimal partial realization denominator a' is the reverse of the polynomial a = ao,k. If ao,k(0) = 0, the denominators of art minimal partial realizations of order ~ are associated with the reversed polynomials a = pao,k + cao,k-1 with p a monic polynomial of degree ~ - 2tck + 1 and 0 ~ c C F = C. In all these cases, we have chosen for a monic normalization for a, i.e. a comonic normalization for a'.
300
C H A P T E R 6. L I N E A R S Y S T E M S
Together with the numerator polynomial c ~ which can be found as the polynomial part of degree < a - deg a of f a ~, f - ~k~176Tkz k, we get minimal partial realizations of order )~ written as a coprime polynomial fraction. For each different denominator, we get a different system. The McMiUan degree of all these systems is '~k except when a0,k(0) = O. In this case the McMillan degree is )~ - '~k + 1. P r o o f . The proof is very similar to the proof we have given for the mPKM case in the previous section. In the same way, we can use the results we obtained in Section 5.5 about minimal indices. However, there is an additional element here. It is important that a(0) should be different from zero. To show that such an a can always be found by the results above, we have to show that a0,k(0) and a0,k-~ (0) can not vanish simultaneously. This is easy to see because
E
"O0'k
E0'k
"U,0,k aO,k
]
-- Y 0 , k - 1
Evkck] Uk
ak
with vk - 0 or [v0,k Uo,k] T -- Uk[CO,k-1 ao,k-1] T and V0,k a unimodular matrix, i.e., its matrix is a nonzero constant. Hence the matrix V0,k(0) is invertible and this proves not only that a0,k(0) and a0,k-~(0) can not be equal to zero simultaneously but also that c0,k(0) and a0,k(0) can not be equal to zero at the same time either. We leave the rest of the proof to the reader. [J Note that according to Section 5.6 about minimal Pad~ approximation we can connect the a polynomials to the Frobenius definition (along an antidiagonal, see Definition 5.5 and equation (5.22)) and the a ~ polynomials to the Baker definition (along a diagonal, see Definition 5.6 and equation (5.28)). As before, we could give state space descriptions for co,kao,k~ ~-1. Again, we leave the details for the reader.
6.5
The mixed problem
In the two previous sections we considered the cases where either only Markov parameters or time coefficients were given. With the same methods, we can also solve mixed problems where both time coefficients and Markov parameters are given on condition that there is only a finite number of one of them. We try to find a realization having minimal McMillan degree with the same first Markov parameters and time coefficients, i.e. the mixed problem.
6.5.
THE, MIXED P R O B L E M
301
Suppose ~ + 1 time coefficients Tk are given and 7 Markov parameters Mk, which we want to fit simultaneously. Let the denominator degree be bounded by a, then the numerator degree is bounded by a - 1 since we want strict causality and at least one of both degrees is exact, because we want a minimal realization. If we denote the denominator by q and the numerator as p, then q(0) should be nonzero, as was derived for the minimal Pad6 approximation problem in Bakers sense. This means that numerator and denominator are linked to the time coefficients Tk by the systems
T,)t
~ ~ 9
T,~
-- Ot
O, T1 ... ...
To
~ * "
O
To 0
9
.
9
.
To
qo#O
q-
...
(6.12) q
-
ip.
0
This is just a rewriting of the systems (5.28). We used ~ instead of w, the coefficients fk are replaced by the time coefficients Tk and ~ was set to 1. If you then reverse the order of rows and columns, you get the system above. We could also consider it as a special case of (5.22), but with the extra condition that q(0) ~ 0. Similarly, since also 7 Markov parameters have to be fitted, we also have a system of equations that looks like
...
0 0 M1
M1
M 1
~ ~ ~
9
9 ~ ~
Mr:
9 9 9
M , ~ + I
q
ip
~
(6.13)
M2 9
,
, ,
q
O,
q,~ # 0 .
M,,]r
In both systems, a should be minimal. If we now subtract the appropriate equations in (6.12) and ( 6 . 1 3 ) f r o m each other, we obtain a homogeneous
C H A P T E R 6. LINEAR S Y S T E M S
302 Hankel system t h a t looks like
0
0
O
0
9
fa+l fc~+2
q-0,
qoq~O
(6.14)
9
where we have set
/ -T,~+I_k fk
-
-
-
Mk-~-I 0
1,...,
1
k~+ k - )~ + 2 , . . . , ~ + 7 + 1 otherwise
~+7+1
(6.15) (6.16)
It was tacidly understood that a was _ )~ and < 7, but it is easily checked t h a t we can come to the latter system for other values of a too. This system has the form (see (5.16)) Hn-~-l,~q - 0 with a minimal and qoq~, ~ 0 and we have seen before how to solve this. We should consider the minimal indices associated with the sequence {fk}nk=l and the denominator is found to be an annihilating polynomial for t h a t sequence. Because of the nonvanishing constant term and leading coefficient, it is actually a constrained problem as we discussed it in Section 5.5. Once we have obtained the denominator as a solution, we can easily find the n u m e r a t o r coefficients from (6.12) or (6.13). Both of them give the same numerator. Thus we take the sequence of ~ + 1 time coefficients in reverse order and with opposite sign followed by the 7 Markov parameters, i.e. -Tx,-Tx_I,...,-To,
M1, M 2 , . . . , M~.
Rename the elements of this sequence as f k , k - 1 , 2 , . . . , r / , and we can adapt a combination of Theorems 6.12 and 6.14 to the current situation, so t h a t it reads T h e o r e m 6.15 Let there be given ~ + 1 time moments T 0 , T 1 , . . . , T x and 7 Markov parameters M~, M ~ , . . . , M r. Define ~7 and fk as in (6.15} and (6.16}. Associate with this sequence the Kronecker indices nk and Euclidean indices ak. Denote by ao,k the monic true (right} orthogonal polynomials for the Hankel moment matriz with coefficients fk.
6.5.
THE M I X E D P R O B L E M
303
Then, if2tck--ak < ~? < 2t;k, all minimal partial realizations that fit these )~ + 1 time moments and the 7 Markov parameters have (monic) denominators of the form a = ao,k + pao,k-~ with p an arbitrary (monic) polynomial of degree 2ak - (~/+ 1) at most and subject to the condition a(O) ~ O. If 2ak <_ 77 < 2t;k+x - a k + l , there are two possibilities. If a0,k(0) # 0, the polynomial ao,k is the denominator of the unique minimal partial realization of the given data. If ao,k(O) = O, the (monic) denominators of all minimal partial realizations of the given data are of the form a = pao,k + cao,k-1 with c a nonzero constant and p an arbitrary (monic} polynomial of degree (7? + 1) - 2a~:. The corresponding numerator can in all cases be found by the relations (6.12) or (6.13). The proposed solutions describe all possible minimal partial realizations and for each different choice of p we get a different solution. The McMillan degree of all realizations is ak, except in the last case where ao,k(O) = O, then the McMillan degree is ~ + 1 - ak. Note that the formulation of the problem allows a recursive computation when using the Euclidean algorithm if we suppose t h a t the d a t a are given in the order of the sequence fk. This means, first the time coefficient T~, then T~_I until To and then the Markov parameters M1, M2, etc. This implies t h a t we need not stop with M~. We can continue with M ~ + I , M ~ + 2 , . . . without knowing where we shall stop. However, we do have to start with T~ if we want the recursive algorithm to work. Thus we assume t h a t only a finite number of time coefficients are given, which are all known to begin with, while there may be an infinite number of Markov parameters. In the converse situation where we know a finite number of Markov parameters beforehand but where the number of time coefficients can increase infinitely, we can consider the sequence {M~, M ~ _ ~ , . . . , M 1 , - T o , - T ~ , . . . } which we denote by {fk}k>l and solve the problem with this sequence. A theorem much like the previous one can be formulated then. In fact the solutions for c and a described in the previous one are the reversed polynomials of n u m e r a t o r and denominator of the solution of the current problem. Thus we have T h e o r e m 6.16 Let the polynomials c and a be defined as in the previous theorem but now for the sequence { fk } = { M-r, U~_ 1 , . . . , U l , -To, - T I , . . .}, then the minimal partial realizations for the 7 Markov parameters Mk, k = 1 , . . . , 7 and )~+ 1 time coefficients Tk, k = 0,...,)~ are given by c'(z)a'(z) -1 with d ( z ) za-lc(z)and a'(z)zaa(z), where a is the
304
CHAPTER
6.
LINEAR
SYSTEMS
M c M i l l a n degree o f ca -1, the m i n i m a l partial realization described by the p r e v i o u s theorem.
In the next section we shall interpret the Toeplitz results of Section 4.9 as a solution to the mixed minimal partial realization problem. There, the number of Markov parameters and time coefficients can both grow simultaneously.
6.6
Interpretation of the Toeplitz results
We shall reinterpret the results obtained in Section 4.9 where orthogonal polynomials with respect to a Toeplitz moment matrix were briefly studied. It will lead to a solution of the mixed minimal partial realization problem. Thus we suppose as in the previous section that Markov parameters as well as time coefficients are given. The advantage of the present approach is that the methods will work recursively when the number of time coefficients and the number of Markov parameters are increasing simultaneously. This is in contrast with the previous section where we supposed a finite and fixed number of either time coefficients or Markov parameters as known in advance. The two-point Pad~ interpretation that we gave in Section 4.9 gives us a possible method to find partial realizations that match both Markov parameters and time coefficients. So as in equations (4.34) and (4.35), we couple the series f+ and f_ with the given Markov parameters Mk, k > 1 and time coefficients Tk, k _> 0 respectively" + T2z 2 + . . . ,
/+(z)
-
To +
- f_(z)
-
M l z - 1 - F M 2 z -2 -F " " " .
This means that in the notation of Section 4.9 we have set f k -- Tk for k = 0 , 1 , . . . and f-k = Mk for k = 1,2, .... Now consider the Toeplitz moment matrix M - [f~_j] and associate with it the orthogonal polynomials as in Section 4.9. We recall that for example a0,n was the first of a new block of (right) orthogonal polynomials and P0,k were the (right) block indices and pk the (right) block sizes for the Toeplitz moment matrix M. Furthermore, define co,n(z) as in (4.36)or (4.37). Then it is easy to see from (4.38)and (4.39) that a partial realization for the first p0,n + P~+I Markov parameters and the first p0,n+ Pn+l + time coefficients is given by Co,nao, -1n having McMillan degree P0,n. They need not be minimal though. It is also possible to generalize these results to the cases where other numbers of Markov parameters and/or time coefficients are given but we
6.6. I N T E R P R E T A T I O N
OF T H E T O E P L I T Z R E S U L T S
305
shall not do this here. The Toeplitz case is only developed in the chapter on orthogonal polynomials and certain aspects were not scrutinized as we did for the Hankel case. The Toeplitz case really needs a separate t r e a t m e n t because they can not be directly obtained from the Hankel case by some mechanical translation and the results are different, since, although similar to the Euclidean algorithm, the recurrence is not the same. For example, we did not deepen the ideas of minimal indices for Toeplitz matrices as we did for Hankel matrices. So we shall also be brief in this chapter about the Toeplitz case. Instead of proving new things, we just give an example or rather give a system interpretation of an example that we gave earlier. E x a m p l e 6.8 ( E x . 4.5 c o n t i n u e d ) Let us interpret the results of Example 4.5 as solutions of a mixed partial realization problem. First of all, let us assume that we have a continuous time system with the first Markov parameters - 1 , 0, 1/2, 0 , - 1 / 4 , 0, 1/8, 0 , - 1 / 1 6 , . . . and first time coefficients 0, 0, 1/2, 1/4, 1/8, 5/16, 9/32, .... For example, we find that the polynomial fraction c0,2 (z) - z3 - 4 z 2
ao,2(z)
1 z 2 +2z-8 z 4 + 4 z 3 + -~
is a mixed partial realization for the first po,2 + P3 - 4 Markov parameters and the first po,2 + P+ -- 4 time coefficients. In the same way, we can interpret the result as a mixed partial realization for discrete time system data when we take into account that the time coefficient information is now around 1 instead of around 0. Hence, co,2[z'_ - -'I)_
a o , 2 ( x - 1)
- x 3 - x 2 + 5x - 3
x4
__
~ x 2 + 9x
2
25
is a partial realization of order and degree 4 for the same time coefficients as above but for the transformed Markov parameters ( z - x - 1) -1,
-1,
-1/2,
1/2, 7/4, 11/4, 23/8, 13/8,
-17/16, ....
Note that the steady state behaviour of the original system will not necessarily be matched by the partial realization because the realization is not stable not only for the continuous time interpretation but also for the discrete time viewpoint. Indeed, the zeros of a0,2(z) are approximately equal to 1.0257, - 0 . 4 5 6 9 • 1.2993i, -4.1118. <5
306
6.7
CHAPTER
6. L I N E A R S Y S T E M S
Stability checks
Because an approximation of a linear system should preferably be stable, it is important to check the location of the roots of the denominator in the transfer function. To check internal stabihty for example (Definition 6.22), we have to know the position of the eigenvalues of A with respect to the imaginary axis in the continuous time case or with respect to the unit circle in the discrete time case. Since the eigenvalues of A are also the roots of the denominator of the transfer function if (A, B, C) is a minimal realization, it is important to be able to say something about where the roots of a polynomial are found with respect to the imaginary axis or with respect to the unit circle. The Euclidean algorithm shows up in several versions when such tests are performed. The Routh-Hurwitz test is a method to check the position of the roots of a polynomial with respect to the imaginary axis. For the position with respect to the unit circle one has the inverse Levinson algorithm, known as the Schur-Cohn-Jury test. 6.7.1
Routh-Hurwitz
test
Many aspects of the Kouth-Hurwitz algorithm can be found in several well known books, e.g., [8, 148, 149, 186]. See also the bibliography [42]. An excellent and rather complete treatment of the topic is given in [107]. In this section we shall give a survey of the results that were obtained there. We shall consider in the beginning of this section the Euchdean algorithm in its simplest form and use for the normalization in each step uk = 1 and ck = - 1 . This means that if P0 and/)1 are two (in general complex) polynomials with deg P0 _> deg P1, then the Euclidean algorithm generates remainder polynomials Pk and quotient polynomials qk = Pk-1 div Pk such that P k - l ( Z ) : q k ( z ) P k ( z ) - Pk+l(Z),
k = 1,2,...,t
(6.17)
leading to a situation with Pt+l = 0 and hence Pt = gcd(P0, P1). If deg Pt = 0 then P0 and/)1 are coprime. Note that this corresponds to the continued fraction expansion P0 1 l..... 1] Pl -- q l - [ q 2 I qt" The simplest way to link the Euchdean algorithm with the real root location problem is through the theory of orthogonal polynomials. We say that a system of real polynomials {Pk}~=0 with deg pk - k is orthogonal on
6.7.
STABILITY
307
CHECKS
the real line with respect to a positive measure # if
oo
with 6k,t the Kronecker delta and vk - Ilpkll2 > 0. The link with the general algebraic o r t h o g o n a h t y which we discussed in C h a p t e r 4, is by the H a m b u r g e r m o m e n t problem. If the m o m e n t s for the m e a s u r e / z are defined as
#k -
V
zkdtt(z),
k - O, 1, . . ., n
then if all the Hankel matrices Hm - [ttk+l]~,t=o, m -- O, 1 , . . . , n are positive definite, the measure # will be positive and the inner product is expressed as
(pk
,
r PkHmPt, m >_ m a x { k , t } ,
pj(z)-
T pjx.
In the Hamburger moment problem, one investigates the existence, unicity and characterization of the measures t h a t have a (infinite) sequence of prescribed moments. See [1]. The following theorem is a classical result which is known as Favard's theorem. T h e o r e m 6 . 1 7 Let Pk, k = 0 , . . . , n (n <_ cr be a set of real m o n i c polynomials with deg pk = k. T h e n these p o l y n o m i a l s are orthogonal with respect to a positive m e a s u r e iz on the real line iff they satisfy a recurrence relation of the f o r m p - 1 = O, PO = 1 P k T l ( X ) - ak(x)pk(T.) + ~ k P k - l ( X ) = O,
k = O,..., n-
1,
where a k ( x ) = x + 7k with 7k E R a n d 6k > O.
For a proof, we refer to the literature. See for example [92, Theorem 1.5, p. 60]. Note t h a t for orthogonal polynomials which are not necessarily monic, but assuming they have positive leading coefficients ),t > 0, we can write a similar recurrence relation [222, p. 42] Pk+l(X) - a k ( x ) p k ( x ) + 6 k P k - l ( X ) = O,
where now ak(x) = flkx + 7k with 7k E R and flk -- ~ k + l / ~ k
> 0
and
6k > 0.
Another observation to make is t h a t if pn+l and pn are polynomials of degree n + 1 and n respectively whose zeros are real and interlace, then they
308
C H A P T E R 6. L I N E A R S Y S T E M S
can always be considered as two polynomials in a sequence of orthogonal polynomials. To see this, we assume without loss of generality that both their leading coefficients are positive. We construct pn-1 by the Euclidean algorithm, so that
PnTl(X) -~- Pn_l(X) =
an(X)pn(X)
where an(x) is of the form flax + 7n with ~,~ > 0. Since at the zeros (i of pn, the right-hand side is zero and because pn+l((~) ~ 0 by the interlacing property, we see that Pn+l((i)Pn-l((z) < 0. Thus Pn-1 alternates in sign at least n - 1 times. Thus it has at least n - 1 zeros which interlace the (i. Thus, degpn-1 = n - 1 and its leading coefficient is positive. The same reasoning is now applied to pn and pn-1 etc. By Favard's theorem we can conclude that the sequence pk, k = 0, 1 , . . . , n is a sequence of orthogonal polynomials with respect to a positive measure on the real line. C o r o l l a r y 6.18 Consider two polynomials Pn+l and Pn of degree n + 1 and degree n respectively. Then these polynomials are orthogonal with respect to a positive measure ~ on the real line iff the zeros of pn+l and pn are real, simple and interlace. P r o o f . The interlacing property of the real zeros for orthogonal polynomials is well known [222, p. 44] and follows easily by induction from the recurrence relation. The converse has been shown above. 77 This result implies the following observations. If P is a real polynomial of degree n, with n simple zeros on the real line, then its derivative P~ is a polynomial of degree n - 1 whose zeros interlace the zeros of P. Thus if we apply the Euclidean algorithm to the polynomials P0 = P and P1 = P~, then it should generate a recurrence relation for orthogonal polynomials. Thus we get the following theorem. T h e o r e m 6.19 The zeros of the real polynomial P will be real iff the Euclidean algorithm applied to Po - P and P1 - P~ will generate quotient polynomials of degree 1 with a positive leading coefficient. The zeros of P are simple iff Pt -- gcd(P0, P1) and deg Pt -- 0. P r o o f . This follows from our previous observations and because obviously the polynomials Pk -- P n - k / P t should be a system of orthogonal polynomials. If deg Pt - 0, then P and P~ are coprime and hence the zeros of P are simple. [~
6.7. S T A B I L I T Y C H E C K S
309
In fact, a more general result exists giving more precise information about the number of zeros in a real interval. This is based on the theory of Cauchy indices and Sturm sequences. To formulate this result, we introduce the following definitions. D e f i n i t i o n 6.27 ( C a u c h y i n d e x ) Let F be a rational function with a real pole ~. Tracing the value of F ( x ) when x crosses ~ from left to right, then either we have a jump from - c ~ to +co, in which case the Cauchy index at is 1, or a jump from +oc to - c ~ , in which case the Cauchy index at ~ is - 1 or the left and right limit at ~ give infinity with the same sign, in which case the Cauchy index at ~ is set to zero. Let I be a real interval such that the boundary points are not poles of F. Then the Cauchy index of F for I is the sum of the Cauchy indices at all the poles of F that are in I. If I - [a, b], we denote it as I b { F } . Let P be a real polynomial such that P ( a ) P ( b ) # O, a < b and let P' be its derivative. Then by a partial fraction decomposition, it is easily seen that, the Cauchy index of F = P ' / P gives the number of distinct zeros of P in the interval [a, b]. D e f i n i t i o n 6.28 ( S t u r m s e q u e n c e ) A sequence of polynomials {pk} tk = 0 is a Sturm sequence for an interval I - In, b] if
p0(a)p0(b)# 0 2. If ~ 6 I and Pk(~) - 0 and 1 <_ k < t then Pk+l(~)Pk-l(() < O; if k - O then pl(~) # O 3. pt(z) # 0 for all z 6 I. The Cauchy index of pl/po can be easily computed when we have a Sturm sequence. This is formulated as the following well known property [104] [148, Theorem 6.3a, p. 445]. T h e o r e m 6.20 Let P 0 , P l , . . . ,Pt be a Sturm sequence for the interval In, b], let ~ @ [a, b], and let Y(~) denote the number of sign changes in the sequence p o ( ~ ) , . . . , p t ( ~ ) . Then the Cauchy index, of F - Pl/Po is given by I b { F } -
-
v(b).
It is is if
is obvious from the definition of the Euclidean algorithm that when it applied to real polynomials, then the sequence pk = Pk/Pt, k = 0 , . . . , t a Sturm sequence for any interval [a,b] for which Po(a)Po(b) ~ O. So we choose P0 = P and P1 = P', then the number of distinct zeros of
310
C H A P T E R 6. L I N E A R S Y S T E M S
P in the interval [a, b] is given by the Cauchy index I b { p ' / P } , hence it is immediately found by applying the Euclidean algorithm and counting sign changes. I b { p ' / P ) -- Y ( a ) - Y(b). All the zeros in [a, b] will be simple if Pt has no zeros in [a, b]. If P has a zero of multiplicity m, then Pt will have the same zero of multiplicity m - 1. One can derive from these results for real polynomials also results for complex polynomials, or get information about the location of the zeros with respect to a half plane or a circle. See for example [148]. We shall not elaborate this here, but instead give a direct approach to the problem of determining the location of the zeros of complex polynomials with respect to the imaginary axis. This approach is based directly on the Euclidean algorithm. It can be found in [107]. For complex polynomials, the results will refer to the location of zeros in the left or right half plane or on the imaginary axis, whereas the previous results were concerned with the location of zeros on the real axis. Since the transformation of the real axis to the imaginary axis corresponds to a transformation z ~ iz, it follows that we are able to drop the normalization of the Euclidean algorithm that we used above by setting Uk = 1 and Ck = --1. Thus we are in the convenient situation that we can use the Euclidean algorithm in its simplest form, i.e., we compute for given complex polynomials P0 and/91 Pk-l (z) = qk(z)Pk(z) + Pk+l (z),
k = 1, 2 , . . . , t
(6.18)
where qk = Pk-1 div Pk and Pt = gcd(P0, P1). The complex conjugate is indicated with a bar. We first introduce some definitions. D e f i n i t i o n 6.29 ( p a r a - c o n j u g a t e , p a r a - e v e n / o d d , p s e u d o - l o s s l e s s ) For any complex function f , we define its para-conjugate (w.r.t. the imaginary f,(z)= A complex polynomial P is called para-even when P, - P and para-odd when P, = - P. If a rational function F satisfies F, = - F , it is said to be pseudo-lossless. Let P C C[z] with deg P = n. We define the polynomials
1 Po(z) - ~ [ P ( z ) + ( - 1 ) n P , ( z ) ] and
1 P~(z)- ~ [ P ( z ) - (-1)nP,(z)].
Then Po is para-even if n is even and para-odd when n is odd and the opposite holds for P1. They are called the para-even and para-odd parts of
6.7. S T A B I L I T Y CHECKS
311
Po
Let ~ be a zero of a rational function F. We say that ( is a para-conjugate zero if F(~) = 0 implies F, (~) = O. Para-conjugate poles are defined along the same lines. To avoid technical difficulties, we shall assume in what follows that the leading coefficient of P is real. Thus we have deg P0 = deg P > deg P1 or deg/)1 = deg P > deg P0. Our definition of para-conjugate should not be confused with the definition of para-conjugate which was given in Chapter 4. Here, the paraconjugate is taken as a reflection in the imaginary axis" z ~ - ~ . As in Chapter 4, later in this section we shall also consider the para-conjugate with respect to the unit circle, which will refer to a reflection in the circle: z--~ 1/~. The previous definition says that r is a para-conjugate zero of F if also (, - ~ is a zero of F. Thus a para-conjugate zero lies either on the imaginary axis or it shows up in a pair of zeros symmetric w.r.t, the imaginary axis. Note also that all the zeros of para-even or para-odd polynomials are necessarily para-conjugate. Conversely, if the polynomial has a paraconjugate zero, then this zero will be a common zero of its para-even and its para-odd part. Because on the imaginary axis, the para-conjugate of a function is just its complex conjugate: F,(iw) = F(iw), a pseudo-lossless function has a real part t h a t vanishes on the imaginary axis. A para-odd polynomial is a pseudo-lossless function with all its poles at infinity. Obviously, when F is pseudo-lossless, then also 1 / F is pseudo-lossless and thus will the zeros and poles of a pseudo-lossless function be para-conjugate. Thereby one has to consider a pole at infinity as lying on the imaginary axis. -
D e f i n i t i o n 6.30 ( i n d e x ) If P and Q are coprime polynomials such that F = P/Q is pseudo-lossless, then its index. I ( F ) is defined as
I ( F ) = N ( P + Q) where N ( T ) is the number of zeros of the polynomial T in the (open) right half plane, each zero being counted according to its multiplicity. Note t h a t by definition I ( F ) = I ( 1 / F ) . If F = P / Q is pseudo-lossless, then P + Q can not vanish on the imaginary axis. Indeed, if P(~) + Q(~) = 0, and Q(~) ~ 0, then F ( ( ) = - 1 , which is impossible on the imaginary axis because F has a vanishing real
C H A P T E R 6. L I N E A R S Y S T E M S
312
part there. If Q ( ( ) were zero, then P ( ( ) + Q ( ( ) = 0 implies that also P ( ( ) = 0, and then P and Q would not be coprime. A pseudo-lossless function with index zero is a classical lossless function as used in circuit theory. The following lemma is easily verified. L e m m a 6.21 The quotient of two polynomials with the same para-parity is always para-even and the quotient of two polynomials with an opposite para-parity is always para-odd. The para-parity of the remainder is always the same as the path-parity of the numerator in the quotient. A para-odd polynomial has a very special form" L e m m a 6.22 Consider the para-odd polynomial
P ( z ) - ao + a l z + ' ' ' + anZ n. Then its coefficients are of the form ak - ik+lck with ck C R, k - 0, 1 , . . . , n . P r o o f . We first set ak - ikbk, without loss of generality. P.(z) + P ( z ) - O, we find that
Then from
n
+
-
o
k=O
for arbitrary z. Hence bk + bk - 0, or bk is purely imaginary. Thus ak D i k b k - ik+lck with Ck E R. Since a para-odd polynomial is a pseudo-lossless rational function with aH its poles at infinity, we can consider its index. This can be computed as f, .llows. Theorem
6.23 Let
P(z) - i [ c o ( i z ) n + c l ( i z ) n - 1 + . . .
+ cn] ,
Ck e ]~,
]g -- 0 , . . . ,
n
be a path-odd polynomial, then I(P)
=
n/2, (n+sgnc0)/2,
if n is even ifnisodd.
P r o o f . We have to find I ( P / 1 ) - N(1 + P). Just like 1 + P has no zeros on the imaginary axis ( P is pseudo-lossless and hence Re P(iw) - 0 for w C R) also 1 + a P will have no zeros on the imaginary axis for any a > 0.
6.7. S T A B I L I T Y CHECKS
313
Therefore, since the zeros of a polynomial are continuous functions of its coefficients, we have N(1 + P) = N(Q=) with Q= = 1 + a P , for any a > 0. Furthermore N ( Q = ) = N ( Q ~ ) w i t h Q ~ ( z ) = zn[1 + aP(1/z)]. The zero z = 0 of multiplicity n of the monomial z n will be perturbed into n roots by adding a z n P ( 1 / z ) to this monomial. For a small enough, these roots can be approximated by the roots of z n + a i n+lc0 and the latter are given by
k = O, . . ., n - 1 ,
~k=pexp{i(r with
r
arg(in-lco)/n,
and
p-I~coi
If n is even, then, because there are no zeros on the imaginary axis, there are precisely n/2 roots in the left half plane and the same number of roots in the right half plane. If n is odd, there will be either (n + 1)/2 or ( n - 1)/2 zeros in the open right half plane and the others are in the open left half plane. This depends on the value of r = (n • 1)~r/(2n) where the +1 is for the case co < 0 and the - 1 for the case Co > 0. Thus r = ~r/2 • ~r/(2n) and hence for co < 0, the root ~0 will be in the left half plane and for co > 0, it will be in the right half plane. Therefore the number of roots in the right half plane is given by (n + sgn co)/2. [::1 For a more general pseudo-lossless function we can use a similar continuity argument and we obtain a proof for the following property which we leave as an exercise to the reader. The details can be found in [83, Theorem 2]. 6.24 Let F be a pseudo-lossless function whose distinct poles in
Theorem
the closed right half plane are given by ( 1 , . . . , ~r. If #k is the multiplicity of ~k and if the principal parts of the Laurent expansions of F at these poles is given by c k ( z - ~k) -uk for ~k ~ oo and ckz "k for ~k = ~ , then I ( F ) = i~ + . . . + i, with ik ik ik
=
~tk ~
=
#k12,
-
(#k + i u h + l s g n c k ) / 2 ,
for Re ~k > 0 for Re ~k = 0 and lzk even for Re ~k - 0 and lzk odd.
Note that in the last case, ck is indeed a real number so that sgn ck is well defined. This has as an immediate consequence the remarkable additivity property. See [83, Theorem 3].
C H A P T E R 6. L I N E A R S Y S T E M S
314
T h e o r e m 6.25 If F1 and F2 are two pseudo-lossless functions, which do not have a common pole in the closed right half plane, then I(F1 + F2) =
Z(F ) + Let P0 and P1 be the para-even and para-odd parts of P. Since we assumed that the leading coefficient of P was real the degrees of P0 and P1 are different. Assume deg P0 = n > deg/91 (if not one has to exchange the role of P0 and P1). Suppose we apply the Euclidean algorithm with starting polynomials P0 and P1. As we said before, we only consider here the simplest form of the Euclidean algorithm, that is, we have the recursion (6.18) for the polynomials Pk. L e m m a 6.26 Let P be a complex polynomial of degree n with para-even and para-odd part given by Po and P1. Let the sequence of remainder polynomials Pk and the sequence of quotient polynomials qk be as constructed by the Euclidean algorithm (6.18) for k = 1 , . . . , t. Let Pt = gcd(P0, P1). Then ~
The para-parity of all P2j is the same as the parity of n and opposite to the para-parity of all P2j+I.
2. The rational functions Fk-1 = Pk-1/ Pk and the quotients qk are pseudo-lossless. 3. Suppose deg Pt >_ 1. Then ( is a para-conjugate zero of P if and only if it is a zero of Pt. P r o o f . The first and second point follow from Lemma 6.21. Thus Pt will be either para-even or para-odd. Thus it can only have para-conjugate zeros. These zeros are common zeros of P0 and P1, hence also zeros of P = P0 + P1. Conversely, para-conjugate zeros of P are common zeros of P0 and P1, hence are zeros of Pt. D Define now the pseudo-lossless functions Fk = Pk/Pk+l. Then it holds that
Fk-i =qk + l/Fk,
k= 1,...,t
and because the para-odd polynomial qk is a pseudo-lossless rational function with all its poles at infinity, and by construction 1/Fk is strictly proper, qk and 1/Fk do not have common poles in the closed right half plane, so that by the additivity property
6.7. S T A B I L I T Y C H E C K S
315
Thus, by induction t
I(Fo) -
~
I(qk).
k=l
Setting Q - P / P t , then obviously N ( P ) -
N ( Q ) + N ( P t ) . Furthermore,
P Po P1 _ x( Po/ P, p /p ) - Z(Fo). N ( Q ) - N ( -~t ) - N ( -~t + -~t )
So that we have t
N(P) - N(Q) + N(Pt) - ~
I(qk) + N(Pt).
k=l
Thus P will have N ( P t ) para-conjugate zeros in the open right half plane. The number of zeros in the open right half plane which are not paraconjugate is given by N ( Q ) . To find I(qi), we can use Theorem 6.23 which we gave above. As a special case, consider a situation where the polynomial has real coefficients. The para-even and para-odd parts are then given by
Po(z) - c , z n + cn-2 Xn--2 + cn-4z "-4 + " " and
P l ( z ) - c,-1
xn-I
+ cn-3z
n-3
+ cn-5
xn-5
+""
when
P ( z ) - cnz n + s
Xn--I
2t- Cn--2 xn--2 -4- "'"-'~ CO.
For a stable polynomial, all the zeros have to be in the left half plane. Thus there can be no para-conjugate ones. This implies that deg Pt - O. Moreover we need ~ k I(qk) = O. The only possible way to obtain this is that all qk are of degree 1, and that their leading coefficients are positive. Thus the Euclidean algorithm is nondegenerate (all quotients have degree 1), with para-odd quotient polynomials qk(x) = ~kz, and since we need I(qk) = 0 we should have sgn flk = + 1. Thus f~k > 0 for all k. We have thus proved T h e o r e m 6.27 (classical R o u t h - H u r w i t z a l g o r i t h m ) A realpolynomial P of degree n has all its zeros in the (open) left half plane 5t is said to be strictly Hurwitz or stable) iff the Euclidean algorithm applied to the paraeven and para-odd part of P has n nondegenerate steps and all the quotient polynomials qk are of the form qk(z) = ]~kX with ~k > O.
316
CHAPTER
6.
LINEAR
SYSTEMS
This problem of stability became famous since it was proposed by A. Stodola to his colleague A. Hurwitz around 1895. It had been studied earlier by E. P~outh and A. Lyapunov. The l~outh array [104] is a practical organization of the Euclidean algorithm. It arranges the coefficients of the polynomials Pk from highest degree coefficients to the constant t e r m as the rows of a triangular array. The polynomial is then stable if the elements in the first column of the array are all positive. We have given the criterion for real polynomials only, although it is a simple m a t t e r to generalize this to complex polynomials. A more serious problem is that the test only works if all the zeros are strictly in the left half plane. If they are not, the algorithm can be degenerate and in this case we loose more precise information. This however can be remedied. It is indeed possible to extract much more information about the zero location of a complex polynomial. For example, how many zeros there are on the imaginary axis, what multiplicity they have etc. To find the multiplicity, of the zeros of a polynomial Q0 = P , it is obvious that if we start the Euclidean algorithm with P0 = Q0 and P1 - Q~, then a g.c.d. Qi - gcd(P0, P1) will contain all the zeros of Q0 that have multiplicity at least 2. Restarting the procedure with P0 = Q1 and P1 - Q~, we find a g.c.d. Q2 which contains all the zeros of P of multiplicity at least 3 and we can repeat this until we have found the multiplicity of all the zeros of P. Suppose this procedure ends with deg Q~ = 0, thus t h a t P has a zero of multiplicity s but not higher. By defining the polynomials Ps -- Q s - 1 / Q s ,
and
pk - Q k - l Qk+l / Q ~ ,
k - 1,..., s - 1
(6.20)
we see t h a t they have simple zeros and t h a t we have found a factorization of the polynomial P which clearly shows the multiplicity of its zeros P(z) - cpl(z)p](z)...p~(z),
c-
Q-~I e C.
In other words, the number of distinct zeros of multiplicity k is given by deg pk. If we are able to decide how many of the para-conjugate zeros of P, i.e., how many of the zeros of Pt are strictly in the right half plane and how many are on the imaginary axis, then we know exactly how the zeros of P are distributed on the imaginary axis, to the left and to the right of it. Recall from L e m m a 6.26 that P~ is para-even or para-odd. In t h a t case, we can easily compute the information we want. So in our previous approach, we now suppose that Q0 = P is para-even or para-odd, then all its zeros are para-conjugate. By induction, also the zeros of the subsequent Qk, k - 1 , . . . , s - 1 will be para-conjugate. Hence, the rational functions
6.7. STABILITY CHECKS
317
Fk+l - Q~/Qk, k - 0 , . . . , s -
1 which can be constructed from these polynomials will be pseudo-lossless and have simple para-conjugate poles. More precisely, let iwj, j = 1 , . . . , j k be the purely imaginary zeros of P which have a multiplicity nj >_ k + 1 and let (r l - 1 , . . . , Ik be the other para-conjugate pairs of zeros of P with a multiplicity mt _ k + 1, then these zeros are precisely the zeros of Qk where they will appear with multiplicity n j - k and m l - k respectively. Thus Fk+l has the form
F~+~(z) - ~
+
j--1 Z -- io.)j
+ 1--1
_
Z -- ~l
Z "~- ~!
The index of Fk+l is the sum of the indices of each (pseudo-lossless) term in the sum. To compute the latter, note that N ( z - iwj + nj - k) - 0 and that N [ ( z - ~,)(z + ~t) + ( m , - k)(2z - r + r - 1. Thus I(Fk+l) the right half Qk which are k = 0,..., s-
is equal to the number of poles of F k + 1 which are strictly in plane. By construction this is the number of distinct zeros of in Re z > O. Thus, because N(Qk) = I(Fk+l)+ N(Qk+I) for 1, we find by induction that s-1
N(Qo) - N(P) - ~ I(Fk+ 1) k-O
and hence P has E I ( F k + l ) pairs of para-conjugate zeros and thus deg ( P ) 2 ~ I ( F k + l ) purely imaginary zeros. The practical way to find I ( F k + l ) i s by applying the Euclidean algorithm to the polynomials P0 - Q k and/91 - Q~. When it produces quotient polynomials qj, then I(Fk+l) - ~ j I(qj) and the latter are easily found as in Theorem 6.23. E x a m p l e 6.4 Consider the example P ( z ) - ( z - 2)(z 2 - 1)3(z 2 + 1) 2 9 explicit form, this is P ( z ) - Po(z)+ P~(z)with
In
Po(z)-z 11-z 9-2z 7+2z s+z 3-z
and Pl(z) - - 2 z 1~ -}- 2z 8 + 4z 6 - 4 z 4 - 2z 2 + 2 the para-even and para-odd parts respectively. ends after one step. Po(z) -
z 11 -
z 9 - 2z ~ + 2z 5 + z 3 -
The Euchdean algorithm
z
P l ( Z ) - - 2 z I~ + 2z 8 + 4z 6 - 4 z 4 - 2z 2 + 2
P~.(z)- o
--,
Oo(z)-
--, q l ( z ) - - z 1 2
- 2 z 1~ + 2 z ~ + 4 z ~ -
4z ~ - 2z ~ + 2
CHAPTER
318
6. L I N E A R S Y S T E M S
Hence, because I ( q l ) = 1, there is exactly one zero in the right half plane which is not p a r a - c o n j u g a t e (the zero z - 2). To investigate the p a r a - c o n j u g a t e zeros, we have to analyse the g.c.d. p o l y n o m i a l Q0, which is a para-even polynomial. So we s t a r t again the Euclidean a l g o r i t h m with P0 - Q0 and P1 - Q~. This gives
eo(z) PI(Z)P2(z) P~(z) P4(z)Ps(z)I(F~) -
- 2 z 10 q- 2z 8 q - 4 z 6 - 4z 4 - 2z 2 --k 2 - 2 0 z 9 + 16z 7 + 24z s - 16z 3 - 4z 2(z 8 + 4z 6 - 6z 4 - 4z 2 + 5 ) / 5 9 6 ( z ~ - z ~ - z ~ + z) 2(z 6 - z 4 - z 2 + 1) 0 I ( q ~ ) + I ( q , ) + I ( q ~ ) + I(q~) -
= Qo(z) q~(z) : z / l O q2(z) = - 5 0 z -+ q~(z)= zl24o q4(z) = 48z ~ Ql(z)2(z 6- z 4 - z 2 T l)
Since deg Q1 - 6 > O, we have to s t a r t again with Po - Q1 a n d / ) 1 - Q~. Po(z) P~ (z) P~(z) P3(z)
-
2 ( z ~ - z ' - z ~ + 1) 4 ( 3 z ~ - 2z ~ - z) 4 ( - z ~ - 2z ~. + 3)/6 3 2 ( - z 3 + z) P4(z) - - 2 z 2 + 2
-+ q l ( z ) - z / 6 q2(z) = - 1 8 ~ --, q ~ ( z ) = z/48 --+ q 4 ( z ) = 16z
Ps(z) - 0 I(F2) - I ( q l ) + I ( q 2 ) + I ( q 3 ) + I(q4) -- 1
Q ~ ( ~ ) - - 2 ( ~ ~ - 1)
Again deg Q2 - 2 > 0 so t h a t we have to s t a r t a n o t h e r Euclidean sweep with P0 - Q2 a n d / ) 1 - Q~. Po(~)P~(z) : P~(z) : Pz(z) = I(F3) =
-2(z ~-4z 2 o
1) q~(z)--* q 2 ( z ) --, Q ~ ( z ) -
z/2 -2z 2
I ( q l ) + I(q2) = 1
Now deg Q3 - 0 and we have all the i n f o r m a t i o n we need. T h e n u m b e r of p a r a - c o n j u g a t e pairs of zeros off the i m a g i n a r y axis is I ( F ~ ) + I ( F 2 ) + I ( F 3 ) 3 (the pair • with multiplicity 3). T h u s there r e m a i n 11 - 1 - 3 . 2 - 4 zeros on the i m a g i n a r y axis (the zeros + i with multiplicity 2). We can find the factorization of Q0 by defining
p~(z)-
Qo(z)Q2(z) = 1 Ql(Z)2 '
p2(z)-
Q l ( z ) Q 3 ( z ) _ z2 -t- 1 Q2(z)2 '
6.7.
STABILITY
319
CHECKS
and Q
(z) =
Q (z) Then 1 z 2 + 1) 2 ( 1 Qo(Z) - ~ P l ( Z ) P 2 ( z ) P 3 ( z ) - 2(
z 2 )3 .
The number of distinct pairs of para-conjugate zeros in these factors which are not on the imaginary axis is given by I(F2) - I ( F 1 ) = 0 for Pl, I ( F 3 ) I(F2) = 0 for P2 and I(F3) = 1 for P3.
We can summarize the results in a general form of the Routh-Hurwitz algorithm, which is formulated as Algorithm 6.1. Algorithm 6.1" Routh_Hurwitz Given P = cnz n + . . . E C[z] with 0 ~ cn C R Set P0 = [P + ( - 1 ) n P . ] / 2 and P~ = [ P - ( - 1 ) n P . ] / 2 if deg P0 < deg P1 t h e n interchange P0 and P1 N=M=O k=l
w h i l e Pk Pk+l qk = if ak
r 0 = Pk m o d Pk-1 Pk div Pk-~ = - i ~ k ( i z ) ~'k + " " + 7k odd t h e n N - N + sgn~k
k=k+l
endwhile Q0 = Pk-1, s = 0 w h i l e deg Q, > 0 Po - Q,; P1 - Q's; M s - 0
k=l w h i l e Pk ~ 0 Pk+l = Pk m o d Pk- 1 qk = Pk div Pk-~ = - i f l k ( i z ) '~h + " " + 7k if ak odd t h e n Ms = Ms + sgnflk k=k+l endwhile M = M + Ms, s = s + 1, Q~ = Pk-1 endwhile
CHAPTER 6. LINEAR SYSTEMS
320
In a first cycle, the g.c.d. Q0 of the para-even and the para-odd part of the polynomial P is computed. Then, this g.c.d, polynomial is further analysed in the s-cycle of Euclidean algorithms applied to P0 = Q~ and P1 - Q',. The N and M counters are used to give the information about the location of the zeros of P. For example the number of zeros in the right half plane is given by N ( P ) = ( n - N - M)/2. To see this, recall that t
N(P) - ~ I(qk) + N(Pt). k=l
Now,
I(qk)
_
f
gn k)/2
-
akl2
Since ~ k deg qk - n - deg Pt and ~=k I(qk)
-
(n -
if ak - deg qk is odd if ak - deg qk is even. odd
sgn flk -- N, we have
deg P t
-
N)/2.
k
Since Qo - Pt, we have by a similar argument that (degQk_l-degQk-Mk_l)/2,
k-
1,...,s
is the number of zeros of Q o - Pt in the right half plane which have multiplicity k. Summing this up for k - 1 , . . . , s and using degQ~ - 0, we get $
N(Pt) - (deg Pt - ~ Mk-1)/2 -- (deg Pt - M)/2. k=l
Therefore
N(P)-
degPt + d e g P t 2 - M = n - N 2 - M
n-N-2
The number of (para-conjugate) zeros on the imaginary axis is given by N o ( P ) - M because
N~
degP*-M)2 -M.
Thus the number of zeros in the left half plane is given by (n + N - M)/2. Indeed, this number is n-
N(P)-
No(P)
-
n-
n-N-M 2
-
M
-
n+N-M 2
6.7. S T A B I L I T Y C H E C K S
321
The polynomials (6.20) could be defined to give the factorization
Qo(z)-
c e c.
The number of zeros of Pk that are on the imaginary axis is given by No(pk) = Mk-1, and the number of (para-conjugate) zeros of pk which are in the right half plane is given by N (Pk) = (deg pk - M k - ~) / 2. A number of classical zero location criteria can be obtained as a special case of this general l~outh-Hurwitz algorithm. For example, the classical l~outh-Hurwitz algorithm also works for complex polynomials. A polynomial P C C[z] is said to be Hurwitz in a strict sense if all its zeros are in the open left half plane. We have C o r o l l a r y 6.28 ( s t r i c t s e n s e H u r w i t z ) The polynomial P C C[z] is strict sense Hurwitz iff the general Routh-Hurwitz algorithm gives quotient polynomials qk(z) - ~kz + iTk where ~k > 0 and deg Q0 = 0. A polynomial P E C[z] is said to be Hurwitz in wide sense if all its zeros are in the closed left half plane and if there are zeros on the imaginary axis, then these zeros should be simple. C o r o l l a r y 6.29 ( w i d e s e n s e H u r w i t z ) The polynomial P C C[z] is wide sense Hurwitz iff the general Routh-Hurwitz algorithm gives quotient polynomials qk(z) - ~kz + iTk where ~k > 0 and deg Qo - 0 (in which case it is Hurwitz in strict sense) or deg Q0 > 0 but deg Q~ = 0 (in which case there are deg Q o simple zeros on the imaginary axis). Also the criterion for the zeros of a real polynomial to be real can be generalized to complex polynomials. Therefore we have to use the transformation z ~ iz to bring the real axis to the imaginary axis. If P ( z ) has only real zeros, then P(iz) has only zeros on the imaginary axis. These are paraconjugate and therefore P(iz) is either para-even or para-odd. Thus the first cycle in the Generalized P~outh-Hurwitz algorithm should vanish and we can immediately start with P0 - Q0 - P and P1 - Q~ - P~. Working out the details is left to the reader. The result is as follows. C o r o l l a r y 6.30 ( r e a l z e r o s ) The polynomial P C C[z] has only real zeros iff the general Routh-Hurwitz algorithm which should now be applied with initial polynomials Po = P and P1 = P~ (instead of the para-even and paraodd parts of P) is regular with quotient polynomials qk(z) - ~kz + iTk where ~k > 0 and Tk C R. Moreover, if the pk, k = 1 , . . . , s are defined as in (6.20), then P has deg Pk zeros of multiplicity k. P has only simple zeros if degQ1 = 0. Several other related tests can be derived from this general P~outhHurwitz algorithm. For more applications we refer to [107].
C H A P T E R 6. L I N E A R S Y S T E M S
322 6.7.2
Schur-Cohn
test
It will not be surprising that the stability checks for continuous time systems which require an appropriate location of polynomial zeros with respect to the imaginary axis have an analog for discrete time systems, where the location with respect to the complex unit circle is important. LetD{z e C ' [ z [ < 1 } , ' l g - {z e C ' [ z [ 1} and E - {z e C " [z[ > 1} U {oo} denote the open unit disk, the unit circle and its exterior respectively. The most classical test to check whether all the zeros of a complex polynomial are in D is the Schur-Cohn-Jury test [64, 163, 164, 186, 215]. Again the link can be easily made when considering orthogonal polynomials with respect to the unit circle. Let # be a positive measure on the unit circle, then the sequence {pk}~, n < oo is called a sequence of orthogonal polynomials on the unit circle with respect to # if
(Pk,Pt) - fv p k ( t ) p t ( t ) d # ( t ) - 5k,tvk,
k,l - O, 1 , . . . , n
with 6k,t the Kronecker delta and u k - Ilpkll2 > 0. Note incidentally that if the trigonometric moments are given by
#k-fv tkd#(t)' kcZ, then
( P k , P t ) - pHTmpk,
m _> max{k,l},
pj(t)-
pTt
where T,,~ is the Toeplitz matrix T,n - [#t-k]'~,t=o Such polynomials are called Szeg6 polynomials [222]. It is well known that the zeros of these polynomials are all in D. To give the recurrence relation for Szeg6 polynomials, we need to adapt our notion of pard-conjugate which we have given for the imaginary axis, to the situation of the unit circle.
Definition 6.31
( p a r a - c o n j u g a t e , r e c i p r o c a l ) For any complez function f , we define its pard-conjugate (us.r.t. the unit circle) as the function f , (z) =
If P is a complex polynomial of degree < n, then its reciprocal with respect to n is given by P # (z) - z"P, (z). We say that ~ is a pare-conjugate zero of a polynomial P if P(() = 0 and also P( (. ) = 0 with ~, = 1/~. A polynomial P is called e-reciprocal if P # - eP with e C T. A polynomial is called reciprocal if it is e-reciprocal with e = 1. A polynomial is called anti-reciprocal if it is e-reciprocal with e = - 1 .
6.7.
STABILITY
323
CHECKS
The definition of reciprocal depends on the degree of the polynomial. It should be clear from the context what degree is assumed. If P ( z ) = c n z n + c,,-1 Zn-I + " " + Co, with c,., - c n - 1 = " " = c k + l - 0 and ck ~ 0, then it is said to have a zero of order n - k at co. Thus P C C_~[z] has a zero of order n - k at co iff P # has a zero of order n - k at the origin. Note t h a t the zeros of P are para-conjugates of the zeros of P # i.e. if P ( ( ) - 0 then P # ( r - 0 with (. - 1/~. This implies t h a t ( i s a para-conjugate zero of P if and only if it is a common zero of P and P # . Para-conjugate zeros are on "ll' or appear in pairs ((, (.). Thus ( is a paraconjugate zero of P iff it is a zero of g c d ( P , P # ) . In other words, the greatest common divisor of P and P # collects all the para-conjugate zeros of P and it has only these zeros. A polynomial is e-reciprocal iff it has only para-conjugate zeros. As for the real line case, the Szeg6 polynomials satisfy a three term recurrence relation, but they also satisfy another type of recurrence which is used in the Levinson algorithm. For simplicity, assume t h a t the Szeg5 polynomials are monic, then we can write this recurrence as
I, Pk+l(z) -- ZPk(Z)"~- PkTlP~k(Z),
p0
-
Pk-.}-I e ]D),
k - 0,...,
7/,- 1.
(6.22)
The Pk are known as Szeg5 or Schur parameters or reflection coefficients or partial correlation (PAt~COI~) coefficients. Note that they are given by = pk(o).
It is also well known t h a t the zeros of the Szeg5 polynomials are all in lD. Thus, to check if a polynomial P of degree n has all its zeros in lD, we could check if it is the n t h polynomial in a sequence of Szeg5 polynomials. To do this, we can rely upon Favard's theorem which says t h a t a sequence of polynomials is a sequence of Szeg5 polynomials iff it satisfies a recurrence of Szeg5 type. Thus, given pn = P monic, we can invert the recurrences as
(1-
IPkl2)zpk_
(z) - p k ( z ) -
pkp
(z),
Pk -
pk(0)
k - n,n-
1 , . . . , 1.
Thus if all the Pk which are generated in this way are in ID, then p,, = P will be the n t h Szeg5 polynomial in a sequence of orthogonal polynomials because it can be generated by the Szeg5 recurrence and hence it has all its zeros in D. Note that we can leave out the factor ( 1 - Ipkl 2) since it will not change the value of pt: - P k ( O ) / P # k ( O ) . This is the computation used in the classical Schur-Cohn test. We thus found t h a t if P is stable, then the pk are all in ID. But the converse is also true. If all the pk are in D, then P will be stable. Therefore
C H A P T E R 6. L I N E A R S Y S T E M S
324
we note that if P has all its zeros in D, then Sn - P / P # will be a Blaschke product of degree n which is analytic in D and its modulus equals 1 on T. This is a Schur function. D e f i n i t i o n 6.32 A complex function f is called a Schur function if it is analytic in D and if ]f(z)] _< 1 for all z E D. A classical theorem by Schur gives us the tool to invert the previous statement about the Blachke product. Theorem
6.31 ( S c h u r )
p = S(O) C D
and
The function S is a Schur function iff either i S(z)- p S - I ( Z ) - ; I -fiS(z)
is a Schur function
-
or
S ( z ) is a constant function of modulus 1. Its proof is easily obtained by observing that if p = S(O) C D, then
z-p
1
-
is a MSbius transform which is a one-to-one map of the unit disk into itself. Moreover Mp(S) has a zero in the origin which can be divided out. Hence the Schur function S is m a p p e d into the Schur function S-1. If S(0) C T, then this map is not well defined, hence the second possibility, which follows by the m a x i m u m modulus principle. Note that if S - P / P # , with P a polynomial of degree n, then we can express S ~ S-1 as
1 P(z) - p P # ( z ) S _ l ( z ) - z P # ( z ) - -tiP(z)'
p-
p(o)/P#(O),
or in terms of polynomials: S-1 - P-1//)-#1 with P-1 (z) - z1 [P(z) - pP#(z)],
19#-1(z) - P # ( z ) - -tiP(z),
P(0) P - P#(0)
Note that this corresponds to one step in the inverse Szeg6 recurrence. So, we reformulate it as follows: Given a polynomial P of degree n, set Pn = P and generate until Pt#_t(O)- 0 (t _> O) the sequence of polynomials Pk as Pk(O)
Pk#_l(Z) - P k # ( z ) - --#kPk(z),
k - n,n-
1,...,t.
(6.23)
6.7.
STABILITY CHECKS
325
We refer to it as the classical Schur-Cohn algorithm. Thus if Pn has all its zeros in D, then Sn - P n / P ~ will be a Schur function of degree n and thus the Schur-Cohn algorithm will generate Schur parameters pk which are all i n D . If for s o m e t > 0 we get Pt C T, then St is a c o n s t a n t and Pt-1 is identically zero. Thus Pt is pt-reciprocal: P t ( z ) - p t P ~ ( z ) . Since common zeros of Pk and P f are also common zeros of Pk+l and Pk#+I it follows that Pt - g c d ( P , P #) if pt C T. Since Pt can only have para-conjugate zeros, this is impossible if we started with a Pn that had all its zeros in D. Thus the polynomials Pk, k = 0, 1 , . . . , n satisfy a Szeg6 recurrence and thus they are orthogonal by Favard's theorem. To recapitulate, we have obtained the following analog of the Favard Theorem 6.17. T h e o r e m 6.32 Let Pk, k - 0 , . . . , n be a set of complex monic polynomials with deg pk = k. Then these polynomials are orthogonal with respect to a positive measure on the unit circle iff they satisfy a recurrence relation of
the/o,'m For a proof of this Favard type theorem, see for example [89]. The property saying that the zeros of Szeg& polynomials are in D is also classical. See for example [222]. From our previous observations, we also have the following analog of Corollary 6.18. C o r o l l a r y 6.33 Let Pn be a complex polynomial of degree n, then it is the nth polynomial in a sequence of polynomials orthogonal with respect to a positive measure on the unit circle iff it has all its zeros in D. We also have proved the classical Schur-Cohn test. T h e o r e m 6.34 (classical S c h u r - C o h n t e s t ) Let P be a complex polynomial of degree n, then it has all its zeros in D iff the classical Schur-Cohn algorithm (6.23) generates n reflection coefficients pk which are all in D. If we do not start from a polynomial P with all its zeros in D, then it may happen that P ~ ( 0 ) - 0. If Pt-1 is identically zero, we still have Pt - g c d ( P , P #) as before. If P ~ I ( 0 ) - 0 without P t - i being identically zero, then P ~ i ( 0 ) - P ~ ( 0 ) - - f i t P t ( O ) - 0 or St(O) - 1/-fit and since by definition St(O) = pt, it follows that Pt = 1~-fit. Thus also in that situation we have pt C T. To deal with this situation conveniently, we rely again on the index theory of pseudo-lossless functions.
D e f i n i t i o n 6.33 ( p s e u d o - l o s s l e s s , p s e u d o - i n n e r ) A rational function F is called pseudo-lossless (w.r.t. T ) if F + F, = O. A rational function S is called pseudo-inner (w.r.t. T ) if S S , = 1.
326
C H A P T E R 6. L I N E A R S Y S T E M S
Note that there is a one-to-one mapping between the pseudo-lossless and the pseudo-inner functions given by the Cayley transform S-
1-F I+F
or
1-S I+S"
F=
(6.24)
Obviously, a rational pseudo-inner function S is of the form S - P / P # with P a polynomial. Definition 6.34 ( i n d e x ) Let T be a polynomial and denote by N ( T ) the number of zeros of T that are in D. The index of a rational pseudo-inner function S - P / P # with P and P # coprime is defined as I ( S ) - N ( P #). The index of a rational pseudo-lossless function F = P / Q with P and Q coprime is defined as I ( F ) = N ( P + Q). If S and F are related by (6.24), then they have the same index: I ( S ) = I ( F ) . Indeed, if S - P / P # , then F - ( P # - P ) / ( P # + P), so that I(F) - N(2P # ) - I(S).
The following properties hold for pseudo-lossless fuctions. For a proof we refer to the literature [83]. See also [81, 84]. The proof goes along the same lines as the proof of Theorem 6.24 or it can be directly obtained from that theorem by applying a Cayley transform mapping the half plane to the unit circle. T h e o r e m 6.35 1. If F is a rational pseudo-lossless function then 1/F, F + ic with c C R are also pseudo-lossless of the same degree and with the same index as F. 2. If F1 and F2 are rational pseudo-lossless functions without common pole, then F = F1 + F2 is pseudo-lossless of degree deg F = deg F1 + deg F2 and with inde~ I( F) = I( F1 ) + I( F2 ). 3. If F is a rational pseudo-lossless function with distinct poles 7k C T of multiplicity tk, for k = 1 , . . . , t and with poles ~t E D of multiplicity st, for l = 1 , . . . , s , then F = F1 + . . . + Ft + G1 + " " + G s + ic with c C R and -
fjr
j=l
+, z
rj
z
and at(z)
-
j
j=l
_
z- G
1 - ~jz
6.7. S T A B I L I T Y CHECKS
327
All the Fk and Gl are pseudo-lossless and I ( a l ) - st and I(Fk) - tk/2 if tk is even while I ( F k ) - ( t k - sgn ftk)/2 if tk is odd. Moreover, they have no common poles and I ( F ) - ~':~=1 I(Fk) + ~-~=11(Gl). If P is a complex polynomial and if g o d ( P , P # ) - Pt, then P - Q Pt and since Pt is e-reciprocal, we also have P# - eQ # Pt. Hence
F = 1 - P / P # _- P# - P __ eQ # - Q 1 + P/P# P# + P eQ# + Q is a pseudo-lossless function and S - g / P # function and obviously
I(F)-
I(S)-
- Q/eQ # is a pseudo-inner
N(Q#).
Thus
N ( P #) - N ( Q #) + N ( P t ) -
I ( S ) + N(Pt).
To find N ( P # ) , we should be able to compute I ( S ) and N(Pt) with S = P / P # , a pseudo-inner function and Pt - g c d ( P , P # ) an e-reciprocal polynomial. The Schur-Cohn algorithm actually does this. Indeed, with the Pk as in the classical Schur-Cohn algorithm ( 6 . 2 3 ) a n d Sk - Pk/Pk#, we have deg Sk - deg Pk - k. Now the MSbius transform z
~-+
z-p 1 - -pz
maps D onto D if p C D and maps D onto E if p C E. Because this mapping is one-to-one, we have
I(Sk) - I ( S k _ l )
ifpkEDand
I(Sk) - k - I ( S k _ l )
ifpkcE.
This allows us to compute conveniently I(Sn) as long as we do not meet the singular situation pk C T. We know that this will happen when P~-I (0) - 0. Either Pk-~ is identically zero, and then Pk - g c d ( P n , P ~ ) a n d Sk is a unimodular constant (N(Sk) = 0). Or Pk-1 is not identically zero. This is a curable breakdown of the Schur-Cohn algorithm which can be solved as follows. Since P~_I(0) - 0, it has a zero at the origin. Suppose this zero has multiplicity m. Because P~-I - P ~ - PkPk with Pk C T, it is an e-reciprocal (with respect to k) polynomial with e = - p k and hence it will have degree k - m. Thus it is of the form (z) - z m T k _ 2 m ( z )
328
C H A P T E R 6. L I N E A R S Y S T E M S
where T k _ 2 m is an e-reciprocal polynomial of degree k - 2m. The pseudolossless function of degree k l + p k S k = P~ +-PkPk 1 - ~kSk Pff - P~Pk
i~k-
will have a para-conjugate pair of poles (0, co) of multiplicity m. These poles can be isolated in an elementary pseudo-lossless term Gm of the form ,n
hj
where the hj are suitable complex numbers, so that Fk - G m
+
F k - 2 rn
with Gm having only poles at 0 and co and F k _ 2 m pseudo-lossless and analytic in 0 and co. By the additivity property of Theorem 6.35(2), we have I ( F k ) - I ( G m ) + I ( F k - 2 m ) and because I ( G m ) - m, we get I(/~k) - m + I(Fk-2r,,).
We can transform this back to the pseudo-inner formulation by setting 1 + Fk-2m S k - 2m =
i ~ Fk_2rn
and we have
z ( s k ) - Z(~kSk)- Z ( P k ) - m + Z(Fk_~m) - m + Z ( S k _ ~ ) . To put this procedure into a practical (polynomial) computational scheme, we should be able to generate the polynomial Pk-2m in the representation Sk-2m - Pk-2m/P#_2m from the polynomial Pk in Sk - Pk/Pk#. Therefore, we note that Gm has the form
v,,,(z) -
D2,.,(z) Z TM
with D2m a polynomial of degree 2m which has the particular structure D2m(z)
-
do + dlz +
=
b,,,_,(z)- ~'+'b~_,(z).
9 . .
+ d~-i
Z m-1
zm+l - -din-1
.....
-doz2m
6.7. S T A B I L I T Y C H E C K S
329
Letting
Wk(z) F~(~)- ~"Tk_~,,,(z)
P:(~) + -p~P~(~) P~ ( z ) - -ZkPk ( z )
and
Fk_~,,,(z)- y~_~,,,(z)
Tk-2m(z)
P:_~,,,(z) + P~_~,,,(z)
wzJ
T~_~,,,(z)- ~
with polynomials k
w~(~)-
k-2m
~
and
ts~ s,
j=O
j=0 we find from the relation Fk - G m
+ Fk-2m that
Tk_2~(z)D2m(z) + z m V k - 2 m ( z ) - Wk(z).
(6.25)
Equating the coefficients of z j, j - 0, 1 , . . . , m - 1, one sees t h a t the second term is not involved and we find t h a t the coefficients di, j - 0 , . . . , m - 1 of the polynomial Din-l, which completely defines the polynomial D2,,,, can be found by solving the triangular Toeplitz system
to tl
to
i
i
tin- 1
tin- 2
do dl
Wo wl
i
i
din- 1
ZOrn- 1
"'. "
99 to
Once D2m is known, we can solve (6.25) for Pk-2m is found as
Yk-2rn
and finally the polynomial
P~-2m = Tk-2m + Vk-2m. So we have managed to j u m p from Pk to Pk-2m, thus from Sk to Sk-2m in the case Pk C T. After that, the Schur-Cohn algorithm can go on. Since we are now able to deal with any situation where pk can be in D, in E or on T, the Schur-Cohn algorithm will only stop when pt C T and Pt-1 identically zero so t h a t Pt - g c d ( P , P # ) . The computation of the index I(S) thus follows the rules
I(Sk)
--
I(Sk-1),
I(Sk)
-
k-I(Sk_l),
I( Sk )
-
m + I( Sk_2m ),
if Pk e D ifpkeE ifpkcT
C H A P T E R 6. L I N E A R S Y S T E M S
330
where in the last case m is the multiplicity of the zero z - 0 in P~-I" Thus we are able to compute I ( S ) where S - P / P # and we end up with I ( S ) and Pt - g o d ( P , P # ) . Thus
N ( P #) - I ( S ) + N(Pt) - N ( P # / P t ) + N(Pt). Example
6.5 ( G e n i n [83]) We consider the polynomial
P(z)-l+z+z
2 + 2 z 3 + z 4 + z 5.
To compute N ( P ) , we set P5# - P and find that in the very first step of the Schur-Cohn algorithm we have p5 - P s ( O ) / P ~ ( O ) - 1. The polynomial P ~ is given by P 4 # ( z ) - z 2 ( - 1 + z) which is not identically zero. Thus we have to rely on the special procedure to overcome this singular situation. Note that z = 0 is a zero of multiplicity m - 2. The general formula P~_l(Z) - z'~Tk_2m(z) gives Tl(z) - - 1 q- z. The n u m e r a t o r of/55 is W ~ ( z ) - P ~ ( z ) + -;~P~(z) - 2 + 2 z + 3z ~ + 3z ~ + 2~ ~ + 2z ~
Thus we have to solve the triangular system
giving do = - 2 and
dl
-
- 4 . Thus
D4(z) - - 2 - 4z + 4z 3 + 2z 4. We then find V1 as
V~(z) - z - 2 [ W s ( z ) - Tl(z)D4(z)] - 7(1 + z) and hence PlY(Z) - y , ( z ) + T , ( z ) - 6 + 8z.
We note t h a t I ( $ 5 ) = 2 -}- 1(81)
and we can continue with the Schur-Oohn algorithm using P1#.
px - PI(O)/PI~(O)- 4/3 E E. Thus /(Sl)
-- I - - / ( S 0 )
-- I
We get
6.7.
STABILITY
331
CHECKS
since So is a constant. This means that Ps and P ~ are coprime and thus N ( P ) - N(P~#) - I ( S s ) - 2 + I ( S 1 ) - 2 + 1 - 3.
There are no para-conjugate zeros. There are 3 zeros in D and 2 zeros in E. In general P and P # can have a nontrivial g.c.d. P~ which will contain all the para-conjugate zeros of P. Then we have to find a convenient way to compute N ( P t ) . Note that Pt is always e-reciprocal. Our next objective is therefore to compute N ( Q ) with Q an arbitrary e-reciprocal polynomial where for simplicity we assume that Q ( 0 ) ~ 0. We start by noting that such a polynomial is always of the form ~!
Tn I
Q(z) - ,7 I I ( z - n) TM I-[ 1--1
n
[(z - r
- ~z)]
~
q, zJ
- ~
m--1
(6.26)
j--O II
with 77 C T, TI C T, l - 1 , . . . , I ' , 0 ~ (m E D, m - 1 , . . . , m ' and ~ t = l nt + TIq~t 2 ~ m = l nm - n. Define the pseudo-lossless function F(z) - n-
(6.27)
2z Q'(z---~)
Q(z)
where Q' is the derivative of Q. Using a partial fraction decomposition of F and the definition and additivity of the index of pseudo-lossless functions, it can be shown [108] that i e m m a 6.36 With Q of the f o r m (6.26) and F as in (6.27), we have that I(F) = m'= N(Q). P r o o L By a partial fraction decomposition, we see that F(z)
-
+
1=1
7"/-- Z
~-~nm
m=l
+
(rn-- Z
-
,
l--~mZ
which incidently shows that F is pseudo-lossless. For the terms in the first sum we note that X nt
-N Tl--
(nt+l)Tt+(nt--1)z
--0.
Z
For the terms in the second sum we have with bin(z) - 2((m - ~ m z 2) and am(z) - (,,,,, - ( 1 + I(,nl2)z + (,~z 2 that its index is given by
i
nm~
- N('~mbm + ~m).
C H A P T E R 6. LINEAR S Y S T E M S
332 Setting
P(z)
-
n ~ b m ( z ) + a~(z)
-
(1 + 2n,,,,)~,,,, - ( 1 + Ir
+ (1 -
the product of its zeros is (1 + 2nm)/(1 - 2nm)" ~,~/~,~ r T. Thus it has no para-conjugate zeros so that we can apply the Schur-Cohn test with the theory we have seen so far. We obtain p2 and pl C E. Thus N ( P ) = N(nmbm + am) = 1. Using the additivity property for the indices, we see t h a t indeed I ( F ) = m'. [:] Thus I(F) counts the number of para-conjugate pairs of zeros of Q which are not on T. To compute I(F) practically, we first note 6.37 With Q as in (6.26) and T(z) = n Q ( z ) - 2zQ'(z) the numerator of the function F in (6.27), we have g c d ( Q , Q') = g c d ( Q , T).
Lemma
P r o o f . Obviously, any common zero of Q and Q' is also a zero of T. Conversely, a common zero of Q and T is also a zero of zQ'(z). Since Q is e-reciprocal, it can not have a zero at the origin because this would imply t h a t it is not of degree n. Thus a common zero of Q and T is also a zero of Q'. This proves the lemma. 0 Let Q1 = g c d ( Q , Q'). Then, since F = T / Q , we have
I(F)-
T/Q1 I(Q/Q1)-
N[(Q + T)/Q1] - N(U/Q1)
with n
U(z) - Q(z) + T(z) - (n + 1 ) Q ( z ) - 2zQ'(z) - ~ . ( n - 2j + 1)qjz j. j=O Since also Q1 is e-reciprocal and
Q - VQ1,
Q ' - RQ1,
T - WQ1,
hence
U - (V + W)Q1,
for some polynomials V, R, W, it follows that Q1 is also the g.c.d, of U and U #. Thus Q1 can be computed by applying the Schur-Cohn algorithm to V. Moreover, the algorithm provides us with N(U/Q1) = I(F). Because Q1 = g c d ( Q , Q'), it contains the zeros of Q which have a multiplicity at least 2. Because Q1 is again e-reciprocal, thus of the same form as Q, this procedure can be repeated, replacing Q by Q1. Thus, after the classical
6.7.
333
STABILITY CHECKS
Schur-Cohn algorithm has computed Q - P~ - g c d ( P , P # ) , the general Schur-Cohn algorithm will append to it the computation of the polynomials
Qo - Q,
Uk(z) - n k Q k ( z ) - 2zQ~(z),
nk -- deg Qk,
k - O, 1 , . . . , s - 1
and apply the classical Schur-Cohn steps to these Uk to give Qk+l - gcd(Uk, Uff) - gcd(Qk, Q~)
and
Nk - N ( U k / Q k + I ) .
This iteration stops when eventually degQ, = 0. The number of paraconjugate pairs of zeros of Q, hence also of P, which are not on T and that have a multiplicity at least k is given by Ark. Thus if P is a complex polynomial and Q0 - g c d ( P , P #) is e-reciprocal, then the number of its para-conjugate pairs of zeros which are not on 2" is given by N ( Q o ) = $--1 ~k=0 Ark and the number of zeros of P that are on "IF is given by N o ( P ) No(Qo) = deg Q 0 - 2N(Q0). In conclusion we can say that the total number of zeros for P that are in D is given by N ( P ) = N ( P / Q o ) + N(Qo). By applying the Schur-Cohn algorithm to P we get N ( P / Q o ) and Q0. By the s subsequent Schur-Cohn sweeps applied to the polynomials U0, U1,..., U~-I, we get the quantities N ( Q o ) and N o ( P ) = go(Qo). As in the Routh-Hurwitz case, one can obtain from the Qk the factors pk that collect precisely the zeros of Q that have multiplicity k. The formulas are the same p~ - Qs-1/Q~,
and
pk - Q k - l Q k + l / Q ~ ,
k-
1,...,s-1
and
Q o ( z ) - cpl(z)pi(z)...pSs(z),
c e C.
with N ( p k ) - Nk - N k - 1 , the number of para-conjugate pairs of zeros which are not on 21"and that have multiplicity k. See [108]. E x a m p l e 6.6 Consider the polynomial 9 1 ) 2 ( z - 2 ) ( z - 1 / 2 ) - 1 - 5z + 7z 2 _ 59 Z 3 +
Q(z) - ( z -
We construct 4
U(z)
j=o
27
9
2j),sz'
-
-
5-
-z
2
+ 7z
+
-
2
3z
Z4
"
C H A P T E R 6. L I N E A R S Y S T E M S
334
Applying the Schur-Cohn algorithm to this polynomial P ~ = U, we get the successive steps 5-
+
9 Z3 -
3z4
9 --~ p3 -- - g E E
-r~ + ~ z ~ + ~ z 3 17 9 19_2
~--
pg(z)
P~(z)-
19
20 + g z - $6~ -
-
3
--*p4---g ED --'* P 2 -
z)
i-fEE
---, pl = - 1 E T.
Thus we arrive in the singular situation Pl E T. But since P0# - 0, we have found that gcd(Q, Q ' ) - gcd(U, U #) - Q l ( z ) -
Pp(z)-
18 --~-(1 - z)
which contains the zero z = 1, which is indeed the only zero of Q that has multiplicity larger than 1. To count the number of para-conjugate pairs, we need -
=
3-
3-(2-
N C P I # / P I # ) ) - 3 - 2 - 1.
Thus Q has one pair of para-conjugate zeros which are not on T of multiplicity 1, and it has one zero on T of multiplicity 2.
We mention that both the P~outh-Hurwitz and the Schur-Cohn algorithm have an operation count which is O(n 2) with n - deg P. However, since the Schur-Cohn algorithm works with the full polynomials while the l~outhHurwitz algorithm needs para-even and para-odd polynomials which contain approximately half the number of coefficients of the full polynomials, the Schur-Cohn algorithm will need about twice as many operations as the Routh-Hurwitz algorithm. This observation has led Y. Bistritz [16] to design an alternative for the Schur-Cohn algorithm which makes use of the symmetric and anti-symmetric parts of the polynomial when it has real coefficients. These polynomials, like e-reciprocal polynomials show a symmetry in their coefficients so that only half of them have to be computed, and thus also for the circle case the number of operations can be halved just like in the l~outh-Hurwitz test. Later, this gave rise to so called "split" versions of the Levinson and Schur algorithm as explored by Ph. Delsarte, Y. Genin and Y. Kamp [75, 76, 77, 78, 79, 80, 106]. They also give a generalization of the Bistritz test to complex polynomials [81].
6.7. S T A B I L I T Y CHECKS
335
We first sketch the ideas of the split Levinson algorithm and show how the inversion of this algorithm leads to a stability test in the normal case. Later on we give the complex Bistritz algorithm for the general case. Suppose that {pk} are the Szeg6 polynomials. Define a new set of polynomials for arbitrary nonzero wk by R0=w0+~0, 0#w0EC Rk(z) -- "WkPk_ # 1 (Z) Ar ~ k Z P k _ l (Z),
"U)k -- R k ( O ) ,
k -
1, 2, . . . .
(6.28)
Obviously these polynomials form a set of reciprocal polynomials (whence the notation R), meaning that Rff - Rk. Because of the symmetry in the coefficients of such polynomials, they are completely defined in terms of the first half of their coefficients. It is possible to give an alternative expression for the Rk. L e m m a 6.38 The polynomials Rk of (6.28) can also be expressed as
okak(z) -
~kp~(~)+
~kPk(z),
~k -- O k - l ( ~ k -- P--k~k),
where the ak are defined by the recurrence 0 ~ o~ -
o-k_~(i
- I,okl~),
k -
a-1
= ao E
k -- 0 , 1 , ~ , . . . (6.29) R and
o, 1, 2,...
where we have introduced the artificial po = O. P r o o f . This is immediately obtained by using the SzegtJ recurrence (6.22) in (6.28). o The interesting thing about these reciprocal polynomials Rk is that they satisfy a specific three term recurrence on condition that the parameters wk are chosen in an appropriate way. T h e o r e m 6.39 One can choose the parameters wk such that the polynomials (6.28) satisfy a three term recurrence of the form
Ro = wo + Wo Rl(z) = Wl + @1z
~k+l(Z) --
(6.30)
(0l k + - ~ k Z ) R k ( Z ) + Z e k _ l ( Z )
: O,
k :
1, 2 , . . .
P r o o f . This is obtained by using the definition (6.28) and the Szeg6 recurrence (6.22)in the expression R k + l ( z ) + z R k - l ( Z ) . This takes the form (ak + -Skz)Rk(z) when we set ak = wk+l/wk which is defined in terms of wk, wk-1 and the Schur parameters pk-1 and pk such that ak - Wk+l _
(Wk-lPk-1 --
wk-i)
k - 1, 2, 3, .
~
~
336
CHAPTER
6. L I N E A R S Y S T E M S
Using the expression for ak and the definition of the Tk, it is immediately seen that a k - wk+l _- ~k'-_____A,1 k - 1, 2,3, ... (6.31) "tOk
Tk
T h e o r e m 6.40 The reciprocal polynomials Rk and the Szeg6" polynomials pk are also related by Rk+l(z)-
)~k+lzRk(z) - wk+l(1 - ~kz)p#k (z),
k - O, 1 , 2 , . . .
(6.32)
where ~k+~ = w--k+~ak _ Rk+~(,?k)
~k
k-
0 ~ 1 ~ 2 ~''*
nkRk(nk) '
The quantities l/)k+l, Tk and ak are as above and ~?k - wk+~Tk C T
k - O, 1 2,
ll3k+ l T k
P r o o f . This follows easily by combining the formulas (6.28) and (6.29).
The )~k are called Jacobi parameters [78, 79]. Finally, for the Schur-Cohn algorithm, the most important observation is that
~k+l
llJ---k+1 Tk-1 (Tk
Ak
~krkak-z
= I~kl~-(1 - Ipkl~),
k - 1, 2, 3 , . . .
(6.33)
Therefore we find that Ak+t/Ak is real and that Pk C D (V, IE) iff Ak+l/Ak > 0 ( - 0, < 0). The efficient version of the Schur-Cohn test will therefore reverse the recurrence (6.30) to compute the polynomials Rk by
R k _ , ( z ) - z-X[(~k + ~ k z ) R k ( z ) - Rk+a(z)],
k-
~, ~ -
1,..., 1
(6.34)
where ak = Rk+l(O)/Rk(O). This will work as long as Rk(O) ~ O, which will be assumed for the time being. Note that all the polynomials Rk are reciprocal and we thus have to compute only the first half of its coefficients. To get started, we have to find appropriate Rn and Rn+l. Given a polynomial P, this P will play the role of the Szeg6 polynomial ion, but since in general it will not be orthogonal with respect to a positive definite
6.7. STABILITY CHECKS
337
measure on the unit circle, we switch to the notation with capitals. Thus we set Pn = P. We assume without loss of generality that it is monic, thus P # ( 0 ) - 1. A possible choice is then
Rn+l(z)- P#(z) + zP(z)
and
(6.a5)
R , ( z ) - P(z) + P#(z).
Both of them are reciprocal and the relation (6.32)holds with p,(z) = pn(z), wn+l = 1, ~,~ - 1 and ),n+l = 1. This is a most convenient situation because it follows from (6.31) that arg (Wk+l)+ arg ('rk) = arg (Wk) + arg ('rk_l) while ~n = 1 implies a r g ( w , + l ) + arg (Tn) = 0. Therefore, it follows by induction t h a t for all k we have arg (wk+l) + arg (rk) = 0 and thus that all ~k = 1. Thus Ak+l = Rk+l (1)/Rk(1). The evaluation of a polynomial in the point 1 is particularly easy since we only have to add its coefficients. Because all the polynomials Rk are reciprocal, we can compute Rk(1) by taking twice the sum of the real parts of the first half of its coefficients. Thus all Rk(1) and hence all Ak are real numbers. Since we only need the sign of the )~k, hence of the Rk(1), we can even discard the factor 2. Let us first consider the nondegenerate situation. In the Schur-Cohn algorithm, this means that none of the pk (except for the last one) are on T. By the definition of the rk (6.29) we see that this means that all rk # 0 and by (6.31) this also implies that all ak ~ 0 and thus that all wk # 0. The converse however is not so simple. We do know the following: Since ~k = 1, we get from (6.32)
P : (z) - Rk+l (z) - /~k+l z R k ( z ) Wk+I(1--Z) with P ~ ( 0 ) -
1 and pk - Pk(0). Thus
Pk(O) _ 1 (~k+l,Wk __ ,t.Ok+l) __ 'Wk+l ('~k+___~l Pk - pk# (O) - ~ k+----~ t//%+l O~k
1).
Therefore Pk C T iff 1 - )~k+l/ak C 2". This implies that if wk = Rk(0) = 0, then ak = oo, so that pk E T. Also, from (6.29), we find
O'k-l'WkTk-1 Thus if R k - l ( 1 ) = 0, then ~k = oo and thus Pk E T.
--
~
1
338
CHAPTER
6.
LINEAR
SYSTEMS
However Rk(0) = 0 is not necessary for Pk to be in 21" and neither is Rk_~(1) = 0. It may happen that pk E ql" but Rk(0) r 0 as the following example shows. In fact all kinds of situations can occur as we will illustrate IIOW.
E x a m p l e 6.7 Suppose P ( z ) - 1 + z + 2 z 2 + z 3 so that p3 - P ( O ) / P # ( O ) 1. The polynomial P has one real zero inside ]D and two complex conjugate zeros in E. The algorithm should be initialized with R , ( ~ ) - ~ + 3z + 2 z ~ + 3~ ~ + z ~
~nd
e~(z) - 2 + 3z + 3z ~ + 2z ~
After one step one obtains R2(z) - ( - 1 + 2 z - z2)/2 and we note that R2(1) = 0. However, the algorithm can terminate without some Rk(0) becoming zero. We have R 4 ( z ) = 1 + 3 z + 2z 2 + 3 z 3 + z 4
R3(z) = 2 + 3z + 3z 2 + 2z 3 R~(z) - (-1 + 2z- z~)/2 R~ (z) = - 5 ( ~ + z) R0(z) = - 2
R,(1) R~(1) R~(1) R~(1) R0(1)
= = = = =
10 10 0 -10 -2
Another example: consider P ( z ) - 1 + z + z 2 + 2 z 3 -[- z 4 -+- z 5 where again p~ = 1. The initialization is R6(z) - 1 + 2z + 2z 2 + 4z 3 + 2 z 4 + 2 z 5 + z 6 R s ( z ) = 2 + 2z + 3z 2 + 3z 3 + 2 z 4 + 2 z 5.
and
Now we obtain after one step R4(z) - ( z - 2z 2 + z3)/2, thus R4(0) - 0 and R~(1)=o.
A last example: P ( z ) = 1 + z + 2z 2 - z 3 with P3 = - 1 . The initialization is R4(z)--l+3z+2z
2+3z 3-z 4
and
R3(z)-3z+3z
2,
which immediately gives R 3 ( 0 ) = 0 but R 3 ( 1 ) ~ 0.
Suppose we have a nondegenerate situation and let us consider an example of a polynomial P # of degree 5, for which the classical Schur-Cohn algorithm computes the polynomials P ~ and the Schur parameters pk, k = 5 , . . . , 1 as in the table below. If also the inverse split Levinson algorithm is applied to this polynomial to compute the reciprocal polynomials Rk and the coefficients )~k, then the relation (6.33) implies the signs of the ~k and of the Rk(1) as indicated. We know that )~6 -- R6(1)/Rs(1) - 1,
6.7. S T A B I L I T Y CHECKS
339
so that R6(1) and Rh(1) have the same sign. We have assumed ample that R6(1) > 0. The other possibility R6(1) < 0 would signs of all the subsequent Rk(1). It is easily checked t h a t the zeros N ( P #) t h a t are in D corresponds to the number of sign the sequence Rk(1).
! R . ( 1 ) :> 0 sgn flips
N(P:)Rs(1) > 0
]
N(p~) R4(1) > 0
I
N(P:)Ra(1) > 0
!
3
N(P2#) R2(1) < 0 1
--t--
I
3
N(P1#)
R1(1) > 0 1
-'t-
in this exchange the number of changes in
I
-3 /t0(1) < 0 1=3
This observation holds in general as long as we have a nondegenerate situation. Nondegenerate means that all pk ~ T, thus that none of the Rk(0) and none of the Rk(1) become zero. We formulate it as a theorem, but leave the details of the proof to the reader. T h e o r e m 6.41 If the inverse split Levinson algorithm (6.34,6.35) is applied for a given complex monic polynomial P of degree n, then if there is no degenerate situation during the execution (all Rk(O) and all Rk(1) are nonzero) then the number of sign changes in the real sequence Rk(1) ~ O, k - n , . . . , 0 is equal to N ( P #), the number of zeros of P# in D. In particular, the polynomial P will have all its zeros in D, iff all the numbers Rk(1) have the same sign. This inverse split Levinson algorithm is however unable to deal elegantly with singular situations (where some pk C 'IF). Only when all wk = Rk(0) 0, k = n , n - 1 , . . . , t and Rt-1 =- 0, we can say something, l~ecall t h a t Rn+l (0) - P # ( 0 ) - 1 is always nonzero. Note t h a t when Rt-1 = 0, then R~+l(z) - (at + - h t z ) R t ( z ) . Thus g c d ( R t + l , R t ) = Rt. It is easy to see from ( 6 . 3 4 ) t h a t a common zero of Rn+l and Rn is also a zero of Rn-1, Rn-2, ... Thus if at a certain stage in the algorithm we find Rt-1 = 0, then we have found g c d ( R n + l , Rn) - Rt. Since we do not have to count sign changes for this, we do not need the polynomial P to be monic. So let us assume that P is any polynomial of degree n but t h a t it has no zero at z = 1. Such a zero is easily recognized and can be eliminated. Then we renormalize the resulting P such that 0 ~ P(1) C E. 6.42 Suppose 0 ~ P(1) C R. Then with the polynomials as defined above, we have g c d ( R n + l , Rn) - g c d ( P , P # ) .
Lemma
P r o o f . Any common zero of P and P # will also be a zero of Rn+l and Rn because of (6.35). Thus g c d ( P , P # ) divides g c d ( R n + l , Rn).
340
C H A P T E R 6. L I N E A R S Y S T E M S
Conversely, any common zero of Rn+l and Rn will also be a zero of a n + l ( z ) - R n ( z ) - ( z - 1)P(z) and of z R n ( z ) - Rn+l(z) - ( z - 1)P#(z). Now Rn(1) - Rn+l(1) - 0 iff Ke P(1) - 0. Thus the common zeros of Rn+l and Rn are common zeros of P and P # . This proves the lemma. [:] Thus we may conclude the following theorem. T h e o r e m 6.43 Given a complex polynomial P with 0 ~ P(1) C I~. Suppose that in the algorithm (6.3~,6.35) all Rk(O) ~ 0 for k = n, n - 1 , . . . , t and that Rt-1 - O. Then Rt - g c d ( P , P # ) . We note that this result depends only on the Rk(O) being nonzero and not on the fact that some pk is or is not on T. Because g c d ( P , P # ) contains precisely all the para-conjugate zeros of P, we arrive at the second stage of the algorithm where such zeros are treated. It has been explained before how this can be handled by the Schur-Cohn algorithm or by any alternative for it. This solves the problem that arrises when at a certain stage in the algorithm we find a polynomial Rt-1 that vanishes identically but none of the R k ( 0 ) a r e 0 for k = n, n - 1 , . . . , t. As we said before, there is no simple adaptation of the inverse split Levinson algorithm to deal with other singular situations. We shall have to switch to the complex Bistritz algorithm to solve these problems. The disadvantage of the complex Bistritz algorithm is there is not a simple link with the Szeg6 polynomials as in the inverse split Levinson algorithm. Most is surprisingly however, that, except for the initialization, these algorithms are formally exactly the same. We give the derivation of the complex Bistritz algorithm, based on the index theory as given in [81]. We drop the condition that P should be monic (which was mainly used to make the link with the monic Szeg6 polynomials) but instead assume that P(1) is real and nonzero. As we stated above, we can easily get rid of a zero z = 1 and then renormalize to make P(1) real. Thus assume P is a complex polynomial of degree n with 0 # P(1) C R. We split P # in its reciprocal and anti-reciprocal part P # = Rn + An where Rn = P # q- P is a reciprocal polynomial and An = P # - P is an anti-reciprocal polynomial. Because of the normalization P(1) e R, we can write An(z) - ( 1 - z ) R n _ l ( z ) with R~-I a reciprocal polynomial of degree n - 1. Then by definition, the complex Bistritz algorithm applies the same steps as in the inverse split Levinson recursion to these initial polynomials Rn and Rn-1 to give (in the nondegenerate situation) the reciprocal polynomials
Rk-X (z) -- z -1 [(C~k + -~kz)Rk(z) -- Rk+x (z)],
C~k --
k = n-
Rk+l(O) ak(O)
1 , n - 2 , . . . , 1.
6.7. S T A B I L I T Y CHECKS
341
If we then also compute the (real) numbers )~k = Rk(1)/Rk-l(1), then it turns out that the number N ( P #) of zeros of P # in D is given by the number of negative elements in the sequence Sk, k = n, n - 1 , . . . , 1. Before we show this general result, we give an example first. E x a m p l e 6.8 Consider the polynomial
P(z) - 1 + z + z 2 +
2z 3 + Z 4 "-~ 1 z 5
which has a pair of complex conjugate zeros in E, a pair of complex conjugate zeros in D and a real zero in D. Thus N ( P #) - 2. The Bistritz algorithm generates
k Rk( ) 5 4 3 2 1 0
53 + 2 z + 3 z 2 + 3 z 3 + 2 z 4 + 5 z3 s Z2 - Z3 - Z4 ) - ~1( - 1 - z + 1 - 3 z - 3z 2 + Z 3 89 + 5z + 3z 21 ~ ( 1 + z) 1
Rk(0)
Rk(i)
3/2
+
1/2 1
3/2 3/2
i/2
2
-3 -1/2 3/2 2/3
+ +
+ +
+
+
The number of sign changes in the sequence Rk(1) is 2 and this is equal to the number of negative ~k's.
The justification of this result is not as simple as in the Schur-Cohn test because the recurrence is not a simple translation of the Schur recurrence. If Pt - god(P, P # ) and S - P / P # is pseudo-inner, then N ( P # / P t ) - I(S). If Fn = (1 + S ) / ( 1 - S), then F,~ is pseudo-lossless and I ( S ) = I(1/F~,) = I(Fn). Thus the problem of counting N ( P # / Pt) reduces to the computation of I(Fn) where Fn(z) = R n ( z ) / [ ( 1 - z)Rn_l(Z)]. When we do not have degenerate situations, the degree of Fn is reduced by 2 in two steps, after which a pseudo-lossless function Fn-2 of the same form is obtained. This process is repeated until R0 is reached. We shall now describe in detail such a 2-step reduction of Fn to Fn-2. So suppose we have given at stage k the reciprocal polynomials Rk and Rk-1 and the pseudo-lossless function of degree k defined by F k ( z ) = R k ( z ) / [ ( 1 First, it is seen that this function has a simple pole at z = 1 which can be extracted by writing (all functions are pseudo-lossless) -
+
CHAPTER 6. LINEAR SYSTEMS
342 with
~ ( ~ ) _ ),k(1 + ~) 2(1- z) ' ~k
-
-
Rk(i) Rk-i(1)"
H~ has no pole at z - 1 and hence by the additivity property of the indices I(Fk) - I ( g ~ ) + I ( g ~ ) . On the other hand splitting Fk as
F~ - c~ + 1/a~ with alk(z)
__ Otk-1 -iF "~k-1 Z ,
Otk-1 __
1- z
Rk(O)
Rk_~ (0)
we find after elimination of Fk that 1
zRk_2(~)
C~(z) = [ H ~ ( z ) - a~(z)] + H~(z) - - ( 1 - z)Rk_l(z) where Rk_~(z)-
z-~[(~k_~ + - a k _ ~ ) R k _ ~ ( z ) - nk(z)].
(6.36)
Because, by construction, H~ - G ~ and H~ have no common pole, we have -
I(H~ - a ~ ) + I(H~)
= I(H~ - a~)+ I ( F k ) - I(H~) Some computations yield that
I(H~)- { while ~(H~ - G~) -
{1 o
1 if~k < 0 0 if,~k > 0
if 2Keak_l - ~k > 0 if 2Ke O~k_1 -- )~k < 0.
This can be expressed compactly as
z ( a ~ ) - ~(Fk)+ ~l[sgn)~k + sgn (2Ke ak_l
,~k)].
Setting z = 1 in (6.36) yields 2Keak_i - ~k = 1/~k-1 so that 1 Z(Fk) - I(V~) - ~(~g~ ~k + ~gn ~k-~).
For the second step, we isolate the pole z - 0 from G~ as follows (again all functions are pseudo-lossless) G ~ - K~ + K~
6.7. STABILITY CHECKS
343
with
K~(z) -
Or.k-2
F -~k-2Z~
Rk-l(0) ak-2 = Rk_2(0)"
Writing this as
a2k(z) - K~(z) - K ~ ( 1 ) + with K~(1) purely imaginary and F~_~(z)
-
(1- z)Rk-3(z)
where
Rk-3(z) - z - l [ ( a k - 2 + -ak-2z)Rk-2(z)- Rk-l(Z)] and
K ~ ( z ) - K~(1) - Lk(z) - - ( ak-2z + ~ k - 2 ) ( 1 - z ) . Obviously I ( L , , ) - 1 so that I ( G ~ ) - 1 + I(Fk-2) and therefore Z(Fk) - Z(Fk_~)+
1 -
~1 (sgn
,kk + sgn )~k-1)"
Since Fk-2 has the same form as Fk, we can repeat this two-step procedure until Rt-1 - 0. Since
1
1 - ~(sgn )~k + sgn ~k-1 ) --
/ 2l
if ~k < 0 and )~k-1 < 0 if ),kAk-~ < 0 0 if ),k > 0 and ~k-1 > 0
it follows that I(Fn) is equal to the number of negative ~k's for k = 1, 2 , . . . , n, which is the same as the number of sign changes in the sequence of Rk(1), k - 0, 1 , . . . , n . We shall not prove the Bistritz algorithm in the most general case. The reader is referred to [81] for all the details. The result is a four-step algorithm which is described as follows. Given a complex polynomial P with 0 ~t P(1) E R, it computes N ( P # / P t ) - I(Fn) and Pt where Pt - gcd(P, P # ) and Rn(z) f.(~)
-
(1 - z ) R ~ _ ~ ( z )
with
R,.,(z) - P#(z) + P(z)
and
R,~-I (z) -
P # ( z ) - P(z) 1-z
CHAPTER
344
6. L I N E A R S Y S T E M S
The iteration at stage k is entered with a pseudo-lossless function
Fk(z) -
ak(z) (1 - z)Rk_~ (z)
or equivalently, the reciprocal polynomials Rk and R,k-1 of the indicated degree. Then the following 4 steps are executed. STEP
1
If Rk-1 - 0 then stop, I(Fk) - 0 and Rk - gcd(P, P # ) . Otherwise write
ak-.(~) - ~"Rp_.(z).
R.,_~(O) ~ O.
p-
k-
2~.
Compute the anti-reciprocal polynomial D2m from
Rk(z) - (1 - z)D2m(z)Rr,_l(z ) m o d z TM. Compute the reciprocal polynomial Rp from
Rr,(z ) - z - m [ R k ( z ) -
( 1 - z)D2,,,,(z)Rr,_l(z)]
Define the pseudo-lossless function Fp and the Jacobi parameter )~p
aS
R.(z) Fp(z)-
(1 -
Z)Rp_l(Z)
and
x.= Rp(1) Rp_l(1)"
Then
~(F~) - ~ + ~(F~) Set k - p and go to STEP 2. STEP 2
Compute
Rk-2(z) - z-l[(otk-l-["-~k-l Z)Rk_l(Z)--Rk(Z)], Set Rk-l(1)
~k-~
=
Rk-2(1)"
Define the function C ~ ( z ) - - (1 - ~ ) a k _ ~ ( ~ ) zRk_2(z)
Rk(0) O~k_ 1
--
Rk-l(0)"
6.7.
STABILITY
345
CHECKS
Then I(Fk) - I(G#)+ a
with
o / Go to
0 1
5 (sgn)~k + sgn ;Xk_1 ) 1 ~(1 - sgn Ak)
-
STEP
if R k - l ( 1 ) - 0 and Rk-2(1)5r 0 if R k - l ( 1 ) r 0 and Rk-2(1)7 ~ 0 if R k - l ( 1 ) r 0 and R k _ 2 ( 1 ) - 0
3.
STEP 3
If Rk-2 - 0 then stop, I ( G ~ ) - 0 and R k - t - gcd(P, P # ) Otherwise write
Compute the reciprocal polynomial T.-1 from Rk-l(z) + zT,,_l(z)Rq_2(z)-
Compute the reciprocal polynomial
0mod
(1 - z) ".
Rq-1 from
R~_~(z) - (i - iz)-~[zT~_~(z)R~_~(z) +
Rk_~(z)].
Define the function
C~(z) -
-
( 1 - z)R~_~(~) zRq_2(z)
Then
Z(G~) - Z(G~)+
b
with
b-
if v is odd
{ 89 - i) l(v-
Set k - q and go to
1
STEP
sgn Rh_t(1)
if v is even.
4.
STEP 4
If Rk-2 - 0 then stop, R k - 1 -- gcd(P, P # ) . Otherwise write
346
C H A P T E R 6. L I N E A R S Y S T E M S C o m p u t e the reciprocal polynomial
U2w+l from
Rk-1 (z) -- U2w+l ( z ) R r - 2 ( z ) m o d z w+l. C o m p u t e the reciprocal polynomial R r - 3 from
R~_~(z)- z-(~+l~[u~+l(z)R,_~(z)- ak_~(z)]. Define the pseudo-lossless function
F~_~(z)
-
R~_~(z) ( ~ _ z)a,_~(z)
Then Z(G~)
Set k = r -
2 and go to
- ~ + ~ + ~(F,_~). STEP
1.
Remarks. 1. In step 1, m is the R k - l ( 0 ) ~ 0, which and step 1 becomes Rv-1 = Rk-1 so that
multiplicity of z = 0 as a zero of Rk-1. If is the nondegenerate situation, then m = 0 trivial because then D2m = 0, Rp - Rk and Fp = Fk.
2. Step 2 of this general procedure corresponds to the first step of the nondegenerate version. In the nondegenerate case, the second choice for a holds. 3. In step 3, v is the multiplicity of z = 1 as a zero of Rk-2. If Rk-2(1) 0, then we are in the nondegenerate situation and this step becomes again trivial because then v = 0, T.-1 is a polynomial with a negative degree and is therefore equal to zero. The polynomials Rq-1 = Rk-1 and Rq-2 - Rk-2, so t h a t G~ - G~. Obviously, b is then equal to zero. 4. Step 4 is the generalization of the second step of the nondegenerate version. Indeed, if Rk-2(0) ~ 0, then w = 0. The polynomial U2w+l is then of degree 1, hence of the form Ul(z) = u + ~z and u has to be chosen such t h a t the constant terms in Rk-1 and T I R k - 2 are the same, which means that u - R k - l ( O ) / R k - 2 ( O ) - ak-2. Thus Rk-3 is obtained by the recurrence of the nondegenerate case.
6.7. S T A B I L I T Y CHECKS
347
E x a m p l e 6.9 We reconsider the polynomial of E x a m p l e 6.5 which is also the second polynomial in E x a m p l e 6.7. We know from the Schur-Cohn test t h a t the polynomial
P(z) - 1 + z + z 2 + 2z 3 + z 4 + z s has no p a r a - c o n j u g a t e zeros, 3 zeros in ]D and 2 zeros in E. Taking P in the role of P # in the initialization of the Bistritz algorithm will yield N ( P / P t ) - I(Fs). This initialization is thus
Rs(z) - P(z)-+- P # ( z ) - 2 + 2z + 3z 2 + 3z 3 + 2z 4 + 2z s
and
R4(z) - P ( z ) -
P # ( z ) = _z2. 1-z
So in step 1 we have with k - 5 t h a t R4(0) - 0, m - 2, p - 1, R p - 1 - - 1 and n4(z) has the form n 4 ( z ) - do + d l z - di z3- doz4. Its coefficients are obtained from R ~ ( z ) - (1 - z)D4(z)Ro(z) m o d z 2 which gives rise to the system
so t h a t d o - - 2 and dl - - 4 . T h e n Rl(z) is c o m p u t e d from Rl(Z)-
z - 2 [ R s ( z ) - (1 - z ) D 4 ( z ) R o ( z ) ] - 7(1 -{- z).
Step 1 is concluded with the c o m p u t a t i o n of )~1 = - 1 4 and the knowledge t h a t I(F5 ) - 2 -{- I(F1 ). In step 2, R - 1 is c o m p u t e d and of course it turns out to be identically zero (it should have degree - 1 ) . So )~0 = oo, but it does not occur in the index c o m p u t a t i o n because we are in situation 3 for the c o m p u t a t i o n of a: a-
1 ~(1-
sgn ( - 1 4 ) ) -
1
and thus
/(El)
- ~(a~)+
~.
The algorithm moves on to step 3, but there it stops because R - 1 - 0. Thus g c d ( P , P # ) - R0 - - 1 so t h a t P and P # are coprime, thus P has no p a r a - c o n j u g a t e zeros. Finally we add up the indices to find t h a t
N ( e ) - I(F~)- 2 + I ( F 1 )
- 2 + 1 + I(a~)
There are 3 zeros in D hence 2 zeros in E.
- 3.
CHAPTER 6. LINEAR SYSTEMS
348
If we switch the role of P and P # , the c o m p u t a t i o n s are practically the same. Rs is the same polynomial. There is a sign change in R4, hence also in R0 and D4. So R1 is again the same as before, but )~1 changes sign. Therefore in step 2 we now find a = 0 and thus
N ( P #) - I(Fh)
-
2 + I(F1) -
2 + 0 +
I(G21) - 2
because again step 3 t e r m i n a t e s the procedure while R-1 = 0. Thus we find indeed the symmetric situation where there are 2 zeros in ]D and 3 zeros in E.
E x a m p l e 6 . 1 0 For the other polynomials in E x a m p l e 6.7, the Bistritz test gives the following results. The polynomial P(z) - 1 + z + 2z 2 + z 3 gives rise to the initializations R3(z) - P ( z ) + P#(z) - 2 + 3z + 3z 2 + 2z 3 '
R2(z) - P(z) - P#(z) = z. 1-z
Step 1 finds m 1, R o ( z ) - 1 and D 2 ( z ) - 2 - 2z 2 so t h a t R l ( z ) 5(1 + z). Therefore )~1 = 5 and I ( F 3 ) = 1 + I(F1). Step 2 computes R - 1 - 0 and I(F1)+I(G2)-Fb with b - 8 9 0. Step 3 concludes the c o m p u t a t i o n with g c d ( P , P # ) - R0 - 1 and o. Thus P and P # are coprime. There is no p a r a - c o n j u g a t e zero in P and
N(P)-
I(F3)-
1 + I(F1) - 1 + 0 +
I(G21)- 1.
There is one (real) zero in ][3) and hence 2 zeros in E. For the polynomial P(z) - 1 + z + 2z 2 - z 3, the initialization gives R3(z) - P ( z ) + P # ( z ) -
3z(l+z)and
R2(z) - P ( z ) - f # ( z ) = 2 + z + 2 z 2 . 1-z
Step 1 gives m = 0, so t h a t it is trivial except for the Jacobi p a r a m e t e r c o m p u t a t i o n ) ~ 3 - 6 / 5 > 0. Step 2 computes a2 = R3(O)/R2(O)= 0 and therefore
+ z). Thus )~2 = R2(1)/Rl(1) = 5 / ( - 6 ) formula and it gives a-
< 0. Now a is c o m p u t e d by the second
21(sgn>'3+sgn)~2)-0'
so t h a t
I(F3)-I(G
2)+0.
6.7. STABILITY CHECKS
349
In step 3 we find v - 0 so t h a t this step is trivial and gives
I(C]) - •
+ b with
b-
l[v-1-
~gn ( R ~ ( ~ ) / R , ( ~ ) ) ]
- 0.
We arrive at step 4 where it is found t h a t w - 0. This means as we have said before t h a t we are in the nondegenerate situation and thus
U l ( Z ) - al +-~,z
with
~1
--
R2(O)/RI(O)- - 2 / 3 .
Thus
R o ( z ) - z - l [ a l ( 1 + z ) R l ( z ) - R 2 ( z ) ] - 3. The conclusion here is I(G 3) - 1 + I(F~). We now return to step 1, which gives m - 0 so t h a t it becomes again trivial and we only compute $1 - - 6 / 3 - 2 < 0. We pass to step 2 where a0 - - 1 and this gives R - 1 - 0. The applicable formula for a is here 1
a - ~ ( 1 - sgn ~ ) -
1 so t h a t
~(f~ ) - ~(C~) + ~.
Step 3 terminates again the algorithm because R-1 - 0 and g c d ( P , P # ) 0
Thus there are no para-conjugate zeros in P and N(P) - I(F3), which after filling in the successive steps gives N(P) - 2. This polynomial has indeed a pair of complex conjugate zeros in D and hence one real zero in
Chapter 7
General rational interpolation As explained in Chapters 2 and 5, the recurrence relations described in this book allow to compute Pad~ approximants along different paths in the Pad~ table: on a diagonal, a staircase, an antidiagonal, a row, . . . . In this chapter, we develop a general framework generalizing the above in two directions. Firstly, a more general interpolation problem is considered from which, e.g., the Pad~ approximation problem can be derived as a special case. Secondly, recurrence relations are constructed allowing to follow an arbitrary p a t h in the "solution table" connected to the new interpolation problem. In Pad~ approximation, the approximant matches a maximal number of coefficients in a power series in z or in z -1 . This corresponds to a number of interpolation conditions at the origin or at infinity. In the Toeplitz case, we had two power series: one at the origin and one at co. As shown in (4.38), (4.39), the interpolation conditions are distributed over the two series. The general framework will allow to m a t c h a number of coefficients in several formal series which are given at several interpolation points. Again all the degrees of freedom in the approximant are used to satisfy a maximal number of interpolation conditions that are distributed over the different formal series. This could therefore be called a multipoint Pad~ approximation problem.
7.1
General framework
The main ideas for this general framework are given by Beckermann and Labahn [12, 13] and Van Sarel and Sultheel [228, 229]. The reader can 351
352
CHAPTER 7. GENERAL RATIONAL INTERPOLATION
consult these papers for more detailed information. We sketch the main ideas. As before F denotes a (finite or infinite, commutative) field, F[z] denotes the set of polynomials and F[[z - ~]] denotes the set of formal power series around ~ C F and F ( z - ~) denotes the set of formal Laurent series around with finitely many negative powers of ( z - ~). We need these formal series for different points r C F, called interpolation points. The set (finite or infinite) of all these interpolation points is denoted by Z. Now suppose Z = { ~ l , . . . , ~,~ and that we want to find a rational form of type (fl, a) whose series expansion at ~ matches k~ coefficients of the given series gz e F[[z-~z]] for i = 1 , . . . , n . I f k l + . ' - + k n is equal to the number of degrees of freedom in the approximant, namely a + fl + 1, then the approximant is a multipoint Pad~ approximant for this collection of series {g~ : i = 1 , . . . , n } . In the special case that k~ = 1 for all i, then the multipoint Pad~ approximant is just a rational interpolant. If n - 1 and hence kl = a + fl + 1, then we have an ordinary Pad~ approximant at ~1. All the information which is used for the interpolation (the k~ terms of g~, i = 1 , . . . , n) could of course be collected in one Newton polynomial, which could be the start of a formal Newton series expansion of a function. However, in this formal framework, infinite power series need not converge and thus need not represent functions. So we have to replace the notion of function by a set of formal series at the interpolation points. In principle, these series are totally unrelated. Therefore we define a formal Newton series as a collection of power series {g~}z~Z with gr e F[[z - r D e f i n i t i o n 7.1 ( f o r m a l N e w t o n series) The set F[[z]]z of formal Newton series with respect to the set of interpolation points Z C F is defined as
~[[z]]z
-
{ g - {g~}r162 e F[[z- r
We call gr the expansion of g at ~. Elements in F[[z]]z can be multiplied as follows. With f, g e F[[z]]z, the product h - fg e F[[z]]z is defined as
h- fg-{h(}~ez
with
h ( - f(g(.
Also division can be defined. When gr ~ 0, V( E Z, the quotient h is defined as h - f / g - {hr162 with h r fr162
Note that in general h i - f r F[[z- r
f/g
belongs to F(z - () and not necessarily to
7.1. G E N E R A L F R A M E W O R K
353
Because polynomials can be written as an element of F [ [ z - (']] for any (" E Z, we can consider the set of polynomials F[z] as a subset of the set of formal Newton series: F[z] C F[[z]]z. Hence, the product of g E F[[z]]z and p E F[z] is well-defined resulting in an element of F[[z]]z. Similarly for the quotient. ,~x, denotes the set of m x s matrices whose entries are in F[[z]]z and similarly for F[z] 'nx~ etc. Recall the Pad~ interpolation condition (5.9) for power series in z ord ( f ( z ) a ( z ) - c(z)) >_ a + 13 + 1. Setting G(z)-
[1
- f(z)],
P(z) - [c(z) a(z)] T,
and
w(z)-
z ~+~+1, (7.1)
this can be rewritten as
a ( z ) P ( z ) - w(z)R(z),
R(z) e
(7.2)
The power series R(z) is called the residual. For power series in z - (', the factor w in the right-hand side should be replaced by w(z) - ( z - ~)~+1 and R should contain only nonnegative powers of ( z - ('). Generalizing this notion further to our situation of formal Newton series where multiple interpolation points are involved, w(z) can be any monic polynomial having interpolation points as zeros. When G(z) is not a row vector, but a general matrix, we introduce not just one polynomial w, but a vector of polynomials tO.
Definition 7.2 ( o r d e r v e c t o r ) Let m be an integer with m >_ 2. An order vector ~ - ( w t , . . . , w m ) with respect to Z is defined as a vector of monic polynomials having interpolation points as zeros. Now we can express interpolation conditions for the polynomial matrix P, given a matrix series G by requiring that G P - diag(a~)R with residual R containing only nonnegative powers of ( z - (') for all (" C Z. We say that P has G-order a3. D e f i n i t i o n 7.3 ( G - o r d e r ) Let G e F[[z]]~ x " and ~ an order vector. The polynomial matriz P C F[z] mxs is said to have G-order ~ iff GP-
diag(wl,...,wm)R
with R e F[[z]]~ x~.
(7.3)
The matriz R is called the order residual (for P). The set of all polynomial vectors having G-order ~ is denoted by S ( G , ~ ) , i.e., 8(G,5)-
{Q e F[z]mXl 9 Q has G-order ~}.
354
C H A P T E R 7. G E N E R A L R A T I O N A L I N T E R P O L A T I O N
E x a m p l e 7.1 The G-order of a polynomial P will depend on the set of interpolation points Z. For example suppose
G(z)-
[zz 1, z] ~ ( ~ - 2)
z-
[1]
~
~- 9
Then G(z)P(z)-
(z-
2 ) ( 2 z - 1)
"
Therefore, if Z - {0, 1} or Z - {0}, then P has G-order ~7 - (z k, 1) for k - 0, 1. If Z - {0, 2}, then P has G-order a7 - (z k, ( z - 2) t) for k, l - 0, 1. If
a(z)-
z(z-~)
~-~
~-~
,
then
so that for Z - {0, 1}, this P has G-order ~ - ( z P ( z - 1)q, ( z - 1) k) for p , q - 0, 1 , 2 , . . . and k e {0, 1}. When Z - {0, 1,2}, we can add the factor ( z - 2) ~ to the first component of ~7 with r any nonnegative integer.
Before we look into the algebraic structure of S(G, ~), we need the following definition; see [17, p. 3, Th. 7.6, 7.4, p. 105]. D e f i n i t i o n 7.4 ( m o d u l e , free m o d u l e , f i n i t e - d i m e n s i o n a l ) Let R be a ring. Then an additive commutative group M with operators R is a (left) R-module if the law of external composition R x M ~ M 9 (~, a ) ~ ~a, is subject to the following axioms, for all elements ~, ~ E R and a, b C M" a( a + b) (a + $)a
-
(~)~la
-
aa + ab, aa + $a,
~(~), a.
A (left) R-module M is free if it has a basis, i.e., if every element of M can be expressed in a unique way as a (left) linear combination of the elements of the basis. The dimension dim M of a free R-module M is the cardinality of any of its bases.
7.1. G E N E R A L F R A M E W O R K
355
We have a right R-module when instead of (a)~)a = a(),a) we have (a)~)a - ~(aa). Since we only need the left version here, we drop the adjective "left" everywhere. It is clear that F[z] TM is an m-dimensional F[z]-module. A possible basis is given by the set {ek)k~=~ with ek - [ 0 , . . . , 0, 1, 0 , . . . , 0] T where the one is at position k. A principal ideal domain is defined as follows [66, p. 301,318]. D e f i n i t i o n 7.5 (ideal, p r i n c i p a l ideal d o m a i n ) An ideal A in a ring R is a subgroup of the additive group of R such that R A C A and A R C A. In any commutative ring R, an ideal is said to be principal if it can be generated by a single element, i.e., when it can be written in the form aR with a C R. An integral domain in which every ideal is principal is called a principal ideal domain. The Euclidean domain of polynomials F[z] is a principal ideal domain [66, Th. 1, p. 319]. Every submodule of a free R-module will also be free if R is a principal ideal domain. We have indeed [17, Th. 16.3] T h e o r e m 7.1 Let R be a principal ideal domain and let M be a free Rmodule. Then every submodule N of M is free with dim N <_ dim M. Because the set of all polynomial m-tuples is an m-dimensional (free) module over the principal ideal domain of the polynomials F[z], we know that each submodule of F[z] TM is also free with dimension smaller than or equal to the dimension m of the complete module. The key idea in this chapter is the following fact. T h e o r e m 7.2 The set S(G, ~) of all polynomial vectors Q having G-order forms a submodule of the F[z]-module F[z] TM. A basis for the submodule S(G, ~) always consists of exactly m elements, i.e. dim S(G, ~) - m. P r o o f . It is easy to see that
Q1 - -Q2 E e
VQ e
VQ1, Q2 E and W e F[z].
Hence, we have proved that S ( G , ~ ) is a submodule. To prove the second part of the theorem, we refer to the updating steps of Section 7.2 keeping always just m basis vectors. [:] A basis for S(G, cD)is called a (G,cs The polynomial m • m matrix B whose columns form a (G,~)-basis is called a (G,~)-basis matrix.
356
C H A P T E R 7. G E N E R A L R A T I O N A L I N T E R P O L A T I O N
The following property allows to characterize a basis matrix in terms of its determinant. T h e o r e m 7.3 For S = 8 ( G , ~ ) there exists a unique monic polynomial Xs, such that
(a) a polynomial matrix B is a ( a , ~)-basis matrix iff the columns of B belong to $ ( G , ~ ) and det B = cxs with 0 ~ c E F; (b) for any polynomial matriz P e F[z] '~x'~ whose columns belong to S(G, &): det P = c. )is with c e F[z]. P r o o f . (a) Take a (G,~)-basis matrix B and define Xs as the monic polynomial such that det B = cXs. Any other (G,~)-basis matrix B ~ can be written as B' = B U with U a unimodular polynomial matrix (i.e. a matrix whose determinant is a nonzero constant). By taking determinant-values, part (a) is proved in one direction. On the other hand, if the columns of P are elements of $ ( G , ~ ) and det P = cxs with 0 r c C F, we can write the columns of P in terms of a (G, ~)-basis matrix B as P = B U with U a polynomial matrix, since the columns of P are in 8 ( G , ~ ) . Taking the determinant of the left and right-hand side it turns out that U is unimodular. Hence, also P is a basis matrix. (b) The polynomial matrix P can be written as P = B C with C C F[z] "~x'~ and B a (G,~)-basis matrix. Again by taking determinant-values, part (b) is proved. 0 From this theorem, we can conclude C o r o l l a r y 7.4 A polynomial matrix B is a (G,~)-basis matrix iff the columns of B belong to 8(G, ~) and 0 ~ deg det B is as small as possible.
Definition 7.6 (characteristic p o l y n o m i a l )
The polynomial Xs as described in the previous theorem is called the characteristic polynomial for
In [13], the characteristic polynomial is called generating polynomial. However we think that characteristic polynomial is a more appropriate name. As we said before, our formal series need not converge and thus do not represent functions. However, if G e F[[z]]~ • we can define the "function value" G(() for all ( C Z as the constant term in the formal series Gr E G which is the formal expansion of G at ~ e Z, i.e., G(() = Gr We say that G is regular iff for each point ( E Z, the function value G(r nonsingular. The characteristic polynomial has the following property.
7.1. G E N E R A L F R A M E W O R K
357
T h e o r e m 7.5 Let Xs be the characteristic polynomial of $(G, 5). Let ~2 d e t d i a g ~ - 1-[~=lwk. Then
(a) If G is regular and B is a (G,~)-basis matrix with order residual R, then the following statements hold (1) R is regular (3) d e t G = c d e t R (b) If G e F[[z]]~ • off~.
with O ~ c C F then the characteristic polynomial Xs is a divisor
P r o o f . ( a ) ( 1 ) S u p p o s e R is not regular, i.e., 3( C Z such that det R ( ( ) 0. Hence, there exists a nonzero vector x C ] ~ x l having, e.g., the kth with I~ the component different from zero such that RI~Dr E F[[z]]m• z identity matrix with the kth column replaced by x and D~ the identity matrix with the kth diagonal element replaced by ( z - ()-1. Therefore, because G is regular, BI~D~ belongs to F[z] 'n• with a lower degree of the determinant whose columns are in S(G, ~). Hence, this contradicts the fact that B is a basis matrix. (2) Taking the determinant of both sides of
G B - diag ~ . R gives us Xs - 12 because G and R are regular and Xs and 12 are monic. (3) From (2)it follows that det G - c det R with 0 ~ c e F. (b) The proof follows from the updating steps of Section 7.2 below. Now that we can describe all the polynomials P that are in $((7, r i.e., all polynomials that satisfy a certain number of interpolation conditions, we should select some that satisfy certain degree restrictions. Conditions on the degree can be seen as interpolation conditions at infinity. Recall that in the Pad~ example, we did not want all the polynomials P - [c a] T which satisfied [1 - l I P - O(z~+~), but we had Pad~ approximants only if dega _ a, degc _ fl and v = a + ~ . Even then, the problem had many different solutions. In the minimal Pad~ approximation problem (see Section 5.6), we introduced a shift parameter a to impose a certain structure in the degree of numerator and denominator. We defined the mPA indeed as the "simplest possible" rational form which satisfied the interpolation conditions and for which a degree structure was imposed by the parameter a, i.e., the degree of the numerator was at most a - a and the degree of
C H A P T E R 7. GENERAL R A T I O N A L I N T E R P O L A T I O N
358
the denominator at most a.
In other words we have the componentwise
inequality deg(a(z),z~c(z)) < (a,a), or Zr
H(z)P(z)-
[
]
a(z)
- S(z)z~'
with S e F[[z-1]].
(7.4)
As we know from Section 5.6, either deg a - a or deg c - a - a. Thus it is impossible to increase a in the right-hand side since S(oo) # 0. We shall therefore call a the H-degree of the vector P (see below). Similarly, in the general situation, we have to select within the submodule S ( G , ~ ) , those polynomial m-tuples which satisfy additional conditions concerning their degree structure. In other words, we have to impose interpolation conditions at infinity. However, since we now have a rectangular matrix P E F[z] mx~ instead of just one column, we shall have to replace the scalar z ~ in the right-hand side by some diagonal matrix, which we denote as
z 6 - diag(z 6~, z62,..., z 6"). For the left-hand side matrix H, we shall allow a more general form H(z) zr where ~ - ( a l , . . . , am) is a vector of shift parameters, which impose the relative importance of the degrees of the different rows of P. The role of H will become clear below. To introduce the generalization we are heading for in a more formal way, we need a precise definition of the concepts H-degree and H-reducedness. Recall that F(z -1) denotes the set of formal Laurent series over F with finitely many positive powers of z.
Definition 7.7 (H-degree, H-reduced)
Let H E F(z -1)mXm with
det H ~ 0 and P E F(z-1 ),7,x9 without zero columns. .-)
Then P is said to
have H-degree ~ if HP-
Sz 6
(7.5)
with S e F[[z-~]] m• and S(oo) not containing zero columns. S is called the degree residual and S(oo) the H-highest degree coefficient of P. A zero column of P has corresponding H-degree equal to -oo and the corresponding S-column is zero. The matriz P is called H-reduced if the matviz S(oo) has full rank. A polynomial matriz is called column reduced when it is H-reduced with H = I, the identity matviz. T h e / - d e g r e e corresponds to our usual notion of degree and t h e / - h i g h e s t degree coefficient is just the usual highest degree coefficient. Note that if
7.1. GENERAL F R A M E W O R K
359
P is H-reduced, then deg det S = 0. When P is column reduced, then deg det P = 61 + ' " + ~ . Thus the degree of the determinant is the sum of the degrees of its columns. If a (G,5)-basis matrix B is H-reduced, it is called a (G,H,&)-basis matrix. It is easy to transform the basis of a submodule into an H-reduced one. If the basis is not already H-reduced, take the basis element Bi with largest H-degree 6/whose H-highest degree coefficient is linearly dependent on the H-highest degree coefficients (hdc) of the other basis elements m
(H-hdr Bj)dj - o ./=1
with di ~ O. Replacing Bi by ~j~--1 djz6~-'5~Bj, gives us another basis for S(G,~) but the H-degree of Bi is decreased. Repeating this process, we finally get an H-reduced basis. The transformation, in each elementary step, can be obtained by multiplying the basis matrix from the right with an elementary matrix which differs from the unit matrix only by its column i. There can be powers of z in that column, but it has 0 r di C F on its diagonal. Thus this elementary matrix has a constant, nonzero determinant. Thus each elementary transformation is a right multiplication with a unimodular matrix. The product of all these unimodular matrices is of course again unimodular. For further details, we refer to [239]. Once we have a (G, H, &)-basis, it is easy to parametrize all elements of the submodule with a given upper bound on the H-degree. T h e o r e m 7.6 Given an H-reduced basis for the submodule S ( G , ~ ) with
basis elements Bi having H-degree 8i, all elements of the submodule S(G, ~) having H-degree <_ 6 can be parametrized uniquely as Eim=l ciB i with ci a polynomial of degree <_ 6 - 6i. P r o o f . Let us denote the H-highest degree coefficient of a polynomial vector P by 7~(P). The H-highest degree coefficient of Eim=i ciB i is determined by m
7 ~ ( ~ ciBi) i:1
~
(hdc cj). 7-/(B3)
7j +Sj =max{"fi +dii}
with 7i - deg ci. This coefficient can not become zero because the H-highest degree coefficients of the basis elements are linearly independent. Hence, m
H-deg ( E c i B i ) - max{3'/+ 6i). i=l
C H A P T E R 7. G E N E R A L R A T I O N A L I N T E R P O L A T I O N
360
Because the H-highest degree coefficients of the basis elements are linearly independent, different choices of the polynomials c~ lead to different elements of the submodule. This proves the theorem. D We introduce the following notation: If ~ C Z TM is a vector of integers, then Note that this is not a norm since the ~z can be negative. To characterize a (G, H,~)-basis matrix B, we can use the following property. T h e o r e m 7.7 Let B e F[z] '~• deg det H. Then we have
with H-degB - ~ and denote O(H) -
(a) 161 > deg det B + O(H), -.r
(b) I~1 - deg det B + O(H) iff B is H-reduced. (c) B is a (G, H, ~)-basis matrix iff det B ~ 0, I~ - deg Xs + O(H) and the columns of B belong to S(G, ~). P r o o f . These properties directly follow by taking determinants in (7.5). [:] Following [13], we introduce the following definition for later use.
Definition 7.8 ( n o r m a l d a t a , w e a k l y n o r m a l d a t a ) We say that the data (G, H , ~ ) are normal if all components of the H-degree of an H-reduced basis matrix are equal. The data are called weakly normal if the components of the H-degree differ at most by one.
7.2
Elementary updating and downdating steps
To motivate the elementary steps described in this section, think of the minimal Pad~ approximants [w, a] to the series f e F[[z]]. We characterized the entries of the Pad~ table not with the degrees of numerator and denominator, but with the parameters w and a. Elementary steps (from one entry to a neighboring one) correspond to changing w or a by one. Increasing or decreasing w by one meant satisfying one more or one less interpolation condition. Decreasing or increasing a by one meant changing the degree structure of the mPA. In the more general situation, we characterized the interpolation conditions by an order vector ~ and the degree structure was imposed, mainly by the shift vector d. Thus elementary steps in this multi-dimensional interpolation table shall correspond to adding or removing one interpolation
7.2. E L E M E N T A R Y
UPDATING AND DOWNDATING
STEPS
361
condition in one of the points of Z or to increasing or decreasing a component of d by 1. Let us assume that we start with an H-reduced basis B = [B1, B 2 , . . . , Bin] for the submodule S = S ( G , ~ ) of Theorem 7.2 and that we want to make one of the elementary changes mentioned above. 1. If we want to change the order vector ~ --. ~ by component wi(z) ~ w~(z) - ( z - ()wi(z) with ( w~(z) - w i ( z ) / ( z - () (if wi(z) is divisible by (z to transform the H-reduced basis matrix B with H-reduced basis matrix B ~ with G-order ~ . 2. If we want to ponent aj ~ basis m a t r i x with G-order
changing only one e Z , or wi(z) r then we have G-order u~ into an
change the shift vector d ~ d ~by changing only one comaj aj • 1 then we want to transform the H-reduced B with G-order a3 into an H~-reduced basis matrix B ~ ~, where g ' ( z ) - z ~ ' - ~ H ( z ) . -
Let us consider the procedures to do these elementary operations in more detail. 1. Let us consider the change ~ ~ a3~ by changing only one component wi(z) ~ w ~ ( z ) - ( z - ()wz(z). Any polynomial vector P e F[z] TM with G-order ~ satisfies the set of linear homogeneous equations (7.3), i.e.,
G P - diag ~ . R with R C F[[z]]~ xl. If it has to get a G-order ~ , then it should satisfy one additional equation, namely Ri(() - 0 in
a~(z)P(z)-
w~(z)R~(z)
with R~ e F[[z]]z
(7.6)
where G~ and Ri denote the ith row of G and R respectively. Thus P satisfies this extra equation (7.6)iff TQ(P) - 0 where 7~i(P) denotes the ith component of the order residual for P evaluated at (: 7~i(P) -
R,(r Thus the simplest case is when T~i(Bj) = 0 for all j = 1 , . . . , m, since then obviously S ~ = S and we can keep the same H-reduced basis. In this case, Xs = Xs'. Note that this is only possible when G ( ( ) is singular. When at least one of the residuals T~i(Bj) r 0, we can use the following algorithm to determine an H-reduced basis for S~:
362
C H A P T E R 7. G E N E R A L R A T I O N A L I N T E R P O L A T I O N 9 take a basis element having nonzero order residual and smallest H-degree; suppose this is the basis element Bt; 9 take the following m -
B~.- B j -
1 polynomial m-tuples
(TQ(Bj)/TQ(Bt))Bt,
j - 1 , . . . , m but j 7~ l;
9 add the polynomial m-tuple B [ ( z ) m polynomial vectors.
( z - ()Bt(z) to get a set of
It is clear that all elements B}, j - 1 , . . . , m belong to S'. Moreover the matrix B' is H-reduced while the degree of its determinant deg det B'(z) - deg Xs,(z) - deg [Xs(Z) " (z - ()] is as small as possible. Hence, according to Corollary 7.4, B' is an H-reduced basis matrix for S'. 2. Let us consider the change a~ - , ~' by changing only one component wi(z) ---, w~(z) - w i ( z ) / ( z - ~) (if wi(z) is divisible by (z - ()). This is the reverse of the previous step. The new basis matrix B' should satisfy
a(z)B'(z)-
d i a g ( w l , . . . , w i _ l , w i / ( z - C),Wi+l,...,OJm)Rt(z)
with R ' 6 F[[z]] m e x m . Consider the set of linear equations R(()c-
(7.7)
e,:.
When R(() is nonsingular, there is a unique solution - [cl,
r 0.
Note that R ( ( ) i s always nonsingular if G is regular. If R ( ( ) i s singular, consider all nonzero solutions of (7.7). One can show that the following set of polynomial vectors forms an H-reduced basis for the submodule
9 try to find (one of) the solution(s) c such that the polynomial vector ~"=1 c i B i ( z ) i s divisible by ( z - () (note that, if G is regular, the unique solution gives a linear combination divisible by ( z - ( ) ) ; if such a solution does not exist, we take B ' = B and we are d o n e .
7.2. E L E M E N T A R Y UPDATING AND DOWNDATING STEPS
363
9 Otherwise, if such a solution does exist, look for the index l with cl # 0 and ~t as large as possible; take the polynomial vector m
i=1
We take B[(z) - B [ ' ( z ) / ( z - () as the first member of the new basis; the other m - 1 members of the new basis are the elements of the old basis, excluding the /th one, i.e., B~ - Bi for i - 1 , . . . , m but i # I. --,
0
!
Consider next the change a ---, tY~ where aj - aj + 1. That is H ~ is obtained from H by multiplying the j t h row of H by z. Note that the interpolation conditions (7.3) do not change, i.e., the submodule S ( G , ~ ) considered is the same, only the degree structure is different. Let us denote the j t h component of the H-highest degree coefficient of a polynomial vector P as 7-/j(P). Note the analogy with the interpolation at r in our previous procedure. Degree conditions are like interpolation conditions at 0o. 7-/j(P) is indeed the j t h component of the degree residual evaluated at cr We now transform the H-reduced basis { B 1 , . . . , Bin} for S({~, ~ ) i n t o an H~-reduced basis as follows 9 take a basis element Bt having the smallest H-degree and such that 7-lj(Bt)~ O. This is chosen as B[; 9 add the following m -
1 polynomial m-tuples -
(nj(B,)/Uj(B,))B,(z),
i-
1,...,rebut
i#l
with 8i the H-degree of Bi. It is easy to see that by this procedure B' is generated from B by multiplying B to the right by a unimodular matrix. Thus, the new matrix B' is also a basis matrix for S ( G , ~ ) . Moreover it is obviously HI-reduced. 4. Similarly, we can divide the j t h row of H by z resulting in H'. This is equivalent to decrementing one of the shift parameters aj. Suppose
364
CHAPTER 7. GENERAL RATIONAL INTERPOLATION that the H-degree of B is ~ - ( ~ 1 , . . . , ~,,~) and let us denote the Hhighest degree coefficient of a polynomial vector P excluding the j t h component as ~ # j ( P ) . The set of linear equations TrL
- o
F_, i=1
has a nontrivial solution (cl, c2,..., Cm) ~ O. Once again, it is easy to show t h a t the following set of polynomial vectors forms an H~-reduced basis for the same submodule S(G,~). 9 look for the index l with ct ~ 0 and ~t as large as possible; take the polynomial vector m
-
B,
i=1
as the first member of the new basis; 9 the other m - 1 members of the new basis are the d e m e n t s of the old basis, excluding t h e / t h one, i.e., B~ - Bi for i - 1 , . . . , m but i ~ I. When we combine the basis elements as columns of a square m • m basis matrix, then the transformation for a change in the shift parameters d, i.e., in case 3 and case 4, involves a right multiplication by a unimodular matrix. For cases 1 and 2 where an interpolation condition is added or removed, this is also described by a right multiplication with a unimodular m a t r i x which is now possibly followed by a multiplication or a division of one of the columns by ( z - (). Each of these elementary steps is very simple, efficient and straightforward. They do not only allow us to u p d a t e or downdate an already computed solution, but by concatenating several of these elementary steps, we can go from one "point" (~, Y) to any other point ( ~ , Y~). Hence, we can follow any path in the solution table. For example, considering the scalar linearized Pad~ approximation problem reformulated in Definition 7.9, one can follow a diagonal, a row, a column, an antidiagonal, and any combination of these in the Pad~ table. We can even make circular walks. See Section 7.4. For example, to go from one entry of a row in the Pad~ table to the next entry, one increments the interpolation index v (connected to the interpolation point 0) and decrements a2 (or equivalently increments a l ). This m e t h o d is not only an efficient updating or downdating procedure, but it allows to compute very efficiently the solution at any point (~, Y),
7.3. A G E N E R A L R E C U R R E N C E STEP
365
starting from scratch. This can be done as follows. First, note that the columns of the identity matrix Im• are H-reduced for any kr which is column reduced, thus forming a basis for the module F[z] TM. If at a certain stage H is not column reduced, we can make it column reduced by right multiplication with a unimodular matrix. The columns of this unimodular matrix form an kr-reduced basis. Starting from this basis, we can recursively compute a z~H-reduced basis for any choice of the order vector 07 and for any choice of the shift parameter vector Y, using the three basic updating steps 1, 3 and 4 described above.
7.3
A general recurrence step
In the previous section we have described the elementary steps to a solution of our general interpolation problem corresponding to a certain point (07, d) in the solution table. In this section, we shall look at this from a higher level and describe how to find a (G (2), H(2),~7(2))-basis matrix B (2) in two steps. First we find a (G(1),~(1))-basis matrix B(1) and then we can update B (~) into B (2). Such a global updating step is described in the following theorem. T h e o r e m 7.8 (division into t w o s u b p r o b l e m s ) Let 0~(1) and 07(2) be two order vectors such that ~(1) <_ ~(2), i.e., such that ~(1) divides ~(2) (componentwise). Let ~(1'2 ) - w-.(2) / w-.( )1 (componentwise). Let B (1) be a (G(1),~(x))-basis matrix with order residual R(x). Set H(1,2) - H(2)B (1). If B(1'2) is a (R (1), H(l'~),~(1,2))-basis matrix, then B(2) - B(1)B (1'~) is a (G(2), H(2),~(2))-basis matrix. They have the same order residual and the same degree residual and the H(2)-degree of B (2) is equal to the H(1,2)-degree of B(1,~). Conversely, if B(~) is a (G(2),H(2),~(2))-basis matriz, then B(1'~) = (B(~))-~B (2) is an (R(~),H(~'2),~(l'2))-basis matriz. They have the same order residual, the same degree residual and the H(1,2)-degree of B(1'2) is equal to the H(~)-degree of B (~). P r o o f i Because
G(1)B (1)
=
R(1)B(1, 2) =
diag(07(1))R (1) and diag(07(1,2))R(1,2),
we get that
G(1)B(1)B(1, 2) = G(1)B (~) =
diag(03(1)~(l'2))R(l'2) or diag(~(2))R (1,2) - diag(~(2))R (2)
C H A P T E R 7. G E N E R A L R A T I O N A L I N T E R P O L A T I O N
366
Because H (i'2) - H(2)B (i), we can rewrite H(1,2)B(1,2)
H(2)B(i)B(i,2) H(2)B (2)
:
S(1,2)Z6-'(i'~)
aS
--
s(i,2)z6-~1'~1o r
=
S(1,2)z g(t'2) = S(2)z g(~).
Because S(2)(00)is nonsingular, B (2) is H(2)-reduced. Any (G(2),~(2))-basis matrix B can be written as B - B(i)B ~. Hence, deg det B is minimal iff deg det B ~is minimal. This proves the first part of the theorem. The second part can be proven in a similar way. 0 We shall see now how the general theory can be applied to the Pad~ approximation problem.
7.4
Pad6 approximation
Let us write the Frobenius-type definition 5.3 of a Pad~ approximant in another way. Observe how close this is to the definition of a minimal Pad~ approximant (Definition 5.5). D e f i n i t i o n 7.9 ( r e f o r m u l a t i o n P A - F r o b e n i u s ) Given a formal power series f ( z ) - ~ ' = o fk zk E F[[z]], with fk E F, we define a (right) Pad6 approximant (PA) [~/a] of type (~,a), ~ , a >> O, as a rational fraction c(z)a(z) -1, which can be written as a polynomial vector Q(z) - [c(z) a(z)] T (or as a polynomial couple (c(z), a(z)) when appropriate), satisfying ord (G(z)Q(z)) > (v + 1, O) (row-wise) with v - / 3 + a
deg ( H ( z ) Q ( z ) ) <_ ~ (column-wise, i.e. H-degree)
(7.8) (7.9)
6 as low as possible where G(z) and H (z) are defined as G(z) -
[
01 - f(Z)l
]
and
H(z) -
0
01
z -~'
"
(7.ii)
Note that in the definition, we can always multiply H by a power of z and still obtain the same approximant. This can be used to make one component of ~ equal to 0. This is what was done in (7.4) since multiplying the above H ( z ) with z ~ gives the H matrix of (7.4). The above definition fits into the general definition when we set Z -- {0}, ~ - (z v+l, 1), d - ( - ~ , - a ) , H ( z ) - z ~ and G ( z ) a s above. In this definition, H is used to impose a
7.4. PADE APPROXIMATION
367
degree structure on the solution. This is essentially the difference a between the upper bounds for the numerator and denominator degree. The solution is found among the rational fractions satisfying the order condition and having a minimal H-degree, i.e., it has the lowest possible degree, taking the imposed degree structure into account. All solutions Q(z) of (7.8) form a submodule of the F[z]-module F[z] 2 of polynomial couples. A basis for such a submodule consists of two elements of F[z] 2. When the basis is H-reduced, it is easy to write all solutions of (7.8) and (7.9)for any ~. Hence, it is easy to get the solution having minimal We call a (G, H,~)-basis matrix for the data of Definition 7.9 a (~, a)basis matrix. Because 0 is the only interpolation point, the order vector a3 will be of the form ~ ( z ) - (z ~+1, 1). Our aim is to use Theorem 7.8 in the present situation to construct the recurrence to compute the (fl + r/, a q- ~)-basis matrix when the (13, a)-basis matrix is known. Before we come to that, we have to analyse the situation a bit deeper to see what kind of updates will be needed. We first shall see why the data are not normal, but weakly normal and then we shall find out why there are two possible types of weakly normal data. Note that a3 is completely defined in terms of (~, a). Thus, when G and H are known, then, instead of calling the data (G, H, ~) normal or weakly normal, we shah also say that the point (fl, a) is normal or weakly normal, and assume that the matrices G and H are understood. One can remark that also H is known when (~, a) is known, but we should warn here that this is only true when H has indeed the simple form of (7.11). We shah meet situations further on where H will not be that simple. Because G(0) is nonsingular and 0 is the only interpolation point, G is regular. Hence, the characteristic polynomial X.,s(z) - z ~'+~+1 and by Theorem 7.7, a polynomial matrix B with H-degree ~ is a (13, a)-basis matrix iff all columns of B are elements of S and [~[ - deg Xs + c3(H) thus (a + + 1) q- ( - a - ~) - 1 - ~1 + ~2. Therefore, 61 # ~2, i.e., the point (/3, a) can not be normal. Because there always exists a Frobenius-PA of type (fl, a), we can always assume that this PA forms the second basis vector. So, ~2 <_ 0. Hence, ~ 1 - 1 - 6 2 > 0 . Let ...#
B_[C~ c2] al a2 and introduce T-
C H A P T E R 7. GENERAL R A T I O N A L I N T E R P O L A T I O N
368
p = ord r, fl*=degc2-7
7 = deg gcd(c2, a2), and a* = dega2 - 7
_< a.
Taking into account the block structure of the Pad6 table (with f0 ~ 0) described in Theorem 5.2, we know that c2/a2 is a PA (Frobenius) of type (/~, a ) l y i n g in the singular block with top-left corner (/~*, a*) and dimension = v - 7 + P - (a* + fl*). Note that m i n { 7 , p } = 0. If ~c = 0, the singular block is trivial, i.e., (/3", a * ) = (fl, a) and p = 0. Lemma
7.9 Let G, H be as in (7.11} and ~ - (z ~+~+1, 1). Then the fol-
lowing are equivalent: 1. The data triple (G, H, ~) is weakly normal. 2. There exists no nontrivial solution in S ( G , ~ ) of negative H-degree. .
The dimension of the subspace of all solutions in S(G, ~) having nonnegative H-degree is one.
The data triple (G, H, ~3) at the point (fl, a) can not be normal. However, we can assume it is weakly normal, i.e., d~l = 1 and 62 - 0. There are two types to be distinguished which will correspond to familiar situations when interpreted in terms of their position in the Pad~ table. (See Corollary 7.11.) Indeed, when 62 = 0, then there are two possibilities: either 7 = 0 or 7 > 0, i.e., the couple (c2, a2) is coprime or not. (a) If d~2 = 0 and 7 = 0, then/3* = degc2 = ~ or a* = dega2 = a, i.e., the point (fl, a) lies at the top or the left border of a singular block. (b) If 62 = 0 and 7 > 0, it easily follows that p = 0. Hence, we have the equalities fl* = degc2 - 7 = fl - 7 and ~ = a - a* or the equalities a* = deg a 2 - 7 = a - 7 and ~ = ~ - fl*. Therefore, if 3' > 0, the point (fl, a) lies under the antidiagonal in a singular block at the right or at the b o t t o m border of that block. The two situations are depicted on the Pad~ tables of Figure 7.1. Referring to the numbering used, we call them weakly normal points of type (a) and type (b) respectively. These two possibilities for weakly normal points depending on 7 appear also in the following theorem. 7.10 Let the data (G,H,~3) of Definition 7.9 correspond to the weakly normal point (fl, a). Thus ~(z) - (z ~+1, 1) with v - a + ~. Let
Theorem
369
7.4. PADF, A P P R O X I M A T I O N
Figure 7.1" Weakly normal points
type (a)
type (b)
a >_ 0 a n d f l >_ 0 and hence, v >__ O. Let B be a (,O,a)-basis matrix and define [1
f]B- [s r].
-
Then, B has one of the two (mutually excluding)forms" (a) B has the unique form
B -
Iv c] z2u ~ a
with the following normalization
a(O)- I
and s(O)- i.
(b) B has the (nonunique) form
,_iv zc] u
za
with the following normalization u(O)- 1
and
~ ( o ) - 1.
370
C H A P T E R 7. G E N E R A L R A T I O N A L I N T E R P O L A T I O N
P r o o f . From Theorem 7.6, we get that all solutions (c, a) having H-degree _ 0 and satisfying
('/.12)
c ( z ) - f(z)a(z) - ?'(Z)Z c~+/3+1 form a subspace of dimension 1. There are two possibilities:
(a) a(0) ~ 0. Hence, we can take for the second column of B the unique polynomial couple (c, a)with a(0) = 1. From Theorem 7.6, we get that there exists a basis vector (v", u") of H-degree 1, linearly independent of (c, a) and (zc, za). Because a(0) ~ 0, it is clear that there exist (unique) 71,72 C F such that
('V'1, "t/,H) + ")'1(r a)+ ")'2(zc, za) has the form (v, z2u ') with H-degree 1. This polynomial couple is unique up to a nonzero constant factor. By taking the determinant of the equality G B - diag u3. R we obtain that
R(0)-
,(0)]
with u -
Z2lt !
is nonsingular. Because u ( O ) - O, s(O)~ 0 and (v, u ) c a n be normalized such that s(0) - 1. (b) a(0) = 0. Because u = a + f l >_ 0, this implies that also c(0) = 0. Because R ( 0 ) i s nonsingular and a(0) = 0, r(0) # 0 and u(0) ~ 0. Hence, we can scale (c,a)such that r(0) = 1 and (v,u) such that u(0) = 1. Note that ( u , v ) i s not unique. A linear combination of (c, a ) a n d (zc, za) can be added. Thus the theorem is proved.
[3
Because the basis matrix of the previous theorem in case (a) is unique, we will call it the canonical (fl, a)-basis matrix for the weakly normal data (G, H,&) of type (a). An example of weakly normal data of type (b) is given below. E x a m p l e 7.2 ( w e a k l y n o r m a l d a t a of t y p e (b)) Let f ( z ) - 1 + Oz + 0z 2 - lz 3 + . - . a n d consider the case a - 1 and f l - 2. It turns out that each solution having H-degree <_ 0 has a(0) - 0 and has exact H-degree 0.
7.4. PADF, A P P R O X I M A T I O N
371
Hence, the data are weakly normal of type (b). A possible choice for a basis matrix is B(z) -
1 + z3
z
with r ( O ) - 1. We summarise our results about the two different types of weakly normal data in the following Corollary. C o r o l l a r y 7.11 The point (~, a) is weakly normal of type (a) iff the point
(~, a) in the Padd table is at the top or at the left of a singular block. The (~, a) is weakly normal of type (b) iff the point (~, a) in the Padd table is below the antidiagonal in a singular block and at the bottom or the right side of a singular block. Weakly normal data of type (a) are particularly interesting. We look at this in more detail below. From now on we often use the term weakly normal without specifying the type; we then always mean that the data are of type
(a). T h e o r e m 7.12 The point (~, a) is weakly normal of type (a) iff the Hankel
matrix
f•-a-}-I
f~-a+2
"'" 9
H(~, ~)
f~-a+2
""9
fB ~
9
o
=
/~
**~
fill+a-1
is nonsingular. If moreover a + ~ >_ 1, then the first column of the canonical (~, a)-basis matrix is the (unique with s(O) - 1) PA of type (f~- 1, a - 1) multiplied by Z2 .
P r o o f . From Lemma 7.9, we know that the data ( G , H , ~ ) are weakly normal of type (a) iff the dimension of the space of solutions in $(G,05) having H-degree _< 0 is one and each of these solutions (c, a) satisfies a(0) ~t 0. Let a(z) - a o + a l z + . . . + a ~ z ~. The a-part ofasolution (c,a) 6 S ( G , ~ ) having H-degree <_ 0, satisfies the following equation H(f~'~)[a~ -" "al] T - a0[f~+l'-" f~+~]T. This proves the first part of the theorem. The proof of the second part is trivial.
O
CHAPTER 7. GENERAL RATIONAL INTERPOLATION
372
Now we are finally ready to construct the recurrence relation to compute the (/3 + r/,a + ~r matrix B(2) when the (/3, a)-basis matrix B(1) is known (always assuming that (fl, a) as well as (/3 + r/,a + ~r are weakly normal). We suppose that the number of interpolation conditions does not decrease, i.e., ~7+ ~ >_ 0. We also assume that ~ > 7/. The other case 7/> is similar. For the notation, see Section 7.3. That is, we set B (2) - B(1)B (1'2) and H (1'2) - H(2)B (1), where H (1)
-
-
diag(z -t3, z -=) and H (2) - diag(z -t3-n, z-"-~).
Furthermore, with ~(~) - (z aTf~+l, 1) and ~3(2) - (z aT~Tf~TrI+l, 1),
G(X)B(X) _
diag(~(*))R(*),
H(X)B (x) _ SO:)zg(')
and R ( ~ ) - [ s(~) 9 r(~)] 9 where x = 1 or 2. Also B(~)_ [ v(~) c(~) ] u(~) a(~) where (~) can be one of (1) (2) or (1,2) Let us look at the degrees of the polynomial matrix B(l'2). deg B(1'2) (z)
-- deg [(B(1)(z)) -1B(2)(z)] -
-
-<
deg
[z -5-'(1,('(1)(z))-lH(i)(z) (H(2)(z))-l,.,-e(2)(z)z 5-'(~,]
(+~1
~
(componentwise).
Because B (1) and B(2) are the canonical basis matrices, we obtain that a(l'2)(O)- 1 and ord (u(1,2))- 2. The formal Laurent series H(1,2) is defined as H(1,2)(z)-
H(2)(z)B(1)(z)- [
Z-a-~.U(1)(Z)
z-C~-~a(1)(z)
] 9
373
7.4. PADF, A P P R O X I M A T I O N
Hence, d e g H ( l ' 2 ) <- r / [+ l--~ + 1
-7/]_~
(componentwise).
Therefore, to find the second column of B(1,2), we have to compute the (1,2) - 1 of the following set of linear homogeneous unique solution with a 0 equations s~1)
0
..-
(i)
(i)
9
s 1 9
sO
9
.
,
r~1) r~l) 9
0 (11
7' 0 .
(1,2) Co
... .. 9
(~,21
o
9 9
.
999
0
9 ..
v(1) ~+~
o
(1)
v~+~
vO)
~
9
9 -9
""
0
~
c0)
(1)
0 0
"
(7.13)
cO)
C/3 9
%1,2)
0
1 .
9
9
(1,21
a(
The first block row contains ~ + 7/rows. These equations express the additional ~ + 7/interpolation conditions. The second block row has ~ - 7/rows. In view of the degrees of c(2) and a (2), the update would give an approximant of type (/3 + ~, a + ~). The equations of the second block row express that the numerator degree should not be fl + ~ but fl + 7/= .~ + ~ + (7/- ~r hence the ~r 7/equations. For the columns the subdivision is ~ columns for the first block column and ~ + 1 columns for the second one. The coefficient matrix of (7.13) with the first column of the second block column deleted is square and will be denoted by T1,2.
C o r o l l a r y 7.13 The following are equivalent 1. The matrix T1,2 is nonsingular.
2. The data (G,H(2),~ (2)) are weakly normal, which means that (fl + ~l, a + ~) is a weakly normal point for this problem. 3. The data (R(1),H(1'2),~(1,2)) are weakly normal.
374
C H A P T E R 7. G E N E R A L R A T I O N A L I N T E R P O L A T I O N
Taking into account that ord (u (1'2)) - 2, we can compute the first column of B(1'2) as the solution of the following set of linear equations if ( + r/ > 0.
O
v~ v~ I,
.
0 1
(:.:4)
l
0
(1,2) u(+l
0
If ~ + 77 > 0, it follows that vO'2) - 0. If no new interpolation conditions are added, i.e. when : + 77 - 0, v0(1,2) - 1 / s o(:) ~ 0 and the first column of B(:'2) can be computed as the solution of the following set of linear equations. -
(:,2)
-
v1
0 ~
(1,2)
T1,2
;II:I)
-- __V~ 1,2)
+1
~!1,2)
v ~+n+2 (:) -
(1,2) . U~+l
0
_
Let us consider two special cases of the previous general recurrence between two arbitrary weakly normal data points in the Pad~ table. We shall look for the next weakly normal data point on a diagonal and on a row in the Pad~ table.
Diagonal
path
Suppose we know the canonical (fl, a)-basis matrix B (1) for the weakly normal data point 03, a). We want to construct B(2), the canonical basis matrix for the next weakly normal data point on the same diagonal in the Pad~ table, i.e., of the form (fl + ~, a + ~). For general ~, not necessarily minimal, the coefficients of the polynomial elements of B(1,2) can be found as the solution of the following set of linear equations (we drop the superscripts in
7.4.
PADE
375
APPROXIMATION
u (1'2), V(1'2), C(1'2) a n d a (1'2) a n d t h e s u p e r s c r i p t s in s (1) a n d r (1))
T(~)
Vl
CO
0
--7' 0
V2
Cl
0
-rl
9
9
9
9
9
~
v(
c(-1
0
'/L2
al
0
-rl~_l --r~
u(
a(-t
0
--r2~_ 2
I
--r2~-i
ul~+l
at~
(r 5)
where
80
0
"'"
81
80
"9
0
-..
0
0
ro
"'.
0
0
9
T(~)
-
S~-1 8~
9
9
S~-2
9. .
80
r~-2
...
ro
0
8~-1
" " "
81
r~-I
999
rl
ro
r(_l
r(_2
r(
r(-1
9 9
, ~
,
82~-2
82~-3
9. .
8~--I
r2(-3
""
82~-1
82~-2
9. .
s~
r2(-2
999
a n d w h e r e vo - uo - ul - 0 a n d a w e a k l y n o r m a l d a t a p o i n t , the smallest n o n s i n g u l a r m a t r i x T ( ( ) r k - 2 - 0 a n d rk-1 ~ 0. This T ( k )
So
0
ao - 1. B e c a u s e (f~ + ( , a + ( ) has to be m a t r i x T ( ~ ) should be n o n s i n g u l a r . T h e occurs for ~ - k w h e n ro - r l = " " has a lower t r i a n g u l a r f o r m
...
0
0
...
0
0
"'.
0
0
0
0
0
0
9
81
80
9
.
0
8k-1
8k-2
" 9"
80
0
Sk
8k-1
9
s1
rk-1
9
T(k)
-
9
9
9
9
9
9
82k-2
82k-3
82k-1
82k-2
.
9
...
"'" 9
9 9
9
.
9
9
9. .
8k__ 1
r2k-3
"''
Sk
72k-2
"""
rk-1 ?'k
.
.
0 ?'k-1
C H A P T E R 7. G E N E R A L R A T I O N A L I N T E R P O L A T I O N
376
and the system becomes
2}1 ?32
T(k)
r c1
Vk 2t2
Ok-1 al
9
.
Uk Ukq-1
ak-1 ak
0
0
0 0
0 --7'k_ 1
0
--Tk
9
~
9
~
0
--r2k_2
1
--~'2k-1
Because of the lower triangular structure of the coefficient matrix T(k) and the initial zeros in the columns of the right-hand side, we get that
v(z) - 0
u(z) - uk+lz k+l ~ and
c(z) - Ck-1 Zk-1
with Uk+l - 1/rk_l # 0 and ck-1 - - r k - 1 / s o ~ O. The polynomial a(z) of degree _< k is the solution of
o r d ( c k _ l z k - l s ( z ) + r(z)a(z))
-
Ck-lS(z) + zr(z)q(z -1)
-
2k
or
with ord ( c k - l s ) -
O, ord ( z r ) -
k, ord (r') > k + 1 and q(z -1) - z-ka(z).
Hence, according to Definition 1.12, we can write
q(z -1)
-
a(z) - zkq(z -1)
-
- ( s d i v zr)ck_l
or z k - ( s div zr)ck_l .
Note that Ck_ 1 WaS chosen such that a(0) - 1. Finally, the basis matrix B (1'2) looks as follows
I B(I'2)(z)
-
0 "ll,k+l zk+l
Ck_ 1Z k-1 ] a(z)
with a defined by (7.16). We obtain the same results as in Section 5.2, more precisely, as in Theorem 5.3 taking into account that the (v, u) here are z times the (v, u) polynomial couple of Theorem 5.3. In a similar way, canonical basis matrices can be computed on an antidiagonal following a sequence of subsequent weakly normal data points.
377
7.4. P A D E A P P R O X I M A T I O N Row
path
To find the elements of the basis m a t r i x B (1'2) t r a n s f o r m i n g the c a n o n i c a l basis m a t r i x B(1) for t h e w e a k l y n o r m a l d a t a point (13, a ) to the c a n o n i c a l basis m a t r i x B (2) for the w e a k l y n o r m a l d a t a point (~, a + [), we have to solve t h e following set of linear e q u a t i o n s as a special case of (7.13) a n d (7.14) (also here we d r o p the s u p e r s c r i p t s , b u t d e n o t e the u, v, c a n d a w i t h a p r i m e to d i s t i n g u i s h t h e m f r o m the d i a g o n a l case) "
v~ v~
c~ c~
0
--T O
0 --r~-2 1 0 0
T([)
a _x
--r~-I
(7.17)
0 0
,
0
0
w i t h a~ - 1, v~ - u~ - u~ - 0 a n d w h e r e t h e coefficient m a t r i x T(~r s t a n d s for "
so
0
Sl
SO
9
.
...
9
0
0
.
r0 9
o
9
... ". ,
0
0
0
0
,
9
now
-
9
.
,
-
0 0 9
_ v#+l
If c~ ~t 0, the m a t r i x
999 999 9
...
0 v#+l ,
... 999
0 c~
c~
"..
c~-~+2
c~ C~_l
.
v~_~+3
v~_~+2
c~_~+1 _
[ 0] v#+l
is n o n s i n g u l a r . cient m a t r i x
0 0
V~+l v~
c~
If c o -- O, t h e n V~+l r 0 b e c a u s e the highest degree coeffi-
[ v/3+1 C/3 ] U~+l
a#
is n o n s i n g u l a r . S u p p o s e now t h a t c~ - c~-1 = "-" = c~-k+2 - 0 a n d c ~ - k + l ~t 0 a n d t h a t r0 - rl = . . ' = rl-1 - 0 a n d rt 7t 0. It can easily be
378
CHAPTER
7.
GENERAL
RATIONAL
INTERPOLATION
checked that the smallest nonsingular matrix T(~)is reached for ~ - k + l T(k +
l)O(z+x)x(k-1)
L( so, . . . , sk +z_ l )
L(rz, 99 r~+z-2)
R(v~+~,...,v~_~._z+2)
O(a+z)x(k-1)
O(k+z)x(z+~) O(k-t)x(z+i) R(c#_k+~, . . .,c~_~_z+~)
with L a lower triangular Toeplitz matrix
L(to, tl,. . .,tk)
I to
.."
0
i
"'.
i
-
tk
.. 9 to
and R a lower triangular Hankel matrix
R(to, tl,. . .,tk)
--
I O .-:
.."
to
"..
to 1 "
tk
The system (7.17) has the form" 0
0 ~
I Vl T(k+t)
u2 u;
c1
0
0
0
--7"l
1
--rk+l-1
0
0
a! L,0
9
~
o
o
0
0
Here we have split up a'(z) as a'(z)-
at(z ) + zka~H(Z)-
[1
z ...
zk-1]a'c +
zk[1 z .-. z~-k]a~y.
The notation a'L,O refers to the fact that we dropped the constant term (which is 1). Similarly we have set ~"(z)-
z~[1 z "'" zk-~]u 'L + z k+~ [1 z ... z ~-k]us
7.4. P A D E A P P R O X I M A T I O N
379
It is clear that v'(z) - 0 and u'(z) - u~z k. Hence we can write the above system of hnear equations as
ord(s(z)d(z)+r(z)a~L(Z))>_ deg (v(z)d(z) + c(z)zka~(z)) <_
k +l + Z.
Because o r d ( s ) - 0, o r d ( r ) - l a n d o r d ( a ~ ) - 0, we get that ord ( c ' ) - l from (7.18). Because deg (v) - fl + 1, deg (c) - fl - k + 1 and deg (a~z) _< l, we obtain from ( 7 . 1 9 ) t h a t deg(c') <_ I. Hence, c'(z) - c~z t with c~ # 0. From (7.18), we derive that
a~L(Z) -- --(s(z) div r(z)z k-t-1)c~z k-1 and from (7.18), we obtain that
a ~ ( z ) - - ( v ( z ) z I div c(z)zk)c~. The previous recurrence on a row in the Pad~ table allows to compute the canonical basis matrix at subsequent weakly normal data points (fl, a), i.e., points where the Toeplitz matrix
Tg -
......
j=l,2,... ,a
is nonsingular. If we want to solve a set of linear equations
T2.
- b,
(7.20)
we can apply the recurrence until we finally reach the point (~, a*). Because the basis matrix contains also the parameters for the inversion formula, we can solve (7.20)in an efficient way, i.e., using O((a*) 2) operations in the field F. If exact arithmetic is used, there can be no problems of numerical instability. However, scaling could be required. On the other hand, when finite precision is used, the computed solution of the linear system could differ a lot from the exact solution even if T~. is well-conditioned. To increase numerical stability, it is necessary to avoid the weakly normal data points in the Pad~ table for which the Toeplitz matrix T~ is ill-c~176 Then, in general we have to use the recurrence relation jumping from one weakly normal data point to a next one not necessarily the immediate next one. The criterion to decide whether T~ is ill-conditioned or not, is the so-called look-ahead criterion.
CHAPTER 7. GENERAL RATIONAL INTERPOLATION
380
7.5
Other applications
In this chapter we have seen how the ideas of recursive computations in Pad~ approximation can be generalized to general interpolation problems. We have illustrated how this specializes in the familiar case of Pad~ approximation. In fact we showed that all the solutions satisfying the interpolation conditions formed an F[z]-module and all these solutions could be described by a basis matrix. By imposing degree conditions on the elements of the basis, one can easily write down all solutions having a given degree structure. By adding new interpolation conditions and/or changing the degree conditions, the basis matrix changes accordingly. Pad~ approximation is not the only application of this general idea. There are many other applications that fit in this general framework. Explaining them all in detail would lead us too far. We therefore give some examples that have appeared in the literature and refer to the papers for details. 9 In [227] an algorithm is presented which computes Pad~-Hermite approximants along a diagonal in the Pad~-Hermite table. 9 Beckermann develops a similar algorithm for the M-Pad~ approximation problem in [11]. 9 In [226] the vector rational interpolation problem was solved in a similar way. As we have seen in Section 7.3, the updating formula for an arbitrary change of the degree conditions and arbitrary new interpolation conditions is possible. This general updating formula can be used to jump from a weakly normal data point in the Pad~ table to another such point. As a special case, the recurrence formula was given when the other point is the immediately next weakly normal data point along a diagonal or along a row in the Pad~ table. When following a diagonal path, the basis matrix at each point of the path can be found by solving a linear system of equations where the coefficient matrix is Hankel. These Hankel matrices are nested, i.e., the previous one involved is a leading principal submatrix of all the following ones. At each point of the path, the basis matrix contains the parameters of an inversion formula for the Hankel matrix involved. This leads to a procedure to solve a Hankel system of dimension n using O(n 2) arithmetic operations. Similar results are obtained for nested Toeplitz matrices when a row path is followed.
7.5. OTHER APPLICATIONS
381
As we have stressed before, in this book we did not investigate the numerical stability of the recurrence relations when they are computed in finite precision. However, the main tools are available to enhance the numerical stability of the computations by using look-ahead techniques. Indeed, the general updating formula can be used to jump from one weakly normal point to another one, which need not be the next one on a path that one is following (e.g., a diagonal or a row in the Pad~ table). Indeed, for any general interpolation problem, one can decide to make an update only when the weakly normal point is well-conditioned. This means a weakly normal point where the basis matrix (or the system to be solved at that point) is well-conditioned. It is not an easy task to decide whether this is the case or not. Condition numbers for matrices involve the norms of the matrix and its inverse. However, as illustrated in the Hankel and Toeplitz case, we do have the parameters available at every stage to compute the inverse, and it might even be possible to estimate the norm of the inverse or the condition number itself, without ever computing the inverse. This gives a criterion to decide whether to update or not to update. This is the basic idea of look-ahead. We give some examples in the literature where these ideas are used. 9 For Hankel matrices, Freund and Zha [99] develop a look-ahead Trench algorithm, Cabay and Meleshko [49] construct a look-ahead algorithm for Pad~ approximants while Bojanczyk and Heinig [18] give a lookahead Schur algorithm. 9 An overview of these three methods generalized to the block Hankel case as well as several connections can be found in [231]. 9 For scalar Toeplitz matrices, several of these look-ahead algorithms have been designed [51, 52, 93, 98, 103, 123, 150, 155]. 9 Even superfast, i.e. requiring O(n log 2 n) operations, look-ahead algorithms were developed for Toeplitz matrices [126, 129, 130]. 9 For the block Toeplitz case we refer the reader to [232, 233]. 9 In [117] stable look-ahead versions of the Euclidean and Chebyshev algorithm are handled. 9 An error analysis is made for Hankel matrices in [49], for generalized Sylvester matrices (connected to multi-dimensional Pad~ systems) in [44, 45] and for block Toeplitz matrices in [233].
382
C H A P T E R 7. GENERAL R A T I O N A L INTE, R P O L A T I O N 9 In [46] the results of numerical experiments are shown using the weakly stable algorithm of [45] for computing Pad~-Hermite and simultaneous Pad~ approximants along a diagonal in the associated Pad~ tables. In [230] a look-ahead method to compute vector Pad~-Hermite approximants going from one perfect point to another on a diagonal in a vector Pad~-Hermite table is designed. This generalizes [48] where the scalar Pad~-Hermite problem is handled. The connection with matrix Pad~ approximants is given. It is also explained how this method can be used to solve block Hankel systems of equations where the blocks can be rectangular. Another generalization is given in [47].
Besides using a "look-ahead" strategy, only recently a totally different approach was taken to overcome the possible instabilities of the "classical" algorithms. The Gaussian elimination method uses (partial or complete) pivoting to enhance numerical stability. Unfortunately, pivoting destroys the structure of a Hankel or Toeplitz matrix. However, other classes of structured matrices maintain their structure after pivoting and thus fast as well as numerically stable methods can be designed. In [136], Heinig proposed for the first time to transform structured matrices from one class into another and to use pivoting strategies to enhance the numerical stability. It is shown how Toeplitz matrices can be transformed into Cauchy-like matrices by the discrete Fourier transformation, which is a fast and stable procedure (see [182]). For Woeplitz-plus-Hankel matrices this was done in [137]. Real trigonometric transformations were studied in [111, 140, 141]. A matrix M = [mkl] is called Cauchy-like if, for certain numbers yk and zl, the rank of the matrix [(yk - zl)mkl] is small compared to the order of M. Pivoting does not destroy the Cauchy-like structure. For Cauchy-like systems several fast algorithms exist [110, 111, 113, 136, 144, 166]. Instead of transforming into a Cauchy-like matrix, [140] explains how to transform a Toeplitz matrix into paired Vandermonde or paired Chebyshev-Vandermonde matrices and how to solve the corresponding systems of linear equations. In [139] a Toeplitz system is also transformed into a paired Vandermonde system, which is solved as a tangential Lagrange interpolation problem. In [174], a Hankel system is transformed into a Loewner system. The parameters of an inversion formula for this Loewner matrix are computed by solving two rational interpolation problems on the unit circle. Recently Gu [122] has designed a fast algorithm for structured matrices that incorporates an approximate complete pivoting strategy. For an overview of different transformation techniques and algorithms we refer the reader to [102, 140, 141,142] and the references cited therein.
7.5. OTHER APPLICATIONS
383
Very recently Chandrasekaran and Sayed [53] derived an algorithm that is provably both fast and backward stable for solving linear systems of equations involving nonsymmetric structured coefficient matrices (e.g., Toeplitz, quasi-Toeplitz, Toeplitz-like). The algorithm is based on a modified fast QR factorization of the coefficient matrix. To develop this algorithm, the theory of low displacement structure is used [166]. In [132], Hansen and Yalamov perturb the original Toeplitz matrix when ill-conditioned leading principal submatrices are encountered. Hence, also the solution of the corresponding linear system is perturbed. Its accuracy is improved by applying a small number of iterative refinement steps. See also [241].
Chapter 8
W a v e l e t s and the lifting scheme Recently a somewhat unexpected application of the Euclidean algorithm was described in wavelet theory where it was shown that by the Euclidean algorithm any wavelet transform could be decomposed in a number of elementary so called lifting steps. Rather than using a Fourier approach to introduce wavelets, we use the lifting scheme which will be a central element in our development. Most of this chapter can be found in the papers by Sweldens and Schrhder [220] and in Daubechies and Sweldens [73].
8.1
Interpolating subdivisions
Suppose that we know a continuous signal y($), t C R via sampled values at integer points x0,k = k, k C Z: y0,k = y(x0,k), k C Z. Of course we can not reconstruct the signal y(t) completely if nothing more is known about y(t). However, we can fill up the values in between these integer abscissas by interpolation. One step in this process is constructing a finer mesh z l,k where xl,2k = x0,k and xl,2k+l = (x0,k + x0,k+l)/2. The values at xl,2k are left to be y0,k, while at the midpoints xl,2k+1, we compute a value by for example linear interpolation. Thus we set y l , 2 k - Y0,k and
1 yl,2k+l - ~(y0,k + y0,k+l),
k C Z.
This is illustrated at the left-hand side of Figure 8.1. This procedure can be repeated again and again to fill up the gap between the integer values of 385
386
CHAPTER
8.
WAVELETS
Figure 8.1" interpolating subdivision
level
level k
l e v e l k q-
I
I
Linear interpolation
z0,k completely. We set for l yl+ 1,~k - Yl,k
and
~
k
levelk q- 1
Cubic interpolation
1, 2 , . . . 1 Yl+l,2k+l - -~ (Yl,k + Yl,k+ 1 ),
k C Z.
This will result in a piecewise linear approximation of the unknown function. The result is a polygonal line connecting the given points at level 0. Note t h a t in this way we generate representations of the function at different levels l where l = 0 corresponds to the coarsest grid and as l increases, the grid, and hence the representation of the function becomes finer. In the limit, as l --+ oo, we obtain the continuous piecewise linear approximation. Of course this interpolating subdivision is but the simplest form of a procedure t h a t could be easily generalized to interpolation with a higher order polynomial of an odd degree N ' = N - 1 which uses N = 2D interpolation points. This N is called the order of the subdivision scheme. To define the value of y ( t ) at a midpoint on level l + 1, this will require D points to the left and D points to the right on the previous level I. Thus at level l we define polynomials Pk of degree N - 1 such that Pk(Xl,k+i) -- Yl,k+i,
for
i -- - ( D - 1), - ( D - 2 ) , . . . , D - 1, D.
We set at level l + 1 Y l + l , 2 k -- Pk ( X l + l , 2 k ) -- Pk ( X l , k ) -- Yl,k,
k C Z
8.1.
INTERPOLATING
SUBDIVISIONS
387
and YlT1,2k+1 -- P k ( Z l T 1 , 2 k + l ),
kEZ.
A complication which did not occur in the case of linear interpolation is that when only a finite number of points is given to start with at level 0, say y0,0,..., Yo,L, then we should only fill up the function in the interval [x0,0, xO,L] = [0, L]. Then when we want to find interpolation values at level l + 1 for points near the boundary, we can not find enough points on level l which are symmetric with respect to the point at level l + 1 where we want to define the new function value. This can however be easily dealt with by using an asymmetric interpolation near the boundaries. This is illustrated for the left boundary in the right-hand side of Figure 8.1. For simphcity, we assume for the time being, t h a t we consider the whole real line (and not a finite interval). Suppose we apply this interpolating subdivision process to an impulse at level zero. Thus to the d a t a y0,k = 50,k. The result after infinitely many steps is a continuous function which is called the scaling f u n c t i o n for this process. In the linear subdivision scheme this results in a hat function and for higher order interpolation schemes, one obtains a similar but smoother function. T h e o r e m 8.1 If ~o(x) is a scaling f u n c t i o n for an interpolation subdivision scheme with polynomials of degree N ~ = N - i = 2D - 1, then it has the following properties 1. It has compact support: [ - N + 1, N - 1].
it is zero outside the interval [ - N ~, N ~] =
2. It satisfies qo(k ) = 5o,k . o
The f u n c t i o n ~v and its integer translates reproduce all polynomials of degree < N . L e. y~kP~p(z - k) - z r',
O < p < N,
z C R.
k
4. It is smooth: ~o C C ~ with a = a ( N ) depending on N . nondecreasing with N : the larger N , the smoother ~p.
This a( N ) is
5. It satisfies a dilation equation: there exist filter coefficients hj such that 1v
-
hj (2
j=-N
- j).
(8.1)
C H A P T E R 8. W A V E L E T S
388
P r o o f . Most of these statements are easily verified. ~
,
This follows directly by construction. On level 1, there will be nonzero values in the interval [ - N ' / 2 , N'/2]. For level 2, we should add to this N~/4 on both sides, In general, level l adds to the previous support N~/21 on both sides. Thus the support is symmetric and the right boundary is given by ~ 1 N~/2t - N~" This is also by construction since the values Y0,k - 80,k at x0,k - k are maintained.
3. Since at every level, an interpolating polynomial of degree N ~ is used, the newly generated function values will be values of a polynomial of degree N ~. Thus if y(t) were a polynomial of degree _< N ~, it will be reconstructed exactly. 4. This is less trivial and we refer to the literature [85, 86] for a proof. ~
For this, it is sufficient to observe that when starting from level 0 with the delta function, then at level 1 this will generate values in the points xl,k, k = - N , . . . , N. Call these values hi. It is obvious that the result after infinitely many refinement steps starting from level 0 with the delta function or starting from level 1 with the coefficients hj will be the same. This implies the dilation equation. Note that h2j - ~0,j, so that in particular h - g - hN -- O. 0
E x a m p l e 8.1 The filter coefficients associated with linear interpolation are given by (h-l,ho, hl) = ( 1 / 2 , 1 , 1 / 2 ) . The scaling function is the hat function which connects ( - 1 , 0), (0, 1), (1,0) and is zero outside [-1, 1]. Define the functions ~pt,k(x) - ~p(2tx - k) which are scaled translations of the scaling function qa. Note that ~l,k(x) is the function that would result in the limit from a subdivision scheme that starts at level l with the data ~0,,-k, i C Z. We have C o r o l l a r y 8.2 The functions ~l,k(x) - ~P(2lx-k)where ~(x) is the scaling
function of a subdivision scheme with filter coefficients (hi) satisfies ~Pl,k -- ~
hi-2k~Ol-bl,i. i
(8.2)
8.1. I N T E R P O L A T I N G SUBDIVISIONS Proof.
389
This follows from (8.1) when we substitute 2 I x - k for x and set
j-i-2k.
[3
If we define the spaces V l - s p a n { ~ / , k ' k 6 Z},
l-- 0,1,...,
then, because each 7~l,k is a hnear combination of the ~l+l,i at the next level, it should be obvious that
VoCV~cv2c.... If f 6 ~ , then this means that f - ~ k )~l,kg~l,k. This means that f is the limiting function of a refinement process which starts at level l with the values )q,k. Since f 6 ~ C ~+1, it should also be possible to write it as -
i
The following theorem gives a relation between the coefficients ~l,k at level l and the coefficients ),t+l,, at level l + 1. Theorem
8.3 Consider a refinement scheme with filter coefficients hi. If
)q+l,i~Ol+l,i(T'),
f ( x ) - ~ ~kl,kg~l,k(X) -- ~ k
(8.3)
i
then )~l+l,i - ~ hi-2k)q,k. k
P r o o f . Substituting (8.2) in (8.3) gives the result. The subdivision scheme we have introduced in this section gives a brief motivation to consider the representation of a function at different resolution levels, i.e., to the notion of multiresolution which we discuss in the next section. We note that this interpolation subdivision scheme, which is due to Deslauriers and Dubuc, is but one example to arrive at this multires~176 There are other subdivision possibilities. In the paper [220] an example of an average interpolating subdivision scheme, which is due to Donoho, is treated in parallel. In the first one, an even number of points is used to produce a value on the next level, while in the second one, an odd number of points is used.
C H A P T E R 8.
390
8.2
WAVELETS
Multiresolution
As we have seen in the previous section, we can represent a function at different levels of resolution. This idea is the basis of the notion multiresolution as it is known in Fourier analysis. In the previous section, we started from a set of discrete samples of a continuous function and we have introduced the interpolating subdivision scheme to build a continuous guess of what the continuous function looked like. Here we want to invert this way of thinking and we suppose t h a t we know the continuous function, thus we know the limit of the subdivision scheme and we want to reconstruct the previous levels from it. Another difference with the previous section is that there we have assumed most of the time that the functions were defined on the whole real line. Many of the results obtained there still hold in the case of a finite interval. However some results from classical wavelet theory are not inherited for a finite interval. For example, the 7~t,k in the previous section were obtained as shifted and dilated versions of the scaling function 7~. It turned out that they could also be considered as the limit of an interpolating subdivision scheme which started from a unit pulse at position k on level I. For a finite interval, the latter is still true, but, due to b o u n d a r y effects they are not the shifts and dilates of one single scaling function any more. In this section we shall thus suppose that we deal with a continuous function known on a finite interval. Without loss of generality, we assume t h a t this interval is [0, 1]. In practical digital computations, the continuous signal is only known in a number of discrete points, which we suppose to be on a level fine enough to be considered as a good approximation of the continuous signal. Each time we move to a coarser level, we loose one out of two function values. Thus, inp principle, at the coarsest possible level, we will have only one function value left, which is some average (DC value) of the continuous function over the interval [0, 1]. Of coarse, in the other direction, one needs at least N + 1 points to start an interpolating subdivision scheme with a polynomial of degree N, and it is clear t h a t also the inverse scheme will come into trouble because of the b o u n d a r y effects when there are too few points to generate a coarser level from. Thus in the sequel it is tacitly understood that the coarsening may stop early and not go down to level l = 0 if this would cause problems. Again most of the results as we present them in this and the subsequent sections also hold for the whole real line. In fact, our examples are usually given for the case of the real line. This is the simplest case since we do not have to take the b o u n d a r y effects into account. Also, the wavelets on a finite interval are
8.2.
391
MULTIRESOLUTION
so called second generation wavelets and readers who are familiar with the classical wavelets will probably be more at ease with the case of the real fine, i.e., with wavelets of the first generation. Let us bring our previous observations in a more formal scheme, called a multiresolution. Let L 2 - L2([0, 1]) be the Hilbert space of functions defined on [0, 1] for which the inner product is
( f , g)
where w ( z ) is some Suppose at level at the interpolation subdivision scheme,
- foI f ( z ) g ( z ) w ( z ) d z ,
positive weight function. l - 0, 1,.. 9 we have function values yt,k, k - O, 1, . . . , 2 l points zt,k - k2 -I , l - O, 1 , . . . , 2 z. By our interpolating we can define the functions ~l,k(z),
for
k-0,1,...,2
l
which are obtained by the interpolating subdivision applied to a unit pulse at position k on level I. Thus ~l,k(zt,j) -- 6j,k. Then consider the function spaces V~
-
span{~/,k
"k
-
0,...,2/}.
Formally, a multiresolution is a set of closed subspaces V~ such that V~ C VI+I and such that [.JV~
is dense in L 2.
/>0
This latter requirement is to ensure that every function f C L 2, thus a signal with finite energy, can be approximated arbitrary close with scaling functions. Thus, if Pl represents the projection in L 2 onto V~, then we should have l]m P l f - f ,
l--,oo
for every f c L
2.
From the previous section, it should be clear that polynomial subdivision schemes do generate a multiresolution. Note that in the case of the real line, the multiresolution was characterized by the scaling function ~ or equivalently by the filter coefficients hi. For a finite interval however, the latter will not be true anymore. One possible generalization is to assume that instead of using the same filter coefficients hk at every level and for each function ~l,k, we let them depend on the particular function ~l,k, i.e., on l and k. Thus we have refinement equations defining the relation between
C H A P T E R 8. WAVELETS
392
the basis for the resolution space V~ and the basis for the finer resolution space F~+I which are of the following form
~Pt,k(z) -- ~ ht,k,j ~Pt+1,3(z ). J
(8.4)
The interpretation is then as before: qvt,k is the result of the interpolating subdivision scheme started at level l with a 1 at position k and zero everywhere else. Applying 1 step of the subdivision scheme results in values in the points zt+l,j at level l + 1, which we call ht,kd. Starting at level l + 1 with these data should result in the same limiting function ~t,k which is expressed in the above refinement equation. Because we shah allow these more general refinement equations, it will be impossible to use the classical dilation and translation approach to wavelets, of which the dilation equation (8.1) is the prototype. To obtain the projections Pl, it would be most convenient when the basis functions ~Pt,k were orthonormal. Then we had Pl f - ~
(f, q0t,k) qat,k.
k
The construction of an orthogonal basis is however rather restrictive. It does not leave much freedom for imposing other conditions like for example smoothness or symmetry conditions. The well known fractal character of the orthogonal Daubechies wavelets is a typical example. The next best thing to do is to use biorthogonal basis functions. Thus we assume that, at least formally, there is a dual multiresolution V0 C I~'t, 9 99 and a set of dual scaling functions ~t,k, such that -
span{~ol,k " k
-
0,...,
21),
and which satisfy the biorthogonality relation -
The projection is then given by
Pl f - ~
(f,
~l,k) ~l,k"
k
E x a m p l e 8.2 Consider the multiresolution generated by interpolating polynomial subdivision. Since the function values at level l are maintained at level l + 1, it follows that
e t f ( z ) - ~ f(zt,k)~t,k(z). k
8.2. M U L T I R E S O L U T I O N
393
Hence, we should have {f, @,k) - f(zl,k), and thus, assuming w - 1, @,k is a Dirac function: @ , k ( z ) - ~ ( z - zt,k).
One usually assumes a normahzation for the dual functions of the form
fo w( z )@,k( z )dz
1.
(8.5)
This should certainly hold for a multiresolution originating from polynomial interpolating subdivision. Indeed, as is in the example above, we can then represent the constant function f ( z ) - 1 as f(z)-
1 - y~ f(zt,,)~,,,(z)k
~
~,,,(z).
k
Thus taking the inner product with ~t,j gives (1, ~l,j) - 1, which is exactly (8.5). It can be shown that such a normalization is feasible in general. The polynomial reproducing property of Theorem 8.1(3), was given there at level 0 and for the real line. It only takes a glance at the proof to see t h a t the same is still true for multiresolutions for finite intervals which are based on polynomial interpolating subdivision schemes. Even more is true since it is easily seen t h a t the reproducing property holds true at every level. In terms of the projection operators, this means t h a t P l z p -- z p for 0 < p < N and l - 0, 1 , . . . where N - 1 is the degree of the polynomial used in the interpolating subdivision scheme. The corresponding multiresolution is then said to be of order N. D e f i n i t i o n 8.1 ( o r d e r o f a m u l t i r e s o l u t i o n ) A multiresotution {V}} is said to be of order N if the first N powers x p, p - 0 , . . . , N - 1 are reproduced. Thus if
Pt z t'
-
zv,
for p - 0,1, . . . , N - l a n d l - O ,
1, ....
If a multiresolution has order O, than it will not reproduce any power, not even the constant 1. Now assume that we have an orthonormal basis, thus @,k - ~vt,k, then this also means that the first N moments of any function f are preserved when going from one level to the next one. T h a t is, (z p, P t f ) - ( x p Pt+lf) for p = 0 , . . . , N 1 and f arbitrary. Indeed, since Ptxv -- x p, we have {zp, f ) =
]o1 z P f ( x ) w ( x ) d x - Jo1 x P P t f ( x ) w ( x ) d x ,
0 < p < N.
CHAPTER
394
8.
WAVELETS
For biorthogonal bases, the situation is a bit more subtle. Since the dual multiresolution consists of the set of nested subspaces V0 C ISrl C "" ", we can also associate with these the dual projections"
Pl f - ~
( f, ~Pl,k) ~l,k"
k These Pt are adjoints of Pt in the sense that for f , g C L 2
k Now, if the dual multiresolution has order N, i.e., Ptz r' - zp for 0 _< p < / V , then N of the moments of an arbitrary function f will be preserved. Indeed,
(T'p)Plf) -- ;Pl T,p,f) -- (Pl-J-IT,p,f) -- ('P,Pl-j-lf). E x a m p l e 8.8 Again in the case of a multiresolution which is obtained by an interpolating polynomial subdivision, we know t h a t the dual basis will consist of Dirac functions, and hence here we have /V - 0. Thus there are no moments that will be preserved. This is the price to pay for the freedom we have created by using biorthogonal instead of orthogonal bases. However, N - 0 will make this multiresolution impractical because it will cause unwanted so called aliasing effects. As we shall see later, the lifting scheme will provide a remedy for this problem. If P t f C I'rt C V}+x, then we can represent it as
Plf(x) - ~
)~l,kqOl,k(x) - ~_~ )~l+l,k~Ol+l,k(X).
k
(8.6)
k
Just as in the subdivision situation, it easily follows from ( 8 . 4 ) a n d (8.6) that
)~l+~,k - ~
hl,j,k At,j. J
Now assume t h a t we also have the refinement equation for the dual multiresolution
~l,k -- ~
hl,k,j ~lh-l,j.
(8. ~)
J The latter is needed if we want to go from a finer to a coarser level. Indeed, since )q,k - (f, (or,k), it follows by ( 8 . 7 ) t h a t
:x ,k -
ht,k,j J
t+l,j.
8.3.
WAVELET TRANSFORMS
395
Thus the primal filter coefficients describe how to compute the coefficients for a finer level, while the dual filter coefficients describe how to compute the coefficients for a coarser level. E x a m p l e 8.4 In the case of multiresolutions derived from interpolating subdivisions, we have seen that the dual scaling functions are Dirac impulses, thus the dual filter coefficients are given by ht,k,j - ~j-2k,0, and therefore the step towards a coarser level is extremely easy, since it corresponds to a decimation of the given coefficients: ~t,k = )~t+:,2k. We keep only one out of two coefficients. This is called subsampling.
8.3
Wavelet transforms
Consider a multiresolution {V~ : l = 0, 1,...}. Since V~ C V~+:, there is some complementary subspace Wl such that
~4+: =
~h 0 Wl.
If we consider the projections Ptf and Pt+if as two consecutive approximations for f , then we can arrange that the differences (Pl+l - Pt)f are in Wt. Like the scaling functions formed bases for the spaces Vi, we now introduce wavelet functions, which will form bases for the complementary spaces Wt. Since we have assumed that dim Vi = 2 t + 1, the spaces Wt should have dimension 2 t. Assume Wt - span{r
" k - 0, 1 , . . . , 2 t - 1}.
Since Wl C Vi+l, there should also exist a mixed refinement equation for the wavelet functions Ct,k, say
r
(9 ) - ~
g~,~,j v, +: ,j( 9 ).
( 8.8 )
J The Wt were spaces complementary to V~, but they are not completely arbitrary if we want that (Pt+: - P t ) f C Wt at every resolution level and for all functions f. We have indeed that PIVI+: = V~ and hence PIPl+:f = Plf. Thus Pt(Pl+:- Pt)f = O. Consequently PtWt = {0}. Thus taking Ct,k C Wt, it follows that its projection onto V~ should vanish. Thus
PtCt,k - ~
(r J
4~,j)~t,j - 0.
C H A P T E R 8.
396
WAVELETS
Whence we have the orthogonality relation (r
~t,j) - o
and by duahty also
~t,j) -
(r
0.
Let {r " k - 0 , . . . , 2 t - 1} be a basis for Wl. We assume that a similar construction is possible for the dual multiresolution { ~ } and that {r k - 0 , . . . , 2 l - 1} is a basis for l~t. Moreover, we choose the basis functions Ct,k and Ct,j such that they are biorthogonal, i.e.,
To arrive now at the notion of wavelet transform, we note that, if a function f is given at its finest resolution, say f,., - P,.,f E V,.,. Then, because v~
Vn-1
-l
*
o
0
Wn-1
--
Vn-2
0
Wn-2
0
Wn-1
~
n-1
=
Vo| 1=0
we should have n-1 2t-1
f,., - Pr, f - Pof + ~
~
"Yl,k~bl,k9
(8.9)
1=0 k = 0
The wavelet coefficients 7t,k are given by
- (s, where the Ct,k are the wavelet functions for the dual rnultiresolution. By filling the dual of (8.8)into (8.10)it follows that
7t,k - ~ .#t,k,j :~t+~,j. J The coefficients of the expansion (8.9) for a function fn E Vn is caned its wavelet transform. It gives the representation of the function in the wavelet domain. T h e o r e m 8.4 If the order of the primal and dual multiresolution are given by N and ffr respectively, then the primal and dual wavelet functions will have N and N vanishing moments respectively.
8.3.
WAVELET
TRANSFORMS
397
P r o o f . If the primal multiresolution has order N, then for 0 _ p < N it will reproduce x p at any resolution level: Ptz p - z p for 0 < p < N. Thus
( x'p, ~l,k)~l,k -- ZP,
0 <_ p < N.
k Therefore, and because the Ct,_k are orthogonal to the ~t,j, we find after taking the inner product with Ct,j that
o_
E x a m p l e 8.5 Consider the hnear interpolating subdivision scheme, then
Plf(z)-
~
~t,k~ol,k(z),
with
)~l,k- f ( k 2 -t)
and
991,&(~)- 99(2l~_k).
k Moving to a coarser level is obtained by simple subsampling, i.e., by keeping only the even samples. Thus ~l+l,2k -- ~t,k- The difference at the odd points is given by (Pl+l - Pl)f(Zl+l,2k+l), which is
1
"Tl,k -- (P/+I - Pl)f(~.l+l,2k+l ) -- ~/+l,2k+l -- -~(.)tl,k -3L ~/,k+l). Thus, since Figure 8.2" Linear interpolation level I
linesr i n t e r p o l s t i o n
level I + 1
subssmpling
1
difference
(8.11)
CHAPTER 8. WAVELETS
398 21-I
"yl,k~gl.q-l,2k+l(X),
Pt+lf(z)- PrY(x)- ~ k=O
we get n-I
P,.,f(z)- Pof(z)+ ~
21-I
~ 7t,kCt,k(z),
1=0 k=O
with r -- tb(21z - k) and r - ~o(2z - 1). Moving to a finer level can be simply described by saying that the value at the even points is maintained, while the value at the odd points is predicted by linear interpolation (averaging) and then corrected with the wavelet coefficients. Thus the wavelet coefficients give the error of the prediction, i.e., the linear interpolation error. See Figure 8.2. <5
8.4
The lifting s c h e m e
There is something terribly wrong with the simple wavelets as they were introduced in the previous section for multiresolutions generated by interpolating subdivision processes. We have seen t h a t moving to a coarser level was obtained by simple subsampling. Thus if the coefficients at level 1 4- 1 were 1, 0, 1, 0 , . . . , then this would result in coefficients 1, 1, 1 , . . . at level I. This is not what we would like to have. We would at least expect that some average value were maintained. Thus
f01
Pt+l.f(x)w(x)dx -
f01
Ptf(x)w(x)dx.
(8.12)
This means t h a t the zeroth moment should be maintained, t h u s / i t _ 1. As we know this is equivalent to saying t h a t the zeroth m o m e n t of the wavelet functions should vanish" 1
~0
Ct,k(:r.)w(z)dx
- O.
This explains that for a proper functioning, we can not just keep the even samples and omit the odd ones. We should somehow keep information about the odd samples as well. This means t h a t we should not write a wavelet solely in terms of the scaling functions of the coarser level as we did in (8.8), but we should also include scaling functions of the same level. This is the whole idea of lifting. So we start from a given multiresolution and construct
8.4.
399
THE LIFTING SCHEME
new one by leaving the scaling functions unchanged, but by modifying the wavelet functions with scaling functions of the same level. We set -
r
_
As we explained above, changing the wavelets, requires changing the dual scaling functions. These are given by ~Ol,k -- E
-o hl,k,j(Ol+l,J q- E j
Sl,k,i(bl,i i
where h~',k,3 are the refinement coefficients of the old dual multiresolution. E x a m p l e 8.6 Consider again the linear interpolation scheme. A possible choice for the new wavelet functions is 1 ~o(2z- I)- ~ o ( z ) -
r
1 ~o(z-
i)
and we set Ct,k(z) - r - k). Since P t + l f - PlY + Q , I with P~I E V~ and Q t f - (P~+i - P~)f E W~, we can write 2TM
2t
k=O
k=O
2 t-1
(8.13) k=O
Using 1
1
and the dilation equation 1
1
which we plug into (8.13), and setting z - zt,k, then because ~@,k(zl,j) -~k,j, we get 1 1 /~l,k - ~/+l,2k + -~"/l,k -}- -~Tl,k-1. (8.14) Note that the result can be made explicit by combining (8.11) and (8.14)" 1
"Tl,k -- - - ~ l + l , 2 k
1
+ "~l+l,2k+l -- ~,~I+1,2k+2
and
i
1
3
1
i -
w ~/+~,2k+2.
400
C H A P T E R 8.
WAVELETS
Note that the lifting formulation is more efficient than the direct application of these explicit filters. If we count division by a power of 2 as a shift operation and not as a multiplication, then the number of flops in the lifting scheme is 4 additions for each k. The direct application as given above requires a multiplication with the coefficient 3 and 6 additions per position k. The previous procedure of lifting some multiresolution to another, better performing multiresolution is called the lifting scheme. It has several advantages over the classical approach to wavelets. One of them is that the filters are never applied explicitly. It results in roughly halving the computation time with respect to the classical filtering approach. We note that the above construction is only one possibility, where we kept the primal scaling functions and the dual wavelet functions and adapted the primal wavelet functions and the dual scaling functions accordingly. We could as well have chosen to change the dual wavelet functions and adapt the primal scaling functions accordingly, while leaving the other bases as they are. This results in a dual lifting scheme. Let us now elaborate these ideas of a lifting scheme in the situation of interpolating subdivision. Using the refinement equation for the primal scaling functions, we have 5k,i - r
- E
hl,k,jCfll+l,j(Xl+l,2i)
-- hl,k,2i.
J Therefore the refinement equation is ~Pt,k - ~
+E
ht,k,j~t+l,j - ~ l + l , 2 k
J
hl,k,2jg-l~lg-l,2j 9-1"
J
So, after subsampling, i.e., setting ~t,k = ),t+l,2k, we use FIf -- ~ k
~l,k~Ol,k -- ~
~lg-l,2kqOlg-l,2k -J- ~
k
k
~
~l,khl,k,2jg-l~Olg-l,2jg-l.
j
This implies that the difference P t + l f - P t f can be expressed in terms of ~19-1,2k9-1 and hence ~bl,k -" ~19-1,2k9-1. By identification of the coefficients, we obtain the wavelet coefficients
"[l,k - ~1+1,2k+ 1 --
ht,j,2k+ l )~t,j . J
8.4.
401
THE LIFTING SCHEME
Since this multiresolution has N - 0, we propose to modify the wavelet functions as r -- ~ol+~,2k+1 -- Al,k~Ol,k- Bl,k~Ol,k+l. The coefficients Al,k and Bl,k are defined by requiring that the following moments vanish
/0 Thus N -
w(z)r
- 0 -
/01
z w ( x )r
z )dz.
2. Because 2 TM
2t
2 t-1
E /~l+l,kqPl+l,k -- E/~l,kqPl,k + E "Yl,k@l,k, k=O k=0
k=0
it follows like in Example 8.6 that the computation of the wavelet transform goes as )~l,k- )~l+~,2k + Al,kTl,k + Bl,k-171,k-1. The general lifting scheme is illustrated in Figure 8.3. We explain it with the following example. Figure 8.3" interpolating subdivision, lifting scheme "~l-
LP
--+
3
@ - -
3
v
v
,
@
i
E x a m p l e 8.7 For the hnear interpolating subdivision, the lifting scheme works as follows. The dual filter h is just 5k,0 and this is followed by a subsampling. This means that only the even samples are kept. This corresponds to ;Xl,k = ,~l+1,2k. The dual filter ~ has coefficients ( - 1 / 2 , 1 , - 1 / 2 ) since indeed by (8.11) we have 7l,k = )~/+1,2k+1 -- 1/2~/+1,2k- 1/2;kt+1,2k+2. Since only the odd samples are needed, this is followed by subsamphng again. Next, a lifting step is introduced, characterized by the filter s which has coefficients ( - 1 / 4 , - 1 / 4 ) . This corresponds to what is given in (8.14). This is but one step in the forward wavelet transform. The upper branch gives a low pass picture while the bottom branch gives a band pass picture of the signal. The same operation is then again applied to the low pass
C H A P T E R 8. W A V E L E T S
402
band etc. For the reconstruction, one has to apply the inverse operations in inverse order. First the inverse of the lifting by s is applied, which is obtained by replacing the subtraction by an addition. Then the signals are upsampled. This means that a zero is introduced between two subsequent samples. Then the filters h and g are applied to the two bands respectively. The filter h has coefficients ( 1 / 2 , 1 , 1 / 2 ) w h i l e the filter g has coefficient 1.
8.5
Polynomial formulation
To give a polynomial formulation of these wavelet transforms, we will introduce the discrete Laplace transform or z-transform. Suppose the signal has sample values Ak, then we introduce the series
A(~)- ~ ~,z -'. k
If a filter h with coefficients (hk) is applied to it, then this results in a new set of samples which are the coefficients from the series
h(z-~)a(z) where h(z) - ~_,k hk z-k. To express subsampling, we split the series in its even and its odd part. For any series
S(z) - ~ :~z -~, we set
re(z) -- E f2kz-k k
and
fo(z) -- E f2k+lz-k. k
Thus
J,~, ,~r f(z)+f(-z)
2
and
r 2~- f(z)- f(-z) jo~, : 2z_ 1
so that
f ( z ) - f~(z 2) + z-~ fo(Z2). If we agree on defining subsampling as taking the even samples, then subsampling of the series f ( z ) means replacing the series with its even part
/,(z) - l(z'/') + / ( - z ' / ' ) 2
8.5. POLYNOMIAL FORMULATION
403
To obtain the odd samples, we have to take the even part of zf(z) and the result will be fo (z). Let us now have a look at the lifting scheme as presented in Figure 8.3. In the upper branch, the signal is subsampled after filtering with h. Thus we have to compute only the even samples of the filtered signal. Since
(Ae(z2)he(z -2) + Ao(z2)ho(z-2))
=
+ z -1 (z2&(z2)ho(z -2) + Ao(z2)h~(z-2)), the result will be
A~(z)h~(z -~ ) + ho(z)ho(Z -1). In the lower branch a filter 9 is applied instead of h. Thus, we can describe this as a matrix multiplication
9o(z-1) 9o(Z-1)
r(z)
Ao(z)
9
After that, the lifting step is another matrix multiplication:
jAn,z,] [1 r(z)
0
1
r(z)
"
We can represent the whole operation as one matrix multiplication:
[An,z,] with
ao,z, 9(z) Ho(z) Go(z)
"
This matrix is called the polyphase matrix for the wavelet transform. In a similar way, the inverse transform can be represented by a polyphase matrix, say P(z) with
P(z)-
Ho(z) Go(z)
"
This corresponds to the scheme of Figure 8.4. If t5 _ I, then the wavelet transform just consists of taking the even and the odd samples. This is sometimes called the lazy wavelet transform. The corresponding filters are / ~ ( z ) - 1, ( ~ ( z ) - z -1.
CHAPTER 8. WAVELETS
404
Figure 8.4" Polyphase representation of wavelet transform ---> LP --->
r
P(z
-1)T
P(z)
- ~ Z -1
----~ B P --->
Figure 8.5: Classical representation of wavelet transform
)
~
~ LP
m
H
I
;BP
This polyphase representation corresponds to a more classical approach to wavelets. There the signal is filtered by a low pass f i l t e r / ~ ( z -1) and a band pass filter (~(z -1) which are both subsampled. See Figure 8.5. In the classical approach the filters (~ a n d / ~ are applied, that is a band pass and a low pass filter, and the filtered results are subsampled. In general, this is however computationaUy more demanding as the following example as well as Example 8.6 illustrate. E x a m p l e 8.8 For the case where we applied lifting to the linear interpolating subdivision as in Example 8.6, we have h(z)-
i
'
g(z)
-
1
- - ~ -~
1
1
z
2z 2'
s(z) -
-
1
(l+z)
-4
"
Thus -
h e ( z ) - 1,
-
ho(z)-O,
1
#e(z)--~(l+-),z
Therefore
1
P(z)
&o(~) ~o(z) _
i -~( 0
+ ~)
0]
-,(~-') 1 o I ~(1+ z)1
1
~ o ( z ) - 1.
8.5. POLYNOMIAL FORMULATION _
~3 - ~1( ~ + ; 1)
-
1(1 + z)
405
1 5 ( i + ~1]) _ [ /I~ (z) q~(z) - ] 1
Ho(z)
Co(z)
Thus -
2
1
3
1
1
1 2
H ( z ) - --8z- + -~ -~ + ~ + ~ z - gz and
-
1
a(z)-
--~ + ~-
i
-5
lz-2
'
which corresponds to our findings in Example 8.6. We have here the (N = 2 , / V - 2) Cohen-Daubechies-Feauveau biorthogonal wavelet [63].
Usually, one wants no information to be lost. This means that the signal can be perfectly reconstructed from its wavelet transform. In the polyphase representation, this is most easily represented by the condition that
p(z)f~(z-1)T _ I. Supposing that all the filters involved are finite impulse reponse filters, then they can be represented by Laurent polynomials. If both P(z) and its inverse should only have Laurent polynomial entries, then the determinant of P(z) should be a monomial, say of the form C z ~. Without loss of generality, we can assume that it is a constant which we shall moreover assume to be 1. We say that the polyphase matrix is then normalized or that the filters G(z) and H(z) are complementary. It is easily verified that perfect reconstruction implies in this case
fi~(~) - ao(~ -~ ),
~o(~) - - G ( ~ -~ ),
and
G(~) - -Ho(z-1),
Go(z) - H~(z -1).
In other words, the primal and dual filters for a perfect reconstruction scheme should satisfy
G(z)-z-lH(-z
-1)
and
[-I(z)--z-lG(-z-1).
The operations of primal and dual lifting are now characterized by the following theorem. T h e o r e m 8.5 (lifting) Assume that ( G ( z ) , H ( z ) ) is a couple of comple-
mentary finite impulse response filters, then
CHAPTER 8. WAVELETS
406
1. (G'(z), H(z)) will be another couple of complementary finite impulse response filters if and only if G'(z) is of the form C'(z)
- C(z)
+ H(z)
(z
where s(z) is a Laurent polynomial. 2. ( G( z), H'( z) ) will be another couple o] complementary finite impulse response filters if and only if H'(z) is of the form g'(z)
-
H(z) + G(z)t(z 2)
where t(z) is a Laurent polynomial. P r o o f . We only prove the first part, since the second part is completely similar. For the first part, it is sufficient to note that the even and odd components for H(z)s(z 2) are given by H~(z)s(z) and Ho(z)s(z). Thus the polyphase matrix for primal rifting has the form P ' ( z ) - P(z)
0
"
1
This proves the theorem.
[3
Note that the dual polyphase matrix for primal lifting is given by
P'(z)- P(z) [
1
0]
i.e., - [t(z)
-
We shall now consider the inverse problem. Suppose we are given the filters G, (~, H , / - / a s in some classical approach to biorthogonal wavelets. We shall give a procedure to decompose these filtering operations in elementary lifting steps which will lead to an efficient computation of the wavelet transform and of its inverse. The answer will be given by a factorization of the polyphase matrices. Suppose that we succeed in factoring such a polyphase matrix P(z) into a product of the form
P(z) = P~p(z)P~d(Z) " " "Pmp(z)Pmd(Z) with the elementary matrix factors given by Pip(z)-
0
1
ti(z)
1
'
"" '
8.6.
E U C L I D E A N D O M A I N OF L A U R E N T P O L Y N O M I A L S
407
where si(z) and ti(z) are Laurent polynomials. Then we have reduced the application of the polyphase matrix to a successive application of couples of dual and primal lifting steps to the lazy wavelet transform. Indeed, the multiplication with a matrix Pzd(Z) represents a dual lifting step and the multiplication with a matrix Pip(Z) represents a primal lifting step. We shall now show that such a factorization is obtained by the Euclidean algorithm. First we shall introduce the Euclidean algorithm for Laurent polynomials.
8.6
Euclidean
domain
of Laurent
polynomials
Let us consider the ring of Laurent polynomials over the field F which we denote as F[z, z-l]. We assume for simplicity that F is commutative. The units are all the invertible elements. Unlike for polynomials, where the units consist of all nonzero constants, here the units are given by all monomials az k a E F0 k C Z We turn this into a Euclidean domain as follows. First we set for any p e F[z,z -1]" I P l - - 1 if p - 0 and for p 7~ 0 }
9
u
Ip(z)l - ~ - l > 0
if
p(~) - ~ pkz k,
p~p~ # 0.
k-l
Then we define O ( p ) - 1 + IPl. The division property requires that for any two Laurent polynomials a and b, there should exist a quotient q and a remainder r such that a-
bq + r,
O(r) < O(b).
Such a quotient and remainder always exist, but they are far from unique. There is a much larger degree of freedom than in the polynomial case. We can best illustrate this with an example. E x a m p l e 8.9 Consider the Laurent polynomials
~(z)-
~-~ + 2z + 3z ~
a~d
b(z)-
z -~ + z.
Since ]b] - 2, we have to find a Laurent polynomial q(z) such that in [ a - bq[ < 2. Thus there may only remain at most two successive nonzero coefficients in the result. Setting q(z) - q_2z -2 + q _ l z -1 + qo + qlz + q2z 2
(other possibilities do not lead to a solution), we see that the remainder is in general r(z)-(a
_
bq)(z) - r_3z - 3 + r _ 2
Z-2
+r_lz-
1
+ ro + r l z + r2 Z2 + r3 Z3
CHAPTER
408
8.
WAVELETS
with r_3 r_2 r_l r0 rl r2 r3
--
q-2 1 -q-1 -q-a-q0 - q - 1 - q~ 2-q0-q2 3 - ql -q2
Now one can choose to keep the successive coefficients rk and rk+l, for some k E { - 3 , . . . , 2} and make all the others equal to zero. This corresponds to a system of 5 linear equations in 5 unknowns. Possible solutions are therefore q(Z) -- --2Z -2 -- 3Z -1 -~- 2 -]- 3Z
T(Z) -- --2Z -3 ~- 4Z -2
q(z) -- --3Z -1 + 2 + 3Z
r(z)
q ( z ) - z - ~ + 2 + a~ q ( z ) - z -1 + a~
~(z) ~(z) -
q(z) - z-~ - z
~ ( z ) - 2 z + 4~ ~
q ( z ) - ~-~ - z + 2 z ~
~(z) - 4z-
-
-
4Z -2
-- 2Z -1
-2z-~ -4 - 4 + 2~ 2 z ~.
It is clear t h a t the quotient and the remainder are far from uniquely defined. In general, if Ua
~(~)-
Ub
~ ~z ~ k=la
then we have
a~d
b(z)-
~ b~ ~ k=lb
Uq
q(z)-
~ q~ k=lq
with uq = u ~ - I b and lq = l ~ - u b so t h a t Iql = l a l + Ibl 9 The quotient has lal + Ibl + 1 coefficients to be defined. For the product bq we have Iqbl = lal + 21bl, thus it has lal + 21b I + 1 coefficients. Thus also a - bq has t h a t m a n y coefficients. Since at most Ibl subsequent of these coefficients may be a r b i t r a r y and all the others have to be zero, it follows t h a t there are always lal + Ibl + 1 equations for the lal + Ibl ~- 1 unknowns. W h e n these coefficients are made zero in a - bq then there remain at most Ibl successive coefficients which give the remainder r. We can conclude t h a t the quotient and remainder always exists and thus we do have a Euclidean domain. Therefore we can apply the Euclidean algorithm and obtain a greatest common divisor which will be unique up to
8.7. FA CTORIZATION ALGORITHM
409
a unit factor. It is remarkable that, with all the freedom we have at every stage of the Euclidean algorithm, we will always find the same greatest common divisor up to a monomial factor.
8.7 Factorization algorithm Suppose we start with a filter H(z) - He(z 2) + z -1Ho(z 2) and some other complementary filter G. The Laurent polynomials H~ and Ho are coprime. Indeed, if they were not, then they would have a nontrivial common divisor which would divide all the entries in P(z), thus also divide det P(z), but we assumed that det P(z) = 1, so this is impossible. The Euclidean algorithm will thus compute a greatest common divisor which we can always assume to be a constant, say K. This leads to [H~(z) Ho(z)]V~(z)...Vn(z)-[K
0]
with the Vk matrices of the form
~1
1 l
-qi(z)
where q~(z) are Laurent polynomials. After inverting and transposing, this reads
where the matrices Wk(z) are given by
Wk(z)-
) 1] -r [ qk( 1 0 "
We can always assume that n is even. Indeed, if it were odd, we can multiply the filter H with z and the filter G with z -1. They would still be complementary since the determinant of P(z) does not change. This would interchange the role of H~ and Ho which would introduce some "dummy" V0 which does only interchange these two Laurent polynomials. Let G~(z) be a filter which is complementary to H(z) for which G~ and G~ are defined by
P~(z)-
Ho(z)
G~,(z)
- Wl(z)...W,(z)
K0
K -1
"
C H A P T E R 8.
410 Because
qk(z) 1
1][
WAVELETS
qk,z,][01]_[01] [ 1 0]
0
1
0
1
1
0
qk(z)
'
1
we can set
nJ2[lq kl,z,][ 1 P~(z)-
II
0
k=l
1
0]
q2k(z)
1
0
K -1
"
In case our choice of G ~ does not correspond to the given complementary filter G, then by an application of Theorem 8.5, we can find a Laurent polynomial s(z) such that
P(z)-
1 ,(z)] 1 "
P~(z)
o
As a conclusion we can formulate the following theorem. T h e o r e m 8.6 Given two complementary finite impulse response filters ( H ( z ) , G(z)), then there exist Laurent polynomials sk(z) and tk(z), k = 1 , . . . , m and some nonzero constant K such that the polyphase matrix can be factored as
P(z)-
rail
1-I
k=l
o
10][
1
tk(z)
1
0
0]
K -1
"
The interpretation of this theorem is obvious. It says that any couple of complementary filters which does (one step of) an inverse wavelet transform can be implemented as a sequence of primal and dual lifting steps and some scaling (by the constants K and K - l ) . For the forward transform in the corresponding analysis step of a perfectly reconstructing scheme, the factorization is accordingly given by
.P(z) -
m[ 1 0][1
II k=l
--,~k(Z-1)
1
0
1
0
0]
K
"
E x a m p l e 8.10 One of the simplest of the classical wavelets one can choose are the Haar wavelets. They are described by the filters
H ( ~ ) - ~ + z-~
~d
1
C(z)- -~ +
1 -1 ~z
8.7. FACTORIZATION A L G O R I T H M
411
The dual filters are H(z)-
1
1
~+~z
-1
and
-1
G(z)--l+z
It is clear how these compute a wavelet transform" the low pass filter /~ takes the average and the high pass filter G takes the difference of two successive samples. Thus 1
J~l,k --
~()~l+l,2k -t- "~l+l,2k+l)
and
"Tl,k
= J~l+l,2k-t-1- ,~l-t-l,2k-
The polyphase matrix is trivially factored by the Euclidean algorithm as
P(z)-
i11j2] [10][1 lj21 1
1/2
-
11
0
1
"
The dual polyphase matrix is factored as
P(z)-
[lj211 E11][10] 1/2
1
-
0
1
1/2
1
"
One could object that these are not factorizations of the form we proposed above, but they are since we have left out the identity matrix in front of the two factors and an identity matrix between the two factors. For practical implementation, these of course do not do anything and they are just kept out of the computation. Thus, to compute the forward wavelet transform, we have to apply the lazy wavelet, i.e., take the even and the odd samples separately. Then, a first lifting step leaves the even samples untouched and computes the difference 71,k = )~l+~,2k+l -- ,~l+~,2k. In the next lifting step, this result is left untouched, but the even samples are modified by computing :Xl,k = )~l+l,2k + 1/27l,k. For the inverse transform, first one computes )~l+l,2k = ~t,k- 1/27l,k, and then )~1+1,2k+1 -- )~t+~,2k + 7t,k. This is just a matter of interchanging addition and subtraction. Note that in this simple example, there is not really a gain in computational effort, but as our earlier examples showed, in general there is.
Many more examples of this idea can be found in the paper [73].
Bibliography [1] N.I. Akhiezer [Achieser]. The classical moment problem. Oliver and Boyd, Edinburgh, 1969. Originally published Moscow, 1961. [2] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The design and analysis of computer algorithms. Addison Wesley, Reading, Mass., 1974. [3] R.V. Andree. Selections from modern abstract algebra. Holt, Rinehart and Winston, New York, 1958. [4] O. Axelsson. Iterative solution methods. Cambridge University Press, 1994.
[5] G.A. Baker Jr. Essentials of Padd Approzimants. Academic Press, New York, 1975. [6] G.A. Baker, Jr. and P.R. Graves-Morris. Padd Approzimants. Part II: Eztensions and Applications, volume 14 of Encyclopedia of Mathematics and its Applications. Addison-Wesley, Reading, MA, 1981. [7] G.A. Baker, Jr. and P.R. Graves-Morris. Pad~ Approzimants, volume 59 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2nd edition, 1996. [8] S. Barnett. Polynomials and linear control systems. Marcel Dekker, New york, 1983.
[9]
R. Barret et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, 1993.
[10]
G. Baxter. Polynomials defined by a difference system. J. Math. Anal. Appl., 2:223-263, 1961.
[11]
B. Beckermann. A reliable method for computing M-Pad~ approximants on arbitrary staircases. J. Comput. Appl. Math., 40:19-42, 1992. 413
414
BIBLIOGRAPHY
[12] B. Beckermann and G. Labahn. A uniform approach for Hermite-Pad~ and simultaneous Pad~ approximants and their matrix-type generalizations. Numer. Algorithms, 3(1-4):45-54, 1992. [13] B. Beckermann and G. Labahn. Recursiveness in matrix rational interpolation problems. Technical Report AN0-357, Laboratoire d'Analyse Num~rique et d'Optimisation, Universit~ des Sciences et Technologies de Lille, 1996. [14] E.I~. Berlekamp. Algebraic coding theory. McGraw-Hill, New York, 1968. [15] E. Bezout. Sur le degr~ des ~quations r~sultantes de 1' ~vanouissement des inconnues. Mdm. de l' Acad. Royale des Sciences, pages 288-338, 1764. [16] Y. Bistritz. Zero location with respect to the unit circle of discretetime linear system polynomials. Proc. IEEE, 72:1131-1142, 1984. [17] T.S. Blyth. Module theory. Claredon Press- Oxford University Press, 1990. [18] A.W. Bojanczyk and G. Heinig. A multi-step algorithm for Hankel matrices. J. Complexity, 10:142-164, 1994. [19] D.L. Boley, S. Elhay, G. H. Golub, and M.H. Gutknecht. Nonsymmetric Lanczos and finding orthogonal polynomials associated with indefinite weights. Numer. Algorithms, 1(1):21-44, 1991. [20] I~.C. Bose and D.K. l~ay-Chaudhuri. Further results on error correcting binary group codes. Inf. Control, 3:68-79, 1960. [21] I~.C. Bose and D.K. l~ay-Chaudhuri. On a class of error correcting binary group codes. Inf. Control, 3:68-79, 1960. [22] I~.P. Brent, F.G. Gustavson, and D.Y.Y. Yun. Fast solution of Toeplitz systems of equations and computation of Pad~ approximants. J. Algorithms, 1:259-295, 1980. [23] I~.P. Brent and H.T. Kung. Systolic VLSI arrays for polynomial GCD computation. IEEE Trans. Comput., C-33(8):731-736, 1984. [24] C. Brezinski. Padd-type approximation and general orthogonal polynomials, volume 50 of Internat. Set. of Numer. Math. Birkhs Verlag, Basel, 1980.
BIBLIOGRAPHY
415
[25] C. Brezinski. History of continued fractions and Padd approximants, volume 12 of Springer Set. Comput. Math. Springer, Berlin, 1990. [26] C. Brezinski. Biorthogonality and its applications to numerical analysis, volume 156 of Lecture Notes in Pure and Appl. Math. Marcel Dekker Inc., 1992. [27] C. Brezinski. Formal orthogonality on an algebraic curve. Ann. Numet. Math., 2:21-33, 1995. [28] C. Brezinski, editor. Projection methods for systems of equations. Studies in Computational Mathematics. Elsevier, Amsterdam, 1997. To appear. [29] C. Brezinski and M. Redivo Zaglia. Orhogonal polynomials of dimension - 1 in the non definite case. Rend. Mat. Roma, Set. VII, 14:127-133, 1994. [30] C. Brezinski, M. l~edivo Zaglia, and H. Sadok. Avoiding breakdown and near-breakdown in Lanczos type algorithms. Numer. Algorithms, 1(3):261-284, 1991. Addendum, vol. 2(2):133-136, 1992. [31] C. Brezinski, M. Redivo Zaglia, and H. Sadok. A breakdown free Lanczos type algorithm for solving linear systems. Numer. Math., 63(1):29-38, 1993. [32] C. Brezinski and H. Sadok. Lanczos type algorithms for solving systems of linear equations. Appl. Numer. Math., 11(6):443-473, 1993. [33] C. Brezinski and J. van Iseghem. Pad~ approximations. In P.G. Ciarlet and J.L. Lions, editors, Handbook of Numerical Analysis, volume 3. North-Holland, 1993. [34] C. Brezinski and J. van Iseghem. Vector orthogonal polynomials of dimension -d. In R.V.M. Zahar, editor, Approximation and Computation, volume 119 of Internat. Set. of Numer. Math., pages 29-39. Birkhs Verlag, 1994. [35] C. Brezinski and J. van Iseghem. A taste of Pad~ approximation. In Acta Numerica, pages 53-103, 1995. [36] l~.W. Brockett. Finite dimensional linear systems. John Wiley and Sons, New York, 1970.
416
BIBLIOGRAPHY
[37] A.M. Bruckstein and T. Kailath. An inverse scattering framework for several problems in signal processing. IEEE ASSP magazine, pages 6-20, 1987. [38] A. Bultheel. Division algorithms for continued fractions and the Pad~ table. J. Comput. Appl. Math., 6(4):259-266, 1980. [39] A. Bultheel. l~ecursive algorithms for non normal Pad~ tables. SIAM J. Appl. Math., 39(1):106-118, 1980. [40] A. Bultheel. Triangular decomposition of Toeplitz and related matrices: a guided tour in the algorithmic aspects. Bull. Soc. Math. Belg. Sdr. A, 37(3):101-144, 1985. [41] A. Bultheel. Laurent series and their Padd approzimations, volume OT-27 of Oper. Theory: Adv. Appl. Birkhs Verlag, Basel-Boston, 1987. [42] A. Bultheel and M. Van Barel. Pad~ techniques for model reduction in linear system theory. J. Comput. Appl. Math., 14:401-438, 1986. [43] S. Cabay and D.K. Choi. Algebraic computations of scaled Pad~ fractions. SIAM J. Comput., 15(1):243-270, 1986. [44] S. Cabay, A. Jones, and G. Labahn. Computation of numerical Pad~Hermite and simultaneous Pad~ systems I: Near inversion of generalized Sylvester matrices. SIAM J. Matrix Anal. Appl., 17:248-267, 1996. [45] S. Cabay, A. Jones, and G. Labahn. Computation of numerical Pad~Hermite and simultaneous Pad~ systems II: A weakly stable algorithm. SIAM J. Matriz Anal. Appl., 17:268-297, 1996. [46] S. Cabay, A.I~. Jones, and G. Labahn. Experiments with a weakly stable algorithm for computing Pad~-Herm ite and simultaneous Pad~ approximants. A CM Trans. Math. Software, 1996. Submitted. [47] S. Cabay and G. Labahn. A superfast algorithm for multi-dimensional Pad~ systems. Numer. Algorithms, 2(2):201-224, 1992. [48] S. Cabay, G. Labahn, and B. Beckermann. On the theory and computation of non-perfect Pad~-Hermite approximants. J. Comput. Appl. Math., 39:295-313, 1992.
BIBLIOGRAPHY
417
[49] S. Cabay and R. Meleshko. A weakly stable algorithm for Pad~ approximants and the inversion of Hankel matrices. SIAM J. Matrix Anal. Appl., 14:735-765, 1993. [50] A.L. Cauchy. Cours d'analyse de l'Ecole Royale Polytechnique. Premiere pattie" Analyse Algdbrique. Paris, 1821. [51] T. F. Chan and P.C. Hansen. A look-ahead Levinson algorithm for general Toeplitz systems. IEEE Trans. Sig. Proc., 40(5):1079-1090, 1992. [52] T. F. Chan and P.C. Hansen. A look-ahead Levinson algorithm for indefinite Toeplitz systems. SIAM J. Matrix Anal. Appl., 13(2):490506, 1992. [53] S. Chandrasekaran and A.H. Sayed. Stabihzing the generahzed Schur algorithm. SIAM J. Matrix Anal. Appl., 17:950-983, 1996. [54] P.L. Chebyshev. Sur les fractions continues. Journ. de Math. Pures et Appliqudes, Sdr II, 3:289-323, 1858. See (Euvres, Tome I, Chelsea Pub. Comp. pp.-. [55] P.L. Chebyshev. Sur l'interpolation par la m~thode des moindres carr~s. Mere. Acad. Impdr. des Sciences St. Petersbourg, sdr. 7, 1:124, 1859. See (Euvres, Tome I, Chelsea Pub. Comp. pp. 471-498. [56] P.L. Chebyshev. Sur les fractions continues alg~briques. Journ. de Math. Pures et Appliqudes, Sdr II, 10:353-358, 1865. See (Euvres, Tome I, Chelsea Pub. Comp. pp. 609-614. [57] P.L. Chebyshev. Sur le d~veloppement de fonctions en s~ries s l'aide des fractions continues. In A. Markoff and N. Sonin, editors, (Euvres de P.L. Tchebycheff, Tome I, pages 615-631, New York, 1866. Chelsea Publishing Company. (Original in Russian Mem. Acad. Impdr. des Sciences St. Petersbourg, 9, Append. 1). [58] P.L. Chebyshev. Sur la d~termination des fonctions d'apr~s les valeurs qu'elles ont pour certaines valeurs de variables. Math. Sb., 4:231-245, 1870. See (Euvres, Tome II, Chelsea Pub. Comp. pp. 71-82. [59] G. Chen and Z. Yang. Bezoutian representation via Vandermonde matrices. Linear Algebra Appl., 186:37-44, 1993. [60] G.-N. Chen and H.-P. Zhang. Note on products of Bezoutians and Hankel matrices. Linear Algebra Appl., 225:23-36, 1995.
418
BIBLIOGRAPHY
[61] T. Chihara. An introduction to orthogonal polynomials. Gordon and Breach Science Publishers, New York, 1978. [62] G.C. Clark Jr. and J.B. Cain. Error-correction coding for digital communications. Plenum Press, New York, London, 1981. [63] A. Cohen, I. Daubechies, and J.C. Feauveau. Biorthogonal bases of compactly supported wavelets. Comm. Pure Appl. Math., 45(5):485560, 1992. [64] A. Cohn. Uber die Anzahl der Wurzeln einer Algebraischen Gleichung in einem Kreise. Math. Zeit., 14:110-148, 1922. [65] P.M. Cohn. Free rings and their relations. Academic Press, London, 1971. [66] P.M. Cohn. Algebra, volume 1. John Wiley & Sons, Chichester, 2nd edition, 1982. [67] J.K. Cullum and I~.A. Willoughby. Lanczos algorithms for large symmetric eigenvalue computations, Volume 1, Theory. Birkh/~user Verlag, Basel, 1985. [68] A. Cuyt. A review of multivariate Pad~ approximation theory. Jo Comput. Appl. Math., 12/134:221-232, 1985. [69] A. Cuyt and L. Wuytack. Nonlinear methods in numerical analysis, volume 136 of Mathematical Studies. North Holland, Amsterdam, 1987. [70] A.M. Cuyt. Padd approzimants for operators: theory and applications, volume 1065 of Lecture Notes in Math. Springer, 1984. [71] A.M. Cuyt. General order multivariate rational Hermite interpolants. Habilitation, University of Antwerp, July 1986. [72] G. Cybenko. An explicit formula for Lanczos polynomials. Linear Algebra Appl., 88/89:99-115, 1987. [73] I. Daubechies and W. Sweldens. Factoring wavelet transforms into lifting steps. 1996. Technical Report, Bell Labs, Lucent Technologies. [74] M. Decoster and A.B.. van Cauwenberghe. A comparative study of different reduction methods (part 2). Journal A, 17(3):125-134, 1976.
BIBLIOGRAPHY
419
[75] Ph. Delsarte and Y. Genin. The split Levinson algorithm. IEEE Trans. Acoust. Speech Signal Process., ASSP-34:470-478, 1986. [76] Ph. Delsarte and Y. Genin. On the splitting of classical algorithms in linear prediction theory. IEEE Trans. Acoust. Speech Signal Process., ASSP-35:645-653, 1987.
[77] Ph. Delsarte and Y. Genin. A survey of the split approach based techniques in digital signal processing applications. Phillips J. Res., 43:346-374, 1988. [78] Ph. Delsarte and Y. Genin. The tridiagonal approach to Szegt~ orthogonal polynomials, Toeplitz linear systems and related interpolation problems. SIAM J. Math. Anal., 19:718-735, 1988. [79] Ph. Delsarte and Y. Genin. An introduction to the class of split Levinson algorithms. In G. Golub and P. Van Dooren, editors, Numerical linear algebra, digital signal processing and parallel algorithms, volume 70 of NATO-ASI Series, F: Computer and Systems Sciences, pages 112-130, Berlin, 1991. Springer. [80] Ph. Delsarte and Y. Genin. On the split approach based algorithms for DSP problems. In G. Golub and P. Van Dooren, editors, Numerical linear algebra, digital signal processing and parallel algorithms, volume 70 of NATO-ASI Series, F: Computer and Systems Sciences, pages 131-147, Berlin, 1991. Springer. [81] Ph. Delsarte, Y. Genin, and Y. Kamp. Application of the index theory of pseudo-lossless functions to the Bistritz stability test. Philips J. Res., 39:226-241, 1984. [82] Ph. Delsarte, Y. Genin, and Y. Kamp. A generalization of the Levinson algorithm for hermitian Toeplitz matrices with any rank profile. IEEE Trans. Acoust. Speech Signal Process., ASSP-33(4):964-971, 1985. [83] Ph. Delsarte, Y. Genin, and Y. Kamp. Pseudo-lossless functions with application to the problem of locating the zeros of a polynomial. IEEE Trans. Circuits and Systems, CAS-32:371-381, 1986. [84] Ph. Delsarte, Y. Genin, and Y. Kamp. A survey od the split approach based techniques in digital signal processing. Philips J. Res., 43:346374, 1988.
420
BIBLIOGRAPHY
[85] G. Deslauriers and S. Dubuc. Interpolation dyadique. In Fractals, Dimensions non-enti~res et applications, pages 44-55, Paris, 1987. Masson. [86] G. Deslauriers and S. Dubuc. Symmetric iterative interpolation processes. Constr. Approx., 5(1):49-68, 1989. [87] A. Draux. Polyn6mes orthogonauz formels - applications, volume 974 of Lecture Notes in Math. Springer, Berlin, 1983. [88] A. Draux and P. van Ingelandt. Polyn6mes Othogonaux et Approximants de Padd- Logiciels. Editions Technip, Paris, 1987. [89] T. Erd~lyi, P. Nevai, J. Zhang, and J.S. Geronimo. A simple proof of "Favard's theorem" on the unit circle. Atti. Sem. Mat. Fis. Univ. Modena, 29:551-556, 1991. Proceedings of the Meeting "Trends in Functional Analysis and Approximation Theory", 1989, Italy. [90] M. Fiedier. Special matrices and their Applications in Numerical Mathematics. Martinus Nijhoff Publ., Dordrecht, 1986. [91] M. Fiedler and V. Pts Loewner and Bezout matrices. Linear Algebra Appl., 101:187-202, 1988. [92] G. Freud. Orthogonal polynomials. Pergamon Press, Oxford, 1971. [93] K.W. Freund. A look-ahead Bareiss algorithm for general Toeplitz matrices. Numer. Math., 68:35-69, August 1994. AT~T Bell Labs. Numerical Analysis Manuscript 93-11. [94] I~.W. Freund, G.H. Golub, and N.M. Nachtigal. Iterative solution of linear systems. Acta Numerica, 1:57-100, 1992. [95] P~.W. Freund and N.M. Nachtigal. QMK: A quasi-minimal residual method for non-Hermitian matrices. Numer. Math., 60:315-339, 1991. [96] P~.W. Freund and N.M. Nachtigal. Implementation details of coupled QMR algorithm. In L. Reichel, A. l~uttan, and K.S. Varga, editors, Numerical Linear Algebra, pages 123-140, Berlin, 1993. W. de Gruyter. [97] I~.W. Freund and N.M. Nachtigal. An implementation of the QMR method based on coupled two-term recurrences. SIAM J. Sci. Statist. Comput., 15(2), 1993.
BIBLIOGRAPHY
421
[98] R,.W. Freund and H. Zha. Formally biorthogonal polynomials and a look-ahead Levinson algorithm for general Toeplitz systems. Linear Algebra Appl., 188/189:255-303, 1993. [99] I~.W. Freund and H. Zha. A look-ahead strategy for the solution of general Hankel systems. Numer. Math., 64:295-322, 1993. [100] G. Frobenius. Uber relationen zwischen den N~herungsbriichen yon Potenzreihen. J. Reine Angew. Math., 90:1-17, 1881. [101] P.A. Fuhrmann. A polynomial approach to linear algebra. Springer, 1996. [102] K. Gallivan, S. Thirumalai, P. Van Dooren, and V. Vermaut. High performance algorithms for Toeplitz and block Toeplitz matrices. Linear Algebra Appl., 241-243:343-388, 1996. [103] K. Gallivan, S. Thirumalai, and P. Van Dooren. A block Toeplitz lookahead Schur algorithm. In M. Moonen and B. De Moor, editors, SVD and Signal Processing III, pages 199-206, Amsterdam, 1995. Elsevier. [104] F.I~. Gantmacher. The theory of matrices, Volume II. Chelsea, New York, 1959. [105] W. Gautschi. On generating orthogonal polynomials. SIAM J. Sci. Statist. Comput., 3:289-317, 1982. [106] Y. Genin. An introduction to the modern theory of positive functions and some of its today applications to signal processing, circuits and system problems. In Advances in modern circuit theory and design (Paris, 1987), pages 195-234, Amsterdam-New York, 1987. NorthHolland.
[107] Y. Genin. Euclid algorithm, orthogonal polynomials and generalized l~outh-Hurwitz algorithm. Linear Algebra Appl., 246:131-158, 1996. [108] Y. Genin. On polynomials nonnegative on the unit circle and related questions. Linear Algebra Appl., 1996. To appear. [109] I. Gohberg, editor. I. Schur methods in operator theory and signal processing, volume 18 of Oper. Theory: Adv. Appl. Birkhs Verlag, Basel, 1986.
422
BIBLIOGRAPHY
[110] I. Gohberg, T. Kailath, I. Koltracht, and P. Lancaster. Linear complextity parallel algorithms for linear systems of equations with recursive structure. Linear Algebra Appl., 88/89:271-315, 1987. [111] I. Gohberg, T. Kailath, and V. Olshevsky. Fast Gaussian elimination with partial pivoting for matrices with displacement structure. Math. Comp., 64:1557-1576, 1995. [112] I. Gohberg, P. Lancaster, and L. P~odman. Invariant Subspaces of Matrices with Applications. John Wiley & Sons, New York, 1986. [113] I.C. Gohberg, T. Kailath, and I. Koltracht. Efficient solution of linear systems of equations with recursive structure. Linear Algebra Appl., 80:81-113, 1986. [114] G. Gomez and L. Lerer. Generalized Bezoutians for analytic operator functions and inversion of structured operators. In U. Helmke, 1~. Mennicken, and J. Saurer, editors, Systems and Networks: Mathematical theory and applications, volume II, volume 79 of Mathematical Research, pages 691-696. Akademie Verlag, 1994. [115] V.D. Goppa. A new class of linear error-correcting codes. Peredach. Inform., 6:24-30, 1970.
Probl.
[116] M.J.C. Gover and S. Barnett. Inversion of Toeplitz matrices which are not strongly non-singular. IMA J. Numer. Anal., 5:101-110, 1985. [117] W. B. Gragg and M. H. Gutknecht. Stable look-ahead versions of the Euclidean and Chebyshev algorithms. In K.V.M. Zahar, editor, Approximation and Computation: A Festschrifl in Honor of Walter Gautschi, pages 231-260. Birkhs Verlag, 1994. [118] W.B. Gragg. The Pad~ table and its relation to certain algorithms of numerical analysis. SIAM Rev., 14:1-62, 1972. [119] W.B. Gragg. Matrix interpretations and applications of the continued fraction algorithm. Rocky Mountain J. Math., 4(2):213-225, 1974. [120] W.B. Gragg. Positive definite Toeplitz matrices, the Arnoldi process for isometric operators and Gaussian quadrature on the unit circle. In E.S. Nikolaev, editor, Numerical methods in linear algebra, pages 16-32. Moscow university press, 1982. (In Russian). [121] W.B. Gragg and A. Lindquist. On the partial realization problem. Linear Algebra Appl., 50:277-319, 1983.
BIBLIOGRAPHY
423
[122] M. Gu. Stable and efficient algorithms for structured systems of linear equations. Technical Report LBL-37690, Lawrence Berkeley Laboratory, University of California, Berkeley, 1995. [123] M. Gutknecht and M. Hochbruck. Optimized look-ahead recurrences for adjacent rows in the Pad~ table. BIT, 36:264-286, 1996. [124] M.H. Gutknecht. The unsymmetric Lanczos algorithms and their relations to Pad~ approximation, continued fractions, the QD algorithm, biconjugate gradient squared algorithms and fast Hankel solvers. In Proceedings of the Copper Mountain Conference on Iterative Methods, 1990. [125] M.H. Gutknecht. A completed theory of the unsymmetric Lanczos process and related algorithms. Part I. SIAM J. Matrix Anal. Appl., 13(2):594-639, 1992. [126] M.H. Gutknecht. Stable row recurrences for the Pad~ table and generically superfast lookahead solvers for non-Hermitian Toeplitz systems. Linear Algebra Appl., 188/189:351-422, 1993. [127] M.H. Gutknecht. A completed theory of the unsymmetric Lanczos process and related algorithms. Part II. SIAM J. Matrix Anal. Appl., 15:15-58, 1994. [128] M.H. Gutknecht. Lanczos-type solvers for nonsymmetric linear systems of equations. Acta Numerica, 6, 1997. [129] M.H. Gutknecht and M. Hochbruck. Look-ahead Levinson- and Schurtype recurrences in the Pad~ table. Electron. Trans. Numer. Anal., 2:104-129, 1994. [130] M.H. Gutknecht and M. Hochbruck. Look-ahead Levinson and Schur algorithms for non-Hermitian Toeplitz systems. Numer. Math., 70:181-228, 1995. [131] S. Gutman and E.I. Jury. A general theory for matrix root-clustering in subregions of the complex plane. IEEE Trans. Automat. Control, 26:853-863, 1981. [132] P.C. Hansen and P.Y. Yalamov. Stabilization by perturbation of a 4n 2 Toeplitz solver. Preprint N25, Technical University of l~usse, Bulgaria, January 1995. Submitted to SIAM J. Matrix Anal. Appl.
424
BIBLIOGRAPHY
[133] G. Heinig. Beitriige zur Spektraltheorie yon Operatorbiischeln und zur algebraischen Theorie yon Toeplitzmatrizen. PhD thesis, TH KarlMarx-Stadt, 1979. [134] G. Heinig. Inversion of Toeplitz and Hankel matrices with singular sections. Wiss. Zeitschr. d. TH. Karl-Marx-Stadt, 25(3):326-333, 1983. [135] G. Heinig. On structured matrices, generalized Bezoutians and generalized Christoffel-Darboux formulas. In H. Bart, I. Gohberg, and M.A. Kaashoek, editors, Topics in matrix and operator theory, volume 50 of Oper. Theory: Adv. Appl., pages 267-281. Birkhs Verlag, 1991. [136] G. Heinig. Inversion of generalized Cauchy matrices and other classes of structured matrices. In Linear Algebra in Signal Processing, volume 69 of IMA volumes in Mathematics and its Applications, pages 95-114. IMA, 1994. [137] G. Heinig. Inversion of Toeplitz-like matrices via generalized Cauchy matrices and rational interpolation. In Systems and Networks: Mathematical Theory and Applications, volume 2, pages 707-711. Akademie Verlag, 1994. [138] G. Heinig. Matrix representations of Bezoutians. Appl., 1994.
Linear Algebra
[139] G. Heinig. Solving Toeplitz systems via tangential Lagrange interpolation. SIAM J. Matrix Anal. Appl., 1996. Submitted. [140] G. Heinig. Transformation approaches for fast and stable solution of Toeplitz systems and polynomial equations. In Proceedings of the International Workshop "Recent Advances in Applied Mathematics", pages 223-238, State of Kuwait, May 4-7 1996. [141] G. Heinig and A. Bojanczyk. Transformation techniques for Toeplitz and Toeplitz-plus-Hankel matrices I. Transformations. Linear Algebra Appl., pages 1-24, 1996. To appear. [142] G. Heinig and A. Bojanczyk. Transformation techniques for Toeplitz and Toeplitz-plus-Hankel matrices II. Algorithms. Linear Algebra Appl., pages 1-20, 1996. To appear. [143] G. Heinig and F. Hellinger. On the Bezoutian structure of the MoorePenrose inverses of Hankel matrices. SIAM J. Matrix Anal. Appl., 14(3):629-645, 1993.
BIBLIOGRAPHY
425
[144] G. Heinig and K. P~ost. Algebraic methods for Toeplitz-like matrices and operators. Akademie Verlag, Berlin, 1984. Also Birkhs Verlag, Basel. [145] G. Heinig and K. Kost. Matrices with displacement structure, generalized Bezoutians, and Moebius transforms. In H. Dym, S. Goldberg, M. Kaashoek, and P. Lancaster, editors, The Gohberg anniversary collection, volume I: The Calgary conference and matrix theory papers, volume 40 of Oper. Theory: Adv. Appl., pages 203-230, Boston, 1989. Birkhs Verlag. [146] U. Helmke. Rational functions and Bezout forms" a functional correspondence. Linear Algebra Appl., 122/123/124:623-640, 1987. [147] U. Helmke and P.A. Fuhrmann. Bezoutians. Linear Algebra Appl., 122/124:1039-1097, 1989. [148] P. Henrici. Power series, integration, conformal mapping, location of zeros, volume I of Applied and Computational Complex Analysis. John Wiley & Sons, New York, London, Sydney, Toronto, 1974. [149] P. Henrici. Applied and computational complex analysis. Volume 2: Special functions, integral transforms, asymptotics, continued fractions, volume II of Pure and Applied Mathematics, a Wileyinterscience series of texts, monographs and tracts. John Wiley Sons, New York, 1977. [150] M. Hochbruck. Further optimized look-ahead recurrences for adjacent rows in the Pad ~ table and Toeplitz matrix factorizations, August 1996. Manuscript. [151] M. Hochbruck. The Padd table and its relation to certain numerical algorithms. PhD thesis, Mathematische Fakults Universits Tfibingen, November 1996. [152] Hocquenghem. Codes correcteurs d'erreurs. Chiffres, 2"147-156, 1959. [153] A.S. Householder. Bigradients and the problem of Routh and Hurwitz. SIAM Rev., 10:56-66, 1968. [154] A.S. Householder. Bezoutiants, elimination and localization. SlAM Rev., 12"106-119, 1970.
426
BIBLIOGRAPHY
[155] T. Huckle. A look-ahead algorithm for solving nonsymmetric linear Toeplitz equations. In Proceedings of the Fifth SIAM Conference on Applied Linear Algebra, Snowbird, Utah, June 199~, pages 455-459, 1994. [156] I.S. Iohvidov. Hankel and Toeplitz matrices and forms. Birkh~iuser Verlag, Boston, 1982. [157] C.G.J. Jacobi. Uber die Darstellung einer l~eihe gegebener Werte durch einer gebrochenen rationale Funktion. J. fiir reine und angew. Math., 30:127-156, 1845. [158] E.A. Jonckheere and C. Ma. l~ecursive partial realization from combined sequence of Markov parameters and moments. Linear Algebra Appl., 122/123/124:565-590, 1989. [159] E.A. Jonckheere and C. Ma. A simple Hankel interpretation for the Berlekamp-Massey algorithm. Linear Algebra Appl., 125:65-76, 1989. [160] M. T. Jones. The use of Lanczos' method to solve the generalized eigenproblem. PhD thesis, Duke University, Department of Mathematics, 1990. [161] W.B. Jones and W.J. Thron. Continued Fractions. Analytic Theory and Applications. Addison-Wesley, Reading, Mass., 1980. [162] W. Joubert. Generalized conjugate gradient and Lanczos methods for the solution of nonsymmetric systems of linear equations. PhD thesis, Center of Numerical Analysis, The University of Texas at Austin, Austin, Texas, January 1990. l~eport CNA-238. [163] E.I. Jury. Theory and applications of the z-transform method. J. Wiley Sons, New York, 1964. [164] E.I. Jury. Inners and stability of dynamical systems. Wiley, New York, 1974. [165] T. Kailath. Linear Systems. Prentice-Hall, Inc., Englewood Cliffs, N.J., 1980. [166] T. Kailath and A.H. Sayed. Displacement structure: theory and applications. SIAM Rev., 37:297-386, 1995.
BIBLIOGRAPHY
427
[167] 1~. Kalman. On partial realizations, transfer functions, and canonical forms. Acta Technica Scandinavica, 31:9-32, 1979. (Zbl 424.93020, MR 80k:93022). [168] tt.E. Kalman. Algebraic characterization of polynomials lie in certain algebraic domain. Proc. Natl. Acad. Sci. USA, 64:818-823, 1969. [169] P..E. Kalman, P.L. Falb, and M.A. Arbib. Topics in mathematical system theory. International Series in Pure and Applied Mathematics. McGraw-Hill Book Company, New York, San Francisco, St. Louis, Toronto, London, Sydney, 1969. [170] M.D. Kent. Chebyshev, Krylov, Lanczos : matrix relationships and computations. PhD thesis, Stanford University, Dept. of Computer Science, June 1989. l~ept. STAN-CS-89-1271. [171] A.N. Khovanskii. The application of continued fractions and their generalizations to problems in approximation theory. Noordhoff, Groningen, 1963. [172] M. Kimura. Chain scattering approach to H-infinity-control. Birkhs Verlag, 1997. [173] D.E. Knuth. The art of computer programming, Vol 2 : Seminumerical algorithms. Addison Wesley, Reading, Mass., 1969. [174] P. Kravanja and M. Van Bard. A fast Hankel solver based on an inversion formula for Loewner matrices. Linear Algebra Appl., 1996. Submitted. [175] M.G. KreYn and M.A. Naimark. The method of symmetric and Hermitian forms in the theory of the separation of roots of algebraic equations. 1936. Translated in English: Linear and Multilinear Algebra, 10 (1981)265-308. [176] L. Kronecker. Zur Theorie der Elimination einer Variabel aus zwei algebraischen Gleichungen. Monatsber. K6nigl. Preus. Akad. Wiss., pages 535-600, 1881. [177] S. Kung. Multivariable and multidimensional systems : analysis and design. PhD thesis, Stanford University, 1977. [178] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Natl. Bur. Stand., 45:225-280, 1950.
428
BIBLIOGRAPHY
[179] F.I. Lander. The Bezoutian and the inversion of Hankel and Toeplitz matrices. Matem. Issled., 9(2):69-87, 1974. [180] N. Levinson. The Wiener rms (root mean square) error criterion in filter design and prediction. J. Math. Phys., 25:261-278, 1947. [181] J.D. Lipson. Elements of algebra and algebraic computing. Addison Wesley Publishing Co., Reading, Mass., 1981. [182] C. Van Loan. Matriz frameworks for the Fast Fourier Transform, volume 10 of Frontiers in Applied Mathematics. SIAM, 1992. [183] L. Lorentzen and H. Waadeland. Continued fractions with applications, volume 3 of Studies in Computational Mathematics. NorthHolland, 1992. [184] A. Magnus. Certain continued fractions associated with the Pad~ table. Math. Z., 78:361-374, 1962. [185] A. Magnus. Expansion of power series into P-fractions. Math. Z., 80:209-216, 1962. [186] M. Marden. The geometry of polynomial approzimants. Amer. Math. Soc., Providence, l~.I., 1966. [187] P. Maroni. Sur quelques espaces de distributions qui sont des formes lin~aires sur l'espace vectoriel des polyn6mes. In C. Brezinski, A. Draux, A.P. Magnus, P. Maroni, and A. l~onveaux, editors, Proc. Polyn6mes Orthogoneaux et Applications, Bar-le-Duc, 198~, volume 1171 of Lecture Notes in Math., pages 184-194. Springer, 1985. [188] P. Maroni. Prol~gom~nes s l'~tude de polynSmes orthogonaux. Ann. Mat. Pura ed Appl., 149:165-184, 1987. [189] P. Maroni. Le calcul de formes lin~aires et des polyn6mes orthogonaux semi-classiques. In M. Alfaro et al., editors, Orthogonal polynomials and their applications, volume 1329 of Lecture Notes in Math., pages 279-288. Springer, 1988. [190] P. Maroni. Une th~orie alg~brique de polyn6mes orthogonaux. Application aux polynSmes orthogonaux semi-classiques. In C. Brezinski, L. Gori, and A. l~onveaux, editors, Orthogonal Polynomials and their Applications, volume 9 of IMACS annals on computing and applied mathematics, pages 95-130, Basel, 1991. J.C. Baltzer AG.
BIBLIOGRAPHY
429
[191] P. Maroni. An introduction to second degree forms. Adv. Comput. Math., 3:59-88, 1995. [192] J.L. Massey. Shift-register synthesis and BCH decoding. IEEE Trans. Inf. Th., IT-15:122-127, 1969. [193] l~.J. McEliece. The theory of information and coding: A mathematical framework for communication, volume 3 of Encyclopedia of mathematics. Addison Wesley, Reading, Mass., 1977. [194] l~.J. McEliece and J.B. Shearer. A property of Euclid's algorithm and an application to Pad~ approximation. SIAM J. Appl. Math., 34:611-616, 1978. [195] M.S. Moonen, G. Golub, and B.L.I~. De Moor, editors. Linear algebra for large scale and real-time applications, volume 232 of NATO-ASI series E: Applied Sciences. Kluwer Acad. Publ., Dordrecht, 1993. [196] M. Morf. Fast algorithms for multivariable systems. PhD thesis, Stanford University, 1974. [197] O. Nevanlinna. Convergence of iterations for linear equations. Birkhs Verlag, 1993. [198] H. Pad~. Sur la Reprdsentation Approchde d'une Fonction par des Fractions Rationelles. PhD thesis, Ann. Ecole. Norm. Sup., vol. 9, pages 3-93, Paris, 1892. [199] D. Pal and T. Kailath. Fast triangular factorization and inversion of Hermitian Toeplitz, and related matrices with arbitrary rank profile. SIAM J. Matrix Anal. Appl., 14(4):1016-1042, 1993. [200] B.N. Parlett. Reduction to tridiagonal form and minimal realizations. SIAM J. Matrix Anal. Appl., 13(2):567-593, 1992. [201] B.N. Parlett, D.R. Taylor, and Z.A. Liu. A look-ahead Lanczos algorithm for unsymmetric matrices. Mathematics of Comp., 44(169):105124, 1985. [202] O. Perron. Die Lehre yon den Kettenbriichen. Teubner, 1977. [203] S. Pombra, H. Lev-Ari, and T. Kailath. Levinson and Schur algorithms for Toeplitz matrices with singular minors. In Int. Conf. Acoust., Speech and Signal proc., pages 1643-1646, 1988.
430
BIBLIOGRAPHY
[204] V. Pts Explicit expressions for Bezoutians. Linear Algebra Appl., 59:43-54, 1984. [205] V. Pts Lyapunov, Bezout and Hankel. Linear Algebra Appl., 58:363390, 1984. [206] L. l~dei. Algebra, Erster Tell. Akademische Verlagsgesellschaft, 1959. [207] I.S. Reed and G. Solomon. Polynomial codes over certain finite fields. J. Soc. Ind. Appl. Math., 8:300-304, 1960. [208] J. Rjssanen. l~ecursive identification of linear systems. SIAM J. Control, 9(3):420-430, 1971. [209] K. l~ost. MSbius transformations, matrix representations for generalized Bezoutians and fast algorithms for displacement structure systems of equations. Wiss. A. Techn. Univ. Chemnitz, 33:29-36, 1991. [210] K. l~ost. Generalized companion matrices and matrix representations for generalized Bezoutians. Linear Algebra Appl., 193:151-172, 1993. [211] K. l~ost. Generalized Lyapunov equations, matrices with displacement structure and generalized Bezoutians. Linear Algebra Appl., 193:7594, 1993. [212] E.M. l~ussakovskii. The theory of V-Bezoutians and its applications. Linear Algebra Appl., 212/213:437-460, 1994. [213] Y. Saad. Numerical methods for large eigenvalue problems. Algorithms and Architectures for Advanced Scientific Computation. Manchester University Press/Halsted Press, 1992. [214] E. Saff. Orthogonal polynomials from a complex perspective. In P. Nevai, editor, Orthogonal polynomials, pages 363-393, Dordrecht, 1990. NATO, Kluwer Academic Press. [215] I. Schur. Uber Potenzreihen die im Innern des Einheitskreises Beschrs sind I. J. Reine Angew. Math., 147:205-232, 1917. See also [109, p.31-59]. [216] T.J. Stieltjes. Quelques recherches sur la th~orie des quadratures dites m~caniques. Ann. Aci. Ecole Norm. Paris, Sdr. 3, 1:409-426, 1884. Oeuvres vol. 1, pp. 377-396.
BIBLIOGRAPHY
431
[217] G.W. Struble. Orthogonal polynomials: variable-signed weight functions. Numer. Math., 5:88-84, 1963. [218] Y. Sugiyama. An algorithm for solving discrete-time Wiener-Hopf equations based upon Euclid's algorithm. IEEE Trans. Inf. Th., IT32(3):394-409, 1986. [219] Y. Sugiyama, M. Kasahara, S. Hirasawa, and T. Namekawa. A method for solving key equation for decoding goppa codes. Information and Control, 27:87-99, 1975. [220] W. Sweldens and P. SchrSder. Building your own wavelets at home. In Wavelets in Computer Graphics, ACM SIGGRAPH Course Notes. ACM, 1996. [221] J.J. Sylvester. On the theory of sysygetic relations of two rational integral functions, comprising an application to the theory of Sturm functions and that of the greatest algebraic common measure. Philos. Trans. Roy. Soc. London, 143:407-548, 1853. [222] G. Szeg~. Orthogonal polynomials, volume 33 of Amer. Math. Soc. Colloq. Publ. Amer. Math. Soc., Providence, Rhode Island, 3rd edition, 1967. First edition 1939. [223] D.K. Taylor. Analysis of the look ahead Lanczos algorithm. PhD thesis, Center for Pure and Applied Mathematics, University of California, Berkeley, 1982. [224] E.E. Tyrtyshnikov. New cost-effective and fast algorithms for special classes of Toeplitz systems. Soy. J. Numer. Math. Modelling, 3(1):6376, 1988. [225] M. Van Barel. Nested Minimal Partial Realizations and Related Matriz Rational Approximants. PhD thesis, K.U. Leuven, January 1989. [226] M. Van Barel and A. Bultheel. A new approach to the rational interpolation problem: the vector case. J. Comput. Appl. Math., 33(3):331346, 1990. [227] M. Van Barel and A. Bultheel. The computation of non-perfect Pad~Hermite approximants. Numer. Algorithms, 1:285-304, 1991. [228] M. Van Barel and A. Bultheel. A general module theoretic framework for vector M-Pad~ and matrix rational interpolation. Numer. Algorithms, 3:451-461, 1992.
432
BIBLIOGRAPHY
[229] M. Van Barel and A. Bultheel. The "look-ahead" philosophy applied to matrix rational interpolation problems. In U. Helmke, 1~. Mennicken, and J. Saurer, editors, Systems and networks: Mathematical
theory and applications, Volume II: Invited and contributed papers, volume 79 of Mathematical Research, pages 891-894. Akademie Verlag, 1994. [230] M. Van Barel and A. Bultheel. A look-ahead method to compute vector Pad@-Hermite approximants. Constr. Approx., 11:455-476, 1995. [231] M. Van Barel and A. Bultheel. Look ahead methods for block Hankel systems. J. Comput. Appl. Math., 1995. Submitted. [232] M. Van Barel and A. Bultheel. Look-ahead schemes for block Toeplitz systems and formal orthogonal matrix polynomials. In Proceedings
of the Workshop on orthogonal polynomials: the non-definite case, Rouen, France, April 2~-26, 1995, 1995. To appear. [233] M. Van Barel and A. Bultheel. A look-ahead algorithm for the solution of block Toeplitz systems. Linear Algebra Appl., 1996. Accepted. [234] J. van Iseghem. Approximants de Padd vectoriels. PhD thesis, Lille, 1987. [235] J. van Iseghem. Convergence of the vector QD-algorithm. Zeros of vector orthogonal polynomials. J. Comput. Appl. Math., 25:33-46, 1989. [236] H. Viscovatoff. De la m@thode g@n@rale pour r@duire toutes sortes de quantit@s en fractions continues. Mdm. Acad. Sci. Imp. St. Pdtersbourg, 1"226-247, 1805. [237] H.S. Wall. Analytic theory of continued fractions. Princeton, 1948.
Van Nostrand,
[238] H.S. Wall. Analytic theory of continued fractions. Chelsea, 1973. [239] W.A. Wolovich. Linear Multivariable Systems. Springer, New York, 1974. [240] P. Wynn. A general system of orthogonal polynomials. Math., 18:81-96, 1967.
Quart. J.
BIBLIOGRAPHY
433
[241] P.Y. Yalamov. Convergence of the iterative refinement procedure applied to stabilization of a fast Toeplitz solver. In P. Vassilevski and S. Margenov, editors, Proc. Second I M A C S Symposium on Iterative Methods in Linear Algebra, pages 354-363. IMACS, 1996. [242] D.M. Young and K.C. Jea. Generalized conjugate-gradient acceleration of non-symmetrizable iterative methods. Linear Algebra Appl., 34:159-194, 1980. [243] O. Zariski and P. Samuel. Commutative algebra, vol. II, volume 29 of Graduate texts in mathematics. Springer, New York, Heidelberg, Berlin, 1960.
List of Algorithms i.I
Scalar_Euclid
1.2
Extended_Euclid
1.3
Initialize
1.4
Comput e_Gk
1.5
Normalize
.......................... .........................
2 ii
............................
II
............................
II
............................. ..........................
Ii
1.6
Atomic_Euclid
1.7
U p d a t e_k
1.8
Comput e_Gk(v)
..........................
45
1.9
Comput e_Gk(V)
..........................
48
2.1
Block_MGS
.............................
............................. ..............................
35
36
86
3.1
Lanczos
4.1
Two_s i d e d _ G S
4.2
Two_s i d e d _ M G S
4.3
Block_two_sided_MGS
......................
142
4.4
Generalized_Lanczos
......................
167
4.5
Block_MGSP
4.6
Toeplitz_Euclid
5.1
Massey
6.1
Routh_Hurwitz
........................... ..........................
............................ .........................
............................... ..........................
435
ii0 140 140
171 204 268 319
Index Euclidean type, 201 extended Euclidean, 9, 23, 30, 69,102,104, 111,239,242, 293,296 fast, 61, 94, 133,382 generalized Lanczos, 167 Gram-Schmidt, 139 Lanczos, 72,100,115,135,166, 285 Look-Ahead, 99, 107 layer adjoining, 37 layer peeling, 37 left sided Euclidean, 55 Levinson, 306,323 Look-Ahead-Lanczos, 107 modified Gram-Schmidt, 140, 152, 166 right sided atomic Euclidean, 35 right sided Euclidean, 23 Routh-Hurwitz, 306 Schur, 158,381 Schur-Cohn, 323,325 split Levinson, 334 superfast, 61,381 Toeplitz Euclidean, 204 two-sided Gram-Schmidt, 135, 186 Viscovatoff, 37, 95, 158, 235, 241,249 annihilating polynomial, 302 antidiagonal, 234, 241, 251, 255,
Atomic_Euclid, 35 Block_MGS, 86 Block_MGSP, 171 Block_two_s ided_MGS, 142 Compute_Gk, ii Comput e_Gk(v), 45 Compute_Gk(v), 48 Extended_Euclid, ii Initialize, ii Lanczos, Ii0 Massey, 268 Normalize, 11 Routh_Hurwitz, 319 Scalar_Euclid, 2 Two_sided_GS, 140 Two_sided_MGS, 140 Update_k, 36
adjoint, 166 adjoint mapping, 137 algebraic curve, 226 algebraic variety, 227 algorithm atomic Euclidean, 32,241,270, 294 Berlekamp-Massey, 24,241,265 Chebyshev, 23, 381 classical l~outh-Hurmitz, 315 conjugate gradient, 105 division, 292 Euclidean, 32, 62, 68, 73, 78, 234, 254, 265, 295, 303, 306, 381 436
INDEX
260, 264, 300, 351, 364, 368,376 approximant, 18, 31 minimal Pad6, 254, 300 mPA-Baker, 261 mPA-Frobenius, 255 PA-Baker, 232 PA-Frobenius, 233,366 Pad6, 31, 99, 231, 232, 233, 292 Baker, 300 Frobenius, 300 minimal, 254 two-point, 200, 292,304 Pad6-Hermite, 382 rational, 23, 30, 159, 200,351 simultaneous Pad6, 382 vector Pad6, 163 artificial row, 256 associates, 4 backward substitution, 37 basis (G, ~)-basis, 355 biorthogonal, 101 canonical, 274, 370 dual, 137 module, 354 BCH code, 265 behavior steady state, 290 transient, 290 Bezout equation, 178 theorem, 14 Bezoutian, 88, 176 Bezoutiant, 176 bilinear form, 135, 136, 138, 166, 226 biorthogonal bases, 101
437 biorthogonal polynomials, 139 biorthogonality, 102, 135,392 biorthogonalization, 166 two-sided Gram-Schmidt, 104 biorthonormal, 104 block index, 141,152,153,154,159, 168, 187, 197 right, 190 singular, 239,255,263 size, 141,193 block orthogonal polynomial, 141 breakdown, 105,112,115,127, 141 curable, 128,327 incurable, 106, 130 canonical basis, 274, 370 canonical decomposition theorem, 120 Cauchy index, 309 causal system, 273 Cayley transform, 326 characteristic of a Toeplitz matrix, 183 characteristic polynomial, 77,124, 129,356 Chebyshev algorithm, 23 Christoffel-Darboux relation, 90,170, 173, 217, 219 circle unit, 169 circuit theory, 312 class equivalence, 234 cluster, 108 code BCH, 265 Goppa, 265 P~eed-Solomon, 265 cofactor, 117, 283
438 common divisor, 246 comonic polynomial, 232 complementary filters, 405 complementary subspaces, 119 complex conjugate, 100, 169,310 complexity, 286, 295 composite, 6 conjugate gradient method, 99 constant system, 274 continued fraction, 15,231,294 approximant, 31 convergent, 16, 56 left sided formal, 56 principal part, 30 tail, 18, 30, 56 continuous time, 272, 287 controllability form, 284 convergent, 16, 56 convolution operator, 277 coprime, 6,232 polynomials, 118 curable breakdown, 128,327 decoding, 265 decomposition LDU, 143 partial fraction, 131 defect, 255,257, 261,264 degree, 22, 254, 264 H-degree, 358 McMillan, 286 degree residual, 358 delay, 274 denominator, 161 description external, 285 internal, 285 state space, 280,287, 297,300 determinant, 67, 70, 77, 80, 141, 146, 154, 160, 194, 212
INDEX
diagonal, 235, 240, 251,256,300 main, 239, 265 Dirac impulse, 287, 288 discrete Laplace transform, 287 discrete time, 272 division, 202,243 division property, 6, 22, 30 divisor, 4 common, 246 domain Euclidean, 1 frequency, 275,288 integral, 3 time, 275 down shift operator, 62, 125, 152, 182 dual basis, 137 space, 136, 138 dual MSbius transformation, 17 dynamical system, 273 echelon form, 72 eigenstructure, 127 eigenvalue, 77, 117, 289 eigenvector, 132 elimination elementary row, 148 Gaussian, 147 equivalence class, 234 equivalent realizations, 117, 286 Euclidean algorithm, 32, 62, 68, 73,234, 254, 265, 295 domain, 1, 2, 3,407 index, 68, 107, 141,239, 251, 253,259, 294 ring, 3, 18, 55 evaluation backward, 17
439
INDEX
forward, 17 exceptional situation, 253 value, 253 expansion Fourier, 151 Laurent series, 160 factorization, 147 LDU, 141 of the moment matrix, 197 triangular, 151 Favard theorem, 307, 323 field, 21 skew, 21,231 finite impulse response, 405 finite-dimensional system, 279 form bilinear, 136, 138, 166, 226 linear, 136, 138, 179 linearized, 234 rational, 159 reduced, 232,234 unreduced, 234 formal Fourier series, 149 formal Laurent series, 21 formal Newton series, 352 formal power series, 21,138 formal series, 21 Fourier expansion, 151 fraction polynomial, 295 rational, 232, 233,243 frequency domain, 275,288 function rational, 115, 160, 201, 232, 285 transfer, 285, 288, 290, 294, 295,298 fundamental system, 213
future, 273 Gaussian elimination, 147 (G, ~)-basis, 355 G-order, 353 grade, 101,123, 126, 132 Gram-Schmidt procedure, 151 greatest common divisor, 1 definition, 5 uniqueness, 5 Hamburger moment problem, 307 Hankel symbol, 63 H-Bezoutian, 88 H-degree, 358 higher order term, 201 homomorphism, 136, 272 Hurwitz polynomial strict sense, 315, 321 impulse Dirac, 287, 288 response, 277, 282, 284, 288, 291 signal, 274, 287 incurable breakdown, 106 indefinite weight, 169 index, 326 block, 141,152,153,154, 159, 162, 168, 187, 197 Cauchy, 309 Euclidean, 68, 107, 141, 239, 251,253,259,294, 302 Iohvidov, 184, 186, 189, 190, 193, 198,203 Kronecker, 68, 78, 107, 113, 115, 141, 239, 251, 259, 266,268, 294, 302 minimal, 250, 255, 256, 260, 261,263,300,302,305 constrained, 251,266
440 unconstrained, 253 of rational function, 311 inner polynomial, 143 inner product, 103, 105, 138, 166, 179, 182, 191,197, 266 nondegenerate, 139 input signal, 272,295 input-output map, 272 interpolation points, 352 invariant set, 274 invariant subspace, 101, 119, 127, 128 inverse, 76, 77 involution, 100, 136 Iohvidov index, 184, 186,189,190, 193, 198,203 isomorphism, 274 Jacobi parameter, 336 Jordan block, 124 chain, 126 curve, 169 form, 131 Julia set, 169 jump, 263, 264, 266,269 kernel, 64 moment generating, 149, 153, 162 reproducing, 150 Kronecker delta, 139,274 index, 68, 115, 141,239, 251, 259,266,268,294 theorem, 118, 123 Krylov matrix, 100,114,116,166,285 maximal subspace, 101 sequence, 100, 107, 135, 166 space, 100
INDEX
Lagrange polynomial, 137 Lanczos, 99 polynomials, 99 Laplace transform, 287, 402 Laurent polynomial, 169, 179,407 series, 21,236 lazy wavelet, 403 leading coefficient, 62 left-right duality, 54 lifting scheme, 385,400 linear form, 136, 138, 179 functional, 138, 169 operator, 166 system, 271,272 linear system continuous time, 287 discrete time, 287 input, 287 output, 287 state vector, 287 linearized form, 234 Loewner matrix, 169,382 look-ahead, 133,379,381 lower degree term, 201 MSbius transformation, 15 dual, 17 main diagonal, 239 Markov parameter, 289, 301 matrix adjugate, 117 bi-diagonal block, 211 block tridiagonal, 75 companion, 74, 76, 78, 79,152, 160,284 diagonal, 103 block, 79, 153, 198,251
    Frobenius, 210, 212
    Gram, 139
    Hankel, 37, 63, 77, 78, 102, 107, 114, 116, 135, 151, 169, 251, 294, 299, 302
    Hankel Hessenberg, 79
    Hessenberg, 159, 161, 168, 197, 210, 211
        block, 151
        block upper, 152, 153
        unit upper, 152
    infinite dimensional, 69
    Jacobi, 75, 114, 129, 151
    Krylov, 100, 114, 116, 166
    Loewner, 169, 382
    moment, 135, 139, 146, 149, 151, 153, 159, 166, 167, 178, 184, 190, 203, 226, 251, 302, 304
        truncated, 161
    observability, 285
    of bilinear form, 136
    permutation, 147
    positive definite, 102, 106
    quasi-Hankel, 169, 173
    quasi-Toeplitz, 169, 180
    rank deficient, 68
    reachability, 285
    shifted, 78, 153
    similar, 123
    state transition, 280
    Sylvester, 176, 381
    Toeplitz, 37, 178, 182, 190, 194, 197, 304
    triangular
        left upper, 63
        lower/upper Hankel, 63
        right lower, 63
        unit upper, 76, 102, 140, 185, 251
    tridiagonal, 66, 74, 102, 103
        Jacobi, 103
    truncated, 168
    unimodular, 300
    Vandermonde, 137, 169, 382
    weight, 103
McMillan degree, 117, 123, 126, 286, 300, 303
measure, 169
minimal
    annihilating polynomial, 250, 270
    index, 250, 255, 256, 260, 261, 300
        constrained, 251, 266
        unconstrained, 253
    Padé approximant, 254, 300
    partial realization problem, 289, 291
    polynomial, 126, 129
    realization, 117, 126, 254, 285
Mismatch theorem, 130
model reduction, 295
modified Gram-Schmidt procedure, 166
module, 354
moment, 139, 162, 166, 190
    generating kernel, 149, 153, 162
    modified, 139
monic, 234
monomial, 407
mPA-Baker approximant, 261
mPA-Frobenius approximant, 255
multiplicity, 124
    geometric, 126
multiresolution, 390
normal data, 360
normalization, 10, 27, 34, 62, 66, 74, 104, 111, 232, 258, 261, 267
    monic, 104, 242, 299
null space, 64, 251
numerator, 161
observability, 119, 285
operator
    convolution, 277
    delay, 274
    down shift, 62, 125, 152, 182
    linear, 166
    projection, 276
    reversal, 63, 182
    shift, 274, 287
    truncation, 69
order, 21, 254, 264
    of approximation, 266
    of multiresolution, 393
order residual, 353
order vector, 353
orthogonal, 104, 119, 136
    on an algebraic curve, 226
output signal, 272, 295
P-fraction, 30
P-part, 29
PA-Baker approximant, 232
PA-Frobenius approximant, 233, 366
Padé
    approximant, 231, 232, 233, 292
        Baker, 300
        Frobenius, 300
        two-point, 200, 292, 304
    table, 232, 263, 382
        antidiagonal, 241, 300
        artificial row, 256
        diagonal, 240, 300
        main diagonal, 239, 265
        nonnormal, 232
        normal, 232
        path, 235
        singular block, 233, 239
        staircase, 241
Padé-Hermite approximant, 382
para-conjugate, 310, 322
    zero, 311
para-even, 310
para-odd, 310
part
    polynomial, 22, 201
    strictly proper, 22
partial correlation coefficient, 323
partial fraction decomposition, 125, 131, 309
partial realization, 23, 271
    mixed problem, 292
past, 273
path, 235
pencil, 81
persymmetry, 182
pivot element, 147
pole, 117, 124, 126, 309
polynomial, 21
    annihilating
        minimal, 250, 270
    biorthogonal, 135, 139
    block orthogonal, 141, 161, 186, 195
    characteristic, 77, 124, 129, 356
    coprime, 118, 232
    fraction, 295
        representation, 283, 295
    inner, 143
    Lanczos, 99
    Laurent, 169, 179, 407
    minimal, 126, 129
    monic, 234
    orthogonal, 135, 138, 239, 251, 294, 302, 304, 306
        on an algebraic curve, 226
    part, 201
    quasi-orthogonal, 147, 197
    second kind, 163
    shifted, 210
    Szegő, 220
    true orthogonal, 144, 184, 195, 252, 296
        monic, 145
    true right orthogonal monic, 154
polynomial part, 22
polyphase matrix, 403
present, 273
prime, 6
principal ideal domain, 355
principal part, 29
principal vector, 124, 132
procedure
    Gram-Schmidt
        modified, 166
    updating, 190
product
    inner, 36
projection, 122, 276
projector, 273
proper fls, 22
proper part, 29
pseudo-inner, 325
pseudo-lossless, 310, 325
quadruple, 280
quasi-Hankel matrix, 169, 173
quasi-orthogonal polynomial, 147, 197
quasi-Toeplitz matrix, 169, 180
quotient
    left, 30
    polynomial, 22
rank, 68, 115, 123, 126
rational
    approximant, 159, 199, 200, 351
    form, 159
    fraction, 232, 233, 243
    function, 115, 160, 201, 232, 285
        strictly proper, 162
reachability, 119, 285
realization, 117, 285
    equivalent, 117, 122, 286
    minimal, 117, 123, 126, 285
    partial, 271
    triple, 118, 120
realization problem, 293, 298
    minimal, 293, 298
    minimal partial, 289, 291, 298
    mixed problem, 300, 304
    partial, 291, 293, 298
reciprocal, 182, 322
recurrence
    Szegő-Levinson, 220
    three term, 201, 210
        generalized, 186, 193
reduced
    H-reduced, 358
    column reduced, 358
    form, 232, 234
reflection coefficient, 158, 225, 323
remainder, 22
    left, 30
representation
    polynomial fraction, 283, 295, 299
reproducing kernel, 150
reproducing property, 151
residual, 206, 243, 266
response
    impulse, 277, 282, 284, 288, 291
    transient, 290
reversal operator, 63, 182
right block, 190
ring
    division, 21
    Euclidean, 3, 55
    integral, 3
    right sided Euclidean, 6
    valuation, 20
Routh array, 316
Routh-Hurwitz test, 228, 306, 334
scaling function, 387
scattering theory, 37
Schur
    algorithm, 158, 381
    function, 324
    parameter, 323, 325
    theorem, 324
Schur complement determinant formula, 146, 150, 161
Schur-Cohn test, 228, 323, 334
self-adjoint, 100
sequence
    Krylov, 100, 107, 135, 166
    one sided infinite, 138
series
    formal Fourier, 149
    formal Newton, 352
    Laurent, 236
    power, 290
    two-sided Laurent series, 274
set
    invariant, 274
    Julia, 169
    time, 272
shift, 274
    operator, 274, 287
    register, 24, 265
    upward, 116
shift invariant, 274
shift-register synthesis, 270
signal, 272, 385, 401, 402
    impulse, 274, 287
    input, 272, 295
    output, 272, 295
similar matrices, 123
simultaneous Padé approximant, 382
singular block, 239, 255, 263
SISO system, 272
size
    block, 141, 193
skew field, 231
space
    dual, 136, 138
    Krylov, 100
    vector, 136
spectrum, 129
stability
    BIBO, 288
    internal, 289
stability check, 306
stacking vector, 62, 139, 151, 198, 201
staircase, 241
state, 278
    space, 278
        description, 280, 287, 297, 300
        triple, 284
    transition matrix, 280
    vector, 280
state space, 118
state space description, 287
strictly proper, 22
structure theorem, 120
Sturm sequence, 309
subspace
    complementary, 119
    invariant, 101, 119, 127, 128
    maximal Krylov, 101
    observable, 119
    reachable, 119
Sylvester determinant formula, 117
Sylvester matrix, 176, 381
symbol
    Hankel, 63
    Toeplitz, 179
synthesis
    shift-register, 270
system
    causal, 273
    constant, 274
    dynamical, 273
    finite-dimensional, 279
    Hankel, 239
    identification, 295
    linear, 271, 272
    SISO, 272
    theory, 254
    time invariant, 274
    Toeplitz, 203
systolic array, 34
Szegő polynomial, 220, 323, 335
Szegő theory, 169
Szegő-Levinson recurrence, 220
table
    Padé, 232, 263
        antidiagonal, 241, 300
        artificial row, 256
        diagonal, 240, 300
        main diagonal, 239, 265
        nonnormal, 232
        normal, 232
        path, 235
        singular block, 233, 239
        staircase, 241
term
    constant, 201
    higher order, 201
    lower degree, 201
three term recurrence, 74, 104, 151, 185, 201, 210
    generalized, 186, 193
time
    continuous, 272, 287
    discrete, 272
    domain, 275
    moment, 290
    set, 272
time coefficient, 290, 298, 301
time invariant system, 274
Toeplitz, 36, 276
    symbol, 179
    system, 203
transfer function, 285, 288, 290, 294, 295, 298
transform
    atomic, 32
    discrete Laplace, 287
    dual Möbius, 17
    Laplace, 287, 402
    Möbius, 15
    z-transform, 274, 287, 402
transient response, 290
true orthogonal polynomial, 144
truncation operator, 69
truncation theorem, 240
two-point Padé approximant, 200, 292, 304
two-sided Laurent series, 274
unimodular matrix, 300, 363
unit, 3
    circle, 169, 185, 226, 306, 382
unreduced form, 234
upward shift, 116
valuation, 20
Vandermonde matrix, 137, 169, 382
vector
    Padé approximant, 163
    principal, 124, 132
    stacking, 139, 151, 198, 201
    state, 280
vector space, 136
Viscovatoff algorithm, 37, 95, 158, 235, 241, 249
VLSI, 34
wavelet domain, 396
wavelet transform, 396
wavelets, 385
weakly normal data, 360
weight, 169
zero, 117, 311
    para-conjugate, 311
zero division, 105
z-transform, 274, 287, 402