DSP SYSTEM DESIGN

Complexity Reduced IIR Filter Implementation for Practical Applications

by
Artur Krukowski, University of Westminster
and
Izzet Kale, University of Westminster

KLUWER ACADEMIC PUBLISHERS
NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 0-306-48708-X
Print ISBN: 1-4020-7558-8
©2004 Springer Science + Business Media, Inc.
Print ©2003 Kluwer Academic Publishers, Dordrecht.
All rights reserved. No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher.
Created in the United States of America.
Visit Springer's eBookstore at: http://www.ebooks.kluweronline.com
and the Springer Global Website Online at: http://www.springeronline.com
Contents

Contributing Authors .......... vii
Preface .......... ix
Symbols and Abbreviations .......... xi
Polyphase IIR Filters .......... 1
Frequency Transformations .......... 99
Filter Implementation .......... 153
VHDL Filter Implementation .......... 199
Appendix .......... 221
References .......... 227
Index .......... 233
Contributing Authors
Artur Krukowski has been with the University of Westminster since 1993, first as an Academic Researcher, then from 1999 as a Post-Doctoral Researcher in Advanced DSP Systems in the Applied DSP and VLSI Research Group, and since 2001 as a permanent member of the research staff. His areas of interest include multi-rate digital signal processing for telecommunication systems, digital filter design and its efficient low-level implementation, integrated circuit design, Digital Audio Broadcasting, teleconferencing, and Internet technologies for teaching.

Izzet Kale has been with the University of Westminster (formerly the Polytechnic of Central London) since 1984. He is currently Professor of Applied DSP and VLSI Systems, leading the Applied DSP and VLSI Research Group at the University of Westminster. His research and teaching activities include digital and analog signal processing, silicon circuit and system design, digital filter design and implementation, and A/D and D/A sigma-delta converters. He is currently working on efficiently implementable, low-power DSP algorithms/architectures and Sigma-Delta modulator structures for use in the communications and biomedical industries.
Preface
This work presents an investigation of a special type of IIR polyphase filter structure, combined with frequency transformation techniques, used for fast multi-rate filtering, and its application to custom fixed-point implementation. Despite a lot of work having been done on these subjects, there are still many unanswered questions. While detailed coverage of all these questions in a single text is impossible, an honest effort has been made in this research monograph to address the exact analysis of polyphase IIR structures and the issues associated with their efficient implementation. A detailed theoretical analysis of the polyphase IIR structure is presented for two and three coefficients in the two-path arrangement. This is then generalized for arbitrary filter order and any number of paths. The use of polyphase IIR structures in decimation and interpolation is presented, and their performance is assessed in terms of the number of calculations required for a given filter specification and the simplicity of implementation. Specimen decimation filter designs to be used in Sigma-Delta lowpass and bandpass A/D converters are presented, which outperform traditional approaches. A new exact multi-point frequency transformation approach for an arbitrary choice of frequencies is suggested and evaluated. This frequency transformation is applied to the example of a multi-band filter based on the polyphase IIR structure. Such filters substantially improve upon the standard techniques in terms of band-to-band oscillations, overall filter order, passband ripples and calculation burden for a given filter specification. A new "bit-flipping" algorithm has been developed to aid filter design when the coefficient wordlength is constrained. Also, the standard (floating-point) Downhill Simplex Method was modified to operate with a constrained coefficient wordlength. The performance of both these advances is evaluated on a number of examples of polyphase filters. Novel decimation and interpolation structures are proposed which can be implemented very efficiently. These allow an arbitrary-order IIR anti-aliasing filter to operate at the lower rate of the decimator/interpolator. Similar structures for polyphase IIR decimators/interpolators are discussed too. A new approach to digital filter design and implementation is suggested which speeds up the silicon implementation of designs developed in Matlab. A Matlab program has been developed which takes the Simulink block description and converts it into a VHDL description. This in turn can be compiled, simulated, synthesized and fabricated without the need to go through the design process twice, first for the algorithmic/structural design and then for the implementation. The design was tested on the example of a 14-bit two-path two-coefficient polyphase filter. The structural Simulink design was converted into VHDL and compared bit-to-bit. This research monograph resulted from a doctoral study completed by the first author at the University of Westminster, London, UK, under the supervision of the second author, Prof. Izzet Kale.
Symbols and Abbreviations

δ(k)    Delta sequence, equal to 1 for k=0 and zero for k≠0
n(k)    Noise
u(k)    Step function
e(k)    Error signal
        Update weight coefficient
z^(-1)  Unit delay operator
H(z)    Transfer function of the discrete-time filter
A(z)    Transfer function of an allpass filter
        Z transfer function of the bandpass filter
        Z transfer function of the notch filter
        Normalized frequency
FFT     Fast Fourier Transform
FST     Fourier Summation Transform
DFT     Discrete Fourier Transform
dB      Decibel
SNR     Signal-to-Noise Ratio
rms     Root Mean-Square
exp     Exponential function e^x
FIR     Finite Impulse Response
IIR     Infinite Impulse Response
LPF     LowPass Filter
HPF     HighPass Filter
BPF     BandPass Filter
LMS     Least-Mean-Square
PZP     Pole-Zero Pattern
VLSI    Very Large Scale Integration
VHSIC   Very High Speed Integrated Circuit
VHDL    VHSIC Hardware Description Language
HTML    Hyper Text Mark-up Language
M       Lowpass oversampling ratio
R       Bandpass oversampling ratio
L       Lowpass interpolation ratio
MSB     Most Significant Bit
LSB     Least Significant Bit
A/D     Analogue-to-Digital
ADC     Analogue-to-Digital Converter
DAC     Digital-to-Analogue Converter
SRD     Sample Rate Decreaser
SRI     Sample Rate Increaser
MIP     Minimum Phase filter
MAP     Maximum Phase filter
LPH     Linear Phase
MOS     Metal Oxide Semiconductor technology
COMB    Filter described by an Nth-order difference equation
MAVR    Moving AVerage filter
SLINK   Sink function defined for discrete signals
PCM     Pulse Code Modulation
PDM     Pulse Density Modulation
AM      Amplitude Modulation
FM      Frequency Modulation
AGDP    Arbitrary Group Delay filter
        Sampling frequency
LLFT    Lowpass-to-Lowpass Frequency Transformation
AA      Anti-Aliasing
ΣΔ      Sigma-Delta
PC      Personal Computer
TB      Transition Bandwidth
ALU     Arithmetic-Logic Unit
CDSM    Constrained Downhill Simplex Method
CSDC    Canonic Signed Digit Code
NBC     Natural Binary Code
FSD     Fractional Sample Delayer
IEE     Institution of Electrical Engineers
IEEE    Institute of Electrical and Electronics Engineers
Z^(-1)  Inverse Z-transform
ZII     Zero-Insert Interpolation
Chapter 1

POLYPHASE IIR FILTERS
Design and Applications

1. OVERVIEW OF POLYPHASE IIR FILTERS
The idea of a polyphase structure can be easily derived from an FIR filter if all coefficients are substituted by appropriate allpass subfilters with constant unity gain and frequency dependent phase shifts leading to the structure as presented in Figure 1-1(a).
Each allpass has a different, carefully designed phase response; this is why the structure is called polyphase. The frequency-selective characteristics of the filter are due to the phase shifts between consecutive branches (Figure 1-1(b)). This book concentrates on the special class of polyphase structures in which the AllPass Filters (APF) are made of one-coefficient allpass sections (2), where N is the number of branches in the structure:

    A(z) = (α + z^(-N)) / (1 + α·z^(-N))                                (2)

It is worth noting that their impulse responses are sequences with non-zero samples only at indices that are multiples of N, and that the step response changes its value once every N samples. The impulse and step responses, and the average time delay, indicate that such allpass filters may be very useful for decimation by a factor of N. The general transfer function of the polyphase structure presented in Figure 1-1 is [2]:

    H(z) = (1/N) · SUM(n=0..N-1) z^(-n) · A_n(z^N)                      (1)
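The two properties just mentioned can be checked numerically. The sketch below is mine, not from the book (function and variable names are assumptions): it evaluates a one-coefficient allpass section A(z) = (α + z^(-N))/(1 + α·z^(-N)) on the unit circle and runs its difference equation to confirm the unity magnitude and the N-spaced impulse response.

```python
import cmath

def allpass_response(alpha, N, w):
    """Frequency response of A(z) = (alpha + z^-N) / (1 + alpha*z^-N) at w rad/sample."""
    zN = cmath.exp(-1j * N * w)
    return (alpha + zN) / (1 + alpha * zN)

def allpass_impulse(alpha, N, length):
    """Impulse response via the difference equation y[k] = alpha*x[k] + x[k-N] - alpha*y[k-N]."""
    y = []
    for k in range(length):
        x_k = 1.0 if k == 0 else 0.0
        x_kN = 1.0 if k == N else 0.0
        y_kN = y[k - N] if k >= N else 0.0
        y.append(alpha * x_k + x_kN - alpha * y_kN)
    return y

alpha, N = 0.5, 2
# Unity magnitude at every frequency (the allpass property).
mags = [abs(allpass_response(alpha, N, 0.01 + 0.1 * i)) for i in range(30)]
# Impulse response is non-zero only at sample indices 0, N, 2N, ...
h = allpass_impulse(alpha, N, 20)
```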
Allpass filters A(z) have N poles at the solutions of z^N = -α and N zeros at the reciprocal locations z^N = -1/α. To ensure absolute stability of the overall polyphase structure, the absolute value of each coefficient should be less than unity, |α| < 1. Substituting (2) into (1), the overall transfer function of the polyphase structure becomes:

    H(z) = (1/N) · SUM(n=0..N-1) z^(-n) · PROD(k=1..K_n) (α_{n,k} + z^(-N)) / (1 + α_{n,k}·z^(-N))      (3)
The frequency response is obtained by evaluating (3) on the unit circle, z = e^(jω) [3]:

    H(e^(jω)) = (1/N) · SUM(n=0..N-1) e^(-jωn) · A_n(e^(jNω))           (4)

The phase response of the one-coefficient allpass section, defined by (5),

    φ(ω) = -N·ω + 2·arctan[ α·sin(N·ω) / (1 + α·cos(N·ω)) ]             (5)

(see Figure A.3 and Figure A.4 in Appendix A), is a monotonic function of the normalized frequency and ranges from zero at DC to an N-multiple of -π at Nyquist. By proper choice of coefficients, the phase shift of each branch of the N-path structure can be designed to match the other branches, or to differ from them by multiples of 2π/N, in certain regions of the frequency axis. Such a situation for N=2 is shown in Figure 1-2. The phase characteristics of both paths overlap (in-phase) at low frequencies and are displaced by π (out of phase) at high frequencies.
The two-path polyphase lowpass filter is formed as a sum of two parallel allpass filters with phase shifts carefully designed to add constructively in the passband and destructively in the stopband. It results in the magnitude response function of the half band filter presented in Figure 1-2(c). The proposed two-path polyphase filter structure offers very desirable properties in comparison to most published decimation and interpolation filters used in Sigma-Delta data converter implementations.
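This constructive/destructive addition can be illustrated with a short sketch (mine, not from the book; the branch coefficients are illustrative, not an optimized design): it sums the two allpass branches and checks that the gain is unity at DC and zero at Nyquist.

```python
import cmath, math

def section(alpha, w):
    # One-coefficient allpass section A(z^2) evaluated at z = e^{jw}
    z2 = cmath.exp(-2j * w)
    return (alpha + z2) / (1 + alpha * z2)

def halfband(w, branch0, branch1):
    """Two-path polyphase lowpass: H = 0.5*(A0(z^2) + z^-1 * A1(z^2))."""
    a0 = 1.0
    for al in branch0:
        a0 *= section(al, w)
    a1 = cmath.exp(-1j * w)
    for al in branch1:
        a1 *= section(al, w)
    return 0.5 * (a0 + a1)

# Illustrative one-coefficient-per-branch example
b0, b1 = [0.105573], [0.527864]
gain_dc = abs(halfband(1e-9, b0, b1))    # phases add constructively: gain 1
gain_nyq = abs(halfband(math.pi, b0, b1))  # phases cancel: gain 0
```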
1.1 Two-Path Structure
The general transfer function of the halfband lowpass filter presented in Figure 1-2 can be derived from (3) by substituting N=2:

    H(z) = (1/2) · [ A_0(z^2) + z^(-1)·A_1(z^2) ]                       (6)

where A_0(z^2) and A_1(z^2) are the cascades of one-coefficient allpass sections in the upper and lower branches respectively.
This equation describes a stable system provided that the absolute values of all coefficients are less than unity, |α_k| < 1. It is clear that the order of the filter is 2·(K_0 + K_1) + 1, where K_0 and K_1 are the numbers of cascaded allpass sections in the upper and lower branches respectively. The frequency response can be calculated by evaluating (6) on the unit circle [3]:

    H(e^(jω)) = (1/2) · [ e^(j·θ_0(ω)) + e^(j·θ_1(ω)) ]                 (7)

Phase responses of both branches are:

    θ_0(ω) = SUM(k=1..K_0) { -2ω + 2·arctan[ α_{0,k}·sin(2ω) / (1 + α_{0,k}·cos(2ω)) ] }
    θ_1(ω) = -ω + SUM(k=1..K_1) { -2ω + 2·arctan[ α_{1,k}·sin(2ω) / (1 + α_{1,k}·cos(2ω)) ] }       (8)

Very important for the quality of decimation is the magnitude response of the filter. Its shape is the cosine-like function derived from (7) and given by:

    |H(e^(jω))| = | cos( (θ_0(ω) - θ_1(ω)) / 2 ) |                      (9)
Example magnitude response for the seventh-order (three-coefficient) LPF is presented in Figure 1-3.
Deep minima of the magnitude response at normalized frequencies above the cutoff are due to all zeros being placed in the left-hand half of the z-plane, close to or on the unit circle. Considering the properties of the allpass sub-filter phase response, given by (A.6a)-(A.6c) in Appendix A, the lowpass filter magnitude response has the following properties [3]:

    (a) the gain is exactly unity at DC,
    (b) the gain is exactly zero at Nyquist,
    (c) the filter is power-complementary with its highpass counterpart,
    (d) |H(e^(jω))|^2 + |H(e^(j(π-ω)))|^2 = 1 (symmetry about half-Nyquist).        (10)
Specifications for a half-Nyquist LPF having the transition band (ω_p, ω_s) are given in terms of δ_p and δ_s, the peak ripples in the passband and in the stopband respectively. Both δ_p and δ_s should be as small as possible, preferably close to zero. This requirement is satisfied exactly only at DC and at Nyquist, due to properties (a) and (b) in (10). Symmetry property (d) in (10) means that if ω_min is the passband frequency at which the magnitude response is minimum, then the stopband frequency at which the magnitude response is maximum is π - ω_min. For least-squares optimal equiripple magnitude performance in both passband and stopband, the gain must equal 1 - δ_p at the passband cutoff frequency ω_p, and achieve the stopband maximum δ_s at the stopband cutoff frequency ω_s. The cutoff frequencies are related by the equation:

    ω_s = π - ω_p                                                       (11)

Using (11) and property (d) we obtain the relation between passband and stopband ripples:

    (1 - δ_p)^2 + δ_s^2 = 1                                             (12)

so that

    δ_p = 1 - sqrt(1 - δ_s^2)                                           (13)

As for any practical filter δ_s << 1, (13) approximates to:

    δ_p ≈ δ_s^2 / 2                                                     (14)
In practical cases it is possible to concentrate only on achieving the stopband specification, because in view of (14) a reasonable stopband performance guarantees an even better passband performance. If, for example, the stopband ripple is δ_s = 10^(-3) (60dB attenuation), then the resulting passband ripple is only δ_p ≈ 5·10^(-7) and the minimum passband gain is 1 - 5·10^(-7) (about -4.3·10^(-6) dB).
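The sketch below (mine; the coefficients are illustrative assumptions, not a design from the book) verifies the symmetry property behind this relation numerically, and evaluates the numeric consequence of (14) for a 60 dB stopband.

```python
import cmath, math

def halfband(w, a0=0.105573, a1=0.527864):
    # Two-path halfband LPF: H = 0.5*(A0(z^2) + z^-1*A1(z^2)); coefficients assumed
    z2 = cmath.exp(-2j * w)
    A0 = (a0 + z2) / (1 + a0 * z2)
    A1 = (a1 + z2) / (1 + a1 * z2)
    return 0.5 * (A0 + cmath.exp(-1j * w) * A1)

# Power symmetry about half-Nyquist: |H(w)|^2 + |H(pi-w)|^2 = 1,
# which is where the ripple relation d_p ~ d_s^2/2 comes from.
checks = []
for i in range(1, 10):
    w = i * math.pi / 20
    checks.append(abs(halfband(w))**2 + abs(halfband(math.pi - w))**2)

# Numeric consequence of (14): a 60 dB stopband (d_s = 1e-3)
# implies a passband ripple of about 5e-7.
d_s = 1e-3
d_p = 1 - math.sqrt(1 - d_s**2)
```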
The phase response of the two-path LPF can be derived from (7):

    φ_H(ω) = (θ_0(ω) + θ_1(ω)) / 2 + π·r(ω)                             (15)

The factor r(ω) represents phase jumps due to sign changes of the cosine function in (9), corresponding to the frequencies of the filter zeros. Those phase jumps can be seen in the filter stopband in Figure 1-3. The linear-phase requirement is very important in the design of digital filters: when processing broad-band signals, much effort is invested in keeping the filter phase as linear as possible, to prevent the harmonics of the useful signal from being delayed by different amounts of time. The group delay function, τ(ω), is a good measure of phase linearity. It can be derived from its definition (16):

    τ(ω) = -dφ_H(ω)/dω                                                  (16)

Substituting (8) and (15) into (16) gives:

    τ(ω) = 1/2 - (1/2)·SUM(k=1..K) d/dω{ -2ω + 2·arctan[ α_k·sin(2ω) / (1 + α_k·cos(2ω)) ] } - π·dr(ω)/dω     (17)

where the sum runs over all K = K_0 + K_1 coefficients.
Simplifying:

    τ(ω) = 1/2 + SUM(k=1..K) (1 - α_k^2) / (1 + 2·α_k·cos(2ω) + α_k^2) + π·SUM(i) δ(ω - ω_i)        (18)

where the ω_i are the stopband frequencies at which the phase jumps occur.
Equations (17) and (18) contain the sum of delta functions located at frequencies at which phase jumps occur. As they only occur in the filter stopband, they do not cause any problems for the signal frequencies in the filter passband. The group delay function for the example seventh-order lowpass filter is presented in Figure 1-5.
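The passband group delay can be reproduced numerically. The sketch below (mine, with assumed illustrative coefficients) differentiates the phase in the passband, away from the stopband jumps, and compares it with the closed-form part of (18) without the delta terms.

```python
import cmath, math

def halfband(w, a0=0.105573, a1=0.527864):
    # Illustrative two-path halfband filter (coefficients assumed)
    z2 = cmath.exp(-2j * w)
    A0 = (a0 + z2) / (1 + a0 * z2)
    A1 = (a1 + z2) / (1 + a1 * z2)
    return 0.5 * (A0 + cmath.exp(-1j * w) * A1)

def group_delay(w, dw=1e-6):
    # tau(w) = -d(phase)/dw, central difference (valid in the passband,
    # away from the stopband phase jumps)
    p1 = cmath.phase(halfband(w - dw))
    p2 = cmath.phase(halfband(w + dw))
    return -(p2 - p1) / (2 * dw)

def group_delay_closed(w, coeffs=(0.105573, 0.527864)):
    # Closed form (18) without the delta terms:
    # tau = 1/2 + sum_k (1 - a^2) / (1 + 2a*cos(2w) + a^2)
    return 0.5 + sum((1 - a * a) / (1 + 2 * a * math.cos(2 * w) + a * a) for a in coeffs)

tau_dc = group_delay(0.01)              # near DC: small, nearly constant delay
tau_half = group_delay(math.pi / 2 - 0.01)  # peaks near half-Nyquist
```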
The function is even-symmetric with respect to DC and half-Nyquist and positive for all frequencies. It can be seen from the group delay that the phase response is linear only close to DC and close to Nyquist. It peaks at half-Nyquist, where the phase response has its inflection point. It can also be seen from (18) that the larger the coefficient values and the number of coefficients, the larger the group delay peak and the more non-linear the phase response becomes. The design algorithm for the floating-point version of this class of two-path polyphase structure has been developed by Harris and Constantinides in [1], [4]-[5] and is based on the analogy to analog elliptic filters. The design for more than two paths and for fixed-point coefficients will be described later in Chapter 1 and in Chapter 3 respectively. The structure described above offers a significant five-to-one saving in multiplications in comparison to the equivalent elliptic filter: a fifth-order two-path structure requires only two coefficients, as opposed to the ten coefficients of the direct implementation. As for any filter design of fixed order, the transition width can be traded for the stopband attenuation, or it can be reduced for a fixed attenuation by increasing the number of filter coefficients. An approximate relationship linking the attenuation A (in dB), the normalized transition bandwidth TB and the number of allpass segments N (including the delay) has been published in [1], [4] for the equiripple filter.

1.1.1 One-coefficient two-path polyphase IIR lowpass filter
The one-coefficient two-path LPF has one allpass filter in the upper branch and a delayer in the lower one, as shown in Figure 1-6.
The transfer function of this simple structure is:

    H(z) = (1/2) · [ (α + z^(-2)) / (1 + α·z^(-2)) + z^(-1) ]
         = (α + z^(-1) + z^(-2) + α·z^(-3)) / (2·(1 + α·z^(-2)))        (20)
It can be seen that the poles of the LPF are located identically to the poles of the allpass filter. There is a zero at Nyquist, due to the delay in the lower branch, which forces the frequency response to zero at z = -1. The other two zeros form a conjugate pair located on the unit circle. Their exact location can be determined by factoring the numerator of (20):

    α + z^(-1) + z^(-2) + α·z^(-3) = (1 + z^(-1)) · (α + (1-α)·z^(-1) + α·z^(-2))        (21)

Comparing the second-order term of (21) with the transfer function of a second-order FIR filter, α·(1 - z_1·z^(-1))·(1 - z_2·z^(-1)), allows the zero locations to be determined in terms of the coefficient value. Solving for z_1 and z_2:

    z_{1,2} = [ -(1-α) ± sqrt( (1-α)^2 - 4·α^2 ) ] / (2·α)              (22)

Equivalently, in terms of the magnitude and phase of the roots, for α ≥ 1/3:

    |z_{1,2}| = 1,   cos(θ) = -(1-α) / (2·α)                            (23)
If conjugate zeros are located near the Nyquist frequency they can produce enhanced attenuation in the stopband. The attenuation and transition band in terms of the coefficient value for this filter are presented in Figure 1-7 and Figure 1-8 respectively.
Clearly, the requirements of large stopband attenuation and a narrow transition band are mutually exclusive: for attenuations above 60dB the transition band is forced to be wide. The disadvantage of a one-coefficient filter is that, in order to achieve high attenuation, the coefficient must be represented very accurately. For example, an 11-bit coefficient representation is required to achieve 96dB attenuation (over 16 bits to achieve 120dB).
Equations (20) and (22) prove that all the zeros of the one-coefficient filter are on the unit circle as long as (1-α)^2 ≤ 4·α^2, i.e. as long as the coefficient satisfies α ≥ 1/3. If the coefficient value reaches 1/3, the filter becomes a third-order integrator and both conjugate zeros reach the z = -1 point. Decreasing the coefficient value below 1/3 makes the zeros split into two real ones, sliding along the left-hand real axis as seen in Figure 1-9.
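These three regimes can be verified directly from the root formula (a sketch of mine, assuming the factored numerator derived above):

```python
import cmath

def numerator_zeros(alpha):
    """Non-Nyquist zeros of H(z) in (20): after factoring out (1 + z^-1),
    the remaining factor is alpha*z^2 + (1-alpha)*z + alpha."""
    disc = cmath.sqrt((1 - alpha)**2 - 4 * alpha * alpha)
    z1 = (-(1 - alpha) + disc) / (2 * alpha)
    z2 = (-(1 - alpha) - disc) / (2 * alpha)
    return z1, z2

# alpha > 1/3: conjugate pair on the unit circle
z1, z2 = numerator_zeros(0.5)
# alpha = 1/3: double zero at z = -1 (the third-order integrator case)
w1, w2 = numerator_zeros(1.0 / 3.0)
# alpha < 1/3: zeros split into a reciprocal pair on the negative real axis
r1, r2 = numerator_zeros(0.2)
```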
1.1.2 Two-coefficient two-path polyphase IIR lowpass filter
The two-coefficient structure is presented in Figure 1-10. Filters A_0(z^2) and A_1(z^2) are one-coefficient second-order allpass subfilters.
The transfer function of such a polyphase LPF is:

    H(z) = (1/2)·[ (α_0 + z^(-2))/(1 + α_0·z^(-2)) + z^(-1)·(α_1 + z^(-2))/(1 + α_1·z^(-2)) ]      (25)
This transfer function can be rearranged into a product form showing the pole-zero locations. It is easy to find the poles of the overall transfer function, as they are the same as those of the allpass subfilters composing the filter, except for an additional pole at the origin (due to the delayer in the lower branch). An equivalent expanded form of (25) is the following:

    H(z) = [ α_0 + α_1·z^(-1) + (1 + α_0·α_1)·z^(-2) + (1 + α_0·α_1)·z^(-3) + α_1·z^(-4) + α_0·z^(-5) ]
           / [ 2·(1 + α_0·z^(-2))·(1 + α_1·z^(-2)) ]                    (26)
It is well known that the roots of any fourth-order symmetrical polynomial can be found analytically [6]. Factoring the Nyquist zero (1 + z^(-1)) out of the numerator of (26) leaves the symmetric quartic (with x = z^(-1)):

    P(x) = α_0·x^4 + (α_1 - α_0)·x^3 + (1 + α_0 + α_0·α_1 - α_1)·x^2 + (α_1 - α_0)·x + α_0      (27)

Substituting w = x + 1/x reduces (27) to the quadratic:

    α_0·w^2 + (α_1 - α_0)·w + (1 - α_0)·(1 - α_1) = 0                   (28)

whose roots are:

    w_{1,2} = [ (α_0 - α_1) ± sqrt( (α_1 - α_0)^2 - 4·α_0·(1 - α_0)·(1 - α_1) ) ] / (2·α_0)     (29)

Each w then maps back into a pair of zeros of (27) through:

    x = [ w ± sqrt(w^2 - 4) ] / 2                                       (30)
This result gives two conjugate pairs of zeros. In order to achieve the maximum stopband attenuation for a given transition band, the zeros of the filter should lie on the unit circle. This requires each w in (29) to be real and negative, and hence the argument under the square root in (29) must be non-negative:

    (α_1 - α_0)^2 ≥ 4·α_0·(1 - α_0)·(1 - α_1)                           (31)

After rearranging we get:

    α_1^2 + 2·α_0·(1 - 2·α_0)·α_1 + α_0·(5·α_0 - 4) ≥ 0                 (32)

Considering that the coefficient values are positive, we get:

    α_1 ≥ α_0·(2·α_0 - 1) + 2·(1 - α_0)·sqrt( α_0·(1 + α_0) )           (33)

Rearranging, the equality case of (33) defines the boundary curve of double zeros on the unit circle:

    α_1 = α_0·(2·α_0 - 1) + 2·(1 - α_0)·sqrt( α_0·(1 + α_0) )           (34)

For coefficient pairs satisfying (34) both sides of (33) are equal and there are two conjugate pairs of double zeros on the unit circle. When the coefficient values change within the allowed range (0...1), the placement of the filter zeros changes dramatically, as can be seen in Figure 1-11 [7]-[8]. The shaded area represents coefficients for which all filter zeros are on the unit circle. The grid points inside the area represent the cases of four-bit long coefficient pairs; the bold ones have no more than two bits set in the entire coefficient in either Unsigned (0,1) or Signed Binary Code (-1,0,1). For one of the coefficients equal to zero the filter reduces to the single-coefficient case (a third-order transfer function).
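As a numerical check of the double-zero boundary (a sketch of mine, using the palindromic reduction derived above), the four-bit coefficient pair (1/8, 9/16) can be shown to sit exactly on it: the discriminant of the reduced quadratic vanishes and all four remaining zeros lie on the unit circle.

```python
import cmath

def quartic_zero_check(a0, a1):
    """For the two-coefficient two-path filter, after removing the Nyquist zero,
    the numerator is the palindromic quartic
      a0*x^4 + (a1-a0)*x^3 + (1+a0+a0*a1-a1)*x^2 + (a1-a0)*x + a0.
    Substituting w = x + 1/x reduces it to a0*w^2 + (a1-a0)*w + (1-a0)*(1-a1)."""
    disc = (a1 - a0)**2 - 4 * a0 * (1 - a0) * (1 - a1)
    w1 = (-(a1 - a0) + cmath.sqrt(disc)) / (2 * a0)
    w2 = (-(a1 - a0) - cmath.sqrt(disc)) / (2 * a0)
    zeros = []
    for w in (w1, w2):
        s = cmath.sqrt(w * w - 4)   # map w back to x = (w +/- sqrt(w^2-4))/2
        zeros += [(w + s) / 2, (w - s) / 2]
    return disc, zeros

disc, zeros = quartic_zero_check(1.0 / 8.0, 9.0 / 16.0)
```

A vanishing discriminant means w is a double root, so the two conjugate pairs of zeros coincide: the double-zero case marked in Figure 1-11.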
The area is bounded from one side by (34) and from the other by the condition requiring the argument under the square root in (30) to be less than or equal to zero (w^2 ≤ 4), which forces all four zeros to be complex:

    α_1 = (5·α_0 + 1) / (3 - α_0)                                       (35)

Trajectories of equations (34) and (35) have their contact point (they touch, but do not cross each other) at α_0 ≈ 0.1056 and α_1 ≈ 0.5279. For these values of α_0 and α_1 all five zeros of the transfer function are nailed at Nyquist. The contact point can be thought of as the "focal" point of the filter. Moving away from this point into any of the four areas C0, C1, C2 or C3 results in different filter behaviors. Choosing coefficients from area C0 (the shaded one) results in two conjugate pairs of zeros moving from Nyquist along the unit circle towards the half-Nyquist point, which they reach when both coefficients are equal to one. The area C0 is bounded by the two curves of coefficients for which the zeros form either single (35) or double (34) conjugate pairs on the unit circle (all the remaining ones being at Nyquist). When coefficients are chosen from area C1 there are two pairs of zeros moved away from Nyquist: a pair of conjugate zeros sliding on the unit circle and a pair of reciprocal zeros on the negative
real axis. This area is bounded by the condition (35) of single zeros on the unit circle (the remaining ones at Nyquist), and by the general requirement of limiting the coefficient values to the (0...1) range. For coefficients chosen from area C2 the zeros of the transfer function stay on the negative real axis, split into two reciprocal pairs, with a single zero remaining at Nyquist. This area is bounded by the condition of a single reciprocal pair of zeros on the negative real axis, following from (35) (all remaining ones staying at Nyquist), and the condition requiring two reciprocal pairs of zeros on the negative real axis, following from (34) (a single zero staying at Nyquist). If coefficients are chosen from area C3, then four zeros of the transfer function are arranged into a reciprocal conjugate quartet, with only a single zero remaining at Nyquist. The area is bounded by the condition (34) requiring two pairs of conjugate zeros to stay on the unit circle, and the general requirement of limiting the coefficient values to the (0...1) range. There is also a fifth area, C4, lying far away from the focal point. It represents coefficients for which four zeros of the transfer function form two conjugate pairs on the unit circle, but on the right-hand side of the imaginary axis. An interesting example of a double zero is the case marked in Figure 1-11. Its coefficients (1/8, 9/16) are only four bits long and have only three bits set in total, making it an attractive case for implementation. Establishing the bounded area of coefficients for which all zeros are on the unit circle is very important, not only for understanding the behavior of the filter: it can also be used by the constrained filter design algorithms (described in Chapter 3), as the principle of their operation is a structured search within the established boundaries.

1.1.3 Two-path lowpass filters with more than two coefficients
The filter having three or more coefficients becomes too complicated, and the equations describing its pole-zero locations become too complex, to be solved analytically in the same manner as for the two-coefficient filter. Such an analysis has to be done numerically. An algorithm has been developed to aid this analysis; it returns the coefficients of the polyphase structure for given zero or pole locations of the filter, and it works for any number of filter coefficients. For the case when the pole locations are known, finding the filter coefficient values is trivial: they are simply equal to the radii of the filter poles raised to the Nth power. Depending on the number of paths, the zeros of the polyphase structure can take any of a number of permissible symmetrical dispositions. Six example zero patterns are shown
for the two path two-coefficient filter case in Figure 1-12. All the zeros of the polyphase LPF can take any combination of these dispositions.
Determining the coefficient values from the zero locations is especially useful for specifying the bounded area of filter coefficients for which the zeros are on the unit circle. This information is extremely useful and can be used as the starting point by floating-point filter design algorithms like Powell or Simulated Annealing [9], as well as by any constrained filter design algorithm, like bit-flipping or the Constrained Downhill Simplex (described in Chapter 3), in which the design is based on a search within a bounded space. Analytic methods of finding the area of coefficient values for which the polyphase filter zeros are on the unit circle, thus allowing one to achieve the maximum attenuation for the given filter order, proved not to be practical for more than three coefficients: the equation manipulations involved make such an approach computationally impractical and very difficult to carry through analytically. Alternatively, the required range of coefficients can be found by a direct numerical method. Let us now have a look at the general transfer function of the N-path polyphase structure:

    H(z) = (1/N) · SUM(n=0..N-1) z^(-n) · PROD(k=1..K_n) (α_{n,k} + z^(-N)) / (1 + α_{n,k}·z^(-N))      (36)
It can be noticed from (36) that the filter denominator has non-zero coefficient values only at integer powers of z^(-N), with K = K_0 + ... + K_{N-1} being the total number of filter coefficients. For every path, the bracketed expression in the numerator is a product of (α + z^(-N)) and (1 + α·z^(-N)) type terms, which when multiplied together gives non-zero values only at powers of z^(-N) (just like the filter denominator). The z^(-n) term shifts the exponents of those products by a different integer for every path. As a result, when the transfer functions of all path subfilters are added together, only one sub-filter contributes a non-zero factor at each power of z^(-1). One should therefore pay careful attention to the way the product for each path is created. All of them are created from K one-coefficient blocks, one for each filter coefficient: either (α_k + z^(-N)), coming from the numerator, or (1 + α_k·z^(-N)), coming from the denominator of the respective allpass section. This means that if the locations of the zeros of the filter are known, the procedure for finding the filter coefficients becomes very simple, as follows:

1. Create the filter numerator from its zeros by convolving together the (1 - z_k·z^(-1)) terms over all zeros z_k of the transfer function.
2. Take every Nth coefficient of the numerator polynomial (one residue class of the exponents) and use them to create a new polynomial in terms of w = z^(-N).
3. Calculate the roots of the new polynomial.
4. If any root magnitude is greater than unity, take its reciprocal.
5. The root magnitudes obtained are then equal to the polyphase filter coefficients.

To show this method in practice, it is used first to calculate the coefficients of the polyphase structure for which all zeros are located at Nyquist. Such a filter is very suitable as the initial point of optimization algorithms in polyphase filter design. Results from trial runs of the algorithm are summarized in Table 1-1 for the two-path polyphase filter having different numbers of coefficients.
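The steps above can be sketched for the two-path, all-zeros-at-Nyquist case (my own helper, assuming the procedure as listed; only K = 1 and K = 2 are handled, so stdlib closed-form root formulas suffice):

```python
import math

def nyquist_case_coeffs(K):
    """Steps 1-5 for a two-path filter with all 2K+1 zeros at z = -1 (K = 1 or 2).
    The numerator is (1 + x)^(2K+1); its even-indexed coefficients form the
    polynomial in w = z^-2 whose root magnitudes give the filter coefficients."""
    n = 2 * K + 1
    binom = [math.comb(n, i) for i in range(n + 1)]   # step 1: numerator from zeros
    even = binom[0::2]                                # step 2: polynomial in w = z^-2
    if K == 1:                                        # step 3: roots of that polynomial
        roots = [-even[0] / even[1]]
    else:
        c0, c1, c2 = even[0], even[1], even[2]
        d = math.sqrt(c1 * c1 - 4 * c2 * c0)
        roots = [(-c1 + d) / (2 * c2), (-c1 - d) / (2 * c2)]
    coeffs = []
    for r in roots:                                   # steps 4-5: reciprocate, take magnitudes
        m = abs(r)
        coeffs.append(1 / m if m > 1 else m)
    return sorted(coeffs)

c1 = nyquist_case_coeffs(1)   # the single-coefficient case: alpha = 1/3
c2 = nyquist_case_coeffs(2)   # two coefficients, roughly 0.1056 and 0.5279
```

For K = 1 this reproduces the α = 1/3 integrator case found earlier; for K = 2 it yields the coefficient pair at the "focal" point of the two-coefficient filter.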
The same method can be used to specify the bounded area of the two-path, three-coefficient polyphase filter for which all zeros lie on the unit circle, thus allowing the maximum possible stopband attenuation to be achieved. Such an approach is especially useful for the case of constrained coefficients, for which the equiripple stopband can only be approximated.
The method for finding filter coefficients from known (valid) zero locations can be used to analyze the polyphase structure, in order to identify the space of coefficients in which the desired filter solutions reside. This may not be very important for floating-point designs, considering the method suggested by Constantinides and Harris in [4], but it is very useful for algorithms searching in the space of constrained coefficients. In such a case the optimum solution in terms of maximum attenuation is no longer identical to the floating-point one.
The visualization of zero placement for the two-path, three-coefficient polyphase LPF is more difficult than for the two-coefficient case, as shown in Figure 1-13. It can be noticed that the coefficient space corresponding to zeros on the unit circle is only a small part of the total volume of possible coefficient combinations. This volume is bounded by three curves defining the cases of all zeros, single, double and triple, either on the unit circle or at Nyquist. Those curves converge to the point where all zeros reside at Nyquist. The surface established by the curves of single and double zeros on the unit circle represents the case where one conjugate pair of zeros migrates from Nyquist towards the single zeros, in the end converging into double zeros. In the same way, the surface spread between the curves of double and triple zeros on the unit circle describes the case where a conjugate pair of zeros moves from Nyquist towards the double zeros, creating triple zeros on the unit circle. The third surface, bounded by the curves of single and triple zeros, describes filter cases having a double conjugate pair of zeros moving from Nyquist towards the other pair of single zeros, converging into triple zeros. It is also important to know what happens when coefficients are chosen from outside the described volume. In that case some or all of the zeros no longer reside on the unit circle. This is very often the case when filter coefficients are constrained to a short wordlength. The curves of equiripple stopband attenuation for the one, two and three-coefficient cases, indicated in Figure 1-13, lie very close to the boundaries of the volume. As a result, when the coefficients are constrained they are very likely to fall outside the volume, resulting in zeros off the unit circle. It is characteristic that when zeros fall off the unit circle, they do so in pairs: for each such pair, one zero falls outside and one inside the unit circle (to keep the transfer function real), both at the same angle (frequency), forming a conjugate quartet together with their conjugates. The shaded triangular shapes in Figure 1-13 represent cases for which the filter has a constant transition band (not an optimum case in terms of stopband attenuation). Thick lines represent optimum filter design cases in terms of equiripple passband and stopband ripples for different transition bands. These were obtained from the standard polyphase LPF floating-point filter design algorithm [1], [4]-[5]. It is easy to notice that the coefficients of the top-branch allpass filter can be exchanged without affecting the behavior of the filter. Therefore there should be two such volumes, of which only one is shown in Figure 1-13. First, both have exactly the same shape, causing the same filter behavior, so showing both would convey no extra information. Secondly, showing a single volume allows the user to zoom into it; otherwise it would be very small and barely visible inside the full cube.
Using the analogy to the three-coefficient case, it is very easy to predict how the filter behaves for four coefficients and how the volume of coefficients ensuring the placement of all zeros on the unit circle would look, even if it is difficult to picture on paper. In the four-dimensional space there will be a single point of all zeros at Nyquist, and four curves of single/double/triple/quadruple zeros on the unit circle going from the overall convergence point towards the four convergence points of the three-coefficient filter (the cases of one/three/five/seven zeros at Nyquist only).
1.2 N-Path structure versus the classical 2-path one
The two-path structures are applicable to a wide range of halfband-type filters, including lowpass, highpass, bandpass and Hilbert ones. Extending the structure to more paths leads to more flexibility in specifying filter bandwidths, while preserving the advantages of the polyphase structure: high attenuation and small passband ripples achievable with a small number of coefficients and, as will be shown later, little sensitivity to constraining the coefficient values, as well as ease of implementation. The N-path polyphase structure, as described in the previous sections, is based on the N-tap FIR filter in which the coefficients are substituted by cascades of single-coefficient allpass filters, as described in Appendix A. Looking at Figure A.3 and Figure A.4 in Appendix A, it can be noticed that such allpass filters have a phase response independent of the coefficient value at 2N frequencies around the unit circle, i.e. for the allpass section:

    φ(ω_i) = -i·π   at   ω_i = i·π/N,   i = 0, 1, ..., 2N-1             (37)
The consequence of this is that the complex frequency response (magnitude and phase) of the polyphase LPF will also have fixed values irrespective of the coefficient values at the frequencies specified by (37) (since all the allpass sections are of the same order in all the paths). The values of the magnitude response at these magic frequencies can be calculated from the general transfer function of the N-path polyphase lowpass filter (3) by setting all coefficient values to unity. It can be observed that in such a case the polyphase lowpass filter becomes the Moving AVeRage (MAVR) filter:
The magnitude response at the characteristic frequencies is then:
Results for the number of paths ranging from two to eight are presented in Table 1-2. Looking at the results it can be seen that the MAVR filter imparts its characteristic on the shape of the lowpass filter magnitude response. The advantage is that it creates the cutoff frequency at and places zeros on the unit circle at even multiples of but on the other hand it creates spikes in the stopband at even multiples of These spikes are very high, with the first one at around 9-13dB and the following ones a few dB smaller. The amplitude of neighboring spikes does not differ substantially, and until recently this limited the application of such lowpass filters. The cutoff frequency of the polyphase lowpass filter is the same as for the MAVR filter of the same order. This also implies that performing transformations on the MAVR filter itself can change the magnitude response of the whole polyphase filter. For example, this can be a way of reducing spikes in the stopband of polyphase lowpass filters having more than two paths, as in the example of the four-path (N=4) lowpass polyphase filter designed with eight coefficients in Figure 1-14.
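The MAVR behavior is easy to reproduce numerically: with all allpass coefficients set to one, the N-path structure collapses to a length-N moving average, whose stopband zeros and high spikes show up directly in the magnitude response. The sketch assumes H(z) = (1/N)·Σ z^-n:

```python
import numpy as np

# Magnitude of the length-N moving-average (MAVR) filter
# H(z) = (1/N) * sum_{n=0}^{N-1} z^-n, evaluated at frequencies w.
def mavr_mag(N, w):
    H = np.mean(np.exp(-1j * np.outer(w, np.arange(N))), axis=1)
    return np.abs(H)

N = 4
w = np.linspace(0, np.pi, 4097)
mag = mavr_mag(N, w)

# Zeros sit exactly at multiples of 2*pi/N ...
assert mavr_mag(N, np.array([2 * np.pi / N]))[0] < 1e-12
# ... but between the zeros the stopband rises into high spikes; the first
# one (near 3*pi/N for N=4) is only about 11 dB down, matching the text's
# "around 9-13 dB" figure.
spike = 20 * np.log10(mag[np.argmin(np.abs(w - 3 * np.pi / N))])
print(f"first stopband spike for N={N}: {spike:.1f} dB")
```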
The reasons for the spikes in the magnitude response can be explained from a pole-zero plot. It can be noticed that all filter poles are strictly inside the unit circle, positioned at four frequencies equispaced around the unit circle. Those closest to DC determine the position of the edge of the filter passband, while the more distant ones create high spikes in the stopband. Zeros are no longer confined to the unit circle. Now, almost half of them lie close to poles to compensate for the influence of the more distant ones. There are N-1 zeros placed on the unit circle at frequencies which are even multiples of the cutoff frequency, irrespective of the coefficient values, just as for the two-path filter. The rest of them (less than half) are placed on the unit circle. The conclusion is that the more paths are used, the lower the performance the filter can achieve for the same number of coefficients, both in terms of the achievable stopband attenuation for given transition bands and considering also the high spikes in the stopband magnitude response, which have to be removed for almost all applications.
1.3 Compensating for peaks in the stopband
The problem of decreasing the spikes in the stopband of a multi-path polyphase lowpass filter magnitude response is not a trivial task. The main idea is not to damage the highly precise passband of the filter (its ripples and the edge of the transition band) and to do the correction as efficiently as the polyphase structure does its filtering operation, thus preserving the implementation efficiency of the polyphase structure. There are three methods proposed here that minimize the effect of the stopband peaks:
- Use a polyphase halfband compensation filter.
- Design the non-halfband filter as a cascade of multirate subfilters.
- Apply the frequency transformation to adjust the filter cutoff frequency.
All these methods incorporate polyphase allpass-based IIR structures and therefore do not degrade the performance of the prototype filter in terms of its passband and stopband ripples, and do not require much additional computation, thus preserving the implementation efficiency of the polyphase LPF.

1.3.1 Correction with halfband polyphase lowpass filters
The multi-path polyphase LPFs have their stopband peaks at certain fixed frequency locations given in (37), dependent only on the number of paths. These locations are independent of the number of coefficients as well as their values. As the frequencies at which the spikes occur are well known, it is possible to design a compensation filter which has its zeros at these frequencies. The obvious choice would be a notch filter. The problem arising from such a choice is that the filter, in order to achieve ripple performance similar to the polyphase structure, would require many more coefficients (usually with floating-point precision) than the LPF itself, and the implementation efficiency of the whole filter in terms of the number of calculations would degrade. The solution is to use a similar polyphase IIR structure to compensate for the spikes. For a number of paths between three and five such a compensation filter can be designed as a halfband polyphase LPF, as in Figure 1-15, Figure 1-16 and Figure 1-17 respectively. For a larger number of paths the stopband spikes would appear in the passband of the compensator and hence would not be removed.
The example three-path polyphase LPF was designed for a stopband attenuation of A=50dB and transition width of TB=0.05. It can be seen that
for three paths the spike is only at Nyquist and it can be easily compensated with a halfband filter, which has a zero at Nyquist. The obvious question that comes to mind is why not use unity coefficients for the compensator, as it always has the zero at Nyquist. The reason is that the compensator has two tasks to accomplish: first to decrease the spike in the stopband of the multi-path LPF, and second to keep the passband ripples within the allowed limits. Therefore the compensator always has to be designed for the same passband width (transition band as for the original filter, TB, and for passband ripples, Assuming that then:
In the example design the allowed passband ripples were The two-coefficient corrector designed for the attenuation of and transition bandwidth (2 coefficients) was enough to keep the passband ripples within the required limits, at the same time getting rid of the spike and increasing the stopband attenuation of the overall filter. For the four- and five-path cases spikes occur in the middle of the stopband, and therefore the halfband filter must be designed so that it has one of its zeros placed at the frequency of the spike. The proposed method places the first zero of the corrector at the frequency of the spike and then the next ones between the first one and Nyquist in such a way as to achieve equiripple stopband attenuation for the corrector. The number of coefficients (equivalent to half the number of zeros, excluding the one at Nyquist) is chosen to keep the passband ripples within the limits, To a first approximation the number of coefficients is the same as for the halfband filter corrector designed for the same passband ripples and the transition band of The zeros of the polyphase LPF lie on the cosine frequency scale (analogous to the logarithmic scale) and therefore, if the first zero is at frequency the positions of the other zeros of the polyphase halfband LPF can be calculated from:
The coefficients can then be calculated from the positions of the zeros using the algorithm described in section 1.1.1.3. Alternatively the
optimization routine can be applied to adjust the transition band of the equiripple polyphase correction halfband LPF in order to place its first zero at the frequency of the spike. The result of the correction of the four-path LPF is shown in Figure 1-16. The specifications for the prototype filter were similar to the previous example, namely the attenuation A=50dB and transition width of TB=0.05. The corrector was designed to achieve total passband ripples of less than It was designed for the attenuation and transition band which it achieved with two coefficients, setting the passband ripples of the overall filter at and its minimum stopband attenuation at
The five-path LPF was designed in a similar way to the four- and three-path ones. The five-path case has two spikes: one at Nyquist and the other at (Figure 1-17). In some ways it is similar to both the three- and four-path cases. As before, the first zero of the corrector was chosen to be at the frequency of the spike, with all the other ones placed between the first one and Nyquist to achieve an equiripple magnitude response of the corrector in its stopband. The corrector was designed to achieve total passband ripples of less than It was designed for the attenuation of and transition band of which it achieved with four coefficients, setting the passband ripples at and the minimum stopband attenuation at The result is shown in Figure 1-17. For more than five paths the spikes in the stopband of the magnitude response appear at frequencies and lower, which makes it impossible for the halfband filter to correct them, as it has its 3dB cutoff at Therefore some of the spikes are not correctable. The way to correct them would be to use filters with more than two paths, but these would themselves require correction as well.
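The halfband-compensation idea can be sketched numerically with heavy simplifications: below, the three-path filter is stood in for by its unity-coefficient limit (the 3-tap moving average, whose only stopband spike sits at Nyquist), and the compensator by the crudest possible halfband lowpass, (1 + z^-1)/2, which carries the required zero at Nyquist. Neither is the book's optimized design; this only illustrates the cancellation mechanism:

```python
import numpy as np

# A 3-path polyphase LPF has its only stopband spike at Nyquist; any halfband
# LPF has a zero at z = -1 (Nyquist), so cascading the two removes the spike.
w = np.linspace(0, np.pi, 1025)
z1 = np.exp(-1j * w)

H3 = (1 + z1 + z1**2) / 3          # 3-path filter, unity-coefficient limit
Hc = (1 + z1) / 2                  # simplest halfband compensator
Htot = H3 * Hc                     # cascaded (compensated) response

nyq = -1                            # index of w = pi (Nyquist)
assert abs(H3[nyq]) > 0.3           # uncompensated spike: |H3(pi)| = 1/3
assert abs(Htot[nyq]) < 1e-12       # the cascade's zero at Nyquist removes it
```

In a real design the compensator would of course need enough coefficients to keep the combined passband ripples within limits, as the text describes.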
In such a case the order of the overall filter would increase considerably and become impractical for implementation, hence a different approach is suggested that can also be used for the number of paths between three and five (see above). The method is based on a cascaded structure of halfband filters transformed with a frequency transformation, as shown below.

1.3.2 Cascading polyphase halfband LPFs transformed by z^-1 → z^-k
The idea of cascaded polyphase lowpass filters achieving a spike-free stopband, with a baseband equal to an integer fraction of the Nyquist frequency, is based on converting a prototype lowpass filter into a multiband filter using a frequency transformation. It replaces each delayer of the original filter transfer function with z^-k, where k is an integer equal to the number of required filter replicas around the unit circle, as in Figure 1-18.
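The substitution z^-1 → z^-k amounts to simple coefficient upsampling: inserting k-1 zeros between consecutive numerator and denominator coefficients. The prototype below is an arbitrary illustrative first-order section, not one of the book's designs:

```python
import numpy as np
from scipy.signal import freqz

# z^-1 -> z^-k: insert k-1 zeros between consecutive coefficients.  The
# transformed filter's response is the prototype's, compressed k times
# around the unit circle: H_k(e^jw) = H(e^jkw).
def transform_zk(c, k):
    out = np.zeros((len(c) - 1) * k + 1)
    out[::k] = c
    return out

b, a = np.array([0.5, 0.5]), np.array([1.0, -0.3])   # illustrative prototype
k = 3
bk, ak = transform_zk(b, k), transform_zk(a, k)

w = np.linspace(0, np.pi / k, 256, endpoint=False)
_, H = freqz(b, a, worN=w * k)       # prototype evaluated at k*w
_, Hk = freqz(bk, ak, worN=w)        # transformed filter evaluated at w
assert np.allclose(H, Hk)
```

Note that the transformed filter has k times as many replicas of the prototype response around the unit circle, which is exactly why the cascade described next needs each stage's stopband to cover the others' replicas.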
In this example a four-coefficient polyphase IIR half-band LPF was transformed by and . Cascading a number of such filters, each transformed to have a different number of replicas, allows the design of a variety of lowpass filters having integer-fractional passband widths. The minimum stopband attenuation is equal to the minimum over all the cascaded filters, while the total passband ripples are the sum of the ripples of the basic filters. The power factor in the frequency transformation has to be properly chosen for each basic filter so that there is at least one filter which has its stopband at the frequencies where the overall filter should have its stopband, i.e. to avoid other passbands. A good example is the class of filters for which the cutoff frequency m being an integer and Such a class of filters can be designed as a cascade of prototype halfband polyphase IIR filters, each successive one transformed by where and i=1..m. All halfband prototype filters except the last one must be designed so as to avoid overlapping their passbands. Therefore their transition bandwidths before transformation have to be specified as The last-stage filter determines the overall transition bandwidth, TB, and is designed for a transition bandwidth of
An example of this class of filters is a lowpass filter achieving passband ripples of up to its cutoff frequency of having a transition bandwidth of TB=0.05 and stopband attenuation of A=53dB, as shown in Figure 1-19. It was designed using three cascaded LPFs having transition bandwidths of (one coefficient), (two coefficients) and (three coefficients), with one more coefficient per stage required for the lowpass-to-lowpass frequency transformation.
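A rough numerical sketch of the cascade idea, with stated assumptions: the halfband prototype is stood in for by a seventh-order elliptic filter (scipy's `ellip`, not the book's polyphase IIR design), and stage i is assumed to be transformed by z^-1 → z^-2^(i-1). Each stage's stopband then covers the passband replicas created by the later, more compressed stages:

```python
import numpy as np
from scipy.signal import ellip, freqz

# Insert k-1 zeros between coefficients: the z^-1 -> z^-k transformation.
def upsample_coeffs(c, k):
    out = np.zeros((len(c) - 1) * k + 1)
    out[::k] = c
    return out

# Halfband-like stand-in prototype: cutoff pi/2, 0.01 dB ripple, 60 dB stop.
b, a = ellip(7, 0.01, 60, 0.5)
m = 3                                    # target cutoff: pi/2^m = pi/8
w = np.linspace(0, np.pi, 2049)
Htot = np.ones_like(w, dtype=complex)
for i in range(m):
    k = 2 ** i                           # stage i transformed by z^-1 -> z^-k
    _, H = freqz(upsample_coeffs(b, k), upsample_coeffs(a, k), worN=w)
    Htot *= H

passband = np.abs(Htot[w < 0.1 * np.pi])
stopband = np.abs(Htot[w > 0.25 * np.pi])
assert np.all(passband > 0.99)           # ripples add, but stay tiny
assert np.all(stopband < 1e-2)           # spike-free stopband past pi/8
```

The stopband check past 0.25π passes because every unwanted replica of one stage falls inside the stopband of another stage, which is the passband-overlap condition the text imposes on the transition bandwidths.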
1.3.3 Cutoff frequency adjusted using frequency transformation
A different approach to the design of polyphase filters with cutoff frequencies different from where N>2 is the number of paths and the basic allpass filter order, and not suffering from high spikes in the stopband, is to use the Lowpass-to-Lowpass Frequency Transformation (LLFT) (42) to move the cutoff of the prototype halfband polyphase LPF, to the required new one,
where and In practice the transformation is performed by substituting every delayer of the original filter with the allpass filter (42). As the halfband polyphase LPF has equal ripples both in its passband and stopband (it does not have any spikes in its stopband), the new transformed filter will also have an equiripple magnitude response both in its passband and in its stopband. Applying the first-order frequency transformation obviously does not increase the order of the overall filter, as each first-order delayer is replaced with a first-order allpass filter. However, the implementation of each allpass filter of the polyphase LPF will now have two coefficients per second-order allpass section, instead of one as previously.
It is clear that if the prototype polyphase filter was designed to have constrained coefficients, the coefficients of the resulting frequency-transformed filter will have to be re-optimized to have a comparable (preferably the same) wordlength as the prototype filter. Each of the transformed allpass blocks can now be implemented using the standard allpass Numerator-Denominator (N-D) structure as in Figure 1-20, which has two sets of delayers and calculations concentrated at the output. The LLFT is a single-point exact transformation, which means that only one feature of the prototype filter is going to be accurately mapped from the old frequency, to the new frequency,
Movement of old features of the prototype lowpass filter from their old frequency locations to the new ones can be easily calculated from the phase of the mapping filter (42), treated as the transformation function,
The shape of the transformation function is shown in Figure 1-21. Notice that the phase shown describes an unstable mapping function. It has to be strictly unstable, i.e. have all its poles outside the unit circle, in order to transform the stable prototype filter into a stable target one (see Chapter 2).
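Equation (42) is not reproduced in this extract; the sketch below assumes the classical Constantinides lowpass-to-lowpass mapping z^-1 → (z^-1 - α)/(1 - α·z^-1) with α = sin((ω_old - ω_new)/2)/sin((ω_old + ω_new)/2), which matches the single-point-exact behavior described here:

```python
import numpy as np

# Assumed LLFT coefficient (Constantinides lowpass-to-lowpass form):
def llft_alpha(w_old, w_new):
    return np.sin((w_old - w_new) / 2) / np.sin((w_old + w_new) / 2)

def warp(w, alpha):
    """Prototype-domain frequency that the new-domain frequency w maps to,
    read off the phase of the first-order allpass mapping function."""
    z1 = np.exp(-1j * w)
    return (-np.angle((z1 - alpha) / (1 - alpha * z1))) % (2 * np.pi)

w_old = np.pi / 2          # halfband prototype cutoff
w_new = 0.3 * np.pi        # desired new cutoff
alpha = llft_alpha(w_old, w_new)

# Exactly one point is mapped exactly: the cutoff ...
assert np.isclose(warp(w_new, alpha), w_old)
# ... while DC and Nyquist stay fixed and the rest of the axis is warped
# non-linearly, which is why the cutoff is no longer centered in the
# transition band.
assert np.isclose(warp(0.0, alpha), 0.0)
assert np.isclose(warp(np.pi, alpha), np.pi)
```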
The problem arising when using a one-point frequency transformation to adjust the cutoff frequency of the halfband polyphase lowpass filter is that the new cutoff is no longer in the center of the transition band, as the transformation function becomes non-linear and non-symmetric against as soon as If the new cutoff frequency is smaller than it will be closer to the left bandedge, otherwise it will be closer to the right one. Therefore one must decide whether only the width of the transition band and the position of its edges are important, or also the placement of the cutoff frequency. In the first case the adjustment needs to be made to both edge frequencies of the transition band; in the second case the adjustment will be made to the cutoff frequency and the most distant edge of the transition band (the other edge will simply be closer to the cutoff than required). It is important to notice that the frequency transformation preserves the ripple structure both in the filter passband and its stopband, but the frequency locations of their features are going to be shifted according to the movement of the cutoff frequency. Only the DC and Nyquist features remain in the same place.
The example polyphase LPF was designed for the cutoff frequency the minimum stopband attenuation and the transition bandwidth of TB=0.05 (case (a) in Figure 1-22). The prototype polyphase filter required five coefficients and achieved A=105.5dB of attenuation for the transition width of The coefficients of the second-order sections transformed with the LLFT given by (42) into (43) are given in Table 1-3. It can be noticed that for such a frequency transformation specification the left edge of the transition band is much closer to the cutoff
frequency than the right one. For some applications such an (even unintended) increase of the passband may be advantageous. This, however, happens this way only for otherwise a stopband is closer to the cutoff frequency.
If the position of both edges of the transition band is more important than the position of the cutoff frequency, then the following modified algorithm can be used. As the location of the cutoff frequency is then not important and the maximum allowed width of the transition band is known, the coefficient of the LLFT can be calculated from (42) and the transition bandwidth from:
Frequency is the left edge of the required transition band if the cutoff frequency is smaller than or the right edge otherwise. For the case when only the positions of the edges of the transition band are important and the filter cutoff frequency can be at any frequency between these edges, the required parameter of the frequency transformation and the transition band of the prototype filter are too difficult to determine analytically, and an iterative approach is required, as follows:
1. Specify the new cutoff frequency and the new transition band ( is the original filter cut-off frequency).
2. Calculate coefficient from (42).
3. Inverse transform the edges of the target filter transition band (indexes T- and T+ indicate the left and right edges of the transition band respectively) into from (46).
4. Modify the target cutoff frequency:
5. If modification is greater than allowed frequency error then go to step 2. 6. Calculate the required transition band of the prototype lowpass filter:
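Formulas (46)-(48) are not reproduced in this extract, so the sketch below is only a plausible concretization of the six steps above; in particular the re-centering update in step 4 is an assumption. The idea: pull the target transition-band edges back into the prototype domain through the inverse allpass mapping, re-centre them about the prototype cutoff π/2, and repeat:

```python
import numpy as np

def fwd(w, alpha):
    """New-domain frequency w -> prototype-domain frequency (allpass phase)."""
    z = np.exp(-1j * w)
    return (-np.angle((z - alpha) / (1 - alpha * z))) % (2 * np.pi)

def inv(theta, alpha):
    """Prototype-domain theta -> new-domain frequency (inverse mapping)."""
    z = np.exp(-1j * theta)
    return (-np.angle((z + alpha) / (1 + alpha * z))) % (2 * np.pi)

wTm, wTp = 0.25 * np.pi, 0.30 * np.pi   # required transition-band edges
wc = 0.5 * (wTm + wTp)                   # step 1: initial cutoff guess
for _ in range(200):
    # step 2: alpha for the current cutoff guess (Constantinides form assumed)
    alpha = np.sin((np.pi / 2 - wc) / 2) / np.sin((np.pi / 2 + wc) / 2)
    # step 3: inverse-transform the edges; step 4: re-centre the cutoff
    mid = 0.5 * (fwd(wTm, alpha) + fwd(wTp, alpha))
    wc_next = inv(mid, alpha)
    if abs(wc_next - wc) < 1e-12:        # step 5: converged?
        break
    wc = wc_next

# step 6: prototype transition band, now symmetric about pi/2
tb_proto = fwd(wTp, alpha) - fwd(wTm, alpha)
assert abs(0.5 * (fwd(wTm, alpha) + fwd(wTp, alpha)) - np.pi / 2) < 1e-6
```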
An example polyphase LPF was designed to illustrate the method described above. It was designed for a set of specifications similar to the previous example. This time the edges of the transition band were chosen to be at The resulting magnitude response of the designed filter is shown in blue (b) in Figure 1-22. The polyphase prototype filter required a transition width of and only four coefficients, in contrast to the previous example where five coefficients were required. The method described here is the most flexible, as it designs filters with arbitrary cutoff frequencies and transition bands. It does not require designing a number of lowpass filters (for different transition bands). The stopband ripples achieved are equiripple; there is no problem with spikes in the stopband magnitude response. The price to pay for this is very small: it only requires a slightly more complicated allpass filter structure to be implemented. Therefore this method, in most cases of fractional-band lowpass filters, should be the preferred one.
1.4 Converting general IIR filter into parallel structure
Parallel realization of filtering tasks is gaining importance as the parallel processing capability of DSP engines is enhanced and advanced. However, relatively little methodology has been developed for tackling the problem of converting direct-form IIR transfer functions into parallel equivalents. Most work (e.g. [54]-[56], [81]) approaches the task from the other direction, composing the overall IIR filter as a combination of elementary IIR filters. Here a decomposition method is presented which allows any arbitrary complex IIR filter transfer function to be converted into a sum of variable-order IIR sections or first-order IIR or allpass sections
[57]. Such a transformation allows every section in each path to be processed in parallel by a separate processing element and hence greatly increases the filter computation speed. In the general case, both real and complex filters are decomposed into parallel complex IIR filters; real filters can also be decomposed into a set of real IIR sections. Even though the method depends on the root-finding algorithm, the decomposition algorithm is able to recover any loss of accuracy during the successive iterative calculations. A general IIR filter transfer function H(z) is considered:
It can be shown that H(z) can be decomposed with no magnitude or phase distortion into an M-path parallel structure in Figure 1-23, described by (50) in which stands for the IIR filter order in the branch.
There are two ways of calculating the coefficients of the decomposed structures: the direct one and the iterative one. We refer to the latter as the “successive-separation” algorithm, since one IIR section is calculated for each successive path and the remainder of the original transfer function is then subjected to the next iteration. If the original filter numerator order (as a function of is higher than its denominator order, we equalise the numerator and denominator orders by extracting an FIR part from the filter.
One way of dealing with the FIR part of the numerator is to cascade it with the parallel combination of IIR sections, as shown in Figure 1-24(a).
First, the FIR transfer function is calculated from the division of the filter numerator B(z) by its denominator A(z). Its coefficients can be calculated from the inverse transform of the transfer function of the filter (51); in other words, the taps of the FIR filter are equal to the first coefficients of the impulse response of the filter:
The term denotes the inverse Z-transformation. Then the new IIR filter numerator B’(z) coefficients can be calculated as the remainder of the division of B(z) by A(z) of the original filter transfer function (49).
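One consistent reading of (51)-(52) can be sketched as follows (shown here in the additive, parallel-path form): take the first K = len(b) - len(a) + 1 impulse-response samples as the FIR part F(z), and form the remaining numerator as B'(z) = B(z) - F(z)A(z), giving the exact split H(z) = F(z) + B'(z)/A(z). The coefficients below are illustrative, not from the text:

```python
import numpy as np
from scipy.signal import lfilter, freqz

b = np.array([1.0, -0.4, 0.3, 0.2, 0.1])   # numerator order 4 (illustrative)
a = np.array([1.0, -0.5, 0.25])            # denominator order 2

K = len(b) - len(a) + 1                    # FIR length that equalises orders
imp = np.zeros(len(b)); imp[0] = 1.0
fir = lfilter(b, a, imp)[:K]               # first K impulse-response samples
b_rem = np.polysub(b, np.polymul(fir, a))  # B'(z) = B(z) - F(z)A(z)
assert np.allclose(b_rem[:K], 0.0)         # remainder starts at z^-K

# Verify the exact additive split H(z) = F(z) + B'(z)/A(z):
w = np.linspace(0, np.pi, 257)
_, H = freqz(b, a, worN=w)
_, Hf = freqz(fir, [1.0], worN=w)
_, Hr = freqz(b_rem, a, worN=w)
assert np.allclose(H, Hf + Hr)
```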
Another way of incorporating the FIR part of the filter numerator is to place it in an extra parallel path, as shown in Figure 1-24(b). The FIR section can be extracted from the numerator as follows: calculate the roots of the filter numerator, for then form the FIR filter transfer function (53) by recombining roots:
Form the new numerator, B’(z), by recombining the remaining roots of the original filter numerator as in (54):
1.4.1 IIR-to-IIR decomposition
For IIR-to-IIR decomposition coefficients of the structure can be calculated directly by comparing transfer functions of the general form of an IIR filter with the one of the decomposed structure, (49), as in Figure 1-23.
After rearranging, a set of linear equations is obtained, allowing a non-iterative way of calculating the coefficients of the decomposed structure in terms of the roots of the original filter, for i=1,..,N.
The direct method of (56) is very accurate. MATLAB simulations show the absolute error for orders up to 20 as shown in Figure 1-25. This is a value very close to the internal MATLAB eps value. Higher filter orders are subject to larger errors, for two reasons. The linear-equation solution algorithm gives larger errors for higher orders, but a more important source of error is the way the matrix is created: the number of calculations required to create this matrix is enormous for high-order filters. This is reflected in the total calculation time, shown in Figure 1-26.
The very long calculation time, increasing rapidly with filter order, and the fast decrease of accuracy are both due to the very large number of calculations required to create and solve the set of linear equations. Considering this fact and the exponential increase of calculation time with filter order limits the practical usefulness of the method to filter orders of up to 24. The following, so-called “successive-separation” method dramatically
decreases the calculation time, with somewhat less accuracy for small-order filters but better accuracy for high-order ones. Let us consider again the general IIR filter transfer function (48) with a gain factor in front. As the calculations are performed iteratively M times, the right-hand side of (57) is the outcome of separating one IIR section from the filter. In the next iteration the gain factor changes to and the constant at now becomes M-2. This process continues until the iteration is reached.
Both and coefficients can easily be found by calculating the roots of the original filter denominator. Simplifying the bracketed expression of (57) into a single fraction and comparing numerators of both sides gives (58).
These equations are used for calculating the numerator coefficients for both the separated section and the remaining part of the transfer function from the previous iteration. This method is capable of decomposing a complex IIR transfer function into a parallel form of complex IIR sections, as well as performing real-to-real decomposition. For the real case it is required that the denominators of the IIR sections have real coefficients, a condition which subsequently leads to real numerator coefficients. This property in turn limits the number of paths the original filter can be decomposed into, which equals half the number of the original IIR filter's complex roots plus the number of its real roots. The “successive-separation” method is very insensitive to the accuracy of the root-finding algorithm. Any error introduced during the calculation of a section is compensated by the remaining part of the filter and corrected in the successive iterations. In the presented method the choice of K is arbitrary. More careful selection of K and a proper composition of the roots into each section may increase the accuracy of the overall decomposition.
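The result of such a parallel decomposition can be cross-checked against the classical partial fraction expansion, a close relative of the presented method. The sketch below uses scipy's `residuez` (a verification aid, not the book's algorithm) to split an illustrative real IIR filter into a parallel sum of first-order complex sections and recompose it:

```python
import numpy as np
from scipy.signal import butter, freqz, residuez

b, a = butter(4, 0.3)                    # illustrative real IIR prototype
r, p, k = residuez(b, a)                 # residues, poles, direct term

w = np.linspace(0, np.pi, 513)
_, H = freqz(b, a, worN=w)

# Recompose: H(z) = sum_i r_i / (1 - p_i z^-1) + direct FIR term.
z1 = np.exp(-1j * w)
Hsum = sum(ri / (1 - pi * z1) for ri, pi in zip(r, p))
Hsum = Hsum + sum(kj * z1**j for j, kj in enumerate(k))
assert np.max(np.abs(H - Hsum)) < 1e-10  # parallel sum matches the original
```

As the text notes for the real-to-real case, conjugate pole pairs must be recombined into the same (second-order) section if real branch coefficients are required.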
1.4.2 First-order allpass decomposition
The direct IIR-to-allpass decomposition can be performed in a similar way to the direct decomposition into IIR sections, by comparing the transfer function of the general IIR filter, this time with the structure of Figure 1-27(b):
After rearranging we get the set of linear equations which, as before, can be given in the following matrix form:
A special case is the decomposition into first-order IIR sections, as shown in Figure 1-27(a). Certainly, if the filter roots are not real, complex IIR sections will result. In such a case the matrix of equation (58) can be easily simplified. Assuming boundary conditions: and we get a closed-form set of linear equations:
where X is the vector of unknowns.
The coefficients where i = 1...N-1, are easily obtained by de-convolving (dividing) the original denominator by the term It is also possible to perform an exact IIR to first-order allpass decomposition, as shown in Figure 1-27(b). For this case we start from:
After cross-multiplication and algebraic manipulation of the polynomial coefficients we obtain another set of linear equations for calculating the gain factor preceding each first-order allpass section and the numerator coefficients of the remaining part of the original transfer function. Assuming boundary conditions and these linear equations with vector of unknowns, X, can be expressed as:
There is an easy way of converting the IIR structure of Figure 1-27(a) directly into the allpass form of Figure 1-27(b) that is fast and does not suffer from loss of precision. It is based on transforming a first-order IIR section into the sum of a constant and a scaled first-order allpass section:
In this case we simply compute (64) for all the IIR sections of the structure in Figure 1-27(b). Finally, summing all the constants leads to:
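A possible derivation of the split in (64), assuming the first-order allpass takes the form (a1 + z^-1)/(1 + a1·z^-1) (the exact form used in the book may differ): matching the numerator coefficients on both sides gives the constant and scale terms computed below.

```python
import numpy as np

# Split (b0 + b1*z^-1)/(1 + a1*z^-1) into k0 + k1*(a1 + z^-1)/(1 + a1*z^-1).
# Matching numerators gives b0 = k0 + k1*a1 and b1 = k0*a1 + k1, hence:
def iir1_to_allpass(b0, b1, a1):
    k0 = (b0 - a1 * b1) / (1 - a1**2)   # constant (parallel) term
    k1 = (b1 - a1 * b0) / (1 - a1**2)   # allpass scale factor
    return k0, k1

b0, b1, a1 = 0.7, -0.2, 0.4             # illustrative first-order section
k0, k1 = iir1_to_allpass(b0, b1, a1)

w = np.linspace(0, np.pi, 129)
z1 = np.exp(-1j * w)
H_iir = (b0 + b1 * z1) / (1 + a1 * z1)
H_ap = k0 + k1 * (a1 + z1) / (1 + a1 * z1)
assert np.allclose(H_iir, H_ap)         # exact equivalence, no precision loss
```

This conversion is purely algebraic (no root finding or linear-equation solving), which is why it does not suffer from loss of precision.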
1.4.3 Example 1: converting a real prototype into complex sections
First let us consider the decomposition of a multiband, asymmetric real IIR filter into a sum of complex IIR sections of equal orders. The original filter for this example was created from a low-order elliptic filter converted into an FIR filter of 128 coefficients and then reduced to an IIR one using Balanced Model Truncation (BMT) [58]. The decomposition algorithms were implemented in MATLAB. Results are summarised in Table 1-4, showing the absolute peak error of the coefficient calculation and of the magnitude and phase responses. The reconstruction error was calculated as the difference between the original filter transfer function and that of the recomposed filter. The magnitude and phase errors are the peak absolute differences between the original and the decomposed filter. Figure 1-28 shows the magnitude response of a sample filter and its associated magnitude and phase errors following decomposition into 32 branches. Note that the error is concentrated in the filter’s stopband and its transition band.
Owing to the inherent symmetry, the structure performs an equal number of calculations in every path, permitting optimised processor use. This method is also applicable to real filters, as demonstrated in Example 2. In such cases the denominators were carefully chosen for the filters in each path so that they had real coefficients. It is only required to put conjugate pairs of poles into the same filter, optionally with some real ones (if any are available). The same algorithm as mentioned above was then applied, which returned complex-valued coefficients for every branch filter. It was found
that the decomposition errors (reconstruction, magnitude and phase responses) are only mildly dependent on the number of paths.
1.4.4 Example 2: converting a real prototype to real sections
This example demonstrates the decomposition of the real prototype filter from the previous example into equal-order sections, but this time forced to be real. This was achieved by cancelling all imaginary parts of the calculated branch-filter numerator coefficients. Peak error values resulting from the calculations are summarised in Table 1-5 and the full-band error profiles are shown in Figure 1-29.
For the real decomposition case it is essential to pair (recombine) conjugate roots of the prototype filter into each section of the decomposed structure in order to achieve small overall decomposition errors. Otherwise the imaginary parts of the coefficients in the calculated sections become too big to be disregarded.

1.4.5 Summary
The parallel structure with several sections in each path, which we have targeted to test our decomposition approach, has a number of advantages for high-speed applications. It requires the same number of multiplications (2N+1), summations (2N) and memory locations (2N) as the direct IIR filter implementation. The memory requirement for storing old samples is smaller for the decomposed structure, (2N-M) in comparison to 2N for the direct IIR implementation. The real advantage of the parallel structure comes from the fact that it naturally lends itself to performing all the calculations in parallel, which is ideal for a multiprocessor environment. For M paths the filtering will be performed M times faster. The error introduced through our decomposition, as can be seen in Table 1-4 and Table 1-5, is very small. For filter orders up to 128, the peak decomposition errors were less than However, it should be noted that the error performance is highly dependent
on the filter specification and the number of paths, and can be minimised by careful grouping of poles in each path. The calculation speed is determined by the speed of the algorithm solving the linear equations and the number of subfilters the original filter is decomposed into. For the M-path decomposition the linear equations are solved (M-1) times, and as every new subfilter is calculated the size of the matrices involved in each calculation gets smaller. The direct “Gauss’ elimination method with the selection of basic elements” and the “iterative Gauss-Seidel” method [59] were used in the experiments. For the first one, the calculation time required to solve an set of linear equations is about 7 times longer than for an one. For iterative methods like the second one, the calculation time depends not only on the order of the set of linear equations, but also on the number of iterations (dependent on the required accuracy). The accuracy is certainly also dependent on the filter being decomposed, namely the location of its poles, which can heavily influence the accuracy of the calculations. For the decomposition of the transfer function of a real filter prototype it is important to properly pair the conjugate roots of the filter denominator (in other words, recombine them into second-order components) in order to eliminate the imaginary part from the calculations within the algorithm. This was pointed out in the second example. The presented method is similar to the well-known Partial Fraction Expansion (PFE) [81]. The difference is that the PFE calculates sections with a numerator order one less than their denominator order, while the method presented in this chapter calculates sections having a numerator order equal to that of the denominator.
2.
MULTIRATE FILTERS IN DATA CONVERTERS
Before physical signals can be processed digitally they have to be converted from the analog to the digital form by means of an analog-to-digital (A/D) converter. Unfortunately this conversion introduces distortion. The stringent requirements which the modern market imposes on A/D conversion became achievable only due to the advent of fast and complex integrated circuits. There is a growing demand for more than 13-bit A/D converters, which cannot be designed solely in the analog domain, as the required component matching for a large number of bits is extremely difficult to achieve and sustain. One of the many alternative choices that allows achieving very high bit lengths is presented here: "A/D conversion via the intermediate stage of 1-bit coding, at a very high sampling rate" [11]. One-bit coding is achieved by a Sigma-Delta (ΣΔ) modulator stage, which is then followed by a decimator in which the sampling rate is decreased and the
magnitude resolution is increased. This architecture primarily covers audio band applications and is especially attractive for VLSI implementation. In this section we first look at the requirements to be met by the Nyquist and the oversampling A/D converters in digital signal processing applications, and the arguments in favour of the intermediate-stage 1-bit coding. Then the typical ΣΔ modulator is presented. The quantization noise shaping by the ΣΔ modulators and the effects of the noise on the resolution of the A/D converter are also presented.
2.1
Nyquist-rate converters versus oversampling ones
The function of an A/D converter is to convert a continuous-time, analog input signal x(t) into a sequence of digital codes y(k). Generally, A/D converters can be divided into two groups according to the sampling rate: the so-called Nyquist rate ones and the oversampled ones (oversampled PCM and ΣΔ oversampling ones) [11]. Block diagrams of the Nyquist converter and both types of the oversampled ones are given in Figure 1-30.
The conventional Nyquist rate converters, called Nyquist Rate Pulse Code Modulation (PCM) ones, sample the analog signal at its Nyquist frequency, or a little above. Examples of such are flash converters, the sub-ranging/pipelined ones and the successive approximation ADCs [12]. All the converters working at the Nyquist rate suffer from the required band-limiting of the incoming analog signal in accordance with the Nyquist criterion. For the Nyquist converter the sampling frequency is required to be:

f_s >= f_N = 2*f_max = 2B

B - input signal bandwidth.
f_max - input signal maximum frequency.
f_N - input signal Nyquist frequency.
Therefore Nyquist converters are required to be preceded by a continuous-time analog Anti-Aliasing (AA) filter which band-limits the input signal in accordance with the 2B sampling rate. As can be seen from Figure 1-30(a), the AA filter is clearly the first analog processing element of the conversion chain. The AA filter is required to have simultaneously a very sharp transition band, small passband ripples and high stopband attenuation. Achieving and maintaining such stringent requirements for an analog AA filter, even for a moderate rate and a middle range of resolution (13-16 bits), is very difficult, if not impossible. The second block of the Nyquist converter is the sampler. This is then followed by the amplitude quantizer, which limits the number of different sample values, so that each sample can be expressed by a finite binary code y(k). Most Nyquist A/D converters perform the conversion in a single sampling interval and at its full precision. The amplitude quantization is based on comparing the sampled analog input with a set of reference voltages, which are usually generated inside the Nyquist converter. Their resolution is determined by the number of reference levels that can be resolved. For high-resolution Nyquist A/D converters, establishing the reference voltages is really difficult. For example, a standard 16-bit A/D converter, very common for audio applications, requires 2^16 = 65536 different reference levels. For a 2V converter input range, the spacing between levels is as small as 2V/2^16 ≈ 30.5µV (for the maximum output level fixed to ±1V). The tolerance of VLSI technology exceeds this value many times. However, there are some techniques (like laser trimming or self-calibration) that can be employed to extend the resolution of the Nyquist converter beyond the component tolerances, but these approaches result in additional fabrication complexity and the cost of increased circuit area [12]. Analytic analysis of the quantizer, a non-linear element, is very difficult, and many models and approaches have been developed throughout the years.
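The reference-level arithmetic above can be reproduced in a few lines (assuming the 2V input range and 16-bit resolution of the example; the Δ/sqrt(12) noise figure anticipates the linear noise model discussed below):

```python
# LSB spacing of a 16-bit converter with a 2V input range, and the
# corresponding linear-model quantization noise (RMS = LSB/sqrt(12)).
import math

bits = 16
v_range = 2.0                       # ±1V input range
lsb = v_range / 2**bits             # spacing between reference levels
noise_rms = lsb / math.sqrt(12.0)   # additive-noise-model RMS value

print(f"LSB = {lsb*1e6:.1f} uV, noise RMS = {noise_rms*1e6:.2f} uV")
```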
The simplest and the most commonly used is the linear additive noise model. It assumes the quantization noise power (variance) to be σ_e² = Δ²/12, where Δ is the quantizer step size [13]. If the number of quantization levels N is made large, then the actual quantization noise power is well approximated by this value. The quantization noise power for the Nyquist converter is
assumed constant and white, uniformly distributed over the range of frequencies from -B to B (see Figure 1-31). A good measure of the performance of the A/D converter is the Signal to quantization Noise Ratio (SNR), which for a full-scale sinusoidal input of power P_x is [12],[14]:

SNR = 10*log10(P_x/σ_e²) = 6.02*b + 1.76 dB
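The 6.02b + 1.76 dB rule can be checked by direct simulation. The sketch below quantizes a full-scale sinusoid with an illustrative 8-bit mid-tread quantizer and compares the measured SNR with the formula:

```python
# Measure the SNR of a b-bit mid-tread quantizer driven by a full-scale
# sinusoid and compare with the 6.02*b + 1.76 dB rule of thumb.
import math

def quantized_snr_db(b, n=1 << 16):
    levels = 1 << (b - 1)                 # half the number of codes
    sig_pwr = noise_pwr = 0.0
    for k in range(n):
        x = math.sin(2.0 * math.pi * 0.1234567 * k)  # non-harmonic frequency
        q = round(x * levels) / levels               # quantize to b bits
        q = max(min(q, 1.0), -1.0)
        sig_pwr += x * x
        noise_pwr += (q - x) ** 2
    return 10.0 * math.log10(sig_pwr / noise_pwr)

b = 8
measured = quantized_snr_db(b)
predicted = 6.02 * b + 1.76
print(f"measured {measured:.2f} dB vs predicted {predicted:.2f} dB")
```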
The way to solve the problems with the resolution of the conventional PCM A/D converter is to oversample it way above the Nyquist requirement, in connection with an averaging lowpass filter and a downsampler (decimator), as in Figure 1-30(b). The price paid for the increased resolution is the implementation of the complex and costly decimator. Oversampled converters sample at a much higher rate than required by the Nyquist criterion:

f_s = 2*M*B, M > 1
The integer ratio M = f_s/(2B) is called the oversampling ratio (the downsampling ratio from the decimator point of view). Its value ranges from several tens to several thousands, depending upon the application [15]. Both converters require the analog input signal x(t) to be first filtered by the analog anti-aliasing filter. The aim of this filter is to truncate the bandwidth of x(t) to within half the sampling rate, thus preventing aliasing distortion. The filter transition band is required to be very sharp for Nyquist converters and can be much wider for the oversampling ones. This is because the filter must not affect the input signal within the
baseband bandwidth of B. As the complexity of the filter is a strong function of the ratio of the transition bandwidth to the width of the passband, oversampling converters require considerably simpler anti-aliasing filters compared to Nyquist ones with similar performance characteristics. When analyzing the quantization noise power of the oversampled A/D converter, the same additive noise model can again be used for the quantizer. The total noise power added to the input by the quantizer is then σ_e² = Δ²/12, the same constant as before. Therefore, if the noise power is constant, white and uniformly distributed over the whole range of frequencies from -MB to MB, then the effect of oversampling the conventional converter is to spread the constant, uniformly distributed quantization noise power over a wider range of frequencies. This effectively increases the resolution of the converter as it reduces the inband noise power. The quantization noise Power Spectral Density (PSD) of the oversampled converter is shown in Figure 1-31. As the noise is now spread M times wider than for the conventional Nyquist converter, the measure for the SNR should be taken over the range -B to B. The SNR becomes [14]:

SNR = 6.02*b + 1.76 + 10*log10(M) dB
A variation of the oversampled converter is the Sigma-Delta (ΣΔ) ADC. The name ΣΔ ADC is often used to describe the class of oversampled converters which employ a single-bit quantizer with a noise shaping loop, coupled with a decimator as in Figure 1-30(c). Such an ADC is known as a Pulse Density Modulation (PDM) ADC, as at every instant the relative density of the single-bit pulses tracks the input amplitude. In ΣΔ ADCs the amplitude resolution, sacrificed by the crude quantization, is traded for temporal resolution. This gives a way to overcome the limitations of VLSI processing technology by removing the stringent requirements from the analog circuits. The problem does not disappear through oversampling and noise shaping, though. It is only shifted to the digital domain, namely the decimation filter. The advantage is that in the digital domain the problem can be solved much more easily to the required degree of accuracy, but at the expense of speed, area and power consumption [11]. As shown in Figure 1-30, the main difference between the oversampled PCM ADC and the PDM ADC is the insertion of the loop filter H(z) and a one-bit quantizer with a negative feedback loop to shape the large quantization noise.
The low-resolution quantization is first provided by a ΣΔ modulator operating at the high rate. Then the decimator decreases the sample rate but increases the resolution in amplitude. The aim of the modulator is to produce a binary bit stream whose average tracks the analog input signal. The concept of a first-order ΣΔ modulator [15]-[16] is explained in Figure 1-32. It consists of a differential node, an integrator and a threshold (one-bit A/D converter) in the forward path, and a clipper (one-bit D/A converter) in the feedback loop. The modulator output, y(k), is a one-bit digital signal which can be converted by digital methods alone into a b-bit PCM signal by means of a decimating filter [11]. The input to the integrator is the difference between the input signal x(t) and the output value y(k) converted back to an analog signal q(t). Provided that the D/A converter is perfect, this difference x(t)-q(t) at the integrator input is equal to the negative quantization error. This error is summed in the integrator and then quantized again by the threshold. Its output is:
y(k) = +Δ/2 for u(t) >= 0, y(k) = -Δ/2 otherwise

Here Δ is the quantizer step size and u(t) is the input to the quantizer. In each clock cycle the value of the output y(k) of the modulator is either +Δ/2 or -Δ/2. The larger the quantization error, the longer the integrator output u(t) keeps the same sign and the longer the output pulse is. When the sinusoidal input to the modulator is close to plus full scale, the output is positive during most clock cycles. A similar statement holds true for the
case when the sinusoid is close to minus full scale. For inputs near zero the modulator output varies rapidly between the +Δ/2 and -Δ/2 values, with a mean value approximately equal to zero. The consequence of the negative feedback loop surrounding the integrator and threshold is that the local average of the analog quantized signal q(t), and therefore the value of the modulator output y(k), tracks the analog input signal x(t). The amplitude resolution of the whole A/D converter increases when more samples are included in the local averaging process. The bandwidth is decreased at the same time. Consequently, the resolution of an oversampled A/D converter is a function of the oversampling ratio M. This means that the resolution of oversampling A/D converters can be improved without increasing the number of levels of the threshold [15]. In many applications, like digital audio, the bandwidth of the signal is small compared to the typical speed of operations possible in modern VLSI circuits. In these cases, oversampling A/D converters offer the possibility of trading this excess in speed for higher resolution in magnitude. Oversampling structures do not depend as much on component matching in order to achieve high resolution as Nyquist converters do. This is an important advantage for monolithic VLSI implementation [11].
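The tracking behaviour described above is easy to reproduce in simulation. The sketch below models the first-order loop in discrete time with ±1 quantizer levels (an idealization; a DC input is used so that plain averaging can serve as the decimator):

```python
# First-order sigma-delta modulator: integrator + 1-bit quantizer in a
# negative feedback loop.  The local average of the ±1 output bit stream
# tracks the (here constant) analog input.

def sigma_delta_1st(x_samples):
    u = 0.0            # integrator state
    y_prev = 0.0       # fed-back quantizer output
    bits = []
    for x in x_samples:
        u += x - y_prev                       # integrate the difference x - q
        y_prev = 1.0 if u >= 0.0 else -1.0    # 1-bit quantizer (threshold)
        bits.append(y_prev)
    return bits

n = 20000
dc_input = 0.3
bits = sigma_delta_1st([dc_input] * n)
average = sum(bits) / n                       # crude decimation: plain averaging
print(f"input {dc_input}, bit-stream average {average:.4f}")
```

Averaging more samples tightens the estimate, which is exactly the resolution-for-bandwidth trade described in the text.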
2.2
Quantization noise in lowpass ΣΔ modulators
Although the quantization error is large in the oversampled A/D converter, due to the small-resolution quantizer being used, the ΣΔ modulator reduces the quantization noise within the signal baseband. The reduction is performed through the use of feedback and sampling at a rate higher than the Nyquist frequency. Two general approaches can be used to achieve this goal: prediction and noise shaping [15]. Predictive modulators spectrally shape the quantization noise as well as the signal, so they are not so useful for precise A/D converters. Noise shaping modulators, known as ΣΔ modulators, simply shape the noise spectrum without significantly altering the signal spectrum. They do not reduce the magnitude of the quantization noise, but instead shape the power density spectrum of this error, moving the energy towards high frequencies. Provided that the analog input to the quantizer is oversampled, the high-frequency quantization noise can be eliminated with a digital lowpass filter without affecting the signal band. Noise shaping modulators are especially well suited to signal acquisition applications because they encode the signal itself and their performance is insensitive to the slope of the signal. The shape of the ΣΔ modulator noise power density function should be found in order to design a proper digital lowpass filter for the decimation process. It is derived using the discrete model of the modulator as in Figure 1-33.
The discrete model was obtained by replacing the elements of the analog model presented in Figure 1-32 with their discrete equivalents. A simple digital integrator with the transfer function z^-1/(1 - z^-1) replaced the analog integrator. In order to simplify the analysis of the modulator, the linear model of the quantizer was assumed, which allowed ignoring the clipper. Also, the quantizer levels were assigned the values of ±1. Under these assumptions it is easy to find the following difference equation:

y(k) = x(k-1) + e(k) - e(k-1)    (71)

The equation (71) has an obvious equivalence in the z-domain:

Y(z) = z^-1 * X(z) + (1 - z^-1) * E(z)    (73)

Equation (73) shows that the input x(k) simply goes straight through the modulator, delayed by only one sample. The threshold error e(k) would be seen at the quantizer output if there were no feedback loop. Due to the feedback, the first-order difference e(k)-e(k-1) appears at the modulator output instead. This difference operation acts to attenuate the quantization noise at low frequencies, thus shaping the noise with the function (1 - z^-1). The noise shaping function is the inverse of the transfer function of the filter in the forward path of the modulator. In the choice of the filter, care must be taken to ensure the stability of the whole circuit. For the model being analyzed, the shaping function originates from the integrator. Modulators with more than one integrator in the forward path, such as the second-order system (two integrators cascaded), perform a higher-order difference operation on the error produced by the quantizer and thus deliver stronger attenuation at low frequencies. In general, the power of the noise shaping function equals the number of integrators in the forward path (the modulator order).
Bennett [13] proved that the error generated by a quantizer with quantization levels equally spaced by Δ has the power σ_e² = Δ²/12, uniformly distributed in the frequency range from -f_s/2 to f_s/2.
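Bennett's Δ²/12 figure is easy to confirm empirically for a busy input (uniform random samples and an illustrative step size below):

```python
# Empirical check of the Bennett result: for a busy input, the error of a
# uniform quantizer with step D has variance close to D*D/12.
import random

random.seed(42)
step = 0.25
n = 200000
errors = []
for _ in range(n):
    x = random.uniform(-1.0, 1.0)
    q = round(x / step) * step          # uniform quantizer, step D
    errors.append(q - x)

variance = sum(e * e for e in errors) / n   # error mean is ~0
print(f"measured {variance:.6f} vs D^2/12 = {step*step/12.0:.6f}")
```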
For a ΣΔ modulator with L integrators in the forward path, the noise shaping function is an L-th order difference described by:

N_L(z) = (1 - z^-1)^L
Hence the spectral distribution of the quantization noise e(nT) [15]:

E_L(z) = (1 - z^-1)^L * E(z)

Evaluating the last equation on the unit circle, z = e^(j2πfT), we obtain the expression:

|E_L(f)| = σ_e * sqrt(2T) * [2*sin(πfT)]^L    (77)
Equation (77) describes the theoretical noise power density function for an L-th order ΣΔ modulator, as shown in Figure 1-34 for a few simple modulators.
It would be interesting to compare the dynamic range (DR) of the ideal ΣΔ modulator and the Nyquist rate uniform quantizer. The inband quantization noise is calculated by integrating (77) over the baseband:

n_0² = 2T*σ_e² * ∫_0^B [2*sin(πfT)]^(2L) df ≈ σ_e² * (π^(2L)/(2L+1)) * (2BT)^(2L+1)    (78)
The factor M = f_s/(2B) = 1/(2BT) is the oversampling ratio. The useful SNR, or dynamic range, of an A/D converter with sinusoidal inputs is defined as the ratio of the output power at the frequency of the input sinusoid for a full-scale input to the output signal power for a small input for which the SNR is unity (0dB). For an ideal ΣΔ modulator the dynamic range follows from (78):

DR = (3/2) * ((2L+1)/π^(2L)) * M^(2L+1)    (79)
The dynamic range of a Nyquist rate uniform quantizer with b bits is DR = (3/2)*2^(2b) [15]-[16]. Comparing both relations gives the maximum bit resolution of an L-th order ΣΔ modulator for a given oversampling ratio. Results are shown in Figure 1-35.
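The comparison summarized in Figure 1-35 can be reproduced directly from the two dynamic range expressions (the M values below are illustrative):

```python
# Equivalent bit resolution of an ideal L-th order sigma-delta modulator:
#   DR = (3/2) * (2L+1) / pi^(2L) * M^(2L+1)   (modulator)
#   DR = (3/2) * 2^(2b)                        (b-bit Nyquist quantizer)
# Solving the Nyquist relation for b gives bits = (DR_dB - 1.76) / 6.02.
import math

def modulator_dr_db(L, M):
    dr = 1.5 * (2 * L + 1) / math.pi ** (2 * L) * M ** (2 * L + 1)
    return 10.0 * math.log10(dr)

def equivalent_bits(L, M):
    return (modulator_dr_db(L, M) - 1.76) / 6.02

for L in (1, 2, 3):
    row = ", ".join(f"M={M}: {equivalent_bits(L, M):5.1f} bits"
                    for M in (64, 256, 1024))
    print(f"order {L}: {row}")

# Per-octave DR gain: doubling M raises DR by (2L+1)*10*log10(2) dB.
gain_2nd = modulator_dr_db(2, 128) - modulator_dr_db(2, 64)
print(f"second-order gain per octave: {gain_2nd:.1f} dB")
```

The second-order per-octave gain comes out at about 15 dB, matching the figure quoted at the end of this section, and the first-order modulator indeed tops out around 14 bits even at M=1024.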
It is clear that the first-order modulator can cater practically for only up to 12-14 bit resolution A/D converters, whereas the second and the
third-order modulators have the potential of 20-bit or even higher resolution. The term practically refers to moderate oversampling ratios (very high ones are seldom used). First-order modulators are therefore not attractive for high-resolution applications, unless a very low signal bandwidth is required. A further disadvantage of first-order systems is the fact that the quantization noise from the first-order modulator is highly correlated with the input signal [17]. Their main merit is absolute stability over the full analog input range. There are some techniques which increase the resolution of the ΣΔ modulators, like multi-bit A/D and D/A converters in the loop and an appropriate cascade of first-order modulators. Oversampling ΣΔ modulators offer the potential of about a 15-dB reduction in baseband quantization noise for every doubling of the sampling frequency, without a need for tight component matching [15].
2.3
Decimation filter for lowpass ΣΔ A/D converters
This section details a design technique for high-fidelity multistage decimation filters, based on the polyphase IIR decimator structures presented in [4] and [5], catering for powers-of-two sample-rate decreases. The technique is well suited for ΣΔ Analog-to-Digital Converter (ADC) applications in excess of 16-bit resolution. The resulting filter coefficients are constrained to the required bit length using a "bit flipping algorithm" [18]. This technique is presented through an example of a cascaded decimation filter designed for a 20-bit resolution ADC, and compares advantageously with other approximation methods [19]-[21] incorporating a MAVR decimation stage followed by FIR compensation. The coefficients and frequency responses of the cascaded filter are also reported. 2.3.1
Introduction
With the advent of high-speed and high-precision integrated instrumentation, control and Digital Signal Processing (DSP), the demands for higher-resolution ADCs are on the increase. The need to achieve resolutions in excess of 16 bits monolithically has popularized and re-emphasized the scope offered by ΣΔ modulators in conjunction with digital decimation filters in overcoming the limitations inherent in the conventional analog techniques. The quantization noise generated by the ΣΔ modulator is pushed (shaped) out of the baseband and concentrated at the high frequencies. It is then filtered out by the decimation filter, which has extremely stringent amplitude (and sometimes phase/group delay) response attributes. A decimation filter specification for a 20-bit resolution ADC, employing a third-order ΣΔ modulator with an oversampling ratio of M=256, must have overall
passband ripples less than the specified limit and a noise power level below -131.2dB. Furthermore, for efficient physical realization, the filter coefficients of at least the first few stages of the decimator (operating at the highest sampling rates) should be constrained by design to have as few bits as possible. The cascaded polyphase two-path halfband IIR filter lends itself to meeting the above requirements with the minimum of implementation complexity (only one coefficient per second-order stage), as well as exhibiting low sensitivity to coefficient value variations. This attribute makes it possible to home in on short coefficient wordlengths. A PC-based design environment developed in [18] has successfully been used in the design of efficiently implementable, binary-constrained-coefficient ΣΔ ADC decimation filters for varying bit resolution requirements. The following sections elaborate on and present details of the bit-flipping design algorithm with its associated parameters through the use of the aforementioned decimation filter example, providing comparative results for truncated, rounded and bit-flipped coefficient filters. For clarity of understanding, the basics of noise power calculation for ΣΔ modulators and the concept of multistage decimation are also presented. 2.3.2
Design of the multistage cascaded decimation filter
The first important decision in the design of a ΣΔ-based ADC is the choice of the modulator. The quantization noise power generated by the two-level quantizer is shaped by the modulator loop filter to exhibit a transfer function which is the inverse of the loop filter (Figure 1-34). In order to estimate the required modulator order, the baseband quantization noise power, with σ_e being the error generated by a quantizer with its levels equal to ±1, as in (78), is compared to the maximum allowed ADC inband noise power (where b is the required ADC resolution). This gives the relation between the modulator order, L, and the required oversampling ratio, M, for the given bit resolution, b [15], [18].
The ⌈x⌉ function in (80) stands for the smallest integer greater than or equal to x. If (80) is used for the 20-bit ADC example, a third-order modulator with M=256 is estimated. Having determined the modulator order and the oversampling ratio, the decimation filter can be designed. Since two-path halfband lowpass filters are being used, the decimation filter should be designed as a cascade of stages, each decimating by two, as shown in Figure 1-36.
The allpass filters in both branches of the structure have an equal number of coefficients in order to optimize the use of the processors performing the calculations in the two branches of the filter. The choice of the N-D allpass filter structure was made considering that, for any frequency of input signals limited in amplitude to unity, the internal results of the multiplications and summations do not exceed two (see Chapter 3 for more details). Both the modulator and the filter must satisfy the overall ADC requirements. The output quantization noise power is a sum of the noise introduced by the modulator into the signal baseband and the high-frequency noise aliased during decimation. The total baseband magnitude response passband ripple, assuming that no distortion is introduced by the modulator, is the sum of the passband ripples of the lowpass filters of all the stages and the noise spectrum aliased into the signal band. For the multistage decimation filter case, decreasing the sampling rate by two at each stage, both the output quantization noise power and the decimation filter passband ripples can be calculated at the end of each stage using the "equivalent lowpass transfer function":

H_eq(z) = H_1(z) * H_2(z^2) * H_3(z^4) * ... * H_K(z^(2^(K-1)))    (81)
For a small number of decimation filter stages, the only significant noise aliasing into the baseband originates from the modulator noise spectrum, at frequencies where only one lowpass filter has its stopband replica. Then the filter passband ripples and the noise power at the end of the stage can be computed from:
The stage-to-stage aliasing in the multi-stage filter cascade is presented in Figure 1-37 for the case of the three-stage decimation filter. The contribution of each stage filter towards the overall transfer function is shown.
The conclusion is that it is possible to design and optimize each filter stage to its required specification independently from the successive stages. In the design of the overall filter, one can assign equal noise power alias contributions into the baseband for the stages following the current one, and allow the passband magnitude response ripples to increase by the same amount from stage to stage, including the current stage. As the sampling rate is decreased by two at each stage, the required transition bandwidth, TB, is smaller at each new stage, down to zero for the final one. The transition band of the last-stage lowpass filter determines the overall filter bandwidth.
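The equivalent-lowpass idea, in which the i-th stage contributes H_i(z^(2^(i-1))) when referred to the input rate, can be sanity-checked numerically. The sketch below uses a toy two-tap averaging stage rather than the book's polyphase IIR halfband filters; a three-stage cascade of such stages, each decimating by two, is then input-rate-equivalent to a single 8-tap moving average:

```python
# Equivalent lowpass response of a cascade of decimate-by-2 stages:
#   H_eq(z) = H1(z) * H2(z^2) * H3(z^4) * ...
# With the toy stage H(z) = (1 + z^-1)/2 the product telescopes to an
# 8-tap moving average, which we verify on a frequency grid.
import cmath, math

def dtft(coeffs, w):
    """Evaluate sum_k coeffs[k] * e^{-jwk}."""
    return sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(coeffs))

stage = [0.5, 0.5]                      # toy halfband-ish stage
moving_avg = [1.0 / 8.0] * 8            # expected equivalent filter

worst = 0.0
for i in range(1, 400):
    w = math.pi * i / 400.0
    h_eq = dtft(stage, w) * dtft(stage, 2 * w) * dtft(stage, 4 * w)
    worst = max(worst, abs(h_eq - dtft(moving_avg, w)))
print(f"max |H_eq - H_ma8| on the grid: {worst:.2e}")
```

The same compression of later-stage responses is what makes the transition band of the last stage set the overall filter bandwidth.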
As can be seen from (19), for the given design, T and N are fixed and hence the attenuation is also fixed. In most cases the attenuation is much larger than required. This means that the requirements for the lowpass filter
attenuation of the next decimation stages may be more relaxed and can be satisfied with fewer coefficients. The halfband filters in each stage are designed to have the maximum allowed transition band TB in accordance with (84). This approach leads to maximizing the stopband attenuation, which results in the reduction of the inband noise power to a level less than specified. As the design constraint was to produce binary-constrained coefficients capable of satisfying the given decimator specification, a "bit-flipping" algorithm has been developed [18]. This algorithm is explained in Chapter 3 in detail. In most cases the bit-constraining approach results in reduced stopband attenuation in comparison to the floating-point case. This also makes it possible to relax the requirements for all following stages and subsequently decrease the filter complexity, which is important for physical implementation [22]. 2.3.3
Application of the “bit-flipping algorithm”
The "bit-flipping" algorithm has been designed for optimum coefficient bit constraining. It finds w-bit long coefficients that steer the filter response close to the floating-point equiripple case without the necessity of checking all the 2^(wN) possibilities (where N is the number of filter coefficients). It should be noted that, for very short bit-lengths, there may be no satisfactory optimization result, and hence the number of coefficients must be increased and the algorithm launched again. The algorithm starts with the floating-point coefficients delivered by the elliptic approximation [5]. The main philosophy of this approach is a structured exhaustive search of the possible bit patterns yielding improvement in the filter frequency response, starting from the least significant bits of the fixed-point coefficients. The optimization process starts with the first-stage filter in the decimator and proceeds sequentially forward until the last stage is reached. If the performance of a given filter in the cascade is better than required at the end of any one stage's optimization, this fact is used to advantage by relaxing the specification of all the following stages, hence opening the possibility of reduced implementation complexity. The "bit-flipping" approach delivers more efficient filters for a given wordlength in comparison to when the elliptic filter coefficients are crudely truncated or rounded. A comparison of all three is presented in Figure 1-38 for a filter specification having transition band TB=0.125 (normalized frequency), number of coefficients N=7 and coefficient wordlength w=10 bits. As can be noticed in the figure, the bit-flipping algorithm is superior to the standard truncated and rounded cases, resulting in the highest stopband attenuation and smallest passband ripples outside the required transition band.
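A much simplified version of the idea, quantize first and then search least-significant-bit changes that improve the response, can be sketched as follows (a toy FIR example with a hypothetical stopband objective, not the actual algorithm of [18], which operates on the polyphase allpass coefficients):

```python
# Toy "bit-flipping" illustration: constrain FIR coefficients to w bits,
# then greedily apply LSB changes wherever this lowers the worst-case
# stopband magnitude.  Hypothetical filter and objective, for
# illustration only.
import cmath, math

def stopband_peak(coeffs, edge=0.30):
    """Max |H(e^jw)| over the stopband [edge*pi, pi] on a coarse grid."""
    peak = 0.0
    for i in range(200):
        w = math.pi * (edge + (1.0 - edge) * i / 199.0)
        h = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(coeffs))
        peak = max(peak, abs(h))
    return peak

# Floating-point prototype: 9-tap Hamming-windowed sinc, cutoff 0.2*pi.
n, fc = 9, 0.2
ideal = []
for k in range(n):
    m = k - (n - 1) / 2.0
    s = fc if m == 0 else math.sin(math.pi * fc * m) / (math.pi * m)
    ideal.append(s * (0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1))))

w_bits = 8
q = 1.0 / (1 << (w_bits - 1))            # coefficient LSB
rounded = [round(c / q) * q for c in ideal]

flipped, best = list(rounded), stopband_peak(rounded)
for _ in range(50):                      # greedy LSB search, capped
    improved = False
    for k in range(n):
        for delta in (q, -q):
            trial = list(flipped)
            trial[k] += delta
            p = stopband_peak(trial)
            if p < best:
                flipped, best = trial, p
                improved = True
    if not improved:
        break
print(f"rounded peak {stopband_peak(rounded):.4f} -> bit-flipped {best:.4f}")
```

By construction the search never does worse than plain rounding; the actual algorithm's structured search over bit patterns is what recovers near-equiripple behaviour at very short wordlengths.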
2.3.4
A 20-bit decimator example
An example eight-stage polyphase IIR cascaded decimation filter was designed for the 20-bit Analog-to-Digital converter. Magnitude responses of the halfband filters with their associated bit-flipped optimized coefficients for each stage of the decimation are shown in Figure 1-39 and Figure 1-40.
The equivalent lowpass decimation filter magnitude response, as given by the floating-point COMDISCO SPW (now ALTA) [23] simulation, is shown in Figure 1-41. Note that, in order to achieve the appropriate attenuation when using short coefficient wordlengths (down to four bits), a cascade of two lowpass filters is employed in each stage.
The presented multistage decimation filter performs within the 20-bit fidelity specification for up to 90% of the input signal bandwidth. Furthermore, the quantization noise power has a margin of 7.6dB below the required value, and the overall passband ripple is below the specified limit. For audio applications, where passband ripple requirements are not very stringent, the last two stages of the decimation filter can be simplified to have only two or four coefficients. The filters were designed with an even number of coefficients for symmetry reasons. Certainly one must be aware of the fact that, due to the allpass IIR filters used in the two-path structure, the overall filter has non-linear phase, and a phase corrector should be applied for applications demanding low phase distortion (linear phase). 2.3.5
Summary
This section has presented results obtainable from the bit-flipping ΣΔ ADC decimation filter coefficient constraining and optimization algorithm, and demonstrated its superiority over rounding and truncation of the floating-point ideal elliptic design. The full description of the bit-constrained optimization algorithm can be found in Chapter 3. The presented method reaches beyond 16 bits, unlike the MAVR-based ones [19]-[21]. Although only results from the design of a 20-bit resolution decimation filter example have been presented, higher resolutions can also be achieved. Application of the method to 24-bit lowpass decimation has been published in [2], [24]-[25]. The technique has been extended to cater for multi-path decimation and interpolation filter design incorporating the polyphase structure.
2.4
Decimation filter for bandpass ΣΔ A/D converters
The traditional lowpass Sigma-Delta based Analog-to-Digital (A/D) conversion principle has recently been extended to bandpass for direct IF conversion [26]. Such a converter offers high Signal-to-Noise Ratios (SNR) for signals that are narrow-band relative to the sampling frequency, at significantly lower oversampling ratios in comparison to the conventional lowpass converters. In this chapter a sixth-order ΣΔ modulator sampled at 1.82MHz, coupled with a multistage polyphase decimation filter, is reported for the conversion of bandpass signals centered at 455kHz with a 14kHz bandwidth [27]. The decimation filter uses a small number of short-wordlength filter coefficients. Simulations undertaken demonstrated that this setup realizes 124.3dB SNR, with small passband ripples, for a half-scale composite 20-sinewave input signal, a potential of up to 20-bit performance. Such a converter is able to directly demodulate narrow-band AM signals.
2.4.1
Introduction
ΣΔ modulation is a technique employed in A/D conversion which makes use of oversampling and digital signal processing in order to achieve a high level of accuracy. ΣΔ modulators are designed such that the noise is shaped away from the band of interest, thus retaining the original signal in the noise-free band. This noise can then be filtered out by an appropriate digital decimation filter. Such a decimation filter is composed of a high-quality bandpass filter and a sampling rate converter which brings the sampling frequency down to the one required for a given application. For example, for audio applications the sampling rate is decreased down to the audio signal Nyquist frequency. In other applications, like the front-end of a radio receiver, a bandpass ΣΔ converter [26], [28] is used to perform the direct conversion to digital at either intermediate or radio frequency. Filters used in the decimator must both perform proper reduction of the out-of-band quantization noise and prevent excess aliasing introduced during the sampling rate decrease. They must also be very efficient computationally, as the filtering is usually performed at a high rate. Additionally, for precision conversion of wide-band signals they must also have very small passband ripples, less than half the quantization step for the given resolution. Here polyphase structures come in very handy. They can achieve lowpass filtering with very small passband ripples and very high stopband attenuation for a very small computational burden. Their application to audio-band lowpass ΣΔ conversion has already been reported in several publications [5]. Such structures are also attractive and viable for bandpass decimation filters. 2.4.2
Bandpass
modulator design
The typical way of designing a bandpass ΣΔ modulator is to take an existing high-quality lowpass ΣΔ modulator and perform a frequency transformation (here z⁻¹ → −z⁻²) on the feedback loop filter transfer function. If in the prototype modulator the loop filter is a cascade of three first-order integrators (L=3), after transformation we get a modulator as in Figure 1-42.
1. Polyphase IIR Filters
61
The modulator uses a cascade of three second-order notches located at half-Nyquist (a quarter of the sampling frequency). The equation describing the behavior of such a modulator is given in (85).

The function E(z) is the quantization error, assuming an ideal 1-bit quantizer.

Even though the resulting noise-shaping transfer function, and therefore the noise power spectrum (86), depend on the modulator coefficient values, for high oversampling ratios the noise transfer function can be approximated by (1 + z⁻²)³. The theoretical shape of the quantization noise for the three-loop modulator used in our converter design is given in (86), which assumes a linear additive noise model for the quantizer, and is presented in Figure 1-43 for a half-scale signal composed of twenty orthogonal sine waves.

Integrating (86) over the signal bandwidth, where R is the bandpass oversampling ratio, gives the inband noise power received from the modulator (88).
For each octave increase of the oversampling ratio the inband noise power of the modulator decreases by 6n+3 dB, where n is the number of filter notches [26]. The theoretical inband noise levels for different numbers of loops and oversampling ratios are given in Figure 1-44.
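The 6n+3 dB-per-octave rule can be checked numerically. The sketch below (not from the original text) assumes the idealized noise transfer function NTF = (1 + z⁻²)ⁿ with notches at a quarter of the sampling frequency, and integrates |NTF|² over a band of width π/R centred there for two oversampling ratios an octave apart:

```python
import numpy as np

def inband_noise_db(n_notches, R, nfft=1 << 20):
    """Integrate |NTF|^2 = |1 + z^-2|^(2n) over the band of width
    pi/R centred on the fs/4 notch (quarter sampling frequency)."""
    w = np.linspace(0.0, np.pi, nfft)
    ntf2 = np.abs(1.0 + np.exp(-2j * w)) ** (2 * n_notches)
    band = np.abs(w - np.pi / 2) <= np.pi / (2 * R)
    # relative inband noise power (band average times band fraction)
    power = ntf2[band].mean() * (np.pi / R) / np.pi
    return 10 * np.log10(power)

for n in (1, 2, 3):
    drop = inband_noise_db(n, 64) - inband_noise_db(n, 128)
    print(f"n={n}: {drop:.2f} dB per octave (theory {6 * n + 3})")
```

For n=3 notches the drop per octave comes out close to 21 dB, i.e. (2n+1)·10·log₁₀2 ≈ 6n+3 dB, matching the rule above.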
In order to achieve the required 20-bit conversion resolution for a 14kHz bandwidth while minimizing the computational burden, a sixth-order bandpass (BP) ΣΔ modulator was chosen. To maintain an SNR of at least 124.3dB, a sampling frequency of 1.82MHz was chosen (R=64). Employing a half-scale input signal, the theoretical modulator SNR was found to be 124.8dB. The decimation filter had to be designed so that the noise aliased into the signal band did not increase the noise level by more than 0.5dB, with up to a 1.4kHz transition band and very small passband ripples. The modulator itself exhibits a 75mdB roll-off for the composite tone input and a linear approximation of the quantizer, which can be compensated after decimation at the signal Nyquist rate with an appropriate compensation filter or by careful design of the modulator coefficients.

2.4.3 The design of the cascaded bandpass decimator
In a bandpass A/D converter the ΣΔ modulator is followed by a digital filter, which converts the high-speed bit stream into a multiple-bit output at the Nyquist rate. For bandpass ΣΔ modulation, one needs to perform narrow-band filtering on a high-speed bit-stream. One can modulate the band of interest down to DC by multiplying by e∓ʲω₀ⁿ, where ω₀ is the signal center frequency, splitting the modulator output into real and imaginary channels followed by lowpass decimation filters as in Figure 1-45 [28].
For ω₀ = π/2 (the quarter sampling frequency used here) the sine and cosine sequences have very simple structures: each term is either zero or ±1. Such multiplicands can be realized by simple Boolean operations. The decimation can then be done with lowpass filters, identical for both real and imaginary channels, combined with the sample rate decreaser. In the case presented here, a two-path polyphase structure is used to perform the lowpass filtering [2], [5]. The decimation filter can be designed in exactly the same way as in section 1.2.3.3 for the lowpass ADC, considering that the shape of the noise power density spectrum is the same as for the lowpass modulator used to construct the bandpass modulator. One should only bear in mind that, because of the demodulation, the decimation filter now has to be designed for the lowpass oversampling ratio of M=R/2. For oversampling ratios equal to powers of two, decimation filters can be designed as a cascade of two-times decimation stages using two-path halfband polyphase filters as in Figure 1-46.
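The degenerate form of the quarter-rate mixing sequences can be illustrated with a small sketch (illustrative only; the bit-stream values are made up):

```python
import numpy as np

n = np.arange(8)
# e^{-j*pi*n/2}: at a quarter of the sampling rate the mixing
# sequences degenerate to 0/+1/-1 patterns
cos_seq = np.cos(np.pi * n / 2).round().astype(int)   # 1, 0, -1, 0, ...
sin_seq = np.sin(np.pi * n / 2).round().astype(int)   # 0, 1, 0, -1, ...

# Mixing a 1-bit modulator stream therefore needs no multiplier:
bits = np.array([1, -1, -1, 1, 1, 1, -1, 1])          # example bit-stream
i_ch = bits * cos_seq      # real channel: sign flip or zero only
q_ch = -bits * sin_seq     # imaginary channel
print(cos_seq.tolist(), sin_seq.tolist())
```

Every product is 0 or ±(input bit), which is why the demodulator reduces to simple Boolean operations.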
The example design employs seven such stages, decreasing the sample rate by M=128. The cascaded decimation filter comprises double-lowpass filter sections with two 4-bit (unsigned) coefficients (0.125, 0.5625) in the first five stages, four 8-bit ones (0.04296875, 0.17187500, 0.39453125, 0.74609375) in the sixth and six 10-bit ones (0.0810546875, 0.2763671875, 0.4990234375, 0.689453125, 0.833984375, 0.947265625) in the seventh. The filter coefficients were again optimized with the specially developed “bit-flipping algorithm” [29]. The out-of-band noise was attenuated by 135.8dB in stages one to five, 146dB in stage six and 123.8dB in stage seven. As a result the total quantization noise aliased into the signal baseband was at a level of -133.8dB. The first five stages can be easily implemented with hardwired shift-additions as they require only six shift/add operations per stage. Also, the first stage works on a 1-bit data stream, somewhat simplifying matters. In contrast, if the multiplications in stages six and seven were performed through shifts and adds, they would require 51 operations for stage six and 96 for stage seven, employing 28-bit convergent-rounding arithmetic. These calculations need to be performed by a specially designed ALU.
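Why short-wordlength coefficients map onto hardwired shift-additions can be seen from their binary decompositions. The sketch below (illustrative, plain binary rather than the canonic signed-digit form an actual design might use) lists the negative powers of two making up each coefficient:

```python
def shift_add_terms(coeff, wordlength):
    """Binary decomposition of a fractional coefficient into negative
    powers of two, i.e. the hardwired shifts to be summed."""
    scaled = round(coeff * (1 << wordlength))
    return [b for b in range(1, wordlength + 1)
            if scaled & (1 << (wordlength - b))]

# the two 4-bit coefficients of the first five stages
for c in (0.125, 0.5625):
    shifts = shift_add_terms(c, 4)
    print(f"{c} = " + " + ".join(f"2^-{b}" for b in shifts))
```

Here 0.125 = 2⁻³ needs a single shift and 0.5625 = 2⁻¹ + 2⁻⁴ a shift-and-add, which is consistent with the small per-stage operation counts quoted above.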
The first five stages of decimation can be integrated into a single bandpass filter working at the high rate, with the sampling rate decreaser by 32 incorporated after the demodulator. This idea is shown in Figure 1-47. The prototype polyphase lowpass filter (0.13349539733, 0.57730761228), having an attenuation of A=65dB each and a narrow transition band, was converted into the bandpass filter through the lowpass-to-bandpass frequency transformation [30]. The DC feature of the lowpass filter was shifted to half-Nyquist and the edge of its passband was used to create the bandwidth of 0.0078125·fs. The resulting tenth-order IIR bandpass filter transfer function had an even-symmetric denominator and an odd-symmetric numerator. This simplified computations down to only four floating-point multiplications for both bandpass filters, but their coefficients had to be at least 22 bits long to achieve the correct magnitude response. The last two stages of decimation were exactly the same as in the previous structure. The overall performance of both structures was exactly the same.

2.4.4 Decimation Filter Performance Evaluation
The overall decimation filter performance is summarized in Figure 1-48, showing the overall decimation filter passband ripples. As an impulse test signal does not provide enough energy to sufficiently illuminate the transfer function, a twenty-tone test signal had to be designed.
It comprises twenty equally weighted (each with an amplitude of 0.05), approximately equally spaced (roughly spanning the whole baseband), mutually orthogonal sine waves. This input signal was designed as suggested in [31] for coherent spectral analysis purposes. The 20-tone signal was passed through the modulator and then applied to the designed decimation filter. The magnitude spectrum of the output signal from the decimation filter is shown in Figure 1-49. The shaping of the inband quantization noise can be noticed in the frequency range where the multitone signal exists.
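A multitone signal of this kind is easy to generate once the tones are placed on exact FFT bins of a coherent record. The sketch below is an assumption-laden illustration (record length, bin placement and phases are all chosen here, not taken from [31]):

```python
import numpy as np

N = 8192                       # coherent record length (power of two)
rng = np.random.default_rng(1)
# twenty distinct FFT bins, roughly evenly spread across the baseband
bins = np.linspace(40, N // 2 - 40, 20).round().astype(int)
phases = rng.uniform(0, 2 * np.pi, 20)   # randomised to limit the crest factor
n = np.arange(N)
x = sum(0.05 * np.sin(2 * np.pi * k * n / N + p)
        for k, p in zip(bins, phases))

spec = np.abs(np.fft.rfft(x)) / (N / 2)
tones = np.flatnonzero(spec > 0.01)      # the twenty coherent spectral lines
print(len(tones), np.max(np.abs(x)))
```

Because every tone sits on an integer bin, the sines are mutually orthogonal and the spectrum contains exactly twenty leakage-free lines; with 20 amplitudes of 0.05 the composite never exceeds full scale.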
The magnitude spectrum of the bandpass filter output signal is shown in Figure 1-50. Note that the filter was designed to perform exactly like the first five filter stages from Figure 1-46; as a result the differences in overall noise and ripple performance between both structures are negligible.
The passband ripples obtained were very small, with SNR=124.3dB, i.e. a quantization noise level below −124dB (assuming a half full-scale input). For the structure in Figure 1-46 all calculations in the first five stages can be done with hardwired shift/add arithmetic units. The last decimation stages would require a specialized type of processor similar to the one used in [2].
2.4.5 Conclusions
In this section a sixth-order bandpass ΣΔ A/D converter for the conversion of bandpass (AM) signals centered at 455kHz with 14kHz bandwidth at up to 20 bits of resolution has been presented. The bandpass ΣΔ modulator is sampled at 1.82MHz and employs a loop filter transformed from a lowpass prototype through the z⁻¹ → −z⁻² transformation. This modulator, coupled with a seven-stage polyphase decimation filter, delivered SNR=124.3dB for a half full-scale composite input, with very small decimation filter passband ripples — a potential of up to 20-bit performance. The use of polyphase structures in the decimation process minimizes the hardware complexity by using a small number of short-wordlength filter coefficients. The resultant converter directly accomplishes demodulation of narrow-band AM signals. The performance achieved by this design is very difficult, if not impossible, to achieve by other means.
2.5 Polyphase IIR interpolation filter for lowpass DAC
Oversampling and incorporating ΣΔ modulators improves the accuracy of D/A converters (DAC) just as it does for the Analog-to-Digital Converter. The reason for using ΣΔ modulators for D/A conversion is easily understood when analyzing the conditions for a 16-bit DAC. If the converter uses the typical 3V reference voltage, then the voltage corresponding to the permissible half-LSB error is about 22.9µV. This approximately equals the voltage generated by about a dozen electrons stored in a 0.1pF capacitor [32] and is comparable to the thermal noise present at the input of a MOS operational amplifier. The direct design of such a converter would require very expensive trimming and/or calibration procedures. The oversampling technique overcomes the problem of analog accuracy by trading digital complexity and speed for lower sensitivity to analog non-idealities. As fast and dense digital circuit realizations are available thanks to state-of-the-art technologies, such a tradeoff is very desirable.

The general structure of the oversampled DAC is shown in Figure 1-51. The input to the converter, x(k), is a multi-bit digital stream of long words coming at the input data rate, which in most cases is close to the Nyquist frequency of the signal. This signal is first processed by the interpolation filter, which changes the data rate of the incoming signal to a rate L times higher, where L is the Sampling Rate Increase (SRI) factor, and then suppresses the spectral replicas located at multiples of the old data rate. This is done by the high-quality lowpass filter following the SRI block.
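The half-LSB figure quoted above is simple arithmetic and can be verified directly (the electron-charge constant is standard physics, not from the text):

```python
# 16-bit DAC with a 3 V reference: the half-LSB error budget
vref, bits = 3.0, 16
half_lsb = vref / 2 ** bits / 2            # permissible half-LSB error, V
q = 1.602e-19                               # electron charge, C
electrons = 0.1e-12 * half_lsb / q          # charge on a 0.1 pF capacitor
print(f"half LSB = {half_lsb * 1e6:.1f} uV, about {electrons:.0f} electrons")
```

This gives roughly 22.9µV and about 14 electrons on a 0.1pF capacitor, consistent with the “about a dozen electrons” cited from [32].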
At the output of the interpolation filter the data will have a wordlength which is the same as, or slightly smaller than, that of the input [32]. This signal then enters the standard ΣΔ modulator, which drastically reduces the wordlength, typically down to a single bit. It can be exactly the same as for oversampled A/D converters. The quantization noise power introduced by such drastic truncation is shaped so that most of it lies outside the baseband. Next the truncated signal is converted into an analog signal by an internal one-bit DAC, the first analog stage of the circuit. Since its realization is conceptually simple, it can be made linear. The analog DAC output contains a linear replica of the digital input signal x(k) and a large amount of quantization noise. As most of this noise lies out of the band of interest, it can be almost completely suppressed by an analog lowpass filter following the D/A converter. For ideal operation, the overall performance of the converter is determined by the noise-shaping loop of the ΣΔ modulator. The requirements for both the digital and analog lowpass filters depend on the shape of the loop response, in and out of the baseband, as they must limit the digital signal replication (interpolation lowpass filter) and the quantization noise (analog lowpass filter). The interpolation filter, in contrast to the decimation filter used in the A/D converter, has a slightly different purpose. It only has to increase the sampling rate and make sure that the power of the signal replicas, originating from Zero Insertion or Zero-Order Hold by the SRI, is attenuated to the level required by the given b-bit resolution, and that their amplitude stays below half of the LSB. Additionally it has to make sure that the signal is not distorted in its baseband, i.e.
the passband ripples of the interpolation filter should be less than half of the LSB for the bipolar converter, with the corresponding bound for the unipolar one. In contrast to the A/D converter application, the interpolator does not filter out the quantization noise from the ΣΔ modulator; this task, in the discussed D/A converter, is given to the analog filter. If the interpolation filter is used in a multirate system in which the result of the interpolation is subsequently decimated back to the original input rate, then the interpolation lowpass filter has to satisfy the more stringent condition (89).
The symbol M=L is the decimation factor. This condition makes sure that, following the decimation, the ripples originating from the passband and the stopband of the interpolation filter will be less than required for the given b-bit accuracy. If (89) is satisfied then the subsequent decimation can be performed without an anti-aliasing lowpass filter. Note that equation (89) assumes a flat stopband characteristic. The polyphase-based LPF structure discussed earlier in this chapter can accomplish the lowpass filtering needed in the interpolation process as effectively as it did in the previously described decimation process. The similarity between decimation and interpolation is very close, and it will be shown how this relation can be used to design the interpolation filter — in a way similar to the decimation filter — as a number of two-times interpolation stages. In such a case, at each successive stage the zero-insertion is followed by lowpass filtering with the polyphase halfband filter. Thanks to its simple structure and small number of short fixed-point coefficients, the filtering can be performed faster than by other types of LPF.
The equivalent magnitude response of the interpolation lowpass filter has the same shape as for the multistage decimator and is presented in Figure 1-52 for a three-stage interpolation. Assuming that the LPF in each interpolation stage has the same stopband ripples and the same passband ripples, equation (89) can be simplified to (90).
Considering that the passband ripples of the polyphase structure are much smaller than its stopband ripples, as in (14), they will not significantly influence the overall passband ripples after the decimation. Additionally, stopband ripples aliased into the baseband that originate from overall interpolation filter stopband regions in which more than one stage filter has its stopband are negligible in comparison to those aliased from other regions. Under these assumptions (90) simplifies to:

Then the stopband attenuation of each stage LPF should be less than:

The SNR, considered as the ratio of the total power of the signal replicas to the power of the signal in its baseband, can be calculated from Figure 1-52:

The quantities involved are the stopband attenuation of each stage filter, the power of the signal in the baseband and the power of the signal replicas in the stopband of the interpolation filter. The equation is based on the assumption that only the parts of the interpolator stopband where the attenuation is smallest are significant. The stage filter stopband attenuation keeping the total power of the signal replicas in the interpolator stopband below the required level can be found from:
The choice of transition band for each subsequent interpolation stage differs from the multi-stage decimation filter case. The input signal to the interpolator is a full-band signal. After two-times SRI the input signal becomes squeezed within the baseband range up to 0.25 of the normalized sampling frequency, with a mirror replica above this frequency up to Nyquist. The transition band should theoretically be zero. This is similar to the case of the last-stage filter in multi-stage decimation (refer to Section 1.2.3.2). In practice the transition band of the first filter is chosen to be very small; just how small depends on the application. One must take care with this choice as it limits the input signal bandwidth of the overall interpolator. At each subsequent stage, as the sampling frequency increases, the transition band becomes more and more relaxed and the complexity of the lowpass filter decreases. The requirements for the transition band of each interpolation stage, m, can be specified as follows:
The standard implementation of the interpolator requires the lowpass filtering to be performed at the high sampling rate (after SRI). The standard zero-insertion interpolation, as in Figure 1-53(a), suggests a way to simplify the structure of the polyphase interpolation filter. It can be noticed from Appendix A that an allpass filter that is a function of z⁻² responds to its input only every other sample. This means that output samples at even sampling intervals are independent of the samples at odd sample intervals; they are responses to only even and only odd input samples respectively. Zeros inserted into the structure will therefore not have any influence on the output, and so the SRI block can be moved to the input of the filter (similar to Curtis’ decimator structure [33]), as in Figure 1-53(b) for a single-LPF interpolator, in order to use the allpass filters in the structure most effectively.
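This commutation can be verified numerically. The sketch below (an illustration using scipy, with the two-coefficient halfband prototype from earlier in the chapter) compares zero-insertion followed by high-rate filtering against low-rate filtering of the two allpass branches followed by interleaving:

```python
import numpy as np
from scipy.signal import lfilter

def allpass(a, x, d):
    """Allpass section (a + z^-d) / (1 + a z^-d)."""
    num = [a] + [0] * (d - 1) + [1]
    den = [1] + [0] * (d - 1) + [a]
    return lfilter(num, den, x)

a0, a1 = 0.125, 0.5625
x = np.random.default_rng(0).standard_normal(256)

# (a) textbook order: zero-insertion first, then H(z) at the HIGH rate,
#     H(z) = [A0(z^2) + z^-1 A1(z^2)] / 2
u = np.zeros(2 * len(x)); u[::2] = x
direct = 0.5 * (allpass(a0, u, 2) + np.r_[0.0, allpass(a1, u, 2)[:-1]])

# (b) SRI moved past the allpass branches: filter at the LOW rate,
#     then interleave the two branch outputs
efficient = np.zeros(2 * len(x))
efficient[::2] = 0.5 * allpass(a0, x, 1)
efficient[1::2] = 0.5 * allpass(a1, x, 1)

print(np.allclose(direct, efficient))
```

The two outputs are sample-for-sample identical, while version (b) runs each allpass at half the rate and never multiplies by the inserted zeros.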
The example shows the performance of the polyphase structure used in the interpolation filter for a bipolar b=20-bit Digital-to-Analog (D/A) converter with an L=256 times oversampling ΣΔ modulator. The design process is similar to the decimation filter design and the required passband ripples are:
As the interpolation filter is used for a D/A application and there is no subsequent decimation involved, the minimum required stopband attenuation of each filter stage for a 20-bit D/A is calculated according to (95). In order to avoid the signal replicas affecting the baseband frequencies, these replicas must be attenuated to below half an LSB. The resulting minimum stopband attenuation was chosen as the requirement for the filter design. The transition band of the first interpolation stage is chosen arbitrarily such that it limits the baseband to 95% of the original signal bandwidth. The cascaded interpolator requires eight two-times interpolation stages. In order to simplify the requirements for the lowpass filters, each interpolation stage uses a cascade of two identical polyphase lowpass filters, both designed for half of the required minimum attenuation.
The designed interpolation filter has more than the required stopband attenuation and very small passband ripples. The ratio of the total power of the out-of-band replicas to the signal power gives SNR=151dB. These parameters clearly demonstrate that the designed interpolator exhibits 20 bits of accuracy. The overall interpolator magnitude response is shown in Figure 1-54(a) and its baseband ripples in Figure 1-54(b). All filter coefficients were constrained to fixed-point values with the bit-flipping algorithm detailed in Chapter 3. The first two stages of interpolation require a much more accurate representation of the filter coefficients due to the small transition bandwidth and high stopband attenuation demands. The first stage, which has the most stringent requirements, required five 12-bit coefficients. The next stage, for which the transition width was more relaxed, needed only two 11-bit coefficients. The remaining stages, operating at higher rates, were designed with two and one coefficients only. It is worth noting that the filters in stages three to eight require only 6 shift-and-add operations each to perform their filtering. The whole interpolation filter requires as little as 104 shift-and-add operations.
The interpolator structure presented here has, by default, non-linear phase, as it is designed using IIR filters, and exhibits peak-to-peak group delay ripples across the passband. The phase non-linearity of the interpolator can be a problem for some applications. However, for applications like audio it will not be a problem, as the human ear is insensitive to phase, except for the difference between the phases of signals arriving at the two ears, which creates the sense of sound direction (the Haas effect) [80]. For other applications the interpolation filter will have to be coupled with additional filters compensating for the phase non-linearity. A method of compensating the phase non-linearity of the IIR polyphase structure is presented in the next section. It incorporates for phase compensation the same type of allpass filters as used for constructing the lowpass filter.

The suggested structure of the interpolation filter, using a cascade of two-times interpolation stages built from two-path polyphase lowpass filters, is novel [16] and can be applied in a number of applications, from D/A converters to multirate filtering requiring both up- and down-sampling. In such cases both decimation and interpolation filters can be designed using the same polyphase structure. Performing calculations in a number of stages operating at different rates allows optimized use of the arithmetic unit (processor). Additionally, using the structure in Figure 1-53(b) avoids redundant calculations (due to zero insertion) in the interpolator. The performance shown above for both types of filters is very difficult, if not impossible, to achieve with other types of filters, especially when implementation issues are crucial. They require very few computations due to the small number of multiplications, done in fixed-point arithmetic. In many cases such multiplications can easily be realized with a few shift-and-add operations. This makes the polyphase decimation and interpolation filters very competitive with other filters of similar specifications.
2.6 Application of multi-path polyphase IIR filters for cascaded decimators and interpolators
The cascade of two-path two-times decimation filters, as shown in previous sections, yields very good decimation and interpolation filters but is applicable only to conversion ratios that are powers of two. Employing structures with more than two paths extends the choice of possible conversion ratios to any integer number. The number of paths of the filters in the cascade should then equal the smallest integer divisors of the oversampling ratio, each of them being a prime number. For example, if the oversampling ratio is M=60 then there will be two two-path two-times conversion filters, one three-path one and one five-path one. In general one could also choose one six-path conversion stage and one ten-path conversion stage, or any other combination. However, in such a case the transition bandwidths required for each of the filters would be more stringent, resulting in more coefficients (for the whole decimation/interpolation filter) than if the smallest numbers of paths were used. Equivalently, increasing the number of paths without increasing the number of coefficients accordingly would not achieve the same stopband attenuation; it would result in a smaller one.

One should also remember that polyphase filters with more than two paths exhibit spikes in their stopband, and the obvious question arises whether these would affect the performance of the whole cascaded decimation/interpolation filter. However, these spikes lie at frequencies which, during the sampling rate conversion, alias only into the transition band. Secondly, if such a filter is not the last one in the cascade, the spikes are canceled by zeros on the unit circle of at least one of the subsequent stages of the decimator/interpolator. This is shown in Figure 1-55 for decimation by M=12, assuming filters with the same attenuation of A=50dB.
It can be seen that the spike is canceled by a zero of the third decimation stage. The idea is analogous for cascaded IIR polyphase interpolation filters. To confirm the theory, the equivalent lowpass magnitude response of the designed decimation filter is also presented; it matches the theoretical shape.
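The rule for choosing the stage path counts — the smallest prime divisors of the conversion ratio — can be sketched as a trivial factorization (illustrative helper, not from the text):

```python
def path_factors(m):
    """Smallest prime divisors of the conversion ratio M, i.e. the
    number of paths needed by each stage of the cascade."""
    factors, d = [], 2
    while m > 1:
        while m % d == 0:
            factors.append(d)
            m //= d
        d += 1
    return factors

print(path_factors(60))    # two 2-path, one 3-path and one 5-path stage
print(path_factors(128))   # seven 2-path stages, as in the earlier designs
```

For M=60 this reproduces the example above (two two-path, one three-path and one five-path stage), and for M=128 the seven two-times stages used earlier in the chapter.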
3. ALMOST LINEAR-PHASE FILTER DESIGN
For systems which cater for wide-band signals, it is important to preserve the time relationships between signal spectrum components before and after filtering. This requirement is met if the phase response of the filter is linear (with no constant term), thus giving a constant group delay function τ(ω) = −dφ(ω)/dω.
A flat group delay function is all that is required to assure phase response linearity for a polyphase lowpass filter. This is because its phase response equals zero at DC and is continuous across the whole baseband. The phase corrector can be designed in two ways: either directly, by designing the corrector phase response to be opposite to that of the filter, or by making the group delay of the filter itself flat through appropriate design of its coefficients. The first method can be applied if a typical IIR corrector is designed; then the shape of its phase can be specified. The latter method uses a correction filter, or a set of them, whose phase responses are only approximately opposite to the phase response of the original filter. It can be noticed that the phase response of an allpass subfilter with a negative coefficient can be used to correct the phase of the polyphase lowpass filter over the lower part of the baseband. Certainly the quality of correction is better for smaller frequency bands. A cascade of allpass sections of different orders can be used for better correction. Phase correction is normally required only in the signal band. The smaller the bandwidth, the easier the correction and the simpler the implementation: the number of coefficients required for the same compensation is smaller, as is the required coefficient wordlength.
3.1 Phase compensation with allpass sections
An important problem in cascaded decimation filters incorporating the polyphase structure is non-linear phase. The phase non-linearity is small in the first stages of decimation (high oversampling ratios) and grows quickly with each succeeding stage. Linear phase is not an important issue for applications dealing with single sine signals, as often happens in testing and measurement. However, for applications requiring high accuracy and dealing with bandpass signals, the decimator needs to have its phase linearity compensated. Allpass sections with negative coefficients are very attractive for correcting the group delay (phase linearity) of polyphase lowpass filters, especially within frequency bands that are power-of-two divisions of the Nyquist frequency, i.e. 0.25, 0.125, etc. The phase response of the allpass section used to build the polyphase filter (polyphase decimator) with a negative coefficient is opposite in shape to that with a positive coefficient. It can be seen from (17) that the total group delay of the polyphase structure is equal to the average of those of its allpass components and will follow the same bell-like shape, peaking at half-Nyquist for the second-order structure, quarter-Nyquist for the fourth-order structure, etc. Here only the second-order case will be considered.
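The bell-like group delay can be checked numerically. The sketch below (illustrative, using scipy and the two-path halfband prototype coefficients from earlier in the chapter) assembles H(z) = [A₀(z²) + z⁻¹A₁(z²)]/2 as a single rational transfer function and evaluates its response:

```python
import numpy as np
from scipy.signal import freqz, group_delay

a0, a1 = 0.125, 0.5625   # two-path halfband prototype coefficients
# bring H(z) to a common denominator; arrays are in ascending powers of z^-1
p1 = np.convolve([a0, 0, 1], [1, 0, a1])     # A0 numerator x A1 denominator
p2 = np.convolve([0, a1, 0, 1], [1, 0, a0])  # z^-1 A1 numerator x A0 denominator
num = 0.5 * (np.r_[p1, 0.0] + p2)
den = np.convolve([1, 0, a0], [1, 0, a1])

# lowpass halfband: unity gain at DC, a null at Nyquist
wm, h = freqz(num, den, worN=[0.0, np.pi])
# group delay rising from DC towards half-Nyquist (the bell shape)
_, gd = group_delay((num, den), w=np.array([0.01 * np.pi, 0.2 * np.pi]))
print(abs(h[0]), abs(h[1]), gd)
```

The magnitude is 1 at DC and 0 at Nyquist, and the group delay grows monotonically towards the half-Nyquist peak, exactly the behavior the corrector sections below are designed to flatten.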
Let us first consider the simplest corrector: a single second-order allpass section. The best correction is achieved when the group delay of the compensated filter at DC is equal to the group delay at the cutoff frequency. This means (skipping the constant term of 1/2):
The symbols denote the corrector coefficients. The effect of this correction can be seen in Figure 1-56 for the two-coefficient (0.125, 0.5625) two-path LPF. Even a one-coefficient corrector gives a significant, 6.5-times decrease of the group delay peak-to-peak error in the signal band, as shown in Figure 1-56 and Table 1-7.
Cascading additional higher-order allpass corrector sections improves phase linearity further. Although consecutive orders can be chosen, it is preferable to use even-order sections. Such a choice assures the symmetry of the compensation and also makes the sections easier to integrate into the decimation filter, where they can be placed after the sample rate decreaser. With an appropriately chosen order for each subsequent section, the full K-section compensator transfer function is:
Higher-order compensator coefficients can be calculated by minimizing the sum of the squared differences between the resulting group delay and its average value, as in Figure 1-57, i.e.:
The Downhill-Simplex optimization method was used to minimize (101) [59]. It was observed that for more than three corrector sections the result was very dependent on the starting point of the optimization. Therefore approximate values had to be calculated before the optimization. Notice that the absolute values of consecutive section coefficients approximately follow a geometric series; therefore it was enough to find the coefficients of the first two sections to specify a good starting point.
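A miniature version of this optimization can be sketched with scipy's Nelder-Mead (Downhill-Simplex) implementation. The example below is an illustration only — it optimizes a single second-order corrector section against the peak-to-peak group delay error rather than the squared-error criterion of (101), and uses the closed-form allpass group delay instead of the book's procedure:

```python
import numpy as np
from scipy.optimize import minimize

def ap_gd(a, w, d=2):
    """Group delay of the allpass section (a + z^-d)/(1 + a z^-d)."""
    return d * (1 - a * a) / (1 + 2 * a * np.cos(d * w) + a * a)

a0, a1 = 0.125, 0.5625                 # two-path halfband prototype
w = np.linspace(0, np.pi / 8, 200)     # band to be compensated

def ripple(c):
    """Peak-to-peak group delay error of the filter cascaded with one
    second-order allpass corrector with (negative) coefficient c."""
    c = float(np.atleast_1d(c)[0])
    gd = 0.5 * (ap_gd(a0, w) + ap_gd(a1, w) + 1.0) + ap_gd(c, w)
    return gd.max() - gd.min()

res = minimize(ripple, x0=[-0.05], method="Nelder-Mead")
print(ripple(0.0), ripple(res.x), res.x[0])
```

As the text suggests, the optimizer settles on a small negative coefficient whose falling group delay cancels most of the filter's rise across the band.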
The approximation of the second section coefficient is explained graphically in Figure 1-58. The coefficient is calculated to satisfy the equation:
The symbol denotes the group delay of the filter compensated with a single allpass section. The initial values of the compensator coefficients applied to the optimization can be calculated as:
An example of the filter group delay compensated with four allpass sections is shown in Figure 1-59. The original filter had two coefficients (0.125 and 0.5625) and its group delay was compensated across the passband. The group delay peak-to-peak error decreased from 0.43 to 1.3e-4, over 3200 times! Compensation results for the same filter with the number of sections ranging from one to four and cutoff frequencies between 0.03125 and 0.25 are summarized in Table 1-7.
An important issue for the implementation of the cascaded polyphase decimation filter is high-speed calculation, especially in its first stages (operating at the highest frequencies). Therefore the compensator coefficients must also be constrained to a small bit length. The first stages, where the group delay is not large anyway, can be compensated with very short coefficients; such a multiplication can be implemented as a single shift-and-add operation. The last stages require more sections and longer coefficients, but there is then much more time for the calculations as the sample rate is much lower than in the first decimation stages.
Floating-point coefficients were constrained to four, eight and sixteen bits using the “bit-flipping” algorithm. One must consider that, if the required coefficient wordlengths are too small, the effectiveness of the compensation will be limited, sometimes making it infeasible; then alternative solutions must be sought. The results of constraining floating-point coefficients to four, eight and sixteen bits are shown in Table 1-8 and Table 1-9.
The factor K used in the tables is the ratio between the baseband group delay peak-to-peak error before optimization and the one obtained after optimization, and can be calculated from equation (87).
Comparing the results of the constrained-coefficient compensator with the floating-point ones reveals that the latter give much better performance. This shows that compensator coefficients must be designed and implemented using a long arithmetic wordlength. In many cases even eight bits were too small. If wordlengths like 16 bits were used, the effectiveness of the compensation was remarkable: by using three or four compensator sections the group delay ripples can sometimes be decreased a few thousand times (see results for small passbands in Table 1-8).

The effectiveness of the compensation decreases quickly with decreasing coefficient wordlength because the coefficient values in consecutive allpass sections follow a geometric series, quickly converging to zero. For small wordlengths some of the coefficient values are therefore too small to be represented with the given number of bits. The performance of the compensation is smallest for passbands reaching v=0.25 (short transition bandwidths), although there it decreases very little as the wordlength is decreased. Although the group delay ripples are then decreased only two to three times, one must remember that the bell-like group delay of the polyphase halfband lowpass filter peaks near the band edge, so even such a modest compensator performance still gives a considerable decrease of the absolute group delay ripples.

The performance of the constrained compensation is much better for smaller passbands, just as for the floating-point versions. The performance for wordlengths of 4 or 8 bits is much worse than for 16 bits or the floating-point cases. This degradation is much clearer for three or four coefficients than for one or two, simply because the coefficients of the first sections can be approximated with a small number of bits much more easily than the successive ones. As the latter are much smaller, there is not enough dynamic range to represent them properly.
Therefore the three- or four-coefficient compensators usually degenerate to the two- or one-coefficient cases (compare the results for passbands 0.0625 and 0.03125 in Table 1-8 and Table 1-9). Looking at the results of the constrained compensators, it can be noticed that the coefficient -0.0625 appears many times as the result of the 4-bit designs. It can be included in the first stages of decimation to perform a primary compensation at the higher frequencies; finer compensation can then be done after all the filtering operations. This is very attractive for multistage polyphase decimation filters requiring minimum group delay distortion. For such filters a small compensation could be performed at each of the first stages, giving approximately a ten-fold
decrease of the group delay ripples at the cost of a single shift-and-add operation and two delayers, which is all that is required to implement the compensator having the coefficient -0.0625. At every following stage, as the arithmetic wordlength increases and the sampling interval lengthens, the compensator can have more bits and more coefficients, giving better and better performance. At the last stage, where the passband reaches v=0.25, the compensation can be performed using a full-blown FIR/IIR filter. The requirements for this last stage will be much smaller than they would be if no prior compensation had been performed.

In conclusion, the compensation method presented in this chapter is a good way of compensating the group delay ripples of polyphase lowpass filters. It can be used in two different ways: one is to do fast but small compensation during the filtering (small coefficient wordlength), which does not involve much calculation or time; the other is to do slower but more effective compensation (larger wordlength and more coefficients), which requires more calculation and time. The choice is certainly dependent on the application. Either way, the method is competitive with a full-blown FIR/IIR compensator in terms of the amount of calculation and time required to achieve the same compensation performance. The results very clearly demonstrate that this type of group delay compensation can be used effectively at a low computational and hardware cost. It was also shown above that it can easily be included in each stage of the polyphase multistage decimation filter.
3.2 Approximating the bulk delay phase response
An alternative solution to linear-phase polyphase filtering is to rearrange the standard structure from Figure 1-2 by replacing the allpass filter in the lower branch with a bulk delayer (the simplest form of an allpass) as in Figure 1-60.
The lowpass polyphase structure works on the principle of the allpass filters from both branches being in-phase at the low frequencies and distant
by π at frequencies close to Nyquist. By putting a bulk delayer into the lower branch, the top allpass branch has to be designed to follow the linear phase response of the delayer in order to achieve the lowpass characteristic. To obtain equal phase characteristics of both branch filters at low frequencies (required for lowpass filtering), the order of the delay is chosen one less than the order of the allpass filter in the top branch. The idea as such is not new: it was suggested and used by Curtis, and employed in design routines recently published by Lawson [34] and Lu [35]. The current design methods are based on the standard idea of composing two identical IIR (non-linear-phase) filters to achieve an approximately linear-phase characteristic [34], or apply iterative quadratic programming methods [35]. Such an approach does not allow much flexibility, limiting the number of degrees of freedom to half of what would be available if a standard IIR filter were used; thus the resulting filter has larger stopband ripples and larger group delay ripples than the structure is really capable of achieving. Our design routine uses the Matlab weighted least-squares optimization routine to approximate the phase of the delayer in the filter passband and in the filter stopband. The transition band was not controlled at all. The important factor was the choice of the weighting function. Choosing constant weights for the passband and stopband led to stopband ripples decreasing monotonically with frequency while the passband ripples were monotonically increasing. Therefore an iterative method was applied which changed the weighting function according to the shape of the envelope of the passband/stopband group delay ripples at every iteration.
Because a general IIR filter is used in the top branch of the structure, which obviously does not have to be symmetric, it is important to monitor the group delay ripples both in the filter passband and in its stopband. The passband ripples are responsible for achieving small passband magnitude ripples, and the stopband ones for high stopband attenuation. The design routine can be described as follows:
1. Specify the required complex frequency response to be equal to the response of the delayer in the filter passband and to its negated response in the filter stopband. Also specify the frequency grid on a logarithmic scale so that it is denser close to the transition band.
2. Choose the weights of the optimization routine, W(v), equal to unity at all frequencies.
3. Perform a weighted least-squares fit to the frequency response data.
4. Calculate the group delay of the resulting filter.
5. Calculate the maxima of the group delay function and interpolate its ripple envelope through these points.
6. Update the weights:
7. Normalize the weights to their maximum value.
8. If the iteration number is less than the limit, proceed to step 3; otherwise deliver the answer vector.

It was found during experiments that a maximum of four iterations was required to reach within 1% of the result obtainable if the iterations were to continue further. In order to measure the performance of the method it was compared to the similar approaches suggested by Lu and Lawson. Example filters were designed according to the specifications given in their papers [34], [35]. Comparative results showing the stopband attenuation and the ripples of the magnitude and group delay responses for the given passband and stopband cutoff frequencies are given in Table 1-10. Plots of the designed filters are shown in Figure 1-61 and Figure 1-62.
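Steps 1-8 above can be sketched numerically. The fragment below is a simplified stand-in for the Matlab routine used in the text: it fits an FIR approximation to the linear-phase response of a bulk (fractional) delayer by weighted least squares and re-weights by the error envelope on each pass. The tap count, delay and frequency grid are illustrative choices of ours, and — unlike the actual routine — the re-weighting here uses the complex error envelope rather than the group delay ripple:

```python
import numpy as np

def fit_delayer(taps=16, delay=5.5, iterations=4):
    """Weighted least-squares fit of an FIR filter to a bulk delayer's
    linear-phase response, with Lawson-style re-weighting of the error
    envelope at each iteration (steps 2-8, simplified)."""
    w = np.linspace(0.0, 0.8 * np.pi, 200)          # frequency grid
    target = np.exp(-1j * w * delay)                # ideal delayer response
    E = np.exp(-1j * np.outer(w, np.arange(taps)))  # basis matrix e^{-jwn}
    weights = np.ones_like(w)                       # step 2: unity weights
    for _ in range(iterations):
        # step 3: weighted LS fit (stack real/imag parts for a real solve)
        A = weights[:, None] * E
        b = weights * target
        M = np.vstack([A.real, A.imag])
        v = np.concatenate([b.real, b.imag])
        h, *_ = np.linalg.lstsq(M, v, rcond=None)
        # steps 4-7: the error envelope drives the new, normalized weights
        err = np.abs(E @ h - target)
        weights = weights * (err + 1e-12)
        weights = weights / weights.max()           # step 7: normalize
    return h, err.max()

h, ripple = fit_delayer()
print("peak error after 4 iterations:", ripple)
```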
It can be clearly noticed that the method presented in this section is advantageous when contrasted with both Lu’s and Lawson’s. In both cases the filter order was chosen one less than in the competitive designs, even so leading to much better passband and stopband ripples as well as group delay ripples. Two values are given for the group delay ripples: the first is the ripple within 96% of the bandwidth and the second is for the full bandwidth. It can also be noticed that the stopband ripples are not equiripple. The purpose was to strike a compromise between achieving maximum attenuation and minimum group delay ripples in the passband; it is not possible to achieve both equiripple group delay and equiripple stopband behavior [36].
It was noticed that tightening the requirements on the group delay ripples led to decreased stopband performance: decreasing the group delay ripples by a few percent caused a few dB decrease in stopband attenuation. The design method was tested on a number of examples with different cutoff frequencies and transition bandwidths. The best performance for a given filter order and transition band specification is achieved for the halfband filter, in which case the design takes advantage of the symmetric allpass filter response, which is easier to achieve. The example design uses filter specifications similar to the ones from Table 1-10 with both bandedges shifted in order to set the cutoff frequency at v=0.25. The performance of the same filters forced to be symmetric is shown in Table 1-11.
It can be seen that the stopband attenuation does not increase much when the filter is made symmetric about v=0.25, improving only by 1.5dB for Lu’s modified specification and not at all for Lawson’s modified filter. However, there is a difference in the group delay ripples, which became twice smaller for both example filters.
4. POLYPHASE FFT ECHO CANCELLATION
In this section an example modification of the subband adaptive polyphase FFT echo cancellation system [79] is presented, in which the standard FIR filter banks are replaced with the polyphase IIR structure. It is demonstrated that such an alternative approach results in a much more computationally efficient implementation combined with more accurate channel detection and an improvement in the adaptation speed.
4.1 Introduction
Adaptive signal processing applications such as adaptive equalization or adaptive wideband active noise and echo cancellation involve filters with hundreds of taps required for an accurate representation of the channel impulse response. The computational burden associated with such long adaptive filters and their implementation complexity is very high. In addition, adaptive filters with many taps may also suffer from long convergence times,
especially when the reference signal has a large dynamic range. It is well known that subband adaptive techniques are well suited for high-order adaptive FIR filters, with a reduction in the number of calculations by approximately the number of subbands, whereby both the number of filter coefficients and the weight update rate can be decimated in each subband. Additionally, faster convergence is possible as the spectral dynamic range can be greatly reduced in each subband [79], [80]. A number of subband techniques have been developed in the past that use sets of bandpass filters, block transforms [81] or hybrids [82], which introduce path delays dependent on the complexity of the subband filters employed. The architecture proposed in [79] avoids signal path delay while retaining the computational efficiency and convergence speed of subband processing. The architecture of the delay-less subband acoustic echo cancellation employing LMS in each subband is shown in Figure 1-63 [79].
In this structure x(k) is interpreted as the far-end signal, d(k) as the signal received by the microphone containing some echoes after passing through the channel, and e(k) as the error (the de-echoed return signal as defined in [79]). Both x(k) and e(k) are decomposed into 16 bands and decimated down by 16. The LMS algorithm calculates 16 individual filters, one for each of the subbands, which are then re-composed back into one in the frequency domain to obtain the coefficients of the high-order Adaptive
FIR Filter. The update of the weights, as in the original design [79], is computed every 128 samples (at a 128 times lower rate than the input rate). The computational requirements can be separated into four sections: subband filtering, LMS, composition of the wideband filter, and signal convolution by the wideband filter. Naylor and Constantinides first proposed replacing the FIR filter bank with polyphase IIR structures for subband echo cancellation [83]. The work reported here presents a further improvement of the computational efficiency of the subband filtering stage of the polyphase FFT subband echo cancellation system. Consider the 16-band polyphase FFT adaptive system with the Adaptive FIR Filter having 1024 taps, in which each subband filter is based on the 128-coefficient prototype FIR filter. The number of Multiply/Add/Accumulate (MAC) operations required for the structure with the FIR filter bank was estimated to be 1088 per input sample. In order to lower the number of calculations per input sample, Morgan suggested updating the weights of the high-order Adaptive FIR filter every 128 samples [79], arguing that the output of the adaptive filter could not change faster than the length of its impulse response. Applying polyphase IIR filter structures to perform the subband filtering makes it possible to reduce the number of calculations, improves the convergence and the accuracy of adaptation, and allows an efficient implementation.
4.2 Polyphase IIR Filter Bank
The two-path polyphase IIR structures as given in [83] and [85] can be modified in such a way as to perform both the lowpass and highpass filtering operations simultaneously, as in Figure 1-64(a).
The output of the adder returns the lowpass-filtered signal and the output of the subtractor returns the highpass-filtered signal. It should be noted that the lowpass and highpass filtering actions are complementary, i.e. they result in perfect reconstruction, giving zero reconstruction error. Adding the Sample Rate Decreaser (SRD) by two yields a two-band subband filter. This basic building block can be further modified by shifting the sample rate decreaser to the input [25]. This modification results in half the number of calculations per input sample and half the storage requirements.
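A minimal numerical sketch of the complementary two-path block, reusing the two-coefficient halfband example from earlier in the chapter (0.125 and 0.5625) as the allpass constants; the section form A(z) = (a + z^-2)/(1 + a z^-2) and the extra path delay follow the standard two-path polyphase arrangement, and the impulse-response length is an arbitrary truncation of ours:

```python
import numpy as np

def allpass_z2(a, x):
    """First-order allpass in z^2: y[n] = a*x[n] + x[n-2] - a*y[n-2]."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        x2 = x[n - 2] if n >= 2 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y[n] = a * x[n] + x2 - a * y2
    return y

# Impulse responses of the two branches.
N = 512
x = np.zeros(N); x[0] = 1.0
p0 = allpass_z2(0.125, x)                 # upper path: A0(z^2)
p1 = np.roll(allpass_z2(0.5625, x), 1)    # lower path: z^-1 * A1(z^2)
p1[0] = 0.0
lp = 0.5 * (p0 + p1)                      # adder output      -> lowpass
hp = 0.5 * (p0 - p1)                      # subtractor output -> highpass
# Complementarity: lp + hp = p0, itself an allpass, so |LP+HP| = 1 everywhere.
print(np.allclose(np.abs(np.fft.fft(lp + hp)), 1.0))
```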
The polyphase IIR structure incorporates allpass sub-filters, as given by (106); a possible structure is shown in Figure 1-65.
Because of the small number of calculations required per filter order and its very high performance, such a structure is very attractive for filtering requiring high speed of operation and high levels of integration. The 16-channel subband filtering and 16-times decimation was achieved by incorporating the polyphase IIR filtering block from Figure 1-65 in the structure shown in Figure 1-66, where LPF stands for LowPass Filter and the number indicates the number of coefficients. Each stage of this four-stage structure splits the signal band into two equal-size bands followed by decimation by two; each of the resulting signals undergoes a similar operation at the next stage. All filters were designed for the same 70dB of attenuation to achieve an appropriate separation from the neighboring bands. The transition bandwidths were different, as they had to cater for the decrease of the sampling frequency at which each stage operates. The required transition bandwidth, TB, and the resulting filter coefficients for each stage of the filter bank are given in Table 1-12.
Each output from the filter bank is applied to one of the 16 LMS blocks (Figure 1-67), each operating on a small fraction of the overall input frequency range, thus achieving a fast operation speed. The channel approximation error is split into 16 frequency bands in the same way as the input signal. This way the phase non-linearity of the polyphase IIR filter bank does not cause errors, as both the input signal and the error are subject to the same group
delays. Each LMS block returns a 64-tap FIR filter decreasing the error of channel approximation to an acceptable level in its frequency band. The output of each LMS block is then applied to an N-point FFT, where N=64.
Note that the output bands of the filter bank are not in increasing order. Additionally, the sample rate decrease at the output of the highpass filter causes a flip of the frequency response. Therefore channel re-ordering and re-flipping is necessary before the IFFT operation (Figure 1-68).
The bank re-ordering block is responsible for arranging the subbands in increasing order of frequencies. The SL and SH operators are the
frequency selectors returning the lower half or the upper half of the FFT output respectively. This was necessary as some outputs of the 16-channel subband decomposition had their frequency responses flipped.
The bank re-ordering creates the positive part of the frequency response of the approximated channel filter. Flipping and conjugating the positive part inherently provides the negative part of the frequency response. This means that the output of the IFFT returns a real FIR filter, which accurately approximates both the magnitude response of the channel and its bulk delay.
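The flip-and-conjugate step can be sketched in a few lines: given only the positive-frequency half of a target response, appending its conjugate mirror before the IFFT forces a purely real impulse response. Here the positive half is an idealized 64-point spectrum carrying a 12-sample bulk delay, with values of our choosing:

```python
import numpy as np

N, d = 64, 12
k = np.arange(N // 2 + 1)
# Positive-frequency half: unit magnitude with a linear (bulk delay) phase.
H_pos = np.exp(-2j * np.pi * k * d / N)
# Flip and conjugate to build the negative frequencies, then IFFT.
H_full = np.concatenate([H_pos, np.conj(H_pos[-2:0:-1])])
h = np.fft.ifft(H_full)
print(np.max(np.abs(h.imag)))           # ~0: the resulting filter is real
print(int(np.argmax(np.abs(h.real))))   # the bulk delay is preserved
```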
4.3 Comparison of FIR and Polyphase IIR filter banks
The performance of channel identification when using the polyphase IIR filter bank was compared to Morgan’s FIR approach [79], both in theory and in simulation. The analysis included the comparison of frequency responses from both subband decomposition methods in terms of passband and stopband ripples and channel overlap. Morgan suggested using the ‘fir1’ Matlab routine for designing the prototype FIR filter for the subband filtering, which resulted in a filter as shown in Figure 1-69 [79].
The frequency axis was normalized to the input sampling rate. The band filters achieve -6dB at the crossover point to the next band, while -3dB would be required for zero reconstruction error at this point. The filter had 50dB of attenuation at the center of the next band and 70dB, required for the aliasing noise floor to be below -116dB, at the second band. The proposed polyphase structure achieves -3dB at the Fs/2 point by definition. It reaches 45dB attenuation at the center of the next band and 70dB before the end of the next band, giving a good separation from all the bands except the two adjacent ones. The overall reconstruction error achieved for the polyphase filter bank was below 10^-13, close to the arithmetic accuracy of the simulation platform (Figure 1-70).
One of the main advantages of the polyphase approach is its low number of multiplications required per input sample. The FIR approach as suggested by Morgan requires 32-tap filters and 17 bands, giving an overall MAC requirement of 1088 for subband filtering before the 16-times sample rate decrease. In comparison, the use of the multi-stage multi-rate polyphase IIR structure allowed a decrease in the number of MACs (excluding the trivial subtractions) per input sample to 24 if implemented as in
Figure 1-65(a), and only 14 when using the structure in Figure 1-65(b). Half of the calculations for this structure are performed at the odd sample intervals and the rest at the even sample intervals. Practical tests were carried out to compare the channel approximation when using both the FIR and the polyphase based approaches. The first one was designed in accordance with the one reported by Morgan [79]. The second one used the polyphase IIR filter bank as described in this section. The input test signal was speech sampled at 8kHz with 5% additive white noise. The channel was a 50th-order least-squares FIR bandpass filter positioned from 0.0625 to 0.375 on the normalized frequency axis with 50dB of stopband attenuation and an additional bulk delay of 400 samples.
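The operation counts quoted above can be reproduced with simple arithmetic; the factor of two in the FIR count is our reading of the setup, reflecting that both x(k) and e(k) pass through the filter bank:

```python
# FIR bank: 17 bands x 32 taps, applied to both x(k) and e(k).
fir_macs = 17 * 32 * 2
# Multi-stage multi-rate polyphase IIR bank, MACs per input sample,
# as quoted in the text for the two structures of Figure 1-65.
polyphase_macs_a = 24   # structure of Figure 1-65(a)
polyphase_macs_b = 14   # structure of Figure 1-65(b)
print(fir_macs)                       # 1088, matching the quoted figure
print(fir_macs / polyphase_macs_b)    # ratio of subband-filtering MACs
```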
The performance comparison between the polyphase approach and the standard one employing the FIR filter bank is shown in Figure 1-71. Both approaches accurately detected the 400-sample bulk delay, proving the viability of the method for echo cancellation, and their adaptation speeds were similar. The estimated least-squares error of channel approximation for the polyphase approach was 6.2dB in the passband and 9.4dB over the full range. This shows an improvement over the FIR approach, which gave a 7.5dB error in the passband and 13.5dB over the full range. The superior results for the polyphase IIR filter bank can be attributed to its steeper transition bands, perfect reconstruction (zero error), good channel separation and very flat passband response within each band. For an input signal rate of 8kHz the response time to changes in the channel is 0.064 seconds. The adaptation time for the given channel and input signal was measured to be below 0.2 seconds, and the channel approximation error fell below 10% in approximately 0.5 seconds. These times were estimated assuming that all calculations were completed within one sample period.
4.4 Summary
In this section the application of polyphase IIR filters to the subband filtering of the polyphase FFT adaptive echo cancellation architecture was presented. The results of the system incorporating the polyphase filter bank were compared to the standard FIR approach as reported in [79]. The novel alternative multi-stage multi-rate polyphase IIR approach to the design of the subband filter bank gives an almost ten-fold decrease in the number of MACs required, which can easily be translated into an increased number of bands for higher fidelity at the same computational cost as the FIR version, or into a low-power subband adaptive echo canceller. Additionally, the polyphase IIR filter structure used here is not very sensitive to coefficient quantization [85], which makes a fast fixed-point implementation of the echo cancellation algorithm an attractive option. Applying the polyphase IIR filters in the filter bank demonstrated more accurate channel detection than was possible with the FIR version suggested by Morgan [79], with much reduced computational complexity. They could also be very applicable in echo cancellation applications that require dealing with delay paths in excess of 64ms.
Chapter 2
FREQUENCY TRANSFORMATIONS
High-Order Mappings for Digital Signal Processing
1. AN OVERVIEW
The idea of the frequency transformation, where each delay of an existing FIR or IIR lowpass filter transfer function is replaced by the same allpass filter, is a simple one and allows a lot of flexibility in manipulating the original filter to fit the required specification. Although the resulting designs are considerably more expensive in terms of dimensionality than the original prototype, the ease of use (in fixed or variable applications) is a big advantage and has ensured that such mappings are frequently used for IIR filter designs. The general idea of the frequency transformation is to take an existing filter and produce from it some other filter replica in the frequency domain. Up to now the definitive mapping equations are those put forward by Constantinides [30], since adopted as the “industry standard”. These well-known equations are geared up to map lowpass to bandpass and several other highly stylized combinations. They are the culmination of preceding work [37]-[39], which departed from the earliest transformation work by Broome [40], where a simple modulation approach (suffering from severe aliasing) was used. Recent work [41], [42] has strengthened the utility of both of these methods. The basic form of mapping in common use is:
Here the prototype filter is acted upon by, in the general case, an Nth-order complex allpass mapping filter, as described by (2), thus
forming the target filter. The choice of an allpass to provide the frequency mapping is necessary in order to translate the prototype filter frequency response to the target one, changing the frequency positions of the features of the original filter without affecting the overall shape of its response.
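The defining property relied on here — that an allpass has unit magnitude everywhere on the unit circle, so substituting it for the delay can only move features in frequency, never change their height — is easy to verify numerically for a first-order real allpass (the coefficient 0.4 is an arbitrary choice of ours):

```python
import numpy as np

a = 0.4                                  # arbitrary first-order coefficient
w = np.linspace(0, np.pi, 256)
z1 = np.exp(-1j * w)                     # z^-1 evaluated on the unit circle
A = (a + z1) / (1 + a * z1)              # allpass A(z) = (a + z^-1)/(1 + a z^-1)
print(np.max(np.abs(np.abs(A) - 1.0)))   # ~0: |A(e^{jw})| = 1 for all w
```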
Here S is the Rotation Factor. The N degrees of freedom provided by the choice of mapping filter coefficients are usually under-used by the restrictive set of “flat-top” classical mappings, like lowpass-to-bandpass, which require only second-order mapping filters. In general, for an Nth-order mapping filter, any N transfer function features can be migrated to (almost) any N other frequency locations. An additional requirement for the mapping filter is to keep the poles of its transfer function strictly outside the unit circle - since it is substituted for z in the original prototype transfer function - in order to transform a stable prototype filter into a stable target one. The “Rotation Factor”, S, specifies the frequency shift for the target filter. For the case of the real frequency transformation, equation (2) is only allowed to have conjugate pairs of poles and zeros, and only single ones on the real axis; this means:
Furthermore, the choice of the sign of the outside factor S in equation (3) is limited, for the case of the real transformation, to S=+1 and S=-1. This, as first pointed out by Constantinides, determines whether the original feature at zero frequency can be moved (“DC mobility”), for the leading minus sign, or whether the Nyquist frequency can be migrated (“Nyquist mobility”), arising when the leading sign is positive. For the case
of the complex transformation, the factor S is allowed to take any complex value satisfying |S|=1. This shows that for the complex transformation both Nyquist and DC mobility can be achieved simultaneously. If the chosen rotation factor does not lie on the unit circle, then the mapping filter modifies both the frequency scale of the prototype filter and the values of its magnitude response, by changing the radii of the prototype filter pole-zero pairs. For example, using a trivial mapping results in (4).
The frequency scale does not change in this example, but each filter coefficient is scaled while the filter frequency response is rotated in frequency by the argument of S. In other words, the factor S causes a windowing effect on the prototype filter, with a windowing function resembling that of a moving average filter:
This function has N zeros equispaced around the unit circle, except at DC (v=0). The example for different values of S and N=8 is shown in Figure 2-1. Note that in order to change the frequency scale of the filter without affecting the height of its frequency response, the scaling factor should be chosen to have unit magnitude.
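The rotation effect of a unit-magnitude S follows directly from the modulation property: multiplying coefficient h[n] by S^n rotates the frequency response by arg(S) without changing its height. A small numpy check (the filter taps and the rotation amount are arbitrary choices of ours):

```python
import numpy as np

N = 64
rng = np.random.default_rng(0)
h = rng.standard_normal(16)              # arbitrary real prototype taps
S = np.exp(2j * np.pi * 5 / N)           # unit-magnitude rotation factor
h_rot = h * S ** np.arange(16)           # scale coefficient n by S^n
H = np.fft.fft(h, N)
H_rot = np.fft.fft(h_rot, N)
# The response is rotated by arg(S) = 2*pi*5/N, i.e. exactly five FFT bins.
print(np.allclose(H_rot, np.roll(H, 5)))
```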
The mapping equation (1) has an intuitive graphical interpretation, as shown in Figure 2-2. In the example shown, the mapping filter converts a real lowpass filter into a real multiband one. It can be noticed that the phase of the mapping filter acts as the mapping function. The characteristic points of this function are its zero crossings and the discontinuities caused by the phase crossing the ±π boundary. The former indicate the frequencies to which the DC feature of the prototype filter is mapped, and the latter where the Nyquist feature of the original filter is placed.
Though the enhanced design flexibility that frequency transformations offer is readily evident, there has been little work reported in this area. Standard transformations under-use the freedom given in (1) by limiting themselves to simple first-order mapping filters performing lowpass and highpass transformations, and second-order mapping filters for bandpass and bandstop targets. Although Mullis and colleagues have given one very useful multiband solution to the general mapping problem [45]-[48], it seems that scant application experience of that method has been related in the IIR design literature. Recent work reported in [42] showed how N arbitrary features of the prototype can be mapped by employing an allpass mapping filter easily defined by solving a set of N complex linear equations, which gives real mapping filter coefficients. In [43] a different approach to the design of the mapping filter was suggested. Typical approaches are based on mapping selected features to their new locations, which yields certain mapping filter coefficients. Using such an approach, designers can only hope that the filter behavior between the specified features will be correct. This is true when the
allpass mapping filter order equals the number of replicas. A better way to design a mapping filter is to concentrate explicitly on designing its phase. If this is done through the deployment of poles and zeros, then such a design becomes easier and surer. Avoiding additional replicas is not an easy task and may not work in all design cases, as discussed in [43], which also comments on how target filter stability depends on the behavior of the prototype filter and the mapping filter.
1.1 Selecting the Features
Choosing the appropriate frequency transformation for achieving the required effect, and the correct features of the prototype filter, is very important and needs careful consideration. It is not possible to use a first-order transformation to control more than one feature, as the mapping filter does not give enough flexibility. Neither is it good to use a high-order transformation just to change the cutoff frequency of a lowpass filter, as this unnecessarily increases the filter order, not to mention the additional replicas of the original filter that may be created in undesired places.
In order to illustrate the second-order real transformation, it was applied three times to the same elliptic halfband prototype lowpass filter in order to make it into a bandpass filter, each time selecting two different features for the transformation (Figure 2-2). This filter was designed in Matlab to have ripples of 0.1dB in the passband and -30dB in its stopband. The idea was to convert the prototype filter into a bandpass one with a passband ranging from 0.125 to 0.375 on the normalized frequency scale. In
each of the three cases different features of the prototype filter were selected. In the first case the selected features were the left and right band-edges of the lowpass filter passband, in the second case the left band-edge and the DC, and in the third case the DC and the right edge of the filter passband, as shown in the figure. The results of the three approaches differ, as shown in Figure 2-3. For each of them only the selected features were positioned precisely where required. In the first case the DC moved towards the left passband edge, just like all the other features close to the left edge being squeezed there. In the second case the right passband edge was pushed well off the expected target, as the precise position of the DC was required. In the third case the left passband edge was pulled towards the correctly positioned DC feature.
The conclusion is that if the DC feature is allowed to end up anywhere in the passband, the edges of the passband should be selected for the transformation. For most cases requiring the positioning of passbands/stopbands, the features of the original filter need to be chosen carefully so that the edges of the target filter end up in the correct places.
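The elliptic prototypes used throughout this chapter are easy to reproduce. The sketch below uses SciPy instead of the Matlab toolbox mentioned in the text; placing the cutoff at a quarter of the sampling rate is my assumption for a halfband-style prototype, not a transcription of the book's design script:

```python
import numpy as np
from scipy import signal

# Third-order elliptic lowpass: 0.1 dB passband ripple, 30 dB stopband
# attenuation.  SciPy's Wn is a fraction of Nyquist, so 0.5 puts the
# cutoff at a quarter of the sampling rate (an assumption here).
b, a = signal.ellip(3, 0.1, 30, 0.5)

# Inspect the magnitude response.
w, h = signal.freqz(b, a, worN=1024)
print(abs(h[0]))                          # gain at DC (~1 for odd order)
print(20 * np.log10(abs(h[-1]) + 1e-12))  # attenuation near Nyquist
```

Such a prototype is the starting point for every transformation discussed below.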
1.2
Designing the mapping filter
It is frequently required to have fuller control of the filter behavior, at frequencies other than the (almost) arbitrary pair under direct control, than that delivered through standard first and second-order transformations [45]-[48]. A typical example is the need to convert a lowpass filter to a bandpass one, simultaneously retaining a capability of precise placement of the upper and lower bandedge frequencies, along with a couple of specified intervening frequency features.
2. Frequency Transformations
Certainly such enhanced transfer function control can only be achieved at a cost in complexity, as each pole-zero pair of the prototype filter is replaced with N such pairs for the mapping filter, as given by (2). Nevertheless, practical goals such as rapid re-design in tunable filtering scenarios are among the good reasons practitioners might wish to absorb this cost, and have provided the motivation for a noteworthy body of earlier work [42],[43]. This gives four distinct opportunities for influencing the overall filtering operation:
a) Choice of the structure and order for the prototype filter
b) Selection of coefficients for the prototype filter
c) Choice of the structure and order N for the mapping filter
d) Selection of the coefficients in the mapping filter
Real-time design in items (b) and (d), in particular, gives a nice way of achieving nested variability. There is, moreover, scope for driving these changes (including even dynamic change of N) in an adaptive IIR arrangement. This adds greatly to the appeal of the whole approach, and has motivated our development of a general matrix solution equation for the N coefficients in (2) in terms of arbitrary frequency migration specifications. The coefficients of the mapping filter can be calculated by solving a set of N linear equations created from the N migration pairs
where the phase factors, one for each of the N migration pairs, are given by:
For real frequency transformations equation (6) can be simplified to:
The designer needs only (in principle) to specify the N pairs, assemble and solve either (6) or (8), depending on the type of transformation, for the coefficient values, and then map with (1). The only difficulty lies in selecting allowable mapping point pairs at the outset of the procedure. For the best results the features to map from the prototype filter should be sorted in either increasing or decreasing order of their values. In most practical cases there is no need to use high-order transformations beyond the second-order ones. The flexibility of movement of the filter features is often lost as the filter dimensionality increases, especially for large prototype filters: since each pole-zero pair of the prototype is replaced by N such pairs, an Nth-order mapping applied to an IIR prototype results in a target filter of N times the prototype order. The next sections concentrate on the first and second-order mapping cases and on the most commonly used high-order transformation, the multiband one, for both real and complex situations. At the end the general case will be presented.
1.3
First-order transformations
We can think of (2) as relating old and new z-domain pairs of features. For the first-order mapping case equation (2) reduces to:
For the general case of the complex transformation the selection of two distinct migrations is allowed. In order to express (9) in the frequency domain we need to evaluate it on the unit circle. Factor S in (9) specifies the distance around the unit circle between the two points. Assuming that the required relation holds, the parameters of the mapping filter can be calculated.
For the first-order mapping filter to perform a real transformation, the values of S and the mapping filter coefficient must be real. This means that for the case of a real transformation only one mapping pair can be specified. The rotation factor S can only take the values S=(+1) or S=(-1). Selecting S=(+1) defines a lowpass-to-lowpass mapping, while choosing S=(-1) creates a mirror image of the original transfer function, i.e. a lowpass-to-highpass mapping, according to the following property of the Z-transform:
The possible choices of first-order transformations, both for real and complex cases, are shown below, including the standard Constantinides cases. For all the examples the prototype was designed as a third-order elliptic filter having 0.1dB passband ripple and 30dB stopband attenuation. The transfer function of this filter is given by:
1.3.1
Complex frequency shift (complex rotation)
This is the simplest first-order transformation and also the only one that performs exact mapping of all the features of the prototype filter frequency response into their prescribed new locations. Its purpose is to rotate the whole response of the prototype lowpass filter by the distance specified by the selected feature of the prototype filter and the corresponding feature of the target filter. The mapping filter is given by:
with the rotation angle defined as:
where is the frequency location of the selected feature in the prototype filter and is the position of the corresponding feature in the target filter. An example of rotating by is shown in Figure 2-5.
Target filter coefficients can be calculated from the prototype filter using:
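The calculation amounts to multiplying each coefficient by a complex exponential of its delay index. A minimal sketch (the function name and the particular rotation angle are mine, not the book's):

```python
import numpy as np
from scipy import signal

def rotate(coeffs, theta):
    """Complex frequency shift: multiplying coefficient k by exp(+j*k*theta)
    gives H_new(e^{jw}) = H_old(e^{j(w - theta)}), i.e. the whole response
    rotated by theta radians around the unit circle."""
    k = np.arange(len(coeffs))
    return np.asarray(coeffs, complex) * np.exp(1j * k * theta)

# Rotate a real lowpass prototype by a quarter of the sampling frequency.
b, a = signal.ellip(3, 0.1, 30, 0.5)
theta = np.pi / 2
bb, aa = rotate(b, theta), rotate(a, theta)
```

The rotated filter has complex coefficients; every feature of the prototype response moves by exactly theta, which is why this is the only first-order transformation with no warping.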
A special case of the complex rotation in common use is the mirror one, for the case of shifting by half the sampling frequency, where the target filter is a mirror image of the prototype filter about the half-Nyquist frequency.
The example design is shown in Figure 2-6 for a quarter-band lowpass prototype filter. Such a transformation can be used to quickly convert a lowpass filter into a highpass mirror-complement one. One possible use is in the design of complementary (matched) filters for quadrature mirror filter banks. Another special case of the complex rotation, often used in practical applications, is the Hilbert transformation. In this case the rotation is by a quarter of the sampling frequency, with the opposite sign for the inverse Hilbert transformation.
The typical use of the Hilbert transformation is for Single Sideband Modulation (SSM) and demodulation, and for extracting the envelope of oscillatory signals in measurement applications.
1.3.2
Real lowpass-to-lowpass
This transformation converts the prototype filter in such a way that the DC and Nyquist features are locked in their places and one selected feature of the prototype filter frequency response is mapped into a new location. The mapping filter is derived from (9) by using:
The example use of this mapping for moving the cutoff of the prototype halfband filter of (12) from to is shown in Figure 2-7.
Note that freezing the DC and Nyquist locations and moving one feature to a new location causes stretching and contracting of the rest of the filter frequency response. Calculating the mapping function allows one to assess those effects and determine where other features are mapped. By evaluating (12) on the unit circle the mapping functions of (17) are obtained.
Some cases of (17) are shown in Figure 2-9. It can be noticed that when the selected feature moves towards the Nyquist frequency (a positive shift), the features above it get squeezed while the ones below get stretched. The effect is opposite for the case of the selected feature moving towards the DC.
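This mapping can be sketched using the classical Constantinides lowpass-to-lowpass allpass coefficient; the variable names below are mine, since the book's own symbols did not survive extraction:

```python
import numpy as np

def lp2lp_alpha(w_old, w_new):
    """Coefficient of the allpass substitution z^-1 -> (z^-1 - a)/(1 - a z^-1)
    that moves the prototype feature at w_old (radians) to w_new (radians)
    while keeping DC and Nyquist fixed."""
    return np.sin((w_old - w_new) / 2) / np.sin((w_old + w_new) / 2)

def mapped_from(w, a):
    """Mapping function: the prototype frequency that the target filter
    draws its value from at target frequency w."""
    zi = np.exp(-1j * w)
    return -np.angle((zi - a) / (1 - a * zi))

w_old, w_new = 0.5 * np.pi, 0.3 * np.pi   # e.g. move a halfband cutoff down
a = lp2lp_alpha(w_old, w_new)
```

Evaluating `mapped_from` over a grid reproduces the stretching/squeezing behaviour discussed above: only the selected feature lands exactly, while everything else is warped.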
1.3.3
Real lowpass-to-highpass
This transformation is analogous to the real lowpass-to-lowpass one, with the only difference that the DC and Nyquist features exchange places. This in effect converts a lowpass filter into a highpass one and vice versa. The mapping filter is derived from (9) by using:
The example use of this mapping for moving the cutoff of the prototype halfband filter of (12) from to , at the same time changing its lowpass character into a highpass one, is shown in Figure 2-8.
The mapping function, calculated from (12), is given by:
The relation of (19) is very similar to the one of (18) for the real lowpass-to-lowpass transformation, shown in Figure 2-9. The difference is that the mapping function is shifted in frequency by half of the Nyquist frequency, creating a phase discontinuity at DC (a clear indication that the Nyquist feature is mapped to DC). On reaching the Nyquist frequency the mapping function goes to zero, indicating that the DC feature is mapped to this frequency.
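The lowpass-to-highpass case admits the same kind of sketch, here using the standard Constantinides coefficient formula (again with my own symbol names rather than the book's):

```python
import numpy as np

def lp2hp_alpha(theta_p, w_p):
    """Allpass coefficient for the real lowpass-to-highpass mapping
    z^-1 -> -(z^-1 + a)/(1 + a z^-1) (standard Constantinides form),
    relating the new band edge w_p to the prototype edge theta_p."""
    return -np.cos((theta_p + w_p) / 2) / np.cos((theta_p - w_p) / 2)

def substitution(w, a):
    """Value substituted for the prototype's z^-1 at target frequency w."""
    zi = np.exp(-1j * w)
    return -(zi + a) / (1 + a * zi)

theta_p, w_p = 0.5 * np.pi, 0.7 * np.pi   # halfband prototype -> highpass edge
a = lp2hp_alpha(theta_p, w_p)
```

Note that `substitution(0.0, a)` equals -1, confirming that DC and Nyquist exchange places as described above.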
1.3.4
Complex lowpass-to-bandpass
This transformation is derived as a cascade of the real lowpass-to-lowpass mapping and the complex frequency rotation. It performs exact mapping of one selected feature of the prototype filter frequency response into two new locations in the target filter, creating a passband between them. Both Nyquist and DC features can be moved with the rest of the frequency response. The mapping filter is derived from (9) by using:
Frequency location of the selected feature in the prototype filter
Position of the feature originally at in the target filter
Position of the feature originally at in the target filter
The mapping function is given by:
The example in Figure 2-12 shows the use of such a transformation for converting a real half-band lowpass filter into a complex bandpass one with band edges at and
The shape of the mapping function is shown in Figure 2-13 for different values of and the DC shift as in the example from Figure 2-12. The features mapped in the example are marked on the plots.
1.3.5
Complex lowpass-to-bandstop
This first-order transformation performs exact mapping of one selected feature of the prototype filter frequency response into two new locations in the target filter creating a stopband between them. Both Nyquist and DC features can be moved with the rest of the frequency response. The mapping filter is derived from (9) by using:
Frequency location of the selected feature in the prototype filter
Position of the feature originally at in the target filter
Position of the feature originally at in the target filter
The mapping function is given by:
Example in Figure 2-14 shows converting a real half-band lowpass filter into a complex bandstop with bandedges at and
The shape of the mapping function is shown in Figure 2-13 for different values of and the Nyquist shift as in the example from Figure 2-12. The features mapped in the example are marked on the plots.
1.3.6
Complex bandpass-to-bandpass
This first-order transformation performs exact mapping of two selected features of the prototype complex bandpass filter into two new locations. Both Nyquist and DC features can be moved with the rest of the frequency response. The mapping filter is derived from (9) by using:
Frequency locations of the selected features in the prototype filter
Position of the feature originally at in the target filter
Position of the feature originally at in the target filter
The mapping function is given by:
Figure 2-16 shows an example of converting a complex bandpass filter with bandedges originally at and into a new complex bandpass filter with bandedges at and
The shape of the mapping function is shown in Figure 2-17 for different values of and the Nyquist shift as in the example from Figure 2-16. The features mapped in the example are marked on the plots.
1.4
Second-order transformations
For the case of the second-order mapping, the old and new z-domain images are related by:
D(z) is the denominator polynomial. The numerator of the filter is a mirrored and conjugated version of the denominator polynomial. The phase response can then be calculated using the method from [86], which for the case of the real transformation leads to the mapping function given below:
The second-order mapping function allows two distinct migrations to be specified independently. They are then used to solve the two simultaneous equations which arise from equation (26). For the case of a real transformation the result is:
Where:
The upper (+) sign in (14) is applicable for “Nyquist mobility”, the lower one for “DC mobility”. The explicit unit-circle form is obtained by replacing the “old” location with its unit-circle equivalent and appending suitable subscripts, and likewise for the “new” unit-circle locations. Here is the normalised frequency variable. This yields the more complete relations: (30) for DC mobility and (31) for the Nyquist one.
In the well-known Constantinides formulas “old” features are usually cast as bandedges whose “new” images are also bandedges. For instance, selection of (where is the edge of the passband of a prototype lowpass filter), along with corresponding images and , will deliver a resulting bandpass filter with its passband within the positive-frequency band. However, as Constantinides pointed out in [30], the DC frequency gain of the prototype does not map to but rather is warped to a new location, which requires calculation of an additional equation. It is easy to modify the standard Constantinides result. For instance, to explicitly control the movement of the DC feature and one bandedge, the parameters and are selected for use in (30) and (31). This results in a “DC plus upper bandedge” controlled alternative to Constantinides’ lowpass-to-bandpass equation. However, there is always the limitation of explicitly controlling only two features as long as a second-order mapping filter is employed.
It is worthwhile reflecting upon whether any two combinations will be legitimate when using (22). The intention is to map a stable prototype to a stable target purely through use of (1), taking the mapping filter to have poles strictly outside the unit circle. Such a condition is easily seen to be sufficient to guarantee such stability inheritance. Therefore certain restrictions on mapping pairs must be observed. Note that a minimum-phase numerator (all zeros inside the unit circle) in (5) requires:
Then (30) and (31) begin to reveal the interplay of allowable “old” and “new” locations if these are specified in unit-circle forms (as is most often of interest to the filter designer). Continuing in this way and demanding DC movement to frequency the following relation is obtained:
Finally:
This finishes the discussion on allowable frequency specification combinations. A list of general real and complex choices of frequency transformations covered by second-order mapping functions is given below.
1.4.1
Real lowpass-to-bandpass
This transformation performs exact mapping of one selected feature of the prototype filter frequency response, namely the cutoff frequency, into two new locations, and in the target filter creating a passband between them. The DC feature moves with the rest of the frequency response, while the Nyquist one stays fixed. The mapping filter is derived from (26) using:
Frequency location of the selected feature in the prototype filter
Position of the feature originally at in the target filter
Position of the feature originally at in the target filter
The example in Figure 2-18 shows the use of such a transformation for converting a real half-band lowpass filter into a real bandpass one with band edges at and The shape of the mapping function corresponding to this example is presented in Figure 2-19.
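The second-order lowpass-to-bandpass substitution can be sketched from the standard Constantinides formulas; the band-edge values below are arbitrary example choices, not the book's figures:

```python
import numpy as np

def lp2bp_substitution(theta_p, w1, w2):
    """Second-order lowpass-to-bandpass substitution (standard
    Constantinides form).  theta_p: prototype cutoff; w1, w2: target
    band edges (all in radians).  Returns a function giving the value
    substituted for the prototype's z^-1 at target frequency w."""
    alpha = np.cos((w2 + w1) / 2) / np.cos((w2 - w1) / 2)
    k = np.tan(theta_p / 2) / np.tan((w2 - w1) / 2)
    A, B = 2 * alpha * k / (k + 1), (k - 1) / (k + 1)
    def sub(w):
        x = np.exp(-1j * w)               # z^-1 on the unit circle
        return -(x * x - A * x + B) / (B * x * x - A * x + 1)
    return sub

theta_p, w1, w2 = 0.5 * np.pi, 0.2 * np.pi, 0.6 * np.pi
sub = lp2bp_substitution(theta_p, w1, w2)
```

Both band edges land on the prototype cutoff (with opposite signs of the prototype frequency), while the Nyquist point maps to itself, matching the fixed-Nyquist behaviour described above.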
1.4.2
Real lowpass-to-bandstop
This transformation performs exact mapping of one selected feature of the prototype frequency response, namely the cutoff frequency, into two new locations, being the passband edges of the target filter. The Nyquist feature moves with the rest of the frequency response, while the DC one stays fixed.
The mapping filter is derived from (26) using:
Frequency location of the selected feature in the prototype filter
Position of the feature originally at in the target filter
Position of the feature originally at in the target filter
The example in Figure 2-20 shows the use of such a transformation for converting a real half-band lowpass filter into a real bandstop one with band edges at and The shape of the mapping function corresponding to this example is presented in Figure 2-21.
1.4.3
Real frequency shift
This transformation performs exact mapping of one selected feature of the prototype filter frequency response, namely the cutoff frequency, into a new location in the target filter, performing an operation of frequency shift. It is very similar to the real lowpass-to-bandpass transformation, likewise moving the Nyquist feature with the rest of the frequency response and keeping the DC feature fixed. The only difference is that any feature of the prototype filter can be selected. It is then mapped to the exact position in the target filter. The mapping filter is derived from (25) using (33) with S = –1.
Frequency location of the selected feature in the prototype filter.
Position of the feature originally at in the target filter.
The example in Figure 2-20 shows the use of such a transformation for shifting the cutoff of the real half-band lowpass filter to a new frequency location, converting it into a real bandstop filter.
The mapping function shape for the above example is given in Figure 2-21. Note that the real shift transformation is not linear. It shifts features of the prototype filter differently, dependent on their distance from the selected feature, the only one shifted correctly. Features between and will be moved by smaller amounts while the other ones will be moved by bigger amounts. This is an unavoidable side effect of this transformation, needed to preserve the real character of the filter.
2.
M-BAND TRANSFORMATION
This transformation performs an exact mapping of one selected feature of the prototype filter frequency response into a number of new locations in the target filter. Its most common use is to convert a real lowpass filter with predefined passband and stopband ripples into a multiband filter with arbitrary band edges. The order of the mapping filter must be even, which corresponds to an even number of band edges in the target filter. The complex allpass mapping filter is derived from (6) and given by:
Where:
For the case of the real transformation the mapping filter coefficients can be calculated from a much simpler set of N linear equations:
In the fourth-order example shown in Figure 2-24 the cutoff frequency of the prototype filter, originally at was mapped to four new frequencies Rotation factor S specifies whether the DC should be mapped to itself (S=1), similar to the lowpass-to-lowpass transformation (solid line), or exchanged with the Nyquist (S=-1) as for the case of the lowpass-to-highpass mapping (dashed line).
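Solving such a set of linear equations can be sketched directly. The matrix below is my reconstruction under one common real-allpass convention (numerator equal to the reversed denominator, with an overall sign S), so it illustrates the idea rather than transcribing the book's exact equations:

```python
import numpy as np

def mband_allpass(w_old, w_new, S=1):
    """Solve N linear equations for real allpass coefficients a[1..N]
    (a[0] = 1) so that, with D(z) = sum_k a[k] z^-k and
    A(z) = S * z^-N * D(1/z) / D(z), the mapping condition
    A(e^{j w_new[i]}) = e^{-j w_old[i]} holds for every pair."""
    N = len(w_new)
    M = np.empty((N, N))
    rhs = np.empty(N)
    for i, (wo, wn) in enumerate(zip(w_old, w_new)):
        for k in range(1, N + 1):
            M[i, k - 1] = np.sin(k * wn) - S * np.sin((N - k) * wn - wo)
        rhs[i] = S * np.sin(N * wn - wo)
    return np.concatenate(([1.0], np.linalg.solve(M, rhs)))

def allpass_response(a, w, S=1):
    """Evaluate A(e^{jw}) for the allpass defined above."""
    N = len(a) - 1
    D = np.polyval(np.asarray(a)[::-1], np.exp(-1j * w))
    return S * np.exp(-1j * N * w) * np.conj(D) / D

# Fourth-order example: map one prototype cutoff to four new frequencies.
targets = [0.2 * np.pi, 0.4 * np.pi, 0.6 * np.pi, 0.8 * np.pi]
a = mband_allpass([0.5 * np.pi] * 4, targets)
```

Because the system is linear in the coefficients, re-solving it is cheap, which is what makes the adaptive band-tuning applications mentioned below practical.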
The shape of the mapping function corresponding to the example from Figure 2-24 is given in Figure 2-25 with mapping points marked. The flexibility offered by this transformation is presented in Figure 2-26 for a number of designs with one bandedge fixed and other ones varying.
The direct calculation of the multiband transformation parameters by solving a simple set of linear equations gives a great advantage for applications requiring adaptive band tuning. The list of applications includes digital equalizers, noise, echo and interference cancellation, and others.
2.1.1
Rate-up transformation
The rate-up mapping is a special case of the M-band transformation, where the target filter is a symmetric multi-replica version of the prototype filter. The mapping filter takes a trivial form of (41).
The example of this transformation for the case of M=3, creating three replicas of the original lowpass filter around the unit circle, is shown in Figure 2-27. The cutoff frequency of the original filter and the bandedges of the target filter (where it was mapped) are shown on the plots of the magnitude responses.
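Because the rate-up mapping is just the substitution of z^-M for z^-1, the target coefficients are the prototype coefficients separated by M-1 zeros. A small sketch (the helper name is mine):

```python
import numpy as np
from scipy import signal

def rate_up(coeffs, M):
    """Substitute z^-1 -> z^-M: insert M-1 zeros between consecutive
    coefficients, replicating the prototype response M times around
    the unit circle."""
    out = np.zeros((len(coeffs) - 1) * M + 1)
    out[::M] = coeffs
    return out

b, a = signal.ellip(3, 0.1, 30, 0.5)      # lowpass prototype
b3, a3 = rate_up(b, 3), rate_up(a, 3)     # M = 3: three replicas
```

No root-finding or equation-solving is involved, which is why this special case is described as trivial.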
Rate-up mapping is a linear mapping, affecting the distances between the features on the frequency axis in the same way. This is even clearer when looking at the mapping function plot corresponding to the example (Figure 2-28).
The most common use of this transformation is for shifting the position of a digital filter from one place in the system chain to another one operating at a different rate, assuming an integer ratio of both frequencies. This is especially useful for polyphase IIR filters, as shown later in Chapter 3.
3.
GENERAL N-POINT (N-IMAGE) MAPPING
Usage of a simple “transformation filter” is often inadequate. It is frequently required to have greater control, at frequencies other than the (almost) arbitrary pair under direct control, than that delivered as a consequence of the warping attending this simple transformation. A typical example is the need to transport a lowpass filter to bandpass, simultaneously retaining a capability of precise placement of the upper and lower bandedge frequencies, along with a couple of specified intervening frequency features. This requires modification and extension to higher-order versions of (1) and (5). In particular, a higher-order mapping filter would replace the one in (1). Of course such enhanced transfer function control can only be achieved at a cost in complexity, as can be seen clearly from the escalated dimensionality of the PZP characteristic of such designs. Nevertheless, practical goals such as rapid re-design in tuneable filtering scenarios are among the good reasons practitioners might wish to absorb this cost and have provided motivation for earlier work [49]-[50]. The approach is indeed flexible. Equation (42) gives the opportunity to control up to N independent features of the prototype filter spectrum. Standard transformations under-use the flexibility offered by frequency transformations by limiting them to simple first-order mapping filters performing lowpass and highpass transformations, or second-order “mappers” for bandpass and bandstop targets. Clearly there is a need to extend beyond the traditional second-order mappings, which can control redeployment of only two transfer function features.
The D(z) term is the denominator polynomial. The numerator of the transfer function is a mirror image of the denominator polynomial. The filter is assumed stable, that is, it has all its poles inside the unit circle and hence all its zeros outside the unit circle, at reciprocal locations. The phase response can be expressed as:
The term above is the phase response of the denominator polynomial, that is:
The mapping function can then be expressed as:
Equation (42) gives four possibilities of influencing the overall operation:
a) Choice of the structure and order for the prototype filter
b) Selection of coefficients for the prototype filter
c) Choice of the structure and order N for the mapping filter
d) Selection of the coefficients in the mapping filter
Real-time design of items (b) and (d), in particular, gives a nice way of achieving nested variability. There is, moreover, scope for driving these changes (including even dynamic change of N) in an adaptive IIR arrangement. This has motivated the development of a general matrix solution equation for the coefficients in (42) in terms of (almost) arbitrary frequency migration specifications. Considering (42) as a mapping:
From all the N mappings a set of N linear equations is created:
The designer need only (in principle) specify the N pairs, assemble and solve (47) for the coefficient values, and then map with (1). The only difficulty lies in selecting allowable mapping point pairs at the outset of the procedure. It has been established as a rule of thumb that, in order to avoid bad conditioning of the matrix of equations, the migrations should be arranged in increasing order of frequencies, that is:
4.
PHASE RESPONSE OF THE MAPPING FILTER
All-important is the behaviour of an allpass mapping filter in its phase response. This is frequently forgotten by designers who focus on moving only selected points of the frequency response of the original filter to their new locations, without bothering about what happens in between the selected features. If the mapping filter phase is not monotonic then inevitably a “break-up” of the prototype filter transfer function will take place and an “N-for-N” multi-banding will result. Enforcement of the monotonic change of the mapping filter phase response is therefore a goal inseparable from the “less-than-N-band” mapping task. It is this realization which has motivated the prime breakthrough in our work: a viewpoint of constrained design of mapping filters, as opposed to rather haphazardly spawning them implicitly through (closed-form or linear equation) specification of isolated points on the phase characteristic. The new method reported enforces monotonic change of the mapping filter phase by design; the shape of the allpass filter phase response between specific points (responsible for mapping the chosen features) is controlled. This requires use of an allpass filter to map M (M less than N) selected features of the prototype filter.
Pole/zero pairs located on the negative real axis are placed too far from each other, which creates a non-monotonic phase of the mapping filter (point C), placing one additional replica at point D. The DC feature mapped at frequency B results from the replica created by the phase jump at As both pairs have to lie close to each other, we can freely change only one of their radii and their frequency, which means that in most cases two pole/zero pairs can control only one feature of the prototype filter if one more replica is to be avoided. Using these observations we can design an allpass filter able to map the chosen set of prototype filter features and create any number of replicas (if required) located at any frequencies. The implication is that any M-point frequency transformation with R replicas can be performed with a mapping filter of order N given by:
The design procedure was implemented in Matlab and uses a Gauss-Newton optimization method. The constraint includes pole (zero) radii and their frequencies, and also takes care of their interaction. For pairs of poles and zeros located at the same frequency it also takes care not to let the mapping filter phase go non-monotonic (which would be fatal). This effect is shown in Figure 2-30.
In this example the poles and zeros of the mapping filter in Figure 2-30(b) were arranged in such a way as to make the phase of the filter from Figure 2-30(a) not reach values close to zero. This means that the features in the range indicated by would not be mapped to the target filter. The lowpass prototype filter was designed for a cutoff frequency of hence the target filter consisted only of stopband features of the prototype filter, as shown in Figure 2-30(c).
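The monotonicity requirement discussed in this section is easy to verify numerically before committing to a mapping filter. A sketch of such a check, assuming a real allpass in the reversed-denominator form (my convention, not necessarily the book's):

```python
import numpy as np

def phase_is_monotonic(a, n=4096):
    """True if the unwrapped phase of A(z) = z^-N D(1/z)/D(z), with
    D(z) = sum_k a[k] z^-k, decreases strictly over [0, pi).
    A monotonic phase is what prevents unwanted extra replicas."""
    w = np.linspace(0, np.pi, n, endpoint=False)
    N = len(a) - 1
    D = np.polyval(np.asarray(a, float)[::-1], np.exp(-1j * w))
    phase = -N * w + np.unwrap(np.angle(np.conj(D) / D))
    return bool(np.all(np.diff(phase) < 0))
```

A grid test like this is only a screen, not a proof, but it catches the “break-up” cases (such as the one in Figure 2-30) before the prototype is mapped.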
5.
STABILITY OF THE TARGET FILTER
A very important aspect of the frequency transformation process is the stability of the target filter. Note that each pole and each zero of the original filter will be replaced with N pole-zero pairs. This means that each term will be substituted by a polynomial:
From the stability point of view we only need to take care of where the poles of the target filter are going to be placed. For a stable prototype filter all poles are inside the unit circle. It means:
It is interesting to see what happens to the poles of the target filter when the stable prototype filter is mapped using different mapping filters. Consider a mapping filter with all poles outside the unit circle; then the target filter will have all its resulting poles inside the unit circle, as in Figure 2-31(a). Similarly, if all poles of the mapping filter are inside the circle then the target filter will end up with all its poles outside the unit circle, as in Figure 2-31(b).
For a combination of poles inside and outside the unit circle the resulting poles will lie both inside and outside the unit circle, as in Figure 2-31(c). This means that for a stable frequency transformation all the poles of the mapping filter must be outside the unit circle. We can notice that root trajectories start at reciprocal locations of the mapping filter roots for Moreover, from (50), it is clear that the location of roots inside or outside the unit circle is determined in the general case by the numerator of the mapping filter. For the numerator dominates over the denominator. The high-order frequency transformation we propose does not place all target filter poles inside the unit circle for a less-than-N-band transformation. Therefore after the transformation each exterior pole must be flipped inside the unit circle and the filter transfer function scaled appropriately. Though such a “stability-forced transformation” method requires root-finding and associated numerical intervention, it vastly expands the utility of the mapping idea. It appears to be the only way to obtain the correct shape of the target filter magnitude response for a less-than-N-band transformation. In fact, [42] advocated relaxation of stable mapping in favour of a supplementary post-mapping stage of pole-flipping to bring about final stabilisation. Certainly, the prototype filter's phase characteristic is an immediate casualty of this “stability-forced” transformation process, and a polynomial factorization burden also seems unavoidable. Users are frequently tolerant of nearly any sort of phase characteristic and the typical orders of polynomials are low, so these matters need not be perceived as any great disadvantage. It is worthwhile reflecting upon whether any two combinations will be legitimate when using (2), that is, whether they allow mapping a stable prototype to a stable target. As stated before, taking the mapping filter to have poles strictly outside the unit circle and zeros inside the unit circle (a condition that is easily seen to be sufficient to guarantee such stability inheritance), we find that certain restrictions on mapping pairs must be observed. Note that a minimum-phase numerator of the mapping filter (2) is required to satisfy the condition of (52).
If H(z) is minimum phase, then is maximum phase, and vice versa. Therefore if the condition of (52) is satisfied then the denominator of the mapping filter is maximum phase, i.e. all its roots are placed outside the unit circle. Continuing in this way and demanding a feature originally at to migrate to DC, and assuming that |S|=1, the relation of (53) is derived.
Substituting it into (52) gives condition (54) for achieving a stable N-point transformation.
The condition of (54) can be used in mapping filter design approaches employing optimization algorithms for detecting unstable situations. It is interesting to notice that for the real (both S and are real) lowpass-to-lowpass/highpass transformations (DC mapped to itself or to Nyquist), condition (54) simplifies to:
5.1
Single-replica multi-point transformation
As it was shown for the cases of first- and second-order transformations in the previous sections, the separated replicas are easy to create, but do not exercise all the possibilities that the designer might want in practice. There is an intense interest in control of versions of the prototype filter which number less than N, but where several selected features of the prototype transfer function are deployed through judicious selection of transformation coefficients It is at this juncture that one must be prepared to sacrifice complete transformability of the prototype filter in order to assure stability of the resulting It is often acceptable to deliver only the magnitude response, over targeted spectral bands for the expense of changed phase response. Having such freedom allows the notion of strict transformation to be modified to a more liberal procedure: 1. Create a mapping filter with some or all poles inside the unit circle. 2. Perform cancellation of overlapping (or very near) pole-zero pairs. 3. Obtain a final stable result by flipping exterior poles inside the unit circle and scaling appropriately. This “stability-forced transformation” method - though requiring rootfinding and associated numerical intervention at stage (3) - vastly expands the utility of the mapping idea. This is shown by employing a multipoint transformation to perform a migration of four selected frequencies of the prototype elliptic lowpass filter with a cutoff of into a bandpass
134
DSP System Design
version using the constraints specified in Table 2-1. The prototype was an IIR filter with numerator and denominator coefficients N=[0.1001151, -0.3281158, 0.4648838, -0.3281158, 0.1001151] and D=[1.936188, -8.077861, 12.88845, -9.326601, 2.589792] respectively.
Here fairly precise values of the locations of the zeroes have been taken and transported (along with the DC feature) to the clearly identifiable locations shown in Figure 2-32.
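Step 3 of the procedure above, flipping exterior poles inside the unit circle with appropriate scaling, can be sketched numerically. The snippet below is an illustrative sketch, not the author's code: reflecting a pole p with |p| > 1 to 1/conj(p) and dividing the numerator by |p| leaves the magnitude response unchanged while restoring stability.

```python
import numpy as np

def stabilise(b, a):
    # reflect every exterior pole p to 1/conj(p) and rescale the
    # numerator so the magnitude response is unchanged (phase differs)
    poles = np.roots(a)
    outside = np.abs(poles) > 1
    scale = np.prod(np.abs(poles[outside])) if outside.any() else 1.0
    poles[outside] = 1.0 / np.conj(poles[outside])
    a_new = np.real(np.poly(poles)) * a[0]
    return np.asarray(b, dtype=float) / scale, a_new

# hypothetical unstable denominator with poles at 2.0 and 0.5
b = np.array([1.0])
a = np.poly([2.0, 0.5])              # [1, -2.5, 1]
b2, a2 = stabilise(b, a)
print(np.abs(np.roots(a2)))          # both pole magnitudes now ~0.5
```

Reflecting each exterior pole scales the corresponding denominator factor's magnitude by |p|, which the numerator rescaling cancels, so only the phase response is altered.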
But it should not be thought that this approach is without its difficulties. It is no accident that 0.335 was specified as an image frequency target in Table 2-1. Even a small offset (say, to 0.34) from this "favourable value" upsets the integrity of the mapping. It has been found that the extreme sensitivity of such degenerate "less-than-N-band" mappings can often be countered by raising N and specifying still more points! The objective in doing this is to provide a pole/zero pattern sufficiently rich to encourage alignments that maintain a mapping-filter phase plot free of discontinuities. This phase plot is the key to good unit-circle mapping and must be a one-to-one curve (for positive frequencies) if, for instance, a single-band mapping is sought. Figure 2-33 is a waterfall plot of four filter gain plots which, by exploiting symmetric disposition about , have been produced by lowpass-to-bandpass transformation with complete freedom in the placement of three pairs of constraint points. Here, just for ease of display, band edges and zeroes coincide in all four plots, while the "ridge" arising from the movement of a single peak feature (with its frequency changing
between 0.13 and 0.23) mainly distinguishes them. This flexibility comes at the cost of employing a sixth-order filter to carry out the mapping task.
It is felt that the great flexibility such "dimensionality overkill" offers the filter designer is, in many circumstances, a legitimate and attractive alternative to direct IIR design, with the considerable advantage of permitting precise deployment of familiar gain features. Interim relaxation of stability requirements and abandonment of phase fidelity offer enormous scope in a general design setting.

It has been shown above how N arbitrary features of the prototype can be mapped by employing an allpass mapping filter, easily defined by solving a set of N complex linear equations which delivers real mapping filter coefficients. Here a different approach to the design of the mapping filter is suggested. Typical approaches are based on mapping a selected feature to its new location, which fixes certain mapping filter coefficients. With such an approach designers can only hope that the filter behaviour between the specified features will be correct. This holds when the allpass mapping filter order equals the number of replicas. For high-order mapping with fewer than N replicas one often encounters problems with additional replicas created in the most unexpected places. A better way to design a mapping filter is to concentrate explicitly on designing its phase. If this is done through the deployment of poles and zeros, then the design becomes easier and surer. A way of avoiding additional replicas in the target filter is shown here.
Furthermore, some comments are given on how the stability of the target filter depends on the prototype filter and the mapping filter behaviour. Turning now to compare the phase-shaping design method with the exact N-point transformation of (47) [43], one can get a feel for how the problems arising there can be corrected. The same prototype fourth-order elliptic lowpass filter, with cutoff at the normalised frequency used for the example in Figure 2-32, is deployed here too.
This example demonstrates the influence of phase jumps in the mapping filter, using the second-order mapping with migrations defined as . The magnitude response shown in Figure 2-33(a) exhibits a phase jump at DC. Notice the two real pole-zero pairs located on the positive and the negative real axes. This results in an unwanted mapping of the Nyquist feature to DC, forcing the lowpass filter to become a two-band one. Increasing the order of the mapping filter by one counters this effect. The result of such a procedure is shown in Figure 2-34, where the pole/zero pair creating the phase jump at DC was shifted up to the Nyquist frequency by the addition of an extra pole-zero pair (an automatic outcome of the design method) and a re-design of the mapping filter. Furthermore, this additional pole-zero pair also served to compensate for the additional jump that would otherwise occur at the Nyquist frequency. It should be added that the final filter required a subsequent stabilisation step to achieve a stable result.
The next example, shown in Figure 2-35, demonstrates the influence of local extrema in the mapping filter's phase response. In this lowpass-to-highpass two-point transformation the mapping filter has two reciprocal pole/zero pairs on the real axis.
One of them has its pole outside the unit circle at the Nyquist frequency and the other one inside at DC. Their influence is that the phase response of the allpass filter changes its slope at . The resulting effect is that the phase response does not fall below , so the original filter frequency points below 0.051 (its whole passband) are not mapped into the target filter. Instead, only features residing in the range (0.5, ) and are mapped. Invoking an additional pole/zero pair and shifting the pole and the zero from DC to Nyquist does the trick, as presented in Figure 2-36.
The power of the presented technique of designing the mapping filter by directly shaping its phase response is better demonstrated when varying the mapping features. In order to do this, a "less-than-N-band" target is sought from an mapping [43]. Specifically, the movement of four distinct features of the original prototype filter to four controllable (positive-frequency) locations in the target filter is demanded. In doing so, the monotonic behaviour of the transfer function can be preserved. The result is shown in Figure 2-37. These examples provide important waypoints in developing a greater understanding of the potential and the limitations of traditional transformations for obtaining flexibly modifiable IIR filters. The technique presented here stands at the watershed between "accepting what you get" with standard mapping and a new style of purposeful imposition of beneficial mapping effects through orderly design.
5.2 Summary
This section presented an extension of the popular transformations for IIR filters to high-order mapping filters. Easy control of prototype transfer function features in multiband renditions was demonstrated. A wider interpretation of transformation was also suggested which permits "less-than-N-band replication" (at a cost in dimensionality, phase fidelity and attention to stability enforcement) that is believed to be of considerable benefit in practical design situations. The high-order frequency transformations addressed here extend the standard Constantinides and Mullis/Franchitti transformations. They give the opportunity to control up to N independent features of the prototype filter spectrum. Standard transformations under-use the freedom of frequency transformations by limiting them to simple first-order mapping filters performing lowpass and highpass transformations, and second-order "mappers" for bandpass and bandstop targets. Clearly there is a need to extend beyond the traditional second-order mappings, which can control the re-deployment of only two transfer function features. Using specially designed Mth-order allpass filters, the mapping of N features of the prototype filter can be carried out independently (where N<M). The
mapping filter is designed taking special care to ensure that there are no excess mapping replicas, which may arise if an exact N-point transformation is used. This design viewpoint represents a significant departure from previous approaches to filter transformations.

A different approach to the design of the mapping filter was also suggested. Typical approaches map one selected feature to its new location, which fixes certain mapping filter coefficients. With such an approach designers can be sure that the filter behaviour on both sides of the specified feature will be correct. The situation changes when the number of features increases. Then the designer can be sure of correct filter behaviour between the selected features only when the allpass mapping filter order (the number of features) equals the number of replicas. For high-order mapping with fewer than N replicas one often encounters problems with additional replicas created in the most unexpected places. Therefore the designer should concentrate explicitly on designing the phase response of the mapping filter, which shows much more clearly what the original filter is going to be converted to. If this is done through the deployment of poles and zeros, then the design becomes easier and surer. It has been shown in this section how to avoid additional replicas in the target filter, concluding with comments on how target filter stability depends on the prototype filter and the mapping filter behaviour.
6. MULTIBAND POLYPHASE FILTER DESIGN
This section addresses a new approach to the design of multiband filters with step-like magnitude responses and extremely flat weighted passbands, which uses frequency transformation as part of the design procedure [51]. This new technique can also be used for the multiband stepwise approximation of arbitrary filter magnitude responses with precise transition-band control. One marked advantage of the technique is that the basic building blocks are the modified polyphase IIR filters reported in [7], [4] and [5]. The points of transition from one flat top to another, namely the multiple transition bands of the filter, are completely free of crossover oscillations. Another advantage of this technique is that it is not confined to employing IIR filters only; to this end, perfect-reconstruction FIR filters as in [52] can also be used. In both the FIR and IIR cases the technique is a general-purpose one and works for both real- and complex-valued filter coefficients.
6.1 Modified polyphase halfband building block
Polyphase structures as suggested in [4], [7] are very attractive for the design and implementation of halfband lowpass and highpass filters, delivering very small passband ripples for a very small coefficient budget. Lowpass and highpass filter functions can be created using the same coefficients, with the exception of a sign change at one of the summation blocks. By combining the lowpass and highpass filters as in Figure 2-39(b), one obtains a very efficient two-band filter with controllable band gains. Figure 2-39(a) shows the form of the computationally and hardware-efficient single-coefficient second-order allpass building block employed in the parallel paths of the overall filter structure.
The lowpass and highpass prototype filters combined in the basic building block of Figure 2-39(a) are complementary filters. When they are added with the same gain factors, they result in an allpass performance. Here is the second-order allpass filter as in Figure 2-39(b). When the filter becomes an allpass function, it has the phase response (59) and the magnitude response shown in Figure 2-40.
where is the normalised frequency.

If the gain factors differ, the resulting two-band filter will have the corresponding band gains in the first and second bands. The magnitude response of a sample four-coefficient (ninth-order) two-band building block filter, having gain in the first band and in the second band and displaying smooth transition-band characteristics, is shown in Figure 2-40. Moreover, there are no oscillations in the transition band, which follows the transition band of the original highpass filter. It has been shown in [5] that the characteristic of the building block filter is equiripple in both bands and displays a monotonic, cosine-like shape in its transition region from one band to another. The symmetric monotonic behaviour in these transition regions ensures oscillation-free operation. One can observe the detailed behaviour of the building block filter, both in its bands and in its transition region, through the calculation of the filter frequency response, obtained by evaluating the transfer function (56) on the unit circle.
Where
This gives:
As the building block filter is a linear combination of two complementary equiripple polyphase filters [4], [7], it will also have equiripple behaviour in both its bands. For the class of allpass filters (57) used here, the phase response changes monotonically between zero and . Therefore the overall filter (59) has a magnitude response of at frequencies near DC (small ) and of at frequencies near Nyquist ( close to 0.5). The magnitude response for frequencies between DC and Nyquist is determined by the dynamics of and . With reference to (61), the functions and have to be custom designed to force the argument to be approximately zero in the first band and equal to in the second one. The order of the building block is exactly the same as the order of the prototype polyphase lowpass filter. This may seem wrong at first glance, as the addition of two transfer functions usually leads to a filter order equal to the sum of the orders of the added filters. However, the key to the reduced complexity lies in the simple fact that the same structure is used to generate both the lowpass and highpass filters, as in Figure 2-39(a).
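The equiripple and order-preservation properties rest on the allpass branches having unit magnitude, which makes the lowpass/highpass pair power complementary. A small numerical sketch (with arbitrary illustrative branch coefficients, not values from the book) verifies this:

```python
import numpy as np

def allpass_2nd(a, w):
    # single-coefficient second-order allpass A(z) = (a + z^-2)/(1 + a z^-2)
    z2 = np.exp(-2j * w)
    return (a + z2) / (1 + a * z2)

def halfband_pair(branch0, branch1, w):
    # two-path polyphase halfband: lowpass = (A0 + z^-1 A1)/2 and
    # highpass = (A0 - z^-1 A1)/2 (a sign change at one summation)
    A0 = np.ones_like(w, dtype=complex)
    for a in branch0:
        A0 *= allpass_2nd(a, w)
    A1 = np.ones_like(w, dtype=complex)
    for a in branch1:
        A1 *= allpass_2nd(a, w)
    z1 = np.exp(-1j * w)
    return 0.5 * (A0 + z1 * A1), 0.5 * (A0 - z1 * A1)

w = np.linspace(0.0, np.pi, 512)
lp, hp = halfband_pair([0.1, 0.6], [0.35], w)   # illustrative coefficients
print(np.max(np.abs(np.abs(lp)**2 + np.abs(hp)**2 - 1)))  # ≈ 0
```

Because |A0| = |A1| = 1 everywhere, |H_LP|² + |H_HP|² = 1 holds exactly for any branch coefficients, which is why the same structure yields both filters at no extra cost.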
6.2 The Multiband Structure
At the heart of the multiband polyphase filter lies the two-path halfband building block. As its name suggests, the halfband filter is restricted to having its cutoff at half-Nyquist. However, this is not a problem, as the cutoff can easily be changed through the use of frequency transformations, as outlined in [30], [43]-[44]. If a number of such two-band filters are cascaded, the resulting filter can be engineered to exhibit the desired multiband transfer function. If the simplest frequency transformation, the real lowpass-to-lowpass of (42) or the lowpass-to-highpass of (63), is used, then the mapping can be performed with first-order allpass filters. During transformation each delayor of the original filter is substituted with the mapping function. As a consequence, the order of the resultant filter stays the same as that of the starting-point prototype, and hence there is no increase in implementation complexity.
Here and are the cutoff frequencies of the original and target filters respectively. The lowpass-to-lowpass transformation, on the other hand, squeezes or stretches the rest of the filter frequency response to ensure that the target filter is real. It should be made clear that the amplitude of the ripples is unchanged by this stretching and squeezing process; however, the locations of the peaks and troughs of the ripples, as well as the cutoff frequency and transition band, are altered. Therefore the designer must bear in mind that the filter cutoff, transition band and ripple structure will be different after the transformation. In the example treated in this chapter the prototype two-band filter is pre-warped so that the resulting target filter has the required transition bandwidth, centred on the new cutoff frequency.
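As a concrete sketch of such a first-order mapping, the snippet below assumes the standard Constantinides lowpass-to-lowpass form for (42), which is not reproduced in this excerpt, and checks that the new cutoff point maps back onto the old one:

```python
import numpy as np

def llft_alpha(fc_old, fc_new):
    # assumed Constantinides lowpass-to-lowpass coefficient
    # (frequencies normalised to the sampling rate, Nyquist = 0.5)
    t_old, t_new = 2*np.pi*fc_old, 2*np.pi*fc_new
    return np.sin((t_old - t_new)/2) / np.sin((t_old + t_new)/2)

def map_zinv(zinv, alpha):
    # the allpass substitution z^-1 -> (z^-1 - alpha)/(1 - alpha z^-1)
    return (zinv - alpha) / (1 - alpha*zinv)

alpha = llft_alpha(0.25, 0.10)       # move the cutoff from 0.25 down to 0.10
zinv = np.exp(-2j*np.pi*0.10)        # new cutoff point on the unit circle
mapped = map_zinv(zinv, alpha)
print(np.angle(mapped) / (-2*np.pi)) # ≈ 0.25: the prototype cutoff frequency
```

The substituted allpass has unit magnitude on the unit circle, so the mapping only warps the frequency axis; the ripple amplitudes are untouched, exactly as stated above.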
In order to create a multiband filter, a set of M two-band filters is cascaded as described in (64), with cutoff frequencies changed through the lowpass-to-lowpass frequency transformation of (42). The variable M is the number of bands of the overall target filter. Each of the building block filters must have a magnitude response equal to unity in its first band, to
ensure that the magnitude response shape created by the preceding cascaded sections is unaltered. This idea is clearly exposed in Figure 2-18 for the case of M=4. The next band of the target filter is created through the careful choice of the scaling factor K, as in (64) and Figure 2-18.
Here specifies the magnitude response gains of the overall filter in all its passbands. Each of the building block filters is designed with requirements for ripples and transition bands so as to match the specifications for the overall target filter. Furthermore, each building block filter undergoes a lowpass-to-lowpass transformation. Because this transformation stretches and squeezes the magnitude response, the coefficient cannot be calculated straightforwardly as in (42) and (63): the transition band after the transformation is not centred on the target cutoff frequency. This also complicates the calculation of the required transition band of the lowpass prototype filter. Therefore an iterative approach is required for each basic two-band subfilter:
1. Specify the target cutoff frequency and target transition band ( is the original filter cutoff frequency).
2. Calculate the coefficient from (42).
3. Inverse transform the target filter passband edges into the prototype domain.
4. Modify the target cutoff frequency.
5. If the modification is greater than the allowed frequency error, go to step 2.
6. Calculate the required transition band of the prototype lowpass filter.
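One pass of steps 2, 3 and 6 can be sketched as follows; the Constantinides lowpass-to-lowpass form is assumed for (42), and the exact correction rule of step 4 is not reproduced in this excerpt:

```python
import numpy as np

def llft_alpha(fc_old, fc_new):
    # assumed Constantinides LP-to-LP coefficient (Nyquist = 0.5)
    t0, t1 = 2*np.pi*fc_old, 2*np.pi*fc_new
    return np.sin((t0 - t1)/2) / np.sin((t0 + t1)/2)

def inv_map(f, alpha):
    # prototype-domain frequency that maps onto target-domain frequency f
    zinv = np.exp(-2j*np.pi*f)
    return -np.angle((zinv - alpha) / (1 - alpha*zinv)) / (2*np.pi)

fc_proto, fc_target, tb = 0.25, 0.10, 0.02
alpha = llft_alpha(fc_proto, fc_target)   # step 2
lo = inv_map(fc_target - tb/2, alpha)     # step 3: inverse transform the
hi = inv_map(fc_target + tb/2, alpha)     #         target passband edges
tb_proto = hi - lo                        # step 6: prototype transition band
print(lo, hi, tb_proto)   # edges straddle the prototype cutoff 0.25
```

The inverse-mapped edges straddle the prototype cutoff but are not centred on it, which is exactly why steps 4 and 5 iterate the target cutoff until the correction falls below the allowed error.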
Another problem is the specification of the attenuation for prototype filters so that the ripples of the overall filter in each of its individual
passbands have the desired value. The passband and stopband ripples ( and respectively) of the polyphase filter are related to each other, and the overall multiband filter is designed as a cascade of such filters. The implication of this is that one cannot get the magnitude response ripples to match the specification exactly. However, one can design the basic polyphase filters so that the resulting ripples are smaller than the specification calls for. The minimum values of the magnitude response in the passbands (passband ripples and ) of the basic two-band building block filter are:
Here A is the stopband attenuation in dB of the polyphase lowpass filter employed in constructing the two-band building block. If a number of such basic two-band building blocks are cascaded to form a multiband filter, then the ripples in each passband become a function of all the prototype polyphase lowpass filter attenuations, and the minimum gain in each passband (the passband ripples ) is calculated from (69).
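A sketch of this relation, assuming the power-complementary property of the polyphase lowpass/highpass pair (the exact form of the book's equation (68) is not reproduced in this excerpt):

```python
import numpy as np

def halfband_ripples(A_dB):
    # the polyphase lowpass/highpass pair is assumed power complementary,
    # |H_lp|^2 + |H_hp|^2 = 1, so the passband floor follows from the
    # stopband attenuation alone
    ds = 10.0 ** (-A_dB / 20.0)      # linear stopband ripple
    pb_min = np.sqrt(1.0 - ds**2)    # minimum passband gain
    dp = 1.0 - pb_min                # passband ripple, ~ds^2/2 for large A
    return ds, dp

ds, dp = halfband_ripples(100.0)
print(ds, dp)   # 1e-05 and ≈5e-11
```

This is why the passband ripples of these structures are so small: a modest 100 dB stopband specification already pushes the passband deviation to around 5e-11.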
The calculation of the required attenuations (passband ripples ) of the prototype polyphase filters for the required overall ripples requires solving a set of linear equations. This is a standard linear programming problem, which was solved in Matlab with the 'lp' function.
6.3 Multiband Complex Structure
A similar idea to the one described previously can be used to design multiband complex filters, non-symmetric with respect to DC, by combining the lowpass-to-lowpass frequency transformation (LLFT) with a complex rotation in the frequency domain.
The additional complex rotation frequency transformation (71) is applied to the polyphase prototype filter after changing its cutoff frequency by (42) and (63). In contrast to the design of the real multiband filter, the design procedure for the complex one depends on the symmetry of the magnitude response. In certain cases, as in Figure 2-42(a), it is possible to create the desired filter response using a smaller number of basic building blocks. These cases are limited to an odd number of bands with an equal number of transition bands, an even-symmetric sequence of band level values, and equal transition bands between each similar pair of levels, as in Figure 2-42(b) and Figure 2-42(c). The special case is when the magnitude response is even symmetric, as in Figure 2-42(a), where is the centre of symmetry. The general case of a non-symmetric magnitude response, as in Figure 2-42(d), is one in which some of the cutoff frequencies overlap, which can cause ripples at those frequencies. The first type of filter (even symmetric) can easily be obtained by first designing its real equivalent centred at DC and then rotating the result using (71), as shown in Figure 2-42(a). In such a case the design procedure is exactly the same as for the real case; the only difference is that the implementation must also take the complex rotation in the frequency domain into consideration. If each of the two-band sections is controlled independently by (42) and, subsequently, each of them is individually rotated using (71), then the even symmetry can be broken. The same equations as for the real multiband filter design can be used for specifying the requirements for each of the basic two-band sections. The change of bands of one basic section does not have to happen within a single band of one of the others. This leads to a more non-symmetric magnitude response, as in Figure 2-42(c).
Obviously, in such a case some band values equal the product of some other ones. These types of filters have two drawbacks. First, they require the magnitude response either to be symmetric in terms of the level values of the final filter (cases (a) and (b) in Figure 2-42) or to be constructible from symmetric two-band sections (cases (a), (b) and (c) in Figure 2-19).
Secondly, only an odd number of bands can be achieved, and there are pairs of equal transition bands. The advantage is that they can be designed for a
very small computational burden, as they require only (M-1)/2 basic two-band filters for an M-band filter. They can also achieve constant behaviour at Nyquist. The design of the general complex multiband filter of Figure 2-19(c) requires a modification of the basic two-band filter. Now the structure in Figure 2-16 undergoes two types of frequency transformations. First the real LLFT adjusts the cutoff frequency of the two-band filter to equal half the width of one of the bands. Then the result is rotated in the frequency domain by the complex frequency shift transformation (71). A cascade of M-1 such building blocks forms a complex M-band flattop filter. It is required that one of the transition bands of each basic section be fixed at one frequency (for example at Nyquist). Obviously this causes ripples in that transition band, but this cannot be avoided.
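The complex rotation used throughout this section amounts to multiplying each filter coefficient by a complex exponential. A minimal sketch with hypothetical coefficients (the substitution z → z·e^(-j2πf0) is assumed for (71)):

```python
import numpy as np

def rotate_filter(b, a, f0):
    # complex frequency shift: substituting z -> z e^{-j 2 pi f0} turns
    # z^-n into z^-n e^{+j 2 pi f0 n}, i.e. each coefficient is rotated
    n = np.arange(len(b))
    m = np.arange(len(a))
    return b * np.exp(2j*np.pi*f0*n), a * np.exp(2j*np.pi*f0*m)

def freq_resp(b, a, f):
    # evaluate H at z = e^{j 2 pi f}
    zinv = np.exp(-2j*np.pi*f)
    return (np.sum(b * zinv**np.arange(len(b)))
            / np.sum(a * zinv**np.arange(len(a))))

b = np.array([1.0, 2.0, 1.0])   # hypothetical lowpass-like numerator
a = np.array([1.0, -0.5])       # hypothetical denominator
br, ar = rotate_filter(b, a, 0.2)
print(freq_resp(br, ar, 0.2))   # ≈ (8+0j): the DC response, moved to f0 = 0.2
```

Note that the rotated coefficients are complex even when the prototype is real, which is why the resulting magnitude response is no longer symmetric about DC.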
6.4 Examples
Two examples, one real and one complex multiband filter, are treated here with a set of similar specifications in order to compare the real and complex design methods. The design parameters for both filters are given in Table 2-2 and the results are presented in Figure 2-43 and Figure 2-44.
For both cases the order of the resulting overall IIR filters was estimated to be 50, which is equivalent to 81 multiplication operations. This is a very good result considering the ripples that were achieved in each passband, which were better than required. The cutoff frequencies and transition bands were achieved with accuracy bordering on the floating-point precision of the computational platform. This level of accuracy can be attributed to the prototype polyphase filters having only four to six coefficients, as well as to employing the simple first-order frequency transformations.
The results obtained from this approach were compared to the Matlab "yulewalk" implementation [53] of the multiband flattop filter, a least-squares fit in the time domain using modified Yule-Walker equations with correlation coefficients computed by inverse Fourier transformation of the specified frequency response. It was designed to identical specifications (Table 2-2), for the real case only, as this method does not work for complex cases. Although the specifications for level values were achieved, band edges were all shifted by approximately ±0.0015 and ripple values were much higher than for the polyphase approach. The ripples were therefore calculated both as the maximum difference from the required band gain and as the maximum peak of the ripples ("true ripples") within the required band edges.
6.5 Summary
A novel technique for the design of multiband IIR filters, employing the two-path polyphase building block, has been described in this section. It achieves solutions to very stringently specified magnitude response requirements, for both real- and complex-coefficient filter cases. This was achieved by using a polyphase IIR structure known to be applicable to high-quality lowpass/highpass filters. As such a structure requires only (N-1)/2 multiplications, the multiband implementation suggested in Figure 2-16 is the most efficient, and probably the only practical, way to implement this class of filters. Computing the equivalent IIR transfer function would necessitate too many convolutions and hence suffer from numerical error accumulation as well as implementation difficulties. Its advantageous performance over the Matlab "yulewalk" implementation was presented and proved. Although complex multiband filter design is possible, the magnitude response is required to have some symmetry features: either the band levels have to be arranged symmetrically or they need to be related to each other. The transition bands are also not fully independent; they come in pairs with equal widths. Even with such drawbacks, the level of the ripples within each band and the efficiency of the implementation are impressive. Such performance could not be achieved with a standard IIR transfer function implementation, only by using the polyphase structure. No other method could be found in the literature that achieves similar performance for such a small number of coefficients as the polyphase IIR based multiband filter presented in this section.
Chapter 3 FILTER IMPLEMENTATION
1. CONSTRAINING FILTER COEFFICIENTS
Digital filter designs may require different implementation approaches depending on the application's speed and power-dissipation requirements. Many of them can be easily and effectively implemented using standard floating-point DSP processors. Unfortunately, those requiring fast operation and low power dissipation require design-specific structural implementation. One of the most important ways of achieving high operation speed is to use constrained-coefficient filters with calculations performed in fixed-point arithmetic. This drastically simplifies the implementation of Arithmetic-Logic Units (ALUs), which in consequence increases their operation speed. The use of constrained filter coefficients is not easy from the designer's point of view. Constraining coefficients to a certain bit wordlength makes it impossible to achieve optimal filter performance, as in most cases the best (floating-point) filter cannot be represented with a small number of bits. It happens very often that for an L-bit wordlength the optimum filter is not the one whose coefficient values are closest to their floating-point (best) versions. Therefore rounding up or down and truncating floating-point coefficients does not give the best result; different design approaches have to be used. Unfortunately there is a lack of good coefficient-constraining algorithms. The ones found in the literature [23], [61] were not described in a way that allowed their easy implementation and performance comparison with others.
Constrained filter design methods can be put into two groups. Some methods first calculate the best floating-point filter coefficients, truncate or round them, and then optimise the result on a constrained grid to achieve the best possible filter characteristics. An example of this approach is the "bit-flipping" algorithm presented later in this chapter [3]. Others start from random or arbitrarily chosen starting points and then proceed with the optimisation in a manner similar to the floating-point methods. An example of such a method is the Constrained Downhill Simplex Method, also presented in this chapter. Both methods perform a structured search for the best filter characteristic within the constrained coefficient space. The first looks for the best solution in the closest neighbourhood of the starting point, while the second searches for a better solution depending on the result of the previous iteration. Both methods presented in this chapter have been tested on polyphase lowpass filter designs, but can be used, almost without alteration, for general constrained filter design applications. The use of fixed-point arithmetic also causes other unwanted effects. These originate from the rounding of the results of arithmetic operations such as addition and multiplication. Although such rounding operations (loss of precision) are predictable due to their deterministic nature, they can be modelled as a white noise source at the multiplier or adder. These effects are discussed later in this chapter.
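The white-noise model of rounding can be checked directly: for a fixed-point format with B fractional bits the quantisation step is q = 2^-B, and the rounding error is well modelled as uniform noise of variance q²/12. An illustrative numerical check (not from the book):

```python
import numpy as np

B = 10                        # fractional bits of the fixed-point format
q = 2.0 ** (-B)               # quantisation step
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200_000)
e = np.round(x / q) * q - x   # rounding error, confined to [-q/2, q/2]
print(np.var(e), q**2 / 12)   # the measured variance matches q^2/12 closely
```

The agreement is what justifies treating each rounding point in a fixed-point filter as an independent additive noise source when predicting the output noise floor.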
1.1 Floating-point optimisation by "Flipper"
In many cases it is relatively easy to design a filter with floating-point coefficients, but the implementation of such filters causes trouble with silicon area and speed when custom silicon realisation is employed. Similar problems crop up if the filters are implemented on a stand-alone DSP processor. Therefore it is highly desirable to find a filter whose coefficients are limited to the required bit-length, specified by the implementation constraints, with its frequency characteristic kept as close as possible to its floating-point prototype. Simple rounding up/down or truncation radically decreases the filter performance; the solution that is optimum for the given bit-length cannot be found by either of those methods. The optimisation algorithm must be able to adjust each filter coefficient independently within the boundaries of the wordlength. Certainly the floating-point solution is a very attractive starting point for such an operation. In this section a bit-constrained optimisation method is described which starts from the truncated or rounded floating-point prototype and performs a structured local search for the optimum filter coefficients within the specified bit-length. The algorithm was called "bit-flipping" since bits of the
filter coefficients are set/reset during the optimisation, causing a flipping effect. A flow chart of the general version of the algorithm is presented in Figure 3-1. The algorithm in this form is very suitable for filters that have a very small number of coefficients and/or need to be constrained to a small number of bits. The main idea of the optimisation is to change a few bits of each coefficient in search of an improvement in the cost function. The cost function can be based on the magnitude response, with or without the phase response. For the case of the polyphase lowpass filter it can be the passband and/or stopband ripples for the given transition band. The algorithm as it stands caters for positive coefficients with absolute value less than one. If the filter also has negative coefficients, then the Most Significant Bit (MSB) can be used as a sign bit. If two's-complement arithmetic is used, the conversion can be done at the moment of generation of the new set of coefficients. The input is a set of coefficients, either randomly generated or a rounded floating-point optimum solution. Certainly the latter will require less computation to find the optimum for the L-bit wordlength. The designer has the choice of the first bit from which to start flipping. For a random starting point it should be the first bit, as the optimum solution can be very far from the initial coefficient values. If the floating-point filter version is available, then it is enough to optimise only the last few bits, as the constrained coefficients are very likely to have values close to the floating-point ones. This fact was observed and confirmed in the numerous simulations performed. The second parameter, MSize, is the number of coefficient bits changed simultaneously. The more bits are changed at the same time, the higher the chance that the optimisation will reach the optimum; the drawback is that this slows the optimisation down.
Practically, MSize should be chosen between one and three. In each iteration coefficients are changed by applying a mask, a set of bits that are substituted for the corresponding ones in the coefficient. Mask bits are changed until they cover all the combinations available with MSize bits. For each change of the coefficients the cost function is re-evaluated. If there is any improvement, the new set of coefficients is stored as the best result. The search continues to find out if there is any better combination available. After exhausting all combinations the optimisation search moves down by one bit and applies masks at the lower bits of the coefficients. It continues in this way until it reaches the Least Significant Bit (LSB).
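The mask-based search described above can be sketched as follows. This is a minimal illustration, not the author's Matlab code: it follows the one-coefficient-at-a-time variant of Figure 3-2, and the function name, the cost-function signature and the unsigned fractional coefficient format are assumptions.

```python
def bit_flip_optimise(coeffs, lbit, msize, cost, start_bit=1):
    """Sketch of the 'bit-flipping' search for LBit-bit coefficients.

    Coefficients are positive fractions below one, held as integers
    scaled by 2**lbit; 'cost' evaluates a candidate set (lower is better).
    """
    best = [int(round(c * 2**lbit)) for c in coeffs]
    best_cost = cost([b / 2**lbit for b in best])
    # Slide an msize-bit mask window from start_bit (1 = MSB) down to the LSB.
    for pos in range(start_bit, lbit - msize + 2):
        shift = lbit - pos - msize + 1   # offset of the window's lowest bit
        window = ((1 << msize) - 1) << shift
        for i in range(len(best)):
            for bits in range(1 << msize):       # every mask combination
                cand = list(best)
                cand[i] = (cand[i] & ~window) | (bits << shift)
                c = cost([v / 2**lbit for v in cand])
                if c < best_cost:                # keep any improvement
                    best, best_cost = cand, c
    return [b / 2**lbit for b in best], best_cost
```

In a real filter design the cost function would evaluate the stopband/passband ripples of the filter built from the candidate coefficients.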
DSP System Design
The modified version of the algorithm, which can be used for filters having a larger number of coefficients (reduced number of function evaluations), is given in Figure 3-2.
In such cases only one filter coefficient is changed at a time. It was confirmed during a number of optimisations of filters with a large number of coefficients (FIR filters) that the filter response is then less sensitive to small changes of coefficient values. On the other hand, this means that a small change in one of the coefficients is much more likely to give an improvement to the overall filter response than in the case of a filter with a small number of coefficients, like the polyphase structure. The efficiency of the method is indisputable. All the simulations performed for the polyphase structure, for which the algorithm was designed in the first place, showed that the “bit-flipping” algorithm gives a big improvement in comparison to regular truncation and rounding. Sets of comparative results for 8, 12, 16 and 20-bit wordlengths are shown in Figure 3-3, Figure 3-4, Figure 3-5 and Figure 3-6 respectively.
The result of the method is compared to the floating-point implementation and to the truncated and rounded ones. The test filter was a seven-coefficient polyphase halfband lowpass filter having a transition band of and a floating-point stopband attenuation of 133dB. Such filters are not very sensitive to the quantization of their coefficients, as the interaction is not between the coefficients themselves but between the phases of the allpass filters generating them [3][4]. In such a case the overall error due to the quantization of the coefficients is reduced. Even such filters exhibit a large decrease in performance when their coefficients are constrained to a very small wordlength like 8 bits (Figure 3-3). Rounding caused an almost 90dB
decrease of the stopband attenuation, to 47.9dB, while truncation did only slightly better, reaching 62.3dB. Applying “bit-flipping” to either the truncated or the rounded results improved them almost twofold (by more than 30dB).
Increasing the coefficient wordlength obviously brings all the results closer to those of the floating-point filter. It can be noticed that truncation and rounding give very close results, with a maximum stopband attenuation difference within 2dB, both far away from the floating-point result. In contrast, “bit-flipping” performed 30dB better for 12 bits (Figure 3-4), 20dB better for 16 bits (Figure 3-5) and 10dB better for 20 bits (Figure 3-6). For 20 bits “bit-flipping” gave results less than 3dB worse than the floating-point ones. The drawback of the presented algorithm is the large number of computations required to achieve such results. The optimisation implemented in Matlab 5 took approximately 30 minutes on a 266MHz Pentium II PC. Even so, the filter performance achieved compensates for the time spent during the optimisation. In most cases the optimisation is performed only once during the design, and every designer is ready to wait even a few hours to obtain a set of filter coefficients that lend themselves to very effective implementation in silicon. Herein lies the power of the “bit-flipping” algorithm: very good performance for an acceptable cost in design time.
1.2 Constrained filter design with “Hybrid Amoeba”
The Downhill Simplex Method (Amoeba) is due to Nelder and Mead [59]-[60]. It requires only function evaluations and no derivatives. It is not very efficient in terms of the number of required
function evaluations. Because of the lack of derivatives it can be very effective for constrained optimisation, especially for problems for which the computational burden is small. The working of this method can be explained easily using a geometrical analogy. A simplex is an N-dimensional geometrical figure which has N+1 vertices connected by straight lines. In two dimensions the simplex is a triangle, in three dimensions a tetrahedron, not necessarily a regular one. The only condition required is that it encloses a finite inner N-dimensional space (it is non-degenerate). If one of the points is chosen to be the origin, the others define the directions in which the search will be made. The general idea of the minimisation is to bracket the minimum and subsequently isolate it. The downhill simplex method starts, unlike most optimisation algorithms, not from one point but from N+1 points defining the initial simplex, chosen such that it encloses the minimum. The lowpass filter based on the polyphase structure, as presented in Chapter 1, is an attractive subject for such minimisation. The minimum of the stopband attenuation (the case of all zeros on the unit circle, achieving an equiripple response) lies inside the space described by the vertices for which all filter zeros are either at Nyquist or compensate poles at half-Nyquist, as shown in Figure 1-10 and Figure 1-11 for the cases of two and three coefficient two-path polyphase IIR structures respectively.
The downhill simplex method now takes a series of steps. Most of them just move the point of the highest function value through the opposite face of the simplex to a point where the function should have a lower value – an effect called Reflection (Figure 3-7). Such a step preserves the volume of the simplex, thereby maintaining its non-degeneracy. When such
steps are taken the method expands the simplex in one or more directions to take larger steps. When the method reaches a valley, the simplex is contracted in the transverse direction and made to slide down the valley. There may also be situations where the simplex contracts in all directions and tries to pull itself through the best point (the lowest value of the function). Such behaviour is similar to the movement of an amoeba, from which the method takes its name. The termination criterion is not easy to specify, as it is not possible to choose a specific tolerance for a single independent variable; the condition has to be satisfied either in all dimensions or by the length of the N-dimensional step vector. For floating-point optimisation such a tolerance should be specified as the square root of the machine precision. Such a termination criterion can be out-flanked by a single anomalous step which for some reason failed to get anywhere [59]. The Constrained Downhill Simplex (CDS) Method incorporates the idea of the simplex, but instead of making floating-point steps in each direction, both the optimisation step and the starting points are now constrained to the given LBit wordlength, thus constraining the resulting filter coefficients to LBit bits.
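A minimal sketch of the grid-constrained simplex idea follows, assuming the textbook Nelder-Mead reflection, expansion and contraction steps with every trial vertex snapped to the LBit fixed-point grid. All names and the step coefficients are illustrative, not the book's implementation.

```python
def constrained_simplex(f, simplex, lbit, iters=200):
    """Sketch of the Constrained Downhill Simplex: every candidate
    vertex is rounded to the lbit-bit grid before it is evaluated."""
    q = lambda x: [round(v * 2**lbit) / 2**lbit for v in x]   # snap to grid
    move = lambda a, b, t: [x + t * (y - x) for x, y in zip(a, b)]  # a+t(b-a)
    s = [q(v) for v in simplex]
    n = len(s) - 1                      # problem dimension
    for _ in range(iters):
        s.sort(key=f)                   # best vertex first, worst last
        cen = [sum(v[i] for v in s[:-1]) / n for i in range(n)]
        refl = q(move(s[-1], cen, 2.0))        # reflect worst through centroid
        if f(refl) < f(s[0]):
            exp = q(move(s[-1], cen, 3.0))     # try expanding further
            s[-1] = exp if f(exp) < f(refl) else refl
        elif f(refl) < f(s[-2]):
            s[-1] = refl
        else:
            cont = q(move(s[-1], cen, 0.5))    # contract towards centroid
            if f(cont) < f(s[-1]):
                s[-1] = cont
            else:                              # shrink towards the best point
                s = [s[0]] + [q(move(v, s[0], 0.5)) for v in s[1:]]
    return min(s, key=f)
```

Adding the one-bit LSB bit-flipping step described below to the returned vertex gives the "Hybrid Amoeba" flavour.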
Constraining the step size has a major influence on the behaviour of the simplex, especially when the wordlength is small. Even the original algorithm suffers from attraction to local minima. It can be seen from Figure 3-8 how different the result of the optimisation can turn out to be for the polyphase 7-coefficient lowpass filter design, where the step size was constrained to 8 bits. The result is very different (by 19dB) from that achieved by
the bit-flipping algorithm (Figure 3-3). Such a decrease in the effectiveness of the minimisation is due to the constrained step, which no longer hits the minimum but walks around its edges. This is because it is difficult to predict the direction in which the rounding of the step should be performed when constraining it to the LBit wordlength.
Here the bit-flipping idea can be utilised. Even the simplest one-bit bit-flipping on the least significant bit makes the whole optimisation much more robust. For the same 8-bit long coefficients the result is 25dB better in terms of the stopband attenuation (Figure 3-8). The results of 12, 16 and 20-bit designs are shown in Figure 3-9, Figure 3-10 and Figure 3-11 respectively. For every wordlength the result of the optimisation was very close to the one achieved by the bit-flipping algorithm. Even if bit-flipping gives the best result in most cases (especially when two or more bits are being flipped), the constrained downhill simplex is the second best when the minimum can be bracketed within the starting simplex. It also helps very much to know the floating-point results; the optimisation then converges much more easily and faster to the true minimum. The method is also advantageous over bit-flipping when the floating-point result is not known, since bit-flipping then has to be performed on all the bits of the coefficients and this may take a long time to finish.
1.3 Reducing coefficient complexity with CSDC
Filter complexity is a function of the number of calculations (shifts and adds, or multiplications and summations) per sampling period required to perform the filtering. A proper structure (like the polyphase structure) and fixed-point coefficients obtained from bit-constraining algorithms (bit-flipping etc.) or from direct constrained-coefficient filter design (constrained Downhill Simplex etc.) are the main ways of achieving this purpose. Most of the methods represent bit-constrained coefficients in Natural Binary Code
(NBC). In such cases multiplications can be simplified to the appropriate number of shift-and-add operations. An additional reduction of the filter complexity can be achieved by reducing the complexity of its coefficients. This means reducing the number of non-zero bits by using other types of binary codes like the Canonical Signed-Digit Code (CSDC) [61][62], where each digit can have a value of one, minus one or zero, as in equation (1):

x = d1·2^-1 + d2·2^-2 + ... + dN·2^-N,  dk ∈ {-1, 0, +1}   (1)
Adding the (-1) digit to the NBC gives the possibility of representing coefficients in CSDC with a smaller number of non-zero bits than is otherwise possible. The CSDC allows representing the same values as the NBC. The difference is that the NBC has a unique representation for a given number, whereas the CSDC can have more than one. For instance, there are 21 possibilities of representing the coefficient 21 with 7 bits (see Table 3-1).
Using the CSDC will not produce any coefficient value that the NBC cannot achieve for the given wordlength, N. In such a case it
can only be used to reduce the number of non-zero bits of the coefficient. Although there are so many possibilities, in this example the NBC representation has the smallest number of non-zero bits. The question arises: “When does the saving happen?” Analysis of the bit combinations leads to the conclusion that it happens only when there are three or more consecutive one bits with a leading zero bit in the NBC, that is:

011...1 (k ones, k ≥ 3)  →  10...0(-1)   (2)
There are not very many such bit combinations in the NBC. Statistically, only about 14% of bits can be saved for an 8-bit wordlength (20% for a 12-bit wordlength). The saving clearly increases with the wordlength, as there are more chances for the combination (2) to occur. Since an adder can easily be used for subtraction too, the hardware requirements are not increased when using CSDC. The difficulty is that now, instead of two states of a bit (“0” and “+1”), there are three states (“0”, “+1” and “-1”) which have to be stored in memory. Therefore either the memory has to be different, storing three-state digits, or a conversion from binary code to CSDC has to be performed every time the coefficients are used. It is different if the multiplications are hard-wired in the structure, which does not require using the memory. In such cases positive and negative bits are equivalent in terms of hardware complexity. However, this is feasible only for short coefficient wordlengths.
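The run-of-ones recoding of (2) can be illustrated with the standard canonical signed-digit conversion, which by construction leaves no two adjacent non-zero digits. This is the generic textbook algorithm for non-negative integers, not code from this book.

```python
def to_csd(value, nbits):
    """Convert a non-negative integer to Canonical Signed-Digit form,
    returned LSB first with digits in {-1, 0, +1} and no two adjacent
    non-zero digits."""
    digits = []
    while value != 0:
        if value % 2 == 0:
            digits.append(0)
            value //= 2
        else:
            # Pick +1 when value mod 4 == 1, -1 when value mod 4 == 3,
            # so the remainder stays even after the digit is removed.
            d = 2 - (value % 4)
            digits.append(d)
            value = (value - d) // 2
    return digits + [0] * (nbits - len(digits))
```

For example, 7 = 0111 in NBC (three non-zero bits) becomes 100(-1) in CSD (two non-zero digits), matching the saving condition (2), while 21 = 10101 is already canonical.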
1.4 Implementation of decimators and interpolators
Efficiency in implementing the polyphase structure comes not only from the small sensitivity to coefficient quantization, shown in the previous section, but also from the structure itself. It can easily be implemented using parallel calculations by N processors, where N is the number of paths. It was also shown before that the number of coefficients required to achieve a high-quality filter is only (K-1)/2. This considerably reduces the number of multiplications required in comparison to the standard IIR structure. The structure of the polyphase filter was shown in Chapter 1. Here the discussion concentrates on the efficient implementation of polyphase IIR decimation and interpolation filters, together with examples.
1.5 Implementation of the polyphase IIR decimator
The standard decimation filter is implemented as a combination of the lowpass filter followed by the Sample Rate Decreaser (SRD) that selects
every second sample from its output. Considering that the polyphase LPF is used, such a standard structure will have the form of Figure 3-12(a). Implementing the allpass filter using the presented N-D structure has two advantages. First, it allows easy cascading of a number of allpass sections, sharing one delayer between each pair of them. Additionally, as will be discussed later in this chapter, its peak gain at the multiplier is limited to two (Figure 3-2), in comparison to other allpass implementations which can peak to infinity at half-Nyquist. This fact allows one to ensure that the internal values can be stored in the memory with only one additional guard bit, without adjusting the decimal point.
It can easily be noticed that the standard decimation structure does a lot of unnecessary calculations, as only half of the calculated samples are used to form the output signal. The valid output is the sum of the odd input samples processed by the upper branch APF and the even samples going through the lower branch APF. The other samples are discarded. From the symmetry of the structure the valid output can also be considered as the even samples going through the upper APF and the odd ones passing through the lower branch APF. As only every second sample of each APF is important for the output, the switch performing the SRD can be moved to the input of the decimation filter as in Figure 3-12(b). This way it directs all odd samples into the APF in the upper branch and all even ones into the APF in the bottom branch. Since both APFs now operate at half the rate and the valid output is produced at every sampling interval, the order of the delayers can be halved. When designing the architecture one must be careful, as the outputs of the two APFs are displaced by half of the sampling interval and need to be synchronised before the outputs of both branches are summed. This can be achieved by storing the results of both APFs in latches before summation, with the result stored in the output buffer, making the output of the decimation available for the next processing stages. This will be discussed later in this section when describing the architecture.
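The commutated structure of Figure 3-12(b) can be sketched as follows, assuming single-coefficient branch allpass sections in the one-multiplier form y[n] = a·(x[n] - y[n-1]) + x[n-1]; the coefficient values and function names are illustrative only, not the book's design.

```python
def allpass1(x, a):
    """First-order allpass A(z) = (a + z^-1)/(1 + a z^-1) computed with
    one multiplication per sample: y[n] = a*(x[n] - y[n-1]) + x[n-1]."""
    y_prev = x_prev = 0.0
    out = []
    for v in x:
        y = a * (v - y_prev) + x_prev
        x_prev, y_prev = v, y
        out.append(y)
    return out

def halfband_decimate(x, a0, a1):
    """Sketch of the input-commutated two-path decimator: the switch
    routes alternate input samples to the two branches, both branches run
    at the halved rate, and their outputs are summed and scaled by 1/2."""
    even, odd = x[0::2], x[1::2]        # input commutator
    u = allpass1(even, a0)
    v = allpass1(odd, a1)
    return [(p + q) / 2 for p, q in zip(u, v)]
```

Both branches together perform one multiplication per input sample, half of what the structure of Figure 3-12(a) needs.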
It is possible to achieve the same purpose as in Figure 3-12(b) by switching the coefficients of the APF instead of the input samples. As all APFs used in the two-times decimation structure are functions of z², even output samples are dependent on even input samples and independent of the odd input samples and odd output samples, and vice versa. This means that instead of using two APFs - which when implemented directly would require two processors (multipliers) - it is enough to use one, as in Figure 3-13.
The stream of input samples is fed into the APF, which changes its coefficients depending on which sample comes in: odd coefficients for odd input samples and even coefficients for even input samples. A two-sample output buffer remembers the last two results of allpass filtering, making them available for subsequent summation. The latch “L” is enabled at every output sampling interval to make sure that correct samples are passed to the output. Such a structure permits optimising the architecture to use a single multiplier at double the speed, as well as minimising the area of the integrated circuit. The structure in Figure 3-13 requires the same number of multiplications per input sampling interval as the one in Figure 3-12(b). It also requires that both the upper and lower branch allpass filters have the same number of coefficients; otherwise, one more unity-valued coefficient has to be added to the lower path APF. There are several applications, like the cascaded decimation filter presented in Chapter 1, which require more than two polyphase LPFs in one decimation stage, as in Figure 3-14(a). Moving the SRD to the beginning of a cascade of two LPFs is not as easy as in the case of one LPF. After moving the SRD in front of the second LPF, as in Figure 3-14(b), it becomes clear that all output samples of the first LPF are significant for the second LPF. Therefore, in order to move the SRD further to the input of the first LPF, this LPF has to be doubled as in Figure 3-14(c) [25].
Notice that the samples fed into the two branches of the polyphase filter are displaced by half the sampling interval; assuming that each allpass section needs the same time to provide its output sample, the samples reaching every summation point are displaced by the same interval of time. Therefore in the implementation there must be latches at the output of each branch allpass filter in order to align samples in time at each summation point. The idea was shown for the case of the two-path filter. If more paths are needed the structure can easily be modified by increasing the number of states of the switch to equal the number of paths. For the structure in Figure 3-14(c) this will also mean that 3N allpass branch filters will be required, where N is the number of branches in the structure. Implementing the polyphase filter exactly as in Figure 3-14 would be impractical, as it would require a lot of multiplier and summation blocks, leading to a huge
and costly integrated circuit realisation. The best solution is to share a single multiplier and a single adder, which leads to the architecture in Figure 3-15, presenting the simplest implementation of the IIR polyphase lowpass filter with a single section per allpass branch filter [25].
The processor can be viewed as comprising two distinct sub-processors, an inner one and an outer one, using a two-phase clocking scheme. The inner processor comprises two single-port RAMs, latches, a subtractor, an m by n multiplier, where m and n are the coefficient and arithmetic wordlengths respectively, and an adder, forming a DMAC (Difference-Multiply-Accumulate). This sub-processor computes the allpass sections, while the outer processor, comprising the Dual-Port RAM (DPRAM), latches and an adder, combines the allpass outputs to form the halfband lowpass filter output. The subtraction of the allpass section is done by a two's-complement adder, Sum1, with latches L1 and L2 holding its input samples. The adder should provide a sign-bit extension to have a guard bit before providing the result of the subtraction to the multiplier. The multiplier should employ round-to-zero circuitry (see section 4.3.1) when constraining the result back to an n-bit wordlength. The OR gate examines whether any 1s are set among the discarded bits. If the MSB=0 (positive number), the output of the OR gate is ignored, truncating the data. If the MSB=1
(negative number), the result of the OR gate is added to the output, rounding it up towards zero. Adder Sum2, with latch L4 holding one of its inputs and L5 storing its output, performs the summation of the allpass section. Adder Sum3, with latches L6 and L7 holding the inputs, performs the lowpass summation. The output can be stored either in the DPRAM, if it is required to cascade lowpass filters, or in latch L8 as the final result. The loss of precision at Sum3 is compensated with convergent rounding, an unbiased quantization scheme.
The DPRAM is used to store both the input samples to the structure and the samples from the intermediate stages. The memory is addressed employing address swapping. This means that the data, i.e. the input samples and the contents of the allpass delayers, are stored in memory only once. Instead of assigning one memory location to the sample delayed by one sample and a different memory address to the sample delayed by two samples - which would mean reloading all the stored samples from one place into the other - the addresses of the memory locations change their meaning. In one sampling interval a given memory location contains the sample delayed once; in the next sampling interval the same address points to the sample delayed twice. Such an approach saves a lot of time and power consumption. The controller can be designed as a finite-state machine. From the architecture point of view it can be implemented as a ROM addressed by a circular counter (going through all the states) with decoding logic providing the control signals [63]. Such decoding logic can be designed using FINESSE [23]. In case there is more than one allpass section per branch filter, the processor from Figure 3-15 needs to be modified to cater for the cascading of the basic allpass sections. The modified structure, employing a four-phase clocking scheme, is given in Figure 3-16. This processor is similar in overall structure to the processor from Figure 3-15 in that it comprises both an inner and an outer sub-processor, where a DMAC forms the core of the inner processor. However, this processor uses only one single-port RAM, which operates at twice the frequency of the DMAC, and has additional data pathways required to implement the higher-order allpass filters that may be used in the later stages of the decimator [25].
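The address-swapping idea can be modelled in software as a circular buffer in which the write pointer, not the data, moves each sampling interval; the class below is an illustrative behavioural model, not the DPRAM controller itself.

```python
class DelayLine:
    """Behavioural model of 'address swapping': samples stay where they
    were written, and a circular write pointer changes which address
    means 'delayed by one', 'delayed by two', ... each interval."""
    def __init__(self, length):
        self.mem = [0.0] * length
        self.ptr = 0                      # next write position

    def push(self, sample):
        self.mem[self.ptr] = sample       # overwrite the oldest sample
        self.ptr = (self.ptr + 1) % len(self.mem)

    def delayed(self, k):
        """Sample delayed by k intervals (1 <= k <= length)."""
        return self.mem[(self.ptr - k) % len(self.mem)]
```

No sample is ever copied from one location to another; only the modulo address arithmetic changes, which is what saves time and power in the hardware.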
1.6 Implementation of the polyphase IIR interpolator
The principles of operation of interpolation filters are very similar to those of decimation filters. In the decimation filters only significant samples are calculated, which allowed the decrease of the clock rate of the whole polyphase structure. A similar principle can be used to decrease the clock rate of the polyphase interpolation structure, where the sample rate increase is done by zero-insertion. It was discussed before that if the digital filter transfer function is defined in terms of z^N, then only every Nth of its output samples are dependent on each other and on every Nth of the input samples. This effectively means that the filter can be decomposed into N independent filters with transfer functions in terms of z, each with an input sequence composed of every Nth input sample and an output sequence composed of every Nth output sample.
It can then be noticed from Figure 3-17(a) that every second input sample has a zero value and therefore has no influence on the result of the summation, as the zero comes from the top branch in one sampling interval and from the bottom branch in the next. This means that the summation is redundant and the switch can be shifted to the output as in Figure 3-17(b).
For the case of two cascaded polyphase lowpass filters used for interpolation, as in Figure 3-18(a), only the summation of the first filter can be avoided. The output of the first filter does not have the same property as the input signal. Therefore, in order to move the switch to the output of the whole lowpass filter cascade, the second filter has to be split into N filters, driven from all N branches of the first filter, as shown in Figure 3-18(b) for the case of two-times interpolation.
The proposed structure gives an obvious N-fold decrease in the number of memory locations and in the number of multiplications per output sample for the first lowpass filter. For the case of cascaded lowpass filters there is no such improvement for the second filter in the cascade. If three or more filters were cascaded then the total number of multiplications and memory required would be larger than for the original structure in Figure 3-18(a). On the other hand, if the structure is implemented in a multi-processor architecture, moving the switch to the output allows more time for the calculations, permitting an N-fold increase of the input signal bandwidth. The architecture required to implement the interpolation structure can be the same as for the decimation filter (Figure 3-15 and Figure 3-16) with a specially designed controller. If only one polyphase structure is used for the lowpass filtering then adder Sum3 and the relevant input latches become redundant and can be discarded.
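The single-stage interpolator of Figure 3-17(b) can be sketched in the same style as the decimator example, assuming single-coefficient branch allpass sections in the one-multiplier form; the first-order helper is repeated here so the sketch is self-contained, and all names and coefficient values are illustrative.

```python
def allpass1(x, a):
    """First-order allpass A(z) = (a + z^-1)/(1 + a z^-1),
    y[n] = a*(x[n] - y[n-1]) + x[n-1], one multiplication per sample."""
    y_prev = x_prev = 0.0
    out = []
    for v in x:
        y = a * (v - y_prev) + x_prev
        x_prev, y_prev = v, y
        out.append(y)
    return out

def halfband_interpolate(x, a0, a1):
    """Sketch of the output-commutated two-path interpolator: both
    branches filter the low-rate input, and the output switch interleaves
    their samples, avoiding explicit zero-insertion and the summation."""
    u = allpass1(x, a0)
    v = allpass1(x, a1)
    out = []
    for p, q in zip(u, v):
        out.extend([p, q])    # commutator picks the branches alternately
    return out
```

Each branch runs at the low input rate, so the number of multiplications per output sample is halved compared with filtering the zero-stuffed signal directly.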
2. FINITE WORDLENGTH EFFECTS
One of the important implementation issues which has to be considered during the design of the architecture for any type of filter is the storage requirement for the internal calculations. The size of the memory has to be such that it does not cause loss of precision due to rounding of the results of internal multiplications and summations. It happens very often, especially for IIR filters having a feedback loop, that even if input and output samples are limited to unity and represented with w bits, the internal values might be well above one (even infinitely large for a unity allpass coefficient) and require many more bits than w in order to provide a reliable output. Additionally, there may be a big difference between internal values: the result of one summation can be below unity while the output of another may be very large. This causes many problems in the implementation, as it would require varying the position of the decimal point. It will be shown in this section that it is possible to implement the polyphase structure without varying the position of the decimal point. Polyphase structures can be implemented both in floating-point and in fixed-point arithmetic. Because of the small number of calculations required per filter order, the structure is very attractive for fixed-point implementation, which is advantageous over floating-point in terms of calculation speed, area on the integrated circuit and total power consumption. The disadvantage is that a fixed-point implementation is subject to such effects as quantization noise, caused by quantising the results of multiplications and summations to the internal arithmetic wordlength, and
limit-cycle oscillations, a repetitive flipping of the least significant bits caused by the chosen quantization scheme. The first effect has to be dealt with at the time of the filter coefficient design: the coefficients have to be such that the filter removes the effects of the arithmetic quantization. It will be shown in this section how this noise characteristic can be determined, assessed and applied in the filter design. The second quantization effect can be limited by the appropriate selection of the quantization schemes, shown in this section, and by the choice of the internal arithmetic wordlength.
2.1 Quantization [23]
Any type of fixed-point binary arithmetic with a uniform quantization step size, whether it is signed binary, one's or two's complement, will cause errors due to product quantization and due to register overflow after additions. In a filtering operation, where a number of multiplications and additions are undertaken, the error will accumulate, causing in the most drastic cases wrong filtering results. Therefore proper care has to be taken to avoid or minimise the effect of the arithmetic quantization error. The error due to product quantization is caused by the fact that a product of two fixed-point numbers with N1 and N2 bits, respectively, will yield a new number that requires N1+N2-1 bits for its error-free representation. On the other hand, an addition of the same two numbers yields a result that requires at most max(N1,N2)+1 bits. In a digital filter implementation the signal values are multiplied with the filter coefficients and the result is usually constrained back to the original signal wordlength. It is not feasible to allow the number of bits to increase at every product calculation, especially when a recursive filter structure is used, as the wordlength would keep growing towards infinity after successive multiplications. It is common in the literature to model the product quantization as an additive disturbance:

Q[v(n)] = v(n) + e(n)
where Q[.] denotes the quantization operation on the argument signal and e(n) is the disturbance due to quantization. Assuming that the quantization noise for the w-bit data path length is caused by variation of the least significant bit, then to a first approximation the quantization error can be modelled as a white noise source with power of q²/12, where q is the quantization step [5], [6]. Such an approximation is used very often in modelling quantization effects [6], [7], even though it does not consider effects like the loss of precision,
which occurs when the data needs to be represented with a smaller number of bits than required to keep it in full precision, or the correlation between the quantization noise and the input signal. A way of modelling the quantization process more accurately, by considering the loss-of-precision mechanisms, can be found in [8]. The correlation between the quantization noise and the structure of the input signal is explained in [9], [10]. The most common quantization schemes are summarised in Table 3-2.
These schemes differ in DC offset (bias), in the number of loss bits to consider, different for two's-complement (TC) and unsigned (UN) arithmetic, and in the way that the loss of precision is handled:
- Rounding to zero returns the nearest integer between zero and the original value (truncates towards zero), i.e. both 0.9 and -0.9 will be rounded to zero.
- Rounding returns the nearest integer to the original value. For an original value exactly halfway between two integers it returns the one closer to plus infinity, i.e. 0.5 → 1 and -0.5 → 0.
- Rounding to plus infinity returns the nearest integer not smaller than the original value (truncates towards plus infinity), i.e. 0.1 → 1 and -0.9 → 0.
- Truncating returns the nearest integer not greater than the original value (truncates towards minus infinity), i.e. 0.9 → 0 and -0.1 → -1.
- Convergent rounding returns the nearest integer to the original value. For an original value exactly halfway between two integers it returns the “even” integer (LSB equal to zero), i.e. 0.5 → 0 and 1.5 → 2. It is identical to rounding except when the original value is exactly halfway between two integers.
The “number of loss bits to consider” specifies how many of the discarded bits have to be used to calculate the result of the quantization for the specific scheme. For most of them, except simple rounding, all the bits have to be considered, which requires more complicated hardware to perform the quantization. For the lowpass filtering operation an important factor which must be properly taken care of is the DC bias. There are only two quantization schemes which are free of DC bias: round-to-zero and convergent rounding. This is because their results are rounded up and down with, statistically, equal probability. Therefore these are the only schemes applicable to the lowpass polyphase filters.
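The five schemes can be sketched for a value quantized to a given number of fractional bits; the mode names are invented labels, and the tie-breaking rules follow the definitions given above.

```python
import math

def quantize(x, bits, mode):
    """Quantize x to 'bits' fractional bits (step q = 2**-bits) using one
    of the schemes of Table 3-2."""
    s = x * 2**bits                      # value measured in LSB units
    if mode == "truncate":               # towards minus infinity
        n = math.floor(s)
    elif mode == "round":                # ties towards plus infinity
        n = math.floor(s + 0.5)
    elif mode == "round_to_plus_inf":    # towards plus infinity
        n = math.ceil(s)
    elif mode == "round_to_zero":        # towards zero
        n = math.trunc(s)
    elif mode == "convergent":           # ties to the even LSB
        f = math.floor(s)
        n = (f if f % 2 == 0 else f + 1) if s - f == 0.5 else math.floor(s + 0.5)
    else:
        raise ValueError(mode)
    return n / 2**bits
```

With bits = 0 this reproduces the integer examples in the list above; in the filter datapath the same logic is applied to the discarded low-order product bits.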
2.2 Peak gains along allpass filter structures
When looking at the whole structure of the polyphase filter in Figure 1, it can be seen that it does not have any feedback. The block which makes the polyphase structure an IIR one, creating local feedback loops, is the allpass filter Ai(z). For the purpose of our analysis we made the assumption that the input signal, just like the filter coefficients, is less than or equal to one. Such an approach, where there are no integer bits and the whole data word is the mantissa, simplifies both the analysis of the values of the internal calculations and the assessment of the quantization noise. When the input signal is limited to unity then the output of the allpass filter is also limited to unity. Therefore the summation of two such filters requires one extra bit to represent the result of the addition without loss of precision. In most practical cases the structure in Figure 1 is used with the number of paths equal to a power of two, i.e. N=2^n with n being an integer. In such a case the output scaling by N can be realized as a simple n-bit right shift, moving the fractional point by n bits. Regarding the allpass filters themselves, they can, in principle, be implemented in at least four different ways, as shown in Figure 3-19. They differ in the number of summations and multiplications as well as in the arrangement of their numerator and denominator parts, i.e. which one comes first and which one second.
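As a concrete sketch of the two-path case of the structure in Figure 1, the following uses one first-order allpass section per path, A(z) = (a + z^-1)/(1 + a z^-1) operated in z^2, with purely illustrative coefficient values (not a design from the book):

```python
# Sketch of a two-path polyphase halfband lowpass: one first-order allpass
# section per path, run in z^2, lower path delayed by one sample, output
# scaled by 1/N with N = 2. Coefficients a0, a1 are illustrative only.

def allpass_z2(x, a):
    """First-order allpass in z^2: y(n) = a*(x(n) - y(n-2)) + x(n-2)."""
    y = []
    x2 = [0.0, 0.0]   # x(n-2), x(n-1)
    y2 = [0.0, 0.0]   # y(n-2), y(n-1)
    for xn in x:
        yn = a * (xn - y2[0]) + x2[0]
        x2 = [x2[1], xn]
        y2 = [y2[1], yn]
        y.append(yn)
    return y

def polyphase_halfband(x, a0=0.1, a1=0.54):   # illustrative coefficients
    p0 = allpass_z2(x, a0)
    p1 = [0.0] + allpass_z2(x, a1)[:-1]       # extra z^-1 in the lower path
    return [(u + v) / 2 for u, v in zip(p0, p1)]   # 1/N output scaling
```

At DC both paths add in phase (gain one); at Nyquist the extra delay in the lower path makes them cancel, giving the halfband lowpass behaviour.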
The important distinction between the structures is the value of the results of the internal multiplications and summations, which influences the memory size requirements. These results are dependent not only on the value of the allpass coefficient, but also on the frequency of the input signal. In
order to assess how large the internal values may become (their dynamic range), Transfer Functions of Internal Calculations (TFIC) have to be determined for each allpass filter structure between the input of the filter (limited to unity) and the input to each of the delayers (implemented as memory). In other words, this has to be done only for the outputs of each adder. As the filter coefficients are less than one for the polyphase structure, the multipliers will not contribute to any increase of the peak internal value. The dynamic range analysis can be done without considering quantization effects, because the two differ in value by several orders of magnitude. The transfer functions in question have been determined for all the allpass filter structures and are presented in Figure 3-20.
Those transfer functions have been evaluated for a range of coefficient values between zero and one and the resulting magnitude responses are shown in Figure 3-20. These plots clearly illustrate that only the structure of Figure 3-19(c) has TFIC magnitude responses limited to a finite value of two at any frequency, independent of the coefficient value. All the others peak at half-Nyquist to very large values. For such structures this means that, if the input signal has frequency components close to half-Nyquist, the results of the internal multiplications and summations may well take very large values, causing problems with different positioning of the fractional point in different parts of the system, or inaccuracy of the final result. The theoretical analysis was verified by applying various input signals to a fixed-point model of the polyphase filter built using the Fixed-Point Blockset from Simulink: single tones at various frequencies, an impulse, wideband signals, as well as noise sources and speech. The simulation results matched the theory: the maximum values at the points of interest were below the limits given in Figure 3-19, derived for an impulse input. Only one structure, in the Numerator-Denominator (N-D) arrangement, does not suffer from such effects. At the DC, half-Nyquist and Nyquist frequencies the memory will store very small numbers. The maximum values occur at two other frequencies; this can be taken care of by increasing the size of the memory by one additional integer guard bit. An advantage is that this structure allows higher-order allpass filters to be created easily, by cascading a number of such structures together (Figure 3-20), sharing one delayer between each pair of allpass sections.
2.3 Quantization noise in polyphase IIR filters
In the polyphase filter, like in any other filter, quantization has to be performed on the result of any arithmetic operation. This is because any such
operation requires more bits to represent the result than are needed for each of the operands. If the wordlength were always adjusted to store the data in full precision, this would be impractical, as soon there would be too many bits to store in the available memory. Therefore the wordlength of the internal data, w, has to be chosen, and the result of any arithmetic operation has to be constrained back to it using any of the previously shown quantization schemes, as appropriate for the given application. The quantization effects in allpass filters have been studied in several publications [5], [10]. Here the attention is drawn to the analysis of the special case of the polyphase IIR structure. The quantization operation may disturb the result of the arithmetic operation. For normal filtering operations, such a quantization disturbance can usually be successfully treated as white noise and modeled as an additive noise source at the point of the arithmetic operation, with the quantization step equal to the LSB of the internal data. This is certainly not the case for zero-valued or constant input signals. However, modeling the quantization has, in most cases, the purpose of determining the maximum noise disturbance in the system. Hence, even if the additive quantization noise model overestimates the noise for very specific signals, this does not decrease the usefulness of the approach. After the shape of the quantization Noise Power Spectral Density (NPSD) is found, it can be used to identify regions that might cause overloading or loss of precision due to arithmetic noise shaping; also the required input signal scaling and the required internal arithmetic wordlength can be estimated for a given noise performance.
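The additive model can be checked numerically: rounding a busy signal to step Δ injects an error that is close to uniform in [-Δ/2, Δ/2], hence of variance Δ²/12. A small sketch (values illustrative):

```python
# Numerical check of the additive white-noise model of quantization:
# rounding a busy signal to step q injects an error of variance ~ q**2/12.
import random

random.seed(1)
q = 2.0 ** -10                                   # LSB of the internal data
x = [random.uniform(-1.0, 1.0) for _ in range(200_000)]
e = [q * round(v / q) - v for v in x]            # quantization error
var = sum(v * v for v in e) / len(e)             # measured error power
expected = q * q / 12                            # uniform error in [-q/2, q/2]
assert abs(var / expected - 1.0) < 0.05
```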
The standard methods of estimating the maximum signal level at a given node are the L1-norm (modulus of the impulse response, the worst-case scenario), the L2-norm (statistical mean-square) and the L∞-norm (peak in the frequency domain, giving the effect of input spectral shaping). These norms can be easily estimated for the given node from the shape of the NPSD. The quantization noise injected at each adder and multiplier, originally spectrally flat, is shaped by the Noise Shaping Function (NSF), calculated from the output of the filter to the input of each of the noise sources, i.e. to the output of each of the arithmetic operators. These functions were calculated for all the allpass filter structures from Figure 3-19 and are shown in Figure 3-21, and the shapes of the non-trivial ones of the NSFs are shown in Figure 3-22.
The accumulated quantization NPSD transferred to the output is obtained by shaping the uniform NPSD of each quantization noise source by the squared magnitude of the NSF corresponding to the given noise injection point, as described by (4):

NPSD_out(e^{jω}) = (Δ²/12) · Σ_k |NSF_k(e^{jω})|²     (4)
The results show that each structure performs very differently from the others. Structure (a) has the best performance at DC, half-Nyquist and Nyquist, where the NPSD falls towards minus infinity. Its two maxima are symmetric about half-Nyquist and independent of the coefficient value. The peaks lie away from half-Nyquist for small coefficient values and approach it as the coefficient increases. Structure (b) has a uniform noise spectral distribution, as all the arithmetic operations are either at the filter input (where the noise is shaped by the allpass characteristic of the whole filter) or at its output. Structure (d) also has a minimum at half-Nyquist. Its average noise power level decreases as the value of the allpass coefficient increases. Structure (c), the best from the point of view of the required guard bits, has its maximum at half-Nyquist, going towards infinity for coefficient values approaching one. This effect is a result of the denominator of the Nth-order allpass filter causing the poles of the filter to move towards the unit circle, at normalized frequencies 2πk/N, k=0...N-1, as the coefficient approaches one. If there is no counter effect from the numerator, as happens at certain frequencies for structures (c) and (a), the function goes to infinity. Even though structure (c) goes to infinity at half-Nyquist for the coefficient approaching one, it has the lowest average noise power of all the structures. This structure has a big advantage in terms of the number of required guard bits and the ease of cascading a number of them into higher-order allpass filters. If the filter coefficients approach one, the increase in quantization noise power can be countered with a few additional bits. Using other structures would only exchange the problem of an increased quantization noise for an increased number of guard bits required to deal with larger peak gains.
The NPSD of the quantization noise at the output of the polyphase structure can be calculated as the sum of the NPSDs at the outputs of all the allpass filters in the structure, scaled by the 1/N factor, N being the number of paths. If the filter is cascaded with another filter, the NPSD of the first one will also be shaped by the squared magnitude of the second filter. The theoretical analysis was verified in Simulink by comparing the results of the fixed-point implementation with a floating-point equivalent that incorporated quantization effects modelled as additive white noise sources. The intention was to check the correctness of the theoretical equations by applying white noise sources instead of quantization, and by performing the quantization after addition and multiplication (rounding and truncating), to verify the shaping of the
quantization noise and its level both for white input noise sources and real-life signals. The shape of the output quantization noise accumulated from all arithmetic elements for a wide-band input signal, assuming for simplicity no correlation between the noise sources, is shown for all considered allpass structures in Figure 3-23. The solid curve indicates the theoretical NSF, which matches the median of the quantization noise very well (the curves lie on top of each other). The quantization noise power increase calculated for the given coefficient was 8.5dB for structure (a), 6dB for structure (c), 7.3dB for structure (d) and 9dB for structure (b).
It is clear that the quantization “noise” differs from the assumed white noise characteristic. However, the approximation still holds with an accuracy of around 5-10%, depending on the structure of the input signal. An example of more accurate modeling of the quantization noise caused by arithmetic operations can be found in [8]. The arithmetic quantization noise certainly decreases the accuracy of the filter output. The arithmetic wordlength has to be chosen such that the quantization noise power is smaller than the stopband attenuation of the filter and the stopband ripples. In certain cases the design requirements have to be made more stringent to allow for some unavoidable distortion due to the arithmetic wordlength effects. In the case of decimation filters for sigma-delta based A/D converters the
quantization noise adds to the one originating from the modulator. Then each decimator stage has to be designed so that it filters out this noise as well.
The verification of the peak gain analysis was performed by applying single-tone signals at the characteristic frequencies (where the functions from Figure 3-19 have their extremes) and by using wideband signals to make sure that the estimates are accurate. The experimental results confirmed the theoretical calculations. The results of the simulation for a white noise input signal of unity power are given in Figure 3-24; white noise of unity power was used in order to obtain a uniform gain analysis across the whole range of frequencies. The theoretical shape of the gain is shown by a solid line, which very closely matches the median value of the signal at the test points. Structure (c) from Figure 3-19 can be identified as the best for full-band filtering. Any other structure can also be used, but their usefulness is limited to signals having no spectral components at one quarter of the sampling frequency or in its closest vicinity; otherwise internal overflow may occur. The same structure (c) also performs very well in terms of quantization noise injection. Even though it peaks in its transition band, it has the lowest injection in the filter passband and stopband, the parts of the filter that are the most important. In most of
the cases the transition band is not that important anyway. If the noise performance is important in the transition band of the filter, then structure (a) from Figure 3-19 could be the choice; in that case the noise has its minimum in the transition band as well as at DC and at half the sampling frequency. If it is preferable to have the same noise injection throughout the frequency range, then structure (b) from Figure 3-19 could be the choice.
3. FIR AND IIR DECIMATION FILTERS
In digital signal processing the decimation process is used to generate a discrete-time sequence y(n) with a lower sampling rate from another discrete-time sequence x(n). If the sampling rate of x(n) is F_S, the sampling rate of y(n) is F_S/M, where M is the down-sampling factor, a positive integer greater than one. The time-domain input-output relation of a down-sampler is given by:

y(n) = x(Mn)
The down-sampling operation by an integer factor M>1 on a sequence x(n) consists of keeping every Mth sample of x(n) and removing the M-1 samples in between. This principle will be used later to convert classical decimation filter structures into ones more efficient in terms of the number of operations performed per output sample. In the frequency domain, the input-output relation of a factor-of-M down-sampler is given by:

Y(e^{jω}) = (1/M) · Σ_{k=0}^{M-1} X(e^{j(ω-2πk)/M})
Here X(e^{jω}) and Y(e^{jω}) denote the Fourier transforms of x(n) and y(n) respectively. Y(e^{jω}) is the sum of M frequency-scaled and shifted images of X(e^{jω}). Analysis in the frequency domain indicates a possible problem of overlapping images of X(e^{jω}) (“aliasing”) which can occur during down-sampling, as explained in Figure 3-26 for down-sampling by M=2. Notice that the original shape of X(e^{jω}) is lost when x(n) is down-sampled, although it is still periodic. Unless X(e^{jω}) is band-limited to the normalised frequency range |ω| < π/M, the overlap of adjacent terms will exist, causing aliasing. To prevent any aliasing that may be caused by the down-sampling process, x(n) is passed through a lowpass filter approximating the ideal frequency response of (7) before it is down-sampled, as indicated in Figure 3-27.
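The aliasing mechanism can be seen with two lines of arithmetic: a tone at 0.3 of the input rate, down-sampled by M = 2, is indistinguishable from a tone at 0.4 of the output rate:

```python
# Sketch of aliasing under down-sampling: x(n) = cos(2*pi*0.3*n) kept at
# every second sample becomes cos(2*pi*0.6*n), which for integer n equals
# cos(2*pi*0.4*n) - the 0.3-cycle tone aliases to 0.4 of the new rate.
import math

M = 2
x = [math.cos(2 * math.pi * 0.3 * n) for n in range(200)]
y = [x[M * n] for n in range(len(x) // M)]              # y(n) = x(Mn)
alias = [math.cos(2 * math.pi * 0.4 * n) for n in range(len(y))]
assert max(abs(a - b) for a, b in zip(y, alias)) < 1e-6
```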
Such a combination of an anti-aliasing filter and a down-sampler is called a decimator or a decimation filter [72]. Subsequent sections present efficient ways of implementing such decimation filters using ordinary FIR filters, general IIR ones and the special class of polyphase filters from Chapter 1.
3.1 FIR decimation filter
The implementation of FIR filters is based on the idea of the Finite Impulse Response (FIR), which requires storing only up to N old samples of the incoming signal; the valid output is calculated from the latest N input samples only. This makes them very easy to implement. The only difficulty is that, to achieve effective filtering, such filters must have a large number of coefficients, which implies a large number of multiplications. In multirate signal processing, like decimation, where samples arrive at a very high rate, performing a large number of calculations at the signal rate is a demanding task, requiring a very fast calculation unit (and/or fast processor
clock) and a lot of buffer memory. This means a large integrated circuit area and large power consumption. Thus, there is a need for methods that decrease the workload on the processor. The structure of the decimation filter gives a possibility of decreasing the FIR filter complexity. It is possible to reduce the clock rate of the calculations by moving the sample rate decreaser before the anti-aliasing filter. By additionally feeding the branch filters with different samples in a circular fashion, calculating output samples that would later be removed anyway is avoided. Let us consider the standard structure of an FIR decimation filter as in Figure 3-27. Samples arrive at the rate at which they are processed, and then only every Mth one is kept while the others are removed, as in (8):

y(k) = Σ_{n=0}^{N-1} h(n) · x(kM - n)     (8)
Considering that only one sample in M is kept (i.e. at indexes k = 0, M, 2M, ...), N·(M-1) useless computations are performed. Upon closer inspection it can be noticed that every coefficient of the filter is multiplied by samples spaced M sampling intervals apart. It is then possible to spread both the input samples and the filter coefficients between branches, taking proper care that if the coefficients are placed starting from the top path, the input samples are inserted starting from the bottom one (Figure 3-28).
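The rearrangement of Figure 3-28 can be sketched as follows; the `fir` helper, the coefficients and the input are all illustrative, and the check confirms that the M branch filters running at the low rate reproduce filter-then-downsample exactly:

```python
# Sketch of the polyphase FIR decimator: filtering at the high rate and
# keeping every Mth output equals summing M short branch filters fed with
# interleaved input samples. Filter and data below are illustrative.

def fir(h, x):
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def decimate_direct(h, x, M):
    return fir(h, x)[::M]                     # wasteful: M-1 of M outputs dropped

def decimate_polyphase(h, x, M):
    out_len = (len(x) + M - 1) // M
    y = [0.0] * out_len
    for p in range(M):
        hp = h[p::M]                          # branch p coefficients
        xp = [x[m * M - p] for m in range(out_len) if m * M - p >= 0]
        offset = out_len - len(xp)            # first output index this branch feeds
        for m, v in enumerate(fir(hp, xp)):
            y[m + offset] += v
    return y

h = [3, 1, 4, 1, 5, 9, 2]                     # illustrative coefficients
x = list(range(1, 31))                        # illustrative input
assert decimate_direct(h, x, 3) == decimate_polyphase(h, x, 3)
```

Integer data is used so the two results can be compared exactly; each branch performs only N/M multiplications per output sample, at the low rate.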
A special case is when the filter preventing aliasing of the signal replicas into the baseband is a halfband filter with the number of coefficients equal to M·N. Such a filter can be combined with a two-times sampling rate decreaser, forming the structure in Figure 3-29. Although the classical implementation would also take advantage of every second filter coefficient (except
the middle one) being equal to zero, such a structure allows decreasing the memory size and halving the number of calculations per output sample.
Interpolation filters can benefit from a reduced calculation burden in a very similar way to the decimation structure, on condition that Zero Insertion Interpolation (ZII) is performed during the sampling rate increase. Consider the standard structure as in Figure 3-30.
It is easy to notice that, for every output sample, only every Mth multiplier receives a nonzero sample. Therefore it is possible to move the anti-replica (image) filter in front of the sample rate increaser (multiplexer), as in Figure 3-31.
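The same commutation for interpolation can be sketched as follows, with an illustrative filter and data; the L branch filters run at the low input rate and their outputs are interleaved:

```python
# Sketch of moving the anti-image filter before the sample rate increaser:
# zero-insert-then-filter equals L branch filters (coefficients h[r::L])
# running at the low rate, outputs interleaved. Filter and data illustrative.

def fir(h, x):
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def interpolate_direct(h, x, L):
    u = []
    for v in x:                               # zero-insertion interpolation (ZII)
        u.append(v)
        u.extend([0] * (L - 1))
    return fir(h, u)                          # filter at the high rate

def interpolate_polyphase(h, x, L):
    branches = [fir(h[r::L], x) for r in range(L)]   # all at the low rate
    return [branches[n % L][n // L] for n in range(len(x) * L)]

h = [3, 1, 4, 1, 5, 9, 2, 6]                  # illustrative coefficients
x = list(range(1, 11))                        # illustrative input
assert interpolate_direct(h, x, 3) == interpolate_polyphase(h, x, 3)
```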
Decimation and interpolation structures above could be used to do more than just reduce the workload per output sample. By combining both the decimation and interpolation filters, the designer gets two types of parallel structures depending on which operation is done first, decimation or interpolation.
3.1.1 Interpolation - Decimation arrangement
The arrangement shown in Figure 3-32(a) allows the requirement on the filter doing the required processing, H(z), to be relaxed by designing it for a lower passband (wider transition band), or allows the use of filters in which the band of interest is in the lower frequencies.
This structure can be further simplified if we notice that:
- The anti-aliasing filter does not have to be implemented when converting back to the original (low) rate, as the signal bandwidth is already limited to the correct cut-off frequency FS/(2M).
- The main processing filter can be moved behind the sample rate decreaser using the same idea as when moving the anti-replica filter in the interpolation filter case.
- The sample rate increaser and the sample rate decreaser can be substituted with wires.
Care has to be taken that the total number of coefficients of the anti-replica filter and the main processing one is divisible by M. This makes sure that the correct samples reach the summation block. If it is not divisible by M then one of the filters must have delayers added to achieve synchronisation. Also, if the first coefficient of the first filter is put in the first path with the others put into successive paths, then the other filter must have its coefficients arranged starting from the last path, putting each succeeding coefficient into each previous path. This leads to the structure shown in Figure 3-32(b). This structure indirectly works at the high rate, even though all its paths do their processing at an M-times lower rate. Such a structure is very applicable to parallel (multiprocessor) systems, with the additional advantage that, if used in an adaptive system, the
adaptation time will be M-times shorter. A practical example incorporating such a multirate filter arrangement, with a Lagrangian fractional delay filter doing the required processing, is presented in Section 3.2 [10], [64].

3.1.2 Decimation - Interpolation arrangement
The decimation-interpolation structure in Figure 3-33 allows decreasing the signal sampling rate, doing the calculations at the lower rate in several paths, and then converting the result back to the original high output sampling rate.
This idea has long been in use and requires an anti-aliasing filter before the sample rate decrease. This has the disadvantage of limiting the signal bandwidth to FS/(2M), which sometimes might not be acceptable. It can also be beneficial in parallel processing systems, where both the anti-aliasing and anti-replica filters can be processed in parallel. Only the main processing filter requires sequential execution, unless its structure allows multiprocessing. An advantage is that the main processing filter is not limited to an FIR; IIR filters can also be incorporated.
3.2 FIR decimation and interpolation structures for wideband, flat group delay filters [10]
The ideas presented above for implementing FIR decimation and interpolation filters at a lower rate can be shown in an example incorporating both. They can be used together with a Lagrangian fractional delay filter to achieve a wide-band filter with variable, arbitrary delay and high quality (high resolution) in both amplitude and phase linearity. Additionally this structure allows an adaptive change of the delay at every sampling instant. The arbitrary-valued delay implementation is based around the use of the Lagrangian interpolation filter [65], a delay line and sampling rate conversion [66]. The technique involves producing a two-times oversampled input signal which is then halfband limited and passed through a fractional delay filter. The resulting signal is finally decimated by two, back down to the baseband frequency. The basic scheme is shown in Figure 3-34.
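The Lagrangian fractional delay coefficients mentioned here have a closed form; the sketch below computes tap k for a total delay of D samples (the helper name is ours, not the book's):

```python
# Sketch of a Lagrange fractional-delay FIR: tap k of an N-tap filter
# approximating a total delay of D samples is the Lagrange basis value
# h(k) = prod over i != k of (D - i)/(k - i). Helper name is illustrative.

def lagrange_fd(num_taps, D):
    h = []
    for k in range(num_taps):
        c = 1.0
        for i in range(num_taps):
            if i != k:
                c *= (D - i) / (k - i)
        h.append(c)
    return h
```

For an integer D the filter degenerates to a pure delay (a single unit tap), and the taps always sum to one, giving unity DC gain; D is normally chosen near the filter's centre, (num_taps - 1)/2, plus the desired fraction, where the approximation error is smallest.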
The simplest way to obtain the over-sampled input signal is to resample it using ZII on alternate samples. It should be ensured that the input signal does not contain frequencies within the transition bandwidth of the decimation filter used. By over-sampling the input signal by a factor of two, its spectrum is replicated within the higher-rate sampling scheme's normalised bandwidth. Thus, it is enough to convolve the over-sampled input signal with a fractional delay filter impulse response that only meets the desired specification up to the half-Nyquist frequency. In several experiments it has been found that a 35-tap fractional delay filter of the type proposed in [65] provides 20-bit magnitude resolution (i.e. passband ripples) with a delay resolution of a very small fraction of a sample. To maintain the flat group delay characteristic over the majority of the base bandwidth after decimation, a linear-phase lowpass band-limiting filter was required. To achieve the necessary 20-bit resolution, a 511-tap halfband FIR filter, weighted by a 120dB Dolph-Chebyshev window, can be used, and was used in the example presented later in this section. This resulted in a baseband width of 0.481 of the normalised frequency with correspondingly small passband ripples, shown in Figure 3-35.
The performance of the overall structure depends on the quality of the lowpass filter: the higher the stopband attenuation, the higher the group delay resolution. However, this is countered by an increasing transition bandwidth for the same number of filter taps. Since the interpolation process performs zero insertion between input samples, at each higher-rate sampling instant only half of the FIR coefficients are required to generate a valid output. At one sampling instant the input samples are processed by the even-indexed coefficients and at the next by the odd-indexed coefficients of the cascaded FIR filters [66]. This fact permits the decomposition of the filters into parallel odd- and even-coefficient branches, which operate at the input sampling rate. To keep the delay variable in real time, the lowpass and arbitrary delay branch filters are not combined, but are cascaded, with the even part of one filter used in the same branch as the odd part of the other. This form of connection removes the need for a unit delay that would be required if the odd parts of both filters were combined together. As the Arbitrary Group Delay Filter (AGDF) is in effect working at twice the input sampling rate, the required delay must be scaled by a factor of two. The overall scheme also removes the need for a Sampling Rate Increaser (SRI) at the input and a Sampling Rate Decreaser (SRD) at the output. The output signal is formed by summing the signals from the branches, with scaling applied to restore the signal power. This results in the structure presented in Figure 3-36.
By placing the lowpass filter before the variable delay element, the system can be adaptively changed at every input sample instant. If the filters were in the reverse order and the delay requirement changed, a valid output would only become available after the mid-array delay of the lowpass filter had passed. The AGDF is composed of an integer delay line and a fractional delay filter, which is designed to have constant group delay and unity gain to a specified tolerance within the bandwidth of the lowpass filter. The integer delay is implemented using a shift register. If the newest sample is placed at the end of the register, then the record of samples fed to the fractional delay filter is taken starting at the k-th sample from the end of it,
where k is the required integer group delay at the current sampling instant. Specifying the maximum allowable integer delay makes the total length of the register equal to this number plus the number of taps of the branch fractional delay filter. The motivation for devising this improved structure was to increase the signal bandwidth, while maintaining real-time variability of the delay, beyond the bandwidths achievable using least integral square or unmodified Lagrangian interpolation methods [67]. It was very difficult to cross the limit of approximately 0.45 of the bandwidth using these existing techniques, owing to the unavoidable escalation of error right at the Nyquist frequency [67]. Moreover, the new structure produces much smaller magnitude response ripples than alternative approaches. As only FIR filters are used both for lowpass filtering and fractional group delay tuning, the structure is very useful for adaptive fractional delay filtering where the linearity of the group delay is of paramount importance. Although the structure uses interpolation and decimation, all calculations are performed at the original input sampling frequency. The multi-branch (multirate) approach divides the overall group delay resulting from the long FIR lowpass filter by the number of branches of the structure. It also allows enhanced filtering speed through parallel processing, which is especially useful when very long filters are used for high-bandwidth, high-resolution applications. This idea was subsequently extended for use in an efficient fractional sample delayer for high precision digital beam steering at a baseband sampling frequency [64].
3.3 IIR decimation in Denominator-Numerator form
Implementing a decimation filter based on the general form of an IIR filter is more complicated than for the FIR case, as it contains a feedback loop. This requires all samples to be calculated for the proper operation of the whole filter, even if some of the output samples are going to be thrown away during the sampling rate decrease. The Denominator-Numerator (D-N) arrangement shown in Figure 3-37 allows this requirement to be met. The example structure implements a third-order filter with the transfer function:

H(z) = (b0 + b1·z^-1 + b2·z^-2 + b3·z^-3) / (1 + a1·z^-1 + a2·z^-2 + a3·z^-3)
The switch at the input ensures that odd samples go into one branch and the even ones into the other. Additionally it has to ensure that
the samples in the lower branch are older than those in the top one. The samples are fed back through the coefficients of the feedback loop and a single delay, except for the coefficients at even-order delayers in the top branch, because samples coming from the lower branch are already delayed by one sampling period of the original sampling rate. The idea is for the filter to operate at the lower rate while performing as if it were working at the higher rate.
The advantage over the standard implementation, with the switch at the output, is that all blocks in the structure operate at the output rate. The example structure performs decimation by two. By changing the switch to a higher-order one and making similar interconnections between each path, as in the structure proposed for two-times decimation, it is possible to achieve decimation by any integer factor. The D-N structure may by its nature suffer from large internal sample values. The feedback loop in an IIR filter usually has the effect of accumulating the sample values (because of the poles), while the numerator attenuates them (due to the zeros). Both effects combined give the required, stable output signal. Therefore the best result, from the point of view of internal sample values, is obtained when the filter is implemented in the Numerator-Denominator (N-D) form.
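Why the denominator needs every high-rate sample can be demonstrated on a one-pole example: filtering at the full rate and then keeping every second output (the correct decimated result) differs from down-sampling first and filtering at the low rate:

```python
# Sketch of why IIR decimation cannot simply drop samples before the
# feedback loop. H(z) = 1/(1 - a*z^-1); the coefficient a is illustrative.

def iir_pole(x, a):
    y, prev = [], 0.0
    for v in x:
        prev = v + a * prev       # y(n) = x(n) + a*y(n-1)
        y.append(prev)
    return y

a = 0.5
x = [1.0] + [0.0] * 19                    # impulse input
ref = iir_pole(x, a)[::2]                 # correct decimated output: a**(2n)
naive = iir_pole(x[::2], a)               # wrong: feedback skipped samples
assert all(abs(r - a ** (2 * n)) < 1e-12 for n, r in enumerate(ref))
assert any(abs(r - s) > 1e-6 for r, s in zip(ref, naive))
```

For this first-order case the feedback could instead be restructured to run at the low rate using 1/(1 - a·z^-1) = (1 + a·z^-1)/(1 - a^2·z^-2), which is the spirit of the multirate IIR structures discussed in this section.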
3.4 IIR decimation in Numerator-Denominator form
The idea of the N-D structure is that if the numerator (FIR) part precedes the denominator (feedback) part, it has to provide all the high-rate samples to the latter. This means that it is not allowed to lose any sample
at this stage in order for the denominator to return correct results. Therefore the first structure to consider is the one shown in Figure 3-38. The numerator part operates here at the high rate and all the samples are calculated; no sample rate decrease is done at this point. The denominator part is exactly the same as in the D-N structure. The output of the decimation can be taken from either branch of the denominator part. Notice that the delayers store only input and output samples of the overall filter. As the filter is stable and the input signal bounded, it follows that the output signal is also bounded. Therefore this structure does not have problems with excessively large sample values to store in memory.
It is possible to force the numerator part to operate at the output rate, but because it has to provide all the high-rate samples to the denominator, this requires doubling the numerator hardware, as shown in Figure 3-39. One filter replica calculates the odd samples and the other the even samples of the numerator. As the switch provides the odd samples at the same time as the even samples, this fact can be used to omit some of the delayers. The delay between the result of one of the numerator filters and the input to the denominator was taken out of the numerator once it was noticed that each of its branches contains at least one delay element. This modification saves three memory locations for this filter.
Sharing delayers between the two replicas minimised the number of memory locations for the numerator part. Unfortunately, there is no way to avoid doubling the numerator coefficients. In terms of the number of multiplications the structure in Figure 3-39 is similar to the one in Figure 3-38; the difference is that the second one does all the calculations at a lower rate (the output rate) than the first. It also allows easier implementation in a multi-processor (parallel) arrangement.
3.5 Summary
The issues of efficient implementation of FIR and IIR filters in multirate systems were addressed in the section above. Changes to their structures were suggested that allow filters originally working at the high rate to be shifted to the other side of the sample rate decreaser or increaser, so that they operate at the lower rate in the implemented system. Although the modified structures may show an increased number of multiplications and additions compared to the original structure, the number of computations per sample period, calculated for the same reference rate, is lower; in the worst case it equals the number of calculations of the prototype implementation. The presented structural modifications may be used to advantage for optimising the workload between multiple processing units, allowing each of them to operate at a single clock rate. This also allows lowering the clock rate of the processing units, often leading to reduced power consumption and diminishing other effects caused by the high operating rate of the integrated circuit.
Chapter 4
VHDL FILTER IMPLEMENTATION
Automatic Code Generation Techniques
1. BASICS OF VHDL
VHDL stands for Very High Speed Integrated Circuits (VHSIC) Hardware Description Language (HDL). It is a language for describing digital electronic systems. It was born out of the United States Government's VHSIC program in 1980 and was adopted as a standard for describing the structure and function of Integrated Circuits (ICs). Soon after, it was developed and adopted as a standard by the Institute of Electrical and Electronic Engineers (IEEE) in the US (IEEE-1076-1987) and in other countries [76], [77]. VHDL continues to evolve. Although new standards have been prepared (VHDL-93), most commercial VHDL tools use the 1076-1987 version of VHDL, making it the most compatible choice when using different compilation tools. VHDL enables the user to:
- Describe the design structure: specify how it is decomposed into sub-designs, and how these sub-designs are interconnected.
- Specify the function of designs using a familiar, C-like programming language form.
- Simulate the design before sending it for fabrication, so that designers can rapidly compare alternatives and test for correctness without the delay and expense of multiple prototyping.
VHDL is a C-like general-purpose programming language with extensions to model both concurrent and sequential flows of execution, and it allows delayed assignment of values. To a first approximation VHDL can be considered a combination of two languages: one describing the structure of the integrated circuit and its interconnections (structural description) and the other describing its behaviour using algorithmic constructs (behavioural description). VHDL allows three styles of programming:
- Structural
- Register Transfer Level
- Behavioural
The first one, structural, is the most commonly used, as it allows the user to describe the structure of the IC very precisely. In many cases this gives better performance than compiler-optimised structures, especially for high-speed, fixed-point applications like polyphase structures. The behavioural style permits the designer to quickly test concepts: the high-level function of the design can be specified without much concern for how it will be realised structurally. This can be very attractive for the quick design of low- and medium-speed, low-volume applications where design expertise is not available. A word of warning is appropriate here. Designs synthesised from behavioural descriptions will often end up using far more resources than actually necessary, even after optimisation. However, the success of VHDL for designing integrated circuits is indisputable. Unfortunately there is a lack of tools linking VHDL with high-level digital filter design/simulation tools like Matlab and Simulink, which operate at levels higher than the structure. At the moment the designer who has designed and tested his design theoretically using high-level tools is required to spend the same amount of time or more on designing the structure and architecture for that theoretical design, simulating it, testing it and fabricating it.
This involves a dangerous break in the integrity of the design flow, giving inconsistencies a chance to creep in. An automated high-integrity link between theoretical design and implementation is essential and can be achieved with VHDL via a conversion tool. A very attractive high-level design/simulation tool is provided by Math Works™ and is called Simulink. It is a very flexible design tool, which allows testing of a high-level structural description of the design and makes quick changes and corrections possible. The circuit description structure is very similar to the way the design could later be implemented. Therefore a mapping tool allowing conversion of such a structure into VHDL code would save the designer's time, which otherwise has to be spent rewriting the same structure in
VHDL, probably making mistakes that will need debugging. This idea is the basis of the work described later. Primarily the work has concentrated on the analysis of the Simulink structure and its similarity to the VHDL description. The structural style of programming was chosen for the first version of the program, as this allows direct mapping of Simulink structures into ones described in VHDL. As Simulink is a high-level description tool and allows operations such as unconstrained arithmetic, the behavioural style will be included in the next version of the conversion program, which is still under development.
2. CONVERTING FROM MATLAB TO VHDL
The biggest problem which the designer very often faces is how to pass from an algorithmic design to its physical implementation. The first tool the designer uses when developing a new idea is a high-level design and simulation tool. One of the most commonly used high-level tools is Matlab with Simulink. It allows the designer to put together a behavioural or structural simulation very easily, quickly checking the algorithm or making the necessary adjustments to it. Working directly with any low-level implementation tool from the start is simply not practical, as every small change in the algorithm may sometimes require substantial redesign of the implementation. Therefore an automatic link from the high-level algorithmic design, such as a Simulink model, to some implementation description, such as a target netlist or VHDL, would lead to great effort and time savings in the design cycle. The task of the conversion tool is as follows:
- Analyse the Simulink model and identify:
  - Common and different blocks
  - Connections (signal lines) and ports for multilevel models
  - Block parameters
- Generate a VHDL equivalent:
  - Identify entities available in the standard component library
  - Create architectures for each block from the bottom up
  - Create configuration files for every entity, linking in standard libraries
Matlab has been used by a large number of developers for a long time and has proven to be an invaluable tool for DSP applications. Therefore this software was chosen for the high-level design part of the whole system. In the first
instance the Simulink part of Matlab was chosen as the input to the conversion tool. The fact that Simulink makes it possible to create both behavioural and structural designs (the latter being closest to the physical implementation) justifies this choice. The description of a typical Simulink block is similar to the netlist of the physical implementation. However, it can easily be noticed that there is a set of blocks in Simulink which have to be treated as basic ones. These are compiled "s-functions", the contents of which are not available. Therefore their behaviour has to be carefully analysed in order to create equivalent VHDL descriptions, to be later included in the library of standard Simulink entities/architectures. The VHDL description was chosen as the output of the conversion tool, as it is the highest-level technology-independent description of the design to be realised. There are other tools available, both for UNIX and PC, for compiling VHDL into a netlist, which is then ported into the silicon fabrication arena or FPGAs. Such tools include Peak VHDL/FPGA from Accolade Design Automation Inc. [78], and Galileo and Renoir from Mentor [79].
2.1 Basics of Simulink
Simulink, like most high-level simulation software, does not allow testing certain behaviour patterns that the real target design can exhibit, most of which are available in a VHDL simulator. The most reliable simulation can only be performed after porting the compiled VHDL into the implementation software. In particular, Simulink does not:
- Support fixed-point arithmetic in the general sense.
- Use data types compatible with bit logic (bits are simulated with floating-point values).
- Define propagation delay in its blocks, necessary for implementation.
- Support reusable symbols (the same symbol may have different contents).
In a structural simulation using bit-logic arithmetic it is possible to force Simulink to assign only 0s and 1s, even though they are represented with floating-point variables/signals. Fixed-point arithmetic can be implemented structurally in Simulink using gates. This would also simplify setting the propagation delay, as it could be included in the VHDL description of each gate; however, this is not possible in the Simulink model. Summarising, a structural fixed-point design can quite easily be converted into VHDL directly, without much additional intelligence required from the conversion program. The model description of a Simulink block (MDL-file) is very similar to the representation of the common structure. It contains the parameters of the simulation, a description of each block with its parameters, and the block connections. The problem is that Simulink does not use reusable symbols. This means that if there are several
blocks or symbols of the same name, they are all fully duplicated down to the most basic elements. This makes the analysis of common blocks much more difficult, as such blocks may have slight differences and then qualify as two different ones, even if they have the same name. Therefore the designer must obey the rule that all blocks having the same symbol must also have the same contents; they may only have different parameters.
2.2 Analysis of the Simulink MDL description
As was pointed out earlier, the description of the Simulink model closely resembles the Matlab structure. Describing the model with a structure simplifies the conversion process, as the interdependence of blocks can be indicated by their position in the tree of blocks. Therefore the conversion of the MDL-file into a Matlab structure was the first task performed by the conversion utility developed. The main problems faced at this stage were:
- The structure did not allow the same field names at the same level, which was allowed in the MDL-file. As a remedy, all blocks and lines (connection signals) had to be renamed consecutively.
- There are no commas separating parameters and values in the MDL-file, as required by the structure syntax. They had to be inserted appropriately.
- There is an inconsistency in the description of text constants: in Matlab they are indicated by a single quote, in the MDL-file by a double quote. Therefore single quotes were replaced by double quotes wherever a text constant was found.
- Simulink does not require ports to have their width always defined. This created confusion in specifying the number of input/output signals in the entity definition. The safest solution was to make a rule of explicitly defining the width of the ports in the Simulink model wherever possible. Even so, there were cases when the data type had to be derived indirectly from the block to which the port was connected.
- The number of input and output ports was not defined consistently. For some Simulink blocks it was clearly given by the parameters "Inputs" and "Outputs". For others there was only one parameter, "Ports", containing a five-element vector with the number of input ports in the first element and output ports in the second. There were also several blocks with no description of the number of ports at all.
In such cases, whether the block had an input or output port had to be derived from the connection description ("Line"). The main keyword to look for in the MDL-file is "System". This indicates the beginning of the description of the blocks and
their connections within one block. It is then followed by a number of "Block" sections describing the components of the design and "Line" sections, each equivalent to a single wire connector (one source can connect to multiple outputs). A "Block" can contain another "System" section, which means it holds a lower-level circuit description. Sometimes such blocks also have mask parameters, indicating that a symbol has been created for the block. In this case "Mask type" gives the common symbol name (which can be used for the entity name later), "MaskPromptString" contains descriptions of the symbol parameters, "MaskInitialization" their names and "MaskValueString" their values. If no "System" is found, the block is a basic component of the Simulink library and its description should later be copied from the library of basic VHDL blocks. The "Line" statement contains the names of one source block, one or more destination blocks, and their port numbers. For multiple destination ports, each is described by its own "Branch" statement. There are also other block parameters, like "Decimation" and "SamplingTime", which are useful for multirate systems. The problem with using the MDL description is that it has been changing from one Matlab release to another. The conversion tool, initially developed for version 5.3, did not work with version 6.0; the same happened later with version 6.1 and subsequent releases. In order to allow users to create their own converters from the Simulink block diagram into other formats (C, ADA and others), Math Works has introduced the concept of the Target Language Compiler.
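As an illustration only (the real MDL grammar is richer and, as noted, changes between releases), a toy recursive parser for this "Keyword { ... }" / "Parameter value" layout can be sketched in a few lines of Python. It also renames repeated sections consecutively, the remedy for duplicate field names described in the previous section. All names here are hypothetical:

```python
def parse_mdl(lines):
    """Parse a simplified MDL-style text into nested Python dicts.

    Repeated section names such as Block and Line are renamed
    consecutively (Block_1, Block_2, ...), since a flat structure
    cannot hold two fields of the same name at the same level.
    """
    def parse(it):
        node, counts = {}, {}
        for raw in it:
            line = raw.strip()
            if line == "}":
                return node                      # end of this section
            if line.endswith("{"):               # start of a sub-section
                name = line[:-1].strip()
                counts[name] = counts.get(name, 0) + 1
                node[f"{name}_{counts[name]}"] = parse(it)
            elif line:                           # "Parameter value" pair
                key, _, value = line.partition(" ")
                node[key] = value.strip().strip('"')
        return node
    return parse(iter(lines))
```

A two-block toy model then parses into a tree in which each "Block" is individually addressable, mirroring how the conversion utility walks the hierarchy.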
2.3 Target Language Compiler
The Target Language Compiler (TLC) is a feature of Real-Time Workshop (RTW), introduced with Matlab version 6.0, which allows the user to customize the code generated by Real-Time Workshop. The TLC includes:
- A complete set of TLC files corresponding to all Simulink blocks
- TLC files for model-wide information specifying the header and parameters
The TLC files are ASCII text files that control the way code is generated by Real-Time Workshop. The designer may change the way code is generated for a particular block by editing its TLC file. The Target Language Compiler provided with Matlab contains a complete set of working TLC files for generating ANSI C code. The TLC files can be changed as required. The TLC is an open environment giving full flexibility in customizing and adjusting the code generated by Real-Time Workshop to suit users' needs and applications.
The Target Language Compiler enables the user to customize the C code generated from any Simulink model and to generate inlined code for his own Simulink blocks. By modifying the TLC code, the user can produce platform-specific code, or even include his own algorithmic changes to increase performance, change the code size, or make the code compatible with existing methods that he has been using and wishes to maintain. The top-level diagram in Figure 4-1 shows how the Target Language Compiler fits into the Real-Time Workshop code generation process. The blocks drawn with dashed lines are not necessary for converting the Simulink description into VHDL; they are only required if a stand-alone executable of the Simulink model is to be generated. The TLC was designed for the sole purpose of converting the model description file, "model.rtw", or other similar files into target-specific code. Being an integral part of the Real-Time Workshop, the Target Language Compiler transforms an intermediate form of the Simulink system, called "model.rtw", into custom language code (in our case, VHDL code). The "model.rtw" file contains a "compiled" version of the model, including the execution semantics of the block diagram described in a high-level language.
TLC-generated code can take advantage of the capabilities, and fit within the limitations, of specific target processor architectures. After reading the
"model.rtw" file, the Target Language Compiler generates the new code according to target files, which specify how each block should be coded, and model-wide files, which specify the overall code style. The TLC behaves similarly to a text processor, using the target files and the "model.rtw" file to generate the VHDL code. In the general case, in order to create a target-specific executable application, the Real-Time Workshop also requires a template makefile that specifies the appropriate compiler and compiler options for the build process. The model makefile is created from the template makefile by performing token expansion specific to a given model. A target-specific version of the generic "rt_main" file (or "grt_main") must also be modified to conform to the target's specific requirements, such as interrupt service routines. A complete description of the template makefiles and "rt_main" is included in the Real-Time Workshop documentation. The Target Language Compiler resembles other high-level programming languages, borrowing ideas from HTML, Perl, and MATLAB: it has mark-up syntax similar to HTML, the power and flexibility of Perl and other scripting languages, and the data handling power of MATLAB. The TLC can generate code from any Simulink model, whether linear, nonlinear, continuous, discrete, or hybrid. All Simulink blocks are automatically converted to code, with the exception of MATLAB function blocks and S-function blocks that invoke M-files. The Target Language Compiler uses block target files to transform each block in the "model.rtw" file and a model-wide target file for global customization of the code. It is possible to write a target file for a custom C MEX S-function to inline it (see the Matlab documentation), thus improving performance by eliminating function calls to the S-function itself and the memory overhead of the S-function's SimStruct.
Target files can also be written for M-files, allowing their incorporation into VHDL-convertible Simulink systems. If the user needs to customize the output of Real-Time Workshop, he will need to instruct the Target Language Compiler how to:
- Change the code generated for a particular Simulink block
- Inline S-functions
- Modify the way code is generated in a global sense
- Generate code in a language other than C
In order to produce customized output using the TLC, the user needs to understand how blocks perform their functions, what data types are being manipulated, the structure of the "model.rtw" file, and how to modify target files to produce the desired output. Please refer to the Matlab documentation for the target language directives and their associated constructs. The TLC directives and constructs are used to modify existing target files or create new ones, depending on particular needs. See the TLC Files documentation for more information about target files.
Inlining S-Functions
The TLC provides a great deal of freedom for altering, optimizing, and enhancing the generated code. One of its most important features is that it allows inlining of S-functions, which may be written to add custom user algorithms, device drivers, and custom blocks to a Simulink model. In order to create an S-function, C code needs to be written following a well-defined API. By default, the compiler generates non-inlined code for S-functions, invoking them through this same API. This interface incurs a large amount of overhead due to the presence of a large data structure, called the SimStruct, for each instance of each S-function block in the model. In addition, extra run-time overhead is involved whenever functions within an S-function are called. This overhead can be eliminated by using the TLC to inline the S-function: a TLC file named "sfunction_name.tlc" is created that generates source code for the S-function as if it were a built-in block. Inlining an S-function improves the efficiency and reduces the memory usage of the generated code. In principle, the TLC can be used to convert the "model.rtw" file into any form of output (in our case, VHDL) by replacing the supplied TLC files for each block it uses. Likewise, some or all of the shipped system-wide TLC files can be replaced. This is not generally recommended by Math Works, although it is supported. In order to maintain such customizations, custom TLC files may need to be updated with each release of the Real-Time Workshop, as Math Works continues to modify code generation by adding features and improving its efficiency, very likely altering the contents of the "model.rtw" file. There is no guarantee that such changes will be backwards compatible. However, changes to TLC files are less likely to cause problems when converting Simulink blocks into VHDL than using the MDL model description.
Moreover, inlined TLC files that users prepare are generally backwards compatible, provided that they invoke only documented TLC library and built-in functions.

Code Generation Process
Real-Time Workshop invokes the TLC after a Simulink model is compiled into an intermediate form, "model.rtw", that is suitable for generating code. To generate code appropriately, the TLC uses its library of functions to transform two classes of target files: system target files and block target files. System target files specify the overall structure of the generated code, tailoring it for specific target environments. Block target files implement the functionality of Simulink blocks, including user-defined S-function blocks. Block target files can be created for C MEX, Fortran, and M-file S-functions to fully inline block functionality into the body of the generated code.
2.4 Automated conversion from Simulink to VHDL
In order to simplify the first version of the conversion program, some constraints were put on the original Simulink model. The model was required to:
- Operate on bit signals or vectors of bits
- Have only one sampling rate throughout the design
- Be composed of gates, constants, ports and buses only
This allowed the generation of the structural VHDL description relatively easily. The next versions of this toolbox will allow different variable types and generate structural or behavioural VHDL wherever applicable. The conversion requires two passes. First, it looks through the whole design identifying the common blocks of the model, each of which will be described in a separate VHDL file. It distinguishes the sub-blocks of the model from the basic Simulink blocks. It also gathers information about the ports of each block and their types; this information is needed for creating "component" statements in the VHDL file. In the second pass, the algorithm looks recursively through the whole hierarchy of the model, from the top level down to the bottom, creating the structural description of each block found in the first pass. For each of them it finds the list of "blocks" and the list of "lines": the former are used to generate block instantiation and configuration commands, and the latter to define the internal signals. The entity definition is created from the information found in the first pass of the conversion.
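The two-pass flow just described can be sketched in heavily simplified, hypothetical Python (the real tool walks Simulink models and emits VHDL, but the control flow is the same): a first pass collects the distinct block types and their ports, as needed for the "component" declarations, and a second pass walks the hierarchy top-down emitting one instantiation per sub-block:

```python
def collect_components(block, library=None):
    """Pass 1: walk the model hierarchy and record every distinct
    block type together with its port information."""
    if library is None:
        library = {}
    # setdefault ensures repeated instances of a type are recorded once
    library.setdefault(block["type"], {"ports": block.get("ports", [])})
    for child in block.get("children", []):
        collect_components(child, library)
    return library

def emit_structural(block, indent=0):
    """Pass 2: emit a structural description recursively, top level
    down to the bottom (VHDL-like pseudo-output, one line per block)."""
    pad = "  " * indent
    out = [f"{pad}entity {block['type']}"]
    for child in block.get("children", []):
        out.extend(emit_structural(child, indent + 1))
    return out
```

In the real converter the first pass would also record signal lines so that internal signals and configuration commands can be generated; that bookkeeping is omitted here for brevity.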
2.5 Fixed-point polyphase half-band filter example
The idea of converting a Simulink design into VHDL has been tested on the example of a two-path two-coefficient polyphase filter [25]. The design was first captured using standard floating-point Simulink blocks. In order to make it close to the implementation, the results of additions were rounded-to-zero to 14 bits (Table 4-1), subtractions truncated to 14 bits (Table 4-2) and multiplications truncated to 18 bits (Table 4-3).
The local increase of wordlength at the multiplication was decided upon in order to avoid unnecessary loss of precision before the subsequent addition. All data was represented in two's complement arithmetic with 2 integer bits and a sign, which gives enough guard bits for the internal calculations, 14 bits altogether. Such a rounding scheme allowed elimination of the limit cycles while keeping the DC offset low. The floating-point version of the filter has been compared to the architectural one designed from standard gates (Figure 4-1).
The simulation used a two-phase non-overlapping clock, required by the delayers built from two D-type flip-flops per bit per unit delay. The flip-flops were active on the rising edge of the clock. The data was read at the rising edge of Clock1 and was available at the output at the rising edge of Clock2.
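This timing can be modelled with a short Python sketch (hypothetical helper name; one loop iteration per clock period). It shows why the master-slave arrangement with two clock phases gives exactly one sample of delay per stage: the output is frozen at the Clock1 edge, when a following stage would read it, so data cannot race through several stages in one period:

```python
def shift_register(samples, stages):
    """Chain of `stages` unit delayers, each built from a master and a
    slave flip-flop driven by a two-phase non-overlapping clock.  The
    output is recorded at the Clock1 edge (the moment the next stage
    reads it), before the masters and slaves update."""
    slaves = [0] * stages            # assume all flip-flops reset to 0
    out = []
    for d in samples:
        out.append(slaves[-1])       # what a following block reads at Clock1
        masters = [d] + slaves[:-1]  # Clock1: every master samples its input
        slaves = masters[:]          # Clock2: every slave copies its master
    return out
```

With one stage the output is the input delayed by one sample; with two stages, by two samples, exactly as z^-1 elements cascaded.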
The comparative simulation allows testing of the design both for an impulse and for the signal generated by a modulator. The results of the fixed-point behavioural and the fixed-point structural design versions matched bit for bit. The fixed-point structural system was designed to run from an external clock signal in order to be able to synchronise the filter with the input data for the ultimate physical implementation. The only blocks requiring the clock are the delayers; the rest are just combinational blocks for which the result is available a certain time after a change of the input. This time is called the propagation time. The maximum propagation time depends on the propagation time of the gates and the maximum number of dependent gates the signal has to pass through.
Figure 4-2 shows the inside of the fixed-point polyphase lowpass filter and Figure 4-3 shows the allpass structure used for both the UpperBranch and the LowerBranch blocks (the only difference being the multiplication factor). The floating-point design is similar to the fixed-point one; it differs in not having clock signals, since Simulink controls the simulation itself.
The 14-bit wide delayers in Figure 4-4 have been designed using two D-type flip-flops in a master-slave arrangement for each bit. The Mux and Demux blocks just convert the single bit lines into a vector of bits and back again; they were used for the purpose of the simulation only and are not required for implementation. The structure of the 14-bit adder with truncation is shown in Figure 4-5. The second input is negated before being added to the first input. As two's complement arithmetic is used, negation is achieved by inverting all the bits at the negated delayer output, Q!, and adding one using a ladder of two-bit adders with carry, shown in Figure 4-6. Assuming all gates have the same propagation delay, the time required to add two numbers was estimated to be
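The negation scheme (invert every bit, then add one through a ripple of carries) can be checked with a small Python model of 14-bit two's complement words, with bits stored LSB-first as the adder ladder would see them. The helper names are hypothetical:

```python
WIDTH = 14

def to_bits(value, width=WIDTH):
    # Two's complement representation as a list of bits, LSB first.
    # Python's arithmetic right shift makes this work for negatives too.
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    mag = sum(b << i for i, b in enumerate(bits))
    return mag - (1 << len(bits)) if bits[-1] else mag

def negate(bits):
    """Two's complement negation: invert every bit, then add one,
    propagating the carry through a ripple of one-bit additions
    (modelling the ladder of adders with carry)."""
    inverted = [1 - b for b in bits]
    carry, out = 1, []
    for b in inverted:
        s = b + carry
        out.append(s & 1)   # sum bit
        carry = s >> 1      # carry into the next stage
    return out              # final carry out is dropped (wrap-around)
```

Note the usual two's complement caveat: the most negative 14-bit value, -8192, has no representable negation and wraps back onto itself.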
The multiplication by 0.125 (Figure 4-7), required in the UpperBranch, effectively means shifting the data three bits towards the Least Significant Bit (LSB). In order to take care of negative numbers in two's complement arithmetic, the Most Significant Bit (MSB) is propagated into the next three bits (sign extension). The output is given in 18 bits without any loss of precision. Actually, 17 bits would be enough to provide full accuracy; however, an 18-bit sizing was chosen for consistency with the other multiplier, by the factor of 0.5625.
The multiplication by 0.5625 is more complicated (Figure 4-8), as it requires adding together two shifted versions of the input: by one bit (the 0.5 factor) and by four bits (the 0.0625 factor). The result is available after a maximum time of In this case 18 bits are required to provide the output at full accuracy.
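Both multipliers reduce to shifts and at most one addition. A Python sketch (hypothetical function names) models them on the raw integer words: appending four extra LSBs for the 18-bit output makes multiplying by 0.125 = 2/16 and by 0.5625 = 9/16 exact integer arithmetic, and 9x = (x << 3) + x exhibits the shift-and-add structure described above:

```python
def mul_0p125(x14):
    # x14 is a 14-bit two's complement integer (-8192..8191).
    # 0.125 = 2/16: with four extra fractional bits in the output,
    # the product is simply x << 1 (a 3-bit shift towards the LSB
    # followed by 4 bits of wordlength growth); Python's signed
    # integers make the sign extension implicit.
    return x14 << 1

def mul_0p5625(x14):
    # 0.5625 = 9/16 = (8 + 1)/16: the sum of the input shifted by one
    # bit (0.5) and by four bits (0.0625).  In units of the 18-bit
    # LSB this is (x << 3) + x; since 9 * 8191 < 2**17, the result
    # always fits in an 18-bit signed word with no loss of precision.
    return (x14 << 3) + x14
```

Dividing the raw results by 16 (the four extra fractional bits) recovers the exact products 0.125x and 0.5625x.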
The result of the multiplication by 0.125 or 0.5625 is added to the delayed samples of the input by the structure shown in Figure 4-10. Incorporated in the multiplier structure is a one-bit no-carry adder used to add the additional carry bit originating from the selected rounding scheme. Its implementation is shown in Figure 4-11.
The result of adding the 14-bit input to the 18-bit one is then constrained back to 14 bits using a round-to-zero scheme achieved with OR and AND gates. The four-port OR gate examines whether any 1s are set among the disregarded bits. If the MSB=0 (positive number), the output of the OR gate is disregarded, so the data is truncated. If the MSB=1 (negative number), the result of the OR gate is added to the output, rounding it up towards zero. The maximum propagation time of the block was estimated to be
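The round-to-zero constraint can be expressed compactly in Python (hypothetical function name; Python integers behave as arbitrarily wide two's complement words, so `&` extracts the discarded bits of negative values correctly). It discards four LSBs, as in the 18-bit to 14-bit reduction described above, and matches plain truncation toward zero:

```python
def round_to_zero(x18):
    """Constrain a wider word to 4 fewer bits, rounding towards zero
    as the OR/AND gate arrangement does: positive values are simply
    truncated, while negative values (MSB = 1) get a one added back
    whenever any of the discarded bits is set."""
    sticky = 1 if (x18 & 0xF) else 0   # four-input OR of discarded bits
    truncated = x18 >> 4               # arithmetic shift = floor(x/16)
    negative = 1 if x18 < 0 else 0     # the sign (MSB) of the input
    return truncated + (sticky & negative)
```

For positive inputs the floor and truncation agree, so the sticky bit is masked off; for negative inputs the floor overshoots downward by one exactly when a discarded bit was set, and the correction brings the result back up towards zero.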
The 14-bit addition with truncation of the result has been implemented as in Figure 4-12. No loss of precision happens here, as the format of the data takes care of the possible carry bit. The maximum propagation time was estimated to be The final and simplest block of the polyphase two-path IIR structure is the divider by two, implemented as a one-bit right shift (Figure 4-9). It is required at the output of the filter to scale the overall transfer function to unity at DC. The result is subsequently truncated to 14 bits by disregarding the LSB of the input data.
The conversion of the Simulink model description into VHDL was achieved using the Target Language Compiler approach. This avoided most problems associated with the previously used MDL-to-VHDL program [87]. The basic blocks, like D-type flip-flops with reset, standard logic gates and the two-phase clock generator (added to the custom library of Simulink blocks), could now be converted using their associated TLC files instead of being coded manually in VHDL. A top-level Simulink block converted to VHDL served as a test-bench file. It was used to compare the results of the VHDL simulation with the output from the fixed-point Simulink model. The complete output of a Simulink run was stored in a file comprising all bits of the input and the output. This file was read sample-by-sample and compared with the output of the VHDL simulation at each clock cycle. The compilation and simulation of the VHDL code, and subsequently its synthesis, was done with PeakFPGA from Accolade Design Automation Inc. [78], provided by the company for evaluation purposes. An example screen shot of the simulator software running the designed VHDL code is shown in Figure 4-13.
The test bench may be converted from Simulink, but it is better to create a new one which compares the results from Simulink with the results of the VHDL simulation, exactly as was done for the two-coefficient example design. The VHDL simulation differs from the high-level Simulink one in that:
- Proper care has to be taken to avoid unassigned states by resetting the design before it starts operating. This was achieved by setting CLR_L to zero at the very beginning of the simulation, before the first rising edge of Clock1, i.e. the first reading of the data from the input.
- The propagation time through the blocks of the design plays an important role. This parameter is not considered in Simulink at all.
The VHDL simulation allowed assessment of the maximum speed of operation of the design; the resulting clock period was approximately four times the maximum settling time of the combinational logic. For the simulation presented here the clocking speed was set to 2.5MHz, assuming a 2ns propagation time for the logic gates.
DSP System Design
The simulation in Simulink required only that the two clock phases were non-overlapping signals. The VHDL simulation showed that the best performance (highest speed of operation) was achieved when the non-overlap time was a quarter of the clock period. The VHDL code designed for the example single-stage polyphase filter was subjected to synthesis in both PeakFPGA version 4.25 and Galileo for Xilinx from Mentor [79]. The former returned the same result for all design families, including Actel, Altera, Lattice, Lucent and QuickLogic EDIF devices. It turned out that only 252 flip-flops and 790 two-input gates were required to implement the design (excluding the clock generator).
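The two-phase clocking scheme can be illustrated with a small sketch. This is a hypothetical Python model, not the book's Simulink or VHDL clock generator; it produces two phases whose dead (non-overlap) time equals a quarter of the clock period, the arrangement reported above as giving the best performance.

```python
def two_phase_clock(period: int, samples: int):
    """Generate two non-overlapping clock phases.
    Each phase is high for one quarter of the period:
    phase 1 in the first quarter, phase 2 in the third,
    so the dead time between them is also a quarter period."""
    phi1, phi2 = [], []
    q = period // 4
    for t in range(samples):
        ph = t % period
        phi1.append(1 if ph < q else 0)               # high in first quarter
        phi2.append(1 if 2 * q <= ph < 3 * q else 0)  # high in third quarter
    return phi1, phi2
```

By construction the two phases are never high simultaneously, satisfying the non-overlap requirement the Simulink model imposed.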
The results of the synthesis by Galileo for Xilinx differed for each device family. These results are presented in Table 5.1. Galileo calculated that only 168 flip-flops were needed for the delayers. The difference was in the number of gates required, between 185 and 1018 depending on the technology. Galileo, in contrast to PeakFPGA, also gave an estimated input-to-output delay of between 86ns and 245ns, depending on the technology used. The maximum clock frequency of the filter may therefore range from 1.1MHz for Xilinx-5200 and 1.4MHz for Xilinx-3000 up to 2.9MHz for Xilinx-3100. It depends on the propagation delay of the combinational logic, which is at most 4.5ns for Xilinx-5200, 3ns for Xilinx-3000 and 1.5ns for Xilinx-3100. The propagation delay for Xilinx-9500XL is 4-6ns. The sequential delay of a flip-flop is at most 6ns, and all flip-flops work in parallel. Therefore the preferable technology for implementing the filter would be Xilinx-3100A, giving the best speed of operation at low cost and optimum use of the FPGA. Assuming a technology with a given transistor size, with each gate consisting of four transistors and each flip-flop of eight, the estimated total size of the components of the design would be approximately 0.2mm by 0.2mm, plus a few percent for the connections.
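As a back-of-envelope check, the transistor count implied by the PeakFPGA synthesis figures can be tallied as below. The area-per-transistor value is a placeholder assumption (it depends on the feature size, which is not stated here), so the printed area is illustrative only.

```python
# Transistor tally from the PeakFPGA synthesis result quoted in the text.
flip_flops = 252          # flip-flops reported by PeakFPGA
gates = 790               # two-input gates reported by PeakFPGA
TRANSISTORS_PER_GATE = 4  # assumption stated in the text
TRANSISTORS_PER_FF = 8    # assumption stated in the text

total_transistors = gates * TRANSISTORS_PER_GATE + flip_flops * TRANSISTORS_PER_FF
print(total_transistors)  # 5176 transistors in total

# Hypothetical, technology-dependent area per transistor (placeholder value):
area_per_transistor_mm2 = 7e-6
core_area_mm2 = total_transistors * area_per_transistor_mm2
print(round(core_area_mm2, 3))  # well under the quoted 0.2 mm x 0.2 mm budget
```

The point of the tally is that a few thousand transistors comfortably fit the quoted 0.2mm-square footprint in any contemporary process.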
3. SUMMARY
The specimen filter that has been designed could comfortably be used for the first four stages of the decimation filter described in [25]. Even considering that the design has to be repeated eight times, the total required silicon area of 0.5mm by 0.5mm is very small. Putting the hardwired filters together would avoid the need for fast processors, leaving more space for the analogue part of the A/D converter, and hopefully for the whole modulator. The small size is a big advantage, as it frees up silicon real estate for the implementation of other functions. The example design of the polyphase filter and its conversion into VHDL proved that such an approach is a very attractive way of designing test chips quickly: it took three days to get from the Simulink model to its final synthesised version. The next stage of the research work would be either to compile the design to a custom layout and put it onto silicon, or to commit it to a standard FPGA. The current version of the program performs only direct mapping of structures from Simulink to VHDL and does not work for multiplexed architectures. To perform such a conversion, the program requires an algorithm that analyses behavioural or structural descriptions to find common operators and converts them into a multiplexed structure with added control circuitry. This is the aim of the ongoing work.
Appendix A
SINGLE COEFFICIENT ALLPASS SECTION
A basic single-coefficient allpass section is described here. It is the basic building block of the polyphase recursive IIR filter structures described in this book. The commonly used structures of such a filter are shown in Figure A-1:
Structures (a) and (d) require less integrated-circuit area than the other two due to the smaller number of mathematical operations required: one multiplier, two adders and a small amount of memory are all they need. Structure (d) is particularly useful for cascading allpass sections, as the delayers can be shared by successive sections. The transfer function of this basic building block is given by:
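The book's equation did not survive extraction here. For reference, the standard single-coefficient allpass section (the form used throughout the polyphase IIR literature this book builds on) is given below, together with the two-path polyphase variant obtained by replacing z⁻¹ with z⁻²; the author's exact notation may differ.

```latex
A(z) = \frac{\alpha + z^{-1}}{1 + \alpha z^{-1}},
\qquad
A_{2}(z) = \frac{\alpha + z^{-2}}{1 + \alpha z^{-2}}
```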
The basic allpass section impulse response:
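The equation itself is missing from the extracted text. For the standard first-order form A(z) = (α + z⁻¹)/(1 + α z⁻¹), the well-known impulse response, obtained by long division of the transfer function, is:

```latex
h(n) =
\begin{cases}
\alpha, & n = 0,\\[2pt]
(1-\alpha^{2})\,(-\alpha)^{\,n-1}, & n \ge 1.
\end{cases}
```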
The phase response:
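The formula is missing here. For the standard first-order section A(z) = (α + z⁻¹)/(1 + α z⁻¹), the phase response follows from writing the frequency response as e^{-jω} times a ratio of complex conjugates:

```latex
\varphi(\omega) = -\,\omega
 + 2\arctan\!\left(\frac{\alpha\sin\omega}{1+\alpha\cos\omega}\right)
```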
The group delay response:
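The formula is missing here. Differentiating the phase response of the standard first-order section A(z) = (α + z⁻¹)/(1 + α z⁻¹) gives the well-known group delay, which is positive at all frequencies for a stable section with |α| < 1:

```latex
\tau(\omega) = -\frac{d\varphi(\omega)}{d\omega}
 = \frac{1-\alpha^{2}}{1+2\alpha\cos\omega+\alpha^{2}}
```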
The step response, obtained by convolution in the time domain:
Total energy:
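The expression is missing from the extracted text. One property that can be stated with certainty: because the section is allpass, |A(e^{jω})| = 1 at every frequency, and Parseval's theorem fixes the total energy of the impulse response at unity:

```latex
E = \sum_{n=0}^{\infty} h^{2}(n)
  = \frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl|A(e^{j\omega})\bigr|^{2}\,d\omega = 1
```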
Impulse response centre of gravity:
Average time delay:
References
1. harris, f., “On the design and performance of efficient and novel filter structures using recursive allpass filters”, IEEE 3rd International Symposium on Signal Processing and its Applications (ISSPA’92), Volume: 1, Page(s): 1-5, Gold Coast, Queensland, Australia, 16-21 August 1992. 2. Kale, I, R. C. S. Morling, A. Krukowski and D. A. Devine, “A high fidelity decimation filter for Sigma-Delta A/D converters”, IEE Second International Conference on Advanced A-D and D-A Conversion Techniques and their Applications (ADDA’94), No: 393, Page(s): 30-35, Cambridge, United Kingdom, 6-8 July 1994. 3. Krukowski, A., I. Kale, K. Hejn and G. D. Cain, “A bit-flipping approach to multistage two-path decimation filter design”, Second International Symposium on DSP for Communications Systems (SPRI’94), Adelaide, Australia, 26-29 April 1994. 4. harris, f., M. d’Oreye de Lantremange and A. G. Constantinides, “Digital signal processing with efficient polyphase recursive all-pass filters”, IEEE International Conference on Signal Processing, Florence, Italy, 4-6 September 1991. 5. Valenzuela, R. A. and A. G. Constantinides, “Digital signal processing schemes for efficient interpolation and decimation”, IEE Proceedings, Volume: 130, Part: G, No: 6, Page(s): 225-235, December 1983. 6. Korn, T. M. and G. A. Korn, Mathematical handbook for scientists and engineers: Definitions, theorems and formulas for reference and review, Dover Publications; ISBN: 0486411478, February 2000. 7. Kale, I., A. Krukowski and N. P. Murphy, “On achieving micro-dB ripple polyphase filters with binary scaled coefficients”, Second International Symposium on DSP for Communications Systems (SPRI’94), Adelaide, Australia, 26-29 April 1994. 8. Hejn, K. and A. 
Krukowski, “Insight into a digital sensor for sigma-delta modulator investigation”, IEEE Instrumentation and Measurement Technology Conference (IMTC’94), Proceedings: Advanced Technologies in Instrumentation and Measurement, Volume: 2, Page(s): 660-663, Hammamatsu, Shizuoka, Japan, 10-12 May 1994. 9. Kale, I., N. P. Murphy and M. V. Patel, “On establishing the bounds for binary scaled coefficients of fifth and seventh order polyphase half-band filters”, IEEE International Symposium on Circuits and Systems (ISCAS’94), Volume: 2, Page(s): 473-476, London, United Kingdom, 30 May - 2 June 1994.
10. Murphy, N. P., A. Krukowski and I. Kale, “Implementation of wideband integer and fractional delay element”, Electronics Letters, Volume: 30, No: 20, Page(s): 1658-1659, 29 September 1994. 11. Kale, I. and R. C. S. Morling, “High resolution data conversion via sigma-delta modulators and polyphase filters: a review”, Proceedings Measurement - Journal of the IMEKO, Elsevier Science Publisher, Volume: 19, No: 3/4, Page(s): 159-168, 1996. 12. Sheingold, H. D., Analog-digital conversion handbook, Analog Devices Inc., Prentice-Hall, New York (USA), ISBN: 0130328480, July 1997. 13. Bennett, W. R., “Spectra of quantized signals”, Bell Systems Technical Journal, Volume: 27, Page(s): 446-472, July 1948. 14. Aziz, P. M., H. V. Sorensen and J. Van Der Spiegel, “An overview of sigma-delta converters: How 1-bit ADC achieves more than 16-bit resolution”, IEEE Signal Processing Magazine, Page(s): 61-84, January 1996. 15. Boser, B. E., “Design and implementation of oversampled Analog-to-Digital converters”, PhD Thesis (Contract: 88-DJ-112), Stanford University, California, USA, October 1988. 16. Hejn, K., N. P. Murphy and I. Kale, “Measurement and enhancement of multistage sigma-delta modulators”, IEEE Instrumentation and Measurement Technology Conference (IMTC’92), Page(s): 545-551, New York, USA, 12-14 May 1992. 17. Chu, S. and C. S. Burrus, “Multirate filter designs using comb filters”, IEEE Transactions on Circuits and Systems, Volume: 31, No: 11, Page(s): 913-924, November 1984. 18. Krukowski, A., “Decimation filter design for oversampled A/D converters”, MSc Project Report in DSP Systems, University of Westminster, London, United Kingdom, 1993. 19. Dijkstra, E., O. Nys, C. Piguet and M. Degrauwe, “On the use of modulo arithmetic COMB filters in sigma-delta modulators”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’88), Volume: 4, Page(s): 2001-2004, New York, USA, 11-14 April 1988. 20. Park, S. and W.
Chen, “Multi-stage IIR decimation filter design technique for high resolution sigma-delta A/D converters”, IEEE Instrumentation and Measurement Technology Conference (IMTC’92), Page(s): 561-566, New York, USA, 12-14 May 1992. 21. Dijkstra, E., L. Cardoletti, O. Nys, C. Piguet and M. Degrauwe “Wave digital decimation filters in oversampled A/D converters”, IEEE International Symposium on Circuits and Systems (ISCAS’88), Volume: 3, Page(s): 2327-2330, Espoo, Finland, 7-9 June 1988. 22. Kale, I., R. C. S. Morling, A. Krukowski and D. A. Devine, “Architectural design simulation and silicon implementation of a very high fidelity decimation filter for SigmaDelta data converters”, IEEE Instrumentation and Measurement Technology Conference (IMTC’94), Proceedings: Advanced Technologies in Instrumentation and Measurement, Volume: 2, Page(s): 878-881, Hammamatsu, Shizuoka, Japan, 10-12 May 1994. 23. Krukowski A., R. C. S. Morling and I. Kale, “Quantization effects in the polyphase N-path IIR structure”, IEEE Instrumentation and Measurement Technology Conference (IMTC’2001), Volume: 2, Page(s): 1382-1385, Budapest, Hungary, 21-23 May 2001. 24. Kale, I., R. C. S. Morling and A. Krukowski, “The design, simulation and silicon implementation of a very high fidelity 24-bit potential decimation filter for sigma-delta A/D converters”, Fourth Cost#229 Workshop on Adaptive Methods and Emergent Techniques for Signal Processing and Communications, Page(s): 155-161, Ljubljana, Slovenia, 5-7 April 1994. 25. Kale, I., R. C. S. Morling and A. Krukowski, “A high-fidelity decimator chip for the measurement of sigma-delta modulator performance”, IEEE Transactions on Instrumentation and Measurement, Volume: 44, No: 5, October 1995.
26. Jantzi, S., R. Schreier and W. M. Snelgrove, “Bandpass sigma-delta analog-to-digital conversion”, IEEE Transactions on Circuits and Systems, Volume: 38, No: 11, Page(s): 1406-1409, November 1991. 27. Krukowski, A. and I. Kale, “Constrained coefficient variable cut-off polyphase decimation filters for band-pass Sigma-Delta data conversion”, IMEKO Workshop on ADC Modeling, Page(s): 85-90, Smolenice Castle, Slovak Republic, 7-9 May 1996. 28. Schreier, R. and W. M. Snelgrove, “Decimation for bandpass sigma-delta analogue-to-digital conversion”, IEEE International Symposium on Circuits and Systems (ISCAS’90), Volume: 3, Page(s): 1801-1804, New Orleans, USA, 1-3 May 1990. 29. Krukowski, A., I. Kale, K. Hejn and R. C. S. Morling, “A design technique for polyphase decimators with binary constrained coefficients for high resolution A/D converters”, IEEE International Symposium on Circuits and Systems (ISCAS’94), Volume: 2, Page(s): 533-536, London, United Kingdom, 30 May - 2 June 1994. 30. Constantinides, A. G., “Spectral transformations for digital filters”, IEE Proceedings, Volume: 117, No: 8, Page(s): 1585-1590, August 1970. 31. Mahoney, M., DSP-based testing of analogue and mixed-signal circuits, Wiley - IEEE Computer Society Press, ISBN: 0-8186-0785-8, April 1987. 32. Krukowski, A. and I. Kale, “The design of arbitrary-band multi-path polyphase IIR filters”, IEEE International Symposium on Circuits and Systems (ISCAS’2001), Volume: 2, Page(s): 741-744, Sydney, Australia, 6-9 May 2001. 33. Curtis, T. E. and A. B. Webb, “High performance signal acquisition systems for sonar applications”, IEE International Conference on Analogue to Digital and Digital to Analogue Conversion, IEE Conference Publication No: 343, Page(s): 87-94, Swansea, United Kingdom, 17-19 September 1991. 34.
Lawson, S., “On design techniques for approximately linear phase recursive digital filters”, IEEE International Symposium on Circuits and Systems (ISCAS’97), Volume: 4, Page(s): 2212-2215, 9-12 June 1997. 35. Lu, W. S., “Design of stable IIR digital filters with equiripple passbands and peakconstrained least squares stopbands”, IEEE International Symposium on Circuits and Systems (ISCAS’97), Volume: 4, Page(s): 2192-2195, 9-12 June 1997. 36. Lawson, S. S., “Direct approach to design of PCAS filters with combined gain and phase specification”, IEEE Proceedings on Vision, Image and Signal Processing, Volume: 141, No: 3, Page(s): 161-167, June 1994. 37. Constantinides, A. G., “Frequency transformations for digital filters”, Electronics Letters, Volume: 3, No: 11, Page(s): 487-489, November 1967. 38. Constantinides, A. G., “Design of bandpass digital filters”, Proceedings of IEEE, Volume: 1, No: 1, Page(s): 1129-1231, June 1969. 39. Constantinides, A. G., “Frequency transformations for digital filters”, Electronics Letters, Volume: 4, No: 7, Page(s): 115-116, April 1968. 40. Broome, P., “A frequency transformation of numerical filters”, Proceedings of IEEE, Volume: 52, Page(s): 326-327, February 1966. 41. Hazra, S. N. and S. C. Dutta Roy, “A simple modification of Broome’s transformation for linear-phase FIR filters”, Proceedings of IEEE, Volume: 74, No: 1, Page(s): 227-228, January 1986. 42. Cain, G. D., A. Krukowski and I. Kale, “High order transformations for flexible IIR filter design”, VII European Signal Processing Conference (EUSIPCO’94), Volume: 3, Page(s): 1582-1585, Edinburgh, Scotland, 13-16 September 1994.
43. Krukowski, A., G. D. Cain and I. Kale, “Custom designed high-order frequency transformations for IIR filters”, IEEE 38th Midwest Symposium on Circuits and Systems (MWSCAS’95), Volume: 1, Page(s): 588-591, Rio de Janeiro, Brazil, 13-16 August 1995. 44. Nowrouzian, B. and A. G. Constantinides, “Prototype reference transfer function parameters in the discrete-time frequency transformations”, IEEE 33rd Midwest Symposium on Circuits and Systems (MWCAS’90), Volume: 2, Page(s): 1078-1082, Calgary, Canada, 12-14 August 1990. 45. Franchitti, J. C., “Allpass filter interpolation and frequency transformation problems”, MSc Thesis, Electrical and Computer Engineering Department, University of Colorado, 1985. 46. Feyh, G., J. C. Franchitti and C. T. Mullis, “All-pass filter interpolation and frequency transformation problem”, 20th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, California, USA, Page(s): 164-168, 10-12 November 1986. 47. Mullis, C. T. and R. A. Roberts, Digital Signal Processing, Section 6.7, Addison-Wesley Publication Company, ISBN: 0201163500, 1 February 1987. 48. Feyh, G., W. B. Jones and C. T. Mullis, “An extension of the Schur algorithm for frequency transformations”, Linear Circuits, Systems and Signal Processing: Theory and Application, Editors: C.I. Byrnes, C.F. Martin and R.E. Saeks, New York: North Holland, ISBN: 0444704957, October 1988. 49. Schuessler, H. W., “Implementation of variable digital filters”, First European Signal Processing Conference (EUSIPCO’80), Signal Processing: Theories and Applications, Page(s): 123-129, Lausanne, Switzerland, September 1980. 50. Jarske, P., S. K. Mitra and Y. Neuvo, “Signal processor implementation of variable digital filters”, IEEE Transactions Instrumentation and Measurement, Volume: 37, No: 3, Page(s): 363-367, September 1988. 51. Krukowski, A., I. Kale and R.C.S. 
Morling, “The design of polyphase-based IIR multiband filters”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’97), Volume: 3, Page(s): 2213-2216, Munich, 21-24 April 1997. 52. Saghizadeh, P. and A. N. Wilson, Jr., “A genetic approach to the design of M-channel uniform-band perfect-reconstruction linear-phase FIR filter banks”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’95), Volume: 2, Page(s): 1300-1303, Detroit, USA, 8-12 May 1995. 53. Friedlander, B. and B. Porat, “The modified Yule-Walker method of ARMA spectral estimation”, IEEE Transactions on Aerospace Electronic Systems, Volume: 20, No: 2, Page(s): 158-173, March 1984. 54. Mitra, S. K. and K. Hirano, “Digital allpass networks”, IEEE Transactions on Circuits and Systems, Volume: 21, No: 5, Page(s): 688-700, September 1974. 55. Saramaki, T., “On the design of digital filters as a sum of two allpass filters”, IEEE Transactions on Circuits and Systems, Volume: 32, No: 11, November 1985. 56. Saramaki, T., Tian-Hu Yu and S. K. Mitra, “Very low sensitivity realization of IIR digital filters using a cascade of complex all-pass structures”, IEEE Transactions on Circuits and Systems, Volume: 34, No: 8, Page(s): 876-886, August 1987. 57. Krukowski, A., I. Kale and G. D. Cain, “Decomposition of IIR transfer functions into parallel, arbitrary-order IIR subfilters”, Nordic Signal Processing Symposium (NORSIG’96), Espoo, Finland, 24-27 September 1996. 58. Kale, I., J. Gryka, G.D. Cain and B. Beliczynski, “FIR filter order reduction: balanced model truncation and Hankel-norm optimal approximation”, Proceedings IEE on Vision, Image and Signal Processing, Volume: 141, No: 3, Page(s): 168-174, June 1994. 59. Numerical Recipes in C++, 2nd Edition, Cambridge University Press, Cambridge, MA 02238 (USA), ISBN: 0-521-75033-4, 2002.
60. Nelder, J. A. and R. Mead, “A simplex method for function minimization”, Computer Journal, Volume: 7, Page(s): 308-313, 1965. 61. Samueli, H., “An improved search algorithm for the design of multiplierless FIR filters with powers-of-two coefficients”, IEEE Transactions on Circuits and Systems, Volume: 36, No: 7, Page(s): 1044-1047, July 1989. 62. Hwang, A., Computer Arithmetic, Principles, Architecture and Design, John Wiley & Sons, New York (USA), ASIN: 0471034967, January 1979. 63. Psoloinis, P. C., “VHDL to silicon implementation of a high resolution decimation filter”, BEng Honors Project Report, University of Westminster, London, United Kingdom, 1995. 64. Murphy, N. P., A. Krukowski and A. Tarczynski, “An efficient fractional sample delayor for digital beam steering”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’97), Volume: 3, Page(s): 2245-2248, Munich, Germany, 21-24 April 1997. 65. Liu, G. S. and C. H. Wei, “A new variable fractional sample delay filter with nonlinear interpolation”, IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Volume: 39, No: 2, Page(s): 123-126, February 1992. 66. Elliot, Douglas F., Handbook of Digital Signal Processing - Engineering Applications, Chapter 3, Academic Press, New York (USA), ISBN: 0122370759, November 1997. 67. Cain, G. D., N. P. Murphy and A. Tarczynski, “Evaluation of several variable FIR fractional-sample delay filters”, IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP’94), Volume: 3, Page(s): 621-624, Adelaide, April 1994. 68. Skolnik, M. I., Introduction to Radar Systems, McGraw-Hill Companies, Edition, ISBN: 0072909803, August 2000. 69. Monzingo, R. A. and T. W. Miller, Introduction to Adaptive Arrays, Wiley John & Sons Inc., New York (USA), ISBN:0471057444 , September 1980. 70. 
Kellermann, W., “A self-steering digital microphone array”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’91), Volume: 5, Page(s): 3581-3584, Toronto, Canada, 14-17 April 1991. 71. Krukowski, A., R. C. S. Morling and I. Kale, “Quantization effects in the polyphase N-path IIR structure”, IEEE Transactions on Instrumentation and Measurement, Volume: 51, No: 5, Page(s): 1271-1278, December 2002. 72. Vaidyanathan, P. P., Multirate Systems and Filter Banks, Prentice Hall PTR (USA), ISBN: 0136057187, 1st Edition, 21 September 1992. 73. Murphy, N. P., A. Tarczynski and T. I. Laakso, “Sampling-rate conversion using a wideband tunable fractional delay element”, Nordic Signal Processing Symposium (NORSIG’96), Page(s): 423-426, Espoo, Finland, 24-27 September 1996. 74. Välimäki, V., Fractional Delay Waveguide Modeling of Acoustic Tubes, Report No: 34, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Espoo, Finland, July 1994. 75. Farrow, C. W., “A continuously variable digital delay element”, IEEE International Symposium on Circuits and Systems (ISCAS’88), Volume: 3, Page(s): 2641-2645, Espoo, Finland, 7-9 June 1988. 76. Ashenden, P. J., The Designer’s Guide to VHDL, Morgan Kaufmann Publishers, San Francisco (USA), ISBN: 1-55860-270-4, 1995. 77. Holmes, C., VHDL Language Course, Rutherford Appleton Laboratory, Microelectronics Support Centre, Chilton, Didcot, United Kingdom, 23-25 May 1995. 78. Krukowski, A. and I. Kale, “Constraint two-path polyphase IIR filter design using downhill simplex algorithm”, IEEE International Symposium on Circuits and Systems (ISCAS’2001), Volume: 2, Page(s): 749-752, Sydney, Australia, 6-9 May 2001.
79. Morgan, D. and C. Thi, “A delayless subband adaptive filter architecture”, IEEE Transactions on Signal Processing, Volume: 43, No: 8, 1995. 80. Chen, J. D., H. Bes, J. Vandewalle et al., “A zero-delay FFT-based subband acoustic echo canceller for teleconferencing and hands-free telephone systems”, IEEE Transactions on Circuits and Systems II, Volume: 43, No: 10, Page(s): 713(7), 1996. 81. Sondhi, M. M. and W. Kellerman, “Adaptive echo cancellation for speech signals”, Advances in Speech Signal Processing, S. Furui and M. M. Sondhi, Eds, NY: M. Dekker, Ch. 1, 1992. 82. Gerald, J., N. L. Esteves and M. M. Silva, “A new IIR echo canceller structure”, IEEE Transactions on Circuits and Systems II, Volume: 42, No: 12, Page(s): 818-821, 1995. 83. Naylor, P. A., O. and A. G. Constantinides, “Subband adaptive filtering for acoustic echo control using allpass polyphase IIR filter banks”, IEEE Transactions on Signal Processing, Volume: 6, No: 2, 1998. 85. Vaidyanathan, P. P., “Multirate digital filters, filter banks, polyphase networks, and applications: A tutorial”, IEEE Proceedings, Volume: 78, No: 1, 1990. 86. Valimaki, V., Discrete-Time Modeling of Acoustic Tubes Using Fractional Delay Filters, PhD Thesis, Helsinki University of Technology, Finland, ISBN: 951-22-2880-7, TKK Offset, December 1995. 87. Krukowski, A. and I. Kale, “Simulink/Matlab-to-VHDL Route for Full-Custom/FPGA Rapid Prototyping of DSP Algorithms”, MATLAB DSP Conference 1999, Dipoli Conference Centre, Espoo, Finland, November 16-17, 1999.
Index

Additive Noise Model 48
AGDF See Arbitrary Group Delay Filter
Allpass Section
  average time delay 236
  centre of gravity 236
  description 233–38
  group delay response 238
  impulse response 234
  phase response 236
  phase response properties 5
  step response 235
  structures 234
  total energy 236
ALU See Arithmetic-Logic Unit
Amoeba See Downhill Simplex Method
Arbitrary Group Delay Filter xii, 204
Arithmetic-Logic Unit 161
Balanced Model Truncation 41
Bandpass Oversampling Ratio 66
Bit-Flipping 16, 56, 60, 69, 78, 167, 171, 173
BMT See Balanced Model Truncation
Canonical Signed-Digit Code 174
CDS See Downhill Simplex Method
Compensation Filter 23, 67
Constantinides 8, 18, 96, 105, 114, 147
Constrained Downhill Simplex 16
CSDC See Canonical Signed-Digit Code
DC Mobility 107, 124
Difference-Multiply-Accumulate 179
Digital Audio 51
Direct Decomposition Method 36
DMAC See Difference-Multiply-Accumulate
Dolph-Chebyshev Window 203
Downhill Simplex Method
  basic moves 170
  constrained 170
  overview 169
DPRAM See Dual-Port RAM
Dual-Port RAM 179
Dynamic Range 54, 55, 89, 95, 189
Elliptic Filter 8, 41, 60, 114
Equivalent Lowpass Magnitude Response 81
Fourier Summation Transforms 197
Frequency Transformation
  lowpass-to-lowpass 28
  mapping function 30
FST See Fourier Summation Transforms
Harris 8, 18
Hilbert Transformation 115
Hybrid Amoeba See Downhill Simplex Method
Interpolation Filter 63, 72, 73, 74, 76, 201
Inverse Hilbert Transformation 116
Lagrangian Interpolation Filter 203
Least Significant Bit 163, 226
LMS 95, 96, 98
LSB 163
MAVR 21, See Moving Average Filter
Morgan 100, 102, 104, 243, 244
Moving Average Filter 20
Mullis 96, 109, 147, 242
Multiband Filter 27, 129, 154, 155, 157
Multiplexer 200
Multistage Decimation 56
Natural Binary Code 174
NBC See Natural Binary Code
Noise Shaping Function 191
NSF See Noise Shaping Function
Nyquist Converter 45, 46, 47, 49
Nyquist Mobility 107, 124
Oversampling Ratio 48, 51, 54, 55, 56, 57, 68, 80
Partial Fraction Expansion 44
PCM See Pulse Code Modulation
PDM See Pulse Density Modulation
PeakFPGA 229
Perfect Reconstruction 97, 104, 148
PFE See Partial Fraction Expansion
Phase Linearity 6, 79, 82, 83, 202
Polyphase IIR Filter
  group delay 7
  N-path structure 20
  passband and stopband ripples 5
  phase response 2
  pole-zero plots for 2 coefficients 14
  pole-zero plots for 3 coefficients 19
  structure 1
  two-path halfband 3
Power Spectral Density 49, 191
Predictive Modulators 51
PSD See Power Spectral Density
Pulse Code Modulation 46
Pulse Density Modulation 49
Quadrature Mirror Filter Banks 115
Quantization Noise 48, 53
Quantization Schemes 185
  convergent rounding 187
  rounding 186
  rounding to infinity 186
  rounding to zero 186
  truncating 186
Real-Time Workshop 216, 218, 219
Reconstruction Error 41, 97, 101
Rotation Factor 106, 113, 130
RTW See Real-Time Workshop
Sample Rate Decreaser 97, 176
Sample Rate Increaser 200
Sigma-Delta Modulator 3, 45, 49, 63, 239, 240, 241
Signed Binary Code 13
Simplex 169, 170, 171, 243, 244
Simulink 214
  MDL 215
SRD See Sample Rate Decreaser
Stability-Forced Transformation 140, 141
Subband Adaptive Echo Cancellation 94
Successive-Separation Method 34, 37
Target Language Compiler 216
  code generation 219
  diagram 217
  inlining S-functions 219
TLC See Target Language Compiler
Transformation Function 29, 30
Unsigned Binary Code 13
VHDL 211
VLSI 45, 47, 49, 51
Yulewalk 158, 159
Zero Insertion Interpolation 200
ZII See Zero Insertion Interpolation
See Sigma-Delta Modulator