Table of Contents

Preface

Part I: Structured Mixed-Mode Design
Introduction
Structured Oscillator Design (C. Verhoeven and A. van Staveren)
Systematic Design of High-frequency gm-C Filters (E. Lauwers and G. Gielen)
Structured LNA Design (E. H. Nordholt)
High-level Simulation and Modeling Tools for Mixed-Signal Front-ends of Wireless Systems (P. Wambacq, G. Vandersteen, P. Dobrovolny, M. Goffioul, W. Eberle, M. Badaroglu and S. Donnay)
Structured Simulation-Based Analog Design Synthesis (R. A. Rutenbar)
Structured Analog Layout Design (K. Lampaert)

Part II: Multi-Bit Sigma Delta Converters
Introduction
Architecture Considerations for Multi-Bit Sigma-Delta ADCs (T. Brooks)
Multirate Sigma-Delta Modulators, an Alternative to Multibit (F. Colodro and A. Torralba)
Circuit Design Aspects of Multi-Bit Delta-Sigma Converters (Y. Geerts, M. Steyaert and W. Sansen)
High-speed Digital to Analog Converter Issues with Applications to Sigma Delta Modulators (K. Doris, D. Leenaerts and A. van Roermund)
Correction-Free Multi-Bit Sigma-Delta Modulators for ADSL (R. del Rio, F. Medeiro, J.M. de la Rosa, B. Pérez-Verdú and A. Rodríguez-Vázquez)
Sigma Delta Converters in Wireline Communications (A. Wiesbauer, J. Hauptmann and P. Laaser)

Part III: Short Range RF Circuits
Introduction
RF Circuits in DECT and Bluetooth (P.T.M. van Zeijl)
Wireless LANs (J. Glas, M. Banu, J. Hammerschmidt, V. Prodanov and P. Kiss)
Design of Wireless LAN Circuits in RF-CMOS (D. Leenaerts and N. Pavlovic)
A Fully Integrated Single-Chip Bluetooth Transceiver (J. Craninckx)
Continuous-time Quadrature Modulator Receivers (P. Vancorenland, P. Coppejans and M. Steyaert)
Low Power RF Receiver for Wireless Hearing Aid (A. Deiss and Q. Huang)
Preface

This book contains the contributions of 18 tutorials of the 11th workshop on Advances in Analog Circuit Design. Each part discusses a specific up-to-date topic on new and valuable design ideas in the area of analog circuit design. Each part is presented by six experts in that field, who share and review state-of-the-art information. This book is number 11 in this successful series on Analog Circuit Design, providing valuable information and excellent overviews of analog circuit design, CAD and RF systems. These books can be seen as a reference for those involved in analog and mixed-signal design.

This year's workshop was held in Spa, Belgium and organized by E. Janssens of Alcatel Microelectronics, Belgium. The program committee consisted of M. Steyaert, KULeuven, Belgium, J.H. Huijsing, T.U. Delft, The Netherlands and A. van Roermund, T.U. Eindhoven, The Netherlands.

The topics of 2002 Spa (B) are:
Structured Mixed-Mode Design
Multi-Bit Sigma Delta Converters
Short Range RF Circuits

The other topics covered before in this series:
1992 Scheveningen (NL): Opamps, ADC, Analog CAD
1993 Leuven (B): Mixed-mode A/D design, Sensor interfaces, Communication circuits
1994 Eindhoven (NL): Low-power low-voltage, Integrated filters, Smart power
1995 Villach (A): Low-noise/power/voltage, Mixed-mode with CAD tools, Volt., curr. & time references
1996 Lausanne (CH): RF CMOS circuit design, Bandpass SD & other data conv., Translinear circuits
1997 Como (I): RF A/D Converters, Sensor & Actuator interfaces, Low-noise osc., PLLs & synth.
1998 Copenhagen (DK): 1-volt electronics, Design mixed-mode systems, LNAs & RF power amps telecom
1999 Nice (F): XDSL and other comm. systems, RF-MOST models and behav. m., Integrated filters and oscillators
2000 Munich (D): High-speed A/D converters, Mixed-signal design, PLLs and synthesizers
2001 Noordwijk (NL): Scalable analog circuits, High-speed D/A converters, RF power amplifiers

I sincerely hope that this series provides valuable contributions to our Analog Circuit Design community.

Michiel Steyaert
Part I: Structured Mixed-Mode Design

Johan H. Huijsing

In order to shorten the design time of integrated circuits, industry has a general need to automate the design flow. Although this has been successful for the digital VLSI design flow, automation is still in its infancy for the analog and mixed-mode design flow. The six papers in the first part of this collection of AACD 2002 papers describe approaches towards a more structured and automated design of analog and mixed-mode circuits. The first three papers cover the structured design of three basic analog building blocks: oscillators, gm-C filters and LNAs. The fourth paper describes tools for high-level simulation and modeling of mixed-signal front-ends for wireless systems. The fifth paper discusses structured simulation-based analog design synthesis. The sixth paper presents a structured design approach for mixed-mode layout.
Structured Oscillator Design

Chris Verhoeven, Arie van Staveren
TU Delft, ITS, Mekelweg 4, 2628 CD Delft, The Netherlands

Abstract
This paper presents a general design approach applied to the design of oscillators. The design approach is based on a classification of oscillator circuits. From this classification, rules for designing oscillators can be extracted. These design rules are used as a fast means to get an overview of the design space and to focus the creativity of the designer on the spot where the real design challenge is. The structured oscillator design approach has led to insight such that new circuits were found. The key ideas in these circuits are also presented.

1. Introduction
A huge amount of papers has been written on oscillators. For a designer it is no problem to find papers on designs similar to his own design. Once a designer has decided to use a certain topology, all available information on that topology is just a few mouse-clicks away on the Internet. This paper is not intended to add yet another paper to this abundance. Structured Oscillator Design does not deal with details of a specific design problem or topology, but with design rules that can help a designer to pick the most appropriate oscillator topology for his application from the known set of topologies, or to help him decide that there is none that is really appropriate. In the latter case, the same rules could help him to "invent" a new topology. The rules used for Structured Oscillator Design differ from the rules that are needed for CAD. The requirements on a design rule to be used for CAD are much more strict than they are here. The only demand on the rules here is that they can help the designer to find an optimal topology from a known set of topologies and, more
important, to help him to find really new topologies. So they do not need to be logically correct, mathematically proven, generally applicable or to form a closed set. As long as they help focus the designer's creativity on the spot where it is needed, they are "correct". In this paper, a (not the) set of design rules will be presented that has helped the authors to find some new oscillator concepts, which will also be discussed in this paper. These new concepts have been designed, tested and patented, so in this respect the usefulness of the rules has been "proven", but it is not the intention of this paper to claim the only, the best or the optimal way to design oscillators. It is just to present a structured design approach that may inspire designers and researchers.

2. The first design rule: a proper definition
Everybody knows the definition of an oscillator. So why spend costly design time on producing an exact definition? Two reasons for doing so could be the following. First, knowing the exact definition, it may be possible that by thinking about ways to implement the various "ingredients" of the definition, a designer finds a new circuit. Second, circuits that produce a periodical signal but do not completely match the exact definition of an oscillator are usually able to do more than just behave like an oscillator. It is a pity for the designer when the circuit decides to show this alternative behavior once it is already with the customer; a designer would like to know of this potential danger beforehand. The Colpitts oscillator is a known example of a circuit that can do more than just oscillate [1,2,3,4,5]. Also, when a designer is aware of the "extra features" of a circuit, he might find a new (non-oscillator) application for the circuit, where a customer could enjoy the formerly undesired behavior. So let's produce the exact definition: An oscillator is a tunable circuit that generates a stable periodical signal, which is in the limit independent of the initial conditions.
Note the words tunable and stable in the definition. For oscillators that are really not tunable there are only some niche applications (Cesium standards etc.); for all other oscillators the tunability, either being an intended feature or just a parasitic effect, needs to be evaluated and designed. Stability is always an issue, since stability costs either power or money, or both. Tunability and stability will be discussed later. Now the focus will be on the phrase "which is in the limit independent of the initial conditions". Differential equations (DEs) are a very convenient means to describe the behavior of a dynamic circuit. Differential equations that have just one periodic particular solution are of the second order. So the ideal oscillator is described by a second-order differential equation. Higher-order differential equations can have more than one periodical solution, or even a-periodical ones. Practice has proven that higher-order circuits can, and very often will, produce non-periodical (e.g. chaotic) output signals. A naive choice for a second-order DE would be, for instance,

$$\ddot{v}(t) + \omega_0^2\,v(t) = 0$$

For this equation, it is known that the solutions are periodic with frequency $\omega_0$. However, there is a strong dependence on the initial conditions for the amplitude of the attainable solutions. A solution to this equation is:

$$v(t) = \hat{v}\,\sin(\omega_0 t + \varphi)$$

in which $\hat{v}$ (the amplitude) is an arbitrary constant, determined by the initial conditions, which is of course undesirable. The only DEs that can have solutions of which the amplitude is not dependent on the initial conditions are non-linear DEs like:

$$\ddot{v}(t) + f(v)\,\dot{v}(t) + \omega_0^2\,v(t) = 0$$

where the function $f(v)$ is the required amplitude control and defines a static non-linearity.
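To make the role of the non-linearity concrete, here is a small numerical sketch (not taken from the paper; it assumes the classic van der Pol choice f(v) = -mu*(1 - v^2) purely as an example). Integrating the non-linear DE from two very different initial conditions shows that the steady-state amplitude is the same in both cases, which is exactly the property demanded by the definition.

import numpy as np
from scipy.integrate import solve_ivp

w0, mu = 2 * np.pi * 1e3, 500.0            # oscillation frequency [rad/s] and damping control

def rhs(t, y):
    # State y = [v, dv/dt] of  d2v/dt2 - mu*(1 - v^2)*dv/dt + w0^2*v = 0
    v, dv = y
    return [dv, mu * (1.0 - v**2) * dv - w0**2 * v]

t_end = 0.05
for v0 in (0.01, 2.5):                     # two very different initial amplitudes
    sol = solve_ivp(rhs, (0.0, t_end), [v0, 0.0], max_step=1e-6)
    tail = sol.y[0][sol.t > 0.8 * t_end]   # settled part of the waveform
    print(f"v(0) = {v0:4.2f}  ->  steady-state amplitude ~ {tail.max():.2f}")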
From this it can be concluded that oscillators are always non-linear circuits that should be of second order and no more.

3. A classification
Since the part list of an oscillator consists of two time-constants and a non-linearity, it seems a good idea to use the number of dominant time-constants as the basis for a classification. It is not necessary that all time-constants in an oscillator have an equal influence on the output frequency. Based on this, the following classification can be made: first-order oscillators, in which one time-constant is dominant; second-order oscillators, in which both time-constants are equally dominant; third-order oscillators, with three dominant time-constants; and so on for higher orders.
Of course, classifying in this way, there is an unlimited number of classes, but the higher the order, the less suited a circuit is to be used as a reliable oscillator. Examples of first-order oscillators are astable multivibrators, of second-order oscillators Wien-bridge oscillators, and of third-order oscillators many RC-oscillators. Delay-line oscillators seem to form a particular class as they are sometimes called "infinite-order oscillators" [6], but actually they show (if working properly) the behavior of a second-order oscillator. They can be described with an infinite set of second-order DEs.
One of these equations is related to the desired mode in the output signal, the others are related to unwanted modes. Due to the very small interaction between the modes, the behavior of a delay-line oscillator can come very close to that of a true second-order oscillator, but it will take more design effort to obtain the desired oscillator behavior, since there are more parameters to control. The circuit can do more than a true oscillator, which makes one wonder what its other applications could be.

3.1. First-order oscillators
In first-order oscillators one time constant is dominant. This means that most of the time, only one time-constant is responsible for the variation of the output signal. Only for a very small part of the period is the essential effect of the other time-constant present. The advantage of this is that the dominant time-constant can be a well-defined integrator. Integrating a constant, the integrator produces a ramp signal at its output. Two reference levels are chosen at which the sign of the constant that is integrated is changed (see fig. 1). When the integrator output is around one of the two reference levels, during the switching of the sign, the first-order oscillator really shows "second-order behavior". Often, the second time-constant comes from a binary memory that switches during the transition (see fig. 3a). Changing the integration constant or the reference levels to tune the oscillator can be done quickly and in a well-defined way. For this reason first-order oscillators have a high modulation bandwidth, can be linearly tuned and have a wide tuning range. These properties form the most important reason to select an oscillator from this class.
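As a back-of-the-envelope sketch (with generic component values, not taken from the paper), the timing of such a first-order oscillator follows directly from the integrator: a constant current I alternately charges and discharges a capacitor C between the two reference levels, so each half period lasts C*(V_high - V_low)/I and the frequency is tuned linearly by the integration current.

C = 2e-12                      # integrating capacitor [F]
V_low, V_high = 0.4, 1.2       # reference levels [V]

for I in (10e-6, 20e-6, 40e-6):             # integration currents [A]
    half_period = C * (V_high - V_low) / I  # time to ramp between the two levels
    print(f"I = {I*1e6:4.0f} uA  ->  f = {1.0 / (2 * half_period) / 1e6:6.2f} MHz")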
The big "challenge" of first-order oscillators is the frequency stability. Although during most of the period the oscillator is just integrating, of which the timing can be very accurate (the Q of a capacitor is generally very high), during the transition many effects play a role that destroy the over-all stability. One of the most important effects is the poor stability of the switching of the binary memory. The binary memory (Schmitt-trigger, flip-flop) is a regenerative circuit that shows a typical transition behavior. The reason for this poor stability can be explained with the aid of figure 2. It shows the limit cycle of a first-order oscillator (see fig. 3). On the horizontal axis is the capacitor voltage and on the vertical axis is the state of the binary memory. Most of the time during a period is spent on the bold part of the limit cycle, where the integrator is charged or discharged. At the point where the transition of the memory should occur, the very slowly varying capacitor voltage has to initiate a transition that is usually a few orders of magnitude faster. This means that with respect to the timescale at which the events in the memory occur, it seems as if the capacitor voltage has come to a near "stand-still" just before it could initiate the transition. Any signal from outside that falls within the bandwidth of the memory (noise, a spike etc.) can initiate a transition well before it is due, or delay it after the capacitor voltage has just crossed the reference level. This can result in an enormous jitter [6,7]. Two methods that can be used to counteract the problem are:
Removing the binary memory from the essential timing path. Observing the generally used block diagram in fig. 3a, it can be seen that the memory is in the timing path. The only essential function of the memory is to keep the sign of the integrated constant in the right position during the charging and discharging of the capacitor. There is no rule that states that this memory has to "deliver" the second time-constant that is needed according to the oscillator definition. Fig. 3b shows a block diagram in which a comparator delivers this second time-constant and the memory is bypassed [9].
Introducing an intentional well-timed external transition pulse. Injection lock can cause big problems in an oscillator, and first-order oscillators are very susceptible to this problem. But when the mechanism is there anyway, why not use it beneficially? The only problem is to obtain this well-timed external pulse in such a way that no additional high-quality oscillator is necessary to derive it; then it would be better to use that oscillator in the first place. So if no other type of oscillator can be used, the only option is to use the same type of oscillator again. From each of the two oscillators a pulse is derived that is injected in the other one to force a transition. Fig. 4 shows an architecture like this. It is possible to derive a stable pulse from a first-order oscillator by observing the integrator output signal by means of a comparator and comparing it to a reference level that is not one of the reference levels at which the memory of the oscillator switches. A good choice could be a level just in between these two levels. In that case the two oscillators run in quadrature [12]. The result is a much more stable first-order oscillator system that additionally produces quadrature signals with a high-quality phase relation. The tunability of this system is equal to that of the separate oscillators, so improving the stability of first-order oscillators does not necessarily reduce the quality of their key-performance aspect, the tunability.

3.2. Second-order oscillators
In second-order oscillators both time constants play an equally dominant role. Both can originate from passive components. An active component is only needed to compensate for losses in the passive components. This means that the active circuit in a second-order oscillator only needs to play a “secondary role”. It can be loosely coupled to the passives that determine the frequency. Figure 5 shows this. The passive elements of the second-order oscillator (on the left)
can be well isolated from influences from outside. The observer is just needed to compensate for losses, but does not (need to) interfere with the timing. For high-Q resonators, it is even possible to completely detach the observer for some time and only reconnect it when the energy in the resonator has become too low. This strategy, already used in medieval church clocks, yields a system that can produce a very stable output signal. For the first-order oscillator, the observer forms an essential part of the timing loop and every timing error made by the observer has a direct consequence for the stability of the output signal. Therefore second-order oscillators can be very stable. This high stability is the key-performance aspect and should be the main reason to select an oscillator from this class. In most applications at least some tunability is required for the oscillators. A tuning signal can also be seen as interference from the outside. Highly stable oscillators are therefore not very well tunable. Actually, making a second-order oscillator tunable can only be done by "opening it up" to the outside world, which means by breaking down the stability. It is very difficult to make a second-order oscillator tunable without losing too much stability. Coupling of two oscillators to improve stability was a relevant option for first-order oscillators; however, it is certainly not a good idea to couple second-order oscillators to improve tunability. Figure 6 shows two pole-zero plots that can illustrate this. On the left side of the figure, the poles of two ideal resonators are shown (the zeros do not play an important role here). It can be seen that the resonant frequencies differ. On the right side the two resonators are coupled in such a way that only two poles end up close to the imaginary axis. (Also the original poles are depicted in this plot.) It can be seen that, with the aid of an active circuit to get the two poles exactly on the imaginary axis, this coupled system will produce a periodical output signal with a frequency that is the average of the resonance frequencies of the two separate resonators. Varying the coupling factor tunes the system. Since this system actually is a fourth-order (four-pole) system, it is interesting to see what has happened with the other two poles. They are shifted into the left half-plane, which indicates that there are losses; and losses are inevitably related to noise. Apparently the only way to make a tunable second-order oscillator system with two high-Q (low-loss) resonators with different resonance frequencies is to destroy the Q of the system! Introducing tunability for second-order oscillators reduces the quality of their key-performance aspect, the stability [13].
In short: an oscillator has two poles on the imaginary axis. When the circuit is of higher order, the other poles have to be in the left half-plane. They can only be there when there are losses in the circuit. And when the losses are not sufficient, they have to be introduced. This implies that high Q's are lost and the noise performance of the oscillator is degraded.

3.3. Third-order oscillators
Many oscillators designed as second-order oscillators are actually best described via a third-order DE. This third order may originate from the dynamics of the amplitude-control loop. In that case the third pole usually is so far away from the other two that it can be safely assumed that the system will nicely behave like an oscillator. In other cases, it happens that an active circuit (usually only one transistor, like in the Colpitts oscillator) is not able to produce the required phase relation between input and output to make a resonator resonate. An additional phase shift created via an extra pole is necessary to bring the other two poles on the imaginary axis. The resonator then largely determines the output frequency. However, in this case the influence of the third pole is of course not negligible. Oscillators like this can, when not carefully designed, show typical third-order (e.g. chaotic) behavior. Proper design of the active part can of course yield resonator oscillators that are truly second-order [10,11]. The main reason for designing true third-order oscillators, i.e. oscillators relying on three equally dominant poles, is the fact that they can be implemented with just RC networks. A filter having three real poles is then used in a feedback configuration that has two poles on the imaginary axis and one on the real axis in the left half-plane, see fig. 7. This means that losses, and therefore noise, are inevitable in these oscillators. Stability is not a key feature, nor is tunability. Their only merit might be the fact that they can be built with just R's and C's, but this also holds for the second-order Wien-bridge oscillator, which performs better.
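A tiny numeric illustration of this statement (a generic buffered RC phase-shift example, not a circuit from the paper): with three identical RC sections of time constant tau inside a feedback loop of gain K, the characteristic equation (1 + s*tau)^3 + K = 0 with K = 8 places two poles exactly on the imaginary axis (at +/- sqrt(3)/tau) and the third pole on the negative real axis.

import numpy as np

tau, K = 1e-9, 8.0
# (1 + s*tau)^3 + K = 0  ->  tau^3*s^3 + 3*tau^2*s^2 + 3*tau*s + (1 + K) = 0
poles = np.roots([tau**3, 3 * tau**2, 3 * tau, 1.0 + K])
for p in poles:
    print(f"pole at {p.real:+.3e} {p.imag:+.3e}j rad/s")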
3.4 Conclusions on the classification
From the classification it can be concluded that the only really interesting classes of oscillators are the first-order and the second-order oscillators, including the second-order-like delay-line oscillators. The choice is based on a preference for either top tunability or top stability. Apparently tunability is a very robust property whereas stability is a very fragile one. Coupling oscillators to improve the stability of first-order oscillators seems to be a concept that can work. Coupling of second-order oscillators to obtain tuning does not seem to be a very good idea, since this tends to degrade the stability. Coupling two well-matched resonators to improve the stability has been shown to work, but this is of course not improving the tunability in any way.
4. New circuits

4.1 FM-demodulation via injection lock
Above, injection lock was successfully used to improve the stability of a first-order oscillator. This was done by injecting it with a signal from another first-order oscillator. Of course, other injection sources can also be used. (The resulting system is not an oscillator in the strict sense of the definition, but the main point of structured electronic design is not to confine the creativity of a designer to a set of solutions limited by rules, but to get a well-structured overview of properties and possibilities leading to new circuits.) An interesting experiment is injecting a first-order oscillator with an FM-modulated signal. The result is shown in fig. 8. The oscillator locks to the injected signal. As a result, the integrator output signal becomes AM-modulated. Using an AM-detector on this signal results in the demodulated FM baseband signal. Injecting an FM-modulated signal in the coupled oscillator system shown in figure 4 yields two integrator output signals in quadrature that are AM-modulated. Simple trigonometry shows that by adding these quadrature signals in the proper way (via a pythagorator), the baseband signal is found without the need of extra filtering. So this concept results in an FM-demodulator that has a very high signal bandwidth, is completely integratable since it does not need a large low-pass filter, and is easily set to the carrier or a subharmonic of the carrier [14].
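A toy numeric check (illustrative signal shapes only, not from the paper) of the "pythagorator" idea: if the two integrator outputs are AM-modulated quadrature signals m(t)*cos(phi(t)) and m(t)*sin(phi(t)), then sqrt(I^2 + Q^2) returns the baseband envelope m(t) directly, without any low-pass filtering.

import numpy as np

t = np.linspace(0.0, 1e-3, 100_000)
m = 1.0 + 0.3 * np.sin(2 * np.pi * 5e3 * t)                    # baseband message (envelope)
phi = 2 * np.pi * 1e6 * t + 50 * np.sin(2 * np.pi * 5e3 * t)   # arbitrary carrier phase

i_sig = m * np.cos(phi)                   # in-phase integrator output
q_sig = m * np.sin(phi)                   # quadrature integrator output
recovered = np.hypot(i_sig, q_sig)        # the "pythagorator"

print(f"max |recovered - m| = {np.max(np.abs(recovered - m)):.2e}")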
4.2 Injection-lock divider
The fact that injection lock can also occur on a sub-harmonic makes the first-order oscillator also very suitable as an injection-lock divider. The maximum bandwidth is just as high as it is for a standard flip-flop based prescaler, except for the fact that one stage of an injection-lock divider can divide by more than 2. Experimental results have shown that reliable division ratios can be obtained well above 10. So the injection-lock divider can be a good low-power alternative to standard dividers [15].

4.3 Resonance-mode selection
It is also possible to couple a first-order oscillator to a resonator. When the resonator has several resonance modes (like an overtone crystal), during startup the first-order oscillator mostly excites the resonance mode that is closest to its free-running frequency. So the resonator starts oscillating at that frequency. This signal is injected back into the first-order oscillator that, because of the injection lock, synchronizes to this frequency. This results in a system that oscillates with the stability of the resonator at the resonance frequency that was selected via the free-running frequency of the first-order oscillator [16]. The simplest version of this is the overtone oscillator depicted in fig. 10.

4.4 Self-oscillating mixer
Looking at fig. 4 again, it is clear that this is at least a fourth-order system. There are two integrators and two binary memories. Due to the mutual injection, it is the two integrators that are in the timing loop that sets the output frequency of the system, giving the jitter of the memories less chance to introduce instability. The two mutual injection paths are not influencing the output frequency directly. This means that it might be possible to manipulate them without affecting the stability. The reason for wanting to do so is the old dream of the "self-oscillating mixer". In all modern transceivers the mixing function is performed via oscillators and a separate mixer circuit. Although the oscillator is switching itself, it is not possible to feed the RF signal through the oscillator, since this influences the oscillation frequency, causing all kinds of demodulation errors. The coupled oscillator system also has many signal paths that are switched. But in this case it appears that there are signal paths through which the RF signal can be put without affecting the frequency of the system. Fig. 11 shows the block diagram of this self-oscillating mixer. The RF signal is passed through the injecting paths from one oscillator to the other. Since the two oscillators have an accurate quadrature relation, this results in accurate quadrature mixing of the signal [17]. The mixing function is in the feedback loop that controls the quadrature [6] in the system, so the phase accuracy of the quadrature mixing does not, for example, depend on matching of the signal paths to the mixers or of the mixers themselves. A very accurate quadrature mixing can be expected from this circuit. It has been designed and tested in a Europractice BiCMOS process at 5.8 GHz. Measurement results so far confirm theory and simulation results.
5. Conclusions
In this paper it has been shown that a classification based on an accurate definition of oscillators can inspire designers to find new circuits. The design rules that follow from the classification are not very strict and are not directly usable for CAD, but they appear to raise the right questions at the right time, thus focusing the creativity of a designer on the spot where the real problem is and where a really new solution can be found. So it is always a good idea to try to formulate design rules and general statements about groups of circuits, and to try to make one or even more classification trees. A statement that could be made is: "classification beats creativity", but perhaps it is more fun to state that "creativity focused by a classification beats the random search of a genius".

References
[1] W. Szemplinska-Stupnicka and E. Tyrkiel, "Sequences of Global Bifurcations and the Related Outcomes After Crisis of the Resonant Attractor in a Nonlinear Oscillator", International Journal of Bifurcation and Chaos, vol. 7, no. 11, November 1997.
[2] "Devaney Chaos in an Approximate One-Dimensional Model of the Colpitts Oscillator", International Journal of Bifurcation and Chaos, vol. 7, no. 11, November 1997.
[3] O. De Feo and G.M. Maggio, "Bifurcation phenomena in the Colpitts oscillator: A robustness analysis", in Proc. ISCAS 2000, Geneva, Switzerland, May 28-31, 2000.
[4] G.M. Maggio, O. De Feo and M.P. Kennedy, "Application of bifurcation analysis to the design of a chaotic Colpitts oscillator", in Proc. NOLTA'98, pp. 875-878, Crans-Montana, Switzerland, September 14-17, 1998.
[5] A. Tamasevicius, G. Mykolaitis, S. Bumeliene, A. Cenys, A.N. Anagnostopoulos and E. Lindberg, "Two-stage chaotic Colpitts oscillator", IEE Electronics Letters, vol. 37, no. 9, pp. 549-551, 2001.
[6] J.R. Westra et al., Oscillators and Oscillator Systems, Kluwer Academic Publishers, Boston, 1999, 296 pp.
[7] J.R. Westra et al., "Coupled relaxation oscillators with highly stable and accurate quadrature outputs", in Proc. IEEE-CAS Region 8 Workshop on Analog and Mixed IC Design, 1996, pp. 32-35.
[8] J.R. Westra, C.J.M. Verhoeven and A. van Staveren, "Design principles for low-noise relaxation oscillators", in Proc. Electronics '96, Sozopol, Bulgaria, 1996.
[9] J.G. Sneep and C.J.M. Verhoeven, "A new low-noise 100 MHz balanced relaxation oscillator", IEEE Journal of Solid-State Circuits, vol. 25, pp. 692-698, 1990.
[10] E.H. Nordholt, "Single-pin integrated crystal oscillators", IEEE Transactions on Circuits and Systems, vol. 37, no. 2, 1990.
[11] A. van Staveren et al., Structured Electronic Design: High-Performance Harmonic Oscillators and Bandgap References, The Kluwer International Series in Engineering and Computer Science, vol. 604.
[12] C.J.M. Verhoeven, "Coupled regenerative oscillator circuit", US Patent 5,233,315, August 1993.
[13] J.M. Wolfskill, Erie, "Piezoelectric crystal apparatus", US Patent 2,210,452, patented Apr. 29, 1941.
[14] C.J.M. Verhoeven and C. van den Bos, "FM-demodulator", application for Dutch patent no. 1017938, April 25, 2001.
[15] C. van den Bos et al., "Frequency division using an injection-locked relaxation oscillator", accepted for publication at ISCAS 2002.
[16] C.J.M. Verhoeven, "Resonator having a selection circuit for selecting a resonance mode", US Patent 6,225,872, May 1, 2001.
[17] C.J.M. Verhoeven, C. van den Bos, Nieuwkerk and M.H.L. Kouwenhoven, "Quadrature modulator", application for Dutch patent no. 1017191, 2001 (US patent application procedure started January 2002).
Systematic design of high-frequency gm-C filters

Erik Lauwers, Georges Gielen
Katholieke Universiteit Leuven, ESAT-MICAS, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium

Abstract
A systematic design flow for high-frequency continuous-time gm-C filters is presented, starting with high-level characterization of the filter and ending with transconductor circuit sizing and layout. In between, the filter topology is defined and optimized. Each step of this flow is highlighted and, where possible, solutions are discussed that are good automation candidates. To illustrate the flow, an example is developed in parallel.

1 Introduction
In a broad sense electrical filters are devices that modify, reshape or manipulate the frequency spectrum of an electrical signal according to prescribed requirements. Integrated electrical filters are only a subclass of the vast amount of possible filter realizations. To reach high levels of integration and to reduce cost, these filters are desirably integrated in deep submicron CMOS technologies. In order to obtain filters that operate at several tens of megahertz or that go beyond cutoff frequencies of 100 megahertz, as needed in many present-day applications, the choice is mainly restricted to continuous-time gm-C filter implementations [1]. This is the class of filters on which this text will focus. High-frequency gm-C filters are typically encountered in communication systems where they can act as anti-aliasing or channel-select filters. Another example can be found in the read channels of hard disk drives where filters perform a pulse shaping function. These two application areas can demand specific filter
phase and amplitude characteristics. For example a linear phase response may be required or a very steep magnitude roll-off might be important. Apart from the frequency response other specifications can be equally important. Typical specifications like dynamic range and nonlinear distortion are often imposed. But also linear distortion is of concern and has to be covered during the design cycle, just like power and area limitations. Besides these specifications some extra attention has to be given to specific integration-related problems such as parasitic capacitances, matching and, related to matching, tuning. In this paper a complete design cycle will be covered and most of these specifications will be discussed. The goal is to describe a top-down design methodology from the high-level specifications down to the transistor sizing. However, to keep the text structured, only gm-C filter topologies that are good candidates for automatic optimization are used. Section 2 will start with a general overview of the filter design flow. Next, in section 3, after first giving an overview of the filter synthesis flow, the different design steps will be discussed in detail and an example will be given. In section 4 some additional high-level operations are described that bridge the gap between the optimized ideal solution obtained thus far and the practical implementation of the operational transconductance amplifiers (OTAs). In section 5, the design phase of the transconductors and some practical integration issues are discussed. Finally, conclusions are drawn in section 6.
2 Filter design flow overview
The proposed design flow for the systematic filter design is shown in Figure 1. Inputs are given in the upper left corner. They contain the filter specifications (passband ripple, stopband attenuation, ...), information about the input signal (range, bandwidth) and additional limits for the non-idealities such as noise and distortion. The latter are for example specified as requested dynamic range and total harmonic distortion at the output of the filter. Using this information a filter topology is chosen and optimized. To this end, some topology library has to be available, either in a real database in the case of a fully automated system or in a virtual database in the case of a designer making the filter autonomously. Once the high-level optimal filter coefficients are known (transconductance and capacitor values in the case of gm-C filters) the OTA’s can be designed or chosen from a database. Finally the output is a fully sized filter topology that meets the input constraints. Note that a variation of this design flow leads to a high-level power estimation tool for analog filters [2] if, instead of choosing or realizing a sized OTA, a good high-level model of candidate OTA’s is available. Next the filter synthesis flow is discussed in detail.
3 Filter synthesis flow
A typical filter synthesis flow can be divided into four steps (Figure 2). The first step is the approximation step in which the transfer function is generated. The second step is the realization step in which a filter topology is chosen (e.g. cascade of biquads or leapfrog). In the third step an active implementation is made (e.g. gm-C), possibly preceded by making a passive prototype. The last step is the high-level optimization step. The result is a set of optimal gm and C values that have to be designed further on at the circuit level. Each step is now discussed.
3.1 Approximation step
The first step is to translate the filter specifications (like for example Figure 3) into a mathematical description, namely the transfer function. Typical constraints are for example a certain maximum passband attenuation, a certain minimum attenuation at the stopband frequency or a constant group delay in the passband. A lot of work has been done on this so-called approximation problem and a good introduction or starting point for further investigation can be found in [3]. As a result, many approximations are readily available to the designer. The best-known ones that yield reasonable approximations are of course the Butterworth, Chebyshev, Bessel and Elliptic or Cauer approximations. Each has its specific qualities, such as for example an equiripple-magnitude characteristic for the Chebyshev type or a linear phase for the Bessel type, but then at the expense of a less steep magnitude roll-off. After having obtained a normalized low-pass prototype, the transfer function is scaled and, if desired, transformed into any other filter type (high-pass, ...). Next the topology selection or filter realization is discussed.

3.2 Topology selection or realization step
The possibilities to create a realization for the obtained transfer function are numerous. A discrete filter realization such as for example a ladder filter is one possibility. Another possible solution is to make an opamp-RC filter such as for example a Sallen and Key filter, a leapfrog filter or a CGIC filter. Yet another solution might be a MOSFET-C filter topology where the resistances are realized by MOSFET transistors. Also a gm-C filter is possible, realizing for example biquads or gyrator-C filters. The latter two types are better integration candidates than the first ones. For a good starting point on the topologies mentioned here, the reader is referred to [3]. In this text only high-frequency gm-C topology solutions are discussed. Solutions with operational amplifiers (op-amps) using local feedback consume too much power or do not have a high enough bandwidth compared to the gm-C solutions.

3.3 Implementation step
The gm-C implementation can be divided in two mainstream solutions: the LC-ladder simulation and the cascade realization. Although reasons for choosing one or the other are not always straightforward, a first-order selection rule is as follows. If you want to realize a high-order filter with a high Q or with a sharp magnitude cutoff or a filter that has low passband sensitivity to component tolerances, then an LC-ladder simulation is a good choice. If however high-speed operation, ease of design or tuning are important, then the choice goes to a cascade realization. Here, of each solution, one possible implementation is looked at closer. For the LC-ladder simulation type the state-space method is described here, which is a modified signal-flow-graph method. For the cascading type a cascade of biquads method is chosen. Both methods allow to automate their
26
optimization process through the use of their state-space representation. These two implementations will now be explained. 3.3.1 State-space method
In the state-space method first a passive RLC prototype has to be realized using known algorithms, formulas or lookup tables [3, 4]. The source and load impedance have to be chosen such as to maximize the power transfer. Then the Kirchhoff equations are written for the RLC prototype, with admittances in the series branches and impedances in the shunt branches.
A scaling factor R = 1/gm is introduced to obtain appropriate values for Z and Y. These equations are rewritten into state-space form:

$$\dot{X} = A\,X + B\,U, \qquad Y = C\,X + D\,U$$
The filter is directly realized from these state-space equations as a combination of integrators. X contains the state-space variables, which in the case of gm-C filters correspond to the voltages over the integrating capacitors, U contains the system inputs and Y is the output. Combinational ladder branches can be resolved similarly by using the Coates transformation. This topology contains no floating capacitors nor nodes which are not connected to an integrating capacitor. In Figure 4, an example of this method is given. From the LC prototype on top the equations are derived. After introducing a scaling factor, the given A and B matrices of the state-space equations are obtained. Setting each coefficient equal to a gm/C value directly yields a possible filter realization.

3.3.2 Cascade of biquads method
For the cascade realization with biquads, first the transfer function has to be factorized into second-order sections. If the order is odd, then one first-order stage has to be added.
The second-order sections are then directly implemented with biquads. Rules of thumb for section ordering, pole-zero pair grouping and gain assignment can be found in literature [3]. The main concern is not to apply in-band attenuation in one section and amplification in a subsequent section.
In Figure 5 an example of a gm-C biquad section is shown. Also the realized transfer function and the state-space representation are shown. Given the transfer function to be realized, the gm and C values can now be determined.

3.4 Optimization step
At this point, a solution exists that realizes the desired transfer function, but this is maybe not yet a solution with optimal dynamic range. Next the dynamic range (DR) optimization will be explained: the gm and C values obtained in the previous step are scaled to improve the DR.
In Figure 6 a general overview of the dynamic-range optimization is depicted. Starting point are the state-space matrices of the total filter. Additional inputs are the noise figures of the transconductors (a noise figure of one means a noise-free integrator), a starting value for the total capacitance, the wanted dynamic range (or none if there is no real goal other than to maximize the DR) and the input voltage swing. Usually the noise figures are not known at this point because no transconductors have been designed yet or selected from the OTA library. In that case they can be chosen to be, for example, one, which is the ideal value, and afterwards, when they are known, the optimization can be iterated. The advantage of this approach is to have an ideal upper boundary value for the DR of the filter. In practice the noise figure is larger than one. The total capacitance is made minimal during the optimization to minimize the power consumption. The scaling and capacitance division operations are matrix operations that alter neither the filter topology nor the implemented filter transfer function. Both operations are explained in detail in [5]. For the scaling operation, a diagonal transformation matrix is used (its diagonal entries being the scale factors) to keep the signal levels at the outputs of the integrators equal. In case all the maximal internal amplitudes have to be equal, a diagonal matrix derived from the controllability Gramian is needed. Scaling is done with a transformation matrix T. K and W are the controllability and observability Gramian matrices. The scaled state-space matrices are obtained as:

$$A' = T^{-1} A\, T, \qquad B' = T^{-1} B, \qquad C' = C\, T$$
Other transformation matrices T are also possible and can yield better dynamic ranges for the filter. However, while leaving the transfer function unaltered, a non-diagonal transformation matrix does alter the filter topology. The capacitances are now sized, using the scaled state-space matrices, such as to minimize the total output noise, according to:

$$C_i = C_{tot}\,\frac{\sqrt{NF_i\,W_{ii}}}{\sum_j \sqrt{NF_j\,W_{jj}}}$$

In this operation the observability matrix W has to be used. The dynamic range of the filter is then calculated as the ratio of the maximum signal power at the output to the total integrated output noise power, using the scaled matrices, the capacitor values and the transconductor noise figures.
Once the optimal filter coefficients are known, the model can be expanded with non-idealities. The DR optimization can be placed in a loop and executed automatically as indicated by the flow in Figure 7. The lowest total capacitance value allowed is determined by parasitic capacitances or by sensitivity considerations. This is described later in the text.
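The two matrix operations described above can be sketched in a few lines. The following is a minimal illustration (hypothetical helper names and a toy stable system, not the authors' tool or the filter of section 3.5): the controllability Gramian K is used for the scaling step, and the observability Gramian W of the scaled system for the capacitance distribution.

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def dr_optimize(a, b, c, c_total, nf=None):
    """Scale the state-space system and distribute the total capacitance."""
    n = a.shape[0]
    nf = np.ones(n) if nf is None else np.asarray(nf)     # transconductor noise figures

    # Controllability Gramian of the unscaled system: A K + K A^T + B B^T = 0.
    k = solve_continuous_lyapunov(a, -b @ b.T)

    # Diagonal scaling matrix that equalizes the integrator output levels.
    t = np.diag(np.sqrt(np.diag(k)))
    a_s, b_s, c_s = np.linalg.inv(t) @ a @ t, np.linalg.inv(t) @ b, c @ t

    # Observability Gramian of the scaled system and the noise-minimizing
    # capacitance distribution.
    w_s = solve_continuous_lyapunov(a_s.T, -c_s.T @ c_s)
    weights = np.sqrt(nf * np.diag(w_s))
    caps = c_total * weights / weights.sum()
    return a_s, b_s, c_s, caps

# Toy second-order (biquad-like) system with 3 pF of total capacitance.
a = np.array([[0.0, -1.0], [1.0, -0.5]])
b = np.array([[1.0], [0.0]])
c = np.array([[0.0, 1.0]])
print(dr_optimize(a, b, c, c_total=3e-12)[3])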
3.5 Example
An example illustrates the design up to this point. The task is to design a high-frequency low-pass Bessel filter (constant group delay) to be used, for example, in a hard disk drive read channel to perform a pulse shaping function. The cut-off frequency is 335 MHz. The first step is the approximation of the specifications with a mathematical expression. The resulting factorized transfer function (equation 7) is obtained, for example, with the aid of MATLAB®.
The second-order sections are cascaded in the order indicated in equation 7 and are realized by biquads. The first section has a Q of 0.5219 and the second a Q of 0.8055. Hence the gm-C realization is chosen.
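The approximation numbers quoted above can be reproduced with a few lines of, for instance, SciPy (a sketch, assuming a magnitude-normalized Bessel prototype; the paper itself uses MATLAB): a 4th-order low-pass Bessel approximation with a 335 MHz cut-off is factorized into two second-order sections, whose quality factors come out close to 0.5219 and 0.8055.

import numpy as np
from scipy.signal import bessel

f_c = 335e6                          # cut-off frequency [Hz]
z, p, k = bessel(4, 2 * np.pi * f_c, btype='low', analog=True, output='zpk', norm='mag')

# Group the complex-conjugate poles into second-order sections and report
# the natural frequency and quality factor of each section.
for q in sorted((q for q in p if q.imag > 0), key=abs):
    w0 = abs(q)                      # section natural frequency [rad/s]
    Q = w0 / (2 * abs(q.real))       # section quality factor
    print(f"f0 = {w0 / (2 * np.pi) / 1e6:6.1f} MHz,  Q = {Q:.4f}")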
The biquad structure from Figure 8 is chosen for this example, but different biquad topologies exist that realize the same transfer function. For example in [6] a slightly different topology is used, but the optimal achievable dynamic range (see further) is the same. The state-space matrices A and B follow directly from this structure.
The coefficients of the A and B matrices correspond to the gm/C ratios. A starting-point value is chosen randomly for some unknown ratios and from there on the optimal values are found with the DR optimization. The starting-point DR is totally dependent on these first values but unimportant for the final result. For this random starting point, an input rms voltage of 0.25 V, a noise figure of 1 and 1.5 pF capacitors, the dynamic range is 61.23 dB. After optimal capacitance division and scaling with a scaling constant of 1, the dynamic range is 66.1 dB. The optimal DR solution is actually independent of the value of this scaling constant when the noise of the input branches is not taken into account [5], and hence it is chosen to be one for convenience. If it is taken smaller than one, the B matrix entries become smaller and the noise of the input branches becomes unimportant anyway, but then additional amplification is needed before or after the filter. This amplification must of course be performed with negligible additional noise production. For high-speed transconductors with stringent speed specifications, the noise figures can be an order of magnitude higher than one. The transformation matrix T for the scaling operation and the optimal capacitor sizes are then computed.
The obtained gm values for this example are finally reconstructed from the transformed state-space matrices and yield the values listed in Table 1.
4 Additional high-level considerations
Other aspects can now be taken into consideration in the high-level design, such as for example linear distortion due to transconductor parasitic effects. Also, at this point it is possible to look at the nonlinear distortion behavior of the filter and to derive distortion specifications for the transconductors. Both are described next.

4.1 Linear distortion
Important transconductor non-idealities whose effects on the filter characteristic can easily be observed are a finite output impedance and a parasitic non-dominant pole. Slightly modifying the state-space equations makes it possible to take these effects into consideration and hence yields extra specifications for the transconductor design. This allows requirements to be defined for the output conductance and the non-dominant pole position of every transconductor. The transfer function of a non-ideal transconductor is:

$$i_{out}(s) = -\,g_{out}\,v_{out}(s) + \frac{g_m}{1 + s/\omega_{nd}}\,v_{in}(s)$$
The first part indicates the effect of the output impedance. This output impedance is modeled by adding the output conductance values present at an integrating node of the filter to the corresponding expression in the state-space equations for that integrating node. This means that they are added to the diagonal entries of the A matrix of the state-space equations. The second part of equation (10) is the non-dominant pole introduced by each transconductor. If the DC gain of this pole factor is one, then expanding the state-space equations with one extra state variable for each extra pole models this effect. The disadvantage of introducing these extra variables is that the state-space equations can double in size. This is not always desirable, certainly not when the order of the filter is already high. Unfortunately, this is when this analysis is needed most. For example, adding an output conductance to all the gm's in the optimized filter from section 3.5 reduces the DC gain to -0.8 dB and the 3 dB bandwidth to 270 MHz. Of course, in a real example, normally every transconductor has a different output conductance that is related to its transconductance value.

4.2 Nonlinear distortion
Once optimal gm and C values are computed and an output conductance bound for the transconductors is defined, a harmonic distortion model can be constructed. The goal of the model is to compute the distortion specifications for single transconductors when given an allowed filter output distortion. Different approaches are possible. The simplest approach is to set the output distortion levels of the transconductors equal to the noise floor at that node. The noise floors are directly available from the DR optimization. However, typically maximum harmonic distortion is specified as a percentage of the output signal, which is independent of the noise floor. A second approach is hence to use the filter transfer function and to compute the allowed signal distortion at the integrator nodes from the output signal distortion. In case scaling for equal maximal internal amplitudes was applied, this is trivial. For a random input signal this will however not be the best solution. In addition a filter transfer function is by definition frequency dependent, making it difficult to define an allowed fixed distortion voltage. Hence, to include phase information and to exploit the real transfer function, a Volterra series model is applied [7]. The disadvantage is that the model is somewhat more complex, but it is still manageable with programs like Maple® that can handle symbolic expressions. An example of the model derivation is explained for the example filter from section 3.5 (Figure 8). However, for the sake of brevity, only a third-order harmonic distortion is calculated here and not an intermodulation distortion product. With respect to the model complexity, the only difference is that fewer terms are involved in the calculations. Also the output conductances are omitted, but they are in practice readily added on the diagonal of the compacted MNA matrix. A single transconductor is modeled as:

$$i_{out} = g_{m1}\,v_{in} + g_{m2}\,v_{in}^2 + g_{m3}\,v_{in}^3$$
The nonlinear coefficients $g_{m2}$ and $g_{m3}$ are derived from the classic harmonic distortion expressions and can be found in [7]. For example, if a target third-order harmonic distortion HD3 (in dB) is specified, then $g_{m3}$ is calculated from:

$$g_{m3} = \frac{4\,g_{m1}}{\hat{v}_{in}^2}\;10^{\,HD3/20}$$
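A quick numeric check of this relation (illustrative numbers; converting the 0.25 V rms input swing to an amplitude is an assumption made here): the computed g_m3 is applied to a sine of amplitude v_hat and the resulting third-harmonic level is read back from an FFT.

import numpy as np

gm1, v_hat, hd3_db = 1.895e-3, 0.25 * np.sqrt(2), -40.0   # example values
gm3 = 4.0 * gm1 * 10.0**(hd3_db / 20.0) / v_hat**2

# Evaluate i_out = gm1*v + gm3*v^3 on one period of a sine and compare harmonics.
t = np.linspace(0.0, 1.0, 4096, endpoint=False)
v = v_hat * np.sin(2 * np.pi * t)
spec = np.abs(np.fft.rfft(gm1 * v + gm3 * v**3)) / len(t)
print(f"gm3 = {gm3:.3e} A/V^3,  simulated HD3 = {20 * np.log10(spec[3] / spec[1]):.1f} dB")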
To start, the first-order responses have to be calculated from the compacted MNA matrix (equation 15). The results are so-called first-order phasors, whose coefficients (x,y,z) contain information about the node number (x) and the involved harmonic frequencies (y and z).
This set of equations is solved, for example, with Cramer's rule. Next, to calculate higher-order distortion terms, nonlinear current sources are introduced for each transconductance.
The coefficients (i,j) of the nonlinear current sources are the transconductor number (i) and the considered nonlinearity order (j). If intermodulation products must be calculated, these nonlinear current sources are different. The higher-order distortion terms are then computed from the modified compacted MNA matrices, in which the nonlinear current sources act as the excitations; the second-order non-linearities, for example, follow from the corresponding second-order matrix equations.
The third-order phasors are calculated similarly. The harmonic distortion is then given by the ratio of the third-order response at the output node to the first-order (fundamental) response at the output node.
The obtained symbolic expressions rapidly become too long and have to be computed numerically or visualized graphically. For example, after addition of the output conductance to the model, the third-order harmonic distortion as a function of the input signal frequency and of the output conductance of the transconductors is plotted in Figure 9. For the example all gm’s have been made equal.
For low frequencies and high output conductances the distortion terms are internally slightly cancelled due to the biquad topology. For example, two of the transconductors from Figure 8 have opposite signs and hence cancel out each other's odd-order distortion. For low frequencies and normal output conductance values this cancellation is still weakly present, but the driving voltages are different. At higher frequencies, when the filter nodes are out of phase, the cancellation effect is no longer present.
Results as shown in Figure 9 allow a maximum bound on the output conductance to be specified for a given distortion requirement.

5 Practical filter integration issues
After the high-level optimization part, a set of gm and C values is obtained. Due to matching and tuning considerations, these gm's are often realized as integer multiples of one so-called master gm. This is discussed in section 5.1. Next the actual design of the transconductor is looked at in section 5.2, and in section 5.3 some extra integration considerations are discussed.
5.1 Discrete-gm problem
The high-level optimization yields gm values (such as those in Table 1) that can be all different and can have an arbitrary real value. All these different values are difficult to realize precisely on chip. Not only will the real values all shift due to technology variations over different batches ("tolerances"), also their relative values will vary ("mismatch"). These restrictions are usually solved by applying tuning to one so-called master gm. The other gm's (so-called slaves) that constitute the filter are then matched to this one tuned gm. However, matching arbitrarily different gm values yields poor results. Therefore, it is better to also limit the gm values of the slaves to integer multiples of the master gm. Another possibility is to have constant, different gm values and to tune the deviations only by changing the capacitance values (capacitor tuning). However, for area-limited solutions (and area is cost for integrated solutions), constant-gm approaches are not optimal [6]. This means for our example filter of section 3.5 that other gm values have to be found than the ones generated by the DR optimization, or that the optimization has to be made discrete, which is a difficult problem. Actually, the problem is that if two gm's arriving at the same integrating node are taken the same and this solution is scaled by changing the diagonal of the controllability matrix, the gm values are made different. Hence, gm's that have common output nodes have to be scaled together. We choose a solution that finds a good discrete solution starting from the optimal DR solution. Looking at the initial results from Table 1, a good solution would be to round the gm's to small integer multiples of one unit gm.
If the corresponding capacitance is 1.1 pF, such that the total capacitance remains the same as in the example, then a DR of 65.15 dB is obtained, which is only 1 dB less than the optimal solution. However, another solution could be derived from Table 1 as well, which yields the same DR but with a little less power consumption (18 instead of 20 unit gm's).
The corresponding capacitance value is now about 1.2 pF and the DR is 65.1 dB. This illustrates that the DR optimization is necessary and yields a good starting point for the discrete problem. It also illustrates that, for optimal dynamic range solutions of low-Q biquads for example, power consumption is another factor that can be optimized. Formalizing this discrete optimization process is a subject for future work.

5.2 Transconductor design
At this point in the flow, optimal capacitor values and gm values, which are all integer multiples of one unit gm, are known. Also maximum distortion specifications for the transconductors are known, as well as their minimal output impedance for an acceptable transfer function deviation. The next step then is to actually design the transconductors in the filter. The number of possible transconductor circuits is huge and many have been presented in literature. It is not the intention of this text to elaborate on the design of a good transconductor. However, to enable further computer-aided filter design, an in-house tool is described that automates the transconductor design to a certain extent. The tool is implemented as a Matlab® script and uses a genetic programming approach [8].
A short summary of what the script performs is shown in Figure 10. The input of the script is a transconductor circuit netlist, a cost function and an initial so-called population of parameter sets. The parameters are typically device widths and lengths, bias currents and voltages. To the netlist the set of parameter values is automatically added, and after carrying out the necessary simulations the obtained performance characteristics are extracted. From these performance values the cost is calculated, and this is repeated for each member of the population. The cost function consists of adherence to the performance specifications and additionally power minimization. When all members of the population are processed and hence have a cost associated with them, a new population is generated by crossover and selection, until a predefined number of iterations is attained. The goal is to minimize the cost function. This flow will now be explained in more detail. The inputs are a circuit netlist, an initial population and a cost function. The initial population can be anything from a precisely calculated set of values to a completely random set of numbers. The only restriction is that all parameters are within defined bounds. The initial values and their bounds can influence the convergence speed and the number of iterations necessary to yield a useful solution. The netlist must be parameterized and, apart from the parameter values, be a valid file that is ready for simulation. In addition, the commands that extract and print out the desired performance characteristics of the netlist must be present. In fact, this netlist is what a designer would use when analyzing the circuit, but, instead of graphically observing the output, the output values must be defined formally to be extracted automatically. Finally a cost function must be provided. This cost function calculates a net value from the extracted specifications that reflects the usefulness or goodness of the simulated circuit. An example will be given at the end of this section. At the center of the flow is the simulation and specification extraction, and the subsequent cost calculation based on these specifications. Simulating small-signal parameters and DC operating point information of transconductors is straightforward and is taken care of in the initial netlist simulation. Obtaining large-signal information, and more specifically distortion information, is more difficult because in a filter typically the input and output voltage swings of the transconductors must be equal. Actually, some feedback is needed to master the high output impedance of the transconductors and to equalize the voltage swings, but this feedback must not alter the open-loop distortion characteristics of the transconductor.
This can be obtained by simulating a low-pass filter setup as indicated in Figure 11. The second gm is an ideal voltage-controlled current source and its value is obtained from the first simulation run. For frequencies low with respect to the cut-off frequency, the open-loop harmonic distortion of the transconductor can be simulated. Finally, after calculating the cost for the initial parameter sets, new parameter sets have to be generated based upon the cost information. The goal is to generate parameter sets that yield lower costs. Many optimization algorithms exist that can do this task. Here a genetic algorithm was used which is implemented as a Matlab® function and which is readily available on the internet [8]. The optimization stops when a defined number of iterations has been performed. The presented flow is a simulation-in-the-loop approach, and the main disadvantage of such approaches is that finding an optimum takes a lot of time: for every parameter-set evaluation, a circuit simulation is performed. However, with ever stronger computing power available, it becomes an interesting way to obtain usable results relatively fast, and in addition the flow is easy to implement. As mentioned above, the tool automates the transconductor design only to a certain extent, because of the problems that emerge when an infeasible solution is asked for. This means that, to obtain a good feasible result, some designer interaction is required. Two kinds of interaction are necessary: changing the weights of the cost function and changing the bounds of the parameters. Feedback on both is graphically presented to the designer. Typically the solution is then found iteratively, using small population sizes and few iterations.
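As an illustration of the simulation-in-the-loop flow described above, the following sketch shows how such a sizing script could be organized. It is a minimal Python mock-up, not the Matlab® in-house tool itself; the parameter names, bounds, targets, weights and the dummy simulate() relations are placeholders chosen for this example only.

```python
import random

# Parameter bounds (lower, upper); names and values are illustrative only.
BOUNDS = {"W1": (1e-6, 200e-6), "L1": (0.25e-6, 2e-6), "Ibias": (0.1e-3, 5e-3)}
TARGETS = {"gm": 1.895e-3, "HD3_dB": 40.0}
WEIGHTS = {"gm": 10.0, "HD3_dB": 1.0, "power": 100.0}

def simulate(params):
    """Placeholder for the netlist simulation plus specification extraction.
    In the real flow this step runs a circuit simulator and parses its output."""
    return {"gm": 0.4 * params["Ibias"],            # dummy stand-in relations
            "HD3_dB": 30.0 + 5.0e6 * params["L1"],
            "power": 2.5 * params["Ibias"]}

def cost(params):
    """Combine adherence to the specifications with a power-minimization term."""
    specs = simulate(params)
    c = WEIGHTS["gm"] * abs(specs["gm"] - TARGETS["gm"]) / TARGETS["gm"]
    c += WEIGHTS["HD3_dB"] * max(0.0, TARGETS["HD3_dB"] - specs["HD3_dB"])
    c += WEIGHTS["power"] * specs["power"]
    return c

def random_member():
    """One parameter set drawn randomly within the defined bounds."""
    return {p: random.uniform(lo, hi) for p, (lo, hi) in BOUNDS.items()}

def evolve(population, n_keep=10):
    """Selection and crossover: keep the best members and recombine them."""
    ranked = sorted(population, key=cost)[:n_keep]
    while len(ranked) < len(population):
        a, b = random.sample(ranked[:n_keep], 2)
        ranked.append({p: random.choice((a[p], b[p])) for p in BOUNDS})
    return ranked

population = [random_member() for _ in range(25)]   # a fairly small population of 25
for _ in range(10):                                  # a predefined number of iterations
    population = evolve(population)
best = min(population, key=cost)
```

In the actual tool the weights and bounds would be adapted interactively by the designer between such rounds, as described above.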
5.2.1 Example
An example of automatic transconductor sizing is given to illustrate the methods described above. A transconductor circuit taken from [9] is used (Figure 12) and it is designed in a CMOS technology with a 2.5V supply voltage. The goal is to obtain a transconductor that can be used in the example filter described previously. For example, the input gm must drive an integrating capacitor of 1.468pF and have a value of 1.895mS. The third-order harmonic distortion of the differential solution should be at least 40dB below the fundamental, and the output conductance must meet the specification derived earlier. A netlist is constructed for the initial simulation and for the LPF configuration. The Matlab tool uses these netlists; it merely adds parameter declarations and retrieves specifications by parsing the simulator output.
Next the cost is calculated. For example the cost for the required gm is calculated as:
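One possible form of such a cost term is sketched below; $w_{gm}$ is a weighting factor and $g_{m,\text{spec}} = 1.895$ mS the required value. The exact expression used by the tool may differ from this sketch.

$$ \text{cost}_{gm} = w_{gm}\,\frac{\left| g_{m,\text{sim}} - g_{m,\text{spec}} \right|}{g_{m,\text{spec}}} $$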
To the basic set of requirements some other costs are added that ensure, for example, that no transistors are in the subthreshold region or that the input and output common-mode voltages are the same. Also a cost proportional to the total power consumption is added to obtain a minimal-power solution. To start with, a differential input signal with a voltage swing of 300mV is applied. A fairly small population of 25 is randomly chosen by the genetic algorithm. After about 5 rounds of 10 iterations, always continuing with the previous best population and every time slightly adapting the weights, it is clear that combining the required gain-bandwidth (GBW) of 230MHz and the 40dB third-order harmonic distortion is not feasible. Hence the designer has the choice to lower the specifications, change the parameter bounds or accept the imperfect solution. First the input swing is lowered from 0.3 to 0.2V. After 10 more iterations the distortion-GBW combination is still not satisfactory. The bounds are then changed to allow for a larger biasing current and larger transistors. After another 40 iterations a good result is obtained. In total about 110 iterations were necessary, for a total design time of about 9.5 hours (mainly computer simulation time), but without requiring any explicit knowledge about the working of the circuit. 2750 (=110*25) circuit evaluations is a typical amount for optimization problems like this.
In Figure 12 the sized circuit with all device sizes is depicted and in Table 2 some key performance results are summarized. In Figure 13 a simulated FFT of the differential output signal is shown, indicating the HD3 of 41.5 dB.
5.3 Extra integration considerations
Finally, a layout of the filter has to be made. Automatic layout tools exist that, for example, generate a layout starting from a template file [10], which is a netlist extended with matching and symmetry statements. The physical layout introduces interconnect and transistor parasitic capacitances. Both can be taken into account to some extent by predistorting the integrating capacitor values with the amount of parasitic capacitance present on the same node. That is why, for high-frequency filters, topologies in which all nodes are connected to an integrating capacitor are more suitable. Actually, the amount of predistortion that is needed forms a constraint on the size of the integrating capacitors in high-frequency designs. The interconnect capacitance also has to be as equal as possible for the differential nodes in the circuit. Another important factor is that the amount of parasitic transistor capacitance can shift due to process variations. In order not to let this alter the transfer function, dummy transconductors [11] and tuning are introduced. This is not elaborated further in this text.
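As a simple sketch of the predistortion mentioned above (with self-evident symbols of our own choosing), the capacitor that is actually laid out on a node is the designed integration capacitance minus the parasitic capacitance already present on that node:

$$ C_{\text{layout}} = C_{\text{int}} - C_{\text{par,interconnect}} - C_{\text{par,transistor}}, \qquad C_{\text{layout}} > 0 . $$

The requirement that this value remains positive and well-controlled is the constraint on the minimum integrating capacitor size referred to in the text.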
6 Conclusions
The systematic design flow for high-frequency continuous-time gm-C filters has been described, with emphasis on methods and algorithms that allow for design automation. The different steps have been described, starting with high-level considerations such as the filter approximation, continuing with the realization, implementation and optimization steps, and finalizing with the automatic sizing of a transconductor. In addition, the discrete scaling problem for an optimal-dynamic-range integrated solution has been covered, and a method to derive large-signal harmonic distortion information from transient simulations has been presented. The flow has been illustrated with a fully elaborated filter design example, from filter specifications down to fully sized circuits.
References
[1] F. Rezzi, I. Bietti, M. Cazzaniga, R. Castello, "A 70-mW seventh-order filter with 7-50 MHz cutoff frequency and programmable boost and group delay equalization", IEEE Journal of Solid-State Circuits, vol. 32, Dec. 1997, pp. 1987-1999.
[2] E. Lauwers and G. Gielen, "ACTIF: A high-level power estimation tool for analog continuous-time filters", Proceedings of the International Conference on Computer Aided Design, IEEE/ACM, Nov. 2000, pp. 193-196.
[3] "The circuits and filters handbook", IEEE Press, 1995, ISBN 0-8493-8341-2.
[4] A.I. Zverev, "Handbook of filter synthesis", J. Wiley & Sons, 1967, ISBN 0-471-98680-1.
[5] G. Groenewold, "Optimal dynamic range integrated continuous-time filters", Ph.D. thesis, T.U. Delft, Delft University Press, 1992, ISBN 90-6275-755-3.
[6] S. Pavan, Y. Tsividis, K. Nagaraj, "Widely programmable high-frequency continuous-time filters in digital CMOS technology", IEEE Journal of Solid-State Circuits, vol. 35, April 2000, pp. 503-511.
[7] P. Wambacq, W. Sansen, "Distortion analysis of analog integrated circuits", Kluwer Academic Publishers, 1998, ISBN 0-7923-8186-6.
[8] C. Houck, J. Joines, M. Kay, "A genetic algorithm for function optimization: A Matlab implementation (GAOT)", http://www.ie.ncsu.edu/mirage/GAToolBox/gaot/
[9] F. Krummenacher, N. Joehl, "A 4-MHz CMOS continuous-time filter with on-chip automatic tuning", IEEE Journal of Solid-State Circuits, vol. 23, June 1988, pp. 750-758.
[10] K. Lampaert, G. Gielen, W. Sansen, "Analog layout generation for performance and manufacturability", Kluwer Academic Publishers, April 1999, ISBN 0-7923-8479-2.
[11] W. Dehaene, "CMOS integrated circuits for analog signal processing in hard disk systems", Nov. 1996, ISBN 90-5682-055-9.
Structured LNA design
Ernst H. Nordholt
Catena, Elektronicaweg 40, 2628 XG Delft, The Netherlands
Abstract A qualitative structured approach to the design of low-noise amplifiers is presented. Emphasis is on design methodology rather than on specific specification issues. The LNA configuration is synthesized based on the requirements of the source and the load. Some examples will support the design methodology.
1. Introduction.
LNA design is traditionally approached as a copy-and-modify action on proven concepts known from the literature or recommended by colleague designers. Usually it requires a lot of iterations to realize the required specifications, if they are achievable at all with the chosen concept. There is another way to approach the design. It is based on qualitative reasoning, supported by a proper set of generic models, by error-reduction techniques, as well as by a good understanding of the influences of various design measures. This method leads to a quick decision with respect to the choice of a good amplifier configuration. For this purpose we first have to study the properties of the signal source in terms of its modeling and its signal spectrum. This will lead to criteria for the input port of the LNA. Criteria for the output port will have to be derived from the properties of the load, which is usually a mixer or a set of mixers. Depending on these properties we may have very different requirements for different source and load conditions, which means that we have to investigate various amplifier configurations. The transfer properties of the source quantity (voltage, current or power) to the load quantity in terms of noise, linearity and speed are related to these criteria.
Due to the biasing elements there will usually be additional input ports, which receive unwanted signals. The sensitivity of these ports should be made sufficiently low so as to have sufficiently low interfering signals at the output port. Especially in system-on-chip applications this plays an extremely important role. Techniques for desensitizing these parasitic input ports will be discussed separately. Covering all aspects of LNA design in one article is impossible. Therefore we will restrict ourselves to qualitative aspects and some examples, without providing a full background. The interested reader is referred to [1] for this purpose.
2. Properties of the signal source; LNA requirements.
We will assume that the LNA is used as the first block in a radio system and therefore has to process the signals generated by an antenna. These signals normally cover a very wide spectrum. It contains many channels from which we wish to make a selection. As an example we consider the European FM broadcast band, which ranges from 87.5MHz to 108MHz. The radio channels in this band are on a 100kHz raster. A nearby transmitter might generate a signal in the antenna of 2V. For remote transmitters or under poor reception conditions we still wish to receive signals in the order of a few microvolts. So we need to cover a dynamic range of signals in the order of 120dB. It may be clear that it is not possible to linearly handle such a large dynamic range in a fixed-gain amplifier operating at a supply voltage of, for example, 3V. Moreover, outside the FM band we have a lot of other radio and TV signals that can be very strong and could easily overdrive the LNA if no proper measures were taken, such as prefiltering and automatic gain control. It is frequently assumed without further discussion that the impedance of the source for the LNA has a standard real value. The antenna may have a fixed and real impedance in the frequency band of interest under ideal conditions. In the proximity of other objects, the impedance may become different and frequency dependent. Also the received signal levels depend on the environment.
Though it may be convenient from a measurement point of view to standardize on characteristic impedances, there is no fundamental reason why an antenna should be terminated for optimum power transfer. The required termination is much more determined by the way the antenna is connected to the LNA input. Frequently we need a cable or a stripline when the antenna is not extremely close to the LNA input port. Moreover, we need filtering between the antenna and the input of the LNA. The filter will have to prevent the receiver from becoming sensitive to all kinds of unwanted channels due to nonlinearities in the front-end signal path, and, in the case of heterodyning, to image channels or channels related to local-oscillator harmonics. The cable will need a real termination in order to avoid power reflections and thereby (strong) fluctuations in the sensitivity. A filter, when implemented with lossless elements (LC), will need a real impedance termination at least at one port in order to bring the poles in their required positions. Sometimes real terminations at both ports are needed in order to arrive at practical inductor and capacitor values. Due to the losses in these components, the insertion loss of the filter will sometimes be sufficiently high to obtain the required filter characteristic without real terminations. The conclusion is that we do not always need or wish to have an optimum power transfer. Sometimes it is more convenient to have either a high- or a low-impedance termination. At the output of the LNA, we may also have different requirements. The most common situation is that we have to drive one or more mixers that convert the input frequency band to another band (heterodyning), centered about the so-called IF frequency. Depending on the type of mixer, we may need a voltage-source character or a current-source character for the LNA output port. In some cases it may be necessary to have additional band-pass filtering, which again may require real-impedance terminations at the input or output port or at both ports. In the case where we have more than one mixer, the output signal of the LNA has to be distributed to all of them. This all means that only in certain cases a definition of the power gain of an LNA can be meaningful. Frequently we will have to define the gain in terms of voltage or current gain, or in terms of a transimpedance or a transconductance. On top of that, we may have special requirements on the input or on the output impedance.
From this perspective it is useful to define ideal amplifier types before we start designing an LNA.
3. Ideal amplifiers and basic implementations. In the introduction it has been made plausible that different amplifier types may be needed as LNAs. We can characterize all of these by means of a transmission-parameter representation:
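The representation referred to here can be written as the usual transmission-parameter (chain) equations; this is a sketch consistent with the use of A, B, C and D later in the text, with port 1 the input, port 2 the output, and with signs depending on the chosen current directions:

$$ \begin{pmatrix} v_1 \\ i_1 \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} v_2 \\ i_2 \end{pmatrix}, \qquad \text{i.e.} \quad v_1 = A v_2 + B i_2, \; i_1 = C v_2 + D i_2 . $$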
where we use associated directions for voltages and currents. The ideal amplifiers with only one transmission parameter different from zero are characterized as follows:
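A sketch of this characterization, derived directly from the transmission-parameter equations above (the gain expressions are ours, not quoted from the original table):

amplifier type              nonzero parameter   transfer             Zin        Zout
voltage amplifier           A                   gain 1/A             infinite   0
transadmittance amplifier   B                   gm = 1/B             infinite   infinite
transimpedance amplifier    C                   Zt = 1/C             0          0
current amplifier           D                   gain 1/D             0          infinite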
As can be seen from this table, the input and output impedances are either zero or infinity. As a matter of fact these amplifier types define the ideal controlled sources (transactors). They are perfectly linear, produce no noise and have an infinite available power gain. They sense either the source voltage or the source current in an ideal way and deliver a voltage or a current to the load. Approximations of these transactors can be realized as single-loop negative-feedback configurations. The other useful amplifier types have an input impedance and/or an output impedance different from zero or infinity.
From this table it is not yet completely clear how we can realize them. For the first and last two configurations the starting point could be one of the transactors of the first table, where we use series or shunt impedances at their input or output ports in order to fix the input or output impedance. In cases where these impedances are intended as dissipative matching terminations for the source or the load, they will, however, seriously affect the noise performance (at the input) or the power dissipation (at the output). The only good way to realize these properties in an LNA is by using negative feedback, where we use two feedback loops instead of one. It is important that the feedback networks are chosen in such a way that they minimally affect the noise performance or the power consumption. We can distinguish various types of feedback. Setting up a ranking order in terms of increasing effects on noise, distortion and dissipation, we have:
1. Non-energic feedback, to be realized by using elements which are lossless and without memory (transformer, gyrator, short-circuit, open-circuit).
2. Lossless feedback, using lossless components with memory (L, C).
3. Resistive feedback, using resistors.
4. Active feedback, with one or more active devices in the feedback network.
5. Indirect feedback, using copies of input or output stages (example: current mirror).
Non-energic feedback is not a very realistic concept, except for realizations with short and open circuits. To classify negative feedback amplifiers, however, it is a good starting point. Figure 1 shows an example of a feedback configuration around a gain block (ideally a nullor) with four feedback loops, fixing all four transmission parameters, from which all other configurations can be derived.
It can easily be proven that the equivalent input noise sources of the gain block (provided that all its transmission parameters have a value very close to zero) have very nearly the same values as the equivalent noise sources of the amplifier in the case of one or two feedback loops. This means that the feedback networks don’t have any influence on the noise figure. Since there is no power dissipation in the feedback elements, any influence on the power dissipation is also absent. Single loop feedback configurations using short and open circuits in their feedback network are known as the voltage follower and the current follower with unity voltage and current gain, respectively. They are shown in figure 2.
The use of transformers for a limited frequency range may be a possibility on silicon. However the insertion loss will normally be unacceptably high so that no advantage over resistive feedback
would be obtained. Lossless feedback is a more realistic concept since it makes use of capacitors and/or inductors. Transconductances and transimpedances will become frequency dependent using these elements. In voltage and current amplifiers it is still possible to realize wide-band amplifiers since the transfer function will be determined by impedance ratios. This type of feedback has an influence on the noise figure and on the power dissipation although the elements do not generate noise themselves or dissipate power. Resistive feedback is the most commonly used type of feedback in wide-band amplifiers. Due to their thermal noise generation and power dissipation the resistors will contribute to an increase of the noise figure and the power dissipation. These influences can, however, be much less than those of shunt or series resistors at the input or output port of the amplifier. Active feedback configurations use active devices in their feedback network(s) which may introduce unacceptable non-linearity or noise. Indirect feedback normally has a large influence both on the noise and the power dissipation and is therefore the least attractive feedback technique. It should be avoided in LNAs as much as possible. For quantitative aspects of these types of feedback we refer to [1]. Before we will discuss the design of some basic LNA examples, we will briefly address some modeling considerations.
4. Device modeling considerations. For the purpose of a systematic approach, we need different device models at different stages in the design. In order to find design strategies, we need generic device models that allow us to investigate the influence of error reduction techniques such as negative feedback, compensation, isolation and error feedforward. Once we understand these influences, their benefits and their drawbacks, we can develop a suitable circuit topology and design and verify in more detail with specific device models with
increasing complexity. The proposed generic models are: 1. A non-linear two-port model using generalized non-linear (voltage-controlled) i – v relations will be used for investigating the influence of error-reduction techniques on non-linear behavior:
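A sketch of this model in generic notation (the function symbols are ours): the two-port is described by two nonlinear functions of the port voltages,

$$ i_1 = g_1(v_1, v_2), \qquad i_2 = g_2(v_1, v_2), $$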
where we use associated directions for voltages and currents. The fact that we have to supply DC power to the actual device in order to obtain power amplification can be made explicit by defining an operating point, characterized by the DC port voltages and currents. For excursions from this operating point, to be marked by a tilde (~), a power amplification in these devices is possible, provided that we have chosen the correct operating conditions. We separate the excursions from the operating-point quantities and the equations can thus be written as:
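A sketch of this separation, in our own notation (capitals for operating-point quantities, tildes for excursions):

$$ I_1 + \tilde i_1 = g_1(V_1 + \tilde v_1,\; V_2 + \tilde v_2), \qquad I_2 + \tilde i_2 = g_2(V_1 + \tilde v_1,\; V_2 + \tilde v_2), $$

with $I_1 = g_1(V_1, V_2)$ and $I_2 = g_2(V_1, V_2)$.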
Based on these equations, we can then build a new two-port, where the operating point is translated to the origin by adding the operating-point sources to the input and the output, respectively, as shown in figure 3.
The equations of the new two-port can then be written as:
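A sketch of these equations, with primed functions denoting the nonlinearities shifted to the origin (the primes are our notation):

$$ \tilde i_1 = g_1'(\tilde v_1, \tilde v_2), \qquad \tilde i_2 = g_2'(\tilde v_1, \tilde v_2), \qquad g_k'(\tilde v_1, \tilde v_2) = g_k(V_1 + \tilde v_1,\, V_2 + \tilde v_2) - I_k , $$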
where the new functions can formally be found from the equations of the original device. The advantage of the generic modelling is that it fits all three-terminal devices, whether it is a bipolar transistor, a field-effect device or even a vacuum tube. It is particularly useful for studying the influence of compensation techniques (e.g. balancing).
2. As for the ideal amplifier types discussed in section 3, a linear two-port model using transmission parameters will be used for investigating the influence of error-reduction techniques on small-signal and noise behavior, using the transmission-parameter equations already given in section 3,
where A, B, C and D are in general complex quantities representing the transmission parameters. The reason why we use these is that we frequently wish to approximate the behavior of an ideal transactor, in the sense that the behavior is as much as possible determined by well-fixed and linear transmission parameters. Another reason is that they greatly simplify noise calculations [1]. The specific design models are as a matter of course based on the complete device models (such as Gummel-Poon, BSIM, etc.). In RF processes, the models can be even more complicated than these, thereby resulting in subcircuits. These models are completely useless for initial design purposes. Therefore, we need additional simplified models. The following device-specific models are proposed:
3. Derive a simplified non-linear model, not dealing with memory effects, from the complete model. Together with a graphical representation of the device characteristics, this model can be used for discussing biasing techniques and evaluating behavioral aspects relating to operating point and low-frequency non-linearity.
4. From this simplified model, we derive a linearized (small-signal) model to which we add small-signal capacitances, resulting in the hybrid-π equivalent circuit for the bipolar transistor and in similar equivalent circuits for the other devices. This model allows us to study the small-signal dynamic behavior in terms of gain, immittances, poles and zeros, impulse response, etc.
5. In order to obtain a noise model, we add (stationary) noise sources to the small-signal model. In general these noise sources will depend on the biasing conditions, so that this model allows us to optimize the noise performance for a certain source impedance.
The small-signal models for the different device types have the same topology. The bipolar transistor has the most complicated model, the hybrid-π model, which is shown in figure 4.
The FET model has a similar but simpler topology. The noise models for these two types of devices are given in figures 5 and 6, respectively.
Note that the 1/f noise in the bipolar transistor occurs in the input current noise source whereas in the FET it occurs in the output current source. The influence of these 1/f noise sources is therefore distinctly different for different source impedance conditions.
5. Single stage LNA examples.
5.1. Introduction.
Although the design methodology as advocated here is illustrated by LNA topology examples with bipolar transistors, it is a general methodology that can equally well be applied to MOS transistors, junction transistors or even vacuum tubes.
As a matter of course, the models of these other devices are different in terms of their static, dynamic and noise performance. It may be clear that we will have specific process-technology requirements, dependent on frequency and dynamic range. Moreover, some topologies are more suited to be used at lower supply voltages than others. It is not the purpose of this text to discuss all these aspects but rather to illustrate the way of reasoning, the use of generic models and error-reduction techniques. In order to keep it simple we will even ignore the dynamic behavior in our examples. We will discuss in this section some examples of single-stage amplifiers based on an idealized bipolar transistor biased in its normal operating region. It is characterized by its exponential relation between base-emitter voltage and collector current and a fixed relation between base current and collector current:
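In the usual textbook form, with $I_S$ the saturation current, $V_T = kT/q$ the thermal voltage and $\beta_F$ the forward current gain (a sketch of the intended relations):

$$ I_C = I_S \exp\!\left(\frac{V_{BE}}{V_T}\right), \qquad I_B = \frac{I_C}{\beta_F}. $$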
Although this is a highly simplified model for the device, it serves our purpose of showing trends in performance improvement due to different design measures. Ignoring further second-order effects, the generic non-linear voltage-controlled model leads to the following equations for excursions from the operating point:
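A sketch of these excursion equations, following from the exponential relation above (our notation):

$$ \tilde i_c = I_C\left(e^{\tilde v_{be}/V_T} - 1\right), \qquad \tilde i_b = \frac{\tilde i_c}{\beta_F}. $$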
Alternatively we can write these equations in a current-controlled form as:
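A sketch of this current-controlled form, obtained by inverting the previous expression:

$$ \tilde v_{be} = V_T \ln\!\left(1 + \frac{\tilde i_c}{I_C}\right), \qquad \tilde i_b = \frac{\tilde i_c}{\beta_F}. $$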
The small-signal model (without memory effects and the base resistance) is given in figure 7.
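In the standard hybrid-π notation (which may differ from the symbols used in figure 7), the element values are:

$$ g_m = \frac{I_C}{V_T} = \frac{q I_C}{kT}, \qquad r_\pi = \frac{\beta_F}{g_m}. $$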
In figure 8, we have added the shot-noise sources associated with the collector and base currents. These can alternatively be written as equivalent thermal noise sources:
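The corresponding spectral densities are the standard ones (a sketch, not necessarily the exact expressions used in figure 8):

$$ S_{i_c} = 2 q I_C = 2kT g_m, \qquad S_{i_b} = 2 q I_B = 2kT \frac{g_m}{\beta_F}, $$

so that the collector term is equivalent to the thermal noise of a conductance $g_m/2$ and the base term to that of a conductance $g_m/(2\beta_F)$.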
Note that we have left out the base resistance and its associated noise source, a simplification that is only allowed when the base resistance is sufficiently small. In many cases, however, it contributes significantly to the noise (as the gate resistance does in a MOSFET).
5.2 CE stage.
As an example, we will now consider a single-stage amplifier implemented as a common-emitter stage, intended as an LNA for a resistive source. The load is modeled as a noise-free resistor; the noise contribution of the load can then be modelled by two equivalent noise sources representing the noise of the block that is driven by the amplifier. This amplifier is shown in figure 9, where we have added the four biasing sources to the transistor in order to translate the operating point to the origin.
Although this is not a realistic way of biasing, it serves our purpose of highlighting the amplifier properties without any effect of the biasing components. Due to the (over)simplified modeling of the transistor (A = 0, C = 0), only the noise-current source at the output will contribute to the total equivalent noise of the amplifier. This noise will be transformed to the input into two equivalent sources:
Note that we cannot define a noise figure for the block behind the amplifier in a situation like this, since the amplifier has an infinite output impedance. Assuming that we can ignore the contributions of this block with respect to the noise production of the transistor itself, we find (with a Norton-Thevenin transformation) the total equivalent input noise:
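A sketch of the classical result for a bipolar stage driven from a source resistance $R_S$ (assuming the base shot noise acts as the equivalent input current source and neglecting correlation; the exact expression intended here may differ):

$$ S_{v,\text{tot}} = \frac{2kT}{g_m} + 2 q I_B\, R_S^2 . $$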
This noise can be minimized by making the two contributions of the transistor equal, which leads to an optimum value of the transconductance, and hence of the collector bias current, for the given source resistance. In the present example the optimum works out as follows.
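Equating the two contributions gives the familiar result (a sketch; $R_S$ denotes the source resistance, whose value is not stated in this text):

$$ g_{m,\text{opt}} = \frac{\sqrt{\beta_F}}{R_S}, \qquad I_{C,\text{opt}} = \frac{V_T \sqrt{\beta_F}}{R_S}. $$

For example, assuming a 50 Ω source and $\beta_F = 100$ (assumptions of ours), this expression gives the 5 mA quoted below.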
This requires a collector bias current of 5mA. The noise figure then reaches its minimum value. The input impedance of the amplifier at this bias current is, however, not equal to the source resistance, so we do not have an input power match. If we need a matched termination, we would need a bias current of 50mA or we would have to use a shunt resistance. The first option would lead to NF = 1.76dB, but is quite unrealistic in view of its current consumption. The second option would lead to a noise figure of at least 3dB. A combination of noise optimization and power matching in one CE stage appears to be impossible without using other techniques. Before we consider these possibilities, we will briefly look at linearity issues. First, we assume that the input impedance of the stage is much higher than the source impedance. The source voltage then appears entirely on the base-emitter junction. In this situation we can evaluate the compression and intercept points. Here, we consider the third-order intercept point (IP3) exclusively. For this purpose we apply two voltage sources at two different frequencies at the input, each with the same rms value. Note that we present the IP3 figures in relation to the open source voltage, which is not conventional! The intermodulation component at the output is then 68dB below the required signal components, and the third-order intercept point follows from this figure. As the bias current is made larger, the linearity will improve, however not very significantly. For example at 5mA, where the noise optimum occurs, we still have about 90% of the signal on the base-emitter junction and the IP3 will increase by about 2dB. At 50mA, where we obtain a matched termination, the IP3 will further increase.
5.3 CE stage with matching network.
The combination of noise optimization and power matching is not possible for a CE stage. When we use a matching network at the input, we can, however, manipulate the dynamic range. A wide-band match would require a transformer; a narrow-band match can be obtained by means of an LC network. In figure 10 we show two examples, where a matched termination of the source is supposed to be realized.
Assuming that we have no insertion loss, the source impedance is ideally transformed by a factor n², where n is the turns ratio of the transformer. Let this transformation factor be 10, so that we need to realize an input impedance of ten times the source impedance in the CE stage. This will require a bias current of about 5mA (as opposed to 50mA in the previous section). We will find that in this case NF = 1.76dB (as in the 50mA case without matching network). Because the base-emitter voltage is larger than in the 50mA case, the IP3 will be about 10dB lower.
A noise figure of 1.76dB can also be obtained when we transform the source impedance downwards instead. In that case we need a bias current of 500mA to obtain a matched termination. Since the base-emitter voltage is now much lower, the IP3 will be higher.
5.4 Single stage with non-energic feedback.
From the previous sections it is clear that we create some more degrees of freedom by using a matching network in combination with a CE stage. In this section we will show that the application of negative feedback can do the same, however at a much lower current consumption. For this purpose we consider a single-stage configuration where the current gain is made close to unity by means of non-energic shunt feedback at the input in combination with series feedback at the output. This results in a common-base (CB) stage. All transmission parameters remain virtually the same as in the CE stage, except the parameter D, which obtains a value close to unity. The non-energic feedback will lead to virtually the same values for the equivalent input noise sources as in the CE stage. The input impedance (with A and C still equal to zero) is approximately the inverse of the transconductance. If we wish to terminate the source in a matched impedance, we have to bias the stage at about 0.5mA. In this operating point, the noise voltage source is dominant and largely determines the noise figure. The base-emitter voltage now equals half the open source voltage and therefore we can expect a better linearity than in the CE stage in the same operating point. The IP3 now has the same value as was obtained for the CE stage at a bias current of 50mA! Also in this case, we cannot combine a noise optimization with a power match. The optimum noise figure would be obtained at a collector bias current of 5mA, the same as in the CE stage. The input impedance will then be much lower than the source impedance. When the antenna is very close to the LNA input, this could be a proper termination. The source current is sensed in this way and it allows band-pass filtering with a series LC tank as shown in figure 11.
In this case we find that the exponential i-v relation of the base-emitter junction has much less influence on the non-linearity. We find NF = 0.45dB together with a high IP3. This leaves us with the question how to realize such a high dynamic range while simultaneously meeting the power match and the noise optimum. For this purpose we need dual-loop negative feedback.
5.5 Non-energic dual-loop feedback.
The CB stage uses a single negative feedback loop. It uses series feedback at its output port, giving it a current-source character at the output, and shunt feedback at the input port, resulting in a low input impedance, thereby making it suitable for current sensing. In the operating point that optimizes the noise performance for the given source impedance, the input impedance is much too low for obtaining a power match. By applying an additional feedback loop that realizes series feedback at the input, we can increase the input impedance to the required value. Preferably, this second feedback loop would also use a non-energic element. Looking now at the table of useful amplifier types, this element should increase either the transmission parameter B or the parameter A. Increase of the parameter B requires output series feedback (current sensing) together with a conversion into a voltage to be applied in series with the input. It would therefore need a gyrator as a non-energic negative feedback element, and this cannot be considered a realistic option. The second case leads to an influence of the load impedance on the input impedance: for the input impedance to obtain the required value, the load impedance must have a well-defined value as well. In order to linearize the v-i relation of the input port, we would also require the load to have a linear v-i relation (assuming that the feedback element is linear). If the load is the input impedance of a block behind the LNA, this is not very likely. Therefore, we have no other good option than to use a linear resistor as a load for the stage and to make sure that this load is not significantly influenced by the load of the block behind the LNA. For this purpose we could use a voltage follower when the load has to be driven by a voltage, or a transconductance stage when it has to be driven by a current. We will present one of these options as an example. The first step is to bias a CB stage for optimum noise performance (5mA for the given source) and provide this stage with a linear resistive load. Note that this load resistance affects the noise figure as if it were in parallel with the input. Therefore, it should be much larger than the source resistance. The parameter A should then have a value of A = 0.05. For this purpose we can use a transformer as a non-energic element with a turns ratio of 20. The circuit could look like figure 12.
Due to the load resistor, the noise figure is somewhat higher than the absolute minimum. The IP3 increases further. Much better figures cannot be achieved in a single idealized stage. With a real transformer, the primary inductance should be large enough not to form an additional load for the CB stage at the frequencies of interest. This is usually an unrealistic requirement.
5.6 Single stage lossless and resistive feedback.
An alternative for the transformer is a lossless or a resistive 20:1 voltage divider. Lossless feedback would affect the load impedance so that the wide-band character is lost, whereas resistive feedback leads to a resistor in the base lead of the transistor. This would result in a noise figure of more than 3dB. Using an additional voltage (emitter) follower solves this problem in the way shown in figure 13.
Due to the noise of this stage and its resistive load, noise and linearity performance will degrade with respect to non-energic feedback. We find NF = 1.6dB and a somewhat lower IP3. We can alternatively base the design on the fourth type of amplifier in the table, with a dual-loop configuration. The input impedance is then determined by the transconductance and the transimpedance together with the load impedance. Obviously, we have to fix both the transconductance and the transimpedance of the stage, and we need a resistive linear load at the output port. We can base such a design on a CE stage, where the transconductance is chosen for optimum noise performance for the given source impedance; the required transimpedance then follows. The transmission parameter C is determined by shunt feedback both at the input and at the output, resulting in the circuit of figure 14.
For this circuit we achieve a power match at NF = 0.6dB together with a high IP3. Also here a voltage follower or a transconductance amplifier would be needed when the load has to be driven by a voltage or a current, respectively. If we do so, we might as well include the second stage in the transimpedance feedback loop, as shown in figure 15. This doesn't affect the previous figures significantly, but we can now load the output stage with much less effect on the input impedance.
5.7 Conclusions on single stage configurations.
We have demonstrated in this section how we can manipulate the dynamic range of single-stage low-noise amplifiers. In order to avoid the use of transformers, the CB stage, being a non-energic feedback configuration, is a good starting point for obtaining a high-dynamic-range LNA. Especially in cases where we wish to sense the source current, the CB stage will be a good option, as demonstrated in section 5.4. A power match can be obtained at a low bias current, which results in a relatively high dynamic range. When the range has to be extended, we need an additional stage, as argued in section 5.5. We did not discuss the realization of a high input impedance, which would be needed for sensing a source voltage. It goes without saying that the emitter follower and the transconductance stage (sometimes labeled as emitter degeneration) are the best single-stage options in that situation. As was mentioned in the introduction, LNAs may have more ports where interfering signals from sources other than the desired one can enter into the amplifier. In discrete designs, we can normally take proper measures to avoid this. For example, we can have a good ground plane as well as proper decoupling of the supply voltage. In an integrated circuit, the bondwires make the internal ground and supply connections relatively high-impedance nodes at high frequencies. Common supply and ground rails can therefore act as serious sources of interference. Moreover, coupling between bondwires and package pins may introduce other sources. Finally, substrate coupling will be an important issue, especially in systems where digital processing is done on the same chip. The design strategy in such cases tries to desensitize all parasitic input ports of sensitive parts in the system by using balancing and isolation techniques. Simultaneously it has to lower the production of interfering signals by using similar techniques (for example current-mode logic in extreme cases). In the next section we will discuss a design approach.
6. Balanced configurations. 6.1. Introduction. Besides the technique of negative feedback some other techniques are required especially to prevent interfering signals from entering into the LNA through unintended input ports. The most powerful technique uses a combination of balancing and isolation. The result of this technique for single devices is the well-known differential pair. Under ideal drive and load conditions, a differential source signal and a differential load, we obtain favorable properties with respect to the sensitivity to interfering signals. Frequently it is difficult to meet these conditions since the source signal is normally not available in differential mode. The load is normally on chip and can therefore be made differential. In this section we will briefly present a general approach to the synthesis of balanced configurations.
6.2 Odd function synthesis. A differential pair has odd v – i characteristics as opposed to a single stage where both even and odd terms describe the Taylor expansion of these characteristics. In a structured design approach it is useful to develop a synthesis technique for odd functions in general and investigate the behavioral modifications that it brings about. A starting point for this technique is the use of generic
non-linear two-port equations as introduced in section 4. We can write these generic equations in six different forms (like the equations of linear two-ports). To illustrate the approach, we will only use voltage and current controlled representations of the biased two-ports. Voltage controlled representation:
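A sketch of this voltage-controlled representation, in the excursion notation of section 4 (where the functions now describe the characteristics referred to the operating point):

$$ \tilde i_1 = g_1(\tilde v_1, \tilde v_2), \qquad \tilde i_2 = g_2(\tilde v_1, \tilde v_2). $$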
Current controlled representation:
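Similarly, a sketch of the current-controlled representation (the function symbols are ours):

$$ \tilde v_1 = h_1(\tilde i_1, \tilde i_2), \qquad \tilde v_2 = h_2(\tilde i_1, \tilde i_2). $$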
Taking the voltage-controlled representation as an example and reversing the orientation of the non-linearities as shown in figure 16, the equations change into:
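Reversing the orientation means that all port voltages and currents change sign, so that a sketch of the resulting equations is:

$$ \tilde i_1 = -g_1(-\tilde v_1, -\tilde v_2), \qquad \tilde i_2 = -g_2(-\tilde v_1, -\tilde v_2). $$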
These equations do not only represent the reversed two-port but also the two-port where complementary devices replace the original devices. In other words when the original two-port contains NPN devices, the other contains PNP devices with the same values of the model parameters. Moreover, all two-terminal devices in that two-port are reversely connected. A new two-port, described by a set of odd functions is obtained when we add the input currents as well as the output currents of the original two-port and the reversed or complementary two-port as shown in figure 17.
Provided that we do not violate the two-port constraints of the individual two-ports, we can write the equations describing the combination as
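Adding the input currents and the output currents of the two two-ports, which are driven by the same port voltages, gives (a sketch in the notation used above):

$$ \tilde i_{1,\text{tot}} = g_1(\tilde v_1, \tilde v_2) - g_1(-\tilde v_1, -\tilde v_2), \qquad \tilde i_{2,\text{tot}} = g_2(\tilde v_1, \tilde v_2) - g_2(-\tilde v_1, -\tilde v_2), $$

which are indeed odd functions of the port voltages.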
We label this combination as an anti-parallel connection of two identical two-ports, or as a parallel connection of two complementary two-ports. In the case where the two-port contains a three-terminal device like a bipolar transistor, the two-port constraints will be violated in an anti-parallel connection. The result is still an odd function, but the two-port will behave as the anti-parallel connection of two diodes. The equations will not be valid for this configuration. In the parallel connection of two complementary three-terminal devices, however, the equations are valid. As an illustration, consider the configuration of figure 18, where we have two biased complementary transistors in parallel.
It is immediately clear that the biasing current sources can be omitted since they cancel in pairs. Using the simplified device equations for the bipolar transistor of section 5, we find
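With the exponential excursion relation of section 5 it follows that (a sketch, with $\tilde v$ the common base-emitter excursion and $I_C$ the operating-point current of each device):

$$ \tilde i = I_C\left(e^{\tilde v/V_T} - 1\right) - I_C\left(e^{-\tilde v/V_T} - 1\right) = 2 I_C \sinh\!\left(\frac{\tilde v}{V_T}\right). $$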
Note that these characteristics show no compression. They are expanding, up to the level where the signal is limited by the supply voltage (or current). This illustrates that the frequently assumed relation between the 1dB compression point and IP3 does not always exist. Due to the odd characteristics, we find ideally no even-order distortion or intermodulation products. It is interesting to see what happens to the transmission parameters and the equivalent noise sources of such a stage. Without proof (easy to verify) we will find that the transmission parameter B is halved (twice the transconductance of a single device) and that the transimpedance (1/C) is halved. For the noise sources, we find that the spectrum of the equivalent noise voltage source at the input has half the value of that of each single two-port, whereas the spectrum of the equivalent input current source is doubled. This means for example that we can obtain the same noise figure as in a single two-port at half the bias current. This example has a limited practical value, since normally we have no good PNP transistors in a process. For MOS devices it may be more useful. Furthermore, since the configuration will be biased by voltage sources, there is normally no good isolation from the supply and ground rails. We can achieve much better isolation by using anti-series connections, both at the input and at the output port. These follow from the current-controlled representation. To synthesize these, we take both the original two-port and the reversed (or complementary) two-port and connect them as shown in figure 19.
Obviously, we can now omit the bias voltage sources, since these cancel in pairs. This leads in practice to much better isolation possibilities. The resulting equations now become:
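A sketch of these equations, obtained by adding the port voltages of the two two-ports, which now carry the same port currents:

$$ \tilde v_{1,\text{tot}} = h_1(\tilde i_1, \tilde i_2) - h_1(-\tilde i_1, -\tilde i_2), \qquad \tilde v_{2,\text{tot}} = h_2(\tilde i_1, \tilde i_2) - h_2(-\tilde i_1, -\tilde i_2). $$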
For the bipolar transistor with the simple model this leads to:
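With the current-controlled excursion relation of section 5 this gives, for the base-emitter ports (a sketch):

$$ \tilde v = V_T \ln\!\left(1 + \frac{\tilde i}{I_C}\right) - V_T \ln\!\left(1 - \frac{\tilde i}{I_C}\right) = 2 V_T\, \mathrm{artanh}\!\left(\frac{\tilde i}{I_C}\right), $$

or, in voltage-controlled form, $\tilde i = I_C \tanh\!\left(\tilde v / 2V_T\right)$.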
When converted to a voltage-controlled representation, the characteristics of the anti-series connection of two bipolar transistors are obviously compressing. Again, we will have no even-order distortion and intermodulation products. For the transmission parameters of general anti-series connected two-ports, we now find that B has been doubled, whereas C will be halved. The input voltage noise spectrum doubles and the current noise spectrum is halved. The anti-series connection of two single devices obviously yields a differential pair.
6.3 Balanced LNAs.
In order to fully profit from the properties of an anti-series balanced configuration, we need to drive it from a balanced source and make sure that also the load is balanced. This normally requires that we use a balun (balanced-to-unbalanced network) at the input of the amplifier. Ideally this would be a transformer with a floating output port. However, from an application point of view this is usually not highly appreciated. Since transformers on the chip are not very realistic, and neither are on-chip inductors because of their low Q and consequently high insertion loss, we have to find a solution outside the chip. This requires of course two pins. However, also in an unbalanced case it is wise to spend two pins on the input of the LNA. One of these can then be connected to the external ground plane, so that interfering signals have less chance of entering into the LNA. In this section we present only one configuration of a balanced LNA, which is based on the example in figure 13. Since the noise voltage source of an anti-series connection is 3dB larger than in a single stage and the noise current source is 3dB smaller, the noise optimum for a given source impedance occurs at four times the total current consumption. For the given source this would lead to a current of 10mA per transistor. In order to achieve a lower current consumption, the use of a balun that transforms the source impedance to a higher value is recommended. The biasing current for minimum noise can then be reduced to a total of 5mA. The noise figure can then again be as low as 0.45dB, provided that we use no feedback or non-energic feedback. In the first case, we have an anti-series connection of two CE stages (a conventional differential pair). The following table gives NF and IP3 values for different operating currents.
At 50mA total bias current we obtain a power match at the input. The NF and IP3 values are the same as for a single stage as could be expected. Next we look at the anti-series connection of two CB stages. Now, a power match is obtained at a total current of 0.5mA , whereas the optimum noise figure is again found at 5mA . The results are given below:
We assume now that we wish to have a power match in combination with a very low noise figure, and use the same technique with an additional emitter follower as before. Figure 20 shows this configuration, where NF = 1.45dB is obtained with a current of 2.5mA per emitter follower. A larger current in the followers will yield a higher noise figure as well as a higher IP3; at 5mA, we find NF = 1.62dB. Assuming that the supply voltage is sufficiently high, this topology is capable of directly driving a double-balanced mixer when using the collector currents of Q18 and Q20. Furthermore, the topology is also suitable for driving two of such mixers by doubling the output stages. The noise currents at the outputs, being mainly determined by the input stage, are almost fully correlated, so that the noise in the image channel can be optimally rejected. Since we have a power match, it makes sense to express the IP3 in terms of dBm, and we find IP3 = +8 dBm. Note that this figure is related to the power entering into the amplifier and not to the open source voltage as before. In such an amplifier with real transistors, we can expect the noise figure to be about 1dB higher, mainly because of the non-zero base resistances. At lower frequencies the IP3 will be close to +8dBm; at higher frequencies it drops because of decreasing loop gain. An LNA configuration like this would be capable of meeting the very high dynamic-range requirements of a high-end FM receiver.
In most receivers for mobile communication, the dynamic-range requirements are much lower, so much so that in some applications no negative feedback has to be applied. If so required, an impedance match can then be obtained by virtue of the base resistance and the input capacitance, resulting however in a noise figure larger than 3dB. The application of inductive series feedback is another popular way to fix the input impedance. Its use is limited to narrow-band applications and it requires a fair amount of silicon area. The dynamic range will benefit significantly from this type of feedback.
7. Conclusions.
This text has briefly dealt with a structured design approach for low-noise amplifiers. Although we could not provide a full background and justification for the way of reasoning, we believe that the examples sufficiently illustrate how this approach can be used. In order not to complicate things too much, we deliberately ignored the effects of reactive elements and bulk resistances in the devices. Moreover, we did not discuss the effects of biasing elements. It goes without saying that these effects need careful attention in a real design. However, their presence does not affect the way of reasoning as far as the choice of an LNA concept is concerned. Presently, LNAs are designed in silicon or silicon-germanium technology for frequencies ranging from 100kHz up to several GHz. It depends highly on the system specifications which technology is most suited to the integration of these circuits. It is safe to say that bipolar silicon and silicon-germanium are the preferred technologies for high-performance LNAs. The topologies discussed in this text are useful up to at least one tenth of the transit frequency. For other parts of the system, such as active IF filters, CMOS may be a very welcome addition. The use of RF-CMOS processes is still limited to lower performance requirements, not in the first place because the noise and speed properties would be insufficient, but mainly due to inadequate modeling and characterization.
Reference.
[1] E.H. Nordholt, "Design of High-Performance Negative Feedback Amplifiers", Delft University Press, Delft, 2000, ISBN 90-407-1247-6.
High-level simulation and modeling tools for mixed-signal front-ends of wireless systems
Piet Wambacq, Gerd Vandersteen, Petr Dobrovolny, Michael Goffioul, Wolfgang Eberle, Mustafa Badaroglu, Stéphane Donnay
IMEC vzw, Kapeldreef 75, 3001 Heverlee, Belgium
Abstract
Wireless applications such as WLAN, GSM, DECT, GPS, ... require low-cost and low-power transceivers. Moreover, a high flexibility is required when wireless terminals have to cope simultaneously with several standards. To achieve this, while maintaining high performance, the possibilities of analog and digital signal processing need to be combined in an optimal way during the realization of a transceiver. This is only possible when system designers can efficiently study trade-offs between analog and digital. Making such trade-offs is too complicated for pen-and-paper analysis. Instead, efficient simulation of mixed-signal architectures with detailed models for the different building blocks is required. This paper discusses high-level modeling and simulation approaches for mixed-signal telecom front-ends. Comparisons to commercial high-level simulators show an important reduction of the CPU time of typical high-level simulations of telecom transceivers, such as bit-error-rate computations. This efficient simulation approach, together with accurate modeling tools that include substrate noise coupling, forms an interesting suite of tools for advanced architectural studies of mixed-signal telecom systems.
1 Introduction
Front-ends of telecom transceivers perform the combination of downconversion, removal of interferers by filtering, channel selection and amplification in the receive path, and upconversion and amplification in the transmit path. These functions are implemented partially in the analog and partially in the digital domain. The amount of digital signal processing is steadily growing in modern telecom systems. The digital world offers a higher flexibility compared to analog blocks, and can (at least partially) compensate some of the signal impairments caused by analog front-end blocks. To predict the effectiveness of complicated digital compensation algorithms, the analog and the digital blocks need to be simulated together. Transistor-level simulations are not feasible for this purpose. Even a co-simulation of digital blocks at a higher abstraction level with analog blocks at the transistor level, as is possible e.g. in SABER [1], is not feasible. Indeed, typical measures of telecom systems are bit-error-rates, packet-error-rates, and so on, that require simulations in which many bits of information are propagated through the complete telecom link.
This is typically done in Monte-Carlo simulations [2]. If iterations are required per timepoint in these simulations (as is the case in SPICE transient simulations), then the simulations proceed far too slowly. High-level simulation is still not widely accepted for analog and mixed-signal systems. Analog designers have a blind confidence in circuit-level SPICE-type simulations and often prefer breadboarding or extra IC iterations to high-level modeling. This reluctance has several reasons. First, high-level models of analog blocks often depend on "low-level details". This complicates the construction of high-level models. Further, a consistent translation of the high level (whatever this may be for analog circuits) to the circuit level is a step that is difficult to automate. This means that after high-level simulations one has to start the circuit-level design almost from scratch, the only reusable information inherited from the higher level being the specifications for the circuit. In spite of their limited acceptance, high-level simulations increase the design productivity. This is confirmed by several success stories of high-level simulation of effects that take too much time or are even impossible to simulate at the circuit level. Examples are found in the simulation of analog-to-digital converters [3], digital-to-analog converters [4], phase-locked loops [5], ... Complete transceiver front-ends have an even higher complexity than the above subsystems, such that the use of high-level simulations is even more justified. Although high-level models are a necessity to come to BER simulations in acceptable CPU times, the high-level models should not hide too much detail, since otherwise important signal-degrading effects are not seen in the simulations. Thus the front-end blocks must be modeled by more than just their nominal behavior. Second-order effects that can seriously degrade the BER occur both in the analog domain (noise, nonlinear behavior, phase noise, impedance loading, ...) and in the digital domain (finite wordlength effects, passband ripple and finite stopband suppression of filters, ...).
1.1 Efficient high-level simulations
Apart from a high simulation accuracy and efficiency, there are several challenges for these high-level simulations:
1. analog and digital blocks should be simulated together;
2. baseband analog and digital blocks should be co-simulated with RF blocks;
3. the input signals to the front-ends are not sinusoidal, but digitally modulated signals that cannot be described by a small set of sinusoidal signals.
These requirements exclude the use of classical simulation approaches both from the low-frequency IC design world (SPICE-like simulations) and the microwave world (harmonic balance methods). Several alternatives have
already been elaborated to meet these requirements. For example, a co-simulation of analog and digital is implemented in SABER [1]. The second and third requirements are addressed with approaches such as the circuit envelope approach [6], envelope transient analysis [7] and envelope following [8]. Although these approaches were originally used at the circuit level, they can be used at the higher level as well (e.g. a combination of HP Ptolemy and the circuit envelope approach in ADS [9]). However, as will be demonstrated in Section 2.2, they lack the simulation efficiency to obtain BER simulations in short CPU times. To increase the simulation efficiency, one could try to avoid the use of iteration at each simulation point (e.g. a timepoint). This is not possible at the circuit level, since in general the network equations and transistor model equations yield sets of coupled nonlinear differential equations that cannot be solved without iteration. At a higher abstraction level, circuit details can be omitted. If this is done in a clever way, then the response at each timepoint can be computed without iteration. In other words, the challenge for high-level models is that they are sufficiently accurate while they can be evaluated without iteration. Avoiding iteration is not a matter of the models only: the simulation method also plays a role. For example, the simulation engine for a SPICE-like transient analysis always uses iterations, regardless of the shape of the equations. This also explains why simulators for VHDL-AMS or Verilog-AMS, which also use iteration, are not efficient enough for BER computations of transceivers. The main EDA players offer a range of high-level simulation tools for mixed-signal communication systems. Examples are SPW from Cadence [10], COSSAP from Synopsys [11] and ADS from Agilent [9]. Still, research on high-level simulation is ongoing [12, 13, 14] to increase the simulation efficiency even more. In Section 2 the high-level simulator FAST, developed at IMEC for the simulation of transceiver front-ends, is presented and compared to ADS, using a 5 GHz WLAN receiver front-end as a simulation example.
1.2 Accurate high-level models For realistic high-level simulations the accuracy of the models is crucial. These models should cover the relevant signal degrading effects, while at the same time they should not be too complex such that simulations are slowed down too much. Further, the construction of such models should not be too time consuming. This is especially a problem for the nonlinear behavior of analog circuits. In Section 3 we discuss an approach to generate nonlinear high-level models from a circuit netlist. The resulting models can be interpreted such that they yield insight in the linear and nonlinear operation of the circuit. They take into account frequency dependent linear and nonlinear behavior, impedance loading and noise, and they do not need iterations during evaluation. Similar models can also be constructed starting from measurements, as will be discussed in the same section.
80
1.3 A mixed-signal design flow High-level simulators can be used to make clever architectural trade-offs between the analog domain and the digital domain. However, such simulation tools will only find acceptance in the design community if they can be inserted elegantly in a design flow that covers both analog and digital. For the analog blocks this requires for example that there should be a transparent link between the high-level models used at the architectural level and the circuit level, in the sense that the error between the behavior of the lower-level model and the higher-level model should be user-controllable. Similarly, for digital blocks there should be a coupling down to the gate level. These issues are discussed in Section 4.
1.4 The problem of substrate noise coupling Having determined an optimal analog-digital partitioning with high-level simulations, the design of analog and digital parts can be performed independently, at first sight. However, when these two parts are put on the same chip, then the switching activity in the digital part produces noisy signals that propagate via the silicon substrate to the analog parts. This interference tends to be more pronounced as IC technologies scale down and the relative size and complexity of the digital part increase. Design flows often do not take into account yet this effect. The few exceptions [15] consider the effect substrate noise after layout. Since layout occurs at the end of a design flow, several iterations over the design flow might be needed to master the substrate noise problem. A methodology that predicts the substrate noise voltage at the gate level is presented in [16, 17]. This approach, which has been implemented in the program SWAN (Substrate Noise Waveform Analysis tool), predicts the substrate noise voltage with an accuracy of about 10% on the RMS voltage for digital circuits of practical size on a low-ohmic substrate with an epitaxial layer [18]. SWAN has been successfully used to develop digital design techniques that produce lower substrate noise than standard design techniques [19]. SWAN is now being extended to high-ohmic substrates. An approach such as SWAN is a good candidate for inclusion in a complete mixedsignal design flow that can handle analog-digital tradeoffs while taking into account the interference of digital signals to the analog circuits.
2 Efficient high-level simulation of telecom front-ends To simulate the bit-error-rate (BER) of a complete telecom link that includes the transmit part of the digital modem, the transmit front-end, the channel, the receiver front-end and the receive part of the digital modem, a very efficient simulation engine is required, since a lot of information (corresponding to many experiments in a Monte-Carlo approach) has to be sent through that link. In addition to an efficient simulator, techniques that can reduce the number of experiments compared to a Monte-Carlo analysis can further decrease the CPU time of BER experiments. The next subsections discuss such technique as well as efficient simulation approaches.
81
2.1 Co-simulation of analog, RF and digital High-level simulations that support architectural studies of mixed-signal telecom front-ends most often require the co-simulation of three parts: 4. a digital part, which in this phase is typically modeled as a dataflow system,
either in floating point or fixed-point representation; 5. an analog part operating at RF frequencies; 6. an analog part operating at lower frequencies (IF or baseband).
For an efficient co-simulation of analog and digital, a common simulation approach for analog and digital can be used. In the program FAST [12] there is one single simulation method for analog and digital blocks, namely a dataflow approach. To this purpose the analog blocks are translated before simulation into a computational graph, that is an equivalent digital dataflow representation. A simulation is then nothing else but an evaluation of that computational graph. Efficient co-simulation of RF blocks with low-frequency blocks is not feasible with a SPICE-like approach. This is due to the large difference in time constants of low-frequency and high-frequency blocks. A harmonic balance method [20] circumvents this problem, but this method is only efficient for signals that can be represented by a small number of sinusoidal signals. Digitally modulated signals cannot be accurately described as a sum of just a few sinewaves. The problem of large differences in operating frequency of the front-end blocks is often solved with a complex lowpass signal representation [2]. This is used in tools such as SPW [10] and COSSAP [11] to co-simulate RF blocks and digital blocks with a dataflow approach. With the complex lowpass representation only in-band distortion is considered for the nonlinear blocks by modeling these blocks with AM/AM and AM/PM characteristics. Modeling in-band distortion only, however, yields inaccurate results when two nonlinearities are cascaded. Indeed, out-of-band distortion generated by the first nonlinear block can be transformed into in-band distortion by the second nonlinearity. Front-end architectures that increase the degree of integration compared to superheterodyne (e.g direct conversion), generally contain more cascade connections of active (and hence nonlinear) blocks than superheterodyne architectures. Further, in some cases (e.g. in I/Q modulators) PM/AM and PM/PM conversion should be taken into account [21] in addition to AM/AM and AM/PM conversion. The circuit envelope approach [6] that is used e.g. in ADS from Agilent [9], or the related envelope transient analysis [7] solve the problem of large differences in operating frequency by performing successive harmonic balance analyses. In order to take into account the dynamic effects on the modulation, a time-domain numerical integration method is used to compute the influence of one harmonic balance analysis onto the other. This implies that the original harmonic balance equations are augmented with a transient term. With HP Ptolemy [6, 9] it is even possible to couple different envelope simulation processes. However, since
82
the envelope method is based on the harmonic balance method, it suffers from the same drawbacks when performing system level simulations: a large memory usage which is proportional to number of carriers times the number of nodes. This is especially a problem for the simulation of strongly nonlinear behavior. a global definition of the simulation frequencies and the number of harmonics. This implies that the simulator cannot take advantage of the fact that some signals in the signal path can be represented by a subset of these simulation frequencies. Furthermore, for its numerical integration in the time domain the envelope simulator uses a common timestep for all blocks. Hence, it cannot take advantage of a change in the signal bandwidth. The program FAST takes into account out-of-band distortion with the simulation efficiency of a complex lowpass representation by using a local multi-rate, multi-carrier (MRMC) representation of signals: each signal in a front-end is considered as a set of one or more modulated carriers. These carriers are each represented with a complex lowpass model and with a possibly different- timestep. The carriers are used locally. This means that carriers that are important at some place in the architecture are no longer considered at places where they are negligible. Also, the simulation timestep is local: it varies throughout the front-end according to the bandwidth of the modulated signals at a given place in the front-end. A change of the timestep is accomplished by the insertion of digital interpolators or decimators in the computational graph. The digital FIR filters (polyphase filters) that are used in these interpolators/decimators introduce errors due to their finite ripple and stopband suppression. However, this error can be controlled by the user in the sense that a small error gives rise to an FIR filter with many filter taps.
2.2 Example 1: a 5 GHz receiver front-end The approach of FAST is compared to ADS on the simulation of a 5 GHz WLAN receiver front-end (see Figure 1).
83
The CPU time per timepoint for a FAST simulation of the complete receiver front-end of Figure 1 equals on a Pentium II 266 MHz processor. The input at the antenna is an OFDM signal with 256 carriers, each with a QAM 16 modulation. The carrier frequency is 5.25 GHz. In addition to the input signal, two other waveform generators have been used for the two local oscillators (LO1 and LO2). These generators produce a sinusoidal signal with phase noise. Their frequency is GHz and GHz. The LNA is a static nonlinearity, described by a polynomial of order three that relates the output to the input. The coefficients of this polynomial are related to the intercept points [22]. The three mixers in this example are also described with a thirdorder polynomial, but now as a function of two inputs, the local oscillator input and the RF input. All filters in the front-end are elliptic filters. The RF bandpass filter has order three, the bandpass filter at 250 MHz order four, and the lowpass filters order six. Finally, the analog-to-digital converters (ADC) are ideal samplers that quantize the signal with a given resolution. A simulation of the receiver front-end of Figure 1 with FAST is ten times faster than with MATLAB where the MRMC signal representation of FAST is implemented. Without the use of this MRMC signal representation in MATLAB, FAST is 700 times more efficient than MATLAB. The latter does not introduce any approximation error due to integration methods, FFT leakage, digital filtering, ... Therefore, its simulation results can be considered as the reference to check the accuracy of the other approaches. For the simulation of the receiver front-end of Figure 1, the error introduced with FAST (due to the use of digital filters) does not exceed –60 dB anywhere in the front-end. The same receiver front-end has been simulated with the envelope simulator of ADS [6, 9]. Hereby, two independent carriers have been chosen for the harmonic balance simulation, namely a carrier at = 5.25 GHz and a carrier at 250 MHz. This means that for each timestep we have a two-tone harmonic balance approach in which for each carriers three harmonics have been taken
84
into account. The different combinations of two tones are computed everywhere in the circuit, whereas FAST can discard frequencies at some places in the circuit where a frequency component is below a certain (user-definable) threshold. The input signal, which is the carrier at 5.25 GHz, is modulated in the same way as in the FAST and MATLAB simulations. The effects on the modulation are simulated in the time domain with a timestep that is equal to the inverse of the difference in frequency between two adjacent carriers. With this timestep ADS is about as slow as MATLAB without the MRMC representation. Furthermore, the results between ADS on one hand and FAST or MATLAB on the other hand, differ up to a few decibels. This is – at least partially – due to the approximations made by the numerical integration method [23]. These differences decrease when the timestep for the time-domain simulation part is increased. This of course requires more CPU time in ADS.
2.3 Example 2: a complete 5 GHz link Thanks to its dataflow nature, FAST can be coupled fairly easily with a digital simulator. In this way, FAST has been coupled with the digital modeling and simulation environment OCAPI [24], also developed at IMEC. As an example, a complete end-to-end simulation of a WLAN link (see Figure 2) taking into account a complete receiver front-end and a complete receiver modem is possible at a reasonable CPU time (0.35 seconds per OFDM symbol).
85
In the OCAPI environment the digital blocks can be represented at different abstraction levels, ranging from untimed descriptions (used in dataflow simulations) to timed descriptions (which can be specified down to the VHDL or Verilog level). The untimed representation is mostly used in architectural simulations. Hereby signals and constants (e.g. filter coefficients) can be represented as as a mix of fixed and floating-point numbers. The combination FAST-OCAPI offers the possibility to capture non-idealities of digital and analog blocks in one single simulation. Also, it allows to study digital compensation techniques for signal degradations that occur in the analog domain. For example, in a 5 GHz WLAN transceiver the multicarrier modulation gives rise to instantaneous large signal peaks. The large ratio between these peaks and the average signals necessitates the use of a large number of bits for the signals. In practice, the number of bits is limited such that some peaks are clipped to a saturation value. Despite the clipping in the transmit part of the digital modem, there are still large peaks that reach the power amplifier and drive this circuit into saturation. Some amount of nonlinear behavior in the power amplifier can be tolerated, since the equalizer in the digital part of the receiver can reconstruct the distorted signal to some extent. This is illustrated with the FAST-OCAPI simulation results of Figure 3. Mixed-signal compensation is not limited to forward correction. The coupled FAST-OCAPI simulator can be used to implement a feedback topology as well, for example modelling automatic gain control with digital steering of an analog variable gain amplifier.
2.4
Efficient bit-error-rate simulations
The performance of a complete telecom link is often quantified with the biterror-rate (BER). This measure is typically determined with lengthy MonteCarlo simulations. In [25] a methodology is described that reduces the CPU time to determine the BER by more than two orders of magnitude, compared to Monte-Carlo simulations. This methodology is specific for telecom applications that use a multicarrier modulation scheme. Examples of such modulation are OFDM, used in 5 GHz WLAN, and ADSL. An OFDM-modulated signal consists of a sum of carriers, each being modulated using a separate modulation scheme such as Phase Shift Keying (PSK), Quadrature Amplitude Modulation (QAM), ... An OFDM symbol then gives rise to a specific constellation point in the modulation for each carrier. For some symbols, the different carriers can combine in a constructive way such that large peaks occur. The large signal peaks can lead to severe nonlinear distortion (e.g. saturation), which can increase the BER significantly. A practical measure to characterize an OFDM signal is the crest factor CF. This is the ratio of the maximum amplitude over the root-mean-square value of the signal. An efficient way for accurate BER estimations in multicarrier systems requires a dedicated approach. Indeed, measurement and/or simulation of the nonlinear effects on all possible symbols is not feasible. For example, an OFDM
86
modulation which uses 256 carriers and a 16-QAM modulation has possible symbols to transmit.
A Monte-Carlo approach, on the other hand, would still require the generation of many symbols, since the ones that give rise to the high signal peaks, and hence to bit errors, have a fairly low probability of occurrence. To lower the required number of experiments to obtain a given accuracy on the BER, the method of [25] proceeds as follows: prior to simulation, a large number of OFDM symbols is generated. These are classified in sets according to their crest factor. For each set a BER is computed. These BER values together with the probability of occurrence of each crest factor value are combined to obtain the overall BER. This probability has been computed before the actual simulation. It only needs to be done once for a given modulation scheme. The BER for each set of symbols with the same crest factor is computed either with a Monte-Carlo method or with a quasi-analytical method. The latter approach which is used for low crest factors, considers the in-band distortion as Gaussian distributed noise.
87
This noise is computed based on simulation results and it is used in an analytical formula to compute the BER. For high crest factors, the assumption of having Gaussian distributed noise can be violated. Then a Monte-Carlo approach can be used with a higher accuracy but with a comparable efficiency. Indeed, the number of required experiments is low, since the probability of bit errors is high for large crest factors. Using this methodology, the BER of the WLAN link of Figure 2 is determined in one hour (Pentium II, 266 MHz) with the coupled FAST-OCAPI environment and with the models for the analog and digital blocks as in Figure 1 and 2. This is more than 100 times faster than with a pure Monte-Carlo approach for the same accuracy on the BER (see Figure 4).
3 High-level models of analog blocks Analog circuits are inherently nonlinear. This complicates simulation. Moreover, it is much more difficult to construct a high-level model that takes into account nonlinear behavior than a linear model. One of the reasons is that electrical engineers have sufficient knowledge on linear system theory, but usually not on nonlinear systems. To overcome these difficulties, we have developed a methodology [26, 27] that can generate a high-level model from a given circuit (specified as a netlist) with only the dominant nonlinearities. Further, the methodology splits the nonlinear behavior of the total circuit into different contributions, one contribution for each nonlinearity. Since a
88
contribution consists of static nonlinearities and linear transfer functions, it can be interpreted. In this way, this methodology provides high-level models as well as insight into the nonlinear operation of a circuit. The approach yields models that take into account the frequency dependence of the nonlinear behavior of the circuit under consideration. The approach is limited to weakly nonlinear behavior. The weak nonlinearities of a circuit are described as power series that are broken down after the first few terms. For example, the drain current of a MOS transistor is described as a threedimensional power series of three variables, and Each coefficient in this power series, referred to as a nonlinearity coefficient, can give rise to a contribution of the nonlinear behavior. As another example, the small-signal collector current as a function of the AC value and the DC value of the base-emitter voltage is given by
where and are the second- and third-order nonlinearity coefficients, respectively. These are proportional to the second- and third-order derivatives of the transistor model equation. The modeling methodology has been implemented in a program called DISHARMONY. This program computes approximations of the Fourier transforms of Volterra kernels [22]. The first-order kernel transform describes the linear behavior of the circuit. The second-order and third-order kernel transforms, which are functions of two and three frequency variables, respectively, describe the second- and third-order nonlinear behavior. DISHARMONY first computes the exact values of the Fourier transforms. This is performed by combining AC analyses on the netlist of the circuit under consideration, together with a knowledge of nonlinearity coefficients, which are determined with DC circuit simulations. The Volterra kernel transforms that have been computed in this way contain many contributions, namely one for each second- or third-order coefficient of the power series description of the different nonlinearities in the circuit. Next, DISHARMONY determines the contributions that are dominant (up to a userdefinable error). A translation of the dominant contributions into a block diagram yields the final model. In most practical circuits usually few contributions dominate. This leads to compact high-levels that can be evaluated efficiently during high-level simulations. Moreover, a knowledge of the dominant contributions yields insight in the nonlinear circuit behavior. As an illustration, Figure 5 shows a 5 GHz low-noise amplifier and its high-level model generated by DISHARMONY. In this circuit transistor is the amplifying transistor. The
89
rest of the circuit provides the necessary bias. It is found by DISHARMONY that the largest contribution to the second-order nonlinear behavior originates from the nonlinearity coefficient of the collector current power series expansion of Other nonlinearities of such as the base current, the nonlinear diffusion and junction capacitors and the base resistance, yield a negligible contribution. Also, the influence of the other transistors is negligible. For the third-order nonlinear behavior, the nonlinearity coefficients and of yield the most important contributions. These three contributions suffice for an accurate high-level model. In addition to the generation of a high-level model, DISHARMONY also offers the possibility to interpret this model, since it consists of linear transfer functions, scale factors (namely the nonlinearity coefficients) and static nonlinearities or an ideal multiplier). The transfer functions can be interpreted. For example, is the transfer function from the input to the base-emitter voltage of The nonlinearity of the collector current of produces from the signal at the base-emitter voltage a second-order and thirdorder nonlinear current. These currents correspond to the two last terms of equation (1). They propagate through the rest of the circuit (via transfer function to form the second- and third-order output respectively of the circuit.
90
The models generated by DISHARMONY take into account the frequency dependence of the nonlinear behavior in a natural way. The need to take into account this frequency dependence is evident in wideband applications. In narrowband applications, the variation over the frequency band of interest can be neglected. In that case the transfer functions such as the ones shown in Figure 5, reduce to complex numbers. In this way, the model predicts the correct phase shift (e.g. expressed in terms of AM-PM conversion) of the nonlinear response. This is a more complete description than just an intercept point, which is a fixed real number (not a complex one) that does not model any phase shifts. Accurate high-level models used in FAST can be determined based on simulations, as is done with DISHARMONY, but also on measurements [28]. As an example, a high-level model for use in FAST has been derived for the 5 GHz low-noise amplifier of Figure 6, based on nonlinear S-parameter measurements. In this model the nonlinear dependencies are described as loworder nonlinear rational functions that do not require iterations during simulation.
4 Towards a mixed-signal design flow A modeling approach such as DISHARMONY and a high-level simulator such as FAST can only find acceptance in the design community if they can be linked in an elegant way to design tools at the circuit level and the layout level. A possible design flow that links the high level to the levels below is shown in Figure 7.
91
The design flow starts with a high-level simulation of a front-end architecture. At this stage analog-digital tradeoffs can be made. Afterwards, the architecture is split into analog and digital parts according the chosen tradeoff. The digital parts are designed in a separate design flow (not shown in Figure 7) [29]. During the first high-level simulations the models for the analog blocks are very rough. They could be generated by hand at this stage. From the high-level simulations an initial set of specifications for the individual analog/RF circuit blocks is derived. These serve as input for the circuit-level design. This can be performed in a manual way or using analog synthesis tools [30]. After the circuit design, the layout can be generated. Again, CAD tools for analog placement and routing could be used here to speed up the process [31]. The design flow as described up till now is a top-down process. The reliability of the results however very much depends on the high-level models used to model the behavior of the different blocks. The flow therefore also contains a bottom-up path in the form of verification using bottom-up models that are generated automatically from a circuit netlist. In this verification stage DISHARMONY can be used to extract a reliable high-level model that is consistent with the circuit that has been designed. These models are more accurate than the models that have been used initially, such that high-level simulations are now more reliable, leading to more realistic specifications for the analog circuits. Moreover, the accuracy of the DISHARMONY model is user-definable, such that the user can control the consistency between a circuit and the corresponding high-level model. This design flow splits the analog and the digital parts completely after the high-level tradeoffs have been made. This split neglects possible interference between the digital and the analog domain, as discussed in the next section.
92
5 Conclusions Modern telecom front-ends require intelligent tradeoffs between analog and digital to meet the stringent specifications. These tradeoffs are made at the architectural level and their effectiveness is tested with bit-error-rate (BER) simulations. This paper has shown several methodologies to efficiently perform this task: a high-level simulator FAST, a modeling methodology DISHARMONY that yields accurate high-level models for analog circuits, and an efficient BER estimation method for multicarrier modulation schemes such as WLAN. Per simulation step, FAST is more than two orders of magnitude faster than commercial approaches. In addition, the presented BER estimation method reduces the number of experiments with more than two orders of magnitude compared to a Monte-Carlo approach. This method, combined with FAST, can compute BER values of the order of in CPU times of about one hour for a complete WLAN link with high-level models that take into account non-idealities in the analog and the digital domain. Together with an approach for high-level modeling of substrate noise caused by digital circuits, these tools can be integrated in a mixed-signal flow for the design of advanced architectures of front-ends of wireless terminals.
6 References [1] SABER of Avant!, http://www.avanticorp.com/ [2] Jeruchim, Balaban and Shanmugan, “Simulation of Communication Systems,” Plenum, 1992. [3] F. Medeiro, B. Perez-Verdu, A. Rodriguez-Vazquez and J.L. Huertas, “A vertically integrated tool for automated design of modulators,” IEEE J. Solid-State Circuits, vol. 30, no. 7, pp. 762-772, July 1995. [4] J. Vandenbussche, G. Van der Plas, G. Gielen and W. Sansen, “Behavioral model of reusable D/A converters,” IEEE Trans. Circuits and Systems II: Analog and digital signal processing, vol. 46, no. 10, pp. 1323-1326, Oct. 1999. [5] A. Abidi, “Behavioral modeling of analog and mixed signal IC’s,” Proc. Custom Integrated Circuits Conference, pp. 443-450, May 2001. [6] J.L. Pino and K. Kalbasi, “Cosimulating synchronous DSP applications with analog RF circuits,” Proc. Annual Asilomar Conference on Signals, Systems, and Computers, Nov. 1998. [7] E. Ngoya and R. Larchevèque, “Envelop transient analysis: a new method for the transient and steady state analysis of microwave communication circuits and systems,” Proc. IEEE MTT-S, pp. 1365-1368, 1996. [8] K. Kundert, J. White and A. Sangiovanni-Vincentelli, “An envelopefollowing method for the efficient transient simulation of switching power and filter circuits,” Proc. IEEE International Conference on ComputerAided Design, November 1988. [9] HP-ADS of Hewlett-Packard, http://www.tm.agilent.com/tmo/hpeesof/products/ads/adsoview.html. [10] SPW of Cadence, http://www.cadence.com/products/spw.html
93
[11] COSSAP of Synopsys, http://www.synopsis.com/products/dsp/cossap_ds.html . [12] Gerd Vandersteen, Piet Wambacq, Yves Rolain, Petr Dobrovolný, Stéphane Donnay, Marc Engels, Ivo Bolsens, “A methodology for efficient high-level dataflow simulation of mixed-signal front-ends of digital telecom transceivers”, Proc. Design Automation Conference, pp.440-445, 2000. [13] I. Vassiliou and A. Sangiovanni-Vincentelli, “A frequency-domain, Volterra series-based behavioral simulation tool for RF systems,” Proc. IEEE Custom Integrated Circuits Conference, pp. 21-24, 1999. [14] P. Vanassche, G. Gielen, “Efficient Time-Domain Simulation of Telecom Frontends Using a Complex Damped Exponential Signal Model”, Proc. DATE 2001. [15] SubstrateStorm of Simplex, http://www.simplex.com. [16] M. van Heijningen, M. Badaroglu, S. Donnay, M. Engels and I. Bolsens, “High-Level Simulation of Substrate Noise Generation Including Power Supply Noise Coupling”, Proc. Design Automation Conference, pp. 446451, 2000. [17] M. van Heijningen, J. Compiet, P.Wambacq, S. Donnay M. Engels and I. Bolsens, “Modeling of Digital Substrate Noise Generation and Experimental Verification Using a Novel Substrate Noise Sensor”, IEEE Journal of Solid-State Circuits, vol.35, pp.1002-1008, July 2000. [18] M. van Heijningen, M. Badaroglu, S. Donnay, H. De Man, G. Gielen, M. Engels and I. Bolsens, “Substrate noise generation in complex digital systems: efficient modeling and simulation methodology and experimental verification”, Proc. International Solid-State Circuit Conference, pp. 342343, 2001. [19] M. Badaroglu, M. van Heijningen, V. Gravot, J. Compiet, S. Donnay, M. Engels, G. Gielen and H. De Man, “Methodology and Experimental Verification for Substrate Noise Reduction in CMOS Mixed-Signal ICs with Synchronous Digital Circuits”, Proc. IEEE International Solid-State Circuits Conference, 2002. [20] K. Kundert, J. White and A. Sangiovanni-Vincentelli, “Steady-state methods for simulating analog and microwave circuits,” Kluwer Academic Publishers, 1990. [21] J. Chen, D. Feng, J. Philips and K. Kundert, “Simulation and modeling of intermodulation distortion in communication circuits,” Proc. IEEE Custom Integrated Circuits Conference, pp. 5-8, 1999. [22] P. Wambacq and W. Sansen, “Distortion analysis of analog integrated circuits,” Kluwer Academic Publishers, 1998. [23] K. Kundert and P. Gray, “The designer’s guide to SPICE and Spectre,” Kluwer Academic Publishers, 1995. [24] P. Schaumont et al., "A programming environment for the design of complex high speed ASICs", Proc. Design Automation Conference, pp. 315-320, June 1998.
94
[25] Gerd Vandersteen, Piet Wambacq, Yves Rolain, Johan Schoukens, Stéphane Donnay, Marc Engels and Ivo Bolsens, “Efficient Bit-Error-Rate estimation of multicarrier transceivers,” Proc. DATE 2001. [26] P. Wambacq, P. Dobrovolný, S. Donnay, M. Engels and I. Bolsens, “Compact modeling of nonlinear distortion in analog communication circuits”, Proc. DATE 2000. [27] P. Dobrovolny, P. Wambacq, G. Vandersteen, S. Donnay, M. Engels and I. Bolsens, “Bottom-up generation of accurate complex lowpass equivalent of integrated RF ICs for 5 GHz WLAN”, Proc. IEEE MTT-S International Microwave Symposium, Vol. 1, pp. 419-22, Phoenix (AZ), USA, May 2001. [28] G. Vandersteen, F. Verbeyst, P. Wambacq and S. Donnay, “Highfrequency nonlinear amplifier model for the efficient evaluation of inband distortion under nonlinear load-pull conditions”, Proc. DATE 2002. [29] D. Verkest, W. Eberle, P. Schaumont, B. Gyselinckx and S. Vernalde, “C++ based system design of a 72 Mb/s OFDM transceiver for WLAN,” Proc. Custom Integrated Circuits Conference, pp. 433-439, 2001. [30] G. Van der Plas et al., “AMGIE – a synthesis environment for CMOS analog integrated circuits,” IEEE Trans. On Computer-Aided Design, vol. 20, no. 9, pp. 1037-1058, September 2001. [31] K. Lampaert, G. Gielen and W. Sansen, Analog layout generation for performance and manufacturability, Kluwer Academic Publishers, Dordrecht, the Netherlands, 1999.
Structured Simulation-Based Analog Design Synthesis Rob A. Rutenbar Department of Electrical and Computer Engineering Carnegie Melon University Pittsburgh, Pennsylvania 15213 USA Abstract
Early generations of analog synthesis tools failed to migrate into mainstream use primarily because of difficulties in reconciling the simplified models required for synthesis with the industrial-strength simulation environments required for validation. We have recently seen the emergence of simulation-based synthesis tools that can size/bias a fixed circuit topology by exploiting the same simulation environment created to validate the sized circuit. These methods work remarkably well across a range of difficult analog circuits, and augmented with suitable macromodeling, have also been applied successfully to system-level designs. In this paper we review the motivation and architecture of simulation-based analog synthesis tools, and survey a few recent results from industrial designs. 1. Introduction
Mixed-signal design starts will shortly outnumber purely digital starts. The reason is simple: new communication-oriented ICs require an interface to the external, continuous-valued world. The digital portion of these designs can be attacked with modern cell-based tools for synthesis, mapping, and physical design. The analog portion, however, is still routinely designed by hand. Although it is typically a small fraction of the overall device count (e.g., 10,000 to 20,000 analog transistors), the analog partition in these designs is often the bottleneck because of the lack of automation tools. The situation worsens as we strive to build System-on-Chip (SoC) designs. To manage complexity and time-to-market, SoC designs require a high level of reuse, and cell-based techniques lend themselves 95 M. Steyaert et al. (eds.), Analog Circuit Design, 95–114. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.
96
well to a variety of strategies for capturing and reusing digital intellectual property (IP). But these digital strategies are inapplicable to analog designs, which rely for basic functionality on tight control of low-level device and circuit properties that vary from technology to technology. The analog portions of these systems are still designed by hand today. They are even routinely ported by hand as a given IC migrates from one fabrication process to another. A significant amount of research has been devoted to cell-level analog synthesis, which we define as the task of sizing and biasing a device-level circuit with 10 to 100 devices. However, as we have noted previously [1], early approaches failed to make the transition from research to practice. This was due primarily to the prohibitive effort needed to reconcile the simplified circuit models needed for synthesis with the “industrial-strength” models needed for validation in a production environment. In digital design, the bit-level, gate-level and blocklevel abstractions used in synthesis are faithful to the corresponding models used for simulation-based validation. This has not been true for analog synthesis. Fig. 1 illustrates the basic architecture of most analog synthesis tools. An optimization engine visits candidate circuit designs and adjusts their parameters in an attempt to satisfy designer-specified performance goals. An evaluation engine quantifies the quality of each circuit candidate for the optimizer. Early research focused on trade-offs between the optimizer (which wants to visit many circuit candidates) and the evaluator (which must itself trade accuracy for speed to allow sufficiently vigorous search). Much of this work was really an attempt
97
to evade a harsh truth—that analog circuits are difficult and time-consuming to evaluate properly. Even a small cell requires a mix of ac, dc and transient analyses to correctly validate. In modern design environments, there is enormous investment in simulators, device models, process characterization, and cell sign-off validation methodologies. Indeed, even the sequence of circuit analyses, models, and simulation test-harnesses is treated as valuable IP. Given these facts, it is perhaps no surprise that analog synthesis strategies that rely on exotic, nonstandard, or fast-but-incomplete evaluation engines have fared poorly in real design environments. To trust a synthesis result, one must first trust the methods used to quantify the circuit’s performance during synthesis. Most prior work failed here. Given the complexity of, investment in, and reliance on simulatorcentric validation approaches for analog cells, we have argued that for a synthesis strategy to have practical impact, it must use a simulatorbased evaluation engine that is identical to that used to validate ordinary manual designs [2]-[5]. Several synthesis and optimization tools that adhere to this dictum have recently emerged, in partial response to these acknowledged problems. Efforts here include academic tools [6][8], proprietary (internal) industrial tools [9], and commercially available synthesis systems [10]-[12]. Simulation-based synthesis poses many significant technical challenges. For example, commercial circuit simulators are not designed to be invoked 10,000 times in the inner loop of a numerical optimizer. And the CPU time to visit and simulate this many solution candidates seems, at first glance, intractable even for basic circuits, let alone system-level designs that might require hours just to simulate once. In this paper we review a series of simulation-based synthesis tools developed at Carnegie Mellon University over the last five years that successfully address all these issues [2]-[5]. Our tools rely on four key ideas:
1. We recast the synthesis task as an unconstrained numerical optimization problem. 2. We encapsulate commercial simulators so that their implementation idiosyncrasies are hidden from our optimization engine.
98
3. We introduce distributed global optimization algorithms that are robust in finding workable circuits, yet require no partially sized starting solution. 4. We exploit network-level workstation parallelism to render the overall computation times tractable across a pool of machines. To demonstrate the practicality of the approach, we review several industrial designs synthesized using these tools. The remainder of the paper is organized as follows. Section 2 briefly reviews prior approaches. Section 3 gives a brief overview of the basic components of a simulation-based synthesis tool, including issues at both cell and system level. Section 4 reviews several production designs done at Texas Instruments, using prototype versions of the CMU tools. Finally, Section 5 offers concluding remarks. 2. Review of Prior Approaches
2.1 Early Synthesis Approaches for Cells
Referring again to Fig. 1, we can broadly categorize previous work on analog synthesis by how it searches for solutions and how it evaluates each visited circuit candidate. See [13],[14] for more extensive surveys. The earliest work on synthesis used simple procedural techniques [15], rendering circuits as explicit scripts of equations whose direct evaluation completed a design. Although fast, these techniques proved to be difficult to update, and inaccurate. Numerical search has been used with equation-based evaluators [16]-[18], and even combinatorial search over different circuit topologies [19],[20], but equation-based approaches remain brittle in the face of technology changes. Hierarchical systems [21]-[24] introduced compositional techniques to assemble equation-based subcircuits, but still faced the same update/accuracy difficulties. Qualitative and fuzzy reasoning techniques [25], [26] have been tried, but with limited success. Recent work has emphasized equation-based simplifications that enhance numerical tractability, notably convex modeling with posynomials [27]. However, it remains prohibitively expensive to create these models--indeed, often more expensive than manually designing the circuit. Also, the simplifications required
99
in these closed-form analytical circuit models always limit their accuracy and completeness. Symbolic analysis techniques, which have made significant strides of late [28]-[30] offer an automated path to obtaining some of these design equations. However, they remain limited to linear or weakly nonlinear performance specifications. Some “partial” steps toward full simulator-in-the-loop synthesis have also appeared. One option is to simplify the simulator itself, e.g., [1],[13], which targeted fast linear small-signal performance evaluation. Or, one can severely truncate the search process to afford the cost of SPICE simulation for each candidate solution as in [31]. Neither extreme yields a satisfactory and general solution: we either mis-evaluate some candidate solutions, or miss them altogether by limiting the search process. 2.2 Early Synthesis Approaches for Systems The literature on cell-level synthesis is rather rich; the literature on system-level synthesis is not. Macromodeling, e.g., [32], plays a central role, since many system-level designs are intractable to simulate flat, at the device level. There is a wider variety of attacks on topology generation, e.g., via templates, pattern matching, scripting, hierarchical performance prediction [33]-[35] since systems have more degrees of freedom than cells. Hierarchical composition in the style of [21] plays a central role, since systems negotiate specifications with subblocks to achieve overall goals. The added degrees of freedom inherent in these more complex designs have limited analog systems synthesized to date to modest size and performance, (e.g., [35]) or to restricted classes of structured systems (e.g., [34] for designs). 2.3 Analog Optimization
Analog Synthesis
Finally, we need to note the distinction between optimization and synthesis for analog sizing/biasing. Several circuit optimization tools have relied on simulator-based methods (e.g., [36]). For circuit optimization we assume some “close” initial circuit solution, and seek to improve it. This can be accomplished with gradient techniques requiring a modest number of circuit evaluations. For example, modern trust-region methods have been used with notable success here [8],[37]. However, in
100
our model of circuit synthesis, we assume that no starting solution is available at all. We know nothing about our starting circuit other than acceptable numerical ranges for the design variables, and how to simulate it to evaluate against specifications. This scenario is much more realistic of industrial design tasks, and more difficult as a numerical problem; it requires a more powerful and general solution strategy. 3. Simulation-Based Synthesis Formulation
The problem with early synthesis approaches was their use of circuit evaluation engines different from the simulators and simulation strategies that designers actually use to validate their circuits. These engines traded off accuracy and completeness of evaluation for speed. We argue that this is no longer an acceptable trade-off. Our simulationbased synthesis strategies all rely on a few key ideas. Given limitations of space, we review these rather briefly. Additional details appear in [2]-[5]. 3.1 Cell-Level Simulation-Based Synthesis Given un unsized analog circuit topology, our goal is to size/bias the topology to meet an arbitrary set of nonlinear performance goals, as evaluated by an arbitrary detailed circuit simulator. Our architecture for synthesis relies on the following four ideas to accomplish this: Optimization-based formulation: We use the numerical formulation of OBLX [1], in which the synthesis task is rendered as a single, scalar, weighted-sum cost function whose individual terms measure the divergence of the current solution candidate from the desired performance specification. Individual variables in this optimization represent circuit parameters, ‘., MOS device sizes, passive component values, bias voltages and currents. The components of the cost function, and their relative weights, are all constructed automatically; no “magic numbers” are required from the designer. Our goal is to search this complex, highly nonlinear cost-function for a minimum, which corresponds to a “best” circuit solution. Simulator encapsulation: Each circuit, which corresponds to a point on this cost-surface, is evaluated via complete, detailed simu-
101
lation. We use a technique we call simulator encapsulation [2] to hide the low-level idiosyncrasies of individual commercial simulators from our optimizer. Encapsulation is physically a layer of “insulating” software that handles the many data formatting issues necessary to communicate with a simulator, and also handles starting each simulator, and restarting it after crashes. Such crashes are common occurrences in synthesis, which frequently visits highly nonphysical circuit candidates. Encapsulation allows us to change simulator details without disturbance to our optimization engine. Distributed global search: If we seek only to “tweak” an existing design to improve it slightly, we can use deterministic optimization methods that rely on local search to find a superior nearby solution. Such optimization methods often assume that the cost surface is locally smooth, and reasonably well behaved. Unfortunately, in synthesis we can assume no viable starting solution, and the cost surfaces are not only discontinuous, but sometimes even un-evaluatable; some combinations of design parameters lead to circuits whose performance will crash a simulator. Hence we focus on derivative-free sample-based global optimization methods. However, to mitigate the CPU time needed to simulate each design candidate, we need search algorithms that scale easily to many processors. Fig. 2 shows the general architecture of our search algorithms. We borrow methods from evolutionary algorithms, and use a population of solutions, many samples of which are independently “evolved” via short bursts of local optimization. We use ideas from annealing, genetic, and direct optimization [2]-[5]. The population of solutions helps avoid local minima in search, and allows robust scaling behavior, even up to 20 or more CPU nodes. Workstation-level parallelism: Because simulation-based search is expensive in time, we exploit workstation-level parallelism [2]. This means our optimizer simply schedules necessary simulation tasks on any machine in a pool of designated CPUs. A critical attribute of our search algorithms is their ability to parallelize both individual circuit simulations and the numerical search process itself. In other words, when multiple simulations are required to evaluate a single circuit candidate, these can be distributed across
102
parallel machines. But in addition, multiple circuit candidates are also being simultaneously evolved. Control mechanisms in the search algorithms force convergence to a set of solutions of similar quality. Distribution of jobs to processors, restarting crashed simulators, balancing the load when some simulations are fast (e.g., a dc evaluation) while others slow (e.g., a full transient THD analysis), are all managed transparently and automatically. 3.2System-Level Simulation-Based Synthesis In simulation-based synthesis, we simulate each design candidate during numerical search. The new problem we face is that system-level blocks are often vastly more expensive to simulate at device-level (if they can simulate at all) than cells. This can defeat our preferred simulator-in-the-loop formulation. We have explored three alternative strategies for coping with system-level complexity [4]: Flat synthesis chooses to ignore the hierarchical system-level structure, flattening it down to a single, potentially large circuit, and treating it just as cell-level synthesis. Where this method works, and the resulting run times are acceptable, it is the easy choice to make. Unfortunately this approach does not always work. Not only are the simulation times for large circuits problematic, but simulator convergence also becomes an issue. The reason is that numeri-
103
cal search often visits exotically parameterized designs--circuits with behaviors that deviate widely from the norms for which commercial simulators are designed. Iterative-sequential synthesis mimics top-down design practice. At the top level of our design, we replace subsystems with simplified behavioral macromodels, guess appropriate model parameters for these subsystems, and formulate synthesis as the task of choosing the remaining top-level component values to satisfy systemlevel goals. This is a straightforward simulation-based synthesis task since our “simulations” are just evaluations of subcircuit models. The real problem is the need to move down the design hierarchy one level at a time, and deal with the fact that even given good macromodels, predicting feasible trade-offs among the parameters for a subsystem can be difficult. We resolve this by iterating the design, using human insight to adjust specifications at each level. Concurrent synthesis is a novel alternative to the above two strategies. The system-level design and its component subsystems evolve simultaneously. Unlike a fully flattened design, the system still uses macromodels for its components, and synthesis sets their input parameters. In contrast, the component cells use complete device-level models and detailed circuit simulation. We link the two synthesis processes into a single numerical problem via a transformation of the cost function. We add terms that coerce agreement between the macromodel parameters evolving at the top of the design hierarchy, and the actual simulated behaviors of the devicelevel components at the bottom of the hierarchy. For example, at the top we may be evolving the gain and bandwidth of an amplifier, while at the bottom we are evolving MOS widths/lengths to achieve this gain/bandwidth. We refer to such specifications as being dynamically set since they evolve naturally as a negotiation between the system and its components. The virtue of the concurrent approach is that it reduces iteration steps, and automatically avoids designs in which macromodel parameters and device-level simulated behaviors disagree. Each of these approaches is useful, though in different design scenarios. We explore two of these in the following section.
104
4. Simulation-Based Synthesis Results
We have implemented versions of these ideas in several prototype synthesis systems built at CMU, notably the tools named MAELSTROM [2], and ANACONDA [3]-[5]. We review here a few results obtained from experiments conducted at Texas Instruments (TI) in Dallas, Texas, using two versions of ANACONDA. 4.1 Cell-level Synthesis: TI Opamps
We benchmarked an early version of ANACONDA on the three opamps shown in Fig. 3, which are all examples of production TI circuits. The power amp was designed in a CMOS process and the other two opamps were designed in a CMOS process. Table 1 gives the input performance constraints, simulation environment, and run-time statistics for each synthesized circuit. For our experiments, the performance constraints were set by running nominal simulations on original expert hand designs supplied by TI. Each design was synthesized using a pool of 24 Sun Sparc workstations. Because runtime is highly dependent on the size of the circuit and the type of simulation being performed, there was significant variation among the circuits. For example, for the power amp, each THD analysis took a few CPU seconds, and consequently, that circuit ran longer. Fig. 4 shows the results of 5 repeated synthesis runs for each circuit. This sample size of 5 gives a good picture of the statistical spread in solutions for our stochastic algorithms (i.e., these are 5 runs in sequence, not the best 5 out of a large number of runs). Given the large number of performance specifications for each circuit, we summarize these results as a power-versus-area scatter plot for each circuit. In all but one case, the synthesized circuits met all of the performance constraints specified in Table 1. The one exception provides excellent justification for why we value simulation-based evaluation: this folded cascode misses its UGF spec (159MHz < 162MHz) by less than 2%, but this is enough to give an almost 15% improvement in area. For designs pushed to tight performance limits, simulation gives us results that are much easier to trust than any approximate analytical equation. For each design, the objectives were to minimize area and static power dissipation. In addition, each of the synthesized designs was ver-
105
ified for robustness by performing Monte Carlo simulations with process, 10% voltage supply and 0 to 100C° temperature variations. The resulting histograms for each performance characteristic showed that each synthesized design was as robust as the original hand design; Fig. 5 shows one of those histograms. These results are noteworthy in several respects. First, these were production-quality industrial analog cells with difficult performance specifications. Second, our synthesis approach used as its evaluation engine the identical simulation environment used by TI’s designers to
106
validate their manual designs. As a result, we could deal accurately with difficult design specifications such as noise, settling time, and THD, which require detailed simulations to evaluate complex nonlinear effects. Finally, nearly all these synthesized circuits compared favorably with their manually designed counterparts, both in performance and in robustness across corners. 4.2System-Level Synthesis Results: TI EQF Filter
Digital Subscriber Line (DSL) technologies combine sophisticated analog and digital signal processing to deliver high-speed digital data and conventional analog voice data over existing copper telephone wires. To understand how synthesis could be applied to systems, we
107
were asked by TI to target the frontend of the remote modem CODEC receiver (Fig. 6.) from the ADSL design they introduced at the 1999 International Solid State Circuits Symposium [38]. We used a later version of ANACONDA [4] on the equalizer filter (EQF) subsystem at the front-end of the analog signal path. Signals in copper phone lines attenuate severely with increasing frequency and line length, and cable
108
bundles introduce considerable crosstalk. The equalizer amplifies the attenuated line signal, the filter extracts the digital data from the spectrum without corrupting the nearby analog voice traffic, and the combined EQF must do this under stringent noise and area constraints set by the CODEC. The EQF itself is shown in Fig. 7 and comprises five identical low noise operational amplifiers (Fig. 8) connected via R’s, C’s and CMOS switches. The equalizer has six separate modes (Fig 9) to compensate for high frequency line attenuation across the frequency range of interest, 25kHz to 1104kHz. Fig. 9 also shows some of the EQF spectral mask. We fixed the topology of the EQF and formulated synthesis as the task of designing parameter values to meet performance specifications. This respects the fact that “libraried” analog blocks are most likely to be stored as topologies that can be re-parameterized to handle new specifications, or fabrication processes. Moreover, expert designers routinely choose good topologies to optimize gross system function, and then spend enormous effort iteratively resizing them; the problem is to determine if a proposed sizing can realize the specified performance in the face of many interacting second-order circuit effects. Hence, it is this sizing/biasing we seek to automate. As a system, EQF has one layer of hierarchy, and five instances of a single component, the low-noise amplifier of Fig. 8. It was designed in a proprietary TI 0.6um CMOS process. At the top level, the EQF has 46 R’s, 32 C’s and 36 CMOS switches. Pole and zero location constraints from the transfer function set a large number of the R’s and C’s. The amplifier is itself a complex cell, and has 20 independent variables.
In [4] we synthesized components of the EQF, and the complete EQF, in several different styles. Here, we review just two of these results.
Flat component synthesis: since we have both the system and component specifications from an expert hand design, we re-synthesized individual pieces of the design to see if we could match the manual effort. Fig. 10 shows one such experiment, in which we redesigned the low-noise amplifier five times. All designs but one met all specifications, and that one "miss" had a few devices biased low by a few mV. Again, the virtue of simulation-based synthesis is that we see these effects more precisely; note that compromising on this margin gives a significant, but unmanufacturable, area savings. Each synthesis run took roughly 10 hours.
Concurrent system synthesis: we also resynthesized the entire EQF using the concurrent approach outlined in the previous section. The top-level EQF circuit was modeled using a proprietary TI opamp macromodel, to allow fast, flat simulation. The bottom-level low-noise amplifiers were modeled flat at device level. Concurrent synthesis forced convergence of the performance requirements evolved for the opamps at the top with the actual evolved opamp behaviors at the bottom. Fig. 11 shows the resulting passband responses with the equalizer gain set to 0dB. The two important observations to make are that the ripple in the passband and the cutoff
frequency are nearly identical for the hand design and the three synthesized designs. When setting up this experiment, we decided to allow the gain to vary slightly to provide an extra degree of freedom to the optimization engine. The difference in gain between the hand design and the synthesized designs can be compensated for using the programmable gain amplifier (see Fig. 6), which is the next stage in the CODEC receiver. Fig. 11 shows the EQF's response over the entire frequency range of interest. We observe
from the figure that the cutoff frequency, stopband attenuation, and overall shape of the response are nearly identical for the hand design and the three synthesized EQF designs. Comparing the other performance goals for the synthesized and expert designs: Run1 and Run3 had nearly identical noise, but with 7% and 17% better area, respectively; Run2 had slightly better noise, but at a cost of 14% more area. The EQF took a few months to design by hand; our tools were able to complete each EQF run in 12 hours. To the best of our knowledge, the EQF design is the largest, most complex, most thorough controlled experiment ever undertaken to demonstrate how simulation-based analog synthesis can be applied to a state-of-the-art industrial analog system. We successfully redesigned the EQF block in the TI ADSL front-end in several different ways, and examined the trade-offs involved. Simulation-based synthesis, with a mix of macromodels, transistor-level detailed simulation, and vigorous global numerical search, can yield practical results on large, complex designs.
5. Conclusions
To trust a synthesis result, one must first trust the methods used to qualify each solution candidate during synthesis. Analog designers necessarily evolve an intimate relationship with the simulators, models, etc., that they use on a daily basis to evaluate their circuits. To make analog synthesis practical, we have argued that synthesis must use identical verification methods internally. We have developed a family of simulation-based synthesis tools at CMU that have been successfully used across many designs. Other similar tools are also being developed elsewhere. Open problems here are how to scale these techniques "up" to ever bigger systems, and how to handle the layers of macromodeling needed to do this.
Acknowledgment: This paper reviews joint work done in collaboration with several colleagues whose contributions were essential to this research: L. Richard Carley, Michael Krasnicki and Rodney Phelps of CMU, and James R. Hellums of TI. The work was funded in part by the Semiconductor Research Corp., by the National Science Foundation under contract 9901164, and by Rockwell and Texas Instruments.
References
[1] E. Ochotta, R.A. Rutenbar, L.R. Carley, "Synthesis of High-Performance Analog Circuits in ASTRX/OBLX," IEEE Trans. CAD, vol. 15, no. 3, Mar. 1996.
[2] M. Krasnicki, R. Phelps, R.A. Rutenbar, L.R. Carley, "MAELSTROM: Efficient Simulation-Based Synthesis for Custom Analog Cells," Proc. ACM/IEEE DAC, June 1999.
[3] R. Phelps, M. Krasnicki, R.A. Rutenbar, L.R. Carley, J.R. Hellums, "ANACONDA: Robust Synthesis of Analog Circuits Via Stochastic Pattern Search," Proc. IEEE Custom Integrated Circuits Conference, May 1999.
[4] R. Phelps, M. Krasnicki, R.A. Rutenbar, L.R. Carley, J. Hellums, "A case study of synthesis for industrial-scale analog IP: Redesign of the equalizer/filter frontend for an ADSL CODEC," Proc. ACM/IEEE DAC, June 2000.
[5] R. Phelps, M. Krasnicki, R.A. Rutenbar, L.R. Carley, J.R. Hellums, "Anaconda: Simulation-based synthesis of analog circuits via stochastic pattern search," IEEE Trans. CAD, vol. 19, no. 6, June 2000.
[6] P.J. Vancorenland, G. Van der Plas, M. Steyaert, G. Gielen, and W. Sansen, "A Layout-aware Synthesis Methodology for RF Circuits," Proc. ACM/IEEE ICCAD, Nov. 2001.
[7] P. Vancorenland, C. De Ranter, M. Steyaert, G. Gielen, "Optimal RF Design Using Smart Evolutionary Algorithms," Proc. ACM/IEEE DAC, June 2000.
[8] R. Schwenker, J. Eckmueller, H. Graeb, K. Antreich, "Automating the Sizing of Analog CMOS Circuits by Consideration of Structural Constraints," Proc. DATE, March 1999.
[9] M.J. Krasnicki, R. Phelps, J.R. Hellums, R.A. Rutenbar, L.R. Carley, "ASF: A Practical Simulation-Based Methodology for the Synthesis of Custom Analog Circuits," Proc. ACM/IEEE ICCAD, Nov. 2001.
[10] S. Dugalleix, F. Lemery, A. Shah, "Technology migration of a high-performance CMOS amplifier using an automated front-to-back analog design flow," Proc. Design Automation and Test in Europe (DATE), March 2002.
[11] E. Hennig, R. Sommer, L. Charlack, "An automated approach for sizing complex analog circuits in a simulation-based flow," Proc. Design Automation and Test in Europe (DATE), March 2002.
[12] T. McConaghy, "Intelligent Systems Solutions for Analog Synthesis," Integrated Communications Design, PennWell, January 2002.
[13] E. Ochotta, T. Mukherjee, R.A. Rutenbar, L.R. Carley, Practical Synthesis of High-Performance Analog Circuits, Kluwer Academic Publishers, 1998.
[14] G.G.E. Gielen and R.A. Rutenbar, "Computer-Aided Design of Analog and Mixed-Signal Integrated Circuits," Proc. IEEE, vol. 88, no. 12, Dec. 2000.
[15] M. Degrauwe et al., "Towards an analog system design environment," IEEE JSSC, vol. SC-24, no. 3, June 1989.
[16] H.Y. Koh, C.H. Sequin, and P.R. Gray, "OPASYN: a compiler for MOS operational amplifiers," IEEE Trans. CAD, vol. 9, no. 2, Feb. 1990.
[17] G. Gielen, et al., "Analog circuit design optimization based on symbolic simulation and simulated annealing," IEEE JSSC, vol. 25, June 1990.
[18] F. Leyn, W. Daems, G. Gielen, W. Sansen, "A Behavioral Signal Path Modeling Methodology for Qualitative Insight in and Efficient Sizing of CMOS Opamps," Proc. ACM/IEEE ICCAD, 1997.
[19] P.C. Maulik, L.R. Carley, and R.A. Rutenbar, "Integer Programming Based Topology Selection of Cell Level Analog Circuits," IEEE Trans. CAD, vol. 14, no. 4, April 1995.
[20] W. Kruiskamp and D. Leenaerts, "DARWIN: CMOS Opamp Synthesis by Means of a Genetic Algorithm," Proc. 32nd ACM/IEEE DAC, 1995.
[21] R. Harjani, R.A. Rutenbar and L.R. Carley, "OASYS: a framework for analog circuit synthesis," IEEE Trans. CAD, vol. 8, no. 12, Dec. 1989.
[22] B.J. Sheu, et al., "A Knowledge-Based Approach to Analog IC Design," IEEE Trans. Circuits and Systems, CAS-35(2):256-258, 1988.
[23] E. Berkcan, et al., "Analog Compilation Based on Successive Decompositions," Proc. of the 25th IEEE DAC, pp. 369-375, 1988.
[24] J.P. Harvey, et al., "STAIC: An Interactive Framework for Synthesizing CMOS and BiCMOS Analog Circuits," IEEE Trans. CAD, Nov. 1992.
[25] C. Makris and C. Toumazou, "Analog IC Design Automation Part II - Automated Circuit Correction by Qualitative Reasoning," IEEE Trans. CAD, vol. 14, no. 2, Feb. 1995.
[26] A. Torralba, J. Chavez and L. Franquelo, "FASY: A Fuzzy-Logic Based Tool for Analog Synthesis," IEEE Trans. CAD, vol. 15, no. 7, July 1996.
[27] M. Hershenson, S. Boyd, T. Lee, "GPCAD: a Tool for CMOS Op-Amp Synthesis," Proc. ACM/IEEE ICCAD, pp. 296-303, 1998.
[28] G. Gielen, P. Wambacq, and W. Sansen, "Symbolic Analysis Methods and Applications for Analog Circuits: A Tutorial Overview," Proc. IEEE, vol. 82, no. 2, Feb. 1994.
[29] C.J. Shi, X. Tan, "Symbolic Analysis of Large Analog Circuits with Determinant Decision Diagrams," Proc. ACM/IEEE ICCAD, 1997.
[30] Q. Yu and C. Sechen, "A Unified Approach to the Approximate Symbolic Analysis of Large Analog Integrated Circuits," IEEE Trans. Circuits and Systems, vol. 43, no. 8, August 1996.
[31] F. Medeiro, F.V. Fernandez, R. Dominguez-Castro and A. Rodriguez-Vasquez, "A Statistical Optimization Based Approach for Automated Sizing of Analog Cells," Proc. ACM/IEEE ICCAD, 1994.
[32] Y-C. Ju, V.B. Rao and R. Saleh, "Consistency Checking and Optimization of Macromodels," IEEE Trans. CAD, August 1991.
[33] B. Antao and A. Brodersen, "ARCHGEN: Automated Synthesis of Analog Systems," IEEE Trans. VLSI Systems, June 1995.
[34] F. Medeiro, B. Pérez-Verdú, A. Rodríguez-Vázquez, J. Huertas, "A vertically-integrated tool for automated design of ΣΔ modulators," IEEE Journal of Solid-State Circuits, vol. 30, no. 7, pp. 762-772, July 1995.
[35] A. Doboli, et al., "Behavioral synthesis of analog systems using two-layered design space exploration," Proc. ACM/IEEE DAC, June 1999.
[36] W. Nye, et al., "DELIGHT.SPICE: an optimization-based system for the design of integrated circuits," IEEE Trans. CAD, vol. 7, April 1988.
[37] A.R. Conn, R.A. Haring, C. Visweswariah, C.W. Wu, "Circuit optimization via adjoint Lagrangians," Proc. ACM/IEEE ICCAD, Nov. 1997.
[38] R. Hester, et al., "CODEC for Echo-Canceling, Full-Rate ADSL Modems," IEEE Int'l Solid-State Circuits Conference, pp. 242-243, 1999.
Structured Analog Layout Design
Koen Lampaert
Mindspeed Technologies
4000 McArthur Blvd., Newport Beach, CA 92660, USA
Abstract
Analog integrated circuits are very important as interfaces between the digital parts of integrated electronic systems and the outside world. A large portion of the effort involved in designing these circuits is spent in the layout phase, which continues to be a manual, time-consuming and error-prone task. This is mainly due to the continuous nature of analog signals, which causes analog circuit performance to be very sensitive to layout parasitics. Successful automation of analog layout requires a structured layout methodology, state-of-the-art CAD tools and a well-integrated design system. In this paper, we describe such a system and show its importance in obtaining a manufacturable layout that meets all specifications at minimum cost and in the minimum time. We give an overview of the advanced methods and tools currently available for analog layout generation, migration and reuse.
2. Challenges in Analog Layout Design
Generating the layout of high-performance analog circuits is a difficult and time-consuming task that has a considerable impact on circuit performance. The various parasitics that are introduced during layout design can introduce severe performance degradation. The parasitic elements associated with interconnect wires cause loading
and coupling effects that degrade the frequency behavior and the noise performance of analog circuits. Device mismatch and thermal effects put a fundamental limit on the achievable accuracy of circuits. Since these parasitics are unavoidable, one of the main concerns in analog layout is to control and predict their effect on circuit performance and to make sure that the circuit after layout still performs within its specifications. The layout of basic analog components also has a direct influence on their electrical behavior. The performance of MOS transistors, resistors, capacitors and inductors is extremely sensitive to low-level geometric device layout details. For instance, the junction capacitance of the MOS transistor source and drain regions is proportional to the total diffusion area and perimeter and can be minimized by proper layout. The parasitic components of resistors and capacitors are also determined by their layout. Minimization of these device parasitics requires careful and time-consuming polygon-level layout. Analog layout also impacts the manufacturability and reliability of a design. For instance, the width of a wire and the number of vias used determine the maximum current that can flow through a circuit connection. Insufficient wire width will result in electro-migration failures. Because of its high impact on circuit performance, analog layout has traditionally been done manually by skilled layout engineers. In recent years, increasing time-to-market pressure has forced companies to structure their layout flows and to apply automation techniques where possible. What follows is an overview of the state-of-the-art in analog layout tools and methodologies. We have divided the overview into five main sections. Section 3 describes device-level analog layout automation through the use of procedural device generators. Integration of device generators in a schematic-driven layout system is discussed in section 4. In section 5, we discuss constraint-driven layout as a prerequisite for cell-level analog layout automation. In section 6, we give an overview of analog place and route algorithms.
The often-overlooked problem of analog layout reuse and process migration is the topic of section 7.
3. Device Generators
A lot of the effort involved in device-level analog layout can be automated through the use of device generator libraries. A device generator is a program that procedurally generates a layout for a device, based on a set of device parameters, a technology specification and a number of user-specified options. Most commercially available analog design frameworks support device generators. They can be written in a tool-specific macro language or in a standard programming language such as C. In general, layouts created by device generators can be of any complexity, ranging from basic devices (transistors, capacitors, and resistors) to complete amplifier stages. Virtually all of the commonly used analog-specific layout optimizations, e.g. device merging, layout symmetries and matching considerations, can be programmed into these generators. Writing and maintaining a library of module generators is a major engineering challenge and generator libraries often turn out to be large and complex software systems. It is therefore crucial that module generators are written in a process-independent way, to make it easy to port them to new technologies.
Fig. 1 shows an example of an electro-migration constraint that can be enforced by a device generator. The MOS transistor in this figure has wide source and drain wires to accommodate a large current. The device generator that was used to produce this layout calculates the required width based on the specified current and the technology specific maximum current density information. Fig. 2 illustrates how diffusion-sharing optimizations can be built into a device generator. The layout shown in the figure consists of two MOS transistors connected source to drain in a cascode configuration. No contacts are necessary on the diffusion regions that form the shared source/drain of the devices. This allows putting the poly gate wires at minimum distance, which results in substantially reduced parasitic source/drain capacitance on this node. The circuit designer can enforce the use of this layout style by instantiating the corresponding symbol in his schematic.
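As a rough illustration, the sketch below (in Python, with hypothetical parameter names and technology numbers that are not taken from any real design kit) shows how such a generator can derive the electromigration-safe source/drain wire width from a specified current and emit an abstract multi-finger MOS description.

from dataclasses import dataclass

@dataclass
class Tech:
    j_max_ma_per_um: float = 1.0   # assumed maximum metal current density
    min_wire_um: float = 0.5       # assumed minimum metal width
    finger_pitch_um: float = 2.0   # assumed gate-to-gate pitch

def wire_width(current_ma: float, tech: Tech) -> float:
    """Metal width needed to carry current_ma without electromigration risk."""
    return max(tech.min_wire_um, current_ma / tech.j_max_ma_per_um)

def mos_generator(w_um: float, l_um: float, i_ds_ma: float,
                  fingers: int, tech: Tech) -> dict:
    """Return an abstract layout description for a folded (multi-finger) MOS."""
    w_finger = w_um / fingers
    return {
        "fingers": fingers,
        "finger_width_um": w_finger,
        "gate_length_um": l_um,
        # source/drain metal sized from the specified operating current
        "sd_wire_width_um": wire_width(i_ds_ma, tech),
        # rough estimate: shared diffusions between fingers reduce junction area
        "diffusion_area_um2": (fingers + 1) * tech.finger_pitch_um * w_finger,
    }

if __name__ == "__main__":
    print(mos_generator(w_um=40.0, l_um=0.6, i_ds_ma=3.0, fingers=4, tech=Tech()))

A real generator library would of course also emit the polygons themselves and read its technology data from the process design kit; the point of the sketch is only the structure: device parameters plus technology data in, a fully determined device layout out.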
The obvious advantage of using device generator libraries is the increase in productivity achieved by automating the repetitive and time-consuming device-level layout task.
4. Schematic Driven Layout
Although the use of device generators can significantly reduce the effort associated with analog layout, their real advantages become apparent when they are integrated into an analog design kit and supported by a schematic-driven layout system. An example of such a system is shown in Fig. 3. At the heart of the system is a schematic-driven layout editor that uses instance attributes from a schematic and a library of procedural device generators to create an initial layout for a circuit. The tool also extracts the connectivity from the schematic and uses it to drive subsequent interactive and automatic layout optimization steps. Some systems also create an initial placement based on the relative positions of the components in the schematic.
There are three advantages associated with the use of a schematic-driven layout methodology. The first advantage is the increased productivity that results from the automated layout of the individual devices. The second advantage is the reduction in layout verification effort that results from the connectivity-driven place and route style. The third and most important advantage is the tight control over the parasitic elements associated with the layout of devices. For high-performance analog and RF designs, the effect of layout-dependent device parasitics has to be taken into account throughout the design cycle. The use of procedural device generators results in predictable layout and hence predictable device parasitics. This allows making accurate predictions of circuit performance early in the design cycle, before the actual layout is done.
5. Constraint-Driven Layout
Constraint-driven layout is an extension of the schematic-driven layout style where the layout is driven by a set of formal layout
constraints that are specified on the schematic. In this methodology, layout constraints are annotated to a design using a constraint-editing tool. Such tools are starting to become available in commercial EDA platforms. This section describes some of the important analog layout constraints and proposes a formal way of specifying them. In high-performance analog circuits, it is often required that groups of devices are placed symmetrically with respect to one or more symmetry axes. Symmetric placement allows for symmetric routing and results in matched parasitics. Symmetry constraints can be formulated in terms of couples, self-symmetric devices and symmetry groups. Two devices that are placed symmetrically with respect to an axis form a couple. A self-symmetric device is a device that is placed on a symmetry axis. A symmetry group is a collection of couples and self-symmetric devices that share the same symmetry axis. The symmetry group represented in Fig. 5 consists of the couples (M1A,M1B) and (M2A,M2B) and the self-symmetric device M5. More than one symmetry group can be specified for a circuit. The presence of one or more symmetry groups has the following implications for the layout: Two devices that are specified as a couple must be placed symmetrically with respect to an axis and must have identical shapes and mirrored orientations. A device that is specified as self-symmetric must be placed on the symmetry axis. Couples and self-symmetric devices that belong to the same symmetry group must share the same symmetry axis. Matching constraints can be specified by defining matching groups. A matching group is a set of two or more devices for which an accurate ratio of device characteristics is required. The simplest and most common case of a matching group is a pair of equal devices. A more complicated case of a matching group is shown in Fig. 6. Any number of matching groups can be defined in an analog circuit. The presence of one or more matching groups has the following implications for the layout:
All devices that belong to the same matching group must have equal orientations. If the devices of a matching group are equal (1:1 ratios), they must be implemented with equal shapes. If they have another ratio, they should be built of equal unit devices, according to the ratio. The devices have to be placed in close proximity to minimize any variations in device characteristics.
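These symmetry and matching constraints lend themselves to a simple formal encoding. The sketch below is one possible representation (the class and field names are our own and do not correspond to the format of any commercial constraint editor), using the symmetry group of Fig. 5 as an example.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SymmetryGroup:
    axis: str                                    # e.g. "vertical"
    couples: List[Tuple[str, str]] = field(default_factory=list)
    self_symmetric: List[str] = field(default_factory=list)

@dataclass
class MatchingGroup:
    devices: List[str]       # devices that must match
    ratios: List[int]        # unit-device counts, e.g. [1, 1] or [1, 4]

# constraints for the example discussed in the text (Fig. 5)
constraints = {
    "symmetry": [SymmetryGroup(axis="vertical",
                               couples=[("M1A", "M1B"), ("M2A", "M2B")],
                               self_symmetric=["M5"])],
    "matching": [MatchingGroup(devices=["M1A", "M1B"], ratios=[1, 1])],
}

A structure of this kind, annotated on the schematic, is exactly what the placement and routing tools of the following sections consume.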
6. Cell Level Analog Layout Automation
Once the analog layout constraints for a circuit are specified, analog layout automation tools can be used to enhance design productivity. Most analog layout automation systems follow a basic three-step approach that mimics the way human layout designers create layouts. During the first step of the process, the circuit is divided into groups of one or more devices, which will be treated as individually placeable and shape-able objects by the placement tool. The complexity of these modules ranges from basic devices (transistors, capacitors, resistors) to more complicated structures like current mirrors or differential pairs. Some analog layout systems restrict the complexity of their modules to basic devices, in which case the module recognition step is trivial and can be omitted. Other layout systems allow more complex modules and rely on the designer or on dedicated algorithms to make the division. Once the division into modules is completed, a placement can be generated. Each module can
be laid out in several electrically equivalent ways, called variants. The task of the placement program is to select an optimal variant for each module, and to place these variants in the layout in an optimal way. The next step in the design flow is routing. The task of the router is to connect the modules according to the netlist of the circuit. Sometimes, compaction can be used as a final step to improve the density of the layout. A schematic-driven layout system enhanced with analog layout automation tools is shown in Fig. 7.
Throughout this process, layout parasitics that degrade the circuit performance have to be monitored and their effect on performance has to be kept within user-specified bounds. During placement, symmetry and matching constraints have to be enforced to limit the device and interconnect mismatch. During routing, the parasitic elements associated with interconnect wires have to be controlled to avoid excessive loading and coupling effects that degrade the frequency
behavior and the noise performance of analog circuits. Controlling these complex and often conflicting requirements is what makes analog layout such a difficult task and explains the slow acceptance of analog layout automation tools in industrial environments. Commercial analog place and route tools that are based on a decade of academic research in this area [1,2,3] are starting to appear. In the remainder of this section, we describe a performance-driven implementation of an analog place and route tool.
6.1 Analog Placement
The first step in the execution of the placement program consists of a number of numerical simulations which result in a set of performance sensitivities and operating-point information (branch currents, node voltages) of the circuit. This information, together with the circuit netlist, is then used as input for a set of module generators that construct a list of geometrical variants for every device. Only the information needed for the optimization of the placement is generated (the generators are called in interface mode). Next, a simulated-annealing algorithm is used to generate the actual placement, taking into account all the constraints and objectives that have been identified in the previous section. After the optimization, the module generators are called again (this time in layout mode) to create the actual layout for the selected geometrical device variants and the final layout is constructed. The output of the program is the final placement, together with information about the performance degradation in this final layout and an identification of the most important contributions to this degradation. In case the degradation exceeds the required performance specifications, this information can be used by the designer to see the failing performance(s) and to identify the critical effects. This allows him to improve his design when desired. Although the constraints identified in the previous sections make analog performance-driven placement a unique problem, the core problem, arranging a set of connected blocks on a layout surface, has been studied extensively in the context of other related VLSI
placement and floorplanning problems. The analog placement problem is usually solved using a stochastic optimization algorithm such as simulated annealing. The advantage of the simulated annealing algorithm is that it offers the possibility of putting arbitrary constraints on the generation of new candidate placement solutions. The cost function that evaluates the quality of intermediate solutions can be used to implement the analog performance and geometrical constraints. Simulated annealing operates on the entire solution simultaneously, which is essential for accurate constraint evaluation. An important aspect of the design of an analog placement tool is how and where to implement the different analog constraints that have been identified in the previous section. The effect of these constraints is that they make a subset of the complete set of possible placements illegal. For instance, a placement where two matching devices have different orientations violates a matching constraint and is an illegal solution. Two different approaches can be taken to avoid these illegal solutions. The first approach is to design the move set such that unacceptable placements can never be reached. In the case of a symmetry group for instance, this can be accomplished by placing the devices symmetrically in the initial solution and always moving the couples simultaneously, such that the symmetry is preserved at all times during the annealing process and therefore also in the final solution. A set of moves that accomplishes this is shown in Fig. 8. The advantage of this approach is that there is no CPU time wasted evaluating placements which are considered unacceptable. In addition to this, the constraints are guaranteed to be met by construction. The price that has to be paid for this is a more complicated move set. The second approach is to put a penalty on constraint violations in the cost function. For the constraint to be met, this penalty term has to be driven to zero by the annealing mechanism. For the case of two matching transistors, this penalty term can be set to zero if their orientations are equal and to some non-zero value if they are different. An important consequence of this approach is that the constraint is no longer guaranteed to be met by construction. The simulated annealing
mechanism tries to minimize the overall value of the cost function, which does not necessarily mean that every single term will be driven to zero. Implementing a constraint in the cost function implies that it will be traded off against other competing constraints and that it will be driven to zero only if this gives the best overall result.
A conclusion that can be drawn from the preceding discussion is that hard constraints, i.e. constraints that absolutely must be met, are best implemented as restrictions in the move set. Unfortunately, some hard constraints are difficult to maintain by construction. If this is the case, they must be implemented as penalty terms in the cost function and special care must be taken that they are actually driven to zero in the final result (for instance by giving them large weights).
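Under simplified assumptions, the two mechanisms can be sketched as follows: a move that displaces a couple while mirroring one device about the axis, so that symmetry holds by construction, and a cost function in which a matching-orientation violation appears as a heavily weighted penalty term. The placement representation, the wire-length stand-in for the performance terms and the weights are all placeholders, not the cost model of any particular tool.

def mirror(x, axis_x):
    return 2 * axis_x - x

def move_couple(place, a, b, axis_x, dx, dy):
    """Move a symmetric couple together so the symmetry constraint holds by construction."""
    xa, ya = place[a]
    place[a] = (xa + dx, ya + dy)
    place[b] = (mirror(xa + dx, axis_x), ya + dy)

def cost(place, orient, nets, matching_groups, w_wire=1.0, w_match=100.0):
    # wire-length estimate, standing in for the performance-degradation terms
    wire = sum(abs(place[a][0] - place[b][0]) + abs(place[a][1] - place[b][1])
               for a, b in nets)
    # penalty term: devices of a matching group must share the same orientation
    penalty = sum(1 for grp in matching_groups
                  for d in grp if orient[d] != orient[grp[0]])
    return w_wire * wire + w_match * penalty

In an annealer built this way, symmetry is never violated, while the matching penalty is driven to zero only if the large weight makes any violation too expensive, which is precisely the trade-off discussed above.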
6.2 Analog Routing
The routing phase is critical for the overall performance of a layout, since it fixes the final values of the interconnect parasitics. While the placement phase has taken into account the effect on the performance of the minimum values for the interconnect parasitics, their real value is determined during routing. Therefore, the main concern during performance-driven routing is to connect all wires while limiting the performance degradation introduced by the actual interconnect parasitics within the specifications of the user. The basic operation of the algorithm that is used to connect a source and a target region is illustrated in Fig. 9. The router starts with a collection of partial paths that are derived from the source terminal. These paths are sorted according to an analog-specific cost function and stored in the collection of partially completed paths. The following procedure is then executed until one of the partial paths reaches the target terminal (a simplified sketch of this loop follows the numbered steps):
1. Out of the collection of partially completed paths, one path is selected for expansion. This path selection process is based on a best-first search strategy. The path selection process and the cost function are based on the quality of the path: i.e. the path that introduces the least amount of performance degradation has a higher priority.
2. The partial path selected in step 1 is expanded into a collection of new partial paths using a multi-directional expansion mechanism.
3. The partial paths generated in step 2 are checked to see if they are design rule correct, i.e. if they do not overlap with the devices or with previously routed wires.
4. The design rule correct paths are checked again to see if they overlap with the target terminal. If they do, the connection is complete and the iterative procedure can be terminated. If the target has not been reached yet, the new partial paths are inserted into the partial path collection and they become candidates for further expansion, together with all the partial paths generated during previous iterations.
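The four steps above amount to a best-first search over partial paths. A minimal skeleton of such a loop is sketched below; the grid representation, the expansion rule and the analog-specific cost model are placeholders that a real router would supply.

import heapq, itertools

def route(source_cells, target_cells, expand, path_cost):
    """Best-first search from any source cell to any target cell.
    expand(cell)    -> neighbouring cells that are design-rule correct
    path_cost(path) -> analog-specific cost (performance degradation) of a partial path
    """
    target = set(target_cells)
    tie = itertools.count()                        # tie-breaker for the heap
    frontier = [(0.0, next(tie), [c]) for c in source_cells]
    heapq.heapify(frontier)
    visited = set(source_cells)
    while frontier:
        _, _, path = heapq.heappop(frontier)       # step 1: select cheapest partial path
        for cell in expand(path[-1]):              # step 2: multi-directional expansion
            if cell in visited:                    # step 3: skip illegal or already reached cells
                continue
            new_path = path + [cell]
            if cell in target:                     # step 4: target reached?
                return new_path
            visited.add(cell)
            heapq.heappush(frontier, (path_cost(new_path), next(tie), new_path))
    return None                                    # net could not be completed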
In analog circuits, it is often critical to match parasitics on symmetric nodes. To achieve this matching, nets have to be routed symmetrically, even if the placement is not completely symmetric. This can be achieved by routing a pair of symmetric nets in one step. During this routing step, only one of the nets is actually routed and mirroring the net with respect to the symmetry axis generates its symmetric counterpart. To make sure that the layout of the symmetric net pair is DRC correct on both sides of the axis, the DRC step which is executed after each expansion step has to be done on both sides of the axis. The router needs a cost function to evaluate the quality of a partial path. In the context of performance-driven analog layout, the cost of a partial path is determined by its impact on the performance of the overall layout. The cost function that is used to discriminate between alternative routes needs to reflect this. For instance, a path that introduces coupling between a sensitive and a noisy net should not be selected for further expansion.
7. Analog Layout Migration and Reuse
IP reuse is one of the key factors in achieving the engineering quality and the timely completion of today's complex analog designs. The hard IP reuse techniques that are emerging in digital design are difficult to apply to analog building blocks, since these circuits have to be optimized for each application. Although analog circuit topologies are frequently reused, the parameters of the individual devices are usually optimized to maximize the performance and to minimize the power consumption for a given specification. In practice, this means that a significant portion of the layout has to be redone each time a circuit topology is reused, and that a major portion of the reuse benefit is lost. To overcome this problem, a template-driven layout system can be used. These systems allow the reuse of an analog circuit layout for different designs and/or process technologies. A layout template is used to capture an expert's knowledge of analog layout for a given circuit topology. The template is created once by an expert layout designer and captures his knowledge of analog-specific constraints like symmetry, device matching and parasitic minimization. To generate a circuit layout for a new design, the designer supplies a schematic with the new device parameters for the circuit and/or a new technology file. The layout is generated by transforming the template into an actual layout using specialized analog shape optimization and compaction techniques. During this process, all the layout knowledge implemented in the template is preserved: the new layout has the same relative device configurations, the same wire trajectories and material types, and the same symmetry and matching relations as the template layout.
Fig. 10 gives an overview of a template-driven layout system. The input to the system consists of a template layout, a schematic with the new device sizes and a new process technology file. As a first step, a library of device generators is used to generate device layouts for the new device sizes specified in the schematic. The best layout variant for each device is selected during an optional shape optimization step. A shape optimizer can be used to optimize the shapes of individual devices while preserving the relative device configurations of the layout template. Different aspect ratios of devices can be generated by varying the geometric parameters of the instances, e.g. changing the number of fingers of a transistor. After shape optimization, the original template devices are replaced with the actual device layouts and a compaction tool is used to generate a new design rule correct layout that preserves all the analog constraints of the template. A specialized compactor that supports analog constraints like symmetry and matching has to be used. Another important requirement is the capability to correctly resize wires based on the currents flowing through them.
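As a small illustration of the shape-optimization step, the sketch below picks the finger count whose bounding box best preserves the aspect ratio of the corresponding device in the template; the pitch value and the candidate finger counts are illustrative assumptions, not parameters of any particular tool.

def best_variant(w_new_um, template_aspect, finger_pitch_um=2.0,
                 candidates=(1, 2, 4, 6, 8)):
    """Return the finger count whose bounding-box aspect ratio is closest to
    that of the corresponding device in the layout template."""
    def aspect(nf):
        height = w_new_um / nf                 # finger width sets the box height
        width = nf * finger_pitch_um           # number of fingers sets the box width
        return width / height
    return min(candidates, key=lambda nf: abs(aspect(nf) - template_aspect))

# e.g. a 60 um wide device replacing a template device with a 1.5:1 bounding box
print(best_variant(60.0, template_aspect=1.5))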
8. Conclusion
A well-defined and integrated set of tools and libraries is needed to produce high-quality analog layouts that meet specifications at minimum cost and in the minimum time. Schematic driven layout methods supported by device generator libraries should be a standard part of every analog design flow, since they enhance layout productivity and quality at the same time. A formal way of specifying analog layout constraints for a design is essential in a structured layout methodology and is also a requirement for any analog layout automation tool. Once the constraints are specified, analog cell-level automatic layout tools can generate an optimized layout using cost functions that take the analog layout constraints into account. These tools have reached the level of maturity where they can be used successfully in an interactive or fully automated way. Tools that support analog layout migration and reuse offer tremendous opportunity for productivity improvements in analog design flows.
References
[1] J. Cohn, D. Garrod, R. Rutenbar, and L.R. Carley, Analog Device-Level Layout Generation. Norwell, MA: Kluwer, 1994.
[2] H. Chang, E. Charbon, U. Choudhury, A. Demir, E. Felt, E. Liu, E. Malavasi, A. Sangiovanni-Vincentelli, I. Vassiliou, A Top-Down, Constraint-Driven Design Methodology for Analog Integrated Circuits. Norwell, MA: Kluwer Academic Publishers, 1997.
[3] K. Lampaert, G. Gielen, and W. Sansen, Analog Layout Generation for Performance and Manufacturability. Norwell, MA: Kluwer, 1999.
Part II: Multi-Bit Sigma Delta Converters
Arthur van Roermund
Sigma-delta modulation is a popular way of trading off amplitude resolution for bandwidth. The final accuracy of a sigma-delta converter is to a large extent given by the accuracy of the DAC in its feedback loop. Single-bit sigma-delta converters have the inherent property of linearity, as a single-bit DAC is inherently linear. For this reason these converters have gained much popularity, especially during the last decade. The trade-off for single bit, however, is a relatively large quantization noise power that has to be shaped to bring it outside the signal band of interest, which requires a relatively large bandwidth and/or filter order. But higher-order loops face severe stability problems, and high bandwidth goes with high power dissipation. The stability problem can be managed to a certain extent by restricting the input amplitude to a limited range, but that in turn decreases the achievable signal-to-noise ratio. Using cascaded-loop concepts, as in the so-called MASH structures, is another way to circumvent stability problems for high-order systems, but this is paid for by severe matching requirements. For small signal bandwidths the single-bit solution might be feasible, as there is room for a large oversampling ratio, but for higher signal bandwidths the price to be paid for the single-bit approach seems to be too large. For these reasons, several multi-bit approaches have appeared recently. A multi-bit approach lowers the quantization noise, decreases the stability problems, relaxes filter demands, and lowers the oversampling requirement. Multi-bit asks for a multi-bit quantizer and a linear multi-bit DA converter in the feedback loop; the number of bits for both is not necessarily the same. Architectural concepts for multi-bit sigma-delta converters are the primary focus of the first paper. It addresses aspects like
multi-quantizer concepts, feedforward/feedback combinations, and multi-bit cascade structures. The following paper discusses the possibilities of multi-rate concepts, with a relatively low clock rate at the input side of the first loop. The next two papers address circuit-level implementation aspects and the consequences of imperfections on the performance at system level, respectively for the forward-path components (opamps, switches, quantizer) and for the DA component in the feedback loop. The last two papers consider the sigma-delta converter from an application standpoint. They address the design aspects and the developments of these types of converters for ADSL and VDSL, where signal bandwidths are becoming quite high.
Architecture Considerations for Multi-Bit ΣΔ ADCs
Todd Brooks
Broadcom Corporation
16215 Alton Parkway, P.O. Box 57013
Irvine, CA, USA 92619-7013
Abstract
Sigma-delta ADC implementations today commonly make use of multi-bit architectures. The noise performance benefits of multi-bit implementations may be used for improvement in dynamic range or for reduction in oversampling ratio, relative to single-bit implementations. The performance and the limitations of several multi-bit architectures are presented and analyzed herein. The architectures studied include single-stage, truncation feedback, and cascaded multi-bit architectures. Additional architecture options, including inter-stage gain and analog feedforward, are also studied. Performance considerations for these various multi-bit architectures are compared with one another.
1. Introduction
The evolution of multi-bit ΣΔ ADC architectures has taken several distinct paths. The most common reported architecture includes a single-stage multi-bit modulator with a multi-bit feedback DAC [1-8]. This single-stage architecture uses the same number of levels for the quantizer and for the feedback. The noise transfer function (NTF) may provide more aggressive noise-shaping than the NTF in a single-bit modulator. A second common architecture is the cascaded multi-bit modulator [9-14]. This is often implemented with single-bit feedback in the first stage and with multi-bit feedback in the last stage
[9-11]. A third architecture uses a quantizer with more levels than the feedback. This is also often implemented with single-bit feedback to take advantage of the inherent linearity of a single-bit DAC [15-17]. Dynamic element matching (DEM) techniques are commonly used to improve the multi-bit DAC performance. These DEM techniques have increased the degrees of freedom available for multi-bit architecture design. DEM has become a typical component in the feedback loops in a majority of the reported multi-bit ADCs [1-2],[4-8],[12-14]. An obvious benefit of multi-bit implementation is the 6dB-per-bit reduction in quantization noise energy. Another benefit, which can be much more significant, is an improvement in modulator stability which enables more aggressive noise shaping. The reduction of inband noise energy obtained with a multi-bit implementation may be used to improve the dynamic range or may be traded off for a reduced oversampling ratio (OSR). A reduction of the OSR may either be desired for the purpose of increasing signal bandwidth or for reducing clock rate and power consumption. A flow chart illustrating relationships between various ADC architectures is presented in Fig. 1. Architectures at the top of the flow chart are less complex than the following architectures. The architectures in black are discussed in the indicated sections, while architectures in grey are not studied. Each arrow represents a modification or addition between the preceding architecture and the following architecture. The focus herein is on multi-bit architectures and on architecture options which provide an advantage when applied in a multi-bit implementation. Section 2 presents single-stage multi-bit ADC modulators implemented with an equal number of levels for the feedback as for the quantization. The limitations and performance of these single-stage modulators are discussed and analyzed. A drawback of multi-bit ADC modulators is the cost and complexity of multi-bit feedback processing circuitry, including the feedback DACs and DEM circuits.
The truncation feedback architecture in section 3 reduces the required complexity of the feedback circuitry to achieve a given level of performance. In this alternative architecture, the number of feedback levels is smaller than the number of quantization levels. Section 4 provides an introduction to the use of inter-stage gain in cascaded multi-bit ADCs. An increase of the inter-stage gain decreases design sensitivity to errors in later stages. Two cascaded multi-bit architectures using inter-stage gain are presented and compared in section 5. In section 6, feedforward architectures which are useful for low-voltage implementation are presented. In these feedforward architectures, the dynamic range of internal signals in the loop filter is reduced in proportion to the number of feedback levels.
2. Single-Stage Multi-bit Modulators with equal ADC and DAC resolution
A block diagram of a single-stage multi-bit ΣΔ ADC with an m-bit quantizer and an m-bit feedback DAC is shown in Fig. 2. The digital output can be described by equation (1). G(z) is the signal transfer function (STF) of the ADC and H(z) is the noise transfer function (NTF) of the ADC. The relationships between the loop filter transfer function, F(z), and the STF and NTF of the modulator are given in (2) and (3).
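Equations (1)-(3) are not reproduced here; for the standard single-loop configuration of Fig. 2, with the quantizer modeled as an additive error source Q(z) and the loop filter F(z) in the forward path, they take the familiar form below (a generic reconstruction, not necessarily the exact notation used in the original figure):

\begin{align}
Y(z) &= G(z)\,X(z) + H(z)\,Q(z), \tag{1}\\
G(z) &= \frac{F(z)}{1+F(z)}, \tag{2}\\
H(z) &= \frac{1}{1+F(z)}. \tag{3}
\end{align}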
2.1. Noise Performance of Multi-bit Single-Stage Modulators
The signal-to-quantization-noise ratio (SQNR) of a single-stage multi-bit modulator depends on the order of the loop filter, the placement of poles and zeros in the loop filter, the OSR, and the number of feedback bits. Fig. 3 provides simulated results for SQNR from a study of modulators of various orders operating at
OSR = 16. In each modulator the loop filter response has been designed with an optimized NTF for a given number of feedback bits, loop-filter order, and OSR. This study was done using the functions for NTF optimization provided in the delta-sigma toolbox [18], [19]. The optimization is based on the techniques described in reference [20]. Three curves in Fig. 3 provide the simulated performance for modulators which have been optimized for the number of feedback bits, m, indicated on the x-axis. The SQNR increases at a rate much greater than the expected 6dB-per-bit for these designs. This occurs because the modulators have been optimized to take advantage of the improved stability that is available with a larger number of feedback bits. In one of the curves in Fig. 3 the NTF has only been optimized for 1-bit feedback. For this curve the SQNR increases at exactly the expected 6dB-per-bit rate. There is approximately a 30dB performance improvement in the optimized 5-bit design, as compared with the corresponding 5-bit design in which the NTF has only been optimized for 1-bit feedback.
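As a point of reference for the 6dB-per-bit baseline mentioned above, the ideal peak SQNR of an L-th order modulator with NTF = (1 - z^-1)^L and an N-bit quantizer can be estimated with the usual textbook approximation, sketched below in Python; this is only a rough bound under idealized assumptions and does not reproduce the optimized-NTF simulations of Fig. 3.

import math

def sqnr_db(n_bits: int, order: int, osr: int) -> float:
    """Ideal peak SQNR for NTF = (1 - z^-1)^order with a full-scale sine input."""
    signal = 1.5 * (2 ** (2 * n_bits))                                  # signal power relative to LSB^2/12
    shaping = (2 * order + 1) * osr ** (2 * order + 1) / math.pi ** (2 * order)
    return 10 * math.log10(signal * shaping)

for m in range(1, 6):
    print(f"order 3, OSR 16, {m}-bit quantizer: {sqnr_db(m, 3, 16):5.1f} dB")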
2.2. Feedback Processing Limitations
Performance goals of a given ADC implementation may be met through the use of a single-stage multi-bit modulator as discussed in section 2.1. One of the important means of improving SQNR performance is by increasing the number of feedback bits. However, there are practical limits to the number of bits that can be fed back, since the feedback processing circuit complexity (flash ADC and DAC circuitry with DEM) grows exponentially with the number of feedback bits. Since the m-bit ADC and the m-bit DAC in Fig. 2 are included in the feedback loop, the delay in this circuitry must be small relative to the total number of clock periods available for processing the feedback signal. Typical implementations do not tolerate more than one full clock period of total delay in the feedback processing. Delays in the DEM circuitry and DAC circuitry contribute to the overall timing budget of the feedback processing, resulting in less time available for the ADC. Flash ADC implementations provide small delay but are not suited for high resolution due to exponential growth in the number of comparators with m, the number of bits. The complexity of the DEM and DAC circuitry also typically grows as an exponential function of m. For example, the number of switching blocks required in the tree-structure DEM grows exponentially with m [21]. The number of swappers in the butterfly DEM grows at an even faster rate [1-2],[22]. Truncation feedback and cascaded architectures both reduce the required number of feedback bits to achieve a given level of SQNR performance. These architectures are based on the same approach, in that they both provide a digital estimate of the quantization error that is fed back in the modulator loop. This digital estimate of the error is then used to cancel the error in the output signal. Both of these architectures have been implemented in multi-bit ADCs with single-bit feedback [9-11],[15-17]. These single-bit feedback implementations are typically limited by leakage of first-stage quantization errors, although with moderate OSR and with
careful matching and analog accuracy, very wide dynamic range is possible.
3. Truncation Feedback Architectures
In a typical truncation feedback architecture, the digital feedback signal in the modulator loop is obtained by simply truncating LSBs from the digital output of the quantizer. The truncated LSBs from this quantizer are used to provide a digital estimate of the quantization error in the feedback signal. An alternative structure for generating a digital estimate of the quantizer error is illustrated in Fig. 4. This structure is useful for understanding the principle of truncation feedback. It includes a fine n-bit quantizer and a coarse m-bit quantizer, where n is greater than m. Both quantizers are used to quantize the input signal X. The output of the coarse m-bit quantizer is the MSB signal, which includes the large quantization error Q of the coarse quantizer. The output of the fine n-bit quantizer includes the fine quantization error q of the fine quantizer. The LSB signal is the difference, q-Q, between the outputs of the two quantizers. Since the amplitude of q is smaller than that of Q, the LSB signal provides an approximation for -Q. This approximation improves as n increases relative to m.
In practice, the MSB signal is typically obtained by simply truncating n-m of the LSBs from the digital output of a single n-bit ADC. The truncated LSBs are equivalent to the LSB signal. While this structure appears to be more complicated than necessary, it does have some practical advantages explained at the end of this section. A notable peculiarity with this structure is that the numbers of coarse and fine quantization levels do not need to be related. The only requirements are that both quantizers must have the same gain and that the number of levels in the fine quantizer must be greater than the number of levels in the coarse quantizer.
3.1 LSB Truncation Feedback
The LSB truncation feedback architecture is illustrated in Fig. 5. A single-stage multi-bit modulator loop is implemented with a fine n-bit ADC and a coarse m-bit DAC. The m-bit feedback for the DAC is obtained by truncating the n-m LSBs from the fine n-bit ADC. The expression for this feedback signal is given in (4). The LSB signal is processed with a digital filter to provide a compensation signal at the output of the digital filter. This compensation signal is added to the modulator output to cancel the noise-shaped quantization error, H(z)Q(z), provided by the modulator. The output signal of the truncation feedback circuit in Fig. 5 is given in (5). This output signal includes a leakage term due to the imperfect estimate provided by the LSB signal for the quantization error, Q(z).
Quantization error leakage is a very well understood problem in cascaded MASH architectures. As indicated in (5), the same problem occurs in truncation feedback architectures. Inaccurate cancellation of the quantization error, Q(z), is due to imperfect matching of the analog transfer function H(z) and the digital filter that approximates it. Leakage of Q(z) limits performance if the integrated inband energy of the leakage term in (5) is larger than the integrated inband energy of the noise-shaped fine n-bit quantizer term. The leakage term can be reduced either by improving the analog accuracy such that H(z) more closely matches the digital filter, or by increasing the number of feedback bits, m. With sufficient matching and/or with a large enough number of feedback bits, the integrated inband quantization noise may be dominated by the fine n-bit quantization error, q(z), of the ADC. In this case, (5) simplifies to (6).
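Since the bodies of equations (4)-(6) are not reproduced above, the following generic reconstruction may help in following the argument; here \hat{H}(z) denotes the digital filter that approximates the NTF, and the symbol names are ours rather than the paper's exact notation:

\begin{align}
Y_{\mathrm{MSB}}(z) &= G(z)\,X(z) + H(z)\,Q(z), \tag{4}\\
Y_{\mathrm{out}}(z) &= Y_{\mathrm{MSB}}(z) + \hat{H}(z)\bigl(q(z)-Q(z)\bigr)
  = G(z)\,X(z) + \hat{H}(z)\,q(z) + \bigl(H(z)-\hat{H}(z)\bigr)Q(z), \tag{5}\\
Y_{\mathrm{out}}(z) &\approx G(z)\,X(z) + H(z)\,q(z)
  \qquad\text{when } \hat{H}(z)\approx H(z). \tag{6}
\end{align}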
Equation (6) is functionally equivalent to (1), with the exception that the large H(z)Q(z) term in (1) has been replaced with the smaller H(z)q(z) term in (6). Thus, the truncation feedback architecture can provide performance equivalent to that of a single-stage modulator with a larger number of feedback bits. This performance is achieved with a less expensive feedback DAC and with a smaller DEM circuit. If single-bit feedback is used then the DEM circuitry is not required, but in this case greater care must be taken to avoid performance limitations due to the leakage. In addition to leakage, there is another important disadvantage for truncation feedback when the number of feedback bits is reduced relative to that in a conventional single-stage multi-bit modulator with a larger number of bits in feedback. In this case, modulator stability is
degraded, and a less aggressive NTF must be used in the truncation feedback architecture. If only a small number of feedback bits are used (e.g. 1 to 1.5-bit feedback), then the required reduction in noise-shaping gain may have a big effect on overall performance. Throughout the following sections the two variables, n and m, are used to represent the numbers of bits in the fine and coarse quantizers respectively, where n is greater than m. The two variables, q and Q, are also used to represent the quantization errors, where the energy of Q is larger than the energy of q.
3.2 Coarse/Fine Truncation Feedback
Another architecture that is functionally equivalent to the LSB truncation feedback architecture in Fig. 5 is shown in Fig. 6. The coarse/fine truncation feedback architecture in Fig. 6 uses two separate ADCs for a direct implementation of the structure in Fig. 4 to derive the MSB and LSB truncation signals. The output signal of the circuit in Fig. 6 is given in (7) and is identical to that given in (5). However, the effect of mismatch in the gains of the two ADCs in Fig. 6 has not been included in (7). This mismatch causes additional leakage of Q(z), such that the coarse/fine truncation leakage may be more problematic than the leakage of Q(z) with the simple LSB truncation in Fig. 5.
The coarse/fine truncation feedback architecture may be used to solve one practical problem with truncation feedback, as discussed in detail in reference [12]. In Fig. 5, the fine n-bit ADC is inside the feedback loop of the modulator, and therefore the delay in this ADC must remain within the timing budget available for the feedback processing. Efficient implementations for this fine ADC will have larger throughput delay than that of flash ADC implementations. In Fig. 6, the fine n-bit ADC is now outside of the loop, and the resolution of the ADC inside the feedback loop has been reduced to m bits. Therefore the fine ADC is easier to implement while meeting the timing constraints. In practice, a fixed digital delay (not shown) must also be added at the output of the coarse ADC to align the m-bit ADC data in time with the delayed n-bit ADC data. Otherwise, the digital compensation filter would need to be non-causal.
4. Inter-stage Gain in Cascaded Multi-bit Architectures
A residue amplification stage is shown in Fig. 7. This circuit is a common block used in pipelined and algorithmic ADCs to repeatedly process and amplify quantization errors from previous stages or cycles to following stages or cycles. The quantization error of an m-bit ADC is processed using an m-bit DAC and a subtractor. The analog signal at the output of the subtractor is directly proportional to the quantization error, Q, of the m-bit ADC.
A plot of the signals in the residue amplification stage, for m = 3, is shown in Fig. 8. The amplitude of the signal at the output of the subtractor is 2^m times smaller than the reference, Vref, of the m-bit ADC and the m-bit DAC. An amplification factor of A is used to amplify the signal at the output of the subtractor. An amplification factor of 2^m may be used to scale the quantization error back up to the full-scale span of the reference, but in practice somewhat smaller amplification factors must be used to avoid exceeding the full-scale range when offsets are present in the ADC. The block diagram of a cascaded two-stage ADC is shown in Fig. 9. This simple block diagram may, for example, be used to describe the basic functionality of a two-stage pipelined ADC. A residue amplification stage is used to implement the first stage of the converter. The analog output signal, -AQ, from this stage is proportional to the quantization error, Q, of the first ADC. The second stage, implemented with ADC2, introduces a second quantization error, Q2. The output of ADC2 provides a scaled digital estimate of the quantization error, Q, in the first-stage ADC. This digital estimate of the error is used to cancel the error Q of the first ADC. The error cancellation is similar to that used in truncation feedback in Fig. 5, in which the digital estimate of the error was provided with the LSB signal.
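A small numeric sketch of the arrangement of Fig. 9 is given below; the quantizer models and values are illustrative only, but they show how the digital recombination cancels the first-stage error Q and how a mismatch between the analog gain and the digital attenuation lets part of Q leak through.

def quantize(x, bits, full_scale=1.0):
    """Ideal uniform quantizer on [-full_scale, +full_scale]."""
    lsb = 2 * full_scale / (2 ** bits)
    return max(-full_scale, min(full_scale - lsb, round(x / lsb) * lsb))

def two_stage(x, m=3, p=8, a_analog=8.0, a_digital=8.0):
    d1 = quantize(x, m)                 # coarse first-stage decision
    q = d1 - x                          # first-stage quantization error Q
    residue = -a_analog * q             # amplified residue -A*Q
    d2 = quantize(residue, p)           # second-stage decision (adds Q2)
    return d1 + d2 / a_digital          # digital recombination

x = 0.3137
print("matched gains :", abs(two_stage(x) - x))                       # only Q2/A remains
print("2% gain error :", abs(two_stage(x, a_analog=8.0 * 1.02) - x))  # leakage of Q appears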
In Fig. 9 a digital attenuator, nominally equal to 1/A, compensates for the scaling factor of the analog inter-stage gain, A. In practice the value of A is often chosen as a power of 2, such that the attenuator may be implemented by shifting the bits from ADC2 to the right relative to the bits from the first ADC before adding the bits together to create the output signal.
The output signal for the cascaded two-stage ADC is given in (8). The quantization error, Q2, of ADC2 is attenuated by a factor which is nominally equal to A. The effective resolution of the converter is increased through the use of inter-stage gain A. The overall two-stage ADC resolution is equal to the number of bits in ADC2 plus an additional log2(A) bits provided by the inter-stage gain.
The third term in (8) represents the leakage of first-stage quantization errors. This leakage is due to imperfect matching between the analog gain, A, and the digital attenuation. The quantization error, Q, is highly correlated to the input signal, and much of the error energy is harmonically related to the input signal. Since the first-stage ADC in Fig. 7 does not include a modulator
feedback loop, the harmonic energy in Q is not suppressed in feedback, and therefore leakage of Q degrades the linearity of the ADC. If the inter-stage gain perfectly matches the digital attenuation, then no leakage occurs. In this case (8) simplifies to (9).
5. Oversampled Multi-bit Cascaded Architectures
Oversampled cascaded architectures take advantage of noise-shaping feedback in the first stage. ΣΔ modulation in the front-end of the cascade minimizes inband quantization error leakage to later stages. Two multi-bit oversampled cascaded architectures are studied in this section. Both of the cascaded architectures presented also make use of inter-stage gain to attenuate errors in later stages. Without loss of generality, only two-stage architectures are studied here.
5.1. Multi-bit Cascaded ΣΔ Architecture
A cascaded sigma-delta ADC is shown in Fig. 10. A sigma-delta modulator is used in the first stage. A residue stage, consisting of the m-bit ADC, an m-bit DAC and a subtractor, processes the quantization error of the m-bit ADC in the modulator feedback loop. The m-bit DAC in the residue stage may be shared with the m-bit DAC in the modulator; however, this is not possible in a switched-capacitor implementation. The second stage includes an analog inter-stage gain, A, and a second p-bit ADC. The digital post-processing includes a digital attenuator, a digital filter, and a summer to add the digitally post-processed bits from the second stage to the bits from the first stage. The digital attenuation compensates for the scaling of Q(z) by the inter-stage gain, A. The digital filter compensates for the noise shaping of Q(z), in the same manner as previously discussed for the truncation feedback architectures.
The second-stage p-bit ADC in Fig. 10, ADC2, must process data at the rate of the modulator sample clock. Flash, folding and pipelined ADC architectures, for example, are all suitable for the implementation of ADC2. A 3-stage 8-bit pipelined ADC is used to implement ADC2 in reference [12]. Any throughput delay in ADC2 must be matched by a corresponding digital delay in the bits taken from the output of the first stage. The output of the cascaded ADC in Fig. 10 is given in (10). The quantization error of ADC2 is divided by the attenuation factor, so the effective resolution of the converter is increased through the use of the inter-stage gain. This is a useful performance advantage. By comparison, to achieve equivalent performance with the same number of feedback bits, m, the resolution of the fine ADC in the coarse/fine truncation feedback architecture of Fig. 6 must be larger than the resolution of ADC2 in the cascade architecture of Fig. 10.
The third term in (10) represents the leakage of first-stage quantization errors. This leakage is due to imperfect matching between the analog inter-stage gain and the digital attenuation factor, and between the modulator NTF and the digital filter, H(z), that replicates it. An important drawback of using inter-stage gain to attenuate the quantization error of the second stage is that the gain adds an additional analog inaccuracy which may contribute to the leakage of first-stage quantization errors, Q(z). For this reason the leakage of Q(z) in the cascaded architecture of Fig. 10 may be more problematic than the leakage of Q(z) in the LSB truncation feedback architecture. If the inter-stage gain and the modulator NTF perfectly match the digital attenuation and the digital filter, respectively, then (10) simplifies to (11). In this case the quantization error, Q(z), of the first stage is completely removed. The overall resolution of the cascaded ADC is then determined by three factors: the number of bits p in ADC2, an additional factor due to the inter-stage gain, and an additional factor due to the processing gain of the NTF integrated over the signal band.
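In the same assumed notation, with H(z) a digital replica of the modulator NTF and Â the digital attenuation factor, the output of the cascade takes the form
$$ Y(z) = \mathrm{STF}(z)X(z) + \Big[\mathrm{NTF}(z) - \tfrac{A}{\hat{A}}H(z)\Big]Q(z) + \frac{H(z)}{\hat{A}}\,Q_2(z), $$
which collapses to $\mathrm{STF}(z)X(z) + \mathrm{NTF}(z)Q_2(z)/A$ for perfect matching, consistent with (10) and (11).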
5.2. Multi-bit Cascaded MASH Architecture
The block diagram of a cascaded MASH ADC is shown in Fig. 11. This architecture is a subset of the cascaded architecture in Fig. 10 in which the second-stage ADC has been replaced with a second modulator loop. This second modulator includes a loop filter, the p-bit ADC2, a p-bit feedback DAC, and a subtractor. The p-bit ADC in the second-stage modulator must be implemented with a small throughput delay because it is inside the feedback loop. ADC2 in Fig. 11 is therefore typically implemented with a flash architecture, and the number of bits p in this ADC is practically limited to less than that
which is possible with ADC2 in the cascaded sigma-delta architecture of Fig. 10.
The second-stage modulator has its own STF and NTF. The digital processing in Fig. 11 is similar to the digital processing in Fig. 10, with the exception that the bits from the second-stage modulator must be post-processed with an additional digital transfer function. This post-processing compensates for the analog filtering of Q(z) by the STF of the second-stage modulator.
The output of this cascaded MASH ADC is given in (12). The quantization error of ADC2 is additionally shaped by the NTF of the second-stage modulator. This provides a combined higher-order NTF for the second-stage quantization error.
The third term in (12) represents the leakage of first-stage quantization errors. Imperfect matching between the analog and digital processing (the inter-stage gain versus the digital attenuation, and the analog NTF and STF of the modulators versus their digital replicas) contributes to the leakage. If the leakage term is made small enough relative to the contribution of the second-stage quantization noise, either through sufficient analog accuracy or through the use of a sufficient number of feedback bits to decrease Q(z), then (12) simplifies to (13).
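A sketch of the matched-case result, with subscripts 1 and 2 denoting the two modulator stages (again assumed notation rather than that of the original figure):
$$ Y(z) \approx \mathrm{STF}_1(z)X(z) + \frac{\mathrm{NTF}_1(z)\,\mathrm{NTF}_2(z)}{A}\,Q_2(z), $$
showing the combined higher-order noise shaping of the second-stage quantization error described by (13).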
Several conclusions can be drawn by comparing the two cascaded architectures presented in this section. While the MASH architecture provides more noise shaping, ADC2 in the cascaded architecture of Fig. 10 may be implemented with more resolution because it is not inside a feedback loop. The MASH architecture may also be more sensitive to leakage of first-stage quantization noise because the second modulator adds a further component of analog inaccuracy. Both architectures may be used to achieve similar performance [12], [13]. However, the total circuit complexity required to implement the cascaded ADC with a high-resolution ADC2 is typically larger than the complexity required for the MASH architecture. With increased OSR the MASH architecture becomes much more desirable due to the corresponding increase in noise-shaping gain.
6. Multi-bit Feedforward Architectures
The block diagram of a multi-bit feedforward architecture is shown in Fig. 12. This architecture is similar to the single-stage multi-bit modulator in Fig. 2, but an additional summer is included which adds the input signal to the output of the loop filter. The output signal is given in (14). The feedforward summation changes the signal transfer function of the ADC to unity, independent of the loop filter, but has no effect on the noise transfer function. The benefit of this feedforward summation becomes apparent by analyzing the
signal at the output of the loop filter. This signal is independent of the input signal, as indicated in (15), but is a function of the quantization noise, Q(z), of the m-bit ADC. As illustrated in the residue plot of Fig. 8, the amplitude of Q(z) decreases in proportion to the number of quantization levels. Consequently, the amplitude of the loop-filter output decreases by a factor of 2 for each additional feedback bit in the modulator. This decrease in signal swing in the loop filter is obtained without degrading the performance of the modulator. For this reason, the combination of the feedforward architecture with multi-bit feedback is promising for low-voltage, wide-dynamic-range ADC implementations.
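Writing L(z) for the loop filter (an assumed symbol), the standard analysis of this input-feedforward topology gives
$$ Y(z) = X(z) + \mathrm{NTF}(z)\,Q(z), \qquad \mathrm{NTF}(z) = \frac{1}{1 + L(z)}, $$
$$ V_{LF}(z) = -\big(1 - \mathrm{NTF}(z)\big)\,Q(z), $$
so the loop-filter output contains only shaped quantization noise and no input-signal component, in line with (14) and (15).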
The preceding discussion of the feedforward architecture in Fig. 12 glosses over the fact that the output of the summer must still swing over the full dynamic range of the ADC. However, any errors introduced by this summation circuitry are attenuated by the NTF of the modulator. Consequently, the accuracy and noise performance of the summation circuitry are less critical than the accuracy and noise of the loop filter circuits. In order to achieve very low-voltage operation, the summation circuitry may need to attenuate both of its input signals, and the reference of the m-bit ADC may need to be decreased. The adder is still helpful for reducing the signal swing at the output of the loop filter, even if the feedforward gain of the input signal is not accurate (which will only cause the amplitude of the loop-filter output
to increase slightly). The accuracy of the adder is much less critical than the analog accuracy in the truncation feedback and cascaded architectures. The block diagram of a digital multi-bit feedforward architecture is shown in Fig. 13. This digital feedforward approach avoids the need for analog summation circuitry. The output signal is given in (16). The quantization noise is increased because the quantization errors of both ADCs are fed back. The output of the loop filter is given in (17). This signal is also the input of the ADC in the feedback loop of the modulator. The increase in quantization noise due to the use of two ADCs is a significant drawback of this approach and must be countered by using a larger number of bits in both ADCs to obtain performance equivalent to that of the analog feedforward architecture. The benefit of this digital feedforward architecture is that no analog summation circuit is required and the analog signal swing at the input of the ADC in the modulator is reduced in comparison with the analog feedforward architecture of Fig. 12. However, the feedforward ADC must be capable of processing the full swing of the input signal. For this reason it may be necessary to attenuate the input of this ADC and to decrease its reference voltage to facilitate low-voltage operation.
7. Conclusion
Table 1 provides a comparison of the relative SQNR performance of the architectures presented in the previous sections. If quantization leakage is insignificant, then the performance of each architecture is equivalent to that of a modulator loop with the "effective number of feedback bits" and the "effective NTF" listed in the corresponding columns of the table. For all of the architectures listed in Table 1 the quantization leakage may be decreased by increasing the number of bits, m, fed back in the input modulator loop. The feedforward architectures of section 6 are not listed in Table 1, since feedforward techniques do not directly improve the SQNR performance. They can, however, improve the SNR when performance is limited by device noise, by allowing larger signal swings and larger DAC feedback levels at the input of the ADC. Feedforward techniques also help improve THD performance by removing or decreasing the components of input-signal energy that must be processed by the front-end circuitry of the ADC. Feedforward techniques may be implemented in any of the architectures summarized in Table 1.
As explained, truncation feedback and cascaded architectures are both based on the same approach. They both generate a digital estimate of the quantization error that is fed back in the modulator loop. This digital estimate of the error is used to cancel the error in the output signal. This enables an “effective number of feedback bits” which exceeds the “actual number of feedback bits.” This is made possible without the typical exponential growth in feedback circuit complexity that is necessary with conventional single-stage multi-bit feedback.
References
[1] L. R. Carley, "A noise-shaping coder topology for 15+ bit converters," IEEE JSSC, vol. SC-24, pp. 267-273, April 1989.
[2] J. W. Fattaruso, S. Kiriaki, M. de Wit, and G. Warwar, "Self-calibration techniques for a second order multibit sigma-delta modulator," IEEE JSSC, vol. 28, pp. 1216-1223, Dec. 1993.
[3] R. T. Baird and T. S. Fiez, "A low oversampling ratio 14-b 500-kHz ADC with a self-calibrated multi-bit DAC," IEEE JSSC, vol. 31, pp. 312-320, Mar. 1996.
[4] E. Fogleman, I. Galton, W. Huff, and H. Jensen, "A 3.3-V single-poly CMOS audio ADC delta-sigma modulator with 98-dB peak SINAD and 105-dB peak SFDR," IEEE JSSC, vol. 35, pp. 297-307, Mar. 2000.
[5] Y. Geerts, M. S. Steyaert, and W. Sansen, "A high-performance multi-bit CMOS ADC," IEEE JSSC, vol. 35, pp. 1829-1840, Dec. 2000.
[6] E. Fogleman, J. Welz, and I. Galton, "An audio ADC delta-sigma modulator with 100-dB peak SINAD and 102-dB DR using a second-order mismatch-shaping DAC," IEEE JSSC, vol. 36, pp. 339-348, Mar. 2001.
[7] R. Schreier, J. Lloyd, L. Singer, D. Paterson, M. Timko, M. Hensley, G. Patterson, K. Behel, J. Zhou, and W. J. Martin, "A 50-mW bandpass ADC with 333-kHz BW and 90-dB DR," in ISSCC Dig. Tech. Papers, Feb. 2002, pp. 216-217.
[8] R. Jiang and T. S. Fiez, "A 1.8-V 14-b DS A/D with 4-Msample/s conversion," in ISSCC Dig. Tech. Papers, Feb. 2002, pp. 220-221.
[9] B. P. Brandt and B. A. Wooley, "A 50-MHz multibit sigma-delta modulator for 12-b 2-MHz A/D conversion," IEEE JSSC, vol. 26, no. 12, pp. 1746-1756, Dec. 1991.
[10] F. Medeiro, B. Perez-Verdu, and A. Rodriguez-Vazquez, "A 13-bit, 2.2-MS/s, 55-mW multibit cascade sigma-delta modulator in CMOS 0.7-um single-poly technology," IEEE JSSC, vol. 34, pp. 748-760, Jun. 1999.
[11] S. K. Gupta, T. L. Brooks, and V. Fong, "A 64-MHz ADC with 105-dB IM3 distortion using a linearized replica sampling network," in ISSCC Dig. Tech. Papers, Feb. 2002, pp. 224-225.
[12] T. L. Brooks, D. H. Robertson, D. F. Kelly, A. Del Muro, and S. W. Harston, "A cascaded sigma-delta pipeline A/D converter with 1.25-MHz signal bandwidth and 89-dB SNR," IEEE JSSC, vol. 32, no. 12, pp. 1896-1906, Dec. 1997.
[13] I. Fujimori, L. Longo, A. Hairapetian, K. Seiyama, S. Kosic, J. Cao, and S. Chan, "A 90-dB SNR, 2.5-MHz output-rate ADC using cascaded multibit delta-sigma modulation at 8x oversampling ratio," IEEE JSSC, vol. 35, pp. 1820-1827, Dec. 2000.
[14] K. Vleugels, S. Rabii, and B. A. Wooley, "A 2.5-V sigma-delta modulator for broadband communication applications," IEEE JSSC, vol. 36, pp. 1887-1899, Dec. 2001.
[15] T. C. Leslie and B. Singh, "An improved sigma-delta modulator architecture," in Proc. IEEE ISCAS, vol. 1, pp. 372-375, May 1990.
[16] A. Hairapetian and G. C. Temes, "A dual-quantization multi-bit sigma-delta A/D converter," in Proc. IEEE ISCAS '94, vol. 5, pp. 437-440, May 1994.
[17] T. Salo, T. Hollman, S. Lindfors, and K. Halonen, "A dual-mode 80-MHz bandpass modulator for a GSM/WCDMA IF-receiver," in ISSCC Dig. Tech. Papers, Feb. 2002, pp. 218-219.
[18] R. Schreier, Delta-Sigma Toolbox, Jan. 2000, http://www.mathworks.com/matlabcentral/fileexchange
[19] S. Norsworthy, R. Schreier, and G. Temes, Delta-Sigma Data Converters: Theory, Design and Simulation. New York: IEEE Press, 1997, ch. 4, p. 156.
[20] J. G. Kenney and L. R. Carley, "Design of multi-bit noise-shaping data converters," Analog Integrated Circuits and Signal Processing (Kluwer), vol. 3, pp. 259-272, May 1993.
[21] I. Galton, "Spectral shaping of circuit errors in digital-to-analog converters," IEEE Trans. Circuits Syst. II, vol. 44, pp. 808-817, Dec. 1996.
[22] R. Adams and T. Kwan, "Data-directed scrambler for multibit noise shaping D/A converters," U.S. Patent 5,404,142, Analog Devices, Inc., Apr. 4, 1995.
Multirate Sigma-Delta Modulators, an Alternative to Multibit
F. Colodro and A. Torralba
Dpto. de Ingeniería Electrónica, Escuela Superior Ingenieros, Camino de los Descubrimientos, s/n, 41092 Sevilla, Spain
Abstract
New high-speed sigma-delta (SD) analog-to-digital converters (ADCs) are required for xDSL and RF receivers. As the sampling frequency is upper-bounded by the amplifier bandwidth and power consumption, these high-speed, low-power converters operate with a small oversampling ratio, using a single sampling frequency. This paper shows that multirating is a useful technique to reduce power consumption in high-speed SD modulators. To this end, three different multirate SD modulators are presented. The first and second ones use a low sampling frequency in the first integrator(s) of a single-loop structure, while the third one uses a low oversampling frequency in the first stage(s) of a cascade converter.
1. Introduction
Oversampling analog-to-digital modulators have been widely used for high-resolution applications [1]. For high-performance modulators there is a tradeoff between the oversampling ratio (M) and the modulator order (L). As the sampling frequency is upper-bounded by the amplifier bandwidth and power consumption, high-speed low-power Sigma-Delta (SD) modulators operate with a small oversampling ratio. High accuracy at a low oversampling ratio can be achieved either by increasing the modulator order or by using a multibit quantizer in the oversampled loop. Single-loop high-order modulators show stability problems, which can be overcome by cascading several low-order single-loop modulators. Multibit modulators offer a direct improvement over 1-bit topologies of 6b dB, where b is the number of
bits. Besides, they alleviate the stability problems that appear in high-order single-loop modulators. Their major drawback is the high accuracy required of the multibit digital-to-analog converter (DAC) located in the feedback path. Several techniques have been proposed to mitigate errors caused by the feedback DAC, using digital [2] or analog correction [3], or dynamic element matching [4]. New high-performance CMOS ADCs with high accuracy in the MS/s range are now required. Usually they are high-order cascaded structures of single-bit first- or second-order stages, with a multibit quantizer in the last stage of the cascade, showing that selecting the optimal modulator is a compromise between modulator order, oversampling ratio and the resolution of the internal quantizers. Conventional first-order analysis shows that, for a given dynamic range (DR), the power consumption of SD modulators does not depend on the oversampling ratio M. For switched-capacitor implementations, this analysis assumes that the DR is limited by the kT/C noise of the first integrator. Increasing M has a twofold consequence on the first-integrator opamp. On the one hand, the capacitor load can be decreased, as the in-band kT/C noise is inversely proportional to M. On the other hand, the opamp has to settle faster. Both effects counteract each other, so that the power consumption does not change. A similar first-order analysis can be done for continuous-time and switched-current implementations [5]. Nevertheless, this simple analysis assumes a one-pole model of the opamp, which is only valid at low sampling frequencies. Therefore, it does not apply to new high-speed modulators, where opamps are operated near their maximum bandwidth. In that case, power consumption increases dramatically with sampling frequency because: 1) parasitic capacitances become a significant fraction of the total capacitance, and 2) clock non-overlap, and rise and fall times, become a significant fraction of the clock cycle. This issue was also addressed in [5], where an analysis of three different opamp topologies including first-order parasitics showed a significant increase of the power consumption in the first integrator of a SD modulator at high sampling frequencies.
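As a rough illustration of this trade-off between modulator order, oversampling ratio and quantizer resolution, the textbook estimate of the peak SQNR of an ideal L-th order modulator with a b-bit quantizer and oversampling ratio M can be tabulated quickly (this is the standard white-noise approximation, not a formula taken from this paper):

import math

def peak_sqnr_db(L, b, M):
    """Ideal peak SQNR of an L-th order, b-bit, OSR-M sigma-delta modulator
    (standard white-noise approximation; real modulators fall short of this)."""
    return (6.02 * b + 1.76
            + 10 * math.log10((2 * L + 1) / math.pi ** (2 * L))
            + (2 * L + 1) * 10 * math.log10(M))

# Compare a few design points: high OSR / low order versus low OSR / multibit.
for L, b, M in [(2, 1, 128), (3, 1, 32), (3, 4, 16), (4, 4, 8)]:
    print(f"L={L}, b={b}, M={M:4d}:  {peak_sqnr_db(L, b, M):6.1f} dB")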
This paper shows that the use of different oversampling ratios along the structure of a SD modulator can alleviate some of the major problems faced in its design. As the integrators have a considerable amount of gain at baseband frequencies, baseband noise and distortion occurring in the integrators succeeding the first one are greatly attenuated when referred back to the modulator input. Therefore, the noise and distortion performance of a SD modulator is primarily determined by the first integrator, which, in turn, determines the power consumption of the full converter. Concerning resolution, this paper shows that a reduction in the oversampling ratio of the first integrator(s) of a SD modulator can be compensated by an increase in the oversampling ratio of the last integrators, whose contribution to power consumption is not as significant. In this sense, this paper shows that proper selection of the oversampling ratio in each integrator of a single-loop or a cascaded SD modulator is another architectural decision to be considered in the design of high-resolution, low-power, high-speed SD modulators. Note that, concerning power consumption, only the analog portion of the modulator is considered here. Besides, the actual decrease in power consumption achieved with these architectures is not quantified here, as it is technology dependent and should be addressed on a case-by-case basis.
2. The Multibit-Multirate Sigma-Delta (MM-SD) Modulator.
For the sake of simplicity, the MM-SD modulator will be presented for the second-order case [6]-[8]. Figure 1b shows the architecture of a conventional second-order SD modulator. Let fs be the sampling rate and fN the Nyquist frequency of the input signal; the oversampling ratio is defined as M = fs/fN. Unlike the conventional SD modulator of figure 1b, in the MM-SD modulator of figure 1a the first and second integrators are operated at different sampling frequencies. Therefore, this modulator has two different oversampling ratios, one for each integrator, where N is the oversampling ratio increment of the second integrator.
In the MM-SD modulator, the output y has to be downsampled to the lower rate before being fed back to the first integrator. To avoid aliasing, y is filtered (by means of the digital filter H(z)) before being decimated. To this end, simple comb-type digital filters are used [1]. Note that, in a practical implementation, the modulator output would be taken from v, rather than from y, to take advantage of the decimation already performed in the feedback path. After decimation, the signal v is b bits wide, and a b-bit DAC is required to convert it to the analog domain. The MM-SD modulator can also be viewed as a multibit SD modulator in which the multibit quantizer in the forward path has been replaced by a single-bit one operating at a higher frequency, thanks to the increase in the clock rate of the second integrator. In the next subsection a relation is given between the word length in the feedback path of a MM-SD modulator and that of a multibit SD modulator with similar performance.
For the sake of completeness, in a MM-SD modulator of order higher than 2, every integrator is operated at the high sampling rate except the first one, which is operated at the low sampling rate. Note that other intermediate solutions could be considered, extending the low oversampling ratio to integrators other than the first one. These intermediate solutions are not studied here and are deferred to future research.
2.1. Analysis of the MM-SD modulator.
The circuit in figure 1a can be shown to be equivalent to the circuit in figure 2. The first integrator of figure 1a, working at the low rate, and the Sample & Hold interpolator have been replaced in figure 2 by an integrator working at the high rate and an N-delay.
Using a linear model for the quantizer in figure 2, the following expression can be obtained:
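(Sketched here in the general form implied by the description that follows; the exact coefficients of the original expression (1) are not reproduced.)
$$ Y(z) = \mathrm{STF}(z)\,X(z) + \mathrm{NTF}(z)\,E(z) + W(z), $$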
where X(z) is the z-transform of the subsampled input sequence and E(z) is the quantization noise. The right-hand side of expression (1) is the sum of three terms: the input signal filtered by the Signal Transfer Function, the shaped quantization noise, and an error term, W(z), due to aliasing in the decimation process. The Signal Transfer Function STF(z) and the Noise Transfer Function NTF(z) are given by:
respectively. The error term W(z) is given by:
It can be shown that, for common values of N, the spectral components of W(z) folded into the signal band are negligible when H(z) is a second-order comb-type filter.
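For reference, the k-th order comb (sinc) filter commonly used for this purpose has the form
$$ H(z) = \left(\frac{1}{N}\,\frac{1 - z^{-N}}{1 - z^{-1}}\right)^{k}, $$
with k = 2 for the second-order case mentioned above.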
Besides, in a first-order approach, at baseband frequencies F(z) can be replaced by 1/N. Then
Note that the factor N appearing in equation (5) cancels the attenuation introduced by the input interpolator (see figure 2). Proceeding as in [1], and extending the result to MM-SD modulators of order higher than 2, one obtains expression (6),
where the in-band Quantization Noise Power (QNP) of a SD modulator is expressed in terms of its quantization step and of the oversampling ratios of the first integrator and of the remaining integrators. An alternative analysis with similar results can be found in [9].
To evaluate the performance of MM-SD modulators, expression (7) gives the ratio between the in-band QNP of a MM-SD modulator (with its two oversampling ratios) and the in-band QNP of a conventional c-bit SD modulator with oversampling ratio M. From (6),
Three different conventional modulators will be compared to the MM-SD modulator:
a) A 1-bit modulator operating at the low oversampling ratio. In this case, the ratio between the in-band QNPs falls by 3·(2L−1) dB for every doubling of the oversampling ratio increment N.
b) A 1-bit modulator with oversampling ratio M (i.e., c = 1). According to expression (7), the in-band QNP of a MM-SD modulator with N = 4 is 3·(2L−3) dB smaller than the in-band QNP of the conventional SD modulator.
c) A c-bit modulator operating at the low oversampling ratio. According to (7), the MM-SD modulator and the c-bit conventional SD modulator have the same in-band QNP when the condition given in (8) is satisfied. If N or L >> 1, this expression simplifies further.
2.2. Simulation results and discussion.
Simulations reveal that a convenient order for the filter H(z) in the feedback path of the MM-SD modulator is k = L, and this value has been used in the simulations reported here¹. Figure 3 shows the simulated Signal-to-Noise-plus-Distortion Ratio (SNDR) for a MM-SD modulator with N = 4 (the feedback DAC resolution is 4 bits).
¹All simulation results in this paper have been obtained by time-domain simulation using Matlab.
To compare performance, three conventional SD modulators are considered again:
a) A 1-bit modulator operating at the low oversampling ratio. As expected, approximately 18 dB separate the two SNDR curves.
b) A 1-bit modulator with oversampling ratio M = 64. Expression (7) predicts a 3 dB larger SNDR for the MM-SD modulator. This difference is hardly visible in figure 3, except for small input signals, where second-order effects seem to make the MM-SD modulator perform better than predicted by our analysis.
c) A 4-bit modulator operating at the low oversampling ratio. According to expression (8), a 3-bit conventional SD modulator should have an SNDR similar to that of the MM-SD modulator of figure 3. However, as the DAC resolution of the MM-SD modulator is 4 bits, a 4-bit conventional architecture has been selected for comparison. It can be seen in figure 3 that the 4-bit SD
converter has an SNDR approximately 11 dB higher, at least for large input signals, as predicted by equation (7). For small input signals, the second-order effects mentioned above seem to give the MM-SD modulator a better SNDR.
3. Multirate Single-Bit Sigma-Delta (MS-SD) Modulator.
Although MM-SD modulators achieve a reduction in the oversampling ratio of the first integrator, the presence of a multibit DAC in the feedback path introduces all the drawbacks inherent to multibit architectures. A new topology of SD modulators, called Multirate Single-bit SD (MS-SD) modulators, was proposed in [10]. MS-SD modulators achieve performance similar to multibit SD modulators, while avoiding the detrimental effects of a multibit ADC in the forward path and a multibit DAC in the feedback path. Starting from the MM-SD modulator of figure 1a, the multibit DAC in the feedback path is replaced by a single-bit DAC (figure 4a). The error introduced in the feedback path is measured in the digital domain by subtracting the modulator output y from an up-sampled version of the single-bit DAC output. This error term, properly shaped by a cancellation filter C(z), is finally added to the modulator output y to form the new modulator output v. Note that a similar idea was proposed by Leslie and Singh [11] for single-rate multibit SD modulators, although the technique proposed here is more involved and also takes into account the error introduced by the decimation process in the feedback path.
3.1. Analysis of the MS-SD modulator.
The circuit in figure 4a can be shown to be equivalent to the circuit in figure 4b. The first integrator of figure 4a, working at the low rate, and the Sample & Hold interpolator have been replaced in figure 4b by an interpolator and an integrator working at the high rate, with an N-delay. Let T(z) and E(z) be the errors introduced by the single-bit quantizers located in the feedback and forward paths, respectively. The following expression can be obtained for the modulator output Y(z):
where NTF(z) and W(z) are given by expressions (2) and (3), respectively. In the cancellation path,
where R(z) is the transfer function of a Sample & Hold,
To evaluate the contribution of the feedback quantization error T(z) to the modulator output, the forward quantization error E and the input signal X are zeroed in (9). Applying the result to (10), and rearranging terms,
The cancellation filter which eliminates the error terms due to T(z) and W(z) is given by the following expression. Replacing STF(z) by its value given in (2), and then using (11),
Such a cancellation filter can be shown to be unstable, so that a direct implementation of C(z), as given by expression (14), is not possible. However, for reasonable values of N, expression (14) can be approximated by a pure delay of 2N samples [10]. Note that, after replacing STF(z) by its value in expression (2), the final expression for C(z) in (13) does not depend on H(z). In fact, it has been observed by simulation that the performance of the MS-SD modulator of figure 4a does not change appreciably if the filter H(z) is a first-order filter, showing that the cancellation path is also able to remove the error introduced in the decimation process.
3.2. Simulation results and discussion.
With such a simple implementation of the cancellation filter, figure 5 shows the SNDR of the MS-SD modulator of figure 4a compared to two other second-order architectures: the MM-SD modulator of figure 1a with a 4-bit DAC in the feedback path, and a conventional 3-bit SD modulator operating at the low oversampling ratio.
The curves in figure 5 have been obtained by simulation for N = 4. A good matching between these three curves can be observed, except for low input signals, where the second-order effects already mentioned seem to make the MM-SD architecture perform better than expected, although this has to be confirmed experimentally. MS-SD modulators with order L > 2 can also be built. As in MM-SD modulators, in high-order MS-SD modulators only the first integrator is operated at the low sampling frequency; the rest are operated at the high sampling frequency. The decimation filter H(z) is an L-th order digital comb filter, although a smaller order can also be used for H(z), because the cancellation path also contributes to removing the error introduced in the aliasing process, as discussed in the previous subsection. The simulated SNDR of a
MS-SD modulator (N = 2) is depicted in figure 6 and compared to conventional 2- and 3-bit SD modulators operating at the low oversampling ratio.
As expected, the MS-SD modulator performs like a 2.5-bit conventional SD modulator, except for the peak SNDR, which is smaller due to the 1-bit implementation of the MS-SD modulator. In the case N = 2, the corresponding cancellation filter C(z) is stable and has been used in the simulations of figure 6. A first-order comb filter was selected as H(z), and the integrator gains, feedback coefficients and cancellation filter of this architecture were fixed accordingly.
The coefficients of the conventional 2- and 3-bit SD modulators have been chosen to maximize their SNDR. However, it should be mentioned that the performance of the proposed modulators relative to their conventional single-rate counterparts does not depend on the modulator coefficients, provided that the cancellation filter is properly designed.
4. Multirate-Cascade SD (MC-SD) Architectures
High-order SD modulators can be built by cascading multiple low-order SD modulators. In this way the stability properties of low-order (namely, first- and second-order) structures are preserved. In a cascade SD modulator, a measure of the error term of the i-th stage is digitized in the (i+1)-th stage, and the stage outputs are combined in the digital domain in such a way that the noise of the i-th stage is cancelled. Special attention has to be paid to mismatches between coefficients in the analog and digital signal flows. As with the integrators of single-loop modulators, the noise of each succeeding stage of a cascaded modulator, when referred to the modulator input, is shaped to a higher degree than the noise of the previous stage. The higher the order of noise shaping, the greater the attenuation in the pass-band. Therefore, the requirements on the cancellation of noise from the second stage are much less severe than those for the first stage. Because of noise shaping, usually only the first stage in a cascade requires special design considerations, and it determines the power consumption of the full modulator. Multirate signal processing can also be applied to reduce the power consumption of a cascade SD modulator. As power consumption increases sharply with sampling frequency when the opamps are operated near their maximum bandwidth, a significant reduction in power consumption is expected if the first stage is operated with a smaller oversampling ratio than the remaining stages. A new class of cascade SD modulators, called Multirate-Cascade Sigma-Delta (MC-SD) modulators, was presented in [12]-[13]. In a MC-SD modulator the first stage is operated at a low sampling frequency while the remaining stages are operated at a high sampling frequency, where N is the oversampling ratio increment of the last stages of the modulator. Figure 7 shows the 2-2 MC-SD modulator. Unlike MM-SD modulators, in a MC-SD modulator the signals at the highest rate are not fed back to the analog circuitry working at the lowest rate. Therefore, the modulator in figure 7 requires neither an additional decimation filter nor an additional DAC.
Note that other intermediate topologies could also be considered. For instance, a different oversampling ratio could be selected for each modulator stage, just as sampling capacitors are usually scaled according to their contribution to the input referred quantization noise. These new multirate topologies are a natural extension of the MC-SD modulators proposed here.
4.1. Analysis of the MC-SD modulator.
The analysis of the 2-2 modulator of figure 7 is presented below. Assuming perfect matching between the coefficients in the analog and digital domains, the z-transform of the modulator output is given by:
where the quantization noise of the second modulator stage appears. Proceeding as in [1], the error cancellation filters are now given by
The cancellation filter of the second stage can be implemented as a cascade of two filters, one working at the lowest sampling frequency and the other at the highest sampling frequency. Note that the Sample & Hold interpolator placed between these two filters compensates the effect of the Sample & Hold interpolator placed between the stages in the analog domain; therefore, its transfer function has not been included in the cancellation filter. A coefficient in the digital path cancels the inter-stage coupling coefficient that appears between the stages in the analog domain. Although this coefficient can be optimized to maximize signal swing, a simple nominal value is appropriate.
4.2. Simulation results and discussion.
Considering that the low-frequency gain of the integrators in the second stage is 20·log10(N) dB greater than that of the integrators in the first stage, the expected theoretical improvement in the SNDR is 40·log10(N) dB when compared to a classical cascade of two second-order stages in which all the integrators work at the lowest rate. For instance, if N = 4 the expected improvement in the SNDR is 24 dB.
This result is validated by simulation. Figure 8 shows the SNDR for four different 2-2 cascade SD modulators. Note that the SNDR of the MC-SD modulator with N = 4 is approximately equal to the SNDR of the conventional SD modulator operating at the oversampling ratio M = 64.
5. Influence of Non-Idealities on Circuit Performances
In practice, imperfections in the analog components prevent the complete cancellation of the error terms. This is especially significant in MS-SD and MC-SD modulators.
For MS-SD modulators, if the signal transfer function exhibits an error due to component non-idealities, then the contribution of the feedback error terms T(z) and W(z) to the output can be shown to be:
where the nominal signal transfer function is the one obtained in the absence of this error. The first factor in (18) is close to 1 in the pass-band, so
that the modulator sensitivity to noise leakage is approximately given by
A likely cause of such leakage is the finite dc gain of the input opamp, A. This will cause a signal-transfer-function error that is (to a good approximation [14]) a high-pass function of order L, where L is the order of the loop filter. Noise leakage due to analog inaccuracies limits the achievable accuracy if the sampling rate increment N is high. This effect is hardly appreciable for reasonable values of N, as shown in figure 9. Concerning MC-SD modulators, figure 10 shows the simulated SNDR of the 2-2 MC-SD modulator with N = 4 and of its single-rate counterpart, for different values of the op-amp dc gain. According to figure 10, the proposed modulator requires op-amps with only 10dB more dc gain to achieve a sensitivity similar to that of a conventional single-rate modulator.
6. Conclusions
This paper has shown that multirating is a useful technique to reduce power consumption in SD converters. Three new types of SD
modulators have been presented. The first one, called the Multirate Multibit SD (MM-SD) modulator, uses a low oversampling ratio in the first integrator of a single-loop modulator. MM-SD modulators require an additional digital filter and a multibit DAC in the feedback path, and they can be considered multibit converters whose multibit quantizer in the forward path has been replaced by a single-bit one, thanks to an increase in the sampling frequency of the last integrators. The second type of new SD modulator, called the Multirate Single-bit SD (MS-SD) modulator, eliminates the multibit DAC in the feedback path by digital cancellation. The third type, called the Multirate Cascade SD (MC-SD) modulator, uses a low oversampling ratio in the first stage of a cascaded architecture. MC-SD modulators do not require a multibit DAC. The performance of these new modulators has been studied and compared to that of conventional SD modulators, showing that, concerning resolution, a reduction in the oversampling ratio of the first integrator(s) or stage(s) of a SD modulator can be compensated by an increase in the oversampling ratio of the last ones, whose contribution to power consumption is not as significant. These results open the way to new high-performance multirate SD modulators, where the sampling frequency of each integrator or stage is optimized to minimize power consumption. Although this paper focused on discrete-time implementations of SD modulators, similar results can be expected from their continuous-time counterparts.
Acknowledgements
This work has been supported by the Spanish CICYT under project DABACOM.
References
[1] J. C. Candy and G. C. Temes, "Oversampling methods for A/D and D/A conversion," in Oversampling Delta-Sigma Data Converters. New York: IEEE Press, 1992, pp. 1-25.
[2] L. E. Larson, T. Cataltepe, and G. C. Temes, "Multi-bit oversampled A/D convertor with digital error correction," Electron. Letters, vol. 24, pp. 1051-1052, Aug. 1988.
[3] J. W. Fattaruso, S. Kiriaki, M. de Wit, and G. Warwar, "Self-calibration techniques for a second-order multi-bit sigma-delta modulator," IEEE J. Solid-State Circuits, vol. 28, pp. 1216-1223, Dec. 1993.
[4] B. H. Leung and S. Sutarja, "Multibit sigma-delta A/D converter incorporating a novel class of dynamic element matching techniques," IEEE Trans. Circuits Syst. II, vol. 39, pp. 35-51, Jan. 1992.
[5] S. Rabii and B. A. Wooley, "Appendix A: Fundamental limits," and "Appendix B: Power dissipation vs. supply voltage and oversampling rate," in The Design of Low-Voltage, Low-Power Sigma-Delta Modulators. Boston: Kluwer AP, 1999.
[6] F. Colodro, A. Torralba, F. Muñoz, and L. G. Franquelo, "New class of multibit sigma-delta modulators using multirate architecture," Electron. Letters, vol. 36, pp. 783-785, April 2000.
[7] F. Colodro, A. Torralba, A. P. Vega-Leal, and L. G. Franquelo, "Multirate-multibit sigma-delta modulators," Proc. IEEE ISCAS'00, vol. 2, pp. 21-24, 2000.
[8] F. Colodro and A. Torralba, "Multirate sigma-delta modulators," IEEE Trans. Circuits Syst. II (to appear).
[9] O. Oliaei, "Analysis of multirate sigma-delta modulators," Proc. IEEE ISCAS'01, vol. 1, pp. 448-451, 2001.
[10] F. Colodro and A. Torralba, "Improved multirate sigma-delta architecture," Proc. IEEE ISCAS'01, vol. 1, pp. 464-467, 2001.
[11] T. C. Leslie and B. Singh, "An improved sigma-delta modulator architecture," Proc. IEEE ISCAS'90, pp. 372-375, 1990.
[12] A. Torralba and F. Colodro, "Multirate-cascade sigma-delta (MC-SD) modulators," Proc. IEEE ISCAS'01, vol. 1, pp. 384-387, 2001.
[13] F. Colodro and A. Torralba, "Modulador Sigma-Delta en Cascada Multifrecuencia," Spanish patent application number P200101073, filed May 2001.
[14] G. C. Temes, "Finite amplifier gain and bandwidth effects in switched-capacitor filters," IEEE J. Solid-State Circuits, vol. SC-15, pp. 358-361, 1980.
Circuit Design Aspects of Multi-Bit Delta-Sigma Converters
Yves Geerts¹*, Michiel Steyaert² and Willy Sansen²
¹ Alcatel Microelectronics, Zaventem, Belgium. Email: [email protected]
² KU Leuven, ESAT-MICAS, Heverlee, Belgium
Abstract
Delta-Sigma converters are suitable for implementing high-performance analog-to-digital converters. Several topologies are first reviewed in the context of high-resolution, high-speed design targets. The remainder of the paper focuses on the influence of several important circuit non-idealities which can become performance-limiting factors. A 16-bit 2.5 MS/s converter is discussed as a design example.
1 Introduction
Over the last decade, a vast evolution of communication systems was observed, driven by broadband Internet access demands and the development of wireless communication systems. The core of these complex electronic systems consists of digital circuits which have a huge computational power and are implemented in CMOS technologies. These systems require a high-performance interface to the analog world. This paper discusses design aspects of high-performance AD converters. The first section briefly introduces the different architectures which will be used as examples throughout the paper. Section 3 takes a look at the optimal implementation of the integrator and the DAC and the sizing of the sampling capacitances. Section 4 discusses the influence of several circuit non-idealities on the performance of the converter. Finally, the design of a 16-bit converter is discussed as an example, followed by some conclusions.
*Yves Geerts was at KU Leuven until August 2001. Since then he has been with Alcatel Microelectronics.
2 Architectures
In order to combine a high resolution with a high speed, a small oversampling ratio should be selected to limit the clock speed of the converter and thereby the bandwidth requirements of the integrators. Therefore, a topology is required which can deliver high accuracy at low oversampling ratios. Classical single-loop single-bit converters would require a high loop-filter order to achieve good accuracy at a low oversampling ratio. Unfortunately, increasing the order seriously degrades the stability, resulting in a serious deterioration of the SNR compared to an ideal n-th order structure [1, 2]. Cascaded or MASH topologies [1, 3, 4, 5, 6, 7] can combine high-order noise shaping with the excellent stability of a second-order converter. The main drawback of these topologies is the severe building-block specifications needed to avoid noise leakage from the first stage to the output. Finally, multi-bit converters can achieve a significant improvement in performance by employing a multi-bit quantizer. Besides the accuracy improvement for each extra bit in the quantizer, they offer improved stability. This allows a more aggressive noise shaping which results in an additional accuracy improvement [1, 2, 8]. For example, the accuracy of a four-bit third-order design improves by 37dB compared to a single-bit design, of which 13.5dB is due to the more aggressive noise-shaping function and the larger overload level of the converter [9]. This shows that the stabilizing effect of multi-bit converters can be used to gain a significant performance improvement. In [9], the optimal coefficients are derived for a large variety of topologies. This leads to the overview of the achievable performance shown in Fig. 1. These graphs clearly show that cascaded and/or multi-bit topologies can combine a high performance with a low oversampling ratio. Finally, it should be noted that multi-bit converters impose severe linearity requirements on the DAC. Since the DAC is in the feedback loop, its accuracy needs to be at least as good as that of the converter in order not to deteriorate the performance. Two main trends exist to relax these specifications. The first option, the dual-quantization topology [10, 3, 4, 5], is no longer a true multi-bit structure since the feedback to the first stage is now implemented with only one bit. Therefore, the benefits due to the improved stability are partly sacrificed. Furthermore, when dual-quantization techniques are used in cascaded topologies, the single-bit output of the first stage contains much more quantization noise than the subsequent multi-bit stages. Therefore, the already severe building-block specifications of cascaded topologies are increased even further to avoid degradation due to noise leakage. The second option, dynamic element matching (DEM) [11, 12], uses a digital algorithm to shuffle the unit elements of the DAC and obtain a shaping of the DAC error. This technique does provide all the advantages of a multi-bit structure, while it only requires an additional digital block in the feedback path of the converter.
3 Implementation of the Integrator
The feedback of the converter can be implemented in different ways. First, the possibilities for a single-bit feedback are discussed, followed by a discussion of multi-bit feedback. The principles are illustrated in a single-ended way, although implementations are generally differential, since this improves the dynamic range by 3dB and provides better suppression of even-order harmonics. There are generally two different ways to implement a single-bit feedback in a switched-capacitor integrator. Fig. 2a uses a single reference voltage which is used as the input to both an inverting and a non-inverting network. The decision of the comparator determines how both branches are connected to the differential integrator during the appropriate clock phase. Fig. 2b uses two symmetrical reference voltages which can be connected to the input terminal of both sampling
184
capacitances to perform the feedback function. Under the assumption that equals both circuits have the same function but some important differences exist with major implications on the power consumption and speed of the integrator. Since the implementation with double reference voltages shares the sampling capacitance to sample the input and subtract the feedback, it only requires half the number of capacitances and less switches. This results in half the amount of kT/C noise power and consequently the capacitance sizes for the double reference voltage implementation can be reduced by a factor two for the same This results in a power decrease of the OTA and a smaller die-size. On top of that, the capacitive feedback factor during the integration phase is larger since less capacitors are connected to the input of the amplifier. This results in a smaller capacitive load and a larger dominant closed-loop pole, leading to a faster settling of the charge transfer. Unfortunately, there are also a few drawbacks associated with the use of double polarity reference voltages. The circuitry to generate and buffer the reference voltages is more complicated since two symmetrical reference voltages are required. Furthermore, the charge that these buffers deliver to the capacitors in dependent on the input signal. Therefore, special attention must be paid to the reference buffers to ensure sufficient settling performance. Note that the power consumption of the buffer can become an important part of
185
the total power consumption of the converter [13]. Despite this drawback, double polarity reference voltages are chosen due to the power and settling advantages of the integrators. When a multi-bit feedback path is required, a first implementation option is to use a resistor ladder between and to generate the required feedback voltages, as shown in Fig. 3a. The switches are driven by a 1-of-n code and connect the wanted ladder taps to the sampling capacitances. Since the resistance ladder is also needed for the multi-bit quantizer, the extra hardware is limited to the switches and a simple digital conversion from the thermometer output code of the quantizer to a 1-of-n code. This hardware can be shared among all the integrators. In order not to degrade the settling performance of the integrators, the resistance of the ladder has to be small, resulting in an increased power consumption. Another drawback of this implementation is the incompatibility with dynamic element matching techniques since these require the shuffling of the unit elements of the ladder. Therefore, this implementation is mainly used in the last stage of dual-quantization converters [3, 4, 5]. Fig. 3b shows a more interesting alternative. For a three-bit implementation, the sampling capacitance is split up into seven parallel unit capacitors, which
186
can be connected separately to either or during the integration phase. The connection of the unit capacitors to or is directly controlled by the thermometer output code of the quantizer. Since the use of each unit capacitor can be directly controlled, this implementation is very well suited for dynamic element matching techniques. To determine the size of the sampling capacitance, two requirements have to be taken into account. The first requirement is related to the circuit noise of the converter. In most practical cases, the noise of the OTAs can be neglected compared to the kT/C noise [9] and the circuit noise for a differential implementation can be expressed as
where OL represents the overload level of the converter and OSR the oversampling ratio. The relative levels of the quantization noise and the circuit noise are important for the power consumption of the converter. If the total in-band circuit noise is lower than the in-band quantization noise, power is wasted since the sampling capacitors are oversized. On the other hand, when the circuit noise dominates, a lower-order converter or a lower oversampling ratio could be used to obtain the same performance. These considerations show that the in-band circuit noise should be approximately equal to the in-band quantization noise to obtain a low power consumption [14]. This condition results in a minimum value for the sampling capacitance. The second requirement for the sampling capacitance is related to the accuracy of the multi-bit feedback DAC. This accuracy depends on the chosen architecture and on the use of dual-quantization or DEM algorithms to relax the specifications. Monte-Carlo simulations, combined with matching data of the technology, yield a minimum size for the unit capacitance and thus also for the sampling capacitance. Depending on the specifications of the converter, the topology, the number of bits in the quantizer, the dynamic element matching algorithm and the technology, either the specification due to matching or the one due to the kT/C noise will dominate. If the matching requirement dominates, the total sampling capacitance will be larger than needed for the kT/C noise floor. This means a waste of power simply due to the matching constraints of the multi-bit feedback DAC. This situation should be avoided by a proper selection of the architecture. If the capacitances are determined by the kT/C noise floor, no power is wasted. Thus, when a good architecture choice is made, the multi-bit integrator can be implemented with the same amount of capacitance as a one-bit implementation with the same accuracy goal, thereby avoiding additional kT/C noise and capacitive loading of the integrator. The power consumption remains the same as for a single-bit implementation and the die size is also comparable.
4 Circuit non-idealities
Due to the influence of several circuit non-idealities, the performance of a practical implementation of a converter can be significantly worse than the values shown in Fig. 1. In order to properly design the converter, one should know the influence of all the non-idealities on its performance. First, several important non-idealities of the switched-capacitor integrator are discussed, followed by a closer look at the quantizer. The derivation and the equations of the models for these non-idealities are not discussed in this paper, but a detailed description can be found in [7, 9, 15]. Instead, this paper focuses on the influence of the architecture choice on the specifications of the building blocks and presents some general design guidelines and rules of thumb for various non-idealities. The model for the switched-capacitor integrator is shown in Fig. 4. It contains the feedback of the DAC and the sampling and integration capacitances. In order to obtain an accurate model, it is very important to include the parasitic capacitors associated with the input and output capacitances of the OTA, which also account for the bottom-plate parasitics of the capacitors. An additional load capacitance is included to model the sampling operation of the next stage. The model assumes that the OTA of the next integrator is ideal, such that the top plate of this load capacitance is connected to a perfect (virtual) ground during both clock phases.
4.1 Finite OTA gain
The finite gain of the OTA is the first non-ideality of the integrator that is discussed. The OTA is represented by a voltage controlled voltage source with gain A. By applying the methods described in [16], the output of the integrator at the end of the sampling phase can be calculated as [7]
where the closed-loop static errors and the capacitive feedback factors of the sampling and the integration phase appear. They are given by
The finite OTA gain introduces two errors in the transfer function of the integrator. The gain error slightly reduces the gain of the integrator, and the pole error moves the pole away from dc (z = 1). Both of these errors depend on the product of the OTA gain and the capacitive feedback factors. This model can be used to determine the required OTA gain. Fig. 5 shows simulations of single-loop and cascaded topologies for various oversampling ratios and different OTA gains. The left graph shows the simulation results of a third-order four-bit converter. This shows that a gain of only 30 to 40dB is sufficient in order to avoid performance degradations. Other single-loop topologies have similar characteristics and gain requirements. Note that the gain requirement depends only slightly on the oversampling ratio. This is no longer true when cascaded topologies are considered. The full lines in the right graph show the simulation results of a 2-1-1 cascaded topology with one-bit
quantizers. As the oversampling ratio increases, the gain requirement of the OTA increases from 40dB to more than 100dB. This large gain variation shows that the noise-leakage of the quantizer of the first stage to the output of the converter becomes more important as the oversampling ratio increases. This results in tougher building block specifications, such as the OTA-gain. The dashed line in the right graph indicates a dual-quantization cascaded topology with single-bit quantizers in the first two stages and a four-bit quantizer in the last stage. When this curve is compared to the single-bit cascaded implementation with the same oversampling ratio, it is clear that the increased performance of the dual-quantization structure comes at the cost of a severe increase of the gain requirement. In a practical implementation of an OTA, the gain is not the same for all values of the output voltage. Instead, it decreases as the differential output voltage increases. This is due to the reduction of the output resistance as the drain-source voltage of the output transistors decreases. The non-linear gain of the OTA can be modeled by a truncated Taylor expansion as
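(A generic form of such an expansion, with symbols assumed here purely for illustration, is the following.)
$$ A(v_{out}) \approx A_0\,\big(1 + a_1\,v_{out} + a_2\,v_{out}^{2}\big), $$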
where the output voltage of the OTA appears. Note that the coefficient of the nonlinear term is negative, since the gain decreases as the output swing increases. In [9], the following signal-to-harmonic-distortion ratios are calculated for this model
where the amplitude of the input signal appears. These calculations show that the distortion components can be suppressed by increasing the gain of the OTA. Fig. 6 compares behavioral simulations to the analytical model for a third-order four-bit converter for various values of the nominal OTA gain. This shows that the model fits very well, except for very small and very large values of the distortion ratios. The deviation for small values is easy to explain: even when no harmonic distortion components are visible in the output spectrum of the converter, a shaped quantization noise floor is present. This limits the maximum observable distortion ratios, since the harmonic distortion components are submerged in the quantization noise. Note that the level of the quantization noise in each bin depends on the number of points used in the simulations. However, for the largest values the simulated ratios are up to 12dB worse and up to 6dB better than predicted by the analytical model. At that point some of the approximations made during the derivation of the analytical model are no longer valid. When a gain of 40dB is selected for the OTA, the gain variation due to the nonlinearity can only be 0.05dB if 16-bit performance levels are to be achieved for a 1V signal. These kinds of values are almost impossible to achieve. So although the simulations with a fixed gain only require a gain of 40dB for single-loop topologies, this specification should be increased in order to reduce the distortion components sufficiently.
4.2 Finite dominant closed-loop pole of the OTA

In the previous model, all voltages instantaneously take their final value at the beginning of the sampling and integration phases, since the voltage-controlled voltage source does not model any settling effects. In a practical implementation, however, the poles of the OTA limit the settling performance. Therefore, the OTA is modeled with a transconductance and a finite output conductance to study the influence of the dominant closed-loop pole of the amplifier [7]. The dominant closed-loop pole during the integration phase is proportional to the transconductance and the capacitive feedback factor, and inversely proportional to the total capacitive load at the output of the OTA.
This capacitive load also includes the sampling capacitance of the next stage. Just as for the finite gain of the OTA, simulations are performed for a single-loop and a cascaded converter. Fig. 7 shows the simulation results and indicates the OTA gain used during the simulation. An OTA gain of 80 dB is chosen for all cases, except for the large oversampling ratios of the cascaded converter. For the single-loop converters, a dominant closed-loop pole equal to 1.5 times the sampling frequency is sufficient. For the cascaded
converters, the requirement varies from 1.5 to 4 times the sampling frequency as the oversampling ratio increases. As a rule of thumb, a dominant closed-loop pole of about 1.5 times the sampling frequency can be chosen for all single-loop topologies. The requirement for cascaded converters depends on the topology and the oversampling ratio, but is generally larger than for single-loop converters.
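For a purely linear (non-slewing) settling model, the residual error at the end of the integration phase depends only on the ratio of the dominant closed-loop pole to the sampling frequency. The short sketch below evaluates this error, assuming (an assumption not stated in the text) that half a clock period is available for integration; it illustrates why a pole at roughly 1.5 times the sampling frequency is already adequate for single-loop topologies, where incomplete linear settling acts largely as a fixed gain error.

```python
import numpy as np

# residual linear settling error after half a clock period:
# err = exp(-(Ts/2)/tau) with tau = 1/(2*pi*f_pole), so the error
# err = exp(-pi * f_pole/f_s) depends only on the ratio f_pole/f_s
for ratio in (1.0, 1.5, 2.0, 4.0):
    err = np.exp(-np.pi * ratio)
    print(f"f_pole/f_s = {ratio:3.1f} -> settling error = {err:.1e} "
          f"({20*np.log10(err):6.1f} dB)")
```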
4.3 Resistance of the switches
Up to this point, the switches have been assumed to have an ideal zero resistance when they are closed. In practice, they are implemented with nMOS and/or pMOS transistors exhibiting several non-ideal effects, such as a non-zero resistance, clock feedthrough, charge injection and a variation of the resistance with the input signal. First, the influence of a fixed non-zero resistance is discussed. The resistances of switches S1 and S2 of Fig. 4 are lumped into one element, and switches S3 and S4 are represented by a second lumped resistance. For a single-bit converter, this model introduces no approximations. For a multi-bit converter, however, an exact representation requires a number of parallel branches, each with a unit capacitance, a switch resistance and a feedback signal, corresponding to Fig. 3b. This would result in a very complicated high-order model, since the number of nodes increases drastically. Therefore, the approximate model of Fig. 4 is used; its results correspond very well to those of the more complex model [9]. Compared to the settling behavior of the previous section, which does not include the switch resistance, the dominant closed-loop pole during the integration phase is degraded by the switch resistances to the lower value given by (8),
in which the original closed-loop pole of (7) appears. This clearly shows how the switch resistance during the integration phase degrades the settling behavior of the integrator. Fig. 8 shows simulations to determine the specifications for the on-resistance of the switches. To determine the influence of the resistance during the sampling phase, during the integration phase, and the combined effect of both, three simulations are shown, represented by the dashed, dash-dot and solid lines, respectively. For the single-loop converter, all three lines are quite close together and it is sufficient that the degraded closed-loop pole is larger than
1.5 times the sampling frequency. For the cascaded 2-1-1 converter with a low oversampling ratio, the same conclusion can be drawn. However, as the oversampling ratio increases, it becomes clear that the resistance during the sampling phase is of lesser importance. This could be expected from the model, since the resistance during the integration phase and the capacitances not only form a time constant that slows down the charge transfer, but also reduce the dominant closed-loop pole of the OTA. Just as for the previous models, a larger oversampling ratio requires tougher specifications for the cascaded converters. This comes down to a smaller switch resistance and therefore larger switches, resulting in more clock feedthrough and increased capacitive loading of the clock buffers that drive these switches. Note that the dominant closed-loop pole influences the specifications of the switch resistance significantly. A larger dominant closed-loop pole generally results in a more relaxed specification for the switches, as can be seen from (8). The second important problem related to the switches is the variation of the switch resistance with the input signal. When a switch is in the on-phase, it can be assumed that the transistor is in the linear operation region. The on-resistance of an nMOS is then given by [17]

R_on = 1 / ( µn · Cox · (W/L) · (VGS − VT) )
This equation shows two important effects. First, a reduction of the supply
voltage immediately increases the switch resistance, since the overdrive voltage of the transistor decreases. Second, the switch resistance depends on the source and drain voltages. When looking at the switched-capacitor integrator of Fig. 4, it is obvious that the resistance of switch S1 depends directly on the input signal of the converter. This generates harmonic distortion. Both these effects can be reduced by employing transmission gates with nMOS and pMOS transistors in parallel. The simulated switch resistance of switch S1 as a function of the input signal is shown in Fig. 9a. This graph clearly illustrates that the resistance of switch S1 depends directly on the input signal and thus causes harmonic distortion. Switch S2 always has one terminal connected to a fixed voltage, so at the end of the sampling phase the voltages at the source and drain of that switch are about constant for each clock period. Therefore, the distortion generated by this switch can be neglected compared to the distortion introduced by switch S1. The same applies to switches S3 and S4, which are connected to a fixed reference voltage and the virtual ground of the OTA, respectively, at the end of the integration phase. Furthermore, no time-varying input signal drives the circuit during the integration phase. Instead, a constant charge proportional to the input signal is transferred from the sampling to the integration capacitance. Since this is like applying a dc signal during every clock phase, S3 and S4 generate considerably less distortion. Therefore, to study the distortion
introduced by the switches, only the sampling operation through switches S1 and S2 has to be considered. Fig. 9b shows that the second- and third-harmonic suppression improve by approximately 20 dB/dec as the dimensions of the switches increase. For a differential implementation, the even-order distortion is sufficiently suppressed, but one should ensure that the odd-order suppression exceeds the wanted resolution of the converter for an input signal close to overload and at the edge of the signal band, since these conditions give the worst-case distortion components [7]. This can be done by making the switches large enough. Note that this improves both the distortion and the settling performance of the integrator. A drawback is that larger switches increase the clock feedthrough, the charge injection and the capacitive load on the clock drivers. If the switch becomes too large, the non-linear parasitic junction capacitances of the switch can eventually dominate the sampling capacitance and degrade the linearity of the sampling operation. As the supply voltage of modern technologies is further reduced, special circuit techniques and technologies can be used to improve the linearity of the input switch. A first option is the use of technologies with low-threshold devices, since this leads to a larger overdrive voltage and a reduced on-resistance, but these devices tend to suffer from increased leakage currents. Besides this, the extra processing steps lead to increased cost and turn-around time [18, 19]. Another way to improve the linearity of the sampling operation is to apply clock boosting techniques [19]. A first technique boosts the clock signals to twice the supply voltage to reduce the resistances [20]. A second technique tries to keep the overdrive voltage of the switches constant by boosting the clock signal to the power supply plus the input signal [13, 21, 22]. The constant overdrive significantly enhances the linearity, although the body effect still makes the resistance signal dependent. Two variations of this technique exist. The first has a constant gate voltage during the sampling process [13]. This voltage equals the input signal plus the power supply, and it does not track any variation of the input signal during the sampling operation. In contrast, the gate voltage of the second technique tracks the sum of the input signal and the power supply during the sampling operation and thus ensures a constant gate-source voltage [21, 22]. This last variation is more suitable for input frequencies close to the Nyquist rate. Finally, [23] proposes a technique which also compensates for the body effect by using a replica. The drawback is the requirement for a high-speed OTA and thus the large power consumption.
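The signal dependence of the switch resistance can be illustrated with the first-order triode expression given above. The sketch below uses assumed device parameters (they do not describe any particular process) and compares a single nMOS switch, a single pMOS switch and a transmission gate; an ideally bootstrapped switch would hold the gate overdrive constant and therefore show an almost flat curve.

```python
import numpy as np

KPn, KPp = 200e-6, 70e-6     # mu*Cox for nMOS/pMOS [A/V^2] (assumed)
WLn, WLp = 20.0, 60.0        # W/L of the switch devices (assumed)
VTn, VTp = 0.5, 0.5          # threshold voltages, body effect ignored
VDD = 2.5

vin = np.linspace(0.0, VDD, 6)
r_n = 1.0 / (KPn * WLn * np.clip(VDD - vin - VTn, 1e-3, None))
r_p = 1.0 / (KPp * WLp * np.clip(vin - VTp, 1e-3, None))
r_tg = 1.0 / (1.0 / r_n + 1.0 / r_p)          # transmission gate: in parallel

for v, rn, rp, rt in zip(vin, r_n, r_p, r_tg):
    print(f"vin = {v:4.2f} V   Rn = {rn:9.1f}   Rp = {rp:9.1f}   "
          f"Rtg = {rt:8.1f} Ohm")
```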
All these boosting techniques require the use of voltage levels above the intended supply voltage of the technology. Although some of these techniques ensure that the terminal-to-terminal voltages of the devices always remain below the nominal supply [21, 22], care must be taken not to compromise the lifetime of the circuits. Another drawback is the area and power overhead of these boosting circuits.
4.4 Slew-rate effects
Another important non-ideal effect in switched-capacitor integrators is the slewing of the OTA. The slewing performance of the integrator depends on many factors, and it is very important to include all of them in order to obtain an accurate model. First, the parasitic capacitances at the OTA input and output, shown in Fig. 4, have a big influence on the slewing behavior of the integrator. Slewing is most likely to occur at the beginning of the integration phase. The voltage sampled on the sampling capacitance is then switched between the feedback signal and the input terminal of the OTA, causing a large voltage spike at the OTA input node which can drive the OTA into slewing. When the parasitic capacitances are not included in the model, the height of this spike equals the full sampled voltage and slewing occurs very frequently. When they are included, however, the height of this initial step is significantly reduced, since an immediate charge redistribution among the capacitors takes place at the beginning of the integration phase. The second requirement for the slew-rate model is the need to include the sampling operation of the next integrator. The waveforms of Fig. 10a illustrate this. At the beginning of the sampling phase of the next stage, its sampling capacitance is connected to the output of the OTA and a voltage drop at the output is observed due to an immediate charge redistribution. The OTA input node shows the same drop, so the charge on the integration capacitance is not affected. Due to this peak, the OTA can also enter the slewing region during this clock phase. To ensure that the correct voltage is sampled on the sampling capacitance of the next stage, this peak should also settle. Therefore, the sampling operation of the next stage needs to be included in the model. Finally, the resistance of the switches smooths the initial peaks at the beginning of the clock phases and thus also influences the slewing behavior. Fig. 10a shows the transient waveforms of the input and output node of the integrator. The full horizontal lines indicate the limits of the slewing condition. The thin lines are the waveforms from a full circuit simulation. This clearly shows that the model matches the circuit
simulator very closely. Fig. 10b shows simulation results for a third-order converter with one or four bits in the quantizer and an oversampling ratio of 32. A large difference can be observed between the single- and multi-bit converters in the slew-rate simulations. In a multi-bit converter, the feedback signal tracks the input signal much more closely and therefore the initial voltage drop at the input of the OTA is much smaller. This results in more relaxed slew-rate specifications. Finally, it should be noted that the slew-rate specification depends strongly on the applied input frequency, especially for multi-bit converters. The reason is that a larger difference exists between the input and feedback signals as the signal frequency increases. This results in larger initial voltage steps at the input of the OTA, and consequently slewing occurs more frequently. Therefore, larger slew-rate values are required to reduce the time spent in slewing and to ensure adequate settling in each clock phase.
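The interplay between slewing and linear settling can be illustrated with a crude single-pole model. The numbers below (bandwidth, slew rate, step sizes, time window) are illustrative assumptions; the two step sizes stand for the large feedback step of a single-bit converter and the much smaller step of a four-bit converter.

```python
import numpy as np

def settle(step, bw=2 * np.pi * 150e6, slew=2e8, t_end=8e-9, n=4000):
    """Single-pole settling with slew-rate limiting (behavioural sketch).

    step : initial error at the OTA output [V]
    bw   : linear closed-loop bandwidth [rad/s]
    slew : maximum output slope [V/s]
    Returns the residual error at t_end and the time spent slewing.
    """
    dt = t_end / n
    err, t_slew = step, 0.0
    for _ in range(n):
        rate = bw * err                   # slope demanded by linear settling
        if rate > slew:                   # demanded slope too high: slewing
            err -= slew * dt
            t_slew += dt
        else:                             # exponential (linear) settling
            err -= rate * dt
    return err, t_slew

for step in (1.0, 0.125):                 # ~single-bit vs ~4-bit feedback step
    e, ts = settle(step)
    print(f"initial step {step:5.3f} V -> residual {e:.1e} V, "
          f"slewing for {ts*1e9:.2f} ns")
```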
4.5 Offset effects in the quantizer
The non-idealities in the quantizer are less important than those of the feedback DAC, due to their location in the converter. Any non-ideality of the DAC immediately appears unattenuated at the output of the converter. In contrast, the non-idealities of the quantizer are suppressed by the
gain of the preceding integrators. In fact, they are subject to the same noise-shaping action as the quantization noise. Therefore, they are generally less important and can be neglected in many cases. However, it will be shown that they can become a performance-limiting factor in high-resolution converters [15]. The quantizer in a ΔΣ converter runs at the same speed as the converter, without any latency. Therefore, it is implemented as a flash AD converter consisting of several parallel comparators and a reference ladder to generate the required voltage taps. Fig. 11a shows the transfer function of a comparator including offset and hysteresis effects. In this paper, only the influence of offset is discussed. The offset is considered a random variable, which depends on the matching performance of the technology and on the sizes and topology of the comparator. Monte-Carlo simulations of the behavioral model are performed for various converters. The worst-case results are shown in Fig. 11b. They show that the third-order single-bit converter is very insensitive to these non-idealities. For a reference voltage of 1 V, the standard deviation of the offset needs to be smaller than 100 mV to have less than 3 dB degradation. For the cascaded converter, this specification is 40 mV. These simulations show that single-bit converters are very insensitive to non-idealities in the quantizer. The offset specification requires some care during the design, but it imposes no real problems. When the same simulations are performed for a third-order four-bit
converter, the situation is quite different. The standard deviation of the offset voltage should then be smaller than 6 mV. Due to the offset specification, fairly large input transistors are required, which results in a larger input capacitance of the comparator. On top of that, the multi-bit quantizer has a number of comparators in parallel, which increases the input capacitance even further. This total input capacitance increases the effective load capacitance of the last integrator. Together with the settling requirements, this can lead to an increased power consumption of the last integrator [15].
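A quick way to reproduce this kind of study is a Monte-Carlo run on a behavioural flash quantizer with randomly offset thresholds. The sketch below is only a crude stand-in for the simulations of Fig. 11: it measures the extra error added by the offsets at the quantizer itself, whereas in the complete converter this error is further attenuated by the noise shaping of the loop; the reference level, number of bits and offset spreads are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def flash(x, vref=1.0, bits=4, sigma_off=0.0):
    """Behavioural flash quantizer with Gaussian comparator offsets."""
    taps = np.linspace(-vref, vref, 2 ** bits + 1)[1:-1]     # ideal thresholds
    taps = taps + rng.normal(0.0, sigma_off, taps.size)      # add offsets
    return np.sum(x[:, None] > taps[None, :], axis=1)        # thermometer sum

x = 0.9 * np.sin(2 * np.pi * 0.013 * np.arange(4096))
ideal = flash(x)                                             # offset-free case
for sigma in (0.006, 0.04, 0.1):
    extra = [np.std(flash(x, sigma_off=sigma) - ideal) for _ in range(20)]
    print(f"sigma_offset = {sigma*1e3:5.1f} mV -> extra quantizer error "
          f"= {np.mean(extra):.3f} LSB rms (20 Monte-Carlo runs)")
```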
5 Design example
In this section, the design of a single-loop multi-bit ΔΣ converter is presented [15, 24]. The system diagram is shown in Fig. 12. A third-order four-bit topology is selected, which can achieve a resolution of 16 bits for an oversampling ratio of only 24. Data Weighted Averaging (DWA) is used to limit the accuracy requirement of the DA converter in the feedback loop. The output of this DWA block has to be distributed over the entire chip to the local switch drivers. Therefore, a buffer is inserted to deal with the large gate and wiring capacitance. The local switch drivers generate the local control signals for the switches of the DACs, which are implemented together with the integrators as shown in Fig. 3b. The size of the sampling capacitances is determined by kT/C noise considerations. The first stage uses a total sampling capacitance of 3.2 pF. The matching of the resulting unit capacitors is sufficiently good thanks to the use of the DWA algorithm. The specifications for the different building blocks are derived from behavioral
simulations, as discussed in the previous sections. Table 1 shows some of the most important specifications. Note that a 15% margin is added to the specification of the dominant closed-loop pole and the resistance of the switches to account for the time lost due to the non-overlapping clocks. A folded-cascode OTA with gain-boosting stages is used to combine a large gain with excellent frequency performance. The converter is implemented in a CMOS process, operating from a 5 V supply. The location of the different building blocks is indicated on the micro-photograph of the chip, shown in Fig. 13. Special care has been taken to provide identical surroundings for the unit capacitances of the DAC and to shield the most sensitive nets, such as the reference voltages. The total area figure includes the bonding pads and decoupling capacitances. Fig. 14a shows the output spectrum, while Fig. 14b shows the measured SNR and SNDR of 95 dB and 89 dB, respectively, for a clock frequency of 60 MHz
and an oversampling ratio of 24. A dynamic range of 97 dB is achieved in a 1.25 MHz signal bandwidth. The power consumption is 295 mW, of which 152 mW is consumed in the analog part. The digital power consumption is mainly due to the clock buffer, which generates the non-overlapping clocks for the switched-capacitor circuits from one external clock signal.
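The statement that the 3.2 pF sampling capacitance follows from kT/C considerations can be checked with a first-order noise budget. The sketch below assumes a differential sampler whose total sampled noise power is 2kT/C, a 1 V reference amplitude and the benefit of the oversampling ratio of 24; the exact prefactor depends on the actual switch configuration, so the numbers are indicative only.

```python
import numpy as np

k, T, osr, vref = 1.38e-23, 300.0, 24, 1.0     # assumed reference amplitude

for C in (0.8e-12, 1.6e-12, 3.2e-12):
    # total sampled noise power taken as 2kT/C (assumption), of which only
    # the 1/OSR in-band fraction remains after decimation
    v_n = np.sqrt(2 * k * T / C / osr)
    snr = 20 * np.log10((vref / np.sqrt(2)) / v_n)
    print(f"C = {C*1e12:3.1f} pF -> in-band kT/C noise = {v_n*1e6:5.1f} uVrms,"
          f"  SNR limit = {snr:5.1f} dB")
```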
6 Conclusion
ΔΣ converters are widely applied to achieve high-performance AD conversion. A brief review of architectures showed that multi-bit and/or cascaded converters are best suited to combine high speed with high resolution. In order to obtain a well-designed implementation of these converters, it is important to know which circuit non-idealities limit the performance. In that respect, the (non-linear) resistance of the switches, the gain, settling and slewing of the OTA, and the offset of the quantizer have been discussed. A 16-bit 2.5 MS/s ΔΣ converter was presented as a design example.
References

[1] Steven Norsworthy, Richard Schreier and Gabor Temes, editors, Delta-Sigma Data Converters: Theory, Design, and Simulation, IEEE Press, 1996.
[2] Augusto Marques, Vincenzo Peluso, Michiel Steyaert and Willy Sansen, "Optimal Parameters for ΔΣ Modulator Topologies", IEEE Transactions on Circuits and Systems, vol. 45, n. 9, pp. 1232–1241, September 1998.
[3] Brian P. Brandt and Bruce A. Wooley, "A 50-MHz Multibit Sigma-Delta Modulator for 12-b 2-MHz A/D Conversion", IEEE Journal of Solid-State Circuits, vol. 26, n. 12, pp. 1746–1756, December 1991.
[4] Fernando Medeiro, Belen Perez-Verdu and Angel Rodriguez-Vazquez, "A 13-bit, 2.2-MS/s, 55-mW Multibit Cascade ΣΔ Modulator in CMOS Single-Poly Technology", IEEE Journal of Solid-State Circuits, vol. 34, n. 6, pp. 748–760, June 1999.
[5] Fernando Medeiro, Belen Perez-Verdu and Angel Rodriguez-Vazquez, Top-Down Design of High-Performance Sigma-Delta Modulators, Kluwer Academic Publishers, 1999.
[6] Augusto Marques, Vincenzo Peluso, Michiel Steyaert and Willy Sansen, "A 15-b Resolution 2-MHz Nyquist Rate ΔΣ ADC in a CMOS Technology", IEEE Journal of Solid-State Circuits, vol. 33, n. 7, pp. 1065–1075, July 1998.
[7] Yves Geerts, Augusto Marques, Michiel Steyaert and Willy Sansen, "A 3.3-V 15-bit Delta-Sigma ADC with a Signal Bandwidth of 1.1 MHz for ADSL Applications", IEEE Journal of Solid-State Circuits, vol. 34, n. 7, pp. 927–936, July 1999.
[8] Robert W. Adams, "Design and Implementation of an Audio 18-Bit Analog-to-Digital Converter Using Oversampling Techniques", Journal of the Audio Engineering Society, vol. 34, pp. 153–166, March 1986.
[9] Yves Geerts, Design of High-Performance CMOS Delta-Sigma A/D Converters, PhD thesis, ESAT-MICAS, K.U.Leuven, Belgium, December 2001.
[10] A. Hairapetian, G. C. Temes and Z. X. Zhang, "Multibit sigma-delta modulator with reduced sensitivity to DAC nonlinearity", Electronics Letters, vol. 27, pp. 990–991, May 1991.
[11] L. R. Carley, "A noise-shaping coder topology for 15+ bit converters", IEEE Journal of Solid-State Circuits, vol. 24, n. 2, pp. 267–273, April 1989.
[12] Rex T. Baird and Terri S. Fiez, "Linearity Enhancement of Multibit ΔΣ A/D and D/A Converters Using Data Weighted Averaging", IEEE Transactions on Circuits and Systems II, vol. 42, n. 12, pp. 753–762, December 1995.
[13] Todd L. Brooks, David H. Robertson, Daniel F. Kelly, Anthony Del Muro and Stephen W. Harston, "A Cascaded Sigma-Delta Pipeline A/D Converter with 1.25 MHz Signal Bandwidth and 89 dB SNR", IEEE Journal of Solid-State Circuits, vol. 32, n. 12, pp. 1896–1906, December 1997.
[14] V. Peluso, M. Steyaert and W. Sansen, "A ΔΣ Modulator with 12-b Dynamic Range Using the Switched-Opamp Technique", IEEE Journal of Solid-State Circuits, vol. 32, n. 7, pp. 943–952, July 1997.
[15] Yves Geerts, Michiel Steyaert and Willy Sansen, "A High-Performance Multi-Bit ΔΣ CMOS Converter", IEEE Journal of Solid-State Circuits, vol. 35, n. 12, pp. 1829–1840, December 2000.
[16] Gabor C. Temes, "Finite Amplifier Gain and Bandwidth Effects in Switched-Capacitor Filters", IEEE Journal of Solid-State Circuits, vol. SC-15, n. 3, pp. 358–361, June 1980.
[17] Kenneth R. Laker and Willy M. C. Sansen, Design of Analog Integrated Circuits and Systems, McGraw-Hill, New York, 1994.
[18] Ichiro Fujimori, Lorenzo Longo, Armond Hairapetian, Kazushi Seiyama, Steve Kosic, Jun Cao and Shu-Lap Chan, "A 90-dB SNR 2.5-MHz Output-Rate ADC Using Cascaded Multibit Delta-Sigma Modulation at 8x Oversampling Ratio", IEEE Journal of Solid-State Circuits, vol. 35, n. 12, pp. 1820–1828, December 2000.
[19] Klaas Bult, "Analog Design in Deep Sub-Micron CMOS", in Proceedings European Solid-State Circuits Conference, pp. 11–17, September 2000.
[20] Thomas Byunghak Cho and Paul R. Gray, "A 10-b, 20-Msample/s, 35-mW Pipeline A/D Converter", IEEE Journal of Solid-State Circuits, vol. 30, pp. 166–172, March 1995.
[21] Andrew M. Abo and Paul R. Gray, "A 1.5-V, 10-bit, 14.3-MS/s CMOS Pipeline Analog-to-Digital Converter", IEEE Journal of Solid-State Circuits, vol. 34, n. 5, pp. 599–606, May 1999.
[22] Mohamed Dessouky and Andreas Kaiser, "Very Low-Voltage Digital-Audio ΔΣ Modulator with 88-dB Dynamic Range Using Local Switch Bootstrapping", IEEE Journal of Solid-State Circuits, vol. 36, n. 3, pp. 349–355, March 2001.
[23] Hui Pan, Masahiro Segami, Michael Choi, Jing Cao and Asad A. Abidi, "A 3.3-V, 12-b, 50-MS/s A/D Converter in CMOS with over 80-dB SFDR", IEEE Journal of Solid-State Circuits, vol. 35, n. 12, pp. 1769–1780, December 2000.
[24] Y. Geerts, M. Steyaert and W. Sansen, "A 2.5-MSample/s Multi-Bit ΔΣ CMOS ADC with 95-dB SNR", in Proceedings International Solid-State Circuits Conference, pp. 336–337, San Francisco, February 2000.
High-speed Digital to Analog Converter Issues with Applications to Sigma Delta Modulators

K. Doris (1), D. Leenaerts (2), A. van Roermund (1)

(1) Technical University Eindhoven, Eindhoven, The Netherlands
(2) Philips Research Laboratories, Eindhoven, The Netherlands

Abstract
This paper addresses the timing and switching errors of Switched Current (SI) Digital to Analog Converters (DACs) intended for use in high-speed multi-bit ΣΔ and Nyquist converters. The analysis is based on an error-analysis framework, according to which all errors are classified according to their properties with respect to the time and spatial domains. Examples are presented that relate to significant problems occurring during implementation and to the overall requirements of high-speed multi-bit ΣΔ modulators.
I. Introduction
Digital to Analog Converters (DACs) are primary building blocks, or complete systems on their own, of a wide variety of data converters. For example, they are necessary parts of Sigma Delta converters. The ΣΔ concept, applied both to Analog to Digital Converters (ADCs) and to DACs, allows the translation of signals that capture information in the amplitude domain into signals that carry this information in the time domain. At the extremes, we have the Nyquist converters (all information in amplitude) and single-bit ΣΔ converters (all information in time), and all situations in between are obtained gradually with the choice of the resolution of the internal ADC and DAC cores. However, the DAC, in particular, must have the linearity of the intended system. Typical architectures that employ DACs are shown in fig. 1 and 2. The difficulty for a DAC core to meet the requirements of a data converter is influenced by the algorithmic approach used. In Discrete-Time (DT) multi-bit ΣΔ ADCs [1], the continuous-time input signal is initially sampled by a Track and Hold (T/H) circuit and the rest of the system operates on sampled values. Consequently, the accuracy of the settled values and the settling time behavior of the DAC are important. DT ΣΔ converters have reached 14 – 16 bits of
accuracy with signal bandwidths of 1 – 5 MHz and sampling rates of a few tens of MHz, e.g. [2]-[4]. For signal frequencies in the order of tens of MHz and proportionally higher sampling rates, a DT choice results in strict requirements on the input T/H and the opamps of the DT ΣΔ modulator. This stems from the fact that the specifications for these blocks are tied to the sampling rate of the system. Continuous-Time (CT) ΣΔ ADCs [1] with multi-bit switched-current based DACs [5]-[12] could provide an alternative solution. The advantage is that the specifications of the blocks are mostly related to the signal bandwidth instead of the sampling rate and that the sampling operation takes place inside the loop. However, the main problem of such an approach is the linear operation of the internal DAC in continuous-time mode. Generally, in a CT ΣΔ ADC, in a ΣΔ DAC and in a Nyquist-rate DAC, most of the blocks of the system must be accurate in a continuous-time sense. Hence, it becomes of great interest to obtain quantitative and qualitative insight into the problems of high-speed DACs.
High-speed DACs are most commonly based on a current-mode (current-steering) approach. The primary element in such a DAC is a switched current source that
is controlled by data through a synchronization unit. Because of imperfect matching of the devices used to build the switching elements, each component shows a different switching behavior depending on its location on the chip. This switching variability causes distortion of the output signal. Moreover, the interaction of components through global nodes of the system (clock, substrate, power supply, bias, output signal node, etc.) during operation also adds non-linear errors and noise. Hence, clock jitter, substrate coupling, impedance-modulation errors, power-supply bounce and switching behavior become the limiting factors for high-speed, high-accuracy DACs. The aim of this contribution is, first, to analyze some aspects of the switching and timing problems of a DAC using a general framework of error analysis and, second, using this framework, to highlight emerging tradeoffs in high-speed multi-bit ΣΔ modulators. The tradeoffs are related to oversampling ratio, noise shaping, multi-bit level, DAC circuit-level requirements for given performance specifications, etc. First, in sections II-V the operation of a current-steering DAC and the main obstacles to performance are given, with an emphasis on the switching problems and clock jitter. Next, in sections VI-VIII the errors are classified according to their relation with time and topology. Sections IX-X deal with the analysis of the errors. Finally, section XI summarizes the obtained results and conclusions are drawn.
II. Operation of the Current Steering architecture
The DAC output corresponding to an input code at discrete time index k is composed of the sum of identical unit current pulses delivered at a fixed sampling rate. The block level diagram of this operation is shown in fig. 3. A code converter defines the weight of the current each DT/CT component delivers. In the following, a binary-to-thermometer code converter is assumed. Therefore, for N bits, there exist 2^N − 1 identical DT/CT elements and the output current is described by

i_out(t) = I_u · Σ_k x[k] · [ (u(t) − u(t − T_s)) ∗ δ(t − k·T_s) ]     (1)
where x[k] is the input code, u(t) is the unit step function, δ(t) is the delta function, ∗ denotes the convolution operator, I_u is the amplitude of the unit current and T_s stands for the sampling period. This general signal description reveals the nature of the DAC. A block-level description of the complete DAC corresponding to eq. (1) is shown in fig. 4.
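As a concrete illustration of eq. (1), the sketch below builds the ideal NRZ output of a thermometer-coded current-steering DAC as a sum of identical unit current pulses; the code values, unit current and oversampling factor used to draw the waveform are arbitrary assumptions.

```python
import numpy as np

def ideal_dac(codes, i_unit=1.0, pts_per_ts=32):
    """Ideal NRZ current-steering DAC output (illustration of eq. (1)).

    Every input code x[k] switches on x[k] identical unit current sources
    for one sampling period, so the output is a staircase waveform.
    """
    codes = np.asarray(codes)
    # thermometer decoding: unit source i is active when codes[k] > i
    therm = codes[:, None] > np.arange(codes.max())[None, :]
    i_out = i_unit * therm.sum(axis=1)          # summed unit currents per Ts
    return np.repeat(i_out, pts_per_ts)         # NRZ hold over each period

codes = [3, 5, 2, 7, 7, 1, 4]
waveform = ideal_dac(codes)
print(waveform[::32])        # one sample per period: reproduces the codes
```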
In hardware, each current pulse is implemented by a combination of blocks such as a latch/driver and a current cell (a switch and a current source). This combination of blocks will be referred to as a chain. Consequently, the block diagram of fig. 3 can be changed to that of fig. 5. Given the physical imperfections that are encountered in a silicon implementation, the chains are different and interfere with each other via electrical coupling. Consequently, the obtained physical signal deviates from the ideal description of eq. (1). In the following sections, III-V, the physical mechanisms that cause the chains to behave differently are investigated. The properties of these error mechanisms are then discussed in a classification in sections VI-VIII. An analysis based on this classification follows in the remaining sections, and some general conclusions are drawn.
III. Static performance
Ideally, in the absence of dynamic problems, the DAC's accuracy would be a function of the matching of its current sources, which consist of transistors. The impact of MOS transistor matching on the accuracy of DACs has been investigated in [13], and it has been shown that threshold-voltage variations and device-size mismatches are the primary cause of current-value differences. A DAC consists of a large set of equivalently designed current sources laid out in an array on silicon. In a simplified case, the matching problem in such an array is separated into random and deterministic parts, depending on the topological profiles of the errors. For random errors, the standard deviation of the variations in the current depends on the size of the MOS devices used to build the current sources and on their overdrive voltage. Deterministic errors depend on thermal gradients, mechanical stress, and on the total area of the current sources (the larger the area, the larger the deterministic errors). Although matching coefficients tend to improve with newer technologies [14], it is questionable whether intrinsic matching can provide accuracy above 12 bits while at the same time achieving a reasonable area and dynamic performance. Consequently, from a dynamic-performance point of view, the interest focuses on the impact of the accuracy-correction algorithms on the dynamic performance. For example, Dynamic Element Matching (DEM) techniques have reached a very mature level for DT systems aimed at low to medium frequencies (see [4] for an overview). At higher frequencies, however, and especially in CT ΣΔ converters, DEM adds significant switching noise and becomes less efficient. On the other hand, continuous-time calibration [6] requires the replacement of a current source within each calibration cycle. This creates a dynamic problem due to switching and limits the efficiency of calibration at
higher frequencies. Background calibration, e.g. [11], requires extra voltage headroom and a large amount of additional calibration logic. Summarizing, the DAC needs special algorithmic features in order to achieve a static accuracy of more than 12 bits. These techniques include proper sizing [5], [8], [9], spatial [8], [15], [16] and temporal mapping/averaging techniques (switching sequences and Dynamic Element Matching, respectively) and, finally, calibration. At higher frequencies, the impact of these techniques on the dynamic behavior of the converter becomes very important.

IV. Dynamic performance: switching errors
The most important and complex problem in current-steering DACs is related to switching. If all pulses were analogous to a reference pulse, the exact shape of the unit pulse would not matter. In reality, due to physical mismatches, the chains do not switch identically. In addition, because of electrical coupling phenomena, the switching behavior is related to the signal. Hence, non-linear signal distortion is created. Terms such as glitches and dynamic non-linearities are commonly used [6]-[18]. Several questions arise because of the lack of knowledge about the above problems in conjunction with the building-block requirements and with the generated signal. For example, it is a common argument to use de-glitchers/re-samplers [6], [11] in Nyquist DACs to eliminate the switching problem. In principle, a re-sampler eliminates the need for identical switching in the unit chain, but it is not clear how much impact the switching problems really have. It is also not clear whether an output re-sampler is really an effective way to circumvent the problem, given the tough requirements for this block when more than 12 bits of accuracy and wideband performance are required at the same time. Moreover, using a multi-bit DAC in CT ΣΔ ADCs means that any re-sampling must be current-mode based. Another issue of concern is the summation of all the unit currents, because it takes place directly at the output node of the DAC. The dynamics of this node are modulated by the signal, because the impedance of this node is directly related to the number of switched-on chains. In this way, a significant source of signal distortion is created. An analysis of some parts of this problem can be found in [18]. In this paragraph, a brief overview is given of cases where switching problems have been discussed. In [6]-[17] the switching problem was related to glitches produced during an MSB transition in segmented architectures, to the synchronization of all the switches in each segment, and to the discharging of the common-source node capacitance of the differential switch of each current
source. In addition, the switching problem was associated with the transistor mismatches between chains, with charge feed-through/injection, and with delays caused by improper sizing of the switches relative to each other. In all the above and many other publications, the clock distribution network was also highlighted as a cause of synchronization problems of the chains [19]. The switching problem is also linked to the Track and Hold circuit (whenever used) at the output of the DAC, because of its signal-dependent non-linear switching behavior [17]. Another switching problem is the signal-dependent delay of the chains caused by the signal-dependent bounce on the power supply [20]. The impact of this problem in DACs has not been analyzed much in practice in the literature, but it deserves attention. Based on the above, the switching errors can be:

Delays during switching. The chains have different delays (skew) in switching their currents to the output. The timing errors are caused by offsets of the transfer-function characteristics of the chain's blocks, timing skew of the clock lines, etc.

Dynamical changes. The chains have different dynamical behavior that causes different recovery times from metastable situations, different gain, etc. Rise/fall time differences can appear in each chain separately, or in the combined output signal (code-dependent settling time constants).

Charge disturbances. A different charge is injected into the output by each chain. The charge per chain differs because of mismatches in the final switches, different slopes of the driving signals, etc.

Of course, other types of errors are not excluded. In fig. 6(a) a typical profile of the output of identically designed chains stimulated with the same input signal is shown. The chains generate different pulse shapes as a result of mismatches, clock skew, etc. A way to proceed in the analysis of the switching problem is to decompose it into subparts according to the physical origins listed above. Following this categorization, fig. 6(b) shows the effect of delays, fig. 6(c) the effect of different dynamics in the chains, and fig. 6(d) the effect of charge disturbances. The basic property that the net result in all cases is a charge different from the nominal one will be used. The focus will be placed primarily on the analysis of the modulation of all the different error charges during conversion, independently of how the charges are caused. Next, the exact weighting of each physical mechanism can be distinguished.
V. Clock Jitter
One of the fundamental problems of sampled-data systems is the timing accuracy of the clock that defines the sampling instants. Any instability in the clock frequency is defined as clock jitter in the time domain, or phase noise in the frequency domain. Clock jitter is widely used to describe both stochastic and deterministic events, whereas phase noise is associated only with stochastic ones. Clock jitter has been analyzed for some cases of sampled systems [21]-[35], for oscillators and phase-locked loops [24], [38]-[40], etc. For sampled-data systems, jitter is always related to A/D conversion and it is addressed at several conceptual levels, such as signal processing [21]-[29], architecture [30]-[34] and circuit level [35]-[37]. Reported results on stochastic jitter for DT ΣΔ ADCs [31]-[33] and Nyquist-rate ADCs [26] attribute jitter-induced noise to the input sample-and-hold block. This type of jitter is called sampling jitter [21]. In CT ΣΔ modulators, and also in all types of standalone DACs, jitter is dominated by the internal DAC [31], [32], and it becomes important to examine the dependence of jitter effects on resolution, noise shaping, oversampling ratio, etc. The fundamental difference between ADC sampling jitter and DAC jitter is that the input signal in a DAC is a discrete-time, discrete-amplitude signal while
in an ADC it is a continuous-time, continuous-amplitude signal. In the most commonly addressed case, the DAC input signal is assumed equal to the wanted analog signal. However, the discrete-time signal contains all the architectural information (resolution, noise shaping, oversampling), because its level of resemblance to the wanted continuous signal is determined by the choice of DAC resolution N, oversampling ratio OSR and noise-shaping order L. Shaping the quantization power towards higher frequencies changes the properties of the discrete-time derivative of the DAC input signal and affects the jitter-induced noise considerably. In the following sections, VI-VIII, the errors are classified according to their principal properties and behavior with respect to topology and time. In this sense, we have amplitude and timing errors, local errors and global errors. Local errors are spatially stochastic or deterministic, and global errors are time-stochastic or time-deterministic. The analysis of each class then follows in sections IX-X.
VI. Amplitude and Timing errors
A basic distinction can be made on the basis of the time-scaling properties of errors. Let us consider a chain that consists of a latch and a current cell (a differentially switched current source). Ideally, the settled value of the current should be the nominal unit value. When the chain is stimulated by data, it generates a current pulse different from the reference pulse because of physical mechanisms that introduce errors. For a given chain, the error on its current value provides an error charge during the period of the pulse. This error charge scales proportionally to the sampling period and, therefore, the ratio of this charge to the nominal charge stays constant. This error is called an amplitude error. The switching error, on the other hand, can be represented by a timing delay according to the following line of thought. Let us define the equivalent timing error of a chain with a given DC current amplitude as the timing error that causes a charge error equal to the charge error caused by the actual switching error (timing, dynamic, charge feed-through, or all) of this chain. This error charge stays constant when the sampling period scales, and therefore its ratio to the nominal charge scales inversely with the sampling period. In this case we talk about a timing error.
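The distinction can be made concrete with a few numbers (all of them assumed): a fixed relative error on the unit current keeps the same relative charge error when the sampling period scales, whereas a fixed error charge per switching event, expressed as an equivalent timing error, becomes relatively larger as the sampling period shrinks.

```python
i_unit = 100e-6       # nominal unit current [A]            (assumed)
di = 0.5e-6           # amplitude error of one chain [A]    (assumed)
dq = 2e-15            # error charge per switching event [C] (assumed)

for ts in (10e-9, 5e-9, 2.5e-9):            # shrinking sampling period
    q_nom = i_unit * ts                      # nominal charge per pulse
    rel_amp = di * ts / q_nom                # amplitude error: constant ratio
    rel_tim = dq / q_nom                     # timing error: grows as 1/Ts
    t_eq = dq / i_unit                       # equivalent timing error [s]
    print(f"Ts = {ts*1e9:4.1f} ns   amplitude = {rel_amp:.2%}   "
          f"timing = {rel_tim:.2%}   t_eq = {t_eq*1e12:.0f} ps")
```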
VII. Local errors
Consider chains that do not interact with each other during operation. Each chain is assigned a topology-related behavior, in the sense that, once a chip sample is taken, each chain shows a behavior different from that of another chain placed in a different position. This means that each chain has a local behavior, and the physical problem behind this behavior is limited device matching. For example, the threshold voltages of two transistors defining two current switches, parts of two different chains, have values that depend on their position on the chip. If the switches of these chains are driven by the same input signal, they show different switching behavior: timing delays, switch on-resistance and charge injection related to their position on the chip. This translates into a local timing error. The chain pulses are combined under the control of the input signal. Hence, signal dependency appears: for each input code, a fixed combination of chains is used, which corresponds to a fixed combination of local errors. Consequently, the input signal modulates the topological problems and transfers them to the output signal. Local errors are associated with topology and matching, and they can be categorized as stochastic or deterministic, depending on their spatial profile (see left side of fig. 7). An important property of local errors is that they can be corrected on a different time scale than that of the signal. This property stems from the fact that, being associated with topology, local errors can be identified separately from the signal. For example, calibration corrects local errors at a much slower rate than that of the signal.
VIII. Global errors
The distinguishing characteristic of global errors is that they cannot be considered separately from the generated output signal. For a given code all switching chains behave in the same way, but this common behavior differs from code to code. The primary cause of this behavior is electrical interference and coupling between chains via global nodes. Global nodes are the power supply and substrate nodes, the output summing node, the biasing nodes and the clock node. The interaction of cells via the first two leads to power bounce and substrate coupling problems [20]; the delay of all the pulses becomes a function of the number of switching chains at each sample change. The output summing node has an impedance that varies with the number of active chains, i.e. with the signal. Similarly, when a re-sampling block (T/H or S/H) is used, the sampling switches of this block see at their input a
signal similar to the input signal; hence, their behavior has a global character: time delays, MOS switch resistance and charge feedthrough become a function of the input signal. Finally, clock jitter is also a global error. Global errors can be distinguished as stochastic or deterministic with respect to time (see right side of fig. 7). They can be corrected only on the same time scale as that of the signal, because they appear only when the current pulses are combined together. In the next sections, each category is analyzed in detail and the impact that the results have on architectural issues, but also on the building blocks, is discussed. For time-deterministic timing errors, i.e. errors that have specific patterns related to the signal, the modeling and analysis (both for global and local errors) is based on the identification of a timing-error transfer function, called the Timing Non Linearity (TNL) function. The TNL function is a time-domain analogue of the Integral Non Linearity (INL) function. For each input code, the TNL defines the equivalent timing error associated with that code; a similar definition holds for code differences. Ideally, the TNL should be zero, or constant, i.e. all codes or code differences should be affected by the same error during switching. In that case signal distortion is avoided. The TNL function
provides a general tool to understand the impact of the timing errors on the signal. For time-stochastic timing errors, stochastic methods are used, which combine the general stochastic properties of the errors with the discrete nature of the input signal and the continuous-time nature of the wanted signal [29].
IX. Analysis of Global Timing errors
A. Global stochastic timing errors

Let us define N as the resolution of the DAC, BW as the Nyquist bandwidth of the converter, and OSR as the oversampling ratio. The clock samples at the nominal instants offset by a timing-jitter term. The timing-error process is strictly stationary. The form of the correlation of the timing errors and their joint probability density function capture the physical properties of the errors. Clock jitter affects sampled systems in different ways. In an ADC the input is a continuous-time, continuous-amplitude signal but the output is a discrete-time, discrete-amplitude signal. Including quantization, the A/D output signal can be written as a train of quantized samples taken at the jittered instants,
i.e. the quantized and sampled signal is represented by a train of unit pulses located at the jittered sampling instants. In a DAC, on the other hand, the input is discrete in time and in amplitude and the output is discrete in amplitude but continuous in time. Timing jitter here affects only the timing axis, and the amplitude values are preserved. In contrast to an ADC, the DC performance of a DAC is never influenced by the existence of jitter. In a DAC without noise shaping, the input represents an N-bit ideally quantized and sampled real signal. For noise-shaped converters, it is an L-th order noise-shaped version of that signal. The output signal can then be written as the ideal NRZ waveform plus an error term at each transition
that is proportional to the difference of two consecutive input codes. The model is shown in fig. 4. With respect to the architectures shown in fig. 1 and 2, ADC jitter appears at all points where a signal is sampled and digitized and it is typical for DT ΣΔ
ADC modulators. However, it appears at the input of the modulator and it is shaped by the feedback loop [31], [32]. DAC jitter also becomes critical for CT ΣΔ modulators, because it directly affects the output signal. Here, the DAC pulses are considered to be of the Non Return to Zero (NRZ) type; Return To Zero pulses are more sensitive to jitter [31], [32]. First, the general output power spectrum is given, based on general assumptions for the correlation, the signal, etc. Next, an example follows that leads to a degenerate form of the general solution, which is applicable to DACs and CT ΣΔ ADC modulators. The mathematical derivations can be found in [29]. The Power Spectral Density (PSD) of the DAC's output signal is derived in [29]; the resulting expression, referred to as eq. (4) below, contains three main factors.
The first factor is the autocorrelation of the discrete-time difference of the input signal, which gives the power of this difference signal. The second is the double Fourier transform of the probability density function (pdf) of the jitter, which defines the way jitter affects the spectrum of the signal. The third describes the spectral properties of the NRZ pulses at the given sampling rate. In summary, there are three main factors that determine the type and power of the jitter effects: first, the properties of the jitter itself; second, the sampling period, which will be related to the oversampling ratio of the system; and third, the properties of the DAC's discrete-time difference input signal, which relate to the noise-shaping properties of the system. In the following subsection, eq. (4) is applied to white Gaussian jitter.

A.1 Example 1: "White" Gaussian timing errors

For white Gaussian jitter, eq. (4) separates into two terms.
The first term contains the signal: the input signal (and the quantization part) is mirrored around the sampling carrier and is selectively attenuated by the jitter and by the pulse shape. Part of the discrete input signal's power is transferred to a continuous noise power spectrum. The second term is the noisy part. It is a continuous function of frequency and is practically constant over the frequency range of interest. Its level is proportional to the power of the input signal's difference signal over the sampling period, and proportional
to the jitter variance. The combination of finite oversampling, finite resolution and noise shaping makes the power of the discrete-time difference signal effectively larger than that of the difference of the wanted signal. A non-noise-shaped DAC is an example of a system that gives the best-case jitter scenario, because noise shaping is absent and the resolution is usually large; consequently, the difference signal is very similar to the wanted signal's difference signal. Without noise shaping, the difference-signal power is set by the fundamental together with the odd quantization harmonics [41]. The resulting Signal-to-Noise Ratio (SNR) with respect to the signal power is given by eq. (6).
Eq. (6) gives the SNR as a function of the variance of the jitter, the sample rate, the signal bandwidth BW, the S/H shaping of the pulse, the signal amplitude, and the quantization components (the resolution). If the resolution is large, the quantization components have a negligible effect on the SNR; moreover, the SNR then does not depend on the signal amplitude. The SNR drops by 20 dB/dec with the spread of the jitter, but it increases by 3 dB when the OSR doubles, since the squared sinusoidal term in eq. (6) dominates the single OSR factor. For large OSR, a simplified expression (7) is obtained.
Let us assume that we require 80 dB of SNR within a 10 MHz signal bandwidth. Then from eq. (6), which is plotted in fig. 8 as a function of the spread of the timing errors, we see that the period of the clock should have less than 2.5 ps of jitter. The tolerance to the timing errors is larger when oversampling is added to the system: in the same figure the curve for OSR = 16 is shown, and the allowed spread increases to 6 ps. Even for this scenario the requirements for the clock generator are quite strict. Neglecting quantization effects (large resolution) and noise shaping, eq. (6) gives a best-case performance. Consequently, we are interested in identifying the role of resolution and noise shaping in the jitter noise. This issue is important for CT ΣΔ modulators with multi-bit internal DACs. Insight can be obtained from eq. (5), recalling that the jitter power depends on the power of the input difference signal over the sampling period. To demonstrate the effects of the resolution N, we sample a full-scale
sinusoidal input with several OSR values and several resolution levels. From the resulting bit-stream, without noise shaping, we estimate by simulation the difference-signal power and from this the ratio SNR(N+1)/SNR(N). The results are plotted in fig. 9. We see that, for a given signal frequency, the SNR improves by almost 3 dB per extra bit of resolution until a saturation point. At the same time, increasing the OSR gives the 3 dB per doubling of the OSR benefit, but only if certain conditions are met for the resolution of the DAC, different for each frequency. Generally speaking, the SNR benefits from increased resolution. Increasing the OSR improves the SNR only if sufficient resolution exists so that new sample values can be created. In other words, in the presence of sufficient resolution the jitter power is determined by the derivative of the wanted signal (rightmost side of fig. 9), but when the resolution is low the SNR is determined by the derivative of the quantization terms (leftmost side of the same figure). Finally, we are interested in the role of noise shaping in the jitter power. The ideal continuous-time signal at the output of the DAC equals the wanted signal plus the quantization error, so the difference signal which determines the jitter power contains two parts, a signal part and a quantization part. In the absence of noise shaping the signal part dominates, but when noise shaping is added the difference signal is no longer determined solely by the signal part; rather, it is influenced, and in the extreme dominated, by the quantization
part. This occurs because noise shaping moves the quantization power from the baseband to higher frequencies while keeping the signal unaffected. In this way, the power of the quantization difference signal becomes effectively larger than that of the signal. Hence, high-order noise shaping is expected to have an impact on the jitter noise generated by the DAC. In the best case, discrete-time filtering in front of the N-bit core can remove all the high-frequency quantization power, and then the jitter power is dominated by the signal. Such an approach was used in [36] for a ΣΔ DAC. However, this technique cannot be implemented as easily in a CT ΣΔ ADC, because the added delay can influence the loop stability.

B. Global deterministic timing errors
For global deterministic errors we use the definition of the TNL function, with a notation that reflects the deterministic nature of the errors. If the delays depend directly on the signal, then the resulting timing-error sequence is the modulation of the TNL by the input codes. Similarly, the time delays can be defined on the code differences, or on specific combinations of codes, depending on what the delays actually depend on. This will become clear with the examples that follow.
The DAC output signal is then a pulse train that is Pulse Amplitude Modulated by the input signal and Pulse Duration Modulated by the timing errors (a combined PDM/PAM signal).

B.1 Example 2: Signal-dependent settling time
Signal-dependent settling time is caused by the change of the dynamics at the output node of the DAC and results in data-dependent rise/fall times. The delay is a function of the number of switched-on chains in the DAC and consequently depends on the signal. Parts of this problem have been analyzed in [18], with a focus on the distortion caused by the current that is lost through the changes in the output resistance. Here, we investigate the impact of the modulation of the rise/fall time. When a current source is off, it presents a practically infinite resistance and a capacitance that depends on the dimensions of the current switch. The moment it switches on, the resistance changes to a finite unit value and the capacitance is slightly modified by a unit capacitance. The off-state capacitance is not considered, because only the difference between the on and off states is important. For a given code at the input of the DAC, a corresponding number of chains is connected in parallel, so the DAC output resistance is the unit resistance divided by that number, while the DAC capacitance is that number times the unit capacitance. The total output resistance is then formed by the parallel combination of this code-dependent resistance and the external load resistance, and the total output capacitance is the sum of the code-dependent capacitance and the external load capacitance. To first order, the settling time constant is the product of the two, which makes it code dependent.
The code values vary from 1 to 2^N − 1. For simplicity, assume that the output resistance of the current sources is large enough for the external load resistance to dominate. Then eq. (9) reduces to eq. (10), a time constant set by the load resistance and the code-dependent capacitance.
The constant part of the delay is not considered, because we are interested only in the relative differences of the settling time from code to code. The TNL function is then calculated on the basis of the equivalent-timing-error definition. Let us consider the codes to be the sample values of a sinusoid; the TNL can then be approximated by a smooth function of the signal,
which leads to distortion components whose amplitudes are expressed in terms of Bessel functions of the 1st kind. The second-order harmonic distortion is dominant, and the resulting signal-to-second-harmonic-distortion ratio is given by eq. (13).
The scaling of the distortion with the resolution in eq. (13) is justified only if one assumes that, when the resolution scales, the unit elements that compose the output transistors of the chain stay the same. To obtain useful insight, consider a 6-bit DAC with a fixed update rate, driving an external load capacitance and resistance, and assume that each of the 63 current sources adds a capacitance of 2 fF when it goes from the off state to the on state. In fig. 11(a) the resulting second-harmonic suppression is plotted as a function of the signal frequency for the chosen sampling rate. The suppression rises again as the signal frequency approaches half the sampling rate, because the second harmonic then approaches the sampling rate, where the NRZ pulse shaping strongly attenuates it. For frequencies up to 10 MHz
the suppression is well above 75 dB, and it drops to a minimum of 66 dB at 50 MHz. To obtain a suppression better than 74 dB (12 bits), the unit timing delay should be less than 50 fs. This implies a rise-time difference between switching the first and the last codes equal to 3.15 ps. Generally speaking, one solution to the problem of the modulation of the dynamics of the output node is to decouple the interaction of the current cells through this node. This means that the summation should not be made directly at the output of the current cell, so that the impedance of the summing node becomes independent of whether a cell is active or not. A typical solution that is common in practice is to provide a low-impedance summation node at the output (a virtual ground) to buffer the current [42]. In this case, the modulation of the impedance of the DAC is significantly reduced, and the time constant of (10) becomes virtually constant. The drawback of this solution, when one aims at very high speed and relatively high accuracy, is that the buffer should be more linear than the overall DAC while buffering currents generally in the range of 1 – 20 mA (dependent on the type of DAC) at very high speeds. This drawback limits the applicability of this solution at higher speeds.
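Example 2 can be reproduced numerically. Since some of the printed element values were lost, the sketch below assumes a 6-bit DAC, 2 fF of added capacitance per active cell, a 25 Ohm / 10 pF output load and a 500 MS/s rate; with these assumptions the full-scale spread of the settling constant comes out at about 3.15 ps, of the same order as the rise-time difference quoted above. The distortion estimate is deliberately crude: the code-dependent extra delay is converted into an amplitude error via the local signal slope before taking the spectrum.

```python
import numpy as np

# assumed values: 6-bit DAC, dC per active cell, output load, sample rate
n_unit, dC, RL, CL = 63, 2e-15, 25.0, 10e-12
fs, fin, n = 500e6, 10e6, 4096

tau = RL * (CL + np.arange(n_unit + 1) * dC)     # code-dependent time constant
print(f"settling-constant spread over full scale: "
      f"{(tau[-1] - tau[0]) * 1e12:.2f} ps")

x = 0.5 * (1 + np.sin(2 * np.pi * fin / fs * np.arange(n)))   # 0..1 full scale
code = np.round(x * n_unit).astype(int)
extra_delay = tau[code] - tau[0]                  # delay relative to code 0
slope = np.gradient(x) * fs                       # signal slope per second
y = x + slope * extra_delay                       # crude timing->amplitude map

spec = np.abs(np.fft.rfft(y * np.hanning(n)))
b1 = int(round(fin / fs * n))                     # fundamental bin
print(f"second-harmonic level: {20*np.log10(spec[2*b1]/spec[b1]):.1f} dBc")
```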
X. Local Timing Errors
Local timing errors are either spatially stochastic or deterministic. We assign to each of the chains a timing error that appears when the chain turns on/off. Ideally this error is zero, or constant for every chain. The relation with topology comes via the position of the element in the pool of sources, i.e. the area where the last building blocks of the chains are placed. The unit values and the corresponding errors are mapped to the output signal range between zero and full scale by a map S. A map expresses the order in which the chains are selected to compose the values from 1 to full scale. Practically, this refers to the algorithmic means of switching sequences and dynamic element matching. We assume the continuous output amplitude to be normalized to the quantization value. Then, the errors are re-ordered according to the map S and they are described by a new set.
A. Spatial stochastic errors
First, consider spatially random errors. Each error is an independent, identically distributed variable with zero mean and a given variance. Each time a new code is converted, the input of the converter changes from the code at the previous time instant to the code at the current time instant. The output current changes accordingly, but during this change the associated chains that are combined together have a different switching behavior according to their individual errors. Consequently, the error current is described by
where the TNL function is defined now on the basis of the difference of two consecutive codes:
Eq. (15) defines a random variable, which is the mean of a set of identical and independent random variables [23]. The variance of the TNL will be
The physical meaning of the resulting TNL is that if the code difference is large and a large number of unit currents are used, the error converges to its mean value, which is zero in this case. Therefore, when the resolution N of the converter is large, the error due to timing is averaged out. Moreover, the error is closer to its average value when the signal exhibits large code differences. The expected value of the signal error power is
and its time-averaged value becomes
Although optimum results would be obtained by properly analyzing the discrete-time signal, it suffices at the moment to assume that it closely resembles the wanted continuous-time signal. Therefore, for a given signal it can be found that
The Signal-to-total-Distortion Ratio (SDR) is the power of the signal over the total error power. For a full-scale sinusoid, the SDR is then expressed as in eq. (21).
From eq. (21), it is seen that, first, the SDR is proportional to the resolution, improving by 3 dB per bit: the larger the resolution of the DAC, the smaller the distortion. The SDR drops by 20 dB/dec with the spread of the timing errors, but only by 10 dB/dec (3 dB for ×2) with the signal frequency and the sample rate. In addition, it drops by 10 dB/dec with the OSR: the larger the OSR, the worse the SDR becomes. The comparison of theory and simulations is demonstrated in fig. 12. The simulations are based on Matlab code and for each
mean value 50 runs were considered. The vertical lines show the spread of the SDR. In a hypothetical scenario of a CT modulator designed for a BW = 10 MHz with 12-bit accuracy (74 dB), the internal DAC, independent of its resolution, should have at least 12-bit accuracy. Figure 13 shows possible choices for the OSR and the resolution of the DAC that can satisfy this requirement given a spread of 5 psec for the timing errors. With 4 bits in the DAC, the OSR should be no more than 8, but for a 6-bit DAC the OSR can be increased up to 32 while obtaining the same performance. Of course, changing the resolution of the DAC and the OSR to satisfy the DAC requirements means that other architectural parameters of the complete CT modulator have to change as well to maintain an overall performance of 12 bits. The scaling of N assumes that the spread of the unit chains remains constant. In fig. 14, it can be seen that with an error spread in the order of 1 psec and a 5-bit DAC, the resulting distortion would be small enough for 14-bit accuracy with an OSR of 16. At this point, some additional remarks need to be made.
Obviously, the requirement of a few psec for the timing errors to guarantee good performance translates into hard requirements on the hardware blocks, because the spread of the errors includes all types of spatially random switching errors (charge injection, skew, etc.). Moreover, the impact of the order of noise shaping (the power of the high-frequency quantization errors) has not been taken into account in the previous analysis. However, as explained in the case of time-stochastic timing errors (clock jitter), the shaping of quantization errors towards higher frequencies increases the power of the derivative of the quantization distortion components. This power, for high-order modulators, is higher than the power of the derivative of the signal, which allowed us to derive the SDR of eq. (21) from the general solution of eq. (18). In summary, for optimal performance in a pre-specified analog bandwidth and with fixed local errors per unit chain, high resolution with minimum oversampling ratio should be preferred.
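This averaging behavior is easy to check with a small Monte Carlo experiment. The sketch below assigns a fixed random delay to every unit element of a thermometer-coded DAC, converts each code transition into an equivalent amplitude error, and treats that error as approximately white so that only a 1/OSR fraction falls in band. The clock rate, tone frequency and 1 ps spread are assumed example values in the spirit of fig. 14, not the authors' Matlab setup.

```python
import numpy as np

def sdr_run(nbits=5, osr=16, sigma=1e-12, fs=320e6, nsamp=8192, seed=0):
    """One Monte Carlo run of the SDR set by spatially random switching delays."""
    rng = np.random.default_rng(seed)
    T = 1.0 / fs
    nunit = 2**nbits - 1
    eps = rng.normal(0.0, sigma, nunit)         # static delay of each unit element
    fin = 0.9 * fs / (2 * osr)                  # in-band test tone
    k = np.arange(nsamp)
    code = np.round(nunit / 2 * (1 + np.sin(2 * np.pi * fin * k / fs))).astype(int)

    err = np.zeros(nsamp)
    for n in range(1, nsamp):
        lo, hi = sorted((code[n - 1], code[n]))
        # elements lo..hi-1 toggle; their delays give an equivalent amplitude error
        err[n] = np.sum(eps[lo:hi]) / T
    sig_power = np.var(code - code.mean())
    err_inband = np.var(err) / osr              # white-error approximation
    return 10 * np.log10(sig_power / err_inband)

runs = [sdr_run(seed=i) for i in range(50)]
print("SDR over 50 runs: mean %.1f dB, std %.1f dB" % (np.mean(runs), np.std(runs)))
```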
B. Spatial deterministic errors
The most common type of spatial deterministic timing error is the systematic clock skew that is caused by the clock distribution network. Other errors, such as systematic skew on the latches driving the current cells, can occur as well.
The TNL function is defined as
as in the stochastic case, but here the errors are not random. Because the TNL becomes a function of two variables, analyzing this problem for a given signal is significantly more difficult. In addition, it is almost impossible to obtain closed-form results for complex types of errors. For this reason, the focus here is mainly to obtain some insight with a simple example and to compare with the spatial stochastic case. Insight can be obtained by observing the spatial spectrum of the timing errors, i.e. their spectrum with respect to position. Since the particular shape of the error profile originates from topology, the spectrum of the TNL with respect to position is referred to as the spatial spectrum. As explained previously, each chain exhibits a delay to switch on/off. If the delays are random, their spatial frequency content is noisy. If they have a systematic pattern, the spatial spectrum shows discrete tones. The TNL function performs spatial averaging, i.e. spatial low-pass filtering, on the timing errors. Therefore, when the errors are very systematic, the inherent averaging that occurs through eq. (23) is not really effective and the resulting distortion is expected to be worse than that of the random case. The analysis can be simplified by assuming that the TNL function is a general function of the sine-series form
Consider that the sampled and quantized signal is a sinusoid. Under the assumption that the DAC input signal closely resembles the wanted continuous one, it can be found that the ideal output DAC waveform is phase modulated by the function
where the delays are expressed in seconds and J_n is the n-th order Bessel function of the 1st kind. The magnitudes of the spurs caused by this phase modulation are
Some observations can be made based on (26). First, the induced tones are harmonically related to the input signal. Their magnitude is affected by the shape of the timing-error profile: regularity in the profile results in large components. A 20 dB/dec dependency on the error magnitude exists at low frequencies, which is similar to the 20 dB/dec dependency of the distortion on the spread of the local stochastic timing errors. Second, the spurs depend on the signal frequency and amplitude; as a result, the relative distortion does not change with the input amplitude and scales up by 20 dB/dec with the frequency, in contrast to the random case. In fig. 15 a simple example of a delay distribution is shown. In fig. 15(a), half of the elements are delayed by a fixed amount with respect to the other half. The ideal signal, the distorted one and their difference are shown in fig. 15(b). The amplitude is normalized to the full-scale value and the time to the period of the signal. After some calculations the resulting distortion components can be obtained.
The theoretical and simulated ratio of signal to second-order harmonic distortion power can be seen in fig. 16. A relative delay of 2.5 ps on a 200 MHz clock restricts the spurious-free dynamic range to 70 dB.
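The same example can be reproduced behaviorally: give the upper half of the unit elements a fixed extra delay, translate every code transition into an equivalent amplitude error, and look at the second harmonic in the spectrum. The 200 MHz clock, 2.5 ps skew and 6-bit resolution follow the example above; the tone frequency is an assumed value, and the printed ratio therefore differs from the worst-case 70 dB quoted for fig. 16.

```python
import numpy as np

fs, nbits, delta = 200e6, 6, 2.5e-12       # clock, resolution, skew of the delayed half
T, nunit = 1 / fs, 2**nbits - 1
eps = np.where(np.arange(nunit) < nunit // 2, 0.0, delta)   # systematic delay profile

nfft = 4096
fin = 129 / nfft * fs                      # coherent test tone (assumed frequency)
code = np.round(nunit / 2 *
                (1 + np.sin(2 * np.pi * fin / fs * np.arange(nfft)))).astype(int)

y = code.astype(float)
for n in range(1, nfft):
    lo, hi = sorted((code[n - 1], code[n]))
    # delayed elements among those toggling produce an equivalent amplitude error
    y[n] -= np.sign(code[n] - code[n - 1]) * np.sum(eps[lo:hi]) / T

spec = np.abs(np.fft.rfft((y - y.mean()) * np.hanning(nfft)))
print("S/HD2 = %.1f dB" % (20 * np.log10(spec[129] / spec[258])))
```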
XI. Conclusions
A detailed analysis of timing errors has been performed on the basis of the problems that are encountered in current-steering DAC implementations. The analysis was based on an error-analysis framework that classified the timing errors according to their dependencies on the time and spatial domains. Example cases have been considered based on significant physical mechanisms of errors that are encountered during implementation. As a result of this analysis, insight has been obtained into the timing problems of DACs and their relation to a given architectural context (e.g. CT converters) and to block-level performance specifications.
References
[1] S.R. Norsworthy, R. Schreier, and G.C. Temes (Eds.), "Delta-Sigma Data Converters: Theory, Design, and Simulation," IEEE Press, 1997.
[2] I. Fujimori et al., "A 90-dB SNR 2.5-MHz output-rate ADC using cascaded multibit delta-sigma modulation at 8x oversampling ratio," IEEE Journal of Solid State Circuits, vol. 35, no. 12, pp. 1820-1828, Dec. 2000.
[3] K. Vleugels et al., "A 2.5-V sigma-delta modulator for broadband communications applications," IEEE Journal of Solid State Circuits, vol. 36, no. 12,
pp. 1887-1899, Dec. 2001.
[4] Y. Geerts et al., "Design of High-Performance CMOS Delta-Sigma A/D Converters," Ph.D. Thesis, Katholieke Universiteit Leuven, December 2001.
[5] "High-speed D/A Converters," Advances in Analog Circuit Design, Kluwer Academic Publishers, 2001.
[6] R.J. v.d. Plassche, "Integrated Analog-to-Digital and Digital-to-Analog Converters," Kluwer Academic Publishers, ISBN 0-7923-9436-4, 1994.
[7] C.H. Lin et al., "A 10b 500 MSamples/s CMOS DAC in 0.6 mm2," IEEE Journal of Solid State Circuits, vol. 33, no. 12, pp. 1948-1958, December 1998.
[8] G. v.d. Plas et al., "A 14-bit intrinsic accuracy random walk CMOS DAC," IEEE Journal of Solid State Circuits, vol. 33, pp. 1708-1718, December 1999.
[9] J. Bastos et al., "A 12-bit Intrinsic Accuracy High-Speed CMOS DAC," IEEE Journal of Solid State Circuits, vol. 33, no. 12, pp. 1959-1969, December 1998.
[10] A. v.d. Bosch et al., "A 10-bit 1-Gsample/s Nyquist Current-Steering CMOS D/A Converter," IEEE Custom Integrated Circuits Conference, Orlando, USA, 2000.
[11] A. Bugeja et al., "A 14-b, 100-MS/s CMOS DAC Designed for Spectral Performance," IEEE Journal of Solid State Circuits, vol. 34, no. 12, pp. 1719-1732, December 1999.
[12] D. Mercer, "A 16-b D/A Converter with Increased Spurious Free Dynamic Range," IEEE Journal of Solid State Circuits, vol. 29, no. 10, pp. 1180-1185, October 1994.
[13] M. Pelgrom et al., "Matching properties of MOS transistors," IEEE Journal of Solid State Circuits, vol. 24, no. 5, pp. 1433-1439, 1989.
[14] M. Pelgrom et al., "CMOS technology for mixed signal ICs," Solid-State Electronics, vol. 41, no. 7, pp. 967-974, July 1997.
[15] Y. Cong et al., "Switching Sequence Optimization for Gradient Error Compensation in Thermometer-Decoded DAC Arrays," IEEE Trans. CAS-II, vol. 47, no. 7, Jul. 2000.
[16] K. Doris et al., "D/A Conversion: Amplitude and Timing Error Mapping Optimization," IEEE Int. Conference on Electronics, Circuits, and Systems, ICECS 2001, Malta.
[17] B. Razavi, "Principles of Data Conversion System Design," IEEE Press, 1995.
[18] J. Wikner et al., "Modeling of CMOS Digital-to-Analog Converters for Telecommunications," IEEE Trans. on Circuits and Systems part II, vol. 46, no. 5, May 1999.
[19] S. Sauter et al., "Effect of Parameter Variations at Chip and Wafer Level on Clock Skews," IEEE Trans. on Semiconductor Manufacturing, vol. 13, no. 4, November 2000.
[20] "Workshop on Substrate Noise-Coupling in Mixed-Signal ICs," IMEC, Leuven, Sept. 2001, www.imse.cnm.es/esd-msd/WORKSHOPS/IMEC2001.
[21] A.V. Balakrishnan, "On the Problem of Time Jitter in Sampling," IRE Trans. on Information Theory, pp. 226-236, April 1962.
[22] A. Papoulis, "Error Analysis in Sampling Theory," Proc. IEEE, vol. 53, no. 7, July 1966.
[23] W.A. Gardner, "Introduction to Random Processes, with Applications to Signals and Systems," Macmillan Publishing Company, N.Y., 1986.
[24] J. Phillips and K. Kundert, "Noise in Mixers, Oscillators, Samplers, and Logic: An Introduction to Cyclostationary Noise," IEEE Custom Integrated Circuits Conference, USA, 2000, pp. 431-437.
[25] S.S. Awad, "Analysis of Accumulated Timing-Jitter in the Time Domain," IEEE Trans. on Instrumentation and Measurements, vol. 47, no. 1, Feb. 1998.
[26] H. Kobayashi et al., "Aperture Jitter Effects in Wideband ADC Systems," ICECS 1999, Cyprus.
[27] Y.C. Jenq, "Digital-to-Analog (D/A) Converters with Nonuniformly Sampled Signals," IEEE Trans. on Instrumentation and Measurement, vol. 45,
no. 1, February 1996.
[28] K. Doris et al., "Time Non-Linearities in D/A Converters," Proc. European Circuit Theory and Design Conference, Helsinki, Finland, vol. 3, pp. 353-357, September 2001.
[29] K. Doris et al., "A General Analysis on the Timing Jitter in D/A Converters," accepted for publication at IEEE Int. Symposium on Circuits and Systems, ISCAS 2002, Arizona, USA.
[30] J.A. Cherry et al., "Clock Jitter and Quantizer Metastability in Continuous-Time Delta-Sigma Modulators," IEEE Trans. Circuits and Systems, Part II, vol. 46, no. 6, Aug. 1999.
[31] E. v.d. Zwan et al., "A 0.2-mW CMOS Modulator for Speech Coding with 80 dB Dynamic Range," IEEE Journal of Solid State Circuits, vol. 31, no. 12, December 1996.
[32] Hai Tao et al., "Analysis of Timing Jitter in Bandpass Sigma-Delta Modulators," IEEE Trans. Circuits and Systems, Part II, vol. 46, no. 8, Aug. 1999.
[33] O. Oliaei et al., "Clock Jitter Noise Spectra in Continuous-Time Delta-Sigma Modulators," IEEE Proc. Int. Symposium on Circuits and Systems, ISCAS'99, vol. 2, pp. 192-195, 1999.
[34] V. Peluso et al., "Design of Continuous Time Bandpass Modulators in CMOS," Analog Circuit Design, Kluwer Academic Publishers, 1996.
[35] B. Jonsson, "Sampling jitter in high-speed SI circuits," IEEE Proc. Int. Symposium on Circuits and Systems, ISCAS '98, vol. 1, pp. 524-526, 1998.
[36] I. Fujimori et al., "A Multibit Delta-Sigma Audio DAC with 120-dB Dynamic Range," IEEE Journal of Solid State Circuits, vol. 35, no. 8, Aug. 2000.
[37] M. Shinagawa et al., "Jitter Analysis of High-Speed Sampling Systems," IEEE Journal of Solid State Circuits, vol. 25, no. 1, Feb. 1990.
[38] A. Demir et al., "Phase Noise in Oscillators: A Unifying Theory and Numerical Methods for Characterization," IEEE Trans. on Circuits and Systems, part I, vol. 47, no. 5, May 2000.
[39] A. Hajimiri et al., "A General Theory of Phase Noise in Electrical Oscillators," IEEE Journal of Solid State Circuits, vol. 33, no. 2, June 1998.
[40] B. Razavi, "A Study of Phase Noise in CMOS Oscillators," IEEE Journal of Solid State Circuits, vol. 33, no. 3, March 1996.
[41] N.M. Blachman, "The intermodulation and distortion due to quantization of sinusoids," IEEE Trans. on ASSP, vol. 29, no. 4, pp. 914-917, Aug. 1985.
[42] C.M. Hammerschmied et al., "Design and Implementation of an Untrimmed MOSFET-Only 10-Bit A/D Converter with -79-dB THD," IEEE Journal of Solid State Circuits, vol. 33, no. 8, pp. 1148-1157, Aug. 1998.
Correction-Free Multi-Bit Sigma-Delta Modulators for ADSL
R. del Río, F. Medeiro, J. M. de la Rosa, B. Pérez-Verdú, and A. Rodríguez-Vázquez
Institute of Microelectronics of Seville, CNM-CSIC
Edif. CICA-CNM, Avda. Reina Mercedes s/n, 41012 Sevilla, SPAIN
Abstract
This chapter first presents a scalable MASH dual-quantization architecture and a companion optimum set of scaling coefficients which result in minimum resolution losses. The chapter then outlines the dominant circuit imperfections that degrade the operation of this architecture, and presents illustrative design explorations that account for these circuit imperfections. Finally, some practical considerations pertaining to the implementation of an ADSL sigma-delta modulator in CMOS technology are given and illustrated through experimental results.
1. Introduction
The increasing demand for broadband data communications has stimulated industrial interest in high-performance A/D converters able to achieve between 12- and 16-bit accuracy for signal bandwidths exceeding 1 MHz [1]. These specifications seem a priori better suited for pipeline architectures. However, the larger linearity and simpler circuitry of oversampled, sigma-delta modulation converters render them worth exploring for the implementation of wireline modems as mixed-signal systems-on-chip. Under ideal operating conditions, the dynamic range, DR, of a converter is:
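Assuming the standard ideal noise-shaping result for an L-th order loop with a B-bit quantizer (the exact constant may differ from the original eq. (1)):

$$DR = \frac{3}{2}\,\frac{2L+1}{\pi^{2L}}\,\bigl(2^{B}-1\bigr)^{2}\,M^{\,2L+1}$$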
where M stands for the oversampling ratio, L is the modulator order, and B is the resolution of the internal quantizer. While the issues related to increasing M are basically related to the dynamic requirements of the building blocks, augmenting L and/or B raises issues also at the modulator level, principally due to the following:
– Unlike 1st- and 2nd-order loops, high-order loops are not unconditionally stable.
– The non-linearity of the multi-bit DAC in the feedback path may manifest as non-shaped, and hence not attenuated, error.
High-order loops can be stabilized through several techniques. Conditionally stable modulators have been designed by properly choosing the scaling factors [2], or by using multi-path feedforward structures to reduce the out-of-band value of the quantization noise transfer function [3]. Also, instability can be circumvented by resetting some integrator outputs if divergence is detected [4]. However, the mechanisms adopted for stabilization generally deteriorate the dynamic range with respect to that predicted by (1). Reductions around 25 dB in DR, and even more, have been reported [5]. Furthermore, stabilization often requires very small scaling coefficients, which means very large capacitors in SC implementations – unhelpful for power saving. Finally, the complexity of the modulator architecture increases considerably, which may result in rather non-optimum designs. Regarding the multi-bit DAC non-linearity, several correction/calibration methods have been employed, working either in the digital domain [6] [7] or in the analog domain [8]-[10]. Among the latter, dynamic element matching techniques [8] [10] [11] are interesting, because they enable on-line correction and have been successfully applied at high frequencies [12]. However, since DACs cannot be efficiently linearized within an arbitrarily large resolution, the use of low-order multi-bit modulation may not be enough to obtain a given DR. A direct solution to these problems is to increase both the modulator
order and the internal quantization resolution, giving rise to moderate-order (3–5), multi-bit architectures [12] [13]. In fact, the use of multi-bit quantization in single-loop high-order modulators inherently improves their stability properties [5], so that these are good candidates to obtain high-resolution, high-frequency operation, provided that the non-linearity problem is solved. With the same objective, the combination of high-order cascade (MASH) architectures [14] with multi-bit quantization has been proposed [15]-[18]. These modulators combine the unconditional stability of cascade modulators (provided that only 2nd- and/or 1st-order stages are used) and the advantages of multi-bit quantization, with relaxed requirements for the linearity of the latter. The feasibility and efficiency of this approach, because it needs no correction/calibration mechanism, has been proved in analog technologies [15] [19]. Circuit imperfections affecting cascade modulators can hide some of the benefits of multi-bit quantization. Particularly, finite opamp DC-gain and integrator weight mismatch cause low-order quantization error leakage that may degrade the DR. The impact of these non-idealities is more evident in low-voltage digital submicron CMOS technologies, due to the larger degradation of both MOSFET output conductance and capacitor matching. But even in that case, as we show in this chapter, these problems can be circumvented through a careful mixed-signal design assisted by precise models of the circuit behavior. This chapter is organized as follows: Section 2 describes the cascade multi-bit modulator whose architecture and coefficients are optimized in
Section 3. Section 4 discusses the main non-ideal mechanisms degrading the modulator performance. Finally, Section 5 is devoted to describing a practical design example.
2. Cascade multi-bit modulators
A robust multi-bit modulator is obtained by increasing the number of quantization levels in some of the quantizers of a cascade architecture [15]-[18]. Fig.1 shows a typical architecture which includes multi-bit quantization only in the last stage, while the remaining stages are single-bit. Assuming perfect cancellation1, it can be shown that only the signal X(z) and the last-stage quantization error remain at the modulator output, yielding:
Note that since the last-stage quantization error is generated in a B-bit quantizer, the modulator response equals that of an ideal Lth-order B-bit modulator except for a scalar equal to the inverse of the product of the integrator weights in the cascade. In order not to prematurely overload the modulator, the signal must be properly scaled down as it is transmitted from one stage to the next. This results in a scaling factor larger than unity, which means an amplification of the last-stage quantization error, thus degrading the effective resolution. However, this loss can be reduced to only one bit with respect to the ideal case (1), no matter what the modulator order is. Moreover, using multi-bit quantization only in the last stage relaxes the multi-bit DAC linearity requirement, because the non-linearity error is not injected at the modulator input but high-pass filtered, so that most of the DAC error power is pushed out of the signal band. Simple analysis shows,
1. In a cascade modulator, each stage re-modulates a signal which depends on the quantization error generated in the previous one. Once in the digital domain, the quantization errors of all stages but the last one are cancelled. The last-stage error appears at the modulator output shaped by a function of order equal to the summation of all stage orders. Because the modulator input is unaffected by this procedure, the performance of a cascade modulator is equivalent to that of an ideal high-order loop. Furthermore, provided that only 2nd- and/or 1st-order stages are used, the cascade can be designed to be unconditionally stable.
where the error term is the z-transform of the integral non-linearity error produced in the last-stage DAC. Such an error presents a shaping function whose order equals the overall modulator order minus the order of the last stage. This means that such a non-ideality can be tolerated to some extent without correction/calibration.
3. Architecture and coefficients selection
In order to gain insensitivity to circuit imperfections, we consider that the 1st stage is 2nd-order. A first consideration refers to whether the remaining stages are either 2nd-order or 1st-order. Bear in mind that signals in the cascade must be properly scaled to prevent premature overload of the stages. The larger the number of stages, the more acute the overloading problem becomes. Moreover, it is known that, unlike 1st-order modulators, 2nd-order ones overload before reaching the full-scale input. Finally, as mentioned before, digitally compensating the scale factors causes a systematic loss of resolution, whose minimum value is larger the larger the number of stages and/or when 2nd-order stages are used. Consider for illustration purposes the 2-1-1-1 cascade in Fig.2(a) and compare it with the more usual 2-2-2 architecture in Fig.2(b) [20]-[22]. Tables 1 and 2 show a set of relationships among analog and digital coefficients, and expressions of the transfer functions, which guarantee correct ideal operation. Columns A and B of Table 3 show two sets of coefficients optimized to minimize the systematic loss of resolution, while keeping a high overloading point in the 5th- and 6th-order architectures, respectively. Columns C and D contain other sets of coefficients previously proposed for the 2-2-2 cascade [21] [22]. Fig.3 shows the ideal signal-to-(noise+distortion)-ratio SNDR curves for both architectures and the four versions of coefficients A, B, C and D, all with the same oversampling ratio, M = 16.
The reason for the 2-1-1-1 (5th-order) being better than the 2-2-2 (6th-order) is simply that the loss of resolution due to scaling is only 1 bit (6-dB DR loss) in the former, whereas it is 4 bits (24-dB DR loss) in the latter. Note that the 6th-order architecture with coefficients C and D yields larger systematic resolution losses. The answer to the question: "could the 6th-order ideally behave better with another set of coefficients?" is likely to
be “no, if the overloading point has to be kept”, because, due to the use of 2nd-order modulators in the 2nd and 3rd stages, more attenuation is needed for the signal transmitted among stages.
Apart from ensuring correct operation with the largest possible overloading point, and minimizing the systematic loss, the choice of coefficients must consider the following practical points: (a) the output swing needed in the integrators must be feasible within the available supply voltage; (b) the number of branches required for each integrator and the total number of unitary capacitors should be minimized, which simplifies the circuitry and saves silicon area; (c) for similar reasons, the resulting digital coefficients should be easy to implement; (d) the number of identical stages must be the largest possible in order to simplify the layout. The coefficients for the 5th-order modulator given in Table 3, column A, also have the following interesting properties: (a) the total output swing required in all integrators equals the quantizer full-scale, as illustrated in Fig.4 – a useful feature for low-voltage implementation; (b) the largest weight of each 3-weight integrator can be obtained as the summation of the others, so that no 3-branch SC integrators are required; this also minimizes the number of unitary capacitors; (c) the digital coefficients are such that they can be realized with simple shift registers; and (d) all the 1st-order stages use the same analog and digital coefficients, and they can be electrically identical as well. The architecture with the set of coefficients in Table 3-A can be easily extended to either lower-order or higher-order cascades just by removing
or adding identical 1st-order stages. In this process correct operation is maintained with a constant overloading point. This is illustrated in Fig.5, where the ideal SNDR curves for the 4th-, 5th-, 6th-, and 7th-order cascades are plotted. Note that the overloading point does not change although the number of stages is increased from curve to curve. The main objective of using a cascade is to reduce the oversampling ratio required to obtain a given resolution. Fig.6 shows the minimum oversampling ratio needed to achieve 14 bits (or 86-dB DR) as a
function of the order of the 2-1-...-1 topology2. Note that a further reduction of the oversampling can be achieved by replacing the single-bit quantizer of the last stage by a multi-bit one, as shown in Fig. 6 (curves with B > 1)3. For instance, if we consider a 4th-order 2-1-1 cascade, such resolution can be achieved with ×16 oversampling using 3–4 bits in the last-stage quantizer, whereas its single-bit counterpart would require ×24 oversampling. Depending on the location of the signal band, such a 1.5 factor may define the border between feasible and unfeasible implementations.
2. The DR is estimated by subtracting 6 dB from (1) in order to account for the 1-bit loss due to scaling.
3. We refer to the modulator formed by cascading a 2nd-order stage and k 1st-order stages as a 2-1-...-1 cascade, and use a corresponding label for its multi-bit version.
4. Impact of circuit imperfections
SC implementations of cascade modulators suffer from certain non-ideal behaviors, namely finite (and non-linear) opamp DC-gain and capacitor mismatch, more than their single-loop counterparts. Without circuit imperfections, the in-band error power would be contributed by the quantizer error with a shaping function of order L, plus the DAC non-linearity error with a shaping function of order L - 1, yielding:
where the first term is the power of the last-stage quantization error (referred to the last-stage quantizer full-scale) and the second term represents the DAC-induced error power, which can be estimated from the INL, the DAC integral non-linearity expressed in %FS. Consider now the finite DC-gain and mismatch. Analysis yields the following modified finite-difference equation for the integrator,
where the parameters stand for the opamp open-loop DC-gain, the number of input branches of the integrator, the nominal weight of the i-th branch and its relative error, and the integrator input and output, respectively, during the previous clock cycle. This expression can be used to estimate the influence of the finite DC-gain and the mismatch on the incomplete cancellation of the quantization errors, and the corresponding degradation of the DR. Approximate expressions for the extra in-band error power, valid for all the architectures, are:
where the first contribution involves the quantization error power of a single-bit quantizer and the second refers to mismatching in the integrator weights; an upper bound for the latter is set by the standard deviation of the mismatch error in capacitor ratios. These non-ideal contributions may indeed dominate the ideal one. On the one hand, they are generated by single-bit quantizers. On the other hand, they are attenuated by only low-order shaping functions. Hence, the feasibility of calibration-free cascade modulators depends on how demanding the requirements for the opamp DC-gain and capacitor matching are, with regard to the possibilities of a given technology. An evaluation of these concerns is given next. First, we identify a feasibility limit to the order of the single-bit architecture through behavioral simulation and Monte Carlo analysis [23]. Fig. 7(a) shows the simulated half-scale SNDR as a function of the amplifier DC-gain for M = 16. Fig. 7(b) shows the SNDR histograms obtained from Monte Carlo simulation assuming 0.1% sigma in
capacitor ratios (0.05% is currently featured by MiM capacitors in standard CMOS processes [24]). Under these conditions, mainly because of the matching sensitivity, the 7th-order architecture is not worth implementing. Nevertheless, the 6th-order modulator provides 90-dB worst-case SNDR with a DC-gain of 2500. Especially robust is the 5th-order cascade, which requires a DC-gain of 1000 to achieve 80-dB worst-case SNDR with M = 16. In order to avoid excessive degradation due to mismatch, we will limit the order to 5 for multi-bit modulators. Expressions (4), (6) and (7) have been used to identify possible selections of (M, B) pairs for the 4th- and 5th-order multi-bit architectures in the presence of finite gain, capacitor mismatch and DAC non-linearity (INL = 0.4% FS). These values have been extracted from electrical and technological data of an actual CMOS process. Fig.8(a) shows the effective resolution of the 4th-order modulator as a function of the last-stage quantizer resolution B, with M (whose value is depicted over each curve) acting as a parameter. Fig.8(b) shows equivalent results for the 5th-order architecture. Note that the curves saturate in the presence of non-idealities, leading to a practical limit for the use of multi-bit quantization. The thick solid line is intended to grossly estimate such a limit. For a given M, increasing B over this limit (points on the right of the line) would not further improve resolution. Nevertheless, quantizer resolutions below this limit are enough to significantly
relax the dynamic requirements of the circuitry with respect to single-bit approaches. Furthermore, this can be achieved without any correction/calibration mechanism for the multi-bit DAC non-linearity. A priori, different (M, B) pairs can be selected for both modulators to obtain a given resolution. Nevertheless, as the operation frequency increases, defective settling of the SC integrators is likely to become the dominant error source. Thus, the choice among possible (M, B) pairs must be made looking at the opamp dynamic requirements. As an example, let us consider 14-bit effective resolution for a signal band of 2 MHz (4-MS/s Nyquist rate). Table 4 shows an estimation of the required opamp gain-bandwidth product GBW for the (M, B) pairs in Fig.8 obtaining ~14 bits. Note that multi-bit quantization significantly relaxes the dynamic requirements with respect to a single-bit approach. For instance, a modulator with single-bit quantization would require M = 22 (88-MHz sampling frequency) to obtain 14-bit resolution, whereas a 3-bit version of it can achieve the same resolution with M = 16 (64-MHz sampling frequency). So, with the latter the power saving can be in the range of 35%, not to mention that a lower sampling frequency always makes measurement easier.
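The leakage mechanism behind these requirements can be illustrated with a very small behavioral model of a non-ideal SC integrator. The first-order model below (gain error and pole shift of roughly 1/A, plus a relative branch-weight error) is a common approximation in the same spirit as the modified difference equation above, not the exact expression used by the authors; the numerical values are only examples.

```python
import numpy as np

class LeakyIntegrator:
    """SC integrator with finite opamp DC-gain A and branch-weight errors."""
    def __init__(self, weights, dc_gain=1e3, rel_errors=None):
        self.g = np.asarray(weights, dtype=float)
        self.eps = np.zeros_like(self.g) if rel_errors is None else np.asarray(rel_errors)
        self.mu = (1.0 + self.g.sum()) / dc_gain   # ~1/A leakage per clock cycle
        self.state = 0.0

    def step(self, inputs):
        acc = np.sum(self.g * (1.0 + self.eps) * np.asarray(inputs))
        # both the transferred charge and the stored charge see ~1/A losses
        self.state = (1.0 - self.mu) * (self.state + acc)
        return self.state

ideal = LeakyIntegrator([0.25], dc_gain=1e9)
real  = LeakyIntegrator([0.25], dc_gain=1e3, rel_errors=[0.001])
for _ in range(100):
    yi, yr = ideal.step([0.1]), real.step([0.1])
print("output after 100 cycles: ideal %.4f, A=1000 %.4f" % (yi, yr))
```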
5. A design example
In order to illustrate and validate the previous considerations, we face the design of a cascade multi-bit modulator with 14-bit effective resolution at 4.4 MS/s in a 2.5-V CMOS technology. These specifications pose a significant design challenge when compared to the current state-of-the-art in high-speed CMOS modulators [12] [13] [15] [19] [21] [22] [25]-[30]. On the one hand, both resolution and speed are at the top edge of the reported designs. On the other, high resolution must be attained employing a single 2.5-V supply, whereas most recent designs still use a 3.3-V supply in the modulator analog core [21] [27]-[29].
After a feasibility study at the architecture level, similar to that of the previous sections, the cascade topology was selected with M = 16 and B = 3, because it trades oversampling ratio (and hence power consumption) for an acceptable increase of the last quantizer complexity. Given the 2.2-MHz signal band, this oversampling ratio leads to a nominal sampling frequency of 70.4 MHz. Operating with a higher oversampling (20 or above) would push the sampling frequency up towards 100 MHz, which may complicate the design and its subsequent test.
5.1. Switched-capacitor implementation
Fig.9 shows the fully-differential SC schematic of the modulator. The 1st stage of the modulator includes two SC integrators, the first one with a single input branch and the second with two input branches; switches controlled by the comparator outputs are employed to feed the quantized signal back. The 2nd stage incorporates an integrator with only two input branches, although three weights are implemented. This can be done because the selection of weights allows distribution of one weight between the two integrator branches. The same applies to the 4th integrator of the cascade, which also requires only two input branches. Note that the integrator weights of the 4th stage have been scaled with respect to those in Table 3 in order to keep a loop gain equal to unity in the multi-bit stage. Nevertheless, this change causes no impact on other characteristics. The last integrator drives the 3-bit ADC and the 3rd-stage loop is closed through a 3-bit DAC. The modulator operation is controlled by two non-overlapping clock phases. In order to attenuate the signal-dependent clock-feedthrough, delayed versions of the two phases are also provided. As illustrated in Fig.9, this delay is incorporated only in the falling edges of the clock phases, while the rising edges are synchronized in order to increase the effective time slot for the sampling and integration operations [25]. The comparators and the ADC are activated at the end of the corresponding clock phase, using a strobe signal, to avoid any possible interference due to the transient response of the integrator outputs at the beginning of the sampling phase.
5.2. Building block specifications
The modulator specifications have been mapped onto requirements for its building blocks using SDOPT [23]. The issues covered in the equations database of this tool are integrator/opamp and quantizer errors, passive component errors, analog switch errors, mismatches, thermal noise, etc. In particular, these equations include new accurate models for the mismatch of capacitors, as well as for the settling behavior of SC integrators [31]. Table 5 summarizes the sizing results for the modulator achieving
14 bits at 4.4 MS/s. Note that the main in-band error source is the quantization noise, which amounts to –88.5 dB. In practice, this error is formed by several contributions that have been separately displayed in
Table 5. Note that the other error mechanisms are well below the limit imposed by the quantization error, except for circuit noise. This is because, given the high-resolution, high-speed nature of this application, the choice of the sampling capacitor, involved in both kT/C noise and dynamic issues, is critical. Taking this trade-off into account, the tool selected a value of 0.66 pF. The output swing requirement deserves special attention in a 2.5-V supply implementation. However, thanks to the previous selection of integrator weights, the output swing specification can be relaxed to only ±2 V, which is feasible in a differential approach. The previous integrator/opamp specifications apply to the 1st integrator. Some aspects, such as DC-gain and dynamic performance, can be relaxed for the remaining integrators, because their in-band error contributions are attenuated by increasing powers of the oversampling ratio. For the same reason, the sampling capacitor value can also be reduced with respect to the value used in the front-end integrator. The fact that the modulator exhibits less sensitivity to non-idealities associated with the integrators located at the back-end of the cascade provides a strategy to reduce power dissipation, consisting of sizing a dedicated opamp for each integrator. As an intermediate solution, more practical from the electrical design point of view, two different amplifiers have been considered, whose specifications were obtained by fine-tuning at the modulator level with the help of ASIDES, a behavioral simulation tool [23]. Results are given in Table 6. Fig. 10 shows the resulting output spectrum for a –10-dBV, 281-kHz input sinewave.
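The kT/C trade-off mentioned above can be checked with a few lines of arithmetic. The factor of four (differential sampling over two clock phases) and the 2-V signal amplitude used as the reference level are common first-order assumptions rather than values taken from Table 5.

```python
import math

k_B, T = 1.38e-23, 300.0        # Boltzmann constant, temperature [K]
Cs, M = 0.66e-12, 16            # sampling capacitor from the text, oversampling ratio
v_amp = 2.0                     # assumed differential signal amplitude [V]

noise_sampled = 4 * k_B * T / Cs        # total sampled noise power [V^2] (assumed factor 4)
noise_inband = noise_sampled / M        # only the in-band fraction remains after decimation
snr_db = 10 * math.log10((v_amp**2 / 2) / noise_inband)
print("in-band kT/C noise: %.1f uVrms, SNR limit: %.1f dB"
      % (math.sqrt(noise_inband) * 1e6, snr_db))
```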
5.3. Building block implementation
Two amplifiers have been designed: a high DC-gain, high-speed amplifier for the 1st and the 2nd integrators (OPA), and a modest DC-gain, high-speed amplifier for the 3rd and 4th integrators (OPB). OPA
uses a 2-stage, 2-path compensated architecture (Fig.11(a)), with a telescopic 1st stage and both Miller and Ahuja compensation. OPB is a simple folded-cascode amplifier (Fig.11(b)). The common-mode feedback nets are dynamic. The circuit-level sizing tool FRIDGE [23] was used for the amplifier design. Amplifier DC-gain non-linearity and non-linear settling were carefully considered, because they are more evident in low-voltage implementations. The comparators at the end of the modulator's 1st and 2nd stages demand a short resolution time (around 3 ns) and a resolution better than 20 mV. A regenerative latch including a pre-amplifying stage has been used [32].
The required switch on-resistance can be obtained using CMOS switches with a 2.5-V supply without clock bootstrapping. Nevertheless, in low-voltage technologies the switch DC-characteristic is highly non-linear, causing a dynamic distortion that is more evident the larger the signal frequency is [33]. The sampling process in the integrators has been analyzed using electrical simulations for sinewave and DMT signals. With the analog switch used, the THD is always lower than –96 dB for sinewave signals (see Fig. 12) and has no practical influence on the multi-tone power ratio MTPR of DMT signals. With the set of integrator weights used, only 2 × 16 unitary capacitors are required. They have been implemented using metal-insulator-metal (MiM) structures available in the intended technology. The measured mismatch for 1-pF capacitances is low enough to avoid calibration. The 3-bit A/D/A converter consists of a simple fully-differential flash ADC and a resistive-ladder DAC, implemented using unsalicided n+ poly. The comparators in the ADC are of the same type as the ones at the end of the 1st and 2nd stages of the modulator.
5.4. Experimental results
Fig. 13 shows a microphotograph of the prototype fabricated in a CMOS technology. Excluding pads, the prototype occupies a modest die area and dissipates 65.8 mW (55 mW corresponding to the analog blocks) from a 2.5-V supply. Preliminary measurements are given next. Fig.14(a) and (b) show the measured SNR as a function of the input level for two values of the oversampling ratio. Each plot contains two curves obtained operating at 35.2-MHz and 72.4-MHz sampling frequencies. For the nominal oversampling ratio, M = 16, the measured DR is 83 dB (13.5 bits),
whereas with M = 32 it increases up to 88.6 dB (14.4 bits). However, the measured performance decreases as the sampling frequency increases, mainly as a consequence of the switching activity of the digital output buffers. Thus, at 72.4 MHz the corresponding figures are 78.6 dB (12.8 bits) for M = 16, and 83 dB (13.5 bits) for M = 32. Despite these problems, which we are trying to solve at the board level, the figure-of-merit FOM [34] yields 2.1 pJ at maximum speed (72.4 MHz) and 2.6 pJ at 35.2 MHz. The state-of-the-art in high-frequency CMOS modulators is reviewed in Fig. 15, including the different implementations and topologies recently reported: discrete-time single-loop [12] [13], discrete-time cascade [15] [19] [21] [22] [25]-[28] [30] [35] [36],
parallel [37], and continuous-time [38]-[40]. Note that this modulator is among those with the smallest value of the FOM for high-frequency applications.
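The quoted figures can be reproduced directly from the measured data, assuming the common energy-per-conversion definition FOM = P/(2·BW·2^ENOB) with ENOB = (DR − 1.76)/6.02; the exact definition used in [34] may differ.

```python
import math

P = 65.8e-3                                  # total power from the text [W]
cases = {"72.4 MHz, M = 16": (72.4e6, 16, 78.6),
         "35.2 MHz, M = 16": (35.2e6, 16, 83.0)}

for name, (fs, osr, dr_db) in cases.items():
    bw = fs / (2 * osr)                      # signal bandwidth
    enob = (dr_db - 1.76) / 6.02             # effective number of bits
    fom = P / (2 * bw * 2**enob)             # energy per conversion step
    print("%s: BW %.2f MHz, ENOB %.1f bit, FOM %.1f pJ"
          % (name, bw / 1e6, enob, fom * 1e12))
```

With the numbers of the text this evaluates to roughly 2.1 pJ and 2.6 pJ, matching the reported figure-of-merit values.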
ACKNOWLEDGMENT: This work has been funded by the CEE (ESPRIT IST Project 29261/MIXMODEST), the Spanish MCyT and the ERDF (Projects TIC2001-0929/ADAVERE and 1FD1997-1611).
References
[1] H.J. Casier: “Requirements for Embedded Data Converters in an ADSL Communication System”. Proc. of the IEEE Int. Conf. on Electronics, Circuits and Systems, Vol. I, pp. 489-492, Sept. 2001. [2] F. Op’t Eynde and W. Sansen: Analog Interfaces for Digital Signal Processing Systems. Kluwer Academic Publishers, 1993. [3] W.L. Lee and C.G. Sodini: “A Topology for Higher Order Interpo-
lative Coders”. Proc. of the IEEE Int. Symposium on Circuits and
Systems, pp. 459-462, May 1987.
[4] S.M. Moussavi and B.H. Leung: "High-Order Single-Stage Single-Bit Oversampling A/D Converter Stabilized with Local Feedback Loops". IEEE Transactions on Circuits and Systems, Vol. 41, pp. 19-25, Jan. 1994.
[5] S.R. Norsworthy, R. Schreier, and G.C. Temes (Editors): Delta-Sigma Data Converters: Theory, Design and Simulation. IEEE Press, 1996.
[6] T. Cataltepe et al.: "Digitally Corrected Multi-bit Data Converters". Proc. of the IEEE Int. Symposium on Circuits and Systems, pp. 647-650, May 1989.
[7] M. Sarhang-Nejad and G.C. Temes: "A High-Resolution ADC with Digital Correction and Relaxed Amplifiers Requirements". IEEE J. of Solid-State Circuits, Vol. 28, pp. 648-660, June 1993.
[8] F. Chen and B.H. Leung: "A High Resolution Multibit Sigma-Delta Modulator with Individual Level Averaging". IEEE J. of Solid-State Circuits, Vol. 30, pp. 453-460, April 1995.
[9] R.T. Baird and T.S. Fiez: "A Low Oversampling Ratio 14-b 500-kHz ADC with a Self-Calibrated Multibit DAC". IEEE J. of Solid-State Circuits, Vol. 31, pp. 312-320, March 1996.
[10] O. Nys and R. Henderson: "A Monolithic 19bit 800Hz Low-Power Multibit Sigma Delta CMOS ADC using Data Weighted Averaging". Proc. of the European Solid-State Circuits Conf., pp. 252-255, Sept. 1996.
[11] F. Chen and B. Leung: "A Multi-Bit DAC with Dynamic Element Matching Techniques". Proc. of the IEEE Custom Integrated Circuits Conf., pp. 16.2.1-16.2.4, May 1992.
[12] Y. Geerts et al.: "A High-Performance Multibit CMOS ADC". IEEE J. of Solid-State Circuits, Vol. 35, pp. 1829-1840, Dec. 2000.
[13] T.-H. Kuo, K.-D. Chen, and H.-R. Yeng: "A Wideband CMOS Sigma-Delta Modulator With Incremental Data Weighted Averaging". IEEE J. of Solid-State Circuits, Vol. 37, pp. 11-17, Jan. 2002.
[14] Y. Matsuya et al.: "A 16-bit Oversampling A-to-D Conversion Technology Using Triple-Integration Noise Shaping". IEEE J. of Solid-State Circuits, Vol. 22, pp. 921-929, Dec. 1987.
[15] B. Brandt and B.A. Wooley: "A 50-MHz Multibit Modulator for 12-b 2-MHz A/D Conversion". IEEE J. of Solid-State Circuits, Vol. 26, pp. 1746-1756, Dec. 1991.
[16] N. Tan and S. Eriksson: "Fourth-Order Two-Stage Delta-Sigma Modulator Using Both 1 Bit and Multibit Quantizers". Electronics Letters, Vol. 29, pp. 937-938, May 1993.
[17] F. Medeiro et al.: "Multi-bit Cascade Modulator for High-Speed A/D Conversion with Reduced Sensitivity to DAC Errors". Electronics Letters, Vol. 34, No. 5, pp. 422-424, March 1998.
[18] V.F. Dias and V. Liberali: "Cascade Pseudomultibit Noise Shaping Modulators". IEE Proceedings-G, Vol. 140, No. 4, pp. 237-246, Aug. 1993.
[19] F. Medeiro et al.: "A 13-bit, 2.2-MS/s, 55-mW Multibit Cascade Modulator in CMOS Single-Poly Technology". IEEE J. of Solid-State Circuits, Vol. 34, pp. 748-760, June 1999.
[20] I. Dedic: "A Sixth-Order Triple-Loop CMOS ADC with 90dB SNR and 100kHz Bandwidth". Proc. of the IEEE Int. Solid-State Circuits Conf., pp. 188-189, Feb. 1994.
[21] A.R. Feldman, B.E. Boser, and P.R. Gray: "A 13-Bit, 1.4-MS/s Sigma-Delta Modulator for RF Baseband Channel Applications". IEEE J. of Solid-State Circuits, Vol. 33, pp. 1462-1469, Oct. 1998.
[22] J.C. Morizio et al.: "14-bit 2.2-MS/s Sigma-Delta ADCs". IEEE J. of Solid-State Circuits, Vol. 35, pp. 968-976, July 2000.
[23] F. Medeiro, B. Pérez-Verdú, and A. Rodríguez-Vázquez: Top-Down Design of High-Performance Sigma-Delta Modulators. Kluwer Academic Publishers, 1999.
[24] J.C.H. Lin: "TSMC Mixed-Signal 1P5M+ MIM Salicide 2.5V/5.0V Design Guideline". Taiwan Semiconductors Manufacturing Co.
[25] A.M. Marques et al.: "A 15-b Resolution 2-MHz Nyquist Rate ADC in a CMOS Technology". IEEE J. of Solid-State Circuits, Vol. 33, pp. 1065-1075, July 1998.
[26] L. Brooks et al.: "A Cascaded Sigma-Delta Pipeline A/D Converter with 1.25 MHz Signal Bandwidth and 89 dB SNR". IEEE J. of
Solid-State Circuits, Vol. 32, No. 12, pp. 1896-1906, Dec. 1997.
[27] I. Fujimori et al.: "A 90-dB SNR 2.5-MHz Output-Rate ADC Using Cascaded Multibit Delta-Sigma Modulation at 8x Oversampling". IEEE J. of Solid-State Circuits, Vol. 35, pp. 1820-1828, Dec. 2000.
[28] Y. Geerts et al.: "A 3.3-V, 15-bit, Delta-Sigma ADC with a Signal Bandwidth of 1.1 MHz for ADSL Applications". IEEE J. of Solid-State Circuits, Vol. 34, pp. 927-936, July 1999.
[29] R. del Río et al.: "Top-down Design of a xDSL 14-bit 4-MS/s Sigma-Delta Modulator in Digital CMOS Technology". Proc. of the Design, Automation and Test in Europe Conf., pp. 348-351, March 2001.
[30] K. Vleugels, S. Rabii, and B. Wooley: "A 2.5V Broadband Multi-Bit Modulator with 95dB Dynamic Range". Proc. of the IEEE Int. Solid-State Circuits Conf., pp. 50-51, Feb. 2001.
[31] R. del Río et al.: "Reliable Analysis of Settling Errors in SC Integrators – Application to Modulators". Electronics Letters, Vol. 36, pp. 503-504, March 2000.
[32] G.M. Yin, F. Op't Eynde, and W. Sansen: "A High-Speed CMOS Comparator with 8-b Resolution". IEEE J. of Solid-State Circuits, Vol. 27, pp. 208-211, Feb. 1992.
[33] W. Yu et al.: "Distortion Analysis of MOS Track-and-Hold Sampling Mixers Using Time-Varying Volterra Series". IEEE Transactions on Circuits and Systems II, Vol. 46, pp. 101-113, Feb. 1999.
[34] F. Goodenough: "Analog Technologies of all Varieties Dominate ISSCC". Electronic Design, Vol. 44, pp. 96-111, Feb. 1996.
[35] G. Yin, F. Stubbe, and W. Sansen: "A 16-b 320-kHz CMOS A/D Converter Using Two-Stage Third-Order Noise Shaping". IEEE J. of Solid-State Circuits, Vol. 28, No. 6, pp. 640-647, June 1993.
[36] R. del Río et al.: "A High-Performance Sigma-Delta ADC for ADSL Applications in CMOS Digital Technology". Proc. of the IEEE Int. Conf. on Electronics, Circuits and Systems, Vol. 1, pp. 501-504, Sept. 2001.
[37] E.T. King et al.: "A Nyquist-Rate Delta-Sigma A/D Converter".
IEEE J. of Solid-State Circuits, Vol. 33, pp. 45-52, Jan. 1998.
[38] E.J. van der Zwan et al.: "A 13mW 500kHz Data Acquisition IC with 4.5 Digit DC and 0.02% Accurate True-RMS Extraction". Proc. of the IEEE Int. Solid-State Circuits Conf., pp. 398-399, Feb. 1999.
[39] L. Luh, J. Choma, and J. Draper: "A 400MHz 5th-Order CMOS Continuous-Time Switched-Current SD Modulator". Proc. of the European Solid-State Circuits Conf., pp. 72-75, Sept. 2000.
[40] B. Hallgren: "Design of a Second Order CMOS Sigma-Delta A/D Converter with 150MHz Clock Rate". Proc. of the European Solid-State Circuits Conf., pp. 103-106, Sept. 1992.
Sigma Delta Converters in Wireline Communications
Andreas Wiesbauer, Jörg Hauptmann, Peter Laaser1
Infineon Technologies Design Center Villach, Austria
1 Design Center Munich, Germany
Abstract
The paper contains a review of Sigma Delta ADC development with a focus on multi-bit Sigma Delta ADCs. Both achievements in the Sigma Delta area and driving applications are presented. Wireline communication products employ Sigma Delta ADCs extensively in voice-band modems, ISDN modems and ADSL modems. Especially in the latter, multi-bit Sigma Delta modulators are used. Design examples for ADSL modems are presented, and recent advances, such as low-voltage and deep sub-micron converters, are discussed. An outlook on the developments of Sigma Delta ADCs for ADSL and VDSL is given, also covering some aspects of continuous-time Sigma Delta ADCs.
1. Introduction
Driven by the deregulation of the telecommunications industry and the vast growth rate of internet users, worldwide industry has made enormous efforts to deploy and standardize a multitude of different access technologies over media such as air, twisted-pair copper cables, coax cables and fibers. Wireline
communication applications have gone through an amazing evolution during the last twenty years. Applications such as video conferencing, fast internet downloads, digital TV and tele-working have been made available to the public with wide coverage. A public exchange has turned from a building full of racks into a rack full of ASICs. Highly complex coding algorithms, requiring thousands of MIPS of DSP processing power, have been implemented in single ASICs consuming less than one watt of power. At the same time the integration of large dynamic-range and high-bandwidth analog front ends was achieved. Advances in (deep) sub-micron CMOS technologies, along with advances in analog and digital circuit design, made so-called System-on-Chip (SoC) design possible. Except for fiber-optic transmission, where the high-speed switching and high-speed optic-to-electric and electric-to-optic conversion are most challenging, wireline communication is performed with data rates close to the Shannon limit. This requires highly advanced line codes at the physical layer. Besides the huge number of MIPS for coding, an efficient implementation of such technologies also requires high-resolution ADCs and DACs. Up to a few MHz of analog bandwidth the Sigma Delta principle has been successfully applied in this area. Especially for twisted-pair copper-cable modems, sigma delta converters are widely used. Due to the high bandwidth needed in coax cable modems, Nyquist-type converters are employed in that area. A historical review of the 48 years of Sigma Delta ADC development is presented in section 2, along with a brief description of the advances in twisted-pair modem development. Section 3 focuses on state-of-the-art AFEs for ADSL, HDSL and SDSL applications. Design examples of several multi-bit Sigma-Delta ADCs are discussed at an architectural level, with some aspects of circuit design in deep sub-micron CMOS technologies. VDSL AFEs do not fully exploit sigma delta principles, since there is not yet an efficient implementation demonstrated for the required 12 MHz analog bandwidth. We will present an example of a VDSL converter design using oversampling and comment on the problems linked to the design of Sigma Delta ADCs for VDSL in section 4.
2. Evolution of Multi-bit Sigma Delta ADCs for xDSL
An excellent survey of early papers in the Sigma Delta area can be found in [2], extended by tutorial sections in a later work [3]. Some of the important milestones are described in this section, with a focus on the progress in multi-bit Sigma Delta AD conversion for DSL applications and an update on the recent achievements in this area. From the time noise-shaping was invented in 1954 [1] until the mid eighties, a great deal of Sigma Delta theory was developed and several practical implementations had proven the feasibility and potential for data conversion. Modulator architectures employing higher-order loop filters, multi-bit quantization, and continuous-time filters had been published by then. Cascaded structures have been known with delta modulators since 1967 [7]. Even so, a broad acceptance of Sigma Delta converters in the integrated-circuit design community was not established. Only after 1985, when Candy published the often-cited paper about the "standard" second-order Sigma Delta modulator (SD2) [4], did the modulators become popular and start to be deployed in ASICs. Besides the availability of this robust and still widely used SD2 architecture, the line width of integrated circuits fell below 2.4 µm and supply voltages came down to 5 V, allowing the design of power-efficient ASICs with fast and complex digital algorithms. The driving application at that time was high-quality digital recording of audio signals. Voice codecs, voice-band modems and ISDN applications were also developed at that time. For these applications the demand for very linear ADCs in combination with a high signal-to-noise ratio was crucial, which could easily be achieved with single-bit sigma delta converters without trimming or calibration. 16-bit AD converters were needed in voice applications in order to have sufficient system flexibility, to handle synthesized line impedances, and the high-level teletax signals. Due to the large oversampling ratios (OSR), the complexity of the anti-aliasing filters for such modems was drastically reduced. These advantages caused an extremely fast deployment of such converters, and at the end of the 1980s there was no voice or ISDN application left without Sigma Delta converters. At this time sigma delta converters were mainly used for low-bandwidth applications (up to 150 kHz). Going to higher bandwidths would have required higher sampling frequencies and thus a significant increase of
power consumption. Nowadays, Sigma Delta converters can be found in many fields of engineering such as sensors and metrology, speech coding, digital audio, and wireline and wireless communication. In 1985 Adams presented an 18-bit Sigma Delta ADC for audio applications [5]. This design, built out of discrete components, was way ahead of any integrated-circuit design of that time. It used multi-bit quantization, a higher-order loop filter and continuous-time (CT) circuitry. It consumed 40 watts of power including the decimation filter. Since then, huge efforts have been made to implement high-performance Sigma Delta architectures in standard CMOS technology. Koch [6] published a 12-bit CT Sigma Delta ADC in 1986 using a CMOS process, with excellent power efficiency for the standards back then. In 1987 a 16-bit audio ADC in CMOS using a cascaded structure was published [8] and band-pass Sigma Delta architectures were disclosed [9].
In the 1990s the Internet became popular and has since been the driving force for delivering increased bit rates to the home. This could no longer be accomplished with voice-band frequencies or the standardized ISDN modems. xDSL technologies came up, using much higher bandwidths on the twisted pair of up to 12 MHz, as depicted in figure 1. SDSL, HDSL and ADSL use frequencies up to 1.1 MHz. The depicted 10BaseS is an Infineon proprietary application using a non-standard VDSL modem to provide Ethernet
links via twisted pair connections over distances of more than 1000 meters and data rates of 10 Mbits/sec. Standard compliant VDSL makes use of frequencies up to 12 MHz.
With the xDSL technology a quantum step in terms of bit rate became possible, at the cost of a reduced maximum loop length. An ADSL modem can provide a data rate of up to 8 Mbit/s, as depicted in figure 2.
For this application, AD converters with 14 bit resolution at a signal bandwidth of 1.1 MHz (ADSL) were needed. Figure 3 shows a typical block diagram of an xDSL analog front end. Transmit signals are generated in the digital domain and converted by the DAC and filter into an analog signal, which is then fed to the line driver (LD) that provides sufficient power to the low-impedance transformer
which couples the transmit signal onto the twisted pair line. The receive signal, along with a part of the transmit signal (the echo), enters the AFE through the hybrid. In the programmable gain amplifier (PGA) and the receive filter the signal is conditioned such that its peak value fits the full-scale range of the ADC as well as possible. Signal levels and a typical waveform at the input of the ADC are depicted in figure 4. The effective SNR for the digital receiver is determined by the level of the receive signal and the noise floor at the ADC. Approximately 50 dB to 65 dB of effective SNR is required (10 bit to 15 bit constellation) for each carrier of a DMT signal in ADSL, calling for 8 bit to 11 bit ADCs. For long-reach connections the echo signal level might be higher than the receive signal, which requires several additional bits of dynamic range (DR). Furthermore, the signal has a crest factor of 15 dB, which adds another 2.5 bits to the required DR. The ADC in different ADSL modems is specified with 14 bits [34-37] as a trade-off between the complexity of the hybrid, the receive filter and the ADC. In VDSL, 12 bit ADCs are employed. The reduced resolution is possible since there is less power on the twisted pair cable and more complex filters are used for echo attenuation [28,29,32].
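The 14-bit figure can be traced from the per-carrier SNR with the usual 6.02 dB-per-bit rule (a rough budget only; the exact allocation between hybrid, receive filter and ADC differs per design):

\[
N \;\approx\; \frac{\mathrm{SNR} - 1.76\,\mathrm{dB}}{6.02\,\mathrm{dB/bit}}, \qquad 50\ldots65\,\mathrm{dB} \;\rightarrow\; 8\ldots11\ \mathrm{bits},
\]

to which roughly 2.5 bits for the 15 dB crest factor and a few additional bits of echo headroom are added, arriving at the 14-bit specification of [34-37].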
In the beginning of the xDSL era this performance could not be achieved with state-of-the-art single-bit Sigma Delta converters at a reasonable power consumption. Therefore pipeline AD converters have been widely used for ADSL [34,36,37]. During xDSL development, however, there have been substantial improvements in CMOS processes in terms of speed and matching properties, so that the power dissipation of Sigma Delta converters could be drastically reduced
using very fast deep sub-micron technologies in combination with multi-bit feedback instead of single-bit feedback. The first breakthrough for multi-bit Sigma Delta ADCs had its origin in the work of Leslie and Singh [10], who introduced the concept of multi-bit quantization in the later stages of cascaded structures in 1990. The first stage of the cascaded modulator keeps its single-bit quantizer, ensuring excellent linearity, while later stages can use multi-bit quantizers with only moderate linearity requirements on the internal DAC. A 12 bit, 1 MHz bandwidth ADC using an extension of this concept was published in 1991 by Brandt and Wooley [11]. The ADC was implemented in a CMOS process. Efficient implementations using this architecture, with bandwidths of more than 1 MHz and resolutions of more than 14 bits, have been published up to the recent past [18,21]. Using a multi-bit quantizer in the first stage of a cascaded modulator or in a single-loop modulator is critical, since the internal DAC must meet stringent linearity requirements. Such an ADSL ADC was published [20], claiming that the DAC's capacitor sizes were limited by noise constraints rather than by matching limits. In general this might not be the case, depending on the CMOS process properties. A solution to this problem, and another breakthrough, was the invention of mismatch-shaping techniques, which basically convert internal DAC element errors into high-frequency noise. Thereby highly linear oversampling DACs can be built with only moderate matching requirements for the DAC elements. A bibliography of such techniques is given in [12]. These techniques have been under development since 1988, starting with randomization of the DAC elements [13]. The methods have been continuously improved with respect to implementation efficiency and order of shaping. Since the presentation of the Baird and Fiez paper [14] in 1995 and the disclosure of the ADC design of Brooks et al. [15] in 1997, these techniques are well established in the Sigma Delta design community, allowing efficient and robust implementations of Sigma Delta ADCs with resolutions of more than 14 bits and bandwidths beyond 1 MHz [16,17,19,20]. A combination of the above-mentioned techniques was used in a CMOS design [19], achieving a peak SNDR of 87 dB with a
modulator having an analog bandwidth of 2 MHz and almost 16 bits of resolution. Single-bit Sigma Delta ADCs have also been demonstrated to achieve ADSL performance [18,22,23]. Our experience showed that these designs are much more difficult to implement in an SoC. They tend to use higher clock frequencies than multi-bit designs and have much higher out-of-band noise, which makes the circuits more vulnerable to interference problems. At the same time they tend to have higher-order noise shaping, which results in higher power consumption due to the more extensive digital filtering needed. To our knowledge, no Sigma Delta ADC achieving VDSL performance (12 MHz analog BW and more than 12 bit resolution) has been reported. However, there are attempts to enlarge the BW of Sigma Delta ADCs by going from switched-capacitor (SC) circuits to continuous-time (CT) circuits [24,25]. Modern CMOS processes allow the design of SC ADCs with a maximum sampling rate of approximately 100 MHz to 150 MHz, a factor of 7 below the gain-bandwidth product achievable with operational amplifiers (op-amps). The same op-amps used in CT modulators allow sampling rates of 300 MHz and above. Thus, we believe that the feasibility of CT Sigma Delta ADCs covering VDSL applications will be demonstrated in the next few years.
3. Sigma Delta Design Examples for DSL AFEs
Within the wireline communications group at Infineon, several AFEs for DSL services have been designed. The range covers applications from ISDN via HDSL, SDSL and ADSL to VDSL. Infineon's ISDN codecs are built with classical SD2 modulators and the VDSL codecs use subranging ADCs. All other DSL AFEs employ multi-bit Sigma Delta converters. In this section an overview of the design of multi-bit Sigma Delta ADCs for such AFEs is presented. The requirements for the ADCs are 13 to 14 bits of resolution and bandwidths ranging from 138 kHz for ADSL central office to 1.1 MHz for full-rate ADSL at the customer premises. In 1996 an ADC for full-rate ADSL was designed in a 0.65 µm CMOS technology with a 5 V supply. This design was similar to [23]
with somewhat reduced resolution and internal references. The architecture, a 2-1-1 cascade with all single-bit quantizers, is depicted in figure 5. It consumed 500 mW of power. Clocked at 52 MHz, high-bandwidth op-amps and reference buffers were needed, driving loads of several pF. The out-of-band noise power of the first-stage output signal at frequencies around half the sampling rate is very high. Therefore, the first-stage DAC and the integrators have to be designed and laid out very carefully in order not to run into interference problems.
The first multi-bit Sigma Delta ADC within Infineon was designed into a product in 1998, showing excellent performance over the full 1.1 MHz bandwidth. The architecture is depicted in figure 6. The first stage uses a 3 bit internal quantizer, reducing the out-of-band noise power and improving design robustness. In the second stage a 5 bit quantizer was used, resulting in an overall performance of 14 bits with 7 bits of internal quantization. A more detailed description can be found in [20]. Compared to the single-bit modulator, the power consumption could be reduced by a factor of two at the cost of an increased silicon area. The sampling rate was 26 MHz as opposed to 52 MHz, contributing to design robustness and additional power savings in the system's PLL. From a circuit-design point of
view, the design was identical to the single bit design, using similar op-amps in the same CMOS technology.
Clearly, by architectural optimization the power efficiency of the ADC could be improved by a factor of 2. For the comparison of AD converters with slightly different resolutions and different analog bandwidths (ABW) we define a power efficiency (PE),
where P denotes the power consumption in mW and N the resolution in bits. The multi-bit design described above is chosen as the reference, with 1100 kHz ABW, 250 mW power consumption and 14 bits of resolution. Over the years the PE has increased to 4, as depicted in figure 7. For the designs from 1996 to 1998, the change from single-bit to multi-bit feedback was the main reason. In 1999 and 2000 the architecture was not changed significantly; the design was migrated from 5 V supplies to 3.3 V supplies. Even though this redesign was done in the same technology with similar circuit techniques, a constant PE could be maintained by more aggressive biasing of the op-amps, thereby increasing the relative voltage swing (Vsignal/Vsupply). The other three examples in 1999 and 2000 were design adaptations for different ABWs of 138 kHz, 276 kHz and 500 kHz, without power optimization but with extremely short design cycles; they are marked as reuse in figure 7.
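A figure of merit of this kind is usually normalized to a reference design; one plausible form (the specific expression ABW·2^N/P and its normalization are assumptions made here for illustration, consistent with the quantities P, N and ABW used in the definition) is

\[
\mathrm{PE} \;=\; \frac{\mathrm{ABW}\cdot 2^{N}/P}{\mathrm{ABW}_{\mathrm{ref}}\cdot 2^{N_{\mathrm{ref}}}/P_{\mathrm{ref}}},
\qquad \mathrm{ABW}_{\mathrm{ref}} = 1100\,\mathrm{kHz},\; N_{\mathrm{ref}} = 14\,\mathrm{bit},\; P_{\mathrm{ref}} = 250\,\mathrm{mW},
\]

so that the 1998 multi-bit reference design has PE = 1 by construction.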
An AFE for G.SHDSL was designed in 2000 [38] and redesigned in a finer CMOS technology in 2001, increasing the PE by a factor of 4. Several factors made this progress possible. On the architectural level, the cascaded structure was replaced by a single-loop structure, as shown in figure 8. The design still uses 4 bits of internal quantization, however. A higher oversampling ratio (OSR) makes the reduction of the internal quantizer resolution from 7 bits to 4 bits possible. The clock frequency thereby stays roughly the same, since the ABW is reduced to 600 kHz. Compared to the previously discussed cascade architecture, the single-loop architecture places more relaxed requirements on the op-amps, therefore allowing reduced power dissipation.
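The trade-off between internal quantizer resolution and OSR follows from the standard expression for the ideal in-band quantization noise of an L-th order modulator with a B-bit quantizer (a textbook approximation that neglects thermal noise and loop-filter coefficient scaling):

\[
\mathrm{SQNR}_{\max} \;\approx\; 6.02\,B + 1.76 + 10\log_{10}\!\frac{2L+1}{\pi^{2L}} + (2L+1)\cdot 10\log_{10}(\mathrm{OSR}) \quad \mathrm{[dB]}.
\]

Each bit removed from the quantizer costs about 6 dB, while every doubling of the OSR recovers (2L+1)·3 dB, so a moderate OSR increase compensates the step from 7 to 4 bits.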
On circuit level, the design of more advanced reference buffers saved a significant portion of power. In previous designs, around 50% of the current drain was caused by the reference buffers. Instead of having class-A buffers directly driving the capacitors, a class-AB buffer is used. Therefore, the references draw only 25% of the total current. A different approach to reduce reference buffer current drain is presented in [26], where the transient charge is delivered from an
external capacitor having its voltage controlled by a class-A op-amp. In previous designs the sampling and DAC capacitors were arranged in such a manner that there was a constant charge drawn from the references in each clock cycle. A different arrangement, also used in [15], allows a reduction in the capacitor size by a factor of two at the cost of a signal dependent charge being drawn from the references. With the low output-impedance reference buffers this was proven not to be a problem. Finally, the smaller feature size allowed a more power efficient implementation of the op-amps.
Yet another Sigma Delta ADC with 14 bit performance was designed for an ADSL central-office chipset. Exploiting the high-speed capabilities of the process, the PE could be increased slightly, even though the supply voltage decreased to 1.8 V. Besides the advances in circuit design described above, the architecture was changed again. As drawn in figure 9, a second-order 3-bit modulator with an OSR of 96 is used. Thus the number of active blocks is reduced and, likewise, the overall power consumption.
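As a quick sanity check of the architecture in figure 9, the ideal quantization-noise limit of a second-order 3-bit modulator at an OSR of 96 can be evaluated with the expression given earlier (an upper bound only; the real design is limited by thermal and reference noise and by DAC errors):

import math

def sqnr_ideal_db(order, bits, osr):
    """Ideal peak SQNR of an order-L, B-bit Sigma Delta modulator
    (standard white-quantization-noise approximation)."""
    return (6.02 * bits + 1.76
            + 10 * math.log10((2 * order + 1) / math.pi ** (2 * order))
            + (2 * order + 1) * 10 * math.log10(osr))

# Second-order, 3-bit quantizer, OSR = 96 (the figure-9 architecture)
print(f"{sqnr_ideal_db(2, 3, 96):.1f} dB")   # roughly 106 dB, comfortably above
                                             # the ~86 dB needed for 14-bit performance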
Folded-cascode op-amps were replaced with two-stage Miller op-amps to increase the relative output voltage swing. A major problem when using such low supply voltages is the proper design of the switches. Linearity requirements of 14 bits cannot be fulfilled if standard transmission gates are used. In the literature several methods are known to overcome this problem. Some designers use low-threshold-voltage devices for the switches in the signal path [16]. Others keep the gate overdrive constant by employing a technique called bootstrapping [15]. In [27] a more sophisticated technique is presented and several important references are given. We have chosen to drive the switches from the higher supply voltage that is available. This is a feasible solution for many applications, since the I/O pads often use a higher supply voltage for interfacing with external chips. Another solution is to use a charge pump and increase the supply voltage for the switch drivers locally [22]. However, these techniques can cause reliability problems or additional power dissipation, since several clock buffers might have to be operated from the higher supply voltage. The internal DAC linearity requirements are met by the use of simple dynamic element matching algorithms as described in [3,12]. Since the input signal of a DSL ADC is quite busy, tone-generation problems related to some of these techniques are not critical during regular operation. Up to 80 dB SFDR and 13 bits of resolution, the required DAC linearity can be achieved by proper sizing and careful layout without any calibration or dynamic element matching. Power efficiency is very important for telecommunication applications. At the customer premises, a high level of integration and low-cost packaging are needed to keep the cost of the modems low. In the exchanges, one single line card provides up to 64 ADSL modems, requiring minimal power consumption from each individual component, down to each sub-circuit of a modem chipset. In this case very low power consumption is as important as integrating multi-port front ends on a single silicon die (8- to 16-channel AFEs). However, PE is not the only quality factor of a design. Yield and chip area are the dominating cost factors in high-volume IC manufacturing. Our experience has shown that Sigma Delta ADC designs are extremely robust, even in the harsh environment of an SoC. With each reduction in feature size we still experience a reduction in ADC size. As an example, in figure 10 layout plots of 3 different 14 bit ADC designs
are shown. From left to right, the feature size decreases, as well as the active silicon area. This is mainly due to the architectural changes we made along with the changes of the processes (see figures 6,8,9). In fact, the total capacitance decreased at the cost of a higher sampling rate. Besides the reduced total capacitance, the op-amps and the wiring shrink automatically with a feature size reduction. The total silicon area of each ADC did shrink approximately proportional to the feature size.
Future designs in this area will deal with power reduction and low-supply-voltage aspects. In terms of power efficiency, we believe that the trend shown in figure 7 will continue to levels well beyond 10 within the next few years. The 16 bit 1.1 MHz design of Geerts et al. [17] is estimated to have a PE similar to the presented designs. Further improvement might be possible for the next generations in deep sub-micron processes, since so far we do not observe a reduction in the relative voltage swing (Vsignal/Vsupply). The active silicon area might not keep scaling linearly with the feature size
in the near future unless a significant reduction in the number of building blocks is possible. Looking at the quite simple architecture in figure 9, this might not be possible. Advances in Sigma Delta ADCs will therefore originate in circuit innovations rather than in architectural changes. For example, the efforts made in the area of continuous-time Sigma Delta ADC design might pay off in the near future. Once the problems in this area are solved, significant reductions in power and silicon area might be possible.
4. VDSL AFEs exploiting oversampling ADCs
Pipeline-type ADCs can be built with very high resolution and high bandwidth, especially if calibration techniques are applied. The light-gray bubbles in figure 11 indicate the ranges where different types of ADCs are employed in wireline communication applications. The + signs indicate ADCs that were implemented within Infineon, and the dark-gray areas show different twisted-pair modem applications. Pipeline ADCs achieve the highest bandwidth-resolution figure at the cost of a typically high silicon area.
High-resolution multi-bit Sigma Delta ADCs are feasible for analog bandwidths below 2 MHz. For higher sampling rates the required op-amps cannot be implemented in available CMOS processes. Subranging-type converters can achieve extremely high bandwidth, but their resolution is limited to 10 bits in deep sub-micron CMOS processes [30] because they do not employ any residue amplification. Algorithmic ADCs, such as the successive approximation (SAR) converter, are the most power-efficient and area-efficient converters, since they use only one active component (the comparator). In terms of analog bandwidth for a given resolution, SAR converters show the poorest performance. Since the analog processing is performed completely serially, one clock cycle is needed for each bit of resolution, and each cycle requires several time constants of settling. Thus the maximum conversion rate is relatively low. Recent progress in SAR converter implementation makes the principle applicable to resolutions of 13 bits at Nyquist rates up to 1 MHz, making it a competitive alternative to some Sigma Delta ADCs. As described in [31], a non-binary weighted capacitor array is used, allowing more tolerance in each conversion step and thereby reducing the settling requirements.
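To see why the non-binary weighting of [31] relaxes the settling requirement, consider how many RC time constants each step of a strictly binary SAR must settle (a simplified estimate that assumes single-pole settling to within 1/2 LSB; the numbers below are illustrative, not taken from [31]):

import math

def settling_time_constants(n_bits):
    """Time constants needed per step for a strictly binary SAR, where each
    decision must settle to within 1/2 LSB of full scale."""
    return math.log(2 ** (n_bits + 1))   # = (N+1)*ln(2)

for n in (8, 10, 13):
    tau = settling_time_constants(n)
    print(f"{n:2d} bits: {tau:.1f} tau per step, "
          f"{n * tau:.0f} tau per conversion")

With a radix below 2, later steps can correct bounded errors made in earlier ones, so far fewer time constants per step are required; this is what makes 13-bit resolution at Nyquist rates around 1 MHz practical.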
Pipeline-type ADCs are widely used for VDSL AFEs [28,29]. In Infineon’s VDSL AFEs however, subranging converters exploiting oversampling are employed. In the most recent design in
CMOS, a 10 bit ADC clocked at 160 MHz achieves an effective resolution of more than 10.5 bits. The oversampling approach is more feasible than extending the ADC resolution at the Nyquist rate to more than 10 bits. To illustrate the efficiency of our approach, the layout plot of an 11 bit Nyquist-rate ADC (clocked at 26 MHz) in CMOS is depicted in figure 12 right next to the 10 bit 160 MHz ADC. Together with the decimation filter, this approach needs less than half the area of the comparable pipeline ADC in [28]. More details about this AFE concept can be found in [32]. This example indicates that deep sub-micron process properties strongly favor oversampling architectures. By making use of these properties on an architectural level, the well-established pipeline ADCs can be outperformed in terms of silicon area and power dissipation. Converter architectures will be adapted to benefit further from the speed and matching advantages provided by deep sub-micron processes, which makes us believe that other ADC types will become available for VDSL in the near future. High-resolution switched-capacitor Sigma Delta ADCs have been published with an analog bandwidth of 2 MHz [19]. Operating with a low OSR of 16 at 64 MHz, the internal integrators need to provide gain-bandwidth products of more than 400 MHz. Scaling this number to the VDSL bandwidth of 12 MHz, one would require op-amps with gain-bandwidth products of more than 2.4 GHz. This is certainly not achievable with modern CMOS technologies. An interesting alternative can be found in continuous-time (CT) Sigma Delta ADCs. It is known from the literature [33] that continuous-time modulators can be clocked up to 10 times faster than their discrete-time equivalents. This is due to the relaxed requirements on settling time and the possibility to use open-loop active structures, such as gm-C filters. Since there is no sampling operation at the ADC input, the driving stage is loaded continuously. Therefore no sampling switch is needed, which favors low-voltage operation. Furthermore, it is quite simple to incorporate a programmable gain amplifier (PGA) function inside the modulator by simply making the internal DAC current programmable, as indicated in Figure 13. However, there are several challenges in the implementation of continuous-time Sigma Delta ADCs.
Design Complexity: A typical CT design starts from a discrete-time equivalent. Parameters such as the noise transfer function, the resolution of the internal quantizer and the sampling frequency are chosen in the discrete-time domain. The design is then transformed into an equivalent CT circuit, assuming exact knowledge of the transfer function of each analog block, which in fact does not hold in reality. The consideration of all the analog circuit tolerances and imperfections requires lengthy simulations and iterations in modeling in order to finally achieve the desired performance. If, for example, a significant delay within the Sigma Delta loop is not modeled correctly, the noise transfer function can change significantly and performance can degrade dramatically. Compared to an SC realization, where the design tools and methodologies are well known, we lack experience and guidelines for modeling the important parameters in CT designs. This is probably the main reason why no CT Sigma Delta ADC with more than 12 bit performance has been reported.

Jitter sensitivity: The internal DAC of a high-bandwidth CT Sigma Delta ADC is usually built in a current-steering topology, realizing currents with a step-function output. As opposed to an SC DAC, where the energy delivered into the integrators depends very little on the switch-on and switch-off times of the DAC elements, this topology is much more sensitive to timing jitter. An SC DAC causes a peak current with an exponential decay, whereas a current-steering DAC produces a rectangular (quasi-constant) current, making the charge transfer directly proportional to the switch timing. Typical impulse-response currents are depicted in figure 14, where the dotted lines indicate the effect of jitter. A proposed solution [33] is to use non-square-shaped current pulses. This approach, however, reduces the dynamic range of the modulator.
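The jitter sensitivity of an NRZ current-steering feedback DAC can be estimated with a commonly used first-order model (an approximation assuming white timing jitter with rms value sigma_t, sampling period T_s, and Delta_y the step between successive DAC output samples; these symbols are introduced here only for illustration):

\[
\mathrm{SNR}_{\mathrm{jitter}} \;\approx\; 10\log_{10}\!\left( \mathrm{OSR}\cdot\frac{\overline{y^{2}}}{\overline{\Delta y^{2}}}\cdot\frac{T_{s}^{2}}{\sigma_{t}^{2}} \right).
\]

Because a multi-bit quantizer makes the step between successive samples small, multi-bit feedback also relaxes the jitter requirement compared to a single-bit NRZ DAC, whereas the exponentially decaying pulse of an SC DAC makes the transferred charge almost independent of the exact switching instants.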
Linearity: The linearity of a Sigma Delta ADC is defined by the linearity of its input stage. It can be split into two critical components: the integrator and the feedback DAC. Integrators built in gm-C technology are not efficient for linearity requirements beyond 12 bits. Therefore the first-stage integrator is often realized as an op-amp RC filter or even in SC technique. Both approaches increase the requirements on the op-amp, thereby reducing the maximum possible sampling rate. Still, a CT Sigma Delta ADC with an op-amp RC input stage can be clocked twice as fast as its SC counterpart. The internal DAC is also an extremely critical component in any Sigma Delta ADC, especially when it comes to linearity issues. In a CT modulator its implementation is much more difficult than in an SC modulator. Unequal rise and fall times of the DAC pulses cause harmonic distortion, even if a single-bit architecture is chosen for the implementation. One possible solution is to use return-to-zero coding, at the cost of a decreased dynamic range. The issues mentioned above become more and more critical as the sampling rate increases. Very few publications can be found dealing with low-pass CT Sigma Delta ADCs clocked at rates beyond 100 MHz. In [24] a 400 MHz design is presented. It uses gm-C filters and a single-bit quantizer. The performance for a 3.1 MHz analog bandwidth is approximately 10 bits, even though the ADC was designed for 12 bits. Another design, aiming at 6 MHz and 12 bits, is described in [25]. It uses op-amp RC filters and multi-bit quantization. Unfortunately, no measurement results were presented.
5. Conclusion
Multi-bit Sigma Delta ADCs are the most efficient AD converters for high-resolution wireline communication applications up to analog bandwidths of 1.1 MHz. Even for supply voltages below 1.8 V in deep sub-micron processes we do not see a fundamental constraint on their efficient use. For lower bandwidths and moderate resolutions, SAR ADCs might be more efficient than Sigma Delta ADCs. VDSL applications are starting to employ oversampling ADCs, but do not yet use Sigma Delta ADCs. Continuous-time Sigma Delta ADCs hold great potential for power reduction and bandwidth enhancement.
References
[1] C. C. Cutler, "Transmission systems employing quantization", U.S. Patent No. 2,927,962, 1960 (filed 1954).
[2] J. C. Candy and G. C. Temes, editors, "Oversampling Delta-Sigma Data Converters: Theory, Design, and Simulation", IEEE Press, New York, 1992.
[3] S. R. Norsworthy, R. Schreier and G. C. Temes, editors, "Delta-Sigma Data Converters: Theory, Design, and Simulation", IEEE Press, New York, 1997.
[4] J. C. Candy, "A Use of Double Integration in Sigma Delta Modulation", IEEE Trans. Communications, pp. 249-258, March 1985.
[5] R. Adams, "Design and Implementation of an Audio 18-Bit Analog-to-Digital Converter Using Oversampling Techniques", J. Audio Eng. Soc., vol. 34, pp. 153-166, March 1986.
[6] R. Koch and B. Heise, "A 120 kHz Sigma Delta A/D Converter", ISSCC Dig. Tech. Papers, 1986, pp. 138-139.
[7] J. Das and P. K. Chatterjee, "Optimized Delta Delta Modulation System", Electron. Lett., vol. 3, pp. 286-287, June 1967.
[8] Y. Matsuya, K. Uchimura, A. Iwata et al., "A 16-Bit Oversampling A/D Conversion Technology Using Triple Integration Noise Shaping", IEEE JSSC, vol. SC-21, pp. 1003-1010, Dec. 1986.
[9] T. H. Pearce and A. C. Baker, "Analogue to Digital Conversion Requirements for HF Radio Receivers", Proc. IEE Colloquium on System Aspects and ADCs for Radar, Sonar, and Communications, London, Nov. 1987.
[10] T. C. Leslie and B. Singh, "An Improved Sigma-Delta Modulator Architecture", Proc. ISCAS'90, pp. 372-375, May 1990.
[11] B. P. Brandt and B. A. Wooley, "A 50-MHz Multibit Sigma Delta Modulator for 12-b 2-MHz A/D Conversion", IEEE JSSC, vol. 26, Dec. 1991.
[12] T. Shui, R. Schreier and F. Hudson, "Mismatch Shaping for a Current-Mode Multibit Delta-Sigma DAC", IEEE JSSC, vol. 34, March 1999.
[13] L. R. Carley and J. Kenny, "A 16-bit 4th-Order Noise-Shaping D/A Converter", Proc. 1988 CICC, Rochester, NY, May 1988, pp. 21.7.1-21.7.4.
[14] R. T. Baird and T. S. Fiez, "Improved Delta Sigma DAC Linearity Using Data Weighted Averaging", Proc. ISCAS 1995, pp. 13-16, May 1995.
[15] T. Brooks, D. H. Robertson, D. F. Kelly, A. Del Muro and S. W. Harston, "A Cascaded Sigma-Delta Pipeline A/D Converter with 1.25 MHz Signal Bandwidth and 89 dB SNR", IEEE JSSC, vol. 32, Dec. 1997.
[16] I. Fujimori, L. Longo, A. Hairapetian et al., "A 90-dB SNR 2.5-MHz Output-Rate ADC Using Cascaded Multibit Delta-Sigma Modulation at 8x Oversampling Ratio", IEEE JSSC, vol. 35, Dec. 2000.
[17] Y. Geerts, M. S. J. Steyaert and W. Sansen, "A High-Performance Multibit Delta-Sigma CMOS ADC", IEEE JSSC, vol. 35, Dec. 2000.
[18] J. C. Morizio, M. Hoke, T. Kocak et al., "14-bit 2.2-MS/s Sigma-Delta ADCs", IEEE JSSC, vol. 35, July 2000.
[19] K. Vleugels, S. Rabii and B. A. Wooley, "A 2.5V Broadband Multi-bit Sigma Delta Modulator with 95 dB Dynamic Range", ISSCC Dig. Tech. Papers, 2001.
[20] A. Wiesbauer, H. Weinberger, M. Clara and J. Hauptmann, "A 13.5-Bit Cost Optimized Multi-Bit Delta-Sigma ADC for ADSL", Proc. ESSCIRC 1999, pp. 86-88, Sept. 1999.
[21] F. Medeiro, B. Perez-Verdu and A. Rodriguez-Vazquez, "A 13-bit, 2.2-MS/s, 55-mW Multibit Cascade Sigma Delta Modulator in CMOS Single-Poly Technology", IEEE JSSC, vol. 34, June 1999.
[22] A. R. Feldman, "A 13-bit, 1.4-MS/s Sigma-Delta Modulator for RF Baseband Channel Applications", IEEE JSSC, vol. 33, Oct. 1998.
[23] A. M. Marques, V. Peluso, M. S. J. Steyaert and W. Sansen, "A 15-b Resolution 2-MHz Nyquist Rate Delta Sigma ADC in 1 um CMOS Technology", IEEE JSSC, vol. 33, July 1998.
[24] L. Luh, J. Choma Jr. and J. Draper, "A 400 MHz Continuous-Time Switched-Current Sigma Delta Modulator", Proc. ESSCIRC 2000.
[25] D. Cousinard, R. Kanan, M. Kayal, P. Deval and V. Valence, "Design Methodology and Practical Aspects of a 12-bit High-Speed Continuous-Time Sigma-Delta ADC", Workshop on Embedded Data Converters, ESSCIRC 2000.
[26] P. C. Maulik, "A 16-Bit 250-kHz Delta-Sigma Modulator and Decimation Filter", IEEE JSSC, vol. 35, April 2000.
[27] A. M. Abo and P. R. Gray, "A 1.5-V, 10-bit, 14.3-MS/s CMOS Pipeline Analog-to-Digital Converter", IEEE JSSC, vol. 34, May 1999.
[28] I. Mehr, P. Maulik and D. Paterson, "A 12-bit Integrated Analog Front-End for Broadband Wireline Networks", Proc. CICC 2001.
[29] N. P. Sands et al., "An Integrated Analog Front-End for VDSL", Proc. ISSCC 1999, session 14.5.
[30] B. P. Brandt and J. Lutzky, "A 75-mW, 10-b, 20-MSPS CMOS Subranging ADC with 9.5 Effective Bits at Nyquist", IEEE JSSC, pp. 1788-1795, Dec. 1999.
[31] F. Kuttner, "A 1.2V 10b 20MSample/s Non-Binary Successive-Approximation ADC in CMOS", ISSCC 2002.
[32] H. Weinberger, A. Wiesbauer, M. Clara, C. Fleischhacker, T. Pötscher and B. Seger, "A 1.8V 450mW VDSL 4-Band Analog Front-End IC in CMOS", ISSCC 2002.
[33] J. A. Cherry, "Continuous-Time Delta-Sigma Modulators for High-Speed A/D Conversion", Kluwer Academic Publishers, USA, 2000.
[34] P. P. Siniscalchi, J. K. Pitz and R. K. Hester, "A CMOS ADSL Codec for Central Office Applications", IEEE JSSC, vol. 36, March 2001.
[35] H. Weinberger, A. Wiesbauer and J. Hauptmann, "An 800mW, Full-Rate ADSL-RT Analog Front-End IC with Integrated Line Driver", Proc. CICC 2001.
[36] Texas Instruments, Product Preview, "TLFD600, 3.3V Integrated Analog Front End", May 2000.
[37] Data Path, Advanced Notification, "DPS8100 ATU-R, ADSL Analog Front End with Integrated Line Driver Combo IC", August 1999.
[38] P. Laaser, T. Eichler, H. Wenske et al., "A 285mW CMOS Single-Chip Analog Front End for G.SHDSL", Proc. ISSCC 2001.
Part III: Short Range RF Circuits
Michiel Steyaert
For several years now, research into the possibilities of CMOS technologies for RF applications has been growing enormously. The trend towards deep sub-micron technologies allows CMOS circuits to operate above 1 GHz, which opens the way to integrated CMOS RF circuits. The market for short-range applications has become very important; Bluetooth and WLAN systems, for example, are finding their way into the market. Research into the further implementation and integration of those circuits is therefore discussed in this part. The first paper discusses implementation techniques for both fully integrated DECT and Bluetooth applications. Here too we see a trend towards single-chip CMOS implementations for the Bluetooth circuits. The second paper deals with the requirements and impact of WLAN applications on the circuits. Different receiver architectures and their impact on the performance are discussed. The difficult requirements of power amplifiers and their linearity specifications are analyzed. The third paper handles integration issues for high-speed RF building blocks in CMOS technology. VCO circuits and power amplifiers are critical building blocks, and some integration improvements are presented. The fourth paper deals with the circuit implementation of a Bluetooth transceiver. The main focus is full integration in standard CMOS technology. It is one of the first fully integrated RF CMOS short-range transceiver circuits.
The fifth paper analyses the possible use of complex signal processing within sigma-delta structures. In this way, integrated quadrature modulator receivers can be obtained, which brings us one step closer to more signal processing in the digital domain. The last paper presents research into extremely low-power transceiver circuits for biomedical applications. In those systems, by optimizing the system, the coding, the architecture and the individual circuits, extremely low power in combination with low voltage can be achieved.
RF circuits in DECT and BLUETOOTH
P.T.M. van Zeijl
Ericsson Eurolab Netherlands BV
Nieuw Amsterdamsestraat 40, 7801 CA Emmen, The Netherlands
Abstract
RF circuits for DECT and Bluetooth ASICs will be presented. Consequences of ASIC packaging, ESD protection, mass production and component modelling will be discussed. The paper concludes with a view on the future.
1. An RF-frontend for DECT applications
1.1. DECT frontend radio architecture
Figure 1 shows a block diagram of a DECT front-end ASIC. A heterodyne receiver and transmitter architecture was adopted. The ASIC contains an image-reject front-end for converting the antenna signal from 1.88-1.9 GHz (RXA-RXB) down to the IF of 110 MHz (IFO). The active part of the 1.77-1.79 GHz oscillator (RFLO) is on-silicon, while the resonator part (inductor and varactor) is placed on the PCB. A low-frequency oscillator (IFLO at 100 MHz) also has its active part on-silicon, while its resonator is placed on the PCB. The IFLO signal is used for the second down-conversion in the receiver. The signals of the RFLO and IFLO oscillators are mixed in an
up-conversion image-rejection mixer to generate the +4 dBm output power for the transmitter (TXA-TXB). One of the major challenges in the design is the large range of power supply voltages, 3.0 to 5.5 V, needed to connect the ASIC directly to the battery without extra power supply regulators. The ASIC was designed in 1994-1995, with state-of-the-art complexity [1], [2]. It contains an LNA, a PA, 4 mixers, 2 low-pass filters, 2 phase-shifters, 2 VCOs (without resonators), 3 bandgaps and 3 regulators.
1.2. An LNA for DECT
The schematic of the LNA is given in Figure 2. It consists of a differential pair (Q1-Q2), followed by CB stages (Q3-Q4) and CC stages (Q5-Q6), which together determine the voltage gain and the input impedance.
The (real part of the) balanced input impedance is approximately 150 ohm. Thus a low Noise Figure can be reached at a relatively low bias current. A balun transforms the balanced 150 ohm to an unbalanced 50 ohm on the PCB. Such a balun can either be implemented with 2 inductors (L0 and L1) and 2 capacitors (C4 and C5) as shown in Figure 2, or by means of (micro-)striplines. The LNA has a gain of 20 dB, a Noise Figure of 3 dB while consuming 3.5 mA at 1.9 GHz.
1.3. Pushing and pulling for a DECT VCO
Figure 3 shows the schematic of the 1.78 GHz oscillator. The negative resistance is formed by Q0-Q1 and C5-C6. The resonator consists of L0, L1, C2 and C3. The stage around Q6-Q7 buffers the oscillator signal. These VCOs are very sensitive to power supply variations (pushing) and to load pulling.
DECT type approval requires a very low "drift-in-slot", i.e. the frequency of the transmit signal is not allowed to shift by more than 15 kHz within one DECT slot (417 us) [3]. This corresponds to only 8 ppm of the DECT transmit frequency (1.89 GHz)! There are two causes for frequency shifts in the VCO:
1) Pushing: switching ON the power amplifier gives a change in power supply voltage (0.2 V for a worst-case 0.5 ohm battery series resistance and 400 mA drawn by the power amplifier).
2) Pulling: switching ON the power amplifier gives a change in the load impedance of the VCO.
The bandwidth of the PLL is too low (50 kHz) to correct disturbances of the VCO frequency within one slot. If a disturbance has taken place, the PLL will try to correct it and thus cause drift. In order to handle this requirement we must predict the change in VCO frequency and minimise the disturbance. The frequency of an oscillator can be estimated from a SPICE transient analysis. However, such an analysis is quite time-consuming for reaching the required resolution. A simpler method consists of simulating the oscillation conditions: the magnitude of the loop gain must be larger than 1 and the phase of the loop gain must be a multiple of 360° at the frequency of oscillation. In other words, the phase of the loop gain will be 0°
(or 360°, or 720°, etc) at the oscillation
frequency, which only requires a (fast) AC analysis. Figure 4 shows the simulated phase of the oscillator for Vcc=3.0 V and Vcc=5.5 V. The oscillator frequency changes by 30 MHz due to the change in power supply voltage.
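The oscillation frequency can then be read from the AC-analysis data as the frequency at which the loop-gain phase crosses 0° (modulo 360°); a minimal post-processing sketch, with placeholder data rather than the actual simulation results of Figure 4:

import numpy as np

# Hypothetical loop-gain phase data exported from a (fast) AC analysis:
# freq in Hz, phase in degrees (already unwrapped).
freq = np.array([1.70e9, 1.74e9, 1.78e9, 1.82e9, 1.86e9])
phase = np.array([12.0, 6.0, 0.5, -5.0, -11.0])

def oscillation_frequency(freq, phase_deg, target=0.0):
    """Interpolate the frequency at which the loop-gain phase crosses the
    target value (0 deg modulo 360 deg)."""
    # np.interp needs an increasing x-axis, so flip the (decreasing) phase.
    return np.interp(target, phase_deg[::-1], freq[::-1])

print(f"estimated oscillation frequency: {oscillation_frequency(freq, phase)/1e9:.3f} GHz")

The same sweep, repeated for Vcc = 3.0 V and 5.5 V (or for the loaded and unloaded VCO), directly yields the pushing and pulling figures.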
Analysis showed that the collector-substrate capacitors of Q0 and Q1, in combination with the parasitic capacitances of C5 and C6, result in a frequency change of 12 MHz/V. Even when choosing minimum-geometry transistors (and degrading the phase-noise performance), the frequency change is still larger than allowed. A (low-voltage-drop) power supply regulator with >54 dB power-supply rejection (over a 1 MHz bandwidth) has been designed, lowering the frequency change to 5 kHz per 0.2 V in simulations. The load pulling can be simulated in a similar way. The VCO was loaded with a differential pair, which drove a double-balanced mixer. The low Q of the resonator (approx. 15; a PCB stripline is used as inductor) in combination with the (voltage-dependent) parasitic capacitances gives a frequency change of 57 MHz when switching the load from ON to OFF and vice versa. An isolation of >78 dB is required to lower the frequency shift to <7.5 kHz. Two extra buffer
stages, built around Common-Base stages, each with its own biasing, are added to realise this isolation. The layout for reaching the 78 dB isolation is very critical and requires a lot of attention. Figure 5 shows a frequency-versus-time measurement on the 1.78 GHz oscillator while the TX part is switched ON and OFF at a rate of 500 Hz; this is done in order to reduce the influence of temperature effects on the measurement result. The measured frequency difference is 4.98 kHz. Similar measurement results are obtained for the VCO pulling. These measurements demonstrate that a large isolation is feasible on silicon. Moreover, they demonstrate that both pulling and pushing can be predicted in the design stage.
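The rejection and isolation requirements quoted above follow from simple ratios of the open-loop frequency shift to the allowed in-slot drift, assuming the shift scales linearly with the residual supply or load disturbance:

\[
20\log_{10}\!\frac{12\,\mathrm{MHz/V}\cdot 0.2\,\mathrm{V}}{5\,\mathrm{kHz}} \approx 54\,\mathrm{dB},
\qquad
20\log_{10}\!\frac{57\,\mathrm{MHz}}{7.5\,\mathrm{kHz}} \approx 78\,\mathrm{dB}.
\]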
1.4. The output stage for DECT
The schematic of the DECT output stage is given in Figure 6. It consists of a differential pair Q1 and Q2 driven by another differential pair (Q3 and Q4). These transistors are fully switched in order to reach maximum efficiency. Resistances R1-R4, in combination with the differential emitter resistance of Q1-Q2, determine the output impedance [4]. Inductors L1 and L2 are PCB chokes or quarter-lambda (micro-)striplines. Again, a PCB balun transforms the on-chip balanced impedance to an unbalanced 50 ohm. The higher the impedance seen by Q1-Q2, the larger the voltage swing and the lower the required bias current. However, for very large voltage swings, voltage breakdown of the devices becomes a bottleneck. A compromise was found with the (real part of the) output impedance close to 150 ohm.
1.5. RX and TX image rejection mixers for DECT
The low-pass/high-pass filter used in the LO path is shown in Figure 7 (left-hand side). This phase-shifter is followed by limiting stages in order to minimise amplitude unbalance between the I- and Q-paths. All-pass 45° and 135° phase-shifters (see Figure 7, right-hand side) are used in the 110 MHz IF path of the receiver. In the transmitter, all-pass 45° and 135° phase-shifters are used in the 100 MHz oscillator path for the TX image rejection. Figure 8 shows the measured image rejection of the receiver (IF fixed at 110 MHz) and of the transmitter (the frequency of the 100 MHz oscillator is fixed) versus LO frequency.
For an RX image rejection of 30 dB, the bandwidth is 700 MHz. For a TX image rejection of 30 dB, the bandwidth is 1150 MHz. The degradation in image rejection at high frequencies is due to bandwidth limitations in the limiter/buffer stages in the LO-path.
1.6. The overall performance of the DECT ASIC.
This ASIC was implemented in the Philips QUBiC1 process. First silicon demonstrated a high yield (>80%) and no failures in ESD or latch-up tests. A redesign was not required. Figure 9 shows a photograph of the ASIC.
2. An ASIC for BLUETOOTH applications
2.1. The Bluetooth radio architecture
Figure 10 shows the block-diagram of the radio part of a single-chip Bluetooth ASIC [5]. A heterodyne receiver with an active poly-phase filter [6] at an IF of 2 MHz is used. The output signal of the bandpass
filter is fed to a limiter and then demodulated. The reference voltage for the bandpass filter (and all other filters on silicon) is generated by an autotuner circuit. The autotuner is realised as a PLL coupled to the crystal-oscillator frequency, whereby the VCO in the autotuner is a replica of the gmC stages used in all filters.
The local-oscillator signals for the receiver and transmitter are derived from an oscillator running at 4.8-5.0 GHz. This enables the use of smaller on-chip inductors (and thus smaller silicon area) with a higher Q, while the Q of the varactors does not significantly influence the oscillator performance. A divide-by-2 circuit generates the required quadrature signals for the RX and TX mixers. The PLL incorporates a delta-sigma modulator in the divider block to enable locking to all Bluetooth channels for any crystal-oscillator frequency between 10 and 40 MHz. The balanced crystal oscillator can be trimmed on-
frequency by means of 7 bits of switched-in load capacitance in parallel with the crystal. External reference signals can also be injected if available in the application (e.g. 13 MHz in GSM). A duty-cycle correction circuit ensures that all internal circuits receive a clock signal with a duty cycle close to 50%.
The transmitter uses IQ-ROM modulation. The low-frequency I- and Q-signals are generated by a ROM and fed to two DACs. After low-pass filtering, mixing and amplification, an on-chip PA delivers +4 dBm to the ASIC pins at a balanced output. An off-chip PA may be used to amplify this signal to +20 dBm. The pa-dac block controls the power level of such a +20 dBm PA. The digital part of the ASIC contains a microprocessor, ROM, RAM, an I2C interface, a UART interface and a USB interface. The RFCMOS8D process from STMicroelectronics with dual oxide thicknesses (3.2 nm giving 0.18um devices and 7 nm giving 0.34um devices) was used. Two extra masks realise a high-Q metal-metal capacitor and a buried-N layer, making it possible to isolate the MOS devices from the substrate. The ASIC contains an LNA, a PA, 3 VCOs, 5 bandgaps, 8 regulators, 2 PLLs, 2 ADCs, 7 DACs, 3 divider chains, a digital interface and a delta-sigma modulator. Multiple bytes are used for trimming various functions.
2.2. The Bluetooth LNA
Figure 11 shows the schematic of the LNA: a differential NMOS pair (M0-M1, thin-oxide devices) with resistive loads. The drain-gate feedback resistors, together with the gm of the differential pair and the load resistances, determine the input impedance and the voltage gain of the LNA [4].
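For the half-circuit of such a resistive shunt-feedback stage, textbook first-order expressions (assuming an ideal transistor, i.e. neglecting output resistance and capacitances; R_F denotes the drain-gate feedback resistor and R_L the load resistor, symbols introduced here only for illustration) are

\[
A_v \;=\; -\frac{(g_m R_F - 1)\,R_L}{R_F + R_L},
\qquad
Z_{in} \;=\; \frac{R_F + R_L}{1 + g_m R_L}.
\]

These are generic approximations for this topology rather than the exact design equations of the ASIC; they show how the input impedance can be set to the desired 150 ohm (balanced) by the choice of gm, R_F and R_L without using inductors.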
The LNA does not use inductors, in order to save silicon area. The balun transforms the 50 ohm on the PCB to 150 ohm on the ASIC, resulting in a Noise Figure of 3.5 dB at 2.5 GHz. The third-order intermodulation intercept point is -5 dBm, while the 1 dB compression level is -15 dBm (both referred to the input). A voltage regulator using thick-oxide devices M2-M6 is used to prevent the thin-oxide devices M0-M1 from reaching their breakdown voltage of 1.9 V.
2.3. The Bluetooth VCO
Figure 12 shows the schematic of the Bluetooth VCO. It consists of two NMOS devices M0-M1 realising the negative resistance. The inductors are made in extra thick metal 5 and metal 6, and reach a Q of 10 at 5 GHz. Diode varactors with a high-Q (> 50 at 5 GHz) are used for tuning. Biasing is done by a current source I0 on top of the oscillator. As the power supply sensitivity is still very high (in the order of some MHz/V), a regulator is implemented.
The measured phase noise is -122 dBc/Hz at 3 MHz offset frequency. Trimming is implemented to counteract the process spread of approximately 20%.
2.4. Si-crosstalk
A problem with the design of a single-chip ASIC is crosstalk. Crosstalk may be caused by crosstalk on the PCB, crosstalk via the bonding wires and package, crosstalk via ground and supply lines and crosstalk due to the Si-substrate. The main issue of this crosstalk
problem is that we don’t have ways to easily predict the effects (magnitude) of the crosstalk. PCBs, packages, ground and supply-line series resistances have to be modelled, thus increasing the complexity of the design and thus simulation time.
Consequences of interfering signals can be either linear or non-linear. Non-linear crosstalk may create modulation of signals (FM/AM, and thus extra spurious components), or shift bias points, giving pushing and/or pulling effects on oscillators such as a VCO or a crystal oscillator. Si-crosstalk (or, in general, any crosstalk) can be minimised by separation of the desired signal and the interfering signal in the frequency domain (this may prove difficult due to high data rates), by separation in the time domain (no digital activity during reception/transmission of signals), by lowering the interference source (introducing jitter on clocks), by isolation (extra layout measures such as shielding or triple-well), or by compensation and balancing. But how much does this improve our design? The answer to this question should be found at the beginning of the design trajectory, otherwise the complete design may fail. During the design phase of our Bluetooth ASIC project, a substrate model was generated from the floorplan, see Figure 13, using the
SubstrateStorm program [7]. The model of the substrate is in the form of a netlist and can then be used for simulations.
The digital blocks on the ASIC were modelled in terms of Vcc, Icc, frequency and their behaviour: do the signals in these blocks behave like clocked signals or more like pseudo-random-bit-sequence signals? Then a simple, effective large inverter with the same performance (i.e. the same Icc) is used as a replacement for the large digital block. The combination of the effective digital inverter, the substrate netlist and the sensitive analog circuitry, such as the LNA and VCO, can then be simulated. Of course, all components in the design should incorporate accurate modelling of their parasitics to the substrate. Effects of the digital interference on the analog circuitry could be minimised or gave rise to
specifications on the analog blocks. See Reference [8] for more details on the procedure.
Several measures were taken to minimise the crosstalk problem: in the layout, a specially designed P-type wall isolates the radio from the baseband. All sensitive circuits in the analog part of the ASIC are balanced and have high Common-Mode rejection and low Common-Mode to Differential-Mode conversion. Separate supplies and power supply regulators are used to increase the power-supply rejection of sensitive circuits such as the LNA and the VCO. The measured sensitivity of the receiver, while Bluetooth transmission and reception are taking place (so with the digital part running), is -82 dBm.
2.5. Summary for the BLUETOOTH ASIC
This ASIC was implemented in the STMicroelectronics RFCMOS8 process. Figure 14 shows a photograph of the ASIC. The radio part of the ASIC is in the right-hand upper corner, including pads.
A derivative of this ASIC
including baluns and antenna-switch, without the digital baseband part of the ASIC (less components on the PCB) will also become available shortly. The photograph was taken on a bonded version of the ASIC in a CQFP80 package. We used this package for debugging and functionality test. For production, the ASIC will be flip-chip mounted on a Ball-Grid-Array (BGA).
3. RF general issues
When designing an RF ASIC, or an ASIC containing RF parts, some basic requirements have to be fulfilled with respect to packaging, ESD, tuning points and mass production, and these are not trivial.
3.1. ASIC packaging
The RF signals have to go on and/or off the ASIC. A complete model of the package (series resistance, capacitance from pin-to-pin,
package-inductance, bondwire-inductance, mutual inductances and parasitic capacitance to (PCB) ground) is required. Usually, such a package model can be derived from the physical dimensions of the package and the dimensions of the die. The minimum length of bonding wires is in the order of 1 mm, corresponding to an inductance of about 1 nH. Note that this is an equivalent impedance of about 15 ohm at 2.5 GHz! When the active parts of oscillators are designed on-silicon but the passive (resonator) parts are on the PCB (as in our DECT front-end VCO), the model of the package and the consequences of ESD protection are very important. The parasitics in the package can create extra resonance effects, thus giving the possibility of multiple oscillations.
During the design of the DECT frontend ASIC, we
encountered this problem: due to the ESD diodes, the bondpad capacitance and the package inductance, it was possible that the oscillator (designed for 1.78 GHz) started to oscillate at 3 GHz. By a smart choice of resonator tapping, this problem was completely eliminated [9]. An interesting option for near-future designs is the use of bumps instead of bond wires to connect the ASIC to the PCB. Bumps are smaller than 100 um in diameter, and thus their self-inductance is smaller than 100 pH. We use flip-chip bumping on our Bluetooth ASIC.
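The interconnect impedances involved follow directly from |Z| = 2*pi*f*L:

\[
|Z_{\mathrm{bond\;wire}}| \approx 2\pi\cdot 2.5\,\mathrm{GHz}\cdot 1\,\mathrm{nH} \approx 15.7\,\Omega,
\qquad
|Z_{\mathrm{bump}}| \lesssim 2\pi\cdot 2.5\,\mathrm{GHz}\cdot 100\,\mathrm{pH} \approx 1.6\,\Omega,
\]

which is why flip-chip bumping relaxes the package modelling considerably.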
3.2. ESD protection
ESD protection is an important issue in ASIC design for mass production. An ESD-protection of 2 kV (human-body model) is an
accepted value. The consequences for low-frequency design are usually minimal, and designers often forget about ESD protection; ESD protection is added to the design without checking the consequences, if any. However, for RF (and actually also for critical low-frequency issues) ESD protection can be disastrous to a design. Very little literature is available on this subject, but luckily some investigations are ongoing [10].
3.3. Tuning and calibration vs SW programmability
Manual tuning points are not allowed nowadays. They are usually replaced by on-chip calibration points: ADCs and DACs are programmed with the calibration values. Consequently, tuning and/or setting of various options with respect to performance and functionality can and will be done in SW: we end up with a SW-tunable or SW-programmable radio. Our Bluetooth ASIC shown above has some 25 bytes of programming to set various functions in the radio, ranging from programming the synthesiser to setting various test modes.
3.4. Mass production
Mass production requires 3-sigma design or better. University designs tested on just 5 samples may therefore utterly fail in mass production. Extensive checking of a design over process corners, power supply voltages and temperature is essential before an ASIC can go into mass production. Knowledge of device mismatch and designing for sufficiently low mismatch therefore also become more and more
important [11]. Mismatch information on low-frequency effects, i.e. mismatch in resistance for resistors, mismatch in capacitance for capacitors and mismatch in threshold voltage and gm for MOS devices must be available and implemented in the Process Design Kit. For more complicated designs, this may not be enough. Assume we design our on-chip bandpass filters and oscillators with MOS devices using the inverter-like structure as proposed in [12], also see Figure 16. In this case the gm of the gyrator structure is defined by the gm of the MOS-devices.
The capacitance in this filter structure is determined by the MOS capacitance. So the mismatch model of MOS devices should also contain the mismatch in MOS capacitance, i.e. a high-frequency mismatch effect. Figure 15 shows an example of a simulation result for the autotuner reference oscillator used in the Bluetooth ASIC. Such simulations give direct feedback to the designer to prove that the design is according to specification, also in mass production. Moreover, they can be used for optimising the design in terms of silicon area.
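As a rough illustration of such a mismatch/spread simulation (the gm, capacitance and sigma values below are placeholders, not data of the actual autotuner design), a Monte-Carlo estimate of the centre-frequency spread of a gmC stage with f0 = gm/(2*pi*C) could look like:

import numpy as np

rng = np.random.default_rng(0)

# Nominal gmC-stage values (illustrative placeholders, not design data)
gm_nom = 10e-6       # S
C_nom = 0.8e-12      # F
sigma_gm = 0.02      # 2% relative sigma on gm (assumed)
sigma_C = 0.01       # 1% relative sigma on the MOS capacitance (assumed)

N = 10000
gm = gm_nom * (1 + sigma_gm * rng.standard_normal(N))
C = C_nom * (1 + sigma_C * rng.standard_normal(N))

f0 = gm / (2 * np.pi * C)                  # centre frequency of each sample
rel_sigma = np.std(f0) / np.mean(f0)

print(f"nominal f0 = {gm_nom/(2*np.pi*C_nom)/1e6:.2f} MHz")
print(f"3-sigma spread = {3*rel_sigma*100:.1f} %")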
High-frequency mismatch information is also needed for RF design. Consider the Bluetooth LNA in Figure 11 in combination with the desire to realise a low Common-Mode to Differential-Mode conversion for minimising Si-crosstalk. As we are operating the LNA close to its -3 dB bandwidth, the parasitic capacitances of the load and feedback resistors play a role in the Common-Mode to Differential-Mode conversion. Thus the mismatch in the parasitic capacitance of the resistors as well as the mismatch in the MOS capacitance must be known and implemented in the Process Design Kit. During the development of our Bluetooth ASIC we had this functionality available.
3.4. RF design, layout and extraction tools
Simulation tools are becoming more and more important. For instance, the simulation of complete RF front-ends, including the mixers and IF parts, for gain and noise was impossible in 1995; hand calculations were needed to determine the Noise Figure and gain. For image-reject front-ends with complex (I and Q) bandpass filtering, these hand calculations are very time-consuming and prone to errors. The use of Spectre-RF nowadays facilitates all these simulations and calculations significantly, and is thus a
major improvement in the
predictability of designs. Simulators like ELDO and APLAC nowadays also have this functionality. Layout tools have improved: it is possible nowadays to have the routing of wires in analog designs done by computer tools like ICCraftsman. However, a lot of improvement is still needed before these tools are usable in more critical designs or before they are accepted by the designer community.
After layouts are finalised, extraction of the layout is needed to perform post-layout simulations and to check whether the design is still centered and/or whether the performance loss is acceptable. We found various problems related to the accuracy and resolution of these tools (such as parasitic effects not being taken into account, or even real errors in the extraction), which make it difficult to optimise designs in terms of silicon area or power consumption. These difficulties are even
more important for CMOS designs, as these usually have higher impedance levels than bipolar designs.
3.5. Component modelling
As discussed before, resistor and capacitor models should include the parasitics to the substrate. For MOS transistors the situation is more complicated. Usually the parasitic diodes from the Drain and Source regions to the substrate (including their parasitic capacitance) are modelled. However, the series resistance of these diodes and the series resistance of the MOS backgate connection are not modelled. For the predictability of RF MOS circuit designs, the modelling of these series resistances is a must.
Another important point in the modelling of MOS devices is their Non-Quasi-Static behaviour. If devices are used close to their fT, this behaviour becomes important. Figure 16 shows an example of a basic building block for gmC filters [12]. The gm value of this gmC cell is determined by the gm of the NMOS and PMOS devices of the inverters ("gm-stage" in Figure 16). Two common-mode feedback stages are present to fix the Common-Mode level at the outputs outp-outn. The filter capacitance is determined by the combination of all MOS capacitances in this cell, i.e. the width and length of the devices are chosen such that both the required gm and the filter capacitance are realised. As large device lengths are used, the fT of the devices reduces significantly, down to 5 to 10 times the filter
centre frequency. Consequently, non-quasi-static effects become very important in order to design such a filter with a flat passband and prevent it from oscillating.
Figure 17 shows the measured filter transfer of such a filter as implemented in our Bluetooth ASIC. The bandwidth of the filter is 1
MHz centered at 2 MHz. As you can see, the pass-band is very flat, indicating no parasitic effects. The out-of-band rejection is better than 70 dB, also indicating no spurious responses and/or oscillations of the filter.
4. View on the future
We are getting more and more experienced with silicon crosstalk. The consequences of dealing with silicon crosstalk in a system-on-a-chip are becoming quite clear: RF circuits should also have good Common-Mode rejection, good power supply rejection and low Common-Mode to Differential-Mode conversion. Such parameters were previously only specified and designed for in low-frequency circuits, but they are now becoming important for RF circuits as well. These extra issues are also important in single-chip radios: the serial interface, dividers, etc. generate multiple interference signals that can harm the performance of the radio. Due to the experience gained with silicon crosstalk and with the complexity of these single-chip designs, a system-on-silicon is definitely an achievable goal. A bottleneck may still be the difference between analog and digital design cycles. The design of a completely new piece of digital (VHDL) may take as long as the design of a completely new radio in a new process. However, porting an existing VHDL block to a new process merely requires the generation of a new layout and checking of post-layout simulations, while the transfer of an analog design to a new process is quite close to being a new design.
Technical challenges in moving to new CMOS processes will be associated with designing analog circuits at lower voltages, not because the customer requires this, but because the maximum power supply voltages go down. The maximum power supply voltage is defined here as the voltage at which the lifetime of the product is guaranteed for 10 years. The maximum gate-source voltage in 0.18 µm CMOS processes is on the order of 1.9 V, while in 0.13 µm CMOS processes this maximum voltage is reduced to 1.3 V. Fortunately, the MOS threshold voltages also go down, so analog design remains possible for future CMOS process generations [13]. Future single-chip radios or single-chip (Bluetooth) systems will incorporate fully integrated PLL loop filters, receive and transmit baluns, and receive-transmit switches. Consequently, the number of external components will be reduced to the antenna, battery, maybe a +20 dBm PA and some decoupling capacitances.
Acknowledgement
The author would like to acknowledge Niels van Erven, Frank Risseeuw, Marcel van Roosmalen and Bert Essink for their contribution to the DECT RF front-end ASIC, and Jan-Wim Eikenbroek, Peter-Paul Vervoort, Suma Setty, Jurjen Tangenberg, Gary Shipton, Eric Kooistra, Ids Keekstra and Didier Belot for their contribution to the Bluetooth ASIC.
References
[1] Philips data sheet UAA2067G: "Image reject 1800 MHz transceiver for DECT applications", January 1996.
[2] P.T.M. van Zeijl et al., "Rembrandt: an RF ASIC for DECT TDMA applications", ESSCIRC'97, Southampton, UK, September 1997.
[3] prTBR6: General Terminal Attachment Requirements for the Digital European Cordless Telecommunications standard (DECT), European Telecommunications Standards Institute, June 18, 1996.
[4] E.H. Nordholt, "The Design of High-Performance Negative-Feedback Amplifiers", Elsevier, 1983.
[5] P.T.M. van Zeijl et al., "A 0.18 µm CMOS Bluetooth ASIC", ISSCC 2002, San Francisco, February 2002.
[6] P. Andreani et al., "A CMOS gm-C Polyphase Filter with High Image Band Rejection", ESSCIRC 2000, Stockholm, September 2000.
[7] http://www.simplex.com.
[8] P.T.M. van Zeijl, "A Practical Approach in Modelling Silicon Crosstalk in Systems-On-Silicon", Workshop on Substrate Noise Coupling, IMEC, Leuven, Belgium, September 2001.
[9] M.W.R.M. van Roosmalen, "A Balanced Integrated Semiconductor Device Operating With A Parallel Resonator Circuit", European Patent EP 0 785 616.
[10] G. Gramegna, M. Paparo, P.G. Erratico and P. De Vita, "A Sub-1-dB NF +/- 2.3 kV ESD-Protected 900-MHz CMOS LNA", IEEE Journal of Solid-State Circuits, Vol. 36, No. 7, July 2001.
[11] M.J.M. Pelgrom, "Matching Properties of MOS Transistors", IEEE Journal of Solid-State Circuits, Vol. 24, No. 5, October 1989.
[12] B. Nauta, "A CMOS Transconductance-C Filter Technique for Very High Frequencies", IEEE Journal of Solid-State Circuits, Vol. 27, No. 2, February 1992.
[13] K. Bult, "Analog Design in Deep Submicron CMOS", Proceedings of the European Solid-State Circuits Conference, Stockholm, September 2000.
Wireless LANs Jack Glas, Mihai Banu, Joachim Hammerschmidt, Vladimir Prodanov, and Peter Kiss Agere Systems, Murray Hill, NJ
Abstract Wireless LANs based on the IEEE 802.11 standard have achieved wide customer acceptance in the enterprise environment. They are expected to continue to expand in popularity and become ubiquitous communication systems even in private and public places. This paper discusses the basics of the wireless LAN physical layer, focusing on radio transceiver specifications and design options.
1. Introduction
The introduction and proliferation of wireless data network access is a natural evolution in modern communication systems, stimulated by the promise of user mobility and freedom from wires, and greatly encouraged by recent advances in portable wireless terminal technology. Just as is already the case for cellular voice communications, wireless data access is on its way to becoming a universal modern capability. For example, small and inexpensive PCMCIA modules, which readily attach to laptop computers, are available to make multi-Mb/s wireless connections with access points strategically located within enterprise buildings, which further connect the users to wired LANs, intranets, etc. Likewise, in the home or in public places, wireless LANs will increasingly provide valuable data communication channels.
Typically, such networks are configured as in Fig. 1, with multiple users connected to each access point via a carrier sensing multiple access with collision avoidance (CSMA/CA) protocol. Naturally, the steady-state data throughput for each user is the total system data rate, after subtracting the network overhead, divided by the number of users connected to one access point. The uncontested success of the current generation of wireless LAN technology with speeds up to 11 Mb/s, based on the IEEE 802.11b standard [1-2], has been impressive, but it could be argued that it builds on previously established digital cellular and cordless technology. For example, the substantially higher data rates and channel bandwidth of wireless LANs compared to cellular systems are balanced by lower sensitivity and blocking requirements, yielding similar transceiver design strategies, integration level, etc. However, for speeds in excess of 50 Mb/s as specified by the 802.11a standard [3], new and difficult transceiver design challenges arise, especially in the context of low power dissipation and low cost. In this paper we discuss the current wireless LAN technology and the new challenges designers will face for higher speed systems.
2. Current Systems based on the 802.11b standard
2.1 Modulation and radio specifications
Originally, the 802.11 standard was written for 1 Mb/s and 2 Mb/s data rates in the 2.4 GHz - 2.5 GHz ISM band, possibly using direct-sequence code division multiplexing in combination with DBPSK and DQPSK modulation, respectively. An eleven-chip-long Barker sequence provides processing gain, which relaxes the required SNR to below 0 dB. The channel bandwidth of 14 MHz, placed anywhere in the band on a 5 MHz grid, allows network configurations with 3-4 access points in close physical proximity. The maximum allowed RF transmit power is 30 dBm, but typically 15 dBm is used in existing systems. The 802.11b standard option enhances the wireless LAN data rate to a maximum of 11 Mb/s by Complementary Code Keying (CCK) modulation [4]. While still using the same chip rate in order not to change the RF signal bandwidth, a much-reduced processing gain accommodates the higher data rate at the expense of approximately 10 dB higher SNR requirements. Practically, at 11 Mb/s CCK is equivalent in almost all respects to regular DQPSK.
2.2 Wireless transceiver solutions
The recent advances in RFIC and radio system technologies have provided ample opportunities for the realization of miniaturized and economically viable wireless LAN transceivers. Typically, these blocks are implemented as shown in Fig. 2 using a few ICs and several hundred passives (mostly by-pass capacitors), packaged tightly into small modules such as PCMCIA cards.
Usually the cost of such modules is well within the consumer electronics market demands.
Focusing on the physical layer, notice that a radio chip and a base-band chip are typically used with analog I/Q transmit and receive interfaces. The baseband chip is mostly a digital circuit, containing only data converters as analog blocks. This system partitioning minimizes the digital switching noise coupling into the radio sections and provides low power chip-to-chip analog interfaces. The radio chip may be designed in different technologies such as Si bipolar, SiGe BiCMOS, or recently, even in straight CMOS. Typically, a –75 dBm sensitivity is accomplished for about 200 mW receiver power dissipation. The radio architecture has evolved from a conservative superheterodyne approach to less expensive direct down/up conversion. The efficiency of the linear power amplifier is limited by the signal peak-to-average ratio, which is moderate, allowing reasonable transmitter power dissipation, typically 500 mW.
3. Emerging Systems based on the 802.11a standard
3.1 Frequency bands, RF power levels, modulation formats, and data rates
In order to enable data rates up to 54 Mb/s and to increase the number of channels for easier network planning, the 802.11a standard specifies three 5 GHz ISM band sections (in the US; similar in other countries), each containing four 20 MHz channels. The first band is from 5150 MHz to 5250 MHz and allows up to 16 dBm transmit RF power. The second one is from 5250 MHz to 5350 MHz with 23 dBm maximum power, and the third one, mainly intended for outdoor applications, is from 5725 MHz to 5825 MHz with 29 dBm maximum power. Eight data rates are provided (only three are mandatory), supported by various modulation techniques and coding schemes. Since the realization of the highest rate, 54 Mb/s using OFDM/64-QAM modulation, is the most challenging design aspect of 802.11a transceivers, most of the following considerations will focus on this topic.
3.2 OFDM
Orthogonal Frequency Division Multiplexing (OFDM) [5], used in the physical layers of both 802.11a and HiperLAN/2 [6], is a special case of classical frequency division multiplexing in which the sub-carriers are orthogonal to each other in the time domain, i.e., if any two are multiplied and integrated over a symbol period the result is zero. Therefore, an OFDM signal is a bank of narrow band-pass information-carrying sub-signals, placed very close to each other, as shown in Fig. 3. The time-domain orthogonality is reflected in the frequency domain as the property that each sub-signal spectrum is zero at the carrier frequencies of all the other sub-signals (the channel spacing is equal to the symbol rate).
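As a quick illustration of this orthogonality property (an added sketch, not part of the original text), the few lines below numerically integrate the product of two sub-carriers spaced by exactly one symbol rate over one symbol period; the cross term is (numerically) zero, while a sub-carrier multiplied with itself integrates to the symbol energy.

```python
import numpy as np

T = 1.0          # symbol period (normalised)
fs = 1024        # samples per symbol
t = np.arange(fs) / fs * T
df = 1.0 / T     # sub-carrier spacing equal to the symbol rate

def subcarrier(k):
    """Complex sub-carrier number k."""
    return np.exp(2j * np.pi * k * df * t)

def inner(a, b):
    """Inner product (normalised integral) over one symbol period."""
    return np.sum(a * np.conj(b)) / fs

print(abs(inner(subcarrier(3), subcarrier(7))))  # ~0: different sub-carriers are orthogonal
print(abs(inner(subcarrier(3), subcarrier(3))))  # ~1: same sub-carrier, full symbol energy
```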
As a result, the sub-signal center frequencies are very close to each other, leading to high spectral efficiency. An equivalent mathematical explanation of OFDM is based on the fact that a Discrete Fourier Transform uniquely relates two sets of N complex numbers: N time samples with N frequency samples. In fact, the usual way of demodulating an OFDM signal is by performing an FFT on N time samples, resulting in N magnitude and N phase quantities, which represent the transmitted information. Compared to a single-carrier system, the symbol period increases while the overall data rate remains unaffected. In addition, as the symbol period becomes N times longer, the guard time interval commonly introduced to avoid ISI (inter-symbol interference) adds only a relatively small overhead. Nevertheless, this brings a considerable hardware saving, as a time-domain equalizer now becomes unnecessary. A frequency-domain equalizer is still indispensable for the purpose of compensating the channel frequency response, which may change from one sub-carrier frequency to another, making correct data detection impossible otherwise. However, this frequency equalizer is simple, consisting of only a single multiplication of every sub-carrier with a complex number. In summary, the main advantage of OFDM over single-carrier modulation techniques is that a hardware-intensive time-domain equalizer is replaced by an FFT operation followed by a simple frequency-domain equalizer. This is especially advantageous in high-rate systems. The most important drawback of OFDM is the large peak-to-average ratio (PAR) of the signal, particularly detrimental to transmitter linearity requirements. In the 802.11a 54 Mb/s mode, there are 52 sub-carriers, 48 of which are modulated with 64-QAM. There are four pilots, i.e., sub-carriers without data modulation, which enable coherent detection. Based on this simple description of OFDM it is apparent that designing 54 Mb/s 802.11a wireless LAN transceivers raises new difficulties compared to previous generation systems. It will be shown that the minimum required SNR of the received OFDM/64-QAM signals is about 30 dB, substantially higher than in other digital wireless systems. In addition, the presence of narrow-band sub-carriers across the channel implies accurate processing of the whole channel spectrum. For example, simple circuit solutions for broadband receivers, such as AC coupling in direct conversion stages, are not appropriate. Finally, the transmitter linearity requirements impose serious efficiency limitations on conventional power amplifiers.
3.3 Basic Transceiver Specifications
Using the standard, one can derive the basic transceiver specifications. The following approximate calculations are not intended to give precise design values but rather to indicate the rough figures for 802.11a radio systems. Figs. 4 and 5 show the power spectral density (PSD), L(f), and the power levels P of the desired and undesired (noise) signal components observed on the receive side of an OFDM communication link, under the different limit conditions described below. The overall propagation loss is assumed to be such that the received power level of the desired signal is always –65 dBm. This is the receiver sensitivity level in the 54 Mb/s mode, required by the standard for operation below a certain packet error rate. Figs. 4 and 5 illustrate how the radio channel affects the transmitted signal in receiver-noise-dominated and transmitter-noise-dominated conditions, respectively. In each case examples are shown for a frequency-flat (FF) channel and a frequency-selective (FS) channel.
The FF Channel Cases: In Figs. 4a and 5a the time delay spread of the channel is assumed significantly lower than the temporal resolution of the OFDM signal, i.e., the delay is smaller than 50 ns, the typical sampling period in 802.11a. Under this assumption, the wireless channel affects each sub-carrier of the OFDM signal in the same way, which leads to a flat PSD of the desired signal component at the receiver. In fact, the power level requirement in the standard refers to this type of channel.
The FS Channel Cases: In Figs. 4b and 5b the average time delay spread of the channel is assumed to equal or exceed the 50 ns time resolution of 802.11a OFDM. This leads to destructive and constructive multi-path interference, creating sub-carrier PSD levels above or below the average PSD. The actual indoor environment for typical practical applications of 802.11a systems is an FS channel. Note that many channel frequency response "snapshots" are equally valid, where the signal fading occurs at different frequencies than in Figs. 4b and 5b but with the same normalized integral power.
Next, we discuss the signal-to-noise and signal-to-distortion ratios (SNR, SDR) at the receiver A/D output, which are the primary overall design requirements. First, we consider the limit case of Figure 4, where the background thermal noise and the receiver-generated noise, usually expressed as the input-referred noise figure (NF), are the only sources of disturbance in the communication link. Starting from the –174 dBm/Hz background thermal noise power and adding 72.3 dB corresponding to the 17 MHz noise bandwidth, we obtain an effective antenna noise power of –101.7 dBm, shown as the dashed line in Fig. 4. Subtracting this number from the required –65 dBm receiver sensitivity leads to an SNR at the input of the receiver of 36.7 dB. Simulations show that the static SNR at the receiver A/D output necessary to meet the 10% packet error rate required by the standard for an ideal additive-white-Gaussian-noise (AWGN) channel is approximately 21 dB. Hence, for this FF limit case, we have about a 15-16 dB maximum receiver NF allocation. This is shown in Fig. 4a, resulting in an effective noise level of about –86 dBm.
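A compact numerical version of this budget is sketched below (added as an illustration; the 17 MHz noise bandwidth, –65 dBm sensitivity and 21 dB target SNR are the values quoted above).

```python
import math

k_noise_floor = -174.0          # thermal noise density [dBm/Hz]
noise_bw_hz = 17e6              # OFDM signal/noise bandwidth [Hz]
sensitivity_dbm = -65.0         # 54 Mb/s sensitivity level
required_snr_db = 21.0          # SNR at the A/D output for 10% PER (AWGN)

antenna_noise_dbm = k_noise_floor + 10 * math.log10(noise_bw_hz)
input_snr_db = sensitivity_dbm - antenna_noise_dbm
nf_allocation_db = input_snr_db - required_snr_db
effective_noise_dbm = antenna_noise_dbm + nf_allocation_db

print(f"antenna noise power  : {antenna_noise_dbm:6.1f} dBm")   # about -101.7
print(f"SNR at receiver input: {input_snr_db:6.1f} dB")          # about   36.7
print(f"max receiver NF      : {nf_allocation_db:6.1f} dB")      # about   15.7
print(f"effective noise level: {effective_noise_dbm:6.1f} dBm")  # about  -86
```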
In contrast, if frequency selectivity is present (FS case), the previous calculation must be amended by a "channel correction factor" of about 5 dB, increasing the necessary SNR to 26 dB, as illustrated in Fig. 4b. This is due to the fact that the sub-carriers in deep fade (requiring additional SNR) dominate the overall performance.
In a different limit case, shown in Figure 5, we consider the FF and FS channels under the assumption that the only relevant source of disturbance is transmitter imperfection, commonly referred to as transmitter implementation noise. Typical transmitter-related noise sources include the effects of transmitter non-idealities such as oscillator phase noise, finite linearity of the transmit chain, finite digital word length, and limited power amplifier (PA) back-off (see next subsection). These phenomena cause an error between the desired signal and the actual transmitted signal, measured by the "error vector magnitude" (EVM). The standard specifies a maximum average RMS value for the EVM. The EVM-related noise process is proportional to the desired signal, and hence is specified by a relative dB number. For example, the standard specifies –25 dB EVM for the 54 Mb/s mode. In a first-order approximation, the in-band noise caused by non-linear transmitter effects is an AWGN process. Therefore, in the FF case, a calculation identical to that of the previous paragraph yields a required SNR value of 21 dB, necessary to meet the specified packet error rate. Since only transmitter imperfections are present, this number translates directly into the 21 dB receiver SNDR shown in Fig. 5a. In contrast to the situation of Figure 4, the same calculation is valid for the FS case shown in Fig. 5b, and the "channel correction factor" is zero! The reason for this property is that in the transmitter, all sub-carriers have equal power (before passing through the channel) and thus all are affected by transmitter noise equally. Hence, the sub-carrier signal-to-noise ratio remains unchanged during channel propagation.
A third limit case is when the only source of disturbance in the communication link comes from the receiver distortion, commonly known as receiver implementation noise. This type of disturbance is signal dependent and is produced by many non-idealities such as local oscillator noise, non-linearity in the receiver chain, I/Q imbalances, DC offsets, A/D converter quantization noise, residual adjacent channels or blockers due to insufficient filtering, etc. The resulting interference is a near-Gaussian and frequency-flat noise signal, essentially related to the desired signal by some relative number in dB.
For the FF case, the overall SNR requirement is 21 dB, as in the previous calculations. For the FS case, the "channel correction factor" is nonzero since the desired signal exhibits faded sub-carriers whereas the noise signal is flat and added after the channel propagation, hence affecting the weak sub-carriers. In actual systems, all noise/distortion processes caused by transmitter and receiver imperfections and the receiver thermal noise effectively add in the receiver. Depending on the actual spectral noise shape, the "channel correction factor" is between 0 and 5 dB. Assuming an overall transmitter/receiver implementation loss of about 3-4 dB and a –65 dBm sensitivity (corresponding to SNR and SDR values greater than 30 dB), the receiver NF must be 7 dB or lower. Notice that the only way the design methodology can make a difference in the transceiver performance is by minimizing the receiver NF and the various practical errors mentioned previously. For this reason it is instructive to identify these errors and to investigate the circuit blocks where they are produced in more detail.
3.4 Typical transceiver design issues
Before focusing on specific transceiver issues, we point out a clear distinction between the thermal noise and the implementation noise. The latter is expressed relative to the intended signal while the former is expressed as an absolute value. As a consequence, implementation noise is always important, independently of the distance between the transmitter and the receiver. On the other hand, bringing the receiver closer to the transmitter decreases the effect of thermal noise due to higher signal strength. In a PER (packet error rate) versus SNR performance plot, the implementation noise appears as an impenetrable PER floor. A measure of the "implementation loss" is the amount of PER curve shift after applying implementation noise, at the PER value of interest. Summarizing the previous discussion, the total noise at the output of the receiver A/D converter is the result of contributions from three types of error sources: transmitter noise with a maximum level fixed in the standard, receiver thermal noise, and receiver implementation noise. The latter category can be divided into noise sources that are always present (e.g., integrated phase noise, quantization noise, DC offset, etc.) and noise sources that are only present when a blocking signal is applied. During a gain/noise/linearity budget analysis it is important to make a distinction between these cases, as the thermal noise contribution is halved (the intended signal is 3 dB above the sensitivity level) when a blocking signal is present. As an illustration of typical transceiver design issues, we will next focus on several sources of implementation noise.
PA back-off: As mentioned in subsection 3.2, OFDM suffers from a high signal PAR. As a result, the necessary transmitter dynamic range is higher than in the 802.11b case. The gains in the various blocks are set such that the average power level stays below the 1 dB compression point by a certain dB amount called the back-off. As this value is usually smaller than the PAR for power efficiency reasons, signal clipping occurs and the corresponding interference produces implementation noise. The PA back-off value is extremely critical since the already low 10-15% efficiency of linear PAs [7] is easily decreased further. In order to determine the appropriate PA back-off, first, the EVM requirement should be met (–25 dB for 54 Mb/s) and second, the spectral re-growth due to clipping should be limited within the specified spectral mask. Fig. 6 is an illustration of how clipping causes spectral re-growth.
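To make the trade-off concrete, the short simulation below (an added sketch, not from the original text; the FFT size, number of symbols, sub-carrier modulation and back-off values are arbitrary) clips a random OFDM-like signal at different back-off levels above its RMS value and estimates the resulting EVM.

```python
import numpy as np

rng = np.random.default_rng(0)
n_fft, n_sym = 64, 2000

# Random QPSK sub-carriers as a stand-in for the OFDM payload.
bits = rng.integers(0, 4, size=(n_sym, n_fft))
X = np.exp(1j * (np.pi / 4 + np.pi / 2 * bits))
x = np.fft.ifft(X, axis=1)                      # time-domain OFDM symbols
rms = np.sqrt(np.mean(np.abs(x) ** 2))

for backoff_db in (4, 6, 8, 10):
    clip_level = rms * 10 ** (backoff_db / 20)  # clip this far above the RMS level
    mag = np.abs(x)
    scale = np.minimum(1.0, clip_level / np.maximum(mag, 1e-12))
    x_clip = x * scale                          # magnitude-limited signal
    X_rx = np.fft.fft(x_clip, axis=1)           # ideal OFDM demodulation
    evm = np.sqrt(np.mean(np.abs(X_rx - X) ** 2) / np.mean(np.abs(X) ** 2))
    print(f"back-off {backoff_db:2d} dB -> EVM ~ {20 * np.log10(evm):6.1f} dB")
```

Increasing the back-off rapidly improves the EVM, but it also pushes the PA further away from its efficient operating region, which is exactly the conflict described above.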
Transceiver Linearity: In OFDM, intermodulation of sub-carriers is of great concern, as the resulting products fall in-channel exactly at the frequencies of other sub-carriers, corrupting the information carried by them. For example, the maximum receiver signal specified by the standard is –30 dBm. Two neighboring sub-carriers, increased by the sub-carrier PAR of 3 dB (the sub-carrier PAR is much smaller than the total OFDM signal PAR), intermodulate and corrupt other sub-carriers. Assuming a typical 5 dB margin allocated for other implementation noises, the resulting minimum receiver input IIP3 is about –10 dBm. A similar analysis is valid for the second-order intermodulation products, usually with less stringent effects but strongly coupled to the choice of receiver architecture (important in low-IF and direct conversion).
Channel Selection Filtering: An important contributor to the receiver implementation noise is the residual blocker signal after channel selection filtering. As an example we consider a filter for an 802.11a low-IF receiver. This filter may be a complex band-pass continuous-time circuit. The pass-band is 17 MHz, which is one channel-width wide, and the center frequency is at the 10 MHz low IF. The filter must attenuate the unwanted interferers/blockers in order to reduce the aliasing noise (produced by sampling before A/D conversion) to acceptable levels. The worst-case blocking specification for the 54 Mb/s data rate is a –63 dBm adjacent channel while the desired signal is at a –62 dBm level (3 dB above the sensitivity level). A conservative design uses a 6th-order Chebyshev type-I filter with 0.5 dB pass-band ripple. The frequency response of this filter, shown in Fig. 7, provides in excess of 30 dB rejection for all sub-carriers in the blocking signal. In practice, circuit imperfections may degrade the performance, especially at the edges of the channels. A source of blocker-dependent implementation noise related to this low-IF filter is the limited image rejection due to circuit imbalance. Typically, an image rejection of 30 dB is achievable without compensation algorithms. As the required SNR in 802.11a exceeds this number, a compensation algorithm is required and the resulting implementation loss has to be taken into account.
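A rough check of such a filter can be sketched as follows (an added illustration; the complex band-pass filter is modelled here by its equivalent 8.5 MHz low-pass prototype, and the offsets examined are illustrative rather than taken from Fig. 7).

```python
import numpy as np
from scipy.signal import cheby1, freqs

# Low-pass equivalent of the complex band-pass channel filter:
# 17 MHz pass-band around the 10 MHz low IF -> 8.5 MHz low-pass corner.
order, ripple_db, fc = 6, 0.5, 8.5e6
b, a = cheby1(order, ripple_db, 2 * np.pi * fc, btype='low', analog=True)

# Offsets from the desired channel center: near edge, center and far edge
# of the adjacent channel (20 MHz channel spacing).
offsets = np.array([11.5e6, 20e6, 28.5e6])
_, h = freqs(b, a, worN=2 * np.pi * offsets)
for f, att in zip(offsets, -20 * np.log10(np.abs(h))):
    print(f"offset {f/1e6:5.1f} MHz -> attenuation ~ {att:5.1f} dB")
```

Evaluating the prototype in this way gives a quick feel for how much rejection the adjacent-channel sub-carriers see, before circuit imperfections and image leakage are taken into account.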
3.5 Transceiver Design Choices
In this subsection we discuss several design possibilities for key blocks, without attempting an exhaustive treatment of the topic. The receiver technology is always a prime concern in any RFIC design, so various alternatives will be discussed, with more details on a new low-IF sampling solution. The stringent specification of the power amplifier linearity is a major limitation to its power efficiency, ultimately resulting in high power dissipation. Possible alternative design methods are mentioned. Finally, the perennial question of which IC technology is best suited for this application will be addressed.
Using low-integration multiple-IF super-heterodyne receivers: This is the most conservative design choice for 802.11a receivers. The required high performance as described earlier can be met readily if enough external filters and other precision RF components are used. Of course, the cost will almost surely be too high for this application.
Using highly integrated single-IF super-heterodyne receivers: Having a single IF SAW filter in addition to an RFIC and a few external components may be a proper compromise between cost and performance. However, the design is still challenging due to the analog I/Q down-conversion from IF to base-band. In order to ensure the final 30 dB SNR, excellent image rejection and linearity must be accomplished in the presence of the usual phase and magnitude errors, A/D converter imbalances, DC offsets, etc. In addition, the large frequency error between transmitter and receiver synthesizers allowed in the standard could place an OFDM sub-carrier very close to DC after the final down-conversion. There, substantial offsets and 1/f noise may corrupt the information in the respective sub-carrier.
Using highly integrated super-heterodyne receivers with low-IF sampling: An attractive theoretical method for mitigating most problems related to analog conversion to base-band is IF sampling. Digitizing at IF removes any I/Q imbalance errors (the down-conversion is done digitally) and also avoids any DC offset or 1/f-noise problems entirely. Unfortunately, this approach requires two successive IF SAW filters in order to avoid aliasing of blockers close to the channel. Naturally, this is not attractive for cost reasons. However, it is possible to replace one SAW filter [8] by a combination of a fully integrated continuous-time complex filter at 10 MHz (second low IF) with a complex (I/Q) sampling circuit [9]. The receiver block diagram is shown in Fig. 8. After a first conventional IF stage using a SAW filter, the signal is converted to a 10 MHz low IF and is processed through a continuous-time, fully integrated complex band-pass filter. A similar filter for Bluetooth is described in [10]. The frequency response of a three-pole Butterworth complex filter, including the leakage signal due to practical component mismatches, is shown in Fig. 9. A further 8 dB rejection of the adjacent channel is accomplished by the complex sampling operation, which exhibits the notch filter characteristic given in Fig. 10.
This method does not accomplish full integration; however, compared to the previous approach, it makes more efficient use of the hardware resources, with very little if any power dissipation penalty and with better performance. Notice that the sampling frequency is only twice the value of the base-band sampling case and that a single A/D converter is used rather than two.
Using fully integrated low-IF receivers: This recently popular method is attractive because it accomplishes full integration and provides high tolerance to DC offsets, 1/f noise, and frequency errors. The most important challenge for this technique is the image rejection precision in high-order complex filters, driven very hard by the OFDM SNR demands. Notice that there is no SAW filter to help the complex filters in this approach, as there is in the IF-sampled technique.
Using fully integrated direct conversion receivers: Despite fundamental limitations, direct conversion has become an accepted and even preferred receiver method in many applications. The most attractive feature is its simplicity, although this may be deceiving when DC offset cancellation loops and extra linearity demands are included in the picture. The application of direct conversion to 802.11a is possible but very difficult due to the I/Q matching requirements and the large transmitter-receiver frequency errors. Notice that in this case the situation is much more serious than in the single-IF super-heterodyne approach because, first, the DC offset is substantially higher and, second, the desired signal is not amplified by an IF amplifier prior to DC conversion. Two options remain: narrow down the AC-coupling cut-off frequency as much as possible and accept the associated loss, or correct the local oscillator synthesized frequency with very fine resolution. The latter may require the use of complicated fractional synthesizer techniques, which in turn introduce extra noise and spurious signals.
Using pseudo direct conversion receivers (wide/sliding IF): This technique, used recently in a fully integrated CMOS 802.11a radio RFIC [11], converts the RF signal to base-band in two consecutive mixing operations. Some of the classical direct conversion problems, such as LO leakage in the RF band and DC offset from RF LO self-mixing, are avoided, but the lack of an IF filter/AGC strip increases the receiver NF.
The power amplifier efficiency problem: As discussed previously, OFDM signals have large peak-to-average ratios, which requires PA operation in class A with large back-off. This is reflected in lower power efficiency, with serious repercussions in terms of total transmitter power dissipation. The application of PA linearization techniques could improve the efficiency. In addition, two methods are known to potentially use efficient nonlinear PAs and still achieve linear amplification. However, these techniques are yet to be widely introduced in RFIC products. The first method, known as the LINC technique [12], decomposes the band-pass signal into two or more constant-envelope signals. These can be amplified by highly efficient switching PAs and then recombined before being sent to the antenna. A research test chip demonstrating this concept is described in [13]. The second method uses a polar signal representation. A frequency-modulated signal is first amplified with an efficient PA and then an amplitude modulation is added to the signal [14]. The actual effectiveness of these techniques remains to be demonstrated.
IC Technology choices: While CMOS is the universal technology for base-band and MAC processing, the proper technology choice for the radio RFIC depends on the availability of proprietary technology. From a purely technical perspective, it is clear that the challenges of the 802.11a transceiver design justify the use of a high-performance RFIC technology such as bipolar or BiCMOS. However, the consumer-driven cost pressures of wireless LANs cannot be avoided. This will produce increasingly aggressive designs emphasizing low cost without compromising performance. Proprietary inexpensive SiGe BiCMOS technology such as in [15] is ideally suited for these developments. In addition, standard CMOS, widely available from foundries, has been making considerable progress in RF capabilities, driven by a large pool of talented designers with no access to other technologies. The main current limitation of CMOS RFICs for high-speed wireless LANs is related more to power supply voltage scaling than to inferior transistor performance.
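For the LINC method mentioned earlier in this subsection, the decomposition itself can be sketched in a few lines (an added illustration, not a description of the cited test chip; the amplitude limit A and the test samples are arbitrary): any band-pass sample s with |s| <= A can be written as the sum of two constant-envelope components of magnitude A/2.

```python
import numpy as np

def linc_split(s, A):
    """Split complex baseband samples s (|s| <= A) into two
    constant-envelope components s1 + s2 = s with |s1| = |s2| = A/2."""
    s = np.asarray(s, dtype=complex)
    mag = np.abs(s)
    # Unit vector along s (1.0 used for the degenerate s = 0 case).
    unit = np.where(mag > 0, s / np.where(mag > 0, mag, 1), 1.0)
    # Component perpendicular to s that makes up the missing magnitude.
    e = 1j * unit * np.sqrt((A / 2) ** 2 - (mag / 2) ** 2)
    return s / 2 + e, s / 2 - e

rng = np.random.default_rng(1)
s = (rng.normal(size=5) + 1j * rng.normal(size=5)) * 0.3
s1, s2 = linc_split(s, A=2.0)
print(np.allclose(s1 + s2, s))   # True: the components recombine to s
print(np.abs(s1), np.abs(s2))    # both constant at A/2 = 1.0
```

Because each component has constant envelope, it can be amplified by a saturated, highly efficient PA; the linearity burden is moved to the recombination step.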
4. Future higher rate systems
Following the trends of wired data communications, wireless LANs are likely to evolve towards even higher rates, i.e., 100 Mb/s, 200 Mb/s, etc. The following question arises naturally: which technical solution would best match this trend? From the facts discussed in this paper it appears that a further increase of the modulation depth, i.e. using 256-QAM, etc., will pose tremendous transceiver implementation problems not easily solvable with inexpensive circuits. From a fundamental point of view, Shannon's famous channel capacity theorem [16] clearly shows that increasing the information content per unit bandwidth is realized only with an asymptotically exponential SNR increase, which has dramatic cost and power implications in practice. For instance, since for a static AWGN channel with a given SNR the capacity in bits per transmitted symbol is given by C = log2(1 + SNR), doubling the theoretical limit for the number of bits per symbol from 1 to 2 (e.g., BPSK to QPSK) requires an increase in transmitted energy by a factor of 3. Going from 4 to 8 bits (16-QAM to 256-QAM) brings a 17-fold increase in the transmitter output power. For large spectral efficiencies, each additional bit transmitted in the same symbol requires a doubling of the transmit power. Clearly, common sense compels us to try to transmit twice the number of bits for a doubled transmit energy. This is possible, but only by exploring other "dimensions" of communication theory.
4.1 Doubling the Bandwidth
A brute-force approach to accomplishing higher rates is to increase the channel bandwidth. For example, 100 Mb/s could be realized through two present channel transmissions at 54 Mb/s. The current radio system and RFIC technology are suitable to accomplish this, but the cost and power dissipation of the new system may not be as attractive as the current generation of wireless LAN products. Generally, there is a vast amount of bandwidth in the 5 GHz ISM band and hence this approach is interesting due to its relative simplicity.
4.2 Exploiting the Spatial Dimension (MIMO)
A fundamentally different system approach to obtaining higher rates is based on the MIMO concept. MIMO (Multiple-In-Multiple-Out) refers to a system in which there are both multiple transmit antennas transmitting simultaneously in the same bandwidth, and multiple receive antennas used to capture and recover
the data streams and reconstruct the desired information. A 2 x 2 MIMO structure is illustrated in Fig. 11. As demonstrated in [17], the Shannon capacity for MIMO structures is impressive. For a given overall power we can transmit significantly more bits per unit bandwidth than in the traditional single-antenna systems.
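A small Monte-Carlo sketch (added here for illustration; the 20 dB SNR value, the i.i.d. Rayleigh channel model and the antenna counts are assumptions, not figures from the paper) compares the ergodic capacity of single-antenna and n x n MIMO links.

```python
import numpy as np

rng = np.random.default_rng(0)
snr = 10 ** (20 / 10)        # assumed 20 dB SNR
trials = 2000

def ergodic_capacity(n, snr, trials):
    """Average capacity [bit/s/Hz] of an n x n i.i.d. Rayleigh channel
    with equal power allocation over the transmit antennas."""
    caps = []
    for _ in range(trials):
        H = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
        M = np.eye(n) + (snr / n) * (H @ H.conj().T)
        caps.append(np.log2(np.linalg.det(M).real))
    return np.mean(caps)

for n in (1, 2, 4):
    print(f"{n} x {n}: ~{ergodic_capacity(n, snr, trials):.1f} bit/s/Hz")
```

The average capacity grows roughly in proportion to the number of antennas, which is the essential promise of the MIMO approach.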
In general, it can be shown that for an n x n MIMO system there is a linear relationship between the overall signal power and the channel capacity (in contrast to the logarithmic dependency in the classical case). However, notice that in order to exploit this concept, it is required that the wireless propagation channel be "favorable". In mathematical terms, this can be expressed by the condition that the channel matrix is of high rank. Only then do we have parallel independent data pipes between transmitter and receiver which can be exploited in order to increase the data rate. In a typical indoor situation, due to the rich multi-path fading environment, this condition is typically fulfilled provided that the antenna outputs are uncorrelated (for instance, spaced a sufficient fraction of a wavelength apart). Note, however, that similarly to the original (one-dimensional) Shannon limit, the MIMO capacity theory gives only upper limits of the achievable
spectral efficiencies, and does not provide any guidelines as to how these limits are approached. Over the past few years, various signaling schemes have been developed for MIMO channels emphasizing various aspects of the new degree of freedom. The most prominent examples include the different variants of BLAST [17] or space-time coding [18], but due to the wide spatial-temporal design possibilities a large number of other advanced techniques have been proposed recently. The performance of MIMO enhancements in 802.11a OFDM WLANs with spatial maximum likelihood (ML) detection has been investigated for varying propagation environments in [19].
Table I gives an overview of basic physical layer parameters for single and multiple antenna transmission and corresponding achievable data rates under the assumption that the general OFDM format remains unchanged (i.e., number of OFDM sub-carriers, number of pilot tones, duration of guard intervals etc.).
The columns contain the spatial dimension, i.e., number of transmit antennas, the effective code-rate (ratio of information-carrying bits to transmitted bits), the constellation depth, the resulting raw spectral efficiency in bits/s/Hz, and the achievable data-rate. Note that the spectral efficiency for 100 Mb/s is in the order of 9 bits/s/Hz, which is twice the number at 54 Mb/s and generally very large compared to other existing commercial wireless systems. Also, the system in the third row of Table I has been chosen by the 802.11a standard over the one in the second row due to better robustness under typical channel conditions, despite the fact that both systems provide the same data rate. However, although it is hard to beat well-designed MIMO codes in terms of spectral efficiency, these schemes come with the burden of increased base-band signal processing needs and increased complexity of the RF circuitry. Moreover, the sensitivity of MIMO to co-channel interference still has to be assessed and may partly reduce the theoretical spectral efficiency gains of this approach. The debate about next-generation WLAN standards is about to take a more concrete form in the standards bodies, and it will be interesting to see what technological features will eventually play a part in an ultra-fast WLAN air interface.
5. Conclusions
While wireless access to data networks has become a fully accepted capability in everyday life, enabled by the availability of inexpensive transceiver technology, the upcoming speed enhancement to 54 Mb/s through the 802.11a standard encompasses serious design challenges. Power dissipation and cost are the most important final features, of course, assuming a reasonable range performance. Increasing the data rate even further, as will surely be dictated by the market, will set new technical hurdles, which will require innovative system and circuit solutions. The presentation in this paper has tried to illustrate that close synergy between systems and circuits is a necessary ingredient for future successful designs.
REFERENCES
[1] Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, ANSI/IEEE Std 802.11, 1999.
[2] Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Higher-Speed Physical Layer Extension in the 2.4 GHz Band, IEEE Std 802.11b-1999.
[3] Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: High-speed Physical Layer in the 5 GHz Band, IEEE Std 802.11a-1999.
[4] K. Halford, S. Halford, M. Webster and C. Andren, "Complementary Code Keying for RAKE-based Indoor Wireless Communication," Proceedings of the 1999 IEEE International Symposium on Circuits and Systems, vol. 4, pp. 427-430, 1999.
[5] R. D. J. van Nee and R. Prasad, "OFDM for Wireless Multimedia Communications," Artech House, 1999.
[6] "Broadband Radio Access Networks (BRAN); HIPERLAN Type 2; Physical layer," ETSI TS 101 475 v1.2.2 (2001-02), Technical Specification.
[7] K. Yang, G. I. Haddad and J. R. East, "High-Efficiency Class-A Power Amplifiers with a Dual-Bias-Control Scheme," IEEE Transactions on Microwave Theory and Techniques, vol. 47, no. 8, pp. 1426-1432, August 1999.
[8] J. Glas and V. Prodanov, "System and Method for an IF-sampling Transceiver," patent application filed in the United States Patent and Trademark Office on 11/20/01.
[9] S. Levantino, C. Samori, M. Banu, J. Glas and V. Boccuzzi, "A CMOS IF Sampling Circuit with Reduced Aliasing for Wireless Applications," IEEE ISSCC Digest of Technical Papers, vol. 45, pp. 404-405, February 2002.
[10] V. Prodanov, G. Palaskas, J. Glas and V. Boccuzzi, "A CMOS AGC-less IF-strip for Bluetooth," Proceedings of the 27th European Solid-State Circuits Conference, September 2001.
[11] D. Su, M. Zargari, P. Yue, S. Rabii, D. Weber, B. Kaczynski, S. Mehta, K. Singh, S. Mendis and B. Wooley, "A 5 GHz CMOS Transceiver for IEEE 802.11a Wireless LAN," ISSCC Digest of Technical Papers, San Francisco, February 2002.
[12] D. C. Cox, "Linear amplification with nonlinear components," IEEE Transactions on Communications, vol. COM-22, pp. 1942-1945, 1974.
[13] M. Tarsia, J. Khoury and V. Boccuzzi, "A Low Stress 20 dBm Power Amplifier for LINC Transmission with 50% Peak PAE in 0.25 um CMOS," Proceedings of the European Solid-State Circuits Conference, September 2000.
[14] L. R. Kahn, "Single-sideband Transmission by Envelope Elimination and Restoration," Proc. IRE, vol. 40, pp. 803-806, July 1952.
[15] C. King et al., "Very Low Cost Graded SiGe Base Bipolar Transistors for a High Performance Modular BiCMOS Process," IEDM Technical Digest, pp. 565-568, 1999.
[16] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379-423, 1948.
[17] G. J. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas," Bell Labs Technical Journal, pp. 41-59, Autumn 1996.
[18] V. Tarokh, N. Seshadri and A. R. Calderbank, "Space-time codes for high data rate wireless communication: performance criterion and code construction," IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 744-765, March 1998.
[19] A. van Zelst, R. van Nee and G. A. Awater, "Space division multiplexing (SDM) for OFDM systems," Proc. IEEE VTC 2000-Spring, Tokyo, Japan, vol. 2, pp. 1070-1074, 2000.
Design of wireless LAN circuits in RF-CMOS Domine Leenaerts, Nenad Pavlovic Philips Research Laboratories Prof. Holstlaan 4 5656AA Eindhoven, Netherlands Abstract
In this contribution a few circuits are described that have been designed in a CMOS process. The circuits (a synthesiser and a 20 dBm power amplifier for Bluetooth, and a 10 GHz VCO for the 5 GHz wireless standard) demonstrate the possibility of using CMOS as a technology for RF applications.
1. Introduction
From a user's perspective, several wireless access technologies are available or will become so in the near future. For very high data-rate applications and individual links, Wireless Local Area Network (WLAN) type systems such as HIPERLAN/2 (part of the ETSI BRAN) or the equivalent IEEE 802.11a systems are best suited due to their flexibility in terms of asymmetrical data services, supported data rates, and adaptive modulation. For real short-range communication and more personal links, wireless Personal Area Network (PAN) type systems such as Bluetooth and IEEE 802.15 come into the picture. Table 1 lists the main parameters of these different access systems. All wireless access systems consist of an RF front-end system followed by a base-band part. The front-end system converts the received antenna signal down to an IF or zero-IF analogue base-band signal and, vice versa, converts the base-band signal up in frequency to the RF signal with the required power level at the antenna reference point. Dedicated AD- and DA-converter circuits realise the needed link between the analogue front-end and the digital base band. The digital base-band system (de-)modulates the input signal and performs the necessary signal processing, such as error correction. Due to the continuous decrease of the feature size of MOS devices, digital circuitry is always realised in CMOS technology. The best choice for the digital base band, consisting of microprocessors and memory, is CMOS technology. This option is also suitable for the mixed-signal parts, like the AD- and DA-converters, and the IF synthesisers. However, CMOS is not a trivial choice as technology for the RF front-end. Most RF circuits are designed in bipolar-type processes, and CMOS has not yet been proven to be a good candidate for high-performance consumer RF. On the other hand, the trend of increasing integration density can be observed in wireless standards too. The industry has embraced Bluetooth as the most likely candidate to have the highest integration level possible, i.e. a single chip [1-3]. Obviously, the ultimate goal is to integrate the complete functionality on a single die, leaving only the antenna, crystal, decoupling capacitors and battery as external components. Going for a single die leaves us no other choice than implementing the RF part in CMOS as well. In this contribution we will concentrate on the design of a few circuits needed in the RF front-end of wireless access systems. Hereby the focus will be on the use of RF-CMOS as technology carrier. As examples, a fully integrated synthesiser and a 20 dBm power amplifier for the Bluetooth standard, and a 10 GHz VCO for the 5 GHz wireless standard will be discussed. Within the context of this contribution, the technology used is a CMOS process on a substrate with the choice of 5 or 6 metal layers. This process has been fully RF characterised with dedicated and accurate RF-MOST models [4]. Special bond pads and ESD structures are available, fulfilling the HBM requirements with RF performance, i.e. quality factors beyond 50 at 10 GHz [5]. The voltage supply is set to 1.8 V.
2. Synthesizer for Bluetooth
The design requirements for the synthesizer can be derived from the following Bluetooth and system specifications:
1. The frequency band is the 2.4 GHz ISM band containing 79 channels with 1 MHz channel spacing, leading to f = 2402 + k MHz, k = 0, 1, 2, ..., 78.
2. The minimum modulation depth leads to a 115 kHz frequency deviation.
3. The settling time must be short enough to support the Bluetooth frequency-hopping scheme.
4. The frequency accuracy should be better than 30 ppm; the target is 1 ppm.
5. The tuning range of the VCO must be at least 10% to cope with process and manufacturing spread, i.e. 250 MHz minimum tuning.
6. The phase noise must be less than –80 dBc at 1 MHz offset, –110 dBc at 2 MHz offset and –120 dBc at 3 MHz offset.
To keep the design as simple as possible, an integer-N architecture has been chosen with direct quadrature generation to allow for near zero-IF demodulation. A similar frequency synthesiser for the same standard has been demonstrated in [6]. The synthesizer consists of a phase detector (PD), loop filter (LPF), VCO and divider. In addition, a second divider scales the external reference frequency down to the comparison frequency, i.e. 500 kHz. The target is to also integrate the complete loop filter, to reduce cost price, in contrast to the design presented in [6]. The settling time of the synthesizer is set by the bandwidth of the loop filter. A larger loop bandwidth gives a faster settling time and rejects more of the VCO phase noise. However, if the loop bandwidth is above a tenth of the comparison frequency, the loop may become unstable and a discrete-time model of the synthesizer must be used [7]. The minimum loop bandwidth is given by a relation, (1), involving the effective damping coefficient as a function of the loop's phase margin, the locking time, the frequency hop size and the allowable frequency offset after the locking time [7]. For the fastest settling time the phase margin should be set to 50 degrees, and the effective damping coefficient then equals 5, according to [7]. Following the design requirements, with the specified locking time and a 1 ppm allowable frequency offset, (1) yields a bandwidth of 13.82 kHz. The synthesizer will be designed for a 50 degree phase margin, but to allow some margin the bandwidth is set to 20 kHz. The maximum phase error of the loop will be less than the allowed value when condition (2) is satisfied, where N is the main divider ratio. If (2) holds, then the transient response is accurately predicted by the continuous-time linear model and (1) holds. Consequently, N must be larger than 4. Now we have to find the correct values for the filter components and the charge pump current. There is a trade-off between charge pump current and loop filter area. Reducing the charge pump current allows us to implement the capacitors on-chip. Some simulations have been carried out in order to obtain the values for the filter parameters. We have chosen a loop bandwidth of 20 kHz, a phase margin of 50 degrees and a VCO gain of 150 MHz/V. The charge pump current hardly changes the phase noise at 1 MHz and 3 MHz offset, which denote the offset frequencies for the adjacent and first non-adjacent channel, respectively. Therefore, the charge pump current is mainly determined by the loop filter component values, as can be seen in Figure 1. The filter is a second-order passive network. To achieve a good yield with MOS capacitors, the total capacitance must not exceed 50 pF, which leads to a maximum value for the charge pump current. The residual FM is a measure of the phase noise of the synthesizer integrated across the channel and expressed as a frequency. It is calculated using
residual FM = sqrt( 2 * integral of f^2 * L(f) df ),
where L(f) is the phase noise [7] and the integration runs over the band of interest. Residual FM is important as Bluetooth uses Gaussian-filtered Frequency Shift Keying (GFSK) modulation. Frequency noise is added to the received data and will degrade the SNR. The data modulation causes the carrier frequency to change with the modulation depth, i.e. 115 kHz. If the residual FM is compared to the modulation depth, the SNR can be calculated. Conversely, for a given SNR the allowed residual FM can be calculated, and in this case the maximum allowed residual FM is calculated to be 14 kHz. For the chosen PLL configuration and loop filter constants, the simulated residual FM in a band from 1 kHz to 500 kHz is then 4 kHz.
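A small numerical sketch of this integration (added for illustration; the piecewise phase-noise profile assumed below is hypothetical and not the measured profile of this design) shows how a residual FM figure in the few-kHz range arises.

```python
import numpy as np

# Hypothetical SSB phase-noise profile L(f) in dBc/Hz between 1 kHz and 500 kHz.
f = np.logspace(np.log10(1e3), np.log10(500e3), 2000)        # offset frequency [Hz]
L_dbc = np.interp(np.log10(f), [3, 4, 5, np.log10(5e5)],
                  [-75, -85, -95, -105])                      # assumed corner values
L_lin = 10 ** (L_dbc / 10)

# Residual FM = sqrt( 2 * integral of f^2 * L(f) df ) over the band.
residual_fm = np.sqrt(2 * np.trapz(f ** 2 * L_lin, f))
print(f"residual FM ~ {residual_fm / 1e3:.1f} kHz")
```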
The divider block scales the 2.4 GHz VCO output signal down to the comparison frequency of 500 kHz. It is based on the dual-modulus 2/3 divider architecture described in [5]. The division ratios required for this application are between 2.4 GHz/500 kHz = 4800 and 2.48 GHz/500 kHz = 4960. The divider then consists of a chain of 12 cells.
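As a quick consistency check (an added sketch): a chain of n cascaded 2/3 cells covers all integer division ratios from 2^n to 2^(n+1) - 1, so 12 cells are indeed sufficient for the ratios needed here.

```python
n_cells = 12
n_min = 2 ** n_cells              # all cells dividing by 2 -> 4096
n_max = 2 ** (n_cells + 1) - 1    # all cells dividing by 3 -> 8191

needed = range(4800, 4961)        # 2.4 GHz ... 2.48 GHz with a 500 kHz reference
print(n_min, n_max)
print(all(n_min <= n <= n_max for n in needed))   # True
```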
Having set the main design parameters of the synthesiser, let us first concentrate on the design of the quadrature 2.4GHz VCO [8]. The tank circuit of the VCO has been built up around an inductor and a voltage-controlled capacitor. To achieve optimal performance, the total inductance is chosen to be 4.4 nH, thereby setting the total allowed capacitance to 900 fF. The differential inductor has been realised using two identical inductors, rather than designing a single symmetrical inductor. A better quality factor and higher resonance frequency were obtained at the expense of more silicon area. The measured unloaded quality factor of the inductor is 10 at 2.4 GHz, see Figure 2.
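As a simple check (added here), the chosen tank values indeed resonate near the Bluetooth band.

```python
import math

L_tank = 4.4e-9    # total tank inductance [H]
C_tank = 900e-15   # total allowed tank capacitance [F]

f0 = 1 / (2 * math.pi * math.sqrt(L_tank * C_tank))
print(f"f0 ~ {f0 / 1e9:.2f} GHz")   # about 2.5 GHz
```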
A quadrature VCO can be obtained by cross-coupling two identical oscillator cores, resulting in the configuration shown in Figure 3 [9]. VCO2 is connected to VCO1 in anti-phase, while VCO1 is connected to VCO2 in common phase. This yields a 180-degree phase delay from the VCO2 output to the VCO1 input, forcing the two VCOs to synchronise such that, assuming they are identical, the phase difference between their individual outputs is exactly 90 degrees.
Each oscillator core has been designed as a fully differential basic LC oscillator. The bias current of each oscillator is determined by a resistor R instead of an NMOS device in a current mirror configuration. The resistor contributes less noise than a current mirror with the high gain needed to keep the power consumption low. A resistor also contributes less flicker noise. The chosen implementation means that no amplitude control is available. For the target application this is not a problem, as long as the output amplitude is large enough. The analogue tuning range can be enlarged by a digital tuning technique while not increasing the gain of the VCO (Figure 4). This additional tuning is needed to cope with process spread. The digital tuning will be used to set the VCO at the proper center frequency at the initial start-up. The analogue tuning will cover the channels and is controlled by the PLL. Both output nodes of the VCO cores VCO1 and VCO2 are loaded with 32 identical MOS devices implemented as a switched MOS capacitor bank. Each core will therefore be loaded with 64 MOS capacitors. They are switched between 0 V and 1.8 V in a digital manner. A 5-bit thermometer decoder controls the ON/OFF state of these capacitors.
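A thermometer decoder of this kind can be sketched in a few lines (an added illustration; the exact mapping of the 5-bit word to the switched capacitors is assumed, not taken from the chip).

```python
def thermometer_decode(code: int, n_lines: int = 31) -> list[int]:
    """Map a 5-bit binary code (0..31) to thermometer-coded switch controls:
    the lowest `code` lines are ON, the rest OFF."""
    assert 0 <= code < 2 ** 5
    return [1 if i < code else 0 for i in range(n_lines)]

print(sum(thermometer_decode(0)))    # 0  -> '00000' setting, no extra capacitors switched in
print(sum(thermometer_decode(31)))   # 31 -> '11111' setting, all lines ON
```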
A die microphotograph of the PLL is shown in Figure 5. The die was mounted in a standard plastic, low-cost LQFP package. The complete synthesizer draws 6.3 mA from a 1.8 V supply. The VCO itself consumes 4 mA, including the thermometer decoder block. The VCO core operates from a 1.4 V supply voltage. The oscillator operates from 2.370 GHz to 2.550 GHz when the control voltage is swept from 0 V to 1.8 V for a '00000' digital setting. This means a 7% analogue tuning range at a center frequency of 2.460 GHz. The '11111' setting gives a maximum frequency of 2.763 GHz, yielding 393 MHz or 15.3% overall tuning range at a center frequency of 2.567 GHz. Phase noise measurements on the VCO were performed with an HP 3048 set-up and a Marconi 2042 low-noise signal generator as reference oscillator. The designed VCO has a phase noise of –115 dBc/Hz at 1 MHz, –120 dBc/Hz at 2 MHz and –128 dBc/Hz at 3 MHz offset from a 2.5 GHz carrier, and fulfils the Bluetooth requirements. Figure 6 shows the phase noise for the in-phase output. The quadrature output has a similar phase noise characteristic.
Finally the phase error has been measured with an Agilent FSIQ set-up. The measured error over several samples was less than 1.2 degrees. The amplitude error was less than 5 %, resulting in an image rejection ratio (IRR) of more than 50 dB, leaving enough design freedom for the mixers to achieve the overall Bluetooth specification of 20 dB IRR.
The 5 highest-frequency cells in the divider are implemented in current-source-coupled logic. The entire divider was designed to run at 3 GHz, thereby consuming 1 mA. The resulting loop filter consists of a resistor and two capacitors of 34.5 pF and 5.3 pF. For the synthesiser, the measured phase margin is 48 degrees and the bandwidth is 19 kHz. The settling time for the maximum frequency hop was measured to be fast enough to fulfil the Bluetooth requirements. The entire synthesiser thereby fulfils the Bluetooth requirements.
3. 20 dBm PA for Bluetooth
The power amplifier in integrated Bluetooth transceivers usually meets the Class 3 requirement of 0 dBm output power. An additional power amplifier with 20 dB gain is needed to amplify this signal to meet the Class 1 output power requirement of 20 dBm. Here we demonstrate a CMOS self-biased cascode RF power amplifier for Class 1 Bluetooth applications [10]. There are two main issues in the design of power amplifiers in sub-micron CMOS: oxide breakdown and the hot-carrier effect, both of which get worse as the technology scales. Oxide breakdown is a catastrophic effect and limits the maximum allowable signal swing on the drain. The hot-carrier effect, on the other hand, is a reliability issue that affects the performance of the device. It increases the threshold voltage and consequently degrades the performance of the device. The recommended voltage to avoid hot-carrier degradation is usually based on DC/transient reliability tests. CMOS power amplifiers have been reported with the DC voltage below the recommended voltage, but with the DC+RF voltage levels exceeding this value [11]. It has been shown that under this circumstance, the output power of the amplifier decreases by on the order of 1 dB after 70-80 hours of continuous operation [12]. Cascode configurations and thick-oxide transistors [13-15] have been used to eliminate oxide breakdown and hot-carrier degradation. So far, in cascode power amplifiers, the common-gate transistor has had a constant DC gate voltage with an AC (RF) ground. Under large-signal operation, the voltage swing on the gate-drain of the common-gate transistor becomes larger than that of the common-source transistor. Therefore, the common-gate transistor
becomes the bottleneck in terms of breakdown or hot carrier degradation. In our design, we have used a self-biased cascode configuration, see Figure 7, that allows RF swing at G2. This enables us to design the power amplifier such that both transistors experience the same maximum drain-gate voltage. Therefore, we can have a larger signal swing at D2. The bias for G2 is provided by the Rb-Cb combination. The DC voltage applied to G2 is the same as the DC voltage applied to D2. The RF swing at D2 is attenuated by the low pass nature of Rb-Cb. The values of Rb and Cb can be chosen for optimum performance and for equal gate-drain signal swings on M1 and M2. As G2 follows the RF swing of D2 in both positive and negative swings around its DC value, a non-optimal gain performance is obtained compared to a cascode with RF ground at G2. However, as long as both M1 and M2 go from saturation into triode under large signal operation, the maximum output power and PAE are not degraded.
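The attenuation provided by the Rb-Cb bias network can be sketched quickly (an added illustration; the Rb and Cb values below are hypothetical, chosen only to show how the G2 swing becomes a controlled fraction of the D2 swing at 2.4 GHz).

```python
import math

f = 2.4e9                 # operating frequency [Hz]
Rb, Cb = 2e3, 70e-15      # hypothetical bias resistor and capacitor values

# G2 sees D2 through a first-order RC low-pass: |H| = 1 / sqrt(1 + (2*pi*f*Rb*Cb)^2)
wRC = 2 * math.pi * f * Rb * Cb
attenuation = 1 / math.sqrt(1 + wRC ** 2)
print(f"G2 swing ~ {attenuation:.2f} x D2 swing")
```

In the actual design, Rb and Cb are chosen so that the resulting G2 swing equalises the gate-drain stress on M1 and M2, as described above.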
The two-stage power amplifier is shown in Figure 8. Both stages use the self-biased cascode configuration. In the driver stage, the transistors M1 and M2 are 0.6 mm and 0.3 mm wide, whereas in the power stage they are 2 mm and 1.5 mm wide, respectively. The gates of the driver and power stages are biased at 0.55 V and 0.8 V, respectively. The inter-stage matching is done by a high-pass LC section, consisting of the wire bond on the drain of the driver stage and an on-chip
capacitor. The output matching network is realised off-chip to avoid excessive power loss of on-chip inductors.
The die microphotograph is shown in Figure 9. The die area is 0.81 mm by 0.57 mm, and the chip was mounted chip-on-board on an FR4 PCB. Large-signal load-pull measurements at 2.4 GHz revealed the optimum load to be similar to the simulated one. As seen in Figure 10, the small-signal gain is 38 dB. The amplifier saturates at 23.5 dBm with a PAE of 45% for a supply voltage of 2.4 V. At an output power of 23 dBm, 42% PAE with a gain of 31 dB was measured. There was no change in the output power, PAE and gain of the power amplifier after continuous operation for ten days at a 2.4 V supply while providing 23 dBm output power. The power amplifier has shown no sign of hot-carrier degradation, whereas in [12] hot-carrier degradation was seen during the first 24 hours, see Figure 11. This confirms the effectiveness of the self-biased cascode design.
As part of our measurements, we applied a Bluetooth signal at 2.4 GHz (PRBS sequence of length 15 bits, GFSK modulation with 160 kHz deviation (m=0.32), Gaussian filter with BT=0.5, symbol rate 1 MSymbol/s) to the input of the self-biased cascode PA. The adjacent channel power ratios for the lower and upper channels are -25.3 dBc and -27.1 dBc, respectively, for an input power of -8.3 dBm. The measured output power at 2.4 V is 23.5 dBm (gain of 31.8 dB). Figure 12 shows the output spectrum; the measured ACPR low and ACPR up are -25 dBc and -26.8 dBc, respectively. The Bluetooth specification requires an ACPR of -20 dBc at 2 MHz and -40 dBc at 3 MHz offset, which is easily met by our design. Therefore, the self-biased cascode PA faithfully amplifies the Bluetooth waveform.

4. VCO at 10 GHz
For the 5 GHz wireless LAN system, a zero or near-zero IF concept can be used. As in the Bluetooth synthesiser, direct quadrature generation can be applied, but to demonstrate the design possibilities we target a double-frequency approach in this section. The VCO to be designed will therefore operate at 10 GHz.
One of the first issues is the design of a monolithic planar inductor, and in particular the shielding of the inductor towards the substrate. Normally, the shield is patterned in order to prevent circular currents from flowing. The question is then how to connect this shield to the ground node. Measurements reveal that one particular grounding method gives the maximum quality factor, see Figure 13. Once a proper inductor layout has been realised, the designer is left with finding the optimal division of the capacitance into a tuneable and a fixed part in combination with the inductance value. This is a problem similar to that of the Bluetooth VCO described in section 2. A possible design strategy is to maximise the effective parallel impedance of the RLC tank at resonance, as this increases the voltage swing and decreases the relative phase noise. This impedance is defined as

    Rp = (ω0 · L)² / Rs                                    (4)

where Rs is the series resistance and L the inductance of the coil. In typical inductors, L and Rs scale proportionally: if L increases by a factor m, then so does Rs. From another perspective, if we rewrite (4) as Rp = ω0 · L · Q, with Q = ω0 · L / Rs,
the strategy would be to maximise the L·Q product [16], as maximising Q alone does not necessarily maximise Rp. Recent work has established the relation between the inductance value and the phase noise in the voltage-limited and current-limited regions of a VCO [17]. It is shown that increasing L beyond the value that puts the oscillator at the edge of the voltage-limited region degrades the phase noise performance. Therefore, we have set the inductance value to 0.6 nH, leaving 300 fF as total capacitance excluding the inductor parasitics. The varactor used is a differential PMOS device with P+ drain/source diffusions in an N-well. Its quality factor is approximately 6, making it the limiting element in this design; its capacitance ratio is in the order of 1.2. To increase the tuning range, the same concept as in the VCO of section 2 has been used, i.e. digital tuning by means of MOS capacitors. A die microphotograph of the design is shown in Figure 14. The circuit has been measured using on-wafer probing; the output of the VCO has been buffered to ease the measurements. The VCO oscillates at 10 GHz, thereby consuming 6.4 mA from a 1.0 V supply.
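A quick numerical check of the quoted tank values (a sketch: the resonance ignores the inductor parasitics mentioned in the text, and the overall tank Q is assumed to be set by the varactor's Q of about 6):

    import math

    L = 0.6e-9       # tank inductance (from the text)
    C = 300e-15      # total capacitance excluding inductor parasitics (from the text)
    Q = 6            # tank quality factor, limited by the varactor (from the text)

    f0 = 1.0 / (2 * math.pi * math.sqrt(L * C))   # ideal resonance frequency
    Rp = 2 * math.pi * f0 * L * Q                 # effective parallel tank impedance, Rp = w0*L*Q

    print(f"f0 = {f0 / 1e9:.1f} GHz (drops towards 10 GHz once parasitics are added)")
    print(f"Rp = {Rp:.0f} ohm -> the larger the L*Q product, the larger the swing per mA")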
Using the same measurement set-up as described previously, the resulting phase noise is plotted in Figure 15. The phase noise is -60 dBc/Hz at 10 kHz offset, and the 1/f-noise regime is clearly visible in the phase noise characteristic. The obtained phase noise performance is similar to that reported in [18], but at a lower power dissipation: 6.4 mW compared to 50 mW.
5. Conclusions
In this contribution we have demonstrated that RF-CMOS can be used as the technology for the design of wireless LAN circuits. The Bluetooth circuits show that low-power designs are achievable at 2.4 GHz, and the 10 GHz VCO demonstrates that CMOS is capable of realising designs operating at very high frequencies. All designs demonstrate that full integration of the functionality is possible in CMOS.

Acknowledgements
The presented circuits have been developed as part of the RadioMaTIC and HiperLAN projects. The authors would like to thank all the team members of these projects for their valuable contributions.

References
[1] P. van Zeijl, "RF circuits for DECT and Bluetooth," proc. AACD workshop 2002, Spa.
[2] F. Op 't Eynde, "A fully-integrated single-chip SOC for Bluetooth," proceedings ISSCC, pp. 196-197, 2001.
[3] J. Cheah, et al., "Design of a low-cost integrated CMOS Bluetooth SOC in silicon area," proceedings ISSCC, 2002.
[4] D.M.W. Leenaerts, P.H. de Vreede, "Mixed mode telecom design," in Analog Circuit Design, R.J. van de Plassche, J.H. Huijsing, W.M.C. Sansen (eds.), pp. 247-266, Kluwer, 2000.
[5] D. Leenaerts, C. Vaucher, J. van der Tang, Circuit Design for RF Transceivers, Kluwer Academic Publishers, Dordrecht, 2001.
[6] D. Theil, et al., "A fully integrated CMOS frequency synthesizer for Bluetooth," proceedings RFIC, pp. 1-4, 2001.
[7] C.S. Vaucher, "An adaptive PLL tuning system architecture combining high spectral purity and fast settling time," IEEE J. Solid-State Circuits, vol. 35, no. 4, April 2000.
[8] D. Leenaerts, et al., "CMOS 2.45 GHz low-power quadrature VCO with 15% tuning range," proceedings RFIC, 2002.
[9] P. van de Ven, et al., "An optimally coupled 5 GHz quadrature LC oscillator," Symposium on VLSI Circuits, pp. 115-118, 2001.
[10] T. Sowlati, D. Leenaerts, "A 2.4 GHz CMOS self-biased cascode power amplifier with 23 dBm output power," proceedings ISSCC, 2002.
[11] C. Fallesen, P. Asbeck, "A CMOS power amplifier for GSM-1800 with 45% PAE," proceedings ISSCC, pp. 158-159, Feb. 2001.
[12] V. Vathulya, T. Sowlati, D. Leenaerts, "Class 1 Bluetooth power amplifier with 24 dBm output power and 48% PAE at 2.4 GHz in CMOS," proceedings ESSCIRC, pp. 84-87, 2001.
[13] C. Yoo, Q. Huang, "A common-gate switched 0.9 W Class-E power amplifier with 41% PAE in CMOS," IEEE J. Solid-State Circuits, pp. 823-830, May 2001.
[14] T. Kuo, B. Lusignan, "A 1.5 W Class-F RF power amplifier in CMOS technology," proceedings ISSCC, pp. 154-155, 2001.
[15] A. Shirvani, D. Su, B. Wooley, "A CMOS RF power amplifier with parallel amplification for efficient power control," proceedings ISSCC, pp. 156-157, 2001.
[16] H.R. Rategh, "A CMOS frequency synthesizer with an injection-locked frequency divider for a 5-GHz wireless LAN receiver," IEEE J. Solid-State Circuits, vol. 35, no. 5, May 2000.
[17] D. Ham, A. Hajimiri, "Concepts and methods in optimization of integrated LC VCOs," IEEE J. Solid-State Circuits, vol. 36, pp. 896-909, June 2001.
[18] W. de Cock, M. Steyaert, "A CMOS 10 GHz voltage controlled oscillator with integrated high-Q inductor," proceedings ESSCIRC, pp. 496-499, 2001.
A Fully Integrated Single-Chip Bluetooth™ Transceiver
Jan Craninckx
Alcatel Microelectronics
Technologielaan 21, B-3001 Leuven, Belgium
Abstract
This paper describes the implementation of a Bluetooth™ radio transceiver circuit in a CMOS process. The receive chain employs a 1-MHz low-IF architecture with a complex bandpass filter, a 0-48 dB variable gain amplifier, 8-bit A-to-D conversion and fully digital demodulation and clock recovery. In the transmitter, a digital modulator and D-to-A converters generate the baseband signals for a direct upconverter. The quadrature local oscillator signals are derived from a divide-by-2 circuit driven by a double-frequency auto-calibrated VCO.
1. Introduction
Among the several standards that have appeared in recent years for short-range wireless communications, Bluetooth™ [1] is certainly the one with the largest application range. Originally designed to be a low-cost replacement for the labyrinth of annoying cable connections on your PC, the huge potential of such a globally available standard soon became apparent. In a multimedia environment, easy communication is achieved between laptop PCs, PDAs, digital cameras, mobile phones, earpiece headsets, etc. Even in a simple household, the refrigerator could be talking to the PC in order to keep track of its inventory and make up your shopping list. Although some of these ideas might be far off, the market potential is huge and a lot of competitors are trying to take their share. Because of the large volumes, this market is driven by cost more than by
performance. The physical layer interface specification was clearly written with a low-cost objective.

1.1. Bluetooth System Description
Based in the license-free 2.4 GHz ISM band, the Bluetooth system occupies 79 channels, spaced 1 MHz apart, from 2402 MHz to 2480 MHz. Frequency hopping at a rate of 1600 hops per second is used to spread the spectral power. An ad-hoc piconet is created between Bluetooth units that are close enough to be detected. Each piconet consists of a maximum of 8 devices, one of which is assigned to be the master; the others are slaves. A TDMA access scheme is used with alternating RX and TX slots; within each slot, a guard time allows sufficient time to hop the LO frequency. The data bits are formatted according to the structure shown in figure 1.
For asynchronous transmission modes, up to 5 consecutive time slots can be used for one data burst, in which case the payload can become 2745 bits long. The maximum transfer rate in that case is 723.2 kbit/s. A peculiarity of the access code format is the fact that the preamble is only 4 bits long. Theoretically, this leaves very little time at the beginning of a received burst to perform gain control, offset compensation, carrier frequency offset estimation, etc. Luckily, the synchronization word is heavily error-corrected and not all 64 bits must be correct in order to recognize the packet address. However,
both AGC and frequency offset estimation must be designed with maximum settling speed in mind.

1.2. The Bluetooth Radio Interface
As a low-cost implementation was the main goal of the Bluetooth system, some but not all of the requirements on the radio transceiver can be described as 'relaxed'. Here are some key numbers of the transceiver specification. The modulation scheme is GFSK with a bit rate of 1 Mb/s (BT=0.5, modulation index 0.28-0.35). The nominal transmit output power is in most cases 0 dBm. The specification requires a minimum receiver sensitivity of -70 dBm. Robust operation together with other users of the ISM band must be guaranteed by a carrier-to-interference (C/I) level specification of 0, -30 and -40 dBc at 1, 2 and 3 MHz offset, respectively. The interference level is relaxed, however, to -9 dBc for a freely chosen image frequency, which must allow the use of a low-IF receive architecture without the need for special trimming or calibration techniques to improve the image rejection. The intermodulation specification can be translated into an input IP3 of -21 dBm. The implementation of the synthesizer is relaxed by the availability of a guard time to hop the frequency. Translation of the receiver interferer specification results in a phase noise requirement of -120 dBc/Hz at 2.5 MHz offset.
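The last number can be illustrated with the usual reciprocal-mixing budget. The sketch below is only indicative: the 40-dB blocker excess and the 1-MHz bandwidth follow from the C/I table, but the carrier-to-noise ratio assumed for the GFSK demodulator (20 dB) and the zero implementation margin are illustrative assumptions, not values from the specification.

    import math

    blocker_excess_db = 40.0     # interferer allowed 40 dB above the wanted signal (C/I = -40 dBc)
    cnr_needed_db = 20.0         # assumed C/(N+I) needed by the demodulator for the target BER
    bw_hz = 1e6                  # channel bandwidth

    # Reciprocal mixing: the blocker multiplied by the LO phase-noise skirt, integrated over
    # the channel, must stay cnr_needed_db below the wanted carrier.
    pn_dbc_hz = -(blocker_excess_db + cnr_needed_db + 10.0 * math.log10(bw_hz))
    print(f"required LO phase noise near the blocker offset: {pn_dbc_hz:.0f} dBc/Hz")
    # -> about -120 dBc/Hz, in line with the figure quoted above for 2.5 MHz offset.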
1.3. Technology Choice
As low cost is a must for a successful Bluetooth product, both the architecture choice and the process technology choice are very important. For the architecture, a monolithic implementation is of course aimed for, leading to low- or zero-IF up- and downconversion techniques. These will be discussed in more detail in the related sections. For the technology, a CMOS process with 5 metal layers was chosen. Transistor threshold voltages are 0.5-0.6 V, and resistors use the standard available unsalicided poly. The only RF option required is the MiM capacitors. One basic question that also pops up when thinking about the air interface is the following: single-ended or differential? Bluetooth
systems have a clear trend towards high integration, so chances are that this radio will be implemented on the same die with a large digital baseband controller. Making the air interface differential reduces the risk of suffering from digital interference at the LNA input. Also in transmit mode, a differential output loads the power supply less, and hence reduces problems such as VCO pulling. So a differential RF I/O was chosen. To limit the number of external components needed, the LNA input and the PPA output are connected to the same pins. Instead of using an external TX/RX switch, the LNA and PPA are co-designed to provide the correct functionality. To improve the transmitter efficiency, an impedance level higher than the usual 50 Ω was chosen. To connect to an external antenna, an external differential-to-single-ended conversion network is needed. With advanced packaging techniques, however, even this conversion will not be needed when a differential antenna can be integrated in the package.
2. The Local Oscillator Synthesizer
From the numbers above, the LO synthesizer specs are quite feasible in a monolithic implementation. However, because of the low- and zero-IF architectures, the biggest problem one is faced with is the generation of quadrature LO signals.

2.1. Quadrature Generation
Several options can be taken here, as illustrated in figure 2. The most straightforward approach uses a quadrature oscillator. A four-stage ring oscillator renders nice quadrature outputs, but normally its phase noise is not good enough [2]. LC quadrature oscillators have also been realized [3], but these need several coils and result in a large area penalty. Another approach is to use a high-quality LC oscillator that drives a quadrature generation stage. This can be a simple RC-CR filter, or a more sophisticated poly-phase filter [4]. However, because of the high operating frequency, this filter must consist of resistors of a few hundred ohms and capacitors of around 200 fF. Driving these impedance levels at this frequency becomes too power-hungry, certainly in a
CMOS design. Another major problem of this system, as well as of the previous solution, is the fact that the oscillator runs at almost the same frequency as the pre-power amplifier output during transmit, making it very susceptible to pulling.
Having an oscillator at half the LO frequency, as shown in figure 2(c), avoids the pulling problem. But now a frequency doubler circuit must be used to generate the 2.5-GHz signal, which must then again drive a low-impedance poly-phase network, so also here a large power penalty is present. Solution (d) uses a double-frequency VCO, which renders quadrature signals with a divide-by-2 master/slave toggle flip-flop [5]. Only one coil is needed for the VCO and no RC network must be driven, although the critical block here is the divide-by-2 circuit.
Also possible is the use of a VCO at 2/3 of the wanted LO frequency. Dividing this by 2 gives quadrature signals at 1/3 of the LO frequency, and mixing these with the original VCO signal ends up at the wanted frequency [6]. The other mixing products generated here (at 1/3 of the LO frequency and at other harmonics) must be removed by on-chip LC filtering, which is certainly a disadvantage, but it is a very elegant solution with respect to LO pulling. The reference frequency of the PLL would have to be 666.7 kHz, which cannot be made from the 13-MHz crystal that is available in all GSM phones and is used in this design as the reference clock. So the double-frequency option (solution 'd') was chosen here.

2.2. Voltage-Controlled Oscillator
The schematic of the 5-GHz VCO is shown in figure 3. The 2.5-V supply still allows for a symmetric implementation using both NMOS and PMOS active transistors. The two-turn symmetrical octagonal inductor [7] has an inductance value of 0.86 nH and a quality factor of 15.
Tuning of the frequency is done both in an analog and in a digital way. The analog control voltage Vtune is driven by the PLL and controls the junction capacitance of a P+/N-well diode. To limit the upconversion of 1/f noise, nonlinearities in the oscillator must be avoided, so only the linear part of the tuning curve is used, where the diodes are certainly operating in reverse bias. With this constraint, a tuning range of approximately 150 MHz is achieved, which is not sufficient to cover the complete Bluetooth band and certainly does not cover the expected process variations. Therefore a digital frequency control is added with the 3-bit signal Dtune, which adds the correct number of MiM capacitors to set the center frequency close to the desired channel frequency. A proprietary calibration algorithm continuously monitors the VCO tuning voltage: if it becomes too high or too low, capacitors are added or removed.
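The published algorithm is proprietary, but its behaviour can be paraphrased in a few lines. Everything in the sketch below (the voltage window, the sign of the capacitor step) is an illustrative assumption rather than the actual design.

    def dtune_step(vtune, dtune, v_min=0.5, v_max=2.0, bits=3):
        """One decision of the continuous VCO calibration loop: keep the analog tuning
        voltage inside its linear window by stepping the 3-bit capacitor code Dtune.
        Window limits and the sign convention are assumptions, not design values."""
        top = 2 ** bits - 1
        if vtune > v_max and dtune > 0:
            dtune -= 1        # analog range exhausted on one side -> switch a MiM capacitor out
        elif vtune < v_min and dtune < top:
            dtune += 1        # exhausted on the other side -> switch a MiM capacitor in
        return dtune

    # Example: the loop slowly walks Dtune until Vtune sits inside its window again.
    print(dtune_step(vtune=2.3, dtune=4))   # -> 3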
To fix the PLL bandwidth in spite of a varying VCO sensitivity, a linearization action is performed on the charge pump current. Constant bandwidth is indeed guaranteed by keeping the product of the VCO sensitivity and the charge pump current constant. So an
estimate of the VCO gain is made based on the analog tuning voltage and the digital calibration code, and the charge pump current is adjusted accordingly. To ensure fast settling, the PLL is equipped with a reset input so it can start with zero phase offset. The complete block diagram of the PLL with this calibration circuitry is shown in figure 4.

2.3. Quadrature Divide-by-2
The circuit block that stretches the capabilities of the CMOS technology the most is the 5-GHz divide-by-2 flipflop. A master/slave configuration is of course used; the circuit schematic of one section is shown in figure 5.
The basis of this schematic is the CMOS equivalent of an ECL flipflop, but with the biasing current source omitted and the clock transistors connected directly to ground [8]. Another measure taken to enhance the speed is the use of a dynamic load: the resistance of the PMOS transistors is modulated by the signal Cn. During the evaluation phase (when Cp is high), the impedance at the output node is reduced, resulting in a higher operating frequency.
2.4. Frequency Divider
The dual-modulus prescaler can divide by 32 or 33. For maximum speed and power efficiency, the phase-switching principle [8] was employed in the dual-modulus prescaler. In cooperation with the P/S counter, the total division factor is N = 32·S + P. With P = 0..31 and S = 75..77, all required Bluetooth frequencies can be generated.
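A one-line check that this choice of S and P indeed reaches every Bluetooth channel. The sketch assumes the programmable division acts on the 2.4-2.5 GHz quadrature LO (i.e. after the divide-by-2) with the 1-MHz reference described in the next subsection, and that the receive LO sits 1 MHz above the channel because of the low-IF architecture.

    # Total division factor N = 32*S + P, with P = 0..31 and S = 75..77, 1 MHz reference assumed.
    reachable = {32 * s + p for s in range(75, 78) for p in range(32)}

    tx_lo = range(2402, 2481)                 # Bluetooth channel frequencies in MHz
    rx_lo = [f + 1 for f in tx_lo]            # assumed RX LO = channel + 1 MHz low-IF

    print(min(reachable), "-", max(reachable), "MHz")            # 2400 - 2495 MHz
    print(all(f in reachable for f in list(tx_lo) + rx_lo))      # True: every LO value is reachable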
2.5. Phase-Locked Loop
The reference frequency used in this synthesizer is 1 MHz, consistent with the channel spacing. This automatically limits the PLL bandwidth to approximately one tenth of this, i.e. 100 kHz. Because the VCO operates at double the LO frequency, a 2-MHz reference could also have been chosen if a higher bandwidth were needed. But this would have increased the operating frequency of the phase-switching dual-modulus prescaler, and since the available guard time to hop the synthesizer frequency is large enough, it was not needed here. A mixed behavioral/spice model [9] was used to simulate the PLL phase noise, including the previously mentioned linearization. Careful design was needed in order to limit the phase noise contribution of the 1/f noise in the charge pump and the loop filter opamp. This loop filter is based on an active implementation of a classic RC-C impedance, with an extra pole added. This extra pole was needed in order to sufficiently filter the out-of-band noise contribution of the charge pump and filter. An active implementation is used to keep the charge pump output voltage constant, and thus to match the up- and down-currents as well as possible to prevent reference spurs. The total PLL behaves as a type-2 loop. The total synthesizer power-up time, including digital calibration, remains well within the available guard time.
3. The Receiver

3.1. RX Architecture
As already stated in the introduction, low cost drives the architecture choice to one that can be completely integrated, i.e. a zero-IF or a low-IF topology. Setting the intermediate frequency to zero guarantees a monolithic implementation, but has two drawbacks. The first is the well-known problem of DC offsets. These come both from LO-to-RF feedthrough in the mixers and from offsets in the baseband circuitry. As they are located inside the signal band, they cannot be removed by simple filtering. Although sometimes annoying, sufficient DC offset cancelling techniques have been developed to overcome this problem [5]. The second drawback of the zero-IF topology is related to 1/f noise: certainly in a CMOS implementation, a lot of 1/f noise is present at low frequencies and this can significantly corrupt the signal SNR. A low-IF topology does not suffer from these drawbacks, but generally has more stringent requirements on the image rejection ratio (depending of course on the choice of the IF and the power of the adjacent channels). Because this last constraint has been waived in the Bluetooth specification, the low-IF receive architecture is chosen here. A receive block diagram is shown in figure 6.
In the choice of this low intermediate frequency, several criteria play a role. First, for easy LO synthesis with an integer-N PLL, it must be a multiple of 1 MHz. For noise reasons, a higher frequency is probably better since the 1/f noise problem is further reduced. For filtering, however, a smaller one is preferred because the complex bandpass filter poles will have a lower Q. This means the R and C ratios will be more practical and, more importantly, errors in the RC time constant will have less impact. Indeed, a 10% error on a 3-MHz IF displaces the center of the bandpass filter by 300 kHz, which deteriorates the BER and makes the circuit less robust to interferers. Also, the GBW of the opamps in the filter is less critical for a 1-MHz center frequency with poles of limited Q. So IF = 1 MHz was chosen.
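The sensitivity argument can be quantified with a two-line sketch: a relative error on the filter's RC time constant shifts the centre of the complex bandpass filter by the same relative amount, so the absolute displacement grows with the chosen IF.

    rc_error = 0.10                      # the 10 % RC spread used as example in the text
    channel_bw = 1e6                     # 1-MHz-wide Bluetooth channel

    for f_if in (1e6, 2e6, 3e6):
        shift = rc_error * f_if          # the complex filter centre scales with 1/RC
        print(f"IF = {f_if/1e6:.0f} MHz -> centre displaced by {shift/1e3:.0f} kHz "
              f"({100 * shift / channel_bw:.0f} % of the channel)")
    # A 3-MHz IF is displaced by 300 kHz, as stated above; a 1-MHz IF only by 100 kHz.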
3.2. RF Circuits
The LNA circuit is implemented as a common-gate amplifier; the circuit schematic is shown in figure 7. The input impedance is boosted to the required value by regulating the gate of the input transistors.
The LNA has 14 dB gain, an amount limited by the blocking constraints of the Bluetooth radio specification. The input IP3 of
8 dBm is largely sufficient. The noise figure of 6 dB is rather high, but this is not limiting in the receiver, as the total RX noise figure will be dominated by the downconversion mixer. This downconversion mixer is implemented as a Gilbert cell as shown in figure 8. The biggest problem faced in this design was the 1/f noise, which is basically the reason why the switches are implemented with PMOS transistors.
When the switches are in an unbalanced state, they act as cascode transistors and do not contribute any significant noise to the output. However, at the moment the LO signal crosses zero, their low-frequency noise adds to the output signal. The exact amount of noise generated this way depends on many factors, including the shape and the amplitude of the LO signal. An approximation can be calculated [10], or the effect can be simulated with a dedicated RF simulation tool. The design and the biasing point of the switch transistors therefore require careful optimization. Since the noise current is proportional to the biasing current, the latter needs to be as small as possible. The limit for this is dictated by the amplitude of the blocker signals that can be
present here too. Indeed, blocker signals are also converted down to a certain intermediate frequency, and if the associated AC current becomes larger than the DC biasing current, the wanted signal can no longer be downconverted. On top of this, the necessary margins must be taken into account for process and temperature variations, biasing inaccuracy, etc. The choice of the gate overdrive voltage of the switches is also somewhat contradictory. To reduce the low-frequency and white noise contributions, small transconductance values are needed, and thus the overdrive must be set as large as possible. But this results in a much slower switching of the transistors and thus a higher conversion factor for the noise. Low overdrive values lead to better switching, but the resulting large transistors lead to a lower value of the pole present at the sources of the switches.

3.3. Baseband circuitry
After downconversion, the signal is filtered by a complex bandpass filter. As no limiting is applied, the main goal of this filter is to reduce the signals from adjacent channels to a level lower than or equal to the wanted channel, such that the following blocks will not be saturated. The filter structure is an opamp-R-C filter, one section of which is depicted in figure 9 [11]. The first stage takes the downconversion current output as its input, and only after this first stage is the signal represented in the voltage domain. This enables the system to cope with large blocking signals, as they are filtered out before they can saturate the opamp. To set the center frequency and the bandwidth of the filter correctly, the capacitors are digitally tuned: an on-chip calibration block tunes the capacitors to the correct value by comparing an RC time constant with the reference clock. The filtered signal can now be amplified by the VGA (Variable Gain Amplifier) to a level suitable for conversion to the digital domain by the ADC. As there is no signal power present at DC, the VGA can be AC-coupled to remove DC offsets. The VGA gain can be varied in 6-dB steps from 0 to +48 dB and is controlled by the digital AGC algorithm.
Conversion to digital is done by two 8-bit, 6.5-MS/s A-to-D converters. It is in fact this sampling rate that sets the required filtering order, because the interferer channels at the aliasing frequencies must be sufficiently attenuated. An interferer at 6 MHz offset can be 40 dB higher than the wanted channel, so it should be reduced by almost 60 dB in order not to disturb the reception of the wanted signal. Three orders of filtering could just do the job, but as every effort has been made to make the circuit as robust as possible to interference, a higher-order filter has been implemented.
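A rough asymptotic estimate of this filtering-order argument (a sketch only: it assumes a 0.5-MHz half-bandwidth around the 1-MHz IF and a simple 20 dB/decade roll-off per pole, which ignores the exact pole placement of the complex filter):

    import math

    half_bw = 0.5e6        # assumed half-bandwidth of the complex channel filter
    f_if = 1e6             # intermediate frequency
    f_int = 7e6            # interferer 6 MHz away from the carrier -> about 7 MHz in the IF domain
    delta = abs(f_int - f_if)

    for order in (3, 4, 5):
        atten = order * 20 * math.log10(delta / half_bw)
        print(f"{order} poles -> about {atten:.0f} dB at the aliasing interferer")
    # ~65 dB for three poles (the "three orders could just do the job" above);
    # every additional order buys roughly another 20 dB of robustness.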
3.4. Digital Signal Processing
In the digital domain, several functions are performed, such as the calibration algorithm for the VGA, the automatic gain control, etc. The main function is of course the processing of the wanted signal. This is done as shown in figure 10.
First the received signal is further mixed down to DC by a digital quadrature multiplication. Then the final channel filtering is performed with a simple FIR lowpass filter. The power of the wanted signal is calculated, and this drives both the gain control algorithm and the RSSI calculation. This RSSI value is available digitally and can be read out by the baseband controller. The demodulation is performed by calculating the frequency of the incoming signal with the vector (cross) product of two consecutive I/Q samples,

    V(k) = I(k-1)·Q(k) - Q(k-1)·I(k),

which approximates A²·sin(2π·f·T) ≈ 2π·A²·f·T for sample period T and amplitude A, with f the modulation frequency of ±160 kHz. So the information to be recovered is the sign of this vector product. The most annoying problem in an FSK demodulator is the fact that the transmitter and receiver do not share the same crystal reference, and hence there is an offset between their RF frequencies. A crystal error of 20 ppm can result in a total carrier frequency offset of 100 kHz, leaving only 60 kHz of frequency deviation instead of 160 kHz. As this drastically deteriorates the BER, the carrier frequency offset must be estimated and used in the data recovery. Calculating the frequency offset is normally done by taking the average frequency during the preamble, which has a 0101 data pattern so that its DC content should be zero. So the average value of the
preamble frequency can be taken as the slicer threshold. A trade-off is always present in the time constant of this averaging filter: it must be low enough to pass only the DC value, but also high enough to allow fast tracking within the limited length of the preamble. Because the preamble in Bluetooth is only 4 bits long, there is not sufficient time for this. Low-pass filtering the demodulated frequency during the complete header is also possible, since the DC content should be almost zero there due to the error correction implemented. But this is not optimal, because some slots might be missed if the header itself contains too many errors. In our implementation, the average frequency is calculated rapidly by tracking not the average, but the minimum and the maximum demodulated frequency. The slicer value is simply set to the middle of these two. Separate time constants handle the tracking of the min/max values towards the demodulated waveform, and the holding of these values during long sequences of 1's or 0's. Correcting the demodulated waveform for this offset is done in two ways. First, a coarse value of the frequency offset is fed back to the digital LO generator, whose frequency is deviated from the original 1 MHz by the required amount. The still remaining offset is subtracted from the demodulated waveform in the slicer. This technique allows us to lose almost no bits of the header. A clock recovery circuit is also implemented on the transceiver: although this task is usually done by the baseband controller, the radio can also supply a recovered data clock.
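A behavioural sketch of this min/max-tracking slicer. The attack and release coefficients are illustrative assumptions; the point is only that the threshold settles within a couple of symbols, even with a 100-kHz carrier offset and only a 4-bit preamble.

    def track_and_slice(freq, attack=0.5, release=0.001):
        """Track the extremes of the demodulated frequency and slice around their midpoint.
        'attack' pulls an envelope quickly towards a new extreme; 'release' lets it decay
        slowly during long runs of identical bits (both coefficients are assumptions)."""
        f_max = f_min = freq[0]
        bits = []
        for f in freq:
            f_max += (attack if f > f_max else release) * (f - f_max)
            f_min += (attack if f < f_min else release) * (f - f_min)
            threshold = 0.5 * (f_max + f_min)
            bits.append(1 if f > threshold else 0)
        return bits

    # +/-160 kHz GFSK deviation riding on a 100 kHz carrier frequency offset:
    rx = [100e3 + (160e3 if b else -160e3) for b in (0, 1, 0, 1, 1, 1, 0, 0, 1, 0)]
    print(track_and_slice(rx))   # recovers 0,1,0,1,1,1,0,0,1,0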
4. The Transmitter

4.1. TX Architecture
There is not that much variation in integrated transmitter architectures, and this transceiver is no different. Obviously, a direct upconversion topology is used, as depicted in figure 11.
4.2. Baseband circuitry
Based on the incoming data bits (and optionally also the transmit clock), the digital modulator generates 13-MS/s 8-bit I and Q signals, which are converted to the analog domain by current-steering DACs. Additional control is foreseen to remove the carrier feedthrough in the output spectrum by adding offsets to the baseband signals with small offset DACs. A second-order lowpass filter removes the DAC alias products and the out-of-band quantisation noise.

4.3. RF circuitry
For upconversion, the same LO I and Q signals are used as in the receiver, although now of course the LO frequency is set to the same value as the desired RF channel. A standard Gilbert mixer generates the RF signal with better than 45 dB linearity. The external load is driven by an internal pre-power amplifier, the schematic of which is shown in figure 12.
The currents in the output branches are set by the DC voltages at the gates of the transistors. These are controlled by the required common-mode feedback circuits, which are not shown in the figure for clarity. The output impedance is set to the desired value by the resistive feedback.
5. Implementation
A layout plot of the realized transceiver is shown in figure 13, with all relevant blocks of the circuit indicated. The circuit is mounted in a 48-pin MLF package. Communication with the baseband controller is done through a 14-pin unidirectional rxmode-2 BlueRF interface.
A plot of the measured synthesizer spectrum (at the divided frequency of 2.5 GHz) is shown in figure 14. An excellent -127 dBc/Hz is achieved at 2.5 MHz offset. The reference spurs are attenuated by 45 dB at 1 MHz, and 55 dB at 2 MHz. Further measurements are awaiting silicon processing.
6. Conclusion
A highly integrated Bluetooth™ transceiver circuit in CMOS has been reported. A double-frequency VCO generates the required quadrature LO signals with excellent phase noise performance for the 1-MHz low-IF receiver and the direct upconversion transmitter. The receive chain is very robust to interferers thanks to the complex bandpass filtering, the absence of limiting, and extensive digital signal processing in the demodulator. The transmitter shares the RF pins with the receiver and thus avoids an external TX/RX switch. This transceiver offers a high-performance and low-cost implementation of the Bluetooth system.
7. References
[1] "Specification of the Bluetooth system", version 1.1, February 2002, http://www.bluetooth.com.
[2] T. Lee and A. Hajimiri, "Oscillator phase noise: a tutorial", IEEE Journal of Solid-State Circuits, vol. 35, no. 3, pp. 326-336, March 2000.
[3] M. Tiebout, "Low-power low-phase-noise differentially tuned quadrature VCO design in standard CMOS", IEEE Journal of Solid-State Circuits, vol. 36, no. 7, pp. 1018-1024, July 2001.
[4] J. Crols and M. Steyaert, "A single-chip 900-MHz CMOS receiver front-end with a high-performance low-IF topology", IEEE Journal of Solid-State Circuits, vol. 30, no. 12, pp. 1483-1492, December 1995.
[5] F. Op 't Eynde, J. Craninckx and P. Goetschalckx, "A fully integrated zero-IF DECT transceiver", International Solid-State Circuits Conference, pp. 138-139, February 2000.
[6] H. Darabi et al., "A 2.4-GHz CMOS transceiver for Bluetooth", IEEE Journal of Solid-State Circuits, vol. 36, no. 12, pp. 2016-2024, December 2001.
[7] J. Craninckx and M. Steyaert, "A fully integrated CMOS DCS-1800 frequency synthesizer", IEEE Journal of Solid-State Circuits, vol. 33, no. 12, pp. 2054-2065, December 1998.
[8] J. Craninckx and M. Steyaert, "A 1.75-GHz / 3-V dual-modulus divide-by-128/129 prescaler in CMOS", IEEE Journal of Solid-State Circuits, vol. 31, no. 7, pp. 890-897, July 1996.
[9] J. Craninckx and M. Steyaert, Wireless CMOS Frequency Synthesizer Design, Kluwer Academic Publishers, 1998, ISBN 0-7923-8138-6.
[10] J. Janssens, "Deep submicron CMOS cellular receiver front-ends", Ph.D. dissertation, K.U.Leuven, July 2001.
[11] J. Crols and M. Steyaert, CMOS Wireless Transceiver Design, Kluwer Academic Publishers, 1997, ISBN 0-7923-9960-9.
Continuous-time Quadrature Modulator Receivers
Peter Vancorenland, Philippe Coppejans, Michiel Steyaert
KU Leuven, ESAT-MICAS
Leuven, Belgium

Abstract
This paper presents a continuous-time quadrature modulator receiver. The AD converter implements a continuous-time complex band-pass loop filter. The low power requirements of the modulator allow it to be integrated into a wireless receiver. An implementation in a 1.57 GHz receiver with a 2 MHz bandwidth is shown. This receiver only needs a SAW filter and a reference crystal as external components.

I. INTRODUCTION
The main trends toward price reduction and single-chip receiver implementations have pushed the research on integrated CMOS wireless receivers over the last ten years, and more and more commercial CMOS implementations are finding their way into the market. The main focus lies on low cost and a high level of integration. This paper presents a receiver architecture with the main focus on integratability. The receiver uses a quadrature ΣΔ modulator [1] with a continuous-time loop filter at the low-IF frequency, which is described in the next section. The modulator integrates the downconversion mixer function at its input [2]. Thanks to the continuous-time loop filter and the wide dynamic range of the AD converter, neither an anti-alias filter nor a variable gain amplifier is required in the receiver. The third section describes an implementation of this modulator in a 1.57 GHz receiver for spread-spectrum signals with a 2 MHz bandwidth. Finally, the measurement results of this modulator are shown and conclusions are drawn.
II. MODULATOR ARCHITECTURE

A. IF signal filtering
For a given RF downconversion system with an IF frequency f_IF and a bandwidth 2·BW, different filters can be used to process the IF signal. In the case of a low-IF downconversion stage, the wanted signal is positioned only in the positive frequency band between the frequencies f_IF - BW and f_IF + BW.

A.1 Low-Pass filter
The easiest approach is one where the loop filter passband has a low-pass characteristic with the signal included in its passband. An example is shown in Fig. 1(a).

Transfer function
The required bandwidth for this circuit is f_IF + BW. The transfer function is given by:
Noise
The output noise density is dominated by the thermal noise contributions from the input resistor with value R/G. This noise density is given by:
The output noise density at the center of the passband becomes:
The total noise integrated over the filter bandwidth (double sideband) and referred to the input of the filter becomes:
A.2 Band-Pass filter
A bandpass implementation is given in Fig. 2(a); the transfer characteristic is shown in Fig. 2(b). The required filter bandwidth is 2·BW on both the positive and negative frequency axes.
Transfer function
The transfer function for this filter is given by:
Noise
The output noise density is dominated by the thermal noise contributions from the input resistor with value R/G. This noise density is given by:
The output noise density at the center frequency of the passband becomes:
The total noise integrated over the filter bandwidth (double sideband) and referred to the input of the filter becomes:
A.3 Complex Band-Pass filter
A complex bandpass filter implementation is shown in Fig. 3.

Transfer function
The transfer function is given by:
Noise
The output noise density is dominated by the thermal noise contributions from the input resistor with value R/G. This noise density is given by:
The total output noise density at the center frequency of the passband becomes:
The total noise integrated over the filter bandwidth and referred to the input of the filter becomes:
The power consumption of the active elements is inversely proportional to the driven impedance R. If the capacitance C is kept constant (C determines the integrated noise level), the resistance values necessary to obtain the required bandwidth will be proportional to the values given
in Table I. The first row in this table shows the total noise integrated over the passband, the second row shows the resistor value R, and the last row shows the proportionality factor. From this table it becomes clear that, in terms of power consumption, the complex bandpass filter is the most advantageous implementation.
B. ΣΔ implementation with complex bandpass filter
Figure 4 depicts the equivalent schematic of a first-order ΣΔ ADC with a complex bandpass filter. The real and imaginary feedback coefficients determine the poles and zeros in the noise transfer function. The indices "R" are used for "real" coefficients, i.e. the coefficients which represent the feedback from an I (Q) node to an I (Q) node, whereas the "complex" "C" indices indicate the coefficients of the feedback from an I (Q) node to a Q (I) node. Note that in the latter case the sign of the feedback coefficient is different for the "I-to-Q" feedback and the "Q-to-I" feedback. The complex noise transfer function of this modulator has a zero at the centre frequency of the complex loop filter and a pole position set by the GBW of the integrator. This can be generalized for higher-order loop filters.
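A discrete-time illustration of this principle (a sketch, not the actual continuous-time loop filter): rotating the integrator pole from DC to +f_IF places the noise-transfer-function zero at +f_IF only, so the quantization-noise notch sits on the wanted low-IF signal and not on its image.

    import cmath, math

    fs = 128e6                 # assumed modulator clock, equal to the output bit rate of this design
    f_if = 4e6                 # low-IF centre frequency used in this design
    theta = 2 * math.pi * f_if / fs

    def ntf(f):
        """First-order complex NTF(z) = 1 - exp(j*theta) * z**-1 evaluated on the unit circle."""
        z = cmath.exp(2j * math.pi * f / fs)
        return 1 - cmath.exp(1j * theta) / z

    for f in (-f_if, 0.0, +f_if):
        print(f"f = {f/1e6:+.0f} MHz : |NTF| = {abs(ntf(f)):.3f}")
    # |NTF| = 0 at +4 MHz (the wanted band) but not at -4 MHz or DC: the notch is one-sided.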
C. AD simulation
In order to optimize the loop filter coefficients, the linear effects in the converter have to be simulated. A full Spice simulation over enough time periods to guarantee an accurate FFT of the output spectrum is a very long and tedious job. It is, however, sufficient to split the loop filter signals up into all the independent contributions and to accurately simulate the response of the loop filter to these contributions over one clock period. These response coefficients can then be used in a z-transformed simulation of the converter [3]. The independent signal components are shown in Fig. 5 and are explained next.

C.1 Feedback pulse contribution
The effect of the feedback pulse from the I or the Q DAC will generate signals on both the I and Q integration nodes of the first and second stage of the filter.
C.2 Initial conditions
The superposition of all the signals on each of the integration nodes at the end of each clock period (except for the input signals, which are explained next) determines the begin conditions for the next simulation period. In the case of a complex bandpass filter, a begin condition on one of these nodes will generate damped sine waves at the filter centre frequency on all the other nodes.

C.3 Input signals
The signals at the input of the AD converter will generate signals on each of the I and Q output nodes of the first and second stage of the filter. The transfer function for these signals can be derived from the AC behavior of the loop filter.
The effects of these contributions can be simulated with a high-accuracy simulator (Spice, Eldo), where the simulation has to be performed over one clock period only. The transfer coefficients for each of these contributions can then be used in a matrix with time-independent coefficients, as shown by:
with: the signals on the I and Q output nodes of the first and second filter stages at the sampling instants; the I and Q feedback pulses at those instants; the coefficients describing the effect of a feedback pulse on each of the output nodes (note that in this system some of these coefficients are zero and have been left out); the real and complex feedback coefficients as described in the previous section; the coefficients describing the effect of the begin condition on one output node upon another output node; and the transfers of the input signals to each of the output nodes. These equations can be evaluated for every clock period in a high-level simulator (Matlab [4]) to obtain the highest performance while still guaranteeing stability.
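A minimal behavioural version of this simulation strategy is sketched below, in Python rather than Matlab for illustration. It uses a standard real-valued second-order low-pass modulator with text-book coefficients instead of the complex band-pass filter of Fig. 4, and the matrices merely stand in for the coefficients that would normally be extracted from the one-clock-period Spice/Eldo runs.

    import numpy as np

    # Stand-in coefficient matrices (a classic second-order low-pass modulator, not this paper's
    # complex band-pass design; real coefficients would come from circuit simulation).
    A = np.array([[1.0, 0.0],      # effect of the begin conditions on the next clock period
                  [0.5, 1.0]])
    B = np.array([0.5, 0.0])       # effect of the input sample on each integrator output
    F = np.array([-0.5, -0.5])     # effect of the feedback DAC pulse on each integrator output

    def run_modulator(u):
        x = np.zeros(2)            # integrator states at the clock instants
        bits = np.empty(len(u))
        for k in range(len(u)):
            y = 1.0 if x[1] >= 0.0 else -1.0     # one-bit quantizer on the last stage
            x = A @ x + B * u[k] + F * y         # one clock period of the loop filter
            bits[k] = y
        return bits

    bits = run_modulator(np.full(4000, 0.3))     # DC input of 0.3 full scale
    print(f"bit-stream average = {bits.mean():.3f}")   # ~0.3, as expected for a Sigma-Delta loop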
III. IMPLEMENTATION OF THE ΣΔ MODULATOR IN A WIRELESS RECEIVER
The discussed quadrature ΣΔ modulator has been implemented in a 1.57 GHz wireless receiver. By using an implementation with a wide dynamic range, the VGA function used in many receivers can be omitted. The IF frequency of this low-IF receiver is set at 4 MHz. The RF signal is quadrature downconverted by mixers connected directly to the input of the A/D converter. The loop filter of the AD converter is a complex bandpass loop filter as discussed in the previous section. The output of the receiver is a digital I and Q bit-stream at a rate of 128 MHz. An anti-alias filter in front of the ADC is not necessary because, as opposed to switched implementations [1], the loop filter attenuates out-of-band signals before any sampling occurs; it is therefore left out. The receiver consists of an LNA connected to a switched mixer followed by a wide-dynamic-range AD converter. The front-end converts the RF signal to differential I and Q signals in a 2 MHz bandwidth centered at 4 MHz. The linear mixers are switched with a quadrature LO signal derived from a VCO running at 3.14 GHz followed by a divide-by-2 block. The implemented receiver topology is shown in Fig. 6. The different building blocks are discussed next.
A. LNA
As seen in Fig. 6, the LNA is preceded only by the antenna and an external blocking filter. This means no amplification of the signal has been performed before it reaches the input of the LNA. Therefore, the LNA is designed to have a very low noise figure, since it sets a lower bound for the total receiver noise figure. The circuit schematic of the LNA is shown in Fig. 7. A high voltage gain is necessary to sufficiently reduce the noise
contribution of the following mixer block. The low noise figure and high gain are provided by a single-ended common-source amplifier with inductive source degeneration. The cascode transistor reduces the Miller effect, increasing the stability of the amplifier. It also increases the reverse isolation, so that LO leakage to the antenna is minimized. The load of the LNA consists of an inductor which was designed for a high equivalent load resistance in order to have sufficient voltage gain without jeopardizing the linear operation of the mixer. A patterned ground shield was placed beneath the inductor in order to avoid noise injection and to increase the Q of the parasitic capacitance. To further boost the gain without a penalty in power consumption, the LNA input impedance was designed slightly lower than the source impedance. A ground shield was used for the input bonding pad in order to minimize its capacitance and
maximize its Q, which improves noise and gain performance. The input of the LNA is protected against ESD by two reverse-biased diodes and a supply clamp.
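The input match of such an inductively degenerated common-source stage follows the familiar relation Zin ≈ s·(Lg + Ls) + 1/(s·Cgs) + gm·Ls/Cgs. The numbers below are purely illustrative (none are taken from the design) and only show how the real part can be placed slightly below the source impedance, as described above.

    import math

    # Illustrative device values (not from the design)
    gm = 40e-3       # input-device transconductance (S)
    cgs = 1.0e-12    # gate-source capacitance including any added capacitor (F)
    ls = 1.15e-9     # source degeneration inductance (H)
    f0 = 1.57e9      # operating frequency (Hz)

    r_in = gm * ls / cgs                                   # real part set by the degeneration
    lg = 1.0 / ((2 * math.pi * f0) ** 2 * cgs) - ls        # gate inductor that resonates Cgs at f0

    print(f"Re(Zin) = {r_in:.0f} ohm (slightly below 50 ohm), Lg = {lg * 1e9:.1f} nH")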
B. PLL
The type-II fourth-order PLL is shown in Fig. 8 [6]. The VCO is built around an on-chip octagonal balanced inductor that
is optimized through an in-house inductance simulator-optimizer. The VCO operates at a frequency of 3.14 GHz, and the quadrature signal is generated by a master-slave frequency divide-by-2: two DSTC n-latches form a differential dynamic D-flipflop performing the high-speed division. The PLL is locked to a 16.37 MHz frequency reference through a divide-by-96 block and a phase frequency detector without dead zone [7]. The reference frequency spurs are minimized by adding a reference
branch in the charge pump core and careful timing of the switch control signals. This way, the charge pump current sources are always on; the current alternately flows in the reference and the output branch of the charge pump. A virtual ground is provided after the charge pump by putting an opamp in the loop filter. This keeps the charge pump switches well in saturation and improves the symmetry between the up- and the down-side of the charge pump during locking. For stability reasons, a low-frequency zero is inserted in the loop filter. This low-frequency zero is implemented on-chip without applying any tricks like using a dual-path loop filter [8]. This way, power is saved since no current summation circuit needs to be implemented, at the cost of an increased chip area for the loop filter capacitor of almost 2 nF on chip.

C. Modulator circuit implementation

C.1 ΣΔ architecture
In figure 9 the full architecture as described in section II is depicted. The loop filter is a second-order complex bandpass filter consisting of two cascaded integrators and two feed-forward OTAs. Both blocks of the cascaded structure introduce complex non-conjugate poles at two different frequencies in order to guarantee a broad quantization-noise suppression band under varying process conditions. For given specifications, optimal pole positions can be determined. The feed-forward OTAs introduce zeros in the filter frequency characteristic and guarantee the stability of the loop at higher frequencies.

First filter block
The first filter slice is shown in Fig. 10. This first block is implemented as an RC filter and is the most important block in the loop filter for noise and linearity considerations. The input resistor linearizes the OTA transconductance, and the loop gain of this degeneration has to be high enough both for a strong virtual ground and for the linearization of the transconductance. The complex feedback, shifting the integrator characteristic to a complex
bandpass filter, is realised by resistors which are coupled between the outputs of the I (Q) phases and the inputs of the Q (I) phases. The input voltage-to-current conversion is mainly performed by the input degeneration resistor, requiring no extra components. The input impedance does not present a considerable load to the preceding block. The value of this resistor determines the NF of the entire receiver. The first OTA is a folded-cascode transconductor as shown in Fig. 11. The output common-mode voltage is controlled by both a high-frequency feedback loop and a low-frequency feedback loop. The first loop is closed by the transistors and controls the biasing. Although it reduces the available headroom, this approach is feasible because of the small signal swing in this receiver AD converter. The GBW of this loop is always smaller than the GBW of the OTA; therefore a second, parallel loop with high GBW is needed as well. This loop is closed by the capacitors and controls the biasing source. The open-loop CMFB transfer function of this loop has a band-pass characteristic.
The capacitor value is a trade-off between extra capacitive load on the output and the open-loop CMFB gain. Due to a different common-mode voltage at the input and the output of the first filter block, a DC current will flow through the quadrature feedback resistors. This (minor) DC current is drawn through a
CMFB circuit placed at the input of the transconductors. This circuit is shown in Fig. 12. The current is set by a bias source at the biasing node. The input common mode is sensed at the input nodes and compared with the reference voltage (0.5 V), thus controlling the current that flows through the resistors.
Second filter block
The second filter slice is shown in Fig. 13. This filter is implemented as a gm-C filter. The complex feedback is performed by transconductances which are coupled between the I and Q phases: their inputs are connected to the outputs of the I (Q) filter stage and their outputs are summed together with the outputs of the Q (I) integrator transconductances. The power requirements for the second filter block are much more relaxed; in particular, the noise and linearity specifications are lowered by the gain of the previous block. A simple low-power gm-C filter implementation with a minor degeneration loop gain is suitable. The OTAs in the second stage are folded-cascode transconductors, similar to the circuits used in the first stage. This is shown in Fig. 14, where the currents of the degenerated transconductances marked "I" and "Q" are summed. The CMFB of this stage is implemented in the same way as in the first
stage.

Input stage
The input of the ADC is a summing point for both the IF input signal and the return-to-zero feedback pulses of the DAC [2]. In this continuous-time implementation this summation is done in the current domain. Resistors placed between the input of the system and the low-impedance ADC input provide the necessary V-I conversion of the input signal. The downconversion is performed by switching transistors
in series with the input resistors. The transistors are connected to the virtual-ground inputs of the loop filter. The I and Q LO drive signals are square waves and can be derived from a PLL. The switches have a much lower on-resistance than the value of the input resistor in order to achieve a very linear mixing operation. The value of this resistor determines the NF of the entire receiver architecture. The linearity is determined by the loop gain of the input transconductance degeneration. This implies that for a certain NF, the power drain in the filter OTA is proportional to the desired dynamic range. In receiver architectures, a single-ended input signal is often preferred to a differential one in order to save power. Both single-ended and differential signals can be applied to the input of the AD converter. When a single-ended signal is applied, the CMFB has to be good enough to ensure the necessary single-ended-to-differential conversion and to maintain the linearity requirements. At the dummy input node a replica of the output of the previous block (the LNA) has been placed to make the loading of the structure as symmetrical as possible. Due to this design choice, the power and area of an extra active balun at RF frequencies are saved.

Comparator
The comparator [9] is shown in Fig. 15. The feed-forward blocks of the loop filter (B1, B2) are indicated on the left. The cascode transistors reduce the comparator kickback effects. The signals at the output of the comparator are sampled in an SR flip-flop.

Feedback stage
Small signal levels also imply a small feedback current or feedback charge per sample. Special attention has been paid in the design of the feedback current structure (figure 16) to avoid current spikes at the ADC input during switching, which would increase the noise level in the first OTA. Firstly, NMOS switches are placed in parallel with the PMOS switches and are driven by the opposite clock phase to cancel the clock feed-through pulses. A current mirror with a low mirror factor of 20:1 further reduces the remaining capacitive feed-through.
Design for offset
Due to the low NF and the resulting low signal levels, both the reference voltage levels and the offset levels have to be very low, putting a heavy burden on the degrees of freedom in the low-power circuit design process. The input transistors of the comparator are sized to achieve the necessary matching level; this sizing, however, increases the capacitive loading of the preceding stage.
IV. MEASUREMENTS
Fig. 17 shows a photograph of the quadrature modulator receiver with the most important building blocks indicated. It is laid out in a CMOS process. The noise figure vs. frequency of the LNA is shown in Fig. 18(a); the NF at 1.57 GHz is 1.5 dB and the LNA consumes 4 mA. The PLL phase noise is as low as -115 dBc/Hz at 600 kHz offset and -138 dBc/Hz at 3 MHz offset. Fig. 18(b) presents the PLL phase noise measured at 1.57 GHz. The PLL has a locking range of 10% around the center frequency of 1.57 GHz and consumes 8.5 mA in total. The demodulated baseband output spectrum for a 1.6 GHz input signal is shown in Fig. 18(c), and the output SNR for increasing input signal levels is shown in Fig. 18(d). The performance of the quadrature modulator receiver is summarized in Table II.
V. CONCLUSIONS
A continuous-time quadrature modulator architecture has been presented. The presented AD modulator can cope with very small input signal levels thanks to its very low thermal noise level and equally low comparator offset. Only an LNA has to precede this modulator in a wireless receiver system. Due to the high dynamic range of the ADC
no extra VGA is required. The modulator was implemented in a wireless receiver for signals with a 2 MHz bandwidth at a 1.57 GHz RF frequency. This receiver only needs a SAW filter and a reference crystal as external components.

VI. ACKNOWLEDGEMENTS
The authors would like to thank Kawasaki Microelectronics Inc. for the processing of the chip, and in particular K. Akeyama and Y. Segawa for their help and support with the layout and design of the circuit.
REFERENCES
[1] S.A. Jantzi, K.W. Martin, and A.S. Sedra, "Quadrature bandpass delta-sigma modulation for digital radio," IEEE Journal of Solid-State Circuits, vol. 32, no. 12, pp. 1935-1950, December 1997.
[2] L.J. Breems, E.J. van der Zwan, E.C. Dijkmans, and J. Huijsing, "A 1.8 mW CMOS ΣΔ modulator with integrated mixer for A/D conversion of IF signals," Proc. of the 1999 International Solid-State Circuits Conference, February 1999.
[3] A. Marquez, High Speed CMOS Data Converters, Ph.D. thesis, K.U.Leuven, January 1999.
[4] Matlab, the language of technical computing, http://www.mathworks.com/.
[5] P. Leroux, J. Janssens, and M. Steyaert, "A 0.8 dB NF ESD-protected 9 mW CMOS LNA," Proc. of the 2001 International Solid-State Circuits Conference, February 2001.
[6] B. De Muer and M. Steyaert, "Fully Integrated CMOS Frequency Synthesizers for Wireless Communications," in Analog Circuit Design, W. Sansen, J.H. Huijsing, R.J. van de Plassche (eds.), pp. 287-323, Kluwer Academic Publishers, 2000.
[7] F.M. Gardner, Phaselock Techniques, John Wiley & Sons, 2nd edition, 1979.
[8] M. Steyaert, J. Janssens, B. De Muer, M. Borremans, and N. Itoh, "A 2V CMOS Cellular Transceiver Front-End," Proc. of the 2000 International Solid-State Circuits Conference, February 2000.
[9] E.J. van der Zwan and E.C. Dijkmans, "A 0.2-mW CMOS ΣΔ Modulator for Speech Coding with 80 dB Dynamic Range," IEEE Journal of Solid-State Circuits, vol. 31, no. 12, pp. 1873-1880, December 1996.
Low power RF receiver for wireless hearing aid
Armin Deiss and Qiuting Huang
Integrated Systems Laboratory (IIS), Swiss Federal Institute of Technology (ETH), CH-8092 Zürich, Switzerland
Armin.Deiss@village.ch
Abstract The design of a low power receiver for a wireless hearing aid system working in the 174–223 MHz range and its implementation in a BiCMOS technology are shown. The chip comprises LNA, RF-mixer, variable-gain IF-amplifier, and a demodulator, which consists of a digital phase-shifter and I/Q IF-mixers, fifth-order Bessel filters, and DC-amplifiers. Less than one milliampere of supply current is consumed for the reception of an 8-ary PSK signal with a data rate of 336 kbit/s.
1 Introduction
Technological advances have changed hearing aids from passive to active, from mechanical to electric and later electronic devices, from bulky to small, from analog to digital, and from inflexible to individually configurable systems. However, one fundamental assumption has never been questioned up to now: the hearing aid system has always consisted of one single hearing aid device. A binaural configuration, in contrast, conveniently offers additional information about the position of a sound source, enabling a kind of "stereo" hearing. Employing a configuration as shown in fig. 1, noise from source j can be separated from the signal of source s due to their difference in propagation delay to the two microphones.
From the two received signals, two new signals can be derived. One of them is independent of s(t) but correlated to the noise from source j; by estimating the noise, a guess of the desired signal s(t) can be calculated and fed to the loudspeakers, so that the noise is filtered out. The link between the two ear-pieces is most conveniently established wirelessly. In order to achieve a low power solution for the ear-pieces with contemporary technology, a direct ear-to-ear solution as shown in fig. 2 was sacrificed for the configuration of fig. 3 with an additional body unit. Most of the signal processing takes place therein, where space and battery power are less critical.
Although the limited distance in the range of 1 m between transmitter and receiver makes a low power transceiver possible, the signals received by the RF receiver in the ear-pieces remain weak and the external radio environment is complex, so that the design of a physically small and low power receiver remains a strong challenge. Commercially available products already drain more current than acceptable at much lower data rates than needed here, cf. e.g. [4] (2.7 mA at 2.4 kbit/s) or [5] (6 mA at 64 kbit/s). The text is structured as follows. Firstly, some system aspects of the whole hearing aid ear-piece are considered, followed by a short derivation of the receiver specifications. Afterwards, the design of all building blocks along the receiver chain is discussed, the measurement results are presented, some conclusions are drawn and an outlook is given. A more detailed description may also be found in [6]. An advanced BiCMOS process was used for the implementation to take full advantage of both the high speed, low noise, and high gain characteristics of the bipolar transistor and the benefits of MOS technology, namely good scalability and no DC power consumption. The process offers a 12 GHz npn transistor, double poly and high resistivity poly resistors, dedicated poly-poly capacitors, and double metal.
2 System level considerations
Analog or digital modulation may be feasible. However, the decision to employ time division multiplexing for its potential low power implementation ruled out analog schemes. The two RF-links in the system are separated by allocating two different channels. Sampling a 7 kHz audio signal at 16 kHz with a resolution of 14 bit/sample and compression to 8 bit/sample yields a bit rate of 128 kbit/s. Allowing 0.5 ms of PLL (Phase Locked Loop) switching time between 2.5 ms receive and transmit time slots, plus some data overhead (see table 1 and fig. 4), a gross bit rate of 336 kbit/s per channel results. Such high data rates require complex modulation schemes even if a 200 kHz channel spacing is invoked; using 8-ary PSK (Phase Shift Keying) reduces the symbol rate to about 112 kbaud.
Though high frequencies are beneficial to antenna efficiency and component size, power considerations favour low frequencies. The 174–223 MHz band is available for short range devices as a secondary service in many countries and was chosen as a compromise [7, 8]. Fig. 5 shows the implemented receiver of the ear-pieces. A superheterodyne architecture was chosen for its established functionality and the minimal number of blocks working at the highest frequency, thus promising lower power drain.
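As a sanity check of the bit-rate budget above, the following sketch recomputes the gross rate from the slot timing; the roughly 10% framing overhead is an assumption made here for illustration, the remaining numbers are from the text.

```python
# Hedged sanity check of the bit-rate and symbol-rate budget.
net_rate = 128e3              # bit/s, 16 kHz x 8 bit/sample after compression
slot     = 2.5e-3             # s, receive (or transmit) time slot
switch   = 0.5e-3             # s, PLL switching time before each slot
frame    = 2 * (slot + switch)    # one RX and one TX slot per frame
duty     = slot / frame           # air time available per link direction
overhead = 1.10                   # assumed ~10 % framing / synchronisation overhead
gross_rate  = net_rate / duty * overhead
symbol_rate = gross_rate / 3      # 8-ary PSK carries 3 bit per symbol
print(f"gross bit rate {gross_rate/1e3:.0f} kbit/s, "
      f"symbol rate {symbol_rate/1e3:.0f} kbaud")
# ~338 kbit/s and ~113 kbaud, close to the 336 kbit/s / 112 kbaud quoted in the text
```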
3 Receiver planning
The receiver planning accounted for an important part of this work as there were no specifications generally available for this kind of service. The signal strength at the antenna input has been estimated using [9, 10]
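The estimation formula referenced here did not survive reproduction; for illustration, a commonly used far-field estimate of the following form is assumed (the symbols P_t, G_t, d and l_eff are introduced only for this sketch and are merely assumed to match the original equation):

\[
E \;\approx\; \frac{\sqrt{30\,P_t\,G_t}}{d},
\qquad
u_{\mathrm{ant}} \;\approx\; E\,l_{\mathrm{eff}}
\]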
The transmit power, the antenna gain, the effective length of the monopole receiver antenna, and the distance between transmitter and receiver enter into this estimate. The minimal signal strength at the antenna was estimated from these parameters; together with the nominal signal strength required at the receiver output, it defines the maximum gain of the receiver. To demodulate an 8-ary PSK signal at the targeted symbol error rate, a minimal signal-to-noise ratio of SNR = 11 dB is required at the A/D-converter input [11]. This sets the maximal tolerable noise level at the A/D-converter input; calculated backward to the receiver input, it corresponds to an input referred noise power and a square root of the noise spectral density, with the channel bandwidth B = 200 kHz. If enough gain is provided by the first stages, they dominate the overall noise budget; that is why noise is normally important in the first, but not in the latter stages. Important measures for the small and large signal linearity are the third-order intercept point (IP3) and the 1 dB compression point (1dBcP), respectively. For both of them, the expected level of unwanted signals at the antenna had to be estimated, lacking dedicated specifications. With a survey of the relevant transmitters in Switzerland, reasonable assumptions about their distance to the hearing-aid, and the help of equation (5) again, blocker and interferer levels were estimated. They could be relaxed to some extent as the system may switch to
a kind of "mono" mode in case of distorted RF-links. For the interstage propagation of blockers and interferers, the band and channel filter characteristics have to be considered. The 1dBcP of a building block is normally specified about 3 dB above the highest expected signal level (over all frequencies). Because each nonlinear stage generates intermodulation products anew, the number of nonlinear stages has to be taken into account for the input referred IP3. To account for four nonlinear stages (LNA, RF-mixer, IF-amplifier, and demodulator) instead of one, the IP3 of each stage has to be increased by 6 dB, thus lowering the intermodulation products by 12 dB. The relevant levels for all blocks are gathered in table 2 together with the gain distribution. Where there are two values, both typical and worst case IF-filter specifications have been accounted for.
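The following sketch illustrates these planning steps numerically; the SNR of 11 dB, the 200 kHz bandwidth and the four-stage argument come from the text, whereas the signal level, blocker level and per-stage IIP3 are placeholder assumptions.

```python
import math

B = 200e3                                   # Hz, channel bandwidth
snr_min_db = 11.0                           # dB, required at the A/D input
p_sig_dbm = -90.0                           # dBm, assumed minimal input signal
p_noise_max_dbm = p_sig_dbm - snr_min_db    # tolerable in-channel noise power
density_dbm_hz = p_noise_max_dbm - 10 * math.log10(B)
print(f"max. noise density: {density_dbm_hz:.1f} dBm/Hz")

# Input-referred third-order intermodulation of one stage for two equal tones:
#   P_IM3 = 3*Pin - 2*IIP3   (all in dBm)
def im3_dbm(pin_dbm, iip3_dbm):
    return 3 * pin_dbm - 2 * iip3_dbm

pin, iip3 = -40.0, -10.0                    # dBm, assumed blocker level / stage IIP3
one_stage = im3_dbm(pin, iip3)
# If the IM3 contributions of four equal stages add coherently (an assumption of
# this sketch), the total rises by 20*log10(4) ~ 12 dB; raising every stage's
# IIP3 by 6 dB lowers each contribution by 12 dB and restores the single-stage level.
four_stages = one_stage + 20 * math.log10(4)
improved    = im3_dbm(pin, iip3 + 6) + 20 * math.log10(4)
print(f"one stage: {one_stage:.1f} dBm, four stages: {four_stages:.1f} dBm, "
      f"with +6 dB per stage: {improved:.1f} dBm")
```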
4 Circuit implementation
See fig. 6 for the schematic diagram of the antenna equivalent circuit, RF-filter, and LNA [1]. The equivalent circuit of an electrically short antenna is a resistance in series with a capacitance. It forms the RF-filter together with the shunt inductor L, a shunt capacitance, and the input impedance of the LNA. The latter is controlled by a resistor in Miller configuration for a low noise contribution, resulting in a passive gain of 0 dB. A two stage design was chosen to demonstrate functionality also at a supply voltage of 1.2 V instead of the nominal 2.0 V. The first stage transistor occupies 10 times the unit area for base resistance noise reduction. The feedback resistor of the second stage defines the gain of the LNA independently of its load. The gain of the LNA can be estimated as 19 dB from this feedback resistor and the loads of the first and the second stage; the mixer represents the load of the second stage. Taking all non-idealities into account, however, a somewhat lower gain can be expected. The input referred noise contribution of antenna, RF-filter, and LNA was estimated from its individual contributors. The main noise contribution follows from the finite inductor Q of around 55 and the corresponding equivalent parallel resistance.
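For a feel of the numbers, the equivalent parallel resistance of an inductor follows from R_p ≈ Q·ωL; only Q ≈ 55 is from the text, while the inductance and frequency below are assumed, illustrative values.

```python
import math

Q, f, L = 55, 200e6, 100e-9        # Q from the text; f and L assumed
k_B, T  = 1.38e-23, 300            # Boltzmann constant, room temperature
X_L = 2 * math.pi * f * L          # inductor reactance, ~126 Ohm here
R_p = Q * X_L                      # equivalent parallel loss resistance
v_n = math.sqrt(4 * k_B * T * R_p) # its thermal noise voltage density
print(f"R_p ~ {R_p/1e3:.1f} kOhm, v_n ~ {v_n*1e9:.1f} nV/sqrt(Hz)")
```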
The next contributors are the shot noise from the base-emitter junction of the first-stage transistor and the noise of its feedback resistor. These three noise sources account for more than 80% of the total input referred noise; the remaining noise generators are summarised in equations (10) to (12). Since LO feedthrough will be suppressed by the 10.7 MHz IF-filter, a single balanced mixer, shown in fig. 7, is used to achieve the required conversion gain with the least current and only one pair of switching transistors (and the dominating noise current sources thereof). The IF-filter's relatively low input impedance is power
matched to the high mixer output impedance in order to achieve the specified gain at low current drain. The IF-amplifier's main functions are to provide most of the gain, which exposes it to high power consumption, and to provide constant signal levels before demodulation. A constant amplitude output can be achieved either with a limiting amplifier or with a linear amplifier with gain setting. Any non-linearity induces distortion into the signal and hence increases the noise floor, which might cause irreducible errors. The preferred linear amplifier also preserves the flexibility to use the receiver not only for constant-amplitude modulation formats like FSK or PSK, but also for modulation schemes like ASK or QAM with information in the amplitude. Increasing gain by adding stages increases current drain only logarithmically, whilst within one stage, current depends linearly on gain. On the other hand, concatenating stages increases both the linearity and the bandwidth specifications of the individual stages. The most power efficient trade-off was found with a three stage design, which is AC-coupled and has digital gain setting from 0–42 dB in 6 dB steps as shown in fig. 8. The fully differential stages 2 and 3 provide 0–12 dB gain each, while the first stage, shown in fig. 9, contributes 0–18 dB for best noise distribution and incorporates single ended to differential conversion of the IF-filter signal. As noise is secondary at this stage, a resistor could be used for impedance matching as well as for single ended to differential conversion. Gain setting is implemented by switching currents at the input of the stages rather than switching loads at the output. The latter is prone to gain errors caused by the strong impact of parasitics at
high and varying load impedances. The decision to use gain steps of 6 dB offers the best matching accuracy, as building blocks can simply be doubled for double gain. Linearity is increased by the use of emitter degeneration resistors. Their size varies inversely proportionally to the switched currents in order to keep the loop gain T constant, as seen in the basic gain equation (14).
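A minimal numerical sketch of this constant-loop-gain stepping is given below, assuming the simple degenerated-stage expression A = gm·RL/(1 + gm·RE) stands in for the (not reproduced) equation (14); all component values are illustrative only.

```python
import math

VT  = 0.026     # V, thermal voltage
RL  = 4e3       # Ohm, load resistance (assumed)
I0  = 50e-6     # A, unit bias current (assumed)
RE0 = 2e3       # Ohm, unit degeneration resistor (assumed)

for k in (1, 2, 4, 8):          # each step doubles the switched current ...
    I, RE = k * I0, RE0 / k     # ... and halves the degeneration resistor
    gm = I / VT
    T  = gm * RE                # loop gain, constant over all settings
    A  = gm * RL / (1 + T)
    print(f"k={k}: T={T:.2f}, gain={20*math.log10(A):.1f} dB")   # 6 dB steps
```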
If a gain switch is turned off, the associated bias current bypasses its degeneration stage but preserves a constant low input impedance of the cascode transistors. A small common-collector buffer isolates the load resistor from the rest of the circuit, allowing for the best gain/current ratio thanks to high load resistances, but without corrupting the bandwidth. The quadrature demodulator splits the signal into an in-phase and a quadrature (I/Q) component, which ideally differ by 90° in phase. They are translated to baseband, amplified to the final signal level and low pass filtered before they are converted into the digital domain off-chip. Examination of various phase-shifting
techniques suggests that the lowest power implementation at this IF is achieved by using a digital divider running at double the IF [12]. Its schematic in ECL (Emitter Coupled Logic) is depicted in fig. 10.
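A behavioral sketch of such a divider (an illustrative model, not the ECL circuit of fig. 10 itself) is shown below: a master-slave flip-flop with inverted feedback, whose master is updated on falling and whose slave on rising clock edges, so that both outputs run at half the clock frequency and are offset by a quarter of the output period.

```python
def quadrature_divide_by_two(n_edges=16):
    master, slave = 0, 0
    i_wave, q_wave = [], []
    for edge in range(n_edges):
        if edge % 2 == 0:        # rising edge of the 2*IF clock
            slave = master
        else:                    # falling edge of the 2*IF clock
            master = 1 - slave   # inverted feedback: D = /Q
        i_wave.append(master)    # "I" output
        q_wave.append(slave)     # "Q" output, lagging by 90 degrees
    return i_wave, q_wave

i_wave, q_wave = quadrature_divide_by_two()
print(i_wave)   # 0,1,1,0,0,1,1,0,... -> half the clock rate
print(q_wave)   # 0,0,1,1,0,0,1,1,... -> same rate, shifted by one clock edge (90 deg)
```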
A doubly balanced Gilbert cell architecture with emitter degeneration is used to mix the IF-signal to baseband, see fig. 11. The common-emitter input configuration is used for its high input impedance. The doubly balanced structure is needed to reject local oscillator feedthrough which could saturate the following active filter stages. The pMOS current sources represent high impedance loads. The voltage gain of the transconductance mixer is defined together
with the filter input impedance. A fifth-order Bessel filter, providing antialiasing as well as further suppression of the remaining interferers, was chosen for its constant group delay. It is composed of a second-order and a third-order filter stage (see fig. 12), followed by a small DC-amplifier providing additional gain (see fig. 13). Their combined transfer characteristic is the product of the transfer functions of the second-order stage, the third-order stage, and the DC-amplifier.
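The constant group delay that motivates this choice can be illustrated with a behavioral low-pass Bessel prototype (not the transistor-level filter of fig. 12); the 100 kHz corner frequency below is an assumed value, not the actual filter cutoff.

```python
import numpy as np
from scipy import signal

# 5th-order analog Bessel prototype with an assumed 100 kHz (-3 dB) corner.
b, a = signal.bessel(5, 2 * np.pi * 100e3, btype='low', analog=True, norm='mag')
w = np.linspace(2 * np.pi * 1e3, 2 * np.pi * 100e3, 1000)   # rad/s, up to the corner
_, h = signal.freqs(b, a, worN=w)
tau = -np.gradient(np.unwrap(np.angle(h)), w)               # group delay in seconds
print(f"group-delay ripple up to the corner: {(tau.max() - tau.min())*1e9:.0f} ns")
```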
Filter structures with unity gain were chosen to avoid power hungry opamps, which were replaced by an emitter-follower type of configuration. See fig. 14 for the complete schematic of the two filter
stages. The emitter follower transistors were split to suppress direct signal coupling to the next stage through the feedback paths, which would corrupt the out-of-band attenuation. A PTAT (Proportional To Absolute Temperature) biasing circuit with start-up circuitry, shown in fig. 15, enables amplification independent of temperature and sheet resistance variations. It needs to be PTAT to compensate for the negative temperature coefficient of the bipolar transistors' transconductance. In order to generate the necessary voltage drop, the current densities in the two bipolar transistors have to be different, which can be attained either by different sizes of the two bipolar transistors or of the two MOS transistors.
The MOS transistors were sized equally; equations (21) to (23) derive the generated voltage and its direct proportionality to absolute temperature. The emitter area ratio was chosen to achieve the desired voltage drop and for reasons of a symmetrical layout. The reference current is directly proportional to this voltage, as in equation (23). Normally, voltage gain in the circuits is attained by a transistor providing a transconductance into a resistive load, resulting in a gain equal to, or at least proportional to, the transconductance times the load resistance, as in the mixer stages. The transconductance is set by biasing the transistor with a current proportional to the reference current; with this proportionality constant, the transconductance and the resulting voltage gain follow as in equations (26) and (27).
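Since equations (21) to (27) themselves were not reproduced, the following is only a sketch of the standard derivation for this type of bias cell; the symbols A (emitter area ratio), R (reference resistor), K (mirror factor towards the biased stage) and R_L (load resistor) are introduced here and merely assumed to correspond to the original equations.

\[
\Delta V_{BE} = V_T \ln A = \frac{kT}{q}\ln A,
\qquad
I_{\mathrm{ref}} = \frac{\Delta V_{BE}}{R} = \frac{kT}{qR}\ln A
\]
\[
g_m = \frac{K\,I_{\mathrm{ref}}}{V_T} = \frac{K\ln A}{R},
\qquad
A_v = g_m R_L = K\ln A\,\frac{R_L}{R}
\]

In this form both the transconductance and the gain depend only on constants and on a resistor ratio; the trivial operating point I_ref = 0 also satisfies these relations, which is why the start-up circuitry mentioned below is needed.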
Equations (26) and (27) firstly show that the transconductance becomes in fact independent of temperature. Secondly, the gain also becomes independent of absolute resistor tolerances, which can be as large as ±25%. The gain accuracy then only depends on the mismatch between two resistors, which is normally much smaller than the absolute tolerance. It has to be kept in mind, however, that the load resistors and the reference resistor have to be of the same kind. Reference currents are provided for the different building blocks as shown on the right side of fig. 15. Locally needed bias voltages are generated in the individual stages to avoid distributing susceptible reference voltages over large distances. Since the current generator has a second stable operating point at zero current, starting up is not self-evident. Although start-up is ensured in principle by leakage currents, extra circuitry has been added for faster settling and higher reliability.
5 Measurement results
Gain measurements and the current consumption of all blocks are gathered in table 3. The input referred noise of the LNA alone and of the whole system was also measured. The maximal gain error of the IF-amplifier is 0.22 dB (see fig. 16), its bandwidth 27 MHz, its noise figure 15.0 dB. The I/Q imbalance is 0.06 dB in amplitude and 2.0° in phase. Fig. 17 shows very similar filter transfer characteristics for the ideal and the measured curves. Blocking performance was measured by detecting a 1 dB gain drop of the wanted signal in the presence of a blocker swept through the neighbouring channels, and the worst case was recorded.
The overall intermodulation performance was measured correspondingly, again resulting in a worst-case figure. The error vector magnitude (EVM) measured for the minimum input signal at 112 kbaud is shown in fig. 19. According to [13], an EVM of 10% suffices to decode 8-ary PSK signals at the targeted symbol error rate; an EVM of 9.2% was achieved. To demonstrate functionality with a different modulation scheme, where linear amplification is necessary, a 32-ary QAM signal was applied at the same symbol rate, translating to 560 kbit/s, see fig. 20. For the applied input level, the resulting EVM was 2.9%.
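For reference, the EVM figures quoted here follow the usual rms definition; the sketch below illustrates it on a toy 8-PSK constellation with an arbitrary noise level (normalisation conventions vary, so this is only an assumed reading of the measurement).

```python
import numpy as np

# Error vector magnitude: rms error vector relative to the rms of the ideal
# constellation (one common convention).
def evm_percent(received, ideal):
    received, ideal = np.asarray(received), np.asarray(ideal)
    err = received - ideal
    return 100 * np.sqrt(np.mean(np.abs(err)**2) / np.mean(np.abs(ideal)**2))

# Toy example: ideal 8-PSK symbols plus additive noise (arbitrary level).
rng = np.random.default_rng(0)
ideal = np.exp(1j * 2 * np.pi * rng.integers(0, 8, 1000) / 8)
noise = rng.normal(0, 0.05, 1000) + 1j * rng.normal(0, 0.05, 1000)
print(f"EVM = {evm_percent(ideal + noise, ideal):.1f} %")   # ~7 % for this noise level
```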
Figures 21 and 22 show a microphotograph of the die and a test board photograph.
6 Conclusions and outlook
Thanks to optimization on different levels, the implemented 200 MHz receiver for the high data rate of 336 kbit/s drains less than a milliampere while excellent performance is preserved. This confirms that, in terms of power and size, miniaturised wireless hearing aids are feasible. Together with the previously implemented prescaler and VCO with buffer [1], even an overall current consumption of receiver and synthesizer of 1 mA is now conceivable. Although the volume occupation seems acceptable for a BTE (Behind The Ear) solution and the current consumption of the receiver is very low for the presented performance, it is still considerable as an addition to the normal functionality of a hearing aid, especially when accounting for the additional power drain of synthesizer, transmitter and DSP. A relatively easy optimization step could be pursued with stricter specifications: concentrating e.g. on digital phase modulation, a limiting instead of a linear IF-amplifier could be considered. The better the design can be aligned to the application, the more power may be saved. That is also the reason why general purpose solutions like a "Bluetooth" implementation will not show lower current consumption.
If staying at a 2 V power supply, an LNA architecture with stacked transistors is worth investigating for lower current consumption, increased linearity, and better noise performance. A better, maybe application specific, IF-filter could increase system reliability, lower the linearity requirements of the circuits, and/or decrease the power drain. Very advanced MOS processes with ever shorter minimal gate lengths might be more power efficient than a bipolar solution. Reducing the power supply to 1.2 V is highly desirable to avoid voltage doubling, but has an impact on the linearity of both MOS and bipolar solutions, as demonstrated with the LNA. Parallel
instead of stacked transistor structures have to be implemented. A prototype including all system components should be used to identify remaining system shortcomings. Antenna design in particular seems to be a major issue, especially in combination with the small ground plane, which may or may not contact the human tissue. A working prototype could also prove very helpful in the probably necessary frequency negotiations with the authorities. In the long term, the highest user acceptance is achieved with ITE (In The Ear) or CIC (Completely In the Canal) hearing aids. External components then have to be avoided wherever possible, so that eventually a direct conversion receiver architecture will have to be implemented. At that point, at the latest, elimination of the body unit will also become highly desirable.
References
[1] A. Deiss, D. Pfaff, and Q. Huang, "A 200-MHz Sub-mA RF Front End for Wireless Hearing Aid Applications," IEEE J. Solid-State Circuits, vol. 35, no. 7, pp. 977–986, July 2000.
[2] A. Deiss and S. Holles, Automatic Selection of Basis Functions for Nonlinear Prediction, diploma thesis, ETH Zürich, Mar. 1996.
[3] B. Widrow and S. D. Stearns, Adaptive signal processing. Upper Saddle River, NJ 07458: Prentice Hall, 1985. [4] Philips Semiconductors, UAA2080: Advanced pager receiver, Jan. 1996. [5] Xemics, XE1201A: 300–500 MHz Low-Power UHF Transceiver, 2001.
[6] A. Deiss, A Low Power 200 MHz Receiver for Wireless Hearing Aid Systems, PhD thesis, ETH Zürich, to be published, Hartung-Gorre, Konstanz, 2002. [7] CEPT, ERC Rec. 70-03: Relating to the use of Short Range Devices (SRD), Mar. 2001. [8] BAKOM, FMB Nr. 03: Frequenzmerkblatt betreffend drahtlose Mikrofon-, Lautsprecher-, Kopfhörer- und Audioanlagen (Frequenzklasse 3), July 2000. [9] P. Leuthold and G. Meyer, Übertragungstechnik II, lecture notes, ETH Zürich, 1993.
[10] K. Rothammel, Antennenbuch. Stuttgart: Telekosmos Franckh, 1984. [11] J. G. Proakis, Digital Communications. New York: McGraw-Hill, 1995. [12] P. G. Orsatti, A Low Power CMOS GSM Transceiver for Small Mobile Stations, PhD thesis, ETH Zürich, Hartung-Gorre, Konstanz, 2000. [13] J. L. Pinto and I. Darwazeh, "Phase Distortion and Error Vector Magnitude for 8-PSK Systems," Proceedings of the London Communications Symposium, Sept. 2000.